
VENKATASAI PERAM

Henrico

Summary

Accomplished Senior Data Engineer at Capital One, specializing in designing robust data pipelines and optimizing ETL processes. Proficient in Python and AWS, with a track record of reducing processing time by 50% while strengthening data governance. A collaborative team player who excels at delivering impactful solutions that drive business success.

Overview

11 years of professional experience

Work History

Senior Data Engineer

Capital One
Richmond
09.2018 - Current
  • Designed and led end-to-end data pipeline architecture for Capital One's marketing and messaging platform on the banking Marketing team, implementing CDC-based ingestion into Delta Lake using PySpark, SQL, AWS Glue, Lambda, Step Functions, and DynamoDB.
  • Optimized ETL performance by transitioning from sequential to parallel workflows using AWS Step Functions, Lambda, and Glue, reducing processing time by 50% and improving scalability.
  • Applied PySpark best practices such as broadcast joins, partition pruning, and caching, resulting in a 35% reduction in processing time and 25% cost savings on cloud resources.
  • Implemented Medallion Architecture to organize data into Bronze (raw), Silver (trusted), and Gold (curated) layers, ensuring modular design and better data governance.
  • Designed and integrated RESTful APIs for data services using Python and Java, enabling secure, scalable, and real-time data access across cloud platforms.
  • Built a YAML-based ETL framework to automate and standardize workflows, incorporating Linux shell scripting and Python for orchestration.
  • Wrote highly efficient SQL queries leveraging window functions, CTEs, and optimized joins to improve query performance.
  • Implemented data quality checks and governance frameworks using PyDeequ and schema validation, ensuring compliance with enterprise data standards and regulatory requirements.
  • Managed production deployments using CI/CD pipelines (GitLab/GitHub) to ensure stable, seamless application updates; provided 24/7 production support, quickly troubleshooting and resolving critical issues to maintain uninterrupted business operations.
  • Built a comprehensive Power BI and Redash dashboard displaying the full ETL lifecycle (job status, source/destination paths, data freshness, error and runtime metrics), empowering stakeholders with real-time visibility and reducing debugging time by 40%.

Big Data Engineer

Genentech
San Francisco
09.2016 - 08.2018
  • Optimized SQL queries, enhancing report generation speed and efficiency.
  • Designed a comprehensive schema, boosting productivity and performance metrics.
  • Authored shell scripts for seamless integration of source table attributes with Hive staging tables.
  • Reduced processing time by 60% through Hive and HBase integration, improving storage efficiency.
  • Enhanced HQL performance with strategic Hive table optimization using Partitioning and Bucketing.
  • Increased data processing speeds by 50% with Spark, boosting overall productivity.
  • Implemented Airflow to efficiently orchestrate batch and real-time data workflows from source to target.
  • Automated data retrieval from FTP servers to Hive tables using Oozie workflows.
  • Utilized Sqoop for efficient data transfer between RDBMS and HDFS.

Hadoop Developer

BBVA
Birmingham
09.2015 - 08.2016
  • Reviewed functional and non-functional requirements.
  • Facilitated knowledge transfer sessions.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Defined and managed job flows.
  • Managed and reviewed Hadoop log files.
  • Developed a Spark application in Java and implemented an Apache Spark data-processing project to handle data from various RDBMS sources.

Java Developer

Indigene
Bengaluru
08.2014 - 07.2015
  • Prepared design, development, and analysis documents with clients.
  • Responsible for requirement analysis and design of Smart Systems Pro (SSP).
  • Involved in Object-Oriented Analysis (OOA) and Design (OOD).
  • Analyzed and designed object models using Java/J2EE design patterns across application tiers.
  • Worked with RESTful web services and WSDL.

Education

Master's - Computer Science

Silicon Valley University
USA
01.2016

Bachelor of Technology - CSE

JNTU
01.2014

Skills

  • Programming Languages: Python, Scala, Java, SQL
  • Big Data Technologies: Spark, Hadoop, Hive, EMR, Kafka, Delta Lake, Iceberg
  • Cloud Platforms: AWS (EC2, S3, EMR, Glue, Redshift, Lambda, Step Functions)
  • Data Warehousing: Redshift, Snowflake
  • NoSQL Databases: DynamoDB, Cassandra, MongoDB
  • Operating Systems: UNIX/Linux, Windows
  • Agile Practices: CI/CD, Scrum, Test-Driven Development
