
VENKATASAI PERAM

Henrico

Summary

Accomplished Senior Data Engineer at Capital One, specializing in designing robust data pipelines and optimizing ETL processes. Proficient in Python and AWS, with a track record of reducing processing time by 50% while strengthening data governance. A collaborative team player who excels at delivering impactful solutions that drive business success.

Overview

11 years of professional experience

Work History

Senior Data Engineer

Capital One
Richmond
09.2018 - Current
  • Designed and led end-to-end data pipeline architecture for Capital One's marketing and messaging platform on the banking Marketing team, implementing CDC-based ingestion into Delta Lake using PySpark, SQL, AWS Glue, Lambda, Step Functions, and DynamoDB.
  • Optimized ETL performance by transitioning from sequential to parallel workflows using AWS Step Functions, Lambda, and Glue, reducing processing time by 50% and improving scalability.
  • Applied PySpark best practices such as broadcast joins, partition pruning, and caching, resulting in a 35% reduction in processing time and 25% cost savings on cloud resources.
  • Implemented Medallion Architecture to organize data into Bronze (raw), Silver (trusted), and Gold (curated) layers, ensuring modular design and better data governance.
  • Designed and integrated RESTful APIs for data services using Python and Java, enabling secure, scalable, and real-time data access across cloud platforms.
  • Built a YAML-based ETL framework to automate and standardize workflows, incorporating Linux shell scripting and Python for orchestration.
  • Wrote highly efficient SQL queries leveraging window functions, CTEs, and optimized joins to improve query performance.
  • Implemented data quality checks and governance frameworks using PyDeequ and schema validation, ensuring compliance with enterprise data standards and regulatory requirements.
  • Managed production deployments using CI/CD pipelines (GitLab/GitHub) to ensure stable, seamless application updates; provided 24/7 production support, quickly troubleshooting and resolving critical issues to maintain uninterrupted business operations.
  • Built a comprehensive Power BI and Redash dashboard displaying the full ETL lifecycle (job status, source/destination paths, data freshness, error and runtime metrics), empowering stakeholders with real-time visibility and reducing debugging time by 40%.

Big Data Engineer

Genentech
San Francisco
09.2016 - 08.2018
  • Optimized SQL queries, enhancing report generation speed and efficiency.
  • Designed a comprehensive schema, boosting productivity and performance metrics.
  • Authored shell scripts for seamless integration of source table attributes with Hive staging tables.
  • Reduced processing time by 60% through Hive and HBase integration, improving storage efficiency.
  • Enhanced HQL performance with strategic Hive table optimization using Partitioning and Bucketing.
  • Increased data processing speeds by 50% with Spark, boosting overall productivity.
  • Implemented Airflow to efficiently orchestrate batch and real-time data workflows from source to target.
  • Automated data retrieval from FTP servers to Hive tables using Oozie workflows.
  • Utilized Sqoop for efficient data transfer between RDBMS and HDFS.

Hadoop Developer

BBVA
Birmingham
09.2015 - 08.2016
  • Reviewed functional and non-functional requirements.
  • Facilitated knowledge transfer sessions.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Defined and managed job flows.
  • Managed and reviewed Hadoop log files.
  • Developed a Spark application in Java and implemented an Apache Spark data-processing project to handle data from various RDBMS sources.

Java Developer

Indigene
Bengaluru
08.2014 - 07.2015
  • Prepared design, development, and analysis documents with clients.
  • Responsible for requirement analysis and design of Smart Systems Pro (SSP).
  • Involved in Object-Oriented Analysis (OOA) and Design (OOD).
  • Analyzed and designed object models using Java/J2EE design patterns across application tiers.
  • Worked with RESTful web services and WSDL.

Education

Master's - Computer Science

Silicon Valley University
USA
01.2016

Bachelor of Technology - CSE

JNTU
01.2014

Skills

  • Programming Languages: Python, Scala, Java, SQL
  • Big Data Technologies: Spark, Hadoop, Hive, EMR, Kafka, Delta Lake, Iceberg
  • Cloud Platforms: AWS (EC2, S3, EMR, Glue, Redshift, Lambda, Step Functions)
  • Data Warehousing: Redshift, Snowflake
  • NoSQL Databases: DynamoDB, Cassandra, MongoDB
  • Operating Systems: UNIX/Linux, Windows
  • Agile Practices: CI/CD, Scrum, Test-Driven Development
