Results-driven Data Engineering Leader with 11+ years of expertise in designing, developing, and delivering enterprise-scale data systems and applications.
Proven ability to lead high-performing teams to execute complex, large-scale projects on time and within budget.
Extensive experience in the full lifecycle development of enterprise-scale data engineering solutions, including data quality frameworks, data warehousing architectures, real-time and batch processing pipelines, system integrations, and comprehensive data strategy implementation.
Expertise in building advanced data engineering workflows, including migrating batch data from relational databases and Hadoop clusters to modern, cloud-based data lakes on AWS.
Proficient in real-time data processing and streaming applications using Kafka and Spark Streaming, delivering high-performance, low-latency data solutions on AWS.
Specialized in Snowflake development, leveraging advanced capabilities such as data sharing, Snowpipe, Snowpark, stored procedures, and query optimization to implement efficient and scalable data warehousing solutions.
Passionate about staying at the forefront of technology trends, adopting best practices in Agile methodologies, and driving innovation to deliver impactful data solutions.
Overview
11 years of professional experience
1 Certification
Work History
Data Engineering Technical Lead
Freddie Mac
05.2020 - Current
Successfully led a team of 10+ developers, analysts, and testers for over 4 years, managing the end-to-end migration of data from traditional RDBMS to enterprise-scale data lakes.
Provided mentorship and professional development opportunities to team members, driving skill enhancement and team productivity.
Oversaw the development and implementation of scalable data frameworks, ensuring timely delivery of projects and fostering a culture of continuous improvement and collaboration.
Led the design and development of a custom framework for migrating data from a Cloudera big data cluster (HDFS/Hive) to the enterprise data lake using Apache Spark and Python.
Directed the creation of optimized data pipelines for transforming raw data into enriched formats, leveraging AWS EMR, Spark, Glue, DynamoDB, Snowflake, and S3.
Guided team members in building a Spark-based framework to process semi-structured data (JSON/XML) into relational formats for Snowflake and S3 ingestion, ensuring high-quality, reliable outputs.
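A minimal PySpark sketch of this kind of flattening step, assuming an illustrative loan/payment schema and placeholder bucket names rather than the production framework:

```python
# Illustrative sketch: flatten semi-structured JSON into a relational
# layout for S3/Snowflake ingestion. Paths, columns, and schema are
# hypothetical examples, not the actual framework.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("json-flatten").getOrCreate()

# Read raw JSON landed in the data lake (hypothetical bucket/prefix).
raw = spark.read.json("s3://example-raw-bucket/loans/")

# Flatten nested structs and explode arrays into one row per element.
flat = (
    raw.select(
        col("loanId"),
        col("borrower.name").alias("borrower_name"),
        explode(col("payments")).alias("payment"),
    )
    .select(
        "loanId",
        "borrower_name",
        col("payment.date").alias("payment_date"),
        col("payment.amount").alias("payment_amount"),
    )
)

# Write columnar Parquet back to S3 for downstream Snowflake ingestion.
flat.write.mode("overwrite").parquet("s3://example-curated-bucket/loans_flat/")
```

Writing Parquet keeps the curated layer columnar and makes downstream Snowflake loading (for example, COPY INTO from an external stage) straightforward.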
Designed and developed data warehouse applications on Snowflake using SnowSQL, Snowpark, data sharing, tasks, and stored procedures.
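For illustration, a small Snowpark (Python) sketch of pushing a transformation down to Snowflake; the connection parameters and table names are placeholders:

```python
# Minimal Snowpark sketch; all names here are placeholders.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

connection_parameters = {
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

# Push the aggregation down to Snowflake instead of pulling rows out.
orders = session.table("RAW_ORDERS")
enriched = orders.filter(col("STATUS") == "CLOSED").group_by("REGION").count()
enriched.write.mode("overwrite").save_as_table("CLOSED_ORDERS_BY_REGION")
```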
Implemented robust workflow orchestration solutions using AWS Step Functions and Lambda, integrating seamlessly with CI/CD pipelines to support deployments in Dev, UAT, SIT, and Production environments.
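One common pattern behind this kind of orchestration, sketched below: a Lambda handler that starts a Step Functions execution. The state machine ARN environment variable and the event shape are assumptions for illustration:

```python
# Hypothetical Lambda handler: start a Step Functions state machine
# execution in response to an event (e.g., a new file landing in S3).
import json
import os

import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    # Pass the triggering event through as the execution input.
    response = sfn.start_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],
        input=json.dumps(event),
    )
    return {"executionArn": response["executionArn"]}
```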
As a principal solution architect, contributed to the architecture design of an integrated data lake (IDL) on AWS, ingesting data from 150 data sources.
Ingested data into the data lake using Apache NiFi, EMR, and Spark.
Designed data models for spatial tables and columns in the geospatial database GeoMesa and created indexes on commonly used spatial attributes.
Designed containerized applications deployed to a Kubernetes cluster managed by Rancher Kubernetes Engine (RKE), with deployments packaged as Helm charts.
As a big data architect, performed solution engineering for migrating traditional on-premises applications to the Amazon Web Services cloud.
Designed and implemented big data pipelines to load, transform, and process large volumes of data from on-premises systems to the cloud using S3, EMR, Glue, DynamoDB, and Apache Spark.
Implemented a REST API using Python Flask and AWS Elastic Beanstalk to automate dataset uploads, machine learning model runs, and predictions on AWS Batch (Elastic Container Service, Docker).
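A minimal Flask sketch in the spirit of that API; the route, bucket name, and upload flow are illustrative, not the actual service:

```python
# Hypothetical dataset-upload endpoint: accept a file and stage it in S3.
import uuid

import boto3
from flask import Flask, jsonify, request

app = Flask(__name__)
s3 = boto3.client("s3")

@app.route("/datasets", methods=["POST"])
def upload_dataset():
    # Store the uploaded file in S3 under a generated key.
    file = request.files["file"]
    key = f"uploads/{uuid.uuid4()}/{file.filename}"
    s3.upload_fileobj(file, "example-dataset-bucket", key)
    return jsonify({"dataset_key": key}), 201

if __name__ == "__main__":
    app.run()
```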
Developed workflows using AWS Step Functions and AWS Lambda.
Created CI/CD pipelines using AWS CodeCommit, CodeBuild, and CodeDeploy for automated deployment of applications into development, UAT, and production environments.
Provisioned AWS resources such as EMR, DynamoDB, AWS Batch, Lambda functions, and Step Functions using AWS CloudFormation.
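As a sketch, this kind of provisioning can be driven with boto3 against CloudFormation; the template file, stack name, and parameters below are placeholders:

```python
# Provision a CloudFormation stack from a local template (placeholder names).
import boto3

cfn = boto3.client("cloudformation")

with open("pipeline-stack.yaml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="data-pipeline-dev",
    TemplateBody=template_body,
    Parameters=[{"ParameterKey": "Environment", "ParameterValue": "dev"}],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # required when the template creates IAM roles
)

# Block until the stack is fully created.
cfn.get_waiter("stack_create_complete").wait(StackName="data-pipeline-dev")
```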
Big Data Software Engineer IV (Principal Software Engineer)
Radiant Solutions
10.2017 - 07.2018
Member of the core development team responsible for developing the open-source framework GeoWave.
GeoWave is a software framework/library that connects the scalability of distributed computing frameworks and key-value stores with modern geospatial software to store, retrieve, and analyze massive geospatial datasets.
As a core developer, added major new features in recent releases, such as distributed spatial ingest using Apache Spark 2.x and support for Cassandra as a pluggable back-end key-value store.
Tuned and optimized Spark configurations to improve the performance of Spark ingest jobs.
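An illustrative example of the kinds of Spark settings involved in such tuning; the values are examples, not the GeoWave project's actual configuration:

```python
# Example Spark session tuning for a shuffle-heavy distributed ingest job.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("example-ingest")
    # Right-size executors for the cluster rather than using defaults.
    .config("spark.executor.instances", "20")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "8g")
    # Match shuffle parallelism to total cores to avoid tiny or huge tasks.
    .config("spark.sql.shuffle.partitions", "160")
    # Kryo serialization is generally faster for shuffle-heavy workloads.
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)
```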
Big Data Tech Lead
Freddie Mac
10.2013 - 09.2017
Led a team of big data developers in the design, development, and deployment of big data solutions at Freddie Mac for various organizations.
Created data lakes by ingesting and transforming structured and unstructured data from multiple data sources, enabling the organization to run analytics and derive insights from large volumes of data.