Experienced DevSecOps and Site Reliability Engineer with over 20 years of expertise in cloud infrastructure, security automation, and system reliability. Proficient in AWS, Kubernetes, and CI/CD tools, with a strong focus on automating security controls, building resilient systems, and ensuring high availability. Adept at incident management, performance optimization, and cost-effective cloud operations. Passionate about building secure, scalable, and reliable systems, and mentoring teams to adopt best practices for security and operational excellence.
Overview
17 years of professional experience
1 Certification
Work History
Sr SRE/DevOps Consultant
Apex Systems / CapitalOne
10.2024 - Current
Part of CapitalOne’s CORE team, responsible for the entire Consumer Identity platform supporting more than 25 ASVs (application services)
Providing first-line support in an on-call rotation once a week
Leading a team of SREs, taking ownership of large-scale system reliability initiatives
Designing and implementing complex system architectures with a focus on scalability and resilience
Focusing on driving DevOps culture within the organization, promoting automation and collaboration
Developing and maintaining automation scripts for deployment, monitoring, and remediation tasks using tools like Ansible, Terraform, and custom scripts
Implementing robust monitoring systems to proactively identify potential issues and trigger alerts for timely response
Analyzing system performance and making capacity adjustments to anticipate future demand
Contributing to codebases for building and maintaining infrastructure components, including writing clean, maintainable code
Performing SLO analysis to track, measure, and improve service quality and meet service commitments, using tools like New Relic, Splunk, Observe, and other APM tools
Own the availability, reliability, and performance of critical services and systems
Define and implement Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets to measure and ensure system health
Ensure that services meet agreed-upon SLAs (Service Level Agreements) by driving reliability engineering practices across the organization
Proficient in AWS services like EC2, S3, RDS, Lambda, and CloudFormation, with a strong focus on reliability, scalability, and cost optimization
Skilled in implementing monitoring solutions using CloudWatch, automating deployments through CI/CD pipelines, and driving performance improvements for mission-critical services
Experienced in incident management, disaster recovery, and security best practices on AWS
Using custom in-house tools like Cloud Doctro, AION, Smart Ops, Counslr, PagerDuty, Hawkeye, Ozone, Cloud Radar and many more.
Subject Matter Expert
CompTIA
10.2017 - Current
Helped develop exam objectives, validate content, and write questions for the CompTIA Linux+ exam
Helped develop exam objectives, validate content, and write questions for the CompTIA Cloud+ CV0-003 exam.
Adjunct Associate Professor
University of Maryland Global Campus
08.2017 - Current
Teaching Linux Systems Administration (CMIT), SDEV-400 Secure Cloud Programming
DevSecOps Consultant/Engineer
Department
06.2024 - 10.2024
Helping DOL secure and enhance DevSecOps initiatives for Appian and WCMS projects
Creating a modern, secure DevSecOps pipeline to automate and ease deployment of various components and applications
Using tools like GitLab, Jenkins, SonarQube, Trivy, Pally, Qualys, Argo CD, and EKS to rapidly build and deploy code in Dev/Stage and Production environments
Created a secure CI pipeline and generated SAST, DAST, and SBOM reports
Participated in daily scrums and biweekly client sprint planning
Participated in sprint demos to achieve MVP goals
Worked with the CloudOps, Development, and Security teams to create collaborative CI/CD pipelines.
Lead DevSecOps Engineer/Architect
Karthikconsulting, USCG
09.2023 - 06.2024
Helping the United States Coast Guard’s HERMN team manage a secure software factory platform running in an AWS GovCloud EKS environment
Managing an EKS-based managed Kubernetes cluster that hosts the Coast Guard’s major software development
Daily activity involves managing secure CI/CD pipelines and scanning containers using tools like Trivy, Snyk, and OpenSCAP
Used Confluence for documentation management, creating detailed project documentation
Attending SOC and ATO compliance meetings to provide support for security reviews and solutions
Deploying microservices on EKS platforms
Developed and maintained large-scale applications utilizing advanced TypeScript features such as generics, decorators, and type inference to ensure type safety and improve code maintainability
Reviewing GitLab SAST and DAST compliance reports and fixing open vulnerabilities in microservices platforms and software libraries
Utilized XML for data exchange between different systems and applications, ensuring structured and well-formed data communication
Conducting IT security risk assessments in accordance with FedRAMP, FISMA, and NIST 800-53A
Developing security controls, threat models, threat analysis and risk mitigations
Designed and implemented RESTful APIs in PHP, enabling robust backend services for various client applications
Performing vulnerability assessments using tools like Nessus, Nikto, Red Hat Clair, and Kubescape/Kyverno
Implementing firewalls using AWS WAF and DoD-approved tools and technologies
Using secure container registries such as DoD Iron Bank and Harbor
Implementing security information and event management (SIEM) and incident handling using tools like Wazuh, Nessus, and Qualys
Performing monthly patching on cloud resources and the EKS-based managed Kubernetes platform
Implementing security policies, training, and standards across all departments and IT resources, including cloud services, Linux servers, and web applications
Conducting security audits on a weekly basis.
DevSecOps and Cloud Engineering Director
Universal
08.2022 - 09.2023
Managing the team of DevOps and Security engineers in the US and abroad
Helping secure Google Cloud, Kubernetes clusters and cloud resources
Leveraged advanced PostgreSQL features such as partitioning, indexing, and full-text search to improve application performance and scalability
Developing SOC, DevOps and production release strategies and plans
Supporting health care products like uVAx, uConsult, uWellness and mobile apps for the organization
Supporting government agencies like DHS (CBP), FEMA, and many more in a HIPAA-compliant environment
Created DR and BC plans along with IT security assessment strategies and plans
Created RESTful and GraphQL APIs in Node.js, enabling efficient communication between frontend and backend services
Used tools like Wazuh, Nessus, and Qualys for internal vulnerability assessment and file integrity monitoring
Integrated TypeScript with popular frameworks like Angular and React, enhancing the robustness and scalability of front-end applications
Performed security audits of web applications and cloud computing resources
Created security policies and implemented the same for all IT resources.
Director of DevSecOps and Cloud Technologies
DTIS, Digital Trusted Identity Service
10.2019 - 08.2022
Leading DevOps and IT-ops team within the organization
Helping the DBA and Dev teams by providing robust, automated builds and deployments across the stages of the software development lifecycle
Managing large portfolios in both commercial and public sectors
Used Confluence for documentation and knowledge management, creating detailed project documentation, user guides, and technical specifications to facilitate team collaboration and information sharing
Helping government agencies like TSA, FBI, and NASA hire employees securely
Providing 24x7 on-call support for many websites, applications, and cloud resources
Migrated legacy applications to cloud providers like AWS and Azure, saving more than $30M in operating costs
Managing a portfolio of large commercial accounts worth more than $250M
Supporting high-transaction products like IdentityX that serve large international financial institutions
Created architectural design for EKS cluster for some of the commercial applications
Developed scalable and high-performance server-side applications using Node.js, employing asynchronous programming and event-driven architecture
Built PagerDuty and Icinga2 dashboards for the executive team to visualize platform health
Built HashiCorp Vault for better secrets management
Built Icinga2 monitoring cluster for applications and servers monitoring from the ground up
Created multiple production environments on AWS for various applications
Created a centralized Docker registry for the entire organization to manage Docker images
Created custom CentOS-based images for use as AWS AMIs and for on-prem server buildouts
Installed and configured the following products
Managed incidents, changes, and service requests in ServiceNow, ensuring timely resolution and adherence to ITIL best practices
Supported applications and infrastructure required to comply with FISMA, FedRAMP, DoD, and NIST standards
Supported Microsoft Project Management Server (EPM) and Primavera
Supported applications are:
Application Based Transport – Uber/Wingz/Lyft queue management systems for large airports
AAAE clearing house – Employee badging systems for airports
FBI proxy – FBI channelling partner service for criminal background checks
QStartr – Taxicab queue system for large airports
Access/Preenrollment for local sheriff’s offices across the nation
NASA/FINRA employee background check services
IdentityX – Fingerprint authentication for financial institutions
Technical responsibilities:
Created a custom operating system for microservices deployment
Docker and Kubernetes compatible images
CI/CD pipeline on OKD clusters (OpenShift 3.x and 4.x versions)
HAProxy HA cluster for various environments
ELK cluster (15+ nodes)
HashiCorp Vault for secrets management
Site24x7 and working dashboards
Icinga2 HA cluster and dashboards
PagerDuty and dashboards
Multiple AWS environment setup from scratch
Set up a multi-node EKS cluster from scratch with Filebeat, Metricbeat, and Logstash
On-site Kubernetes (50-node cluster) on VMware
Created an XWiki application environment for company documentation
Automated infrastructure as code with tools like Ansible and Salt
Performed vulnerability scans using Tenable Nessus
Fixed CIS and STIG benchmark vulnerabilities through Ansible automation
Patched Linux systems and created dashboards using the Patchman tool
Built a RabbitMQ cluster (3 nodes) on an internal VMware-based environment to store fingerprint image metadata
Managing the production cluster
Built a Kafka cluster (3 nodes) on an internal VMware-based environment to store fingerprint images and converted PDF reports
Managing the production cluster.
Enterprise Cloud Solutions Lead/Manager
Fannie Mae
07.2019 - 10.2019
Supported applications are:
Single Family loan processing system
Portfolio transaction services
Technical responsibilities:
Worked on building enterprise container solutions on private and public clouds
Wrote custom Dockerfiles, converted legacy applications to Docker containers, and deployed them with Helm charts on Kubernetes
Built mid-size clusters on OpenShift and Rancher/RKE
Also created a small cluster on AWS EKS
Created CI/CD pipelines using OpenShift's built-in solution
Converted Spring Boot applications to Docker-based images in test and staging environments
Created a JupyterHub/JupyterLab cluster on top of OpenShift for the development team.
DevOps Tech Lead
Alarm.com
06.2017 - 07.2019
Supported applications are:
Alarm.com dealer onboarding system
AWS cloud
911 extended module
ADT pulse connected applications
Technical responsibilities:
Completed a large migration of a legacy containerized application to a highly secure, scalable Kubernetes environment, saving millions of dollars
Completed an RKE-based cluster on a VMware virtual environment, running Docker containers on a managed Kubernetes platform
Created a Dockerized Kubernetes environment from scratch and deployed a PHP/MySQL-based application
Built a large Kubernetes cluster with Rancher/RKE and integrated it with an F5 load balancer for high-traffic, critical customer-facing applications
Automated the CI/CD build pipeline using GitLab runners running on Kubernetes
Integrated application metrics and monitoring with Icinga2, PagerDuty, SumoLogic, Grafana, Prometheus, InfluxDB, and Wavefront
Managed a large Hadoop cluster based on Cloudera Express running Spark jobs for business intelligence data
Built two large Hadoop clusters in the Staging and Production environments
Built a secure, scalable centralized Docker registry with a Nexus OSS front end from scratch
Created multiple dashboards with the TICK (Telegraf, InfluxDB, Chronograf, and Kapacitor) stack for Docker and Linux host statistics, analysis, and monitoring
Supported Microsoft Project Management Server (EPM) and Primavera
Created a VMware vRA automation blueprint to deploy Dockerized VMware hosts, integrated with the internal Docker registry, to build a .NET- and Docker-enabled VMware environment throughout the company
Built open-source Kafka clusters in testing and production environments for IoT devices.
Comcast
01.2013 - 07.2017
Lead DevOps Engineer, Sr. System Engineer
Corpus Inc Washington
02.2009 - 01.2013
Worked as a 24x7 production support engineer for Comcast’s UDB team, which manages a highly available clustered application environment that provides entitlements to Comcast customers
The team also manages other applications like Title availability service, digital Locker, Resume point, Streams, Caller ID, Data Ingest, and many more
As a team member, managed a large number of physical, virtual, and microservice instances in a production environment
Providing on-call support for all customer-facing applications and services, metrics, monitoring, scripting, etc
Hands-on experience on backend systems like Hadoop, Cassandra, MySQL, Hazelcast, and Riak
Working knowledge of Ansible playbooks, Puppet configuration management, and Jenkins for automated deployments
Have worked on many projects that involved heavy scripting using Perl, Python, Ruby, and PHP
Created many monitoring scripts and a configuration management interface called RMI to ease Nagios configuration management
Hands-on experience with virtualization technologies like VMware, KVM, and Xen
Also worked on messaging systems like RabbitMQ and Redis
Installed, configured, and maintained ELK (Elasticsearch, Logstash, and Kibana) stack
Applied security patches to Linux (OpenSSH/OpenSSL/glibc) and VMware (ESXi/vCenter) servers
Provide 24x7 on-call support once a month
Supporting large-scale Java applications providing Video-on-demand and linear schedule data
Installing and Managing security patches, OS upgrades, etc
Deployment automation using Jenkins, Ansible, Puppet, and Docker
Created a few Nagios plugins in Perl and Python to provide monitoring and metrics for the production applications.
Sr. Production Operation Engineer
Comcast
05.2008 - 12.2012
Worked as a UNIX production support engineer for the TVSearch, Video Search, TVPlanner, and Video on Demand teams
Implemented more than 200 servers on ESXi virtualization technology
Helped plan and execute the VMware ESXi to Xen migration on the Dell R900 platform
Implemented numerous Red Hat Enterprise Linux 5.3 64-bit Kickstart server installations
Also installed and configured PXE boot for diskless and automated boot processes
Developed custom Perl/Python Nagios monitoring code for system and software downtime alerts
Upgraded JBoss/JDK in production and non-production environments
Installed, configured, and managed webMethods 7.1.2 on Red Hat Linux to integrate a Video-on-demand metadata hub with a Java-based VOD search application system running on WebLogic 8.1
Used WebLogic Workshop control to seamlessly connect with webMethods
Developed a MySQL/PHP-based Video on Demand asset tracking system as well as a dynamic search system for Xen hosts and guests
Provide 24x7 on-call support once a month
Supporting large-scale Java applications providing Video-on-demand and linear schedule data
Installing and managing OS patches and Java security patches
Using Xymon, Nagios, Cacti, and Bamboo tools for builds and system monitoring
Providing support for JBoss, Apache Tomcat, Hadoop, and Solr products
Installation and configuration of Red Hat, Xen, and KVM.
UNIX Lead Developer
Duke University Health Technology Systems
09.2008 - 12.2008
Worked on Solaris OS migration from 8 to 10 and RHEL 3 and 4 to RHEL 5.2
Managing 50+ HP, IBM, and Sun servers
Installed and configured Veritas FS 5.0 and NetBackup on Solaris and Linux servers
Wrote custom Perl/PHP/shell scripts for OS and database migration
Writing Perl modules for Nagios and Cacti for Java-based applications hosted on JBoss, Apache, and Tomcat on Linux servers
Installed and configured Veritas Cluster system for OS and Java application availability
Writing technical documents for upgrades, applications, and log monitoring
System log management using Splunk on Linux servers
Major work involved Nagios, Cacti, Splunk, JBoss, and Apache products
Code design with Perl, PHP, MySQL, Python, and DTrace for application testing, monitoring, and debugging
Use Control-M for scheduling jobs
Used Linux LVM + Xen to create high availability Linux Environments.
Infrastructure Management Specialist
Fannie Mae
05.2008 - 08.2008
Environment: UNIX, Veritas, ClearCase, ClearQuest, Subversion, Oracle, WebLogic, SunOne Web Server, AutoSys, Remedy, Perl, PHP, and shell programming
Worked as an Infrastructure Management Specialist for Disaster Recovery System setup and integration, built scripts using Perl and shell scripting, and upgraded ClearCase Server version 7 in a UNIX environment
Installing, configuring, and administering Global and Sparse Zones (LDOMs) on Solaris 10 servers
Supported the central log analysis and monitoring team with log storage for different teams within the organization
Daily activities included backup and analysis of logs in a production environment
Java build deployments in development, system testing, and production environments
Updated documents on SharePoint sites
LDAP and Main server management
Configured JDBC connection pools in config.xml
Responsible for understanding application environment and changes to tune WebLogic parameters in support of application stability
Worked closely with the development team, infrastructure DBA, application architect, and performance test engineer to tune the application by adjusting thread pool sizes, bean pool sizes, database connection pools, and JVM heap size
WebLogic, SunOne Web Server, Sun Cluster 3.1 on Solaris 10, and Veritas Volume Manager support, administration, and troubleshooting
Installation and administration experience with HP OVO 8.X
Used DTrace for OS/user logs
Digital certificate installation for WebLogic servers in production and non-production environments
Automated various jobs, including calling system commands, scanning networks, and file manipulation, using UNIX shell and Perl/PHP scripting
Installation, configuration, and integration of Nagios on RHEL 5 servers
Nagios plugin development using Perl, PHP, Python, and shell scripting
ClearCase to Subversion migration of a large Java-based application codebase for a health system.
Education
Master of Science - Information Security
Strayer University
Washington
09-2015
Skills
Automation and CI/CD pipeline
Infrastructure automation (IaC)
Monitoring, logging and Observability
Incident management
DevSecOps implementations
Cloud Infrastructure Management
Capacity Planning & Performance Tuning
Disaster Recovery (DR) & Business Continuity
Cost optimization
Container Orchestration & Management
Version Control & Collaboration
Certification
CKA (Certified Kubernetes Administrator)
CKAD (Certified Kubernetes Application Developer)
KCNA (Kubernetes and Cloud Native Associate)
CompTIA Linux+
CompTIA PenTest+