Ravi Malghan

Senior Manager - Enterprise Monitoring and Observability
Fairfax, VA

Summary

Results-driven leader with 25+ years of experience designing and delivering high-availability monitoring solutions for large-scale enterprises. Known for replacing complex vendor tools with streamlined in-house applications that improve stability, reduce costs, and enhance user experience. Proven track record of building systems with near-100% availability. Strong advocate for simple, effective solutions over buzzword-heavy architectures.

Overview

29 years of professional experience
2 years of post-secondary education

Work History

Senior Manager - Enterprise Monitoring and Observability

Fannie Mae
Reston, VA
05.2018 - Current
  • Led design and implementation of an AIOps AWS-native application, consolidating 3 vendor products into a single in-house solution and achieving $2M+ in annual savings.
  • Directed migration from Netcool to an in-house Event Collector framework, reducing the hardware footprint and saving $300K-$400K annually.
  • Developed SNMP trap and syslog receivers for diverse on-prem and cloud event sources.
  • Spearheaded creation of self-monitoring components and an AWS Health Dashboard to identify issues proactively.
  • Initiated AI/ML integration with AWS Bedrock for efficient alert summarization and root cause analysis.

Network Engineer III

Sprint
Reston, VA
06.2007 - 05.2018
  • As a member of the Network Operations and Systems group, I maintained network infrastructure monitoring tools, identified areas where monitoring and processes could be improved, and developed new functionality.
  • Fault Event Collection for Wireless Network: Member of a team responsible for designing and developing a new fault monitoring and automation system to manage Sprint wireless equipment. I developed a multithreaded Perl application to collect alarms from various vendors' equipment, normalize them, match them with inventory, and forward them to a NOC GUI for presentation. Built, tested, and deployed a highly available, modular application. Worked with Docker and Git repositories.
  • Redesign of Network Management System for Wireline Services: Led the effort to define, plan, and implement a Manager of Managers (MoM) system that supported multiple element management systems (security incident management, server alarm management, and network alarm management). The project included integrating the MoM with BMC Remedy using web services and with a homegrown trouble ticket system using Java Message Service (JMS). I designed and developed the Remedy workflow that implemented the trouble ticket integration. In addition, I integrated the Remedy AR Email Engine to send email notifications with ticket status information to over 2,000 users based on their subscriptions in the customer web portal. This project enabled Sprint to support additional customers and devices without hiring additional staff, saving $200K+ per year.
  • Customer Provisioning Tool using Remedy: Lead developer on an effort to streamline the customer provisioning process and its workflow. I developed a deployable application in Remedy: the GUI and part of the workflow were implemented within Remedy and published over the intranet using the Mid-Tier component, and the rest of the workflow was implemented using Remedy's Java API. This initiative saved the company $100K+ per year while ensuring that customer provisioning was completed accurately and consistently every time.
  • Security Event Management System: I was a member of the team that designed and implemented IBM TSOM/NeuSecure to support a Security Operations Center (SOC). I built correlation rules for logs coming from 300+ security devices, presenting the results as new tickets or as notes on existing trouble tickets. The resulting correlated alarms were integrated with the MoM system described above.

Sr. Consultant, Networks & Systems Management

International Network Services
Tysons Corner, VA
09.2004 - 06.2007
  • As a member of the Networks and Systems Management group, I worked on a number of consulting engagements developing and implementing ENMS changes to improve network security and/or fault management.
  • Sprint Managed Services: I assisted Sprint's Product Sales team in developing a plan to provide reporting using IBM/NeuSecure for Distributed Denial of Service (DDoS) mitigation devices. I identified the metrics required to manage DoS events and provided them as requirements to Cisco and Arbor. I implemented the necessary customizations within NeuSecure EAM to process DoS events and developed reports within NeuSecure/JReports.
  • Cogeco Cable, Set Top Box (STB) Management Assessment: Conducted an element-level network management assessment to provide monitoring capabilities for Motorola STBs.

Pr. Consultant, Enterprise Tech Mgmt

BTS Partners
02.2003 - 09.2004
  • As a member of the Enterprise Technology Management (ETM) Group, I performed enterprise management assessments and provided solutions for Network Operations groups in a variety of environments. Responsibilities included defining, designing, and developing system requirements and implementing solutions to detect and manage IT infrastructure problems.
  • Precision Response Corporation, Netcool Deployment: Deployed the Netcool suite (Precision, ObjectServer, Probes, and SSMS) to discover and manage a network of 200+ devices. I integrated the Netcool environment with Remedy to automate ticket creation, update, and deletion.
  • Verizon, Network/Systems Security Assessment: Provided analyses of security exposures on Verizon routers and servers. Performed penetration tests using Nessus and delivered security analysis reports documenting the vulnerabilities found. I implemented changes to close a limited number of the security holes identified in the assessment phase.
  • Fleet, OSPF Management: Designed and implemented NerveCenter (NC) models to monitor OSPF operations via SNMP for the Fleet network. The models improved network availability by monitoring and detecting performance degradation in OSPF operations.
  • Fleet, NerveCenter Failover: Advanced NC development to provide automatic failover of NC components, along with modules to monitor OSPF performance and detect OSPF faults.

Pr. Consultant, Network/Systems Mgmt

Predictive Systems
04.1999 - 02.2003
  • Participated as a project lead and/or team member in the design and implementation of Enterprise Network Management Systems (ENMS) at service provider and enterprise organizations. I also supported the sales team by assisting with pre-sales calls and developing Statements of Work.
  • Fannie Mae, Member of ENMS Team: Designed and deployed an ENMS using Netcool/OMNIbus and Netcool/Impact to migrate existing business processes from Tivoli. This deployment was less complex and significantly lowered the cost of deploying business process changes. Completed the design and implementation in less than 8 weeks, compared with 3-4 years for a similar Tivoli deployment. I implemented Impact policies to integrate the ENMS with the Remedy ticketing system.
  • MediaCenter, Technical Lead: Led a 5-member team responsible for rolling out a NOC to manage the optical network of MediaCenter, a managed service provider. I was the technical lead for the Veritas NerveCenter implementation and built NerveCenter behavior models to manage Cerent optical switches.
  • Global Tele-Systems (GTS), Technical Lead for the ENMS implementation: Led an 8-member team through the gap analysis, design, and implementation of an ENMS for the GTS intra-network. I developed NerveCenter models and implemented a failover Netcool architecture and Netcool firewall modules to manage 14 firewalls. I developed requirements for the trouble-ticketing and asset management systems to be implemented within Remedy.

Sr. Consultant, Professional Services

Hitachi Data Systems
11.1997 - 04.1999
  • Participated as a project lead and/or team member in the design and implementation of ENMS at multiple organizations. Designed and implemented network architectures, including LAN/WAN design and simulation of alternative architectures using the OPNET network simulation tool.
  • Designed and implemented HP Network Node Manager for UUNET Technologies to monitor a WAN.
  • Designed and implemented a LAN for Hitachi Data Systems, including requirements gathering, network design, and ENMS design. The network included Cisco routers (7507, 4500), dial-up access devices, network management (NNM, CiscoWorks), a Gauntlet firewall, a DNS server, T1 access through an ISP, and SNMP-manageable hubs.

Member of Technical Staff

Comsat Laboratories
Clarksburg, MD
04.1996 - 11.1997
  • Supported the research team at Comsat Laboratories by running simulations with OPNET Modeler to evaluate satellite network performance, contributing to analysis of protocol efficiency and network reliability in mission-critical scenarios.

Education

M.S. - Electrical Engineering

George Mason University
Fairfax, VA
08.1994 - 05.1996

Skills

Strategic planning

Operations management

Problem-solving

Python

AWS Services (ECS, Lambda, SNS/SQS)

Docker

TensorFlow

CI/CD

GitLab

Observability Systems

US Patents

  • Patent No: US 9922539 - System and Method of Telecommunication Network Infrastructure Alarms Queuing and Multi-Threading
  • Patent No: US 9584462 - Universal Email Failure Notification System
