Amazon DevOps Guru increases Operational Efficiency for 605

Post Syndicated from Mohit Gadkari original https://aws.amazon.com/blogs/devops/amazon-devops-guru-increases-operational-efficiency-for-605/

605 is an independent TV measurement firm that offers advertising and content measurement, full-funnel attribution, media planning, optimization, and analytical solutions, all on top of their multi-source viewership data set covering over 21 million U.S. households. 605 has built their technology solutions on AWS with dozens of accounts and tens of thousands of resources to monitor.

As 605 continues to innovate and build new solutions, the size and complexity of their AWS deployment have grown with it. Over time, managing that deployment has become an operational challenge for their current team. 605 has deployed different application performance monitoring (APM) tools and notification systems to help their observability staff scale and support their growing cloud environment. However, 605 realized that continued growth on the cloud would mean either increasing their observability staff or accepting some risk of application performance issues or even outages.

Amazon DevOps Guru allowed 605 to find a third path forward. Rather than accepting the trade-off of hiring more staff or assuming more risk, 605 found that DevOps Guru increases operational efficiency with their existing staff by applying artificial intelligence (AI) to supplement their existing APM and notification platform. After layering DevOps Guru into their DevOps environment, 605 realized a 4-fold decrease in the share of alerts and notifications that proved to be false positives. In fact, 605 went from an environment where 76.2% of their alerts and notifications were false positives to one with only 18.9% false positives, simply by adding Amazon DevOps Guru. In the end, 605 can manage their environment more effectively and efficiently with existing resources, freeing up DevOps brainpower to work on initiatives that are more strategically important than application management.
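The percentages above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the two false-positive figures quoted in this post. The function and constant names are illustrative and not part of any AWS tooling.

```python
# False-positive shares quoted in the post, in percent of all alerts.
FALSE_POSITIVE_BEFORE = 76.2  # before adding DevOps Guru
FALSE_POSITIVE_AFTER = 18.9   # after adding DevOps Guru


def legitimate_share(false_positive_pct: float) -> float:
    """Share of alerts that were legitimate: the complement of the false positives."""
    return round(100.0 - false_positive_pct, 1)


def fold_reduction(before_pct: float, after_pct: float) -> float:
    """How many times smaller the false-positive share became."""
    return before_pct / after_pct


if __name__ == "__main__":
    print("Legitimate before (%):", legitimate_share(FALSE_POSITIVE_BEFORE))
    print("Legitimate after (%):", legitimate_share(FALSE_POSITIVE_AFTER))
    print("Fold reduction:", round(fold_reduction(FALSE_POSITIVE_BEFORE, FALSE_POSITIVE_AFTER), 2))
```

The complement of 76.2% is the 23.8% legitimate-alert share cited for the earlier setup, and the ratio of the two false-positive shares is slightly above 4, which is where the "4-fold" figure comes from.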

“Amazon DevOps Guru has provided insights that help us focus our infrastructure roadmap. Our current SIEM tools require building out alerting ahead of time, while DevOps Guru is constantly evolving, which prevents becoming stagnant in our monitoring. Reducing the risk of false positive alerts has saved countless engineering hours.”

Jared Williams, VP of Infrastructure and Architecture, 605

Before DevOps Guru, 605 had Amazon CloudWatch and Amazon Elastic Kubernetes Service (Amazon EKS) configured with different application performance monitoring and notification systems, and only 23.8% of their alerts and notifications were legitimate. After integrating DevOps Guru, the share of legitimate alerts and notifications rose to 81% over a 6-month period.

Figure 1. 605 monitors 13+ AWS accounts, 20+ Amazon EKS clusters, 500+ pods, 15,000+ EC2 instances, 500+ S3 buckets, and 55+ Application Load Balancers with DevOps Guru.

Amazon DevOps Guru is a service powered by artificial intelligence (AI) that's designed to make it easy to improve an application's operational performance and availability. DevOps Guru helps detect behaviors that deviate from normal operating patterns so that you can identify operational issues long before they impact your applications. DevOps Guru uses machine learning (ML) models informed by years of Amazon.com and AWS operational excellence to identify anomalous application behavior (for example, increased latency, error rates, and resource constraints). It also helps surface critical issues that could cause outages or service disruptions. When DevOps Guru identifies a critical issue, it automatically sends an alert and provides a summary of related anomalies, the likely root cause, and context for when and where the issue occurred. When possible, DevOps Guru also provides recommendations on how to remediate the issue.

DevOps Guru ingests operational data from your AWS applications and provides a single dashboard to visualize issues in that data. DevOps Guru can be enabled for all of the resources in your AWS account, for resources in your AWS CloudFormation stacks, or for resources grouped together by AWS tags, with no manual setup or ML expertise required.
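As an illustration of scoping coverage by CloudFormation stack or by tag, here is a minimal Python sketch built around the boto3 `devops-guru` client and its `update_resource_collection` call. It is not 605's configuration. The helper names, the region, and the rule that exactly one scoping method is given are assumptions of this sketch, as is the check that the tag key begins with `devops-guru-`, which reflects our reading of the service's tag-coverage convention. Account-wide coverage is omitted.

```python
from typing import Any, Dict, List, Optional


def build_resource_collection(
    stack_names: Optional[List[str]] = None,
    tag_key: Optional[str] = None,
    tag_values: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a ResourceCollection payload for stacks or for a tag (hypothetical helper)."""
    # Simplification of this sketch: exactly one scoping method per call.
    if bool(stack_names) == bool(tag_key):
        raise ValueError("Provide either stack_names or tag_key, not both or neither")
    if stack_names:
        return {"CloudFormation": {"StackNames": list(stack_names)}}
    # Assumption: coverage tag keys must start with the "devops-guru-" prefix.
    if not tag_key.lower().startswith("devops-guru-"):
        raise ValueError("tag_key is expected to start with 'devops-guru-'")
    if not tag_values:
        raise ValueError("tag_values is required when scoping by tag")
    return {"Tags": [{"AppBoundaryKey": tag_key, "TagValues": list(tag_values)}]}


def enable_devops_guru(resource_collection: Dict[str, Any], region_name: str = "us-east-1") -> None:
    """Add the given resources to DevOps Guru coverage (needs AWS credentials)."""
    import boto3  # third-party dependency, imported lazily so the builder works without it

    client = boto3.client("devops-guru", region_name=region_name)
    client.update_resource_collection(Action="ADD", ResourceCollection=resource_collection)


if __name__ == "__main__":
    # Placeholder stack name; replace it with a stack from your own account.
    payload = build_resource_collection(stack_names=["example-app-stack"])
    print(payload)
    # enable_devops_guru(payload)  # uncomment to call AWS
```

Keeping the payload builder separate from the API call makes the scoping logic easy to review and test without touching an AWS account.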

The value of DevOps Guru for 605 goes beyond providing operational efficiency and avoiding the choice of adding DevOps resources or assuming more risk. DevOps Guru also discovered issues with application performance that their existing solutions weren’t trained to inspect.

These findings allowed 605 to avoid a potential problem that they otherwise wouldn't have known about. Because DevOps Guru is a managed service that requires no setup beyond enabling it and choosing resources to monitor, it can surface issues without any prior configuration.

In the end, the value of DevOps Guru for 605 surfaces in three ways. First, it increases operational efficiency by allowing their existing DevOps team to manage its AWS applications and resources more effectively, with room to grow along with their business needs. Second, DevOps Guru reduces operational fatigue and allows their DevOps teams to focus on more strategic issues by significantly reducing false positives. Lastly, DevOps Guru can find operational issues that existing APM tools may not be configured or able to detect.

Start monitoring your AWS applications with Amazon DevOps Guru today.

About the authors:

Mohit Gadkari

Mohit Gadkari is a Solutions Architect at Amazon Web Services (AWS) supporting SMB customers. He has been using AWS professionally since 2015, specializing in DevOps and cloud security, and currently uses this experience to help customers navigate the cloud.

Pauly Longani

Pauly is an Enterprise Support Lead at AWS, USA. He is a customer advocate and supports his customers in their cloud journey. He is passionate about the cloud and how it can be leveraged to overcome challenges across industry verticals.

Jared Williams

Jared, VP of Infrastructure and Architecture at 605, is in his 15th year managing or working on DevOps-focused teams. He has been involved with AWS since 2009. He manages the multi-team DevOps department at 605, where he has been for more than three years. Jared also co-founded a 24,000+ person DevOps community.