Tag Archives: Customer Solutions

Decreasing incident response time for OutSystems with AWS serverless technology

Post Syndicated from Ivo Pinto original https://aws.amazon.com/blogs/architecture/decreasing-incident-response-time-for-outsystems-with-aws-serverless-technology/

Leading modern application platform space OutSystems is a low-code platform that provides tools for companies to develop, deploy, and manage omnichannel enterprise applications.

Security is a top priority at OutSystems. Their Security Operations Center (SOC) deals with thousands of incidents a year, each with a set of response actions that need to be executed as quickly as possible. Providing security at such large scale is a challenge, even for the most well-prepared organizations. Manual and repetitive tasks account for the majority of the response time involved in this process, and decreasing this key metric requires orchestration and automation.

Security orchestration, automation, and response (SOAR) systems are designed to translate security analysts’ manual procedures into automated actions, making them faster and more scalable.

In this blog post, we’ll explore how OutSystems lowered their incident response time by 99 percent by designing and deploying a custom SOAR using Serverless services on AWS.

Solution architecture

Security incidents happen with unknown frequency, making serverless services a natural fit to boost security at OutSystems because of their increased agility and capability to scale to zero.

There are two ways to trigger SOAR actions in this architecture:

  1. Automatically through Security Information and Event Management (SIEM) security incident findings
  2. On-demand through chat application

Using the first method, when a security incident is detected by the SIEM, an event is published to Amazon Simple Notification Service (Amazon SNS). This triggers an AWS Lambda function that creates a ticket in an internal ticketing system. Then the Lambda Playbooks function triggers to decide which playbook to run depending on the incident details.

Each playbook is a set of actions that are executed in response to a trigger. Playbooks are the key component behind automated tasks. OutSystems uses AWS Step Functions to orchestrate the actions and Lambda functions to execute them.

But this solution does not exist in isolation. Depending on the playbook, Step Functions interacts with other components such as AWS Secrets Manager or external APIs.

Using the second method, the on-demand trigger for OutSystems SOAR relies on a chat application. This application calls a Lambda function URL that interacts with the playbooks we just discussed.

Figure 1 represents the high-level architecture of OutSystems’ custom SOAR.

SOAR architecture for AWS

Figure 1. SOAR architecture for AWS

This architecture was deployed with Infrastructure as Code (IaC) using AWS CloudFormation and AWS CodePipeline.

This same IaC architecture is used when new playbooks or updates to existing ones are made. Code changes that are committed to a source control repository trigger the CodePipeline which uses AWS CodeBuild and CloudFormation change sets to deploy the updates to the affected resources.

Use cases

The use cases that OutSystems has deployed playbooks for to date include:

  • SQL injection
  • Unauthorized access to credentials
  • Issuance of new certificates
  • Login brute forces
  • Impossible travel

Let’s explore the Impossible travel use case. Impossible travel happens when a user logs in from one location, and then later logs in from a different location that would be impossible to travel between within the elapsed time.

When the SIEM identifies this behavior, it triggers an alert and the following actions are performed:

  1. A ticket is created
  2. An IP address check is performed in reputation databases, such as AbuseIPDB or VirusTotal
  3. An IP address check is performed in the internal database, and the IP address is added if it is not found
  4. A search is performed for past events with the same IP address
  5. A WHOIS is performed on the IP address
  6. Recent logins of the user are identified in the SIEM, along with all related information
  7. All of this information is automatically added to the ticket. Every step listed here was previously performed manually; a task that took an average of 15 minutes. Now, the process takes just 8 seconds—a 99.1% incident response time improvement.

The following remediation actions can also be automated, along with many others:

Some of these remediation actions are already in place, while others are in development.


At OutSystems, much like at AWS, security is considered “job zero.” It is not only important to be proactive in preventing security incidents, but when they happen, the response must be quick, effective, and as immune to human error as possible.

With the implementation of this custom SOAR, OutSystems reduced the average response time to security incidents by 99%. Tasks that previously took 76 hours of analysts’ time are now accomplished automatically within 31 minutes.

During the evaluation period, SOAR addressed hundreds of real-world incidents with some threat intel use cases being executed thousands of times.

An architecture composed of serverless services ensures OutSystems does not pay for systems that are standing by waiting for work, and at the same time, not compromising on performance.

If you are interested in this topic—how to respond to security incidents using AWS serverless services—be sure you also read the Orchestrating a security incident response with AWS Step Functions and How to get started with security response automation on AWS blog posts.

Deliver Operational Insights to Atlassian Opsgenie using DevOps Guru

Post Syndicated from Brendan Jenkins original https://aws.amazon.com/blogs/devops/deliver-operational-insights-to-atlassian-opsgenie-using-devops-guru/

As organizations continue to grow and scale their applications, the need for teams to be able to quickly and autonomously detect anomalous operational behaviors becomes increasingly important. Amazon DevOps Guru offers a fully managed AIOps service that enables you to improve application availability and resolve operational issues quickly. DevOps Guru helps ease this process by leveraging machine learning (ML) powered recommendations to detect operational insights, identify the exhaustion of resources, and provide suggestions to remediate issues. Many organizations running business critical applications use different tools to be notified about anomalous events in real-time for the remediation of critical issues. Atlassian is a modern team collaboration and productivity software suite that helps teams organize, discuss, and complete shared work. You can deliver these insights in near-real time to DevOps teams by integrating DevOps Guru with Atlassian Opsgenie. Opsgenie is a modern incident management platform that receives alerts from your monitoring systems and custom applications and categorizes each alert based on importance and timing.

This blog post walks you through how to integrate Amazon DevOps Guru with Atlassian Opsgenie to
receive notifications for new operational insights detected by DevOps Guru with more flexibility and customization using Amazon EventBridge and AWS Lambda. The Lambda function will be used to demonstrate how to customize insights sent to Opsgenie.

Solution overview

Figure 1: Amazon EventBridge Integration with Opsgenie using AWS Lambda

Figure 1: Amazon EventBridge Integration with Opsgenie using AWS Lambda

Amazon DevOps Guru directly integrates with Amazon EventBridge to notify you of events relating to generated insights and updates to insights. To begin routing these notifications to Opsgenie, you can configure routing rules to determine where to send notifications. As outlined below, you can also use pre-defined DevOps Guru patterns to only send notifications or trigger actions that match that pattern. You can select any of the following pre-defined patterns to filter events to trigger actions in a supported AWS resource. Here are the following predefined patterns supported by DevOps Guru:

  • DevOps Guru New Insight Open
  • DevOps Guru New Anomaly Association
  • DevOps Guru Insight Severity Upgraded
  • DevOps Guru New Recommendation Created
  • DevOps Guru Insight Closed

By default, the patterns referenced above are enabled so we will leave all patterns operational in this implementation.  However, you do have flexibility to change which of these patterns to choose to send to Opsgenie. When EventBridge receives an event, the EventBridge rule matches incoming events and sends it to a target, such as AWS Lambda, to process and send the insight to Opsgenie.


The following prerequisites are required for this walkthrough:

Push Insights using Amazon EventBridge & AWS Lambda

In this tutorial, you will perform the following steps:

  1. Create an Opsgenie integration
  2. Launch the SAM template to deploy the solution
  3. Test the solution

Create an Opsgenie integration

In this step, you will navigate to Opsgenie to create the integration with DevOps Guru and to obtain the API key and team name within your account. These parameters will be used as inputs in a later section of this blog.

  1. Navigate to Teams, and take note of the team name you have as shown below, as you will need this parameter in a later section.
Figure 2: Opsgenie team names

Figure 2: Opsgenie team names

  1. Click on the team to proceed and navigate to Integrations on the left-hand pane. Click on Add Integration and select the Amazon DevOps Guru option.
Figure 3: Integration option for DevOps Guru

Figure 3: Integration option for DevOps Guru

  1. Now, scroll down and take note of the API Key for this integration and copy it to your notes as it will be needed in a later section. Click Save Integration at the bottom of the page to proceed.


 Figure 4: API Key for DevOps Guru Integration

Figure 4: API Key for DevOps Guru Integration

  1. Now, the Opsgenie integration has been created and we’ve obtained the API key and team name. The email of any team member will be used in the next section as well.

Review & launch the AWS SAM template to deploy the solution

In this step, you will review & launch the SAM template. The template will deploy an AWS Lambda function that is triggered by an Amazon EventBridge rule when Amazon DevOps Guru generates a new event. The Lambda function will retrieve the parameters obtained from the deployment and pushes the events to Opsgenie via an API.

Reviewing the template

Below is the SAM template that will be deployed in the next step. This template launches a few key components specified earlier in the blog. The Transform section of the template allows us takes an entire template written in the AWS Serverless Application Model (AWS SAM) syntax and transforms and expands it into a compliant CloudFormation template. Under the Resources section this solution will deploy an AWS Lamba function using the Java runtime as well as an Amazon EventBridge Rule/Pattern. Another key aspect of the template are the Parameters. As shown below, the ApiKey, Email, and TeamName are parameters we will use for this CloudFormation template which will then be used as environment variables for our Lambda function to pass to OpsGenie.

Figure 5: Review of SAM Template

Figure 5: Review of SAM Template

Launching the Template

  1. Navigate to the directory of choice within a terminal and clone the GitHub repository with the following command:
  1. Change directories with the command below to navigate to the directory of the SAM template.
cd amazon-devops-guru-connector-opsgenie/OpsGenieServerlessTemplate
  1. From the CLI, use the AWS SAM to build and process your AWS SAM template file, application code, and any applicable language-specific files and dependencies.
sam build
  1. From the CLI, use the AWS SAM to deploy the AWS resources for the pattern as specified in the template.yml file.
sam deploy --guided
  1. You will now be prompted to enter the following information below. Use the information obtained from the previous section to enter the Parameter ApiKey, Parameter Email, and Parameter TeamName fields.
  •  Stack Name
  • AWS Region
  • Parameter ApiKey
  • Parameter Email
  • Parameter TeamName
  • Allow SAM CLI IAM Role Creation

Test the solution

  1. Follow this blog to enable DevOps Guru and generate an operational insight.
  2. When DevOps Guru detects a new insight, it will generate an event in EventBridge. EventBridge then triggers Lambda and sends the event to Opsgenie as shown below.
Figure 6: Event Published to Opsgenie with details such as the source, alert type, insight type, and a URL to the insight in the AWS console.

Figure 6: Event Published to Opsgenie with details such as the source, alert type, insight type, and a URL to the insight in the AWS console.enecccdgruicnuelinbbbigebgtfcgdjknrjnjfglclt

Cleaning up

To avoid incurring future charges, delete the resources.

  1. Delete resources deployed from this blog.
  2. From the command line, use AWS SAM to delete the serverless application along with its dependencies.
sam delete

Customizing Insights published using Amazon EventBridge & AWS Lambda

The foundation of the DevOps Guru and Opsgenie integration is based on Amazon EventBridge and AWS Lambda which allows you the flexibility to implement several customizations. An example of this would be the ability to generate an Opsgenie alert when a DevOps Guru insight severity is high. Another example would be the ability to forward appropriate notifications to the AIOps team when there is a serverless-related resource issue or forwarding a database-related resource issue to your DBA team. This section will walk you through how these customizations can be done.

EventBridge customization

EventBridge rules can be used to select specific events by using event patterns. As detailed below, you can trigger the lambda function only if a new insight is opened and the severity is high. The advantage of this kind of customization is that the Lambda function will only be invoked when needed.

  "source": [
  "detail-type": [
    "DevOps Guru New Insight Open"
  "detail": {
    "insightSeverity": [

Applying EventBridge customization

  1. Open the file template.yaml reviewed in the previous section and implement the changes as highlighted below under the Events section within resources (original file on the left, changes on the right hand side).
Figure 7: CloudFormation template file changed so that the EventBridge rule is only triggered when the alert type is "DevOps Guru New Insight Open" and insightSeverity is “high”.

Figure 7: CloudFormation template file changed so that the EventBridge rule is only triggered when the alert type is “DevOps Guru New Insight Open” and insightSeverity is “high”.

  1. Save the changes and use the following command to apply the changes
sam deploy --template-file template.yaml
  1. Accept the changeset deployment

Determining the Ops team based on the resource type

Another customization would be to change the Lambda code to route and control how alerts will be managed.  Let’s say you want to get your DBA team involved whenever DevOps Guru raises an insight related to an Amazon RDS resource. You can change the AlertType Java class as follows:

  1. To begin this customization of the Lambda code, the following changes need to be made within the AlertType.java file:
  • At the beginning of the file, the standard java.util.List and java.util.ArrayList packages were imported
  • Line 60: created a list of CloudWatch metrics namespaces
  • Line 74: Assigned the dataIdentifiers JsonNode to the variable dataIdentifiersNode
  • Line 75: Assigned the namespace JsonNode to a variable namespaceNode
  • Line 77: Added the namespace to the list for each DevOps Insight which is always raised as an EventBridge event with the structure detail►anomalies►0►sourceDetails►0►dataIdentifiers►namespace
  • Line 88: Assigned the default responder team to the variable defaultResponderTeam
  • Line 89: Created the list of responders and assigned it to the variable respondersTeam
  • Line 92: Check if there is at least one AWS/RDS namespace
  • Line 93: Assigned the DBAOps_Team to the variable dbaopsTeam
  • Line 93: Included the DBAOps_Team team as part of the responders list
  • Line 97: Set the OpsGenie request teams to be the responders list
Figure 8: java.util.List and java.util.ArrayList packages were imported

Figure 8: java.util.List and java.util.ArrayList packages were imported


Figure 9: AlertType Java class customized to include DBAOps_Team for RDS-related DevOps Guru insights.

Figure 9: AlertType Java class customized to include DBAOps_Team for RDS-related DevOps Guru insights.


  1. You then need to generate the jar file by using the mvn clean package command.
  • The function needs to be updated with:
    • FUNCTION_NAME=$(aws lambda
      list-functions –query ‘Functions[?contains(FunctionName, `DevOps-Guru`) ==
      `true`].FunctionName’ –output text)
    • aws lambda update-function-code –region
      us-east-1 –function-name $FUNCTION_NAME –zip-file fileb://target/Functions-1.0.jar
  1. As result, the DBAOps_Team will be assigned to the Opsgenie alert in the case a DevOps Guru Insight is related to RDS.
Figure 10: Opsgenie alert assigned to both DBAOps_Team and AIOps_Team.

Figure 10: Opsgenie alert assigned to both DBAOps_Team and AIOps_Team.


In this post, you learned how Amazon DevOps Guru integrates with Amazon EventBridge and publishes insights to Opsgenie using AWS Lambda. By creating an Opsgenie integration with DevOps Guru, you can now leverage Opsgenie strengths, incident management, team communication, and collaboration when responding to an insight. All of the insight data can be viewed and addressed in Opsgenie’s Incident Command Center (ICC).  By customizing the data sent to Opsgenie via Lambda, you can empower your organization even more by fine tuning and displaying the most relevant data thus decreasing the MTTR (mean time to resolve) of the responding operations team.

About the authors:

Brendan Jenkins

Brendan Jenkins is a solutions architect working with Enterprise AWS customers providing them with technical guidance and helping achieve their business goals. He has an area of interest around DevOps and Machine Learning technology. He enjoys building solutions for customers whenever he can in his spare time.

Pablo Silva

Pablo Silva is a Sr. DevOps consultant that guide customers in their decisions on technology strategy, business model, operating model, technical architecture, and investments.

He holds a master’s degree in Artificial Intelligence and has more than 10 years of experience with telecommunication and financial companies.

Joseph Simon

Joseph Simon is a solutions architect working with mid to large Enterprise AWS customers. He has been in technology for 13 years with 5 of those centered around DevOps. He has a passion for Cloud, DevOps and Automation and in his spare time, likes to travel and spend time with his family.

Diligent enhances customer governance with automated data-driven insights using Amazon QuickSight

Post Syndicated from Vidya Kotamraju original https://aws.amazon.com/blogs/big-data/diligent-enhances-customer-governance-with-automated-data-driven-insights-using-amazon-quicksight/

This post is co-written with Vidya Kotamraju and Tallis Hobbs, from Diligent.

Diligent is the global leader in modern governance, providing software as a service (SaaS) services across governance, risk, compliance, and audit, helping companies meet their environmental, social, and governance (ESG) commitments. Serving more than 1 million users from over 25,000 customers around the world, we empower transformational leaders with software, insights, and confidence to drive greater impact and lead with purpose.

We provide the right governance technology that empowers our customers to act strategically while maintaining compliance, mitigating risk, and driving efficiency. With the Diligent Platform, organizations can bring their most critical data into one centralized place. By using powerful analytics, automation, and unparalleled industry data, our customers’ board and c-suite get relevant insights from across risk, compliance, audit, and ESG teams that help them make better decisions, faster and more securely.

One of the biggest obstacles that customers face is obtaining a holistic view of their data. To effectively manage risks and ensure compliance, organizations need to have a comprehensive understanding of their operations and processes. However, this can be difficult to achieve. Scenarios such as data being dispersed across multiple systems and departments, or if data is not consistently collected and updated, or if data is not in a format that can be easily analyzed can all present various challenges. To address them, we turned to Amazon QuickSight to enhance our customer-facing products with embedded insights and reports.

In this post, we cover what we were looking for in a business intelligence (BI) tool, and how QuickSight met our requirements.

Narrowing down the competition

To effectively serve our customers, we needed a platform-wide reporting solution that would enable our users to centralize their governance, risk, and compliance (GRC) programs, and collect information from disparate data sources, while allowing for integrated automation and analytics for data-driven insights, thereby providing a curated picture of GRC with confidence.

When we started our research into various BI tool offerings to embed into our platform, we narrowed the list down to a handful that had most of the capabilities we were looking for. After reviewing the options, QuickSight was our top option when it came to the ease of integration with our existing AWS-built ecosystem. QuickSight offered everything we needed, with the flexibility we wanted, at an affordable price.

Why we chose QuickSight

There are many data points that can be useful for making business decisions; the specific data points that are most critical will depend on the nature of the business and the decisions being made. However, there are some common types of data that are often important for making informed business decisions: financial data, marketing data, operational data, and customer data.

Translating those requirements into BI tool functionality, we were looking for:

  • A seamless way to obtain a holistic and unified view of data
  • The ability to handle substantial amounts (over 100 TB) of data
  • Enough flexibility to support the changing needs of our solution as we grow
  • Great value for the price

QuickSight checked all the boxes on our list. The most compelling reasons why we ultimately chose QuickSight were:

  • Visualization and reporting capabilities – QuickSight offers a wide range of visualization options and allows creation of custom reports and dashboards
  • Data sources – QuickSight supports a wide variety of data sources, making connection and analysis easy
  • Ease of integration – QuickSight fit seamlessly with our existing AWS technology stack with a price that fits our budget

Comprehensive, personalized, customer-facing reporting platform

Today, we’re using Quicksight to create a customer-facing reporting platform that allows our customers to report on their data within our ecosystem. QuickSight helps empower our customers by putting the reporting tools and capability in their hands, allowing them to get a comprehensive, personalized (via row-level security) view of data, unique to their workflow.

The following screenshot shows an example of our Issues & Actions dashboard, designed for risk managers and audit managers, showing various issues in need of attention.

QuickSight has provided a way to enable our customers to bring data and intelligence to the board or leadership teams in a simple, more streamlined way that saves time and effort—by automating standard reporting and surfacing it in a rich and interactive dashboard for directors. Boards and leaders will have access to curated insights, culled from both internal operations and external sources, integrated into the Diligent Boards platform—visualized in such a way that their data tells the story that accompanies the board materials.

For us, the most compelling benefit of using QuickSight is the ease of integration with Diligent’s existing tech stack and data stack. Quicksight integrates seamlessly with other AWS products in our technology stack, making it easy to incorporate data from various sources and systems into our dashboards and reports.

QuickSight was the perfect fit

Our customers love the flexibility with reporting. Quicksight provides a range of visualization options that allows users to customize their dashboards and reports to fit their specific needs and preferences. We love that the QuickSight team is open to taking prompt action on customer feedback. Their continuous and frequent feature release process is confidence-inspiring.

QuickSight helps us provide flexibility to our customers, enabling them to quickly put the right data in front of the right audience to make the right business decisions.

To learn more, visit Amazon QuickSight.

About the Authors

Vidya Kotamraju is a Product Management Leader at Diligent, with close to 2 decades of experience leading award-winning B2B, B2C product and team success across multiple industries and geographies. Currently, she is focused on Diligent Highbond’s Data Automation Solutions.

Tallis Hobbs is a Senior Software Engineer at Diligent. As a previous educator, he brings a unique skill set to the engineering space. He is passionate about the AWS serverless space and currently works on Diligent’s client facing Quicksight integration.

Samit Kumbhani is a Sr. Solutions Architect at AWS based out of New York City area. Has has 18+ years of experience in building applications and focuses on Analytics, Business Intelligence and Databases. He enjoys working with customers to understand their challenges and solve them by creating innovative solutions using AWS services. Outside of work, Samit loves playing cricket, traveling and spending time with his family and friends.

Adopt Recommendations and Monitor Predictive Scaling for Optimal Compute Capacity

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/evaluating-predictive-scaling-for-amazon-ec2-capacity-optimization/

This post is written by Ankur Sethi, Sr. Product Manager, EC2, and Kinnar Sen, Sr. Specialist Solution Architect, AWS Compute.

Amazon EC2 Auto Scaling helps customers optimize their Amazon EC2 capacity by dynamically responding to varying demand. Based on customer feedback, we enhanced the scaling experience with the launch of predictive scaling policies. Predictive scaling proactively adds EC2 instances to your Auto Scaling group in anticipation of demand spikes. This results in better availability and performance for your applications that have predictable demand patterns and long initialization times. We recently launched a couple of features designed to help you assess the value of predictive scaling – prescriptive recommendations on whether to use predictive scaling based on its potential availability and cost impact, and integration with Amazon CloudWatch to continuously monitor the accuracy of predictions. In this post, we discuss the features in detail and the steps that you can easily adopt to enjoy the benefits of predictive scaling.

Recap: Predictive Scaling

EC2 Auto Scaling helps customers maintain application availability by managing the capacity and health of the underlying cluster. Prior to predictive scaling, EC2 Auto Scaling offered dynamic scaling policies such as target tracking and step scaling. These dynamic scaling policies are configured with an Amazon CloudWatch metric that represents an application’s load. EC2 Auto Scaling constantly monitors this metric and responds according to your policies, thereby triggering the launch or termination of instances. Although it’s extremely effective and widely used, this model is reactive in nature, and for larger spikes, may lead to unfulfilled capacity momentarily as the cluster is scaling out. Customers mitigate this by adopting aggressive scale out and conservative scale in to manage the additional buffer of instances. However, sometimes applications take a long time to initialize or have a recurring pattern with a sudden spike of high demand. These can have an impact on the initial response of the system when it is scaling out. Customers asked for a proactive scaling mechanism that can scale capacity ahead of predictable spikes, and so we delivered predictive scaling.

Predictive scaling was launched to make the scaling action proactive as it anticipates the changes required in the compute demand and scales accordingly. The scaling action is determined by ensemble machine learning (ML) built with data from your Auto Scaling group’s scaling patterns, as well as billions of data points from our observations. Predictive scaling should be used for applications where demand changes rapidly but with a recurring pattern, instances require a long time to initialize, or where you’re manually invoking scheduled scaling for routine demand patterns. Predictive scaling not only forecasts capacity requirements based on historical usage, but also learns continuously, thereby making forecasts more accurate with time. Furthermore, predictive scaling policy is designed to only scale out and not scale in your Auto Scaling groups, eliminating the risk of ending with lesser capacity because of inexact predictions. You must use dynamic scaling policy, scheduled scaling, or your own custom mechanism for scale-ins. In case of exceptional demand spikes, this addition of dynamic scaling policy can also improve your application performance by bridging the gap between demand and predicted capacity.

What’s new with predictive scaling

Predictive scaling policies can be configured in a non-mutative ‘Forecast Only’ mode to evaluate the accuracy of forecasts. When you’re ready to start scaling, you can switch to the ‘Forecast and Scale’ mode. Now we prescriptively recommend whether your policy should be switched to ‘Forecast and Scale’ mode if it can potentially lead to better availability and lower costs, saving you the time and effort of doing such an evaluation manually. You can test different configurations by creating multiple predictive scaling policies in ‘Forecast Only’ mode, and choose the one that performs best in terms of availability and cost improvements.

Monitoring and observability are key elements of AWS Well Architected Framework. Now we also offer CloudWatch metrics for your predictive scaling policies so that that you can programmatically monitor your predictive scaling policy for demand pattern changes or prolonged periods of inaccurate predictions. This will enable you to monitor the key performance metrics and make it easier to adopt AWS Well-Architected best practices.

In the following sections, we deep dive into the details of these two features.

Recommendations for predictive scaling

Once you set up an Auto Scaling group with predictive scaling policy in Forecast Only mode as explained in this introduction to predictive scaling blog post , you can review the results of the forecast visually and adjust any parameters to more accurately reflect the behavior that you desire. Evaluating simply on the basis of visualization may not be very intuitive if the scaling patterns are erratic. Moreover, if you keep higher minimum capacities, then the graph may show a flat line for the actual capacity as your Auto Scaling group capacity is an outcome of existing scaling policy configurations and the minimum capacity that you configured. This makes it difficult to contemplate whether the lower capacity predicted by predictive scaling wouldn’t leave your Auto Scaling group under-scaled.

This new feature provides a prescriptive guidance to switch on predictive scaling in Forecast and Scale mode based on the factors of availability and cost savings. To determine the availability and cost savings, we compare the predictions against the actual capacity and the optimal, required capacity. This required capacity is inferred based on whether your instances were running at a higher or lower value than the target value for scaling metric that you defined as part of the predictive scaling policy configuration. For example, if an Auto Scaling group is running 10 instances at 20% CPU Utilization while the target defined in predictive scaling policy is 40%, then the instances are running under-utilized by 50% and the required capacity is assumed to be 5 instances (half of your current capacity). For an Auto Scaling group, based on the time range in which you’re interested (two weeks as default), we aggregate the cost saving and availability impact of predictive scaling. The availability impact measures for the amount of time that the actual metric value was higher than the target value that you defined to be optimal for each policy. Similarly, cost savings measures the aggregated savings based on the capacity utilization of the underlying Auto Scaling group for each defined policy. The final cost and availability will lead us to a recommendation based on:

  • If availability increases (or remains same) and cost reduces (or remains same), then switch on Forecast and Scale
  • If availability reduces, then disable predictive scaling
  • If availability increase comes at an increased cost, then the customer should take the call based on their cost-availability tradeoff threshold

This figure shows the console view of how the recommendations look like on the Auto Scaling console. For each policy we make prescriptive recommendation of whether to switch to Forecast And Scale mode along with whether doing so can lead to better availability and lower costFigure 1: Predictive Scaling Recommendations on EC2 Auto Scaling console

The preceding figure shows how the console reflects the recommendation for a predictive scaling policy. You get information on whether the policy can lead to higher availability and lower cost, which leads to a recommendation to switch to Forecast and Scale. To achieve this cost saving, you might have to lower your minimum capacity and aim for higher utilization in dynamic scaling policies.

To get the most value from this feature, we recommend that you create multiple predictive scaling policies in Forecast Only mode with different configurations, choosing different metrics and/or different target values. Target value is an important lever that changes how aggressive the capacity forecasts must be. A lower target value increases your capacity forecast resulting in better availability for your application. However, this also means more dollars to be spent on the Amazon EC2 cost. Similarly, a higher target value can leave you under-scaled while reactive scaling bridges the gap in just a few minutes. Separate estimates of cost and availability impact are provided for each of the predictive scaling policies. We recommend using a policy if either availability or cost are improved and the other variable improves or stays the same. As long as there is a predictable pattern, Auto Scaling enhanced with predictive scaling maintains high availability for your applications.

Continuous Monitoring of predictive scaling

Once you’re using a predictive scaling policy in Forecast and Scale mode based on the recommendation, you must monitor the predictive scaling policy for demand pattern changes or inaccurate predictions. We introduced two new CloudWatch Metrics for predictive scaling called ‘PredictiveScalingLoadForecast’ and ‘PredictiveScalingCapacityForecast’. Using CloudWatch mertic math feature, you can create a customized metric that measures the accuracy of predictions. For example, to monitor whether your policy is over or under-forecasting, you can publish separate metrics to measure the respective errors. In the following graphic, we show how the metric math expressions can be used to create a Mean Absolute Error for over-forecasting on the load forecasts. Because predictive scaling can only increase capacity, it is useful to alert when the policy is excessively over-forecasting to prevent unnecessary cost.This figure shows the CloudWatch graph of three metrics – the total CPU Utilization of the Auto Scaling group, the load forecast generated by predictive scaling, and the derived metric using metric math that measures error for over-forecastingFigure 2: Graphing an accuracy metric using metric math on CloudWatch

In the previous graph, the total CPU Utilization of the Auto Scaling group is represented by m1 metric in orange color while the predicted load by the policy is represented by m2 metric in green color. We used the following expression to get the ratio of over-forecasting error with respect to the actual value.

IF((m2-m1)>0, (m2-m1),0))/m1

Next, we will setup an alarm to automatically send notifications using Amazon Simple Notification Service (Amazon SNS). You can create similar accuracy monitoring for capacity forecasts, but remember that once the policy is in Forecast and Scale mode, it already starts influencing the actual capacity. Hence, putting alarms on load forecast accuracy might be more intuitive as load is generally independent of the capacity of an Auto Scaling group.

This figure shows creation of alarm when 10 out of 12 data points breach 0.02 threshold for the accuracy metricFigure 3: Creating a CloudWatch Alarm on the accuracy metric

In the above screenshot, we have set an alarm that triggers when our custom accuracy metric goes above 0.02 (20%) for 10 out of last 12 data points which translates to 10 hours of the last 12 hours. We prefer to alarm on a greater number of data points so that we get notified only when predictive scaling is consistently giving inaccurate results.


With these new features, you can make a more informed decision about whether predictive scaling is right for you and which configuration makes the most sense. We recommend that you start off with Forecast Only mode and switch over to Forecast and Scale based on the recommendations. Once in Forecast and Scale mode, predictive scaling starts taking proactive scaling actions so that your instances are launched and ready to contribute to the workload in advance of the predicted demand. Then continuously monitor the forecast to maintain high availability and cost optimization of your applications. You can also use the new predictive scaling metrics and CloudWatch features, such as metric math, alarms, and notifications, to monitor and take actions when predictions are off by a set threshold for prolonged periods.

How SikSin improved customer engagement with AWS Data Lab and Amazon Personalize

Post Syndicated from Byungjun Choi original https://aws.amazon.com/blogs/big-data/how-siksin-improved-customer-engagement-with-aws-data-lab-and-amazon-personalize/

This post is co-written with Byungjun Choi and Sangha Yang from SikSin.

SikSin is a technology platform connecting customers with restaurant partners serving their multiple needs. Customers use the SikSin platform to search and discover restaurants, read and write reviews, and view photos. From the restaurateurs’ perspective, SikSin enables restaurant partners to engage and acquire customers in order to grow their business. SikSin has a partnership with 850 corporate companies and more than 50,000 restaurants. They issue restaurant e-vouchers to more than 220,000 members, including individuals as well as corporate members. The SikSin platform receives more than 3 million users in a month. SikSin was listed in the top 100 of the Financial Times’s Asia-Pacific region’s high-growth companies in 2022.

SikSin was looking to deliver improved customer experiences and increase customer engagement. SikSin confronted two business challenges:

  • Customer engagement – SikSin maintains data on more than 750,000 restaurants and has more than 4,000 restaurant articles (and growing). SikSin was looking for a personalized and customized approach to provide restaurant recommendations for their customers and get them engaged with the content, thereby providing a personalized customer experience.
  • Data analysis activities – The SikSin Food Service team experienced difficulties in regards to report generation due to scattered data across multiple systems. The team previously had to submit a request to the IT team and then wait for answers that might be outdated. For the IT team, they needed to manually pull data out of files, databases, and applications, and then combine them upon every request, which is a time-consuming activity. The SikSin Food Service team wanted to view web analytics log data by multiple dimensions, such as customer profiles and places. Examples include page view, conversion rate, and channels.

To overcome these two challenges, SikSin participated in the AWS Data Lab program to assist them in building a prototype solution. The AWS Data Lab offers accelerated, joint-engineering engagements between customers and AWS technical resources to create tangible deliverables that accelerate data and analytics modernization initiatives. The Build Lab is a 2–5-day intensive build with a technical customer team.

In this post, we share how SikSin built the basis for accelerating their data project with the help of the Data Lab and Amazon Personalize.

Use cases

The Data Lab team and SikSin team had three consecutive meetings to discuss business and technical requirements, and decided to work on two uses cases to resolve their two business challenges:

  • Build personalized recommendations – SikSin wanted to deploy a machine learning (ML) model to produce personalized content on the landing page of the platform, particularly restaurants and restaurant articles. The success criteria was to increase the number of page views per session and membership subscription, reduce their bounce rate, and ultimately engage more visitors and members in SikSin’s contents.
  • Establish self-service analytics – SikSin’s business users wanted to reduce time to insight by making data more accessible while removing the reliance on the IT team by giving business users the ability to query data. The key was to consolidate web logs from BigQuery and operational business data from Amazon Relational Data Service (Amazon RDS) into a single place and analyze data whenever they need.

Solution overview

The following architecture depicts what the SikSin team built in the 4-day Build Lab. There are two parts in the solution to address SikSin’s business and technical requirements. The first part (1–8) is for building personalized recommendations, and the second part (A–D) is for establishing self-service analytics.

SikSin Solution Architecture

SikSin deployed an ML model to produce personalized content recommendations by using the following AWS services:

  1. AWS Database Migration Service (AWS DMS) helps migrate databases to AWS quickly and securely with minimal downtime. The SikSin team used AWS DMS to perform full load to bring data from the database tables into Amazon Simple Storage Service (Amazon S3) as a target. Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance. An AWS Glue crawler populates the AWS Glue Data Catalog with the data schema definitions (in a landing folder).
  2. An AWS Lambda function checks if any previous files still exist in the landing folder and archives the files into a backup folder, if any.
  3. AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, ML, and application development. The SikSin team created AWS Glue Spark extract, transform, and load (ETL) jobs to prepare input datasets for ML models. These datasets are used to train ML models in bulk mode. There are a total of five datasets for training and two datasets for batch inference jobs.
  4. Amazon Personalize allows developers to quickly build and deploy curated recommendations and intelligent user segmentation at scale using ML. Because Amazon Personalize can be tailored to your individual needs, you can deliver the right customer experience at the right time and in the right place. Also, users will select existing ML models (also known as recipes), train models, and run batch inference to make recommendations.
  5. An Amazon Personalize job predicts for each line of input data (restaurants and restaurant articles) and produces ML-generated recommendations in the designated S3 output folder. The recommendation records are surfaced using interaction data, product data, and predictive models. An AWS Glue crawler populates the AWS Glue Data Catalog with the data schema definitions (in an output folder).
  6. The SikSin team applied business logics and filters in an AWS Glue job to prepare the final datasets for recommendations.
  7. AWS Step Functions enables you to build scalable, distributed applications using state machines. The SikSin team used AWS Step Functions Workflow Studio to visually create, run, and debug workflow runs. This workflow is triggered based on a schedule. The process includes data ingestion, cleansing, processing, and all steps defined in Amazon Personalize. This also involves managing run dependencies, scheduling, error-catching, and concurrency in accordance with the logical flow of the pipeline.
  8. Amazon Simple Notification Service (Amazon SNS) sends notifications. The SikSin team used Amazon SNS to send a notification via email and Google Hangouts with a Lambda function as a target.

To establish a self-service analytics environment to enable business users to perform data analysis, SikSin used the following services:

  1. The Google BigQuery Connector for AWS Glue simplifies the process of connecting AWS Glue jobs to extract data from BigQuery. The SikSin team used the connector to extract web analytics logs from BigQuery and load them to an S3 bucket.
  2. AWS Glue DataBrew is a visual data preparation tool that makes it easy for data analysts and data scientists to clean and normalize data to prepare it for analytics and ML. You can choose from over 250 pre-built transformations to automate data preparation tasks, all without the need to write any code. The SikSin Food Service team used it to visually inspect large datasets and shape the data for their data analysis activities. An S3 bucket (in the intermediate folder) contains business operational data such as customers, places, articles, and products, and reference data loaded from AWS DMS and web analytics logs and data by AWS Glue jobs.
  3. An AWS Glue Python shell runs a job to cleanse and join data, and apply business rules to prepare the data for queries. The SikSin team used AWS SDK Pandas, an AWS Professional Service open-source Python initiative, which extends the power of the Pandas library to AWS, connecting DataFrames and AWS data related services. The output files are stored in an Apache Parquet format in a single folder. An AWS Glue crawler populates the data schema definitions (in an output folder) into the AWS Glue Data Catalog.
  4. The SikSin Food Service team used Amazon Athena and Amazon Quicksight to query and visualize the data analysis. Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. QuickSight is an ML-powered business intelligence service built for the cloud.

Business outcomes

The SikSin Food Service team is now able to access the available data for performing data analysis and manipulation operations efficiently, as well as for getting insights on their own. This immediately allows the team as well as other lines of business to understand how customers are interacting with SikSin’s contents and services on the platform and make decisions sooner. For example, with the data output, the Food Service team was able to provide insights and data points for their external stakeholder and customer to initiate a new business idea. Moreover, the team shared, “We anticipate the recommendations and personalized content will increase conversion rates and customer engagement.”

The AWS Data Lab enabled SikSin to review and assess thoroughly what data is actually usable and available. With SikSin’s objective to successfully build a data pipeline for data analytics purposes, the SikSin team came to realize the importance of data cleansing, categorization, and standardization. “Only fruitful analysis and recommendation are possible when data is intact and properly cleansed,” said Byungjun Choi (the Head of SikSin’s Food Service Team). After completing the Data Lab, SikSin completed and set up an internal process that can streamline the data cleansing pipeline.

SikSin was stuck in the research phase of looking for a solution to solve their personalization challenges. The AWS Data Lab enabled the SikSin IT Team to get hands-on with the technology and build a minimum viable product (MVP) to explore how Amazon Personalize would work in their environment with their data. They achieved this via the Data Lab by adopting AWS DMS, AWS Glue, Amazon Personalize, and Step Functions. “Though it is still the early stage of building a prototype, I am very confident with the right enablement provided from AWS that an effective recommendation system can be adopted on production level very soon,” commented Sangha Yang (the Head of SikSin IT Team).


As a result of the 4-day Build Lab, the SikSin team left with a working prototype that is custom fit to their needs, gaining a clear path forward for enabling end-users to gain valuable insights into its data. The Data Lab allowed the SikSin team to accelerate the architectural design and prototype build of this solution by months. Based on the lessons and learnings obtained from Data Lab, SikSin is planning to launch a Global News Content Platform equipped with a recommendation feature in FY23.

As demonstrated by SikSin’s achievements, Amazon Personalize allows developers to quickly build and deploy curated recommendations and intelligent user segmentation at scale using ML. Because Amazon Personalize can be tailored to your individual needs, you can deliver the right customer experience at the right time and in the right place. Whether you want to optimize recommendations, target customers more accurately, maximize your data’s value, or promote items using business rules.

To accelerate your digital transformation with ML, the Data Lab program is available to support you by providing prescriptive architectural guidance on a particular use case, sharing best practices, and removing technical roadblocks. You’ll leave the engagement with an architecture or working prototype that is custom fit to your needs, a path to production, and deeper knowledge of AWS services.

Please contact your AWS Account Manager or Solutions Architect to get started. If you don’t have an AWS Account Manager, please contact Sales.

About the Authors

bdb-2857-BJByungjun Choi is the Head of SikSin Food Service at SikSin.

bdb-2857-SHSangha Yang is the Head of IT team at SinSin.

bdb-2857-youngguYounggu Yun is a Senior Data Lab Architect at AWS. He works with customers around the APAC region to help them achieve business goals and solve technical problems by providing prescriptive architectural guidance, sharing best practices, and building innovative solutions together.

Junwoo Lee is an Account Manager at AWS. He provides technical and business support to help customer resolve their problems and enrich customer journey by introducing local and global programs for his customers.

bdb-2857-jinwooJinwoo Park is a Senior Solutions Architect at AWS. He provides technical support for AWS customers to succeed with their cloud journey. He helps customers build more secure, efficient, and cost-optimized architectures and solutions, and delivers best practices and workshops.

Managing Dev Environments with Amazon CodeCatalyst

Post Syndicated from Ryan Bachman original https://aws.amazon.com/blogs/devops/managing-dev-environments-with-amazon-codecatalyst/

An Amazon CodeCatalyst Dev Environment is a cloud-based development environment that you can use in CodeCatalyst to quickly work on the code stored in the source repositories of your project. The project tools and application libraries included in your Dev Environment are defined by a devfile in the source repository of your project.


In the previous CodeCatalyst post, Team Collaboration with Amazon CodeCatalyst, I focused on CodeCatalyst’s collaboration capabilities and how that related to The Unicorn Project’s main protaganist. At the beginning of Chapter 2, Maxine is struggling to configure her development environment. She is two days into her new job and still cannot build the application code. She has identified over 100 dependencies she is missing. The documentation is out of date and nobody seems to know where the dependencies are stored. I can sympathize with Maxine. In this post, I will focus on managing development environments to show how CodeCatalyst removes the burden of managing workload specific configurations and produces reliable on-demand development environments.


If you would like to follow along with this walkthrough, you will need to:

Have an AWS Builder ID for signing in to CodeCatalyst.

Belong to a space and have the space administrator role assigned to you in that space. For more information, see Creating a space in CodeCatalystManaging members of your space, and Space administrator role.

Have an AWS account associated with your space and have the IAM role in that account. For more information about the role and role policy, see Creating a CodeCatalyst service role.


As with the previous posts in our CodeCatalyst series, I am going to use the Modern Three-tier Web Application blueprint.  Blueprints provide sample code and CI/CD workflows to help make getting started easier across different combinations of programming languages and architectures. To follow along, you can re-use a project you created previously, or you can refer to a previous post that walks through creating a project using the blueprint.

One of the most difficult aspects of my time spent as a developer was finding ways to quickly contribute to a new project. Whenever I found myself working on a new project, getting to the point where I could meaningfully contribute to a project’s code base was always more difficult than writing the actual code. A major contributor to this inefficiency, was the lack of process managing my local development environment. I will be exploring how CodeCatalyst can help solve this challenge.  For this walkthrough, I want to add a new test that will allow local testing of Amazon DynamoDB. To achieve this, I will use a CodeCatalyst dev environment.

CodeCatalyst Dev Environments are managed cloud-based development environments that you can use to access and modify code stored in a source repository. You can launch a project specific dev environment that will automate check-out of your project’s repo or you can launch an empty environment to use for accessing third-party source providers.  You can learn more about CodeCatalyst Dev Environments in the CodeCatalyst User Guide.

CodeCatalyst user interface showing Create Dev Environment

Figure 1. Creating a new Dev Environment

To begin, I navigate to the Dev Environments page under the Code section of the navigaiton menu.  I then use the Create Dev Environment to launch my environment.  For this post, I am using the AWS Cloud9 IDE, but you can follow along with the IDE you are most comfortable using.  In the next screen, I select Work in New Branch and assign local_testing for the new branch name, and I am branching from main.  I leave the remaining default options and Create.

Create Dev Environment user interface with work in a new branch selected

Figure 2. Dev Environment Create Options

After waiting less than a minute, my IDE is ready in a new tab and I am ready to begin work.  The first thing I see in my dev environment is an information window asking me if I want to navigate to the Dev Environment Settings.  Because I need to enable local testing of Dynamodb, not only for myself, but other developers that will collaborate on this project, I need to update the project’s devfile.  I select to navigate to the settings tab because I know that contains information on the project’s devfile and allows me to access the file to edit.

AWS Toolkit prompting to Open Dev Environment Settings.

Figure 3. Toolkit Welcome Banner

Devfiles allow you to model a Dev Environment’s configuration and dependencies so that you can re-produce consisent Dev Environments and reduce the manual effort in setting up future environments.  The tools and application libraries included in your Dev Environment are defined by the devfile in the source repository of your project.  Since this project was created from a blueprint, there is one provided.  For blank projects, a default CodeCatalyst devfile is created when you first launch an environment.  To learn more about the devfile, see https://devfile.io.

In the settings tab, I find a link to the devfile that is configured.  When I click the edit button, a new file tab launches and I can now make changes.  I first add an env section to the container that hosts our dev environment.  By adding an environment variable and value, anytime a new dev environment is created from this project’s repository, that value will be included.  Next, I add a second container to the dev environment that will run DynamoDB locally.  I can do this by adding a new container component.  I use Amazon’s verified DynamoDB docker image for my environment. Attaching additional images allow you to extend the dev environment and include tools or services that can be made available locally.  My updates are highlighted in the green sections below.

Devfile.yaml with environment variable and DynamoDB container added

Figure 4. Example Devfile

I save my changes and navigate back to the Dev Environment Settings tab. I notice that my changes were automatically detected and I am prompted to restart my development environment for the changes to take effect.  Modifications to the devfile requires a restart. You can restart a dev environment using the toolkit, or from the CodeCatalyst UI.

AWS Toolkit prompt asking to restart the dev environment

Figure 5. Dev Environment Settings

After waiting a few seconds for my dev environment to restart, I am ready to write my test.  I use the IDE’s file explorer, expand the repo’s ./tests/unit folder, and create a new file named test_dynamodb.py.  Using the IS_LOCAL environment variable I configured in the devfile, I can include a conditional in my test that sets the endpoint that Amazon’s python SDK ( Boto3 ) will use to connect to the Dynamodb service.  This way, I can run tests locally before pushing my changes and still have tests complete successfully in my project’s workflow.  My full test file is included below.

Python unit test with local code added

Figure 6. Dynamodb test file

Now that I have completed my changes to the dev environment using the devfile and added a test, I am ready to run my test locally to verify.  I will use pytest to ensure the tests are passing before pushing any changes.  From the repo’s root folder, I run the command pip install -r requirements-dev.txt.  Once my dependencies are installed, I then issue the command pytest -k unit.  All tests pass as I expect.

Result of the pytest shown at the command line

Figure 7. Pytest test results

Rather than manually installing my development dependencies in each environment, I could also use the devfile to include commands and automate the execution of those commands during the dev environment lifecycle events.  You can refer to the links for commands and events for more information.

Finally, I am ready to push my changes back to my CodeCatalyst source repository.  I use the git extension of Cloud9 to review my changes.  After reviewing my changes are what I expect, I use the git extension to stage, commit, and push the new test file and the modified devfile so other collaborators can adopt the improvements I made.

Figure 8.  Changes reviewed in CodeCatalyst Cloud9 git extension.

Figure 8.  Changes reviewed in CodeCatalyst Cloud9 git extension.


If you have been following along with this workflow, you  should delete the resources you deployed so you do not continue to incur  charges. First, delete the two stacks that CDK deployed using the AWS CloudFormation console in the AWS account you associated when you launched the blueprint. These stacks will have names like mysfitsXXXXXWebStack and mysfitsXXXXXAppStack. Second, delete the project from CodeCatalyst by navigating to Project settings and choosing Delete project.


In this post, you learned how CodeCatalyst provides configurable on-demand dev environments.  You also learned how devfiles help you define a consistent experience for developing within a CodeCatalyst project.  Please follow our DevOps blog channel as I continue to explore how CodeCatalyst solve Maxine’s and other builders’ challenges.

About the author:

Ryan Bachman

Ryan Bachman is a Sr. Specialist Solutions Architect at AWS, and specializes in working with customers to improve their DevOps practices. Ryan has over 20 years of professional experience as a technologist, and has held roles in many different domains to include development, networking architecture, and technical product management. He is passionate about automation and helping customers increase software development productivity.

Journey to adopt Cloud-Native DevOps platform Series #2: Progressive delivery on Amazon EKS with Flagger and Gloo Edge Ingress Controller

Post Syndicated from Purna Sanyal original https://aws.amazon.com/blogs/devops/journey-to-adopt-cloud-native-devops-platform-series-2-progressive-delivery-on-amazon-eks-with-flagger-and-gloo-edge-ingress-controller/

In the last post, OfferUp modernized its DevOps platform with Amazon EKS and Flagger to accelerate time to market, we talked about hypergrowth and the technical challenges encountered by OfferUp in its existing DevOps platform. As a reminder, we presented how OfferUp modernized its DevOps platform with Amazon Elastic Kubernetes Service (Amazon EKS) and Flagger to gain developer’s velocity, automate faster deployment, and achieve lower cost of ownership.

In this post, we discuss the technical steps to build a DevOps platform that enables the progressive deployment of microservices on Amazon Managed Amazon EKS. Progressive delivery exposes a new version of the software incrementally to ingress traffic and continuously measures the success rate of the metrics before allowing all of the new traffics to a newer version of the software. Flagger is the Graduate project of Cloud Native Computing Foundations (CNCF) that enables progressive canary delivery, along with bule/green and A/B Testing, while measuring metrics like HTTP/gRPC request success rate and latency. Flagger shifts and routes traffic between app versions using a service mesh or an Ingress controller

We leverage Gloo Ingress Controller for traffic routing, Prometheus, Datadog, and Amazon CloudWatch for application metrics analysis and Slack to send notification. Flagger will post messages to slack when a deployment has been initialized, when a new revision has been detected, and if the canary analysis failed or succeeded.

Prerequisite steps to build the modern DevOps platform

You need an AWS Account and AWS Identity and Access Management (IAM) user to build the DevOps platform. If you don’t have an AWS account with Administrator access, then create one now by clicking here. Create an IAM user and assign admin role. You can build this platform in any AWS region however, I will you us-west-1 region throughout this post. You can use a laptop (Mac or Windows) or an Amazon Elastic Compute Cloud (AmazonEC2) instance as a client machine to install all of the necessary software to build the GitOps platform. For this post, I launched an Amazon EC2 instance (with Amazon Linux2 AMI) as the client and install all of the prerequisite software. You need the awscli, git, eksctl, kubectl, and helm applications to build the GitOps platform. Here are the prerequisite steps,

  1. Create a named profile(eks-devops)  with the config and credentials file:

aws configure --profile eks-devops

AWS Access Key ID [None]: xxxxxxxxxxxxxxxxxxxxxx

AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxx

Default region name [None]: us-west-1

Default output format [None]:

View and verify your current IAM profile:

export AWS_PROFILE=eks-devops

aws sts get-caller-identity

  1. If the Amazon EC2 instance doesn’t have git preinstalled, then install git in your Amazon EC2 instance:

sudo yum update -y

sudo yum install git -y

Check git version

git version

Git clone the repo and download all of the prerequisite software in the home directory.

git clone https://github.com/aws-samples/aws-gloo-flux.git

  1. Download all of the prerequisite software from install.sh which includes awscli, eksctl, kubectl, helm, and docker:

cd aws-gloo-flux/eks-flagger/

ls -lt

chmod 700 install.sh ecr-setup.sh

. install.sh

Check the version of the software installed:

aws --version

eksctl version

kubectl version -o json

helm version

docker --version

docker info

If the docker info shows an error like “permission denied”, then reboot the Amazon EC2 instance or re-log in to the instance again.

  1. Create an Amazon Elastic Container Repository (Amazon ECR) and push application images.

Amazon ECR is a fully-managed container registry that makes it easy for developers to share and deploy container images and artifacts. ecr setup.sh script will create a new Amazon ECR repository and also push the podinfo images (6.0.0, 6.0.1, 6.0.2, 6.1.0, 6.1.5 and 6.1.6) to the Amazon ECR. Run ecr-setup.sh script with the parameter, “ECR repository name” (e.g. ps-flagger-repository) and region (e.g. us-west-1)

./ecr-setup.sh <ps-flagger-repository> <us-west-1>

You’ll see output like the following (truncated).


Successfully created ECR repository and pushed podinfo images to ECR #

Please note down the ECR repository URI          


Technical steps to build the modern DevOps platform

This post shows you how to use the Gloo Edge ingress controller and Flagger to automate canary releases for progressive deployment on the Amazon EKS cluster. Flagger requires a Kubernetes cluster v1.16 or newer and Gloo Edge ingress 1.6.0 or newer. This post will provide a step-by-step approach to install the Amazon EKS cluster with managed node group, Gloo Edge ingress controller, and Flagger for Gloo in the Amazon EKS cluster. Now that the cluster, metrics infrastructure, and Flagger are installed, we can install the sample application itself. We’ll use the standard Podinfo application used in the Flagger project and the accompanying loadtester tool. The Flagger “podinfo” backend service will be called by Gloo’s “VirtualService”, which is the root routing object for the Gloo Gateway. A virtual service describes the set of routes to match for a set of domains. We’ll automate the canary promotion, with the new image of the “podinfo” service, from version 6.0.0 to version 6.0.1. We’ll also create a scenario by injecting an error for automated canary rollback while deploying version 6.0.2.

  1. Use myeks-cluster.yaml to create your Amazon EKS cluster with managed nodegroup. myeks-cluster.yaml deployment file has “cluster name” value as ps-eks-66, region value as us-west-1, availabilityZones as [us-west-1a, us-west-1b], Kubernetes version as 1.24, and nodegroup Amazon EC2 instance type as m5.2xlarge. You can change this value if you want to build the cluster in a separate region or availability zone.

eksctl create cluster -f myeks-cluster.yaml

Check the Amazon EKS Cluster details:

kubectl cluster-info

kubectl version -o json

kubectl get nodes -o wide

kubectl get pods -A -o wide

Deploy the Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

kubectl get deployment metrics-server -n kube-system

Update the kubeconfig file to interact with you cluster:

# aws eks update-kubeconfig --name <ekscluster-name> --region <AWS_REGION>

kubectl config view

cat $HOME/.kube/config

  1. Create a namespace “gloo-system” and Install Gloo with Helm Chart. Gloo Edge is an Envoy-based Kubernetes-native ingress controller to facilitate and secure application traffic.

helm repo add gloo https://storage.googleapis.com/solo-public-helm

kubectl create ns gloo-system

helm upgrade -i gloo gloo/gloo --namespace gloo-system

  1. Install Flagger and the Prometheus add-on in the same gloo-system namespace. Flagger is a Cloud Native Computing Foundation project and part of Flux family of GitOps tools.

helm repo add flagger https://flagger.app

helm upgrade -i flagger flagger/flagger \

--namespace gloo-system \

--set prometheus.install=true \

--set meshProvider=gloo

  1. [Optional] If you’re using Datadog as a monitoring tool, then deploy Datadog agents as a DaemonSet using the Datadog Helm chart. Replace RELEASE_NAME and DATADOG_API_KEY accordingly. If you aren’t using Datadog, then skip this step. For this post, we leverage the Prometheus open-source monitoring tool.

helm repo add datadog https://helm.datadoghq.com

helm repo update

helm install <RELEASE_NAME> \

    --set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog

Integrate Amazon EKS/ K8s Cluster with the Datadog Dashboard – go to the Datadog Console and add the Kubernetes integration.

  1. [Optional] If you’re using Slack communication tool and have admin access, then Flagger can be configured to send alerts to the Slack chat platform by integrating the Slack alerting system with Flagger. If you don’t have admin access in Slack, then skip this step.

helm upgrade -i flagger flagger/flagger \

--set slack.url=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK \

--set slack.channel=general \

--set slack.user=flagger \

--set clusterName=<my-cluster>

  1. Create a namespace “apps”, and applications and load testing service will be deployed into this namespace.

kubectl create ns apps

Create a deployment and a horizontal pod autoscaler for your custom application or service for which canary deployment will be done.

kubectl -n apps apply -k app

kubectl get deployment -A

kubectl get hpa -n apps

Deploy the load testing service to generate traffic during the canary analysis.

kubectl -n apps apply -k tester

kubectl get deployment -A

kubectl get svc -n apps

  1. Use apps-vs.yaml to create a Gloo virtual service definition that references a route table that will be generated by Flagger.

kubectl apply -f ./apps-vs.yaml

kubectl get vs -n apps

[Optional] If you have your own domain name, then open apps-vs.yaml in vi editor and replace podinfo.example.com with your own domain name to run the app in that domain.

  1. Use canary.yaml to create a canary custom resource. Review the service, analysis, and metrics sections of the canary.yaml file.

kubectl apply -f ./canary.yaml

After a couple of seconds, Flagger will create the canary objects. When the bootstrap finishes, Flagger will set the canary status to “Initialized”.

kubectl -n apps get canary podinfo


podinfo   Initialized   0        2023-xx-xxTxx:xx:xxZ

Gloo automatically creates an ELB. Once the load balancer is provisioned and health checks pass, we can find the sample application at the load balancer’s public address. Note down the ELB’s Public address:

kubectl get svc -n gloo-system --field-selector 'metadata.name==gateway-proxy'   -o=jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}{"\n"}'

Validate if your application is running, and you’ll see an output with version 6.0.0.

curl <load balancer’s public address> -H "Host:podinfo.example.com"

Trigger progressive deployments and monitor the status

You can Trigger a canary deployment by updating the application container image from 6.0.0 to 6.01.

kubectl -n apps set image deployment/podinfo  podinfod=<ECR URI>:6.0.1

Flagger detects that the deployment revision changed and starts a new rollout.

kubectl -n apps describe canary/podinfo

Monitor all canaries, as the promoted status condition can have one of the following statuses: initialized, Waiting, Progressing, Promoting, Finalizing, Succeeded, and Failed.

watch kubectl get canaries --all-namespaces

curl < load balancer’s public address> -H "Host:podinfo.example.com"

Once canary is completed, validate your application. You can see that the version of the application is changed from 6.0.0 to 6.0.1.


  "hostname": "podinfo-primary-658c9f9695-4pqbl",

  "version": "6.0.1",

  "revision": "",

  "color": "#34577c",

  "logo": "https://raw.githubusercontent.com/stefanprodan/podinfo/gh-pages/cuddle_clap.gif",

  "message": "greetings from podinfo v6.0.1",


[Optional] Open podinfo application from the laptop browser

Find out both of the IP addresses associated with load balancer.

dig < load balancer’s public address >

Open /etc/hosts file in the laptop and add both of the IPs of load balancer in the host file.

sudo vi /etc/hosts

<Public IP address of LB Target node> podinfo.example.com


xx.xx.xxx.xxx podinfo.example.com

xx.xx.xxx.xxx podinfo.example.com

Type “podinfo.example.com” in your browser and you’ll find the application in form similar to this:

Figure 1: Greetings from podinfo v6.0.1

Automated rollback

While doing the canary analysis, you’ll generate HTTP 500 errors and high latency to check if Flagger pauses and rolls back the faulted version. Flagger performs automatic Rollback in the case of failure.

Introduce another canary deployment with podinfo image version 6.0.2 and monitor the status of the canary.

kubectl -n apps set image deployment/podinfo podinfod=<ECR URI>:6.0.2

Run HTTP 500 errors or a high-latency error from a separate terminal window.

Generate HTTP 500 errors:

watch curl -H 'Host:podinfo.example.com' <load balancer’s public address>/status/500

Generate high latency:

watch curl -H 'Host:podinfo.example.com' < load balancer’s public address >/delay/2

When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary, the canary is scaled to zero, and the rollout is marked as failed.

kubectl get canaries --all-namespaces

kubectl -n apps describe canary/podinfo


When you’re done experimenting, you can delete all of the resources created during this series to avoid any additional charges. Let’s walk through deleting all of the resources used.

Delete Flagger resources and apps namespace
kubectl delete canary podinfo -n  apps

kubectl delete HorizontalPodAutoscaler podinfo -n apps

kubectl delete deployment podinfo -n   apps

helm -n gloo-system delete flagger

helm -n gloo-system delete gloo

kubectl delete namespace apps

Delete Amazon EKS Cluster
After you’ve finished with the cluster and nodes that you created for this tutorial, you should clean up by deleting the cluster and nodes with the following command:

eksctl delete cluster --name <cluster name> --region <region code>

Delete Amazon ECR

aws ecr delete-repository --repository-name ps-flagger-repository  --force


This post explained the process for setting up Amazon EKS cluster and how to leverage Flagger for progressive deployments along with Prometheus and Gloo Ingress Controller. You can enhance the deployments by integrating Flagger with Slack, Datadog, and webhook notifications for progressive deployments. Amazon EKS removes the undifferentiated heavy lifting of managing and updating the Kubernetes cluster. Managed node groups automate the provisioning and lifecycle management of worker nodes in an Amazon EKS cluster, which greatly simplifies operational activities such as new Kubernetes version deployments.

We encourage you to look into modernizing your DevOps platform from monolithic architecture to microservice-based architecture with Amazon EKS, and leverage Flagger with the right Ingress controller for secured and automated service releases.

Further Reading

Journey to adopt Cloud-Native DevOps platform Series #1: OfferUp modernized DevOps platform with Amazon EKS and Flagger to accelerate time to market

About the authors:

Purna Sanyal

Purna Sanyal is a technology enthusiast and an architect at AWS, helping digital native customers solve their business problems with successful adoption of cloud native architecture. He provides technical thought leadership, architecture guidance, and conducts PoCs to enable customers’ digital transformation. He is also passionate about building innovative solutions around Kubernetes, database, analytics, and machine learning.

A dive into redBus’s data platform and how they used Amazon QuickSight to accelerate business insights

Post Syndicated from Girish Kumar Chidananda original https://aws.amazon.com/blogs/big-data/a-dive-into-redbuss-data-platform-and-how-they-used-amazon-quicksight-to-accelerate-business-insights/

This post is co-authored with Girish Kumar Chidananda from redBus.

redBus is one of the earliest adopters of AWS in India, and most of its services and applications are hosted on the AWS Cloud. AWS provided redBus the flexibility to scale their infrastructure rapidly while keeping costs extremely low. AWS has a comprehensive suite of services to cater to most of their needs, including providing customer support that redBus can vouch for.

In this post, we share redBus’s data platform architecture, and how various components are connected to form their data highway. We also discuss the challenges redBus faced in building dashboards for their real-time business intelligence (BI) use cases, and how they used Amazon QuickSight, a fast, easy-to-use, cloud-powered business analytics service that makes it easy for all employees within redBus to build visualizations and perform ad hoc analysis to gain business insights from their data, any time, and on any device.

About redBus

redBus is the world’s largest online bus ticketing platform built in India and serving more than 36 million happy customers around the world. Along with its bus ticketing vertical, redBus also runs a rail ticketing service called redRails and a bus and car rental service called rYde. It is part of the GO-MMT group, which is India’s leading online travel company, with an extensive brand portfolio that includes other prominent online travel brands like MakeMyTrip and Goibibo.

redBus’s data highway 1.0

redBus relies heavily on making data-driven decisions at every level, from its traveler journey tracking, forecasting demand during high traffic, identifying and addressing bottlenecks in their bus operators signup process, and more. As redBus’s business started growing in terms of the number of cities and countries they operated in and the number of bus operators and travelers using the service in each city, the amount of incoming data also increased. The need to access and analyze the data in one place required them to build their own data platform, as shown in the following diagram.

redBus data platform 1.0

In the following sections, we look at each component in more detail.

Data ingestion sources

With the data platform 1.0, the data is ingested from various sources:

  • Real time – The real-time data flows from redBus mobile apps, the backend microservices, and when a passenger, bus operator, or application does any operation like booking bus tickets, searching the bus inventory, uploading a KYC document, and more
  • Batch mode – Scheduled jobs fetch data from multiple persistent data stores like Amazon Relational Database Service (Amazon RDS), where the OLTP data from all its applications are stored, Apache Cassandra clusters, where the bus inventory from various operators is stored, Arango DB, where the user identity graphs are stored, and more

Data cataloging

The real-time data is ingested into their self-managed Apache Nifi clusters, an open-source data platform that is used to clean, analyze, and catalog the data with its routing capabilities before sending the data to its destination.

Storage and analytics

redBus uses the following services for its storage and analytical needs:

  • Amazon Simple Storage Service (Amazon S3), an object storage service that provides the foundation for their data lake because of its virtually unlimited scalability and higher durability. Real-time data flows from Apache Druid and data from the data stores flow at regular intervals based on the schedules.
  • Apache Druid, an OLAP-style data store (data flows via Kafka Druid data loader), which computes facts and metrics against various dimensions during the data loading process.
  • Amazon Redshift, a cloud data warehouse service that helps you analyze exabytes of data and run complex analytical queries. redBus uses Amazon Redshift to store the processed data from Amazon S3 and the aggregated data from Apache Druid.

Querying and visualization

To make redBus as data-driven as possible, they ensured that the data is accessible to their SRE engineers, data engineers, and business analysts via a visualization layer. This layer features dashboards being served using Apache SuperSet, an open-source data visualization application, and Amazon Athena, an interactive query service to analyze data in Amazon S3 using standard SQL for ad hoc querying requirements.

The challenges

Initially, redBus handled data that was being ingested at the rate of 10 million events per day. Over time, as its business started growing, so did the data volume (from gigabytes to terabytes to petabytes), data ingestion per day (from 10 million to 320 million events), and its business intelligence dashboard needs. Soon after, they started facing challenges with their self-managed Superset’s BI capabilities, and the increased operational complexities.

Limited BI capabilities

redBus encountered the following BI limitations:

  • Inability to create visualizations from multiple data sources – Superset doesn’t allow creating visualizations from multiple tables within its data exploration layer. redBus data engineers had to have the tables joined beforehand at the data source level itself. In order to create a 360-degree view for redBus’s business stakeholders, it became inconvenient for data engineers to maintain multiple tables supporting the visualization layer.
  • No global filter for visuals in a dashboard – A global or primary filter across visuals in a dashboard is not supported in Superset. For example, consider there are visuals like Sales Wins by Region, YTD Revenue Realized by Region, Sales Pipeline by Region, and more in a dashboard, and a filter Region is added to the dashboard with values like EMEA, APAC, and US. The filter Region will only apply to one of the visuals, not the entire dashboard. However, dashboard users expected filtering across the dashboard.
  • Not a business-user friendly tool – Superset is highly developer centric when it comes to customization. For example, if a redBus business analyst had to customize a timed refresh that automatically re-queries every slice on a dashboard according to a pre-set value, then the analyst has to update the dashboard’s JSON metadata field. Therefore, having knowledge of JSON and its syntax is mandatory for doing any customization on the visuals or dashboard.

Increased operational cost

Although Superset is open source, which means there are no licensing costs, it also means there is more effort in maintaining all the components required for it to function as an enterprise-grade BI tool. redBus has deployed and maintained a web server (Nginx) fronted by an Application Load Balancer to do the load balancing; a metadata database server (MySQL) where Superset stores its internal information like users, slices, and dashboard definitions; an asynchronous task queue (Celery) for supporting long-running queries; a message broker (RabbitMQ); and a distributed caching server (Redis) for caching the results, charting data, and more on Amazon Elastic Compute Cloud (Amazon EC2) instances. The following diagram illustrates this architecture.

Apache Superset Deploment at redBus

redBus’s DevOps team had to do the heavy lifting of provisioning the infrastructure, taking backups, scaling the components manually as needed, upgrading the components individually, and more. It also required a Python web developer to be around for making the configurational changes so all the components work together seamlessly. All these manual operations increased the total cost of ownership for redBus.

Journey towards QuickSight

redBus started exploring BI solutions primarily around a couple of its dashboarding requirements:

  • BI dashboards for business stakeholders and analysts, where the data is sourced via Amazon S3 and Amazon Redshift.
  • A real-time application performance monitoring (APM) dashboard to help their SRE engineers and developers identify the root cause of an issue in their microservices deployment so they can fix the issues before they affect their customer’s experience. In this case, the data is sourced via Druid.

QuickSight fit into most of redBus’s BI dashboard requirements, and in no time their data platform team started with a proof of concept (POC) for a couple of their complex dashboards. At the end of the POC, which spanned a month’s time, the team shared their findings.

First, QuickSight is rich in BI capabilities, including the following:

  • It’s a self-service BI solution with drag-and-drop features that could help redBus analysts comfortably use it without any coding efforts.
  • Visualizations from multiple data sources in a single dashboard could help redBus business stakeholders get a 360-degree view of sales, forecasting, and insights in a single pane of glass.
  • Cascading filters across visuals and across sheets in a dashboard are much-needed features for redBus’s BI requirements.
  • QuickSight offers Excel-like visuals—tables with calculations, pivot tables with cell grouping, and styling are attractive for the viewers.
  • The Super-fast, Parallel, In-memory Calculation Engine (SPICE) in QuickSight could help redBus scale to hundreds of thousands of users, who can all simultaneously perform fast interactive analysis across a wide variety of AWS data sources.
  • Off-the-shelf ML insights and forecasting at no additional cost would allow redBus’s data science team to focus on ML models besides sales forecasting and similar models.
  • Built-in row-level security (RLS) could allow redBus to grant filtered access for their viewers. For example, redBus has many business analysts who manage different countries. With RLS, each business analyst only sees data related to their assigned country within a single dashboard.
  • redBus uses OneLogin as its identity provider, which supports Security Assertion Markup Language 2.0 (SAML 2.0). With the help of identity federation and single sign-on support from QuickSight, redBus could provide a simple onboarding flow for their QuickSight users.
  • QuickSight offers built-in alerts and email notification capabilities.

Secondly, QuickSight is a fully managed, cloud-native, serverless BI service offering from AWS, with the following features:

  • redBus engineers don’t need to focus on the heavy lifting of provisioning, scaling, and maintaining their BI solution on EC2 instances.
  • QuickSight offers native integration with AWS services like Amazon Redshift, Amazon S3, and Athena, and other popular frameworks like Presto, Snowflake, Teradata, and more. QuickSight connects to most of the data sources that redBus already has except Apache Druid, because native integration with Druid was not available as of December 2022. For a complete list of the supported data sources, see Supported data sources.

The outcome

Considering all the rich features and lower total cost of ownership, redBus chose QuickSight for their BI dashboard requirements. With QuickSight, redBus’s data engineers have built a number of dashboards in no time to give insights from petabytes of data to business stakeholders and analysts. The redBus data highway evolved to bring business intelligence to a much wider audience in their organization, with better performance and faster time-to-value. As of November 2022, it combines QuickSight for business users and Superset for real-time APM dashboards (at the time of writing, QuickSight doesn’t offer a native connector to Druid), as shown in the following diagram.

redBus data platform 2.0

Sales anomaly detection dashboard

Although there are many dashboards that redBus deployed to production, sales anomaly detection is one of the interesting dashboards that redBus built. It uses redBus’s proprietary sales forecasting model, which in turn is sourced by historical sales data from Amazon Redshift tables and real-time sales data from Druid tables, as shown in the following figure.

Sales anomaly detection data flow

At regular intervals, the scheduled jobs feed the redBus forecasting model with real-time and historical sales data, and then the forecasted data is pushed into an Amazon Redshift table. The sales anomaly detection dashboard in QuickSight is served by the resultant Amazon Redshift table.

The following is one of the visuals from the sales anomaly detection dashboard. It’s built using a line chart representing hourly actual sales, predicted sales, and an alert threshold for a time series for a particular business cohort in redBus.

Sales and Predicted Sales for a particular cohort

In this visual, each bar represents the number of sales anomalies triggered at a particular point in the time series.

redBus’s analysts could further drill down to the sales details and anomalies at the minute level, as shown in the following diagram. This drill-down feature comes out of the box with QuickSight.

Drill-Down Chart - Sales and Predicted Sales for a particular cohort

For more details on adding drill-downs to QuickSight dashboard visuals, see Adding drill-downs to visual data in Amazon QuickSight.

Apart from the visuals, it has become one of viewers’ favorite dashboards at redBus due to the following notable features:

  • Because filtering across visuals is an out-of-the-box feature in QuickSight, a timestamp-based filter is added to the dashboard. This helps in filtering multiple visuals in the dashboard in a single click.
  • URL actions configured on the visuals help the viewers navigate to the context-sensitive in-house applications.
  • Email alerts configured on KPIs and Gauge visuals help the viewers get notifications on time.

Next steps

Apart from building new dashboards for their BI dashboard needs, redBus is taking the following next steps:

  • Exploring QuickSight Embedded Analytics for a couple of their application requirements to accelerate time to insights for users with in-context data visuals, interactive dashboards, and more directly within applications
  • Exploring QuickSight Q, which could enable their business stakeholders to ask questions in natural language and receive accurate answers with relevant visualizations that can help them gain insights from the data
  • Building a unified dashboarding solution using QuickSight covering all their data sources as integrations become available


In this post, we showed you how redBus built its data platform using various AWS services and Apache frameworks, the challenges the platform went through (especially in their BI dashboard requirements and challenges while scaling), and how they used QuickSight and lowered the total cost of ownership.

To know more about engineering at redBus, check out their medium blog posts. To learn more about what is happening in QuickSight or if you have any questions, reach out to the QuickSight Community, which is very active and offers several resources.

About the Authors

Author: Girish Chidanand
Girish Kumar Chidananda
works as a Senior Engineering Manager – Data Engineering at redBus, where he has been building various data engineering applications and components for redBus for the last 5 years. Prior to starting his journey in the IT industry, he worked as a Mechanical and Control systems engineer in various organizations, and he holds an MS degree in Fluid Power Engineering from University of Bath.

Author: Kayalvizhi Kandasamy
Kayalvizhi Kandasamy
works with digital-native companies to support their innovation. As a Senior Solutions Architect (APAC) at Amazon Web Services, she uses her experience to help people bring their ideas to life, focusing primarily on microservice architectures and cloud-native solutions using AWS services. Outside of work, she likes playing chess and is a FIDE rated chess player. She also coaches her daughters the art of playing chess, and prepares them for various chess tournaments.

Manually Approving Security Changes in CDK Pipeline

Post Syndicated from Brian Beach original https://aws.amazon.com/blogs/devops/manually-approving-security-changes-in-cdk-pipeline/

In this post I will show you how to add a manual approval to AWS Cloud Development Kit (CDK) Pipelines to confirm security changes before deployment. With this solution, when a developer commits a change, CDK pipeline identifies an IAM permissions change, pauses execution, and sends a notification to a security engineer to manually approve or reject the change before it is deployed.


In my role I talk to a lot of customers that are excited about the AWS Cloud Development Kit (CDK). One of the things they like is that L2 constructs often generate IAM and other security policies. This can save a lot of time and effort over hand coding those policies. Most customers also tell me that the policies generated by CDK are more secure than the policies they generate by hand.

However, these same customers are concerned that their security engineering team does not know what is in the policies CDK generates. In the past, these customers spent a lot of time crafting a handful of IAM policies that developers can use in their apps. These policies were well understood, but overly permissive because they were often reused across many applications.

Customers want more visibility into the policies CDK generates. Luckily CDK provides a mechanism to approve security changes. If you are using CDK, you have probably been prompted to approve security changes when you run cdk deploy at the command line. That works great on a developer’s machine, but customers want to build the same confirmation into their continuous delivery pipeline. CDK provides a mechanism for this with the ConfirmPermissionsBroadening action. Note that ConfirmPermissionsBroadening is only supported by the AWS CodePipline deployment engine.


Before I talk about ConfirmPermissionsBroadening, let me review how CDK creates IAM policies. Consider the “Hello, CDK” application created in AWS CDK Workshop. At the end of this module, I have an AWS Lambda function and an Amazon API Gateway defined by the following CDK code.

// defines an AWS Lambda resource
const hello = new lambda.Function(this, 'HelloHandler', {
  runtime: lambda.Runtime.NODEJS_14_X,    // execution environment
  code: lambda.Code.fromAsset('lambda'),  // code loaded from "lambda" directory
  handler: 'hello.handler'                // file is "hello", function is "handler"

// defines an API Gateway REST API resource backed by our "hello" function.
new apigw.LambdaRestApi(this, 'Endpoint', {
  handler: hello

Note that I did not need to define the IAM Role or Lambda Permissions. I simply passed a refence to the Lambda function to the API Gateway (line 10 above). CDK understood what I was doing and generated the permissions for me. For example, CDK generated the following Lambda Permission, among others.

  "Effect": "Allow",
  "Principal": {
    "Service": "apigateway.amazonaws.com"
  "Action": "lambda:InvokeFunction",
  "Resource": "arn:aws:lambda:us-east-1:123456789012:function:HelloHandler2E4FBA4D",
  "Condition": {
    "ArnLike": {
      "AWS:SourceArn": "arn:aws:execute-api:us-east-1:123456789012:9y6ioaohv0/prod/*/"

Notice that CDK generated a narrowly scoped policy, that allows a specific API (line 10 above) to call a specific Lambda function (line 7 above). This policy cannot be reused elsewhere. Later in the same workshop, I created a Hit Counter Construct using a Lambda function and an Amazon DynamoDB table. Again, I associated them using a single line of CDK code.


As in the prior example, CDK generated a narrowly scoped IAM policy. This policy allows the Lambda function to perform certain actions (lines 4-11) on a specific table (line 14 below).

  "Effect": "Allow",
  "Action": [
  "Resource": [

As you can see, CDK is doing a lot of work for me. In addition, CDK is creating narrowly scoped policies for each resource, rather than sharing a broadly scoped policy in multiple places.

CDK Pipelines Permissions Checks

Now that I have reviewed how CDK generates policies, let’s discuss how I can use this in a Continuous Deployment pipeline. Specifically, I want to allow CDK to generate policies, but I want a security engineer to review any changes using a manual approval step in the pipeline. Of course, I don’t want security to be a bottleneck, so I will only require approval when security statements or traffic rules are added. The pipeline should skip the manual approval if there are no new security rules added.

Let’s continue to use CDK Workshop as an example. In the CDK Pipelines module, I used CDK to configure AWS CodePipeline to deploy the “Hello, CDK” application I discussed above. One of the last things I do in the workshop is add a validation test using a post-deployment step. Adding a permission check is similar, but I will use a pre-deployment step to ensure the permission check happens before deployment.

First, I will import ConfirmPermissionsBroadening from the pipelines package

import {ConfirmPermissionsBroadening} from "aws-cdk-lib/pipelines";

Then, I can simply add ConfirmPermissionsBroadening to the deploySatage using the addPre method as follows.

const deploy = new WorkshopPipelineStage(this, 'Deploy');
const deployStage = pipeline.addStage(deploy);

  new ConfirmPermissionsBroadening("PermissionCheck", {
    stage: deploy

    // Post Deployment Test Code Omitted

Once I commit and push this change, a new manual approval step called PermissionCheck.Confirm is added to the Deploy stage of the pipeline. In the future, if I push a change that adds additional rules, the pipeline will pause here and await manual approval as shown in the screenshot below.

Figure 1. Pipeline waiting for manual review

Figure 1. Pipeline waiting for manual review

When the security engineer clicks the review button, she is presented with the following dialog. From here, she can click the URL to see a summary of the change I am requesting which was captured in the build logs. She can also choose to approve or reject the change and add comments if needed.

Figure 2. Manual review dialog with a link to the build logsd

Figure 2. Manual review dialog with a link to the build logs

When the security engineer clicks the review URL, she is presented with the following sumamry of security changes.

Figure 3. Summary of security changes in the build logs

Figure 3. Summary of security changes in the build logs

The final feature I want to add is an email notification so the security engineer knows when there is something to approve. To accomplish this, I create a new Amazon Simple Notification Service (SNS) topic and subscription and associate it with the ConfirmPermissionsBroadening Check.

// Create an SNS topic and subscription for security approvals
const topic = new sns.Topic(this, 'SecurityApproval’);
topic.addSubscription(new subscriptions.EmailSubscription('[email protected]')); 

  new ConfirmPermissionsBroadening("PermissionCheck", {
    stage: deploy,
    notificationTopic: topic

With the notification configured, the security engineer will receive an email when an approval is needed. She will have an opportunity to review the security change I made and assess the impact. This gives the security engineering team the visibility they want into the policies CDK is generating. In addition, the approval step is skipped if a change does not add security rules so the security engineer does not become a bottle neck in the deployment process.


AWS Cloud Development Kit (CDK) automates the generation of IAM and other security policies. This can save a lot of time and effort but security engineering teams want visibility into the policies CDK generates. To address this, CDK Pipelines provides the ConfirmPermissionsBroadening action. When you add ConfirmPermissionsBroadening to your CI/CD pipeline, CDK will wait for manual approval before deploying a change that includes new security rules.

About the author:

Brian Beach

Brian Beach has over 20 years of experience as a Developer and Architect. He is currently a Principal Solutions Architect at Amazon Web Services. He holds a Computer Engineering degree from NYU Poly and an MBA from Rutgers Business School. He is the author of “Pro PowerShell for Amazon Web Services” from Apress. He is a regular author and has spoken at numerous events. Brian lives in North Carolina with his wife and three kids.

How NETSCOUT built a global DDoS awareness platform with Amazon OpenSearch Service

Post Syndicated from Hardik Modi original https://aws.amazon.com/blogs/big-data/how-netscout-built-a-global-ddos-awareness-platform-with-amazon-opensearch-service/

This post was co-written with Hardik Modi, AVP, Threat and Migitation Products at NETSCOUT.

NETSCOUT Omnis Threat Horizon is a global cybersecurity awareness platform providing users with highly contextualized visibility into “over the horizon” threat activity on the global DDoS (Distributed Denial of Service) landscape—threats that could be impacting their industry, their customers, or their suppliers. It allows visitors to create custom profiles and understand DDoS activity that is being observed in near-real time through NETSCOUT’s ATLAS visibility platform. Users can create free accounts to create customized profiles that lead to a map-based visualization (as in the following screenshot) as well as tailored summary reporting. DDoS attacks can be impactful to services delivered over the internet. Visibility of this nature is key to anyone who wishes to understand what is happening on the threat landscape. Omnis Threat Horizon has been generally available since August 2019.

NETSCOUT Omnis Threat Horizon: Real-Time DDoS Attack Map

To provide continuous visibility at a low per-user cost (to enable a free service), the NETSCOUT development team chose a series of AWS technologies to power the collection, storage, analysis, warehousing, user authentication, and delivery of the application. In particular, they chose Amazon OpenSearch Service as the core analytics engine. They store all processed attack records in OpenSearch Service.

This post discusses the challenges and design patterns NETSCOUT used on its path to representing the details of roughly 10 million annual DDoS attacks in near-real time.


NETSCOUT, through its Arbor product line, is a long-time provider of solutions for network visibility and DDoS mitigation for service providers and enterprises. Since 2007, NETSCOUT has operated a program called ATLAS, in which customers can opt to share anonymized data about the DDoS attacks they are observing on their network. As this program has matured, NETSCOUT has comprehensive visibility into the DDoS attack landscape—both the number and nature of attacks. This visibility informs and improves their products, allowing them to share analysis findings in the form of papers, blog posts, and a biannual threat report. Since NETSCOUT started collecting and analyzing data in the current form in September 2012, they have observed 96 million attacks, allowing them to perform considerable analysis of trends across regions and verticals, as well as understand the vectors used and sizes of attacks.

Omnis Threat Horizon is a solution to display this information to a broader audience—essentially anyone interested in the threat landscape, and specifically the DDoS attack trends at any given time. In addition to providing real-time maps, the solution allows the user to go back in time to observe visually or in summary form what might have been happening at a given time.

They wanted to make sure that the visual elements and application was responsive globally, both in terms of representing real-time data as well as showing historical information. Furthermore, they wanted to keep the incremental cost per user as low as possible, in order to be able to provide this service for free globally.

Solution overview

The following diagram illustrates the solution architecture.

One of the objectives behind the chosen solution was to utilize native AWS services in every instance possible. Furthermore, they chose to break component functionality into their own microservices, and make consistent use of this through the solution.

Individual monitoring sensors deliver data to Amazon Simple Storage Service (Amazon S3) on an hourly basis. As new entries are received, Amazon Simple Notification Service (Amazon SNS) notifications are delivered, resulting in processing of the data. Successive microservices are responsible for:

  • Parsing
  • Running algorithms to identify and separate spurious entries
  • Deduplication
  • Scoring
  • Confidence

After this processing, each attack is represented as a separate document in the OpenSearch Service domain. As of writing this post, NETSCOUT has approximately 96 million attacks in the cluster, all of which can be represented in some form in the maps and reports in Omnis Threat Horizon.

The data is organized in hourly bin files, and served to the application via Amazon CloudFront.

Lessons learned related to Elasticsearch

On previous projects, NETSCOUT tried Apache Cassandra, a popular NoSQL open-source database, and considered it inadequate for aggregation queries. While developing Horizon, they chose Elasticsearch to get access to more powerful aggregation query capabilities with significantly less developer time.

They started with a self-managed instance, but faced the following issues:

  • Considerable expenditure of person hours simply to manage the infrastructure
  • Each version upgrade was an involved process, requiring a lot of planning and still posing technical challenges along the way
  • No auto scaling and big aggregation queries could break Elasticsearch

After a few cycles of powering through this, they moved to OpenSearch Service to overcome these challenges.


NETSCOUT saw the following benefits from this architecture:

  • Fast processing of attack data – The time from when attack data is received to when it is available in the data store is of the order of seconds, allowing them to provide near-real-time visibility in the solution.
  • Lower management overhead – The data store grows consistently, and by using a managed service, teams avoid having to perform tasks related to cluster management. This was a big pain point with previous solutions adopted involving the same technology.
  • Scalable architecture – It’s possible to add new capabilities into the pipeline as requirements emerge, without rearchitecting other components.


With OpenSearch Service, NETSCOUT has been able to build a resilient data store for the attack data they capture. As a result of architectural choices made and the underlying AWS services, they’re able to provide visibility into their data at small incremental costs, allowing them to provide a global visibility platform at no cost to the end-user.

With the most experience, the most reliable, scalable, and secure cloud, and the most comprehensive set of services and solutions, AWS is the best place to unlock value from your data and turn it into insight.

About the Authors

Hardik Modi is AVP, Threat and Migitation Products at NETSCOUT. In this role, he oversees the teams responsible for mitigation products as well as the creation of security content for NETSCOUTs products, enabling best-in-class protection for users, as well as the continuous delivery and publication of impactful research across the DDoS and Intrusion landscapes.

Sujatha Kuppuraju is a Principal Solutions Architect at Amazon Web Services (AWS). She engages with customers to create innovative solutions that address customer business problems and accelerate the adoption of AWS services.

Mike Arruda is a Senior Technical Account Manager at AWS, based in the New England area. He works with AWS Enterprise customers, supporting their success in adopting best practices and helping them achieve their desired business outcomes with AWS.

Setting up a secure CI/CD pipeline in a private Amazon Virtual Private Cloud with no public internet access

Post Syndicated from MJ Kubba original https://aws.amazon.com/blogs/devops/setting-up-a-secure-ci-cd-pipeline-in-a-private-amazon-virtual-private-cloud-with-no-public-internet-access/

With the rise of the cloud and increased security awareness, the use of private Amazon VPCs with no public internet access also expanded rapidly. This setup is recommended to make sure of proper security through isolation. The isolation requirement also applies to code pipelines, in which developers deploy their application modules, software packages, and other dependencies and bundles throughout the development lifecycle. This is done without having to push larger bundles from the developer space to the staging space or the target environment. Furthermore, AWS CodeArtifact is used as an artifact management service that will help organizations of any size to securely store, publish, and share software packages used in their software development process.

We’ll walk through the steps required to build a secure, private continuous integration/continuous development (CI/CD) pipeline with no public internet access while maintaining log retention in Amazon CloudWatch. We’ll utilize AWS CodeCommit for source, CodeArtifact for the Modules and software packages, and Amazon Simple Storage Service (Amazon S3) as artifact storage.


The prerequisites for following along with this post include:

  • An AWS Account
  • A Virtual Private Cloud (Amazon VPC)
  • A CI/CD pipeline – This can be CodePipeline, Jenkins or any CI/CD tool you want to integrate CodeArtifact with, we will use CodePipeline in our walkthrough here.

Solution walkthrough

The main service we’ll focus on is CodeArtifact, a fully managed artifact repository service that makes it easy for organizations of any size to securely store, publish, and share software packages used in their software development process. CodeArtifact works with commonly used package managers and build tools, such as Maven and Gradle (Java), npm and yarn (JavaScript), pip and twine (Python), or NuGet (.NET).

user checkin code to CodeCommit, CodePipeline will detect the change and start the pipeline, in CodeBuild the build stage will utilize the private endpoints and download the software packages needed without the need to go over the internet.

Users push code to CodeCommit, CodePipeline will detect the change and start the pipeline, in CodeBuild the build stage will utilize the private endpoints and download the software packages needed without the need to go over the internet.

The preceding diagram shows how the requests remain private within the VPC and won’t go through the Internet gateway, by going from CodeBuild over the private endpoint to CodeArtifact service, all within the private subnet.

The requests will use the following VPC endpoints to connect to these AWS services:

  • CloudWatch Logs endpoint (for CodeBuild to put logs in CloudWatch)
  • CodeArtifact endpoints
  • AWS Security Token Service (AWS STS) endpoint
  • Amazon Simple Storage Service (Amazon S3) endpoint


  1. Create a CodeCommit Repository:
    1. Navigate to your CodeCommit Console then click on Create repository
Screenshot: Create repository button

Figure 2. Screenshot: Create repository button.

    1. Type in name for the repository then click Create
Screenshot: Repository setting with name shown as "Private" and empty Description

Figure 3. Screenshot: Repository setting with name shown as “Private” and empty Description.

    1. Scroll down and click Create file
Figure 4. Create file button.

Figure 4. Create file button.

    1. Copy the example buildspec.yml file and paste it to the editor

Example buildspec.yml file:

version: 0.2
        nodejs: 16
      - export AWS_STS_REGIONAL_ENDPOINTS=regional
      - ACCT=`aws sts get-caller-identity --region ${AWS_REGION} --query Account --output text`
      - aws codeartifact login --tool npm --repository Private --domain private --domain-owner ${ACCT}
      - npm install
      - node index.js
    1. Name the file buildspec.yml, type in your name and your email address then Commit changes
Figure 5. Screenshot: Create file page.

Figure 5. Screenshot: Create file page.

  1. Create CodeArtifact
    1. Navigate to your CodeArtifact Console then click on Create repository
    2. Give it a name and select npm-store as public upsteam repository
Figure 6. Screenshot: Create repository page with Repository name "Private".

Figure 6. Screenshot: Create repository page with Repository name “Private”.

    1. For the Domain Select this AWS account and enter a domain name
Figure 7. Screenshot: Select domain page.

Figure 7. Screenshot: Select domain page.

    1. Click Next then Create repository
Figure 8. Screenshot: Create repository review page.

Figure 8. Screenshot: Create repository review page.

  1. Create a CI/CD using CodePipeline
    1. Navigate to your CodePipeline Console then click on Create pipeline
Figure 9. Screenshot: Create pipeline button.

Figure 9. Screenshot: Create pipeline button.

    1. Type a name, leave the Service role as “New service role” and click next
Figure 10. Screenshot: Choose pipeline setting page with pipeline name "Private".

Figure 10. Screenshot: Choose pipeline setting page with pipeline name “Private”.

    1. Select AWS CodeCommit as your Source provider
    2. Then choose the CodeCommit repository you created earlier and for branch select main then click Next
Figure 11. Screenshot: Create pipeline add source stage.

Figure 11. Screenshot: Create pipeline add source stage.

    1. For the Build Stage, Choose AWS CodeBuild as the build provider, then click Create Project
Figure 12. Screenshot: Create pipeline add build stage.

Figure 12. Screenshot: Create pipeline add build stage.

    1. This will open new window to create the new Project, Give this project a name
Figure 13. Screenshot: Create pipeline create build project window.

Figure 13. Screenshot: Create pipeline create build project window.

    1.  Scroll down to the Environment section: select pick Managed image,
    2. For Operating system select “Amazon Linux 2”,
    3. Runtime “Standard” and
    4. For Image select the aws/codebuild/amazonlinux2-x86+64-standard:4.0
      For the Image version: Always use the latest image for this runtime version
    5. Select Linux for the Environment type
    6. Leave the Privileged option unchecked and set Service Role to “New service role”
Figure 14. Screenshot: Create pipeline create build project, setting up environment window.

Figure 14. Screenshot: Create pipeline create build project, setting up environment window.

    1. Expand Additional configurations and scroll down to the VPC section, select the desired VPC, your Subnets (we recommend selecting multiple AZs, to ensure high availability), and Security Group (the security group rules must allow resources that will use the VPC endpoint to communicate with the AWS service to communicate with the endpoint network interface, default VPC security group will be used here as an example)
Figure 15. Screenshot: Create pipeline create build project networking window.

Figure 15. Screenshot: Create pipeline create build project networking window.

    1. Scroll down to the Buildspec and select “Use a buildspec file” and type “buildspec.yml” for the Buildspec name
Figure 16. Screenshot: Create pipeline create build project buildspec window.

Figure 16. Screenshot: Create pipeline create build project buildspec window.

    1. Select the CloudWatch logs option you can leave the group name and stream empty this will let the service use the default values and click Continue to CodePipeline
Figure 17. Screenshot: Create pipeline create build project logs window.

Figure 17. Screenshot: Create pipeline create build project logs window.

    1. This will create the new CodeBuild Project, update the CodePipeline page, now you can click Next
Figure 18. Screenshot: Create pipeline add build stage window.

Figure 18. Screenshot: Create pipeline add build stage window.

    1.  Since we are not deploying this to any environment, you can skip the deploy stage and click “Skip deploy stage”

Figure 19. Screenshot: Create pipeline add deploy stage.

Figure 20. Screenshot: Create pipeline skip deployment stage confirmation.

Figure 20. Screenshot: Create pipeline skip deployment stage confirmation.

    1. After you get the popup click skip again you’ll see the review page, scroll all the way down and click Create Pipeline
  1. Create a VPC endpoint for Amazon CloudWatch Logs. This will enable CodeBuild to send execution logs to CloudWatch:
    1. Navigate to your VPC console, and from the navigation menu on the left select “Endpoints”.
Figure 21. Screenshot: VPC endpoint.

Figure 21. Screenshot: VPC endpoint.

    1.  click Create endpoint Button.
Figure 22. Screenshot: Create endpoint.

Figure 22. Screenshot: Create endpoint.

    1. For service Category, select “AWS Services”. You can set a name for the new endpoint, and make sure to use something descriptive.
Figure 23. Screenshot: Create endpoint page.

Figure 23. Screenshot: Create endpoint page.

    1. From the list of services, search for the endpoint by typing logs in the search bar and selecting the one with com.amazonaws.us-west-2.logs.
      This walkthrough can be done in any region that supports the services. I am going to be using us-west-2, please select the appropriate region for your workload.
Figure 24. Screenshot: create endpoint select services with com.amazonaws.us-west-2.logs selected.

Figure 24. Screenshot: create endpoint select services with com.amazonaws.us-west-2.logs selected.

    1. Select the VPC that you want the endpoint to be associated with, and make sure that the Enable DNS name option is checked under additional settings.
Figure 25. Screenshot: create endpoint VPC setting shows VPC selected.

Figure 25. Screenshot: create endpoint VPC setting shows VPC selected.

    1. Select the Subnets where you want the endpoint to be associated, and you can leave the security group as default and the policy as empty.
Figure 26. Screenshot: create endpoint subnet setting shows 2 subnet selected and default security group selected.

Figure 26. Screenshot: create endpoint subnet setting shows 2 subnet selected and default security group selected.

    1. Select Create Endpoint.
Figure 27. Screenshot: create endpoint button.

Figure 27. Screenshot: create endpoint button.

  1. Create a VPC endpoint for CodeArtifact. At the time of writing this article, CodeArifact has two endpoints: one is for API operations like service level operations and authentication, and the other is for using the service such as getting modules for our code. We’ll need both endpoints to automate working with CodeArtifact. Therefore, we’ll create both endpoints with DNS enabled.

In addition, we’ll need AWS Security Token Service (AWS STS) endpoint for get-caller-identity API call:

Follow steps a-c from the steps that were used from the creating the Logs endpoint above.

a. From the list of services, you can search for the endpoint by typing codeartifact in the search bar and selecting the one with com.amazonaws.us-west-2.codeartifact.api.

Figure 28. Screenshot: create endpoint select services with com.amazonaws.us-west-2.codeartifact.api selected.

Figure 28. Screenshot: create endpoint select services with com.amazonaws.us-west-2.codeartifact.api selected.

Follow steps e-g from Part 4.

Then, repeat the same for com.amazon.aws.us-west-2.codeartifact.repositories service.

Figure 29. Screenshot: create endpoint select services with com.amazonaws.us-west-2.codeartifact.api selected.

Figure 29. Screenshot: create endpoint select services with com.amazonaws.us-west-2.codeartifact.api selected.

  1. Enable a VPC endpoint for AWS STS:

Follow steps a-c from Part 4

a. From the list of services you can search for the endpoint by typing sts in the search bar and selecting the one with com.amazonaws.us-west-2.sts.

Figure 30.Screenshot: create endpoint select services with com.amazon.aws.us-west-2.codeartifact.repositories selected.

Figure 30.Screenshot: create endpoint select services with com.amazon.aws.us-west-2.codeartifact.repositories selected.

Then follow steps e-g from Part 4.

  1. Create a VPC endpoint for S3:

Follow steps a-c from Part 4

a. From the list of services you can search for the endpoint by typing sts in the search bar and selecting the one with com.amazonaws.us-west-2.s3, select the one with type of Gateway

Then select your VPC, and select the route tables for your subnets, this will auto update the route table with the new S3 endpoint.

Figure 31. Screenshot: create endpoint select services with com.amazonaws.us-west-2.s3 selected.

Figure 31. Screenshot: create endpoint select services with com.amazonaws.us-west-2.s3 selected.

  1. Now we have all of the endpoints set. The last step is to update your pipeline to point at the CodeArtifact repository when pulling your code dependencies. I’ll use CodeBuild buildspec.yml as an example here.

Make sure that your CodeBuild AWS Identity and Access Management (IAM) role has the permissions to perform STS and CodeArtifact actions.

Navigate to IAM console and click Roles from the left navigation menu, then search for your IAM role name, in our case since we selected “New service role” option in step 2.k was created with the name “codebuild-Private-service-role” (codebuild-<BUILD PROJECT NAME>-service-role)

Figure 32. Screenshot: IAM roles with codebuild-Private-service-role role shown in search.

Figure 32. Screenshot: IAM roles with codebuild-Private-service-role role shown in search.

From the Add permissions menu, click on Create inline policy

Search for STS in the services then select STS

Figure 34. Screenshot: IAM visual editor with sts shown in search.

Figure 34. Screenshot: IAM visual editor with sts shown in search.

Search for “GetCallerIdentity” and select the action

Figure 35. Screenshot: IAM visual editor with GetCallerIdentity in search and action selected.

Figure 35. Screenshot: IAM visual editor with GetCallerIdentity in search and action selected.

Repeat the same with “GetServiceBearerToken”

Figure 36. Screenshot: IAM visual editor with GetServiceBearerToken in search and action selected.

Figure 36. Screenshot: IAM visual editor with GetServiceBearerToken in search and action selected.

Click on Review, add a name then click on Create policy

Figure 37. Screenshot: Review page and Create policy button.

Figure 37. Screenshot: Review page and Create policy button.

You should see the new inline policy added to the list

Figure 38. Screenshot: shows the new in-line policy in the list.

Figure 38. Screenshot: shows the new in-line policy in the list.

For CodeArtifact actions we will do the same on that role, click on Create inline policy

Figure 39. Screenshot: attach policies.

Figure 39. Screenshot: attach policies.

Search for CodeArtifact in the services then select CodeArtifact

Figure 40. Screenshot: select service with CodeArtifact in search.

Figure 40. Screenshot: select service with CodeArtifact in search.

Search for “GetAuthorizationToken” in actions and select that action in the check box

Figure 41. CodeArtifact: with GetAuthorizationToken in search.

Figure 41. CodeArtifact: with GetAuthorizationToken in search.

Repeat for “GetRepositoryEndpoint” and “ReadFromRepository”

Click on Resources to fix the 2 warnings, then click on Add ARN on the first one “Specify domain resource ARN for the GetAuthorizationToken action.”

Figure 42. Screenshot: with all selected filed and 2 warnings.

Figure 42. Screenshot: with all selected filed and 2 warnings.

You’ll get a pop up with fields for Region, Account and Domain name, enter your region, your account number, and the domain name, we used “private” when we created our domain earlier.

Figure 43. Screenshot: Add ARN page.

Figure 43. Screenshot: Add ARN page.

Then click Add

Repeat the same process for “Specify repository resource ARN for the ReadFromRepository and 1 more”, and this time we will provide Region, Account ID, Domain name and Repository name, we used “Private” for the repository we created earlier and “private” for domain

Figure 44. Screenshot: add ARN page.

Figure 44. Screenshot: add ARN page.

Note it is best practice to specify the resource we are targeting, we can use the checkbox for “Any” but we want to narrow the scope of our IAM role best we can.

  1. Navigate to CodeCommit then click on the repo you created earlier in step1
Figure 45. Screenshot: CodeCommit repo.

Figure 45. Screenshot: CodeCommit repo.

Click on Add file dropdown, then Create file button

Paste the following in the editor space:

  "dependencies": {
    "mathjs": "^11.2.0"

Name the file “package.json”

Add your name and email, and optional commit message

Repeat this process for “index.js” and paste the following in the editor space:

const { sqrt } = require('mathjs')

Figure 46. Screenshot: CodeCommit Commit changes button.

Figure 46. Screenshot: CodeCommit Commit changes button.

This will force the pipeline to kick off and start building the application

Figure 47. Screenshot: CodePipeline.

Figure 47. Screenshot: CodePipeline.

This is a very simple application that gets the square root of 49 and log it to the screen, if you click on the Details link from the pipeline build stage, you’ll see the output of running the NodeJS application, the logs are stored in CloudWatch and you can navigate there by clicking on the link the View entire log “Showing the last xx lines of the build log. View entire log”

Figure 48. Screenshot: Showing the last 54 lines of the build log. View entire log.

Figure 48. Screenshot: Showing the last 54 lines of the build log. View entire log.

We used npm example in the buildspec.yml above, Similar setup will be used for pip and twine,

For Maven, Gradle, and NuGet, you must set Environment variables and change your settings.xml and build.gradle, as well as install the plugin for your IDE. For more information, see here.


Navigate to VPC endpoint from the AWS console and delete the endpoints that you created.

Navigate to CodePipeline and delete the Pipeline you created.

Navigate to CodeBuild and delete the Build Project created.

Navigate to CodeCommit and delete the Repository you created.

Navigate to CodeArtifact and delete the Repository and the domain you created.

Navigate to IAM and delete the Roles created:

For CodeBuild: codebuild-<Build Project Name>-service-role

For CodePipeline: AWSCodePipelineServiceRole-<Region>-<Project Name>


In this post, we deployed a full CI/CD pipeline with CodePipeline orchestrating CodeBuild to build and test a small NodeJS application, using CodeArtifact to download the application code dependencies. All without going to the public internet and maintaining the logs in CloudWatch.

About the author:

MJ Kubba

MJ Kubba is a Solutions Architect who enjoys working with public sector customers to build solutions that meet their business needs. MJ has over 15 years of experience designing and implementing software solutions. He has a keen passion for DevOps and cultural transformation.

re:Invent 2022 DevOps and Developer Productivity Playlist

Post Syndicated from Brian Beach original https://aws.amazon.com/blogs/devops/reinvent-2022-devops-and-developer-productivity-playlist/

Danielle Kucera, Karun Bakshi, and I were privileged to organize the DevOps and Developer Productivity (DOP) track for re:Invent 2022. For 2022, the DOP track included 58 sessions and nearly 100 speakers.  If you weren’t able to attend, I have compiled a list of the on-demand sessions for you below.

Leadership Sessions

Delighting developers: Builder experience at AWS Adam Seligman, Vice President of Developer Experience, and Emily Freeman, Head of Community Development, share the latest AWS tools and experiences for teams developing in the cloud. Adam recaps the latest launches and demos how key services can integrate to accelerate developer productivity.

Amazon CodeCatalyst

Amazon CodeCatalyst, announced during Dr. Werner Vogels Keynote, is a unified software development service that makes it faster to build and deliver on AWS.

Introducing Amazon CodeCatalyst – Harry Mower, Director of DevOps Services, and Doug Clauson, Product Manager, provide an overview of Amazon CodeCatalyst. CodeCatalyst provides one place where you can plan work, collaborate on code, and build, test, and deploy applications with nearly continuous integration/continuous delivery (CI/CD) tools.

Deep dive on CodeCatalyst Workspaces – Tmir Karia, Sr. Product Manager, and Rahul Gulati, Sr. Product Manager,  discuss how Amazon CodeCatalyst Workspaces decreases the time you spend creating and maintaining a local development environment and allows you to quickly set up a cloud development workspace, switch between projects, and replicate the development workspace configuration across team members.


AWS Well-Architected best practices for DevOps on AWS Elamaran Shanmugam, Sr. Container Specialist, and Deval Perikh, Sr. Enterprise Solutions Architect, discuss the components required to align your DevOps practices to the pillars of the AWS Well-Architected Framework.

Best practices for securing your software delivery lifecycle Jams Bland, Principal Solutions Architect, and Curtis Rissi, Principal Solutions Architect, discus ways you can secure your CI/CD pipeline on AWS. Review topics like security of the pipeline versus security in the pipeline, ways to incorporate security checkpoints across various pipeline stages, security event management, and aggregating vulnerability findings into a single pane of glass.

Build it & run it: Streamline your DevOps capabilities with machine learning Rafael Ramos, Shivansh Singh, and Jared Reimer discuss how to use machine learning–powered tools like Amazon CodeWhisperer, Amazon CodeGuru, and Amazon DevOps Guru to boost your applications’ availability and write software faster and more reliably.

Infrastructure as Code

AWS infrastructure as code: A year in review  Tatiana Cooke, Principal Product Manager, and Ben Perak, Principal Product Manage, discuss the new features and improvements for AWS infrastructure as code with AWS CloudFormation and AWS CDK.

How to reuse patterns when developing infrastructure as code Ryan Bachman, Ethan Rucinski, and Ravi Palakodeti explore AWS Cloud Development Kit (AWS CDK) constructs and AWS CloudFormation modules and how they make it easier to build applications on AWS.

Governance and security with infrastructure as code David Hessler, Senior DevOps Consultant, and Eric Beard, Senior Solutions Architect, discuss how to use AWS CloudFormation and the AWS CDK to deploy cloud applications in regulated environments while enforcing security controls.

Developer Productivity

Building on AWS with AWS tools, services, and SDKs Kyle Thomson, Senior Software Development Engineer and Deval Parikh, Senior Solutions Architect, discuss the ways developers can set up secure development environments and use their favorite IDEs to interact with, and deploy to, the AWS Cloud.

The Amazon Builders’ Library: 25 years of operational excellence at Amazon Colm MacCarthaigh, Distinguished Engineer, and David Yanacek, Sr. Principal Engineer, discuss how Amazon practices have changed and improved over time and what we’ve learned as builders and as operators.

Sustainability in the cloud with Rust and AWS Graviton Emil Lerch, Principal DevOps Specialist, and Esteban Kuber, Principal Engineer, discuss the benefits of Rust and AWS Graviton that can reduce energy consumption and increase productivity.



About the author:

Brian Beach

Brian Beach has over 20 years of experience as a Developer and Architect. He is currently a Principal Solutions Architect at Amazon Web Services. He holds a Computer Engineering degree from NYU Poly and an MBA from Rutgers Business School. He is the author of “Pro PowerShell for Amazon Web Services” from Apress. He is a regular author and has spoken at numerous events. Brian lives in North Carolina with his wife and three kids.

How BookMyShow saved 80% in costs by migrating to an AWS modern data architecture

Post Syndicated from Mahesh Vandi Chalil original https://aws.amazon.com/blogs/big-data/how-bookmyshow-saved-80-in-costs-by-migrating-to-an-aws-modern-data-architecture/

This is a guest post co-authored by Mahesh Vandi Chalil, Chief Technology Officer of BookMyShow.

BookMyShow (BMS), a leading entertainment company in India, provides an online ticketing platform for movies, plays, concerts, and sporting events. Selling up to 200 million tickets on an annual run rate basis (pre-COVID) to customers in India, Sri Lanka, Singapore, Indonesia, and the Middle East, BookMyShow also offers an online media streaming service and end-to-end management for virtual and on-ground entertainment experiences across all genres.

The pandemic gave BMS the opportunity to migrate and modernize our 15-year-old analytics solution to a modern data architecture on AWS. This architecture is modern, secure, governed, and cost-optimized architecture, with the ability to scale to petabytes. BMS migrated and modernized from on-premises and other cloud platforms to AWS in just four months. This project was run in parallel with our application migration project and achieved 90% cost savings in storage and 80% cost savings in analytics spend.

The BMS analytics platform caters to business needs for sales and marketing, finance, and business partners (e.g., cinemas and event owners), and provides application functionality for audience, personalization, pricing, and data science teams. The prior analytics solution had multiple copies of data, for a total of over 40 TB, with approximately 80 TB of data in other cloud storage. Data was stored on‑premises and in the cloud in various data stores. Growing organically, the teams had the freedom to choose their technology stack for individual projects, which led to the proliferation of various tools, technology, and practices. Individual teams for personalization, audience, data engineering, data science, and analytics used a variety of products for ingestion, data processing, and visualization.

This post discusses BMS’s migration and modernization journey, and how BMS, AWS, and AWS Partner Minfy Technologies team worked together to successfully complete the migration in four months and saving costs. The migration tenets using the AWS modern data architecture made the project a huge success.

Challenges in the prior analytics platform

  • Varied Technology: Multiple teams used various products, languages, and versions of software.
  • Larger Migration Project: Because the analytics modernization was a parallel project with application migration, planning was crucial in order to consider the changes in core applications and project timelines.
  • Resources: Experienced resource churn from the application migration project, and had very little documentation of current systems.
  • Data : Had multiple copies of data and no single source of truth; each data store provided a view for the business unit.
  • Ingestion Pipelines: Complex data pipelines moved data across various data stores at varied frequencies. We had multiple approaches in place to ingest data to Cloudera, via over 100 Kafka consumers from transaction systems and MQTT(Message Queue Telemetry Transport messaging protocol) for clickstreams, stored procedures, and Spark jobs. We had approximately 100 jobs for data ingestion across Spark, Alteryx, Beam, NiFi, and more.
  • Hadoop Clusters: Large dedicated hardware on which the Hadoop clusters were configured incurring fixed costs. On-premises Cloudera setup catered to most of the data engineering, audience, and personalization batch processing workloads. Teams had their implementation of HBase and Hive for our audience and personalization applications.
  • Data warehouse: The data engineering team used TiDB as their on-premises data warehouse. However, each consumer team had their own perspective of data needed for analysis. As this siloed architecture evolved, it resulted in expensive storage and operational costs to maintain these separate environments.
  • Analytics Database: The analytics team used data sourced from other transactional systems and denormalized data. The team had their own extract, transform, and load (ETL) pipeline, using Alteryx with a visualization tool.

Migration tenets followed which led to project success:

  • Prioritize by business functionality.
  • Apply best practices when building a modern data architecture from Day 1.
  • Move only required data, canonicalize the data, and store it in the most optimal format in the target. Remove data redundancy as much possible. Mark scope for optimization for the future when changes are intrusive.
  • Build the data architecture while keeping data formats, volumes, governance, and security in mind.
  • Simplify ELT and processing jobs by categorizing the jobs as rehosted, rewritten, and retired. Finalize canonical data format, transformation, enrichment, compression, and storage format as Parquet.
  • Rehost machine learning (ML) jobs that were critical for business.
  • Work backward to achieve our goals, and clear roadblocks and alter decisions to move forward.
  • Use serverless options as a first option and pay per use. Assess the cost and effort for rearchitecting to select the right approach. Execute a proof of concept to validate this for each component and service.

Strategies applied to succeed in this migration:

  • Team – We created a unified team with people from data engineering, analytics, and data science as part of the analytics migration project. Site reliability engineering (SRE) and application teams were involved when critical decisions were needed regarding data or timeline for alignment. The analytics, data engineering, and data science teams spent considerable time planning, understanding the code, and iteratively looking at the existing data sources, data pipelines, and processing jobs. AWS team with partner team from Minfy Technologies helped BMS arrive at a migration plan after a proof of concept for each of the components in data ingestion, data processing, data warehouse, ML, and analytics dashboards.
  • Workshops – The AWS team conducted a series of workshops and immersion days, and coached the BMS team on the technology and best practices to deploy the analytics services. The AWS team helped BMS explore the configuration and benefits of the migration approach for each scenario (data migration, data pipeline, data processing, visualization, and machine learning) via proof-of-concepts (POCs). The team captured the changes required in the existing code for migration. BMS team also got acquainted with the following AWS services:
  • Proof of concept – The BMS team, with help from the partner and AWS team, implemented multiple proofs of concept to validate the migration approach:
    • Performed batch processing of Spark jobs in Amazon EMR, in which we checked the runtime, required code changes, and cost.
    • Ran clickstream analysis jobs in Amazon EMR, testing the end-to-end pipeline. Team conducted proofs of concept on AWS IoT Core for MQTT protocol and streaming to Amazon S3.
    • Migrated ML models to Amazon SageMaker and orchestrated with Amazon MWAA.
    • Created sample QuickSight reports and dashboards, in which features and time to build were assessed.
    • Configured for key scenarios for Amazon Redshift, in which time for loading data, query performance, and cost were assessed.
  • Effort vs. cost analysis – Team performed the following assessments:
    • Compared the ingestion pipelines, the difference in data structure in each store, the basis of the current business need for the data source, the activity for preprocessing the data before migration, data migration to Amazon S3, and change data capture (CDC) from the migrated applications in AWS.
    • Assessed the effort to migrate approximately 200 jobs, determined which jobs were redundant or need improvement from a functional perspective, and completed a migration list for the target state. The modernization of the MQTT workflow code to serverless was time-consuming, decided to rehost on Amazon Elastic Compute Cloud (Amazon EC2) and modernization to Amazon Kinesis in to the next phase.
    • Reviewed over 400 reports and dashboards, prioritized development in phases, and reassessed business user needs.

AWS cloud services chosen for proposed architecture:

  • Data lake – We used Amazon S3 as the data lake to store the single truth of information for all raw and processed data, thereby reducing the copies of data storage and storage costs.
  • Ingestion – Because we had multiple sources of truth in the current architecture, we arrived at a common structure before migration to Amazon S3, and existing pipelines were modified to do preprocessing. These one-time preprocessing jobs were run in Cloudera, because the source data was on-premises, and on Amazon EMR for data in the cloud. We designed new data pipelines for ingestion from transactional systems on the AWS cloud using AWS Glue ETL.
  • Processing – Processing jobs were segregated based on runtime into two categories: batch and near-real time. Batch processes were further divided into transient Amazon EMR clusters with varying runtimes and Hadoop application requirements like HBase. Near-real-time jobs were provisioned in an Amazon EMR permanent cluster for clickstream analytics, and a data pipeline from transactional systems. We adopted a serverless approach using AWS Glue ETL for new data pipelines from transactional systems on the AWS cloud.
  • Data warehouse – We chose Amazon Redshift as our data warehouse, and planned on how the data would be distributed based on query patterns.
  • Visualization – We built the reports in Amazon QuickSight in phases and prioritized them based on business demand. We discussed with business users their current needs and identified the immediate reports required. We defined the phases of report and dashboard creation and built the reports in Amazon QuickSight. We plan to use embedded reports for external users in the future.
  • Machine learning – Custom ML models were deployed on Amazon SageMaker. Existing Airflow DAGs were migrated to Amazon MWAA.
  • Governance, security, and compliance – Governance with Amazon Lake Formation was adopted from Day 1. We configured the AWS Glue Data Catalog to reference data used as sources and targets. We had to comply to Payment Card Industry (PCI) guidelines because payment information was in the data lake, so we ensured the necessary security policies.

Solution overview

BMS modern data architecture

The following diagram illustrates our modern data architecture.

The architecture includes the following components:

  1. Source systems – These include the following:
    • Data from transactional systems stored in MariaDB (booking and transactions).
    • User interaction clickstream data via Kafka consumers to DataOps MariaDB.
    • Members and seat allocation information from MongoDB.
    • SQL Server for specific offers and payment information.
  2. Data pipeline – Spark jobs on an Amazon EMR permanent cluster process the clickstream data from Kafka clusters.
  3. Data lake – Data from source systems was stored in their respective Amazon S3 buckets, with prefixes for optimized data querying. For Amazon S3, we followed a hierarchy to store raw, summarized, and team or service-related data in different parent folders as per the source and type of data. Lifecycle polices were added to logs and temp folders of different services as per teams’ requirements.
  4. Data processing – Transient Amazon EMR clusters are used for processing data into a curated format for the audience, personalization, and analytics teams. Small file merger jobs merge the clickstream data to a larger file size, which saved costs for one-time queries.
  5. Governance – AWS Lake Formation enables the usage of AWS Glue crawlers to capture the schema of data stored in the data lake and version changes in the schema. The Data Catalog and security policy in AWS Lake Formation enable access to data for roles and users in Amazon Redshift, Amazon Athena, Amazon QuickSight, and data science jobs. AWS Glue ETL jobs load the processed data to Amazon Redshift at scheduled intervals.
  6. Queries – The analytics team used Amazon Athena to perform one-time queries raised from business teams on the data lake. Because report development is in phases, Amazon Athena was used for exporting data.
  7. Data warehouse – Amazon Redshift was used as the data warehouse, where the reports for the sales teams, management, and third parties (i.e., theaters and events) are processed and stored for quick retrieval. Views to analyze the total sales, movie sale trends, member behavior, and payment modes are configured here. We use materialized views for denormalized tables, different schemas for metadata, and transactional and behavior data.
  8. Reports – We used Amazon QuickSight reports for various business, marketing, and product use cases.
  9. Machine learning – Some of the models deployed on Amazon SageMaker are as follows:
    • Content popularity – Decides the recommended content for users.
    • Live event popularity – Calculates the popularity of live entertainment events in different regions.
    • Trending searches – Identifies trending searches across regions.


Migration execution steps

We standardized tools, services, and processes for data engineering, analytics, and data science:

  • Data lake
    • Identified the source data to be migrated from Archival DB, BigQuery, TiDB, and the analytics database.
    • Built a canonical data model that catered to multiple business teams and reduced the copies of data, and therefore storage and operational costs. Modified existing jobs to facilitate migration to a canonical format.
    • Identified the source systems, capacity required, anticipated growth, owners, and access requirements.
    • Ran the bulk data migration to Amazon S3 from various sources.
  • Ingestion
    • Transaction systems – Retained the existing Kafka queues and consumers.
    • Clickstream data – Successfully conducted a proof of concept to use AWS IoT Core for MQTT protocol. But because we needed to make changes in the application to publish to AWS IoT Core, we decided to implement it as part of mobile application modernization at a later time. We decided to rehost the MQTT server on Amazon EC2.
  • Processing
  • Listed the data pipelines relevant to business and migrated them with minimal modification.
  • Categorized workloads into critical jobs, redundant jobs, or jobs that can be optimized:
    • Spark jobs were migrated to Amazon EMR.
    • HBase jobs were migrated to Amazon EMR with HBase.
    • Metadata stored in Hive-based jobs were modified to use the AWS Glue Data Catalog.
    • NiFi jobs were simplified and rewritten in Spark run in Amazon EMR.
  • Amazon EMR clusters were configured one persistent cluster for streaming the clickstream and personalization workloads. We used multiple transient clusters for running all other Spark ETL or processing jobs. We used Spot Instances for task nodes to save costs. We optimized data storage with specific jobs to merge small files and compressed file format conversions.
  • AWS Glue crawlers identified new data in Amazon S3. AWS Glue ETL jobs transformed and uploaded processed data to the Amazon Redshift data warehouse.
  • Datawarehouse
    • Defined the data warehouse schema by categorizing the critical reports required by the business, keeping in mind the workload and reports required in future.
    • Defined the staging area for incremental data loaded into Amazon Redshift, materialized views, and tuning the queries based on usage. The transaction and primary metadata are stored in Amazon Redshift to cater to all data analysis and reporting requirements. We created materialized views and denormalized tables in Amazon Redshift to use as data sources for Amazon QuickSight dashboards and segmentation jobs, respectively.
    • Optimally used the Amazon Redshift cluster by loading last two years data in Amazon Redshift, and used Amazon Redshift Spectrum to query historical data through external tables. This helped balance the usage and cost of the Amazon Redshift cluster.
  • Visualization
    • Amazon QuickSight dashboards were created for the sales and marketing team in Phase 1:
      • Sales summary report – An executive summary dashboard to get an overview of sales across the country by region, city, movie, theatre, genre, and more.
      • Live entertainment – A dedicated report for live entertainment vertical events.
      • Coupons – A report for coupons purchased and redeemed.
      • BookASmile – A dashboard to analyze the data for BookASmile, a charity initiative.
  • Machine learning
    • Listed the ML workloads to be migrated based on current business needs.
    • Priority ML processing jobs were deployed on Amazon EMR. Models were modified to use Amazon S3 as source and target, and new APIs were exposed to use the functionality. ML models were deployed on Amazon SageMaker for movies, live event clickstream analysis, and personalization.
    • Existing artifacts in Airflow orchestration were migrated to Amazon MWAA.
  • Security
    • AWS Lake Formation was the foundation of the data lake, with the AWS Glue Data Catalog as the foundation for the central catalog for the data stored in Amazon S3. This provided access to the data by various functionalities, including the audience, personalization, analytics, and data science teams.
    • Personally identifiable information (PII) and payment data was stored in the data lake and data warehouse, so we had to comply to PCI guidelines. Encryption of data at rest and in transit was considered and configured in each service level (Amazon S3, AWS Glue Data Catalog, Amazon EMR, AWS Glue, Amazon Redshift, and QuickSight). Clear roles, responsibilities, and access permissions for different user groups and privileges were listed and configured in AWS Identity and Access Management (IAM) and individual services.
    • Existing single sign-on (SSO) integration with Microsoft Active Directory was used for Amazon QuickSight user access.
  • Automation
    • We used AWS CloudFormation for the creation and modification of all the core and analytics services.
    • AWS Step Functions was used to orchestrate Spark jobs on Amazon EMR.
    • Scheduled jobs were configured in AWS Glue for uploading data in Amazon Redshift based on business needs.
    • Monitoring of the analytics services was done using Amazon CloudWatch metrics, and right-sizing of instances and configuration was achieved. Spark job performance on Amazon EMR was analyzed using the native Spark logs and Spark user interface (UI).
    • Lifecycle policies were applied to the data lake to optimize the data storage costs over time.

Benefits of a modern data architecture

A modern data architecture offered us the following benefits:

  • Scalability – We moved from a fixed infrastructure to the minimal infrastructure required, with configuration to scale on demand. Services like Amazon EMR and Amazon Redshift enable us to do this with just a few clicks.
  • Agility – We use purpose-built managed services instead of reinventing the wheel. Automation and monitoring were key considerations, which enable us to make changes quickly.
  • Serverless – Adoption of serverless services like Amazon S3, AWS Glue, Amazon Athena, AWS Step Functions, and AWS Lambda support us when our business has sudden spikes with new movies or events launched.
  • Cost savings – Our storage size was reduced by 90%. Our overall spend on analytics and ML was reduced by 80%.


In this post, we showed you how a modern data architecture on AWS helped BMS to easily share data across organizational boundaries. This allowed BMS to make decisions with speed and agility at scale; ensure compliance via unified data access, security, and governance; and to scale systems at a low cost without compromising performance. Working with the AWS and Minfy Technologies teams helped BMS choose the correct technology services and complete the migration in four months. BMS achieved the scalability and cost-optimization goals with this updated architecture, which has set the stage for innovation using graph databases and enhanced our ML projects to improve customer experience.

About the Authors

Mahesh Vandi Chalil is Chief Technology Officer at BookMyShow, India’s leading entertainment destination. Mahesh has over two decades of global experience, passionate about building scalable products that delight customers while keeping innovation as the top goal motivating his team to constantly aspire for these. Mahesh invests his energies in creating and nurturing the next generation of technology leaders and entrepreneurs, both within the organization and outside of it. A proud husband and father of two daughters and plays cricket during his leisure time.

Priya Jathar is a Solutions Architect working in Digital Native Business segment at AWS. She has more two decades of IT experience, with expertise in Application Development, Database, and Analytics. She is a builder who enjoys innovating with new technologies to achieve business goals. Currently helping customers Migrate, Modernise, and Innovate in Cloud. In her free time she likes to paint, and hone her gardening and cooking skills.

Vatsal Shah is a Senior Solutions Architect at AWS based out of Mumbai, India. He has more than nine years of industry experience, including leadership roles in product engineering, SRE, and cloud architecture. He currently focuses on enabling large startups to streamline their cloud operations and help them scale on the cloud. He also specializes in AI and Machine Learning use cases.

Team Collaboration with Amazon CodeCatalyst

Post Syndicated from Brian Beach original https://aws.amazon.com/blogs/devops/team-collaboration-with-amazon-codecatalyst/

Amazon CodeCatalyst enables teams to collaborate on features, tasks, bugs, and any other work involved when building software. CodeCatalyst was announced at re:Invent 2022 and is currently in preview.


In a prior post in this series, Using Workflows to Build, Test, and Deploy with Amazon CodeCatalyst, I discussed reading The Unicorn Project, by Gene Kim, and how the main character, Maxine, struggles with a complicated software development lifecycle (SLDC) after joining a new team. Some of the challenges she encounters include:

  • Continually delivering high-quality updates is complicated and slow
  • Collaborating efficiently with others is challenging
  • Managing application environments is increasingly complex
  • Setting up a new project is a time-consuming chore

In this post, I will focus on the second bullet, and how CodeCatalyst helps you collaborate from anywhere with anyone.


If you would like to follow along with this walkthrough, you will need to:


Similar to the prior post, I am going to use the Modern Three-tier Web Application blueprint in this walkthrough. A CodeCatalyst blueprint provides a template for a new project. If you would like to follow along, you can launch the blueprint as described in Creating a project in Amazon CodeCatalyst.  This will deploy the Mythical Mysfits sample application shown in the following image.

The Mythical Mysfits user interface showing header and three Mysfits

Figure 1. The Mythical Mysfits user interface showing header and three Mysfits

For this Walkthrough, let us assume that I need to make a simple change to the application. The legal department would like to add a footer that includes the text “© 2023 Demo Organization.” I will create an issue in CodeCatalyst to track this work and use CodeCatalyst to track the change throughout the entire Software Development Life Cycle (SDLC).

CodeCatalyst organizes projects into Spaces. A space represents your company, department, or group; and contains projects, members, and the associated cloud resources you create in CodeCatalyst. In this walkthrough, my Space currently includes two members, Brian Beach and Panna Shetty, as shown in the following screenshot.  Note that both users are administrators, but CodeCatalyst supports multiple roles. You can read more about roles in members of your space.

The space members configuration page showing two users

Figure 2. The space members configuration page showing two users

To begin, Brian creates a new issue to track the request from legal. He assigns the issue to Panna, but leaves it in the backlog for now. Note that CodeCatalyst supports multiple metadata fields to organize your work. This issue is not impacting users and is relatively simple to fix. Therefore, Brian has categorized it as low priority and estimated the effort as extra small (XS). Brian has also added a label, so all the requests from legal can be tracked together. Note that these metadata fields are customizable. You can read more in configuring issue settings.

Create issue dialog box with name, description and metadata

Figure 3. Create issue dialog box with name, description and metadata

CodeCatalyst supports rich markdown in the description field. You can read about this in Markdown tips and tricks. In the following screenshot, Brian types “@app.vue” which brings up an inline search for people, issues, and code to help Panna find the relevant bit of code that needs changing later.

Create issue dialog box with type-ahead overlay

Figure 4. Create issue dialog box with type-ahead overlay

When Panna is ready to begin work on the new feature, she moves the issue from the “Backlog“ to ”In progress.“ CodeCatalyst allows users to manage their work using a Kanban style board. Panna can simply drag-and-drop issues on the board to move the issue from one state to another. Given the small team, Brian and Panna use a single board. However, CodeCatalyst allows you to create multiple views filtered by the metadata fields discussed earlier. For example, you might create a label called Sprint-001, and use that to create a board for the sprint.

Kanban board showing to do, in progress and in review columns

Figure 5. Kanban board showing to do, in progress and in review columns

Panna creates a new branch for the change called feature_add_copyright and uses the link in the issue description to navigate to the source code repository. This change is so simple that she decides to edit the file in the browser and commits the change. Note that for more complex changes, CodeCatalyst supports Dev Environments. The next post in this series will be dedicated to Dev Environments. For now, you just need to know that a Dev Environment is a cloud-based development environment that you can use to quickly work on the code stored in the source repositories of your project.

Editor with new lines highlighted

Figure 6. Editor with new lines highlighted

Panna also creates a pull request to merge the feature branch in to the main branch. She identifies Brian as a required reviewer. Panna then moves the issue to the “In review” column on the Kanban board so the rest of the team can track the progress. Once Brian reviews the change, he approves and merges the pull request.

Pull request details with title, description, and reviewed assigned

Figure 7. Pull request details with title, description, and reviewed assigned

When the pull request is merged, a workflow is configured to run automatically on code changes to build, test, and deploy the change. Note that Workflows were covered in the prior post in this series. Once the workflow is complete, Panna is notified in the team’s Slack channel. You can read more about notifications in working with notifications in CodeCatalyst. She verifies the change in production and moves the issue to the done column on the Kanban board.

Kanban board showing in progress, in review, and done columns

Figure 8. Kanban board showing in progress, in review, and done columns

Once the deployment completes, you will see the footer added at the bottom of the page.

Figure 9. The Mythical Mysfits user interface showing footer and three Mysfits

At this point the issue is complete and you have seen how this small team collaborated to progress through the entire software development lifecycle (SDLC).


If you have been following along with this workflow, you should delete the resources you deployed so you do not continue to incur charges. First, delete the two stacks that CDK deployed using the AWS CloudFormation console in the AWS account you associated when you launched the blueprint. These stacks will have names like mysfitsXXXXXWebStack and mysfitsXXXXXAppStack. Second, delete the project from CodeCatalyst by navigating to Project settings and choosing Delete project.


In this post, you learned how CodeCatalyst can help you rapidly collaborate with other developers. I used issues to track feature and bugs, assigned code reviews, and managed pull requests. In future posts I will continue to discuss how CodeCatalyst can address the rest of the challenges Maxine encountered in The Unicorn Project.

About the authors:

Brian Beach

Brian Beach has over 20 years of experience as a Developer and Architect. He is currently a Principal Solutions Architect at Amazon Web Services. He holds a Computer Engineering degree from NYU Poly and an MBA from Rutgers Business School. He is the author of “Pro PowerShell for Amazon Web Services” from Apress. He is a regular author and has spoken at numerous events. Brian lives in North Carolina with his wife and three kids.

Panna Shetty

Panna Shetty is a Sr. Solutions Architect with Amazon Web Services (AWS), working with public sector customers. She enjoys helping customers architect and build scalable and
reliable modern applications using cloud-native technologies.

Secure CDK deployments with IAM permission boundaries

Post Syndicated from Brian Farnhill original https://aws.amazon.com/blogs/devops/secure-cdk-deployments-with-iam-permission-boundaries/

The AWS Cloud Development Kit (CDK) accelerates cloud development by allowing developers to use common programming languages when modelling their applications. To take advantage of this speed, developers need to operate in an environment where permissions and security controls don’t slow things down, and in a tightly controlled environment this is not always the case. Of particular concern is the scenario where a developer has permission to create AWS Identity and Access Management (IAM) entities (such as users or roles), as these could have permissions beyond that of the developer who created them, allowing for an escalation of privileges. This approach is typically controlled through the use of permission boundaries for IAM entities, and in this post you will learn how these boundaries can now be applied more effectively to CDK development – allowing developers to stay secure and move fast.

Time to read 10 minutes
Learning level Advanced (300)
Services used

AWS Cloud Development Kit (CDK)

AWS Identity and Access Management (IAM)

Applying custom permission boundaries to CDK deployments

When the CDK deploys a solution, it assumes a AWS CloudFormation execution role to perform operations on the user’s behalf. This role is created during the bootstrapping phase by the AWS CDK Command Line Interface (CLI). This role should be configured to represent the maximum set of actions that CloudFormation can perform on the developers behalf, while not compromising any compliance or security goals of the organisation. This can become complicated when developers need to create IAM entities (such as IAM users or roles) and assign permissions to them, as those permissions could be escalated beyond their existing access levels. Taking away the ability to create these entities is one way to solve the problem. However, doing this would be a significant impediment to developers, as they would have to ask an administrator to create them every time. This is made more challenging when you consider that security conscious practices will create individual IAM roles for every individual use case, such as each AWS Lambda Function in a stack. Rather than taking this approach, IAM permission boundaries can help in two ways – first, by ensuring that all actions are within the overlap of the users permissions and the boundary, and second by ensuring that any IAM entities that are created also have the same boundary applied. This blocks the path to privilege escalation without restricting the developer’s ability to create IAM identities. With the latest version of the AWS CLI these boundaries can be applied to the execution role automatically when running the bootstrap command, as well as being added to IAM entities that are created in a CDK stack.

To use a permission boundary in the CDK, first create an IAM policy that will act as the boundary. This should define the maximum set of actions that the CDK application will be able to perform on the developer’s behalf, both during deployment and operation. This step would usually be performed by an administrator who is responsible for the security of the account, ensuring that the appropriate boundaries and controls are enforced. Once created, the name of this policy is provided to the bootstrap command. In the example below, an IAM policy called “developer-policy” is used to demonstrate the command.
cdk bootstrap –custom-permissions-boundary developer-policy
Once this command runs, a new bootstrap stack will be created (or an existing stack will be updated) so that the execution role has this boundary applied to it. Next, you can ensure that any IAM entities that are created will have the same boundaries applied to them. This is done by either using a CDK context variable, or the permissionBoundary attribute on those resources. To explain this in some detail, let’s use a real world scenario and step through an example that shows how this feature can be used to restrict developers from using the AWS Config service.

Installing or upgrading the AWS CDK CLI

Before beginning, ensure that you have the latest version of the AWS CDK CLI tool installed. Follow the instructions in the documentation to complete this. You will need version 2.54.0 or higher to make use of this new feature. To check the version you have installed, run the following command.

cdk --version

Creating the policy

First, let’s begin by creating a new IAM policy. Below is a CloudFormation template that creates a permission policy for use in this example. In this case the AWS CLI can deploy it directly, but this could also be done at scale through a mechanism such as CloudFormation Stack Sets. This template has the following policy statements:

  1. Allow all actions by default – this allows you to deny the specific actions that you choose. You should carefully consider your approach to allow/deny actions when creating your own policies though.
  2. Deny the creation of users or roles unless the “developer-policy” permission boundary is used. Additionally limit the attachment of permissions boundaries on existing entities to only allow “developer-policy” to be used. This prevents the creation or change of an entity that can escalate outside of the policy.
  3. Deny the ability to change the policy itself so that a developer can’t modify the boundary they will operate within.
  4. Deny the ability to remove the boundary from any user or role
  5. Deny any actions against the AWS Config service

Here items 2, 3 and 4 all ensure that the permission boundary works correctly – they are controls that prevent the boundary being removed, tampered with, or bypassed. The real focus of this policy in terms of the example are items 1 and 5 – where you allow everything, except the specific actions that are denied (creating a deny list of actions, rather than an allow list approach).

    Type: AWS::IAM::ManagedPolicy
          # ----- Begin base policy ---------------
          # If permission boundaries do not have an explicit allow
          # then the effect is deny
          - Sid: ExplicitAllowAll
            Action: "*"
            Effect: Allow
            Resource: "*"
          # Default permissions to prevent privilege escalation
          - Sid: DenyAccessIfRequiredPermBoundaryIsNotBeingApplied
              - iam:CreateUser
              - iam:CreateRole
              - iam:PutRolePermissionsBoundary
              - iam:PutUserPermissionsBoundary
                  Fn::Sub: arn:${AWS::Partition}:iam::${AWS::AccountId}:policy/developer-policy
            Effect: Deny
            Resource: "*"
          - Sid: DenyPermBoundaryIAMPolicyAlteration
              - iam:CreatePolicyVersion
              - iam:DeletePolicy
              - iam:DeletePolicyVersion
              - iam:SetDefaultPolicyVersion
            Effect: Deny
              Fn::Sub: arn:${AWS::Partition}:iam::${AWS::AccountId}:policy/developer-policy
          - Sid: DenyRemovalOfPermBoundaryFromAnyUserOrRole
              - iam:DeleteUserPermissionsBoundary
              - iam:DeleteRolePermissionsBoundary
            Effect: Deny
            Resource: "*"
          # ----- End base policy ---------------
          # -- Begin Custom Organization Policy --
          - Sid: DenyModifyingOrgCloudTrails
            Effect: Deny
            Action: config:*
            Resource: "*"
          # -- End Custom Organization Policy --
        Version: "2012-10-17"
      Description: "Bootstrap Permission Boundary"
      ManagedPolicyName: developer-policy
      Path: /

Save the above locally as developer-policy.yaml and then you can deploy it with a CloudFormation command in the AWS CLI:

aws cloudformation create-stack --stack-name DeveloperPolicy \
        --template-body file://developer-policy.yaml \
        --capabilities CAPABILITY_NAMED_IAM

Creating a stack to test the policy

To begin, create a new CDK application that you will use to test and observe the behaviour of the permission boundary. Create a new directory with a TypeScript CDK application in it by executing these commands.

mkdir DevUsers && cd DevUsers
cdk init --language typescript

Once this is done, you should also make sure that your account has a CDK bootstrap stack deployed with the cdk bootstrap command – to start with, do not apply a permission boundary to it, you can add that later an observe how it changes the behaviour of your deployment. Because the bootstrap command is not using the --cloudformation-execution-policies argument, it will default to arn:aws:iam::aws:policy/AdministratorAccess which means that CloudFormation will have full access to the account until the boundary is applied.

cdk bootstrap

Once the command has run, create an AWS Config Rule in your application to be sure that this works without issue before the permission boundary is applied. Open the file lib/dev_users-stack.ts and edit its contents to reflect the sample below.

import * as cdk from 'aws-cdk-lib';
import { ManagedRule, ManagedRuleIdentifiers } from 'aws-cdk-lib/aws-config';
import { Construct } from "constructs";

export class DevUsersStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new ManagedRule(this, 'AccessKeysRotated', {
      configRuleName: 'access-keys-policy',
      identifier: ManagedRuleIdentifiers.ACCESS_KEYS_ROTATED,
      inputParameters: {
        maxAccessKeyAge: 60, // default is 90 days

Next you can deploy with the CDK CLI using the cdk deploy command, which will succeed (the output below has been truncated to show a summary of the important elements).

❯ cdk deploy
✨  Synthesis time: 3.05s
✅  DevUsersStack
✨  Deployment time: 23.17s

Stack ARN:

✨  Total time: 26.21s

Before you deploy the permission boundary, remove this stack again with the cdk destroy command.

❯ cdk destroy
Are you sure you want to delete: DevUsersStack (y/n)? y
DevUsersStack: destroying... [1/1]
✅ DevUsersStack: destroyed

Using a permission boundary with the CDK test application

Now apply the permission boundary that you created above and observe the impact it has on the same deployment. To update your booststrap with the permission boundary, re-run the cdk bootstrap command with the new custom-permissions-boundary parameter.

cdk bootstrap --custom-permissions-boundary developer-policy

After this command executes, the CloudFormation execution role will be updated to use that policy as a permission boundary, which based on the deny rule for config:* will cause this same application deployment to fail. Run cdk deploy again to confirm this and observe the error message.

❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack
named DevUsersStack failed creation, it may need to be manually deleted 
from the AWS console: 
    User: arn:aws:sts::123456789012:assumed-role/cdk-hnb659fds-cfn-exec-role-123456789012-ap-southeast-2/AWSCloudFormation
    is not authorized to perform: config:PutConfigRule on resource: access-keys-policy with an explicit deny in a
    permissions boundary

This shows you that the action was denied specifically due to the use of a permissions boundary, which is what was expected.

Applying permission boundaries to IAM entities automatically

Next let’s explore how the permission boundary can be extended to IAM entities that are created by a CDK application. The concern here is that a developer who is creating a new IAM entity could assign it more permissions than they have themselves – the permission boundary manages this by ensuring that entities can only be created that also have the boundary attached. You can validate this by modifying the stack to deploy a Lambda function that uses a role that doesn’t include the boundary. Open the file lib/dev_users-stack.ts again and edit its contents to reflect the sample below.

import * as cdk from 'aws-cdk-lib';
import { PolicyStatement } from "aws-cdk-lib/aws-iam";
import {
} from "aws-cdk-lib/custom-resources";
import { Construct } from "constructs";

export class DevUsersStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new AwsCustomResource(this, "Resource", {
      onUpdate: {
        service: "ConfigService",
        action: "putConfigRule",
        parameters: {
          ConfigRule: {
            ConfigRuleName: "SampleRule",
            Source: {
              Owner: "AWS",
              SourceIdentifier: "ACCESS_KEYS_ROTATED",
            InputParameters: '{"maxAccessKeyAge":"60"}',
        physicalResourceId: PhysicalResourceId.of("SampleConfigRule"),
      policy: AwsCustomResourcePolicy.fromStatements([
        new PolicyStatement({
          actions: ["config:*"],
          resources: ["*"],

Here the AwsCustomResource is used to provision a Lambda function that will attempt to create a new config rule. This is the same result as the previous stack but in this case the creation of the rule is done by a new IAM role that is created by the CDK construct for you. Attempting to deploy this will result in a failure – run cdk deploy to observe this.

❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named 
DevUsersStack failed creation, it may need to be manually deleted from the AWS 
    API: iam:CreateRole User: arn:aws:sts::123456789012:assumed-
    is not authorized to perform: iam:CreateRole on resource:
    with an explicit deny in a permissions boundary

The error message here details that the stack was unable to deploy because the call to iam:CreateRole failed because the boundary wasn’t applied. The CDK now offers a straightforward way to set a default permission boundary on all IAM entities that are created, via the CDK context variable core:permissionsBoundary in the cdk.json file.

  "context": {
     "@aws-cdk/core:permissionsBoundary": {
       "name": "developer-policy"

This approach is useful because now you can import constructs that create IAM entities (such as those found on Construct Hub or out of the box constructs that create default IAM roles) and have the boundary apply to them as well. There are alternative ways to achieve this, such as setting a boundary on specific roles, which can be used in scenarios where this approach does not fit. Make the change to your cdk.json file and run the CDK deploy again. This time the custom resource will attempt to create the config rule using its IAM role instead of the CloudFormation execution role. It is expected that the boundary will also protect this Lambda function in the same way – run cdk deploy again to confirm this. Note that the deployment updates from CloudFormation show that this time the role creation succeeds this time, and a new error message is generated.

❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named
DevUsersStack failed creation, it may need to be manually deleted from the AWS 
    Received response status [FAILED] from custom resource. Message returned: User:
    is not authorized to perform: config:PutConfigRule on resource: SampleRule with an explicit deny in a permissions boundary

In this error message you can see that the user it refers to is DevUsersStack-AWS679f53fac002430cb0da5b7982bd2287S-84VFVA7OGC9N rather than the CloudFormation execution role. This is the role being used by the custom Lambda function resource, and when it attempts to create the Config rule it is rejected because of the permissions boundary in the same way. Here you can see how the boundary is being applied consistently to all IAM entities that are created in your CDK app, which ensures the administrative controls can be applied consistently to everything a developer does with a minimal amount of overhead.


At this point you can either choose to remove the CDK bootstrap stack if you no longer require it, or remove the permission boundary from the stack. To remove it, delete the CDKToolkit stack from CloudFormation with this AWS CLI command.

aws cloudformation delete-stack --stack-name CDKToolkit

If you want to keep the bootstrap stack, you can remove the boundary by following these steps:

  1. Browse to the CloudFormation page in the AWS console, and select the CDKToolit stack.
  2. Select the ‘Update’ button. Choose “Use Current Template” and then press ‘Next’
  3. On the parameters page, find the value InputPermissionsBoundary which will have developer-policy as the value, and delete the text in this input to leave it blank. Press ‘Next’ and the on the following page, press ‘Next’ again
  4. On the final page, scroll to the bottom and check the box acknowledging that CloudFormation might create IAM resources with custom names, and choose ‘Submit’

With the permission boundary no longer being used, you can now remove the stack that created it as the final step.

aws cloudformation delete-stack --stack-name DeveloperPolicy


Now you can see how IAM permission boundaries can easily be integrated in to CDK development, helping ensure developers have the control they need while administrators can ensure that security is managed in a way that meets the needs of the organisation as well.

With this being understood, there are next steps you can take to further expand on the use of permission boundaries. The CDK Security and Safety Developer Guide document on GitHub outlines these approaches, as well as ways to think about your approach to permissions on deployment. It’s recommended that developers and administrators review this, and work to develop and appropriate approach to permission policies that suit your security goals.

Additionally, the permission boundary concept can be applied in a multi-account model where each Stage has a unique boundary name applied. This can allow for scenarios where a lower-level environment (such as a development or beta environment) has more relaxed permission boundaries that suit troubleshooting and other developer specific actions, but then the higher level environments (such as gamma or production) could have the more restricted permission boundaries to ensure that security risks are more appropriately managed. The mechanism for implement this is defined in the security and safety developer guide also.

About the authors:

Brian Farnhill

Brian Farnhill is a Software Development Engineer at AWS, helping public sector customers in APAC create impactful solutions running in the cloud. His background is in building solutions and helping customers improve DevOps tools and processes. When he isn’t working, you’ll find him either coding for fun or playing online games.

David Turnbull

David Turnbull is a Software Development Engineer at AWS, helping public sector customers in APAC create impactful solutions running in the cloud. He likes to comprehend new programming languages and has used this to stray out of his line. David writes computer simulations for fun.

Building a Cloud in the Cloud: Running Apache CloudStack on Amazon EC2, Part 2

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/building-a-cloud-in-the-cloud-running-apache-cloudstack-on-amazon-ec2-part-2/

This blog is written by Mark Rogers, SDE II – Customer Engineering AWS.

In part 1, I showed you how to run Apache CloudStack with KVM on a single Amazon Elastic Compute Cloud (Amazon EC2) instance. That simple setup is great for experimentation and light workloads. In this post, things will get a lot more interesting. I’ll show you how to create an overlay network in your Amazon Virtual Private Cloud (Amazon VPC) that allows CloudStack to scale horizontally across multiple EC2 instances. This same method could work with other hypervisors, too.

If you haven’t read it yet, then start with part 1. It explains why this network setup is necessary. The same prerequisites apply to both posts.

Making things easier

I wrote some scripts to automate the CloudStack installation and OS configuration on CentOS 7. You can customize them to meet your needs. I also wrote some AWS CloudFormation templates you can copy in order to create a demo environment. The README file has more details.

The scalable method

Our team started out using a single EC2 instance, as described in my last post. That worked at first, but it didn’t have the capacity we needed. We were limited to a couple of dozen VMs, but we needed hundreds. We also needed to scale up and down as our needs changed. This meant we needed the ability to add and remove CloudStack hosts. Using a Linux bridge as a virtual subnet was no longer adequate.

To support adding hosts, we need a subnet that spans multiple instances. The solution I found is Virtual Extensible LAN (VXLAN). It’s lightweight, easy to configure, and included in the Linux kernel. VXLAN creates a layer 2 overlay network that abstracts away the details of the underlying network. It allows machines in different parts of a network to communicate as if they’re all attached to the same simple network switch.

Another example of an overlay network is an Amazon VPC. It acts like a physical network, but it’s actually a layer on top of other networks. It’s networks all the way down. VXLAN provides a top layer where CloudStack can sit comfortably, handling all of your VM needs, blissfully unaware of the world below it.

An overlay network comes with some big advantages. The biggest improvement is that you can have multiple hosts, allowing for horizontal scaling. Having more hosts not only gives you more computing power, but also lets you do rolling maintenance. Instead of putting the database and file storage on the management server, I’ll show you how to use Amazon Elastic File System (Amazon EFS) and Amazon Relational Database Service (Amazon RDS) for scalable and reliable storage.

EC2 Instances

Let’s start with three Amazon EC2 instances. One will be a router between the overlay network and your Amazon VPC, the second one will be your CloudStack management server, and the third one will be your VM host. You’ll also need a way to connect to your instances, such as a bastion host or a VPN endpoint.

Three EC2 instances are connected to an AWS subnet. There's an overlay network that spans all three instances.The router instance connects the overlay network to the AWS subnet. The management instance contains the CloudStack management service, which is attached to the overlay network. The host instance contains the CloudStack agent and some VMs, all of which are connected to the overlay network.VXLAN must send and receive multicast traffic. Only Nitro instances can be multicast senders. As you plan, look at the list of Nitro instance types.

The router won’t need much computing power, but it will need enough network bandwidth to meet your needs. If you put Amazon EFS in the same subnet as your instances, then they’ll communicate with it directly, thereby reducing the load on the router. Decide how much network throughput you want, and then pick a suitable Nitro instance type.

After creating the router instance, configure AWS to use it as a router. Stop source/destination checking in the instance’s network settings. Then update the applicable AWS route tables to use the router as the target for the overlay network. The router’s security group needs to allow ingress to the CloudStack UI (TCP port 8080) and any services you plan to offer from VMs.

For the management server, you’ll want a Nitro instance type. It’s going to use more CPU than your router, so plan accordingly.

In addition to being a Nitro type, the host instance must also be a metal type. Metal instances have hardware virtualization support, which is needed by KVM. If you have a new AWS account with a low on-demand vCPU limit, then consider starting with an m5zn.metal, which has 48 vCPUs. Otherwise, I suggest going directly to a c5.metal because it provides 96 vCPUs for a similar price. There are bigger types available depending on your compute needs, budget, and vCPU limit. If your account’s on-demand vCPU limit is too low, then you can file a support ticket to have it raised.


All of the instances should be on a dedicated subnet. Sharing the subnet with other instances can cause communication issues. For an example, refer to the following figure. The subnet has an instance named TroubleMaker that’s not on the overlay network. If TroubleMaker sends a request to the management instance’s overlay network address, then here’s what happens:

  1. The request goes through the AWS subnet to the router.
  2. The router forwards the request via the overlay network.
  3. The CloudStack management instance has a connection to the same AWS subnet that TroubleMaker is on. Therefore, it responds directly instead of using the router. This isn’t the return path that AWS is expecting, so the response is dropped.

This diagram depicts the steps described in the previous paragraph.

If you move TroubleMaker to a different subnet, then the requests and responses will all go through the router. That will fix the communication issues.

The instances in the overlay network will use special interfaces that serve as VXLAN tunnel endpoints (VTEPs). The VTEPs must know how to contact each other via the underlay network. You could manually give each instance a list of all of the other instances, but that’s a maintenance nightmare. It’s better to let the VTEPs discover each other, which they can do using multicast. You can add multicast support using AWS Transit Gateway.

Here are the steps to make VXLAN multicasts work:

  1. Enable multicast support when you create the transit gateway.
  2. Attach the transit gateway to your subnet.
  3. Create a transit gateway multicast domain with IGMPv2 support enabled.
  4. Associate the multicast domain with your subnet.
  5. Configure the eth0 interface on each instance to use IGMPv2. The following sample code shows how to do this.
  6. Make sure that your instance security groups allow ingress for IGMP queries (protocol 2 traffic from and VXLAN traffic (UDP port 4789 from the other instances).

CloudStack VMs must connect to the same bridge as the VXLAN interface. As mentioned in the previous post, CloudStack cares about names. I recommend giving the interface a name starting with “eth”. Moreover, this naming convention tells CloudStack which bridge to use, thereby avoiding the need for a dummy interface like the one in the simple setup.

The following snippet shows how I configured the networking in CentOS 7. You must provide values for these variables:

  • $overlay_host_ip_address, $overlay_netmask, and $overlay_gateway_ip: Use values for the overlay network that you’re creating.
  • $dns_address: I recommend using the base of the VPC IPv4 network range, plus two. You shouldn’t use 169.654.169.253 because CloudStack reserves link-local addresses for its own use.
  • $multicast_address: The multicast address that you want VXLAN to use. Pick something in the multicast range that won’t conflict with anything else. I recommend choosing from the IPv4 local scope (
  • $interface_name: The name of the interface VXLAN should use to communicate with the physical network. This is typically eth0.

A couple of the steps are different for the router instance than for the other instances. Pay attention to the comments!

yum install -y bridge-utils net-tools

# IMPORTANT: Omit the GATEWAY setting on the router instance!
cat << EOF > /etc/sysconfig/network-scripts/ifcfg-cloudbr0

cat << EOF > /sbin/ifup-local
# Set up VXLAN once cloudbr0 is available.
if [[ \$1 == "cloudbr0" ]]
    ip link add ethvxlan0 type vxlan id 100 dstport 4789 group "$multicast_address" dev "$interface_name"
    brctl addif cloudbr0 ethvxlan0
    ip link set up dev ethvxlan0

chmod +x /sbin/ifup-local

# Transit Gateway requires IGMP version 2
echo "net.ipv4.conf.$interface_name.force_igmp_version=2" >> /etc/sysctl.conf
sysctl -p

# Enable IPv4 forwarding
# IMPORTANT: Only do this on the router instance!
echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
sysctl -p

# Restart the network service to make the changes take effect.
systemctl restart network


Let’s look at storage. Create an Amazon RDS database using the MySQL 8.0 engine, and set a master password for CloudStack’s database setup tool to use. Refer to the CloudStack documentation to find the MySQL settings that you’ll need. You can put the settings in an RDS parameter group. In case you’re wondering why I’m not using Amazon Aurora, it’s because CloudStack needs the MyISAM storage engine, which isn’t available in Aurora.

I recommend Amazon EFS for file storage. For efficiency, create a mount target in the subnet with your EC2 instances. That will enable them to communicate directly with the mount target, thereby bypassing the overlay network and router. Note that the system VMs will use Amazon EFS via the router.

If you want, you can consolidate your CloudStack file systems. Just create a single file system with directories for each zone and type of storage. For example, I use directories named /zone1/primary, /zone1/secondary, /zone2/primary, etc. You should also consider enabling provisioned throughput on the file system, or you may run out of bursting credits after booting a few VMs.

One consequence of the file system’s scalability is that the amount of free space (8 exabytes) will cause an integer overflow in CloudStack! To avoid this problem, reduce storage.overprovisioning.factor in CloudStack’s global settings from 2 to 1.

When your environment is ready, install CloudStack. When it asks for a default gateway, remember to use the router’s overlay network address. When you add a host, make sure that you use the host’s overlay network IP address.


If you used my CloudFormation template, delete the stack and remove any route table entries you added.

If you didn’t use CloudFormation, here are the things to delete:

  1. The CloudStack EC2 instances
  2. The Amazon RDS database and parameter group
  3. The Amazon EFS file system
  4. The transit gateway multicast domain subnet association
  5. The transit gateway multicast domain
  6. The transit gateway VPC attachment
  7. The transit gateway
  8. The route table entries that you created
  9. The security groups that you created for the instances, database, and file system


The approach I shared has many steps, but they’re not bad when you have a plan. Whether you need a simple setup for experiments, or you need a scalable environment for a data center migration, you now have a path forward. Give it a try, and comment here about the things you learned. I hope you find it useful and fun!

“Apache”, “Apache CloudStack”, and “CloudStack” are trademarks of the Apache Software Foundation.

Building a Cloud in the Cloud: Running Apache CloudStack on Amazon EC2, Part 1

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/building-a-cloud-in-the-cloud-running-apache-cloudstack-on-amazon-ec2-part-1/

This blog is written by Mark Rogers, SDE II – Customer Engineering AWS.

How do you put a cloud inside another cloud? Some features that make Amazon Elastic Compute Cloud (Amazon EC2) secure and wonderful also make running CloudStack difficult. The biggest obstacle is that AWS and CloudStack both want to manage network resources. Therefore, we must keep them out of each other’s way. This requires some steps that aren’t obvious, and it took a long time to figure out. I’m going to share what I learned, so that you can navigate the process more easily.

Apache CloudStack is an open-source platform for deploying and managing virtual machines (VMs) and the associated network and storage infrastructure. You would normally run it on your own hardware to create your own cloud. But there can be advantages to running it inside of an Amazon Virtual Private Cloud (Amazon VPC), including how it could help you migrate out of a data center. It’s a great way to create disposable environments for experiments or training. Furthermore, it’s a convenient way to test-drive the new CloudStack support in Amazon Elastic Kubernetes Service (Amazon EKS) Anywhere. In my case, I needed to create development and test environments for a project that uses the CloudStack API. The environments needed to be shared and scalable. Our build pipelines were already in AWS, so it made sense to put the new environments there, too.

CloudStack can work with a number of hypervisors. The instructions in this article will use Kernel-based Virtual Machine (KVM) on Linux. KVM will manage the VMs at a low level, and CloudStack will manage KVM.


Most of the information in this article should be applicable to a range of CloudStack versions. I targeted CloudStack 4.14 on CentOS 7. I also tested CloudStack versions 4.16 and 4.17, and I recommend them.

The official CentOS 7 x86_64 HVM image works well. If you use a different Linux flavor or version, then you might have to modify some of the implementation details.

You’ll need to know the basics of CloudStack. The scope of this article is making CloudStack and AWS coexist peacefully. Once CloudStack is running, I’m assuming that you’ll handle things from there.  Refer to the AWS documentation and CloudStack documentation for information on security and other best practices.

Making things easier

I wrote some scripts to automate the installation. You can run them on EC2 instances with CentOS 7, and they’ll do all the installation and OS configuration for you. You can use them as they are, or customize them to meet your needs. I also wrote some AWS CloudFormation templates you can copy in order to create a demo environment. The README file has more details.

Amazon EC2 instance types

KVM requires hardware virtualization support. Most EC2 instances are VMs that don’t support nested virtualization. To get access to the bare hardware, you need a metal instance type.

I like c5.metal because it’s one of the least expensive metal types, and has a low cost per vCPU. It has 96 vCPUs and 192 GiB of memory. If you run 20 VMs on it, with 4 CPU cores and 8 GiB of memory each, then you’d still have 16 vCPUs and 32 GiB to share between the operating system, CloudStack, and MySQL. Using CloudStack’s overprovisioning feature, you could fit even more VMs if they’re running light loads.


The biggest challenge is the network. AWS knows which IP and MAC addresses should exist, and it knows the machines to which they should belong. It blocks any traffic that doesn’t fit its idea of how the network should behave. Simultaneously, CloudStack assumes that any IP or MAC address it invents should work just fine. When CloudStack assigns addresses to VMs on an AWS subnet, their network traffic gets blocked.

You could get around this by enabling network address translation (NAT) on the instance running CloudStack. That’s a great solution if it fits your needs, but it makes it hard for other machines in your Amazon VPC to contact your VMs. I recommend a different approach.

Although AWS restricts what you can do with its layer 2 network, it’s perfectly happy to let you run your own layer 3 router. Your EC2 instance can act as a router to a new virtual subnet that’s outside of the jurisdiction of AWS. The instance integrates with AWS just like a VPN appliance, routing traffic to wherever it needs to go. CloudStack can do whatever it wants in the virtual subnet, and everybody’s happy.

What do I mean by a virtual subnet? This is a subnet that exists only inside the EC2 instance.  It consists of logical network interfaces attached to a Linux bridge. The entire subnet exists inside a single EC2 instance. It doesn’t scale well, but it’s simple. In my next post, I’ll cover a more complicated setup with an overlay network that spans multiple instances to allow horizontal scaling.

The simple way

The simple way is to put everything in one EC2 instance, including the database, file storage, and a virtual subnet. Because everything’s stored locally, allocate enough disk space for your needs. 500 GB will be enough to support a few basic VMs. Create or select a security group for your instance that gives users access to the CloudStack UI (TCP port 8080). The security group should also allow access to any services that you’ll offer from your VMs.

EC2 instance summary info showing 1 instance, CentOS 7 (x86_64) AMI, c5.metal instance type, a security group name, and a 500 GiB volume

When you have your instance, configure AWS to treat it as a router.

  1. Go to Amazon EC2 in the AWS Management Console.
  2. Select your instance, and stop source/destination checking.

In the EC2 Actions menu, select Networking, then Change source/destination check.

3. Update the subnet route tables.

a. Go to the VPC settings, and select Route Tables.

b. Identify the tables for subnets that need CloudStack access.

c. In each of these tables, add a route to the new virtual subnet. The route target should be your EC2 instance.

4. Depending on your network needs, you may also need to add routes to transit gateways, VPN endpoints, etc.

Because everything will be on one server, creating a virtual subnet is simply a matter of creating a Linux bridge. CloudStack must find a network adapter attached to the bridge. Therefore, add a dummy interface with a name that CloudStack will recognize.

A single EC2 instance contains the CloudStack management service, the CloudStack agent, a dummy network interface, several virtual machines, and a router. All of those things are connected to each other by a virtual subnet that exists inside the instance. The instance's elastic network interface is connected between the router and the Amazon VPC.The following snippet shows how I configure networking in CentOS 7. You must provide values for the variables $virutal_host_ip_address and $virtual_netmask to reflect the virtual subnet that you want to create. For $dns_address, I recommend the base of the VPC IPv4 network range, plus two. You shouldn’t use 169.654.169.253 because CloudStack reserves link-local addresses for its own use.

yum install -y bridge-utils net-tools

# The bridge must be named cloudbr0.

cat << EOF > /etc/sysconfig/network-scripts/ifcfg-cloudbr0

# Create a dummy network interface.
cat << EOF > /etc/sysconfig/modules/dummy.modules
/sbin/modprobe dummy numdummies=1
/sbin/ip link set name ethdummy0 dev dummy0

chmod +x /etc/sysconfig/modules/dummy.modules

cat << EOF > /etc/sysconfig/network-scripts/ifcfg-ethdummy0

# Turn the instance into a router

echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
sysctl -p

# Must kill dhclient or the network service won't restart properly.
# A reboot would also work, if you’d rather do that.

pkill dhclient
systemctl restart network

CloudStack must know which IP addresses to use for inter-service communication. It will select by resolving the machine’s fully qualified domain name (FQDN) to an address. The following commands will make it to choose the right one. You must provide a value for $virtual_host_ip_address.

hostnamectl set-hostname cloudstack.localdomain

echo "$virtual_host_ip_address cloudstack.localdomain" >> 

You can finish the setup by following the Quick Installation Guide.

Remember that CloudStack is only directly connected to your virtual network. The EC2 instance is the router that connects the virtual subnet to the Amazon VPC. When you’re configuring CloudStack, use your instance’s virtual subnet address as the default gateway.

Use the EC2 instance's virtual subnet IP address as the default gateway in CloudStack. In this example, the virtual subnet is, and the instance's address in that subnet is CloudStack then uses as the default gateway.

To access CloudStack from your workstation, you’ll need a connection to your VPC. This can be through a client VPN or a bastion host. If you use a bastion, its subnet needs a route to your virtual subnet, and you’ll need an SSH tunnel for your browser to access the CloudStack UI. The UI is at http://x.x.x.x:8080/client/, where x.x.x.x is your CloudStack instance’s virtual subnet address. Note that CloudStack’s console viewer won’t work if you’re using an SSH tunnel.

If you’re just experimenting with CloudStack, then I suggest saving money by stopping your instance when it isn’t needed. The safe way to do that is:

  1. Disable your zone in the CloudStack UI.
  2. Put the primary storage into maintenance mode.
  3. Wait for the switch to maintenance mode to be complete.
  4. Stop the EC2 instance.

When you’re ready to turn everything back on, simply reverse those steps. If you have any virtual routers in CloudStack, then you may need to start those, too.


If you used my CloudFormation template, then delete the stack and remove any route table entries you added. If you didn’t use CloudFormation, then terminate the EC2 instance, delete the security group you created for it, and remove any route table entries that you added.


Getting CloudStack to run on AWS isn’t so bad. The hardest part is simply knowing how. The setup explained here is great for small installations, but it can only scale vertically. In my next post, I’ll show you how to create an installation that scales horizontally. Instead of using a virtual subnet that exists in a single EC2 instance, we’ll build an overlay network that spans multiple instances. It will use more components and features, including some that might be new to you. I hope you find it interesting!

Now that you can create a simple setup, give it a try! I hope you have fun and learn something new along the way. Comment with the results of your experiments.

“Apache”, “Apache CloudStack”, and “CloudStack” are trademarks of the Apache Software Foundation.

How Contino improved collaboration with Amazon CodeCatalyst

Post Syndicated from Chetan Makvana original https://aws.amazon.com/blogs/devops/how-contino-improved-collaboration-with-amazon-codecatalyst/

Amazon CodeCatalyst is a modern software development service that empowers teams to deliver software on AWS easily and quickly. CodeCatalyst provides one place where you can plan, code, and build, test, and deploy applications with continuous integration/continuous delivery (CI/CD) tools. It also helps streamlined team collaboration. Developers on modern software teams are usually distributed, work independently, and use disparate tools. Often, ad hoc collaboration is necessary to resolve problems. Today, developers are forced to do this across many tools, which distract developers from their primary task—adding business critical features and enhancing their quality and completeness.

In this post, we explain how Contino uses CodeCatalyst to on-board their engineering team onto new projects, eliminates the overhead of managing disparate tools, and streamlines collaboration among different stakeholders.

The Problem

Contino helps customers migrate their applications to the cloud, and then improves their architecture by taking full advantage of cloud-native features to improve agility, performance, and scalability. This usually involves the build out of a central landing zone platform. A landing zone is a set of standard building blocks that allows customers to automatically create accounts, infrastructure and environments that are pre-configured in line with security policies, compliance guidelines and cloud native best practices. Some features are common to most landing zones, for example creating secure container images, AMIs, and environment setup boilerplate. In order to provide maximum value to the customers, Contino develops in-house versions of such features, incorporating AWS best practices, and later rolls out to the customer’s environment with some customization. Contino’s technical consultants, who are not currently assigned to customer work, collectively known as ‘Squad 0’ work on these features. Squad 0 builds the foundation for the work that will be re-used by other squads that work directly with Contino’s customers. As the technical consultants are typically on Squad 0 for a short period, it is critical that they can be productive in this short time, without spending too much time getting set up.

To build these foundational services, Contino was looking for something more integrated that would allow them to quickly setup development environments, enable collaboration between Squad 0 members, invite other squads to validate foundations services usage for their respective customers, and provide access to different AWS accounts and git repos centrally from one place. Historically, Contino has used disparate tools to achieve this, which meant having to grant/revoke access to the various AWS accounts individually on a continual basis. With these disparate tools, granting access to the tools needed for squads to be productive was non-trivial.

The Solution

It was at this point Contino participated in the private beta for CodeCatalyst prior to the public preview. CodeCatalyst has allowed Contino to move to a structure, as shown in Figure 1 below. A Project Manager at Contino creates a different project for each foundational service and invites Squad 0 members to join the relevant project. With CodeCatalyst, Squad 0 technical consultants use features like CI/CD, source repositories, and issue trackers to build foundational services. This helps eliminate the overhead of managing and integrating developer tools and provides more time to focus on developing code. Once Squad 0 is ready with the foundational services, they invite customer squads using their email address to validate the readiness of the project for use with their customers. Finally, members of Squad 0 use Cloud 9 Dev Environments from within CodeCatalyst to rapidly create consistent cloud development environments, without manual configuration, so they can work on new or multiple projects simultaneously, without conflict.

With CodeCatalyst, Squad 0 technical consultants use features like CI/CD, source repositories, and issue trackers to build foundational services. This helps eliminate the overhead of managing and integrating developer tools and provides more time to focus on developing code.

Figure 1: CodeCatalyst with multiple account connections

Contino uses CI/CD to conduct multi-account deployments. Contino typically does one of two types of deployments: 1. Traditional sequential application deployment that is promoted from one environment to another, for example dev -> test -> prod, and 2. Parallel deployment, for example, a security control that is required to be deployed out into multiple AWS accounts at the same time. CodeCatalyst solves this problem by making it easier to construct workflows using a workflow definition file that can deploy either sequentially or in parallel to multiple AWS accounts. Figure 2 shows parallel deployment.

CodeCatalyst provides a feature to add CI/CD pipeline for Dev, Test and Production accounts

Figure 2: CI/CD with CodeCatalyst

The Value

CodeCatalyst has reduced the time it takes for members of Squad 0 to complete the necessary on-boarding to work on foundational services from 1.5 days to about 1 hour. These tasks include setting up connections to source repositories, setting up development environments, configuring IAM roles and trust relationships, etc. With support for integrated tools and better collaboration, CodeCatalyst minimized overhead for ad hoc collaboration. Squad 0 could spend more time on writing code to build foundation services. This has led to tasks being completed, on average, 20% faster. This increased productivity led to increased value delivered to Contino’s customers. As Squad 0 is more productive, more foundation services are available for other squads to reuse for their respective customers. Now, Contino’s teams on the ground working directly with customers can re-use these services with some customization for the specific needs of the customer.


Amazon CodeCatalyst brings together everything software development teams need to plan, code, build, test, and deploy applications on AWS into a streamlined, integrated experience. With CodeCatalyst, developers can spend more time developing application features and less time setting up project tools, creating and managing CI/CD pipelines, provisioning and configuring various development environments or coordinating with team members. With CodeCatalyst, the Contino engineers can improve productivity and focus on rapidly developing application code which captures business value for their customers.

About the authors:

Mark Faiers

Mark Faiers started out as a software engineer and later transitioned into DevOps, and Cloud. He has worked across numerous technology stacks and industries, including Healthcare, FinTech, and Logistics. Mark is currently working as an AWS consultant to some of the biggest Financial and Insurance firms in the U.K., as well as running the AWS Practice at Contino. He is especially passionate about serverless, and sustainability.

Chetan Makvana

Chetan Makvana is a senior solutions architect working with global systems integrators at AWS. He works with AWS partners and customers to provide them with architectural guidance for building scalable architecture and execute strategies to drive adoption of AWS services. He is a technology enthusiast and a builder with a core area of interest on serverless and DevOps. Outside of work, he enjoys binge-watching, traveling and music.

Green Flag uses Amazon QuickSight to democratize data and enable self-serve insights to all employees

Post Syndicated from Jeremy Bristow original https://aws.amazon.com/blogs/big-data/green-flag-uses-amazon-quicksight-to-democratize-data-and-enable-self-serve-insights-to-all-employees/

This is a guest post by Jeremy Bristow, Head of Product at Green Flag.

In the US, there’s a saying: “Sooner or later, you’ll break down and call Triple A.” In the UK, that same saying might be “Sooner or later, you’ll break down and call Green Flag.”

Green Flag has been assisting stranded motorists all over Europe for more than 50 years. Having started in the UK in 1971, with the first office located above a fish and chip shop, we have built an expansive network of automobile service providers to assist customers. With that network in place, Green Flag customers need only make a single phone call, or request service via the Green Flag app, to quickly contact a local repair garage, no matter where they’ve broken down.

Green Flag’s commitment to providing world-class service to its customers has consistently earned a Net Promoter Score (NPS) of over 70. We were also recently rated as the top breakdown provider in the H1 2022 UK Customer Satisfaction Index report, which is a national benchmark of customer satisfaction covering 13 sectors and based on 45,000 customer responses.

Unlike most breakdown providers in the UK, Green Flag doesn’t currently own a large fleet of trucks to provide roadside assistance to our 3 million plus customers. Our established network of over 200 locally operated businesses provides in-depth knowledge and expertise related to their specific areas. Our overall mission is simple: “Make motoring stress-free, safe, and simple, for every driver.”

Tech transformation to democratize data

Over the past 4 years, Green Flag has been undertaking a technology transformation with the end goal of enabling data democratization and faster data-driven decisions. This has been a massive undertaking, requiring the business to replace our entire systems architecture, which we’ve done while remaining open 24/7, 365 days a year, maintaining our industry-leading customer service standards and growing the business. Building the plane while flying it is an apt analogy, but it felt more like performing open-heart surgery while running a marathon.

Early on during our tech transformation, we made the decision to develop natively with AWS, using our in-house development team. When it came time to decide on a new business intelligence (BI) tool, it was an easy decision to switch to Amazon QuickSight.

In this post, we discuss what influenced the decision to implement QuickSight and why doing so has made a significant impact on our efficiency and agility.

Changing the game with self-serve insights

Prior to embarking on this transformation, technology had consistently been a blocker to us being able to quickly serve our customers. Our old stack was slow, hard to change, and consistently prevented us from gaining valuable and timely insights from our datasets. We were data rich, but insight poor. It was clear that the status quo was not compatible with our ambitions to grow and trade in an increasingly digital world. We needed a BI tool to bring new and meaningful insights to our data, which would help us leverage our new technologies to deliver meaningful business change.

Although going with QuickSight was an easy decision, we considered the following factors to ensure our needs would be met:

  • Alignment to AWS architecture – We didn’t want to invest time and development resources into complicated implementations—we wanted a seamless integration that would cause no headaches.
  • Ease of use – We needed an intuitive, user-friendly interface that would make it easy for any user, no matter their tech background or level of expertise in pulling data insights.
  • Cost – Affordability is always a concern. With AWS, we can monitor daily usage and the associated costs via our console, so there were never any surprises or hidden costs post-implementation.

Although these considerations were all top of mind, the primary driver for switching to QuickSight is rooted in the data democratization goal within our tech transformation initiative. Providing self-serve access to data and insights to everyone, no matter their tech background, has been a game changer. With QuickSight, we can now present data and insights in a meaningful, easy-to-understand format via near-real-time embedded dashboards that anyone from any tech background can easily build.

Evolving our data culture with QuickSight

Because all employees are now empowered to make data-driven decisions, our data culture has begun to evolve. Having consistent, accurate dashboards that access standardized datasets brings with it a new level of confidence in decision-making. We’re no longer wasting time on discussions about data sources or the risks of inaccuracy. QuickSight and the data governance processes we have built around its use have shifted the discussion to focus on the insights, what they tell us, and what our next steps should be.

The following screenshot shows one of Green Flag’s custom-built dashboards.

Green Flag QuickSight dashboard

Additionally, because QuickSight is so easy to use that self-service requires minimal training and no specialized skills, our data experts can now shift from responding to constant requests for information to focusing on more valuable projects that require their technical depth and expertise. We can now work more efficiently, producing actionable insights that have never before been available, at a pace that enables us to make better decisions, faster.

Mapping our next analytics adventure

The next features that the Green Flag team plan to test and then roll out is Amazon QuickSight Q, which should allow both our data teams and business users to have a more frictionless experience with our data. The feature should enable users to interrogate complex data using simple conversational questions as if they were talking to a colleague, producing results that are then customizable. This will further empower our self-serve strategy by enabling non-technical experts to pose business questions rather than rely on technical expertise and experience.

To learn more about how you can embed customized data visuals, interactive dashboards, and natural language querying into any application, visit Amazon QuickSight Embedded.

About the Author

Jeremy “Jez” Bristow is Head of Product at Green Flag. He has been working at Green Flag, part of Direct Line Group, for over 4 years and has been co-responsible for delivering the digital transformation programme within Green Flag. They have built a cloud-based ecosystem of micro-services supported by a data platform within AWS to enable the delivery of better customer outcomes to grow their business.

The most visited AWS DevOps blogs in 2022

Post Syndicated from Brian Beach original https://aws.amazon.com/blogs/devops/the-most-visited-aws-devops-blogs-in-2022/

As we kick off 2023, I wanted to take a moment to highlight the top posts from 2022. Without further ado, here are the top 10 AWS DevOps Blog posts of 2022.

#1: Integrating with GitHub Actions – CI/CD pipeline to deploy a Web App to Amazon EC2

Coming in at #1, Mahesh Biradar, Solutions Architect and Suresh Moolya, Cloud Application Architect use GitHub Actions and AWS CodeDeploy to deploy a sample application to Amazon Elastic Compute Cloud (Amazon EC2).

Architecture diagram from the original post.

#2: Deploy and Manage GitLab Runners on Amazon EC2

Sylvia Qi, Senior DevOps Architect, and Sebastian Carreras, Senior Cloud Application Architect, guide us through utilizing infrastructure as code (IaC) to automate GitLab Runner deployment on Amazon EC2.

Architecture diagram from the original post.

#3 Multi-Region Terraform Deployments with AWS CodePipeline using Terraform Built CI/CD

Lerna Ekmekcioglu, Senior Solutions Architect, and Jack Iu, Global Solutions Architect, demonstrate best practices for multi-Region deployments using HashiCorp Terraform, AWS CodeBuild, and AWS CodePipeline.

Architecture diagram from the original post.

#4 Use the AWS Toolkit for Azure DevOps to automate your deployments to AWS

Mahmoud Abid, Senior Customer Delivery Architect, leverages the AWS Toolkit for Azure DevOps to deploy AWS CloudFormation stacks.

Architecture diagram from the original post.

#5 Deploy and manage OpenAPI/Swagger RESTful APIs with the AWS Cloud Development Kit

Luke Popplewell, Solutions Architect, demonstrates using AWS Cloud Development Kit (AWS CDK) to build and deploy Amazon API Gateway resources using the OpenAPI specification.

Architecture diagram from the original post.

#6: How to unit test and deploy AWS Glue jobs using AWS CodePipeline

Praveen Kumar Jeyarajan, Senior DevOps Consultant, and Vaidyanathan Ganesa Sankaran, Sr Modernization Architect, discuss unit testing Python-based AWS Glue Jobs in AWS CodePipeline.

Architecture diagram from the original post.

#7: Jenkins high availability and disaster recovery on AWS

James Bland, APN Global Tech Lead for DevOps, and Welly Siauw, Sr. Partner solutions architect, discuss the challenges of architecting Jenkins for scale and high availability (HA).

Architecture diagram from the original post.

#8: Monitor AWS resources created by Terraform in Amazon DevOps Guru using tfdevops

Harish Vaswani, Senior Cloud Application Architect, and Rafael Ramos, Solutions Architect, explain how you can configure and use tfdevops to easily enable Amazon DevOps Guru for your existing AWS resources created by Terraform.

Architecture diagram from the original post.

#9: Manage application security and compliance with the AWS Cloud Development Kit and cdk-nag

Arun Donti, Senior Software Engineer with Twitch, demonstrates how to integrate cdk-nag into an AWS Cloud Development Kit (AWS CDK) application to provide continual feedback and help align your applications with best practices.

Featured image from the original post.

#10: Smithy Server and Client Generator for TypeScript (Developer Preview)

Adam Thomas, Senior Software Development Engineer, demonstrate how you can use Smithy to define services and SDKs and deploy them to AWS Lambda using a generated client.

Architecture diagram from the original post.

A big thank you to all our readers! Your feedback and collaboration are appreciated and help us produce better content.



About the author:

Brian Beach

Brian Beach has over 20 years of experience as a Developer and Architect. He is currently a Principal Solutions Architect at Amazon Web Services. He holds a Computer Engineering degree from NYU Poly and an MBA from Rutgers Business School. He is the author of “Pro PowerShell for Amazon Web Services” from Apress. He is a regular author and has spoken at numerous events. Brian lives in North Carolina with his wife and three kids.