A leader in the modern application platform space, OutSystems is a low-code platform that provides tools for companies to develop, deploy, and manage omnichannel enterprise applications.
Security is a top priority at OutSystems. Their Security Operations Center (SOC) deals with thousands of incidents a year, each with a set of response actions that need to be executed as quickly as possible. Providing security at such a large scale is a challenge, even for the most well-prepared organizations. Manual and repetitive tasks account for the majority of the response time involved in this process, and decreasing this key metric requires orchestration and automation.
Security orchestration, automation, and response (SOAR) systems are designed to translate security analysts’ manual procedures into automated actions, making them faster and more scalable.
In this blog post, we’ll explore how OutSystems lowered their incident response time by 99 percent by designing and deploying a custom SOAR using serverless services on AWS.
Solution architecture
Security incidents happen with unknown frequency, making serverless services a natural fit to boost security at OutSystems because of their increased agility and capability to scale to zero.
There are two ways to trigger SOAR actions in this architecture:
Automatically through Security Information and Event Management (SIEM) security incident findings
On-demand through a chat application
Using the first method, when a security incident is detected by the SIEM, an event is published to Amazon Simple Notification Service (Amazon SNS). This triggers an AWS Lambda function that creates a ticket in an internal ticketing system. The Playbooks Lambda function is then triggered to decide which playbook to run based on the incident details.
Each playbook is a set of actions that are executed in response to a trigger. Playbooks are the key component behind automated tasks. OutSystems uses AWS Step Functions to orchestrate the actions and Lambda functions to execute them.
But this solution does not exist in isolation. Depending on the playbook, Step Functions interacts with other components such as AWS Secrets Manager or external APIs.
Using the second method, the on-demand trigger for OutSystems SOAR relies on a chat application. This application calls a Lambda function URL that interacts with the playbooks we just discussed.
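As an illustration of this on-demand path, the following is a minimal Python sketch of what such a Lambda function URL handler could look like. The payload shape, playbook names, and Step Functions state machine ARN are assumptions made for this example rather than details of the OutSystems implementation.

import json
import boto3

sfn = boto3.client("stepfunctions")

def lambda_handler(event, context):
    # The chat application is assumed to POST a JSON body such as
    # {"playbook": "impossible-travel", "parameters": {"user": "jdoe"}}.
    body = json.loads(event.get("body") or "{}")
    playbook = body.get("playbook")
    if not playbook:
        return {"statusCode": 400, "body": json.dumps({"error": "playbook is required"})}

    # Start the matching Step Functions playbook; the ARN is a placeholder.
    execution = sfn.start_execution(
        stateMachineArn=f"arn:aws:states:eu-west-1:111122223333:stateMachine:{playbook}",
        input=json.dumps(body.get("parameters", {})),
    )
    return {"statusCode": 202, "body": json.dumps({"executionArn": execution["executionArn"]})}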
Figure 1 represents the high-level architecture of OutSystems’ custom SOAR.
This same IaC architecture is used when new playbooks are added or existing ones are updated. Code changes committed to a source control repository trigger the AWS CodePipeline pipeline, which uses AWS CodeBuild and AWS CloudFormation change sets to deploy the updates to the affected resources.
Use cases
The use cases that OutSystems has deployed playbooks for to date include:
SQL injection
Unauthorized access to credentials
Issuance of new certificates
Login brute-force attempts
Impossible travel
Let’s explore the Impossible travel use case. Impossible travel happens when a user logs in from one location, and then later logs in from a different location that would be impossible to travel between within the elapsed time.
When the SIEM identifies this behavior, it triggers an alert and the following actions are performed:
A ticket is created
An IP address check is performed in reputation databases, such as AbuseIPDB or VirusTotal
An IP address check is performed in the internal database, and the IP address is added if it is not found
A search is performed for past events with the same IP address
Recent logins of the user are identified in the SIEM, along with all related information
All of this information is automatically added to the ticket. Every step listed here was previously performed manually, a task that took an average of 15 minutes. Now, the process takes just 8 seconds, a 99.1% improvement in incident response time.
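To give a sense of what one of these automated steps looks like, here is a minimal Python sketch of an IP address reputation check against AbuseIPDB. The endpoint, parameters, and response fields follow AbuseIPDB's public check API as documented; the environment variable and how secrets are handled are assumptions, since the actual playbook code is not public.

import json
import os
import urllib.parse
import urllib.request

def check_ip_reputation(ip_address):
    # Query the AbuseIPDB check endpoint for the given IP address.
    query = urllib.parse.urlencode({"ipAddress": ip_address, "maxAgeInDays": "90"})
    request = urllib.request.Request(
        "https://api.abuseipdb.com/api/v2/check?" + query,
        headers={
            # In the OutSystems solution, the key would come from AWS Secrets Manager.
            "Key": os.environ["ABUSEIPDB_API_KEY"],
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # A higher score indicates a more likely abusive IP address.
    return payload["data"]["abuseConfidenceScore"]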
Many other remediation actions can also be automated. Some of these are already in place, while others are in development.
Conclusion
At OutSystems, much like at AWS, security is considered “job zero.” It is not only important to be proactive in preventing security incidents, but when they happen, the response must be quick, effective, and as immune to human error as possible.
With the implementation of this custom SOAR, OutSystems reduced the average response time to security incidents by 99%. Tasks that previously took 76 hours of analysts’ time are now accomplished automatically within 31 minutes.
During the evaluation period, the SOAR addressed hundreds of real-world incidents, with some threat intelligence use cases executed thousands of times.
An architecture composed of serverless services ensures that OutSystems does not pay for systems sitting idle waiting for work, without compromising on performance.
As organizations continue to grow and scale their applications, the need for teams to quickly and autonomously detect anomalous operational behaviors becomes increasingly important. Amazon DevOps Guru is a fully managed AIOps service that enables you to improve application availability and resolve operational issues quickly. DevOps Guru eases this process by using machine learning (ML)-powered recommendations to detect operational insights, identify the exhaustion of resources, and provide suggestions to remediate issues.
Many organizations running business-critical applications use different tools to be notified about anomalous events in real time so that critical issues can be remediated. Atlassian offers a modern team collaboration and productivity software suite that helps teams organize, discuss, and complete shared work. You can deliver DevOps Guru insights in near real time to DevOps teams by integrating DevOps Guru with Atlassian Opsgenie. Opsgenie is a modern incident management platform that receives alerts from your monitoring systems and custom applications and categorizes each alert based on importance and timing.
This blog post walks you through how to integrate Amazon DevOps Guru with Atlassian Opsgenie to receive notifications for new operational insights detected by DevOps Guru with more flexibility and customization using Amazon EventBridge and AWS Lambda. The Lambda function will be used to demonstrate how to customize insights sent to Opsgenie.
Solution overview
Figure 1: Amazon EventBridge Integration with Opsgenie using AWS Lambda
Amazon DevOps Guru integrates directly with Amazon EventBridge to notify you of events relating to generated insights and updates to those insights. To route these notifications to Opsgenie, you configure EventBridge rules that determine where to send them. You can also use predefined DevOps Guru patterns so that only events matching a pattern send notifications or trigger actions in a supported AWS resource. DevOps Guru supports the following predefined patterns:
DevOps Guru New Insight Open
DevOps Guru New Anomaly Association
DevOps Guru Insight Severity Upgraded
DevOps Guru New Recommendation Created
DevOps Guru Insight Closed
By default, the patterns referenced above are enabled, so we will leave all of them operational in this implementation. However, you have the flexibility to choose which of these patterns to send to Opsgenie. When EventBridge receives an event, the rule matches it and sends it to a target, such as an AWS Lambda function, which processes the event and sends the insight to Opsgenie.
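The following Python sketch shows one way to create such a rule with the AWS SDK (boto3), filtering on the "DevOps Guru New Insight Open" pattern and targeting a Lambda function. The rule name, function ARN, and account details are placeholders; in this walkthrough, the rule is actually created for you by the SAM template in a later section.

import json
import boto3

events = boto3.client("events")

RULE_NAME = "devops-guru-new-insight-open"  # placeholder name

# Match DevOps Guru events for newly opened insights.
events.put_rule(
    Name=RULE_NAME,
    State="ENABLED",
    EventPattern=json.dumps({
        "source": ["aws.devops-guru"],
        "detail-type": ["DevOps Guru New Insight Open"],
    }),
)

# Send matching events to the Lambda function that forwards insights to Opsgenie.
# The function also needs a resource-based permission allowing events.amazonaws.com to invoke it.
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{
        "Id": "opsgenie-forwarder",
        "Arn": "arn:aws:lambda:us-east-1:111122223333:function:devops-guru-to-opsgenie",
    }],
)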
Prerequisites
The following prerequisites are required for this walkthrough:
In this step, you will navigate to Opsgenie to create the integration with DevOps Guru and to obtain the API key and team name within your account. These parameters will be used as inputs in a later section of this blog.
Navigate to Teams, and take note of the team name you have as shown below, as you will need this parameter in a later section.
Figure 2: Opsgenie team names
Click on the team to proceed and navigate to Integrations on the left-hand pane. Click on Add Integration and select the Amazon DevOps Guru option.
Figure 3: Integration option for DevOps Guru
Now, scroll down and take note of the API Key for this integration and copy it to your notes as it will be needed in a later section. Click Save Integration at the bottom of the page to proceed.
Figure 4: API Key for DevOps Guru Integration
Now, the Opsgenie integration has been created and we’ve obtained the API key and team name. The email of any team member will be used in the next section as well.
Review & launch the AWS SAM template to deploy the solution
In this step, you will review and launch the SAM template. The template deploys an AWS Lambda function that is triggered by an Amazon EventBridge rule when Amazon DevOps Guru generates a new event. The Lambda function retrieves the parameters provided at deployment and pushes the events to Opsgenie via its API.
Reviewing the template
Below is the SAM template that will be deployed in the next step. This template launches a few key components specified earlier in the blog. The Transform section takes a template written in AWS Serverless Application Model (AWS SAM) syntax and expands it into a compliant CloudFormation template. Under the Resources section, this solution deploys an AWS Lambda function using the Java runtime as well as an Amazon EventBridge rule and pattern. Another key aspect of the template is the Parameters section. As shown below, ApiKey, Email, and TeamName are parameters of this CloudFormation template, which are then used as environment variables for the Lambda function to pass to Opsgenie.
Figure 5: Review of SAM Template
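The function deployed by the template is written in Java, but the idea is straightforward. The following Python sketch shows the equivalent call to the Opsgenie Alerts API, reading the ApiKey and TeamName values from environment variables; the exact payload fields of the real function may differ.

import json
import os
import urllib.request

OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"

def lambda_handler(event, context):
    # Build an Opsgenie alert from the DevOps Guru event delivered by EventBridge.
    alert = {
        "message": "DevOps Guru: " + event.get("detail-type", "insight event"),
        "description": json.dumps(event.get("detail", {}), indent=2),
        "responders": [{"name": os.environ["TeamName"], "type": "team"}],
    }
    request = urllib.request.Request(
        OPSGENIE_ALERTS_URL,
        data=json.dumps(alert).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "GenieKey " + os.environ["ApiKey"],
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return {"statusCode": response.status}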
Launching the Template
Navigate to the directory of choice within a terminal and clone the GitHub repository with the following command:
Change directories with the command below to navigate to the directory of the SAM template.
cd amazon-devops-guru-connector-opsgenie/OpsGenieServerlessTemplate
From the CLI, use the AWS SAM CLI to build and process your AWS SAM template file, application code, and any applicable language-specific files and dependencies:
sam build
From the CLI, use the AWS SAM CLI to deploy the AWS resources for the pattern, as specified in the template.yml file:
sam deploy --guided
You will be prompted to enter the following information. Use the values obtained in the previous section for the Parameter ApiKey, Parameter Email, and Parameter TeamName fields.
Stack Name
AWS Region
Parameter ApiKey
Parameter Email
Parameter TeamName
Allow SAM CLI IAM Role Creation
Test the solution
Follow this blog to enable DevOps Guru and generate an operational insight.
When DevOps Guru detects a new insight, it will generate an event in EventBridge. EventBridge then triggers Lambda and sends the event to Opsgenie as shown below.
Figure 6: Event Published to Opsgenie with details such as the source, alert type, insight type, and a URL to the insight in the AWS console.
Cleaning up
To avoid incurring future charges, delete the resources.
From the command line, use the AWS SAM CLI to delete the serverless application along with its dependencies:
sam delete
Customizing Insights published using Amazon EventBridge & AWS Lambda
The foundation of the DevOps Guru and Opsgenie integration is Amazon EventBridge and AWS Lambda, which gives you the flexibility to implement several customizations. One example is generating an Opsgenie alert only when a DevOps Guru insight's severity is high. Another is forwarding notifications to the AIOps team when there is a serverless-related resource issue, or forwarding a database-related resource issue to your DBA team. This section walks you through how these customizations can be done.
EventBridge customization
EventBridge rules can select specific events by using event patterns. As detailed below, you can trigger the Lambda function only if a new insight is opened and its severity is high. The advantage of this kind of customization is that the Lambda function is only invoked when needed.
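For reference, the event pattern behind this customization looks like the following (shown here as a Python dict for readability; in this walkthrough, the equivalent pattern is written into the SAM template's Events section, as described next).

# Only invoke the Lambda function for newly opened insights with high severity.
event_pattern = {
    "source": ["aws.devops-guru"],
    "detail-type": ["DevOps Guru New Insight Open"],
    "detail": {"insightSeverity": ["high"]},
}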
Open the file template.yaml reviewed in the previous section and implement the changes as highlighted below under the Events section within resources (original file on the left, changes on the right hand side).
Figure 7: CloudFormation template file changed so that the EventBridge rule is only triggered when the alert type is “DevOps Guru New Insight Open” and insightSeverity is “high”.
Save the changes and use the following command to apply them:
sam deploy --template-file template.yaml
Accept the changeset deployment
Determining the Ops team based on the resource type
Another customization is to change the Lambda code to route and control how alerts are managed. Let's say you want to get your DBA team involved whenever DevOps Guru raises an insight related to an Amazon RDS resource. You can achieve this by changing the AlertType Java class.
The following changes need to be made within the AlertType.java file:
At the beginning of the file, the standard java.util.List and java.util.ArrayList classes were imported
Line 60: created a list of CloudWatch metrics namespaces
Line 74: Assigned the dataIdentifiers JsonNode to the variable dataIdentifiersNode
Line 75: Assigned the namespace JsonNode to a variable namespaceNode
Line 77: Added the namespace to the list for each DevOps Guru insight, which is always raised as an EventBridge event with the structure detail.anomalies[0].sourceDetails[0].dataIdentifiers.namespace
Line 88: Assigned the default responder team to the variable defaultResponderTeam
Line 89: Created the list of responders and assigned it to the variable respondersTeam
Line 92: Check if there is at least one AWS/RDS namespace
Line 93: Assigned the DBAOps_Team to the variable dbaopsTeam
Line 93: Included the DBAOps_Team team as part of the responders list
Line 97: Set the OpsGenie request teams to be the responders list
Figure 8: java.util.List and java.util.ArrayList classes were imported
Figure 9: AlertType Java class customized to include DBAOps_Team for RDS-related DevOps Guru insights.
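The customization itself is made in Java, but the routing idea can be summarized in a few lines. The following Python sketch mirrors the same logic under the assumption that the event detail carries the anomalies structure described above; it is illustrative, not the shipped code.

def responders_for_insight(detail, default_team="AIOps_Team", dba_team="DBAOps_Team"):
    # Collect the CloudWatch metric namespaces referenced by the insight's anomalies.
    namespaces = []
    for anomaly in detail.get("anomalies", []):
        for source in anomaly.get("sourceDetails", []):
            namespace = source.get("dataIdentifiers", {}).get("namespace")
            if namespace:
                namespaces.append(namespace)

    # Always notify the default responder team; add the DBA team for RDS-related insights.
    responders = [{"name": default_team, "type": "team"}]
    if "AWS/RDS" in namespaces:
        responders.append({"name": dba_team, "type": "team"})
    return responders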
You then need to generate the jar file by using the mvn clean package command.
As a result, the DBAOps_Team will be assigned to the Opsgenie alert whenever a DevOps Guru insight is related to Amazon RDS.
Figure 10: Opsgenie alert assigned to both DBAOps_Team and AIOps_Team.
Conclusion
In this post, you learned how Amazon DevOps Guru integrates with Amazon EventBridge and publishes insights to Opsgenie using AWS Lambda. By creating an Opsgenie integration with DevOps Guru, you can now leverage Opsgenie's strengths in incident management, team communication, and collaboration when responding to an insight. All of the insight data can be viewed and addressed in Opsgenie's Incident Command Center (ICC). By customizing the data sent to Opsgenie via Lambda, you can empower your organization even more by fine-tuning and displaying the most relevant data, thus decreasing the mean time to resolve (MTTR) for the responding operations team.
This post is co-written with Vidya Kotamraju and Tallis Hobbs, from Diligent.
Diligent is the global leader in modern governance, providing software as a service (SaaS) services across governance, risk, compliance, and audit, helping companies meet their environmental, social, and governance (ESG) commitments. Serving more than 1 million users from over 25,000 customers around the world, we empower transformational leaders with software, insights, and confidence to drive greater impact and lead with purpose.
We provide the right governance technology that empowers our customers to act strategically while maintaining compliance, mitigating risk, and driving efficiency. With the Diligent Platform, organizations can bring their most critical data into one centralized place. By using powerful analytics, automation, and unparalleled industry data, our customers’ board and c-suite get relevant insights from across risk, compliance, audit, and ESG teams that help them make better decisions, faster and more securely.
One of the biggest obstacles that customers face is obtaining a holistic view of their data. To effectively manage risks and ensure compliance, organizations need a comprehensive understanding of their operations and processes. However, this can be difficult to achieve. Data dispersed across multiple systems and departments, data that is not consistently collected and updated, and data that is not in a format that can be easily analyzed all present challenges. To address them, we turned to Amazon QuickSight to enhance our customer-facing products with embedded insights and reports.
In this post, we cover what we were looking for in a business intelligence (BI) tool, and how QuickSight met our requirements.
Narrowing down the competition
To effectively serve our customers, we needed a platform-wide reporting solution that would enable our users to centralize their governance, risk, and compliance (GRC) programs, and collect information from disparate data sources, while allowing for integrated automation and analytics for data-driven insights, thereby providing a curated picture of GRC with confidence.
When we started our research into various BI tool offerings to embed into our platform, we narrowed the list down to a handful that had most of the capabilities we were looking for. After reviewing the options, QuickSight was our top option when it came to the ease of integration with our existing AWS-built ecosystem. QuickSight offered everything we needed, with the flexibility we wanted, at an affordable price.
Why we chose QuickSight
There are many data points that can be useful for making business decisions; the specific data points that are most critical will depend on the nature of the business and the decisions being made. However, there are some common types of data that are often important for making informed business decisions: financial data, marketing data, operational data, and customer data.
Translating those requirements into BI tool functionality, we were looking for:
A seamless way to obtain a holistic and unified view of data
The ability to handle substantial amounts (over 100 TB) of data
Enough flexibility to support the changing needs of our solution as we grow
Great value for the price
QuickSight checked all the boxes on our list. The most compelling reasons why we ultimately chose QuickSight were:
Visualization and reporting capabilities – QuickSight offers a wide range of visualization options and allows creation of custom reports and dashboards
Data sources – QuickSight supports a wide variety of data sources, making connection and analysis easy
Ease of integration – QuickSight fit seamlessly with our existing AWS technology stack with a price that fits our budget
Today, we’re using QuickSight to create a customer-facing reporting platform that allows our customers to report on their data within our ecosystem. QuickSight helps empower our customers by putting the reporting tools and capability in their hands, allowing them to get a comprehensive, personalized (via row-level security) view of data, unique to their workflow.
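To give a sense of how an embedded, per-user view like this can be generated with the QuickSight API, here is a minimal Python sketch. All identifiers are placeholders, and the row-level security itself is enforced by dataset rules in QuickSight rather than by this call; it is illustrative rather than our production code.

import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

# Account ID, user ARN, and dashboard ID below are placeholders.
response = quicksight.generate_embed_url_for_registered_user(
    AwsAccountId="111122223333",
    SessionLifetimeInMinutes=60,
    UserArn="arn:aws:quicksight:us-east-1:111122223333:user/default/report-viewer",
    ExperienceConfiguration={
        "Dashboard": {"InitialDashboardId": "11111111-2222-3333-4444-555555555555"}
    },
)

# The returned URL is embedded in the customer-facing application.
embed_url = response["EmbedUrl"]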
The following screenshot shows an example of our Issues & Actions dashboard, designed for risk managers and audit managers, showing various issues in need of attention.
QuickSight has provided a way to enable our customers to bring data and intelligence to the board or leadership teams in a simple, more streamlined way that saves time and effort—by automating standard reporting and surfacing it in a rich and interactive dashboard for directors. Boards and leaders will have access to curated insights, culled from both internal operations and external sources, integrated into the Diligent Boards platform—visualized in such a way that their data tells the story that accompanies the board materials.
For us, the most compelling benefit of using QuickSight is the ease of integration with Diligent’s existing tech stack and data stack. QuickSight integrates seamlessly with other AWS products in our technology stack, making it easy to incorporate data from various sources and systems into our dashboards and reports.
QuickSight was the perfect fit
Our customers love the flexibility with reporting. QuickSight provides a range of visualization options that allows users to customize their dashboards and reports to fit their specific needs and preferences. We love that the QuickSight team is open to taking prompt action on customer feedback. Their continuous and frequent feature release process is confidence-inspiring.
QuickSight helps us provide flexibility to our customers, enabling them to quickly put the right data in front of the right audience to make the right business decisions.
Vidya Kotamraju is a Product Management Leader at Diligent, with close to 2 decades of experience leading award-winning B2B, B2C product and team success across multiple industries and geographies. Currently, she is focused on Diligent Highbond’s Data Automation Solutions.
Tallis Hobbs is a Senior Software Engineer at Diligent. As a previous educator, he brings a unique skill set to the engineering space. He is passionate about the AWS serverless space and currently works on Diligent’s client-facing QuickSight integration.
Samit Kumbhani is a Sr. Solutions Architect at AWS based in the New York City area. He has 18+ years of experience in building applications and focuses on analytics, business intelligence, and databases. He enjoys working with customers to understand their challenges and solve them by creating innovative solutions using AWS services. Outside of work, Samit loves playing cricket, traveling, and spending time with his family and friends.
This post is written by Ankur Sethi, Sr. Product Manager, EC2, and Kinnar Sen, Sr. Specialist Solution Architect, AWS Compute.
Amazon EC2 Auto Scaling helps customers optimize their Amazon EC2 capacity by dynamically responding to varying demand. Based on customer feedback, we enhanced the scaling experience with the launch of predictive scaling policies. Predictive scaling proactively adds EC2 instances to your Auto Scaling group in anticipation of demand spikes. This results in better availability and performance for applications that have predictable demand patterns and long initialization times. We recently launched a couple of features designed to help you assess the value of predictive scaling: prescriptive recommendations on whether to use predictive scaling based on its potential availability and cost impact, and integration with Amazon CloudWatch to continuously monitor the accuracy of predictions. In this post, we discuss these features in detail and the steps that you can easily adopt to enjoy the benefits of predictive scaling.
Recap: Predictive Scaling
EC2 Auto Scaling helps customers maintain application availability by managing the capacity and health of the underlying cluster. Prior to predictive scaling, EC2 Auto Scaling offered dynamic scaling policies such as target tracking and step scaling. These dynamic scaling policies are configured with an Amazon CloudWatch metric that represents an application’s load. EC2 Auto Scaling constantly monitors this metric and responds according to your policies, thereby triggering the launch or termination of instances. Although it’s extremely effective and widely used, this model is reactive in nature, and for larger spikes, may lead to unfulfilled capacity momentarily as the cluster is scaling out. Customers mitigate this by adopting aggressive scale out and conservative scale in to manage the additional buffer of instances. However, sometimes applications take a long time to initialize or have a recurring pattern with a sudden spike of high demand. These can have an impact on the initial response of the system when it is scaling out. Customers asked for a proactive scaling mechanism that can scale capacity ahead of predictable spikes, and so we delivered predictive scaling.
Predictive scaling was launched to make the scaling action proactive: it anticipates the changes required in compute demand and scales accordingly. The scaling action is determined by an ensemble machine learning (ML) model built with data from your Auto Scaling group’s scaling patterns, as well as billions of data points from our observations. Predictive scaling should be used for applications where demand changes rapidly but with a recurring pattern, where instances require a long time to initialize, or where you’re manually invoking scheduled scaling for routine demand patterns. Predictive scaling not only forecasts capacity requirements based on historical usage, but also learns continuously, making forecasts more accurate over time. Furthermore, a predictive scaling policy is designed to only scale out, never scale in, your Auto Scaling groups, eliminating the risk of ending up with less capacity because of inexact predictions. You must use a dynamic scaling policy, scheduled scaling, or your own custom mechanism for scale-ins. In the case of exceptional demand spikes, adding a dynamic scaling policy can also improve your application performance by bridging the gap between demand and predicted capacity.
What’s new with predictive scaling
Predictive scaling policies can be configured in a non-mutative ‘Forecast Only’ mode to evaluate the accuracy of forecasts. When you’re ready to start scaling, you can switch to the ‘Forecast and Scale’ mode. Now we prescriptively recommend whether your policy should be switched to ‘Forecast and Scale’ mode if it can potentially lead to better availability and lower costs, saving you the time and effort of doing such an evaluation manually. You can test different configurations by creating multiple predictive scaling policies in ‘Forecast Only’ mode, and choose the one that performs best in terms of availability and cost improvements.
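For example, a predictive scaling policy can be created in ‘Forecast Only’ mode with the AWS SDK (boto3) as sketched below. The group name, policy name, and target value are placeholders; switching Mode to ForecastAndScale later is a one-line change.

import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",            # placeholder
    PolicyName="cpu40-forecast-only",         # placeholder
    PolicyType="PredictiveScaling",
    PredictiveScalingConfiguration={
        "MetricSpecifications": [
            {
                "TargetValue": 40.0,
                "PredefinedMetricPairSpecification": {
                    "PredefinedMetricType": "ASGCPUUtilization"
                },
            }
        ],
        # Start in ForecastOnly; switch to ForecastAndScale once the
        # recommendation shows better availability and/or lower cost.
        "Mode": "ForecastOnly",
    },
)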
Monitoring and observability are key elements of the AWS Well-Architected Framework. Now we also offer CloudWatch metrics for your predictive scaling policies so that you can programmatically monitor your predictive scaling policy for demand pattern changes or prolonged periods of inaccurate predictions. This enables you to monitor the key performance metrics and makes it easier to adopt AWS Well-Architected best practices.
In the following sections, we deep dive into the details of these two features.
Recommendations for predictive scaling
Once you set up an Auto Scaling group with a predictive scaling policy in Forecast Only mode, as explained in this introduction to predictive scaling blog post, you can review the results of the forecast visually and adjust any parameters to more accurately reflect the behavior that you desire. Evaluating simply on the basis of visualization may not be very intuitive if the scaling patterns are erratic. Moreover, if you keep higher minimum capacities, the graph may show a flat line for the actual capacity, because your Auto Scaling group capacity is an outcome of the existing scaling policy configurations and the minimum capacity that you configured. This makes it difficult to judge whether the lower capacity predicted by predictive scaling would leave your Auto Scaling group under-scaled.
This new feature provides prescriptive guidance on switching predictive scaling to Forecast and Scale mode based on availability and cost savings. To determine the availability and cost savings, we compare the predictions against the actual capacity and the optimal, required capacity. This required capacity is inferred based on whether your instances were running at a higher or lower value than the target value for the scaling metric that you defined as part of the predictive scaling policy configuration. For example, if an Auto Scaling group is running 10 instances at 20% CPU utilization while the target defined in the predictive scaling policy is 40%, then the instances are running under-utilized by 50% and the required capacity is assumed to be 5 instances (half of your current capacity). For an Auto Scaling group, based on the time range in which you’re interested (two weeks by default), we aggregate the cost savings and availability impact of predictive scaling. The availability impact measures the amount of time that the actual metric value was higher than the target value that you defined as optimal for each policy. Similarly, cost savings measures the aggregated savings based on the capacity utilization of the underlying Auto Scaling group for each defined policy. The final cost and availability figures lead to a recommendation based on the following:
If availability increases (or remains same) and cost reduces (or remains same), then switch on Forecast and Scale
If availability reduces, then disable predictive scaling
If an availability increase comes at an increased cost, then you should make the call based on your cost-availability tradeoff threshold
Figure 1: Predictive Scaling Recommendations on EC2 Auto Scaling console
The preceding figure shows how the console reflects the recommendation for a predictive scaling policy. You get information on whether the policy can lead to higher availability and lower cost, which leads to a recommendation to switch to Forecast and Scale. To achieve this cost saving, you might have to lower your minimum capacity and aim for higher utilization in dynamic scaling policies.
To get the most value from this feature, we recommend that you create multiple predictive scaling policies in Forecast Only mode with different configurations, choosing different metrics and/or different target values. The target value is an important lever that changes how aggressive the capacity forecasts must be. A lower target value increases your capacity forecast, resulting in better availability for your application. However, this also means more spend on Amazon EC2. Similarly, a higher target value can leave you under-scaled while reactive scaling bridges the gap in just a few minutes. Separate estimates of cost and availability impact are provided for each of the predictive scaling policies. We recommend using a policy if either availability or cost improves and the other variable improves or stays the same. As long as there is a predictable pattern, Auto Scaling enhanced with predictive scaling maintains high availability for your applications.
Continuous Monitoring of predictive scaling
Once you’re using a predictive scaling policy in Forecast and Scale mode based on the recommendation, you must monitor the policy for demand pattern changes or inaccurate predictions. We introduced two new CloudWatch metrics for predictive scaling called ‘PredictiveScalingLoadForecast’ and ‘PredictiveScalingCapacityForecast’. Using the CloudWatch metric math feature, you can create a customized metric that measures the accuracy of predictions. For example, to monitor whether your policy is over- or under-forecasting, you can publish separate metrics to measure the respective errors. In the following graphic, we show how metric math expressions can be used to create a mean absolute error for over-forecasting on the load forecasts. Because predictive scaling can only increase capacity, it is useful to alert when the policy is excessively over-forecasting to prevent unnecessary cost.
Figure 2: Graphing an accuracy metric using metric math on CloudWatch
In the previous graph, the total CPU Utilization of the Auto Scaling group is represented by m1 metric in orange color while the predicted load by the policy is represented by m2 metric in green color. We used the following expression to get the ratio of over-forecasting error with respect to the actual value.
IF((m2-m1)>0, (m2-m1), 0)/m1
Next, we will set up an alarm to automatically send notifications using Amazon Simple Notification Service (Amazon SNS). You can create similar accuracy monitoring for capacity forecasts, but remember that once the policy is in Forecast and Scale mode, it already starts influencing the actual capacity. Hence, putting alarms on load forecast accuracy might be more intuitive, as load is generally independent of the capacity of an Auto Scaling group.
Figure 3: Creating a CloudWatch Alarm on the accuracy metric
In the above screenshot, we have set an alarm that triggers when our custom accuracy metric goes above 0.02 (2 percent) for 10 out of the last 12 data points, which translates to 10 of the last 12 hours. We prefer to alarm on a greater number of data points so that we get notified only when predictive scaling is consistently giving inaccurate results.
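The same alarm can be created programmatically. The following boto3 sketch builds the over-forecast error metric with metric math and alarms to an SNS topic; the metric namespaces, dimensions, period, and statistic are assumptions that you should match to the metrics you actually graphed for your Auto Scaling group.

import boto3

cloudwatch = boto3.client("cloudwatch")

ASG_NAME = "my-asg"                       # placeholder
POLICY_NAME = "cpu40-forecast-only"       # placeholder
SNS_TOPIC_ARN = "arn:aws:sns:us-west-2:111122223333:predictive-scaling-alerts"  # placeholder

cloudwatch.put_metric_alarm(
    AlarmName="predictive-scaling-over-forecast",
    # 10 out of the last 12 hourly data points must breach before alarming.
    EvaluationPeriods=12,
    DatapointsToAlarm=10,
    Threshold=0.02,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=[SNS_TOPIC_ARN],
    Metrics=[
        {
            "Id": "m1",  # actual load; assumed here to be ASG-level CPU utilization
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [{"Name": "AutoScalingGroupName", "Value": ASG_NAME}],
                },
                "Period": 3600,
                "Stat": "Average",
            },
            "ReturnData": False,
        },
        {
            "Id": "m2",  # load forecast published by the predictive scaling policy
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/AutoScaling",
                    "MetricName": "PredictiveScalingLoadForecast",
                    "Dimensions": [
                        {"Name": "AutoScalingGroupName", "Value": ASG_NAME},
                        {"Name": "PolicyName", "Value": POLICY_NAME},
                    ],
                },
                "Period": 3600,
                "Stat": "Average",
            },
            "ReturnData": False,
        },
        {
            "Id": "e1",
            "Expression": "IF((m2-m1)>0,(m2-m1),0)/m1",
            "Label": "Over-forecast error ratio",
            "ReturnData": True,
        },
    ],
)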
Conclusion
With these new features, you can make a more informed decision about whether predictive scaling is right for you and which configuration makes the most sense. We recommend that you start off with Forecast Only mode and switch over to Forecast and Scale based on the recommendations. Once in Forecast and Scale mode, predictive scaling starts taking proactive scaling actions so that your instances are launched and ready to contribute to the workload in advance of the predicted demand. Then continuously monitor the forecast to maintain high availability and cost optimization of your applications. You can also use the new predictive scaling metrics and CloudWatch features, such as metric math, alarms, and notifications, to monitor and take actions when predictions are off by a set threshold for prolonged periods.
This post is co-written with Byungjun Choi and Sangha Yang from SikSin.
SikSin is a technology platform connecting customers with restaurant partners to serve their multiple needs. Customers use the SikSin platform to search and discover restaurants, read and write reviews, and view photos. From the restaurateurs’ perspective, SikSin enables restaurant partners to engage and acquire customers in order to grow their business. SikSin has partnerships with 850 corporate companies and more than 50,000 restaurants. They issue restaurant e-vouchers to more than 220,000 members, including individuals as well as corporate members. The SikSin platform serves more than 3 million users a month. SikSin was listed in the top 100 of the Financial Times’s Asia-Pacific high-growth companies in 2022.
SikSin was looking to deliver improved customer experiences and increase customer engagement. SikSin confronted two business challenges:
Customer engagement – SikSin maintains data on more than 750,000 restaurants and has more than 4,000 restaurant articles (and growing). SikSin was looking for a personalized and customized approach to provide restaurant recommendations for their customers and get them engaged with the content, thereby providing a personalized customer experience.
Data analysis activities – The SikSin Food Service team experienced difficulties with report generation due to data scattered across multiple systems. The team previously had to submit a request to the IT team and then wait for answers that might already be outdated. The IT team, in turn, needed to manually pull data out of files, databases, and applications, and then combine them for every request, which is a time-consuming activity. The SikSin Food Service team wanted to view web analytics log data by multiple dimensions, such as customer profiles and places. Examples include page views, conversion rate, and channels.
To overcome these two challenges, SikSin participated in the AWS Data Lab program to assist them in building a prototype solution. The AWS Data Lab offers accelerated, joint-engineering engagements between customers and AWS technical resources to create tangible deliverables that accelerate data and analytics modernization initiatives. The Build Lab is a 2–5-day intensive build with a technical customer team.
In this post, we share how SikSin built the basis for accelerating their data project with the help of the Data Lab and Amazon Personalize.
Use cases
The Data Lab team and SikSin team had three consecutive meetings to discuss business and technical requirements, and decided to work on two use cases to resolve their two business challenges:
Build personalized recommendations – SikSin wanted to deploy a machine learning (ML) model to produce personalized content on the landing page of the platform, particularly restaurants and restaurant articles. The success criteria were to increase the number of page views per session and membership subscriptions, reduce the bounce rate, and ultimately engage more visitors and members with SikSin’s content.
Establish self-service analytics – SikSin’s business users wanted to reduce time to insight by making data more accessible while removing the reliance on the IT team by giving business users the ability to query data. The key was to consolidate web logs from BigQuery and operational business data from Amazon Relational Database Service (Amazon RDS) into a single place and analyze the data whenever needed.
Solution overview
The following architecture depicts what the SikSin team built in the 4-day Build Lab. There are two parts in the solution to address SikSin’s business and technical requirements. The first part (1–8) is for building personalized recommendations, and the second part (A–D) is for establishing self-service analytics.
SikSin deployed an ML model to produce personalized content recommendations by using the following AWS services:
AWS Database Migration Service (AWS DMS) helps migrate databases to AWS quickly and securely with minimal downtime. The SikSin team used AWS DMS to perform a full load that brings data from the database tables into Amazon Simple Storage Service (Amazon S3) as the target. Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance. An AWS Glue crawler populates the AWS Glue Data Catalog with the data schema definitions (in a landing folder).
An AWS Lambda function checks if any previous files still exist in the landing folder and archives the files into a backup folder, if any.
AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, ML, and application development. The SikSin team created AWS Glue Spark extract, transform, and load (ETL) jobs to prepare input datasets for ML models. These datasets are used to train ML models in bulk mode. There are a total of five datasets for training and two datasets for batch inference jobs.
Amazon Personalize allows developers to quickly build and deploy curated recommendations and intelligent user segmentation at scale using ML. Because Amazon Personalize can be tailored to your individual needs, you can deliver the right customer experience at the right time and in the right place. Users select existing ML models (also known as recipes), train models, and run batch inference to make recommendations.
An Amazon Personalize batch inference job generates predictions for each line of input data (restaurants and restaurant articles) and produces ML-generated recommendations in the designated S3 output folder (a sketch of such a call appears after this list). The recommendation records are surfaced using interaction data, product data, and predictive models. An AWS Glue crawler populates the AWS Glue Data Catalog with the data schema definitions (in an output folder).
The SikSin team applied business logics and filters in an AWS Glue job to prepare the final datasets for recommendations.
AWS Step Functions enables you to build scalable, distributed applications using state machines. The SikSin team used AWS Step Functions Workflow Studio to visually create, run, and debug workflow runs. This workflow is triggered based on a schedule. The process includes data ingestion, cleansing, processing, and all steps defined in Amazon Personalize. This also involves managing run dependencies, scheduling, error-catching, and concurrency in accordance with the logical flow of the pipeline.
Amazon Simple Notification Service (Amazon SNS) sends notifications. The SikSin team used Amazon SNS to send a notification via email and Google Hangouts with a Lambda function as a target.
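As referenced above, a batch inference job of this kind can be started with a boto3 call along the following lines. The job name, solution version ARN, S3 paths, and IAM role are placeholders for illustration; SikSin's actual resources are not shown here.

import boto3

personalize = boto3.client("personalize")

# All ARNs and S3 paths below are placeholders.
personalize.create_batch_inference_job(
    jobName="restaurant-recommendations-batch",
    solutionVersionArn="arn:aws:personalize:ap-northeast-2:111122223333:solution/restaurant-recs/version-1",
    jobInput={"s3DataSource": {"path": "s3://example-personalize-bucket/input/users.json"}},
    jobOutput={"s3DataDestination": {"path": "s3://example-personalize-bucket/output/"}},
    roleArn="arn:aws:iam::111122223333:role/PersonalizeBatchInferenceRole",
)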
To establish a self-service analytics environment to enable business users to perform data analysis, SikSin used the following services:
The Google BigQuery Connector for AWS Glue simplifies the process of connecting AWS Glue jobs to extract data from BigQuery. The SikSin team used the connector to extract web analytics logs from BigQuery and load them to an S3 bucket.
AWS Glue DataBrew is a visual data preparation tool that makes it easy for data analysts and data scientists to clean and normalize data to prepare it for analytics and ML. You can choose from over 250 pre-built transformations to automate data preparation tasks, all without the need to write any code. The SikSin Food Service team used it to visually inspect large datasets and shape the data for their data analysis activities. An S3 bucket (in the intermediate folder) contains business operational data such as customers, places, articles, and products, and reference data loaded from AWS DMS and web analytics logs and data by AWS Glue jobs.
An AWS Glue Python shell job runs to cleanse and join data and apply business rules to prepare the data for queries. The SikSin team used AWS SDK for pandas, an AWS Professional Services open-source Python initiative that extends the power of the pandas library to AWS, connecting DataFrames and AWS data-related services. The output files are stored in Apache Parquet format in a single folder. An AWS Glue crawler populates the data schema definitions (in an output folder) into the AWS Glue Data Catalog.
The SikSin Food Service team used Amazon Athena and Amazon QuickSight to query and visualize the data. Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. QuickSight is an ML-powered business intelligence service built for the cloud.
Business outcomes
The SikSin Food Service team is now able to access the available data for performing data analysis and manipulation operations efficiently, as well as for getting insights on their own. This immediately allows the team as well as other lines of business to understand how customers are interacting with SikSin’s contents and services on the platform and make decisions sooner. For example, with the data output, the Food Service team was able to provide insights and data points for their external stakeholder and customer to initiate a new business idea. Moreover, the team shared, “We anticipate the recommendations and personalized content will increase conversion rates and customer engagement.”
The AWS Data Lab enabled SikSin to review and assess thoroughly what data is actually usable and available. With SikSin’s objective to successfully build a data pipeline for data analytics purposes, the SikSin team came to realize the importance of data cleansing, categorization, and standardization. “Only fruitful analysis and recommendation are possible when data is intact and properly cleansed,” said Byungjun Choi (the Head of SikSin’s Food Service Team). After completing the Data Lab, SikSin completed and set up an internal process that can streamline the data cleansing pipeline.
SikSin was stuck in the research phase of looking for a solution to solve their personalization challenges. The AWS Data Lab enabled the SikSin IT Team to get hands-on with the technology and build a minimum viable product (MVP) to explore how Amazon Personalize would work in their environment with their data. They achieved this via the Data Lab by adopting AWS DMS, AWS Glue, Amazon Personalize, and Step Functions. “Though it is still the early stage of building a prototype, I am very confident with the right enablement provided from AWS that an effective recommendation system can be adopted on production level very soon,” commented Sangha Yang (the Head of SikSin IT Team).
Conclusion
As a result of the 4-day Build Lab, the SikSin team left with a working prototype that is custom fit to their needs, gaining a clear path forward for enabling end-users to gain valuable insights into its data. The Data Lab allowed the SikSin team to accelerate the architectural design and prototype build of this solution by months. Based on the lessons and learnings obtained from Data Lab, SikSin is planning to launch a Global News Content Platform equipped with a recommendation feature in FY23.
As demonstrated by SikSin’s achievements, Amazon Personalize allows developers to quickly build and deploy curated recommendations and intelligent user segmentation at scale using ML. Because Amazon Personalize can be tailored to your individual needs, you can deliver the right customer experience at the right time and in the right place, whether you want to optimize recommendations, target customers more accurately, maximize your data’s value, or promote items using business rules.
To accelerate your digital transformation with ML, the Data Lab program is available to support you by providing prescriptive architectural guidance on a particular use case, sharing best practices, and removing technical roadblocks. You’ll leave the engagement with an architecture or working prototype that is custom fit to your needs, a path to production, and deeper knowledge of AWS services.
Please contact your AWS Account Manager or Solutions Architect to get started. If you don’t have an AWS Account Manager, please contact Sales.
About the Authors
Byungjun Choi is the Head of SikSin Food Service at SikSin.
Sangha Yang is the Head of the IT team at SikSin.
Younggu Yun is a Senior Data Lab Architect at AWS. He works with customers around the APAC region to help them achieve business goals and solve technical problems by providing prescriptive architectural guidance, sharing best practices, and building innovative solutions together.
Junwoo Lee is an Account Manager at AWS. He provides technical and business support to help customers resolve their problems and enrich their customer journey by introducing local and global programs to his customers.
Jinwoo Park is a Senior Solutions Architect at AWS. He provides technical support for AWS customers to succeed with their cloud journey. He helps customers build more secure, efficient, and cost-optimized architectures and solutions, and delivers best practices and workshops.
An Amazon CodeCatalyst Dev Environment is a cloud-based development environment that you can use in CodeCatalyst to quickly work on the code stored in the source repositories of your project. The project tools and application libraries included in your Dev Environment are defined by a devfile in the source repository of your project.
Introduction
In the previous CodeCatalyst post, Team Collaboration with Amazon CodeCatalyst, I focused on CodeCatalyst’s collaboration capabilities and how they related to The Unicorn Project’s main protagonist. At the beginning of Chapter 2, Maxine is struggling to configure her development environment. She is two days into her new job and still cannot build the application code. She has identified over 100 dependencies she is missing. The documentation is out of date and nobody seems to know where the dependencies are stored. I can sympathize with Maxine. In this post, I will focus on managing development environments to show how CodeCatalyst removes the burden of managing workload-specific configurations and produces reliable on-demand development environments.
Prerequisites
If you would like to follow along with this walkthrough, you will need to:
Have an AWS account associated with your space and have the IAM role in that account. For more information about the role and role policy, see Creating a CodeCatalyst service role.
Walkthrough
As with the previous posts in our CodeCatalyst series, I am going to use the Modern Three-tier Web Application blueprint. Blueprints provide sample code and CI/CD workflows to help make getting started easier across different combinations of programming languages and architectures. To follow along, you can re-use a project you created previously, or you can refer to a previous post that walks through creating a project using the blueprint.
One of the most difficult aspects of my time spent as a developer was finding ways to quickly contribute to a new project. Whenever I found myself working on a new project, getting to the point where I could meaningfully contribute to its code base was always more difficult than writing the actual code. A major contributor to this inefficiency was the lack of a process for managing my local development environment. I will explore how CodeCatalyst can help solve this challenge. For this walkthrough, I want to add a new test that will allow local testing of Amazon DynamoDB. To achieve this, I will use a CodeCatalyst dev environment.
CodeCatalyst Dev Environments are managed cloud-based development environments that you can use to access and modify code stored in a source repository. You can launch a project specific dev environment that will automate check-out of your project’s repo or you can launch an empty environment to use for accessing third-party source providers. You can learn more about CodeCatalyst Dev Environments in the CodeCatalyst User Guide.
Figure 1. Creating a new Dev Environment
To begin, I navigate to the Dev Environments page under the Code section of the navigation menu. I then use Create Dev Environment to launch my environment. For this post, I am using the AWS Cloud9 IDE, but you can follow along with the IDE you are most comfortable using. On the next screen, I select Work in New Branch, assign local_testing as the new branch name, and branch from main. I leave the remaining default options and choose Create.
Figure 2. Dev Environment Create Options
After waiting less than a minute, my IDE is ready in a new tab and I am ready to begin work. The first thing I see in my dev environment is an information window asking me if I want to navigate to the Dev Environment Settings. Because I need to enable local testing of DynamoDB, not only for myself but also for other developers that will collaborate on this project, I need to update the project’s devfile. I navigate to the settings tab because it contains information on the project’s devfile and allows me to access the file for editing.
Figure 3. Toolkit Welcome Banner
Devfiles allow you to model a Dev Environment’s configuration and dependencies so that you can reproduce consistent Dev Environments and reduce the manual effort in setting up future environments. The tools and application libraries included in your Dev Environment are defined by the devfile in the source repository of your project. Since this project was created from a blueprint, there is one provided. For blank projects, a default CodeCatalyst devfile is created when you first launch an environment. To learn more about the devfile, see https://devfile.io.
In the settings tab, I find a link to the devfile that is configured. When I click the edit button, a new file tab launches and I can make changes. I first add an env section to the container that hosts our dev environment. By adding an environment variable and value, any new dev environment created from this project’s repository will include that value. Next, I add a second container to the dev environment that will run DynamoDB locally. I do this by adding a new container component, using Amazon’s verified DynamoDB Docker image for my environment. Attaching additional images allows you to extend the dev environment and include tools or services that can be made available locally. My updates are highlighted in the green sections below.
Figure 4. Example Devfile
I save my changes and navigate back to the Dev Environment Settings tab. I notice that my changes were automatically detected, and I am prompted to restart my development environment for the changes to take effect. Modifications to the devfile require a restart. You can restart a dev environment using the toolkit or from the CodeCatalyst UI.
Figure 5. Dev Environment Settings
After waiting a few seconds for my dev environment to restart, I am ready to write my test. I use the IDE’s file explorer, expand the repo’s ./tests/unit folder, and create a new file named test_dynamodb.py. Using the IS_LOCAL environment variable I configured in the devfile, I can include a conditional in my test that sets the endpoint that Amazon’s Python SDK (Boto3) will use to connect to the DynamoDB service. This way, I can run tests locally before pushing my changes and still have tests complete successfully in my project’s workflow. My full test file is included below.
Figure 6. Dynamodb test file
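For readers following along in text form, a minimal sketch of such a test could look like the following. The table name and attributes are placeholders rather than the blueprint's actual schema; the key point is the conditional endpoint based on IS_LOCAL.

import os
import boto3

def dynamodb_resource():
    # Point Boto3 at the local DynamoDB container when running inside the Dev Environment.
    if os.environ.get("IS_LOCAL"):
        return boto3.resource("dynamodb", endpoint_url="http://localhost:8000")
    return boto3.resource("dynamodb")

def test_put_and_get_item():
    ddb = dynamodb_resource()
    table = ddb.create_table(
        TableName="LocalTestTable",  # placeholder table name
        KeySchema=[{"AttributeName": "Id", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "Id", "AttributeType": "S"}],
        BillingMode="PAY_PER_REQUEST",
    )
    table.wait_until_exists()
    table.put_item(Item={"Id": "abc123", "Name": "local-test"})
    item = table.get_item(Key={"Id": "abc123"})["Item"]
    assert item["Name"] == "local-test"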
Now that I have completed my changes to the dev environment using the devfile and added a test, I am ready to run my test locally to verify. I will use pytest to ensure the tests are passing before pushing any changes. From the repo’s root folder, I run the command pip install -r requirements-dev.txt. Once my dependencies are installed, I then issue the command pytest -k unit. All tests pass as I expect.
Figure 7. Pytest test results
Rather than manually installing my development dependencies in each environment, I could also use the devfile to include commands and automate the execution of those commands during the dev environment lifecycle events. You can refer to the links for commands and events for more information.
Finally, I am ready to push my changes back to my CodeCatalyst source repository. I use the git extension of Cloud9 to review my changes. After reviewing my changes are what I expect, I use the git extension to stage, commit, and push the new test file and the modified devfile so other collaborators can adopt the improvements I made.
Figure 8. Changes reviewed in CodeCatalyst Cloud9 git extension.
Cleanup
If you have been following along with this workflow, you should delete the resources you deployed so you do not continue to incur charges. First, delete the two stacks that CDK deployed using the AWS CloudFormation console in the AWS account you associated when you launched the blueprint. These stacks will have names like mysfitsXXXXXWebStack and mysfitsXXXXXAppStack. Second, delete the project from CodeCatalyst by navigating to Project settings and choosing Delete project.
Conclusion
In this post, you learned how CodeCatalyst provides configurable on-demand dev environments. You also learned how devfiles help you define a consistent experience for developing within a CodeCatalyst project. Please follow our DevOps blog channel as I continue to explore how CodeCatalyst solves Maxine’s and other builders’ challenges.
In the last post, OfferUp modernized its DevOps platform with Amazon EKS and Flagger to accelerate time to market, we talked about hypergrowth and the technical challenges encountered by OfferUp in its existing DevOps platform. As a reminder, we presented how OfferUp modernized its DevOps platform with Amazon Elastic Kubernetes Service (Amazon EKS) and Flagger to increase developer velocity, automate faster deployment, and achieve a lower cost of ownership.
In this post, we discuss the technical steps to build a DevOps platform that enables the progressive deployment of microservices on Amazon EKS. Progressive delivery exposes a new version of the software incrementally to ingress traffic and continuously measures the success rate of the metrics before allowing all of the traffic to flow to the newer version of the software. Flagger is a graduated Cloud Native Computing Foundation (CNCF) project that enables progressive canary delivery, along with blue/green and A/B testing, while measuring metrics like HTTP/gRPC request success rate and latency. Flagger shifts and routes traffic between app versions using a service mesh or an ingress controller.
We leverage the Gloo ingress controller for traffic routing; Prometheus, Datadog, and Amazon CloudWatch for application metrics analysis; and Slack to send notifications. Flagger posts messages to Slack when a deployment has been initialized, when a new revision has been detected, and when the canary analysis fails or succeeds.
Prerequisite steps to build the modern DevOps platform
You need an AWS account and an AWS Identity and Access Management (IAM) user to build the DevOps platform. If you don’t have an AWS account with Administrator access, then create one now. Create an IAM user and assign it an admin role. You can build this platform in any AWS Region; however, I will use the us-west-1 Region throughout this post. You can use a laptop (Mac or Windows) or an Amazon Elastic Compute Cloud (Amazon EC2) instance as a client machine to install all of the necessary software to build the GitOps platform. For this post, I launched an Amazon EC2 instance (with the Amazon Linux 2 AMI) as the client and installed all of the prerequisite software. You need the awscli, git, eksctl, kubectl, and helm applications to build the GitOps platform. Here are the prerequisite steps:
Create a named profile (eks-devops) in the config and credentials files:
aws configure --profile eks-devops
AWS Access Key ID [None]: xxxxxxxxxxxxxxxxxxxxxx
AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxx
Default region name [None]: us-west-1
Default output format [None]:
View and verify your current IAM profile:
export AWS_PROFILE=eks-devops
aws sts get-caller-identity
If the Amazon EC2 instance doesn’t have git preinstalled, then install git in your Amazon EC2 instance:
sudo yum update -y
sudo yum install git -y
Check git version
git version
Git clone the repo and download all of the prerequisite software in the home directory.
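If your client machine doesn’t already have eksctl, kubectl, and helm, one possible way to install them on Amazon Linux 2 is sketched below; the kubectl version shown is illustrative, so match it to your cluster version, and prefer the repository’s own setup scripts if it provides them.
# Install eksctl (latest release)
curl -sL "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
# Install kubectl (pick a version close to your cluster version)
curl -LO "https://dl.k8s.io/release/v1.24.0/bin/linux/amd64/kubectl"
chmod +x kubectl && sudo mv kubectl /usr/local/bin
# Install helm v3
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash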
Amazon ECR is a fully managed container registry that makes it easy for developers to share and deploy container images and artifacts. The ecr-setup.sh script creates a new Amazon ECR repository and pushes the podinfo images (6.0.0, 6.0.1, 6.0.2, 6.1.0, 6.1.5, and 6.1.6) to it. Run the ecr-setup.sh script with two parameters: the ECR repository name (e.g., ps-flagger-repository) and the Region (e.g., us-west-1), as shown below.
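Assuming the script takes the repository name and Region as positional arguments (an assumption based on the description above), the invocation would look something like this:
# Create the ECR repository and push the podinfo images used in this walkthrough
./ecr-setup.sh ps-flagger-repository us-west-1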
Technical steps to build the modern DevOps platform
This post shows you how to use the Gloo Edge ingress controller and Flagger to automate canary releases for progressive deployment on an Amazon EKS cluster. Flagger requires a Kubernetes cluster v1.16 or newer and Gloo Edge ingress 1.6.0 or newer. This post provides a step-by-step approach to install the Amazon EKS cluster with a managed node group, the Gloo Edge ingress controller, and Flagger for Gloo in the Amazon EKS cluster. Once the cluster, metrics infrastructure, and Flagger are installed, we can install the sample application itself. We’ll use the standard Podinfo application used in the Flagger project and the accompanying loadtester tool. The Flagger “podinfo” backend service will be called by Gloo’s “VirtualService”, which is the root routing object for the Gloo Gateway. A virtual service describes the set of routes to match for a set of domains. We’ll automate the canary promotion, with the new image of the “podinfo” service, from version 6.0.0 to version 6.0.1. We’ll also create a scenario by injecting an error for automated canary rollback while deploying version 6.0.2.
Use myeks-cluster.yaml to create your Amazon EKS cluster with a managed node group. The myeks-cluster.yaml deployment file sets the cluster name to ps-eks-66, the Region to us-west-1, the availability zones to [us-west-1a, us-west-1b], the Kubernetes version to 1.24, and the node group Amazon EC2 instance type to m5.2xlarge. You can change these values if you want to build the cluster in a different Region or availability zones; a sketch of the configuration and the create command follow.
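For reference, a minimal sketch of such an eksctl configuration and the corresponding create command might look like the following. The node group name and desired capacity are assumptions; the file in the cloned repository is authoritative.
cat <<'EOF' > myeks-cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ps-eks-66
  region: us-west-1
  version: "1.24"
availabilityZones: ["us-west-1a", "us-west-1b"]
managedNodeGroups:
  - name: managed-ng-1        # node group name is an assumption
    instanceType: m5.2xlarge
    desiredCapacity: 2        # adjust to your needs
EOF
eksctl create cluster -f myeks-cluster.yaml --profile eks-devops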
Create a namespace called gloo-system and install Gloo with its Helm chart (a sample install sequence follows). Gloo Edge is an Envoy-based, Kubernetes-native ingress controller that facilitates and secures application traffic.
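A possible Helm-based installation of Gloo Edge, plus Flagger configured for the Gloo provider as described above; the release names and the bundled Prometheus flag are assumptions, not the only way to install these components.
# Gloo Edge ingress controller
kubectl create ns gloo-system
helm repo add gloo https://storage.googleapis.com/solo-public-helm
helm repo update
helm install gloo gloo/gloo --namespace gloo-system
# Flagger for Gloo, with a bundled Prometheus instance for metrics analysis
helm repo add flagger https://flagger.app
helm upgrade -i flagger flagger/flagger \
  --namespace gloo-system \
  --set prometheus.install=true \
  --set meshProvider=gloo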
[Optional] If you’re using Datadog as a monitoring tool, then deploy Datadog agents as a DaemonSet using the Datadog Helm chart. Replace RELEASE_NAME and DATADOG_API_KEY accordingly. If you aren’t using Datadog, then skip this step. For this post, we leverage the Prometheus open-source monitoring tool.
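One common way to deploy the agents with the Datadog Helm chart looks like the following; RELEASE_NAME and DATADOG_API_KEY are placeholders for your own values.
helm repo add datadog https://helm.datadoghq.com
helm repo update
helm install <RELEASE_NAME> datadog/datadog --set datadog.apiKey=<DATADOG_API_KEY>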
Integrate the Amazon EKS/Kubernetes cluster with the Datadog dashboard: go to the Datadog console and add the Kubernetes integration.
[Optional] If you’re using the Slack communication tool and have admin access, then Flagger can be configured to send alerts to the Slack chat platform by integrating the Slack alerting system with Flagger. If you don’t have admin access in Slack, then skip this step.
Create a namespace called apps; the applications and load testing service will be deployed into this namespace.
kubectl create ns apps
Create a deployment and a horizontal pod autoscaler for the custom application or service that the canary deployment will target.
kubectl -n apps apply -k app
kubectl get deployment -A
kubectl get hpa -n apps
Deploy the load testing service to generate traffic during the canary analysis.
kubectl -n apps apply -k tester
kubectl get deployment -A
kubectl get svc -n apps
Use apps-vs.yaml to create a Gloo virtual service definition that references a route table that will be generated by Flagger.
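If you aren’t using the copy from the cloned repository, a minimal sketch of such a virtual service might look like this; the domain and the delegated route table name are illustrative and assume Flagger generates a route table named podinfo in the apps namespace.
cat <<'EOF' > apps-vs.yaml
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: podinfo
  namespace: apps
spec:
  virtualHost:
    domains:
      - 'podinfo.example.com'
    routes:
      - matchers:
          - prefix: /
        delegateAction:
          ref:
            # Route table generated and managed by Flagger
            name: podinfo
            namespace: apps
EOF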
kubectl apply -f ./apps-vs.yaml
kubectl get vs -n apps
[Optional] If you have your own domain name, then open apps-vs.yaml in vi editor and replace podinfo.example.com with your own domain name to run the app in that domain.
Use canary.yaml to create a canary custom resource. Review the service, analysis, and metrics sections of the canary.yaml file.
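The canary.yaml in the repository is authoritative; as a rough sketch, a Flagger canary definition for the Gloo provider can look like the following, where the port, analysis intervals, thresholds, and load-test webhook values are illustrative.
cat <<'EOF' > canary.yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: apps
spec:
  provider: gloo
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    port: 9898
    targetPort: 9898
  analysis:
    interval: 30s
    threshold: 5
    maxWeight: 50
    stepWeight: 5
    metrics:
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 30s
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.apps/
        timeout: 5s
        metadata:
          cmd: "hey -z 2m -q 5 -c 2 -host podinfo.example.com http://gateway-proxy.gloo-system"
EOF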
kubectl apply -f ./canary.yaml
After a couple of seconds, Flagger will create the canary objects. When the bootstrap finishes, Flagger will set the canary status to “Initialized”.
kubectl -n apps get canary podinfo
NAME STATUS WEIGHT LASTTRANSITIONTIME
podinfo Initialized 0 2023-xx-xxTxx:xx:xxZ
Gloo automatically creates an ELB. Once the load balancer is provisioned and health checks pass, we can find the sample application at the load balancer’s public address. Note down the ELB’s Public address:
kubectl get svc -n gloo-system --field-selector 'metadata.name==gateway-proxy' -o=jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}{"\n"}'
Validate that your application is running; you’ll see output showing version 6.0.0.
curl <load balancer’s public address> -H "Host:podinfo.example.com"
Trigger progressive deployments and monitor the status
You can trigger a canary deployment by updating the application container image from 6.0.0 to 6.0.1.
kubectl -n apps set image deployment/podinfo podinfod=<ECR URI>:6.0.1
Flagger detects that the deployment revision changed and starts a new rollout.
kubectl -n apps describe canary/podinfo
Monitor all canaries; the promotion status condition can have one of the following values: Initialized, Waiting, Progressing, Promoting, Finalizing, Succeeded, or Failed.
watch kubectl get canaries --all-namespaces
curl <load balancer’s public address> -H "Host:podinfo.example.com"
Once the canary promotion is completed, validate your application. You can see that the version of the application has changed from 6.0.0 to 6.0.1.
[Optional] Open the podinfo application in your laptop’s browser.
Find out both of the IP addresses associated with the load balancer.
dig <load balancer’s public address>
Open the /etc/hosts file on your laptop and add both load balancer IP addresses to the file.
sudo vi /etc/hosts
<Public IP address of LB Target node> podinfo.example.com
e.g.
xx.xx.xxx.xxx podinfo.example.com
xx.xx.xxx.xxx podinfo.example.com
Type “podinfo.example.com” in your browser and you’ll find the application in a form similar to this:
Figure 1: Greetings from podinfo v6.0.1
Automated rollback
While the canary analysis is running, you’ll generate HTTP 500 errors and high latency to check whether Flagger pauses and rolls back the faulted version. Flagger performs an automatic rollback in the case of failure.
Introduce another canary deployment with podinfo image version 6.0.2 and monitor the status of the canary.
kubectl -n apps set image deployment/podinfo podinfod=<ECR URI>:6.0.2
Generate HTTP 500 errors or high latency from a separate terminal window.
Generate HTTP 500 errors:
watch curl -H 'Host:podinfo.example.com' <load balancer’s public address>/status/500
Generate high latency:
watch curl -H 'Host:podinfo.example.com' <load balancer’s public address>/delay/2
When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary, the canary is scaled to zero, and the rollout is marked as failed.
kubectl get canaries --all-namespaces
kubectl -n apps describe canary/podinfo
Cleanup
When you’re done experimenting, you can delete all of the resources created during this series to avoid any additional charges. Let’s walk through deleting all of the resources used.
Delete the Amazon EKS cluster. After you’ve finished with the cluster and nodes that you created for this tutorial, clean up by deleting them; one way to do this is shown below.
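Using eksctl with the same configuration file and profile as before:
# Deletes the cluster and its managed node group
eksctl delete cluster -f myeks-cluster.yaml --profile eks-devops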
This post explained the process for setting up an Amazon EKS cluster and how to leverage Flagger for progressive deployments along with Prometheus and the Gloo ingress controller. You can enhance the deployments by integrating Flagger with Slack, Datadog, and webhook notifications for progressive deployments. Amazon EKS removes the undifferentiated heavy lifting of managing and updating the Kubernetes cluster. Managed node groups automate the provisioning and lifecycle management of worker nodes in an Amazon EKS cluster, which greatly simplifies operational activities such as new Kubernetes version deployments.
We encourage you to look into modernizing your DevOps platform from monolithic architecture to microservice-based architecture with Amazon EKS, and leverage Flagger with the right Ingress controller for secured and automated service releases.
Further Reading
Journey to adopt Cloud-Native DevOps platform Series #1: OfferUp modernized DevOps platform with Amazon EKS and Flagger to accelerate time to market
This post is co-authored with Girish Kumar Chidananda from redBus.
redBus is one of the earliest adopters of AWS in India, and most of its services and applications are hosted on the AWS Cloud. AWS provided redBus the flexibility to scale their infrastructure rapidly while keeping costs extremely low. AWS has a comprehensive suite of services to cater to most of their needs, including providing customer support that redBus can vouch for.
In this post, we share redBus’s data platform architecture, and how various components are connected to form their data highway. We also discuss the challenges redBus faced in building dashboards for their real-time business intelligence (BI) use cases, and how they used Amazon QuickSight, a fast, easy-to-use, cloud-powered business analytics service that makes it easy for all employees within redBus to build visualizations and perform ad hoc analysis to gain business insights from their data, any time, and on any device.
About redBus
redBus is the world’s largest online bus ticketing platform built in India and serving more than 36 million happy customers around the world. Along with its bus ticketing vertical, redBus also runs a rail ticketing service called redRails and a bus and car rental service called rYde. It is part of the GO-MMT group, which is India’s leading online travel company, with an extensive brand portfolio that includes other prominent online travel brands like MakeMyTrip and Goibibo.
redBus’s data highway 1.0
redBus relies heavily on making data-driven decisions at every level, from traveler journey tracking and demand forecasting during high traffic to identifying and addressing bottlenecks in their bus operators’ signup process, and more. As redBus’s business started growing in terms of the number of cities and countries they operated in and the number of bus operators and travelers using the service in each city, the amount of incoming data also increased. The need to access and analyze the data in one place required them to build their own data platform, as shown in the following diagram.
In the following sections, we look at each component in more detail.
Data ingestion sources
With the data platform 1.0, the data is ingested from various sources:
Real time – The real-time data flows from redBus mobile apps and backend microservices whenever a passenger, bus operator, or application performs an operation like booking bus tickets, searching the bus inventory, uploading a KYC document, and more
Batch mode – Scheduled jobs fetch data from multiple persistent data stores like Amazon Relational Database Service (Amazon RDS), where the OLTP data from all its applications is stored; Apache Cassandra clusters, where the bus inventory from various operators is stored; ArangoDB, where the user identity graphs are stored; and more
Data cataloging
The real-time data is ingested into their self-managed Apache NiFi clusters, an open-source data platform that is used to clean, analyze, and catalog the data with its routing capabilities before sending the data to its destination.
Storage and analytics
redBus uses the following services for its storage and analytical needs:
Amazon Simple Storage Service (Amazon S3), an object storage service that provides the foundation for their data lake because of its virtually unlimited scalability and higher durability. Real-time data flows in from Apache Druid, and data from the other data stores flows in at regular intervals based on the schedules.
Apache Druid, an OLAP-style data store (data flows via Kafka Druid data loader), which computes facts and metrics against various dimensions during the data loading process.
Amazon Redshift, a cloud data warehouse service that helps you analyze exabytes of data and run complex analytical queries. redBus uses Amazon Redshift to store the processed data from Amazon S3 and the aggregated data from Apache Druid.
Querying and visualization
To make redBus as data-driven as possible, they ensured that the data is accessible to their SRE engineers, data engineers, and business analysts via a visualization layer. This layer features dashboards served using Apache Superset, an open-source data visualization application, and Amazon Athena, an interactive query service to analyze data in Amazon S3 using standard SQL, for ad hoc querying requirements.
The challenges
Initially, redBus handled data that was being ingested at the rate of 10 million events per day. Over time, as its business started growing, so did the data volume (from gigabytes to terabytes to petabytes), data ingestion per day (from 10 million to 320 million events), and its business intelligence dashboard needs. Soon after, they started facing challenges with their self-managed Superset’s BI capabilities, and the increased operational complexities.
Limited BI capabilities
redBus encountered the following BI limitations:
Inability to create visualizations from multiple data sources – Superset doesn’t allow creating visualizations from multiple tables within its data exploration layer. redBus data engineers had to have the tables joined beforehand at the data source level itself. In order to create a 360-degree view for redBus’s business stakeholders, it became inconvenient for data engineers to maintain multiple tables supporting the visualization layer.
No global filter for visuals in a dashboard – A global or primary filter across visuals in a dashboard is not supported in Superset. For example, consider there are visuals like Sales Wins by Region, YTD Revenue Realized by Region, Sales Pipeline by Region, and more in a dashboard, and a filter Region is added to the dashboard with values like EMEA, APAC, and US. The filter Region will only apply to one of the visuals, not the entire dashboard. However, dashboard users expected filtering across the dashboard.
Not a business-user friendly tool – Superset is highly developer centric when it comes to customization. For example, if a redBus business analyst had to customize a timed refresh that automatically re-queries every slice on a dashboard according to a pre-set value, then the analyst has to update the dashboard’s JSON metadata field. Therefore, having knowledge of JSON and its syntax is mandatory for doing any customization on the visuals or dashboard.
Increased operational cost
Although Superset is open source, which means there are no licensing costs, it also means there is more effort in maintaining all the components required for it to function as an enterprise-grade BI tool. redBus has deployed and maintained a web server (Nginx) fronted by an Application Load Balancer to do the load balancing; a metadata database server (MySQL) where Superset stores its internal information like users, slices, and dashboard definitions; an asynchronous task queue (Celery) for supporting long-running queries; a message broker (RabbitMQ); and a distributed caching server (Redis) for caching the results, charting data, and more on Amazon Elastic Compute Cloud (Amazon EC2) instances. The following diagram illustrates this architecture.
redBus’s DevOps team had to do the heavy lifting of provisioning the infrastructure, taking backups, scaling the components manually as needed, upgrading the components individually, and more. It also required a Python web developer to be around for making the configurational changes so all the components work together seamlessly. All these manual operations increased the total cost of ownership for redBus.
Journey towards QuickSight
redBus started exploring BI solutions primarily around a couple of its dashboarding requirements:
BI dashboards for business stakeholders and analysts, where the data is sourced via Amazon S3 and Amazon Redshift.
A real-time application performance monitoring (APM) dashboard to help their SRE engineers and developers identify the root cause of an issue in their microservices deployment so they can fix the issues before they affect their customer’s experience. In this case, the data is sourced via Druid.
QuickSight fit into most of redBus’s BI dashboard requirements, and in no time their data platform team started with a proof of concept (POC) for a couple of their complex dashboards. At the end of the POC, which spanned a month’s time, the team shared their findings.
First, QuickSight is rich in BI capabilities, including the following:
It’s a self-service BI solution with drag-and-drop features that could help redBus analysts comfortably use it without any coding efforts.
Visualizations from multiple data sources in a single dashboard could help redBus business stakeholders get a 360-degree view of sales, forecasting, and insights in a single pane of glass.
Cascading filters across visuals and across sheets in a dashboard are much-needed features for redBus’s BI requirements.
QuickSight offers Excel-like visuals—tables with calculations, pivot tables with cell grouping, and styling are attractive for the viewers.
The Super-fast, Parallel, In-memory Calculation Engine (SPICE) in QuickSight could help redBus scale to hundreds of thousands of users, who can all simultaneously perform fast interactive analysis across a wide variety of AWS data sources.
Off-the-shelf ML insights and forecasting at no additional cost would allow redBus’s data science team to focus on ML models other than sales forecasting and similar models.
Built-in row-level security (RLS) could allow redBus to grant filtered access for their viewers. For example, redBus has many business analysts who manage different countries. With RLS, each business analyst only sees data related to their assigned country within a single dashboard.
redBus uses OneLogin as its identity provider, which supports Security Assertion Markup Language 2.0 (SAML 2.0). With the help of identity federation and single sign-on support from QuickSight, redBus could provide a simple onboarding flow for their QuickSight users.
QuickSight offers built-in alerts and email notification capabilities.
Secondly, QuickSight is a fully managed, cloud-native, serverless BI service offering from AWS, with the following features:
redBus engineers don’t need to focus on the heavy lifting of provisioning, scaling, and maintaining their BI solution on EC2 instances.
QuickSight offers native integration with AWS services like Amazon Redshift, Amazon S3, and Athena, and other popular frameworks like Presto, Snowflake, Teradata, and more. QuickSight connects to most of the data sources that redBus already has except Apache Druid, because native integration with Druid was not available as of December 2022. For a complete list of the supported data sources, see Supported data sources.
The outcome
Considering all the rich features and lower total cost of ownership, redBus chose QuickSight for their BI dashboard requirements. With QuickSight, redBus’s data engineers have built a number of dashboards in no time to give insights from petabytes of data to business stakeholders and analysts. The redBus data highway evolved to bring business intelligence to a much wider audience in their organization, with better performance and faster time-to-value. As of November 2022, it combines QuickSight for business users and Superset for real-time APM dashboards (at the time of writing, QuickSight doesn’t offer a native connector to Druid), as shown in the following diagram.
Sales anomaly detection dashboard
Although there are many dashboards that redBus deployed to production, sales anomaly detection is one of the interesting dashboards that redBus built. It uses redBus’s proprietary sales forecasting model, which in turn is sourced by historical sales data from Amazon Redshift tables and real-time sales data from Druid tables, as shown in the following figure.
At regular intervals, the scheduled jobs feed the redBus forecasting model with real-time and historical sales data, and then the forecasted data is pushed into an Amazon Redshift table. The sales anomaly detection dashboard in QuickSight is served by the resultant Amazon Redshift table.
The following is one of the visuals from the sales anomaly detection dashboard. It’s built using a line chart representing hourly actual sales, predicted sales, and an alert threshold for a time series for a particular business cohort in redBus.
In this visual, each bar represents the number of sales anomalies triggered at a particular point in the time series.
redBus’s analysts could further drill down to the sales details and anomalies at the minute level, as shown in the following diagram. This drill-down feature comes out of the box with QuickSight.
Apart from the visuals, it has become one of viewers’ favorite dashboards at redBus due to the following notable features:
Because filtering across visuals is an out-of-the-box feature in QuickSight, a timestamp-based filter is added to the dashboard. This helps in filtering multiple visuals in the dashboard in a single click.
URL actions configured on the visuals help the viewers navigate to the context-sensitive in-house applications.
Email alerts configured on KPIs and Gauge visuals help the viewers get notifications on time.
Next steps
Apart from building new dashboards for their BI dashboard needs, redBus is taking the following next steps:
Exploring QuickSight Embedded Analytics for a couple of their application requirements to accelerate time to insights for users with in-context data visuals, interactive dashboards, and more directly within applications
Exploring QuickSight Q, which could enable their business stakeholders to ask questions in natural language and receive accurate answers with relevant visualizations that can help them gain insights from the data
Building a unified dashboarding solution using QuickSight covering all their data sources as integrations become available
Conclusion
In this post, we showed you how redBus built its data platform using various AWS services and Apache frameworks, the challenges the platform went through (especially in their BI dashboard requirements and challenges while scaling), and how they used QuickSight and lowered the total cost of ownership.
To know more about engineering at redBus, check out their medium blog posts. To learn more about what is happening in QuickSight or if you have any questions, reach out to the QuickSight Community, which is very active and offers several resources.
About the Authors
Girish Kumar Chidananda works as a Senior Engineering Manager – Data Engineering at redBus, where he has been building various data engineering applications and components for redBus for the last 5 years. Prior to starting his journey in the IT industry, he worked as a Mechanical and Control systems engineer in various organizations, and he holds an MS degree in Fluid Power Engineering from University of Bath.
Kayalvizhi Kandasamy works with digital-native companies to support their innovation. As a Senior Solutions Architect (APAC) at Amazon Web Services, she uses her experience to help people bring their ideas to life, focusing primarily on microservice architectures and cloud-native solutions using AWS services. Outside of work, she likes playing chess and is a FIDE rated chess player. She also coaches her daughters the art of playing chess, and prepares them for various chess tournaments.
In this post I will show you how to add a manual approval to AWS Cloud Development Kit (CDK) Pipelines to confirm security changes before deployment. With this solution, when a developer commits a change, CDK pipeline identifies an IAM permissions change, pauses execution, and sends a notification to a security engineer to manually approve or reject the change before it is deployed.
Introduction
In my role I talk to a lot of customers that are excited about the AWS Cloud Development Kit (CDK). One of the things they like is that L2 constructs often generate IAM and other security policies. This can save a lot of time and effort over hand coding those policies. Most customers also tell me that the policies generated by CDK are more secure than the policies they generate by hand.
However, these same customers are concerned that their security engineering team does not know what is in the policies CDK generates. In the past, these customers spent a lot of time crafting a handful of IAM policies that developers can use in their apps. These policies were well understood, but overly permissive because they were often reused across many applications.
Customers want more visibility into the policies CDK generates. Luckily, CDK provides a mechanism to approve security changes. If you are using CDK, you have probably been prompted to approve security changes when you run cdk deploy at the command line. That works great on a developer’s machine, but customers want to build the same confirmation into their continuous delivery pipeline. CDK provides a mechanism for this with the ConfirmPermissionsBroadening action. Note that ConfirmPermissionsBroadening is only supported by the AWS CodePipeline deployment engine.
Background
Before I talk about ConfirmPermissionsBroadening, let me review how CDK creates IAM policies. Consider the “Hello, CDK” application created in AWS CDK Workshop. At the end of this module, I have an AWS Lambda function and an Amazon API Gateway defined by the following CDK code.
// defines an AWS Lambda resource
const hello = new lambda.Function(this, 'HelloHandler', {
runtime: lambda.Runtime.NODEJS_14_X, // execution environment
code: lambda.Code.fromAsset('lambda'), // code loaded from "lambda" directory
handler: 'hello.handler' // file is "hello", function is "handler"
});
// defines an API Gateway REST API resource backed by our "hello" function.
new apigw.LambdaRestApi(this, 'Endpoint', {
handler: hello
});
Note that I did not need to define the IAM Role or Lambda Permissions. I simply passed a reference to the Lambda function to the API Gateway (line 10 above). CDK understood what I was doing and generated the permissions for me. For example, CDK generated the following Lambda Permission, among others.
Notice that CDK generated a narrowly scoped policy that allows a specific API (line 10 above) to call a specific Lambda function (line 7 above). This policy cannot be reused elsewhere. Later in the same workshop, I created a Hit Counter Construct using a Lambda function and an Amazon DynamoDB table. Again, I associated them using a single line of CDK code.
table.grantReadWriteData(this.handler);
As in the prior example, CDK generated a narrowly scoped IAM policy. This policy allows the Lambda function to perform certain actions (lines 4-11) on a specific table (line 14 below).
As you can see, CDK is doing a lot of work for me. In addition, CDK is creating narrowly scoped policies for each resource, rather than sharing a broadly scoped policy in multiple places.
CDK Pipelines Permissions Checks
Now that I have reviewed how CDK generates policies, let’s discuss how I can use this in a Continuous Deployment pipeline. Specifically, I want to allow CDK to generate policies, but I want a security engineer to review any changes using a manual approval step in the pipeline. Of course, I don’t want security to be a bottleneck, so I will only require approval when security statements or traffic rules are added. The pipeline should skip the manual approval if there are no new security rules added.
Let’s continue to use CDK Workshop as an example. In the CDK Pipelines module, I used CDK to configure AWS CodePipeline to deploy the “Hello, CDK” application I discussed above. One of the last things I do in the workshop is add a validation test using a post-deployment step. Adding a permission check is similar, but I will use a pre-deployment step to ensure the permission check happens before deployment.
First, I will import ConfirmPermissionsBroadening from the pipelines package
import {ConfirmPermissionsBroadening} from "aws-cdk-lib/pipelines";
Then, I can simply add ConfirmPermissionsBroadening to the deployStage using the addPre method as follows.
const deploy = new WorkshopPipelineStage(this, 'Deploy');
const deployStage = pipeline.addStage(deploy);

deployStage.addPre(
  new ConfirmPermissionsBroadening("PermissionCheck", {
    stage: deploy
  })
);

deployStage.addPost(
  // Post Deployment Test Code Omitted
);
Once I commit and push this change, a new manual approval step called PermissionCheck.Confirm is added to the Deploy stage of the pipeline. In the future, if I push a change that adds additional rules, the pipeline will pause here and await manual approval as shown in the screenshot below.
Figure 1. Pipeline waiting for manual review
When the security engineer clicks the review button, she is presented with the following dialog. From here, she can click the URL to see a summary of the change I am requesting which was captured in the build logs. She can also choose to approve or reject the change and add comments if needed.
Figure 2. Manual review dialog with a link to the build logs
When the security engineer clicks the review URL, she is presented with the following summary of security changes.
Figure 3. Summary of security changes in the build logs
The final feature I want to add is an email notification so the security engineer knows when there is something to approve. To accomplish this, I create a new Amazon Simple Notification Service (SNS) topic and subscription and associate it with the ConfirmPermissionsBroadening Check.
// Create an SNS topic and subscription for security approvals
const topic = new sns.Topic(this, 'SecurityApproval');
topic.addSubscription(new subscriptions.EmailSubscription('[email protected]'));

deployStage.addPre(
  new ConfirmPermissionsBroadening("PermissionCheck", {
    stage: deploy,
    notificationTopic: topic
  })
);
With the notification configured, the security engineer will receive an email when an approval is needed. She will have an opportunity to review the security change I made and assess the impact. This gives the security engineering team the visibility they want into the policies CDK is generating. In addition, the approval step is skipped if a change does not add security rules, so the security engineer does not become a bottleneck in the deployment process.
Conclusion
AWS Cloud Development Kit (CDK) automates the generation of IAM and other security policies. This can save a lot of time and effort but security engineering teams want visibility into the policies CDK generates. To address this, CDK Pipelines provides the ConfirmPermissionsBroadening action. When you add ConfirmPermissionsBroadening to your CI/CD pipeline, CDK will wait for manual approval before deploying a change that includes new security rules.
This post was co-written with Hardik Modi, AVP, Threat and Mitigation Products at NETSCOUT.
NETSCOUT Omnis Threat Horizon is a global cybersecurity awareness platform providing users with highly contextualized visibility into “over the horizon” threat activity on the global DDoS (Distributed Denial of Service) landscape—threats that could be impacting their industry, their customers, or their suppliers. It allows visitors to create custom profiles and understand DDoS activity that is being observed in near-real time through NETSCOUT’s ATLAS visibility platform. Users can create free accounts to create customized profiles that lead to a map-based visualization (as in the following screenshot) as well as tailored summary reporting. DDoS attacks can be impactful to services delivered over the internet. Visibility of this nature is key to anyone who wishes to understand what is happening on the threat landscape. Omnis Threat Horizon has been generally available since August 2019.
To provide continuous visibility at a low per-user cost (to enable a free service), the NETSCOUT development team chose a series of AWS technologies to power the collection, storage, analysis, warehousing, user authentication, and delivery of the application. In particular, they chose Amazon OpenSearch Service as the core analytics engine. They store all processed attack records in OpenSearch Service.
This post discusses the challenges and design patterns NETSCOUT used on its path to representing the details of roughly 10 million annual DDoS attacks in near-real time.
Background
NETSCOUT, through its Arbor product line, is a long-time provider of solutions for network visibility and DDoS mitigation for service providers and enterprises. Since 2007, NETSCOUT has operated a program called ATLAS, in which customers can opt to share anonymized data about the DDoS attacks they are observing on their network. As this program has matured, NETSCOUT has comprehensive visibility into the DDoS attack landscape—both the number and nature of attacks. This visibility informs and improves their products, allowing them to share analysis findings in the form of papers, blog posts, and a biannual threat report. Since NETSCOUT started collecting and analyzing data in the current form in September 2012, they have observed 96 million attacks, allowing them to perform considerable analysis of trends across regions and verticals, as well as understand the vectors used and sizes of attacks.
Omnis Threat Horizon is a solution to display this information to a broader audience—essentially anyone interested in the threat landscape, and specifically the DDoS attack trends at any given time. In addition to providing real-time maps, the solution allows the user to go back in time to observe visually or in summary form what might have been happening at a given time.
They wanted to make sure that the visual elements and the application were responsive globally, both in terms of representing real-time data and showing historical information. Furthermore, they wanted to keep the incremental cost per user as low as possible, in order to be able to provide this service for free globally.
Solution overview
The following diagram illustrates the solution architecture.
One of the objectives behind the chosen solution was to utilize native AWS services in every instance possible. Furthermore, they chose to break component functionality into individual microservices, and made consistent use of this approach throughout the solution.
Individual monitoring sensors deliver data to Amazon Simple Storage Service (Amazon S3) on an hourly basis. As new entries are received, Amazon Simple Notification Service (Amazon SNS) notifications are delivered, resulting in processing of the data. Successive microservices are responsible for:
Parsing
Running algorithms to identify and separate spurious entries
Deduplication
Scoring
Confidence
After this processing, each attack is represented as a separate document in the OpenSearch Service domain. As of writing this post, NETSCOUT has approximately 96 million attacks in the cluster, all of which can be represented in some form in the maps and reports in Omnis Threat Horizon.
The data is organized in hourly bin files, and served to the application via Amazon CloudFront.
Lessons learned related to Elasticsearch
On previous projects, NETSCOUT tried Apache Cassandra, a popular NoSQL open-source database, and considered it inadequate for aggregation queries. While developing Horizon, they chose Elasticsearch to get access to more powerful aggregation query capabilities with significantly less developer time.
They started with a self-managed instance, but faced the following issues:
Considerable expenditure of person hours simply to manage the infrastructure
Each version upgrade was an involved process, requiring a lot of planning and still posing technical challenges along the way
No auto scaling, and big aggregation queries could break Elasticsearch
After a few cycles of powering through this, they moved to OpenSearch Service to overcome these challenges.
Outcome
NETSCOUT saw the following benefits from this architecture:
Fast processing of attack data – The time from when attack data is received to when it is available in the data store is on the order of seconds, allowing them to provide near-real-time visibility in the solution.
Lower management overhead – The data store grows consistently, and by using a managed service, teams avoid having to perform tasks related to cluster management. This was a big pain point with previous solutions adopted involving the same technology.
Scalable architecture – It’s possible to add new capabilities into the pipeline as requirements emerge, without rearchitecting other components.
Conclusion
With OpenSearch Service, NETSCOUT has been able to build a resilient data store for the attack data they capture. As a result of architectural choices made and the underlying AWS services, they’re able to provide visibility into their data at small incremental costs, allowing them to provide a global visibility platform at no cost to the end-user.
With the most experience, the most reliable, scalable, and secure cloud, and the most comprehensive set of services and solutions, AWS is the best place to unlock value from your data and turn it into insight.
About the Authors
Hardik Modi is AVP, Threat and Mitigation Products at NETSCOUT. In this role, he oversees the teams responsible for mitigation products as well as the creation of security content for NETSCOUT’s products, enabling best-in-class protection for users, as well as the continuous delivery and publication of impactful research across the DDoS and intrusion landscapes.
Sujatha Kuppuraju is a Principal Solutions Architect at Amazon Web Services (AWS). She engages with customers to create innovative solutions that address customer business problems and accelerate the adoption of AWS services.
Mike Arruda is a Senior Technical Account Manager at AWS, based in the New England area. He works with AWS Enterprise customers, supporting their success in adopting best practices and helping them achieve their desired business outcomes with AWS.
With the rise of the cloud and increased security awareness, the use of private Amazon VPCs with no public internet access has also expanded rapidly. This setup is recommended to ensure proper security through isolation. The isolation requirement also applies to code pipelines, in which developers deploy their application modules, software packages, and other dependencies and bundles throughout the development lifecycle, without having to push large bundles from the developer space to the staging space or the target environment. In this solution, AWS CodeArtifact is used as an artifact management service that helps organizations of any size securely store, publish, and share the software packages used in their software development process.
We’ll walk through the steps required to build a secure, private continuous integration/continuous delivery (CI/CD) pipeline with no public internet access while maintaining log retention in Amazon CloudWatch. We’ll utilize AWS CodeCommit for source control, CodeArtifact for modules and software packages, and Amazon Simple Storage Service (Amazon S3) as artifact storage.
Prerequisites
The prerequisites for following along with this post include:
A CI/CD pipeline – This can be CodePipeline, Jenkins, or any CI/CD tool you want to integrate CodeArtifact with; we will use CodePipeline in this walkthrough.
Solution walkthrough
The main service we’ll focus on is CodeArtifact, a fully managed artifact repository service that makes it easy for organizations of any size to securely store, publish, and share software packages used in their software development process. CodeArtifact works with commonly used package managers and build tools, such as Maven and Gradle (Java), npm and yarn (JavaScript), pip and twine (Python), or NuGet (.NET).
Users push code to CodeCommit, and CodePipeline detects the change and starts the pipeline. In the CodeBuild build stage, the private endpoints are used to download the needed software packages without going over the internet.
The preceding diagram shows how the requests remain private within the VPC and don’t go through the internet gateway: they travel from CodeBuild over the private endpoint to the CodeArtifact service, all within the private subnet.
The requests will use the following VPC endpoints to connect to these AWS services:
CloudWatch Logs endpoint (for CodeBuild to put logs in CloudWatch)
Expand Additional configurations and scroll down to the VPC section. Select the desired VPC, your subnets (we recommend selecting multiple Availability Zones to ensure high availability), and a security group. The security group rules must allow the resources that will use the VPC endpoint to communicate with the endpoint network interface; the default VPC security group is used here as an example.
Select the CloudWatch Logs option. You can leave the group name and stream empty; this lets the service use the default values. Then click Continue to CodePipeline.
After you get the popup, click Skip again. You’ll see the review page; scroll all the way down and click Create pipeline.
Create a VPC endpoint for Amazon CloudWatch Logs. This will enable CodeBuild to send execution logs to CloudWatch:
Navigate to your VPC console, and from the navigation menu on the left select “Endpoints”.
Figure 21. Screenshot: VPC endpoint.
Click the Create endpoint button.
Figure 22. Screenshot: Create endpoint.
For service Category, select “AWS Services”. You can set a name for the new endpoint, and make sure to use something descriptive.
Figure 23. Screenshot: Create endpoint page.
From the list of services, search for the endpoint by typing logs in the search bar and selecting the one with com.amazonaws.us-west-2.logs. This walkthrough can be done in any region that supports the services. I am going to be using us-west-2; please select the appropriate region for your workload.
Figure 24. Screenshot: create endpoint select services with com.amazonaws.us-west-2.logs selected.
Select the VPC that you want the endpoint to be associated with, and make sure that the Enable DNS name option is checked under additional settings.
Select the Subnets where you want the endpoint to be associated, and you can leave the security group as default and the policy as empty.
Figure 26. Screenshot: create endpoint subnet setting shows 2 subnet selected and default security group selected.
Select Create Endpoint.
Figure 27. Screenshot: create endpoint button.
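For reference, the same interface endpoint could be created from the CLI; the VPC, subnet, and security group IDs below are placeholders for your own values.
aws ec2 create-vpc-endpoint \
  --region us-west-2 \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-west-2.logs \
  --subnet-ids subnet-aaaa1111 subnet-bbbb2222 \
  --security-group-ids sg-0123456789abcdef0 \
  --private-dns-enabled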
Create a VPC endpoint for CodeArtifact. At the time of writing this article, CodeArtifact has two endpoints: one is for API operations like service-level operations and authentication, and the other is for using the service, such as getting modules for our code. We’ll need both endpoints to automate working with CodeArtifact. Therefore, we’ll create both endpoints with DNS enabled.
Follow steps a-c from the steps used to create the Logs endpoint above.
a. From the list of services, you can search for the endpoint by typing codeartifact in the search bar and selecting the one with com.amazonaws.us-west-2.codeartifact.api.
Figure 28. Screenshot: create endpoint select services with com.amazonaws.us-west-2.codeartifact.api selected.
Follow steps e-g from Part 4.
Then, repeat the same for the com.amazonaws.us-west-2.codeartifact.repositories service.
Figure 29. Screenshot: create endpoint select services with com.amazonaws.us-west-2.codeartifact.api selected.
Enable a VPC endpoint for AWS STS:
Follow steps a-c from Part 4
a. From the list of services you can search for the endpoint by typing sts in the search bar and selecting the one with com.amazonaws.us-west-2.sts.
Figure 30.Screenshot: create endpoint select services with com.amazon.aws.us-west-2.codeartifact.repositories selected.
Then follow steps e-g from Part 4.
Create a VPC endpoint for S3:
Follow steps a-c from Part 4
a. From the list of services, you can search for the endpoint by typing s3 in the search bar and selecting the one with com.amazonaws.us-west-2.s3; select the one with the type Gateway.
Then select your VPC and the route tables for your subnets; this will automatically update the route tables with the new S3 endpoint.
Figure 31. Screenshot: create endpoint select services with com.amazonaws.us-west-2.s3 selected.
Now we have all of the endpoints set. The last step is to update your pipeline to point at the CodeArtifact repository when pulling your code dependencies. I’ll use a CodeBuild buildspec.yml as an example here; a sketch follows.
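A minimal sketch of such a buildspec.yml, committed to the repository root, might look like the following. The domain (private) and repository (Private) names match the values used in this walkthrough; <ACCOUNT_ID> and the Node.js runtime version are placeholders.
cat <<'EOF' > buildspec.yml
version: 0.2

phases:
  install:
    runtime-versions:
      nodejs: 16
  pre_build:
    commands:
      # Authenticate npm against the private CodeArtifact repository
      # over the VPC endpoints created above
      - aws codeartifact login --tool npm --repository Private --domain private --domain-owner <ACCOUNT_ID> --region us-west-2
  build:
    commands:
      - npm install
      - node index.js
EOF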
Navigate to the IAM console and click Roles from the left navigation menu, then search for your IAM role name. In our case, because we selected the “New service role” option in step 2.k, the role was created with the name “codebuild-Private-service-role” (codebuild-<BUILD PROJECT NAME>-service-role).
Figure 32. Screenshot: IAM roles with codebuild-Private-service-role role shown in search.
From the Add permissions menu, click on Create inline policy
Search for STS in the services then select STS
Figure 34. Screenshot: IAM visual editor with sts shown in search.
Search for “GetCallerIdentity” and select the action
Figure 35. Screenshot: IAM visual editor with GetCallerIdentity in search and action selected.
Repeat the same with “GetServiceBearerToken”
Figure 36. Screenshot: IAM visual editor with GetServiceBearerToken in search and action selected.
Click on Review, add a name then click on Create policy
Figure 37. Screenshot: Review page and Create policy button.
You should see the new inline policy added to the list
Figure 38. Screenshot: shows the new in-line policy in the list.
For the CodeArtifact actions, we will do the same on that role: click on Create inline policy.
Figure 39. Screenshot: attach policies.
Search for CodeArtifact in the services then select CodeArtifact
Figure 40. Screenshot: select service with CodeArtifact in search.
Search for “GetAuthorizationToken” in actions and select that action in the check box
Figure 41. CodeArtifact: with GetAuthorizationToken in search.
Repeat for “GetRepositoryEndpoint” and “ReadFromRepository”
Click on Resources to fix the 2 warnings, then click on Add ARN on the first one “Specify domain resource ARN for the GetAuthorizationToken action.”
Figure 42. Screenshot: with all selected filed and 2 warnings.
You’ll get a pop-up with fields for Region, Account, and Domain name. Enter your Region, your account number, and the domain name; we used “private” when we created our domain earlier.
Figure 43. Screenshot: Add ARN page.
Then click Add
Repeat the same process for “Specify repository resource ARN for the ReadFromRepository and 1 more”. This time we will provide the Region, account ID, domain name, and repository name; we used “Private” for the repository we created earlier and “private” for the domain.
Figure 44. Screenshot: add ARN page.
Note that it is a best practice to specify the resource we are targeting. We could use the checkbox for “Any”, but we want to narrow the scope of our IAM role as much as we can; the resulting policy should look something like the sketch below.
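For reference, the CodeArtifact inline policy built through the console steps above amounts to something like the following. The policy name is arbitrary, <ACCOUNT_ID> is a placeholder, and the ARNs use the private domain and Private repository created earlier.
cat <<'EOF' > codeartifact-inline-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "codeartifact:GetAuthorizationToken",
      "Resource": "arn:aws:codeartifact:us-west-2:<ACCOUNT_ID>:domain/private"
    },
    {
      "Effect": "Allow",
      "Action": [
        "codeartifact:GetRepositoryEndpoint",
        "codeartifact:ReadFromRepository"
      ],
      "Resource": "arn:aws:codeartifact:us-west-2:<ACCOUNT_ID>:repository/private/Private"
    }
  ]
}
EOF
aws iam put-role-policy \
  --role-name codebuild-Private-service-role \
  --policy-name CodeArtifactAccess \
  --policy-document file://codeartifact-inline-policy.json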
Navigate to CodeCommit, then click on the repo you created earlier in step 1.
Figure 45. Screenshot: CodeCommit repo.
Click on the Add file dropdown, then the Create file button.
Paste the following in the editor space:
{
"dependencies": {
"mathjs": "^11.2.0"
}
}
Name the file “package.json”
Add your name and email, and optional commit message
Repeat this process for “index.js” and paste the following in the editor space:
This will force the pipeline to kick off and start building the application
Figure 47. Screenshot: CodePipeline.
This is a very simple application that gets the square root of 49 and logs it to the screen. If you click on the Details link from the pipeline build stage, you’ll see the output of running the Node.js application. The logs are stored in CloudWatch, and you can navigate there by clicking on the View entire log link (“Showing the last xx lines of the build log. View entire log”).
Figure 48. Screenshot: Showing the last 54 lines of the build log. View entire log.
We used an npm example in the buildspec.yml above; a similar setup can be used for pip and twine, as shown below.
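For example, a pip-based build could authenticate in its pre_build phase with a command along these lines, reusing the domain and repository names from this walkthrough (<ACCOUNT_ID> is a placeholder):
aws codeartifact login --tool pip --repository Private --domain private --domain-owner <ACCOUNT_ID> --region us-west-2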
For Maven, Gradle, and NuGet, you must set environment variables and change your settings.xml and build.gradle, as well as install the plugin for your IDE. For more information, see the CodeArtifact documentation.
Cleanup
Navigate to VPC endpoint from the AWS console and delete the endpoints that you created.
Navigate to CodePipeline and delete the Pipeline you created.
Navigate to CodeBuild and delete the Build Project created.
Navigate to CodeCommit and delete the Repository you created.
Navigate to CodeArtifact and delete the Repository and the domain you created.
Navigate to IAM and delete the Roles created:
For CodeBuild: codebuild-<Build Project Name>-service-role
For CodePipeline: AWSCodePipelineServiceRole-<Region>-<Project Name>
Conclusion
In this post, we deployed a full CI/CD pipeline with CodePipeline orchestrating CodeBuild to build and test a small NodeJS application, using CodeArtifact to download the application code dependencies. All without going to the public internet and maintaining the logs in CloudWatch.
Danielle Kucera, Karun Bakshi, and I were privileged to organize the DevOps and Developer Productivity (DOP) track for re:Invent 2022. For 2022, the DOP track included 58 sessions and nearly 100 speakers. If you weren’t able to attend, I have compiled a list of the on-demand sessions for you below.
Leadership Sessions
Delighting developers: Builder experience at AWS – Adam Seligman, Vice President of Developer Experience, and Emily Freeman, Head of Community Development, share the latest AWS tools and experiences for teams developing in the cloud. Adam recaps the latest launches and demos how key services can integrate to accelerate developer productivity.
Amazon CodeCatalyst
Amazon CodeCatalyst, announced during Dr. Werner Vogels Keynote, is a unified software development service that makes it faster to build and deliver on AWS.
Introducing Amazon CodeCatalyst – Harry Mower, Director of DevOps Services, and Doug Clauson, Product Manager, provide an overview of Amazon CodeCatalyst. CodeCatalyst provides one place where you can plan work, collaborate on code, and build, test, and deploy applications with nearly continuous integration/continuous delivery (CI/CD) tools.
Deep dive on CodeCatalyst Workspaces – Tmir Karia, Sr. Product Manager, and Rahul Gulati, Sr. Product Manager, discuss how Amazon CodeCatalyst Workspaces decreases the time you spend creating and maintaining a local development environment and allows you to quickly set up a cloud development workspace, switch between projects, and replicate the development workspace configuration across team members.
DevOps
AWS Well-Architected best practices for DevOps on AWS – Elamaran Shanmugam, Sr. Container Specialist, and Deval Perikh, Sr. Enterprise Solutions Architect, discuss the components required to align your DevOps practices to the pillars of the AWS Well-Architected Framework.
Best practices for securing your software delivery lifecycle – James Bland, Principal Solutions Architect, and Curtis Rissi, Principal Solutions Architect, discuss ways you can secure your CI/CD pipeline on AWS. Review topics like security of the pipeline versus security in the pipeline, ways to incorporate security checkpoints across various pipeline stages, security event management, and aggregating vulnerability findings into a single pane of glass.
Build it & run it: Streamline your DevOps capabilities with machine learning – Rafael Ramos, Shivansh Singh, and Jared Reimer discuss how to use machine learning–powered tools like Amazon CodeWhisperer, Amazon CodeGuru, and Amazon DevOps Guru to boost your applications’ availability and write software faster and more reliably.
Infrastructure as Code
AWS infrastructure as code: A year in review – Tatiana Cooke, Principal Product Manager, and Ben Perak, Principal Product Manager, discuss the new features and improvements for AWS infrastructure as code with AWS CloudFormation and AWS CDK.
How to reuse patterns when developing infrastructure as code – Ryan Bachman, Ethan Rucinski, and Ravi Palakodeti explore AWS Cloud Development Kit (AWS CDK) constructs and AWS CloudFormation modules and how they make it easier to build applications on AWS.
Governance and security with infrastructure as code – David Hessler, Senior DevOps Consultant, and Eric Beard, Senior Solutions Architect, discuss how to use AWS CloudFormation and the AWS CDK to deploy cloud applications in regulated environments while enforcing security controls.
Developer Productivity
Building on AWS with AWS tools, services, and SDKs – Kyle Thomson, Senior Software Development Engineer, and Deval Parikh, Senior Solutions Architect, discuss the ways developers can set up secure development environments and use their favorite IDEs to interact with, and deploy to, the AWS Cloud.
Sustainability in the cloud with Rust and AWS Graviton – Emil Lerch, Principal DevOps Specialist, and Esteban Kuber, Principal Engineer, discuss the benefits of Rust and AWS Graviton that can reduce energy consumption and increase productivity.
This is a guest post co-authored by Mahesh Vandi Chalil, Chief Technology Officer of BookMyShow.
BookMyShow (BMS), a leading entertainment company in India, provides an online ticketing platform for movies, plays, concerts, and sporting events. Selling up to 200 million tickets on an annual run rate basis (pre-COVID) to customers in India, Sri Lanka, Singapore, Indonesia, and the Middle East, BookMyShow also offers an online media streaming service and end-to-end management for virtual and on-ground entertainment experiences across all genres.
The pandemic gave BMS the opportunity to migrate and modernize our 15-year-old analytics solution to a modern data architecture on AWS. The new architecture is modern, secure, governed, and cost-optimized, with the ability to scale to petabytes. BMS migrated and modernized from on-premises and other cloud platforms to AWS in just four months. This project was run in parallel with our application migration project and achieved 90% cost savings in storage and 80% cost savings in analytics spend.
The BMS analytics platform caters to business needs for sales and marketing, finance, and business partners (e.g., cinemas and event owners), and provides application functionality for audience, personalization, pricing, and data science teams. The prior analytics solution had multiple copies of data, for a total of over 40 TB, with approximately 80 TB of data in other cloud storage. Data was stored on‑premises and in the cloud in various data stores. Growing organically, the teams had the freedom to choose their technology stack for individual projects, which led to the proliferation of various tools, technology, and practices. Individual teams for personalization, audience, data engineering, data science, and analytics used a variety of products for ingestion, data processing, and visualization.
This post discusses BMS’s migration and modernization journey, and how the BMS, AWS, and AWS Partner Minfy Technologies teams worked together to successfully complete the migration in four months while saving costs. The migration tenets using the AWS modern data architecture made the project a huge success.
Challenges in the prior analytics platform
Varied Technology: Multiple teams used various products, languages, and versions of software.
Larger Migration Project: Because the analytics modernization was a parallel project with application migration, planning was crucial in order to consider the changes in core applications and project timelines.
Resources: Experienced resource churn from the application migration project, and had very little documentation of current systems.
Data: Had multiple copies of data and no single source of truth; each data store provided a view for the business unit.
Ingestion Pipelines: Complex data pipelines moved data across various data stores at varied frequencies. We had multiple approaches in place to ingest data to Cloudera, via over 100 Kafka consumers from transaction systems and MQTT (Message Queue Telemetry Transport messaging protocol) for clickstreams, stored procedures, and Spark jobs. We had approximately 100 jobs for data ingestion across Spark, Alteryx, Beam, NiFi, and more.
Hadoop Clusters: Large dedicated hardware on which the Hadoop clusters were configured, incurring fixed costs. The on-premises Cloudera setup catered to most of the data engineering, audience, and personalization batch processing workloads. Teams had their own implementations of HBase and Hive for our audience and personalization applications.
Data warehouse: The data engineering team used TiDB as their on-premises data warehouse. However, each consumer team had their own perspective of data needed for analysis. As this siloed architecture evolved, it resulted in expensive storage and operational costs to maintain these separate environments.
Analytics Database: The analytics team used data sourced from other transactional systems and denormalized data. The team had their own extract, transform, and load (ETL) pipeline, using Alteryx with a visualization tool.
Migration tenets that led to project success:
Prioritize by business functionality.
Apply best practices when building a modern data architecture from Day 1.
Move only required data, canonicalize the data, and store it in the most optimal format in the target. Remove data redundancy as much as possible. Where changes would be intrusive, mark them as candidates for future optimization.
Build the data architecture while keeping data formats, volumes, governance, and security in mind.
Simplify ELT and processing jobs by categorizing the jobs as rehosted, rewritten, and retired. Finalize canonical data format, transformation, enrichment, compression, and storage format as Parquet.
Rehost machine learning (ML) jobs that were critical for business.
Work backward from our goals, clearing roadblocks and altering decisions to keep moving forward.
Use serverless, pay-per-use options as the first choice. Assess the cost and effort of rearchitecting to select the right approach, and execute a proof of concept to validate this for each component and service.
Strategies applied to succeed in this migration:
Team – We created a unified team with people from data engineering, analytics, and data science as part of the analytics migration project. The site reliability engineering (SRE) and application teams were involved when critical decisions regarding data or timelines were needed for alignment. The analytics, data engineering, and data science teams spent considerable time planning, understanding the code, and iteratively looking at the existing data sources, data pipelines, and processing jobs. The AWS team, together with the partner team from Minfy Technologies, helped BMS arrive at a migration plan after a proof of concept for each of the components in data ingestion, data processing, data warehouse, ML, and analytics dashboards.
Workshops – The AWS team conducted a series of workshops and immersion days, and coached the BMS team on the technology and best practices to deploy the analytics services. The AWS team helped BMS explore the configuration and benefits of the migration approach for each scenario (data migration, data pipeline, data processing, visualization, and machine learning) via proofs of concept (POCs). The team captured the changes required in the existing code for migration. The BMS team also got acquainted with the AWS services involved.
Proof of concept – The BMS team, with help from the partner and AWS team, implemented multiple proofs of concept to validate the migration approach:
Performed batch processing of Spark jobs in Amazon EMR, in which we checked the runtime, required code changes, and cost.
Ran clickstream analysis jobs in Amazon EMR, testing the end-to-end pipeline. The team also conducted proofs of concept on AWS IoT Core for the MQTT protocol and streaming to Amazon S3.
Migrated ML models to Amazon SageMaker and orchestrated with Amazon MWAA.
Created sample QuickSight reports and dashboards, in which features and time to build were assessed.
Configured Amazon Redshift for key scenarios and assessed data loading time, query performance, and cost.
Effort vs. cost analysis – Team performed the following assessments:
Compared the ingestion pipelines and the differences in data structure across stores, assessed the current business need for each data source, and estimated the effort for preprocessing data before migration, migrating data to Amazon S3, and setting up change data capture (CDC) from the migrated applications in AWS.
Assessed the effort to migrate approximately 200 jobs, determined which jobs were redundant or needed functional improvement, and completed a migration list for the target state. Because modernizing the MQTT workflow code to serverless would have been time-consuming, the team decided to rehost it on Amazon Elastic Compute Cloud (Amazon EC2) and defer modernization to Amazon Kinesis to the next phase.
Reviewed over 400 reports and dashboards, prioritized development in phases, and reassessed business user needs.
AWS cloud services chosen for proposed architecture:
Data lake – We used Amazon S3 as the data lake to store a single source of truth for all raw and processed data, thereby reducing the number of data copies and storage costs.
Ingestion – Because we had multiple sources of truth in the current architecture, we arrived at a common structure before migration to Amazon S3, and existing pipelines were modified to do preprocessing. These one-time preprocessing jobs were run in Cloudera, because the source data was on-premises, and on Amazon EMR for data in the cloud. We designed new data pipelines for ingestion from transactional systems on the AWS cloud using AWS Glue ETL.
Processing – Processing jobs were segregated into two categories based on runtime: batch and near-real time. Batch processing was spread across transient Amazon EMR clusters with varying runtimes and Hadoop application requirements, such as HBase. Near-real-time jobs ran on a permanent Amazon EMR cluster that handled clickstream analytics and a data pipeline from transactional systems. We adopted a serverless approach using AWS Glue ETL for new data pipelines from transactional systems on the AWS Cloud.
Data warehouse – We chose Amazon Redshift as our data warehouse, and planned on how the data would be distributed based on query patterns.
Visualization – We built the reports in Amazon QuickSight in phases and prioritized them based on business demand. We discussed with business users their current needs and identified the immediate reports required. We defined the phases of report and dashboard creation and built the reports in Amazon QuickSight. We plan to use embedded reports for external users in the future.
Machine learning – Custom ML models were deployed on Amazon SageMaker. Existing Airflow DAGs were migrated to Amazon MWAA.
Governance, security, and compliance – Governance with AWS Lake Formation was adopted from Day 1. We configured the AWS Glue Data Catalog to reference data used as sources and targets. Because payment information was in the data lake, we had to comply with Payment Card Industry (PCI) guidelines, so we put the necessary security policies in place.
Solution overview
BMS modern data architecture
The following diagram illustrates our modern data architecture.
The architecture includes the following components:
Source systems – These include the following:
Data from transactional systems stored in MariaDB (booking and transactions).
User interaction clickstream data via Kafka consumers to DataOps MariaDB.
Members and seat allocation information from MongoDB.
SQL Server for specific offers and payment information.
Data pipeline – Spark jobs on an Amazon EMR permanent cluster process the clickstream data from Kafka clusters.
Data lake – Data from source systems was stored in their respective Amazon S3 buckets, with prefixes for optimized data querying. For Amazon S3, we followed a hierarchy that stores raw, summarized, and team- or service-related data in different parent folders according to the source and type of data. Lifecycle policies were added to the logs and temp folders of different services as per the teams’ requirements.
Data processing – Transient Amazon EMR clusters are used for processing data into a curated format for the audience, personalization, and analytics teams. Small file merger jobs merge the clickstream data to a larger file size, which saved costs for one-time queries.
Governance – AWS Lake Formation enables the usage of AWS Glue crawlers to capture the schema of data stored in the data lake and version changes in the schema. The Data Catalog and security policy in AWS Lake Formation enable access to data for roles and users in Amazon Redshift, Amazon Athena, Amazon QuickSight, and data science jobs. AWS Glue ETL jobs load the processed data to Amazon Redshift at scheduled intervals.
Queries – The analytics team used Amazon Athena to perform one-time queries raised from business teams on the data lake. Because report development is in phases, Amazon Athena was used for exporting data.
Data warehouse – Amazon Redshift was used as the data warehouse, where the reports for the sales teams, management, and third parties (i.e., theaters and events) are processed and stored for quick retrieval. Views to analyze the total sales, movie sale trends, member behavior, and payment modes are configured here. We use materialized views for denormalized tables, different schemas for metadata, and transactional and behavior data.
Reports – We used Amazon QuickSight reports for various business, marketing, and product use cases.
Machine learning – Some of the models deployed on Amazon SageMaker are as follows:
Content popularity – Decides the recommended content for users.
Live event popularity – Calculates the popularity of live entertainment events in different regions.
Trending searches – Identifies trending searches across regions.
Walkthrough
Migration execution steps
We standardized tools, services, and processes for data engineering, analytics, and data science:
Data lake
Identified the source data to be migrated from Archival DB, BigQuery, TiDB, and the analytics database.
Built a canonical data model that catered to multiple business teams and reduced the copies of data, and therefore storage and operational costs. Modified existing jobs to facilitate migration to a canonical format.
Identified the source systems, capacity required, anticipated growth, owners, and access requirements.
Ran the bulk data migration to Amazon S3 from various sources.
Ingestion
Transaction systems – Retained the existing Kafka queues and consumers.
Clickstream data – Successfully conducted a proof of concept to use AWS IoT Core for MQTT protocol. But because we needed to make changes in the application to publish to AWS IoT Core, we decided to implement it as part of mobile application modernization at a later time. We decided to rehost the MQTT server on Amazon EC2.
Processing
Listed the data pipelines relevant to business and migrated them with minimal modification.
Categorized workloads into critical jobs, redundant jobs, or jobs that can be optimized:
Spark jobs were migrated to Amazon EMR.
HBase jobs were migrated to Amazon EMR with HBase.
Jobs that stored metadata in Hive were modified to use the AWS Glue Data Catalog.
NiFi jobs were simplified and rewritten as Spark jobs that run on Amazon EMR.
Amazon EMR was configured with one persistent cluster for streaming the clickstream and personalization workloads. We used multiple transient clusters for running all other Spark ETL or processing jobs. We used Spot Instances for task nodes to save costs. We optimized data storage with specific jobs that merge small files and convert files to compressed formats.
AWS Glue crawlers identified new data in Amazon S3. AWS Glue ETL jobs transformed and uploaded processed data to the Amazon Redshift data warehouse.
Data warehouse
Defined the data warehouse schema by categorizing the critical reports required by the business, keeping in mind the workload and reports required in future.
Defined the staging area for incremental data loads into Amazon Redshift, created materialized views, and tuned queries based on usage. The transaction and primary metadata are stored in Amazon Redshift to cater to all data analysis and reporting requirements. We created materialized views and denormalized tables in Amazon Redshift to use as data sources for Amazon QuickSight dashboards and segmentation jobs, respectively.
Optimally used the Amazon Redshift cluster by loading the last two years of data into Amazon Redshift, and used Amazon Redshift Spectrum to query historical data through external tables. This helped balance the usage and cost of the Amazon Redshift cluster.
Visualization
Amazon QuickSight dashboards were created for the sales and marketing team in Phase 1:
Sales summary report – An executive summary dashboard to get an overview of sales across the country by region, city, movie, theatre, genre, and more.
Live entertainment – A dedicated report for live entertainment vertical events.
Coupons – A report for coupons purchased and redeemed.
BookASmile – A dashboard to analyze the data for BookASmile, a charity initiative.
Machine learning
Listed the ML workloads to be migrated based on current business needs.
Priority ML processing jobs were deployed on Amazon EMR. Models were modified to use Amazon S3 as source and target, and new APIs were exposed to use the functionality. ML models were deployed on Amazon SageMaker for movies, live event clickstream analysis, and personalization.
Existing artifacts in Airflow orchestration were migrated to Amazon MWAA.
Security
AWS Lake Formation was the foundation of the data lake, with the AWS Glue Data Catalog serving as the central catalog for the data stored in Amazon S3. This provided data access to the various functions, including the audience, personalization, analytics, and data science teams.
Personally identifiable information (PII) and payment data was stored in the data lake and data warehouse, so we had to comply with PCI guidelines. Encryption of data at rest and in transit was configured at each service level (Amazon S3, AWS Glue Data Catalog, Amazon EMR, AWS Glue, Amazon Redshift, and QuickSight). Clear roles, responsibilities, and access permissions for different user groups and privileges were listed and configured in AWS Identity and Access Management (IAM) and the individual services.
Existing single sign-on (SSO) integration with Microsoft Active Directory was used for Amazon QuickSight user access.
Automation
We used AWS CloudFormation for the creation and modification of all the core and analytics services.
AWS Step Functions was used to orchestrate Spark jobs on Amazon EMR.
Scheduled jobs were configured in AWS Glue for uploading data into Amazon Redshift based on business needs.
Monitoring of the analytics services was done using Amazon CloudWatch metrics, and right-sizing of instances and configuration was achieved. Spark job performance on Amazon EMR was analyzed using the native Spark logs and Spark user interface (UI).
Lifecycle policies were applied to the data lake to optimize the data storage costs over time.
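As an illustration only (the bucket name and prefix below are hypothetical), a lifecycle rule of this shape can be applied with the AWS CLI to expire old log objects:
cat << 'EOF' > lifecycle.json
{
  "Rules": [
    {
      "ID": "expire-logs",
      "Filter": { "Prefix": "logs/" },
      "Status": "Enabled",
      "Expiration": { "Days": 30 }
    }
  ]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket example-bms-data-lake \
  --lifecycle-configuration file://lifecycle.json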
Benefits of a modern data architecture
A modern data architecture offered us the following benefits:
Scalability – We moved from a fixed infrastructure to the minimal infrastructure required, with configuration to scale on demand. Services like Amazon EMR and Amazon Redshift enable us to do this with just a few clicks.
Agility – We use purpose-built managed services instead of reinventing the wheel. Automation and monitoring were key considerations, which enable us to make changes quickly.
Serverless – Adopting serverless services like Amazon S3, AWS Glue, Amazon Athena, AWS Step Functions, and AWS Lambda supports us when our business has sudden spikes, such as when new movies or events launch.
Cost savings – Our storage size was reduced by 90%. Our overall spend on analytics and ML was reduced by 80%.
Conclusion
In this post, we showed you how a modern data architecture on AWS helped BMS to easily share data across organizational boundaries. This allowed BMS to make decisions with speed and agility at scale; ensure compliance via unified data access, security, and governance; and to scale systems at a low cost without compromising performance. Working with the AWS and Minfy Technologies teams helped BMS choose the correct technology services and complete the migration in four months. BMS achieved the scalability and cost-optimization goals with this updated architecture, which has set the stage for innovation using graph databases and enhanced our ML projects to improve customer experience.
About the Authors
Mahesh Vandi Chalil is Chief Technology Officer at BookMyShow, India’s leading entertainment destination. Mahesh has over two decades of global experience and is passionate about building scalable products that delight customers, keeping innovation as the top goal and motivating his team to constantly aspire to it. He invests his energy in creating and nurturing the next generation of technology leaders and entrepreneurs, both within the organization and outside of it. A proud husband and father of two daughters, he plays cricket during his leisure time.
Priya Jathar is a Solutions Architect working in the Digital Native Business segment at AWS. She has more than two decades of IT experience, with expertise in application development, databases, and analytics. She is a builder who enjoys innovating with new technologies to achieve business goals, and currently helps customers migrate, modernize, and innovate in the cloud. In her free time, she likes to paint and hone her gardening and cooking skills.
Vatsal Shah is a Senior Solutions Architect at AWS based out of Mumbai, India. He has more than nine years of industry experience, including leadership roles in product engineering, SRE, and cloud architecture. He currently focuses on enabling large startups to streamline their cloud operations and help them scale on the cloud. He also specializes in AI and Machine Learning use cases.
Amazon CodeCatalyst enables teams to collaborate on features, tasks, bugs, and any other work involved when building software. CodeCatalyst was announced at re:Invent 2022 and is currently in preview.
Have an AWS account associated with your space, with the IAM role created in that account. For more information about the role and role policy, see Creating a CodeCatalyst service role.
Walkthrough
Similar to the prior post, I am going to use the Modern Three-tier Web Application blueprint in this walkthrough. A CodeCatalyst blueprint provides a template for a new project. If you would like to follow along, you can launch the blueprint as described in Creating a project in Amazon CodeCatalyst. This will deploy the Mythical Mysfits sample application shown in the following image.
Figure 1. The Mythical Mysfits user interface showing header and three Mysfits
CodeCatalyst organizes projects into Spaces. A space represents your company, department, or group; and contains projects, members, and the associated cloud resources you create in CodeCatalyst. In this walkthrough, my Space currently includes two members, Brian Beach and Panna Shetty, as shown in the following screenshot. Note that both users are administrators, but CodeCatalyst supports multiple roles. You can read more about roles in members of your space.
Figure 2. The space members configuration page showing two users
To begin, Brian creates a new issue to track the request from legal. He assigns the issue to Panna, but leaves it in the backlog for now. Note that CodeCatalyst supports multiple metadata fields to organize your work. This issue is not impacting users and is relatively simple to fix. Therefore, Brian has categorized it as low priority and estimated the effort as extra small (XS). Brian has also added a label, so all the requests from legal can be tracked together. Note that these metadata fields are customizable. You can read more in configuring issue settings.
Figure 3. Create issue dialog box with name, description and metadata
CodeCatalyst supports rich markdown in the description field. You can read about this in Markdown tips and tricks. In the following screenshot, Brian types “@app.vue” which brings up an inline search for people, issues, and code to help Panna find the relevant bit of code that needs changing later.
Figure 4. Create issue dialog box with type-ahead overlay
When Panna is ready to begin work on the new feature, she moves the issue from “Backlog” to “In progress.” CodeCatalyst allows users to manage their work using a Kanban style board. Panna can simply drag-and-drop issues on the board to move the issue from one state to another. Given the small team, Brian and Panna use a single board. However, CodeCatalyst allows you to create multiple views filtered by the metadata fields discussed earlier. For example, you might create a label called Sprint-001, and use that to create a board for the sprint.
Figure 5. Kanban board showing to do, in progress and in review columns
Panna creates a new branch for the change called feature_add_copyright and uses the link in the issue description to navigate to the source code repository. This change is so simple that she decides to edit the file in the browser and commits the change. Note that for more complex changes, CodeCatalyst supports Dev Environments. The next post in this series will be dedicated to Dev Environments. For now, you just need to know that a Dev Environment is a cloud-based development environment that you can use to quickly work on the code stored in the source repositories of your project.
Figure 6. Editor with new lines highlighted
Panna also creates a pull request to merge the feature branch into the main branch. She identifies Brian as a required reviewer. Panna then moves the issue to the “In review” column on the Kanban board so the rest of the team can track the progress. Once Brian reviews the change, he approves and merges the pull request.
Figure 7. Pull request details with title, description, and reviewer assigned
When the pull request is merged, a workflow is configured to run automatically on code changes to build, test, and deploy the change. Note that Workflows were covered in the prior post in this series. Once the workflow is complete, Panna is notified in the team’s Slack channel. You can read more about notifications in working with notifications in CodeCatalyst. She verifies the change in production and moves the issue to the done column on the Kanban board.
Figure 8. Kanban board showing in progress, in review, and done columns
Once the deployment completes, you will see the footer added at the bottom of the page.
Figure 9. The Mythical Mysfits user interface showing footer and three Mysfits
At this point the issue is complete and you have seen how this small team collaborated to progress through the entire software development lifecycle (SDLC).
Cleanup
If you have been following along with this workflow, you should delete the resources you deployed so you do not continue to incur charges. First, delete the two stacks that CDK deployed using the AWS CloudFormation console in the AWS account you associated when you launched the blueprint. These stacks will have names like mysfitsXXXXXWebStack and mysfitsXXXXXAppStack. Second, delete the project from CodeCatalyst by navigating to Project settings and choosing Delete project.
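For reference, the stack deletions can also be done with the AWS CLI; the placeholder stack names below match the ones described above, so substitute the actual names from your account:
aws cloudformation delete-stack --stack-name mysfitsXXXXXWebStack
aws cloudformation delete-stack --stack-name mysfitsXXXXXAppStack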
Conclusion
In this post, you learned how CodeCatalyst can help you rapidly collaborate with other developers. I used issues to track features and bugs, assigned code reviews, and managed pull requests. In future posts, I will continue to discuss how CodeCatalyst can address the rest of the challenges Maxine encountered in The Unicorn Project.
The AWS Cloud Development Kit (CDK) accelerates cloud development by allowing developers to use common programming languages when modelling their applications. To take advantage of this speed, developers need to operate in an environment where permissions and security controls don’t slow things down, and in a tightly controlled environment this is not always the case. Of particular concern is the scenario where a developer has permission to create AWS Identity and Access Management (IAM) entities (such as users or roles), as these could have permissions beyond those of the developer who created them, allowing for an escalation of privileges. This is typically controlled through the use of permission boundaries for IAM entities, and in this post you will learn how these boundaries can now be applied more effectively to CDK development – allowing developers to stay secure and move fast.
Applying custom permission boundaries to CDK deployments
When the CDK deploys a solution, it assumes an AWS CloudFormation execution role to perform operations on the user’s behalf. This role is created during the bootstrapping phase by the AWS CDK Command Line Interface (CLI). This role should be configured to represent the maximum set of actions that CloudFormation can perform on the developer’s behalf, while not compromising any compliance or security goals of the organisation. This can become complicated when developers need to create IAM entities (such as IAM users or roles) and assign permissions to them, as those permissions could be escalated beyond their existing access levels. Taking away the ability to create these entities is one way to solve the problem. However, doing this would be a significant impediment to developers, as they would have to ask an administrator to create them every time. This is made more challenging when you consider that security-conscious practices create individual IAM roles for every individual use case, such as each AWS Lambda Function in a stack. Rather than taking this approach, IAM permission boundaries can help in two ways – first, by ensuring that all actions are within the overlap of the user’s permissions and the boundary, and second, by ensuring that any IAM entities that are created also have the same boundary applied. This blocks the path to privilege escalation without restricting the developer’s ability to create IAM identities. With the latest version of the AWS CDK CLI, these boundaries can be applied to the execution role automatically when running the bootstrap command, as well as being added to IAM entities that are created in a CDK stack.
To use a permission boundary in the CDK, first create an IAM policy that will act as the boundary. This should define the maximum set of actions that the CDK application will be able to perform on the developer’s behalf, both during deployment and operation. This step would usually be performed by an administrator who is responsible for the security of the account, ensuring that the appropriate boundaries and controls are enforced. Once created, the name of this policy is provided to the bootstrap command. In the example below, an IAM policy called “developer-policy” is used to demonstrate the command.
cdk bootstrap --custom-permissions-boundary developer-policy
Once this command runs, a new bootstrap stack will be created (or an existing stack will be updated) so that the execution role has this boundary applied to it. Next, you can ensure that any IAM entities that are created will have the same boundaries applied to them. This is done by either using a CDK context variable, or the permissionBoundary attribute on those resources. To explain this in some detail, let’s use a real world scenario and step through an example that shows how this feature can be used to restrict developers from using the AWS Config service.
Installing or upgrading the AWS CDK CLI
Before beginning, ensure that you have the latest version of the AWS CDK CLI tool installed. Follow the instructions in the documentation to complete this. You will need version 2.54.0 or higher to make use of this new feature. To check the version you have installed, run the following command.
cdk --version
Creating the policy
First, let’s begin by creating a new IAM policy. Below is a CloudFormation template that creates a permission policy for use in this example. In this case the AWS CLI can deploy it directly, but this could also be done at scale through a mechanism such as CloudFormation Stack Sets. This template has the following policy statements:
Allow all actions by default – this allows you to deny the specific actions that you choose. You should carefully consider your approach to allow/deny actions when creating your own policies though.
Deny the creation of users or roles unless the “developer-policy” permission boundary is used. Additionally limit the attachment of permissions boundaries on existing entities to only allow “developer-policy” to be used. This prevents the creation or change of an entity that can escalate outside of the policy.
Deny the ability to change the policy itself so that a developer can’t modify the boundary they will operate within.
Deny the ability to remove the boundary from any user or role
Deny any actions against the AWS Config service
Here items 2, 3 and 4 all ensure that the permission boundary works correctly – they are controls that prevent the boundary being removed, tampered with, or bypassed. The real focus of this policy in terms of the example are items 1 and 5 – where you allow everything, except the specific actions that are denied (creating a deny list of actions, rather than an allow list approach).
Resources:
  PermissionsBoundary:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Statement:
          # ----- Begin base policy ---------------
          # If permission boundaries do not have an explicit allow
          # then the effect is deny
          - Sid: ExplicitAllowAll
            Action: "*"
            Effect: Allow
            Resource: "*"
          # Default permissions to prevent privilege escalation
          - Sid: DenyAccessIfRequiredPermBoundaryIsNotBeingApplied
            Action:
              - iam:CreateUser
              - iam:CreateRole
              - iam:PutRolePermissionsBoundary
              - iam:PutUserPermissionsBoundary
            Condition:
              StringNotEquals:
                iam:PermissionsBoundary:
                  Fn::Sub: arn:${AWS::Partition}:iam::${AWS::AccountId}:policy/developer-policy
            Effect: Deny
            Resource: "*"
          - Sid: DenyPermBoundaryIAMPolicyAlteration
            Action:
              - iam:CreatePolicyVersion
              - iam:DeletePolicy
              - iam:DeletePolicyVersion
              - iam:SetDefaultPolicyVersion
            Effect: Deny
            Resource:
              Fn::Sub: arn:${AWS::Partition}:iam::${AWS::AccountId}:policy/developer-policy
          - Sid: DenyRemovalOfPermBoundaryFromAnyUserOrRole
            Action:
              - iam:DeleteUserPermissionsBoundary
              - iam:DeleteRolePermissionsBoundary
            Effect: Deny
            Resource: "*"
          # ----- End base policy ---------------
          # -- Begin Custom Organization Policy --
          - Sid: DenyModifyingOrgCloudTrails
            Effect: Deny
            Action: config:*
            Resource: "*"
          # -- End Custom Organization Policy --
        Version: "2012-10-17"
      Description: "Bootstrap Permission Boundary"
      ManagedPolicyName: developer-policy
      Path: /
Save the above locally as developer-policy.yaml and then you can deploy it with a CloudFormation command in the AWS CLI:
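For example, the following AWS CLI call deploys it as a stack (the stack name is illustrative; CAPABILITY_NAMED_IAM is required because the template names the managed policy):
aws cloudformation create-stack \
  --stack-name developer-policy \
  --template-body file://developer-policy.yaml \
  --capabilities CAPABILITY_NAMED_IAM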
To begin, create a new CDK application that you will use to test and observe the behaviour of the permission boundary. Create a new directory with a TypeScript CDK application in it by executing these commands.
mkdir DevUsers && cd DevUsers
cdk init --language typescript
Once this is done, you should also make sure that your account has a CDK bootstrap stack deployed with the cdk bootstrap command – to start with, do not apply a permission boundary to it; you can add that later and observe how it changes the behaviour of your deployment. Because the bootstrap command is not using the --cloudformation-execution-policies argument, it will default to arn:aws:iam::aws:policy/AdministratorAccess, which means that CloudFormation will have full access to the account until the boundary is applied.
cdk bootstrap
Once the command has run, create an AWS Config Rule in your application to be sure that this works without issue before the permission boundary is applied. Open the file lib/dev_users-stack.ts and edit its contents to reflect the sample below.
import * as cdk from 'aws-cdk-lib';
import { ManagedRule, ManagedRuleIdentifiers } from 'aws-cdk-lib/aws-config';
import { Construct } from "constructs";

export class DevUsersStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new ManagedRule(this, 'AccessKeysRotated', {
      configRuleName: 'access-keys-policy',
      identifier: ManagedRuleIdentifiers.ACCESS_KEYS_ROTATED,
      inputParameters: {
        maxAccessKeyAge: 60, // default is 90 days
      },
    });
  }
}
Next you can deploy with the CDK CLI using the cdk deploy command, which will succeed (the output below has been truncated to show a summary of the important elements).
Before you deploy the permission boundary, remove this stack again with the cdk destroy command.
❯ cdk destroy
Are you sure you want to delete: DevUsersStack (y/n)? y
DevUsersStack: destroying... [1/1]
DevUsersStack: destroyed
Using a permission boundary with the CDK test application
Now apply the permission boundary that you created above and observe the impact it has on the same deployment. To update your bootstrap stack with the permission boundary, re-run the cdk bootstrap command with the new custom-permissions-boundary parameter.
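This is the same command shown earlier, now pointing at the policy you created:
cdk bootstrap --custom-permissions-boundary developer-policy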
After this command executes, the CloudFormation execution role will be updated to use that policy as a permission boundary, which based on the deny rule for config:* will cause this same application deployment to fail. Run cdk deploy again to confirm this and observe the error message.
Deployment failed: Error: Stack Deployments Failed: Error: The stack
named DevUsersStack failed creation, it may need to be manually deleted
from the AWS console:
ROLLBACK_COMPLETE:
User: arn:aws:sts::123456789012:assumed-role/cdk-hnb659fds-cfn-exec-role-123456789012-ap-southeast-2/AWSCloudFormation
is not authorized to perform: config:PutConfigRule on resource: access-keys-policy with an explicit deny in a
permissions boundary
This shows you that the action was denied specifically due to the use of a permissions boundary, which is what was expected.
Applying permission boundaries to IAM entities automatically
Next let’s explore how the permission boundary can be extended to IAM entities that are created by a CDK application. The concern here is that a developer who is creating a new IAM entity could assign it more permissions than they have themselves – the permission boundary manages this by ensuring that entities can only be created that also have the boundary attached. You can validate this by modifying the stack to deploy a Lambda function that uses a role that doesn’t include the boundary. Open the file lib/dev_users-stack.ts again and edit its contents to reflect the sample below.
Here the AwsCustomResource is used to provision a Lambda function that will attempt to create a new config rule. This is the same result as the previous stack but in this case the creation of the rule is done by a new IAM role that is created by the CDK construct for you. Attempting to deploy this will result in a failure – run cdk deploy to observe this.
Deployment failed: Error: Stack Deployments Failed: Error: The stack named
DevUsersStack failed creation, it may need to be manually deleted from the AWS
console:
ROLLBACK_COMPLETE:
API: iam:CreateRole User: arn:aws:sts::123456789012:assumed-
role/cdk-hnb659fds-cfn-exec-role-123456789012-ap-southeast-2/AWSCloudFormation
is not authorized to perform: iam:CreateRole on resource:
arn:aws:iam::123456789012:role/DevUsersStack-
AWS679f53fac002430cb0da5b7982bd2287S-1EAD7M62914OZ
with an explicit deny in a permissions boundary
The error message here details that the stack was unable to deploy because the call to iam:CreateRole was denied – the role being created did not have the required permissions boundary attached. The CDK now offers a straightforward way to set a default permission boundary on all IAM entities that are created, via the CDK context variable core:permissionsBoundary in the cdk.json file.
This approach is useful because now you can import constructs that create IAM entities (such as those found on Construct Hub or out-of-the-box constructs that create default IAM roles) and have the boundary apply to them as well. There are alternative ways to achieve this, such as setting a boundary on specific roles, which can be used in scenarios where this approach does not fit. Make the change to your cdk.json file and run the CDK deploy again. This time the custom resource will attempt to create the config rule using its own IAM role instead of the CloudFormation execution role. It is expected that the boundary will also constrain this Lambda function in the same way – run cdk deploy again to confirm this. Note that the deployment updates from CloudFormation show that the role creation succeeds this time, and a new error message is generated.
Deployment failed: Error: Stack Deployments Failed: Error: The stack named
DevUsersStack failed creation, it may need to be manually deleted from the AWS
console:
ROLLBACK_COMPLETE:
Received response status [FAILED] from custom resource. Message returned: User:
arn:aws:sts::123456789012:assumed-role/DevUsersStack-
AWS679f53fac002430cb0da5b7982bd2287S-84VFVA7OGC9N/DevUsersStack-
AWS679f53fac002430cb0da5b7982bd22872-MBnArBmaaLJp
is not authorized to perform: config:PutConfigRule on resource: SampleRule with an explicit deny in a permissions boundary
In this error message you can see that the user it refers to is DevUsersStack-AWS679f53fac002430cb0da5b7982bd2287S-84VFVA7OGC9N rather than the CloudFormation execution role. This is the role being used by the custom Lambda function resource, and when it attempts to create the Config rule it is rejected because of the permissions boundary in the same way. Here you can see how the boundary is being applied consistently to all IAM entities that are created in your CDK app, which ensures the administrative controls can be applied consistently to everything a developer does with a minimal amount of overhead.
Cleanup
At this point you can either choose to remove the CDK bootstrap stack if you no longer require it, or remove the permission boundary from the stack. To remove it, delete the CDKToolkit stack from CloudFormation with this AWS CLI command.
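For example (CDKToolkit is the default name of the bootstrap stack):
aws cloudformation delete-stack --stack-name CDKToolkit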
If you want to keep the bootstrap stack, you can remove the boundary by following these steps:
Browse to the CloudFormation page in the AWS console, and select the CDKToolkit stack.
Select the ‘Update’ button. Choose ‘Use current template’ and then press ‘Next’.
On the parameters page, find the InputPermissionsBoundary value, which will have developer-policy as its value, and delete the text in this input to leave it blank. Press ‘Next’, and then on the following page, press ‘Next’ again.
On the final page, scroll to the bottom, check the box acknowledging that CloudFormation might create IAM resources with custom names, and choose ‘Submit’.
With the permission boundary no longer being used, you can now remove the stack that created it as the final step.
Now you can see how IAM permission boundaries can easily be integrated into CDK development, giving developers the control they need while administrators ensure that security is managed in a way that meets the needs of the organisation.
With this understood, there are next steps you can take to further expand on the use of permission boundaries. The CDK Security and Safety Developer Guide on GitHub outlines these approaches, as well as ways to think about your approach to permissions on deployment. It’s recommended that developers and administrators review this and work to develop an appropriate approach to permission policies that suits your security goals.
Additionally, the permission boundary concept can be applied in a multi-account model where each Stage has a unique boundary name applied. This allows scenarios where a lower-level environment (such as a development or beta environment) has more relaxed permission boundaries that suit troubleshooting and other developer-specific actions, while higher-level environments (such as gamma or production) have more restrictive permission boundaries to ensure that security risks are appropriately managed. The mechanism for implementing this is also defined in the security and safety developer guide.
This blog is written by Mark Rogers, SDE II – Customer Engineering AWS.
In part 1, I showed you how to run Apache CloudStack with KVM on a single Amazon Elastic Compute Cloud (Amazon EC2) instance. That simple setup is great for experimentation and light workloads. In this post, things will get a lot more interesting. I’ll show you how to create an overlay network in your Amazon Virtual Private Cloud (Amazon VPC) that allows CloudStack to scale horizontally across multiple EC2 instances. This same method could work with other hypervisors, too.
If you haven’t read it yet, then start with part 1. It explains why this network setup is necessary. The same prerequisites apply to both posts.
Making things easier
I wrote some scripts to automate the CloudStack installation and OS configuration on CentOS 7. You can customize them to meet your needs. I also wrote some AWS CloudFormation templates you can copy in order to create a demo environment. The README file has more details.
The scalable method
Our team started out using a single EC2 instance, as described in my last post. That worked at first, but it didn’t have the capacity we needed. We were limited to a couple of dozen VMs, but we needed hundreds. We also needed to scale up and down as our needs changed. This meant we needed the ability to add and remove CloudStack hosts. Using a Linux bridge as a virtual subnet was no longer adequate.
To support adding hosts, we need a subnet that spans multiple instances. The solution I found is Virtual Extensible LAN (VXLAN). It’s lightweight, easy to configure, and included in the Linux kernel. VXLAN creates a layer 2 overlay network that abstracts away the details of the underlying network. It allows machines in different parts of a network to communicate as if they’re all attached to the same simple network switch.
Another example of an overlay network is an Amazon VPC. It acts like a physical network, but it’s actually a layer on top of other networks. It’s networks all the way down. VXLAN provides a top layer where CloudStack can sit comfortably, handling all of your VM needs, blissfully unaware of the world below it.
An overlay network comes with some big advantages. The biggest improvement is that you can have multiple hosts, allowing for horizontal scaling. Having more hosts not only gives you more computing power, but also lets you do rolling maintenance. Instead of putting the database and file storage on the management server, I’ll show you how to use Amazon Elastic File System (Amazon EFS) and Amazon Relational Database Service (Amazon RDS) for scalable and reliable storage.
EC2 Instances
Let’s start with three Amazon EC2 instances. One will be a router between the overlay network and your Amazon VPC, the second one will be your CloudStack management server, and the third one will be your VM host. You’ll also need a way to connect to your instances, such as a bastion host or a VPN endpoint.
The router won’t need much computing power, but it will need enough network bandwidth to meet your needs. If you put Amazon EFS in the same subnet as your instances, then they’ll communicate with it directly, thereby reducing the load on the router. Decide how much network throughput you want, and then pick a suitable Nitro instance type.
After creating the router instance, configure AWS to use it as a router. Stop source/destination checking in the instance’s network settings. Then update the applicable AWS route tables to use the router as the target for the overlay network. The router’s security group needs to allow ingress to the CloudStack UI (TCP port 8080) and any services you plan to offer from VMs.
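A sketch of the equivalent AWS CLI calls, with hypothetical IDs for the router instance and route table and an assumed overlay CIDR of 10.100.0.0/24:
# Disable source/destination checking on the router instance
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --no-source-dest-check
# Send overlay network traffic to the router instance
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 10.100.0.0/24 \
  --instance-id i-0123456789abcdef0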
For the management server, you’ll want a Nitro instance type. It’s going to use more CPU than your router, so plan accordingly.
In addition to being a Nitro type, the host instance must also be a metal type. Metal instances have hardware virtualization support, which is needed by KVM. If you have a new AWS account with a low on-demand vCPU limit, then consider starting with an m5zn.metal, which has 48 vCPUs. Otherwise, I suggest going directly to a c5.metal because it provides 96 vCPUs for a similar price. There are bigger types available depending on your compute needs, budget, and vCPU limit. If your account’s on-demand vCPU limit is too low, then you can file a support ticket to have it raised.
Networking
All of the instances should be on a dedicated subnet. Sharing the subnet with other instances can cause communication issues. For an example, refer to the following figure. The subnet has an instance named TroubleMaker that’s not on the overlay network. If TroubleMaker sends a request to the management instance’s overlay network address, then here’s what happens:
The request goes through the AWS subnet to the router.
The router forwards the request via the overlay network.
The CloudStack management instance has a connection to the same AWS subnet that TroubleMaker is on. Therefore, it responds directly instead of using the router. This isn’t the return path that AWS is expecting, so the response is dropped.
If you move TroubleMaker to a different subnet, then the requests and responses will all go through the router. That will fix the communication issues.
The instances in the overlay network will use special interfaces that serve as VXLAN tunnel endpoints (VTEPs). The VTEPs must know how to contact each other via the underlay network. You could manually give each instance a list of all of the other instances, but that’s a maintenance nightmare. It’s better to let the VTEPs discover each other, which they can do using multicast. You can add multicast support using AWS Transit Gateway.
Here are the steps to make VXLAN multicasts work:
Enable multicast support when you create the transit gateway.
Attach the transit gateway to your subnet.
Create a transit gateway multicast domain with IGMPv2 support enabled.
Associate the multicast domain with your subnet.
Configure the eth0 interface on each instance to use IGMPv2. The following sample code shows how to do this.
Make sure that your instance security groups allow ingress for IGMP queries (protocol 2 traffic from 0.0.0.0/32) and VXLAN traffic (UDP port 4789 from the other instances).
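A sketch of those security group rules with the AWS CLI, assuming a single hypothetical security group shared by the overlay instances:
SG_ID=sg-0123456789abcdef0
# Allow IGMP queries (IP protocol 2) from 0.0.0.0/32
aws ec2 authorize-security-group-ingress \
  --group-id "$SG_ID" \
  --ip-permissions 'IpProtocol=2,IpRanges=[{CidrIp=0.0.0.0/32}]'
# Allow VXLAN (UDP port 4789) between members of the same security group
aws ec2 authorize-security-group-ingress \
  --group-id "$SG_ID" \
  --ip-permissions "IpProtocol=udp,FromPort=4789,ToPort=4789,UserIdGroupPairs=[{GroupId=$SG_ID}]"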
CloudStack VMs must connect to the same bridge as the VXLAN interface. As mentioned in the previous post, CloudStack cares about names. I recommend giving the interface a name starting with “eth”. Moreover, this naming convention tells CloudStack which bridge to use, thereby avoiding the need for a dummy interface like the one in the simple setup.
The following snippet shows how I configured the networking in CentOS 7. You must provide values for these variables:
$overlay_host_ip_address, $overlay_netmask, and $overlay_gateway_ip: Use values for the overlay network that you’re creating.
$multicast_address: The multicast address that you want VXLAN to use. Pick something in the multicast range that won’t conflict with anything else. I recommend choosing from the IPv4 local scope (239.255.0.0/16).
$interface_name: The name of the interface VXLAN should use to communicate with the physical network. This is typically eth0.
$dns_address: The DNS server address for the bridge. As in the previous post, the base of the VPC IPv4 network range plus two works well.
A couple of the steps are different for the router instance than for the other instances. Pay attention to the comments!
yum install -y bridge-utils net-tools
# IMPORTANT: Omit the GATEWAY setting on the router instance!
cat << EOF > /etc/sysconfig/network-scripts/ifcfg-cloudbr0
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no
IPV6_AUTOCONF=no
DELAY=5
STP=no
USERCTL=no
NM_CONTROLLED=no
IPADDR=$overlay_host_ip_address
NETMASK=$overlay_netmask
DNS1=$dns_address
GATEWAY=$overlay_gateway_ip
EOF
cat << EOF > /sbin/ifup-local
#!/bin/bash
# Set up VXLAN once cloudbr0 is available.
if [[ \$1 == "cloudbr0" ]]
then
ip link add ethvxlan0 type vxlan id 100 dstport 4789 group "$multicast_address" dev "$interface_name"
brctl addif cloudbr0 ethvxlan0
ip link set up dev ethvxlan0
fi
EOF
chmod +x /sbin/ifup-local
# Transit Gateway requires IGMP version 2
echo "net.ipv4.conf.$interface_name.force_igmp_version=2" >> /etc/sysctl.conf
sysctl -p
# Enable IPv4 forwarding
# IMPORTANT: Only do this on the router instance!
echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
sysctl -p
# Restart the network service to make the changes take effect.
systemctl restart network
Storage
Let’s look at storage. Create an Amazon RDS database using the MySQL 8.0 engine, and set a master password for CloudStack’s database setup tool to use. Refer to the CloudStack documentation to find the MySQL settings that you’ll need. You can put the settings in an RDS parameter group. In case you’re wondering why I’m not using Amazon Aurora, it’s because CloudStack needs the MyISAM storage engine, which isn’t available in Aurora.
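A minimal sketch of creating such an instance with the AWS CLI; the identifiers, instance class, subnet group, security group, and parameter group names are hypothetical, and the parameter group is assumed to hold the MySQL settings from the CloudStack documentation:
# Choose any MySQL 8.0.x minor version that RDS offers in your Region
aws rds create-db-instance \
  --db-instance-identifier cloudstack-db \
  --engine mysql \
  --engine-version 8.0.35 \
  --db-instance-class db.m5.large \
  --allocated-storage 100 \
  --master-username cloud \
  --master-user-password 'change-me-please' \
  --db-subnet-group-name cloudstack-db-subnets \
  --vpc-security-group-ids sg-0123456789abcdef0 \
  --db-parameter-group-name cloudstack-mysql8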
I recommend Amazon EFS for file storage. For efficiency, create a mount target in the subnet with your EC2 instances. That will enable them to communicate directly with the mount target, thereby bypassing the overlay network and router. Note that the system VMs will use Amazon EFS via the router.
If you want, you can consolidate your CloudStack file systems. Just create a single file system with directories for each zone and type of storage. For example, I use directories named /zone1/primary, /zone1/secondary, /zone2/primary, etc. You should also consider enabling provisioned throughput on the file system, or you may run out of bursting credits after booting a few VMs.
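A sketch of mounting the file system and creating that layout, assuming a hypothetical file system ID, a mount target in the instance subnet, and the us-east-1 Region:
sudo yum install -y nfs-utils
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-0123456789abcdef0.efs.us-east-1.amazonaws.com:/ /mnt/efs
# Directory layout per zone and storage type
sudo mkdir -p /mnt/efs/zone1/primary /mnt/efs/zone1/secondary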
One consequence of the file system’s scalability is that the amount of free space (8 exabytes) will cause an integer overflow in CloudStack! To avoid this problem, reduce storage.overprovisioning.factor in CloudStack’s global settings from 2 to 1.
When your environment is ready, install CloudStack. When it asks for a default gateway, remember to use the router’s overlay network address. When you add a host, make sure that you use the host’s overlay network IP address.
When you no longer need the environment, remember to delete the resources that you created, including the security groups for the instances, database, and file system.
Conclusion
The approach I shared has many steps, but they’re not bad when you have a plan. Whether you need a simple setup for experiments, or you need a scalable environment for a data center migration, you now have a path forward. Give it a try, and comment here about the things you learned. I hope you find it useful and fun!
“Apache”, “Apache CloudStack”, and “CloudStack” are trademarks of the Apache Software Foundation.
This blog is written by Mark Rogers, SDE II – Customer Engineering AWS.
How do you put a cloud inside another cloud? Some features that make Amazon Elastic Compute Cloud (Amazon EC2) secure and wonderful also make running CloudStack difficult. The biggest obstacle is that AWS and CloudStack both want to manage network resources. Therefore, we must keep them out of each other’s way. This requires some steps that aren’t obvious, and it took a long time to figure out. I’m going to share what I learned, so that you can navigate the process more easily.
Apache CloudStack is an open-source platform for deploying and managing virtual machines (VMs) and the associated network and storage infrastructure. You would normally run it on your own hardware to create your own cloud. But there can be advantages to running it inside of an Amazon Virtual Private Cloud (Amazon VPC), including how it could help you migrate out of a data center. It’s a great way to create disposable environments for experiments or training. Furthermore, it’s a convenient way to test-drive the new CloudStack support in Amazon Elastic Kubernetes Service (Amazon EKS) Anywhere. In my case, I needed to create development and test environments for a project that uses the CloudStack API. The environments needed to be shared and scalable. Our build pipelines were already in AWS, so it made sense to put the new environments there, too.
CloudStack can work with a number of hypervisors. The instructions in this article will use Kernel-based Virtual Machine (KVM) on Linux. KVM will manage the VMs at a low level, and CloudStack will manage KVM.
Prerequisites
Most of the information in this article should be applicable to a range of CloudStack versions. I targeted CloudStack 4.14 on CentOS 7. I also tested CloudStack versions 4.16 and 4.17, and I recommend them.
The official CentOS 7 x86_64 HVM image works well. If you use a different Linux flavor or version, then you might have to modify some of the implementation details.
You’ll need to know the basics of CloudStack. The scope of this article is making CloudStack and AWS coexist peacefully. Once CloudStack is running, I’m assuming that you’ll handle things from there. Refer to the AWS documentation and CloudStack documentation for information on security and other best practices.
Making things easier
I wrote some scripts to automate the installation. You can run them on EC2 instances with CentOS 7, and they’ll do all the installation and OS configuration for you. You can use them as they are, or customize them to meet your needs. I also wrote some AWS CloudFormation templates you can copy in order to create a demo environment. The README file has more details.
I like c5.metal because it’s one of the least expensive metal types, and has a low cost per vCPU. It has 96 vCPUs and 192 GiB of memory. If you run 20 VMs on it, with 4 CPU cores and 8 GiB of memory each, then you’d still have 16 vCPUs and 32 GiB to share between the operating system, CloudStack, and MySQL. Using CloudStack’s overprovisioning feature, you could fit even more VMs if they’re running light loads.
Networking
The biggest challenge is the network. AWS knows which IP and MAC addresses should exist, and it knows the machines to which they should belong. It blocks any traffic that doesn’t fit its idea of how the network should behave. Simultaneously, CloudStack assumes that any IP or MAC address it invents should work just fine. When CloudStack assigns addresses to VMs on an AWS subnet, their network traffic gets blocked.
You could get around this by enabling network address translation (NAT) on the instance running CloudStack. That’s a great solution if it fits your needs, but it makes it hard for other machines in your Amazon VPC to contact your VMs. I recommend a different approach.
Although AWS restricts what you can do with its layer 2 network, it’s perfectly happy to let you run your own layer 3 router. Your EC2 instance can act as a router to a new virtual subnet that’s outside of the jurisdiction of AWS. The instance integrates with AWS just like a VPN appliance, routing traffic to wherever it needs to go. CloudStack can do whatever it wants in the virtual subnet, and everybody’s happy.
What do I mean by a virtual subnet? This is a subnet that exists only inside the EC2 instance. It consists of logical network interfaces attached to a Linux bridge. The entire subnet exists inside a single EC2 instance. It doesn’t scale well, but it’s simple. In my next post, I’ll cover a more complicated setup with an overlay network that spans multiple instances to allow horizontal scaling.
The simple way
The simple way is to put everything in one EC2 instance, including the database, file storage, and a virtual subnet. Because everything’s stored locally, allocate enough disk space for your needs. 500 GB will be enough to support a few basic VMs. Create or select a security group for your instance that gives users access to the CloudStack UI (TCP port 8080). The security group should also allow access to any services that you’ll offer from your VMs.
When you have your instance, configure AWS to treat it as a router:
1. Go to the VPC settings, and select Route Tables.
2. Identify the tables for subnets that need CloudStack access.
3. In each of these tables, add a route to the new virtual subnet, with your EC2 instance as the route target (a sketch of the equivalent AWS CLI call follows this list).
4. Depending on your network needs, you may also need to add routes to transit gateways, VPN endpoints, etc.
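A sketch of step 3 with the AWS CLI, assuming a hypothetical route table ID, a virtual subnet of 10.101.0.0/24, and the CloudStack instance ID; remember to also disable source/destination checking on the instance, as described in the follow-up post:
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 10.101.0.0/24 \
  --instance-id i-0123456789abcdef0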
Because everything will be on one server, creating a virtual subnet is simply a matter of creating a Linux bridge. CloudStack must find a network adapter attached to the bridge. Therefore, add a dummy interface with a name that CloudStack will recognize.
The following snippet shows how I configure networking in CentOS 7. You must provide values for the variables $virtual_host_ip_address and $virtual_netmask to reflect the virtual subnet that you want to create. For $dns_address, I recommend the base of the VPC IPv4 network range, plus two. You shouldn’t use 169.254.169.253 because CloudStack reserves link-local addresses for its own use.
yum install -y bridge-utils net-tools
# The bridge must be named cloudbr0.
cat << EOF > /etc/sysconfig/network-scripts/ifcfg-cloudbr0
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
IPV6INIT=no
IPV6_AUTOCONF=no
DELAY=5
STP=yes
USERCTL=no
NM_CONTROLLED=no
IPADDR=$virtual_host_ip_address
NETMASK=$virtual_netmask
DNS1=$dns_address
EOF
# Create a dummy network interface.
cat << EOF > /etc/sysconfig/modules/dummy.modules
#!/bin/sh
/sbin/modprobe dummy numdummies=1
/sbin/ip link set name ethdummy0 dev dummy0
EOF
chmod +x /etc/sysconfig/modules/dummy.modules
/etc/sysconfig/modules/dummy.modules
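# Attach the dummy interface to the bridge so CloudStack finds an adapter on cloudbr0.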
cat << EOF > /etc/sysconfig/network-scripts/ifcfg-ethdummy0
TYPE=Ethernet
BOOTPROTO=none
NAME=ethdummy0
DEVICE=ethdummy0
ONBOOT=yes
BRIDGE=cloudbr0
NM_CONTROLLED=no
EOF
# Turn the instance into a router
echo 'net.ipv4.ip_forward=1' >> /etc/sysctl.conf
sysctl -p
# Must kill dhclient or the network service won't restart properly.
# A reboot would also work, if you’d rather do that.
pkill dhclient
systemctl restart network
CloudStack must know which IP address to use for inter-service communication. It selects one by resolving the machine’s fully qualified domain name (FQDN) to an address, so the FQDN needs to resolve to the virtual subnet address. You must provide a value for $virtual_host_ip_address.
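A minimal sketch of one way to arrange that is to set a host name and add a matching /etc/hosts entry; the name cloudstack.localdomain is only a placeholder.
# Make the FQDN resolve to the virtual subnet address.
hostnamectl set-hostname cloudstack.localdomain
echo "$virtual_host_ip_address cloudstack.localdomain cloudstack" >> /etc/hosts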
Remember that CloudStack is only directly connected to your virtual network. The EC2 instance is the router that connects the virtual subnet to the Amazon VPC. When you’re configuring CloudStack, use your instance’s virtual subnet address as the default gateway.
To access CloudStack from your workstation, you’ll need a connection to your VPC. This can be through a client VPN or a bastion host. If you use a bastion, its subnet needs a route to your virtual subnet, and you’ll need an SSH tunnel for your browser to access the CloudStack UI. The UI is at http://x.x.x.x:8080/client/, where x.x.x.x is your CloudStack instance’s virtual subnet address. Note that CloudStack’s console viewer won’t work if you’re using an SSH tunnel.
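As a sketch, one way to build that tunnel is SSH dynamic port forwarding through the bastion; the key path, user, and bastion address are placeholders.
# Open a SOCKS proxy on local port 1080 through the bastion host.
ssh -i ~/.ssh/bastion-key.pem -D 1080 ec2-user@bastion.example.com
# Configure your browser to use localhost:1080 as a SOCKS proxy, then
# browse to http://x.x.x.x:8080/client/ as described above.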
If you’re just experimenting with CloudStack, then I suggest saving money by stopping your instance when it isn’t needed. The safe way to do that is:
Disable your zone in the CloudStack UI.
Put the primary storage into maintenance mode.
Wait for the switch to maintenance mode to be complete.
Stop the EC2 instance.
When you’re ready to turn everything back on, simply reverse those steps. If you have any virtual routers in CloudStack, then you may need to start those, too.
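For the EC2 step itself, stopping and starting the instance is a one-liner with the AWS CLI; the instance ID is a placeholder.
# Stop the CloudStack host after the zone and storage are in maintenance mode.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
# Start it again when you're ready to bring everything back.
aws ec2 start-instances --instance-ids i-0123456789abcdef0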
Getting CloudStack to run on AWS isn’t so bad. The hardest part is simply knowing how. The setup explained here is great for small installations, but it can only scale vertically. In my next post, I’ll show you how to create an installation that scales horizontally. Instead of using a virtual subnet that exists in a single EC2 instance, we’ll build an overlay network that spans multiple instances. It will use more components and features, including some that might be new to you. I hope you find it interesting!
Now that you can create a simple setup, give it a try! I hope you have fun and learn something new along the way. Comment with the results of your experiments.
“Apache”, “Apache CloudStack”, and “CloudStack” are trademarks of the Apache Software Foundation.
Amazon CodeCatalyst is a modern software development service that empowers teams to deliver software on AWS easily and quickly. CodeCatalyst provides one place where you can plan, code, build, test, and deploy applications with continuous integration/continuous delivery (CI/CD) tools, and it helps streamline team collaboration. Developers on modern software teams are usually distributed, work independently, and use disparate tools, and ad hoc collaboration is often necessary to resolve problems. Today, developers are forced to do this across many tools, which distracts them from their primary task: adding business-critical features and enhancing their quality and completeness.
In this post, we explain how Contino uses CodeCatalyst to onboard their engineering team onto new projects, eliminate the overhead of managing disparate tools, and streamline collaboration among different stakeholders.
The Problem
Contino helps customers migrate their applications to the cloud, and then improves their architecture by taking full advantage of cloud-native features to improve agility, performance, and scalability. This usually involves building out a central landing zone platform. A landing zone is a set of standard building blocks that allows customers to automatically create accounts, infrastructure, and environments that are pre-configured in line with security policies, compliance guidelines, and cloud-native best practices. Some features are common to most landing zones, such as creating secure container images, AMIs, and environment setup boilerplate. To provide maximum value to customers, Contino develops in-house versions of these features that incorporate AWS best practices, and later rolls them out to each customer’s environment with some customization. Contino’s technical consultants who are not currently assigned to customer work, collectively known as ‘Squad 0’, build these features. Squad 0 builds the foundation that will be reused by other squads that work directly with Contino’s customers. Because technical consultants are typically on Squad 0 for only a short period, it is critical that they can be productive in that time without spending too long getting set up.
To build these foundational services, Contino was looking for something more integrated that would allow them to quickly set up development environments, enable collaboration between Squad 0 members, invite other squads to validate the use of the foundational services for their respective customers, and provide central access to different AWS accounts and Git repositories from one place. Historically, Contino used disparate tools to achieve this, which meant granting and revoking access to the various AWS accounts individually on a continual basis. With these disparate tools, granting access to everything a squad needed to be productive was non-trivial.
The Solution
It was at this point that Contino joined the private beta for CodeCatalyst, prior to the public preview. CodeCatalyst has allowed Contino to move to the structure shown in Figure 1 below. A Project Manager at Contino creates a different project for each foundational service and invites Squad 0 members to join the relevant project. With CodeCatalyst, Squad 0 technical consultants use features like CI/CD, source repositories, and issue trackers to build the foundational services. This helps eliminate the overhead of managing and integrating developer tools and provides more time to focus on developing code. Once Squad 0 is ready with a foundational service, they invite customer squads by email address to validate that the project is ready for use with their customers. Finally, members of Squad 0 use Cloud9 Dev Environments from within CodeCatalyst to rapidly create consistent cloud development environments, without manual configuration, so they can work on new or multiple projects simultaneously, without conflict.
Figure 1: CodeCatalyst with multiple account connections
Contino uses CI/CD to conduct multi-account deployments. Contino typically does one of two types of deployments:
Traditional sequential application deployment that is promoted from one environment to another, for example dev -> test -> prod
Parallel deployment, for example a security control that needs to be deployed to multiple AWS accounts at the same time
CodeCatalyst solves this problem by making it easier to construct workflows using a workflow definition file that can deploy either sequentially or in parallel to multiple AWS accounts. Figure 2 shows a parallel deployment.
Figure 2: CI/CD with CodeCatalyst
The Value
CodeCatalyst has reduced the time it takes for members of Squad 0 to complete the necessary on-boarding to work on foundational services from 1.5 days to about 1 hour. These tasks include setting up connections to source repositories, setting up development environments, configuring IAM roles and trust relationships, and so on. With support for integrated tools and better collaboration, CodeCatalyst minimized the overhead of ad hoc collaboration, so Squad 0 can spend more time writing code to build foundational services. This has led to tasks being completed, on average, 20 percent faster, which translates into more value delivered to Contino’s customers: as Squad 0 becomes more productive, more foundational services are available for other squads to reuse, and Contino’s teams on the ground working directly with customers can reuse these services with some customization for each customer’s specific needs.
Conclusion
Amazon CodeCatalyst brings together everything software development teams need to plan, code, build, test, and deploy applications on AWS into a streamlined, integrated experience. With CodeCatalyst, developers can spend more time developing application features and less time setting up project tools, creating and managing CI/CD pipelines, provisioning and configuring various development environments or coordinating with team members. With CodeCatalyst, the Contino engineers can improve productivity and focus on rapidly developing application code which captures business value for their customers.
This is a guest post by Jeremy Bristow, Head of Product at Green Flag.
In the US, there’s a saying: “Sooner or later, you’ll break down and call Triple A.” In the UK, that same saying might be “Sooner or later, you’ll break down and call Green Flag.”
Green Flag has been assisting stranded motorists all over Europe for more than 50 years. Having started in the UK in 1971, with the first office located above a fish and chip shop, we have built an expansive network of automobile service providers to assist customers. With that network in place, Green Flag customers need only make a single phone call, or request service via the Green Flag app, to quickly contact a local repair garage, no matter where they’ve broken down.
Green Flag’s commitment to providing world-class service to its customers has consistently earned a Net Promoter Score (NPS) of over 70. We were also recently rated as the top breakdown provider in the H1 2022 UK Customer Satisfaction Index report, which is a national benchmark of customer satisfaction covering 13 sectors and based on 45,000 customer responses.
Unlike most breakdown providers in the UK, Green Flag doesn’t currently own a large fleet of trucks to provide roadside assistance to our 3 million-plus customers. Instead, our established network of over 200 locally operated businesses provides in-depth knowledge and expertise in their specific areas. Our overall mission is simple: “Make motoring stress-free, safe, and simple, for every driver.”
Tech transformation to democratize data
Over the past 4 years, Green Flag has been undertaking a technology transformation with the end goal of enabling data democratization and faster data-driven decisions. This has been a massive undertaking, requiring the business to replace our entire systems architecture, which we’ve done while remaining open 24/7, 365 days a year, maintaining our industry-leading customer service standards and growing the business. Building the plane while flying it is an apt analogy, but it felt more like performing open-heart surgery while running a marathon.
Early on during our tech transformation, we made the decision to develop natively with AWS, using our in-house development team. When it came time to decide on a new business intelligence (BI) tool, it was an easy decision to switch to Amazon QuickSight.
In this post, we discuss what influenced the decision to implement QuickSight and why doing so has made a significant impact on our efficiency and agility.
Changing the game with self-serve insights
Prior to embarking on this transformation, technology had consistently been a blocker to us being able to quickly serve our customers. Our old stack was slow, hard to change, and consistently prevented us from gaining valuable and timely insights from our datasets. We were data rich, but insight poor. It was clear that the status quo was not compatible with our ambitions to grow and trade in an increasingly digital world. We needed a BI tool to bring new and meaningful insights to our data, which would help us leverage our new technologies to deliver meaningful business change.
Although going with QuickSight was an easy decision, we considered the following factors to ensure our needs would be met:
Alignment to AWS architecture – We didn’t want to invest time and development resources into complicated implementations—we wanted a seamless integration that would cause no headaches.
Ease of use – We needed an intuitive, user-friendly interface that would make it easy for any user, no matter their tech background or level of expertise in pulling data insights.
Cost – Affordability is always a concern. With AWS, we can monitor daily usage and the associated costs via our console, so there were never any surprises or hidden costs post-implementation.
Although these considerations were all top of mind, the primary driver for switching to QuickSight is rooted in the data democratization goal within our tech transformation initiative. Providing self-serve access to data and insights to everyone, no matter their tech background, has been a game changer. With QuickSight, we can now present data and insights in a meaningful, easy-to-understand format via near-real-time embedded dashboards that anyone from any tech background can easily build.
Evolving our data culture with QuickSight
Because all employees are now empowered to make data-driven decisions, our data culture has begun to evolve. Having consistent, accurate dashboards that access standardized datasets brings with it a new level of confidence in decision-making. We’re no longer wasting time on discussions about data sources or the risks of inaccuracy. QuickSight and the data governance processes we have built around its use have shifted the discussion to focus on the insights, what they tell us, and what our next steps should be.
The following screenshot shows one of Green Flag’s custom-built dashboards.
Additionally, because QuickSight is easy enough to use that self-service requires minimal training and no specialized skills, our data experts can now shift from responding to constant requests for information to focusing on more valuable projects that require their technical depth and expertise. We can now work more efficiently, producing actionable insights that were never available before, at a pace that enables us to make better decisions, faster.
Mapping our next analytics adventure
The next feature that the Green Flag team plans to test and then roll out is Amazon QuickSight Q, which should allow both our data teams and business users to have a more frictionless experience with our data. It should enable users to interrogate complex data by asking simple conversational questions, as if they were talking to a colleague, and then customize the results. This will further empower our self-serve strategy by enabling non-technical experts to pose business questions rather than rely on technical expertise and experience.
To learn more about how you can embed customized data visuals, interactive dashboards, and natural language querying into any application, visit Amazon QuickSight Embedded.
About the Author
Jeremy “Jez” Bristow is Head of Product at Green Flag. He has been working at Green Flag, part of Direct Line Group, for over 4 years and has been co-responsible for delivering the digital transformation programme within Green Flag. His team has built a cloud-based ecosystem of microservices, supported by a data platform on AWS, to deliver better customer outcomes and grow the business.
As we kick off 2023, I wanted to take a moment to highlight the top posts from 2022. Without further ado, here are the top 10 AWS DevOps Blog posts of 2022.
Sylvia Qi, Senior DevOps Architect, and Sebastian Carreras, Senior Cloud Application Architect, guide us through utilizing infrastructure as code (IaC) to automate GitLab Runner deployment on Amazon EC2.
Lerna Ekmekcioglu, Senior Solutions Architect, and Jack Iu, Global Solutions Architect, demonstrate best practices for multi-Region deployments using HashiCorp Terraform, AWS CodeBuild, and AWS CodePipeline.
Praveen Kumar Jeyarajan, Senior DevOps Consultant, and Vaidyanathan Ganesa Sankaran, Sr Modernization Architect, discuss unit testing Python-based AWS Glue Jobs in AWS CodePipeline.
James Bland, APN Global Tech Lead for DevOps, and Welly Siauw, Sr. Partner solutions architect, discuss the challenges of architecting Jenkins for scale and high availability (HA).
Harish Vaswani, Senior Cloud Application Architect, and Rafael Ramos, Solutions Architect, explain how you can configure and use tfdevops to easily enable Amazon DevOps Guru for your existing AWS resources created by Terraform.
Arun Donti, Senior Software Engineer with Twitch, demonstrates how to integrate cdk-nag into an AWS Cloud Development Kit (AWS CDK) application to provide continual feedback and help align your applications with best practices.
Adam Thomas, Senior Software Development Engineer, demonstrates how you can use Smithy to define services and SDKs and deploy them to AWS Lambda using a generated client.
A big thank you to all our readers! Your feedback and collaboration are appreciated and help us produce better content.