A new week has begun. Last week, there was a lot of news related to AWS. I have compiled a few announcements you need to know about. Let’s get started right away!
Last Week’s Launches
Let’s take a look at some launches from the last week that I want to remind you of:
New Amazon EC2 I4g Instances – Powered by AWS Graviton2 processors, Amazon Elastic Compute Cloud (Amazon EC2) I4g instances improve real-time storage performance up to 2x compared to prior generation storage-optimized instances. Based on AWS Nitro SSDs that are custom-built by AWS and reduce both latency and latency variability, I4g instances are optimized for workloads that perform a high mix of random read/write and require very low I/O latency, such as transactional databases and real-time analytics. To learn more, see Jeff’s post.
Amazon Aurora I/O-Optimized – You can now choose between two storage configurations for Amazon Aurora DB clusters: Aurora Standard or Aurora I/O-Optimized. For applications with low-to-moderate I/Os, Aurora Standard is a cost-effective option.
For applications with high I/Os, Aurora I/O-Optimized provides improved price performance, predictable pricing, and up to 40 percent cost savings. To learn more, see my full blog post.
AWS Management Console Private Access – This is a new security feature that allows you to limit access to the AWS Management Console from your Virtual Private Cloud (VPC) or connected networks to a set of trusted AWS accounts and organizations. It is built on VPC endpoints, which use AWS PrivateLink to establish a private connection between your VPC and the console.
AWS Management Console Private Access is useful when you want to prevent users from signing in to unexpected AWS accounts from within your network. To learn more, see the AWS Management Console getting started guide.
One-Click Security Protection on the Amazon CloudFront Console – You can now secure your web applications and APIs with AWS WAF with a single click on the Amazon CloudFront console. CloudFront handles creating and configuring AWS WAF for you with out-of-the-box protections recommended by AWS, giving you a simple and convenient way to protect applications at the time you create or edit your distribution.
You may continue to select a preconfigured AWS WAF web access control list (ACL) when you prefer to use an existing web ACL. To learn more, see Using AWS WAF to control access to your content in the AWS documentation.
Tracing AWS Lambda SnapStart Functions with AWS X-Ray – You can use AWS X-Ray traces to gain deeper visibility into your function’s performance and execution lifecycle, helping you identify errors and performance bottlenecks for your latency-sensitive Java applications built using SnapStart-enabled functions.
With X-Ray support for SnapStart-enabled functions, you can now see trace data about the restoration of the execution environment and execution of your function code. You can enable X-Ray for Java-based SnapStart-enabled Lambda functions running on Amazon Corretto 11 or 17. To learn more about X-Ray for SnapStart-enabled functions, visit the Lambda Developer Guide or read Marcia’s blog post.
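If you manage your functions programmatically, enabling both capabilities is a small configuration change. The following is a minimal boto3 sketch, not the exact launch steps; the function name is hypothetical and assumes a Java 11 or 17 function that is eligible for SnapStart:

import boto3

lambda_client = boto3.client("lambda")

# Turn on SnapStart for published versions and enable active X-Ray tracing
lambda_client.update_function_configuration(
    FunctionName="my-java-snapstart-fn",  # hypothetical function name
    SnapStart={"ApplyOn": "PublishedVersions"},
    TracingConfig={"Mode": "Active"},
)

# SnapStart applies to published versions, so publish one to pick up the change
lambda_client.publish_version(FunctionName="my-java-snapstart-fn")

After you publish and invoke the new version, the trace includes data about the restoration of the execution environment as well as the execution of your function code.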
For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.
Open Source Updates
Last week, we introduced new open-source projects and significant roadmap contributions to the Jupyter community.
Snapchange – Snapchange is a new open-source project, written in Rust, that makes it easier to fuzz a memory snapshot using KVM. Snapchange enables a target binary to be fuzzed with minimal modifications, providing useful introspection that aids in fuzzing. Snapchange utilizes features of the Linux kernel’s built-in virtual machine manager, known as Kernel Virtual Machine (KVM). To learn more, see the announcement post and GitHub repository.
Cedar – Cedar is a new open-source language for defining permissions as policies, which describe who should have access to what, and for evaluating those policies. You can use Cedar to control access to resources such as photos in a photo-sharing app, compute nodes in a microservices cluster, or components in a workflow automation system. Cedar is also the authorization-policy language used by Amazon Verified Permissions, a scalable, fine-grained permissions management and authorization service for custom applications, and by AWS Verified Access, a managed service that validates each application request before granting access. To learn more, see the announcement post, the Amazon Science blog post, and the Cedar playground to test sample policies.
Jupyter Community Contributions – We announced new contributions to the Jupyter community to democratize generative artificial intelligence (AI) and scale machine learning (ML) workloads. We contributed two Jupyter extensions – Jupyter AI to bring generative AI to Jupyter notebooks and the Amazon CodeWhisperer Jupyter extension to generate code suggestions for Python notebooks in JupyterLab. We also contributed three new capabilities to help you scale ML development faster: notebook scheduling, the SageMaker open-source distribution, and the Amazon CodeGuru Jupyter extension. To learn more, see the announcement post and Jupyter on AWS.
Upcoming AWS Events
Check your calendars and sign up for these AWS-led events:
AWS Serverless Innovation Day on May 17 – Join us for a free full-day virtual event to learn about AWS Serverless technologies and event-driven architectures from customers, experts, and leaders. Marcia outlined the agenda and main topics of this event in her post. You can register on the event page.
AWS Data Insights Day on May 24 – Join us for another virtual event to discover ways to innovate faster and more cost-effectively with data. Whether your data is stored in operational data stores, data lakes, streaming engines, or within your data warehouse, Amazon Redshift helps you achieve the best performance with the lowest spend. This event focuses on customer voices, deep-dive sessions, and best practices of Amazon Redshift. You can register on the event page.
AWS Silicon Innovation Day on June 21 – Join AWS leaders and experts showcasing AWS innovations in custom-designed EC2 chips built for high performance and scale in the cloud. AWS has designed and developed purpose-built silicon specifically for the cloud. You can learn about AWS silicon and how you can use AWS’s unique EC2 chip offerings to your benefit. You can register on the event page.
AWS re:Inforce 2023 – You can still register for AWS re:Inforce, in Anaheim, California, June 13–14.
AWS Community Day – Join community-led conferences driven by AWS user group leaders closest to your city: Chicago (June 15), and Philippines (June 29–30).
Since Amazon Aurora launched in 2014, hundreds of thousands of customers have chosen Aurora to run their most demanding applications. Aurora provides unparalleled high performance and availability at global scale with full MySQL and PostgreSQL compatibility at up to one-tenth the cost of commercial databases.
Many customers benefit from the cost-effectiveness of Aurora’s current simple, pay-per-request pricing for input/output (I/O) usage, removing the need to provision I/Os in advance. Customers also benefit from additional cost-saving innovations such as Amazon Aurora Serverless v2 (ASv2), which provides seamless scaling in fine-grained increments based on the application’s demands. For workloads with spikes in demand, you can save up to 90 percent in costs vs. provisioning capacity for peak load with ASv2.
Today, we are announcing the general availability of Amazon Aurora I/O-Optimized, a new cluster configuration that offers improved price performance and predictable pricing for customers with I/O-intensive applications, such as e-commerce applications, payment processing systems, and more. Aurora I/O-Optimized offers improved performance, increasing throughput and reducing latency to support your most demanding workloads.
You can now confidently predict costs for your most I/O-intensive workloads, with up to 40 percent cost savings when your I/O spend exceeds 25 percent of your current Aurora database spend. If you are using Reserved Instances, you will see even greater cost savings.
Now you have the flexibility to choose between the existing configuration, newly called Aurora Standard, which is the pay-per-request pricing model that is cost-effective for applications with low-to-moderate I/O usage, and the new Aurora I/O-Optimized configuration for I/O-intensive applications.
Getting Started with Aurora I/O-Optimized
You can create a new database cluster using the Aurora I/O-Optimized configuration or convert your existing database clusters with a few clicks in the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDKs.
For the Aurora MySQL-Compatible Edition and Aurora PostgreSQL-Compatible Edition, you can choose either the Aurora Standard or Aurora I/O-Optimized configuration.
The Aurora I/O-Optimized configuration is available in Aurora MySQL version 3.03.1 and higher, and in Aurora PostgreSQL versions 15.2 and higher, 14.7 and higher, and 13.10 and higher.
This configuration supports Intel-based Aurora database instance types such as t3, r5, and r6i, Graviton-based database instance types such as t4g, r7g, and x2g, Aurora Serverless v2, Aurora Global Database, on-demand Aurora database instances, and reserved instances.
R7g instances for Amazon Aurora are powered by the latest generation AWS Graviton3 processors, delivering up to 30 percent performance gains and up to 20 percent improved price performance for Aurora, as compared to R6g instances.
In your existing Aurora clusters, you can switch the storage configuration to Aurora I/O-Optimized once every 30 days or switch back to Aurora Standard at any time. You can change the cluster storage configuration only at the cluster level. The change applies to all instances in the cluster.
After changing the configuration, you don’t need to reboot the database instances within the cluster to take advantage of the price-performance benefits of Aurora I/O-Optimized.
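For example, here is a minimal boto3 sketch of both paths; the cluster identifiers, credentials, and engine version are hypothetical, and the storage type value shown is the one the RDS API uses for Aurora I/O-Optimized:

import boto3

rds = boto3.client("rds")

# Switch an existing cluster to the Aurora I/O-Optimized configuration
rds.modify_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",      # hypothetical cluster name
    StorageType="aurora-iopt1",                   # "aurora" is the Aurora Standard value
    ApplyImmediately=True,
)

# Or create a new cluster with the I/O-Optimized configuration from the start
rds.create_db_cluster(
    DBClusterIdentifier="my-new-aurora-cluster",  # hypothetical cluster name
    Engine="aurora-mysql",
    EngineVersion="8.0.mysql_aurora.3.03.1",      # must be a version that supports I/O-Optimized
    MasterUsername="admin",
    MasterUserPassword="REPLACE_WITH_A_SECRET",
    StorageType="aurora-iopt1",
)

The equivalent AWS CLI option is --storage-type aurora-iopt1 on create-db-cluster or modify-db-cluster.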
Now Available
Amazon Aurora I/O-Optimized configuration is now generally available for Amazon Aurora MySQL-Compatible Edition and Aurora PostgreSQL-Compatible Edition in most AWS Regions where Aurora is available, with China (Beijing), China (Ningxia), AWS GovCloud (US-East), and AWS GovCloud (US-West) Regions coming soon.
Aurora is billed differently for the two configurations: Aurora Standard or Aurora I/O-Optimized. The latter doesn’t charge for I/Os; instead, it charges a set price for compute and storage relative to the former. For I/O-intensive applications, its price performance will be better, and you can save up to 40 percent on costs. To see pricing examples, visit the Aurora Pricing page.
With Amazon Relational Database Service (Amazon RDS), you can set up, operate, and scale a relational database in the AWS Cloud. Amazon RDS provides cost-efficient, resizable capacity for an industry-standard relational database and manages common database administration tasks.
If you use Amazon RDS for your workloads, you can now use Amazon GuardDuty RDS Protection to help detect threats to your data stored in Amazon Aurora databases. GuardDuty is a continuous security monitoring service that can help you identify and prioritize potential threats in your AWS environment. By analyzing and profiling RDS login activity to your Aurora databases, GuardDuty can detect threats such as high-severity brute force events, suspicious logins, access from Tor, and access by known threat actors.
In this post, we will provide an overview of how to get started with RDS Protection, dive into its finding types, and walk you through examples of how to investigate and remediate findings.
Overview of RDS Protection
RDS Protection in GuardDuty analyzes and profiles Amazon RDS login activity to identify potential threats to your data stored in Aurora databases by using a combination of threat intelligence and machine learning. At launch, RDS Protection supports Aurora MySQL versions 2.10.2 and 3.2.1 or later and Aurora PostgreSQL versions 10.17, 11.12, 12.7, 13.3, and 14.3 or later. An updated list of the supported engines and versions is available in the GuardDuty documentation. RDS Protection doesn’t require additional infrastructure, and you don’t need to configure, collect, or store RDS logs in your own account. RDS Protection is also designed to have no impact on the performance of your database instances so that you don’t have to worry about compromising performance to better secure your data stored in Amazon RDS.
When RDS Protection detects a suspicious or anomalous login attempt that indicates a potential threat to your database instance, GuardDuty generates a finding with details to help you quickly identify relevant information to assist in remediation. RDS Protection findings include details on both anomalous and normal login activity in addition to information such as database instance details, database user details, action information, and actor information. These findings are available to you in the GuardDuty console, AWS Command Line Interface (AWS CLI), and API, and all GuardDuty findings are sent to Amazon EventBridge and AWS Security Hub, giving you options to respond by sending alerts to chat or ticketing systems, or by using AWS Lambda and AWS Systems Manager for automatic remediation.
Enable RDS Protection
Getting started with RDS Protection is simple, and you can do it with just a few steps in the console. Both new and existing GuardDuty customers can take advantage of the GuardDuty RDS Protection 30-day free trial. You can turn RDS Protection on or off for each of your accounts in supported AWS Regions. If you already use GuardDuty, you will need to enable RDS Protection either in the console or CLI, or through the API. You will have the option to enable it in the account that you are currently in, or if you are using a GuardDuty delegated administrator account (as shown in Figure 1), you can enable it for all accounts in your AWS Organizations organization. You’ll also have the ability to auto-enable. The auto-enable feature helps ensure that RDS Protection is enabled for each new account added to your organization, without the need for you to configure anything in each member account. If you are turning on GuardDuty for the first time, RDS Protection is enabled by default.
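As an illustration, here is a minimal boto3 sketch of enabling RDS Protection on an existing detector in a single account; it assumes one detector per Region, which is the usual setup:

import boto3

guardduty = boto3.client("guardduty")

# Assumes a single GuardDuty detector exists in this Region
detector_id = guardduty.list_detectors()["DetectorIds"][0]

# Enable the RDS login activity monitoring feature (RDS Protection)
guardduty.update_detector(
    DetectorId=detector_id,
    Features=[{"Name": "RDS_LOGIN_EVENTS", "Status": "ENABLED"}],
)

From a delegated administrator account, the same feature can be enabled and auto-enabled organization-wide through GuardDuty’s organization configuration, or through the console flow described above.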
After GuardDuty generates a finding, you will need to analyze the finding so that you understand the potential impact to your environment. We recommend that you familiarize yourself with the GuardDuty finding types. Understanding GuardDuty finding types can help you understand the types of activity that GuardDuty is looking for and help you prepare for how to respond if they occur in your environment.
As adversaries become more sophisticated, it becomes even more important for you to align to a common framework to understand the tactics, techniques, and procedures (TTPs) behind an individual event. GuardDuty aligns findings using the MITRE ATT&CK framework, which is a globally-accessible knowledge base of adversary tactics and techniques based on real-world observations. GuardDuty findings have a specific finding format that helps you understand the details of each finding. You can examine the Threat Purpose section of the GuardDuty finding types to see finding types associated with various MITRE ATT&CK tactics, including CredentialAccess and Discovery. This can help you identify and understand the type of activity associated with a finding.
For example, consider two finding types that seem similar: CredentialAccess:RDS/MaliciousIPCaller.SuccessfulLogin and Discovery:RDS/MaliciousIPCaller. The difference between them is the ThreatPurpose aspect, located at the beginning of the finding type. GuardDuty has determined that both are involved with MaliciousIPCaller, and the difference is the intent of the activity associated with each finding. CredentialAccess SuccessfulLogin indicates that there was a successful login to your RDS database from a known malicious IP address. Discovery indicates that a threat actor opened a connection to the database, but didn’t attempt to authenticate. This indicates scanning behavior, but it might not be targeted at RDS instances. For more information, see GuardDuty RDS Protection finding types.
GuardDuty uses threat intelligence and machine learning to continually monitor and identify potential threats in your environment. To understand how to investigate RDS Protection finding types, you need to understand the details of a finding type that are derived from machine learning. As shown in Figure 2, RDS Protection finding types have two sections: one that shows the unusual behavior and one that shows the normal, historical behavior. To determine this, GuardDuty uses machine learning models to evaluate API requests to your account and identify anomalous events that are associated with tactics used by adversaries. The machine learning model tracks various factors of the API request, such as the user that made the request, the location the request was made from, and the specific API that was requested. It also looks at information such as successfulLoginCount, failedLoginCount, and incompleteConnectionCount for anomalies based on login activity. For more information about anomalous activity in GuardDuty findings, see Anomalous behavior.
With RDS Protection, you now have an additional mechanism to gain insight into your Amazon RDS databases across your accounts to continuously monitor for suspicious activity. RDS Protection can alert you to suspicious activity in Amazon RDS, such as a potentially suspicious or anomalous login attempt, unusual pattern in a series of successful, failed, or incomplete login attempts, and unauthorized access to your database instance from a previously unseen internal or external actor. With this new feature, GuardDuty also extends support for finding types that you might already be familiar with that also apply to RDS databases. These finding types include calls to an RDS database API from a Tor node, or calls to an RDS database from a known malicious IP address, which can indicate that there are interactions with your RDS database from sources that are associated with known malicious activity.
Remediate RDS Protection findings
In this section, we describe two RDS Protection findings and how you can investigate and remediate them. Understanding how to remediate these findings can help you maintain the integrity of your database. We share recommendations that focus specifically on security groups, network access control lists (network ACLs), and firewall rules.
The CredentialAccess:RDS/AnomalousBehavior.SuccessfulLogin finding informs you that an anomalous successful login was observed on an RDS database in your AWS environment. It might indicate that a previously unseen user logged in to an RDS database for the first time. A common scenario involves an internal user logging in to a database that is accessed programmatically by applications and not by individual users. A potential malicious actor might have compromised and accessed the role on your RDS database. The default Severity for this finding varies, depending on the anomalous behavior associated with the finding.
Figure 3 shows an example of this finding.
Figure 3: Finding of an anomalous behavior successful login
How to remediate
If the activity is unexpected for the associated database, AWS recommends that you change the password of the associated database user, and review available audit logs for activity that the user performed. Medium and high severity findings might indicate an overly permissive access policy to the database, and user credentials might have been exposed or compromised. We recommend that you place the database in a private virtual private cloud (VPC), and limit the security group rules to allow traffic only from necessary sources. For more information, see Remediating potentially compromised database with successful login events.
We recommend that you take the following steps to remediate this finding:
Remediation step 1: Identify the affected database and user
Identify the affected database and user and confirm whether the behavior is expected or unexpected by looking through the GuardDuty finding details, which provide the name of the affected database instance and the corresponding user details. Use the findings to confirm if the behavior is expected or not—for example, the findings might help you identify a user who logs in to their database instance after a long time has passed; a user who logs in to their database instance only occasionally, such as a financial analyst who logs in each quarter; or a suspicious actor who is involved in a successful login attempt that isn’t authorized and potentially compromises the database instance. Review the IP address of the finding. Public IP addresses might signify overly permissive access if it’s not a known network associated with your account.
Figure 4: Finding with details showing Amazon RDS database instance and user details
If the behavior is unexpected, complete the following steps:
Restrict database instance access for the suspected accounts and the source of the login activity. For more information, see Remediating potentially compromised credentials and Restrict network access. You can identify the user in the RDS DB user details section within the finding panel in the console, or within the resource.rdsDbUserDetails of the findings JSON. These user details include user name, application used, database accessed, SSL version, and authentication method.
The following SQL statement is an example of how to revoke connection access from a user in a MySQL database. If the behavior is unexpected, you can revoke the privileges while you assess if the user is malicious.
REVOKE CONNECTION_ADMIN ON *.* FROM 'fakeadmin'@'%';
You can revoke privileges from the user, but when taking this action, you should make sure that the user isn’t vital to your system and that revoking permissions won’t break your production or development application. The following SQL statement is an example of how to revoke privileges from a user:
REVOKE ALL PRIVILEGES ON *.* FROM 'fakeadmin'@'%';
If you know that the user isn’t necessary for your database or application to function, then you can remove the user from the system. To make sure that your security team can run forensics, check your company’s incident response policy. If you need help getting started with incident response, see AWS sample incident response playbooks. The following SQL statement is an example of how to remove a user:
DROP USER 'fakeadmin'@'%';
Let’s say that you find the behavior unexpected, but the user turns out to be the application user, and making a change to the database credential will break your application. You can use AWS Secrets Manager to help in this scenario, in which the affected RDS user is the account that is tied to your application. In many cases, a password rotation can break your application, depending on how you connect. If you rotate the password without notifying your application, the application might require additional cascading changes, and you could lose connectivity because the credentials that your application uses to connect to your database no longer match, leaving you with an outage that remains until you update the credentials. Secrets Manager can tie into your application code to automatically update the rotated database credentials in your application. For more information, see Rotate Amazon RDS database credentials automatically with AWS Secrets Manager.
The following figure shows a CLI command to get a secret from Secrets Manager — for this example, we assume the secret is compromised.
Figure 5: Example compromised credentials
The following figure shows that we have a new set of credentials that replace our old credentials, as indicated by “CreatedDate”.
Figure 6: Example remediated credentials
Remediation step 3: Assess the impact and determine what information was accessed
To learn how to restrict IP access on a security group, see Control traffic to resources using security groups.
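For example, if the finding shows logins arriving from the open internet, a sketch of tightening the database’s security group with boto3 might look like the following; the group ID, port, and CIDR ranges are hypothetical and should come from your own investigation:

import boto3

ec2 = boto3.client("ec2")

DB_SECURITY_GROUP = "sg-0123456789abcdef0"  # hypothetical security group attached to the DB instance

# Remove the overly permissive rule that allows MySQL traffic from anywhere
ec2.revoke_security_group_ingress(
    GroupId=DB_SECURITY_GROUP,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 3306,
        "ToPort": 3306,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)

# Re-allow only the application subnet that should reach the database
ec2.authorize_security_group_ingress(
    GroupId=DB_SECURITY_GROUP,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 3306,
        "ToPort": 3306,
        "IpRanges": [{"CidrIp": "10.0.1.0/24", "Description": "application subnet only"}],
    }],
)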
Remediation step 5: Perform root-cause analysis and determine the steps that potentially led to this activity
Implementing a lessons-learned framework and methodology can help improve your incident response capabilities and also help prevent the incident from recurring. By learning from each incident, you can help avoid repeating the same mistakes, exposures, or misconfigurations, which can both improve your security posture and reduce the time lost to preventable situations. To learn more about post-incident activity, see AWS Security Incident Response Guide.
The CredentialAccess:RDS/AnomalousBehavior.successfulBruteForce finding informs you that an anomalous login occurred that is indicative of a successful brute force event, as observed on an RDS database in your AWS environment. Before the anomalous successful login, a consistent pattern of unusual failed login attempts was observed. This indicates that the user and password associated with the RDS database in your account might have been compromised, and a potentially malicious actor might have accessed the RDS database. The Severity of this finding is high. Figure 7 shows an example of this finding.
Figure 7: Example of an anomalous successful brute force finding
How to remediate
This activity indicates that database credentials might have been exposed or compromised. We recommend that you change the password of the associated database user, and review available audit logs for activity performed by the potentially compromised user. A consistent pattern of unusual failed login attempts indicates an overly permissive access policy to the database, or that the database might also have been publicly exposed. AWS recommends that you place the database in a private VPC, and limit the security group rules to allow traffic only from necessary sources. For more information, see Remediating potentially compromised database with successful login events.
We recommend that you take the following steps to remediate this finding:
Remediation step 1: Identify the affected database and user
The generated GuardDuty finding provides the name of the affected database instance and the corresponding user details. For more information, see Finding details.
Figure 8: Finding details showing Amazon RDS database instance and user details
Remediation step 2: Identify the source of the failed login attempts
In the generated GuardDuty finding, you can find the IP address, and if it was a public connection, the ASN organization in the Actor section of the finding panel. An autonomous system is a group of one or more IP prefixes (lists of IP addresses accessible on a network) run by one or more network operators that maintain a single, clearly-defined routing policy. Network operators need autonomous system numbers (ASNs) to control routing within their networks and to exchange routing information with other internet service providers.
Figure 9: Action and actor details related to GuardDuty brute force finding
Remediation step 3: Confirm that the behavior is unexpected
Examine if this activity represents an attempt to gain additional unauthorized access to the database instance as follows:
If the source is internal to your network, examine if an application is misconfigured and attempting a connection repeatedly.
If this is an external actor, examine whether the corresponding database instance is public facing or is misconfigured and thus allowing potential malicious actors to attempt to log in with common user names.
If the behavior is unexpected, complete the following steps:
As discussed previously for the CredentialAccess:RDS/AnomalousBehavior.SuccessfulLogin finding, you can restrict access to the database through credential rotation or network access controls.
Remediation step 5: Perform root-cause analysis and determine the steps that potentially led to this activity
By learning from each incident, you can help avoid repeating the same mistakes, exposures, or misconfigurations, which can both improve your security posture and reduce time lost to preventable situations.
Conclusion
In this post, you learned about the new GuardDuty RDS Protection feature and how to understand, operationalize, and respond to the new findings. You can enable this feature through the GuardDuty console, CLI, or APIs to start monitoring your Amazon RDS workloads today.
If you’ve created EventBridge rules to send findings from GuardDuty to a target, make sure that you’ve configured your rules to deliver the newly added findings. After you enable GuardDuty findings, consider creating IR playbooks, doing tabletops and AWS gamedays, and mapping out what you want to automate. For more information, see the AWS Security Incident Response Guide and AWS Incident Response Playbook resources. To gain hands-on experience with different AWS Security services, see AWS Activation Days. The Activation Days workshops begin with hands-on work with different services in sandbox accounts, and then take you through the steps to deploy them across your organization.
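One way to check this is to make sure your rule’s event pattern matches the new RDS finding types. The following boto3 sketch creates a rule that matches GuardDuty RDS Protection findings by prefix and forwards them to a notification target; the rule name and SNS topic ARN are hypothetical:

import json
import boto3

events = boto3.client("events")

# Match GuardDuty findings whose type starts with the RDS Protection prefixes
events.put_rule(
    Name="guardduty-rds-protection-findings",
    EventPattern=json.dumps({
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        "detail": {"type": [
            {"prefix": "CredentialAccess:RDS/"},
            {"prefix": "Discovery:RDS/"},
        ]},
    }),
)

events.put_targets(
    Rule="guardduty-rds-protection-findings",
    Targets=[{
        "Id": "security-alerts",
        "Arn": "arn:aws:sns:us-east-1:123456789012:security-alerts",  # hypothetical SNS topic
    }],
)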
To make it more efficient for you to operate securely on AWS, we are committed to continually improving GuardDuty, and we value your feedback. If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on AWS re:Post or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
This is a guest blog post co-written with Corey Johnson from Huron.
Having an accurate and up-to-date inventory of all technical assets helps an organization keep track of its resources, with metadata such as the assigned owner, the last updated date, who uses each asset, and how frequently. It helps engineers, analysts, and the business access the most up-to-date release of a software asset, which brings accuracy to the decision-making process. By keeping track of this information, organizations can identify technology gaps and refresh cycles, and expire assets as needed for archival.
In addition, an inventory of all assets is one of the foundational elements of an organization; it helps the security and compliance teams audit the assets to improve privacy and security posture and mitigate risk so that business operations run smoothly. Organizations may maintain an asset inventory in different ways, whether an Excel spreadsheet or a database with a fully automated system to keep it up to date, but the common objective is keeping it accurate. Even though organizations can follow manual approaches to update inventory records, it is recommended to build automation so that the inventory is accurate at any point in time.
The DevOps practices that revolutionized software engineering in the last decade have yet to come to the world of business intelligence solutions. Business intelligence tools by their nature use a paradigm of UI-driven development, with code-first practices being secondary or nonexistent. As the need for applications that can leverage an organization’s internal and client data increases, the same DevOps practices (BIOps) can drive and deliver quality insights more reliably.
In this post, we walk you through a solution that Huron built to manage the lifecycle of all Amazon QuickSight resources across the organization, in collaboration with the AWS Data Lab Resident Architect and AWS Professional Services teams.
About Huron
Huron is a global professional services firm that collaborates with clients to put possible into practice by creating sound strategies, optimizing operations, accelerating digital transformation, and empowering businesses and their people to own their future. By embracing diverse perspectives, encouraging new ideas, and challenging the status quo, Huron creates sustainable results for the organizations we serve. To help address its clients’ growing cloud needs, Huron is an AWS Partner.
Use Case Overview
Huron’s Business Intelligence use case represents visualizations as a service, where Huron has a core set of visualizations and dashboards available as products for its customers. The products exist in different industry verticals (healthcare, education, commercial) with independent development teams. Huron’s consultants leverage the products to provide insights as part of consulting engagements. The insights from the products help Huron’s consultants accelerate their customers’ transformation. As part of its overall suite of offerings, there are product dashboards that are featured in a software application following a standardized development lifecycle. In addition, these product dashboards may be forked for customer-specific customization to support a consulting engagement while still consuming from Huron’s productized data assets and datasets. In the next stage of the cycle, Huron’s consultants experiment with new data sources and insights that are in turn fed back into the product dashboards.
Challenges arise when a base reference analysis gets updated because of new feature releases or bug fixes, and all the customer visualizations that are created from it also need to be updated. To maintain the integrity of embedded visualizations, all metadata and lineage must be available to the parent application. This access to the metadata supports the need for updating visuals based on changes as well as automating row- and column-level security, ensuring customer data is properly governed.
In addition, a few customers request customizations on top of the base visualizations, for which the Huron team needs to create a replica of the base reference and then customize it for the customer. These are maintained by Huron’s in-the-field consultants rather than the product development team. These customer-specific visualizations create operational overhead because they require Huron to keep track of new customer-specific visualizations and maintain them for future releases when the product visuals change.
Huron leverages Amazon QuickSight for their Business Intelligence (BI) reporting needs, enabling them to embed visualizations at scale with higher efficiency and lower cost. A large attraction for Huron to adopt QuickSight came from the forward-looking API capabilities that enable and set the foundation for a BIOps culture and technical infrastructure. To address the above requirements, the Huron Global Product team decided to build a QuickSight Asset Tracker and a QuickSight Asset Deployment Pipeline.
The QuickSight Asset Tracker serves as a catalog of all QuickSight resources (datasets, analyses, templates, dashboards, and so on) with their interdependent relationships. It helps:
Create an inventory of all QuickSight resources across all business units
Enable dynamic embedding of visualizations and dashboards based on logged in user
Enable dynamic row and column level security on the dashboards and visualizations based on the logged-in user
Meet compliance and audit requirements of the organization
Maintain the current state of all customer specific QuickSight resources
The solution integrates an AWS CDK based pipeline to deploy QuickSight Assets that:
Supports Infrastructure-as-a-code for QuickSight Asset Deployment and enables rollbacks if required.
Enables separation of development, staging and production environments using QuickSight folders that reduces the burden of multi-account management of QuickSight resources.
Enables a hub-and-spoke model for Data Access in multiple AWS accounts in a data mesh fashion.
The QuickSight Asset Tracker was built as an independent service, which was deployed in a shared AWS service account that integrated Amazon Aurora Serverless PostgreSQL to store metadata information, AWS Lambda as the serverless compute and Amazon API Gateway to provide the REST API layer.
It also integrated AWS CDK and AWS CloudFormation to deploy the product and customer-specific QuickSight resources and keep them in a consistent and stable state. The metadata of QuickSight resources, created using either the AWS console or the AWS CDK based deployment, was maintained in the Amazon Aurora database through the QuickSight Asset Tracker REST API service.
The CDK based deployment pipeline is triggered via a CI/CD pipeline which performs the following functions:
Takes the ARN of the QuickSight assets (dataset, analysis, etc.)
Describes the asset and dependent resources (if selected)
Creates a copy of the resource in another environment (in this case a QuickSight folder) using CDK
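To make the describe-and-copy step concrete, here is a minimal boto3 sketch of what the pipeline does for an analysis; it is not Huron’s actual CDK implementation, and the account ID, asset IDs, and placeholder names are hypothetical:

import boto3

qs = boto3.client("quicksight")
ACCOUNT_ID = "123456789012"  # hypothetical AWS account ID

# Describe the source analysis passed to the pipeline
analysis = qs.describe_analysis(
    AwsAccountId=ACCOUNT_ID,
    AnalysisId="base-reference-analysis",
)["Analysis"]

# Create a template from the analysis; the template becomes the source for copies
qs.create_template(
    AwsAccountId=ACCOUNT_ID,
    TemplateId="base-reference-template-v2",
    Name="Base reference template v2",
    SourceEntity={"SourceAnalysis": {
        "Arn": analysis["Arn"],
        "DataSetReferences": [{
            "DataSetPlaceholder": "primary",
            "DataSetArn": analysis["DataSetArns"][0],
        }],
    }},
)

A copy of the analysis in the target environment can then be created from the template with create_analysis and placed into the appropriate QuickSight folder with create_folder_membership.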
The solution architecture integrated the following AWS services.
Amazon Aurora Serverless integrated as the backend database to store metadata information of all QuickSight resources with customer and product information they are related to.
Amazon QuickSight as the BI service with which visualizations and dashboards can be created and embedded into the online applications.
AWS Lambda as the serverless compute service that gets invoked by online applications using Amazon API Gateway service.
Amazon SQS to store customer request messages, so that the AWS CDK based pipeline can read from it for processing.
AWS CodeCommit is integrated to store the AWS CDK deployment scripts and AWS CodeBuild, AWS CloudFormation integrated to deploy the AWS resources using an infrastructure as a code approach.
AWS CloudTrail is integrated to audit user actions and trigger Amazon EventBridge rules when a QuickSight resource is created, updated or deleted, so that the QuickSight Asset Tracker is up-to-date.
Amazon S3 integrated to store metadata information, which is used by AWS CDK based pipeline to deploy the QuickSight resources.
AWS Lake Formation enables cross-account data access in support of the QuickSight Data Mesh.
The following provides a high-level view of the solution architecture.
Architecture Walkthrough:
The following provides a detailed walkthrough of the above architecture.
QuickSight Dataset, Template, Analysis, Dashboard and visualization relationships:
Steps 1 to 2 represent QuickSight reference analysis reading data from different data sources that may include Amazon S3, Amazon Athena, Amazon Redshift, Amazon Aurora or any other JDBC based sources.
Step 3 represents QuickSight templates being created from reference analysis when a customer specific visualization needs to be created and step 4.1 to 4.2 represents customer analysis and dashboards being created from the templates.
Steps 7 to 8 represent QuickSight visualizations getting generated from analysis/dashboard and step 6 represents the customer analysis/dashboard/visualizations referring their own customer datasets.
Step 10 represents a new fork being created from the base reference analysis for a specific customer, which will create a new QuickSight template and reference analysis for that customer.
Step 9 represents end users accessing QuickSight visualizations.
Asset Tracker REST API service:
Step 15.2 to 15.4 represents the Asset Tracker service, which is deployed in a shared AWS service account, where Amazon API Gateway provides the REST API layer, which invokes AWS Lambda function to read from or write to backend Aurora database (Aurora Serverless v2 – PostgreSQL engine). The database captures all relationship metadata between QuickSight resources, its owners, assigned customers and products.
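A minimal sketch of the Lambda side of this REST layer might look like the following; it assumes a PostgreSQL driver such as psycopg2 is packaged with the function, and the environment variables, table, and column names are hypothetical stand-ins for the Asset Tracker schema:

import json
import os
import psycopg2

# Connection details are hypothetical; in practice they would come from AWS Secrets Manager
conn = psycopg2.connect(
    host=os.environ["DB_HOST"],
    dbname="asset_tracker",
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
)

def lambda_handler(event, context):
    """Handle GET /assets/{assetId} from API Gateway and return stored asset metadata."""
    asset_id = event["pathParameters"]["assetId"]
    with conn.cursor() as cur:
        cur.execute(
            "SELECT asset_arn, asset_type, parent_template_id FROM qs_analysis WHERE analysis_id = %s",
            (asset_id,),
        )
        row = cur.fetchone()
    if row is None:
        return {"statusCode": 404, "body": json.dumps({"message": "asset not found"})}
    return {
        "statusCode": 200,
        "body": json.dumps({"assetArn": row[0], "assetType": row[1], "parentTemplateId": row[2]}),
    }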
Online application – QuickSight asset discovery and creation
Step 15.1 represents the front-end online application reading QuickSight metadata information from the Asset Tracker service to help customers or end users discover visualizations available and be able to dynamically render based on the user login.
Step 11 to 12 represents the online application requesting creation of new QuickSight resources, which pushes requests to Amazon SQS and then AWS Lambda triggers AWS CodeBuild to deploy new QuickSight resources. Step 13.1 and 13.2 represents the CDK based pipeline maintaining the QuickSight resources to keep them in a consistent state. Finally, the AWS CDK stack invokes the Asset Tracker service to update its metadata as represented in step 13.3.
Tracking QuickSight resources created outside of the AWS CDK Stack
Step 14.1 represents users creating QuickSight resources using the AWS Console and step 14.2 represents that activity getting logged into AWS CloudTrail.
Step 14.3 to 14.5 represents triggering EventBridge rule for CloudTrail activities that represents QuickSight resource being created, updated or deleted and then invoke the Asset Tracker REST API to register the QuickSight resource metadata.
Architecture Decisions:
The following are a few architecture decisions we took while designing the solution.
Choosing Aurora database for Asset Tracker: We evaluated Amazon Neptune for the Asset Tracker database because most of the metadata we capture is primarily about maintaining relationships between QuickSight resources. But when we looked at the query patterns, we found that the query is always just one level deep, finding the parent of a specific QuickSight resource, and that can be solved with a relational database’s primary key / foreign key relationship and a simple self-join SQL query. Because the query pattern does not require a graph database, we decided to go with Amazon Aurora to keep it simple, so that we could avoid introducing a new database technology and reduce the operational overhead of maintaining it. In the future, as the use case evolves, we can evaluate the need for a graph database and plan for integrating it. For Amazon Aurora, we chose Amazon Aurora Serverless because the usage pattern is not consistent enough to justify reserving server capacity, and the serverless tech stack helps reduce operational overhead.
Decoupling Asset Tracker as a common REST API service: The Asset Tracker has future scope to be a centralized metadata layer that keeps track of all the QuickSight resources across all business units of Huron. So instead of each business unit having its own metadata database, building it as a service and deploying it in a shared AWS service account reduces operational overhead, avoids duplicate infrastructure cost, and provides a consolidated view of all assets and their integrations. The service gives applications the ability to consume metadata about the QuickSight assets and then apply their own mapping of security policies to the assets based on their own application data and access control policies.
Central QuickSight account with subfolders for environments: The choice was made to use a central account, which reduces the developer friction of having multiple accounts with multiple identities and spares end users from having to manage multiple accounts and access to resources. QuickSight folders allow for appropriate permissions for separating “environments”. Furthermore, by using folder-based sharing with QuickSight groups, users with appropriate permissions already have access to the latest versions of QuickSight assets without having to share their individual identities.
The solution included an automated Continuous Integration (CI) and Continuous Deployment (CD) pipeline to deploy the resources from development to staging and then finally to production. The following provides a high-level view of the QuickSight CI/CD deployment strategy.
Aurora Database Tables and Reference Analysis update flow
The following are the database tables integrated to capture the QuickSight resource metadata.
QS_Dataset: This captures metadata of all QuickSight datasets that are integrated in the reference analysis or customer analysis. This includes AWS ARN (Amazon Resource Name), data source type, ID and more.
QS_Template: This table captures metadata of all QuickSight templates, from which customer analysis and dashboards will be created. This includes AWS ARN, parent reference analysis ID, name, version number and more.
QS_Folder: This table captures metadata about QuickSight folders which logically groups different visualizations. This includes AWS ARN, name, and description.
QS_Analysis: This table captures metadata of all QuickSight analyses, including AWS ARN, name, type, dataset IDs, parent template ID, tags, permissions, and more.
QS_Dashboard: This table captures metadata information of QuickSight dashboards that includes AWS ARN, parent template ID, name, dataset IDs, tags, permissions and more.
QS_Folder_Asset_Mapping: This table captures folder to QuickSight asset mapping that includes folder ID, Asset ID, and asset type.
As the solution moves to the next phase of implementation, we plan to introduce additional database tables to capture metadata information about QuickSight sheets and asset mapping to customers and products. We will extend the functionality to support visual based embedding to enable truly integrated customer data experiences where embedded visuals mesh with the native content on a web page.
While explaining the use case, we highlighted the challenge that arises when a base reference analysis gets updated: we need to track the templates that are inherited from it and make sure the change is pushed to the linked customer analyses and dashboards. The following example scenario explains how the database tables change when a reference analysis is updated.
Example Scenario: When “reference analysis” is updated with a new release
When a base reference analysis is updated because of a new feature release, then a new QuickSight reference analysis and template needs to be created. Then we need to update all customer analysis and dashboard records to point to the new template ID to form the lineage.
The following sequential steps represent the database changes that need to happen.
Insert a new record to the “Analysis” table to represent the new reference analysis creation.
Insert a new record to the “Template” table with new reference analysis ID as parent, created in step 1.
Retrieve “Analysis” and “Dashboard” table records that points to previous template ID and then update those records with the new template ID, created in step 2.
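The following is a hedged sketch of those three steps as a single database transaction, using psycopg2 against the Aurora PostgreSQL database; the table and column names are hypothetical simplifications of the tables described above:

import psycopg2

# Connection details are hypothetical; in practice they come from Secrets Manager
conn = psycopg2.connect(
    host="asset-tracker.cluster-xxxx.us-east-1.rds.amazonaws.com",
    dbname="asset_tracker", user="tracker_app", password="REPLACE_WITH_A_SECRET",
)

def register_new_release(cur, new_analysis, new_template, old_template_id):
    # Step 1: record the new reference analysis
    cur.execute(
        "INSERT INTO qs_analysis (analysis_id, asset_arn, name) VALUES (%s, %s, %s)",
        (new_analysis["id"], new_analysis["arn"], new_analysis["name"]),
    )
    # Step 2: record the new template with the new reference analysis as its parent
    cur.execute(
        "INSERT INTO qs_template (template_id, asset_arn, parent_analysis_id) VALUES (%s, %s, %s)",
        (new_template["id"], new_template["arn"], new_analysis["id"]),
    )
    # Step 3: repoint customer analyses and dashboards from the old template to the new one
    for table in ("qs_analysis", "qs_dashboard"):
        cur.execute(
            f"UPDATE {table} SET parent_template_id = %s WHERE parent_template_id = %s",
            (new_template["id"], old_template_id),
        )

# The connection context manager commits only if all three steps succeed
with conn, conn.cursor() as cur:
    register_new_release(
        cur,
        new_analysis={"id": "ref-analysis-v2", "arn": "arn:aws:quicksight:us-east-1:123456789012:analysis/ref-analysis-v2", "name": "Reference v2"},
        new_template={"id": "ref-template-v2", "arn": "arn:aws:quicksight:us-east-1:123456789012:template/ref-template-v2"},
        old_template_id="ref-template-v1",
    )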
How will it enable a more robust embedding experience
The QuickSight Asset Tracker integration with Huron’s products provides users with a personalized, secure, and modern analytics experience. When users log in through Huron’s online application, it uses the logged-in user’s information to dynamically identify the products they are mapped to and then render the QuickSight visualizations and dashboards that the user is entitled to see. This improves the user experience, enables granular permission management, and also increases performance.
How AWS collaborated with Huron to help build the solution
The AWS team collaborated with the Huron team to design and implement the solution. The AWS Data Lab Resident Architect collaborated with Huron’s lead architect on the initial architecture design, comparing different options for integration and deriving the tradeoffs between them before finalizing the architecture. Then, with the help of an AWS Professional Services engineer, we built the base solution, which the Huron team can extend to roll out to all business units and integrate additional reporting features on top of.
The AWS Data Lab Resident Architect program provides AWS customers with guidance in refining and executing their data strategy and solutions roadmap. Resident Architects are dedicated to customers for 6 months, with opportunities for extension, and help customers (Chief Data Officers, VPs of Data Architecture, and Builders) make informed choices and tradeoffs about accelerating their data and analytics workloads and implementation.
The AWS Professional Services organization is a global team of experts that can help customers realize their desired business outcomes when using the AWS Cloud. The Professional Services team works together with the customer’s team and their chosen member of the AWS Partner Network (APN) to execute their enterprise cloud computing initiatives.
Next Steps
Huron has rolled out the solution for one business unit, and as a next step we plan to roll it out to all business units, so that the Asset Tracker service is populated with assets available across all business units of the organization and provides a consolidated view.
In addition, Huron will be building a reporting layer on top of the Amazon Aurora Asset Tracker database, so that leadership has a way to discover assets by business unit, by owner, created within a specific date range, or not updated in a while.
Once the asset tracker is populated with all QuickSight assets, it will be integrated into the front-end online application that can help end users discover existing assets and request creation of new assets.
Newer QuickSight APIs such as assets-as-a-bundle and assets-as-code further accelerate the capabilities of the service by improving the development velocity and reliability of making changes.
Conclusion
This blog explained how Huron built an Asset Tracker to keep track of all QuickSight resources across the organization. This solution may provide a reference to other organizations who would like to build an inventory of visualization reports, ML models or other technical assets. This solution leveraged Amazon Aurora as the primary database, but if an organization would also like to build a detailed lineage of all the assets to understand how they are interrelated then they can consider integrating Amazon Neptune as an alternate database too.
If you have a similar use case and would like to collaborate with AWS Data Analytics Specialist Architects to brainstorm on the architecture, rapidly prototype it and implement a production ready solution then connect with your AWS Account Manager or AWS Solution Architect to start an engagement with AWS Data Lab team.
About the Authors
Corey Johnson is the Lead Data Architect at Huron, where he leads its data architecture for their Global Products Data and Analytics initiatives.
Sakti Mishra is a Principal Data Analytics Architect at AWS, where he helps customers modernize their data architecture, help define end to end data strategy including data security, accessibility, governance, and more. He is also the author of the book Simplify Big Data Analytics with Amazon EMR. Outside of work, Sakti enjoys learning new technologies, watching movies, and visiting places with family.
Inventory management is a critical function for any business that deals with physical products. The primary challenge businesses face with inventory management is balancing the cost of holding inventory with the need to ensure that products are available when customers demand them.
The consequences of poor inventory management can be severe. Overstocking can lead to increased holding costs and waste, while understocking can result in lost sales, reduced customer satisfaction, and damage to the business’s reputation. Inefficient inventory management can also tie up valuable resources, including capital and warehouse space, and can impact profitability.
Forecasting is another critical component of effective inventory management. Accurately predicting demand for products allows businesses to optimize inventory levels, minimize stockouts, and reduce holding costs. However, forecasting can be a complex process, and inaccurate predictions can lead to missed opportunities and lost revenue.
To address these challenges, businesses need an inventory management and forecasting solution that can provide real-time insights into inventory levels, demand trends, and customer behavior. Such a solution should use the latest technologies, including Internet of Things (IoT) sensors, cloud computing, and machine learning (ML), to provide accurate, timely, and actionable data. By implementing such a solution, businesses can improve their inventory management processes, reduce holding costs, increase revenue, and enhance customer satisfaction.
In this post, we discuss how to streamline inventory management forecasting systems with AWS managed analytics, AI/ML, and database services.
Solution overview
In today’s highly competitive business landscape, it’s essential for retailers to optimize their inventory management processes to maximize profitability and improve customer satisfaction. With the proliferation of IoT devices and the abundance of data generated by them, it has become possible to collect real-time data on inventory levels, customer behavior, and other key metrics.
To take advantage of this data and build an effective inventory management and forecasting solution, retailers can use a range of AWS services. By collecting data from store sensors using AWS IoT Core, ingesting it using AWS Lambda to Amazon Aurora Serverless, and transforming it using AWS Glue from a database to an Amazon Simple Storage Service (Amazon S3) data lake, retailers can gain deep insights into their inventory and customer behavior.
With Amazon Athena, retailers can analyze this data to identify trends, patterns, and anomalies, and use Amazon ElastiCache for customer-facing applications with reduced latency. Additionally, by building a point of sales application on Amazon QuickSight, retailers can embed customer 360 views into the application to provide personalized shopping experiences and drive customer loyalty.
Finally, we can use Amazon SageMaker to build forecasting models that can predict inventory demand and optimize stock levels.
With these AWS services, retailers can build an end-to-end inventory management and forecasting solution that provides real-time insights into inventory levels and customer behavior, enabling them to make informed decisions that drive business growth and customer satisfaction.
The following diagram illustrates a sample architecture.
With the appropriate AWS services, your inventory management and forecasting system can have optimized collection, storage, processing, and analysis of data from multiple sources. The solution includes the following components.
Data ingestion and storage
Retail businesses have event-driven data that requires action from downstream processes. It’s critical for an inventory management application to handle the data ingestion and storage for changing demands.
The data ingestion process is typically triggered by an event such as an order being placed, kicking off the inventory management workflow, which requires actions from backend services. Developers are responsible for the operational overhead of trying to maintain the data ingestion load from an event-driven application.
The volume and velocity of data can change in the retail industry each day. Events like Black Friday or a new campaign can create volatile demand in what is required to process and store the inventory data. Serverless services designed to scale to businesses’ needs help reduce the architectural and operational challenges that are driven from high-demand retail applications.
Understanding the scaling challenges that occur when inventory demand spikes, we can deploy Lambda, a serverless, event-driven compute service, to trigger the data ingestion process. As inventory events occur like purchases or returns, Lambda automatically scales compute resources to meet the volume of incoming data.
After Lambda responds to the inventory action request, the updated data is stored in Aurora Serverless. Aurora Serverless is a serverless relational database that is designed to scale to the application’s needs. When peak loads hit during events like Black Friday, Aurora Serverless deploys only the database capacity necessary to meet the workload.
Inventory management applications have ever-changing demands. Deploying serverless services to handle the ingestion and storage of data will not only optimize cost but also reduce the operational overhead for developers, freeing up bandwidth for other critical business needs.
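To illustrate the ingestion path, the following is a minimal sketch of a Lambda handler that persists an inventory event to Aurora Serverless through the RDS Data API; the ARNs, database, table, and event fields are hypothetical, and it assumes the Data API is enabled on the cluster:

import json
import boto3

rds_data = boto3.client("rds-data")

CLUSTER_ARN = "arn:aws:rds:us-east-1:123456789012:cluster:inventory-cluster"      # hypothetical
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:inventory-db"  # hypothetical

def lambda_handler(event, context):
    """Triggered for each inventory event (order placed, return processed) and persists it."""
    detail = json.loads(event["body"]) if "body" in event else event
    rds_data.execute_statement(
        resourceArn=CLUSTER_ARN,
        secretArn=SECRET_ARN,
        database="inventory",
        sql="INSERT INTO inventory_events (sku, quantity_delta, event_type) "
            "VALUES (:sku, :qty, :event_type)",
        parameters=[
            {"name": "sku", "value": {"stringValue": detail["sku"]}},
            {"name": "qty", "value": {"longValue": detail["quantity_delta"]}},
            {"name": "event_type", "value": {"stringValue": detail["event_type"]}},
        ],
    )
    return {"statusCode": 202}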
Data performance
Customer-facing applications require low latency to maintain positive user experiences with microsecond response times. ElastiCache, a fully managed, in-memory database, delivers high-performance data retrieval to users.
In-memory caching provided by ElastiCache is used to improve latency and throughput for read-heavy applications that online retailers experience. By storing critical pieces of data in-memory like commonly accessed product information, the application performance improves. Product information is an ideal candidate for a cached store due to data staying relatively the same.
Functionality is often added to retail applications to retrieve trending products. Trending products can be cycled through the cache depending on customer access patterns. ElastiCache manages the real-time application data caching, allowing your customers to experience microsecond response times while supporting high-throughput handling of hundreds of millions of operations per second.
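As an illustration of this cache-aside pattern, the following sketch assumes a Redis-compatible ElastiCache endpoint and a hypothetical load_product_from_db helper; it serves product details from the cache and falls back to the database on a miss:

import json

import redis

# Placeholder endpoint for an ElastiCache for Redis cluster.
cache = redis.Redis(host="my-cache.example.cache.amazonaws.com", port=6379, decode_responses=True)

PRODUCT_TTL_SECONDS = 300  # product/trending data tolerates a short refresh window

def get_product(sku, load_product_from_db):
    """Cache-aside read: try Redis first, then the database, then populate the cache."""
    cached = cache.get(f"product:{sku}")
    if cached is not None:
        return json.loads(cached)

    product = load_product_from_db(sku)  # hypothetical loader hitting the relational store
    cache.setex(f"product:{sku}", PRODUCT_TTL_SECONDS, json.dumps(product))
    return product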
Data transformation
Data transformation is essential in inventory management and forecasting solutions for both data analysis around sales and inventory, as well as ML for forecasting. This is because raw data from various sources can contain inconsistencies, errors, and missing values that may distort the analysis and forecast results.
In the inventory management and forecasting solution, AWS Glue is recommended for data transformation. The service handles cleaning, restructuring, and consolidating data into a standard format that can be easily analyzed. As a result of the transformation, businesses obtain a more precise understanding of inventory, sales trends, and customer behavior, informing data-driven decisions that optimize inventory management and sales strategies. Furthermore, high-quality data is crucial for ML algorithms to make accurate forecasts.
By transforming data, organizations can enhance the accuracy and dependability of their forecasting models, ultimately leading to improved inventory management and cost savings.
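As a rough sketch of what such a transformation could look like (the catalog database, table, column names, and output path are assumptions, not the solution's published code), an AWS Glue PySpark job might clean and consolidate raw sales records like this:

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping, Filter
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw sales events registered in the Glue Data Catalog (names are placeholders).
raw = glue_context.create_dynamic_frame.from_catalog(
    database="retail_raw", table_name="sales_events"
)

# Drop records missing the keys needed for analysis and forecasting.
complete = Filter.apply(frame=raw, f=lambda r: r["sku"] and r["quantity"] is not None)

# Consolidate to a standard schema used by Athena and SageMaker downstream.
standardized = ApplyMapping.apply(
    frame=complete,
    mappings=[
        ("sku", "string", "sku", "string"),
        ("quantity", "int", "quantity", "int"),
        ("event_time", "string", "event_time", "timestamp"),
    ],
)

glue_context.write_dynamic_frame.from_options(
    frame=standardized,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/sales/"},
    format="parquet",
)
job.commit()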
Data analysis
Data analysis has become increasingly important for businesses because it allows leaders to make informed operational decisions. However, analyzing large volumes of data can be a time-consuming and resource-intensive task. This is where Athena comes in. With Athena, businesses can easily query historical sales and inventory data stored in S3 data lakes and combine it with real-time transactional data from Aurora Serverless databases.
The federated capabilities of Athena allow businesses to generate insights by combining datasets without the need to build ETL (extract, transform, and load) pipelines, saving time and resources. This enables businesses to quickly gain a comprehensive understanding of their inventory and sales trends, which can be used to optimize inventory management and forecasting, ultimately improving operations and increasing profitability.
With Athena’s ease of use and powerful capabilities, businesses can quickly analyze their data and gain valuable insights, driving growth and success without the need for complex ETL pipelines.
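For illustration, a federated query like the following could be started with the AWS SDK for Python; the aurora_serverless data source name, database names, SQL, and output bucket are assumptions for this sketch:

import time

import boto3

athena = boto3.client("athena")

# Join historical sales in the S3 data lake with live orders in Aurora Serverless;
# "aurora_serverless" would be an Athena federated data source connector.
SQL = """
SELECT h.sku, SUM(h.units_sold) AS historical_units, COUNT(o.order_id) AS open_orders
FROM sales_history h
LEFT JOIN "aurora_serverless"."public"."orders" o ON o.sku = h.sku
GROUP BY h.sku
"""

query = athena.start_query_execution(
    QueryString=SQL,
    QueryExecutionContext={"Database": "retail_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = query["QueryExecutionId"]

# Poll until the query finishes, then fetch the first page of results.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]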
Forecasting
Inventory forecasting is an important aspect of inventory management for businesses that deal with physical products. Accurately predicting demand for products can help optimize inventory levels, reduce costs, and improve customer satisfaction. ML can help simplify and improve inventory forecasting by making more accurate predictions based on historical data.
SageMaker is a powerful ML platform that you can use to build, train, and deploy ML models for a wide range of applications, including inventory forecasting. In this solution, we use SageMaker to build and train an ML model for inventory forecasting, covering the basic concepts of ML, the data preparation process, model training and evaluation, and deploying the model for use in a production environment.
The solution also introduces the concept of hierarchical forecasting, which involves generating coherent forecasts that maintain the relationships within the hierarchy or reconciling incoherent forecasts. The workshop provides a step-by-step process for using the training capabilities of SageMaker to carry out hierarchical forecasting using synthetic retail data and the scikit-hts package. The FBProphet model was used along with bottom-up and top-down hierarchical aggregation and disaggregation methods. We used Amazon SageMaker Experiments to train multiple models, and the best model was picked out of the four trained models.
Although the approach was demonstrated on a synthetic retail dataset, you can use the provided code with any time series dataset that exhibits a similar hierarchical structure.
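The SageMaker Python SDK drives this kind of training job. The following is a hedged sketch, not the workshop's exact code: the entry-point script, hyperparameters, container version, and S3 paths are placeholders.

import sagemaker
from sagemaker.sklearn.estimator import SKLearn

role = sagemaker.get_execution_role()  # assumes this runs in a SageMaker notebook environment

# train.py would contain the scikit-hts/Prophet hierarchical forecasting code described above.
estimator = SKLearn(
    entry_point="train.py",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    framework_version="1.2-1",  # placeholder scikit-learn container version
    py_version="py3",
    hyperparameters={"hierarchy_method": "bottom_up", "forecast_horizon": 28},
)

# Training data prepared by the transformation step and stored in S3 (path is a placeholder).
estimator.fit({"train": "s3://example-bucket/curated/sales/"})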
Security and authentication
The solution takes advantage of the scalability, reliability, and security of AWS services to provide a comprehensive inventory management and forecasting solution that can help businesses optimize their inventory levels, reduce holding costs, increase revenue, and enhance customer satisfaction. By incorporating user authentication with Amazon Cognito and Amazon API Gateway, the solution ensures that the system is secure and accessible only by authorized users.
Next steps
The next step to build an inventory management and forecasting solution on AWS would be to go through the Inventory Management workshop. In the workshop, you will get hands-on with AWS managed analytics, AI/ML, and database services to dive deep into an end-to-end inventory management solution. By the end of the workshop, you will have gone through the configuration and deployment of the critical pieces that make up an inventory management system.
Conclusion
In conclusion, building an inventory management and forecasting solution on AWS can help businesses optimize their inventory levels, reduce holding costs, increase revenue, and enhance customer satisfaction. With AWS services like IoT Core, Lambda, Aurora Serverless, AWS Glue, Athena, ElastiCache, QuickSight, SageMaker, and Amazon Cognito, businesses can use scalable, reliable, and secure technologies to collect, store, process, and analyze data from various sources.
The end-to-end solution is designed for individuals in various roles, such as business users, data engineers, data scientists, and data analysts, who are responsible for comprehending, creating, and overseeing processes related to retail inventory forecasting. Overall, an inventory management and forecasting solution on AWS can provide businesses with the insights and tools they need to make data-driven decisions and stay competitive in a constantly evolving retail landscape.
About the Authors
Jason D’Alba is an AWS Solutions Architect leader focused on databases and enterprise applications, helping customers architect highly available and scalable solutions.
Navnit Shukla is an AWS Specialist Solution Architect, Analytics, and is passionate about helping customers uncover insights from their data. He has been building solutions to help organizations make data-driven decisions.
Vetri Natarajan is a Specialist Solutions Architect for Amazon QuickSight. Vetri has 15 years of experience implementing enterprise business intelligence (BI) solutions and greenfield data products. Vetri specializes in the integration of BI solutions with business applications, enabling data-driven decisions.
Sindhura Palakodety is a Solutions Architect at AWS. She is passionate about helping customers build enterprise-scale Well-Architected solutions on the AWS platform and specializes in Data Analytics domain.
A new week starts, and Spring is almost here! If you’re curious about AWS news from the previous seven days, I got you covered.
Last Week’s Launches Here are the launches that got my attention last week:
Amazon S3 – Last week there was AWS Pi Day 2023 celebrating 17 years of innovation since Amazon S3 was introduced on March 14, 2006. For the occasion, the team released many new capabilities:
Amazon S3 has also simplified private connectivity from on-premises networks: with private DNS for S3, on-premises applications can use AWS PrivateLink to access S3 over an interface endpoint, while requests from your in-VPC applications access S3 using gateway endpoints.
We released Mountpoint for Amazon S3, a high performance open source file client. Read more in the blog. Note that Mountpoint isn’t a general-purpose networked file system, and comes with some restrictions on file operations.
Amazon Neptune – Now offers a graph summary API to help understand important metadata about property graphs (PG) and resource description framework (RDF) graphs. Neptune added support for Slow Query Logs to help identify queries that need performance tuning.
Amazon OpenSearch Service – The team introduced security analytics that provides new threat monitoring, detection, and alerting features. The service now supports OpenSearch version 2.5, which adds several new features such as support for Point in Time Search and improvements to observability and geospatial functionality.
AWS Lake Formation and Apache Hive on Amazon EMR – Introduced fine-grained access controls that allow data administrators to define and enforce fine-grained table and column level security for customers accessing data via Apache Hive running on Amazon EMR.
Customers are adopting microservices architecture to build innovative and scalable applications on Amazon Web Services (AWS). These microservices applications are deployed across multiple AWS services, and customers are looking for comprehensive observability solutions that can help them effectively monitor and manage the performance of their applications in real-time.
IBM Instana is a fully automated application performance management (APM) solution, available to customers as a fully managed software as a service (SaaS) offering on AWS. It is specifically designed to help customers address the challenges of monitoring microservices and cloud-native applications in real time. It uses artificial intelligence and machine learning to provide detailed insight into the health and behavior of applications, allowing developers and IT teams to optimize performance and quickly identify and troubleshoot issues.
This post explains the capabilities of IBM Instana to automatically collect observability metrics, traces, and events from microservices deployed on AWS cloud, as well as on-premises, to provide full visibility into the performance of individual components and applications as a whole.
IBM Instana solution overview
IBM Instana is designed to be highly scalable and adaptable to changing microservices applications environments. Its architecture (Figure 1) consists of several components that work together to provide comprehensive monitoring for microservices and cloud-native applications.
Instana’s main building blocks are host agents and agent sensors that are deployed in a customer’s AWS account and responsible for collecting, aggregating, and sending detailed monitoring information of applications and AWS services to the Instana SaaS backend.
The Instana SaaS backend services provide several key components, including data collectors, storage services, analytics engines, and user interfaces. They allow customers to process and analyze data in real time, generate actionable insights, and maintain a comprehensive view of application and infrastructure performance, enabling them to quickly identify and resolve issues and improve their overall operations.
Figure 1. IBM Instana architecture on AWS
Monitoring data
Instana monitors and observes microservices and cloud-native applications by collecting beacons, traces, and one-second metrics:
Beacons are small monitoring payloads that are transmitted by a JavaScript agent to the Instana servers, modeling specific events occurring within the lifecycle of a page view of a website; for example, page loading, resource retrieval, and HTTP requests.
Traces are detailed records of the requests and transactions that flow through a microservice architecture. They record the sequence of events that occur when a request is processed, including the services that are involved, the duration of each service, and any errors or exceptions that occur. Instana automatically correlates traces across services to provide a complete view of an entire transaction. This allows for easy identification and diagnosis of performance issues.
Metrics are numerical values that represent the performance and resource utilization of a microservice or infrastructure component. Metrics are collected by Instana Agents and sent to the Instana backend at regular intervals. Instana Agents collect hundreds of different metrics, including (but not limited to) CPU usage, memory usage, network traffic, and disk I/O.
This information is captured by Instana agents and sensors, which also collect application configurations and events, plus discover application building blocks, including clusters, containers, and services.
Once Instana agents are running, they automatically detect applications and services, such as containers running on Amazon EKS, and processes like Nginx, NodeJS, Spring Boot, Postgres, Elasticsearch, or Cassandra. For each component detected, different Instana sensors are automatically downloaded, installed, and configured to monitor the environment.
Instana sensors are small programs that are designed to attach and monitor one specific technology and pass their data to the agent. They are automatically managed, updated, loaded, and unloaded by the host agent.
Instana also provides tracers, which are used with runtimes like Java, .NET, and NodeJS, among others. They modify code execution to capture logs and request-level traces, and send those back to the Instana agent.
With the use of sensors, the host agent collects configuration data and monitors the applications it has detected. The host agent also handles communications with the Instana SaaS backend services. It collects, aggregates, and sends logs, traces, and metrics (such as response times, error rates, and resource utilization) to the Instana SaaS backend every second, using secure and efficient communication protocols.
IBM Instana SaaS
The Instana SaaS backend is the heart of the Instana APM solution and responsible for processing, storing, and analyzing the monitoring data collected from the Instana agents and sensors installed in the customer’s infrastructure.
It consists of several components and services that work together to provide real-time monitoring and analysis of microservices applications, including:
Data collectors: Receive and process data from the Instana agents and sensors, and store it in the Instana backend for further analysis.
Analytics engine: Analyzes the data collected by the agents and sensors to provide insights into the performance and health of the microservices applications.
User interface: Web-based interface that customers use to view and analyze their monitoring data.
Alerting engine: Generates alerts when thresholds or anomalies are detected in the monitoring data.
Data storage: Time-series database that stores the monitoring data collected by the agents and sensors. Allows customers to query and analyze the data in real-time.
Integrations: Integrates with various third-party tools, such as Slack, PagerDuty, and ServiceNow, providing seamless alerting and incident management.
IBM Instana backend: making sense of the situation in real time
The Instana SaaS platform automatically ingests data from agents and continuously updates a dependency map (Figure 2). This map presents every dependency in context, giving users an easy way to understand the interrelationships between application components and services.
This understanding enables users to identify the upstream and downstream impacts of any issue, ensuring that they stay informed about any potential impacts.
Figure 2. An example of an IBM Instana dependency map
Instana traces every request end-to-end without sampling. The traces are analyzed in real-time, providing metrics that make any performance problems immediately visible. In the event of an incident, Instana can illustrate how a single issue can generate a ripple effect and impact a number of directly and indirectly connected services. Using the relationship information from the Dynamic Graph, Instana’s automatic root-cause analysis can precisely aggregate the individual issues into a single incident.
Figure 3. Applications monitoring with IBM Instana
Developers, IT operations, or site reliability engineers (SREs) can access the Instana application monitoring interface (Figure 3) or the end-user monitoring (EUM) interface (Figure 4) to view monitoring data for their workloads. These can be websites, mobile applications, AWS services, and infrastructure levels. From this UI, these personas can access service dashboards that show key performance indicators (KPIs), like response time and error rate.
Figure 4. End-user monitoring with IBM Instana
The following actions demonstrate how EUM can be set up for a JavaScript application deployed to Amazon S3:
Developers inject Instana JavaScript code (Figure 5) into the static website (HTML).
When a user visits the website, the JavaScript agent sends beacons to the Instana backend.
Dashboards show specific events of the website lifecycle, including page loading, JS errors, and HTTP requests.
Teams access the Instana UI to check performance metrics. They can configure Smart Alerts with custom alerting policies based on specific metrics and KPIs.
Smart Alerts can send alerts via various channels, such as email, Slack, or IBM Watson AIOps Webhook.
In case of an incident, teams can use Instana to retrieve various performance metrics for root-cause analysis.
Developers can resolve the issues and apply the patch.
Figure 5. IBM Instana EUM JavaScript agent
Instana also offers Smart Alerts (Figure 6) to provide a more intuitive process of managing alerts. With Smart Alerts, customers can automatically generate alerting configurations using relevant KPIs and automatic threshold detection for use cases like website slowness or website errors.
Figure 6. IBM Instana Smart Alerts
Conclusion
In this post, we discussed how IBM Instana provides a comprehensive, real-time observability and monitoring solution. It gives you insight into your microservices and cloud-native applications, including visibility into AWS services, containers, on-premises infrastructure, and other technologies. With Instana, you can quickly identify and resolve issues before they impact end users, ensuring that your applications are performing optimally.
As an IT administrator, developer, or business owner, you can use IBM Instana on AWS to gain a deeper understanding of your applications and make data-driven decisions to improve overall performance.
Application migrations, especially from legacy/mainframe to the cloud, are done in phases that sometimes span multiple years. Each phase migrates a set of applications, data, and other resources to the cloud. During the transition phases, applications might require access to both on-premises and cloud-based resources to perform their function. While working with our customers, we observed that the most common resources that applications require access to are databases, file storage, and shared services.
This blog post includes architecture guidelines for setting up access to commonly used resources by building a security model in Amazon Web Services (AWS). As you move your legacy applications to the cloud, you can apply Zero Trust concepts and security best practices according to your security needs. With AWS, you can build strong identity and access management with centralized control and set up and manage guardrails and fine-grained access controls for your workforce and applications.
In large organizations, on-premises applications rely on mainframe-based security services, an Identity Provider (IdP) platform, or a combination of both.
A mainframe-based control facility enables on-premises applications to:
Identify and verify users.
Establish an authority (authorize users and backend programs to access protected resources) through privileges defined in the control facility.
The backend programs use a unique identifier (or surrogate key) and run under the authority defined by the privileges assigned to the unique identifier.
This security mechanism needs to be transformed into a role-based security model in AWS as applications are moved to the cloud. You assign permissions to a role, which is assumed by an application to get access to resources in AWS, similar to an authority defined in the legacy environment.
An IdP platform (such as Okta or Ping Identity) provides capabilities such as centralized access management and identity federation using SAML 2.0 or OpenID Connect (OIDC), which build a system of trust between the on-premises IdP and AWS. Once the federation is set up, on-premises applications can access AWS resources using AWS Identity and Access Management (IAM) roles, as explained in the next section.
Setting up a scalable security model in AWS
Figure 1 shows an on-premises environment where enterprise identity management is integrated with the mainframe and provides authentication and authorization to applications running off the mainframe. Generally, mainframe-based security controls (users, resources, and profiles) are replicated to the enterprise identity platform and are kept in sync through a change data capture process.
Figure 1. Access to AWS resources from on-premises
To enable your on-premises applications to access AWS resources, the applications need valid AWS credentials for making AWS API requests. Avoid using long-term access keys (such as those associated with IAM users) because they remain valid until you remove them. The following two methods can be used to assume an IAM role and get temporary security credentials to gain access to the AWS resources:
SAML-based identity federation – AWS supports identity federation with SAML. It allows federated access for users and applications in your organization, which assume an IAM role created for SAML federation to get temporary credentials. This method is helpful if your application uses a service account to manage AWS resource access, regardless of who is logged in.
IAM Roles Anywhere – Your on-premises applications present X.509 certificates so that they can assume a role and get temporary credentials. This method is helpful if your application needs access to an AWS resource based on a service account.
In both of these cases, authenticated requests assume an IAM role, get temporary security credentials, and perform certain actions using AWS command line interface (CLI) and AWS SDKs. The IAM role has attached permissions for AWS resources such as Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and Amazon Relational Database Service (Amazon RDS).
The temporary credentials expire when the session expires. By default, the session duration is one hour; you can request longer duration and session refresh.
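As a generic illustration of consuming such temporary credentials with the AWS SDK for Python (the role ARN and session name are placeholders; a SAML flow would call assume_role_with_saml instead, and IAM Roles Anywhere issues credentials through its credential helper):

import boto3

sts = boto3.client("sts")

# Exchange the caller's identity for short-lived credentials scoped to an application role.
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/OnPremAppRole",  # placeholder
    RoleSessionName="onprem-inventory-app",
    DurationSeconds=3600,  # default session length; can be raised up to the role's maximum
)
credentials = response["Credentials"]

# Build a session from the temporary credentials; it stops working when they expire.
session = boto3.Session(
    aws_access_key_id=credentials["AccessKeyId"],
    aws_secret_access_key=credentials["SecretAccessKey"],
    aws_session_token=credentials["SessionToken"],
)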
To understand better, let’s consider the use case in Figure 2, where on-premises applications need access to AWS resources.
Figure 2. Access to resources that are created or already migrated to AWS from on-premises
Applications can get temporary security credentials through SAML or IAM Roles Anywhere as explained earlier. The next sections explain setting up access to the resources in Figure 2 using temporary credentials.
1. Amazon S3
On-premises applications can access Amazon S3 using the REST API or the AWS SDK to perform actions such as GetObject or ListObjects.
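Continuing the earlier sketch (the bucket name and prefix are placeholders), an on-premises application could read objects with the temporary-credential session like this:

import boto3

# Use the temporary-credential session from the earlier sketch, or a default session.
session = boto3.Session()
s3 = session.client("s3")

# List and read objects with the permissions attached to the assumed role.
listing = s3.list_objects_v2(Bucket="example-inventory-bucket", Prefix="exports/")
for obj in listing.get("Contents", []):
    body = s3.get_object(Bucket="example-inventory-bucket", Key=obj["Key"])["Body"].read()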
You can also simplify by creating Amazon S3 access points for your application to perform object-level operations in your S3 bucket. Each access point has distinct permissions and network controls that S3 applies for any request made through it.
2. Amazon RDS and Amazon Aurora
AWS Secrets Manager helps you store credentials for Amazon RDS and Amazon Aurora. You can also set up automatic rotation of your database secrets to meet your security and compliance needs. Applications can retrieve secrets using AWS SDKs and AWS CLI.
Additional configuration values can be stored in AWS Systems Manager Parameter Store, which provides secure, hierarchical storage for configuration data such as passwords, database strings, and license codes as parameter values, rather than hard-coding them in application code.
To access Amazon RDS and Amazon Aurora:
You can launch Amazon RDS DB instances into a virtual private cloud (VPC). A client application can access the DB instance through the internet, or through the private network only, using an established connection from on-premises to the AWS environment.
On-premises applications can connect to a relational database using a database driver such as Java Database Connectivity (JDBC). The application can retrieve database connection details (such as database URL, port, or credentials) from AWS Secrets Manager and AWS Systems Manager Parameter Store through API calls and can use them for the database connection.
Database admins can access AWS Management Console through an assumed role and can have access to database credentials from AWS Secrets Manager in order to connect directly with the database. For certain administration tasks (such as cluster setup, backup, recovery, maintenance, and management), they will need access to the Amazon RDS management console.
Amazon RDS also provides an IAM database authentication option for MariaDB, MySQL, and PostgreSQL. You can authenticate without a password when you connect to a DB instance; instead, you use an authentication token, as shown in the sketch following this list. For more information, go to IAM database authentication.
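The following sketch ties these pieces together; the secret name, parameter name, Region, and port are placeholders. It pulls connection details from AWS Secrets Manager and Parameter Store, and shows how an IAM authentication token could replace a password where IAM database authentication is enabled:

import json

import boto3

secrets = boto3.client("secretsmanager")
ssm = boto3.client("ssm")
rds = boto3.client("rds")

# Database credentials stored and rotated in Secrets Manager (name is a placeholder).
secret = json.loads(secrets.get_secret_value(SecretId="prod/inventory/aurora")["SecretString"])

# Non-secret configuration such as the endpoint kept in Parameter Store.
db_endpoint = ssm.get_parameter(Name="/prod/inventory/db-endpoint", WithDecryption=True)["Parameter"]["Value"]

# Option 1: connect with the retrieved username/password through your driver of choice
# (for example, a JDBC or PostgreSQL driver), using secret["username"] and secret["password"].

# Option 2: where IAM database authentication is enabled, request a short-lived token
# and present it as the password over an SSL connection.
token = rds.generate_db_auth_token(
    DBHostname=db_endpoint,
    Port=5432,                    # placeholder port
    DBUsername=secret["username"],
    Region="us-east-1",           # placeholder Region
)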
This blog post helps you architect an application security model in AWS that provides on-premises applications with access to commonly used resources in AWS.
You can apply security best practices and Zero Trust concepts as you move your legacy applications to the cloud. With AWS, you can build identity and access management with centralized and fine-grained access controls for your workforce and applications.
“The world is asynchronous” is what Werner Vogels, Amazon CTO, reminded us during his keynote last week at AWS re:Invent. At the beginning of the keynote, he showed us how weird a synchronous world would be and how everything in nature is asynchronous. One example of an event-driven application he showcased during his keynote is Serverlesspresso, a project my team has been working on for the last year. And last week, we announced Serverlesspresso extensions, a new program that lets you contribute to Serverlesspresso and learn how event-driven applications can be extended.
Last Week’s Launches Here are some launches that got my attention during the previous week.
Amazon RDS Proxy now supports creating proxies in Amazon Aurora Global Database primary and secondary Regions. Now, building multi-Region applications with Amazon Aurora is simpler. RDS Proxy sits between your application and the database to pool and share established database connections.
Other AWS News Some other updates and news that you may have missed:
I would like to recommend this really interesting Amazon Science article about federated learning. This is a framework that allows edge devices to work together to train a global model while keeping customers’ data on-device.
Podcast Charlas Técnicas de AWS – If you understand Spanish, this podcast is for you. Podcast Charlas Técnicas is one of the official AWS podcasts in Spanish, and every other week there is a new episode. Today the final episode for season three launched, and in it, we discussed many of the re:Invent launches. You can listen to all the episodes directly from your favorite podcast app or at AWS Podcasts en español.
AWS open-source news and updates–This is a newsletter curated by my colleague Ricardo to bring you the latest open-source projects, posts, events, and more.
Upcoming AWS Events Check your calendars and sign up for these AWS events:
AWS Resiliency Hub Activation Day is a half-day technical virtual session to deep dive into the features and functionality of Resiliency Hub. You can register for free here.
AWS re:Invent recaps in your area. During the re:Invent week, we had lots of new announcements, and in the next weeks you can find in your area a recap of all these launches. All the events will be posted on this site, so check it regularly to find an event nearby.
AWS re:Invent keynotes, leadership sessions, and breakout sessions are available on demand. I recommend that you check the playlists and find the talks about your favorite topics in one collection.
That’s all for this week. Check back next Monday for another Week in Review!
Written in collaboration with Ben Moses, AWS Senior Solutions Architect, and Michael Holtby, AWS Senior Manager Solutions Architecture
Designing an architecture is not a simple task. There are many dimensions and characteristics of a solution to consider, such as the availability, performance, or resilience.
In this Let’s Architect!, we explore cost optimization and ideas on how to rethink your AWS workloads, providing suggestions that span from compute to data transfer.
AWS Graviton processors are custom silicon from Amazon’s Annapurna Labs. Based on the Arm processor architecture, they are optimized for performance and cost, which allows customers to get up to 34% better price performance.
This AWS Compute Blog post discusses some of the differences between the x86 and Arm architectures, as well as methods for developing Lambda functions on Graviton2, including performance benchmarking.
Many serverless workloads can benefit from Graviton2, especially when they are not using a library that requires an x86 architecture to run.
Amazon Relational Database Service (Amazon RDS) and Amazon Aurora support a multitude of instance types to scale database workloads based on needs. Both services now support Arm-based AWS Graviton2 instances, which provide up to 52% price/performance improvement for Amazon RDS open-source databases, depending on database engine, version, and workload. They also provide up to 35% price/performance improvement for Amazon Aurora, depending on database size.
This AWS Database Blog post showcases strategies for updating RDS DB instances to make use of Graviton2 with minimal changes.
Data transfer charges are often overlooked while architecting an AWS solution. Considering data transfer charges while making architectural decisions can save costs. This AWS Architecture Blog post describes the different flows of traffic within a typical cloud architecture, showing where costs do and do not apply. For areas where cost applies, it shows best-practice strategies to minimize these expenses while retaining a healthy security posture.
This Architecture Blog post is a collection of best practices for cost management in AWS, including the relevant tools; plus, it is part of a series on cost optimization using an e-commerce example.
AWS Cost Explorer is used to first identify opportunities for optimizations, including data transfer, storage in Amazon Simple Storage Service and Amazon Elastic Block Store, idle resources, and the use of Graviton2 (Amazon’s Arm-based custom silicon). The post discusses establishing a FinOps culture and making use of Service Control Policies (SCPs) to control ongoing costs and guide deployment decisions, such as instance-type selection.
Applying SCPs on different environments for cost control
See you next time!
Thanks for joining us to discuss optimizing costs while architecting! This is the last Let’s Architect! post of 2022. We will see you again in 2023, when we explore even more architecture topics together.
Wishing you a happy holiday season and joyous new year!
PostgreSQL has become the preferred open-source relational database for many enterprises and start-ups with its extensible design for developers. One of the reasons developers use PostgreSQL is it allows them to add database functionality by building extensions with their preferred programming languages.
Today, we are announcing the general availability of Trusted Language Extensions for PostgreSQL (pg_tle), a new open-source development kit for building PostgreSQL extensions. With Trusted Language Extensions for PostgreSQL, developers can build high-performance extensions that run safely on PostgreSQL.
Trusted Language Extensions for PostgreSQL provides database administrators control over who can install extensions and a permissions model for running them, letting application developers deliver new functionality as soon as they determine an extension meets their needs.
To start building with Trusted Language Extensions, you can use trusted languages such as JavaScript, Perl, and PL/pgSQL. These trusted languages have safety attributes, including restricting direct access to the file system and preventing unwanted privilege escalations. You can easily install extensions written in a trusted language on Amazon Aurora PostgreSQL-Compatible Edition 14.5 and Amazon RDS for PostgreSQL 14.5 or a newer version.
Trusted Language Extensions for PostgreSQL is an open-source project licensed under Apache License 2.0 on GitHub. You can comment or suggest items on the Trusted Language Extensions for PostgreSQL roadmap and help us support this project across multiple programming languages, and more. Doing this as a community will help us make it easier for developers to use the best parts of PostgreSQL to build extensions.
Let’s explore how we can use Trusted Language Extensions for PostgreSQL to build a new PostgreSQL extension for Amazon Aurora and Amazon RDS.
Setting up Trusted Language Extensions for PostgreSQL To use pg_tle with Amazon Aurora or Amazon RDS for PostgreSQL, you need to set up a parameter group that loads pg_tle in the PostgreSQL shared_preload_libraries setting. Choose Parameter groups in the left navigation pane in the Amazon RDS console and Create parameter group to make a new parameter group.
Select postgres14 in Parameter group family for Amazon RDS for PostgreSQL, enter pgtle for Group name, and then choose Create. You can select aurora-postgresql14 instead for an Amazon Aurora PostgreSQL-Compatible cluster.
Choose the created pgtle parameter group and Edit in the Parameter group actions dropdown menu. You can search for shared_preload_libraries in the search box and choose Edit parameters. You can add your preferred values, including pg_tle, and choose Save changes.
You can also do the same job in the AWS Command Line Interface (AWS CLI).
Now, you can add the pgtle parameter group to your Amazon Aurora or Amazon RDS for PostgreSQL database. If you have a database instance called testing-pgtle, you can attach the pgtle parameter group to the database instance as shown below. Please note that this will cause an active instance to reboot.
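Here is a sketch of that step with the AWS SDK for Python (the AWS CLI modify-db-instance command is equivalent), using the instance and parameter group names from this walkthrough:

import boto3

rds = boto3.client("rds")

# Attach the pgtle parameter group; the static shared_preload_libraries
# change requires a reboot to take effect.
rds.modify_db_instance(
    DBInstanceIdentifier="testing-pgtle",
    DBParameterGroupName="pgtle",
    ApplyImmediately=True,
)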
Verify that the pg_tle library is available on your Amazon Aurora or Amazon RDS for PostgreSQL instance. Run the following command on your PostgreSQL instance:
SHOW shared_preload_libraries;
pg_tle should appear in the output.
Now, create the pg_tle extension in your current database by running the following command:
CREATE EXTENSION pg_tle;
You can now create and install Trusted Language Extensions for PostgreSQL in your current database. If you create a new extension, you should grant the pgtle_admin role to your primary user (e.g., postgres) with the following command:
GRANT pgtle_admin TO postgres;
Let’s now see how to create our first pg_tle extension!
Building a Trusted Language Extension for PostgreSQL For this example, we are going to build a pg_tle extension to validate that a user is not setting a password that’s found in a common password dictionary. Many teams have rules around the complexity of passwords, particularly for database users. PostgreSQL allows developers to help enforce password complexity using the check_password_hook.
In this example, you will build a password check hook using PL/pgSQL. In the hook, you can check to see if the user-supplied password is in a dictionary of 10 of the most common password values:
SELECT pgtle.install_extension (
'my_password_check_rules',
'1.0',
'Do not let users use the 10 most commonly used passwords',
$_pgtle_$
CREATE SCHEMA password_check;
REVOKE ALL ON SCHEMA password_check FROM PUBLIC;
GRANT USAGE ON SCHEMA password_check TO PUBLIC;
CREATE TABLE password_check.bad_passwords (plaintext) AS
VALUES
('123456'),
('password'),
('12345678'),
('qwerty'),
('123456789'),
('12345'),
('1234'),
('111111'),
('1234567'),
('dragon');
CREATE UNIQUE INDEX ON password_check.bad_passwords (plaintext);
CREATE FUNCTION password_check.passcheck_hook(username text, password text, password_type pgtle.password_types, valid_until timestamptz, valid_null boolean)
RETURNS void AS $$
DECLARE
invalid bool := false;
BEGIN
IF password_type = 'PASSWORD_TYPE_MD5' THEN
SELECT EXISTS(
SELECT 1
FROM password_check.bad_passwords bp
WHERE ('md5' || md5(bp.plaintext || username)) = password
) INTO invalid;
IF invalid THEN
RAISE EXCEPTION 'password must not be found on a common password dictionary';
END IF;
ELSIF password_type = 'PASSWORD_TYPE_PLAINTEXT' THEN
SELECT EXISTS(
SELECT 1
FROM password_check.bad_passwords bp
WHERE bp.plaintext = password
) INTO invalid;
IF invalid THEN
RAISE EXCEPTION 'password must not be found on a common password dictionary';
END IF;
END IF;
END
$$ LANGUAGE plpgsql SECURITY DEFINER;
GRANT EXECUTE ON FUNCTION password_check.passcheck_hook TO PUBLIC;
SELECT pgtle.register_feature('password_check.passcheck_hook', 'passcheck');
$_pgtle_$
);
You need to enable the hook through the pgtle.enable_password_check configuration parameter. On Amazon Aurora and Amazon RDS for PostgreSQL, you can do so by updating the parameter in the pgtle parameter group, as shown in the sketch below.
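For example, with the AWS SDK for Python (the equivalent AWS CLI command is modify-db-parameter-group), assuming the pgtle parameter group created earlier:

import boto3

rds = boto3.client("rds")

# ApplyMethod "immediate" applies the change without waiting for a maintenance window.
rds.modify_db_parameter_group(
    DBParameterGroupName="pgtle",
    Parameters=[
        {
            "ParameterName": "pgtle.enable_password_check",
            "ParameterValue": "on",
            "ApplyMethod": "immediate",
        }
    ],
)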
It may take several minutes for these changes to propagate. You can check that the value is set using the SHOW command:
SHOW pgtle.enable_password_check;
If the value is on, you will see the following output:
pgtle.enable_password_check
-----------------------------
on
Now you can create this extension in your current database and try setting your password to one of the dictionary passwords and observe how the hook rejects it:
CREATE EXTENSION my_password_check_rules;
CREATE ROLE test_role PASSWORD '123456';
ERROR: password must not be found on a common password dictionary
CREATE ROLE test_role;
SET SESSION AUTHORIZATION test_role;
SET password_encryption TO 'md5';
\password
-- set to "password"
ERROR: password must not be found on a common password dictionary
To disable the hook, set the value of pgtle.enable_password_check to off, using the same parameter group update as before.
You can uninstall this pg_tle extension from your database and prevent anyone else from running CREATE EXTENSION on my_password_check_rules with the following command:
DROP EXTENSION my_password_check_rules;
SELECT pgtle.uninstall_extension('my_password_check_rules');
You can find more sample extensions and give them a try. To build and test your Trusted Language Extensions in your local PostgreSQL database, you can build from our source code after cloning the repository.
Join Our Community! The Trusted Language Extensions for PostgreSQL community is open to everyone. Give it a try, and give us feedback on what you would like to see in future releases. We welcome any contributions, such as new features, example extensions, additional documentation, or any bug reports in GitHub.
When updating databases, using a blue/green deployment technique is an appealing option for users to minimize risk and downtime. This method of making database updates requires two database environments—your current production environment, or blue environment, and a staging environment, or green environment. You must then keep these two environments in sync with each other so you may safely test and upgrade your changes to production.
Amazon Aurora and Amazon Relational Database Service (Amazon RDS) customers can use database cloning and promotable read replicas to help self-manage a blue/green deployment. However, self-managing a blue/green deployment can be costly and complex to build and manage. As a result, customers sometimes delay implementing database updates, choosing availability over the benefits that they would gain from updating their databases.
Today, we are announcing the general availability of Amazon RDS Blue/Green Deployments, a new feature for Amazon Aurora with MySQL compatibility, Amazon RDS for MySQL, and Amazon RDS for MariaDB that enables you to make database updates safer, simpler, and faster.
With just a few steps, you can use Blue/Green Deployments to create a separate, synchronized, fully managed staging environment that mirrors the production environment. The staging environment clones your production environment’s primary database and in-Region read replicas. Blue/Green Deployments keep these two environments in sync using logical replication.
In as fast as a minute, you can promote the staging environment to be the new production environment with no data loss. During switchover, Blue/Green Deployments blocks writes on blue and green environments so that the green catches up with the blue, ensuring no data loss. Then, Blue/Green Deployments redirects production traffic to the newly promoted staging environment, all without any code changes to your application.
With Blue/Green Deployments, you can make changes, such as major and minor version upgrades, schema modifications, and operating system or maintenance updates, to the staging environment without impacting the production workload.
Getting Started with Blue/Green Deployments for MySQL Clusters You can start updating your databases with just a few clicks in the AWS Management Console. To get started, simply select the database that needs to be updated in the console and click Create Blue/Green Deployment under the Actions dropdown menu.
You can set a Blue/Green Deployment identifier and the attributes of your database to be modified, such as the engine version, DB cluster parameter group, and DB parameter group for green databases. To use a Blue/Green Deployment in your Aurora MySQL DB cluster, you should turn on binary logging, changing the value for the binlog_format parameter from OFF to MIXED in the DB cluster parameter group.
When you choose Create Blue/Green Deployment, it creates a new staging environment and runs automated tasks to prepare the database for production. Note, you will be charged the cost of the green database, including read replicas and DB instances in Multi-AZ deployments, and any other features such as Amazon RDS Performance Insights that you may have enabled on green.
You can also do the same job in the AWS Command Line Interface (AWS CLI). To perform an engine version upgrade, simply add a targetEngineVersion parameter and specify the engine version you’d like to upgrade to. This parameter works with both minor and major version upgrades, and it accepts short versions like 5.7 for Amazon Aurora MySQL-Compatible.
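As a sketch with the AWS SDK for Python (the deployment name, source ARN, and target version below are placeholders for your own values):

import boto3

rds = boto3.client("rds")

# Create a Blue/Green Deployment that stages an engine version upgrade.
rds.create_blue_green_deployment(
    BlueGreenDeploymentName="prod-mysql-upgrade",               # placeholder name
    Source="arn:aws:rds:us-east-1:123456789012:db:prod-mysql",  # placeholder source ARN
    TargetEngineVersion="8.0",                                  # version for the green environment
)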
After creation is complete, you now have a staging environment that is ready for test and validation before promoting it to be the new production environment.
When testing and qualification of changes are complete, you can choose Switch over in the Actions dropdown menu to promote the staging environment marked as Green to be the new production system.
Now you are nearly ready to switch over your green databases to production. Check the settings of your green databases to verify that they are ready for the switchover. You may also set a timeout setting to determine the maximum time limit for your switchover. If Blue/Green Deployments’ switchover guardrails detect that it would take longer than the specified duration, then the switchover is canceled, and no changes are made to the environments. We recommend that you identify times of low or moderate production traffic to initiate a switchover.
After switchover, Blue/Green Deployments does not delete your old production environment. You may access it for additional validations and performance/regression testing, if needed. Please note that it is your responsibility to delete the old production environment when you no longer need it. Standard billing charges apply on old production instances until you delete them.
Now Available Amazon RDS Blue/Green Deployments is available today on Amazon Aurora with MySQL Compatibility 5.6 or higher, Amazon RDS for MySQL major version 5.6 or higher, and Amazon RDS for MariaDB 10.2 and higher in all AWS commercial Regions, excluding China, and AWS GovCloud Regions.
In the next few years, companies will build over 500 million new applications, more than has been developed in the previous 40 years combined (see IDC article). API operations enable innovation. They are the “front door” to applications and microservices, and an integral layer in the application stack. In recent years, GraphQL has emerged as a modern API approach. With GraphQL, companies can improve the performance of their applications and the speed in which development teams can build applications. In this post, we will discuss how GraphQL works and how integrating it with AWS services can help you build modern applications. We will explore the options for running GraphQL on AWS.
How GraphQL works
Imagine you have an API frontend implemented with GraphQL for your ecommerce application. As shown in Figure 1, there are different services in your ecommerce system backend that are accessible via different technologies. For example, user profile data is stored in a highly scalable NoSQL table. Orders are accessed through a REST API. The current inventory stock is checked through an AWS Lambda function. And the pricing information is in an SQL database.
Figure 1. How GraphQL works
Without using GraphQL, client applications must make multiple separate calls to each one of these services. Because each service is exposed through a different API endpoint, the complexity of accessing data from the client side increases significantly. To get the data, you have to make multiple calls. In some cases, you might over-fetch data, because a data source sends an entire payload that includes data you don't need. In other circumstances, you might under-fetch data, because a single data source doesn't have all the data you require.
A GraphQL API combines the data from all these different services into a single payload that the client defines based on its needs. For example, a smartphone has a smaller screen than a desktop application and might require less data. The data is retrieved from multiple data sources automatically; the client just sees a single constructed payload. This payload might include user profile data from Amazon DynamoDB, order details from Amazon API Gateway, or specific fields with inventory availability and price data from AWS Lambda and Amazon Aurora.
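To make the single-payload idea concrete, here is a small, hypothetical example (the endpoint, schema fields, and token are made up for illustration) of a client sending one GraphQL query whose fields map to those different backends:

import requests

query = """
query StorefrontView($userId: ID!) {
  user(id: $userId) { name loyaltyTier }          # e.g., resolved from DynamoDB
  orders(userId: $userId, last: 3) { id status }  # e.g., resolved from a REST API
  product(sku: "ABC-123") {
    price      # e.g., resolved from an SQL database
    inStock    # e.g., resolved from a Lambda function
  }
}
"""

response = requests.post(
    "https://api.example.com/graphql",               # placeholder endpoint
    json={"query": query, "variables": {"userId": "42"}},
    headers={"Authorization": "Bearer <token>"},     # placeholder auth
    timeout=10,
)
data = response.json()["data"]  # one payload shaped exactly as the client asked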
When modernizing frontend APIs with GraphQL, you can build applications faster because your frontend developers don’t need to wait for backend service teams to create new APIs for integration. GraphQL simplifies data access by interacting with data from multiple data sources using a single API. This reduces the number of API requests and network traffic, which results in improved application performance. Furthermore, GraphQL subscriptions enable two-way communication between the backend and client. It supports publishing updates to data in real time to subscribed clients. You can create engaging applications in real time with use cases such as updating sports scores, bidding statuses, and more.
Options for running GraphQL on AWS
There are two main options for running a GraphQL implementation on AWS: fully managed using AWS AppSync, and self-managed GraphQL.
I. Fully managed using AWS AppSync
The most straightforward way to run GraphQL is by using AWS AppSync, a fully managed service. AWS AppSync handles the heavy lifting of securely connecting to data sources, such as Amazon DynamoDB, so you can focus on developing GraphQL APIs. You can write business logic against these data sources by choosing code templates that implement common GraphQL API patterns. Your APIs can also interact with other AWS AppSync functionality, such as caching to improve performance, subscriptions to support real-time updates, and client-side data stores to keep offline devices in sync. AWS AppSync scales automatically to support varied API request loads. You can find more details on the AWS AppSync features page.
Figure 2. AWS AppSync in an ecommerce system implementation
Let’s take a closer look at this GraphQL implementation with AWS AppSync in an ecommerce system. In Figure 2, a schema is created to define the types and capabilities of the desired GraphQL API. You can tie the schema to a Resolver function. The schema can either be created to mirror existing data sources, or AWS AppSync can create tables automatically based on the schema definition. You can also use GraphQL features for data discovery without viewing the backend data sources.
After a schema definition is established, an AWS AppSync client can be configured with an operation request, such as a query operation. The client submits the operation request to GraphQL Proxy along with an identity context and credentials. The GraphQL Proxy passes this request to the Resolver, which maps and initiates the request payload against pre-configured AWS data services. These can be an Amazon DynamoDB table for user profile, an AWS Lambda function for inventory service, and more. The Resolver initiates calls to one or all of these services within a single API call. This minimizes CPU cycles and network bandwidth needs. The Resolver then returns the response to the client. Additionally, the client application can change data requirements in code on demand. The AWS AppSync GraphQL API will dynamically map requests for data accordingly, enabling faster prototyping and development.
II. Self-Managed GraphQL
If you want the flexibility of selecting a particular open-source project, you may choose to run your own GraphQL API layer. Apollo, graphql-ruby, Juniper, gqlgen, and Lacinia are some popular GraphQL implementations. You can leverage AWS Lambda or container services such as Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS) to run GraphQL open-source implementations. This gives you the ability to fine-tune the operational characteristics of your API.
When running a GraphQL API layer on AWS Lambda, you can take advantage of the serverless benefits of automatic scaling, paying only for what you use, and not having to manage your servers. You can create a private GraphQL API using Amazon ECS, EKS, or AWS Lambda, which can only be accessed from your Amazon Virtual Private Cloud (VPC). With Apollo GraphQL open-source implementation, you can create a Federated GraphQL that allows you to combine GraphQL APIs from multiple microservices into a single API, illustrated in Figure 3. The Apollo GraphQL Federation with AWS AppSync post shows a concrete example of how to integrate an AWS AppSync API with an Apollo Federation gateway. It uses specification-compliant queries and directives.
Figure 3. Apollo GraphQL implementation on AWS Lambda
When choosing a self-managed GraphQL implementation, you have to spend time writing non-business logic code to connect data sources. You must implement authorization and authentication and integrate other common functionality, such as caching to improve performance, subscriptions to support real-time updates, and client-side data stores to keep offline devices in sync. Because of these responsibilities, you have less time to focus on the business logic of your application.
Similarly, backend development teams and API operators of an open-source GraphQL implementation must provision and maintain their own GraphQL servers. Remember that even with a serverless model, API developers and operators are still responsible for monitoring, performance tuning, and troubleshooting the API platform service.
Conclusion
Modernizing APIs with GraphQL gives your frontend application the ability to fetch just the data that’s needed from multiple data sources with an API call. You can build modern mobile and web applications faster, because GraphQL simplifies API management. You have flexibility to run an open-source GraphQL implementation most closely aligned with your needs on AWS Lambda, Amazon ECS, and Amazon EKS. With AWS AppSync, you can set up GraphQL quickly and increase your development velocity by reducing the amount of non-business API logic code.
Last week, we announced plans to launch the AWS Asia Pacific (Bangkok) Region, which will become our third AWS Region in Southeast Asia. This Region will have three Availability Zones and will give AWS customers in Thailand the ability to run workloads and store data that must remain in-country.
In the Works – AWS Region in Thailand With this big news, AWS announced a 190 billion baht (US $5 billion) investment to drive Thailand’s digital future over the next 15 years. It includes capital expenditures on the construction of data centers, operational expenses related to ongoing utilities and facility costs, and the purchase of goods and services from Regional businesses.
Since we first opened an office in Bangkok in 2015, AWS has launched 10 edge locations in Bangkok for Amazon CloudFront, a highly secure and programmable content delivery network (CDN). In 2020, we launched AWS Outposts in Thailand, a family of fully managed solutions delivering AWS infrastructure and services to virtually any on-premises or edge location for a truly consistent hybrid experience. This year, we also plan the upcoming launch of an AWS Local Zone in Bangkok, which will enable customers to deliver applications that require single-digit millisecond latency to end users in Thailand.
Photo courtesy of Conor McNamara, Managing Director, ASEAN at AWS
The new AWS Region in Thailand is also part of our broader, multifaceted investment in the country, covering our local team, partners, skills, and the localization of services, including Amazon Transcribe, Amazon Translate, and Amazon Connect.
Many Thailand customers have chosen AWS to run their workloads to accelerate innovation, increase agility, and drive cost savings, such as 2C2P, CP All Plc., Digital Economy Promotion Agency, Energy Response Co. Ltd. (ENRES), PTT Global Public Company Limited (PTT), Siam Cement Group (SCG), Sukhothai Thammathirat Open University, The Stock Exchange of Thailand, Papyrus Studio, and more.
For example, Dr. Werner Vogels, CTO of Amazon.com, introduced the story of Papyrus Studio, a large film studio and one of the first customers in Thailand.
“Customer stories like Papyrus Studio inspire us at AWS. The cloud can allow a small company to rapidly scale and compete globally. It also provides new opportunities to create, innovate, and identify business opportunities that just aren’t possible with conventional infrastructure.”
For more information on how to enable AWS and get support in Thailand, contact our AWS Thailand team.
Last Week’s Launches My favorite news of last week was to launch dark mode as a beta feature in the AWS Management Console. In Unified Settings, you can choose between three settings for visual mode: Browser default, Light, and Dark. Browser default applies the default dark or light setting of the browser, dark applies the new built-in dark mode, and light maintains the current look and feel of the AWS console. Choose your favorite!
Here are some launches that caught my eye for web, mobile, and IoT application developers:
New AWS Amplify Library for Swift – We announce the general availability of Amplify Library for Swift (previously Amplify iOS). Developers can use Amplify Library for Swift via the Swift Package Manager to build apps for iOS and macOS (currently in beta) platforms with Auth, Storage, Geo, and more features.
New Amazon IVS Chat SDKs – Amazon Interactive Video Service (Amazon IVS) now provides SDKs for stream chat with support for web, Android, and iOS. The Amazon IVS stream chat SDKs support common functions for chat room resource management, sending and receiving messages, and managing chat room participants.
Amazon IVS is a managed, live-video streaming service that works with the broadcast SDKs or standard streaming software such as Open Broadcaster Software (OBS). The service provides cross-platform player SDKs for playback of Amazon IVS streams, which you need to make low-latency live video available to any viewer around the world. It also offers a Chat Client Messaging SDK. For more information, see Getting Started with Amazon IVS Chat in the AWS documentation.
New AWS Parameters and Secrets Lambda Extension – This is a new extension for AWS Lambda developers to retrieve parameters from AWS Systems Manager Parameter Store and secrets from AWS Secrets Manager. Lambda function developers can use this extension to improve their application performance, because it decreases the latency and the cost of retrieving parameters and secrets.
New FreeRTOS Long Term Support Version – We announce the second release of FreeRTOS Long Term Support (LTS) – FreeRTOS 202210.00 LTS. FreeRTOS LTS offers a more stable foundation than standard releases as manufacturers deploy and later update devices in the field. This release includes new and upgraded libraries such as AWS IoT Fleet Provisioning, Cellular LTE-M Interface, coreMQTT, and FreeRTOS-Plus-TCP.
All libraries included in this FreeRTOS LTS version will receive security and critical bug fixes until October 2024. With an LTS release, you can continue to maintain your existing FreeRTOS code base and avoid any potential disruptions resulting from FreeRTOS version upgrades. To learn more, see the FreeRTOS announcement.
Here is some news on performance improvements and increased capacity:
Up to 10x Faster Amazon Aurora Snapshot Exports – Amazon Aurora MySQL-Compatible Edition for MySQL 5.7 and 8.0 now exports snapshots to Amazon S3 up to 10x faster. The performance improvement is automatically applied to all types of database snapshot exports, including manual snapshots, automated system snapshots, and snapshots created by AWS Backup. For more information, see Exporting DB cluster snapshot data to Amazon S3 in the Amazon Aurora documentation.
3x More Amazon RDS Read Capacity – Amazon Relational Database Service (RDS) for MySQL, MariaDB, and PostgreSQL now supports 15 read replicas per instance, including up to 5 cross-Region read replicas, delivering up to 3x the previous read capacity. For more information, see Working with read replicas in the Amazon RDS documentation.
2x More AWS Snowball Edge Compute Capacity – The AWS Snowball Edge Compute Optimized device now offers double the compute capacity (up to 104 vCPUs), double the memory capacity (up to 416 GB of RAM), and is fully SSD-based with 28 TB of NVMe storage. The updated device is ideal when you need dense compute resources to run complex workloads, such as machine learning inference or video analytics, at the rugged, mobile edge, such as in trucks, aircraft, or ships. You can get started by ordering a Snowball Edge device on the AWS Snow Family console.
2x Higher Amazon SQS FIFO Default Quota – Amazon Simple Queue Service (SQS) increased the default quota to 6,000 transactions per second per API action, double the previous 3,000 TPS quota for high throughput mode for FIFO (first in, first out) queues, in all AWS Regions where Amazon SQS FIFO queues are available. For a detailed breakdown of default throughput quotas per Region, see Quotas related to messages in the Amazon SQS documentation. A short example of writing to a high throughput FIFO queue follows.
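To benefit from the higher quota, high throughput mode needs to be enabled on the queue, and messages should be spread across many message group IDs. The sketch below uses boto3; the queue URL and payloads are hypothetical.

import json
import uuid

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo"  # hypothetical

def publish(order):
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(order),
        # Ordering is preserved per message group; using many group IDs
        # (for example, one per customer) is what lets throughput scale.
        MessageGroupId=str(order["customer_id"]),
        # Explicit deduplication ID; alternatively, enable content-based deduplication.
        MessageDeduplicationId=str(uuid.uuid4()),
    )

for i in range(5):
    publish({"customer_id": i % 3, "order_id": i, "total": 10 * i})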
Sustainability with AWS Partners (ft. AWS On Air) – This episode covers a broad range of environmental, social, and governance (ESG) topics across all Regions, organization types, and industries. AWS Sustainability & Climate Tech provides a comprehensive portfolio of AWS Partner solutions built on AWS that address climate change events and the United Nations’ Sustainable Development Goals (SDGs).
Upcoming AWS Events Check your calendars and sign up for these AWS events:
AWS re:Invent 2022 Attendee Guide – Browse re:Invent 2022 attendee guides, curated by AWS Heroes, AWS industry teams, and AWS Partners. Each guide contains recommended sessions, tips and tricks for building your agenda, and other useful resources. Also, seat reservations for all sessions are now open for all re:Invent attendees. You can still register for AWS re:Invent either offline or online.
AWS AI/ML Innovation Day on October 25 – Join us for this year’s AWS AI/ML Innovation Day, where you’ll hear from Bratin Saha and other leaders in the field about the great strides AI/ML has made in the past and the promises awaiting us in the future.
AWS Container Day at KubeCon 2022 on October 25–28 – Come join us at KubeCon + CloudNativeCon North America 2022, where we’ll be hosting AWS Container Day Featuring Kubernetes on October 25 and educational sessions at our booth on October 26–28. Throughout the event, our sessions focus on security, cost optimization, GitOps/multi-cluster management, hybrid and edge compute, and more.
I had an amazing start to the week last week as I was speaking at the AWS Community Day NL. This event had 500 attendees and over 70 speakers, and Dr. Werner Vogels, Amazon CTO, delivered the keynote. AWS Community Days are community-led conferences organized by local communities, with a variety of workshops and sessions. I recommend checking your region for any of these events.
Last Week’s Launches Here are some launches that got my attention during the previous week.
Amazon S3 Object Lambda now supports using your own code to change the results of HEAD and LIST requests, in addition to GET (which we launched last year). This feature expands what you can do with S3 Object Lambda. Danilo made a Twitter thread with lots of use cases for this new launch.
Amazon SageMaker Clarify now provides near real-time explanations for ML predictions. SageMaker Clarify is a service that provides explanations for the individual predictions made by ML models. These explanations are important for developers to get visibility into their training data and models and to identify potential bias.
AWS Storage Gateway now supports 15 TiB tapes. The maximum supported virtual tape size on Tape Gateway increased from 5 TiB to 15 TiB, so you can store more data on a single virtual tape and reduce the number of tapes you need to manage.
AWS Config now supports 15 new resource types, including AWS DataSync, Amazon GuardDuty, Amazon Simple Email Service (Amazon SES), AWS AppSync, AWS Cloud Map, Amazon EC2, and AWS AppConfig. With this launch, you can use AWS Config to monitor configuration data for the supported resource types in your AWS account and see how their configuration changes over time; a short query example follows.
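Once the new resource types are being recorded, they can also be queried with AWS Config advanced queries. The sketch below uses boto3; the specific resource type shown (AWS::AppSync::GraphQLApi) is only an illustrative assumption.

import boto3

config = boto3.client("config")

# Advanced query: list resources of one of the newly supported types.
response = config.select_resource_config(
    Expression=(
        "SELECT resourceId, resourceName, awsRegion "
        "WHERE resourceType = 'AWS::AppSync::GraphQLApi'"
    )
)
for row in response["Results"]:  # each result is returned as a JSON string
    print(row)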
Other AWS News Some other updates and news that you may have missed:
This week an article about how AWS is leading a pilot project to turn the Greek island of Naxos into a smart island caught my attention. The project introduces smart solutions for mobility, primary healthcare, and the transport of goods. The solution has been built based on four pillars that were important for the island: sustainability, telehealth, leisure, and digital skills. Check out the whole article to learn what they are doing.
Podcast Charlas Técnicas de AWS – If you understand Spanish, this podcast is for you. Podcast Charlas Técnicas is one of the official AWS podcasts in Spanish, and every other week there is a new episode. The podcast is meant for builders, and it shares stories about how customers implemented and learned AWS services, how to architect applications, and how to use new services. You can listen to all the episodes directly from your favorite podcast app or at AWS Podcasts en español.
AWS open-source news and updates – This is a newsletter curated by my colleague Ricardo to bring you the latest open-source projects, posts, events, and more.
Upcoming AWS Events Check your calendars and sign up for these AWS events:
AWS re:Invent reserved seating opens on October 11. If you are planning to attend, book a spot in advance for your favorite sessions. AWS re:Invent is our biggest conference of the year; it happens in Las Vegas from November 28 to December 2, and registration is open. Many writers of this blog have sessions at re:Invent, and you can search the event agenda using our names.
I started the post talking about AWS Community Days, and there is one in Warsaw, Poland, on October 14. If you are around Warsaw that week, you can first check out the AWS Pop-up Hub in Warsaw, which runs October 10–14, and then join the Community Day.
On October 20, there is a virtual event on modernizing .NET workloads with Windows containers on AWS. You can register for free.
That’s all for this week. Check back next Monday for another Week in Review!
Launchmetrics offers its Brand Performance Cloud tools and intelligence to help fashion, luxury, and beauty retail executives optimize their global strategy. Launchmetrics initially operated their whole infrastructure on premises; however, they wanted to scale their data ingestion while simultaneously providing improved and faster insights for their clients. These business needs led them to build their architecture in the AWS Cloud.
In this blog post, we explain how Launchmetrics uses Amazon Web Services (AWS) to crawl the web for online, social, and print media. Using the data gathered, Launchmetrics is able to provide prescriptive analytics and insights to their clients. As a result, clients can understand their brand’s momentum and interact with their audience, successfully launching their products.
Architecture overview
Launchmetrics’ platform architecture is represented in Figure 1 and composed of three tiers:
Crawl
Data Persistence
Processing
Figure 1. Launchmetrics backend architecture
The Crawl tier is composed of several Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances launched via Auto Scaling groups. Spot Instances take advantage of unused Amazon EC2 capacity at a discounted rate compared with On-Demand Instances, which are compute instances that are billed per-hour or -second with no long-term commitments. Launchmetrics heavily leverages Spot Instances. The Crawl tier is responsible for retrieving, processing, and storing data from several media sources (represented in Figure 1 with the number 1).
The Data Persistence tier consists of two components: Amazon Kinesis Data Streams and Amazon Simple Queue Service (Amazon SQS). Kinesis Data Streams stores data that the Crawl tier collects, while Amazon SQS stores the metadata of the whole process. In this context, metadata helps Launchmetrics gain insight into when the data is collected and if it has started processing. This is key information if a Spot Instance is interrupted, which we will dive deeper into later.
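As a rough illustration of this pattern, a crawl worker might write each collected document to Kinesis Data Streams and its processing metadata to Amazon SQS along the following lines; the stream name, queue URL, and record layout are hypothetical.

import json
import time

import boto3

kinesis = boto3.client("kinesis")
sqs = boto3.client("sqs")

STREAM_NAME = "crawl-documents"
METADATA_QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/crawl-metadata"

def store_document(source, doc_id, payload):
    # Raw data goes to the stream; the partition key keeps one source's documents together.
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(payload).encode("utf-8"),
        PartitionKey=source,
    )
    # Metadata (what was collected and when) goes to SQS so the Processing tier
    # can track progress and recover if a Spot Instance is interrupted.
    sqs.send_message(
        QueueUrl=METADATA_QUEUE_URL,
        MessageBody=json.dumps(
            {"source": source, "doc_id": doc_id, "collected_at": int(time.time()), "status": "collected"}
        ),
    )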
The third tier, Processing, also makes use of Spot Instances and is responsible for pulling data from the Data Persistence tier (represented in Figure 1 with the number 2). It then applies proprietary algorithms, both analytics and machine learning models, to create consumer insights. These insights are stored in a data layer (not depicted) that consists of an Amazon Aurora cluster and an Amazon OpenSearch Service cluster.
This separation of tiers gives Launchmetrics a decoupled architecture in which each component can scale independently, making the system more reliable. Both the Crawl and the Data Processing tiers use Spot Instances for up to 90% of their capacity.
Data processing using EC2 Spot Instances
When Launchmetrics decided to migrate their workloads to the AWS Cloud, Spot Instances were one of the main drivers. Because Spot Instances offer large discounts without commitment, Launchmetrics was able to track more than 1,200 brands, translating to more than 1 billion end users. Daily, this represents tracking upwards of 500,000 influencer profiles, 8 million documents, and around 70 million social media comments.
Aside from the cost savings with Spot Instances, Launchmetrics gained collateral benefits in terms of architecture design: building stateless, decoupled, elastic, and fault-tolerant applications. In turn, their stack architecture became more loosely coupled as well.
All Launchmetrics Auto Scaling groups have the following configuration:
Spot allocation strategy: cost-optimized
Capacity rebalance: true
Three availability zones
A diversified list of instance types
By using Auto Scaling groups, Launchmetrics is able to scale worker instances depending on how many items they have in the SQS queue, increasing instance efficiency. Data processing workloads like the ones on Launchmetrics’ platform are an exemplary use case for multiple instance types, such as M5, M5a, C5, and C5a. When adopting Spot Instances, Launchmetrics considered additional instance types to gain access to more spare capacity. As a result, Launchmetrics found that the workload’s performance improved, because they use instances with more resources at a lower cost. A sketch of such an Auto Scaling group follows.
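The sketch below creates an Auto Scaling group along these lines with boto3: Spot-heavy, capacity rebalance enabled, subnets in three Availability Zones, and a diversified instance list. The names, subnets, and percentages are placeholders, and the allocation strategy literal is an assumption; the post describes a cost-optimized strategy, while capacity-optimized is used here for illustration.

import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="data-processing-workers",
    MinSize=0,
    MaxSize=100,
    DesiredCapacity=10,
    CapacityRebalance=True,  # proactively replace Spot Instances at elevated risk of interruption
    VPCZoneIdentifier="subnet-aaa,subnet-bbb,subnet-ccc",  # subnets spread across three AZs
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "data-processing-worker",
                "Version": "$Latest",
            },
            # A diversified instance list gives access to more spare capacity pools.
            "Overrides": [
                {"InstanceType": t}
                for t in ["m5.xlarge", "m5a.xlarge", "c5.xlarge", "c5a.xlarge"]
            ],
        },
        "InstancesDistribution": {
            "OnDemandPercentageAboveBaseCapacity": 10,  # roughly 90% Spot, as described above
            "SpotAllocationStrategy": "capacity-optimized",
        },
    },
)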
Because the data processing workload is decoupled through SQS queues, processes can stop safely when an interruption arrives. As the Auto Scaling group launches a replacement Spot Instance, clients are not impacted and data is not lost. All processes go through a data checkpoint, from which a new Spot Instance resumes processing any pending data. Spot Instances have resulted in a reduction of up to 75% in related operational costs. A sketch of this checkpointing pattern follows.
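In this sketch, a worker polls the SQS metadata queue, stops cleanly when EC2 posts a Spot interruption notice, and deletes each message only after it has been processed, so unfinished work becomes visible again for a replacement instance. The queue URL and the process_item function are hypothetical.

import json
import urllib.error
import urllib.request

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/crawl-metadata"
IMDS = "http://169.254.169.254"

def spot_interruption_pending():
    """True if EC2 has posted a Spot interruption notice for this instance (IMDSv2)."""
    try:
        token_req = urllib.request.Request(
            f"{IMDS}/latest/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        token = urllib.request.urlopen(token_req, timeout=1).read().decode()
        notice_req = urllib.request.Request(
            f"{IMDS}/latest/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        urllib.request.urlopen(notice_req, timeout=1)
        return True  # 200 response means an interruption is scheduled
    except urllib.error.HTTPError:
        return False  # 404 means no interruption notice
    except urllib.error.URLError:
        return False  # not running on EC2 (for example, local testing)

def process_item(item):
    ...  # proprietary analytics / ML step (placeholder)

while not spot_interruption_pending():
    messages = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=10)
    for msg in messages.get("Messages", []):
        process_item(json.loads(msg["Body"]))
        # Deleting only after successful processing is the checkpoint: if the
        # instance is reclaimed mid-batch, the message becomes visible again.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])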
AWS also helped Launchmetrics produce higher-quality data insights, faster: processing data from the different media sources took 5–6 minutes on the previous on-premises architecture, whereas the AWS-based architecture takes less than 1 minute.
This is made possible by the elastic, readily available compute capacity that Amazon EC2 provides compared with a static on-premises fleet. Furthermore, by offloading some management and operational tasks to AWS managed services, such as Amazon Aurora and Amazon OpenSearch Service, Launchmetrics can focus on their core business and improve proprietary solutions rather than spend that time on undifferentiated activities.
Building continuous delivery pipelines
Let’s discuss how Launchmetrics makes changes to their software with so many components.
Both of their computing tiers, Crawl and Processing, consist of standalone EC2 instances launched via Auto Scaling groups and EC2 instances that are part of an Amazon Elastic Container Service (Amazon ECS) cluster. Currently, 70% of Launchmetrics workloads are still running with Auto Scaling groups, while 30% are containerized and run on Amazon ECS. This is important because for each of these workload groups, the deployment process is different.
For workloads that run on Auto Scaling groups, they use AWS CodePipeline to orchestrate the whole process, which includes:
I. Creating a new Amazon Machine Image (AMI) using AWS CodeBuild
II. Deploying the newly built AMI using Terraform in CodeBuild
For containerized workloads that run on Amazon ECS, Launchmetrics also uses a CodePipeline to orchestrate the process by:
III. Creating a new container image and storing it in Amazon Elastic Container Registry
IV. Changing the container image in the task definition and updating the Amazon ECS service using CodeBuild
Conclusion
In this blog post, we explored how Launchmetrics is using EC2 Spot Instances to reduce costs while producing high-quality data insights for their clients. We also demonstrated how decoupling an architecture is important for handling interruptions and why following Spot Instance best practices can grant access to more spare capacity.
Using this architecture, Launchmetrics produced faster, data-driven insights for their clients and increased their capacity to innovate. They are continuing to containerize their applications and are projected to have 100% of their workloads running on Amazon ECS with Spot Instances by the end of 2023.
Facteus Inc. is a leading provider of actionable insights from sensitive transaction data. Facteus safely transforms raw financial transaction data from legacy technologies into actionable information, without compromising data privacy, through its innovative synthetic data process. Quantamatics is one of Facteus’ core product offerings.
Quantamatics accelerates the time it takes a user to go from raw alternative data to insights by providing a cloud-based, turnkey research platform that handles data from ingestion to analysis. This platform saves analysts, data researchers, and data scientists time by doing all the preparation and normalization work before the data is used for insight discovery. The provided cloud environment also allows for easy and flexible analysis of both provided and external data sources. Quantamatics is a SaaS offering with a subscription model that provides access to both the research platform and the associated Facteus datasets.
A great place to start when evaluating existing workloads for fault tolerance and reliability is the AWS Well-Architected Framework. The Well-Architected Framework is designed to help cloud architects build secure, high-performing, resilient, and efficient infrastructure for their applications. Based on six pillars—operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability—the Framework provides a consistent approach for customers to evaluate architectures, and implement designs that will scale over time.
The AWS Well-Architected Tool, available at no charge in the AWS Management Console, lets you create self-assessments to identify and correct gaps in your current architecture. Adhering to Well-Architected principles, Facteus adopted managed services, such as Amazon EKS and Amazon Aurora Serverless, because they reduce the effort of provisioning, configuring, scaling, and backing up. Additionally, using managed services helps to reduce the overall cost of maintaining the services.
Facteus’ architecture overview
Before
Users can access Quantamatics for their research either through a Jupyter notebook or a Microsoft Excel plugin. Facteus used EC2 instances to directly host the underlying JupyterHub deployments and AWS Elastic Beanstalk to deploy APIs.
The legacy architecture, while cloud-based, had multiple issues that made it ineffective from a maintenance, scalability, and cost perspective (as demonstrated in Figure 1):
JupyterHub does not currently support high availability (HA) natively. This meant that an EC2 failure would either require a relatively long outage while a replacement EC2 node spun up or potentially double the cost to keep an idle node on standby.
Also, with the EC2 instances being specialized, portions of each EC2 instance remained unused, resulting in unnecessary costs compared with more modern solutions such as Amazon EKS, which can pool and divide up instances in a more granular fashion.
Finally, as the EC2 instances were standalone, solutions would need to be set up to both monitor application health and perform the appropriate actions in case of an outage.
Although Elastic Beanstalk was a great way to deploy API instances in an HA and scalable way, Facteus migrated their Elastic Beanstalk instances as well, to fully modernize, keep the whole application consistent with a microservice-based architecture, and better utilize the pooled resources.
Figure 1. Cloud-based legacy architecture
Quantamatics requires a data warehouse solution running constantly behind an API to allow for acceptable request and response times. While Snowflake is a great data warehousing and big data querying solution, Facteus found it expensive for their deployment. The queries that the Quantamatics APIs run are typically not computationally expensive, but they do return relatively large amounts of data. This makes transferring the results back to the API over the internet a potential bottleneck.
To address these bottlenecks, Facteus re-architected their application into an Amazon EKS-based one, backed by Aurora Serverless v2 (Postgres).
The new architecture resolves the previous problems in two ways (Figure 2):
By using Aurora Serverless v2 (Postgres) within the same VPC, instead of Snowflake, to store and query the datasets used by the API, query run time stayed roughly the same, while transfer time and the associated costs dropped drastically thanks to the locality of the database and the cost and scalability of Aurora Serverless v2.
By switching to Amazon EKS, the underlying EC2 nodes could easily be pooled and more thoroughly utilized across the various deployments, thus reducing costs. Additionally, as the deployments were now containerized, an outage would result in the quick relocation of those containerized apps (pods) to nodes with capacity, thus reducing downtime and cost.
As a side benefit of the move to managed nodes on Amazon EKS, the node patching overhead was removed entirely, because Amazon EKS safely handles patching of the underlying nodes with a single command.
Amazon EKS monitors and restarts pods automatically, which eliminated the need to set up and manage a solution that monitors pod health and takes the appropriate actions upon failures.
Figure 2. Contemporary architecture with Amazon EKS and Aurora Serverless v2 (Postgres)
Auto scaling with Amazon EKS and Aurora Serverless
Amazon EKS helped to greatly reduce the overhead of setting up and managing the auto scaling of Quantamatics in two ways:
User compute environments could be spun up as isolated pods, with Amazon EKS spinning nodes up and down automatically based on demand.
API instances could also be automatically spun up and down based on network throughput metrics queried by Amazon EKS to handle the requests made by users in a timely fashion.
Aurora Serverless v2
With Aurora Serverless v2, the database’s compute capacity automatically scales based on the load generated by the corresponding API requests. This both reduced cost, because the load varies heavily throughout the day, and removed the management overhead of spinning read replicas up and down that other solutions would have required.
Snowflake vs. Aurora Serverless V2 (Postgres) – Quantamatics query performance and cost comparison
The following steps were performed to migrate data from Snowflake to Aurora Serverless v2:
Use the Snowflake COPY INTO <location> command to copy the data from the Snowflake database table into one or more files in an S3 bucket.
Create the tables in Aurora Serverless, and use the create_s3_uri function to reference the exported files in Amazon S3.
This blog post describes importing data from Amazon S3 into Amazon Aurora PostgreSQL; a sketch of the import follows these steps.
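As a sketch of the Aurora side of this migration, the import can be run from Python with psycopg2, using the aws_s3 extension’s table_import_from_s3 and aws_commons.create_s3_uri functions; the connection details, bucket, key, and table name are hypothetical.

import os

import psycopg2

conn = psycopg2.connect(
    host="quantamatics.cluster-xxxx.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    dbname="research",
    user="app",
    password=os.environ["PGPASSWORD"],
)

with conn, conn.cursor() as cur:
    # The aws_s3 extension must be installed once per database (requires sufficient privileges).
    cur.execute("CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;")
    # create_s3_uri points at the file written by Snowflake's COPY INTO <location>.
    cur.execute(
        """
        SELECT aws_s3.table_import_from_s3(
            'transactions',                       -- target table (hypothetical)
            '',                                   -- column list ('' = all columns)
            '(format csv)',                       -- import options
            aws_commons.create_s3_uri('my-export-bucket', 'snowflake/transactions.csv', 'us-east-1')
        );
        """
    )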
Testing strategy: Run the corresponding CLI database utility for each database (snowsql vs. psql) from within the VPC, run the same query on each dataset, and write the results as CSV to a local file.
Data set size: ~178,000,000 rows
Result set size: ~418,000 rows
Data source | Configuration | Results
Snowflake | Medium Warehouse (running), AWS-based in the same Region as the APIs; cost: ~$0.01 per query based on credit usage | 21.99 seconds total (3.36 seconds run time + 18.63 seconds transfer time)
Aurora Serverless v2 (Postgres) | Idling on four Aurora capacity units (ACUs); cost: ~$0.24 an hour; tables and indexes tuned for Quantamatics use cases | 7.00 seconds total (3.58 seconds run time + 3.42 seconds transfer time)
Conclusion
The customer was able to achieve similar run times for the given dataset and query, but faster transfer speeds from Aurora Serverless due to the locality of the database. They also realized up to ~40x runtime cost savings by using Aurora Serverless—1,000 queries in Aurora Serverless vs. ~24 queries in Snowflake for the same cost.
Note: These results are specific to Quantamatics use cases where queries are fixed and well-known, and relatively limited in terms of complexity. This allowed the tables and database in Aurora Serverless v2 to be tuned for those specific purposes.
AWS recommends customers review their workloads using the AWS Well-Architected Tool to help ensure that their workloads are performant, secure, and cost-optimized. Well-Architected Framework Reviews are excellent opportunities to work together with your AWS account team and key stakeholders to discuss how modern infrastructure can help you win in the market.
With the Graviton Challenge last year, we helped customers migrate to Graviton-based EC2 instances and get up to 40 percent price performance benefit in as little as 4 days. Tens of thousands of customers, including 48 of the top 50 Amazon Elastic Compute Cloud (Amazon EC2) customers, use AWS Graviton processors for their workloads. In addition to EC2, many AWS managed services can run their workloads on Graviton. For most customers, adoption is easy, requiring minimal code changes. However, the effort and time required to move workloads to Graviton depends on a few factors including your software development environment and the technology stack on which your application is built.
This year, we want to take it a step further and make it even easier for customers to adopt Graviton not only through EC2, but also through managed services. Today, we are launching AWS Graviton Fast Start, a new program that makes it even easier to move your workloads to AWS Graviton by providing step-by-step directions for EC2 and other managed services that support the Graviton platform:
Amazon Elastic Compute Cloud (Amazon EC2) – EC2 provides the most flexible environment for a migration and can support many kinds of workloads, such as web apps, custom databases, or analytics. You have full control over the interpreted or compiled code running in the EC2 instance. You can also use many open-source and commercial software products that support the Arm64 architecture.
AWS Lambda – Migrating your serverless functions can be really easy, especially if you use an interpreted runtime such as Node.js or Python. Most of the time, you only have to check the compatibility of your software dependencies. I have shown a few examples in this blog post, and a minimal deployment sketch follows this list.
AWS Fargate – Fargate works best if your applications are already running in containers or if you are planning to containerize them. By using multi-architecture container images or images that have Arm64 in their image manifest, you get the serverless benefits of Fargate and the price-performance advantages of Graviton.
Amazon Aurora – Relational databases are at the core of many applications. If you need a database compatible with PostgreSQL or MySQL, you can use Amazon Aurora to have a highly performant and globally available database powered by Graviton.
Amazon Relational Database Service (RDS) – Similarly to Aurora, Amazon RDS engines such as PostgreSQL, MySQL, and MariaDB can provide a fully managed relational database service using Graviton-based instances.
Amazon ElastiCache – When your workload requires ultra-low latency and high throughput, you can speed up your applications with ElastiCache and have a fully managed in-memory cache running on Graviton and compatible with Redis or Memcached.
Amazon EMR – With Amazon EMR, you can run large-scale distributed data processing jobs, interactive SQL queries, and machine learning applications on Graviton using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.
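For the Lambda case, running on Graviton is mostly a matter of selecting the arm64 architecture when you create or update a function, once your dependencies are compatible. Below is a hedged sketch with boto3; the function name, role ARN, and deployment package are placeholders.

import boto3

lambda_client = boto3.client("lambda")

with open("function.zip", "rb") as f:
    package = f.read()

lambda_client.create_function(
    FunctionName="orders-api",                                   # placeholder
    Runtime="python3.9",
    Handler="app.lambda_handler",
    Role="arn:aws:iam::123456789012:role/orders-api-role",       # placeholder
    Code={"ZipFile": package},
    Architectures=["arm64"],  # run on the Graviton-based execution environment
)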
Formula 1 racing told us that Graviton2-based C6gn instances provided the best price performance benefits for some of their computational fluid dynamics (CFD) workloads. More recently, they found that Graviton3 C7g instances are 40 percent faster for the same simulations and expect Graviton3-based instances to become the optimal choice to run all of their CFD workloads.
Honeycomb has 100 percent of their production workloads running on Graviton using EC2 and Lambda. They have tested the high-throughput telemetry ingestion workload they use for their observability platform against early preview instances of Graviton3 and have seen a 35 percent performance increase for their workload over Graviton2. They were able to run 30 percent fewer instances of C7g than C6g serving the same workload and with 30 percent reduced latency. With these instances in production, they expect over 50 percent price performance improvement over x86 instances.
Twitter is working on a multi-year project to leverage Graviton-based EC2 instances to deliver Twitter timelines. As part of their ongoing effort to drive further efficiencies, they tested the new Graviton3-based C7g instances. Across a number of benchmarks representative of their workloads, they found Graviton3-based C7g instances deliver 20-80 percent higher performance compared to Graviton2-based C6g instances, while also reducing tail latencies by as much as 35 percent. They are excited to utilize Graviton3-based instances in the future to realize significant price performance benefits.
With all these options, getting the benefits of running all or part of your workload on AWS Graviton can be easier than you expect. To help you get started, there’s also a free trial on the Graviton-based T4g instances for up to 750 hours per month through December 31st, 2022.
As part of my annual tradition to tell you about how AWS makes Prime Day possible, I am happy to be able to share some chart-topping metrics (check out my 2016, 2017, 2019, 2020, and 2021 posts for a look back).
My purchases this year included a first aid kit, some wood brown filament for my 3D printer, and a non-stick frying pan! According to our official news release, Prime members worldwide purchased more than 100,000 items per minute during Prime Day, with best-selling categories including Amazon Devices, Consumer Electronics, and Home.
Powered by AWS As always, AWS played a critical role in making Prime Day a success. A multitude of two-pizza teams worked together to make sure that every part of our infrastructure was scaled, tested, and ready to serve our customers. Here are a few examples:
Amazon Aurora – On Prime Day, 5,326 database instances running the PostgreSQL-compatible and MySQL-compatible editions of Amazon Aurora processed 288 billion transactions, stored 1,849 terabytes of data, and transferred 749 terabytes of data.
Amazon EC2 – For Prime Day 2022, Amazon increased the total number of normalized instances (an internal measure of compute power) on Amazon Elastic Compute Cloud (Amazon EC2) by 12%. This resulted in an overall server equivalent footprint that was only 7% larger than that of Cyber Monday 2021 due to the increased adoption of AWS Graviton2 processors.
Amazon EBS – For Prime Day, the Amazon team added 152 petabytes of EBS storage. The resulting fleet handled 11.4 trillion requests per day and transferred 532 petabytes of data per day. Interestingly enough, due to increased efficiency of some of the internal Amazon services used to run Prime Day, Amazon actually used about 4% less EBS storage and transferred 13% less data than it did during Prime Day last year. Here’s a graph that shows the increase in data transfer during Prime Day:
Amazon SES – In order to keep Prime Day shoppers aware of the deals and to deliver order confirmations, Amazon Simple Email Service (SES) peaked at 33,000 Prime Day email messages per second.
Amazon SQS – During Prime Day, Amazon Simple Queue Service (SQS) set a new traffic record by processing 70.5 million messages per second at peak.
Amazon DynamoDB – DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfillment centers. Over the course of Prime Day, these sources made trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 105.2 million requests per second.
Amazon SageMaker – The Amazon Robotics Pick Time Estimator, which uses Amazon SageMaker to train a machine learning model to predict the amount of time future pick operations will take, processed more than 100 million transactions during Prime Day 2022.
Package Planning – In North America, and on the highest traffic Prime 2022 day, package-planning systems performed 60 million AWS Lambda invocations, processed 17 terabytes of compressed data in Amazon Simple Storage Service (Amazon S3), stored 64 million items across Amazon DynamoDB and Amazon ElastiCache, served 200 million events over Amazon Kinesis, and handled 50 million Amazon Simple Queue Service events.
Prepare to Scale Every year I reiterate the same message: rigorous preparation is key to the success of Prime Day and our other large-scale events. If you are preparing for a similar chart-topping event of your own, I strongly recommend that you take advantage of AWS Infrastructure Event Management (IEM). As part of an IEM engagement, my colleagues will provide you with architectural and operational guidance that will help you to execute your event with confidence!
Plugsurfing aligns the entire car charging ecosystem—drivers, charging point operators, and carmakers—within a single platform. The over 1 million drivers connected to the Plugsurfing Power Platform benefit from a network of over 300,000 charging points across Europe. Plugsurfing serves charging point operators with a backend cloud software for managing everything from country-specific regulations to providing diverse payment options for customers. Carmakers benefit from white label solutions as well as deeper integrations with their in-house technology. The platform-based ecosystem has already processed more than 18 million charging sessions. Plugsurfing was acquired fully by Fortum Oyj in 2018.
Plugsurfing uses Amazon OpenSearch Service as a central data store to store information on 300,000 charging stations and to power search and filter requests coming from mobile, web, and connected car dashboard clients. With increasing usage, Plugsurfing created multiple read replicas of an OpenSearch Service cluster to meet demand and scale. Over time, as demand increased, this solution became cost prohibitive and limited in terms of price-performance benefit.
AWS EMEA Prototyping Labs collaborated with the Plugsurfing team for 4 weeks on a hands-on prototyping engagement to solve this problem, which resulted in 70% cost savings and doubled the performance benefit over the current solution. This post shows the overall approach and ideas we tested with Plugsurfing to achieve the results.
The challenge: Scaling higher transactions per second while keeping costs under control
One of the key issues of the legacy solution was keeping up with higher transactions per second (TPS) from APIs while keeping costs low. The majority of the cost was coming from the OpenSearch Service cluster, because the mobile, web, and EV car dashboards use different APIs for different use cases, but all query the same cluster. The solution to achieve higher TPS with the legacy solution was to scale the OpenSearch Service cluster.
The following figure illustrates the legacy architecture.
Plugsurfing APIs are responsible for serving data for four different use cases:
Radius search – Find all the EV charging stations (latitude/longitude) within an x km radius of the point of interest (or current location on GPS).
Square search – Find all the EV charging stations within a box of length x width, where the point of interest (or current location on GPS) is at the center.
Geo clustering search – Find all the EV charging stations clustered (grouped) by their concentration within a given area. For example, searching all EV chargers in all of Germany results in something like 50 in Munich and 100 in Berlin.
Radius search with filtering – Filter the results to EV chargers that are available or in use, by plug type, power rating, or other criteria.
The OpenSearch Service domain configuration was as follows:
m4.10xlarge.search x 4 nodes
Elasticsearch 7.10 version
A single index to store 300,000 EV charger locations with five shards and one replica
AWS EMEA Prototyping Labs proposed an experimentation approach to try three high-level ideas for performance optimization and to lower overall solution costs.
We launched an Amazon Elastic Compute Cloud (Amazon EC2) instance in a prototyping AWS account to host a benchmarking tool based on k6 (an open-source tool that makes load testing simple for developers and QA engineers). Later, we used scripts to dump and restore production data to various databases, transforming it to fit different data models. Then we ran k6 scripts to run and record performance metrics for each combination of use case, database, and data model. We also used the AWS Pricing Calculator to estimate the cost of each experiment.
Experiment 1: Use AWS Graviton and optimize OpenSearch Service domain configuration
We benchmarked a replica of the legacy OpenSearch Service domain setup in a prototyping environment to baseline performance and costs. Next, we analyzed the current cluster setup and recommended testing the following changes:
Use AWS Graviton based memory optimized EC2 instances (r6g) x 2 nodes in the cluster
Reduce the number of shards from five to one, given the volume of data (all documents) is less than 1 GB
Increase the refresh interval configuration from the default 1 second to 5 seconds
Denormalize the full document; if not possible, then denormalize all the fields that are part of the search query
Upgrade to Amazon OpenSearch Service 1.0 from Elasticsearch 7.10
Plugsurfing created multiple new OpenSearch Service domains with the same data and benchmarked them against the legacy baseline, with the following results. The first row represents the baseline from the legacy setup; the final row (r6g.2xlarge with the denormalized model) represents the best outcome across the experiments for the given use cases.
DB Engine | Version | Node Type | Nodes in Cluster | Configurations | Data Modeling | Radius req/sec | Filtering req/sec | Performance Gain %
Elasticsearch | 7.1 | m4.10xlarge | 4 | 5 shards, 1 replica | Nested | 2841 | 580 | 0
Amazon OpenSearch Service | 1.0 | r6g.xlarge | 2 | 1 shard, 1 replica | Nested | 850 | 271 | 32.77
Amazon OpenSearch Service | 1.0 | r6g.xlarge | 2 | 1 shard, 1 replica | Denormalized | 872 | 670 | 45.07
Amazon OpenSearch Service | 1.0 | r6g.2xlarge | 2 | 1 shard, 1 replica | Nested | 1667 | 474 | 62.58
Amazon OpenSearch Service | 1.0 | r6g.2xlarge | 2 | 1 shard, 1 replica | Denormalized | 1993 | 1268 | 95.32
Plugsurfing gained 95% better performance (nearly double) across the radius and filtering use cases with this experiment.
Experiment 2: Use purpose-built databases on AWS for different use cases
We tested the square search use case with an Aurora PostgreSQL cluster with a db.r6g.2xlarge single node as the reader and a db.r6g.large single node as the writer. The square search used a single PostgreSQL table, configured via the following steps (a sample query follows the steps):
Create the geo search table with geography as the data type to store latitude/longitude:
CREATE TYPE status AS ENUM ('available', 'inuse', 'out-of-order');
CREATE TABLE IF NOT EXISTS square_search
(
id serial PRIMARY KEY,
geog geography(POINT),
status status,
data text -- Can be used as json data type, or add extra fields as flat json
);
Create an index on the geog field:
CREATE INDEX global_points_gix ON square_search USING GIST (geog);
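With the table and GIST index in place, the square search itself becomes a bounding-box query. The following is a sketch from Python with psycopg2; the connection details and coordinates are hypothetical, and ST_MakeEnvelope builds the box in longitude/latitude (SRID 4326).

import os

import psycopg2

conn = psycopg2.connect(
    host="plugsurfing.cluster-xxxx.eu-west-1.rds.amazonaws.com",  # placeholder endpoint
    dbname="chargers",
    user="app",
    password=os.environ["PGPASSWORD"],
)

# Box of roughly 0.2 x 0.2 degrees around a point of interest (example coordinates).
lon, lat, half = 13.40, 52.52, 0.1
sql = """
    SELECT id, status, data
    FROM square_search
    WHERE ST_Intersects(
        geog,
        ST_MakeEnvelope(%s, %s, %s, %s, 4326)::geography
    )
    AND status = 'available';
"""

with conn, conn.cursor() as cur:
    cur.execute(sql, (lon - half, lat - half, lon + half, lat + half))
    rows = cur.fetchall()  # the GIST index on geog keeps this query fast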
We achieved an eight-times greater improvement in TPS for the square search use case, as shown in the following table.
DB Engine | Version | Node Type | Nodes in Cluster | Configurations | Data Modeling | Square req/sec | Performance Gain %
Elasticsearch | 7.1 | m4.10xlarge | 4 | 5 shards, 1 replica | Nested | 412 | 0
Aurora PostgreSQL | 13.4 | r6g.large | 2 | PostGIS, Denormalized | Single table | 881 | 213.83
Aurora PostgreSQL | 13.4 | r6g.xlarge | 2 | PostGIS, Denormalized | Single table | 1770 | 429.61
Aurora PostgreSQL | 13.4 | r6g.2xlarge | 2 | PostGIS, Denormalized | Single table | 3553 | 862.38
We tested the geo clustering search use case with a DynamoDB model. The partition key (PK) is made up of three components: <zoom-level>:<geo-hash>:<api-key>, and the sort key is the EV charger current status. We examined the following:
The zoom level of the map set by the user
The geo hash computed based on the map tile in the user’s view port area (at every zoom level, the map of Earth is divided into multiple tiles, where each tile can be represented as a geohash)
The API key to identify the API user
Partition Key (String) | Sort Key (String) | total_pins (Number) | filter1_pins (Number) | filter2_pins (Number) | filter3_pins (Number)
5:gbsuv:api_user_1 | Available | 100 | 50 | 67 | 12
5:gbsuv:api_user_1 | in-use | 25 | 12 | 6 | 1
6:gbsuvt:api_user_1 | Available | 35 | 22 | 8 | 0
6:gbsuvt:api_user_1 | in-use | 88 | 4 | 0 | 35
Whenever an EV charger’s status changes, the writer updates the counters (increment or decrement) for each filter condition and charger status at all zoom levels. With this model, the reader can query pre-clustered data with a single direct partition hit for all the map tiles viewable by the user at the given zoom level. A sketch of the reader query follows.
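A sketch of that reader path with boto3 is shown below: one Query per visible map tile, each hitting a single partition. The table name and key attribute names (pk, sk) are assumptions based on the model above; total_pins comes from the item layout shown.

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("ev-charger-clusters")  # hypothetical table name

def tile_clusters(zoom, geo_hash, api_key):
    # Partition key format: <zoom-level>:<geo-hash>:<api-key>; sort key is the charger status.
    resp = table.query(KeyConditionExpression=Key("pk").eq(f"{zoom}:{geo_hash}:{api_key}"))
    # Pre-aggregated counters per status come back directly; no scatter-gather
    # or server-side aggregation is needed for the tile.
    return {item["sk"]: int(item["total_pins"]) for item in resp["Items"]}

print(tile_clusters(5, "gbsuv", "api_user_1"))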
The DynamoDB model helped us gain a 45-times greater read performance for our geo clustering use case. However, it also added extra work on the writer side to pre-compute numbers and update multiple rows when the status of a single EV charger is updated. The following table summarizes our results.
DB Engine | Version | Node Type | Nodes in Cluster | Configurations | Data Modeling | Clustering req/sec | Performance Gain %
Elasticsearch | 7.1 | m4.10xlarge | 4 | 5 shards, 1 replica | Nested | 22 | 0
DynamoDB | NA | Serverless | 0 | 100 WCU, 500 RCU | Single table | 1000 | 4545.45
Experiment 3: Use AWS Lambda@Edge and AWS Wavelength for better network performance
We recommended that Plugsurfing use Lambda@Edge and AWS Wavelength to optimize network performance by shifting some of the APIs to the edge, closer to the user. The EV car dashboard can use the same 5G network connectivity to invoke the Plugsurfing APIs through AWS Wavelength.
Post-prototype architecture
The post-prototype architecture used purpose-built databases on AWS to achieve better performance across all four use cases. We looked at the results and split the workload based on which database performs best for each use case. This approach optimized performance and cost but added complexity for readers and writers. The final experiment summary below shows the database chosen for each use case to provide the best performance.
Plugsurfing has already implemented a short-term plan as an immediate action post-prototype and plans to implement the mid-term and long-term actions in the future.
DB Engine | Node Type | Configurations | Radius req/sec | Radius Filtering req/sec | Clustering req/sec | Square req/sec | Monthly Costs ($) | Cost Benefit % | Performance Gain %
Elasticsearch 7.1 | m4.10xlarge x4 | 5 shards | 2841 | 580 | 22 | 412 | 9584.64 | 0 | 0
Amazon OpenSearch Service 1.0 | r6g.2xlarge x2 | 1 shard, Nested | 1667 | 474 | 34 | 142 | 1078.56 | 88.75 | -39.9
Amazon OpenSearch Service 1.0 | r6g.2xlarge x2 | 1 shard | 1993 | 1268 | 125 | 685 | 1078.56 | 88.75 | 5.6
Aurora PostgreSQL 13.4 | r6g.2xlarge x2 | PostGIS | 0 | 0 | 275 | 3553 | 1031.04 | 89.24 | 782.03
DynamoDB | Serverless | 100 WCU, 500 RCU | 0 | 0 | 1000 | 0 | 106.06 | 98.89 | 4445.45
Summary | – | – | 2052 | 1268 | 1000 | 3553 | 2215.66 | 76.88 | 104.23
The following diagram illustrates the updated architecture.
Conclusion
Plugsurfing was able to achieve a 70% cost reduction over their legacy setup with two-times better performance by using purpose-built databases like DynamoDB, Aurora PostgreSQL, and AWS Graviton based instances for Amazon OpenSearch Service. They achieved the following results:
The radius search and radius search with filtering use cases achieved better performance using Amazon OpenSearch Service on AWS Graviton with a denormalized document structure
The square search use case performed better using Aurora PostgreSQL, where we used the PostGIS extension for geo square queries
The geo clustering search use case performed better using DynamoDB
Anand Shah is a Big Data Prototyping Solution Architect at AWS. He works with AWS customers and their engineering teams to build prototypes using AWS Analytics services and purpose-built databases. Anand helps customers solve the most challenging problems using art-of-the-possible technology. He enjoys beaches in his leisure time.