One of the challenges with distributed systems is that they are made up of many interdependent services, which adds a degree of complexity when you are trying to monitor their performance. Determining which services and APIs are experiencing high latencies or degraded availability requires manually piecing together telemetry signals. This can result in significant time and effort spent establishing the root cause of any issues with the system, due to the inconsistent experiences across metrics, traces, logs, real user monitoring, and synthetic monitoring.
You want to provide your customers with continuously available and high-performing applications. At the same time, the monitoring that assures this must be efficient, cost-effective, and without undifferentiated heavy lifting.
Amazon CloudWatch Application Signals helps you automatically instrument applications based on best practices for application performance. There is no manual effort, no custom code, and no custom dashboards. You get a pre-built, standardized dashboard showing the most important metrics, such as volume of requests, availability, latency, and more, for the performance of your applications. In addition, you can define Service Level Objectives (SLOs) on your applications to monitor specific operations that matter most to your business. An example of an SLO could be to set a goal that a webpage should render within 2000 ms 99.9 percent of the time in a rolling 28-day interval.
Application Signals automatically correlates telemetry across metrics, traces, logs, real user monitoring, and synthetic monitoring to speed up troubleshooting and reduce application disruption. By providing an integrated experience for analyzing performance in the context of your applications, Application Signals gives you improved productivity with a focus on the applications that support your most critical business functions.
My personal favorite is the collaboration between teams that’s made possible by Application Signals. I started this post by mentioning that distributed systems are made up of many interdependent services. On the Service Map, which we will look at later in this post, if you, as a service owner, identify an issue that’s caused by another service, you can send a link to the owner of the other service to efficiently collaborate on the triage tasks.
Getting started with Application Signals
You can easily collect application and container telemetry when creating new Amazon EKS clusters in the Amazon EKS console by enabling the new Amazon CloudWatch Observability EKS add-on. Another option is to enable it for existing Amazon EKS clusters or other compute types directly in the Amazon CloudWatch console.
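If you prefer to enable the add-on programmatically for an existing cluster, a minimal boto3 sketch might look like the following. The cluster name and IAM role ARN are placeholders, and the add-on name reflects the Amazon CloudWatch Observability EKS add-on described above.

```python
import boto3

eks = boto3.client("eks", region_name="us-east-1")

# Enable the Amazon CloudWatch Observability add-on on an existing cluster.
# The cluster name and service account role ARN are placeholders for this sketch.
response = eks.create_addon(
    clusterName="my-eks-cluster",
    addonName="amazon-cloudwatch-observability",
    serviceAccountRoleArn="arn:aws:iam::111122223333:role/CloudWatchAgentServerRole",
)

print(response["addon"]["status"])
```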
All of the services discovered and their golden metrics (volume of requests, latency, faults and errors) are then automatically displayed on the Services page and the Service Map. The Service Map gives you a visual deep dive to evaluate the health of a service, its operations, dependencies, and all the call paths between an operation and a dependency.
The list of services that are enabled in Application Signals will also show in the services dashboard, along with operational metrics across all of your services and dependencies to easily spot anomalies. The Application column is auto-populated if the EKS cluster belongs to an application that’s tagged in AppRegistry. The Hosted In column automatically detects which EKS pod, cluster, or namespace combination the service requests are running in, and you can select one to go directly to Container Insights for detailed container metrics such as CPU or memory utilization, to name a few.
Team collaboration with Application Signals
Now, let’s expand on the team collaboration that I mentioned at the beginning of this post. Let’s say you consult the services dashboard to do sanity checks and you notice two SLO issues for one of your services, named pet-clinic-frontend. Your company maintains a set of SLOs, and this is the view that you use to understand how the applications are performing against those objectives. For the services that are tagged in AppRegistry, all teams have a central view of the definition and ownership of the application. Further navigation to the service map gives you even more detail on the health of this service.
At this point, you decide to send the link for the pet-clinic-frontend service to Sarah, whose details you found in AppRegistry. Sarah is the person on call for this service. The link lets you collaborate with Sarah efficiently because it has been curated to land directly on the triage view, contextualized based on your discovery of the issue. Sarah notices that the POST /api/customer/owners latency has increased to 2,000 ms for a number of requests and, as the service owner, dives deep to arrive at the root cause.
Clicking into the latency graph returns a correlated list of traces that correspond directly to the operation, metric, and moment in time, which helps Sarah to find the exact traces that may have led to the increase in latency.
Sarah uses Amazon CloudWatch Synthetics and Amazon CloudWatch RUM and has enabled the X-Ray active tracing integration to automatically see the list of relevant canaries and pages correlated to the service. This integrated view now helps Sarah gain multiple perspectives on the performance of the application and quickly troubleshoot anomalies in a single view.
Available now
Amazon CloudWatch Application Signals is available in preview and you can start using it today in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Asia Pacific (Sydney), and Asia Pacific (Tokyo).
Amazon EBS Snapshots Archive in the AWS Backup console
Snapshots Archive with AWS Backup is only available for snapshots with a backup frequency of one month or longer (28-day cron expression) and a retention of more than 90 days. This is a protective measure to ensure that you don’t archive snapshots, such as hourly snapshots, that wouldn’t benefit from the transition to the cold tier.
The ability to archive Amazon EBS snapshots is a new parameter in the Lifecycle section of AWS Backup plans. You must explicitly opt in to moving your Amazon EBS snapshots to cold storage, because this tier has different properties than the existing cold storage, including:
Always converting an incremental backup to a full backup.
Longer recovery time objective (RTO) (up to 72 hours).
Limitations on the frequency of backups that can be transitioned to cold storage (monthly or greater).
Time in warm storage indicates how long the backups will remain in warm storage before they are transitioned to cold storage. Total retention period is the total time the backups will be retained by AWS Backup, and its value is the sum of both warm and cold storage. For backups in cold storage, the minimum retention period is 90 days. This is why the default total retention is 98 days (8 days in warm + 90 days in cold). The bar graph shows the total retention of your backups and where the backups will reside during that time. In the example shown in this graph, 8 days is in warm storage (red bar), and 90 days is in cold storage (blue bar).
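As a sketch of how this looks in the AWS Backup API, the following boto3 call creates a backup plan with a monthly schedule, 8 days in warm storage, and a 98-day total retention. The plan, rule, and vault names are placeholders, and the OptInToArchiveForSupportedResources flag is my understanding of the explicit opt-in parameter described above.

```python
import boto3

backup = boto3.client("backup")

# Sketch: a backup plan rule that archives EBS snapshots.
# 8 days warm + 90 days cold = 98 days total retention.
# The plan name, rule name, and vault name are placeholders.
response = backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "ebs-archive-demo-plan",
        "Rules": [
            {
                "RuleName": "monthly-archive-rule",
                "TargetBackupVaultName": "Default",
                "ScheduleExpression": "cron(0 5 1 * ? *)",  # monthly, as required for archiving
                "Lifecycle": {
                    "MoveToColdStorageAfterDays": 8,
                    "DeleteAfterDays": 98,
                    # Assumption: explicit opt-in flag for archiving EBS snapshots
                    "OptInToArchiveForSupportedResources": True,
                },
            }
        ],
    }
)
print(response["BackupPlanId"])
```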
To restore or use the archived Amazon EBS snapshot today (outside of AWS Backup), you have to follow a two-step process:
Temporarily or permanently restore the snapshot from archive to standard tier.
Once the snapshot is in the standard tier, call the CreateVolume API (a sketch of both steps follows).
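For reference, a minimal boto3 sketch of that two-step process might look like the following. The snapshot ID, Availability Zone, and restore window are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

snapshot_id = "snap-0123456789abcdef0"  # placeholder archived snapshot

# Step 1: temporarily restore the archived snapshot to the standard tier for 5 days
ec2.restore_snapshot_tier(
    SnapshotId=snapshot_id,
    TemporaryRestoreDays=5,
)

# In practice, wait for the tier restore to complete (for example, by polling
# describe_snapshot_tier_status) before creating the volume.

# Step 2: once the snapshot is back in the standard tier, create a volume from it
volume = ec2.create_volume(
    SnapshotId=snapshot_id,
    AvailabilityZone="us-east-1a",
    VolumeType="gp3",
)
print(volume["VolumeId"])
```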
With this announcement, using either the AWS Backup console or the API to restore the archived Amazon EBS snapshot in AWS Backup, the following restore workflow applies:
Enter the number of days you want to temporarily restore your snapshot from cold to standard tier.
Choose your volume configuration.
The end result is a restored EBS volume. You will not have to manually move the snapshot from cold to standard tier and then restore the volume; this is done automatically for you.
Now available
Amazon EBS Snapshots Archive with AWS Backup is available for you today in all AWS Regions except China and AWS GovCloud (US).
As usual, you pay as you go, with no minimum or fixed fees. There are two metrics that influence Amazon EBS Snapshots Archive billing: data storage and data retrieval. You are charged for a 90-day period at minimum. This means that if you delete a snapshot archive or permanently restore it less than 90 days after creation, then we charge for the full 90-day period. The AWS Backup pricing page has the details.
Performing automated game day testing of all your critical resources is an important step in determining whether you are prepared to respond to ransomware or any other data loss event. It gives you the opportunity to monitor the results of these tests, such as success or failure, and to take appropriate corrective actions based on them. Ultimately, you will be able to ascertain whether restore times meet your organization’s expected recovery time objective (RTO) goals, helping you develop improved recovery strategies.
Today, we’re announcing restore testing, a new capability in AWS Backup that allows you to perform restore testing of your AWS resources across storage, compute, and databases. With this feature, you can automate the entire restore testing process and avoid surprises later by determining now whether you can successfully recover using your backups in the event of a data loss such as ransomware. As an additional option, to demonstrate compliance with your organizational and regulatory data governance requirements, you can use the restore job results.
Earlier, I created EC2 instances and a backup of these instances. Then, I created my restore testing plan in the AWS Backup console.
In the General section, I enter the name of the plan, a test frequency, a Start time, and a Start within window. Start time sets the time for the test to begin; for example, if you have a daily test frequency set, you specify what time the plan will run each day. Start within is the period of time in which the restore test is designated to begin. AWS Backup makes a best effort to start all designated restore jobs during the Start within window. You can keep this window small or large, based on your preference.
In the Recovery point selection section, I specify the vaults that the recovery points should come from, and a timeframe of eligible recovery points as part of this restore testing plan. I left the criteria for a recovery point at the default selection. I also didn’t opt to include recovery points generated by point-in-time recovery (PITR) in this restore testing plan.
Tagging is optional, so for the purposes of this test I didn’t add a tag. With setup finished, it was time for me to choose Create restore testing plan to create the plan.
Once the restore testing plan has been created, it is time to assign resources. I start by specifying the IAM role that AWS Backup will assume when running the restore test. For the retention period before cleanup, I kept the default selection of deleting the restored resources immediately, to optimize costs. Alternatively, by specifying a retention period, I could have integrated my own validation tests (for example, an AWS Lambda function) using Amazon EventBridge (CloudWatch Events) and sent back the validation status using the new PutRestoreValidationResult API so that it is reported in the restore job.
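As an illustration, a validation Lambda function triggered by the EventBridge restore-job event might report its result back to AWS Backup along these lines. This is a sketch only; the event field names and the validation logic are assumptions.

```python
import boto3

backup = boto3.client("backup")

def handler(event, context):
    # Assumption: the EventBridge restore-job event carries the restore job ID
    # and the ARN of the restored resource in its detail section.
    detail = event.get("detail", {})
    restore_job_id = detail.get("restoreJobId")
    restored_resource_arn = detail.get("createdResourceArn")

    # Placeholder validation: replace with your own checks against the
    # restored resource (for example, connect to it and run a smoke test).
    validation_passed = restored_resource_arn is not None

    backup.put_restore_validation_result(
        RestoreJobId=restore_job_id,
        ValidationStatus="SUCCESSFUL" if validation_passed else "FAILED",
        ValidationStatusMessage="Automated restore validation result",
    )
```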
I have EC2 instances that I created and backed up earlier, and I specify that this plan is for Amazon EC2 resource types. I include all protected resources of this EC2 resource type in the selection scope. I have very few resources, so I didn’t add the optional tags.
I opted to use the default instance type for the restore. I also didn’t specify any additional parameters. It’s then time to choose Assign resources.
Once the resources have been assigned, all information related to the restore testing plan will be presented in a summarized form where you’ll be able to see when the restore testing jobs have executed.
Once I have enough restores performed over time, I can also view the Restore time history for every resource restored from the Protected resources tab.
Now available
Restore testing in AWS Backup is available in all AWS Regions where AWS Backup is available, except the AWS China Regions, AWS GovCloud (US), and Israel (Tel Aviv). To learn more, visit the AWS Backup user guide. You can submit your questions to AWS re:Post for AWS Backup or through your usual AWS Support contacts.
I’ve been at AWS for more than four years, and I’ve enjoyed every minute of it! I started as a consultant in Professional Services, and I’ve been a Security Solutions Architect for around three years.
How do you explain your job to your non-tech friends?
Well, my mother doesn’t totally understand my role, and she’s been known to tell her friends that I’m the cable company technician that installs your internet modem and router. I usually tell my non-tech friends that I help AWS customers protect their sensitive data. If I mention cryptography, I typically only get asked questions about cryptocurrency—which I’m not qualified to answer. If someone asks what cryptography is, I usually say it’s protecting data by using mathematics.
How did you get started in data protection and cryptography? What about it piqued your interest?
I originally went to school to become a network engineer, but I discovered that moving data packets from point A to point B wasn’t as interesting to me as securing those data packets. Early in my career, I was an intern at an insurance company, and I had a mentor who set up ethical hacking lessons for me—for example, I’d come into the office and he’d have a compromised workstation preconfigured. He’d ask me to do an investigation and determine how the workstation was compromised and what could be done to isolate it and collect evidence. Other times, I’d come in and find my desk cabinets were locked with a padlock, and he wanted me to pick the lock. Security is particularly interesting because it’s an ever-evolving field, and I enjoy learning new things.
What’s been the most dramatic change you’ve seen in the data protection landscape?
One of the changes that I’ve been excited to see is an emphasis on encrypting everything. When I started my career, we’d often have discussions about encryption in the context of tradeoffs. If we needed to encrypt sensitive data, we’d have a conversation with application teams about the potential performance impact of encryption and decryption operations on their systems (for example, their databases), when to schedule downtime for the application to encrypt the data or rotate the encryption keys protecting the data, how to ensure the durability of their keys and make sure they didn’t lose data, and so on.
When I talk to customers about encryption on AWS today—of course, it’s still useful to talk about potential performance impact—but the conversation has largely shifted from “Should I encrypt this data?” to “How should I encrypt this data?” This is due to services such as AWS Key Management Service (AWS KMS) making it simpler for customers to manage encryption keys and encrypt and decrypt data in their applications with minimal performance impact or application downtime. AWS KMS has also made it simple to enable encryption of sensitive data—with over 120 AWS services integrated with AWS KMS, and services such as Amazon Simple Storage Service (Amazon S3) encrypting new S3 objects by default.
You are a frequent contributor to the AWS Security Blog. What were some of your recent posts about?
You are speaking in a couple of sessions at AWS re:Invent; what will your sessions focus on? What do you hope attendees will take away from your session?
I’m delivering two sessions at re:Invent this year. The first is a chalk talk, Centrally manage application secrets with AWS Secrets Manager (SEC221), that I’m delivering with Ritesh Desai, who is the General Manager of Secrets Manager. We’re discussing how you can securely store and manage secrets in your workloads inside and outside of AWS. We will highlight some recommended practices for managing secrets, and answer your questions about how Secrets Manager integrates with services such as AWS KMS to help protect application secrets.
The second session is also a chalk talk, Securely modernize payment applications with AWS (SEC326). I’m delivering this talk with Mark Cline, who is the Senior Product Manager of AWS Payment Cryptography. We will walk through an example scenario on creating a new payment processing application. We will discuss how to use AWS Payment Cryptography, as well as other services such as AWS Lambda, to build a simple architecture to help process and secure credit card payment data. We will also include common payment industry use cases such as tokenization of sensitive data, and how to include basic anti-fraud detection, in our example app.
What are you currently working on that you’re excited about?
My re:Invent sessions are definitely something that I’m excited about. Otherwise, I spend most of my time talking to customers about AWS Cryptography services such as AWS KMS, AWS Secrets Manager, and AWS Private Certificate Authority. I also lead a program at AWS that enables our subject matter experts to create and publish videos to demonstrate new features of AWS Security Services. I like helping people create videos, and I hope that our videos provide another mechanism for viewers who prefer information in a video format. Visual media can be more inclusive for customers with certain disabilities or for neurodiverse customers who find it challenging to focus on written text. Plus, you can consume videos differently than a blog post or text documentation. If you don’t have the time or desire to read a blog post or AWS public doc, you can listen to an instructional video while you work on other tasks, eat lunch, or take a break. I invite folks to check out the AWS Security Services Features Demo YouTube video playlist.
Is there something you wish customers would ask you about more often?
I always appreciate when customers provide candid feedback on our services. AWS is a customer-obsessed company, and we build our service roadmaps based on what our customers tell us they need. You should feel comfortable letting AWS know when something could be easier, more efficient, or less expensive. Many customers I’ve worked with have provided actionable feedback on our services and influenced service roadmaps, just by speaking up and sharing their experiences.
How about outside of work, any hobbies?
I have two toddlers that keep me pretty busy, so most of my hobbies are what they like to do. So I tend to spend a lot of time building elaborate toy train tracks, pushing my kids on the swings, and pretending to eat wooden toy food that they “cook” for me. Outside of that, I read a lot of fiction and indulge in binge-worthy TV.
If you have feedback about this post, submit comments in the Comments section below.
Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.
AWS re:Invent 2023 is fast approaching, and we can’t wait to see you in Las Vegas in November. re:Invent offers you the chance to come together with cloud enthusiasts from around the world to hear the latest cloud industry innovations, meet with Amazon Web Services (AWS) experts, and build connections. This post will highlight key security sessions organized by various themes, so you don’t miss any of the newest and most exciting tech innovations and the sessions where you can learn how to put those innovations into practice.
re:Invent offers a diverse range of content tailored to all personas. Seminar-style content includes breakout sessions and innovation talks, delivered by AWS thought leaders. These are curated to focus on topics most critical to our customers’ businesses and spotlight advancements AWS has enabled for them. For more interactive or hands-on content, check out chalk talks, dev chats, builder sessions, workshops, and code talks.
If you plan to attend re:Invent 2023, and you’re interested in connecting with a security, identity, or compliance product team, reach out to your AWS account team.
Sessions for security leaders
Security leaders are always reinventing, tasked with aligning security goals to business objectives and reducing overall risk to the organization. Attend sessions at re:Invent where you can learn from security leadership and thought leaders on how to empower your teams, build sustainable security culture, and move fast and stay secure in an ever-evolving threat landscape.
INNOVATION TALK
SEC237-INT | Move fast, stay secure: Strategies for the future of security
BREAKOUT SESSIONS
SEC211 | Sustainable security culture: Empower builders for success
SEC216 | The AWS Digital Sovereignty Pledge: Control without compromise
SEC219 | Build secure applications on AWS the well-architected way
SEC236 | The AWS data-driven perspective on threat landscape trends
NET201 | Safeguarding infrastructure from DDoS attacks with AWS edge services
The role of generative AI in security
The swift rise of generative artificial intelligence (generative AI) illustrates the need for security practices to quickly adapt to meet evolving business requirements and drive innovation. In addition to the security Innovation Talk (Move fast, stay secure: Strategies for the future of security), attend sessions where you can learn about how large language models can impact security practices, how security teams can support safer use of this technology in the business, and how generative AI can help organizations move security forward.
BREAKOUT SESSIONS
SEC210 | How security teams can strengthen security using generative AI
SEC214 | Threat modeling your generative AI workload to evaluate security risk
CHALK TALKS
SEC317 | Building secure generative AI applications on AWS
OPN201 | Evolving OSPOs for supply chain security and generative AI
AIM352 | Securely build generative AI apps and control data with Amazon Bedrock
DEV CHAT
COM309 | Shaping the future of security on AWS with generative AI
Architecting and operating container workloads securely
The unique perspectives that drive how system builders and security teams perceive and address system security can present both benefits and obstacles to collaboration within a business. Find out more about how you can bolster your container security through sessions focused on best practices, detecting and patching threats and vulnerabilities in containerized environments, and managing risk across your AWS container workloads.
BREAKOUT SESSIONS
CON325 | Securing containerized workloads on Amazon ECS and AWS Fargate
CON320 | Building for the future with AWS serverless services
CHALK TALKS
SEC332 | Comprehensive vulnerability management across your AWS environments
FSI307 | Best practices for securing containers and being compliant
CON334 | Strategies and best practices for securing containerized environments
WORKSHOP
SEC303 | Container threat detection with AWS security services
BUILDER SESSION
SEC330 | Patch it up: Building a vulnerability management solution
Zero Trust
At AWS, we consider Zero Trust a security model—not a product. Zero Trust requires users and systems to strongly prove their identities and trustworthiness, and enforces fine-grained identity-based authorization rules before allowing access to applications, data, and other systems. It expands authorization decisions to consider factors like the entity’s current state and the environment. Learn more about our approach to Zero Trust in these sessions.
INNOVATION TALK
SEC237-INT | Move fast, stay secure: Strategies for the future of security
CHALK TALKS
WPS304 | Using Zero Trust to reduce security risk for the public sector
OPN308 | Build and operate a Zero Trust Apache Kafka cluster
NET312 | Connecting and securing services with Amazon VPC Lattice
NET315 | Building Zero Trust architectures using AWS Verified Access
WORKSHOPS
SEC302 | Zero Trust architecture for service-to-service workloads
Managing identities and encrypting data
At AWS, security is our top priority. AWS provides you with features and controls to encrypt data at rest, in transit, and in memory. We build features into our services that make it easier to encrypt your data and control user and application access to data. Explore these topics in depth during these sessions.
BREAKOUT SESSIONS
SEC209 | Modernize authorization: Lessons from cryptography and authentication
SEC336 | Spur productivity with options for identity and access
SEC333 | Better together: Using encryption & authorization for data protection
CHALK TALKS
SEC221 | Centrally manage application secrets with AWS Secrets Manager
SEC322 | Integrate apps with Amazon Cognito and Amazon Verified Permissions
SEC223 | Optimize your workforce identity strategy from top to bottom
WORKSHOPS
SEC247 | Practical data protection and risk assessment for sensitive workloads
For a full view of security content, including hands-on learning and interactive sessions, explore the AWS re:Invent catalog and under Topic, filter on Security, Compliance, & Identity. Not able to attend in-person? Livestream keynotes and leadership sessions for free by registering for the virtual-only pass!
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
Join us on Thursday, October 5, 2023, for a free-to-attend online event, Data and Generative AI Day. AWS will stream the event simultaneously across multiple platforms, including LinkedIn Live and YouTube.
In the realm of generative AI, the power and potential hidden within your organization’s data are more expansive than ever before. Generative AI has the capability to reshape customer interactions, elevate employee productivity, stimulate creative ideation, and drive groundbreaking innovation. However, as a forward-thinking leader, what steps are required to fully harness this data-driven potential and translate it into tangible outcomes?
During this half-day event, AWS experts, partners, customers, and leading startups will provide you with insights into their efforts to propel innovation using data and generative AI within today’s ever-evolving landscape. You will find practical guidance from industry leaders on how to navigate the diverse spectrum of opportunities and challenges presented by this transformative technology, all while gaining a glimpse into what the future holds.
Here are some of the highlights you can expect from this event.
Swami Sivasubramanian, VP, Database, Analytics, and ML at AWS, will kick off the event with a keynote session where he will share the blueprint to democratize data and AI for business leaders. Swami will share how leaders can usher in the right mindset, strategy, and tools to translate the promise of generative AI into real business value.
Tom Godden, Director of Enterprise Strategy at AWS, will explore practical strategies for leveraging generative AI to drive business outcomes. He will share the frameworks for identifying opportunities to pilot generative AI across your organization and provide business leaders with a timely understanding of how to employ these powerful emerging capabilities.
Gopinath Sankaran, Vice President, Strategic Cloud Ecosystems at Informatica, will share insights on the impact of generative AI on data management and explore how Informatica’s AI-powered Intelligent Data Management Cloud and AWS AI and Analytics services can power a new wave of insights and experiences.
Diego Saenz, Managing Director of Data & AI at Deloitte, and Jojy Matthew, Principal, Global Financial Services Industry (GFSI) Data, Analytics, & AI at Deloitte, will share what a well-crafted data strategy means to generative AI success. Diego will share practical advice on assessing if your data estate is ready for leveraging generative AI and driving business outcomes.
You will hear from AWS leaders and AWS customers FOX, Salesforce, and Booking.com, as they share their data and generative AI journeys and explain how you can leverage this transformational technology to re-imagine your customer and employee experiences.
You can add an event reminder to your calendar by registering on the event page.
Last week I attended the AWS Summit Johannesburg. This was the first summit to be hosted in my own country and my own city since 2019 so it was very special to have the opportunity to attend. It was great to get to meet with so many of our customers and hear how they are building on AWS.
Now on to the AWS updates. I’ve compiled a few announcements and upcoming events you need to know about. Let’s get started!
Last Week’s Launches
Amazon Bedrock Is Now Generally Available – Amazon Bedrock was announced in preview in April of this year as part of a set of new tools for building with generative AI on AWS. Last week’s announcement of this service being generally available was received with a lot of excitement and customers have already been sharing what they are building with Amazon Bedrock. I quite enjoyed this lighthearted post from AWS Serverless Hero Jones Zachariah Noel about the “Bengaluru with traffic-filled roads” image he produced using Stability AI’s Stable Diffusion XL image generation model on Amazon Bedrock.
Amazon MSK Introduces Managed Data Delivery from Apache Kafka to Your Data Lake – Amazon MSK was released in 2019 to help our customers reduce the work needed to set up, scale, and manage Apache Kafka in production. Now you can continuously load data from an Apache Kafka cluster to Amazon Simple Storage Service (Amazon S3).
Other AWS News
A few more news items and blog posts you might have missed:
For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.
Upcoming AWS Events
We have the following upcoming events:
AWS Cloud Days (October 10 and 24) – Connect and collaborate with other like-minded folks while learning about AWS at the AWS Cloud Days in Athens and Prague.
AWS Innovate Online (October 19) – Register for AWS Innovate Online to learn how you can build, run, and scale next-generation applications on the most extensive cloud platform. There will be 80+ sessions delivered in five languages and you’ll receive a certificate of attendance to showcase all you’ve learned.
We’re focused on improving our content to provide a better customer experience, and we need your feedback to do so. Take this quick survey to share insights on your experience with the AWS Blog. Note that this survey is hosted by an external company, so the link doesn’t lead to our website. AWS handles your information as described in the AWS Privacy Notice.
Adapting to a complex landscape shaped by return-to-office mandates, pressure to migrate out of self-operated data centers, escalating security concerns, scarcity of in-house IT expertise, and constant focus on controlling expenses creates numerous challenges for IT teams responsible for providing the tools employees need to do their jobs.
AWS End User Computing Innovation Day 2023 is a one-day, free virtual event designed to dissect these very challenges. Join us as we delve into how AWS End User Computing (EUC) services can be harnessed to navigate this transformative era. Discover how to construct a remarkably agile and secure foundation poised to support the immediate and future requirements of remote and hybrid workforces.
During this event, you will have the opportunity to hear directly from senior leaders at AWS. Here are some of the highlights you can expect from this event.
Keynote – Muneer Mirza, General Manager of AWS End User Computing, will kick off with a keynote session. Muneer will explore an array of strategic approaches primed to maximize agility and foster seamless adaptation to change.
Browser-based workload security – Brett Taylor, General Manager of Amazon WorkSpaces Web, will discuss ways to secure web-based applications using Amazon WorkSpaces Web, so you can strengthen your security and compliance posture.
You can add an event reminder to your calendar by registering on the event page.
Data warehousing provides a business with several benefits, such as advanced business intelligence and data consistency. It plays a big role within an organization by helping to make the right strategic decisions at the right moment, which can have a huge impact in a competitive market. One of the major and essential parts of a data warehouse is the extract, transform, and load (ETL) process, which extracts data from different sources, applies business rules and aggregations, and then makes the transformed data available to business users.
This process is always evolving to reflect new business and technical requirements, especially when working in a demanding market. Nowadays, more verification steps are applied to source data before processing it, which often adds administrative overhead. Hence, automatic notifications are increasingly required in order to accelerate data ingestion, facilitate monitoring, and provide accurate tracking of the process.
Amazon Redshift is a fast, fully managed, cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. It also helps you to securely access your data in operational databases, data lakes or third-party datasets with minimal movement or copying. AWS Step Functions is a fully managed service that gives you the ability to orchestrate and coordinate service components. Amazon S3 Event Notifications is an Amazon S3 feature that you can enable in order to receive notifications when specific events occur in your S3 bucket.
In this post, we discuss how to build and orchestrate, in a few steps, an ETL process for Amazon Redshift using Amazon S3 Event Notifications for automatic verification of source data upon arrival, with notifications in specific cases. We also show how to use AWS Step Functions to orchestrate the data pipeline. It can be considered a starting point for teams within organizations willing to create and build an event-driven data pipeline from data source to data warehouse, one that helps track each phase and respond to failures quickly. Alternatively, you can also use Amazon Redshift auto-copy from Amazon S3 to simplify data loading from Amazon S3 into Amazon Redshift.
Solution overview
The workflow is composed of the following steps:
A Lambda function is triggered by an S3 event whenever a source file arrives at the S3 bucket. It does the necessary verifications and then classifies the file before processing by sending it to the appropriate Amazon S3 prefix (accepted or rejected).
There are two possibilities:
If the file is moved to the rejected Amazon S3 prefix, an Amazon S3 event sends a message to Amazon SNS for further notification.
If the file is moved to the accepted Amazon S3 prefix, an Amazon S3 event is triggered and sends a message with the file path to Amazon SQS.
An Amazon EventBridge scheduled event triggers the AWS Step Functions workflow.
The workflow executes a Lambda function that pulls out the messages from the Amazon SQS queue and generates a manifest file for the COPY command.
Once the manifest file is generated, the workflow starts the ETL process using a stored procedure.
The following image shows the workflow.
Prerequisites
Before configuring the previous solution, you can use the following AWS CloudFormation template to set up and create the infrastructure.
Give the stack a name, select a deployment VPC and define the master user for the Amazon Redshift cluster by filling in the two parameters MasterUserName and MasterUserPassword.
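If you prefer to deploy the template from code rather than the console, a minimal boto3 sketch follows. The template URL, stack name, and VPC parameter name are placeholders; MasterUserName and MasterUserPassword match the parameters described above.

```python
import boto3

cfn = boto3.client("cloudformation")

# Sketch: deploy the blog demo stack. The TemplateURL and the VPC parameter
# name are placeholders for this post.
cfn.create_stack(
    StackName="blogdemo",
    TemplateURL="https://example-bucket.s3.amazonaws.com/blogdemo-template.yaml",  # placeholder
    Parameters=[
        {"ParameterKey": "MasterUserName", "ParameterValue": "awsuser"},
        {"ParameterKey": "MasterUserPassword", "ParameterValue": "MySecurePassw0rd"},
        {"ParameterKey": "VpcId", "ParameterValue": "vpc-0123456789abcdef0"},  # assumed parameter name
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the template creates IAM roles and policies
)
```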
The template will create the following services:
An S3 bucket
An Amazon Redshift cluster composed of two ra3.xlplus nodes
An empty AWS Step Functions workflow
An Amazon SQS queue
An Amazon SNS topic
An Amazon EventBridge scheduled rule with a 5-minute rate
Two empty AWS Lambda functions
IAM roles and policies for the services to communicate with each other
The names of the created services are usually prefixed by the stack’s name or the word blogdemo. You can find the names of the created services in the stack’s resources tab.
Step 1: Configure Amazon S3 Event Notifications
Create the following four folders in the S3 bucket:
received
rejected
accepted
manifest
In this scenario, we will create the following three Amazon S3 event notifications:
Trigger an AWS Lambda function on the received folder.
Send a message to the Amazon SNS topic on the rejected folder.
Send a message to Amazon SQS on the accepted folder.
To create an Amazon S3 event notification:
Go to the bucket’s Properties tab.
In the Event Notifications section, select Create Event Notification.
Fill in the necessary properties:
Give the event a name.
Specify the appropriate prefix or folder (accepted/, rejected/ or received/).
Select All object create events as an event type.
Choose the destination (AWS Lambda, Amazon SNS, or Amazon SQS). Note: for an AWS Lambda destination, choose the function that starts with ${stackname}-blogdemoVerify_%.
At the end, you should have three Amazon S3 events:
An event for the received prefix with an AWS Lambda function as a destination type.
An event for the accepted prefix with an Amazon SQS queue as a destination type.
An event for the rejected prefix with an Amazon SNS topic as a destination type.
The following image shows what you should have after creating the three Amazon S3 events:
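If you prefer to script this configuration instead of using the console, a boto3 sketch of the equivalent notification configuration might look like the following. The bucket name and the Lambda, SQS, and SNS ARNs are placeholders; replace them with the resources created by the CloudFormation stack, which must already allow Amazon S3 to invoke or publish to them.

```python
import boto3

s3 = boto3.client("s3")

def prefix_filter(prefix):
    """Build an S3 notification filter that matches a key prefix."""
    return {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}}

# Placeholder ARNs and bucket name for this sketch.
s3.put_bucket_notification_configuration(
    Bucket="blogdemo-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:eu-west-1:111122223333:function:blogdemoLambdaVerify",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": prefix_filter("received/"),
            }
        ],
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:eu-west-1:111122223333:blogdemoSQS",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": prefix_filter("accepted/"),
            }
        ],
        "TopicConfigurations": [
            {
                "TopicArn": "arn:aws:sns:eu-west-1:111122223333:blogdemoSNS",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": prefix_filter("rejected/"),
            }
        ],
    },
)
```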
Step 2: Create objects in Amazon Redshift
Connect to the Amazon Redshift cluster and create the following objects:
Three schemas:
create schema blogdemo_staging; -- for staging tables
create schema blogdemo_core; -- for target tables
create schema blogdemo_proc; -- for stored procedures
A table in the blogdemo_staging and blogdemo_core schemas:
create table ${schemaname}.rideshare
(
id_ride bigint not null,
date_ride timestamp not null,
country varchar (20),
city varchar (20),
distance_km smallint,
price decimal (5,2),
feedback varchar (10)
) distkey(id_ride);
A stored procedure to extract and load data into the target schema:
create or replace procedure blogdemo_proc.elt_rideshare (bucketname in varchar(200),manifestfile in varchar (500))
as $$
begin
-- purge staging table
truncate blogdemo_staging.rideshare;
-- copy data from s3 bucket to staging schema
execute 'copy blogdemo_staging.rideshare from ''s3://' || bucketname || '/' || manifestfile || ''' iam_role default delimiter ''|'' manifest;';
-- apply transformation rules here
-- insert data into target table
insert into blogdemo_core.rideshare
select * from blogdemo_staging.rideshare;
end;
$$ language plpgsql;
Set the role ${stackname}-blogdemoRoleRedshift_% as a default role:
In the Amazon Redshift console, go to clusters and click on the cluster blogdemoRedshift%.
Go to the Properties tab.
In the Cluster permissions section, select the role ${stackname}-blogdemoRoleRedshift%.
Click on Set default then Make default.
Step 3: Configure Amazon SQS queue
The Amazon SQS queue can be used as it is; this means with the default values. The only thing you need to do for this demo is to go to the created queue ${stackname}-blogdemoSQS% and purge the test messages generated (if any) by the Amazon S3 event configuration. Copy its URL in a text file for further use (more precisely, in one of the AWS Lambda functions).
Step 4: Setup Amazon SNS topic
In the Amazon SNS console, go to the topic ${stackname}-blogdemoSNS%
Click on the Create subscription button.
Choose the blogdemo topic ARN, email protocol, type your email and then click on Create subscription.
Confirm the subscription in the email that you receive.
Step 5: Customize the AWS Lambda functions
The following code verifies the name of a file. If it respects the naming convention, it will move it to the accepted folder. If it does not respect the naming convention, it will move it to the rejected one. Copy it to the AWS Lambda function ${stackname}-blogdemoLambdaVerify and then deploy it:
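The original function code isn’t reproduced in this excerpt; the following is a minimal sketch of what it might look like. The naming convention used here (a lowercase ridesharedata name followed by an eight-digit date) is an assumption for illustration, chosen so that a file such as ridesharedata2022.csv would be rejected.

```python
import re
import urllib.parse

import boto3

s3 = boto3.client("s3")

# Hypothetical naming convention for this sketch: ridesharedataYYYYMMDD.csv
VALID_NAME = re.compile(r"^ridesharedata\d{8}\.csv$")

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        filename = key.split("/")[-1]

        # Classify the file: accepted if it matches the convention, rejected otherwise
        target_prefix = "accepted/" if VALID_NAME.match(filename) else "rejected/"

        # Move the file from received/ to the appropriate prefix
        s3.copy_object(
            Bucket=bucket,
            CopySource={"Bucket": bucket, "Key": key},
            Key=target_prefix + filename,
        )
        s3.delete_object(Bucket=bucket, Key=key)
```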
The second AWS Lambda function ${stackname}-blogdemoLambdaGenerate% retrieves the messages from the Amazon SQS queue, then generates a manifest file and stores it in the manifest folder of the S3 bucket. Copy the following content, replace the variable ${sqs_url} with the value you saved in Step 3, and then click Deploy.
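Again, the original code isn’t reproduced here; a minimal sketch of that second function might look like the following, with ${sqs_url}, the bucket name, and the manifest key pattern treated as placeholders.

```python
import json
import time

import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")

SQS_URL = "${sqs_url}"          # replace with the queue URL copied in Step 3
BUCKET = "blogdemo-bucket"      # placeholder bucket name

def lambda_handler(event, context):
    entries = []
    # Drain the queue of pending S3 event messages
    while True:
        response = sqs.receive_message(QueueUrl=SQS_URL, MaxNumberOfMessages=10)
        messages = response.get("Messages", [])
        if not messages:
            break
        for message in messages:
            body = json.loads(message["Body"])
            for record in body.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                key = record["s3"]["object"]["key"]
                entries.append({"url": f"s3://{bucket}/{key}", "mandatory": True})
            sqs.delete_message(QueueUrl=SQS_URL, ReceiptHandle=message["ReceiptHandle"])

    if not entries:
        return {"manifestfile": ""}

    # Write a Redshift COPY manifest file to the manifest/ prefix
    manifest_key = f"manifest/rideshare-{int(time.time())}.manifest"
    s3.put_object(
        Bucket=BUCKET,
        Key=manifest_key,
        Body=json.dumps({"entries": entries}),
    )
    return {"manifestfile": manifest_key}
```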
Step 6: Add tasks to the AWS Step Functions workflow
Create the following workflow in the state machine ${stackname}-blogdemoStepFunctions%.
If you would like to accelerate this step, you can drag and drop the content of the following JSON file in the definition part when you click on Edit. Make sure to replace the three variables:
${GenerateManifestFileFunctionName} by the ${stackname}-blogdemoLambdaGenerate% arn.
${RedshiftClusterIdentifier} by the Amazon Redshift cluster identifier.
${MasterUserName} by the username that you defined while deploying the CloudFormation template.
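The JSON definition itself isn’t included in this excerpt, so purely as an illustration of its shape, the following boto3 sketch updates the state machine with a minimal two-step definition. Every ARN, identifier, and SQL argument below is a placeholder, and a real definition would pass the generated manifest path between the states rather than hard-coding it.

```python
import json

import boto3

sfn = boto3.client("stepfunctions")

# Minimal two-step workflow sketch: generate the manifest with Lambda, then
# run the ELT stored procedure through the Redshift Data API integration.
definition = {
    "StartAt": "GenerateManifestFile",
    "States": {
        "GenerateManifestFile": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:eu-west-1:111122223333:function:blogdemoLambdaGenerate",  # placeholder
            "Next": "RunEltStoredProcedure",
        },
        "RunEltStoredProcedure": {
            "Type": "Task",
            "Resource": "arn:aws:states:::aws-sdk:redshiftdata:executeStatement",
            "Parameters": {
                "ClusterIdentifier": "blogdemoredshift",  # placeholder cluster identifier
                "Database": "dev",
                "DbUser": "awsuser",  # the MasterUserName defined at deployment
                # In a real definition, build this SQL string from the manifest
                # path returned by the previous state instead of hard-coding it.
                "Sql": "call blogdemo_proc.elt_rideshare('blogdemo-bucket', 'manifest/rideshare.manifest');",
            },
            "End": True,
        },
    },
}

sfn.update_state_machine(
    stateMachineArn="arn:aws:states:eu-west-1:111122223333:stateMachine:blogdemoStepFunctions",  # placeholder
    definition=json.dumps(definition),
)
```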
Step 7: Enable Amazon EventBridge rule
Enable the rule and add the AWS Step Functions workflow as a rule target:
Go to the Amazon EventBridge console.
Select the rule created by the Amazon CloudFormation template and click on Edit.
Enable the rule and click Next.
You can change the rate if you want. Then select Next.
Add the AWS Step Functions state machine created by the CloudFormation template blogdemoStepFunctions% as a target and use an existing role created by the CloudFormation template ${stackname}-blogdemoRoleEventBridge%
Click on Next and then Update rule.
Test the solution
In order to test the solution, the only thing you should do is upload some csv files in the received prefix of the S3 bucket. Here are some sample data; each file contains 1000 rows of rideshare data.
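For example, assuming the files sit in your working directory, a small upload sketch might look like this. All file names other than ridesharedata2022.csv are hypothetical, and the bucket name is a placeholder.

```python
import boto3

s3 = boto3.client("s3")

bucket = "blogdemo-bucket"  # placeholder: use the bucket created by the stack
files = [
    "ridesharedata20220501.csv",  # hypothetical valid file names
    "ridesharedata20220502.csv",
    "ridesharedata20220503.csv",
    "ridesharedata2022.csv",      # file name that breaks the naming convention
]

for name in files:
    # Upload into the received/ prefix so the verification Lambda is triggered
    s3.upload_file(name, bucket, f"received/{name}")
```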
If you upload all of them, you should receive an email, because ridesharedata2022.csv does not respect the naming convention. The other three files will be loaded into the target table blogdemo_core.rideshare. You can check the Step Functions workflow to verify that the process finished successfully.
Clean up
Go to the Amazon EventBridge console and delete the rule ${stackname}-blogdemoevenbridge%.
In the Amazon S3 console, select the bucket created by the CloudFormation template ${stackname}-blogdemobucket% and click on Empty.
Go to Subscriptions in the Amazon SNS console and delete the subscription created in Step 4.
In the AWS CloudFormation console, select the stack and delete it.
Conclusion
In this post, we showed how different AWS services can be easily implemented together in order to create an event-driven architecture and automate its data pipeline, which targets the cloud data warehouse Amazon Redshift for business intelligence applications and complex queries.
About the Author
Ziad WALI is an Acceleration Lab Solutions Architect at Amazon Web Services. He has over 10 years of experience in databases and data warehousing where he enjoys building reliable, scalable and efficient solutions. Outside of work, he enjoys sports and spending time in nature.
In his keynote at AWS re:Invent 2021, Dr. Werner Vogels shared the insight of how “the everywhere cloud” is bringing AWS to new locales through AWS hardware and services, and spotlighted it as one of his tech predictions for 2022 and beyond in his blog post.
“What we will see in 2022, and even more so in the years to come, is the cloud accelerating beyond the traditional centralized infrastructure model and into unexpected environments where specialized technology is needed. The cloud will be in your car, your tea kettle, and your TV. The cloud will be in everything from trucks driving down the road, to the ships and planes that transport goods. The cloud will be globally distributed, and connected to almost any digital device or system on Earth, and even in space.”
AWS provides a truly consistent and secure experience to build and run applications across the continuum of environments where customers operate—from the cloud to large metro areas, 5G networks, on-premises locations, and to mobile and Internet of Things (IoT) devices.
To learn more, join us for AWS Hybrid Cloud & Edge Day, a free-to-attend one-day virtual event on August 30, 2023, starting at 10:00 AM PDT (1:00 PM ET). We will stream the event simultaneously across multiple platforms, including LinkedIn Live, Twitter, YouTube, and Twitch.
You can hear from AWS leaders and industry analysts on the latest hybrid cloud and edge computing trends and emerging technologies and learn best practices for using AWS hybrid cloud and edge services across the cloud continuum. Also, learn from our customers on data strategies and key use cases and gain a deeper understanding of AWS hybrid cloud and edge services and new features and benefits.
Here are some of the highlights you can expect from this event:
Leadership session – To kick off the day, we have a leadership session featuring Jan Hofmeyr, vice president of EC2 Edge, sharing insights into how customers are building high-performance, intelligent applications with recently announced AWS hybrid cloud, edge, and IoT capabilities. Elias Khnaser, chief of research at EK Media Group, will join Jan to discuss the global, business, and economic trends impacting hybrid cloud and edge computing and discuss the customer requirements and use cases.
Cloud-closer sessions – We’ll discuss how AWS is bringing the cloud closer to metro areas and telco networks. Services such as AWS Local Zones, the AWS Outposts family, and AWS Wavelength bring the power of cloud compute and storage to the edge of 5G networks, unlocking more performant mobile experiences. We’ll highlight new and innovative use cases from customers, including Norton LifeLock, Electronic Arts, and Epic Games, who have taken advantage of the operational consistency between AWS Regions and the edge. You can also learn how to deploy in hybrid cloud scenarios in on-premises locations, with examples from MindBody and ElToro through Onica, and more customer cases.
On-premises sessions – Learn about our options to bring the AWS Cloud to your data centers and on-premises locations for a truly consistent experience across your environments. We will review real-world examples of how AWS hybrid and edge services enable local processing of data for faster response times and faster decision-making. We will also share how Toyota takes advantage of hybrid options from Amazon ECS and Amazon EKS to use familiar management tools across its environments and successfully modernize its applications. You can also learn how to effectively meet on-premises regulatory requirements, with real-world scenarios covering critical aspects of digital sovereignty and data residency.
Rugged edge sessions – You will learn about AWS services that support the rugged, mobile, and disconnected edge, such as the AWS Snow Family, which enables organizations to deploy compute workloads in locations with denied, disrupted, intermittent, and limited (DDIL) connectivity. Learn how DDR.Live deployed their own 4G/LTE or 5G private network using AWS Private 5G for live events in places with limited wireless connectivity. We will discuss the top use cases, such as deploying a pre-trained object detection model and architecting applications at the edge. Finally, we will discuss the benefits and requirements of operating at the edge with Holger Mueller, vice president and principal analyst, Constellation Research, Inc.
IoT panel discussion – A panel of AWS IoT customers and industry experts will discuss their innovation journeys. Join us to see how EuroTech brought to market a set of devices and services that improve operational efficiencies with connectivity at the edge. You’ll also hear how Wallbox, an electric vehicle charging company, reduced their operational costs and scaled efficiently with AWS IoT services.
Multicloud sessions – AWS has the tools to help you run and support your multicloud operations in the areas of governance, ops management, observability, and more. We will discuss common challenges in hybrid and multicloud environments and how AWS helps you manage, operate, and automate your processes. We’ll also talk about how Rackspace used AWS Systems Manager for instance patching across hybrid and multicloud environments, automating their infrastructure management across cloud providers.
This event is for any customer and builder who is eager to learn more about hybrid cloud, edge computing, IoT, networking, content delivery, and 5G. We’ll cover how you can support applications that need to remain on premises or at the edge due to low latency, local data processing, or data residency requirements.
To learn more details, see the event schedule, and register for AWS Hybrid Cloud & Edge Day, go to the event page.
Amazon EventBridge Scheduler now supports configuring automatic deletion of schedules after completion. Now you can configure one-time and recurring schedules with an end date to be automatically deleted upon completion to avoid managing individual schedules.
Amazon EventBridge Scheduler allows you to create, run, and manage schedules at scale. Using EventBridge Scheduler, you can schedule millions of tasks that invoke over 270 AWS services and over 6,000 API operations, such as AWS Lambda, AWS Step Functions, and Amazon SNS.
By default, EventBridge Scheduler allows customers to have 1 million schedules per account, which can be increased as needed. However, completed schedules count toward the account quota, they remain visible when listing schedules, and customers must remove them themselves. Some customers have created their own patterns to automatically remove completed schedules, and since the EventBridge Scheduler announcement last November, this has been one of the most requested features.
Deleting after completion
When you configure automatic deletion for a schedule, EventBridge Scheduler deletes the schedule shortly after its last target invocation. You can set up automatic deletion when you create the schedule, or you can update the schedule settings at any point before its last invocation.
You can configure this setting in one-time and recurring schedules.
One-time schedules: your schedule is deleted after the schedule has invoked its target once.
Recurring schedules: set with rate or cron expressions, your schedule is deleted after the last invocation.
If all retries are exhausted because of failure for a schedule configured with automatic deletion, the schedule is deleted shortly after the last unsuccessful attempt.
With this new capability, you can save time, resources, and operational costs when managing your schedules.
Setting up schedules to delete after completion
You can create schedules that are automatically deleted after completion from the AWS Management Console, AWS SDK, or AWS CLI in all AWS Regions where EventBridge Scheduler is available.
For example, imagine that you are a developer on a platform that allows end users to receive notifications when a task is due. You are already using EventBridge Scheduler to implement this feature. For every task that your users create in your application, your code creates a new schedule in EventBridge Scheduler. You can now configure all these schedules to be deleted automatically after completion. And shortly after the schedules run, they are removed from your EventBridge Scheduler, allowing you to scale your system and keep on creating schedules, making it easier to manage your active schedules and quota limits.
Let’s see how you can implement this example with the new capability of EventBridge Scheduler. When a user creates a new task with a reminder, a function is triggered from your application. That function creates a one-time schedule in EventBridge Scheduler.
This example shows how you can create a new one-time schedule that is automatically deleted after completion, using the AWS CLI with Amazon SNS as the target. Make sure that you update the AWS CLI to the latest version; then you can create a new schedule with the parameter --action-after-completion 'DELETE'.
The command creates a one-time schedule named SendEmailOnce that runs at a specific date, defined in the schedule-expression, and in a specific time zone, defined in the schedule-expression-timezone. This schedule does not use the flexible time window feature. Next, you must define the target for this schedule; this one sends a message to an SNS topic.
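The CLI command itself isn’t reproduced in this excerpt; as an equivalent sketch, here is how the same schedule could be created with the AWS SDK for Python (boto3). The run date, time zone, topic ARN, and execution role ARN are placeholders.

```python
import boto3

scheduler = boto3.client("scheduler")

# Sketch: a one-time schedule that sends an SNS message and then deletes itself.
# The date, time zone, topic ARN, and execution role ARN are placeholders.
scheduler.create_schedule(
    Name="SendEmailOnce",
    ScheduleExpression="at(2024-01-15T09:00:00)",
    ScheduleExpressionTimezone="America/New_York",
    FlexibleTimeWindow={"Mode": "OFF"},
    ActionAfterCompletion="DELETE",  # delete the schedule after its last invocation
    Target={
        "Arn": "arn:aws:sns:us-east-1:111122223333:send-email-topic",
        "RoleArn": "arn:aws:iam::111122223333:role/EventBridgeSchedulerSNSRole",
        "Input": "Your task is due today.",
    },
)
```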
You can validate that your schedule is created correctly from the AWS CLI with the get-schedule command.
In addition, you can see the details of the schedule from the AWS Management Console.
Now when the date of the notification arrives, EventBridge Scheduler invokes the target configured in the schedule, in this case SNS, and emails a notification to the customer.
Shortly after this schedule is completed, if you list the schedules, you see that the schedule was deleted and it is no longer listed.
Traditionally, many problems that EventBridge Scheduler solves were addressed using batch processes and pull-based models.
Some organizations are using EventBridge Scheduler to replace their pull-based models for a more dynamic push-based model. Before implementing Scheduler, they were relying on the customer to ask for the data when they need it. Now, with EventBridge Scheduler, they are creating schedules to report back to their customers at critical times of their journey.
For example, an airline can use EventBridge Scheduler to create one-time schedules 24 hours, 4 hours, and 2 hours before the flight, to keep their passengers up to date with the flight status. Customers receive a notification with the link for the online check-in, the check-in counter number, baggage pick up information, and any flight changes that occur. In this way, passengers are always up to date on their flight status and they can take immediate action. This dynamic model not only helps to improve the customer experience but also improves the operational efficiency for the airline.
Other organizations use EventBridge Scheduler to replace batch operations, as you can configure a schedule that starts a batch process at the time of the day you need. Also, you can take advantage of EventBridge Scheduler time zones and run the processes at the time that make sense for your end customer.
For example, consider an international financial institution that must send customers a statement of their account at the end of the day. You can use EventBridge Scheduler to set up a recurrent schedule for each of your customers that sends a report at the end of the day of your customers’ time zone. In this way, you can improve the customer experience as now the system is personalized for their settings, and also reduce operational overhead, as the processing operations are distributed throughout the day.
In addition, EventBridge Scheduler solves many new use cases for customers. For example, if you are a financial institution that handles payments, you can create a one-time schedule for every large transaction that needs a confirmation. If the transaction is not confirmed when the schedule runs, you can cancel the transaction. This decreases the risk of handling transactions, improves the customer experience, and also improves the automation of your processes by making them real time.
Another use case is to handle credit card expiration dates. You can create a one-time schedule that emails the customer to update their credit card information one month before the expiration date. This solution removes operational overhead compared to the traditional implementation of using servers and batch processes.
Conclusion
In the use cases listed above, automation and task scheduling improve the end-user experience, remove undifferentiated heavy lifting, and benefit from the new capability of removing schedules after their completion.
This blog post introduces the new capability from Amazon EventBridge Scheduler that automatically deletes the completed schedules. This feature simplifies the use of EventBridge Scheduler, reduces the operational overhead of managing schedules at scale, and allows you to scale even further.
To get started with EventBridge Scheduler, visit Serverless Land patterns where you can find over 20 patterns using this service.
In March 2023, AWS and NVIDIA announced a multipart collaboration focused on building the most scalable, on-demand artificial intelligence (AI) infrastructure optimized for training increasingly complex large language models (LLMs) and developing generative AI applications.
We preannounced Amazon Elastic Compute Cloud (Amazon EC2) P5 instances powered by NVIDIA H100 Tensor Core GPUs and AWS’s latest networking and scalability, which will deliver up to 20 exaflops of compute performance for building and training the largest machine learning (ML) models. This announcement is the product of more than a decade of collaboration between AWS and NVIDIA, delivering visual computing, AI, and high performance computing (HPC) clusters across the Cluster GPU (cg1) instances (2010), G2 (2013), P2 (2016), P3 (2017), G3 (2017), P3dn (2018), G4 (2019), P4 (2020), G5 (2021), and P4de instances (2022).
Most notably, ML model sizes are now reaching trillions of parameters. This complexity has increased customers’ time to train: the latest LLMs are now trained over the course of multiple months. HPC customers exhibit similar trends. With the fidelity of HPC data collection increasing and data sets reaching exabyte scale, customers are looking for ways to achieve faster time to solution across increasingly complex applications.
Introducing EC2 P5 Instances Today, we are announcing the general availability of Amazon EC2 P5 instances, the next-generation GPU instances to address those customer needs for high performance and scalability in AI/ML and HPC workloads. P5 instances are powered by the latest NVIDIA H100 Tensor Core GPUs and will provide a reduction of up to 6 times in training time (from days to hours) compared to previous generation GPU-based instances. This performance increase will enable customers to see up to 40 percent lower training costs.
P5 instances provide 8 x NVIDIA H100 Tensor Core GPUs with 640 GB of high bandwidth GPU memory, 3rd Gen AMD EPYC processors, 2 TB of system memory, and 30 TB of local NVMe storage. P5 instances also provide 3200 Gbps of aggregate network bandwidth with support for GPUDirect RDMA, enabling lower latency and efficient scale-out performance by bypassing the CPU on internode communication.
Here are the specs for these instances:
Instance Size | vCPUs | Memory (GiB) | GPUs (H100) | Network Bandwidth (Gbps) | EBS Bandwidth (Gbps) | Local Storage (TB)
p5.48xlarge | 192 | 2048 | 8 | 3200 | 80 | 8 x 3.84
Here’s a quick infographic that shows you how the P5 instances and NVIDIA H100 Tensor Core GPUs compare to previous instances and processors:
P5 instances are ideal for training and running inference for increasingly complex LLMs and computer vision models behind the most demanding and compute-intensive generative AI applications, including question answering, code generation, video and image generation, speech recognition, and more. P5 instances will provide up to 6 times lower time to train compared with previous generation GPU-based instances across those applications. Customers who can use lower-precision FP8 data types in their workloads, common in many language models that use a transformer backbone, will see a further benefit of up to a 6 times performance increase through support for the NVIDIA Transformer Engine.
HPC customers using P5 instances can deploy demanding applications at greater scale in pharmaceutical discovery, seismic analysis, weather forecasting, and financial modeling. Customers using dynamic programming (DP) algorithms for applications like genome sequencing or accelerated data analytics will also see further benefit from P5 through support for a new DPX instruction set.
This enables customers to explore problem spaces that previously seemed unreachable, iterate on their solutions at a faster clip, and get to market more quickly.
You can see detailed instance specifications, along with a comparison between the p4d.24xlarge and the new p5.48xlarge instance types, below:
Feature | p4d.24xlarge | p5.48xlarge | Comparison
Number & Type of Accelerators | 8 x NVIDIA A100 | 8 x NVIDIA H100 | –
FP8 TFLOPS per Server | – | 16,000 | 640% vs. A100 FP16
FP16 TFLOPS per Server | 2,496 | 8,000 | –
GPU Memory | 40 GB | 80 GB | 200%
GPU Memory Bandwidth | 12.8 TB/s | 26.8 TB/s | 200%
CPU Family | Intel Cascade Lake | AMD Milan | –
vCPUs | 96 | 192 | 200%
Total System Memory | 1152 GB | 2048 GB | 200%
Networking Throughput | 400 Gbps | 3200 Gbps | 800%
EBS Throughput | 19 Gbps | 80 Gbps | 400%
Local Instance Storage | 8 TBs NVMe | 30 TBs NVMe | 375%
GPU to GPU Interconnect | 600 GB/s | 900 GB/s | 150%
Second-generation Amazon EC2 UltraClusters and Elastic Fabric Adapter P5 instances provide market-leading scale-out capability for multi-node distributed training and tightly coupled HPC workloads. They offer up to 3,200 Gbps of networking using second-generation Elastic Fabric Adapter (EFA) technology, 8 times more than P4d instances.
To address customer needs for large-scale and low latency, P5 instances are deployed in the second-generation EC2 UltraClusters, which now provide customers with lower latency across up to 20,000+ NVIDIA H100 Tensor Core GPUs. Providing the largest scale of ML infrastructure in the cloud, P5 instances in EC2 UltraClusters deliver up to 20 exaflops of aggregate compute capability.
EC2 UltraClusters use Amazon FSx for Lustre, fully managed shared storage built on the most popular high-performance parallel file system. With FSx for Lustre, you can quickly process massive datasets on demand and at scale and deliver sub-millisecond latencies. The low-latency and high-throughput characteristics of FSx for Lustre are optimized for deep learning, generative AI, and HPC workloads on EC2 UltraClusters.
FSx for Lustre keeps the GPUs and ML accelerators in EC2 UltraClusters fed with data, accelerating the most demanding workloads. These workloads include LLM training, generative AI inferencing, and HPC workloads, such as genomics and financial risk modeling.
Getting Started with EC2 P5 Instances To get started, you can use P5 instances in the US East (N. Virginia) and US West (Oregon) Regions.
When launching P5 instances, you can choose AWS Deep Learning AMIs (DLAMIs) that support P5 instances. DLAMIs provide ML practitioners and researchers with the infrastructure and tools to quickly build scalable, secure, distributed ML applications in preconfigured environments.
You will be able to run containerized applications on P5 instances with AWS Deep Learning Containers using libraries for Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). For a more managed experience, you can also use P5 instances via Amazon SageMaker, which helps developers and data scientists easily scale to tens, hundreds, or thousands of GPUs to train a model quickly at any scale without worrying about setting up clusters and data pipelines. HPC customers can leverage AWS Batch and ParallelCluster with P5 to help orchestrate jobs and clusters efficiently.
Existing P4 customers will need to update their AMIs to use P5 instances. Specifically, you will need to update your AMIs to include the latest NVIDIA driver with support for NVIDIA H100 Tensor Core GPUs. You will also need to install the latest CUDA version (CUDA 12), the latest cuDNN version, the latest framework versions (for example, PyTorch and TensorFlow), and the EFA driver with updated topology files. To make this process easy for you, we will provide new DLAMIs and Deep Learning Containers that come prepackaged with all the needed software and frameworks to use P5 instances out of the box.
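As a rough sketch, here is how you might launch a single p5.48xlarge instance with the AWS SDK for Python (Boto3). The AMI, subnet, security group, and key pair IDs are placeholders, and multi-node training clusters would additionally require EFA-enabled network interfaces and placement configuration.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder IDs: substitute a P5-compatible Deep Learning AMI, plus your own
# subnet, security group, and key pair.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # DLAMI with NVIDIA H100 drivers and CUDA 12
    InstanceType="p5.48xlarge",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroupIds=["sg-0123456789abcdef0"],
    KeyName="my-key-pair",
)
print(response["Instances"][0]["InstanceId"])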
Now Available Amazon EC2 P5 instances are available today in the following AWS Regions: US East (N. Virginia) and US West (Oregon). For more information, see the Amazon EC2 pricing page. To learn more, visit our P5 instance page and explore AWS re:Post for EC2, or reach out through your usual AWS Support contacts.
You can choose a broad range of AWS services that have generative AI built in, all running on the most cost-effective cloud infrastructure for generative AI. To learn more, visit Generative AI on AWS to innovate faster and reinvent your applications.
As organizations grow, the records that contain information about customers, businesses, or products tend to be increasingly fragmented and siloed across applications, channels, and data stores. Because information can be gathered in different ways, there is also the issue of different but equivalent data, such as for street addresses (“5th Avenue” and “5th Ave”). As a consequence, it’s not easy to link related records together to create a unified view and gain better insights.
For example, companies want to run advertising campaigns to reach consumers across multiple applications and channels with personalized messaging. Companies often have to deal with disparate data records that contain incomplete or conflicting information, creating a difficult matching process.
In the retail industry, companies have to reconcile, across their supply chain and stores, products that use multiple and different product codes, such as stock keeping units (SKUs), universal product codes (UPCs), or proprietary codes. This prevents them from analyzing information quickly and holistically.
One way to address this problem is to build bespoke data resolution solutions, such as writing complex SQL queries that interact with multiple databases, or training machine learning (ML) models for record matching. But these solutions take months to build, require development resources, and are costly to maintain.
To help you with that, today we’re introducing AWS Entity Resolution, an ML-powered service that helps you match and link related records stored across multiple applications, channels, and data stores. You can get started in minutes configuring entity resolution workflows that are flexible, scalable, and can seamlessly connect to your existing applications.
AWS Entity Resolution offers advanced matching techniques, such as rule-based matching and machine learning models, to help you accurately link related sets of customer information, product codes, or business data codes. For example, you can use AWS Entity Resolution to create a unified view of your customer interactions by linking recent events (such as ad clicks, cart abandonment, and purchases) into a unique entity ID, or better track products that use different codes (like SKUs or UPCs) across your stores.
With AWS Entity Resolution, you can improve matching accuracy and protect data security while minimizing data movement because it reads records where they already live. Let’s see how that works in practice.
Using AWS Entity Resolution As part of my analytics platform, I have a comma-separated values (CSV) file containing one million fictitious customers in an Amazon Simple Storage Service (Amazon S3) bucket. These customers come from a loyalty program but may have applied through different channels (online, in store, by post), so it’s possible that multiple records relate to the same customer.
I use an AWS Glue crawler to automatically determine the content of the file and keep the metadata table updated in the data catalog so that it’s available for my analytics jobs. Now, I can use the same setup with AWS Entity Resolution.
To create a matching workflow, I first need to define my data with a schema mapping.
I choose Create schema mapping, enter a name and description, and select the option to import the schema from AWS Glue. I could also define a custom schema using a step-by-step flow or a JSON editor.
I select the AWS Glue database and table from the two dropdowns to import columns and pre-populate the input fields.
I select the Unique ID from the dropdown. The unique ID is the column that can distinctly reference each row of my data. In this case, it’s the loyalty_id in the CSV file.
I select the input fields that are going to be used for matching. In this case, I choose the columns from the dropdown that can be used to recognize if multiple records are related to the same customer. If some columns aren’t required for matching but are required in the output file, I can optionally add them as pass-through fields. I choose Next.
I map the input fields to their input type and match key. In this way, AWS Entity Resolution knows how to use these fields to match similar records. To continue, I choose Next.
Now, I use grouping to better organize the data I need to compare. For example, the First name, Middle name, and Last name input fields can be grouped together and compared as a Full name.
I also create a group for the Address fields.
I choose Next and review all configurations. Then, I choose Create schema mapping.
Now that I’ve created the schema mapping, I choose Matching workflows from the navigation pane and then Create matching workflow.
I enter a name and a description. Then, to configure the input data, I select the AWS Glue database and table and the schema mapping.
To give the service access to the data, I select a service role that I configured previously. The service role gives access to the input and output S3 buckets and the AWS Glue database and table. If the input or output buckets are encrypted, the service role can also give access to the AWS Key Management Service (AWS KMS) keys needed to encrypt and decrypt the data. I choose Next.
I have the option to use a rule-based or ML-powered matching method. Depending on the method, I can use a manual or automatic processing cadence to run the matching workflow job. For now, I select Machine learning matching and Manual for the Processing cadence, and then choose Next.
I configure an S3 bucket as the output destination. Under Data format, I select Normalized data so that special characters and extra spaces are removed, and data is formatted to lowercase.
I use the default Encryption settings. For Data output, I use the default so that all input fields are included. For security, I can hide fields to exclude them from output or hash fields I want to mask. I choose Next.
I review all settings and choose Create and run to complete the creation of the matching workflow and run the job for the first time.
After a few minutes, the job completes. According to this analysis, of the 1 million records, only 835 thousand are unique customers. I choose View output in Amazon S3 to download the output files.
In the output files, each record has the original unique ID (loyalty_id in this case) and a newly assigned MatchID. Matching records, related to the same customers, have the same MatchID. The ConfidenceLevel field describes the confidence that machine learning matching has that the corresponding records are actually a match.
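As a quick illustration of how this output could be analyzed, here is a small sketch using pandas. The file name is hypothetical, and the column names follow the description above.

import pandas as pd

# Hypothetical local copy of one of the workflow output files downloaded from S3.
df = pd.read_csv("entity_resolution_output.csv")

# Each unique MatchID represents one resolved customer entity.
unique_customers = df["MatchID"].nunique()
print(f"Input records: {len(df)}, unique customers: {unique_customers}")

# Records sharing a MatchID with at least one other record are likely duplicates.
duplicates = df[df.duplicated("MatchID", keep=False)]
print(duplicates[["loyalty_id", "MatchID", "ConfidenceLevel"]].head())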
I can now use this information to have a better understanding of customers who are subscribed to the loyalty program.
Availability and Pricing AWS Entity Resolution is generally available today in the following AWS Regions: US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Seoul, Singapore, Sydney, Tokyo), and Europe (Frankfurt, Ireland, London).
With AWS Entity Resolution, you pay only for what you use based on the number of source records processed by your workflows. Pricing doesn’t depend on the matching method, whether it’s machine learning or rule-based record matching. For more information, see AWS Entity Resolution pricing.
Using AWS Entity Resolution, you gain a deeper understanding of how data is linked. That helps you deliver new insights, enhance decision making, and improve customer experiences based on a unified view of their records.
This April, Swami Sivasubramanian, Vice President of Data and Machine Learning at AWS, announced Amazon Bedrock and Amazon Titan models as part of new tools for building with generative AI on AWS. Amazon Bedrock, currently available in preview, is a fully managed service that makes foundation models (FMs) from Amazon and leading AI startups—such as AI21 Labs, Anthropic, Cohere, and Stability AI—available through an API.
Today, I’m excited to announce the preview of agents for Amazon Bedrock, a new capability for developers to create fully managed agents in a few clicks. Agents for Amazon Bedrock accelerate the delivery of generative AI applications that can manage and perform tasks by making API calls to your company systems. Agents extend FMs to understand user requests, break down complex tasks into multiple steps, carry on a conversation to collect additional information, and take actions to fulfill the request.
Using agents for Amazon Bedrock, you can automate tasks for your internal or external customers, such as managing retail orders or processing insurance claims. For example, an agent-powered generative AI e-commerce application can not only respond to the question, “Do you have this jacket in blue?” with a simple answer but can also help you with the task of updating your order or managing an exchange.
For this to work, you first need to give the agent access to external data sources and connect it to existing APIs of other applications. This allows the FM that powers the agent to interact with the broader world and extend its utility beyond just language processing tasks. Second, the FM needs to figure out what actions to take, what information to use, and in which sequence to perform these actions. This is possible thanks to an exciting emerging behavior of FMs—their ability to reason. You can show FMs how to handle such interactions and how to reason through tasks by building prompts that include definitions and instructions. The process of designing prompts to guide the model towards desired outputs is known as prompt engineering.
Introducing Agents for Amazon Bedrock Agents for Amazon Bedrock automate the prompt engineering and orchestration of user-requested tasks. Once configured, an agent automatically builds the prompt and securely augments it with your company-specific information to provide responses back to the user in natural language. The agent is able to figure out the actions required to automatically process user-requested tasks. It breaks the task into multiple steps, orchestrates a sequence of API calls and data lookups, and maintains memory to complete the action for the user.
With fully managed agents, you don’t have to worry about provisioning or managing infrastructure. You’ll have seamless support for monitoring, encryption, user permissions, and API invocation management without writing custom code. As a developer, you can use the Bedrock console or SDK to upload the API schema. The agent then orchestrates the tasks with the help of FMs and performs API calls using AWS Lambda functions.
Primer on Advanced Reasoning and ReAct You can help FMs to reason and figure out how to solve user-requested tasks with a reasoning technique called ReAct (synergizing reasoning and acting). Using ReAct, you can structure prompts to show an FM how to reason through a task and decide on actions that help find a solution. The structured prompts include a sequence of question-thought-action-observation examples.
The question is the user-requested task or problem to solve. The thought is a reasoning step that helps demonstrate to the FM how to tackle the problem and identify an action to take. The action is an API that the model can invoke from an allowed set of APIs. The observation is the result of carrying out the action. The actions that the FM is able to choose from are defined by a set of instructions that are prepended to the example prompt text. Here is an illustration of how you would build up a ReAct prompt:
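The exact prompt text depends on your use case. As a purely hypothetical sketch based on the insurance claims example used later in this post, a ReAct-style prompt fragment could interleave these elements as follows (the instructions listing the allowed APIs would be prepended before the example):

# Hypothetical ReAct-style prompt fragment for an insurance claims task.
react_prompt = """
Question: Send reminders to all policy holders who have open claims with pending paperwork.
Thought: I first need the list of open claims, so I should call the open-claims API.
Action: GET /claims
Observation: Open claims: claim-17, claim-42.
Thought: For each claim, I need to know which documents are still missing.
Action: GET /claims/claim-17/identify-missing-documents
Observation: Pending documents: proof of loss, repair estimate.
Thought: Now I can remind the policy holder about these documents.
Action: POST /send-reminders
Observation: Reminder sent, tracking id rem-001.
Thought: Claim-17 is handled; I repeat the same steps for claim-42.
"""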
The good news is that Bedrock performs the heavy lifting for you! Behind the scenes, agents for Amazon Bedrock build the prompts based on the information and actions you provide.
Now, let me show you how to get started with agents for Amazon Bedrock.
Create an Agent for Amazon Bedrock Let’s assume you’re a developer at an insurance company and want to provide a generative AI application that helps the insurance agency owners automate repetitive tasks. You create an agent in Bedrock and integrate it into your application.
To get started with the agent, open the Bedrock console, select Agents in the left navigation panel, then choose Create Agent.
Select a foundation model from Bedrock that fits your use case. Here, you provide an instruction to your agent in natural language. The instruction tells the agent what task it’s supposed to perform and the persona it’s supposed to assume. For example, “You are an agent designed to help with processing insurance claims and managing pending paperwork.”
Add action groups. An action is a task that the agent can perform automatically by making API calls to your company systems. A set of actions is defined in an action group. Here, you provide an API schema that defines the APIs for all the actions in the group. You also must provide a Lambda function that represents the business logic for each API. For example, let’s define an action group called ClaimManagementActionGroup that manages insurance claims by pulling a list of open claims, identifying outstanding paperwork for each claim, and sending reminders to policy holders. Make sure to capture this information in the action group description. The business logic for my action group is captured in the Lambda function InsuranceClaimsLambda. This AWS Lambda function implements methods for the following API calls: open-claims, identify-missing-documents, and send-reminders. Here’s a short extract from my InsuranceClaimsLambda:
import json
import time

def open_claims():
    ...

def identify_missing_documents(parameters):
    ...

def send_reminders():
    ...

def lambda_handler(event, context):
    responses = []
    # The agent passes the action group, API path, and parameters for each predicted action.
    for prediction in event['actionGroups']:
        response_code = ...
        action = prediction['actionGroup']
        api_path = prediction['apiPath']
        # Route the request to the business logic that implements the requested API.
        if api_path == '/claims':
            body = open_claims()
        elif api_path == '/claims/{claimId}/identify-missing-documents':
            parameters = prediction['parameters']
            body = identify_missing_documents(parameters)
        elif api_path == '/send-reminders':
            body = send_reminders()
        else:
            body = "{}::{} is not a valid api, try another one.".format(action, api_path)
        response_body = {
            'application/json': {
                'body': str(body)
            }
        }
        # Echo back the action details so the agent can match the response to its request.
        action_response = {
            'actionGroup': prediction['actionGroup'],
            'apiPath': prediction['apiPath'],
            'httpMethod': prediction['httpMethod'],
            'httpStatusCode': response_code,
            'responseBody': response_body
        }
        responses.append(action_response)
    api_response = {'response': responses}
    return api_response
Note that you also must provide an API schema in the OpenAPI schema JSON format. Here’s what my API schema file insurance_claim_schema.json looks like:
{"openapi": "3.0.0",
"info": {
"title": "Insurance Claims Automation API",
"version": "1.0.0",
"description": "APIs for managing insurance claims by pulling a list of open claims, identifying outstanding paperwork for each claim, and sending reminders to policy holders."
},
"paths": {
"/claims": {
"get": {
"summary": "Get a list of all open claims",
"description": "Get the list of all open insurance claims. Return all the open claimIds.",
"operationId": "getAllOpenClaims",
"responses": {
"200": {
"description": "Gets the list of all open insurance claims for policy holders",
"content": {
"application/json": {
"schema": {
"type": "array",
"items": {
"type": "object",
"properties": {
"claimId": {
"type": "string",
"description": "Unique ID of the claim."
},
"policyHolderId": {
"type": "string",
"description": "Unique ID of the policy holder who has filed the claim."
},
"claimStatus": {
"type": "string",
"description": "The status of the claim. Claim can be in Open or Closed state"
}
}
}
}
}
}
}
}
}
},
"/claims/{claimId}/identify-missing-documents": {
"get": {
"summary": "Identify missing documents for a specific claim",
"description": "Get the list of pending documents that need to be uploaded by policy holder before the claim can be processed. The API takes in only one claim id and returns the list of documents that are pending to be uploaded by policy holder for that claim. This API should be called for each claim id",
"operationId": "identifyMissingDocuments",
"parameters": [{
"name": "claimId",
"in": "path",
"description": "Unique ID of the open insurance claim",
"required": true,
"schema": {
"type": "string"
}
}],
"responses": {
"200": {
"description": "List of documents that are pending to be uploaded by policy holder for insurance claim",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {
"pendingDocuments": {
"type": "string",
"description": "The list of pending documents for the claim."
}
}
}
}
}
}
}
}
},
"/send-reminders": {
"post": {
"summary": "API to send reminder to the customer about pending documents for open claim",
"description": "Send reminder to the customer about pending documents for open claim. The API takes in only one claim id and its pending documents at a time, sends the reminder and returns the tracking details for the reminder. This API should be called for each claim id you want to send reminders for.",
"operationId": "sendReminders",
"requestBody": {
"required": true,
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {
"claimId": {
"type": "string",
"description": "Unique ID of open claims to send reminders for."
},
"pendingDocuments": {
"type": "string",
"description": "The list of pending documents for the claim."
}
},
"required": [
"claimId",
"pendingDocuments"
]
}
}
}
},
"responses": {
"200": {
"description": "Reminders sent successfully",
"content": {
"application/json": {
"schema": {
"type": "object",
"properties": {
"sendReminderTrackingId": {
"type": "string",
"description": "Unique Id to track the status of the send reminder Call"
},
"sendReminderStatus": {
"type": "string",
"description": "Status of send reminder notifications"
}
}
}
}
}
},
"400": {
"description": "Bad request. One or more required fields are missing or invalid."
}
}
}
}
}
}
When a user asks your agent to complete a task, Bedrock will use the FM you configured for the agent to identify the sequence of actions and invoke the corresponding Lambda functions in the right order to solve the user-requested task.
In the final step, review your agent configuration and choose Create Agent.
Congratulations, you’ve just created your first agent in Amazon Bedrock!
Deploy an Agent for Amazon Bedrock To deploy an agent in your application, you must create an alias. Bedrock then automatically creates a version for that alias.
In the Bedrock console, select your agent, then select Deploy, and choose Create to create an alias.
Provide an alias name and description and choose whether to create a new version or use an existing version of your agent to associate with this alias.
This saves a snapshot of the agent code and configuration and associates an alias with this snapshot or version. You can use the alias to integrate the agent into your applications.
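For illustration, here is a minimal sketch of how an application might invoke the agent through its alias using the AWS SDK for Python (Boto3). The agent ID and alias ID are placeholders, and because the service is in preview, the exact runtime API shape may differ from this sketch.

import uuid
import boto3

# Placeholder IDs: use your agent ID and the alias ID created above.
agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.invoke_agent(
    agentId="AGENT_ID",
    agentAliasId="ALIAS_ID",
    sessionId=str(uuid.uuid4()),  # keeps conversation context across turns
    inputText="Send reminder to all policy holders with open claims and pending paperwork.",
)

# The agent streams back chunks of its natural-language answer.
completion = ""
for event in response["completion"]:
    if "chunk" in event:
        completion += event["chunk"]["bytes"].decode("utf-8")
print(completion)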
Now, let’s test the insurance agent! You can do this right in the Bedrock console.
Let’s ask the agent to “Send reminder to all policy holders with open claims and pending paperwork.” You can see how the FM-powered agent is able to understand the user request, break down the task into steps (collect the open insurance claims, look up the claim IDs, send reminders), and perform the corresponding actions.
Agents for Amazon Bedrock can help you increase productivity, improve your customer service experience, or automate DevOps tasks. I’m excited to see what use cases you will implement!
Learn the Fundamentals of Generative AI If you’re interested in the fundamentals of generative AI and how to work with FMs, including advanced prompting techniques and agents, check out this new hands-on course that I developed with AWS colleagues and industry experts in collaboration with DeepLearning.AI.
It’ll be a full house as the AWS Summit gets underway in New York City on Wednesday, July 26, 2023. The cloud event has something for everyone, including a keynote, breakout sessions, opportunities to network, and, of course, the latest exciting AWS product announcements.
Today, we’re sharing a selection of announcements to get the fun started. We’ll also share major updates from Wednesday’s keynote, so check back for more exciting news to come soon.
AWS Glue Studio now supports Amazon Redshift Serverless Before this launch, developers using Glue Studio only had access to Redshift tables in Redshift clusters. Now, those same developers can connect to Redshift Serverless tables directly without manual configuration.
The US celebrated Independence Day last week on July 4 with fireworks and barbecues across the country. But fireworks weren’t the only thing that launched last week. Let’s have a look!
Last Week’s Launches Here are some launches that got my attention:
AWS Glue – AWS Glue crawlers now support Apache Iceberg tables. Apache Iceberg is an open-source table format for data stored in data lakes. You can now automatically register Apache Iceberg tables in the AWS Glue Data Catalog by running a Glue crawler. You can then query Glue Data Catalog Iceberg tables across various analytics engines and apply AWS Lake Formation fine-grained permissions when querying from Amazon Athena. Check out the AWS Glue Crawler documentation to learn more.
In addition, Amazon RDS for PostgreSQL Multi-AZ Deployments with two readable standbys now supports logical replication. With logical replication, you can stream data changes from Amazon RDS for PostgreSQL to other databases for use cases such as data consolidation for analytical applications, change data capture (CDC), replicating select tables rather than the entire database, or for replicating data between different major versions of PostgreSQL. Check out the Amazon RDS User Guide for more details.
Amazon CloudWatch – Amazon CloudWatch now supports Service Quotas in cross-account observability. With this, you can track and visualize resource utilization and limits across various AWS services from multiple AWS accounts within a Region using a central monitoring account. You no longer have to track quotas by logging in to individual accounts; instead, from a central monitoring account, you can create dashboards and alarms for AWS service quota usage across all your source accounts. Set up CloudWatch cross-account observability to get started.
Amazon SageMaker – You can now associate a SageMaker Model Card with a specific model version in SageMaker Model Registry. This lets you establish a single source of truth for your registered model versions, with comprehensive, centralized, and standardized documentation across all stages of the model’s journey on SageMaker, facilitating discoverability and promoting governance, compliance, and accountability throughout the model lifecycle. Learn more about SageMaker Model Cards in the developer guide.
Other AWS News Here are some additional blog posts and news items that you might find interesting:
Building generative AI applications for your startup – In this AWS Startups Blog post, Hrushikesh explains various approaches to building generative AI applications and reviews their key components. Read the full post for the details.
How Alexa learned to speak with an Irish accent – If you’re curious how Amazon researchers used voice conversation to generate Irish-accented training data in Alexa’s own voice, check out this Amazon Science Blog post.
AWS open-source news and updates – My colleague Ricardo writes this weekly open-source newsletter in which he highlights new open-source projects, tools, and demos from the AWS Community.
Upcoming AWS Events Check your calendars and sign up for these AWS events:
AWS Global Summits – Check your calendars and sign up for the AWS Summit close to where you live or work: Hong Kong (July 20), New York City (July 26), Taiwan (August 2-3), São Paulo (August 3), and Mexico City (August 30).
AWS Community Days – Join a community-led conference run by AWS user group leaders in your region: Malaysia (July 22), Philippines (July 29-30), Colombia (August 12), and West Africa (August 19).
AWS re:Invent (November 27 – December 1) – Join us to hear the latest from AWS, learn from experts, and connect with the global cloud community. Registration is now open.
Companies continue to adopt software as a service (SaaS) applications at a rapid clip, with recent research showing that the average SaaS portfolio now has at least 200 applications. While organizations purchase these purpose-built tools to make their employees more productive, they now must contend with growing security complexities, context switching, and data silos.
If your company faces these issues, or you want to avoid them in the future, join us on Tuesday, June 27, for AWS Applications Innovation Day, a free-to-attend online event. AWS will stream the event simultaneously across multiple platforms, including LinkedIn Live, Twitter, YouTube, and Twitch. You can also join us in person in Seattle to hear from Dilip Kumar, Vice President of AWS Applications, and an executive panel with AWS Partners Splunk, Asana, and Okta.
Applications Innovation Day is designed to give you the tools you need to improve how your organization uses and secures SaaS applications. Sessions throughout the day will show you how you can secure data while providing your employees with the best tools for the job. You’ll also learn how to support the right mix of applications to improve workforce collaboration, and how to use generative artificial intelligence securely and effectively to improve insights and enhance employee productivity.
We’ll start the virtual broadcast with a keynote from Dilip Kumar, Vice President of AWS Applications, who will discuss the way we use and govern SaaS applications at AWS. He’ll also discuss how we’ll make it easier to deploy purpose-built SaaS applications like Asana, Okta, Splunk, Zoom, and others across your business, including the announcement of some exciting new innovations from AWS.
AWS product leaders will present technical breakout sessions during the day on the productivity and security aspects of managing a SaaS application tech stack. Sessions will cover a wide range of topics, including how the nature of productivity at work is changing, how AI is transforming SaaS applications and collaboration, how you can improve your security observability across your applications, and how you can create custom analytics on SaaS application activity.
Overall, the event is a great opportunity for security leaders, IT administrators and operations leaders, and anyone leading digital workplace and transformation initiatives to learn how to better leverage and govern SaaS applications.
AWS Silicon Innovation Day is a one-day virtual event on June 21, 2023, that will allow you to better understand AWS Silicon and how you can use AWS’s unique Amazon EC2 chip offerings to your benefit. AWS has designed and developed purpose-built silicon specifically for the cloud.
During this event, you will have the opportunity to hear directly from senior leaders at AWS. Our panel of lead architects, engineers, customers, and analysts will provide insights into our silicon journey. Through deep dives into our cutting-edge silicon design and customer success stories, the panel will provide insights on security enhancements and cost-saving opportunities. Here are some of the highlights you can expect from this event.
Leadership session – To kick off the day, we have a leadership session featuring Dave Brown, VP of Amazon EC2, and Dr. Ruba Borno, VP of WW Channels and Alliances, joining us on stage. Dave will engage in a discussion with Ruba about how you can benefit from the innovation AWS delivers with its silicon technology.
AI/ML session – Gary Szilagyi, VP of Annapurna Labs, will discuss with Nafea Bshara, co-founder of Annapurna Labs, how his team uses its chip development expertise to create specialized silicon for generative AI, CPUs, and the AWS Nitro System. He will highlight how you can harness the Annapurna mindset to develop not only CPUs but also tailor-made chips with specific purposes in mind.
Customer session – Jeff Barr, VP of AWS Evangelism, and Tiffany Wissner, Director of Product Marketing, will delve into insights from our customers. They will share anecdotes and experiences gathered from various sources, such as re:Invent, summits, and developer events, where you have expressed how you harnessed AWS silicon to drive your own remarkable innovations.
Networking session – JR Rivers, Senior Principal Engineer, and Madhura Kale, Senior Product Manager, will shed light on the impact of silicon innovation, not only on the benefits you experience using our CPUs, GPUs, or the Nitro System, but also on the transformation of AWS’s network infrastructure. They will delve into the realm of networking advancements, showcasing some of the latest innovations and highlighting the instrumental role played by AWS silicon in powering these developments.
Arm and Nitro Innovation session – Anthony Liguori, VP and Fellow, Nitro System architecture, will be joined by Ali Saidi, Director of Annapurna Labs, to discuss harnessing the power of hardware and software in tandem to drive the development of cutting-edge silicon technologies.
Analyst and Executive session – Raj Pai, VP of Amazon EC2 Product Management will engage in a conversation with an analyst, delving into the realm of silicon innovation in the cloud.
No advance registration is needed to participate in AWS Silicon Innovation Day, but you can add an event reminder to your calendar by registering on the event page. We sincerely hope that you will join us in embracing the excitement and seizing the valuable learning opportunities at this new event!
In the AWS Security Profile series, we interview Amazon Web Services (AWS) thought leaders who help keep our customers safe and secure. This interview features Matt Campagna, Senior Principal, Security Engineering, AWS Cryptography, and re:Inforce 2023 session speaker, who shares thoughts on data protection, cloud security, post-quantum cryptography, and more. Matthew was first profiled on the AWS Security Blog in 2019. This is part 1 of 3 in a series of interviews with our AWS Cryptography team.
What do you do in your current role and how long have you been at AWS?
I started at Amazon in 2013 as the first cryptographer at AWS. Today, my focus is on the cryptographic security of our customers’ data. I work across AWS to make sure that our cryptographic engineering meets our most sensitive customer needs. I lead our migration to quantum-resistant cryptography, and help make privacy-preserving cryptography techniques part of our security model.
How did you get started in the data protection and cryptography space? What about it piqued your interest?
I first learned about public-key cryptography (for example, RSA) during a math lesson about group theory. I found the mathematics intriguing and the idea of sending secret messages using only a public value astounding. My undergraduate and graduate education focused on group theory, and I started my career at the National Security Agency (NSA) designing and analyzing cryptologics. But what interests me most about cryptography is its ability to enable business by reducing risks. I look at cryptography as a financial instrument that affords new business cases, like e-commerce, digital currency, and secure collaboration. What enables Amazon to deliver for our customers is rooted in cryptography; our business exists because cryptography enables trust and confidentiality across the internet. I find this the most intriguing aspect of cryptography.
AWS has invested in the migration to post-quantum cryptography by contributing to post-quantum key agreement and post-quantum signature schemes to protect the confidentiality, integrity, and authenticity of customer data. What should customers do to prepare for post-quantum cryptography?
Our focus at AWS is to help ensure that customers can migrate to post-quantum cryptography as fast as prudently possible. This work started with inventorying our dependencies on algorithms that aren’t known to be quantum-resistant, like integer-factorization-based cryptography, and discrete-log-based cryptography, like ECC. Customers can rely on AWS to assist with transitioning to post-quantum cryptography for their cloud computing needs.
We recommend customers begin inventorying their dependencies on algorithms that aren’t quantum-resistant, and consider developing a migration plan, to understand if they can migrate directly to new post-quantum algorithms or if they should re-architect them. For the systems that are provided by a technology provider, customers should ask what their strategy is for post-quantum cryptography migration.
AWS offers post-quantum TLS endpoints in some security services. Can you tell us about these endpoints and how customers can use them?
Our open source TLS implementation, s2n-TLS, includes post-quantum hybrid key exchange (PQHKEX) in its mainline. It’s deployed everywhere that s2n is deployed. AWS Key Management Service, AWS Secrets Manager, and AWS Certificate Manager have enabled PQHKEX cipher suites in our commercial AWS Regions. Today customers can use the AWS SDK for Java 2.0 to enable PQHKEX on their connection to AWS, and on the services that also have it enabled, they will negotiate a post-quantum key exchange method. As we enable these cipher suites on additional services, customers will also be able to connect to these services using PQHKEX.
You are a frequent contributor to the Amazon Science Blog. What were some of your recent posts about?
In 2022, we published a post on preparing for post-quantum cryptography, which provides general information on the broader industry development and deployment of post-quantum cryptography. The post links to a number of additional resources to help customers understand post-quantum cryptography. The AWS Post-Quantum Cryptography page and the Science Blog are great places to start learning about post-quantum cryptography.
We also published a post highlighting the security of post-quantum hybrid key exchange. Amazon believes in evidencing the cryptographic security of the solutions that we vend. We are actively participating in cryptographic research to validate the security that we provide in our services and tools.
What’s been the most dramatic change you’ve seen in the data protection and post-quantum cryptography landscape since we talked to you in 2019?
Since 2019, there have been two significant advances in the development of post-quantum cryptography.
First, the National Institute of Standards and Technology (NIST) announced their selection of PQC algorithms for standardization. NIST expects to finish the standardization of a post-quantum key encapsulation mechanism (Kyber) and digital signature scheme (Dilithium) by 2024 as part of the Federal Information Processing Standards (FIPS). NIST will also work on standardization of two additional signature standards (FALCON and SPHINCS+), and continue to consider future standardization of the key encapsulation mechanisms BIKE, HQC, and Classic McEliece.
Second, the NSA announced their Commercial National Security Algorithm (CNSA) Suite 2.0, which includes their timelines for National Security Systems (NSS) to migrate to post-quantum algorithms. The NSA will begin preferring post-quantum solutions in 2025 and expects that systems will have completed migration by 2033. Although this timeline might seem far away, it’s an aggressive strategy. Experience shows that it can take 20 years to develop and deploy new high-assurance cryptographic algorithms. If technology providers are not already planning to migrate their systems and services, they will be challenged to meet this timeline.
What makes cryptography exciting to you?
Cryptography is a dynamic area of research. In addition to the business applications, I enjoy the mathematics of cryptography. The state-of-the-art is constantly progressing in terms of new capabilities that cryptography can enable, and the potential risks to existing cryptographic primitives. This plays out in the public sphere of cryptographic research across the globe. These advancements are made public and are accessible for companies like AWS to innovate on behalf of our customers, and protect our systems in advance of the development of new challenges to our existing crypto algorithms. This is happening now as we monitor the advancements of quantum computing against our ability to define and deploy new high-assurance quantum-resistant algorithms. For me, it doesn’t get more exciting than this.
Where do you see the cryptography and post-quantum cryptography space heading to in the future?
While NIST transitions from their selection process to standardization, the broader cryptographic community will be more focused on validating the cryptographic assurances of these proposed schemes for standardization. This is a critical part of the process. I’m optimistic that we will enter 2025 with new cryptographic standards to deploy.
There is a lot of additional cryptographic research and engineering ahead of us. Applying these new primitives to the cryptographic applications that use classical asymmetric schemes still needs to be done. Some of this work is happening in parallel, like in the IETF TLS working group, and in the ETSI Quantum-Safe Cryptography Technical Committee. The next five years should see the adoption of PQHKEX in protocols like TLS, SSH, and IKEv2 and certification of new FIPS hardware security modules (HSMs) for establishing new post-quantum, long-lived roots of trust for code-signing and entity authentication.
I expect that the selected primitives for standardization will also be used to develop novel uses in fields like secure multi-party communication, privacy preserving machine learning, and cryptographic computing.
With AWS re:Inforce 2023 around the corner, what will your session focus on? What do you hope attendees will take away from your session?
Session DAP302 – “Post-quantum cryptography migration strategy for cloud services” is about the challenge quantum computers pose to currently used public-key cryptographic algorithms and how the industry is responding. Post-quantum cryptography (PQC) offers a solution to this challenge, providing security to help protect against quantum computer cybersecurity events. We outline current efforts in PQC standardization and migration strategies. We want our customers to leave with a better understanding of the importance of PQC and the steps required to migrate to it in a cloud environment.
Is there something you wish customers would ask you about more often?
The question I am most interested in hearing from our customers is, “when will you have a solution to my problem?” If customers have a need for a novel cryptographic solution, I’m eager to try to solve that with them.
How about outside of work, any hobbies?
My main hobbies outside of work are biking and running. I wish I were as consistent in attending to my hobbies as I am at my work desk. I am happier being able to run every day at a constant speed and distance than running faster or farther tomorrow or next week. Last year I was fortunate enough to do the Cycle Oregon ride. I had registered for it twice before without being able to find the time to do it.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
A full conference pass is $1,099. Register today with the code secure150off to receive a limited time $150 discount, while supplies last.
AWS re:Inforce is back, and we can’t wait to welcome security builders to Anaheim, CA, on June 13 and 14. AWS re:Inforce is a security learning conference where you can gain skills and confidence in cloud security, compliance, identity, and privacy. As an attendee, you will have access to hundreds of technical and non-technical sessions, an Expo featuring AWS experts and security partners with AWS Security Competencies, and keynote and leadership sessions featuring Security leadership. re:Inforce 2023 features content across the following six areas:
Data protection
Governance, risk, and compliance
Identity and access management
Network and infrastructure security
Threat detection and incident response
Application security
The threat detection and incident response track is designed to showcase how AWS, customers, and partners can intelligently detect potential security risks, centralize and streamline security management at scale, investigate and respond quickly to security incidents across their environment, and unlock security innovation across hybrid cloud environments.
Breakout sessions, chalk talks, and lightning talks
TDR201 | Breakout session | How Citi advanced their containment capabilities through automation Incident response is critical for maintaining the reliability and security of AWS environments. To support the 28 AWS services in their cloud environment, Citi implemented a highly scalable cloud incident response framework specifically designed for their workloads on AWS. Using AWS Step Functions and AWS Lambda, Citi’s automated and orchestrated incident response plan follows NIST guidelines and has significantly improved its response time to security events. In this session, learn from real-world scenarios and examples on how to use AWS Step Functions and other core AWS services to effectively build and design scalable incident response solutions.
TDR202 | Breakout session | Wix’s layered security strategy to discover and protect sensitive data Wix is a leading cloud-based development platform that empowers users to get online with a personalized, professional web presence. In this session, learn how the Wix security team layers AWS security services including Amazon Macie, AWS Security Hub, and AWS Identity and Access Management Access Analyzer to maintain continuous visibility into proper handling and usage of sensitive data. Using AWS security services, Wix can discover, classify, and protect sensitive information across terabytes of data stored on AWS and in public clouds as well as SaaS applications, while empowering hundreds of internal developers to drive innovation on the Wix platform.
TDR203 | Breakout session | Vulnerability management at scale drives enterprise transformation Automating vulnerability management at scale can help speed up mean time to remediation and identify potential business-impacting issues sooner. In this session, explore key challenges that organizations face when approaching vulnerability management across large and complex environments, and consider the innovative solutions that AWS provides to help overcome them. Learn how customers use AWS services such as Amazon Inspector to automate vulnerability detection, streamline remediation efforts, and improve compliance posture. Whether you’re just getting started with vulnerability management or looking to optimize your existing approach, gain valuable insights and inspiration to help you drive innovation and enhance your security posture with AWS.
TDR204 | Breakout session | Continuous innovation in AWS detection and response services Join this session to learn about the latest advancements and most recent AWS launches in detection and response. This session focuses on use cases such as automated threat detection, continual vulnerability management, continuous cloud security posture management, and unified security data management. Through these examples, gain a deeper understanding of how you can seamlessly integrate AWS services into your existing security framework to gain greater control and insight, quickly address security risks, and maintain the security of your AWS environment.
TDR205 | Breakout session | Build your security data lake with Amazon Security Lake, featuring Interpublic Group Security teams want greater visibility into security activity across their entire organizations to proactively identify potential threats and vulnerabilities. Amazon Security Lake automatically centralizes security data from cloud, on-premises, and custom sources into a purpose-built data lake stored in your account and allows you to use industry-leading AWS and third-party analytics and ML tools to gain insights from your data and identify security risks that require immediate attention. Discover how Security Lake can help you consolidate and streamline security logging at scale and speed, and hear from an AWS customer, Interpublic Group (IPG), on their experience.
TDR209 | Breakout session | Centralizing security at scale with Security Hub & Intuit’s experience As organizations move their workloads to the cloud, it becomes increasingly important to have a centralized view of security across their cloud resources. AWS Security Hub is a powerful tool that allows organizations to gain visibility into their security posture and compliance status across their AWS accounts and Regions. In this session, learn about Security Hub’s new capabilities that help simplify centralizing and operationalizing security. Then, hear from Intuit, a leading financial software company, as they share their experience and best practices for setting up and using Security Hub to centralize security management.
TDR210 | Breakout session | Streamline security analysis with Amazon Detective Join us to discover how to streamline security investigations and perform root-cause analysis with Amazon Detective. Learn how to leverage the graph analysis techniques in Detective to identify related findings and resources and investigate them together to accelerate incident analysis. Also hear a customer story about their experience using Detective to analyze findings automatically ingested from Amazon GuardDuty, and walk through a sample security investigation.
TDR310 | Breakout session | Developing new findings using machine learning in Amazon GuardDuty Amazon GuardDuty provides threat detection at scale, helping you quickly identify and remediate security issues with actionable insights and context. In this session, learn how GuardDuty continuously enhances its intelligent threat detection capabilities using purpose-built machine learning models. Discover how new findings are developed for new data sources using novel machine learning techniques and how they are rigorously evaluated. Get a behind-the-scenes look at GuardDuty findings from ideation to production, and learn how this service can help you strengthen your security posture.
TDR311 | Breakout session | Securing data and democratizing the alert landscape with an event-driven architecture Security event monitoring is a unique challenge for businesses operating at scale and seeking to integrate detections into their existing security monitoring systems while using multiple detection tools. Learn how organizations can triage and raise relevant cloud security findings across a breadth of detection tools and provide results to downstream security teams in a serverless manner at scale. We discuss how to apply a layered security approach to evaluate the security posture of your data, protect your data from potential threats, and automate response and remediation to help with compliance requirements.
TDR231 | Chalk talk | Operationalizing security findings at scale You enabled AWS Security Hub standards and checks across your AWS organization and in all AWS Regions. What should you do next? Should you expect zero critical and high findings? What is your ideal state? Is achieving zero findings possible? In this chalk talk, learn about a framework you can implement to triage Security Hub findings. Explore how this framework can be applied to several common critical and high findings, and take away mechanisms to prioritize and respond to security findings at scale.
TDR232 | Chalk talk | Act on security findings using Security Hub’s automation capabilities Alert fatigue, a shortage of skilled staff, and keeping up with dynamic cloud resources are all challenges that exist when it comes to customers successfully achieving their security goals in AWS. In order to achieve their goals, customers need to act on security findings associated with cloud-based resources. In this session, learn how to automatically, or semi-automatically, act on security findings aggregated in AWS Security Hub to help you secure your organization’s cloud assets across a diverse set of accounts and Regions.
TDR233 | Chalk talk | How LLA reduces incident response time with AWS Systems Manager Liberty Latin America (LLA) is a leading telecommunications company operating in over 20 countries across Latin America and the Caribbean. LLA offers communications and entertainment services, including video, broadband internet, telephony, and mobile services. In this chalk talk, discover how LLA implemented a security framework to detect security issues and automate incident response in more than 180 AWS accounts accessed by internal stakeholders and third-party partners using AWS Systems Manager Incident Manager, AWS Organizations, Amazon GuardDuty, and AWS Security Hub.
TDR432 | Chalk talk | Deep dive into exposed credentials and how to investigate them In this chalk talk, sharpen your detection and investigation skills to spot and explore common security events like unauthorized access with exposed credentials. Learn how to recognize the indicators of such events, which logs to examine, and the techniques that unauthorized users use to evade detection. The talk provides knowledge and resources to help you immediately prepare for your own security investigations.
TDR332 | Chalk talk | Speed up zero-day vulnerability response In this chalk talk, learn how to scale vulnerability management for Amazon EC2 across multiple accounts and AWS Regions. Explore how to use Amazon Inspector, AWS Systems Manager, and AWS Security Hub to respond to zero-day vulnerabilities, and leave knowing how to plan, perform, and report on proactive and reactive remediations.
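As a rough sketch of the kind of zero-day triage described here, the following Python (boto3) snippet asks Amazon Inspector which EC2 instances have active findings for a specific CVE. The CVE ID is only an example, and the filter keys reflect the Inspector (inspector2) API as we understand it; adjust them for your environment.

```python
# Illustrative sketch: list EC2 instances that Amazon Inspector reports as
# affected by a given CVE (the CVE ID below is just an example).
import boto3

inspector = boto3.client("inspector2")

def instances_affected_by(cve_id):
    affected = set()
    paginator = inspector.get_paginator("list_findings")
    pages = paginator.paginate(
        filterCriteria={
            "vulnerabilityId": [{"comparison": "EQUALS", "value": cve_id}],
            "findingStatus": [{"comparison": "EQUALS", "value": "ACTIVE"}],
        }
    )
    for page in pages:
        for finding in page["findings"]:
            for resource in finding.get("resources", []):
                if resource["type"] == "AWS_EC2_INSTANCE":
                    affected.add(resource["id"])
    return affected

print(instances_affected_by("CVE-2021-44228"))
```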
TDR333 | Chalk talk | Gaining insights from Amazon Security Lake You’ve created a security data lake, and you’re ingesting data. Now what? How do you use that data to gain insights into what is happening within your organization or assist with investigations and incident response? Join this chalk talk to learn how analytics services and security information and event management (SIEM) solutions can connect to and use data stored within Amazon Security Lake to investigate security events and identify trends across your organization. Leave with a better understanding of how you can integrate Amazon Security Lake with other business intelligence and analytics tools to gain valuable insights from your security data and respond more effectively to security events.
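To make the “now what?” concrete, here is an illustrative Python (boto3) sketch that runs an Athena query against a Security Lake table. The database name, table name, and OCSF column references are assumptions; substitute the names Security Lake created in your account and an S3 location you own for query results.

```python
# Illustrative sketch: query a Security Lake table with Athena. The database,
# table, and column names are assumptions; replace them with your own.
import time
import boto3

athena = boto3.client("athena")

QUERY = """
SELECT "time", api.operation, src_endpoint.ip
FROM amazon_security_lake_table_us_east_1_cloud_trail_mgmt_2_0
WHERE api.operation = 'ConsoleLogin'
ORDER BY "time" DESC
LIMIT 20
"""

def run_query(sql, database, output_s3):
    execution_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )["QueryExecutionId"]

    # Poll until the query finishes (simplified; no timeout handling).
    while True:
        state = athena.get_query_execution(QueryExecutionId=execution_id)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query ended in state {state}")
    return athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]

rows = run_query(QUERY, "amazon_security_lake_glue_db_us_east_1", "s3://my-athena-results/")
print(len(rows))
```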
TDR431 | Chalk talk | The anatomy of a ransomware event Ransomware events can cost governments, nonprofits, and businesses billions of dollars and interrupt operations. Early detection and automated responses are important steps that can limit your organization’s exposure. In this chalk talk, examine the anatomy of a ransomware event that targets data residing in Amazon RDS and get detailed best practices for detection, response, recovery, and protection.
TDR221 | Lightning talk | Streamline security operations and improve threat detection with OCSF Security operations centers (SOCs) face significant challenges in monitoring and analyzing security telemetry data from a diverse set of sources. This can result in a fragmented and siloed approach to security operations that makes it difficult to identify and investigate incidents. In this lightning talk, get an introduction to the Open Cybersecurity Schema Framework (OCSF) and its taxonomy constructs, and see a quick demo on how this normalized framework can help SOCs improve the efficiency and effectiveness of their security operations.
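For a flavor of what OCSF normalization looks like in practice, here is a small, purely illustrative Python function that maps a made-up vendor alert into an OCSF-shaped dictionary. The field names follow the core OCSF schema, but the class and category values and the input alert format are assumptions, not a definitive mapping.

```python
# Illustrative sketch only: normalize a hypothetical vendor alert into an
# OCSF-style event. Class/category values and the alert shape are assumptions.
from datetime import datetime, timezone

def to_ocsf(vendor_alert):
    return {
        "class_uid": 2004,      # assumed: Detection Finding
        "category_uid": 2,      # assumed: Findings category
        "activity_id": 1,       # assumed: Create
        "severity_id": {"low": 2, "medium": 3, "high": 4}.get(
            vendor_alert["severity"], 1),
        "time": int(datetime.now(timezone.utc).timestamp() * 1000),
        "message": vendor_alert["title"],
        "metadata": {
            "version": "1.0.0",
            "product": {"name": "ExampleDetector", "vendor_name": "ExampleVendor"},
        },
    }

print(to_ocsf({"severity": "high", "title": "Suspicious console login"}))
```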
TDR222 | Lightning talk | Security monitoring for connected devices across OT, IoT, edge & cloud With the responsibility to stay ahead of cybersecurity threats, CIOs and CISOs are increasingly tasked with managing cybersecurity risks for their connected devices, including devices on the operational technology (OT) side of the company. In this lightning talk, learn how AWS makes it simpler to monitor, detect, and respond to threats across the entire threat surface, which includes OT, IoT, edge, and cloud, while protecting your security investments in existing third-party security tools.
TDR223 | Lightning talk | Bolstering incident response with AWS Wickr enterprise integrations Every second counts during a security event. AWS Wickr provides end-to-end encrypted communications to help incident responders collaborate safely during a security event, even on a compromised network. Join this lightning talk to learn how to integrate AWS Wickr with AWS security services such as Amazon GuardDuty and AWS WAF. Learn how you can strengthen your incident response capabilities by creating an integrated workflow that incorporates GuardDuty findings into a secure, out-of-band communication channel for dedicated teams.
TDR224 | Lightning talk | Securing the future of mobility: Automotive threat modeling Many existing automotive industry cybersecurity threat intelligence offerings lack the connected mobility insights required for today’s automotive cybersecurity threat landscape. Join this lightning talk to learn about AWS’s approach to developing an in-vehicle, domain-specific threat intelligence solution for the automotive industry that uses AWS AI/ML services to proactively collect, analyze, and derive threat intelligence insights for use and adoption across automotive value chains.
Hands-on sessions (builders’ sessions and workshops)
TDR251 | Builders’ session | Streamline and centralize security operations with AWS Security Hub AWS Security Hub provides you with a comprehensive view of the security state of your AWS resources by collecting security data from across AWS accounts, Regions, and services. In this builders’ session, explore best practices for using Security Hub to manage security posture, prioritize security alerts, generate insights, automate response, and enrich findings. Come away with a better understanding of how to use Security Hub features and practical tips for getting the most out of this powerful service.
TDR351 | Builders’ session | Broaden your scope: Analyze and investigate potential security issues In this builders’ session, learn how you can more efficiently triage potential security issues with a dynamic visual representation of the relationship between security findings and associated entities such as accounts, IAM principals, IP addresses, Amazon S3 buckets, and Amazon EC2 instances. With Amazon Detective finding groups, you can group related Amazon GuardDuty findings to help reduce time spent in security investigations and in understanding the scope of a potential issue. Leave this hands-on session knowing how to quickly investigate and discover the root cause of an incident.
TDR352 | Builders’ session | How to automate containment and forensics for Amazon EC2 In this builders’ session, learn how to deploy and scale the self-service Automated Forensics Orchestrator for Amazon EC2 solution, which gives you a standardized and automated forensics orchestration workflow capability to help you respond to Amazon EC2 security events. Explore the prerequisites and ways to customize the solution to your environment.
TDR353 | Builders’ session | Detecting suspicious activity in Amazon S3 Have you ever wondered how to uncover evidence of unauthorized activity in your AWS account? In this builders’ session, join the AWS Customer Incident Response Team (CIRT) for a guided simulation of suspicious activity within an AWS account involving unauthorized data exfiltration and Amazon S3 bucket and object data deletion. Learn how to detect and respond to this malicious activity using AWS services like AWS CloudTrail, Amazon Athena, Amazon GuardDuty, Amazon CloudWatch, and nontraditional threat detection services like AWS Billing to uncover evidence of unauthorized use.
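As a small taste of the detection side of this exercise, the following Python (boto3) sketch uses CloudTrail event history to look for recent bucket deletions. Note that object-level actions such as DeleteObject are data events and are only captured if your trail has data event logging enabled, so this is a starting point rather than the CIRT’s full approach.

```python
# Illustrative sketch: use CloudTrail event history to surface recent S3
# bucket deletions. Object-level actions (e.g., DeleteObject) are data events
# and require a trail with data event logging to be captured.
from datetime import datetime, timedelta, timezone
import boto3

cloudtrail = boto3.client("cloudtrail")

def recent_bucket_deletions(hours=24):
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    events = []
    paginator = cloudtrail.get_paginator("lookup_events")
    for page in paginator.paginate(
        LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "DeleteBucket"}],
        StartTime=start,
        EndTime=end,
    ):
        events.extend(page["Events"])
    return events

for event in recent_bucket_deletions():
    print(event["EventTime"], event.get("Username"), event["EventName"])
```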
TDR354 | Builders’ session | Simulate and detect unwanted IMDS access due to SSRF Using appropriate security controls can greatly reduce the risk of unauthorized use of web applications. In this builders’ session, find out how the server-side request forgery (SSRF) vulnerability works, how unauthorized users may try to use it, and most importantly, how to detect it and prevent it from being used to access the instance metadata service (IMDS). Also, learn some of the detection activities that the AWS Customer Incident Response Team (CIRT) performs when responding to security events of this nature.
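One of the preventive controls most relevant to this scenario is requiring IMDSv2. The following Python (boto3) sketch enforces session tokens on a single instance (the instance ID is a placeholder), which blocks the simple GET-based metadata requests that SSRF exploitation typically relies on.

```python
# Illustrative sketch: require IMDSv2 (session tokens) on one EC2 instance.
# The instance ID is a placeholder; apply your own rollout strategy at scale.
import boto3

ec2 = boto3.client("ec2")

response = ec2.modify_instance_metadata_options(
    InstanceId="i-0123456789abcdef0",   # placeholder
    HttpTokens="required",              # IMDSv2 only
    HttpPutResponseHopLimit=1,          # keep tokens from leaving the instance
    HttpEndpoint="enabled",
)
print(response["InstanceMetadataOptions"])
```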
TDR341 | Code talk | Investigating incidents with Amazon Security Lake & Jupyter notebooks In this code talk, watch as experts live code and build an incident response playbook for your AWS environment using Jupyter notebooks, Amazon Security Lake, and Python code. Leave with a better understanding of how to investigate and respond to a security event and how to use these technologies to more effectively and quickly respond to disruptions.
TDR441 | Code talk | How to run security incident response in your Amazon EKS environment Join this code talk to get both an adversary’s and a defender’s point of view as AWS experts perform live exploitation of an application running on multiple Amazon EKS clusters, invoking an alert in Amazon GuardDuty. Experts then walk through incident response procedures to detect, contain, and recover from the incident in near real time. Gain an understanding of how to respond to and recover from Amazon EKS-specific incidents as you watch the events unfold.
TDR271-R | Workshop | Chaos Kitty: Gamifying incident response with chaos engineering When was the last time you simulated an incident? In this workshop, learn to build a sandbox environment to gamify incident response with chaos engineering. You can use this sandbox to test out detection capabilities, play with incident response runbooks, and illustrate how to integrate AWS resources with physical devices. Walk away understanding how to get started with incident response and how you can use chaos engineering principles to create mechanisms that can improve your incident response processes.
TDR371-R | Workshop | Threat detection and response on AWS Join AWS experts for a hands-on threat detection and response workshop using Amazon GuardDuty, AWS Security Hub, and Amazon Detective. This workshop simulates security events for different types of resources and behaviors and illustrates both manual and automated responses with AWS Lambda. Dive in and learn how to improve your security posture by operationalizing threat detection and response on AWS.
TDR372-R | Workshop | Container threat detection with AWS security services Join AWS experts for a hands-on container security workshop using AWS threat detection and response services. This workshop simulates scenarios and security events while using Amazon EKS and demonstrates how to use different AWS security services to detect and respond to events and improve your security practices. Dive in and learn how to improve your security posture when running workloads on Amazon EKS.
Browse the full re:Inforce catalog to get details on additional sessions and content at the event, including gamified learning, leadership sessions, partner sessions, and labs.
If you want to learn the latest threat detection and incident response best practices and updates, join us in California by registering for re:Inforce 2023. We look forward to seeing you there!
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
Want more AWS Security news? Follow us on Twitter.