
Validate IAM policies by using IAM Policy Validator for AWS CloudFormation and GitHub Actions

Post Syndicated from Mitch Beaumont original https://aws.amazon.com/blogs/security/validate-iam-policies-by-using-iam-policy-validator-for-aws-cloudformation-and-github-actions/

In this blog post, I’ll show you how to automate the validation of AWS Identity and Access Management (IAM) policies by using a combination of the IAM Policy Validator for AWS CloudFormation (cfn-policy-validator) and GitHub Actions. Policy validation is an approach that is designed to minimize the deployment of unwanted IAM identity-based and resource-based policies to your Amazon Web Services (AWS) environments.

With GitHub Actions, you can automate, customize, and run software development workflows directly within a repository. Workflows are defined using YAML and are stored alongside your code. I’ll discuss the specifics of how you can set up and use GitHub Actions within a repository in the sections that follow.

The cfn-policy-validator tool is a command-line tool that takes an AWS CloudFormation template, finds and parses the IAM policies that are attached to IAM roles, users, groups, and resources, and then runs the policies through IAM Access Analyzer policy checks. Implementing IAM policy validation checks at the time of code check-in helps shift security to the left (closer to the developer) and shortens the time between when developers commit code and when they get feedback on their work.

Let’s walk through an example that checks the policies that are attached to an IAM role in a CloudFormation template. In this example, the cfn-policy-validator tool will find that the trust policy attached to the IAM role allows the role to be assumed by external principals. This configuration could lead to unintended access to your resources and data, which is a security risk.
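For context, the checks that the cfn-policy-validator tool orchestrates are the same ones exposed by the IAM Access Analyzer API. The following is a minimal, hedged boto3 sketch that runs the ValidatePolicy check directly against the sample role's inline permissions policy; note that the external-principal finding discussed in this walkthrough comes from Access Analyzer access previews, which the tool invokes for trust and resource policies on your behalf.

import json

import boto3

# Assumes AWS credentials and a default Region are already configured.
analyzer = boto3.client("accessanalyzer")

permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "*"}
    ],
}

response = analyzer.validate_policy(
    policyDocument=json.dumps(permissions_policy),
    policyType="IDENTITY_POLICY",
)

for finding in response["findings"]:
    # Each finding has a type (ERROR, SECURITY_WARNING, WARNING, or SUGGESTION),
    # an issue code, and a human-readable description.
    print(finding["findingType"], finding["issueCode"], finding["findingDetails"])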

Prerequisites

To complete this example, you will need the following:

  1. A GitHub account
  2. An AWS account, and an identity within that account that has permissions to create the IAM roles and resources used in this example

Step 1: Create a repository that will host the CloudFormation template to be validated

To begin with, you need to create a GitHub repository to host the CloudFormation template that is going to be validated by the cfn-policy-validator tool.

To create a repository:

  1. Open a browser and go to https://github.com.
  2. In the upper-right corner of the page, in the drop-down menu, choose New repository. For Repository name, enter a short, memorable name for your repository.
  3. (Optional) Add a description of your repository.
  4. Choose either the option Public (the repository is accessible to everyone on the internet) or Private (the repository is accessible only to people with whom access is explicitly shared).
  5. Choose Initialize this repository with: Add a README file.
  6. Choose Create repository. Make a note of the repository’s name.

Step 2: Clone the repository locally

Now that the repository has been created, clone it locally and add a CloudFormation template.

To clone the repository locally and add a CloudFormation template:

  1. Open the command-line tool of your choice.
  2. Use the following command to clone the new repository locally. Make sure to replace <GitHubOrg> and <RepositoryName> with your own values.
    git clone git@github.com:<GitHubOrg>/<RepositoryName>.git

  3. Change into the directory that contains the locally cloned repository.
    cd <RepositoryName>

    Now that the repository is cloned locally, populate it with the following sample CloudFormation template. This template creates a single IAM role that allows a principal to assume the role to perform the s3:GetObject action.

  4. Use the following command to create the sample CloudFormation template file.

    WARNING: This sample role and policy should not be used in production. Using a wildcard in the principal element of a role’s trust policy would allow any IAM principal in any account to assume the role.

    cat << EOF > sample-role.yaml
    
    AWSTemplateFormatVersion: "2010-09-09"
    Description: Base stack to create a simple role
    Resources:
      SampleIamRole:
        Type: AWS::IAM::Role
        Properties:
          AssumeRolePolicyDocument:
            Statement:
              - Effect: Allow
                Principal:
                  AWS: "*"
                Action: ["sts:AssumeRole"]
          Path: /      
          Policies:
            - PolicyName: root
              PolicyDocument:
                Version: 2012-10-17
                Statement:
                  - Resource: "*"
                    Effect: Allow
                    Action:
                      - s3:GetObject
    EOF

Notice that AssumeRolePolicyDocument refers to a trust policy that includes a wildcard value in the principal element. This means that the role could potentially be assumed by an external identity, and that’s a risk you want to know about.

Step 3: Vend temporary AWS credentials for GitHub Actions workflows

In order for the cfn-policy-validator tool that’s running in the GitHub Actions workflow to use the IAM Access Analyzer API, the GitHub Actions workflow needs a set of temporary AWS credentials. The AWS Credentials for GitHub Actions action helps address this requirement. This action implements the AWS SDK credential resolution chain and exports environment variables for other actions to use in a workflow. Environment variable exports are detected by the cfn-policy-validator tool.

AWS Credentials for GitHub Actions supports four methods for fetching credentials from AWS, but the recommended approach is to use GitHub’s OpenID Connect (OIDC) provider in conjunction with a configured IAM identity provider endpoint.

To configure an IAM identity provider endpoint for use in conjunction with GitHub’s OIDC provider:

  1. Open the AWS Management Console and navigate to IAM.
  2. In the left-hand menu, choose Identity providers, and then choose Add provider.
  3. For Provider type, choose OpenID Connect.
  4. For Provider URL, enter
    https://token.actions.githubusercontent.com
  5. Choose Get thumbprint.
  6. For Audiences, enter sts.amazonaws.com
  7. Choose Add provider to complete the setup.

At this point, make a note of the OIDC provider name. You’ll need this information in the next step.
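If you prefer to script this configuration, the provider can also be created through the IAM API. The following is a hedged boto3 sketch; the thumbprint placeholder stands in for the value that the console's Get thumbprint step computes for you.

import boto3

iam = boto3.client("iam")

# Placeholder: supply the certificate thumbprint computed by the console's
# Get thumbprint step (or calculate it yourself from the provider's TLS chain).
response = iam.create_open_id_connect_provider(
    Url="https://token.actions.githubusercontent.com",
    ClientIDList=["sts.amazonaws.com"],
    ThumbprintList=["<thumbprint-from-get-thumbprint-step>"],
)
print(response["OpenIDConnectProviderArn"])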

After it’s configured, the IAM identity provider endpoint should look similar to the following:

Figure 1: IAM Identity provider details

Step 4: Create an IAM role with permissions to call the IAM Access Analyzer API

In this step, you will create an IAM role that can be assumed by the GitHub Actions workflow and that provides the necessary permissions to run the cfn-policy-validator tool.

To create the IAM role:

  1. In the IAM console, in the left-hand menu, choose Roles, and then choose Create role.
  2. For Trust entity type, choose Web identity.
  3. In the Provider list, choose the new GitHub OIDC provider that you created in the earlier step. For Audience, select sts.amazonaws.com from the list.
  4. Choose Next.
  5. On the Add permission page, choose Create policy.
  6. Choose JSON, and enter the following policy:
    
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                  "iam:GetPolicy",
                  "iam:GetPolicyVersion",
                  "access-analyzer:ListAnalyzers",
                  "access-analyzer:ValidatePolicy",
                  "access-analyzer:CreateAccessPreview",
                  "access-analyzer:GetAccessPreview",
                  "access-analyzer:ListAccessPreviewFindings",
                  "access-analyzer:CreateAnalyzer",
                  "s3:ListAllMyBuckets",
                  "cloudformation:ListExports",
                  "ssm:GetParameter"
                ],
                "Resource": "*"
            },
            {
              "Effect": "Allow",
              "Action": "iam:CreateServiceLinkedRole",
              "Resource": "*",
              "Condition": {
                "StringEquals": {
                  "iam:AWSServiceName": "access-analyzer.amazonaws.com"
                }
              }
            } 
        ]
    }

  7. After you’ve attached the new policy, choose Next.

    Note: For a full explanation of each of these actions and a CloudFormation template example that you can use to create this role, see the IAM Policy Validator for AWS CloudFormation GitHub project.

  8. Give the role a name, and scroll down to look at Step 1: Select trusted entities.

    The default trust policy allows GitHub Actions workflows from organizations or repositories outside of your control to assume the role. To align with the IAM best practice of granting least privilege, scope it down so that only a specific GitHub organization and the repository that you created earlier can assume it.

  9. Replace the trust policy with the following, making sure to replace {AWSAccountID}, {GitHubOrg}, and {RepositoryName} with your own values.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": "arn:aws:iam::{AWSAccountID}:oidc-provider/token.actions.githubusercontent.com"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
                    },
                    "StringLike": {
                        "token.actions.githubusercontent.com:sub": "repo:${GitHubOrg}/${RepositoryName}:*"
                    }
                }
            }
        ]
    }

For information on best practices for configuring a role for the GitHub OIDC provider, see Creating a role for web identity or OpenID Connect Federation (console).

Checkpoint

At this point, you’ve created and configured the following resources:

  • A GitHub repository that has been locally cloned and filled with a sample CloudFormation template.
  • An IAM identity provider endpoint for use in conjunction with GitHub’s OIDC provider.
  • A role that can be assumed by GitHub Actions, and a set of associated permissions that allow the role to make requests to IAM Access Analyzer to validate policies.

Step 5: Create a definition for the GitHub Actions workflow

The workflow runs steps on hosted runners. For this example, we are going to use Ubuntu as the operating system for the hosted runners. The workflow runs the following steps on the runner:

  1. The workflow checks out the CloudFormation template by using the community actions/checkout action.
  2. The workflow then uses the aws-actions/configure-aws-credentials GitHub action to request a set of credentials through the IAM identity provider endpoint and the IAM role that you created earlier.
  3. The workflow installs the cfn-policy-validator tool by using the Python package manager, pip.
  4. The workflow runs a validation against the CloudFormation template by using the cfn-policy-validator tool.

The workflow is defined in a YAML document. In order for GitHub Actions to pick up the workflow, you need to place the definition file in a specific location within the repository: .github/workflows/main.yml. Note the “.” prefix in the directory name, indicating that this is a hidden directory.

To create the workflow:

  1. Use the following command to create the folder structure within the locally cloned repository:
    mkdir -p .github/workflows

  2. Create the sample workflow definition file in the .github/workflows directory. Make sure to replace <AWSAccountID> and <AWSRegion> with your own information.
    cat << EOF > .github/workflows/main.yml
    name: cfn-policy-validator-workflow
    
    on: push
    
    permissions:
      id-token: write
      contents: read
    
    jobs: 
      cfn-iam-policy-validation: 
        name: iam-policy-validation
        runs-on: ubuntu-latest
        steps:
          - name: Checkout code
            uses: actions/checkout@v3
    
          - name: Configure AWS Credentials
            uses: aws-actions/configure-aws-credentials@v2
            with:
              role-to-assume: arn:aws:iam::<AWSAccountID>:role/github-actions-access-analyzer-role
              aws-region: <AWSRegion>
              role-session-name: GitHubSessionName
            
          - name: Install cfn-policy-validator
            run: pip install cfn-policy-validator
    
          - name: Validate templates
            run: cfn-policy-validator validate --template-path ./sample-role.yaml --region <AWSRegion>
    EOF
    

Step 6: Test the setup

Now that everything has been set up and configured, it’s time to test.

To test the workflow and validate the IAM policy:

  1. Add and commit the changes to the local repository.
    git add .
    git commit -m 'added sample cloudformation template and workflow definition'

  2. Push the local changes to the remote GitHub repository.
    git push

    After the changes are pushed to the remote repository, go back to https://github.com and open the repository that you created earlier. In the top-right corner of the repository window, there is a small orange indicator, as shown in Figure 2. This shows that your GitHub Actions workflow is running.

    Figure 2: GitHub repository window with the orange workflow indicator

    Because the sample CloudFormation template used a wildcard value “*” in the principal element of the policy as described in the section Step 2: Clone the repository locally, the orange indicator turns to a red x (shown in Figure 3), which signals that something failed in the workflow.

    Figure 3: GitHub repository window with the red cross workflow indicator

  3. Choose the red x to see more information about the workflow’s status, as shown in Figure 4.
    Figure 4: Pop-up displayed after choosing the workflow indicator

  4. Choose Details to review the workflow logs.

    In this example, the Validate templates step in the workflow has failed. A closer inspection shows that there is a blocking finding with the CloudFormation template. As shown in Figure 5, the finding is labelled as EXTERNAL_PRINCIPAL and has a description of Trust policy allows access from external principals.

    Figure 5: Details logs from the workflow showing the blocking finding

    To remediate this blocking finding, you need to update the principal element of the trust policy to include a principal from your AWS account (considered a zone of trust). The resources and principals within your account comprise the zone of trust for the cfn-policy-validator tool. In the initial version of sample-role.yaml, the IAM role's trust policy used a wildcard in the Principal element. This allowed principals outside of your control to assume the associated role, which caused the cfn-policy-validator tool to generate a blocking finding.

    In this case, the intent is that principals within the current AWS account (zone of trust) should be able to assume this role. To achieve this result, replace the wildcard value with the account principal by following the remaining steps.

  5. Open sample-role.yaml by using your preferred text editor, such as nano.
    nano sample-role.yaml

    Replace the wildcard value in the principal element with the account principal arn:aws:iam::<AccountID>:root. Make sure to replace <AccountID> with your own AWS account ID.

    AWSTemplateFormatVersion: "2010-09-09"
    Description: Base stack to create a simple role
    Resources:
      SampleIamRole:
        Type: AWS::IAM::Role
        Properties:
          AssumeRolePolicyDocument:
            Statement:
              - Effect: Allow
                Principal:
                  AWS: "arn:aws:iam::<AccountID>:root"
                Action: ["sts:AssumeRole"]
          Path: /      
          Policies:
            - PolicyName: root
              PolicyDocument:
                Version: 2012-10-17
                Statement:
                  - Resource: "*"
                    Effect: Allow
                    Action:
                      - s3:GetObject

  6. Add the updated file, commit the changes, and push the updates to the remote GitHub repository.
    git add sample-role.yaml
    git commit -m 'replacing wildcard principal with account principal'
    git push

After the changes have been pushed to the remote repository, go back to https://github.com and open the repository. The orange indicator in the top right of the window should change to a green tick (check mark), as shown in Figure 6.

Figure 6: GitHub repository window with the green tick workflow indicator

This indicates that no blocking findings were identified, as shown in Figure 7.

Figure 7: Detailed logs from the workflow showing no more blocking findings

Conclusion

In this post, I showed you how to automate IAM policy validation by using GitHub Actions and the IAM Policy Validator for CloudFormation. Although the example was a simple one, it demonstrates the benefits of automating security testing at the start of the development lifecycle. This is often referred to as shifting security left. Identifying misconfigurations early and automatically supports an iterative, fail-fast model of continuous development and testing. Ultimately, this enables teams to make security an inherent part of a system’s design and architecture and can speed up product development workflows.

In addition to the example I covered today, IAM Policy Validator for CloudFormation can validate IAM policies by using a range of IAM Access Analyzer policy checks. For more information about these policy checks, see Access Analyzer reference policy checks.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Mitch Beaumont

Mitch is a Principal Solutions Architect for Amazon Web Services, based in Sydney, Australia. Mitch works with some of Australia’s largest financial services customers, helping them to continually raise the security bar for the products and features that they build and ship. Outside of work, Mitch enjoys spending time with his family, photography, and surfing.

Generate machine learning insights for Amazon Security Lake data using Amazon SageMaker

Post Syndicated from Jonathan Nguyen original https://aws.amazon.com/blogs/security/generate-machine-learning-insights-for-amazon-security-lake-data-using-amazon-sagemaker/

Amazon Security Lake automatically centralizes the collection of security-related logs and events from integrated AWS and third-party services. With the increasing amount of security data available, it can be challenging to know which data to focus on and which tools to use. You can use native AWS services such as Amazon QuickSight, Amazon OpenSearch Service, and Amazon SageMaker Studio to visualize, analyze, and interactively identify different areas of interest to focus on, and to prioritize efforts to increase your AWS security posture.

In this post, we go over how to generate machine learning insights for Security Lake using SageMaker Studio. SageMaker Studio is a web integrated development environment (IDE) for machine learning that provides tools for data scientists to prepare, build, train, and deploy machine learning models. With this solution, you can quickly deploy a base set of Python notebooks focusing on AWS Security Hub findings in Security Lake, which can also be expanded to incorporate other AWS sources or custom data sources in Security Lake. After you’ve run the notebooks, you can use the results to help you identify and focus on areas of interest related to security within your AWS environment. As a result, you might implement additional guardrails or create custom detectors to alert on suspicious activity.

Prerequisites

  1. Specify a delegated administrator account to manage the Security Lake configuration for all member accounts within your organization.
  2. Security Lake has been enabled in the delegated administrator AWS account.
  3. As part of the solution in this post, we focus on Security Hub as a data source. AWS Security Hub must be enabled for your AWS organization. When enabling Security Lake, select All log and event sources to include AWS Security Hub findings.
  4. Configure subscriber query access to Security Lake. Security Lake uses AWS Lake Formation cross-account table sharing to support subscriber query access. Accept the resource share request in the subscriber AWS account in AWS Resource Access Manager (AWS RAM). Subscribers with query access can query the data that Security Lake collects. These subscribers query Lake Formation tables in an Amazon Simple Storage Service (Amazon S3) bucket with Security Lake data using services such as Amazon Athena.
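If you prefer to script prerequisite 4, the pending resource share invitation can also be accepted with the AWS RAM API. The following is a hedged boto3 sketch run in the subscriber account; it assumes the only pending invitation is the one from Security Lake.

import boto3

# Assumed Region; use the Region where Security Lake shares its resources.
ram = boto3.client("ram", region_name="us-east-1")

invitations = ram.get_resource_share_invitations()["resourceShareInvitations"]
for invitation in invitations:
    if invitation["status"] == "PENDING":
        # Accept the pending Security Lake resource share invitation.
        ram.accept_resource_share_invitation(
            resourceShareInvitationArn=invitation["resourceShareInvitationArn"]
        )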

Solution overview

Figure 1 that follows depicts the architecture of the solution.

Figure 1: SageMaker machine learning insights architecture for Security Lake

The deployment builds the architecture by completing the following steps:

  1. Security Lake is set up in an AWS account with supported log sources configured, such as Amazon VPC Flow Logs, AWS Security Hub, AWS CloudTrail, and Amazon Route 53.
  2. Subscriber query access is created from the Security Lake AWS account to a subscriber AWS account.

    Note: See Prerequisite #4 for more information.

  3. The AWS RAM resource share request must be accepted in the subscriber AWS account where this solution is deployed.

    Note: See Prerequisite #4 for more information.

  4. A resource link database in Lake Formation is created in the subscriber AWS account and grants access for the Athena tables in the Security Lake AWS account.
  5. A VPC is provisioned for SageMaker with an internet gateway (IGW), a NAT gateway, and VPC endpoints for the AWS services used in the solution. The IGW and NAT gateway are required to install external open-source packages.
  6. A SageMaker Domain for SageMaker Studio is created in VPCOnly mode with a single SageMaker user profile that is tied to a dedicated AWS Identity and Access Management (IAM) role.
  7. A dedicated IAM role is created to restrict access to create and access the presigned URL for the SageMaker Domain from a specific CIDR for accessing the SageMaker notebook.
  8. An AWS CodeCommit repository containing Python notebooks is used for the AI and ML workflow by the SageMaker user-profile.
  9. An Athena workgroup is created for the Security Lake queries with an S3 bucket for output location (access logging configured for the output bucket).

Deploy the solution

You can deploy the SageMaker solution by using either the AWS Management Console or the AWS Cloud Development Kit (AWS CDK).

Option 1: Deploy the solution with AWS CloudFormation using the console

Use the console to sign in to your subscriber AWS account and then choose the Launch Stack button to open the AWS CloudFormation console pre-loaded with the template for this solution. It takes approximately 10 minutes for the CloudFormation stack to complete.


Option 2: Deploy the solution by using the AWS CDK

You can find the latest code for the SageMaker solution in the SageMaker machine learning insights GitHub repository, where you can also contribute to the sample code. For instructions and more information on using the AWS CDK, see Get Started with AWS CDK.

To deploy the solution by using the AWS CDK

  1. Navigate to the project’s root folder and use the following commands to build the app:
    npm install -g aws-cdk
    npm install

  2. Update IAM_role_assumption_for_sagemaker_presigned_url and security_lake_aws_account default values in source/lib/sagemaker_domain.ts with their respective appropriate values.
  3. Run the following commands in your terminal while authenticated in your subscriber AWS account. Be sure to replace <INSERT_AWS_ACCOUNT> with your account number and replace <INSERT_REGION> with the AWS Region that you want the solution deployed to.
    cdk bootstrap aws://<INSERT_AWS_ACCOUNT>/<INSERT_REGION>
    cdk deploy

Post deployment steps

Now that you’ve deployed the SageMaker solution, you must grant the SageMaker user profile in the subscriber AWS account query access to your Security Lake. You do this by granting the SageMaker user profile permissions to the Security Lake database and table in Lake Formation in the subscriber AWS account, as described in the next two procedures.

Grant permission to the Security Lake database

  1. Copy the SageMaker user profile Amazon Resource Name (ARN): arn:aws:iam::<account-id>:role/sagemaker-user-profile-for-security-lake
  2. Go to Lake Formation in the console.
  3. Select the amazon_security_lake_glue_db_us_east_1 database.
  4. From the Actions Dropdown, select Grant.
  5. In Grant Data Permissions, select SAML Users and Groups.
  6. Paste the SageMaker user profile ARN from Step 1.
  7. In Database Permissions, select Describe and then Grant.

Grant permission to Security Lake – Security Hub table

  1. Copy the SageMaker user profile ARN: arn:aws:iam::<account-id>:role/sagemaker-user-profile-for-security-lake
  2. Go to Lake Formation in the console.
  3. Select the amazon_security_lake_glue_db_us_east_1 database.
  4. Choose View Tables.
  5. Select the amazon_security_lake_table_us_east_1_sh_findings_1_0 table.
  6. From Actions Dropdown, select Grant.
  7. In Grant Data Permissions, select SAML Users and Groups.
  8. Paste the SageMaker user-profile ARN from Step 1.
  9. In Table Permissions, select Describe and then Grant.
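These two console procedures can also be scripted. The following is a hedged boto3 sketch that mirrors the Describe grants above; the account ID is an example placeholder, and the database and table names assume a deployment in us-east-1.

import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")  # assumed Region

# Example account ID; replace with the subscriber account that owns the user profile role.
principal = {
    "DataLakePrincipalIdentifier":
        "arn:aws:iam::111122223333:role/sagemaker-user-profile-for-security-lake"
}

# Describe permission on the Security Lake database
lf.grant_permissions(
    Principal=principal,
    Resource={"Database": {"Name": "amazon_security_lake_glue_db_us_east_1"}},
    Permissions=["DESCRIBE"],
)

# Describe permission on the Security Hub findings table
lf.grant_permissions(
    Principal=principal,
    Resource={"Table": {
        "DatabaseName": "amazon_security_lake_glue_db_us_east_1",
        "Name": "amazon_security_lake_table_us_east_1_sh_findings_1_0",
    }},
    Permissions=["DESCRIBE"],
)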

Launch your SageMaker Studio application

Now that you have granted permissions for a SageMaker user-profile, we can move on to launching the SageMaker application associated to that user-profile.

  1. Navigate to the SageMaker Studio domain in the console.
  2. Select the SageMaker domain security-lake-ml-insights-<account-id>.
  3. Select the SageMaker user profile sagemaker-user-profile-for-security-lake.
  4. Select the Launch drop-down and select Studio.
    Figure 2: SageMaker domain user-profile AWS console screen

Clone Python notebooks

You’ll work primarily in a data science app created within the SageMaker user profile. As part of the solution deployment, we’ve created Python notebooks in CodeCommit that you will need to clone.

To clone the Python notebooks

  1. Navigate to CloudFormation in the console.
  2. In the Stacks section, select the SageMakerDomainStack.
  3. Select the Outputs tab.
  4. Copy the value for sagemakernotebookmlinsightsrepositoryURL. (For example: https://git-codecommit.us-east-1.amazonaws.com/v1/repos/sagemaker_ml_insights_repo)
  5. Go back to your SageMaker app.
  6. In Studio, in the left sidebar, choose the Git icon (identified by a diamond with two branches), then choose Clone a Repository.
    Figure 3: SageMaker clone CodeCommit repository

  7. Paste the CodeCommit repository link from Step 4 under the Git repository URL (git). After you paste the URL, select Clone “https://git-codecommit.us-east-1.amazonaws.com/v1/repos/sagemaker_ml_insights_repo”, then select Clone.

    NOTE: If you don’t select from the auto-populated drop-down, SageMaker won’t be able to clone the repository.

    Figure 4: SageMaker clone CodeCommit URL

Generating machine learning insights using SageMaker Studio

You’ve successfully pulled the base set of Python notebooks into your SageMaker app, and they can be accessed at sagemaker_ml_insights_repo/notebooks/tsat/. The notebooks provide you with a starting point for running machine learning analysis using Security Lake data. They can be expanded to cover other native or custom data sources being sent to Security Lake.

Figure 5: SageMaker cloned Python notebooks

Notebook #1 – Environment setup

The 0.0-tsat-environ-setup notebook handles the installation of the required libraries and dependencies needed for the subsequent notebooks within this blog. For our notebooks, we use an open-source Python library called Kats, which is a lightweight, generalizable framework to perform time series analysis.

  1. Select the 0.0-tsat-environ-setup.ipynb notebook for the environment setup.

    Note: If you have already provisioned a kernel, you can skip steps 2 and 3.

  2. In the right-hand corner, select No Kernel
  3. In the Set up notebook environment pop-up, leave the defaults and choose Select.
    Figure 6: SageMaker application environment settings

  4. After the kernel has successfully started, choose the Terminal icon to open the image terminal.
    Figure 7: SageMaker application terminal

  5. To install open-source packages from https instead of http, you must update the sources.list file. After the terminal opens, run the following commands:
    cd /etc/apt
    sed -i 's/http:/https:/g' sources.list

  6. Go back to the 0.0-tsat-environ-setup.ipynb notebook, select the Run drop-down, and select Run All Cells. Alternatively, you can run each cell independently, but it’s not required. Grab a coffee! This step will take about 10 minutes.

    IMPORTANT: If you complete the installation out of order or update the requirements.txt file, you might not be able to successfully install Kats and you will need to rebuild your environment by using a net-new SageMaker user profile.

  7. After installing all the prerequisites, check the Kats version to determine if it was successfully installed.
    Figure 8: Kats installation verification

  8. Install PyAthena (a Python DB API client for Amazon Athena), which is used to query your data in Security Lake.

You’ve successfully set up the SageMaker app environment! You can now load the appropriate dataset and create a time series.

Notebook #2 – Load data

The 0.1-load-data notebook establishes the Athena connection to query data in Security Lake and creates the resulting time series dataset. The time series dataset will be used for subsequent notebooks to identify trends, outliers, and change points.
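Before stepping through the cells, it may help to see the shape of what this notebook sets up. The following is a hedged PyAthena sketch, not the notebook's exact code: the staging bucket and Region are placeholders, and the OCSF time column (epoch milliseconds) is an assumption about the findings schema, while the database and table names match the grants from the post-deployment steps.

from pyathena import connect

# Placeholders: use your Athena query results bucket and the Region where
# Security Lake and this solution are deployed.
con = connect(
    s3_staging_dir="s3://<athena-query-results-bucket>/",
    region_name="us-east-1",
)
cursor = con.cursor()

# Count Security Hub findings per day to build a simple time series.
cursor.execute("""
    SELECT date_trunc('day', from_unixtime("time" / 1000)) AS day,
           count(*) AS findings
    FROM "amazon_security_lake_glue_db_us_east_1"."amazon_security_lake_table_us_east_1_sh_findings_1_0"
    GROUP BY 1
    ORDER BY 1
""")
daily_counts = cursor.fetchall()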

  1. Select the 0.1-load-data.ipynb notebook.
  2. If you deployed the solution outside of us-east-1, update the connection (con) details to the appropriate Region. In this example, we’re focusing on Security Hub data within Security Lake. If you want to change the underlying data source, you can update the TABLE value.
    Figure 9: SageMaker notebook load Security Lake data settings

  3. In the Query section, there’s an Athena query that pulls specific data from Security Hub. You can narrow it to a subset of products or expand it to include all products within Security Hub. The query below pulls Security Hub information after 01:00:00 on January 1, 2022, from the products listed in productname.
    Figure 10: SageMaker notebook Athena query

  4. After the values have been updated, you can create your time series dataset. For this notebook, we recommend running each cell individually instead of running all cells at once so you can get a bit more familiar with the process. Select the first cell and choose the Run icon.
    Figure 11: SageMaker run Python notebook code

  5. Follow the same process as Step 4 for the subsequent cells.

    Note: If you encounter any issues with querying the table, make sure you completed the post-deployment step for Grant permission to Security Lake – Security Hub table.

You’ve successfully loaded your data and created a time series! You can now move on to generating machine learning insights from your time series.

Notebook #3 – Trend detector

The 1.1-trend-detector.ipynb notebook handles trend detection in your data. Trend represents a directional change in the level of a time series. This directional change can be either upward (increase in level) or downward (decrease in level). Trend detection helps detect a change while ignoring the noise from natural variability. Each environment is different, and trends help us identify where to look more closely to determine why a trend is positive or negative.
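The slope computation described in the following steps amounts to fitting a line to counts over time. Here is a minimal NumPy sketch of that idea, not the notebook's actual implementation (which uses Kats); the input counts are made up for illustration.

import numpy as np

# Assumed input: per-day finding counts for one finding type, ordered by day.
counts = [12, 15, 14, 18, 21, 19, 24, 26]

x = np.arange(len(counts), dtype=float)  # time index
y = np.array(counts, dtype=float)        # daily counts

slope, intercept = np.polyfit(x, y, 1)   # least-squares line y = slope * x + intercept
print("positive trend" if slope > 0 else "negative trend", round(slope, 3))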

  1. Select 1.1-trend-detector.ipynb notebook for trend detection.
  2. Slopes are created to identify the relationship between x (time) and y (counts).
    Figure 12: SageMaker notebook slope view

  3. If the counts are increasing with time, then it’s considered a positive slope and the reverse is considered a negative slope. A positive slope isn’t necessarily a good thing because in an ideal state we would expect counts of a finding type to come down with time.
    Figure 13: SageMaker notebook trend view

  4. Now you can plot the top five positive and negative trends to identify the top movers.
    Figure 14: SageMaker notebook trend results view

Notebook #4 – Outlier detection

The 1.2-outlier-detection.ipynb notebook handles outlier detection. This notebook does a seasonal decomposition of the input time series, with additive or multiplicative decomposition as specified (the default is additive). It then works on the residual time series, removing only the trend, or both trend and seasonality if the seasonality is strong. The intent is to discover useful, abnormal, and irregular patterns within data sets, allowing you to pinpoint areas of interest.
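For orientation, here is a hedged sketch of how the open-source Kats OutlierDetector is typically invoked. The notebook's cells and its iqr_mult_thresh variable may differ from this, and the input DataFrame here is a stand-in for the time series built by the 0.1-load-data notebook.

import pandas as pd
from kats.consts import TimeSeriesData
from kats.detectors.outlier import OutlierDetector

# Assumed input shape: a DataFrame with 'time' and 'value' columns, such as
# per-day Security Hub finding counts.
df = pd.DataFrame({
    "time": pd.date_range("2022-01-01", periods=90, freq="D"),
    "value": range(90),
})
ts = TimeSeriesData(df)

# Additive decomposition of the series; iqr_mult controls how many IQRs a
# residual must exceed before it is flagged (a larger value flags fewer points).
detector = OutlierDetector(ts, "additive", iqr_mult=5)
detector.detector()
print(detector.outliers)  # timestamps flagged as outliers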

  1. To start, it detects points in the residual that are over 5 times the inter-quartile range.
  2. Inter-quartile range (IQR) is the difference between the seventy-fifth and twenty-fifth percentiles of residuals or the spread of data within the middle two quartiles of the entire dataset. IQR is useful in detecting the presence of outliers by looking at values that might lie outside of the middle two quartiles.
  3. The IQR multiplier controls the sensitivity of the range and the decision to identify outliers. A larger value for the iqr_mult_thresh parameter in OutlierDetector flags fewer data points as outliers, while a smaller value flags more data points as outliers.

    Note: If you don’t have enough data, decrease iqr_mult_thresh to a lower value (for example iqr_mult_thresh=3).

    Figure 15: SageMaker notebook outlier setting

  4. Along with the outlier detection plots, investigation SQL is displayed, which can help you investigate the outliers further.

    In the diagram that follows, you can see that there are several outliers in the number of findings, related to failed AWS Firewall Manager policies, which have been identified by the vertical red lines within the line graph. These are outliers because they deviate from the normal behavior and number of findings on a day-to-day basis. When you see outliers, you can look at the resources that might have caused an unusual increase in Firewall Manager policy findings. Depending on the findings, it could be related to an overly permissive or noncompliant security group or a misconfigured AWS WAF rule group.

    Figure 16: SageMaker notebook outlier results view

Notebook #5 – Change point detection

The 1.3-changepoint-detector.ipynb notebook handles change point detection. Change point detection is a method to detect changes in a time series that persist over time, such as a change in the mean value. The notebook establishes a baseline and then identifies when one or more changes might have occurred from that point. Change points occur when there’s an increase or decrease in the average number of findings within a data set.

  1. Along with identifying change points within the data set, the investigation SQL is generated to further investigate the specific change point if applicable.

    In the following diagram, you can see there’s a change point decrease after July 27, 2022, with confidence of 99.9 percent. It’s important to note that change points differ from outliers, which are sudden changes in the data set observed. This diagram means there was some change in the environment that resulted in an overall decrease in the number of findings for S3 buckets with block public access being disabled. The change could be the result of an update to the CI/CD pipelines provisioning S3 buckets or automation to enable all S3 buckets to block public access. Conversely, if you saw a change point that resulted in an increase, it could mean that there was a change that resulted in a larger number of S3 buckets with a block public access configuration consistently being disabled.

    Figure 17: SageMaker changepoint detector view

By now, you should be familiar with the setup and deployment of SageMaker Studio and how you can use Python notebooks to generate machine learning insights for your Security Lake data. You can take what you’ve learned and start to curate specific datasets and data sources within Security Lake, create a time series, detect trends, and identify outliers and change points. By doing so, you can answer a variety of security-related questions such as:

  • CloudTrail

    Is there a large volume of Amazon S3 download or copy commands to an external resource? Are you seeing a large volume of S3 delete object commands? Is it possible there’s a ransomware event going on?

  • VPC Flow Logs

    Is there an increase in the number of requests from your VPC to external IPs? Is there an increase in the number of requests from your VPC to your on-premises CIDR? Is there a possibility of internal or external data exfiltration occurring?

  • Route53

    Which resources are making DNS requests that they haven’t typically made within the last 30–45 days? When did it start? Is there a potential command and control session occurring on an Amazon Elastic Compute Cloud (Amazon EC2) instance?

It’s important to note that this isn’t a solution to replace Amazon GuardDuty, which uses foundational data sources to detect communication with known malicious domains and IP addresses and identify anomalous behavior, or Amazon Detective, which provides customers with prebuilt data aggregations, summaries, and visualizations to help security teams conduct faster and more effective investigations. One of the main benefits of using Security Lake and SageMaker Studio is the ability to interactively create and tailor machine learning insights specific to your AWS environment and workloads.

Clean up

If you deployed the SageMaker machine learning insights solution by using the Launch Stack button in the AWS Management Console or the CloudFormation template sagemaker_ml_insights_cfn, do the following to clean up:

  1. In the CloudFormation console for the account and Region where you deployed the solution, choose the SageMakerML stack.
  2. Choose the option to Delete the stack.

If you deployed the solution by using the AWS CDK, run the command cdk destroy.

Conclusion

Amazon Security Lake gives you the ability to normalize and centrally store your security data from various log sources to help you analyze, visualize, and correlate appropriate security logs. You can then use this data to increase your overall security posture by implementing additional security guardrails or take appropriate remediation actions within your AWS environment.

In this blog post, you learned how you can use SageMaker to generate machine learning insights for your Security Hub findings in Security Lake. Although the example solution focuses on a single data source within Security Lake, you can expand the notebooks to incorporate other native or custom data sources in Security Lake.

There are many different use-cases for Security Lake that can be tailored to fit your AWS environment. Take a look at this blog post to learn how you can ingest, transform and deliver Security Lake data to Amazon OpenSearch to help your security operations team quickly analyze security data within your AWS environment. In supported Regions, new Security Lake account holders can try the service free for 15 days and gain access to its features.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Jonathan Nguyen

Jonathan is a Principal Security Architect at AWS. His background is in AWS security, with a focus on threat detection and incident response. He helps enterprise customers develop a comprehensive AWS security strategy, deploy security solutions at scale, and train customers on AWS security best practices.

Madhunika Reddy Mikkili

Madhunika is a Data and Machine Learning Engineer with the AWS Professional Services Shared Delivery Team. She is passionate about helping customers achieve their goals through the use of data and machine learning insights. Outside of work, she loves traveling and spending time with family and friends.

How we designed Cedar to be intuitive to use, fast, and safe

Post Syndicated from Emina Torlak original https://aws.amazon.com/blogs/security/how-we-designed-cedar-to-be-intuitive-to-use-fast-and-safe/

This post is a deep dive into the design of Cedar, an open source language for writing and evaluating authorization policies. Using Cedar, you can control access to your application’s resources in a modular and reusable way. You write Cedar policies that express your application’s permissions, and the application uses Cedar’s authorization engine to decide which access requests to allow. This decouples access control from the application logic, letting you write, update, audit, and reuse authorization policies independently of application code.

Cedar’s authorization engine is built to a high standard of performance and correctness. Application developers report typical authorization latencies of less than 1 ms, even with hundreds of policies. The resulting authorization decision — Allow or Deny — is provably correct, thanks to the use of verification-guided development. This high standard means your application can use Cedar with confidence, just like Amazon Web Services (AWS) does as part of the Amazon Verified Permissions and AWS Verified Access services.

Cedar’s design is based on three core tenets: usability, speed, and safety. Cedar policies are intuitive to read because they’re defined using your application’s vocabulary—for example, photos organized into albums for a photo-sharing application. Cedar’s policy structure reflects common authorization use cases and enables fast evaluation. Cedar’s semantics are intuitive and safer by default: policies combine to allow or deny access according to rules you already know from AWS Identity and Access Management (IAM).

This post shows how Cedar’s authorization semantics, data model, and policy syntax work together to make the Cedar language intuitive to use, fast, and safe. We cover each of these in turn and highlight how their design reflects our tenets.

The Cedar authorization semantics: Default deny, forbid wins, no ordering

We show how Cedar works on an example application for sharing photos, called PhotoFlash, illustrated in Figure 1.

Figure 1: An example PhotoFlash account. User Jane has two photos, four albums, and three user groups

PhotoFlash lets users like Jane upload photos to the cloud, tag them, and organize them into albums. Jane can also share photos with others, for example, letting her friends view photos in her trips album. PhotoFlash provides a point-and-click interface for users to share access, and then stores the resulting permissions as Cedar policies.

When a user attempts to perform an action on a resource (for example, view a photo), PhotoFlash calls the Cedar authorization engine to determine whether access is allowed. The authorizer evaluates the stored policies against the request and application-specific data (such as a photo’s tags) and returns Allow or Deny. If it returns Allow, PhotoFlash proceeds with the action. If it returns Deny, PhotoFlash reports that the action is not permitted.

Let’s look at some policies and see how Cedar evaluates them to authorize requests safely and simply.

Default deny

To let Jane’s friends view photos in her trips album, PhotoFlash generates and stores the following Cedar permit policy:

// Policy A: Jane's friends can view photos in Jane's trips album.
permit(
  principal in Group::"jane/friends", 
  action == Action::"viewPhoto",
  resource in Album::"jane/trips");

Cedar policies define who (the principal) can do what (the action) on what asset (the resource). This policy allows the principal (a PhotoFlash User) in Jane’s friends group to view the resources (a Photo) in Jane’s trips album.

Cedar’s authorizer grants access only if a request satisfies a specific permit policy. This semantics is default deny: Requests that don’t satisfy any permit policy are denied.

Given only our example Policy A, the authorizer will allow Alice to view Jane’s flower.jpg photo. Alice’s request satisfies Policy A because Alice is one of Jane’s friends (see Figure 1). But the authorizer will deny John’s request to view this photo. That’s because John isn’t one of Jane’s friends, and there is no other permit that grants John access to Jane’s photos.

Forbid wins

While PhotoFlash allows individual users to choose their own permissions, it also enforces system-wide security rules.

For example, PhotoFlash wants to prevent users from performing actions on resources that are owned by someone else and tagged as private. If a user (Jane) accidentally permits someone else (Alice) to view a private photo (receipt.jpg), PhotoFlash wants to override the user-defined permission and deny the request.

In Cedar, such guardrails are expressed as forbid policies:

// Policy B: Users can't perform any actions on private resources they don't own.
forbid(principal, action, resource)
when {
  resource.tags.contains("private") &&
  !(resource in principal.account)
};

This PhotoFlash policy says that a principal is forbidden from taking an action on a resource when the resource is tagged as private and isn’t contained in the principal’s account.

Cedar’s authorizer makes sure that forbids override permits. If a request satisfies a forbid policy, it’s denied regardless of what permissions are satisfied.

For example, the authorizer will deny Alice’s request to view Jane’s receipt.jpg photo. This request satisfies Policy A because Alice is one of Jane’s friends. But it also satisfies the guardrail in Policy B because the photo is tagged as private. The guardrail wins, and the request is denied.

No ordering

Cedar’s authorization decisions are independent of the order the policies are evaluated in. Whether the authorizer evaluates Policy A first and then Policy B, or the other way around, doesn’t matter. As you’ll see later, the Cedar language design ensures that policies can be evaluated in any order to reach the same authorization decision. To understand the combined meaning of multiple Cedar policies, you need only remember that access is allowed if the request satisfies a permit policy and there are no applicable forbid policies.

Safe by default and intuitive

We’ve proved (using automated reasoning) that Cedar’s authorizer satisfies the default deny, forbids override permits, and order independence properties. These properties help make Cedar’s behavior safe by default and intuitive. Amazon IAM has the same properties. Cedar builds on more than a decade of IAM experience by formalizing and enforcing these properties as parts of its design.

Now that we’ve seen how Cedar authorizes requests, let’s look at how its data model and syntax support writing policies that are quick to read and evaluate.

The Cedar data model: entities with attributes, arranged in a hierarchy

Cedar policies are defined in terms of a vocabulary specific to your application. For example, PhotoFlash organizes photos into albums and users into groups while a task management application organizes tasks into lists. You reflect this vocabulary into Cedar’s data model, which organizes entities into a hierarchy. Entities correspond to objects within your application, such as photos and users. The hierarchy reflects grouping of entities, such as nesting of photos into albums. Think of it as a directed-acyclic graph. Figure 2 shows the entity hierarchy for PhotoFlash that matches Figure 1.

Figure 2: An example hierarchy for PhotoFlash, matching the illustration in Figure 1

Entities are stored objects that serve as principals, resources, and actions in Cedar policies. Policies refer to these objects using entity references, such as Album::"jane/art".

Policies use the in operator to check if the hierarchy relates two entities. For example, Photo::"flower.jpg" in Account::"jane" is true for the hierarchy in Figure 2, but Photo::"flower.jpg" in Album::"jane/conference" is not. PhotoFlash can persist the entity hierarchy in a dedicated entity store, or compute the relevant parts as needed for an authorization request.

Each entity also has a record that maps named attributes to values. An attribute stores a Cedar value: an entity reference, record, string, 64-bit integer, boolean, or a set of values. For example, Photo::"flower.jpg" has attributes describing the photo’s metadata, such as tags, which is a set of strings, and raw, which is an entity reference to another Photo. Cedar supports a small collection of operators that can be applied to values; these operators are carefully chosen to enable efficient evaluation.

Built-in support for role and attribute-based access control

If the concepts you’ve seen so far seem familiar, that’s not surprising. Cedar’s data model is designed to allow you to implement time-tested access control models, including role-based and attribute-based access control (RBAC and ABAC). The entity hierarchy and the in operator support RBAC-style roles as groups, while entity records and the . operator let you express ABAC-style permissions using per-object attributes.

The Cedar syntax: Structured, loop-free, and stateless

Cedar uses a simple, structured syntax for writing policies. This structure makes Cedar policies simple to understand and fast to authorize at scale. Let’s see how by taking a closer look at Cedar’s syntax.

Structure for readability and scalable authorization

Figure 3 illustrates the structure of Cedar policies: an effect and scope, optionally followed by one or more conditions.

The effect of a policy is to either permit or forbid access. The scope can use equality (==) or membership (in) constraints to restrict the principals, actions, and resources to which the policy applies. Policy conditions are expressions that further restrict when the policy applies.

This structure makes policies straightforward to read and understand: The scope expresses an RBAC rule, and the conditions express ABAC rules. For example, PhotoFlash Policy A has no conditions and expresses a single RBAC rule. Policy B has an open (unconstrained) scope and expresses a single ABAC rule. A quick glance is enough to see if a policy is just an RBAC rule, just an ABAC rule, or a mix of both.

Figure 3: Cedar policy structure, illustrated on PhotoFlash Policy A and B

Scopes also enable scalable authorization for large policy stores through policy slicing. This is a property of Cedar that lets applications authorize a request against a subset of stored policies, supporting real-time decisions even for stores with thousands of policies. With slicing, an application needs to pass a policy to the authorizer only when the request’s principal and resource are descendants of the principal and resource entities specified in the policy’s scope. For example, PhotoFlash needs to include Policy A only for requests that involve the descendants of Group::"jane/friends" and Album::"jane/trips". But Policy B must be included for all requests because of its open scope.

No loops or state for fast evaluation and intuitive decisions

Policy conditions are Boolean-valued expressions. The Cedar expression language has a familiar syntax that includes if-then-else expressions, short-circuiting Boolean operators (!, &&, ||), and basic operations on Cedar values. Notably, there is no way to express looping or to change the application state (for example, mutate an attribute).

Cedar excludes loops to bound authorization latency. With no loops or costly built-in operators, Cedar policies terminate in O(n²) steps in the worst case (when conditions contain certain set operations), or O(n) in the common case.

Cedar also excludes stateful operations for performance and understandability. Since policies can’t change the application state, their evaluation can be parallelized for better performance, and you can reason about them in any order to see what accesses are allowed.

Learn more

In this post, we explored how Cedar’s design supports intuitive, fast, and safe authorization. With Cedar, your application’s access control rules become standalone policies that are clear, auditable, and reusable. You enforce these policies by calling Cedar’s authorizer to decide quickly and safely which requests are allowed. To learn more, see how to use Cedar to secure your app, and how we built Cedar to a high standard of assurance. You can also visit the Cedar website and blog, try it out in the Cedar playground, and join us on Cedar’s Slack channel.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Emina Torlak

Emina is a Senior Principal Applied Scientist at Amazon Web Services and an Associate Professor at the University of Washington. Her research aims to help developers build better software more easily. She develops languages and tools for program verification and synthesis. Emina co-leads the development of Cedar.

Amazon CloudWatch metrics for Amazon OpenSearch Service storage and shard skew health

Post Syndicated from Nikhil Agarwal original https://aws.amazon.com/blogs/big-data/amazon-cloudwatch-metrics-for-amazon-opensearch-service-storage-and-shard-skew-health/

Amazon OpenSearch Service is a managed service that makes it easy to deploy, operate, and scale OpenSearch clusters in AWS to perform interactive log analytics, real-time application monitoring, website search, and more. OpenSearch is an open source, distributed search and analytics suite.

When working with OpenSearch Service, shard strategy is key. Shards distribute your workload across the data nodes of your cluster. When creating an index, you tell OpenSearch Service how many primary shards to create and how many replicas to create of each shard. The primary shards are independent partitions of the full dataset. OpenSearch Service automatically distributes your data across the primary shards in an index. Our recommendation is to use two replicas for your index. For example, if you set your index’s shard count to three primary shards and two replicas, you will have a total of nine shards. Properly configured indexes can help boost overall domain performance, whereas a misconfigured index will lead to storage and performance skew.
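As a quick check on that arithmetic, here is a tiny sketch with hypothetical index settings; the variable names are illustrative and are not OpenSearch API parameters.

# Hypothetical index settings matching the example above
number_of_primary_shards = 3
number_of_replicas = 2  # replica copies of each primary shard

total_shards = number_of_primary_shards * (1 + number_of_replicas)
print(total_shards)  # 9: three primaries plus two replica copies of each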

OpenSearch Service distributes the shards in your indexes to the data nodes in your domain, ensuring that no primary shard and its replicas are placed on the same node. The data for the shards is stored in the node’s storage. If your indexes (and therefore their shards) are of very different sizes, the storage used on the data nodes in the domain will be unequal, or skewed. Storage skew leads to uneven memory and CPU utilization, intermittent and uneven latency, and uneven queueing and rejection of requests. Therefore, it’s important to configure and maintain indexes so that shards can be distributed evenly across the data nodes of your cluster.

In this post, we explore how to deploy Amazon CloudWatch metrics using an AWS CloudFormation template to monitor an OpenSearch Service domain’s storage and shard skew. This solution uses an AWS Lambda function to extract storage and shard distribution metadata from your OpenSearch Service domain, calculates the level of skew, and then pushes this information to CloudWatch metrics so that you can easily monitor, alert, and respond.

Solution overview

The solution and associated resources are available for you to deploy into your own AWS account as a CloudFormation template. The template deploys the following resources:

  • An AWS Identity and Access Management (IAM) role for the Lambda function called OpensearchSkewMetricsLambdaRole. This allows write access to CloudWatch metrics and access to the CloudWatch log group and OpenSearch APIs.
  • An AWS Lambda function called Opensearch-SkewMetricsPublisher-py.
  • An Amazon CloudWatch log group for the Lambda function called /aws/lambda/Opensearch-skewmetrics-publisher-py.
  • An Amazon EventBridge rule for the Lambda function called EventRuleForOSSkew.
  • The following CloudWatch metrics for the Lambda function:
    • aws_/<region-name>/<MetricIdentifier>/_storagemetric
    • aws_/<region-name>/<MetricIdentifier>/_shardmetric

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account.
  • An OpenSearch Service domain.
  • This post requires you to add a Lambda role to the OpenSearch Service domain’s security configuration access policy. If your domain is using fine-grained access control, then you need to follow the steps as described in the section Mapping roles to users to enable access for the newly deployed Lambda execution role to the domain after deploying the CloudFormation template.

Deploy the CloudFormation template

To deploy the CloudFormation template, complete the following steps:

  1. Log in to your AWS account.
  2. Select the Region where you’re running your OpenSearch Service domain.
  3. To launch your CloudFormation stack, choose Launch Stack.
  4. For Stack name, enter a name for the stack (maximum length 30 characters).
  5. For MetricIdentifier, enter a unique identifier that will help you identify the custom CloudWatch metrics for your domain.
  6. For OpensearchDomainURL, enter the domain endpoint that you are monitoring.
  7. Choose Next.
  8. Select I acknowledge that AWS CloudFormation might create IAM resources, then choose Create stack.
  9. Wait for the stack creation to complete.
  10. On the Lambda console, choose Functions in the navigation pane.
  11. Choose the Lambda function called Opensearch-SkewMetricsPublisher-py-<stackname>.
  12. In the Code section, choose Test.
  13. Keep the default values for the test event and run a quick test.

Make sure to grant the Lambda execution role permission in the OpenSearch Service domain’s resource-based policy, if you are using one. If fine-grained access control is enabled on the domain, then follow the steps in Mapping roles to users (as mentioned in the prerequisites) to give the Lambda function read-only access to the domain.

The Lambda function that sends OpenSearch domain metrics to CloudWatch is set to a default frequency of 1 day. You can change this configuration to monitor the domain at the required granularity by updating the event schedule for the rule deployed by the CloudFormation stack on the EventBridge console. Note that if the frequency is set to 1 minute, this will trigger the Lambda function every minute and will increase the Lambda cost.
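If you prefer to change the schedule programmatically instead of through the EventBridge console, a minimal boto3 sketch might look like the following. It assumes the rule keeps the EventRuleForOSSkew name from the template; verify the actual rule name in your account, because the deployed name may include a stack-specific suffix.

import boto3

events = boto3.client("events")

# Hypothetical example: run the skew-publishing Lambda function every 12 hours.
# Confirm the deployed rule name before running this.
events.put_rule(
    Name="EventRuleForOSSkew",
    ScheduleExpression="rate(12 hours)",
    State="ENABLED",
)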

This solution uses the _cat/allocation API, which provides the number of data nodes in the domain along with each data node’s shard count and storage usage attributes. For further details on domain storage and shard skew, refer to Node shard and storage skew. The Lambda function computes and sorts each data node’s storage and shard skew relative to the average value. A data node whose skew is more than 10% above the average is generally considered significantly skewed. This starts to impact CPU, network, and disk bandwidth usage, because the nodes with the highest storage utilization tend to be the resource-strained nodes, whereas nodes well below the average represent underutilized capacity.
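To make the skew calculation concrete, the following is a minimal sketch, not the deployed Lambda code, of how per-node deviation from the domain average could be computed and published as custom CloudWatch metrics. The node data, metric name, and namespace are assumptions for illustration; the real function parses this information from the _cat/allocation response and uses names derived from MetricIdentifier.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Assumed to be parsed from the domain's _cat/allocation?format=json response.
nodes = [
    {"node_id": "node-1", "disk_used_gb": 120.0, "shards": 42},
    {"node_id": "node-2", "disk_used_gb": 95.0, "shards": 40},
    {"node_id": "node-3", "disk_used_gb": 60.0, "shards": 20},
]

avg_storage = sum(n["disk_used_gb"] for n in nodes) / len(nodes)

metric_data = []
for node in nodes:
    # Percent deviation of this node's storage from the domain average;
    # values more than 10% above the average indicate significant storage skew.
    skew_pct = (node["disk_used_gb"] - avg_storage) / avg_storage * 100
    metric_data.append({
        "MetricName": "StorageSkewPercent",
        "Dimensions": [{"Name": "NodeID", "Value": node["node_id"]}],
        "Value": skew_pct,
        "Unit": "Percent",
    })

# Hypothetical namespace for illustration only.
cloudwatch.put_metric_data(Namespace="OpenSearchSkewDemo", MetricData=metric_data)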

Refer to Demystifying Elasticsearch shard allocation for details related to shard size and shard count strategy. In general, we recommend keeping shard sizes between 10–30 GB for workloads where search latency is a key performance objective and 30–50 GB for write-heavy workloads. For shard count, we recommend maintaining index shard counts that are divisible by the data node count. For additional details, refer to Sizing Amazon OpenSearch Service domains and Shard strategy.

View skew metrics in CloudWatch

After you run this solution in your account, it will create two CloudWatch metrics for monitoring. To access these CloudWatch metrics, use the following steps:

  1. On the CloudWatch console, under Metrics in the navigation pane, choose All metrics.
  2. Choose Browse and select Custom namespaces. You should see two custom metrics ending with _storagemetric and _shardmetric, respectively.
  3. Choose either of the custom metrics and then select NodeID.
  4. On the list of node IDs, select all the nodes displayed in the list, and the graph will be plotted automatically.

You can hover the mouse over the plotted lines to see the node skew information.

The following screenshots show examples of how the CloudWatch metrics will appear on the console.

The storage skew metrics will be similar to the following screenshot. Storage skew metrics show the domain storage skew. If you hover over the graph, it shows the list of available nodes in the domain, sorted by storage size (largest to smallest). The Lambda function will periodically post the latest storage skew results.

The shard skew metrics will be similar to the following screenshot. Shard skew metrics show the domain shard skew. If you hover over the graph, it shows the list of available nodes in the domain, sorted by shard count (largest to smallest). The Lambda function will periodically post the latest shard skew results.

Storage skew occurs when one or more nodes within the domain have significantly more storage than the other nodes. The CloudWatch metric will show a higher deviation of storage usage for these nodes compared to the others. Similarly, shard skew occurs when one or more nodes have significantly more shards than the other nodes, and the CloudWatch metric will show a higher deviation for these nodes. When storage or shard skew is detected in the domain, you can raise a support case to work with the AWS team on remediation actions. See How do I rebalance the uneven shard distribution in my Amazon OpenSearch Service cluster for information on how to configure your domain shard strategy for optimal performance.

Costs

The cost associated with using this solution is minimal, around a few cents per month, because it generates CloudWatch metrics. The solution also runs Lambda code, and in this case the Lambda functions make API calls. For pricing details, refer to Amazon CloudWatch Pricing and AWS Lambda Pricing.

Clean up

If you decide that you no longer want to keep the Lambda function and associated resources, you can navigate to the AWS CloudFormation console, choose the stack, and choose Delete.

If you want to add the CloudWatch skew monitor metrics mechanism back in at any point, you can create the stack again from the CloudFormation template.

Conclusion

You can use this solution to get a better understanding of your OpenSearch Service domain’s storage and shard skew to improve its performance and possibly lower the cost of operating your domain. See Use Elasticsearch’s _rollover API For efficient storage distribution for more details related to shard allocation and efficient storage distribution strategy.


About the authors

Nikhil Agarwal is a Sr. Technical Manager with Amazon Web Services. He is passionate about helping customers achieve operational excellence in their cloud journey and working actively on technical solutions. He is also an AI/ML enthusiast and dives deep into customers’ ML-specific use cases. Outside of work, he enjoys traveling with family and exploring different gadgets.

Karthik Chemudupati is a Principal Technical Account Manager (TAM) with AWS, focused on helping customers achieve cost optimization and operational excellence. He has more than 19 years of IT experience in software engineering, cloud operations, and automation. Karthik joined AWS in 2016 as a TAM and has worked with more than a dozen enterprise customers across US-West. Outside of work, he enjoys spending time with his family.

Gene Alpert is a Senior Analytics Specialist with AWS Enterprise Support. He has been focused on our Amazon OpenSearch Service customers and ecosystem for the past three years. Gene joined AWS in 2017. Outside of work he enjoys mountain biking, traveling, and playing Population:One in VR.

How to Connect Your On-Premises Active Directory to AWS Using AD Connector

Post Syndicated from Jeremy Cowan original https://aws.amazon.com/blogs/security/how-to-connect-your-on-premises-active-directory-to-aws-using-ad-connector/

August 17, 2023: We updated the instructions and screenshots in this post to align with changes to the AWS Management Console.

April 25, 2023: We’ve updated this blog post to include more security learning resources.


AD Connector is designed to give you an easy way to establish a trusted relationship between your Active Directory and AWS. When AD Connector is configured, the trust allows you to:

  • Sign in to AWS applications such as Amazon WorkSpaces, Amazon WorkDocs, and Amazon WorkMail by using your Active Directory credentials.
  • Seamlessly join Windows instances to your Active Directory domain either through the Amazon EC2 launch wizard or programmatically through the EC2 Simple System Manager (SSM) API.
  • Provide federated sign-in to the AWS Management Console by mapping Active Directory identities to AWS Identity and Access Management (IAM) roles.

AD Connector cannot be used with your custom applications, because it only provides secure AWS integration for the three use cases mentioned above. Custom applications that rely on your on-premises Active Directory should communicate with your domain controllers directly or use AWS Managed Microsoft AD rather than integrating with AD Connector. To learn more about which AWS Directory Service solution works best for your organization, see the service documentation.

With AD Connector, you can streamline identity management by extending your user identities from Active Directory. It also enables you to reuse your existing Active Directory security policies such as password expiration, password history, and account lockout policies. Also, your users will no longer need to remember yet another user name and password combination. Since AD Connector doesn’t rely on complex directory synchronization technologies or Active Directory Federation Services (AD FS), you can forego the added cost and complexity of hosting a SAML-based federation infrastructure. In sum, AD Connector helps foster a hybrid environment by allowing you to leverage your existing on-premises investments to control different facets of AWS.

This blog post will show you how AD Connector works as well as walk through how to enable federated console access, assign users to roles, and seamlessly join an EC2 instance to an Active Directory domain.

AD Connector – Under the Hood

AD Connector is a dual Availability Zone proxy service that connects AWS apps to your on-premises directory. AD Connector forwards sign-in requests to your Active Directory domain controllers for authentication and provides the ability for applications to query the directory for data. When you configure AD Connector, you provide it with service account credentials that are securely stored by AWS. This account is used by AWS to enable seamless domain join, single sign-on (SSO), and AWS Applications (WorkSpaces, WorkDocs, and WorkMail) functionality. Given AD Connector’s role as a proxy, it does not store or cache user credentials. Rather, authentication, lookup, and management requests are handled by your Active Directory.

In order to create an AD Connector, you must also provide a pair of DNS IP addresses during setup. These are used by AD Connector to retrieve Service (SRV) DNS records and locate the nearest domain controllers to route requests to. The AD Connector proxy instances use an algorithm similar to the Active Directory domain controller locator process to decide which domain controllers to connect to for LDAP and Kerberos requests.
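For reference, the SRV lookup that a domain controller locator performs looks roughly like the following sketch, which uses the third-party dnspython package; the domain name is a placeholder, and AD Connector does this work for you.

import dns.resolver  # pip install dnspython

# Placeholder domain; Active Directory publishes domain controller SRV records under _ldap._tcp.
answers = dns.resolver.resolve("_ldap._tcp.dc._msdcs.corp.example.com", "SRV")

# Lower priority is preferred; higher weight wins within the same priority.
for record in sorted(answers, key=lambda r: (r.priority, -r.weight)):
    print(f"{record.target.to_text()}:{record.port} (priority={record.priority})")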

For authentication to AWS applications and the AWS Management Console, you can configure an access URL from the AWS Directory Service console. This access URL is in the format of https://<alias>.awsapps.com and provides a publicly accessible sign-in page. You can visit https://<alias>.awsapps.com/workdocs to sign in to WorkDocs, and https://<alias>.awsapps.com/console to sign in to the AWS Management Console. The following image shows the sign-in page for the AWS Management Console.

Figure 1: Login

Figure 1: Login

For added security you can enable multi-factor authentication (MFA) for AD Connector, but you’ll need to have an existing RADIUS infrastructure in your on-premises network set up to leverage this feature. See AD Connector – Multi-factor Authentication Prerequisites for more information about requirements and configuration. With MFA enabled with AD Connector, the sign-in page hosted at your access URL will prompt users for an MFA code in addition to their standard sign-in credentials.

AD Connector comes in two sizes: small and large. A large AD Connector runs on more powerful compute resources and is more expensive than a small AD Connector. Depending on the volume of traffic to be proxied by AD Connector, you’ll want to select the appropriate size for your needs.

Figure 2: Directory size

Figure 2: Directory size

AD Connector is highly available, meaning the underlying hosts are deployed across multiple Availability Zones in the Region where you deploy it. In the event of a host-level failure, Directory Service will promptly replace failed hosts. Directory Service also applies performance and security updates automatically to AD Connector.

The following diagram illustrates the authentication flow and network path when you enable AWS Management Console access:

  1. A user opens the secure custom sign-in page and supplies their Active Directory user name and password.
  2. The authentication request is sent over SSL to AD Connector.
  3. AD Connector performs LDAP authentication to Active Directory.

    Note: AD Connector locates the nearest domain controllers by querying the SRV DNS records for the domain.

  4. After the user has been authenticated, AD Connector calls the STS AssumeRole method to get temporary security credentials for that user. Using those temporary security credentials, AD Connector constructs a sign-in URL that users use to access the console.

    Note: If a user is mapped to multiple roles, the user will be presented with a choice at sign-in as to which role they want to assume. The user session is valid for 1 hour.

    Figure 3: Authentication flow and network path

    Figure 3: Authentication flow and network path
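The sign-in URL construction described in step 4 follows the standard AWS federation pattern. The following sketch shows what that flow looks like with boto3 and the AWS federation endpoint; the role ARN, session name, and issuer are placeholder values, and AD Connector performs the equivalent steps on your behalf.

import json
import urllib.parse

import boto3
import requests

sts = boto3.client("sts")

# Placeholder role; AD Connector assumes the IAM role mapped to the authenticated user.
creds = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/EC2ReadOnly",
    RoleSessionName="ad-user-example",
)["Credentials"]

session_json = json.dumps({
    "sessionId": creds["AccessKeyId"],
    "sessionKey": creds["SecretAccessKey"],
    "sessionToken": creds["SessionToken"],
})

# Exchange the temporary credentials for a short-lived sign-in token.
resp = requests.get(
    "https://signin.aws.amazon.com/federation",
    params={"Action": "getSigninToken", "Session": session_json},
)
signin_token = resp.json()["SigninToken"]

# Build the console sign-in URL that the user is redirected to.
login_url = "https://signin.aws.amazon.com/federation?" + urllib.parse.urlencode({
    "Action": "login",
    "Issuer": "https://example.awsapps.com",
    "Destination": "https://console.aws.amazon.com/",
    "SigninToken": signin_token,
})
print(login_url)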

Before getting started with configuring AD Connector for federated AWS Management Console access, be sure you’ve read and understand the prerequisites for AD Connector. For example, as shown in Figure 3 there must be a VPN or Direct Connect circuit in place between your VPC and your on-premises environment. Your domain also has to be running at Windows 2003 functional level or later. Also, various ports have to be opened between your VPC and your on-premises environment to allow AD Connector to communicate with your on-premises directory.

Configuring AD Connector for federated AWS Management Console access

Enable console access

To allow users to sign in with their Active Directory credentials, you need to explicitly enable console access. You can do this by opening the Directory Service console and clicking the Directory ID name (Figure 4).

This opens the Directory Details page, where you’ll find a dropdown menu on the Apps & Services tab to enable the directory for AWS Management Console access.

Figure 4: Directories

Figure 4: Directories

Choose the Application management tab as seen in Figure 5.

Figure 5: Application Management

Figure 5: Application Management

Scroll down to AWS Management Console as shown in Figure 6, and choose Enable from the Actions dropdown list.

Figure 6: Enable console access

Figure 6: Enable console access

After enabling console access, you’re ready to start configuring roles and associating Active Directory users and groups with those roles.

Follow these steps to create a new role. When you create a new role through the Directory Service console, AD Connector automatically adds a trust relationship to Directory Service. The following code example shows the IAM trust policy for the role, after a role is created.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ds.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:externalid": "482242153642"
        }
      }
    }
  ]
}

Assign users to roles

Now that AD Connector is configured and you’ve created a role, your next job is to assign users or groups to those IAM roles. Role mapping governs which resources a user can access within AWS. To do this, complete the following steps:

  1. Open the Directory Service console and navigate to the AWS Management Console section.
  2. In the search bar, type the name of the role you just created.
  3. Select the role that you just created by choosing the name under the IAM role field.
  4. Choose Add, and enter the name of the user or group you want to find and assign to this role.
  5. Choose Add, and the user or group is now assigned to the role.

When you’re finished, you should see the name of the user or group along with the corresponding ID for that object. It is also important to note that this list can be used to remove users or groups from the role. The next time the user signs in to the AWS Management Console from the custom sign-in page, they will be signed in under the EC2ReadOnly security role.

Seamlessly join an instance to an Active Directory domain

Another advantage to using AD Connector is the ability to seamlessly join Windows (EC2) instances to your Active Directory domain. This allows you to join a Windows Server to the domain while the instance is being provisioned instead of using a script or doing it manually. This section of this blog post will explain the steps necessary to enable this feature in your environment and how the service works.

Step 1: Create a role

Until recently, you had to manually create an IAM policy to allow an EC2 instance to access SSM, an AWS service that allows you to configure Windows instances while they’re running and on first launch. Now, there’s a managed policy called AmazonEC2RoleforSSM that you can use instead. The role you are about to create will be assigned to an EC2 instance when it’s provisioned, which will grant it permission to access the SSM service.

To create the role:

  1. Open the IAM console.
  2. Click Roles in the navigation pane.
  3. Click Create Role.
  4. Type a name for your role in the Role Name field.
  5. Under AWS Service Roles, select Amazon EC2 and then click Select.
  6. On the Attach Policy page, select AmazonEC2RoleforSSM and then click Next Step.
  7. On the Review page, click Create Role.
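If you’d rather script these steps, a rough boto3 equivalent might look like the following; the role name is an example, and you would still need an instance profile to attach the role to an EC2 instance.

import json

import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Example role name; choose one that matches your naming conventions.
iam.create_role(
    RoleName="EC2DomainJoinRole",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

iam.attach_role_policy(
    RoleName="EC2DomainJoinRole",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM",
)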

If you click the role you created, you’ll see a trust policy for EC2, which looks like the following code example.

{
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "",
         "Effect": "Allow",
         "Principal": {
           "Service": "ec2.amazonaws.com"
         },
         "Action": "sts:AssumeRole"
       }
     ]
}

Step 2: Create a new Windows instance from the EC2 console

With this role in place, you can now join a Windows instance to your domain via the EC2 launch wizard. For a detailed explanation about how to do this, see Joining a Domain Using the Amazon EC2 Launch Wizard.

If you’re instantiating a new instance from the API, however, you will need to create an SSM configuration document and upload it to the SSM service beforehand. We’ll step through that process next.

Note: The instance will require internet access to communicate with the SSM service.

Figure 7: Configure instance details

Figure 7: Configure instance details

When you create a new Windows instance from the EC2 launch wizard as shown in Figure 7, the wizard automatically creates the SSM configuration document from the information stored in AD Connector. Presently, the EC2 launch wizard doesn’t allow you to specify which organizational unit (OU) you want to deploy the member server into.

Step 3: Create an SSM document (for seamlessly joining a server to the domain through the AWS API)

If you want to provision new Windows instances from the AWS CLI or API or you want to specify the target OU for your instances, you will need to create an SSM configuration document. The configuration document is a JSON file that contains various parameters used to configure your instances. The following code example is a configuration document for joining a domain.

{
	"schemaVersion": "1.0",
	"description": "Sample configuration to join an instance to a domain",
	"runtimeConfig": {
	   "aws:domainJoin": {
	       "properties": {
	          "directoryId": "d-1234567890",
	          "directoryName": "test.example.com",
	          "directoryOU": "OU=test,DC=example,DC=com",
	          "dnsIpAddresses": [
	             "198.51.100.1",
	             "198.51.100.2"
	          ]
	       }
	   }
	}
}

In this configuration document:

  • directoryId is the ID for the AD Connector you created earlier.
  • directoryName is the name of the domain (for example, examplecompany.com).
  • directoryOU is the OU for the domain.
  • dnsIpAddresses are the IP addresses for the DNS servers you specified when you created the AD Connector.

For additional information, see aws:domainJoin. When you’re finished creating the file, save it as a JSON file.

Note: The name of the file has to be at least 1 character and at most 64 characters in length.

Step 4: Upload the configuration document to SSM

This step requires that the user have permission to use SSM to configure an instance. If you don’t have a policy that includes these rights, create a new policy by using the following JSON, and assign it to an IAM user or group.

{
   "Version": "2012-10-17",
   "Statement": [
     {
       "Effect": "Allow",
       "Action": "ssm:*",
       "Resource": "*"
     }
   ]
}

After you’ve signed in with a user that is associated with the SSM IAM policy you created, run the following command from the AWS CLI.

aws ssm create-document --content file://path/to/myconfigfile.json --name "My_Custom_Config_File"

Note: On Linux/Mac systems, you need to add a “/” at the beginning of the path (for example, file:///Users/username/temp).

This command uploads the configuration document you created to the SSM service, allowing you to reference it when creating a new Windows instance from either the AWS CLI or the EC2 launch wizard.
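If you’re working from the SDK rather than the AWS CLI, an equivalent call might look like this sketch; the file path and document name are examples.

import boto3

ssm = boto3.client("ssm")

# Example path and name; point this at the JSON configuration document you created earlier.
with open("/path/to/myconfigfile.json") as f:
    content = f.read()

ssm.create_document(Content=content, Name="My_Custom_Config_File")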

Conclusion

This blog post has shown you how you can simplify account management by federating with your Active Directory for AWS Management Console access. The post also explored how you can enable hybrid IT by using AD Connector to seamlessly join Windows instances to your Active Directory domain. Armed with this information you can create a trust between your Active Directory and AWS. In addition, you now have a quick and simple way to enable single sign-on without needing to replicate identities or deploy additional infrastructure on premises.

We’d love to hear more about how you are using Directory Service, and welcome any feedback about how we can improve the experience. You can post comments below, or visit the Directory Service forum to post comments and questions.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on AWS re:Post for AWS Directory Service or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Jeremy Cowan

Jeremy Cowan

Jeremy is a Specialist Solutions Architect for containers at AWS, although his family thinks he sells “cloud space”. Prior to joining AWS, Jeremy worked for several large software vendors, including VMware, Microsoft, and IBM. When he’s not working, you can usually find him on a trail in the wilderness, far away from technology.

Bright Dike

Bright Dike

Bright is a Solutions Architect with Amazon Web Services. He works with AWS customers and partners to provide guidance on assessing and improving their security posture, as well as executing automated remediation techniques. His focus areas are threat detection, incident response, and AWS Security Hub.

David Selberg

David Selberg

David is an Enterprise Solutions Architect at AWS who is passionate about helping customers build Well-Architected solutions on the AWS cloud. With a background in cybersecurity, David loves to dive deep on security topics when he’s not creating technical content like the “All Things AWS” Twitch series.

Abhra Sinha

Abhra Sinha

Abhra is a Toronto-based Enterprise Solutions Architect at AWS. Abhra enjoys being a trusted advisor to customers, working closely with them to solve their technical challenges and help build a secure scalable architecture on AWS. In his spare time, he enjoys photography and exploring new restaurants.

How to automate the review and validation of permissions for users and groups in AWS IAM Identity Center

Post Syndicated from Yee Fei Ooi original https://aws.amazon.com/blogs/security/how-to-automate-the-review-and-validation-of-permissions-for-users-and-groups-in-aws-iam-identity-center/

AWS IAM Identity Center (successor to AWS Single Sign-On) is widely used by organizations to centrally manage federated access to their Amazon Web Services (AWS) environment. As organizations grow, it’s crucial that they maintain control of access to their environment and conduct regular reviews of existing granted permissions to maintain a good security posture. With continuous movement of users among projects and teams within an organization, there are constant updates in groups and permission sets. Given the frequency of updates, it’s important for organizations to maintain the integrity of the identity entities and promote visibility into their associated permissions within IAM Identity Center.

Performing an audit of permissions assignment through the IAM Identity Center Management Console can be an arduous and time-consuming task, especially for customers managing a significant number of AWS accounts. This blog post addresses the following concerns faced by security administrators:

  • How to maintain control over permissions and efficiently conduct thorough audits.
  • How to regularly review granted permissions to uphold the principle of least privilege.

In this blog post, we show you how to automate your IAM Identity Center users and groups permission review process with the AWS SDK and AWS serverless services. The solution also shows how to schedule the review process at your preferred frequency and generate a business-specific access and permission review report.

By using AWS serverless services and AWS SDK, you can create an automated workflow to retrieve the latest permission sets of your identities in IAM Identity Center and extract them as a report. Amazon EventBridge scheduling allows you to set customized schedules to launch the automation process. AWS Lambda functions are used in data retrieval, data transformation, and report generation, and Amazon DynamoDB tables are used for storing raw unstructured data.

We show you how to build an automated solution using AWS SDK, AWS Step Functions, Lambda, DynamoDB, EventBridge, Amazon Simple Storage Service (Amazon S3), and Amazon Simple Notification Service (Amazon SNS) to review the IAM Identity Center instance that you specify. The review includes retrieving attached permission policies (inline, AWS managed, and customer managed) based on the assigned identity.
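To give a sense of the APIs involved, the following is a minimal boto3 sketch, not the deployed Lambda code, that lists each permission set in an IAM Identity Center instance along with its attached AWS managed policies and whether it has an inline policy; the instance ARN is a placeholder.

import boto3

sso_admin = boto3.client("sso-admin")

# Placeholder; use your own IAM Identity Center instance ARN.
instance_arn = "arn:aws:sso:::instance/ssoins-xxxxxxxxxx"

paginator = sso_admin.get_paginator("list_permission_sets")
for page in paginator.paginate(InstanceArn=instance_arn):
    for ps_arn in page["PermissionSets"]:
        details = sso_admin.describe_permission_set(
            InstanceArn=instance_arn, PermissionSetArn=ps_arn
        )["PermissionSet"]
        managed = sso_admin.list_managed_policies_in_permission_set(
            InstanceArn=instance_arn, PermissionSetArn=ps_arn
        )["AttachedManagedPolicies"]
        inline = sso_admin.get_inline_policy_for_permission_set(
            InstanceArn=instance_arn, PermissionSetArn=ps_arn
        ).get("InlinePolicy", "")
        print(details["Name"], [p["Name"] for p in managed], bool(inline))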

Note: This solution will incur costs based on the AWS services used.

Prerequisites

In your own AWS environment, make sure that you have the following:

  • An IAM Identity Center instance set up in the account
  • IAM Identity Center instance metadata that you want to perform the analysis on:
    • The IAM Identity Center instance identityStoreId – example: d-xxxxxxxxxx
    • The IAM Identity Center instance instanceArn – example: arn:aws:sso:::instance/ssoins-xxxxxxxxxx
  • Access and permission to deploy the related AWS services mentioned previously in AWS CloudFormation.

    Note: This solution is expected to deploy in the account where your IAM Identity Center instance is being set up. If you want to deploy in other accounts, you need to establish cross-account access for the IAM roles of the relevant services mentioned previously.

  • AWS SAM CLI installed. You will deploy the solution using AWS Serverless Application Model (AWS SAM). To learn more about how AWS SAM works, see the AWS Serverless Application Model (AWS SAM) specification.

Solution overview

In this section, we discuss the steps to set up solution. We provide a CloudFormation template that you can use to set up the required services and Lambda functions. Figure 1 illustrates the architecture of the solution.

Figure 1: Architecture of the solution

Figure 1: Architecture of the solution

The solution is deployed using AWS SAM, which is an open-sourced framework for building serverless applications. AWS SAM helps to organize related components and operate on a single stack. When used together with the SAM CLI, it’s a useful tool for developing, testing, and building serverless applications on AWS.

To generate the report, the solution uses the following steps:

  • The EventBridge Scheduler is configured to launch the Step Functions based on the frequency of the cron job stated. The user can also manually launch the review as needed.
  • After the Step Functions are launched, the dataExtractionFunction Lambda function retrieves data from IAM Identity Center and stores it in two separate DynamoDB tables, fullPermissionSetsWithGroupTable and userWithGroupTable.
  • Step Functions will then launch the dataTransformLoadFunction Lambda function, which retrieves the data from both DynamoDB tables to perform data transformation for report generation.
  • The permission review report is stored in an S3 bucket and notification of completion is sent to the stakeholders.

Deploy the solution

  1. Make sure that you have AWS SAM CLI installed.
  2. Clone the GitHub repository. Open a CLI window and run
    git clone https://github.com/aws-samples/aws-iam-identity-center-permission-policies-analyzer.git
  3. Navigate to the root directory of the GitHub repository by running cd aws-iam-identity-center-permission-policies-analyzer
  4. Run sam deploy --guided and follow the step-by-step instructions to indicate the deployment details such as the desired CloudFormation stack name, AWS Region and other details as shown in Figure 2.
     
    Figure 2: Configure SAM deploy

    Figure 2: Configure SAM deploy

  5. As shown in Figure 2, you receive confirmation that the required resources have been created. AWS SAM creates a default S3 bucket to store the necessary resources and then proceeds to the deployment prompt. Enter y to deploy and wait for deployment to complete.
  6. After deployment is complete, you should see the following output: Successfully created/updated stack – {StackName} in {AWSRegion}. You can review the resources and stack in your CloudFormation console as shown in Figure 3.
     
    Figure 3: CloudFormation console view of deployed stack

    Figure 3: CloudFormation console view of deployed stack

    The CloudFormation template specifies a cron schedule that runs on the first day of each month at 08:00 UTC+8 by default. You can update the schedule based on your preference by following steps 7 and 8.

  7. Open the EventBridge console. In the navigation pane, under Scheduler, choose Schedules. Check the box next to {StackName}-monthlySchedule-{RandomID} and choose Edit.
     
    Figure 4: EventBridge schedule console

    Figure 4: EventBridge schedule console

  8. At Step 1, under the Schedule pattern segment, enter your preferred scheduling. To learn about the different types of EventBridge scheduling, see Schedule types on EventBridge Scheduler. For this example, you use a recurring type of schedule using cron expression. Update to your preferred schedule and time zone and choose Next.
     
    Figure 5: EventBridge Schedule edit console Step 1 – Specify schedule detail

    Figure 5: EventBridge Schedule edit console Step 1 – Specify schedule detail

  9. Check the email address you entered during the deployment stage of this solution for an email sent by [email protected], similar to what you see in Figure 6. Follow the steps in the email to confirm the Amazon SNS topic subscription.
     
    Figure 6: Example email from Amazon SNS for subscription confirmation

    Figure 6: Example email from Amazon SNS for subscription confirmation

Manually launch the review

After you’ve updated the schedule, the review process runs on the specified timing and frequency. You can manually launch the review immediately after you’ve deployed the solution, or at a time outside of the schedule on an as-needed basis.

  1. To manually launch the review, open the Step Functions console.
  2. Select the state machine monthlyUserPermissionAssessment-{randomID} and choose Start execution.
     
    Figure 7: Start execution for monthlyUserPermissionAssessment state machine

    Figure 7: Start execution for monthlyUserPermissionAssessment state machine

  3. Enter the following event pattern and choose Start execution.
    {
      "identityStoreId": "d-xxxxxxxxxx",
      "instanceArn": "arn:aws:sso:::instance/ssoins-xxxxxxxxxx",
      "ssoDeployedRegion": "YOUR_SSO_DEPLOYED_REGION" 
    }

    Note: The format and keywords of this input are important for the Step Functions workflow to run successfully.

Figure 8: Example input to start state machine execution

Figure 8: Example input to start state machine execution

When the process starts, the execution page opens and you can follow the process. The flow turns green when each step has been completed successfully. You can also review Events and check the Lambda functions or logs if you need to troubleshoot or refer to the details.

Figure 9: State machine successful execution example

Figure 9: State machine successful execution example

Notification from each successful review

After each successful execution, you should receive an email notification at the email you specified in the Amazon SNS topic. You can then retrieve the report from the S3 bucket with the bucket name {StackName}-monthlyre-{AccountID}. Your report is stored according to the object key name specified in the email. An example of the email notification is shown in Figure 10.

Figure 10: Example email notification

Figure 10: Example email notification

You can download the report in CSV format from the S3 bucket. The headers of the report are:

  • User: Username
  • PrincipalId: An identifier for an object in IAM Identity Center, such as a user or group
  • PrincipalType: USER or GROUP
  • GroupName: Group’s display name value (if PrincipalType is GROUP)
  • AccountIdAssignment: Identifier of the AWS account assigned with the specified permission set
  • PermissionSetARN: ARN of the permission set
  • PermissionSetName: Name of the permission set
  • Inline Policy: Inline policy that is attached to the permission set
  • Customer Managed Policy: Specifies the names and paths of the customer managed policies that you have attached to your permission set
  • AWS Managed Policy: Details of the AWS managed policy
  • Permission Boundary: Permission boundary details (Customer Managed Policy Reference and/or AWS managed policy ARN)

From the report, you can determine whether a user is assigned to an account individually or as part of a group, along with the corresponding permission sets. The report also includes details on inline policy, AWS managed policy, customer managed policy, and the permission boundaries attached to the permission set. Inline policies and AWS managed policies are presented in JSON format. However, for customer managed policies and permission boundaries, to keep the solution simple, the generated report provides only basic information on the policies that you’ve attached to the permission set. You can log in to the respective accounts to view the policies in full JSON format through the AWS IAM console.

[Optional] Customize the user notification email

If you want to customize the email notification subject and content, you can do so by editing the Lambda function {StackName}-dataTransformLoadFunction-{RandomID}. Scroll down to the bottom of the source code and edit the sns_message and Subject accordingly.
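Inside the function, the notification is published through Amazon SNS, so the section you edit typically resembles the following pattern; this is a generic sketch rather than the exact source of the deployed function, and the topic ARN shown is a placeholder.

import boto3

sns = boto3.client("sns")

# Placeholder values; the deployed function derives these from the generated report.
topic_arn = "arn:aws:sns:us-east-1:111122223333:permission-review-topic"
sns_message = "Your IAM Identity Center permission review report is ready in Amazon S3."

sns.publish(
    TopicArn=topic_arn,
    Subject="IAM Identity Center permission review completed",
    Message=sns_message,
)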

Figure 11: Customizing the notification email in dataTransformLoadFunction source code

Figure 11: Customizing the notification email in dataTransformLoadFunction source code

Clean up the resources

To clean up the resources that you created for this example:

  1. Empty your S3 bucket. Open the Amazon S3 console, search for the bucket name and choose Empty. Follow the instructions on screen to empty it.
  2. Delete the CloudFormation stack by either:
    1. Using the CloudFormation console to delete the stack, or
    2. Using the AWS SAM CLI to run sam delete in your terminal. Follow the instructions and enter y when prompted to delete the stack.

Conclusion

In this post, you learned how to deploy a solution that simplifies the review and analysis of IAM permissions granted to IAM Identity Center with an automated flow. You also learned about customization that you can set up to fit your team’s needs and preferences.

If you have feedback about this post, submit comments in the Comments section. If you have questions about this post, start a new thread on the AWS Security, Identity, & Compliance re:Post or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Yee Fei Ooi

Yee Fei Ooi

Yee Fei is a Solutions Architect supporting independent software vendor (ISV) customers in Singapore and is part of the Containers TFC. She enjoys helping her customers to grow their businesses and build solutions that automate, innovate, and improve efficiency.

Edmund Yeo

Edmund Yeo

Edmund is a Security Solutions Architect who helps customers build confidently and securely on AWS. He shares with customers his expertise in AWS security and advocates to build with security best practices in mind. He brings a combination of perspective and experience to help small and medium businesses in ASEAN to improve their security posture.

ZhiWei Huang

ZhiWei Huang

ZhiWei is a Financial Services Solutions Architect at AWS. He works with FSI customers across the ASEAN region, providing guidance for establishing robust security controls and networking foundations as customers build on and scale with AWS. Outside of work, he finds joy in travelling the world and spending quality time with his family.

Derive operational insights from application logs using Automated Data Analytics on AWS

Post Syndicated from Aparajithan Vaidyanathan original https://aws.amazon.com/blogs/big-data/derive-operational-insights-from-application-logs-using-automated-data-analytics-on-aws/

Automated Data Analytics (ADA) on AWS is an AWS solution that enables you to derive meaningful insights from data in a matter of minutes through a simple and intuitive user interface. ADA offers an AWS-native data analytics platform that is ready to use out of the box by data analysts for a variety of use cases. With ADA, teams can ingest, transform, govern, and query diverse datasets from a range of data sources without requiring specialist technical skills. ADA provides a set of pre-built connectors to ingest data from a wide range of sources including Amazon Simple Storage Service (Amazon S3), Amazon Kinesis Data Streams, Amazon CloudWatch, Amazon CloudTrail, and Amazon DynamoDB as well as many others.

ADA provides a foundational platform that can be used by data analysts in a diverse set of use cases including IT, finance, marketing, sales, and security. ADA’s out-of-the-box CloudWatch data connector allows data ingestion from CloudWatch logs in the same AWS account in which ADA has been deployed, or from a different AWS account.

In this post, we demonstrate how an application developer or application tester is able to use ADA to derive operational insights of applications running in AWS. We also demonstrate how you can use the ADA solution to connect to different data sources in AWS. We first deploy the ADA solution into an AWS account and set up the ADA solution by creating data products using data connectors. We then use the ADA Query Workbench to join the separate datasets and query the correlated data, using familiar Structured Query Language (SQL), to gain insights. We also demonstrate how ADA can be integrated with business intelligence (BI) tools such as Tableau to visualize the data and to build reports.

Solution overview

In this section, we present the solution architecture for the demo and explain the workflow. For the purposes of demonstration, the bespoke application is simulated using an AWS Lambda function that emits logs in Apache Log Format at a preset interval using Amazon EventBridge. This standard format can be produced by many different web servers and be read by many log analysis programs. The application (Lambda function) logs are sent to a CloudWatch log group. The historical application logs are stored in an S3 bucket for reference and for querying purposes. A lookup table with a list of HTTP status codes along with the descriptions is stored in a DynamoDB table. These three serve as sources from which data is ingested into ADA for correlation, query, and analysis. We deploy the ADA solution into an AWS account and set up ADA. We then create the data products within ADA for the CloudWatch log group, S3 bucket, and DynamoDB. As the data products are configured, ADA provisions data pipelines to ingest the data from the sources. With the ADA Query Workbench, you can query the ingested data using plain SQL for application troubleshooting or issue diagnosis.
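As a rough illustration of what the simulated application emits, a Lambda handler along the following lines could produce one Apache-style log entry per invocation; this is a hypothetical sketch, not the code from the sample repository, and the endpoints and status codes are made up for the example.

import datetime
import json
import random

def handler(event, context):
    # Build a log line in Apache Log Format and emit it to the function's CloudWatch log group.
    endpoints = ["/v1/server", "/v1/server/admin", "/v1/users"]
    status = random.choice([200, 200, 200, 404, 500, 503])
    timestamp = datetime.datetime.utcnow().strftime("%d/%b/%Y:%H:%M:%S +0000")
    line = (
        f'192.0.2.10 - - [{timestamp}] '
        f'"GET {random.choice(endpoints)} HTTP/1.1" {status} {random.randint(200, 5000)}'
    )
    # The JSON wrapper mirrors how the logs appear in CloudWatch with a message field.
    print(json.dumps({"message": line}))
    return {"statusCode": 200}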

The following diagram provides an overview of the architecture and workflow of using ADA to gain insights into application logs.

The workflow includes the following steps:

  1. A Lambda function is scheduled to be triggered at 2-minute intervals using EventBridge.
  2. The Lambda function emits logs that are stored at a specified CloudWatch log group under /aws/lambda/CdkStack-AdaLogGenLambdaFunction. The application logs are generated using the Apache Log Format schema but stored in the CloudWatch log group in JSON format.
  3. The data products for CloudWatch, Amazon S3, and DynamoDB are created in ADA. The CloudWatch data product connects to the CloudWatch log group where the application (Lambda function) logs are stored. The Amazon S3 connector connects to an S3 bucket folder where the historical logs are stored. The DynamoDB connector connects to a DynamoDB table where the status codes that are referred by the application and historical logs are stored.
  4. For each of the data products, ADA deploys the data pipeline infrastructure to ingest data from the sources. When the data ingestion is complete, you can write queries using SQL via the ADA Query Workbench.
  5. You can log in to the ADA portal and compose SQL queries from the Query Workbench to gain insights in to the application logs. You can optionally save the query and share the query with other ADA users in the same domain. The ADA query feature is powered by Amazon Athena, which is a serverless, interactive analytics service that provides a simplified, flexible way to analyze petabytes of data.
  6. Tableau is configured to access the ADA data products via ADA egress endpoints. You then create a dashboard with two charts. The first chart is a heat map that shows the prevalence of HTTP error codes correlated with the application API endpoints. The second chart is a bar chart that shows the top 10 application APIs with a total count of HTTP error codes from the historical data.

Prerequisites

For this post, you need to complete the following prerequisites:

  1. Install the AWS Command Line Interface (AWS CLI), AWS Cloud Development Kit (AWS CDK) prerequisites, TypeScript-specific prerequisites, and git.
  2. Deploy the ADA solution in your AWS account in the us-east-1 Region.
    1. Provide an admin email while launching the ADA AWS CloudFormation stack. This is needed for ADA to send the root user password. An admin phone number is required to receive a one-time password message if multi-factor authentication (MFA) is enabled. For this demo, MFA is not enabled.
  3. Build and deploy the sample application (available on the GitHub repo) solution so that the following resources can be provisioned in your account in the us-east-1 Region:
    1. A Lambda function that simulates the logging application and an EventBridge rule that invokes the application function at 2-minute intervals.
    2. An S3 bucket with the relevant bucket policies and a CSV file that contains the historical application logs.
    3. A DynamoDB table with the lookup data.
    4. Relevant AWS Identity and Access Management (IAM) roles and permissions required for the services.
  4. Optionally, install Tableau Desktop, a third-party BI provider. For this post, we use Tableau Desktop version 2021.2. There is a cost involved in using a licensed version of the Tableau Desktop application. For additional details, refer to the Tableau licensing information.

Deploy and set up ADA

After ADA is deployed successfully, you can log in using the admin email provided during the installation. You then create a domain named CW_Domain. A domain is a user-defined collection of data products. For example, a domain might be a team or a project. Domains provide a structured way for users to organize their data products and manage access permissions.

  1. On the ADA console, choose Domains in the navigation pane.
  2. Choose Create domain.
  3. Enter a name (CW_Domain) and description, then choose Submit.

Set up the sample application infrastructure using AWS CDK

The AWS CDK solution that deploys the demo application is hosted on GitHub. The steps to clone the repo and to set up the AWS CDK project are detailed in this section. Before you run these commands, be sure to configure your AWS credentials. Create a folder, open the terminal, and navigate to the folder where the AWS CDK solution needs to be installed. Run the following code:

gh repo clone aws-samples/operational-insights-with-automated-data-analytics-on-aws
cd operational-insights-with-automated-data-analytics-on-aws
npm install
npm run build
cdk synth
cdk deploy

These steps perform the following actions:

  • Install the library dependencies
  • Build the project
  • Generate a valid CloudFormation template
  • Deploy the stack using AWS CloudFormation in your AWS account

The deployment takes about 1–2 minutes and creates the DynamoDB lookup table, Lambda function, and S3 bucket containing the historical log files as outputs. Copy these values to a text editing application, such as Notepad.

Create ADA data products

We create three different data products for this demo, one for each data source that you’ll be querying to gain operational insights. A data product is a dataset (a collection of data such as a table or a CSV file) that has been successfully imported into ADA and that can be queried.

Create a CloudWatch data product

First, we create a data product for the application logs by setting up ADA to ingest the CloudWatch log group for the sample application (Lambda function). Use the CdkStack.LambdaFunction output to get the Lambda function ARN and locate the corresponding CloudWatch log group ARN on the CloudWatch console.

Then complete the following steps:

  1. On the ADA console, navigate to the ADA domain and create a CloudWatch data product.
  2. For Name, enter a name.
  3. For Source type, choose Amazon CloudWatch.
  4. Disable Automatic PII.

ADA has a feature that automatically detects personally identifiable information (PII) data during import that is enabled by default. For this demo, we disable this option for the data product because the discovery of PII data is not in the scope of this demo.

  1. Choose Next.
  2. Search for and choose the CloudWatch log group ARN copied from the previous step.
  3. Copy the log group ARN.
  4. On the data product page, enter the log group ARN.
  5. For CloudWatch Query, enter a query that you want ADA to get from the log group.

In this demo, we query the @message field because we’re interested in getting the application logs from the log group.

  1. Select how the data updates are triggered after initial import.

ADA can be configured to ingest the data from the source at flexible intervals (every 15 minutes or longer) or on demand. For the demo, we set the data updates to run hourly.

  1. Choose Next.

Next, ADA will connect to the log group and query the schema. Because the logs are in Apache Log Format, we transform the logs into separate fields so that we can run queries on the specific log fields. ADA provides four default transformations and supports custom transformation through a Python script. In this demo, we run a custom Python script to transform the JSON message field into Apache Log Format fields.
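The repository’s apache-log-extractor-transform.py script handles this; conceptually, the transformation reduces to a regex like the one in the following simplified sketch, which is an illustration rather than the actual script.

import re

# Matches the Apache Log Format fields embedded in each @message value.
LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<http_request>\S+) (?P<endpoint>\S+) \S+" '
    r'(?P<status_code>\d{3}) (?P<request_size>\d+)'
)

def extract_fields(message):
    """Split one Apache-format log line into named columns."""
    match = LOG_PATTERN.match(message)
    return match.groupdict() if match else {}

# Example usage with a line like the ones the sample application emits.
print(extract_fields(
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /v1/server HTTP/1.1" 500 2326'
))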

  1. Choose Transform schema.
  2. Choose Create new transform.
  3. Upload the apache-log-extractor-transform.py script from the /asset/transform_logs/ folder.
  4. Choose Submit.

ADA will transform the CloudWatch logs using the script and present the processed schema.

  1. Choose Next.
  2. In the last step, review the steps and choose Submit.

ADA will start the data processing, create the data pipelines, and prepare the CloudWatch log groups to be queried from the Query Workbench. This process will take a few minutes to complete and will be shown on the ADA console under Data Products.

Create an Amazon S3 data product

We repeat the steps to add the historical logs from the Amazon S3 data source and look up reference data from the DynamoDB table. For these two data sources, we don’t create custom transforms because the data formats are in CSV (for historical logs) and key attributes (for reference lookup data).

  1. On the ADA console, create a new data product.
  2. Enter a name (hist_logs) and choose Amazon S3.
  3. Copy the Amazon S3 URI (the text after arn:aws:s3:::) from the CdkStack.S3 output variable and navigate to the Amazon S3 console.
  4. In the search box, enter the copied text, open the S3 bucket, select the /logs folder, and choose Copy S3 URI.

The historical logs are stored in this path.

  1. Navigate back to the ADA console and enter the copied S3 URI for S3 location.
  2. For Update Trigger, select On Demand because the historical logs are updated at an unspecified frequency.
  3. For Update Policy, select Append to append newly imported data to the existing data.
  4. Choose Next.

ADA processes the schema for the files in the selected folder path. Because the logs are in CSV format, ADA is able to read the column names without requiring additional transformations. However, the columns status_code and request_size are inferred as long type by ADA. We want to keep the column data types consistent among the data products so that we can join the data tables and query the data. The column status_code will be used to create joins across the data tables.

  1. Choose Transform schema to change the data types of the two columns to string data type.

Note the highlighted column names in the Schema preview pane prior to applying the data type transformations.

  1. In the Transform plan pane, under Built-in transforms, choose Apply Mapping.

This option allows you to change the data type from one type to another.

  1. In the Apply Mapping section, deselect Drop other fields.

If this option is not disabled, only the transformed columns will be preserved and all other columns will be dropped. Because we want to retain all the columns, we disable this option.

  1. Under Field Mappings, for Old name and New name, enter status_code and for New type, enter string.
  2. Choose Add Item.
  3. For Old name and New name, enter request_size and for New data type, enter string.
  4. Choose Submit.

ADA will apply the mapping transformation on the Amazon S3 data source. Note the column types in the Schema preview pane.

  1. Choose View sample to preview the data with the transformation applied.

ADA will display the PII data acknowledgement to ensure that either only authorized users can view the data or that the dataset doesn’t contain any PII data.

  1. Choose Agree to continue to view the sample data.

Note that the schema is identical to the CloudWatch log group schema because both the current application and historical application logs are in Apache Log Format.

  1. In the final step, review the configuration and choose Submit.

ADA starts processing the data from the Amazon S3 source, creates the backend infrastructure, and prepares the data product. This process takes a few minutes depending upon the size of the data.

Create a DynamoDB data product

Lastly, we create a DynamoDB data product. Complete the following steps:

  1. On the ADA console, create a new data product.
  2. Enter a name (lookup) and choose Amazon DynamoDB.
  3. Enter the Cdk.DynamoDBTable output variable for DynamoDB Table ARN.

This table contains key attributes that will be used as a lookup table in this demo. For the lookup data, we are using the HTTP codes and long and short descriptions of the codes. You can also use PostgreSQL, MySQL, or a CSV file source as an alternative.

  1. For Update Trigger, select On-Demand.

The updates will be on demand because the lookup data is mostly for reference purposes while querying, and any changes to it can be applied in ADA using on-demand triggers.

  1. Choose Next.

ADA reads the schema from the underlying DynamoDB schema and presents the column name and type for optional transformation. We will proceed with the default schema selection because the column types are consistent with the types from the CloudWatch log group and Amazon S3 CSV data source. Having data types that are consistent across the data sources allows us to write queries to fetch records by joining the tables using the column fields. For example, the column key in the DynamoDB schema corresponds to the status_code in the Amazon S3 and CloudWatch data products. We can write queries that can join the three tables using the column name key. An example is shown in the next section.

  1. Choose Continue with current schema.
  2. Review the configuration and choose Submit.

ADA will process the data from the DynamoDB table data source and prepare the data product. Depending upon the size of the data, this process takes a few minutes.

Now we have all the three data products processed by ADA and available for you to run queries.

Use the Query Workbench to query the data

ADA allows you to run queries against the data products while abstracting the data source and making it accessible using SQL (Structured Query Language). You can write queries and join the tables just as you would query against tables in a relational database. We demonstrate ADA’s querying capability via two user scenarios. In both scenarios, we join an application log dataset to the error codes lookup table. In the first use case, we query the current application logs to identify the top 10 most accessed application endpoints along with the corresponding HTTP status codes:

--Query the top 10 Application endpoints along with the corresponding HTTP request type and HTTP status code.

SELECT logs.endpoint AS Application_EndPoint, logs.http_request AS REQUEST, count(logs.endpoint) as Endpoint_Count, ref.key as HTTP_Status_Code, ref.short as Description
FROM cw_domain.cloud_watch_application_logs logs
INNER JOIN cw_domain.lookup ref ON logs.status_code = ref.key
WHERE logs.status_code LIKE '4%' OR logs.status_code LIKE '5%'
GROUP BY logs.endpoint, logs.http_request, ref.key, ref.short
ORDER BY Endpoint_Count DESC
LIMIT 10

In the second example, we query the historical logs table to get the top 10 application endpoints with the most errors to understand the endpoint call pattern:

-- Query the historical logs to get the top 10 application endpoints with the most errors, along with an explanation of each error code.

SELECT endpoint as Application_EndPoint, count(status_code) as Error_Count, ref.long as Description FROM cw_domain.hist_logs hist
INNER JOIN cw_domain.lookup ref ON hist.status_code = ref.key
WHERE hist.status_code LIKE '4%' OR hist.status_code LIKE '5%'
GROUP BY endpoint, status_code, ref.long
ORDER BY Error_Count desc
LIMIT 10

In addition to querying, you can optionally save the query and share the saved query with other users in the same domain. The shared queries are accessible directly from the Query Workbench. The query results can also be exported to CSV format.

Visualize ADA data products in Tableau

ADA offers the ability to connect to third-party BI tools to visualize data and create reports from the ADA data products. In this demo, we use ADA’s native integration with Tableau to visualize the data from the three data products we configured earlier. Using Tableau’s Athena connector and following the steps in Tableau configuration, you can configure ADA as a data source in Tableau. After a successful connection has been established between Tableau and ADA, Tableau will populate the three data products under the Tableau catalog cw_domain.

We then establish a relationship across the three databases using the HTTP status code as the joining column, as shown in the following screenshot. Tableau allows us to work in online and offline mode with the data sources. In online mode, Tableau will connect to ADA and query the data products live. In offline mode, we can use the Extract option to extract the data from ADA and import the data into Tableau. In this demo, we import the data into Tableau to make the querying more responsive. We then save the Tableau workbook. We can inspect the data from the data sources by choosing the database and Update Now.

With the data source configurations in place in Tableau, we can create custom reports, charts, and visualizations on the ADA data products. Let’s consider two use cases for visualizations.

As shown in the following figure, we visualized the frequency of the HTTP errors by application endpoints using Tableau’s built-in heat map chart. We filtered out the HTTP status codes to only include error codes in the 4xx and 5xx range.

We also created a bar chart to depict the application endpoints from the historical logs ordered by the count of HTTP error codes. In this chart, we can see that the /v1/server/admin endpoint has generated the most HTTP error status codes.

Clean up

Cleaning up the sample application infrastructure is a two-step process. First, to remove the infrastructure provisioned for the purposes of this demo, run the following command in the terminal:

cdk destroy

For the following question, enter y and AWS CDK will delete the resources deployed for the demo:

Are you sure you want to delete: CdkStack (y/n)? y

Alternatively, you can remove the resources via the AWS CloudFormation console by navigating to the CdkStack stack and choosing Delete.
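
If you prefer to script the stack cleanup instead of using the console, the following minimal sketch does the same thing with the AWS SDK for Python. It assumes the stack is named CdkStack, as in this demo, and that your credentials are allowed to delete it.

import boto3

# Delete the demo CloudFormation stack and wait until the deletion finishes.
cloudformation = boto3.client("cloudformation")
cloudformation.delete_stack(StackName="CdkStack")

waiter = cloudformation.get_waiter("stack_delete_complete")
waiter.wait(StackName="CdkStack")
print("CdkStack deleted")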

The second step is to uninstall ADA. For instructions, refer to Uninstall the solution.

Conclusion

In this post, we demonstrated how to use the ADA solution to derive insights from application logs stored across two different data sources. We demonstrated how to install ADA on an AWS account and deploy the demo components using AWS CDK. We created data products in ADA and configured the data products with the respective data sources using the ADA’s built-in data connectors. We demonstrated how to query the data products using standard SQL queries and generate insights on the log data. We also connected the Tableau Desktop client, a third-party BI product, to ADA and demonstrated how to build visualizations against the data products.

ADA automates the process of ingesting, transforming, governing, and querying diverse datasets and simplifies the lifecycle management of data. ADA’s pre-built connectors allow you to ingest data from diverse data sources. Software teams with basic knowledge of AWS products and services can set up an operational data analytics platform in a few hours and provide secure access to the data. The data can then be easily and quickly queried using an intuitive, standalone web user interface.

Try out ADA today to easily manage and gain insights from data.


About the authors

Aparajithan Vaidyanathan is a Principal Enterprise Solutions Architect at AWS. He helps enterprise customers migrate and modernize their workloads on the AWS Cloud. He is a cloud architect with 23+ years of experience designing and developing enterprise, large-scale, and distributed software systems. He specializes in machine learning and data analytics, with a focus on data and feature engineering. He is an aspiring marathon runner, and his hobbies include hiking, bike riding, and spending time with his wife and two boys.

Rashim Rahman is a Software Developer based out of Sydney, Australia with 10+ years of experience in software development and architecture. He works primarily on building large scale open-source AWS solutions for common customer use cases and business problems. In his spare time, he enjoys sports and spending time with friends and family.

Hafiz Saadullah is a Principal Technical Product Manager at Amazon Web Services. Hafiz focuses on AWS Solutions, designed to help customers by addressing common business problems and use cases.

Cost considerations and common options for AWS Network Firewall log management

Post Syndicated from Sharon Li original https://aws.amazon.com/blogs/security/cost-considerations-and-common-options-for-aws-network-firewall-log-management/

When you’re designing a security strategy for your organization, firewalls provide the first line of defense against threats. Amazon Web Services (AWS) offers AWS Network Firewall, a stateful, managed network firewall that includes intrusion detection and prevention (IDP) for your Amazon Virtual Private Cloud (VPC).

Logging plays a vital role in any firewall policy, as emphasized by the National Institute of Standards and Technology (NIST) Guidelines on Firewalls and Firewall Policy. Logging enables organizations to take proactive measures to help prevent and recover from failures, maintain proper firewall security configurations, and gather insights for effectively responding to security incidents.

The optimal logging approach for your organization should be determined on a case-by-case basis. It involves striking a balance between your security and compliance requirements and the costs associated with implementing solutions to meet those requirements.

This blog post walks you through logging configuration best practices, discusses three common architectural patterns for Network Firewall logging, and provides guidelines for optimizing the cost of your logging solution. This information will help you make a more informed choice for your organization’s use case.

Stateless and stateful rules engines logging

When discussing Network Firewall best practices, it’s essential to understand the distinction between stateful and stateless rules. Note that stateless rules don’t support firewall logging, which can make them difficult to work with in use cases that depend on logs.

To verify that traffic is forwarded to the stateful inspection engine that generates logs, you can add a custom-defined stateless rule group that covers the traffic you need to monitor, or you can set a default action for stateless traffic to be forwarded to stateful rule groups in the firewall policy, as shown in the following figure.

Figure 1: Set up stateless default actions to forward to stateful rule groups
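
If you manage your firewall policy programmatically, the same default action can be set with the AWS SDK. The following sketch is illustrative only; the policy name is a placeholder, and it assumes the rest of the policy should stay unchanged.

import boto3

network_firewall = boto3.client("network-firewall")

# Read the current policy so it can be resubmitted with updated default actions.
current = network_firewall.describe_firewall_policy(
    FirewallPolicyName="demo-firewall-policy"  # placeholder policy name
)
policy = current["FirewallPolicy"]

# Forward all stateless traffic (including fragments) to the stateful engine,
# which is the engine that generates alert and flow logs.
policy["StatelessDefaultActions"] = ["aws:forward_to_sfe"]
policy["StatelessFragmentDefaultActions"] = ["aws:forward_to_sfe"]

network_firewall.update_firewall_policy(
    UpdateToken=current["UpdateToken"],
    FirewallPolicyName="demo-firewall-policy",
    FirewallPolicy=policy,
)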

Alert logs and flow logs

Network Firewall provides two types of logs:

  • Alert — Sends logs for traffic that matches a stateful rule whose action is set to Alert or Drop.
  • Flow — Sends logs for network traffic that the stateless engine forwards to the stateful rules engine.

To grasp the use cases of alert and flow logs, let’s begin by understanding what a flow is from the view of the firewall. For the network firewall, network flow is a one-way series of packets that share essential IP header information. It’s important to note that the Network Firewall flow log differs from the VPC flow log, as it captures the network flow from the firewall’s perspective and it is summarized in JSON format.

For example, the following sequence shows how an HTTP request passes through the Network Firewall.

Figure 2: HTTP request passes through Network Firewall

When you’re using a stateful rule to block egress HTTP traffic, the TCP connection will be established initially. When an HTTP request comes in, it will be evaluated by the stateful rule. Depending on the rule’s action, the firewall may send a TCP reset to the sender when a Reject action is configured, or it may drop the packets to block them if a Drop action is configured. In the case of a Drop action, shown in Figure 3, the Network Firewall decides not to forward the packets at the HTTP layer, and the closure of the connection is determined by the TCP timers on both the client and server sides.

Figure 3: HTTP request blocked by Network Firewall

In the given example, the Network Firewall generates a flow log that provides information like IP addresses, port numbers, protocols, timestamps, number of packets, and bytes of the traffic. However, it doesn’t include details about the stateful inspection, such as whether the traffic was blocked or allowed.

Figure 4 shows the inbound flow log.

Figure 4: Inbound flow log

Figure 5 shows the outbound flow log.

Figure 5: Outbound flow log

The alert log entry complements the flow log by containing stateful inspection details. The entry includes information about whether the traffic was allowed or blocked and also provides the hostname associated with the traffic. This additional information enhances the understanding of network activities and security events, as shown in Figure 6.

Figure 6: Alert log

In summary, flow logs provide stateless information and are valuable for identifying trends, like monitoring IP addresses that transmit the most data over time in your network. On the other hand, alert logs contain stateful inspection details, making them helpful for troubleshooting and threat hunting purposes.

Keep in mind that flow logs can become excessive. When you’re forwarding traffic to a stateful inspection engine, flow logs capture the network flows crossing your Network Firewall endpoints. Because log volume affects overall costs, it’s essential to choose the log type that suits your use case and security needs. If you don’t need flow logs for traffic flow trends, consider only enabling alert logs to help reduce expenses.
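
As a hedged example, the following sketch enables only alert logs and sends them to an S3 bucket by calling the UpdateLoggingConfiguration API. The firewall name, bucket name, and prefix are placeholders for this demo.

import boto3

network_firewall = boto3.client("network-firewall")

# Send only ALERT logs (no FLOW logs) to S3 to keep log volume and cost down.
network_firewall.update_logging_configuration(
    FirewallName="demo-firewall",  # placeholder firewall name
    LoggingConfiguration={
        "LogDestinationConfigs": [
            {
                "LogType": "ALERT",
                "LogDestinationType": "S3",
                "LogDestination": {
                    "bucketName": "anfw-logs-demo-bucket",  # placeholder bucket
                    "prefix": "alert-logs",
                },
            }
        ]
    },
)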

Effective logging with alert rules

When you write stateful rules using the Suricata format, set the alert rule to be evaluated before the pass rule to log allowed traffic. Be aware that:

  • You must enable strict rule evaluation order to allow the alert rule to be evaluated before the pass rule. Otherwise, the default evaluation order is pass rules first, then drop, then alert. The engine stops processing rules when it finds a match.
  • When you use pass rules, it’s recommended to add a message that reminds anyone looking at the policy that these rules don’t generate alerts. This helps when you develop and troubleshoot your rules.

For example, the following rules allow traffic to a target with a specific Server Name Indication (SNI) and log the traffic that was allowed. The pass rule includes a message to remind the firewall policy maker that pass rules don’t alert, and the alert rule, evaluated before the pass rule, logs a message that tells the log viewer which rule allowed the traffic.


alert tls $HOME_NET any -> $EXTERNAL_NET any (tls.sni; content:"www.example.com"; nocase; startswith; endswith; msg:"Traffic allowed by rule 72912"; flow:to_server, established; sid:82912;)
pass tls $HOME_NET any -> $EXTERNAL_NET any (tls.sni; content:"www.example.com"; nocase; startswith; endswith; msg:"Pass rules don't alert"; flow:to_server, established; sid:72912;)

This way you can see allowed domains in the alert logs.

Figure 7: Allowed domain in the alert log

Log destination considerations

Network Firewall supports the following log destinations:

  • Amazon Simple Storage Service (Amazon S3)
  • Amazon CloudWatch Logs
  • Amazon Kinesis Data Firehose

You can select the destination that best fits your organization’s processes. In the next sections, we review the most common pattern for each log destination and walk you through the cost considerations, assuming a scenario in which you generate 15 TB of Network Firewall logs per month in the us-east-1 Region.

Amazon S3

Network Firewall is configured to inspect traffic and send logs to an S3 bucket in JSON format using Amazon CloudWatch vended logs, which are logs published by AWS services on behalf of the customer. Optionally, logs in the S3 bucket can then be queried using Amazon Athena for monitoring and analysis purposes. You can also create Amazon QuickSight dashboards with an Athena-based dataset to provide additional insight into traffic patterns and trends, as shown in Figure 8.

Figure 8: Architecture diagram showing AWS Network Firewall logs going to S3

Cost considerations

Note that Network Firewall logging charges for the pattern above are the combined charges for CloudWatch Logs vended log delivery to the S3 buckets and for using Amazon S3.

CloudWatch vended log pricing can influence overall costs significantly in this pattern, depending on the amount of logs generated by Network Firewall, so it’s recommended that your team be aware of the charges described in Amazon CloudWatch Pricing – Amazon Web Services (AWS). From the CloudWatch pricing page, navigate to Paid Tier, choose the Logs tab, select your Region and then under Vended Logs, see the information for Delivery to S3.

For Amazon S3, go to Amazon S3 Simple Storage Service Pricing – Amazon Web Services, choose the Storage & requests tab, and view the information for your Region in the Requests & data retrievals section. Costs will be dependent on storage tiers and usage patterns and the number of PUT requests to S3.

In our example, 15 TB is converted and compressed to approximately 380 GB in the S3 bucket. The total monthly cost in the us-east-1 Region is approximately $3800.

Long-term storage

There are additional Amazon S3 features that can help you save on storage costs, such as S3 Lifecycle configurations and lower-cost storage classes like S3 Intelligent-Tiering and S3 Glacier. One option is sketched in the example that follows.
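
The following minimal sketch attaches a lifecycle configuration to the logging bucket; the bucket name, prefix, and retention periods are assumptions for this demo and should be adjusted to your own compliance requirements.

import boto3

s3 = boto3.client("s3")

# Transition Network Firewall log objects to S3 Glacier Flexible Retrieval
# after 90 days and delete them after one year.
s3.put_bucket_lifecycle_configuration(
    Bucket="anfw-logs-demo-bucket",  # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-network-firewall-logs",
                "Filter": {"Prefix": "alert-logs/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)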

Analytics and reporting

Athena and QuickSight can be used for analytics and reporting:

  • Athena can perform SQL queries directly against data in the S3 bucket where Network Firewall logs are stored. In the Athena query editor, a single query can be run to set up the table that points to the Network Firewall logging bucket.
  • After data is available in Athena, you can use Athena as a data source for QuickSight dashboards. You can use QuickSight to visualize data from your Network Firewall logs, taking advantage of AWS serverless services.
  • Please note that using Athena to scan firewall data in S3 might increase costs, as can the number of authors, users, reports, alerts, and SPICE data used in QuickSight.
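
If you want to run the Athena analysis programmatically, the following sketch starts a query with the AWS SDK. The database, table, nested column layout, and result location are assumptions for this demo, and the table is expected to have already been created over the Network Firewall logging bucket.

import boto3

athena = boto3.client("athena")

# Count blocked events per source IP from a table created over the
# Network Firewall alert logs (database and table names are placeholders).
response = athena.start_query_execution(
    QueryString="""
        SELECT event.src_ip AS source_ip, count(*) AS blocked_events
        FROM anfw_logs_db.alert_logs
        WHERE event.alert.action = 'blocked'
        GROUP BY event.src_ip
        ORDER BY blocked_events DESC
        LIMIT 10
    """,
    QueryExecutionContext={"Database": "anfw_logs_db"},
    ResultConfiguration={"OutputLocation": "s3://anfw-athena-results-demo/"},
)
print(response["QueryExecutionId"])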

Amazon CloudWatch Logs

In this pattern, shown in Figure 9, Network Firewall is configured to send logs to Amazon CloudWatch as a destination. Once the logs are available in CloudWatch, CloudWatch Log Insights can be used to search, analyze, and visualize your logs to generate alerts, notifications, and alarms based on specific log query patterns.

Figure 9: Architecture diagram using CloudWatch for Network Firewall Logs

Cost considerations

Configuring Network Firewall to send logs to CloudWatch incurs charges based on the number of metrics configured, metrics collection frequency, the number of API requests, and the log size. See Amazon CloudWatch Pricing for additional details.

In our example of 15 TB logs, this pattern in the us-east-1 Region results in approximately $6900.

CloudWatch dashboards offer a way to create customized views of the metrics and alarms for your Network Firewall logs. Each dashboard incurs an additional charge of $3 per month.

Contributor Insights and CloudWatch alarms are additional ways to monitor logs for a predefined query pattern and take corrective action if needed. Contributor Insights is charged per Contributor Insights rule; to learn more, go to the Amazon CloudWatch Pricing page and, under Paid Tier, choose the Contributor Insights tab. CloudWatch alarms are charged based on the number of metric alarms configured and the number of CloudWatch Logs Insights queries analyzed; to learn more, choose the Metrics Insights tab on the same pricing page.
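
For example, the following sketch creates a metric filter on the alert log group and an alarm on the resulting metric. The log group name, filter pattern, and threshold are assumptions based on the alert log structure discussed earlier; adjust them to match your own logs.

import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

# Emit a metric data point each time a blocked event appears in the alert log group.
logs.put_metric_filter(
    logGroupName="/AnfwDemo/Anfw/Alert",
    filterName="BlockedEvents",
    filterPattern='{ $.event.alert.action = "blocked" }',
    metricTransformations=[
        {
            "metricName": "BlockedEvents",
            "metricNamespace": "AnfwDemo",
            "metricValue": "1",
        }
    ],
)

# Alarm when more than 100 blocked events occur within 5 minutes.
cloudwatch.put_metric_alarm(
    AlarmName="AnfwBlockedEventsSpike",
    Namespace="AnfwDemo",
    MetricName="BlockedEvents",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=100,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)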

Long-term storage

CloudWatch offers the flexibility to retain logs from 1 day up to 10 years. The default retention setting is Never expire, but you should consider your use case and costs before deciding on the optimal log retention period. For cost optimization, the recommendation is to move logs that need to be preserved long term or for compliance from CloudWatch to Amazon S3. Additional cost optimization can be achieved through S3 tiering. To learn more, see Managing your storage lifecycle in the S3 User Guide.

AWS Lambda with Amazon EventBridge, as shown in the following sample code, can be used to create an export task to send logs from CloudWatch to Amazon S3 based on an event rule, pattern matching rule, or scheduled time intervals for long-term storage and other use cases.

import datetime
import os

import boto3

# Log group to export and the destination bucket/prefix for the export task.
GROUP_NAME = "/AnfwDemo/Anfw/Alert"
DESTINATION_BUCKET = "cwexportlogs-blog"
PREFIX = "network-logs"
NDAYS = 1  # Export the previous day's logs.


def lambda_handler(event, context):
    # Compute the export window inside the handler so that warm Lambda
    # invocations always export the most recent day instead of reusing
    # timestamps calculated at module load time.
    now = datetime.datetime.now()
    start_date = now - datetime.timedelta(days=NDAYS)
    end_date = now - datetime.timedelta(days=NDAYS - 1)

    from_date = int(start_date.timestamp() * 1000)  # milliseconds
    to_date = int(end_date.timestamp() * 1000)

    # Organize exported objects by date, for example network-logs/2023/07/01.
    bucket_prefix = os.path.join(PREFIX, start_date.strftime("%Y/%m/%d"))

    client = boto3.client("logs")
    response = client.create_export_task(
        logGroupName=GROUP_NAME,
        fromTime=from_date,
        to=to_date,
        destination=DESTINATION_BUCKET,
        destinationPrefix=bucket_prefix,
    )
    print(response)
    return response

Figure 10 shows how EventBridge is configured to trigger the Lambda function periodically.

Figure 10: EventBridge scheduler for daily export of CloudWatch logs

Analytics and reporting

CloudWatch Logs Insights offers a rich query language that you can use to perform complex searches and aggregations on your Network Firewall log data stored in log groups, as shown in Figure 11.

The query results can be exported to a CloudWatch dashboard for visualization and operational decision-making. This helps you quickly identify patterns, anomalies, and trends in the log data and create alarms for proactive monitoring and corrective action.

Figure 11: Network Firewall logs ingested into CloudWatch and analyzed through CloudWatch Logs Insights
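
If you want to run the same kind of analysis programmatically, the following sketch starts a Logs Insights query against the alert log group with the AWS SDK. The query string and log group name are assumptions for this demo.

import time
import boto3

logs = boto3.client("logs")

# Top talkers among blocked events over the last hour (assumed log group).
query_id = logs.start_query(
    logGroupName="/AnfwDemo/Anfw/Alert",
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString=(
        "fields @timestamp, event.src_ip, event.alert.signature "
        '| filter event.alert.action = "blocked" '
        "| stats count() as hits by event.src_ip "
        "| sort hits desc | limit 10"
    ),
)["queryId"]

# Poll until the query finishes, then print the results.
while True:
    results = logs.get_query_results(queryId=query_id)
    if results["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(2)
print(results["results"])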

Amazon Kinesis Data Firehose

For this destination option, Network Firewall sends logs to Amazon Kinesis Data Firehose. From there, you can choose the destination for your logs, including Amazon S3, Amazon Redshift, Amazon OpenSearch Service, and an HTTP endpoint that’s owned by you or your third-party service providers. The most common approach for this option is to deliver logs to OpenSearch, where you can index log data, visualize, and analyze using dashboards as shown in Figure 12.

In the blog post How to analyze AWS Network Firewall logs using Amazon OpenSearch Service, you learn how to build network analytics and visualizations using OpenSearch in detail. Here, we discuss only some cost considerations of using this pattern.

Figure 12: Architecture diagram showing AWS Network Firewall logs going to OpenSearch

Cost considerations

The charge when using Kinesis Data Firehose as a log destination is for CloudWatch Logs vended log delivery. Ingestion pricing is tiered and billed per GB ingested in 5 KB increments. See Amazon Kinesis Data Firehose Pricing under Vended Logs as source. There are no additional Kinesis Data Firehose charges for delivery unless optional features are used.

For 15 TB of log data, the cost of CloudWatch delivery and Kinesis Data Firehose ingestion is approximately $5400 monthly in the us-east-1 Region.

The cost for Amazon OpenSearch Service is based on three dimensions:

  • Instance hours, which are the number of hours that an instance is available to you for use
  • The amount of storage you request
  • The amount of data transferred in and out of OpenSearch Service

Storage pricing depends on the storage tier and type of instance that you choose. See pricing examples of using OpenSearch Service. When creating your OpenSearch domain, see Sizing Amazon OpenSearch Service domains to help you right-size your OpenSearch domain. Other cost optimization best practices include choosing the right storage tier and using AWS Graviton2 instances to improve performance.

For instance, allocating approximately 15 TB of UltraWarm storage in the us-east-1 Region will result in a monthly cost of $4700. Keep in mind that in addition to storage costs, you should also account for compute instances and hot storage.

In short, the estimated total cost for log ingestion and storage in the us-east-1 Region for this pattern is at least $10,100.

Using OpenSearch enables you to promptly investigate, detect, analyze, and respond to security threats.

Summary

The following list summarizes the expenses and advantages of each solution. Because storing logs is a fundamental aspect of log management, we use the monthly cost of using Amazon S3 as the log delivery destination as the baseline for these comparisons.

  • Amazon S3, Athena, and QuickSight (1x the baseline cost): The most economical option for log analysis. This solution requires security engineers to have a good analytics skillset; familiarity with Athena queries and query running time will affect both incident response time and cost.
  • Amazon CloudWatch (1.8x the baseline cost): Log analysis, dashboards, and reporting can be implemented from the CloudWatch console, with no additional service needed. This solution requires security engineers to be comfortable with CloudWatch Logs Insights query syntax; query run time will affect both incident response time and cost.
  • Amazon Kinesis Data Firehose and OpenSearch (2.7x the baseline cost or more): Lets you investigate, detect, analyze, and respond to security threats quickly with OpenSearch. This solution requires you to invest in managing the OpenSearch cluster.

You have the flexibility to select distinct solutions for flow logs and alert logs based on your requirements. For flow logs, opting for Amazon S3 as the destination offers a cost-effective approach. On the other hand, for alert logs, using the Kinesis Data Firehose and OpenSearch solution allows for quick incident response. Minimizing the time required to address ongoing security challenges can translate to reduced business risk at different costs.

Conclusion

This blog post has explored various patterns for Network Firewall log management, highlighting the cost considerations associated with each approach. While cost is a crucial factor in designing an efficient log management solution, it’s important to consider other factors such as real-time requirements, solution complexity, and ownership. Ultimately, the key is to adopt a log management pattern that aligns with your operational needs and budgetary constraints. Network security is an iterative practice, and by optimizing your log management strategy, you can enhance your overall security posture while effectively managing costs.

For more information about working with Network Firewall, see What is AWS Network Firewall?

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Sharon Li

Sharon is an Enterprise Solutions Architect at Amazon Web Services based in Boston, with a passion for designing and building secure workloads on AWS. Prior to her current role at AWS, Sharon worked as a software development engineer at Amazon, where she played a key role in bringing security into the development process.

Larry Tewksbury

Larry is an AWS Technical Account Manager based in New Hampshire. He works with enterprise customers in the Northeast to understand, scale, and optimize their cloud operations. Outside of work, he enjoys spending time with his family, hiking, and tech-based hobbies.

Shashidhar Makkapati

Shashidhar is an Enterprise Solutions Architect at Amazon Web Services, based in Charlotte, NC. With over two decades of experience as an enterprise architect, he has a keen focus on cloud adoption and digital transformation in the financial services industry. Shashidhar supports enterprise customers in the US Northeast. In his free time, he enjoys reading, traveling, and spending time with his family.

Configure SAML federation for Amazon OpenSearch Serverless with Okta

Post Syndicated from Aish Gunasekar original https://aws.amazon.com/blogs/big-data/configure-saml-federation-for-amazon-opensearch-serverless-with-okta/

Modern applications apply security controls across many systems and their subsystems. Keeping all of these systems in sync would be a major undertaking if you tried to implement it separately. Centralized identity management is the way to maintain a single identity provider (IdP) that can authenticate actors and manage and distribute their rights.

OpenSearch is an open-source search and analytics suite that enables you to ingest, store, analyze, and visualize full text and log data. Amazon OpenSearch Serverless makes it simple to deploy, scale, and operate OpenSearch in the AWS Cloud, freeing you from the undifferentiated heavy lifting of sizing, scaling, and operating an OpenSearch cluster. When you use OpenSearch Serverless, you can integrate with your existing Security Assertion Markup Language 2.0 (SAML)-compliant IdP to provide granular access control for your OpenSearch Serverless collections. Our customers use a variety of IdPs, including AWS IAM Identity Center (successor to AWS SSO), Okta, Keycloak, Active Directory Federation Services (AD FS), and Auth0.

In this post, you will learn how to use Okta as your IdP and integrate it with OpenSearch Serverless to securely manage your users and groups for secure access to your data.

Solution overview

The flow of access requests is depicted in the following figure.

When you navigate to OpenSearch Dashboards, the workflow steps are as follows:

  1. OpenSearch Serverless generates a SAML authentication request.
  2. OpenSearch Serverless redirects your request back to the browser.
  3. The browser redirects to the Okta URL via the Okta application setup.
  4. Okta parses the SAML request, authenticates the user, and generates a SAML response.
  5. Okta returns the encoded SAML response to the browser.
  6. The browser sends the SAML response back to the OpenSearch Serverless Assertion Consumer Services (ACS) URL.
  7. ACS verifies the SAML response and logs in the user with the permissions defined in the data access policy.

Prerequisites

Complete the following prerequisite steps:

  1. Create an OpenSearch Serverless collection. For instructions, refer to Preview: Amazon OpenSearch Serverless – Run Search and Analytics Workloads without Managing Clusters.
  2. Make a note of your AWS account ID to use while configuring your application in Okta.
  3. Create an Okta account, which you will use as an IdP.
  4. Create users and a group in Okta:
    1. Log in to your Okta account, and in the navigation pane, choose Directory, then choose Groups.
    2. Choose Add Group and name it opensearch-serverless, then choose Save.
    3. Choose Assign People to add users.
    4. You can add users to the opensearch-serverless group by choosing the plus sign next to the user name, or you can choose Add All.
    5. Add your users, then choose Save.
    6. To create new users, choose People in the navigation pane under Directory, then choose Add Person.
    7. Provide your first name, last name, user name (email ID), and primary email address.
    8. For Password, choose Set by admin and First-time password.
    9. To create your user, choose Save.
    10. In the navigation pane, choose Groups, then choose the opensearch-serverless group you created earlier.

The following graphic gives a quick demonstration of setting up a user and group.

Configure an application in Okta

To configure an application in Okta, complete the following steps:

  1. Navigate to the Applications page on the Okta console.
  2. Choose App Integration, select SAML 2.0 web application, then choose Next.
  3. For Name, enter a name for the app (for example, myweblogs), then choose Next.
  4. Under Application ACS URL, enter the URL using the format https://collection.<REGION>.aoss.amazonaws.com/_saml/acs (replace <REGION> with the corresponding Region) to generate the IdP metadata.
  5. Select Use this for Recipient URL and Destination URL to use the same ACS URL as the recipient and destination.
  6. Specify aws:opensearch:<AWS-Account-ID> under Audience URI (SP Entity ID). This specifies who the assertion is intended for within the SAML assertion.
  7. Under Group Attribute Statements, enter a name that is relevant to your application, such as mygroup, and select unspecified as the name format. (Don’t forget this name, you’ll need it later.)
  8. Select equals as the filter and enter opensearch-serverless.
  9. Select I’m a software vendor. I’d like to integrate my app with Okta and choose Finish.
  10. After an app is created, choose the sign-on tab, scroll down to the metadata details, and copy the value for Metadata URL.

The following graphic gives a quick demonstration of setting up an application in Okta via the preceding steps.

Next, you associate the users and groups to the application that you created in the previous step.

  1. On the Applications page, choose the app you created earlier.
  2. On the Assignments tab, choose Assign.
  3. Select Assign To Groups and choose the group you wish to assign to (opensearch-serverless in this case).
  4. Choose Done.

The following graphic gives a quick demonstration of assigning groups to the application via the preceding steps.

Set up SAML on OpenSearch Serverless

In this section, you create a SAML provider that you’ll use for your OpenSearch Serverless collection. Complete the following steps:

  1. Open the OpenSearch Serverless console on a new tab.
  2. In the navigation pane, under Serverless, choose SAML authentication.
  3. Select Add SAML provider.
  4. Provide a recognizable name (for example, okta) and a description.
  5. Open a new tab and enter the copied metadata URL into your browser.

You should see the metadata for the Okta application.

  1. Take note of this metadata and copy it to your clipboard.
  2. On the OpenSearch Service console tab, enter this metadata in the Provide metadata from your IdP section.
  3. Under Additional settings, enter mygroup or the group attribute provided in the Okta configuration.
  4. Choose Create a SAML provider.

The SAML provider has now been created.

The following graphic gives a quick demonstration of setting up the SAML provider in OpenSearch Serverless via the preceding steps.
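
If you prefer to automate this step, OpenSearch Serverless also exposes SAML configuration through its API. The following is a minimal sketch, assuming the CreateSecurityConfig operation with the parameter names shown here and an Okta metadata XML document you have already downloaded; verify the exact API shape against the current SDK documentation before relying on it.

import boto3

aoss = boto3.client("opensearchserverless")

# Read the IdP metadata XML downloaded from the Okta metadata URL.
with open("okta-metadata.xml") as f:  # placeholder file name
    metadata_xml = f.read()

# Create a SAML security configuration that uses the Okta group attribute.
response = aoss.create_security_config(
    name="okta",
    type="saml",
    samlOptions={
        "metadata": metadata_xml,
        "groupAttribute": "mygroup",  # must match the group attribute set in Okta
        "sessionTimeout": 60,         # minutes
    },
)
print(response)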

Update the data access policy

You need to configure the right permissions in the data access policies associated with your OpenSearch collection so your Okta group members can access the OpenSearch Dashboards endpoint.

  1. On the OpenSearch Serverless console, open your collection.
  2. Choose the data access policy associated with the collection in the Data Access section.
  3. Choose Edit.
  4. Choose Principals and Add a SAML principal.
  5. Select the SAML provider you created earlier and enter group/opensearch-serverless next to it.
  6. The OpenSearch Dashboards endpoint can be accessed by all group members. You can grant access to collections, indexes, or both.
  7. Choose Save.
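
As an alternative to the console steps above, the data access policy can also be created through the OpenSearch Serverless API. The following sketch is illustrative only: the collection name, account ID, and policy name are placeholders, the permissions shown should be adjusted to what you actually want to grant, and the SAML principal format reflects the okta provider and opensearch-serverless group used in this post.

import json
import boto3

aoss = boto3.client("opensearchserverless")

# Grant the Okta group read access to the collection and its indexes.
policy_document = [
    {
        "Rules": [
            {
                "ResourceType": "collection",
                "Resource": ["collection/my-collection"],  # placeholder collection
                "Permission": ["aoss:DescribeCollectionItems"],
            },
            {
                "ResourceType": "index",
                "Resource": ["index/my-collection/*"],      # placeholder collection
                "Permission": ["aoss:ReadDocument", "aoss:DescribeIndex"],
            },
        ],
        "Principal": ["saml/123456789012/okta/group/opensearch-serverless"],
    }
]

aoss.create_access_policy(
    name="okta-group-access",  # placeholder policy name
    type="data",
    policy=json.dumps(policy_document),
)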

Log in to OpenSearch Dashboards

Now that you have set permissions to access the dashboards, choose the Dashboards URL under the general information for the OpenSearch Serverless collection. This should take you to the website
https://collection-endpoint/_dashboards/

You will see a list with all the access options. Choose the SAML provider that you created (okta in this case) and log in using your Okta credentials. You will now be logged into OpenSearch Dashboards with the permissions that are part of the data access policy. You can perform searches or create visualizations from the dashboard.

Clean up

To avoid unwanted charges, delete the OpenSearch Serverless collection, data access policy, and SAML provider created as part of this demonstration.

Summary

In this post, you learned how to set up Okta as an IdP to access OpenSearch Dashboards using SAML. You also learned how to set up users and groups within Okta and configure their access to OpenSearch Dashboards. For more details, refer to SAML authentication for Amazon OpenSearch Serverless.

You can also refer to the Getting started with Amazon OpenSearch Serverless workshop to know more about OpenSearch Serverless.

If you have feedback about this post, submit it in the comments section. If you have questions about this post, start a new thread on the OpenSearch Service forum or contact AWS Support.


About the Authors

Aish Gunasekar is a Specialist Solutions architect with a focus on Amazon OpenSearch Service. Her passion at AWS is to help customers design highly scalable architectures and help them in their cloud adoption journey. Outside of work, she enjoys hiking and baking.

Prashant Agrawal is a Sr. Search Specialist Solutions Architect with Amazon OpenSearch Service. He works closely with customers to help them migrate their workloads to the cloud and helps existing customers fine-tune their clusters to achieve better performance and save on cost. Before joining AWS, he helped various customers use OpenSearch and Elasticsearch for their search and log analytics use cases. When not working, you can find him traveling and exploring new places. In short, he likes doing Eat → Travel → Repeat.

Perform time series forecasting using Amazon Redshift ML and Amazon Forecast

Post Syndicated from Tahir Aziz original https://aws.amazon.com/blogs/big-data/perform-time-series-forecasting-using-amazon-redshift-ml-and-amazon-forecast/

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads.

Many businesses use different software tools to analyze historical data and past patterns to forecast future demand and trends to make more accurate financial, marketing, and operational decisions. Forecasting acts as a planning tool to help enterprises prepare for the uncertainty that can occur in the future.

Amazon Redshift ML makes it easy for data analysts and database developers to create, train, and apply machine learning (ML) models using familiar SQL commands in Amazon Redshift.

With Redshift ML, you can take advantage of Amazon SageMaker, a fully managed ML service, without learning new tools or languages. Simply use SQL statements to create and train SageMaker ML models using your Redshift data and then use these models to make predictions. For more information on how to use Redshift ML, refer to Create, train, and deploy machine learning models in Amazon Redshift using SQL with Amazon Redshift ML.

With Redshift ML, you can now use Amazon Forecast, an ML-based time series forecasting service, without learning any new tools or having to create pipelines to move your data. You can use SQL statements to create and train forecasting models from your time series data in Amazon Redshift and use these models to generate forecasts about revenue, inventory, resource usage, or demand forecasting in your queries and reports.

For example, businesses use forecasting to do the following:

  • Use resources more efficiently
  • Time the launch of new products or services
  • Estimate recurring costs
  • Predict future events like sales volumes and earnings

In this post, we demonstrate how you can create forecasting models using Redshift ML and generate future forecasts using simple SQL commands.

When you use forecasting in Amazon Redshift, Redshift ML uses Forecast to train the forecasting model and to generate forecasts. You pay only the associated Forecast costs. There are no additional costs associated with Amazon Redshift for creating or using Forecast models to generate predictions. View Amazon Forecast pricing for details.

Solution overview

Amazon Forecast is a fully managed time series forecasting service based on machine learning. Forecast uses different ML algorithms to perform complex ML tasks for your datasets. Using historical data, Forecast automatically trains multiple algorithms and produces a forecasting model, also known as a predictor. Amazon Redshift provides a simple SQL command to create forecasting models. It seamlessly integrates with Forecast to create a dataset, predictor, and forecast automatically without you worrying about any of these steps. Redshift ML supports target time series data and related time series data.

As the following diagram demonstrates, Amazon Redshift will call Forecast, and data needed for Forecast model creation and training will be pushed from Amazon Redshift to Forecast through Amazon Simple Storage Service (Amazon S3). When the model is ready, it can be accessed using SQL from within Amazon Redshift using any business intelligence (BI) tool. In our case, we use Amazon Redshift Query Editor v2.0 to create forecast tables and visualize the data.

To show this capability, we demonstrate two use cases:

  • Forecast electricity consumption by customer
  • Predict bike sharing rentals

What is time series data?

Time series data is any dataset that collects information at various time intervals. This data is distinct because it orders data points by time. Time series data is plottable on a line graph and such time series graphs are valuable tools for visualizing the data. Data scientists use them to identify forecasting data characteristics.

Time series forecasting is a data science technique that uses machine learning and other computer technologies to study past observations and predict future values of time series data.

Prerequisites

Complete the following prerequisites before starting:

  1. Make sure you have an Amazon Redshift Serverless endpoint or a Redshift cluster.
  2. Have access to Amazon Redshift Query Editor v2.
  3. On the Amazon S3 console, create an S3 bucket that Redshift ML uses for uploading the training data that Forecast uses to train the model.
  4. Create an AWS Identity and Access Management (IAM) role. For more information, refer to Creating an IAM role as the default.

Although it’s easy to get started with AmazonS3FullAccess, AmazonForecastFullAccess, AmazonRedshiftAllCommandsFullAccess, and AmazonSageMakerFullAccess, we recommend using the minimal policy that we have provided (if you already have an existing IAM role, just add it to that role). If you need to use AWS Key Management Service (AWS KMS) or VPC routing, refer to Cluster and configure setup for Amazon Redshift ML administration.

To use Forecast, you need to have the AmazonForecastFullAccess policy. For more restrictive IAM permissions, you can use the following IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "forecast:DescribeDataset",
                "forecast:DescribeDatasetGroup",
                "forecast:DescribeAutoPredictor",
                "forecast:CreateDatasetImportJob",
                "forecast:CreateForecast",
                "forecast:DescribeForecast",
                "forecast:DescribeForecastExportJob",
                "forecast:CreateMonitor",
                "forecast:CreateForecastExportJob",
                "forecast:CreateAutoPredictor",
                "forecast:DescribeDatasetImportJob",
                "forecast:CreateDatasetGroup",
                "forecast:CreateDataset",
                "forecast:TagResource",
                "forecast:UpdateDatasetGroup"
            ],
            "Resource": "*"
        } ,
		{
			"Effect": "Allow",
			"Action": [
				"iam:PassRole"
			],
			"Resource":"arn:aws:iam::<aws_account_id>:role/service-role/<Amazon_Redshift_cluster_iam_role_name>"
		}
    ]
}

To allow Amazon Redshift and Forecast to assume the role to interact with other services, add the following trust policy to the IAM role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "redshift.amazonaws.com",
           "redshift-serverless.amazonaws.com",
           "forecast.amazonaws.com",
           "sagemaker.amazonaws.com" 
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Use case 1: Forecast electricity consumption

In our first use case, we demonstrate forecasting electricity consumption for individual households. Predicting or forecasting usage could help utility companies better manage their resources and stay ahead in planning distribution and supply. Typically, utility companies use software tools to perform the forecasting and perform a lot of steps to create the forecasting data. We show you how to use the data in your Redshift data warehouse to perform predictive analysis or create forecasting models.

For this post, we use a modified version of the individual household electric power consumption dataset. For more information, see ElectricityLoadDiagrams20112014 Data Set (Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science).

Prepare the data

Refer to the following notebook for the steps needed to create this use case.

Using Query Editor V2, connect to your cluster and open a new notebook.

The data contains measurements of electric power consumption in different households for the year 2014. We aggregated the usage data hourly. Each row represents the total electricity usage for a given household at an hourly granularity.

For our use case, we use a subset of the source data’s attributes:

  • Usage_datetime – Electricity usage time
  • Consumptioninkw – Hourly electricity consumption data in kW
  • Customer_id – Household customer ID

Create the table electricity_consumption and load data using the COPY command:

CREATE TABLE electricity_consumption
(usage_datetime timestamp, 
consumptioninkw float, 
customer_id varchar(24)
);

COPY electricity_consumption
FROM 's3://redshift-blogs/amazon-forecast-blog/electricityusagedata/electricityusagedata.csv'
IAM_ROLE default
REGION 'us-east-1' delimiter ',' IGNOREHEADER 1;

You can verify the dataset by running a SQL query on your table.

As you can see, the dataset has electricity consumption in the target field (consumptioninkw) at hourly intervals for individual consumers (customer_id).

Create a forecasting model in Redshift ML

We use the Create Model command to create and train a forecast model. For our forecasting dataset, we use electricity consumption data within the FROM clause. See the following code:

CREATE MODEL forecast_electricity_consumption
FROM electricity_consumption 
TARGET consumptioninkw 
IAM_ROLE 'arn:aws:your-IAM-Role'
AUTO ON MODEL_TYPE FORECAST
SETTINGS (S3_BUCKET 'your-S3-bucket-name',
 HORIZON 24,
 FREQUENCY 'H',
 S3_GARBAGE_COLLECT OFF);

Here, the model name is forecast_electricity_consumption. We use the following settings to create the model:

  • Target – The name of the field for prediction.
  • HORIZON – The number of time steps in the future to forecast.
  • FREQUENCY – The forecast frequency, which should match the input frequency in our case (H meaning hourly). Other acceptable frequency values are Y | M | W | D | H | 30min | 15min | 10min | 5min | 1min. For more details, refer to CREATE MODEL with Forecast.

The Create Model command must include one VARCHAR (customer_id) and a timestamp dimension (usage_datetime). All other related time series feature data must be INT or FLOAT data types.

For the Redshift ML forecasting model, make sure that when you issue a CREATE MODEL statement, you specify MODEL_TYPE as FORECAST. When Redshift ML trains a model or predictor on Forecast, it has a fixed forecast, meaning there is not a physical model to compile and run. Therefore, an inference function is not needed for Forecast models. Instead, we show you how you can pull an exported forecast from the training output location in Amazon S3 into a table locally in your Redshift data warehouse.

When using Forecast, the create model command is run in synchronous mode. This means that after the command is run, it will take 10–15 minutes to set up the required Forecast artifacts. The model will then start training in asynchronous mode, meaning that the training is done behind the scenes by Forecast. You can check when the model training is complete by running the show model command:

SHOW MODEL forecast_electricity_consumption;

The following screenshot shows the results.

The model is trained and deployed when status is shown as READY. If you see TRAINING, that means the model is still training and you need to wait for it to complete.
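
If you want to check the training status outside the query editor, one option is the Amazon Redshift Data API. The following sketch is illustrative only; the workgroup and database names are placeholders, and it simply runs SHOW MODEL and prints the returned key/value rows so you can look for the Model State entry.

import time
import boto3

redshift_data = boto3.client("redshift-data")

# Run SHOW MODEL through the Redshift Data API (serverless workgroup assumed).
statement = redshift_data.execute_statement(
    WorkgroupName="default-workgroup",  # placeholder workgroup
    Database="dev",                     # placeholder database
    Sql="SHOW MODEL forecast_electricity_consumption;",
)

# Wait for the statement to finish, then print the key/value pairs.
while True:
    description = redshift_data.describe_statement(Id=statement["Id"])
    if description["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(5)

result = redshift_data.get_statement_result(Id=statement["Id"])
for row in result["Records"]:
    print([col.get("stringValue") for col in row])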

Generate a forecast

After a model has finished training, you can run a simple create table as command to instantiate all the forecast results into a table. This command will get all the forecast results from the S3 bucket where Forecast exported them.

Create the table locally and load the data in the new table:

CREATE TABLE forecast_electricity_predictions AS SELECT FORECAST(forecast_electricity_consumption);

Here, FORECAST is a function that takes your model’s name as input.

Next, check the forecasted data for the next 24 hours:

Select * from forecast_electricity_predictions;

The following screenshot shows the results.

As shown in the preceding screenshot, our forecast is generated for 24 hours because the HORIZON and FREQUENCY parameters were defined as 24 and 'H' (hourly) at model creation and training time, and they can’t be changed after the model is trained.

Use case 2: Predict bike sharing rentals

Redshift ML supports historical related time series (RTS) datasets. Historical RTS datasets contain data points up to the forecast horizon, and don’t contain any data points within the forecast horizon.

For this use case, we use a modified version of the Bike Sharing Dataset (Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894).

Our time series dataset contains the event_timestamp and item_id dimensions. It also contains additional attributes, including season, holiday, temperature, and workingday. These features are RTS because they may impact the no_of_bikes_rented target attribute.

For this post, we only include the workingday feature as RTS to help forecast the no_of_bikes_rented target. Based on the following chart, we can see a correlation: the number of bikes rented has a direct relationship with whether it is a working day.

Prepare the data

Refer to the following notebook for the steps needed to create this use case.

Load the dataset into Amazon Redshift using the following SQL. You can use Query Editor v2 or your preferred SQL tool to run these commands.

To create the table, use the following commands:

create table bike_sampledata
(
event_timestamp timestamp,
season float , 
holiday float , 
workingday float , 
weather float , 
temperature float , 
atemperature float, 
humidity float , 
windspeed float , 
casual float , 
registered float , 
no_of_bikes_rented float,
item_id varchar(255)
);

To load data into Amazon Redshift, use the following COPY command:

copy bike_sampledata
from 's3://redshift-blogs/amazon-forecast-blog/bike-data/bike.csv'
IAM_ROLE default
format as csv
region 'us-east-1';

Create a model in Redshift ML using Forecast

For this example, we are not considering any other RTS features and the goal is to forecast the number of bike rentals for the next 24 hours by accounting for the working day only. You can perform analysis and include additional RTS features in the SELECT query as desired.

Run the following SQL command to create your model—note our target is no_of_bikes_rented, which contains the number of total rentals, and we use item_id, event_timestamp, and workingday as inputs from our training set:

CREATE MODEL forecast_bike_consumption 
FROM (
     select
     s.item_id , s.event_timestamp, s.no_of_bikes_rented, s.workingday
     from     
     bike_sampledata s
     )
TARGET no_of_bikes_rented
IAM_ROLE 'arn:aws:your-IAM-Role'
AUTO ON MODEL_TYPE FORECAST
OBJECTIVE 'AverageWeightedQuantileLoss'
SETTINGS (S3_BUCKET 'your-s3-bucket-name',
          HORIZON 24,
          FREQUENCY 'H',
          PERCENTILES '0.25,0.50,0.75,mean',
          S3_GARBAGE_COLLECT ON);

The Create Model command must include one VARCHAR (item_id) and a timestamp dimension (event_timestamp). All other RTS feature data must be INT or FLOAT data types.

The OBJECTIVE parameter specifies a metric to minimize or maximize the objective of a job. For more details, refer to AutoMLJobObjective.

As in the previous use case, the Create Model command will take 10–15 minutes to set up the required Forecast artifacts and then will start the training in asynchronous mode so model training is done behind the scenes by Forecast. You can check if the model is in the Ready state by running the show model command:

SHOW MODEL forecast_bike_consumption;

Generate predictions

After a model has finished training, you can run a Create table command to instantiate all the forecast results into a table. This command gets all the forecast results from the S3 bucket where Forecast exported them.

Create the table locally and load the data in the new table:

CREATE TABLE forecast_bike_consumption_results 
AS SELECT FORECAST(forecast_bike_consumption);

Run the following SQL to inspect the generated forecast results:

select * from forecast_bike_consumption_results;

To visualize the data to help us understand it more, select Chart. For the X axis, choose the time attribute and for the Y axis, choose mean.

You can also visualize all three forecasts together to understand the differences between them:

  1. Choose Trace and choose Time for the X axis and p50 for the Y axis.
  2. Choose Trace again and choose Time for the X axis and p75 for the Y axis.
  3. Edit the chart title and legend and provide suitable labels.

Clean up

Complete the following steps to clean up your resources:

  1. Delete the Redshift Serverless workgroup or namespace you have for this post (this will also drop all the objects created).
  2. If you used an existing Redshift Serverless workgroup or namespace, use the following code to drop these objects:
    DROP TABLE forecast_electricity_predictions;
    DROP MODEL forecast_electricity_consumption;
    DROP TABLE electricity_consumption;
    DROP TABLE forecast_bike_consumption_results;
    DROP MODEL forecast_bike_consumption;
    DROP TABLE bike_sampledata;

Conclusion

Redshift ML makes it easy for users of all skill levels to use ML technology. With no prior ML knowledge, you can use Redshift ML to gain business insights for your data.

With Forecast, you can use time series data and related data to forecast different business outcomes using familiar Amazon Redshift SQL commands.

We encourage you to start using this amazing new feature and give us your feedback. For more details, refer to CREATE MODEL with Forecast.


About the authors

Tahir Aziz is an Analytics Solution Architect at AWS. He has worked with building data warehouses and big data solutions for over 15 years. He loves to help customers design end-to-end analytics solutions on AWS. Outside of work, he enjoys traveling and cooking.

Ahmed Shehata is a Senior Analytics Specialist Solutions Architect at AWS, based in Toronto. He has more than two decades of experience helping customers modernize their data platforms. Ahmed is passionate about helping customers build efficient, performant, and scalable analytics solutions.

Nikos Koulouris is a Software Development Engineer at AWS. He received his PhD from University of California, San Diego and he has been working in the areas of databases and analytics.

Configure cross-Region table access with the AWS Glue Catalog and AWS Lake Formation

Post Syndicated from Aarthi Srinivasan original https://aws.amazon.com/blogs/big-data/configure-cross-region-table-access-with-the-aws-glue-catalog-and-aws-lake-formation/

Today’s modern data lakes span multiple accounts, AWS Regions, and lines of business in organizations. Companies also have employees and do business across multiple geographic regions and even around the world. It’s important that their data solution gives them the ability to share and access data securely and safely across Regions.

The AWS Glue Data Catalog and AWS Lake Formation recently announced support for cross-Region table access. This feature lets users query AWS Glue databases and tables in one Region from another Region using resource links, without copying the metadata in the Data Catalog or the data in Amazon Simple Storage Service (Amazon S3). A resource link is a Data Catalog object that is a link to a database or table.

The AWS Glue Data Catalog is a centralized repository of technical metadata that holds the information about your datasets in AWS, and can be queried using AWS analytics services such as Amazon Athena, Amazon EMR, and AWS Glue for Apache Spark. The Data Catalog is localized to every Region in an AWS account, requiring users to replicate the metadata and the source data in S3 buckets for cross-Region queries. With the newly launched feature for cross-Region table access, you can create a resource link in any Region pointing to a database or table of the source Region. With the resource link in the local Region, you can query the source Region’s tables from Athena, Amazon EMR, and AWS Glue ETL in the local Region.

You can use the cross-Region table access feature of the Data Catalog in combination with the permissions management and cross-account sharing capability of Lake Formation. Lake Formation is a fully managed service that makes it easy to build, secure, and manage data lakes. By using cross-Region access support for Data Catalog, together with governance provided by Lake Formation, organizations can discover and access data across Regions without spending time making copies. Some businesses might have restrictions to run their compute in certain Regions. Organizations that need to share their Data Catalog with businesses that have such restrictions can now create and share cross-Region resource links.

In this post, we walk you through configuring cross-Region database and table access in two scenarios. In the first scenario, we go through an example where a customer wants to access an AWS Glue database in Region A from Region B in the same account. In scenario two, we demonstrate cross-account and cross-Region access where a customer wants to share a database in Region A across accounts and access it from Region B of the recipient account.

Scenario 1: Same account use case

In this scenario, we walk you through the steps required to share a Data Catalog database from one Region to another Region within the same AWS account. For our illustrations, we have a sample dataset in an S3 bucket in the us-east-2 Region and have used an AWS Glue crawler to crawl and catalog the dataset into a database in the Data Catalog of the us-east-2 Region. We share this dataset to the us-west-2 Region. You can use any of your datasets to follow along. The following diagram illustrates the architecture for cross-Region sharing within the same AWS account.

Prerequisites

To set up cross-Region sharing of a Data Catalog database for scenario 1, we recommend the following prerequisites:

  • An AWS account that is not used for production use cases.
  • Lake Formation set up already in the account and a Lake Formation administrator role or a similar role to follow along with the instructions in this post. For example, we are using a data lake administrator role called LF-Admin. The LF-Admin role also has the AWS Identity and Access Management (IAM) permission iam:PassRole on the AWS Glue crawler role. To learn more about setting up permissions for a data lake administrator, see Create a data lake administrator.
  • A sample database in the Data Catalog with a few tables. For example, our sample database is called salesdb_useast2 and has a set of eight tables, as shown in the following screenshot.

Set up permissions for us-east-2

Complete the following steps to configure permissions in the us-east-2 Region:

  1. Log in to the Lake Formation console and choose the Region where your database resides. In our example, it is us-east-2 Region.
  2. Grant SELECT and DESCRIBE permissions to the LF-Admin role on all tables of the database salesdb_useast2.
  3. You can confirm that the permissions are working by querying the database and tables from Athena as the data lake administrator role.

Set up permissions for us-west-2

Complete the following steps to configure permissions in the us-west-2 Region:

  1. Choose the us-west-2 Region on the Lake Formation console.
  2. Add LF-Admin as a data lake administrator and grant Create database permission to LF-Admin.
  3. In the navigation pane, under Data catalog, select Databases.
  4. Choose Create database and select Resource link.
  5. Enter rl_salesdb_from_useast2 as the name for the resource link.
  6. For Shared database’s region, choose US East (Ohio).
  7. For Shared database, choose salesdb_useast2.
  8. Choose Create.

This creates a database resource link in us-west-2 pointing to the database in us-east-2.
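If you prefer to automate this step, the following is a minimal boto3 sketch of the same resource link creation. The account ID is a placeholder, and the sketch assumes the Glue CreateDatabase API’s TargetDatabase structure accepts the Region field introduced with cross-Region table access.

import boto3

# Run this in the local (consumer) Region, us-west-2 in our example.
glue = boto3.client("glue", region_name="us-west-2")

# Create a resource link that points at the source database in us-east-2.
# Replace <account-id> with your own AWS account ID.
glue.create_database(
    DatabaseInput={
        "Name": "rl_salesdb_from_useast2",
        "TargetDatabase": {
            "CatalogId": "<account-id>",
            "DatabaseName": "salesdb_useast2",
            "Region": "us-east-2",  # source Region of the shared database
        },
    }
)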

You will notice the Shared resource owner region column populate as us-east-2 for the resource link details on the Databases page.

Because the LF-Admin role created the resource link rl_salesdb_from_useast2, the role has implicit permissions on the resource link. LF-Admin already has permissions to query the table in the us-east-2 Region. There is no need to add a Grant on target permission for LF-Admin. If you are granting permission to another user or role, you need to grant Describe permissions on the resource link rl_salesdb_from_useast2.

  9. Query the database using the resource link in Athena as LF-Admin.
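The following is a hedged boto3 sketch of running such a query programmatically through the resource link database; the table name and the results bucket are placeholders for your own values.

import time
import boto3

# Run in the local Region (us-west-2), where the resource link was created.
athena = boto3.client("athena", region_name="us-west-2")

response = athena.start_query_execution(
    QueryString='SELECT * FROM "<table-name>" LIMIT 10',  # any table in the shared database
    QueryExecutionContext={"Database": "rl_salesdb_from_useast2"},
    ResultConfiguration={"OutputLocation": "s3://<your-athena-results-bucket>/"},
)

query_id = response["QueryExecutionId"]
state = "RUNNING"
while state in ("QUEUED", "RUNNING"):
    time.sleep(2)
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
print(state)  # SUCCEEDED when the cross-Region query completes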

In the preceding steps, we saw how to create a resource link in us-west-2 for a Data Catalog database in us-east-2. You can also create a resource link to the source database in any additional Region where the Data Catalog is available. You can run extract, transform, and load (ETL) scripts in Amazon EMR and AWS Glue by providing the additional Region parameter when referring to the database and table. See the API documentation for GetTable() and GetDatabase() for additional details.

Also, Data Catalog permissions for the database, tables, and resource links and the underlying Amazon S3 data permissions can be managed by IAM policies and S3 bucket policies instead of Lake Formation permissions. For more information, see Identity and access management for AWS Glue.

Scenario 2: Cross-account use case

In this scenario, we walk you through the steps required to share a Data Catalog database from one Region to another Region between two accounts: a producer account and a consumer account. To show an advanced use case, we host the source dataset in us-east-2 of account A and crawl it using an AWS Glue crawler in the Data Catalog in us-east-1. The data lake administrator in account A then shares the database and tables to account B using Lake Formation permissions. The data lake administrator in account B accepts the share in us-east-1 and creates resource links to query the tables from eu-west-1. The following diagram illustrates the architecture for cross-Region sharing between producer account A and consumer account B.

Prerequisites

To set up cross-Region sharing of a Data Catalog database for scenario 2, we recommend the following prerequisites:

  • Two AWS accounts that are not used for production use cases
  • Lake Formation administrator roles in both accounts
  • Lake Formation set up in both accounts with cross-account sharing version 3. For more details, refer to the documentation.
  • A sample database in the Data Catalog with a few tables

For our example, we continue to use the same dataset and the data lake administrator role LF-Admin for scenario 2.

Set up account A for cross-Region sharing

To set up account A, complete the following steps:

  1. Sign in to the AWS Management Console as the data lake administrator role.
  2. Register the S3 bucket in Lake Formation in us-east-1 with an IAM role that has access to the S3 bucket. See registering your S3 location for instructions.
  3. Set up and run an AWS Glue crawler to catalog the data in the us-east-2 S3 bucket to the Data Catalog database useast2data_salesdb in us-east-1. Refer to AWS Glue crawlers support cross-account crawling to support data mesh architecture for instructions.

The database, as shown in the following screenshot, has a set of eight tables.

  4. Grant SELECT and DESCRIBE along with grantable permissions on all tables of the database to account B (see the boto3 sketch after this list).

  5. Grant DESCRIBE with grantable permissions on the database.
  6. Verify the granted permissions on the Data permissions page.
  7. Log out of account A.
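The following boto3 sketch shows roughly what the grants in steps 4 and 5 look like through the Lake Formation API; the consumer account ID is a placeholder.

import boto3

# Run in account A, in the Region that holds the catalog database (us-east-1 here).
lakeformation = boto3.client("lakeformation", region_name="us-east-1")
ACCOUNT_B = "<account-b-id>"  # placeholder for the consumer account ID

# Step 4: SELECT and DESCRIBE on all tables of the database, with grant option.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": ACCOUNT_B},
    Resource={"Table": {"DatabaseName": "useast2data_salesdb", "TableWildcard": {}}},
    Permissions=["SELECT", "DESCRIBE"],
    PermissionsWithGrantOption=["SELECT", "DESCRIBE"],
)

# Step 5: DESCRIBE on the database itself, with grant option.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": ACCOUNT_B},
    Resource={"Database": {"Name": "useast2data_salesdb"}},
    Permissions=["DESCRIBE"],
    PermissionsWithGrantOption=["DESCRIBE"],
)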

Set up account B for cross-Region sharing

To set up account B, complete the following steps:

  1. Sign in as the data lake administrator on the Lake Formation console in us-east-1.

In our example, we have created the data lake administrator role LF-Admin, similar to previous administrator roles in account A and scenario 1.

  2. On the AWS Resource Access Manager (AWS RAM) console, review and accept the AWS RAM invites corresponding to the shared database and tables from account A.

The LF-Admin role can see the shared database useast2data_salesdb from the producer account. LF-Admin has access to the database and tables and so doesn’t need additional permissions on the shared database.

  3. You can grant DESCRIBE on the database and SELECT on All_Tables permissions to any additional IAM principals from the us-east-1 Region on this shared database.
  4. Open the Lake Formation console in eu-west-1 (or any Region where you have Lake Formation and Athena already set up).
  5. Choose Create database and create a resource link named rl_useast1db_crossaccount, pointing to the us-east-1 database useast2data_salesdb.

You can choose any Region on the Shared database’s region drop-down menu and choose the databases from those Regions.

Because we’re using the data lake administrator role LF-Admin, we can see all databases from all Regions in the consumer account’s Data Catalog. A data lake user with restricted permissions will be able to see only the databases that they have permissions for.

  6. Because LF-Admin created the resource link, this role has permissions to use the resource link rl_useast1db_crossaccount. For additional IAM principals, grant DESCRIBE permissions on the database resource link rl_useast1db_crossaccount.
  7. You can now query the database and tables from Athena.

Considerations

Cross-Region queries involve Amazon S3 data transfer by the analytics services, such as Athena, Amazon EMR, and AWS Glue ETL. As a result, cross-Region queries can be slower and will incur higher transfer costs compared to queries in the same Region. Some analytics services, such as AWS Glue jobs and Amazon EMR, may require internet access when accessing cross-Region data from Amazon S3, depending on your VPC setup. Refer to Considerations and limitations for more considerations.

Conclusion

In this post, you saw examples of how to set up cross-Region resource links for a database in the same account and across two accounts. You also saw how to use cross-Region resource links to query in Athena. You can share selected tables from a database instead of sharing an entire database. With cross-Region sharing, you can create a resource link for the table using the Create table option.

There are two key things to remember when using the cross-Region table access feature:

  • Grant permissions on the source database or table from its source Region.
  • Grant permissions on the resource link from the Region it was created in.

That is, the original shared database or table is always available in the source Region, and resource links are created and shared in their local Region.

To get started, see Accessing tables across Regions. Share your comments on the post or contact your AWS account team for more details.


About the author

Aarthi Srinivasan is a Senior Big Data Architect with AWS Lake Formation. She likes building data lake solutions for AWS customers and partners. When not on the keyboard, she explores the latest science and technology trends and spends time with her family.

Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight

Post Syndicated from Raj Ramasubbu original https://aws.amazon.com/blogs/big-data/create-an-apache-hudi-based-near-real-time-transactional-data-lake-using-aws-dms-amazon-kinesis-aws-glue-streaming-etl-and-data-visualization-using-amazon-quicksight/

With the rapid growth of technology, more and more data is being generated in many different formats—structured, semi-structured, and unstructured. Analytics on operational data in near-real time is becoming a common need. Due to the exponential growth of data volume, it has become common practice to replace read replicas with data lakes to achieve better scalability and performance. In most real-world use cases, it’s important to replicate the data from the relational database source to the target in real time. Change data capture (CDC) is one of the most common design patterns to capture the changes made in the source database and reflect them to other data stores.

We recently announced support for streaming extract, transform, and load (ETL) jobs in AWS Glue version 4.0, a new version of AWS Glue that accelerates data integration workloads in AWS. AWS Glue streaming ETL jobs continuously consume data from streaming sources, clean and transform the data in-flight, and make it available for analysis in seconds. AWS also offers a broad selection of services to support your needs. A database replication service such as AWS Database Migration Service (AWS DMS) can replicate the data from your source systems to Amazon Simple Storage Service (Amazon S3), which commonly hosts the storage layer of the data lake. Although it’s straightforward to apply updates on a relational database management system (RDBMS) that backs an online source application, it’s difficult to apply this CDC process on your data lakes. Apache Hudi, an open-source data management framework used to simplify incremental data processing and data pipeline development, is a good option to solve this problem.

This post demonstrates how to apply CDC changes from Amazon Relational Database Service (Amazon RDS) or other relational databases to an S3 data lake, with flexibility to denormalize, transform, and enrich the data in near-real time.

Solution overview

We use an AWS DMS task to capture near-real-time changes in the source RDS instance, and use Amazon Kinesis Data Streams as a destination of the AWS DMS task CDC replication. An AWS Glue streaming job reads and enriches changed records from Kinesis Data Streams and performs an upsert into the S3 data lake in Apache Hudi format. Then we can query the data with Amazon Athena and visualize it in Amazon QuickSight. AWS Glue natively supports continuous write operations for streaming data to Apache Hudi-based tables.

The following diagram illustrates the architecture used for this post, which is deployed through an AWS CloudFormation template.

Prerequisites

Before you get started, make sure you have the following prerequisites:

Source data overview

To illustrate our use case, we assume a data analyst persona who is interested in analyzing near-real-time data for sport events using the table ticket_activity. An example of this table is shown in the following screenshot.

Apache Hudi connector for AWS Glue

For this post, we use AWS Glue 4.0, which already has native support for the Hudi framework. Hudi, an open-source data lake framework, simplifies incremental data processing in data lakes built on Amazon S3. It enables capabilities including time travel queries, ACID (Atomicity, Consistency, Isolation, Durability) transactions, streaming ingestion, CDC, upserts, and deletes.
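The AWS Glue streaming job provisioned later by the CloudFormation template handles the Hudi upsert for you. Purely as an illustration, a simplified sketch of upserting a micro-batch of CDC records into a Hudi table from a Spark Structured Streaming job might look like the following. It assumes a streaming DataFrame cdc_df has already been parsed from Kinesis Data Streams, and the bucket, database, and checkpoint paths are placeholders.

hudi_options = {
    "hoodie.table.name": "ticket_activity",
    "hoodie.datasource.write.recordkey.field": "ticketactivity_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.database": "database_<account_number>_hudi_cdc_demo",
    "hoodie.datasource.hive_sync.table": "ticket_activity",
}

def upsert_to_hudi(batch_df, batch_id):
    # Each micro-batch of CDC records is upserted into the Hudi table on Amazon S3.
    (batch_df.write.format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save("s3://<your-bucket>/hudi/ticket_activity/"))

# cdc_df is assumed to be a streaming DataFrame already read and parsed from Kinesis Data Streams.
(cdc_df.writeStream
    .foreachBatch(upsert_to_hudi)
    .option("checkpointLocation", "s3://<your-bucket>/checkpoints/ticket_activity/")
    .start())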

Set up resources with AWS CloudFormation

This post includes a CloudFormation template for a quick setup. You can review and customize it to suit your needs.

The CloudFormation template generates the following resources:

  • An RDS database instance (source).
  • An AWS DMS replication instance, used to replicate the data from the source table to Kinesis Data Streams.
  • A Kinesis data stream.
  • Four AWS Glue Python shell jobs:
    • rds-ingest-rds-setup-<CloudFormation Stack name> – Creates one source table called ticket_activity on Amazon RDS.
    • rds-ingest-data-initial-<CloudFormation Stack name> – Generates sample data at random using the Faker library and loads it into the ticket_activity table.
    • rds-ingest-data-incremental-<CloudFormation Stack name> – Ingests new ticket activity data into the source table ticket_activity continuously. This job simulates customer activity.
    • rds-upsert-data-<CloudFormation Stack name> – Upserts specific records in the source table ticket_activity. This job simulates administrator activity.
  • AWS Identity and Access Management (IAM) users and policies.
  • An Amazon VPC, a public subnet, two private subnets, internet gateway, NAT gateway, and route tables.
    • We use private subnets for the RDS database instance and AWS DMS replication instance.
    • We use the NAT gateway to have reachability to pypi.org to use the MySQL connector for Python from the AWS Glue Python shell jobs. It also provides reachability to Kinesis Data Streams and an Amazon S3 API endpoint.

To set up these resources, you must have the following prerequisites:

The following diagram illustrates the architecture of our provisioned resources.

To launch the CloudFormation stack, complete the following steps:

  1. Sign in to the AWS CloudFormation console.
  2. Choose Launch Stack
  3. Choose Next.
  4. For S3BucketName, enter the name of your new S3 bucket.
  5. For VPCCIDR, enter a CIDR IP address range that doesn’t conflict with your existing networks.
  6. For PublicSubnetCIDR, enter the CIDR IP address range within the CIDR you gave for VPCCIDR.
  7. For PrivateSubnetACIDR and PrivateSubnetBCIDR, enter the CIDR IP address range within the CIDR you gave for VPCCIDR.
  8. For SubnetAzA and SubnetAzB, choose the subnets you want to use.
  9. For DatabaseUserName, enter your database user name.
  10. For DatabaseUserPassword, enter your database user password.
  11. Choose Next.
  12. On the next page, choose Next.
  13. Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  14. Choose Create stack.

Stack creation can take about 20 minutes.

Set up an initial source table

The AWS Glue job rds-ingest-rds-setup-<CloudFormation stack name> creates a source table called ticket_activity on the RDS database instance. To set up the initial source table in Amazon RDS, complete the following steps:

  1. On the AWS Glue console, choose Jobs in the navigation pane.
  2. Choose rds-ingest-rds-setup-<CloudFormation stack name> to open the job.
  3. Choose Run.
  4. Navigate to the Runs tab and wait for Run status to show as SUCCEEDED.

This job creates only one table, ticket_activity, in the MySQL instance. See the following DDL code:

CREATE TABLE ticket_activity (
ticketactivity_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
sport_type VARCHAR(256) NOT NULL,
start_date DATETIME NOT NULL,
location VARCHAR(256) NOT NULL,
seat_level VARCHAR(256) NOT NULL,
seat_location VARCHAR(256) NOT NULL,
ticket_price INT NOT NULL,
customer_name VARCHAR(256) NOT NULL,
email_address VARCHAR(256) NOT NULL,
created_at DATETIME NOT NULL,
updated_at DATETIME NOT NULL )

Ingest new records

In this section, we detail the steps to ingest new records. Complete the following steps to start the jobs.

Start data ingestion to Kinesis Data Streams using AWS DMS

To start data ingestion from Amazon RDS to Kinesis Data Streams, complete the following steps:

  1. On the AWS DMS console, choose Database migration tasks in the navigation pane.
  2. Select the task rds-to-kinesis-<CloudFormation stack name>.
  3. On the Actions menu, choose Restart/Resume.
  4. Wait for the status to show as Load complete and Replication ongoing.

The AWS DMS replication task ingests data from Amazon RDS to Kinesis Data Streams continuously.

Start data ingestion to Amazon S3

Next, to start data ingestion from Kinesis Data Streams to Amazon S3, complete the following steps:

  1. On the AWS Glue console, choose Jobs in the navigation pane.
  2. Choose streaming-cdc-kinesis2hudi-<CloudFormation stack name> to open the job.
  3. Choose Run.

Do not stop this job; you can check the run status on the Runs tab and wait for it to show as Running.

Start the data load to the source table on Amazon RDS

To start data ingestion to the source table on Amazon RDS, complete the following steps:

  1. On the AWS Glue console, choose Jobs in the navigation pane.
  2. Choose rds-ingest-data-initial-<CloudFormation stack name> to open the job.
  3. Choose Run.
  4. Navigate to the Runs tab and wait for Run status to show as SUCCEEDED.

Validate the ingested data

About 2 minutes after starting the job, the data should be ingested into Amazon S3. To validate the ingested data in Athena, complete the following steps:

  1. On the Athena console, complete the following steps if you’re running an Athena query for the first time:
    • On the Settings tab, choose Manage.
    • Specify the staging directory, which is the S3 path where Athena saves the query results.
    • Choose Save.

  2. On the Editor tab, run the following query against the table to check the data:
SELECT * FROM "database_<account_number>_hudi_cdc_demo"."ticket_activity" limit 10;

Note that AWS CloudFormation creates the database name using your account number, as in database_<your-account-number>_hudi_cdc_demo.

Update existing records

Before you update the existing records, note down the ticketactivity_id value of a record from the ticket_activity table. Run the following SQL using Athena. For this post, we use ticketactivity_id = 46 as an example:

SELECT * FROM "database_<account_number>_hudi_cdc_demo"."ticket_activity" limit 10;

To simulate a real-time use case, update the data in the source table ticket_activity on the RDS database instance to see that the updated records are replicated to Amazon S3. Complete the following steps:

  1. On the AWS Glue console, choose Jobs in the navigation pane.
  2. Choose rds-ingest-data-incremental-<CloudFormation stack name> to open the job.
  3. Choose Run.
  4. Choose the Runs tab and wait for Run status to show as SUCCEEDED.

To upsert the records in the source table, complete the following steps:

  1. On the AWS Glue console, choose Jobs in the navigation pane.
  2. Choose the job rds-upsert-data-<CloudFormation stack name>.
  3. On the Job details tab, under Advanced properties, for Job parameters, update the following parameters:
    • For Key, enter --ticketactivity_id.
    • For Value, replace 1 with one of the ticket IDs you noted above (for this post, 46).

  4. Choose Save.
  5. Choose Run and wait for the Run status to show as SUCCEEDED.

This AWS Glue Python shell job simulates a customer buying a ticket. It updates a record in the source table ticket_activity on the RDS database instance using the ticket ID passed in the job argument --ticketactivity_id. It updates ticket_price=500 and updated_at with the current timestamp.
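Although the actual script is provided by the CloudFormation template, a minimal sketch of such an update using the MySQL connector for Python might look like the following; the connection details are placeholders.

import mysql.connector  # the MySQL connector for Python, installed from pypi.org via the NAT gateway

# Placeholder connection details; the actual job resolves these from its job parameters.
connection = mysql.connector.connect(
    host="<rds-endpoint>",
    user="<db-user>",
    password="<db-password>",
    database="<db-name>",
)

ticketactivity_id = 46  # the ID passed through the --ticketactivity_id job argument

cursor = connection.cursor()
cursor.execute(
    "UPDATE ticket_activity "
    "SET ticket_price = 500, updated_at = NOW() "
    "WHERE ticketactivity_id = %s",
    (ticketactivity_id,),
)
connection.commit()
cursor.close()
connection.close()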

To validate the ingested data in Amazon S3, run the same query from Athena for the ticketactivity_id you noted earlier and observe the updated ticket_price and updated_at fields:

SELECT * FROM "database_<account_number>_hudi_cdc_demo"."ticket_activity" where ticketactivity_id = 46 ;

Visualize the data in QuickSight

After you have the output file generated by the AWS Glue streaming job in the S3 bucket, you can use QuickSight to visualize the Hudi data files. QuickSight is a scalable, serverless, embeddable, ML-powered business intelligence (BI) service built for the cloud. QuickSight lets you easily create and publish interactive BI dashboards that include ML-powered insights. QuickSight dashboards can be accessed from any device and seamlessly embedded into your applications, portals, and websites.

Build a QuickSight dashboard

To build a QuickSight dashboard, complete the following steps:

  1. Open the QuickSight console.

You’re presented with the QuickSight welcome page. If you haven’t signed up for QuickSight, you may have to complete the signup wizard. For more information, refer to Signing up for an Amazon QuickSight subscription.

After you have signed up, QuickSight presents a “Welcome wizard.” You can view the short tutorial, or you can close it.

  2. On the QuickSight console, choose your user name and choose Manage QuickSight.
  3. Choose Security & permissions, then choose Manage.
  4. Select Amazon S3 and select the buckets that you created earlier with AWS CloudFormation.
  5. Select Amazon Athena.
  6. Choose Save.
  7. If you changed your Region during the first step of this process, change it back to the Region that you used earlier for the AWS Glue jobs.

Create a dataset

Now that you have QuickSight up and running, you can create your dataset. Complete the following steps:

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. Choose Athena.
  4. For Data source name, enter a name (for example, hudi-blog).
  5. Choose Validate.
  6. After the validation is successful, choose Create data source.
  7. For Database, choose database_<your-account-number>_hudi_cdc_demo.
  8. For Tables, select ticket_activity.
  9. Choose Select.
  10. Choose Visualize.
  11. Choose hour and then ticketactivity_id to get the count of ticketactivity_id by hour.

Clean up

To clean up your resources, complete the following steps:

  1. Stop the AWS DMS replication task rds-to-kinesis-<CloudFormation stack name>.
  2. Navigate to the RDS database and choose Modify.
  3. Deselect Enable deletion protection, then choose Continue.
  4. Stop the AWS Glue streaming job streaming-cdc-kinesis2hudi-<CloudFormation stack name>.
  5. Delete the CloudFormation stack.
  6. On the QuickSight dashboard, choose your user name, then choose Manage QuickSight.
  7. Choose Account settings, then choose Delete account.
  8. Choose Delete account to confirm.
  9. Enter confirm and choose Delete account.

Conclusion

In this post, we demonstrated how you can stream data—not only new records, but also updated records from relational databases—to Amazon S3 using an AWS Glue streaming job to create an Apache Hudi-based near-real-time transactional data lake. With this approach, you can easily achieve upsert use cases on Amazon S3. We also showcased how to visualize the Apache Hudi table using QuickSight and Athena. As a next step, refer to the Apache Hudi performance tuning guide for a high-volume dataset. To learn more about authoring dashboards in QuickSight, check out the QuickSight Author Workshop.


About the Authors

Raj Ramasubbu is a Sr. Analytics Specialist Solutions Architect focused on big data and analytics and AI/ML with Amazon Web Services. He helps customers architect and build highly scalable, performant, and secure cloud-based solutions on AWS. Raj provided technical expertise and leadership in building data engineering, big data analytics, business intelligence, and data science solutions for over 18 years prior to joining AWS. He helped customers in various industry verticals like healthcare, medical devices, life science, retail, asset management, car insurance, residential REIT, agriculture, title insurance, supply chain, document management, and real estate.

Rahul Sonawane is a Principal Analytics Solutions Architect at AWS with AI/ML and Analytics as his area of specialty.

Sundeep Kumar is a Sr. Data Architect, Data Lake at AWS, helping customers build data lake and analytics platforms and solutions. When not building and designing data lakes, Sundeep enjoys listening to music and playing guitar.

Estimating Scope 1 Carbon Footprint with Amazon Athena

Post Syndicated from Thomas Burns original https://aws.amazon.com/blogs/big-data/estimating-scope-1-carbon-footprint-with-amazon-athena/

Today, more than 400 organizations have signed The Climate Pledge, a commitment to reach net-zero carbon by 2040. Some of the drivers that lead to setting explicit climate goals include customer demand, current and anticipated government relations, employee demand, investor demand, and sustainability as a competitive advantage. AWS customers are increasingly interested in ways to drive sustainability actions. In this blog, we will walk through how we can apply existing enterprise data to better understand and estimate Scope 1 carbon footprint using Amazon Simple Storage Service (S3) and Amazon Athena, a serverless interactive analytics service that makes it easy to analyze data using standard SQL.

The Greenhouse Gas Protocol

The Greenhouse Gas Protocol (GHGP) provides standards for measuring and managing global warming impacts from an organization’s operations and value chain.

The greenhouse gases covered by the GHGP are the seven gases required by the UNFCCC/Kyoto Protocol (which is often called the “Kyoto Basket”). These gases are carbon dioxide (CO2), methane (CH4), nitrous oxide (N2O), the so-called F-gases (hydrofluorocarbons and perfluorocarbons), sulfur hexafluoride (SF6), and nitrogen trifluoride (NF3). Each greenhouse gas is characterized by its global warming potential (GWP), which is determined by the gas’s greenhouse effect and its lifetime in the atmosphere. Since carbon dioxide (CO2) accounts for about 76 percent of total man-made greenhouse gas emissions, the global warming potential of greenhouse gases is measured relative to CO2 and thus expressed as CO2-equivalent (CO2e).

The GHGP divides an organization’s emissions into three primary scopes:

  • Scope 1 – Direct greenhouse gas emissions (for example from burning fossil fuels)
  • Scope 2 – Indirect emissions from purchased energy (typically electricity)
  • Scope 3 – Indirect emissions from the value chain, including suppliers and customers

How do we estimate greenhouse gas emissions?

There are different methods for estimating GHG emissions, including the Continuous Emissions Monitoring System (CEMS) Method, the Spend-Based Method, and the Consumption-Based Method.

Direct Measurement – CEMS Method

An organization can estimate its carbon footprint from stationary combustion sources by performing a direct measurement of carbon emissions using the CEMS method. This method requires continuously measuring the pollutants emitted in exhaust gases from each emissions source using equipment such as gas analyzers, gas samplers, gas conditioning equipment (to remove particulate matter, water vapor and other contaminants), plumbing, actuated valves, Programmable Logic Controllers (PLCs) and other controlling software and hardware. Although this approach may yield useful results, CEMS requires specific sensing equipment for each greenhouse gas to be measured, requires supporting hardware and software, and is typically more suitable for Environment Health and Safety applications of centralized emission sources. More information on CEMS is available here.

Spend-Based Method

Because the financial accounting function is mature and often already audited, many organizations choose to use financial controls as a foundation for their carbon footprint accounting. The Economic Input-Output Life Cycle Assessment (EIO LCA) method is a spend-based method that combines expenditure data with monetary-based emission factors to estimate the emissions produced. The emission factors are published by the U.S. Environment Protection Agency (EPA) and other peer-reviewed academic and government sources. With this method, you can multiply the amount of money spent on a business activity by the emission factor to produce the estimated carbon footprint of the activity.

For example, you can convert the amount your company spends on truck transport to estimated kilograms (KG) of carbon dioxide equivalent (CO₂e) emitted as shown below.

Estimated Carbon Footprint = Amount of money spent on truck transport * Emission Factor [1]

Although these computations are very easy to make from general ledgers or other financial records, they are most valuable for initial estimates or for reporting minor sources of greenhouse gases. As the only user-provided input is the amount spent on an activity, EIO LCA methods aren’t useful for modeling improved efficiency. This is because the only way to reduce EIO-calculated emissions is to reduce spending. Therefore, as a company continues to improve its carbon footprint efficiency, other methods of estimating carbon footprint are often more desirable.

Consumption-Based Method

From either Enterprise Resource Planning (ERP) systems or electronic copies of fuel bills, it’s straightforward to determine the amount of fuel an organization procures during a reporting period. Fuel-based emission factors are available from a variety of sources such as the US Environmental Protection Agency and commercially-licensed databases. Multiplying the amount of fuel procured by the emission factor yields an estimate of the CO2e emitted through combustion. This method is often used for estimating the carbon footprint of stationary emissions (for instance backup generators for data centers or fossil fuel ovens for industrial processes).

If for a particular month an enterprise consumed a known amount of motor gasoline for stationary combustion, the Scope 1 CO2e footprint of the stationary gasoline combustion can be estimated in the following manner:

Estimated Carbon Footprint = Amount of Fuel Consumed * Stationary Combustion Emission Factor[2]
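As a quick illustration, the following Python snippet reproduces the arithmetic from footnote [2]; the values are the example figures from that footnote, not measured data.

# Consumption-based estimate: fuel consumed * emission factor.
gallons_consumed = 1000            # US gallons of motor gasoline (stationary combustion)
emission_factor_kg_per_gal = 8.81  # kg CO2e per gallon (US EPA stationary combustion factor)

estimated_co2e_kg = gallons_consumed * emission_factor_kg_per_gal
print(f"Estimated Scope 1 footprint: {estimated_co2e_kg:,.0f} kg CO2e")  # 8,810 kg CO2e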

Organizations may estimate their carbon emissions by using existing data found in fuel and electricity bills, ERP data, and relevant emissions factors, which are then consolidated into a data lake. Using existing analytics tools such as Amazon Athena and Amazon QuickSight, an organization can gain insight into its estimated carbon footprint.

The data architecture diagram below shows an example of how you could use AWS services to calculate and visualize an organization’s estimated carbon footprint.

Analytics Architecture

Customers have the flexibility to choose the services in each stage of the data pipeline based on their use case. For example, in the data ingestion phase, depending on the existing data requirements, there are many options to ingest data into the data lake such as using the AWS Command Line Interface (CLI), AWS DataSync, or AWS Database Migration Service.

Example of calculating a Scope 1 stationary emissions footprint with AWS services

Let’s assume you burned 100 standard cubic feet (scf) of natural gas in an oven. Using the US EPA emission factors for stationary emissions we can estimate the carbon footprint associated with the burning. In this case the emission factor is 0.05449555 Kg CO2e /scf.[3]

Amazon S3 is ideal for building a data lake on AWS to store disparate data sources in a single repository, due to its virtually unlimited scalability and high durability. Athena, a serverless interactive query service, allows the analysis of data directly from Amazon S3 using standard SQL without having to load the data into Athena or run complex extract, transform, and load (ETL) processes. Amazon QuickSight supports creating visualizations of different data sources, including Amazon S3 and Athena, and the flexibility to use custom SQL to extract a subset of the data. QuickSight dashboards can provide you with insights (such as your company’s estimated carbon footprint) quickly, and also provide the ability to generate standardized reports for your business and sustainability users.

In this example, the sample data is stored in a file system and uploaded to Amazon S3 using the AWS Command Line Interface (CLI) as shown in the following architecture diagram. AWS recommends creating AWS resources and managing CLI access in accordance with the Best Practices for Security, Identity, & Compliance guidance.

The AWS CLI command below demonstrates how to upload the sample data folders into the S3 target location.

aws s3 cp /path/to/local/folder s3://bucket-name/path/to/destination --recursive

The snapshot of the S3 console shows two newly added folders that contain the files.

S3 Bucket Overview of Files

To create new table schemas, we start by running the following script for the gas utilization table in the Athena query editor using Hive DDL. The script defines the data format, column details, table properties, and the location of the data in S3.

CREATE EXTERNAL TABLE `gasutilization`(
`fuel_id` int,
`month` string,
`year` int,
`usage_therms` float,
`usage_scf` float,
`g-nr1_schedule_charge` float,
`accountfee` float,
`gas_ppps` float,
`netcharge` float,
`taxpercentage` float,
`totalcharge` float)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://<bucketname>/Scope 1 Sample Data/gasutilization'
TBLPROPERTIES (
'classification'='csv',
'skip.header.line.count'='1')

Athena Hive DDL

The following script shows another example of using Hive DDL to generate the table schema for the gas emission factor data.

CREATE EXTERNAL TABLE `gas_emission_factor`(
`fuel_id` int,
`gas_name` string,
`emission_factor` float)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://<bucketname>/Scope 1 Sample Data/gas_emission_factor'
TBLPROPERTIES (
'classification'='csv',
'skip.header.line.count'='1')

After creating the table schema in Athena, we run the below query against the gas utilization table that includes details of gas bills to show the gas utilization and the associated charges, such as gas public purpose program surcharge (PPPS) and total charges after taxes for the year of 2020:

SELECT * FROM "gasutilization" where year = 2020;

Athena gas utilization overview by month

We are also able to analyze the emission factor data showing the different fuel types and their corresponding CO2e emission as shown in the screenshot.

athena co2e emission factor

With the emission factor and the gas utilization data, we can run the following query below to get an estimated Scope 1 carbon footprint alongside other details. In this query, we joined the gas utilization table and the gas emission factor table on fuel id and multiplied the gas usage in standard cubic foot (scf) by the emission factor to get the estimated CO2e impact. We also selected the month, year, total charge, and gas usage measured in therms and scf, as these are often attributes that are of interest for customers.

SELECT "gasutilization"."usage_scf" * "gas_emission_factor"."emission_factor" 
AS "estimated_CO2e_impact", 
"gasutilization"."month", 
"gasutilization"."year", 
"gasutilization"."totalcharge", 
"gasutilization"."usage_therms", 
"gasutilization"."usage_scf" 
FROM "gasutilization" 
JOIN "gas_emission_factor" 
on "gasutilization"."fuel_id"="gas_emission_factor"."fuel_id";

athena join

Lastly, Amazon QuickSight allows visualization of different data sources, including Amazon S3 and Athena, and the flexibility to use custom SQL to get a subset of the data. The following is an example of a QuickSight dashboard showing the gas utilization, gas charges, and estimated carbon footprint across different years.

QuickSight sample dashboard

We have just estimated the Scope 1 carbon footprint for one source of stationary combustion. If we were to do the same process for all sources of stationary and mobile emissions (with different emissions factors) and add the results together, we could roll up an accurate estimate of our Scope 1 carbon emissions for the entire business by only utilizing native AWS services and our own data. A similar process will yield an estimate of Scope 2 emissions, with grid carbon intensity in the place of Scope 1 emission factors.

Summary

This blog discusses how organizations can use existing data in disparate sources to build a data architecture to gain better visibility into Scope 1 greenhouse gas emissions. With Athena, S3, and QuickSight, organizations can now estimate their stationary emissions carbon footprint in a repeatable way by applying the consumption-based method to convert fuel utilization into an estimated carbon footprint.

Other approaches available on AWS include Carbon Accounting on AWS, Sustainability Insights Framework, Carbon Data Lake on AWS, and general guidance detailed at the AWS Carbon Accounting Page.

If you are interested in information on estimating your organization’s carbon footprint with AWS, please reach out to your AWS account team and check out AWS Sustainability Solutions.

References

  1. An example from page four of Amazon’s Carbon Methodology document illustrates this concept.
    Amount spent on truck transport: $100,000
    EPA Emission Factor: 1.556 KG CO2e /dollar of truck transport
    Estimated CO₂e emission: $100,000 * 1.556 KG CO₂e/dollar of truck transport = 155,600 KG of CO2e
  2. For example,
    Gasoline consumed: 1,000 US Gallons
    EPA Emission Factor: 8.81 Kg of CO2e /gallon of gasoline combusted
    Estimated CO2e emission = 1,000 US Gallons * 8.81 Kg of CO2e per gallon of gasoline consumed= 8,810 Kg of CO2e.
    EPA Emissions Factor for stationary emissions of motor gasoline is 8.78 kg CO2 plus .38 grams of CH4, plus .08 g of N2O.
    Combining these emission factors using 100-year global warming potential for each gas (CH4:25 and N2O:298) gives us Combined Emission Factor = 8.78 kg + 25*.00038 kg + 298 *.00008 kg = 8.81 kg of CO2e per gallon.
  3. The Emission factor per scf is 0.05444 kg of CO2 plus 0.00103 g of CH4 plus 0.0001 g of N2O. To get this in terms of CO2e we need to multiply the emission factor of the other two gases by their global warming potentials (GWP). The 100-year GWP for CH4  and N2O are 25 and 298 respectively. Emission factors and GWPs come from the US EPA website.


About the Authors


Thomas Burns, SCR, CISSP, is a Principal Sustainability Strategist and Principal Solutions Architect at Amazon Web Services. Thomas supports manufacturing and industrial customers world-wide. Thomas’s focus is using the cloud to help companies reduce their environmental impact both inside and outside of IT.

Aileen Zheng is a Solutions Architect supporting US Federal Civilian Sciences customers at Amazon Web Services (AWS). She partners with customers to provide technical guidance on enterprise cloud adoption and strategy and helps with building well-architected solutions. She is also very passionate about data analytics and machine learning. In her free time, you’ll find Aileen doing pilates, taking her dog Mumu out for a hike, or hunting down another good spot for food! You’ll also see her contributing to projects to support diversity and women in technology.

Amazon Kinesis Data Streams on-demand capacity mode now scales up to 1 GB/second ingest capacity

Post Syndicated from Nihar Sheth original https://aws.amazon.com/blogs/big-data/amazon-kinesis-data-streams-on-demand-capacity-mode-now-scales-up-to-1-gb-second-ingest-capacity/

Amazon Kinesis Data Streams is a serverless data streaming service that makes it easy to capture, process, and store streaming data at any scale. As customers collect and stream more types of data, they have asked for simpler, elastic data streams that can handle variable and unpredictable data traffic. In November 2021, Amazon Web Services launched the on-demand capacity mode for Kinesis Data Streams, which is capable of serving gigabytes of write and read throughput per minute and helps reduce the operational pain point of manually updating data stream capacity. You can create a new on-demand data stream or convert an existing data stream to on-demand mode with a single click and never have to provision and manage servers, storage, or throughput. By default, on-demand capacity mode can automatically scale up to 200 MB/s of write throughput.

We were encouraged by customers’ adoption of on-demand capacity mode, but as customers scaled their workloads, some ran into the 200 MB/s data ingestion limit and asked for a solution. The team worked backward from customer feedback to raise that limit. As of March 2023, Kinesis Data Streams supports an increased on-demand write throughput limit of 1 GB/s, a five-times increase from the previous limit of 200 MB/s. It’s like having a truly serverless and elastic data streaming service that works for all your use cases. If you require an increase in capacity, you can contact AWS Support to enable on-demand streams to scale up to 1 GB/s write throughput for each requested account. You pay for throughput consumed rather than for provisioned resources, making it easier to balance costs and performance. Overall, if your data volume can spike unpredictably or you don’t want to manage the number of shards, use on-demand streams.

In this post, we explore how to use Kinesis Data Streams on-demand scaling and best practices to build an efficient data-streaming solution. We discuss different scenarios to avoid write throughput exceptions and scale ingest capacity of Kinesis Data Streams to 1 GB/s in on-demand capacity mode.

Kinesis Data Streams on-demand scaling

A shard serves as a base throughput unit of Kinesis Data Streams. A shard supports 1 MB/s and 1,000 records/s for writes and 2 MB/s for reads. The shard limits ensure predictable performance, making it easy to design and operate a highly reliable data streaming workflow. In on-demand capacity mode, scaling happens at the individual shard level. When the average ingest shard utilization reaches 50% (0.5 MB/s or 500 records/s) in 1 minute, then a shard is split into two shards. If you use random values as a partition key, all shards of the stream will have even traffic, and they will be scaled at the same time. If you use a business-specific key as a partition key, the shards will have uneven traffic. In that scenario, only the shards exceeding an average of 50% utilization will be scaled. Depending upon the number of shards being scaled, it will take up to 15 minutes to split the shards.

When we create a new Kinesis data stream in on-demand capacity mode, by default, Kinesis Data Streams provisions four shards, which provides 4 MB/s write and 8 MB/s read throughput. As the workload ramps up, Kinesis Data Streams increases the number of shards in the stream by monitoring ingest throughput at the shard level. The 4 MB/s default ingest throughput and scaling at shard level in on-demand capacity mode works for most use cases. However, in some specific scenarios, producers may face WriteThroughputExceeded and Rate Exceeded errors, even in on-demand capacity mode. We discuss a few of these scenarios in the following sections and strategies to avoid these errors.

You can create and save record templates and easily send data to Kinesis Data Streams using the Amazon Kinesis Data Generator (KDG) to test the streaming data solution. Alternatively, you can also use the modern load testing framework Locust to run large-scale Kinesis Data Streams load testing. For this post, we use the Locust tool to produce and ingest messages in Kinesis Data Streams for our different use cases.
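If you want to reproduce the load without Locust, the following is a minimal boto3 producer sketch that writes batches of records with random partition keys; the stream name is a placeholder and the payload is arbitrary test data.

import json
import random
import string
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def build_records(count):
    # Small JSON payloads with random partition keys so that traffic
    # spreads evenly across all shards of the stream.
    return [
        {
            "Data": json.dumps({"id": i, "payload": "x" * 1024}).encode("utf-8"),
            "PartitionKey": "".join(random.choices(string.ascii_letters, k=16)),
        }
        for i in range(count)
    ]

for _ in range(60):  # send 60 batches; PutRecords accepts up to 500 records per call
    response = kinesis.put_records(
        StreamName="<your-stream-name>",
        Records=build_records(500),
    )
    if response["FailedRecordCount"]:
        # Throttled records are reported here and should be retried by a real producer.
        print("Throttled records:", response["FailedRecordCount"])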

Scenario 1: A baseline ingest throughput greater than 4 MB/s is needed

To simulate this scenario, run the following AWS Command Line Interface (AWS CLI) command to create the kds-od-default-shards data stream in on-demand capacity mode:

aws kinesis create-stream --stream-name kds-od-default-shards --stream-mode-details StreamMode=ON_DEMAND --region us-east-1

When the kds-od-default-shards data stream is active, run the following AWS CLI command to check the number of shards in the data stream:

aws kinesis describe-stream-summary --stream-name kds-od-default-shards --region us-east-1

You can observe that the OpenShardCount value is 4, which means the kds-od-default-shards data stream has an ingest capacity of 4 MB/s.

Next, we use the Locust tool to set the baseline ingest throughput to approximately 25 MB/s. As displayed in the following Amazon CloudWatch metrics graph, records are throttled for the first couple of minutes. Then the kds-od-default-shards data stream scales the number of shards to support 25 MB/s of ingest throughput, and records stop getting throttled. You can also rerun the describe-stream-summary AWS CLI command to check the increased number of shards in the data stream.

BDB-3047-scenario-1-incoming-data

BDB-3047-scenario-1-record-throttle

In a scenario where we know our ingest throughput baseline (25 MB/s) ahead of the time and we don’t want to observe any write throttles, we can create a stream in provisioned mode by specifying the number of shards (30), as shown in the following AWS CLI command (make sure to delete kds-od-default-shards manually from the Kinesis Data Streams console before running the following command):

aws kinesis create-stream --stream-name kds-od-default-shards --stream-mode-details StreamMode=PROVISIONED --shard-count 30 --region us-east-1

When the kds-od-default-shards data stream is active, run the following AWS CLI command to convert the data stream’s capacity mode to on-demand:

aws kinesis update-stream-mode --stream-arn arn:aws:kinesis:us-east-1:<AccountId>:stream/kds-od-default-shards --stream-mode-details StreamMode=ON_DEMAND --region us-east-1

Next, we send 25 MB/s records to the kds-od-default-shards data stream. As displayed in the following CloudWatch metrics graph, we can observe no write throttles, and the kds-od-default-shards data stream scales the number of shards to handle the increase in ingest volume.

BDB-3047-scenario-1-incoming-data1

BDB-3047-scenario-1-record-throttle1

After we send 25 MB/s traffic to the data stream for some time, we can run the following AWS CLI command to see that the OpenShardCount value has increased to more than 30:

aws kinesis describe-stream-summary --stream-name kds-od-default-shards --region us-east-1

Scenario 2: A significant ingestion spike is expected, which needs ingest throughput greater than the number of shards in the stream

To simulate the scenario, run the following AWS CLI command to create the kds-od-significant-spike data stream in on-demand capacity mode:

aws kinesis create-stream --stream-name kds-od-significant-spike --stream-mode-details StreamMode=ON_DEMAND --region us-east-1

As mentioned earlier, by default, the kds-od-significant-spike data stream will have four shards initially because this stream is created in on-demand mode. When the data stream is active, we send 4 MB/s ingest throughput initially and grow the ingest throughput by 30–50% every 5–10 minutes. As displayed in the following CloudWatch metrics graph, the kds-od-significant-spike data stream scales the number of shards to handle the increase in ingest volume.

After approximately 15 minutes, run the following AWS CLI command to find the OpenShardCount value (x) of the kds-od-significant-spike data stream. Then send (x * 2) MB/s ingest throughput to the data stream for 2–3 minutes and reduce the ingest throughput back to the prior level:

aws kinesis describe-stream-summary --stream-name kds-od-significant-spike --region us-east-1

As displayed in the following CloudWatch metrics graph, the records are getting throttled for a few minutes, and then the throttling goes away.

BDB-3047-scenario-2-incoming-data

BDB-3047-scenario-2-record-throttle

Typically, we face a significant spike scenario when running planned events, such as shopping holidays and product launches. To handle such scenarios, we can proactively change capacity mode from on-demand to provisioned. We can configure the number of shards and pick the ingest capacity we anticipate. After we successfully scale the number of shards to our desired peak capacity in provisioned capacity mode, we can change the capacity mode back to on-demand mode.

Scenario 3: A single partition key starts pushing more than 1 MB/s

Partition keys are used to segregate and route records to different shards of a stream. A partition key is specified by the data producer while adding data to the data stream. For example, let’s assume we have a stream with two shards (shard 1 and shard 2). We can configure the data producer to use two partition keys (key A and key B) so that all records with key A are added to shard 1 and all records with key B are added to shard 2. Choosing a partition key is a very important decision, and we should pick it carefully to ensure an even distribution of records across all the shards of the stream. Records tied to a single partition key A are always routed to a single shard (shard 1) and cannot be distributed across different shards. As mentioned earlier, by default, one shard supports 1 MB/s and 1,000 records/s for writes, so we may end up with an edge case where we are trying to push more than 1 MB/s for a specific partition key. In this scenario, producers will continue to experience throttles and keep retrying indefinitely.

To simulate the scenario, run the following AWS CLI command to create the kds-od-partition-key-throttle data stream in on-demand capacity mode:

aws kinesis create-stream --stream-name kds-od-partition-key-throttle --stream-mode-details StreamMode=ON_DEMAND --region us-east-1

As mentioned earlier, by default, the data stream will have four shards initially because this stream is created in on-demand mode. When the data stream is active, we send 1.5 MB/s ingest throughput continuously for the specific partition key A. As displayed in the following CloudWatch metrics graph, we can observe that throttling continues on a single shard even though we are sending only 1.5 MB/s of ingest throughput while the kds-od-partition-key-throttle data stream has an overall ingest capacity of 4 MB/s.

BDB-3047-scenario-3-incoming-data

BDB-3047-scenario-3-record-throttle

To avoid this scenario, we should carefully pick our partition key and ensure that this specific partition key won’t be continuously sending more than 1 MB/s ingest throughput in the data stream.
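If a single business key genuinely needs more than 1 MB/s, a common mitigation (at the cost of strict per-key ordering) is to salt the key with a small random suffix so that its records spread across multiple shards. The following sketch illustrates the idea; the stream name, key, and number of salt buckets are placeholders.

import random
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

HOT_KEY = "A"      # the business key that exceeds 1 MB/s on its own
SALT_BUCKETS = 8   # spread the hot key over up to 8 shards

def salted_key(key):
    # e.g. "A-3"; consumers must strip or group on the prefix to reassemble the key.
    return f"{key}-{random.randrange(SALT_BUCKETS)}"

kinesis.put_record(
    StreamName="<your-stream-name>",
    Data=b'{"example": "payload"}',
    PartitionKey=salted_key(HOT_KEY),
)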

Scale the ingest capacity of Kinesis Data Streams to 1 GB/s in on-demand capacity mode

To test, we start with approximately 100 MB/s of baseline ingest throughput to Kinesis Data Streams in on-demand capacity mode, and then we increase the ingest throughput rate by 30–50% every 5–10 minutes using the Locust load testing tool.

To set up the scenario, first create the kds-od-1gb-stream data stream in provisioned capacity mode and provide a value of 120 for the provisioned shards field:

aws kinesis create-stream --stream-name kds-od-1gb-stream --stream-mode-details StreamMode=PROVISIONED --shard-count 120 --region us-east-1

When the kds-od-1gb-stream data stream is active, switch its capacity mode to on-demand, as shown in the following code. When we change capacity mode from provisioned to on-demand, the shard count (120) remains the same for the data stream even in on-demand capacity mode.

aws kinesis update-stream-mode --stream-arn arn:aws:kinesis:us-east-1:<AccountId>:stream/kds-od-1gb-stream --stream-mode-details StreamMode=ON_DEMAND --region us-east-1

When the kds-od-1gb-stream data stream is in on-demand mode, start the experiment. We send approximately 100 MB/s baseline ingest throughput using the Locust tool and increase 30–50% ingest throughput every 5–10 minutes. As displayed in the following CloudWatch metrics graph, the kds-od-1gb-stream data stream seamlessly scaled to 1 GB/s in on-demand capacity mode. We can also observe that the producers didn’t encounter any write throttles while the data stream was scaling in on-demand capacity mode.

BDB-3047-scale-to-1-GB

Clean up

To avoid ongoing costs, delete all the data streams that you created as part of this post using the Kinesis Data Streams console.

Conclusion

This post demonstrated the on-demand scaling policy of Kinesis Data Streams with a few scenarios using best practices and showed how to scale ingest capacity to 1 GB/s in on-demand capacity mode. You can have an on-demand write throughput limit that is five times larger than the previous limit of 200 MB/s. Choose on-demand mode if you create new data streams with unknown workloads, have unpredictable application traffic, or prefer not to manage capacity. You can switch between on-demand and provisioned capacity modes two times per 24-hour rolling period. Please leave any feedback in the comments section.


About the Authors

Nihar Sheth is a Senior Product Manager on the Amazon Kinesis Data Streams team at Amazon Web Services. He is passionate about developing intuitive product experiences that solve complex customer problems and enable customers to achieve their business goals.

Pratik Patel is a Sr. Technical Account Manager and streaming analytics specialist. He works with AWS customers and provides ongoing support and technical guidance to help plan and build solutions using best practices and proactively keep customers’ AWS environments operationally healthy.

Nisha Dekhtawala is a Partner Solutions Architect and data analytics specialist. She works with global consulting partners as their trusted advisor, providing technical guidance and support in building Well-Architected innovative industry solutions.

Migrate your existing SQL-based ETL workload to an AWS serverless ETL infrastructure using AWS Glue

Post Syndicated from Mitesh Patel original https://aws.amazon.com/blogs/big-data/migrate-your-existing-sql-based-etl-workload-to-an-aws-serverless-etl-infrastructure-using-aws-glue/

Data has become an integral part of most companies, and the complexity of data processing is increasing rapidly with the exponential growth in the amount and variety of data. Data engineering teams are faced with the following challenges:

  • Manipulating data to make it consumable by business users
  • Building and improving extract, transform, and load (ETL) pipelines
  • Scaling their ETL infrastructure

Many customers migrating data to the cloud are looking for ways to modernize by using native AWS services to further scale and efficiently handle ETL tasks. In the early stages of their cloud journey, customers may need guidance on modernizing their ETL workload with minimal effort and time. Customers often use many SQL scripts to select and transform the data in relational databases hosted either in an on-premises environment or on AWS and use custom workflows to manage their ETL.

AWS Glue is a serverless data integration and ETL service with the ability to scale on demand. In this post, we show how you can migrate your existing SQL-based ETL workload to AWS Glue using Spark SQL, which minimizes the refactoring effort.

Solution overview

The following diagram describes the high-level architecture for our solution. This solution decouples the ETL and analytics workloads from our transactional data source Amazon Aurora, and uses Amazon Redshift as the data warehouse solution to build a data mart. In this solution, we employ AWS Database Migration Service (AWS DMS) for both full load and continuous replication of changes from Aurora. AWS DMS enables us to capture deltas, including deletes from the source database, through its change data capture (CDC) configuration, without writing code and without missing any changes, which is critical for the integrity of the data. Refer to CDC support in AWS DMS to extend the solution for ongoing CDC.

The workflow includes the following steps:

  1. AWS Database Migration Service (AWS DMS) connects to the Aurora data source.
  2. AWS DMS replicates data from Aurora and migrates to the target destination Amazon Simple Storage Service (Amazon S3) bucket.
  3. AWS Glue crawlers automatically infer schema information of the S3 data and integrate into the AWS Glue Data Catalog.
  4. AWS Glue jobs run ETL code to transform and load the data to Amazon Redshift.

For this post, we use the TPCH dataset for sample transactional data. The dataset consists of eight tables. The relationships between columns in these tables are illustrated in the following diagram.

We use Amazon Redshift as the data warehouse to implement the data mart solution. The data mart fact and dimension tables are created in the Amazon Redshift database. The following diagram illustrates the relationships between the fact (ORDER) and dimension tables (DATE, PARTS, and REGION).

Set up the environment

To get started, we set up the environment using AWS CloudFormation. Complete the following steps:

  1. Sign in to the AWS Management Console with your AWS Identity and Access Management (IAM) user name and password.
  2. Choose Launch Stack to open the page in a new tab:
  3. Choose Next.
  4. For Stack name, enter a name.
  5. In the Parameters section, enter the required parameters.
  6. Choose Next.

  1. On the Configure stack options page, leave all values as default and choose Next.
  2. On the Review stack page, select the check boxes to acknowledge the creation of IAM resources.
  3. Choose Submit.

Wait for the stack creation to complete. You can examine various events from the stack creation process on the Events tab. When the stack creation is complete, you will see the status CREATE_COMPLETE. The stack takes approximately 25–30 minutes to complete.

This template configures the following resources:

  • The Aurora MySQL instance sales-db.
  • The AWS DMS task dmsreplicationtask-* for full load of data and replicating changes from Aurora (source) to Amazon S3 (destination).
  • AWS Glue crawlers s3-crawler and redshift_crawler.
  • The AWS Glue database salesdb.
  • AWS Glue jobs insert_region_dim_tbl, insert_parts_dim_tbl, and insert_date_dim_tbl. We use these jobs for the use cases covered in this post. We create the insert_orders_fact_tbl AWS Glue job manually using AWS Glue Visual Studio.
  • The Redshift cluster blog_cluster with database sales and fact and dimension tables.
  • An S3 bucket to store the output of the AWS Glue job runs.
  • IAM roles and policies with appropriate permissions.

Replicate data from Aurora to Amazon S3

Now let’s look at the steps to replicate data from Aurora to Amazon S3 using AWS DMS:

  1. On the AWS DMS console, choose Database migration tasks in the navigation pane.
  2. Select the task dmsreplicationtask-* and on the Action menu, choose Restart/Resume.

This will start the replication task to replicate the data from Aurora to the S3 bucket. Wait for the task status to change to Full Load Complete. The data from the Aurora tables is now copied to the S3 bucket under a new folder, sales.
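
You can also poll the task status from the AWS CLI instead of the console; the task identifier below is a placeholder, because the CloudFormation stack appends a suffix to the dmsreplicationtask-* name:

aws dms describe-replication-tasks --filters Name=replication-task-id,Values=<your-dms-task-id> --query 'ReplicationTasks[].{Status:Status,FullLoadPercent:ReplicationTaskStats.FullLoadProgressPercent}'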

Create AWS Glue Data Catalog tables

Now let’s create AWS Glue Data Catalog tables for the S3 data and Amazon Redshift tables:

  1. On the AWS Glue console, under Data Catalog in the navigation pane, choose Connections.
  2. Select RedshiftConnection and on the Actions menu, choose Edit.
  3. Choose Save changes.
  4. Select the connection again and on the Actions menu, choose Test connection.
  5. For IAM role, choose GlueBlogRole.
  6. Choose Confirm.

Testing the connection can take approximately 1 minute. You will see the message “Successfully connected to the data store with connection blog-redshift-connection.” If you have trouble connecting successfully, refer to Troubleshooting connection issues in AWS Glue.

  1. Under Data Catalog in the navigation pane, choose Crawlers.
  2. Select s3_crawler and choose Run.

This will generate eight tables in the AWS Glue Data Catalog. To view the tables created, in the navigation pane, choose Databases under Data Catalog, then choose salesdb.
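
If you prefer the AWS CLI, a quick way to confirm the crawler output is to list the tables in the database (this assumes your default Region matches the one used by the stack):

aws glue get-tables --database-name salesdb --query 'TableList[].Name' --output text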

  1. Repeat the steps to run redshift_crawler and generate four additional tables.

If the crawler fails, refer to Error: Running crawler failed.

Create SQL-based AWS Glue jobs

Now let’s look at how the SQL statements are used to create ETL jobs using AWS Glue. AWS Glue runs your ETL jobs in an Apache Spark serverless environment. AWS Glue runs these jobs on virtual resources that it provisions and manages in its own service account. AWS Glue Studio is a graphical interface that makes it simple to create, run, and monitor ETL jobs in AWS Glue. You can use AWS Glue Studio to create jobs that extract structured or semi-structured data from a data source, perform a transformation of that data, and save the result set in a data target.

Let’s go through the steps of creating an AWS Glue job for loading the orders fact table using AWS Glue Studio.

  1. On the AWS Glue console, choose Jobs in the navigation pane.
  2. Choose Create job.
  3. Select Visual with a blank canvas, then choose Create.

  1. Navigate to the Job details tab.
  2. For Name, enter insert_orders_fact_tbl.
  3. For IAM Role, choose GlueBlogRole.
  4. For Job bookmark, choose Enable.
  5. Leave all other parameters as default and choose Save.

  1. Navigate to the Visual tab.
  2. Choose the plus sign.
  3. Under Add nodes, enter Glue in the search bar and choose AWS Glue Data Catalog (Source) to add the Data Catalog as the source.

  1. In the right pane, on the Data source properties – Data Catalog tab, choose salesdb for Database and customer for Table.

  1. On the Node properties tab, for Name, enter Customers.

  1. Repeat these steps for the Orders and LineItem tables.

This concludes creating data sources on the AWS Glue job canvas. Next, we add transformations by combining data from these different tables.

Transform the data

Complete the following steps to add data transformations:

  1. On the AWS Glue job canvas, choose the plus sign.
  2. Under Transforms, choose SQL Query.
  3. On the Transform tab, for Node parents, select all the three data sources.
  4. On the Transform tab, under SQL query, enter the following query:
SELECT orders.o_orderkey        AS ORDERKEY,
orders.o_orderdate       AS ORDERDATE,
lineitem.l_linenumber    AS LINENUMBER,
lineitem.l_partkey       AS PARTKEY,
lineitem.l_receiptdate   AS RECEIPTDATE,
lineitem.l_quantity      AS QUANTITY,
lineitem.l_extendedprice AS EXTENDEDPRICE,
orders.o_custkey         AS CUSTKEY,
customer.c_nationkey     AS NATIONKEY,
CURRENT_TIMESTAMP        AS UPDATEDATE
FROM   orders orders,
lineitem lineitem,
customer customer
WHERE  orders.o_orderkey = lineitem.l_orderkey
AND orders.o_custkey = customer.c_custkey
  1. Update the SQL alias values as shown in the following screenshot.

  1. On the Data preview tab, choose Start data preview session.
  2. When prompted, choose GlueBlogRole for IAM role and choose Confirm.

The data preview process will take a minute to complete.

  1. On the Output schema tab, choose Use data preview schema.

You will see the output schema similar to the following screenshot.

Now that we have previewed the data, we change a few data types.

  1. On the AWS Glue job canvas, choose the plus sign.
  2. Under Transforms, choose Change Schema.
  3. Select the node.
  4. On the Transform tab, update the Data type values as shown in the following screenshot.

Now let’s add the target node.

  1. Choose the Change Schema node and choose the plus sign.
  2. In the search bar, enter target.
  3. Choose Amazon Redshift as the target.

  1. Choose the Amazon Redshift node, and on the Data target properties – Amazon Redshift tab, for Redshift access type, select Direct data connection.
  2. Choose RedshiftConnection for Redshift Connection, public for Schema, and order_table for Table.
  3. Select Merge data into target table under Handling of data and target table.
  4. Choose orderkey for Matching keys.

  1. Choose Save.

AWS Glue Studio automatically generates the Spark code for you. You can view it on the Script tab. If you would like to apply transformations beyond what is available out of the box, you can modify the Spark code. The AWS Glue job uses Apache Spark SQL for the SQL query transformation. To find the available Spark SQL transformations, refer to the Spark SQL documentation.

  1. Choose Run to run the job.

As part of the CloudFormation stack, three other jobs are created to load the dimension tables.

  1. Navigate back to the Jobs page on the AWS Glue console, select the job insert_parts_dim_tbl, and choose Run.

This job uses the following SQL to populate the parts dimension table:

SELECT part.p_partkey,
part.p_type,
part.p_brand
FROM   part part
  1. Select the job insert_region_dim_tbl and choose Run.

This job uses the following SQL to populate the region dimension table:

SELECT nation.n_nationkey,
nation.n_name,
region.r_name
FROM   nation,
region
WHERE  nation.n_regionkey = region.r_regionkey
  1. Select the job insert_date_dim_tbl and choose Run.

This job uses the following SQL to populate the date dimension table:

SELECT DISTINCT( l_receiptdate )        AS DATEKEY,
Dayofweek(l_receiptdate) AS DAYOFWEEK,
Month(l_receiptdate)     AS MONTH,
Year(l_receiptdate)      AS YEAR,
Day(l_receiptdate)       AS DATE
FROM   lineitem lineitem

You can view the status of the running jobs by navigating to the Job run monitoring section on the Jobs page. Wait for all the jobs to complete. These jobs will load the data into the facts and dimension tables in Amazon Redshift.
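
You can also check the run states of a job from the AWS CLI, for example:

aws glue get-job-runs --job-name insert_orders_fact_tbl --query 'JobRuns[].JobRunState'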

To help optimize the resources and cost, you can use the AWS Glue Auto Scaling feature.

Verify the Amazon Redshift data load

To verify the data load, complete the following steps:

  1. On the Amazon Redshift console, select the cluster blog-cluster and on the Query Data menu, choose Query in query editor 2.
  2. For Authentication, select Temporary credentials.
  3. For Database, enter sales.
  4. For User name, enter admin.
  5. Choose Save.

  1. Run the following commands in the query editor to verify that the data is loaded into the Amazon Redshift tables:
SELECT *
FROM   sales.PUBLIC.order_table;

SELECT *
FROM   sales.PUBLIC.date_table;

SELECT *
FROM   sales.PUBLIC.parts_table;

SELECT *
FROM   sales.PUBLIC.region_table;

The following screenshot shows the results from one of the SELECT queries.

Now, to test CDC, update the quantity of a line item for order number 1 in the Aurora database using the following query. (To connect to your Aurora cluster, use AWS Cloud9 or any SQL client tool, such as the MySQL command-line client.)

UPDATE lineitem SET l_quantity = 100 WHERE l_orderkey = 1 AND l_linenumber = 4;

AWS DMS replicates the change to the S3 bucket, as shown in the following screenshot.

Re-running the AWS Glue job insert_orders_fact_tbl applies the change to the ORDER fact table, as shown in the following screenshot.
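
You can rerun the job from the AWS Glue console, or start it from the AWS CLI as a quick alternative:

aws glue start-job-run --job-name insert_orders_fact_tbl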

Clean up

To avoid incurring future charges, delete the resources created for the solution:

  1. On the Amazon S3 console, select the S3 bucket created as part of the CloudFormation stack, then choose Empty.
  2. On the AWS CloudFormation console, select the stack that you created initially and choose Delete to delete all the resources created by the stack.

Conclusion

In this post, we showed how you can migrate existing SQL-based ETL to an AWS serverless ETL infrastructure using AWS Glue jobs. We used AWS DMS to migrate data from Aurora to an S3 bucket, then SQL-based AWS Glue jobs to move the data to fact and dimension tables in Amazon Redshift.

This solution demonstrates a one-time data load from Aurora to Amazon Redshift using AWS Glue jobs. You can extend this solution for moving the data on a scheduled basis by orchestrating and scheduling jobs using AWS Glue workflows. To learn more about the capabilities of AWS Glue, refer to AWS Glue.


About the Authors

Mitesh Patel is a Principal Solutions Architect at AWS with a specialization in data analytics and machine learning. He is passionate about helping customers build scalable, secure, and cost-effective cloud-native solutions on AWS to drive business growth. He lives in the DC Metro area with his wife and two kids.

Sumitha AP is a Sr. Solutions Architect at AWS. She works with customers and helps them attain their business objectives by designing secure, scalable, reliable, and cost-effective solutions in the AWS Cloud. She has a focus on data and analytics and provides guidance on building analytics solutions on AWS.

Deepti Venuturumilli is a Sr. Solutions Architect at AWS. She works with commercial segment customers and AWS partners to accelerate customers’ business outcomes by providing expertise in AWS services and modernizing their workloads. She focuses on data analytics workloads and setting up a modern data strategy on AWS.

Deepthi Paruchuri is an AWS Solutions Architect based in NYC. She works closely with customers to build cloud adoption strategy and solve their business needs by designing secure, scalable, and cost-effective solutions in the AWS cloud.

Deploy serverless applications in a multicloud environment using Amazon CodeCatalyst

Post Syndicated from Deepak Kovvuri original https://aws.amazon.com/blogs/devops/deploy-serverless-applications-in-a-multicloud-environment-using-amazon-codecatalyst/

Amazon CodeCatalyst is an integrated service for software development teams adopting continuous integration and deployment practices into their software development process. CodeCatalyst puts the tools you need all in one place. You can plan work, collaborate on code, and build, test, and deploy applications by leveraging CodeCatalyst Workflows.

Introduction

In the first post of the blog series, we showed you how organizations can deploy workloads to instances and virtual machines (VMs) across hybrid and multicloud environments. The second post of the series covered deploying containerized applications in a multicloud environment. Finally, in this post, we explore how organizations can deploy modern, cloud-native, serverless applications across multiple cloud platforms. Figure 1 shows the solution that we walk through in the post.

Figure 1 – Architecture diagram

The post walks through how to develop, deploy, and test an HTTP RESTful API on Azure Functions using Amazon CodeCatalyst. The solution covers the following steps:

  • Set up a CodeCatalyst development environment and develop your application using the Serverless Framework.
  • Build a CodeCatalyst workflow to test and then deploy to Azure Functions using GitHub Actions in Amazon CodeCatalyst.

An Amazon CodeCatalyst workflow is an automated procedure that describes how to build, test, and deploy your code as part of a continuous integration and continuous delivery (CI/CD) system. You can use GitHub Actions alongside native CodeCatalyst actions in a CodeCatalyst workflow.

Prerequisites

Walkthrough

In this post, we will create a hello world RESTful API using the Serverless Framework. As we progress through the solution, we will focus on building a CodeCatalyst workflow that deploys and tests the functionality of the application. At the end of the post, the workflow will look similar to the one shown in Figure 2.

 CodeCatalyst CI/CD workflow

Figure 2 – CodeCatalyst CI/CD workflow

Environment Setup

Before we start developing the application, we need to set up a CodeCatalyst project and then link a code repository to the project. The code repository can be a CodeCatalyst repository or GitHub. In this scenario, we use a GitHub repository. By the time we finish developing the solution, the repository should look as shown below.

Files in solution's GitHub repository

Figure 3 – Files in GitHub repository

In Amazon CodeCatalyst, there’s an option to create Dev Environments, which can be used to work on the code stored in the source repositories of a project. In this post, we create a Dev Environment, associate it with the source repository created above, and work from it. Alternatively, you can choose not to use a Dev Environment and instead run the following commands locally and commit to the repository. The /projects directory of a Dev Environment stores the files that are pulled from the source repository. In the Dev Environment, install the Serverless Framework using this command:

npm install -g serverless

and then initialize a serverless project in the source repository folder. The resulting project structure looks like the following:

├── README.md
├── host.json
├── package.json
├── serverless.yml
└── src
    └── handlers
        ├── goodbye.js
        └── hello.js
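
For reference, a sketch of one way to produce this structure is the Serverless Framework’s azure-nodejs starter template, which generates the same files; the template name is an assumption, because the post doesn’t specify how the project was initialized:

serverless create --template azure-nodejs   # run inside the cloned repository folder; template name is an assumption
npm install                                 # install the dependencies declared in package.json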

We can push the code to the CodeCatalyst project using git. Now that we have the code in CodeCatalyst, we can turn our focus to building the workflow using the CodeCatalyst console.

CI/CD Setup in CodeCatalyst

Configure access to the Azure Environment

We’ll use the Serverless GitHub action to create and manage the Azure Function. For the action to access the Azure environment, it requires credentials associated with a Service Principal, passed to the action as environment variables.

Service Principals in Azure are identified by the CLIENT_ID, CLIENT_SECRET, SUBSCRIPTION_ID, and TENANT_ID properties. Storing these values in plaintext anywhere in your repository should be avoided, because anyone with access to the repository can see them. Similarly, these values shouldn’t be used directly in any workflow definitions, because they would be visible as files in your repository. With CodeCatalyst, we can protect these values by storing them as secrets within the project, and then reference the secrets in the CI/CD workflow.

We can create a secret by choosing Secrets (1) under CI/CD and then choosing Create Secret (2), as shown in Figure 4. Now we can enter the secret name and value for each of the identifiers described above.

Figure 4 – CodeCatalyst Secrets

Building the workflow

To create a new workflow, select CI/CD from the navigation on the left and then select Workflows (1). Then, select Create workflow (2), leave the default options, and select Create (3) as shown in Figure 5.

Create CodeCatalyst CI/CD workflow

Figure 5 – Create CI/CD workflow

If the workflow editor opens in YAML mode, select Visual to open the visual designer. Now, we can start adding actions to the workflow.

Configure the Deploy action

We’ll begin by adding a GitHub action for deploying to Azure. Select “+ Actions” to open the actions list and choose GitHub from the drop-down menu. Find the Build action and choose “+” to add a new GitHub action to the workflow.

Next, configure the GitHub action from the configurations tab by adding the following snippet to the GitHub Actions YAML property:

- name: Deploy to Azure Functions
  uses: serverless/github-action@v3.2 # action version tag is an assumption; pin to the release you use
  with:
    args: -c "serverless plugin install --name serverless-azure-functions && serverless deploy"
    entrypoint: /bin/sh
  env:
    AZURE_SUBSCRIPTION_ID: ${Secrets.SUBSCRIPTION_ID}
    AZURE_TENANT_ID: ${Secrets.TENANT_ID}
    AZURE_CLIENT_ID: ${Secrets.CLIENT_ID}
    AZURE_CLIENT_SECRET: ${Secrets.CLIENT_SECRET}

The above workflow configuration uses the Serverless GitHub Action, which wraps the Serverless Framework to run serverless commands. The action is configured to package and deploy the source code to Azure Functions using the serverless deploy command.

Note how we were able to pass the secrets to the GitHub action by referencing the secret identifiers in the above configuration.

Configure the Test action

Similar to the previous step, we add another GitHub action, which uses the Serverless Framework’s serverless invoke command to test the API deployed to Azure Functions.

- name: Test Function
  uses: serverless/github-action@v3.2 # action version tag is an assumption; pin to the release you use
  with:
    args: |
      -c "serverless plugin install --name serverless-azure-functions && \
          serverless invoke -f hello -d '{\"name\": \"CodeCatalyst\"}' && \
          serverless invoke -f goodbye -d '{\"name\": \"CodeCatalyst\"}'"
    entrypoint: /bin/sh
  env:
    AZURE_SUBSCRIPTION_ID: ${Secrets.SUBSCRIPTION_ID}
    AZURE_TENANT_ID: ${Secrets.TENANT_ID}
    AZURE_CLIENT_ID: ${Secrets.CLIENT_ID}
    AZURE_CLIENT_SECRET: ${Secrets.CLIENT_SECRET}

The workflow is now ready. Validate it by choosing Validate, then save it to the repository by choosing Commit. The workflow automatically kicks off after the commit, and the application is deployed to Azure Functions.

The functionality of the API can now be verified from the logs of the test action of the workflow as shown in Figure 6.

Test action in CodeCatalyst CI/CD workflow

Figure 6 – CI/CD workflow Test action
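
You can also exercise the deployed endpoints directly. A minimal check might look like the following, where the function app name is a placeholder (the serverless-azure-functions plugin generates a name prefixed with sls by default) and the query-string handling assumes the standard template handlers:

curl "https://<FUNCTION_APP_NAME>.azurewebsites.net/api/hello?name=CodeCatalyst"
curl "https://<FUNCTION_APP_NAME>.azurewebsites.net/api/goodbye?name=CodeCatalyst"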

Cleanup

If you have been following along with this workflow, delete the resources you deployed so you do not continue to incur charges. First, delete the Azure Function App (usually prefixed ‘sls’) using the Azure console. Second, if you no longer need the project, delete it from CodeCatalyst by navigating to Project settings and choosing Delete project; there’s no cost associated with keeping the CodeCatalyst project, so you can also continue using it.
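
If you prefer the command line for the first step, the Azure CLI equivalent is shown below; the app and resource group names are placeholders for your own values:

az functionapp delete --name <FUNCTION_APP_NAME> --resource-group <RESOURCE_GROUP>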

Conclusion

In summary, this post highlighted how Amazon CodeCatalyst can help organizations deploy cloud-native, serverless workloads into a multicloud environment. The post also walked through the solution in detail, covering the process of setting up Amazon CodeCatalyst to deploy a serverless application to Azure Functions by leveraging GitHub Actions. Though we showed an application deployment to Azure Functions, you can follow a similar process and leverage CodeCatalyst to deploy any type of application to almost any cloud platform. Learn more and get started with your Amazon CodeCatalyst journey!

We would love to hear your thoughts and experiences on deploying serverless applications to multiple cloud platforms. Reach out to us if you have any questions, or provide your feedback in the comments section.

About Authors

Picture of Deepak

Deepak Kovvuri

Deepak Kovvuri is a Senior Solutions Architect at AWS supporting enterprise customers in the US East area. He has over 6 years of experience in helping customers architect a DevOps strategy for their cloud workloads. Deepak specializes in CI/CD, systems administration, infrastructure as code, and container services. He holds a Master’s in Computer Engineering from the University of Illinois at Chicago.

Picture of Amandeep

Amandeep Bajwa

Amandeep Bajwa is a Senior Solutions Architect at AWS supporting Financial Services enterprises. He helps organizations achieve their business outcomes by identifying the appropriate cloud transformation strategy based on industry trends, and organizational priorities. Some of the areas Amandeep consults on are cloud migration, cloud strategy (including hybrid & multicloud), digital transformation, data & analytics, and technology in general.

Picture of Brian

Brian Beach

Brian Beach has over 20 years of experience as a Developer and Architect. He is currently a Principal Solutions Architect at Amazon Web Services. He holds a Computer Engineering degree from NYU Poly and an MBA from Rutgers Business School. He is the author of “Pro PowerShell for Amazon Web Services” from Apress. He is a regular author and has spoken at numerous events. Brian lives in North Carolina with his wife and three kids.

Picture of Pawan

Pawan Shrivastava

Pawan Shrivastava is a Partner Solution Architect at AWS in the WWPS team. He focusses on working with partners to provide technical guidance on AWS, collaborate with them to understand their technical requirements, and designing solutions to meet their specific needs. Pawan is passionate about DevOps, automation and CI CD pipelines. He enjoys watching MMA, playing cricket and working out in the Gym.

Deploy container applications in a multicloud environment using Amazon CodeCatalyst

Post Syndicated from Pawan Shrivastava original https://aws.amazon.com/blogs/devops/deploy-container-applications-in-a-multicloud-environment-using-amazon-codecatalyst/

In the previous post of this blog series, we saw how organizations can deploy workloads to virtual machines (VMs) in a hybrid and multicloud environment. This post shows how organizations can address the requirement of deploying containers, and containerized applications to hybrid and multicloud platforms using Amazon CodeCatalyst. CodeCatalyst is an integrated DevOps service which enables development teams to collaborate on code, and build, test, and deploy applications with continuous integration and continuous delivery (CI/CD) tools.

One prominent scenario where multicloud container deployment is useful is when organizations want to leverage AWS’ broadest and deepest set of Artificial Intelligence (AI) and Machine Learning (ML) capabilities by developing and training AI/ML models in AWS using Amazon SageMaker, and deploying the model package to a Kubernetes platform on other cloud platforms, such as Azure Kubernetes Service (AKS) for inference. As shown in this workshop for operationalizing the machine learning pipeline, we can train an AI/ML model, push it to Amazon Elastic Container Registry (ECR) as an image, and later deploy the model as a container application.

Scenario description

The solution described in the post covers the following steps:

  • Set up the Amazon CodeCatalyst environment.
  • Create a Dockerfile along with a manifest for the application, and a repository in Amazon ECR.
  • Create an Azure service principal that has permissions to deploy resources to Azure Kubernetes Service (AKS), and store the credentials securely in an Amazon CodeCatalyst secret.
  • Create a CodeCatalyst workflow to build, test, and deploy the containerized application to the AKS cluster using GitHub Actions.

The architecture diagram for the scenario is shown in Figure 1.

Solution architecture diagram

Figure 1 – Solution Architecture

Solution Walkthrough

This section shows how to set up the environment and deploy an HTML application to an AKS cluster.

Setup Amazon ECR and GitHub code repository

Create a new Amazon ECR repository and a code repository. In this case, we’re using GitHub as the repository, but you can create a source repository in CodeCatalyst, or you can choose to link an existing source repository hosted by another service if that service is supported by an installed extension. Then follow the application and Docker image creation steps outlined in Step 1 of the environment creation process in Exposing Multiple Applications on Amazon EKS. Create a file named manifest.yaml as shown, and map the image parameter to the URL of the Amazon ECR repository created above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: multicloud-container-deployment-app
  labels:
    app: multicloud-container-deployment-app
spec:
  selector:
    matchLabels:
      app: multicloud-container-deployment-app
  replicas: 2
  template:
    metadata:
      labels:
        app: multicloud-container-deployment-app
    spec:
      nodeSelector:
        "beta.kubernetes.io/os": linux
      containers:
      - name: ecs-web-page-container
        image: <aws_account_id>.dkr.ecr.us-west-2.amazonaws.com/<my_repository>
        imagePullPolicy: Always
        ports:
            - containerPort: 80
        resources:
          limits:
            memory: "100Mi"
            cpu: "200m"
      imagePullSecrets:
          - name: ecrsecret
---
apiVersion: v1
kind: Service
metadata:
  name: multicloud-container-deployment-service
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: multicloud-container-deployment-app

Push the files to the GitHub code repository. The multicloud-container-app GitHub repository should look similar to Figure 2.

Files in multicloud container app github repository 

Figure 2 – Files in Github repository

Configure Azure Kubernetes Service (AKS) cluster to pull private images from ECR repository

Enable your AKS cluster to pull Docker images from the private ECR repository by running the following command. This setup is required by the azure/k8s-deploy GitHub Action in the CI/CD workflow. The command authenticates Docker to the Amazon ECR registry by using aws ecr get-login-password. Run it in a shell where the AWS CLI is configured and that is connected to the AKS cluster. This creates a secret called ecrsecret, which is used to pull an image from the private ECR repository.

kubectl create secret docker-registry ecrsecret \
 --docker-server=<aws_account_id>.dkr.ecr.us-west-2.amazonaws.com/<my_repository> \
 --docker-username=AWS \
 --docker-password=$(aws ecr get-login-password --region us-west-2)

Provide the ECR URI in the “--docker-server=” parameter.
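
To confirm that the pull secret exists in the cluster before the workflow runs, you can describe it; this assumes the default namespace, which is where the workflow deploys the application:

kubectl describe secret ecrsecret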

CodeCatalyst setup

Follow these steps to set up CodeCatalyst environment:

Configure access to the AKS cluster

In this solution, we use three GitHub Actions (azure/login, azure/aks-set-context, and azure/k8s-deploy) to log in to Azure, set the AKS cluster context, and deploy the manifest file to the AKS cluster, respectively. For the GitHub Actions to access the Azure environment, they require credentials associated with an Azure Service Principal.

Service Principals in Azure are identified by the CLIENT_ID, CLIENT_SECRET, SUBSCRIPTION_ID, and TENANT_ID properties. Create the Service Principal by running the following command in the Azure Cloud Shell:

az ad sp create-for-rbac \
    --name "ghActionHTMLapplication" \
    --scope /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP> \
    --role Contributor \
    --sdk-auth

The command generates a JSON output (shown in Figure 3), which is stored in a CodeCatalyst secret called AZURE_CREDENTIALS. This credential is used by the azure/login GitHub Action.

JSON output stored in AZURE-CREDENTIALS secret

Figure 3 – JSON output

Configure secrets inside CodeCatalyst Project

Create three secrets, CLUSTER_NAME (the name of the AKS cluster), RESOURCE_GROUP (the name of the Azure resource group), and AZURE_CREDENTIALS (described in the previous step), as described in the working with secrets documentation. The secrets are shown in Figure 4.

Secrets in CodeCatalyst

Figure 4 – CodeCatalyst Secrets

CodeCatalyst CI/CD Workflow

To create a new CodeCatalyst workflow, select CI/CD from the navigation on the left and select Workflows (1). Then, select Create workflow (2), leave the default options, and select Create (3) as shown in Figure 5.

Create CodeCatalyst CI/CD workflow

Figure 5 – Create CodeCatalyst CI/CD workflow

Add “Push to Amazon ECR” Action

Add the Push to Amazon ECR action, and configure the environment where you created the ECR repository as shown in Figure 6. Refer to adding an action to learn how to add CodeCatalyst action.

Create ‘Push to ECR’ CodeCatalyst Action

Figure 6 – Create ‘Push to ECR’ Action

Select the Configuration tab and specify the configurations as shown in Figure 7.

Configure ‘Push to ECR’ CodeCatalyst Action

Figure 7 – Configure ‘Push to ECR’ Action

Configure the Deploy action

1. Add a GitHub action for deploying to AKS as shown in Figure 8.

Github action to deploy to AKS

Figure 8 – Github action to deploy to AKS

2. Configure the GitHub action from the configurations tab by adding the following snippet to the GitHub Actions YAML property:

- name: Install Azure CLI
  run: pip install azure-cli
- name: Azure login
  id: login
  uses: azure/login@v1 # version tag is an assumption; pin to the release you use
  with:
    creds: ${Secrets.AZURE_CREDENTIALS}
- name: Set AKS context
  id: set-context
  uses: azure/aks-set-context@v3
  with:
    resource-group: ${Secrets.RESOURCE_GROUP}
    cluster-name: ${Secrets.CLUSTER_NAME}
- name: Setup kubectl
  id: install-kubectl
  uses: azure/setup-kubectl@v3
- name: Deploy to AKS
  id: deploy-aks
  uses: Azure/k8s-deploy@v4
  with:
    namespace: default
    manifests: manifest.yaml
    pull-images: true

Github action configuration for deploying application to AKS

Figure 9 – Github action configuration

3. The workflow is now ready and can be validated by choosing ‘Validate’ and then saved to the repository by choosing ‘Commit’.
We have implemented an automated CI/CD workflow that builds the container image of the application (refer to Figure 10), pushes the image to Amazon ECR, and deploys the application to the AKS cluster. This CI/CD workflow is triggered when application code is pushed to the repository.

Automated CI/CD workflow

Figure 10 – Automated CI/CD workflow

Test the deployment

When the HTML application runs, Kubernetes exposes the application using a public-facing load balancer. To find the external IP of the load balancer, connect to the AKS cluster and run the following command:

kubectl get service multicloud-container-deployment-service

The output of the above command should look like the image in Figure 11.

Output of kubectl get service command

Figure 11 – Output of kubectl get service

Paste the External IP into a browser to see the running HTML application as shown in Figure 12.

HTML application running successfully in AKS

Figure 12 – Application running in AKS
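
As a quick alternative to the browser check, you can also request the page from the command line, substituting the EXTERNAL-IP value returned by the previous kubectl command:

curl http://<EXTERNAL-IP>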

Cleanup

If you have been following along with the workflow described in the post, delete the resources you deployed so you do not continue to incur charges. First, delete the Amazon ECR repository using the AWS console. Second, if you no longer need the project, delete it from CodeCatalyst by navigating to Project settings and choosing Delete project; there’s no cost associated with keeping the CodeCatalyst project, so you can also continue using it. Finally, if you deployed the application on a new AKS cluster, delete the cluster from the Azure console. If you deployed the application to an existing AKS cluster, run the following commands to delete the application resources.

kubectl delete deployment multicloud-container-deployment-app
kubectl delete services multicloud-container-deployment-service

Conclusion

In summary, this post showed how Amazon CodeCatalyst can help organizations deploy containerized workloads in a hybrid and multicloud environment. It demonstrated in detail how to set up and configure Amazon CodeCatalyst to deploy a containerized application to Azure Kubernetes Service by leveraging a CodeCatalyst workflow and GitHub Actions. Learn more and get started with your Amazon CodeCatalyst journey!

If you have any questions or feedback, leave them in the comments section.

About Authors

Picture of Pawan

Pawan Shrivastava

Pawan Shrivastava is a Partner Solution Architect at AWS in the WWPS team. He focusses on working with partners to provide technical guidance on AWS, collaborate with them to understand their technical requirements, and designing solutions to meet their specific needs. Pawan is passionate about DevOps, automation and CI CD pipelines. He enjoys watching MMA, playing cricket and working out in the gym.

Picture of Brent

Brent Van Wynsberge

Brent Van Wynsberge is a Solutions Architect at AWS supporting enterprise customers. He accelerates the cloud adoption journey for organizations by aligning technical objectives to business outcomes and strategic goals, and defining them where needed. Brent is an IoT enthusiast, specifically in the application of IoT in manufacturing; he is also interested in DevOps, data analytics, and containers.

Picture of Amandeep

Amandeep Bajwa

Amandeep Bajwa is a Senior Solutions Architect at AWS supporting Financial Services enterprises. He helps organizations achieve their business outcomes by identifying the appropriate cloud transformation strategy based on industry trends, and organizational priorities. Some of the areas Amandeep consults on are cloud migration, cloud strategy (including hybrid & multicloud), digital transformation, data & analytics, and technology in general.

Picture of Brian

Brian Beach

Brian Beach has over 20 years of experience as a Developer and Architect. He is currently a Principal Solutions Architect at Amazon Web Services. He holds a Computer Engineering degree from NYU Poly and an MBA from Rutgers Business School. He is the author of “Pro PowerShell for Amazon Web Services” from Apress. He is a regular author and has spoken at numerous events. Brian lives in North Carolina with his wife and three kids.

Extend your data mesh with Amazon Athena and federated views

Post Syndicated from Saurabh Bhutyani original https://aws.amazon.com/blogs/big-data/extend-your-data-mesh-with-amazon-athena-and-federated-views/

Amazon Athena is a serverless, interactive analytics service built on the Trino, PrestoDB, and Apache Spark open-source frameworks. You can use Athena to run SQL queries on petabytes of data stored on Amazon Simple Storage Service (Amazon S3) in widely used formats such as Parquet and open-table formats like Apache Iceberg, Apache Hudi, and Delta Lake. However, Athena also allows you to query data stored in 30 different data sources—in addition to Amazon S3—including relational, non-relational, and object stores running on premises or in other cloud environments.

In Athena, we refer to queries on non-Amazon S3 data sources as federated queries. These queries run on the underlying database, which means you can analyze the data without learning a new query language and without the need for separate extract, transform, and load (ETL) scripts to extract, duplicate, and prepare data for analysis.

Recently, Athena added support for creating and querying views on federated data sources to bring greater flexibility and ease of use to use cases such as interactive analysis and business intelligence reporting. Athena also updated its data connectors with optimizations that improve performance and reduce cost when querying federated data sources. The updated connectors use dynamic filtering and an expanded set of predicate pushdown optimizations to perform more operations in the underlying data source rather than in Athena. As a result, you get faster queries with less data scanned, especially on tables with millions to billions of rows of data.

In this post, we show how to create and query views on federated data sources in a data mesh architecture featuring data producers and consumers.

The term data mesh refers to a data architecture with decentralized data ownership. A data mesh equips domain-oriented teams with the data they need, emphasizes self-service, and promotes the notion of purpose-built data products. In a data mesh, data producers expose datasets to the organization and data consumers subscribe to and consume the data products created by producers. By distributing data ownership to cross-functional teams, a data mesh can foster a culture of collaboration, invention, and agility around data.

Let’s dive into the solution.

Solution overview

For this post, imagine a hypothetical ecommerce company that uses multiple data sources, each playing a different role:

  • In an S3 data lake, ecommerce records are stored in a table named Lineitems
  • Amazon ElastiCache for Redis stores Nations and ActiveOrders data, ensuring ultra-fast reads of operational data by downstream ecommerce systems
  • On Amazon Relational Database Service (Amazon RDS), MySQL is used to store data like email addresses and shipping addresses in the Orders, Customer, and Suppliers tables
  • For flexibility and low-latency reads and writes, an Amazon DynamoDB table holds Part and Partsupp data

We want to query these data sources in a data mesh design. In the following sections, we set up Athena data source connectors for MySQL, DynamoDB, and Redis, and then run queries that perform complex joins across these data sources. The following diagram depicts our data architecture.

Architecture diagram

As you proceed with this solution, note that you will create AWS resources in your account. We have provided you with an AWS CloudFormation template that defines and configures the required resources, including the sample MySQL database, S3 tables, Redis store, and DynamoDB table. The template also creates the AWS Glue database and tables, S3 bucket, Amazon S3 VPC endpoint, AWS Glue VPC endpoint, and other AWS Identity and Access Management (IAM) resources that are used in the solution.

The template is designed to demonstrate how to use federated views in Athena, and is not intended for production use without modification. Additionally, the template uses the us-east-1 Region and will not work in other Regions without modification. The template creates resources that incur costs while they are in use. Follow the cleanup steps at the end of this post to delete the resources and avoid unnecessary charges.

Prerequisites

Before you launch the CloudFormation stack, ensure you have the following prerequisites:

  • An AWS account that provides access to AWS services
  • An IAM user with an access key and secret key to configure the AWS Command Line Interface (AWS CLI), and permissions to create an IAM role, IAM policies, and stacks in AWS CloudFormation

Create resources with AWS CloudFormation

To get started, complete the following steps:

  1. Choose Launch Stack: Cloudformation Launch Stack
  2. Select I acknowledge that this template may create IAM resources.

The CloudFormation stack takes approximately 20–30 minutes to complete. You can monitor its progress on the AWS CloudFormation console. When the status reads CREATE_COMPLETE, your AWS account will have the resources necessary to implement this solution.
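
If you prefer to watch the stack from the AWS CLI instead of the console, the following commands work, assuming the stack uses the name athena-federated-view-blog that the cleanup section of this post refers to:

aws cloudformation wait stack-create-complete --stack-name athena-federated-view-blog
aws cloudformation describe-stacks --stack-name athena-federated-view-blog --query 'Stacks[0].StackStatus'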

Deploy connectors and connect to data sources

With our resources provisioned, we can begin to connect the dots in our data mesh. Let’s start by connecting the data sources created by the CloudFormation stack with Athena.

  1. On the Athena console, choose Data sources in the navigation pane.
  2. Choose Create data source.
  3. For Data sources, select MySQL, then choose Next.
  4. For Data source name, enter a name, such as mysql. The Athena connector for MySQL is an AWS Lambda function that was created for you by the CloudFormation template.
  5. For Connection details, choose Select or enter a Lambda function.
  6. Choose mysql, then choose Next.
  7. Review the information and choose Create data source.
  8. Return to the Data sources page and choose mysql.
  9. On the connector details page, choose the link under Lambda function to access the Lambda console and inspect the function associated with this connector.
    mysql Data Source details
  10. Return to the Athena query editor.
  11. For Data source, choose mysql.
  12. For Database, choose the sales database.
  13. For Tables, you should see a listing of MySQL tables that are ready for you to query.
  14. Repeat these steps to set up the connectors for DynamoDB and Redis.

After all four data sources are configured, we can see the data sources on the Data source drop-down menu. All other databases and tables, like the lineitem table, which is stored on Amazon S3, are defined in the AWS Glue Data Catalog and can be accessed by choosing AwsDataCatalog as the data source.

This image shows AwsDataCatalog is being selected as Data Source

Analyze data with Athena

With our data sources configured, we are ready to start running queries and using federated views in a data mesh architecture. Let’s start by trying to find out how much profit was made on a given line of parts, broken out by supplier nation and year.

For such a query, we need to calculate, for each nation and year, the profit for parts ordered in each year that were filled by a supplier in each nation. Profit is defined as the sum of [(l_extendedprice*(1-l_discount)) - (ps_supplycost * l_quantity)] for all line items describing parts in the specified line.

Answering this question requires querying all four data sources—MySQL, DynamoDB, Redis, and Amazon S3—and is accomplished with the following SQL:

SELECT 
    n_name nation,
	year(CAST(o_orderdate AS date)) as o_year,
	((l_extendedprice * (1 - l_discount)) - (CAST(ps_supplycost AS double) * l_quantity)) as amount
FROM
    awsdatacatalog.data_lake.lineitem,
    dynamo.default.part,
    dynamo.default.partsupp,
    mysql.sales.supplier,
    mysql.sales.orders,
    redis.redis.nation
WHERE 
    ((s_suppkey = l_suppkey)
    AND (ps_suppkey = l_suppkey)
	AND (ps_partkey = l_partkey)
	AND (p_partkey = l_partkey)
	AND (o_orderkey = l_orderkey)
	AND (s_nationkey = CAST(Regexp_extract(_key_, '.*-(.*)', 1) AS int)))

Running this query on the Athena console produces the following result.

Result of above query

This query is fairly complex: it involves multiple joins and requires special knowledge of the correct way to calculate profit metrics that other end-users may not possess.

To simplify the analysis experience for those users, we can hide this complexity behind a view. For more information on using views with federated data sources, see Querying federated views.

Use the following query to create the view in the data_lake database under the AwsDataCatalog data source:

CREATE OR REPLACE VIEW "data_lake"."federated_view" AS
SELECT 
    n_name nation,
	year(CAST(o_orderdate AS date)) as o_year,
	((l_extendedprice * (1 - l_discount)) - (CAST(ps_supplycost AS double) * l_quantity)) as amount
FROM
    awsdatacatalog.data_lake.lineitem,
    dynamo.default.part,
    dynamo.default.partsupp,
    mysql.sales.supplier,
    mysql.sales.orders,
    redis.redis.nation
WHERE 
    ((s_suppkey = l_suppkey)
    AND (ps_suppkey = l_suppkey)
	AND (ps_partkey = l_partkey)
	AND (p_partkey = l_partkey)
	AND (o_orderkey = l_orderkey)
	AND (s_nationkey = CAST(Regexp_extract(_key_, '.*-(.*)', 1) AS int)))

Next, run a simple select query to validate the view was created successfully: SELECT * FROM federated_view limit 10

The result should be similar to our previous query.

With our view in place, we can perform new analyses to answer questions that would be challenging without the view due to the complex query syntax that would be required. For example, we can find the total profit by nation:

SELECT nation, sum(amount) AS total
from federated_view
GROUP BY nation 
ORDER BY nation ASC

Your results should resemble the following screenshot.

Result of above query

As you now see, the federated view makes it simpler for end-users to run queries on this data. Users are free to query a view of the data, defined by a knowledgeable data producer, rather than having to first acquire expertise in each underlying data source. Because Athena federated queries are processed where the data is stored, with this approach, we avoid duplicating data from the source system, saving valuable time and cost.

Use federated views in a multi-user model

So far, we have satisfied one of the principles of a data mesh: we created a data product (federated view) that is decoupled from its originating source and is available for on-demand analysis by consumers.

Next, we take our data mesh a step further by using federated views in a multi-user model. To keep it simple, assume we have one producer account (the account we used to create our four data sources and federated view) and one consumer account. Using the producer account, we give the consumer account permission to query the federated view.

The following figure depicts this setup and our simplified data mesh architecture.

Multi-user model setup

Follow these steps to share the connectors and AWS Glue Data Catalog resources from the producer, which includes our federated view, with the consumer account:

  1. Share the data sources mysql, redis, dynamo, and data_lake with the consumer account. For instructions, refer to Sharing a data source in Account A with Account B. Note that Account A represents the producer and Account B represents the consumer. Make sure you use the same data source names from earlier when sharing data. This is necessary for the federated view to work in a cross-account model.
  2. Next, share the producer account’s AWS Glue Data Catalog with the consumer account by following the steps in Cross-account access to AWS Glue data catalogs. For the data source name, use shared_federated_catalog.
  3. Switch to the consumer account, navigate to the Athena console, and verify that you see federated_view listed under Views in the shared_federated_catalog Data Catalog and data_lake database.
  4. Next, run a sample query on the shared view to see the query results.

Result of sample query

Clean up

To clean up the resources created for this post, complete the following steps:

  1. On the Amazon S3 console, empty the bucket athena-federation-workshop-<account-id>.
  2. If you’re using the AWS CLI, delete the objects in the athena-federation-workshop-<account-id> bucket with the following code. Make sure you run this command on the correct bucket.
    aws s3 rm s3://athena-federation-workshop-<account-id> --recursive
  3. On the AWS CloudFormation console or the AWS CLI, delete the stack athena-federated-view-blog.
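
For the AWS CLI route mentioned in the last step, a minimal sketch is:

aws cloudformation delete-stack --stack-name athena-federated-view-blog
aws cloudformation wait stack-delete-complete --stack-name athena-federated-view-blog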

Summary

In this post, we demonstrated the functionality of Athena federated views. We created a view spanning four different federated data sources and ran queries against it. We also saw how federated views could be extended to a multi-user data mesh and ran queries from a consumer account.

To take advantage of federated views, ensure you are using Athena engine version 3 and upgrade your data source connectors to the latest version available. For information on how to upgrade a connector, see Updating a data source connector.


About the Authors

Saurabh Bhutyani is a Principal Big Data Specialist Solutions Architect at AWS. He is passionate about new technologies. He joined AWS in 2019 and works with customers to provide architectural guidance for running scalable analytics solutions and data mesh architectures using AWS analytics services like Amazon EMR, Amazon Athena, AWS Glue, AWS Lake Formation, and Amazon DataZone.

Pathik Shah is a Sr. Big Data Architect on Amazon Athena. He joined AWS in 2015 and has been focusing in the big data analytics space since then, helping customers build scalable and robust solutions using AWS analytics services.

Find the best Amazon Redshift configuration for your workload using Redshift Test Drive

Post Syndicated from Sathiish Kumar original https://aws.amazon.com/blogs/big-data/find-the-best-amazon-redshift-configuration-for-your-workload-using-redshift-test-drive/

Amazon Redshift is a widely used, fully managed, petabyte-scale cloud data warehouse. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads. With the launch of Amazon Redshift Serverless and the various deployment options Amazon Redshift provides (such as instance types and cluster sizes), customers are looking for tools that help them determine the most optimal data warehouse configuration to support their Redshift workload.

In this post, we answer that question by using Redshift Test Drive, an open-source tool that lets you evaluate which data warehouse configuration options are best suited for your workload. We created Redshift Test Drive from SimpleReplay and redshift-config-compare (see Compare different node types for your workload using Amazon Redshift for more details) to provide a single entry point for finding the best Amazon Redshift configuration for your workload. Redshift Test Drive also provides additional features such as a self-hosted analysis UI and the ability to replicate external objects that a Redshift workload may interact with.

Amazon Redshift RA3 with managed storage is the newest instance type for Provisioned clusters. It allows you to scale and pay for compute and storage independently, as well as use advanced features such as cross-cluster data sharing and cross-Availability Zone cluster relocation. Many customers using previous generation instance types want to upgrade their clusters to RA3 instance types. In this post, we show you how to use Redshift Test Drive to evaluate the performance of an RA3 cluster configuration for your Redshift workloads.

Solution overview

At its core, Redshift Test Drive replicates a workload by extracting queries from the source Redshift data warehouse logs (shown as Workload Extractor in the following figure) and replays the extracted workload against the target Redshift data warehouses (Workload Replayer).

If these workloads interact with external objects via Amazon Redshift Spectrum (such as the AWS Glue Data Catalog) or COPY commands, Redshift Test Drive offers an external object replicator utility to clone these objects to facilitate replay.

Workload replicator architecture

Redshift Test Drive uses this process of workload replication for two main functionalities: comparing configurations and comparing replays.

Compare Amazon Redshift configurations

Redshift Test Drive’s ConfigCompare utility (based on the redshift-config-compare tool) helps you find the best Redshift data warehouse configuration by using your workload to run performance and functional tests on different configurations in parallel. The utility’s automation starts by creating a new AWS CloudFormation stack based on this CloudFormation template. The CloudFormation stack creates an AWS Step Functions state machine, which internally uses AWS Lambda functions to trigger AWS Batch jobs that run the workload comparison across different Redshift instance types. These jobs extract the workload from the source Redshift data warehouse log location for the specified workload time window (as provided in the config parameters) and then replay the extracted workload against each of the target Redshift data warehouse configurations listed in the configuration file. When the replay is complete, the Step Functions state machine uploads the performance stats for the target configurations to an Amazon Simple Storage Service (Amazon S3) bucket and creates external schemas that can then be queried from any Redshift target to identify a target configuration that meets your performance requirements.

The following diagram illustrates the architecture of this utility.

Architecture of ConfigCompare utility

Compare replay performance

Redshift Test Drive also provides the ability to compare the replay runs visually using a self-hosted UI tool. This tool reads the stats generated by the workload replicator (stored in Amazon S3) and helps compare the replay runs across key performance indicators such as longest running queries, error distribution, queries with most deviation of latency across runs, and more.

The following diagram illustrates the architecture for the UI.

Replay Performance analysis UI architecture

Walkthrough overview

In this post, we provide a step-by-step walkthrough of using Redshift Test Drive to automatically replay your workload against different Amazon Redshift configurations with the ConfigCompare utility. Subsequently, we use the self-hosted analysis UI utility to analyze the output of ConfigCompare for determining the optimal target warehouse configuration to migrate or upgrade. The following diagram illustrates the workflow.

Walkthrough Steps

Prerequisites

The following prerequisites should be addressed before we run the ConfigCompare utility:

  • Enable audit logging and user-activity logging in your source cluster.
  • Take a snapshot of the source Redshift data warehouse.
  • Export your source parameter group and WLM configurations to Amazon S3. The parameter group can be exported using the AWS Command Line Interface (AWS CLI), for example, using CloudShell, by running the following code:
    aws redshift describe-cluster-parameters --parameter-group-name <YOUR-SOURCE-CLUSTER-PARAMETER-GROUP-NAME> --output json >> param_group_src.json
    
    aws s3 cp param_group_src.json s3://<YOUR-BUCKET-NAME>/param_group_src.json

  • The WLM configurations can be copied as JSON in the console, from where you can enter them into a file and upload it to Amazon S3. If you want to test any alternative WLM configurations (such as comparing manual vs. auto WLM or enabling concurrency scaling), you can create a separate file with that target configuration and upload it to Amazon S3 as well.
  • Identify the target configurations you want to test. If you’re upgrading from DC2 to RA3 node types, refer to Upgrading to RA3 node types for recommendations.
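If you want to script these prerequisites, the following AWS CLI sketch covers the first two items plus the user-activity logging parameter. The cluster identifier, bucket name, and parameter group name are placeholders for your own values, and it assumes a provisioned source cluster:

# Enable audit logging to Amazon S3 for the source cluster (identifiers are placeholders).
aws redshift enable-logging \
  --cluster-identifier <YOUR-SOURCE-CLUSTER-ID> \
  --bucket-name <YOUR-AUDIT-LOG-BUCKET> \
  --s3-key-prefix RSLogs/

# Turn on user-activity logging through the cluster's parameter group
# (the change takes effect after the cluster reboots).
aws redshift modify-cluster-parameter-group \
  --parameter-group-name <YOUR-SOURCE-CLUSTER-PARAMETER-GROUP-NAME> \
  --parameters ParameterName=enable_user_activity_logging,ParameterValue=true

# Take a snapshot of the source data warehouse; the snapshot identifier
# is later referenced as SNAPSHOT_ID in the utility's JSON configuration.
aws redshift create-cluster-snapshot \
  --cluster-identifier <YOUR-SOURCE-CLUSTER-ID> \
  --snapshot-identifier redshift-cluster-manual-snapshot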

For this walkthrough, let’s assume you have an existing Redshift data warehouse configuration with a two-node dc2.8xlarge provisioned cluster. You want to validate whether upgrading your current configuration to a decoupled architecture using the RA3 provisioned node type or Redshift Serverless would meet your workload price/performance requirements.

The following table summarizes the Redshift data warehouse configurations that are evaluated as part of this test.

Warehouse Type      | Number of Nodes / Base RPU | Option
dc2.8xlarge         | 2                          | default auto WLM
ra3.4xlarge         | 4                          | default auto WLM
Redshift Serverless | 64                         | auto scaling
Redshift Serverless | 128                        | auto scaling

Run the ConfigCompare utility

Before you run the utility, customize the details of the workload to replay, including the time period and the target warehouse configurations to test, in a JSON file. Upload this file to Amazon S3 and copy the S3 URI path to use as an input parameter for the CloudFormation template that deploys the resources for the remaining orchestration.

You can read more about the individual components and inputs of the JSON file in the Readme.

For our use case, we use the following JSON file as an input to the utility:

{
    "SNAPSHOT_ID": "redshift-cluster-manual-snapshot",
    "SNAPSHOT_ACCOUNT_ID": "123456789012",
    "PARAMETER_GROUP_CONFIG_S3_PATH": "s3://nodeconfig-artifacts/pg_config.json",
    "DDL_AND_COPY_SCRIPT_S3_PATH": "N/A",
    "SQL_SCRIPT_S3_PATH": "N/A",
    "NUMBER_OF_PARALLEL_SESSIONS_LIST": "N/A",
    "SIMPLE_REPLAY_LOG_LOCATION": "s3://redshift-logging-xxxxxxxx/RSLogs/",
    "SIMPLE_REPLAY_EXTRACT_START_TIME": "2023-01-28T15:45:00+00:00",
    "SIMPLE_REPLAY_EXTRACT_END_TIME": "2023-01-28T16:15:00+00:00",
    "SIMPLE_REPLAY_EXTRACT_OVERWRITE_S3_PATH": "N/A",
    "SIMPLE_REPLAY_OVERWRITE_S3_PATH": "N/A",
    "SIMPLE_REPLAY_UNLOAD_STATEMENTS": "true",
    "AUTO_PAUSE": true,
    "DATABASE_NAME": "database_name",
    "CONFIGURATIONS": [
        {
            "TYPE": "Provisioned",
            "NODE_TYPE": "dc2.8xlarge",
            "NUMBER_OF_NODES": "6",
            "WLM_CONFIG_S3_PATH": "s3://nodeconfig-artifacts/wlm.json"
        },
        {
            "TYPE": "Provisioned",
            "NODE_TYPE": "ra3.4xlarge",
            "NUMBER_OF_NODES": "12",
            "WLM_CONFIG_S3_PATH": "s3://nodeconfig-artifacts/wlm.json"
        },
        {
            "TYPE": "Serverless",
            "BASE_RPU": "128"
        },
        {
            "TYPE": "Serverless",
            "BASE_RPU": "64"
        }
    ]
}

The utility deploys all the data warehouse configurations included in the CONFIGURATIONS section of the JSON file. A replica of the source configuration is also included to provide a baseline of the existing workload performance.
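If you are working from the command line, you can upload the completed configuration file and note its S3 URI with the AWS CLI; the bucket and file names below are placeholders:

# Upload the completed configuration file; its S3 URI is the value you pass
# to the CloudFormation template in the next step.
aws s3 cp user_config.json s3://<YOUR-CONFIG-BUCKET>/user_config.json
# The resulting URI is s3://<YOUR-CONFIG-BUCKET>/user_config.json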

After this file is fully configured and uploaded to Amazon S3, navigate to the AWS CloudFormation console and create a new stack based on this CloudFormation template, specifying the relevant parameters. For more details on the individual parameters, refer to the GitHub repo. The following screenshot shows the parameters used for this walkthrough.

Configuration parameters for Cloudformation Template

After the parameters are set, proceed through the remaining steps on the AWS CloudFormation console to launch the stack.
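The stack can also be launched from the CLI. This is only a sketch: the template file name and the parameter key ConfigJsonS3Path shown here are hypothetical placeholders, so use the template and parameter names defined in the GitHub repo:

# Launch the ConfigCompare stack from the CLI (template file and parameter key are placeholders).
aws cloudformation create-stack \
  --stack-name redshift-config-compare \
  --template-body file://node-config-compare.yaml \
  --parameters ParameterKey=ConfigJsonS3Path,ParameterValue=s3://<YOUR-CONFIG-BUCKET>/user_config.json \
  --capabilities CAPABILITY_NAMED_IAM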

When the stack is fully created, select the stack and open the Resources tab. Here, you can search for the term StepFunctions and choose the hyperlink next to the RedshiftConfigTestingStepFunction physical ID to open the Step Functions state machine to run the utility.

Searching for ConfigTestingStepFunction

On the Step Functions page that opens, choose Start execution. Leave the default values and choose Start execution to trigger the run. Monitor the progress of the state machine’s run on the graph view of the page. The full run will take approximately the same time as the time window that was specified in the JSON configuration file.
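The same run can be started and monitored from the CLI; the state machine ARN is the one you located on the stack’s Resources tab, and the execution ARN is returned by the start call:

# Start the state machine with its default input.
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:<aws-region>:<aws-account-id>:stateMachine:<RedshiftConfigTestingStepFunction-name>

# Poll the status until it changes from RUNNING to SUCCEEDED.
aws stepfunctions describe-execution \
  --execution-arn <execution-arn-returned-by-start-execution> \
  --query 'status'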

StepFunction Execution example

When the status of the run changes from Running to Succeeded, the run is complete.

Analyze the results

When the Step Functions state machine run is complete, the performance metrics are uploaded to the S3 bucket that the CloudFormation template created earlier. To analyze the performance of the workload across different configurations, you can use the self-hosted UI tool that comes with Redshift Test Drive. Set up this tool for your workload by following the instructions provided in the Readme.

After you point the UI to the S3 location that has the stats from the ConfigCompare run, the Replays section populates with the analysis for the replays found in that location. Select the target configurations you want to compare and choose Analysis to navigate to the comparisons page.

AnalysisUI example list of replays

You can use the Filter Results section to choose which query types, users, and time frame to compare, and the Analysis section expands to show an analysis of all the selected replays. In this example, you can see a comparison of the SELECT queries run by the ad hoc user in each replay.

AnalysisUI filter results

The following screenshot shows an example of the analysis of a replay. These results show the distribution of queries completed over the full run for a given user and query type, allowing us to identify periods of high and low activity. We can also see runtimes of these queries, aggregated as percentiles, average, and standard deviation. For example, the P50 value indicates that 50% of queries ran within 26.564 seconds. The parameters used to filter for specific users, query types, and runtimes can be dynamically updated to allow the results and comparisons to be comprehensively investigated according to the specific performance requirements each individual use case demands.

AnalysisUI compare throughput example

Troubleshooting

As shown in the solution architecture, the main moving parts in the ConfigCompare automation are AWS CloudFormation, Step Functions (internally using Lambda), and AWS Batch.

If any resource in the CloudFormation stack fails to deploy, we recommend troubleshooting the issue based on the error shown on the AWS CloudFormation console.

To troubleshoot errors with the Step Functions state machine, locate the Amazon CloudWatch logs for a step by navigating to the state machine’s latest run on the Step Functions console and choosing CloudWatch Logs for the failed Step Functions step. After resolving the error, you can restart the state machine by choosing New execution.
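If you prefer the CLI, you can list the log groups created by the stack and tail the one for the failing component. The log group name prefix below is a placeholder, because the exact names depend on the stack name you chose:

# Find log groups created for the stack's Lambda functions (prefix is a placeholder).
aws logs describe-log-groups \
  --log-group-name-prefix /aws/lambda/<your-stack-name> \
  --query 'logGroups[].logGroupName'

# Tail a specific log group while re-running the failed step.
aws logs tail <log-group-name> --follow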

Troubleshooting Step Function

For AWS Batch errors, locate the AWS Batch logs by navigating to the AWS CloudFormation console and choosing the Resources tab in the CloudFormation stack. On this tab, search for LogGroup to find the AWS Batch run logs.

Troubleshooting Cloudwatch logs

For more information about common errors and their solutions, refer to the Test Drive Readme.

Clean up

When you have completed the evaluation, we recommend manually deleting the deployed Redshift warehouses to avoid any on-demand charges that could accrue. After this, you can delete the CloudFormation stack to clean up other resources.
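The cleanup can also be done with the AWS CLI; the identifiers below are placeholders for the target warehouses the utility created for your run:

# Delete a provisioned target cluster created for the test.
aws redshift delete-cluster \
  --cluster-identifier <target-cluster-id> \
  --skip-final-cluster-snapshot

# Delete a serverless target workgroup and its namespace.
aws redshift-serverless delete-workgroup --workgroup-name <target-workgroup-name>
aws redshift-serverless delete-namespace --namespace-name <target-namespace-name>

# Finally, delete the CloudFormation stack.
aws cloudformation delete-stack --stack-name <your-stack-name>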

Limitations

Some of the limitations for the WorkloadReplicator (the core utility supporting the ConfigCompare tool) are outlined in the Readme.

Conclusion

In this post, we demonstrated the process of finding the right Redshift data warehouse configuration using Redshift Test Drive. The utility offers an easy-to-use tool to replicate the workload of your choice against customizable data warehouse configurations. It also provides a self-hosted analysis UI to help you dive deeper into the stats generated during the replication process.

Get started with Test Drive today by following the instructions provided in the Readme. For an in-depth overview of the config compare automation, refer to Compare different node types for your workload using Amazon Redshift. If you’re migrating from DC2 or DS2 node types to RA3, refer to our recommendations on node count and type as a benchmark.


About the Authors

Sathiish Kumar is a Software Development Manager at Amazon Redshift and has worked on building end-to-end applications using different database and technology solutions over the last 10 years. He is passionate about helping his customers find the quickest and the most optimized solution to their problems by leveraging open-source technologies.

Julia Beck is an Analytics Specialist Solutions Architect at AWS. She supports customers in validating analytics solutions by architecting proof of concept workloads designed to meet their specific needs.

Ranjan Burman is an Analytics Specialist Solutions Architect at AWS. He specializes in Amazon Redshift and helps customers build scalable analytical solutions. He has more than 16 years of experience in different database and data warehousing technologies. He is passionate about automating and solving customer problems with cloud solutions.

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

Post Syndicated from Praveen Kumar original https://aws.amazon.com/blogs/big-data/implement-tag-based-access-control-for-your-data-lake-and-amazon-redshift-data-sharing-with-aws-lake-formation/

Data-driven organizations treat data as an asset and use it across different lines of business (LOBs) to drive timely insights and better business decisions. Many organizations have distributed tools and infrastructure across various business units. This leads to data residing in many data warehouse and data lake instances, as part of a modern data architecture, in separate AWS accounts.

Amazon Redshift data sharing allows you to securely share live, transactionally consistent data in one Amazon Redshift data warehouse with another Redshift data warehouse within the same AWS account, across accounts, and across Regions, without needing to copy or move data from one cluster to another. Customers want to be able to manage their permissions in a central place across all of their assets. Previously, Redshift datashares could only be managed within Amazon Redshift, which made it difficult to manage your data lake permissions and Amazon Redshift permissions in a single place. For example, you had to navigate to an individual account to view and manage access information for Amazon Redshift and the data lake on Amazon Simple Storage Service (Amazon S3). As an organization grows, administrators want a mechanism to effectively and centrally manage data sharing across data lakes and data warehouses for governance and auditing, and to enforce fine-grained access control.

We recently announced the integration of Amazon Redshift data sharing with AWS Lake Formation. With this feature, Amazon Redshift customers can now manage sharing, apply access policies centrally, and effectively scale permissions by using LF-Tags.

Lake Formation has been a popular choice for centrally governing data lakes backed by Amazon S3. Now, with Lake Formation support for Amazon Redshift data sharing, it opens up new design patterns and broadens governance and security posture across data warehouses. With this integration, you can use Lake Formation to define fine-grained access control on tables and views being shared with Amazon Redshift data sharing for federated AWS Identity and Access Management (IAM) users and IAM roles. Lake Formation also provides tag-based access control (TBAC), which can be used to simplify and scale governance of data catalog objects such as databases and tables.

In this post, we discuss this new feature and how to implement TBAC for your data lake and Amazon Redshift data sharing on Lake Formation.

Solution overview

Lake Formation tag-based access control (LF-TBAC) allows you to group similar AWS Glue Data Catalog resources together and define grant or revoke permission policies by using an LF-Tag expression. LF-Tags are hierarchical: when a database is tagged with an LF-Tag, all tables in that database inherit the tag, and when an LF-Tag is applied to a table, all the columns within that table inherit the tag. Inherited tags can then be overridden if needed. You can then create access policies within Lake Formation that grant principals access to tagged resources by using an LF-Tag expression. See Managing LF-Tags for metadata access control for more details.
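As a reference, LF-Tags like the ones used in this post can be created with the AWS CLI as shown below. The tag keys and values match the tagging strategy described later; in the walkthrough that follows, the CloudFormation stack takes care of the Lake Formation configuration, so treat this as an illustrative sketch rather than a required step:

# Create the two LF-Tags used by the tagging strategy in this post.
aws lakeformation create-lf-tag \
  --tag-key department \
  --tag-values customer marketing

aws lakeformation create-lf-tag \
  --tag-key classification \
  --tag-values private pii-sensitive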

To demonstrate LF-TBAC with central data access governance capability, we use the scenario where two separate business units own particular datasets and need to share data across teams.

A customer care team manages and owns the customer information database, including customer demographics data. A marketing team owns a customer leads dataset, which includes information on prospective customers and contact leads.

To be able to run effective campaigns, the marketing team needs access to the customer data. In this post, we demonstrate the process of sharing this data that is stored in the data warehouse and giving the marketing team access. Furthermore, there are personally identifiable information (PII) columns within the customer dataset that should only be accessed by a subset of power users on a need-to-know basis. This way, data analysts within marketing can only see non-PII columns to be able to run anonymous customer segment analysis, but a group of power users can access PII columns (for example, customer email address) to be able to run campaigns or surveys for specific groups of customers.

The following diagram shows the structure of the datasets that we work with in this post and a tagging strategy to provide fine-grained column-level access.

Beyond our tagging strategy on the data resources, the following table gives an overview of how we should grant permissions to our two personas via tags.

IAM Role            | Persona                                 | Resource Type  | Permission | LF-Tag Expression
marketing-analyst   | A data analyst in the marketing team    | DB             | describe   | (department:marketing OR department:customer) AND classification:private
                    |                                         | Table          | select     | (department:marketing OR department:customer) AND classification:private
marketing-poweruser | A privileged user in the marketing team | DB             | describe   | (department:marketing OR department:customer) AND classification:private
                    |                                         | Table (Column) | select     | (department:marketing OR department:customer) AND (classification:private OR classification:pii-sensitive)

The following diagram gives a high-level overview of the setup that we deploy in this post.

The following is a high-level overview of how to use Lake Formation to control datashare permissions:

Producer Setup:

  1. In the producer’s AWS account, the Amazon Redshift administrator that owns the customer database creates a Redshift datashare on the producer cluster and grants usage to the AWS Glue Data Catalog in the same account.
  2. The producer cluster administrator authorizes the Lake Formation account to access the datashare.
  3. In Lake Formation, the Lake Formation administrator discovers and registers the datashares. They must discover the AWS Glue ARNs they have access to and associate the datashares with an AWS Glue Data Catalog ARN. If you’re using the AWS Command Line Interface (AWS CLI), you can discover and accept datashares with the Redshift CLI operations describe-data-shares and associate-data-share-consumer (see the discovery sketch after this list). To register a datashare, use the Lake Formation CLI operation register-resource.
  4. The Lake Formation administrator creates a federated database in the AWS Glue Data Catalog; assigns tags to the databases, tables, and columns; and configures Lake Formation permissions to control user access to objects within the datashare. For more information about federated databases in AWS Glue, see Managing permissions for data in an Amazon Redshift datashare.
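As referenced in step 3, the datashare ARNs visible to the account can be listed with a minimal AWS CLI sketch; the JMESPath query simply trims the output to the fields used in later steps:

# List datashares visible to this account, showing the datashare ARN and producer namespace ARN.
aws redshift describe-data-shares \
  --query 'DataShares[].{DataShareArn:DataShareArn,ProducerArn:ProducerArn}'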

Consumer Setup:

  1. On the consumer side (marketing), the Amazon Redshift administrator discovers the AWS Glue database ARNs they have access to, creates an external database in the Redshift consumer cluster using an AWS Glue database ARN, and grants usage to database users authenticated with IAM credentials to start querying the Redshift database.
  2. Database users can use the views SVV_EXTERNAL_TABLES and SVV_EXTERNAL_COLUMNS to find all the tables or columns within the AWS Glue database that they have access to; then they can query the AWS Glue database’s tables.

When the producer cluster administrator decides to no longer share the data with the consumer cluster, the producer cluster administrator can revoke usage, deauthorize, or delete the datashare from Amazon Redshift. The associated permissions and objects in Lake Formation are not automatically deleted.
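A hedged sketch of that teardown from the producer side might look like the following; the ARNs are placeholders, and the Lake Formation permissions and federated database still need to be removed separately:

# Stop sharing the datashare with the Data Catalog consumer.
aws redshift deauthorize-data-share \
  --data-share-arn 'arn:aws:redshift:<aws-region>:<aws-account-id>:datashare:<producer-cluster-namespace>/customer_ds' \
  --consumer-identifier DataCatalog/<aws-account-id>

# Remove the datashare registration from Lake Formation.
aws lakeformation deregister-resource \
  --resource-arn arn:aws:redshift:<aws-region>:<aws-account-id>:datashare:<producer-cluster-namespace>/customer_ds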

Prerequisites:

To follow the steps in this post, you must satisfy the following prerequisites:

Deploy environment including producer and consumer Redshift clusters

To follow along with the steps outlined in this post, deploy the following AWS CloudFormation stack, which includes the necessary resources to demonstrate the subject of this post:

  1. Choose Launch stack to deploy a CloudFormation template.
  2. Provide an IAM role that you have already configured as a Lake Formation administrator.
  3. Complete the steps to deploy the template and leave all settings as default.
  4. Select I acknowledge that AWS CloudFormation might create IAM resources, then choose Submit.

This CloudFormation stack creates the following resources:

  • Producer Redshift cluster – Owned by the customer care team and has customer and demographic data on it.
  • Consumer Redshift cluster – Owned by the marketing team and is used to analyze data across data warehouses and data lakes.
  • S3 data lake – Contains the web activity and leads datasets.
  • Other necessary resources to demonstrate the process of sharing data – For example, IAM roles, Lake Formation configuration, and more. For a full list of resources created by the stack, examine the CloudFormation template.

After you deploy this CloudFormation template, resources created will incur cost to your AWS account. At the end of the process, make sure that you clean up resources to avoid unnecessary charges.

After the CloudFormation stack is deployed successfully (status shows as CREATE_COMPLETE), take note of the following items on the Outputs tab (you can also retrieve them with the CLI sketch after this list):

  • Marketing analyst role ARN
  • Marketing power user role ARN
  • URL for Amazon Redshift admin password stored in AWS Secrets Manager
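A quick way to retrieve these values from the command line, assuming <your-stack-name> is the name you gave the stack:

# List the stack outputs (role ARNs and the Secrets Manager URL) in a readable table.
aws cloudformation describe-stacks \
  --stack-name <your-stack-name> \
  --query 'Stacks[0].Outputs[].{Key:OutputKey,Value:OutputValue}' \
  --output table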

Create a Redshift datashare and add relevant tables

On the AWS Management Console, switch to the role that you nominated as Lake Formation admin when deploying the CloudFormation template. Then go to Query Editor v2. If this is the first time using Query Editor V2 in your account, follow these steps to configure your AWS account.

The first step in Query Editor is to log in to the customer Redshift cluster using the database admin credentials to make your IAM admin role a DB admin on the database.

  1. Choose the options menu (three dots) next to the lfunified-customer-dwh cluster and choose Create connection.

  2. Select Database user name and password.
  3. Leave Database as dev.
  4. For User name, enter admin.
  5. For Password, complete the following steps:
    1. Go to the console URL, which is the value of the RedShiftClusterPassword CloudFormation output from the previous step. The URL opens the Secrets Manager console for this password.
    2. Scroll down to the Secret value section and choose Retrieve secret value.
    3. Take note of the password to use later when connecting to the marketing Redshift cluster.
    4. Enter this value for Password.
  6. Choose Create connection.

Create a datashare using a SQL command

Complete the following steps to create a datashare in the data producer cluster (customer care) and share it with Lake Formation:

  1. On the Amazon Redshift console, in the navigation pane, choose Editor, then Query editor V2.
  2. Choose (right-click) the cluster name and choose Edit connection or Create connection.
  3. For Authentication, select Temporary credentials using your IAM identity.

Refer to Connecting to an Amazon Redshift database to learn more about the various authentication methods.

  4. For Database, enter a database name (for this post, dev).
  5. Choose Create connection to connect to the database.
  6. Run the following SQL commands to create the datashare and add the data objects to be shared:
    create datashare customer_ds;
    ALTER DATASHARE customer_ds ADD SCHEMA PUBLIC;
    ALTER DATASHARE customer_ds ADD TABLE customer;

  7. Run the following SQL command to share the customer datashare to the current account via the AWS Glue Data Catalog:
    GRANT USAGE ON DATASHARE customer_ds TO ACCOUNT '<aws-account-id>' via DATA CATALOG;

  8. Verify the datashare was created and objects shared by running the following SQL command:
    DESC DATASHARE customer_ds;

Take note of the datashare producer cluster namespace and account ID, which will be used in the following step. You can complete the following actions on the console, but for simplicity, we use AWS CLI commands.

  9. Go to CloudShell or your AWS CLI and run the following command to authorize the datashare to the Data Catalog so that Lake Formation can manage it:
    aws redshift authorize-data-share \
    --data-share-arn 'arn:aws:redshift:<aws-region>:<aws-account-id>:datashare:<producer-cluster-namespace>/customer_ds' \
    --consumer-identifier DataCatalog/<aws-account-id>

The following is an example output:

{
    "DataShareArn": "arn:aws:redshift:us-east-2:<aws-account-id>:datashare:cd8d91b5-0c17-4567-a52a-59f1bdda71cd/customer_ds",
    "ProducerArn": "arn:aws:redshift:us-east-2:<aws-account-id>:namespace:cd8d91b5-0c17-4567-a52a-59f1bdda71cd",
    "AllowPubliclyAccessibleConsumers": false,
    "DataShareAssociations": [{
        "ConsumerIdentifier": "DataCatalog/<aws-account-id>",
        "Status": "AUTHORIZED",
        "CreatedDate": "2022-11-09T21:10:30.507000+00:00",
        "StatusChangeDate": "2022-11-09T21:10:50.932000+00:00"
    }]
}

Take note of the datashare ARN that you used in this command; you will need it in the next steps.

Accept the datashare in the Lake Formation catalog

To accept the datashare, complete the following steps:

  1. Run the following AWS CLI command to accept and associate the Amazon Redshift datashare to the AWS Glue Data Catalog:
    aws redshift associate-data-share-consumer --data-share-arn 'arn:aws:redshift:<aws-region>:<aws-account-id>:datashare:<producer-cluster-namespace>/customer_ds' \
    --consumer-arn arn:aws:glue:<aws-region>:<aws-account-id>:catalog

The following is an example output:

{
 "DataShareArn": "arn:aws:redshift:us-east-2:<aws-account-id>:datashare:cfd5fcbd-3492-42b5-9507-dad5d87f7427/customer_ds",
 "ProducerArn": "arn:aws:redshift:us-east-2:<aws-account-id>:namespace:cfd5fcbd-3492-42b5-9507-dad5d87f7427",
 "AllowPubliclyAccessibleConsumers": false,
 "DataShareAssociations": [
 {
 "ConsumerIdentifier": "arn:aws:glue:us-east-2:<aws-account-id>:catalog",
 "Status": "ACTIVE",
 "ConsumerRegion": "us-east-2",
 "CreatedDate": "2023-05-18T12:25:11.178000+00:00",
 "StatusChangeDate": "2023-05-18T12:25:11.178000+00:00"
 }
 ]
}
  2. Register the datashare in Lake Formation:
    aws lakeformation register-resource \
     --resource-arn arn:aws:redshift:<aws-region>:<producer-aws-account-id>:datashare:<producer-cluster-namespace>/customer_ds

  3. Create the AWS Glue database that points to the accepted Redshift datashare:
    aws glue create-database --region <aws-region> --cli-input-json '{
        "CatalogId": "<aws-account-id>",
        "DatabaseInput": {
            "Name": "customer_db_shared",
            "FederatedDatabase": {
                "Identifier": "arn:aws:redshift:<aws-region>:<producer-aws-account-id>:datashare:<producer-cluster-namespace>/customer_ds",
                "ConnectionName": "aws:redshift"
            }
        }
    }'

  4. To verify, go to the Lake Formation console and check that the database customer_db_shared is created.

Now the data lake administrator can view and grant access on both the database and tables to the data consumer team (marketing) personas using Lake Formation TBAC.

Assign Lake Formation tags to resources

Before we grant appropriate access to the IAM principals of the data analyst and power user within the marketing team, we have to assign LF-Tags to the tables and columns of the customer_db_shared database. We then grant these principals permissions on the appropriate LF-Tags.

To assign LF-tags, follow these steps:

  1. Assign the department and classification LF-tag to customer_db_shared (Redshift datashare) based on the tagging strategy table in the solution overview. You can run the following actions on the console, but for this post, we use the following AWS CLI command:
    aws lakeformation add-lf-tags-to-resource --cli-input-json '{
        "CatalogId": "<aws-account-id>",
        "Resource": {
        "Database": {
        "CatalogId": "<aws-account-id>",
        "Name": "customer_db_shared"
        }
        },
        "LFTags": [
        {
        "CatalogId": "<aws-account-id>",
        "TagKey": "department",
        "TagValues": [
        "customer"]
        },
        {
        "CatalogId": "<aws-account-id>",
        "TagKey": "classification",
        "TagValues": [
        "private"]
        }
        ]
        }'

If the command is successful, you should get a response like the following:

{
"Failures": []
}
  2. Assign the appropriate department and classification LF-tags to the marketing data lake database (lfunified_marketing_dl_db) on the S3 data lake:
    aws lakeformation add-lf-tags-to-resource --cli-input-json '{
        "CatalogId": "<aws-account-id>",
        "Resource": {
        "Database": {
        "CatalogId": "<aws-account-id>",
        "Name": "lfunified_marketing_dl_db"
        }
        },
        "LFTags": [
        {
        "CatalogId": "<aws-account-id>",
        "TagKey": "department",
        "TagValues": [
        "marketing"]
        },
        {
        "CatalogId": "<aws-account-id>",
        "TagKey": "classification",
        "TagValues": [
        "private"]
        }
        ]
        }'

Note that although you only assign the department and classification tags at the database level, they are inherited by the tables and columns within that database. (You can verify the effective tags with the sketch at the end of this section.)

  3. Assign the classification pii-sensitive LF-tag to PII columns of the customer table to override the inherited value from the database level:
    aws lakeformation add-lf-tags-to-resource --cli-input-json '{
        "CatalogId": "<aws-account-id>",
        "Resource": {
        "TableWithColumns": {
        "CatalogId": "<aws-account-id>",
        "DatabaseName": "customer_db_shared",
        "Name": "public.customer",
        "ColumnNames":["c_first_name","c_last_name","c_email_address"]
        }
        },
        "LFTags": [
        {
        "CatalogId": "<aws-account-id>",
        "TagKey": "classification",
        "TagValues": [
        "pii-sensitive"]
        }
        ]
        }'
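To confirm that the database-level tags were inherited and that the column-level override took effect, you can inspect the effective tags on the shared table. This is a verification sketch using the same placeholders as above:

# Show the LF-Tags assigned to (and inherited by) the shared customer table and its columns.
aws lakeformation get-resource-lf-tags --cli-input-json '{
    "CatalogId": "<aws-account-id>",
    "Resource": {
        "Table": {
            "CatalogId": "<aws-account-id>",
            "DatabaseName": "customer_db_shared",
            "Name": "public.customer"
        }
    },
    "ShowAssignedLFTags": true
}'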

Grant permission based on LF-tag association

Run the following two AWS CLI commands to allow the marketing data analyst access to the customer table excluding the pii-sensitive (PII) columns. Replace the value for DataLakePrincipalIdentifier with the MarketingAnalystRoleARN that you noted from the outputs of the CloudFormation stack:

aws lakeformation grant-permissions --cli-input-json '{
    "CatalogId": "<aws-account-id>",
    "Principal": {"DataLakePrincipalIdentifier" : "<MarketingAnalystRoleARN-from-CloudFormation-Outputs>"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "DATABASE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private"]}]
    }
    },
    "Permissions": [
    "DESCRIBE"
    ],
    "PermissionsWithGrantOption": []
}'
aws lakeformation grant-permissions --cli-input-json '{
    "CatalogId": "<aws-account-id>",
    "Principal": {"DataLakePrincipalIdentifier" : "<MarketingAnalystRoleARN-from-CloudFormation-Outputs>"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "TABLE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private"]}]
    }
    },
    "Permissions": [
    "SELECT"
    ],
    "PermissionsWithGrantOption": []
}'

We have now granted the marketing analyst access to the customer database and to the table columns that are not tagged pii-sensitive.

To allow marketing power users access to table columns with restricted LF-tag (PII columns), run the following AWS CLI command:

aws lakeformation grant-permissions --cli-input-json '{
    "CatalogId": "<aws-account-id>",
    "Principal": {"DataLakePrincipalIdentifier" : "<MarketingPowerUserRoleARN-from-CloudFormation-Outputs>"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "DATABASE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private"]}]
    }
    },
    "Permissions": [
    "DESCRIBE"
    ],
    "PermissionsWithGrantOption": []
}'
aws lakeformation grant-permissions --cli-input-json '{
    "CatalogId": "<aws-account-id>",
    "Principal": {"DataLakePrincipalIdentifier" : "<MarketingPowerUserRoleARN-from-CloudFormation-Outputs>"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "TABLE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private", "pii-sensitive"]}]
    }
    },
    "Permissions": [
    "SELECT"
    ],
    "PermissionsWithGrantOption": []
}'

We can combine the grants into a single batch grant permissions call:

aws lakeformation batch-grant-permissions --region us-east-1 --cli-input-json '{
    "CatalogId": "<aws-account-id>",
 "Entries": [
 {  "Id": "1",
    "Principal": {"DataLakePrincipalIdentifier" : "arn:aws:iam:: <aws-account-id>:role/Blog-MarketingAnalystRole-1CYV6JSNN14E3"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "DATABASE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private"]}]
    }
    },
    "Permissions": [
    "DESCRIBE"
    ],
    "PermissionsWithGrantOption": []
    },
    {  "Id": "2",
    "Principal": {"DataLakePrincipalIdentifier" : "arn:aws:iam:: <aws-account-id>:role/Blog-MarketingAnalystRole-1CYV6JSNN14E3"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "TABLE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private"]}]
    }
    },
    "Permissions": [
    "SELECT"
    ],
    "PermissionsWithGrantOption": []
    },
     {  "Id": "3",
    "Principal": {"DataLakePrincipalIdentifier" : "arn:aws:iam:: <aws-account-id>:role/Blog-MarketingPoweruserRole-RKKM0TWQBP0W"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "DATABASE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private", "pii-sensitive"]}]
    }
    },
    "Permissions": [
    "DESCRIBE"
    ],
    "PermissionsWithGrantOption": []
    },
    {  "Id": "4",
    "Principal": {"DataLakePrincipalIdentifier" : "arn:aws:iam:: <aws-account-id>:role/Blog-MarketingPoweruserRole-RKKM0TWQBP0W"},
    "Resource": {
    "LFTagPolicy": {
    "CatalogId": "<aws-account-id>",
    "ResourceType": "TABLE",
    "Expression": [{"TagKey": "department","TagValues": ["marketing","customer"]},{"TagKey": "classification","TagValues": ["private", "pii-sensitive"]}]
    }
    },
    "Permissions": [
    "SELECT"
    ],
    "PermissionsWithGrantOption": []
    }
    ]
 }'

Validate the solution

In this section, we go through the steps to test the scenario.

Consume the datashare in the consumer (marketing) data warehouse

To enable the consumers (the marketing team) to access the customer data shared with them via the datashare, we first have to configure Query Editor v2 to use IAM credentials as the principal for Lake Formation permissions. Complete the following steps:

  1. Sign in to the console using the admin role you nominated in running the CloudFormation template step.
  2. On the Amazon Redshift console, go to Query Editor v2.
  3. Choose the gear icon in the navigation pane, then choose Account settings.
  4. Under Connection settings, select Authenticate with IAM credentials.
  5. Choose Save.

Now let’s connect to the marketing Redshift cluster and make the customer database available to the marketing team.

  1. Choose the options menu (three dots) next to the Serverless:lfunified-marketing-wg cluster and choose Create connection.
  2. Select Database user name and password.
  3. Leave Database as dev.
  4. For User name, enter admin.
  5. For Password, enter the same password you retrieved from Secrets Manager in an earlier step.
  6. Choose Create connection.
  7. Once successfully connected, choose the plus sign and choose Editor to open a new Query Editor tab.
  8. Make sure that you specify the Serverless: lfunified-marketing-wg workgroup and dev database.
  9. To create the Redshift database from the shared catalog database, run the following SQL command on the new tab:
    CREATE DATABASE ext_customerdb_shared FROM ARN 'arn:aws:glue:<aws-region>:<aws-account-id>:database/customer_db_shared' WITH DATA CATALOG SCHEMA "customer_db_shared"

  10. Run the following SQL commands to create and grant usage on the Redshift database to the IAM roles for the power users and data analyst. You can get the IAM role names from the CloudFormation stack outputs:
    CREATE USER IAMR:"lf-redshift-ds-MarketingAnalystRole-XXXXXXXXXXXX" password disable;
    GRANT USAGE ON DATABASE ext_customerdb_shared to IAMR:"lf-redshift-ds-MarketingAnalystRole-XXXXXXXXXXXX";
    
    CREATE USER IAMR:"lf-redshift-ds-MarketingPoweruserRole-YYYYYYYYYYYY" password disable;
    GRANT USAGE ON DATABASE ext_customerdb_shared to IAMR:"lf-redshift-ds-MarketingPoweruserRole-YYYYYYYYYYYY";

Create the data lake schema in AWS Glue and allow the marketing power role to query the lead and web activity data

Run the following SQL commands to make the lead data in the S3 data lake available to the marketing team:

create external schema datalake from data catalog
database 'lfunified_marketing_dl_db' 
iam_role 'SESSION'
catalog_id '<aws-account-id>';
GRANT USAGE ON SCHEMA datalake TO IAMR:"lf-redshift-ds-MarketingAnalystRole-XXXXXXXXXXXX";
GRANT USAGE ON SCHEMA datalake TO IAMR:"lf-redshift-ds-MarketingPoweruserRole-YYYYYYYYYYYY";

Query the shared dataset as a marketing analyst user

To validate that the marketing team analysts (IAM role marketing-analyst-role) have access to the shared database, perform the following steps:

  1. Sign in to the console (for convenience, you can use a different browser) and switch your role to lf-redshift-ds-MarketingAnalystRole-XXXXXXXXXXXX.
  2. On the Amazon Redshift console, go to Query Editor v2.
  3. To connect to the consumer cluster, choose the Serverless: lfunified-marketing-wg consumer data warehouse in the navigation pane.
  4. When prompted, for Authentication, select Federated user.
  5. For Database, enter the database name (for this post, dev).
  6. Choose Save.
  7. Once you’re connected to the database, you can validate the current logged-in user with the following SQL command:
    select current_user;

  8. To find the federated databases created on the consumer account, run the following SQL command:
    SHOW DATABASES FROM DATA CATALOG ACCOUNT '<aws-account-id>';

  9. To validate permissions for the marketing analyst role, run the following SQL command:
    select * from ext_customerdb_shared.public.customer limit 10;

As you can see in the following screenshot, the marketing analyst is able to successfully access the customer data but only the non-PII attributes, which was our intention.

  10. Now let's validate that the marketing analyst doesn't have access to the PII columns of the same table. The following query should fail with a permission error, because the analyst grant excludes the pii-sensitive tag:
    select c_email_address from ext_customerdb_shared.public.customer limit 10;

Query the shared datasets as a marketing power user

To validate that the marketing power users (IAM role lf-redshift-ds-MarketingPoweruserRole-YYYYYYYYYYYY) have access to the pii-sensitive columns in the shared database, perform the following steps:

  1. Sign in to the console (for convenience, you can use a different browser) and switch your role to lf-redshift-ds-MarketingPoweruserRole-YYYYYYYYYYYY.
  2. On the Amazon Redshift console, go to Query Editor v2.
  3. To connect to the consumer cluster, choose the Serverless: lfunified-marketing-wg consumer data warehouse in the navigation pane.
  4. When prompted, for Authentication, select Federated user.
  5. For Database, enter the database name (for this post, dev).
  6. Choose Save.
  7. Once you’re connected to the database, you can validate the current logged-in user with the following SQL command:
    select current_user;

  8. Now let’s validate that the marketing power role has access to the PII columns of the customer table:
    select c_customer_id, c_first_name, c_last_name, c_email_address from ext_customerdb_shared.public.customer limit 10;

  9. Validate that the power users within the marketing team can now run a query to combine data across different datasets that they have access to in order to run effective campaigns:
    SELECT
        emailaddress as emailAddress,  customer.c_first_name as firstName, customer.c_last_name as lastName, leadsource, contactnotes, usedpromo
    FROM
        "dev"."datalake"."lead" as lead
    JOIN ext_customerdb_shared.public.customer as customer
    ON lead.emailaddress = customer.c_email_address
    WHERE lead.donotreachout = 'false'

Clean up

After you complete the steps in this post, to clean up resources, delete the CloudFormation stack:

  1. On the AWS CloudFormation console, select the stack you deployed in the beginning of this post.
  2. Choose Delete and follow the prompts to delete the stack.

Conclusion

In this post, we showed how you can use Lake Formation tags to manage permissions for your data lake and Amazon Redshift data sharing. Using LF-TBAC for data governance helps you manage your data lake and Amazon Redshift data sharing permissions at scale. It also enables data sharing across business units with fine-grained access control. Managing access to your data lake and Redshift datashares in a single place enables better governance, helping with data security and compliance.

If you have questions or suggestions, submit them in the comments section.

For more information on Lake Formation managed Amazon Redshift data sharing and tag-based access control, refer to Centrally manage access and permissions for Amazon Redshift data sharing with AWS Lake Formation and Easily manage your data lake at scale using AWS Lake Formation Tag-based access control.


About the Authors

Praveen Kumar is an Analytics Solution Architect at AWS with expertise in designing, building, and implementing modern data and analytics platforms using cloud-native services. His areas of interests are serverless technology, modern cloud data warehouses, streaming, and ML applications.

Srividya Parthasarathy is a Senior Big Data Architect on the AWS Lake Formation team. She enjoys building data mesh solutions and sharing them with the community.

Paul Villena is an Analytics Solutions Architect in AWS with expertise in building modern data and analytics solutions to drive business value. He works with customers to help them harness the power of the cloud. His areas of interests are infrastructure as code, serverless technologies, and coding in Python.

Mostafa Safipour is a Solutions Architect at AWS based out of Sydney. He works with customers to realize business outcomes using technology and AWS. Over the past decade, he has helped many large organizations in the ANZ region build their data, digital, and enterprise workloads on AWS.