Tag Archives: Amazon API Gateway

Enhancing API security with Amazon API Gateway TLS security policies

Post Syndicated from Anton Aleksandrov original https://aws.amazon.com/blogs/compute/enhancing-api-security-with-amazon-api-gateway-tls-security-policies/

As compliance frameworks evolve and cryptographic standards advance, organizations are looking for additional controls to improve their cloud security posture. One of the neccesary controls is a more granular TLS configuration, for example when regulatory requirements mandate disabling older ciphers like CBC or enforcing TLS 1.3 as a minimum version.

In this post, you will learn how the new Amazon API Gateway’s enhanced TLS security policies help you meet standards such as PCI DSS, Open Banking, and FIPS, while strengthening how your APIs handle TLS negotiation. This new capability increases your security posture without adding operational complexity, and provides you with a single, consistent way to standardize TLS configuration across your API Gateway infrastructure.

Overview

Previously, API Gateway offered limited control over TLS configuration, and only for custom domain names. Default endpoints used fixed security policies, which meant you often had to introduce additional infrastructure, such as custom Amazon CloudFront distributions, to meet your organization’s security or compliance requirements.

With this launch, you can configure TLS behavior directly on all REST API endpoint types, including Regional, edge-optimized, and private, and apply consistent TLS settings across both your APIs and their custom domain names. You can choose from predefined enhanced security policies to enforce the minimum TLS versions and cipher suites that your workloads require. For example, you can enforce TLS 1.3, use hardened TLS 1.2 without CBC ciphers, adopt FIPS-aligned suites for government workloads, or prepare for the future with policies that include post-quantum cryptography (PQC). The new security policies provide finer-grained control without adding operational complexity, helping you align your APIs with evolving security and compliance expectations.

Understanding API Gateway security policies

A security policy in API Gateway is a predefined combination of a minimum TLS version and a curated set of cipher suites. When a client connects to your REST API or custom domain name, API Gateway uses the selected policy to determine which protocol versions and ciphers it will accept during the TLS handshake. This gives you a predictable and enforceable way to control how clients establish encrypted connections to your APIs.

API Gateway supports two categories of security policies. Legacy policies, such as TLS_1_0 or TLS_1_2, remain available for backwards compatibility. Enhanced policies, identified by the SecurityPolicy_* prefix, provide stricter and more modern controls for regulated workloads, advanced governance, or cryptographic hardening. When you use an enhanced policy, you must also specify an endpoint access mode, which adds additional validation for how traffic reaches your API, as described in the following sections.

Enhanced policies follow a consistent naming patterns that helps you quickly understand what each policy enforces. For example, for REGIONAL and PRIVATE endpoint types, the following pattern applies:

SecurityPolicy_[TLS-Versions]_[Variant]_[YYYY-MM]

From this structure, you can identify the minimum TLS versions supported, any specialized cryptographic variants (such as FIPS, PFS, or PQ), and the release date of the policy. For example, SecurityPolicy_TLS13_1_3_2025_09 accepts only TLS 1.3 traffic, while SecurityPolicy_TLS13_1_2_PFS_PQ_2025_09 supports TLS 1.2 as lowest and TLS 1.3 as highest TLS version with forward secrecy and post-quantum enhancements.

Each policy maps to a curated combination of ciphers. For instance, SecurityPolicy_TLS13_1_3_2025_09 accepts only three TLS 1.3 cipher suites (TLS_AES_128_GCM_SHA256, TLS_AES_256_GCM_SHA384, and TLS_CHACHA20_POLY1305_SHA256) and rejects any other protocol versions or ciphers. For a full list of supported policies and ciphers, and naming pattern for the EDGE endpont type, see the API Gateway documentation.

How security policies apply to default endpoints and custom domains

You can use API Gateway to attach different security policies to your default API endpoint and custom domain names. During TLS negotiation, API Gateway selects the policy based on the Server Name Indication (SNI) value in the client’s TLS handshake, not the HTTP Host header. This means the policy depends on the hostname the client uses when initiating TLS.

For example, if a client connects directly to your default endpoint, such as:

https://abcdef1234.execute-api.us-east-1.amazonaws.com

API Gateway uses the policy attached to that default endpoint because the SNI value matches its hostname.

If the client instead connects through a custom domain name, such as:

https://api.example.com

API Gateway uses the policy attached to that custom domain. In this case, the SNI value api.example.com determines which policy is enforced.

This distinction is important even if you disable your default endpoint. TLS negotiation always occurs before API Gateway evaluates endpoint settings, so the default endpoint security policy still applies to clients that connect directly to its hostname. To avoid unexpected client behavior, you should keep the API and its custom domain name aligned with the same security policy whenever possible.

Understanding endpoint access mode

When you use an enhanced security policy (SecurityPolicy_*), you must also specify an endpoint access mode. Endpoint access mode defines how strictly API Gateway validates the network path a request takes before it reaches your API. This gives you an additional layer of governance and helps you prevent unauthorized or misrouted traffic.

You can choose between two modes:

  • BASIC mode provides standard API Gateway behavior. It is the recommended starting point when you migrate an existing API to an enhanced security policy. Clients can continue reaching your API as they do today, without additional validation.
  • STRICT mode adds enforcement checks to ensure that requests originate from the correct endpoint type, and TLS negotiation aligns with your configuration.

When you enable STRICT mode, API Gateway performs additional validations, such as:

  • The SNI and HTTP Host header values match
  • The request originates from the same endpoint type as your API (Regional, edge-optimized, or private)

If any of these validations fail, API Gateway rejects the request. STRICT is a viable choice when you need stronger security guarantees, such as when running regulated or sensitive workloads. See API Gateway documentation for additional details.

When you switch from BASIC to STRICT mode, it takes up to 15 minutes for the change to fully propagate. Your API remains available during this period. If your endpoint access mode is set to STRICT, you cannot change the endpoint type until you revert the mode back to BASIC.

Applying security policies to new and existing APIs

You can apply a security policy when you create a new REST API or custom domain name, or update an existing resource to use one of the enhanced SecurityPolicy_* options. When migrating existing APIs, the recommended approach is to start with BASIC mode, validate client behavior (SNI and HTTP Host header values match, request originates from the same endpoint type as your API), and then move to STRICT mode once you confirm compatibility.

The following code snippets illustrate how to apply security policies to different scenarios:

Create a REST API with a security policy and STRICT endpoint access mode

You can attach a security policy directly during API creation, removing the need for extra infrastructure just to control TLS negotiation.

aws apigateway create-rest-api \
  --name "your-private-api-name" \
  --endpoint-configuration '{"types":["PRIVATE"]}' \
  --security-policy "SecurityPolicy_TLS13_1_3_2025_09" \
  --endpoint-access-mode STRICT \
  --policy file://api-policy.json

Create a custom domain name with a security policy and STRICT endpoint access mode

You can also specify the security policy when creating a custom domain name. API Gateway applies the selected policy during TLS negotiation based on the SNI value the client provides.

aws apigateway create-domain-name \
  --domain-name api.example.com \
  --regional-certificate-arn arn:aws:acm:region:account-id:certificate/certificate-id \
  --endpoint-configuration '{"types":["REGIONAL"]}' \
  --security-policy SecurityPolicy_TLS13_1_3_2025_09 \
  --endpoint-access-mode STRICT

Updating existing REST API

If you are migrating an existing API, start by applying the enhanced security policy with BASIC mode. After confirming that your clients can connect with BASIC mode as expected, proceed to enable the STRICT mode.

1. Apply the new policy with BASIC mode

aws apigateway update-rest-api --rest-api-id abcd123 --patch-operations '[
    {
         "op": "replace",
         "path": "/securityPolicy",
         "value": "SecurityPolicy_TLS13_1_3_2025_09"
    },
    {
         "op": "replace",
         "path": "/endpointAccessMode",
         "value": "BASIC"
     }
]'

Verify your clients can consume the API as expected using access logs and performance metrics in Amazon CloudWatch.

2. Enable the STRICT mode after validation

aws apigateway update-rest-api --rest-api-id abcd123 --patch-operations '[
    {
        "op": "replace",
        "path": "/endpointAccessMode",
        "value": "STRICT"
     }
]'

Updating existing custom domain name

Custom domain names follow the same migration approach as REST APIs.

1. Apply the new policy with BASIC mode and validate clients can successfully connect.

aws apigateway update-domain-name --domain-name api.example.com --patch-operations '[
    {
        "op": "replace",
        "path": "/securityPolicy",
        "value": "SecurityPolicy_TLS13_1_3_2025_09"
    },
    {
        "op": "replace",
        "path": "/endpointAccessMode",
        "value": "BASIC"
     }
]'

2. Enable the STRICT mode after validation

aws apigateway update-domain-name --domain-name api.example.com --patch-operations '[
    {
        "op": "replace",
        "path": "/endpointAccessMode",
        "value": "STRICT"
     }
]'

After you update your REST API or custom domain configuration, redeploy your API so that stages receive the new settings. When you change a security policy, the update takes up to 15 minutes to complete. The API status appears as UPDATING while the change propagates and returns to AVAILABLE when complete. Your API remains fully functional throughout this process.

Rolling back endpoint access mode

If you notice clients failing to connect to your API after applying the STRICT mode, you can revert the endpoint access mode back to BASIC at any time. Below code snippet illustrates doing this for a REST API.

aws apigateway update-rest-api --rest-api-id abcd123 --patch-operations '[
    {
      "op": "replace",
      "path": "/endpointAccessMode",
      "value": "BASIC"
    }
  ]'

You can use the same approach to update a custom domain name.

Monitoring TLS usage and policy migrations

As you adopt enhanced security policies, it is important to understand how clients negotiate encrypted connections with your API. Monitoring helps you verify client readiness, identify legacy consumers that may require updates, and validate that STRICT endpoint access mode behaves as expected during rollout. Use the following API Gateway access logs variables to monitor protocol and cipher usage over time.

  • $context.tlsVersion – the negotiated TLS version
  • $context.cipherSuite – the cipher suite selected during the handshake

You can use these variables to confirm that:

  • Clients are using the expected minimum TLS version
  • BC-based ciphers are no longer used after you move to a hardened policy
  • PQC and FIPS-aligned policies are being exercised by the appropriate clients

Access logs are especially useful during migrations, where validating the actual client behavior is a prerequisite before enabling STRICT mode. For example, if you still observe live clients negotiating TLS 1.0 or TLS 1.2 CBC ciphers after applying a hardened policy in BASIC mode, you can identify the affected clients and plan remediation before switching to STRICT mode.

Future-proof security configurations

Some of the new policies combine TLS 1.3 with post-quantum cryptography (PQC) to help you prepare for a future where quantum-capable threat actors exist. With these policies you can start testing and adopting quantum-resistant algorithms without redesigning your API architecture.

As standards evolve and new cipher suites are introduced, API Gateway’s policy model provides you with a clear path for adding new variants while keeping your configuration simple and predictable.

Conclusion and next steps

Enhanced TLS security policies and endpoint access mode in the Amazon API Gateway gives you direct control over how clients establish secure connections to your APIs. You can choose the policies that match your compliance needs, such as PCI DSS, FIPS, Open Banking, PQC, and use STRICT mode to control how traffic reaches your endpoints and apply additional domain-level validations, further hardening security of your APIs

To get started:

  1. Review the list of available security policies in the API Gateway documentation.
  2. Identify which REST APIs and domains require stronger TLS controls.
  3. Apply an appropriate SecurityPolicy-* policy with BASIC mode.
  4. Validate client behavior using access logs and CloudWatch metrics.
  5. Move to STRICT mode when you are ready to enforce additional connection-level protection.

For more information about building Serverless architectures, see ServerlessLand.com

Build scalable REST APIs using Amazon API Gateway private integration with Application Load Balancer

Post Syndicated from Christian Silva original https://aws.amazon.com/blogs/compute/build-scalable-rest-apis-using-amazon-api-gateway-private-integration-with-application-load-balancer/

This post is written by Vijay Menon, Principal Solutions Architect, and Christian Silva, Senior Solutions Architect.

Today, we announced Amazon API Gateway REST API’s support for private integration with Application Load Balancers (ALBs). You can use this new capability to securely expose your VPC-based applications through your REST APIs without exposing your ALBs to the public internet.

Prior to this launch, if you wanted to connect API Gateway to private ALBs, you would have had to use a Network Load Balancer (NLB) as an intermediary, increasing cost and complexity. Now, you can directly integrate API Gateway with private ALBs without requiring an NLB, reducing operational overhead and optimizing cost.

Previous architecture: Connecting API Gateway to private ALBs

Before this launch, API Gateway REST APIs connect to private ALB resources through an NLB positioned in front of the ALB. Many customers have successfully built and operated production workloads using this architecture, demonstrating its reliability for business-critical applications. The following architecture demonstrates this setup.

Figure 1. Previous architecture: API Gateway to private ALB via intermediary NLB

In response to customer feedback for a simplified architecture and reduced costs, we’ve extended VPC link v2 support to REST APIs. This feature now enables direct private ALB integration for REST APIs, eliminating the need for an intermediary NLB.

New architecture: Connecting API Gateway to private ALBs

With direct private ALB integration, this architecture becomes simpler and more efficient. The integration removes the need for an intermediate NLB, reducing the number of hops between client and your services. This streamlined setup simplifies the architecture for applications, allowing more efficient use of ALB’s layer-7 load-balancing capabilities, authentication, and authorization features. While these ALB features were technically accessible before, the new architecture removes the overhead and complexity of managing an additional NLB. Here’s how the simplified architecture looks now:

Figure 2. Direct integration between API Gateway and private ALB

Benefits of a direct integration between your API Gateway endpoint and your private ALB

  • Architectural simplification and operational excellence: Now that your API Gateway can directly connect to your private ALB, you no longer need an NLB to act as a bridge between your API Gateway and your private ALB. This eliminates the need to provision, configure, manage, or monitor an intermediate load balancer. The reduction in infrastructure components translates to reduced operational overhead and fewer potential failure points. Traffic flows directly from API Gateway to your ALB within the Amazon Web Services (AWS) network, reducing network hops and latency.
  • Improved scalability: VPC link v2 supports a one-to-many relationship with load balancers. A single VPC link v2 allows API Gateway to integrate with multiple ALBs or NLBs within your VPC. This architectural advantage is particularly valuable for organizations managing complex applications with multiple microservices, each potentially behind its own ALB, or those running numerous APIs. The ability to consolidate multiple load balancer connections through a single VPC link not only reduces administrative overhead but also provides greater flexibility in scaling your architecture. As your application grows and you add more services or load balancers, you won’t need to provision additional VPC links, making it easier to expand your infrastructure while maintaining operational efficiency.
  • Cost optimization: You can remove the NLB from your architecture and thereby eliminate both the hourly charges for running the NLB and the associated Network Load Balancer Capacity Units (NLCU) used per hour. For organizations running multiple environments or numerous APIs, these savings can accumulate to thousands of dollars annually. Moreover, your data transfer patterns become more efficient. Traffic flows directly from API Gateway to your ALB within the AWS network, which avoids any unnecessary hops that could incur more data transfer charges. This streamlined path not only reduces costs but also improves performance by minimizing network latency.

Getting started

This tutorial demonstrates the setup using both the AWS Management Console and AWS Command Line Interface (AWS CLI). Before you begin, make sure that you have an internal ALB configured in your VPC. For resources that need naming, use appropriate names for your environment.

Step 1: Create a VPC link v2
The first step in our process is to create a VPC link v2, which will enable API Gateway to route traffic to your internal ALB. Here’s how to set it up:

  1. Navigate to the API Gateway console.
  2. In the left navigation pane, choose VPC links.
  3. Choose Create VPC link.
  4. Choose VPC link v2 as the VPC link type.
  5. Provide a descriptive name for your VPC link.
  6. Choose your VPC and subnets where your ALB resides. For high availability, choose subnets in multiple AWS Availability Zones (AZs) that match your ALB configuration.
  7. Assign one or more security groups to your VPC link. These security groups will control the traffic flow between API Gateway and your VPC.
  8. Choose Create and wait for the VPC link status to become Available. This process can take a few minutes.

Alternatively, you can create a VPC link v2 using the AWS CLI:

# Create VPC link v2
aws apigatewayv2 create-vpc-link \
    --name "test-vpc-link-v2" \
    --subnet-ids "<your-subnet1-id>" "<your-subnet2-id>" \
    --security-group-ids "<your-security-group-id>" \
    --region <your-AWS-region>

# Check VPC link v2 status
aws apigatewayv2 get-vpc-link \
    --vpc-link-id "<your-vpc-link-v2-id>" \
    --region <your-AWS-region>

Step 2: Create a REST API and configure integration
With your VPC link v2 now available, the next step is to create a REST API and configure it to use the VPC Link. This process involves creating the API, setting up resources and methods, and configuring the integration with your internal ALB.

  1. In the API Gateway console, choose Create API.
  2. Choose REST API.
  3. Enter an API name and choose Create API.
  4. Create a new resource by choosing Actions, then choose Create resource. This resource will represent the endpoint for your API.
  5. Create a method by choosing Actions, then choose Create method. The method defines the type of request your API will accept (GET, POST, etc.).
  6. Now, configure the integration. This is where you’ll connect your API to your internal ALB via the VPC link v2:
    1. Choose VPC link as the integration type.
    2. Choose the HTTP method for your backend integration.
    3. Choose your newly created VPC link v2.
    4. Specify your ALB as the Integration target.
    5. Enter the endpoint URL for your integration. The port specified in the URL is used to route requests to the backend.
    6. Set the Integration timeout.

Using the AWS CLI:

# Create REST API
aws apigateway create-rest-api \
    --name "test-rest-api" \
    --description "REST API integration with internal ALB via VPC link v2" \
    --region <your-AWS-region>

# Get REST API’s root resource ID
aws apigateway get-resources \
    --rest-api-id "<your-rest-api-id>" \
    --region <your-AWS-region>

# Create a new resource
aws apigateway create-resource \
    --rest-api-id "<your-rest-api-id>" \
    --parent-id "<your-parent-id>" \
    --path-part "internal-alb" \ 
    --region <your-AWS-region>

# Create a new method
aws apigateway put-method \
    --rest-api-id "<your-rest-api-id>" \
    --resource-id "<your-resource-id>" \
    --http-method ANY \
    --authorization-type NONE \
    --region <your-AWS-region>

# Create the integration
aws apigateway put-integration \
    --rest-api-id "<your-rest-api-id>" \
    --resource-id "<your-resource-id>" \
    --http-method ANY \
    --type HTTP_PROXY \
    --integration-http-method ANY \
    --uri "http://test-internal-alb.com/test" \
    --connection-type VPC_LINK \
    --connection-id "<your-vpc-link-v2-id>" \
    --integration-target "<your-ALB-arn>" \
    --region <your-AWS-region>

Step 3: Deploy and test
With your API configured, it’s time to deploy it and verify that it’s working correctly.

  1. Choose Deploy API to create a new deployment of your API.
  2. Create a new stage (for example “test”). Stages allow you to manage multiple versions of your API.
  3. After deployment, you’ll receive an API endpoint URL. Copy this URL as you’ll need it for testing.

Test your API using your preferred API client or a simple curl command.

Using the AWS CLI:

# Create a new deployment to a test stage
aws apigateway create-deployment \
    --rest-api-id "<your-rest-api-id>" \
    --stage-name "test" \
    --region <your-AWS-region>

Test your API integration using a curl command:

curl https://<rest-api-id>.execute-api.<your-aws-region>.amazonaws.com/internal-alb
{"message": "Hello from internal ALB"}

Step 4: Scale your VPC link v2
A single VPC link can now connect to multiple ALBs or NLBs within your VPC, simplifying infrastructure management. This AWS CLI snippet demonstrates API Gateway integrating with multiple internal services, for example orders and payments services, each behind its own ALB, using a single VPC link v2. Note how the same VPC link ID is used across both integrations.

# Orders service integration (ALB-1)
aws apigateway put-integration \
    --rest-api-id "<your-rest-api-id>" \
    --resource-id "<orders-resource-id>" \
    --http-method ANY \
    --type HTTP_PROXY \
    --integration-http-method ANY \
    --uri "<your-orders-alb-endpoint>" \
    --connection-type VPC_LINK \
    --connection-id "<your-vpc-link-v2-id>" \
    --integration-target "<your-orders-alb-arn>" \
    --region "<your-aws-region>"

# Payments service integration (ALB-2)
aws apigateway put-integration \
    --rest-api-id "<your-rest-api-id>" \
    --resource-id "<payments-resource-id>" \
    --http-method ANY \
    --type HTTP_PROXY \
    --integration-http-method ANY \
    --uri "<your-payments-alb-endpoint>" \
    --connection-type VPC_LINK \
    --connection-id "<your-vpc-link-v2-id>" \
    --integration-target "<your-payments-alb-arn>" \
    --region "<your-aws-region>"

For a detailed, step-by-step guide, please see our official documentation in the API Gateway Developer Guide.

Use cases

Private ALB integration with API Gateway enables architectural patterns that solve enterprise challenges. These are three key scenarios where organizations can use this new capability:

  • Microservices on Amazon ECS and Amazon EKS: Exposing microservices running on Amazon ECS or Amazon EKS becomes simpler with this integration. It allows secure, path-based routing to different services without exposing your ALB to the public internet or using complex NLB proxy patterns.
  • Hybrid cloud architectures: Seamless and secure connectivity between cloud-native APIs and on-premises resources is achieved via AWS Direct Connect or AWS Site-to-Site VPN. This setup allows flexible routing based on HTTP methods and headers to various internal systems.
  • Enterprise modernization: Gradual application modernization is facilitated by enabling phased migration from monolithic architectures to microservices. Organizations can route traffic between legacy and new components while maintaining operational continuity and minimizing risk.

Conclusion

Direct private integration between API Gateway REST APIs and ALBs enhances API architecture on AWS. By simplifying infrastructure and reducing operational overhead, this capability improves performance and efficiency for API-driven applications.

This feature is available today in all AWS Regions where VPC link v2 and ALBs are present. We can’t wait to see what you build with it and how it transforms your API architectures. Get started now by visiting the API Gateway console and creating your first VPC link v2 for direct ALB integration.

For more information, visit the API Gateway product page, review our pricing details, and explore the comprehensive developer documentation to learn about all the powerful features available to help you build world-class APIs on AWS.

Serverless strategies for streaming LLM responses

Post Syndicated from KyungYong Shim original https://aws.amazon.com/blogs/compute/serverless-strategies-for-streaming-llm-responses/

Modern generative AI applications often need to stream large language model (LLM) outputs to users in real-time. Instead of waiting for a complete response, streaming delivers partial results as they become available, which significantly improves the user experience for chat interfaces and long-running AI tasks. This post compares three serverless approaches to handle Amazon Bedrock LLM streaming on Amazon Web Services (AWS), which helps you choose the best fit for your application.

  1. AWS Lambda function URLs with response streaming
  2. Amazon API Gateway WebSocket APIs
  3. AWS AppSync GraphQL subscriptions

We cover how each option works, the implementation details, authentication with Amazon Cognito, and when to choose one over the others.

Lambda function URLs with response streaming

AWS Lambda function URLs provide a direct HTTP(S) endpoint to invoke your Lambda function. Response streaming allows your function to send incremental chunks of data back to the caller without buffering the entire response. This approach is ideal for forwarding the Amazon Bedrock streamed output, providing a faster user experience. Streaming is supported in Node.js 18+. In Node.js, you wrap your handler with awslambda.streamifyResponse(), which provides a stream to write data to, and which sends it immediately to the HTTP response.

Architecture

The following figure shows the architecture.

Lambda function URLs with Amazon Bedrock architecture

  1. The client makes a fetch() call to the Lambda function URL.
  2. Lambda invokes InvokeModelWithResponseStream using the AWS SDK for JavaScript.
  3. As tokens arrive from Amazon Bedrock, they are written to the response stream.

Implementation steps

  1. Create a streaming Lambda function: Use a Node.js 18+ or later runtime (necessary for native streaming). Install the AWS SDK to call Amazon Bedrock. In the handler code, wrap the function with awslambda.streamifyResponse and stream the model output. For example, in Node.js you might do the following:
    const bedrock = new BedrockRuntimeClient({region: “us-east-1”});
    
    // Please consider adding more details when you use it for your application.
    exports.handler = awslambda.streamifyResponse(async (event, responseStream) => 
    {
        // 1. Parse input (e.g., prompt from event)
        const prompt = event.body?.prompt;
        // 2. Call Amazon Bedrock with streaming (using AWS SDK for Amazon Bedrock)
        const command = new InvokeModelWithResponseStreamCommand({ modelId: "YOUR_MODEL_ID", body: { prompt }});
        const response = await bedrock.send(command);
        // 3. Stream Bedrock tokens to client
        for await (const event of response.body) {
            if (event.content) {
                responseStream.write(event.content); // write partial output
            }
        }
        // 4. End stream when done
        responseStream.end();
    });
    

  2. This code snippet uses the Amazon Bedrock SDK’s async iterable to read the event stream of tokens and writes each to the responseStream.
  3. Configure the Lambda role: the execution role must allow the Amazon Bedrock invocation (such as bedrock:InvokeModelWithResponseStream on the LLM model Amazon Resource Name (ARN)).

Authentication with Amazon Cognito

Lambda function URLs can be set to “None” (public) or “AWS_IAM”. Native Cognito User Pool token authentication isn’t supported, thus you need to implement a solution.

  1. JWT verification in Lambda: Allow public access and verify a valid JWT from Amazon Cognito in the request header within your Lambda code. This necessitates development effort.
    // Initialize Cognito JWT Verifier
    const { CognitoJwtVerifier } = require('aws-jwt-verify');
    
    const jwtVerifier = CognitoJwtVerifier.create({
      userPoolId: USER_POOL_ID,
      tokenUse: 'id',
      clientId: USER_POOL_CLIENT_ID
    });
    
    // Verify JWT token from Cognito
    async function verifyToken(token) {
      try {
        if (!token) throw new Error('No authorization token provided');
        
        // Remove 'Bearer ' prefix if present
        if (token.startsWith('Bearer ')) {
          token = token.slice(7);
        }
    
        // Verify the token using Cognito JWT Verifier
        const payload = await jwtVerifier.verify(token);
        logger.info(`Verified token for user: ${payload.sub}`);
        
        return payload;
      } catch (error) {
        logger.error(`Token verification failed: ${error.message}`);
        throw new Error(`Invalid token: ${error.message}`);
      }
    }
    
    //...
    
        // Verify authentication
        let userId;
        try {
          const authHeader = event.headers?.Authorization;
          const payload = await verifyToken(authHeader);
          userId = payload.sub;
          logger.info(`Authenticated user: ${userId}`);
        } catch (error) {
          responseStream.write(`data: ${JSON.stringify({ type: 'error', error: 'Unauthorized', message: error.message })}\n\n`);
          return;
        }
    

  2. IAM authorization with Amazon Cognito identity: Use AWS credentials obtained from Amazon Cognito. A more complex setup, especially for web apps, is potentially overkill for a single function.

Pros and cons of Lambda function URLs

Pros:

  • Clarity: No API Gateway or other services are needed, which minimizes operational overhead.
  • Low latency, high throughput: The response is delivered directly from Lambda to the client. This yields excellent Time To First Byte (TTFB) performance, with no intermediate buffering.
  • Direct implementation: For Node.js developers, enabling streaming is as direct as a wrapper and writing to a stream. This is ideal for quick prototypes or single function microservices.
  • Lower cost for low concurrent usage: You pay only for Lambda execution time. There’s no persistent connection cost, which is the same as with WebSocket or AWS AppSync. If invocations are infrequent or short, then this could be cost-efficient.

Cons:

  • Limited runtime support: Native streaming is only supported in Node.js.
  • No built-in user pool auth: Unlike API Gateway or AWS AppSync, Lambda URLs don’t directly support Amazon Cognito user pool authorizers. You must handle auth either through AWS Identity and Access Management (IAM) or manual token validation, adding some development effort and potential security pitfalls if done incorrectly.
  • Error handling complexity: Streaming makes error propagation trickier. If an error occurs mid-stream, then you need to decide how to inform the client.

API Gateway WebSocket for streaming

API Gateway WebSocket APIs establish persistent, stateful connections between clients and your backend. This is ideal for real-time applications needing server-initiated messages. The client connects once, sends a prompt to Amazon Bedrock through the WebSocket, and the server pushes the streamed response back over the same connection.

Architecture

The following figure shows the architecture.

API Gateway WebSocket with Amazon Bedrock architecture

  1. Client connects through the WebSocket URL and store connectionId.
  2. Client sends a prompt through a custom route to the LLMHandler.
  3. Lambda as LLMHandler invokes Amazon Bedrock and streams back through WebSocket.
  4. Client disconnects through the DisconnectHandler and removes connectionId.

Implementation steps

  1. Create a WebSocket API in API Gateway with routes
    1. $connect: Connected to ConnectHandler Lambda.
    2. $disconnect: Connected to DisconnectHandler Lambda.
    3. $stream: All messages go to StreamHandler Lambda.
  2. Create Lambda Authorizer
    1. Receives the connection request with token in query string.
    2. Validates the JWT token against Amazon Cognito.
    3. Returns Allow/Deny policy for the connection.
      def lambda_handler(event, context):
          # Extract token from querystring
          token = event.get("queryStringParameters", {}).get("token", "")
          
          # Validate JWT token against Cognito
          if validate_token(token):
              return {
                  "isAuthorized": True,
                  # Optionally include context that other handlers can access
                  "context": {
                      "userId": extracted_user_id
                  }
              }
          else:
              return {"isAuthorized": False}
      

  3. Create Connection Handler
    1. Connection Lambda runs after successful authorization.
    2. Receives the new connection’s unique connectionId.
    3. Store connection info in Amazon DynamoDB (optional).
    4. Returns 200 status to complete the connection.
      def lambda_handler(event, context):
          # Extract connectionId
          connection_id = event.get("requestContext", {}).get("connectionId")
          
          # Optionally store in DynamoDB
          # dynamodb.put_item(...)
          
          # Connection established successfully
          return {"statusCode": 200}
      

  4. Create Disconnect Handler
    1. Disconnect Lambda is triggered automatically when clients disconnect.
    2. Receives the terminated connection’s connectionId.
    3. Cleans up any stored connection data.
    4. Returns 200 status
      def lambda_handler(event, context):
          # Extract connectionId
          connection_id = event.get("requestContext", {}).get("connectionId")
          
          # Optionally remove from DynamoDB
          # dynamodb.delete_item(...)
          
          # Disconnection handled successfully
          return {"statusCode": 200}
      

  5. Create LLM Handler
      1. Receives messages sent to the stream route.
      2. Extracts prompt from the message body.
      3. Calls Amazon Bedrock model with streaming response.
      4. Streams tokens back to the client using the connection ID.
        def lambda_handler(event, context):
            # Extract connectionId and domain details for sending responses
            connection_id = event["requestContext"]["connectionId"]
            domain = event["requestContext"]["domainName"]
            stage = event["requestContext"]["stage"]
            
            # Parse message body to get the prompt
            body = json.loads(event.get("body", "{}"))
            prompt = body.get("prompt", "")
            
            # Create API Gateway management client for sending responses
            api_client = boto3.client(
                'apigatewaymanagementapi',
                endpoint_url=f'https://{domain}/{stage}'
            )
            
            # Call Amazon Bedrock with streaming response
            response = bedrock_client.invoke_model_with_response_stream(...)
            
            # Stream tokens back to client
            for chunk in response["body"]:
                # Extract token from chunk
                token = process_chunk(chunk)
                
                # Send token directly back through the WebSocket
                api_client.post_to_connection(
                    ConnectionId=connection_id,
                    Data=json.dumps({"token": token, "isComplete": False})
                )
            
            # Send completion message
            api_client.post_to_connection(
                ConnectionId=connection_id,
                Data=json.dumps({"token": "", "isComplete": True})
            )
            
            return {"statusCode": 200}
        

Authentication with Amazon Cognito

Securing a WebSocket API with Amazon Cognito needs a bit more work. API Gateway WebSocket doesn’t have a built-in Amazon Cognito User Pool authorizer:

  1. Lambda authorizer with JWT authentication: API Gateway invokes your Lambda authorizer upon connection, validating the Amazon Cognito JWT (passed as a query parameter). The Lambda generates an IAM policy granting access and returns it.
  2. IAM authentication for WebSockets: Clients sign requests with SigV4 using AWS credentials from an Amazon Cognito Identity Pool. API Gateway evaluates the request against IAM policies.

Pros and cons of API Gateway WebSocket APIs

Pros:

  • Bidirectional real-time communication: WebSockets are ideal for applications where the server needs to push data such as the LLM’s response without explicit requests.
  • Persistent connection for multi-turn conversations: After the initial handshake, the same connection can be reused for subsequent prompts and responses, avoiding repeated setup latency. This is great for a chat UI where the user asks multiple questions in one session.
  • Scalability: API Gateway is a managed service that can handle 500 connections/second and 10,000 requests/second across APIs, which can be increased by request.

Cons:

  • Higher development complexity: When compared to the clarity of a direct Lambda URL, a WebSocket API involves multiple Lambdas and coordination to manage the connection state.
  • Custom auth implementation: There is no built-in Amazon Cognito user pool integration, thus you must implement a Lambda authorizer.
  • Timeout management: The API Gateway integration timeout is 29 s, thus your Lambda function should return the response promptly.

AWS AppSync GraphQL subscription

AWS AppSync is a fully managed GraphQL service that streamlines building real-time APIs. It handles WebSocket connections and client fan-out automatically. Clients subscribe to a GraphQL subscription, and a Lambda resolver pushes the Amazon Bedrock streamed tokens back.

Architecture

The following figure shows the architecture.

AWS AppSync GraphQL subscription with Amazon Bedrock architecture

  1. Client calls a startStream mutation. AppSync invokes the Request Lambda.
  2. The Request Lambda immediately returns a unique sessionId and sends the processing task to an Amazon Simple Queue Service (Amazon SQS) queue.
  3. Client uses the sessionId to subscribe to an onTokenReceived GraphQL subscription.
  4. The Processing Lambda (triggered by Amazon SQS) invokes Amazon Bedrock and, for each token, calls a publishToken mutation in AWS AppSync.
  5. AWS AppSync automatically pushes the token to all clients subscribed with the matching sessionId.

Implementation steps

  1. Design the GraphQL Schema: define types and operations.
    type StreamResponse {
      sessionId: String!
      status: String!
      message: String
      timestamp: String!
      error: String
    }
    
    type TokenEvent {
      sessionId: String!
      token: String!
      isComplete: Boolean!
      timestamp: String!
    }
    
    type Mutation {
      startStream(prompt: String!): StreamResponse!
      publishToken(sessionId: String!, token: String!, isComplete: Boolean!): TokenEvent!
    }
    
    type Subscription {
      onTokenReceived(sessionId: String!): TokenEvent
    

  2. Create the Request Handler (Request Lambda)
    1. Receives the GraphQL mutation with the prompt.
    2. Generates a unique session ID.
    3. Sends the prompt and session ID to the SQS queue.
    4. Returns the session ID to the client immediately.
      def lambda_handler(event, context):
          # Extract prompt from GraphQL event
          prompt = event["arguments"]["prompt"]
          
          # Generate unique session ID
          session_id = str(uuid.uuid4())
          
          # Send message to SQS queue
          sqs_client.send_message(
              QueueUrl="your-sqs-queue-url",
              MessageBody=json.dumps({
                  "prompt": prompt,
                  "sessionId": session_id
              })
          )
          
          # Return session ID to client
          return {
              "sessionId": session_id,
              "status": "streaming_started",
              "timestamp": datetime.datetime.utcnow().isoformat()
          }
      

  3. Create the Processing Handler (Processing Lambda)
    1. It is triggered by Amazon SQS messages.
    2. It calls Amazon Bedrock with streaming enabled.
    3. For each token generated, it calls the AppSync publishToken mutation.
      def lambda_handler(event, context):
          # Process SQS event records
          for record in event["Records"]:
              body = json.loads(record["body"])
              prompt = body["prompt"]
              session_id = body["sessionId"]
              
              # Call Amazon Bedrock with streaming
              response = bedrock_client.invoke_model_with_response_stream(...)
              
              # Process streaming response
              for chunk in response["body"]:
                  # Extract token from chunk
                  token = process_chunk(chunk)
                  
                  # Publish token to AppSync
                  publish_token_to_appsync(
                      session_id=session_id,
                      token=token,
                      is_complete=False
                  )
              
              # Send completion token
              publish_token_to_appsync(
                  session_id=session_id,
                  token="",
                  is_complete=True
              )
      

  4. Configure GraphQL Resolvers
    1. StartStream resolver: Connect to the Request Lambda.
    2. PublishToken resolver: Trigger subscription with a NONE data source.
  5. Client subscription setup
    1. Make a startStream mutation.
      const { sessionId } = await client.mutate({
        mutation: START_STREAM,
        variables: { prompt }
      });
      

    2. Subscribe to receive tokens.
      client.subscribe({
        query: ON_TOKEN_RECEIVED,
        variables: { sessionId }
      }).subscribe({
        next: ({ data }) => {
          if (data.onTokenReceived.isComplete) {
            // Handle completion
          } else {
            // Append token to UI
            appendToken(data.onTokenReceived.token);
          }
        }
      });
      

Authentication with Amazon Cognito

AWS AppSync integrates seamlessly with Amazon Cognito User Pools. Setting the API’s auth mode to Amazon Cognito User Pool needs a valid JWT for every GraphQL operation. This is the most developer-friendly option for authentication. AWS AppSync handles the handshake and token refresh.

Pros and cons of AWS AppSync subscriptions

Pros:

  • Fully managed real-time protocol: You don’t deal with raw WebSockets or connection IDs at all. AWS AppSync automatically establishes and maintains a secure WebSocket for subscriptions (no need for a connect or disconnect Lambda).
  • Streamlined authentication: Built-in support for Amazon Cognito User Pool tokens means that you can secure the API without writing custom authorizers.

Cons:

  • Potential overhead and complexity: For a direct case (one prompt—one stream), introducing GraphQL and AWS AppSync might be seen as over-engineering if your app doesn’t use GraphQL for other use cases.
  • 30-second resolver limit: AWS AppSync has a 30-second limit for mutation resolvers, thus you need to design the initial request to start the process and immediately return, relying on a subscription to stream the results progressively to avoid blocking the user.

Conclusion

The Amazon Bedrock streaming interface unlocks fluid, low-latency LLM experiences. You can use the right AWS serverless architecture to deliver streamed responses in a secure, scalable, and cost-effective way.

  • Lambda function URLs with streaming: Direct, single-user applications and prototypes.
  • API Gateway WebSocket: Multi-turn conversations, collaborative applications.
  • AppSync: Complex applications already using GraphQL.

Each method is serverless, production-ready, and fully integrated with Amazon Cognito for secure access control. AWS provides the flexibility to design high-quality AI user experiences at scale.

Refer to GitHub sample source code for more details.

Comparative table

Feature LAMBDA FUNCTION URLS API GATEWAY WEBSOCKET APIs APPSYNC GRAPHQL SUBSCRIPTIONS
Complexity Lowest Medium High
Real-time focus Limited Strong Strong
Authentication Needs custom logic Needs custom logic Built-in Amazon Cognito support
Scalability Good Good Excellent
GraphQL support None None Native
Use cases Q&A Chatbots, real-time apps Complex apps, multi-user scenarios
Cost Pay per invocation Connection time and Lambda execution Request/connection-based pricing

 

Building multi-tenant SaaS applications with AWS Lambda’s new tenant isolation mode

Post Syndicated from Anton Aleksandrov original https://aws.amazon.com/blogs/compute/building-multi-tenant-saas-applications-with-aws-lambdas-new-tenant-isolation-mode/

Today, AWS announced a new tenant isolation mode for AWS Lambda, that allows you to process function invocations in separate execution environments for each application end-user or tenant invoking your Lambda function. This capability simplifies building secure multi-tenant SaaS applications by managing tenant-level compute environment isolation and request routing for you. As a result, you can focus on your core business logic rather than implementing your own tenant-aware compute environment isolation.

Overview

Lambda runs your function code in secure execution environments that leverage Firecracker virtualization to provide isolation. These execution environments never share or reuse virtual resources (such as vCPU, disk, or memory) across functions, or even across different versions of the same function. However, Lambda can reuse execution environments for multiple invocations of the same function version, as these execution environments are fully set-up and can therefore deliver faster request processing for your functions.

Figure 1. Incoming invocations processed by a collection of execution environments that belong to a single function.

Figure 1. Incoming invocations processed by a collection of execution environments that belong to a single function.

Multi-tenant SaaS applications that handle sensitive tenant-specific data or execute code supplied dynamically by tenants may need a higher degree of isolation—at the individual application tenant level rather than at the function level—for secure code execution and to reduce the risk of cross-tenant data access.

Prior to today’s launch, developers would implement custom solutions, such as SDKs or application logic to manage isolation within function code. This approach was bug-prone, required more work from application development teams, and didn’t ensure isolation at the compute environment level.

Alternatively, developers adopted the approach of creating separate functions per application tenant, replicating the same code across hundreds or thousands of tenants. This approach provided stronger compute environment isolation than sharing compute environments across multiple tenants of the same function, but increased implementation overhead and operational complexity as workloads grew to support a larger number of tenants over time.

Figure 2. Using function-per-tenant model, each tenant’s requests are processed by a separate function.

Figure 2. Using function-per-tenant model, each tenant’s requests are processed by a separate function.

Starting today, AWS Lambda offers a new tenant isolation mode that lets you isolate execution environments used across different tenants of your multi-tenant SaaS applications, even when all of the tenants invoke the same function. When you enable the new tenant isolation mode, you include a tenant identifier with each function invocation. Lambda uses this identifier to route the request to the correct execution environment. As a result, each execution environment is reused only for invocations from the same tenant. This means you still get the performance benefits of warm execution environments, while ensuring that each tenant’s workloads remain isolated.

Figure 3. With the new tenant isolation capability, Lambda creates separate execution environments per tenant for a single function.

Figure 3. With the new tenant isolation capability, Lambda creates separate execution environments per tenant for a single function.

For organizations handling sensitive tenant-specific data or running untrusted code supplied dynamically by end-users, Lambda’s new tenant isolation mode provides the security benefits of per-tenant compute environment separation without the operational complexity of managing individual functions or infrastructure for each tenant.

Example scenario

Consider building a multi-tenant serverless SaaS application. To optimize performance, your function handler can retrieve tenant-specific configuration and data, cache it in memory, and reuse it for subsequent invocations from the same tenant. For example, you might cache tenant-specific database location, feature flags, or business rules that are frequently accessed during request processing. You may store this information within the application runtime process as global variables or as files in the /tmp directory. However, if the underlying execution environment is used to serve multiple tenants, this approach can potentially expose data across tenants.

With tenant isolation mode you can address this risk with much simpler architecture and configuration. This built-in capability makes Lambda an excellent choice for multi-tenant SaaS applications needing isolated compute environments for individual tenants.

Getting Started with Lambda Tenant Isolation Mode

Use the new tenancy-config parameter to configure tenant isolation mode when you create your function. You can only apply this configuration at function creation time; it cannot be updated for existing functions. The following snippet creates a function with tenancy config using the AWS CLI.

aws lambda create-function \
   --function-name my-function1 \
   --runtime nodejs22.x \
   --zip-file fileb://my-function1.zip \
   --handler index.handler \
   --role arn:aws:iam:1234567890:role/my-function-role \
   --tenancy-config '{"TenantIsolationMode": "PER_TENANT"}'

After the function is created, you must provide the tenant ID parameter with each invocation. Lambda uses this identifier to ensure that the execution environment used for a particular tenant is never reused for other tenants. For subsequent invocations from the same tenant, Lambda may reuse the execution environment to optimize performance. Specify this tenant-id parameter as illustrated below:

aws lambda invoke \
   --function-name my-function \
   --tenant-id BlueTenant \
   response.json

The new tenant-id parameter is required for functions using the tenant isolation mode. Function invocations omitting this parameter will fail with an invocation error, as shown below:

aws lambda invoke --function-name multitenant-function out.json

An error occurred (InvalidParameterValueException) when calling the Invoke operation:
The invoked function is enabled with tenancy configuration. 
Add a valid tenant ID in your request and try again.

Lambda makes the tenant ID parameter available through your function handler’s context object. This allows you to access tenant-specific information in your code, for example if you wish to implement custom logic based on the tenant identity, as shown below:

exports.handler = async function (event, context) {
   const tenantId = context.tenantId;

   // Process tenant-specific logic

   return {
      statusCode: 200,
      body: `OK for tenantId=${tenantId}`
   };
};

The following table outlines differences between Lambda functions with and without tenant isolation mode enabled:

Feature Without the new
tenant isolation mode
With the new
tenant isolation mode
Execution environment isolation Isolated per function version. Isolated per end-user or tenant invoking a function version.
Execution environment reuse Can be reused to process all invocations of a function version. Can only be reused to process invocations from the same tenant invoking a function version.
Data stored on local disk and in-memory Potentially accessible across all invocations of a function version. Potentially accessible across invocations from the same tenant. Not accessible for invocations from other tenants.
Cold starts Occur when there are no warm execution environments available to process incoming invocation. Occur when there are no tenant-specific warm execution environments available to process incoming invocation. More cold starts expected due to tenant-specific execution environments.

Integrating with Amazon API Gateway

Amazon API Gateway uses Lambda’s Invoke API to invoke Lambda functions. When using the Invoke API, Lambda expects the tenant ID parameter to be passed using the X-Amz-Tenant-Id HTTP header. You can configure API Gateway to inject this HTTP header into the Lambda invocation request with a value obtained from client request properties such as HTTP header, query parameter, or path parameter. When using Lambda Authorizers, you can obtain the value from authorization context information returned by the authorizer, such as principal ID or JWT claim. See API Gateway documentation to learn how you can return authorization information from Lambda authorizers to be used for the X-Amz-Tenant-Id header value.

Figure 4. Obtaining X-Amz-Tenant-Id header value from authentication sources.

Figure 4. Obtaining X-Amz-Tenant-Id header value from authentication sources.

The following screenshot illustrates API Gateway Lambda integration configuration, where the incoming request to API Gateway includes an x-tenant-id header that is mapped to the X-Amz-Tenant-Id request header to invoke a Lambda function using tenant isolation mode.

Figure 5. Mapping client request header to Lambda tenant-id header.

Figure 5. Mapping client request header to Lambda tenant-id header.

The following code snippet illustrates this configuration implemented with the AWS CDK.

const lambdaIntegration = new ApiGw.LambdaIntegration(fn, {
   requestParameters: {
      // This configures API Gateway to inject X-Amz-Tenant-Id header
      // into downstream requests. The header value is obtained from 
      // x-tenant-id header in the client request.
      'integration.request.header.X-Amz-Tenant-Id': 'method.request.header.x-tenant-id'
   }
});

resource.addMethod('GET', lambdaIntegration, {
   requestParameters: {
      // This enables API Gateway to use the x-tenant-id header value 
      // obtained from the client request. The header name is arbitrary.
      // you can use any other header name. 
      'method.request.header.x-tenant-id': true
   }
});

Tenant-aware observability

For functions using tenant isolation, Lambda automatically includes the tenant ID in function logs when you have JSON logging enabled, making it easier to monitor and debug tenant-specific issues. Note that the tenantId property is available during function invocation, rather than during function initialization. The tenantId property is included for both platform events (like platform.start and platform.report) and custom logs you print in your function code, as shown in the following screenshot:

Figure 6. Lambda function logs with tenantId.

Figure 6. Lambda function logs with tenantId.

Lambda creates a separate CloudWatch log stream for each execution environment. You can use CloudWatch Log Insights to find log streams that belong to a particular tenant by filtering by tenant Id:

fields @logStream, @message
| filter tenantId=='BlueTenant' or record.tenantId=='BlueTenant'
| stats count() as logCount by @logStream
| sort @timestamp desc

You can also retrieve tenant-specific logs across all log streams:

fields @message
| filter tenantId=='BlueTenant' or record.tenantId=='BlueTenant'
| limit 1000

Each log stream starts with function initialization logs followed by the invocation logs. This structure helps you to debug tenant-specific issues and understand the lifecycle of each tenant’s execution environments.

Considerations

When using the new tenant isolation for Lambda functions, consider the following:

  • Each tenant’s execution environments are isolated from other tenants so that tenant-specific data stored on disk or in memory remain separated from other tenants invoking the same Lambda function.
  • All tenants share the function’s execution role. For more fine-grained permissions for individual tenants, consider propagating tenant-scoped credentials from the upstream application components invoking your Lambda function.
  • Your application may experience higher percentage of cold starts, as Lambda processes requests in separate execution environments for each tenant invoking your functions.
  • You pay a fee for each new tenant-specific execution environment created, depending on the memory configured for your function. See Lambda pricing page for details.

Best practices

When using the new tenant isolation mode for Lambda functions, AWS recommends the following best practices:

  • Implement robust tenant ID validation at the application layer to prevent unauthorized access through tenant ID manipulation. Consider using a dedicated service or database to maintain valid tenant IDs.
  • Monitor and audit tenant access patterns regularly to detect potential security anomalies or unauthorized cross-tenant access attempts.
  • Be aware of Lambda concurrency quotas when building multi-tenant applications. You might need to request quota increases based on your tenant count and usage patterns.

Sample code

Follow the instructions in this GitHub repository to provision a sample project in your own account and see the new Lambda tenant isolation mode in action. The sample project illustrates how to integrate a function using the new tenant isolation mode with Amazon API Gateway and propagate tenant identity from client requests.

Conclusion

The new tenant isolation mode for Lambda simplifies building serverless multi-tenant SaaS applications on AWS. By automatically managing application tenant-level compute environment isolation, this capability eliminates the need for custom isolation logic or separate tenant functions, allowing you to focus on the core business logic while AWS handles the complexities of tenant-aware compute environment isolation.

Combined with the existing security features in Lambda, rapid scaling, and pay-per-use pricing, tenant isolation mode makes Lambda an even more compelling choice for modern SaaS applications, whether you’re building new solutions or enhancing existing ones.

To learn more, refer to the documentation for tenant isolation. For details on pricing, refer to Lambda’s pricing page.

Improve API discoverability with the new Amazon API Gateway Portal

Post Syndicated from Giedrius Praspaliauskas original https://aws.amazon.com/blogs/compute/improve-api-discoverability-with-the-new-amazon-api-gateway-portal/

Amazon API Gateway now provides a fully managed portal feature, Amazon API Gateway Portal, that eliminates the need for static websites, open source solutions, or third-party offerings, which often led to fragmented API lifecycle management and increased costs. API Gateway Portal integrates with the API Gateway service and offers features like API products, interactive “Try it” functionality, and documentation for your API portfolio.

This fully managed solution addresses the need for a seamless way to showcase APIs and help developers quickly find, try, and integrate with them. By providing a managed solution that handles infrastructure, security, and scalability, API providers can focus on creating valuable APIs and delivering a great developer experience.

In this post, we will show how you can use the new portal feature to create customizable portals with enhanced security features in minutes, with APIs from multiple accounts, without managing any infrastructure.

Overview

A developer portal is a web page where API providers can share their APIs and API documentation by grouping them into portal products. Each portal product is a logical grouping of REST APIs and contains the documentation that you create and publish for your API consumers. Product pages within a portal contain the custom documentation at the portal product level. Product REST endpoint pages contain the documentation for each of the REST APIs with the details of the path and method of a REST API and the stage it’s deployed to. The combination of Product pages and Product REST endpoint pages provide the complete documentation for our API consumers on how to start using your REST APIs.

This abstraction allows you to organize endpoints from multiple APIs and stages into coherent product offerings for your consumers. For example, if you operate multiple APIs supporting a pet adoption service, you can create an “AdoptAnimals” portal product that groups dog-related endpoints from one API with cat-related endpoints from another API, while organizing user management functions into a separate “AdoptProcess” portal product.

With this flexibility you can present your APIs in a way that matches your business logic rather than your technical architecture and organize your APIs in ways that make the most sense for your consumers. For large enterprises managing extensive API portfolios, API Gateway Portal offers centralized catalogs of APIs across business groups, reducing duplicate work and improving standardization.

The portal feature automatically creates developer portals that display APIs with documentation, interactive testing capabilities, and integrated consumer analytics. The platform uses AWS Resource Access Manager (RAM) for multi-account API sharing, Amazon Cognito for access control, and Amazon CloudWatch for centralized monitoring.

Key features of API Gateway Portal

The API Gateway Portal provides comprehensive functionality for both API providers and consumers.

The following is a list of the key features that were introduced by the service at launch:

Customizable portal experience: You control your portal’s branding through custom logos and color schemes. You can configure custom domain names with SSL certificates managed by AWS Certificate Manager, or use the default domain structure provided by AWS.

Flexible access control: Access to developer portals can be controlled using Amazon Cognito, you can configure portals to be either publicly accessible or require authentication. Integration with Cognito user pools provides secure and scalable identity and access management that is enterprise-grade, cost-effective, and customizable. For organizations using existing identity systems, Cognito supports federation with SAML and OpenID Connect identity providers.

Cross-account API organization: The portal supports sharing portal products across AWS accounts using AWS RAM, so that organizations can create a unified API catalog while maintaining flexibility for API providers to develop and maintain APIs in their own accounts. When you share a portal product with another account, that account cannot modify any properties of your portal product or product endpoint pages, so API providers maintain control over their APIs while still enabling discovery across the organization. The cross-account sharing capabilities provide significant governance benefits for enterprise customers, including centralized discovery, standardization, reduced duplication, clear ownership, and controlled access.

Documentation: Beyond API reference documentation synchronized from your API definitions, you can add supplemental documentation including guides, use cases, and integration examples.

Search, discovery, and interactive API exploration: Consumers can search across your entire catalog. The portal provides intuitive customizable navigation and organization to help users find the right endpoints for their needs. Using the “Try It” functionality consumers can try APIs directly from the portal. Users can input request parameters, headers, and see live responses, reducing time-to-value for API integrations. This environment includes built-in limits for security and cost control.

Access control and governance

Amazon API Gateway Portal provides security and governance capabilities essential for production deployments.

Identity and access management: Integration with Cognito user pools provides secure and scalable identity and access management that is enterprise-grade, cost-effective, and customizable, including multi-factor authentication, password policies, and user lifecycle management.

API authorization: The portal respects existing authorization mechanisms configured on your APIs, including AWS IAM, Lambda authorizers, and Cognito user pools. Portal access doesn’t bypass your established security controls.

Cross-account governance: When sharing portal products across accounts using AWS RAM, the original API owners retain full control over their endpoints, including authorization strategies, integration configurations, and stage settings. Portal owners can use shared portal products but cannot modify the underlying API configurations.

Audit and monitoring: All portal management activities integrate with AWS CloudTrail for comprehensive audit logging. You can use Amazon CloudWatch RUM to perform real user monitoring to collect and view analytics about API consumers in near real time.

Resource limits: The service includes built-in quotas to prevent abuse, including limits on API testing rate limits, payload sizes, and integration timeouts. With these limits the “Try It” functionality cannot impact your production API performance.

Getting Started

Setting up a portal involves three main steps: creating portal products, configuring the portal, and publishing for consumer access. We will walk through those steps in more detail.

Create portal product

The following procedure shows you how to create a portal product:

  1. Navigate to the API Gateway console and select Portal products from the main navigation.
  2. Choose Create portal product and specify your portal product details including name, description, and visibility settings.
  3. Next, select the endpoints you want to include in this portal product. You can choose entire API stages or specific resources and methods, and even rename endpoints with user-friendly names for better discoverability.
  4. The system automatically imports your API documentation. You can improve the documentation with additional context, use cases, and examples later.
  5. Organize product endpoints into custom categories that reflect your business logic rather than technical implementation details.

Configure the developer portal

The following procedure shows how to create a portal.

  1. Select Developer portals in the API Gateway console navigation.
  2. Specify your portal name, description, and domain configuration.
  3. Choose between adding your prefix to the default AWS domain or configuring a custom domain name with your own SSL certificate.
  4. Configure access control by selecting authentication requirements. For internal portals, you might require Amazon Cognito authentication, while public portals can allow anonymous access to documentation.
  5. Upload your logo and select color themes to match your brand identity.
  6. Add your portal products. You can include products from your account or products shared with you from other accounts through AWS RAM. The portal provides search and filtering capabilities for consumers.

Preview and publish

Before making your portal publicly available, use the preview functionality to review the consumer experience. The preview shows exactly how your portal will appear to users, including navigation, documentation, and available API testing capabilities.

When you’re satisfied with the configuration, choose Publish portal to make it accessible to consumers. The publishing process typically completes within a few minutes, and API Gateway provides the final portal URL for distribution to your consumers.

Conclusion and next steps

The new API Gateway Portal eliminates the complexity of building and maintaining custom API documentation sites. Your developers get a professional, feature-rich experience where they can discover and try your APIs immediately. Plus, since everything stays within AWS, you get built-in security, simplified operations, and comprehensive observability through integration with services like CloudWatch and CloudTrail.

Ready to streamline your API discovery experience? Here’s how to get started:

Building an AI gateway to Amazon Bedrock with Amazon API Gateway

Post Syndicated from Thomas Natschläger original https://aws.amazon.com/blogs/architecture/building-an-ai-gateway-to-amazon-bedrock-with-amazon-api-gateway/

When building generative AI applications, enterprises need to govern foundation model usage through authorization, quota management, tenant isolation, and cost control. To meet these needs, Dynatrace developed a robust AI gateway architecture that has evolved into a reusable reference pattern for organizations looking to control access to Amazon Bedrock services at scale.

This pattern uses Amazon API Gateway as the access layer in front of Amazon Bedrock. It supports key capabilities such as request authorization with seamless integration into existing identity systems (for example, JWT validation), usage quotas and request throttling, lifecycle management, canary releases, and AWS WAF integration. The gateway also uses Amazon API Gateway response streaming, launched today, for real-time delivery of API model outputs that stream to users as they are generated. The complete solution code is available in our GitHub repository.

In this blog post, you’ll explore the underlying architecture, learn how to deploy and configure the solution, and discover further enhancement ideas.

Architecture of the AI gateway

The reference architecture gives you granular control over LLM access using fully managed AWS services. It is transparent to client applications and seamlessly integrates into existing enterprise environments.


Figure 1. Reference architecture of the AI gateway.

The solution consists of four core components:

  1. Amazon Route 53 (optional) manages custom domain routing, allowing clients to access the gateway through a company-specific endpoint instead of the default Amazon API Gateway domain.
  2. Amazon API Gateway serves as the entry point for the requests and provides capabilities like authorization, request throttling, and lifecycle management.
  3. AWS Lambda authorizer handles request authorization, which in the Dynatrace implementation involves validating JWT tokens with existing authentication systems. For your specific implementation, you can implement your own authorization logic in a Lambda authorizer, integrate with Amazon Cognito user pools, or use other API Gateway authorization mechanisms.
  4. Lambda integration is a dynamic request forwarder that signs incoming requests with AWS credentials and routes them to the appropriate Amazon Bedrock endpoints. The function preserves the original request details, including the API action and parameters, to support current or future Amazon Bedrock APIs without code changes. The complete implementation is available in the integration Lambda function.
  5. Amazon Bedrock provides access to foundation models and AI capabilities.

The benefit of this architecture is the transparency to client applications and future-proof design. Clients can use AWS SDKs (like Boto3) to access Amazon Bedrock functionalities (such as LLMs and Knowledge Bases) exactly as they would when calling the Amazon Bedrock API directly. Meanwhile, the AI gateway handles authorization, quota management, and other capabilities behind the scenes.

When a client makes an Amazon Bedrock API call to the AI gateway endpoint, the Lambda integration function:

  1. Captures the original request with its details (headers, body, and parameters).
  2. Applies AWS Signature Version 4 authentication.
  3. Forwards the request to the correct Amazon Bedrock service endpoint.

With this approach the AI gateway can support current and new Amazon Bedrock features without requiring specific API knowledge or code updates, reducing gateway maintenance as the available features grow.

Deploying with AWS CloudFormation

This walkthrough will deploy a private AI gateway with authorization disabled for initial testing. You’ll create the core infrastructure (API Gateway, Lambda functions, and VPC endpoints) and then test basic functionality before optionally adding security features.

The quickest way to deploy this solution is with AWS CloudFormation:

  1. Sign in as an administrator to the AWS Management Console and use the navigation bar to select your desired AWS Region for deployment.
  2. Choose the following Launch Stack button:

launch stack button

  1. In the Quick create stack page, configure the key parameters as follows. For complete parameter descriptions, see the documentation.
Parameter Description Choose value Why
EndpointType API Gateway endpoint accessibility (PRIVATE or REGIONAL) PRIVATE Secure internal access only
EnableAuthorizer Enable Lambda Authorizer for API Gateway false Start without auth for simpler testing
CustomDomain Custom domain name for API Gateway (leave empty) Use default domain initially
HostedZoneId Route 53 Hosted Zone ID for custom domain SSL validation (leave empty) Not needed with default domain
  1. Select the capability I acknowledge that AWS CloudFormation might create IAM resources.
  2. Leave all other configurations at their default values and choose Create Stack.
  3. In the stack page, wait until the Status of the stack transitions to CREATE_COMPLETE.
  4. Choose Outputs and copy the values for GatewayUrl, VpcId, and ApiId – you’ll use these to test your gateway later.

Testing the deployment

Your gateway is now running privately inside its VPC, but that means you can’t reach it from the outside. You’ll now create an AWS CloudShell environment inside the VPC to test the gateway:

  1. Open the CloudShell console page, choose the + icon and then choose Create VPC environment.
  2. On the Create a VPC environment page, configure:
    1. Name: for example, AIGatewayTest
    2. Virtual Private Cloud (VPC): the VpcId you copied earlier
    3. Subnet: any available subnet
    4. Security group: the default VPC security group
  3. Choose Create to create your VPC environment.

Once your CloudShell environment is ready, you’ll create a client that properly routes requests through your private API endpoint.First, execute this command in CloudShell to create a reusable client factory that routes requests through your gateway while maintaining the standard boto3 interface:

cat > boto3_client_factory.py << 'EOF'
import boto3
from botocore import UNSIGNED
from botocore.client import BaseClient
from botocore.config import Config

class Boto3ClientFactory:
    """Utility class for creating boto3 clients."""
    
    @classmethod
    def create(cls, service_name: str, endpoint_url: str, jwt_token: str = None) -> BaseClient:
        """Create a boto3 client instance usable with the sign-and-forward Lambda.
        
        Parameters
        ----------
        service_name : str
            The service name to be used when initiating boto3.client.
        endpoint_url: str
            URL pointing to API gateway invoke endpoint.
        jwt_token: str, optional
            JWT token to include in Authorization header for all requests.
            If provided, adds 'Authorization: Bearer {jwt_token}' header.
        """
        generic_client = boto3.client(
            service_name = service_name,
            endpoint_url = endpoint_url,
            # do not sign the request at the client side; authentication is done
            # in API gateway
            config = Config(signature_version = UNSIGNED),
            # non-None Region_name needs to be passed due to validation logic
            # it is NOT used if we pass endpoint_url above
            region_name="",
        )
        
        def add_client_headers(model, params, **kwargs):
            """Hook to add custom headers each request."""
            headers = params["headers"]
            
            # the header "aws-endpoint-prefix" is used by the Lambda integration
            headers["aws-endpoint-prefix"] = model.service_model.endpoint_prefix
            
            # Add Authorization header if jwt_token is provided
            if jwt_token:
                headers["Authorization"] = f"Bearer {jwt_token}"
        
        # register the hook to add custom headers before each call
        generic_client.meta.events.register("before-call.*.*", add_client_headers)
        
        return generic_client
EOF

With the factory in place, you can now create clients for different Amazon Bedrock services. Set your configuration variables:

# Replace with your actual Gateway URL from CloudFormation Outputs (used in all examples)
export GATEWAY_URL="https://your-api-id.execute-api.region.amazonaws.com/v1"
# Replace with one of your pre-existing Amazon Bedrock Knowledge Bases ID (optional, only needed for knowledge base example)
export KB_ID="your-kb-id"

Test model inference with Amazon Bedrock ConverseStream API:

cat > test_converse_stream.py << 'EOF'
import os
from boto3_client_factory import Boto3ClientFactory
import json

# Get configuration from environment variables
api_gateway_url = os.environ['GATEWAY_URL']

# Create client for Bedrock Runtime (model inference)
bedrock_runtime_client = Boto3ClientFactory.create(
    service_name = "bedrock-runtime",
    endpoint_url = api_gateway_url
)

response = bedrock_runtime_client.converse_stream(
    modelId = 'global.anthropic.claude-haiku-4-5-20251001-v1:0',
    messages = [{"role": "user", "content": [{"text": "Who invented the airplane?"}]}]
)

print("Model Response:")
# Stream the response as it arrives
for event in response['stream']:
    if 'contentBlockDelta' in event:
        delta = event['contentBlockDelta']['delta']
        if 'text' in delta:
            print(delta['text'], end='', flush=True)  # Print each chunk immediately
    elif 'messageStop' in event:
        print("\n")  # End of stream
        break
EOF

python test_converse_stream.py

Test retrieval from Amazon Bedrock Knowledge Bases:

cat > test_knowledge_base.py << 'EOF'
import os
from boto3_client_factory import Boto3ClientFactory

# Get configuration from environment variables
api_gateway_url = os.environ['GATEWAY_URL']
knowledge_base_id = os.environ['KB_ID']

# Create client for Bedrock Agent Runtime (knowledge bases)
bedrock_kb_client = Boto3ClientFactory.create(
    service_name = "bedrock-agent-runtime",
    endpoint_url = api_gateway_url
)

response = bedrock_kb_client.retrieve(
    knowledgeBaseId = knowledge_base_id,
    retrievalQuery = {'text': 'Who invented the airplane?'},
    retrievalConfiguration = {
        'vectorSearchConfiguration': {
            'numberOfResults': 5
        }
    }
)

print("Knowledge base retrieval results:")
print(response)
EOF

python test_knowledge_base.py

Configuring authorization

After testing the basic functionality, you can now enable authorization by updating your deployed stack with custom authorization logic. For example, to implement JWT validation in a Lambda authorizer:

  1. Open the CloudFormation template file with your favorite text editor and replace the following placeholder code in the Lambda authorizer with your own authorization logic (see examples):
def lambda_handler(event, context):
    try:
        # Placeholder for JWT validation
        # Implement your authorization logic
        token = event['authorizationToken']
        
        # By default, deny all requests
        return generate_policy('user', 'Deny', event['methodArn'])
    except Exception as e:
        return generate_policy('user', 'Deny', event['methodArn'])
  1. Update the CloudFormation stack to enable your new authorization logic:
    1. Go back to the CloudFormation console
    2. Select your bedrock-llm-gateway stack, choose Update stack, and choose Make a direct update
    3. Choose Replace existing template, upload your modified template file, and choose Next
    4. In the parameters section, change EnableAuthorizer from false to true and choose Next
    5. Select the capability I acknowledge that AWS CloudFormation might create IAM resources.
    6. Choose Next and choose Submit
  2. Once the CloudFormation stack update is complete, deploy your changes to the API Gateway. API Gateway requires an explicit deployment step to activate configuration changes. While CloudFormation updates the API Gateway resources, the changes won’t be active until you create a new deployment to push them to the stage.
    1. Navigate to the API Gateway console
    2. Choose your API (find it using the ApiId from your CloudFormation outputs)
    3. Choose Deploy API
    4. For Stage select v1, and choose Deploy
  3. Test your authorization by going back to CloudShell and running this example. This example passes a JWT token to test the authorization – replace it with your actual token or other authorization parameters you configured in the Lambda authorizer.
cat > test_with_auth.py << 'EOF'
import os
from boto3_client_factory import Boto3ClientFactory

# Get configuration from environment variables
api_gateway_url = os.environ['GATEWAY_URL']

# Replace "your-jwt-token" with your actual JWT token
jwt_token = "your-jwt-token"

bedrock_runtime_client_with_jwt = Boto3ClientFactory.create(
    service_name = "bedrock-runtime",
    endpoint_url = api_gateway_url,
    jwt_token = jwt_token
)

response = bedrock_runtime_client_with_jwt.converse_stream(
    modelId = 'global.anthropic.claude-haiku-4-5-20251001-v1:0',
    messages = [{"role": "user", "content": [{"text": "Who invented the airplane?"}]}]
)

print("Model Response:")
# Stream the response as it arrives
for event in response['stream']:
    if 'contentBlockDelta' in event:
        delta = event['contentBlockDelta']['delta']
        if 'text' in delta:
            print(delta['text'], end='', flush=True)  # Print each chunk immediately
    elif 'messageStop' in event:
        print("\n")  # End of stream
        break
EOF

python test_with_auth.py

Enhancement options for the AI gateway

The solution can be enhanced using additional API Gateway capabilities. Here are some examples:

  • Rate limiting and throttling: Control request rates using usage plans and API keys. This is especially important in multi-tenant SaaS applications to avoid noisy neighbor problems. For examples of throttling scenarios, see the throttling documentation.
  • Private or edge-optimized endpoints: Configure endpoint types to optimize for internal access or global performance.
  • Lifecycle management and canary releases: Manage multiple API versions and implement gradual rollouts with stage variables and canary deployments.
  • WAF integration: Add AWS WAF rules to help protect from common exploits.
  • Prompt and response caching: Implement caching strategies to reduce costs and improve response times for frequently requested prompts using API Gateway caching.
  • Content filtering: In addition to the safeguards offered by Amazon Bedrock Guardrails, add custom filtering in the Lambda integration layer to screen for sensitive content such as personally identifiable information (PII).

For more information about these capabilities, visit the API Gateway features page.

Conclusion

The AI gateway pattern demonstrated in this post provides a scalable way to manage access to foundation models and agent tools through Amazon Bedrock. Initially developed and implemented by Dynatrace to serve their global user base, this pattern has proven its effectiveness at enterprise-scale. By using the Amazon API Gateway enterprise features organizations can implement necessary controls while maintaining the benefits of serverless architecture.

To start using this solution today, follow the walkthrough in this blog post or check out our GitHub repository. To learn more about the services used in this solution, explore the Amazon API Gateway features page, or visit the documentation for Amazon API Gateway and Amazon Bedrock. To learn how this solution streams foundation model responses, see the documentation for the new Amazon API Gateway response streaming capability.


About the authors

Building responsive APIs with Amazon API Gateway response streaming

Post Syndicated from Anton Aleksandrov original https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/

Today, AWS announced support for response streaming in Amazon API Gateway to significantly improve the responsiveness of your REST APIs by progressively streaming response payloads back to the client. With this new capability, you can use streamed responses to enhance user experience when building LLM-driven applications (such as AI agents and chatbots), improve time-to-first-byte (TTFB) performance for web and mobile applications, stream large files, and perform long-running operations while reporting incremental progress using protocols such as server-sent events (SSE).

In this post you will learn about this new capability, the challenges it addresses, and how to use response streaming to improve the responsiveness of your applications.

Overview

Consider this scenario – you’re running an AI-powered agentic application that uses an Amazon Bedrock foundation model. Your users interact with the application through an API, asking complex questions that require detailed responses. Before response streaming, users would send their prompts and wait to eventually receive the application response, sometimes for tens of seconds. This awkward pause between questions and responses created a disconnected, unnatural experience.

With the new API Gateway response streaming capability, the interaction through the API becomes much more fluid and natural. As soon as your application starts processing the model response, you can stream it back to your users using the API Gateway.

The following animation illustrates this significant user experience improvement. The prompt on the left is processed using a non-streaming response with user having to wait for several seconds to receive the result. The prompt on the right is using the new API Gateway response streaming, significantly reducing TTFB and improving user experience.

Figure 1. Comparing user experience before (left) and after (right) enabling API Gateway response streaming when returning a response from a Bedrock foundational model.

Your users can now see AI responses appear in real-time, word by word, just like watching someone type. This immediate feedback makes your applications feel more responsive and engaging, keeping users connected throughout the interaction. In addition, you don’t have to worry about response size limits or implement complex workarounds – the streaming happens automatically and efficiently, letting you focus on building great user experiences rather than managing infrastructure constraints.

Understanding response steaming

In the traditional request-response model, responses must be fully computed before being sent to the client. This can negatively impact user experience – the client must wait for the complete response to be generated on the server-side and transmitted over-the-wire. This is especially pronounced in interactive, latency-sensitive cloud applications such as AI agents, chatbots, virtual assistants, or music generators.

Figure 2. Response is returned to the client only after it’s been fully generated, increasing time-to-first-byte latency.

Another important scenario is returning larger response payloads, such as images, large documents, or datasets. In some cases, these payloads may exceed the 10 MB response size limit or default integration timeout limit of 29 seconds of API Gateway. Before the launch of response streaming, developers worked around these limitations by using pre-signed Amazon S3 URLs to download large responses or accepting lower RPS for an increase in timeout. While functional, these workarounds introduced additional latency and architectural complexity.

With response streaming support you can address these challenges. You can now update your REST APIs to return streamed responses, significantly enhancing user experience, improving TTFB performance, supporting response payload sizes to exceed 10 MB, and serving requests that can take up to 15 minutes.

Figure 3. Response streaming reduces time-to-first-byte and improves user experience.

The response streaming capability is already delivering significant performance for organizations:

“Working closely with the AWS teams to enable response streaming was instrumental in advancing our roadmap to deliver the most performant storefront experiences for our largest customers at Salesforce Commerce Cloud. Our collaboration exceeded our Core Web Vital goals; we saw our Total Blocking Time metrics drop by over 98%, which will enable our customers to drive higher revenue and conversion rates.”, says Drew Lau, Senior Director of Product Management at Salesforce.

Response streaming is supported for any HTTP-proxy integration, AWS Lambda functions (using proxy integration mode), and private integrations. To get started, configure your API integration to stream the response from your backend, as described in the following sections, and redeploy your API for changes to take effect.

Getting started with response streaming

To enable response streaming for your REST APIs, update your integration configuration to set the response transfer mode to STREAM. This enables API Gateway to start streaming the response to the client as soon as response bytes become available. When using response streaming, you can configure request timeout up to 15 minutes. For best time to first byte user experience, AWS strongly recommends your backend integration also implements response streaming.

You can enable response streaming in several different ways, as illustrated in the following snippets:

Using the API Gateway console, when creating method integrations, select Stream for the Response transfer mode.

Figure 4. Enabling response streaming in API Gateway Console.

Setting response transfer mode using the Open API spec:

paths:
  /products:
    get:
      x-amazon-apigateway-integration:
        httpMethod: "GET"
        uri: "https://example.com"
        type: "http_proxy"
        timeoutInMillis: 300000
        responseTransferMode: "STREAM"

Setting response transfer mode using infrastructure-as-code (IaC) frameworks, such as AWS CloudFormation. Note the /response-streaming-invocations Uri fragment, it tells API Gateway to use the Lambda InvokeWithResponseStreaming endpoint:

MyProxyResourceMethod:
  Type: 'AWS::ApiGateway::Method'
  Properties:
    RestApiId: !Ref LambdaSimpleProxy
    ResourceId: !Ref ProxyResource
    HttpMethod: ANY
    Integration:
      Type: AWS_PROXY
      IntegrationHttpMethod: POST
      ResponseTransferMode: STREAM
      Uri: !Sub arn:aws:apigateway:${APIGW_REGION}:lambda:path/2021-11-
           15/functions/${FN_ARN}/response-streaming-invocations

Updating response transfer mode using the AWS CLI:

aws apigw update-integration \
   --rest-api-id a1b2c2 \
   --resource-id aaa111 \
   --http-method GET \
   --patch-operations "op='replace',path='/responseTransferMode',value=STREAM" \
   --region us-west-2

Using response streaming with Lambda functions

When using Lambda functions as a downstream integration endpoint, your Lambda functions must be streaming-enabled. The API Gateway uses the InvokeWithResponseStreaming API to invoke functions, as illustrated in the following diagram, and requires Lambda proxy integration. See the API Gateway documentation for additional guidance.

Figure 5. Using API Gateway response streaming with Lambda functions for interactive AI applications.

When you use response streaming with Lambda functions, API Gateway expects the handler response stream to contain the following components (in order):

  • JSON response metadata – Must be a valid JSON object and can only contain statusCode, headers, multiValueHeaders, and cookies fields (all optional). Metadata cannot be an empty string; at a minimum it must be an empty JSON object.
  • The 8-null-byte delimiter – Lambda adds this delimiter automatically when you use the built-in awslambda.HttpResponseStream.from() method, as illustrated below. When not using this method, you’re responsible for adding the delimiter yourself.
  • Response payload – Can be empty.

The following code snippet illustrates how you can return a streamed response from your Lambda functions so it will be compatible with API Gateway response streaming:

export const handler = awslambda.streamifyResponse(
   async (event, responseStream, context) => {

      const httpResponseMetadata = {
         statusCode: 200,
         headers: {
            'Content-Type': 'text/plain',
            'X-Custom-Header': 'some-value'
         }
      };

      responseStream = awslambda.HttpResponseStream.from(
         responseStream,
         httpResponseMetadata
      );

      responseStream.write('hello');
      await new Promise(r => setTimeout(r, 1000));
      responseStream.write(' world');
      await new Promise(r => setTimeout(r, 1000));
      responseStream.write('!!!');
      responseStream.end();
   }
);

Refer to the API Gateway documentation for further implementation guidelines.

Using response streaming with HTTP Proxy integrations

You can stream HTTP responses from your applications used as downstream integration endpoints, for example web servers running on Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). In this case, you must use HTTP_PROXY integration and specify the response transfer mode as STREAM (using the console, AWS CLI, or IaC). Redeploy your API after modifying it.

Figure 6. Using API Gateway response streaming with HTTP server applications.

Once API Gateway receives a streaming response from your application, it will wait until the HTTP headers block transfer is complete. Then, it will send to the client an HTTP response status code and headers, followed by the content from your application as it gets received by the API Gateway service. It will continue streaming response from your application to the client until the stream ends (up to 15 minutes).

Many popular API and web application development frameworks provide response streaming abstractions. The following code snippet illustrates how you can implement HTTP response streaming using FastAPI:

app = FastAPI()

async def stream_response():
   yield b"Hello "
   await asyncio.sleep(1)
   yield b"World "
   await asyncio.sleep(1)
   yield b"!"

@app.get("/")
async def main():
   return StreamingResponse(stream_response(), media_type="text/plain")

Adding real-time response streaming to your HTTP clients

Different HTTP clients have different ways to process streamed response fragments as they arrive. The following code snippet illustrates how to process a streamed response with a Node.js application:

const request = http.request(options, (response)=>{
   response.on('data', (chunk) => {
      console.log(chunk);
   });

   response.on('end', () => {
      console.log('Response complete’);
   });
});

request.end();

When using CURL, you can use the –no-buffer argument to print response fragments as they arrive.

curl --no-buffer {URL}

Sample code

Clone this sample project from GitHub to see API Gateway response streaming in action. Follow instructions in the README.md to provision the sample project in your AWS account.

Considerations

Before you enable response streaming, consider:

  • Response streaming is available for REST APIs and can be used with HTTP_PROXY integrations, Lambda integrations (in proxy mode), and private integrations.
  • You can use API Gateway response streaming with any endpoint type, such as Regional, Private, and Edge-optimized, with or without custom domain names.
  • When using response streaming, you can configure response timeouts up to 15 minutes, according to your scenario requirements.
  • All streaming responses from Regional or Private endpoints are subject to a 5-minute idle timeout. All streaming responses from edge-optimized endpoints are subject to a 30-second idle timeout.
  • Within each streaming response, the first 10MB of response payload is not subject to any bandwidth restrictions. Response payload data exceeding 10MB is restricted to 2MB/s.
  • Response streaming is compatible with API Gateway security capabilities such as authorizers, WAF, access controls, TLS/mTLS, request throttling, and access logging.
  • When processing streamed responses, the following features are not supported: response transformation with VTL, integration response caching, and content encoding.
  • Always protect your APIs against unauthorized access and other potential security threats by implementing proper authorization with Lambda Authorizers or Amazon Cognito User Pools. Read REST API protection documentation and API Gateway security documentation for additional details.

Observability

You can continue using existing observability capabilities, such as execution logs, access logs, AWS X-Ray integration, and Amazon CloudWatch metrics with API Gateway response streaming.

In addition to the existing access logs variables, the following new variables are available:

  • $content.integration.responseTransferMode – the response transfer mode of your integration. This can be either BUFFERED or STREAMED.
  • $context.integration.timeToAllHeaders – the time between when API Gateway establishes the integration connection to when it receives all integration response headers from the client.
  • $context.integration.timeToFirstContent – the time between when API Gateway establishes the integration connection to when it receives the first content bytes.

See API Gateway documentation for more information.

Pricing

With this new capability, you continue to pay the same API Invoke rates for streamed responses. Each 10MB of response data, rounded up to the nearest 10MB, is billed as a single request. See API Gateway pricing page for additional details.

Conclusion

The new response streaming capability for Amazon API Gateway enhances how you can build and deliver responsive APIs in the cloud. With immediate streaming of response data as it becomes available, you can significantly improve time-to-first-byte performance and overcome traditional payload size and timeout limitations. This is particularly valuable for AI-powered applications, file transfers, and interactive web experiences that demand real-time responsiveness.

To learn more about API Gateway response streaming see the service documentation.

To learn more about building Serverless architectures see Serverless Land.

BASF Digital Farming builds a STAC-based solution on Amazon EKS

Post Syndicated from Kevin S. Ridolfi original https://aws.amazon.com/blogs/architecture/basf-digital-farming-builds-a-stac-based-solution-on-amazon-eks/

This post was co-written with Frederic Haase and Julian Blau with BASF Digital Farming GmbH.

At xarvio – BASF Digital Farming, our mission is to empower farmers around the world with cutting-edge digital agronomic decision-making tools. Central to this mission is our crop optimization platform, xarvio FIELD MANAGER, which delivers actionable insights through a range of geospatial assets, including satellite imagery, drone data, and application maps from sprayers.

In this post, we show you how we built a scalable geospatial data solution on AWS to efficiently catalog, manage, and visualize both raster and vector datasets through the web. We walk you through our solution based on the SpatioTemporal Asset Catalog (STAC) specification and the open source eoAPI ecosystem, detailing the solution architecture, key technologies, and lessons learned during deployment. This builds upon a previous post on efficient satellite imagery ingestion using AWS Serverless, extending our discussion to the full lifecycle of geospatial data management at scale.

Requirements for our geospatial data solution

BASF Digital Farming’s xarvio FIELD MANAGER platform operates at exceptional scale in the geospatial data ecosystem, processing hundreds of millions of satellite images that translate into STAC items, which further decompose into billions of individual geospatial artifacts. Unlike traditional satellite data providers such as European Space Agency (ESA) who work with predictable, structured data flows, we operate in an inherently dynamic agricultural environment where we ingest near-daily satellite imagery per field from a diverse array of sensors and providers globally. Our mission to support farmers worldwide with advanced digital agronomic decision advice demands a reliable, cloud-based infrastructure capable of handling this massive data velocity and volume and applying advanced quality assurance processes including cloud detection and anomaly detection algorithms. The platform’s true value emerges through our machine learning (ML) pipelines that transform raw satellite data into actionable insights. For example, estimating accurate absolute biomass such as Leaf Area Index (LAI) helps farmers make precise, data-driven agronomic decisions that optimize crop yield and resource utilization across fields worldwide.

STAC and eoAPI ecosystem

To efficiently manage our growing archive of geospatial data, we adopted the Spatio Temporal Asset Catalog (STAC) specification, an open standard that provides a common language to describe and catalog raster and vector datasets. With STAC, we can standardize metadata across diverse sources like satellite imagery, UAV datasets, and prescription maps, making it straightforward to search, filter, and retrieve assets across our platform. We built our platform using the eoAPI ecosystem, an integrated suite of open source tools designed to handle the full lifecycle of geospatial data on the cloud. At its core is pgSTAC, which provides a performant PostGIS-backed STAC API implementation. With pgSTAC, we can index millions of STACi Items efficiently, with support for spatial, temporal, and attribute-based filtering at scale. On top of that, we use Tiles in PostGIS (TiPG) to serve tiled vector data directly from our PostGIS database. This enables real-time visualization of field boundaries, management zones, and application histories as lightweight Mapbox Vector Tiles (MVT), without requiring an external tile server. For raster assets, including satellite and drone imagery, we rely on TiTiler, a modern dynamic tile server built for Cloud Optimized GeoTIFFs (COGs). With TiTiler, we can stream imagery on-demand as WMTS or XYZ tiles, perform dynamic rendering (such as NDVI or false color composites), and integrate seamlessly into web maps and mobile apps.

Solution overview

The following architecture diagram shows how we implemented our geospatial data platform on AWS. In this section, we explain each component of the architecture and how they work together to process millions of satellite images and geospatial assets daily. The solution uses Amazon Elastic Kubernetes Service (Amazon EKS) as the core computing platform, with Amazon Simple Storage Service (Amazon S3) for storage and Amazon Relational Database Service (Amazon RDS) for metadata management. We break down the architecture into four main layers: core services, storage, database, and ingestion.

A detailed AWS Cloud architecture visualization showcasing a complete geospatial data processing system across four distinct layers. The database layer features an EKS Cluster managing STAC, raster, and vector services, all connected to Amazon RDS through a proxy instance. The client layer supports both desktop and mobile access via Amazon API Gateway. The ingestion layer processes geospatial data streams through a STAC ingestor, feeding into a robust storage layer utilizing Cloud Optimized GeoTIFF and FlatGeobuf technologies. The architecture emphasizes scalability and efficient spatial data handling through PostgreSQL with pgstac extension, enabling seamless integration of various geospatial services and data formats.

Core services layer

The solution uses an EKS cluster hosting three key services:

  • stac-service – Implements the STAC API specification to catalog and serve metadata for both raster and vector datasets
  • raster-service – Powered by TiTiler, this service dynamically renders and tiles cloud-optimized raster data (for example, COGs) for seamless integration into web and mobile maps
  • vector-service – Built with TiPG, this component serves vector data (for example, boundaries or application zones) as tiled MVT layers directly from the database or from Amazon S3

These services are containerized and orchestrated within Kubernetes, allowing for high availability, modular separation, and simplified continuous integration and delivery (CI/CD) workflows.

KEDA-based automatic scaling

We use Kubernetes Event-Driven Autoscaling (KEDA) to scale our platform services dynamically based on real-time workloads. With KEDA, we can scale individual pods based on precise event-driven metrics such as the STAC ingestion queue depth or visualization request load. This supports responsive performance during peak activity while maintaining lean resource usage during idle periods, aligning perfectly with our need for elasticity in a data-intensive, variable-load environment.

Geospatial asset storage layer

The platform stores all raw and processed geospatial assets in S3 buckets, optimized for performance and durability. This layer holds COGs for raster imagery and FlatGeobuf or similar formats for vector data. These formats are chosen for their support of streaming access, indexing, and cloud-based performance.

Database layer

The metadata backbone of the system is a PostgreSQL database hosted on Amazon RDS, extended with the pgSTAC plugin. This setup enables efficient indexing and querying of millions of STAC items and collections. An RDS proxy sits in front of the database, providing connection pooling and resiliency, especially under bursty or concurrent access patterns common in geospatial applications.

Ingestion layer

An independent ingestion component handles batch or streaming geospatial data inputs. This component processes satellite imagery, drone data, or prescription maps and pushes relevant metadata into the STAC API and storage assets into Amazon S3. The ingestion engine is decoupled from serving infrastructure, enabling asynchronous and large-scale data loading.

Amazon API Gateway and clients

Public access to the platform is handled through Amazon API Gateway, allowing clients—whether browser-based or mobile—to interact securely with the services. The API gateway provides a unified entrypoint and is used for applying rate limiting, authorization, and routing policies.

Solution benefits

The solution offers the following benefits:

  • Rapid onboarding with STAC standardization – By aligning with the STAC specification, we’ve significantly reduced the time to onboard new data domains like sprayer application maps. Compared to previous approaches in our legacy system, metadata modeling and integration are now both standardized and automated, so we can expose new geospatial data products to clients in days instead of weeks or months.
  • Optimized storage with COGs and Amazon S3 – Storing raster and vector assets in Amazon S3 using cloud-optimized formats (such as COGs for imagery or FlatGeobuff for vectors) reduces storage costs while enabling low-latency, streaming access. This avoids the need for preprocessing or extract, transform, and load (ETL)-heavy pipelines and simplifies client delivery.
  • Large-scale ingestion with a batch STAC ingestor – Our custom STAC ingestor supports both real-time and batch-mode operations. This has made it possible to onboard satellite constellations, drone imagery, and historical datasets in bulk without disrupting running services. The ingestion service uses optimized database ingestion functions, capable of ingesting thousands of items per second, providing high-throughput and reliable data integration at scale.
  • PostgreSQL, pgSTAC, and Amazon RDS Proxy for a scalable metadata backbone – With pgSTAC and Amazon RDS Proxy, we benefit from advanced spatial-temporal querying while making sure database connection management is handled gracefully, even under high concurrency. This combination offers reliability without compromising performance.
  • Scalable deployment with Amazon EKS – Hosting the solution on Amazon EKS provides full control over deployments, resource tuning, and service orchestration. Combined with automatic scaling, we dynamically adjust compute capacity based on demand, facilitating resilience and cost-efficiency.

Learnings

As part of building this solution, we learned the following:

  • RDS Proxy is essential for automatically scaled environments – Given our use of automatic scaling pods in Amazon EKS, we found that RDS Proxy is critical. It handles connection pooling efficiently and protects the underlying PostgreSQL database from connection exhaustion during sudden scale-up events. Without it, we encountered spiky load failures and blocked connections during high-ingest periods.
  • Batch STAC ingestor is a core component – Our custom STAC ingestor proved to be an indispensable piece of the system. It interfaces directly with pgSTAC to perform large-scale, automated ingestions of geospatial metadata from streams and archives. Without this tool, onboarding data providers or processing legacy imagery at scale would have been labor-intensive and error-prone.
  • COGs are non-negotiable – For fast, scalable visualization of large raster datasets, COGs are essential, particularly if raster datasets exceed several gigabytes. They enable efficient HTTP range requests, alleviate the need for preprocessing, and work seamlessly with TiTiler for real-time tile rendering. Non-COG formats led to noticeably slower performance and weren’t suitable for cloud-based visualization.
  • Serverless-compliant, optimized for Amazon EKS (for now) – Although the architecture is designed to be serverless-compatible, we opted for an Amazon EKS first approach due to the nature of our other application landscape. Components like TiTiler and TiPG benefit from persistent, memory-tuned environments that are harder to achieve in a serverless runtime. However, the solution remains modular and stateless by design, and certain subsystems (such as ingestion triggers, notifications, or monitoring) are already candidates for future serverless migration to further improve elasticity and reduce operational overhead.

Conclusion

BASF Digital Farming GmbH has successfully implemented a STAC-based geospatial data platform on Amazon EKS, enabling efficient management and visualization of satellite imagery, drone data, and application maps. This architecture helps us onboard new data sources within weeks rather than months. The new platform also processes twice as much data in a single day while cutting costs by 50%, thanks to reduced data handling through the STAC schema and the efficiencies of automatic scaling. By adopting the STAC standard, the architecture improves data discoverability, reduces search latency, and supports more efficient analytic workflows.

Organizations looking to build similar geospatial data solutions can use AWS services like Amazon EKS, Amazon S3, and Amazon RDS along with open source tools like STAC and eoAPI to create scalable, cost-effective solutions. Learn more about building containerized applications on AWS at Containers on AWS.

Zero downtime blue/green deployments with Amazon API Gateway

Post Syndicated from Biswanath Mukherjee original https://aws.amazon.com/blogs/compute/zero-downtime-blue-green-deployments-with-amazon-api-gateway/

Modern applications require deployment strategies that minimize downtime and reduce risk. Blue/green deployment is a strategy that reduces downtime and risk by running two identical production environments called “blue” and “green”. At any given time, only one environment serves live production traffic while the other remains idle. This strategy provides immediate fallback to the previous stable version if issues arise after the new deployment. Let’s say a company is deploying a new version of its application. The following diagram shows the blue/green deployment strategy they are using.

Complete blue/green deployment strategy, including testing, monitoring, and conditional rollback procedures

As shown in the preceding diagram, current production traffic is served by the blue environment. During deployment of the new version of the application, the following sequence of activities happens:

  1. Deploy the new version of the application: Deploy the new version to the green environment.
  2. Test thoroughly: Test the new version in the green environment by invoking the API invoke URL from API Gateway. This does not affect production traffic, which continues to be served by the blue environment.
  3. Switch traffic: After you have thoroughly tested the green environment, redirect all production traffic from the blue to the green environment.
  4. Monitor and take any necessary action: Continue to monitor the green environment as it serves production traffic. Keep the blue environment ready for immediate rollback to the previous stable version, in case any issues arise in the green environment. At this point, one of two possible outcomes happens:
    1. If you identify any issues with the green environment in production, roll back to the previous stable version of the blue environment and fix the green environment before retrying.
    2. If you observe no issues with the green environment, decommission the blue environment. The green environment is now the new blue environment, the environment with stable production version of the application serving live traffic.

In this post, you learn how to implement blue/green deployments by using Amazon API Gateway for your APIs. For this post, we use AWS Lambda functions on the backend. However, you can follow the same strategy for other backend implementations of the APIs. All the required infrastructure is deployed by using AWS Serverless Application Model (AWS SAM).

Solution overview

As you follow along with this post, you will implement the following blue/green deployment architecture by using API Gateway custom domain API mapping. You use API mappings to connect API stages to a custom domain name. After you create a domain name and configure DNS records, you use API mappings to send traffic to your APIs through your custom domain name.

AWS serverless architecture showing blue /green deployment using Amazon API Gateway custom domain

As shown in the preceding diagram, the blue/green deployment architecture includes four primary AWS services – Amazon Route 53, Amazon API Gateway, AWS Lambda functions, and AWS Certificate Manager (ACM). When you send a request to the API, the Route 53 resolves the domain to the API Gateway custom domain. API Gateway handles HTTPS termination by using a configured ACM certificate. API Gateway examines the incoming request path and headers to route the request to the active environment.

You first set up the blue environment along with the Route 53 DNS configuration, API Gateway custom domain mapping, ACM and test it with Route 53 URL. This is your production environment serving live traffic. After that, deploy a new version of the application in the green environment and test it by invoking the API invoke URL from API Gateway while live traffic is still being served from the blue environment. Then, switch the traffic from the blue environment to the green environment by using API Gateway custom domain API mapping. This ensures that there is no change in the external (client-facing) API custom domain URL. Live traffic is now served by the green environment. If you observe any issues during this time, you can quickly roll back to the blue environment and fix the green environment. If the green environment is stable, you can decommission the blue environment.

This post uses two separate regional API endpoints to simulate blue/green environments and traffic routing between them but you can this same architectural pattern using single API endpoint with two stages representing blue and green environments.

Prerequisites

Complete the following prerequisites before you start setting up the solution:

Set up and test the solution

Note that this is a sample project to understand the concept and not for direct usage in production. The sample project contains three APIs.

  • Health GET API to know the current health of the environment and which environment the request was served from.
  • Pets GET API to get the list of available pets.
  • Order POST API to place a pet order.

Follow the steps below to setup blue/green deployment and test it.

  1. Clone the repository and navigate to the directory (all commands run from here)
    Clone the GitHub repository in a new folder and navigate to the stacks folder.

    git clone https://github.com/aws-samples/sample-blue-green-deployment-with-api-gateway.git
    cd sample-blue-green-deployment-with-api-gateway/stacks

  2. Deploy the blue environment
    You will first setup the blue environment. Run the following commands to deploy the blue environment:

    sam build -t blue-stack.yaml
    sam deploy -g -t blue-stack.yaml

    Enter the following details:

    • Stack name: The CloudFormation stack name (for example, blue-green-api-blue)
    • AWS Region: A supported AWS Region (for example, us-east-1)
    • BlueLambdaFunction has no authentication. Is this okay?: y
    • SAM configuration file: blue-samconfig.toml

    Keep everything else set to their default values. You will use the SAM deploy output for the subsequent steps. Going forward, you can run the following command to deploy the blue environment resources:

    sam build -t blue-stack.yaml
    sam deploy -t blue-stack.yaml --config-file blue-samconfig.toml

  3. Test the blue environment

    Run the following commands to test the blue environment after replacing ApiEndpoint with BlueApiEndpoint from the SAM deploy output in each case. First, test Health check API to know the health of the environment. Run the following command.

    curl --request GET \ 
    --url https://<ApiEndpoint>/health

    Sample response:

    {
      "status": "healthy", 
      "environment": "blue", 
      "version": "v1.0.0", 
      "timestamp": "2025-09-05T13:11:11.248267Z"
    }

    Second, test the pets GET API request to get the list of available pets. Run the following command.

    curl --request GET \ 
    --url https://<ApiEndpoint>/pets

    Sample response:

    {
      "environment": "blue",
      "version": "v1.0.0",
      "pets": [
        {
          "id": 1,
          "name": "Buddy",
          "category": "dog",
          "status": "available"
        },
        {
          "id": 2,
          "name": "Whiskers",
          "category": "cat",
          "status": "available"
        },
        {
          "id": 3,
          "name": "Charlie",
          "category": "bird",
          "status": "pending"
        }
      ]
    }

    Now, test create order POST API to place an order for a pet:

    curl --request POST \
      --url https://<ApiEndpoint>/orders \
      --header 'content-type: application/json' \
      --data '{
      "id": 1,
      "name": "Buddy"
    }'

    Expected response:

    {
      "confirmationNumber": "ORD-3251C0F4", 
      "environment": "blue",
      "version": "v1.0.0",
      "timestamp": "2025-09-05T13:16:04.703666Z", 
      "data": 
      {
        "id": 1, 
        "name": "Buddy", 
        "status": "ordered"
      }
    }
    

    Check that the response contains the environment attribute set to blue and the version attribute set to v1.0.0

  4. Deploy API Gateway custom domain pointing to the blue environment

    Run the following commands to deploy Amazon API Gateway custom domain with API mapping pointing to the blue environment created in the previous step.

    sam build -t custom-domain-stack.yaml
    sam deploy -g -t custom-domain-stack.yaml

    Enter the following details:

    • Stack name: (for example, blue-green-api-custom-domain)
    • AWS Region: A supported AWS Region (for example, us-east-1)
    • PublicHostedZoneId: The Route 53 public hosted zone ID (for example, ABXXXXXXXXXXXXXXXXXYZ)
    • CustomDomainName: The Route 53 domain name (for example, api.example.com)
    • CertificateArn: The ACM certificate ARN (for example, arn:aws:acm:us-east-1:123456789012:certificate/abc123)
    • ActiveApiId: The value of the BlueApiId output from the blue-stack deployment
    • ActiveApiStage: The value of the BlueApiStage output from the blue-stack deployment
    • SAM configuration file: custom-domain-samconfig.toml

    Keep everything else to their default values. You will use the SAM deploy output for the subsequent steps. At this point the production (live) traffic is routed to blue environment.

  5. Test the production (live) traffic which is routed to blue environment

    Follow the test method mentioned in step 3 to test the production environment, but replace ApiEndpoint with CustomDomainUrl. Check that the production environment response contains the environment attribute set to blue and the version attribute set to v1.0.0

  6. Deploy the green environment

    You will now deploy the v2.0.0 of the application on green environment. Run the following commands to deploy the green environment:

    sam build -t green-stack.yaml
    sam deploy -g -t green-stack.yaml

    Enter the following details:

    • Stack name: (for example, blue-green-api-green)
    • AWS Region: Select the same Region as the blue environment (for example, us-east-1)
    • GreenLambdaFunction has no authentication. Is this okay?: y
    • SAM configuration file: green-samconfig.toml

    Keep everything else to default values. You will use the SAM deploy output for the subsequent steps. Going forward, you can run the following commands to deploy the green environment.

    sam build -t green-stack.yaml
    sam deploy -t green-stack.yaml --config-file green-samconfig.toml
    

  7. Test the green environment
    Follow the test method shown in step 3 to test the production environment, but replace ApiEndpoint with GreenApiEndpoint. Validate that the response contains the environment attribute set to green and the version attribute set to v2.0.0
  8. Switch the live traffic to the green environment
    Run the following command to switch live traffic to the green environment:

    sam deploy -g -t custom-domain-stack.yaml  --config-file custom-domain-samconfig.toml

    Enter the following details:

    • Stack name: Keep it as it is.
    • AWS Region: Keep it as it is.
    • PublicHostedZoneId: Keep it as it is.
    • CustomDomainName: Keep it as it is.
    • CertificateArn: Keep it as it is.
    • ActiveApiId: The value of GreenApiEndpoint output from the green-stack deployment
    • ActiveApiStage: The value of GreenApiStage output from the green-stack deployment
    • SAM configuration file: Keep it as it is.

    Keep everything else at their default values.

  9. Test the production environment (now green) again

    Follow the test method shown in step 3 to test the production environment (now green) again, and be sure to replace ApiEndpoint with CustomDomainUrl from SAM deploy output. Validate that the response now contains the environment attribute set to green and the version attribute set to v2.0.0. Note that it may take a minute or two for the change to be seen due to local DNS caching, during this time some of the traffic may be still served from the blue environment. After switching the live traffic to the green environment, if there is an issue, you can switch the live traffic back to the blue environment and fix the green environment. Depending on whether you need to roll back to the previous stable blue environment or decommission the blue environment when you no longer need it, follow either step 10 or 11.

  10. (Option 1) Roll back to the blue environment, if necessary
    Run the following command to roll back to the blue environment, if necessary.

    sam deploy -g -t custom-domain-stack.yaml  --config-file custom-domain-samconfig.toml

    Enter the following details:

    • Stack name: Keep it as it is.
    • AWS Region: Keep it as it is.
    • PublicHostedZoneId: Keep it as it is.
    • CustomDomainName: Keep it as it is.
    • CertificateArn: Keep it as it is.
    • ActiveApiId: Value of BlueApiEndpoint output from the blue-stack deployment.
    • ActiveApiStage: The value of BlueApiStage output from the blue-stack deployment.
    • SAM configuration file: Keep it as it is.

    Keep everything else to their default values. Live traffic is switched to the blue environment. You can confirm that by following step 3.

  11. (Option 2) Decommission the blue environment when you no longer need it
    Run the following command to decommission (delete) the blue environment when you no longer need it. Replace blue-stack-name with your blue deployment stack name:

    sam delete --stack-name <blue-stack-name> --no-prompts

Clean up

To avoid costs, remove all resources created along this post once you’re done.

Run the following command after replacing the <placeholder> variables to delete the resources you deployed for this post’s solution:

# Delete all stacks (in reverse order)
# Delete custom domain
sam delete --stack-name <api-custom-domain-stack-name> --no-prompts
# Delete green stack
sam delete --stack-name <green-stack-name> --no-prompts
# Delete blue stack, if already not deleted in step 10
sam delete --stack-name <blue-stack-name> --no-prompts

Conclusion

Blue/green deployments with API Gateway provide a robust, scalable solution for zero-downtime deployments that deliver significant technical and operational advantages. This post’s solution architecture enables traffic switching at the API Gateway level while maintaining complete isolation between the blue and green environments to prevent interference. The solution offers immediate rollback capabilities for quick issue resolution and supports comprehensive testing through production-like validation before any traffic switching occurs.

Following this solution you can build a solid foundation for production blue/green deployments for your serverless microservice architecture that maintains the flexibility to adapt to your specific requirements while ensuring reliability, scalability, and operational excellence. To learn more about serverless architectures see Serverless Land.

Modernization of real-time payment orchestration on AWS

Post Syndicated from Neeraj Kaushik original https://aws.amazon.com/blogs/architecture/modernization-of-real-time-payment-orchestration-on-aws/

The global real-time payments market is experiencing significant growth. According to Fortune Business Insights, the market was valued at USD 24.91 billion in 2024 and is projected to grow to USD 284.49 billion by 2032, with a CAGR of 35.4%. Similarly, Grand View Research reports that the global mobile payment market, valued at USD 88.50 billion in 2024, is expected to grow at a CAGR of 38.0% from 2025 to 2030. (Disclaimer: Third-party market research and statistics are provided for informational purposed only. AWS and IBM make no representations about the accuracy of this information.)

This rapid expansion underscores the urgency for financial institutions to modernize their payment processing infrastructure. Financial institutions often need to process high volume of transactions with near-zero latency to meet stringent service level agreements (SLAs) to support surging mobile payments volume.

However, traditional payment orchestration systems, often built on monolithic architectures, struggle to meet these demands due to latency, availability, and scalability challenges. Additionally, their reliance on on-premises infrastructure leads to higher costs and an impediment to innovation, reinforcing the need for modernization.

As sustainability becomes a priority, organizations are turning to cloud-based solutions to optimize infrastructure, reduce carbon footprints, and enhance energy efficiency. This shift provides scalability and performance, and aligns with global sustainability goals, securing the future of real-time payments.

In this post, we discuss the real-time payment orchestration framework. It uses an event-driven architecture and AWS serverless services to enhance the resiliency, efficiency, and scalability of real-time payments. By decomposing payment processing into distinct business capabilities, financial institutions can improve modularity and flexibility. Implementing tenant-based segregation helps with data isolation and security. Additionally, adopting asynchronous communication through Amazon Managed Streaming for Apache Kafka (Amazon MSK) enhances scalability and resilience.

Traditional real-time payment orchestration

Payment orchestration serves as a middleware solution, streamlining transaction processing across multiple payment methods, gateways, and financial institutions. It orchestrates key business functions such as payment authorization, payment processing, settlement and clearing, compliance and risk management, and account management for both inbound and outbound payment flows.

The following diagram depicts the high-level business capabilities supported by payment orchestrators across various payment flows, including real-time payments, digital disbursements, tax payments, wires, and more.

Payment processing system flowchart showing main components from acceptance to billing

Detailed flowchart depicting a payment processing system with multiple components. The diagram shows primary payment types at the top (including Realtime Payments, Digital Disbursement, Credit Transfer, and Peer to Peer Payments) flowing down through core processing stages including Payment Acceptance, Execution, Clearing, Reporting, Tracking, Reversals, and Billing.

Many financial institutions adopt a tenant-based approach organized by geography due to varying clearing processes, localized regulations, and transaction requirements across AWS Regions. However, without proper separation of services, teams often continue to add region-specific logic to existing services, gradually increasing their monolithic complexity and using the same infrastructure for all payment flows.

Traditional payment systems process transactions linearly, with each step waiting for the previous one to complete. However, analysis of payment workflows reveals numerous opportunities for parallel execution:

  • Sanctions screening and fraud detection – Compliance and fraud checks can run simultaneously with initial routing decisions, rather than sequentially blocking all subsequent processing
  • Payment routing and authorization requests – When basic validations are complete, routing and authorization can proceed in parallel rather than one after another
  • Payment execution and ledger updates – The actual payment execution doesn’t need to wait for ledger records to be updated—these can occur concurrently
  • Settlement, reconciliation, and tracking – These post-transaction processes can be initiated independently as soon as the primary transaction is complete

This parallel approach can dramatically improve throughput and reduce latency compared to traditional queue-based systems where operations form a sequential chain that extends processing time and creates bottlenecks.

Most legacy payment orchestration systems rely heavily on on-premises virtual machines (VMs), leading to several challenges:

  • Multi-Region support for disaster recovery and multi-tenancy resulting in significant capital expenditure and operational overhead
  • High latency and SLA issues caused by sequential message processing and delays between globally separated data centers
  • Limited reusability of payment flows as monolithic architectures require region-specific changes for local clearing mechanisms and regulations, increasing complexity and costs
  • Scalability challenges and high memory consumption due to inefficient resource utilization and execution of irrelevant logic across regions
  • Complex cross-border payment routing caused by variations in clearing rules, transaction limits, and local regulations, increasing latency and routing errors
  • Integration challenges with diverse data formats because legacy systems rely on proprietary standards (for example, ISO 20022, SWIFT MT), complicating data conversion and compliance
  • High deployment complexity for new payment flows due to monolithic architectures requiring extensive region-specific modifications, slowing time to market
  • Environmental impact and high carbon footprint from on-premises infrastructure consuming excessive energy, whereas cloud-based approaches improve efficiency

Solution overview

To overcome these challenges, the proposed architecture embraces the following design principles to build a future-ready, real-time payment orchestration solution:

  • Performance at scale – Handling over 1,000 transactions per second (TPS) with consistent low latency under varying load conditions.
  • High availability – Achieving 99.999% uptime to meet the strict requirements of financial transactions.
  • Geographic resilience – Supporting global operations with region-specific compliance while maintaining consistent performance.
  • Cost optimization – Reducing total cost of ownership through efficient resource utilization and serverless technologies.
  • Security and compliance – Supporting data protection and regulatory adherence across different jurisdictions.
  • Operational simplicity – Streamlining deployment, monitoring, and maintenance across the payment ecosystem.
  • Microservices – Decomposing payment processing into distinct business capabilities, so financial institutions can improve modularity and flexibility. This microservices-based approach allows for independent scaling and development of critical components.

The following diagram depicts the high-level solution architecture for real-time payments. The existing channels using synchronous or asynchronous APIs can be modified to use edge-optimized endpoints to reduce latency.

Event-driven payment orchestration system with pub/sub channels connecting multiple payment processing modules

Architecture diagram detailing an AWS-based payment orchestration platform utilizing event-driven principles. Features reusable components across two regions, with dedicated modules for payment initiation, execution, reconciliation, billing, and risk management. Implements pub/sub messaging patterns for inter-component communication and connects to enterprise systems including accounting, compliance, and analytics.

An event-driven architecture is used for payment orchestration, which handles communication through a pub/sub pattern. This architecture maintains persistent connections, improving performance of the end-to-end real-time payment processing.

The event-driven architecture for real-time payment processing allows multiple payment operations to occur simultaneously using different adaptors, as opposed to the traditional systems where payment processes are sequential and flow through a single pipeline. Payment events are distributed to specialized payment processor microservices based on their function (initiation, execution, tracking, settlements), enabling each to process independently without waiting for others to complete.

Because we’re transitioning from sequential processing to distributed, maintaining transaction traceability is crucial. The payment tracking adapters shown in the preceding diagram connect to enterprise analytics systems, creating a specialized layer for monitoring transactions. The pub/sub model allows for attaching correlation IDs to events, enabling systems to track related events across different topics and processing stages.

A standardized event schema serves as the foundation for this architecture, providing consistency across regional deployments while allowing for customization at the adapter level. This schema defines uniform event structures containing tenant-specific metadata and supports versioning to accommodate evolving requirements. By isolating region-specific variations to the adapter layer, the solution maintains core functionality while interfacing with diverse enterprise systems through configuration-driven customization rather than code changes.

For most payment processes, especially those with independent processing steps that can run in parallel, this architecture delivers net performance gains despite the topic switching overhead, particularly for complex transactions where multiple independent validations or processing steps are required.

Deployment on the AWS Cloud

The solution uses edge-optimized Amazon API Gateway for channels. An edge-optimized API endpoint routes requests to the nearest Amazon CloudFront Point of Presence (POP), which can help in cases where your clients are geographically distributed to enable efficient routing within each geographical region, enhancing global responsiveness by minimizing network round trips and making sure requests take the shortest possible path before transitioning from the public internet to the client network.

The following diagram illustrates the high-level solution architecture for real-time payments.

Multi-region AWS payment architecture with managed Kafka topics connecting Lambda microservices and DynamoDB storage

Comprehensive AWS payment orchestration solution implementing modern cloud-native architecture principles. Core processing logic implemented as Lambda functions covering initiation, execution, reconciliation, billing, tracking, risk management, and settlement workflows. Leverages Amazon MSK for reliable event streaming between components, with dedicated Kafka topics for each processing stage. Data persistence handled by Amazon DynamoDB, supporting cross-region operations. Architecture demonstrates AWS best practices for financial services, including regional redundancy, serverless computing, managed services, and event-driven design patterns. System integrates with external banking infrastructure and enterprise systems while maintaining separation of concerns through microservices architecture. Features built-in support for compliance monitoring, risk management, and payment tracking through specialized Lambda functions.

The solution uses Amazon MSK to implement an event-driven architecture that efficiently handles both inbound and outbound channels traffic through API requests and asynchronous message-based events. Amazon MSK communicates using a high-performance binary protocol between producers, consumers, and brokers, providing low latency and high throughput. Real-time payments are logically partitioned across multiple tenants within geographical regions—North America, EMEA, LATAM, and Asia-Pacific.

Each real-time payment tenant follows an active/active disaster recovery strategy by deploying MSK clusters across multiple AWS Regions, designed to achieve high availability and resilience. Amazon MSK offer both serverless and provisioned cluster options. The team can decide to select one or the other depending on the non-functional requirements and team expertise. Amazon MSK automatically manages partition leadership with leaders in primary Regions and followers in secondary Regions. During failover, leaders are re-elected in healthy Regions, designed to help maintain processing capabilities during regional incidents. Sticky partitioning uses consistent hashing for deterministic routing, and cooperative rebalancing enables efficient failover. Multi-AZ deployment provides zone redundancy and isolated clusters per Region for data sovereignty compliance through programmatic AWS Identity and Access Management (IAM) and virtual private cloud (VPC) boundaries.

To support seamless cross-Region replication and maintain message continuity, Amazon MSK Replicator—a fully managed feature of Amazon MSK—is used to replicate topics and synchronize consumer group offsets across clusters. MSK Replicator simplifies the process of building multi-Region Kafka applications by not needing custom code, open-source tool configuration, or infrastructure management. It automatically provisions and scales the necessary resources, so teams can focus on business logic while only paying for the data being replicated. In the event of a regional outage or failover, traffic can be automatically redirected to a healthy Region without data loss or service disruption, providing near-zero Recovery Time Objectives (RTOs) and uninterrupted operations for downstream services such as payment processors and audit trail consumers.

In addition to regional redundancy, the architecture uses an event-driven architecture to enable parallel and decoupled processing of payment transactions. Events such as transaction initiation, validation, and settlement are emitted asynchronously and consumed by various microservices independently, which drastically reduces end-to-end latency.

To process these events at scale, the architecture can use AWS Lambda, Amazon Elastic Container Service (Amazon ECS), or Amazon Elastic Kubernetes Service (Amazon EKS) depending upon non-functional requirements. Automatic scaling responds to Amazon CloudWatch metrics, and exponential backoff retry logic with dead-letter queues (DLQs) handles throttling scenarios. Circuit breakers prevent cascade failures during high error rates.

One of the key benefits of the solution is the reusability of payment flows across different regions. Although each region has its own unique compliance requirements and settlement rules, the core functionalities of real-time payments (payment authorization, payment processing, settlement and clearing) are largely similar. This reusability enables rapid deployment of payment solutions across new regions without rearchitecting the entire system. For example, the real-time payment system in the US and UK might share similar business logic for real-time gross settlement but differ in the clearing and compliance requirements. The solution treats these as bounded contexts within the microservices architecture, providing flexibility while making sure each region can handle its own specific rules and regulations.

Sustainability

AWS relentlessly innovates its infrastructure design, build, and operations to make progress towards net-zero carbon by 2040 and being water positive by 2030. Amazon MSK with AWS Graviton based instances use up to 60% less energy than comparable M5 instances, helping you achieve your sustainability goals. Lambda is inherently sustainable by design. Its serverless model makes sure compute resources are only used when needed, drastically reducing idle infrastructure and wasted energy. Instead of keeping always-on servers for infrequent tasks, Lambda provisions compute power just-in-time, achieving near-zero idle capacity.

Security and compliance in financial services

Given the sensitive nature of payment transactions and financial data, you should apply the security controls required to meet financial regulations such as AWS PCI DSS and AWS Federal Information Processing Standard (FIPS) 140-3 according to your organization’s needs.

The solution should incorporate multi-layered security controls, continuous monitoring, and automated compliance auditing to meet the rigorous expectations of banking regulators and internal risk teams. For more information, refer to Security Guidance.

Conclusion

The modernization of payment orchestration systems using an event-driven architecture and AWS serverless technologies marks a significant advancement in meeting the demands of today’s rapidly evolving financial services landscape. This solution addresses the key challenges faced by traditional payment systems while delivering substantial benefits in performance, scalability, cost optimization, global resilience, sustainability, and compliance. By using cutting-edge cloud technologies and robust security controls, financial institutions can now build a future-ready foundation that adapts to evolving business needs while maintaining the highest standards of performance, security, and reliability. As the real-time payments market continues its explosive growth, this modern architecture provides a solution that meets today’s demands and is also well-positioned to support tomorrow’s payment innovations. Organizations looking to modernize their payment infrastructure can use this blueprint to accelerate their digital transformation journey, supporting sustainable, secure, and efficient payment processing at scale in an increasingly competitive global marketplace.

The architecture presented here is for reference purposes only. IBM will work closely with you to deploy the solution in accordance with industry standards and compliance requirements.For additional resources, refer to:

IBM Consulting is an AWS Premier Tier Services Partner that helps customers who use AWS to harness the power of innovation and drive their business transformation. They are recognized as a Global Systems Integrator (GSI) for over 22 competencies, including Financial Services Consulting. For additional information, please contact an IBM Representative.

Accessing private Amazon API Gateway endpoints through custom Amazon CloudFront distribution using VPC Origins

Post Syndicated from Napoleone Capasso original https://aws.amazon.com/blogs/compute/accessing-private-amazon-api-gateway-endpoints-through-custom-amazon-cloudfront-distribution-using-vpc-origins/

Organizations can use Amazon CloudFront Virtual Private Cloud (VPC) Origins to deliver content from applications hosted in private subnets within Amazon VPC. Network traffic flows between Amazon CloudFront and Application Load Balancers (ALBs), Network Load Balancers (NLBs), or Amazon Elastic Compute Cloud (Amazon EC2) instances deployed within private subnets. This means that Amazon CloudFront can access both public and private Amazon Web Services (AWS) resources seamlessly.

This post demonstrates how you can connect CloudFront with a Private REST API in Amazon REST API Gateway using a VPC origin.

Overview

Organizations looking to enhance their application security and performance can find several key benefits in this architecture. You can use it to keep your APIs private and access them through CloudFront, implementing more security layers such as AWS Shield Advanced, geoblocking and TLSv1.3 support along with custom cipher suites. You can use this approach to engage the global CloudFront content delivery network while maintaining more control over the distribution.

Furthermore, you can enhance security controls through the built-in features of CloudFront, such as AWS WAF integration, custom SSL certificates, and field-level encryption. The VPC Origins feature eliminates any need to expose your internal resources to the public internet, which reduces your application’s potential attack surface.

Enterprises that need to maintain strict compliance requirements while delivering content globally can find this solution particularly valuable. Contain all traffic within the AWS private network to better meet your security and compliance objectives while still providing fast, reliable access to your applications.

Solution overview

You can set up CloudFront as the front door to your application and use VPC Origins integrated with an internal ALB to route traffic to Private Rest API through an execute-api VPC endpoint. All traffic between CloudFront and Private REST API remains within the AWS Private network.

The following figure provides an overview of the solution.

Figure 1: Amazon CloudFront to Private Amazon API Gateway

This diagram depicts three services running in an AWS account. The CloudFront distribution serves as the main entry point for the application. This distribution connects to an internal ALB using VPC Origins. The interface VPC endpoint execute-api is set as the target for the internal ALB to route requests to the Private Amazon API Gateway.

Solution deployment

To deploy this solution, follow the instructions in the GitHub repository and clone the repository.

The solution can be deployed in any AWS Region. Make sure to have a valid SSL certificate in AWS Certificate Manager (ACM) in the us-east-1 Region for the CloudFront distribution. All other resources must be in the same Region.

Prerequisites

For this walkthrough, the following resources are needed:

  • An Amazon VPC with at least 2 Private Subnets.
  • A Public Hosted Zone in Amazon Route 53.
  • A valid public SSL certificate in ACM in the us-east-1 Region for CloudFront and another certificate in the same Region as an ALB.
  • A custom domain name, covered by the ACM certificate for CloudFront distribution.

These are the input parameters for the solution deployment.

Walkthrough

The AWS Serverless Application Model (AWS SAM) template from the sample GitHub repository creates a secure, private networking architecture that provides controlled access to an API Gateway through CloudFront. It provisions a private API Gateway with a VPC endpoint, an internal ALB, and a CloudFront distribution. The template establishes secure communication channels by implementing private networking components, configuring SSL certificates, and setting up Route53 DNS routing. It uses custom AWS Lambda resources to dynamically manage network interfaces and security group configurations, providing a robust and flexible infrastructure for private API access, as shown in the following figure.

Figure 2: Amazon CloudFront to Private Amazon API Gateway walkthrough

Step 1: Create the CloudFront distribution and the VPC Origins

The solution creates a CloudFront distribution that enables wide range of HTTP clients by supporting HTTP2 and HTTP3, enhances your solution security by enforcing HTTPS-only traffic, and uses a custom domain with the user-provided SSL certificate.The internal ALB is configured as the origin for the CloudFront distribution, and it is created using the VPC Origins feature.

########### Cloudfront Distribution ###############
  CloudFrontDistribution:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        HttpVersion: http2and3
        Comment: CloudFront Distribution with VPC Origin Integration
        Aliases: 
        - !Ref CloudFrontDomainName
        ViewerCertificate:
          AcmCertificateArn: !Ref CloudFrontCertificateARN
          MinimumProtocolVersion: TLSv1.2_2021
          SslSupportMethod: sni-only
        Origins:
        - Id: AlbOrigin
          DomainName: !GetAtt ApplicationLoadBalancer.DNSName
          VpcOriginConfig:
            OriginKeepaliveTimeout: 60
            OriginReadTimeout: 60
            VpcOriginId: !GetAtt CloudFrontVpcOrigin.Id
        Enabled: true
        DefaultCacheBehavior:
          TargetOriginId: AlbOrigin
          CachePolicyId: 83da9c7e-98b4-4e11-a168-04f0df8e2c65
          OriginRequestPolicyId: 216adef6-5c7f-47e4-b989-5492eafa07d3
          ViewerProtocolPolicy: https-only

The VPC Origins references the internal ALB, and only supports HTTPS protocol.

  ########### Cloudfront VPC Origin ###############
  CloudFrontVpcOrigin:
    Type: AWS::CloudFront::VpcOrigin
    Properties:
      VpcOriginEndpointConfig:
          Arn: !Ref ApplicationLoadBalancer
          Name: !Sub vpc-origin-${AWS::StackName}
          OriginProtocolPolicy: https-only
          OriginSSLProtocols: 
          - TLSv1.2

When the VPC Origins is created, the custom resource Lambda function adds an inbound rule to the CloudFront VPC Origin security group to allow traffic only from the CloudFront prefix list in the chosen Region.

        ec2_client.authorize_security_group_ingress(
            GroupId=cloudfront_sg_id,
            IpPermissions=[
                {
                    'IpProtocol': 'tcp',  
                    'FromPort': 443,      
                    'ToPort': 443,        
                    'PrefixListIds': [{'PrefixListId': cloudfront_prefix_id}]
                }
            ]
        )

Step 2: Create the ALB

The internal ALB is configured with an HTTPS listener using the user-provided SSL certificate. This maintains the encryption of the traffic between CloudFront and the ALB.

########### ALB Listener ###########  
  ALBListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref ALBTargetGroup
      LoadBalancerArn: !Ref ApplicationLoadBalancer
      Port: '443'
      Protocol: HTTPS
      SslPolicy: ELBSecurityPolicy-TLS-1-2-2017-01
      Certificates:
        - CertificateArn: !Ref ALBCertificateARN

The target group of the ALB points to the execute API VPC endpoint IP addresses. These IP addresses are fetched using a custom resource Lambda function.

  ########### ALB Target Group ###########
  ALBTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Port: 443
      Name: !Sub TargetGroup-${AWS::StackName}
      Protocol: HTTPS
      VpcId: !Ref VPCId
      Targets:
        - Id: !GetAtt GetPrivateIPs.IP0
          Port: 443
        - Id: !GetAtt GetPrivateIPs.IP1
          Port: 443
      TargetType: ip
      Matcher:
        HttpCode: '200,403'

The custom resource Lambda function fetches private IPs for given network interface IDs using the EC2 client and returns a dictionary with keys IP0 and IP1 values as the private IPs.

def fetch_interface_ips(network_interface_ids):
    """
    Fetch private IPs for given network interface IDs using ec2 client
    Returns a dictionary with keys IP0, IP1, etc. and values as the private IPs
    """
    responseData = {}
    
    # Use describe_network_interfaces instead of the resource approach
    response = ec2_client.describe_network_interfaces(
        NetworkInterfaceIds=network_interface_ids
    )
    
    for index, interface in enumerate(response['NetworkInterfaces']):
        responseData[f'IP{index}'] = interface['PrivateIpAddress']
    
    return responseData

Furthermore, the Lambda function adds an inbound rule to the internal ALB security group to allow traffic from the VPC Origin security group.

ec2_client.authorize_security_group_ingress(
            GroupId=security_group_id,
            IpPermissions=[
                {
                    'IpProtocol': 'tcp',  
                    'FromPort': 443,     
                    'ToPort': 443,  
                    'UserIdGroupPairs': [{'GroupId': cloudfront_sg_id}]
                }
            ]
        )

Step 3: Create the VPC endpoint for Private API Gateway

The execute-api VPC Endpoint is configured to route traffic from the internal ALB to the Private API Gateway. Only VPC endpoints can resolve private endpoints and route traffic securely.

  ########### execute-api VPC Endpoint ###########
  ExecuteApiVpcEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      ServiceName: !Sub "com.amazonaws.${AWS::Region}.execute-api"
      VpcId: !Ref VPCId
      SubnetIds: !Ref PrivateSubnets
      VpcEndpointType: Interface
      PrivateDnsEnabled: true
      SecurityGroupIds:
        - !Ref VpcEndpointSG 

Step 4: Create the Private API Gateway Resource Policy

The Resource Policy set up on the Private API Gateway only allows traffic from the VPC endpoint placed within the user-defined VPC.

        x-amazon-apigateway-policy:
          Version: "2012-10-17"
          Statement:
          - Effect: "Allow"
            Principal: "*"
            Action: "execute-api:Invoke"
            Resource: "execute-api:/*"
            Condition:
              StringEquals:
                aws:sourceVpce: 
                - !Ref ExecuteApiVpcEndpoint

Testing

Fetch the CloudFront URL from the Outputs section of the deployed stack and test it by using the curl command or your web browser.

curl https://{your custom domain name} 
{"message": "Hello from API GW"}

Conclusion

This post demonstrates how you can establish a secure architecture to access a Private REST API Gateway through a Private ALB, using Amazon CloudFront as the entry point for your application. The solution uses the CloudFront VPC Origins feature, a powerful capability that you can use to directly integrate CloudFront with resources that you host within your Amazon VPC. You can implement this architecture to significantly enhance your application’s security posture as you restrict access to your backend services and minimize the potential attack surface. This approach provides you with a robust and reliable method to protect your applications from unauthorized access while maintaining high availability and performance through the CloudFront global content delivery network.

AWS Weekly Roundup: Amazon EC2, Amazon Q Developer, IPv6 updates, and more (September 1, 2025)

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-ec2-amazon-q-developer-ipv6-updates-and-more-september-1-2025/

My LinkedIn feed was absolutely packed this week with pictures from the AWS Heroes Summit event in Seattle. It was heartwarming to see so many familiar faces and new Heroes coming together.

AWS Heroes Summit 2025

For those not familiar with the AWS Heroes program, it’s a global community recognition initiative that honors individuals who make outstanding contributions to the AWS community. These Heroes share their deep AWS knowledge through content creation, speaking at events, organizing community gatherings, and contributing to open-source projects.

The AWS Heroes Summit brings these exceptional community leaders together, providing a unique platform for knowledge exchange, networking, and collaboration. As someone who regularly interacts with Heroes through our AWS initiatives, I always find these summits invaluable – they offer deep technical discussions, early access to AWS roadmaps, and opportunities to provide direct feedback to AWS service teams. The insights and connections made at these events often translate into better resources and guidance for the broader AWS community.

Last week’s launches

In addition to this inspiring community, here are some AWS launches that caught my attention:

  • AWS expands Internet Protocol v6 (IPv6) support to AWS App Runner, AWS Client VPN, and RDS Data API — Three more AWS services now support IPv6 connectivity, helping you meet compliance requirements and removes the need for handling address translation between IPv4 and IPv6. AWS App Runner now supports IPv6-based inbound and outbound traffic on both public and private App Runner service endpoints. AWS Client VPN announced support for remote access to IPv6 workloads, allowing you to establish secure VPN connections to your IPv6-enabled VPC resources. Finally, RDS Data API now supports IPv6, enabling dual-stack configuration (IPv4 and IPv6) connectivity for your Aurora databases.
  • We launched two new instance families this week: the new storage-optimized I8ge and the general-purpose M8i instances —Our I8ge instances, powered by AWS Graviton4 processors, deliver up to 60% better compute performance compared to their Graviton2-based predecessors. These instances feature third-generation AWS Nitro SSDs, providing up to 55% better real-time storage performance per TB and significantly lower I/O latency. With 120 TB of storage and sizes up to 48xlarge (including two metal options), they offer the highest storage density among AWS Graviton-based storage optimized instances. We also launched M8i and M8i-flex instances with custom Intel Xeon 6 processors. These instances deliver up to 15% better price-performance and 2.5x more memory bandwidth than their predecessors. M8i-flex instances are ideal for general-purpose workloads, available from large to 16xlarge. For demanding applications, you can choose from our SAP-certified M8i instances in 13 sizes, including 2 bare metal options and a new 96xlarge size.
  • Amazon EC2 Mac Dedicated hosts now support Host Recovery and Reboot-based host maintenance — you can enable two new capabilities for your EC2 Mac Dedicated Hosts: Host Recovery and Reboot-based Host Maintenance. Host Recovery automatically detects potential hardware issues on Mac Dedicated Hosts and seamlessly migrates Mac instances to a new replacement host, minimizing disruption to workloads. Reboot-based Host Maintenance automatically stops and restarts instances on replacement hosts when scheduled maintenance events occur, eliminating the need for manual intervention during planned maintenance windows.
  • Amazon Q Developer now supports MCP admin control — Administrators have now the ability to enable or disable the MCP functionality for all the Q Developer clients in their organization. When an administrator disables the functionality, users will not be allowed to add any MCP servers, nor will any previously defined servers be initialized.

Other AWS news

Here are some additional projects and blog posts that you might find interesting:

  • Mastering Amazon Q Developer with Rules — I read an interesting article about Amazon Q Developer’s rules feature this weekend that I want to share with you. What caught my attention is how it solves a pain point I often encounter when working with AI assistants – having to repeatedly explain my coding preferences and standards. With rules, you define your preferences once in Markdown files, and Amazon Q Developer automatically follows them for every interaction. I particularly like how transparent the system is, showing which rules it’s following, and how it helps maintain consistency across teams. Since implementing rules in my projects, I’ve seen more consistent code quality, all while reducing the cognitive load of having to repeatedly explain our standards.
  • Strategies for excelling across all four exam domains of the AWS Certified Machine Learning – Specialty certification. The AWS Training & Certification team, where I spent my first three years at AWS, shared how to prepare for the AWS Certified Machine Learning – Specialty certification, whether you’re starting from scratch or building upon existing AWS Certifications. They share the prerequisites and guidance to help you get ready for this certification and demonstrate your expertise in building ML solutions with AWS.
  • As is now our tradition after Prime Day, we shared the impressive metrics showing how AWS services scaled to support one of the world’s largest shopping events. Amazon Prime Day 2025 was the biggest ever, setting records for both sales volume and total items sold during the 4-day event. This year was particularly special as we saw a significant transformation in the Prime Day experience through advancements in our generative AI offerings, with customers using Alexa+, Rufus, and AI Shopping Guides to discover deals and get product information. The numbers are staggering – Amazon DynamoDB handled tens of trillions of API calls while maintaining high availability, delivering single-digit millisecond responses and peaking at 151 million requests per second. Amazon API Gateway processed over 1 trillion internal service requests—a 30 percent increase in requests on average per day compared to Prime Day 2024.

Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events:

  • AWS Summits — Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Toronto (September 4), Los Angeles (September 17), and Bogotá (October 9).
  • AWS re:Invent 2025 — This flagship annual conference is coming to Las Vegas from December 1–5. The event catalog is now available. Mark your calendars for this not to be missed gathering of the AWS community.
  • AWS Community Days — Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Adria (September 5), Baltic (September 10), Aotearoa (September 18), South Africa (September 20), Bolivia (September 20), Portugal (September 27).

Join the AWS Builder Center to learn, build, and connect with builders in the AWS community. Browse here for upcoming in-person and virtual developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— seb

Effectively building AI agents on AWS Serverless

Post Syndicated from Anton Aleksandrov original https://aws.amazon.com/blogs/compute/effectively-building-ai-agents-on-aws-serverless/

Imagine an AI assistant that doesn’t just respond to prompts – it reasons through goals, acts, and integrates with real-time systems. This is the promise of agentic AI.

According to Gartner, by 2028 over 33% of enterprise applications will embed agentic capabilities – up from less than 1% today. While early generative AI efforts focused on GPUs and model training, agentic systems shift the focus to CPUs, orchestration, and integration with live data – the places where organizations are starting to see real return on investment (ROI).

In this post, you’ll learn how to build and run serverless AI agents on AWS using services such as Amazon Bedrock AgentCore (preview as of this post publication), AWS Lambda, and Amazon Elastic Container Service (Amazon ECS), which provide scalable compute foundations for agentic workloads. You’ll also explore architectural patterns, state management, identity, observability, and tool usage to support production-ready deployments.

Overview

Early AI assistants were stateless and reactive – each prompt processed in isolation, with no memory of prior interactions or awareness of broader context. Gradually, AI assistants became more capable by injecting system prompts, preserving conversation history, and incorporating enterprise knowledge using Retrieval-Augmented Generation (RAG), as illustrated in the following diagram.

Despite these improvements, traditional AI assistants still lacked true autonomy. They couldn’t reason through multi-step goals, make decisions on their own, or adjust workflows dynamically based on outcomes. As a result, they worked well for simpler Q&A or predefined workflows, but struggled with dynamic, more complex, real-world tasks that require planning, using external tools, and making decisions along the way.

Agentic AI systems shift from passive content generation to autonomous, goal-driven behavior. Powered by Large Language Models (LLMs) and enhanced with memory, planning, and tool use, these systems can break down complex tasks into smaller steps, reason through each step, and take real-time actions, such as calling APIs, executing tools, or interacting with live data. By referencing the LLM within a control cycle that manages context, memory, and decision-making, these systems can choose the right tools, adapt workflows, and integrate deeply into enterprise environments, with use cases ranging from travel booking and financial analysis to DevOps automation and code debugging. This is referred to as an agentic loop. In this system, the agent relies on the LLM’s reasoning output to execute tools, capture tool results, and feed these results to the LLM as updated context (as shown in the following diagram). This happens in a loop until LLM instructs the agent to return the final output to the caller.

While agentic loop is a lightweight approach to structuring these systems, other control flow paradigms, such as graph, swarm, and workflows, are also available in open-source frameworks like LangGraph.

Introducing Strands Agents SDK

Strands Agents SDK is a code-first framework to build production-ready AI agents with minimal boilerplate. It utilizes the above-mentioned agentic loop system and abstracts common challenges like memory management, tool integration, and multi-step reasoning in a lightweight, modular Python framework. Strands SDK handles state, tool orchestration, and multi-step reasoning so agents can remember past conversations, call external APIs, enforce business rules, and adapt to changing inputs. This allows you to focus on the application’s business logic.

Because agents built with Strands SDK are essentially Python apps, they’re portable and can run across different compute options, such as Bedrock AgentCore Runtime, Lambda functions, ECS tasks, or even locally. This makes Strands Agents SDK a powerful foundation for building scalable and goal-driven AI systems. The following sections assume you’re running your AI agents built with Strands Agents SDK on Lambda functions.

Building your first serverless AI agent

Imagine you’re building an AI-powered corporate travel assistant on AWS, and you have the following technical requirements:

  1. Define the system prompts, memory, and model you want to use
  2. Integrate tools for API calls, business logic, and knowledge bases
  3. Ensure authentication and observability

Strands SDK handles heavy lifting, so you can focus on building smart, responsive agents with minimal overhead. The following code snippet creates a simple agent, according to your configuration.

from strands import Agent

agent = Agent(
    system_prompt=
      """You're a travel assistant that helps 
         employees book business trips 
         according to policy.""",
    model=my_model,
    tools=[get_policies, get_hotels, get_cars, book_travel]
)

response = agent("Book me a flight to NYC next Monday.")

That’s it. Your agent now has a personality, memory, and ability to use external tools. The Agent class in the Strands SDK abstracts agentic logic, such as maintaining conversation history, handling LLM interactions, orchestrating tools and external knowledge sources, and running the full agentic loop.

Session state management

Session state management is critical for agentic workflows. It allows agents to track goals across interactions – enabling coherent conversations, retaining context, and providing personalized experiences. Without state management, each prompt is handled in isolation, making it impossible for the agent to reference prior context or track ongoing tasks. In cloud environments, where applications need to be stateless and scalable, the solution is to externalize session state to persistent storage, such as Amazon Simple Storage Service (Amazon S3). This allows any agent instance to reconstruct the conversation history on demand, delivering a seamless, stateful user experience while keeping the agentic app itself stateless for scalability and resilience.

AI agents built with Strands store conversation history in the agent.messages property (see documentation). To support stateless compute environments, you can externalize the agent state, persisting it after each interaction and restoring it before the next. This preserves continuity across invocations while keeping your agent instances stateless. In user-aware agentic applications, you want to persist state for each user, typically associated with the user’s unique ID. The following example illustrates how you can do it with the built-in S3SessionManager class when running your agent in a stateless environment such as a Lambda function:

    session_manager = S3SessionManager(
        session_id=f"session_for_user_{user.id}",
        bucket=SESSION_STORE_BUCKET_NAME,
        prefix="agent_sessions"
    )

    agent = Agent(
        session_manager=session_manager
    )

When using Bedrock AgentCore, use the fully managed, serverless AgentCore Memory primitive to manage sessions and long-term memory. It provides relevant context to models while helping agents learn from past interactions. You can make Strands’ session manager work with AgentCore Memory similar to S3SessionManager.

Authentication and authorization

For enterprise AI agents to operate safely, they must know who the user is and what they are allowed to do. This goes beyond basic identity validation – AI agents often act on behalf of users, so they might need to enforce role-based access controls, support audit, and comply with corporate policies.

AWS services like Amazon CognitoAmazon Identity and Access Management (IAM), and Amazon API Gateway provide a solid foundation for authentication and authorization. For example, you can use Cognito to authenticate users through user pools or federated identity providers, combined with API Gateway and Lambda authorizer to validate user access permissions before forwarding requests to the agent, as shown in the preceding diagram. IAM policies define what the agent is allowed to do. After the user is both authenticated and authorized, the agent can extract the identity context, for example, from a JSON Web Token (JWT), to personalize prompts, enforce rules, or dynamically restrict actions.

The following code snippet illustrates retrieving user’s identity from the Authorization header and passing it to an agent:

def handler(event: dict, ctx):
    user_id = extract_user_id(event["headers"]["Authorization"])
    user_prompt: dict = json.loads(event["body"])["prompt"]
    agent_response = agent.prompt(user_id, user_prompt)
  
    return {
        "statusCode": 200,
        "body": json.dumps({"text": agent_response.text})
    }

The identity context can become a part of the agent’s execution loop. An agent might check the user’s department before booking travel or restrict access to sensitive tools unless the user has the appropriate permissions. By integrating authentication early, you not only enhance security, but also unlock rich personalization and audit capabilities that make agents enterprise-ready from day one.

When using Bedrock AgentCore, the AgentCore Identity primitive allows your AI agents to securely access AWS services and third-party tools either on behalf of users or as themselves with pre-authorized user consent. It provides managed OAuth 2.0 supported providers for both inbound and outbound authentication. During the preview phase, AgentCore Identity supports identity providers like Amazon Cognito, Auth0 by Okta, Microsoft Entra ID, GitHub, Google, Salesforce, and Slack. Refer to the samples for implementation details.

Building portable Strands agents on AWS

Strands Agents SDK is compute-agnostic. The agents you build are standard Python applications, which can run on any compute type.

For portability and maintainability, separate your agent’s business logic from the interface layer. By doing this, you can reuse the same core agent code across environments, whether invoked through API Gateway and Lambda functions, accessed through Application Load Balancer and Amazon ECS, running on AgentCore Runtime, or even executed locally during development, as shown in the following figure.

The following code snippets illustrate this technique.

Lambda handler code:

def handler(event: dict, ctx):
     user_id = extract_user_id(event)
     user_prompt = json.loads(event["body"])["prompt"]
     agent_response = call_agent(user_id, user_prompt)
     return {
          "statusCode":200,
          "body": json.dumps({
               "text": agent_response.mesage
          })
     }

AgentCore code:

@app.entrypoint
def invoke(payload):
     user_id = extract_user_id(payload)
     user_prompt = payload.get("prompt")
     agent_response = call_agent(user_id, user_prompt)
     return {"result": agent_response.message)

HTTP Handler code:

@app.post("/prompt")
async def prompt(request: Request, prompt_request: PromptRequest):
    user_id=extract_user_id(request)
    user_prompt = prompt_request.prompt
    agent_response = call_agent(user_id, user_prompt)
    return {"text": agent_response.message)

For local testing:

if __name__ == "__main__":
     user_id="local-testing-user"
     user_prompt="book me a trip to NYC"
     agent_response = call_agent(user_id, user_prompt)
     return agent_response.message

Agent code:

def call_agent(user_id, user_prompt):
     agent = Agent(
          system_prompt="You’re a travel agent…",
          model=my_model,
          session_manager = my_session_manager,    
      )
     agent_response = agent(user_prompt)
     return agent_response

Extending agent functionality with tools

A key strength of agentic systems is their ability to invoke tools that perform actions or retrieve real-time data, enabling agents to interact with the outside world, not just generate text. The Strands Agents SDK includes built-in tools and allows you to define your own custom tools, as either in-process Python functions or external tools accessible over HTTP using the Model Context Protocol (MCP). These tools can fetch data, call APIs, or trigger workflows, and can be registered for the agent to reason over and use during execution.

The following snippet illustrates creating an in-process tool. See the documentation for more examples.

from strands import tool 

@tool
def get_weather(city: str) -> str:
    weather = call_weather_api(city)
    return f"The current weather in {city} is {weather}"

Integrating with remote MCP servers

Model Context Protocol (MCP) is an open standard that decouples agents from tools using a client-server model. Instead of embedding tool logic directly into the agent, your agent becomes an MCP client that connects to one or more MCP servers – each exposing tools, resources, and reusable prompts.

Running remote MCP servers is especially valuable when tools span multiple business domains or are provided by third-party vendors, just like how microservices separate responsibilities across teams and systems. This separation allows each domain team to manage their own tools independently while exposing a consistent, standardized interface to agents. It also enables reuse, versioning, and centralized governance without tightly coupling logic into the agent itself. By decoupling tools from agents, MCP unlocks composability, scalability, and long-term ecosystem growth.

The following snippet illustrates configuring an MCP client to connect to a remote MCP Server, retrieving the list of tools, and integrating those tools with an agent.

mcp_client = MCPClient(lambda: streamablehttp_client(
    url=mcp_endpoint,
    headers={"Authorization": f"Bearer {token}"},
))

with mcp_client:
  tools = mcp_client.list_tools_sync()
  agent = Agent(tools=tools)

When using Bedrock AgentCore, you can operate MCP at scale through AgentCore Gateway. It provides an easy and secure way for developers to build, deploy, discover, and connect to remote tools like above at scale. With AgentCore Gateway, developers can convert APIs, Lambda functions, and existing services into Model Context Protocol (MCP)-compatible tools and make them available to agents through Gateway endpoints with just a few lines of code.

Monitoring and observability

Observability is essential when running AI agents. Beyond traditional metrics such as uptime and latency, agentic systems introduce new telemetry dimensions, such as LLM latency, token consumption, and tracing reasoning cycles. These new metrics are essential for understanding both the performance and cost of your agentic systems.

When deploying agents using AWS services such as Bedrock AgentCore, Lambda, or ECS, you inherit the built-in observability capabilities, such as seamless integration with Amazon CloudWatch for metrics, logs, and distributed tracing. This simplifies tracking invocation counts, errors, request duration, and concurrency, as shown in the following figure – essential for operating reliable and scalable agentic applications.

In addition, the Strands Agents SDK provides built-in agent observability features. It uses OpenTelemetry (OTEL) to automatically trace each agent interaction, including spans for LLM calls, tool usage, and context updates. It also exports detailed metrics such as token counts, tool execution times, and decision cycle durations. These metrics can be sent to any OTEL-compatible backend, giving you deep, real-time visibility into how your agents reason, act, and adapt. The following snippet shows built-in token usage metrics:

{
  "accumulated_usage": {
    "inputTokens": 1539,
    "outputTokens": 122,
    "totalTokens": 1661
  },
  "average_cycle_time": 0.881234884262085,
  "total_cycles": 2,
  "total_duration": 1.881234884262085,
  ... redacted ...
}

Learn more about observability and evaluation of Strands agents from this sample code.

When using Bedrock AgentCore, the AgentCore Observability primitive helps you to log and capture metrics and traceability from other AgentCore primitives like runtime, memory, and gateway, as described in this tutorial.

Security considerations

You should build secure communication and access controls layers deploying AI agents that integrate with remote MCP servers. All client-server interactions should be encrypted using TLS, ideally with mutual TLS for bidirectional authentication. Access to tools should be validated through authorization checks with fine-grained permissions to enforce least privilege access. Deploying MCP servers behind an API Gateway provides additional security layers like DDoS protection, WAF, and centralized authentication. Use API Gateway logging capabilities to capture caller identity and execution outcomes. Using trusted, versioned MCP repositories helps protect against supply chain attacks and ensures consistent tool governance across teams. Protocols such as MCP are evolving rapidly, you should always use the most recent versions to minimize potential security vulnerabilities risk.

In addition, you should leverage security best practices described in the AWS Well-Architected Framework Security Pillar, such as enforcing strict IAM role scoping, integrating with identity providers for user context, encrypting all data in transit and at rest, and using VPC endpoints and PrivateLink to limit network exposure. To protect against prompt injection attacks, sanitize inputs, and ensure you maintain comprehensive audit logs for compliance and governance.

Sample project

Follow instructions in this GitHub repo to deploy a sample project implementing the practices described in this post using the AWS Serverless compute. The repo includes a travel agent implemented with Strands Agents SDK and a remote MCP server, both running as Lambda functions.

Conclusion

Agentic AI moves beyond simple prompt-response interactions to enable dynamic, goal-driven workflows. In this post, you learned how to build scalable, production-ready agents on AWS using the Strands Agents SDK and serverless services such as Lambda and Amazon ECS.

By externalizing state, integrating authentication, and adding observability, agents can operate securely and at scale. With support for in-process and remote tools through the MCP, you can cleanly separate responsibilities and build composable, enterprise-ready systems. You can combine these patterns to deliver intelligent, adaptable AI agents that fit naturally into modern cloud and event-driven architectures.

Useful resources

To learn more about Serverless architectures see Serverless Land.

Modernizing SOAP applications using Amazon API Gateway and AWS Lambda

Post Syndicated from Daniel Abib original https://aws.amazon.com/blogs/compute/modernizing-soap-applications-using-amazon-api-gateway-and-aws-lambda/

This post demonstrates how you can modernize legacy SOAP applications using Amazon API Gateway and AWS Lambda to create bidirectional proxy architectures that enable integration between SOAP and REST systems without disrupting existing business operations.

Many organizations today face the challenge of maintaining critical business systems that were built decades ago. These legacy applications power essential business operations despite relying on outdated technologies and integration patterns. Although complete system replacement would be ideal, practical constraints such as budget limitations, resource availability, technical complexity, and missing documentation often make modernization efforts challenging.

This post first shows proxy architecture patterns to expose a legacy SOAP server over a REST API. It then shows how to integrate a legacy SOAP client with applications using a REST API.

While SOAP and REST APIs share HTTP as their foundation, SOAP has some limitations compared to REST, like limited HTTP methods (GET/POST only) and mandatory XML formatting. REST is more flexible with multiple HTTP methods and diverse payload formats (plain text, binary, HTML, JSON, XML).

Using API Gateway and Lambda to proxy SOAP service

Consider a legacy solution that only supports SOAP. The following diagram shows the architecture for a SOAP proxy server using API Gateway and Lambda.

Figure 1: SOAP Server Proxy for modernized architecture

Figure 1: SOAP Server Proxy for modernized architecture

The proxy exposes the APIs hosted on the SOAP Server (on the right side of the image) over a REST interface. A SOAP service expects the HTTP Content-Type header set to text/xml, and a XML format payload that follows the WSDL specification defined by the server.

In the proposed architecture, the Lambda function is the core transformation engine, handling the bidirectional conversion between JSON and XML formats. Lambda functions can be developed in multiple programming languages such as Python, Node.js, Java, C#, Go, Ruby, and PowerShell, allowing you to use your existing development expertise. The serverless nature of Lambda provides automatic scaling to handle traffic spikes without needing infrastructure management or capacity planning.

API Gateway acts as the intelligent front door, managing all incoming requests and routing them appropriately. It provides enterprise-grade features such as request throttling to protect backend systems from overload, comprehensive authentication and authorization mechanisms, API key management for partner access control, request and response validation, caching capabilities for improved performance, and detailed monitoring and logging. These built-in features remove the need for custom middleware development and provide immediate operational benefits. API Gateway can receive multiple payload format such as XML, JSON, binary data, and plain text. This makes it suitable for diverse integration scenarios.

Using API Gateway and Lambda to support legacy SOAP clients

The previous section focused on exposing SOAP services over REST APIs. Organizations also face the reverse challenge where legacy SOAP client applications must access REST services. The architecture for supporting legacy SOAP clients follows a similar pattern but with reversed data flow. In this case, the legacy SOAP client sends XML-formatted requests to what it believes is a SOAP server. However, behind the scenes API Gateway and Lambda work together to translate these requests into REST API calls.

Figure 2: Legacy SOAP client modernization architecture

Figure 2: Legacy SOAP client modernization architecture

The legacy SOAP client application sends XML SOAP messages to API Gateway. The Lambda function receives these SOAP requests, extracts the relevant data from the XML envelope, and transforms it into JSON format for the modern REST service.

The Lambda function wraps the JSON response from the REST services into the SOAP XML format that the legacy client expects. It recreates the appropriate XML structure, SOAP headers, and ensures that the response conforms to the WSDL specification that the client application was designed to consume.

Example scenario

Let’s suppose our legacy client application needs to send a SOAP request to convert an integer number to its word form. The SOAP envelop to convert the number 1519 to its long form “one thousand, five hundred and nine” looks like this:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body> \
        <ConvertNumberToWordsSoapIn>
            <NumberToWordsRequest>1519</NumberToWordsRequest>
        </ConvertNumberToWordsSoapIn>
    </soap:Body>
</soap:Envelope>

The REST conversion service expects a JSON payload the as follownig:

jsonObject = {
	"data" : 1519
}

The following code block shows a sample Lambda function implementation for this. This function converts the SOAP XML envelop to JSON, changes the http header to application/json, and converts response from REST service to SOAP format.

var parseString = require('xml2js').parseString;
const axios = require('axios');

exports.handler = async (event, context) => {
    var valueNumber;
    
    try {
        console.log("Parsing XML string");

        // Parsing the XML to obtain data needed for conversion (number to words)
        parseString(event.body, function (err, result) {
            if (!err) {
                valueNumber = result['soap:Envelope']['soap:Body'][0]
                              ['ConvertNumberToWordsSoapIn'][0]
                              ['NumberToWordsRequest'][0];
            } else { 
                console.log (err);
                throw (err);
            }
        });
        console.log("Creating JSON for calling the service");
        // Creating JSON to call service
        var jsonObject = {
            "data" : valueNumber
        }
        
        console.log("Calling Microservice (NumberToWords)");
        const headers = { 
            'Content-Type': 'Application/json'
        };
        
        console.log ("Parameter for NumberToWords URL:" + 
                    JSON.stringify(process.env.NumberToWordMicroservice));

        // Calling numberToWords REST Server
        var resultNumberToWords = await 
            axios.post(process.env.NumberToWordMicroservice, jsonObject, { headers });
        
        // Creating the response
        console.log("Creating response XML");

        var resp =  create_response (JSON.stringify(resultNumberToWords.data.message));
        console.log("Response in XML: "+ resp);
        
        // Returning the value in XML using text/xml content type
        let response = {'statusCode': 200, headers: {"content-type": "text/xml"}, 
                        'body': JSON.stringify(resp)}
        return response;
        
    } catch (err) {
        console.log ("Error: " + err);
        let response = {'statusCode': 500, 
                        headers: {"content-type": "text/xml"}, 'body': err}
        return response;
    }
};

// Function to create a SOAP XML envelope with the result value
function create_response(numberInWords) {
  return '<?xml version="1.0" encoding="utf-8"?> \
            <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">\
            <soap:Body>\
              <m:ConvertNumberToWordsResponse xmlns:m="http://www.dataaccess.com/webservicesserver/"> \
                  <m:ConvertNumberToWordsResponseResult>' + numberInWords + '</m:ConvertNumberToWordsResponseResult> \
              </m:ConvertNumberToWordsResponse> \
            </soap:Body>\
            </soap:Envelope>';
}

With this approach, you can maintain your existing SOAP client applications without modification, allowing them to consume modern REST services. You can preserve investments in legacy client applications while gradually modernizing the overall system. This architecture is particularly valuable in scenarios where multiple legacy SOAP clients need to access the same modern REST services. This is because a single proxy can serve multiple client applications simultaneously. The serverless nature of the architecture makes sure that it scales automatically based on the number of client requests, providing cost-effective operation regardless of usage patterns.

Alternative approach using API Gateway transformation capabilities

The Lambda-based approach provides maximum flexibility and control. API Gateway also offers built-in transformation capabilities that can handle certain SOAP modernization scenarios without the need for compute resources.

The native API Gateway transformation uses Apache Velocity Template Language mapping templates. It converts the payload directly at the gateway, offering a streamlined solution for specific modernization scenarios.

The VTL approach works by defining mapping templates that handle the conversion process between different payload formats. When modernizing SOAP services, these templates can intercept REST requests with JSON payloads, restructure the data into XML format compatible with your legacy SOAP endpoints, and reverse the process for responses returning to the client.

Figure 3: API Gateway with velocity template language transformation

Figure 3: API Gateway with velocity template language transformation

This gateway-native transformation strategy offers several operational advantages. You benefit from streamlined architecture because the transformation logic resides entirely within the API Gateway service. There are no other infrastructure components to manage or monitor, and the solution avoids the complexity of coordinating between multiple AWS services. Cost efficiency is another key benefit, as there are no compute charges beyond the standard API Gateway pricing.

Consider the previous example of converting a number to its word format. The VTL transformation in API Gateway will look like this:

## Parse the SOAP envelope and extract the number value
#set(\$xmlDoc = \$input.path('\$'))
#set(\$numberToWords = \$xmlDoc.Envelope.Body.ConvertNumberToWordsSoapIn.NumberToWordsRequest)

## Convert to integer if it's a string
#if(\$numberToWords.toString().matches("^\d+\$"))
  #set(\$dataValue = \$numberToWords.toInteger())
#else
  #set(\$dataValue = \$numberToWords)
#end

{
  "data": \$dataValue
}

You should consider VTL transformations when your SOAP services have predictable, stable schemas with relatively direct XML structures. This approach works particularly well for legacy systems that rarely undergo changes and have clear request-response patterns. For more dynamic environments or complex transformation requirements, the Lambda-based solution provides superior flexibility and maintainability.

Security considerations

An important consideration when working with legacy SOAP services is understanding their authentication mechanisms. SOAP protocols often implement authentication through security standards, where authentication credentials and security tokens are embedded directly within the SOAP envelope headers. This includes username tokens, digital signatures, and encryption elements that are part of the XML structure.

When SOAP envelopes contain unencrypted authentication information in the headers, the proxy architecture typically functions without more modifications. This is because the Lambda function can pass through these authentication elements transparently to the backend SOAP service. However, due to the nature of SOAP authentication being tightly integrated with the XML envelope structure, certain scenarios may need custom handling within the Lambda function.

For example, if the SOAP service uses timestamp-based authentication tokens, session management, or needs specific security header modifications, the Lambda function may need customization to properly handle, validate, or refresh these authentication elements during the JSON-to-XML transformation process. Organizations should carefully analyze their SOAP service authentication requirements to determine if more Lambda logic is needed to maintain security compliance.

Moreover, make sure that any SOAP authentication credentials processed by the Lambda function are handled securely and never logged in plain text.

Conclusion

In this post, you learned how cloud-native services can bridge the gap between legacy systems and modern application architectures, allowing you to use your existing investments while adopting contemporary development practices and technologies.

Amazon API Gateway and AWS Lambda enable organizations to create REST services that proxy legacy SOAP servers, allowing modern applications to consume legacy services through JSON payloads while preserving existing SOAP infrastructure. This serverless solution provides cost-effectiveness, automatic scaling, and reduced operational overhead while facilitating company modernization through scalable APIs without abandoning legacy software investments.

This modernization strategy allows you to gradually transition from legacy SOAP services to modern REST APIs without disrupting existing business operations. As your modernization journey progresses, you can extend this pattern to support more SOAP services or implement more sophisticated transformation logic based on your specific business requirements.

For more serverless learning resources, visit Serverless Land.

Amazon OpenSearch Service 101: Create your first search application with OpenSearch

Post Syndicated from Sriharsha Subramanya Begolli original https://aws.amazon.com/blogs/big-data/amazon-opensearch-service-101-create-your-first-search-application-with-opensearch/

Organizations today face the challenge of managing and deriving insights from an ever-expanding universe of data in real time. Industrial Internet of Things (IoT) sensors stream millions of temperature, pressure, and performance metrics from field equipment every second. Ecommerce platforms need to surface relevant products from vast catalogs instantly. Security teams must analyze system logs in real time to detect threats. As data volumes grow, organizations increasingly struggle with fragmented monitoring tools that create critical visibility gaps and slow incident response times. The cost of commercial observability solutions becomes prohibitive, forcing teams to manage multiple separate tools and increasing both operational overhead and troubleshooting complexity. Across these diverse scenarios, the ability to efficiently search, analyze, and visualize data in real time has become crucial for business success.

Amazon OpenSearch Service addresses these challenges by providing a fully managed search and analytics service. This managed service configures, manages, and scales OpenSearch clusters so you can focus on your search workloads and end customers. Amazon OpenSearch Serverless further makes it straightforward to run search and log analytics workloads by automatically scaling compute and storage resources up and down to match your application’s demands—with no infrastructure to manage. Whether you’re processing continuous streams of IoT telemetry, enabling product discovery, or performing security analytics, OpenSearch Service scales to meet your needs.

In this post, we walk you through a search application building process using Amazon OpenSearch Service. Whether you’re a developer new to search or looking to understand OpenSearch fundamentals, this hands-on post shows you how to build a search application from scratch—starting with the initial setup; diving into core components such as indexing, querying, result presentation; and culminating in the execution of your first search query.

Components of OpenSearch Service

Before building your first search application, it’s important to understand some key architectural components in OpenSearch. The fundamental unit of information in OpenSearch is a document stored in JSON format. These documents are organized into indices—collections of related documents that function similar to database tables. When you search for information, OpenSearch queries these indices to find matching documents.

OpenSearch operates on a distributed architecture where multiple servers, called nodes, work together in a cluster or domain. Each cluster can utilize dedicated master nodes that focus solely on cluster management tasks, such as maintaining cluster state, managing indices, and orchestrating shard allocation. These specialized nodes enhance cluster stability by offloading cluster management duties from data nodes. Data nodes, on the other hand, handle the storage, indexing, and querying of data—essentially performing the heavy lifting of data operations. Together, they provide scalability, availability, and efficient data processing in the cluster. Configure dedicated coordinator nodes that specialize in routing and distributing search and indexing requests across the cluster. These nodes reduce the load on data nodes, which allows them to focus on data storage, indexing, and search operations.

Coordinator nodes in OpenSearch are most beneficial in the following scenarios:

  1. Large cluster deployments – When managing substantial data volumes across many nodes.
  2. Query-intensive workloads – For environments handling frequent search queries or aggregations, especially those with complex date histograms or multiple aggregations, benefit from faster query processing.
  3. Heavy dashboard utilizationOpenSearch Dashboards can be resource-intensive. Offloading this responsibility to dedicated coordinator nodes reduces the strain on data nodes.

To manage large datasets efficiently, OpenSearch splits indices into smaller pieces called shards. Each shard is distributed across the cluster, with a recommended size of 10–50 GB for optimal performance. For reliability and high availability, OpenSearch maintains replica copies of these shards on different nodes, which means that your data remains accessible even if some nodes fail.

Search operations in OpenSearch are powered by inverted indices, a data structure that maps terms to the documents containing them. The BM25 ranking algorithm helps make sure that search results are relevant to users’ queries. Although searches happen in near real time, with configurable refresh intervals, individual document retrievals are immediate.

This architecture provides the foundation for handling high-volume IoT data streams, complex full-text search operations, and real-time analytics, all while maintaining fault tolerance. Understanding these components will help you make informed decisions as you build your search application.OpenSearch Dashboards is a visualization and analytics tool for exploring, analyzing, and visualizing data in real time. It provides an intuitive interface for querying, monitoring, and reporting on OpenSearch data using visualizations such as charts, graphs, and maps. Key features include interactive dashboards, alerting, anomaly detection, security monitoring, and trace analytics.

Sample Amazon OpenSearch Service tutorial application overview

The following architecture diagram demonstrates how to build and deploy a scalable, fully managed search application on Amazon Web Services (AWS). The architecture uses Amazon OpenSearch Service for indexing and searching data. The UI application is deployed on AWS App Runner and interacts with Amazon OpenSearch Service through secure serverless Amazon API Gateway and AWS Lambda.

Scope of Solution

Here is the end-to-end workflow for our application detailing how user requests are handled from initial access through to data retrieval or indexing:

  1. Users access the application through AWS App Runner, which hosts the frontend interface.
  2. Amazon Cognito handles user authentication and authorization for secure access to the application.
  3. When users interact with the application, their requests are sent to API Gateway. API Gateway communicates with Amazon Cognito to verify user authentication status. It serves as the primary entry point for all API operations and routes the requests appropriately. It forwards requests to Lambda functions within the virtual private cloud (VPC).
  4. Lambda functions process the requests, performing either:
  5. Data indexing operations into OpenSearch Service
  6. Search queries against the OpenSearch Service cluster
  7. The OpenSearch Service cluster resides within a private subnet in a VPC for enhanced security.

Prerequisites

Before you deploy the solution, review the prerequisites.

Install the sample app

The entire infrastructure is deployed using AWS Cloud Development Kit (AWS CDK), with cluster configurations customizable through the cdk.json file on GitHub. This deployment approach provides consistent and repeatable infrastructure creation while maintaining security best practices. The steps to deploy this infrastructure are available in this README file. After deployment, you’ll access a comprehensive search application built with Cloudscape React components that includes:

  1. Interactive search functionality – Test various OpenSearch query methods including prefix match keyword searches, phrase matching, fuzzy searches, and field-specific queries against the sample product dataset
  2. Document management tools – Bulk index the product catalog with a single click or delete and recreate the index as needed for testing purposes
  3. Educational resources – Access embedded guides explaining OpenSearch concepts, query syntax, and best practices

Index the documents

After you’ve deployed this search application, the first step is to index some documents into OpenSearch Service. Sign in to the search application UI and follow these steps:

  1. To trigger a bulk index process, under Index Documents in the navigation pane, choose Bulk Index Product Catalog.
  2. Choose Index Product catalog, as shown in the following screenshot.

The Lambda function indexes a comprehensive ecommerce product catalog into your newly created OpenSearch Service cluster. This sample dataset includes detailed fashion and lifestyle products spanning multiple categories. Each product record contains rich metadata, including title, detailed description, category, color, and price.

Bulk Index Process

Keyword searches

OpenSearch Service offers multiple search features. For an exhaustive list, refer to Search features. We focus on a few keyword search types to help you get started with OpenSearch.

With the product catalog in OpenSearch, you can perform prefix searches through the search application’s intuitive interface. To better understand the search functionality, expand the Guide section at the top of the interface. This interactive guide explains how various kinds of searches work, complete with a practical example in context of the product catalog dataset. The guide includes best practices and a link to the detailed documentation to help you make the most of OpenSearch’s powerful query capabilities.

You can do a prefix search on any of the three key search fields: Title, Description, or Color.

A typical prefix match query looks like this:

{
  "query": {
    "match_phrase_prefix": {
      "attribute_name": {
        "query": "attribute_value",
        "max_expansions": 10,
        "slop": 1
      }
    }
  }
}

You can use this query pattern to find documents where specific fields begin with your search term, offering an intuitive “starts with” search experience.

The following image illustrates a practical example of the Prefix Match search. Entering “Ru” in the title field matches products with titles such as “Running”, “Runners” and “Ruby.” Prefix Match search is particularly useful when users only remember the beginning of a product name or are searching across multiple variations or simply exploring product categories.

Prefix Match example

Multi Match search enables searching across multiple fields simultaneously. For example, you can search for “Coral” across product title, description, and color fields simultaneously. The search query can be customized using field boosting in which matches in certain fields carry more weight than others.

A typical multi match query looks like this:

{
  "query": {
    "multi_match": {
      "query": "Coral",
      "fields": [
        "title^3",
        "description",
        "color"
      ],
      "type": "best_fields"
    }
  }
}

You can explore Wildcard Match, Range Filter, and other search features through the search application. For developers and administrators managing this search infrastructure, OpenSearch Dashboards is a native, developer-friendly interface for indexing, searching, and managing your data. It serves as a comprehensive control center where you can interact directly with your indices, test queries, and monitor performance in real time. The following screenshot shows OpenSearch Dashboards which provides an interactive UI to explore, analyze and visualize search and log data.

OpenSearch Dashboards

While our example demonstrates lexical search functionality on a sample product catalog, OpenSearch Service is equally powerful for observability usecases. When handling time-series data from logs, metrics, or traces, OpenSearch excels at real-time analytics and visualization. For instance, DevOps teams can index application logs and system telemetry data, then use date histograms and statistical aggregations to identify performance bottlenecks or security anomalies as they occur. This real-time search allows IT teams to detect and respond to incidents with minimal delay. Using OpenSearch Dashboards, teams can create live operational dashboards that update automatically as new data streams in. For IoT applications monitoring thousands of sensors, this means temperature anomalies or equipment failures can trigger immediate alerts through OpenSearch’s alerting capabilities. These observability workloads benefit from the same distributed architecture that powers our product search example, with the added advantage of time-series optimized indices and retention policies for managing high-volume streaming data efficiently.

Beyond search management, you can configure alerts for specific conditions, set up notification channels for operational events, and enable data discovery features. If you want to experiment with the same search queries we implemented in our application, you can launch OpenSearch Dashboards and use relevant index and search APIs from the Dev Tools section, which is an ideal environment for developing and testing before implementing in your production application. Because our OpenSearch Service cluster resides within a private subnet, you need to create a Secure Shell (SSH) tunnel to access the dashboard. For more information and steps to do this, refer to How do I use an SSH tunnel to access OpenSearch Dashboards with Amazon Cognito authentication from outside a VPC? in the Knowledge Center. So far, we’ve explored OpenSearch’s query domain-specific language (DSL). However, for those coming in from a traditional database background, OpenSearch also offers SQL and Piped Processing Language (PPL) functionality, making the transition smoother. You can explore more on this at SQL and PPL in the OpenSearch documentation.

In this post, we introduced you to different types of keyword searches. You can also store documents as vector embeddings in OpenSearch and use it for semantic search, hybrid search, multimodal search, or to implement Retrieval Augmented Generation (RAG) pattern.

Conclusion

You can now build sample search applications by following the steps outlined in this post and the implementation details available at sample-for-amazon-opensearch-service-tutorials-101 on GitHub. By using the distributed architecture of Amazon OpenSearch Service, an AWS managed service, you get fast, scalable search capabilities that grow with your business, built-in security and compliance controls, and automated cluster management—all with pay-only-for-what-you-use pricing flexibility.

Ready to learn more? Check out the Amazon OpenSearch Service Developer Guide. For more insights, best practices and architectures, and industry trends, refer to Amazon OpenSearch Service blog posts and hands-on workshops at AWS Workshops. Please also visit the OpenSearch Service Migration Hub if you are ready to migrate legacy or self-managed workloads to OpenSearch Service.

We hope this detailed guide and accompanying code will help you get started. Try it out, let us know your thoughts in the comments section, and feel free to reach out to us for questions!


About the authors

SriharshaSriharsha Subramanya Begolli works as a Senior Solutions Architect with Amazon Web Services (AWS), based in Bengaluru, India. His primary focus is assisting large enterprise customers in modernizing their applications and developing cloud-based systems to meet their business objectives. His expertise lies in the domains of data and analytics.

Fraser SequeiraFraser Sequeira is a Startups Solutions Architect with Amazon Web Services (AWS) based in Melbourne, Australia. In his role at AWS, Fraser works closely with startups to design and build cloud-native solutions on AWS, with a focus on analytics and streaming workloads. With over 10 years of experience in cloud computing, Fraser has deep expertise in big data, real-time analytics, and building event-driven architecture on AWS. He enjoys staying on top of the latest technology innovations from AWS and sharing his learnings with customers. He spends his free time tinkering with new open source technologies.

AWS Weekly Roundup: New AWS Heroes, Amazon Q Developer, EC2 GPU price reduction, and more (June 9, 2025)

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-new-aws-heroes-amazon-q-developer-ec2-gpu-price-reduction-and-more-june-9-2025/

The AWS Heroes program recognizes a vibrant, worldwide group of AWS experts whose enthusiasm for knowledge-sharing has a real impact within the community. Heroes go above and beyond to share knowledge in a variety of ways in developer community. We introduce our newest AWS Heroes in the second quarter of 2025.

To find and connect with more AWS Heroes near you, visit the categories in which they specialize Community Heroes, Container Heroes, Data Heroes, DevTools Heroes, Machine Learning Heroes, Security Heroes, and Serverless Heroes.

Last week’s launches
In addition to the inspiring celebrations, here are some AWS launches that caught my attention.

For a full list of AWS announcements, be sure to keep an eye on What’s New at AWS.

Other AWS news
Here are some additional projects, blog posts that you might find interesting:

  • Up to 45 percent price reduction for Amazon EC2 NVIDIA GPU-accelerated instances – AWS is reducing the price of NVIDIA GPU-accelerated Amazon EC2 instances (P4d, P4de, P5, and P5en) by up to 45 percent for On-Demand and Savings Plan usage. We are also making the very new P6-B200 instances available through Savings Plans to support large-scale deployments.
  • Introducing public AWS API models – AWS now provides daily updates of Smithy API models on GitHub, enabling developers to build custom SDK clients, understand AWS API behaviors, and create developer tools for better AWS service integration.
  • The AWS Asia Pacific (Taipei) Region is now open – The new Region provides customers with data residency requirements to securely store data in Taiwan while providing even lower latency. Customers across industries can benefit from the secure, scalable, and reliable cloud infrastructure to drive digital transformation and innovation.
  • Amazon EC2 has simplified the AMI cleanup workflow – Amazon EC2 now supports automatically deleting underlying Amazon Elastic Block Store (Amazon EBS) snapshots when deregistering Amazon Machine Images (AMIs).
  • The Lab where AWS designs custom chips – Visit Annapurna Labs in Austin, Texas—a combination of offices, workshops, and even a mini data center—where Amazon Web Services (AWS) engineers are designing the future of computing.

Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events.

  • Join re:Inforce from anywhere – If you aren’t able to make it to Philadelphia (June 16–18), tune in remotely. Get free access to the re:Inforce keynote and innovation talks live as they happen.
  • AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Shanghai (June 19 – 20), Milano (June 18), Mumbai (June 19) and Japan (June 25 – 26).
  • AWS re:Invent – Mark your calendars for AWS re:Invent (December 1 – 5) in Las Vegas. Registration is now open
  • AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Mexico (June 14), Nairobi, Kenya (June 14) and Colombia (June 28)

That’s all for this week. Check back next Monday for another Weekly Roundup!

Betty

Dynamically routing requests with Amazon API Gateway routing rules

Post Syndicated from Anton Aleksandrov original https://aws.amazon.com/blogs/compute/dynamically-routing-requests-with-amazon-api-gateway-routing-rules/

Effective API management and routing capabilities are crucial for organizations managing complex application architectures. Whether you’re a technology company rolling out new API versions to millions of users, or a financial services organization conducting A/B tests to optimize customer experiences, the ability to route API traffic dynamically and efficiently is essential.

Today, Amazon API Gateway announces support for dynamic routing rules for custom domain names in all supported AWS Regions. This new capability enables you to route API requests based on HTTP header values, either independently or in combination with URL paths. In this post, you will learn how to use this new capability to implement routing strategies such as API versioning and gradual rollouts without modifying your API endpoints.

Dynamic Routing Rules Overview

Many organizations require dynamic API routing capabilities to support their evolving business needs. As a line-of-business persona, you want to be able to test new user experiences with specific customer segments, while maintaining their existing flows intact. As an engineer, you want to be able to maintain multiple API versions across different client applications while ensuring regulatory compliance. Prior to this launch, developers using API Gateway implemented dynamic routing by using different URL paths, such as “/v1/products” and “/v2/products”.

With this new launch, you can implement dynamic routing logic with a simple declarative configuration within the custom domain name settings. The new routing rule mechanism allows you to make routing decisions based on HTTP headers, base paths, or a combination of both. Developers are no longer required to create new or alter existing paths to smoothly transition between API versions, they can simply specify the desired value in the request HTTP header. Among other possibilities, you can implement cell-based architecture routing, A/B testing, or dynamic backend selection based on hostname, tenant ID, accepted response media type, or cookie value. By implementing routing logic directly within the API Gateway, you can eliminate proxy layers and complex URL structures while maintaining fine-grained control over your API traffic. This new feature seamlessly integrates with existing API Gateway capabilities and supports both public and private REST APIs. The following diagram shows how you can use routing rules for header and base-path based routing. This example uses a single level resource /products to show path matching, however depending on your use-case you could also use multi-level paths like /products/items.

Figure 1. Using routing rules for header and base-path based routing

In the following section you’ll learn how to implement header-based routing, use the new routing rules construct for common scenarios like API versioning and A/B testing, and configure rules with different routing conditions and priorities to achieve the desired behavior.

What is a routing rule

A routing rule is a new resource type uniquely associated with a single custom domain. It represents a collection of conditions that, when matched, cause the incoming request to be forwarded to a specific API and stage. Routing rules have three configuration properties:

  • The Conditions property defines the criteria that must be met for actions to be taken. A rule can include up to two header conditions and one base path condition, and all specified conditions must be met to trigger the action. If no conditions are defined for a rule, it serves as a catch-all rule matching all requests.
  • The Actions property defines what actions will be taken when rule conditions are met. At the time of this launch the supported action is invoking any stage of any REST API within the same account and region boundaries.
  • The Priority property defines the order that rules are evaluated in, with 1 being highest priority and 1,000,000 the lowest. You cannot reuse same priority value for more than one rule. AWS recommends you leave ample space between sequential rules to make it easy to add new rules in future, for example use 100, 200, 300 instead of 1, 2, 3.

Header conditions, specified via a MatchHeaders property, are used to match HTTP request header values, such as x-version=v1. Conforming to RFC 7230, header names are not case sensitive, while header values are. You can also use wildcards in header values for prefix, suffix, and contains match. See the following examples using AWS CloudFormation templates:

Exact match:

- MatchHeaders: 
	AnyOf: 
		- Header: "x-version" 
		ValueGlob: "alpha-v2-latest"

Will only match x-version=alpha-v2-latest

Prefix match:

- MatchHeaders: 
	AnyOf: 
	- Header: "x-version" 
	ValueGlob: "*latest"

Matches x-version=alpha-v2-latest, but not x-version=alpha-v2

Suffix match:

- MatchHeaders: 
	AnyOf: 
		- Header: "x-version" 
		ValueGlob: "alpha*"

Will match x-version=alpha-v2-latest and x-version=alpha-v1, but not x-version=beta-v1

Prefix and suffix match.

- MatchHeaders: 
	AnyOf: 
		- Header: "x-version" 
		ValueGlob: "*v2*"

Matches x-version=alpha-v2-latest and x-version=beta-v2-test, but not x-version=alpha-v1

Base path condition, specified via MatchBasePaths property, is used to match the incoming request path. The matching is case sensitive.

- MatchBasePaths: 
	AnyOf: 
		- "products"

You can have up to two MatchHeaders and one MatchBasePaths conditions per routing rule. Conditions are evaluated using the AND operator, meaning all conditions must be met for the action to be taken. Both condition types support a single comparison value under AnyOf property. The following snippet illustrates a sample routing rule with two MatchHeaders conditions and a single MatchBasePaths condition.

ProductsV1RoutingRule: 
	Type: 'AWS::ApiGatewayV2::RoutingRule' 
	Properties: 
		DomainNameArn: !Sub "arn:aws:apigateway:${AWS::Region}::/domainnames/${ApiCustomDomain}" 
		Priority: 100 
		Conditions: 
			- MatchHeaders: 
				AnyOf: 
					- Header: "x-version" 
					ValueGlob: "v2" 
			- MatchHeaders: 
				AnyOf: 
					- Header: "x-user-cohort" 
					ValueGlob: "beta-testers" 
			- MatchBasePaths: 
				AnyOf: 
					- "products" 
		Actions: 
				- InvokeApi:
					ApiId: !Ref ProductsV2Api 
					Stage: !Ref ProductsV2Stage

This rule matches requests to https://example.com/products when both header conditions are met – x-version=v2 and x-user-cohort=beta-testers. This rule does not match requests to any other base path, such as https://example.com/orders, or requests that do not match at least one header condition.

For scenarios like API versioning, you can create rules that evaluate headers such as “accept” or “version” to route traffic to different API implementations. For example, to route requests containing “x-version: api-beta” to your beta API, you would create a rule specifying this header condition and set the action to route to your beta API deployment.

Header-based routing also simplifies A/B testing by allowing you to define client cohorts based on custom headers, allowing controlled experiments with different configurations. You can create rules that check for a custom header like “x-test-group” to route specific users to different API implementations. The priority system ensures predictable routing behavior – when multiple rules match a request, the rule with the lowest priority number (highest precedence) determines the routing. Combining header and path conditions within a single rule enables complex routing scenarios such as version-specific routing for specific API resources instead of the entire API, as illustrated in the following diagram.

Figure 2. A routing configuration with two header and one path conditions in API Gateway Console.

Review the API Gateway documentation for detailed guide on creating routing rules.

Configuring Routing Mode

Before you begin creating routing rules, you must first create at least one API, stage, and a custom domain name. You can configure your custom domain name with the new routing mode setting.

  • API mappings only. This is the default mode. When using this mode, you can continue to use base path mappings to route requests to different APIs, and not use Routing Rules at all. This mode maintains the current behavior, where requests are routed based on base path mappings only.
  • Routing rules then API mappings. With this mode you can use Routing Rules while continuing to keep base path mappings as a fallback. When you use this mode, the Routing Rules always take precedence, and unmatched requests are evaluated against base path mappings. This mode is useful for gradually transitioning your APIs to Routing Rules.
  • Routing rules only. This mode gives you the flexibility to use routing rules only, and not rely on the base paths that you may have previously created on the domain using API mappings. This is the recommended routing mode; it is helpful when you are starting off with a new custom domain or finished transitioning from API mappings to Routing Rules for an existing custom domain.

When switching from one routing mode to another, always test your new configuration in non-production environments first. For example, when switching mode from API mappings only to routing rules only, your traffic will only be routed with routing rules; existing API mappings will no longer take effect.

Onboarding to Header-Based Routing

You can adopt the new Header-Based Routing for your existing API Gateway custom domains with zero-downtime, risk-minimized approach. The first step is to configure your custom domain to use the Routing rules then API mappings mode using the API Gateway console, AWS CLI, or your infrastructure-as-code (IaC) tool. This configuration ensures that while you gradually create Routing Rules, your existing base path mappings continue to function as fallback routes. Since Routing Rules are evaluated before base path mappings, and in the absence of any matching rules, requests automatically fall back to your existing base path mappings, your current API traffic remains unaffected during this transition.

Once you’ve configured the routing mode, you can progressively introduce Routing Rules alongside your existing setup. For example, you might start by creating a rule with a specific test header that routes to a new API version, allowing you to validate the routing behavior with controlled test traffic while production traffic continues flowing through your existing base path mappings. As you gain confidence in the new routing configuration, you can gradually expand your rules, adjust priorities, and optionally migrate away from base path mappings entirely. This incremental approach, combined with API Gateway’s observability capabilities described in the next section, enables you to validate each change and ensure your API consumers experience no disruption during the transition.

Observability

API Gateway provides comprehensive visibility into how your routing rules are processing requests through access logging. Each request now includes additional context variables that help you understand the routing decision process. The $context.customDomain.routingRuleIdMatched variable identifies which rule was matched and applied to the request, while existing variables like $context.domainName, $context.apiId, and $context.stage provide the complete routing context. By analyzing these access logs, you can verify routing behavior, troubleshoot unexpected routes, and gather insights about traffic patterns across different API versions or test variants.

End-to-end example

Consider a real-world scenario where a team needs to gradually migrate users to a new API version, such as an e-commerce platform updating its checkout API from v1 to v2. First, the team creates two different REST APIs – one for each version. Then, they set up a Routing Rule with priority 100 that checks for the header x-version=v2 and routes matching requests to the v2 API. They also create another rule with priority 200 that routes all requests with paths starting with /checkout to v1 API as a fallback.

Figure 3. Gradually transitioning clients from v1 to v2 API.

In the application code they add the x-version header for a small percentage of users. They monitor the performance and error rates using API Gateway’s telemetry capabilities by tracking the access and execution logs, along with emitted metrics. As their confidence grows, they gradually increase the percentage of users sending the v2 header. This approach ensures a controlled migration with minimal risk and ability to quickly rollback by simply removing the header from requests or changing a routing rule.

Sample

Follow the instructions in this GitHub repository to provision the sample in your AWS account. The project illustrates using dynamic routing with API Gateway.

Conclusion

Header-based routing brings significant advantages to API Gateway users. The feature’s backward compatibility ensures a smooth transition path – you can maintain existing base path mappings while gradually adopting Routing Rules, or use both mechanisms simultaneously with the fallback option. This flexibility allows you to migrate at your own pace without disrupting existing applications. The solution is cost-effective, with no additional charges for using Routing Rules on REST APIs. It reduces requirements to leverage extra service and infrastructure for dynamic routing. The priority-based evaluation system provides deterministic routing behavior, making it easier to understand and troubleshoot routing decisions.

To learn more about API Gateway header-based routing see the service documentation.

To learn more about Serverless architectures see Serverless Land.

PackScan: Building real-time sort center analytics with AWS Services

Post Syndicated from Sairam Vangapally original https://aws.amazon.com/blogs/big-data/packscan-building-real-time-sort-center-analytics-with-aws-services/

Amazon manages a complex logistics network with multiple touch points, from fulfillment centers to sort centers to final customer delivery. Among these, sort centers play a crucial role in the middle mile, providing faster and more efficient package movement. Within Amazon’s Middle Mile operations, high-volume sort centers process millions of packages daily, making immediate access to operational data essential for optimizing efficiency and decision-making. Real-time visibility into key metrics—such as package movements, container statuses, and associate productivity—is critical for smooth logistics operations. To address the need for real-time operational planning, the Amazon Middle Mile team developed PackScan, a cloud-based platform designed to provide instant insights across the network. By significantly reducing data latency, PackScan enables proactive decision-making, so teams can monitor inbound package flows, optimize outbound shipments based on live data, track associate productivity, identify bottlenecks, and enhance overall operational efficiency—all in real time.

In this post, we explore how PackScan uses Amazon cloud-based services to drive real-time visibility, improve logistics efficiency, and support the seamless movement of packages across Amazon’s Middle Mile network.

Prerequisites

This post assumes a foundational understanding of the following services and concepts:

Although hands-on experience is not required, a conceptual understanding of these services will help in understanding the architecture, design patterns, and components discussed throughout the article.

Business challenges

Amazon’s sort centers handle over 15 million packages daily across more than 120 facilities in North America. Given this scale, even minor delays in operational insights can lead to inefficiencies, increased costs, and escalations. Traditionally, data latencies of up to an hour have restricted the ability to make proactive decisions, directly affecting productivity, resource allocation, and responsiveness—especially during peak periods like holiday seasons and big deal days.

Without immediate visibility into package movements, container statuses, and associate performance, operational teams face challenges in identifying and resolving bottlenecks in real time. The lack of timely insights can disrupt the flow of packages, leading to shipment delays, reduced throughput, and suboptimal facility performance. Addressing these inefficiencies required a solution capable of delivering real-time, high-fidelity data to support rapid decision-making.

To bridge this gap, Amazon’s Middle Mile organization needed a scalable platform that could enhance visibility, minimize latency, and provide up-to-the-minute insights into logistics operations. PackScan was designed to meet these demands, giving teams access to the real-time data necessary to optimize workflows, mitigate bottlenecks, and improve overall efficiency.

Data flow

In 2024, PackScan was deployed across 80 sort centers in the USA, enabling real-time package analytics. The solution powers Grafana dashboards, which refresh every 10 seconds by fetching live package data from OpenSearch Service. With this near real-time visibility, operations teams can monitor package movement and sorting efficiency across sort centers. The following diagram outlines how package scan data is ingested, processed, and made actionable.

Each sort center is equipped with hardware at inbound stations where packages arrive from trailers. Integrated barcode scanners automatically scan each package as it enters the sorting process. Every scan generates an SNS event, capturing key attributes such as the package ID, dimensions, the associate who performed the scan, and the timestamp and location of the scan.

After they’re generated, these SNS events are ingested into Data Firehose through a Lambda function, where the data undergoes real-time enrichment. During this process, additional attributes are appended, including the business logic rules. The enriched data is then streamed into OpenSearch Service, where events are indexed to enable fast and efficient querying. With the indexed package scan events available in OpenSearch Service, real-time analytics and monitoring become possible. The Grafana dashboards query this data every 10 seconds, providing operational insights into package inflow metrics and associate performance.

Solution overview

PackScan was implemented using a structured and scalable approach, using AWS cloud-based services to enable high-frequency data ingestion, real-time processing, and actionable insights. The architecture is designed to minimize latency while providing reliability, scalability, and operational efficiency. The solution is built around a serverless, event-driven architecture that dynamically scales based on data ingestion volumes. The architecture—illustrated in the following figure—enabled us to build a real-time data solution, utilizing the advantages of various AWS services to provide low-latency analytics, high scalability, and real-time operational insights across Amazon’s sort centers.

The following are the key components and features of the solution:

  • Real-time data processing – Lambda functions serve as the processing backbone of the system, handling 500,000 scan events per second. Each incoming event is processed by applying data transformations, enrichment, and validation before passing it downstream.
  • High-frequency data ingestion and streaming – Data Firehose is the primary ingestion pipeline, handling millions of scan events daily from thousands of barcode scanners across multiple sort centers. The Firehose streams handle incoming data of 12,000 PUT requests per second, maintaining smooth ingestion and low-latency streaming. Data retention policies are set to buffer and forward enriched events every 60 seconds or upon reaching 5 MB batch size, optimizing storage and processing efficiency.
  • Optimized querying and operational insights – OpenSearch Service is used to index and store the processed scan events, providing real-time querying and anomaly detection. The OpenSearch cluster consists of 12 data nodes (r5.4xlarge.search) and 3 primary nodes (r5.large.search), processing up to 10 GB of data per day with a rolling index strategy, where indexes are rotated every 24 hours to maintain query performance. The system supports concurrent queries per second, enabling logistics teams to perform rapid lookups and gain instant visibility into package movements.
  • Live visualization and dashboarding – Grafana, hosted on an m5.12xlarge EC2 instance, provides real-time visualization of key logistics metrics. The dashboards refresh every 10 seconds, querying OpenSearch and displaying up-to-the-minute package analytics. The setup includes multiple preconfigured dashboards, monitoring package flow at different inbound stations, and workforce efficiency. These dashboards support concurrent users, enabling supervisors and associates to track and optimize operations proactively. The following screenshot shows one of the real-time dashboards, with details of package flow by different routes within sort centers.

The entire PackScan architecture is designed for automatic scaling, adjusting dynamically based on data ingestion volume to maintain efficiency during peak and off-peak operations. This approach provides cost-effective resource utilization while maintaining high availability and performance.

Business outcomes

The implementation of PackScan has led to measurable improvements in operational efficiency, workforce productivity, and real-time decision-making across Amazon’s sort centers. By reducing data latency and enabling real-time insights, PackScan has transformed logistics operations in meaningful ways:

  • Widespread deployment – PackScan was deployed across 80 sort centers, supporting approximately 1,000 display monitors that provide real-time operational insights.
  • Significant reduction in data latency – Data latency dropped from approximately 1 hour to less than 1 minute, allowing for real-time operational responsiveness and minimizing workflow disruptions.
  • Proactive operational management – With dynamic workload balancing and instant bottleneck identification, supervisors can now address issues as they arise, leading to smoother operations and fewer escalations.
  • Boost in workforce productivity – The real-time performance feedback has enhanced associate engagement, resulting in a 25% increase in throughput per hour and 12% reduction in labor hours.

Overall, PackScan has redefined real-time logistics visibility within Amazon’s Middle Mile operations, empowering operational teams with actionable insights, enhanced workforce efficiency, and a data-driven approach to package movement and sort center performance.

Lessons learned and best practices

The deployment and scaling of PackScan provided valuable insights into optimizing real-time logistics visibility. Several key lessons and best practices emerged from this implementation:

  • Cloud architecture drives efficiency – Adopting Amazon technologies provides seamless scalability, reduced operational overhead, and lower infrastructure costs, while maintaining high reliability. The following table shows an approximate breakdown of monthly service costs observed in production. This is an estimation based on current pricing; we recommend checking the respective AWS service pricing pages to generate the most up-to-date quote. This architecture demonstrates that with combination of provisioned and serverless design, production-ready solutions can be built and scaled at a fraction of the cost of traditional infrastructure.
AWS Service Description Estimated Monthly Cost
Amazon EC2 Three EC2 instances of type m5.12xlarge hosting Grafana $1,700
AWS Lambda Streams SNS events to Data Firehose $4,000
Amazon Data Firehose Real-time data delivery with 12,000 records streaming to OpenSearch Service $1,500
Amazon OpenSearch Service Indexing and querying package scan events $28,000
  • Real-time visibility is a game changer – Immediate access to operational data enhances agility, enabling teams to make timely, data-driven decisions that prevent bottlenecks and improve throughput.
  • Continuous monitoring enhances decision-making – Operational dashboards should evolve with business needs. Regular monitoring and updates provide accuracy, usability, and relevance in driving informed decision-making.

By applying these best practices, PackScan has set a foundation for scalable, real-time logistics management, making sure that Amazon’s Middle Mile operations remain proactive, efficient, and highly responsive to changing business demands.

Conclusion

PackScan has successfully transformed real-time operational visibility within Amazon’s sort centers, addressing critical challenges in data latency, workforce productivity, and logistics efficiency. By using AWS services, particularly Data Firehose for real-time data delivery and OpenSearch Service for analytics, PackScan has enabled proactive decision-making, streamlined operations, and enhanced throughput in high-volume sort environments. Looking ahead, future enhancements will focus on further elevating operational intelligence and scalability, including:

  • Integrating predictive analytics to anticipate workflow bottlenecks and optimize resource allocation
  • Scaling the solution across additional operational scenarios, providing greater resilience and adaptability to dynamic logistics environments

With these advancements, PackScan will continue to drive operational excellence, cost-efficiency, and real-time decision-making capabilities, reinforcing Amazon’s commitment to innovation in logistics and supply chain management.

For those interested in implementing similar solutions, we recommend exploring AWS Serverless Architecture Patterns and the AWS Architecture Blog for additional insights and best practices in building scalable, real-time analytics solutions.


About the authors

Sairam Vangapally is a Data Engineer at Amazon with extensive experience architecting real-time, large-scale data platforms that power critical logistics operations across North America. He has led the design and deployment of end-to-end data pipelines, enabling high-throughput ingestion, transformation, and analytics at scale. He is passionate about building resilient data infrastructure and driving cross-functional collaboration to deliver solutions that accelerate operational insights and business impact.

Nitin Goyal serves as a Data Engineering Manager in Amazon’s Sort Center organization, where he leads initiatives to optimize operational efficiency across North American facilities. With over nine years of tenure at Amazon spanning multiple teams, he specializes in architecting high-performance data systems, with particular emphasis on real-time streaming pipelines, artificial intelligence, and low-latency solutions. His expertise drives the development of sophisticated operational workflows that enhance sort center productivity and effectiveness.

Enhance AI-assisted development with Amazon ECS, Amazon EKS and AWS Serverless MCP server

Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/enhance-ai-assisted-development-with-amazon-ecs-amazon-eks-and-aws-serverless-mcp-server/

Today, we’re introducing specialized Model Context Protocol (MCP) servers for Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and AWS Serverless, now available in the AWS Labs GitHub repository. These open source solutions extend AI development assistants capabilities with real-time, contextual responses that go beyond their pre-trained knowledge. While Large Language Models (LLM) within AI assistants rely on public documentation, MCP servers deliver current context and service-specific guidance to help you prevent common deployment errors and provide more accurate service interactions.

You can use these open source solutions to develop applications faster, using up-to-date knowledge of Amazon Web Services (AWS) capabilities and configurations during the build and deployment process. Whether you’re writing code in your integrated development environment (IDE), or debugging production issues, these MCP servers support AI code assistants with deep understanding of Amazon ECS, Amazon EKS, and AWS Serverless capabilities, accelerating the journey from code to production. They work with popular AI-enabled IDEs, including Amazon Q Developer on the command line (CLI), to help you build and deploy applications using natural language commands.

  • The Amazon ECS MCP Server containerizes and deploys applications to Amazon ECS within minutes by configuring all relevant AWS resources, including load balancers, networking, auto-scaling, monitoring, Amazon ECS task definitions, and services. Using natural language instructions, you can manage cluster operations, implement auto-scaling strategies, and use real-time troubleshooting capabilities to identify and resolve deployment issues quickly.
  • For Kubernetes environments, the Amazon EKS MCP Server provides AI assistants with up-to-date, contextual information about your specific EKS environment. It offers access to the latest EKS features, knowledge base, and cluster state information. This gives AI code assistants more accurate, tailored guidance throughout the application lifecycle, from initial setup to production deployment.
  • The AWS Serverless MCP Server enhances the serverless development experience by providing AI coding assistants with comprehensive knowledge of serverless patterns, best practices, and AWS services. Using AWS Serverless Application Model Command Line Interface (AWS SAM CLI) integration, you can handle events and deploy infrastructure while implementing proven architectural patterns. This integration streamlines function lifecycles, service integrations, and operational requirements throughout your application development process. The server also provides contextual guidance for infrastructure as code decisions, AWS Lambda specific best practices, and event schemas for AWS Lambda event source mappings.

Let’s see it in action
If this is your first time using AWS MCP servers, visit the Installation and Setup guide in the AWS Labs GitHub repository to installation instructions. Once installed, add the following MCP server configuration to your local setup:

Install Amazon Q for command line and add the configuration to ~/.aws/amazonq/mcp.json. If you’re already an Amazon Q CLI user, add only the configuration.

{
  "mcpServers": {
    "awslabs.aws-serverless-mcp":  {
      "command": "uvx",
      "timeout": 60,
      "args": ["awslabs.aws_serverless_mcp_server@latest"],
    },
    "awslabs.ecs-mcp-server": {
      "disabled": false,
      "command": "uv",
      "timeout": 60,
      "args": ["awslabs.ecs-mcp-server@latest"],
    },
    "awslabs.eks-mcp-server": {
      "disabled": false,
      "timeout": 60,
      "command": "uv",
      "args": ["awslabs.eks-mcp-server@latest"],
    }
  }
}

For this demo I’m going to use the Amazon Q CLI to create an application that understands video using 02_using_converse_api.ipynb from Amazon Nova model cookbook repository as sample code. To do this, I send the following prompt:

I want to create a backend application that automatically extracts metadata and understands the content of images and videos uploaded to an S3 bucket and stores that information in a database. I'd like to use a serverless system for processing. Could you generate everything I need, including the code and commands or steps to set up the necessary infrastructure, for it to work from start to finish? - Use 02_using_converse_api.ipynb as example code for the image and video understanding.

Amazon Q CLI identifies the necessary tools, including the MCP serverawslabs.aws-serverless-mcp-server. Through a single interaction, the AWS Serverless MCP server determines all requirements and best practices for building a robust architecture.

I ask to Amazon Q CLI that build and test the application, but encountered an error. Amazon Q CLI quickly resolved the issue using available tools. I verified success by checking the record created in the Amazon DynamoDB table and testing the application with the dog2.jpeg file.

To enhance video processing capabilities, I decided to migrate my media analysis application to a containerized architecture. I used this prompt:

I'd like you to create a simple application like the media analysis one, but instead of being serverless, it should be containerized. Please help me build it in a new CDK stack.

Amazon Q Developer begins building the application. I took advantage of this time to grab a coffee. When I returned to my desk, coffee in hand, I was pleasantly surprised to find the application ready. To ensure everything was up to current standards, I simply asked:

please review the code and all app using the awslabsecs_mcp_server tools 

Amazon Q Developer CLI gives me a summary with all the improvements and a conclusion.

I ask it to make all the necessary changes, once ready I ask Amazon Q developer CLI to deploy it in my account, all using natural language.

After a few minutes, I review that I have a complete containerized application from the S3 bucket to all the necessary networking.

I ask Amazon Q developer CLI to test the app send it the-sea.mp4 video file and received a timed out error, so Amazon Q CLI decides to use the fetch_task_logs from awslabsecs_mcp_server tool to review the logs, identify the error and then fix it.

After a new deployment, I try it again, and the application successfully processed the video file

I can see the records in my Amazon DynamoDB table.

To test the Amazon EKS MCP server, I have code for a web app in the auction-website-main folder and I want to build a web robust app, for that I asked Amazon Q CLI to help me with this prompt:

Create a web application using the existing code in the auction-website-main folder. This application will grow, so I would like to create it in a new EKS cluster

Once the Docker file is created, Amazon Q CLI identifies generate_app_manifests from awslabseks_mcp_server as a reliable tool to create a Kubernetes manifests for the application.

Then create a new EKS cluster using the manage_eks_staks tool.

Once the app is ready, the Amazon Q CLI deploys it and gives me a summary of what it created.

I can see the cluster status in the console.

After a few minutes and resolving a couple of issues using the search_eks_troubleshoot_guide tool the application is ready to use.

Now I have a Kitties marketplace web app, deployed on Amazon EKS using only natural language commands through Amazon Q CLI.

Get started today
Visit the AWS Labs GitHub repository to start using these AWS MCP servers and enhance your AI-powered developmen there. The repository includes implementation guides, example configurations, and additional specialized servers to run AWS Lambda function, which transforms your existing AWS Lambda functions into AI-accessible tools without code modifications, and Amazon Bedrock Knowledge Bases Retrieval MCP server, which provides seamless access to your Amazon Bedrock knowledge bases. Other AWS specialized servers in the repository include documentation, example configurations, and implementation guides to begin building applications with greater speed and reliability.

To learn more about MCP Servers for AWS Serverless and Containers and how they can transform your AI-assisted application development, visit the Introducing AWS Serverless MCP Server: AI-powered development for modern applications, Automating AI-assisted container deployments with the Amazon ECS MCP Server, and Accelerating application development with the Amazon EKS MCP server deep-dive blogs.

— Eli