Tag Archives: serverless

Deploy serverless applications in a multicloud environment using Amazon CodeCatalyst

Post Syndicated from Deepak Kovvuri original https://aws.amazon.com/blogs/devops/deploy-serverless-applications-in-a-multicloud-environment-using-amazon-codecatalyst/

Amazon CodeCatalyst is an integrated service for software development teams adopting continuous integration and continuous delivery (CI/CD) practices in their software development process. CodeCatalyst puts the tools you need all in one place. You can plan work, collaborate on code, and build, test, and deploy applications using CodeCatalyst workflows.

Introduction

In the first post of this blog series, we showed you how organizations can deploy workloads to instances and virtual machines (VMs) across hybrid and multicloud environments. The second post of the series covered deploying a containerized application in a multicloud environment. Finally, in this post, we explore how organizations can deploy modern, cloud-native, serverless applications across multiple cloud platforms. Figure 1 shows the solution we walk through in this post.

Figure 1 – Architecture diagram

This post walks through how to develop, deploy, and test an HTTP RESTful API on Azure Functions using Amazon CodeCatalyst. The solution covers the following steps:

  • Set up a CodeCatalyst development environment and develop your application using the Serverless Framework.
  • Build a CodeCatalyst workflow to test and then deploy to Azure Functions using GitHub Actions in Amazon CodeCatalyst.

An Amazon CodeCatalyst workflow is an automated procedure that describes how to build, test, and deploy your code as part of a continuous integration and continuous delivery (CI/CD) system. You can use GitHub Actions alongside native CodeCatalyst actions in a CodeCatalyst workflow.
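For orientation, a workflow that embeds a GitHub action looks roughly like the following sketch. The action identifier and trigger syntax are assumptions based on our reading of the CodeCatalyst documentation, and the embedded steps use standard GitHub Actions YAML:

Name: DeployToAzureFunctions
SchemaVersion: "1.0"
Triggers:
  - Type: PUSH
    Branches:
      - main
Actions:
  DeployAction:
    Identifier: aws/github-actions-runner@v1
    Inputs:
      Sources:
        - WorkflowSource
    Configuration:
      Steps:
        # Standard GitHub Actions steps go here (see the snippets later in this post)
        - name: Deploy to Azure Functions
          uses: serverless/[email protected]

We build this workflow step by step in the sections that follow.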

Prerequisites

Walkthrough

In this post, we create a hello world RESTful API using the Serverless Framework. As we progress through the solution, we focus on building a CodeCatalyst workflow that deploys and tests the functionality of the application. At the end of the post, the workflow will look similar to the one shown in Figure 2.

Figure 2 – CodeCatalyst CI/CD workflow

Environment Setup

Before we start developing the application, we need to set up a CodeCatalyst project and then link a code repository to the project. The code repository can be a CodeCatalyst repository or GitHub. In this scenario, we use a GitHub repository. Once we finish developing the solution, the repository should look as shown below.

Figure 3 – Files in GitHub repository

In Amazon CodeCatalyst, there’s an option to create Dev Environments, which can be used to work on the code stored in the source repositories of a project. In this post, we create a Dev Environment, associate it with the source repository created above, and work from it. You may instead choose not to use a Dev Environment, run the following commands locally, and commit the changes to the repository. The /projects directory of a Dev Environment stores the files that are pulled from the source repository. In the Dev Environment, install the Serverless Framework using this command:

npm install -g serverless

and then initialize a serverless project in the source repository folder.
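One way to scaffold the project, assuming the Node.js Azure template (the template name is an assumption based on the handlers in this project), is:

serverless create --template azure-nodejs --path .

After scaffolding, the repository structure looks like this: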

├── README.md
├── host.json
├── package.json
├── serverless.yml
└── src
    └── handlers
        ├── goodbye.js
        └── hello.js
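
The generated serverless.yml wires each handler to an HTTP-triggered Azure Function. A sketch of what it might contain (the service name, region, runtime, and handler export names are assumptions; exact contents vary by template version):

service: azure-serverless-demo
provider:
  name: azure
  region: East US
  runtime: nodejs14.x
plugins:
  - serverless-azure-functions
functions:
  hello:
    handler: src/handlers/hello.sayHello
    events:
      - http: true
        methods:
          - GET
        authLevel: anonymous
  goodbye:
    handler: src/handlers/goodbye.sayGoodbye
    events:
      - http: true
        methods:
          - GET
        authLevel: anonymous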

We can push the code to the linked repository using git. With the code in CodeCatalyst, we can turn our focus to building the workflow using the CodeCatalyst console.

CI/CD Setup in CodeCatalyst

Configure access to the Azure Environment

We’ll use the GitHub action for Serverless to create and manage the Azure Function. For the action to access the Azure environment, it requires credentials associated with an Azure Service Principal, passed to the action as environment variables.

Service Principals in Azure are identified by the CLIENT_ID, CLIENT_SECRET, SUBSCRIPTION_ID, and TENANT_ID properties. Avoid storing these values in plaintext anywhere in your repository, because anyone with access to the repository can see them. Similarly, these values shouldn’t be hardcoded in workflow definitions, because workflow definitions are also visible as files in your repository. With CodeCatalyst, we can protect these values by storing them as secrets within the project and then referencing the secrets in the CI/CD workflow.

We can create a secret by choosing Secrets (1) under CI/CD and then selecting Create Secret (2), as shown in Figure 4. We can then enter the name and value for each of the identifiers described above.

Figure 4 – CodeCatalyst Secrets

Building the workflow

To create a new workflow, select CI/CD from the left navigation and then select Workflows (1). Then, select Create workflow (2), leave the default options, and select Create (3), as shown in Figure 5.

Figure 5 – Create CI/CD workflow

If the workflow editor opens in YAML mode, select Visual to open the visual designer. Now, we can start adding actions to the workflow.

Configure the Deploy action

We’ll begin by adding a GitHub action for deploying to Azure. Select “+ Actions” to open the actions list, choose GitHub from the dropdown menu, and then select “+” on the GitHub Actions action to add it to the workflow.

Next, configure the GitHub action from the configurations tab by adding the following snippet to the GitHub Actions YAML property:

- name: Deploy to Azure Functions
  uses: serverless/[email protected]
  with:
    args: -c "serverless plugin install --name serverless-azure-functions && serverless deploy"
    entrypoint: /bin/sh
  env:
    AZURE_SUBSCRIPTION_ID: ${Secrets.SUBSCRIPTION_ID}
    AZURE_TENANT_ID: ${Secrets.TENANT_ID}
    AZURE_CLIENT_ID: ${Secrets.CLIENT_ID}
    AZURE_CLIENT_SECRET: ${Secrets.CLIENT_SECRET}

The above workflow configuration uses the Serverless GitHub Action, which wraps the Serverless Framework to run serverless commands. The action is configured to install the serverless-azure-functions plugin and then package and deploy the source code to Azure Functions using the serverless deploy command.

Note how the secrets are passed to the GitHub action by referencing the secret identifiers in the above configuration.

Configure the Test action

Similar to the previous step, we add another GitHub action, which uses the Serverless Framework’s serverless invoke command to test the API deployed to Azure Functions.

- name: Test Function
  uses: serverless/[email protected]
  with:
    args: |
      -c "serverless plugin install --name serverless-azure-functions && \
          serverless invoke -f hello -d '{\"name\": \"CodeCatalyst\"}' && \
          serverless invoke -f goodbye -d '{\"name\": \"CodeCatalyst\"}'"
    entrypoint: /bin/sh
  env:
    AZURE_SUBSCRIPTION_ID: ${Secrets.SUBSCRIPTION_ID}
    AZURE_TENANT_ID: ${Secrets.TENANT_ID}
    AZURE_CLIENT_ID: ${Secrets.CLIENT_ID}
    AZURE_CLIENT_SECRET: ${Secrets.CLIENT_SECRET}

The workflow is now ready. It can be validated by choosing Validate and then saved to the repository by choosing Commit. The workflow kicks off automatically after the commit, and the application is deployed to Azure Functions.

The functionality of the API can now be verified from the logs of the workflow’s test action, as shown in Figure 6.

Figure 6 – CI/CD workflow Test action

Cleanup

If you have been following along with this walkthrough, you should delete the resources you deployed so you do not continue to incur charges. First, delete the Azure Function App (usually prefixed ‘sls’) using the Azure console. Second, delete the project from CodeCatalyst by navigating to Project settings and choosing Delete project; this step is optional, because there’s no cost associated with the CodeCatalyst project, and you can continue using it.

Conclusion

In summary, this post highlighted how Amazon CodeCatalyst can help organizations deploy cloud-native, serverless workloads into a multicloud environment. The post also walked through the solution in detail, showing how to set up Amazon CodeCatalyst to deploy a serverless application to Azure Functions by leveraging GitHub Actions. Although we showed an application deployment to Azure Functions, you can follow a similar process and leverage CodeCatalyst to deploy almost any type of application to almost any cloud platform. Learn more and get started with your Amazon CodeCatalyst journey!

We would love to hear your thoughts and experiences on deploying serverless applications to multiple cloud platforms. Reach out to us if you have any questions, or provide your feedback in the comments section.

About the Authors


Deepak Kovvuri

Deepak Kovvuri is a Senior Solutions Architect at AWS supporting enterprise customers in the US East area. He has over 6 years of experience helping customers architect a DevOps strategy for their cloud workloads. Deepak specializes in CI/CD, systems administration, infrastructure as code, and container services. He holds a Master’s in Computer Engineering from the University of Illinois at Chicago.


Amandeep Bajwa

Amandeep Bajwa is a Senior Solutions Architect at AWS supporting financial services enterprises. He helps organizations achieve their business outcomes by identifying the appropriate cloud transformation strategy based on industry trends and organizational priorities. Some of the areas Amandeep consults on are cloud migration, cloud strategy (including hybrid and multicloud), digital transformation, data and analytics, and technology in general.


Brian Beach

Brian Beach has over 20 years of experience as a Developer and Architect. He is currently a Principal Solutions Architect at Amazon Web Services. He holds a Computer Engineering degree from NYU Poly and an MBA from Rutgers Business School. He is the author of “Pro PowerShell for Amazon Web Services” from Apress. He is a regular author and has spoken at numerous events. Brian lives in North Carolina with his wife and three kids.


Pawan Shrivastava

Pawan Shrivastava is a Partner Solutions Architect at AWS in the WWPS team. He focuses on working with partners to provide technical guidance on AWS, collaborating with them to understand their technical requirements, and designing solutions to meet their specific needs. Pawan is passionate about DevOps, automation, and CI/CD pipelines. He enjoys watching MMA, playing cricket, and working out in the gym.

Python 3.11 runtime now available in AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/python-3-11-runtime-now-available-in-aws-lambda/

This post is written by Ramesh Mathikumar, Senior DevOps Consultant and Francesco Vergona, Solutions Architect.

AWS Lambda now supports Python 3.11 as both a managed runtime and a container base image. Python 3.11 contains significant performance enhancements over Python 3.10. Features like reduced startup time, streamlined stack frames, and the CPython specializing adaptive interpreter help many workloads using Python 3.11 run faster and cheaper, thanks to Lambda’s per-millisecond billing model. With this release, Python developers can now take advantage of new features and improvements introduced in Python 3.11 when creating serverless applications on Lambda.

You can use Python 3.11 with Powertools for AWS Lambda (Python), a developer toolkit to implement serverless best practices and increase developer velocity. Powertools includes proven libraries to support common patterns such as observability, parameter store integration, idempotency, batch processing, feature flags, and more. Learn more about Powertools for AWS Lambda (Python) in the documentation.

You can also use Python 3.11 with Lambda@Edge, allowing you to customize low-latency content delivered through Amazon CloudFront.

Python is a popular language for building serverless applications. The Python 3.11 release includes both performance improvements and new language features. For customers who deploy their Lambda functions using container image, the base image for Python 3.11 also includes changes to make managing installed packages easier.

This blog post reviews these changes in turn, followed by an overview of how you can get started with Python 3.11 in Lambda.

Performance improvements

Optimizations to CPython introduced in Python 3.11 bring significant performance enhancements, making it an average of 25% faster than Python 3.10, based on Python community benchmark tests using the Python Performance Benchmark Suite.

This release focuses on two key areas:

  • Faster startup: Core modules essential for Python are now “frozen,” with statically allocated code objects, resulting in a 10–15% faster interpreter startup relative to Python 3.10.
  • Faster function execution: Improvements include streamlined frame creation, inlined Python function calls that reduce C stack usage, and a specializing adaptive interpreter, which specializes the interpreter for “hot code” (code that is executed multiple times), reducing overhead during execution.

These optimizations can improve performance by 10–60% depending on the workload. In the context of a Lambda function execution, this results in performance improvements for both “cold start” and “warm start” invocations.

In addition to faster CPython performance improvements, Python 3.11 also provides performance improvements across other areas. For example:

  • String formatting with printf-style % codes is now as fast as f-string expressions.
  • Integer division is around 20% faster on x86-64 for certain scenarios.
  • Operations like sum() and list resizing have seen notable speed enhancements.
  • Dictionaries save memory by not storing hash values when keys are Unicode objects.
  • Improvements to asyncio.DatagramProtocol introduce significantly faster large file transfers over UDP.
  • Math functions, statistics functions, and unicodedata.normalize() also benefit from substantial speed improvements.

Language features

Thanks to its simplicity, readability, and extensive community support, Python is a popular language for building serverless applications. The Python 3.11 release includes several new language features, including:

  • Variadic generics (PEP 646): Python 3.11 introduces TypeVarTuple, enabling parameterization with an arbitrary number of types.
  • Marking individual TypedDict items as required or not-required (PEP 655): The introduction of Required and NotRequired in TypedDict allows for explicit marking of individual item requirements, eliminating the need for inheritance.
  • Self type (PEP 673): The Self annotation simplifies the annotation of methods returning an instance of their class, similar to TypeVar in PEP 484.
  • Arbitrary literal string type (PEP 675): The LiteralString annotation allows a function parameter to accept any literal string type, including strings created from literals.
  • Data class transforms (PEP 681): The @dataclass_transform() decorator enables objects to utilize runtime transformations for dataclass-like functionalities.
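
The following short snippet (illustrative only) shows three of these typing features working together; it runs as-is on Python 3.11:

from typing import NotRequired, Required, Self, TypedDict

class Movie(TypedDict):
    title: Required[str]    # must always be present (PEP 655)
    year: NotRequired[int]  # may be omitted (PEP 655)

class QueryBuilder:
    def where(self, clause: str) -> Self:
        # Self (PEP 673): subclasses calling where() keep their own type
        self.clause = clause
        return self

movie: Movie = {"title": "Inception"}  # valid: year is NotRequired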

For the full list of Python 3.11 changes, see the Python 3.11 release notes.

Change in pre-installed modules location and search path

Previously, Lambda base container images for Python included the /var/runtime directory before the /var/lang/lib/python3.x directory in the search path. This meant that packages in /var/runtime were loaded in preference to packages installed via pip into /var/lang/lib/python3.x. Since the AWS SDK for Python (boto3/botocore) was pre-installed into /var/runtime, this made it harder for base container image customers to upgrade the SDK version.

With the Python 3.11 runtime, the AWS SDK and its dependencies are now pre-installed into the /var/lang/lib/python3.11 directory, and the search path has been modified so this directory has precedence over /var/runtime. This change means customers who build and deploy Lambda functions using the Python 3.11 base container image can now override the SDK simply by running pip install on a newer version. This change also enables pip to verify and track that the pre-installed SDK and its dependencies are compatible with any customer-installed packages.

This is the default sys.path before Python 3.11 (where X.Y is the Python major.minor version):

  • /var/task/: User Function
  • /opt/python/lib/pythonX.Y/site-packages/: User Layer
  • /opt/python/: User Layer
  • /var/runtime/: Pre-installed modules
  • /var/lang/lib/pythonX.Y/site-packages/: Default pip install location

Here is the default sys.path starting from Python 3.11:

  • /var/task/: User Function
  • /opt/python/lib/pythonX.Y/site-packages/: User Layer
  • /opt/python/: User Layer
  • /var/lang/lib/pythonX.Y/site-packages/: Pre-installed modules and default pip install location
  • /var/runtime/: No pre-installed modules
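
One way to confirm which copy of the SDK your function loads is to return the module path from a test function. The path in the comment below is what we would expect on the Python 3.11 runtime, not a guaranteed value:

import boto3

def handler(event, context):
    # On Python 3.11, this should resolve under /var/lang/lib/python3.11/site-packages
    # rather than /var/runtime
    return {"boto3_version": boto3.__version__, "boto3_path": boto3.__file__}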

Using Python 3.11 in Lambda

AWS Management Console

To use the Python 3.11 runtime to develop your Lambda functions, specify a runtime parameter value of Python 3.11 when creating or updating a function. The Python 3.11 version is now available in the Runtime dropdown on the Create function page.

Create function

To update an existing Lambda function to Python 3.11, navigate to the function in the Lambda console, then choose Edit in the Runtime settings panel. The new version of Python is available in the Runtime dropdown:

Edit function

AWS Lambda – Container Image

Change the Python base image version by modifying the FROM statement in your Dockerfile:

FROM public.ecr.aws/lambda/python:3.11
# Copy function code
COPY lambda_handler.py ${LAMBDA_TASK_ROOT}
# Set the handler (module.function) as the container command
CMD [ "lambda_handler.handler" ]

To learn more, refer to the usage tab on building functions as container images.

AWS Serverless Application Model (AWS SAM)

In AWS SAM, set the Runtime attribute to python3.11 to use this version.

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Simple Lambda Function
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Description: My Python Lambda Function
      CodeUri: my_function/
      Handler: lambda_function.lambda_handler
      Runtime: python3.11

AWS SAM supports generating this template with Python 3.11 out of the box for new serverless applications using the sam init command. Refer to the AWS SAM documentation for more details.
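
For example, the following command (the application name is illustrative) scaffolds a Python 3.11 Hello World project:

sam init --name my-python-app --runtime python3.11 --dependency-manager pip --app-template hello-world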

AWS Cloud Development Kit (AWS CDK)

In the AWS CDK, set the runtime attribute to Runtime.PYTHON_3_11 to use this version. In Python:

from constructs import Construct
from aws_cdk import (
    App, Stack,
    aws_lambda as _lambda
)

class SampleLambdaStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        base_lambda = _lambda.Function(
            self, 'SampleLambda',
            handler='lambda_handler.handler',
            runtime=_lambda.Runtime.PYTHON_3_11,
            code=_lambda.Code.from_asset('lambda')
        )

In TypeScript:

import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda'
import * as path from 'path';
import { Construct } from 'constructs';

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The code that defines your stack goes here

    // The python3.11 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, 'python311LambdaFunction', {
      runtime: lambda.Runtime.PYTHON_3_11,
      memorySize: 512,
      code: lambda.Code.fromAsset(path.join(__dirname, '/../lambda')),
      handler: 'lambda_handler.handler'
    })
  }
}

Conclusion

You can build and deploy functions using Python 3.11 using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of Infrastructure as Code (IaC). You can also use the Python 3.11 container base image if you prefer to build and deploy your functions using container images.

We are excited to bring Python 3.11 runtime support to Lambda and empower developers to build more efficient, powerful, and scalable serverless applications. Try the Python 3.11 runtime in Lambda today to experience the benefits of this updated language version and take advantage of improved performance and new language features.

For more serverless learning resources, visit Serverless Land.

Introducing the vector engine for Amazon OpenSearch Serverless, now in preview

Post Syndicated from Pavani Baddepudi original https://aws.amazon.com/blogs/big-data/introducing-the-vector-engine-for-amazon-opensearch-serverless-now-in-preview/

We are pleased to announce the preview release of the vector engine for Amazon OpenSearch Serverless. The vector engine provides a simple, scalable, and high-performing similarity search capability in Amazon OpenSearch Serverless that makes it easy for you to build modern machine learning (ML) augmented search experiences and generative artificial intelligence (AI) applications without having to manage the underlying vector database infrastructure. This post summarizes the features and functionalities of our vector engine.

Using augmented ML search and generative AI with vector embeddings

Organizations across all verticals are rapidly adopting generative AI for its ability to handle vast datasets, generate automated content, and provide interactive, human-like responses. Customers are exploring ways to transform the end-user experience and interaction with their digital platform by integrating advanced conversational generative AI applications such as chatbots, question and answer systems, and personalized recommendations. These conversational applications enable you to search and query in natural language and generate responses that closely resemble human-like responses by accounting for the semantic meaning, user intent, and query context.

ML-augmented search applications and generative AI applications use vector embeddings, which are numerical representations of text, image, audio, and video data to generate dynamic and relevant content. The vector embeddings are trained on your private data and represent the semantic and contextual attributes of the information. Ideally, these embeddings can be stored and managed close to your domain-specific datasets, such as within your existing search engine or database. This enables you to process a user’s query to find the closest vectors and combine them with additional metadata without relying on external data sources or additional application code to integrate the results. Customers want a vector database option that is simple to build on and enables them to move quickly from prototyping to production so they can focus on creating differentiated applications. The vector engine for OpenSearch Serverless extends OpenSearch’s search capabilities by enabling you to store, search, and retrieve billions of vector embeddings in real time and perform accurate similarity matching and semantic searches without having to think about the underlying infrastructure.
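
To make similarity search concrete, the following minimal sketch (illustrative only; real embeddings typically have hundreds of dimensions and come from an embedding model) ranks documents by cosine similarity to a query vector:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means the vectors point in the same direction (most similar)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.9, 0.1, 0.0])        # embedding of the user's query
docs = {
    "doc-a": np.array([0.8, 0.2, 0.1]),  # embeddings of stored documents
    "doc-b": np.array([0.0, 0.9, 0.4]),
}
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # ['doc-a', 'doc-b']: doc-a's embedding is closest to the query

The vector engine performs this kind of ranking at the scale of billions of vectors, using approximate kNN rather than the exhaustive comparison shown here.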

Exploring the vector engine’s capabilities

Built on OpenSearch Serverless, the vector engine inherits and benefits from its robust architecture. With the vector engine, you don’t have to worry about sizing, tuning, and scaling the backend infrastructure. The vector engine automatically adjusts resources by adapting to changing workload patterns and demand to provide consistently fast performance and scale. As the number of vectors grows from a few thousand during prototyping to hundreds of millions and beyond in production, the vector engine will scale seamlessly, without the need for reindexing or reloading your data to scale your infrastructure. Additionally, the vector engine has separate compute for indexing and search workloads, so you can seamlessly ingest, update, and delete vectors in real time while ensuring that the query performance your users experience remains unaffected. All the data is persisted in Amazon Simple Storage Service (Amazon S3), so you get the same data durability guarantees as Amazon S3 (eleven nines). Even though we are still in preview, the vector engine is designed for production workloads with redundancy for Availability Zone outages and infrastructure failures.

The vector engine for OpenSearch Serverless is powered by the k-nearest neighbor (kNN) search feature in the open-source OpenSearch Project, proven to deliver reliable and precise results. Many customers today use OpenSearch kNN search in managed clusters to offer semantic search and personalization in their applications. With the vector engine, you get the same functionality with the simplicity of a serverless environment. The vector engine supports popular distance metrics such as Euclidean, cosine similarity, and dot product, and can accommodate up to 16,000 dimensions, making it well-suited to support a wide range of foundational and other AI/ML models. You can also store diverse fields with various data types such as numeric, boolean, date, keyword, and geopoint for metadata, and text for descriptive information to add more context to the stored vectors. Colocating the data types reduces complexity and maintenance effort and avoids data duplication, version compatibility challenges, and licensing issues, effectively simplifying your application stack. Because the vector engine supports the same OpenSearch open-source suite APIs, you can take advantage of its rich query capabilities, such as full-text search, advanced filtering, aggregations, geospatial queries, and nested queries for faster retrieval of data and enhanced search results. For example, if your use case requires you to find results within 15 miles of the requestor, the vector engine can do this in a single query, eliminating the need to maintain two different systems and then combine the results through application logic. With its integrations with LangChain, Amazon Bedrock, and Amazon SageMaker, you can easily connect your preferred ML and AI systems to the vector engine.

The vector engine supports a wide range of use cases across various domains, including image search, document search, music retrieval, product recommendation, video search, location-based search, fraud detection, and anomaly detection. We also anticipate a growing trend for hybrid searches that combine lexical search methods with advanced ML and generative AI capabilities. For example, when a user searches for a “red shirt” on your e-commerce website, semantic search helps expand the scope by retrieving all shades of red, while preserving the tuning and boosting logic implemented on the lexical (BM25) search. With OpenSearch filtering, you can further enhance the relevance of your search results by providing users with options to refine their search based on size, brand, price range, and availability in nearby stores, allowing for a more personalized and precise experience. The hybrid search support in the vector engine enables you to query vector embeddings, metadata, and descriptive information within a single query call, making it easy to provide more accurate and contextually relevant search results without building complex application code.

You can get started in minutes with the vector engine by creating a specialized vector search collection under OpenSearch Serverless using the AWS Management Console, AWS Command Line Interface (AWS CLI), or the AWS software development kit (AWS SDK). Collections are a logical grouping of indexed data that works together to support a workload, while the physical resources are automatically managed in the backend. You don’t have to declare how much compute or storage is needed or monitor the system to make sure it’s running well. OpenSearch Serverless applies different sharding and indexing strategies for the three available collection types: time series, search, and vector search. The vector engine’s compute capacity used for data ingestion, search, and query is measured in OpenSearch Compute Units (OCUs). One OCU can handle 4 million vectors at 128 dimensions, or 500K vectors at 768 dimensions, at a 99% recall rate. The vector engine is built on OpenSearch Serverless, which is a highly available service and requires a minimum of 4 OCUs (two OCUs for ingest, including primary and standby, and two OCUs for search, with two active replicas across Availability Zones) for the first collection in an account. All subsequent collections using the same AWS Key Management Service (AWS KMS) key can share those OCUs.

Get started with vector embeddings

To get started using vector embeddings using the console, complete the following steps:

  1. Create a new collection on the OpenSearch Serverless console.
  2. Provide a name and optional description.
  3. Currently, vector embeddings are supported exclusively by vector search collections; therefore, for Collection type, select Vector search.
  4. Next, you must configure the security policies, which include encryption, network, and data access policies.

We are introducing the new Easy create option, which streamlines the security configuration for faster onboarding. All the data in the vector engine is encrypted in transit and at rest by default. You can choose to bring your own encryption key or use the one provided by the service that is dedicated for your collection or account. You can choose to host your collection on a public endpoint or within a VPC. The vector engine supports fine-grained AWS Identity and Access Management (IAM) permissions so that you can define who can create, update, and delete encryption, network, collections, and indexes, thereby enabling organizational alignment.
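
If you prefer scripting over the console, the AWS CLI can create the same type of collection (the collection name is illustrative); the encryption, network, and data access policies described above still need to be configured:

aws opensearchserverless create-collection --name property-listings --type VECTORSEARCH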

  5. With the security settings in place, you can finish creating the collection.

After the collection is successfully created, you can create the vector index. At this point, you can use the API or the console to create an index. An index is a collection of documents with a common data schema and provides a way for you to store, search, and retrieve your vector embeddings and other fields. The vector index supports up to 1,000 fields.

  6. To create the vector index, you must define the vector field name, dimensions, and the distance metric.

The vector index supports up to 16,000 dimensions and three types of distance metrics: Euclidean, cosine, and dot product.

Once you have successfully created the index, you can use OpenSearch’s powerful query capabilities to get comprehensive search results.

The following example shows how easily you can create a simple property-listing index with title, description, price, and location fields using the OpenSearch API. Using the query APIs, this index can efficiently return accurate results for search requests such as “Find me a two-bedroom apartment in Seattle that is under $3000.”
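
A sketch of such an index creation request (field names, dimension, and method parameters are illustrative; knn_vector is the OpenSearch k-NN field type) might look like this:

PUT /property-listings-index
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "title":       { "type": "text" },
      "description": { "type": "text" },
      "price":       { "type": "float" },
      "location":    { "type": "geo_point" },
      "description_vector": {
        "type": "knn_vector",
        "dimension": 768,
        "method": { "name": "hnsw", "space_type": "cosinesimil", "engine": "nmslib" }
      }
    }
  }
}

A query for the apartment example above would then combine a kNN clause on description_vector with range and geo filters on price and location in a single request.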

From preview to GA and beyond

Today, we are excited to announce the preview of the vector engine, making it available for you to begin testing it out immediately. As we noted earlier, OpenSearch Serverless was designed to provide a highly available service to power your enterprise applications, with independent compute resources for index and search and built-in redundancy.

We recognize that many of you are in the experimentation phase and would like a more economical option for dev-test. Prior to GA, we plan to offer two features that will enable us to reduce the cost of your first collection. The first is a new dev-test option that enables you to launch a collection with no active standby or replica, reducing the entry cost by 50%. The vector engine still provides durability guarantees because it persists all the data in Amazon S3. The second is to initially provision a 0.5 OCU footprint, which will scale up as needed to support your workload, further lowering costs if your initial workload is in the tens of thousands to low-hundreds of thousands of vectors (depending on the number of dimensions). Between these two features, we will reduce the minimum OCUs needed to power your first collection from 4 OCUs down to 1 OCU per hour.

We are also working on features that will allow us to achieve workload pause and resume capabilities in the coming months, which is particularly useful for the vector engine because many of these use cases don’t require continuous indexing of the data.

Lastly, we are diligently focused on optimizing the performance and memory usage of the vector graphs, including improvements to caching, merging, and more.

While we work on these cost reductions, we will be offering the first 1400 OCU-hours per month free on vector collections until the dev-test option is made available. This will enable you to test the vector engine preview for up to two weeks every month at no cost, based on your workload.

Summary

The vector engine for OpenSearch Serverless introduces a simple, scalable, and high-performing vector storage and search capability that makes it straightforward for you to quickly store and query billions of vector embeddings generated from a variety of ML models, such as those provided by Amazon Bedrock, with response times in milliseconds.

The preview release of vector engine for OpenSearch Serverless is now available in eight Regions globally: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland).

We are excited about the future ahead, and your feedback will play a vital role in guiding the progress of this product. We encourage you to try out the vector engine for OpenSearch Serverless and share your use cases, questions, and feedback in the comments section.

In the coming weeks, we will publish a series of posts to provide you with detailed guidance on how to integrate the vector engine with LangChain, Amazon Bedrock, and SageMaker. To learn more about the vector engine’s capabilities, refer to our Getting Started with Amazon OpenSearch Serverless documentation.


About the authors

Pavani Baddepudi is a Principal Product Manager for Search Services at AWS and the lead PM for OpenSearch Serverless. Her interests include distributed systems, networking, and security. When not working, she enjoys hiking and exploring new cuisines.

Carl Meadows is Director of Product Management at AWS and is responsible for Amazon Elasticsearch Service, OpenSearch, Open Distro for Elasticsearch, and Amazon CloudSearch. Carl has been with Amazon Elasticsearch Service since before it was launched in 2015. He has a long history of working in the enterprise software and cloud services spaces. When not working, Carl enjoys making and recording music.

Migrating AWS Lambda functions from the Go1.x runtime to the custom runtime on Amazon Linux 2

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/migrating-aws-lambda-functions-from-the-go1-x-runtime-to-the-custom-runtime-on-amazon-linux-2/

This post is written by Micah Walter, Senior Solutions Architect, Yanko Bolanos, Senior Solutions Architect, and Ramesh Mathikumar, Senior DevOps Consultant.

This blog post describes our plans to improve performance and streamline the user experience for customers writing AWS Lambda functions using Go.

Today, customers using Go with Lambda can either use the go1.x runtime, or use the provided.al2 runtime. Going forward, we plan to deprecate the go1.x runtime in line with the end-of-life of Amazon Linux 1, currently scheduled for December 31, 2023.

Customers using the go1.x runtime should migrate their functions to the provided.al2 runtime to continue to benefit from the latest runtime updates and security patches. Customers who deploy Go functions using container images who are currently using the go1.x base container image should similarly migrate to the provided.al2 base image.

Using the provided.al2 runtime offers several benefits over the go1.x runtime. First, it supports running Lambda functions on AWS Graviton2 processors, offering up to 34% better price-performance compared to functions running on x86_64 processors. Second, it offers a streamlined implementation with a smaller deployment package and faster function invoke path. Finally, this change aligns Go with other languages that also compile to native code such as Rust or C++, which also run on the provided.al2 runtime.

This migration does not require any code changes. The only changes relate to how you build your deployment package and configure your function. This blog post outlines the steps required to update your build scripts and tooling to use the provided.al2 runtime for your Go functions.

There is a difference in Lambda billing between the go1.x runtime and the provided.al2 runtime. With the go1.x runtime, Lambda does not bill for time spent during function initialization (cold start), whereas with the provided.al2 runtime Lambda includes function initialization time in the billed function duration. Since Go functions typically initialize very quickly, and since Lambda reduces the number of initializations by re-using function execution environments for multiple function invokes, in practice the difference in your Lambda bill should be very small.

Compiling for the provided.al2 runtime

In order to run a compiled Go application on Lambda, you must compile your code for Linux. While the go1.x runtime allows you to use any executable name, the provided.al2 runtime requires you to use bootstrap as the executable name. On macOS and Linux, here’s the simplest form of the build command:

GOARCH=amd64 GOOS=linux go build -o bootstrap main.go

This build command creates a Go binary file called bootstrap compatible with the x86_64 instruction set for Lambda. To compile for AWS Graviton processors, set GOARCH=arm64 in the preceding command.

The final step is to compress this binary into a ZIP file deployment package, ready to deploy to Lambda:

zip myFunction.zip bootstrap

For users compiling on Windows, Go supports compiling for Linux without using a Linux virtual machine or build container. However, Lambda uses POSIX file permissions, which must be set correctly. Lambda provides a helper tool which builds a deployment package that is valid for Lambda—see the Lambda documentation for details. Existing users of this tool should update to the latest version to make sure their build scripts are up-to-date.

Removing the RPC dependency

The go1.x runtime uses two processes within the Lambda execution environment to route requests to your handler function. The first process, which is included in the runtime, retrieves function invocation requests from the Lambda runtime API, and uses RPC to pass the invoke to the second process. This second process runs the executable which you deploy, and comprises the aws-lambda-go package and your function code. The aws-lambda-go package receives the RPC request and executes your function.

The following runtime architecture diagram for the go1.x runtime shows the runtime client process calling the runtime API to retrieve a function invocation and using RPC to call a separate process containing the function code.

Execution environment

Go functions deployed to the provided.al2 runtime use a simpler, single-process architecture. When building the Go executable, you include the same aws-lambda-go package as before. However, in this case the aws-lambda-go package acts as the runtime client, retrieving invocation requests from the runtime API directly.

The following runtime architecture diagram shows a Go function running on the provided.al2 runtime. A single process retrieves the function invocation from the runtime API and executes the function code.

Go running on provided.al2

Removing the additional process and RPC hop streamlines the function execution path, resulting in faster invokes. You can also remove the RPC component from the aws-lambda-go package, giving a smaller binary size and faster code loading during cold starts. To remove the RPC dependency, add the lambda.norpc tag to your build command:

GOARCH=amd64 GOOS=linux go build -tags lambda.norpc -o bootstrap main.go

Creating a new Lambda function

Once your deployment package is ready, you can create a new Lambda function with the provided.al2 runtime in the Lambda console:

Creating in the Lambda console

Migrating existing functions

If you have existing Lambda functions that use the go1.x runtime, you can migrate these functions by following these steps:

  1. Recompile your binary using the preceding commands, making sure to name your binary bootstrap.
  2. If you are using the same instruction set architecture, open the runtime settings and switch the runtime to “Provide your own bootstrap on Amazon Linux 2”.
  3. Upload the new version of your binary as a zip file.
    Edit runtime settings

Note: The handler value is not used by the provided.al2 runtime or the aws-lambda-go library, and may be set to any value. We recommend setting the value to bootstrap to help with migrating between go1.x and provided.al2.

To switch the instruction set architecture to Graviton (arm64), save your changes, and then re-open the runtime settings to make the architecture change.
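
If you manage functions with the AWS CLI, the same migration can be scripted. The following is a sketch (function and file names are assumptions):

# Recompile with the required executable name and repackage
GOARCH=amd64 GOOS=linux go build -tags lambda.norpc -o bootstrap main.go
zip myFunction.zip bootstrap

# Switch the runtime and handler, then upload the new package
aws lambda update-function-configuration --function-name my-function \
    --runtime provided.al2 --handler bootstrap
aws lambda update-function-code --function-name my-function \
    --zip-file fileb://myFunction.zip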

Migrating Go1.x Lambda container images

Lambda allows you to run your Go code as a container image. Customers that are using the go1.x base image for Lambda containers must migrate to the provided.al2 base image. The Lambda documentation includes instructions on how to build and deploy Go functions using the provided.al2 base image.

The following Dockerfile uses a two-stage build to avoid unnecessary layers and files in your final image. The first stage of the process builds the application. This stage installs Go, downloads the dependencies for the code, and compiles the binary. The second stage copies the executable to a new container without the dependencies of the build process.

  1. Create a Dockerfile in your project directory:
    FROM public.ecr.aws/lambda/provided:al2 as build
    # install compiler
    RUN yum install -y golang
    RUN go env -w GOPROXY=direct
    # cache dependencies
    ADD go.mod go.sum ./
    RUN go mod download
    # build
    ADD . .
    RUN go build -tags lambda.norpc -o /main
    # copy artifacts to a clean image
    FROM public.ecr.aws/lambda/provided:al2
    COPY --from=build /main /main
    ENTRYPOINT [ "/main" ]           
    
  2. Build your Docker image with the Docker build command:
    docker build -t hello-world .
  3. Authenticate the Docker CLI to your Amazon ECR registry:
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com 
  4. Tag and push your image to the Amazon ECR registry:
    docker tag hello-world:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/hello-world:latest
    
    docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/hello-world:latest

You can now create or update your Go Lambda function to use the new container image.

Changes to tooling

To migrate your functions from the go1.x runtime to the provided.al2 runtime, you must make configuration changes to your build scripts or CI/CD configurations. Here are some common examples.

Makefiles and build scripts

If you use Makefiles or custom build scripts to build Go functions, you must modify them to ensure the executable file is named bootstrap when deploying to the provided.al2 runtime.

Here is an example Makefile target which compiles the main.go file into an executable called bootstrap in the bin folder. It also creates a zip file, which you can deploy to Lambda using the console or via the AWS CLI.

build:
	GOARCH=arm64 GOOS=linux go build -tags lambda.norpc -o ./bin/bootstrap
	(cd bin && zip -FS bootstrap.zip bootstrap)

CloudFormation

If you deploy your Lambda functions using AWS CloudFormation templates, change the Handler and Runtime settings under Properties:

Resources:
  Function:
    Type: AWS::Serverless::Function
    Properties:
      Handler: bootstrap
      Runtime: provided.al2
      ... # Other required properties

AWS Serverless Application Model

If you use the AWS Serverless Application Model (AWS SAM) to build and deploy your Go functions, make the same changes to the Handler and Runtime settings as for CloudFormation. You must also add the BuildMethod:

Resources:
  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Metadata:
      BuildMethod: go1.x 
    Properties:
      CodeUri: hello-world/ # folder where your main program resides
      Handler: bootstrap
      Runtime: provided.al2
      Architectures:
        - x86_64

Cloud Development Kit (CDK)

If you use the AWS Cloud Development Kit (AWS CDK), you can compile your Go executable and place it under a folder in your project. Next, specify the location by using awslambda.Code_FromAsset, and AWS CDK packages the binary into a zip file and uploads it.

// Go CDK
awslambda.NewFunction(stack, jsii.String("HelloHandler"), &awslambda.FunctionProps{
    Code:         awslambda.Code_FromAsset(jsii.String("lambda"), nil), //folder where bootstrap executable is located
    Runtime:      awslambda.Runtime_PROVIDED_AL2(),
    Handler:      jsii.String("bootstrap"), // Handler named bootstrap
    Architecture: awslambda.Architecture_ARM_64(),
})

Taking this further, AWS CDK can perform build commands as part of your AWS CDK build process by using the native AWS CDK bundling functionality. With the bundling parameter, AWS CDK can perform steps before staging the files in the cloud assembly. Instead of placing the binary file, place the Go code in a folder and use the Bundling option to compile the code in a Docker container.

This example uses the golang:1.20.1 Docker image. After compilation, AWS CDK creates a zip file with the binary and creates the Lambda function:

// Go CDK
awslambda.NewFunction(stack, jsii.String("HelloHandler"), &awslambda.FunctionProps{
        Code: awslambda.Code_FromAsset(jsii.String("go-lambda"), &awss3assets.AssetOptions{
                Bundling: &awscdk.BundlingOptions{
                        Image: awscdk.DockerImage_FromRegistry(jsii.String("golang:1.20.1")),
                        Command: &[]*string{
                                jsii.String("bash"),
                                jsii.String("-c"),
                                jsii.String("GOCACHE=/tmp go mod tidy && GOCACHE=/tmp GOARCH=arm64 GOOS=linux go build -tags lambda.norpc -o /asset-output/bootstrap"),
                        },
                },
        }),
        Runtime:      awslambda.Runtime_PROVIDED_AL2(),
        Handler:      jsii.String("bootstrap"),
        Architecture: awslambda.Architecture_ARM_64(),
})

Conclusion

Lambda is deprecating the go1.x runtime in line with Amazon Linux 1 end-of-life, scheduled for December 31, 2023. Customers using Go with Lambda should migrate their functions to the provided.al2 runtime. Benefits include support for AWS Graviton2 processors with better price-performance, and a streamlined invoke path with faster performance.

For more serverless learning resources, visit Serverless Land.

Alcion supports their multi-tenant platform with Amazon OpenSearch Serverless

Post Syndicated from Zack Rossman original https://aws.amazon.com/blogs/big-data/alcion-supports-their-multi-tenant-platform-with-amazon-opensearch-serverless/

This is a guest blog post co-written with Zack Rossman from Alcion.

Alcion, a security-first, AI-driven backup-as-a-service (BaaS) platform, helps Microsoft 365 administrators quickly and intuitively protect data from cyber threats and accidental data loss. In the event of data loss, Alcion customers need to search metadata for the backed-up items (files, emails, contacts, events, and so on) to select specific item versions to restore. Alcion uses Amazon OpenSearch Service to provide their customers with accurate, efficient, and reliable search capability across this backup catalog. The platform is multi-tenant, which means that Alcion requires data isolation and strong security to ensure that tenants can only search their own data.

OpenSearch Service is a fully managed service that makes it easy to deploy, scale, and operate OpenSearch in the AWS Cloud. OpenSearch is an Apache-2.0-licensed, open-source search and analytics suite, comprising OpenSearch (a search and analytics engine and vector database), OpenSearch Dashboards (a visualization and utility user interface), and plugins that provide advanced capabilities like enterprise-grade security, anomaly detection, observability, alerting, and much more. Amazon OpenSearch Serverless is a serverless deployment option that makes it simple to use OpenSearch without configuring, managing, and scaling OpenSearch Service domains.

In this post, we share how adopting OpenSearch Serverless enabled Alcion to meet their scale requirements, reduce their operational overhead, and secure their tenants’ data by enforcing tenant isolation within their multi-tenant environment.

OpenSearch Service managed domains

For the first iteration of their search architecture, Alcion chose the managed domains deployment option in OpenSearch Service and was able to launch their search functionality in production in less than a month. To meet their security, scale, and tenancy requirements, they stored data for each tenant in a dedicated index and used fine-grained access control in OpenSearch Service to prevent cross-tenant data leaks. As their workload evolved, Alcion engineers tracked OpenSearch domain utilization via the provided Amazon CloudWatch metrics, making changes to increase storage and optimize their compute resources.

The team at Alcion used several features of OpenSearch Service managed domains to improve their operational stance. They introduced index aliases, which provide a single alias name to access (read and write) multiple underlying indexes. They also configured Index State Management (ISM) policies to help them control their data lifecycle by rolling indexes over based on index size. Together, the ISM policies and index aliases were necessary to scale indexes for large tenants. Alcion also used index templates to define the shards per index (partitioning) of their data so as to automate their data lifecycle and improve the performance and stability of their domains.
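
For reference, an ISM rollover policy of the kind described here might look similar to the following sketch (the size threshold and index pattern are illustrative):

{
  "policy": {
    "description": "Roll over tenant indexes once they reach 30 GB",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [
          { "rollover": { "min_size": "30gb" } }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": ["tenant-index-*"]
    }
  }
}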

The following architecture diagram shows how Alcion configured their OpenSearch managed domains.

The following diagram shows how Microsoft 365 data was indexed to and queried from tenant-specific indexes. Alcion implemented request authentication by providing the OpenSearch primary user credentials with each API request.

OpenSearch Serverless overview and tenancy model options

OpenSearch Service managed domains provided a stable foundation for Alcion’s search functionality, but the team needed to manually provision resources to the domains for their peak workload. This left room for cost optimizations because Alcion’s workload is bursty—there are large variations in the number of search and indexing transactions per second, both for a single customer and taken as a whole. To reduce costs and operational burden, the team turned to OpenSearch Serverless, which offers auto-scaling capability.

To use OpenSearch Serverless, the first step is to create a collection. A collection is a group of OpenSearch indexes that work together to support a specific workload or use case. The compute resources for a collection, called OpenSearch Compute Units (OCUs), are shared across all collections in an account that share an encryption key. The pool of OCUs is automatically scaled up and down to meet the demands of indexing and search traffic.

The level of effort required to migrate from an OpenSearch Service managed domain to OpenSearch Serverless was manageable thanks to the fact that OpenSearch Serverless collections support the same OpenSearch APIs and libraries as OpenSearch Service managed domains. This allowed Alcion to focus on optimizing the tenancy model for the new search architecture. Specifically, the team needed to decide how to partition tenant data within collections and indexes while balancing security and total cost of ownership. Alcion engineers, in collaboration with the OpenSearch Serverless team, considered three tenancy models:

  • Silo model: Create a collection for each tenant
  • Pool model: Create a single collection and use a single index for multiple tenants
  • Bridge model: Create a single collection and use a single index per tenant

All three design choices had benefits and trade-offs that had to be considered for designing the final solution.

Silo model: Create a collection for each tenant

In this model, Alcion would create a new collection whenever a new customer onboarded to their platform. Although tenant data would be cleanly separated between collections, this option was disqualified because the collection creation time meant that customers wouldn’t be able to back up and search data immediately after registration.

Pool model: Create a single collection and use a single index for multiple tenants

In this model, Alcion would create a single collection per AWS account and index tenant-specific data in one of many shared indexes belonging to that collection. Initially, pooling tenant data into shared indexes was attractive from a scale perspective because this led to the most efficient use of index resources. But after further analysis, Alcion found that they would be well within the per-collection index quota even if they allocated one index for each tenant. With that scalability concern resolved, Alcion pursued the third option because siloing tenant data into dedicated indexes results in stronger tenant isolation than the shared index model.

Bridge model: Create a single collection and use a single index per tenant

In this model, Alcion would create a single collection per AWS account and create an index for each of the hundreds of tenants managed by that account. By assigning each tenant to a dedicated index and pooling these indexes in a single collection, Alcion reduced onboarding time for new tenants and siloed tenant data into cleanly separated buckets.

Implementing role-based access control for supporting multi-tenancy

OpenSearch Serverless offers a multi-point, inheritable set of security controls, covering data access, network access, and encryption. Alcion took full advantage of OpenSearch Serverless data access policies to implement role-based access control (RBAC) for each tenant-specific index with the following details:

  • Allocate an index with a common prefix and the tenant ID (for example, index-v1-<tenantID>)
  • Create a dedicated AWS Identity and Access Management (IAM) role that is used to sign requests to the OpenSearch Serverless collection
  • Create an OpenSearch Serverless data access policy that grants document read/write permissions within a dedicated tenant index to the IAM role for that tenant
  • OpenSearch API requests to a tenant index are signed with temporary credentials belonging to the tenant-specific IAM role

The following is an example OpenSearch Serverless data access policy for a mock tenant with ID t-eca0acc1-12345678910. This policy grants the IAM role document read/write access to the dedicated tenant index.

[
    {
        "Rules": [
            {
                "Resource": [
                    "index/collection-searchable-entities/index-v1-t-eca0acc1-12345678910"
                ],
                "Permission": [
                    "aoss:ReadDocument",
                    "aoss:WriteDocument",
                ],
                "ResourceType": "index"
            }
        ],
        "Principal": [
            "arn:aws:iam::12345678910:role/OpenSearchAccess-t-eca0acc1-1b9f-4b3f-95d6-12345678910"
        ],
        "Description": "Allow document read/write access to OpenSearch index belonging to tenant t-eca0acc1-1b9f-4b3f-95d6-12345678910"
    }
] 

The following architecture diagram depicts how Alcion implemented indexing and searching for Microsoft 365 resources using the OpenSearch Serverless shared collection approach.

The following is a sample code snippet for sending a search request to an OpenSearch Serverless collection. Notice how the API client is initialized with a signer object that signs requests with the same IAM principal that is referenced in the OpenSearch Serverless data access policy from the previous code snippet.

package alcion

import (
    "context"
    "encoding/json"
    "strings"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/credentials/stscreds"
    "github.com/aws/aws-sdk-go-v2/service/sts"
    "github.com/opensearch-project/opensearch-go/v2"
    "github.com/opensearch-project/opensearch-go/v2/opensearchapi"
    "github.com/opensearch-project/opensearch-go/v2/signer"
    awssignerv2 "github.com/opensearch-project/opensearch-go/v2/signer/awsv2"
    "github.com/pkg/errors"
)

const (
    // Scope the API request to the AWS OpenSearch Serverless service
    aossService = "aoss"

    // Mock values
    indexPrefix        = "index-v1-"
    collectionEndpoint = "https://kfbr3928z4y6vot2mbpb.us-east-1.aoss.amazonaws.com"
    mockTenantID       = "t-eca0acc1-1b9f-4b3f-95d6-b0b96b8c03d0"
    roleARN            = "arn:aws:iam::1234567890:role/OpenSearchAccess-t-eca0acc1-1b9f-4b3f-95d6-b0b96b8c03d0"
)

// SearchTenantIndex runs a search against the tenant's dedicated index. The
// request is signed with the tenant-specific IAM role, so the data access
// policy only permits access to that tenant's index.
func SearchTenantIndex(ctx context.Context, tenantID string) (*opensearchapi.Response, error) {

    sig, err := createRequestSigner(ctx)
    if err != nil {
        return nil, errors.Wrapf(err, "error creating new signer for AWS OSS")
    }

    cfg := opensearch.Config{
        Addresses: []string{collectionEndpoint},
        Signer:    sig,
    }

    aossClient, err := opensearch.NewClient(cfg)
    if err != nil {
        return nil, errors.Wrapf(err, "error creating new OpenSearch API client")
    }

    body, err := getSearchBody()
    if err != nil {
        return nil, errors.Wrapf(err, "error getting search body")
    }

    req := opensearchapi.SearchRequest{
        Index: []string{indexPrefix + tenantID},
        Body:  body,
    }

    return req.Do(ctx, aossClient)
}

func createRequestSigner(ctx context.Context) (signer.Signer, error) {

    awsCfg, err := config.LoadDefaultConfig(ctx)
    if err != nil {
        return nil, errors.Wrapf(err, "error loading default config")
    }

    // Sign requests with temporary credentials for the tenant-specific role
    stsClient := sts.NewFromConfig(awsCfg)
    provider := stscreds.NewAssumeRoleProvider(stsClient, roleARN)

    awsCfg.Credentials = aws.NewCredentialsCache(provider)
    return awssignerv2.NewSignerWithService(awsCfg, aossService)
}

func getSearchBody() (*strings.Reader, error) {
    // No explicit query clause, so OpenSearch matches all documents; page size = 10
    query := map[string]interface{}{
        "size": 10,
    }

    queryJSON, err := json.Marshal(query)
    if err != nil {
        return nil, err
    }

    return strings.NewReader(string(queryJSON)), nil
}

Conclusion

In May of 2023, Alcion rolled out its search architecture based on the shared collection and dedicated index-per-tenant model in all production and pre-production environments. The team was able to tear out complex code and operational processes that had been dedicated to scaling OpenSearch Service managed domains. Furthermore, thanks to the auto scaling capabilities of OpenSearch Serverless, Alcion has reduced their OpenSearch costs by 30% and expects the cost profile to scale favorably.

In their journey from managed to serverless OpenSearch Service, Alcion benefited from their initial choice of OpenSearch Service managed domains: when migrating, they were able to reuse the same OpenSearch APIs and libraries for their OpenSearch Serverless collections that they had used for their managed domain. Additionally, they updated their tenancy model to take advantage of OpenSearch Serverless data access policies. With OpenSearch Serverless, they were able to effortlessly adapt to their customers’ scale needs while ensuring tenant isolation.

For more information about Alcion, visit their website.


About the Authors

Zack Rossman is a Member of Technical Staff at Alcion. He is the tech lead for the search and AI platforms. Prior to Alcion, Zack was a Senior Software Engineer at Okta, developing core workforce identity and access management products for the Directories team.

Niraj Jetly is a Software Development Manager for Amazon OpenSearch Serverless. Niraj leads several data plane teams responsible for launching Amazon OpenSearch Serverless. Prior to AWS, Niraj led several product and engineering teams as CTO, VP of Engineering, and Head of Product Management for over 15 years. Niraj is a recipient of over 15 innovation awards, including being named CIO of the year in 2014 and top 100 CIO in 2013 and 2016. A frequent speaker at several conferences, he has been quoted in NPR, WSJ, and The Boston Globe.

Jon Handler is a Senior Principal Solutions Architect at Amazon Web Services based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have search and log analytics workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon’s career as a software developer included 4 years of coding a large-scale, ecommerce search engine. Jon holds a Bachelor of the Arts from the University of Pennsylvania and a Master of Science and a PhD in Computer Science and Artificial Intelligence from Northwestern University.

Configure monitoring, limits, and alarms in Amazon Redshift Serverless to keep costs predictable

Post Syndicated from Satesh Sonti original https://aws.amazon.com/blogs/big-data/configure-monitoring-limits-and-alarms-in-amazon-redshift-serverless-to-keep-costs-predictable/

Amazon Redshift Serverless makes it simple to run and scale analytics in seconds. It automatically provisions and intelligently scales data warehouse compute capacity to deliver fast performance, and you pay only for what you use. Just load your data and start querying right away in the Amazon Redshift Query Editor or in your favorite business intelligence (BI) tool. Redshift Serverless measures data warehouse capacity in Redshift Processing Units (RPUs), and you can configure the base capacity anywhere from 8 to 512 RPUs. You can start with your preferred RPU capacity or the default and adjust it anytime later.

In this post, we share how you can monitor your workloads running on Redshift Serverless through three approaches: the Redshift Serverless console, Amazon CloudWatch, and system views. We also show how to set up guardrails via alerts and limits for Redshift Serverless to keep your costs predictable.

Method 1: Monitor through the Redshift Serverless console

You can view all user queries, including Data Manipulation Language (DML), Data Definition Language (DDL), and Data Control Language (DCL) statements, through the Redshift Serverless console. You can view the RPU consumption to run these workloads on the same page, and apply filters based on time, database, users, and type of queries.

Prerequisites for monitoring access

A superuser has access to monitor all workloads and resource consumption by default. If other users need monitoring access through the Redshift Serverless console, then the superuser can provide necessary access by performing the following steps:

  1. Create a policy with necessary privileges and assign this policy to required users or roles.
  2. Grant query monitoring permission to the user or role (see the sketch below).

For more information, refer to Granting access to monitor queries.
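
As a sketch of step 2, the grant can also be issued through the Amazon Redshift Data API using the system-defined sys:monitor role; the workgroup, database, and user names below are placeholders:

import boto3

client = boto3.client("redshift-data")

# Grant query monitoring permission to a regular user (placeholder names)
client.execute_statement(
    WorkgroupName="my-serverless-workgroup",
    Database="dev",
    Sql="GRANT ROLE sys:monitor TO analyst_user;",
)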

Query monitoring

In this section, we walk through the Redshift Serverless console to see query history, database performance, and resource usage. We also go through monitoring options and how to set filters to narrow down results using filter attributes.

  1. On the Redshift Serverless console, under Monitoring in the navigation pane, choose Query and database monitoring.
  2. Open the workgroup you want to monitor.
  3. In the Metric filters section, expand Additional filtering options.
  4. You can set filters for time range, aggregation time interval, database, query category, SQL, and users.

Query and database monitoring

Two tabs are available, Query history and Database performance. Use the Query history tab to obtain details at a per-query level, and the Database performance tab to review performance aggregated across queries. Both tabs are filtered based on the selections you made.

Under Query history, you will see the Query runtime graph. Use this graph to look into query concurrency (queries that are running in the same time frame). You can choose a query to view more query run details, for example, queries that took longer to run than you expected.

Query runtime monitoring dashboard

In the Queries and loads section, you can see all queries by default, but you can also filter by status to view completed, running, and failed queries.

Query history screen

Navigate to the Database Performance tab in the Query and database monitoring section to view the following:

  • Queries completed per second – Average number of queries completed per second
  • Queries duration – Average amount of time to complete a query
  • Database connections – Number of active database connections
  • Running and Queued queries – Total number of running and queued queries at a given point in time

Resource monitoring

To monitor your resources, complete the following steps:

  1. On the Redshift Serverless console, choose Resource monitoring under Monitoring in the navigation pane.

The default workgroup will be selected by default, but you can choose the workgroup you would like to monitor.

  2. In the Metric filters section, expand Additional filtering options.
  3. Choose a 1-minute time interval (for example) and review the results.

You can also try different ranges to see the results.

Screen to apply metric filters

On the RPU capacity used graph, you can see how Redshift Serverless is able to scale RPUs in a matter of minutes. This gives a visual representation of peaks and lows in your consumption over your chosen period of time.

RPU capacity consumption

You also see the actual compute usage in terms of RPU-seconds for the workload you ran.
RPU Seconds consumed

Method 2: Monitor metrics in CloudWatch

Redshift Serverless publishes serverless endpoint performance metrics to CloudWatch. The Amazon Redshift CloudWatch metrics are data points for operational monitoring. These metrics enable you to monitor performance of your serverless workgroups (compute) and usage of namespaces (data). CloudWatch allows you to centrally monitor your serverless endpoints in a single AWS account, as well as across accounts and Regions.

  • On the CloudWatch console, under Metrics in the navigation pane, choose All metrics.
  • On the Browse tab, choose AWS/Redshift-Serverless to get to a collection of metrics for Redshift Serverless usage.

Redshift Serverless in Amazon CloudWatch

  • Choose Workgroup to view workgroup-related metrics.

Workgroups and Namespaces

From the list, you can select your particular workgroup and the metrics available (in this example, ComputeSeconds and ComputeCapacity). You should see the graph update and chart your data.

Redshift Serverless Workgroup Metrics

  • To name the graph, choose the pencil icon next to the graph title and enter a graph name (for example, dataanalytics-serverless), then choose Apply.

Rename CloudWatch Graph

  • On the Browse tab, choose AWS/Redshift-Serverless and choose Namespace this time.
  • Select the namespace you want to monitor and the metrics of interest.

Redshift Serverless Namespace Metrics

You can add additional metrics to your graph. To centralize monitoring, you can add these metrics to an existing CloudWatch dashboard or a new dashboard.

  • On the Actions menu, choose Add to dashboard.

Redshift Serverless Namespace Metrics

Method 3: Granular monitoring using system views

System views in Redshift Serverless are used to monitor workload performance and RPU usage at a granular level over a period of time. These query monitoring system views have been simplified to include monitoring for DDL, DML, COPY, and UNLOAD queries. For a complete list of system views and their uses, refer to Monitoring views.

SQL Notebook

You can download a SQL notebook containing the most commonly used system view queries. These queries help answer the most frequently asked monitoring questions, listed below.

  • How to monitor queries based on status?
  • How to monitor specific query elapsed time breakdown details?
  • How to monitor workload breakdown by query count, and percentile run time?
  • How to monitor detailed steps involved in query execution?
  • How to monitor Redshift Serverless usage cost by day?
  • How to monitor data loads (copy commands)?
  • How to monitor number of sessions, and connections?

You can import this notebook into Query Editor V2.0 and run the queries after connecting to the Redshift Serverless workgroup you would like to monitor.
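
If you prefer to run these system view queries outside the notebook, you can use the Amazon Redshift Data API. The following is a minimal sketch, assuming placeholder workgroup and database names:

import time

import boto3

client = boto3.client("redshift-data")

# List the 20 most recent queries with their status and elapsed time
resp = client.execute_statement(
    WorkgroupName="my-serverless-workgroup",
    Database="dev",
    Sql="SELECT query_id, status, elapsed_time FROM sys_query_history "
        "ORDER BY start_time DESC LIMIT 20;",
)

# The Data API is asynchronous: wait until the statement finishes
# (a FAILED statement raises an error when fetching the result)
while client.describe_statement(Id=resp["Id"])["Status"] not in ("FINISHED", "FAILED"):
    time.sleep(1)

for row in client.get_statement_result(Id=resp["Id"])["Records"]:
    print(row)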

Set limits to control costs

When you create your serverless endpoint, the base capacity defaults to 128 RPUs. However, you can change it at creation time or later via the Redshift Serverless console.

  1. On the details page of your serverless workgroup, choose the Limits tab.
  2. In the Base capacity section, choose Edit.
  3. You can specify Base capacity from 8–512 RPUs, in increments of 8.

Each RPU provides 16 GB of memory, so the lowest base of 8 RPUs gives you compute with 128 GB of memory, and the highest base of 512 RPUs gives you compute with 8 TB of memory.

Edit base RPU capacity

Usage limits

To configure usage capacity limits to limit your overall Redshift Serverless bill, complete the following steps:

  1. In the Usage limits section, choose Manage usage limits.
  2. To control RPU usage, set the maximum RPU-hours by frequency. You can set Frequency to Daily, Weekly, or Monthly.
  3. For Usage limit (RPU hours), enter your preferred value.
  4. For Action, choose Alert, Log to system table, or Turn off user queries.

Set RPU usage limit

Optionally, you can select an existing Amazon Simple Notification Service (Amazon SNS) topic or create a new SNS topic, and subscribe via email to this SNS topic to be notified when usage limits have been met.
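
These limits can also be set programmatically. The following is a minimal Boto3 sketch, assuming a placeholder workgroup ARN:

import boto3

client = boto3.client("redshift-serverless")

# Cap compute at 500 RPU-hours per month and log when the limit is reached
client.create_usage_limit(
    resourceArn="arn:aws:redshift-serverless:us-east-1:123456789012:workgroup/my-workgroup-id",
    usageType="serverless-compute",
    amount=500,
    period="monthly",
    breachAction="log",  # alternatives: "emit-metric", "deactivate"
)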

Query monitoring rules for Redshift Serverless

To prevent wasteful resource utilization and runaway costs caused by poorly written queries, you can implement query monitoring rules via query limits on your Redshift Serverless workgroup. For more information, refer to WLM query monitoring rules. The query monitoring rules in Redshift Serverless stop queries that exceed the limit defined in the rule. To receive notifications and automate notifications on Slack, refer to Automate notifications on Slack for Amazon Redshift query monitoring rule violations.

To set up query limits, complete the following steps:

  1. On the Redshift Serverless console, choose Workgroup configuration in the navigation pane.
  2. Choose a workgroup to monitor.
  3. On the workgroup details page, under Query monitoring rules, choose Manage query limits.

You can add up to 10 query monitoring rules to each serverless workgroup.

Set query limits

The serverless workgroup will go to a Modifying state each time you add or remove a limit.

Let’s take an example where you have to create a serverless workgroup for your dashboards. You know that dashboard queries typically complete in under a minute. If any dashboard query takes more than a minute, it could indicate a poorly written query or one that hasn’t been tested well and has incorrectly been released to production.

For this use case, we set a rule with Limit type as Query execution time and Limit (seconds) as 60.

Set required limit

The following screenshot shows the Redshift Serverless metrics available for setting up query monitoring rules.

Query Monitoring Metrics on CloudWatch

Configure alarms

Alarms are very useful because they enable you to make proactive decisions about your Redshift Serverless endpoint. Any usage limits that you set up will automatically show as alarms on the Redshift Serverless console, and are created as CloudWatch alarms.

Additionally, you can set up one or more CloudWatch alarms on any of the metrics listed in Amazon Redshift Serverless metrics.

For example, setting an alarm for DataStorage over a threshold value would keep track of the storage space that your serverless namespace is using for your data.
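
The following is a minimal Boto3 sketch of such an alarm. The NamespaceName dimension and the threshold's unit are assumptions based on the AWS/Redshift-Serverless namespace, and the SNS topic ARN is a placeholder:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on the namespace-level DataStorage metric
cloudwatch.put_metric_alarm(
    AlarmName="redshift-serverless-data-storage-high",
    Namespace="AWS/Redshift-Serverless",
    MetricName="DataStorage",
    Dimensions=[{"Name": "NamespaceName", "Value": "my-namespace"}],  # dimension name assumed
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1_000_000,  # threshold in the metric's native unit (assumption)
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:redshift-alerts"],
)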

To create an alarm for your Redshift Serverless instance, complete the following steps:

  1. On the Redshift Serverless console, under Monitoring in the navigation pane, choose Alarms.
  2. Choose Create alarm.

Set Alarms from console

  3. Choose your level of metrics to monitor:
    • Workgroup
    • Namespace
    • Snapshot storage

If we select Workgroup, we can choose from the workgroup-level metrics shown in the following screenshot.

Workgroup Level Metrics

The following screenshot shows how we can set up alarms at the namespace level along with various metrics that are available to use.

Namespace Level Metrics

The following screenshot shows the metrics available at the snapshot storage level.

Snapshot level metrics

If you are starting out, begin with the most commonly used metrics listed below. Also consider creating a billing alarm to monitor your estimated AWS charges.

  • ComputeSeconds
  • ComputeCapacity
  • DatabaseConnections
  • EstimatedCharges
  • DataStorage
  • QueriesFailed

Notifications

After you define your alarm, provide a name and a description, and choose to enable notifications.

Amazon Redshift uses an SNS topic to send alarm notifications. For instructions to create an SNS topic, refer to Creating an Amazon SNS topic. You must subscribe to the topic to receive the messages published to it. For instructions, refer to Subscribing to an Amazon SNS topic.

You can also monitor event notifications to stay aware of changes in your Redshift Serverless data warehouse. Refer to Amazon Redshift Serverless event notifications with Amazon EventBridge for further details.

Clean up

To clean up your resources, delete the workgroup and namespace you used for trying the monitoring approaches discussed in this post.

Cleanup

Conclusion

In this post, we covered how to perform monitoring activities on Redshift Serverless through the Redshift Serverless console, system views, and CloudWatch, and how to keep costs predictable. Try the monitoring approaches discussed in this post and let us know your feedback in the comments.


About the Authors

Satesh Sonti is a Sr. Analytics Specialist Solutions Architect based out of Atlanta, specialized in building enterprise data platforms, data warehousing, and analytics solutions. He has over 17 years of experience in building data assets and leading complex data platform programs for banking and insurance clients across the globe.

Harshida Patel is a Specialist Principal Solutions Architect, Analytics with AWS.

Raghu Kuppala is an Analytics Specialist Solutions Architect experienced working in the databases, data warehousing, and analytics space. Outside of work, he enjoys trying different cuisines and spending time with his family and friends.

Ashish Agrawal is a Sr. Technical Product Manager with Amazon Redshift, building cloud-based data warehouses and analytics cloud services. Ashish has over 24 years of experience in IT. Ashish has expertise in data warehouses, data lakes, and platform as a service. Ashish has been a speaker at worldwide technical conferences.

Enable data analytics with Talend and Amazon Redshift Serverless

Post Syndicated from Tamara Astakhova original https://aws.amazon.com/blogs/big-data/enable-data-analytics-with-talend-and-amazon-redshift-serverless/

This is a guest post co-written with Cameron Davie from Talend.

Today, in order to accelerate and scale data analytics, companies are looking for an approach to minimize infrastructure management and predict computing needs for different types of workloads, including spikes and ad hoc analytics.

The integration of Talend Cloud and Talend Stitch with Amazon Redshift Serverless can help you achieve successful business outcomes without data warehouse infrastructure management.

In this post, we demonstrate how Talend easily integrates with Redshift Serverless to help you accelerate and scale data analytics with trusted data.

About Redshift Serverless

Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. Data scientists, developers, and data analysts can access meaningful insights and build data-driven applications with zero maintenance. Redshift Serverless automatically provisions and intelligently scales data warehouse capacity to deliver fast performance for even the most demanding and unpredictable workloads, and you pay only for what you use. You can load your data and start querying in your favorite business intelligence (BI) tools, build machine learning (ML) models in SQL, or combine your data with third-party data for new insights because Redshift Serverless seamlessly integrates with your data landscape. Existing Amazon Redshift customers can migrate their Redshift clusters to Redshift Serverless using the Amazon Redshift console or API without making changes to their applications and have the advantage of using this capability.

About Talend

Talend is an AWS ISV Partner with the Amazon Redshift Ready Product designation and AWS Competencies in both Data and Analytics and Migration. Talend Cloud combines data integration, data integrity, and data governance in a single, unified platform that makes it easy to collect, transform, clean, govern, and share your data. Talend Stitch is a fully managed, scalable service that helps replicate data into your cloud data warehouse so you can quickly access analytics and make better, faster decisions.

Solution overview

The integration of Talend with Amazon Redshift adds new features and capabilities. As of this writing, Talend has 14 distinct native connectivity and configuration components for Amazon Redshift, which are fully documented in the Talend Help Center.

From the Talend Studio interface, there are no differences or changes required to support or access a Redshift Serverless instance or provisioned cluster.

In the following sections, we detail the steps to integrate the Talend Studio interface with Redshift Serverless.

Prerequisites

To complete the integration, you need a Redshift Serverless data warehouse. For setup instructions, see the Getting Started Guide. You also need a Talend Cloud account and Talend Studio. For setup instructions, see the Talend Cloud installation guide.

Integrate Talend Studio with Redshift Serverless

In the Talend Studio interface, you first create and establish a connection to Redshift Serverless. Then you add an output component to standard loading from your desired source into your Redshift Serverless data warehouse, using the established connection. The alternative step is to use a bulk loading component to load large amounts of data directly to your Redshift Serverless data warehouse, using the tRedshiftBulkExec component. Complete the following steps:

  1. Configure a tRedshiftConnection component to connect to Redshift Serverless:
    • For Database, choose Amazon Redshift.
    • Leave the values for Property Type and Driver version as default.
    • For Host, enter the Redshift Serverless endpoint’s host URL.
    • For Port, enter 5439.
    • For Database, enter your database name.
    • For Schema, enter your preferred schema.
    • For Username and Password, enter your user name and password, respectively.

Follow security best practices by using a strong password policy and regular password rotation to reduce the risk of password-based attacks or exploits.

For more information on how to connect to a database, refer to tDBConnection.
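
Before wiring the connection into a job, you can sanity-check the same parameters outside Talend. The following is a minimal sketch using the redshift_connector Python driver, with a placeholder endpoint and credentials:

import redshift_connector

# Same parameters you enter in the tRedshiftConnection component
conn = redshift_connector.connect(
    host="default.123456789012.us-east-1.redshift-serverless.amazonaws.com",
    database="dev",
    port=5439,
    user="admin",
    password="<your-password>",
)

cursor = conn.cursor()
cursor.execute("SELECT 1;")
print(cursor.fetchone())  # a (1,) result confirms connectivity
conn.close()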

After you create the connection object, you can add an output component to your Talend Studio job. The output component defines that the data being processed in the job’s workflow will land in Redshift Serverless. The following examples show standard output and bulk loading output.

  2. Add a tRedshiftOutput database component.

tRedshiftOutput database component

  3. Configure the tRedshiftOutput database component to write, update, or make changes to the connected Redshift Serverless data warehouse.
  4. When using the tRedshiftOutput component, select Use an existing connection and choose the connection you created.

This step makes sure that this component is pre-configured.

tDBOutput component

For more information on how to set up a tDBOutput component, see tDBOutput.

  5. Alternatively, you can configure a tRedshiftBulkExec database component to run the insert operations on the connected Redshift Serverless data warehouse.

Using the tRedshiftBulkExec database component allows you to mass load data files directly from Amazon Simple Storage Service (Amazon S3) into Redshift Serverless as tables. The following screenshot illustrates that Talend is able to use connection information in a job across multiple components, saving time and effort when establishing connections to both Amazon Redshift and Amazon S3.

  6. When using the tRedshiftBulkExec component, select Use an existing connection for Database settings and choose the connection you created.

This makes sure that this component is preconfigured.

  7. For S3 Setting, select Use an existing S3 connection and choose the S3 connection that you configure separately.

tDBBulkExec component

For more information on how to set up a tDBBulkExec component, see tDBBulkExec.

In addition to Talend Cloud for enterprise-level data transformation needs, you can also use Talend Stitch to handle data ingestion and data replication to Redshift Serverless. All configuration for ingesting or replicating data from your desired sources to Redshift Serverless is done in a single input screen.

  1. Provide the following parameters:
    • For Display Name, enter your preferred display name for this connection.
    • For Description, enter a description of the connection. This is optional.
    • For Host, enter the Redshift Serverless endpoint’s host URL.
    • For Port, enter 5439.
    • For Database, enter your database name.
    • For Username and Password, enter your user name and password, respectively.

All support documents and information (including diagrams, steps, and screenshots) can be found in the Talend Cloud and Talend Stitch documentation.

Summary

In this post, we demonstrated how the integration of Talend with Redshift Serverless helps you quickly integrate multiple data sources into a fully managed, secure platform and immediately enable business-wide analytics.

Check out AWS Marketplace and sign up for a free trial with Talend. For more information about Redshift Serverless, refer to the Getting Started Guide.


About the Authors

Tamara Astakhova is a Sr. Partner Solutions Architect in Data and Analytics at AWS. She has over 18 years of experience in the architecture and development of large-scale data analytics systems. Tamara is working with strategic partners helping them build complex AWS-optimized architectures.

Cameron Davie is a Principal Solutions Engineer for the Tech Alliances team. He oversees the technical responsibilities of Talend’s most strategic ISV partnerships. Cameron has been with Talend for 6 years in this role, working directly as the primary technical resource for partners such as AWS, Snowflake, and more. Cameron’s role at Talend is primarily focused on technical enablement and evangelism. This includes showcasing key capabilities of our partners’ solution internally as well as demonstrating Talend’s core technical capabilities with the technical sellers at Talend’s strategic ISV partners. Cameron is a veteran of ISV partnerships and enterprise software, with over 23 years of experience. Before Talend, he spent 14 years at SAP on their OEM/Embedded Solutions partnership team.

Maneesh Sharma is a Senior Database Engineer at AWS with more than a decade of experience designing and implementing large-scale data warehouse and analytics solutions. He collaborates with various Amazon Redshift Partners and customers to drive better integration.

Implementing patterns that exit early out of a parallel state in AWS Step Functions

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/implementing-patterns-that-exit-early-out-of-a-parallel-state-in-aws-step-functions/

This post is written by Madhav Vishnubhatta, Senior Technical Account Manager, Enterprise Support.

This blog post explains how to implement patterns in AWS Step Functions that break out of a parallel state as soon as a minimum requirement is met. A parallel state normally completes only when all the parallel flows inside it have completed. If you do not want to wait for all of the parallel flows to finish before moving to the next step, this post provides patterns to implement that behavior.

You can use AWS Step Functions to set up visual workflows that orchestrate and coordinate multiple AWS services into a serverless application. This allows you to build complex, stateful, and scalable applications without managing the underlying infrastructure. In Step Functions, the individual steps are called states.

Step Functions offers multiple types of states. Some states help control the logic of the workflow. For example, the choice state enables conditional logic to control the flow to any one of the multiple possible next states, depending on the conditions defined in the state. The parallel state helps control the logic, but rather than choose one of multiple next states (as the choice state does), the parallel state allows all the branches to run as parallel flows concurrently. When all the parallel flows are complete, control moves on to the Parallel state’s next state.

Patterns that do not need to wait for all parallel flows to finish

Consider a scenario where the Step Functions workflow represents the process of an employee requesting a laptop in your organization. The process begins with a request from the employee as the first step, but the approval of this request could come from either of two IT managers.

In this case there could be two parallel flows, each waiting for an approval from one IT manager. But, as soon as one person provides approval, the workflow can move forward to the next step of actually issuing a laptop to the employee. This is an “either-or” pattern.

Consider a similar use-case but with a slightly different requirement. Instead of just one person’s approval being enough to issue a laptop, what if approval is needed from a minimum of two out of three IT managers before the laptop is issued. This is the “quorum” pattern.

The parallel state does not directly support these two patterns because the state waits for all the flows to complete. In this case, that means all the managers must provide an approval before a laptop can be issued.

Solution overview

Step Functions provides an error handling mechanism with the fail state, which can be used to fail the workflow with an error. This error can be caught downstream in the workflow and handled as needed. Both the either-or and the quorum patterns can be implemented with this fail state along with the error handling capability.

In the either-or case, as soon as either parallel flow finishes, the fail state can throw an error, which is caught outside the parallel state for further processing. Even though it is a fail state, it might not represent an error scenario in your use case.

The quorum pattern needs an additional mechanism to store the status of each parallel flow, using an Amazon DynamoDB table. The quorum pattern creates an item in the DynamoDB table at the beginning of the workflow that is updated by each parallel flow as soon as it has completed. Each parallel flow checks the DynamoDB table to look at the number of processes that have completed and compare it against the quorum. If the quorum is met, that flow raises an error with a fail state that can be caught outside the parallel step.
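
The following is a minimal Python sketch of that per-flow update and quorum check. The table name comes from the pattern's deployment; the key schema, attribute name, and quorum size are assumptions for illustration:

import boto3

QUORUM = 2  # two of three approvals needed (assumption)
table = boto3.resource("dynamodb").Table("QuorumWorkflowTable")

def record_completion(execution_id: str, flow_name: str) -> bool:
    """Mark one flow as complete and report whether the quorum is met."""
    response = table.update_item(
        Key={"executionId": execution_id},          # key name is an assumption
        UpdateExpression="ADD completedFlows :f",   # string set of finished flows
        ExpressionAttributeValues={":f": {flow_name}},
        ReturnValues="ALL_NEW",
    )
    return len(response["Attributes"]["completedFlows"]) >= QUORUM

In the deployed workflow, the equivalent update and check can be expressed with Step Functions' native DynamoDB integrations followed by a choice state.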

Prerequisites

Both of these patterns are published on Serverless Land:

To deploy and use these patterns, you need:

  1. An AWS Account
  2. Access to log in as a user or assume a role that can deploy the required resources.
  3. Familiarity with AWS Serverless Application Model (AWS SAM).
  4. AWS SAM Command Line Interface installed.

Example walkthrough

Either-or pattern

To deploy the Either-or pattern, follow the Deployment Instructions section in the GitHub repo. This deployment creates the following resources:

  1. A Step Functions workflow.
  2. An IAM role that is assumed by the Step Functions workflow during execution.

Navigate to the AWS CloudFormation page in the AWS Management Console and choose the stack with the name provided during deployment. Choose the State Machine resource in the Resources section of the CloudFormation stack to go to the Step Functions console. Choose Edit and then choose WorkflowStudio to see a visual representation of the workflow.

You can see the exported workflow in the GitHub repo. This is the logic of the workflow:

Either-or pattern. Conceptual flow.

  1. There are three (numbered) parallel flows in this workflow.
  2. Flows #1 and #2 are the main parallel flows; when either one completes, control should move to outside the parallel state.
  3. Flow #3 is the timeout flow, so that the workflow can exit after a set amount of time if neither of the other two parallel flows completes by then.
  4. Each of the two main parallel flows follows this logic:
    • Wait for the process to complete. This is a filler and can be replaced with your business logic on how to monitor process completion. This could be a human approval, or any other job that needs to finish.
    • Once the process is complete, throw a dummy error, which moves control to outside the parallel state.
  5. The dummy errors for the two flows are caught outside the parallel state with corresponding catch conditions.
  6. The errors from the two flows need not be caught separately. You might perform the same action no matter which parallel flow finished, but separate steps are shown here in case you need to do something different based on which flow finished.

To test the workflow, follow the instructions provided in the Testing section of the README file at the GitHub repo.

To clean up the resources created, run:

sam delete

Quorum pattern

To deploy the Quorum pattern, follow the Deployment Instructions section in the GitHub repo. This deployment creates the following resources:

  1. A Step Functions workflow.
  2. An IAM role that is assumed by the Step Functions workflow during execution.
  3. A DynamoDB Table called “QuorumWorkflowTable”.

Navigate to CloudFormation in the AWS Management Console and choose the stack with the name provided during deployment. Choose the state machine resource in the Resources section of the CloudFormation stack to go to the Step Functions console.

Choose Edit and then choose WorkflowStudio to see a visual representation of the workflow.

You can see the exported workflow in the GitHub repo. This is the logic of the workflow:

Quorum pattern. Conceptual flow.

  1. The first step creates an entry in the DynamoDB table with the execution ID of the workflow’s execution. This item in the table tracks the completion of processes.
  2. The next state is the parallel state, which has three parallel flows and a fourth time out flow. All the four flows are numbered.
  3. Flows #1, #2, and #3 are the main parallel flows; when any two of them complete, control should move to outside the parallel state.
  4. Flow #4 is the timeout flow, so that the workflow can exit after a set amount of time if the quorum is not met by then.
  5. Each of the three main parallel flows uses the following logic:
    • Wait for the process to complete.
    • Once complete, update the DynamoDB table entry to mark the completion of the process.
    • After the update, query the item from DynamoDB to get the list of processes that have completed and check if the quorum has been met.
    • If the quorum has been met, raise an “Error” (which is actually a success criterion in terms of the business case) to move control to outside the parallel state.

To test the workflow, follow the instructions provided in the Testing section of the README file at the GitHub repo.

To clean up the resources created, run:

sam delete

Conclusion

This blog post shows how you can implement patterns that must exit early out of a parallel state in an AWS Step Functions workflow.

The use-cases for this approach are not limited to these two patterns. More complicated use-cases like having different combinations of conditions to exit a parallel state can all be implemented using parallel and fail states.

Visit Serverless Land for more Step Functions workflow patterns.

Let’s Architect! DevOps Best Practices on AWS

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-devops-best-practices-on-aws/

DevOps has revolutionized software development and operations by fostering collaboration, automation, and continuous improvement. By bringing together development and operations teams, organizations can accelerate software delivery, enhance reliability, and achieve faster time-to-market.

In this blog post, we will explore the best practices and architectural considerations for implementing DevOps with Amazon Web Services (AWS), enabling you to build efficient and scalable systems that align with DevOps principles. The Let’s Architect! team wants to share useful resources that help you to optimize your software development and operations.

DevOps revolution

Distributed systems are adopted by enterprises more frequently now. When an organization wants to leverage distributed systems’ characteristics, it requires a shift in mindset and approach, akin to a new model for the software development lifecycle.

In this re:Invent 2021 video, Emily Freeman, now Head of Community Engagement at AWS, shares the insights gained in the trenches while adapting to a new software development lifecycle that will help your organization thrive using distributed systems.

Take me to this re:Invent 2021 video!

Operationalizing the DevOps revolution

My CI/CD pipeline is my release captain

Designing effective DevOps workflows is necessary for achieving seamless collaboration between development and operations teams. The Amazon Builders’ Library offers a wealth of guidance on designing DevOps workflows that promote efficiency, scalability, and reliability. From continuous integration and deployment strategies to configuration management and observability, this resource covers various aspects of DevOps workflow design. By following the best practices outlined in the Builders’ Library, you can create robust and scalable DevOps workflows that facilitate rapid software delivery and smooth operations.

Take me to this resource!

A pipeline coordinates multiple inflight releases and promotes them through three stages

Using Cloud Fitness Functions to Drive Evolutionary Architecture

Cloud fitness functions provide a powerful mechanism for driving evolutionary architecture within your DevOps practices. By defining and measuring architectural fitness goals, you can continuously improve and evolve your systems over time.

This AWS Architecture Blog post delves into how AWS services, like AWS Lambda, AWS Step Functions, and Amazon CloudWatch can be leveraged to implement cloud fitness functions effectively. By integrating these services into your DevOps workflows, you can establish an architecture that evolves in alignment with changing business needs: improving system resilience, scalability, and maintainability.

Take me to this AWS Architecture Blog post!

Fitness functions provide feedback to engineers via metrics

Multi-Region Terraform Deployments with AWS CodePipeline using Terraform Built CI/CD

Achieving consistent deployments across multiple regions is a common challenge. This AWS DevOps Blog post demonstrates how to use Terraform, AWS CodePipeline, and infrastructure-as-code principles to automate Multi-Region deployments effectively. By adopting this approach, you can achieve consistent infrastructure and application deployments, improving the scalability, reliability, and availability of your DevOps practices.

The post also provides practical examples and step-by-step instructions for implementing Multi-Region deployments with Terraform and AWS services, enabling you to leverage the power of infrastructure-as-code to streamline DevOps workflows.

Take me to this AWS DevOps Blog post!

Multi-Region AWS deployment with IaC and CI/CD pipelines

See you next time!

Thanks for joining our discussion on DevOps best practices! Next time we’ll talk about how to create resilient workloads on AWS.

To find all the blogs from this series, check out the Let’s Architect! list of content on the AWS Architecture Blog. See you soon!

Access Amazon OpenSearch Serverless collections using a VPC endpoint

Post Syndicated from Raj Ramasubbu original https://aws.amazon.com/blogs/big-data/access-amazon-opensearch-serverless-collections-using-a-vpc-endpoint/

Amazon OpenSearch Serverless helps you index, analyze, and search your logs and data using OpenSearch APIs and dashboards. The OpenSearch Serverless collection is a group of indexes. API and dashboard clients can access the collections from public networks or one or more VPCs. For VPC access to collections and dashboards, you can create VPC endpoints. In this post, we demonstrate how you can create and use VPC endpoints and OpenSearch Serverless network policies to control access to your collections and OpenSearch dashboards from multiple network locations.

The demo in this post uses an AWS Lambda-based client in a VPC to ingest data into a collection via a VPC endpoint and a browser in a public network accessing the same collection.

Solution overview

To illustrate how you can ingest data into an OpenSearch Serverless collection from within a VPC, we use a Lambda function. We use a VPC-hosted Lambda function to create an index in an OpenSearch Serverless collection and add documents to the index using a VPC endpoint. We then use a publicly accessible OpenSearch Serverless dashboard to see the documents ingested by the Lambda function.

The following sections detail the steps to ingest data into the collection using Lambda and access the OpenSearch Serverless dashboard.

Prerequisites

This setup assumes that you have already created a VPC with private subnets.

Ingest data using Lambda and access the OpenSearch Serverless dashboard

To set up your solution, complete the following steps:

  1. On the OpenSearch Service console, create a private connection between your VPC and OpenSearch Serverless using a VPC endpoint. Use the private subnets and a security group from your VPC.
  2. Create an OpenSearch collection using the VPC endpoint created in the previous step.
  3. Create a network policy to enable VPC access to the OpenSearch endpoint so the Lambda function can ingest documents into the collection. You should also enable public access to the OpenSearch dashboard endpoint so you can see the ingested documents (see the sketch after this procedure for an automated version).
  4. After you create the collection, create a data access policy to grant ingestion access to the Lambda function’s AWS Identity and Access Management (IAM) role.
  5. Additionally, grant read access to the dashboard user’s IAM role.
  6. Add IAM permissions to the Lambda function’s IAM role and the dashboard user’s IAM role for the OpenSearch Serverless collection.
  7. Create a Lambda function in the same VPC and subnet that we used for the OpenSearch endpoint (see the following code). This function creates an index called sitcoms-eighties in the OpenSearch Serverless collection and adds a sample document to the index:
import datetime
import time
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
import boto3

host = '<Insert-OpenSearch-Serverless-Endpoint>'
region = 'us-east-1'
service = 'aoss'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service,session_token=credentials.token)

# Build the OpenSearch client
client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=300
)

def lambda_handler(event, context):

    # Create index
    response = client.indices.create('sitcoms-eighties')
    print('\nCreating index:')
    print(response)
    time.sleep(5)
    
    dt = datetime.datetime.now()
    # Add a document to the index.
    response = client.index(
        index='sitcoms-eighties',
        body={
            'title': 'Seinfeld',
            'creator': 'Larry David',
            'year': 1989,
            'createtime': dt
        },
        id='1',
    )
    print('\nDocument added:')
    print(response)
  8. Run the Lambda function, and you should see the output as shown in the following screenshot.
  9. You can now see the documents from this index through your publicly accessible OpenSearch Dashboards URL.
  10. Create the index pattern in OpenSearch Dashboards, and then you can see the documents as shown in the following screenshot.
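
Steps 1 through 3 can also be automated. The following is a minimal Boto3 sketch of the network policy from step 3; the collection and VPC endpoint names are placeholders:

import json

import boto3

aoss = boto3.client("opensearchserverless")

# VPC-only access to the collection endpoint; public access to its dashboard
network_policy = [
    {
        "Rules": [{"ResourceType": "collection", "Resource": ["collection/my-collection"]}],
        "AllowFromPublic": False,
        "SourceVPCEs": ["vpce-0123456789abcdef0"],
    },
    {
        "Rules": [{"ResourceType": "dashboard", "Resource": ["collection/my-collection"]}],
        "AllowFromPublic": True,
    },
]

aoss.create_security_policy(
    name="my-collection-network",
    type="network",
    policy=json.dumps(network_policy),
)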

Use a VPC DNS resolver from your network

A client in your VPN network can connect to the collection or dashboards over a VPC endpoint. The client needs to find the VPC endpoint’s IP address using an Amazon Route 53 inbound resolver endpoint. To learn more about Route 53 inbound resolver endpoints, refer to Resolving DNS queries between VPCs and your network. The following diagram shows a sample setup.

The flow for this architecture is as follows:

  1. The DNS query for the OpenSearch Serverless client is routed to a locally configured on-premises DNS server.
  2. The on-premises DNS server, as configured, performs conditional forwarding for the zone us-east-1.aoss.amazonaws.com (replace the Region name with your own) to a Route 53 inbound resolver endpoint IP address.
  3. The inbound resolver endpoint performs DNS resolution by forwarding the query to the private hosted zone that was created along with the OpenSearch Serverless VPC endpoint.
  4. The IP addresses returned by the DNS query are the private IP addresses of the interface VPC endpoint, which allow your on-premises host to establish private connectivity over AWS Site-to-Site VPN.
  5. The interface endpoint is a collection of one or more elastic network interfaces with a private IP address in your account that serves as an entry point for traffic going to an OpenSearch Serverless endpoint.
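
To verify this resolution path, you can resolve the collection endpoint from a host on your network. A small sketch, with a placeholder endpoint:

import socket

# Placeholder collection endpoint; replace with your own
host = "0123456789abcdefghij.us-east-1.aoss.amazonaws.com"

# When resolution goes through the inbound resolver, these should be the
# VPC endpoint's private IP addresses (for example, 10.x.x.x)
for info in socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP):
    print(info[4][0])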

Summary

OpenSearch Serverless allows you to set up and control access to the service using VPC endpoints and network policies. In this post, we explored how to access an OpenSearch Serverless collection API and dashboard from within a VPC, on premises, and public networks. If you have any questions or suggestions, please write to us in the comments section.


About the Authors

Raj Ramasubbu is a Senior Analytics Specialist Solutions Architect focused on big data and analytics and AI/ML with Amazon Web Services. He helps customers architect and build highly scalable, performant, and secure cloud-based solutions on AWS. Raj provided technical expertise and leadership in building data engineering, big data analytics, business intelligence, and data science solutions for over 18 years prior to joining AWS. He helped customers in various industry verticals like healthcare, medical devices, life science, retail, asset management, car insurance, residential REIT, agriculture, title insurance, supply chain, document management, and real estate.

Vivek Kansal works with the Amazon OpenSearch team. In his role as Principal Software Engineer, he uses his experience in the areas of security, policy engines, cloud-native solutions, and networking to help secure customer data in OpenSearch Service and OpenSearch Serverless in an evolving threat landscape.

Decoupling event publishing with Amazon EventBridge Pipes

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/decoupling-event-publishing-with-amazon-eventbridge-pipes/

This post is written by Gregor Hohpe, Sr. Principal Evangelist, Serverless.

Event-driven architectures (EDAs) help decouple applications or application components from one another. Consuming events makes a component less dependent on the sender’s location or implementation details. Because events are self-contained, they can be consumed asynchronously, which allows sender and receiver to follow independent timing considerations. Decoupling through events can improve development agility and operational resilience when building fine-grained distributed applications, which is the preferred style for serverless applications.

Many AWS services publish events through built-in mechanisms to support building event-driven architectures with a minimal amount of custom coding. Modern applications built on top of those services can also send and consume events based on their specific business logic. AWS application integration services like Amazon EventBridge or Amazon SNS, a managed publish-subscribe service, filter those events and route them to the intended destination, providing additional decoupling between event producer and consumer.

Publishing events

Custom applications that act as event producers often use the AWS SDK library, which is available for 12 programming languages, to send an event message. The application code constructs the event as a local data structure and specifies where to send it, for example to an EventBridge event bus.

The application code required to send an event to EventBridge is straightforward and only requires a few lines of code, as shown in this (simplified) helper method that publishes an order event generated by the application:

async function sendEvent(event) {
    const ebClient = new EventBridgeClient();
    const params = { Entries: [ {
          Detail: JSON.stringify(event),
          DetailType: "newOrder",
          Source: "orders",
          EventBusName: eventBusName
        } ] };
    return await ebClient.send(new PutEventsCommand(params));
}

An application most likely calls such a method in the context of another action, for example when persisting a received order to a data store. The code that performs those tasks might look as follows:

const order = { "orderId": "1234", "Items": ["One", "Two", "Three"] }
await writeToDb(order);
await sendEvent(order);

The code populates an order object with multiple line items (in reality, this would be based on data entered by a user or received via an API call), writes it to a database (via another helper method whose implementation isn’t shown), and then sends it to an EventBridge bus via the preceding method.

Code causes coupling

Although this code is not complex, it has drawbacks from an architectural perspective:

  • It interweaves application logic with the solution’s topology because the destination of the event, both in terms of service (EventBridge versus SNS, for example) and the instance (the service bus name in this case) are defined inside the application’s source code. If the event destination changes, you must change the source code, or at least know which string constant is passed to the function via an environment variable. Both aspects work against the EDA principle of minimizing coupling between components because changes in the communication structure propagate into the producer’s source code.
  • Sending the event after updating the database is brittle because it lacks transactional guarantees across both steps. You must implement error handling and retry logic to handle cases where sending the event fails, or even undo the database update. Writing such code can be tedious and error-prone.
  • Code is a liability. After all, that’s where bugs come from. In a real-life example, a helper method similar to the preceding code erroneously swapped day and month on the event date, which led to a challenging debugging cycle. Boilerplate code to send events is therefore best avoided.

Performing event routing inside EventBridge can lessen the first concern. You could reconfigure EventBridge’s rules and targets to route events with a specified type and source to a different destination, provided you keep the event bus name stable. However, the other issues would remain.

Serverless: Less infrastructure, less code

AWS serverless integration services can alleviate the need to write custom application code to publish events altogether.

A key benefit of serverless applications is that you can let the AWS Cloud do the undifferentiated heavy lifting for you. Traditionally, we associate serverless with provisioning, scaling, and operating compute infrastructure so that developers can focus on writing code that generates business value.

Serverless application integration services can also take care of application-level tasks for you, including publishing events. Most applications store data in AWS data stores like Amazon Simple Storage Service (S3) or Amazon DynamoDB, which can automatically emit events whenever an update takes place, without any application code.

EventBridge Pipes: Events without application code

EventBridge Pipes allows you to create point-to-point integrations between event producers and consumers with optional transformation, filtering, and enrichment steps. Serverless integration services combined with cloud automation allow “point-to-point” integrations to be more easily managed than in the past, which makes them a great fit for this use case.

This example takes advantage of EventBridge Pipes’ ability to fetch events actively from sources like DynamoDB Streams. DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours. EventBridge Pipes picks up events from that log and pushes them to one of over 14 event targets, including an EventBridge bus, SNS, SQS, or API Destinations. It also accommodates batch sizes, timeouts, and rate limiting where needed.

EventBridge Pipes example

The integration through EventBridge Pipes can replace the custom application code that sends the event, including any retry or error logic. Only the following code remains:

const order = { "orderId": "1234", "Items": ["One", "Two", "Three"] }
await writeToDb(order);

Automation as code

EventBridge Pipes can be configured from the CLI, the AWS Management Console, or from automation code using AWS CloudFormation or AWS CDK. By using AWS CDK, you can use the same programming language that you use to write your application logic to also write your automation code.

For example, the following CDK snippet configures an EventBridge Pipe to read events from a DynamoDB Stream attached to the Orders table and passes them to an EventBridge event bus.

This code references the DynamoDB table via the ordersTable variable that would be set when the table is created:

const pipe = new CfnPipe(this, 'pipe', {
  roleArn: pipeRole.roleArn,
  source: ordersTable.tableStreamArn!,
  sourceParameters: {
    dynamoDbStreamParameters: {
      startingPosition: 'LATEST'
    },
  },
  target: ordersEventBus.eventBusArn,
  targetParameters: {
    eventBridgeEventBusParameters: {
      detailType: 'order-details',
      source: 'my-source',
    },
  },
}); 

The automation code cleanly defines the dependency between the DynamoDB table and the event destination, independent of application logic.

Decoupling with data transformation

Coupling is not limited to event sources and destinations. A source’s data format can determine the event format and require downstream changes in case the data format or the data source change. EventBridge Pipes can also alleviate that consideration.

Events emitted from the DynamoDB Stream use the native, marshaled DynamoDB format that includes type information, such as an “S” for strings or “L” for lists.

For example, the order event in the DynamoDB stream from this example looks as follows. Some fields are omitted for readability:

{
  "detail": {
    "eventID": "be1234f372dd12552323a2a3362f6bd2",
    "eventName": "INSERT",
    "eventSource": "aws:dynamodb",
    "dynamodb": {
      "Keys": { "orderId": { "S": "ABCDE" } },
      "NewImage": { 
        "orderId": { "S": "ABCDE" },
        "items": {
            "L": [ { "S": "One" }, { "S": "Two" }, { "S": "Three" } ]
        }
      }
    }
  }
} 

This format is not well suited for downstream processing because it would unnecessarily couple event consumers to the fact that this event originated from a DynamoDB Stream. EventBridge Pipes can convert this event into a more easily consumable format. The transformation is specified via an inputTemplate parameter using JSONPath expressions, and EventBridge Pipes’ support for list processing with wildcards proves to be perfect for this scenario.

In this example, add the following transformation template inside the target parameters to the preceding CDK code (the asterisk character matches a complete list of elements):

targetParameters: {
  inputTemplate: '{"orderId": <$.dynamodb.NewImage.orderId.S>,' + 
                 '"items": <$.dynamodb.NewImage.items.L[*].S>}'
}

This transformation formats the event published by EventBridge Pipes like a regular business event, decoupling any event consumer from the fact that it originated from a DynamoDB table:

{
  "time": "2023-06-01T06:18:10Z",
  "detail": {
    "orderId": "ABCDE",
    "items": ["One", "Two", "Three" ]
  }
}

Conclusion

When building event-driven applications, consider whether you can replace application code with serverless integration services to improve the resilience of your application and provide a clean separation between application logic and system dependencies.

EventBridge Pipes can be a helpful feature in these situations, for example to capture and publish events based on DynamoDB table updates.

Learn more about EventBridge Pipes at https://aws.amazon.com/eventbridge/pipes/ and discover additional ways to reduce serverless application code at https://serverlessland.com/refactoring-serverless. For a complete code example, see https://github.com/aws-samples/aws-refactoring-to-serverless.

Detecting and stopping recursive loops in AWS Lambda functions

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/detecting-and-stopping-recursive-loops-in-aws-lambda-functions/

This post is written by Pawan Puthran, Principal Serverless Specialist TAM, Aneel Murari, Senior Serverless Specialist Solution Architect, and Shree Shrikhande, Senior AWS Lambda Product Manager.

AWS Lambda is announcing a recursion control to detect and stop Lambda functions running in a recursive or infinite loop.

At launch, this feature is available for Lambda integrations with Amazon Simple Queue Service (Amazon SQS), Amazon SNS, or when invoking functions directly using the Lambda invoke API. Lambda now detects functions that appear to be running in a recursive loop and drops requests after exceeding 16 invocations.

This can help reduce costs from unexpected Lambda function invocations because of recursion. You receive notifications about this action through the AWS Health Dashboard, email, or by configuring Amazon CloudWatch Alarms.

Overview

You can invoke Lambda functions in multiple ways. AWS services generate events that invoke Lambda functions, and Lambda functions can send messages to other AWS services. In most architectures, the service or resource that invokes a Lambda function should be different from the service or resource that the function outputs to. Because of misconfiguration or coding bugs, a function can send a processed event to the same service or resource that invokes the Lambda function, causing a recursive loop.

Lambda now detects the function running in a recursive loop between supported services, after exceeding 16 invocations. It returns a RecursiveInvocationException to the caller. There is no additional charge for this feature. For asynchronous invokes, Lambda sends the event to a dead-letter queue or on-failure destination, if one is configured.
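
When you invoke a function directly, the exception surfaces to your caller. As a minimal sketch using the AWS SDK for JavaScript (v3), assuming a hypothetical function name and payload, the error can be detected by name:

import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

try {
  await lambda.send(new InvokeCommand({
    FunctionName: 'my-function', // placeholder name
    Payload: Buffer.from(JSON.stringify({ orderId: '1234' })),
  }));
} catch (error) {
  // Lambda rejects the next invocation once the recursion limit is exceeded
  if ((error as Error).name === 'RecursiveInvocationException') {
    console.error('Recursive loop detected; invocation dropped by Lambda.');
  } else {
    throw error;
  }
}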

The following is an example of an order processing system.

Order processing system

  1. A new order information message is sent to the source SQS queue.
  2. Lambda consumes the message from the source queue using an ESM.
  3. The Lambda function processes the message and sends the updated orders message to a destination SQS queue using the SQS SendMessage API.
  4. The source queue has a dead-letter queue (DLQ) configured for handling any failed or unprocessed messages.
  5. Because of a misconfiguration, the Lambda function sends the message to the source SQS queue instead of the destination queue. This causes a recursive loop of Lambda function invocations.

To explore sample code for this example, see the GitHub repo.

In the preceding example, after 16 invocations, Lambda throws a RecursiveInvocationException to the ESM. The ESM stops invoking the Lambda function and, once the maxReceiveCount is exceeded, SQS moves the message to the source queue’s configured DLQ.

You receive an AWS Health Dashboard notification with steps to troubleshoot the function.

AWS Health Dashboard notification

You also receive an email notification to the registered email address on the account.

Email notification

Lambda emits a RecursiveInvocationsDropped CloudWatch metric, which you can view in the CloudWatch console.

RecursiveInvocationsDropped CloudWatch metric

How does Lambda detect recursion?

For Lambda to detect recursive loops, your function must use one of the supported AWS SDK versions or higher.

Lambda uses an AWS X-Ray trace header primitive called “Lineage” to track the number of times a function has been invoked with an event. When your function code sends an event using a supported AWS SDK version, Lambda increments the counter in the lineage header. If your function is then invoked with the same triggering event more than 16 times, Lambda stops the next invocation for that event. You do not need to configure active X-Ray tracing for this feature to work.

An example lineage header looks like:

X-Amzn-Trace-Id:Root=1-645f7998-4b1e232810b0bb733dba2eab;Parent=5be88d12eefc1fc0;Sampled=1;Lineage=43e12f0f:5

43e12f0f is the hash of a resource, in this case a Lambda function. 5 is the number of times this function has been invoked with the same event. The logic of hash generation, encoding, and size of the lineage header may change in the future. You should not design any application functionality based on this.

When using an ESM to consume messages from SQS, after the maxReceiveCount value is exceeded, the message is sent to the source queue’s configured DLQ. When Lambda detects a recursive loop and drops subsequent invocations, it returns a RecursiveInvocationException to the ESM. This increments the maxReceiveCount value. When the ESM auto retries to process events, based on the error handling configuration, these retries are not considered recursive invocations.

When using SQS, you can also batch multiple messages into one Lambda event. Where the message batch size is greater than 1, Lambda uses the maximum lineage value within the batch of messages. It drops the entire batch if the value exceeds 16.

Recursion detection in action

You can deploy a sample application example in the GitHub repo to test Lambda recursive loop detection. The application includes a Lambda function that reads from an SQS queue and writes messages back to the same SQS queue.

As prerequisites, you must install the AWS CLI, the AWS SAM CLI, and Git.

To deploy the application:

  1. Set your AWS Region:
export REGION=<your AWS region>
  2. Clone the GitHub repository:
git clone https://github.com/aws-samples/aws-lambda-recursion-detection-sample.git
cd aws-lambda-recursion-detection-sample
  3. Use AWS SAM to build and deploy the resources to your AWS account. Enter a stack name, such as lambda-recursion, when prompted. Accept the remaining default values.
sam build --use-container
sam deploy --guided --region $REGION

To test the application:

  1. Save the name of the SQS queue in a local environment variable:
SOURCE_SQS_URL=$(aws cloudformation describe-stacks \
  --region $REGION \
  --stack-name lambda-recursion \
  --query 'Stacks[0].Outputs[?OutputKey==`SourceSQSqueueURL`].OutputValue' \
  --output text)
  2. Publish a message to the source SQS queue:
aws sqs send-message --queue-url $SOURCE_SQS_URL --message-body '{"orderId":"111","productName":"Bolt","orderStatus":"Submitted"}' --region $REGION

This invokes the Lambda function, which writes the message back to the queue.

To verify that Lambda has detected the recursion:

  1. Navigate to the CloudWatch console. Choose All Metrics under Metrics in the left-hand panel and search for RecursiveInvocationsDropped.

    Find RecursiveInvocationsDropped.

  2. Choose Lambda > By Function Name and choose RecursiveInvocationsDropped for the function you created. Under Graphed metrics, change the statistic to sum and Period to 1 minute. You see one record. Refresh if you don’t see the metric after a few seconds.

Metrics sum view

Actions to take when Lambda stops a recursive loop

When you receive a notification regarding recursion in your account, the following steps can help address the issue.

  • To stop further invoke attempts while you fix the underlying configuration issue, set the function concurrency to 0. This acts as an off switch for the Lambda function. You can choose the “Throttle” button in the Lambda console or use the PutFunctionConcurrency API to set the function concurrency to 0, as shown in the sketch after this list.
  • You can also disable or delete the event source mapping or trigger for the Lambda function.
  • Check your Lambda function code and configuration for any code defects that create loops. For example, check your environment variables to ensure you are not using the same SQS queue or SNS topic as source and target.
  • If an SQS Queue is the event source for your Lambda function, configure a DLQ on the source queue.
  • If an SNS topic is the event source, configure an On-Failure Destination for the Lambda function.
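
As a sketch of the off switch from the first item, the following AWS SDK for JavaScript (v3) call sets the reserved concurrency to 0; the function name is a placeholder:

import { LambdaClient, PutFunctionConcurrencyCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

// Reserved concurrency of 0 throttles every new invocation, acting as
// an off switch while you fix the recursive configuration
await lambda.send(new PutFunctionConcurrencyCommand({
  FunctionName: 'my-function', // placeholder name
  ReservedConcurrentExecutions: 0,
}));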

Disabling recursion detection

You may have valid use cases where Lambda recursion is intentional as part of your design. In this case, use caution and implement suitable guardrails to prevent unexpected charges to your account. To learn more about best practices for using recursive invocation patterns, see Recursive patterns that cause run-away Lambda functions in the AWS Lambda Operator Guide.

This feature is turned on by default to stop recursive loops. To request turning it off for your account, reach out to AWS Support.

Conclusion

Lambda recursion control for SQS and SNS automatically detects and stops functions running in a recursive or infinite loop. This can be due to misconfiguration or coding errors. Recursion control helps reduce unexpected usage with Lambda and downstream services. The post also explains how Lambda detects and stops recursive loops and notifies you through AWS Health Dashboard to troubleshoot the function.

To learn more about the feature, visit the AWS Lambda Developer Guide.

For more serverless learning resources, visit Serverless Land.

Understanding AWS Lambda’s invoke throttling limits

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/understanding-aws-lambdas-invoke-throttle-limits/

This post is written by Archana Srikanta, Principal Engineer, AWS Lambda.

When you call AWS Lambda’s Invoke API, a series of throttle limits are evaluated to decide if your call is let through or throttled with a 429 “Too Many Requests” exception. This blog post explains the most common invoke throttle limits and the relationship between them, so you can better understand scaling workloads on Lambda.

Overview

Invoke call flow

The throttle limits exist to protect the following components of Lambda’s internal service architecture, and your workload, from noisy neighbors:

  • Execution environment: An execution environment is a Firecracker microVM where your function code runs. A given execution environment only hosts one invocation at a time, but it can be reused for subsequent invocations of the same function version.
  • Invoke data plane: These are a series of internal web services that, on an invoke, select (or create) a sandbox and route your request to it. This is also responsible for enforcing the throttle limits.

When you make an Invoke API call, it transits through some or all of the Invoke Data Plane services, before reaching an execution environment where your function code is downloaded and executed.

There are three distinct but related throttle limits which together decide if your invoke request is accepted by the data plane or throttled.

Concurrency

Concurrent means “existing, happening, or done at the same time”. Accordingly, the Lambda concurrency limit is a limit on the simultaneous in-flight invocations allowed at any given time. It is not a rate or transactions per second (TPS) limit in and of itself, but instead a limit on how many invocations can be inflight at the same time. This documentation visually explains the concept of concurrency.

Under the hood, the concurrency limit roughly translates to a limit on the maximum number of execution environments (and thus Firecracker microVMs) that your account can claim at any given point in time. Lambda runs a fleet of multi-tenant bare metal instances, on which Firecracker microVMs are carved out to serve as execution environments for your functions. AWS constantly monitors and scales this fleet based on incoming demand and shares the available capacity fairly among customers.

The concurrency limit helps protect Lambda from a single customer exhausting all the available capacity and causing a denial of service to other customers.

Transactions per second (TPS)

Customers often ask how their concurrency limit translates to TPS. The answer depends on how long your function invocations last.

Relation between concurrency and TPS

The diagram above considers three cases, each with a different function invocation duration, but a fixed concurrency limit of 1000. In the first case, invocations have a constant duration of 1 second. This means you can initiate 1000 invokes and claim all 1000 execution environments permitted by your concurrency limit. These execution environments remain busy for the entire second, and you cannot start any more invokes in that second because your concurrency limit prevents you from claiming any more execution environments. So, the TPS you can achieve with a concurrency limit of 1000 and a function duration of 1 second is 1000 TPS.

In case 2, the invocation duration is halved to 500ms, with the same concurrency limit of 1000. You can initiate 1000 concurrent invokes at the start of the second as before. These invokes keep the execution environments busy for the first half of the second. Once finished, you can start an additional 1000 invokes against the same execution environments while still being within your concurrency limit. So, by halving the function duration, you doubled your TPS to 2000.

Similarly, in case 3, if your function duration is 100ms, you can initiate 10 rounds of 1000 invokes each in a second, achieving a TPS of 10K.

Codifying this as an equation, the TPS you can achieve given a concurrency limit is:

TPS = concurrency / function duration in seconds

Taken to an extreme, for a function duration of only 1ms and at a concurrency limit of 1000 (the default limit), an account can drive an invoke TPS of one million. In other words, every additional unit of concurrency granted via a limit increase implicitly grants an additional 1000 TPS. The high TPS doesn’t require any additional execution environments (Firecracker microVMs), so it’s not problematic from a fleet capacity perspective. However, driving over a million TPS from a single account puts stress on the Invoke Data Plane services. They must be protected from noisy neighbor impact as well so all customers have a fair share of the services’ bandwidth. A concurrency limit alone isn’t sufficient to protect against this – the TPS limit provides this protection.

As of this writing, the invoke TPS is capped at 10 times your concurrency. Added to the previous equation:

TPS = min( 10 x concurrency, concurrency / function duration in seconds)

The concurrency factor is common across both terms in the min function, so the key comparison is:

min(10, 1 / function duration in seconds)

If the function duration is exactly 100ms (or 1/10th of a second), both terms in the min function are equal. If the function duration is over 100ms, the second term is lower and TPS is limited as per concurrency/function duration. If the function duration is under 100ms, the first term is lower and TPS is limited as per 10 x concurrency.

Limits for functions less than 100ms

To summarize, the TPS limit exists to protect the Invoke Data Plane from the high churn of short-lived invocations, for which the concurrency limit alone affords too high of a TPS. If you drive short invocations of under 100ms, your throughput is capped as though the function duration is 100ms (at 10 x concurrency) as shown in the diagram above. This implies that short-lived invocations may be TPS limited, rather than concurrency limited. However, if your function duration is over 100ms you can effectively ignore the 10 x concurrency TPS limit and calculate your available TPS as concurrency/function duration.
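
As an illustrative helper (not part of any AWS API), the combined cap can be computed directly from the two formulas above:

// Effective invoke TPS cap: min(10 x concurrency, concurrency / duration)
function effectiveTpsLimit(concurrency: number, durationSeconds: number): number {
  return Math.min(10 * concurrency, concurrency / durationSeconds);
}

console.log(effectiveTpsLimit(1000, 1));    // 1000: duration-bound
console.log(effectiveTpsLimit(1000, 0.1));  // 10000: both terms equal
console.log(effectiveTpsLimit(1000, 0.05)); // 10000: capped at 10 x concurrency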

Burst

The third throttle limit is the burst limit. Lambda does not keep execution environments provisioned for your entire concurrency limit at all times. That would be wasteful, especially if usage peaks are transient, as is the case with many workloads. Instead, the service spins up execution environments just-in-time as the invoke arrives, if one doesn’t already exist. Once an execution environment is spun up, it remains “warm” for some period of time and is available to host subsequent invocations of the same function version.

However, if an invoke doesn’t find a warm execution environment, it experiences a “cold start” while we provision a new execution environment. Cold starts involve certain additional operations over and above the warm invoke path, such as downloading your code or container and initializing your application within the execution environment. These initialization operations are typically computationally heavy and so have a lower throughput compared to the warm invoke path. If there are sudden and steep spikes in the number of cold starts, it can put pressure on the invoke services that handle these cold start operations, and also cause undesirable side effects for your application such as increased latencies, reduced cache efficiency and increased fan out on downstream dependencies. The burst limit exists to protect against such surges of cold starts, especially for accounts that have a high concurrency limit. It ensures that the climb up to a high concurrency limit is gradual so as to smooth out the number of cold starts in a burst.

The algorithm used to enforce the burst limit is the Token Bucket rate-limiting algorithm. Consider a bucket that holds tokens. The bucket has a maximum capacity of B tokens (burst). The bucket starts full. Each time you send an invoke request that requires an additional unit of concurrency, it costs a token from the bucket. If the token exists, you are granted the additional concurrency and the token is removed from the bucket. The bucket is refilled at a constant rate of r tokens per minute (rate) until it reaches its maximum capacity.

Token bucket

What this means is that the rate of climb of concurrency is limited to r tokens per minute. Even though the algorithm allows you to collect up to B tokens and burst, you must wait for the bucket to refill before you can burst again, effectively limiting your average rate to r per minute.
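
The following TypeScript sketch illustrates these token bucket mechanics; the capacity and refill numbers are illustrative, not actual Lambda limits:

// Illustrative token bucket: capacity B tokens, refilled at r tokens per minute
class TokenBucket {
  private tokens: number;

  constructor(private capacity: number, private refillPerMinute: number) {
    this.tokens = capacity; // the bucket starts full
  }

  // Called once per simulated minute to replenish the bucket up to capacity
  refill(): void {
    this.tokens = Math.min(this.capacity, this.tokens + this.refillPerMinute);
  }

  // Claiming an additional unit of concurrency costs one token
  tryClaim(): boolean {
    if (this.tokens < 1) return false; // burst throttled
    this.tokens -= 1;
    return true;
  }
}

const bucket = new TokenBucket(1000, 500); // B = 1000, r = 500 per minute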

Concurrency burst limit chart

The chart above shows the burst limit in action with a maximum concurrency limit of 3000, a maximum burst (B) of 1000, and a refill rate (r) of 500 per minute. The token bucket starts full with 1000 tokens, which is also the initial available burst headroom.

There is a burst activity between minute one and two, which consumes all tokens in the bucket and claims all 1000 concurrent execution environments allowed by the burst limit. At this point the bucket is empty and any attempt to claim additional concurrent execution environments is burst throttled, in spite of max concurrency not being reached yet.

The token bucket and the burst headroom are replenished at minutes two and three with 500 tokens each minute to bring it back up to its maximum capacity of 1000. At minute four, there is no additional refill because the bucket is at maximum capacity. Between minutes four and five, there is a second burst activity which empties the bucket again and claims an additional 1000 execution environments, bringing the total number of active execution environments to 2000.

The bucket continues to replenish at a rate of 500/minute at minutes five and six. At this point, sufficient tokens have been accumulated to cover the entire concurrency limit of 3000, and so the bucket isn’t refilled anymore even when you have the third burst activity at minute seven. At minute ten, when all the usage ramps down, the available burst headroom slowly stair-steps back down to the maximum initial burst of 1000.

The actual numbers for maximum burst and refill rate vary by Region and are subject to change; visit the Lambda burst limits page for specific values.

It is important to distinguish that the burst limit isn’t a rate limit on the invoke itself, but a rate limit on how quickly concurrency can rise. However, since invoke TPS is a function of concurrency, it also clamps how quickly TPS can rise (a rate limit for a rate limit). The following chart shows how the TPS burst headroom follows a similar stair step pattern as the concurrency burst headroom, only with a multiplier.

Burst limits chart

Conclusion

This blog explains three key throttle limits applied on Lambda invokes: the concurrency limit, TPS limit and burst limit. It outlines the relationship between these limits and how each one protects the system and your workload from noisy neighbors. Equipped with this knowledge you can better interpret any 429 throttling exceptions you may receive while scaling your applications on Lambda. For more information on getting started with Lambda visit the Developer Guide.

For more serverless learning resources, visit Serverless Land.

Implementing AWS Lambda error handling patterns

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/implementing-aws-lambda-error-handling-patterns/

This post is written by Jeff Chen, Principal Cloud Application Architect, and Jeff Li, Senior Cloud Application Architect

Event-driven architecture is an architectural style that can help you boost agility and build reliable, scalable applications. Splitting an application into loosely coupled services can help each service scale independently. A distributed, loosely coupled application depends on events to communicate application state changes. Each service consumes events from other services and emits events to notify other services of state changes.

Handling errors becomes even more important when designing distributed applications. A service may fail if it cannot handle an invalid payload, dependent resources may be unavailable, or the service may time out. There may be permission errors that can cause failures. AWS services provide many features to handle error conditions, which you can use to improve the resiliency of your applications.

This post explores three use-cases and design patterns for handling failures.

Overview

AWS Lambda, Amazon Simple Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), and Amazon EventBridge are core building blocks for building serverless event-driven applications.

The post Understanding the Different Ways to Invoke Lambda Functions lists the three different ways of invoking a Lambda function: synchronous, asynchronous, and poll-based invocation. For a list of services and which invocation method they use, see the documentation.

Lambda’s integration with Amazon API Gateway is an example of a synchronous invocation. A client makes a request to API Gateway, which sends the request to Lambda. API Gateway waits for the function response and returns the response to the client. There are no built-in retries or error handling. If the request fails, the client attempts the request again.

Lambda’s integration with SNS and EventBridge are examples of asynchronous invocations. SNS, for example, sends an event to Lambda for processing. When Lambda receives the event, it places it on an internal event queue and returns an acknowledgment to SNS that it has received the message. Another Lambda process reads events from the internal queue and invokes your Lambda function. If SNS cannot deliver an event to your Lambda function, the service automatically retries the same operation based on a retry policy.

Lambda’s integration with SQS uses poll-based invocations. Lambda runs a fleet of pollers that poll your SQS queue for messages. The pollers read the messages in batches and invoke your Lambda function once per batch.

You can apply this pattern in many scenarios. For example, your operational application can add sales orders to an operational data store. You may then want to load the sales orders to your data warehouse periodically so that the information is available for forecasting and analysis. The operational application can batch completed sales as events and place them on an SQS queue. A Lambda function can then process the events and load the completed sale records into your data warehouse.

If your function processes the batch successfully, the pollers delete the messages from the SQS queue. If the batch is not successfully processed, the pollers do not delete the messages from the queue. Once the visibility timeout expires, the messages are available again to be reprocessed. If the message retention period expires, SQS deletes the message from the queue.
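
As a minimal sketch of this poll-based model, assuming the sales order scenario above and a hypothetical loadIntoWarehouse helper, a handler receives each batch as a single event (the SQSEvent type comes from the @types/aws-lambda package):

import type { SQSEvent } from 'aws-lambda';

// The pollers invoke this function once per batch. A successful return
// lets the pollers delete the batch; a thrown error leaves the messages
// in the queue to reappear after the visibility timeout.
export const handler = async (event: SQSEvent): Promise<void> => {
  for (const record of event.Records) {
    const sale = JSON.parse(record.body); // a completed sales order
    await loadIntoWarehouse(sale);
  }
};

// Hypothetical placeholder for the data warehouse load step
async function loadIntoWarehouse(sale: unknown): Promise<void> {
  console.log('Loading sale record:', sale);
}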

The following table shows the invocation types and retry behavior of the AWS services mentioned.

AWS service example            | Invocation type | Retry behavior
Amazon API Gateway             | Synchronous     | No built-in retry; the client attempts retries.
Amazon SNS, Amazon EventBridge | Asynchronous    | Built-in retries with exponential backoff.
Amazon SQS                     | Poll-based      | Retries after the visibility timeout expires, until the message retention period expires.

There are a number of design patterns to use for poll-based and asynchronous invocation types to retain failed messages for additional processing. These patterns can help you recover from delivery or processing failures.

You can explore the patterns and test the scenarios by deploying the code from this repository which uses the AWS Cloud Development Kit (AWS CDK) using Python.

Lambda poll-based invocation pattern

When using Lambda with SQS, if Lambda isn’t able to process the message and the message retention period expires, SQS drops the message. Failure to process the message can be due to function processing failures, including time-outs or invalid payloads. Processing failures can also occur when the destination function does not exist, or has incorrect permissions.

You can configure a separate dead-letter queue (DLQ) on the source queue for SQS to retain the dropped message. A DLQ preserves the original message and is useful for analyzing root causes, handling error conditions properly, or sending notifications that require manual interventions. In the poll-based invocation scenario, the Lambda function itself does not maintain a DLQ. It relies on the external DLQ configured in SQS. For more information, see Using Lambda with Amazon SQS.

The following shows the design pattern when you configure Lambda to poll events from an SQS queue and invoke a Lambda function.

Lambda synchronously polling batches of messages from SQS

To explore this pattern, deploy the code in this repository. Once deployed, you can use these instructions to test the pattern with the happy and unhappy paths.

Lambda asynchronous invocation pattern

With asynchronous invokes, there are two failure aspects to consider when using Lambda: the event source cannot deliver the message to Lambda, or the Lambda function errors when processing the event.

Event sources vary in how they handle failures delivering messages to Lambda. If SNS or EventBridge cannot send the event to Lambda after exhausting all their retry attempts, the service drops the event. You can configure a DLQ on an SNS topic or EventBridge event bus to hold the dropped event. This works in the same way as the poll-based invocation pattern with SQS.

Lambda functions may then error due to input payload syntax errors, duration time-outs, or unhandled exceptions, such as a data resource not being available.

For asynchronous invokes, you can configure how long Lambda retains an event in its internal queue, up to 6 hours. You can also configure how many times Lambda retries when the function errors, between 0 and 2. Lambda discards the event when the maximum age passes or all retry attempts fail. To retain a copy of discarded events, you can configure either a DLQ or, preferably, a failed-event destination as part of your Lambda function configuration.

A Lambda destination enables you to specify what to do next if an asynchronous invocation succeeds or fails. You can configure a destination to send invocation records to SQS, SNS, EventBridge, or another Lambda function. Destinations are preferred for failure processing as they support additional targets and include additional information. A DLQ holds the original failed event. With a destination, Lambda also passes details of the function’s response in the invocation record. This includes stack traces, which can be useful for analyzing the root cause.
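
As a sketch using the AWS SDK for JavaScript (v3), the retry settings and an on-failure destination can be configured together; the function name and queue ARN are placeholders:

import { LambdaClient, PutFunctionEventInvokeConfigCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

// Configure async invoke behavior: retries, maximum event age, and a
// destination that receives the invocation record when processing fails
await lambda.send(new PutFunctionEventInvokeConfigCommand({
  FunctionName: 'my-function', // placeholder
  MaximumRetryAttempts: 2, // between 0 and 2
  MaximumEventAgeInSeconds: 21600, // up to 6 hours
  DestinationConfig: {
    OnFailure: {
      Destination: 'arn:aws:sqs:us-east-1:123456789012:failed-events', // placeholder
    },
  },
}));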

Using both a DLQ and Lambda destinations

You can apply this pattern in many scenarios. For example, many of your applications may contain customer records. To comply with the California Consumer Privacy Act (CCPA), different organizations may need to delete records for a particular customer. You can set up a consumer delete SNS topic. Each organization creates a Lambda function, which processes the events published by the SNS topic and deletes customer records in its managed applications.

The following shows the design pattern when you configure an SNS topic as the event source for a Lambda function, which uses destination queues for success and failure processing.

SNS topic as event source for Lambda

You configure a DLQ on the SNS topic to capture messages that SNS cannot deliver to Lambda. When Lambda invokes the function, it sends details of the successfully processed messages to an on-success SQS destination. You can use this pattern to route an event to multiple services for simpler use cases. For orchestrating multiple services, AWS Step Functions is a better design choice.

Lambda can also send details of unsuccessfully processed messages to an on-failure SQS destination.

A variant of this pattern is to replace an SQS destination with an EventBridge destination so that multiple consumers can process an event based on the destination.

To explore how to use an SQS DLQ and Lambda destinations, deploy the code in this repository. Once deployed, you can use these instructions to test the pattern with the happy and unhappy paths.

Using a DLQ

Although destinations are the preferred method for handling function failures, you can also use DLQs.

The following shows the design pattern when you configure an SNS topic as the event source for a Lambda function, which uses SQS queues for failure processing.

Lambda invoked asynchronously

You configure a DLQ on the SNS topic to capture the messages that SNS cannot deliver to the Lambda function. You also configure a separate DLQ for the Lambda function. Lambda saves an unsuccessful event to this DLQ after it exhausts the maximum retry attempts.

To explore how to use a Lambda DLQ, deploy the code in this repository. Once deployed, you can use these instructions to test the pattern with the happy and unhappy paths.

Conclusion

This post explains three patterns that you can use to design resilient event-driven serverless applications. Error handling during event processing is an important part of designing serverless cloud applications.

You can deploy the code from the repository to explore how to use poll-based and asynchronous invocations. See how poll-based invocations can send failed messages to a DLQ. See how to use DLQs and Lambda destinations to route and handle unsuccessful events.

Learn more about event-driven architecture on Serverless Land.

Serverless ICYMI Q2 2023

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/serverless-icymi-q2-2023/

Welcome to the 22nd edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

Serverless Innovation Day

AWS recently hosted the Serverless Innovation Day, a day of live streams that showcased AWS serverless technologies such as AWS Lambda, Amazon ECS with AWS Fargate, Amazon EventBridge, and AWS Step Functions. The event included insights from AWS leaders such as Holly Mesrobian, Ajay Nair, and Usman Khalid, as well as prominent customers and our serverless Developer Advocate team. It provided insights into serverless modernization success stories, use cases, and best practices. If you missed the event, you can catch up on the recorded sessions here.

Serverless Land, your go-to resource for all things serverless, expanded to include a new Serverless Testing section. This provides valuable insights, patterns, and best practices for testing integrations using AWS SAM and CDK templates.

Serverless Land also launched a new learning page featuring a collection of resources, including blog posts, videos, workshops, and training materials, allowing users to choose a learning path from a variety of topics. “EventBridge Visuals”, a set of small, easily digestible visuals focused on EventBridge, has also been added.

AWS Lambda

Lambda introduced support for response payload streaming allowing functions to progressively stream response data to clients. This feature significantly improves performance by reducing the time to first byte (TTFB) latency, benefiting web and mobile applications.

Response streaming is particularly useful for applications with large payloads such as images, videos, documents, or database results. It eliminates the need to buffer the entire payload in memory and enables the transfer of responses larger than Lambda’s 6 MB limit, up to a soft limit of 20 MB.

By configuring the Function URL to use the InvokeWithResponseStream API, streaming responses can be accessed through an HTTP client that supports incremental response data. This enhancement expands Lambda’s capabilities, allowing developers to handle larger payloads more efficiently and enhance the overall performance and user experience of their web and mobile applications.

Lambda now supports Java 17 with Amazon Corretto distribution, providing long-term support and improved performance. Java 17 introduces new language features like records, sealed classes, and multi-line strings. The runtime uses ZGC and Shenandoah garbage collectors to reduce latency. Default JVM configuration changes optimize tiered compilation for reduced startup latency. Developers can use Java 17 in Lambda through AWS Management Console, AWS SAM, and AWS CDK. Popular frameworks like Spring Boot 3 and Micronaut 4 require Java 17 as a minimum. Micronaut provides a web service to generate example projects using Java 17 and AWS CDK infrastructure.

Lambda now supports the Ruby 3.2 runtime, enabling you to write serverless functions using the latest version of the Ruby programming language. This update enhances developer productivity and brings new features and improvements to your Ruby-based Lambda functions.

Lambda introduced support for Kafka and Amazon MQ event sources in four additional Regions. This expanded availability allows developers to build event-driven architectures using these messaging systems in more regions around the world, providing greater flexibility and scalability. It also supports Kafka and Amazon MQ event sources in AWS GovCloud (US) Regions, allowing government organizations to leverage the benefits of event-driven architectures in their cloud environments.

Lambda also added support for starting from a specific timestamp for Kafka event sources at no additional charge, allowing for precise message processing in scenarios such as disaster recovery.

Serverless Land has launched new learning paths for Lambda to help you level up your serverless skills:

  • The Java Replatforming learning path guides Java developers through the process of migrating existing Java applications to a serverless architecture.
  • The Lift and Shift to Serverless learning path provides guidance on migrating traditional applications to a serverless environment.
  • Lambda Fundamentals is a 23-part video series providing practical examples and tips to help you get started with serverless development using Lambda.

The new AWS Tech Talk, Best practices for building interactive applications with AWS Lambda, helps you learn best practices and architectural patterns for building web and mobile backends as well as API-driven microservices on Lambda. Explore how to take advantage of features in Lambda, Amazon API Gateway, Amazon DynamoDB, and more to easily build highly scalable serverless web applications.

AWS Step Functions

The latest update to AWS Step Functions introduces versions and aliases, allowing users to run specific state machine revisions. This ensures reliable deployments, reduces risk, and provides version visibility. Appending version numbers to the state machine ARN enables selection of desired versions, even after updates. Aliases distribute execution requests based on weights, supporting incremental deployment patterns.

This enhances confidence in state machine updates, improves observability, auditing, and can be managed through the Step Functions console or AWS CloudFormation. Versions and aliases are available in all supported AWS Regions at no extra cost.

AWS SAM

AWS SAM CLI has introduced a new feature called remote invoke that allows developers to test Lambda functions in the AWS Cloud. This feature enables developers to invoke Lambda functions from their local development environment and provides options for event payloads, output formats, and logging.

It can be used with or without AWS SAM and can be combined with AWS SAM Accelerate for streamlined development and testing. Overall, the remote invoke feature simplifies serverless application testing in the AWS Cloud.

Amazon EventBridge

EventBridge announced an open-source connector for Kafka Connect, providing seamless integration between EventBridge and Kafka Connect. This connector simplifies the process of streaming events from Kafka topics to EventBridge, enabling you to build event-driven architectures with ease.

EventBridge has improved end-to-end latencies for event buses, delivering events up to 80% faster. This enables broader use in latency-sensitive applications such as industrial and medical applications, with the lower latencies applied by default across all AWS Regions at no extra cost.

Amazon Aurora Serverless v2

Amazon Aurora Serverless v2 is now available in four additional Regions, expanding the reach of this scalable and cost-effective serverless database option. With Aurora Serverless v2, you can benefit from automatic scaling, pause-and-resume capability, and pay-per-use pricing, enabling you to optimize costs and manage your databases more efficiently.

Amazon SNS

Amazon SNS now supports message data protection in five additional Regions, ensuring the security and integrity of your message payloads. With this feature, you can encrypt sensitive message data at rest and in transit, meeting compliance requirements and safeguarding your data.

Serverless Blog Posts

April 2023

Apr 27 – AWS Lambda now supports Java 17

Apr 27 – Optimizing Amazon EC2 Spot Instances with Spot Placement Scores

Apr 26 – Building private serverless APIs with AWS Lambda and Amazon VPC Lattice

Apr 25 – Implementing error handling for AWS Lambda asynchronous invocations

Apr 20 – Understanding techniques to reduce AWS Lambda costs in serverless applications

Apr 18 – Python 3.10 runtime now available in AWS Lambda

Apr 13 – Optimizing AWS Lambda extensions in C# and Rust

Apr 7 – Introducing AWS Lambda response streaming

May 2023

May 24 – Developing a serverless Slack app using AWS Step Functions and AWS Lambda

May 11 – Automating stopping and starting Amazon MWAA environments to reduce cost

May 10 – Monitor Amazon SNS-based applications end-to-end with AWS X-Ray active tracing

May 10 – Debugging SnapStart-enabled Lambda functions made easy with AWS X-Ray

May 10 – Implementing cross-account CI/CD with AWS SAM for container-based Lambda functions

May 3 – Extending a serverless, event-driven architecture to existing container workloads

May 3 – Patterns for building an API to upload files to Amazon S3

June 2023

Jun 7 – Ruby 3.2 runtime now available in AWS Lambda

Jun 5 – Implementing custom domain names for Amazon API Gateway private endpoints using a reverse proxy

Jun 22 – Deploying state machines incrementally with versions and aliases in AWS Step Functions

Jun 22 – Testing AWS Lambda functions with AWS SAM remote invoke

Videos

Serverless Office Hours – Tues 10AM PT

Weekly live virtual office hours. In each session we talk about a specific topic or technology related to serverless and open it up to helping you with your real serverless challenges and issues.

YouTube: youtube.com/serverlessland
Twitch: twitch.tv/aws

LinkedIn: linkedin.com/company/serverlessland

April 2023

Apr 4 – Serverless AI with ChatGPT and DALL-E

Apr 11 – Building Java apps with AWS SAM

Apr 18 – Managing EventBridge with Kubernetes

Apr 25 – Lambda response streaming

May 2023

May 2 – Automating your life with serverless 

May 9 – Building real-life asynchronous architectures

May 16 – Testing Serverless Applications

May 23 – Build faster with Amazon CodeCatalyst 

May 30 – Serverless networking with VPC Lattice

June 2023

June 6 – AWS AppSync: Private APIs and Merged APIs 

June 13 – Integrating EventBridge and Kafka

June 20 – AWS Copilot for serverless containers

June 27 – Serverless high performance modeling

FooBar Serverless YouTube channel

April 2023

Apr 6 – Designing a DynamoDB Table in 4 Steps: From Entities to Access Patterns

Apr 14 – Amazon CodeWhisperer – Improve developer productivity using machine learning (ML)

Apr 20 – Beginner’s Guide to DynamoDB with AWS CDK: Step-by-Step Tutorial for provisioning NoSQL Databases

Apr 27 – Build a WebApp that uses DynamoDB in 6 steps | DynamoDB Expressions

May 2023

May 4 – How to Migrate Data to DynamoDB?

May 11 – Load Testing DynamoDB: Observability and Performance tuning

May 18 – DynamoDB Streams – THE most powerful feature from DynamoDB for event-driven applications

May 25 – Track Application Events with DynamoDB streams and Email Notifications using EventBridge Pipes

June 2023

Jun 1 – How to filter messages based on the payload using Amazon SNS

Jun 8 – Getting started with Amazon Kinesis

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Retrieving parameters and secrets with Powertools for AWS Lambda (TypeScript)

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/retrieving-parameters-and-secrets-with-powertools-for-aws-lambda-typescript/

This post is written by Andrea Amorosi, Senior Solutions Architect and Pascal Vogel, Solutions Architect.

When building serverless applications using AWS Lambda, you often need to retrieve parameters, such as database connection details, API secrets, or global configuration values at runtime. You can make these parameters available to your Lambda functions via secure, scalable, and highly available parameter stores, such as AWS Systems Manager Parameter Store or AWS Secrets Manager.

The Parameters utility for Powertools for AWS Lambda (TypeScript) simplifies the integration of these parameter stores inside your Lambda functions. The utility provides high-level functions for retrieving secrets and parameters, integrates caching and transformations, and reduces the amount of boilerplate code you must write.

The Parameters utility supports the following parameter stores:

  • AWS Systems Manager (SSM) Parameter Store
  • AWS Secrets Manager
  • AWS AppConfig
  • Amazon DynamoDB

The Parameters utility is part of the Powertools for AWS Lambda (TypeScript), which you can use in both JavaScript and TypeScript code bases. Implementing guidance from the Serverless Applications Lens of the AWS Well-Architected Framework, Powertools provides utilities to ease the adoption of best practices such as distributed tracing, structured logging, and asynchronous business and application metrics.

For more details, see the Powertools for AWS Lambda (TypeScript) documentation on GitHub and the introduction blog post.

This blog post shows how to use the new Parameters utility to retrieve parameters and secrets in your JavaScript and TypeScript Lambda functions securely.

Getting started with the Parameters utility

Initial setup

The Powertools toolkit is modular, meaning that you can install the Parameters utility independently from the Logger, Tracing, or Metrics packages. Install the Parameters utility library in your project via npm:

npm install @aws-lambda-powertools/parameters

In addition, you must add the AWS SDK client for the parameter store you are planning to use. The Parameters utility supports AWS SDK v3 for JavaScript only, which allows the utility to be modular. You install only the needed SDK packages to keep your bundle size small.

Next, assign appropriate AWS Identity and Access Management (IAM) permissions to your Lambda function’s execution role that allow retrieving parameters from the parameter store.

The following sections illustrate how to perform the previously mentioned steps for some typical parameter retrieval scenarios.

Retrieving a single parameter from SSM Parameter Store

To retrieve parameters from SSM Parameter Store, install the AWS SDK client for SSM in addition to the Parameters utility:

npm install @aws-sdk/client-ssm

To retrieve an individual parameter, the Parameters utility provides the getParameter function:

import { getParameter } from '@aws-lambda-powertools/parameters/ssm';

export const handler = async (): Promise<void> => {
  // Retrieve a single parameter
  const parameter = await getParameter('/my/parameter');
  console.log(parameter);
};

Finally, you need to assign an IAM policy with the ssm:GetParameter permission to your Lambda function execution role. Apply the principle of least privilege by scoping the permission to the specific parameter resource as shown in the following policy example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter"
      ],
      "Resource": [
        "arn:aws:ssm:AWS_REGION:AWS_ACCOUNT_ID:my/parameter"
      ]
    }
  ]
}

Adjusting cache TTL

By default, the retrieved parameters are cached in-memory for 5 seconds. This cached value is used for further invocations of the Lambda function until it expires. If your application requires a different behavior, the Parameters utility allows you to adjust the time-to-live (TTL) via the maxAge argument.

Building on the previous example, if you want to cache your retrieved parameter for 30 instead of 5 seconds, you can adapt your function code as follows:

import { getParameter } from '@aws-lambda-powertools/parameters/ssm';

export const handler = async (): Promise<void> => {
  // Retrieve a single parameter with a 30 seconds cache TTL
  const parameter = await getParameter('/my/parameter', { maxAge: 30 });
  console.log(parameter);
};

In other cases, you may want to always retrieve the latest value from the parameter store and ignore any cached value. To achieve this, set the forceFetch parameter to true:

import { getParameter } from '@aws-lambda-powertools/parameters/ssm';

export const handler = async (): Promise<void> => {
  // Always retrieve the latest value of a single parameter
  const parameter = await getParameter('/my/parameter', { forceFetch: true });
  console.log(parameter);
};

For details, see Always fetching the latest in the Powertools for AWS Lambda (TypeScript) documentation.

Decoding parameters stored in JSON or base64 format

If some of your parameters are stored in base64 or JSON, you can deserialize them via the Parameters utility’s transform argument.

Considering a parameter stored in SSM as JSON, it can be retrieved and deserialized as follows:

import { Transform } from '@aws-lambda-powertools/parameters';
import { getParameter } from '@aws-lambda-powertools/parameters/ssm';

export const handler = async (): Promise<void> => {
  // Retrieve and deserialize a single JSON parameter
  const valueFromJson = await getParameter('/my/json/parameter', { transform: Transform.JSON });
  console.log(valueFromJson);
};

The Parameters utility supports the transform argument for all parameter store providers and high-level functions. For details, see Deserializing values with transform parameters.

Working with encrypted parameters in SSM Parameter Store

SSM Parameter Store supports encrypted secure string parameters via the AWS Key Management Service (AWS KMS). The Parameters utility allows you to retrieve these encrypted parameters by adding the decrypt argument to your request.

For example, you could retrieve an encrypted parameter as follows:

import { getParameter } from '@aws-lambda-powertools/parameters/ssm';

export const handler = async (): Promise<void> => {
  // Decrypt the parameter
  const decryptedParameter = await getParameter('/my/encrypted/parameter', { decrypt: true });
  console.log(decryptedParameter);
};

In this case, the Lambda function execution role needs to have the kms:Decrypt IAM permission in addition to ssm:GetParameter.

Retrieving multiple parameters from SSM Parameter Store

Besides retrieving a single parameter using getParameter, you can also use getParameters to recursively retrieve multiple parameters under a SSM Parameter Store path, or getParametersByName to retrieve multiple distinct parameters by their full name.
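
For example, a minimal sketch retrieving every parameter under a shared path with getParameters (the path is a placeholder):

import { getParameters } from '@aws-lambda-powertools/parameters/ssm';

export const handler = async (): Promise<void> => {
  // Recursively retrieve all parameters stored under the given path
  const parameters = await getParameters('/my/path/prefix');
  for (const [key, value] of Object.entries(parameters ?? {})) {
    console.log(`${key}: ${value}`);
  }
};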

You can also apply custom caching, transform, or decrypt configurations per parameter when using getParametersByName. The following example retrieves three distinct parameters from SSM Parameter Store with different caching and transform configurations:

import { getParametersByName } from '@aws-lambda-powertools/parameters/ssm';
import type {
  SSMGetParametersByNameOptionsInterface
} from '@aws-lambda-powertools/parameters/ssm/types';

const props: Record<string, SSMGetParametersByNameOptionsInterface> = {
  '/develop/service/commons/telemetry/config': { maxAge: 300, transform: 'json' },
  '/no_cache_param': { maxAge: 0 },
  '/develop/service/payment/api/capture/url': {}, // When empty or undefined, it uses default values
};

export const handler = async (): Promise<void> => {
  // This returns an object with the parameter name as key
  const parameters = await getParametersByName(props);
  for (const [ key, value ] of Object.entries(parameters)) {
    console.log(`${key}: ${value}`);
  }
};

Retrieving multiple parameters requires the ssm:GetParameter and ssm:GetParameters IAM permissions to be present in the Lambda function execution role.

Retrieving secrets from Secrets Manager

To securely store sensitive parameters such as passwords or API keys for external services, Secrets Manager is a suitable option. To retrieve secrets from Secrets Manager using the Parameters utility, install the AWS SDK client for Secrets Manager in addition to the Parameters utility:

npm install @aws-sdk/client-secrets-manager

Now you can access a secret using its key as follows:

import { getSecret } from '@aws-lambda-powertools/parameters/secrets';

export const handler = async (): Promise<void> => {
  // Retrieve a single secret
  const secret = await getSecret('my-secret');
  console.log(secret);
};

Getting a secret from Secrets Manager requires you to add the secretsmanager:GetSecretValue IAM permission to your Lambda function execution role.

Retrieving an application configuration from AppConfig

If you plan to leverage feature flags or dynamic application configurations in your applications built on Lambda, AppConfig is a suitable option. The Parameters utility makes it easy to fetch configurations from AppConfig while benefitting from utility features such as caching and transformations.

For example, considering an AppConfig application called my-app with an environment called my-env, you can retrieve its configuration profile my-configuration as follows:

import { getAppConfig } from '@aws-lambda-powertools/parameters/appconfig';

export const handler = async (): Promise<void> => {
  // Retrieve a configuration, latest version
  const config = await getAppConfig('my-configuration', {
    environment: 'my-env',
    application: 'my-app'
  });
  console.log(config);
};

Retrieving a configuration requires both the appconfig:GetLatestConfiguration and appconfig:StartConfigurationSession IAM permissions to be attached to the Lambda function execution role.

Retrieving a parameter from a DynamoDB table

DynamoDB’s low latency and high flexibility make it a great option for storing parameters. To use DynamoDB as a parameter store via the Parameters utility, install the DynamoDB AWS SDK client and utility package in addition to the Parameters utility.

npm install @aws-sdk/client-dynamodb @aws-sdk/util-dynamodb

By default, the Parameters utility expects the DynamoDB table containing the parameters to have a partition key of id and an attribute called value. For example, assuming an item with an id of my-parameter and a value of my-value stored in a DynamoDB table called my-table, you can retrieve it as follows:

import { DynamoDBProvider } from '@aws-lambda-powertools/parameters/dynamodb';

const dynamoDBProvider = new DynamoDBProvider({ tableName: 'my-table' });

export const handler = async (): Promise<void> => {
  // Retrieve a value from DynamoDB
  const value = await dynamoDBProvider.get('my-parameter');
  console.log(value);
};

In case of retrieving a single parameter from DynamoDB, the Lambda function execution role needs to have the dynamodb:GetItem IAM permission.

The Parameters utility DynamoDB provider can also retrieve multiple parameters from a table with a single request via a DynamoDB query. See DynamoDB provider in the Powertools for AWS Lambda (TypeScript) documentation for details.
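
As a sketch, assuming a table that follows the provider’s multiple-values schema (a shared id partition key plus an sk sort key), the query could look like this:

import { DynamoDBProvider } from '@aws-lambda-powertools/parameters/dynamodb';

const dynamoDBProvider = new DynamoDBProvider({ tableName: 'my-table' });

export const handler = async (): Promise<void> => {
  // Query all items sharing the partition key 'my-params'; each item's
  // sort key becomes a key in the returned object
  const values = await dynamoDBProvider.getMultiple('my-params');
  console.log(values);
};

Retrieving multiple parameters this way requires the dynamodb:Query IAM permission instead of dynamodb:GetItem.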

Conclusion

This blog post introduces the Powertools for AWS Lambda (TypeScript) Parameters utility and demonstrates how it is used with different parameter stores. The Parameters utility allows you to retrieve secrets and parameters in your Lambda function from SSM Parameter Store, Secrets Manager, AppConfig, DynamoDB, and custom parameter stores. By using the utility, you get access to functionality such as caching and transformation, and reduce the amount of boilerplate code you need to write for your Lambda functions.

To learn more about the Parameters utility and its full set of functionality, take a look at the Powertools for AWS Lambda (TypeScript) documentation.

Share your feedback for Powertools for AWS Lambda (TypeScript) by opening a GitHub issue.

For more serverless learning resources, visit Serverless Land.

Implementing AWS Well-Architected best practices for Amazon SQS – Part 3

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/implementing-aws-well-architected-best-practices-for-amazon-sqs-part-3/

This blog is written by Chetan Makvana, Senior Solutions Architect and Hardik Vasa, Senior Solutions Architect.

This is the third part of a three-part blog post series that demonstrates best practices for Amazon Simple Queue Service (Amazon SQS) using the AWS Well-Architected Framework.

This blog post covers best practices using the Performance Efficiency Pillar, Cost Optimization Pillar, and Sustainability Pillar. The inventory management example introduced in part 1 of the series will continue to serve as an example.

See also the other two parts of the series:

Performance Efficiency Pillar

The Performance Efficiency Pillar includes the ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve. It recommends best practices for using trade-offs to improve performance, such as learning about design patterns and services and identifying how those trade-offs impact customers and efficiency.

By adopting these best practices, you can optimize the performance of SQS by employing appropriate configurations and techniques while considering trade-offs for the specific use case.

Best practice: Use action batching or horizontal scaling or both to increase throughput

To achieve high throughput in SQS, optimizing the performance of your message processing is crucial. You can use two techniques: horizontal scaling and action batching.

When dealing with high message volume, consider horizontally scaling the message producers and consumers by increasing the number of threads per client, by adding more clients, or both. By distributing the load across multiple threads or clients, you can handle a high number of messages concurrently.

Action batching distributes the latency of the batch action over the multiple messages in a batch request, rather than accepting the entire latency for a single message. Because each round trip carries more work, batch requests make more efficient use of threads and connections, improving throughput. You can combine batching with horizontal scaling to provide throughput with fewer threads, connections, and requests than individual message requests.
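As a minimal sketch of action batching on the consumer side (assuming a boto3 client and a placeholder queue URL), a single ReceiveMessage call can return up to 10 messages instead of one:

import boto3

sqs_client = boto3.client("sqs")

# Placeholder queue URL; replace with your own queue
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/InventoryUpdatesQueue"

# Receive up to 10 messages per API call instead of one at a time
response = sqs_client.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
)

for message in response.get("Messages", []):
    print(message["Body"])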

In the inventory management example that we introduced in part 1, this scaling behavior is managed by AWS for the AWS Lambda function responsible for backend processing. When a Lambda function subscribes to an SQS queue, Lambda polls the queue as it waits for the inventory update requests to arrive. Lambda consumes messages in batches, starting at five concurrent batches with five functions at a time. If there are more messages in the queue, Lambda adds up to 60 functions per minute, up to 1,000 functions, to consume those messages.

This means that Lambda can scale up to 1,000 concurrent Lambda functions processing messages from the SQS queue. Batching enables the inventory management system to handle a high volume of inventory update messages efficiently. This ensures real-time visibility into inventory levels and enhances the accuracy and responsiveness of inventory management operations.

Best practice: Trade-off between SQS standard and First-In-First-Out (FIFO) queues

SQS supports two types of queues: standard queues and FIFO queues. Understanding the trade-offs between SQS standard and FIFO queues allows you to make an informed choice that aligns with your application’s requirements and priorities. While SQS standard queues support nearly unlimited throughput, they sacrifice strict message ordering and occasionally deliver messages in an order different from the one in which they were sent. If maintaining the exact order of events is not critical for your application, utilizing SQS standard queues can provide significant benefits in terms of throughput and scalability.

On the other hand, SQS FIFO queues guarantee message ordering and exactly-once processing. This makes them suitable for applications where maintaining the order of events is crucial, such as financial transactions or event-driven workflows. However, FIFO queues have a lower throughput compared to standard queues. They can handle up to 3,000 transactions per second (TPS) per API method with batching, and 300 TPS without batching. Consider using FIFO queues only when the order of events is important for the application, otherwise use standard queues.

In the inventory management example, since the order of inventory records is not crucial, the potential out-of-order message delivery that can occur with SQS standard queues is unlikely to impact the inventory processing. This allows you to take advantage of the benefits provided by SQS standard queues, including their ability to handle a high number of transactions per second.
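If the order of events did matter, a FIFO variant of the queue could be created instead. The following is a hedged AWS CDK sketch; the construct ID and queue name are illustrative, not part of the example application:

# Hypothetical FIFO variant, shown for comparison; the inventory example uses a standard queue
fifo_queue = sqs.Queue(
    self,
    "InventoryUpdatesFifoQueue",
    queue_name="InventoryUpdatesQueue.fifo",  # FIFO queue names must end in .fifo
    fifo=True,
    content_based_deduplication=True,  # Deduplicate using a hash of the message body
    visibility_timeout=Duration.seconds(300),
)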

Cost Optimization Pillar

The Cost Optimization Pillar includes the ability to run systems to deliver business value at the lowest price. It recommends best practices to build and operate cost-aware workloads that achieve business outcomes while minimizing costs and allowing your organization to maximize its return on investment.

Best practice: Configure cost allocation tags for SQS to organize and identify SQS for cost allocation

A well-defined tagging strategy plays a vital role in establishing accurate chargeback or showback models. By assigning appropriate tags to resources, such as SQS queues, you can precisely allocate costs to different teams or applications. This level of granularity ensures fair and transparent cost allocation, enabling better financial management and accountability.

In the inventory management example, tagging the SQS queue allows for specific cost tracking under the Inventory department, enabling a more accurate assessment of expenses. The following code snippet shows how to tag the SQS queue using the AWS Cloud Development Kit (AWS CDK).

# Create the SQS queue
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
)

Tags.of(queue).add("department", "inventory")

Best practice: Use long polling

SQS offers two methods for receiving messages from a queue: short polling and long polling. By default, queues use short polling, where the ReceiveMessage request queries a subset of servers to identify available messages. Even if the query found no messages, SQS sends the response right away.

In contrast, long polling queries all servers in the SQS infrastructure to check for available messages. SQS responds only after collecting at least one available message, up to the maximum number of messages specified in the request. If no messages are immediately available, the request is held open until a message becomes available or the polling wait time expires; only then does SQS send an empty response.

Short polling provides immediate responses, making it suitable for applications that require quick feedback or near-real-time processing. On the other hand, long polling is ideal when efficiency is prioritized over immediate feedback. It reduces API calls, minimizes network traffic, and improves resource utilization, leading to cost savings.

In the inventory management example, long polling enhances the efficiency of processing inventory updates. It collects and retrieves available inventory update messages in a batch of 10, reducing the frequency of API requests. This batching approach optimizes resource utilization, minimizes network traffic, and reduces excessive API consumption, resulting in cost savings. You can configure this behavior using batch size and batch window:

# Add the SQS queue as a trigger to the Lambda function
sqs_to_dynamodb_function.add_event_source_mapping(
    "MyQueueTrigger", event_source_arn=queue.queue_arn, batch_size=10
)
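The snippet above configures batching for the Lambda event source. One way to enable long polling itself for all consumers is to set the queue’s receive wait time. A minimal AWS CDK sketch, assuming a 20-second wait (the maximum allowed):

# Enable long polling: each ReceiveMessage call waits up to 20 seconds for messages
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
    receive_message_wait_time=Duration.seconds(20),
)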

Best practice: Use batching

Batching messages together allows you to send or retrieve multiple messages in a single API call. This reduces the number of API requests required to process or retrieve messages compared to sending or retrieving messages individually. Since SQS pricing is based on the number of API requests, reducing the number of requests can lead to cost savings.

To send, receive, and delete messages, and to change the message visibility timeout for multiple messages with a single action, use Amazon SQS batch API actions. This also helps with transferring less data, effectively reducing the associated data transfer costs, especially if you have many messages.

In the context of the inventory management example, the CSV processing Lambda function groups 10 inventory records together in each API call, forming a batch. By doing so, the number of API requests is reduced by a factor of 10 compared to sending each record separately. This approach optimizes the utilization of API resources, streamlines message processing, and ultimately contributes to cost efficiency. Following is the code snippet from the CSV processing Lambda function showcasing the use of SendMessageBatch to send 10 messages with a single action.

# Parse the CSV records and send them to SQS as batch messages
csv_reader = csv.DictReader(csv_content.splitlines())
message_batch = []
for row in csv_reader:
    # Convert the row to JSON
    json_message = json.dumps(row)

    # Add the message to the batch
    message_batch.append(
        {"Id": str(len(message_batch) + 1), "MessageBody": json_message}
    )

    # Send the batch of messages when it reaches the maximum batch size (10 messages)
    if len(message_batch) == 10:
        sqs_client.send_message_batch(QueueUrl=queue_url, Entries=message_batch)
        message_batch = []
        print("Sent messages in batch")

# Flush any remaining messages that did not fill a complete batch of 10
if message_batch:
    sqs_client.send_message_batch(QueueUrl=queue_url, Entries=message_batch)

Best practice: Use temporary queues

For short-lived, lightweight messaging with synchronous two-way communication, you can use temporary queues. Temporary queues make it easy to create and delete many temporary messaging destinations without inflating your AWS bill. The key concept behind this is the virtual queue. Virtual queues let you multiplex many low-traffic queues onto a single SQS queue. Creating a virtual queue only instantiates a local buffer to hold messages for consumers as they arrive; there is no API call to SQS, and no costs are associated with creating a virtual queue.

The inventory management example does not use temporary queues. However, in use cases that involve short-lived, lightweight messaging with synchronous two-way communication, adopting the best practice of using temporary queues and virtual queues can enhance the overall efficiency, reduce costs, and simplify the management of messaging destinations.

Sustainability Pillar

The Sustainability Pillar provides best practices to meet sustainability targets for your AWS workloads. It encompasses considerations related to energy efficiency and resource optimization.

Best practice: Use long polling

Besides its cost optimization benefits explained as part of the Cost Optimization Pillar, long polling also plays a crucial role in improving resource efficiency by reducing API requests, minimizing network traffic, and optimizing resource utilization.

By collecting and retrieving available messages in batches, long polling reduces the frequency of API requests and the network traffic they generate, making more effective use of system resources.

Fewer API calls also mean less data transfer and fewer infrastructure operations, which improves energy efficiency. This enables the inventory management system to handle high message volumes while operating in a cost-efficient and resource-efficient manner.

Conclusion

This blog post explores best practices for SQS using the Performance Efficiency Pillar, Cost Optimization Pillar, and Sustainability Pillar of the AWS Well-Architected Framework. We cover techniques such as batch processing, message batching, and scaling considerations. We also discuss important considerations, such as resource utilization, minimizing resource waste, and reducing cost.

This three-part blog post series covers a wide range of best practices, spanning the Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability Pillars of the AWS Well-Architected Framework. By following these guidelines and leveraging the power of the AWS Well-Architected Framework, you can build robust, secure, and efficient messaging systems using SQS.

For more serverless learning resources, visit Serverless Land.

Implementing AWS Well-Architected best practices for Amazon SQS – Part 2

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/implementing-aws-well-architected-best-practices-for-amazon-sqs-part-2/

This blog is written by Chetan Makvana, Senior Solutions Architect and Hardik Vasa, Senior Solutions Architect.

This is the second part of a three-part blog post series that demonstrates implementing best practices for Amazon Simple Queue Service (Amazon SQS) using the AWS Well-Architected Framework.

This blog post covers best practices using the Security Pillar and Reliability Pillar of the AWS Well-Architected Framework. The inventory management example introduced in part 1 of the series will continue to serve as an example.

See also the other two parts of the series:

Security Pillar

The Security Pillar includes the ability to protect data, systems, and assets and to take advantage of cloud technologies to improve your security. This pillar recommends putting in place practices that influence security. Using these best practices, you can protect data in transit (as it travels to and from SQS) and at rest (while stored on disk in SQS), and control who can do what with SQS.

Best practice: Configure server-side encryption

If your application has a compliance requirement such as HIPAA, GDPR, or PCI-DSS mandating encryption at rest, if you are looking to improve data security to protect against unauthorized access, or if you are just looking for simplified key management for the messages sent to the SQS queue, you can leverage Server-Side Encryption (SSE) to protect the privacy and integrity of your data stored on SQS.

SQS and AWS Key Management Service (KMS) offer two options for configuring server-side encryption. SQS-managed encryption keys (SSE-SQS) provide automatic encryption of messages stored in SQS queues using AWS-managed keys. This feature is enabled by default when you create a queue. If you choose to use your own AWS KMS keys to encrypt and decrypt messages stored in SQS, you can use the SSE-KMS feature.

Amazon SQS Encryption Settings

SSE-KMS provides greater control and flexibility over encryption keys, while SSE-SQS simplifies the process by managing the encryption keys for you. Both options help you protect sensitive data and comply with regulatory requirements by encrypting data at rest in SQS queues. Note that SSE-SQS only encrypts the message body and not the message attributes.

In the inventory management example introduced in part 1, an AWS Lambda function responsible for CSV processing sends incoming messages to an SQS queue when an inventory updates file is dropped into the Amazon Simple Storage Service (Amazon S3) bucket. SQS encrypts these messages in the queue using SSE-SQS. When a backend processing Lambda function polls messages from the queue, the encrypted message is decrypted, and the function inserts inventory updates into Amazon DynamoDB.

The AWS Cloud Development Kit (AWS CDK) code uses SSE-SQS, the default encryption type. However, the following AWS CDK code shows how to encrypt the queue with SSE-KMS.

# Create the SQS queue encrypted with an AWS managed KMS key
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
    encryption=sqs.QueueEncryption.KMS_MANAGED,
)
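For full control over the key (rotation, key policy, cross-account access), you can instead supply a customer managed KMS key. A hedged sketch, assuming the aws_kms module is imported as kms; the key construct ID is illustrative:

# Customer managed key; enables custom rotation and key policy control
key = kms.Key(self, "InventoryUpdatesKey", enable_key_rotation=True)

queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
    encryption=sqs.QueueEncryption.KMS,
    encryption_master_key=key,
)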

Best practice: Implement least-privilege access using access policy

For securing your resources in AWS, implementing least-privilege access is critical. This means granting users and services the minimum level of access required to perform their tasks. Least-privilege access provides better security, allows you to meet your compliance requirements, and offers accountability via a clear audit trail of who accessed what resources and when.

By implementing least-privilege access using access policies, you can help reduce the risk of security breaches and ensure that your resources are only accessed by authorized users and services. AWS Identity and Access Management (IAM) policies apply to users, groups, and roles, while resource-based policies apply to AWS resources such as SQS queues. To implement least-privilege access, it’s essential to start by defining what actions are required for each user or service to perform their tasks.

In the inventory management example, the CSV processing Lambda function doesn’t perform any other task beyond parsing the inventory updates file and sending the inventory records to the SQS queue for further processing. To ensure that the function can send messages to the SQS queue, add a queue policy that grants sqs:SendMessage only to the IAM role that the Lambda function assumes. By restricting the queue to the Lambda function’s IAM role, you establish a secure and controlled communication channel. The Lambda function can only interact with the SQS queue and doesn’t have unnecessary access or permissions that might compromise the system’s security.

# Create pre-processing Lambda function
csv_processing_to_sqs_function = _lambda.Function(
    self,
    "CSVProcessingToSQSFunction",
    runtime=_lambda.Runtime.PYTHON_3_8,
    code=_lambda.Code.from_asset("sqs_blog/lambda"),
    handler="CSVProcessingToSQSFunction.lambda_handler",
    role=role,
    tracing=Tracing.ACTIVE,
)

# Define the queue policy to allow messages from the Lambda function's role only
policy = iam.PolicyStatement(
    actions=["sqs:SendMessage"],
    effect=iam.Effect.ALLOW,
    principals=[iam.ArnPrincipal(role.role_arn)],
    resources=[queue.queue_arn],
)

queue.add_to_resource_policy(policy)

Best practice: Allow only encrypted connections over HTTPS using aws:SecureTransport

It is essential to have a secure and reliable method for transferring data between AWS services and on-premises environments or other external systems. With HTTPS, a network-based attacker cannot eavesdrop on network traffic or manipulate it using an attack such as man-in-the-middle.

With SQS, you can choose to allow only encrypted connections over HTTPS using the aws:SecureTransport condition key in the queue policy. With this condition in place, any requests made over non-secure HTTP receive a 400 InvalidSecurity error from SQS.

In the inventory management example, the CSV processing Lambda function sends inventory updates to the SQS queue. To ensure secure data transfer, the Lambda function uses the HTTPS endpoint provided by SQS. This guarantees that the communication between the Lambda function and the SQS queue remains encrypted and resistant to potential security threats.

# Create an IAM policy statement allowing only HTTPS access to the queue
secure_transport_policy = iam.PolicyStatement(
    effect=iam.Effect.DENY,
    actions=["sqs:*"],
    principals=[iam.AnyPrincipal()],  # queue policies require an explicit principal
    resources=[queue.queue_arn],
    conditions={
        "Bool": {
            "aws:SecureTransport": "false",
        },
    },
)

# Attach the statement to the queue's resource policy
queue.add_to_resource_policy(secure_transport_policy)

Best practice: Use attribute-based access controls (ABAC)

Some use-cases require granular access control. For example, authorizing a user based on user roles, environment, department, or location. Additionally, dynamic authorization is required based on changing user attributes. In this case, you need an access control mechanism based on user attributes.

Attribute-based access controls (ABAC) is an authorization strategy that defines permissions based on tags attached to users and AWS resources. With ABAC, you can use tags to configure IAM access permissions and policies for your queues. ABAC hence enables you to scale your permission management easily. You can author a single permission policy in IAM using tags created for each business role, and no longer need to update the policy when adding new resources.

ABAC for SQS queues enables two key use cases:

  • Tag-based access control: use tags to control access to your SQS queues, including control plane and data plane API calls.
  • Tag-on-create: enforce tags at the time of creation of an SQS queue and deny the creation of SQS resources without tags.
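As a minimal sketch of tag-based access control (the tag key, value, and resource scope are illustrative), an IAM policy statement can allow sending messages only to queues carrying a matching tag:

# Hypothetical ABAC statement: allow SendMessage only on queues tagged department=inventory
abac_statement = iam.PolicyStatement(
    effect=iam.Effect.ALLOW,
    actions=["sqs:SendMessage"],
    resources=["*"],
    conditions={
        "StringEquals": {"aws:ResourceTag/department": "inventory"}
    },
)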

Reliability Pillar

The Reliability Pillar encompasses the ability of a workload to perform its intended function correctly and consistently when it’s expected to. By leveraging the best practices outlined in this pillar, you can enhance the way you manage messages in SQS.

Best practice: Configure dead-letter queues

In a distributed system, when messages flow between sub-systems, there is a possibility that some messages may not be processed right away. This could be because of the message being corrupted or downstream processing being temporarily unavailable. In such situations, it is not ideal for the bad message to block other messages in the queue.

Dead Letter Queues (DLQs) in SQS can improve the reliability of your application by providing an additional layer of fault tolerance, simplifying debugging, providing a retry mechanism, and separating problematic messages from the main queue. By incorporating DLQs into your application architecture, you can build a more robust and reliable system that can handle errors and maintain high levels of performance and availability.

In the inventory management example, a DLQ plays a vital role in adding message resiliency and preventing situations where a single bad message blocks the processing of other messages. If the backend Lambda function fails after multiple attempts, the inventory update message is redirected to the DLQ. By inspecting these unconsumed messages, you can troubleshoot and redrive them to the primary queue or to a custom destination using the DLQ redrive feature. You can also automate redrive programmatically using the DLQ redrive APIs. This ensures accurate inventory updates and prevents data loss.
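As a sketch of programmatic redrive (both queue ARNs are placeholders), the SQS StartMessageMoveTask API moves messages from the DLQ back to a destination queue:

import boto3

sqs_client = boto3.client("sqs")

# Move messages from the DLQ back to the main queue; ARNs are placeholders
response = sqs_client.start_message_move_task(
    SourceArn="arn:aws:sqs:us-east-1:123456789012:InventoryUpdatesDlq",
    DestinationArn="arn:aws:sqs:us-east-1:123456789012:InventoryUpdatesQueue",
)
print(response["TaskHandle"])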

The following AWS CDK code snippet shows how to create a DLQ for the source queue and sets up a DLQ policy to only allow messages from the source SQS queue. It is recommended not to set the max_receive_count value to 1, especially when using a Lambda function as the consumer, to avoid accumulating many messages in the DLQ.

# Create the Dead Letter Queue (DLQ)
dlq = sqs.Queue(self, "InventoryUpdatesDlq", visibility_timeout=Duration.seconds(300))

# Create the SQS queue with DLQ setting
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
    dead_letter_queue=sqs.DeadLetterQueue(
        max_receive_count=3,  # Number of retries before sending the message to the DLQ
        queue=dlq,
    ),
)
# Create an SQS queue policy to allow the source queue to send messages to the DLQ
policy = iam.PolicyStatement(
    effect=iam.Effect.ALLOW,
    actions=["sqs:SendMessage"],
    principals=[iam.ServicePrincipal("sqs.amazonaws.com")],  # SQS moves the messages on your behalf
    resources=[dlq.queue_arn],
    conditions={"ArnEquals": {"aws:SourceArn": queue.queue_arn}},
)

# Attach the policy to the DLQ, which is the resource the statement protects
dlq.add_to_resource_policy(policy)

Best practice: Process messages in a timely manner by configuring the right visibility timeout

Setting the appropriate visibility timeout is crucial for efficient message processing in SQS. The visibility timeout is the period during which SQS prevents other consumers from receiving and processing a message after it has been polled from the queue.

To determine the ideal visibility timeout for your application, consider your specific use case. If your application typically processes messages within a few seconds, set the visibility timeout to a few minutes. This ensures that multiple consumers don’t process the message simultaneously. If your application requires more time to process messages, consider breaking them down into smaller units or batching them to improve performance.

If a message fails to process and is returned to the queue, it will not be available for processing again until the visibility timeout period has elapsed. Increasing the visibility timeout will increase the overall latency of your application. Therefore, it’s important to balance the tradeoff between reducing the likelihood of message duplication and maintaining a responsive application.

In the inventory management example, setting the right visibility timeout helps the application fail fast and improve the message processing times. Since the Lambda function typically processes messages within milliseconds, a visibility timeout of 30 seconds is set in the following AWS CDK code snippet.

queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(30),
)

It is recommended to keep the SQS queue visibility timeout at least six times the Lambda function timeout, plus the value of MaximumBatchingWindowInSeconds. This allows the Lambda function to retry the messages if the invocation fails.
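A sketch of that rule in AWS CDK, assuming a 30-second function timeout and a 5-second batching window (both illustrative values):

# 6 x function timeout (30s) + MaximumBatchingWindowInSeconds (5s) = 185 seconds
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(6 * 30 + 5),
)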

Conclusion

This blog post explores best practices for SQS using the Security Pillar and Reliability Pillar of the AWS Well-Architected Framework. We discuss various best practices and considerations to ensure the security of SQS. By following these best practices, you can create a robust and secure messaging system using SQS. We also highlight fault tolerance and processing a message in a timely manner as important aspects of building reliable applications using SQS.

The next part of this blog post series focuses on the Performance Efficiency Pillar, Cost Optimization Pillar, and Sustainability Pillar of the AWS Well-Architected Framework and explore best practices for SQS.

For more serverless learning resources, visit Serverless Land.

Implementing AWS Well-Architected best practices for Amazon SQS – Part 1

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/implementing-aws-well-architected-best-practices-for-amazon-sqs-part-1/

This blog is written by Chetan Makvana, Senior Solutions Architect and Hardik Vasa, Senior Solutions Architect.

Amazon Simple Queue Service (Amazon SQS) is a fully managed message queuing service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications. AWS customers have constantly discovered powerful new ways to build more scalable, elastic, and reliable applications using SQS. You can leverage SQS in a variety of use-cases requiring loose coupling and high performance at any level of throughput, while reducing cost by only paying for value and remaining confident that no message is lost. When building applications with Amazon SQS, it is important to follow architectural best practices.

To help you identify and implement these best practices, AWS provides the AWS Well-Architected Framework for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems in the AWS Cloud. Built around six pillars (operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability), AWS Well-Architected provides a consistent approach for customers and partners to evaluate architectures and implement scalable designs.

This three-part blog series covers each pillar of the AWS Well-Architected Framework to implement best practices for SQS. This blog post, part 1 of the series, discusses best practices using the Operational Excellence Pillar of the AWS Well-Architected Framework.

See also the other two parts of the series:

Solution overview

Solution architecture for Inventory Updates Process

This solution architecture shows an example of an inventory management system. The system leverages Amazon Simple Storage Service (Amazon S3), AWS Lambda, Amazon SQS, and Amazon DynamoDB to streamline inventory operations and ensure accurate inventory levels. The system handles frequent updates from multiple sources, such as suppliers, warehouses, and retail stores, which are received as CSV files.

These CSV files are then uploaded to an S3 bucket, consolidating and securing the inventory data for the inventory management system’s access. The system uses a Lambda function to read and parse the CSV file, extracting individual inventory update records. This function transforms each inventory update record into a message and sends it to an SQS queue. A backend Lambda function continually polls the SQS queue for new messages. Upon receiving a message, it retrieves the inventory update details and updates the inventory levels in DynamoDB accordingly.

This ensures that the inventory quantities for each product are accurate and reflect the latest changes. This way, the inventory management system provides real-time visibility into inventory levels across different locations and suppliers, enabling the company to monitor product availability with precision. Find the example code for this solution in the GitHub repository.

This example is used throughout this blog series to highlight how SQS best practices can be implemented based on the AWS Well-Architected Framework.

Operational Excellence Pillar

The Operational Excellence Pillar includes the ability to support development and run workloads effectively, gain insight into their operation, and continuously improve supporting processes and procedures to deliver business value. To achieve operational excellence, the pillar recommends best practices such as defining workload metrics and implementing transaction traceability. This enables organizations to gain valuable insights into their operations, identify potential issues, and optimize services accordingly to improve customer experience. Furthermore, understanding the health of an application is critical to ensuring that it is functioning as expected.

Best practice: Use infrastructure as code to deploy SQS

Infrastructure as Code (IaC) helps you model, provision, and manage your cloud resources. One of the primary advantages of IaC is that it simplifies infrastructure management. With IaC, you can quickly and easily replicate your environment to multiple AWS Regions with a single turnkey solution. This makes it easy to manage your infrastructure, regardless of where your resources are located. Additionally, IaC enables you to create, deploy, and maintain infrastructure repeatably, in a programmatic, descriptive, and declarative way. This reduces errors caused by manual processes, such as creating resources in the AWS Management Console. With IaC, you can easily control and track changes in your infrastructure, which makes it easier to maintain and troubleshoot your systems.

For managing SQS resources, you can use different IaC tools like AWS Serverless Application Model (AWS SAM), AWS CloudFormation, or AWS Cloud Development Kit (AWS CDK). There are also third-party solutions for creating SQS resources, such as the Serverless Framework. AWS CDK is a popular choice because it allows you to provision AWS resources using familiar programming languages such as Python, Java, TypeScript, Go, JavaScript, and C#/.Net.

This blog series showcases the use of AWS CDK with Python to demonstrate best practices for working with SQS. For example, the following AWS CDK code creates a new SQS queue:

from aws_cdk import (
    Duration,
    Stack,
    aws_sqs as sqs,
)
from constructs import Construct


class SqsCdBlogStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # The code that defines your stack goes here

        # example resource
        queue = sqs.Queue(
            self,
            "InventoryUpdatesQueue",
            visibility_timeout=Duration.seconds(300),
        )

Best practice: Configure CloudWatch alarms for ApproximateAgeofOldestMessage

It is important to understand Amazon CloudWatch metrics and dimensions for SQS, to have a plan in place to assess its behavior, and to add custom metrics where necessary. Once you have a good understanding of the metrics, it is essential to identify the key metrics that are most relevant to your use case and set up appropriate alerts to monitor them.

One of the key metrics that SQS provides is the ApproximateAgeOfOldestMessage metric. By monitoring this metric, you can determine the age of the oldest message in the queue, and take appropriate action to ensure that messages are processed in a timely manner. To set up alerts for the ApproximateAgeOfOldestMessage metric, you can use CloudWatch alarms. You configure these alarms to issue alerts when messages remain in the queue for extended periods of time. You can use these alerts to act, for instance by scaling up consumers to process messages more quickly or investigating potential issues with message processing.

In the inventory management example, leveraging the ApproximateAgeOfOldestMessage metric provides valuable insights into the health and performance of the SQS queue. By monitoring this metric, you can detect processing delays, optimize performance, and ensure that inventory updates are processed within the desired timeframe. This ensures that your inventory levels remain accurate and up-to-date. The following code creates an alarm that triggers if the oldest inventory update request has been in the queue for longer than the 600-second threshold.

# Create a CloudWatch alarm for the ApproximateAgeOfOldestMessage metric
alarm = cloudwatch.Alarm(
    self,
    "OldInventoryUpdatesAlarm",
    alarm_name="OldInventoryUpdatesAlarm",
    metric=queue.metric_approximate_age_of_oldest_message(),
    threshold=600,  # Specify your desired threshold value in seconds
    evaluation_periods=1,
    comparison_operator=cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
)
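An alarm on its own only changes state. To act on it automatically, attach an alarm action; the following sketch assumes an SNS topic and the aws_sns and aws_cloudwatch_actions modules (the topic and aliases are illustrative):

# Assumes: from aws_cdk import aws_sns as sns, aws_cloudwatch_actions as cw_actions
topic = sns.Topic(self, "InventoryAlarmTopic")  # hypothetical notification topic
alarm.add_alarm_action(cw_actions.SnsAction(topic))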

Best practice: Add a tracing header while sending a message to the queue to provide distributed tracing capabilities for faster troubleshooting

By implementing distributed tracing, you can gain a clear understanding of the flow of messages in SQS queues, identify any bottlenecks or potential issues, and proactively react to any signals that indicate an unhealthy state. Tracing provides a wider continuous view of an application and helps to follow a user journey or transaction through the application.

AWS X-Ray is an example of a distributed tracing solution that integrates with Amazon SQS to trace messages that are passed through an SQS queue. When using the X-Ray SDK, SQS can propagate tracing headers to maintain trace continuity and enable tracking, analysis, and debugging throughout downstream services. SQS supports tracing headers through the Default HTTP header and the AWSTraceHeader System Attribute. AWSTraceHeader is available for use even when auto-instrumentation through the X-Ray SDK is not, for example, when building a tracing SDK for a new language. If you are using a Lambda downstream consumer, trace context propagation is automatic.
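As a hedged sketch of manual propagation (the queue URL, message body, and trace ID value are placeholders), a producer can set the AWSTraceHeader message system attribute when sending a message:

import boto3

sqs_client = boto3.client("sqs")

# Propagate a trace header explicitly when X-Ray auto-instrumentation is not available
sqs_client.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/InventoryUpdatesQueue",
    MessageBody='{"product_id": "123", "quantity": 10}',
    MessageSystemAttributes={
        "AWSTraceHeader": {
            "DataType": "String",
            "StringValue": "Root=1-5759e988-bd862e3fe1be46a994272793",
        }
    },
)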

In the inventory management example, by utilizing distributed tracing with X-Ray for SQS, you can gain deep insights into the performance, behavior, and dependencies of the inventory management system. This visibility enables you to optimize performance, troubleshoot issues more effectively, and ensure the smooth and efficient operation of the system. The following code sets up a CSV processing Lambda function and a backend processing Lambda function with active tracing enabled. The Lambda function automatically receives the X-Ray TraceId from SQS.

# Create pre-processing Lambda function
csv_processing_to_sqs_function = _lambda.Function(
    self,
    "CSVProcessingToSQSFunction",
    runtime=_lambda.Runtime.PYTHON_3_8,
    code=_lambda.Code.from_asset("sqs_blog/lambda"),
    handler="CSVProcessingToSQSFunction.lambda_handler",
    role=role,
    tracing=Tracing.ACTIVE,  # Enable active tracing with X-Ray
)

# Create a post-processing Lambda function with the specified role
sqs_to_dynamodb_function = _lambda.Function(
    self,
    "SQSToDynamoDBFunction",
    runtime=_lambda.Runtime.PYTHON_3_8,
    code=_lambda.Code.from_asset("sqs_blog/lambda"),
    handler="SQSToDynamoDBFunction.lambda_handler",
    role=role,
    tracing=Tracing.ACTIVE,  # Enable active tracing with X-Ray
)

Conclusion

This blog post explores best practices for SQS with a focus on the Operational Excellence Pillar of the AWS Well-Architected Framework. We explore key considerations for ensuring the smooth operation and optimal performance of applications using SQS. Additionally, we explore the advantages of infrastructure as code in simplifying infrastructure management and showcase how AWS CDK can be used to provision and manage SQS resources.

The next part of this blog post series addresses the Security Pillar and Reliability Pillar of the AWS Well-Architected Framework and explores best practices for SQS.

For more serverless learning resources, visit Serverless Land.

Testing AWS Lambda functions with AWS SAM remote invoke

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/testing-aws-lambda-functions-with-aws-sam-remote-invoke/

Developers are taking advantage of event driven architecture (EDA) to build large distributed applications. To build these applications, developers are using managed services like AWS Lambda, AWS Step Functions, and Amazon EventBridge to handle compute, orchestration, and choreography. Since these services run in the cloud, developers are also looking for ways to test in the cloud. With this in mind, AWS SAM is adding a new feature to the AWS SAM CLI called remote invoke.

AWS SAM remote invoke enables developers to invoke a Lambda function in the AWS Cloud from their development environment. The feature has several options for identifying the Lambda function to invoke, the payload event, and the output type.

Using remote invoke

To test the remote invoke feature, there is a small AWS SAM application that comprises two AWS Lambda functions. The TranslateFunction takes a text string and translates it to the target language using the AI/ML service Amazon Translate. The StreamFunction generates data in a streaming format. To run these demonstrations, be sure to install the latest AWS SAM CLI.

To deploy the application, follow these steps:

  1. Clone the repository:
    git clone https://github.com/aws-samples/aws-sam-remote-invoke-example
  2. Change to the root directory of the repository:
    cd aws-sam-remote-invoke-example
  3. Build the AWS Lambda artifacts (use the --use-container option to ensure Python 3.10 and Node 18 are present. If these are both set up on your machine, you can ignore this flag):
    sam build --use-container
  4. Deploy the application to your AWS account:
    sam deploy --guided
  5. Name the application “remote-test” and choose all defaults.

AWS SAM can now remotely invoke the Lambda functions deployed with this application. Use the following command to test the TranslateFunction:

sam remote invoke --stack-name remote-test --event '{"message":"I am testing the power of remote invocation", "target-language":"es"}' TranslateFunction

This is a quick way to test a small event. However, developers often deal with large complex payloads. The AWS SAM remote invoke function also allows an event to be passed as a file. Use the following command to test:

sam remote invoke --stack-name remote-test --event-file './events/translate-event.json' TranslateFunction

With either of these methods, AWS SAM returns the response from the Lambda function as if it were called from a service like Amazon API Gateway. However, AWS SAM also offers the ability to get the raw response as returned from the Python software development kit (SDK), boto3. This format provides additional information such as the version that you invoked, whether any retries were attempted, and more. To retrieve this output, run the invocation with the additional --output parameter with the value of json.

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' --output json TranslateFunction

Full output from SDK

It is also possible to invoke Lambda functions that are not created in AWS SAM. Using the name of a Lambda function, AWS SAM can remotely invoke any Lambda function that you have permission to invoke. When you deployed the sample application, AWS SAM printed the names of the Lambda functions in the console. Use the following command to print the output again:

sam list stack-outputs --stack-name remote-test

Using the output for the TranslateFunctionName, run:

sam remote invoke --event '{"message": "Testing direct access of the function", "target-language": "fr"}' <TranslateFunctionName>

Lambda recently added support for streaming responses from Lambda functions. Streaming functions do not wait until the entire response is available before they respond to the client. To show this, the StreamFunction generates multiple chunks of text and sends them over a period of time.

To invoke the function, run:

sam remote invoke --stack-name remote-test StreamFunction

Extending remote invoke

The AWS SDKs offer different options when invoking Lambda functions via the Lambda service. Behind the scenes, AWS SAM uses boto3 to power the remote invoke functionality. To make full use of the SDK options for Lambda function invocation, AWS SAM offers a --parameter flag that can be used multiple times.

For example, you may want to run an invocation as a dry run only. This type of invocation validates parameter values and verifies that the caller has permission to invoke the function, without actually running it. The command looks like the following:

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' --parameter InvocationType=DryRun --output json TranslateFunction

In a second example, I want to invoke a specific version of the Lambda function:

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' --parameter Qualifier='$LATEST' TranslateFunction

If you need both options:

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' --parameter InvocationType=DryRun --parameter Qualifier='$LATEST' --output json TranslateFunction
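Because remote invoke wraps boto3’s Invoke API, the equivalent direct SDK call looks roughly like the following sketch (the function name is a placeholder for the deployed TranslateFunction name):

import boto3

lambda_client = boto3.client("lambda")

# Roughly what sam remote invoke issues under the hood (simplified)
response = lambda_client.invoke(
    FunctionName="remote-test-TranslateFunction",  # placeholder; use your deployed name
    InvocationType="DryRun",
    Qualifier="$LATEST",
    Payload=b'{"message": "I am testing the power of remote invocation", "target-language": "es"}',
)
print(response["StatusCode"])  # 204 indicates a successful dry run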

Logging

When developing distributed applications, logging is a critical tool to trace the state of a request across decoupled microservices. AWS SAM offers the sam logs functionality to help view aggregated logs and traces from Amazon CloudWatch and AWS X-Ray, respectively. However, when testing individual functions, developers want contextual logs pinpointed to a specific invocation. The new remote invoke function provides these logs by default. Returning to the TranslateFunction, run the following command again:

sam remote invoke --stack-name remote-test --event '{"message": "I am testing the power of remote invocation", "target-language": "es"}' TranslateFunction

Logging response from remote invoke

The remote invocation returns the response from the Lambda function, any logging from within the Lambda function, followed by the final report from the Lambda service about the invocation itself.

Combining remote invoke with AWS SAM Accelerate

Developers are constantly striving to remove complexity and friction and improve speed and agility in the development pipeline. To help serverless developers towards this goal, the AWS SAM team released a feature called AWS SAM Accelerate. AWS SAM Accelerate is a series of features that move debugging and testing from the local machine to the cloud.

To show how AWS SAM Accelerate and remote invoke can work together, follow these steps:

  1. In a separate terminal, start the AWS SAM sync process with the watch option:
    sam sync --stack-name remote-test --use-container --watch
  2. In a second window or tab, run the remote invoke function:
    sam remote invoke --stack-name remote-test --event-file './events/translate-event.json' TranslateFunction

The combination of these two options provides a robust auto-deployment and testing environment. During iterations of code in the Lambda function, each time you save the file, AWS SAM syncs the code and any dependencies to the cloud. As needed, the remote invoke is then run to verify the code works as expected, with logging provided for each execution.

Conclusion

Serverless developers are looking for the most efficient way to test their applications in the AWS Cloud. They want to invoke an AWS Lambda function quickly without having to mock security, external services, or other environment variables. This blog shows how to use the new AWS SAM remote invoke feature to do just that.

This post shows how to invoke the Lambda function, change the payload type and location, and change the output format. It explains using this feature in conjunction with the AWS SAM Accelerate features to streamline the serverless development and testing process.

For more serverless learning resources, visit Serverless Land.