Tag Archives: announcements

2025 CyberVadis report now available for due diligence on third-party suppliers

2025-07-07 Tea Jioshvili

Post Syndicated from Tea Jioshvili original https://aws.amazon.com/blogs/security/2025-cybervadis-report-now-available-for-due-diligence-on-third-party-suppliers/

We’re excited to announce that AWS has completed the CyberVadis assessment of its security posture with the highest score (Mature) in all assessed areas. This demonstrates our continued commitment to meet the heightened expectations for cloud service providers. Customers can now use the 2025 AWS CyberVadis report and scorecard to reduce their supplier due-diligence burden.

With the increasing adoption of cloud products and services across multiple sectors and industries, AWS is a critical component of customers’ third-party environments. Regulated customers, such as those in the financial services sector, are held to high standards by regulators and auditors when it comes to exercising effective due diligence on third parties.

Many customers use third-party risk management services such as CyberVadis to better manage risks from their evolving third-party environments and drive operational efficiencies. In support of these efforts, AWS has completed its annual CyberVadis security posture assessment, conducted by CyberVadis security analysts.

CyberVadis is a comprehensive third-party risk assessment process that combines the speed and scalability of automation with the certainty of analyst validation. CyberVadis assessments employ a dynamic and comprehensive approach to third-party risk assessment, replacing outdated static spreadsheets and the need for annual AWS assessment access requests. This cloud-based solution provides advanced capabilities by integrating AWS responses with analytics and sophisticated risk models to deliver an in-depth view of the security posture of AWS.

CyberVadis’s risk assessment methodology evaluates 20 topics covering the entire cybersecurity life cycle across four phases: Identify, Protect, Detect, and React. These topics include Data Privacy, Access Management, and Infrastructure Security. The assessment criteria are based on international information security standards, including ISO 2700x, NIST Cybersecurity Framework, Cybersecurity for ICS, PCI DSS, NIS2 and GDPR.

Customers can use CyberVadis results to map the assessment of AWS to commonly used industry frameworks and standards to instantly gain visibility into controls coverage.

AWS customers can download the complete 2025 AWS Assessment Report directly through CyberVadis’s portal using their own account, or through AWS Artifact.

We value your feedback and questions. Reach out to the AWS Compliance team through the Contact Us page. If you have feedback about this post, submit comments in the Comments section below. To learn more about our other compliance and security programs, see AWS Compliance Programs.

Amazon Nova Canvas update: Virtual try-on and style options now available

2025-07-02 Matheus Guimaraes

Post Syndicated from Matheus Guimaraes original https://aws.amazon.com/blogs/aws/amazon-nova-canvas-update-virtual-try-on-and-style-options-now-available/

Have you ever wished you could quickly visualize how a new outfit might look on you before making a purchase? Or how a piece of furniture would look in your living room? Today, we’re excited to introduce a new virtual try-on capability in Amazon Nova Canvas that makes this possible. In addition, we are adding eight new style options that improve style consistency when you generate images from text prompts. These features expand the AI-powered image generation capabilities of Nova Canvas, making it easier than ever to create realistic product visualizations and stylized images that can enhance the experience of your customers.

Let’s take a quick look at how you can start using these today.

Getting started
The first step is to make sure that you have access to the Nova Canvas model. Head to the Amazon Bedrock console, choose Model access, and enable Amazon Nova Canvas for your account, making sure that you select the appropriate Regions for your workloads. If you already have access and have been using Nova Canvas, you can start using the new features immediately because they’re automatically available to you.

Virtual try-on
The first exciting new feature is virtual try-on. With this, you can upload two pictures and ask Amazon Nova Canvas to put them together with realistic results. These could be pictures of apparel, accessories, home furnishings, or other products. For example, you can provide the picture of a human as the source image and the picture of a garment as the reference image, and Amazon Nova Canvas will create a new image with that same person wearing the garment. Let’s try this out!

My starting point is to select two images. I picked one of myself in a pose that I think would work well for a clothes swap and a picture of an AWS-branded hoodie.

Note that Nova Canvas accepts images containing a maximum of 4.1M pixels – the equivalent of 2,048 x 2,048 – so be sure to scale your images to fit these constraints if necessary. Also, if you’d like to run the Python code featured in this article, ensure you have Python 3.9 or later installed as well as the Python packages boto3 and pillow.
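If your images exceed that limit, you need to pick new dimensions before resizing. The following is a minimal sketch of that arithmetic, not an official utility. The function name fit_within_pixel_budget and the MAX_PIXELS constant are hypothetical, and the budget is assumed to be the 2,048 x 2,048 figure quoted above.

```python
import math

# Assumption: the pixel budget is taken from the 2,048 x 2,048 figure quoted above.
MAX_PIXELS = 2048 * 2048


def fit_within_pixel_budget(width, height, max_pixels=MAX_PIXELS):
    """Return (width, height) scaled down, if needed, to fit the pixel budget."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width * height <= max_pixels:
        return width, height  # Already small enough; leave untouched.
    # Scale both sides by the same factor so the aspect ratio is preserved.
    scale = math.sqrt(max_pixels / (width * height))
    new_width = max(1, math.floor(width * scale))
    new_height = max(1, math.floor(height * scale))
    return new_width, new_height
```

With pillow, the result can be passed straight to a resize call, for example image.resize(fit_within_pixel_budget(*image.size)).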

To apply the hoodie to my photo, I use the Amazon Bedrock Runtime invoke API. You can find full details on the request and response structures for this API in the Amazon Nova User Guide. The code is straightforward, requiring only a few inference parameters. I use the new taskType of "VIRTUAL_TRY_ON". I then specify the desired settings, including both the source image and reference image, using the virtualTryOnParams object to set a few required parameters. Note that both images must be converted to Base64 strings.

import base64


def load_image_as_base64(image_path): 
   """Helper function for preparing image data."""
   with open(image_path, "rb") as image_file:
      return base64.b64encode(image_file.read()).decode("utf-8")


inference_params = {
   "taskType": "VIRTUAL_TRY_ON",
   "virtualTryOnParams": {
      "sourceImage": load_image_as_base64("person.png"),
      "referenceImage": load_image_as_base64("aws-hoodie.jpg"),
      "maskType": "GARMENT",
      "garmentBasedMask": {"garmentClass": "UPPER_BODY"}
   }
}

Nova Canvas uses masking to manipulate images. This is a technique that allows AI image generation to focus on specific areas or regions of an image while preserving others, similar to using painter’s tape to protect areas you don’t want to paint.

You can use three different masking modes, which you choose by setting maskType to the corresponding value. In this case, I’m using "GARMENT", which requires me to specify which part of the body I want to be masked. I’m using "UPPER_BODY", but you can use others such as "LOWER_BODY", "FULL_BODY", or "FOOTWEAR" if you want to specifically target the feet. Refer to the documentation for a full list of options.

I then call the invoke API, passing in these inference arguments and saving the generated image to disk.

# Note: The inference_params variable from above is referenced below.

import base64
import io
import json

import boto3
from PIL import Image

# Create the Bedrock Runtime client.
bedrock = boto3.client(service_name="bedrock-runtime", region_name="us-east-1")

# Prepare the invocation payload.
body_json = json.dumps(inference_params, indent=2)

# Invoke Nova Canvas.
response = bedrock.invoke_model(
   body=body_json,
   modelId="amazon.nova-canvas-v1:0",
   accept="application/json",
   contentType="application/json"
)

# Extract the images from the response.
response_body_json = json.loads(response.get("body").read())
images = response_body_json.get("images", [])

# Check for errors.
if response_body_json.get("error"):
   print(response_body_json.get("error"))

# Decode each image from Base64 and save as a PNG file.
for index, image_base64 in enumerate(images):
   image_bytes = base64.b64decode(image_base64)
   image_buffer = io.BytesIO(image_bytes)
   image = Image.open(image_buffer)
   image.save(f"image_{index}.png")

I get a very exciting result!

And just like that, I’m the proud wearer of an AWS-branded hoodie!

In addition to the "GARMENT" mask type, you can also use the "PROMPT" or "IMAGE" masks. With "PROMPT", you still provide the source and reference images, but you also provide a natural language prompt to specify which part of the source image you’d like to be replaced. This is similar to how the "INPAINTING" and "OUTPAINTING" tasks work in Nova Canvas. If you want to use your own image mask, choose the "IMAGE" mask type and provide a black-and-white image to be used as the mask, where black indicates the pixels that you want to be replaced on the source image and white indicates the ones you want to preserve.
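As a rough illustration of the "PROMPT" mask type, the request body might be assembled as follows. This is a sketch under assumptions: the helper name build_prompt_mask_request is hypothetical, and the promptBasedMask and maskPrompt field names are assumed by analogy with garmentBasedMask, so confirm them in the Amazon Nova User Guide before relying on them.

```python
def build_prompt_mask_request(source_b64, reference_b64, mask_prompt):
    """Build a VIRTUAL_TRY_ON request body that uses a prompt-based mask.

    The promptBasedMask and maskPrompt field names are assumptions here;
    confirm them in the Amazon Nova User Guide.
    """
    if not mask_prompt.strip():
        raise ValueError("mask_prompt must describe the region to replace")
    return {
        "taskType": "VIRTUAL_TRY_ON",
        "virtualTryOnParams": {
            "sourceImage": source_b64,
            "referenceImage": reference_b64,
            "maskType": "PROMPT",
            "promptBasedMask": {"maskPrompt": mask_prompt},
        },
    }
```

The returned dictionary takes the place of inference_params in the invocation code shown earlier.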

This capability is especially useful for retailers. They can use it to help their customers make better purchasing decisions by seeing how products look before buying.

Using style options
I’ve always wondered what I would look like as an anime superhero. Previously, I could use Nova Canvas to manipulate an image of myself, but I would have to rely on my own prompt engineering skills to get it right. Now, Nova Canvas comes with pre-trained styles that you can apply to your images to get high-quality results that follow the artistic style of your choice. There are eight available styles: 3D animated family film, design sketch, flat vector illustration, graphic novel, maximalism, midcentury retro, photorealism, and soft digital painting.

Applying them is as straightforward as passing in an extra parameter to the Nova Canvas API. Let’s try an example.

I want to generate an image of an AWS superhero using the 3D animated family film style. To do this, I specify a taskType of "TEXT_IMAGE" and a textToImageParams object containing two parameters: text and style. The text parameter contains the prompt describing the image I want to create which in this case is “a superhero in a yellow outfit with a big AWS logo and a cape.” The style parameter specifies one of the predefined style values. I’m using "3D_ANIMATED_FAMILY_FILM" here, but you can find the full list in the Nova Canvas User Guide.

inference_params = {
   "taskType": "TEXT_IMAGE",
   "textToImageParams": {
      "text": "a superhero in a yellow outfit with a big AWS logo and a cape.",
      "style": "3D_ANIMATED_FAMILY_FILM",
   },
   "imageGenerationConfig": {
      "width": 1280,
      "height": 720,
      "seed": 321
   }
}

Then, I call the invoke API just as I did in the previous example. (The code has been omitted here for brevity.) And the result? Well, I’ll let you judge for yourself, but I have to say I’m quite pleased with the AWS superhero wearing my favorite color following the 3D animated family film style exactly as I envisioned.
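For reference, the repeated invocation can be wrapped in a small helper so that each example only has to change inference_params. This is a sketch based on the virtual try-on code earlier in this post, not code from the Nova documentation. The helper name invoke_canvas is hypothetical, and the client is passed in so that the function can be exercised without AWS credentials.

```python
import base64
import json


def invoke_canvas(client, inference_params, model_id="amazon.nova-canvas-v1:0"):
    """Send inference_params to Nova Canvas and return the images as raw bytes.

    `client` is anything with a Bedrock Runtime style invoke_model method,
    for example boto3.client("bedrock-runtime", region_name="us-east-1").
    """
    response = client.invoke_model(
        body=json.dumps(inference_params),
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
    )
    response_body = json.loads(response["body"].read())
    # Surface a model-side error instead of silently returning no images.
    if response_body.get("error"):
        raise RuntimeError(response_body["error"])
    return [base64.b64decode(image) for image in response_body.get("images", [])]
```

Each returned item is the decoded image data, which can be written to disk directly or opened with pillow as in the earlier example.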

What’s really cool is that I can keep my code and prompt exactly the same and only change the value of the style attribute to generate an image in a completely different style. Let’s try this out. I set style to PHOTOREALISM.

inference_params = { 
   "taskType": "TEXT_IMAGE", 
   "textToImageParams": { 
      "text": "a superhero in a yellow outfit with a big AWS logo and a cape.",
      "style": "PHOTOREALISM",
   },
   "imageGenerationConfig": {
      "width": 1280,
      "height": 720,
      "seed": 7
   }
}

And the result is impressive! A photorealistic superhero exactly as I described, which is a far cry from the previously generated cartoon, and all it took was changing the style value.

Things to know
Availability – Virtual try-on and style options are available in Amazon Nova Canvas in the US East (N. Virginia), Asia Pacific (Tokyo), and Europe (Ireland). Current users of Amazon Nova Canvas can immediately use these capabilities without migrating to a new model.

Pricing – See the Amazon Bedrock pricing page for details on costs.

For a preview of virtual try-on of garments, you can visit nova.amazon.com where you can upload an image of a person and a garment to visualize different clothing combinations.

If you are ready to get started, please check out the Nova Canvas User Guide or visit the AWS Console.

Matheus Guimaraes | @codingmatheus

Introducing GenAI-powered business description recommendations for custom assets in Amazon SageMaker Catalog

2025-07-02 Ramesh H Singh

Post Syndicated from Ramesh H Singh original https://aws.amazon.com/blogs/big-data/introducing-genai-powered-business-description-recommendations-for-custom-assets-in-amazon-sagemaker-catalog/

An organization’s data can come from various sources, including cloud-based pipelines, partner ecosystems, open table formats like Apache Iceberg, software as a service (SaaS) platforms, and internal applications. Although much of this data is business-critical, documenting it and making it discoverable at scale continues to challenge teams, especially when assets don’t originate from pre-integrated AWS-based sources.

To help bridge this gap, Amazon SageMaker Catalog—part of the next generation of Amazon SageMaker—now supports generative AI-powered recommendations for business descriptions, including table summaries, use cases, and column-level descriptions for custom structured assets registered programmatically. This new capability, powered by large language models (LLMs) in Amazon Bedrock, extends automated metadata generation to the broader spectrum of enterprise data, including Iceberg tables in Amazon Simple Storage Service (Amazon S3) or datasets from third-party and internal applications.

With just a few clicks, you can create AI-generated suggestions, review and refine descriptions, and publish enriched asset metadata directly to the catalog. This helps reduce manual documentation effort, improves metadata consistency, and accelerates asset discoverability across organizations.

This launch is part of our broader investment in generative AI-powered cataloging and metadata intelligence across SageMaker Catalog. By combining machine learning (ML) with human oversight and governance controls, we’re making it straightforward for organizations to scale trusted, usable data across business units.

In this post, we demonstrate how to generate AI recommendations for business descriptions for custom structured assets in SageMaker Catalog.

Challenges when using incomplete metadata for custom and external data

SageMaker Catalog supports automated documentation for assets harvested from AWS-centered services like AWS Glue and Amazon Redshift. These built-in integrations automatically pull schema and generate contextual metadata, making it straightforward for data consumers to discover and understand what’s available.

However, many critical datasets originate outside of these services, such as:

Iceberg tables stored in Amazon S3
Structured datasets from third-party platforms like Snowflake or Databricks
Relational assets manually registered using APIs

As a result, customers had to manually enter business descriptions and column-level context—a process that delays publishing, introduces inconsistency, and undermines the discoverability of important assets.

With this launch, SageMaker Catalog adds support for generative AI-powered metadata generation for custom schema-based data assets registered programmatically through APIs. We use large language models (LLMs) in Amazon Bedrock to automatically generate key elements for custom structured assets. This includes providing a comprehensive table summary, detailed column-level descriptions, and suggesting potential analytical use cases. These automated capabilities help streamline the documentation process, ensuring consistency and efficiency across data assets.

Customer Spotlight

Across industries, customers are managing thousands of structured datasets that don’t originate from AWS-native pipelines. These datasets often lack documentation—not because they’re unimportant, but because documenting them is time-consuming, repetitive, and often deprioritized.

How Amazon’s Finance is revolutionizing data management with AI-powered metadata generation

As a large-scale organization with diverse data needs, Amazon’s Finance team manages thousands of data assets. Within the Finance organization, numerous datasets often lack proper documentation, creating bottlenecks that hinder critical financial analysis and decision-making.

Balaji Kumar Gopalakrishnan, Principal Engineer at Amazon Finance, shares how the AI-powered metadata generation capability is transforming their data management approach:

“As a finance organization, we manage numerous datasets that lack proper documentation, creating bottlenecks for critical financial analysis. The AI-powered auto-documentation capability would be transformative for our team—alleviating the manual documentation effort that delays asset discovery and usability. This would dramatically reduce our time-to-insight for reporting while enforcing consistent metadata standards across all our manually registered assets.”

This empowers teams like Amazon Finance to streamline metadata generation and documentation, making critical financial data easier to access and work with. By automating metadata creation, teams can focus on high-impact analysis, accelerating their decision-making process and improving the overall efficiency of the organization.

Key Benefits

This new feature directly addresses key challenges faced by cataloging teams by enabling them to:

Accelerate time to publish: Minimize the delay between data availability and catalog readiness.
Improve metadata quality: Ensure consistent, LLM-generated context, regardless of schema authors.
Enhance discoverability: Enable quick and easy access to data through rich, searchable descriptions.
Build trust: Provide transparent, editable AI suggestions to ensure metadata aligns with organizational needs and domain accuracy.

For data producers, this capability eliminates the need for repetitive, manual documentation, saving valuable time. By automating metadata generation, it also standardizes how metadata is written and structured across assets, resulting in faster publishing and quicker data access for consumers.

On the consumer side, the enhanced metadata offers greater clarity, allowing users to understand the data and its usage at a glance. With complete and curated metadata, they can trust the source, while working more independently and reducing reliance on subject matter experts (SMEs) and data stewards for interpretation.

Solution overview

In this post, we demonstrate how to manually create a structured asset and use the new AI-powered capability to generate business metadata to improve asset usability. The asset we add is a product inventory table with the following columns:

Table: ProductInventory
   Columns:
        productID: string
        name: string
        price: double
        stock_quantity: integer
        shipped_from: string

Prerequisites

To follow this post, you must have an Amazon SageMaker Unified Studio domain set up with domain owner or domain unit owner privileges. You also need a project to publish assets from. For instructions, refer to the SageMaker Unified Studio Getting started guide.

Create an asset

Complete the following steps to manually create the asset:

Manually registered asset types must use the amazon.datazone.RelationalTableFormType form type. To get its latest revision in your domain, run the following command, replacing the domain identifier with your own:

aws datazone get-form-type --domain-identifier dzd_xxxxf --form-type-identifier amazon.datazone.RelationalTableFormType

The latest revision returned is 7, which we use in the next steps:

{
    "createdAt": "2024-12-23T21:12:50.484000+00:00",
    "createdBy": "SYSTEM",
    "domainId": "dzd_xxxxf",
    "imports": [
        {
            "name": "amazon.datazone.RelationalColumnMixin",
            "revision": "5"
        },
        {
            "name": "amazon.datazone.RelationalTableMixin",
            "revision": "5"
        }
    ],
    "model": {
        "smithy": "$version: \"2.0\"\n\nnamespace amazon.datazone\n\nstructure RelationalColumn with [ RelationalColumnMixin ] {\n\n}\n\nlist RelationalColumns {\n    member: RelationalColumn\n}\n\n@documentation(\"A generic form-type to capture relational table details\")\nstructure RelationalTableFormType with [ RelationalTableMixin ] {\n\n    columns: RelationalColumns\n}"
    },
    "name": "amazon.datazone.RelationalTableFormType",
    "originDomainId": "dzd_amazon_datazone_domain",
    "originProjectId": "dzd_amazon_datazone_domain_project",
    "owningProjectId": "dzd_amazon_datazone_domain_project",
    "revision": "7",
    "status": "ENABLED"
}

Create a new asset type that uses the amazon.datazone.RelationalTableFormType revision returned in the previous step:

aws datazone create-asset-type \
   --domain-identifier dzd_xxxxf \
   --name MyAssetType \
   --description "Manually registered custom asset type" \
   --owning-project-identifier 4zxxxx3r \
   --forms-input '{"MyCustomForm": {"required": true, "typeIdentifier": "amazon.datazone.RelationalTableFormType","typeRevision":"7"}}'

You will receive a success response similar to the following:

{
    "description": "Manually registered custom asset type",
    "domainId": "dzd_xxxxf",
    "formsOutput": {
        "AssetCommonDetailsForm": {
            "required": false,
            "typeName": "amazon.datazone.AssetCommonDetailsFormType",
            "typeRevision": "6"
        },
        "MyCustomForm": {
            "required": true,
            "typeName": "amazon.datazone.RelationalTableFormType",
            "typeRevision": "7"
        }
    },
    "name": "MyAssetType",
    "revision": "1"
}

Create the asset for the table using the asset type, replacing the domain and project identifiers with your own. For this example, we also enable businessNameGeneration through the --prediction-configuration option:

aws datazone create-asset --domain-identifier dzd_xxxxf \
--name ProductInventory \
--owning-project-identifier 4zxxxx3r \
--type-identifier MyAssetType \
--prediction-configuration '{"businessNameGeneration": {"enabled": true}}' \
--forms-input '[{
    "content": "{\r\n  \"tableName\": \"ProductInventory\",\r\n  \"columns\": [\r\n    {\r\n      \"columnName\": \"productID\",\r\n      \"dataType\": \"string\"\r\n    },\r\n    {\r\n      \"columnName\": \"name\",\r\n      \"dataType\": \"string\"\r\n    },\r\n    {\r\n      \"columnName\": \"price\",\r\n      \"dataType\": \"double\"\r\n    },\r\n    {\r\n      \"columnName\": \"stock_quantity\",\r\n      \"dataType\": \"integer\"\r\n    },\r\n    {\r\n      \"columnName\": \"shipped_from\",\r\n      \"dataType\": \"string\"\r\n    }\r\n  ]\r\n}",
    "formName": "MyCustomForm",
    "typeIdentifier": "amazon.datazone.RelationalTableFormType"}]'
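Hand-escaping the nested JSON in the forms input is error-prone, so it can help to generate it. The following sketch builds the same payload from a column list. The helper name build_forms_input and the COLUMNS constant are hypothetical, while the form name and type identifier are taken from the commands above.

```python
import json

# Hypothetical schema mirroring the ProductInventory example above.
COLUMNS = [
    ("productID", "string"),
    ("name", "string"),
    ("price", "double"),
    ("stock_quantity", "integer"),
    ("shipped_from", "string"),
]


def build_forms_input(table_name, columns, form_name="MyCustomForm"):
    """Return the --forms-input value as a JSON string."""
    content = {
        "tableName": table_name,
        "columns": [
            {"columnName": name, "dataType": data_type}
            for name, data_type in columns
        ],
    }
    forms = [
        {
            # The form content itself is passed as a JSON-encoded string.
            "content": json.dumps(content),
            "formName": form_name,
            "typeIdentifier": "amazon.datazone.RelationalTableFormType",
        }
    ]
    return json.dumps(forms)


print(build_forms_input("ProductInventory", COLUMNS))
```

The printed string can be saved to a file and supplied to the CLI, which avoids quoting the payload by hand in the shell.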

The following is an example success response after the asset is created:

{
    "createdAt": "2025-06-24T23:47:51.734000+00:00",
    "createdBy": "9665be38-c692-4474-a41f-5d9793040f08",
    "domainId": "dzd_xxxxf",
    "firstRevisionCreatedAt": "2025-06-24T23:47:51.734000+00:00",
    "firstRevisionCreatedBy": "9665be38-c692-4474-a41f-5d9793040f08",
    "formsOutput": [
        {
            "content": "{\"tableName\":\"ProductInventory\",\"columns\":[{\"columnName\":\"productID\",\"dataType\":\"string\"},{\"columnName\":\"name\",\"dataType\":\"string\"},{\"columnName\":\"price\",\"dataType\":\"double\"},{\"columnName\":\"stock_quantity\",\"dataType\":\"integer\"},{\"columnName\":\"shipped_from\",\"dataType\":\"string\"}]}",
            "formName": "MyCustomForm",
            "typeName": "amazon.datazone.RelationalTableFormType"
        }
    ],
    "id": "4e4w5chq6lf3tz",
    "name": "ProductInventory",
    "owningProjectId": "4zxxxx3r",
    "predictionConfiguration": {
        "businessNameGeneration": {
            "enabled": true
        }
    },
    "readOnlyFormsOutput": [],
    "revision": "1",
    "typeIdentifier": "MyAssetType",
    "typeRevision": "1"
}

When an asset is created with businessNameGeneration enabled, the business name predictions are generated asynchronously. After they are generated, they are returned as suggestions in the asset’s readOnlyFormsOutput field.

Generate business metadata

Complete the following steps to generate metadata:

Log in to the SageMaker Unified Studio portal, open the project that you used, and choose Assets in the navigation pane.

The business name is already generated for the asset and columns.

To generate descriptions, choose Generate descriptions.

The following screenshot shows the generated names on the Schema tab.

If you approve of the generated names, choose Accept all.

Choose Accept all again to confirm.

Choose Generate descriptions to create suggested table and column descriptions.

Review the generated recommendations and choose Accept all if they look accurate.

The following screenshot shows the generated descriptions.

Even when assets are registered as custom, you can use this feature to generate business context and publish it to SageMaker Catalog.

Conclusion

As enterprise data environments become increasingly distributed and sourced from diverse platforms, maintaining metadata quality at scale presents a challenge. This feature uses generative AI to automate the creation of business descriptions, including table summaries, use cases, and column-level metadata, reducing manual effort while preserving alignment with governance requirements.

The feature is available in the next generation of SageMaker through SageMaker Catalog for custom structured assets (with schema) registered programmatically using an API. For implementation details, refer to the product documentation.

About the authors

Ramesh H Singh is a Senior Product Manager Technical (External Services) at AWS in Seattle, Washington, currently with the Amazon SageMaker team. He is passionate about building high-performance ML/AI and analytics products that enable enterprise customers to achieve their critical goals using cutting-edge technology. Connect with him on LinkedIn.

Pradeep Misra is a Principal Analytics Solutions Architect at AWS. He works across Amazon to architect and design modern distributed analytics and AI/ML platform solutions. He is passionate about solving customer challenges using data, analytics, and AI/ML. Outside of work, Pradeep likes exploring new places, trying new cuisines, and playing board games with his family. He also likes doing science experiments, building LEGOs and watching anime with his daughters.

Balaji Kumar Gopalakrishnan is a Principal Engineer at Amazon Finance Technology. He has been with Amazon since 2013, solving real-world challenges through technology that directly impact the lives of Amazon customers. Outside of work, Balaji enjoys hiking, painting, and spending time with his family. He is also a movie buff!

Mohit Dawar is a Senior Software Engineer at AWS working on DataZone and SageMaker Unified Studio. Over the past three years, he has led efforts around the core metadata catalog, generative AI-powered metadata curation, and lineage visualization. He enjoys working on large-scale distributed systems, experimenting with AI to improve user experience, and building tools that make data governance feel effortless. Connect with him on LinkedIn.

Mark Horta is a Software Development Manager at AWS working on DataZone and SageMaker Unified Studio. He is responsible for leading the engineering efforts for SageMaker Catalog focusing on generative-AI metadata generation and curation and data lineage.

Build the highest resilience apps with multi-Region strong consistency in Amazon DynamoDB global tables

2025-06-30 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/build-the-highest-resilience-apps-with-multi-region-strong-consistency-in-amazon-dynamodb-global-tables/

While tens of thousands of customers are successfully using Amazon DynamoDB global tables with eventual consistency, we’re seeing emerging needs for even stronger resilience. Many organizations find that the DynamoDB multi-Availability Zone architecture and eventually consistent global tables meet their requirements, but critical applications like payment processing systems and financial services demand more.

For these applications, customers require a zero Recovery Point Objective (RPO) during rare Region-wide events, meaning you can direct your app to read the latest data from any Region. Your multi-Region applications always need to access the same data regardless of location.

Starting today, you can use a new Amazon DynamoDB global tables capability that provides multi-Region strong consistency (MRSC), enabling zero RPO. This capability, first announced as a preview at AWS re:Invent 2024, simplifies building highly resilient global applications.

Here’s how you can enable MRSC starting with an existing empty DynamoDB table:

With MRSC, DynamoDB now offers you the highest level of resilience for your applications. If application processing is interrupted in a Region, you can redirect traffic to another Region containing an MRSC replica and know that you’ll be processing the latest data.

Getting started
Let me walk you through how you can use this new capability.

To start using MRSC, I need to create a global table from an existing DynamoDB table that doesn’t contain any data. I navigate to my existing table, select the Global tables tab and select Create replica.

The MRSC availability architecture requires three AWS Regions. I can configure MRSC with either three full replicas or with two replicas and a witness. A witness contains only replicated change data to provide the required availability without maintaining a full copy of my table data.

The following screenshot shows how I configure MRSC with two replicas and a witness. To configure MRSC with three full replicas instead, I can clear Configure Region 2 as Witness.

If I need to update my existing table programmatically, I can use the Amazon Q CLI to generate the required command with the following prompt:

> Hey Q! Update my existing DynamoDB table called "demo-mrsc" in us-east-1 with multi-Region strong consistency across us-east-2 with witness in us-west-2 Regions

Shortly after, Q CLI responds with the following command:

> To update your DynamoDB table with multi-Region strong consistency, you'll need to use the update-table command with the appropriate parameters. Here's how you would do it:

aws dynamodb update-table \
    --table-name demo-mrsc \
    --replica-updates '[{"Create": {"RegionName": "us-east-2"}}]' \
    --global-table-witness-updates '[{"Create": {"RegionName": "us-west-2"}}]' \
    --multi-region-consistency STRONG \
    --region us-east-1

After it’s finished processing, I can check the status of my MRSC global table. I can see I have a witness configured for my DynamoDB global table. A witness reduces costs while still providing the resilience benefits of multi-Region strong consistency.

Then, in my application, I can use ConsistentRead to read data with strong consistency. Here’s a Python example:

import boto3

# Create the DynamoDB resource in the Region you want to read from
dynamodb = boto3.resource('dynamodb', region_name='us-east-2')
table = dynamodb.Table('demo-mrsc')

pk_id = "demo#test123"

# Read with strong consistency across regions
response = table.get_item(
    Key={
        'PK': pk_id
    },
    ConsistentRead=True
)

print(response)

For operations that require the strongest resilience, I can use ConsistentRead=True. For less critical operations where eventual consistency is acceptable, I can omit this parameter to improve performance and reduce costs.
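To illustrate redirecting reads to another Region, the following sketch tries each Regional table in turn and returns the first successful strongly consistent read. It is an illustration under assumptions, not a recommended production pattern. The read_latest helper is hypothetical, and it catches every exception for brevity where real code would catch specific client errors.

```python
def read_latest(tables, key):
    """Try each Regional table in order and return the first successful item.

    `tables` is an ordered list of objects exposing get_item, such as
    boto3 DynamoDB Table resources created for different Regions.
    """
    last_error = None
    for table in tables:
        try:
            # ConsistentRead=True requests a strongly consistent read.
            response = table.get_item(Key=key, ConsistentRead=True)
            return response.get("Item")
        except Exception as error:  # Sketch only: narrow this in real code.
            last_error = error
    raise RuntimeError("No Region could serve the read") from last_error
```

For example, tables could hold the demo-mrsc table resources for us-east-1 and us-east-2, in the order you prefer to read from them.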

Additional things to know
Here are a couple of things to note:

Availability – The Amazon DynamoDB multi-Region strong consistency capability is available in the following AWS Regions: US East (Ohio, N. Virginia), US West (Oregon), Asia Pacific (Osaka, Seoul, Tokyo), and Europe (Frankfurt, Ireland, London, Paris).
Pricing – Multi-Region strong consistency pricing follows the existing global tables pricing structure. DynamoDB recently reduced global tables pricing by up to 67 percent, making this highly resilient architecture more affordable than ever. Visit Amazon DynamoDB lowers pricing for on-demand throughput and global tables in the AWS Database Blog to learn more.

To learn more about how you can achieve the highest level of application resilience, with applications that are always available and always read the latest data regardless of the Region, visit Amazon DynamoDB global tables.

Happy building!

— Donnie

Amazon Redshift Python user-defined functions will reach end of support after June 30, 2026

2025-06-30 Raks Khare

Post Syndicated from Raks Khare original https://aws.amazon.com/blogs/big-data/amazon-redshift-python-user-defined-functions-will-reach-end-of-support-after-june-30-2026/

The Amazon Redshift integration with AWS Lambda provides the capability to create Amazon Redshift Lambda user-defined functions (UDFs). This capability delivers flexibility, enhanced integrations, and security for functions defined in Lambda that can be run through SQL queries. Amazon Redshift Lambda UDFs offer many advantages:

Enhanced integration – You can connect to external services or APIs from within your UDF logic, enabling richer data enrichment and operational workflows.
Multiple Python runtimes – Lambda UDFs benefit from Lambda function support for multiple Python runtimes depending on specific use cases. In addition, the new versions and security patches are available within a month of their official release.
Independent scaling – Lambda UDFs use Lambda compute resources, so heavy compute or memory-intensive tasks don’t impact query performance or resource concurrency within Amazon Redshift.
Isolation and security – You can isolate custom code execution in a separate service boundary. This simplifies maintenance, monitoring, budgeting, and permission management.

Because Lambda UDFs provide these significant advantages in integration, flexibility, scalability, and security, we will be ending support for Python UDFs in Amazon Redshift. We recommend that you migrate your existing Python UDFs to Lambda UDFs by June 30, 2026.

October 30, 2025 – Creation of new Python UDFs will no longer be supported (existing functions can still be invoked)
June 30, 2026 – Execution of existing Python UDFs will be suspended

In this post, we walk you through how to migrate your existing Python UDFs to Lambda UDFs, set up monitoring and cost evaluations, and review key considerations for a smooth transition.

Solution overview

You can create UDFs for tasks such as tokenization, encryption and decryption, or data science functionality like the Levenshtein distance calculation. For this post, we provide examples for customers who have Python UDFs in place, demonstrating how to replace them with Lambda UDFs.

The Levenshtein function, also known as the Levenshtein distance or edit distance, is a string metric that measures the difference between two sequences of characters: the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other. Although this functionality was previously implemented as Python UDFs with the Python library in Amazon Redshift, Lambda provides a more efficient and scalable solution. This post demonstrates how to migrate from Python UDFs to Lambda UDFs for calculating Levenshtein distances.
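To make the metric concrete, the following is a minimal sketch of the edit distance in plain Python. It is a compact two-row variant written for illustration, not the UDF used later in this post, and the function name levenshtein is our own.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    # previous[j] is the distance between the characters of a processed
    # so far and the first j characters of b
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("flaw", "lawn"))       # 2
print(levenshtein("", "abc"))            # 3
```

For example, turning "kitten" into "sitting" takes two substitutions and one insertion, so the distance is 3.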

Prerequisites

You must have the following:

An AWS account.
One of the following resources, depending on your use case:
- A Redshift cluster if you are using Amazon Redshift Provisioned. For instructions, refer to Create a sample Amazon Redshift cluster.
- A Redshift workgroup if you are using Amazon Redshift Serverless. For instructions, refer to Create a workgroup with a namespace.
An AWS Identity and Access Management (IAM) role that is able to ingest data from Amazon Simple Storage Service (Amazon S3) to Amazon Redshift. Set the IAM role as the default for Amazon Redshift.
Permissions to create Lambda functions and access Amazon CloudWatch.

Prepare the data

To set up our use case, complete the following steps:

On the Amazon Redshift console, choose Query editor v2 under Explorer in the navigation pane.
Connect to your Redshift data warehouse.
Create a table and load data. The following query loads 30,000,000 rows in the customer table:

DROP TABLE IF EXISTS customer;
CREATE TABLE customer
(
c_customer_sk int4 not null ,
c_customer_id char(16) not null ,
c_current_cdemo_sk int4 ,
c_current_hdemo_sk int4 ,
c_current_addr_sk int4 ,
c_first_shipto_date_sk int4 ,
c_first_sales_date_sk int4 ,
c_salutation char(10) ,
c_first_name char(20) ,
c_last_name char(30) ,
c_preferred_cust_flag char(1) ,
c_birth_day int4 ,
c_birth_month int4 ,
c_birth_year int4 ,
c_birth_country varchar(20) ,
c_login char(13) ,
c_email_address char(50) ,
c_last_review_date_sk int4 ,
primary key (c_customer_sk)
) distkey(c_customer_sk);

COPY customer from 's3://redshift-downloads/TPC-DS/2.13/3TB/customer/'
IAM_ROLE default gzip delimiter '|' EMPTYASNULL REGION 'us-east-1';

Identify existing Python UDFs

Run the following script to list existing Python UDFs:

SELECT 
    p.proname, 
    p.pronargs, 
    t.typname, 
    n.nspname, 
    l.lanname, 
    pg_get_functiondef(p.oid) 
FROM 
    pg_proc p, 
    pg_language l, 
    pg_type t, 
    pg_namespace n
WHERE 
    p.prolang = l.oid
    and p.prorettype = t.oid
    and l.lanname = 'plpythonu'
    and p.pronamespace = n.oid
    and nspname not in ('pg_catalog', 'information_schema')
ORDER BY 
    proname;

The following is our existing Python UDF definition for Levenshtein distance:

create or replace function fn_levenshtein_distance(a varchar, b varchar) returns integer as
$$

def levenshtein_distance(a, len_a, b, len_b):
    d = [[0] * (len_b + 1) for i in range(len_a + 1)]  

    for i in range(1, len_a + 1):
        d[i][0] = i

    for j in range(1, len_b + 1):
        d[0][j] = j
    
    for j in range(1, len_b + 1):
        for i in range(1, len_a + 1):
            if a[i - 1] == b[j - 1]:
                cost = 0
            else:
                cost = 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost) # substitution   

    return d[len_a][len_b]

def distance(a, b):
    len_a, len_b = len(a), len(b)
    if len_a == len_b and a == b:
        return 0
    elif len_a == 0:
        return len_b
    elif len_b == 0:
        return len_a
    else:
        return levenshtein_distance(a, len_a, b, len_b)

return distance(a, b)
$$ immutable;

Convert the Python UDF function to a Lambda UDF

You can simplify converting your Python UDF to a Lambda UDF using Amazon Q Developer, a generative AI-powered assistant. It handles code transformation, packaging, and integration logic, accelerating migration and improving scalability. Integrated with popular developer tools like VS Code, JetBrains, and others, Amazon Q streamlines workflows so teams can modernize analytics using serverless architectures with minimal effort.

Amazon Q Developer code suggestions are based on large language models (LLMs) trained on billions of lines of code, including open source and Amazon code. Always review a code suggestion before accepting it, and you might need to edit it to make sure that it does exactly what you intended.

For example, you can give Amazon Q Developer a prompt like the following:

Convert @python-udf.py Redshift Python UDF to Redshift Lambda UDF which batch processes data in the arguments array in a loop and returns json dump at the end. Refer to @lambda-context.py for reference and additional guidance on Lambda UDF.

Create a Lambda function

Complete the following steps to create a Lambda function:

On the Lambda console, choose Functions in the navigation pane.
Choose Create function.
Choose Author from scratch.
For Function name, enter a custom name (for example, levenshtein_distance_func).
For Runtime, choose your code environment. (The examples in this post are compatible with Python 3.12.)
For Architecture, select your system architecture. (The examples in this post are compatible with x86_64.)

For Execution role, select Create a new role with basic Lambda permissions.

Choose Create function.
Choose Code and add the following code:

import json

def lambda_handler(event, context):
    t1 = event['arguments']
    resp = [None]*len(t1)

    for i, x in enumerate(t1):
        if x[0] is not None and x[1] is not None:
            resp[i] = distance(x[0], x[1])

    ret = dict()
    ret['results'] = resp
    return json.dumps(ret)

def levenshtein_distance(a, len_a, b, len_b):
    d = [[0] * (len_b + 1) for i in range(len_a + 1)]  

    for i in range(1, len_a + 1):
        d[i][0] = i

    for j in range(1, len_b + 1):
        d[0][j] = j
    
    for j in range(1, len_b + 1):
        for i in range(1, len_a + 1):
            if a[i - 1] == b[j - 1]:
                cost = 0
            else:
                cost = 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost) # substitution   

    return d[len_a][len_b]

def distance(a, b):
    len_a, len_b = len(a), len(b)
    if len_a == len_b and a == b:
        return 0
    elif len_a == 0:
        return len_b
    elif len_b == 0:
        return len_a
    else:
        return levenshtein_distance(a, len_a, b, len_b)

Choose Configuration and update Timeout to 1 minute.

You can modify memory to optimize performance. To learn more, see Optimizing Levenshtein User-Defined Function in Amazon Redshift.

Create an Amazon Redshift IAM role

To allow your Amazon Redshift cluster to invoke the Lambda function, you must set up proper IAM permissions. Complete the following steps:

Identify the IAM role associated with your Amazon Redshift cluster. If you don’t have one, create a new IAM role for Amazon Redshift.
Add the following IAM policy to this role, providing your AWS Region and AWS account number:

{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "lambda:InvokeFunction",
"Resource": "arn:aws:lambda:<REGION>:<AWS account>:function:levenshtein_distance_func"
}
]
}

Create a Lambda UDF

Run the following script to create your Lambda UDF:

CREATE or REPLACE EXTERNAL FUNCTION 
fn_lambda_levenshtein_distance(a varchar, b varchar) returns int
lambda 'levenshtein_distance_func' IAM_ROLE default 
STABLE
;

Test the solution

To test the solution, run the following script using the Python UDF:

SELECT c_customer_sk, c_customer_id, fn_levenshtein_distance(c_first_name, c_last_name) as distance
FROM customer
WHERE c_customer_sk in (1,2,3,4,5,31);

The following table shows our output.

Run the same script using the Lambda UDF:

SELECT c_customer_sk, c_customer_id, fn_lambda_levenshtein_distance(c_first_name, c_last_name) as distance
FROM customer
WHERE c_customer_sk in (1,2,3,4,5,31);

The results of both UDFs match.

Replace the Python UDF with the Lambda UDF

You can use the following steps in preproduction for testing:

Revoke access for the Python UDF:

REVOKE execute on function fn_levenshtein_distance(varchar, varchar) from <group_name> or <role_name>

Grant access to the Lambda UDF:

grant execute on function fn_lambda_levenshtein_distance(varchar, varchar) to <group_name> or <role_name>

After full testing of the Lambda UDF has been performed, you can drop the Python UDF.
Rename the Lambda UDF fn_lambda_levenshtein_distance to fn_levenshtein_distance so that end users and application code don’t need to change:

ALTER FUNCTION fn_lambda_levenshtein_distance(varchar, varchar)
     RENAME TO fn_levenshtein_distance;

Validate with the following query:

SELECT c_customer_sk, c_customer_id, fn_levenshtein_distance(c_first_name, c_last_name) as distance
FROM customer
WHERE c_customer_sk in (1,2,3,4,5,31);

Cost evaluation

To evaluate the cost of the Lambda UDF, complete the following steps:

Run the following script to create a table using a SELECT query, which uses the Lambda UDF:

DROP TABLE IF EXISTS customer_lambda;
CREATE TABLE customer_lambda as 
SELECT c_customer_sk, c_customer_id, fn_levenshtein_distance(c_first_name, c_last_name) as distance
FROM customer;

You can inspect the query logs using CloudWatch Logs Insights.

On the CloudWatch console, choose Logs in the navigation pane, then choose Logs Insights.
Filter by the Lambda UDF and use the following query to identify the number of Lambda invocations.

fields @timestamp, @message, @logStream, @log
| filter @message like /^REPORT/
| sort @timestamp desc
| limit 10000

Use the following query to find the cost of the Lambda UDF for the specific duration you selected:

parse @message /Duration:\s*(?<@duration_ms>\d+\.\d+)\s*ms\s*Billed\s*Duration:\s*(?<@billed_duration_ms>\d+)\s*ms\s*Memory\s*Size:\s*(?<@memory_size_mb>\d+)\s*MB/
| filter @message like /REPORT RequestId/
| stats sum(@billed_duration_ms * @memory_size_mb * 1.6279296875e-11 + 2.0e-7) as @cost_dollars_total

For this example, we used the us-east-1 Region with the x86_64 architecture selected when creating the function. For more details on Lambda pricing by Region and the Free Tier limit, see AWS Lambda pricing.

Choose Summarize results.

The cost of this Lambda UDF invocation was $0.02329 for 30 million rows.
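The arithmetic behind the stats expression can be reproduced outside CloudWatch. The following is a minimal sketch that reuses the two constants from the query above; the function names and the sample values are hypothetical and are not taken from the run described in this post.

```python
# Constants copied from the Logs Insights query above
PRICE_PER_MB_MS = 1.6279296875e-11   # compute charge per MB of memory per billed ms
PRICE_PER_REQUEST = 2.0e-7           # request charge per invocation


def invocation_cost(billed_duration_ms, memory_size_mb):
    """Cost in dollars of one Lambda invocation."""
    return billed_duration_ms * memory_size_mb * PRICE_PER_MB_MS + PRICE_PER_REQUEST


def total_cost(report_lines):
    """Sum the cost over (billed_duration_ms, memory_size_mb) pairs."""
    return sum(invocation_cost(ms, mb) for ms, mb in report_lines)


# Hypothetical sample: three invocations of a 128 MB function
sample = [(100, 128), (250, 128), (75, 128)]
print(f"${total_cost(sample):.10f}")
```

The compute constant is a per-MB, per-millisecond rate: multiplied by 1,024 MB and 1,000 ms, it equals $0.00001667 per GB-second. If your Region or architecture has a different rate, adjust both constants before relying on the result.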

Monitor Lambda UDFs

Monitoring Lambda UDFs involves tracking both the Lambda function’s performance and the impact on the Redshift query execution. Because UDFs execute externally, a dual approach is necessary.

CloudWatch metrics and logs for Lambda functions

CloudWatch provides comprehensive monitoring for Lambda functions, such as the following key metrics:

Invocations – Tracks the number of times the Lambda function is called, indicating UDF usage frequency
Duration – Measures execution time, helping identify performance bottlenecks
Errors – Counts failed invocations, which is critical for detecting issues in UDF logic
Throttles – Indicates when Lambda limits invocations due to concurrency caps, which can delay query results
Logs – CloudWatch Logs capture detailed execution output, including errors and custom log messages, aiding in debugging
Alarms – Configure alarms for high error rates (for example, Errors > 0) or excessive duration (for example, Duration > 1 second) to receive proactive notifications
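As an illustration of the Errors > 0 alarm, the following is a minimal sketch that builds the arguments for the CloudWatch put_metric_alarm API with boto3. The helper names, the alarm naming scheme, and the one-minute period are assumptions made for this example; no alarm actions are configured, so you would add a notification target for real use.

```python
def error_alarm_params(function_name):
    """Build put_metric_alarm arguments for an 'Errors > 0' alarm."""
    return {
        "AlarmName": f"{function_name}-errors",   # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,                 # evaluate one-minute windows (assumption)
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",   # no invocations is not an error
    }


def create_error_alarm(function_name):
    """Create the alarm; requires AWS credentials and the boto3 package."""
    import boto3  # imported lazily so the parameter builder runs without AWS access
    boto3.client("cloudwatch").put_metric_alarm(**error_alarm_params(function_name))


params = error_alarm_params("levenshtein_distance_func")
print(params["AlarmName"])
```

A Duration alarm follows the same shape with MetricName set to Duration, which Lambda reports in milliseconds.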

Redshift query performance

Within Amazon Redshift, system views provide comprehensive insights into Lambda UDF performance and errors:

SYS_QUERY_HISTORY – Identifies queries that have called your Lambda UDFs by filtering with the UDF name in the query_text column. This helps track usage patterns and execution frequency.
SYS_QUERY_DETAIL – Provides granular execution metrics for queries involving Lambda UDFs, helping identify performance bottlenecks at the step level.
Performance aggregation – Generates summary reports of Lambda UDF performance metrics, including execution count, average duration, and maximum duration to track performance trends over time.

The following table summarizes the monitoring tools available.

Monitoring Tool	Purpose	Key Metrics/Views
CloudWatch Metrics	Track Lambda function performance	Invocations, Duration, Errors, Throttles
CloudWatch Logs	Debug Lambda execution issues	Error messages, custom logs
SYS_QUERY_HISTORY	Track Lambda UDF usage patterns	Query execution times, status, user information, query text
SYS_QUERY_DETAIL	Analyze Lambda UDF performance	Step-level execution details, resource utilization, query plan information
Performance Summary Reports	Track UDF performance trends	Execution count, average/maximum duration, total elapsed time

Monitoring approach for Lambda UDFs in Amazon Redshift

For analyzing individual queries, you can use the following code to track how your Lambda UDFs are being used across your organization:

SELECT * FROM sys_query_history
WHERE query_text LIKE '%your_lambda_udf_name%'
ORDER BY start_time DESC
LIMIT 20;

This helps you do the following:

Identify frequent users
Monitor execution patterns
Track usage trends
Detect unauthorized access

You can also create comprehensive monitoring by using query history to monitor performance metrics at the user level:

SELECT 
    usename,
    DATE_TRUNC('day', start_time) as day,
    COUNT(*) as query_count,
    AVG(DATEDIFF(microsecond, start_time, end_time))/1000000.0 as avg_duration_seconds,
    MAX(DATEDIFF(microsecond, start_time, end_time))/1000000.0 as max_duration_seconds
FROM sys_query_history q
JOIN pg_user u ON q.user_id = u.usesysid
WHERE query_text LIKE '%your_lambda_udf_name%'
AND user_id > 1
GROUP BY usename, day
ORDER BY usename, query_count DESC;

Additionally, you can generate weekly performance reports using the following aggregation query:

SELECT 
    'your_lambda_udf_name' AS function_name,
    COUNT(DISTINCT q.query_id) AS execution_count,
    AVG(DATEDIFF(millisecond, q.start_time, q.end_time)) AS avg_duration_ms,
    MAX(DATEDIFF(millisecond, q.start_time, q.end_time)) AS max_duration_ms,
    SUM(q.elapsed_time) / 1000000 AS total_elapsed_time_sec
FROM 
    sys_query_history q
WHERE 
    q.query_text LIKE '%your_lambda_udf_name%'
GROUP BY 
    function_name
ORDER BY 
    execution_count DESC;

Considerations

To maximize the benefits of Lambda UDFs, consider the following aspects to optimize performance, provide reliability, secure data, and manage costs. If you have Python UDFs that don’t use Python libraries, consider whether they are candidates to convert to SQL UDFs.

The following are key performance considerations:

Batching – Amazon Redshift batches multiple rows into a single Lambda invocation to reduce call frequency, improving efficiency. Make sure the Lambda function handles batched inputs efficiently. For more details, see Accessing external components using Amazon Redshift Lambda UDFs.
Parallel invocations – Redshift cluster slices invoke Lambda functions in parallel, enhancing performance for large datasets. Design functions to support concurrent executions.
Cold starts – Lambda functions might experience cold start delays, particularly if infrequently used. Languages like Python or Node.js typically have faster startup times than Java, reducing latency.
Function optimization – Optimize Lambda code for quick execution, minimizing resource usage and latency. For example, avoid unnecessary computations or external API calls.

Consider the following error handling methods:

Robust Lambda logic – Implement comprehensive error handling in the Lambda function to manage exceptions gracefully. Return clear error messages in the JSON response, as specified in the Amazon Redshift-Lambda interface. For more details, see Scalar Lambda UDFs.
Error propagation – Lambda errors can cause Redshift query failures. Monitor SYS_QUERY_HISTORY for query-level issues and CloudWatch Logs for detailed Lambda errors.
JSON interface – The Lambda function must return a JSON object with success, error_msg, num_records, and results fields. Use proper formatting to avoid query disruptions.
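The error handling and JSON interface points above can be sketched as follows. This is a minimal example under stated assumptions: row_value is a hypothetical stand-in for your own per-row logic (such as the distance function), and the sample events are invented for a local check.

```python
import json


def row_value(args):
    """Hypothetical per-row logic: total length of two strings, None for NULLs."""
    a, b = args
    if a is None or b is None:
        return None
    return len(a) + len(b)


def lambda_handler(event, context):
    try:
        # Amazon Redshift sends a batch of rows in the "arguments" array
        rows = event["arguments"]
        results = [row_value(row) for row in rows]
        response = {
            "success": True,
            "num_records": len(results),
            "results": results,
        }
    except Exception as exc:
        # Report the failure in the response instead of raising
        response = {"success": False, "error_msg": f"{type(exc).__name__}: {exc}"}
    return json.dumps(response)


# Local check with a hypothetical two-row batch
event = {"num_records": 2, "arguments": [["Ann", "Lee"], [None, "Kim"]]}
print(lambda_handler(event, None))
# {"success": true, "num_records": 2, "results": [6, null]}
```

Returning one result per input row, in the same order, keeps the response aligned with the batch that Amazon Redshift sent.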

Clean up

Complete the following steps to clean up your resources:

Delete the Redshift provisioned or serverless endpoint.
Delete the Lambda function.
Delete the IAM roles you created.

Conclusion

Lambda UDFs unlock a new level of flexibility, performance, and maintainability for extending Amazon Redshift. By decoupling custom logic from the warehouse engine, teams can scale independently, adopt modern runtimes, and integrate external systems.

If you’re currently using Python UDFs in Amazon Redshift, it’s time to explore the benefits of migrating to Lambda UDFs. With the generative AI capabilities of Amazon Q Developer, you can automate much of this transformation and accelerate your modernization journey. To learn more, refer to the Lambda UDF examples GitHub repo and Data Tokenization with Amazon Redshift and Protegrity.

About the authors

Raks Khare is a Senior Analytics Specialist Solutions Architect at AWS based out of Pennsylvania. He helps customers across varying industries and regions architect data analytics solutions at scale on the AWS platform. Outside of work, he likes exploring new travel and food destinations and spending quality time with his family.

Ritesh Kumar Sinha is an Analytics Specialist Solutions Architect based out of San Francisco. He has helped customers build scalable data warehousing and big data solutions for over 16 years. He loves to design and build efficient end-to-end solutions on AWS. In his spare time, he loves reading, walking, and doing yoga.

Yanzhu Ji is a Product Manager in the Amazon Redshift team. She has experience in product vision and strategy in industry-leading data products and platforms. She has outstanding skill in building substantial software products using web development, system design, database, and distributed programming techniques. In her personal life, Yanzhu likes painting, photography, and playing tennis.

Harshida Patel is an Analytics Specialist Principal Solutions Architect with AWS.

New Amazon EC2 C8gn instances powered by AWS Graviton4 offering up to 600Gbps network bandwidth

2025-06-30 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/new-amazon-ec2-c8gn-instances-powered-by-aws-graviton4-offering-up-to-600gbps-network-bandwidth/

Today, we’re announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) C8gn network optimized instances powered by AWS Graviton4 processors and the latest 6th generation AWS Nitro Card. EC2 C8gn instances deliver up to 600Gbps network bandwidth, the highest bandwidth among EC2 network optimized instances.

You can use C8gn instances to run the most demanding network intensive workloads, such as security and network virtual appliances (virtual firewalls, routers, load balancers, proxy servers, DDoS appliances), data analytics, and tightly-coupled cluster computing jobs.

EC2 C8gn instances specifications
C8gn instances provide up to 192 vCPUs and 384 GiB memory, and offer up to 30 percent higher compute performance compared to Graviton3-based EC2 C7gn instances.

Here are the specs for C8gn instances:

Instance Name	vCPUs	Memory (GiB)	Network Bandwidth (Gbps)	EBS Bandwidth (Gbps)
c8gn.medium	1	2	Up to 25	Up to 10
c8gn.large	2	4	Up to 30	Up to 10
c8gn.xlarge	4	8	Up to 40	Up to 10
c8gn.2xlarge	8	16	Up to 50	Up to 10
c8gn.4xlarge	16	32	50	10
c8gn.8xlarge	32	64	100	20
c8gn.12xlarge	48	96	150	30
c8gn.16xlarge	64	128	200	40
c8gn.24xlarge	96	192	300	60
c8gn.metal-24xl	96	192	300	60
c8gn.48xlarge	192	384	600	60
c8gn.metal-48xl	192	384	600	60

You can launch C8gn instances through the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDKs.
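For the SDK route, the following is a minimal sketch using the boto3 run_instances call. The helper names are hypothetical, and the AMI ID is a placeholder you must supply; because C8gn instances run on Graviton4, the AMI needs to be built for the arm64 architecture.

```python
def c8gn_launch_params(image_id, size="large", count=1):
    """Build run_instances arguments for a C8gn instance."""
    return {
        "ImageId": image_id,              # must be an arm64 (Graviton) AMI
        "InstanceType": f"c8gn.{size}",   # for example, c8gn.large or c8gn.48xlarge
        "MinCount": count,
        "MaxCount": count,
    }


def launch(image_id, size="large", region="us-east-1"):
    """Launch the instance; requires AWS credentials and the boto3 package."""
    import boto3  # imported lazily so the parameter builder runs without AWS access
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.run_instances(**c8gn_launch_params(image_id, size))


# "ami-EXAMPLE" is a placeholder, not a real AMI ID
print(c8gn_launch_params("ami-EXAMPLE", size="4xlarge"))
```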

If you’re using C7gn instances now, you will have a straightforward experience migrating network intensive workloads to C8gn instances because the new instances offer similar vCPU and memory ratios. To learn more, check out the collection of Graviton resources to help you start migrating your applications to Graviton instance types.

You can also visit the Level up your compute with AWS Graviton page to begin your Graviton adoption journey.

Now available
Amazon EC2 C8gn instances are available today in the US East (N. Virginia) and US West (Oregon) Regions. The two metal instance sizes are available only in the US East (N. Virginia) Region. These instances can be purchased as On-Demand, Savings Plan, or Spot instances, or as Dedicated instances and Dedicated hosts.

Give C8gn instances a try in the Amazon EC2 console. To learn more, refer to the Amazon EC2 C8g instance page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

— Channy

AWS Weekly Roundup: Project Rainier, Amazon CloudWatch investigations, AWS MCP servers, and more (June 30, 2025)

2025-06-30 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-project-rainier-amazon-cloudwatch-investigations-aws-mcp-servers-and-more-june-30-2025/

Every time I visit Seattle, the first thing that greets me at the airport is Mount Rainier. Did you know that the most innovative project at Amazon Web Services (AWS) is named after this mountain?

Project Rainier is a new project to create what is expected to be the world’s most powerful computer for training AI models across multiple data centers in the United States. Anthropic will develop the advanced versions of its Claude models with five times more computing power than its current largest training cluster.

The key technology powering Project Rainier is AWS custom-designed Trainium2 chips, which are specialized for the immense data processing required to train complex AI models. Thousands of these Trainium2 chips will be connected in a new type of Amazon EC2 UltraServer and EC2 UltraCluster architecture that allows ultra-fast communication and data sharing across the massive system.

Learn about the AWS vertical integration of Project Rainier: by designing every component of the technology stack, from chips to software, AWS can optimize the entire system for maximum efficiency and reliability.

Last week’s launches
Here are some launches that got my attention:

Amazon S3 access for Amazon FSx for OpenZFS – You can access and analyze your FSx for OpenZFS file data through Amazon S3 Access Points, enabling seamless integration with AWS AI/ML and analytics services without moving your data out of the file system. You can treat your FSx for OpenZFS data as if it were stored in S3, making it accessible through the S3 API for various applications including Amazon Bedrock, Amazon SageMaker, AWS Glue, and other S3-based cloud-native applications.
Amazon S3 with sort and z-order compaction for Apache Iceberg tables – You can optimize query performance and reduce costs with new sort and z-order compaction. With S3 Tables, sort compaction automatically organizes data files based on defined column orders, while z-order compaction can be enabled through the maintenance API for efficient multicolumn queries.
Amazon CloudWatch investigations – You can accelerate your operational troubleshooting in AWS environments using the Amazon CloudWatch AI-powered investigation feature, which helps identify anomalies, surface related signals, and suggest remediation steps. This capability can be initiated through CloudWatch data widgets, multiple AWS consoles, CloudWatch alarm actions, or Amazon Q chat and enables team collaboration and integration with Slack and Microsoft Teams.
Amazon Bedrock Guardrails Standard tier – You can enhance your AI content safety measures using the new Standard tier. It offers improved content filtering and topic denial capabilities across up to 60 languages, better detection of variations including typos, and stronger protection against prompt attacks. This feature lets you configure safeguards to block harmful content, prevent model hallucinations, redact personally identifiable information (PII), and verify factual claims through automated reasoning checks.
Amazon Route 53 Resolver endpoints for private hosted zone – You can simplify DNS management across AWS and on-premises infrastructure using the new Route 53 DNS delegation feature for private hosted zone subdomains, which works with both inbound and outbound Resolver endpoints. You can delegate subdomain authority between your on-premises infrastructure and Route 53 Resolver cloud service using name server records, eliminating the need for complex conditional forwarding rules.
Amazon Q Developer CLI for Java transformation – You can automate and scale Java application upgrades using the new Amazon Q Developer Java transformation command line interface (CLI). This feature performs upgrades from Java versions 8, 11, 17, or 21 to versions 17 or 21 directly from the command line. The tool offers selective transformation options so you can choose specific steps from transformation plans and customize library upgrades.
New AWS IoT Device Management managed integrations – You can simplify Internet of Things (IoT) device management across multiple manufacturers and protocols using the new managed integrations feature, which provides a unified interface for controlling devices whether they connect directly, through hubs or third-party clouds. The feature includes pre-built cloud-to-cloud (C2C) connectors, device data model templates, and SDKs that support ZigBee, Z-Wave, and Wi-Fi protocols, while you can still create custom connectors and data models.

For a full list of AWS announcements, be sure to keep an eye on the What’s New with AWS? page.

Other AWS news
Various Model Context Protocol (MCP) servers for AWS services have been released. Here are some tutorials about MCP servers that you might find interesting:

AWS costs estimation using Amazon Q CLI and AWS Cost Analysis MCP – You can use Amazon Q CLI with the AWS Cost Analysis MCP server to perform sophisticated cost analysis that follows AWS best practices. We discuss basic setup and advanced techniques, with detailed examples and step-by-step instructions.
Supercharging AWS database development with AWS MCP servers – You can learn the core concepts behind MCP and see how new AWS MCP servers can accelerate your database development through natural language prompts.
Create your own MCP server with C# and .NET with Amazon Q CLI – You’ll use Amazon Q CLI as an MCP client to configure your local development environment for building and testing C# MCP servers, enhancing your ability to create, iterate, and refine AI-powered applications.

Upcoming AWS events
Check your calendars and sign up for these upcoming AWS events:

AWS re:Invent – Register now to get a head start on choosing your best learning path, booking travel and accommodations, and bringing your team to learn, connect, and have fun. If you’re an early-career professional, you can apply to the All Builders Welcome Grant program, which is designed to remove financial barriers and create diverse pathways into cloud technology.
AWS NY Summits – You can gain insights from Swami’s keynote featuring the latest cutting-edge AWS technologies in compute, storage, and generative AI. My News Blog team is also preparing some exciting news for you. If you’re unable to attend in person, you can still participate by registering for the global live stream. Also, save the date for these upcoming Summits in July and August near your city.
AWS Builders Online Series – If you’re based in one of the Asia Pacific time zones, join and learn fundamental AWS concepts, architectural best practices, and hands-on demonstrations to help you build, migrate, and deploy your workloads on AWS.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Channy

Introducing AWS Glue Data Catalog usage metrics for API usage

2025-06-26 David Zhang

Post Syndicated from David Zhang original https://aws.amazon.com/blogs/big-data/introducing-aws-glue-data-catalog-usage-metrics-for-api-usage/

We’re excited to announce AWS Glue Data Catalog usage metrics, a new feature that provides native integration with Amazon CloudWatch. This feature gives you immediate visibility into your AWS Glue Data Catalog API usage patterns and trends.

AWS Glue Data Catalog is a centralized repository that stores metadata about your organization’s datasets. With its unified interface that acts as an index, you can store and query information about your data sources, including their location, formats, schemas, and runtime metrics.

As you scale your lakehouse architecture on Amazon Web Services (AWS) and maintain reliable data operations, observability and monitoring become critical to understanding and optimizing Data Catalog API usage.

With Data Catalog usage metrics in CloudWatch, you can achieve the following:

Monitor API call patterns at 1-minute intervals
Proactively request service quota increases for API rate limits
Enable the CloudWatch pre-built anomaly detection feature to identify abnormalities in your API usage
Understand lakehouse usage across more than 50 APIs

In this post, we demonstrate how to access these metrics, provide a step-by-step walkthrough, and set up meaningful alarms.

Access Data Catalog usage metrics in Amazon CloudWatch console

To access Data Catalog usage metrics, complete the following steps:

Open the Amazon CloudWatch console
Under Metrics, choose All metrics
In the search bar, enter Glue and press Enter
Choose Usage > By AWS Resource, as shown in the following screenshot

The Metrics section opens and displays the catalog usage metrics that you can select to create dashboards and alarms, as shown in the following screenshot.

Monitor CallCount metrics

Each Amazon CloudWatch metric for Data Catalog has the type API and the metric name CallCount. This means that each API call on a specific resource (for example, the GetConnection API) is logged as one count. You can integrate these metrics into your existing CloudWatch dashboards or use them to create new ones. For proactive monitoring, you can configure custom alarms that trigger automatically when API usage exceeds your defined thresholds, helping you stay within service limits.

Under the Graphed metrics tab, you can provide additional customizations to match your monitoring needs. In the Details column, you can create alarms and enable anomaly detection to identify unusual patterns.

To help with effective API monitoring, CallCount metrics specifically focus on successful API calls. This way, you have more precise monitoring and can troubleshoot different types of API behaviors. The following screenshot shows the AWS Glue usage metrics view for GetTables API.

In the Statistics column, you can view your API usage beyond the default Sum, Min, and Max metrics. You can now select a wide variety of statistical methods to analyze your usage patterns, as shown in the following screenshot.

Metrics and dimensions for Data Catalog usage metrics

Data Catalog usage metrics use the AWS/Usage namespace and provide CallCount metrics. These metrics are published with the dimensions Service, Resource, Type and Class.

The CallCount metric doesn’t have a specified unit. The most useful statistic for the metric is SUM, which represents the total operation count for the 1-minute period. Note that the metric value is emitted at 1-minute intervals. Choosing a shorter period (for example, 1 second) won’t change the emission interval.
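Because the Sum statistic covers a full minute while Data Catalog rate limits are expressed per second, it can help to convert between the two. The following is a minimal sketch. The quota of 100 requests per second and the 80% headroom factor are hypothetical numbers chosen for illustration, not actual AWS limits.

```python
SECONDS_PER_PERIOD = 60  # CallCount is emitted once per 1-minute period


def average_calls_per_second(call_count_sum: float) -> float:
    """Average request rate implied by a 1-minute CallCount Sum."""
    return call_count_sum / SECONDS_PER_PERIOD


def static_alarm_threshold(quota_per_second: float, headroom: float = 0.8) -> float:
    """Per-minute Sum at which usage reaches `headroom` of a per-second quota."""
    return quota_per_second * SECONDS_PER_PERIOD * headroom


# Hypothetical example: 4,500 calls in one minute against an assumed
# quota of 100 requests per second.
rate = average_calls_per_second(4500)
threshold = static_alarm_threshold(100)
print(rate, threshold)
```

Keep in mind that this yields an average over the minute. A short burst can still exceed a per-second limit even when the 1-minute Sum stays below the computed threshold.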

Metrics

Metric	Description
`CallCount`	The number of specified operations performed in your account.

Dimensions

Dimension key	Dimension value	Description
Service	AWS Glue	The name of the AWS service containing the resource. For Data Catalog usage metrics, the value for this dimension is AWS Glue.
Type	API	The type of resource being tracked. Currently, when the Service dimension is AWS Glue, the only valid value for Type is API.
Resource	<API name>	The name of the API operation. Valid values include the following: GetCatalogs, GetCatalog, GetDatabases, GetDatabase, GetTables, GetTable, GetTableVersion, GetTableVersions, SearchTables, GetPartitionIndexes, GetColumnStatisticsForTable, GetPartition, GetPartitions, BatchGetPartition, GetColumnStatisticsForPartition, GetConnection, GetConnections, GetUserDefinedFunction, GetUserDefinedFunctions, GetCatalogImportStatus, GetTableOptimizer, BatchGetTableOptimizer, ListTableOptimizerRuns, CreateCatalog, CreateDatabase, CreateTable, CreatePartitionIndex, CreatePartition, BatchCreatePartition, CreateConnection, CreateUserDefinedFunction, CreateTableOptimizer, UpdateCatalog, UpdateDatabase, UpdateTable, UpdateColumnStatisticsForTable, UpdatePartition, BatchUpdatePartition, UpdateColumnStatisticsForPartition, UpdateConnection, UpdateUserDefinedFunction, UpdateTableOptimizer, DeleteCatalog, DeleteDatabase, DeleteTable, BatchDeleteTable, DeleteTableVersion, DeletePartitionIndex, DeleteColumnStatisticsForTable, DeletePartition, BatchDeletePartition, DeleteColumnStatisticsForPartition, DeleteConnection, BatchDeleteConnection, DeleteUserDefinedFunction, DeleteTableOptimizer, TestConnection, ImportCatalogToGlue
Class	None	The class of resource being tracked. Data Catalog usage metrics use this dimension with a value of `None`.
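To illustrate how the namespace, metric name, and dimensions fit together, the following sketch builds the keyword arguments for a CloudWatch GetMetricStatistics request. The helper name `call_count_query` is hypothetical, and the dimension values are copied from the table above. In particular, confirm the exact Service string shown in your CloudWatch console before relying on it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def call_count_query(api_name: str, minutes: int = 60,
                     now: Optional[datetime] = None) -> dict:
    """Build keyword arguments for a CloudWatch GetMetricStatistics request."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Usage",
        "MetricName": "CallCount",
        "Dimensions": [
            # Values taken from the dimensions table; treat them as
            # assumptions and verify them against your console.
            {"Name": "Service", "Value": "AWS Glue"},
            {"Name": "Type", "Value": "API"},
            {"Name": "Resource", "Value": api_name},
            {"Name": "Class", "Value": "None"},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,            # metrics are emitted at 1-minute intervals
        "Statistics": ["Sum"],   # total calls per period
    }


params = call_count_query("GetTables")
# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("cloudwatch").get_metric_statistics(**params)
print(params["Dimensions"])
```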

Set up CloudWatch alarms for Data Catalog usage metrics

Data Catalog has defined rules to manage atypical usage patterns that limit the customer call rate at the granularity of requests per second. You can create CloudWatch alarms on the CallCount metric so that you can request limit increases proactively. To configure a CloudWatch alarm, complete the following steps:

On the CloudWatch metrics console, select one of the available metrics, as shown in the following screenshot. In this example, we select the resource GetTables. You can select multiple metrics to fit your use case.

Choose Graphed metrics.
Choose Sum as the primary statistic.
Set period to 1 minute.

Choose Details and Create Alarm.

For Threshold type, choose Anomaly Detection. You can also select Static based on your requirements and after you’ve determined a specific threshold value.
Set the Anomaly detection threshold to 2 (default). The threshold value is used to determine the normal range of values for the metric. A higher value produces a thicker band of normal values. For more information on how CloudWatch anomaly detection works, refer to How CloudWatch anomaly detection works.
Choose Next.
For Send a notification to the following SNS topic, choose Create new topic.
For Create a new topic, enter your Amazon Simple Notification Service (Amazon SNS) topic name.
For Email endpoints that will receive the notification, enter your email address. In this example, we’re going to create a new SNS topic. However, you can use your existing SNS topics or use other options such as AWS Lambda or auto scaling action.
Choose Create topic.

Scroll down and choose Next.
Enter an alarm name and a description and choose Next.
Review all the details you’ve entered and choose Create alarm, as shown in the following screenshot.

By following these steps, you’ve successfully configured a CloudWatch alarm using anomaly detection that monitors your Data Catalog usage with the threshold that you set. The alarm will trigger when the CallCount metric exceeds the calculated threshold, sending notifications to your specified SNS topic and email endpoints.
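The console steps above can also be approximated in code. The following sketch builds the parameters for a CloudWatch PutMetricAlarm request that uses an anomaly detection band of width 2 on the 1-minute Sum of CallCount. The helper name, the alarm naming scheme, and the SNS topic ARN are hypothetical, and the Service dimension value is copied from the dimensions table, so verify it in your account.

```python
def anomaly_alarm_params(api_name: str, sns_topic_arn: str,
                         band_width: float = 2) -> dict:
    """Build keyword arguments for an anomaly detection alarm on CallCount."""
    metric = {
        "Namespace": "AWS/Usage",
        "MetricName": "CallCount",
        "Dimensions": [
            {"Name": "Service", "Value": "AWS Glue"},  # assumption: from the table
            {"Name": "Type", "Value": "API"},
            {"Name": "Resource", "Value": api_name},
            {"Name": "Class", "Value": "None"},
        ],
    }
    return {
        "AlarmName": f"glue-{api_name}-callcount-anomaly",  # hypothetical name
        # Alarm when the metric rises above the upper edge of the band.
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": 1,
        "ThresholdMetricId": "band",
        "Metrics": [
            {"Id": "calls", "ReturnData": True,
             "MetricStat": {"Metric": metric, "Period": 60, "Stat": "Sum"}},
            {"Id": "band", "ReturnData": True,
             "Expression": f"ANOMALY_DETECTION_BAND(calls, {band_width})"},
        ],
        "AlarmActions": [sns_topic_arn],
    }


params = anomaly_alarm_params(
    "GetTables",
    "arn:aws:sns:us-east-1:123456789012:glue-usage-alerts",  # placeholder ARN
)
# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["Metrics"][1]["Expression"])
```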

This proactive monitoring approach helps you avoid API rate limit issues and keeps your Data Catalog usage running smoothly. For more information on using CloudWatch alarms, refer to Using Amazon CloudWatch alarms.

Conclusion

AWS Glue Data Catalog usage metrics is an effective enhancement to your data infrastructure monitoring capabilities. It addresses the growing need for detailed observability through Amazon CloudWatch in modern data architectures built on top of Data Catalog. You now have access to more granular statistics, moving beyond simple maximum and average request metrics to comprehensive performance indicators including p99 percentiles. These metrics are emitted in 1-minute intervals, providing visibility into your data catalog operations. Organizations can now proactively identify bottlenecks before they affect operations and efficiently conduct capacity planning through detailed usage patterns.

From building monitoring dashboards to setting up alerts, the native support with CloudWatch anomaly detection and flexible alarm configurations makes it straightforward to proactively monitor your lakehouse deployment and prevent abnormalities in your lakehouse usage. For more information, refer to Monitoring Data Catalog usage metrics in Amazon CloudWatch in the AWS Glue documentation. We recommend testing and using these metrics as part of your modern monitoring and observability strategy. We encourage you to share your feedback with us.

About the authors

David Zhang is an Analytics Solutions Architect specializing in designing and implementing large-scale data infrastructure, ETL processes, and extensive data management systems. He helps customers modernize data platforms on Amazon Web Services (AWS). David is also an active speaker at AWS events and contributor to technical content and open source initiatives. He enjoys playing volleyball, tennis, and basketball during his free time.

Noritaka Sekiyama is a Principal Big Data Architect with Amazon Web Services (AWS) Analytics services. He’s responsible for building software artifacts to help customers. In his spare time, he enjoys cycling on his road bike.

Sandeep Adwankar is a Senior Product Manager at AWS. Based in the California Bay Area, he works with customers around the globe to translate business and technical requirements into products that enable customers to improve how they manage, secure, and access data.

Abhay Joshi is a Software Development Engineer at AWS Glue and AWS Lake Formation. He is passionate about building fault tolerant and reliable distributed systems at scale.

Amazon FSx for OpenZFS now supports Amazon S3 access without any data movement

2025-06-25 Elizabeth Fuentes

Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/amazon-fsx-for-openzfs-now-supports-amazon-s3-access-without-any-data-movement/

Starting today, you can attach Amazon S3 Access Points to your Amazon FSx for OpenZFS file systems to access your file data as if it were in Amazon Simple Storage Service (Amazon S3). With this new capability, your data in FSx for OpenZFS is accessible for use with a broad range of Amazon Web Services (AWS) services and applications for artificial intelligence, machine learning (ML), and analytics that work with S3. Your file data continues to reside in your FSx for OpenZFS file system.

Organizations store hundreds of exabytes of file data on premises and want to move this data to AWS for greater agility, reliability, security, scalability, and reduced costs. Once their file data is in AWS, organizations often want to do even more with it. For example, they want to use their enterprise data to augment generative AI applications and build and train machine learning models with the broad spectrum of AWS generative AI and machine learning services. They also want the flexibility to use their file data with new AWS applications. However, many AWS data analytics services and applications are built to work with data stored in Amazon S3 as data lakes. After migration, they can use tools that work with Amazon S3 as their data source. Previously, this required data pipelines to copy data between Amazon FSx for OpenZFS file systems and Amazon S3 buckets.

Amazon S3 Access Points attached to FSx for OpenZFS file systems remove data movement and copying requirements by maintaining unified access through both file protocols and Amazon S3 API operations. You can read and write file data using S3 object operations including GetObject, PutObject, and ListObjectsV2. You can attach hundreds of access points to a file system, with each S3 access point configured with application-specific permissions. These access points support the same granular permissions controls as S3 access points that attach to S3 buckets, including AWS Identity and Access Management (IAM) access point policies, Block Public Access, and network origin controls such as restricting access to your Virtual Private Cloud (VPC). Because your data continues to reside in your FSx for OpenZFS file system, you continue to access your data using Network File System (NFS) and benefit from existing data management capabilities.

You can use your file data in Amazon FSx for OpenZFS file systems to power generative AI applications with Amazon Bedrock for Retrieval Augmented Generation (RAG) workflows, train ML models with Amazon SageMaker, and run analytics or business intelligence (BI) with Amazon Athena and AWS Glue as if the data were in S3, using the S3 API. You can also generate insights using open source tools such as Apache Spark and Apache Hive, without moving or refactoring your data.

To get started
You can create and attach an S3 Access Point to your Amazon FSx for OpenZFS file system using the Amazon FSx console, the AWS Command Line Interface (AWS CLI), or the AWS SDK.

To start, you can follow the steps in the Amazon FSx for OpenZFS file system documentation page to create the file system, then, using the Amazon FSx console, go to Actions and select Create S3 access point. Leave the standard configuration and then create.

To monitor the creation progress, you can go to the Amazon FSx console.

Once available, choose the name of the new S3 access point and review the access point summary. This summary includes an automatically generated alias that works anywhere you would normally use S3 bucket names.

Using the bucket-style alias, you can access the FSx data directly through S3 API operations.

List objects using the ListObjectsV2 API

Get files using the GetObject API

Write data using the PutObject API

The data continues to be accessible via NFS.
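As a rough sketch of the three operations above, the following function writes, reads back, and lists an object, passing the access point alias wherever a bucket name would normally go. It accepts any object that exposes the boto3 S3 client methods `put_object`, `get_object`, and `list_objects_v2`. The `InMemoryS3` class is a hypothetical stand-in so that the example runs without AWS credentials, and the alias and key are placeholders.

```python
import io


def roundtrip_via_access_point(s3, alias: str, key: str, body: bytes):
    """Write, read back, and list an object using an access point alias."""
    s3.put_object(Bucket=alias, Key=key, Body=body)
    data = s3.get_object(Bucket=alias, Key=key)["Body"].read()
    listing = s3.list_objects_v2(Bucket=alias, Prefix=key)
    keys = [obj["Key"] for obj in listing.get("Contents", [])]
    return data, keys


class InMemoryS3:
    """Local stand-in for the three S3 client methods used above."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = bytes(Body)
        return {}

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}

    def list_objects_v2(self, Bucket, Prefix=""):
        contents = [{"Key": k} for (b, k) in sorted(self._objects)
                    if b == Bucket and k.startswith(Prefix)]
        return {"Contents": contents, "KeyCount": len(contents)}


# Placeholder alias; use the alias shown in your access point summary.
data, keys = roundtrip_via_access_point(
    InMemoryS3(), "example-access-point-alias", "reports/hello.txt", b"hello")
print(data, keys)
```

To run this against a real file system, you would pass `boto3.client("s3")` in place of the stand-in.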

Beyond accessing your FSx data through the S3 API, you can work with your data using the broad range of AI, ML, and analytics services that work with data in S3. For example, I built an Amazon Bedrock Knowledge Base using PDFs containing airline customer service information from my travel support application repository, WhatsApp-Powered RAG Travel Support Agent: Elevating Customer Experience with PostgreSQL Knowledge Retrieval, as the data source.

To create the Amazon Bedrock Knowledge Base, I followed the connection steps in Connect to Amazon S3 for your knowledge base user guide. I chose Amazon S3 as the data source, entered my S3 access point alias as the S3 source, then configured and created the knowledge base.

Once the knowledge base is synchronized, I can see all documents and the Document source as S3.

Finally, I ran queries against the knowledge base and verified that it successfully used the file data from my Amazon FSx for OpenZFS file system to provide contextual answers, demonstrating seamless integration without data movement.

Things to know
Integration and access control – Amazon S3 Access Points for Amazon FSx for OpenZFS file systems support standard S3 API operations (such as GetObject, ListObjectsV2, PutObject) through the S3 endpoint, with granular access controls through AWS Identity and Access Management (IAM) permissions and file system user authentication. Your S3 Access Point includes an automatically generated access point alias for data access using S3 bucket names, and public access is blocked by default for Amazon FSx resources.

Data management – Your data stays in your Amazon FSx for OpenZFS file system while becoming accessible as if it were in Amazon S3, eliminating the need for data movement or copies, with file data remaining accessible through NFS file protocols.

Performance – Amazon S3 Access Points for Amazon FSx for OpenZFS file systems deliver first-byte latency in the tens of milliseconds range, consistent with S3 bucket access. Performance scales with your Amazon FSx file system’s provisioned throughput, with maximum throughput determined by your underlying FSx file system configuration.

Pricing – You’re billed by Amazon S3 for the requests and data transfer costs through your S3 Access Point, in addition to your standard Amazon FSx charges. Learn more on the Amazon FSx for OpenZFS pricing page.

You can get started today using the Amazon FSx console, AWS CLI, or AWS SDK to attach Amazon S3 Access Points to your Amazon FSx for OpenZFS file systems. The feature is available in the following AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), Europe (Frankfurt, Ireland, Stockholm), and Asia Pacific (Hong Kong, Singapore, Sydney, Tokyo).

— Eli

New: Improve Apache Iceberg query performance in Amazon S3 with sort and z-order compaction

2025-06-24 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-improve-apache-iceberg-query-performance-in-amazon-s3-with-sort-and-z-order-compaction/

You can now use sort and z-order compaction to improve Apache Iceberg query performance in Amazon S3 Tables and general purpose S3 buckets.

You typically use Iceberg to manage large-scale analytical datasets in Amazon Simple Storage Service (Amazon S3) with AWS Glue Data Catalog or with S3 Tables. Iceberg tables support use cases such as concurrent streaming and batch ingestion, schema evolution, and time travel. When working with high-ingest or frequently updated datasets, data lakes can accumulate many small files that impact the cost and performance of your queries. You’ve shared that optimizing Iceberg data layout is operationally complex and often requires developing and maintaining custom pipelines. Although the default binpack strategy with managed compaction provides notable performance improvements, introducing sort and z-order compaction options for both S3 and S3 Tables delivers even greater gains for queries filtering across one or more dimensions.

Two new compaction strategies: Sort and z-order
To help organize your data more efficiently, Amazon S3 now supports two new compaction strategies: sort and z-order, in addition to the default binpack compaction. These advanced strategies are available for both fully managed S3 Tables and Iceberg tables in general purpose S3 buckets through AWS Glue Data Catalog optimizations.

Sort compaction organizes files based on a user-defined column order. When your tables have a defined sort order, S3 Tables compaction will now use it to cluster similar values together during the compaction process. This improves the efficiency of query execution by reducing the number of files scanned. For example, if your table is organized by sort compaction along state and zip_code, queries that filter on those columns will scan fewer files, improving latency and reducing query engine cost.

Z-order compaction goes a step further by enabling efficient file pruning across multiple dimensions. It interleaves the binary representation of values from multiple columns into a single scalar that can be sorted, making this strategy particularly useful for spatial or multidimensional queries. For example, if your workloads include queries that simultaneously filter by pickup_location, dropoff_location, and fare_amount, z-order compaction can reduce the total number of files scanned compared to traditional sort-based layouts.
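The bit interleaving described above can be sketched for small non-negative integers as follows. This is an illustration of the general Morton (z-order) encoding idea only, not the encoding that Iceberg or S3 Tables actually use, which also has to handle strings, floating point values, and other column types.

```python
def interleave_bits(values, bits_per_value=16):
    """Interleave the bits of non-negative integers into one z-order key."""
    key = 0
    n = len(values)
    for bit in range(bits_per_value):
        for i, v in enumerate(values):
            # Bit `bit` of column i lands at position bit * n + i.
            key |= ((v >> bit) & 1) << (bit * n + i)
    return key


# Sort a 4x4 grid of (x, y) points by their z-order key.
points = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(points, key=lambda p: interleave_bits(p, bits_per_value=2))
print(ordered[:4])
```

Sorting by this single key keeps rows that are close in every dimension near each other, which is why a filter on any of the interleaved columns can skip more files than a sort that favors only the leading column.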

S3 Tables use your Iceberg table metadata to determine the current sort order. If a table has a defined sort order, no additional configuration is needed to activate sort compaction—it’s automatically applied during ongoing maintenance. To use z-order, you need to update the table maintenance configuration using the S3 Tables API and set the strategy to z-order. For Iceberg tables in general purpose S3 buckets, you can configure AWS Glue Data Catalog to use sort or z-order compaction during optimization by updating the compaction settings.

Only new data written after enabling sort or z-order will be affected. Existing compacted files will remain unchanged unless you explicitly rewrite them by increasing the target file size in table maintenance settings or rewriting data using standard Iceberg tools. This behavior is designed to give you control over when and how much data is reorganized, balancing cost and performance.

Let’s see it in action
I’ll walk you through a simplified example using Apache Spark and the AWS Command Line Interface (AWS CLI). I have a Spark cluster installed and an S3 table bucket. I have a table named testtable in a testnamespace. I temporarily disabled compaction while I added data to the table.

After adding data, I check the file structure of the table.

spark.sql("""
  SELECT 
    substring_index(file_path, '/', -1) as file_name,
    record_count,
    file_size_in_bytes,
    CAST(UNHEX(hex(lower_bounds[2])) AS STRING) as lower_bound_name,
    CAST(UNHEX(hex(upper_bounds[2])) AS STRING) as upper_bound_name
  FROM ice_catalog.testnamespace.testtable.files
  ORDER BY file_name
""").show(20, false)

+--------------------------------------------------------------+------------+------------------+----------------+----------------+
|file_name                                                     |record_count|file_size_in_bytes|lower_bound_name|upper_bound_name|
+--------------------------------------------------------------+------------+------------------+----------------+----------------+
|00000-0-66a9c843-5a5c-407f-8da4-4da91c7f6ae2-0-00001.parquet  |1           |837               |Quinn           |Quinn           |
|00000-1-b7fa2021-7f75-4aaf-9a24-9bdbb5dc08c9-0-00001.parquet  |1           |824               |Tom             |Tom             |
|00000-10-00a96923-a8f4-41ba-a683-576490518561-0-00001.parquet |1           |838               |Ilene           |Ilene           |
|00000-104-2db9509d-245c-44d6-9055-8e97d4e44b01-0-00001.parquet|1000000     |4031668           |Anjali          |Tom             |
|00000-11-27f76097-28b2-42bc-b746-4359df83d8a1-0-00001.parquet |1           |838               |Henry           |Henry           |
|00000-114-6ff661ca-ba93-4238-8eab-7c5259c9ca08-0-00001.parquet|1000000     |4031788           |Anjali          |Tom             |
|00000-12-fd6798c0-9b5b-424f-af70-11775bf2a452-0-00001.parquet |1           |852               |Georgie         |Georgie         |
|00000-124-76090ac6-ae6b-4f4e-9284-b8a09f849360-0-00001.parquet|1000000     |4031740           |Anjali          |Tom             |
|00000-13-cb0dd5d0-4e28-47f5-9cc3-b8d2a71f5292-0-00001.parquet |1           |845               |Olivia          |Olivia          |
|00000-134-bf6ea649-7a0b-4833-8448-60faa5ebfdcd-0-00001.parquet|1000000     |4031718           |Anjali          |Tom             |
|00000-14-c7a02039-fc93-42e3-87b4-2dd5676d5b09-0-00001.parquet |1           |838               |Sarah           |Sarah           |
|00000-144-9b6d00c0-d4cf-4835-8286-ebfe2401e47a-0-00001.parquet|1000000     |4031663           |Anjali          |Tom             |
|00000-15-8138298d-923b-44f7-9bd6-90d9c0e9e4ed-0-00001.parquet |1           |831               |Brad            |Brad            |
|00000-155-9dea2d4f-fc98-418d-a504-6226eb0a5135-0-00001.parquet|1000000     |4031676           |Anjali          |Tom             |
|00000-16-ed37cf2d-4306-4036-98de-727c1fe4e0f9-0-00001.parquet |1           |830               |Brad            |Brad            |
|00000-166-b67929dc-f9c1-4579-b955-0d6ef6c604b2-0-00001.parquet|1000000     |4031729           |Anjali          |Tom             |
|00000-17-1011820e-ee25-4f7a-bd73-2843fb1c3150-0-00001.parquet |1           |830               |Noah            |Noah            |
|00000-177-14a9db71-56bb-4325-93b6-737136f5118d-0-00001.parquet|1000000     |4031778           |Anjali          |Tom             |
|00000-18-89cbb849-876a-441a-9ab0-8535b05cd222-0-00001.parquet |1           |838               |David           |David           |
|00000-188-6dc3dcca-ddc0-405e-aa0f-7de8637f993b-0-00001.parquet|1000000     |4031727           |Anjali          |Tom             |
+--------------------------------------------------------------+------------+------------------+----------------+----------------+
only showing top 20 rows

I observe that the table is made of many small files and that the upper and lower bounds of the new files overlap, so the data is clearly unsorted.

I set the table sort order.

spark.sql("ALTER TABLE ice_catalog.testnamespace.testtable WRITE ORDERED BY name ASC")

I enable table compaction with the sort strategy. (Compaction is enabled by default; I disabled it at the start of this demo.)

aws s3tables put-table-maintenance-configuration --table-bucket-arn ${S3TABLE_BUCKET_ARN} --namespace testnamespace --name testtable --type icebergCompaction --value "status=enabled,settings={icebergCompaction={strategy=sort}}"

Then, I wait for the next compaction job to trigger. These run throughout the day, when there are enough small files. I can check the compaction status with the following command.

aws s3tables get-table-maintenance-job-status --table-bucket-arn ${S3TABLE_BUCKET_ARN} --namespace testnamespace --name testtable

When the compaction is done, I inspect the files that make up my table one more time. I see that the data was compacted to two files, and the upper and lower bounds show that the data was sorted across these two files.

spark.sql("""
  SELECT 
    substring_index(file_path, '/', -1) as file_name,
    record_count,
    file_size_in_bytes,
    CAST(UNHEX(hex(lower_bounds[2])) AS STRING) as lower_bound_name,
    CAST(UNHEX(hex(upper_bounds[2])) AS STRING) as upper_bound_name
  FROM ice_catalog.testnamespace.testtable.files
  ORDER BY file_name
""").show(20, false)

+------------------------------------------------------------+------------+------------------+----------------+----------------+
|file_name                                                   |record_count|file_size_in_bytes|lower_bound_name|upper_bound_name|
+------------------------------------------------------------+------------+------------------+----------------+----------------+
|00000-4-51c7a4a8-194b-45c5-a815-a8c0e16e2115-0-00001.parquet|13195713    |50034921          |Anjali          |Kelly           |
|00001-5-51c7a4a8-194b-45c5-a815-a8c0e16e2115-0-00001.parquet|10804307    |40964156          |Liza            |Tom             |
+------------------------------------------------------------+------------+------------------+----------------+----------------+

There are fewer files, they have larger sizes, and there is a better clustering across the specified sort column.
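To see why tighter bounds matter, the following sketch mimics how a query engine can skip files whose lower and upper bounds exclude the filter value. It is a simplified model with shortened file names. The bounds come from the two listings above, and real engines read these statistics from Iceberg metadata rather than from a Python list.

```python
from typing import List, NamedTuple


class DataFile(NamedTuple):
    name: str
    lower: str  # lower bound of the name column
    upper: str  # upper bound of the name column


def files_to_scan(files: List[DataFile], value: str) -> List[str]:
    """Return the files whose bounds could contain `value`."""
    return [f.name for f in files if f.lower <= value <= f.upper]


# Three of the large files before compaction: every range spans Anjali..Tom.
before = [
    DataFile("00000-104", "Anjali", "Tom"),
    DataFile("00000-114", "Anjali", "Tom"),
    DataFile("00000-124", "Anjali", "Tom"),
]
# The two files after sort compaction: the ranges no longer overlap.
after = [
    DataFile("00000-4", "Anjali", "Kelly"),
    DataFile("00001-5", "Liza", "Tom"),
]

print(len(files_to_scan(before, "Henry")), len(files_to_scan(after, "Henry")))
```

A filter such as name = 'Henry' has to open every unsorted file, but only one of the two sorted files.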

To use z-order, I follow the same steps, but I set strategy=z-order in the maintenance configuration.
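If you prefer JSON over the shorthand syntax for the --value parameter, a small helper can generate it. This is a sketch under the assumption that the JSON structure mirrors the shorthand shown earlier (status, then settings.icebergCompaction.strategy). The helper name is hypothetical, and the allowed strategies are limited to the three named in this post.

```python
import json

ALLOWED_STRATEGIES = {"binpack", "sort", "z-order"}  # strategies named in this post


def compaction_value(strategy: str) -> str:
    """Render the maintenance configuration value as compact JSON."""
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"unsupported strategy: {strategy}")
    value = {
        "status": "enabled",
        "settings": {"icebergCompaction": {"strategy": strategy}},
    }
    return json.dumps(value, separators=(",", ":"))


print(compaction_value("z-order"))
```

The printed string can be passed, in single quotes, as the --value argument of the put-table-maintenance-configuration command.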

Regional availability
Sort and z-order compaction are now available in all AWS Regions where Amazon S3 Tables are supported and for general purpose S3 buckets where optimization with AWS Glue Data Catalog is available. There is no additional charge for S3 Tables beyond existing usage and maintenance fees. For Data Catalog optimizations, compute charges apply during compaction.

With these changes, queries that filter on the sort or z-order columns benefit from faster scan times and reduced engine costs. In my experience, depending on my data layout and query patterns, I observed performance improvements of threefold or more when switching from binpack to sort or z-order. Tell us what gains you see on your own data.

To learn more, visit the Amazon S3 Tables product page or review the S3 Tables maintenance documentation. You can also start testing the new strategies on your own tables today using the S3 Tables API or AWS Glue optimizations.

— seb

CISPE Data Protection Code of Conduct Public Register now certifies 122 AWS services as adherent

2025-06-23 Gokhan Akyuz

Post Syndicated from Gokhan Akyuz original https://aws.amazon.com/blogs/security/cispe-data-protection-code-of-conduct-public-register-now-certifies-122-aws-services-as-adherent/

We continue to expand the scope of our assurance programs at Amazon Web Services (AWS) and are pleased to announce that 122 services are now certified as adherent to the Cloud Infrastructure Services Providers in Europe (CISPE) Data Protection Code of Conduct. This alignment with the CISPE requirements demonstrates our ongoing commitment to adhere to the heightened expectations for data protection by cloud service providers. AWS customers who use AWS certified services can be confident that their data is processed in adherence with the European Union’s General Data Protection Regulation (GDPR).

The CISPE Code of Conduct is the first pan-European, sector-specific code for cloud infrastructure service providers and received a favorable opinion that it complies with the GDPR. It helps organizations across Europe accelerate the development of GDPR-aligned, cloud-based services for consumers, businesses, and institutions.

The accredited monitoring body EY CertifyPoint evaluated AWS as of May 19, 2025, and successfully audited 112 certified services. AWS added ten additional services to the current scope in May 2025. As of the date of this post, 122 services are in scope of this certification. The Certificate of Compliance that illustrates AWS compliance status is available on the CISPE Public Register. For up-to-date information, including when additional services are added, search the CISPE Public Register by entering AWS as the Seller of Record; or see the AWS CISPE Data Protection Code of Conduct page.

AWS strives to bring additional services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about AWS compliance with CISPE Code, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs, AWS General Data Protection Regulation (GDPR) Center, and the EU data protection section of the AWS Cloud Security website. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Introducing AWS Lambda native support for Avro and Protobuf formatted Apache Kafka events

2025-06-23 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-aws-lambda-native-support-for-avro-and-protobuf-formatted-apache-kafka-events/

AWS Lambda now provides native support for Apache Avro and Protocol Buffers (Protobuf) formatted events with Apache Kafka event source mapping (ESM) when using Provisioned Mode. The support allows you to validate your schema with popular schema registries. This allows you to use and filter the more efficient binary event formats and share data using schema in a centralized and consistent way. This blog post shows how you can use Lambda to process Avro and Protobuf formatted events from Kafka topics using schema registry integration.

This new capability works with Amazon Managed Streaming for Apache Kafka (Amazon MSK), Confluent Cloud, and self-managed Kafka clusters. To get started, update your existing Kafka ESM to Provisioned Mode and add schema registry configuration, or create a new ESM in Provisioned Mode with schema registry integration enabled.

Avro and Protobuf

Many organizations use Avro and Protobuf formats with Apache Kafka because these binary serialization formats offer advantages over JSON. They provide 50-80% smaller message sizes, faster serialization and deserialization performance, robust schema evolution capabilities, and strong typing across multiple programming languages.

Working with these formats in Lambda functions previously necessitated custom code. Developers needed to implement schema registry clients, handle authentication and caching, write format-specific deserialization logic, and manage schema evolution scenarios.

What’s new

Lambda’s Kafka Event Source Mapping (ESM) now provides built-in integration with AWS Glue Schema Registry, Confluent Cloud Schema Registry, and self-managed Confluent Schema Registry. When you configure schema registry settings for your Kafka ESM, the service automatically validates incoming JSON Schema, Avro, and Protobuf records against their registered schema. This moves complex schema registry integration logic from your application layer to the managed Lambda service.

You can build your function with Kafka’s open-source ConsumerRecords interface using Powertools for AWS Lambda to get your Avro or Protobuf generated business objects directly. Optionally you can specify to get your records in the JSON format, where your function receives clean, validated JSON data regardless of the original serialization format, removing the need for custom deserialization code in your Lambda functions. This also allows you to create Kafka consumers across multiple programming languages.

Powertools for AWS Lambda is a developer toolkit that provides specific support for Java, .NET, Python, and TypeScript, maintaining consistency with existing Kafka development patterns. You can directly access business objects without custom deserialization code.

You can also set up filtering rules to discard irrelevant JSON, Avro, or Protobuf formatted events before function invocation, which can improve processing performance and reduce costs.

How schema validation works

When you configure schema registry integration for your Kafka ESM, you specify the registry endpoint, authentication details, and which event fields (key, value, or both) to validate. The ESM polls your Kafka topics for records as usual but now performs additional processing before invoking your Lambda function.

For each incoming event, the ESM extracts the schema ID embedded in the serialized data and fetches the corresponding schema from your configured registry. This process happens transparently, with schema definitions cached for up to 24 hours to optimize performance. The ESM identifies the format of your events using schema metadata and validates the event structure. Depending on your configuration, it then either keeps the original binary data or deserializes it to JSON before sending it to your function for processing.
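
To make the schema ID step concrete, the following minimal sketch parses the publicly documented Confluent wire format, in which a serialized message starts with a zero magic byte followed by a 4-byte big-endian schema ID. It illustrates the kind of work the ESM takes over and is not Lambda’s actual implementation. The class and method names are hypothetical, AWS Glue Schema Registry uses a different header layout that this sketch does not handle, and Confluent Protobuf messages carry additional message indexes after the schema ID that are also out of scope here.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: parses the Confluent wire format header for Avro messages.
// Names are hypothetical; this is not the code Lambda runs.
final class ConfluentWireFormat {
    private static final byte MAGIC_BYTE = 0x0;
    private static final int HEADER_LENGTH = 5; // 1 magic byte + 4-byte schema ID

    private ConfluentWireFormat() {
    }

    /** Returns the schema ID stored big-endian in bytes 1-4 of the message. */
    static int schemaId(byte[] message) {
        if (message == null || message.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("Message is shorter than the 5-byte header");
        }
        ByteBuffer buffer = ByteBuffer.wrap(message); // ByteBuffer is big-endian by default
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unexpected magic byte");
        }
        return buffer.getInt();
    }

    /** Returns the serialized payload that follows the header. */
    static byte[] payload(byte[] message) {
        schemaId(message); // reuse the header validation
        return Arrays.copyOfRange(message, HEADER_LENGTH, message.length);
    }
}
```

A registry client would use the returned ID to look up the schema, then hand the remaining payload bytes to a deserializer.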

Figure 1: Kafka processing flow diagram.

The ESM handles schema evolution automatically. When producers begin using new schema versions, the service detects the updated schema IDs and fetches the latest definitions from your registry. This makes sure that your functions always receive properly deserialized data without requiring code changes.

Event record format

As a part of the ESM schema registry configuration, you need to specify Event Record Format, which Lambda uses to deliver validated records to your function. The schema registry configuration supports SOURCE and JSON.

SOURCE preserves the original binary format of the data as a base64-encoded string with the producer-appended schema ID removed. This allows direct conversion to Avro or Protobuf objects, so you can use Kafka’s ConsumerRecords interface for a Kafka-like experience. Use this format when working with strongly typed languages or when you need to maintain the full capabilities of Avro or Protobuf schemas. You can then use any Avro or Protobuf deserializer to convert the raw bytes to your business object. Powertools provides native support for this deserialization.

With JSON, the ESM deserializes the data ready for direct use in languages with native JSON support. Use this when you don’t need to preserve the original binary format or work with generated classes. You can also use Powertools to convert the base64 to your business object. See the documentation for payload formats and deserialization behavior.
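
As a small illustration of the two formats, the sketch below decodes a base64-encoded record value the way a function might if it handled the payload by hand instead of using Powertools. The class and method names are hypothetical. With SOURCE you would pass the decoded bytes to an Avro or Protobuf deserializer; with JSON the decoded bytes are assumed here to be UTF-8 JSON text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative helper with hypothetical names; Powertools can do this for you.
final class RecordValueDecoder {
    private RecordValueDecoder() {
    }

    /** SOURCE format: returns the raw serialized bytes for an Avro or Protobuf deserializer. */
    static byte[] decodeToBytes(String base64Value) {
        return Base64.getDecoder().decode(base64Value);
    }

    /** JSON format: returns the decoded value as JSON text (assumes UTF-8 encoding). */
    static String decodeToJson(String base64Value) {
        return new String(decodeToBytes(base64Value), StandardCharsets.UTF_8);
    }
}
```

Keeping the byte-level and text-level decoders separate mirrors the choice between the two record formats: the first leaves the binary payload untouched, the second assumes the ESM has already produced JSON.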

If you configure filtering rules, then they operate on the JSON-formatted events after deserialization. This upstream filtering prevents unnecessary Lambda invocations for events that don’t match your processing criteria, directly reducing your compute costs.
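
The effect of an exact-match filter can be modeled in a few lines. This sketch is not Lambda’s filtering engine and does not use its filter syntax. It only shows, under the assumption that each deserialized value is a flat map of string fields, how records whose field does not match an allowed value are dropped before any function code runs. All names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified model of an equality filter applied to already-deserialized records.
final class ExactMatchFilterSketch {
    private ExactMatchFilterSketch() {
    }

    /** Keeps only records whose value for the given field is one of the allowed values. */
    static List<Map<String, String>> filter(List<Map<String, String>> records,
                                            String field,
                                            Set<String> allowedValues) {
        return records.stream()
                .filter(record -> {
                    String value = record.get(field);
                    // Records that lack the field never match.
                    return value != null && allowedValues.contains(value);
                })
                .collect(Collectors.toList());
    }
}
```

Only the records returned by such a filter would reach the function, which is where the reduction in invocations comes from.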

Configuration and setup

To use this feature, you must enable Provisioned Mode for your Kafka ESM, which provides the dedicated compute resources needed for schema registry integration.

You can configure the integration through the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS Language SDKs, or infrastructure as code (IaC) tools such as the AWS Serverless Application Model (AWS SAM) or AWS Cloud Development Kit (AWS CDK).

Your schema registry configuration includes the registry endpoint URL, authentication method (AWS Identity and Access Management (IAM) for AWS Glue Schema Registry, or Basic Auth, SASL/SCRAM, or mTLS for Confluent registries), and validation settings. You specify which event attributes to validate and optionally define filtering rules using standard Lambda event filtering syntax.

For error handling, configure Lambda failure destinations where events that fail schema validation or deserialization are sent. This makes sure that problematic events don’t disappear silently but are routed to other services such as Amazon Simple Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), and Amazon S3 for debugging and analysis.

Seeing the new features in action

There are a number of Serverless Patterns that you can use to process Kafka streams using Lambda. This example uses the Java pattern.

Deploy a sample Amazon MSK cluster

To set up an Amazon MSK cluster, follow the instructions in the GitHub repo and create a new AWS CloudFormation stack using the MSKAndKafkaClientEC2.yaml template file. The stack creates the Amazon MSK cluster, along with a client Amazon EC2 instance, to manage the Kafka cluster. There are costs involved when running this infrastructure.

Connect to the EC2 instance using EC2 Instance Connect.
Check that the Kafka topic is created by checking the contents of the kafka_topic_creator_output.txt file.
```
cat kafka_topic_creator_output.txt
```
The file should contain the text: “Created topic MskIamJavaLambdaTopic.”

Deploy the Glue schema registry and consumer Lambda function

The EC2 instance contains the software needed to deploy the schema registry and Lambda function.

Change directory to the pattern directory.
```
cd serverless-patterns/msk-lambda-iam-java-sam
```
Build the application using AWS SAM.
```
sam build
```

To deploy your application for the first time, run the following in the EC2 instance shell:

```
sam deploy --capabilities CAPABILITY_IAM --no-confirm-changeset \
	--no-disable-rollback --region $AWS_REGION --stack-name msk-lambda-schema-avro-java-sam --guided
```

You can accept all the defaults by pressing Enter. After the deployment completes, you can browse to the AWS Glue Schema Registry console and view the ContactSchema definition:

```
{
  "type": "record",
  "name": "Contact",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "lastname", "type": "string"},
    {"name": "company", "type": "string"},
    {"name": "street", "type": "string"},
    {"name": "city", "type": "string"},
    {"name": "county", "type": "string"},
    {"name": "state", "type": "string"},
    {"name": "zip", "type": "string"},
    {"name": "homePhone", "type": "string"},
    {"name": "cellPhone", "type": "string"},
    {"name": "email", "type": "string"},
    {"name": "website", "type": "string"}
  ]
}
```

The consumer Lambda function ESM is configured for Provisioned Mode.

View the ESM configuration from the Lambda console for the Lambda function name prefixed with msk-lambda-schema-avro-ja-LambdaMSKConsumer.
Choose the MSK Lambda trigger which opens the Triggers pane under Configuration.
Figure 2: View Lambda ESM schema configuration
The configuration specifies using the Event record format SOURCE so your function can use Kafka’s native open-source ConsumerRecords interface. Powertools then deserializes the payload.
The schema validation attribute is VALUE.
The ESM filter configuration only processes the records that match zip codes of 2000.
In your function code, use the open-source Kafka ConsumerRecords interface by including Powertools for AWS Lambda as a dependency. ConsumerRecords provides metadata about Kafka records and gives you direct access to your Avro or Protobuf generated business objects without requiring any additional deserialization code.

```
package com.amazonaws.services.lambda.samples.events.msk;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import software.amazon.lambda.powertools.kafka.Deserialization;
import software.amazon.lambda.powertools.kafka.DeserializationType;
import software.amazon.lambda.powertools.logging.Logging;

public class AvroKafkaHandler implements RequestHandler<ConsumerRecords<String, Contact>, String> {
    private static final Logger LOGGER = LoggerFactory.getLogger(AvroKafkaHandler.class);

    @Override
    @Logging(logEvent = true)
    @Deserialization(type = DeserializationType.KAFKA_AVRO)
    public String handleRequest(ConsumerRecords<String, Contact> records, Context context) {
        LOGGER.info("=== AvroKafkaHandler called ===");
        LOGGER.info("Event object: {}", records);
        LOGGER.info("Number of records: {}", records.count());

        for (ConsumerRecord<String, Contact> record : records) {
            LOGGER.info("Processing record - Topic: {}, Partition: {}, Offset: {}",
                       record.topic(), record.partition(), record.offset());
            LOGGER.info("Record key: {}", record.key());
            LOGGER.info("Record value: {}", record.value());

            if (record.value() != null) {
                Contact contact = record.value();
                LOGGER.info("Contact details - firstName: {}, zip: {}",
                           contact.getFirstname(), contact.getZip());
            }
        }

        LOGGER.info("=== AvroKafkaHandler completed ===");
        return "OK";
    }
}
```

Produce and consume records

The sample includes a LambdaMSKProducerJava function that sends messages to Kafka.

Invoke the function from the Lambda console or CLI within the EC2 instance.

```
sam remote invoke LambdaMSKProducerJavaFunction --region $AWS_REGION \
	--stack-name msk-lambda-schema-avro-java-sam
```

You can view the producer logs to see the 10 records produced. The consumer Lambda function then processes the records.

View the consumer Lambda function logs using the Amazon CloudWatch logs console or CLI within the EC2 instance.

```
sam logs --name LambdaMSKConsumerJavaFunction \
	--stack-name msk-lambda-schema-avro-java-sam --region $AWS_REGION
```

The Lambda function processes and logs only the records that match the ESM filter. The Avro binary data is deserialized using Powertools for AWS Lambda. You should see the function logs showing each record processed with the decoded keys and values.

Figure 3: Lambda consumer logs showing Avro processing

Cleaning up

You can clean up the example Lambda function by running the sam delete command.

```
sam delete
```

If you created the Amazon MSK cluster and EC2 client instance, then navigate to the CloudFormation console, choose the stack, and choose Delete.

Performance and cost considerations

Schema validation and deserialization can add processing time before your function invocation. However, this overhead is typically minimal when compared to the benefits. ESM caching minimizes schema registry API calls. Filtering can reduce costs, depending on how effectively your filtering rules eliminate irrelevant events. This feature also reduces the operational overhead of managing schema registry integration code, so teams can focus on business logic rather than infrastructure concerns.

Error handling and monitoring

If schema registries become temporarily unavailable, then cached schemas allow event processing to continue until the registry is available again. Authentication failures generate error messages with automatic retry logic. Schema evolution happens seamlessly as Lambda automatically detects and fetches new versions.
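
The caching behavior described above can be pictured with a small sketch. It is a hypothetical model, not Lambda’s implementation: it assumes a time-to-live such as the 24 hours mentioned earlier, a `registryLookup` function standing in for a real registry client, and a policy of serving an expired entry when the registry call fails. The injectable clock exists only to make the behavior easy to test.

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;
import java.util.function.LongSupplier;

// Hypothetical schema cache: fresh entries are served from memory, and stale
// entries are served only when the registry cannot be reached.
final class SchemaCacheSketch {
    private static final class Entry {
        final String schema;
        final long fetchedAtMillis;

        Entry(String schema, long fetchedAtMillis) {
            this.schema = schema;
            this.fetchedAtMillis = fetchedAtMillis;
        }
    }

    private final Map<Integer, Entry> entries = new ConcurrentHashMap<>();
    private final IntFunction<String> registryLookup; // stand-in for a registry client call
    private final long ttlMillis;
    private final LongSupplier clockMillis;

    SchemaCacheSketch(IntFunction<String> registryLookup, Duration ttl, LongSupplier clockMillis) {
        this.registryLookup = registryLookup;
        this.ttlMillis = ttl.toMillis();
        this.clockMillis = clockMillis;
    }

    String get(int schemaId) {
        long now = clockMillis.getAsLong();
        Entry cached = entries.get(schemaId);
        if (cached != null && now - cached.fetchedAtMillis < ttlMillis) {
            return cached.schema; // fresh hit, no registry call
        }
        try {
            String schema = registryLookup.apply(schemaId);
            entries.put(schemaId, new Entry(schema, now));
            return schema;
        } catch (RuntimeException registryFailure) {
            if (cached != null) {
                return cached.schema; // registry unavailable: fall back to the stale entry
            }
            throw registryFailure; // nothing cached, so the failure surfaces
        }
    }
}
```

In production code you would pass `System::currentTimeMillis` as the clock and bound the cache size; both are left out to keep the sketch short.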

If events fail validation or deserialization, they are routed to your configured failure destinations. For Amazon SQS and Amazon SNS destinations, the service sends metadata about the failure. For Amazon S3 destinations, both metadata and the original serialized payload are included for detailed analysis.

You can use standard Lambda monitoring, with additional CloudWatch metrics providing visibility into schema validation success rates, registry API usage, and filtering effectiveness.

Conclusion

AWS Lambda now supports Avro and Protobuf formats for Kafka event processing in Provisioned Mode for Kafka ESM. This enables schema validation, event filtering, and integration with Amazon MSK, Confluent Cloud, and self-managed Kafka clusters. Whether you’re building new Kafka applications or migrating existing consumers to Lambda, this native schema registry integration streamlines processing pipelines.

For more information about the Lambda Kafka integration capabilities, see the learning guide and the Lambda ESM documentation. To learn about Lambda pricing, including Provisioned Mode costs, visit the Lambda pricing page.

For more serverless learning resources, visit Serverless Land.

AWS Weekly Roundup: re:Inforce re:Cap, Valkey GLIDE 2.0, Avro and Protobuf or MCP Servers on Lambda, and more (June 23, 2025)

2025-06-23 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-reinforce-recap-valkey-glide-2-0-avro-and-protobuf-or-mcp-servers-on-lambda-and-more-june-23-2025/

Last week’s hallmark event was the security-focused AWS re:Inforce conference.

Now a tradition, the blog team wrote a re:Cap post to summarize the announcements and link to some of the top blog posts.

To further summarize, several new security innovations were announced, including enhanced IAM Access Analyzer capabilities, MFA enforcement for root users, and threat intelligence integration with AWS Network Firewall. Other notable updates include exportable public SSL/TLS certificates from AWS Certificate Manager, a simplified AWS WAF console experience, and a new AWS Shield feature for proactive network security (in preview). Additionally, AWS Security Hub has been enhanced for risk prioritization (Preview), and Amazon GuardDuty now supports Amazon EKS clusters.

But my favorite announcement came from the Amazon Verified Permissions team. They released an open source package for Express.js, enabling developers to implement external fine-grained authorization for web application APIs. This simplifies authorization integration, reducing code complexity and improving application security.

The team also published a blog post that outlines how to create a Verified Permissions policy store, add Cedar and Verified Permissions authorization middleware to your app, create and deploy a Cedar schema, and create and deploy Cedar policies. The Cedar schema is generated from an OpenAPI specification and formatted for use with the AWS Command Line Interface (AWS CLI).

Let’s look at last week’s other new announcements.

Last week’s launches
Apart from re:Inforce, here are the launches that got my attention.

AWS Lambda announces native support for Avro and Protobuf formatted Kafka events – AWS Lambda now provides native support for Avro and Protobuf formatted Kafka events with Apache Kafka event source mapping (ESM). This integration allows you to validate your schema, filter events, and process them using open source Kafka consumer interfaces. You can also use Powertools for AWS Lambda to process your Kafka events without writing custom deserialization code, making it easier to build your Kafka applications with AWS Lambda.

Kafka customers use Avro and Protobuf formats for efficient data storage, fast serialization and deserialization, schema evolution support, and interoperability between different programming languages. They use schema registries to manage, evolve, and validate schemas before data enters processing pipelines. Previously, you were required to write custom code within your Lambda function to validate, deserialize, and filter events when using these data formats. With this launch, Lambda natively supports Avro and Protobuf, as well as integration with AWS Glue Schema Registry (GSR), Confluent Cloud Schema Registry (CCSR), and self-managed Confluent Schema Registry (SCSR). This enables you to process your Kafka events using these data formats without writing custom code. Additionally, you can optimize costs through event filtering to prevent unnecessary function invocations.

Amazon S3 Express One Zone now supports atomic renaming of objects with a single API call – The RenameObject API simplifies data management in S3 directory buckets by transforming a multi-step rename operation into a single API call. This means you can now rename objects in S3 Express One Zone by specifying an existing object’s name as the source and the new name as the destination within the same S3 directory bucket. With no data movement involved, this capability accelerates applications like log file management, media processing, and data analytics, while also lowering costs. For instance, renaming a 1-terabyte log file can now complete in milliseconds instead of hours.

Valkey introduces GLIDE 2.0 with support for Go, OpenTelemetry, and pipeline batching – AWS, in partnership with Google and the Valkey community, announces the general availability of General Language Independent Driver for the Enterprise (GLIDE) 2.0. This is the latest release of one of AWS’s official open-source Valkey client libraries. Valkey, the most permissive open-source alternative to Redis, is stewarded by the Linux Foundation and will always remain open-source. Valkey GLIDE is a reliable, high-performance, multi-language client that supports all Valkey commands.

GLIDE 2.0 introduces new capabilities that expand developer support, improve observability, and optimise performance for high-throughput workloads. Valkey GLIDE 2.0 extends its multi-language support to Go (contributed by Google), joining Java, Python, and Node.js to provide a consistent, fully compatible API experience across all four languages. More language support is on the way. With this release, Valkey GLIDE now supports OpenTelemetry, an open-source, vendor-neutral framework that enables developers to generate, collect, and export telemetry data and critical client-side performance insights. Additionally, GLIDE 2.0 introduces batching capabilities, reducing network overhead and latency for high-frequency use cases by allowing multiple commands to be grouped and executed as a single operation.

You can discover more about Valkey GLIDE in this recent episode of the AWS Developers Podcast: Inside Valkey GLIDE: building a next-gen Valkey client library with Rust.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Some other reading
My Belgian compatriot Alexis has written the first article of a two-part series explaining how to develop an MCP Tool server with a streamable HTTP transport and deploy it on Lambda and API Gateway. This is a must-read for anyone implementing MCP servers on AWS. I’m eagerly looking forward to the second part, where Alexis will discuss authentication and authorization for remote MCP servers.

Other AWS events
Check your calendar and sign up for upcoming AWS events.

AWS GenAI Lofts are collaborative spaces and immersive experiences that showcase AWS expertise in cloud computing and AI. They provide startups and developers with hands-on access to AI products and services, exclusive sessions with industry leaders, and valuable networking opportunities with investors and peers. Find a GenAI Loft location near you and don’t forget to register.

AWS Summits are free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. Register in your nearest city: Japan (this week, June 25–26), Online in India (June 26), New York City (July 16).

Save the date for these upcoming Summits in July and August: Taipei (July 29), Jakarta (August 7), Mexico (August 8), São Paulo (August 13), and Johannesburg (August 20) (and more to come in September and October).

Browse all upcoming AWS-led in-person and virtual events here.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— seb

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Amazon Linux 2023 achieves FIPS 140-3 validation

2025-06-18 Mahak Arora

Post Syndicated from Mahak Arora original https://aws.amazon.com/blogs/compute/amazon-linux-2023-achieves-fips-140-3-validation/

AWS announced that Amazon Linux 2023 (AL2023) has achieved Federal Information Processing Standards (FIPS) 140-3 Level 1 validation of its cryptographic modules, marking a significant milestone in our commitment to providing secure, compliant operating system options for regulated workloads. FIPS-certified modules are particularly important for US and Canadian government workloads, healthcare applications requiring HIPAA compliance, financial services, defense contractors, and other regulated industries. FIPS 140-3, which supersedes FIPS 140-2, represents the latest government security standard for cryptographic modules, jointly validated by the National Institute of Standards and Technology (NIST) and the Canadian Centre for Cyber Security (CCCS) through the Cryptographic Module Validation Program (CMVP). The validation follows the rigorous requirements outlined in the FIPS 140-3 standard and encompasses critical cryptographic modules including OpenSSL, the Linux Kernel Cryptographic API, NSS, GnuTLS, and Libgcrypt.

These modules have been extensively tested to confirm robust security capabilities such as approved cryptographic algorithms, secure key management, strong entropy generation, and protected memory boundaries. The validation process was conducted by a NIST-accredited lab and further reviewed by the CMVP. The certificate details can be verified on the CMVP Active Validation List.

To enable FIPS mode on AL2023, refer to our FIPS Mode enablement guide for AL2023. Amazon Linux maintains its compliance information for FIPS 140-3 through the AWS Compliance Programs portal, along with official NIST guidelines and compliance FAQs, to help you meet global regulatory requirements. For regular updates and best practices, follow the AWS Security Blog and the FIPS-related FAQs for Amazon Linux 2 and Amazon Linux 2023, which provide detailed configuration steps and operational guidance for regulated environments. You can also reach out to your AWS account team for help finding the resources you need.

If you have questions about this post, contact AWS Support.

AWS re:Inforce roundup 2025: top announcements

2025-06-17 AWS News Blog Team

Post Syndicated from AWS News Blog Team original https://aws.amazon.com/blogs/aws/aws-reinforce-roundup-2025-top-announcements/

At AWS re:Inforce 2025 (June 16-18, Philadelphia), AWS Vice President and Chief Information Security Officer Amy Herzog delivered the keynote address, announcing new security innovations. Throughout the event, AWS announced additional security capabilities focused on simplifying security at scale and enabling organizations to build more resilient applications in the cloud. Below is a comprehensive roundup of the major security launches and updates announced at this year’s conference.

Verify internal access to critical AWS resources with new IAM Access Analyzer capabilities
A new capability in AWS Identity and Access Management Access Analyzer helps security teams verify which principals within their AWS organization have access to critical resources like S3 buckets, DynamoDB tables, and RDS snapshots by using automated reasoning to evaluate multiple policies and provide findings through a unified dashboard.

AWS IAM now enforces MFA for root users across all account types
The new Multi-Factor Authentication enforcement prevents over 99% of password-related attacks. You can use a range of supported IAM MFA methods, including FIDO-certified security keys to harden access to your AWS accounts. AWS supports FIDO2 passkeys for a user-friendly MFA implementation and allows you to register up to 8 MFA devices per root and IAM user.

Improve your security posture using Amazon threat intelligence on AWS Network Firewall
This new Network Firewall managed rule group offers protection against active threats relevant to workloads in AWS. The feature uses the Amazon threat intelligence system MadPot to continuously track attack infrastructure, including malware hosting URLs, botnet command and control servers, and crypto mining pools, identifying indicators of compromise (IOCs) for active threats.

AWS Certificate Manager introduces exportable public SSL/TLS certificates to use anywhere
You can now use AWS Certificate Manager to issue exportable public certificates for your AWS, hybrid, or multicloud workloads that require secure TLS traffic termination.

AWS WAF simplified console experience
The new AWS WAF console experience reduces security configuration steps by up to 80% through pre-configured protection packs. Security teams can quickly implement comprehensive protection for specific application types, with consolidated security metrics and customizable controls through an intuitive interface.

Amazon CloudFront simplifies web application delivery and security with new user-friendly interface
Try the simplified console experience with Amazon CloudFront to accelerate and secure web applications within a few clicks by automating TLS certificate provisioning, DNS configuration, and security settings through an integrated interface with AWS WAF’s enhanced Rule Packs.

New AWS Shield feature discovers network security issues before they can be exploited (Preview)
Shield network security posture management automatically discovers and analyzes network resources across AWS accounts, prioritizes security risks based on AWS best practices, and provides actionable remediation recommendations to protect applications against threats like SQL injections and DDoS attacks.

Unify your security with the new AWS Security Hub for risk prioritization and response at scale (Preview)
AWS Security Hub has been enhanced to transform security signals into actionable insights, helping security teams prioritize and respond to critical issues at scale. This unified solution provides comprehensive visibility across your cloud environment while reducing the complexity of managing multiple security tools.

Amazon GuardDuty expands Extended Threat Detection coverage to Amazon EKS clusters
Amazon GuardDuty Extended Threat Detection now supports Amazon EKS clusters, helping you detect sophisticated multistage attacks by correlating security signals across Kubernetes audit logs, runtime behaviors, and AWS API activities. This enhancement automatically identifies critical attack sequences that might otherwise go unnoticed, enabling faster response to threats.

New categories for the AWS MSSP Competency
The AWS MSSP Competency (previously AWS Level 1 MSSP Competency) now includes new categories covering infrastructure security, workload security, application security, data protection, identity and access management, incident response, and cyber recovery. Partners provide 24/7 monitoring and incident response through dedicated Security Operations Centers.

Secure your Express application APIs in minutes with Amazon Verified Permissions
Amazon Verified Permissions announced the release of the verified-permissions-express-toolkit, an open-source package that allows developers to implement authorization for Express web application APIs in minutes using Amazon Verified Permissions.

Beyond compute: Shifting vulnerability detection left with Amazon Inspector code security
Amazon Inspector code security capabilities are now generally available, helping you secure applications before production by rapidly identifying and prioritizing security vulnerabilities and misconfigurations across application source code, dependencies, and infrastructure as code (IaC).

AWS Backup adds new Multi-party approval for logically air-gapped vaults
Multi-party approval for AWS Backup logically air-gapped vaults enables you to recover your backup data even when your AWS account is compromised, by leveraging authorization from a designated approval team of trusted individuals who can enable vault sharing with a recovery account.

Amazon GuardDuty expands Extended Threat Detection coverage to Amazon EKS clusters

2025-06-17 Esra Kayabali

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/amazon-guardduty-expands-extended-threat-detection-coverage-to-amazon-eks-clusters/

Today, I’m happy to announce Amazon GuardDuty Extended Threat Detection with expanded coverage for Amazon Elastic Kubernetes Service (Amazon EKS), building upon the capabilities we introduced in our AWS re:Invent 2024 announcement of Amazon GuardDuty Extended Threat Detection: AI/ML attack sequence identification for enhanced cloud security.

Security teams managing Kubernetes workloads often struggle to detect sophisticated multistage attacks that target containerized applications. These attacks can involve container exploitation, privilege escalation, and unauthorized movement within Amazon EKS clusters. Traditional monitoring approaches might detect individual suspicious events, but often miss the broader attack pattern that spans across these different data sources and time periods.

GuardDuty Extended Threat Detection introduces a new critical severity finding type, which automatically correlates security signals across Amazon EKS audit logs, runtime behaviors of processes associated with EKS clusters, malware execution in EKS clusters, and AWS API activity to identify sophisticated attack patterns that might otherwise go unnoticed. For example, GuardDuty can now detect attack sequences in which a threat actor exploits a container application, obtains privileged service account tokens, and then uses these elevated privileges to access sensitive Kubernetes secrets or AWS resources.

This new capability uses GuardDuty correlation algorithms to observe and identify sequences of actions that indicate potential compromise. It evaluates findings across protection plans and other signal sources to identify common and emerging attack patterns. For each attack sequence detected, GuardDuty provides comprehensive details, including potentially impacted resources, timeline of events, actors involved, and indicators used to detect the sequence. The findings also map observed activities to MITRE ATT&CK® tactics and techniques and include remediation recommendations based on AWS best practices, helping security teams understand the nature of the threat.

To enable Extended Threat Detection for EKS, you need at least one of these features enabled: EKS Protection or Runtime Monitoring. For maximum detection coverage, we recommend enabling both. EKS Protection monitors control plane activities through audit logs, and Runtime Monitoring observes behaviors within containers. Together, they create a complete view of your EKS clusters, enabling GuardDuty to detect complex attack patterns.

How it works
To use the new Amazon GuardDuty Extended Threat Detection for EKS clusters, go to the GuardDuty console to enable EKS Protection in your account. From the Region selector in the upper-right corner, select the Region where you want to enable EKS Protection. In the navigation pane, choose EKS Protection. On the EKS Protection page, review the current status and choose Enable. Select Confirm to save your selection.

After it’s enabled, GuardDuty immediately starts monitoring EKS audit logs from your EKS clusters without requiring any additional configuration. GuardDuty consumes these audit logs directly from the EKS control plane through an independent stream, which doesn’t affect any existing logging configurations. For multi-account environments, only the delegated GuardDuty administrator account can enable or disable EKS Protection for member accounts and configure auto-enable settings for new accounts joining the organization.

To enable Runtime Monitoring, choose Runtime Monitoring in the navigation pane. Under the Configuration tab, choose Enable to enable Runtime Monitoring for your account.

Now, you can view from the Summary dashboard the attack sequences and critical findings specifically related to Kubernetes cluster compromise. You can observe that GuardDuty identifies complex attack patterns in Kubernetes environments, such as credential compromise events and suspicious activities within EKS clusters. The visual representation of findings by severity, resource impact, and attack types gives you a holistic view of your Amazon EKS security posture. This means you can prioritize the most critical threats to your containerized workloads.

The Finding details page provides visibility into complex attack sequences targeting EKS clusters, helping you understand the full scope of potential compromises. GuardDuty correlates signals into a timeline, mapping observed behaviors to MITRE ATT&CK® tactics and techniques such as account manipulation, resource hijacking, and privilege escalation. This granular level of insight reveals exactly how attackers progress through your Amazon EKS environment. It identifies affected resources like EKS workloads and service accounts. The detailed breakdown of indicators, actors, and endpoints provides you with actionable context to understand attack patterns, determine impact, and prioritize remediation efforts. By consolidating these security insights into a cohesive view, you can quickly assess the severity of Amazon EKS security incidents, reduce investigation time, and implement targeted countermeasures to protect your containerized applications.

The Resources section of the Finding details page shows context about the specific assets affected during an attack sequence. This unified resource list provides you with visibility into the exact scope of the compromise, from the initial access to the targeted Kubernetes components. Because GuardDuty includes detailed attributes such as resource types, identifiers, creation dates, and namespace information, you can rapidly assess which components of your containerized infrastructure require immediate attention. This focused approach eliminates guesswork during incident response, so you can prioritize remediation efforts on the most critical affected resources and minimize the potential blast radius of attacks targeting Amazon EKS.

Now available
Amazon GuardDuty Extended Threat Detection with expanded coverage for Amazon EKS clusters provides comprehensive security monitoring across your Kubernetes environment. You can use this capability to detect sophisticated multistage attacks by correlating events across different data sources, identifying attack sequences that traditional monitoring might miss.

To start using this expanded coverage, enable EKS Protection in your GuardDuty settings and consider adding Runtime Monitoring for enhanced detection capabilities.

For more information about this new capability, refer to the Amazon GuardDuty Documentation.

— Esra

Unify your security with the new AWS Security Hub for risk prioritization and response at scale (Preview)

2025-06-17 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/unify-your-security-with-the-new-aws-security-hub-for-risk-prioritization-and-response-at-scale-preview/

AWS Security Hub has been a central place for you to view and aggregate security alerts and compliance status across Amazon Web Services (AWS) accounts. Today, we are announcing the preview release of the new AWS Security Hub which offers additional correlation, contextualization, and visualization capabilities. This helps you prioritize critical security issues, respond at scale to reduce risks, improve team productivity, and better protect your cloud environment.

Here’s a quick look at the new AWS Security Hub.

With this new enhancement, AWS Security Hub integrates security capabilities like Amazon GuardDuty, Amazon Inspector, AWS Security Hub Cloud Security Posture Management (CSPM), Amazon Macie, and other AWS security capabilities to help you gain visibility across your cloud environment through centralized management in a unified cloud security solution.

Getting started with the new AWS Security Hub
Let me walk you through how to get started with AWS Security Hub.

If you’re a new customer to AWS Security Hub, you need to navigate to the AWS Security Hub console to enable AWS security capabilities and start assessing risk across your organization. You can learn more on the Documentation page.

After you have AWS Security Hub enabled, it will automatically consume data from supporting security capabilities you’ve enabled, such as Amazon GuardDuty, Amazon Inspector, Amazon Macie, and AWS Security Hub CSPM. You can navigate to the AWS Security Hub console to view these findings and benefit from insights created through correlation of findings across these capabilities.

As security risks are uncovered, they’re presented in a redesigned Security Hub summary dashboard. The new Security Hub summary dashboard provides a comprehensive, unified view of your AWS security posture. The dashboard organizes security findings into distinct categories, making it easier to identify and prioritize risks.

The new Exposure summary widget helps you identify and prioritize security exposures by analyzing resource relationships and signals from Amazon Inspector, AWS Security Hub CSPM, and Amazon Macie. These exposure findings are automatically generated and are a key part of the new solution, highlighting where your critical security exposures are located. You can learn more about exposure on the Documentation page.

AWS Security Hub now provides a Security coverage widget designed to help you identify potential coverage gaps. You can use this widget to identify where you’re missing coverage by the security capabilities that power Security Hub. This visibility helps you identify which capabilities, accounts, and features you need to address to improve your security coverage.

As you can see on the navigation menu, AWS Security Hub is organized into five key areas to streamline security management:

Exposure: Provides visibility into all exposure findings generated by Security Hub (security vulnerabilities or misconfigurations that could expose an AWS resource or system to unauthorized access or compromise), helping you identify resources that might be accessible from outside your environment
Threats: Consolidates all threat findings generated by Amazon GuardDuty, showing potential malicious activities and intrusion attempts
Vulnerabilities: Displays all vulnerabilities detected by Amazon Inspector, highlighting software flaws and configuration issues
Posture management: Shows all posture management findings from AWS Security Hub Cloud Security Posture Management (CSPM), helping you maintain compliance with security best practices
Sensitive data: Presents all sensitive data findings identified by Amazon Macie, helping you track and protect your sensitive information
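To make the relationship between the five areas and their sources concrete, here is a small illustrative sketch. The mapping comes straight from the list above. The finding records and their `source` and `title` keys are hypothetical and do not reflect the Security Hub data format.

```python
# Navigation area for each finding source, as described in the list above.
AREA_BY_SOURCE = {
    "AWS Security Hub": "Exposure",
    "Amazon GuardDuty": "Threats",
    "Amazon Inspector": "Vulnerabilities",
    "AWS Security Hub CSPM": "Posture management",
    "Amazon Macie": "Sensitive data",
}


def group_by_area(findings):
    """Group finding titles under the navigation area of their source.

    `findings` is a list of hypothetical dicts with "source" and "title".
    Findings from an unrecognized source are collected under "Unmapped".
    """
    grouped = {area: [] for area in AREA_BY_SOURCE.values()}
    grouped["Unmapped"] = []
    for finding in findings:
        area = AREA_BY_SOURCE.get(finding["source"], "Unmapped")
        grouped[area].append(finding["title"])
    return grouped
```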

When you navigate to the Exposure page, you’ll see findings grouped by title, with severity levels clearly indicated to help you focus on critical issues first.

To explore specific exposures, you can select any finding to see affected resources. The panel includes key information about the implicated resource, account, Region, and when the issue was detected.

In this panel, you’ll also find an attack path visualization that is particularly useful for understanding complex security relationships. For network exposure paths, you can see all components involved in the path—including virtual private clouds (VPCs), subnets, security groups, network access control lists (ACLs), and load balancers—helping you identify exactly where to implement security controls. The visualization also highlights Identity and Access Management (IAM) relationships, showing how permission configurations might allow privilege escalation or data access. Resources with multiple contributing traits are clearly marked so you can quickly identify which components represent the greatest risk.

The Threats dashboard provides actionable insights into potential malicious activities detected by Amazon GuardDuty, organizing findings by severity so you can quickly identify critical issues like unusual API calls, suspicious network traffic, or potential credential compromises. The dashboard includes GuardDuty Extended Threat Detection findings, with all “Critical” severity threats representing these Extended Threat Detections that require immediate attention.
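If you process GuardDuty findings outside the console, the same severity-first triage can be sketched in a few lines. This example assumes that findings carry a numeric `Severity` field and that the labels map to the ranges 1.0 to 3.9 (Low), 4.0 to 6.9 (Medium), 7.0 to 8.9 (High), and 9.0 to 10.0 (Critical). Those ranges are my recollection of the GuardDuty documentation, so confirm them before relying on this.

```python
def severity_label(score):
    """Map a numeric GuardDuty severity to a label (assumed ranges)."""
    if not 1.0 <= score <= 10.0:
        raise ValueError(f"severity out of range: {score}")
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    return "Low"


def critical_findings(findings):
    """Return only the findings whose severity maps to Critical."""
    return [f for f in findings if severity_label(f["Severity"]) == "Critical"]
```

Because the post describes Critical findings as Extended Threat Detection attack sequences, a filter like this gives a quick way to surface them first.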

Similarly, the Vulnerabilities dashboard from Amazon Inspector provides a comprehensive view of software vulnerabilities and network exposure risks. The dashboard highlights vulnerabilities with known exploits, packages requiring urgent updates, and resources with the highest numbers of vulnerabilities.

Another valuable new feature is the Resources view, which provides an inventory of all resources deployed in your organization covered by AWS Security Hub. You can use this view to quickly identify which resources have findings against them and filter by resource type or finding severity. Selecting any resource provides detailed configuration information without needing to pivot to other consoles, streamlining your investigation workflow.

The new Security Hub also offers integration capabilities to help you comprehensively monitor your cloud environments and connect with third-party security solutions. This gives you the flexibility to create a unified security solution tailored to your organization’s specific needs.

For example, with integration capability, when viewing a security finding, you can select the Create ticket option and choose your preferred ticketing integration.

Additional things to know
Here are a few things to note:

Availability – During this preview period, the new AWS Security Hub is available in the following AWS Regions: US East (N. Virginia, Ohio), US West (N. California, Oregon), Africa (Cape Town), Asia Pacific (Hong Kong, Jakarta, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Milan, Paris, Stockholm), Middle East (Bahrain), and South America (São Paulo).
Pricing – The new AWS Security Hub is available at no additional charge during the preview period. However, you will still incur costs for the integrated capabilities including Amazon GuardDuty, Amazon Inspector, Amazon Macie, and AWS Security Hub CSPM.
Integration with existing AWS security capabilities – Security Hub integrates with Amazon GuardDuty, Amazon Inspector, AWS Security Hub CSPM, and Amazon Macie, providing a comprehensive security posture without additional operational overhead.
Enhanced data interoperability – The new Security Hub uses the Open Cybersecurity Schema Framework (OCSF), enabling seamless data exchange across your security capabilities with normalized data formats.
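To illustrate what a normalized format buys you, the sketch below reads the same few fields from any OCSF-style record regardless of which capability produced it. The field names (`severity_id`, `metadata.product.name`, `finding_info.title`) and the severity ID table follow my understanding of the public OCSF schema. The exact records Security Hub emits may differ, so treat the shapes here as assumptions.

```python
# Severity IDs as I understand them from the OCSF schema (assumption).
OCSF_SEVERITY = {
    0: "Unknown",
    1: "Informational",
    2: "Low",
    3: "Medium",
    4: "High",
    5: "Critical",
    6: "Fatal",
    99: "Other",
}


def summarize(event):
    """Build a one-line summary from an OCSF-style finding dict."""
    severity = OCSF_SEVERITY.get(event.get("severity_id", 0), "Other")
    product = event.get("metadata", {}).get("product", {}).get("name", "unknown")
    title = event.get("finding_info", {}).get("title", "(untitled)")
    return f"[{severity}] {product}: {title}"
```

The same `summarize` function works for a threat finding and a vulnerability finding alike, which is the practical benefit of a shared schema.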

To learn more about the enhanced AWS Security Hub and join the preview, visit the AWS Security Hub product page.

Happy building!

— Donnie

AWS Backup adds new Multi-party approval for logically air-gapped vaults

2025-06-17 Veliswa Boya

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/aws-backup-adds-new-multi-party-approval-for-logically-air-gapped-vaults/

Today, we’re announcing the general availability of a new capability that integrates AWS Backup logically air-gapped vaults with Multi-party approval to provide access to your backups even when your AWS account is inaccessible due to inadvertent or malicious events. AWS Backup is a fully managed service that centralizes and automates data protection across AWS services and hybrid workloads. It provides core data protection features, ransomware recovery capabilities, and compliance insights and analytics for data protection policies and operations.

As a backup administrator, you use AWS Backup logically air-gapped vaults to securely share backups across accounts and organizations, logically isolate your backup storage, and support direct restore to help reduce recovery time following an inadvertent or malicious event. However, if a bad or unintended actor gains root access to your backup account or the management account of your organization, your backups suddenly become inaccessible, even though they’re still safely stored in the logically air-gapped vault. While traditional account recovery involved working through support channels, AWS Backup with Multi-party approval delivers immediate access to recovery tools, empowering you with faster resolution times and greater control over your recovery timeline.

Multi-party approval for AWS Backup logically air-gapped vaults adds a layer of protection so that you can recover your application data even when your AWS account becomes completely inaccessible. Using Multi-party approval, you can create approval teams that consist of highly trusted individuals in your organization, then associate them with your logically air-gapped vault. If you get locked out of your AWS accounts due to inadvertent or malicious actions, you can ask your own approval team to authorize sharing of your vault from any account, even one outside your organization in AWS Organizations. Once approved, you gain authorized access to your backups and can begin your recovery process.

How it works
Multi-party approval for AWS Backup logically air-gapped vaults combines the security of logically air-gapped vaults with the governance of Multi-party approval to create a recovery mechanism that works even when your AWS account is compromised. Here’s how it works:

1. Approval team creation
First, you create an approval team in your AWS Organizations management account. If the management account is new, first create an AWS Identity and Access Management (IAM) Identity Center instance before creating the approval team. The approval team consists of trusted individuals (IAM Identity Center users) who will be authorized to approve vault sharing requests. Each approver receives an invitation to join the approval team through a new Approval portal.

2. Vault association
When your approval team is active, you share it with accounts that own logically air-gapped vaults using AWS Resource Access Manager (AWS RAM) to safeguard against requests for approval from arbitrary accounts. Backup administrators can then associate this approval team with new or existing logically air-gapped vaults.

3. Protection against compromise
If your AWS account becomes compromised or inaccessible, you can request access to your backups from a different account (a clean recovery account). This request includes the Amazon Resource Name (ARN) of the logically air-gapped vault in the format arn:aws:backup:<region>:<account>:backup-vault:<name> and an optional vault name and comment.

4. Multi-party approval
The request is sent to the approval team, who review it through the approval portal. When the minimum required number of approvers authorize the request, the vault is automatically shared with the requesting account. All requests and approvals are comprehensively logged in AWS CloudTrail.

5. Recovery process
With access granted, you can immediately start restoring or copying your data in the new recovery account without waiting for your compromised account to be remediated.

This approach provides an entirely separate authentication path to access and recover your backups, completely independent of your AWS account credentials. Even if the bad actor has root access to your account, they can’t prevent the approval team-based recovery process.
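Because the sharing request in step 3 depends on a correctly formed vault ARN, it can help to validate the ARN before submitting it. Here is a minimal sketch that parses the format given above, `arn:aws:backup:<region>:<account>:backup-vault:<name>`. The function name is hypothetical, and the 12-digit account ID and lowercase Region pattern are my assumptions about what valid values look like.

```python
import re

# Pattern for the vault ARN format quoted in step 3. The partition is
# allowed to vary (for example "aws-cn"), which is an assumption.
_VAULT_ARN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):backup:"
    r"(?P<region>[a-z0-9-]+):(?P<account>\d{12}):"
    r"backup-vault:(?P<name>.+)$"
)


def parse_vault_arn(arn):
    """Split a backup vault ARN into its parts, or raise ValueError."""
    match = _VAULT_ARN.match(arn)
    if match is None:
        raise ValueError(f"not a backup vault ARN: {arn!r}")
    return match.groupdict()
```

For example, `parse_vault_arn("arn:aws:backup:us-east-1:123456789012:backup-vault:my-vault")` returns the Region, account ID, and vault name as separate fields, which is handy when recording recovery test results.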

1. Create a new logically air-gapped vault
To create a new logically air-gapped vault, provide a name, tags (optional), and vault lock properties.

2. Assign an approval team
When the vault has been created, choose Assign approval team to assign an existing approval team to it.

Choose an existing approval team from the drop-down menu, then select Submit to finalize the assignment.

Now your approval team is assigned to your logically air-gapped vault.

Good to know
It’s essential to test your recovery process before an actual emergency:

From a different AWS account, use the AWS Backup console or API to request sharing of your logically air-gapped vault by providing the vault ID and ARN.
Request approval of your request from the approval team.
Once approved, verify that you can access and restore backups from the vault in your testing account.

As a best practice, monitor the health of your approval team regularly using AWS Backup Audit Manager to ensure they have sufficient active participants to meet your approval threshold.
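The threshold logic behind this health check is simple enough to sketch. The example below is illustrative only: the team record, its `approvers` and `min_approvals` keys, and the function names are hypothetical and do not mirror the Multi-party approval API.

```python
def active_approvers(team):
    """Return the names of approvers marked active in the team record."""
    return [m["name"] for m in team["approvers"] if m["active"]]


def can_meet_threshold(team):
    """True if enough active approvers exist to reach the minimum."""
    return len(active_approvers(team)) >= team["min_approvals"]


def is_approved(team, approvals):
    """True once enough distinct, active team members have approved.

    `approvals` is an iterable of approver names. Duplicates and names
    that are not active team members are ignored.
    """
    valid = set(approvals) & set(active_approvers(team))
    return len(valid) >= team["min_approvals"]
```

A team that can no longer meet its own threshold cannot approve a sharing request, which is why checking it before an emergency matters.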

Multi-party approval for enhanced cloud governance
Today, we’re also announcing the general availability of a new capability that AWS account administrators can use to add Multi-party approval to their product offerings. As highlighted in this post, AWS Backup is the first service to integrate this capability. With Multi-party approval, administrators can enable application owners to guard sensitive service operations with a distributed review process.

Good to know
Multi-party approval provides several significant security advantages:

Distributed decision-making, eliminating single points of failure
Full auditability through AWS CloudTrail integration
Protection against compromised credentials
Formal governance for compliance-sensitive operations
Consistent approval experience across integrated services

Now available

Multi-party approval is available today in all AWS Regions where AWS Organizations is available. Multi-party approval for AWS Backup logically air-gapped vaults is available in all AWS Regions where AWS Backup is available.

– Veliswa.

New AWS Shield feature discovers network security issues before they can be exploited (Preview)

2025-06-17 Esra Kayabali

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/new-aws-shield-feature-discovers-network-security-issues-before-they-can-be-exploited-preview/

Today, I’m happy to announce AWS Shield network security director (preview), a capability that simplifies identification of configuration issues related to threats such as SQL injections and distributed denial of service (DDoS) events, and proposes remediations. This feature identifies and analyzes network resources, connections, and configurations. It compares them against AWS best practices to create a network topology that highlights resources requiring protection.

Organizations today face significant challenges in maintaining a robust network security posture. Security teams often struggle to efficiently discover all resources in their environments, understand how these resources are interconnected, and identify which security services are currently configured. Determining how well resources are configured relative to AWS best practices also requires considerable expertise and effort. Many teams find it difficult to identify which network security services and rule sets would best protect their applications from common and emerging threats.

AWS Shield network security director addresses these challenges through three key capabilities. First, it performs comprehensive analysis to discover resources across your AWS accounts, identify connectivity between resources, and determine which network security services and configurations are currently in place. Second, it prioritizes resources by severity level based on AWS network security best practices and threat intelligence. Finally, it provides specific remediation recommendations such as step-by-step instructions for implementing the right AWS security services, including AWS WAF, Amazon Virtual Private Cloud (Amazon VPC) security groups, and Amazon VPC network access control lists (ACLs) to protect your resources.

The service supports critical network security use cases, including protecting applications against internet-borne threats and controlling human access to resources based on port, protocol, or IP address range. It provides network analysis to discover assets and delivers analysis that eliminates time-consuming manual processes for identifying resources that need protection. The service offers resource prioritization by assigning security findings a severity level based on network context and adherence to AWS best practices, helping you focus on what matters most. Additionally, it supplies actionable recommendations with specific guidance on which services and configurations will address each security gap. You can also get answers, in natural language, from AWS Shield network security director from within Amazon Q Developer in the AWS Management Console and chat applications.

Getting started with AWS Shield network security director
To use AWS Shield network security director, I need to initiate a network analysis of my AWS resources. I go to the AWS WAF & Shield console and choose Getting started under AWS Shield network security director in the navigation pane. I choose Get started, which takes me to the configuration page. On this page, I can choose how to perform my first network analysis: I can assess findings from across all supported Regions or from my current Region only. I select Start network analysis.

After the analysis is completed, the dashboard page shows a breakdown of resource types by severity level and the most common categories of network security findings associated with my resources. Resources are categorized by type and severity level (critical, high, medium, low, informational), making it easy to identify which areas need immediate attention.
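If you export or copy these results for your own reporting, the same severity-first ordering is easy to reproduce. The sketch below uses the five levels named above. The resource records and their `id` and `severity` keys are hypothetical.

```python
from collections import Counter

# Severity levels in the order the dashboard describes them.
SEVERITY_ORDER = ("critical", "high", "medium", "low", "informational")


def severity_rank(level):
    """Return 0 for the most severe level; raises ValueError if unknown."""
    return SEVERITY_ORDER.index(level.lower())


def sort_by_severity(resources):
    """Return resources ordered from most to least severe."""
    return sorted(resources, key=lambda r: severity_rank(r["severity"]))


def count_by_severity(resources):
    """Count resources per level, including levels with zero resources."""
    counts = Counter(r["severity"].lower() for r in resources)
    return {level: counts.get(level, 0) for level in SEVERITY_ORDER}
```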

Next, I explore the Resources section to understand the distribution of my assets and filter by severity level in my environment. I can use Resource overview to review a specific severity level, which redirects me to the Resources page under Network security director with the associated severity level filter applied. I choose the resources that have a Medium severity level.

I choose a specific resource to view its network topology map showing how it connects to other resources and associated findings. This visualization helps me understand the potential impact of security configurations and identify exposed paths. I review detailed findings such as “Allows unrestricted inbound access (0.0.0.0/0) on all ports” with severity ratings.

Next, I go to Findings under Network security director, which shows common configuration issues. For each finding, I receive detailed information and recommended remediation steps. The service rates the severity of findings (critical, high, medium, low, informational) to help me prioritize my response. Critical-severity findings such as “CloudFront origin is also internet accessible without CloudFront protections” or high-severity findings such as “Allows unrestricted inbound access (0.0.0.0/0) on all ports” are presented first, followed by medium- and low-severity issues.
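To show what a finding like “Allows unrestricted inbound access (0.0.0.0/0) on all ports” means in practice, here is a rough sketch of a comparable check. It is not how network security director works internally. It assumes a rule shaped like an entry in the `IpPermissions` list that the EC2 `DescribeSecurityGroups` API returns, where an `IpProtocol` of `"-1"` means all protocols and ports.

```python
import ipaddress


def is_open_to_world(cidr):
    """True for ranges that cover every address (0.0.0.0/0 or ::/0)."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def allows_unrestricted_all_ports(permission):
    """True if an inbound rule opens all ports to any address.

    "All ports" means either protocol "-1" (all traffic) or an explicit
    0-65535 port range for a single protocol.
    """
    all_ports = permission.get("IpProtocol") == "-1" or (
        permission.get("FromPort") == 0 and permission.get("ToPort") == 65535
    )
    cidrs = [r["CidrIp"] for r in permission.get("IpRanges", [])]
    cidrs += [r["CidrIpv6"] for r in permission.get("Ipv6Ranges", [])]
    return all_ports and any(is_open_to_world(c) for c in cidrs)
```

A rule that opens only port 443 to the internet does not match, because this check targets the combination of every port and every source address.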

You can analyze your network security configurations, in natural language, with AWS Shield network security director within Amazon Q Developer in the AWS Management Console and chat applications. For example, you can say “Do I have any network security issues on my CloudFront distributions?” or “Are any of my resources vulnerable to bots and scrapers?” This integration helps security teams quickly understand their security posture and receive guidance on implementing best practices without having to navigate through extensive documentation.

To explore this capability, I ask “What are my most critical network security issues?” in the Explore with Amazon Q section. Amazon Q analyzes my network security configuration and generates a response based on the security assessment of my AWS environment.

With this comprehensive view of your network security, you can now make data-driven decisions to strengthen your defenses against emerging threats.

Join the preview
AWS Shield network security director is available in the US East (N. Virginia) and Europe (Stockholm) Regions. The Amazon Q Developer capability to analyze network security configurations is available in preview in US East (N. Virginia). To begin strengthening your network security, visit the AWS Shield network security director console and initiate your first network security analysis.

For more information, visit the AWS Shield product page.

— Esra

Amazon CloudFront simplifies web application delivery and security with new user-friendly interface

2025-06-17 Micah Walter

Post Syndicated from Micah Walter original https://aws.amazon.com/blogs/aws/amazon-cloudfront-simplifies-web-application-delivery-and-security-with-new-user-friendly-interface/

Today, we’re announcing a new simplified onboarding experience for Amazon CloudFront that developers can use to accelerate and secure their web applications in seconds. This new experience, along with improvements to the AWS WAF console experience, makes it easier than ever for developers to configure content delivery and security services without requiring deep technical expertise.

Setting up content delivery and security for web applications traditionally required navigating multiple Amazon Web Services (AWS) services and making numerous configuration decisions. With this new CloudFront onboarding experience, developers can now create a fully configured distribution with DNS and a TLS certificate in just a few clicks.

Amazon CloudFront offers compelling benefits for organizations of all sizes looking to deliver content and applications globally. As a content delivery network (CDN), CloudFront significantly improves application performance by serving content from edge locations closest to your users, reducing latency and improving user experience. Beyond performance, CloudFront provides built-in security features that protect your applications from distributed denial of service (DDoS) attacks and other threats at the edge, preventing malicious traffic from reaching your origin infrastructure. The service automatically scales with your traffic demands without requiring any manual intervention, handling both planned and unexpected traffic spikes with ease. Whether you’re running a small website or a large-scale application, the CloudFront integration with other AWS services and the new simplified console experience makes it easier than ever to implement these essential capabilities for your web applications.

Streamlined CloudFront configuration

The new CloudFront console experience guides developers through a simplified workflow that starts with the domain name they want to use for their distribution. When using Amazon Route 53, the experience automatically handles TLS certificate provisioning and DNS record configuration, while incorporating security best practices by default. This unified approach eliminates the need to switch between multiple services like AWS Certificate Manager, Route 53, and AWS WAF, and offers developers a faster time to production without the need to dive deep on the nuanced configuration options of each service.

For example, a developer can now create a secure CloudFront distribution for their applications fronted by a load balancer by entering their domain name and selecting their load balancer as the origin. The console automatically recommends optimal CDN and security configurations based on the application type and requirements, and developers can deploy with confidence knowing they’re following AWS best practices.

For developers who wish to host a static website on Amazon Simple Storage Service (Amazon S3), CloudFront provides several important benefits. First, it improves your website’s performance by caching content at edge locations closer to your users, reducing latency and improving page load times. Second, it helps protect your S3 bucket by acting as a security layer—CloudFront can be configured to be the only way to access your content, preventing direct access to your S3 bucket. The new experience automatically configures these security best practices for you.

Enhanced security integration with AWS WAF

Complementing the new CloudFront experience, we’re also introducing an improved AWS WAF console that features intelligent Rule Packs—curated sets of security rules based on application type and security requirements. These Rule Packs enable developers to implement comprehensive security controls without needing to be security experts.

When creating a CloudFront distribution, developers can now enable AWS WAF protection through an integrated experience that uses these new Rule Packs. The console provides clear recommendations for security configurations that developers can use to preview and validate their settings before deployment.

Web applications face numerous security threats today, including SQL injection attacks, cross-site scripting (XSS), and other OWASP Top 10 vulnerabilities. With the new AWS WAF integration, you automatically get protection against these common attack vectors. The recommended Rule Packs provide immediate protection against malicious bot traffic, common web exploits, and known bad actors while preventing direct-to-origin attacks that could overwhelm your infrastructure.

Let’s take a look

If you’ve ever created an Amazon CloudFront distribution, you’ll immediately notice that things have changed. The new experience is straightforward to follow and understand. For my example, I chose to create a distribution for a static website using Amazon S3 as my origin.

New onboarding experience for Amazon CloudFront

In Step 1, I give my distribution a name and select from Single website or app or the new Multi-tenant architecture option, which I can use to configure distributions that use multiple domains but share a common configuration. I choose Single website or app and enter an optional domain name. With the new experience, I can use the Check domain button to verify I have my domain as a Route 53 zone file.

Next, I select the origin for the distribution, which is where CloudFront will fetch the content to serve and cache. For my Origin type, I select Amazon S3. As the preceding screenshot shows, there are several additional options to choose from. Each of the options is designed to make configuration as straightforward as possible for the most popular use cases. Next, I select my S3 bucket, either by typing in the bucket name or using the Browse S3 button.

Next, I have several settings related to using Amazon S3 as my origin. The Grant CloudFront access to origin option is an important one. This option (selected by default) will update my S3 bucket policy to allow CloudFront to access my bucket and will configure my bucket for origin access control. This way, I can use a completely private bucket and know that assets in my bucket can only be accessed through CloudFront. This is a critical step to keeping my bucket and assets secure.
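For readers curious about what this bucket policy update looks like, the sketch below builds a policy of the general shape documented for origin access control: the CloudFront service principal may read objects, but only on behalf of one specific distribution. The exact policy the console writes may differ, and the bucket name, account ID, and distribution ID passed in are placeholders.

```python
import json


def oac_bucket_policy(bucket, account_id, distribution_id):
    """Return a bucket policy dict that lets one distribution read objects."""
    source_arn = f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipalReadOnly",
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Restricts access to requests made for this distribution.
                "Condition": {"StringEquals": {"AWS:SourceArn": source_arn}},
            }
        ],
    }


def oac_bucket_policy_json(bucket, account_id, distribution_id):
    """Serialize the policy for pasting into the S3 console or an API call."""
    return json.dumps(oac_bucket_policy(bucket, account_id, distribution_id), indent=2)
```

The `Condition` block is what keeps other CloudFront distributions, including those in other accounts, from reading the bucket.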

In the next step, I’m presented with the option to configure AWS WAF. With AWS WAF enabled, my web servers are better protected because it inspects each incoming request for potential threats before allowing them to make their way to my web servers. There is a cost to enabling AWS WAF, and as you can see in the following screenshot, there is a calculator to help estimate additional charges.

Now available

The new CloudFront onboarding experience and enhanced AWS WAF console are available today in all AWS Regions where these services are offered. You can start using these new features through the AWS Management Console. There are no additional charges for using these new experiences—you pay only for the CloudFront and AWS WAF resources you use, based on their respective pricing models.

To learn more about the new CloudFront onboarding experience and AWS WAF improvements, visit the Amazon CloudFront Documentation and AWS WAF Documentation. Start building faster, more secure web applications today with these simplified experiences.
