Tag Archives: announcements

Improve your app authentication workflow with new Amazon Cognito features

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/improve-your-app-authentication-workflow-with-new-amazon-cognito-features/

Introduced 10 years ago, Amazon Cognito is a service that helps you implement customer identity and access management (CIAM) in your web and mobile applications. You can use Amazon Cognito for various use cases, from quickly adding sign-in, sign-up, and authorization experiences to your applications to securing machine-to-machine authentication and enabling role-based access to AWS resources.

Today, I’m excited to share a series of significant updates to Amazon Cognito. These enhancements aim to provide you with more flexibility, improved security, and a better user experience for your applications.

Here’s a quick summary:

A new developer-focused console experience
Amazon Cognito now offers a streamlined getting-started experience featuring a quick wizard and use case-specific recommendations. This new approach helps you set up configurations and reach your end users faster and more efficiently than ever before.

This is the new Amazon Cognito flow to help you quickly set up your application. You can get started in three steps:

  1. Choose the type of application you need to build
  2. Configure the sign-in options according to the type of your application
  3. Follow the instructions to integrate the sign-in and sign-up pages with your application

Then, select Create.

Amazon Cognito then automatically creates your application and a new user pool, which is a user directory for authentication and authorization. From here, you can review your sign-in page by selecting View login page or get started with the example code for your application. Furthermore, Amazon Cognito supports major application frameworks and offers detailed instructions for integrating them using standard OpenID Connect (OIDC) and OAuth open source libraries.

This is the new overview dashboard for your application. The user pool dashboard now provides important information in the Details section, as well as a set of Recommendations to help you continue your development journey.

On this page, you can customize your users’ sign-in and sign-up experience with the Managed Login feature. This is a good segue for me to provide you with a quick overview of the next new feature.

Introducing Managed Login
The introduction of Managed Login brings a new level of customization to Amazon Cognito. Managed Login handles the heavy lifting of availability, scaling, and security for your company. Once integrated, you automatically get all the new security patches and future features without further code changes.

This feature allows you to create personalized sign-up and sign-in experiences that are a seamless part of your company’s application for your end users.

Before you can use Managed Login, you need to assign a domain. There are two ways to do this: use a prefix domain, a randomly generated sub-domain of Amazon Cognito domain, or use your own custom domain to provide your users with a familiar domain name.

Then, you can choose your Branding version, selecting either Managed login or classic Hosted UI.

If you’re an existing Amazon Cognito user, you might be familiar with the classic Hosted UI feature. Managed Login is the improved version of Hosted UI, offering a new collection of web interfaces for sign-up and sign-in, built-in responsiveness for different screen sizes, multi-factor authentication, and password-reset activities in your user pool.

With Managed Login, you can use the new branding designer, a no-code visual editor for managed login assets and style, and a set of API operations for programmatic configuration or deployment via infrastructure-as-code with AWS CloudFormation.

With the branding designer, you have the flexibility to customize the look and feel of the entire user journey, from sign up and sign in to password recovery and multi-factor authentication. This feature provides a real time preview and convenient shortcuts to preview screens in different screen sizes and display modes before you launch it.

You can learn more about Managed Login by visiting the Managed Login documentation page.

Passwordless login support
The Managed Login feature also offers pre-built integrations for passwordless authentication methods, including signing in with passkeys, email OTP (one-time-password) and SMS OTP. Passkey support allows users to authenticate using cryptographic keys stored securely on their devices, offering better security compared to traditional passwords. This capability helps you implement low-friction and secure authentication methods without the need to understand and implement WebAuthn related protocols.

By reducing the friction associated with traditional password-based sign-ins, this feature simplifies application access for your users while maintaining high security standards.

Visit the user pools authentication flow documentation page to learn more about the passwordless login support.

More options on pricing tiers: Lite, Essentials and Plus
Amazon Cognito has introduced new user pool feature tiers: Lite, Essentials, and Plus. These tiers are designed to cater to different customer needs and use cases, with the Essentials tier being the default tier for new user pools created by customers. This new tier structure also allows you to choose the most appropriate option based on your application requirements, with the flexibility to switch between tiers as needed.

To check your current tier, you can go to your application dashboard and select Feature plan. You can also select Settings from the navigation menu.

On this page, you’ll get detailed information for each tier and the option to downgrade or upgrade your plan.

Here’s a quick overview of each tier:

  1. Lite tier: Existing features such as user registration, password-based authentication, and social identity provider integration are now packaged in this tier. If you’re an existing Amazon Cognito user, you can continue using these features without making changes to your user pools. 

  2. Essentials tier: Offers comprehensive authentication and access control features, allowing you to implement secure, scalable, and customized sign-up and sign-in experiences for your application within minutes. It includes all capabilities in Lite along with supporting Managed Login and passwordless login options using passkeys, email, or SMS. Essentials also supports customizing access tokens and disallowing password reuse.

  3. Plus tier: Builds upon the Essentials tier, focusing on elevated security needs. It includes all Essentials features plus threat protection capabilities against suspicious login activity, detection of compromised credentials, risk-based adaptive authentication, and the ability to export user authentication event logs for threat analysis.

Pricing for the Lite, Essentials and Plus tiers is based on monthly active users. Customers currently using the advanced security features of Amazon Cognito should consider the Plus tier, which includes all the advanced security features, additional capabilities such as passwordless, and up to 60 percent savings as compared to using the standalone advanced security features.

If you want to learn about these new pricing tiers, see the Amazon Cognito pricing page.

Things you need to know

  • Availability – The Essentials and Plus tiers are available in all AWS Regions where Amazon Cognito is available except AWS GovCloud (US) Regions.
  • Free tier on Lite and Essentials tiers – Customers on the Lite and Essentials tiers can enjoy the free tier each month that does not automatically expire. It is available to both existing and new AWS customers indefinitely. For more details on free tier, please visit the Amazon Cognito pricing page.

  • Extended pricing benefit for existing customers – Customers are eligible to upgrade their user pools without advanced security features (ASF) in their existing accounts to Essentials and pay the same price as Cognito user pools until November 30, 2025. To be eligible, customers’ accounts must have had at least 1 monthly active user (MAU) in the last 12 months on or before 10:00am Pacific Time, November 22, 2024. These customers are also eligible to create new user pools with the Essentials tier at the same price as Cognito user pools in those accounts until November 30, 2025.

With these updates, you can implement secure, scalable, and customizable authentication solutions for your applications with Amazon Cognito.

Happy building,
Donnie

Node.js 22 runtime now available in AWS Lambda

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/node-js-22-runtime-now-available-in-aws-lambda/

This post is written by Julian Wood, Principal Developer Advocate, and Andrea Amorosi, Senior SA Engineer.

You can now develop AWS Lambda functions using the Node.js 22 runtime, which is in active LTS status and ready for production use. Node.js 22 includes a number of additions to the language, including require()ing ES modules, as well as changes to the runtime implementation and the standard library. With this release, Node.js developers can take advantage of these new features and enhancements when creating serverless applications on Lambda.

You can develop Node.js 22 Lambda functions using the AWS Management Console, AWS Command Line Interface (AWS CLI), AWS SDK for JavaScript, AWS Serverless Application Model (AWS SAM), AWS Cloud Development Kit (AWS CDK), and other infrastructure as code tools.

To use this new version, specify a runtime parameter value of nodejs22.x when creating or updating functions or by using the appropriate container base image.

You can use Node.js 22 with Powertools for AWS Lambda (TypeScript), a developer toolkit to implement serverless best practices and increase developer velocity. Powertools for AWS Lambda includes libraries to support common tasks such as observability, AWS Systems Manager Parameter Store integration, idempotency, batch processing, and more. You can also use Node.js 22 with Lambda@Edge to customize low-latency content delivered through Amazon CloudFront.

This blog post highlights important changes to the Node.js runtime, notable Node.js language updates, and how you can use the new Node.js 22 runtime in your serverless applications.

Node.js 22 language updates

Node.js 22 introduces several language updates and features that enhance developer productivity and improve application performance.

This release adds support for loading ECMAScript modules (ESM) using require(). You can enable this feature using the --experimental-require-module flag by configuring the NODE_OPTIONS environment variable. require() support for synchronous ESM graphs bridges the gap between CommonJS and ESM, providing more flexibility in module loading. It is important to note that this feature is currently experimental and may change in future releases.

WebSocket support which was previously available behind the --experimental-websocket flag is now enabled by default in Node.js 22. This brings a browser-compatible WebSocket client implementation to Node.js with no need for external dependencies. Native support simplifies building real-time applications and enhances the overall WebSocket experience in Node.js environments.

The new runtime also includes performance improvements to AbortSignal creation. This makes network operations faster and more efficient for the Fetch API and test runner. The Fetch API is also now considered stable in Node.js 22.
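As a rough illustration of where AbortSignal shows up in function code, the following sketch wraps a fetch call with a timeout signal. The helper names fetchStatusWithTimeout and isAbortOrTimeout are hypothetical, and the error names checked are an assumption about how an aborted or timed-out fetch typically surfaces in Node.js.

```typescript
// Hypothetical helper: returns true when an error looks like it came from
// an aborted or timed-out request (assumed error names, see lead-in).
function isAbortOrTimeout(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const name = (err as { name?: unknown }).name;
  return name === "AbortError" || name === "TimeoutError";
}

// Hypothetical helper: fetches a URL and gives up after timeoutMs milliseconds.
// AbortSignal.timeout() creates a signal that aborts on its own after the delay.
async function fetchStatusWithTimeout(
  url: string,
  timeoutMs: number
): Promise<number | "timed-out"> {
  try {
    const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return response.status;
  } catch (err) {
    if (isAbortOrTimeout(err)) return "timed-out";
    throw err; // Anything else (DNS failure, bad URL) is not a timeout.
  }
}
```

Returning a sentinel value instead of rethrowing keeps the timeout path explicit for the caller; whether that is preferable to an exception depends on your handler's error strategy.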

For TypeScript users, Node.js 22 introduces experimental support for transforming TypeScript-only syntax into JavaScript code. By using the --experimental-transform-types flag, you can enable this feature to support TypeScript syntax such as Enum and namespace directly. While you can enable the feature in Lambda, your function entrypoint (i.e. index.mjs or app.cjs) cannot currently be written using TypeScript as the runtime expects a file with a JavaScript extension. You can use TypeScript for any other module imported within your codebase.
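To make the distinction concrete, here is a minimal sketch of a module that uses TypeScript-only syntax (an enum and a namespace), which needs transformation rather than simple type stripping. The names OrderStatus, Pricing, and describeOrder are hypothetical, and the sketch assumes the file is imported by a JavaScript entrypoint rather than used as the handler file itself.

```typescript
// An enum emits runtime code, so it cannot be handled by erasing types alone.
enum OrderStatus {
  Pending = "PENDING",
  Shipped = "SHIPPED",
}

// A namespace with values also emits runtime code.
namespace Pricing {
  export const TAX_PERCENT = 20;

  // Works in integer cents to avoid floating-point rounding surprises.
  export function withTax(netCents: number): number {
    return Math.round((netCents * (100 + TAX_PERCENT)) / 100);
  }
}

// Combines both constructs into a single string for logging.
function describeOrder(status: OrderStatus, netCents: number): string {
  return `${status}:${Pricing.withTax(netCents)}`;
}
```

For example, describeOrder(OrderStatus.Shipped, 1000) returns "SHIPPED:1200".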

For a detailed overview of Node.js 22 language features, see the Node.js 22 release blog post and the Node.js 22 changelog.

Experimental features that are unavailable

Node.js 22 includes an experimental feature to detect the module syntax automatically (CommonJS or ES Modules). This feature must be enabled when the Node.js runtime is compiled. Since the Lambda-provided Node.js 22 runtime is intended for production workloads, this experimental feature is not enabled in the Lambda build and cannot be enabled via an execution-time flag. To use this feature in Lambda, you need to deploy your own Node.js runtime using a custom runtime or container image with experimental module syntax detection enabled.

Performance considerations

At launch, new Lambda runtimes receive less usage than existing established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda sub-systems. Cold start times typically improve in the weeks following launch as usage increases. As a result, AWS recommends not drawing conclusions from side-by-side performance comparisons with other Lambda runtimes until the performance has stabilized. Since performance is highly dependent on workload, customers with performance-sensitive workloads should conduct their own testing, instead of relying on generic test benchmarks.

Builders should continue to measure and test function performance and optimize function code and configuration for any impact. To learn more about how to optimize Node.js performance in Lambda, see Performance optimization in the Lambda Operator Guide, and our blog post Optimizing Node.js dependencies in AWS Lambda.

Migration from earlier Node.js runtimes

AWS SDK for JavaScript

Up until Node.js 16, Lambda’s Node.js runtimes included the AWS SDK for JavaScript version 2. This has since been superseded by the AWS SDK for JavaScript version 3, which was released in December 2020. Starting with Node.js 18, and continuing with Node.js 22, the Lambda Node.js runtimes include version 3. When upgrading from Node.js 16 or earlier runtimes and using the included version 2, you must upgrade your code to use the v3 SDK.

For optimal performance, and to have full control over your code dependencies, we recommend bundling and minifying the AWS SDK in your deployment package, rather than using the SDK included in the runtime. For more information, see Optimizing Node.js dependencies in AWS Lambda.

Amazon Linux 2023

The Node.js 22 runtime is based on the provided.al2023 runtime, which is based on the Amazon Linux 2023 minimal container image. The Amazon Linux 2023 minimal image uses microdnf as a package manager, symlinked as dnf. This replaces the yum package manager used in Node.js 18 and earlier AL2-based images. If you deploy your Lambda function as a container image, you must update your Dockerfile to use dnf instead of yum when upgrading to the Node.js 22 base image from Node.js 18 or earlier.

Additionally, AL2023 includes curl and gnupg2 as their minimal versions, curl-minimal and gnupg2-minimal.

Learn more about the provided.al2023 runtime in the blog post Introducing the Amazon Linux 2023 runtime for AWS Lambda and the Amazon Linux 2023 launch blog post.

Using the Node.js 22 runtime in AWS Lambda

AWS Management Console

To use the Node.js 22 runtime to develop your Lambda functions, specify a runtime parameter value Node.js 22.x when creating or updating a function. The Node.js 22 runtime version is now available in the Runtime dropdown on the Create function page in the AWS Lambda console:

Creating Node.js function in AWS Management Console


To update an existing Lambda function to Node.js 22, navigate to the function in the Lambda console, then choose Node.js 22.x in the Runtime settings panel. The new version of Node.js is available in the Runtime dropdown:

Changing a function to Node.js 22


AWS Lambda container image

Change the Node.js base image version by modifying the FROM statement in your Dockerfile.

FROM public.ecr.aws/lambda/nodejs:22
# Copy function code
COPY lambda_handler.xx ${LAMBDA_TASK_ROOT}

AWS Serverless Application Model (AWS SAM)

In AWS SAM, set the Runtime attribute to nodejs22.x to use this version:

AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.lambda_handler
      Runtime: nodejs22.x
      CodeUri: my_function/.
      Description: My Node.js Lambda Function

When you add function code directly in an AWS SAM or AWS CloudFormation template as an inline function, it is treated as CommonJS.

AWS SAM supports generating this template with Node.js 22 for new serverless applications using the sam init command. Refer to the AWS SAM documentation.

AWS Cloud Development Kit (AWS CDK)

In AWS CDK, set the runtime attribute to Runtime.NODEJS_22_X to use this version.

import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as path from "path";
import { Construct } from "constructs";

export class CdkStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // The code that defines your stack goes here

    // The Node.js 22 enabled Lambda Function
    const lambdaFunction = new lambda.Function(this, "node22LambdaFunction", {
      runtime: lambda.Runtime.NODEJS_22_X,
      code: lambda.Code.fromAsset(path.join(__dirname, "/../lambda")),
      handler: "index.handler",
    });
  }
}

 

Conclusion

Lambda now supports Node.js 22 as a managed language runtime. This release uses the Amazon Linux 2023 OS as well as other improvements detailed in this blog post.

You can build and deploy functions using Node.js 22 using the AWS Management Console, AWS CLI, AWS SDK, AWS SAM, AWS CDK, or your choice of infrastructure as code tool. You can also use the Node.js 22 container base image if you prefer to build and deploy your functions using container images.

The Node.js 22 runtime helps developers build more efficient, powerful, and scalable serverless applications. Read about the Node.js programming model in the Lambda documentation to learn more about writing functions in Node.js 22. Try the Node.js runtime in Lambda today.

For more serverless learning resources, visit Serverless Land.

Introducing new capabilities to AWS CloudTrail Lake to enhance your cloud visibility and investigations

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/introducing-new-capabilities-to-aws-cloudtrail-lake-to-enhance-your-cloud-visibility-and-investigations/

Today, I’m excited to announce new updates to AWS CloudTrail Lake, which is a managed data lake you can use to aggregate, immutably store, and query events recorded by AWS CloudTrail for auditing, security investigation, and operational troubleshooting.

The new updates in CloudTrail Lake are:

  • Enhanced filtering options for CloudTrail events
  • Cross-account sharing of event data stores
  • General availability of the generative AI–powered natural language query generation
  • AI-powered query results summarization capability in preview
  • Comprehensive dashboard capabilities, including a high-level overview dashboard with AI-powered insights (AI-powered insights is in preview), a suite of 14 pre-built dashboards for various use cases, and the ability to create custom dashboards with scheduled refreshes

Let’s look into the new features one by one.

Enhanced filtering options for CloudTrail events ingested into event data stores
Enhanced event filtering capabilities give you greater control over which CloudTrail events are ingested into your event data stores. These enhanced filtering options provide tighter control over your AWS activity data, improving the efficiency and precision of security, compliance, and operational investigations. Additionally, the new filtering options help you reduce your analysis workflow costs by ingesting only the most relevant event data into your CloudTrail Lake event data stores.

You can filter both management and data events based on attributes such as eventSource, eventType, eventName, userIdentity.arn, and sessionCredentialFromConsole.

I go to the AWS CloudTrail console and choose Event data stores under Lake in the navigation pane. I choose Create event data store. In the first step, I enter a name in the Event data store name field. For this demo, I leave other fields as default. You can choose the pricing and retention options that suit your needs. In the next step, I choose Management events and Data events under CloudTrail events. You can include all the options you need under CloudTrail events. You also have the option to choose ingestion options. I choose Ingest events so that the event data store starts ingesting as soon as it’s created. There may be scenarios when you want to deselect the Ingest events option to stop an event data store from ingesting events. For example, you may be copying trail events to the event data store and do not want the event data store to collect any future events. You can also choose to enable ingestion for all accounts in your organization or include only the current Region in your event data store.

The following example shows an out of the box template for filtering, which excludes any management events that are initiated by an AWS Service. I choose Advanced event collection under the Management events. I choose Exclude AWS service-initiated events from the Log selector template dropdown. You can also expand the JSON view to see how the filters actually apply.

Under the Data events, the following example creates a filter to include DynamoDB data events initiated by a certain user, helping me to log events based on an IAM principal. I choose DynamoDB as Resource type. I choose Custom as Log selector template. Under the Advanced event selector, I choose userIdentity.arn as Field and equals as Operator. I enter the user’s ARN as Value. I choose Next and choose Create event data store in the final step.
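The DynamoDB filter described above can be sketched as a plain object that mirrors what the JSON view shows. This is only an illustration: the helper name dynamoDbEventsForPrincipal and the example ARN are hypothetical, and the eventCategory and resources.type field selectors are my assumption of how the console expresses "Data events" and the "DynamoDB" resource type.

```typescript
// Minimal shapes for an advanced event selector (only the parts used here).
interface FieldSelector {
  Field: string;
  Equals: string[];
}

interface AdvancedEventSelector {
  Name: string;
  FieldSelectors: FieldSelector[];
}

// Hypothetical helper: builds a selector that keeps only DynamoDB data events
// initiated by a single IAM principal.
function dynamoDbEventsForPrincipal(principalArn: string): AdvancedEventSelector {
  return {
    Name: "DynamoDB data events for one principal",
    FieldSelectors: [
      { Field: "eventCategory", Equals: ["Data"] }, // assumed field name
      { Field: "resources.type", Equals: ["AWS::DynamoDB::Table"] }, // assumed field name
      { Field: "userIdentity.arn", Equals: [principalArn] }, // field named in this post
    ],
  };
}

// Example with a placeholder ARN; print it to compare against the JSON view.
const selector = dynamoDbEventsForPrincipal("arn:aws:iam::111111111111:user/example-user");
console.log(JSON.stringify([selector], null, 2));
```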

Now, I have my event data store that gives me granular control over the ingested CloudTrail data.

This expanded set of filtering options helps you to be more selective in capturing only the most relevant events for your security, compliance, and operational needs.

Cross-account sharing of event data stores
You can use the cross-account sharing feature of event data stores to enhance collaborative analysis within organizations. It enables secure sharing of event data stores with selected AWS principals through Resource-Based Policies (RBP). This functionality allows authorized entities to query shared event data stores within the same AWS Region where they were created. 

To use this feature, I go to the AWS CloudTrail console and choose Event data stores under Lake in the navigation pane. I choose an event data store from the list and navigate to its details page. I choose Edit in the Resource policy section. The following example policy includes a statement that allows root users in accounts 111111111111, 222222222222, and 333333333333 to run queries and get query results on the event data store owned by account ID 999999999999. I choose Save changes to save the policy.
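A policy of the kind described above can be sketched with a small builder. Treat this as an assumption-laden illustration rather than the exact policy from the console: the helper name buildSharingPolicy, the statement ID, the Region, and the event data store ID are hypothetical placeholders, and the two actions are the ones I would expect to cover running queries and reading their results.

```typescript
interface PolicyStatement {
  Sid: string;
  Effect: "Allow";
  Principal: { AWS: string[] };
  Action: string[];
  Resource: string;
}

interface ResourcePolicy {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical helper: grants the root principals of the given accounts
// permission to query one event data store.
function buildSharingPolicy(
  ownerAccountId: string,
  region: string,
  eventDataStoreId: string,
  consumerAccountIds: string[]
): ResourcePolicy {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Sid: "AllowCrossAccountQueries", // placeholder statement ID
        Effect: "Allow",
        Principal: {
          AWS: consumerAccountIds.map((id) => `arn:aws:iam::${id}:root`),
        },
        // Assumed actions for "run queries and get query results".
        Action: ["cloudtrail:StartQuery", "cloudtrail:GetQueryResults"],
        Resource: `arn:aws:cloudtrail:${region}:${ownerAccountId}:eventdatastore/${eventDataStoreId}`,
      },
    ],
  };
}

// Accounts from the example above; Region and store ID are placeholders.
const policy = buildSharingPolicy(
  "999999999999",
  "us-east-1",
  "EXAMPLE-EVENT-DATA-STORE-ID",
  ["111111111111", "222222222222", "333333333333"]
);
console.log(JSON.stringify(policy, null, 2));
```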

Generative AI–powered natural language query generation in CloudTrail Lake is now generally available
In June, we announced this feature for CloudTrail Lake in preview. With this launch, you can generate SQL queries using natural language questions to easily explore and analyze AWS activity logs (only management, data, and network activity events) without needing technical SQL expertise. The feature uses generative AI to convert natural language questions into ready-to-use SQL queries you can run directly in the CloudTrail Lake console. This simplifies the process of exploring event data stores and retrieving insights such as error counts, top services used, and the causes of errors. This feature is also accessible through the AWS Command Line Interface (AWS CLI), providing additional flexibility for users who prefer command-line operations. The preview blog post provides step-by-step instructions on how to get started with the natural language query generation feature in CloudTrail Lake.

CloudTrail Lake generative AI–powered query results summarization capability in preview
Building on the capability of natural language query generation, we’re introducing a new AI-powered query results summarization feature in preview to further simplify the process of analyzing AWS account activity. With this feature, you can easily extract valuable insights from your AWS activity logs (only management, data, and network activity events) by automatically summarizing the key points from your query results in natural language, reducing the time and effort required to understand the information.

To try this feature, I go to the AWS CloudTrail console and choose Query under Lake in the navigation pane. I choose an event data store for my CloudTrail Lake query from the dropdown list in Event data store. You can use summarization regardless of whether the query was written manually or generated by generative AI. For this example, I will use the natural language query generation capability. In the Query generator, I enter the following prompt in the Prompt field using natural language:

How many errors were logged during the past month for each service and what was the cause of each error?

Then, I choose Generate query. The following SQL query is automatically generated:

SELECT eventsource,
    errorcode,
    errormessage,
    count(*) as errorcount
FROM a0******
WHERE eventtime >= '2024-10-14 00:00:00'
    AND eventtime <= '2024-11-14 23:59:59'
    AND (
        errorcode IS NOT NULL
        OR errormessage IS NOT NULL
    )
GROUP BY 1,
    2,
    3
ORDER BY 4 DESC;

I choose Run to get the results. To use the summarization capability, I choose Summarize results in the Query results tab. CloudTrail automatically analyzes the query results and provides a natural language summary of the key insights. It’s important to note that there’s a monthly quota of 3 MB for query results that can be summarized.

This new summarization capability can save you time and effort in understanding your AWS activity data by automatically generating meaningful summaries of the key findings.

Comprehensive dashboard capabilities
Lastly, let me tell you about the new dashboard capabilities of CloudTrail Lake to enhance visibility and analysis across your AWS environments.

The first one is a Highlights dashboard that provides you with an easy-to-view summary of the data captured in your CloudTrail Lake management and data events stored in event data stores. This dashboard makes it easier to quickly identify and understand important insights, such as the top failed API calls, trends in failed login attempts, and spikes in resource creation. It surfaces any anomalies or unusual trends in the data.

I go to the AWS CloudTrail console and choose Dashboard under Lake in the navigation pane to check out the Highlights dashboard. First, I enable Highlights dashboard by choosing Agree and enable Highlights.

I check out the Highlights dashboard once it populates with data.

The second addition to the new dashboard capabilities is a suite of 14 pre-built dashboards. These dashboards are designed for different personas and use cases. For example, the security-focused dashboards help you to track and analyze key security indicators, such as top access denied events, failed console login attempts, and users who have disabled multi-factor authentication (MFA). There are also pre-built dashboards for operational monitoring, highlighting trends in errors and availability issues, such as top APIs with throttling errors and top users with errors. You can also use the dashboards focused on specific AWS services such as Amazon EC2 and Amazon DynamoDB, which help you identify security risks or operational problems within those particular service environments.

You can create your own dashboards and optionally set schedules for refreshing them. This level of customization helps you tailor the CloudTrail Lake analysis capabilities to your precise monitoring and investigative needs across your AWS environments.

I switch to the Managed and custom dashboards to observe the custom and pre-built dashboards.

I choose the IAM activity pre-built dashboard to observe overall IAM activity. You can choose Save as new dashboard to customize this dashboard.

To create a custom dashboard from scratch, I go to Dashboard under Lake in the navigation pane and choose Build my own dashboard. I enter a name in the Enter a name for the dashboard field and choose event data stores under Permissions, to visualize the events. Next, I choose Create dashboard.

Now, I can add widgets to my dashboard. You have the flexibility to customize your dashboards in multiple ways. You can select from a list of pre-built sample widgets using Add sample widget, or you can create your own custom widgets using Create new widget. For each widget, you can choose the type of visualization you prefer, such as a line graph, bar graph, or other options to best represent your data.

Now available
The new features in AWS CloudTrail Lake represent a major advancement in providing a comprehensive audit logging and analysis solution. These enhancements help you gain a deeper understanding of your activity data and conduct investigations more rapidly, supporting more proactive monitoring and faster incident handling across your AWS environments.

You can now start using generative AI–powered natural language query generation in CloudTrail Lake in US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), and Europe (London) AWS Regions.

CloudTrail Lake generative AI–powered query results summarization capability is available in preview in US East (N. Virginia), US West (Oregon), and Asia Pacific (Tokyo) Regions.

Enhanced filtering options, cross-account sharing of event data stores and dashboards are available in all the Regions where CloudTrail Lake is available, with the exception of generative AI–powered summarization feature on the Highlights dashboard being available only in US East (N. Virginia), US West (Oregon), and Asia Pacific (Tokyo) Regions.

Running queries will incur CloudTrail Lake query charges. For more details on pricing, visit AWS CloudTrail pricing.

— Esra

AWS Glue Data Catalog supports automatic optimization of Apache Iceberg tables through your Amazon VPC

Post Syndicated from Noritaka Sekiyama original https://aws.amazon.com/blogs/big-data/aws-glue-data-catalog-supports-automatic-optimization-of-apache-iceberg-tables-through-your-amazon-vpc/

The AWS Glue Data Catalog supports automatic table optimization of Apache Iceberg tables, including compaction, snapshots, and orphan data management. The data compaction optimizer constantly monitors table partitions and kicks off the compaction process when the threshold is exceeded for the number of files and file sizes.

The Iceberg table compaction process starts and will continue if the table or any of the partitions within the table has more than the configured number of files (default five files), each smaller than 75% of the target file size. The snapshot retention process runs periodically (default daily) to identify and remove snapshots that are older than the specified retention configuration from the table properties, while keeping the most recent snapshots up to the configured limit. Similarly, the orphan file deletion process scans the table metadata and the actual data files, identifies the unreferenced files, and deletes them to reclaim storage space. These storage optimizations can help you reduce metadata overhead, control storage costs, and improve query performance.
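The compaction trigger described above can be expressed as a simple predicate. This is a simplified sketch of the stated rule only, not how the Data Catalog implements it; the function name needsCompaction and its parameters are hypothetical.

```typescript
// Returns true when a partition (or table) holds more than `minSmallFiles`
// files that are each smaller than 75% of the target file size.
function needsCompaction(
  fileSizesBytes: number[],
  targetFileSizeBytes: number,
  minSmallFiles: number = 5 // default file count stated in this post
): boolean {
  const smallFileThreshold = 0.75 * targetFileSizeBytes;
  const smallFileCount = fileSizesBytes.filter(
    (size) => size < smallFileThreshold
  ).length;
  return smallFileCount > minSmallFiles;
}

const MB = 1024 * 1024;
const target = 128 * MB; // placeholder target file size for the example

// Six 10 MB files: more than five small files, so compaction is warranted.
console.log(needsCompaction(Array(6).fill(10 * MB), target)); // true

// Six 100 MB files: none is below 96 MB (75% of 128 MB), so nothing to do.
console.log(needsCompaction(Array(6).fill(100 * MB), target)); // false
```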

Although automatic table optimization has simplified day-to-day Iceberg table maintenance tasks, certain industries and customers have advanced requirements to access their Iceberg tables from specific virtual private clouds (VPCs). This access control is needed for not only data ingestion and querying, but also for table maintenance.

To help meet these requirements, the Data Catalog can now run Iceberg table optimization in your specific VPC. This post demonstrates how it works with step-by-step instructions.

How the table optimizer works with AWS Glue network connection

By default, a table optimizer is not associated with any of your VPCs and subnets. With this new capability of supporting data access from VPCs, you can associate a table optimizer with an AWS Glue network connection to run in a specific VPC, subnet, and security group. An AWS Glue network connection is commonly used to run an AWS Glue job with a specific VPC, subnet, and security group. The following diagram illustrates how it works.

In the next sections, we demonstrate how to configure a table optimizer with an AWS Glue network connection.

Prerequisites

To follow these instructions, you must have the following prerequisites:

Set up resources with AWS CloudFormation

This post includes a sample AWS CloudFormation template that enables a quick setup of the solution resources. You can review and customize the template to suit your needs.

The CloudFormation template generates the following resources:

  • An Amazon Simple Storage Service (Amazon S3) bucket to store the dataset, AWS Glue job scripts, and so on. (See Appendix 1 at the end of this post for manual instructions.)
  • A Data Catalog database.
  • An AWS Glue job that creates and modifies sample customer data in your S3 bucket with a trigger every 10 minutes.
  • AWS IAM roles and policies.
  • A VPC, public subnet, two private subnets, internet gateway, and route tables.
  • Amazon Virtual Private Cloud (Amazon VPC) endpoints for AWS Glue, AWS Lake Formation, Amazon CloudWatch, Amazon S3, and AWS Security Token Service (AWS STS). The endpoint names are as follows:
    • AWS Glue: com.amazonaws.<region>.glue (for example, com.amazonaws.us-east-1.glue).
    • Lake Formation: com.amazonaws.<region>.lakeformation (only if tables are registered with Lake Formation).
    • CloudWatch: com.amazonaws.<region>.monitoring.
    • Amazon S3: com.amazonaws.<region>.s3.
    • AWS STS: com.amazonaws.<region>.sts.
  • An AWS Glue network connection configured with the VPC and subnet. (See Appendix 2 at the end of this post for manual instructions.)
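If you want to check the endpoint names for your own Region, the naming pattern listed above can be expanded with a small helper. This is a minimal sketch: the function and dictionary names are hypothetical, and it only formats strings; it doesn't call any AWS API.

```python
# Expands the com.amazonaws.<region>.<service> pattern from the list above.
# Helper names are hypothetical; no AWS API is called.

ENDPOINT_SUFFIXES = {
    "AWS Glue": "glue",
    "Lake Formation": "lakeformation",  # only if tables are registered with Lake Formation
    "CloudWatch": "monitoring",
    "Amazon S3": "s3",
    "AWS STS": "sts",
}


def endpoint_service_names(region, include_lake_formation=True):
    """Return a mapping of service label to VPC endpoint service name."""
    names = {}
    for label, suffix in ENDPOINT_SUFFIXES.items():
        if label == "Lake Formation" and not include_lake_formation:
            continue
        names[label] = f"com.amazonaws.{region}.{suffix}"
    return names


names = endpoint_service_names("us-east-1")
for label, service_name in names.items():
    print(f"{label}: {service_name}")
```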

To launch the CloudFormation stack, complete the following steps:

  1. Sign in to the AWS CloudFormation console.
  2. Choose Launch Stack.
  3. Choose Next.
  4. For SubnetAz1, choose your preferred Availability Zone.
  5. For SubnetAz2, choose your preferred Availability Zone. This needs to be different from SubnetAz1.
  6. Leave the other parameters as default or make appropriate changes based on your requirements, then choose Next.
  7. Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
  8. Choose Create.

This stack can take around 5–10 minutes to complete, after which you can view the deployed stack on the AWS CloudFormation console.

Configure automatic table optimization with an AWS Glue network connection

Complete the following steps to configure automatic table optimization with an AWS Glue network connection:

  1. On the AWS Glue console, choose Databases in the navigation pane.
  2. Choose iceberg_optimizer_vpc_db.
  3. Under Tables, choose customer.
  4. On the Table optimization – new tab, choose Enable optimization.

  5. For Optimization configuration, choose Customize settings.
  6. For IAM role, choose the iceberg-optimizer-vpc-MyGlueTableOptimizerRole-xxx role created by the CloudFormation stack.
  7. For Virtual private cloud (VPC) – optional, choose myvpc_private_network_connection.

  8. Select I acknowledge that expired data will be deleted as part of the optimizers and choose Enable optimization.

Now the table optimizer has been configured with your VPC. After a while, you can see how the optimizer worked.

  9. Under Table optimization – new, choose View optimization history on the Actions menu.

You can confirm that the table optimizer worked successfully for this Iceberg table.

You have now seen how to set up the table optimizer with an AWS Glue network connection to run it through a specific VPC.

Clean up

When you have finished all the preceding steps, remember to clean up all the AWS resources you created using AWS CloudFormation:

  1. Delete the S3 bucket storing the Iceberg table and the AWS Glue job script.
  2. Delete the CloudFormation stack.

Conclusion

This post demonstrated how the Data Catalog supports automatic optimization of Iceberg tables through your VPC. With this enhancement, you can simplify table maintenance for your Iceberg tables under advanced security requirements. This feature is available today in all AWS Glue supported AWS Regions.

Try out this solution for your own use case, and share your feedback and questions in the comments.


About the Authors

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling with his new road bike.

Paul Villena is an Analytics Solutions Architect in AWS with expertise in building modern data and analytics solutions to drive business value. He works with customers to help them harness the power of the cloud. His areas of interest are infrastructure as code, serverless technologies, and coding in Python.

Justin Lin is a software engineer on the AWS Lake Formation team. He works on delivering managed optimization solutions for open table formats to enhance customer data management and query performance. In his spare time, he enjoys playing tennis.

Himani Desai is a Software Engineer on the AWS Lake Formation team. She works on providing managed optimization solutions for Iceberg tables.

Abishek Shankar is a software engineer on the AWS Lake Formation team, working on providing managed optimization solutions for Iceberg tables.

Shyam Rathi is a Software Development Manager on the AWS Lake Formation team, working on delivering new features and enhancements related to modern data lakes.

Sandeep Adwankar is a Senior Product Manager at AWS. Based in the California Bay Area, he works with customers around the globe to translate business and technical requirements into products that enable customers to improve how they manage, secure, and access data.


Appendix 1: Configure your S3 bucket to allow access only from a specific VPC

The instructions provided in this post help you configure your S3 bucket automatically through the CloudFormation template, but you can also manually configure your S3 bucket to allow access only from a specific VPC. This is an optional step to simulate strict security regulations on your Iceberg table. Complete the following steps:

  1. On the Amazon S3 console, choose Buckets in the navigation pane.
  2. Choose your S3 bucket.
  3. Choose Permissions.
  4. Under Bucket policy, choose Edit.
  5. Enter the following bucket policy:
{
    "Version": "2012-10-17",
    "Id": "S3BucketPolicyVPCAccessOnly",
    "Statement": [
        {
            "Sid": "DenyIfNotFromAllowedVPC",
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::<your-bucket-name>",
                "arn:aws:s3:::<your-bucket-name>/*"
            ],
            "Condition": {
                "StringNotEquals": {
                    "aws:SourceVpc": "<your-vpc-id>",
                    "aws:PrincipalArn": [
                        "arn:aws:iam::<your-account-id>:role/<your-IAM-role-name>"
                    ]
                }
            }
        }
    ]
}
  6. Choose Save changes.

Now this S3 bucket denies the listed data operations unless they come from the specified VPC or from the specified IAM role. You can try uploading files to the bucket through the Amazon S3 console to see that this operation fails as expected.
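To reason about when the Deny statement above applies, it helps to remember that multiple condition keys under one condition operator are combined with a logical AND, and that a negated operator such as StringNotEquals also matches when the key is missing from the request. The following Python sketch models only that logic for this one statement. The function name and the sample VPC IDs and ARNs are hypothetical, and this is not a general IAM policy evaluator.

```python
# Simplified model of the single Deny statement in the bucket policy above.
# Sample identifiers are hypothetical; this is not a full IAM evaluator.

def deny_applies(source_vpc, principal_arn, allowed_vpc, allowed_role_arns):
    """Return True when the Deny statement would match the request.

    source_vpc is None when the request did not arrive through a VPC
    endpoint, which a negated operator treats as "not equal".
    """
    vpc_mismatch = source_vpc != allowed_vpc
    principal_mismatch = principal_arn not in allowed_role_arns
    # Both keys sit under one StringNotEquals operator, so they are ANDed.
    return vpc_mismatch and principal_mismatch


ALLOWED_VPC = "vpc-0example"
ALLOWED_ROLES = ["arn:aws:iam::111122223333:role/example-role"]

# Console upload by another principal, not through the VPC: denied.
console_upload = deny_applies(None, "arn:aws:iam::111122223333:user/alice",
                              ALLOWED_VPC, ALLOWED_ROLES)
# Same principal, but the request comes from the allowed VPC: not denied.
from_vpc = deny_applies(ALLOWED_VPC, "arn:aws:iam::111122223333:user/alice",
                        ALLOWED_VPC, ALLOWED_ROLES)
```

Note that a request that is not denied by this statement still needs an explicit Allow from an identity or resource policy to succeed.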

Appendix 2: Create an AWS Glue network connection

You can also manually configure the AWS Glue network connection with the following steps:

  1. On the AWS Glue console, choose Data connections in the navigation pane.
  2. Under Connections, choose Create connection.
  3. Select Network, and choose Next.
  4. For VPC, choose your VPC created by the CloudFormation stack. The VPC ID is shown on the Outputs tab of the CloudFormation stack.
  5. For Subnet, choose your private subnet created by the CloudFormation stack. The subnet ID is shown on the Outputs tab of the CloudFormation stack.
  6. For Security groups, choose your security group created by the CloudFormation stack. The security group ID is shown on the Outputs tab of the CloudFormation stack.
  7. Choose Next.
  8. For Name, enter myvpc_private_network_connection.
  9. Choose Next.
  10. Review the configurations and choose Create connection.

Track performance of serverless applications built using AWS Lambda with Application Signals

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/track-performance-of-serverless-applications-built-using-aws-lambda-with-application-signals/

In November 2023, we announced Amazon CloudWatch Application Signals, an AWS built-in application performance monitoring (APM) solution, to solve the complexity associated with monitoring performance of distributed systems for applications hosted on Amazon EKS, Amazon ECS, and Amazon EC2. Application Signals automatically correlates telemetry across metrics, traces, and logs, to speed up troubleshooting and reduce application disruption. By providing an integrated experience for analyzing performance in the context of your applications, Application Signals gives you improved productivity focusing on the applications that support your most critical business functions.

Today we’re announcing the availability of Application Signals for AWS Lambda, which eliminates the manual setup previously required to assess application health and troubleshoot performance issues for Lambda functions. With CloudWatch Application Signals for Lambda, you can now collect application golden metrics (the incoming and outgoing volume of requests, latency, faults, and errors).

AWS Lambda abstracts away the complexity of the underlying infrastructure, enabling you to focus on building your application without having to monitor server health. This allows you to shift your focus toward monitoring the performance and health of your applications, which is necessary to operate your applications at peak performance and availability. This requires deep visibility into performance insights such as volume of transactions, latency spikes, availability drops, and errors for your critical business operations and application programming interfaces (APIs).

Previously, you had to spend significant time correlating disjointed logs, metrics, and traces across multiple tools to establish the root cause of anomalies, increasing mean time to recovery (MTTR) and operational costs. Additionally, building your own APM solutions with custom code or manual instrumentation using open source (OSS) libraries was time-consuming, complex, operationally expensive, and often resulted in increased cold start times and deployment challenges when managing large fleets of Lambda functions. Now, you can use Application Signals to seamlessly monitor and troubleshoot health and performance issues in serverless applications, without requiring any manual instrumentation or code changes from your application developers.

How it works
Using the pre-built, standardized dashboards of Application Signals, you can identify the root cause of performance anomalies in just a few clicks by drilling down into performance metrics for critical business operations and APIs. This helps you visualize application topology which shows interactions between the function and its dependencies. In addition, you can define Service Level Objectives (SLOs) on your applications to monitor specific operations that matter most to you. An example of an SLO could be to set a goal that a webpage should render within 2000 ms 99.9 percent of the time in a rolling 28-day interval.
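To make the SLO example concrete, here is a minimal Python sketch of how a latency goal such as "within 2000 ms 99.9 percent of the time" could be checked against a set of request latencies. It treats the goal as request-based, which is an assumption for this illustration; the function names are hypothetical and this is not how Application Signals computes attainment internally.

```python
# Illustrative request-based SLO check; names are hypothetical and this is
# not the Application Signals implementation.

def slo_attainment(latencies_ms, threshold_ms):
    """Return the fraction of requests that completed within the threshold."""
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    good = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return good / len(latencies_ms)


def meets_slo(latencies_ms, threshold_ms=2000.0, goal=0.999):
    """Return True when attainment over the samples reaches the goal."""
    return slo_attainment(latencies_ms, threshold_ms) >= goal


# 9,999 fast requests and one slow one: attainment is 99.99 percent.
mostly_fast = [150.0] * 9999 + [2500.0]
# 998 fast requests and two slow ones: attainment is 99.8 percent.
too_many_slow = [150.0] * 998 + [2500.0] * 2
```

In practice the samples would cover the whole rolling interval (28 days in the example above), so a short burst of slow requests consumes part of the error budget rather than failing the objective outright.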

Application Signals auto-instruments your Lambda function using enhanced AWS Distro for OpenTelemetry (ADOT) libraries. This delivers better performance, such as lower cold start latency, memory consumption, and function invocation duration, so you can quickly monitor your applications.

I have an existing Lambda function appsignals1 and I will configure Application Signals in the Lambda Console to collect various telemetry on this application.

In the Configuration tab of the function, I select Monitoring and operations tools to enable both Application Signals and Lambda service traces.

I have an application myAppSignalsApp that has this Lambda function attached as a resource. I’ve defined an SLO for my application to monitor specific operations that matter most to me. I’ve defined a goal that states that the application executes within 10 ms 99.9 percent of the time in a rolling 1-day interval.

It can take 5-10 minutes for Application Signals to discover the function after it’s been invoked. As a result you’ll need to refresh the Services page before you can see the service.

Now I’m in the Services page and I can see a list of all my Lambda functions that have been discovered by Application Signals. Any telemetry that is emitted will be displayed here.

I can then visualize the complete application topology from the Service Map and quickly spot anomalies across my service’s individual operations and dependencies, using the newly collected metrics of volume of requests, latency, faults, and errors. To troubleshoot, I can click into any point in time for any application metric graph to discover correlated traces and logs related to that metric, to quickly identify if issues impacting end users are isolated to an individual task or deployment.

Available now
Amazon CloudWatch Application Signals for Lambda is now generally available and you can start using it today in all AWS Regions where Lambda and Application Signals are available. Today, Application Signals is available for Lambda functions that use Python and Node.js managed runtimes. We’ll continue to add support for other Lambda runtimes in the near future.

To learn more, visit the AWS Lambda developer guide and Application Signals developer guide. You can submit your questions to AWS re:Post for Amazon CloudWatch, or through your usual AWS Support contacts.

Veliswa.

Announcing a visual update to the AWS Management Console (preview)

Post Syndicated from Prasad Rao original https://aws.amazon.com/blogs/aws/announcing-a-visual-update-to-the-aws-management-console-preview/

Today, we are announcing a visual update to the AWS Management Console in preview. We are rolling out this update by using the latest version of Cloudscape, the Amazon Web Services (AWS) design system used to build intuitive, inclusive, and meaningful AWS experiences at scale.

In this post, I describe how the visual update makes it easier for you to scan content, focus on the key information, and find what you are looking for more effectively while preserving the familiar and consistent experience of the AWS Management Console.

AWS Management console home page - previous

AWS Management console home page - Visual Update

Improved readability
A revised typography scale and improved treatment of headings result in a stronger visual hierarchy, which helps you to better locate and understand your data. A refined use of color and weight across text elements helps you differentiate key pieces of information faster. For example, you’ll see that labels in form fields are now more prominent, which eases scanning. The same applies to keys in key-value pairs and sections across components, such as service navigation, expandable elements, and tabs.

CloudFront distribution console screenshot - Previous vs Visual Update

We improved the color palette, made it more vibrant, and simplified the color treatment of interactive elements. For example, secondary buttons, links, tokens, and interactive states for numerous interface elements are now blue, making it easier for you to interact with the content on the screen and contributing to improving task efficiency.

Screenshot showing improved color - Previous vs Visual Update

Improved focus in light and dark mode
Reduced visual complexity supports user focus. We replaced drop shadows with a new thinner stroke on main content wrappers, such as cards, panels, and containers, and unified the use of border styles across components. This reduces visual noise and optimizes the space inside the layout. Shadows are now reserved to add emphasis on specific interactive and transient elements, which helps simplify visual depth and improves the overall content hierarchy.

Screenshot showing improved focus - Previous vs Visual Update

We also released updates to dark mode to address the need for clearer differentiation between elements on the page. These changes include an update to the color ramp and improved contrast between interactive states across components.

Screenshot comparing dark mode of AWS Management Console home page - Previous vs Visual Update

Modernized interface
We modernized the interface while retaining familiarity to continue to offer predictable and recognizable experiences across AWS. The user experience is now easier on the eyes, thanks to the use of rounder shapes, brighter colors, and improved layout treatment. These updates create a smoother, more natural appearance, making the interface more visually pleasing.

To deliver a more delightful experience and support visual storytelling, we also introduced a whole new family of illustrations and motion while still offering the highest accessibility standards.

Example of an illustration introduced

Improved information density
We optimized information density by reducing unused space, leading to more content visible on the screen. Related data is now displayed closer together, reinforcing visual grouping. Space within content wrappers such as cards and containers has been minimized, so you can consume more information at once. The new layout is centered and wider, optimizing the experience to serve larger screen sizes than before. The visual update makes it easier to consume information, which creates a better and friendlier experience within the AWS Management Console.

Showing Improved information density on AWS Lambda Create Function Screen - Previous vs Visual Update

Showing Improved information density in tabular format - Previous vs Visual Update

Additionally, we introduced Toolbar, a new way to navigate and access contextual tools and features. This helps you perform your tasks while maximizing the amount of content available on screen.

Screenshot of toolbar introduced

Improved consistency
The interface is now more distinctive and consistent. Refreshed colors, iconography, and shapes help deliver a more dynamic and expressive experience while reinforcing a unified and cohesive journey across all AWS experiences.

Available now
You can start experiencing the visual update now in selected consoles across all AWS Regions by visiting the AWS Management Console. We’ll be extending the update across all services. Thanks to the new visual treatment, you can now benefit from an experience that’s more readable and intuitive and that contributes to improved overall task efficiency.

Introducing Amazon CloudFront VPC origins: Enhanced security and streamlined operations for your applications

Post Syndicated from Matheus Guimaraes original https://aws.amazon.com/blogs/aws/introducing-amazon-cloudfront-vpc-origins-enhanced-security-and-streamlined-operations-for-your-applications/

I’m happy to introduce the release of Amazon CloudFront Virtual Private Cloud (VPC) origins, a new feature that enables content delivery from applications hosted in private subnets within your Amazon Virtual Private Cloud (Amazon VPC). This makes it easy to secure web applications, allowing you to focus on growing your business while improving security and maintaining high performance and global scalability with CloudFront.

Customers serving content from Amazon Simple Storage Service (Amazon S3), AWS Elemental Services, and AWS Lambda Function URLs can use Origin Access Control as a managed solution to secure their origins and make CloudFront the single front door to their applications. However, this was more difficult to achieve for applications that are hosted on Amazon Elastic Compute Cloud (Amazon EC2) or that use load balancers, because you had to create your own solution to achieve the same result. You would have to combine methods such as access control lists (ACLs), firewall rules, and logic such as header validation to ensure that the endpoint remained exclusive to CloudFront.

CloudFront VPC origins removes the need for this kind of undifferentiated work by offering a managed solution that can be used to point CloudFront distributions directly to Application Load Balancers (ALBs), Network Load Balancers (NLBs), or EC2 instances inside your private subnets. This ensures that CloudFront becomes the sole ingress point for those resources with minimum configuration effort, providing you with improved performance and a cost-saving opportunity because it also eliminates the need for public IP addresses.

Configuring a CloudFront VPC origin
CloudFront VPC origins is available at no additional cost, making it an accessible option for all AWS customers. It can be integrated with new or existing CloudFront distributions using the Amazon CloudFront console or the AWS Command Line Interface (AWS CLI).

Imagine that you have an application hosted privately on AWS Fargate for Amazon ECS and fronted by an ALB. Let’s create a CloudFront distribution that uses the ALB directly inside the private subnet.

Start by navigating to the CloudFront console and select the new menu option: VPC origins.

vpc origins page

Creating a new VPC origin is straightforward. You only need to select from a few options. In the Origin ARN, you can search for available resources that are hosted in private subnets or enter it directly. You select the resources that you want, choose a friendly name for your VPC origin alongside some security options, and then confirm. Please note that, at launch, the VPC origin resource must be in the same AWS Account as the CloudFront distribution, although support for resources across all accounts is coming soon.

creating a vpc origin

After the creation process is complete, your VPC origin will be deployed and ready to go! You can check its status on the VPC origins page.

With this, we have created a CloudFront distribution that serves content directly from a resource hosted on a private subnet in a few clicks! After your VPC origin is created, you can navigate to your Distribution window, and add the VPC origin to your Distribution by either selecting the ARN from the dropdown or copy-pasting the ARN manually.

Remember, though, that it’s still important to layer your application’s security by using services such as AWS WAF to protect against web exploits, AWS Shield for managed DDoS protection, and other services to achieve full-spectrum protection.

Conclusion
CloudFront VPC Origins offers a new way for organizations to deliver secure, high-performance applications by enabling CloudFront distributions to serve content directly from resources hosted within private subnets. This reduces the complexity and cost of maintaining public-facing origins while ensuring that your application remains secure.

To learn more, see the getting started guide.

Matheus Guimaraes | @codingmatheus

Amazon CloudFront now accepts your applications’ gRPC calls

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/amazon-cloudfront-now-accepts-your-applications-grpc-calls/

Starting today, you can deploy Amazon CloudFront, our global content delivery network (CDN), in front of your gRPC API endpoints.

gRPC is a modern, efficient, and language-agnostic framework for building APIs. It uses Protocol Buffers (protobuf) as its interface definition language (IDL), which enables you to define services and message types in a platform-independent manner. With gRPC, communication between services is achieved through lightweight and high-performance remote procedure calls (RPCs) over HTTP/2. This promotes efficient and low-latency communication across services, making it ideal for microservices architectures.

gRPC offers features such as bidirectional streaming, flow control, and automatic code generation for a variety of programming languages. It’s well-suited for scenarios in which you require high performance, efficient communication, and real-time data streaming. If your application needs to handle a large amount of data or requires low-latency communication between client and server, gRPC can be a good choice. However, gRPC might be more challenging to learn compared to REST. For example, gRPC relies on the protobuf serialization format, which requires developers to define their data structures and service methods in .proto files.

I see two benefits of deploying CloudFront in front of your gRPC API endpoints.

First, it reduces latency between the client application and your API implementation. CloudFront offers a global network of more than 600 edge locations with intelligent routing to the closest edge. Edge locations provide TLS termination and optional caching for your static content. CloudFront transfers client application requests to your gRPC origin through the fully managed, low-latency, and high-bandwidth private AWS network.

Second, your applications benefit from additional security services deployed on edge locations, such as traffic encryption, validation of the HTTP headers through AWS WAF, and AWS Shield Standard protection against distributed denial of service (DDoS) attacks.

Let’s see it in action
To start this demo, I use the gRPC route-guide demo from the official gRPC code repository. I deploy this example application in a container for ease of deployment (but any other deployment option is supported too).

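I use this Dockerfile:

# Base image with Python preinstalled
FROM python:3.7
# gRPC runtime and protobuf support needed by the route guide example
RUN pip install protobuf grpcio
# Copy the route guide example from a local clone of the gRPC repository
COPY ./grpc/examples/python/route_guide .
# Port the route guide server listens on
EXPOSE 50051
CMD ["python", "route_guide_server.py"]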

I also use the AWS Copilot command line to deploy my container on Amazon Elastic Container Service (Amazon ECS). The Copilot command prompts me to collect the information it requires to build and deploy the container. Then, it creates the ECS cluster, the ECS service, and the ECS task automatically. It also creates a TLS certificate and the load balancer for me. I test the client application by modifying line 122 to use the DNS name of the load balancer listener endpoint. I also change the client application code to use grpc.secure_channel instead of grpc.insecure_channel because the load balancer provides the application with an HTTPS endpoint.
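The client-side change described above can be sketched as follows. This is a minimal illustration rather than the exact code from the route guide example: the host name is a placeholder, the helper names are hypothetical, and opening the channel requires the grpcio package, which is imported lazily so the address helper can be used on its own.

```python
# Sketch of switching the client from an insecure to a TLS-secured channel.
# The host name is a placeholder and the helper names are hypothetical.

def channel_target(host, port=443):
    """Build the host:port target string for an HTTPS (TLS) endpoint."""
    return f"{host}:{port}"


def open_secure_channel(host, port=443):
    """Open a TLS channel instead of calling grpc.insecure_channel()."""
    import grpc  # requires the grpcio package

    # Default credentials validate the server certificate against the
    # root certificates that ship with grpcio.
    credentials = grpc.ssl_channel_credentials()
    return grpc.secure_channel(channel_target(host, port), credentials)


# Placeholder for the DNS name of the load balancer listener endpoint.
target = channel_target("my-load-balancer.example.com")
```

The same helper works unchanged later in the walkthrough: only the host name needs to change when the client is pointed at a different HTTPS endpoint.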

gRPC client application demo - source code with ALB

When I’m confident my API is correctly deployed and working, I proceed and configure CloudFront.

First, in the CloudFront section of the AWS Management Console, I select Create distribution.

Under Origin, I enter my gRPC endpoint DNS name as Origin domain. I enable HTTPS only as Protocol and leave the HTTPS port as is (443). Then I choose a Name for the distribution.

CloudFront - Add origin and name

Under Viewer, I select HTTPS only as Viewer protocol policy. Then, I select GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE as Allowed HTTP methods. I select Enable for Allow gRPC requests over HTTP/2.

CloudFront - Viewer Policy

Under Cache key and origin requests, I select AllViewer as Origin request policy.

The default cache policy is CacheOptimized, but gRPC isn’t cacheable API traffic. Therefore, I select CachingDisabled as Cache policy.

CloudFront - Cache policy

AWS WAF helps protect you against common web exploits and bots that can affect availability, compromise security, or consume excessive resources. For gRPC traffic, AWS WAF can inspect the HTTP headers of the request and enforce access control. It doesn’t inspect the request body in protobuf format.

For this demo, I choose to not use AWS WAF. Under Web Application Firewall (WAF), I select Do not enable security protections.

CloudFront - Security

I also keep all the other options with their default value. HTTP/2 support is selected by default. Do not disable it because it is required for gRPC.

Finally, I select Create distribution.

CloudFront - Create distribution

There is only one switch to enable gRPC on top of the usual setup. When turned on, with HTTP/2 and HTTP POST enabled, CloudFront detects gRPC client traffic and forwards it to your gRPC origin.
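The three conditions mentioned above (the gRPC switch, HTTP/2, and HTTP POST) can be captured in a small pre-flight check. The following Python sketch uses a plain dictionary that mirrors the console choices from this walkthrough. The dictionary keys and function name are hypothetical and do not correspond to CloudFront API field names.

```python
# Pre-flight check for the settings chosen in this walkthrough.
# Keys and names are hypothetical, not CloudFront API fields.

def grpc_preconditions_met(settings):
    """Return True when the switch, HTTP/2, and POST are all enabled."""
    return bool(
        settings.get("allow_grpc", False)
        and "HTTP/2" in settings.get("http_versions", ())
        and "POST" in settings.get("allowed_methods", ())
    )


demo_settings = {
    "viewer_protocol_policy": "HTTPS only",
    "allowed_methods": ["GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"],
    "allow_grpc": True,
    "origin_request_policy": "AllViewer",
    "cache_policy": "CachingDisabled",
    "http_versions": ["HTTP/2"],
}

ready = grpc_preconditions_met(demo_settings)
# Restricting the methods to GET and HEAD drops POST, so the check fails.
read_only = dict(demo_settings, allowed_methods=["GET", "HEAD"])
not_ready = grpc_preconditions_met(read_only)
```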

After a few minutes, the distribution is ready. I copy and paste the endpoint URL of the CloudFront distribution, and I change the client-side app to make it point to CloudFront instead of the previously created load balancer.

gRPC client application demo - source code

I test the application again, and it works.

gRPC client application demo - execution

Pricing and Availability
gRPC origins are available at all of the more than 600 CloudFront edge locations at no additional cost. The usual requests and data transfer fees apply.

Go and point your CloudFront origin to a gRPC endpoint today.

— seb

Enhance data governance with enforced metadata rules in Amazon DataZone

Post Syndicated from Ramesh H Singh original https://aws.amazon.com/blogs/big-data/enhance-data-governance-with-enforced-metadata-rules-in-amazon-datazone/

We’re excited to announce a new feature in Amazon DataZone that offers enhanced metadata governance for your subscription approval process. With this update, domain owners can define and enforce metadata requirements for data consumers when they request access to data assets. By making it mandatory for data consumers to provide specific metadata, domain owners can achieve compliance, meet organizational standards, and support audit and reporting needs.

Many organizations require additional metadata from data consumers during the subscription request process to align with internal workflows and regulatory requirements. With enforced metadata rules, domain unit owners can establish consistent governance practices across all data subscriptions. For example, financial services organizations can mandate specific compliance-related metadata when data consumers request access to sensitive financial data. Similarly, healthcare providers can enforce metadata requirements to align with regulatory standards for patient data access. This feature simplifies the approval process by guiding data consumers through completing mandatory fields and enabling data owners to make informed decisions, ensuring data access requests meet organizational policies.

By streamlining metadata governance, Amazon DataZone empowers customers to meet compliance standards, maintain audit readiness, and simplify access workflows for enhanced efficiency and control. For example, one of our customers, Bristol Myers Squibb (BMS), leverages Amazon DataZone to address their specific data governance needs. Sitikantha Sarangi, Director of Data Engineering and ML Ops Platform at BMS, says:

“At BMS, our teams have been leveraging Amazon DataZone’s comprehensive data governance solution to catalog and enable secure data subscriptions across the organization within governed project environments. With the new custom metadata enforcement feature, we now can more easily navigate our data catalog. This capability allows us to set specific requirements for data consumers, such as providing a compliance certification link or detailing data usage intentions, ensuring that access requests for sensitive data are thoroughly reviewed and approved in alignment with our standards. This customization helps us more efficiently ensure we are appropriately utilizing data while facilitating efficient, secure data sharing across teams.” 

Key benefits

The feature benefits multiple stakeholders. Domain unit owners can ensure compliance by enforcing metadata requirements, granting access only after thorough reviews. Data consumers benefit from a streamlined subscription request process, guided by metadata requirements that reduce complexity. Data producers gain clarity with detailed subscription requests, enabling informed decisions aligned with required standards. Overall, the key benefits are:

  • Enhanced control for domain owners – Admins and domain unit owners can now enforce additional metadata requirements on subscription requests, making sure that data consumers supply essential information for thorough review and compliance checks
  • Custom workflow support – Organizations can build custom workflows for assets by capturing critical metadata from data consumers, such as AWS account IDs or project-specific identifiers, to fulfill access requests

In this post, we walk you through setting up and using metadata enforcement to create seamless, compliant data access workflows.

Solution overview

The solution in this post is composed of two parts. In the first part, we walk through the steps necessary to enforce metadata for subscription requests for managed assets. In the second part, we walk through the steps necessary to request subscriptions for custom assets.

Prerequisites

To follow this post, you should already have Amazon DataZone set up with the respective projects to publish and consume the assets. The publisher of the Retail project must have published a shipments data asset in Amazon DataZone. The domain owner or admin must have created a metadata form required for the subscription request.

This feature also supports metadata enforcement for subscription requests of a data product. For instructions on how to set this up, refer to Amazon DataZone data products.

Solution walkthrough: Enhance data governance with enforced metadata rules for Managed Assets

To perform the solution in this post, follow the steps in the next sections.

Metadata enforcement for subscription requests

To enforce metadata for subscription requests, use the following steps.

Step 1: Domain owner configures metadata requirements

Domain unit owners can configure metadata enforcement in Amazon DataZone as follows:

  1. On the Amazon DataZone console, choose Domain to open your domain or domain unit settings.
  2. Choose dataplatform, as shown in the following screenshot.
  3. To add metadata forms for subscription requests, on the RULES tab, choose ADD, as shown in the following screenshot.
  4. Provide a name for the metadata form rule.
  5. Choose ADD ANOTHER METADATA FORM.
  6. Choose from a list of available metadata forms within the domain or domain unit. Search options make navigation straightforward.

You can select multiple forms for enforcement on subscription requests.

  7. Choose Add, as shown in the following screenshot.

Create the metadata form rule as follows:

  8. On the next screen, you can specify additional settings. You can apply metadata forms across all asset types or limit them to specific asset types. Additionally, choose whether the rule applies to a specific project or all projects within the domain. After the scope is defined as shown in the screenshot, choose ADD RULE.

    Note: You can enable metadata enforcement across child domains, with optional permissions that allow child domains to override the parent domain’s enforced forms. This option is available while defining the scope if the domain owner chooses All projects, as shown in the following screenshot.

Step 2: Data consumer submits subscription request

After metadata enforcement is configured, data consumers follow these steps to request access:

  1. To find and select an asset in the Amazon DataZone catalog, sign in to the Amazon DataZone console as a data consumer and choose the MARKETING project. In the search bar, enter shipments to find the data asset, as shown in the following screenshot.
  2. Choose SUBSCRIBE to open the subscription request modal, as shown in the following screenshot.
  3. Choose a project and provide a Reason for request, as shown in the following screenshot.
  4. Fill in the required metadata fields as specified by the domain unit. If mandatory fields are incomplete, they will be highlighted, and the submission will be disabled until resolved. After all the mandatory fields are entered, choose APPLY, as shown in the following screenshot.
  5. Choose Request to submit the subscription request, as shown in the following screenshot.

After the request is submitted, an event is generated in Amazon EventBridge, which can be used in custom workflows outside of Amazon DataZone as needed.

Step 3: Data producer (owner) approves the subscription

After a data consumer submits a subscription request, the data producer receives it with all the metadata provided by the data consumer and reviews that metadata before making a decision.

  1. Sign in to the Amazon DataZone console as a data producer. Choose RETAIL as the project.
  2. In the navigation pane, choose Incoming requests and find the subscription request. Choose View request, as shown in the following screenshot.
  3. Data producers can review the metadata, including document links and account IDs, to determine if the request meets compliance and workflow requirements before granting access, as shown in the following screenshot.
  4. Under Approval access, choose Full access to provide full access to data. For fine-grained access control, choose Approve with row or column filters. For this post, we choose Full access.
  5. Provide the Decision comment.
  6. Choose APPROVE, as shown in the following screenshot.

Step 4: Data consumer consumes the data

Now, data consumers follow these steps:

  1. After the subscription grants are approved and fulfilled, sign in to the Amazon DataZone console as a data consumer from the MARKETING project to query the subscribed data.
  2. Choose MARKETING. On the Environments tab, choose Query data through Amazon Athena, as shown in the following screenshot.
  3. Query the subscribed data asset shipments in Amazon Athena using the following query, as shown in the following screenshot.
    SELECT * FROM "env_mkt_datalake_sub_db"."shipments" LIMIT 10;

Solution walkthrough: Enhance data governance with enforced metadata rules for Custom Assets

Customers can manage access grants for unmanaged assets using Amazon DataZone. When a subscription to an asset in the business data catalog is approved by the data owner, Amazon DataZone publishes an event in Amazon EventBridge in the account along with all the necessary information in the payload that you can use to create the access grants between the source and the target. Using metadata enforcement for unmanaged assets, customers can provide all the context in a single request.

Step 1: Create a custom asset type

To create a custom asset type Metrics with an attached metadata form to describe the metric asset type, follow these steps:

The following is an example of a custom asset type, “Metrics”, which has two fields: Dashboard Link and Calculation.

Step 2: Data producer creates a custom asset using the “Metrics” asset type

The data producer creates a Conversion Rate Metric with all metadata along with associated metadata forms by following these steps:

The following is the “Conversion Rate Metric” asset created in Amazon DataZone. The highlighted boxes show that it is an unmanaged asset of type “Metrics”, which was created in the previous step.

Step 3: Domain owner configures metadata requirements

Domain unit owners can configure metadata enforcement in Amazon DataZone as follows:

  1. On the Amazon DataZone console, choose Domain to open your domain or domain unit settings.
  2. To add metadata forms for subscription requests, on the RULES tab, choose ADD, as shown in the following screenshot.
  3. To select metadata forms, provide a Name for the metadata form rule.
  4. Choose ADD METADATA FORM, as shown in the following screenshot.
  5. The remaining fields can be left as default. For this post, set them as shown in the following screenshot.
  6. In the Add metadata form pop-up, enter MetricsRequestForm, as shown in the following screenshot.

  7. Choose ADD RULE, as shown in the preceding screenshot, to create the rule for all Metrics assets. The following screenshot shows the rule after it is created.

Step 4: Admin sets up an EventBridge rule

To set up an EventBridge rule, follow these steps:

  1. Create an EventBridge rule to capture all new subscription requests. For setup details, refer to Amazon DataZone events and notifications.
  2. Create an AWS Lambda function as a target to act on the event. To set up targets, refer to Event bus targets in Amazon EventBridge.

For this post, set the following event pattern, which triggers the Lambda function only for new subscription requests.

{
  "source": ["aws.datazone"],
  "detail-type": ["Subscription Request Created"]
}
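To see what this pattern lets through, the following minimal Python sketch approximates exact-value matching on top-level event fields. It is a local illustration only: the matches helper and the "Some Other Event" detail type are hypothetical, and real EventBridge patterns support nested keys and operators that this sketch does not handle.

```python
import json

# The event pattern from this post, expressed as a Python dict.
PATTERN = {
    "source": ["aws.datazone"],
    "detail-type": ["Subscription Request Created"],
}


def matches(pattern: dict, event: dict) -> bool:
    """Return True if, for every pattern key, the event's value is one of
    the values listed in the pattern.

    Simplified approximation: top-level fields and exact values only.
    """
    return all(event.get(key) in allowed for key, allowed in pattern.items())


new_request = {"source": "aws.datazone", "detail-type": "Subscription Request Created"}
other_event = {"source": "aws.datazone", "detail-type": "Some Other Event"}  # hypothetical

print(matches(PATTERN, new_request))
print(matches(PATTERN, other_event))

# The pattern serialized as a JSON string, for tools that expect text.
print(json.dumps(PATTERN))
```

Only events whose source and detail-type both appear in the pattern reach the Lambda target, so other Amazon DataZone events do not invoke the function.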

Step 5: Data consumer submits subscription request

After metadata enforcement is configured, data consumers follow these steps to request access:

  1. To locate the asset in the Amazon DataZone catalog, sign in to the Amazon DataZone console as a data consumer from the Marketing project. Use the search bar to find the Conversion Rate Metric asset. Choose SUBSCRIBE, as shown in the following screenshot.
  2. Provide details, including the Metrics Request Form associated with the Metrics asset type.
  3. Choose REQUEST, as shown in the following screenshot.

You will receive a notification confirming that your subscription request is submitted, as shown in the following screenshot.

For the request, EventBridge captures the following request event and sends it to the configured target:

{
    'version': '0',
    'id': '3fdf59a2-f95c-192f-0901-4025dc6e6a61',
    'detail-type': 'Subscription Request Created',
    'source': 'aws.datazone',
    'account': '1234567890', 
    'time': '2024-11-15T18:57:16Z', 
    'region': 'us-east-1', 
    'resources': [], 
    'detail': 
        {
            'version': '283',
            'internal': None,
            'metadata': 
                {
                    'id': 'cwaxxxlj',
                    'version': '1',
                    'typeName': 'SubscriptionRequestEntityType',
                    'domain': 'dzd_xxxxxxxxx1z',
                    'user': 'd1xxxxx-eexxx-xxxx-axxxx-0xxxxxxxx8ce',
                    'awsAccountId': '1234567890', 
                    'owningProjectId': '555xxxxxxrmv', 
                    'clientToken': '3bxxxxxxxxxxc91bb76d6'
                }, 
            'data': 
                {
                    'autoApproved': False, 
                    'requesterId': 'd1xxxxx848ce',
                    'reviewerId': '54uxxxxxxd3',
                    'status': 'PENDING',
                    'subscribedListings': [{'id': '6ixxgev', 'item': {'assetListing': {'entityId': 'xxxxxxxxx7', 'entityType': 'Metrics'}}, 'ownerProjectId': '5xxxxxx3', 'version': '2'}], 
                    'subscribedPrincipals': [{'id': '555xxxxxxrmv', 'type': 'PROJECT'}]
                }
            }
}
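A Lambda target could pull the fields it needs out of this event along the following lines. This is a minimal sketch, not the post's actual function: summarize_request is a hypothetical helper, and it assumes that detail.metadata.id carries the subscription request identifier, which the sample suggests but the post does not state.

```python
def summarize_request(event: dict) -> dict:
    """Collect the fields an approval workflow is likely to need."""
    detail = event["detail"]
    listings = detail["data"]["subscribedListings"]
    return {
        # Assumption: metadata.id is the subscription request identifier.
        "request_id": detail["metadata"]["id"],
        "domain_id": detail["metadata"]["domain"],
        "status": detail["data"]["status"],
        "asset_types": [item["item"]["assetListing"]["entityType"] for item in listings],
        "owner_project_ids": [item["ownerProjectId"] for item in listings],
    }


def lambda_handler(event, context):
    """Lambda entry point: ignore anything that is not a new subscription request."""
    if event.get("detail-type") != "Subscription Request Created":
        return None
    return summarize_request(event)


# Trimmed copy of the sample event shown above.
sample_event = {
    "detail-type": "Subscription Request Created",
    "source": "aws.datazone",
    "detail": {
        "metadata": {"id": "cwaxxxlj", "domain": "dzd_xxxxxxxxx1z"},
        "data": {
            "status": "PENDING",
            "subscribedListings": [
                {
                    "id": "6ixxgev",
                    "item": {"assetListing": {"entityId": "xxxxxxxxx7", "entityType": "Metrics"}},
                    "ownerProjectId": "5xxxxxx3",
                }
            ],
        },
    },
}

print(lambda_handler(sample_event, None))
```

The domain and request identifiers returned here are the two values needed to look up the full request.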

The data steward and asset owner can get details for the request with the GetSubscriptionRequestDetails API and view the asset details and form associated with the request:

{
    "id": "cwxxxlj",
    "createdBy": "d17xxxxxxx848ce",
    "domainId": "dzd_xxxxxxz",
    "status": "PENDING",
    "createdAt": "2024-11-15T20:26:01.014000+00:00",
    "updatedAt": "2024-11-15T20:26:01.014000+00:00",
    "requestReason": "Marketing Analytics use case",
    "subscribedPrincipals": [
        {
            "project": {
                "id": "bxxxxx23hj",
                "name": "Marketing"
            }
        }
    ],
    "subscribedListings": [
        {
            "id": "6xxxxxxx1ev",
            "revision": "2",
            "name": "Conversion Rate Metric",
            "description": "Conversion rate calculates the percentage of web visitors who complete a desired action, such as creating an account, placing an order or clicking a link",
            "item": {
                "assetListing": {
                    "entityId": "b8xxxxxd7",
                    "entityRevision": "7",
                    "entityType": "Metrics",
                    "forms": "{\n  \"DZ_Internal_Basic_Form\" : {\n    \"name\" : \"Conversion Rate Metric\",\n    \"description\" : \"Conversion rate calculates the percentage of web visitors who complete a desired action, such as creating an account, placing an order or clicking a link\"\n  },\n  \"amazonstatus\" : {\n    \"publishingPrecedence\" : \"PUBLISHED_INDIVIDUALLY\",\n    \"status\" : \"ACTIVE\"\n  },\n  \"AssetCommonDetailsForm\" : {\n    \"readMe\" : \"Conversion Rate is a key performance metric used in marketing, e-commerce, and digital analytics. It measures the percentage of users or visitors who take a desired action out of the total number of users or visitors. This desired action, known as a \\\"conversion,\\\" can vary depending on the specific goals of a business or campaign.\\n\\n\\nApplications:\\n\\n- E-commerce: Percentage of website visitors who make a purchase\\n- Marketing: Percentage of leads who become customers\\n- Digital Advertising: Percentage of ad viewers who click on an ad or complete a form\\n- Email Marketing: Percentage of email recipients who click a link or perform a desired action\\n\\n\\nImportance:\\n\\n- Measures effectiveness of marketing efforts and user experience\\n- Helps in understanding customer behavior and preferences\\n- Guides optimization efforts for websites, ads, and marketing campaigns\\n- Often used as a key metric for ROI (Return on Investment) calculations\"\n  },\n  \"MarketingMetrics\" : {\n    \"DashboardLink\" : \"www.anycompany.com/marketing/conversion_rate\",\n    \"Calculation\" : \"Conversion rate = Conversions / Total visitors x 100\"\n  },\n  \"amazonmetadata\" : {\n    \"entityVersion\" : \"7\",\n    \"createdAt\" : \"2024-11-15T16:43:15.325935428Z\",\n    \"typeNamespace\" : \"dzd_6xxxxxx1z\",\n    \"sourceCategory\" : \"asset\",\n    \"typeName\" : \"Metrics\",\n    \"entityId\" : \"byxxxxxdolk7\",\n    \"sourceEntityFormDetails\" : [ {\n      \"typeNamespace\" : \"dzd_xxxxx1z\",\n      
\"typeVersion\" : \"15\",\n      \"formName\" : \"MarketingMetrics\",\n      \"typeName\" : \"MarketingMetrics\"\n    }, {\n      \"typeNamespace\" : \"amazon.datazone\",\n      \"typeVersion\" : \"10\",\n      \"formName\" : \"DZ_Internal_Basic_Form\",\n      \"typeName\" : \"NamedDataZoneBasicFormType\"\n    }, {\n      \"typeNamespace\" : \"amazon.datazone\",\n      \"typeVersion\" : \"6\",\n      \"formName\" : \"AssetCommonDetailsForm\",\n      \"typeName\" : \"AssetCommonDetailsFormType\"\n    }, {\n      \"typeNamespace\" : \"amazon.datazone.internal\",\n      \"typeVersion\" : \"1\",\n      \"formName\" : \"DZ_Internal_Rendering_Config_Form\",\n      \"typeName\" : \"RenderingConfigFormType\"\n    } ]\n  },\n  \"DZ_Internal_Rendering_Config_Form\" : {\n    \"metadataFormItems\" : [ {\n      \"formName\" : \"MarketingMetrics\",\n      \"collapse\" : false\n    }, {\n      \"formName\" : \"AssetCommonDetailsForm\",\n      \"collapse\" : false\n    } ]\n  }\n}",
                    "glossaryTerms": []
                }
            },
            "ownerProjectId": "54xxxxxd3",
            "ownerProjectName": "Custom-Metrics-Assets"
        }
    ],
    "metadataForms": [
        {
            "formName": "MetricsRequestForm",
            "typeName": "MetricsRequestForm",
            "typeRevision": "5",
            "content": "{\"BusinessUnit\": \"AWS\",\"ContactEmail\": \"[email protected]\",\"Team\": \"DataZone\"}"
        }
    ]
}
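In this response, the asset's forms value is a JSON document serialized into a string, so it has to be decoded a second time before individual fields can be read. The following sketch shows one way to do that in Python. The asset_form_field helper is hypothetical, and the sample is a trimmed stand-in for the response above.

```python
import json
from typing import Optional


def asset_form_field(details: dict, form_name: str, field: str) -> Optional[str]:
    """Return a field from the first subscribed listing that carries the form."""
    for listing in details.get("subscribedListings", []):
        # "forms" is a JSON string nested inside the response, so decode it first.
        forms = json.loads(listing["item"]["assetListing"]["forms"])
        if form_name in forms:
            return forms[form_name].get(field)
    return None


# Trimmed stand-in for the response above; json.dumps rebuilds the nested string.
sample_details = {
    "subscribedListings": [
        {
            "name": "Conversion Rate Metric",
            "item": {
                "assetListing": {
                    "entityType": "Metrics",
                    "forms": json.dumps(
                        {
                            "MarketingMetrics": {
                                "DashboardLink": "www.anycompany.com/marketing/conversion_rate",
                                "Calculation": "Conversion rate = Conversions / Total visitors x 100",
                            }
                        }
                    ),
                }
            },
        }
    ]
}

print(asset_form_field(sample_details, "MarketingMetrics", "DashboardLink"))
```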

The data steward and asset owner can use these details to orchestrate an approval workflow using the Lambda function. After the request has been validated, the asset owner or steward can call the AcceptSubscriptionRequest API to grant access. The data consumer is notified after access is approved. The following screenshot shows the notification that the subscription was approved.
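One possible shape for such a workflow is sketched below in Python. It assumes a client object that exposes get_subscription_request_details and accept_subscription_request with the keyword arguments shown, as the boto3 DataZone client does. The validation rule (all three sample form fields must be non-empty), the helper names, and the stub client are illustrative assumptions rather than part of the post.

```python
import json

# Field names taken from the sample MetricsRequestForm; the rule itself is an assumption.
REQUIRED_FIELDS = ("BusinessUnit", "ContactEmail", "Team")


def missing_fields(details, form_name="MetricsRequestForm", required=REQUIRED_FIELDS):
    """List required fields that are absent or empty in the consumer's form."""
    for form in details.get("metadataForms", []):
        if form["formName"] == form_name:
            content = json.loads(form["content"])  # content is a JSON string
            return [name for name in required if not content.get(name)]
    return list(required)  # the form was not supplied at all


def review_request(client, domain_id, request_id):
    """Fetch the request, validate its metadata, and accept it if complete."""
    details = client.get_subscription_request_details(
        domainIdentifier=domain_id, identifier=request_id
    )
    missing = missing_fields(details)
    if missing:
        return {"approved": False, "missing": missing}
    client.accept_subscription_request(
        domainIdentifier=domain_id,
        identifier=request_id,
        decisionComment="Required metadata provided",
    )
    return {"approved": True, "missing": []}


class StubDataZoneClient:
    """Stand-in client so the sketch runs without AWS access."""

    def __init__(self, details):
        self.details = details
        self.accepted = []

    def get_subscription_request_details(self, domainIdentifier, identifier):
        return self.details

    def accept_subscription_request(self, domainIdentifier, identifier, decisionComment):
        self.accepted.append(identifier)
        return {"status": "ACCEPTED"}


complete = {
    "metadataForms": [
        {
            "formName": "MetricsRequestForm",
            "content": json.dumps(
                {"BusinessUnit": "AWS", "ContactEmail": "user@example.com", "Team": "DataZone"}
            ),
        }
    ]
}
stub = StubDataZoneClient(complete)
print(review_request(stub, "dzd_example", "request-1"))
```

Passing the client in as a parameter keeps the validation logic testable locally. In a Lambda function, the stub would be replaced by a real client.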

Now that the subscription is approved, users can use the dashboard URL to access the metric.

Cleanup

To make sure no additional charges are incurred after testing, delete the Amazon DataZone domain. Refer to Delete Amazon DataZone domains for the process.

Conclusion

The new metadata enforcement rule for subscription requests in Amazon DataZone strengthens data governance by empowering domain unit owners to establish clear metadata requirements for data consumers and by streamlining access requests. This feature enables organizations to align with their metadata standards, implement custom workflows, and provide a consistent, governed data access experience.

The feature is supported in all AWS Regions where Amazon DataZone is available at the time of this writing. To check which Regions are available, refer to AWS Services by Region. Check out the video below to learn more about how to set up metadata rules for subscription workflows. Get started with the technical documentation.


About the Authors

Ramesh H Singh is a Senior Product Manager Technical (External Services) at AWS in Seattle, Washington, currently with the Amazon DataZone team. He is passionate about building high-performance ML/AI and analytics products that enable enterprise customers to achieve their critical goals using cutting-edge technology. Connect with him on LinkedIn.

Pradeep Misra is a Principal Analytics Solutions Architect at AWS. He works across Amazon to architect and design modern distributed analytics and AI/ML platform solutions. He is passionate about solving customer challenges using data, analytics, and AI/ML. Outside of work, Pradeep likes exploring new places, trying new cuisines, and playing board games with his family. He also likes doing science experiments, building LEGOs and watching anime with his daughters.

Lakshmi Nair is a Senior Analytics Specialist Solutions Architect at AWS. She specializes in designing advanced analytics systems across industries. She focuses on crafting cloud-based data platforms, enabling real-time streaming, big data processing, and robust data governance.

Santhosh Padmanabhan is a Software Development Manager at AWS, leading the Amazon DataZone engineering team. His team designs, builds, and operates services specializing in data, machine learning, and AI governance. With deep expertise in building distributed data systems at scale, Santhosh plays a key role in advancing AWS’s data governance capabilities.

Expanded resource awareness in Amazon Q Developer

Post Syndicated from Brendan Jenkins original https://aws.amazon.com/blogs/devops/expanded-resource-awareness-in-amazon-q-developer/

Recently, Amazon Q Developer announced expanded support for account resource awareness with Amazon Q in the AWS Management Console, along with the general availability of Amazon Q Developer in AWS Chatbot, which lets you ask questions from Microsoft Teams or Slack. Additionally, Amazon Q now provides context-aware assistance for your questions about resources in your account, depending on where you are in the console. Amazon Q in the console lets you use natural language with the Amazon Q Developer chat capability to list resources in your AWS account, get specific resource details, and ask about related resources. This capability launched in preview on April 30, 2024.

In this blog, I will highlight the new expanded functionality of this feature in Amazon Q Developer including understanding relationships between account resources, context-awareness, and the general availability of the AWS Chatbot integration with Microsoft Teams and Slack.

Expanded account resource awareness with Amazon Q Developer

Prior to the launch of the expanded support, you could ask Amazon Q Developer to list resources in your AWS Account with prompts such as “List all my EC2 instances in us-east-1” and the service would list all your Amazon Elastic Compute Cloud (Amazon EC2) instances. Now, with the expanded support, you can ask more complex questions about your AWS account resources. I will show a few examples in this section of this post.

For our first example, imagine that you’re a developer responsible for maintaining code as part of the software development lifecycle (SDLC), and you frequently use AWS Lambda for development and Amazon Relational Database Service (Amazon RDS) in the backend. With this update, you could open a new Amazon Q chat in the AWS Management Console and enter a prompt such as: “Which RDS clusters are due for an update?”

Figure 1: Amazon Q Developer listing RDS clusters needing an update

As a result, the Amazon Q Developer console chat will return a list of all your Amazon RDS clusters that have available updates as shown in Figure 1 above.

For another example, suppose you want to update any Lambda functions in your AWS account that have an Amazon Simple Notification Service (Amazon SNS) topic as a trigger, because you are moving to a new SNS topic you recently created. To identify which SNS topics are still being used, you could enter a prompt such as “List all the SNS topics that trigger a lambda function.”

Figure 2: Amazon Q listing SNS topics that are Lambda triggers

As shown in Figure 2, Amazon Q Developer identified, by Amazon Resource Name (ARN), the SNS topics that are set to trigger a Lambda function in the AWS account.

Additionally, you can ask a follow-up question in the same chat to investigate further. You can send a prompt such as “Which lambda function uses the arn:aws:sns:us-east-1:76859XXXX:FailoverHealthcheck SNS topic?”

Figure 3: Asking Q Developer a follow-up question about a resource

From Figure 3 above, you can see that there is a Lambda function/endpoint associated with the SNS topic resource that Amazon Q Developer was able to identify.

Outside of the examples above, here are some other prompts/examples that can be explored for the expanded support:

– “Do I have any ECS clusters with pending tasks?”

– “Are there any ECS clusters in my account with services in DRAINING status?”

Amazon Q Developer understands where you are in the console

Amazon Q Developer in the AWS Management Console now provides context-aware assistance for your questions about resources in your account. This feature allows you to ask questions directly related to the console page you’re viewing, eliminating the need to specify the service or resource in your query. Q Developer uses the current page as additional context to provide more accurate and relevant responses, streamlining your interaction with AWS services and resources.

Prior to the update, you would have to prompt, “What is the public IPv4 address of my instance i-08ccXXXXXX?” Now, if you are viewing an EC2 instance in the console, you can prompt Amazon Q with “What is the public IPv4 address of my instance?” without specifying which instance you are referring to.

Figure 4: Asking Amazon Q about an EC2 instance being viewed

In Figure 4, Amazon Q’s console chat used its context awareness to pick up the IPv4 address from the console page where I was working, even though I didn’t specify which instance I was referring to.

AWS Chatbot can now answer questions about AWS resources in Microsoft Teams and Slack

Recently, we announced the general availability of Amazon Q Developer in AWS Chatbot, which provides answers to customers’ AWS resource related queries in Microsoft Teams and Slack. This gives teams the ability to quickly find relevant resources to troubleshoot issues using natural language queries in the chat channels of Microsoft Teams or Slack.

For example, you could integrate AWS Chatbot with Amazon Q Developer so that you can enter a prompt in Slack such as “@aws show EC2 instances in running state in us-east-1”.

Figure 5: Amazon Q listing all EC2 resources in Slack

As shown in Figure 5, Amazon Q listed the EC2 resources in a Slack channel, showing the functionality in action.

Conclusion

Amazon Q Developer has enhanced its cloud resource management capabilities, enabling more intuitive and intelligent interactions with AWS resources. The new features allow developers to ask complex, context-aware questions about their cloud infrastructure directly through the AWS Management Console, Microsoft Teams, and Slack. Users can now easily discover new details about specific resources with natural language queries that provide precise, contextual information. These improvements represent a significant step forward in simplifying cloud resource management, making it faster and more user-friendly for development teams to understand, track, and maintain their AWS environments. To learn more about chatting with your AWS resources, check out Console documentation and AWS Chatbot documentation.

About the authors

Brendan Jenkins

Brendan Jenkins is a Tech Lead Solutions Architect at Amazon Web Services (AWS) working with Enterprise AWS customers providing them with technical guidance and helping achieve their business goals. He has an area of specialization in DevOps and Machine Learning technology.

Introducing Point in Time queries and SQL/PPL support in Amazon OpenSearch Serverless

Post Syndicated from Jagadish Kumar original https://aws.amazon.com/blogs/big-data/introducing-point-in-time-queries-and-sql-ppl-support-in-amazon-opensearch-serverless/

Today we announced support for three new features for Amazon OpenSearch Serverless: Point in Time (PIT) search, which enables you to maintain stable sorting for deep pagination in the presence of updates, and Piped Processing Language (PPL) and Structured Query Language (SQL), which give you new ways to query your data. Querying with SQL or PPL is useful if you’re already familiar with the language or want to integrate your domain with an application that uses them.

OpenSearch Serverless is a powerful and scalable search and analytics engine that enables you to store, search, and analyze large volumes of data. It reduces the burden of manual infrastructure provisioning and scaling as you ingest, analyze, and visualize your time series and search data, which simplifies data management and helps you derive actionable insights from data. The vector engine for OpenSearch Serverless also makes it easy for you to build modern machine learning (ML) augmented search experiences and generative artificial intelligence (generative AI) applications without needing to manage the underlying vector database infrastructure.

PIT search

Point in Time (PIT) search lets you run different queries against a dataset that’s fixed in time. Typically, when you run the same query on the same index at different points in time, you receive different results because documents are constantly indexed, updated, and deleted. With PIT, you can query against a state of your dataset for a point in time. Although OpenSearch still supports other ways of paginating results, PIT search provides superior capabilities and performance because it isn’t bound to a query and supports consistent pagination. When you create a PIT for a set of indexes, OpenSearch creates contexts to access data at that point in time and when you use a query with a PIT ID, it searches the contexts that are frozen in time to provide consistent results.

Using PIT involves the following high-level steps:

  1. Create a PIT.
  2. Run search queries with a PIT ID and use the search_after parameter for the next page of results.
  3. Close the PIT.
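The three steps above can be sketched as plain HTTP request descriptions. This Python sketch assumes the paths and body fields of the open source OpenSearch Point in Time API and that OpenSearch Serverless accepts the same shapes. The index name, PIT ID, and helper names are hypothetical, and request signing and sending are left out.

```python
from typing import Optional


def create_pit_request(index: str, keep_alive: str = "1h"):
    """Step 1: create a PIT for an index; the response carries the PIT ID."""
    return ("POST", f"/{index}/_search/point_in_time?keep_alive={keep_alive}", None)


def search_with_pit_request(pit_id: str, query: dict, sort: list,
                            size: int = 100, search_after: Optional[list] = None):
    """Step 2: search against the PIT. No index appears in the path,
    because the PIT already identifies the data being searched."""
    body = {"size": size, "query": query, "pit": {"id": pit_id}, "sort": sort}
    if search_after is not None:
        body["search_after"] = search_after  # sort values of the last hit seen
    return ("GET", "/_search", body)


def delete_pit_request(pit_id: str):
    """Step 3: close the PIT when the queries are complete."""
    return ("DELETE", "/_search/point_in_time", {"pit_id": [pit_id]})


print(create_pit_request("my_index", keep_alive="10m"))
print(search_with_pit_request("example-pit-id", {"match_all": {}}, [{"timestamp": "asc"}]))
print(delete_pit_request("example-pit-id"))
```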

Create a PIT

When you create a PIT, OpenSearch Serverless provides a PIT ID, which you can use to run multiple queries on the frozen dataset. Even though the indexes continue to ingest data and modify or delete documents, the PIT references the data that hasn’t changed since the PIT creation.

Run a search query with the PIT ID

PIT search isn’t bound to a query, so you can run different queries on the same dataset, which is frozen in time.

When you run a query with a PIT ID, you can use the search_after parameter to retrieve the next page of results. This gives you control over the order of documents in the pages of results.

The following response contains the first 100 documents that match the query. To get the next set of documents, you can run the same query with the last document’s sort values as the search_after parameter, keeping the same sort and pit.id. You can use the optional keep_alive parameter to extend the PIT time.
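The paging logic described here can be sketched as a small loop. The example below is illustrative only: paginate takes any callable that sends a search body and returns the parsed response, so a fake in-memory search stands in for the real HTTP call, and the PIT ID and sort field are hypothetical. It relies on each hit carrying a sort array, which OpenSearch returns when the query specifies a sort.

```python
from typing import Callable, Iterator


def paginate(search: Callable[[dict], dict], pit_id: str, query: dict,
             sort: list, size: int = 100) -> Iterator[dict]:
    """Yield every hit, page by page, keeping the same sort and PIT ID."""
    search_after = None
    while True:
        body = {"size": size, "query": query, "pit": {"id": pit_id}, "sort": sort}
        if search_after is not None:
            body["search_after"] = search_after
        hits = search(body)["hits"]["hits"]
        if not hits:
            return  # an empty page means there are no more results
        yield from hits
        search_after = hits[-1]["sort"]  # sort values of the last document


# Fake backend: five documents already sorted by a numeric field.
DOCS = [{"_id": str(i), "sort": [i]} for i in range(5)]


def fake_search(body: dict) -> dict:
    after = body.get("search_after")
    start = 0 if after is None else after[0] + 1
    return {"hits": {"hits": DOCS[start:start + body["size"]]}}


ids = [hit["_id"] for hit in paginate(fake_search, "example-pit-id",
                                      {"match_all": {}}, [{"ts": "asc"}], size=2)]
print(ids)  # ['0', '1', '2', '3', '4']
```

Because every page is read from the same frozen view, documents indexed or deleted while the loop runs do not shift the page boundaries.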

Close the PIT

When your queries on the dataset are complete, you can delete the PIT using the DELETE operation. PITs automatically expire after the keep_alive duration.

Considerations and limitations

Keep in mind the following limitations when using this feature:

SQL and PPL support

OpenSearch Serverless provides a primary query interface called query DSL that you can use to search your data. Query DSL is a flexible language with a JSON interface. In addition to DSL, you can now extract insights out of OpenSearch Serverless using the familiar SQL query syntax.

You can use the SQL and PPL API, through the /_plugins/_sql and /_plugins/_ppl endpoints respectively, to search the data. You can use aggregations, group by, and where clauses to investigate your data and read your data as JSON documents or CSV tables, so you have the flexibility to use the format that works best for you. By default, queries return data in JDBC format. You can specify the response format as JDBC, standard OpenSearch JSON, CSV, or raw.

Use the /_plugins/_sql endpoint to send SQL queries to the SQL plugin, as shown in the following example.
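As a rough illustration, the request for a SQL or PPL query can be assembled as follows. This Python sketch assumes the endpoint paths of the open source OpenSearch SQL plugin (/_plugins/_sql and /_plugins/_ppl) and its format query parameter. The index name my_index and the build_query_request helper are hypothetical, and request signing and sending are omitted.

```python
import json
from typing import Optional, Tuple


def build_query_request(query: str, language: str = "sql",
                        response_format: Optional[str] = None) -> Tuple[str, str, str]:
    """Return (method, path, JSON body) for a SQL or PPL query."""
    if language not in ("sql", "ppl"):
        raise ValueError("language must be 'sql' or 'ppl'")
    path = f"/_plugins/_{language}"
    if response_format is not None:
        path += f"?format={response_format}"  # for example jdbc, json, csv, or raw
    return "POST", path, json.dumps({"query": query})


# SQL query returned as CSV.
print(build_query_request("SELECT * FROM my_index LIMIT 10", response_format="csv"))

# The same helper pointed at the PPL endpoint.
print(build_query_request("source=my_index | head 10", language="ppl"))
```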

Besides basic filtering and aggregation, OpenSearch SQL also supports complex queries, such as querying semi-structured data, set operations, sub-queries and limited JOINs. Beyond the standard functions, OpenSearch functions are provided for better analytics and visualization.

For PPL queries, use the /_plugins/_ppl endpoint to send queries to the SQL plugin.

Considerations and limitations

Keep in mind the following:

  • Query Workbench is not supported for SQL and PPL queries
  • The SQL and PPL CLI is supported and can be used to issue SQL and PPL queries
  • DELETE statements are not supported
  • SQL plugin data sources are not supported
  • The SQL query stats API is not supported

Summary

In this post, we discussed new features in OpenSearch Serverless. PIT is a useful feature when you need to maintain a consistent view of your data for pagination during search operations. SQL in OpenSearch Service bridges the gap between traditional relational database concepts and the flexibility of OpenSearch’s document-oriented data storage. You can send SQL and PPL queries to the _sql and _ppl endpoints, respectively, and use aggregations, group by, and where clauses to analyze your data.


About the Authors

Jagadish Kumar (Jag) is a Senior Specialist Solutions Architect at AWS focused on Amazon OpenSearch Service. He is deeply passionate about Data Architecture and helps customers build analytics solutions at scale on AWS.

Frank Dattalo is a Software Engineer with Amazon OpenSearch Service. He focuses on the search and plugin experience in Amazon OpenSearch Serverless. He has an extensive background in search, data ingestion, and AI/ML. In his free time, he likes to explore Seattle’s coffee landscape.

Milav Shah is an Engineering Leader with Amazon OpenSearch Service. He focuses on the search experience for OpenSearch customers. He has extensive experience building highly scalable solutions in databases, real-time streaming, and distributed computing. He also possesses functional domain expertise in verticals like Internet of Things, fraud protection, gaming, and ML/AI. In his free time, he likes to ride his bicycle, hike, and play chess.

Introducing Amazon MWAA micro environments for Apache Airflow

Post Syndicated from Hernan Garcia original https://aws.amazon.com/blogs/big-data/introducing-amazon-mwaa-micro-environments-for-apache-airflow/

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed Apache Airflow service used to extract business insights across an organization by combining, enriching, and transforming data through a series of tasks called a workflow. It enhances infrastructure security and availability while reducing operational overhead.

Today, we’re excited to announce mw1.micro, the latest addition to Amazon MWAA environment classes. This offering is designed to provide an even more cost-effective solution for running Airflow environments in the cloud. With mw1.micro, we’re bringing the power of Amazon MWAA to teams who require a lightweight environment without compromising on essential features. In this post, we’ll explore mw1.micro characteristics, key benefits, ideal use cases, and how you can set up an Amazon MWAA environment based on this new environment class.

Customers maintain multiple Amazon MWAA environments to separate development stages, optimize resources, manage versions, enhance security, ensure redundancy, customize settings, improve scalability, and facilitate experimentation. This approach offers greater flexibility and control over workflow management. These organizations often maintain multiple AWS accounts for development, testing, and production stages, leading to increased complexity and cost. The traditional approach of using full-sized Amazon MWAA environments for development and testing can also be expensive, especially for teams working on smaller projects or proof-of-concept initiatives. Additionally, customers adopting a federated deployment model find it challenging to provide isolated environments for different teams or departments while also optimizing cost. The introduction of mw1.micro addresses these pain points by offering an option that enables more efficient resource utilization and significant cost savings.

The micro environment class

The mw1.micro configuration provides a balanced set of resources suitable for small-scale data processing and orchestration tasks. The class allocates 1 vCPU and 3 GB of RAM for a scheduler/worker hybrid container. Similarly, the web server is equipped with 1 vCPU and 3 GB of RAM. The Amazon Elastic Container Service (Amazon ECS) tasks launched in the environment use AWS Fargate platform version 1.4.0, increasing ephemeral task storage to 20 GB.

mw1.micro environments support up to three concurrent tasks, making them ideal for sequential or lightly parallelized workflows. Additionally, they can accommodate up to 25 DAGs, providing ample capacity for organizing and managing various data pipelines and processes. This micro environment class is particularly well-suited for development, testing, or small production workloads where resource optimization and cost-efficiency are primary concerns.

The following table summarizes the environment capabilities of mw1.micro.

| Class/Resources | Scheduler and Worker vCPU/RAM | Web Server vCPU/RAM | Concurrent Tasks | DAG Capacity |
|---|---|---|---|---|
| mw1.micro | 1 vCPU / 3 GB | 1 vCPU / 3 GB | 3 | Up to 25 |

For mw1.micro, we maintain the general architecture of Amazon MWAA, and combine the Airflow scheduler and worker into a single container. For this reason, mw1.micro uses only two AWS Fargate tasks, one scheduler/worker hybrid, and one web server. The following diagram illustrates the environment architecture.

Another important change is that the meta database will now use a t4g.medium Amazon Aurora PostgreSQL-Compatible Edition instance powered by AWS Graviton2. With the Graviton2 family of processors, you get compute, storage, and networking improvements, and the reduction of your carbon footprint offered by the AWS family of processors.

Supported features

mw1.micro maintains Amazon MWAA and Airflow key functionalities that developers currently rely on:

  • You can set up a public or private web server, allowing you to control access to your Airflow UI as needed
  • You can add custom plugins and requirements, enabling you to extend Airflow’s capabilities and manage dependencies effortlessly
  • Startup scripts can be used to perform initialization tasks, making sure your environment is configured precisely to your specifications
  • The Airflow UI is fully functional, providing the same intuitive interface for managing and monitoring your workflows
  • It has the same networking features as other Amazon MWAA environment classes, such as custom URLs and shared virtual private cloud (VPC) support
  • Scheduler and worker logs remain separate in their respective Amazon CloudWatch log groups, providing ease of monitoring and troubleshooting

Considerations

The architectural decisions behind mw1.micro reflect a balance between functionality and cost-effectiveness. Here are the constraints that the limited resources in mw1.micro bring:

  • The scheduler and worker are combined into a single Fargate task. Only a single scheduler/worker container is supported.
  • mw1.micro uses a single Fargate task for the web server. The maximum number of web servers is 1.
  • The number of concurrent Airflow tasks in the worker (worker_autoscale) can be set to a maximum value of 3.

Pricing and availability

Amazon MWAA pricing dimensions remain unchanged, and you only pay for what you use:

  • The environment class
  • Metadata database storage consumed

Metadata database storage pricing remains the same. Refer to Amazon Managed Workflows for Apache Airflow Pricing for rates and more details.

Observe Amazon MWAA performance

When you start using the new environment class, it’s important to understand its behavior for maintaining optimal operation and identifying potential capacity issues. It’s essential to monitor key metrics such as metadata database memory usage, and CPU utilization of the worker/scheduler hybrid container. We recommend following the guidance described in Introducing container, database, and queue utilization metrics for Amazon MWAA to better understand the state of your environments, and get insights to right-size your resources.

Set up a new micro environment in Amazon MWAA

You can set up an Amazon MWAA micro environment in your account and preferred AWS Region using the AWS Management Console, API, or AWS Command Line Interface (AWS CLI). If you’re adopting infrastructure as code (IaC), you can automate the setup using AWS CloudFormation, the AWS Cloud Development Kit (AWS CDK), or Terraform scripts.
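
For the API route, a minimal sketch using the AWS SDK for Python (Boto3) might look like the following. The environment name, ARNs, subnet IDs, security group ID, and the dags folder are placeholders, and the helper names are hypothetical. The worker and web server counts are set to 1 to reflect the limits described earlier; treat them as assumptions and confirm the accepted values in the Amazon MWAA API reference.

```python
def micro_environment_params(name, source_bucket_arn, execution_role_arn,
                             subnet_ids, security_group_ids):
    """Build keyword arguments for creating an mw1.micro environment."""
    return {
        "Name": name,
        "EnvironmentClass": "mw1.micro",
        "SourceBucketArn": source_bucket_arn,
        "DagS3Path": "dags",  # assumed folder name inside the source bucket
        "ExecutionRoleArn": execution_role_arn,
        "NetworkConfiguration": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
        "WebserverAccessMode": "PRIVATE_ONLY",
        # Assumption: one hybrid scheduler/worker and one web server,
        # matching the limits listed in the Considerations section.
        "MinWorkers": 1,
        "MaxWorkers": 1,
        "MinWebservers": 1,
        "MaxWebservers": 1,
    }


def create_micro_environment(params):
    """Send the request. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above has no dependencies

    return boto3.client("mwaa").create_environment(**params)


# Placeholder values, for illustration only.
params = micro_environment_params(
    name="dev-micro",
    source_bucket_arn="arn:aws:s3:::example-mwaa-bucket",
    execution_role_arn="arn:aws:iam::123456789012:role/example-mwaa-role",
    subnet_ids=["subnet-aaaa1111", "subnet-bbbb2222"],
    security_group_ids=["sg-cccc3333"],
)
print(params["EnvironmentClass"])
```

Calling create_micro_environment(params) would submit the request; the builder is kept separate so the parameters can be reviewed or reused in IaC tooling first.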

The Amazon MWAA micro environment class is available today in all Regions where Amazon MWAA is currently available.

Conclusion

In this post, we announced the availability of the new micro environment class in Amazon MWAA. This offering addresses the needs of teams working on smaller projects, proof-of-concept initiatives, or those requiring isolated environments for different departments. By providing a lightweight yet feature-rich solution, mw1.micro enables organizations to achieve substantial cost savings without compromising on essential functionalities.

As you explore the possibilities of mw1.micro, remember to monitor its performance using the recommended metrics to maintain optimal operation. With its availability across all Regions where Amazon MWAA is offered, your teams can now use the power of Airflow in a more streamlined and economical manner, opening up new opportunities for efficient data pipeline management and orchestration in the cloud.

For additional details and code examples on Amazon MWAA, visit the Amazon MWAA User Guide and the Amazon MWAA examples GitHub repo.

Apache, Apache Airflow, and Airflow are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.


About the Authors

Hernan Garcia is a Senior Solutions Architect at AWS based in the Netherlands. He works in the financial services industry, supporting enterprises in their cloud adoption. He is passionate about serverless technologies, security, and compliance. He enjoys spending time with family and friends, and trying out new dishes from different cuisines.

Sriharsh Adari is a Senior Solutions Architect at AWS, where he helps customers work backward from business outcomes to develop innovative solutions on AWS. Over the years, he has helped multiple customers on data platform transformations across industry verticals. His core area of expertise includes technology strategy, data analytics, and data science. In his spare time, he enjoys playing sports, watching TV shows, and playing Tabla.

Important changes to CloudTrail events for AWS IAM Identity Center

Post Syndicated from Arthur Mnev original https://aws.amazon.com/blogs/security/modifications-to-aws-cloudtrail-event-data-of-iam-identity-center/

AWS IAM Identity Center is streamlining its AWS CloudTrail events by including only essential fields that are necessary for workflows like audit and incident response. This change simplifies user identification in CloudTrail, addressing customer feedback. It also enhances correlation between IAM Identity Center users and external directory services, such as Okta Universal Directory or Microsoft Active Directory.

Effective January 13, 2025, IAM Identity Center will stop emitting userName and principalId fields under the user identity element in CloudTrail events. These fields will be excluded from the CloudTrail events that are initiated when users sign in to IAM Identity Center, use the AWS access portal, and access AWS accounts through the AWS CLI. Instead, IAM Identity Center now emits user ID and Identity Store Amazon Resource Name (ARN) fields to replace the userName and principalId fields, simplifying user identification. IAM Identity Center CloudTrail events will also specify IdentityCenterUser as the identity type instead of Unknown, providing a clear identifier for users. Additionally, IAM Identity Center will omit the value of a group’s displayName in CloudTrail events when you create or update a group. You can access group attributes, such as displayName, by using the Identity Store DescribeGroup API operation for authorized workflows.

We recommend that you update your workflows that process the userName, principalId, userIdentity type, or group displayName fields in CloudTrail events for IAM Identity Center before these changes take effect on January 13, 2025. This blog post provides guidance for these updates.

How to prepare your workflows for the upcoming changes to IAM Identity Center user identification in CloudTrail

To simplify user identification, IAM Identity Center is making changes to the user identity element for its CloudTrail events. Based on these changes, you can update your workflows to link CloudTrail events to a specific user, associate users with their external directories, and track user activity within the same session. The updated user identity element for a sample CloudTrail event is shared at the end of this section.

IAM Identity Center will update the userIdentity type for CloudTrail events that are emitted when users sign in, use the AWS access portal, and access AWS accounts through the AWS CLI. For authenticated users, the userIdentity type will change from Unknown to IdentityCenterUser. For unauthenticated users, the userIdentity type will remain Unknown. We recommend that you update your workflows to accept both values.

To identify the user linked to a CloudTrail event, IAM Identity Center now emits userId and identityStoreArn fields to replace the userName and principalId fields. The userId is a unique and immutable user identifier that IAM Identity Center assigns to every user in the Identity Store, its native directory referenced by the identityStoreArn. These new fields enhance user identification and action tracking in CloudTrail and are present in the CloudTrail entries where the userIdentity type is IdentityCenterUser. For an example of the user identity element with the new fields and the describe-user CLI command to retrieve user attributes using the user ID and Identity Store ARN, see the Identifying the user and session in IAM Identity Center user-initiated CloudTrail events section of the IAM Identity Center User Guide.

Among other user attributes, you can use the describe-user CLI command to retrieve the external ID associated with a user in the Identity Store. You can use the external ID to associate Identity Store users with their external directories. The external ID maps the user to an immutable user identifier in their external directory, such as Microsoft Active Directory or Okta Universal Directory.

Note: IAM Identity Center doesn’t emit an external ID in CloudTrail. You need access to the Identity Store to retrieve an external ID based on the userId and identityStoreArn fields in CloudTrail.

If you have access to the CloudTrail events but not the Identity Store, you can use the UserName field emitted under the additionalEventData element to correlate your users with their external directories. This field represents the username that the user authenticates or federates with when signing in to IAM Identity Center. For more details, see the Correlating users between IAM Identity Center and external directories section of the IAM Identity Center User Guide.

Notes:

  • When the identity source is the AWS Directory Service, the UserName value logged in the additionalEventData element in CloudTrail is equal to the username that the user enters during authentication. For example, a user who has the username anyuser@company.com can authenticate with anyuser, anyuser@company.com, or company.com\anyuser, and in each case CloudTrail records the value as entered.
  • For a sign-in failure caused by incorrect username input, IAM Identity Center emits the UserName field in its CloudTrail event as a fixed-text value of HIDDEN_DUE_TO_SECURITY_REASONS. This is because the username value input by the user in such a scenario could contain sensitive information, such as a user’s password.

To track user activity within the same session, IAM Identity Center now emits the credentialId field in CloudTrail events for user actions that take place in the AWS access portal or that use the AWS CLI. The credentialId field contains the AWS access portal session ID for a user, to help you track user actions during their session.

The following table shows a CloudTrail event example that illustrates the fields that will change on January 13, 2025: the type, userName, and principalId fields in the userIdentity element. IAM Identity Center recently started emitting userId, identityStoreArn, credentialId, and UserName in the additional event data for its CloudTrail events. Therefore, this example considers them as existing fields.

Before the upcoming changes
"eventName": "CredentialChallenge",
"eventSource": "signin.amazonaws.com",
"userIdentity": {
  "type": "Unknown",
  "userName": "anyuser",
  "accountId": "123456789012",
  "principalId": "123456789012",
  "onBehalfOf": {
    "userId": "a11111-1111-1111-11a1-111aa111aa11",
    "identityStoreArn": "arn:aws:identitystore::111111111:identitystore/d-111111a1a"
  },
  "credentialId": "1111a111111111a1a11111a1a[…]"
},
"additionalEventData": {
    "CredentialType": "PASSWORD",
    "UserName": "anyuser"
}

After the upcoming changes
"eventName": "CredentialChallenge",
"eventSource": "signin.amazonaws.com",
"userIdentity": {
  "type": "IdentityCenterUser",
  "accountId": "123456789012",
  "onBehalfOf": {
    "userId": "a11111-1111-1111-11a1-111aa111aa11",
    "identityStoreArn": "arn:aws:identitystore::111111111:identitystore/d-111111a1a"
  },
  "credentialId": "1111a111111111a1a11111a1a[…]"
},
"additionalEventData": {
    "CredentialType": "PASSWORD",
    "UserName": "anyuser"
}
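
To make the guidance above concrete, here is a small Python sketch of how a workflow might read the user identity element after the change. It accepts both type values, relies only on the new fields, and derives the Identity Store ID from the ARN so that it can be passed to describe-user. The function name and the shape of the returned dictionary are hypothetical, and the session ID in the sample is a placeholder.

```python
def identify_user(event):
    """Extract IAM Identity Center user details from a CloudTrail event dict."""
    identity = event.get("userIdentity", {})
    identity_type = identity.get("type")
    # Accept both values: Unknown remains in use for unauthenticated users.
    if identity_type not in ("IdentityCenterUser", "Unknown"):
        return None
    on_behalf_of = identity.get("onBehalfOf", {})
    store_arn = on_behalf_of.get("identityStoreArn")
    return {
        "type": identity_type,
        "user_id": on_behalf_of.get("userId"),
        "identity_store_arn": store_arn,
        # The Identity Store ID (d-...) is the last path segment of the ARN.
        "identity_store_id": store_arn.rsplit("/", 1)[-1] if store_arn else None,
        "session_id": identity.get("credentialId"),
        "entered_username": event.get("additionalEventData", {}).get("UserName"),
    }


# Sample modeled on the "after" event above; the session ID is a placeholder.
sample_event = {
    "eventName": "CredentialChallenge",
    "eventSource": "signin.amazonaws.com",
    "userIdentity": {
        "type": "IdentityCenterUser",
        "accountId": "123456789012",
        "onBehalfOf": {
            "userId": "a11111-1111-1111-11a1-111aa111aa11",
            "identityStoreArn": "arn:aws:identitystore::111111111:identitystore/d-111111a1a",
        },
        "credentialId": "example-session-id",
    },
    "additionalEventData": {"CredentialType": "PASSWORD", "UserName": "anyuser"},
}

user = identify_user(sample_event)
print(user["user_id"], user["identity_store_id"])
```

Because the function never reads userName or principalId, it behaves the same before and after January 13, 2025.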

How to prepare your workflows for the upcoming changes to IAM Identity Center group management events in CloudTrail

Your workflows that require access to group attributes, such as displayName, can retrieve them by using the Identity Store DescribeGroup API operation. Beginning January 13, 2025, IAM Identity Center will replace the displayName value in the administrative CloudTrail events for CreateGroup and UpdateGroup with a fixed text value of HIDDEN_DUE_TO_SECURITY_REASONS. This update restricts access to the group displayName only to workflows that are authorized to access group attributes in the Identity Store.

The following table shows a CloudTrail event example that illustrates the upcoming change in the displayName field.

Before the upcoming changes
"eventName": "CreateGroup",
"eventSource": "sso-directory.amazonaws.com",
"userIdentity": {
  "type": "AssumedRole",
  "userName": "GroupManagerRole",
  "accountId": "123456789012",
  "principalId": "123456789012"
}
…
"group": {
    "groupId": "11a1a111-1111-1010-aaa1-01111a1111a0",
    "displayName": "PowerUserGroup",
    "groupAttributes": {
        "description": {
            "stringValue": "HIDDEN_DUE_TO_SECURITY_REASONS"
        }
    }
}

After the upcoming changes
"eventName": "CreateGroup",
"eventSource": "sso-directory.amazonaws.com",
"userIdentity": {
  "type": "AssumedRole",
  "userName": "GroupManagerRole",
  "accountId": "123456789012",
  "principalId": "123456789012"
}
…
"group": {
    "groupId": "11a1a111-1111-1010-aaa1-01111a1111a0",
    "displayName": "HIDDEN_DUE_TO_SECURITY_REASONS",
    "groupAttributes": {
        "description": {
            "stringValue": "HIDDEN_DUE_TO_SECURITY_REASONS"
        }
    }
}
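
One way a workflow could adapt is to fall back to the Identity Store whenever the event carries the fixed text value. The following Python sketch assumes the workflow has already extracted the group element from the event. The helper names are hypothetical, and the lookup is injected so that the fallback logic can be exercised without AWS access.

```python
HIDDEN = "HIDDEN_DUE_TO_SECURITY_REASONS"


def group_display_name(group, lookup):
    """Return the group's display name, calling lookup(group_id) if it is hidden."""
    name = group.get("displayName")
    if name and name != HIDDEN:
        return name
    return lookup(group["groupId"])


def describe_group_lookup(identity_store_id):
    """Build a lookup backed by DescribeGroup. Requires boto3 and permissions."""
    import boto3  # imported lazily so the logic above has no dependencies

    client = boto3.client("identitystore")

    def lookup(group_id):
        response = client.describe_group(
            IdentityStoreId=identity_store_id, GroupId=group_id
        )
        return response["DisplayName"]

    return lookup


# Stand-in lookup for illustration; a real workflow would use
# describe_group_lookup("d-...") with its own Identity Store ID.
def fake_lookup(group_id):
    return "PowerUserGroup"


after_change = {
    "groupId": "11a1a111-1111-1010-aaa1-01111a1111a0",
    "displayName": HIDDEN,
}
print(group_display_name(after_change, fake_lookup))
```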

Gain a deeper understanding of the specific CloudTrail events impacted by the changes

Earlier in this post, we said that IAM Identity Center emits the relevant CloudTrail events when users sign in to IAM Identity Center, use the AWS access portal, and access AWS accounts through the AWS CLI, or when administrators create and update groups. These CloudTrail events belong to four event groups that the IAM Identity Center User Guide refers to as AWS access portal, OIDC, Sign-in, and Identity Store events. The following list provides more details about the use cases that lead to the emission of these CloudTrail events:

  1. The AWS access portal events cover sign-in and sign-out from the AWS access portal, as well as the retrieval of a user’s account and application assignments, which are necessary to display the portal. IAM Identity Center also emits these events when configuring AWS CLI or IDE toolkits for access to AWS accounts as an IAM Identity Center user.
  2. The relevant OpenID Connect (OIDC) event is CreateToken. IAM Identity Center emits this event when starting a session for an authenticated user (for example, to access assigned AWS accounts through AWS CLI or IDE toolkits).
  3. The Sign-in events cover password-based and federated authentication, as well as multi-factor authentication (MFA).
  4. The relevant Identity Store events include the end-user management of MFA devices inside the AWS access portal and the two administrative Identity Store events, CreateGroup and UpdateGroup.

Note that some of the API operations behind the CloudTrail events in scope are also available as AWS CLI commands:

The two tables in this section provide a detailed record of the changes and their relation to CloudTrail events.

The following table lists the changes to fields emitted by IAM Identity Center and the relevant CloudTrail events.

| Changes | AWS access portal (Use of the portal) | OIDC (Sign-in to IAM Identity Center through AWS CLI and IDE toolkits) | Sign-in (authentication, including MFA, federation) | Identity Store (MFA device and group management) |
|---|---|---|---|---|
| Available as of January 13, 2025 | | | | |
| Exclusion of userName from the userIdentity element for authenticated users | Yes | Yes, limited to the CreateToken event | Yes | Yes, limited to MFA management in the AWS access portal |
| Exclusion of principalId from the userIdentity element | Yes | Yes, limited to the CreateToken event | Yes | Yes, limited to MFA management in the AWS access portal |
| Modified userIdentity’s type value from Unknown to IdentityCenterUser | Yes | Yes, limited to the CreateToken event | Yes, limited to successful authentications | Yes, limited to MFA management in the AWS access portal |
| Exclusion of the group displayName value from the requestParameters and responseElements elements | No | No | No | Yes, limited to administrative CreateGroup and UpdateGroup events |
| Exclusion of the UserName (in the additionalEventData element) a user keys in on failed authentication attempts | No | No | Yes, limited to the CredentialChallenge event | No |
| Available as of October 2024 | | | | |
| Addition of the onBehalfOf element with userId and identityStoreArn, and credentialId in the userIdentity element | Yes | Yes, limited to the CreateToken event | Yes, limited to successful authentications | Yes, limited to MFA management in the AWS access portal |
| Addition of UserName in additionalEventData element | No | No | Yes, limited to CredentialChallenge and UserAuthentication events in specific cases | No |

The following table summarizes the relevant IAM Identity Center CloudTrail event groups, event sources, and event names.

| Event group | Source | Event names |
|---|---|---|
| AWS access portal | sso.amazonaws.com | Authenticate, Federate, ListAccountRoles, ListAccounts, ListApplications, ListProfilesForApplication, GetRoleCredentials, Logout |
| OIDC | sso.amazonaws.com | CreateToken |
| Sign-in | signin.amazonaws.com | CredentialChallenge, CredentialVerification, UserAuthentication |
| Identity Store | sso-directory.amazonaws.com or identitystore.amazonaws.com | ListMfaDevicesForUser, DeleteMfaDeviceForUser, UpdateMfaDeviceForUser, StartWebAuthnDeviceRegistration, StartVirtualMfaDeviceRegistration, CompleteWebAuthnDeviceRegistration, CompleteVirtualMfaDeviceRegistration, CreateGroup, UpdateGroup |
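
If you want to find which of your stored events are affected, the event groups above can be expressed as a simple lookup. This Python sketch is only an illustration: the names IN_SCOPE_EVENTS and is_in_scope are hypothetical, and the Sign-in source is written as signin.amazonaws.com to match the eventSource value in the sample events earlier in this post.

```python
IDENTITY_STORE_EVENTS = frozenset({
    "ListMfaDevicesForUser", "DeleteMfaDeviceForUser", "UpdateMfaDeviceForUser",
    "StartWebAuthnDeviceRegistration", "StartVirtualMfaDeviceRegistration",
    "CompleteWebAuthnDeviceRegistration", "CompleteVirtualMfaDeviceRegistration",
    "CreateGroup", "UpdateGroup",
})

# Event source -> event names, following the event groups in this section.
IN_SCOPE_EVENTS = {
    "sso.amazonaws.com": frozenset({
        # AWS access portal events
        "Authenticate", "Federate", "ListAccountRoles", "ListAccounts",
        "ListApplications", "ListProfilesForApplication",
        "GetRoleCredentials", "Logout",
        # OIDC event
        "CreateToken",
    }),
    "signin.amazonaws.com": frozenset({
        "CredentialChallenge", "CredentialVerification", "UserAuthentication",
    }),
    "sso-directory.amazonaws.com": IDENTITY_STORE_EVENTS,
    "identitystore.amazonaws.com": IDENTITY_STORE_EVENTS,
}


def is_in_scope(event):
    """Return True if a CloudTrail event dict belongs to an affected event group."""
    names = IN_SCOPE_EVENTS.get(event.get("eventSource"), frozenset())
    return event.get("eventName") in names


print(is_in_scope({"eventSource": "signin.amazonaws.com",
                   "eventName": "CredentialChallenge"}))
```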

Conclusion

In this post, we reviewed several important upcoming and recently completed changes to CloudTrail events that IAM Identity Center emits. We recommend that you update your CloudTrail-based workflows before January 13, 2025, if they rely on the userName, principalId, or type fields in the CloudTrail user identity element for events emitted when users sign in to IAM Identity Center, use the AWS access portal, or access AWS accounts through the AWS CLI, or if they rely on a group’s displayName field in group management administrative events. AWS has recently introduced the fields userId, identityStoreArn, and credentialId in the CloudTrail user identity element to help you complete your updates.

Please contact your AWS account team or AWS support if you need additional assistance.

Arthur Mnev

Arthur is a Senior Specialist Security Architect for AWS Industries. He spends his day working with customers and designing innovative approaches to help customers move forward with their initiatives, improve their security posture, and reduce security risks in their cloud journeys. Outside of work, Arthur enjoys being a father, skiing, scuba diving, and Krav Maga.
Alex Milanovic

Alex is a Senior Product Manager at AWS Identity, with over a decade of expertise in Identity and Access Management (IAM) and more than 25 years in the tech sector. His work centers on empowering organizations of all sizes, from large enterprises to small and medium-sized businesses, to effectively adopt and implement IAM cloud services.

Streamline container application networking with built-in Amazon ECS support in Amazon VPC Lattice

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/streamline-container-application-networking-with-native-amazon-ecs-support-in-amazon-vpc-lattice/

Since its launch, Amazon VPC Lattice has streamlined complex networking tasks. As a result, my perspective on how to build and connect modern, multi-service applications has changed. As my colleague Danilo wrote in his post announcing the general availability of VPC Lattice:

“By using VPC Lattice, you can focus on your application logic and improve productivity and deployment flexibility with consistent support for instances, containers, and serverless computing.”

Today, we’re announcing Amazon VPC Lattice built-in support for Amazon Elastic Container Service (Amazon ECS). With this new built-in integration, Amazon ECS services can now be directly associated with VPC Lattice target groups without the need for intermediate load balancers.

Here’s a quick look at how you can find Amazon VPC Lattice integration while creating an Amazon ECS service:

The Amazon VPC Lattice integration with Amazon ECS works by registering and deregistering IP addresses from ECS tasks within a service as targets in a VPC Lattice target group. As ECS tasks for the service are launched, Amazon ECS will automatically register those tasks to the VPC Lattice target group.

Furthermore, if ECS tasks fail VPC Lattice health checks, Amazon ECS will automatically replace the tasks. Also, if any task is terminated or scales down, it’s removed from the target group.

Using the Amazon VPC Lattice integration
Let me walk you through how to use this new integration. In the following demo, I will deploy a simple application server running as an ECS service and configure the integration with VPC Lattice. Then, I’ll test the application server by connecting to the VPC Lattice domain name without having to configure additional load balancers on Amazon ECS.

Before I can start with this integration, I need to make sure Amazon ECS will have the required permissions to register and deregister targets into VPC Lattice. To learn more, please visit the Amazon ECS infrastructure IAM role documentation page.

To use the integration with VPC Lattice, I need to define a task definition with at least one container and one port mapping. This is an example of my task definition.

{
    "containerDefinitions": [
        {
            "name": "webserver",
            "image": "public.ecr.aws/ecs-sample-image/amazon-ecs-sample:latest",
            "cpu": 0,
            "portMappings": [
                {
                    "name": "web-80-tcp",
                    "containerPort": 80,
                    "hostPort": 80,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            ...
            *redacted for brevity*
}

Then, I navigate to my ECS cluster and choose Create.

Next, I need to select the task definition and assign the service name.

In the VPC Lattice integration section, I choose Turn on VPC Lattice to start configuring the target group for VPC Lattice. I don’t need to specify a load balancer because I’ll use VPC Lattice. By default, VPC Lattice will use a round-robin routing algorithm to route requests to healthy targets.

Now, I can start defining the integration for my ECS service in VPC Lattice. First, I select the infrastructure role for Amazon ECS. Then, I need to select the virtual private cloud (VPC) where I want my service to run. After that, I need to define the Target groups that will receive traffic. After I’m done configuring the service with VPC Lattice integration, I create this service.
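
If you prefer the API over the console, the same settings map onto the ECS CreateService request. The following Python sketch only builds the request parameters. All names and ARNs are placeholders, the helper name is hypothetical, and it assumes a Fargate service in awsvpc network mode with a VPC Lattice target group that already exists. The portName refers to the named port mapping from the task definition shown earlier.

```python
def lattice_service_params(cluster, service_name, task_definition,
                           subnets, security_groups,
                           infrastructure_role_arn, target_group_arn, port_name):
    """Build CreateService keyword arguments with a VPC Lattice configuration."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": 1,
        "launchType": "FARGATE",  # assumption: the demo's launch type isn't stated
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
            }
        },
        # No load balancer entry: targets are registered with VPC Lattice instead.
        "vpcLatticeConfigurations": [
            {
                "roleArn": infrastructure_role_arn,
                "targetGroupArn": target_group_arn,
                "portName": port_name,
            }
        ],
    }


def create_service(params):
    """Send the request. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above has no dependencies

    return boto3.client("ecs").create_service(**params)


# Placeholder values, for illustration only.
params = lattice_service_params(
    cluster="demo-cluster",
    service_name="webserver-service",
    task_definition="webserver:1",
    subnets=["subnet-aaaa1111"],
    security_groups=["sg-bbbb2222"],
    infrastructure_role_arn="arn:aws:iam::123456789012:role/example-ecs-infrastructure-role",
    target_group_arn="arn:aws:vpc-lattice:us-east-1:123456789012:targetgroup/tg-example",
    port_name="web-80-tcp",
)
print(params["vpcLatticeConfigurations"][0]["portName"])
```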

After a few minutes, I have my ECS service ready. I navigate to the service and choose Configuration and networking. If I scroll down to the VPC Lattice section, I can see the VPC Lattice target group created.

To get more information on this target group, I select the target group name, which will redirect me to the VPC Lattice target group page. Here, I can see that Amazon ECS successfully registered the IP address of the running task.

Now, I need to create a VPC Lattice service and service network. My preference is always to create the VPC Lattice service first and then associate it with a VPC Lattice service network later on. So, let’s do that.

I choose Services under the VPC Lattice section and choose Create service.

I fill in all the details required to create a VPC Lattice service and choose Next.

Then, I add a listener, and for the Forward to target group on the Listener default action, I select the newly created target group.

On the next page, because I’m going to create the VPC Lattice service network later, I skip this step and choose Next, review the configurations, and create the service.

With the VPC Lattice service created, it’s time to create a VPC Lattice service network. I navigate to Service networks under the VPC Lattice section and choose Create service network.

First, I fill in the VPC Lattice service network name.

Then, on the Service associations page, I select the service that I have created.

I associate this service network with my VPC and a security group.

To keep this demo simple, I set None for the Auth type. However, I highly recommend that you read how you can use IAM to manage access to VPC Lattice. Then, I choose Create service network.

At this stage, we have everything set up for this integration. My VPC Lattice service network is now associated with my VPC Lattice service and my VPC.

With everything set up, I copy the Domain name from my VPC Lattice service page.

Then, to access the service, I log in to the instance in the same VPC and call the service by using the domain name from VPC Lattice.

[ec2-user@ ~]$ curl http://service-a-XYZ.XYZ.vpc-lattice-svcs.XYZ.on.aws

"Hello there! I'm Amazon ECS."

One thing to note is if you’re not receiving traffic to your Amazon ECS workloads, check the security groups as described in the Control traffic in VPC Lattice using security groups documentation page.

I’m personally excited about this integration because it unlocks various possibilities while streamlining application architectures and improving overall system reliability. Now that all AWS compute types are inherently supported in VPC Lattice, I can unify services across all my ECS clusters, AWS accounts, and VPCs.

Things to know
Here are a couple of important points to note:

Try this new capability of Amazon VPC Lattice today and see how it can streamline your container application communication running on Amazon ECS.

Happy building!

Donnie Prakoso

AWS Lambda SnapStart for Python and .NET functions is now generally available

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/aws-lambda-snapstart-for-python-and-net-functions-is-now-generally-available/

Today, we’re announcing the general availability of AWS Lambda SnapStart for Python and .NET functions, which delivers faster function startup performance, from several seconds to as low as sub-second, typically with minimal or no code changes in Python, C#, F#, and PowerShell.

On November 28, 2022, we introduced Lambda SnapStart for Java functions to improve startup performance by up to 10 times. With Lambda SnapStart, you can reduce outlier latencies that come from initializing functions, without having to provision resources or spend time implementing complex performance optimizations.

Lambda SnapStart works by caching and reusing the snapshotted memory and disk state of any one-time initialization code, or code that runs only the first time a Lambda function is invoked. Lambda takes a Firecracker microVM snapshot of the memory and disk state of the initialized execution environment, encrypts the snapshot, and caches it for low-latency access.

When you invoke the function version for the first time, and as the invocations scale up, Lambda resumes new execution environments from the cached snapshot instead of initializing them from scratch, improving startup latency. Lambda SnapStart makes it easy to build highly scalable and responsive applications in Python and .NET using AWS Lambda.

For Python functions, startup latency from initialization code can be several seconds long. This can occur when loading dependencies (such as LangChain, NumPy, Pandas, and DuckDB) or using frameworks (such as Flask or Django). Many functions also perform machine learning (ML) inference using Lambda, and need to load ML models during initialization – a process that can take tens of seconds depending on the size of the model used. Using Lambda SnapStart can reduce startup latency from several seconds to as low as sub-second for these scenarios.

For .NET functions, we expect most use cases to benefit because .NET just-in-time (JIT) compilation takes up to several seconds. Latency variability associated with initialization of Lambda functions has been a long-standing barrier for customers to use .NET for AWS Lambda. SnapStart enables functions to resume quickly by caching a snapshot of their memory and disk state. Therefore, most .NET functions will experience significant improvement in latency variability with Lambda SnapStart.

Getting started with Lambda SnapStart for Python and .NET
To get started, you can use the AWS Management Console, AWS Command Line Interface (AWS CLI) or AWS SDKs to activate, update, and delete SnapStart for Python and .NET functions.

On the AWS Lambda console, go to the Functions page and choose your function to use Lambda SnapStart. Select Configuration, choose General configuration, and then choose Edit. You can see SnapStart settings on the Edit basic settings page.

You can activate SnapStart for Lambda functions that use the Python 3.12 and higher, and .NET 8 and higher, managed runtimes. Choose Published versions and then choose Save.

When you publish a new version of your function, Lambda initializes your code, creates a snapshot of the initialized execution environment, and then caches the snapshot for low-latency access. You can invoke the function to confirm activation of SnapStart.

With the AWS CLI, you can update the function configuration by running the update-function-configuration command with the --snap-start option.

aws lambda update-function-configuration \
  --function-name lambda-python-snapstart-test \
  --snap-start ApplyOn=PublishedVersions

Publish a function version with the publish-version command.

aws lambda publish-version \
  --function-name lambda-python-snapstart-test

Confirm that SnapStart is activated for the function version by running the get-function-configuration command and specifying the version number.

aws lambda get-function-configuration \
  --function-name lambda-python-snapstart-test:1

If the response shows that OptimizationStatus is On and State is Active, then SnapStart is activated, and a snapshot is available for the specified function version.

"SnapStart": { 
    "ApplyOn": "PublishedVersions",
    "OptimizationStatus": "On"
 },
 "State": "Active",
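The same three steps could also be scripted in Python. The following is a minimal sketch, not an official sample: it assumes a boto3 Lambda client is passed in as lambda_client, and the helper names enable_snapstart_and_publish and snapstart_is_ready are hypothetical. It waits for the configuration update to finish before publishing, then checks the two fields shown in the response above.

```python
def enable_snapstart_and_publish(lambda_client, function_name):
    """Turn on SnapStart for published versions, then publish a new version.

    lambda_client is expected to behave like boto3.client("lambda").
    """
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        SnapStart={"ApplyOn": "PublishedVersions"},
    )
    # The configuration update is asynchronous, so wait before publishing.
    lambda_client.get_waiter("function_updated").wait(FunctionName=function_name)
    published = lambda_client.publish_version(FunctionName=function_name)
    return published["Version"]


def snapstart_is_ready(config):
    """Return True when a get-function-configuration response shows
    OptimizationStatus "On" and State "Active"."""
    snap_start = config.get("SnapStart", {})
    return (
        snap_start.get("OptimizationStatus") == "On"
        and config.get("State") == "Active"
    )
```

With boto3 installed and credentials configured, you could pass boto3.client("lambda") as lambda_client, then call get_function_configuration with FunctionName and Qualifier set to the returned version and hand the response to snapstart_is_ready. Creating the snapshot takes some time, so the check may return False for a short while after publishing.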

To learn more about activating, updating, and deleting a snapshot with AWS SDKs, AWS CloudFormation, AWS Serverless Application Model (AWS SAM), and AWS Cloud Development Kit (AWS CDK), visit Activating and managing Lambda SnapStart in the AWS Lambda Developer Guide.

Runtime hooks
You can use runtime hooks to run code before Lambda creates a snapshot or after Lambda resumes a function from a snapshot. Runtime hooks are useful to perform cleanup or resource release operations, to dynamically update configuration or other metadata, to integrate with external services or systems (such as sending notifications or updating external state), or to fine-tune your function’s startup sequence, such as by preloading dependencies.

Python runtime hooks are available as part of the open source Snapshot Restore for Python library, which is included in the Python managed runtime. This library provides two decorators: @register_before_snapshot to run before Lambda creates a snapshot, and @register_after_restore to run when Lambda resumes a function from a snapshot. To learn more, visit Lambda SnapStart runtime hooks for Python in the AWS Lambda Developer Guide.

Here is an example Python handler to show how to run code before checkpointing and after restoring:

from snapshot_restore_py import register_before_snapshot, register_after_restore

def lambda_handler(event, context):
    # handler code
    pass

@register_before_snapshot
def before_checkpoint():
    # Logic to be executed before taking snapshots
    pass

@register_after_restore
def after_restore():
    # Logic to be executed after restore
    pass

You can also use .NET runtime hooks available as part of the Amazon.Lambda.Core package (version 2.5 or later) from NuGet. This library provides two methods: RegisterBeforeSnapshot() to run before snapshot creation, and RegisterAfterRestore() to run after resuming a function from a snapshot. To learn more, visit Lambda SnapStart runtime hooks for .NET in the AWS Lambda Developer Guide.

Here is an example C# handler to show how to run code before checkpointing and after restoring:

using System.Threading.Tasks;
using Amazon.Lambda.APIGatewayEvents;
using Amazon.Lambda.Core;

public class SampleClass
{
    public SampleClass()
    { 
        Amazon.Lambda.Core.SnapshotRestore.RegisterBeforeSnapshot(BeforeCheckpoint); 
        Amazon.Lambda.Core.SnapshotRestore.RegisterAfterRestore(AfterRestore);
    }
    
    private ValueTask BeforeCheckpoint()
    {
        // Add logic to be executed before taking the snapshot
        return ValueTask.CompletedTask;
    }

    private ValueTask AfterRestore()
    {
        // Add logic to be executed after restoring the snapshot
        return ValueTask.CompletedTask;
    }

    public APIGatewayProxyResponse FunctionHandler(APIGatewayProxyRequest request, ILambdaContext context)
    {
        // INSERT business logic
        return new APIGatewayProxyResponse
        {
            StatusCode = 200
        };
    }
}

To learn how to implement runtime hooks for your preferred runtime, visit Implement code before or after Lambda function snapshots in the AWS Lambda Developer Guide.

Things to know
Here are some things that you should know about Lambda SnapStart:

  • Handling uniqueness – If your initialization code generates unique content that is included in the snapshot, then the content will not be unique when it’s reused across execution environments. To maintain uniqueness when using SnapStart, you must generate unique content after initialization. This applies, for example, if your code uses custom random number generation that doesn’t rely on built-in libraries, or if it caches information during initialization, such as DNS entries, that might expire. To learn how to restore uniqueness, visit Handling uniqueness with Lambda SnapStart in the AWS Lambda Developer Guide.
  • Performance tuning – To maximize the performance, we recommend that you preload dependencies and initialize resources that contribute to startup latency in your initialization code instead of in the function handler. This moves the latency associated with heavy class loading out of the invocation path, optimizing startup performance with SnapStart.
  • Networking best practices – The state of connections that your function establishes during the initialization phase isn’t guaranteed when Lambda resumes your function from a snapshot. In most cases, network connections that an AWS SDK establishes automatically resume. For other connections, review Maximize Lambda SnapStart performance in the AWS Lambda Developer Guide.
  • Monitoring functions – You can monitor your SnapStart functions using Amazon CloudWatch log streams, AWS X-Ray active tracing, real-time telemetry data for extensions from the Telemetry API, and Amazon API Gateway and function URL metrics. To learn more about differences for SnapStart functions, visit Monitoring for Lambda SnapStart in the AWS Lambda Developer Guide.

Now available
AWS Lambda SnapStart for Python and .NET functions is available today in US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm) AWS Regions.

With the Python and .NET managed runtimes, there are two types of SnapStart charges: the cost of caching a snapshot per function version that you publish with SnapStart enabled, and the cost of restoration each time a function instance is restored from a snapshot. So, delete unused function versions to reduce your SnapStart cache costs. To learn more, visit the AWS Lambda pricing page.

Give Lambda SnapStart for Python and .NET a try in the AWS Lambda console. To learn more, visit the Lambda SnapStart page and send feedback through AWS re:Post for AWS Lambda or your usual AWS Support contacts.

Channy

Build and modify apps using natural language with AWS App Studio, now generally available

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/build-and-modify-apps-using-natural-language-with-aws-app-studio-now-generally-available/

Announced as preview in July, AWS App Studio is a generative AI-powered application development service that enables users to create applications using natural language, without the need for professional software development skills. In that post, I covered how AWS App Studio helps you build secure, scalable applications and eliminates operational overhead by fully managing each application.

App Studio empowers a new set of builders to create business applications. Whether you are an IT Project Manager, Data Engineer, Enterprise Architect, or Solution Architect, you simply describe your requirements in natural language. Within minutes, App Studio generates fully functional applications complete with multipage UIs, data models, and custom business logic.

Today, we’re excited to announce that AWS App Studio is now generally available in the US West (Oregon) and Europe (Ireland) AWS Regions.

Building on feedback from the preview, we are introducing several new features to enhance your app building experience:

Modify your applications with natural language
During the preview period, customers shared with us that they enjoy and appreciate generating fully functional applications using natural language prompts. However, the development journey usually doesn’t stop there, and they asked if they could extend or modify their apps using natural language.

Now, with App Studio, you can modify your applications using natural language. After you’ve generated your applications, you can now describe your desired changes and the assistant will propose updates for you to review. Upon confirmation, it will automatically make the change. This feature makes it even faster and easier to customize your application.

Let’s see how it works in my IT inventory management application that I built with App Studio.

With this new feature, I can chat with the assistant to modify my applications.

To modify my application, I can provide a prompt to add another feature to my app. In this case, I need to add another text input for the web URL to get details of requested hardware, and I need another text area to store notes.

The generative AI assistant will then process my input and provide a proposal. I can review this proposal and select Confirm to proceed.

Then, the assistant will automatically add the components and modify my application.

Add intelligence to your app with a new generative AI component
We’re also introducing a new component to make it even easier to add generative AI capabilities such as text summarization, content generation, and file analysis to your applications.

There are two ways to use this feature. First, with my canvas open, I can select the Gen AI component and drag and drop it onto the canvas. Then, while selecting the component, I can use the assistant to customize it.

Another way is to use the assistant directly. Let’s say I need a feature to analyze repair notes and provide a summary to make it easier for me to review. I can type what I need in the chat box or use the suggested prompts.

Then, the assistant will process my input and provide a proposal. I can review the proposal and select Confirm to proceed. 

App Studio will automatically add the required components. On the canvas, I see there’s a button that triggers an automation. If I need to change the underlying prompt, I can select the link that will redirect me to the respective automation. 

Under the hood, the Gen AI component is powered by a new action step called Gen AI Prompt. This new component provides an easy way to modify the prompt and input parameters to customize the output generated by the large language model (LLM).

Here’s my published app with the newly added generative AI feature to summarize repair notes.

Generate and add custom business logic with natural language
I can also use the assistant to help me add custom business logic with JavaScript in my automation.

Let’s say that I need custom business logic to calculate repair duration and notify my stakeholders through email. Here’s the multi-step automation that I created. To add the custom logic to my automation, I choose the JavaScript component and then drag and drop it into the right spot.

Next, I need to select the action and, in the Properties panel, I select the Expand editor icon.

With this feature, I can now generate JavaScript code with natural language. Here, I provide a prompt and App Studio generates the source code for me along with comments. This generated source code provides a foundation that I can customize to suit my requirements. 

Next, I need to add the Send Email action into my automation to complete the flow.

Customize your app’s theme and style
Now, you can customize the look and feel of your application with App themes. With this feature, you can change the appearance of your application to Light mode or Dark mode. Additionally, you can specify custom colors for your app to match your company’s brand. To enable this feature, you need to turn on the Customize toggle.

Available today
Start building secure, intelligent, and scalable business applications with App Studio today. It’s free to build, and you’ll receive a 60-day (250 user hour) free trial.

Learn more about all these features and others in the AWS App Studio documentation and join the conversation in the #aws-app-studio channel in the AWS Developers Slack workspace.

Happy building,

Donnie

AWS Weekly Roundup: AWS BuilderCards at re:Invent 2024, AWS Community Day, Amazon Bedrock, vector databases, and more (Nov 18, 2024)

Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-aws-buildercards-at-reinvent-2024-aws-community-day-amazon-bedrock-vector-databases-and-more-nov-18-2024/

This week, we wrapped up the final Latin America Amazon Web Services (AWS) Community Days of 2024 in Brazil, with multiple parallel events taking place. In Goiânia, we had Marcelo Palladino, senior developer advocate, and Marcelo Paiva, AWS Community Builder, as keynote speakers. Florianópolis featured Ana Cunha, senior developer advocate, and in Santiago de Chile, I had the honor of sharing the stage with Rossana Suarez, AWS Container Hero, as keynote speakers. These events, organized by communities for communities, provide opportunities to network, learn something new, and immerse yourself in the community. In a community, everyone grows together, and no one is left behind.

AWS Lambda, the service that introduced me to AWS and remains my favorite, celebrates its 10th anniversary. Born from customer needs, it revolutionized cloud computing by allowing code execution without server management. Since its inception, documented in this LinkedIn post by Dr. Werner Vogels, Chief Technology Officer at Amazon.com, through the original PR/FAQ document, the service has grown significantly, introducing features such as 1ms billing precision and support for 10GB memory. Thank you AWS Lambda, here’s to many more anniversaries.

Amazon invests $110 million to support AI research at universities using Trainium chips. The initiative provides computing resources using AWS Trainium chips, enabling researchers to develop new AI architectures and machine learning innovations that will be open-sourced for broader advancement. Check out the LinkedIn post by Matt Garman, CEO at AWS.

Last week’s launches
AWS BuilderCards second edition at re:Invent 2024 – Jeff Barr announced the launch of the second edition of AWS BuilderCards at re:Invent 2024. It includes improvements to the design and game mechanics, plus a new add-on pack on generative AI. Over 15,000 sets have been distributed at previous events, with excellent user feedback. They’ll be available for online purchase after re:Invent.

Amazon EventBridge announces up to 94% improvement in end-to-end latency for Event Buses – Amazon EventBridge has improved end-to-end latency for Event Buses by up to 94%, reducing P99 latency from 2235.23ms (measured in January 2023) to 129.33ms (measured in August 2024). This enhancement enables faster processing for time-sensitive applications such as fraud detection, industrial automation, and gaming across all AWS Regions where Amazon EventBridge is available, including the AWS GovCloud (US) Regions, at no additional cost to you.

Introducing resource control policies (RCPs), a new type of authorization policy in AWS Organizations – Resource control policies (RCPs) are a new type of authorization policy in AWS Organizations. RCPs allow centralized control over maximum permissions granted to resources, complementing service control policies (SCPs) that control permissions for principals. RCPs can restrict external access to resources like Amazon Simple Storage Service (Amazon S3) buckets, enforcing a data perimeter across the organization.

Replicate changes from databases to Apache Iceberg tables using Amazon Data Firehose (in preview) – A new preview capability in Amazon Data Firehose that captures and replicates database changes to Apache Iceberg tables on Amazon S3. This feature supports PostgreSQL and MySQL databases, providing a simple solution to stream database updates without impacting performance. It automatically handles data partitioning and schema evolution, eliminating the need for complex ETL processes.

Amazon S3 now supports up to 1 million buckets per AWS account – Amazon S3 has increased its default bucket quota from 100 to 10,000 per AWS account. Customers can now request increases up to 1 million buckets. The first 2,000 buckets are free, with a small monthly fee applying thereafter for additional buckets.

Amazon Keyspaces (for Apache Cassandra) reduces prices by up to 75% – Amazon Keyspaces (for Apache Cassandra) announces significant price reductions of up to 75%. The service reduces on-demand mode pricing by up to 56% for single-region and 65% for multi-region usage. Time-to-live (TTL) delete prices are also reduced by 75%.

Centrally managing root access for customers using AWS Organizations – AWS Identity and Access Management (IAM) launches a new capability for centrally managing root access in AWS Organizations. This feature allows security teams to remove long-term root credentials from member accounts and use temporary, task-scoped root sessions for specific actions. The solution enhances security by eliminating permanent root credentials while maintaining the ability to perform necessary privileged operations.

Amazon DynamoDB reduces prices for on-demand throughput and global tables – Amazon DynamoDB announces significant price reductions, cutting on-demand mode throughput costs by 50% and global tables by up to 67%. Multi-region replicated writes now match single-region pricing. These changes make on-demand mode the recommended choice for most DynamoDB workloads.

Amazon Q Developer plugins for Datadog and Wiz now generally available – Amazon Q Developer now offers plugins for Datadog and Wiz, allowing users to access these partners’ features directly through the AWS Console. Users can query information in natural language by using @datadog or @wiz to get real-time updates and security insights.

Other AWS blog posts
Here are some additional projects and blog posts that you might find interesting:

Introducing Stable Diffusion 3.5 Large in Amazon SageMaker JumpStart – This powerful 8.1 billion parameter model enables high-quality, photorealistic image generation from text prompts. Customers can seamlessly deploy and use the model in Amazon SageMaker JumpStart, benefiting from Amazon SageMaker security and machine learning operations (MLOps) capabilities.

Transcribe, translate, and summarize live streams in your browser with AWS AI and generative AI services – This blog post explains how we developed a Chrome extension that uses AI services to enhance live streaming experiences. The extension uses Amazon Transcribe, Amazon Translate, and Amazon Bedrock to provide real-time transcription, translation, and summarization of live streams directly in the browser. It supports over 50 languages for transcription and 75 for translation, making content globally accessible.

Simplify automotive damage processing with Amazon Bedrock and vector databases – This blog post presents a solution combining Amazon Bedrock and vector databases to streamline automotive damage assessment. The system uses AI to analyze vehicle damage images, provide cost estimates, and match with similar cases from existing datasets. It uses Anthropic’s Claude 3 and Amazon Titan Multimodal Embeddings for efficient, accurate processing.

Revolutionize trip planning with Amazon Bedrock and Amazon Location Service – This blog post presents a solution that combines Amazon Bedrock and Amazon Location Service to assist with trip planning.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Community Days – Join community-led conferences featuring technical discussions, workshops, and hands-on labs driven by expert AWS users and industry leaders from around the world. Upcoming AWS Community Days are scheduled for November 23 in Indonesia and December 14 in Kochi, India.

AWS re:Invent 2024 – Join us in Las Vegas to learn all things AWS. Our annual conference is the best, and fastest, way to grow your skills. If you can’t join us in person, you can attend virtually by registering to watch re:Invent online.

Browse all upcoming AWS-led in-person and virtual events and developer-focused events.

Create your AWS Builder ID and reserve your alias. Builder ID is a universal login credential that gives users access to AWS tools and resources, including over 600 free training courses, community features, and developer tools such as Amazon Q Developer beyond the AWS Management Console.

That’s all for this week. Check back next Monday for another Weekly Roundup!

Thanks to Odina Jacobs for the AWS Community Chile photo.

Eli

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Replicate changes from databases to Apache Iceberg tables using Amazon Data Firehose (in preview)

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/replicate-changes-from-databases-to-apache-iceberg-tables-using-amazon-data-firehose/

Today, we’re announcing the availability, in preview, of a new capability in Amazon Data Firehose that captures changes made in databases such as PostgreSQL and MySQL and replicates the updates to Apache Iceberg tables on Amazon Simple Storage Service (Amazon S3).

Apache Iceberg is a high-performance open-source table format for performing big data analytics. Apache Iceberg brings the reliability and simplicity of SQL tables to S3 data lakes and makes it possible for open source analytics engines such as Apache Spark, Apache Flink, Trino, Apache Hive, and Apache Impala to concurrently work with the same data.

This new capability provides a simple, end-to-end solution to stream database updates without impacting transaction performance of database applications. You can set up a Data Firehose stream in minutes to deliver change data capture (CDC) updates from your database. Now, you can easily replicate data from different databases into Iceberg tables on Amazon S3 and use up-to-date data for large-scale analytics and machine learning (ML) applications.

Typical Amazon Web Services (AWS) enterprise customers use hundreds of databases for transactional applications. To perform large scale analytics and ML on the latest data, they want to capture changes made in databases, such as when records in a table are inserted, modified, or deleted, and deliver the updates to their data warehouse or Amazon S3 data lake in open source table formats such as Apache Iceberg.

To do so, many customers develop extract, transform, and load (ETL) jobs to periodically read from databases. However, ETL readers impact database transaction performance, and batch jobs can add several hours of delay before data is available for analytics. To mitigate impact on database transaction performance, customers want the ability to stream changes made in the database. This stream is referred to as a change data capture (CDC) stream.

I have met multiple customers who use open source distributed systems, such as Debezium, with connectors to popular databases, an Apache Kafka Connect cluster, and Kafka Connect Sink to read the events and deliver them to the destination. The initial configuration and testing of such systems involves installing and configuring multiple open source components. It might take days or weeks. After setup, engineers have to monitor and manage clusters, and validate and apply open source updates, which adds to the operational overhead.

With this new data streaming capability, Amazon Data Firehose adds the ability to acquire and continually replicate CDC streams from databases to Apache Iceberg tables on Amazon S3. You set up a Data Firehose stream by specifying the source and destination. Data Firehose captures and continually replicates an initial data snapshot and then all subsequent changes made to the selected database tables as a data stream. To acquire CDC streams, Data Firehose uses the database replication log, which reduces impact on database transaction performance. When the volume of database updates increases or decreases, Data Firehose automatically partitions the data, and persists records until they’re delivered to the destination. You don’t have to provision capacity or manage and fine-tune clusters. In addition to the data itself, Data Firehose can automatically create Apache Iceberg tables using the same schema as the database tables as part of the initial Data Firehose stream creation and automatically evolve the target schema, such as new column addition, based on source schema changes.
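To illustrate the idea of an initial snapshot followed by a stream of changes, here is a toy sketch in Python. It is not how Data Firehose is implemented and it does not use the Data Firehose record format: the event shape (op, key, row) and the function names are assumptions made up for this example. It only shows why replaying inserts, updates, and deletes on top of a snapshot keeps a replica in sync with the source table.

```python
def apply_change(table, event):
    """Apply one change event to an in-memory replica keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        # Upsert: merge the new column values into the current row, if any.
        table[key] = {**table.get(key, {}), **event["row"]}
    elif op == "delete":
        table.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")
    return table


def replicate(snapshot_rows, change_events):
    """Load the initial snapshot, then replay the changes in order."""
    replica = {row["id"]: dict(row) for row in snapshot_rows}
    for event in change_events:
        apply_change(replica, event)
    return replica


# Hypothetical data for the IT inventory example.
snapshot = [{"id": 1, "name": "laptop"}, {"id": 2, "name": "monitor"}]
changes = [
    {"op": "insert", "key": 3, "row": {"id": 3, "name": "dock"}},
    {"op": "update", "key": 1, "row": {"name": "laptop-v2"}},
    {"op": "delete", "key": 2},
]
replica = replicate(snapshot, changes)
print(replica)
```

After the replay, the replica contains row 1 with the updated name and the new row 3, while row 2 is gone. The order of events matters, which is one reason a replication log is a natural source for CDC.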

Since Data Firehose is a fully managed service, you don’t have to rely on open source components, apply software updates, or incur operational overhead.

The continual replication of database changes to Apache Iceberg tables in Amazon S3 using Amazon Data Firehose provides you with a simple, scalable, end-to-end managed solution to deliver CDC streams into your data lake or data warehouse, where you can run large-scale analysis and ML applications.

Let’s see how to configure a new pipeline
To show you how to create a new CDC pipeline, I set up a Data Firehose stream using the AWS Management Console. As usual, I also have the choice to use the AWS Command Line Interface (AWS CLI), AWS SDKs, AWS CloudFormation, or Terraform.

For this demo, I choose a MySQL database on Amazon Relational Database Service (Amazon RDS) as source. Data Firehose also works with self-managed databases on Amazon Elastic Compute Cloud (Amazon EC2). To establish connectivity between my virtual private cloud (VPC)—where the database is deployed—and the RDS API without exposing the traffic to the internet, I create an AWS PrivateLink VPC service endpoint. You can learn how to create a VPC service endpoint for RDS API by following instructions in the Amazon RDS documentation.

I also have an S3 bucket to host the Iceberg table, and I have an AWS Identity and Access Management (IAM) role set up with the correct permissions. You can refer to the list of prerequisites in the Data Firehose documentation.

To get started, I open the console and navigate to the Amazon Data Firehose section. I can see the stream already created. To create a new one, I select Create Firehose stream.

I select a Source and Destination. In this example: a MySQL database and Apache Iceberg Tables. I also enter a Firehose stream name for my stream.

I enter the fully qualified DNS name of my Database endpoint and the Database VPC endpoint service name. I verify that Enable SSL is checked and, under Secret name, I select the name of the secret in AWS Secrets Manager where the database username and password are securely stored.

Next, I configure Data Firehose to capture specific data by specifying databases, tables, and columns using explicit names or regular expressions.
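As a rough illustration of selecting tables with regular expressions, the following Python sketch filters a list of table names against a set of patterns. The table names, the patterns, and the select_tables helper are hypothetical, and the sketch assumes that a pattern must match the whole name. The exact matching rules used by Data Firehose may differ, so check the documentation before relying on a pattern.

```python
import re


def select_tables(available, patterns):
    """Return the names that fully match at least one regular expression."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        name
        for name in available
        if any(regex.fullmatch(name) for regex in compiled)
    ]


# Hypothetical database.table names.
tables = [
    "inventory.orders",
    "inventory.order_items",
    "inventory.customers",
    "audit.events",
]
selected = select_tables(tables, [r"inventory\.order.*", r"audit\.events"])
print(selected)
```

Testing patterns this way against a list of existing table names is a cheap check that a pattern captures what you expect, and nothing more, before you create the stream.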

I must create a watermark table. A watermark, in this context, is a marker used by Data Firehose to track the progress of incremental snapshots of database tables. It helps Data Firehose identify which parts of the table have already been captured and which parts still need to be processed. I can create the watermark table manually or let Data Firehose automatically create it for me. In that case, the database credentials passed to Data Firehose must have permissions to create a table in the source database.
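The general idea of tracking snapshot progress with a marker can be sketched in a few lines of Python. This is a generic illustration of the concept and not the Data Firehose implementation: the in-memory watermark dictionary stands in for the watermark table, and the function names and the integer id key are assumptions for this example.

```python
def next_chunk(rows, watermark, chunk_size):
    """Return the next rows, ordered by id, that lie above the watermark."""
    pending = sorted(
        (row for row in rows if row["id"] > watermark["last_id"]),
        key=lambda row: row["id"],
    )
    return pending[:chunk_size]


def run_snapshot(rows, watermark, chunk_size, deliver):
    """Deliver the table chunk by chunk, advancing the watermark after each one."""
    while True:
        chunk = next_chunk(rows, watermark, chunk_size)
        if not chunk:
            break
        deliver(chunk)
        # Record progress only after the chunk has been delivered.
        watermark["last_id"] = chunk[-1]["id"]


# Hypothetical table with seven rows; the first three were captured earlier.
rows = [{"id": i} for i in range(1, 8)]
watermark = {"last_id": 3}
delivered = []
run_snapshot(rows, watermark, chunk_size=3, deliver=delivered.append)
print([[row["id"] for row in chunk] for chunk in delivered])
```

Because the marker is advanced only after a chunk is delivered, a process that stops midway can resume from the last recorded position instead of reading the whole table again.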

Next, I configure the S3 bucket Region and name to use. Data Firehose can automatically create the Iceberg tables when they don’t exist yet. Similarly, it can update the Iceberg table schema when detecting a change in your database schema.

As a final step, it’s important to enable Amazon CloudWatch error logging to get feedback about the stream’s progress and any errors. You can configure a short retention period on the CloudWatch log group to reduce the cost of log storage.
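The retention period can also be set programmatically. Here is a minimal sketch that assumes a boto3 CloudWatch Logs client is passed in as logs_client. The set_short_retention helper and the seven-day default are choices made for this example, not values from the Data Firehose documentation.

```python
def set_short_retention(logs_client, log_group_name, days=7):
    """Limit how long CloudWatch Logs keeps events for one log group.

    logs_client is expected to behave like boto3.client("logs").
    """
    logs_client.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=days,
    )
    return days
```

With boto3 installed and credentials configured, you could call set_short_retention(boto3.client("logs"), name), where name is the log group that you selected for the stream. CloudWatch Logs accepts only specific retention values, so pick one of the supported numbers of days.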

After having reviewed my configuration, I select Create Firehose stream.

Once the stream is created, it will start to replicate the data. I can monitor the stream’s status and check for any errors.

Now, it’s time to test the stream.

I open a connection to the database and insert a new row into a table.

Then, I navigate to the S3 bucket configured as the destination and I observe that a file has been created to store the data from the table.

I download the file and inspect its content with the parq command (you can install that command with pip install parquet-cli).

Of course, downloading and inspecting Parquet files is something I do only for demos. In real life, you’re going to use AWS Glue and Amazon Athena to manage your data catalog and to run SQL queries on your data.

Things to know
Here are a few additional things to know.

This new capability supports self-managed PostgreSQL and MySQL databases on Amazon EC2 and the following databases on Amazon RDS:

The team will continue to add support for additional databases during the preview period and after general availability. They told me they are already working on supporting SQL Server, Oracle, and MongoDB databases.

Data Firehose uses AWS PrivateLink to connect to databases in your Amazon Virtual Private Cloud (Amazon VPC).

When setting up an Amazon Data Firehose delivery stream, you can either specify specific tables and columns or use wildcards to specify a class of tables and columns. When you use wildcards, if new tables and columns are added to the database after the Data Firehose stream is created and if they match the wildcard, Data Firehose will automatically create those tables and columns in the destination.

Pricing and availability
The new data streaming capability is available today in all AWS Regions except the China Regions, the AWS GovCloud (US) Regions, and the Asia Pacific (Malaysia) Region. We want you to evaluate this new capability and provide us with feedback. There are no charges for your usage at the beginning of the preview. At some point in the future, it will be priced based on your actual usage, for example, based on the quantity of bytes read and delivered. There are no commitments or upfront investments. Make sure to read the pricing page to get the details.

Now, go configure your first continual database replication to Apache Iceberg tables on Amazon S3 and visit http://aws.amazon.com/firehose.

— seb

Secure by Design: AWS enhances centralized security controls as MFA requirements expand

Post Syndicated from Arynn Crow original https://aws.amazon.com/blogs/security/secure-by-design-aws-enhances-centralized-security-controls-as-mfa-requirements-expand/

At Amazon Web Services (AWS), we’ve built our services with secure by design principles from day one, including features that set a high bar for our customers’ default security posture. Strong authentication is a foundational component in overall account security, and the use of multi-factor authentication (MFA) is one of the simplest and most effective ways to help prevent unauthorized individuals from gaining access to systems or data. We have found that enabling MFA prevents more than 99% of password-related attacks. Today, we’re sharing progress from the past year since we first announced that we would raise our customers’ default security posture by requiring the use of MFA for root users in the AWS Management Console.

In recent years, the typical workplace has evolved significantly. With an increase in practices like hybrid work and bring-your-own-device (BYOD) policies, defining security boundaries became much more complex. Most organizations have adjusted their security perimeters to emphasize identity-based controls, which often made user passwords the new weakest link in the perimeter. Users sometimes choose low-complexity passwords for ease of use, or reuse complex passwords across multiple websites, which substantially increases risk when a website experiences a data breach.

We take many steps to improve our customers’ resilience against these types of risks. For example, we monitor online sources for compromised credentials and block customers from using these in AWS. We also guard against setting weak passwords, never suggest default passwords for users to use, and when we detect unusual sign-in activity for customers who haven’t yet enabled MFA, we validate the sign-in with one-time PIN challenges to their primary email address. Despite these measures, passwords alone remain inherently risky.

We recognized two key opportunities to improve the situation. The first is to accelerate our customers’ MFA adoption, raising the bar for default security posture at AWS by requiring MFA for highly privileged users. In May 2024, we began requiring MFA for AWS Organizations management account root users, starting with users in larger environments. Then, in June, we launched support for FIDO2 passkeys as an MFA method, to offer customers an additional highly secure but also user-friendly way to align with their security requirements. At the same time, we announced that our MFA requirements expanded to include root users in standalone accounts. After AWS Identity and Access Management (IAM) launched FIDO2 passkey support in June 2024, customer registration rates for phishing-resistant MFA increased by over 100%. Between April and October 2024, more than 750,000 AWS root users enabled MFA.

The second opportunity we recognized is to eliminate unnecessary passwords altogether. On top of the security issues with passwords, attempting to secure password-based authentication introduces operational overhead for customers, especially those operating at scale and those with regulatory requirements to rotate passwords periodically. Today, we are launching a new capability to centrally manage root access for accounts managed in AWS Organizations. This capability enables customers to greatly reduce the number of passwords they have to manage while still maintaining strong controls over the use of root principals. Customers can now enable centralized root access with a simple configuration change through the IAM console or the AWS CLI, a process which is described further in this post. Then, customers can remove the long-term credentials (including passwords or long-term access keys) of member account root users in their organizations. This will improve the security posture of our customers while simultaneously reducing their operational effort.

We strongly recommend that Organizations customers get started enabling our centralized root access feature today to experience these benefits. However, in cases where customers continue to maintain root users, it’s essential to make sure that these highly privileged credentials are well-protected. With enhanced support for our customers operating at scale, as well as additional features like passkeys, we’re expanding our MFA requirements to member accounts in AWS Organizations. Beginning in the Spring of 2025, customers who have not enabled central management of root access will be required to register MFA for their AWS Organizations member account root users in order to access the AWS Management Console. As with our previous expansions to management and standalone accounts, we will roll this change out gradually and notify individual customers who are required to take action in advance, to help customers adhere to the new requirements while minimizing impact to their day-to-day operations.

You can learn more about our new feature to centrally manage root access in the IAM User Guide, and more about using MFA at AWS in the AWS MFA topic of the IAM User Guide.

If you have feedback about this post, submit comments in the Comments section below.

Arynn Crow


Arynn Crow is the Principal Product Manager of Account Protection for AWS Identity. Arynn started at Amazon in 2012 as a customer service agent, trying out many different roles over the years before finding her happy place in security and identity in 2017. Arynn focuses on account protection, regulation and standards, and secure by design initiatives.

Centrally managing root access for customers using AWS Organizations

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/centrally-managing-root-access-for-customers-using-aws-organizations/

AWS Identity and Access Management (IAM) is launching a new capability allowing security teams to centrally manage root access for member accounts in AWS Organizations. You can now easily manage root credentials and perform highly privileged actions.

Managing root user credentials at scale
For a long time, Amazon Web Services (AWS) accounts were provisioned with highly privileged root user credentials, which had unrestricted access to the account. This root access, while powerful, also posed significant security risks. Each AWS account’s root user had to be secured by adding layers of protection like multi-factor authentication (MFA). Security teams were required to manage and secure these root credentials manually. The process involved rotating credentials periodically, storing them securely, and making sure that the credentials complied with security policies.

As our customers expanded their AWS environments, this manual approach became cumbersome and prone to error. For example, large enterprises operating hundreds or thousands of member accounts struggled to secure root access consistently across all accounts. The manual intervention not only added operational overhead but also created a lag in account provisioning, preventing full automation and increasing security risks. Root access, if not properly secured, could lead to account takeovers and unauthorized access to sensitive resources.

Furthermore, whenever specific root actions such as unlocking an Amazon Simple Storage Service (Amazon S3) bucket policy or an Amazon Simple Queue Service (Amazon SQS) resource policy were required, security teams had to retrieve and use root credentials, which only increased the attack surface. Even with rigorous monitoring and strong security policies, maintaining long-term root credentials opened doors to potential mismanagement, compliance risks, and manual errors.

Security teams began seeking a more automated, scalable solution. They needed a way to not only centralize the management of root credentials but also programmatically manage root access without needing long-term credentials in the first place.

Centrally manage root access
With the new ability to centrally manage root access, we address the longstanding challenge of managing root credentials across multiple accounts. This launch introduces two capabilities: the central management of root credentials, and root sessions. Together, they offer security teams a secure, scalable, and compliant way to manage root access across AWS Organizations member accounts.

Let’s first discuss the central management of root credentials. With this capability, you can now centrally manage and secure privileged root credentials across all accounts in AWS Organizations. Root credentials management allows you to:

  • Remove long-term root credentials – Security teams can now programmatically remove root user credentials from member accounts, confirming that no long-term privileged credentials are left vulnerable to misuse.
  • Prevent credential recovery – It not only removes the credentials but also prevents their recovery, safeguarding against any unintended or unauthorized root access in the future.
  • Provision secure-by-default accounts – Because you can now create member accounts without root credentials from the start, you no longer need to apply additional security measures like MFA after account provisioning. Accounts are secure by default, which drastically reduces security risks associated with long-term root access and helps simplify the entire provisioning process.
  • Help to stay compliant – Root credentials management allows security teams to demonstrate compliance by centrally discovering and monitoring the status of root credentials across all member accounts. This automated visibility confirms that no long-term root credentials exist, making it easier to meet security policies and regulatory requirements.

But how can we make sure it remains possible to perform selected root actions on the accounts? This is the second capability we launch today: root sessions. It offers a secure alternative to maintaining long-term root access. Instead of manually accessing root credentials whenever privileged actions are required, security teams can now gain short-term, task-scoped root access to member accounts. This capability makes sure that actions such as unlocking S3 bucket policies or SQS queue policies can be performed securely without the need for long-term root credentials.

Key benefits of root sessions include:

  • Task-scoped root access – AWS enables short-term root access for specific actions, adhering to the best practices of least privilege. This limits the scope of what can be done and minimizes the duration of access, reducing potential risks.
  • Centralized management – You can now perform privileged root actions from a central account without needing to log in to each member account individually. This streamlines the process and reduces the operational burden on security teams, allowing them to focus on higher-level tasks.
  • Alignment with AWS best practices – By using short-term credentials, organizations align themselves with AWS security best practices, which emphasize the principle of least privilege and the use of short-term, temporary access where possible.

This new capability does not grant full root access. It provides temporary credentials for performing one of the following five actions. The first three are available with central management of root credentials. The last two become available when you enable root sessions.

  • Auditing root user credentials – Read-only access to review root user information
  • Re-enabling account recovery – Reactivating account recovery without root credentials
  • Deleting root user credentials – Removing console passwords, access keys, signing certificates, and MFA devices
  • Unlocking an S3 bucket policy – Editing or deleting an S3 bucket policy that denies all principals
  • Unlocking an SQS queue policy – Editing or deleting an Amazon SQS resource policy that denies all principals

How to obtain root credentials on a member account
In this demo, I show you how to prepare your management account, create a member account without root credentials, and obtain temporary root credentials to make one of the five authorized API calls on the member account. I assume you have an organization already created.

First, I create a member account.

aws organizations create-account    \
     --email <my member account email address> \
     --account-name 'Root Accounts Demo account'
{
    "CreateAccountStatus": {
        "Id": "car-695abd4ee1ca4b85a34e5dcdcd1b944f",
        "AccountName": "Root Accounts Demo account",
        "State": "IN_PROGRESS",
        "RequestedTimestamp": "2024-09-04T20:04:09.960000+00:00"
    }
}
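Account creation is asynchronous, which is why the state above is IN_PROGRESS. As a minimal sketch, the request can be polled with aws organizations describe-create-account-status until it completes. The helper name wait_for_account and the default polling interval are assumptions made for illustration.

```shell
# Sketch: wait until an AWS Organizations account creation request finishes.
# wait_for_account is a hypothetical helper name.
wait_for_account() {
    # $1: request Id returned by create-account (the "car-..." value)
    # $2: seconds to wait between checks (defaults to 10)
    local request_id="$1"
    local interval="${2:-10}"
    local state
    while :; do
        state=$(aws organizations describe-create-account-status \
            --create-account-request-id "$request_id" \
            --query 'CreateAccountStatus.State' \
            --output text) || return 1
        case "$state" in
            SUCCEEDED) return 0 ;;
            FAILED)    return 1 ;;
            *)         sleep "$interval" ;;   # still IN_PROGRESS
        esac
    done
}
```

Calling wait_for_account with the Id from the create-account output returns 0 once the state is SUCCEEDED and a non-zero status if the request fails.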

Then, I enable the two new capabilities on my management account. Don’t worry, these commands don’t alter the behavior of the accounts in any way other than enabling use of the new capability.

➜  aws organizations enable-aws-service-access \
        --service-principal iam.amazonaws.com

➜  aws iam enable-organizations-root-credentials-management
{
    "OrganizationId": "o-rlrup7z3ao",
    "EnabledFeatures": [
        "RootCredentialsManagement"
    ]
}

➜  aws iam enable-organizations-root-sessions
{
    "OrganizationId": "o-rlrup7z3ao",
    "EnabledFeatures": [
        "RootSessions",
        "RootCredentialsManagement"
    ]
}
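To confirm the result later, for example from an automation script, the enabled features can be read back with aws iam list-organizations-features. The following is a sketch under the assumption that the command is run from the management account. The helper name root_features_enabled is hypothetical.

```shell
# Sketch: confirm that both centralized root access features are enabled.
# root_features_enabled is a hypothetical helper name.
root_features_enabled() {
    local features
    local wanted
    features=$(aws iam list-organizations-features \
        --query 'EnabledFeatures' \
        --output text) || return 1
    # Text output is a whitespace-separated list of feature names;
    # the unquoted echo normalizes tabs to single spaces.
    for wanted in RootCredentialsManagement RootSessions; do
        case " $(echo $features) " in
            *" $wanted "*) ;;
            *) echo "missing feature: $wanted" >&2; return 1 ;;
        esac
    done
}
```

The function returns 0 only when both RootCredentialsManagement and RootSessions are listed.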

Alternatively, I can also use the console on the management account. Under Access management, I select Account settings.

Root Access Management

Now, I’m ready to make requests to obtain temporary root credentials. I have to pass one of the five managed IAM policies to scope down the credentials to a specific action.

➜  aws sts assume-root \
       --target-principal <my member account id> \
       --task-policy-arn arn=arn:aws:iam::aws:policy/root-task/S3UnlockBucketPolicy 

{
    "Credentials": {
        "AccessKeyId": "AS....XIG",
        "SecretAccessKey": "ao...QxG",
        "SessionToken": "IQ...SS",
        "Expiration": "2024-09-23T17:44:50+00:00"
    }
}
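Each of the five actions corresponds to one managed policy under policy/root-task/. The sketch below maps a short task name to a policy ARN. Only S3UnlockBucketPolicy appears in this post. The other four policy names are my understanding of the IAM documentation and should be verified there, and the short names and the helper name task_policy_arn are hypothetical.

```shell
# Sketch: map a short task name to a managed root-task policy ARN.
# The short names on the left are hypothetical. Apart from
# S3UnlockBucketPolicy, the policy names are assumptions to verify
# against the IAM documentation.
task_policy_arn() {
    local name
    case "$1" in
        audit)              name=IAMAuditRootUserCredentials ;;
        enable-recovery)    name=IAMCreateRootUserPassword ;;
        delete-credentials) name=IAMDeleteRootUserCredentials ;;
        unlock-s3)          name=S3UnlockBucketPolicy ;;
        unlock-sqs)         name=SQSUnlockQueuePolicy ;;
        *) echo "unknown task: $1" >&2; return 1 ;;
    esac
    echo "arn:aws:iam::aws:policy/root-task/$name"
}
```

The result can be passed to assume-root as --task-policy-arn "arn=$(task_policy_arn unlock-s3)".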

Once I obtain the access key ID, the secret access key, and the session token, I use them as usual with the AWS Command Line Interface (AWS CLI) or an AWS SDK.

For example, I can pass these three values as environment variables.

$ export AWS_ACCESS_KEY_ID=ASIA356SJWJITG32xxx
$ export AWS_SECRET_ACCESS_KEY=JFZzOAWWLocoq2of5Exxx
$ export AWS_SESSION_TOKEN=IQoJb3JpZ2luX2VjEMb//////////wEaCXVxxxx

Now that I have received the temporary credentials, I can make a restricted API call as root on the member account. First, I verify I now have root credentials. The Arn field confirms I’m working with the root account.


# Call get-caller-identity and observe that I'm root in the member account
$ aws sts get-caller-identity
{
   "UserId": "012345678901",
   "Account": "012345678901",
   "Arn": "arn:aws:iam::012345678901:root"
}
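In a script, the same check can be automated so that nothing runs unless the session really is a root session. This is a minimal sketch. The helper names is_root_arn and require_root_identity are hypothetical, and the pattern only tests the shape of the ARN shown above.

```shell
# Sketch: fail fast unless the current credentials belong to an account root.
# is_root_arn and require_root_identity are hypothetical helper names.
is_root_arn() {
    # Matches ARNs shaped like arn:aws:iam::012345678901:root
    case "$1" in
        arn:*:iam::[0-9]*:root) return 0 ;;
        *) return 1 ;;
    esac
}

require_root_identity() {
    local arn
    arn=$(aws sts get-caller-identity --query 'Arn' --output text) || return 1
    is_root_arn "$arn"
}
```

A script can then start with require_root_identity || exit 1 before issuing the privileged command.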

Then, I use the delete-bucket-policy command from Amazon S3 to remove an incorrect policy that has been applied to a bucket. The invalid policy removed all bucket access for everybody. Removing such a policy requires root credentials.

aws s3api delete-bucket-policy --bucket my_bucket_with_incorrect_policy

When there is no output, it means the operation is successful. I can now apply a correct access policy to this bucket.

Credentials are valid only for 15 minutes. I wrote a short shell script to automate the process of getting the credentials as JSON, exporting the correct environment variables, and issuing the command I want to run as root.
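The script itself is not reproduced here, so the following is only a sketch of what such automation could look like, not the original. Instead of parsing JSON, it asks the AWS CLI for text output with --query and --output text, which avoids a dependency on a JSON parser. The function name run_as_root is hypothetical.

```shell
# Sketch: run one command with temporary root credentials for a member account.
# run_as_root is a hypothetical helper name.
# Usage: run_as_root <member-account-id> <task-policy-name> <command> [args...]
run_as_root() {
    local account_id="$1"
    local task_policy="$2"
    shift 2
    local creds
    # Ask for the three values as one tab-separated line of text
    creds=$(aws sts assume-root \
        --target-principal "$account_id" \
        --task-policy-arn "arn=arn:aws:iam::aws:policy/root-task/$task_policy" \
        --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
        --output text) || return 1
    # The subshell keeps the root credentials out of the calling shell
    (
        read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<EOF
$creds
EOF
        export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
        "$@"
    )
}
```

With this in place, the bucket policy from the previous step could be removed with run_as_root <my member account id> S3UnlockBucketPolicy aws s3api delete-bucket-policy --bucket <my bucket name>. Because the credentials live only inside the subshell, they disappear as soon as the command returns.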

Availability
Central management of root access is available at no additional cost in all AWS Regions except AWS GovCloud (US) and AWS China Regions, where there is no root account. Root sessions are available everywhere.

You can start using it through the IAM console, the AWS CLI, or an AWS SDK. For more information, visit AWS account root user in our documentation and follow best practices for securing your AWS accounts.

— seb