Tag Archives: Amazon Elastic Container Service

Celebrating 10 Years of Amazon ECS: Powering a Decade of Containerized Innovation

Post Syndicated from Matheus Guimaraes original https://aws.amazon.com/blogs/aws/celebrating-10-years-of-amazon-ecs-powering-a-decade-of-containerized-innovation/

Today, we celebrate 10 years of Amazon Elastic Container Service (ECS) and its incredible journey of pushing the boundaries of what’s possible in the cloud! What began as a solution to streamline running Docker containers on Amazon Web Services (AWS) has evolved into a cornerstone technology, offering both impressive performance and operational simplicity, including a serverless option with AWS Fargate for seamless container orchestration.

Over the past decade, Amazon ECS has become a trusted solution for countless organizations, providing the reliability and performance that customers such as SmugMug rely on to power their operations without being bogged down by infrastructure challenges. As Andrew Shieh, Principal Engineer at SmugMug, shares, Amazon ECS has been the “unsung hero” behind their seamless transition to AWS and efficient handling of massive data operations, such as migrating petabytes of photos to Amazon Simple Storage Service (Amazon S3). “The blazingly fast container spin-ups allow us to deliver awesome experiences to our customers,” he adds. It’s this kind of dependable support that has made Amazon ECS a favorite among developers and platform teams, helping them scale their solutions and innovate over the years.

In the early 2010s, as containerized services like Docker gained traction, developers started looking for efficient ways to manage and scale their applications in this new paradigm. Traditional infrastructure was cumbersome, and managing containers at scale was challenging. Amazon ECS arrived in 2014, just when developers were looking to adopt containers at scale. It offered a fully managed, and reliable solution that streamlined container orchestration on AWS. Teams could focus on building and deploying applications without the overhead of managing clusters or complex infrastructure, ushering in a new era of cloud-native development.

When the Amazon ECS team set out to build the service, their vision was clear. As Deepak Singh, product manager who launched Amazon ECS now serving as VP of Next Generation Developer Experience, said at the time, “Our customers wanted a solution that was deeply integrated with AWS, that could work for them at scale and could grow as they grew.” Amazon ECS was designed to use the best of what AWS has to offer—scalability, availability, resilience, and security—to give customers the confidence to run their applications in production environments.

Evolution
Amazon ECS has consistently innovated for customers over the past decade. It marked the beginning of the container innovation journey at AWS, paving the way for a broader ecosystem of container-related services that have transformed how businesses build and manage applications.

Smartsheet proudly sings the praises of the significant impact that Amazon ECS, and especially AWS Fargate, had on their business to date. “Our teams can deploy more frequently, increase throughput, and reduce the engineering time to deploy from hours to minutes. We’ve gone from weekly deployments to deployments that we do multiple times a day. And from what used to be hours of at least two engineers’ time, we’ve been able to shave that down to several minutes,” said Skylar Graika, distinguished engineer at Smartsheet. ” Within the last year, we have been able to scale out its capacity by 50 times, and by leveraging deep integrations across AWS services, we have improved efficiencies and simplified our security and compliance process. Additionally, by adopting AWS Graviton with the Fargate deployments, we’ve seen a 20 percent reduction in cost.”

Amazon ECS played a pivotal role as the starting point for a decade of container evolution at AWS and today, it still stands as one of the most scalable and reliable container orchestration solutions, powering massive operations such as Prime Day 2024, where Amazon launched an impressive 77.24 million ECS tasks, Rufus, a shopping assistant experience powered by generative AI that uses Amazon ECS as part of its core architecture and so many others.

Rustem Feyzkhanov, ML engineering manager at Instrumental, and AWS Machine Learning Hero, is quick to recognize the increased efficiency gained from adopting the service. “Amazon ECS has become an indispensable tool in our work,” says Rastem. “Over the past years, it has simplified container management and service scaling, allowing us to focus on development rather than infrastructure. This service makes it possible for application code teams to co-own infrastructure and that speeds up the development process.”

Timeline
Let’s have a look at some of the key milestones that have shaped the evolution of ECS, marking pivotal moments that changed how customers harness the power of containers on AWS.

2014Introducing Amazon EC2 Container Service! – Check out this nostalgic blog post, which marked the release of ECS in preview mode. It shows how much functionality the service already launched with making a big impact from the get-go! Customers could already run, stop, and manage Docker containers on a cluster of Amazon Elastic Compute Cloud (EC2) instances, with built-in resource management and task scheduling. It became generally available on April 9, 2015.

2015Amazon ECS auto-scaling – With the introduction of added support for more Amazon CloudWatch metrics, customers could now automatically scale their clusters in and out by monitoring the CPU and memory usage in the cluster and configuring threshold values for auto scaling. I think this is a great example of how seemingly modest releases can have a huge impact for customers. Another impactful release was the introduction of Amazon ECR, a fully managed container registry that streamlines container storage and deployment.

2016Application Load Balancer (ALB) for ECS – The introduction of ALB for ECS, provided advanced routing features for containerized applications. ALB enabled more efficient load balancing across microservices, improving traffic management and scalability for ECS workloads. Windows users also benefitted from various releases this year including the added support for Windows Server 2016 with several AMIs and right and beta support for Windows Server Containers.

2017Introducing AWS Fargate! – Fargate was a huge leap forward towards customers being able to run containers without managing the underlying infrastructure, which significantly streamlined their operations. Developers no longer had to worry about provisioning, scaling, or maintaining the EC2 instances on which their containers ran and could now focus entirely on their application logic while AWS handled the rest. This helped them to scale faster and innovate more freely, accelerating their cloud-centered journeys and transforming how they approached containerized applications.

2018AWS Auto Scaling – With this release, teams could now build scaling plans easily for their Amazon ECS tasks. This year also saw the release of many improvements such as moving Amazon ECR to its own console experience outside of the Amazon ECS console, integration of Amazon ECS with AWS Cloud Map, and many others. Additionally, AWS Fargate continued to expand into regions world-wide.

2019Arm-based Graviton2 instances available on Amazon ECS – AWS Graviton2 was released during a time when many businesses were turning their attention towards reprioritizing their sustainability goals. With a focus on improved performance and lower power usage, EC2-instances powered by Graviton2 were supported on Amazon ECS from day 1 of their launch. Customers could take full advantage of this new groundbreaking custom chipset specially built for the cloud. Another great highlight from this year was the launch of AWS Fargate Spot which helped customers to achieve significant cost reductions.

2020Bottlerocket – An open-source, Linux-based operating system optimized for running containers. Designed to improve security and simplify updates, Bottlerocket helped Amazon ECS users achieve greater efficiency and stability in managing containerized workloads.

2021ECS Exec – Amazon ECS introduced ECS Exec in March 2021. With it, customers could run commands directly inside a running container on Amazon EC2 or AWS Fargate. This feature provided enhanced troubleshooting and debugging capabilities without requiring to modify or redeploy containers, streamlining operational workflows. This year also saw the release of Amazon ECS Windows containers streamlined operations for those running them in their cluster.

2022Amazon ECS introduces Service Connect – The release of ECS Service Connect marked a pivotal moment for organizations running microservices architectures on Amazon ECS because it abstracted away much of the complexity involved in service-to-service networking. This dramatically streamlined management of communication between services. With a native service discovery and service mesh capability, developers could now define and manage how their services interacted with each other seamlessly, improving observability, resilience, and security without the need to manage custom networking or load balancers.

2023Amazon GuardDuty ECS runtime monitoring – Last year, Amazon GuardDuty introduced ECS Runtime Monitoring for AWS Fargate, enhancing security by detecting potential threats within running containers. This feature provides continuous visibility into container workloads, improving security posture without additional performance overhead.

2024Amazon ECS Fargate with EBS Integration – In January this year, Amazon ECS and AWS Fargate added support for Amazon EBS volumes, enabling persistent storage for containers. This integration allows users to attach EBS volumes to Fargate tasks, making it much more effortless to deploy storage and support data intensive applications.

Where are we now?
Amazon ECS is in an exciting place right now as it enjoys a level of maturity that allows it to keep innovating while delivering huge value to both new and existing customers. This year has seen many improvements to the service making it increasingly more secure, cost-effective and straightforward to use.

This includes releases such as the support for automatic traffic encryption using TLS in Service Connect;  enhanced stopped task error messages which makes it more straightforward to troubleshoot task launch failures; and the ability to restart containers without having to relaunch the task. The introduction of Graviton2 based instances with AWS Fargate Spot provided customers with a great opportunity to double down on their cost savings.

As usual with AWS, the Amazon ECS team are very focused on delighting customers. “With Amazon ECS and AWS Fargate, we make it really easy for you to focus on your differentiated business logic while leveraging all the powerful compute that AWS offers without having to manage it,” says Nick Coult, director of Product and Science, Serverless Compute. “Our vision with these services was, and still is, to enable you to minimize infrastructure management, write less code, architect for extensibility, and drive high performance, resilience, and security. And, we have continuously innovated in these areas with this goal in mind over the past 10 years. At Amazon ECS, we remain steadfast in our commitment to delivering agility without compromising security, empowering developers with an exceptional experience, unlocking broader, simpler integrations, and new possibilities for emerging workloads like generative AI.”

Conclusion
Looking back on its history, it’s clear to me that ECS is a testament to the AWS approach of working backwards from customer needs. From its early days of streamlining container orchestration to the transformative introduction of Fargate and Service Connect, ECS has consistently evolved to remove barriers for developers and businesses alike.

As we look to the future, I think ECS will keep pushing boundaries, enabling even more innovative and scalable solutions. I encourage everyone to continue exploring what ECS has to offer, discovering new ways to build and pushing the platform to its full potential. There’s a lot more to come, and I’m excited to see where the journey takes us.

Learning resources
If you’re new to Amazon ECS, I recommend you read the comprehensive and accessible Getting Started With Amazon ECS guide.

When you’re ready to skill up with some hands-on free training, I recommend trying this self-paced Amazon ECS workshop, which covers many aspects of the service, including many of the features mentioned in this post.

Thank you, Amazon ECS, and thank you to all of you who use this service and continue to help us make it better for you. Here’s to another 10 years of container innovation! 🥂

AWS Weekly Roundup: Amazon DynamoDB, AWS AppSync, Storage Browser for Amazon S3, and more (September 9, 2024)

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-dynamodb-aws-appsync-storage-browser-for-amazon-s3-and-more-september-9-2024/

Last week, the latest AWS Heroes arrived! AWS Heroes are amazing technical experts who generously share their insights, best practices, and innovative solutions to help others.

The AWS GenAI Lofts are in full swing with San Francisco and São Paulo open now, and London, Paris, and Seoul coming in the next couple of months. Here’s an insider view from a workshop in San Francisco last week.

AWS GenAI Loft San Francisco workshop

Last week’s launches
Here are the launches that got my attention.

Storage Browser for Amazon S3 (alpha release) – An open source Amplify UI React component that you can add to your web applications to provide your end users with a simple interface for data stored in S3. The component uses the new ListCallerAccessGrants API to list all S3 buckets, prefixes, and objects they can access, as defined by their S3 Access Grants.

AWS Network Load Balancer – Now supports a configurable TCP idle timeout. For more information, see this Networking & Content Devliery Blog post.

AWS Gateway Load Balancer – Also supports a configurable TCP idle timeout. More info is available in this blog post.

Amazon ECS – Now supports AWS Graviton-based Spot compute with AWS Fargate. This allows to run fault-tolerant Arm-based applications with up to 70% lower costs compared to on-demand.

Zone Groups for Availability Zones in AWS Regions – We are working on extending the Zone Group construct to Availability Zones (AZs) with a consistent naming format across all AWS Regions.

Amazon Managed Service for Apache Flink – Now supports Apache Flink 1.20. You can upgrade to benefit from bug fixes, performance improvements, and new functionality added by the Flink community.

AWS Glue – Now provides job queuing. If quotas or limits are insufficient to start a Glue job, AWS Glue will now automatically queue the job and wait for limits to free up.

Amazon DynamoDB – Now supports Attribute-Based Access Control (ABAC) for tables and indexes (limited preview). ABAC is an authorization strategy that defines access permissions based on tags attached to users, roles, and AWS resources. Read more in this Database Blog post.

Amazon BedrockStability AI’s top text-to-image models (Stable Image Ultra, Stable Diffusion 3 Large, and Stable Image Core) are now available to generate high-quality visuals with speed and precision.

Amazon Bedrock Agents – Now supports Anthropic Claude 3.5 Sonnet, including Anthropic recommended tool use for function calling which can improve developer and end user experience.

Amazon Sagemaker Studio – You can now use Amazon EMR Serverless directly from your Studio Notebooks to interactively query, explore and visualize data, and run Apache Spark jobs.

Amazon SageMakerIntroducing sagemaker-core, a new Python SDK that provides an object-oriented interface for interacting with SageMaker resources such as TrainingJob, Model, and Endpoint resource classes.

AWS AppSync – Improves monitoring by including DEBUG and INFO logging levels for its GraphQL APIs. You now have more granular control over log verbosity to make it easier to troubleshoot your APIs while optimizing readability and costs.

Amazon WorkSpaces Pools – You can now bring your Windows 10 or 11 licenses and provide a consistent desktop experience when switching between on-premise and virtual desktops.

Amazon SES – A new enhanced onboarding experience to help discover and activate key SES features, including recommendations for optimal setup and the option to enable the Virtual Deliverability Manager to enhance email deliverability.

Amazon Redshift – Now the Amazon Redshift Data API support session reuse to retain the context of a session from one query execution to another, reducing connection setup latency on repeated queries to the same data warehouse.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional projects, blog posts, and news items that you might find interesting:

Amazon Q Developer Code Challenge – At the 2024 AWS Summit in Sydney, we put two teams (one using Amazon Q Developer, one not) in a battle of coding prowess, starting with basic math and string manipulation, up to including complex algorithms and intricate ciphers. Here are the results.

Amazon Q Developer Code Challenge graph

AWS named as a Leader in the first Gartner Magic Quadrant for AI Code Assistants – It’s great to see how new technologies make the whole software development lifecycle easier and increase developer productivity.

Build powerful RAG pipelines with LlamaIndex and Amazon Bedrock – A deep dive tutorial that covers simple and advanced use cases.

Evaluating prompts at scale with Prompt Management and Prompt Flows for Amazon Bedrock – To implement an automated prompt evaluation system to streamline prompt development and improve the overall quality of AI-generated content.

Amazon Redshift data ingestion options – An overview of the available ingestion methods and how they work for different use cases.

Amazon Redshift data ingestion options

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. AWS Summits for this year are coming to an end. There are two more left that you can still register: Toronto (September 11), and Ottawa (October 9).

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs driven by expert AWS users and industry leaders from around the world. Upcoming AWS Community Days are in the SF Bay Area (September 13), where our own Antje Barth is a keynote speaker, Argentina (September 14), Armenia (September 14), and DACH (in Munich on September 17).

AWS GenAI Lofts – Collaborative spaces and immersive experiences that showcase AWS’s cloud and AI expertise, while providing startups and developers with hands-on access to AI products and services, exclusive sessions with industry leaders, and valuable networking opportunities with investors and peers. Find a GenAI Loft location near you and don’t forget to register.

Browse all upcoming AWS-led in-person and virtual events here.

That’s all for this week. Check back next Monday for another Weekly Roundup!

Danilo

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Serverless ICYMI Q2 2024

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/serverless-icymi-q2-2024/

Welcome to the 26th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

Calendar

Calendar

EDA Day – London 2024

The AWS Serverless DA team hosted the third Event-Driven Architecture (EDA) Day in London on May 14th. This event brought together prominent figures in the event-driven architecture community, AWS, and customer speakers.

EDA Day covered 13 sessions, 2 workshops, and a Q&A panel. David Boyne was the keynote speaker with a talk “Complexity is the Gotcha of Event-Driven Architecture”. There were AWS speakers including Matthew Meckes, Natasha Wright, Julian Wood, Gillian Amstrong, Josh Kahn, Veda Ramen, and Uma Ramadoss. There was also an impressive lineup of guest speakers, Daniele Frasca, David Anderson, Ryan Cormack, Sarah Hamilton, Sheen Brisals, Marcin Sodkiewicz, and Ben Ellerby.

Videos are available on YouTube

EDA Day London

EDA Day London

The future of Serverless

There has been a lot of talk about the future of serverless, with this year being the 10th anniversary of AWS Lambda. Eric Johnson addresses the topic in his ServerlessDays Milan keynote, “Now serverless is all grown up, what’s next”.

AWS Lambda

AWS launched support for the latest release of Ruby 3.3 is based on the new Amazon Linux 2023 runtime. The Ruby 3.3 runtime also provides access to the latest Ruby language features.

There is a new guide on how to retrieve data about Lambda functions that use a deprecated runtime.

Learn how to run code after returning a response from an AWS Lambda function. This post shows how to return a synchronous function response as soon as possible, yet also perform additional asynchronous work after you send the response. For example, you may store data in a database or send information to a logging system.

See how you can use the circuit-breaker pattern with Lambda extensions and Amazon DynamoDB. The circuit breaker pattern can help prevent cascading failures and improve overall system stability.

Circuit-breaker pattern

Circuit-breaker pattern

Lambda functions now scale up to 12X faster in the AWS GovCloud (US) Regions.

Powertools for AWS Lambda (Python) adds support for Agents for Amazon Bedrock.

The AWS SDK for JavaScript v2 enters maintenance mode on September 8, 2024 and reaches end-of-support on September 8, 2025.

Amazon CloudWatch Logs introduced Live Tail streaming CLI support.

Amazon ECS and AWS Fargate

You can now secure Amazon Elastic Container Service (Amazon ECS) workloads on AWS Fargate with customer managed keys (CMKs). Once you add your keys to AWS Key Management Service (AWS KMS), you can use these to encrypt the underlying ephemeral storage of an Amazon ECS task on AWS Fargate.

Windows containers on AWS Fargate now start faster, up to 42% for Windows Server 2022 Core. AWS has optimized the Windows Server AMIs, introduced EC2 fast launch with pre-provisioned snapshots, and reduced network latency.

Amazon ECS Service Connect is a networking capability to simplify service discovery, connectivity, and traffic observability for Amazon ECS. You can now proactively scale Amazon ECS services by using custom metrics.

ECS Connect custom metrics

ECS Service Connect custom metrics

AWS Step Functions

The AWS Step Functions TestState API allows you to test individual states independently and to integrate testing into your preferred development workflows. Learn how to accelerate workflow development to iterate faster.

Step Functions TestState API

Step Functions TestState API

Amazon EventBridge

Amazon EventBridge Pipes now supports event delivery through AWS PrivateLink. You can send events from an event source located in an Amazon Virtual Private Cloud (VPC) to a Pipes target without traversing the public internet.

Amazon Timestream for LiveAnalytics is now an EventBridge Pipes target. Timestream for LiveAnalytics is a fast, scalable, purpose-built time series database that makes it easy to store and analyze trillions of time series data points per day.

EventBridge has a new console dashboard which provides a centralized view of your resources, metrics, and quotas. The console has an improved Learn page and other console enhancements. When using the CloudFormation template export for Pipes, you can also generate the IAM role. There is a new Rules tab in the Event Bus detail page, and the monitoring tab in the Rule detail page now includes additional metrics.

EventBridge Scheduler has some new API request metrics for improved observability.

Generative AI

Amazon Bedrock is a fully managed Generative AI service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API. Bedrock now supports new models, including Anthropic’s Claude 3.5, AI21 Labs’ Jamba-Instruct, Amazon Titan Text Premier.

The new Bedrock Converse API provides a consistent way to invoke Amazon Bedrock models and simplifies multi-turn conversations. There is also a JavaScript tutorial to walk you through sending requests to the Converse API using the Javascript SDK.

Amazon Q Developer is now generally available. Amazon Q Developer, part of the Amazon Q family, is a generative AI–powered assistant for software development. Amazon Q is available in the AWS Management Console and as an integrated development environment (IDE) extension for Visual Studio Code, Visual Studio, and JetBrains IDEs. Amazon Q Developer has knowledge of your AWS account resources and can help understand your costs.

Amazon Q list Lambda functions

Amazon Q list Lambda functions

You can use Amazon Q Developer to develop code features and transform code to upgrade Java applications. Amazon Q Developer also offers inline completions in the command line. For more information, see Reimagining software development with the Amazon Q Developer Agent.

Amazon Q code features

Amazon Q code features

Knowledge Bases for Amazon Bedrock now let you configure Guardrails, configure inference parameters, and offers observability logs.

Storage and data

Amazon S3 no longer charges for several HTTP error codes if initiated from outside your individual AWS account or AWS Organization.

You can automatically detect malware in new object uploads to S3 with Amazon GuardDuty.

Amazon Elastic File System (Amazon EFS) now support up to 1.5 GiB/s of throughput per client, a 3x increase over the previous limit of 500 MiB/s.

Discover architectural patterns for real-time analytics using Amazon Kinesis Data Streams in part 1 and part 2 and see how to optimize write throughput.

Amazon API Gateway

Amazon API Gateway now allows you to increase the integration timeout beyond the prior limit of 29 seconds. You can raise the integration timeout for Regional and private REST APIs, but this might require a reduction in your account-level throttle quota limit. This launch can help with workloads that require longer timeouts, such as Generative AI use cases with Large Language Models (LLMs).

You can also now use Amazon Verified Permissions to secure API Gateway REST APIs when using an Open ID connect (OIDC) compliant identity provider. You can now control access based on user attributes and group memberships, without writing code.

AWS AppSync

You can now invoke your AWS AppSync data sources in an event-driven manner. Previously, you could only invoke Lambda functions synchronously from AWS AppSync. AWS AppSync can now trigger Lambda functions in Event mode, asynchronously decoupling the API response from the Lambda invocation, which helps with long-running operations.

AWS AppSync now passes application request headers to Lambda custom authorizer functions. You can make authorization decisions based on the value of the authorization header, and the value of other headers that were sent with the request from the application client.

Learn best practices for AWS AppSync GraphQL APIs. See how to how to optimize the security, performance, coding standards, and deployment of your AWS AppSync API. AWS AppSync also has increase quotas, and new metrics

AWS Amplify

AWS Amplify Gen 2 is now generally available. This now provides a code-first developer experience for building full-stack apps using TypeScript. Amplify Gen 2 allows you to express app requirements like the data models, business logic, and authorization rules in TypeScript.

AWS Amplify Gen2

AWS Amplify Gen2

Amplify has a new experience for file storage. This post explores using Lambda to create serverless functions for Amplify using TypeScript. There are also new team environment workflows.

Serverless blog posts

April

May

June

Serverless container blog posts

April

May

June

Serverless Office Hours

Serverless Office Hours

Serverless Office Hours

April

May

June

Containers from the Couch

Containers from the Couch

Containers from the Couch

April

May

FooBar Serverless

April

February

June

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on X (formerly Twitter) to see the latest news, follow conversations, and interact with the team.

And finally, visit the Serverless Land and Containers on AWS websites for all your serverless and serverless container needs.

Securing Amazon ECS workloads on AWS Fargate with customer managed keys

Post Syndicated from Maish Saidel-Keesing original https://aws.amazon.com/blogs/compute/securing-amazon-ecs-workloads-on-aws-fargate-with-customer-managed-keys/

As Amazon CTO Werner Vogels said, “Encryption is the tool we have to make sure that nobody else has access to your data. Amazon Web Services (AWS) built encryption into nearly all of its 165 cloud services. Make use of it. Dance like nobody is watching. Encrypt like everyone is.”

Security is the top priority at AWS, underpinning everything we do. With AWS Fargate, every Amazon Elastic Container Service (Amazon ECS) task is launched on to a new single use, single tenant unit of compute. The ephemeral storage for this compute is always encrypted, and the AWS Key Management Service (AWS KMS) encryption key used for this encryption is managed by AWS Fargate.

Today, AWS is announcing that you can bring your own customer managed keys (CMKs). Once added to AWS KMS, you can use these to encrypt the underlying ephemeral storage of an Amazon ECS task on AWS Fargate. With this new capability, customers operating in heavily regulated environments can now have more control and visibility into their task’s ephemeral storage encryption.

This post dives into AWS Fargate task ephemeral storage and shows how the new customer managed key (CMK) feature can be enabled and audited.

Overview

AWS Fargate is a serverless compute engine for containerized workloads running on Amazon ECS and Amazon Elastic Kubernetes Service (Amazon EKS). Each time a new piece of work is scheduled on to AWS Fargate, as an Amazon ECS task or an Amazon EKS Pod, this workload is placed on a single use, single-tenant instance of compute.

For Amazon ECS tasks, that unit of compute has 20GiBs of ephemeral storage attached. This can be increased up to 200GiB by specifying the ephemeralStorage parameter in your task definition. This ephemeral storage is bound to the lifecycle of the Amazon ECS task, and once the Amazon ECS task has stopped, along with the underlying compute, this ephemeral storage is deleted.

If you are using AWS Fargate platform version 1.4.0 or higher, this ephemeral storage volume is encrypted by default. It is encrypted using an AWS Key Management Service (KMS) key with the AES-256 encryption algorithm. The key, and its lifecycle, is owned by the AWS Fargate service. You can learn more about Fargate-managed ephemeral storage encryption in the AWS Fargate Security Whitepaper.

With today’s launch, as an alternative to the Fargate-managed encryption, you can choose to encrypt the ephemeral storage with customer managed keys (CMKs). This helps regulation-sensitive customers meet their internal security policies and regulatory requirements.

Customers can import their own existing keys into AWS KMS or create a new CMK to encrypt the ephemeral storage. CMKs used by AWS Fargate can be managed through the normal AWS KMS lifecycle actions such as being rotated, disabled, and deleted. See the Amazon ECS documentation for more details on managing the KMS key. Additionally, all access from AWS Fargate to the KMS key can be audited in AWS CloudTrail Logs.

In January 2024, AWS announced that additional Amazon Elastic Block Store (Amazon EBS) volumes can now be attached to Amazon ECS tasks running on AWS Fargate. These EBS volumes unlock additional use cases for AWS Fargate customers, using higher capacity and high-performance volumes for use in their tasks alongside the ephemeral storage. These additional EBS volumes are managed differently to the ephemeral storage, and these volumes can already be encrypted with customer managed KMS keys (CMKs).

AWS Fargate falls under the scope of the following compliance programs regarding AWS’s side of the shared responsibility model. The compliance programs covered by AWS Fargate include:

You can download third-party audit reports using AWS Artifact. For more information, see Downloading Reports in AWS Artifact. Many of these compliance programs require customers to encrypt their data at rest within their Amazon ECS on AWS Fargate resources.

Customers also have additional internal risk management policies for key handling, where they must generate their own keys, have backups for these keys off-cloud, and manage the lifecycle of these keys. Until today, these customers could not use AWS Fargate’s default encryption solution for the workloads subject to their internal security policies.

Enabling CMK for ephemeral storage on an Amazon ECS Cluster

Following today’s launch a single KMS key can now be attached to a new or existing Amazon ECS Cluster. Once a key has been attached, all new tasks launched on to AWS Fargate use this KMS key. If you have existing tasks running in the Amazon ECS cluster, they must be redeployed to use the new encryption key. If these tasks are part of an Amazon ECS service, passing the –force-new-deployment flag to an amazon ecs update-service command forces all tasks to be redeployed with the new KMS key (while respecting the minimumHealthyPercent of the service).

To attach a KMS key to a new or existing cluster, specify the KeyId to the new managedStorageConfiguration field:

aws ecs create-cluster \
  --cluster clusterName \
  --configuration '{"managedStorageConfiguration":{"fargateEphemeralStorageKmsKeyId":"arn:aws:kms:us-west-2:012345678901:key/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111"}}'

Here is an example of the output of a DescribeClusters API request to an Amazon ECS cluster with a customer managed key:

aws ecs describe-clusters --clusters ecs-fargate-self-managed-key-cluster --region us-west-2 --include CONFIGURATIONS

Result of describe-clusters query

Aside from auditing CloudTrail Logs for encryption events, you can also verify that an ECS task is using the KMS key by using the DescribeTask API on an existing task:

{
    "tasks": [
        {
            ....
            "clusterArn": "arn:aws:ecs:us-west-2:1234567890:cluster/mycluster",
            "taskArn": "arn:aws:ecs:us-west-2:1234567890:task/11223342-1111-4fde-b6ca-273c5cfc00a1]",
            "fargateEphemeralStorage": {
                "sizeInGiB": 20,
                "kmsKeyId": "arn:aws:kms:us-west-2:1234567890:key/082222a1-1111-4fde-b6ca-273c5cfc00a1"
            }
        }
    ]
}

Enforcing encryption with customer managed keys

The new AWS Identity and Access Management (IAM) condition key ensures that your Amazon ECS clusters are created with a customer managed key. This can be applied as Service Control Policy in your AWS Organization or as part of your IAM permissions.

Here is an IAM policy example snippet that ensures a cluster can only be created when a specific AWS KMS key is used:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:CreateCluster"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "ecs:fargate-ephemeral-storage-kms-key": "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"
        }
      }
    }
  ]
}

Audit encryption events

Encryption events are logged in AWS CloudTrail. The following is an example of a CloudTrail event that includes the volume ID, cluster name, and AWS Account ID of the operation. You can find more details about the type of events that are logged in Managing AWS KMS keys for Fargate ephemeral storage.

{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AWSService",
        "invokedBy": "ec2-frontend-api.amazonaws.com"
    },
    "eventTime": "2024-04-23T18:08:13Z",
    "eventSource": "kms.amazonaws.com",
    "eventName": "CreateGrant",
    "awsRegion": "us-west-2",
    "sourceIPAddress": "ec2-frontend-api.amazonaws.com",
    "userAgent": "ec2-frontend-api.amazonaws.com",
    "requestParameters": {
        "keyId": "arn:aws:kms:us-west-2:123456789012:key/9b52b885-3f4d-40af-9843-d6b24b735559",
        "granteePrincipal": "fargate.us-west-2.amazonaws.com",
        "operations": [
            "Decrypt"
        ],
        "constraints": {
            "encryptionContextSubset": {
                "aws:ecs:clusterAccount": "123456789012",
                "aws:ebs:id": "vol-01234567890abcdef",
                "aws:ecs:clusterName": "ecs-fargate-self-managed-key-cluster"
            }
        },
        "retiringPrincipal": "ec2.us-west-2.amazonaws.com"
    },
    "responseElements": {
        "grantId": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
        "keyId": "arn:aws:kms:us-west-2:123456789012:key/9b52b885-3f4d-40af-9843-d6b24b735559"
    },
    "requestID": "be4d1a4e4730e0dceca51f87ee7454d5db76400d80e22bfbf3c4ca01e893b60c",
    "eventID": "bf36027c-86bd-40f2-a561-960cbe148c4c",
    "readOnly": false,
    "resources": [
        {
            "accountId": "AWS Internal",
            "type": "AWS::KMS::Key",
            "ARN": "arn:aws:kms:us-west-2:123456789012:key/9b52b885-3f4d-40af-9843-d6b24b735559"
        }
    ],
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "123456789012",
    "sharedEventID": "bf36027c-86bd-40f2-a561-960cbe148c4c",
    "eventCategory": "Management"
}

Conclusion

With the use of AWS KMS customer managed keys, you can now meet your security requirements for your data inside your Amazon ECS workloads running on AWS Fargate.

To learn more about compliance on your Amazon ECS workloads you can reference the FSI Services Spotlight: Amazon Elastic Container Service (ECS) with AWS Fargate blog post or the security overview of AWS Fargate whitepaper. To learn more about the use of customer managed keys in AWS Fargate, refer to the AWS documentation. This feature was requested by our customers on the AWS Containers roadmap.

London Stock Exchange Group uses chaos engineering on AWS to improve resilience

Post Syndicated from Elias Bedmar original https://aws.amazon.com/blogs/architecture/london-stock-exchange-group-uses-chaos-engineering-on-aws-to-improve-resilience/

This post was co-written with Luke Sudgen, Lead DevOps Engineer Post Trade, and Padraig Murphy, Solutions Architect Post Trade, from London Stock Exchange Group.

In this post, we’ll discuss some failure scenarios that were tested by London Stock Exchange Group (LSEG) Post Trade Technology teams during a chaos engineering event supported by AWS. Chaos engineering allows LSEG to simulate real-world failures in their cloud systems as part of controlled experiments. This methodology improves resilience and observability, which reduces risk and helps achieve compliance with regulators before deploying to production.

Introduction, tooling, and methodology

As a heavily regulated provider of global financial markets infrastructure, LSEG is always looking for opportunities to enhance workload resilience. LSEG and AWS teamed up to organize and run a 3-day AWS Experience-Based Acceleration (EBA) event to perform chaos engineering experiments against key workloads. The event was sponsored and led by the architecture function and included cross-functional Post Trade technical teams across various workstreams. The experiments were run using AWS Fault Injection Service (FIS) following the experiment methodology described in the Verify the resilience of your workloads using Chaos Engineering blog post.

Resilience of modern distributed cloud systems can be continuously improved through reviewing workload architectures and recovery, assessing standard operating procedures (SOPs), and building SOP alerts and recovery automations. AWS Resilience Hub provides a comprehensive tooling suite to get started on these activities.

Another key activity to validate and enhance your resilience posture is chaos engineering, a methodology that induces controlled chaos into customer systems through real-world controlled experiments. Chaos engineering helps customers create real-world failure conditions that can uncover hidden bugs, monitor blind spots, and manage bottlenecks that are difficult to find in distributed systems. This makes it a very useful tool in regulated industries such as financial services.

Architectural overview

The architectural diagram in Figure 1 comprises a three-tier application deployed in virtual private clouds (VPCs) with a multi-AZ setup.

Chaos engineering pattern for hybrid architecture (3-tier application)

Figure 1. Chaos engineering pattern for hybrid architecture (3-tier application)

Operating within a public subnet, the web application creates a hybrid architecture by using an Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling group and connecting to an Amazon Relational Database Service (Amazon RDS) database that’s located in a private subnet and connected with on-premises services. Additionally, a number of internal services are hosted in a separate VPC, housed within containers. FIS provides a controlled environment to validate the robustness of the architecture against various failure scenarios, such as:

  • Amazon EC2 instance failure that causes the application or container pod on the machine to also fail
  • Amazon RDS database instance reboot or failover
  • Severe network latency degradation
  • Network connectivity disruption
  • Amazon Elastic Block Store (Amazon EBS) volume failure (IOPS pause, disk full)

Amazon EC2 instance and container failure

The objective of this use case is to evaluate the resilience of the application or container pod running on Amazon EC2 instances and identify how the system can adapt itself and continue functioning during unexpected disruptions or instability of an instance. You can use aws:ec2:stop-instances or aws:ec2:terminate-instances FIS actions to mimic different EC2 instance failure modes. The response of running containers to the different instance failures was also assessed. If you’re running containers within a managed AWS service such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS), you can use FIS failure scenarios for ECS tasks and EKS pods.

Amazon RDS failure

RDS failure is another common scenario you can use to identify and troubleshoot database managed service failures from failovers and node reboots at a large scale. FIS can be used to inject reboot/failover failure conditions into the managed RDS instances to understand the bottlenecks and issues from disaster failovers, sync failures, and other database-related problems.

Severe network latency degradation

Network latency degradation injects latency in the network interface that connects two systems. This helps you understand how these systems handle a data transfer delay and your operational response readiness (alerts, metrics, and correction). This FIS action (aws:ssm:send-command/AWSFIS-Run-Network-Latency) uses the Linux traffic control (tc) utility.

Network connectivity disruption

Connectivity issues like traffic disruption or other network issues can be simulated with FIS network actions. FIS supports the aws:network:disrupt-connectivity action to test your application’s resilience in the event of total or partial connectivity loss within its subnet, as well as disruption (including cross-Region) with other AWS networking components such as route tables or AWS Transit Gateway.

Amazon EBS volume failure (IOPS pause)

Disk failure is a problematic issue in real-time operations-based systems. It can lead to transactions failing due to I/O failures or storage failure during peak activity in heavy workloads. The EBS volume failure actions test system performance under different disk failure scenarios. FIS supports the aws:ebs:pause-volume-io action to pause I/O operations on target EBS volumes, as well as other failure modes. The target volumes must be in the same Availability Zone and must be attached to instances built on the AWS Nitro System.

Outcomes and conclusion

Following the experiment, the teams from LSEG successfully identified a series of architectural improvements to reduce application recovery time and enhance metric granularity and alerting. As a second tangible output, the teams now have a reusable chaos engineering methodology and toolset. Running regular in-person cross-functional events is a great way to implement a chaos engineering practice in your organization.

You can start your resilience journey on AWS today with AWS Resilience Hub.

Serverless ICYMI Q1 2024

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/serverless-icymi-q1-2024/

Welcome to the 25th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

2024 Q1 calendar

2024 Q1 calendar

Adobe Summit

At the Adobe Summit, the AWS Serverless Developer Advocacy team showcased a solution developed for the NFL using AWS serverless technologies and Adobe Photoshop APIs. The system automates image processing tasks, including background removal and dynamic resizing, by integrating AWS Step Functions, AWS Lambda, Amazon EventBridge, and AI/ML capabilities via Amazon Rekognition. This solution reduced image processing time from weeks to minutes and saved the NFL significant costs. Combining cloud-based serverless architectures with advanced machine learning and API technologies can optimize digital workflows for cost-effective and agile digital asset management.

Adobe Summit ServerlessVideo

Adobe Summit ServerlessVideo

ServerlessVideo is a demo application to stream live videos and also perform advanced post-video processing. It uses several AWS services, including Step Functions, Lambda, EventBridge, Amazon ECS, and Amazon Bedrock in a serverless architecture that makes it fast, flexible, and cost-effective. The team used ServerlessVideo to interview attendees about the conference experience and Adobe and partners about how they use Adobe. Learn more about the project and watch videos from Adobe Summit 2024 at video.serverlessland.com.

AWS Lambda

AWS launched support for the latest long-term support release of .NET 8, which includes API enhancements, improved Native Ahead of Time (Native AOT) support, and improved performance.

AWS Lambda .NET 8

AWS Lambda .NET 8

Learn how to compare design approaches for building serverless microservices. This post covers the trade-offs to consider with various application architectures. See how you can apply single responsibility, Lambda-lith, and read and write functions.

The AWS Serverless Java Container has been updated. This makes it easier to modernize a legacy Java application written with frameworks such as Spring, Spring Boot, or JAX-RS/Jersey in Lambda with minimal code changes.

AWS Serverless Java Container

AWS Serverless Java Container

Lambda has improved the responsiveness for configuring Event Source Mappings (ESMs) and Amazon EventBridge Pipes with event sources such as self-managed Apache Kafka, Amazon Managed Streaming for Apache Kafka (MSK), Amazon DocumentDB, and Amazon MQ.

Chaos engineering is a popular practice for building confidence in system resilience. However, many existing tools assume the ability to alter infrastructure configurations, and cannot be easily applied to the serverless application paradigm. You can use the AWS Fault Injection Service (FIS) to automate and manage chaos experiments across different Lambda functions to provide a reusable testing method.

Amazon ECS and AWS Fargate

Amazon Elastic Container Service (Amazon ECS) now provides managed instance draining as a built-in feature of Amazon ECS capacity providers. This allows Amazon ECS to safely and automatically drain tasks from Amazon Elastic Compute Cloud (Amazon EC2) instances that are part of an Amazon EC2 Auto Scaling Group associated with an Amazon ECS capacity provider. This simplification allows you to remove custom lifecycle hooks previously used to drain Amazon EC2 instances. You can now perform infrastructure updates such as rolling out a new version of the ECS agent by seamlessly using Auto Scaling Group instance refresh, with Amazon ECS ensuring workloads are not interrupted.

Credentials Fetcher makes it easier to run containers that depend on Windows authentication when using Amazon EC2. Credentials Fetcher now integrates with Amazon ECS, using either the Amazon EC2 launch type, or AWS Fargate serverless compute launch type.

Amazon ECS Service Connect is a networking capability to simplify service discovery, connectivity, and traffic observability for Amazon ECS. You can now more easily integrate certificate management to encrypt service-to-service communication using Transport Layer Security (TLS). You do not need to modify your application code, add additional network infrastructure, or operate service mesh solutions.

Amazon ECS Service Connect

Amazon ECS Service Connect

Running distributed machine learning (ML) workloads on Amazon ECS allows ML teams to focus on creating, training and deploying models, rather than spending time managing the container orchestration engine. Amazon ECS provides a great environment to run ML projects as it supports workloads that use NVIDIA GPUs and provides optimized images with pre-installed NVIDIA Kernel drivers and Docker runtime.

See how to build preview environments for Amazon ECS applications with AWS Copilot. AWS Copilot is an open source command line interface that makes it easier to build, release, and operate production ready containerized applications.

Learn techniques for automatic scaling of your Amazon Elastic Container Service  (Amazon ECS) container workloads to enhance the end user experience. This post explains how to use AWS Application Auto Scaling which helps you configure automatic scaling of your Amazon ECS service. You can also use Amazon ECS Service Connect and AWS Distro for OpenTelemetry (ADOT) in Application Auto Scaling.

AWS Step Functions

AWS workloads sometimes require access to data stored in on-premises databases and storage locations. Traditional solutions to establish connectivity to the on-premises resources require inbound rules to firewalls, a VPN tunnel, or public endpoints. Discover how to use the MQTT protocol (AWS IoT Core) with AWS Step Functions to dispatch jobs to on-premises workers to access or retrieve data stored on-premises.

You can use Step Functions to orchestrate many business processes. Many industries are required to provide audit trails for decision and transactional systems. Learn how to build a serverless pipeline to create a reliable, performant, traceable, and durable pipeline for audit processing.

Amazon EventBridge

Amazon EventBridge now supports publishing events to AWS AppSync GraphQL APIs as native targets. The new integration allows you to publish events easily to a wider variety of consumers and simplifies updating clients with near real-time data.

Amazon EventBridge publishing events to AWS AppSync

Amazon EventBridge publishing events to AWS AppSync

Discover how to send and receive CloudEvents with EventBridge. CloudEvents is an open-source specification for describing event data in a common way. You can publish CloudEvents directly to EventBridge, filter and route them, and use input transformers and API Destinations to send CloudEvents to downstream AWS services and third-party APIs.

AWS Application Composer

AWS Application Composer lets you create infrastructure as code templates by dragging and dropping cards on a virtual canvas. These represent CloudFormation resources, which you can wire together to create permissions and references. Application Composer has now expanded to the VS Code IDE as part of the AWS Toolkit. This now includes a generative AI partner that helps you write infrastructure as code (IaC) for all 1100+ AWS CloudFormation resources that Application Composer now supports.

AWS AppComposer generate suggestions

AWS AppComposer generate suggestions

Amazon API Gateway

Learn how to consume private Amazon API Gateway APIs using mutual TLS (mTLS). mTLS helps prevent man-in-the-middle attacks and protects against threats such as impersonation attempts, data interception, and tampering.

Serverless at AWS re:Invent

Serverless at AWS reInvent

Serverless at AWS reInvent

Visit the Serverless Land YouTube channel to find a list of serverless and serverless container sessions from reinvent 2023. Hear from experts like Chris Munns and Julian Wood in their popular session, Best practices for serverless developers, or Nathan Peck and Jessica Deen in Deploying multi-tenant SaaS applications on Amazon ECS and AWS Fargate.

Serverless blog posts

January

February

March

Serverless container blog posts

January

February

December

Serverless Office Hours

Serverless Office Hours

Serverless Office Hours

January

February

March

Containers from the Couch

Containers from the Couch

Containers from the Couch

January

February

March

FooBar Serverless

FooBar Serverless

FooBar Serverless

January

February

March

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

And finally, visit the Serverless Land and Containers on AWS websites for all your serverless and serverless container needs.

Top Architecture Blog Posts of 2023

Post Syndicated from Andrea Courtright original https://aws.amazon.com/blogs/architecture/top-architecture-blog-posts-of-2023/

2023 was a rollercoaster year in tech, and we at the AWS Architecture Blog feel so fortunate to have shared in the excitement. As we move into 2024 and all of the new technologies we could see, we want to take a moment to highlight the brightest stars from 2023.

As always, thanks to our readers and to the many talented and hardworking Solutions Architects and other contributors to our blog.

I give you our 2023 cream of the crop!

#10: Build a serverless retail solution for endless aisle on AWS

In this post, Sandeep and Shashank help retailers and their customers alike in this guided approach to finding inventory that doesn’t live on shelves.

Building endless aisle architecture for order processing

Figure 1. Building endless aisle architecture for order processing

Check it out!

#9: Optimizing data with automated intelligent document processing solutions

Who else dreads wading through large amounts of data in multiple formats? Just me? I didn’t think so. Using Amazon AI/ML and content-reading services, Deependra, Anirudha, Bhajandeep, and Senaka have created a solution that is scalable and cost-effective to help you extract the data you need and store it in a format that works for you.

AI-based intelligent document processing engine

Figure 2: AI-based intelligent document processing engine

Check it out!

#8: Disaster Recovery Solutions with AWS managed services, Part 3: Multi-Site Active/Passive

Disaster recovery posts are always popular, and this post by Brent and Dhruv is no exception. Their creative approach in part 3 of this series is most helpful for customers who have business-critical workloads with higher availability requirements.

Warm standby with managed services

Figure 3. Warm standby with managed services

Check it out!

#7: Simulating Kubernetes-workload AZ failures with AWS Fault Injection Simulator

Continuing with the theme of “when bad things happen,” we have Siva, Elamaran, and Re’s post about preparing for workload failures. If resiliency is a concern (and it really should be), the secret is test, test, TEST.

Architecture flow for Microservices to simulate a realistic failure scenario

Figure 4. Architecture flow for Microservices to simulate a realistic failure scenario

Check it out!

#6: Let’s Architect! Designing event-driven architectures

Luca, Laura, Vittorio, and Zamira weren’t content with their four top-10 spots last year – they’re back with some things you definitely need to know about event-driven architectures.

Let's Architect

Figure 5. Let’s Architect artwork

Check it out!

#5: Use a reusable ETL framework in your AWS lake house architecture

As your lake house increases in size and complexity, you could find yourself facing maintenance challenges, and Ashutosh and Prantik have a solution: frameworks! The reusable ETL template with AWS Glue templates might just save you a headache or three.

Reusable ETL framework architecture

Figure 6. Reusable ETL framework architecture

Check it out!

#4: Invoking asynchronous external APIs with AWS Step Functions

It’s possible that AWS’ menagerie of services doesn’t have everything you need to run your organization. (Possible, but not likely; we have a lot of amazing services.) If you are using third-party APIs, then Jorge, Hossam, and Shirisha’s architecture can help you maintain a secure, reliable, and cost-effective relationship among all involved.

Invoking Asynchronous External APIs architecture

Figure 7. Invoking Asynchronous External APIs architecture

Check it out!

#3: Announcing updates to the AWS Well-Architected Framework

The Well-Architected Framework continues to help AWS customers evaluate their architectures against its six pillars. They are constantly striving for improvement, and Haleh’s diligence in keeping us up to date has not gone unnoticed. Thank you, Haleh!

Well-Architected logo

Figure 8. Well-Architected logo

Check it out!

#2: Let’s Architect! Designing architectures for multi-tenancy

The practically award-winning Let’s Architect! series strikes again! This time, Luca, Laura, Vittorio, and Zamira were joined by Federica to discuss multi-tenancy and why that concept is so crucial for SaaS providers.

Let's Architect

Figure 9. Let’s Architect

Check it out!

And finally…

#1: Understand resiliency patterns and trade-offs to architect efficiently in the cloud

Haresh, Lewis, and Bonnie revamped this 2022 post into a masterpiece that completely stole our readers’ hearts and is among the top posts we’ve ever made!

Resilience patterns and trade-offs

Figure 10. Resilience patterns and trade-offs

Check it out!

Bonus! Three older special mentions

These three posts were published before 2023, but we think they deserve another round of applause because you, our readers, keep coming back to them.

Thanks again to everyone for their contributions during a wild year. We hope you’re looking forward to the rest of 2024 as much as we are!

AWS Weekly Roundup — Amazon ECS, RDS for MySQL, EMR Studio, AWS Community, and more — January 22, 2024

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-ecs-rds-for-mysql-emr-studio-aws-community-and-more-january-22-2024/

As usual, a lot has happened in the Amazon Web Services (AWS) universe this past week. I’m also excited about all the AWS Community events and initiatives that are happening around the world. Let’s take a look together!

Last week’s launches
Here are some launches that got my attention:

Amazon Elastic Container Service (Amazon ECS) now supports managed instance draining – Managed instance draining allows you to gracefully shutdown workloads deployed on Amazon Elastic Compute Cloud (Amazon EC2) instances by safely stopping and rescheduling them to other, non-terminating instances. This new capability streamlines infrastructure maintenance, such as deploying a new AMI version, eliminating the need for custom solutions to shutdown instances without disrupting their workloads. To learn more, check out Nathan’s post on the AWS Containers Blog.

Amazon Relational Database Service (Amazon RDS) for MySQL now supports multi-source replication – Using multi-source replication, you can configure multiple RDS for MySQL database instances as sources for a single target database instance. This feature facilitates tasks such as merging shards into a single target, consolidating data for analytics, or creating long-term backups within a single RDS for MySQL instance. The Amazon RDS for MySQL User Guide has all the details.

Amazon EMR Studio now comes with simplified create experience and improved start times – With the simplified console experience for creating EMR Studio, you can launch interactive and batch workloads with default settings more easily. The improved start times let you launch EMR Studio Workspaces for performing interactive analysis in notebooks in seconds. Have a look at the Amazon EMR User Guide to learn more.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional projects, programs, and news items that you might find interesting:

Get The NewsSummarize news using Amazon Bedrock – My colleague Danilo built this application to summarize the most recent news from an RSS or Atom feed using Amazon Bedrock. The application is deployed as an AWS Lambda function. The function downloads the most recent entries from an RSS or Atom feed, downloads the linked content, extracts text, and makes a summary.

AWS Community BuildersAWS Community Builders program – Interested in joining our AWS Community Builders program? The 2024 application is open until January 28. The AWS Community Builders program offers technical resources, education, and networking opportunities to AWS technical enthusiasts who are passionate about sharing knowledge and connecting with the technical community.

User Group YaoundeAWS User Groups – The AWS User Group Yaounde Cameroon embarked on a 12-week workshop challenge. Over 12 weeks, participants explored various aspects of AWS and cloud computing, including architecture, security, storage, and more, to develop skills and share knowledge. You can read more about this amazing initiative in this LinkedIn post.

AWS open-source news and updates – My colleague Ricardo writes this weekly open source newsletter in which he highlights new open source projects, tools, and demos from the AWS Community.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS InnovateAWS Innovate: AI/ML and Data Edition – Register now for the Asia Pacific & Japan AWS Innovate online conference on February 22, 2024, to explore, discover, and learn how to innovate with artificial intelligence (AI) and machine learning (ML). Choose from over 50 sessions in three languages and get hands-on with technical demos aimed at generative AI builders.

AWS Community re:Invent re:CapsAWS Community re:Invent re:Caps – Join a Community re:Cap event organized by volunteers from AWS User Groups and AWS Cloud Clubs around the world to learn about the latest announcements from AWS re:Invent.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Antje

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

AWS Weekly Roundup—Amazon Route53, Amazon EventBridge, Amazon SageMaker, and more – January 15, 2024

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-route53-amazon-eventbridge-amazon-sagemaker-and-more-january-15-2024/

We are in January, the start of a new year, and I imagine many of you have made a new year resolution to learn something new. If you want to learn something new and get a free Amazon Web Services (AWS) Learning Badge, check out the new Events and Workflows Learning Path. This learning path will teach you everything you need to know about AWS Step Functions, Amazon EventBridge, event-driven architectures, and serverless, and when you finish the learning path, you can take an assessment. If you pass the assessment, you get an AWS Learning Badge, credited by Credly, that you can share in your résumé and social media profiles.

Events and workflows learning path badge

Last Week’s Launches
Here are some launches that got my attention during the previous week.

Amazon Route 53 – Now you can enable Route 53 Resolver DNS Firewall to filter DNS traffic based on the query type contained in the question section of the DNS query format. In addition, Route 53 now supports geoproximity routing as an additional routing policy for DNS records. Expand and reduce the geographic area from which traffic is routed to a resource by changing the record’s bias value. This is really helpful for industries that need to deliver highly responsive digital experiences.

Amazon CloudWatch LogsCloudWatch Logs now support creating account-level subscription filters. This capability allows you to forward all the logs groups from an account to other services like Amazon OpenSearch Service or Amazon Kinesis Data Firehose.

Amazon Elastic Container Service (Amazon ECS) Amazon ECS now integrates with Amazon Elastic Block Store (Amazon EBS), allowing you to provision and attach EBS volumes to Amazon ECS tasks running on both AWS Fargate and Amazon Elastic Cloud Compute (Amazon EC2). Read the blog post Channy wrote where he shows this feature in action.

Amazon EventBridgeEventBridge now supports AWS AppSync as a target of EventBridge buses. This enables you to stream real-time updates from your backend applications to your front-end clients. For example, you can get notifications in your mobile application from an order you did when the order status changes on the backend.

Amazon SageMakerSageMaker now supports M7i, C7i, and R7i instances for machine learning (ML) inference. These instances are powered by custom 4th generation Intel Xeon scalable processors and deliver up to 15 percent better price performance than their previous generations.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other updates and news that you may have missed:

If you are a serverless enthusiast, this week, the AWS Compute Blog published the Serverless ICYMI (in case you missed it) quarterly recap for the last quarter of 2023. This post compiles the announcements made during the months of October, November, and December, with all the relevant content that was produced by AWS Developer Advocates during that time. In addition to that blog post, you can learn about ServerlessVideo, a new demo application that we launched at AWS re:Invent 2023.

ServerlessVideo

This week there were also a couple of really interesting blog posts that explain how to solve very common challenges that customers face. The first one is the blog post in the AWS Security Blog that explains how to customize access tokens in Amazon Cognito user pools. And the second one is from the AWS Database Blog, which explains how to effectively sort data with Amazon DynamoDB.

The Official AWS Podcast – Listen each week for updates on the latest AWS news and deep dives into exciting use cases. There are also official AWS podcasts in several languages. Check out the ones in FrenchGermanItalian, and Spanish.

AWS open source newsletter – This is a newsletter curated by my colleague Ricardo to bring you the latest open source projects, posts, events, and more.

For our customers in Turkey, on January 1, 2024, AWS Turkey Pazarlama Teknoloji ve Danışmanlık Hizmetleri Limited Şirketi (AWS Turkey) replaced AWS EMEA SARL (AWS Europe) as the contracting party and service provider to customers in Türkiye. This enables AWS customers in Türkiye to transact in their local currency (Turkish Lira) and with a local bank. For more information on AWS Turkey, visit the FAQ page.

Upcoming AWS Events
The beginning of the year is the season of AWS re:Invent recaps, which are happening all around the globe during the next two months. You can check the recaps page to find the one closest to you.

You can browse all upcoming AWS led in-person and virtual events, as well as developer-focused events such as AWS DevDay.

That’s all for this week. Check back next Monday for another Week in Review!

— Marcia

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Amazon ECS supports a native integration with Amazon EBS volumes for data-intensive workloads

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/amazon-ecs-supports-a-native-integration-with-amazon-ebs-volumes-for-data-intensive-workloads/

Today we are announcing that Amazon Elastic Container Service (Amazon ECS) supports an integration with Amazon Elastic Block Store (Amazon EBS), making it easier to run a wider range of data processing workloads. You can provision Amazon EBS storage for your ECS tasks running on AWS Fargate and Amazon Elastic Compute Cloud (Amazon EC2) without needing to manage storage or compute.

Many organizations choose to deploy their applications as containerized packages, and with the introduction of Amazon ECS integration with Amazon EBS, organizations can now run more types of workloads than before.

You can run data workloads requiring storage that supports high transaction volumes and throughput, such as extract, transform, and load (ETL) jobs for big data, which need to fetch existing data, perform processing, and store this processed data for downstream use. Because the storage lifecycle is fully managed by Amazon ECS, you don’t need to build any additional scaffolding to manage infrastructure updates, and as a result, your data processing workloads are now more resilient while simultaneously requiring less effort to manage.

Now you can choose from a variety of storage options for your containerized applications running on Amazon ECS:

  • Your Fargate tasks get 20 GiB of ephemeral storage by default. For applications that need additional storage space to download large container images or for scratch work, you can configure up to 200 GiB of ephemeral storage for your Fargate tasks.
  • For applications that span many tasks that need concurrent access to a shared dataset, you can configure Amazon ECS to mount the Amazon Elastic File System (Amazon EFS) file system to your ECS tasks running on both EC2 and Fargate. Common examples of such workloads include web applications such as content management systems, internal DevOps tools, and machine learning (ML) frameworks. Amazon EFS is designed to be available across a Region and can be simultaneously attached to many tasks.
  • For applications that need high-performance, low-cost storage that does not need to be shared across tasks, you can configure Amazon ECS to provision and attach Amazon EBS storage to your tasks running on both Amazon EC2 and Fargate. Amazon EBS is designed to provide block storage with low latency and high performance within an Availability Zone.

To learn more, see Using data volumes in Amazon ECS tasks and persistent storage best practices in the AWS documentation.

Getting started with EBS volume integration to your ECS tasks
You can configure the volume mount point for your container in the task definition and pass Amazon EBS storage requirements for your Amazon ECS task at runtime. For most use cases, you can get started by simply providing the size of the volume needed for the task. Optionally, you can configure all EBS volume attributes and the file system you want the volume formatted with.

1. Create a task definition
Go to the Amazon ECS console, navigate to Task definitions, and choose Create new task definition.

In the Storage section, choose Configure at deployment to set EBS volume as a new configuration type. You can provision and attach one volume per task for Linux file systems.

When you choose Configure at task definition creation, you can configure existing storage options such as bind mounts, Docker volumes, EFS volumes, Amazon FSx for Windows File Server volumes, or Fargate ephemeral storage.

Now you can select a container in the task definition, the source EBS volume, and provide a mount path where the volume will be mounted in the task.

You can also use $aws ecs register-task-definition --cli-input-json file://example.json command line to register a task definition to add an EBS volume. The following snippet is a sample, and task definitions are saved in JSON format.

{
    "family": "nginx"
    ...
    "containerDefinitions": [
        {
            ...
            "mountPoints": [
                "containerPath": "/foo",
                "sourceVoumne": "new-ebs-volume"
            ],
            "name": "nginx",
            "image": "nginx"
        }
    ],
    "volumes": [
       {
           "name": "/foo",
           "configuredAtRuntime": true
       }
    ]
}

2. Deploy and run your task with EBS volume
Now you can run a task by selecting your task in your ECS cluster. Go to your ECS cluster and choose Run new task. Note that you can select the compute options, the launch type, and your task definition.

Note: While this example goes through deploying a standalone task with an attached EBS volume, you can also configure a new or existing ECS service to use EBS volumes with the desired configuration.

You have a new Volume section where you can configure the additional storage. The volume name, type, and mount points are those that you defined in your task definition. Choose your EBS volume types, sizes (GiB), IOPs, and the desired throughput.

You cannot attach an existing EBS volume to an ECS task. But if you want to create a volume from an existing snapshot, you have the option to choose your snapshot ID. If you want to create a new volume, then you can leave this field empty. You can choose the file system type, either ext3 or ext4 file systems on Linux.

By default, when a task is terminated, Amazon ECS deletes the attached volume. If you need the data in the EBS volume to be retained after the task exits, check Delete on termination. Also, you need to create an AWS Identity and Access Management (IAM) role for volume management that contains the relevant permissions to allow Amazon ECS to make API calls on your behalf. For more information on this policy, see infrastructure role in the AWS documentation.

You can also configure encryption on your EBS volumes using either Amazon managed keys and customer managed keys. To learn more about the options, see our Amazon EBS encryption in the AWS documentation.

After configuring all task settings, choose Create to start your task.

3. Deploy and run your task with EBS volume
Once your task has started, you can see the volume information on the task definition details page. Choose a task and select the Volumes tab to find your created EBS volume details.

Your team can organize the development and operations of EBS volumes more efficiently. For example, application developers can configure the path where your application expects storage to be available in the task definition, and DevOps engineers can configure the actual EBS volume attributes at runtime when the application is deployed.

This allows DevOps engineers to deploy the same task definition to different environments with differing EBS volume configurations, for example, gp3 volumes in the development environments and io2 volumes in production.

Now available
Amazon ECS integration with Amazon EBS is available in nine AWS Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm). You only pay for what you use, including EBS volumes and snapshots. To learn more, see the Amazon EBS pricing page and Amazon EBS volumes in ECS in the AWS documentation.

Give it a try now and send feedback to our public roadmap, AWS re:Post for Amazon ECS, or through your usual AWS Support contacts.

Channy

P.S. Special thanks to Maish Saidel-Keesing, a senior enterprise developer advocate at AWS for his contribution in writing this blog post.

Serverless ICYMI Q4 2023

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/serverless-icymi-q4-2023/

Welcome to the 24th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

2023 Q4 Calendar

2023 Q4 Calendar

ServerlessVideo

ServerlessVideo at re:Invent 2024

ServerlessVideo at re:Invent 2024

ServerlessVideo is a demo application built by the AWS Serverless Developer Advocacy team to stream live videos and also perform advanced post-video processing. It uses several AWS services including AWS Step Functions, Amazon EventBridge, AWS Lambda, Amazon ECS, and Amazon Bedrock in a serverless architecture that makes it fast, flexible, and cost-effective. Key features include an event-driven core with loosely coupled microservices that respond to events routed by EventBridge. Step Functions orchestrates using both Lambda and ECS for video processing to balance speed, scale, and cost. There is a flexible plugin-based architecture using Step Functions and EventBridge to integrate and manage multiple video processing workflows, which include GenAI.

ServerlessVideo allows broadcasters to stream video to thousands of viewers using Amazon IVS. When a broadcast ends, a Step Functions workflow triggers a set of configured plugins to process the video, generating transcriptions, validating content, and more. The application incorporates various microservices to support live streaming, on-demand playback, transcoding, transcription, and events. Learn more about the project and watch videos from reinvent 2023 at video.serverlessland.com.

AWS Lambda

AWS Lambda enabled outbound IPv6 connections from VPC-connected Lambda functions, providing virtually unlimited scale by removing IPv4 address constraints.

The AWS Lambda and AWS SAM teams also added support for sharing test events across teams using AWS SAM CLI to improve collaboration when testing locally.

AWS Lambda introduced integration with AWS Application Composer, allowing users to view and export Lambda function configuration details for infrastructure as code (IaC) workflows.

AWS added advanced logging controls enabling adjustable JSON-formatted logs, custom log levels, and configurable CloudWatch log destinations for easier debugging. AWS enabled monitoring of errors and timeouts occurring during initialization and restore phases in CloudWatch Logs as well, making troubleshooting easier.

For Kafka event sources, AWS enabled failed event destinations to prevent functions stalling on failing batches by rerouting events to SQS, SNS, or S3. AWS also enhanced Lambda auto scaling for Kafka event sources in November to reach maximum throughput faster, reducing latency for workloads prone to large bursts of messages.

AWS launched support for Python 3.12 and Java 21 Lambda runtimes, providing updated libraries, smaller deployment sizes, and better AWS service integration. AWS also introduced a simplified console workflow to automate complex network configuration when connecting functions to Amazon RDS and RDS Proxy.

Additionally in December, AWS enabled faster individual Lambda function scaling allowing each function to rapidly absorb traffic spikes by scaling up to 1000 concurrent executions every 10 seconds.

Amazon ECS and AWS Fargate

In Q4 of 2023, AWS introduced several new capabilities across its serverless container services including Amazon ECS, AWS Fargate, AWS App Runner, and more. These features help improve application resilience, security, developer experience, and migration to modern containerized architectures.

In October, Amazon ECS enhanced its task scheduling to start healthy replacement tasks before terminating unhealthy ones during traffic spikes. This prevents going under capacity due to premature shutdowns. Additionally, App Runner launched support for IPv6 traffic via dual-stack endpoints to remove the need for address translation.

In November, AWS Fargate enabled ECS tasks to selectively use SOCI lazy loading for only large container images in a task instead of requiring it for all images. Amazon ECS also added idempotency support for task launches to prevent duplicate instances on retries. Amazon GuardDuty expanded threat detection to Amazon ECS and Fargate workloads which users can easily enable.

Also in November, the open source Finch container tool for macOS became generally available. Finch allows developers to build, run, and publish Linux containers locally. A new website provides tutorials and resources to help developers get started.

Finally in December, AWS Migration Hub Orchestrator added new capabilities for replatforming applications to Amazon ECS using guided workflows. App Runner also improved integration with Route 53 domains to automatically configure required records when associating custom domains.

AWS Step Functions

In Q4 2023, AWS Step Functions announced the redrive capability for Standard Workflows. This feature allows failed workflow executions to be redriven from the point of failure, skipping unnecessary steps and reducing costs. The redrive functionality provides an efficient way to handle errors that require longer investigation or external actions before resuming the workflow.

Step Functions also launched support for HTTPS endpoints in AWS Step Functions, enabling easier integration with external APIs and SaaS applications without needing custom code. Developers can now connect to third-party HTTP services directly within workflows. Additionally, AWS released a new test state capability that allows testing individual workflow states before full deployment. This feature helps accelerate development by making it faster and simpler to validate data mappings and permissions configurations.

AWS announced optimized integrations between AWS Step Functions and Amazon Bedrock for orchestrating generative AI workloads. Two new API actions were added specifically for invoking Bedrock models and training jobs from workflows. These integrations simplify building prompt chaining and other techniques to create complex AI applications with foundation models.

Finally, the Step Functions Workflow Studio is now integrated in the AWS Application Composer. This unified builder allows developers to design workflows and define application resources across the full project lifecycle within a single interface.

Amazon EventBridge

Amazon EventBridge announced support for new partner integrations with Adobe and Stripe. These integrations enable routing events from the Adobe and Stripe platforms to over 20 AWS services. This makes it easier to build event-driven architectures to handle common use cases.

Amazon SNS

In Q4, Amazon SNS added native in-place message archiving for FIFO topics to improve event stream durability by allowing retention policies and selective replay of messages without provisioning separate resources. Additional message filtering operators were also introduced including suffix matching, case-insensitive equality checks, and OR logic for matching across properties to simplify routing logic implementation for publishers and subscribers. Finally, delivery status logging was enabled through AWS CloudFormation.

Amazon SQS

Amazon SQS has introduced several major new capabilities and updates. These improve visibility, throughput, and message handling for users. Specifically, Amazon SQS enabled AWS CloudTrail logging of key SQS APIs. This gives customers greater visibility into SQS activity. Additionally, SQS increased the throughput quota for the high throughput mode of FIFO queues. This was significantly increased in certain Regions. It also boosted throughput in Asia Pacific Regions. Furthermore, Amazon SQS added dead letter queue redrive support. This allows you to redrive messages that failed and were sent to a dead letter queue (DLQ).

Serverless at AWS re:Invent

Serverless videos from re:Invent

Serverless videos from re:Invent

Visit the Serverless Land YouTube channel to find a list of serverless and serverless container sessions from reinvent 2023. Hear from experts like Chris Munns and Julian Wood in their popular session, Best practices for serverless developers, or Nathan Peck and Jessica Deen in Deploying multi-tenant SaaS applications on Amazon ECS and AWS Fargate.

EDA Day Nashville

EDA Day Nashville

EDA Day Nashville

The AWS Serverless Developer Advocacy team hosted an event-driven architecture (EDA) day conference on October 26, 2022 in Nashville, Tennessee. This inaugural GOTO EDA day convened over 200 attendees ranging from prominent EDA community members to AWS speakers and product managers. Attendees engaged in 13 sessions, two workshops, and panels covering EDA adoption best practices. The event built upon 2022 content by incorporating additional topics like messaging, containers, and machine learning. It also created opportunities for students and underrepresented groups in tech to participate. The full-day conference facilitated education, inspiration, and thoughtful discussion around event-driven architectural patterns and services on AWS.

Videos from EDA Day are now available on the Serverless Land YouTube channel.

Serverless blog posts

October

November

December

Serverless container blog posts

October

November

December

Serverless Office Hours

Serverless office hours: Q4 videos

October

November

December

Containers from the Couch

Containers from the Couch

October

November

December

FooBar

FooBar

October

November

December

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

And finally, visit the Serverless Land and Containers on AWS websites for all your serverless and serverless container needs.

Detect runtime security threats in Amazon ECS and AWS Fargate, new in Amazon GuardDuty

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/introducing-amazon-guardduty-ecs-runtime-monitoring-including-aws-fargate/

Today, we’re announcing Amazon GuardDuty ECS Runtime Monitoring to help detect potential runtime security issues in Amazon Elastic Container Service (Amazon ECS) clusters running on both AWS Fargate and Amazon Elastic Compute Cloud (Amazon EC2).

GuardDuty combines machine learning (ML), anomaly detection, network monitoring, and malicious file discovery against various AWS data sources. When threats are detected, GuardDuty generates security findings and automatically sends them to AWS Security Hub, Amazon EventBridge, and Amazon Detective. These integrations help centralize monitoring for AWS and partner services, initiate automated responses, and launch security investigations.

GuardDuty ECS Runtime Monitoring helps detect runtime events such as file access, process execution, and network connections that might indicate runtime threats. It checks hundreds of threat vectors and indicators and can produce over 30 different finding types. For example, it can detect attempts of privilege escalation, activity generated by crypto miners or malware, or activity suggesting reconnaissance by an attacker. This is in addition to GuardDuty‘s primary detection categories.

GuardDuty ECS Runtime Monitoring uses a managed and lightweight security agent that adds visibility into individual container runtime behaviors. When using AWS Fargate, there is no need for you to install, configure, manage, or update the agent. We take care of that for you. This simplifies the management of your clusters and reduces the risk of leaving some tasks without monitoring. It also helps to improve your security posture and pass regulatory compliance and certification for runtime threats.

GuardDuty ECS Runtime Monitoring findings are visible directly in the console. You can configure GuardDuty to also send its findings to multiple AWS services or to third-party monitoring systems connected to your security operations center (SOC).

With this launch, Amazon Detective now receives security findings from GuardDuty ECS Runtime Monitoring and includes them in its collection of data for analysis and investigations. Detective helps to analyze, investigate, and quickly identify the root cause of potential security issues or suspicious activities. It collects log data from AWS resources and uses machine learning, statistical analysis, and graph theory to build a linked set of data that enables you to easily conduct security investigations.

Configure GuardDuty ECS Runtime Monitoring on AWS Fargate
For this demo, I choose to show the experience provided for AWS Fargate. When using Amazon ECS, you must ensure your EC2 instances have the GuardDuty agent installed. You can install the agent manually, bake it into your AMI, or use GuardDuty‘s provided AWS Systems Manager document to install it (go to Systems Manager in the console, select Documents, and then search for GuardDuty). The documentation has more details about installing the agent on EC2 instances.

When operating from a GuardDuty administrator account, I can enable GuardDuty ECS Runtime Monitoring at the organization level to monitor all ECS clusters in all organizations’ AWS accounts.

In this demo, I use the AWS Management Console to enable Runtime Monitoring. Enabling GuardDuty ECS Runtime Monitoring in the console has an effect on all your clusters.

When I want GuardDuty to automatically deploy the GuardDuty ECS Runtime Monitoring agent on Fargate, I enable GuardDuty agent management. To exclude individual clusters from automatic management, I can tag them with GuardDutyManaged=false. I make sure I tag my clusters before enabling ECS Runtime Monitoring in the console. When I don’t want to use the automatic management option, I can leave the option disabled and selectively choose the clusters to monitor with the tag GuardDutyManaged=true.

The Amazon ECS or AWS Fargate cluster administrator must have authorization to manage tags on the clusters.

The IAM TaskExecutionRole you attach to tasks must have permissions to download the GuardDuty agent from a private ECR repository. This is done automatically when you use the AmazonECSTaskExecutionRolePolicy managed IAM policy.

Here is my view of the console when the Runtime Monitoring and agent management are enabled.

guardduty ecs enbale monitoring

I can track the deployment of the security agent by assessing the Coverage statistics across all the ECS clusters.

guardduty ecs cluster coverage

Once monitoring is enabled, there is nothing else to do. Let’s see what findings it detects on my simple demo cluster.

Check out GuardDuty ECS runtime security findings
When GuardDuty ECS Runtime Monitoring detects potential threats, they appear in a list like this one.

ECS Runtime Monitoring - finding list

I select a specific finding to view more details about it.

ECS Runtime Monitoring - finding details

Things to know
By default, a Fargate task is immutable. GuardDuty won’t deploy the agent to monitor containers on existing tasks. If you want to monitor containers for already running tasks, you must stop and start the tasks after enabling GuardDuty ECS Runtime Monitoring. Similarly, when using Amazon ECS services, you must force a new deployment to ensure tasks are restarted with the agent. As I mentioned already, be sure the tasks have IAM permissions to download the GuardDuty monitoring agent from Amazon ECR.

We designed the GuardDuty agent to have little impact on performance, but you should plan for it in your Fargate task sizing calculations.

When you choose automatic agent management, GuardDuty also creates a VPC endpoint to allow the agent to communicate with GuardDuty APIs. When—just like me—you create your cluster with a CDK or CloudFormation script with the intention to delete the cluster after a period of time (for example, in a continuous integration scenario), bear in mind that the VPC endpoint must be deleted manually to allow CloudFormation to delete your stack.

Pricing and availability
You can now use GuardDuty ECS Runtime Monitoring on AWS Fargate and Amazon EC2 instances. For a full list of Regions where GuardDuty ECS Runtime Monitoring is available, visit our Region-specific feature availability page.

You can try GuardDuty ECS Runtime Monitoring for free for 30 days. When you enable GuardDuty for the first time, you have to explicitly enable GuardDuty ECS Runtime Monitoring. At the end of the trial period, we charge you per vCPU per hour of the monitoring agents. The GuardDuty pricing page has all the details.

Get insights about the threats to your container and enable GuardDuty ECS Runtime Monitoring today.

— seb

How to create an AMI hardening pipeline and automate updates to your ECS instance fleet

Post Syndicated from Nima Fotouhi original https://aws.amazon.com/blogs/security/how-to-create-an-ami-hardening-pipeline-and-automate-updates-to-your-ecs-instance-fleet/

Amazon Elastic Container Service (Amazon ECS) is a comprehensive managed container orchestrator that simplifies the deployment, maintenance, and scalability of container-based applications. With Amazon ECS, you can deploy your containerized application as a standalone task, or run a task as part of a service in your cluster. The Amazon ECS infrastructure for tasks includes Amazon Elastic Compute Cloud (Amazon EC2) instances in the AWS Cloud, serverless (AWS Fargate) in the AWS Cloud, or on-premises virtual machines (VMs) or servers. You can enable auto-scaling for Amazon ECS capacity providers when using EC2 instances, allowing your infrastructure to dynamically adjust based on workload demands. You define the infrastructure type or the capacity providers where you deploy your tasks or services.

You can choose EC2 instances as the computing resources for your ECS cluster, which allows you to control your cluster’s underlying infrastructure, including the size of EC2 instances, the instance operating system, and extra security controls required by a compliance framework. AWS recommends that you use Amazon ECS-optimized Amazon Machine Images (AMIs), which are set up with the requirements and recommendations to efficiently run your container workloads on Amazon Linux instances. We recommend that you refresh your container instances fleet with the latest ECS-optimized AMIs to include the latest bug fixes and feature updates. However, managing and updating your container instance fleet might become complex as your Amazon ECS workload grows.

In this blog post, I will show you how to create a workflow to enhance Amazon ECS-optimized AMIs by using the CIS Docker Benchmark and automatically updating your EC2 instances in your ECS cluster with the newly created AMIs.

Overview of CIS Docker Benchmark

The CIS Docker Benchmark provides prescriptive guidance for establishing a secure configuration posture for a Docker container engine, container host, container images and build files. The CIS Docker Benchmark has seven sections about Docker and container security:

  1. Host configuration
  2. Docker daemon configuration
  3. Docker daemon configuration files
  4. Container images and build file configuration
  5. Container runtime configuration
  6. Docker security operations
  7. Docker swarm configuration

The solution described in this post covers sections 1, 2, and 3 of the CIS Docker Benchmark, including security recommendations to prepare the host machine used for Amazon ECS workloads, securing the behavior of the Docker daemon (server), and securing Docker-related files and directory permissions and ownerships. However, the solution doesn’t implement all of the controls listed in these three sections. For a complete list of controls implemented, see the solution’s repository.

Solution overview

EC2 Image Builder is a fully managed AWS service, designed to simplify the process of creating, handling, and implementing server images that are custom, secure, and consistently updated. For this solution, you will deploy an EC2 Image Builder pipeline to apply the CIS Docker Benchmarks to an Amazon ECS-optimized AMI and use the created AMI to refresh the Amazon ECS instance fleet. This solution is customizable, so you can select the security controls to harden your base AMI. You can also specify cluster tags during CloudFormation template deployment; these tags will filter the ECS clusters that you have included in the Amazon EC2 instance refresh process. I’ve provided an AWS CloudFormation template that you can use to provision the necessary resources.

Figure 1: Amazon ECS instance refresh workflow

Figure 1: Amazon ECS instance refresh workflow

As shown in Figure 1, the solution involves the following steps:

  1. EC2 Image Builder
    1. The AMI image pipeline downloads the ansible playbook from the S3 bucket, and runs it against the base image.
    2. The pipeline publishes the hardened AMI.
    3. The pipeline validates the benchmarks applied to the base image and publishes the results to a test results S3 bucket. It also invokes Amazon Inspector to run a vulnerability scan on the published image.
  2. State machine initiation
    1. When the AMI is successfully published, the pipeline publishes a message to the AMI status SNS topic. The SNS topic invokes the State machine initiation Lambda function.
    2. The State machine initiation Lambda function extracts the image ID of the published AMI and uses it as the input to initiate the state machine.
  3. State machine
    1. The first state gathers information related to Amazon ECS clusters, including the capacity providers for the EC2 auto scaling group. It creates a new launch template version with the hardened AMI image ID for the EC2 auto scaling group.
    2. The second state uses the new launch template to initiate an instance refresh for the EC2 auto scaling group.
  4. Instance refresh status update
    1. The instance refresh rule selects the auto scaling group instance refresh events (failure, success, and cancellation events) and sends them to the Instance refresh status SNS topic.
    2. The Instance refresh status SNS topic sends an email on the instance refresh status to subscribers.
  5. Image update reminder
    1. A weekly scheduled rule invokes the Image update reminder Lambda function.
    2. The Image update reminder Lambda function retrieves the value for LatestECSOptimizedAMI from the CloudFormation template, and extracts the last modified date of the Amazon ECS-optimized AMI used as the base image in the EC2 Image Builder pipeline. It compares the last modified date of the AMI with the creation date of the latest AMI published by the pipeline. If a new base image is available, it publishes a message to the image update reminder SNS topic.
    3. The Image update reminder SNS topic sends a message to subscribers notifying them of a new base image. You need to create a new version of your image recipe to update it with the new AMI.

Prerequisites

To follow along with this walkthrough, make sure that you have the following prerequisites in place:

Walkthrough

To deploy the solution, complete the following steps.

Step 1: Download or clone the repository

The first step is to download or clone the solution’s repository.

To download the repository

  1. Go to the main page of the repository on GitHub.
  2. Choose Code, and then choose Download ZIP.

To clone the repository

  1. Make sure that you have Git installed.
  2. Run the following command in your terminal:

git clone https://github.com/aws-samples/ecs-image-hardening-and-instance-refresh.git

Step 2: Create an S3 bucket

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. An S3 bucket is a container for objects stored on Amazon S3. For this walkthrough, you need to create an S3 bucket and copy the content of the ansible folder to your newly created bucket. Make a note of your S3 bucket name because you will need it in the next step.

Step 3: Create the CloudFormation stack

In this step, you deploy the solution’s resources by creating a CloudFormation stack using the provided CloudFormation template. Sign in to your account and choose an AWS Region where you want to create the stack. Make sure that the Region you choose supports the services used by this solution. To create the stack, follow the steps in Creating a stack on the AWS CloudFormation console. Note that you need to provide values for the parameters defined in the template to deploy the stack. The following table lists the parameters that you need to provide.

Parameter Description
AnsiblePlaybookArguments ansible-playbook command arguments
AnsiblePlaybookBucket S3 bucket name containing ansible playbook
CloudFormationUpdaterEventBridgeRuleState Amazon EventBridge rule that invokes the Lambda function that checks for a new version of the EC2 Image Builder parent image
ClusterTags Tags in JSON format to filter the ECS clusters that you want to update
ComponentName Name of the EC2 Image Builder component
DistributionConfigurationName Name of the EC2 Image Builder distribution configuration
EnableImageScanning Choose whether or not to enable Amazon Inspector image scanning
ImagePipelineName Name of the EC2 Image Builder pipeline
InfrastructureConfigurationName Name of the EC2 Image Builder infrastructure configuration
InstanceType EC2 Image Builder infrastructure configuration EC2 instance type
LatestECSOptimizedAMI ECS-optimized AMI parameter name; for more info, see Retrieving Amazon ECS-optimized AMI metadata
libDockerVolumeSize Container partition size in gigabytes (GB)
libDockerVolumeType Container partition volume type
RecipeName Name of the EC2 Image Builder recipe
RootVolumeSize AMI root partition volume size in GB
RootVolumeType AMI root partition volume type

Step 4: Set up Amazon SNS topic subscribers

Amazon Simple Notification Service (Amazon SNS) is a web service that coordinates and manages the delivery or sending of messages to subscribing endpoints or clients. An Amazon SNS topic is a logical access point that acts as a communication channel.

The solution in this post creates three Amazon SNS topics to keep you informed of each step of the process. The following is a list of the topics that the solution creates and their purpose.

  • AMI status topic – a message is published to this topic upon successful creation of an AMI.
  • Image update reminder topic – a message is published to this topic if a newer version of the base Amazon ECS-optimized AMI is published by AWS.
  • Instance refresh status topic – a message is published to this topic each time that an ECS cluster capacity provider gets an instance fleet refresh.

You need to manually modify the subscriptions for each topic to receive messages published to that topic.

To modify the subscriptions for the topics created by the CloudFormation template

  1. Sign in to the Amazon SNS console.
  2. In the left navigation pane, choose Subscriptions.
  3. On the Subscriptions page, choose Create subscription.
  4. On the Create subscription page, in the Details section, do the following:
    • For Topic ARN, choose the Amazon Resource Name (ARN) of one of the topics that the CloudFormation topic created.
    • For Protocol, choose Email.
    • For Endpoint, enter the endpoint value. In our example, this is an email address, such as the email address of a distribution list.
    • Choose Create subscription.
  5. Repeat the preceding steps for the other two topics.

Step 5: Run the pipeline

The EC2 Image Builder pipeline that the solution creates consists of an image recipe with one component, an infrastructure configuration, and a distribution configuration. I’ve set up the image recipe to create an AMI, select a base image, choose components, and define block device mapping. There’s only one component where building and testing steps are defined. For the building step, the solution creates a separate partition for /var/lib/docker and mounts it to a dedicated device specified in the image recipe. It then applies the CIS Docker Benchmark ansible playbook and cleans up the unnecessary files and folders. In the test step, the solution runs Amazon inspector, a continuous assessment service that scans your AWS workloads for software vulnerabilities and unintended network exposure, and Docker Bench for Security. Optionally, you can create your own components and associate them with the image recipe to make further modifications on the base image.

You will need to manually run the pipeline by using either the AWS Management Console or AWC CLI.

To run the pipeline (console)

  1. Open the EC2 Image Builder console.
  2. From the pipeline details page, choose the name of your pipeline.
  3. From the Actions menu at the top of the page, select Run pipeline.

To run the pipeline (AWS CLI)

  1. Make sure that you have properly configured your AWS CLI.
  2. Run the following command. Replace <pipeline region> with your own information.

aws imagebuilder list-image-pipelines –region <pipeline region>

  1. From the list of pipelines, find the pipeline named ECSAnsiblePipeline and note the pipeline ARN, which you will use in the next step.
  2. Run the pipeline. Make sure to replace <pipeline arn> and <region> with your own information.

aws imagebuilder start-image-pipeline-execution –image-pipeline-arn <pipeline arn> –region <region>

The following is a process overview of the image hardening and instance refresh:

  1. Image hardening – when you start the pipeline, EC2 Image Builder creates the required infrastructure to build your AMI, applies the ansible playbook (CIS Docker Benchmark) to the base AMI, and publishes the hardened AMI. A message is published to the AMI status topic as well.
  2. Image testing – after publishing the AMI, EC2 Image Builder scans the newly created AMI with Amazon Inspector and reports the findings back. It also runs Docker Bench for Security to verify the changes that the ansible playbook made to the base AMI and publishes the results to an S3 bucket.
  3. State machine initiation – after a new AMI is successfully published, the AMI status topic invokes the State machine initiation Lambda function. The Lambda function invokes the instance refresh state machine and passes on the AMI info.
  4. Instance refresh – the instance refresh state machine has two steps:
    1. Gather cluster information – a Lambda function gathers information regarding EC2 capacity providers and their associated auto scaling groups. For each auto scaling group, it creates a new launch template and includes the hardened AMI information. When you create the CloudFormation stack, if you pass a tag or a list of tags, only clusters with matching tags are processed in this step.
    2. Auto scaling group instance refresh – the state machine uses the output of the first Lambda function (first state) and starts instance refresh for auto scaling groups in parallel (second state). An EventBridge rule publishes a message to the Instance refresh status topic upon successful refresh of each auto scaling group.

This solution also creates an EventBridge rule that is invoked weekly. This rule invokes the Image update reminder Lambda function, and notifies you if a new version of your base AMI has been published by AWS so that you can run the pipeline and update your hardened AMI.

Conclusion

In this blog post, you learned how to create a workflow to harden Amazon ECS-optimized AMIs by using the CIS Docker Benchmark and to automate the refresh of EC2 instances in your ECS clusters. This automated workflow has several advantages. First, it helps ensure a consistent and standardized process for image hardening, reducing potential human errors and inconsistencies. By automating the entire process, you can apply security and compliance standards across your instances. Second, the tight integration with AWS Step Functions enables smooth, orchestrated updates to the ECS cluster instances, enhancing the reliability and predictability of deployments. This automation also reduces manual intervention, helping you achieve time savings so that your teams can focus on more value-driven tasks. Moreover, this systematic approach helps to enhance the security posture of your Amazon ECS workloads because you can address vulnerabilities rapidly and systematically, helping to keep the environment resilient against potential threats.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Nima Fotouhi

Nima Fotouhi

Nima is a Security Consultant at AWS. He’s a builder with a passion for infrastructure as code (IaC) and policy as code (PaC) and helps customers build secure infrastructure on AWS. In his spare time, he loves to hit the slopes and go snowboarding.

Security considerations for running containers on Amazon ECS

Post Syndicated from Mutaz Hajeer original https://aws.amazon.com/blogs/security/security-considerations-for-running-containers-on-amazon-ecs/

If you’re looking to enhance the security of your containers on Amazon Elastic Container Service (Amazon ECS), you can begin with the six tips that we’ll cover in this blog post. These curated best practices are recommended by Amazon Web Services (AWS) container and security subject matter experts in order to help raise your container security posture.

Before we jump into best practices, let’s look at the how the shared responsibility model works for Amazon ECS hosted on Amazon Elastic Compute Cloud (Amazon EC2) infrastructure compared to AWS Fargate. The security and compliance of a managed service like Amazon ECS is a shared responsibility between you and AWS. Generally speaking, AWS is responsible for security of the cloud whereas you, the customer, are responsible for security in the cloud. AWS is responsible for the management of the Amazon ECS control plane, including the infrastructure that’s needed to deliver a secure and reliable service. In this post, we’re going to focus on the areas of ECS security that you will be responsible for and provide guidance on what you need to do to adhere to these ECS security best practices.

Figure 1 shows the shared responsibility model for Amazon ECS hosted on an EC2 instance, in which the customer has more security responsibility to cover than when using ECS on Fargate. For example, the ECS agent and the worker node configuration is the customer’s responsibility to govern, because the customer is managing the EC2 instance. Therefore, the customer will have to manage the ECS agent and worker node as part of their configuration and management operations.

Figure 1: Responsibility model for Amazon ECS hosted on an Amazon EC2 instance

Figure 1: Responsibility model for Amazon ECS hosted on an Amazon EC2 instance

AWS assumes greater responsibility for infrastructure security for Fargate, as shown in Figure 2.

Figure 2: Responsibility model for Amazon ECS hosted on Fargate

Figure 2: Responsibility model for Amazon ECS hosted on Fargate

In Fargate, each task runs in its own virtual machine (VM). No two tasks share the operating system or kernel resources. With Fargate, AWS manages the security of the underlying instance in the cloud and the runtime that’s used to run your tasks. It also automatically scales your infrastructure on your behalf, which is something you should take into consideration if you’re starting your container journey and deciding on your infrastructure options.

With that, let’s go through these six Amazon ECS security best practices.

1 – Manage ECS access with IAM policies and roles

AWS Identity and Access Management (IAM) policies can help you control access to Amazon ECS. For this we recommend that you do the following:

  • Enforce least privilege when setting up policies for Amazon ECS resources – Use resource-level permissions to specify upon which resources you want to allow particular actions. For example, only allow a specific IAM user to stop a task that uses a specific task definition family on a specific ECS cluster.
  • Specify your task’s role – Make sure to define the right task role in your ECS task definition. The task role is used by your application in the task to make API calls to AWS services like Amazon Simple Storage Service (Amazon S3). This allows you to run your tasks by using an IAM role that has only the necessary permissions, without complete access to all services and resources within your account.
  • Create automated pipelines – Use Amazon CodePipeline or one of your other preferred continuous integration and continuous delivery (CI/CD) solutions to create pipelines that package and deploy your applications into ECS clusters. This way, you limit the users’ actions and delegate them to the automated pipeline. For an example of how to create pipelines, see Automatically build CI/CD pipelines and Amazon ECS clusters for microservices using AWS CDK.
  • Audit Amazon ECS API access – Track and monitor your AWS CloudTrail logs to identify who has access to your Amazon ECS APIs and whether that access is still warranted. You can then delete the IAM users, roles, and groups that aren’t in use and review the policies that are in place. For more information, see the AWS security audit guidelines.

2 – Secure your ECS network

Network security is an important item to work on as part of applying best practices to secure your Amazon ECS environment. This area includes several sub-areas such as firewalling, traffic routing, and network observability. Here’s what we recommend:

  • Network segmentation and isolation – Amazon ECS tasks are configured to operate in different network modes. AWS recommends the use of awsvpc as the preferred network mode. This is because it’s the only mode that you can use to assign security groups to tasks. After you configure your task to use this mode, the ECS agent automatically provisions and attaches an elastic network interface (ENI) to the task. When the ENI is provisioned, the task is enrolled in an AWS security group. The security group acts as a virtual firewall that you can use to control inbound and outbound traffic. It’s also the only mode that’s available for Fargate tasks on ECS if you choose to go that route.
  • Use network encryption where applicable – Encrypting network traffic helps prevent unauthorized users from intercepting and reading data when that data is transmitted across a network. With Amazon ECS, you can implement network encryption in different ways, such as with a service mesh (TLS), using AWS Nitro system instances, using server name indication (SNI) with an application load balancer, or end-to-end encryption with TLS certificates. If your service is fronted by a public-facing load balancer, use TLS/SSL to encrypt the traffic from the client’s browser to the load balancer and re-encrypt traffic to the backend if warranted. For more information, see Amazon ECS encryption in transit.
  • Create clusters in separate VPCs when network traffic needs to be strictly isolated – You should create clusters in separate virtual private clouds (VPCs) when network traffic needs to be strictly isolated. Avoid running workloads that have strict security requirements on clusters with workloads that don’t have to adhere to those requirements. When strict network isolation is mandatory, create clusters in separate VPCs and selectively expose services to other VPCs by using VPC endpoints. For more information, see VPC endpoints.
  • Configure AWS PrivateLink endpoints when possible – AWS PrivateLink is a networking technology that allows you to create private endpoints for different AWS services, including Amazon ECS. You should configure AWS PrivateLink endpoints when possible. If your security policy prevents you from attaching an internet gateway to your Amazon VPCs, then configure PrivateLink endpoints for ECS and other services such as Amazon Elastic Container Registry (Amazon ECR), AWS Secrets Manager, and Amazon CloudWatch. For more details, see the Amazon ECS Best Practices Guide.

3 – ECS secrets management

Secrets, such as API keys and database credentials, are frequently used by applications to gain access to other systems. They often consist of a username and password, a certificate, or an API key. Access to these secrets should be restricted to specific IAM principals that are using IAM and injected into containers at runtime. Here’s what we recommend:

  • Use Secrets Manager or Amazon EC2 Systems Manager Parameter Store for storing secret materials – Securely storing API keys, database credentials, and other secret materials is crucial to help prevent accidental exposure and unauthorized access. AWS recommends that you store these secrets in Secrets Manager or as an encrypted parameter in Amazon EC2 Systems Manager Parameter Store. These services are similar because they’re both managed key-value stores that use AWS Key Management Service (AWS KMS) to encrypt sensitive data. Secrets Manager, however, also includes the ability to automatically rotate secrets, generate random secrets, and share secrets across AWS accounts. Additionally, Amazon ECS does not support versioned parameters in Parameter Store. If you need to implement any of these features, use Secrets Manager; otherwise, use encrypted parameters. Also, you can use tools like Chamber to manage secrets. For more information, see this Knowledge Center article.
  • Mount the secret to a volume by using a sidecar container – Considering the elevated risk of data leakage with environmental variables, it’s recommended that you run a sidecar container that reads your secrets from Secrets Manager and writes them to a shared volume. This container can run and exit before the application container by using Amazon ECS container ordering. When you do this, the application container subsequently mounts the volume where the secret was written. This will help isolate secret management concerns and facilitates dynamic secret handling. For example, your application should be written to read the secret from the shared volume. Then, because the volume is scoped to the task, the volume is automatically deleted after the task stops. For more details about sidecar containers, see the aws-secret-sidecar-injector project in GitHub.

4 – Secure the ECS task and runtime

You should consider the container image as your first line of defense. An insecure, poorly constructed image can allow users to escape the bounds of the container and gain access to the host. You should do the following to mitigate the chances of this happening:

  • Secure your container’s images – Escape to host is a well-known container threat technique where bad actors use unsecured container images to escape the bounds of a container and gain access to the underlying host. We recommend that you scan your container’s images before deployment. For images stored on Amazon ECR, you can use Amazon Inspector to scan your images, along with Amazon EventBridge to be notified to take actions to either delete or rebuild insecure images. This process is shown in the architecture in Figure 3. You can find more details on how to create custom responses to Amazon Inspector findings with Amazon EventBridge in the Amazon Inspector User Guide.

    Figure 3: Sample architecture showing how to get notified of Amazon Inspector findings on a container’s image

    Figure 3: Sample architecture showing how to get notified of Amazon Inspector findings on a container’s image

  • Enable the ECR tag immutability feature – Threat actors could also try to push a compromised version of a container image into your Amazon ECR repository with an identical tag. A solution for this is to force a new tag for each new version of your image. You can do this by enabling the image tag mutability feature for your ECR repositories. You can find the Tag immutability setting on the Create repository page in the Amazon ECR console, under General settings, as shown in Figure 4.
    Figure 4: Enabling the tag immutability feature for your Amazon ECR repository

    Figure 4: Enabling the tag immutability feature for your Amazon ECR repository

  • Secure your containers and tasks
    • Define the USER parameter to use inside your container – Containers run by default as the root user, which doesn’t adhere to the principle of least privilege and can be misused. One recommendation is to make sure to run your containers as a non-root user by specifying the USER directive in your Dockerfile. You can enforce this when using a CI/CD pipeline by configuring the pipeline to fail the build if the USER directive is missing.
    • Don’t run your containers in privileged mode – Make sure to not run your containers in privileged mode, which can be a potential gap that allows unauthorized users to run commands within a container. You can use AWS Security Hub to detect containers that are running in privileged mode. Alternatively, you can use AWS Lambda to scan your task definitions for the use of the privileged parameter. Security Hub has a built-in control (ECS.4) to check whether the privileged parameter in the container definition of Amazon ECS task definitions is set to true.
  • Disable ECS Exec – Customers should disable the ECS Execute condition key for production environments. Disabling the key provides access control that can help prevent SSH access into running containers. You can do this by disabling the ECS:Enable-Execute-Command condition key.
  • Secure runtime – For Linux containers, make sure to add or drop Linux kernel capabilities in the task definition. You can do this either by using linuxParameters and applying SELinux labels or by using the AppArmor profile, which is a Linux security module that restricts a container’s capabilities, such as accessing parts of the file system. When you’re using the Fargate launch type, each Fargate task has its own isolation boundary and does not share the underlying kernel, CPU resources, memory resources, or elastic network interface with another task.

5 – ECS logging and monitoring

Logging and monitoring your container’s activity can help you quickly identify and investigate security incidents in your AWS environments. For example, threat actors might have escalated permissions and have access to your root user. Here’s what we recommend:

  • Monitor your root-user activities – Configure an Amazon EventBridge rule that detects root-user activities based on Amazon CloudTrail logs. For more details, see this blog post.
  • Monitor changes to your tasks and containers – Put appropriate events rules in place in Amazon EventBridge for the creation of and changes to your tasks and containers.
  • Monitor Amazon ECS scheduled tasks – If threat actors have enough privileges, they can abuse the ECS task scheduling feature to deploy containers that would run malicious code. Monitor this type of activity by using Amazon CloudTrail logs and get notifications. For more information about scheduling ECS tasks, see the Amazon ECS Developer Guide.
  • Monitor your container’s activity metrics – Another recommendation is to enable logging for your container and use Amazon CloudWatch to track activity metrics on your containers, such as CPU and memory utilization. This can help you detect if your resources are accessed and being used for malicious activities, such as launching DoS attacks. See Amazon ECS CloudWatch Container Insights for more information.
  • Use Amazon VPC Flow Logs to analyze the traffic to and from long-running tasks – You should use Amazon VPC Flow Logs to analyze the traffic to and from long-running tasks. Tasks that use awsvpc network mode get their own ENI. By setting tasks to use this mode, you can use VPC flow logs to monitor traffic that goes to and from individual tasks. A recent update to Amazon VPC Flow Logs (v3) enriches the logs with traffic metadata, including the VPC ID, subnet ID, and the instance ID. You can use this metadata to help narrow an investigation. For more information, see Amazon VPC Flow Logs. AWS cloud-native tools like Amazon GuardDuty inspect VPC flow logs and generate alerts and findings if unusual activity is detected.

6 – ECS security compliance

When using Amazon ECS, your compliance responsibility is determined by the sensitivity of your data, the compliance objectives of your company, and applicable laws and regulations. For example, with regards to the Payment Card Industry Data Security Standard (PCI-DSS), it’s important that you understand the complete flow of cardholder data (CHD) within your environment.

The temporary nature of containerized applications provides additional complexities when auditing configurations. As a result, customers need to maintain an awareness of all container configuration parameters, to make sure that compliance requirements are addressed throughout the phases of a container lifecycle. For additional information on adhering to PCI DSS compliance on Amazon ECS, see the Architecting on Amazon ECS for PCI DSS Compliance whitepaper.

One service that can help with monitoring Amazon ECS compliance is AWS Security Hub. You can use this service to monitor your usage of ECS as it relates to security best practices. Security Hub uses controls to evaluate resource configurations and security standards to help you comply with various compliance frameworks. For more information about using Security Hub to evaluate ECS resources, see Amazon ECS controls in the AWS Security Hub User Guide.

Conclusion

In this blog post, we presented a curated list of best practices for securing your Amazon ECS implementation. You can use these best practices as a starting point to increase the security posture of your ECS environment. You can always add, remove, or prioritize the best practice items based on your business needs and requirements. If you’re looking for more detailed guidance on securing ECS in your environment, we suggest that you take a look at Amazon ECS Security Best Practices.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on AWS re:Post or contact AWS Support.

Mutaz Hajeer

Mutaz Hajeer

Mutaz is a Senior Security Solutions Architect on the AWS Worldwide Commercial Sector Security Specialist team, working with customers in North America. Mutaz has been working within the cybersecurity field for 14 years and now focuses on threat detection and incident response services within AWS. Outside of work, he likes to coach, play, and watch soccer, along with spending time with his wife and three kids.

Ibtissam Liedri author

Ibtissam Liedri

Ibtissam is a Solutions Architect for AWS Financial Services. She assists financial services customers throughout their cloud journeys, helping them craft scalable, flexible, and resilient architectures. Ibtissam has an interest in cloud security with a focus on threat detection and incident response services within AWS and enjoys helping customers understand how to better build and secure their workloads.

Temi Adebambo

Temi Adebambo

Temi is the head of Security Solutions Architecture at AWS, with extensive experience leading technical teams and delivering enterprise-wide technology transformation programs. He has assisted Fortune 500 corporations with cloud security architecture, cyber risk management, compliance, IT security strategy, and governance. Prior to AWS, Temi served in various roles at Deloitte and PwC, providing consulting services in cybersecurity across industries.

ITS adopts microservices architecture for improved air travel search engine

Post Syndicated from Sushmithe Sekuboyina original https://aws.amazon.com/blogs/architecture/its-adopts-microservices-architecture-for-improved-air-travel-search-engine/

Internet Travel Solutions, LLC (ITS) is a travel management company that develops and maintains smart products and services for the corporate, commercial, and cargo sectors. ITS streamlines travel bookings for companies of any size around the world. It provides an intuitive consumer site with an integrated view of your travel and expenses.

ITS had been using monolithic architectures to host travel applications for years. As demand grew, applications became more complex, difficult to scale, and challenging to update over time. This slowed down deployment cycles.

In this blog post, we will explore how ITS improved speed to market, business agility, and performance, by modernizing their air travel search engine. We’ll show how they refactored their monolith application into microservices, using services such as Amazon Elastic Container Service (ECS)Amazon ElastiCache for Redis, and AWS Systems Manager.

Building a microservices-based air travel search engine

Typically, when a customer accesses the search widget on the consumer site, they select their origin, destination, and travel dates. Then, flights matching these search criteria are displayed. Data is retrieved from the backend database, and multiple calls are made to the Global Distribution System and external partner’s APIs, which typically takes 10-15 seconds. ITS then uses proprietary logic combined with business policies to curate the best results for the user. The existing monolith system worked well for normal workloads. However, when the number of concurrent user requests increased, overall performance of the application degraded.

In order to enhance the user experience, significantly accelerate search speed, and advance ITS’ modernization initiative, ITS chose to restructure their air travel application into microservices. The key goals in rearchitecting the application are:

  • To break down search components into logical units
  • To reduce database load by serving transient requests through memory-based storage
  • To decrease application logic processing on ITS’ side to under 3 seconds

Overview of the solution

To begin, we decompose our air travel search engine into microservices (for example, search, list, PriceGraph, and more). Next, we containerize the application to simplify and optimize system utilization by running these microservices using AWS Fargate, a serverless compute option on Amazon ECS.

Every search call processes about 30-60 MB of data in varying formats from different data stores. We use a new JSON-based data format to streamline varying data formats and store this data in Amazon ElastiCache for Redis, an in-memory data store that provides sub-millisecond latency and data structure flexibility. Additionally, some of the static data used by our air travel search application was moved to Amazon DynamoDB for faster retrieval speeds.

ITS’ microservice architecture, using AWS

Figure 1. ITS’ microservice architecture, using AWS

ITS’ modernized architecture has several benefits beyond reducing operational expenses (OpEx). Some of these advantages include:

  • Agility. This architecture streamlines development, testing, and deploying changes on individual components, leading to faster iterations and shorter time-to-market (TTM).
  • Scalability. The managed scaling feature of AWS Fargate eliminates the need to worry about cluster autoscaling when setting up capacity providers. Amazon ECS actively oversees the task lifecycle and health status, responding to unexpected occurrences like crashes or freezes by initiating tasks as necessary to fulfill our service demands. This capability enhances resource utilization, ensures business continuity, and lowers overall total cost of ownership (TCO), letting the application owner focus on business needs.
  • Improved performance. Integrating Amazon ElastiCache for Redis with Amazon ECS on AWS Fargate to cache frequently accessed data significantly improves search response times and lowers load on backend services.
  • Centralized configuration management. Decoupling configuration parameters like database connection, strings, and environment variables from application code by integrating AWS Systems Manager Parameter Store, also provides consistency across tasks.

Results and metrics

ITS designed this architecture, tested, and implemented it in their production environment. ITS benchmarked this solution against their monolith application under varying factors for four months and noticed a significant improvement in air travel search speeds and overall performance. Here are the results:

Single User Non-cloud airlist page round trip (RT) Cloud airlist page RT
Leg 1 Leg 2 Leg 1 Leg 2
Test 1 29 secs 17 secs 11 secs 2 secs
Test 2 24 secs 11 secs 11.8 secs 1 sec
Test 3 24 secs 12 secs 14 secs 1 sec

Table 1. Monolithic versus modernized architecture response times

Searching round trip (RT) flights in the old system resulted in an average runtime of 27 seconds for the first leg, and 12 seconds for the return leg. With the new system, the average time is 12 seconds for the first leg and 1.3 seconds for the return leg. This is a combined improvement of 72%

Note that this time includes the trip time for our calls to reach an external vendor and receive inventory back. This usually ranges from 6 to 17 seconds, depending on the third-party system performance. Leg 2 performance for our new system is significantly faster (between 1-2 seconds). This is because search results are served directly from the Amazon ElastiCache for Redis in-memory datastore, rather than querying backend databases. This decreases load on the database, enabling it to handle more complex and resource-intensive operations efficiently.

Table 2 shows the results of endurance tests:

Endurance Test Cloud airlist page RT
Leg 1 Leg 2
50 Users in 10 minutes 14.01 secs 4.48 secs
100 Users in 15 minutes 14.47 secs 13.31 secs

Table 2. Endurance test

Table 3 shows the results of spike tests:

Spike Test Cloud airlist page RT
Leg 1 Leg 2
10 Users 12.34 secs 9.41 secs
20 Users 11.97 secs 10.55 secs
30 Users 15 secs 7.75 secs

Table 3. Spike test

Conclusion

In this blog post, we explored how Internet Travel Solutions, LLC (ITS) is using Amazon ECS on AWS Fargate, Amazon ElastiCache for Redis, and other services to containerize microservices, reduce costs, and increase application performance. This results in a vastly improved search results speed. ITS overcame many technical complexities and design considerations to modernize its air travel search engine.

To learn more about refactoring monolith application into microservices, visit Decomposing monoliths into microservices. If you are interested in learning more about Amazon ECS on AWS Fargate, visit Getting started with AWS Fargate.

Blue/Green deployments using AWS CDK Pipelines and AWS CodeDeploy

Post Syndicated from Luiz Decaro original https://aws.amazon.com/blogs/devops/blue-green-deployments-using-aws-cdk-pipelines-and-aws-codedeploy/

Customers often ask for help with implementing Blue/Green deployments to Amazon Elastic Container Service (Amazon ECS) using AWS CodeDeploy. Their use cases usually involve cross-Region and cross-account deployment scenarios. These requirements are challenging enough on their own, but in addition to those, there are specific design decisions that need to be considered when using CodeDeploy. These include how to configure CodeDeploy, when and how to create CodeDeploy resources (such as Application and Deployment Group), and how to write code that can be used to deploy to any combination of account and Region.

Today, I will discuss those design decisions in detail and how to use CDK Pipelines to implement a self-mutating pipeline that deploys services to Amazon ECS in cross-account and cross-Region scenarios. At the end of this blog post, I also introduce a demo application, available in Java, that follows best practices for developing and deploying cloud infrastructure using AWS Cloud Development Kit (AWS CDK).

The Pipeline

CDK Pipelines is an opinionated construct library used for building pipelines with different deployment engines. It abstracts implementation details that developers or infrastructure engineers need to solve when implementing a cross-Region or cross-account pipeline. For example, in cross-Region scenarios, AWS CloudFormation needs artifacts to be replicated to the target Region. For that reason, AWS Key Management Service (AWS KMS) keys, an Amazon Simple Storage Service (Amazon S3) bucket, and policies need to be created for the secondary Region. This enables artifacts to be moved from one Region to another. In cross-account scenarios, CodeDeploy requires a cross-account role with access to the KMS key used to encrypt configuration files. This is the sort of detail that our customers want to avoid dealing with manually.

AWS CodeDeploy is a deployment service that automates application deployment across different scenarios. It deploys to Amazon EC2 instances, On-Premises instances, serverless Lambda functions, or Amazon ECS services. It integrates with AWS Identity and Access Management (AWS IAM), to implement access control to deploy or re-deploy old versions of an application. In the Blue/Green deployment type, it is possible to automate the rollback of a deployment using Amazon CloudWatch Alarms.

CDK Pipelines was designed to automate AWS CloudFormation deployments. Using AWS CDK, these CloudFormation deployments may include deploying application software to instances or containers. However, some customers prefer using CodeDeploy to deploy application software. In this blog post, CDK Pipelines will deploy using CodeDeploy instead of CloudFormation.

A pipeline build with CDK Pipelines that deploys to Amazon ECS using AWS CodeDeploy. It contains at least 5 stages: Source, Build, UpdatePipeline, Assets and at least one Deployment stage.

Design Considerations

In this post, I’m considering the use of CDK Pipelines to implement different use cases for deploying a service to any combination of accounts (single-account & cross-account) and regions (single-Region & cross-Region) using CodeDeploy. More specifically, there are four problems that need to be solved:

CodeDeploy Configuration

The most popular options for implementing a Blue/Green deployment type using CodeDeploy are using CloudFormation Hooks or using a CodeDeploy construct. I decided to operate CodeDeploy using its configuration files. This is a flexible design that doesn’t rely on using custom resources, which is another technique customers have used to solve this problem. On each run, a pipeline pushes a container to a repository on Amazon Elastic Container Registry (ECR) and creates a tag. CodeDeploy needs that information to deploy the container.

I recommend creating a pipeline action to scan the AWS CDK cloud assembly and retrieve the repository and tag information. The same action can create the CodeDeploy configuration files. Three configuration files are required to configure CodeDeploy: appspec.yaml, taskdef.json and imageDetail.json. This pipeline action should be executed before the CodeDeploy deployment action. I recommend creating template files for appspec.yaml and taskdef.json. The following script can be used to implement the pipeline action:

##
#!/bin/sh
#
# Action Configure AWS CodeDeploy
# It customizes the files template-appspec.yaml and template-taskdef.json to the environment
#
# Account = The target Account Id
# AppName = Name of the application
# StageName = Name of the stage
# Region = Name of the region (us-east-1, us-east-2)
# PipelineId = Id of the pipeline
# ServiceName = Name of the service. It will be used to define the role and the task definition name
#
# Primary output directory is codedeploy/. All the 3 files created (appspec.json, imageDetail.json and 
# taskDef.json) will be located inside the codedeploy/ directory
#
##
Account=$1
Region=$2
AppName=$3
StageName=$4
PipelineId=$5
ServiceName=$6
repo_name=$(cat assembly*$PipelineId-$StageName/*.assets.json | jq -r '.dockerImages[] | .destinations[] | .repositoryName' | head -1) 
tag_name=$(cat assembly*$PipelineId-$StageName/*.assets.json | jq -r '.dockerImages | to_entries[0].key')  
echo ${repo_name} 
echo ${tag_name} 
printf '{"ImageURI":"%s"}' "$Account.dkr.ecr.$Region.amazonaws.com/${repo_name}:${tag_name}" > codedeploy/imageDetail.json                     
sed 's#APPLICATION#'$AppName'#g' codedeploy/template-appspec.yaml > codedeploy/appspec.yaml 
sed 's#APPLICATION#'$AppName'#g' codedeploy/template-taskdef.json | sed 's#TASK_EXEC_ROLE#arn:aws:iam::'$Account':role/'$ServiceName'#g' | sed 's#fargate-task-definition#'$ServiceName'#g' > codedeploy/taskdef.json 
cat codedeploy/appspec.yaml
cat codedeploy/taskdef.json
cat codedeploy/imageDetail.json

Using a Toolchain

A good strategy is to encapsulate the pipeline inside a Toolchain to abstract how to deploy to different accounts and regions. This helps decoupling clients from the details such as how the pipeline is created, how CodeDeploy is configured, and how cross-account and cross-Region deployments are implemented. To create the pipeline, deploy a Toolchain stack. Out-of-the-box, it allows different environments to be added as needed. Depending on the requirements, the pipeline may be customized to reflect the different stages or waves that different components might require. For more information, please refer to our best practices on how to automate safe, hands-off deployments and its reference implementation.

In detail, the Toolchain stack follows the builder pattern used throughout the CDK for Java. This is a convenience that allows complex objects to be created using a single statement:

 Toolchain.Builder.create(app, Constants.APP_NAME+"Toolchain")
        .stackProperties(StackProps.builder()
                .env(Environment.builder()
                        .account(Demo.TOOLCHAIN_ACCOUNT)
                        .region(Demo.TOOLCHAIN_REGION)
                        .build())
                .build())
        .setGitRepo(Demo.CODECOMMIT_REPO)
        .setGitBranch(Demo.CODECOMMIT_BRANCH)
        .addStage(
                "UAT",
                EcsDeploymentConfig.CANARY_10_PERCENT_5_MINUTES,
                Environment.builder()
                        .account(Demo.SERVICE_ACCOUNT)
                        .region(Demo.SERVICE_REGION)
                        .build())                                                                                                             
        .build();

In the statement above, the continuous deployment pipeline is created in the TOOLCHAIN_ACCOUNT and TOOLCHAIN_REGION. It implements a stage that builds the source code and creates the Java archive (JAR) using Apache Maven.  The pipeline then creates a Docker image containing the JAR file.

The UAT stage will deploy the service to the SERVICE_ACCOUNT and SERVICE_REGION using the deployment configuration CANARY_10_PERCENT_5_MINUTES. This means 10 percent of the traffic is shifted in the first increment and the remaining 90 percent is deployed 5 minutes later.

To create additional deployment stages, you need a stage name, a CodeDeploy deployment configuration and an environment where it should deploy the service. As mentioned, the pipeline is, by default, a self-mutating pipeline. For example, to add a Prod stage, update the code that creates the Toolchain object and submit this change to the code repository. The pipeline will run and update itself adding a Prod stage after the UAT stage. Next, I show in detail the statement used to add a new Prod stage. The new stage deploys to the same account and Region as in the UAT environment:

... 
        .addStage(
                "Prod",
                EcsDeploymentConfig.CANARY_10_PERCENT_5_MINUTES,
                Environment.builder()
                        .account(Demo.SERVICE_ACCOUNT)
                        .region(Demo.SERVICE_REGION)
                        .build())                                                                                                                                      
        .build();

In the statement above, the Prod stage will deploy new versions of the service using a CodeDeploy deployment configuration CANARY_10_PERCENT_5_MINUTES. It means that 10 percent of traffic is shifted in the first increment of 5 minutes. Then, it shifts the rest of the traffic to the new version of the application. Please refer to Organizing Your AWS Environment Using Multiple Accounts whitepaper for best-practices on how to isolate and manage your business applications.

Some customers might find this approach interesting and decide to provide this as an abstraction to their application development teams. In this case, I advise creating a construct that builds such a pipeline. Using a construct would allow for further customization. Examples are stages that promote quality assurance or deploy the service in a disaster recovery scenario.

The implementation creates a stack for the toolchain and another stack for each deployment stage. As an example, consider a toolchain created with a single deployment stage named UAT. After running successfully, the DemoToolchain and DemoService-UAT stacks should be created as in the next image:

Two stacks are needed to create a Pipeline that deploys to a single environment. One stack deploys the Toolchain with the Pipeline and another stack deploys the Service compute infrastructure and CodeDeploy Application and DeploymentGroup. In this example, for an application named Demo that deploys to an environment named UAT, the stacks deployed are: DemoToolchain and DemoService-UAT

CodeDeploy Application and Deployment Group

CodeDeploy configuration requires an application and a deployment group. Depending on the use case, you need to create these in the same or in a different account from the toolchain (pipeline). The pipeline includes the CodeDeploy deployment action that performs the blue/green deployment. My recommendation is to create the CodeDeploy application and deployment group as part of the Service stack. This approach allows to align the lifecycle of CodeDeploy application and deployment group with the related Service stack instance.

CodePipeline allows to create a CodeDeploy deployment action that references a non-existing CodeDeploy application and deployment group. This allows us to implement the following approach:

  • Toolchain stack deploys the pipeline with CodeDeploy deployment action referencing a non-existing CodeDeploy application and deployment group
  • When the pipeline executes, it first deploys the Service stack that creates the related CodeDeploy application and deployment group
  • The next pipeline action executes the CodeDeploy deployment action. When the pipeline executes the CodeDeploy deployment action, the related CodeDeploy application and deployment will already exist.

Below is the pipeline code that references the (initially non-existing) CodeDeploy application and deployment group.

private IEcsDeploymentGroup referenceCodeDeployDeploymentGroup(
        final Environment env, 
        final String serviceName, 
        final IEcsDeploymentConfig ecsDeploymentConfig, 
        final String stageName) {

    IEcsApplication codeDeployApp = EcsApplication.fromEcsApplicationArn(
            this,
            Constants.APP_NAME + "EcsCodeDeployApp-"+stageName,
            Arn.format(ArnComponents.builder()
                    .arnFormat(ArnFormat.COLON_RESOURCE_NAME)
                    .partition("aws")
                    .region(env.getRegion())
                    .service("codedeploy")
                    .account(env.getAccount())
                    .resource("application")
                    .resourceName(serviceName)
                    .build()));

    IEcsDeploymentGroup deploymentGroup = EcsDeploymentGroup.fromEcsDeploymentGroupAttributes(
            this,
            Constants.APP_NAME + "-EcsCodeDeployDG-"+stageName,
            EcsDeploymentGroupAttributes.builder()
                    .deploymentGroupName(serviceName)
                    .application(codeDeployApp)
                    .deploymentConfig(ecsDeploymentConfig)
                    .build());

    return deploymentGroup;
}

To make this work, you should use the same application name and deployment group name values when creating the CodeDeploy deployment action in the pipeline and when creating the CodeDeploy application and deployment group in the Service stack (where the Amazon ECS infrastructure is deployed). This approach is necessary to avoid a circular dependency error when trying to create the CodeDeploy application and deployment group inside the Service stack and reference these objects to configure the CodeDeploy deployment action inside the pipeline. Below is the code that uses Service stack construct ID to name the CodeDeploy application and deployment group. I set the Service stack construct ID to the same name I used when creating the CodeDeploy deployment action in the pipeline.

   // configure AWS CodeDeploy Application and DeploymentGroup
   EcsApplication app = EcsApplication.Builder.create(this, "BlueGreenApplication")
           .applicationName(id)
           .build();

   EcsDeploymentGroup.Builder.create(this, "BlueGreenDeploymentGroup")
           .deploymentGroupName(id)
           .application(app)
           .service(albService.getService())
           .role(createCodeDeployExecutionRole(id))
           .blueGreenDeploymentConfig(EcsBlueGreenDeploymentConfig.builder()
                   .blueTargetGroup(albService.getTargetGroup())
                   .greenTargetGroup(tgGreen)
                   .listener(albService.getListener())
                   .testListener(listenerGreen)
                   .terminationWaitTime(Duration.minutes(15))
                   .build())
           .deploymentConfig(deploymentConfig)
           .build();

CDK Pipelines roles and permissions

CDK Pipelines creates roles and permissions the pipeline uses to execute deployments in different scenarios of regions and accounts. When using CodeDeploy in cross-account scenarios, CDK Pipelines deploys a cross-account support stack that creates a pipeline action role for the CodeDeploy action. This cross-account support stack is defined in a JSON file that needs to be published to the AWS CDK assets bucket in the target account. If the pipeline has the self-mutation feature on (default), the UpdatePipeline stage will do a cdk deploy to deploy changes to the pipeline. In cross-account scenarios, this deployment also involves deploying/updating the cross-account support stack. For this, the SelfMutate action in UpdatePipeline stage needs to assume CDK file-publishing and a deploy roles in the remote account.

The IAM role associated with the AWS CodeBuild project that runs the UpdatePipeline stage does not have these permissions by default. CDK Pipelines cannot grant these permissions automatically, because the information about the permissions that the cross-account stack needs is only available after the AWS CDK app finishes synthesizing. At that point, the permissions that the pipeline has are already locked-in­­. Hence, for cross-account scenarios, the toolchain should extend the permissions of the pipeline’s UpdatePipeline stage to include the file-publishing and deploy roles.

In cross-account environments it is possible to manually add these permissions to the UpdatePipeline stage. To accomplish that, the Toolchain stack may be used to hide this sort of implementation detail. In the end, a method like the one below can be used to add these missing permissions. For each different mapping of stage and environment in the pipeline it validates if the target account is different than the account where the pipeline is deployed. When the criteria is met, it should grant permission to the UpdatePipeline stage to assume CDK bootstrap roles (tagged using key aws-cdk:bootstrap-role) in the target account (with the tag value as file-publishing or deploy). The example below shows how to add permissions to the UpdatePipeline stage:

private void grantUpdatePipelineCrossAccoutPermissions(Map<String, Environment> stageNameEnvironment) {

    if (!stageNameEnvironment.isEmpty()) {

        this.pipeline.buildPipeline();
        for (String stage : stageNameEnvironment.keySet()) {

            HashMap<String, String[]> condition = new HashMap<>();
            condition.put(
                    "iam:ResourceTag/aws-cdk:bootstrap-role",
                    new String[] {"file-publishing", "deploy"});
            pipeline.getSelfMutationProject()
                    .getRole()
                    .addToPrincipalPolicy(PolicyStatement.Builder.create()
                            .actions(Arrays.asList("sts:AssumeRole"))
                            .effect(Effect.ALLOW)
                            .resources(Arrays.asList("arn:*:iam::"
                                    + stageNameEnvironment.get(stage).getAccount() + ":role/*"))
                            .conditions(new HashMap<String, Object>() {{
                                    put("ForAnyValue:StringEquals", condition);
                            }})
                            .build());
        }
    }
}

The Deployment Stage

Let’s consider a pipeline that has a single deployment stage, UAT. The UAT stage deploys a DemoService. For that, it requires four actions: DemoService-UAT (Prepare and Deploy), ConfigureBlueGreenDeploy and Deploy.

When using CodeDeploy the deployment stage is expected to have four actions: two actions to create CloudFormation change set and deploy the ECS or compute infrastructure, an action to configure CodeDeploy and the last action that deploys the application using CodeDeploy. In the diagram, these are (in the diagram in the respective order): DemoService-UAT.Prepare and DemoService-UAT.Deploy, ConfigureBlueGreenDeploy and Deploy.

The
DemoService-UAT.Deploy action will create the ECS resources and the CodeDeploy application and deployment group. The
ConfigureBlueGreenDeploy action will read the AWS CDK
cloud assembly. It uses the configuration files to identify the Amazon Elastic Container Registry (Amazon ECR) repository and the container image tag pushed. The pipeline will send this information to the
Deploy action.  The
Deploy action starts the deployment using CodeDeploy.

Solution Overview

As a convenience, I created an application, written in Java, that solves all these challenges and can be used as an example. The application deployment follows the same 5 steps for all deployment scenarios of account and Region, and this includes the scenarios represented in the following design:

A pipeline created by a Toolchain should be able to deploy to any combination of accounts and regions. This includes four scenarios: single-account and single-Region, single-account and cross-Region, cross-account and single-Region and cross-account and cross-Region

Conclusion

In this post, I identified, explained and solved challenges associated with the creation of a pipeline that deploys a service to Amazon ECS using CodeDeploy in different combinations of accounts and regions. I also introduced a demo application that implements these recommendations. The sample code can be extended to implement more elaborate scenarios. These scenarios might include automated testing, automated deployment rollbacks, or disaster recovery. I wish you success in your transformative journey.

Luiz Decaro

Luiz is a Principal Solutions architect at Amazon Web Services (AWS). He focuses on helping customers from the Financial Services Industry succeed in the cloud. Luiz holds a master’s in software engineering and he triggered his first continuous deployment pipeline in 2005.

Integrating AWS WAF with your Amazon Lightsail instance

Post Syndicated from Macey Neff original https://aws.amazon.com/blogs/compute/integrating-aws-waf-with-your-amazon-lightsail-instance/

This blog post is written by Riaz Panjwani, Solutions Architect, Canada CSC and Dylan Souvage, Solutions Architect, Canada CSC.

Security is the top priority at AWS. This post shows how you can level up your application security posture on your Amazon Lightsail instances with an AWS Web Application Firewall (AWS WAF) integration. Amazon Lightsail offers easy-to-use virtual private server (VPS) instances and more at a cost-effective monthly price.

Lightsail provides security functionality built-in with every instance through the Lightsail Firewall. Lightsail Firewall is a network-level firewall that enables you to define rules for incoming traffic based on IP addresses, ports, and protocols. Developers looking to help protect against attacks such as SQL injection, cross-site scripting (XSS), and distributed denial of service (DDoS) can leverage AWS WAF on top of the Lightsail Firewall.

As of this post’s publishing, AWS WAF can only be deployed on Amazon CloudFront, Application Load Balancer (ALB), Amazon API Gateway, and AWS AppSync. However, Lightsail can’t directly act as a target for these services because Lightsail instances run within an AWS managed Amazon Virtual Private Cloud (Amazon VPC). By leveraging VPC peering, you can deploy the aforementioned services in front of your Lightsail instance, allowing you to integrate AWS WAF with your Lightsail instance.

Solution Overview

This post shows you two solutions to integrate AWS WAF with your Lightsail instance(s). The first uses AWS WAF attached to an Internet-facing ALB. The second uses AWS WAF attached to CloudFront. By following one of these two solutions, you can utilize rule sets provided in AWS WAF to secure your application running on Lightsail.

Solution 1: ALB and AWS WAF

This first solution uses VPC peering and ALB to allow you to use AWS WAF to secure your Lightsail instances. This section guides you through the steps of creating a Lightsail instance, configuring VPC peering, creating a security group, setting up a target group for your load balancer, and integrating AWS WAF with your load balancer.

AWS architecture diagram showing Amazon Lightsail integration with WAF using VPC peering across two separate VPCs. The Lightsail application is in a private subnet inside the managed VPC(vpc-b), with peering connection to your VPC(vpc-a) which has an ALB in a public subnet with WAF attached to it.

Creating the Lightsail Instance

For this walkthrough, you can utilize an AWS Free Tier Linux-based WordPress blueprint.

1. Navigate to the Lightsail console and create the instance.

2. Verify that your Lightsail instance is online and obtain its private IP, which you will need when configuring the Target Group later.

Screenshot of Lightsail console with a WordPress application set up showcasing the networking tab.

Attaching an ALB to your Lightsail instance

You must enable VPC peering as you will be utilizing an ALB in a separate VPC.

1. To enable VPC peering, navigate to your account in the top-right corner, select the Account dropdown, select Account, then select Advanced, and select Enable VPC Peering. Note the AWS Region being selected, as it is necessary later. For this example, select “us-east-2”. Screenshot of Lightsail console in the settings menu under the advanced section showcasing VPC peering.2. In the AWS Management Console, navigate to the VPC service in the search bar, select VPC Peering Connections and verify the created peering connection.

Screenshot of the AWS Console showing the VPC Peering Connections menu with an active peering connection.

3. In the left navigation pane, select Security groups, and create a Security group that allows HTTP traffic (port 80). This is used later to allow public HTTP traffic to the ALB.

4. Navigate to the Amazon Elastic Compute Cloud (Amazon EC2) service, and in the left pane under Load Balancing select Target Groups. Proceed to create a Target Group, choosing IP addresses as the target type.Screenshot of the AWS console setting up target groups with the IP address target type selected.

5. Proceed to the Register targets section, and select Other private IP address. Add the private IP address of the Lightsail instance that you created before. Select Include as Pending below and then Create target group (note that if your Lightsail instance is re-launched the target group must be updated as the private IP address may change).

6. In the left pane, select Load Balancers, select Create load balancers and choose Application Load Balancer. Ensure that you select the “Internet-facing” scheme, otherwise, you will not be able to connect to your instance over the internet.Screenshot of the AWS console setting up target groups with the IP address target type selected.

7. Select the VPC in which you want your ALB to reside. In this example, select the default VPC and all the Availability Zones (AZs) to make sure of the high availability of the load balancer.

8. Select the Security Group created in Step 3 to make sure that public Internet traffic can pass through the load balancer.

9. Select the target group under Listeners and routing to the target group you created earlier (in Step 5). Proceed to Create load balancer.Screenshot of the AWS console creating an ALB with the target group created earlier in the blog, selected as the listener.

10. Retrieve the DNS name from your load balancer again by navigating to the Load Balancers menu under the EC2 service.Screenshot of the AWS console with load balancer created.

11. Verify that you can access your Lightsail instance using the Load Balancer’s DNS by copying the DNS name into your browser.

Screenshot of basic WordPress app launched accessed via a web browser.

Integrating AWS WAF with your ALB

Now that you have ALB successfully routing to the Lightsail instance, you can restrict the instance to only accept traffic from the load balancer, and then create an AWS WAF web Access Control List (ACL).

1. Navigate back to the Lightsail service, select the Lightsail instance previously created, and select Networking. Delete all firewall rules that allow public access, and under IPv4 Firewall add a rule that restricts traffic to the IP CIDR range of the VPC of the previously created ALB.

Screenshot of the Lightsail console showing the IPv4 firewall.

2. Now you can integrate the AWS WAF to the ALB. In the Console, navigate to the AWS WAF console, or simply navigate to your load balancer’s integrations section, and select Create web ACL.

Screenshot of the AWS console showing the WAF configuration in the integrations tab of the ALB.

3. Choose Create a web ACL, and then select Add AWS resources to add the previously created ALB.Screenshot of creating and assigning a web ACL to the ALB.

4. Add any rules you want to your ACL, these rules will govern the traffic allowed or denied to your resources. In this example, you can add the WordPress application managed rules.Screenshot of adding the AWS WAF managed rule for WordPress applications.

5. Leave all other configurations as default and create the AWS WAF.

6. You can verify your firewall is attached to the ALB in the load balancer Integrations section.Screenshot of the AWS console showing the WAF integration detected in the integrations tab of the ALB.

Solution 2: CloudFront and AWS WAF

Now that you have set up ALB and VPC peering to your Lightsail instance, you can optionally choose to add CloudFront to the solution. This can be done by setting up a custom HTTP header rule in the Listener of your ALB, setting up the CloudFront distribution to use the ALB as an origin, and setting up an AWS WAF web ACL for your new CloudFront distribution. This configuration makes traffic limited to only accessing your application through CloudFront, and is still protected by WAF.AWS architecture diagram showing Amazon Lightsail integration with WAF using VPC peering across two separate VPCs. The Lightsail application is in a public subnet inside VPC-B, with peering connection to VPC-A which has an ALB in a private subnet fronted with CloudFront that has WAF attached.

1. Navigate to the CloudFront service, and click Create distribution.

2. Under Origin domain, select the load balancer that you had created previously.Screenshot of creating a distribution in CloudFront.

3. Scroll down to the Add custom header field, and click Add header.

4. Create your header name and value. Note the header name and value as you will need it later in the walkthrough.Screenshot of adding the custom header to your CloudFront distribution.

5. Scroll down to the Cache key and origin requests section. Under Cache policy, choose CachingDisabled.Screenshot of selecting the CachingDisabled cache policy inside the creation of the CloudFront distribution.

6. Scroll to the Web Application Firewall (WAF) section, and select Enable security protections.Screenshot of selecting “Enable security protections” inside the creation of the CloudFront distribution.

7. Leave all other configurations as default, and click Create distribution.

8. Wait until your CloudFront distribution has been deployed, and verify that you can access your Lightsail application using the DNS under Domain name.

Screenshot of the CloudFront distribution created with the status as enabled and the deployment finished.

9. Navigate to the EC2 service, and in the left pane under Load Balancing, select Load Balancers.

10. Select the load balancer you created previously, and under the Listeners tab, select the Listener you had created previously. Select Actions in the top right and then select Manage rules.Screenshot of the Listener section of the ALB with the Manage rules being selected.

11. Select the edit icon at the top, and then select the edit icon beside the Default rule.

Screenshot of the edit section inside managed rules.

12. Select the delete icon to delete the Default Action.

Screenshot of highlighting the delete button inside the edit rules section.

13. Choose Add action and then select Return fixed response.

Screenshot of adding a new rule “Return fixed response…”.

14. For Response code, enter 403, this will restrict access to CloudFront.

15. For Response body, enter “Access Denied”.

16. Select Update in the top right corner to update the Default rule.

Screenshot of the rule being successfully updated.

17. Select the insert icon at the top, then select Insert Rule.

Screenshot of inserting a new rule to the Listener.

18. Choose Add Condition, then select Http header. For Header type, enter the Header name, and then for Value enter the Header Value chosen previously.

19. Choose Add Action, then select Forward to and select the target group you had created in the previous section.

20. Choose Save at the top right corner to create the rule.

Screenshot of adding a new rule to the Listener, with the Http header selected as the custom-header and custom-value from the previous creation of the CloudFront distribution, with the Load Balancer selected as the target group.

21. Retrieve the DNS name from your load balancer again by navigating to the Load Balancers menu under the EC2 service.

22. Verify that you can no longer access your Lightsail application using the Load Balancer’s DNS.

Screenshot of the Lightsail application being accessed through the Load Balancer via a web browser with Access Denied being shown..

23. Navigate back to the CloudFront service and select the Distribution you had created. Under the General tab, select the Web ACL link under the AWS WAF section. Modify the Web ACL to leverage any managed or custom rule sets.

Screenshot of the CloudFront distribution focusing on the AWS WAF integration under the General tab Settings.

You have successfully integrated AWS WAF to your Lightsail instance! You can access your Lightsail instance via your CloudFront distribution domain name!

Clean Up Lightsail console resources

To start, you will delete your Lightsail instance.

  1. Sign in to the Lightsail console.
  2. For the instance you want to delete, choose the actions menu icon (⋮), then choose Delete.
  3. Choose Yes to confirm the deletion.

Next you will delete your provisioned static IP.

  1. Sign in to the Lightsail console.
  2. On the Lightsail home page, choose the Networking tab.
  3. On the Networking page choose the vertical ellipsis icon next to the static IP address that you want to delete, and then choose Delete.

Finally you will disable VPC peering.

  1. In the Lightsail console, choose Account on the navigation bar.
  2. Choose Advanced.
  3. In the VPC peering section, clear Enable VPC peering for all AWS Regions.

Clean Up AWS console resources

To start, you will delete your Load balancer.

  1. Navigate to the EC2 console, choose Load balancers on the navigation bar.
  2. Select the load balancer you created previously.
  3. Under Actions, select Delete load balancer.

Next, you will delete your target group.

  1. Navigate to the EC2 console, choose Target Groups on the navigation bar.
  2. Select the target group you created previously.
  3. Under Actions, select Delete.

Now you will delete your CloudFront distribution.

  1. Navigate to the CloudFront console, choose Distributions on the navigation bar.
  2. Select the distribution you created earlier and select Disable.
  3. Wait for the distribution to finish deploying.
  4. Select the same distribution after it is finished deploying and select Delete.

Finally, you will delete your WAF ACL.

  1. Navigate to the WAF console, and select Web ACLS on the navigation bar.
  2. Select the web ACL you created previously, and select Delete.

Conclusion

Adding AWS WAF to your Lightsail instance enhances the security of your application by providing a robust layer of protection against common web exploits and vulnerabilities. In this post, you learned how to add AWS WAF to your Lightsail instance through two methods: Using AWS WAF attached to an Internet-facing ALB and using AWS WAF attached to CloudFront.

Security is top priority at AWS and security is an ongoing effort. AWS strives to help you build and operate architectures that protect information, systems, and assets while delivering business value. To learn more about Lightsail security, check out the AWS documentation for Security in Amazon Lightsail.

Let’s Architect! Cost-optimizing AWS workloads

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-cost-optimizing-aws-workloads/

Every software component built by engineers and architects is designed with a purpose: to offer particular functionalities and, ultimately, contribute to the generation of business value. We should consider fundamental factors, such as the scalability of the software and the ease of evolution during times of business changes. However, performance and cost are important factors as well since they can impact the business profitability.

This edition of Let’s Architect! follows a similar series post from 2022, which discusses optimizing the cost of an architecture. Today, we focus on architectural patterns, services, and best practices to design cost-optimized cloud workloads. We also want to identify solutions, such as the use of Graviton processors, for increased performance at lower price. Cost optimization is a continuous process that requires the identification of the right tools for each job, as well as the adoption of efficient designs for your system.

AWS re:Invent 2022 – Manage and control your AWS costs

Govern cloud usage and avoid cost surprises without slowing down innovation within your organization. In this re:Invent 2022 session, you can learn how to set up guardrails and operationalize cost control within your organizations using services, such as AWS Budgets and AWS Cost Anomaly Detection, and explore the latest enhancements in the AWS cost control space. Additionally, Mercado Libre shares how they automate their cloud cost control through central management and automated algorithms.

Take me to this re:Invent 2022 video!

Work backwards from team needs to define/deploy cloud governance in AWS environments

Work backwards from team needs to define/deploy cloud governance in AWS environments

Compute optimization

When it comes to optimizing compute workloads, there are many tools available, such as AWS Compute Optimizer, Amazon EC2 Spot Instances, Amazon EC2 Reserved Instances, and Graviton instances. Modernizing your applications can also lead to cost savings, but you need to know how to use the right tools and techniques in an effective and efficient way.

For AWS Lambda functions, you can use the AWS Lambda Cost Optimization video to learn how to optimize your costs. The video covers topics, such as understanding and graphing performance versus cost, code optimization techniques, and avoiding idle wait time. If you are using Amazon Elastic Container Service (Amazon ECS) and AWS Fargate, you can watch a Twitch video on cost optimization using Amazon ECS and AWS Fargate to learn how to adjust your costs. The video covers topics like using spot instances, choosing the right instance type, and using Fargate Spot.

Finally, with Amazon Elastic Kubernetes Service (Amazon EKS), you can use Karpenter, an open-source Kubernetes cluster auto scaler to help optimize compute workloads. Karpenter can help you launch right-sized compute resources in response to changing application load, help you adopt spot and Graviton instances. To learn more about Karpenter, read the post How CoStar uses Karpenter to optimize their Amazon EKS Resources on the AWS Containers Blog.

Take me to Cost Optimization using Amazon ECS and AWS Fargate!
Take me to AWS Lambda Cost Optimization!
Take me to How CoStar uses Karpenter to optimize their Amazon EKS Resources!

Karpenter launches and terminates nodes to reduce infrastructure costs

Karpenter launches and terminates nodes to reduce infrastructure costs

AWS Lambda general guidance for cost optimization

AWS Lambda general guidance for cost optimization

AWS Graviton deep dive: The best price performance for AWS workloads

The choice of the hardware is a fundamental driver for performance, cost, as well as resource consumption of the systems we build. Graviton is a family of processors designed by AWS to support cloud-based workloads and give improvements in terms of performance and cost. This re:Invent 2022 presentation introduces Graviton and addresses the problems it can solve, how the underlying CPU architecture is designed, and how to get started with it. Furthermore, you can learn the journey to move different types of workloads to this architecture, such as containers, Java applications, and C applications.

Take me to this re:Invent 2022 video!

AWS Graviton processors are specifically designed by AWS for cloud workloads to deliver the best price performance

AWS Graviton processors are specifically designed by AWS for cloud workloads to deliver the best price performance

AWS Well-Architected Labs: Cost Optimization

The Cost Optimization section of the AWS Well Architected Workshop helps you learn how to optimize your AWS costs by using features, such as AWS Compute Optimizer, Spot Instances, and Reserved Instances. The workshop includes hands-on labs that walk you through the process of optimizing costs for different types of workloads and services, such as Amazon Elastic Compute Cloud, Amazon ECS, and Lambda.

Take me to this AWS Well-Architected lab!

Savings Plans is a flexible pricing model that can help reduce expenses compared with on-demand pricing

Savings Plans is a flexible pricing model that can help reduce expenses compared with on-demand pricing

See you next time!

Thanks for joining us to discuss cost optimization! In 2 weeks, we’ll talk about in-memory databases and caching systems.

To find all the blogs from this series, visit the Let’s Architect! list of content on the AWS Architecture Blog.

Let’s Architect! Security in software architectures

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-security-in-software-architectures/

Security is fundamental for each product and service you are building with. Whether you are working on the back-end or the data and machine learning components of a system, the solution should be securely built.

In 2022, we discussed security in our post Let’s Architect! Architecting for Security. Today, we take a closer look at general security practices for your cloud workloads to secure both networks and applications, with a mix of resources to show you how to architect for security using the services offered by Amazon Web Services (AWS).

In this edition of Let’s Architect!, we share some practices for protecting your workloads from the most common attacks, introduce the Zero Trust principle (you can learn how AWS itself is implementing it!), plus how to move to containers and/or alternative approaches for managing your secrets.

A deep dive on the current security threat landscape with AWS

This session from AWS re:Invent, security engineers guide you through the most common threat vectors and vulnerabilities that AWS customers faced in 2022. For each possible threat, you can learn how it’s implemented by attackers, the weaknesses attackers tend to leverage, and the solutions offered by AWS to avert these security issues. We describe this as fundamental architecting for security: this implies adopting suitable services to protect your workloads, as well as follow architectural practices for security.

Take me to this re:Invent 2022 session!

Statistics about common attacks and how they can be launched

Statistics about common attacks and how they can be launched

Zero Trust: Enough talk, let’s build better security

What is Zero Trust? It is a security model that produces higher security outcomes compared with the traditional network perimeter model.

How does Zero Trust work in practice, and how can you start adopting it? This AWS re:Invent 2022 session defines the Zero Trust models and explains how to implement one. You can learn how it is used within AWS, as well as how any architecture can be built with these pillars in mind. Furthermore, there is a practical use case to show you how Delphix put Zero Trust into production.

Take me to this re:Invent 2022 session!

AWS implements the Zero Trust principle for managing interactions across different services

AWS implements the Zero Trust principle for managing interactions across different services

A deep dive into container security on AWS

Nowadays, it’s vital to have a thorough understanding of a container’s underlying security layers. AWS services, like Amazon Elastic Kubernetes Service and Amazon Elastic Container Service, have harnessed these Linux security-layer protections, keeping a sharp focus on the principle of least privilege. This approach significantly minimizes the potential attack surface by limiting the permissions and privileges of processes, thus upholding the integrity of the system.

This re:Inforce 2023 session discusses best practices for securing containers for your distributed systems.

Take me to this re:Inforce 2023 session!

Fundamentals and best practices to secure containers

Fundamentals and best practices to secure containers

Migrating your secrets to AWS Secrets Manager

Secrets play a critical role in providing access to confidential systems and resources. Ensuring the secure and consistent management of these secrets, however, presents a challenge for many organizations.

Anti-patterns observed in numerous organizational secrets management systems include sharing plaintext secrets via unsecured means, such as emails or messaging apps, which can allow application developers to view these secrets in plaintext or even neglect to rotate secrets regularly. This detailed guidance walks you through the steps of discovering and classifying secrets, plus explains the implementation and migration processes involved in transferring secrets to AWS Secrets Manager.

Take me to this AWS Security Blog post!

An organization's perspectives and responsibilities when building a secrets management solution

An organization’s perspectives and responsibilities when building a secrets management solution

Conclusion

We’re glad you joined our conversation on building secure architectures! Join us in a couple of weeks when we’ll talk about cost optimization on AWS.

To find all the blogs from this series, visit the Let’s Architect! list of content on the AWS Architecture Blog.

AWS Fargate Enables Faster Container Startup using Seekable OCI

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/aws-fargate-enables-faster-container-startup-using-seekable-oci/

While developing with containers is becoming an increasingly popular way for deploying and scaling applications, there are still areas where improvements can be made. One of the main issues with scaling containerized applications is the long startup time, especially during scale up when newer instances need to be added. This issue can have a negative impact on the customer experience, for example when a website needs to scale out to serve additional traffic.

A research paper shows that container image downloads account for 76 percent of container startup time, but on average only 6.4 percent of the data is needed for the container to start doing useful work. Starting and scaling out containerized applications requires downloading container images from a remote container registry. This may introduce a non-trivial latency, as the entire image must be downloaded and unpacked before the applications can be started.

One solution to this problem is lazy loading (also known as asynchronous loading) container images. This approach downloads data from the container registry in parallel with the application startup, such as stargz-snapshotter, a project that aims to improve the overall container start time.

Last year, we introduced Seekable OCI (SOCI), a technology open sourced by Amazon Web Services (AWS) that enables container runtimes to implement lazy loading the container image to start applications faster without modifying the container images. As part of that effort, we open sourced SOCI Snapshotter, a snapshotter plugin that enables lazy loading with SOCI in containerd.

AWS Fargate Support for SOCI
Today, I’m excited to share that AWS Fargate now supports Seekable OCI (SOCI), which helps applications deploy and scale out faster by enabling containers to start without waiting to download the entire container image. At launch, this new capability is available for Amazon Elastic Container Service (Amazon ECS) applications running on AWS Fargate.

Here’s a quick look to show how AWS Fargate support for SOCI works:

SOCI works by creating an index (SOCI index) of the files within an existing container image. This index is a key enabler to launching containers faster, providing the capability to extract an individual file from a container image without having to download the entire image. Your applications no longer need to wait to complete pulling and unpacking a container image before your applications start running. This allows you to deploy and scale out applications more quickly and reduce the rollout time for application updates.

A SOCI index is generated and stored separately from the container images. This means that your container images don’t need to be converted to use SOCI, therefore not breaking secure hash algorithm (SHA)-based security, such as container image signing. The index is then stored in the registry alongside the container image. At release, AWS Fargate support for SOCI works with Amazon Elastic Container Registry (Amazon ECR).

When you use Amazon ECS with AWS Fargate to run your SOCI-indexed containerized images, AWS Fargate automatically detects if a SOCI index for the image exists and starts the container without waiting for the entire image to be pulled. This also means that AWS Fargate will still continue to run container images that don’t have SOCI indexes.

Let’s Get Started
There are two ways to create SOCI indexes for container images.

  • Use AWS SOCI Index BuilderAWS SOCI Index Builder is a serverless solution for indexing container images in the AWS Cloud. This AWS CloudFormation stack deploys an Amazon EventBridge rule to identify Amazon ECR action events and invoke an AWS Lambda function to match the defined filter. Then, another AWS Lambda function generates and pushes SOCI indexes to repositories in the Amazon ECR registry.
  • Create SOCI indexes manually – This approach provides more flexibility on in how the SOCI indexes are created, including for existing container images in Amazon ECR repositories. To create SOCI indexes, you can use the soci CLI provided by the soci-snapshotter project.

The AWS SOCI Index Builder provides you with an automated process to get started and build SOCI indexes for your container images. The sociCLI provides you with more flexibility around index generation and the ability to natively integrate index generation in your CI/CD pipelines.

In this article, I manually generate SOCI indexes using the soci CLI from the soci-snapshotter project.

Create a Repository and Push Container Images
First, I create an Amazon ECR repository called pytorch-socifor my container image using AWS CLI.

$ aws ecr create-repository --region us-east-1 --repository-name pytorch-soci

I keep the Amazon ECR URI output and define it as a variable to make it easier for me to refer to the repository in the next step.

$ ECRSOCIURI=xyz.dkr.ecr.us-east-1.amazonaws.com/pytorch-soci:latest

For the sample application, I use a PyTorch training (CPU-based) container image from AWS Deep Learning Containers. I use the nerdctl CLI to pull the container image because, by default, the Docker Engine stores the container image in the Docker Engine image store, not the containerd image store.

$ SAMPLE_IMAGE="763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.5.1-cpu-py36-ubuntu16.04" 
$ aws ecr get-login-password --region us-east-1 | sudo nerdctl login --username AWS --password-stdin xyz.dkr.ecr.ap-southeast-1.amazonaws.com
$ sudo nerdctl pull --platform linux/amd64 $SAMPLE_IMAGE

Then, I tag the container image for the repository that I created in the previous step.

$ sudo nerdctl tag $SAMPLE_IMAGE $ECRSOCIURI

Next, I need to push the container image into the ECR repository.

$ sudo nerdctl push $ECRSOCIURI

At this point, my container image is already in my Amazon ECR repository.

Create SOCI Indexes
Next, I need to create SOCI index.

A SOCI index is an artifact that enables lazy loading of container images. A SOCI index consists of 1) a SOCI index manifest and 2) a set of zTOCs. The following image illustrates the components in a SOCI index manifest, and how it refers to a container image manifest.

The SOCI index manifest contains the list of zTOCs and a reference to the image for which the manifest was generated. A zTOC, or table of contents for compressed data, consists of two parts:

  1. TOC, a table of contents containing file metadata and the corresponding offset in the decompressed TAR archive.
  2. zInfo, a collection of checkpoints representing the state of the compression engine at various points in the layer.

To learn more about the concept and term, please visit soci-snapshotter Terminology page.

Before I can create SOCI indexes, I need to install the sociCLI. To learn more about how to install the soci, visit Getting Started with soci-snapshotter.

To create SOCI indexes, I use the soci create command.

$ sudo soci create $ECRSOCIURI
layer sha256:4c6ec688ebe374ea7d89ce967576d221a177ebd2c02ca9f053197f954102e30b -> ztoc skipped
layer sha256:ab09082b308205f9bf973c4b887132374f34ec64b923deef7e2f7ea1a34c1dad -> ztoc skipped
layer sha256:cd413555f0d1643e96fe0d4da7f5ed5e8dc9c6004b0731a0a810acab381d8c61 -> ztoc skipped
layer sha256:eee85b8a173b8fde0e319d42ae4adb7990ed2a0ce97ca5563cf85f529879a301 -> ztoc skipped
layer sha256:3a1b659108d7aaa52a58355c7f5704fcd6ab1b348ec9b61da925f3c3affa7efc -> ztoc skipped
layer sha256:d8f520dcac6d926130409c7b3a8f77aea639642ba1347359aaf81a8b43ce1f99 -> ztoc skipped
layer sha256:d75d26599d366ecd2aa1bfa72926948ce821815f89604b6a0a49cfca100570a0 -> ztoc skipped
layer sha256:a429d26ed72a85a6588f4b2af0049ae75761dac1bb8ba8017b8830878fb51124 -> ztoc skipped
layer sha256:5bebf55933a382e053394e285accaecb1dec9e215a5c7da0b9962a2d09a579bc -> ztoc skipped
layer sha256:5dfa26c6b9c9d1ccbcb1eaa65befa376805d9324174ac580ca76fdedc3575f54 -> ztoc skipped
layer sha256:0ba7bf18aa406cb7dc372ac732de222b04d1c824ff1705d8900831c3d1361ff5 -> ztoc skipped
layer sha256:4007a89234b4f56c03e6831dc220550d2e5fba935d9f5f5bcea64857ac4f4888 -> ztoc sha256:0b4d78c856b7e9e3d507ac6ba64e2e2468997639608ef43c088637f379bb47e4
layer sha256:089632f60d8cfe243c5bc355a77401c9a8d2f415d730f00f6f91d44bb96c251b -> ztoc sha256:f6a16d3d07326fe3bddbdb1aab5fbd4e924ec357b4292a6933158cc7cc33605b
layer sha256:f18dd99041c3095ade3d5013a61a00eeab8b878ba9be8545c2eabfbca3f3a7f3 -> ztoc sha256:95d7966c964dabb54cb110a1a8373d7b88cfc479336d473f6ba0f275afa629dd
layer sha256:69e1edcfbd217582677d4636de8be2a25a24775469d677664c8714ed64f557c3 -> ztoc sha256:ac0e18bd39d398917942c4b87ac75b90240df1e5cb13999869158877b400b865

From the above output, I can see that sociCLI created zTOCs for four layers, which and this means only these four layers will be lazily pulled and the other container image layers will be downloaded in full before the container image starts. This is because there is less of a launch time impact in lazy loading very small container image layers. However, you can configure this behavior using the --min-layer-size flag when you run soci create.

Verify and Push SOCI Indexes
The soci CLI also provides several commands that can help you to review the SOCI Indexes that have been generated.

To see a list of all index manifests, I can run the following command.

$ sudo soci index list

DIGEST                                                                     SIZE    IMAGE REF                                                                                   PLATFORM       MEDIA TYPE                                    CREATED
sha256:ea5c3489622d4e97d4ad5e300c8482c3d30b2be44a12c68779776014b15c5822    1931    xyz.dkr.ecr.us-east-1.amazonaws.com/pytorch-soci:latest                                     linux/amd64    application/vnd.oci.image.manifest.v1+json    10m4s ago
sha256:ea5c3489622d4e97d4ad5e300c8482c3d30b2be44a12c68779776014b15c5822    1931    763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.5.1-cpu-py36-ubuntu16.04    linux/amd64    application/vnd.oci.image.manifest.v1+json    10m4s ago

While optional, if I need to see the list of zTOC, I can use the following command.

$ sudo soci ztoc list
DIGEST                                                                     SIZE        LAYER DIGEST
sha256:0b4d78c856b7e9e3d507ac6ba64e2e2468997639608ef43c088637f379bb47e4    2038072     sha256:4007a89234b4f56c03e6831dc220550d2e5fba935d9f5f5bcea64857ac4f4888
sha256:95d7966c964dabb54cb110a1a8373d7b88cfc479336d473f6ba0f275afa629dd    11442416    sha256:f18dd99041c3095ade3d5013a61a00eeab8b878ba9be8545c2eabfbca3f3a7f3
sha256:ac0e18bd39d398917942c4b87ac75b90240df1e5cb13999869158877b400b865    36277264    sha256:69e1edcfbd217582677d4636de8be2a25a24775469d677664c8714ed64f557c3
sha256:f6a16d3d07326fe3bddbdb1aab5fbd4e924ec357b4292a6933158cc7cc33605b    10152696    sha256:089632f60d8cfe243c5bc355a77401c9a8d2f415d730f00f6f91d44bb96c251b

This series of zTOCs contains all of the information that SOCI needs to find a given file in a layer. To review the zTOC for each layer, I can use one of the digest sums from the preceding output and use the following command.

$ sudo soci ztoc info sha256:0b4d78c856b7e9e3d507ac6ba64e2e2468997639608ef43c088637f379bb47e4
{
  "version": "0.9",
  "build_tool": "AWS SOCI CLI v0.1",
  "size": 2038072,
  "span_size": 4194304,
  "num_spans": 33,
  "num_files": 5552,
  "num_multi_span_files": 26,
  "files": [
    {
      "filename": "bin/",
      "offset": 512,
      "size": 0,
      "type": "dir",
      "start_span": 0,
      "end_span": 0
    },
    {
      "filename": "bin/bash",
      "offset": 1024,
      "size": 1037528,
      "type": "reg",
      "start_span": 0,
      "end_span": 0
    }

---Trimmed for brevity---

Now, I need to use the following command to push all SOCI-related artifacts into the Amazon ECR.

$ PASSWORD=$(aws ecr get-login-password --region us-east-1)
$ sudo soci push --user AWS:$PASSWORD $ECRSOCIURI

If I go to my Amazon ECR repository, I can verify the index is created. Here, I can see that two additional objects are listed alongside my container image: a SOCI Index and an Image index. The image index allows AWS Fargate to look up SOCI indexes associated with my container image.

Understanding SOCI Performance
The main objective of SOCI is to minimize the required time to start containerized applications. To measure the performance of AWS Fargate lazy loading container images using SOCI, I need to understand how long it takes for my container images to start with SOCI and without SOCI.

To understand the duration needed for each container image to start, I can use metrics available from the DescribeTasks API on Amazon ECS. The first metric is createdAt, the timestamp for the time when the task was created and entered the PENDING state. The second metric is startedAt, the time when the task transitioned from the PENDING state to the RUNNING state.

For this, I have created another Amazon ECR repository using the same container image but without generating a SOCI index, called pytorch-without-soci. If I compare these container images, I have two additional objects in pytorch-soci(an image index and a SOCI index) that don’t exist in pytorch-without-soci.

Deploy and Run Applications
To run the applications, I have created an Amazon ECS cluster called demo-pytorch-soci-cluster, a VPC and the required ECS task execution role. If you’re new to Amazon ECS, you can follow Getting started with Amazon ECS to be more familiar with how to deploy and run your containerized applications.

Now, let’s deploy and run both the container images with FARGATE as the launch type. I define five tasks for each pytorch-sociand pytorch-without-soci.

$ aws ecs \ 
    --region us-east-1 \ 
    run-task \ 
    --count 5 \ 
    --launch-type FARGATE \ 
    --task-definition arn:aws:ecs:us-east-1:XYZ:task-definition/pytorch-soci \ 
    --cluster socidemo 

$ aws ecs \ 
    --region us-east-1 \ 
    run-task \ 
    --count 5 \ 
    --launch-type FARGATE \ 
    --task-definition arn:aws:ecs:us-east-1:XYZ:task-definition/pytorch-without-soci \ 
    --cluster socidemo

After a few minutes, there are 10 running tasks on my ECS cluster.

After verifying that all my tasks are running, I run the following script to get two metrics: createdAt and startedAt.

#!/bin/bash
CLUSTER=<CLUSTER_NAME>
TASKDEF=<TASK_DEFINITION>
REGION="us-east-1"
TASKS=$(aws ecs list-tasks \
    --cluster $CLUSTER \
    --family $TASKDEF \
    --region $REGION \
    --query 'taskArns[*]' \
    --output text)

aws ecs describe-tasks \
    --tasks $TASKS \
    --region $REGION \
    --cluster $CLUSTER \
    --query "tasks[] | reverse(sort_by(@, &createdAt)) | [].[{startedAt: startedAt, createdAt: createdAt, taskArn: taskArn}]" \
    --output table

Running the above command for the container image without SOCI indexes — pytorch-without-soci— produces following output:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                                                                                   DescribeTasks                                                                                   |
+----------------------------------+-----------------------------------+------------------------------------------------------------------------------------------------------------+
|             createdAt            |             startedAt             |                                                  taskArn                                                   |
+----------------------------------+-----------------------------------+------------------------------------------------------------------------------------------------------------+
|  2023-07-07T17:43:59.233000+00:00|  2023-07-07T17:46:09.856000+00:00 |  arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/dcdf19b6e66444aeb3bc607a3114fae0   |
|  2023-07-07T17:43:59.233000+00:00|  2023-07-07T17:46:09.459000+00:00 |  arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/9178b75c98ee4c4e8d9c681ddb26f2ca   |
|  2023-07-07T17:43:59.233000+00:00|  2023-07-07T17:46:21.645000+00:00 |  arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/7da51e036c414cbab7690409ce08cc99   |
|  2023-07-07T17:43:59.233000+00:00|  2023-07-07T17:46:00.606000+00:00 |  arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/5ee8f48194874e6dbba75a5ef753cad2   |
|  2023-07-07T17:43:59.233000+00:00|  2023-07-07T17:46:02.461000+00:00 |  arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/58531a9e94ed44deb5377fa997caec36   |
+----------------------------------+-----------------------------------+------------------------------------------------------------------------------------------------------------+

From the average aggregated delta time (between startedAt and createdAt) for each task, the pytorch-without-soci (without SOCI indexes) successfully ran after 129 seconds.

Next, I’m running same command but for pytorch-sociwhich comes with SOCI indexes.

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                                                                                   DescribeTasks                                                                                   |
+----------------------------------+-----------------------------------+------------------------------------------------------------------------------------------------------------+
|             createdAt            |             startedAt             |                                                  taskArn                                                   |
+----------------------------------+-----------------------------------+------------------------------------------------------------------------------------------------------------+
|  2023-07-07T17:43:53.318000+00:00|  2023-07-07T17:44:51.076000+00:00 |  arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/c57d8cff6033494b97f6fd0e1b797b8f   |
|  2023-07-07T17:43:53.318000+00:00|  2023-07-07T17:44:52.212000+00:00 |  arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/6d168f9e99324a59bd6e28de36289456   |
|  2023-07-07T17:43:53.318000+00:00|  2023-07-07T17:45:05.443000+00:00 |  arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/4bdc43b4c1f84f8d9d40dbd1a41645da   |
|  2023-07-07T17:43:53.318000+00:00|  2023-07-07T17:44:50.618000+00:00 |  arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/43ea53ea84154d5aa90f8fdd7414c6df   |
|  2023-07-07T17:43:53.318000+00:00|  2023-07-07T17:44:50.777000+00:00 |  arn:aws:ecs:ap-southeast-1:xyz:task/demo-pytorch-soci-cluster/0731bea30d42449e9006a5d8902756d5   |
+----------------------------------+-----------------------------------+------------------------------------------------------------------------------------------------------------+

Here, I see my container image with SOCI-enabled — pytorch-soci — was started 60 seconds after being created.

This means that running my sample application with SOCI indexes on AWS Fargate is approximately 50 percent faster compared to running without SOCI indexes.

It’s recommended to benchmark the startup and scaling-out time of your application with and without SOCI. This helps you to have a better understanding of how your application behaves and if your applications benefit from AWS Fargate support for SOCI.

Customer Voices
During the private preview period, we heard lots of feedback from our customers about AWS Fargate support for SOCI. Here’s what our customers say:

Autodesk provides critical design, make, and operate software solutions across the architecture, engineering, construction, manufacturing, media, and entertainment industries. “SOCI has given us a 50% improvement in startup performance for our time-sensitive simulation workloads running on Amazon ECS with AWS Fargate. This allows our application to scale out faster, enabling us to quickly serve increased user demand and save on costs by reducing idle compute capacity. The AWS Partner Solution for creating the SOCI index is easy to configure and deploy.” – Boaz Brudner, Head of Innovyze SaaS Engineering, AI and Architecture, Autodesk.

Flywire is a global payments enablement and software company, on a mission to deliver the world’s most important and complex payments. “We run multi-step deployment pipelines on Amazon ECS with AWS Fargate which can take several minutes to complete. With SOCI, the total pipeline duration is reduced by over 50% without making any changes to our applications, or the deployment process. This allowed us to drastically reduce the rollout time for our application updates. For some of our larger images of over 750MB, SOCI improved the task startup time by more than 60%.”, Samuel Burgos, Sr. Cloud Security Engineer, Flywire.

Virtuoso is a leading software corporation that makes functional UI and end-to-end testing software. “SOCI has helped us reduce the lag between demand and availability of compute. We have very bursty workloads which our customers expect to start as fast as possible. SOCI helps our ECS tasks spin-up 40% faster, allowing us to quickly scale our application and reduce the pool of idle compute capacity, enabling us to deliver value more efficiently. Setting up SOCI was really easy. We opted to use the quick-start AWS Partner’s solution with which we could leave our build and deployment pipelines untouched.”, Mathew Hall, Head of Site Reliability Engineering, Virtuoso.

Things to Know
Availability — AWS Fargate support for SOCI is available in all AWS Regions where Amazon ECS, AWS Fargate, and Amazon ECR are available.

Pricing — AWS Fargate support for SOCI is available at no additional cost and you will only be charged for storing the SOCI indexes in Amazon ECR.

Get Started — Learn more about benefits and how to get started on the AWS Fargate Support for SOCI page.

Happy building.
Donnie