Tag Archives: AWS Trusted Advisor

AWS Weekly Roundup – EC2 DL2q instances, PartyRock, Amplify’s 6th birthday, and more – November 20, 2023

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-ec2-dl2q-instances-partyrock-amplifys-6th-birthday-and-more-november-20-2023/

Last week I saw an astonishing 160+ new service launches. There were so many updates that we decided to publish a weekly roundup again. This continues the same innovative pace of the previous week as we are getting closer to AWS re:Invent 2023.

Our News Blog team is also finalizing new blog posts for re:Invent to introduce awesome launches with service teams for your reading pleasure. Jeff Barr shared The Road to AWS re:Invent 2023 to explain our blogging journey and process. Please stay tuned in the next week!

Last week’s launches
Here are some of the launches that caught my attention last week:

Amazon EC2 DL2q instances – New DL2q instances are powered by Qualcomm AI 100 Standard accelerators and are the first to feature Qualcomm’s AI technology in the public cloud. With eight Qualcomm AI 100 Standard accelerators and 128 GiB of total accelerator memory, you can run popular generative artificial intelligence (AI) applications and extend to edge devices across smartphones, autonomous driving, personal compute, and extended reality headsets to develop and validate these AI workloads before deploying.

PartyRock for Amazon Bedrock – We introduced PartyRock, a fun and intuitive hands-on, generative AI app-building playground powered by Amazon Bedrock. You can experiment, learn all about prompt engineering, build mini-apps, and share them with your friends—all without writing any code or creating an AWS account.

You also can now access the Meta Llama 2 Chat 13B foundation model and Cohere Command Light, Embed English, and multilingual models for Amazon Bedrock.

AWS Amplify celebrates its sixth birthday – We announced six new launches; a new documentation site, support for Next.js 14 with our hosting and JavaScript library, added custom token providers and an automatic React Native social sign-in update to Amplify Auth, new ChangePassword and DeleteUser account settings components, and updated all Amplify UI packages to use new Amplify JavaScript v6. You can also use wildcard subdomains when using a custom domain with your Amplify application deployed to AWS Amplify Hosting.

Amplify docs site UI

Also check out other News Blog posts about major launches published in the past week:

Other AWS service launches
Here are some other bundled feature launches per AWS service:

Amazon Athena  – You can use a new cost-based optimizer (CBO) to enhance query performance based on table and column statistics, collected by AWS Glue Data Catalog and Athena JDBC 3.x driver, a new alternative that supports almost all authentication plugins. You can also use Amazon EMR Studio to develop and run interactive queries on Amazon Athena.

Amazon CloudWatch – You can use a new CloudWatch metric called EBS Stalled I/O Check to monitor the health of your Amazon EBS volumes, the regular expression for Amazon CloudWatch Logs Live Tail filter pattern syntax to search and match relevant log events, observability of SAP Sybase ASE database in CloudWatch Application Insights, and up to two stats commands in a Log Insights query to perform aggregations on the results.

Amazon CodeCatalyst – You can connect to a Amazon Virtual Private Cloud (Amazon VPC) from CodeCatalyst Workflows, provision infrastructure using Terraform within CodeCatalyst Workflows, access CodeCatalyst with your workforce identities configured in IAM Identity Center, and create teams made up of members of the CodeCatalyst space.

Amazon Connect – You can use a pre-built queue performance dashboard and Contact Lens conversational analytics dashboard to view and compare real-time and historical aggregated queue performance. You can use quick responses for chats, previously written formats such as typing in ‘/#greet’ to insert a personalized response, and scanning attachments to detect malware or other unwanted content.

AWS Glue – AWS Glue for Apache Spark added new six database connectors: Teradata, SAP HANA, Azure SQL, Azure Cosmos DB, Vertica, and MongoDB, as well as the native connectivity to Amazon OpenSearch Service.

AWS Lambda – You can see single pane view of metrics, logs, and traces in the AWS Lambda console and advanced logging controls to natively capture logs in JSON structured format. You can view the SAM template on the Lambda console and export the function’s configuration to AWS Application Composer. AWS Lambda also supports Java 21 and NodeJS 20 versions built on the new Amazon Linux 2023 runtime.

AWS Local Zones in Dallas – You can enable the new Local Zone in Dallas, Texas, us-east-1-dfw-2a, with Amazon EC2 C6i, M6i, R6i, C6gn, and M6g instances and Amazon EBS volume types gp2, gp3, io1, sc1, and st1. You can also access Amazon ECS, Amazon EKS, Application Load Balancer, and AWS Direct Connect in this new Local Zone to support a broad set of workloads at the edge.

Amazon Managed Streaming for Apache Kafka (Amazon MSK) – You can standardize access control to Kafka resources using AWS Identity and Access Management (IAM) and build Kafka clients for Amazon MSK Serverless written in all programming languages. These are open source client helper libraries and code samples for popular languages, including Java, Python, Go, and JavaScript. Also, Amazon MSK now supports an enhanced version of Apache Kafka 3.6.0 that offers generally available Tiered Storage and automatically sends you storage capacity alerts when you are at risk of exhausting your storage.

Amazon OpenSearch Service Ingestion – You can migrate your data from Elasticsearch version 7.x clusters to the latest versions of Amazon OpenSearch Service and use persistent buffering to protect the durability of incoming data.

Amazon RDS –Amazon RDS for MySQL now supports creating active-active clusters using the Group Replication plugin, upgrading MySQL 5.7 snapshots to MySQL 8.0, and Innovation Release version of MySQL 8.1.

Amazon RDS Custom for SQL Server extends point-in-time recovery support for up to 1,000 databases, supports Service Master Key Retention to use transparent data encryption (TDE), table- and column-level encryption, DBMail and linked servers, and use SQL Server Developer edition with the bring your own media (BYOM).

Additionally, Amazon RDS Multi-AZ deployments with two readable standbys now supports minor version upgrades and system maintenance updates with typically less than one second of downtime when using Amazon RDS Proxy.

AWS Partner Central – You can use an improved user experience in AWS Partner Central to build and promote your offerings and the new Investments tab in the Partner Analytics Dashboard to gain actionable insights. You can now link accounts and associated users between Partner Central and AWS Marketplace and use an enhanced co-sell experience with APN Customer Engagements (ACE) manager.

Amazon QuickSight – You can programmatically manage user access and custom permissions support for roles to restrict QuickSight functionality to the QuickSight account for IAM Identity Center and Active Directory using APIs. You can also use shared restricted folders, a Contributor role and support for data source asset types in folders and the Custom Week Start feature, an addition designed to enhance the data analysis experience for customers across diverse industries and social contexts.

AWS Trusted Advisor – You can use new APIs to programmatically access Trusted Advisor best practices checks, recommendations, and prioritized recommendations and 37 new Amazon RDS checks that provide best practices guidance by analyzing DB instance configuration, usage, and performance data.

There’s a lot more launch news that I haven’t covered. See AWS What’s New for more details.

See you virtually in AWS re:Invent
AWS re:Invent 2023Next week we’ll hear the latest from AWS, learn from experts, and connect with the global cloud community in Las Vegas. If you come, check out the agenda, session catalog, and attendee guides before your departure.

If you’re not able to attend re:Invent in person this year, we’re offering the option to livestream our Keynotes and Innovation Talks. With the registration for online pass, you will have access to on-demand keynote, Innovation Talks, and selected breakout sessions after the event.

Channy

Managing AWS Lambda runtime upgrades

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/managing-aws-lambda-runtime-upgrades/

This post is written by Julian Wood, Principal Developer Advocate, and Dan Fox, Principal Specialist Serverless Solutions Architect.

AWS Lambda supports multiple programming languages through the use of runtimes. A Lambda runtime provides a language-specific execution environment, which provides the OS, language support, and additional settings, such as environment variables and certificates that you can access from your function code.

You can use managed runtimes that Lambda provides or build your own. Each major programming language release has a separate managed runtime, with a unique runtime identifier, such as python3.11 or nodejs20.x.

Lambda automatically applies patches and security updates to all managed runtimes and their corresponding container base images. Automatic runtime patching is one of the features customers love most about Lambda. When these patches are no longer available, Lambda ends support for the runtime. Over the next few months, Lambda is deprecating a number of popular runtimes, triggered by end of life of upstream language versions and of Amazon Linux 1.

Runtime Deprecation
Node.js 14 Nov 27, 2023
Node.js 16 Mar 11, 2024
Python 3.7 Nov 27, 2023
Java 8 (Amazon Linux 1) Dec 31, 2023
Go 1.x Dec 31, 2023
Ruby 2.7 Dec 07, 2023
Custom Runtime (provided) Dec 31, 2023

Runtime deprecation is not unique to Lambda. You must upgrade code using Python 3.7 or Node.js 14 when those language versions reach end of life, regardless of which compute service your code is running on. Lambda can help make this easier by tracking which runtimes you are using and providing deprecation notifications.

This post contains considerations and best practices for managing runtime deprecations and upgrades when using Lambda. Adopting these techniques makes managing runtime upgrades easier, especially when working with a large number of functions.

Specifying Lambda runtimes

When you deploy your function as a .zip file archive, you choose a runtime when you create the function. To change the runtime, you can update your function’s configuration.

Lambda keeps each managed runtime up to date by taking on the operational burden of patching the runtimes with security updates, bug fixes, new features, performance enhancements, and support for minor version releases. These runtime updates are published as runtime versions. Lambda applies runtime updates to functions by migrating the function from an earlier runtime version to a new runtime version.

You can control how your functions receive these updates using runtime management controls. Runtime versions and runtime updates apply to patch updates for a given Lambda runtime. Lambda does not automatically upgrade functions between major language runtime versions, for example, from nodejs14.x to nodejs18.x.

For a function defined as a container image, you choose a runtime and the Linux distribution when you create the container image. Most customers start with one of the Lambda base container images, although you can also build your own images from scratch. To change the runtime, you create a new container image from a different base container image.

Why does Lambda deprecate runtimes?

Lambda deprecates a runtime when upstream runtime language maintainers mark their language end-of-life or security updates are no longer available.

In almost all cases, the end-of-life date of a language version or operating system is published well in advance. The Lambda runtime deprecation policy gives end-of-life schedules for each language that Lambda supports. Lambda notifies you by email and via your Personal Health Dashboard if you are using a runtime that is scheduled for deprecation.

Lambda runtime deprecation happens in several stages. Lambda first blocks creating new functions that use a given runtime. Lambda later also blocks updating existing functions using the unsupported runtime, except to update to a supported runtime. Lambda does not block invocations of functions that use a deprecated runtime. Function invocations continue indefinitely after the runtime reaches end of support.

Lambda is extending the deprecation notification period from 60 days before deprecation to 180 days. Previously, blocking new function creation happened at deprecation and blocking updates to existing functions 30 days later. Blocking creation of new functions now happens 30 days after deprecation, and blocking updates to existing functions 60 days after.

Lambda occasionally delays deprecation of a Lambda runtime for a limited period beyond the end of support date of the language version that the runtime supports. During this period, Lambda only applies security patches to the runtime OS. Lambda doesn’t apply security patches to programming language runtimes after they reach their end of support date.

Can Lambda automatically upgrade my runtime?

Moving from one major version of the language runtime to another has a significant risk of being a breaking change. Some libraries and dependencies within a language have deprecation schedules and do not support versions of a language past a certain point. Moving functions to new runtimes could potentially impact large-scale production workloads that customers depend on.

Since Lambda cannot guarantee backward compatibility between major language versions, upgrading the Lambda runtime used by a function is a customer-driven operation.

Lambda function versions

You can use function versions to manage the deployment of your functions. In Lambda, you make code and configuration changes to the default function version, which is called $LATEST. When you publish a function version, Lambda takes a snapshot of the code, runtime, and function configuration to maintain a consistent experience for users of that function version. When you invoke a function, you can specify the version to use or invoke the $LATEST version. Lambda function versions are required when using Provisioned Concurrency or SnapStart.

Some developers use an auto-versioning process by creating a new function version each time they deploy a change. This results in many versions of a function, with only a single version actually in use.

While Lambda applies runtime updates to published function versions, you cannot update the runtime major version for a published function version, for example from Node.js 16 to Node.js 20. To update the runtime for a function, you must update the $LATEST version, then create a new published function version if necessary. This means that different versions of a function can use different runtimes. The following shows the same function with version 1 using Node.js 14.x and version 2 using Node.js 18.x.

Version 1 using Node.js 14.x

Version 1 using Node.js 14.x

Version 2 using Node.js 18.x

Version 2 using Node.js 18.x

Ensure you create a maintenance process for deleting unused function versions, which also impact your Lambda storage quota.

Managing function runtime upgrades

Managing function runtime upgrades should be part of your software delivery lifecycle, in a similar way to how you treat dependencies and security updates. You need to understand which functions are being actively used in your organization. Organizations can create prioritization based on security profiles and/or function usage. You can use the same communication mechanisms you may already be using for handling security vulnerabilities.

Implement preventative guardrails to ensure that developers can only create functions using supported runtimes. Using infrastructure as code, CI/CD pipelines, and robust testing practices makes updating runtimes easier.

Identifying impacted functions

There are tools available to check Lambda runtime configuration and to identify which functions and what published function versions are actually in use. Deleting a function or function version that is no longer in use is the simplest way to avoid runtime deprecations.

You can identify functions using deprecated or soon to be deprecated runtimes using AWS Trusted Advisor. Use the AWS Lambda Functions Using Deprecated Runtimes check, in the Security category that provides 120 days’ notice.

AWS Trusted Advisor Lambda functions using deprecated runtimes

AWS Trusted Advisor Lambda functions using deprecated runtimes

Trusted Advisor scans all versions of your functions, including $LATEST and published versions.

The AWS Command Line Interface (AWS CLI) can list all functions in a specific Region that are using a specific runtime. To find all functions in your account, repeat the following command for each AWS Region and account. Replace the <REGION> and <RUNTIME> parameters with your values. The --function-version ALL parameter causes all function versions to be returned; omit this parameter to return only the $LATEST version.

aws lambda list-functions --function-version ALL --region <REGION> --output text —query "Functions[?Runtime=='<RUNTIME>'].FunctionArn"

You can use AWS Config to create a view of the configuration of resources in your account and also store configuration snapshot data in Amazon S3. AWS Config queries do not support published function versions, they can only query the $LATEST version.

You can then use Amazon Athena and Amazon QuickSight to make dashboards to visualize AWS Config data. For more information, see the Implementing governance in depth for serverless applications learning guide.

Dashboard showing AWS Config data

Dashboard showing AWS Config data

There are a number of ways that you can track Lambda function usage.

You can use Amazon CloudWatch metrics explorer to view Lambda by runtime and track the Invocations metric within the default CloudWatch metrics retention period of 15 months.

Track invocations in Amazon CloudWatch metrics

Track invocations in Amazon CloudWatch metrics

You can turn on AWS CloudTrail data event logging to log an event every time Lambda functions are invoked. This helps you understand what identities are invoking functions and the frequency of their invocations.

AWS Cost and Usage Reports can show which functions are incurring cost and in use.

Limiting runtime usage

AWS CloudFormation Guard is an open-source evaluation tool to validate infrastructure as code templates. Create policy rules to ensure that developers only chose approved runtimes. For more information, see Preventative Controls with AWS CloudFormation Guard.

AWS Config rules allow you to check that Lambda function settings for the runtime match expected values. For more information on running these rules before deployment, see Preventative Controls with AWS Config. You can also reactively flag functions as non-compliant as your governance policies evolve. For more information, see Detective Controls with AWS Config.

Lambda does not currently have service control policies (SCP) to block function creation based on the runtime

Upgrade best practices

Use infrastructure as code tools to build and manage your Lambda functions, which can make it easier to manage upgrades.

Ensure you run tests against your functions when developing locally. Include automated tests as part of your CI/CD pipelines to provide confidence in your runtime upgrades. When rolling out function upgrades, you can use weighted aliases to shift traffic between two function versions as you monitor for errors and failures.

Using runtimes after deprecation

AWS strongly advises you to upgrade your functions to a supported runtime before deprecation to continue to benefit from security patches, bug-fixes, and the latest runtime features. While deprecation does not affect function invocations, you will be using an unsupported runtime, which may have unpatched security vulnerabilities. Your function may eventually stop working, for example, due to a certificate expiry.

Lambda blocks function creation and updates for functions using deprecated runtimes. To create or update functions after these operations are blocked, contact AWS Support.

Conclusion

Lambda is deprecating a number of popular runtimes over the next few months, reflecting the end-of-life of upstream language versions and Amazon Linux 1. This post covers considerations for managing Lambda function runtime upgrades.

For more serverless learning resources, visit Serverless Land.

Accelerating Well-Architected Framework reviews using integrated AWS Trusted Advisor insights

Post Syndicated from Stephen Salim original https://aws.amazon.com/blogs/architecture/accelerating-well-architected-framework-reviews-using-integrated-aws-trusted-advisor-insights/

In this blog, we will explain how the new AWS Well-Architected integration with AWS Trusted Advisor can give you insights that help you create a flywheel effect to accelerate your cloud optimization. Customers that have the most success in their cloud adoption recognize that optimizing their cloud architecture and operations is not a one-time effort. Optimization is a continuous improvement virtuous cycle based on learning architectural and operational best practices, measuring workloads against these best practices, and implementing improvements based on opportunities recognized from measurement.

Customers can use the AWS Well-Architected Framework to build a “learn, measure, and improve” continuous improvement virtuous cycle (Figure 1). With the AWS Well-Architected Tool, customers can measure their workloads against these AWS best practices to identify improvement opportunities or risks they should address. After customers complete Well-Architected Framework Reviews (WAFRs) they can generate improvement plans with prioritized guidance and resources for improvement. They can also track the improvements made over time using the milestones feature in the Well-Architected Tool.

Continuous optimization of workloads based on AWS best practices

Figure 1. Continuous optimization of workloads based on AWS best practices

Amazon uses the term flywheel to describe a virtuous cycle that has additional drivers to add momentum, which accelerates the cycle and the value it delivers. Figure 2 is the often-referenced Amazon retail flywheel, which shows how Amazon’s focus on customer experience drives growth. It is accelerated by creating a lower cost structure, which allows Amazon to pass lower prices to its customers, improving customer experience and driving faster growth.

Amazon Flywheel concept of scaling growth

Figure 2. The Amazon Flywheel concept of scaling growth

Customers can add momentum to an AWS Well-Architected “learn, measure, and improve” virtuous cycle using tools that give more insights while measuring workloads. Improved insights result in consistent measurements, that are more efficient and more accurate. This accelerates the optimization cycle by reducing the time required to measure workloads. Collecting information on AWS resources using Trusted Advisor checks allows customers to validate if a workload’s state is aligned with AWS best practices. The new AWS Well-Architected Tool integration with AWS Trusted Advisor makes it easier and faster to gain insights during WAFRs. The Trusted Advisor checks that are relevant to a specific set of best practices have been mapped to the corresponding questions in Well-Architected. The new feature now shows the mapped Trusted Advisor checks directly in the Well-Architected Tool. These insights help customers run WAFRs in less time, with more accuracy, creating a flywheel effect (Figure 3).

Insights from AWS Trusted Advisor create acceleration in achieving improved outcomes

Figure 3. Insights from AWS Trusted Advisor create acceleration in achieving improved outcomes

AWS Well-Architected Tool integration with AWS Trusted Advisor: feature example

In the following sections, we detail an example scenario on how to use the integration with Trusted Advisor to gain insights when measuring your workloads.

Enabling the AWS Well-Architected Tool integration with AWS Trusted Advisor

How to enable the new feature in your workload:

  1. Create a new workload in the AWS Well-Architected Console. Refer to the user guide for detailed instructions.

    Optional
    : When defining a workload, within the “Application” section of workload definition, you can now also specify the AWS Service Catalog AppRegistry AWS Resource Name (ARN). This field is to indicate a relationship between the AWS Well-Architected Tool workload and the AWS resources in an AppRegistry Application when performing a Well-Architected Framework Review (Figure 4).

    Application field to select AWS Service Catalog AppRegistry ARN

    Figure 4. Application field to select AWS Service Catalog AppRegistry ARN

    This is another new AWS Well-Architected Tool feature that launched along with the integration with Trusted Advisor feature. You can find out more details about the integration with AWS Service Catalog AppRegistry in the What’s New post and on the feature documentation page. For details on how to create an AWS Service Catalog AppRegistry Application refer to Creating applications.

  2. To enable the integration with Trusted Advisor, after the necessary workload information has been entered, within the “AWS Trusted Advisor” section, tick on “Activate Trusted Advisor” (Figure 5).
    Enabling the Trusted Advisor feature

    Figure 5. Enabling the AWS Trusted Advisor feature

    Optional: Once the workload is created, note the workload ARN. You can find the workload ARN in the Properties section of the workload resource you created (Figure 6). For steps on how to identify your workload, refer to Well-Architected Tool User Guide on viewing a workload.

    AWS Well-Architected Tool showing workload ARN

    Figure 6. AWS Well-Architected Tool showing workload ARN

  3. To collect Trusted Advisor checks from accounts other than the account where the workload you are reviewing exists, you must perform two steps. You need to ensure the account IDs are listed in the workload properties for the workload you are reviewing. You must then create an IAM role in the account from which Trusted Advisor checks will be collected with the following permission and trust relationship (Figures 7 and 8). For more information on how to setup this permission, refer to the feature documentation.
    Permissions needed by AWS Well-Architected Tool to interrogate AWS Trusted Advisor

    Figure 7. Permissions needed by AWS Well-Architected Tool to interrogate AWS Trusted Advisor

    The trust relationship allowing AWS Well-Architected Tool to assume policy on behalf of the workload

    Figure 8. The trust relationship allowing AWS Well-Architected Tool to assume policy on behalf of the workload

Using integration with AWS Trusted Advisor for insights during reviews

Once the feature is enabled, additional insights will be noticeable about the resources in your workload using Trusted Advisor checks. Let’s explore an example question. In this case, we will use Question 9 from the Reliability Pillar, as there are Trusted Advisor checks related to the best practices in it: How do you back up data?

  1. AWS Well-Architected Reliability Question 9 includes best practices that are related to how workload backup is performed to support the ability for the workload to recover from failure. Current findings using Trusted Advisor checks indicates the workload may not be configured based on the “Perform data backup automatically” best practice in the Reliability Pillar (Figure 9).

    "Perform data backup automatically" best practices

    Figure 9. “Perform data backup automatically” best practices

  2. To access Trusted Advisor checks as insights, you can select a question in the Well-Architected Tool (Figure 10). If there are related Trusted Advisor checks available for a question, there will be a “View checks” button like the screenshot below. You can also select the “Trusted Advisor checks” tab.

    Trusted Advisor checks that map to best practices

    Figure 10. AWS Trusted Advisor checks that map to best practices

  3. Trusted Advisor checks are available, which provide insights related to the best practice in the question. You will also notice the state of resources recommendations and the count of resources. Trusted Advisor checks that relate to the best practice “Perform data backup automatically” are displayed. One of the Trusted Advisor checks identified with a x in a circle (denoting “Action recommended”) status is on the Amazon Elastic Block Storage (Amazon EBS) snapshots availability to recover your EBS volume from in the event of disaster (Figure 11).

    AWS Trusted Advisor check for Amazon EBS snapshots with "Action recommended"

    Figure 11. AWS Trusted Advisor check for Amazon EBS snapshots with “Action recommended”

  4. Exploring the Trusted Advisor Console, you can identify the EBS volume ID that has been detected with no snapshot in this us-west-2 region (Figure 12).

    An EBS volume that does not have snapshots

    Figure 12. An EBS volume that does not have snapshots

  5. With the insights from Trusted Advisor, we can quickly determine that the “Perform data backup automatically” best practice is not in place, as we do not have Amazon EBS snapshots enabled. Through the “helpful resources” section, instructions can be found to help automate the snapshot creation of Amazon EBS volume (Figure 13). One method to achieve this is to use AWS Backup.

    Resources with details about best practices, including links to learn more

    Figure 13. Resources with details about best practices, including links to learn more

  6. Using AWS Backup you can define a backup plan to automate snapshots creation of the EBS volume. Using this plan, you adjust the frequency of the backup to help achieve your recovery time objective and recovery point objective (Figure 14). For more information on how to configure EBS volume backup plan, refer to the Developer Guide on creating a backup plan.

    Setup automatic Amazon EBS volume snapshots

    Figure 14. Setup automatic Amazon EBS volume snapshots

  7. Once this improvement is implemented and the related EBS volume snapshot is taken, Trusted Advisor will reflect the changes to the resource (Figure 15).

    Amazon EBS volume with a snapshot

    Figure 15. Amazon EBS volume with a snapshot

  8. The next time we perform a Well-Architected Framework Review on this workload, the related AWS Trusted Advisor Check will show no action required with a check-mark status (Figure 16).
    AWS Trusted Advisor checks that represent improvements that have been implemented

    Figure 16. AWS Trusted Advisor checks that represent improvements that have been implemented

    Optional: For access to the list of Trusted Advisor checks in .csv format, you can click on the “Download check details” button on each question to download the resources that were checked in relation to the specified best practices (Figure 17).

    "Download check details" button

    Figure 17. “Download check details” button

  9. Once implemented, this improvement ensures a means to recover the EBS volume data in the event of disaster. This makes the resources in the workload better aligned to the AWS Reliability Pillar Design principle of “Automatically recover from failure”. To reflect this alignment in the Well-Architected Tool, you can tick on the best practice check items under the related questions (Figure 18).

    A milestone with updated best practices based on improvements that have been implemented

    Figure 18. A milestone with updated best practices based on improvements that have been implemented

  10. Finally, you can create a milestone to capture a point in time state of your workload WAFR. As you continuously optimize with more WAFRs and improvements, the number of high- and medium-risk items identified within each review will decrease. You will notice the continuous optimization of your workload over time, as in Figure 19.

    The history of improvements being made over time

    Figure 19. The history of improvements being made over time

Conclusion

Using the AWS Well-Architected integration with AWS Trusted Advisor, customers have a mechanism to accelerate the “learn, measure, and improve” Well-Architected virtuous cycle, creating an optimization flywheel. We have demonstrated the value of creating acceleration through the insights from Trusted Advisor checks. You now know how to enable the integration with Trusted Advisor and have seen an example of how the insights can accelerate your review cycle. You will notice the improvements you make over time will reflect in the Trusted Advisor checks as you review the milestones for your workloads. Enable this feature on your next Well-Architected Framework Review (WAFR) to measure the impact that data-driven insights from Trusted Advisor can have on reducing the time-to-value for your reviews. For more information consider these additional resources. You can contact your account team for support in running WAFRs or check out the AWS Well-Architected Partner Program to find a partner that can help you run a review. Additionally, running a WAFR with a partner assisting you in remediating risks may also provide funding credits to offset the costs required to make the improvements.

“Perform data backup automatically” is part of the Reliability Pillar of the AWS Well-Architected Framework. AWS Well-Architected is a set of guiding design principles developed by AWS to help organizations build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads. Use the AWS Well-Architected Tool to review your workloads periodically to address important design considerations and ensure that they follow the best practices and guidance of the AWS Well-Architected Framework. For follow up questions or comments, join our growing community on AWS re:Post.

 

AWS Trusted Advisor – New Priority Capability

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-trusted-advisor-new-priority-capability/

AWS Trusted Advisor is a service that continuously analyzes your AWS accounts and provides recommendations to help you to follow AWS best practices and AWS Well-Architected guidelines. Trusted Advisor implements a series of checks. These checks identify ways to optimize your AWS infrastructure, improve security and performance, reduce costs, and monitor service quotas.

Today, we are making available to all Enterprise Support customers a new capability for AWS Trusted Advisor: Trusted Advisor Priority. It gives you prioritized and context-driven recommendations manually curated by your AWS account team, based on their knowledge of your environment and the machine-generated checks from AWS Services.

Trusted Advisor implements over 200 checks in five categories: cost optimization, performance, security, fault tolerance, and service limits. Here is a view of the current Trusted Advisor dashboard.

AWS Trusted Advisor Categories

The list of checks available on your account depends on your level of support. When you have AWS Basic Support, available to all customers, or AWS Developer Support, you have access to core security and service limits checks. When you have AWS Business Support or AWS Enterprise Support, you have access to all checks.

The new Priority capability gives you a prioritized view of critical risks. It shows prioritized, contextual recommendations and actionable insights based on your business outcomes and what’s important to you. It also surfaces risks proactively identified by your AWS account team to alert and address critical cloud risks stemming from deviations from AWS best practices. It is designed to help you: IT leaders, technical decisions makers, and members of a Cloud Center of Excellence.

The account team takes advantage of their understanding of your production accounts and business-critical workloads. By working with you, they identify what’s important to you, and the outcomes or goals you wish to achieve. For example, they know about your business viewpoint whether it is exiting a data center by the end of the year, launching a new product, expanding to a new geography, or migrating a workload to the cloud.

Trusted Advisor uses multiple sources to define the priorities. On one side, it uses signals from other AWS services, such as AWS Compute Optimizer, Amazon GuardDuty, or VPC Flow Logs. On the other side, it uses context manually curated by your AWS account team (Account Manager, Technical Account Manager, Solutions Architect, Customer Solutions Manager, and others) and the knowledge they have about your production accounts, business-critical applications and critical workloads. You will be guided to opportunities to take advantage of AWS Support engagements like a Cost Optimization workshop when the account team believes there are opportunities to reduce costs, a deep dive with a service team, or an Infrastructure Event Management for an upcoming workload migration.

You will be alerted to risks in your deployments on AWS, using sources such as the AWS Well-Architected framework. We will highlight and bring to attention any open high risk issues (HRIs) from recently conducted Well-Architected reviews. We also run campaigns to proactively identify, alert, and reduce single points of failures, such as single Availability Zone deployments. This verifies that you don’t have a single point of failures for production applications that are used for mission-critical processes, that drive significant revenue, or have regulated availability requirements. Trusted Advisor helps you to detect, raise awareness, and provide prescriptive guidance.

Here is a diagram to visualize my mental model for Trusted Advisor Priority:

Trusted Advisor Mental Model Diagram

Trusted Advisor Priority works with AWS Organizations: it aggregates all recommendations from member accounts in your management account or designed delegated administrator. You may delegate access to Trusted Advisor Priority to a maximum of five other AWS accounts. Trusted Advisor Priority comes with a new AWS Identity and Access Management (IAM) policy to help you manage access to the capability. Finally, you can also configure to receive daily and weekly email digests of all prioritized notifications to the alternate contacts you set up in the management account or each delegated admin account.

Let’s See Trusted Advisor Priority in Action
I open the AWS Management Console and navigate to Trusted Advisor. I notice a new navigation entry on the left menu. It is the default view for Enterprise Support customers.

The Trusted Advisor Priority main screen summarizes the number of Pending response and In progress recommendations. It shares some time-related statistics on the right side of the screen. I can start to look at the Active prioritized recommendations list on the bottom half of the screen.

Recommendations are divided into two panels: Active and Closed. The Active tab includes recommendations that have been surfaced to you and which you are actively working on. The Closed tab includes recommendations that have been resolved. All account team prioritized recommendations are presented with a series of searchable and sortable columns. I see the recommendation name, status, source, category, and age.

AWS Trusted Advisor Priority

The list gives me details about the category, the age, and the status of the recommendations. The Source column distinguishes between auto-detected and manually identified opportunities. The Category column shows the category from Trusted Advisor (cost optimization, performance, security, fault tolerance, and service limits). The Age column shows me how long it’s been since the recommendation was first shared. This helps with tracking the time to resolution for each of these items.

AWS Trusted Advisor Priority

I can select any recommendation to drill down into the details. In this example, I select the second one: Amazon RDS Public Snapshots. This is a recommendation in the Security category.

AWS Trusted Advisor Priority

Recommendations are actionable, and they give you a real course of action to respond to the issue. In this case, it suggests modifying the snapshot configuration and removing the public flag that makes the database snapshot available to all AWS customers.

Trusted Advisor Priority provides a closed-loop feedback mechanism where I have the ability to accept or reject a recommendation if I don’t think the issue is relevant to my account.

The information is aggregated at an Organizations level. When you are using Organizations to group accounts to reflect your business units, the recommendations are aggregated and present an overall risk posture across your business units.

As an infrastructure manager, I can either Accept the recommendation and take action or Reject it because it is not a risk or it is something I will not fix and want to remove the recommendation from my list.

AWS Trusted Advisor Priority - Accept AWS Trusted Advisor Priority - Reject

Pricing and Availability
AWS Trusted Advisor Priority is available in all commercial AWS Regions where Trusted Advisor is available now, except the two AWS Regions in China. It is available at no additional cost for Enterprise Support customers.

Trusted Advisor Priority will not replace your Technical Account Manager or Solution Architect. They are key in providing tailored guidance and working with you through all phases of managing your cloud applications. Trusted Advisor Priority provides anytime access to tailored, context-aware, risk-mitigating recommendations and insights from your account team and optimizes your engagement with AWS. It will not reduce your access to your account team in any way but rather will make it easier for you to collaborate with them on your most important priorities.

You can start to use Trusted Advisor Priority today.

And now, go build!

— seb

Building AWS Lambda governance and guardrails

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-aws-lambda-governance-and-guardrails/

When building serverless applications using AWS Lambda, there are a number of considerations regarding security, governance, and compliance. This post highlights how Lambda, as a serverless service, simplifies cloud security and compliance so you can concentrate on your business logic. It covers controls that you can implement for your Lambda workloads to ensure that your applications conform to your organizational requirements.

The Shared Responsibility Model

The AWS Shared Responsibility Model distinguishes between what AWS is responsible for and what customers are responsible for with cloud workloads. AWS is responsible for “Security of the Cloud” where AWS protects the infrastructure that runs all the services offered in the AWS Cloud. Customers are responsible for “Security in the Cloud”, managing and securing their workloads. When building traditional applications, you take on responsibility for many infrastructure services, including operating systems and network configuration.

Traditional application shared responsibility

Traditional application shared responsibility

One major benefit when building serverless applications is shifting more responsibility to AWS so you can concentrate on your business applications. AWS handles managing and patching the underlying servers, operating systems, and networking as part of running the services.

Serverless application shared responsibility

Serverless application shared responsibility

For Lambda, AWS manages the application platform where your code runs, which includes patching and updating the managed language runtimes. This reduces the attack surface while making cloud security simpler. You are responsible for the security of your code and AWS Identity and Access Management (IAM) to the Lambda service and within your function.

Lambda is SOCHIPAAPCI, and ISO-compliant. For more information, see Compliance validation for AWS Lambda and the latest Lambda certification and compliance readiness services in scope.

Lambda isolation

Lambda functions run in separate isolated AWS accounts that are dedicated to the Lambda service. Lambda invokes your code in a secure and isolated runtime environment within the Lambda service account. A runtime environment is a collection of resources running in a dedicated hardware-virtualized Micro Virtual Machines (MVM) on a Lambda worker node.

Lambda workers are bare metalEC2 Nitro instances, which are managed and patched by the Lambda service team. They have a maximum lease lifetime of 14 hours to keep the underlying infrastructure secure and fresh. MVMs are created by Firecracker, an open source virtual machine monitor (VMM) that uses Linux’s Kernel-based Virtual Machine (KVM) to create and manage MVMs securely at scale.

MVMs maintain a strong separation between runtime environments at the virtual machine hardware level, which increases security. Runtime environments are never reused across functions, function versions, or AWS accounts.

Isolation model for AWS Lambda workers

Isolation model for AWS Lambda workers

Network security

Lambda functions always run inside secure Amazon Virtual Private Cloud (Amazon VPCs) owned by the Lambda service. This gives the Lambda function access to AWS services and the public internet. There is no direct network inbound access to Lambda workers, runtime environments, or Lambda functions. All inbound access to a Lambda function only comes via the Lambda Invoke API, which sends the event object to the function handler.

You can configure a Lambda function to connect to private subnets in a VPC in your account if necessary, which you can control with IAM condition keys . The Lambda function still runs inside the Lambda service VPC but sends all network traffic through your VPC. Function outbound traffic comes from your own network address space.

AWS Lambda service VPC with VPC-to-VPC NAT to customer VPC

AWS Lambda service VPC with VPC-to-VPC NAT to customer VPC

To give your VPC-connected function access to the internet, route outbound traffic to a NAT gateway in a public subnet. Connecting a function to a public subnet doesn’t give it internet access or a public IP address, as the function is still running in the Lambda service VPC and then routing network traffic into your VPC.

All internal AWS traffic uses the AWS Global Backbone rather than traversing the internet. You do not need to connect your functions to a VPC to avoid connectivity to AWS services over the internet. VPC connected functions allow you to control and audit outbound network access.

You can use security groups to control outbound traffic for VPC-connected functions and network ACLs to block access to CIDR IP ranges or ports. VPC endpoints allow you to enable private communications with supported AWS services without internet access.

You can use VPC Flow Logs to audit traffic going to and from network interfaces in your VPC.

Runtime environment re-use

Each runtime environment processes a single request at a time. After Lambda finishes processing the request, the runtime environment is ready to process an additional request for the same function version. For more information on how Lambda manages runtime environments, see Understanding AWS Lambda scaling and throughput.

Data can persist in the local temporary filesystem path, in globally scoped variables, and in environment variables across subsequent invocations of the same function version. Ensure that you only handle sensitive information within individual invocations of the function by processing it in the function handler, or using local variables. Do not re-use files in the local temporary filesystem to process unencrypted sensitive data. Do not put sensitive or confidential information into Lambda environment variables, tags, or other freeform fields such as Name fields.

For more Lambda security information, see the Lambda security whitepaper.

Multiple accounts

AWS recommends using multiple accounts to isolate your resources because they provide natural boundaries for security, access, and billing. Use AWS Organizations to manage and govern individual member accounts centrally. You can use AWS Control Tower to automate many of the account build steps and apply managed guardrails to govern your environment. These include preventative guardrails to limit actions and detective guardrails to detect and alert on non-compliance resources for remediation.

Lambda access controls

Lambda permissions define what a Lambda function can do, and who or what can invoke the function. Consider the following areas when applying access controls to your Lambda functions to ensure least privilege:

Execution role

Lambda functions have permission to access other AWS resources using execution roles. This is an AWS principal that the Lambda service assumes which grants permissions using identity policy statements assigned to the role. The Lambda service uses this role to fetch and cache temporary security credentials, which are then available as environment variables during a function’s invocation. It may re-use them across different runtime environments that use the same execution role.

Ensure that each function has its own unique role with the minimum set of permissions..

Identity/user policies

IAM identity policies are attached to IAM users, groups, or roles. These policies allow users or callers to perform operations on Lambda functions. You can restrict who can create functions, or control what functions particular users can manage.

Resource policies

Resource policies define what identities have fine-grained inbound access to managed services. For example, you can restrict which Lambda function versions can add events to a specific Amazon EventBridge event bus. You can use resource-based policies on Lambda resources to control what AWS IAM identities and event sources can invoke a specific version or alias of your function. You also use a resource-based policy to allow an AWS service to invoke your function on your behalf. To see which services support resource-based policies, see “AWS services that work with IAM”.

Attribute-based access control (ABAC)

With attribute-based access control (ABAC), you can use tags to control access to your Lambda functions. With ABAC, you can scale an access control strategy by setting granular permissions with tags without requiring permissions updates for every new user or resource as your organization scales. You can also use tag policies with AWS Organizations to standardize tags across resources.

Permissions boundaries

Permissions boundaries are a way to delegate permission management safely. The boundary places a limit on the maximum permissions that a policy can grant. For example, you can use boundary permissions to limit the scope of the execution role to allow only read access to databases. A builder with permission to manage a function or with write access to the applications code repository cannot escalate the permissions beyond the boundary to allow write access.

Service control policies

When using AWS Organizations, you can use Service control policies (SCPs) to manage permissions in your organization. These provide guardrails for what actions IAM users and roles within the organization root or OUs can do. For more information, see the AWS Organizations documentation, which includes example service control policies.

Code signing

As you are responsible for the code that runs in your Lambda functions, you can ensure that only trusted code runs by using code signing with the AWS Signer service. AWS Signer digitally signs your code packages and Lambda validates the code package before accepting the deployment, which can be part of your automated software deployment process.

Auditing Lambda configuration, permissions and access

You should audit access and permissions regularly to ensure that your workloads are secure. Use the IAM console to view when an IAM role was last used.

IAM last used

IAM last used

IAM access advisor

Use IAM access advisor on the Access Advisor tab in the IAM console to review when was the last time an AWS service was used from a specific IAM user or role. You can use this to remove IAM policies and access from your IAM roles.

IAM access advisor

IAM access advisor

AWS CloudTrail

AWS CloudTrail helps you monitor, log, and retain account activity to provide a complete event history of actions across your AWS infrastructure. You can monitor Lambda API actions to ensure that only appropriate actions are made against your Lambda functions. These include CreateFunction, DeleteFunction, CreateEventSourceMapping, AddPermission, UpdateEventSourceMapping,  UpdateFunctionConfiguration, and UpdateFunctionCode.

AWS CloudTrail

AWS CloudTrail

IAM Access Analyzer

You can validate policies using IAM Access Analyzer, which provides over 100 policy checks with security warnings for overly permissive policies. To learn more about policy checks provided by IAM Access Analyzer, see “IAM Access Analyzer policy validation”.

You can also generate IAM policies based on access activity from CloudTrail logs, which contain the permissions that the role used in your specified date range.

IAM Access Analyzer

IAM Access Analyzer

AWS Config

AWS Config provides you with a record of the configuration history of your AWS resources. AWS Config monitors the resource configuration and includes rules to alert when they fall into a non-compliant state.

For Lambda, you can track and alert on changes to your function configuration, along with the IAM execution role. This allows you to gather Lambda function lifecycle data for potential audit and compliance requirements. For more information, see the Lambda Operators Guide.

AWS Config includes Lambda managed config rules such as lambda-concurrency-check, lambda-dlq-check, lambda-function-public-access-prohibited, lambda-function-settings-check, and lambda-inside-vpc. You can also write your own rules.

There are a number of other AWS services to help with security compliance.

  1. AWS Audit Manager: Collect evidence to help you audit your use of cloud services.
  2. Amazon GuardDuty: Detect unexpected and potentially unauthorized activity in your AWS environment.
  3. Amazon Macie: Evaluates your content to identify business-critical or potentially confidential data.
  4. AWS Trusted Advisor: Identify opportunities to improve stability, save money, or help close security gaps.
  5. AWS Security Hub: Provides security checks and recommendations across your organization.

Conclusion

Lambda makes cloud security simpler by taking on more responsibility using the AWS Shared Responsibility Model. Lambda implements strict workload security at scale to isolate your code and prevent network intrusion to your functions. This post provides guidance on assessing and implementing best practices and tools for Lambda to improve your security, governance, and compliance controls. These include permissions, access controls, multiple accounts, and code security. Learn how to audit your function permissions, configuration, and access to ensure that your applications conform to your organizational requirements.

For more serverless learning resources, visit Serverless Land.

Optimizing your AWS Infrastructure for Sustainability, Part III: Networking

Post Syndicated from Katja Philipp original https://aws.amazon.com/blogs/architecture/optimizing-your-aws-infrastructure-for-sustainability-part-iii-networking/

In Part I: Compute and Part II: Storage of this series, we introduced strategies to optimize the compute and storage layer of your AWS architecture for sustainability.

This blog post focuses on the network layer of your AWS infrastructure and proposes concepts to optimize your network utilization.

Optimizing the networking layer of your AWS infrastructure

When you make your applications available to more customers, the packets that travel across the network will increase. Similarly, the larger the size of data, as well as the more distance a packet has to travel, the more resources are required to transmit it. With growing number of application users, optimizing network traffic can ensure that network resource consumption is not growing linearly.

The recommendations in the following sections will help you use your resources more efficiently for the network layer of your workload.

Reducing the network traveled per request

Reducing the data sent over the network and optimizing the path a packet takes will result in a more efficient data transfer. The following table provides metrics related to some AWS services that can help you find potential network optimization opportunities.

Service Metric/Check Source
Amazon CloudFront Cache hit rate Viewing CloudFront and Lambda@Edge metrics
AWS Trusted Advisor check reference
Amazon Simple Storage Service (Amazon S3) Data transferred in/out of a bucket Metrics and dimensions
AWS Trusted Advisor check reference
Amazon Elastic Compute Cloud (Amazon EC2) NetworkPacketsIn/NetworkPacketsOut List the available CloudWatch metrics for your instances
AWS Trusted Advisor CloudFront Content Delivery Optimization AWS Trusted Advisor check reference

We recommend the following concepts to optimize your network utilization.

Read local, write global

The following strategies allow users to read the data from the source closest to them; thus, fewer requests travel longer distances.

  • If you are operating within a single AWS Region, you should choose a Region that is near the majority of your users. The further your users are away from the Region, the further data needs to travel through the global network.
  • If your users are spread over multiple Regions, set up multiple copies of the data to reside in each Region. Amazon Relational Database Service (Amazon RDS) and Amazon Aurora let you set up cross-Region read replicas. Amazon DynamoDB global tables allow for fast performance and alleviate network load.

Use a content delivery network

Content delivery networks (CDNs) bring your data closer to the end user. When requested, they cache static content from the original server and deliver it to the user. This shortens the distance each packet has to travel.

  • CloudFront optimizes network utilization and delivers traffic over CloudFront’s globally distributed edge network. Figure 1 shows a global user base that accesses an S3 bucket directly versus serving cached data from edge locations.
  • Trusted Advisor includes a check that recommends whether you should use a CDN for your S3 buckets. It analyzes the data transferred out of your S3 bucket and flags the buckets that could benefit from a CloudFront distribution.
Comparison of accessing an S3 bucket directly versus via a CloudFront distribution/edge locations

Figure 1. Comparison of accessing an S3 bucket directly versus via a CloudFront distribution/edge locations

Optimize CloudFront cache hit ratio

CloudFront caches different versions of an object depending upon the request headers (for example, language, date, or user-agent). You can further optimize your CDN distribution’s cache hit ratio (the number of times an object is served from the CDN versus from the origin) with a Trusted Advisor check. It automatically checks for headers that do not affect the object and then recommends a configuration to ignore those headers and not forward the request to the origin.

Use edge-oriented services

Edge computing brings data storage and computation closer to users. By implementing this approach, you can perform data preprocessing or run machine learning algorithms on the edge.

  • Edge-oriented services applied on gateways or directly onto user devices reduce network traffic because data does not need to be sent back to the cloud server.
  • One-time, low-latency tasks are a good fit for edge use cases, like when an autonomous vehicle needs to detect objects nearby. You should generally archive data that needs to be accessed by multiple parties in the cloud, but consider factors such as device hardware and privacy regulations first.
  • CloudFront Functions can run compute on edge locations and Lambda@Edge can generate Regional edge caches. AWS IoT Greengrass provides edge computing for Internet of Things (IoT) devices.

Reducing the size of data transmitted

Serve compressed files

In addition to caching static assets, you can further optimize network utilization by serving compressed files to your users. You can configure CloudFront to automatically compress objects, which results in faster downloads, leading to faster rendering of webpages.

Enhance Amazon EC2 network performance

Network packets consist of data that you are sending (frame) and the processing overhead information. If you use larger packets, you can pass more data in a single packet and decrease processing overhead.

Jumbo frames use the largest permissible packet that can be passed over the connection. Keep in mind that outside a single virtual private cloud (VPC), over virtual private network (VPN) or internet gateway, traffic is limited to a lower frame regardless of using jumbo frames.

Optimize APIs

If your payloads are large, consider reducing their size to reduce network traffic by compressing your messages for your REST API payloads. Use the right endpoint for your use case. Edge-optimized API endpoints are best suited for geographically distributed clients. Regional API endpoints are best suited for when you have a few clients with higher demands, because they can help reduce connection overhead. Caching your API responses will reduce network traffic and enhance responsiveness.

Conclusion

As your organization’s cloud adoption grows, knowing how efficient your resources are is crucial when optimizing your AWS infrastructure for environmental sustainability. Using the fewest number of resources possible and using them to their fullest will have the lowest impact on the environment.

Throughout this three-part blog post series, we introduced you to the following architectural concepts and metrics for the compute, storage, and network layers of your AWS infrastructure.

  • Reducing idle resources and maximizing utilization
  • Shaping demand to existing supply
  • Managing your data’s lifecycle
  • Using different storage tiers
  • Optimizing the path data travels through a network
  • Reducing the size of data transmitted

This is not an exhaustive list. We hope it is a starting point for you to consider the environmental impact of your resources and how you can build your AWS infrastructure to be more efficient and sustainable. Figure 2 shows an overview of how you can monitor related metrics with CloudWatch and Trusted Advisor.

Overview of services that integrate with CloudWatch and Trusted Advisor for monitoring metrics

Figure 2. Overview of services that integrate with CloudWatch and Trusted Advisor for monitoring metrics

Ready to get started? Check out the AWS Sustainability page to find out more about our commitment to sustainability. It provides information about renewable energy usage, case studies on sustainability through the cloud, and more.

Other blog posts in this series

Related information

Journey to Adopt Cloud-Native Architecture Series: #4 – Governing Security at Scale and IAM Baselining

Post Syndicated from Anuj Gupta original https://aws.amazon.com/blogs/architecture/journey-to-adopt-cloud-native-architecture-series-4-governing-security-at-scale-and-iam-baselining/

In Part 3 of this series, Improved Resiliency and Standardized Observability, we talked about design patterns that you can adopt to improve resiliency, achieve minimum business continuity, and scale applications with lengthy transactions (more than 3 minutes).

As a refresher from previous blogs in this series, our example ecommerce company’s “Shoppers” application runs in the cloud. The company experienced hypergrowth, which posed a number of platform and technology challenges, namely, they needed to scale on the backend without impacting users.

Because of this hypergrowth, distributed denial of service (DDoS) attacks on the ecommerce company’s services increased 10 times in 6 months. Some of these attacks led to downtime and loss of revenue. This blog post shows you how we addressed these threats by implementing a multi-account strategy and applying AWS Identity and Access Management (IAM) best practices.

A multi-account strategy ensures security at scale

Originally, the company’s production and non-production services were running in a single account. This meant non-production vulnerabilities like frequently changing code or privileged access could impact the production environment. Additionally, the application experienced issues due to unexpectedly reaching service quotas. These include (but are not limited to) number of read replicas per master in Amazon Relational Database Service (Amazon RDS) and total storage for all DB instances in Auto Scaling Service Quotas for Amazon Elastic Compute Cloud (Amazon EC2).

To address these issues, we followed multi-account strategy best practices. We established the multi-account hierarchy shown in Figure 1 that includes the following eight organizational units (OUs) to meet business requirements:

  1. Security PROD OU
  2. Security SDLC OU
  3. Infrastructure PROD OU
  4. Infrastructure SDLC OU
  5. Workload PROD OU
  6. Workload SDLC OU
  7. Sandbox OU
  8. Transitional OU

To identify the right fit for our needs, we evaluated AWS Landing Zone and AWS Control Tower. To reduce operation overhead of maintaining a solution, we used AWS Control Tower to deploy guardrails as service control policies (SCPs). These guardrails were then separated into production and non-production environments, creating the hierarchy shown in Figure 1.

We created a new Payer (or Management) Account with Sandbox OU and Transitional OU under Root OU. We then moved existing AWS accounts under the Transitional OU and Sandbox OU. We provisioned new accounts with Account Factory and gradually migrated services from existing AWS accounts into the newly formed Log Archive Account, Security Account, Network Account, and Shared Services Account and applied appropriate guardrails. We then registered Sandbox OU with Control Tower. Additionally, we migrated the centralized logging solution from Part 3 of this blog series to the Security Account. We moved non-production applications into the Dev and Test Accounts, respectively, to isolate workloads. We then moved existing accounts that had production services from the Transitional OU to Workload PROD OU.

Multi-account hierarchy

Figure 1. Multi-account hierarchy

Implementing a multi-account strategy alleviated service quota challenges. It isolated variable demand non-production environments from more consistent production environments, which reduced the downtime caused by unplanned scaling events. The multi-account strategy enforces governance at scale, but also promotes innovation by allocating separate accounts with distinct security requirements for proof of concepts and experimentation. This reduces impact risks to production accounts and allows the required guardrails to be automatically applied.

Improving access management and least privilege access

When the company experienced hypergrowth, they not only had to scale their application’s infrastructure, but they also had to increase how often they release their code. They also hired and onboarded new internal teams.

To strengthen new/existing employees’ credentials, we used AWS Trusted Advisor for IAM Access Key Rotation. This identifies IAM users whose access keys have not been rotated for more than 90 days and created an automated way to rotate them. We then generated an IAM credential report to identify IAM users that don’t need console access or that don’t need access keys. We gradually assigned these users role-based access versus IAM access keys.

During a Well-Architected Security Pillar review, we identified some applications that used hardcoded passwords that hadn’t been updated for more than 90 days. We re-factored these applications to get passwords from AWS Secrets Manager and followed best practices for performance.

Additionally, we set up a system to automatically change passwords for RDS databases and wrote an AWS Lambda function to update passwords for third-party integration. Some applications on Amazon EC2 were using IAM access keys to access AWS services. We re-factored them to get permissions from the EC2 instance role attached to the EC2 instances, which reduced operational burden of rotating access keys.

Using IAM Access Analyzer, we analyzed AWS CloudTrail logs and generated policies for IAM roles. This helped us determine the least privilege permissions required for the roles as mentioned in the IAM Access Analyzer makes it easier to implement least privilege permissions by generating IAM policies based on access activity blog.

To streamline access for internal users, we migrated users to AWS Single Sign-On (AWS SSO) federated access. We enabled all features in AWS Organizations to use AWS SSO and created permission sets to define access boundaries for different functions. We assigned permission sets to different user groups and assigned users to user groups based on their job function. This allowed us to reduce the number of IAM policies and use tag-based control when defining AWS SSO permissions policies.

We followed the guidance in the Attribute-based Access Control with AWS SSO blog post to map user attributes and use tags to define permissions boundaries for user groups. This allowed us to provide access to users based on specific teams, projects, and departments. We enforced multi-factor authentication (MFA) for all AWS SSO users by configuring MFA settings to allow sign in only when an MFA device has been registered.

These improvements ensure that only the right people have access to the required resources for the right time. They reduce the risk of compromised security credentials by using AWS Security Token Service (AWS STS) to generate temporary credentials when needed. System passwords are better protected from unwanted access and automatically rotated for improved security. AWS SSO also allows us to enforce permissions at scale when people’s job functions change within or across teams.

Conclusion

In this blog post, we described design patterns we used to implement security governance at scale using multi-account strategy and AWS SSO integrations. We also talked about patterns you can adopt for IAM baselining that allow least privilege access, checking for IAM best practices, and proactively detecting unwanted access.

This blog post also covers why you need to refresh your threat model during hyperscale growth and how different services can make it easier to enforce security controls. In the next blog, we will talk about more security design patterns to improve infrastructure security and incident response during hyperscale.

Find out more

Other blogs in this series

Related information