Tag Archives: AWS Well-Architected Framework

Top Architecture Blog Posts of 2023

Post Syndicated from Andrea Courtright original https://aws.amazon.com/blogs/architecture/top-architecture-blog-posts-of-2023/

2023 was a rollercoaster year in tech, and we at the AWS Architecture Blog feel so fortunate to have shared in the excitement. As we move into 2024 and look ahead to all of the new technologies to come, we want to take a moment to highlight the brightest stars from 2023.

As always, thanks to our readers and to the many talented and hardworking Solutions Architects and other contributors to our blog.

I give you our 2023 cream of the crop!

#10: Build a serverless retail solution for endless aisle on AWS

In this post, Sandeep and Shashank help retailers and their customers alike in this guided approach to finding inventory that doesn’t live on shelves.

Figure 1. Building endless aisle architecture for order processing

Check it out!

#9: Optimizing data with automated intelligent document processing solutions

Who else dreads wading through large amounts of data in multiple formats? Just me? I didn’t think so. Using Amazon AI/ML and content-reading services, Deependra, Anirudha, Bhajandeep, and Senaka have created a solution that is scalable and cost-effective to help you extract the data you need and store it in a format that works for you.

Figure 2. AI-based intelligent document processing engine

Check it out!

#8: Disaster Recovery Solutions with AWS managed services, Part 3: Multi-Site Active/Passive

Disaster recovery posts are always popular, and this post by Brent and Dhruv is no exception. Their creative approach in part 3 of this series is most helpful for customers who have business-critical workloads with higher availability requirements.

Figure 3. Warm standby with managed services

Check it out!

#7: Simulating Kubernetes-workload AZ failures with AWS Fault Injection Simulator

Continuing with the theme of “when bad things happen,” we have Siva, Elamaran, and Re’s post about preparing for workload failures. If resiliency is a concern (and it really should be), the secret is test, test, TEST.

Figure 4. Architecture flow for Microservices to simulate a realistic failure scenario

Check it out!

#6: Let’s Architect! Designing event-driven architectures

Luca, Laura, Vittorio, and Zamira weren’t content with their four top-10 spots last year – they’re back with some things you definitely need to know about event-driven architectures.

Figure 5. Let’s Architect artwork

Check it out!

#5: Use a reusable ETL framework in your AWS lake house architecture

As your lake house increases in size and complexity, you could find yourself facing maintenance challenges, and Ashutosh and Prantik have a solution: frameworks! Their reusable ETL framework built with AWS Glue might just save you a headache or three.

Figure 6. Reusable ETL framework architecture

Check it out!

#4: Invoking asynchronous external APIs with AWS Step Functions

It’s possible that AWS’ menagerie of services doesn’t have everything you need to run your organization. (Possible, but not likely; we have a lot of amazing services.) If you are using third-party APIs, then Jorge, Hossam, and Shirisha’s architecture can help you maintain a secure, reliable, and cost-effective relationship among all involved.

Figure 7. Invoking Asynchronous External APIs architecture

Check it out!

#3: Announcing updates to the AWS Well-Architected Framework

The Well-Architected Framework continues to help AWS customers evaluate their architectures against its six pillars. They are constantly striving for improvement, and Haleh’s diligence in keeping us up to date has not gone unnoticed. Thank you, Haleh!

Figure 8. Well-Architected logo

Check it out!

#2: Let’s Architect! Designing architectures for multi-tenancy

The practically award-winning Let’s Architect! series strikes again! This time, Luca, Laura, Vittorio, and Zamira were joined by Federica to discuss multi-tenancy and why that concept is so crucial for SaaS providers.

Figure 9. Let’s Architect

Check it out!

And finally…

#1: Understand resiliency patterns and trade-offs to architect efficiently in the cloud

Haresh, Lewis, and Bonnie revamped this 2022 post into a masterpiece that completely stole our readers’ hearts and is among the top posts we’ve ever made!

Figure 10. Resilience patterns and trade-offs

Check it out!

Bonus! Three older special mentions

These three posts were published before 2023, but we think they deserve another round of applause because you, our readers, keep coming back to them.

Thanks again to everyone for their contributions during a wild year. We hope you’re looking forward to the rest of 2024 as much as we are!

Announcing the AWS Well-Architected Framework DevOps Guidance

Post Syndicated from Michael Rhyndress original https://aws.amazon.com/blogs/devops/announcing-the-aws-well-architected-framework-devops-guidance/

Today, Amazon Web Services (AWS) announced the launch of the AWS Well-Architected Framework DevOps Guidance. The AWS DevOps Guidance introduces the AWS DevOps Sagas—a collection of modern capabilities that together form a comprehensive approach to designing, developing, securing, and efficiently operating software at cloud scale. Taking the learnings from Amazon’s own transformation journey and our experience managing global cloud services, the AWS DevOps Guidance was built to equip organizations of all sizes with best practice culture, processes, and technical capabilities that help to deliver business value and applications more securely and at a higher velocity.

A Glimpse into Amazon’s DevOps Transformation

In the early 2000s, Amazon went through its own DevOps transformation, which ultimately led the online bookstore to form the AWS cloud computing division. Today, AWS provides a wide range of products and services for global customers that are powered by that same innovative DevOps approach. Because of the positive effects of this transformation, AWS recognizes the significance of DevOps and has been at the forefront of its adoption and implementation.

Amazon’s own journey, along with the collective experience gained from assisting customers as they modernize and migrate to the cloud, provided insight into the capabilities which we believe make DevOps adoption successful. With these learnings, we created the DevOps Sagas to help our customers sustainably adopt and practice DevOps through the implementation of an interconnected set of capabilities. Each DevOps Saga includes prescriptive guidance for capabilities that provide indicators of success, metrics to measure, and common anti-patterns to avoid.

Introducing The DevOps Sagas

The DevOps Sagas are core domains within the software delivery process that collectively form AWS DevOps best practices. Together, they encompass a collection of modern capabilities representing a comprehensive approach to designing, developing, securing, and efficiently operating software at cloud scale. You can use the DevOps Sagas as a common definition of what DevOps means to your organization by aligning on a shared understanding within your organization and to consistently measure DevOps adoption over time. The 5 DevOps Sagas are:

  • Organizational Adoption Saga: Inspires the formation of a customer-centric, adaptive culture focused on optimizing people-driven processes, personal and professional development, and improving developer experience to set the foundation for successful DevOps adoption.
  • Development Lifecycle Saga: Aims to enhance the organization’s capacity to develop, review, and deploy workloads swiftly and securely. It leverages feedback loops, consistent deployment methods, and an ‘everything-as-code’ approach to attain efficiency in deployment.
  • Quality Assurance Saga: Advocates for a proactive, test-first methodology integrated into the development process to ensure that applications are well-architected by design, secure, cost-efficient, sustainable, and delivered with increased agility through automation.
  • Automated Governance Saga: Facilitates directive, detective, preventive, and responsive measures at all stages of the development process. It emphasizes risk management, business process adherence, and application and infrastructure compliance at scale through automated processes, policies, and guardrails.
  • Observability Saga: Presents an approach to incorporating observability within environments and workloads, allowing teams to detect and address issues, improve performance, reduce costs, and ensure alignment with business objectives and customer needs.

The AWS DevOps Sagas model: Sagas, Capabilities, Indicators, Metrics, and Anti-patterns. Sagas are core domains that collectively form AWS DevOps best practices. Capabilities are individual practices with differentiated outcomes that form a Saga. Indicators objectively measure qualities of each capability. Metrics quantify and measure proficiency in each capability. Anti-patterns are behaviors that may seem beneficial but lead to inefficient outcomes, and should be avoided.

Who should use the AWS DevOps Guidance?

We recognize that every organization is unique and that there is no one-size-fits-all approach to practicing DevOps. The recommendations and examples provided can be tailored to suit your organization’s environment, quality, and security needs. The AWS DevOps Guidance is designed for a wide range of professionals and organizations, including startups exploring DevOps for the first time, established enterprises refining their processes, public sector companies, cloud-native businesses, and customers migrating to the AWS Cloud. Whether you are steering strategic direction as a Chief Technology Officer (CTO) or Chief Information Security Officer (CISO), a developer or architect actively engaged in designing and deploying workloads, or in a compliance role overseeing quality assurance, auditing, or governance, this guidance is tailored to help you.

Next Steps

With the release of the AWS DevOps Guidance, we encourage you, our customers, to download and read the document, as well as implement and test your workloads in accordance with the recommendations within. Use the AWS DevOps Guidance in tandem with the AWS Well-Architected Framework to conduct an assessment of your organization and individual workload’s adherence to DevOps best practices to pinpoint areas of strength and opportunities for improvement. Collaborate with your teams – from developers to operations and decision-makers – to share insights from your assessment. Use the insights gained from the AWS DevOps Guidance to prioritize areas of improvement and iteratively improve your DevOps capabilities.

Find the AWS DevOps Guidance on the AWS Well-Architected website or contact your AWS account team for more information. As with the AWS Well-Architected Framework and other industry and technology guidance, we recommend leveraging the AWS DevOps Guidance early and often – as you approach architectural and service design decisions, and whenever you carry out Well-Architected reviews. As you use the AWS DevOps Guidance, we would appreciate your comments and feedback to help us improve as best practices and technology evolve. We will continually refresh the content as we identify new best practices, metrics, and common scenarios.

Announcing updates to the AWS Well-Architected Framework guidance

Post Syndicated from Haleh Najafzadeh original https://aws.amazon.com/blogs/architecture/announcing-updates-to-the-aws-well-architected-framework-guidance/

We are excited to announce the availability of improved AWS Well-Architected Framework guidance. In this update, we have made changes across all six pillars of the framework: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability.

In this release, we have made the implementation guidance for the new and updated best practices more prescriptive, including enhanced recommendations and steps on reusable architecture patterns targeting specific business outcomes in the Amazon Web Services (AWS) Cloud.

A brief history

The Well-Architected Framework is a collection of best practices that allow customers to evaluate and improve the design, implementation, and operations of their workloads in the cloud.

In 2012, the first version of the framework was published, leading to the 2015 release of the guidance whitepaper. We added the Operational Excellence pillar in 2016. The pillar-specific whitepapers and AWS Well-Architected Lenses were released in 2017, and the following year, the AWS Well-Architected Tool was launched.

In 2020, Well-Architected Framework guidance had a new release, along with more lenses, as well as API integration with the AWS Well-Architected Tool. The sixth pillar, Sustainability, was added in 2021. In 2022, dedicated pages were introduced for each consolidated best practice across all six pillars, with several best practices updated with improved prescriptive guidance. By April 2023, more than 50% of the Framework’s best practices had had their prescriptive guidance improved.

A brief history of the AWS Well-Architected Framework

What’s new

As customers mature in their journey, they seek guidance that is prescriptive to their business, environments, and workloads. AWS Well-Architected is committed to providing such information to customers by continually evolving and updating our guidance.

The content updates and improvements in this release focus on having more complete coverage across the AWS service portfolio, helping customers make more informed decisions when developing implementation plans. Services that were added or expanded in coverage include: AWS Elastic Disaster Recovery, AWS Trusted Advisor, AWS Resilience Hub, AWS Config, AWS Security Hub, Amazon GuardDuty, AWS Organizations, AWS Control Tower, AWS Compute Optimizer, AWS Budgets, Amazon CodeWhisperer, Amazon CodeGuru, Amazon EventBridge, Amazon CloudWatch, Amazon Simple Notification Service, AWS Systems Manager, Amazon ElastiCache, and AWS Global Accelerator.

Pillar updates

Operational Excellence

The Operational Excellence Pillar has received updates to two of the five Design Principles and has a new Design Principle on observability, which highlights its importance and relevance throughout the pillar content. All 10 best practices in OPS05 have been updated, and we have consolidated 28 best practices into 16 across four questions (OPS04, OPS06, OPS08, and OPS09), while also improving the prescriptive guidance.

Security

In the Security Pillar, the incident response guidance in SEC10 was updated to align with the AWS Security Incident Response Guide, introducing one new best practice and improving the prescriptive guidance for others. Two best practices in SEC08 and SEC09 have received improved prescriptive guidance on securing workloads at rest and in transit.

Reliability

The Reliability Pillar has received prescriptive guidance improvements to one best practice in REL06, and six best practices in REL11, focused on how to best monitor, failover, remediate, and limit impacts of failures. The update addresses a wide variety of managed services and designs, including multi-Region-based resilience.

Performance Efficiency

The Performance Efficiency Pillar has been completely restructured, consolidating and merging guidance to reduce the number of best practices by 10 and the number of questions by three. We have added best practices around efficient caching and optimizing hardware acceleration. We have also improved the implementation guidance in all 32 best practices of the newly restructured Pillar.

Cost Optimization

The Cost Optimization Pillar has 10 best practices with improved implementation prescriptive guidance.

Sustainability

The Sustainability Pillar has received updates to the risk levels of seven best practices.

Conclusion

This Well-Architected release includes updates and improvements to 90 best practices: Operational Excellence (26), Security (8), Reliability (7), Performance Efficiency (32), Cost Optimization (10), and Sustainability (7). These changes are in addition to the 151 improved best practices released in 2023 (127 on April 10, 2023, and 24 on July 13, 2023), meaning that more than 73% of the existing Framework best practices have been updated at least once in the last year.

As of this release, 100% of Performance Efficiency, Cost Optimization, and Sustainability; 63% of Operational Excellence; 60% of Security; and 50% of Reliability Pillar content have been refreshed at least once since October 2022.

The content is available in 11 languages: English, Spanish, French, German, Italian, Japanese, Korean, Indonesian, Brazilian Portuguese, Simplified Chinese, and Traditional Chinese.

Updates in this release are also available in the AWS Well-Architected Tool, which can be used to review your workloads, address important design considerations, and help ensure that you follow the best practices and guidance of the AWS Well-Architected Framework.

Ready to get started? Review the updated AWS Well-Architected Framework Pillar best practices, as well as pillar-specific whitepapers.

Have questions about some of the new best practices or most recent updates? Join our growing community on AWS re:Post.

Let’s Architect! Resiliency in architectures

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-resiliency-in-architectures/

What is “resiliency”, and why does it matter? When we discussed this topic in an early 2022 edition of Let’s Architect!, we referenced the AWS Well-Architected Framework, which defines resilience as having “the capability to recover when stressed by load, accidental or intentional attacks, and failure of any part in the workload’s components.” Businesses rely heavily on the availability and performance of their digital services. Resilience has emerged as critical for any efficiently architected system, which is why it plays a fundamental role in ensuring the reliability and availability of workloads hosted on the AWS Cloud.

In this newer edition of Let’s Architect!, we share some best practices for putting together resilient architectures, focusing on providing continuous service and avoiding disruptions. Ensuring uninterrupted operations is likely a primary objective when it comes to building a resilient architecture.

Understand resiliency patterns and trade-offs to architect efficiently in the cloud

In this AWS Architecture Blog post, the authors introduce five resilience patterns. Each of these patterns comes with specific strengths and trade-offs, allowing architects to personalize their resilience strategies according to the unique requirements of their applications and business needs. By understanding these patterns and their implications, organizations can design resilient cloud architectures that deliver high availability and efficient recovery from potential disruptions.

Take me to this Architecture Blog post!

Resilience patterns and tradeoffs

Timeouts, retries, and backoff with jitter

Marc Brooker discusses the inevitability of failures and the importance of designing systems to withstand them. He highlights three essential tools for building resilience: timeouts, retries, and backoff. By embracing these techniques, we can create robust systems that maintain high availability in the face of failures. Timeouts, backoff, and jitter are fundamental to spreading traffic from clients and avoiding overload on your systems. Building resilience is a fundamental aspect of ensuring the reliability and performance of AWS services in an ever-changing and dynamic technological landscape.
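
For a concrete sense of how backoff with jitter looks in practice, here is a minimal Python sketch; this is our own illustration rather than code from the Builders’ Library article, and the retry cap and base delay are arbitrary example values.

import random
import time

def call_with_retries(operation, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry a callable with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries, surface the failure
            # Full jitter: sleep a random duration between 0 and the capped backoff
            backoff = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, backoff))

In a real system you would also set a timeout on the underlying call, so a slow dependency cannot hold a worker indefinitely.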

Take me to the Amazon Builders’ Library!

The Amazon Builders’ Library is a collection of technical resources produced by engineers at Amazon

Prepare & Protect Your Applications From Disruption With AWS Resilience Hub

The AWS Resilience Hub not only protects businesses from potential downtime risks but also helps them build a robust foundation for their applications, ensuring uninterrupted service delivery to customers and users.

In this AWS Online Tech Talk, led by the Principal Product Manager of AWS Resilience Hub, the importance of protecting mission-critical applications from downtime risks is emphasized. AWS Resilience Hub is showcased as a centralized platform to define, validate, and track application resilience. The talk covers strategies to avoid disruptions caused by software, infrastructure, or operational issues, and includes a demo showing how to apply these techniques effectively.

If you are interested in delving deeper into the services discussed in the session, AWS Resilience Hub is a valuable resource for monitoring and implementing resilient architectures.

Take me to this AWS Online Tech Talk!

AWS Resilience Hub recommendations

Data resiliency design patterns with AWS

In this re:Invent 2022 session, data resiliency, why it matters to customers, and how you can incorporate it into your application architecture are discussed in depth. The session kicks off with a comprehensive overview of data resiliency, breaking down its core components and illustrating its critical role in modern application development. It then covers application data resiliency and protection designs, extending from the native data resiliency capabilities of AWS storage to DR solutions using AWS Elastic Disaster Recovery.

Take me to this re:Invent 2022 video!

Asynchronous cross-region replication

See you next time!

Thanks for joining our discussion on architecture resiliency! See you in two weeks when we’ll talk about security on AWS.

To find all the blogs from this series, visit the Let’s Architect! list of content on the AWS Architecture Blog.

Introducing the latest Machine Learning Lens for the AWS Well-Architected Framework

Post Syndicated from Raju Patil original https://aws.amazon.com/blogs/architecture/introducing-the-latest-machine-learning-lens-for-the-aws-well-architected-framework/

Today, we are delighted to introduce the latest version of the AWS Well-Architected Machine Learning (ML) Lens whitepaper. The AWS Well-Architected Framework provides architectural best practices for designing and operating ML workloads on AWS. It is based on six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and—a new addition to this revision—Sustainability. The ML Lens uses the Well-Architected Framework to outline the steps for performing an AWS Well-Architected review for your ML implementations.

The ML Lens provides a consistent approach for customers to evaluate ML architectures, implement scalable designs, and identify and mitigate technical risks. It covers common ML implementation scenarios and identifies key workload elements to allow you to architect your cloud-based applications and workloads according to the AWS best practices that we have gathered from supporting thousands of customer implementations.

The new ML Lens joins a collection of Well-Architected lenses that focus on specialized workloads such as the Internet of Things (IoT), games, SAP, financial services, and SaaS technologies. You can find more information in AWS Well-Architected Lenses.

What is the Machine Learning Lens?

Let’s explore the ML Lens across ML lifecycle phases, as the following figure depicts.

Figure 1. Machine Learning Lens

The Well-Architected ML Lens whitepaper focuses on the six pillars of the Well-Architected Framework across six phases of the ML lifecycle. The six phases are:

  1. Defining your business goal
  2. Framing your ML problem
  3. Preparing your data sources
  4. Building your ML model
  5. Entering your deployment phase
  6. Establishing the monitoring of your ML workload

Unlike the traditional waterfall approach, an iterative approach is required to achieve a working prototype based on the six phases of the ML lifecycle. The whitepaper provides you with a set of established cloud-agnostic best practices in the form of Well-Architected Pillars for each ML lifecycle phase. You can also use the Well-Architected ML Lens wherever you are on your cloud journey. You can choose either to apply this guidance during the design of your ML workloads, or after your workloads have entered production as a part of the continuous improvement process.

What’s new in the Machine Learning Lens?

  1. Sustainability Pillar: As building and running ML workloads becomes more complex and consumes more compute power, optimizing compute resources and assessing the carbon footprint of these workloads becomes critically important. The new pillar focuses on long-term environmental sustainability and presents design principles that can help you build ML architectures that maximize efficiency and reduce waste.
  2. Improved best practices and implementation guidance: Notably, enhanced guidance to identify and measure how ML will bring business value against ML operational cost to determine the return on investment (ROI).
  3. Updated guidance on new features and services: Updated ML features and services announced to date have been incorporated into the ML Lens whitepaper. New additions include, but are not limited to, the ML governance features, the model hosting features, and the data preparation features. These and other improvements will make it easier for your development team to create well-architected ML workloads in your enterprise.
  4. Updated links: Many documents, blogs, instructional and video links have been updated to reflect a host of new products, features, and current industry best practices to assist your ML development.

Who should use the Machine Learning Lens?

The Machine Learning Lens is of use to many roles, including:

  • Business leaders for a broader appreciation of the end-to-end implementation and benefits of ML
  • Data scientists to understand how the critical modeling aspects of ML fit in a wider context
  • Data engineers to help you use your enterprise’s data assets to their greatest potential through ML
  • ML engineers to implement ML prototypes into production workloads reliably, securely, and at scale
  • MLOps engineers to build and manage ML operation pipelines for faster time to market
  • Risk and compliance leaders to understand how ML can be implemented responsibly while maintaining compliance with regulatory and governance requirements

Machine Learning Lens components

The Lens includes four focus areas:

1. The Well-Architected Machine Learning Design Principles

A set of best practices that are used as the basis for developing a Well-Architected ML workload.

2. The Machine Learning Lifecycle and the Well-Architected Framework Pillars

This considers all aspects of the Machine Learning Lifecycle and reviews design strategies to align to pillars of the overall Well-Architected Framework.

  • The Machine Learning Lifecycle phases referenced in the ML Lens include:
    • Business goal identification – identification and prioritization of the business problem to be addressed, along with identifying the people, process, and technology changes that may be required to measure and deliver business value.
    • ML problem framing – translating the business problem into an analytical framing, i.e., characterizing the problem as an ML task, such as classification, regression, or clustering, and identifying the technical success metrics for the ML model.
    • Data processing – garnering and integrating datasets, along with necessary data transformations needed to produce a rich set of features.
    • Model development – iteratively training and tuning your model, and evaluating candidate solutions in terms of the success metrics as well as including wider considerations such as bias and explainability.
    • Model deployment – establishing the mechanism to flow data through the model in a production setting to make inferences based on production data.
    • Model monitoring – tracking the performance of the production model and the characteristics of the data used for inference.
  • The Well-Architected Framework Pillars are:
    • Operational Excellence – ability to support ongoing development, run operational workloads effectively, gain insight into your operations, and continuously improve supporting processes and procedures to deliver business value.
    • Security – ability to protect data, systems, and assets, and to take advantage of cloud technologies to improve your security.
    • Reliability – ability of a workload to perform its intended function correctly and consistently, and to automatically recover from failure situations.
    • Performance Efficiency – ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as system demand changes and technologies evolve.
    • Cost Optimization – ability to run systems to deliver business value at the lowest price point.
    • Sustainability – addresses the long-term environmental, economic, and societal impact of your business activities.

3. Cloud-agnostic best practices

These are best practices for each ML lifecycle phase across the Well-Architected Framework pillars irrespective of your technology setting. The best practices are accompanied by:

  • Implementation guidance – the AWS implementation plans for each best practice with references to AWS technologies and resources.
  • Resources – a set of links to AWS documents, blogs, videos, and code examples as supporting resources to the best practices and their implementation plans.

4. Indicative ML Lifecycle architecture diagrams to illustrate processes, technologies, and components that support many of these best practices.

What are the next steps?

The new Well-Architected Machine Learning Lens whitepaper is available now. Use the Lens whitepaper to confirm that your ML workloads are architected with operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability in mind.

If you require support on the implementation or assessment of your Machine Learning workloads, please contact your AWS Solutions Architect or Account Representative.

Special thanks to everyone across the AWS Solution Architecture, AWS Professional Services, and Machine Learning communities, who contributed to the Lens. These contributions encompassed diverse perspectives, expertise, backgrounds, and experiences in developing the new AWS Well-Architected Machine Learning Lens.

Implementing AWS Well-Architected best practices for Amazon SQS – Part 3

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/implementing-aws-well-architected-best-practices-for-amazon-sqs-part-3/

This blog is written by Chetan Makvana, Senior Solutions Architect, and Hardik Vasa, Senior Solutions Architect.

This is the third part of a three-part blog post series that demonstrates best practices for Amazon Simple Queue Service (Amazon SQS) using the AWS Well-Architected Framework.

This blog post covers best practices using the Performance Efficiency Pillar, Cost Optimization Pillar, and Sustainability Pillar. The inventory management example introduced in part 1 of the series will continue to serve as an example.

See also the other two parts of the series:

Performance Efficiency Pillar

The Performance Efficiency Pillar includes the ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve. It recommends best practices such as learning about design patterns and services, using trade-offs to improve performance, and identifying how those trade-offs impact customers and efficiency.

By adopting these best practices, you can optimize the performance of SQS by employing appropriate configurations and techniques while considering trade-offs for the specific use case.

Best practice: Use action batching or horizontal scaling or both to increase throughput

To achieve high throughput in SQS, optimizing the performance of your message processing is crucial. You can use two techniques: horizontal scaling and action batching.

When dealing with high message volume, consider horizontally scaling the message producers and consumers by increasing the number of threads per client, by adding more clients, or both. By distributing the load across multiple threads or clients, you can handle a high number of messages concurrently.

Action batching distributes the latency of the batch action over the multiple messages in a batch request, rather than accepting the entire latency for a single message. Because each round trip carries more work, batch requests make more efficient use of threads and connections, improving throughput. You can combine batching with horizontal scaling to provide throughput with fewer threads, connections, and requests than individual message requests.
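
As a rough sketch of combining the two techniques with boto3 (an illustration, not code from the inventory example; the queue URL and processing logic are placeholders):

from concurrent.futures import ThreadPoolExecutor

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

def process(body):
    print(body)  # placeholder for your processing logic

def poll_once():
    # Action batching: receive up to 10 messages per request, using long polling
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for message in response.get("Messages", []):
        process(message["Body"])
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])

# Horizontal scaling: several consumer threads poll the queue concurrently
with ThreadPoolExecutor(max_workers=4) as executor:
    for _ in range(4):
        executor.submit(poll_once)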

In the inventory management example that we introduced in part 1, this scaling behavior is managed by AWS for the AWS Lambda function responsible for backend processing. When a Lambda function subscribes to an SQS queue, Lambda polls the queue as it waits for inventory update requests to arrive. Lambda consumes messages in batches, starting with five concurrent batches and five functions at a time. If there are more messages in the queue, Lambda adds up to 60 functions per minute, up to 1,000 functions, to consume those messages.

This means that Lambda can scale up to 1,000 concurrent Lambda functions processing messages from the SQS queue. Batching enables the inventory management system to handle a high volume of inventory update messages efficiently. This ensures real-time visibility into inventory levels and enhances the accuracy and responsiveness of inventory management operations.

Best practice: Trade-off between SQS standard and First-In-First-Out (FIFO) queues

SQS supports two types of queues: standard queues and FIFO queues. Understanding the trade-offs between SQS standard and FIFO queues allows you to make an informed choice that aligns with your application’s requirements and priorities. While SQS standard queues support nearly unlimited throughput, they sacrifice strict message ordering and occasionally deliver messages in an order different from the one they were sent in. If maintaining the exact order of events is not critical for your application, using SQS standard queues can provide significant benefits in terms of throughput and scalability.

On the other hand, SQS FIFO queues guarantee message ordering and exactly-once processing. This makes them suitable for applications where maintaining the order of events is crucial, such as financial transactions or event-driven workflows. However, FIFO queues have a lower throughput compared to standard queues. They can handle up to 3,000 transactions per second (TPS) per API method with batching, and 300 TPS without batching. Consider using FIFO queues only when the order of events is important for the application, otherwise use standard queues.

In the inventory management example, since the order of inventory records is not crucial, the potential out-of-order message delivery that can occur with SQS standard queues is unlikely to impact the inventory processing. This allows you to take advantage of the benefits provided by SQS standard queues, including their ability to handle a high number of transactions per second.
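
If strict ordering were required instead, the queue could be declared as a FIFO queue in AWS CDK along these lines; this is an illustrative sketch, and the construct ID and deduplication setting are assumptions rather than part of the example application.

# Illustrative FIFO queue; CDK appends the required .fifo suffix to the generated name
fifo_queue = sqs.Queue(
    self,
    "InventoryUpdatesFifoQueue",
    fifo=True,
    content_based_deduplication=True,  # deduplicate messages with identical bodies
    visibility_timeout=Duration.seconds(300),
)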

Cost Optimization Pillar

The Cost Optimization Pillar includes the ability to run systems to deliver business value at the lowest price. It recommends best practices to build and operate cost-aware workloads that achieve business outcomes while minimizing costs and allowing your organization to maximize its return on investment.

Best practice: Configure cost allocation tags for SQS to organize and identify SQS for cost allocation

A well-defined tagging strategy plays a vital role in establishing accurate chargeback or showback models. By assigning appropriate tags to resources, such as SQS queues, you can precisely allocate costs to different teams or applications. This level of granularity ensures fair and transparent cost allocation, enabling better financial management and accountability.

In the inventory management example, tagging the SQS queue allows for specific cost tracking under the inventory department, enabling a more accurate assessment of expenses. The following code snippet shows how to tag the SQS queue using the AWS Cloud Development Kit (AWS CDK).

# Create the SQS queue
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
)

Tags.of(queue).add("department", "inventory")

Best practice: Use long polling

SQS offers two methods for receiving messages from a queue: short polling and long polling. By default, queues use short polling, where the ReceiveMessage request queries a subset of servers to identify available messages. Even if the query found no messages, SQS sends the response right away.

In contrast, long polling queries all servers in the SQS infrastructure to check for available messages. SQS responds only after collecting at least one message, respecting the specified maximum. If no messages are immediately available, the request is held open until a message becomes available or the polling wait time expires. In such cases, an empty response is sent.

Short polling provides immediate responses, making it suitable for applications that require quick feedback or near-real-time processing. On the other hand, long polling is ideal when efficiency is prioritized over immediate feedback. It reduces API calls, minimizes network traffic, and improves resource utilization, leading to cost savings.

In the inventory management example, long polling enhances the efficiency of processing inventory updates. It collects and retrieves available inventory update messages in a batch of 10, reducing the frequency of API requests. This batching approach optimizes resource utilization, minimizes network traffic, and reduces excessive API consumption, resulting in cost savings. You can configure this behavior using batch size and batch window:

# Add the SQS queue as a trigger to the Lambda function, configuring batch size and batch window
sqs_to_dynamodb_function.add_event_source_mapping(
    "MyQueueTrigger",
    event_source_arn=queue.queue_arn,
    batch_size=10,  # up to 10 messages per invocation
    max_batching_window=Duration.minutes(1),  # illustrative batch window; tune for your workload
)
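
Long polling itself is configured on the queue rather than on the event source mapping. A minimal AWS CDK sketch follows; the 20-second wait time is an example value, not the example application’s setting.

# Setting a receive wait time greater than 0 enables long polling for consumers
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
    receive_message_wait_time=Duration.seconds(20),
)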

Best practice: Use batching

Batching messages together allows you to send or retrieve multiple messages in a single API call. This reduces the number of API requests required to process or retrieve messages compared to sending or retrieving messages individually. Since SQS pricing is based on the number of API requests, reducing the number of requests can lead to cost savings.

To send, receive, and delete messages, and to change the message visibility timeout for multiple messages with a single action, use Amazon SQS batch API actions. This also helps with transferring less data, effectively reducing the associated data transfer costs, especially if you have many messages.

In the context of the inventory management example, the CSV processing Lambda function groups 10 inventory records together in each API call, forming a batch. By doing so, the number of API requests is reduced by a factor of 10 compared to sending each record separately. This approach optimizes the utilization of API resources, streamlines message processing, and ultimately contributes to cost efficiency. Following is the code snippet from the CSV processing Lambda function showcasing the use of SendMessageBatch to send 10 messages with a single action.

# Parse the CSV records and send them to SQS as batch messages
csv_reader = csv.DictReader(csv_content.splitlines())
message_batch = []
for row in csv_reader:
    # Convert the row to JSON
    json_message = json.dumps(row)

    # Add the message to the batch
    message_batch.append(
        {"Id": str(len(message_batch) + 1), "MessageBody": json_message}
    )

    # Send the batch of messages when it reaches the maximum batch size (10 messages)
    if len(message_batch) == 10:
        sqs_client.send_message_batch(QueueUrl=queue_url, Entries=message_batch)
        message_batch = []
        print("Sent messages in batch")

# Send any remaining messages that did not fill a complete batch of 10
if message_batch:
    sqs_client.send_message_batch(QueueUrl=queue_url, Entries=message_batch)
    print("Sent final partial batch")

Best practice: Use temporary queues

For short-lived, lightweight messaging with synchronous two-way communication, you can use temporary queues. Temporary queues make it easy to create and delete many temporary messaging destinations without inflating your AWS bill. The key concept behind them is the virtual queue. Virtual queues let you multiplex many low-traffic queues onto a single SQS queue. Creating a virtual queue only instantiates a local buffer to hold messages for consumers as they arrive; there is no API call to SQS, and there are no costs associated with creating a virtual queue.

The inventory management example does not use temporary queues. However, in use cases that involve short-lived, lightweight messaging with synchronous two-way communication, adopting the best practice of using temporary queues and virtual queues can enhance the overall efficiency, reduce costs, and simplify the management of messaging destinations.

Sustainability Pillar

The Sustainability Pillar provides best practices to meet sustainability targets for your AWS workloads. It encompasses considerations related to energy efficiency and resource optimization.

Best practice: Use long polling

Besides its cost optimization benefits explained as part of the Cost Optimization Pillar, long polling also plays a crucial role in improving resource efficiency by reducing API requests, minimizing network traffic, and optimizing resource utilization.

By collecting and retrieving available messages in batches, long polling reduces the frequency of API requests and unnecessary network traffic, which in turn improves resource utilization.

Fewer API calls also mean less data transfer and fewer infrastructure operations. Long polling’s batching approach allocates system resources more efficiently and improves energy efficiency, enabling the inventory management system to handle high message volumes while operating in a cost-efficient and resource-efficient manner.

Conclusion

This blog post explores best practices for SQS using the Performance Efficiency Pillar, Cost Optimization Pillar, and Sustainability Pillar of the AWS Well-Architected Framework. We cover techniques such as batch processing, message batching, and scaling considerations. We also discuss important considerations, such as resource utilization, minimizing resource waste, and reducing cost.

This three-part blog post series covers a wide range of best practices, spanning the Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability Pillars of the AWS Well-Architected Framework. By following these guidelines and leveraging the power of the AWS Well-Architected Framework, you can build robust, secure, and efficient messaging systems using SQS.

For more serverless learning resources, visit Serverless Land.

Implementing AWS Well-Architected best practices for Amazon SQS – Part 2

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/implementing-aws-well-architected-best-practices-for-amazon-sqs-part-2/

This blog is written by Chetan Makvana, Senior Solutions Architect, and Hardik Vasa, Senior Solutions Architect.

This is the second part of a three-part blog post series that demonstrates implementing best practices for Amazon Simple Queue Service (Amazon SQS) using the AWS Well-Architected Framework.

This blog post covers best practices using the Security Pillar and Reliability Pillar of the AWS Well-Architected Framework. The inventory management example introduced in part 1 of the series will continue to serve as an example.

See also the other two parts of the series:

Security Pillar

The Security Pillar includes the ability to protect data, systems, and assets and to take advantage of cloud technologies to improve your security. This pillar recommends putting in place practices that influence security. Using these best practices, you can protect data while in-transit (as it travels to and from SQS) and at rest (while stored on disk in SQS), or control who can do what with SQS.

Best practice: Configure server-side encryption

If your application has a compliance requirement such as HIPAA, GDPR, or PCI-DSS mandating encryption at rest, if you are looking to improve data security to protect against unauthorized access, or if you are just looking for simplified key management for the messages sent to the SQS queue, you can leverage Server-Side Encryption (SSE) to protect the privacy and integrity of your data stored on SQS.

SQS and AWS Key Management Service (KMS) offer two options for configuring server-side encryption. SQS-managed encryption keys (SSE-SQS) provide automatic encryption of messages stored in SQS queues using AWS-managed keys. This feature is enabled by default when you create a queue. If you choose to use your own AWS KMS keys to encrypt and decrypt messages stored in SQS, you can use the SSE-KMS feature.

Amazon SQS Encryption Settings

SSE-KMS provides greater control and flexibility over encryption keys, while SSE-SQS simplifies the process by managing the encryption keys for you. Both options help you protect sensitive data and comply with regulatory requirements by encrypting data at rest in SQS queues. Note that SSE-SQS only encrypts the message body and not the message attributes.

In the inventory management example introduced in part 1, an AWS Lambda function responsible for CSV processing sends incoming messages to an SQS queue when an inventory updates file is dropped into the Amazon Simple Storage Service (Amazon S3) bucket. SQS encrypts these messages in the queue using SSE-SQS. When the backend processing Lambda function polls messages from the queue, the encrypted message is decrypted, and the function inserts inventory updates into Amazon DynamoDB.

The AWS Cloud Development Kit (AWS CDK) code sets SSE-SQS as the default encryption key type. However, the following AWS CDK code shows how to encrypt the queue with SSE-KMS.

# Create the SQS queue with SSE-KMS encryption
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
    encryption=sqs.QueueEncryption.KMS_MANAGED,
)
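
For comparison, here is a minimal sketch of explicitly selecting SQS-managed keys (SSE-SQS); this is an illustration rather than code from the example application.

# Create the SQS queue with SQS-managed server-side encryption (SSE-SQS)
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
    encryption=sqs.QueueEncryption.SQS_MANAGED,
)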

Best practice: Implement least-privilege access using access policy

For securing your resources in AWS, implementing least-privilege access is critical. This means granting users and services the minimum level of access required to perform their tasks. Least-privilege access provides better security, allows you to meet your compliance requirements, and offers accountability via a clear audit trail of who accessed what resources and when.

By implementing least-privilege access using access policies, you can help reduce the risk of security breaches and ensure that your resources are only accessed by authorized users and services. AWS Identity and Access Management (IAM) policies apply to users, groups, and roles, while resource-based policies apply to AWS resources such as SQS queues. To implement least-privilege access, it’s essential to start by defining what actions are required for each user or service to perform their tasks.

In the inventory management example, the CSV processing Lambda function doesn’t perform any other task beyond parsing the inventory updates file and sending the inventory records to the SQS queue for further processing. To ensure that the function has the permissions to send messages to the SQS queue, grant the SQS queue access to the IAM role that the Lambda function assumes. By granting the SQS queue access to the Lambda function’s IAM role, you establish a secure and controlled communication channel. The Lambda function can only interact with the SQS queue and doesn’t have unnecessary access or permissions that might compromise the system’s security.

# Create pre-processing Lambda function
csv_processing_to_sqs_function = _lambda.Function(
    self,
    "CSVProcessingToSQSFunction",
    runtime=_lambda.Runtime.PYTHON_3_8,
    code=_lambda.Code.from_asset("sqs_blog/lambda"),
    handler="CSVProcessingToSQSFunction.lambda_handler",
    role=role,
    tracing=Tracing.ACTIVE,
)

# Define the queue policy to allow messages from the Lambda function's role only
policy = iam.PolicyStatement(
    actions=["sqs:SendMessage"],
    effect=iam.Effect.ALLOW,
    principals=[iam.ArnPrincipal(role.role_arn)],
    resources=[queue.queue_arn],
)

queue.add_to_resource_policy(policy)

Best practice: Allow only encrypted connections over HTTPS using aws:SecureTransport

It is essential to have a secure and reliable method for transferring data between AWS services and on-premises environments or other external systems. With HTTPS, a network-based attacker cannot eavesdrop on network traffic or manipulate it, using an attack such as man-in-the-middle.

With SQS, you can choose to allow only encrypted connections over HTTPS using the aws:SecureTransport condition key in the queue policy. With this condition in place, any requests made over non-secure HTTP receive a 400 InvalidSecurity error from SQS.

In the inventory management example, the CSV processing Lambda function sends inventory updates to the SQS queue. To ensure secure data transfer, the Lambda function uses the HTTPS endpoint provided by SQS. This guarantees that the communication between the Lambda function and the SQS queue remains encrypted and resistant to potential security threats.

# Create an IAM policy statement that denies access to the queue over insecure (non-HTTPS) transport
secure_transport_policy = iam.PolicyStatement(
    effect=iam.Effect.DENY,
    actions=["sqs:*"],
    principals=[iam.AnyPrincipal()],  # resource policies require a principal; the deny applies to everyone
    resources=[queue.queue_arn],
    conditions={
        "Bool": {
            "aws:SecureTransport": "false",
        },
    },
)

# Attach the statement to the queue's resource policy
queue.add_to_resource_policy(secure_transport_policy)

Best practice: Use attribute-based access controls (ABAC)

Some use-cases require granular access control. For example, authorizing a user based on user roles, environment, department, or location. Additionally, dynamic authorization is required based on changing user attributes. In this case, you need an access control mechanism based on user attributes.

Attribute-based access control (ABAC) is an authorization strategy that defines permissions based on tags attached to users and AWS resources. With ABAC, you can use tags to configure IAM access permissions and policies for your queues, which enables you to scale your permission management easily. You can author a single permission policy in IAM using tags created for each business role, and you no longer need to update the policy when adding new resources.

ABAC for SQS queues enables two key use cases:

  • Tag-based access control: use tags to control access to your SQS queues, including control plane and data plane API calls.
  • Tag-on-create: enforce tags at the time of creation of an SQS queues and deny the creation of SQS resources without tags.
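
As an illustration of tag-based access control (a sketch only; the team tag key and the policy variable usage are assumptions, not part of the inventory example), a single IAM policy statement can grant SendMessage only on queues whose tag matches the calling principal’s tag:

# Illustrative ABAC statement: allow SendMessage only when the queue's "team" tag
# matches the calling principal's "team" tag
abac_statement = iam.PolicyStatement(
    effect=iam.Effect.ALLOW,
    actions=["sqs:SendMessage"],
    resources=["*"],
    conditions={
        "StringEquals": {
            "aws:ResourceTag/team": "${aws:PrincipalTag/team}",
        }
    },
)

Because access is governed by tags rather than by resource ARNs, the same statement keeps working as new queues are created with the appropriate tag.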

Reliability Pillar

The Reliability Pillar encompasses the ability of a workload to perform its intended function correctly and consistently when it’s expected to. By leveraging the best practices outlined in this pillar, you can enhance the way you manage messages in SQS.

Best practice: Configure dead-letter queues

In a distributed system, when messages flow between sub-systems, there is a possibility that some messages may not be processed right away. This could be because of the message being corrupted or downstream processing being temporarily unavailable. In such situations, it is not ideal for the bad message to block other messages in the queue.

Dead Letter Queues (DLQs) in SQS can improve the reliability of your application by providing an additional layer of fault tolerance, simplifying debugging, providing a retry mechanism, and separating problematic messages from the main queue. By incorporating DLQs into your application architecture, you can build a more robust and reliable system that can handle errors and maintain high levels of performance and availability.

In the inventory management example, a DLQ plays a vital role in adding message resiliency and preventing situations where a single bad message blocks the processing of other messages. If the backend Lambda function fails after multiple attempts, the inventory update message is redirected to the DLQ. By inspecting these unconsumed messages, you can troubleshoot and redrive them to the primary queue or to a custom destination using the DLQ redrive feature. You can also automate the redrive programmatically using the redrive APIs. This ensures accurate inventory updates and prevents data loss.

The following AWS CDK code snippet shows how to create a DLQ for the source queue and sets up a DLQ policy to only allow messages from the source SQS queue. It is recommended not to set the max_receive_count value to 1, especially when using a Lambda function as the consumer, to avoid accumulating many messages in the DLQ.

# Create the Dead Letter Queue (DLQ)
dlq = sqs.Queue(self, "InventoryUpdatesDlq", visibility_timeout=Duration.seconds(300))

# Create the SQS queue with DLQ setting
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
    dead_letter_queue=sqs.DeadLetterQueue(
        max_receive_count=3,  # Number of retries before sending the message to the DLQ
        queue=dlq,
    ),
)
# Create an SQS queue policy to allow source queue to send messages to the DLQ
policy = iam.PolicyStatement(
    effect=iam.Effect.ALLOW,
    actions=["sqs:SendMessage"],
    resources=[dlq.queue_arn],
    conditions={"ArnEquals": {"aws:SourceArn": queue.queue_arn}},
)
queue.queue_policy = iam.PolicyDocument(statements=[policy])
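
As noted above, redrive can also be started programmatically. A minimal boto3 sketch using the StartMessageMoveTask API follows; the DLQ URL is a placeholder for illustration.

import boto3

sqs = boto3.client("sqs")

# Placeholder URL for the DLQ created above
dlq_url = "https://sqs.us-east-1.amazonaws.com/123456789012/InventoryUpdatesDlq"
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# Move messages from the DLQ back to their original source queue
sqs.start_message_move_task(SourceArn=dlq_arn)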

Best practice: Process messages in a timely manner by configuring the right visibility timeout

Setting the appropriate visibility timeout is crucial for efficient message processing in SQS. The visibility timeout is the period during which SQS prevents other consumers from receiving and processing a message after it has been polled from the queue.

To determine the ideal visibility timeout for your application, consider your specific use case. If your application typically processes messages within a few seconds, set the visibility timeout to a few minutes. This ensures that multiple consumers don’t process the message simultaneously. If your application requires more time to process messages, consider breaking them down into smaller units or batching them to improve performance.

If a message fails to process and is returned to the queue, it will not be available for processing again until the visibility timeout period has elapsed. Increasing the visibility timeout will increase the overall latency of your application. Therefore, it’s important to balance the tradeoff between reducing the likelihood of message duplication and maintaining a responsive application.

In the inventory management example, setting the right visibility timeout helps the application fail fast and improve the message processing times. Since the Lambda function typically processes messages within milliseconds, a visibility timeout of 30 seconds is set in the following AWS CDK code snippet.

queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(30),
)

It is recommended to keep the SQS queue visibility timeout to at least six times the Lambda function timeout, plus the value of MaximumBatchingWindowInSeconds. This allows Lambda function to retry the messages if the invocation fails.
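
As a quick worked example of that guideline (the values below are assumptions for illustration, not the example application’s configuration):

# Illustrative values only; substitute your own configuration
lambda_timeout_seconds = 30
max_batching_window_seconds = 5

# Recommended minimum visibility timeout: 6 x Lambda timeout + batching window
visibility_timeout_seconds = 6 * lambda_timeout_seconds + max_batching_window_seconds  # 185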

Conclusion

This blog post explores best practices for SQS using the Security Pillar and Reliability Pillar of the AWS Well-Architected Framework. We discuss best practices and considerations for securing SQS so you can build a robust and secure messaging system. We also highlight fault tolerance and timely message processing as important aspects of building reliable applications with SQS.

The next part of this blog post series focuses on the Performance Efficiency, Cost Optimization, and Sustainability Pillars of the AWS Well-Architected Framework and explores best practices for SQS.

For more serverless learning resources, visit Serverless Land.

Implementing AWS Well-Architected best practices for Amazon SQS – Part 1

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/implementing-aws-well-architected-best-practices-for-amazon-sqs-part-1/

This blog is written by Chetan Makvana, Senior Solutions Architect and Hardik Vasa, Senior Solutions Architect.

Amazon Simple Queue Service (Amazon SQS) is a fully managed message queuing service that makes it easy to decouple and scale microservices, distributed systems, and serverless applications. AWS customers have constantly discovered powerful new ways to build more scalable, elastic, and reliable applications using SQS. You can leverage SQS in a variety of use cases requiring loose coupling and high performance at any level of throughput, while reducing cost by only paying for value and remaining confident that no message is lost. When building applications with Amazon SQS, it is important to follow architectural best practices.

To help you identify and implement these best practices, AWS provides the AWS Well-Architected Framework for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems in the AWS Cloud. Built around six pillars (operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability), AWS Well-Architected provides a consistent approach for customers and partners to evaluate architectures and implement scalable designs.

This three-part blog series covers each pillar of the AWS Well-Architected Framework to implement best practices for SQS. This blog post, part 1 of the series, discusses best practices using the Operational Excellence Pillar of the AWS Well-Architected Framework.

See also the other two parts of the series:

Solution overview

Solution architecture for Inventory Updates Process

This solution architecture shows an example of an inventory management system. The system leverages Amazon Simple Storage Service (Amazon S3), AWS Lambda, Amazon SQS, and Amazon DynamoDB to streamline inventory operations and ensure accurate inventory levels. The system handles frequent updates from multiple sources, such as suppliers, warehouses, and retail stores, which are received as CSV files.

These CSV files are then uploaded to an S3 bucket, consolidating and securing the inventory data for the inventory management system’s access. The system uses a Lambda function to read and parse the CSV file, extracting individual inventory update records. This function then transforms each inventory update record into a message and sends it to an SQS queue. Another Lambda function continually polls the SQS queue for new messages. Upon receiving a message, it retrieves the inventory update details and updates the inventory levels in DynamoDB accordingly.

This ensures that the inventory quantities for each product are accurate and reflect the latest changes. This way, the inventory management system provides real-time visibility into inventory levels across different locations and suppliers, enabling the company to monitor product availability with precision. Find the example code for this solution in the GitHub repository.
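
To make that flow concrete, the following is a minimal sketch of the SQS-to-DynamoDB consumer described above; the table name, environment variable, and message body shape are assumptions for illustration rather than the repository's actual code.

import json
import os

import boto3

# Assumed environment variable holding the DynamoDB table name
table = boto3.resource("dynamodb").Table(os.environ.get("INVENTORY_TABLE", "InventoryTable"))


def lambda_handler(event, context):
    # Each SQS record carries one inventory update produced by the CSV-processing function
    for record in event["Records"]:
        update = json.loads(record["body"])  # assumed shape: {"product_id": "...", "quantity": 5}
        table.update_item(
            Key={"product_id": update["product_id"]},
            UpdateExpression="SET quantity = :q",
            ExpressionAttributeValues={":q": update["quantity"]},
        )
    return {"statusCode": 200}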

This example is used throughout this blog series to highlight how SQS best practices can be implemented based on the AWS Well-Architected Framework.

Operational Excellence Pillar

The Operational Excellence Pillar includes the ability to support development and run workloads effectively, gain insight into their operation, and continuously improve supporting processes and procedures to deliver business value. To achieve operational excellence, the pillar recommends best practices such as defining workload metrics and implementing transaction traceability. This enables organizations to gain valuable insights into their operations, identify potential issues, and optimize services accordingly to improve customer experience. Furthermore, understanding the health of an application is critical to ensuring that it is functioning as expected.

Best practice: Use infrastructure as code to deploy SQS

Infrastructure as Code (IaC) helps you model, provision, and manage your cloud resources. One of the primary advantages of IaC is that it simplifies infrastructure management. With IaC, you can quickly and easily replicate your environment across multiple AWS Regions with a single turnkey solution. This makes it easy to manage your infrastructure, regardless of where your resources are located. Additionally, IaC enables you to create, deploy, and maintain infrastructure in a programmatic, descriptive, and declarative way, and to do so repeatably. This reduces errors caused by manual processes, such as creating resources in the AWS Management Console. With IaC, you can easily control and track changes in your infrastructure, which makes it easier to maintain and troubleshoot your systems.

For managing SQS resources, you can use different IaC tools like AWS Serverless Application Model (AWS SAM), AWS CloudFormation, or AWS Cloud Development Kit (AWS CDK). There are also third-party solutions for creating SQS resources, such as the Serverless Framework. AWS CDK is a popular choice because it allows you to provision AWS resources using familiar programming languages such as Python, Java, TypeScript, Go, JavaScript, and C#/.NET.

This blog series showcases the use of AWS CDK with Python to demonstrate best practices for working with SQS. For example, the following AWS CDK code creates a new SQS queue:

from aws_cdk import (
    Duration,
    Stack,
    aws_sqs as sqs,
)
from constructs import Construct


class SqsCdBlogStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # The code that defines your stack goes here

        # example resource
        queue = sqs.Queue(
            self,
            "InventoryUpdatesQueue",
            visibility_timeout=Duration.seconds(300),
        )
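
For context, a minimal CDK app entry point that synthesizes this stack might look like the following; the module path is an assumption based on the class name, not necessarily the repository's layout. Running cdk deploy then provisions the queue.

#!/usr/bin/env python3
# app.py (assumed file name)
import aws_cdk as cdk

from sqs_blog.sqs_cd_blog_stack import SqsCdBlogStack  # assumed module path

app = cdk.App()
SqsCdBlogStack(app, "SqsCdBlogStack")
app.synth()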

Best practice: Configure CloudWatch alarms for ApproximateAgeOfOldestMessage

It is important to understand Amazon CloudWatch metrics and dimensions for SQS, to have a plan in place to assess its behavior, and to add custom metrics where necessary. Once you have a good understanding of the metrics, it is essential to identify the key metrics that are most relevant to your use case and set up appropriate alerts to monitor them.

One of the key metrics that SQS provides is the ApproximateAgeOfOldestMessage metric. By monitoring this metric, you can determine the age of the oldest message in the queue, and take appropriate action to ensure that messages are processed in a timely manner. To set up alerts for the ApproximateAgeOfOldestMessage metric, you can use CloudWatch alarms. You configure these alarms to issue alerts when messages remain in the queue for extended periods of time. You can use these alerts to act, for instance by scaling up consumers to process messages more quickly or investigating potential issues with message processing.

In the inventory management example, leveraging the ApproximateAgeOfOldestMessage metric provides valuable insights into the health and performance of the SQS queue. By monitoring this metric, you can detect processing delays, optimize performance, and ensure that inventory updates are processed within the desired timeframe. This ensures that your inventory levels remain accurate and up-to-date. The following code creates an alarm that is triggered if the oldest inventory update request has been in the queue for more than 30 seconds.

# Import used by this snippet (inside the same AWS CDK stack class)
from aws_cdk import aws_cloudwatch as cloudwatch

# Create a CloudWatch alarm for the ApproximateAgeOfOldestMessage metric
alarm = cloudwatch.Alarm(
    self,
    "OldInventoryUpdatesAlarm",
    alarm_name="OldInventoryUpdatesAlarm",
    metric=queue.metric_approximate_age_of_oldest_message(),
    threshold=30,  # Desired threshold in seconds (30 seconds, matching the target above)
    evaluation_periods=1,
    comparison_operator=cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
)
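
If you want the alarm to notify operators or trigger scaling as described above, you can attach an alarm action. The following sketch publishes to an SNS topic; the topic is an assumption for illustration and is not part of this example.

from aws_cdk import aws_cloudwatch_actions as cloudwatch_actions, aws_sns as sns

# Assumed SNS topic for operational notifications
alerts_topic = sns.Topic(self, "InventoryUpdatesAlertsTopic")

# Notify the topic whenever the alarm enters the ALARM state
alarm.add_alarm_action(cloudwatch_actions.SnsAction(alerts_topic))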

Best practice: Add a tracing header while sending a message to the queue to provide distributed tracing capabilities for faster troubleshooting

By implementing distributed tracing, you can gain a clear understanding of the flow of messages in SQS queues, identify any bottlenecks or potential issues, and proactively react to any signals that indicate an unhealthy state. Tracing provides a wider continuous view of an application and helps to follow a user journey or transaction through the application.

AWS X-Ray is an example of a distributed tracing solution that integrates with Amazon SQS to trace messages that are passed through an SQS queue. When using the X-Ray SDK, SQS can propagate tracing headers to maintain trace continuity and enable tracking, analysis, and debugging throughout downstream services. SQS supports tracing headers through the default X-Amzn-Trace-Id HTTP header and the AWSTraceHeader message system attribute. AWSTraceHeader is available for use even when auto-instrumentation through the X-Ray SDK is not, for example, when building a tracing SDK for a new language. If you are using a Lambda downstream consumer, trace context propagation is automatic.
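
As a rough illustration of propagating the trace header from a producer that is not auto-instrumented, a message can carry the AWSTraceHeader message system attribute; the queue URL and trace header value below are placeholders.

import boto3

sqs = boto3.client("sqs")

sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/111122223333/InventoryUpdatesQueue",  # placeholder
    MessageBody='{"product_id": "item-123", "quantity": 5}',
    MessageSystemAttributes={
        "AWSTraceHeader": {
            "DataType": "String",
            # Placeholder trace header; normally taken from the current X-Ray segment
            "StringValue": "Root=1-5759e988-bd862e3fe1be46a994272793;Sampled=1",
        }
    },
)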

In the inventory management example, by utilizing distributed tracing with X-Ray for SQS, you can gain deep insights into the performance, behavior, and dependencies of the inventory management system. This visibility enables you to optimize performance, troubleshoot issues more effectively, and ensure the smooth and efficient operation of the system. The following code sets up a CSV processing Lambda function and a backend processing Lambda function with active tracing enabled. The Lambda function automatically receives the X-Ray TraceId from SQS.

# Imports used by this snippet (inside the same AWS CDK stack class)
from aws_cdk import aws_lambda as _lambda

# Create the pre-processing Lambda function
# `role` is an IAM role defined elsewhere in the stack (not shown)
csv_processing_to_sqs_function = _lambda.Function(
    self,
    "CSVProcessingToSQSFunction",
    runtime=_lambda.Runtime.PYTHON_3_8,
    code=_lambda.Code.from_asset("sqs_blog/lambda"),
    handler="CSVProcessingToSQSFunction.lambda_handler",
    role=role,
    tracing=_lambda.Tracing.ACTIVE,  # Enable active tracing with X-Ray
)

# Create the post-processing Lambda function with the same role
sqs_to_dynamodb_function = _lambda.Function(
    self,
    "SQSToDynamoDBFunction",
    runtime=_lambda.Runtime.PYTHON_3_8,
    code=_lambda.Code.from_asset("sqs_blog/lambda"),
    handler="SQSToDynamoDBFunction.lambda_handler",
    role=role,
    tracing=_lambda.Tracing.ACTIVE,  # Enable active tracing with X-Ray
)

Conclusion

This blog post explores best practices for SQS with a focus on the Operational Excellence Pillar of the AWS Well-Architected Framework. We cover key considerations for ensuring the smooth operation and optimal performance of applications using SQS, discuss the advantages of infrastructure as code in simplifying infrastructure management, and show how AWS CDK can be used to provision and manage SQS resources.

The next part of this blog post series addresses the Security Pillar and Reliability Pillar of the AWS Well-Architected Framework and explores best practices for SQS.

For more serverless learning resources, visit Serverless Land.

Let’s Architect! Getting started with containers

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-getting-started-with-containers/

Most AWS customers building cloud-native applications or modernizing existing applications choose containers to run their microservices, accelerating innovation and time to market while lowering their total cost of ownership (TCO). Using containers on AWS comes with other benefits, such as increased portability, scalability, and flexibility.

The combination of container technologies and AWS services also provides features such as load balancing, auto scaling, and service discovery, making it easier to deploy and manage applications at scale.

In this edition of Let’s Architect! we share useful resources to help you to get started with containers on AWS.

Container Build Lens

This whitepaper describes the Container Build Lens for the AWS Well-Architected Framework. It helps customers review and improve their cloud-based architectures and better understand the business impact of their design decisions. The document describes general design principles for containers, as well as specific best practices and implementation guidance using the Six Pillars of the Well-Architected Framework.

Take me to explore the Container Build Lens!

Follow the Container Build Lens best practices to architect your container-based workloads.

EKS Workshop

The EKS Workshop is a useful resource to familiarize yourself with Amazon Elastic Kubernetes Service (Amazon EKS) by practicing on real use cases. It is built to help users learn about Amazon EKS features and integrations with popular open-source projects. The workshop is abstracted into high-level learning modules, including Networking, Security, DevOps Automation, and more. These are further broken down into standalone labs focusing on a particular feature, tool, or use case.

Once you’re done experimenting with EKS Workshop, start building your environments with Amazon EKS Blueprints, a collection of Infrastructure as Code (IaC) modules that helps you configure and deploy consistent, batteries-included Amazon EKS clusters across accounts and regions following AWS best practices. Amazon EKS Blueprints are available in both Terraform and CDK.

Take me to this workshop!

The workshop is abstracted into high-level learning modules, including Networking, Security, DevOps Automation, and more.

Architecting for resiliency on AWS App Runner

Learn how to architect a highly available and resilient application using AWS App Runner. With App Runner, you can start with just the source code of your application or a container image. The complexity of running containerized applications is abstracted away, including the cloud resources needed for running your web application or API. App Runner manages load balancers, TLS certificates, auto scaling, logs, metrics, and more, so you can focus on implementing your business logic in a highly scalable and elastic environment.

Take me to this blog post!

A high-level architecture for an available and resilient application with AWS App Runner.

Securing Kubernetes: How to address Kubernetes attack vectors

As part of designing any modern system on AWS, it is necessary to think about the security implications and what can affect your security posture. This session introduces the fundamentals of the Kubernetes architecture and common attack vectors. It also includes security controls provided by Amazon EKS and suggestions on how to address them. With these strategies, you can learn how to reduce risk for your Kubernetes-based workloads.

Take me to this video!

Some common attack vectors that need addressing with Kubernetes

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about serverless.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

AWS Week in Review: New Service for Generative AI and Amazon EC2 Trn1n, Inf2, and CodeWhisperer now GA – April 17, 2023

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-week-in-review-new-service-for-generative-ai-and-amazon-ec2-trn1n-inf2-and-codewhisperer-now-ga-april-17-2023/

I could almost title this blog post the “AWS AI/ML Week in Review.” This past week, we announced several new innovations and tools for building with generative AI on AWS. Let’s dive right into it.

Last Week’s Launches
Here are some launches that got my attention during the previous week:

Announcing Amazon Bedrock and Amazon Titan models – Amazon Bedrock is a new service to accelerate your development of generative AI applications using foundation models through an API without managing infrastructure. You can choose from a wide range of foundation models built by leading AI startups and Amazon. The new Amazon Titan foundation models are pre-trained on large datasets, making them powerful, general-purpose models. You can use them as-is or privately to customize them with your own data for a particular task without annotating large volumes of data. Amazon Bedrock is currently in limited preview. Sign up here to learn more.

Building with Generative AI on AWS

Amazon EC2 Trn1n and Inf2 instances are now generally available – Trn1n instances, powered by AWS Trainium accelerators, double the network bandwidth (compared to Trn1 instances) to 1,600 Gbps of Elastic Fabric Adapter (EFAv2). The increased bandwidth delivers even higher performance for training network-intensive generative AI models such as large language models (LLMs) and mixture of experts (MoE). Inf2 instances, powered by AWS Inferentia2 accelerators, deliver high performance at the lowest cost in Amazon EC2 for generative AI models, including LLMs and vision transformers. They are the first inference-optimized instances in Amazon EC2 to support scale-out distributed inference with ultra-high-speed connectivity between accelerators. Compared to Inf1 instances, Inf2 instances deliver up to 4x higher throughput and up to 10x lower latency. Check out my blog posts on Trn1 instances and Inf2 instances for more details.

Amazon CodeWhisperer, free for individual use, is now generally available – Amazon CodeWhisperer is an AI coding companion that generates real-time single-line or full-function code suggestions in your IDE to help you build applications faster. With GA, we introduce two tiers: CodeWhisperer Individual and CodeWhisperer Professional. CodeWhisperer Individual is free to use for generating code. You can sign up with an AWS Builder ID based on your email address. The Individual Tier provides code recommendations, reference tracking, and security scans. CodeWhisperer Professional—priced at $19 per user, per month—offers additional enterprise administration capabilities. Steve’s blog post has all the details.

Amazon GameLift adds support for Unreal Engine 5 – Amazon GameLift is a fully managed solution that allows you to manage and scale dedicated game servers for session-based multiplayer games. The latest version of the Amazon GameLift Server SDK 5.0 lets you integrate your Unreal 5-based game servers with the Amazon GameLift service. In addition, the latest Amazon GameLift Server SDK with Unreal 5 plugin is built to work with Amazon GameLift Anywhere so that you can test and iterate Unreal game builds faster and manage game sessions across any server hosting infrastructure. Check out the release notes to learn more.

Amazon Rekognition launches Face Liveness to deter fraud in facial verification – Face Liveness verifies that only real users, not bad actors using spoofs, can access your services. Amazon Rekognition Face Liveness analyzes a short selfie video to detect spoofs presented to the camera, such as printed photos, digital photos, digital videos, or 3D masks, as well as spoofs that bypass the camera, such as pre-recorded or deepfake videos. This AWS Machine Learning Blog post walks you through the details and shows how you can add Face Liveness to your web and mobile applications.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some additional news items and blog posts that you may find interesting:

Updates to the AWS Well-Architected Framework – The most recent content updates and improvements focus on providing expanded guidance across the AWS service portfolio to help you make more informed decisions when developing implementation plans. Services that were added or expanded in coverage include AWS Elastic Disaster Recovery, AWS Trusted Advisor, AWS Resilience Hub, AWS Config, AWS Security Hub, Amazon GuardDuty, AWS Organizations, AWS Control Tower, AWS Compute Optimizer, AWS Budgets, Amazon CodeWhisperer, and Amazon CodeGuru. This AWS Architecture Blog post has all the details.

Amazon releases largest dataset for training “pick and place” robots – In an effort to improve the performance of robots that pick, sort, and pack products in warehouses, Amazon has publicly released the largest dataset of images captured in an industrial product-sorting setting. Where the largest previous dataset of industrial images featured on the order of 100 objects, the Amazon dataset, called ARMBench, features more than 190,000 objects. Check out this Amazon Science Blog post to learn more.

AWS open-source news and updates – My colleague Ricardo writes this weekly open-source newsletter in which he highlights new open-source projects, tools, and demos from the AWS Community. Read edition #153 here.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

#BuildOn Generative AI – Join our weekly live Build On Generative AI Twitch show. Every Monday morning at 9:00 US PT, my colleagues Emily and Darko take a look at aspects of generative AI. They host developers, scientists, startup founders, and AI leaders and discuss how to build generative AI applications on AWS.

In today’s episode, Emily walks us through the latest AWS generative AI announcements. You can watch the video here.

.NET Enterprise Developer Day EMEA 2023 (April 25) is a free, one-day virtual event providing enterprise developers with the most relevant information to swiftly and efficiently migrate and modernize their .NET applications and workloads on AWS.

AWS Developer Innovation Day (April 26) is a new, free, one-day virtual event designed to help developers and teams be productive and collaborate from discovery to delivery, to running software and building applications. Get a first look at exciting product updates, technical deep dives, and keynotes.

AWS Global Summits – Check your calendars and sign up for the AWS Summit close to where you live or work: Tokyo (April 20–21), Singapore (May 4), Stockholm (May 11), Hong Kong (May 23), Tel Aviv (May 31), Amsterdam (June 1), London (June 7), Washington, DC (June 7–8), Toronto (June 14), Madrid (June 15), and Milano (June 22).

You can browse all upcoming AWS-led in-person and virtual events and developer-focused events such as Community Days.

That’s all for this week. Check back next Monday for another Week in Review!

— Antje

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Let’s Architect! Monitoring production systems at scale

Post Syndicated from Vittorio Denti original https://aws.amazon.com/blogs/architecture/lets-architect-monitoring-production-systems-at-scale/

“Everything fails, all the time” is a famous quote from Amazon’s Chief Technology Officer Werner Vogels. This means that software and distributed systems may eventually fail because something can always go wrong. We have to accept this and design our systems accordingly, test our software and services, and think about all the possible edge cases.

With this in mind, we should also set our teams up for success by providing visibility in every environment for a quick turnaround when incidents happen. When a system serves traffic in production, we need to monitor it to make sure it behaves as expected and that all components are healthy. But questions arise such as:

  • How do we monitor a system?
  • What is monitoring?
  • What are some architectural and engineering approaches to implement in order to design a successful monitoring strategy?

All of these questions require complex answers. It’s not possible to cover everything in a blog post, but let’s start exploring the topic and sharing resources to guide you through this domain.

In this edition of Let’s Architect! we share some practices for monitoring used at Amazon and AWS, as well as more resources to discover how to build monitoring solutions for the workloads running on AWS.

Observability best practices at Amazon

Observability and monitoring are engineering tasks that also require putting a suitable cultural mindset in place. At Amazon, if a service doesn’t run as expected, the team writes a CoE (Correction of Errors) document to analyze the issue and answer critical questions to learn from it. There are also weekly operations meetings to analyze operational and performance dashboards for each service.

The session introduced here covers the full range of monitoring at Amazon, from how teams assess system health at a high level to how they understand the details of a single request. Use this resource to learn some best practices for metrics, logs, and tracing, and for using these signals to achieve operational excellence.

Take me to this re:Invent video!

Observability is an iterative process which requires us to establish a feedback loop and improve based on the signals coming from the system.

Build an observability solution using managed AWS services and the OpenTelemetry standard

Visibility of what’s happening in a distributed system is key to operating workloads at scale. OpenTelemetry is the standard for observability, and AWS services are fully integrated with it. The blog post introduced in this section shows you how AWS Distro for OpenTelemetry (ADOT) works under the hood and how to use it with a Kubernetes cluster. But keep in mind, this is just one of the many implementations available for AWS compute services and OpenTelemetry—so even if you’re not using Kubernetes right now, we’ve still got you covered!

Want more? Watch this re:Invent video for an understanding of how to think about logging, tracing, metrics, and monitoring with AWS services, and the possibilities to provide the observability your distributed systems need. This is a great learning resource with many demos and examples.

Take me to this blog post!

Flow of metrics and traces from Application services to the Observability Platform.

Optimizing your AWS Batch architecture for scale with observability dashboards

We’ve explored the mental models and strategies for monitoring in previous resources. Now let’s see how these principles can be applied in a scenario where we run batch and ML computing jobs at scale. In the blog post introduced in this section, you can learn how to use runtime metrics to understand an architecture designed on AWS Batch for running batch computing jobs. AWS Batch is a fully managed service enabling you to run jobs at any scale without needing to manage underlying compute resources. This blog explains how AWS Batch works and guides you through the process used to design a monitoring framework.

Since the solution is open-source, you are free to add other custom metrics you find useful. To get started with the AWS Batch open-source observability solution, visit the project page on GitHub. Several customers have used this monitoring tool to optimize their workload for scale by reshaping their jobs, refining their instance selection, and tuning their AWS Batch architecture.

Take me to this blog!

High-level structure of AWS Batch resources and interactions. This diagram depicts a user submitting jobs based on a job definition template to a job queue, which then communicates to a compute environment that resources are needed.

Observability workshop

This resource provides a hands-on experience for you on the variety of toolsets AWS offers to set up monitoring and observability on your applications. Whether your workload is on-premises or on AWS—or your application is a giant monolith or based on modern microservices-based architecture—the observability tools can provide deeper insights into application performance and health.

The monitoring tools covered in this workshop provide powerful capabilities that enable you to identify bottlenecks, issues, and defects without having to manually sift through various logs, metrics, and trace data.

Take me to this workshop!

The diagram illustrates the various components of the PetAdoptions architecture. In the workshop you will learn how to monitor this application.

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about containers on AWS.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Announcing updates to the AWS Well-Architected Framework

Post Syndicated from Haleh Najafzadeh original https://aws.amazon.com/blogs/architecture/announcing-updates-to-the-aws-well-architected-framework-2/

We are excited to announce the availability of improved AWS Well-Architected Framework guidance. In this update, we have made changes across all six pillars of the framework: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability.

A brief history

The AWS Well-Architected Framework is a collection of best practices that allow customers to evaluate and improve the design, implementation, and operations of their workloads in the cloud.

In 2012, the first version of the framework was published, leading to the 2015 release of the guidance whitepaper. We added the operational excellence pillar in 2016. The pillar-specific whitepapers and AWS Well-Architected Lenses were released in 2017, and, the following year, the AWS Well-Architected Tool was launched.

In 2020, the Well-Architected Framework content received a major update, along with the addition of more lenses and API integration with the AWS Well-Architected Tool. The sixth pillar, Sustainability, was added in 2021. In 2022, dedicated pages were introduced for each consolidated best practice across all six pillars, with several best practices updated with improved prescriptive guidance.

AWS Well-Architected timeline

What’s new

Well-Architected Framework content is consistently updated and improved to adapt to the constantly evolving AWS environment and its new and emerging services and technologies. This ensures cloud architects can build and operate secure, high-performing, resilient, efficient, and sustainable systems in the AWS Cloud.

The content updates and improvements in this release focus on providing more complete coverage across the AWS service portfolio to help customers make more informed decisions when developing implementation plans. Services that were added or expanded in coverage include: AWS Elastic Disaster Recovery, AWS Trusted Advisor, AWS Resilience Hub, AWS Config, AWS Security Hub, Amazon GuardDuty, AWS Organizations, AWS Control Tower, AWS Compute Optimizer, AWS Budgets, Amazon CodeWhisperer, and Amazon CodeGuru.

Pillar updates

The Operational Excellence Pillar has a new best practice on enabling support plans for production workloads. This Pillar also has a major update on defining a customer communication plan for outages.

In the Security Pillar, we added a new best practice area, Application Security (AppSec). AppSec is complete with eight new best practices to guide customers as they develop, test, and release software, providing guidance on how to consider the tools, testing, and organizational approach used to develop software.

The Reliability Pillar has a new best practice on architecting workloads to meet availability targets and uptime service-level agreements (SLAs). We also added the resilience shared responsibility model to its introduction section.

The Cost Optimization Pillar has new best practices on automating operations as a part of cost-optimization efforts and enforcing data-retention policies.

In the Sustainability Pillar, we introduced a clear process for selecting Regions, as well as tools for right-sizing services and improving the overall utilization of resources in the AWS Cloud.

Best practice updates

The implementation guidance and best practices have been updated in this release to be more prescriptive, including enhanced recommendations and steps on reusable architecture patterns targeting specific business outcomes in the AWS Cloud.

As many as 113 best practices are updated with more prescriptive guidance in Operational Excellence (22), Security (18), Reliability (14), Performance Efficiency (10), Cost Optimization (22), and Sustainability (27). Fourteen new best practices have been introduced in Operational Excellence (1), Security (9), Reliability (1), Cost Optimization (2), and Sustainability (1).

From a total of 127 new/updated best practices, 78% include explicit implementation steps as part of making them more prescriptive. The remaining 22% have been updated by improving their existing implementation steps. These changes are in addition to the 51 improved best practices released in 2022 (18 in Q3 2022, and 33 in Q4 2022), resulting in more than 50% of the existing Framework best practices having been updated recently.

The content is available in 11 languages: English, Spanish, French, German, Italian, Japanese, Korean, Indonesian, Brazilian Portuguese, Simplified Chinese, and Traditional Chinese.

Here is the list of best practices that are new or updated in this release:

  • Operational Excellence: OPS01-BP03, OPS01-BP04, OPS02-BP01, OPS02-BP06, OPS02-BP07, OPS03-BP04, OPS03-BP05, OPS04-BP01, OPS04-BP03, OPS04-BP04, OPS04-BP05, OPS05-BP02, OPS05-BP06, OPS05-BP07, OPS07-BP01, OPS07-BP05, OPS07-BP06, OPS08-BP02, OPS08-BP03, OPS08-BP04, OPS10-BP05, OPS11-BP01, OPS11-BP04
  • Security: SEC01-BP01, SEC01-BP02, SEC01-BP07, SEC02-BP01, SEC02-BP02, SEC02-BP03, SEC02-BP05, SEC03-BP02, SEC03-BP04, SEC03-BP07, SEC03-BP09, SEC04-BP01, SEC05-BP01, SEC06-BP01, SEC07-BP01, SEC08-BP04, SEC08-BP02, SEC09-BP02, SEC03-BP08, SEC11-BP01, SEC11-BP02, SEC11-BP03, SEC11-BP04, SEC11-BP05, SEC11-BP06, SEC11-BP07, SEC11-BP08
  • Reliability: REL01-BP01, REL01-BP02, REL01-BP03, REL01-BP04, REL01-BP06, REL02-BP01, REL09-BP01, REL09-BP02, REL09-BP03, REL09-BP04, REL10_BP04, REL10-BP03, REL11-BP07, REL13-BP02, REL13-BP03
  • Performance Efficiency: PERF02-BP06, PERF05_BP03, PERF05-BP02, PERF05-BP04, PERF05-BP05, PERF05-BP06, PERF05-BP07, PFRF04-BP04, PERF02_BP04, PERF02_BP05
  • Cost Optimization: COST02_BP01, COST02_BP02, COST02_BP03, COST02_BP05, COST03_BP02, COST03_BP04, COST03_BP05, COST04_BP01, COST04_BP02, COST04_BP03, COST04_BP04, COST04_BP05, COST05_BP03, COST05_BP05, COST05_BP06, COST06_BP01, COST06_BP03, COST07_BP01, COST07_BP02, COST07_BP05, COST09_BP03, COST10_BP01, COST10_BP02, COST11_BP01
  • Sustainability: SUS01_BP01, SUS02_BP01, SUS02_BP02, SUS02_BP03, SUS02_BP04, SUS02_BP05, SUS02_BP06, SUS03_BP01, SUS03_BP02, SUS03_BP03, SUS03_BP04, SUS03_BP05, SUS04_BP01, SUS04_BP02, SUS04_BP03, SUS04_BP04, SUS04_BP05, SUS04_BP06, SUS04_BP07, SUS04_BP08, SUS05_BP01, SUS05_BP02, SUS05_BP03, SUS05_BP04, SUS06_BP01, SUS06_BP02, SUS06_BP03, SUS06_BP04

Updates in this release are also available in the AWS Well-Architected Tool, which can be used to review your workloads, address important design considerations, and help ensure that you follow the best practices and guidance of the AWS Well-Architected Framework.

Ready to get started? Review the updated AWS Well-Architected Framework Pillar best practices, as well as pillar-specific whitepapers.

Have questions about some of the new best practices or most recent updates? Join our growing community on AWS re:Post.

Let’s Architect! Streamlining business with migration and modernization

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-streamlining-business-with-migration-and-modernization/

Many customers migrate their systems to Amazon Web Services (AWS) to increase their competitive edge and drive business value. To maximize the benefits of a cloud migration, companies tend to move their applications in conjunction with modernization initiatives. These joint efforts help your applications gain more agility, scalability, and resilience. Modernizing the portfolio of workloads with AWS means that you can re-platform, refactor, or replace these workloads by using containers, serverless technologies, purpose-built data stores, and software automation. These capabilities allow you to benefit from AWS agility and total cost of ownership (TCO) advantages.

In this edition of Let’s Architect! we share hands-on activities, customer stories, and tips and tricks to migrate and modernize your applications with AWS.

Migrating to the cloud: What is the cost of doing nothing?

Would you think that small companies always migrate faster than large enterprises? Actually, cloud migration speed doesn’t necessarily depend on the size of the business! Company size is not a clear indicator of migration and modernization success, but a shift of culture and mindset is essential for successful company evolution.

When it comes to migration, the cost of doing nothing is not just financial: Businesses can also expect a slower pace of innovation and a higher security burden. This video analyzes the financial benefits of migration and shares mental models for approaching an AWS cloud migration, and Marriott team members explain how they planned their migration and the lessons learned along the way.

Take me to this re:Invent 2022 video!

Benefits of an early migration start

Modernization pathways for a legacy .NET Framework monolithic application on AWS

Organizations aim to deliver the best technological solutions based on customer needs. At any stage in their cloud adoption journey, businesses often end up managing and building monolithic applications. Let’s explore a migration path for a monolithic .NET Framework application to a modern microservices-based stack on AWS, and discuss AWS tools to break the monolith into microservices and containerize applications.

Cost optimization is another key factor when modernizing your workloads; solutions include moving to Linux-based systems or using open-source database engines. This Migrate and Modernize enterprise workloads with AWS video walks you through the process of migrating and modernizing enterprise workloads with AWS.

Take me to this blog post with more detail!

A modernized microservices-based rearchitecture

Implementing a serverless-first strategy in an enterprise

Organizations of all sizes want to benefit from the agility, cost savings, and developer experience that serverless architectures can provide on AWS. For large enterprises, the return on investment (ROI) can be massive, but overcoming architecture inertia while ensuring security best practices and governance stay in place is a hurdle that many struggle with. In this lightning talk, learn how your organization can implement a serverless-first strategy to overcome these obstacles. Delta Air Lines shares the story of making serverless-first a reality as part of their AWS journey.

Take me to this video

Benefits of serverless

Application Migration with AWS

This workshop shows you how to migrate and modernize a fictional application to the AWS Cloud by:

  1. Performing a database migration
  2. Migrating and modernizing your web server using different migration strategies (for example, breaking down the monolith into containers)
  3. Teaching you how to improve the Operational Excellence, Security, Performance Efficiency, and Cost Optimization of the deployed architecture by following these pillars of the AWS Well-Architected Framework.

Take me to this workshop!

Different migration strategies for web servers

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about distributed systems with containers.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Let’s Architect! Architecting a data mesh

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-architecting-a-data-mesh/

In the past, data architectures were mainly designed around technologies rather than business domains. This changed in 2019, when Zhamak Dehghani introduced the data mesh. Data mesh is an application of Domain-Driven Design (DDD) principles to data architectures: data is organized into data domains, and the data is the product that the team owns and offers for consumption.

A data mesh architecture unites the disparate data sources within an organization through centrally managed data-sharing and governance guidelines. Business functions can maintain control over how shared data is accessed because data mesh also solves advanced data security challenges through distributed, decentralized ownership.

This edition of Let’s Architect! introduces data mesh, highlights the foundational concepts of data architectures, and covers the patterns for designing a data mesh in the AWS cloud with supporting resources.

Data lakes, lake houses and data mesh: what, why, and how?

Let’s explore a video introduction to data lakes, lake houses, and data mesh. This resource explains how to leverage those concepts to gain greater data insights across different business segments, with a special focus on best practices to build a well-architected, modern data architecture on AWS. It also gives an overview of the AWS cloud services that can be used to create such architectures and describes the fundamental pillars of designing them.

Take me to this intro to data lakes, lake houses, and data mesh video!

Data mesh is an architecture pattern where data are organized into domains and seen as products to expose for consumption

Building data mesh architectures on AWS

Knowing what a data mesh architecture is, here is a step-by-step video from re:Invent 2022 on designing one. It covers a use case on how GoDaddy considered and implemented data mesh, in addition to:

  • The fundamental pillars behind a well-architected data mesh in the cloud
  • Finding an approach to build a data mesh architecture using native AWS services
  • Reasons for considering a data mesh architecture where data lakes provide limitations in some scenarios
  • How data mesh can be applied in practice to overcome them
  • The mental models to apply during the data mesh design process

Take me to this re:Invent 2022 video!

In the data mesh architecture the producers expose their data for consumption to the consumers. Access is regulated through a centralized governance layer.

Amazon DataZone: Democratize data with governance

Now let’s explore data accessibility as it relates to data mesh architectures.

Amazon DataZone is a new AWS business data catalog allowing you to unlock data across organizational boundaries with built-in governance. This service provides a unified environment where everyone in an organization—from data producers to data consumers—can access, share, and consume data in a governed manner.

Here is a video to learn how to apply AWS analytics services to discover, access, and share data across organizational boundaries within the context of a data mesh architecture.

Take me to this re:Invent 2022 video!

Amazon DataZone accelerates the adoption of the data mesh pattern by making it scalable to a high number of producers and consumers.

Build a data mesh on AWS

Feeling inspired to build? Hands-on experience is a great way to learn and see how the theoretical concepts apply in practice.

This workshop teaches you a data mesh architecture building approach on AWS. Many organizations are interested in implementing this architecture to:

  1. Move away from centralized data lakes to decentralized ownership
  2. Deliver analytics solutions across business units

Learn how a data mesh architecture can be implemented with AWS native services.

Take me to this workshop!

The diagram shows how to separate the producers, consumers, and governance components through a multi-account strategy.

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about monitoring and observability.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Introducing AWS Lambda Powertools for .NET

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-aws-lambda-powertools-for-net/

This blog post is written by Amir Khairalomoum, Senior Solutions Architect.

Modern applications are built with modular architectural patterns, serverless operational models, and agile developer processes. They allow you to innovate faster, reduce risk, accelerate time to market, and decrease your total cost of ownership (TCO). A microservices architecture comprises many distributed parts that can introduce complexity to application observability. Modern observability must respond to this complexity, the increased frequency of software deployments, and the short-lived nature of AWS Lambda execution environments.

The Serverless Applications Lens for the AWS Well-Architected Framework focuses on how to design, deploy, and architect your serverless application workloads in the AWS Cloud. AWS Lambda Powertools for .NET translates some of the best practices defined in the serverless lens into a suite of utilities. You can use these in your application to apply structured logging, distributed tracing, and monitoring of metrics.

Following the community’s continued adoption of AWS Lambda Powertools for Python, Java, and TypeScript, AWS Lambda Powertools for .NET is now generally available.

This post shows how to use the new open source Powertools library to implement observability best practices with minimal coding. It walks through getting started, with the provided examples available in the Powertools GitHub repository.

About Powertools

Powertools for .NET is a suite of utilities that helps with implementing observability best practices without needing to write additional custom code. It currently supports Lambda functions written in C#, with support for runtime versions .NET 6 and newer. Powertools provides three core utilities:

  • Tracing provides a simpler way to send traces from functions to AWS X-Ray. It provides visibility into function calls, interactions with other AWS services, or external HTTP requests. You can add attributes to traces to allow filtering based on key information. For example, when using the Tracing attribute, it creates a ColdStart annotation. You can easily group and analyze traces to understand the initialization process.
  • Logging provides a custom logger that outputs structured JSON. It allows you to pass in strings or more complex objects, and takes care of serializing the log output. The logger handles common use cases, such as logging the Lambda event payload, and capturing cold start information. This includes appending custom keys to the logger.
  • Metrics simplifies collecting custom metrics from your application, without the need to make synchronous requests to external systems. This functionality allows capturing metrics asynchronously using Amazon CloudWatch Embedded Metric Format (EMF) which reduces latency and cost. This provides convenient functionality for common cases, such as validating metrics against CloudWatch EMF specification and tracking cold starts.

Getting started

The following steps explain how to use Powertools to implement structured logging, add custom metrics, and enable tracing with AWS X-Ray. The example application consists of an Amazon API Gateway endpoint, a Lambda function, and an Amazon DynamoDB table. It uses the AWS Serverless Application Model (AWS SAM) to manage the deployment.

When you send a GET request to the API Gateway endpoint, the Lambda function is invoked. This function calls a location API to find the IP address, stores it in the DynamoDB table, and returns it with a greeting message to the client.

Example application

The AWS Lambda Powertools for .NET utilities are available as NuGet packages. Each core utility has a separate NuGet package. It allows you to add only the packages you need. This helps to make the Lambda package size smaller, which can improve the performance.

To implement each of these core utilities in a separate example, use the Globals sections of the AWS SAM template to configure Powertools environment variables and enable active tracing for all Lambda functions and Amazon API Gateway stages.

Sometimes resources that you declare in an AWS SAM template have common configurations. Instead of duplicating this information in every resource, you can declare them once in the Globals section and let your resources inherit them.

Logging

The following steps explain how to implement structured logging in an application. The logging example shows you how to use the logging feature.

To add the Powertools logging library to your project, install the package from the NuGet Gallery, from the Visual Studio editor, or by using the following .NET CLI command:

dotnet add package AWS.Lambda.Powertools.Logging

Use environment variables in the Globals sections of the AWS SAM template to configure the logging library:

  Globals:
    Function:
      Environment:
        Variables:
          POWERTOOLS_SERVICE_NAME: powertools-dotnet-logging-sample
          POWERTOOLS_LOG_LEVEL: Debug
          POWERTOOLS_LOGGER_CASE: SnakeCase

Decorate the Lambda function handler method with the Logging attribute in the code. This enables the utility and allows you to use the Logger functionality to output structured logs by passing messages as a string. For example:

[Logging]
public async Task<APIGatewayProxyResponse> FunctionHandler
         (APIGatewayProxyRequest apigProxyEvent, ILambdaContext context)
{
  ...
  Logger.LogInformation("Getting ip address from external service");
  var location = await GetCallingIp();
  ...
}

Lambda sends the output to Amazon CloudWatch Logs as a JSON-formatted line.

{
  "cold_start": true,
  "xray_trace_id": "1-621b9125-0a3b544c0244dae940ab3405",
  "function_name": "powertools-dotnet-tracing-sampl-HelloWorldFunction-v0F2GJwy5r1V",
  "function_version": "$LATEST",
  "function_memory_size": 256,
  "function_arn": "arn:aws:lambda:eu-west-2:286043031651:function:powertools-dotnet-tracing-sample-HelloWorldFunction-v0F2GJwy5r1V",
  "function_request_id": "3ad9140b-b156-406e-b314-5ac414fecde1",
  "timestamp": "2022-02-27T14:56:39.2737371Z",
  "level": "Information",
  "service": "powertools-dotnet-sample",
  "name": "AWS.Lambda.Powertools.Logging.Logger",
  "message": "Getting ip address from external service"
}

Another common use case, especially when developing new Lambda functions, is to print a log of the event received by the handler. You can achieve this by enabling LogEvent on the Logging attribute. This is disabled by default to prevent potentially leaking sensitive event data into logs.

[Logging(LogEvent = true)]
public async Task<APIGatewayProxyResponse> FunctionHandler
         (APIGatewayProxyRequest apigProxyEvent, ILambdaContext context)
{
  ...
}

With logs available as structured JSON, you can perform searches on this structured data using CloudWatch Logs Insights. To search for all logs that were output during a Lambda cold start, and display the key fields in the output, run the following query:

# Filter on the snake_case cold_start field emitted by the logger configuration above
filter cold_start = 1
| fields @timestamp, function_name, function_version, xray_trace_id
| sort @timestamp desc
| limit 20

CloudWatch Logs Insights query for cold starts

Tracing

Using the Tracing attribute, you can instruct the library to send traces and metadata from the Lambda function invocation to AWS X-Ray using the AWS X-Ray SDK for .NET. The tracing example shows you how to use the tracing feature.

When your application makes calls to AWS services, the SDK tracks downstream calls in subsegments. AWS services that support tracing, and resources that you access within those services, appear as downstream nodes on the service map in the X-Ray console.

You can instrument all of your AWS SDK for .NET clients by calling RegisterXRayForAllServices before you create them.

public class Function
{
  private static IDynamoDBContext _dynamoDbContext;
  public Function()
  {
    AWSSDKHandler.RegisterXRayForAllServices();
    ...
  }
  ...
}

To add the Powertools tracing library to your project, install the package from the NuGet Gallery, from the Visual Studio editor, or by using the following .NET CLI command:

dotnet add package AWS.Lambda.Powertools.Tracing

Use environment variables in the Globals sections of the AWS SAM template to configure the tracing library.

  Globals:
    Function:
      Tracing: Active
      Environment:
        Variables:
          POWERTOOLS_SERVICE_NAME: powertools-dotnet-tracing-sample
          POWERTOOLS_TRACER_CAPTURE_RESPONSE: true
          POWERTOOLS_TRACER_CAPTURE_ERROR: true

Decorate the Lambda function handler method with the Tracing attribute to enable the utility. To provide more granular details for your traces, you can use the same attribute to capture the invocation of other functions outside of the handler. For example:

[Tracing]
public async Task<APIGatewayProxyResponse> FunctionHandler
         (APIGatewayProxyRequest apigProxyEvent, ILambdaContext context)
{
  ...
  var location = await GetCallingIp().ConfigureAwait(false);
  ...
}

[Tracing(SegmentName = "Location service")]
private static async Task<string?> GetCallingIp()
{
  ...
}

Once traffic is flowing, you see a generated service map in the AWS X-Ray console. Decorating the Lambda function handler method, or any other method in the chain with the Tracing attribute, provides an overview of all the traffic flowing through the application.

AWS X-Ray trace service view

You can also view the individual traces that are generated, along with a waterfall view of the segments and subsegments that comprise your trace. This data can help you pinpoint the root cause of slow operations or errors within your application.

AWS X-Ray waterfall trace view

You can also filter traces by annotation and create custom service maps with AWS X-Ray Trace groups. In this example, use the filter expression annotation.ColdStart = true to filter traces based on the ColdStart annotation. The Tracing attribute adds these automatically when used within the handler method.

View trace attributes

Metrics

CloudWatch offers a number of included metrics to help answer general questions about the application’s throughput, error rate, and resource utilization. However, to understand the behavior of the application better, you should also add custom metrics relevant to your workload.

The metrics utility creates custom metrics asynchronously by logging metrics to standard output using the Amazon CloudWatch Embedded Metric Format (EMF).

In the sample application, you want to understand how often your service is calling the location API to identify the IP addresses. The metrics example shows you how to use the metrics feature.

To add the Powertools metrics library to your project, install the packages from the NuGet gallery, from the Visual Studio editor, or by using the following .NET CLI command:

dotnet add package AWS.Lambda.Powertools.Metrics

Use environment variables in the Globals section of the AWS SAM template to configure the metrics library:

  Globals:
    Function:
      Environment:
        Variables:
          POWERTOOLS_SERVICE_NAME: powertools-dotnet-metrics-sample
          POWERTOOLS_METRICS_NAMESPACE: AWSLambdaPowertools

To create custom metrics, decorate the Lambda function with the Metrics attribute. This ensures that all metrics are properly serialized and flushed to logs when the function finishes its invocation.

You can then emit custom metrics by calling AddMetric, or push a single metric with a custom namespace, service, and dimensions by calling PushSingleMetric. You can also set CaptureColdStart on the attribute to automatically create a cold start metric.

[Metrics(CaptureColdStart = true)]
public async Task<APIGatewayProxyResponse> FunctionHandler
         (APIGatewayProxyRequest apigProxyEvent, ILambdaContext context)
{
  ...
  // Push a single metric to count each call to the location API
  Metrics.PushSingleMetric(
        metricName: "CallingIP",
        value: 1,
        unit: MetricUnit.Count,
        service: "lambda-powertools-metrics-example",
        defaultDimensions: new Dictionary<string, string>
        {
            { "Metric Type", "Single" }
        });
  ...
}

Conclusion

CloudWatch and AWS X-Ray offer functionality that provides comprehensive observability for your applications. AWS Lambda Powertools for .NET is now available in preview. The library helps implement observability when running Lambda functions based on .NET 6 while reducing the amount of custom code.

It simplifies implementing the observability best practices defined in the Serverless Applications Lens for the AWS Well-Architected Framework for a serverless application and allows you to focus more time on the business logic.

You can find the full documentation and the source code for Powertools on GitHub. We welcome contributions via pull request, and encourage you to create an issue if you have any feedback for the project. Happy building with AWS Lambda Powertools for .NET.

For more serverless learning resources, visit Serverless Land.

Let’s Architect! Architecture tools

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-architecture-tools/

Tools such as diagramming software, low-code applications, and frameworks make it possible to experiment quickly, and they are essential in today’s fast-paced, technology-driven world. From improving efficiency and accuracy to enhancing collaboration and creativity, a well-defined set of tools can have a significant impact on the quality and success of a software architecture project.

As an architect, you can take advantage of a wide range of resources to help you build solutions that meet the needs of your organization. For example, with resources such as the Amazon Web Services (AWS) Solutions Library and Serverless Land, you can boost your knowledge and productivity while working on event-driven architectures, microservices, and stateless computing.

In this Let’s Architect! edition, we explore how to incorporate these patterns into your architecture, and which tools to leverage to build solutions that are scalable, secure, and cost-effective.

How AWS Application Composer helps your team build great apps

In this re:Invent 2022 session, Chase Douglas, Principal Engineer at AWS, speaks about AWS Application Composer, a newly launched service.

This service has the potential to change the way architects design solutions—without writing a single line of code! The service is user-friendly, intuitive, and requires no prior coding experience. It allows users to scaffold a serverless architecture, defining a CloudFormation template visually with drag-and-drop. A detailed AWS Compute Blog post takes readers through the process of using AWS Application Composer.

Take me to this re:Invent 2022 video!

How an architecture can be designed with AWS Application Composer

AWS design + build tools

When migrating to the cloud, we suggest referencing these four tried-and-true AWS resources that can be used to design and build projects.

  1. AWS Workshops are created by AWS teams to provide opportunities for hands-on learning to develop practical skills. Workshops are available in multiple categories and for skill levels 100-400.
  2. AWS Architecture Center contains a collection of best practices and architectural patterns for designing and deploying cloud-based solutions using AWS services. Furthermore, it includes detailed architecture diagrams, whitepapers, case studies, and other resources that provide a wealth of information on how to design and implement cloud solutions.
  3. Serverless Land (an Amazon property) brings together various patterns, workflows, code snippets, and blog posts pertaining to AWS serverless architectures.
  4. AWS Solutions Library provides customers with templates, tools, and automated workflows to easily deploy, operate, and manage common use cases on the AWS Cloud.
Inside event-driven architectures designed by David Boyne on Serverless Land

The Well-Architected way

In this session, the AWS Well-Architected team provides guidance on how to implement the architectural models described in the AWS Well-Architected Framework within your organization at scale.

Discover a customer story and understand how to use the features of the AWS Well-Architected Tool and APIs to receive recommendations based on your workload and measure your architectural metrics. In the Framework whitepaper, you can explore the six pillars of Well-Architected (operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability) and best practices to achieve them.

Understanding the key design pillars can help architects make informed design decisions, leading to more robust and efficient solutions. This knowledge also enables architects to identify potential problems early on in the design process and find appropriate patterns to address those issues.

Take me to the Well-Architected video!

Discover how the AWS Well-Architected Framework can help you design scalable, maintainable, and reusable solutions

See you next time!

Thanks for exploring architecture tools and resources with us!

Join us next time when we’ll talk about data mesh architecture!

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Top 10 AWS Architecture Blog posts of 2022

Post Syndicated from Elise Chahine original https://aws.amazon.com/blogs/architecture/top-10-aws-architecture-blog-posts-of-2022/

As we wrap up 2022, we want to take a moment to shine a bright light on our readers, who spend their time exploring our posts, providing generous feedback, and asking poignant questions! Much appreciation goes to our Solutions Architects, who work tirelessly to identify and produce what our customers need.

Without any further ado, here are the top 10 AWS Architecture Blog posts of 2022…

#1: Creating a Multi-Region Application with AWS Services – Part 2, Data and Replication

Joe Chapman, Senior Solutions Architect, and Seth Eliot, Principal Developer Advocate, come in at #1 with a review of AWS services that offer cross-Region data replication—getting data where it needs to be, quickly!

#2: Reduce Cost and Increase Security with Amazon VPC Endpoints

Nigel Harris and team explain the benefits of using Amazon VPC endpoints, and how to appropriately restrict access to endpoints and the services they connect to. Learn more by taking the VPC Endpoint Workshop in the AWS Workshop Studio!

#3: Multi-Region Migration using AWS Application Migration Service

In this technical how-to post, Shreya Pathak and Medha Shree demonstrate how to configure AWS Application Migration Service to migrate workloads from one AWS Region to another.

#4: Let’s Architect! Architecting for Sustainability

The Let’s Architect! Team claims 4 of the top 10 spots for 2022! Luca, Laura, Vittorio, and Zamira kick off the series by providing material to help our customers design sustainable architectures and create awareness on the topic of sustainability.

#5: Let’s Architect! Serverless architecture on AWS

In this post, the Let’s Architect! Team shares insights into reimagining a serverless environment, including how to start with a prototype and scale to mass adoption using decoupled systems, integration approaches, serverless architectural patterns, best practices, and more!

#6: Let’s Architect! Tools for Cloud Architects

Making it three in a row, the Let’s Architect! Team shares tools and methodologies for architects to learn and experiment with. This post was also a celebration of International Women’s Day, with half of the featured tools developed with or by women!

#7: Announcing updates to the AWS Well-Architected Framework

Well-Architected is tried-and-true AWS guidance, describing key concepts, design principles, and architecture best practices for cloud workloads. In this post, Haleh Najafzadeh, Senior Solutions Architecture Manager for AWS Well-Architected, updates our readers on improvements to the Well-Architected Framework across all six pillars.

#8: Creating a Multi-Region Application with AWS Services – Part 3, Application Management and Monitoring

Joe and Seth are back at #8, covering AWS services and features used for messaging, deployment, monitoring, and management in multi-Region applications.

#9: Let’s Architect! Creating resilient architecture

“The need for resilient workloads transcends all customer industries…” In their last top 10 post, the team provides resources to help build resilience into your AWS architecture.

#10: Using DevOps Automation to Deploy Lambda APIs across Accounts and Environments

Subrahmanyam Madduru and team demonstrate how to automate release deployments in a repeatable and agile manner, reducing manual errors and increasing the speed of delivery for business capabilities.

Goodbye, 2022!

A big thank you to all our readers and authors! Your feedback and collaboration are appreciated and help us produce better content every day.

From all of us at the AWS Architecture Blog, happy holidays!

AWS Week in Review – November 21, 2022

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-week-in-review-november-21-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

A new week starts, and the News Blog team is getting ready for AWS re:Invent! Many of us will be there next week and it would be great to meet in person. If you’re coming, do you know about PeerTalk? It’s an onsite networking program for re:Invent attendees available through the AWS Events mobile app (which you can get on Google Play or Apple App Store) to help facilitate connections among the re:Invent community.

If you’re not coming to re:Invent, no worries, you can get a free online pass to watch keynotes and leadership sessions.

Last Week’s Launches
It was a busy week for our service teams! Here are the launches that got my attention:

AWS Region in Spain – The AWS Region in Aragón, Spain, is now open. The official name is Europe (Spain), and the API name is eu-south-2.

Amazon Athena – You can now apply AWS Lake Formation fine-grained access control policies with all table and file formats supported by Amazon Athena to centrally manage permissions and access data catalog resources in your Amazon Simple Storage Service (Amazon S3) data lake. With fine-grained access control, you can restrict access to data in query results using data filters to achieve column-level, row-level, and cell-level security.

Amazon EventBridge – With new additional filtering capabilities, you can now filter events by suffix, ignore case, and match if at least one condition is true. This makes it easier to write complex rules when building event-driven applications.
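
Purely as an illustration, here is a minimal sketch of an event pattern combining these operators, registered with the AWS SDK for Python (boto3); the rule name, event source, and detail fields are hypothetical:

import json

import boto3

events = boto3.client("events")

# Hypothetical pattern combining the new operators: suffix matching,
# case-insensitive matching, and "$or" (match if at least one condition is true)
event_pattern = {
    "source": ["my.app"],
    "detail": {
        "$or": [
            {"fileName": [{"suffix": ".png"}]},
            {"status": [{"equals-ignore-case": "ready"}]},
        ]
    },
}

events.put_rule(
    Name="demo-content-filtering-rule",
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)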

AWS Controllers for Kubernetes (ACK) – The ACK for Amazon Elastic Compute Cloud (Amazon EC2) is now generally available and lets you provision and manage EC2 networking resources, such as VPCs, security groups and internet gateways using the Kubernetes API. Also, the ACK for Amazon EMR on EKS is now generally available to allow you to declaratively define and manage EMR on EKS resources such as virtual clusters and job runs as Kubernetes custom resources. Learn more about ACK for Amazon EMR on EKS in this blog post.

Amazon HealthLake – New analytics capabilities make it easier to query, visualize, and build machine learning (ML) models. Now HealthLake transforms customer data into an analytics-ready format in near real time so that you can query it and use the resulting data to build visualizations or ML models. Also new is Amazon HealthLake Imaging (preview), a new HIPAA-eligible capability that enables you to easily store, access, and analyze medical images at any scale. More on HealthLake Imaging can be found in this blog post.

Amazon RDS – You can now transfer files between Amazon Relational Database Service (RDS) for Oracle and an Amazon Elastic File System (Amazon EFS) file system. You can use this integration to stage files like Oracle Data Pump export files when you import them. You can also use EFS to share a file system between an application and one or more RDS Oracle DB instances to address specific application needs.

Amazon ECS and Amazon EKS – We added centralized logging support for Windows containers to help you easily process and forward container logs to various AWS and third-party destinations such as Amazon CloudWatch, S3, Amazon Kinesis Data Firehose, Datadog, and Splunk. See these blog posts for how to use this new capability with ECS and with EKS.

AWS SAM CLI – You can now use the Serverless Application Model CLI to locally test and debug an AWS Lambda function defined in a Terraform application. You can see a walkthrough in this blog post.

AWS Lambda – Now supports Node.js 18 as both a managed runtime and a container base image, which you can learn more about in this blog post. Also check out this interesting article on why and how you should use AWS SDK for JavaScript V3 with Node.js 18. And last but not least, there is new tooling support to build and deploy native AOT compiled .NET 7 applications to AWS Lambda. With this tooling, you can enable faster application starts and benefit from reduced costs through the faster initialization times and lower memory consumption of native AOT applications. Learn more in this blog post.

AWS Step Functions – Now supports cross-account access for more than 220 AWS services to process data, automate IT and business processes, and build applications across multiple accounts. Learn more in this blog post.

AWS Fargate – Adds the ability to monitor the utilization of the ephemeral storage attached to an Amazon ECS task. You can track the storage utilization with Amazon CloudWatch Container Insights and ECS Task Metadata endpoint.

AWS Proton – Now has a centralized dashboard for all resources deployed and managed by AWS Proton, which you can learn more about in this blog post. You can now also specify custom commands to provision infrastructure from templates. In this way, you can manage templates defined using the AWS Cloud Development Kit (AWS CDK) and other templating and provisioning tools. More on CDK support and AWS CodeBuild provisioning can be found in this blog post.

AWS IAM – You can now use more than one multi-factor authentication (MFA) device for root account users and IAM users in your AWS accounts. More information is available in this post.

Amazon ElastiCache – You can now use IAM authentication to access Redis clusters. With this new capability, IAM users and roles can be associated with ElastiCache for Redis users to manage their cluster access.

Amazon WorkSpaces – You can now use version 2.0 of the WorkSpaces Streaming Protocol (WSP) host agent that offers significant streaming quality and performance improvements, and you can learn more in this blog post. Also, with Amazon WorkSpaces Multi-Region Resilience, you can implement business continuity solutions that keep users online and productive with less than 30-minute recovery time objective (RTO) in another AWS Region during disruptive events. More on multi-region resilience is available in this post.

Amazon CloudWatch RUM – You can now send custom events (in addition to predefined events) for better troubleshooting and application specific monitoring. In this way, you can monitor specific functions of your application and troubleshoot end user impacting issues unique to the application components.

AWS AppSync – You can now define GraphQL API resolvers using JavaScript. You can also mix functions written in JavaScript and Velocity Template Language (VTL) inside a single pipeline resolver. To simplify local development of resolvers, AppSync released two new NPM libraries and a new API command. More info can be found in this blog post.

AWS SDK for SAP ABAP – This new SDK makes it easier for ABAP developers to modernize and transform SAP-based business processes and connect to AWS services natively using the SAP ABAP language. Learn more in this blog post.

AWS CloudFormation – CloudFormation can now send event notifications via Amazon EventBridge when you create, update, or delete a stack set.

AWS Console – With the new Applications widget on the Console home, you have one-click access to applications in AWS Systems Manager Application Manager and their resources, code, and related data. From Application Manager, you can view the resources that power your application and your costs using AWS Cost Explorer.

AWS Amplify – Expands Flutter support (developer preview) to Web and Desktop for the API, Analytics, and Storage use cases. You can now build cross-platform Flutter apps with Amplify that target iOS, Android, Web, and Desktop (macOS, Windows, Linux) using a single codebase. Learn more on Flutter Web and Desktop support for AWS Amplify in this post. Amplify Hosting now supports fully managed CI/CD deployments and hosting for server-side rendered (SSR) apps built using Next.js 12 and 13. Learn more in this blog post and see how to deploy a NextJS 13 app with the AWS CDK here.

Amazon SQS – With attribute-based access control (ABAC), you can define permissions based on tags attached to users and AWS resources. With this release, you can now use tags to configure access permissions and policies for SQS queues. More details can be found in this blog.

AWS Well-Architected Framework – The latest version of the Data Analytics Lens is now available. The Data Analytics Lens is a collection of design principles, best practices, and prescriptive guidance to help you run analytics on AWS.

AWS Organizations – You can now manage accounts, organizational units (OUs), and policies within your organization using CloudFormation templates.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
A few more things you might have missed:

Introducing our final AWS Heroes of the year – As the end of 2022 approaches, we are recognizing individuals whose enthusiasm for knowledge-sharing has a real impact on the AWS community. Please meet them here!

The Distributed Computing Manifesto – Werner Vogels, VP & CTO at Amazon.com, shared the Distributed Computing Manifesto, a canonical document from the early days of Amazon that transformed the way we built architectures and highlights the challenges faced at the end of the 20th century.

AWS re:Post – To make this community more accessible globally, we expanded the user experience to support five additional languages. You can now interact with AWS re:Post also using Traditional Chinese, Simplified Chinese, French, Japanese, and Korean.

For AWS open-source news and updates, here’s the latest newsletter curated by Ricardo to bring you the most recent updates on open-source projects, posts, events, and more.

Upcoming AWS Events
As usual, there are many opportunities to meet:

AWS re:Invent – Our yearly event is next week from November 28 to December 2. If you can’t be there in person, get your free online pass to watch live the keynotes and the leadership sessions.

AWS Community Days – AWS Community Day events are community-led conferences to share and learn together. Join us in Sri Lanka (December 6-7), Dubai, UAE (December 10), Pune, India (December 10), and Ahmedabad, India (December 17).

That’s all from me for this week. Next week we’ll focus on re:Invent, and then we’ll take a short break. We’ll be back with the next Week in Review on December 12!

Danilo

Reducing Your Organization’s Carbon Footprint with Amazon CodeGuru Profiler

Post Syndicated from Isha Dua original https://aws.amazon.com/blogs/devops/reducing-your-organizations-carbon-footprint-with-codeguru-profiler/

It is crucial to examine every functional area when firms reorient their operations toward sustainable practices. Making informed decisions is necessary to reduce the environmental effect of an IT stack when creating, deploying, and maintaining it. To build a sustainable business for our customers and for the world we all share, we have deployed data centers that provide the efficient, resilient service our customers expect while minimizing our environmental footprint—and theirs. While we work to improve the energy efficiency of our data centers, we also work to help our customers improve their operations on the AWS Cloud. This two-pronged approach is based on the concept of shared responsibility between AWS and its customers. As shown in the diagram below, AWS focuses on optimizing the sustainability of the cloud, while customers are responsible for sustainability in the cloud, meaning that AWS customers must optimize the workloads they run on AWS.

Figure 1. Shared responsibility model for sustainability

Just by migrating to the cloud, AWS customers become significantly more sustainable in their technology operations. On average, AWS customers use 77% fewer servers, 84% less power, and a 28% cleaner power mix, ultimately reducing their carbon emissions by 88% compared to when they ran workloads in their own data centers. These improvements are attributable to the technological advancements and economies of scale that AWS data centers bring. However, there are still significant opportunities for AWS customers to make their cloud operations more sustainable. To uncover these opportunities, we must first understand how emissions are categorized.

The Greenhouse Gas Protocol organizes carbon emissions into the following scopes, along with relevant emission examples within each scope for a cloud provider such as AWS:

  • Scope 1: All direct emissions from the activities of an organization or under its control. For example, fuel combustion by data center backup generators.
  • Scope 2: Indirect emissions from electricity purchased and used to power data centers and other facilities. For example, emissions from commercial power generation.
  • Scope 3: All other indirect emissions from activities of an organization from sources it doesn’t control. AWS examples include emissions related to data center construction, and the manufacture and transportation of IT hardware deployed in data centers.

From an AWS customer perspective, emissions from customer workloads running on AWS are accounted for as indirect emissions, and part of the customer’s Scope 3 emissions. Each workload deployed generates a fraction of the total AWS emissions from each of the previous scopes. The actual amount varies per workload and depends on several factors including the AWS services used, the energy consumed by those services, the carbon intensity of the electric grids serving the AWS data centers where they run, and the AWS procurement of renewable energy.

At a high level, AWS customers approach optimization initiatives at three levels:

  • Application (Architecture and Design): Using efficient software designs and architectures to minimize the average resources required per unit of work.
  • Resource (Provisioning and Utilization): Monitoring workload activity and modifying the capacity of individual resources to prevent idling due to over-provisioning or under-utilization.
  • Code (Code Optimization): Using code profilers and other tools to identify the areas of code that use up the most time or resources as targets for optimization.

In this blog post, we will concentrate on code-level sustainability improvements and how they can be realized using Amazon CodeGuru Profiler.

How CodeGuru Profiler improves code sustainability

Amazon CodeGuru Profiler collects runtime performance data from your live applications and provides recommendations that can help you fine-tune your application performance. Using machine learning algorithms, CodeGuru Profiler can help you find your most CPU-intensive lines of code, which contribute the most to your Scope 3 emissions. CodeGuru Profiler then suggests ways to improve the code to make it less CPU demanding. CodeGuru Profiler provides different visualizations of profiling data to help you identify what code is running on the CPU, see how much time is consumed, and suggest ways to reduce CPU utilization. Optimizing your code with CodeGuru Profiler leads to the following:

  • Improvements in application performance
  • Reduction in cloud cost, and
  • Reduction in the carbon emissions attributable to your cloud workload.

When your code performs the same task with less CPU, your applications run faster, customer experience improves, and your costs are reduced alongside your cloud emissions. CodeGuru Profiler generates the recommendations that help you make your code faster by using an agent that continuously samples stack traces from your application. The stack traces indicate how much time the CPU spends on each function or method in your code. This information is then transformed into CPU and latency data that is used to detect anomalies. When anomalies are detected, CodeGuru Profiler generates recommendations that clearly outline what you should do to remediate the situation. Although CodeGuru Profiler has several visualizations that help you understand your code's behavior, in many cases, customers can implement these recommendations without reviewing the visualizations. Let’s demonstrate this with a simple example.

Demonstration: Using CodeGuru Profiler to optimize a Lambda function

In this demonstration, the inefficiencies in an AWS Lambda function will be identified by CodeGuru Profiler.

Building our Lambda Function (10 mins)

To keep this demonstration quick and simple, let’s create a simple Lambda function that displays ‘Hello World’. Before writing the code for this function, let’s review two important concepts. First, when writing Python code that runs on AWS and calls AWS services, two critical steps are required:

  • Import the AWS SDK (boto3) library for Python
  • Create the AWS SDK service client for the service being called

The Python code lines (that will be part of our function) that execute these two steps are shown below:

import boto3 #this will import the AWS SDK library for Python
VariableName = boto3.client('dynamodb') #this will create the AWS SDK service client

Second, AWS Lambda functions functionally comprise two sections:

  • Initialization code
  • Handler code

The first time a function is invoked (i.e., a cold start), Lambda downloads the function code, creates the required runtime environment, runs the initialization code, and then runs the handler code. During subsequent invocations (warm starts), to keep execution time low, Lambda bypasses the initialization code and goes straight to the handler code. AWS Lambda is designed such that the SDK service client created during initialization persists into the handler code execution. For this reason, AWS SDK service clients should be created in the initialization code. If the code lines for creating the AWS SDK service client are placed in the handler code, the AWS SDK service client will be recreated every time the Lambda function is invoked, needlessly increasing the duration of the Lambda function during cold and warm starts. This inadvertently increases CPU demand (and cost), which in turn increases the carbon emissions attributable to the customer’s code. Below, you can see the green and brown versions of the same Lambda function.
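
The original post illustrates the efficient (green) and inefficient (brown) versions of the function as images. As a minimal sketch of the green version, the SDK service client is created in the initialization code and reused on every invocation; the DynamoDB client is only there for illustration:

#initialization code (runs once per execution environment, on cold start)
import boto3

client = boto3.client('dynamodb') #SDK service client created once and reused

#handler code (runs on every invocation)
def lambda_handler(event, context):
  output = 'Hello World'
  print(output)
  return output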

Now that we understand the importance of structuring our Lambda function code for efficient execution, let’s create a Lambda function that recreates the SDK service client. We will then watch CodeGuru Profiler flag this issue and generate a recommendation.

  1. Open AWS Lambda from the AWS Console and click on Create function.
  2. Select Author from scratch, name the function ‘demo-function’, select Python 3.9 under runtime, select x86_64 under Architecture.
  3. Expand Permissions, then choose whether to create a new execution role or use an existing one.
  4. Expand Advanced settings, and then select Function URL.
  5. For Auth type, choose AWS_IAM or NONE.
  6. Select Configure cross-origin resource sharing (CORS). By selecting this option during function creation, your function URL allows requests from all origins by default. You can edit the CORS settings for your function URL after creating the function.
  7. Choose Create function.
  8. In the code editor tab of the code source window, copy and paste the code below:
#initialization code
import json
import boto3

#handler code
def lambda_handler(event, context):
  client = boto3.client('dynamodb') #create AWS SDK service client on every invocation (inefficient)
  #simple code block for demonstration purposes
  output = 'Hello World'
  print(output)
  #handler function return
  return output

Ensure that the handler code is properly indented.

  9. Save the code, Deploy, and then Test.
  10. For the first execution of this Lambda function, a test event configuration dialog will appear. On the Configure test event dialog window, leave the selection as the default (Create new event), enter ‘demo-event’ as the Event name, and leave the hello-world template as the Event template.
  11. When you run the code by clicking on Test, the console should return ‘Hello World’.
  12. To simulate actual traffic, let’s run a curl script that will invoke the Lambda function every 0.06 seconds. On a bash terminal, run the following command:
while true; do curl {Lambda Function URL}; sleep 0.06; done

If you do not have Git Bash installed, you can use AWS Cloud9, which supports curl commands.

Enabling CodeGuru Profiler for our Lambda function

We will now set up CodeGuru Profiler to monitor our Lambda function. For Lambda functions running on Java 8 (Amazon Corretto), Java 11, and Python 3.8 or 3.9 runtimes, CodeGuru Profiler can be enabled through a single click in the configuration tab in the AWS Lambda console. Other runtimes can be enabled following a series of steps that can be found in the CodeGuru Profiler documentation for Java and Python.

Our demo code is written in Python 3.9, so we will enable Profiler from the configuration tab in the AWS Lambda console.

  1. On the AWS Lambda console, select the demo-function that we created.
  2. Navigate to Configuration > Monitoring and operations tools, and click Edit on the right side of the page.

  3. Scroll down to Amazon CodeGuru Profiler and click the button next to Code profiling to turn it on. After enabling Code profiling, click Save.

Note: CodeGuru Profiler requires 5 minutes of Lambda runtime data to generate results. After your Lambda function provides this runtime data, which may need multiple runs if your Lambda function has a short runtime, it will display within the Profiling group page in the CodeGuru Profiler console. The profiling group will be given a default name (i.e., aws-lambda-<lambda-function-name>), and it will take approximately 15 minutes after CodeGuru Profiler receives the runtime data for this profiling group to appear. Be patient. Although our function duration is ~33ms, our curl script invokes the application once every 0.06 seconds. This should give Profiler sufficient information to profile our function in a couple of hours. After 5 minutes, our profiling group should appear in the list of active profiling groups as shown below.

Depending on how frequently your Lambda function is invoked, it can take up to 15 minutes to aggregate profiles, after which you can see your first visualization in the CodeGuru Profiler console. The granularity of the first visualization depends on how active your function was during those first 5 minutes of profiling—an application that is idle most of the time doesn’t have many data points to plot in the default visualization. However, you can remedy this by looking at a wider time period of profiled data, for example, a day or even up to a week, if your application has very low CPU utilization. For our demo function, a recommendation should appear after about an hour. By this time, the profiling groups list should show that our profiling group now has one recommendation.

Profiler has now flagged the repeated creation of the SDK service client with every invocation.

From the information provided, we can see that our CPU is spending 5x more computing time than expected on the recreation of the SDK service client. The estimated cost impact of this inefficiency is also provided. In production environments, the cost impact of seemingly minor inefficiencies can scale very quickly to several kilograms of CO2 and hundreds of dollars as invocation frequency and the number of Lambda functions increase.

CodeGuru Profiler integrates with Amazon DevOps Guru, a fully managed service that makes it easy for developers and operators to improve the performance and availability of their applications. Amazon DevOps Guru analyzes operational data and application metrics to identify behaviors that deviate from normal operating patterns. Once these operational anomalies are detected, DevOps Guru presents intelligent recommendations that address current and predicted future operational issues. By integrating with CodeGuru Profiler, customers can now view operational anomalies and code optimization recommendations on the DevOps Guru console. The integration, which is enabled by default, is only applicable to Lambda resources that are supported by CodeGuru Profiler and monitored by both DevOps Guru and CodeGuru.

We can now stop the curl loop (Control+C) so that the Lambda function stops running. Next, we delete the profiling group that was created when we enabled profiling in Lambda, and then delete the Lambda function or repurpose it as needed.

Conclusion

Cloud sustainability is a shared responsibility between AWS and our customers. While we work to make our data centers more sustainable, customers also have to work to make their code, resources, and applications more sustainable, and CodeGuru Profiler can help you improve code sustainability, as demonstrated above. To start profiling your code today, visit the CodeGuru Profiler documentation page. To start monitoring your applications, head over to the Amazon DevOps Guru documentation page.

About the authors:

Isha Dua

Isha Dua is a Senior Solutions Architect based in San Francisco Bay Area. She helps AWS Enterprise customers grow by understanding their goals and challenges, and guiding them on how they can architect their applications in a cloud native manner while making sure they are resilient and scalable. She’s passionate about machine learning technologies and Environmental Sustainability.

Christian Tomeldan

Christian Tomeldan is a DevOps Engineer turned Solutions Architect. Operating out of San Francisco, he is passionate about technology and conveys that passion to customers ensuring they grow with the right support and best practices. He focuses his technical depth mostly around Containers, Security, and Environmental Sustainability.

Ifeanyi Okafor

Ifeanyi Okafor is a Product Manager with AWS. He enjoys building products that solve customer problems at scale.

Accelerating Well-Architected Framework reviews using integrated AWS Trusted Advisor insights

Post Syndicated from Stephen Salim original https://aws.amazon.com/blogs/architecture/accelerating-well-architected-framework-reviews-using-integrated-aws-trusted-advisor-insights/

In this blog, we will explain how the new AWS Well-Architected integration with AWS Trusted Advisor can give you insights that help you create a flywheel effect to accelerate your cloud optimization. Customers that have the most success in their cloud adoption recognize that optimizing their cloud architecture and operations is not a one-time effort. Optimization is a continuous improvement virtuous cycle based on learning architectural and operational best practices, measuring workloads against these best practices, and implementing improvements based on opportunities recognized from measurement.

Customers can use the AWS Well-Architected Framework to build a “learn, measure, and improve” continuous improvement virtuous cycle (Figure 1). With the AWS Well-Architected Tool, customers can measure their workloads against these AWS best practices to identify improvement opportunities or risks they should address. After customers complete Well-Architected Framework Reviews (WAFRs), they can generate improvement plans with prioritized guidance and resources for improvement. They can also track the improvements made over time using the milestones feature in the Well-Architected Tool.

Continuous optimization of workloads based on AWS best practices

Figure 1. Continuous optimization of workloads based on AWS best practices

Amazon uses the term flywheel to describe a virtuous cycle that has additional drivers to add momentum, which accelerates the cycle and the value it delivers. Figure 2 is the often-referenced Amazon retail flywheel, which shows how Amazon’s focus on customer experience drives growth. It is accelerated by creating a lower cost structure, which allows Amazon to pass lower prices to its customers, improving customer experience and driving faster growth.

Amazon Flywheel concept of scaling growth

Figure 2. The Amazon Flywheel concept of scaling growth

Customers can add momentum to an AWS Well-Architected “learn, measure, and improve” virtuous cycle using tools that give more insights while measuring workloads. Improved insights result in consistent measurements that are more efficient and more accurate. This accelerates the optimization cycle by reducing the time required to measure workloads. Collecting information on AWS resources using Trusted Advisor checks allows customers to validate if a workload’s state is aligned with AWS best practices. The new AWS Well-Architected Tool integration with AWS Trusted Advisor makes it easier and faster to gain insights during WAFRs. The Trusted Advisor checks that are relevant to a specific set of best practices have been mapped to the corresponding questions in Well-Architected. The new feature now shows the mapped Trusted Advisor checks directly in the Well-Architected Tool. These insights help customers run WAFRs in less time, with more accuracy, creating a flywheel effect (Figure 3).

Insights from AWS Trusted Advisor create acceleration in achieving improved outcomes

Figure 3. Insights from AWS Trusted Advisor create acceleration in achieving improved outcomes

AWS Well-Architected Tool integration with AWS Trusted Advisor: feature example

In the following sections, we detail an example scenario on how to use the integration with Trusted Advisor to gain insights when measuring your workloads.

Enabling the AWS Well-Architected Tool integration with AWS Trusted Advisor

How to enable the new feature in your workload:

  1. Create a new workload in the AWS Well-Architected Console. Refer to the user guide for detailed instructions.

    Optional: When defining a workload, within the “Application” section of the workload definition, you can now also specify the AWS Service Catalog AppRegistry Amazon Resource Name (ARN). This field indicates a relationship between the AWS Well-Architected Tool workload and the AWS resources in an AppRegistry Application when performing a Well-Architected Framework Review (Figure 4).

    Application field to select AWS Service Catalog AppRegistry ARN

    Figure 4. Application field to select AWS Service Catalog AppRegistry ARN

    This is another new AWS Well-Architected Tool feature that launched along with the integration with Trusted Advisor feature. You can find out more details about the integration with AWS Service Catalog AppRegistry in the What’s New post and on the feature documentation page. For details on how to create an AWS Service Catalog AppRegistry Application refer to Creating applications.

  2. To enable the integration with Trusted Advisor, after the necessary workload information has been entered, within the “AWS Trusted Advisor” section, select “Activate Trusted Advisor” (Figure 5).
    Enabling the Trusted Advisor feature

    Figure 5. Enabling the AWS Trusted Advisor feature

    Optional: Once the workload is created, note the workload ARN. You can find the workload ARN in the Properties section of the workload resource you created (Figure 6). For steps on how to identify your workload, refer to Well-Architected Tool User Guide on viewing a workload.

    AWS Well-Architected Tool showing workload ARN

    Figure 6. AWS Well-Architected Tool showing workload ARN

  3. To collect Trusted Advisor checks from accounts other than the account where the workload you are reviewing exists, you must perform two steps. You need to ensure the account IDs are listed in the workload properties for the workload you are reviewing. You must then create an IAM role in the account from which Trusted Advisor checks will be collected with the following permission and trust relationship (Figures 7 and 8). For more information on how to setup this permission, refer to the feature documentation.
    Permissions needed by AWS Well-Architected Tool to interrogate AWS Trusted Advisor

    Figure 7. Permissions needed by AWS Well-Architected Tool to interrogate AWS Trusted Advisor

    The trust relationship allowing AWS Well-Architected Tool to assume policy on behalf of the workload

    Figure 8. The trust relationship allowing AWS Well-Architected Tool to assume policy on behalf of the workload

Using integration with AWS Trusted Advisor for insights during reviews

Once the feature is enabled, additional insights will be noticeable about the resources in your workload using Trusted Advisor checks. Let’s explore an example question. In this case, we will use Question 9 from the Reliability Pillar, as there are Trusted Advisor checks related to the best practices in it: How do you back up data?

  1. AWS Well-Architected Reliability Question 9 includes best practices that relate to how workload backups are performed to support the workload’s ability to recover from failure. Current findings using Trusted Advisor checks indicate that the workload may not be configured based on the “Perform data backup automatically” best practice in the Reliability Pillar (Figure 9).

    "Perform data backup automatically" best practices

    Figure 9. “Perform data backup automatically” best practices

  2. To access Trusted Advisor checks as insights, you can select a question in the Well-Architected Tool (Figure 10). If there are related Trusted Advisor checks available for a question, there will be a “View checks” button, as in the screenshot below. You can also select the “Trusted Advisor checks” tab.

    Trusted Advisor checks that map to best practices

    Figure 10. AWS Trusted Advisor checks that map to best practices

  3. Trusted Advisor checks are available that provide insights related to the best practices in the question. You will also notice the state of the resource recommendations and the count of resources. Trusted Advisor checks that relate to the best practice “Perform data backup automatically” are displayed. One of the Trusted Advisor checks, identified with an “x” in a circle (denoting “Action recommended”) status, concerns the availability of Amazon Elastic Block Store (Amazon EBS) snapshots to recover your EBS volume in the event of a disaster (Figure 11).

    AWS Trusted Advisor check for Amazon EBS snapshots with "Action recommended"

    Figure 11. AWS Trusted Advisor check for Amazon EBS snapshots with “Action recommended”

  4. Exploring the Trusted Advisor console, you can identify the EBS volume ID that has been detected with no snapshot in the us-west-2 Region (Figure 12).

    An EBS volume that does not have snapshots

    Figure 12. An EBS volume that does not have snapshots

  5. With the insights from Trusted Advisor, we can quickly determine that the “Perform data backup automatically” best practice is not in place, as we do not have Amazon EBS snapshots enabled. Through the “helpful resources” section, you can find instructions to help automate snapshot creation for the Amazon EBS volume (Figure 13). One method to achieve this is to use AWS Backup.

    Resources with details about best practices, including links to learn more

    Figure 13. Resources with details about best practices, including links to learn more

  6. Using AWS Backup, you can define a backup plan to automate snapshot creation for the EBS volume. Using this plan, you can adjust the backup frequency to help achieve your recovery time objective and recovery point objective (Figure 14). For more information on how to configure an EBS volume backup plan, refer to the Developer Guide on creating a backup plan; a programmatic sketch using the AWS Backup API follows this list of steps.

    Setup automatic Amazon EBS volume snapshots

    Figure 14. Setup automatic Amazon EBS volume snapshots

  7. Once this improvement is implemented and the related EBS volume snapshot is taken, Trusted Advisor will reflect the changes to the resource (Figure 15).

    Amazon EBS volume with a snapshot

    Figure 15. Amazon EBS volume with a snapshot

  8. The next time we perform a Well-Architected Framework Review on this workload, the related AWS Trusted Advisor Check will show no action required with a check-mark status (Figure 16).
    AWS Trusted Advisor checks that represent improvements that have been implemented

    Figure 16. AWS Trusted Advisor checks that represent improvements that have been implemented

    Optional: For access to the list of Trusted Advisor checks in .csv format, you can click on the “Download check details” button on each question to download the resources that were checked in relation to the specified best practices (Figure 17).

    "Download check details" button

    Figure 17. “Download check details” button

  9. Once implemented, this improvement ensures a means to recover the EBS volume data in the event of a disaster. This makes the resources in the workload better aligned with the AWS Reliability Pillar design principle of “Automatically recover from failure”. To reflect this alignment in the Well-Architected Tool, you can select the best practice check items under the related questions (Figure 18).

    A milestone with updated best practices based on improvements that have been implemented

    Figure 18. A milestone with updated best practices based on improvements that have been implemented

  10. Finally, you can create a milestone to capture a point-in-time state of your workload’s WAFR. As you continuously optimize with more WAFRs and improvements, the number of high- and medium-risk items identified within each review will decrease. You will notice the continuous optimization of your workload over time, as in Figure 19.

    The history of improvements being made over time

    Figure 19. The history of improvements being made over time
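
As referenced in step 6 above, the backup plan can also be defined programmatically. Below is a minimal sketch using the AWS Backup API via the AWS SDK for Python (boto3); the plan name, schedule, retention period, vault name, IAM role ARN, and volume ARN are placeholder assumptions that you would adjust to meet your recovery objectives:

import boto3

backup = boto3.client('backup')

# Create a backup plan that snapshots the volume daily and retains snapshots for 35 days
plan = backup.create_backup_plan(
    BackupPlan={
        'BackupPlanName': 'daily-ebs-backup',           # hypothetical plan name
        'Rules': [{
            'RuleName': 'daily-snapshots',
            'TargetBackupVaultName': 'Default',         # assumes the default backup vault
            'ScheduleExpression': 'cron(0 5 * * ? *)',  # daily at 05:00 UTC (assumption)
            'Lifecycle': {'DeleteAfterDays': 35},
        }],
    }
)

# Assign the EBS volume to the plan; the role must allow AWS Backup to create snapshots
backup.create_backup_selection(
    BackupPlanId=plan['BackupPlanId'],
    BackupSelection={
        'SelectionName': 'demo-ebs-volume',
        'IamRoleArn': 'arn:aws:iam::111122223333:role/service-role/AWSBackupDefaultServiceRole',  # placeholder
        'Resources': ['arn:aws:ec2:us-west-2:111122223333:volume/vol-0123456789abcdef0'],         # placeholder
    },
)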

Conclusion

Using the AWS Well-Architected integration with AWS Trusted Advisor, customers have a mechanism to accelerate the “learn, measure, and improve” Well-Architected virtuous cycle, creating an optimization flywheel. We have demonstrated the value of creating acceleration through the insights from Trusted Advisor checks. You now know how to enable the integration with Trusted Advisor and have seen an example of how the insights can accelerate your review cycle. You will notice the improvements you make over time will reflect in the Trusted Advisor checks as you review the milestones for your workloads. Enable this feature on your next Well-Architected Framework Review (WAFR) to measure the impact that data-driven insights from Trusted Advisor can have on reducing the time-to-value for your reviews. For more information, consider these additional resources. You can contact your account team for support in running WAFRs or check out the AWS Well-Architected Partner Program to find a partner that can help you run a review. Additionally, running a WAFR with a partner assisting you in remediating risks may also provide funding credits to offset the costs required to make the improvements.

“Perform data backup automatically” is part of the Reliability Pillar of the AWS Well-Architected Framework. AWS Well-Architected is a set of guiding design principles developed by AWS to help organizations build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads. Use the AWS Well-Architected Tool to review your workloads periodically to address important design considerations and ensure that they follow the best practices and guidance of the AWS Well-Architected Framework. For follow up questions or comments, join our growing community on AWS re:Post.