Tag Archives: AWS Well-Architected Tool

Top Architecture Blog Posts of 2023

Post Syndicated from Andrea Courtright original https://aws.amazon.com/blogs/architecture/top-architecture-blog-posts-of-2023/

2023 was a rollercoaster year in tech, and we at the AWS Architecture Blog feel so fortunate to have shared in the excitement. As we move into 2024 and all of the new technologies we could see, we want to take a moment to highlight the brightest stars from 2023.

As always, thanks to our readers and to the many talented and hardworking Solutions Architects and other contributors to our blog.

I give you our 2023 cream of the crop!

#10: Build a serverless retail solution for endless aisle on AWS

In this post, Sandeep and Shashank help retailers and their customers alike in this guided approach to finding inventory that doesn’t live on shelves.

Building endless aisle architecture for order processing

Figure 1. Building endless aisle architecture for order processing

Check it out!

#9: Optimizing data with automated intelligent document processing solutions

Who else dreads wading through large amounts of data in multiple formats? Just me? I didn’t think so. Using Amazon AI/ML and content-reading services, Deependra, Anirudha, Bhajandeep, and Senaka have created a solution that is scalable and cost-effective to help you extract the data you need and store it in a format that works for you.

AI-based intelligent document processing engine

Figure 2: AI-based intelligent document processing engine

Check it out!

#8: Disaster Recovery Solutions with AWS managed services, Part 3: Multi-Site Active/Passive

Disaster recovery posts are always popular, and this post by Brent and Dhruv is no exception. Their creative approach in part 3 of this series is most helpful for customers who have business-critical workloads with higher availability requirements.

Warm standby with managed services

Figure 3. Warm standby with managed services

Check it out!

#7: Simulating Kubernetes-workload AZ failures with AWS Fault Injection Simulator

Continuing with the theme of “when bad things happen,” we have Siva, Elamaran, and Re’s post about preparing for workload failures. If resiliency is a concern (and it really should be), the secret is test, test, TEST.

Architecture flow for Microservices to simulate a realistic failure scenario

Figure 4. Architecture flow for Microservices to simulate a realistic failure scenario

Check it out!

#6: Let’s Architect! Designing event-driven architectures

Luca, Laura, Vittorio, and Zamira weren’t content with their four top-10 spots last year – they’re back with some things you definitely need to know about event-driven architectures.

Let's Architect

Figure 5. Let’s Architect artwork

Check it out!

#5: Use a reusable ETL framework in your AWS lake house architecture

As your lake house increases in size and complexity, you could find yourself facing maintenance challenges, and Ashutosh and Prantik have a solution: frameworks! The reusable ETL template with AWS Glue templates might just save you a headache or three.

Reusable ETL framework architecture

Figure 6. Reusable ETL framework architecture

Check it out!

#4: Invoking asynchronous external APIs with AWS Step Functions

It’s possible that AWS’ menagerie of services doesn’t have everything you need to run your organization. (Possible, but not likely; we have a lot of amazing services.) If you are using third-party APIs, then Jorge, Hossam, and Shirisha’s architecture can help you maintain a secure, reliable, and cost-effective relationship among all involved.

Invoking Asynchronous External APIs architecture

Figure 7. Invoking Asynchronous External APIs architecture

Check it out!

#3: Announcing updates to the AWS Well-Architected Framework

The Well-Architected Framework continues to help AWS customers evaluate their architectures against its six pillars. They are constantly striving for improvement, and Haleh’s diligence in keeping us up to date has not gone unnoticed. Thank you, Haleh!

Well-Architected logo

Figure 8. Well-Architected logo

Check it out!

#2: Let’s Architect! Designing architectures for multi-tenancy

The practically award-winning Let’s Architect! series strikes again! This time, Luca, Laura, Vittorio, and Zamira were joined by Federica to discuss multi-tenancy and why that concept is so crucial for SaaS providers.

Let's Architect

Figure 9. Let’s Architect

Check it out!

And finally…

#1: Understand resiliency patterns and trade-offs to architect efficiently in the cloud

Haresh, Lewis, and Bonnie revamped this 2022 post into a masterpiece that completely stole our readers’ hearts and is among the top posts we’ve ever made!

Resilience patterns and trade-offs

Figure 10. Resilience patterns and trade-offs

Check it out!

Bonus! Three older special mentions

These three posts were published before 2023, but we think they deserve another round of applause because you, our readers, keep coming back to them.

Thanks again to everyone for their contributions during a wild year. We hope you’re looking forward to the rest of 2024 as much as we are!

Announcing updates to the AWS Well-Architected Framework guidance

Post Syndicated from Haleh Najafzadeh original https://aws.amazon.com/blogs/architecture/announcing-updates-to-the-aws-well-architected-framework-guidance/

We are excited to announce the availability of improved AWS Well-Architected Framework guidance. In this update, we have made changes across all six pillars of the framework: Operational ExcellenceSecurityReliabilityPerformance EfficiencyCost Optimization, and Sustainability.

In this release, we have made the implementation guidance for the new and updated best practices more prescriptive, including enhanced recommendations and steps on reusable architecture patterns targeting specific business outcomes in the Amazon Web Services (AWS) Cloud.

A brief history

The Well-Architected Framework is a collection of best practices that allow customers to evaluate and improve the design, implementation, and operations of their workloads in the cloud.

In 2012, the first version of the framework was published, leading to the 2015 release of the guidance whitepaper. We added the Operational Excellence pillar in 2016. The pillar-specific whitepapers and AWS Well-Architected Lenses were released in 2017, and the following year, the AWS Well-Architected Tool was launched.

In 2020, Well-Architected Framework guidance had a new release, along with more lenses, as well as API integration with the AWS Well-Architected Tool. The sixth pillar, Sustainability, was added in 2021. In 2022, dedicated pages were introduced for each consolidated best practice across all six pillars, with several best practices updated with improved prescriptive guidance. By April 2023, more than 50% of the Framework’s best practices have had their prescriptive guidance improved.

A brief history of the AWS Well-Architected Framework

A brief history of the AWS Well-Architected Framework

What’s new

As customers mature in their journey, they are seeking guidance to achieve accurate solutions that is prescriptive to their business, environments, and workloads. AWS Well-Architected is committed to providing such information to customers by continually evolving and updating our guidance.

The content updates and improvements in this release focus on having more complete coverage across the AWS service portfolio, helping customers make more informed decisions when developing implementation plans. Services that were added or expanded in coverage include: AWS Elastic Disaster Recovery, AWS Trusted Advisor, AWS Resilience Hub, AWS Config, AWS Security Hub, Amazon GuardDuty, AWS Organizations, AWS Control Tower, AWS Compute Optimizer, AWS Budgets, Amazon CodeWhisperer, Amazon CodeGuru, Amazon EventBridge, Amazon CloudWatch, Amazon Simple Notification Service, AWS Systems Manager, Amazon ElastiCache, and AWS Global Accelerator.

Pillar updates

Operational Excellence

The Operational Excellence Pillar has received updates to two of the five Design Principles and has a new Design Principle on observability, which highlights its importance and relevance throughout the pillar content. All 10 best practices in OPS05 have been updated, and we have consolidated 28 best practices into 16, across four questions (OPS04, OPS06, OPS08, and OPS09), as well as improving prescriptive guidance.

Security

In the Security Pillar, the Incident response in SEC10 underwent an update to align with the AWS Security Incident Response Guide, while introducing one new best practice, and improving the prescriptive guidance for others. Two best practices in SEC08 and SEC09 have received improved prescriptive guidance on securing workloads at rest and in transit.

Reliability

The Reliability Pillar has received prescriptive guidance improvements to one best practice in REL06, and six best practices in REL11, focused on how to best monitor, failover, remediate, and limit impacts of failures. The update addresses a wide variety of managed services and designs, including multi-Region-based resilience.

Performance Efficiency

The Performance Efficiency Pillar has been completely restructured, consolidating and merging guidance to reduce the number of best practices by 10 and the number of questions by three. We have added best practices around efficient caching and optimizing hardware acceleration. We have also improved the implementation guidance in all 32 best practices of the newly restructured Pillar.

Cost Optimization

The Cost Optimization Pillar has 10 best practices with improved implementation prescriptive guidance.

Sustainability

The Sustainability Pillar has received updates to the risk levels of seven best practices.

Conclusion

This Well-Architected release includes updates and improvements to 90 best practices: Operational Excellence (26), Security (8), Reliability (7), Performance Efficiency (32), Cost Optimization (10), and Sustainability (7). These changes are in addition to the 151 improved best practices released in 2023 (127 in April 10, 2023; and 24 in July 13, 2023), resulting in more than 73% of the existing Framework best practices updated at least once in the last year.

As of this release, 100% of Performance Efficiency, Cost Optimization, and Sustainability; 63% of Operational Excellence; 60% of Security; and 50% of Reliability Pillar content have been refreshed at least once since October 2022.

The content is available in 11 languages: English, Spanish, French, German, Italian, Japanese, Korean, Indonesian, Brazilian Portuguese, Simplified Chinese, and Traditional Chinese.

Updates in this release are also available in the AWS Well-Architected Tool, which can be used to review your workloads, address important design considerations, and help ensure that you follow the best practices and guidance of the AWS Well-Architected Framework.

Ready to get started? Review the updated AWS Well-Architected Framework Pillar best practices, as well as pillar-specific whitepapers.

Have questions about some of the new best practices or most recent updates? Join our growing community on AWS re:Post.

Let’s Architect! Monitoring production systems at scale

Post Syndicated from Vittorio Denti original https://aws.amazon.com/blogs/architecture/lets-architect-monitoring-production-systems-at-scale/

“Everything fails, all the time” is a famous quote from Amazon’s Chief Technology Officer Werner Vogels. This means that software and distributed systems may eventually fail because something can always go wrong. We have to accept this and design our systems accordingly, test our software and services, and think about all the possible edge cases.

With this in mind, we should also set our teams up for success by providing visibility in every environment for a quick turnaround when incidents happen. When a system serves traffic in production, we need to monitor it to make sure it behaves as expected and that all components are healthy. But questions arise such as:

  • How do we monitor a system?
  • What is monitoring?
  • What are some architectural and engineering approaches to implement in order to design a successful monitoring strategy?

All of these questions require complex answers. It’s not possible to cover everything in a blog post, but let’s start exploring the topic and sharing resources to guide you through this domain.

In this edition of Let’s Architect! we share some practices for monitoring used at Amazon and AWS, as well as more resources to discover how to build monitoring solutions for the workloads running on AWS.

Observability best practices at Amazon

Observability and monitoring are engineering tasks that also require putting a suitable cultural mindset in place. At Amazon, if a service doesn’t run as expected, the team writes a CoE (Correction of Errors) document to analyze the issue and answer critical questions to learn from it. There are also weekly operations meetings to analyze operational and performance dashboards for each service.

The session introduced here covers the full range of monitoring at Amazon, from how teams assess system health at a high level to how they understand the details of a single request. Use this resource to learn some best practices for metrics, logs, and tracing, and using these signals to achieve operational excellence.

Take me to this re:Invent video!

Observability is an iterative process which requires us to establish a feedback loop and improve based on the signals coming from the system.

Build an observability solution using managed AWS services and the OpenTelemetry standard

Visibility of what’s happening in a distributed system is key to operationalize workloads at scale. OpenTelemetry is the standard for observability and AWS services are fully integrated with that. The blog post introduced in this section shows you how AWS Distro for OpenTelemetry (ADOT) works under the hood and how to use it with a Kubernetes cluster. But keep in mind, this is just one of the many implementations available for AWS compute services and OpenTelemetry—so even if you’re not using Kubernetes right now, we’ve still got you covered!

Want more? Watch this re:Invent video for an understanding of how to think about logging, tracing, metrics, and monitoring with AWS services, and the possibilities to provide the observability your distributed systems need. This is a great learning resource with many demos and examples.

Take me to this blog post!

Flow of metrics and traces from Application services to the Observability Platform.

Optimizing your AWS Batch architecture for scale with observability dashboards

We’ve explored the mental models and strategies for monitoring in previous resources. Now let’s see how these principles can be applied in a scenario where we run batch and ML computing jobs at scale. In the blog post introduced in this section, you can learn how to use runtime metrics to understand an architecture designed on AWS Batch for running batch computing jobs. AWS Batch is a fully managed service enabling you to run jobs at any scale without needing to manage underlying compute resources. This blog explains how AWS Batch works and guides you through the process used to design a monitoring framework.

Since the solution is open-source, you are free to add other custom metrics you find useful. To get started with the AWS Batch open-source observability solution, visit the project page on GitHub. Several customers have used this monitoring tool to optimize their workload for scale by reshaping their jobs, refining their instance selection, and tuning their AWS Batch architecture.

Take me to this blog!

High-level structure of AWS Batch resources and interactions. This diagram depicts a user submitting jobs based on a job definition template to a job queue, which then communicates to a compute environment that resources are needed.

Observability workshop

This resource provides a hands-on experience for you on the variety of toolsets AWS offers to set up monitoring and observability on your applications. Whether your workload is on-premises or on AWS—or your application is a giant monolith or based on modern microservices-based architecture—the observability tools can provide deeper insights into application performance and health.

The monitoring tools covered in this workshop provide powerful capabilities that enable you to identify bottlenecks, issues, and defects without having to manually sift through various logs, metrics, and trace data.

Take me to this workshop!

The diagram illustrates the various components of the PetAdoptions architecture. In the workshop you will learn how to monitor this application.

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about containers on AWS.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Announcing updates to the AWS Well-Architected Framework

Post Syndicated from Haleh Najafzadeh original https://aws.amazon.com/blogs/architecture/announcing-updates-to-the-aws-well-architected-framework-2/

We are excited to announce the availability of improved AWS Well-Architected Framework guidance. In this update, we have made changes across all six pillars of the framework: Operational ExcellenceSecurityReliabilityPerformance EfficiencyCost Optimization, and Sustainability.

A brief history

The AWS Well-Architected Framework is a collection of best practices that allow customers to evaluate and improve the design, implementation, and operations of their workloads in the cloud.

In 2012, the first version of the framework was published, leading to the 2015 release of the guidance whitepaper. We added the operational excellence pillar in 2016. The pillar-specific whitepapers and AWS Well-Architected Lenses were released in 2017, and, the following year, the AWS Well-Architected Tool was launched.

In 2020, the content for the Well-Architected Framework received a major update, as well as more lenses, and API integration with the AWS Well-Architected Tool. The sixth pillar, Sustainability, was added in 2021. In 2022, dedicated pages were introduced for each consolidated best practice across all six pillars, with several best practices updated with improved prescriptive guidance.

AWS Well-Architected timeline

AWS Well-Architected timeline

What’s new

Well-Architected Framework content is consistently updated and improved in order to adapt to the constantly changing and innovating AWS environment, with new and evolved emerging services and technologies. This ensures cloud architects can build and operate secure, high-performing, resilient, efficient, and sustainable systems in the AWS Cloud.

The content updates and improvements in this release focus on providing more complete coverage across the AWS service portfolio to help customers make more informed decisions when developing implementation plans. Services that were added or expanded in coverage include: AWS Elastic Disaster Recovery, AWS Trusted Advisor, AWS Resilience Hub, AWS Config, AWS Security Hub, Amazon GuardDuty, AWS Organizations, AWS Control Tower, AWS Compute Optimizer, AWS Budgets, Amazon CodeWhisperer, and Amazon CodeGuru.

Pillar updates

The Operational Excellence Pillar has a new best practice on enabling support plans for production workloads. This Pillar also has a major update on defining a customer communication plan for outages.

In the Security Pillar, we added a new best practice area, Application Security (AppSec). AppSec is complete with eight new best practices to guide customers as they develop, test, and release software, providing guidance on how to consider the tools, testing, and organizational approach used to develop software.

The Reliability Pillar has a new best practice on architecting workloads to meet availability targets and uptime service-level agreements (SLAs). We also added the resilience shared responsibility model to its introduction section.

The Cost Optimization Pillar has new best practices on automating operations as a part of cost-optimization efforts and enforcing data-retention policies.

In the Sustainability Pillar, we introduced a clear process for selecting Regions, as well as tools for right-sizing services and improving the overall utilization of resources in the AWS Cloud.

Best practice updates

The implementation guidance and best practices have been updated in this release to be more prescriptive, including enhanced recommendations and steps on reusable architecture patterns targeting specific business outcomes in the AWS Cloud.

As many as 113 best practices are updated with more prescriptive guidance in Operational Excellence (22), Security (18), Reliability (14), Performance Efficiency (10), Cost Optimization (22), and Sustainability (27). Fourteen new best practices have been introduced in Operational Excellence (1), Security (9), Reliability (1), Cost Optimization (2), and Sustainability (1).

From a total of 127 new/updated best practices, 78% include explicit implementation steps as part of making them more prescriptive. The remaining 22% have been updated by improving their existing implementation steps. These changes are in addition to the 51 improved best practices released in 2022 (18 in Q3 2022, and 33 in Q4 2022), resulting in more than 50% of the existing Framework best practices having been updated recently.

The content is available in 11 languages: English, Spanish, French, German, Italian, Japanese, Korean, Indonesian, Brazilian Portuguese, Simplified Chinese, and Traditional Chinese.

Here is the list of best practices that are new or updated in this release:

  • Operational Excellence: OPS01-BP03, OPS01-BP04, OPS02-BP01, OPS02-BP06, OPS02-BP07, OPS03-BP04, OPS03-BP05, OPS04-BP01, OPS04-BP03, OPS04-BP04, OPS04-BP05, OPS05-BP02, OPS05-BP06, OPS05-BP07, OPS07-BP01, OPS07-BP05, OPS07-BP06, OPS08-BP02, OPS08-BP03, OPS08-BP04, OPS10-BP05, OPS11-BP01, OPS11-BP04
  • Security: SEC01-BP01, SEC01-BP02, SEC01-BP07, SEC02-BP01, SEC02-BP02, SEC02-BP03, SEC02-BP05, SEC03-BP02, SEC03-BP04, SEC03-BP07, SEC03-BP09, SEC04-BP01, SEC05-BP01, SEC06-BP01, SEC07-BP01, SEC08-BP04, SEC08-BP02, SEC09-BP02, SEC03-BP08, SEC11-BP01, SEC11-BP02, SEC11-BP03, SEC11-BP04, SEC11-BP05, SEC11-BP06, SEC11-BP07, SEC11-BP08
  • Reliability: REL01-BP01, REL01-BP02, REL01-BP03, REL01-BP04, REL01-BP06, REL02-BP01, REL09-BP01, REL09-BP02, REL09-BP03, REL09-BP04, REL10_BP04, REL10-BP03, REL11-BP07, REL13-BP02, REL13-BP03
  • Performance Efficiency: PERF02-BP06, PERF05_BP03, PERF05-BP02, PERF05-BP04, PERF05-BP05, PERF05-BP06, PERF05-BP07, PFRF04-BP04, PERF02_BP04, PERF02_BP05
  • Cost Optimization: COST02_BP01, COST02_BP02, COST02_BP03, COST02_BP05, COST03_BP02, COST03_BP04, COST03_BP05, COST04_BP01, COST04_BP02, COST04_BP03, COST04_BP04, COST04_BP05, COST05_BP03, COST05_BP05, COST05_BP06, COST06_BP01, COST06_BP03, COST07_BP01, COST07_BP02, COST07_BP05, COST09_BP03, COST10_BP01, COST10_BP02, COST11_BP01
  • Sustainability: SUS01_BP01, SUS02_BP01, SUS02_BP02, SUS02_BP03, SUS02_BP04, SUS02_BP05, SUS02_BP06, SUS03_BP01, SUS03_BP02, SUS03_BP03, SUS03_BP04, SUS03_BP05, SUS04_BP01, SUS04_BP02, SUS04_BP03, SUS04_BP04, SUS04_BP05, SUS04_BP06, SUS04_BP07, SUS04_BP08, SUS05_BP01, SUS05_BP02, SUS05_BP03, SUS05_BP04, SUS06_BP01, SUS06_BP02, SUS06_BP03, SUS06_BP04

Updates in this release are also available in the AWS Well-Architected Tool, which can be used to review your workloads, address important design considerations, and help ensure that you follow the best practices and guidance of the AWS Well-Architected Framework.

Ready to get started? Review the updated AWS Well-Architected Framework Pillar best practices, as well as pillar-specific whitepapers.

Have questions about some of the new best practices or most recent updates? Join our growing community on AWS re:Post.

AWS Week in Review: Public Preview of Amazon DataZone and AWS DataSync Updates – April 3, 2023

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-week-in-review-public-preview-of-amazon-datazone-and-aws-datasync-updates-april-3-2023/

Last weekend, I enjoyed the spring vibes at Seoul Forest, a large park in the middle of Seoul city, where cherry blossoms are in full bloom.

Compared to last year, there were crowds of people, so I realized that it was really back to normal after the pandemic. I hope you all enjoy the season of spring or fall with your family.

Last Week’s Launches
Like an April Fool’s Day joke, there were 65 launches last week, far more than usual. AWS product teams are working hard with a customer obsession.

So, I had a lot of trouble choosing the important ones. Other than the ones I’ve picked out, there may be important feature releases that fit your needs. Be sure to take a look at the full launches list in the last week.

First, here is a list of the general availability of AWS services and features treated by AWS News Blog:

Let’s take a look at some launches from the last week that I want to remind you of:

The Preview of Amazon DataZone – At AWS re:Invent 2022, we preannounced Amazon DataZone, a new data management service to catalog, discover, analyze, share, and govern data between data producers and consumers in the organization. You can now try out the public preview of Amazon DataZone.

Data producers populate the business data catalog from AWS Glue Data Catalog and Amazon Redshift tables. Data consumers search for and subscribe to data assets in the data catalog and analyze with tools such as Amazon Athena query editors in the Amazon DataZone portal. To get started with Amazon DataZone, see our Quick Start Guide to include sample datasets to implement a complete use case.

AWS DataSync Supports Azure Blob Storage in PreviewAWS DataSync supports copying your object data at scale from Azure Blob Storage to AWS storage services such as Amazon S3. AWS DataSync supports all blob types within Azure Blob Storage and can also be used with Azure Data Lake Storage (ADLS) Gen 2.

In addition to Azure Blob Storage, DataSync supports Google Cloud Storage and Azure Files storage locations as well as various general storage systems and AWS storage services. To learn more, see Migrating Azure Blob Storage to Amazon S3 using AWS DataSync in the AWS Storage Blog.

On-call schedules with AWS Systems Manager Incident Manager – You can now configure or change on-call rotation schedules with a group of contacts and have 24/7 coverage and responsiveness for critical issues in the Incident Manager console.

AWS Incident Manager helps you bring the right people and information together when a critical issue is detected, activating preconfigured response plans to engage responders using SMS, phone calls, and chat channels, as well as to run AWS Systems Manager Automation runbooks. To learn how to get started with an-call schedules in Incident Manager, see our Working with on-call schedules in Incident Manager in the AWS documentation.

AWS CloudShell Colsone Toolbar – You can now use AWS Cloudshell Console Toolbar with AWS Management Console in a single view. The Console Toolbar maintains its state (e.g., open, closed) and commands will continue to run in CloudShell as you navigate between services in the Console. For example, it allows you to run a command in CloudShell and view a CloudWatch alarm in the Console at the same time.

After signing into the Console, you can access CloudShell in the lower left of the Console by selecting the CloudShell icon in the Console Toolbar.

New Features of AWS Well-Architected Tool – The Consolidated Report and Enhanced Search enable customers to quickly identify risk themes across their workloads and scale improvements across their organization. This macro-level view helps executive stakeholders understand where common issues lie and prioritize team resources to drive widespread improvement. To learn more, see AWS Well-Architected Tool Dashboard in the AWS documentation.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some other news items that you may find interesting from the last week:

Welcome to the .NET on AWS Blog – We launched a new blog channel for millions of .NET developers across the world. Blog posts will also cover built-for-the-cloud development, modernizing .NET Framework applications, and how to deploy .NET workloads on different AWS services. We will use this channel to share news on the work we’ve done with the .NET open-source community, post follow-ups from important events, and post announcements about upcoming presentations from our .NET developer advocates. To learn more, visit our .NET on AWS website and follow us on Twitter at @dotnetonAWS.

AWS Knowledge Center in AWS re:Post – You can now access trusted, authoritative articles and videos of AWS Knowledge Center on AWS re:Post to get answers to technical questions. Knowledge Center content is produced by an AWS team and covers the most frequent questions and requests from AWS customers. These articles are available in 10 localized languages: English, French, German, Italian, Japanese, Korean, Portuguese, Simplified Chinese, Spanish, and Traditional Chinese.

TF1’s FIFA Worldcup Digital Broadcasting Story – Sébastien shared an awesome story about how the French broadcaster TF1 use AWS Cloud technology and expertise to bring the FIFA World Cup to millions of people. He shared the history of redesigning its digital broadcasting architecture on AWS, testing the new platform on large-scale sporting events. For the preparation of the FIFA Worldcup event, TF1 enhanced monitoring to detect anomalies during the event and established the backup plan in a “war room” for the worst scenario. Even if you’re not a fan of football, I recommend reading the behind-the-scenes of the FIFA Worldcup Finals. It’s long but really fun!

Upcoming AWS Events
Check your calendars and sign up for these AWS-led events:

AWS re:Inforce 2023 – Now register AWS re:Inforce, in Anaheim, California, June 13–14. AWS Chief Information Security Officer CJ Moses will share the latest innovations in cloud security and what AWS Security is focused on. The breakout sessions will provide real-world examples of how security is embedded into the way businesses operate. To learn more and get the limited discount code to register, see CJ’s blog post of Gain insights and knowledge at AWS re:Inforce 2023 in the AWS Security Blog.

AWS Global Summits – Check your calendars and sign up for the AWS Summit closest to your city: Paris and Sydney (April 4), Seoul (May 3-4), Berlin and Singapore (May 4), Stockholm (May 11), Hong Kong (May 23), Amsterdam (June 1), London (June 7), Madrid (June 15), and Milano (June 22).

AWS Community Day – Join community-led conferences driven by AWS user group leaders closest to your city: Peru (April 15), Helsinki (April 20), Chicago (June 15), Philippines (June 29–30), and Munich (September 14). Recently, we are bringing together AWS user groups from around the world into Meetup Pro accounts. Find your group and its meetups in your city!

You can browse all upcoming AWS-led in-person and virtual events, and developer-focused events such as AWS DevDay.

That’s all for this week. Check back next Monday for another Week in Review!

— Channy

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Let’s Architect! Streamlining business with migration and modernization

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-streamlining-business-with-migration-and-modernization/

Many customers migrate their systems to Amazon Web Services (AWS) to increase their competitive edge and drive business value. To maximize the benefits of a cloud migration, companies tend to move their applications in conjunction with modernization initiatives. These joined efforts help your applications gain more agility, scalability, and resilience. Modernizing the portfolio of workloads with AWS means that you can re-platform, refactor, or replace these workloads by using containers, serverless technologies, purpose-built data stores, and software automation. These functionalities allow you to benefit from the best of the AWS agility and total cost optimization (TCO) benefits.

In this edition of Let’s Architect! we share hands-on activities, customer stories, and tips and tricks to migrate and modernize your applications with AWS.

Migrating to the cloud: What is the cost of doing nothing?

Would you think that small companies always migrate faster than large enterprises? Actually, cloud migration speed doesn’t necessarily depend on the size of the business! Company size is not a clear indicator of migration and modernization success, but a shift of culture and mindset is essential for successful company evolution.

When it comes to migration, the cost of doing nothing is not just financial: Businesses can also expect a slower pace of innovation and a higher security burden. This video analyzes the financial benefits of migration and shares mental models for approaching an AWS cloud migration, and Marriott team members explain how they planned their migration and the lessons learned along the way.

Take me to this re:Invent 2022 video!

Benefits of an early migration start

Benefits of an early migration start

Modernization pathways for a legacy .NET Framework monolithic application on AWS

Organizations aim to deliver the best technological solutions based on customer needs. At any stage in their cloud adoption journey, businesses often end up managing and building monolithic applications. Let’s explore a migration path for a monolithic .NET Framework application to a modern microservices-based stack on AWS, and discuss AWS tools to break the monolith into microservices and containerize applications.

Cost optimization is another key factor for modernizing your workloads and solutions include moving to Linux-based systems or using open-source database engines. This Migrate and Modernize enterprise workloads with AWS video walks you through the process of migrating and modernizing enterprise workloads with AWS.

Take me to this blog post with more detail!

A modernized microservices-based rearchitecture

A modernized microservices-based rearchitecture

Implementing a serverless-first strategy in an enterprise

Organizations of all sizes want to benefit from the agility, cost savings, and developer experience that serverless architectures can provide on AWS. For large enterprises, the return on investment (ROI) can be massive, but overcoming architecture inertia while ensuring security best practices and governance stay in place is a hurdle that many struggle with. In this lightning talk, learn how your organization can implement a serverless-first strategy to overcome these obstacles. Delta Air Lines shares the story of making serverless-first a reality as part of their AWS journey.

Take me to this video

Benefits of serverless

Benefits of serverless

Application Migration with AWS

This workshop shows you how to migrate and modernize a fictional application to the AWS Cloud by:

  1. Performing a database migration
  2. Migrating and modernizing your web server using different migration strategies (for example, breaking down the monolith into containers)
  3. Teaching you how to improve Operation excellence, Security, Performance efficiency, and Cost optimization of the deployed architecture by following these pillars of the AWS Well-Architected Framework.

Take me to this workshop!

Different migration strategies for web servers

Different migration strategies for web servers

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about distributed systems with containers.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Let’s Architect! Architecting a data mesh

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-architecting-a-data-mesh/

Data architectures were mainly designed around technologies rather than business domains in the past. This changed in 2019, when Zhamak Dehghani introduced the data mesh. Data mesh is an application of the Domain-Driven-Design (DDD) principles to data architectures: Data is organized into data domains and the data is the product that the team owns and offers for consumption.

A data mesh architecture unites the disparate data sources within an organization through centrally managed data-sharing and governance guidelines. Business functions can maintain control over how shared data is accessed because data mesh also solves advanced data security challenges through distributed, decentralized ownership.

This edition of Let’s Architect! introduces data mesh, highlights the foundational concepts of data architectures, and covers the patterns for designing a data mesh in the AWS cloud with supporting resources.

Data lakes, lake houses and data mesh: what, why, and how?

Let’s explore a video introduction to data lakes, lake houses, and data mesh. This resource explains how to leverage those concepts to gain greater data insights across different business segments, with a special focus on best practices to build a well-architected, modern data architecture on AWS. It also gives an overview of the AWS cloud services that can be used to create such architectures and describes the fundamental pillars of designing them.

Take me to this intro to data lakes, lake houses, and data mesh video!

Data mesh is an architecture pattern where data are organized into domains and seen as products to expose for consumption

Data mesh is an architecture pattern where data are organized into domains and seen as products to expose for consumption

Building data mesh architectures on AWS

Knowing what a data mesh architecture is, here is a step-by-step video from re:Invent 2022 on designing one. It covers a use case on how GoDaddy considered and implemented data mesh, in addition to:

  • The fundamental pillars behind a well-architected data mesh in the cloud
  • Finding an approach to build a data mesh architecture using native AWS services
  • Reasons for considering a data mesh architecture where data lakes provide limitations in some scenarios
  • How data mesh can be applied in practice to overcome them
  • The mental models to apply during the data mesh design process

Take me to this re:Invent 2022 video!

In the data mesh architecture the producers expose their data for consumption to the consumers. Access is regulated through a centralized governance layer.

In the data mesh architecture the producers expose their data for consumption to the consumers. Access is regulated through a centralized governance layer.

Amazon DataZone: Democratize data with governance

Now let’s explore data accessibility as it relates to data mesh architectures.

Amazon DataZone is a new AWS business data catalog allowing you to unlock data across organizational boundaries with built-in governance. This service provides a unified environment where everyone in an organization—from data producers to data consumers—can access, share, and consume data in a governed manner.

Here is a video to learn how to apply AWS analytics services to discover, access, and share data across organizational boundaries within the context of a data mesh architecture.

Take me to this re:Invent 2022 video!

Amazon DataZone accelerates the adoption of the data mesh pattern by making it scalable to high number of producers and consumers.

Amazon DataZone accelerates the adoption of the data mesh pattern by making it scalable to high number of producers and consumers.

Build a data mesh on AWS

Feeling inspired to build? Hands-on experience is a great way to learn and see how the theoretical concepts apply in practice.

This workshop teaches you a data mesh architecture building approach on AWS. Many organizations are interested in implementing this architecture to:

  1. Move away from centralized data lakes to decentralized ownership
  2. Deliver analytics solutions across business units

Learn how a data mesh architecture can be implemented with AWS native services.

Take me to this workshop!

The diagrams shows how to separate the producers, consumers and governance components through a multi-account strategy.

The diagrams shows how to separate the producers, consumers and governance components through a multi-account strategy.

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about monitoring and observability.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Introducing AWS Lambda Powertools for .NET

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-aws-lambda-powertools-for-net/

This blog post is written by Amir Khairalomoum, Senior Solutions Architect.

Modern applications are built with modular architectural patterns, serverless operational models, and agile developer processes. They allow you to innovate faster, reduce risk, accelerate time to market, and decrease your total cost of ownership (TCO). A microservices architecture comprises many distributed parts that can introduce complexity to application observability. Modern observability must respond to this complexity, the increased frequency of software deployments, and the short-lived nature of AWS Lambda execution environments.

The Serverless Applications Lens for the AWS Well-Architected Framework focuses on how to design, deploy, and architect your serverless application workloads in the AWS Cloud. AWS Lambda Powertools for .NET translates some of the best practices defined in the serverless lens into a suite of utilities. You can use these in your application to apply structured logging, distributed tracing, and monitoring of metrics.

Following the community’s continued adoption of AWS Lambda Powertools for Python, Java, and TypeScript, AWS Lambda Powertools for .NET is now generally available.

This post shows how to use the new open source Powertools library to implement observability best practices with minimal coding. It walks through getting started, with the provided examples available in the Powertools GitHub repository.

About Powertools

Powertools for .NET is a suite of utilities that helps with implementing observability best practices without needing to write additional custom code. It currently supports Lambda functions written in C#, with support for runtime versions .NET 6 and newer. Powertools provides three core utilities:

  • Tracing provides a simpler way to send traces from functions to AWS X-Ray. It provides visibility into function calls, interactions with other AWS services, or external HTTP requests. You can add attributes to traces to allow filtering based on key information. For example, when using the Tracing attribute, it creates a ColdStart annotation. You can easily group and analyze traces to understand the initialization process.
  • Logging provides a custom logger that outputs structured JSON. It allows you to pass in strings or more complex objects, and takes care of serializing the log output. The logger handles common use cases, such as logging the Lambda event payload, and capturing cold start information. This includes appending custom keys to the logger.
  • Metrics simplifies collecting custom metrics from your application, without the need to make synchronous requests to external systems. This functionality allows capturing metrics asynchronously using Amazon CloudWatch Embedded Metric Format (EMF) which reduces latency and cost. This provides convenient functionality for common cases, such as validating metrics against CloudWatch EMF specification and tracking cold starts.

Getting started

The following steps explain how to use Powertools to implement structured logging, add custom metrics, and enable tracing with AWS X-Ray. The example application consists of an Amazon API Gateway endpoint, a Lambda function, and an Amazon DynamoDB table. It uses the AWS Serverless Application Model (AWS SAM) to manage the deployment.

When you send a GET request to the API Gateway endpoint, the Lambda function is invoked. This function calls a location API to find the IP address, stores it in the DynamoDB table, and returns it with a greeting message to the client.

Example application

Example application

The AWS Lambda Powertools for .NET utilities are available as NuGet packages. Each core utility has a separate NuGet package. It allows you to add only the packages you need. This helps to make the Lambda package size smaller, which can improve the performance.

To implement each of these core utilities in a separate example, use the Globals sections of the AWS SAM template to configure Powertools environment variables and enable active tracing for all Lambda functions and Amazon API Gateway stages.

Sometimes resources that you declare in an AWS SAM template have common configurations. Instead of duplicating this information in every resource, you can declare them once in the Globals section and let your resources inherit them.

Logging

The following steps explain how to implement structured logging in an application. The logging example shows you how to use the logging feature.

To add the Powertools logging library to your project, install the packages from NuGet gallery, from Visual Studio editor, or by using following .NET CLI command:

dotnet add package AWS.Lambda.Powertools.Logging

Use environment variables in the Globals sections of the AWS SAM template to configure the logging library:

  Globals:
    Function:
      Environment:
        Variables:
          POWERTOOLS_SERVICE_NAME: powertools-dotnet-logging-sample
          POWERTOOLS_LOG_LEVEL: Debug
          POWERTOOLS_LOGGER_CASE: SnakeCase

Decorate the Lambda function handler method with the Logging attribute in the code. This enables the utility and allows you to use the Logger functionality to output structured logs by passing messages as a string. For example:

[Logging]
public async Task<APIGatewayProxyResponse> FunctionHandler
         (APIGatewayProxyRequest apigProxyEvent, ILambdaContext context)
{
  ...
  Logger.LogInformation("Getting ip address from external service");
  var location = await GetCallingIp();
  ...
}

Lambda sends the output to Amazon CloudWatch Logs as a JSON-formatted line.

{
  "cold_start": true,
  "xray_trace_id": "1-621b9125-0a3b544c0244dae940ab3405",
  "function_name": "powertools-dotnet-tracing-sampl-HelloWorldFunction-v0F2GJwy5r1V",
  "function_version": "$LATEST",
  "function_memory_size": 256,
  "function_arn": "arn:aws:lambda:eu-west-2:286043031651:function:powertools-dotnet-tracing-sample-HelloWorldFunction-v0F2GJwy5r1V",
  "function_request_id": "3ad9140b-b156-406e-b314-5ac414fecde1",
  "timestamp": "2022-02-27T14:56:39.2737371Z",
  "level": "Information",
  "service": "powertools-dotnet-sample",
  "name": "AWS.Lambda.Powertools.Logging.Logger",
  "message": "Getting ip address from external service"
}

Another common use case, especially when developing new Lambda functions, is to print a log of the event received by the handler. You can achieve this by enabling LogEvent on the Logging attribute. This is disabled by default to prevent potentially leaking sensitive event data into logs.

[Logging(LogEvent = true)]
public async Task<APIGatewayProxyResponse> FunctionHandler
         (APIGatewayProxyRequest apigProxyEvent, ILambdaContext context)
{
  ...
}

With logs available as structured JSON, you can perform searches on this structured data using CloudWatch Logs Insights. To search for all logs that were output during a Lambda cold start, and display the key fields in the output, run following query:

fields coldStart='true'
| fields @timestamp, function_name, function_version, xray_trace_id
| sort @timestamp desc
| limit 20
CloudWatch Logs Insights query for cold starts

CloudWatch Logs Insights query for cold starts

Tracing

Using the Tracing attribute, you can instruct the library to send traces and metadata from the Lambda function invocation to AWS X-Ray using the AWS X-Ray SDK for .NET. The tracing example shows you how to use the tracing feature.

When your application makes calls to AWS services, the SDK tracks downstream calls in subsegments. AWS services that support tracing, and resources that you access within those services, appear as downstream nodes on the service map in the X-Ray console.

You can instrument all of your AWS SDK for .NET clients by calling RegisterXRayForAllServices before you create them.

public class Function
{
  private static IDynamoDBContext _dynamoDbContext;
  public Function()
  {
    AWSSDKHandler.RegisterXRayForAllServices();
    ...
  }
  ...
}

To add the Powertools tracing library to your project, install the packages from NuGet gallery, from Visual Studio editor, or by using following .NET CLI command:

dotnet add package AWS.Lambda.Powertools.Tracing

Use environment variables in the Globals sections of the AWS SAM template to configure the tracing library.

  Globals:
    Function:
      Tracing: Active
      Environment:
        Variables:
          POWERTOOLS_SERVICE_NAME: powertools-dotnet-tracing-sample
          POWERTOOLS_TRACER_CAPTURE_RESPONSE: true
          POWERTOOLS_TRACER_CAPTURE_ERROR: true

Decorate the Lambda function handler method with the Tracing attribute to enable the utility. To provide more granular details for your traces, you can use the same attribute to capture the invocation of other functions outside of the handler. For example:

[Tracing]
public async Task<APIGatewayProxyResponse> FunctionHandler
         (APIGatewayProxyRequest apigProxyEvent, ILambdaContext context)
{
  ...
  var location = await GetCallingIp().ConfigureAwait(false);
  ...
}

[Tracing(SegmentName = "Location service")]
private static async Task<string?> GetCallingIp()
{
  ...
}

Once traffic is flowing, you see a generated service map in the AWS X-Ray console. Decorating the Lambda function handler method, or any other method in the chain with the Tracing attribute, provides an overview of all the traffic flowing through the application.

AWS X-Ray trace service view

AWS X-Ray trace service view

You can also view the individual traces that are generated, along with a waterfall view of the segments and subsegments that comprise your trace. This data can help you pinpoint the root cause of slow operations or errors within your application.

AWS X-Ray waterfall trace view

AWS X-Ray waterfall trace view

You can also filter traces by annotation and create custom service maps with AWS X-Ray Trace groups. In this example, use the filter expression annotation.ColdStart = true to filter traces based on the ColdStart annotation. The Tracing attribute adds these automatically when used within the handler method.

View trace attributes

View trace attributes

Metrics

CloudWatch offers a number of included metrics to help answer general questions about the application’s throughput, error rate, and resource utilization. However, to understand the behavior of the application better, you should also add custom metrics relevant to your workload.

The metrics utility creates custom metrics asynchronously by logging metrics to standard output using the Amazon CloudWatch Embedded Metric Format (EMF).

In the sample application, you want to understand how often your service is calling the location API to identify the IP addresses. The metrics example shows you how to use the metrics feature.

To add the Powertools metrics library to your project, install the packages from the NuGet gallery, from the Visual Studio editor, or by using the following .NET CLI command:

dotnet add package AWS.Lambda.Powertools.Metrics

Use environment variables in the Globals sections of the AWS SAM template to configure the metrics library:

  Globals:
    Function:
      Environment:
        Variables:
          POWERTOOLS_SERVICE_NAME: powertools-dotnet-metrics-sample
          POWERTOOLS_METRICS_NAMESPACE: AWSLambdaPowertools

To create custom metrics, decorate the Lambda function with the Metrics attribute. This ensures that all metrics are properly serialized and flushed to logs when the function finishes its invocation.

You can then emit custom metrics by calling AddMetric or push a single metric with a custom namespace, service and dimensions by calling PushSingleMetric. You can also enable the CaptureColdStart on the attribute to automatically create a cold start metric.

[Metrics(CaptureColdStart = true)]
public async Task<APIGatewayProxyResponse> FunctionHandler
         (APIGatewayProxyRequest apigProxyEvent, ILambdaContext context)
{
  ...
  // Add Metric to capture the amount of time
  Metrics.PushSingleMetric(
        metricName: "CallingIP",
        value: 1,
        unit: MetricUnit.Count,
        service: "lambda-powertools-metrics-example",
        defaultDimensions: new Dictionary<string, string>
        {
            { "Metric Type", "Single" }
        });
  ...
}

Conclusion

CloudWatch and AWS X-Ray offer functionality that provides comprehensive observability for your applications. Lambda Powertools .NET is now available in preview. The library helps implement observability when running Lambda functions based on .NET 6 while reducing the amount of custom code.

It simplifies implementing the observability best practices defined in the Serverless Applications Lens for the AWS Well-Architected Framework for a serverless application and allows you to focus more time on the business logic.

You can find the full documentation and the source code for Powertools in GitHub. We welcome contributions via pull request, and encourage you to create an issue if you have any feedback for the project. Happy building with AWS Lambda Powertools for .NET.

For more serverless learning resources, visit Serverless Land.

Let’s Architect! Architecture tools

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-architecture-tools/

Tools, such as diagramming software, low-code applications, and frameworks, make it possible to experiment quickly. They are essential in today’s fast-paced and technology-driven world. From improving efficiency and accuracy, to enhancing collaboration and creativity, a well-defined set of tools can make a significant impact on the quality and success of a project in the area of software architecture.

As an architect, you can take advantage of a wide range of resources to help you build solutions that meet the needs of your organization. For example, with tools in the likes of the Amazon Web Services (AWS) Solutions Library and Serverless Land, you can boost your knowledge and productivity while working on event-driven architectures, microservices, and stateless computing.

In this Let’s Architect! edition, we explore how to incorporate these patterns into your architecture, and which tools to leverage to build solutions that are scalable, secure, and cost-effective.

How AWS Application Composer helps your team build great apps

In this re:Invent 2022 session, Chase Douglas, Principal Engineer at AWS, speaks about AWS Application Composer, a newly launched service.

This service has the potential to change the way architects design solutions—without writing a single line of code! The service is user-friendly, intuitive, and requires no prior coding experience. It allows users to scaffold a serverless architecture, defining a CloudFormation template visually with drag-and-drop. A detailed AWS Compute Blog post takes readers through the process of using AWS Application Composer.

Take me to this re:Invent 2022 video!

How an architecture can be designed with AWS Application Composer

How an architecture can be designed with AWS Application Composer

AWS design + build tools

When migrating to the cloud, we suggest referencing these four tried-and-true AWS resources that can be used to design and build projects.

  1. AWS Workshops are created by AWS teams to provide opportunities for hands-on learning to develop practical skills. Workshops are available in multiple categories and for skill levels 100-400.
  2. AWS Architecture Center contains a collection of best practices and architectural patterns for designing and deploying cloud-based solutions using AWS services. Furthermore, it includes detailed architecture diagrams, whitepapers, case studies, and other resources that provide a wealth of information on how to design and implement cloud solutions.
  3. Serverless Land (an Amazon property) brings together various patterns, workflows, code snippets, and blog posts pertaining to AWS serverless architectures.
  4. AWS Solutions Library provides customers with templates, tools, and automated workflows to easily deploy, operate, and manage common use cases on the AWS Cloud.
Inside event-driven architectures designed by David Boyne on Serverless Land

Inside event-driven architectures designed by David Boyne on Serverless Land

The Well-Architected way

In this session, the AWS Well-Architected provides guidance on how to implement the architectural models reported in the AWS Well-Architected Framework within your organization at scale.

Discover a customer story and understand how to use the features of the AWS Well-Architected Tool and APIs to receive recommendations based on your workload and measure your architectural metrics. In the Framework whitepaper, you can explore the six pillars of Well-Architected (operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability) and best practices to achieve them.

Understanding the key design pillars can help architects make informed design decisions, leading to more robust and efficient solutions. This knowledge also enables architects to identify potential problems early on in the design process and find appropriate patterns to address those issues.

Take me to the Well-Architected video!

Discover how the AWS Well-Architected Framework can help you design scalable, maintainable, and reusable solutions

Discover how the AWS Well-Architected Framework can help you design scalable, maintainable, and reusable solutions

See you next time!

Thanks for exploring architecture tools and resources with us!

Join us next time when we’ll talk about data mesh architecture!

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Accelerating Well-Architected Framework reviews using integrated AWS Trusted Advisor insights

Post Syndicated from Stephen Salim original https://aws.amazon.com/blogs/architecture/accelerating-well-architected-framework-reviews-using-integrated-aws-trusted-advisor-insights/

In this blog, we will explain how the new AWS Well-Architected integration with AWS Trusted Advisor can give you insights that help you create a flywheel effect to accelerate your cloud optimization. Customers that have the most success in their cloud adoption recognize that optimizing their cloud architecture and operations is not a one-time effort. Optimization is a continuous improvement virtuous cycle based on learning architectural and operational best practices, measuring workloads against these best practices, and implementing improvements based on opportunities recognized from measurement.

Customers can use the AWS Well-Architected Framework to build a “learn, measure, and improve” continuous improvement virtuous cycle (Figure 1). With the AWS Well-Architected Tool, customers can measure their workloads against these AWS best practices to identify improvement opportunities or risks they should address. After customers complete Well-Architected Framework Reviews (WAFRs) they can generate improvement plans with prioritized guidance and resources for improvement. They can also track the improvements made over time using the milestones feature in the Well-Architected Tool.

Continuous optimization of workloads based on AWS best practices

Figure 1. Continuous optimization of workloads based on AWS best practices

Amazon uses the term flywheel to describe a virtuous cycle that has additional drivers to add momentum, which accelerates the cycle and the value it delivers. Figure 2 is the often-referenced Amazon retail flywheel, which shows how Amazon’s focus on customer experience drives growth. It is accelerated by creating a lower cost structure, which allows Amazon to pass lower prices to its customers, improving customer experience and driving faster growth.

Amazon Flywheel concept of scaling growth

Figure 2. The Amazon Flywheel concept of scaling growth

Customers can add momentum to an AWS Well-Architected “learn, measure, and improve” virtuous cycle using tools that give more insights while measuring workloads. Improved insights result in consistent measurements, that are more efficient and more accurate. This accelerates the optimization cycle by reducing the time required to measure workloads. Collecting information on AWS resources using Trusted Advisor checks allows customers to validate if a workload’s state is aligned with AWS best practices. The new AWS Well-Architected Tool integration with AWS Trusted Advisor makes it easier and faster to gain insights during WAFRs. The Trusted Advisor checks that are relevant to a specific set of best practices have been mapped to the corresponding questions in Well-Architected. The new feature now shows the mapped Trusted Advisor checks directly in the Well-Architected Tool. These insights help customers run WAFRs in less time, with more accuracy, creating a flywheel effect (Figure 3).

Insights from AWS Trusted Advisor create acceleration in achieving improved outcomes

Figure 3. Insights from AWS Trusted Advisor create acceleration in achieving improved outcomes

AWS Well-Architected Tool integration with AWS Trusted Advisor: feature example

In the following sections, we detail an example scenario on how to use the integration with Trusted Advisor to gain insights when measuring your workloads.

Enabling the AWS Well-Architected Tool integration with AWS Trusted Advisor

How to enable the new feature in your workload:

  1. Create a new workload in the AWS Well-Architected Console. Refer to the user guide for detailed instructions.

    Optional
    : When defining a workload, within the “Application” section of workload definition, you can now also specify the AWS Service Catalog AppRegistry AWS Resource Name (ARN). This field is to indicate a relationship between the AWS Well-Architected Tool workload and the AWS resources in an AppRegistry Application when performing a Well-Architected Framework Review (Figure 4).

    Application field to select AWS Service Catalog AppRegistry ARN

    Figure 4. Application field to select AWS Service Catalog AppRegistry ARN

    This is another new AWS Well-Architected Tool feature that launched along with the integration with Trusted Advisor feature. You can find out more details about the integration with AWS Service Catalog AppRegistry in the What’s New post and on the feature documentation page. For details on how to create an AWS Service Catalog AppRegistry Application refer to Creating applications.

  2. To enable the integration with Trusted Advisor, after the necessary workload information has been entered, within the “AWS Trusted Advisor” section, tick on “Activate Trusted Advisor” (Figure 5).
    Enabling the Trusted Advisor feature

    Figure 5. Enabling the AWS Trusted Advisor feature

    Optional: Once the workload is created, note the workload ARN. You can find the workload ARN in the Properties section of the workload resource you created (Figure 6). For steps on how to identify your workload, refer to Well-Architected Tool User Guide on viewing a workload.

    AWS Well-Architected Tool showing workload ARN

    Figure 6. AWS Well-Architected Tool showing workload ARN

  3. To collect Trusted Advisor checks from accounts other than the account where the workload you are reviewing exists, you must perform two steps. You need to ensure the account IDs are listed in the workload properties for the workload you are reviewing. You must then create an IAM role in the account from which Trusted Advisor checks will be collected with the following permission and trust relationship (Figures 7 and 8). For more information on how to setup this permission, refer to the feature documentation.
    Permissions needed by AWS Well-Architected Tool to interrogate AWS Trusted Advisor

    Figure 7. Permissions needed by AWS Well-Architected Tool to interrogate AWS Trusted Advisor

    The trust relationship allowing AWS Well-Architected Tool to assume policy on behalf of the workload

    Figure 8. The trust relationship allowing AWS Well-Architected Tool to assume policy on behalf of the workload

Using integration with AWS Trusted Advisor for insights during reviews

Once the feature is enabled, additional insights will be noticeable about the resources in your workload using Trusted Advisor checks. Let’s explore an example question. In this case, we will use Question 9 from the Reliability Pillar, as there are Trusted Advisor checks related to the best practices in it: How do you back up data?

  1. AWS Well-Architected Reliability Question 9 includes best practices that are related to how workload backup is performed to support the ability for the workload to recover from failure. Current findings using Trusted Advisor checks indicates the workload may not be configured based on the “Perform data backup automatically” best practice in the Reliability Pillar (Figure 9).

    "Perform data backup automatically" best practices

    Figure 9. “Perform data backup automatically” best practices

  2. To access Trusted Advisor checks as insights, you can select a question in the Well-Architected Tool (Figure 10). If there are related Trusted Advisor checks available for a question, there will be a “View checks” button like the screenshot below. You can also select the “Trusted Advisor checks” tab.

    Trusted Advisor checks that map to best practices

    Figure 10. AWS Trusted Advisor checks that map to best practices

  3. Trusted Advisor checks are available, which provide insights related to the best practice in the question. You will also notice the state of resources recommendations and the count of resources. Trusted Advisor checks that relate to the best practice “Perform data backup automatically” are displayed. One of the Trusted Advisor checks identified with a x in a circle (denoting “Action recommended”) status is on the Amazon Elastic Block Storage (Amazon EBS) snapshots availability to recover your EBS volume from in the event of disaster (Figure 11).

    AWS Trusted Advisor check for Amazon EBS snapshots with "Action recommended"

    Figure 11. AWS Trusted Advisor check for Amazon EBS snapshots with “Action recommended”

  4. Exploring the Trusted Advisor Console, you can identify the EBS volume ID that has been detected with no snapshot in this us-west-2 region (Figure 12).

    An EBS volume that does not have snapshots

    Figure 12. An EBS volume that does not have snapshots

  5. With the insights from Trusted Advisor, we can quickly determine that the “Perform data backup automatically” best practice is not in place, as we do not have Amazon EBS snapshots enabled. Through the “helpful resources” section, instructions can be found to help automate the snapshot creation of Amazon EBS volume (Figure 13). One method to achieve this is to use AWS Backup.

    Resources with details about best practices, including links to learn more

    Figure 13. Resources with details about best practices, including links to learn more

  6. Using AWS Backup you can define a backup plan to automate snapshots creation of the EBS volume. Using this plan, you adjust the frequency of the backup to help achieve your recovery time objective and recovery point objective (Figure 14). For more information on how to configure EBS volume backup plan, refer to the Developer Guide on creating a backup plan.

    Setup automatic Amazon EBS volume snapshots

    Figure 14. Setup automatic Amazon EBS volume snapshots

  7. Once this improvement is implemented and the related EBS volume snapshot is taken, Trusted Advisor will reflect the changes to the resource (Figure 15).

    Amazon EBS volume with a snapshot

    Figure 15. Amazon EBS volume with a snapshot

  8. The next time we perform a Well-Architected Framework Review on this workload, the related AWS Trusted Advisor Check will show no action required with a check-mark status (Figure 16).
    AWS Trusted Advisor checks that represent improvements that have been implemented

    Figure 16. AWS Trusted Advisor checks that represent improvements that have been implemented

    Optional: For access to the list of Trusted Advisor checks in .csv format, you can click on the “Download check details” button on each question to download the resources that were checked in relation to the specified best practices (Figure 17).

    "Download check details" button

    Figure 17. “Download check details” button

  9. Once implemented, this improvement ensures a means to recover the EBS volume data in the event of disaster. This makes the resources in the workload better aligned to the AWS Reliability Pillar Design principle of “Automatically recover from failure”. To reflect this alignment in the Well-Architected Tool, you can tick on the best practice check items under the related questions (Figure 18).

    A milestone with updated best practices based on improvements that have been implemented

    Figure 18. A milestone with updated best practices based on improvements that have been implemented

  10. Finally, you can create a milestone to capture a point in time state of your workload WAFR. As you continuously optimize with more WAFRs and improvements, the number of high- and medium-risk items identified within each review will decrease. You will notice the continuous optimization of your workload over time, as in Figure 19.

    The history of improvements being made over time

    Figure 19. The history of improvements being made over time

Conclusion

Using the AWS Well-Architected integration with AWS Trusted Advisor, customers have a mechanism to accelerate the “learn, measure, and improve” Well-Architected virtuous cycle, creating an optimization flywheel. We have demonstrated the value of creating acceleration through the insights from Trusted Advisor checks. You now know how to enable the integration with Trusted Advisor and have seen an example of how the insights can accelerate your review cycle. You will notice the improvements you make over time will reflect in the Trusted Advisor checks as you review the milestones for your workloads. Enable this feature on your next Well-Architected Framework Review (WAFR) to measure the impact that data-driven insights from Trusted Advisor can have on reducing the time-to-value for your reviews. For more information consider these additional resources. You can contact your account team for support in running WAFRs or check out the AWS Well-Architected Partner Program to find a partner that can help you run a review. Additionally, running a WAFR with a partner assisting you in remediating risks may also provide funding credits to offset the costs required to make the improvements.

“Perform data backup automatically” is part of the Reliability Pillar of the AWS Well-Architected Framework. AWS Well-Architected is a set of guiding design principles developed by AWS to help organizations build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads. Use the AWS Well-Architected Tool to review your workloads periodically to address important design considerations and ensure that they follow the best practices and guidance of the AWS Well-Architected Framework. For follow up questions or comments, join our growing community on AWS re:Post.

 

Verify the resilience of your workloads using Chaos Engineering

Post Syndicated from Seth Eliot original https://aws.amazon.com/blogs/architecture/verify-the-resilience-of-your-workloads-using-chaos-engineering/

The following is an early preview of new guidance to be published as part of updates to the AWS Well-Architected content:

Chaos Engineering enables us to find shortcomings before our customers find them and therefore, provides us with the opportunity to create a better customer experience. Chaos Engineering does not introduce chaos into your systems, instead, it finds the chaos that is already there. By definition, chaos experiments should be fail-safe and tolerated by the system. It is therefore key that you use tools that allow for controlled experiments. A controlled experiment has a clear scope of impact, built in rollback mechanisms, and tight integration with monitoring that provides deep insights to the impact of the experiment in real-time. Chaos Engineering allows you to inject real-world cloud provider faults that give you insights on what you need to improve in regards to observability, incident response, and architecture to be resilient against faults that you cannot predict. To help you with this journey, we have adjusted our guidance in the Well-Architected Reliability Pillar, enabling you to build more robust and resilient workloads on AWS.


Well-Architected Reliability best practice: verify the resilience of your workloads using Chaos Engineering

Chaos Engineering provides your teams with capabilities to continuously inject real world disruptions (simulations) in a controlled way at the service provider, infrastructure, workload, and component levels, with minimal to no impact to your customers. It allows your teams to learn from faults and observe, measure, and improve the resilience of your workloads, as well as validate that alerts fire and teams get notified in the case of an event. When run continuously, Chaos Engineering can highlight deficiencies in your workloads that, if left unaddressed, could negatively affect availability and operation.

Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production. – Principles of Chaos Engineering

If a system is able to withstand these disruptions, the chaos experiment should be maintained as an automated regression test. In this way, chaos experiments should be run as part of your software development lifecycle (SDLC) and as part of your CI/CD pipeline.

To ensure that your workload can survive component failure, inject real-world events as part of your experiments. For example, experiment with the loss of EC2 instances or failover of the primary Amazon RDS database instance, and verify that your workload is not impacted (or only minimally impacted). Use a combination of component faults to simulate events that may be caused by a disruption in an Availability Zone.

For application-level faults (such as crashes), you can start with stressors such as memory and CPU exhaustion.

To validate fallback or failover mechanisms for external dependencies due to intermittent network disruptions, your components should simulate such an event by blocking access to the third-party providers for a specified duration that might last from seconds to hours.

Other modes of degradation might cause reduced functionality and slow responses, often resulting in a disruption of your services. Common sources of this type of degradation are increased latency on critical services and unreliable network communication (dropped packets). Experiments with these faults, including networking effects such as latency, dropped messages, and DNS failures, could include the inability to resolve a name, reach the DNS service, or establish connections to dependent services.

Chaos Engineering tools

AWS Fault Injection Simulator (AWS FIS) is a fully managed service for running fault injection experiments that can be used as part of your CD pipeline, or outside of the pipeline. AWS FIS is a good choice to use during Chaos Engineering game days. It supports simultaneously introducing faults across different types of resources including Amazon EC2, Amazon ECS, Amazon EKS, and Amazon RDS. These faults include termination of resources, forcing failovers, stressing CPU or memory, throttling, latency, and packet loss. Since it is integrated with Amazon CloudWatch alarms, you can set up stop conditions as guardrails to rollback an experiment if it causes an unexpected impact (Figure 1).

AWS Fault Injection Simulator integrates with AWS resources to enable you to run fault injection experiments for your workloads

Figure 1. AWS Fault Injection Simulator integrates with AWS resources to enable you to run fault injection experiments for your workloads

To expand the scope of faults that can be injected on AWS, AWS FIS integrates with Chaos Mesh and Litmus Chaos, enabling you to coordinate fault injection workflows among multiple tools. For example, you can run a stress test on a pod’s CPU using Chaos Mesh or Litmus faults while terminating a randomly selected percentage of cluster nodes using AWS FIS fault actions.

Implementation steps

1. Determine which faults to use for experiments

Assess the design of your workload for resiliency. Such designs (created using the best practices of the Well-Architected Framework) consider risks based on critical dependencies, past events, known issues, and compliance requirements. List each element of the design intended to maintain resilience and the faults it is designed to mitigate. For more information about creating such lists, see the Operational Readiness Review whitepaper, which guides you on how to create a process to prevent reoccurrence of previous incidents. The Failure Modes & Effects Analysis (FMEA) process provides a framework for performing a component-level analysis of failures and how they impact your workload. FMEA is outlined in more detail in Failure Modes and Continuous Resilience by Adrian Cockcroft.

2. Assign a priority to each fault

To assess priority, consider the frequency of the fault and the impact of failure to the overall workload. It is fine to start with a coarse categorization, such as high, medium, or low, and refine it.

When considering frequency of a given fault, analyze past data for this workload when available. If not available, use data from other workloads running in a similar environment.

When considering impact of a given fault, the larger the scope of the fault, generally the larger the impact. Also consider the workload design and purpose. For example, the ability to access the source data stores is critical for a workload doing data transformation and analysis. In this case, you would prioritize experiments for access faults, as well as throttled access and latency insertion.

Post-incident analyses are a good source of data to understand both frequency and impact of failure modes.

Use the assigned priority to determine which faults to experiment with first and the order with which to develop new fault injection experiments.

3. For each experiment that you will execute, follow the Chaos Engineering/continuous resilience flywheel (Figure 2)

Chaos Engineering/continuous resilience flywheel, using the scientific method by Adrian Hornsby

Figure 2. Chaos Engineering/continuous resilience flywheel, using the scientific method by Adrian Hornsby

3A. Define steady state as some measurable output of a workload that indicates normal behavior

Your workload exhibits steady state if it is operating reliably and as expected. Therefore, validate that your workload is healthy before defining steady state. Steady state does not necessarily mean that there is no impact to the workload when a fault occurs, as a certain percentage in faults could be within acceptable limits. The steady state is your baseline that you will observe during the experiment, which will highlight anomalies if your hypothesis defined in the next step does not turn out as expected.

For example, a steady state of a payments system can be defined as the processing of 300 transactions per second (TPS) with a 99% success rate and round-trip time of 500 ms.

3B. Form a hypothesis about how the workload will react to the fault

A good hypothesis is based on how the workload is expected to mitigate the fault to maintain the steady state. The hypothesis states that given the fault of a specific type, the system or workload will continue steady state, because the workload was designed with specific mitigations. The specific type of fault and mitigations should be specified in the hypothesis.

The following template can be used for the hypothesis (but other wording is also acceptable):

If [specific fault] occurs the [workload name] workload will [describe mitigating controls] to maintain [business or technical metric].

For example:

  • If 20% of the nodes in the EKS node-group are taken down, the Transaction Create API continues to serve the 99th percentile of requests in under 100 ms (steady state). The EKS nodes will recover within five minutes, and pods will get scheduled and process traffic within eight minutes after the initiation of the experiment. Alerts will fire within three minutes.
  • If a single EC2 instance failure occurs, the order system’s Elastic Load Balancer (ELB) health check will cause the ELB to only send requests to the remaining healthy instances while the EC2 Auto scaling replaces the failed instance, maintaining a less than 0.01% increase in server-side (5xx) errors (steady state).
  • If the primary RDS database instance fails, the supply chain data collection workload will failover and connect to the standby RDS database instance to maintain less than one minute of database read/write errors (steady state).

3C. Run the experiment by injecting the fault

An experiment should, by default, be fail-safe and tolerated by the workload. If you know that the workload will fail, do not run the experiment. Chaos Engineering should be used to find known-unknowns or unknown-unknowns. Known-unknowns are things you are aware of but don’t fully understand, and unknown-unknowns are things you are neither aware of nor fully understand. Experimenting against a workload that you know is broken won’t provide you with new insights. Your experiment should be carefully planned, have a clear scope of impact, and provide a roll back mechanism that can be run in case of unexpected turbulence. If your due diligence shows that your workload should survive the experiment, move forward with running the experiment. There are several options for injecting the faults. For workloads on AWS, AWS FIS provides many pre-defined fault simulations called actions. You can also define custom actions that run in AWS FIS using AWS Systems Manager documents.

We discourage the use of custom scripts for chaos experiments, unless the scripts have the capabilities to understand current state of the workload, are able to emit logs, and provide mechanisms for roll backs and stop conditions where possible.

An effective framework or toolset that supports Chaos Engineering should track the current state of an experiment, emit logs, and provide rollback mechanisms, to support the controlled running of an experiment. Start with an established service like AWS FIS that allows you to run experiments with a clearly defined scope and safety mechanisms that rollback the experiment if the experiment introduces unexpected turbulence. To learn about a wider variety of experiments using AWS FIS, see the Resilient and Well-Architected Apps with Chaos Engineering lab. Also, AWS Resilience Hub will analyze your workload and create experiments that you can choose to implement and run in AWS FIS.

For every experiment, clearly understand its scope and its impact. We recommend that faults should be simulated first on a non-production environment before being run in production.

It is ideal to ultimately run in production under real-world load via canary deployments that spin up both a control and experimental system deployment, where feasible. Running experiments during off-peak times is a good practice to mitigate potential impact when first experimenting in production. Also, if using actual customer traffic poses too much risk, you can run experiments using synthetic traffic on production infrastructure against the control and experimental deployments. When using production is not possible, run experiments in pre-production environments that are as close to production as possible.

You must establish and monitor guardrails to ensure that the experiment does not impact production traffic or other systems beyond acceptable limits. Establish stop conditions to stop an experiment if it reaches a threshold on a guardrail metric that you define. This should include the metrics for steady state for the workload, as well as the metric against the components into which you’re injecting the fault. A synthetic monitor (also known as a “user canary”) is one metric you should usually include as a user proxy. Stop conditions for AWS FIS are supported as part of the experiment template, enabling up to five stop-conditions per template.

One of the Principles of Chaos Engineering is to minimize the scope of the experiment and its impact, specifically “While there must be an allowance for some short-term negative impact, it is the responsibility and obligation of the Chaos Engineer to ensure the fallout from experiments are minimized and contained”. A method to verify the scope and potential impact is to run the experiment in a non-production environment first, verifying that thresholds for stop conditions occur as expected during an experiment and observability is in place to catch an exception, instead of directly experimenting in production.

When running fault injection experiments, verify that all responsible parties are well informed. Communicate with appropriate teams, such as the operations teams, service reliability teams, and customer support, to let them know when experiments will be run and what to expect. Give these teams communication tools to inform those running the experiment if they see any adverse effects.

You must restore the workload and its underlying systems back to the original known-good state. Often, the resilient design of the workload will self-heal. But some fault designs or failed experiments can leave your workload in an unexpected failed state. By the end of the experiment, you must be aware of this and restore the workload and systems. With AWS FIS, you can set a rollback configuration (also called a post action) within the action parameters. A post action returns the target to the state that it was in before the action was run. Whether automated (such as using AWS FIS) or manual, these post actions should be part of a playbook that describes how to detect and handle failures.

3D. Verify the hypothesis

The Principles of Chaos Engineering gives this guidance on how to verify steady state of your workload: “Focus on the measurable output of a system, rather than internal attributes of the system. Measurements of that output over a short period of time constitute a proxy for the system’s steady state. The overall system’s throughput, error rates, latency percentiles, etc. could all be metrics of interest representing steady state behavior. By focusing on systemic behavior patterns during experiments, Chaos verifies that the system does work, rather than trying to validate how it works.”

In our two examples from Step 3B, we include the steady state metrics:

  • Less than 0.01% increase in server-side (5xx) errors
  • Less than 1 minute of database read/write errors

The 5xx errors are a good metric because they are a consequence of the failure mode that a client of the workload will experience directly. The database errors measurement is good as a direct consequence of the fault, but should also be supplemented with a client impact measurement such as failed customer requests or errors surfaced to the client. Additionally, include a synthetic monitor (also known as a “user canary”) on any APIs or URIs directly accessed by the client of your workload.

3E. Improve the workload design for resilience

If steady state was not maintained, then investigate how the workload design can be improved to mitigate the fault, applying the best practices of the AWS Well-Architected Reliability Pillar. Additional guidance and resources can be found in the AWS Builder’s Library, which hosts articles about how to improve your health checks and employ retries with backoff in your application code, among others.

After these changes have been implemented, run the experiment again (shown by the dotted line in Figure 2) to determine their effectiveness. If the verify step indicates the hypothesis holds true, then the workload will be in steady state, and the cycle in Figure 2 continues.

4. Run experiments regularly

A chaos experiment is a cycle, and experiments should be run regularly as part of Chaos Engineering. After a workload meets the experiment’s hypothesis, the experiment should be automated to run continuously as a regression part of your CI/CD pipeline. To learn how to do this, explore this blog on how to run AWS FIS experiments using AWS CodePipeline. This lab on recurrent AWS FIS experiments in a CI/CD pipeline enables you to work hands-on with this.

Fault injection experiments are also a part of game days. Game days simulate a failure or event to verify systems, processes, and team responses. The purpose of game days is to actually perform the actions that the team would perform as if an exceptional event happened.

5. Capture and store experiment results

Results for fault injection experiments must be captured and persisted. Include all necessary data necessary (such as time, workload, and conditions) to be able to later analyze experiment results and trends. Examples of results might include screenshots of dashboards, CSV dumps from your metrics database, or a hand-recorded record of events and observations from the experiment. Experiment logging with AWS FIS can be part of this data capture.


This blog post gives early access to the updated implementation guidance on Chaos Engineering we are publishing as part of updates to the AWS Well-Architected content. Using the implementation steps described in this post, you can begin using Chaos Engineering to verify the resilience of your workloads.

Announcing updates to the AWS Well-Architected Framework

Post Syndicated from Haleh Najafzadeh original https://aws.amazon.com/blogs/architecture/announcing-updates-to-the-aws-well-architected-framework/

We are excited to announce the availability of improved AWS Well-Architected Framework content. In this update, we have made changes across all six pillars of the framework: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability.

A brief history

The Well-Architected Framework is a collection of best practices that allow customers to evaluate and improve the design, implementation, and operations of their workloads and organizations in the cloud.

In 2012, the first version of the framework was published, leading to the 2015 release of the guidance whitepaper. We added the operational excellence pillar in 2016. The pillar-specific whitepapers and AWS Well-Architected Lenses were released in 2017, and, the following year, the AWS Well-Architected Tool was launched. In 2020, the content for the framework received a major update, more lenses, and API integration with the Well-Architected Tool. The sixth pillar, sustainability, was added in late 2021.

W-A timeline v2

AWS Well-Architected timeline

What’s new

Updates to the Well-Architected content include:

Learn, measure, improve, and iterate

Best practices include regularly reviewing your workloads—even those that have not had major changes. We encourage you to assess your existing workloads as your architecture evolves or business needs change, and create milestones for your workloads as they develop. Use the Well-Architected Framework to guide your design and architecture of new workloads, or of workloads that you are planning on moving to the cloud.

Taking best practices into account early in your process can yield high success rates. In effective organizations, each best practice is considered and prioritized with respect to the goal they are trying to achieve.

AWS Well-Architected helps cloud architects build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads. The Framework is built around six pillars—operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability.

Want to partner with us? Sign up!
Want to work with us? Visit Amazon Careers and search for “AWS Well-Architected” to find opportunities.

Let’s Architect! Designing Well-Architected systems

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-designing-well-architected-systems/

Amazon’s CTO Werner Vogels says, “Everything fails, all the time”. This means we should design with failure in mind and assume that something unpredictable could happen.

The AWS Well-Architected Framework is designed to help you prepare your workload for failure. It describes key concepts, design principles, and architectural best practices for designing and running workloads in the cloud. Using this tool regularly will help you gain awareness of the status of your workloads and is in place to improve any workload deployed inside your AWS accounts.

In this edition of Let’s Architect!, we’ve collected solutions and articles that will help you understand the value behind the Well-Architected Framework and how to implement it in your software development lifecycle.

AWS Well-Architected Framework

AWS Well-Architected (AWS WA) helps cloud architects build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads. Built around six pillars—operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability—AWS WA provides a consistent approach for customers and partners to evaluate architectures and implement scalable designs.

The AWS WA Framework includes domain-specific lenses, hands-on labs, and the AWS Well-Architected Tool. The AWS Well-Architected Tool (AWS WA Tool), available at no cost in the AWS Management Console, provides a mechanism for regularly evaluating workloads, identifying high-risk issues, and recording improvements.

The 6 pillars that composes the AWS Well-Architected framework

The 6 pillars that composes the AWS Well-Architected framework

Use templated answers to perform Well-Architected reviews at scale

For larger customers, performing AWS WA reviews often involves a combination of different teams. Coordinating participants from each team in order to perform a review increases the time taken and is expensive. In a large organization, there are often hundreds of AWS accounts where teams can store review documents, which means there is no way to quickly identify risks or spot common issues or trends that could influence improvements.

To address this, this blog post offers a solution to help you perform reviews easier and faster. It allows workload owners to automatically populate their reviews with templated answers to questions in the AWS WA Tool. These answers may be a shared responsibility between an application team and a centralized team such as platform, security, or finance. This way, application teams have fewer questions to answer and centralized team members have fewer reviews to attend, because answers that are common to all workloads are pre-populated in workload reviews. The solution also provides centralized reporting to provide a centralized view of AWS WA reviews conducted across the organization.

The components of the solution and the steps in the workflow

The components of the solution and the steps in the workflow

Machine Learning Lens

Machine learning (ML) is used to solve specific business problems and influence revenue. However, moving from experimentation (where scientists design ML models and explore applications) to a production scenario (where ML is used to generate value for the business) can present some challenges. For example, how do you create repeatable experiments? How do you increase automation in the deployment process? How do you deploy my model and monitor the performance?

This blog post and its companion whitepaper provide best practices based on AWS WA for each phase of putting ML into production, including formulating the problem and approaches for monitoring a model’s performance.

ML lifecycle phases with expanded components

ML lifecycle phases with expanded components

Establishing Feedback Loops Based on the AWS Well-Architected Framework Review

When you perform an AWS WA review using the AWS WA Tool, you’ll answer a set of questions. The tool then provides gives recommendations to improve your workloads.

To apply these recommendations effectively, you must 1) define how you’ll apply them, 2) create systems to define what is monitored and which kind of metrics or logs are required, 3) establish automatic or manual process and for reporting, and 4) improve them through iteration. This process is called a feedback loop.

This blog post shows you how to iteratively improve your overall architecture with feedback loops based on the results of the AWS WA review.

Feedback loop based on the AWS WA review

Feedback loop based on the AWS WA review

See you next time!

Thanks for reading! See you in a couple of weeks when we discuss strategies for running serverless applications on AWS.

Other posts in this series

Looking for more architecture content?

AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Selecting Network Switches for Your AWS Outposts

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/selecting-network-switches-for-your-aws-outposts/

This blog post is written by, Frankie Negro, Outposts Solution Architect.

AWS Outposts is a family of fully managed solutions that extend AWS infrastructure, services, APIs, and tools to customer premises. Outposts is available in a variety of form factors, from 1U and 2U Outposts servers (https://aws.amazon.com/outposts/servers/) to 42U Outposts racks (https://aws.amazon.com/outposts/rack/). AWS Outposts is ideal for workloads that require low-latency access to on-premises systems, local data processing, data residency, and application migration with local system interdependencies.

When operating and consuming services in the AWS Regions, the underlying networking layer is completely abstracted. You do not need to be aware of the underlying networking topology, device port speeds, connectors, transports, links, and media types. Instead, the focus is on design, with the architecture leveraging the high-level constructs available for the Amazon Virtual Private Cloud (VPC), such as VPCs, Subnets, Route Tables, Security Groups, and network access control lists. The network bandwidth available for an Amazon Elastic Compute Cloud (Amazon EC2) instance depends on the number of vCPUs that it has.

AWS Outposts requires a dedicated network connection to an AWS Region defined by the customer when ordering the product. This connection is called the Service Link, and it connects to either public or private anchors (not both) in a specific Availability Zone (AZ) in the selected parent Region. AWS recommends redundant connections that meet the bandwidth requirements for Outposts rack and Outposts servers.

The purpose of AWS Outposts is to fulfill use cases where the workload has requirements that prevent or make it unfeasible to operate in the AWS Regions. Most of these use cases, such as low latency and local data processing, require strong and reliable network infrastructure to handle a high volume of packets per second.

The construct connecting AWS Outposts to the customer on-premises network is called local gateway (LGW) for Outposts rack and local network interface (LNI) for Outposts Servers. These logical elements mediate the data traffic between Outposts and the customer premises.

On Outposts rack, Service Link and LGW traffic flows through the same network connection, which can be a single link per physical device or an aggregated link. Network packets sent to the Region or to your local network are segregated using distinct virtual LANs (VLANs) on Outposts rack. The smaller family members, Outposts servers, use two distinct physical ports.

The physical network elements providing the connections between the devices and services are called Outposts Networking Devices (ONDs) on the AWS side and Customer Networking Devices (CNDs) on the customer side. For its part, Outposts rack can deliver throughput up to 400 Gbps, aggregating 4 x 100 Gbps uplinks to support Service Link and LGW network traffic, while an Outposts server can provide a 10 Gbps dedicated network port for each traffic.

Outpost network traffic segments and logical elements

The upstream devices you provide play a fundamental role in the harmonic coexistence and operation at the ethernet physical and data link layers, which are the basis for performance and stability of the upper network and transport layers as defined by the OSI model. A careful selection of your upstream networking devices must combine reliable operations, cost effectiveness, and long-term vision.

The physical layer (L1)

Here we are talking about physical cables and media interfaces. There are no supported options for UTP Cables with RJ-45 connectors, as Outposts rack only supports Fiber Optic cables with Lucent Connectors (LC). For short distances you can use MMF (Multi-Mode Fiber) or MMF OM4 (Optical Multimode) with LC.  Longer distances can be achieved using SMF (Single Mode Fiber). Distance limits depend on the Fiber Mode and Type.

: Lucent Connector (LC) DuplexEach Outposts server has one physical QSFP+ interface. A 4-way breakout cable is supplied with SFP+ transceivers. You will use two interfaces: One for the LNI traffic and another for the Service Link traffic.

With this is mind, RJ-45 ports on upstream switches will not suit any AWS Outposts connections. Switch models that combine RF-45 and optical ports can be used in conjunction with copper ethernet cables category 8 (CAT8), which support up to 40 Gbps speeds, to connect other segments while the optical ports can be used for AWS Outposts.

When evaluating your upstream switches, bear in mind that Outposts rack switches are always capable of 1 / 10 / 40 / 100 Gbps speeds, and it is the same equipment regardless of the selected AWS Outposts resource ID and uplink connection speed defined during the order process.

It is recommended to account for future traffic needs from the beginning and specify upstream switches with 40 or 100 Gbps ports rather than start small and upgrade in the future. Upgrades and changes always carry risk, so limiting future risk by minimizing the need for upgrades will help mitigate issues and provide a stable, productive environment.

Another characteristic to look for when selecting your networking devices is “non-blocking” switches. These switches can handle all ports at full capacity simultaneously, without contention. It is a simple feature to select, and you can expect high performance out-of-the box without having to go too deep into details such as buffering mechanisms.

The Data Link layer (L2)

This layer establishes and terminates the logical links between nodes and exchange frames end-to-end. Outposts rack requires that your upstream devices support 802.1Q (Dot1q) standards that implement the VLAN support needed to segregate traffic to be forwarded to the Region (via Service Link) from traffic to be forwarded to the customer’s local network.

Most core switches ship with this capability. One good spec to evaluate is the maximum size of the MAC Address Table per VLAN supported by the switch. If the MAC Table gets full, your equipment may fail over to broadcast mode in that VLAN, which introduces additional stress in the network and is a potential exploit condition.

Another common feature for core switches is to support link aggregation or bundle links together so they act like a single, logical link. While AWS Outposts will work with just a single connection per OND, a recommended fault tolerance and high availability best practice is to aggregate multiple paths to withstand the failure of one or multiple members of the logical aggregation group.

As defined in the AWS Well-Architected Framework Reliability pillar design principles, to observe best practices of Automatically recover from failure and Scale horizontally to increase aggregate workload availability, you should consider implementing, for example, 4 x 10 Gbps instead of a single 40 Gbps uplink. AWS Outposts uses link aggregation control protocol (LACP) aggregations with the immediate customer network device (CND) according to the IEEE 802.3ad standard.

To learn more about how you can architect Outposts for network failures, check out the AWS Outposts High Availability Design and Architecture Considerations at this URL.

The logical interface defined as a result of the link aggregation (LAG) can be configured as an ethernet trunk port defined in the IEEE 802.1q standard to allow the use of multiple VLANs. Alternatively, the logical interface can be configured as an L3 interface with the Service Link and LGW defined as VLAN sub-interfaces. This is how AWS Outposts segregates traffic forwarded to Service Link from packets sent to the customer local network.

The Network layer (L3)

At this layer, we get into routing and logical addressing. Outposts rack requires Border Gateway Protocol (BGP) to dynamically exchange routes. Each OND device will establish eBGP peering with the upstream routing device for the Service Link and the LGW.

The architectural decision will be a trade-off between discrete components for routing and switching and an L3 switch capable of BGP routing. This aspect requires a careful assessment. It is common for a core switch to offer L3 capabilities, but BGP support is not available in most cases.

Switch design often aims for excelling at L2 and basic L3. If the network design requires advanced routing features or large IP routing tables, the safest path is to specify a powerful L2 switch and a dedicated L3 router.

Redundant equipment for fault tolerance is recommended as well. AWS does not have restrictions on how the customer implements core switches, but it’s always a good practice to keep it simple and standard, avoiding designs that include proprietary solutions, such as Virtual Chassis and Switch Clustering, because it can make troubleshooting difficult.

Conclusion

In this post, I showed the importance of dedicating time and effort to carefully evaluating the networking landscape where your AWS Outposts will be deployed, assessing the network device options available to you, designing for high-availability, and selecting switch models with proper feature sets and future-proof specifications.

The performance and operation of your AWS Outposts is largely dependent on your network substrate, and all efforts dedicated to making good decisions will be time well spent, allowing you to get the best value out of your hybrid solution while focusing on creating compelling applications and addressing your use cases with AWS Outposts.

Implementing the AWS Well-Architected Custom Lens lifecycle in your organization

Post Syndicated from Robert Hoffman original https://aws.amazon.com/blogs/architecture/implementing-the-aws-well-architected-custom-lens-lifecycle-in-your-organization/

In this blog post, we present a lifecycle that helps you build, validate, and improve your own AWS Well-Architected Custom Lens, in order to roll it out across your whole organization. The AWS Well-Architected Custom Lens is a new feature of the AWS Well-Architected Tool that lets you bring your own best practices to complement the existing Well-Architected Framework.

The Custom Lens lifecycle: how a Custom Lens can benefit your organization

The AWS Well-Architected Custom Lens Lifecycle

Figure 1. The AWS Well-Architected Custom Lens lifecycle

Each organization has its own requirements, processes, best practices, and tools, but the information can be spread over many systems and knowledge bases. A Custom Lens can capture the specifics of a working environment and let coworkers access this information in a single place—from the AWS console—without the need to go to a separate tool. A Custom Lens can be created in a central management account and securely shared with other accounts.

A Custom Lens can be updated periodically as either a major or minor version. If it is a minor version, the change is automatically applied to all accounts that the lens has been shared with. If it is a major version, the user has to accept the updated Custom Lens and a summary of the changes is displayed to the user. Accepting the changes then applies the update for existing workload reviews, and prompts the user to review the workload. Thus, updating a Custom Lens is an effective mechanism to continuously inform teams about new best practices.

In addition, maintaining and improving a Custom Lens continuously helps to identify gaps in organization-wide tooling, guidance, or documentation. You can aggregate feedback and metrics from reviews that have been performed and use it to drive the improvement process of the content. More importantly, the gathered metrics help measure the overall adherence to best practices and requirements in your organization. If you focus on creating clear, concise, and actionable content for your Custom Lens, the time needed to identify and implement improvements is reduced. As teams realize the value of the Custom Lens, more reviews will be performed, and you will receive more data to construct a comprehensive view.

1. Plan

The Plan phase identifies the benefits that a Custom Lens can provide your organization by identifying current gaps. You also define the scope of your Custom Lens, which is the type of content that supports your desired business outcomes. Depending on the scope, you need to identify the appropriate stakeholders and gain support for the initiative.

2. Implement

In the Implement phase, content is created for the Custom Lens with a working group. While doing this, you can identify missing supplementary artefacts, like documentation or tooling. If that is the case, you can create these artefacts and link to them from the Custom Lens Improvement Plan.

As part of the implementation, the Custom Lens is created by uploading a JSON file in the appropriate format to a central management account, then, sharing the lens with the organization’s AWS accounts. You can share the Custom Lens with IAM Principals, such as users, roles, and AWS accounts. For broader and more efficient sharing, you now have the ability to scale by sharing your Custom Lens with individual organizational units or the entire AWS Organizations. This feature reduces management overhead and removes the need for a custom automation.

3. Measure

The Measure phase aggregates feedback and metrics from reviews that have been performed with your Custom Lens; this information is used to drive the improvement process.

The Well-Architected Tool offers a way to share workload reviews, and you can use this to share all reviews with a central AWS account. You can then analyze the reviews in the central account by extracting the data and analyzing it, for example, by building a dashboard. The Well-Architected Lab for building custom reports provides a solution that can be implemented.

4. Improve

In the Improve phase, the gathered metrics and feedback are used to identify areas for future improvement. For example, you might find common gaps among the performed workload reviews, where the same best practices are not fulfilled. When you investigate the root cause, you can learn that the existing content lacks clarity or that the suggested tools are difficult to use.

In addition, improvements, such as content gaps that were not addressed during the first iteration of the Custom Lens, can be added to the backlog before you repeat the cycle.

To roll out changes of your Custom Lens in an automated and repeatable fashion, you can implement the architecture depicted in Figure 2.

Combining AWS CodeCommit with AWS Lambda to update your Custom Lens whenever a file change is pushed to the code repository

Figure 2. Combining AWS CodeCommit with AWS Lambda to update your Custom Lens whenever a file change is pushed to the code repository

This architecture enables automated releases of new versions of your Custom Lens whenever you commit an updated JSON file to the code repository. In detail, the steps are:

  1. The JSON file of your Custom Lens is stored in an AWS CodeCommit repository. An author pushes an updated version of the file to the repository.
  2. The CodeCommit repository is configured with a trigger action that invokes an AWS Lambda function on each commit.
  3. The Lambda function downloads the updated file by using the GetFile API of CodeCommit. Then, the Lambda function imports the updated Custom Lens and publishes it as a new version by using ImportLens and CreateLensVersion APIs of the AWS Well-Architected Tool, then shares the Custom Lens using CreateLensShare.
  4. The updated Custom Lens is available in all accounts that the lens has been shared with.
  5. Reviewers can create new workload reviews with the Custom Lens or upgrade to the newest version for existing workload reviews.

Conclusion

In this blog post, we walked you through the Custom Lens lifecycle, a process to create and continuously improve a Custom Lens for your organization. If you have a special software development lifecycle, a customized security and compliance framework, or other highly specific requirements or best practices that you want disseminated and measurable, learn more about how to create a Custom Lens in the Well-Architected Tool.

AWS Well-Architected is a set of guiding design principles developed by AWS to help organizations build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads. Use the AWS Well-Architected Tool to review your workloads periodically to address important design considerations and ensure that they follow the best practices and guidance of the AWS Well-Architected Framework. For follow up questions or comments, join our growing community on AWS re:Post.

Simplifying serverless best practices with AWS Lambda Powertools for TypeScript

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/simplifying-serverless-best-practices-with-aws-lambda-powertools-for-typescript/

This blog post is written by Sara Gerion, Senior Solutions Architect.

Development teams must have a shared understanding of the workloads they own and their expected behaviors to deliver business value fast and with confidence. The AWS Well-Architected Framework and its Serverless Lens provide architectural best practices for designing and operating reliable, secure, efficient, and cost-effective systems in the AWS Cloud.

Developers should design and configure their workloads to emit information about their internal state and current status. This allows engineering teams to ask arbitrary questions about the health of their systems at any time. For example, emitting metrics, logs, and traces with useful contextual information enables situational awareness and allows developers to filter and select only what they need.

Following such practices reduces the number of bugs, accelerates remediation, and speeds up the application lifecycle into production. They can help mitigate deployment risks, offer more accurate production-readiness assessments and enable more informed decisions to deploy systems and changes.

AWS Lambda Powertools for TypeScript

AWS Lambda Powertools provides a suite of utilities for AWS Lambda functions to ease the adoption of serverless best practices. The AWS Hero Yan Cui’s initial implementation of DAZN Lambda Powertools inspired this idea.

Following the community’s adoption of AWS Lambda Powertools for Python and AWS Lambda Powertools for Java, we are excited to announce the general availability of the AWS Lambda Powertools for TypeScript.

AWS Lambda Powertools for TypeScript provides a suite of utilities for Node.js runtimes, which you can use in both JavaScript and TypeScript code bases. The library follows a modular approach similar to the AWS SDK v3 for JavaScript. Each utility is installed as standalone NPM package.

Today, the library is ready for production use with three observability features: distributed tracing (Tracer), structured logging (Logger), and asynchronous business and application metrics (Metrics).

You can instrument your code with Powertools in three different ways:

  • Manually. It provides the most granular control. It’s the most verbose approach, with the added benefit of no additional dependency and no refactoring to TypeScript Classes.
  • Middy middleware. It is the best choice if your existing code base relies on the Middy middleware engine. Powertools offers compatible Middy middleware to make this integration seamless.
  • Method decorator. Use TypeScript method decorators if you prefer writing your business logic using TypeScript Classes. If you aren’t using Classes, this requires the most significant refactoring.

The examples in this blog post use the Middy approach. To follow the examples, ensure that middy is installed:

npm i @middy/core

Logger

Logger provides an opinionated logger with output structured as JSON. Its key features include:

  • Capturing key fields from the Lambda context, cold starts, and structure logging output as JSON.
  • Logging Lambda invocation events when instructed (disabled by default).
  • Printing all the logs only for a percentage of invocations via log sampling (disabled by default).
  • Appending additional keys to structured logs at any point in time.
  • Providing a custom log formatter (Bring Your Own Formatter) to output logs in a structure compatible with your organization’s Logging RFC.

To install, run:

npm install @aws-lambda-powertools/logger

Usage example:

import { Logger, injectLambdaContext } from '@aws-lambda-powertools/logger';
 import middy from '@middy/core';

 const logger = new Logger({
    logLevel: 'INFO',
    serviceName: 'shopping-cart-api',
});

 const lambdaHandler = async (): Promise<void> => {
     logger.info('This is an INFO log with some context');
 };

 export const handler = middy(lambdaHandler)
     .use(injectLambdaContext(logger));

In Amazon CloudWatch, the structured log emitted by your application looks like:

{
     "cold_start": true,
     "function_arn": "arn:aws:lambda:eu-west-1:123456789012:function:shopping-cart-api-lambda-prod-eu-west-1",
     "function_memory_size": 128,
     "function_request_id": "c6af9ac6-7b61-11e6-9a41-93e812345678",
     "function_name": "shopping-cart-api-lambda-prod-eu-west-1",
     "level": "INFO",
     "message": "This is an INFO log with some context",
     "service": "shopping-cart-api",
     "timestamp": "2021-12-12T21:21:08.921Z",
     "xray_trace_id": "abcdef123456abcdef123456abcdef123456"
 }

Logs generated by Powertools can also be ingested and analyzed by any third-party SaaS vendor that supports JSON.

Tracer

Tracer is an opinionated thin wrapper for AWS X-Ray SDK for Node.js.

Its key features include:

  • Auto-capturing cold start and service name as annotations, and responses or full exceptions as metadata.
  • Automatically tracing HTTP(S) clients and generating segments for each request.
  • Supporting tracing functions via decorators, middleware, and manual instrumentation.
  • Supporting tracing AWS SDK v2 and v3 via AWS X-Ray SDK for Node.js.
  • Auto-disable tracing when not running in the Lambda environment.

To install, run:

npm install @aws-lambda-powertools/tracer

Usage example:

import { Tracer, captureLambdaHandler } from '@aws-lambda-powertools/tracer';
 import middy from '@middy/core'; 

 const tracer = new Tracer({
    serviceName: 'shopping-cart-api'
});

 const lambdaHandler = async (): Promise<void> => {
     /* ... Something happens ... */
 };

 export const handler = middy(lambdaHandler)
     .use(captureLambdaHandler(tracer));
AWS X-Ray segments and subsegments emitted by Powertools

AWS X-Ray segments and subsegments emitted by Powertools

Example service map generated with Powertools

Example service map generated with Powertools

Metrics

Metrics create custom metrics asynchronously by logging metrics to standard output following the Amazon CloudWatch Embedded Metric Format (EMF). These metrics can be visualized through CloudWatch dashboards or used to trigger alerts.

Its key features include:

  • Aggregating up to 100 metrics using a single CloudWatch EMF object (large JSON blob).
  • Validating your metrics against common metric definitions mistakes (for example, metric unit, values, max dimensions, max metrics).
  • Metrics are created asynchronously by the CloudWatch service. You do not need any custom stacks, and there is no impact to Lambda function latency.
  • Creating a one-off metric with different dimensions.

To install, run:

npm install @aws-lambda-powertools/metrics

Usage example:

import { Metrics, MetricUnits, logMetrics } from '@aws-lambda-powertools/metrics';
 import middy from '@middy/core';

 const metrics = new Metrics({
    namespace: 'serverlessAirline', 
    serviceName: 'orders'
});

 const lambdaHandler = async (): Promise<void> => {
     metrics.addMetric('successfulBooking', MetricUnits.Count, 1);
 };

 export const handler = middy(lambdaHandler)
     .use(logMetrics(metrics));

In CloudWatch, the custom metric emitted by your application looks like:

{
     "successfulBooking": 1.0,
     "_aws": {
     "Timestamp": 1592234975665,
     "CloudWatchMetrics": [
         {
         "Namespace": "serverlessAirline",
         "Dimensions": [
             [
             "service"
             ]
         ],
         "Metrics": [
             {
             "Name": "successfulBooking",
             "Unit": "Count"
             }
         ]
     },
     "service": "orders"
 }

Serverless TypeScript demo application

The Serverless TypeScript Demo shows how to use Lambda Powertools for TypeScript. You can find instructions on how to deploy and load test this application in the repository.

Serverless TypeScript Demo architecture

Serverless TypeScript Demo architecture

The code for the Get Products Lambda function shows how to use the utilities. The function is instrumented with Logger, Metrics and Tracer to emit observability data.

// blob/main/src/api/get-products.ts
import { APIGatewayProxyEvent, APIGatewayProxyResult} from "aws-lambda";
import { DynamoDbStore } from "../store/dynamodb/dynamodb-store";
import { ProductStore } from "../store/product-store";
import { logger, tracer, metrics } from "../powertools/utilities"
import middy from "@middy/core";
import { captureLambdaHandler } from '@aws-lambda-powertools/tracer';
import { injectLambdaContext } from '@aws-lambda-powertools/logger';
import { logMetrics, MetricUnits } from '@aws-lambda-powertools/metrics';

const store: ProductStore = new DynamoDbStore();
const lambdaHandler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {

  logger.appendKeys({
    resource_path: event.requestContext.resourcePath
  });

  try {
    const result = await store.getProducts();

    logger.info('Products retrieved', { details: { products: result } });
    metrics.addMetric('productsRetrieved', MetricUnits.Count, 1);

    return {
      statusCode: 200,
      headers: { "content-type": "application/json" },
      body: `{"products":${JSON.stringify(result)}}`,
    };
  } catch (error) {
      logger.error('Unexpected error occurred while trying to retrieve products', error as Error);

      return {
        statusCode: 500,
        headers: { "content-type": "application/json" },
        body: JSON.stringify(error),
      };
  }
};

const handler = middy(lambdaHandler)
    .use(captureLambdaHandler(tracer))
    .use(logMetrics(metrics, { captureColdStartMetric: true }))
    .use(injectLambdaContext(logger, { clearState: true, logEvent: true }));

export {
  handler
};

The Logger utility adds useful context to the application logs. Structuring your logs as JSON allows you to search on your structured data using Amazon CloudWatch Logs Insights. This allows you to filter out the information you don’t need.

For example, use the following query to search for any errors for the serverless-typescript-demo service.

fields resource_path, message, timestamp
| filter service = 'serverless-typescript-demo'
| filter level = 'ERROR'
| sort @timestamp desc
| limit 20
CloudWatch Logs Insights showing errors for the serverless-typescript-demo service.

CloudWatch Logs Insights showing errors for the serverless-typescript-demo service.

The Tracer utility adds custom annotations and metadata during the function invocation, which it sends to AWS X-Ray. Annotations allow you to search for and filter traces by business or application contextual information such as product ID, or cold start.

You can see the duration of the putProduct method and the ColdStart and Service annotations attached to the Lambda handler function.

putProduct trace view

putProduct trace view

The Metrics utility simplifies the creation of complex high-cardinality application data. Including structured data along with your metrics allows you to search or perform additional analysis when needed.

In this example, you can see how many times per second a product is created, deleted, or queried. You could configure alarms based on the metrics.

Metrics view

Metrics view

Code examples

You can use Powertools with many Infrastructure as Code or deployment tools. The project contains source code and supporting files for serverless applications that you can deploy with the AWS Cloud Development Kit (AWS CDK) or AWS Serverless Application Model (AWS SAM).

The AWS CDK lets you build reliable and scalable applications in the cloud with the expressive power of a programming language, including TypeScript. The AWS SAM CLI is that makes it easier to create and manage serverless applications.

You can use the sample applications provided in the GitHub repository to understand how to use the library quickly and experiment in your own AWS environment.

Conclusion

AWS Lambda Powertools for TypeScript can help simplify, accelerate, and scale the adoption of serverless best practices within your team and across your organization.

The library implements best practices recommended as part of the AWS Well-Architected Framework, without you needing to write much custom code.

Since the library relieves the operational burden needed to implement these functionalities, you can focus on the features that matter the most, shortening the Software Development Life Cycle and reducing the Time To Market.

The library helps both individual developers and engineering teams to standardize their organizational best practices. Utilities are designed to be incrementally adoptable for customers at any stage of their serverless journey, from startup to enterprise.

To get started with AWS Lambda Powertools for TypeScript, see the official documentation. For more serverless learning resources, visit Serverless Land.

Use templated answers to perform Well-Architected reviews at scale

Post Syndicated from Thomas Attree original https://aws.amazon.com/blogs/architecture/use-templated-answers-to-perform-well-architected-reviews-at-scale/

For larger customers, performing AWS Well-Architected (AWS WA) Framework reviews often involves a combination of different teams. Coordinating participants from each team in order to perform a review increases the time taken and is expensive. In a large organization, there are often hundreds of AWS accounts where teams can store review documents, which means there is no way to quickly identify risks or spot common issues or trends that could influence improvements.

To address this, we created a solution to help you perform reviews easier and faster. It allows workload owners to automatically populate their reviews with templated answers to questions in the AWS Well-Architected Tool (AWS WA Tool). These answers may be a shared responsibility between an application team and a centralized team such as platform, security, or finance. This way, application teams have fewer questions to answer and centralized team members have fewer reviews to attend, because answers that are common to all workloads are pre-populated in workload reviews. The solution also provides centralized reporting to provide a centralized view of AWS WA reviews conducted across the organization.

Perform Well-Architected reviews at scale

In large organizations, responsibilities are often distributed across multiple teams, for example:

  • A platform team manages an AWS Control Tower landing zone and provides accounts, access controls, and networking.
  • A security team defines security policies for this solution and enforces them using guardrails or marketplace solutions.
  • A financial operations team mandates a tagging policy to allow for accurate cost cross-charging within the business.
  • Application teams developing internal or external facing applications use a shared platform provided by a Cloud Center of Excellence.

To perform a traditional AWS WA review for this example, you would likely need to invite representatives from each of these teams to attend the review. This is because one team would be unlikely to be able to answer the foundational questions alone.

With tens or hundreds of workloads being reviewed every year, this approach doesn’t scale. This is because representatives from central teams end up attending every review. With more people involved, scheduling reviews is difficult, the overall time required to conduct the review increases, and longer reviews with more people are more expensive to perform.

Additionally, the review document is usually created and stored in one of the application team’s AWS accounts. In a large organization, there are often hundreds of AWS accounts. This makes it difficult for leadership to get a consolidated view of the risks identified across the reviews. It also makes it almost impossible to spot common issues or trends that could influence roadmaps for organization-wide improvements.

Automatically populate templated answers for quicker, easier reviews

Our solution allows you to address these challenges by using the AWS WA Tool to create answer templates. An answer template looks like a regular AWS WA Tool workload review. However, these answers propagate automatically to application workload reviews and are visible by application workload owners during the review process. This way, where there is a shared responsibility, workload owners can see this detail and they can be confident that the inputs provided by the central teams are correct and consistent.

The solution operates as shown in Figure 1 and works as follows:

  1. Central teams use the AWS WA Tool in the “central” AWS account to create workload templates. These are prefixed with “CentralTemplate” (or by a stack parameter).
  2. The central team answers the questions they’re responsible for and marks all others as “Question does not apply to this workload”.
  3. When an application team is ready to perform an AWS WA Framework review, they create a new workload in their workload account in the AWS WA Tool.
  4. This new workload is then shared with the central account (with contributor access) by an AWS Lambda function. After that, a message is placed on an Amazon Simple Notification Service (Amazon SNS) topic in the central account.
  5. In the central account, a Lambda function is subscribed to the Amazon SNS topic from step 4. This function accepts the incoming share, then shares all templates back to the workload account (with read-only access).
  6. The shared workload is then populated with templated answers from templates with the “CentralTemplate” prefix. Both the selected choices and notes are written to the shared workload. Questions in the template marked as “question does not apply to this workload” are ignored.
  7. As the application team proceeds through the questions, they will see the pre-populated answers from the template.
  8. Should a central team need to update their answers, they can update their template and create a milestone.
  9. The milestone creation invokes an AWS Step Functions workflow. The workflow collects all shared workload IDs. Next, it uses a map state to fan-out the updating of all shared workloads. Whether this process should overwrite or append workload answers is configurable at deployment time.
  10. Because all workloads are now visible in the central account, the dashboards referenced in AWS WA labs can be used for consolidated analysis of risks.
Solution components and workflow steps

Figure 1. Solution components and workflow steps

The solution can be coupled with an Amazon QuickSight powered reporting solution to get an organization-wide view of reviews from a single account. These reviews can also be shared with your AWS account team for ongoing collaborative improvement.

Note: For some workloads, you may need additional AWS WA Framework lenses. The solution offered in this post is lens agnostic, and also supports the use of custom lenses. To deploy the solution, refer to the deployment instructions which can be found on GitHub under aws-samples.

Conclusion

In this post, we explored some of the challenges faced by large enterprises when performing AWS WA Framework reviews at scale and showed you a solution to help your teams define templated answers to particular questions in the AWS WA Tool.

You can deploy this solution to your AWS accounts today by following the deployment instructions included on the aws-samples repository.

Having these templated answers automatically propagated to application workload reviews reduces the number of questions application teams have to answer, as well as the number of attendees required for a review. With this solution, all the AWS WA Framework reviews can be viewed in a single AWS account, so you can also apply the reporting solution provided in AWS WA labs to run centralized reports against all AWS WA Framework reviews in your organization.

Looking for more architecture content?

AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

New and Updated AWS Well-Architected Lenses

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-and-updated-aws-well-architected-lenses/

Since 2015, the AWS Well-Architected Framework has been helping AWS customers and partners improve their cloud architectures. The framework consists of design principles, questions, and best practices across multiple pillars: Operational ExcellenceSecurityReliabilityPerformance Efficiency, and Cost Optimization. At AWS re:Invent 2021, we introduced a new Sustainability Pillar to help organizations learn, measure, and improve their workloads using environmental best practices for cloud computing.

In 2017, we introduced AWS Well-Architected Lenses and extended the best practice guidance to specific industry and technology domains, such as serverless, high performance computing (HPC), internet of things (IoT), software as a service (SaaS), foundational technical review (FTR), and financial services. Use the applicable Lenses together with the pillars of the AWS Well-Architected Framework to fully evaluate your workloads.

In 2021, we added four new lenses for various technologies and industries at the request of our customers. If you are planning a new workload for the new year, check out the new and updated Lenses to help guide you through the implementation of AWS best practices.

New AWS Well-Architected Lenses

Streaming Media Lens (September 29, 2021)
The Streaming Media Lens helps customers apply best practices in the design, delivery, and maintenance of their cloud-based streaming media workloads. Whether you’ve just started designing a greenfield video application on AWS or are looking to migrate an existing workload, this Lens provides perspective on best practices and can spark new ideas. To learn more about best practices for architecting and improving your streaming media workloads on AWS, see the Streaming Media Lens documentation.

SAP Lens (October 29, 2021)
The SAP Lens is a collection of customer-proven design principles and best practices for ensuring SAP workloads on AWS are well-architected. The SAP Lens is based on insights that AWS has gathered from customers, AWS Partners, and the SAP Specialist Architect community. The Lens is designed to help you adopt a cloud-native approach to running SAP. To learn more, see the SAP Lens documentation.

Games Industry Lens (November 19, 2021)
The Games Industry Lens helps customers review and improve cloud-based architecture for game development, deployment, operations of gaming platforms, and to support massive player scale. The Lens presents common games deployment scenarios and identifies key elements to ensure your platforms are in accordance with the best practices of AWS Well-Architected Framework. Learn the best practices for designing, architecting, and deploying your games workloads on AWS in the Games Industry Lens documentation.

Hybrid Networking Lens (November 22, 2021)
The Hybrid Networking Lens provides best practices and strategies to use when designing hybrid networking architectures. This Lens supports a broad spectrum of use cases and helps set you up for success in building hybrid networking architectures and integrating your on-premises data center with AWS operations. It outlines three areas to consider when designing hybrid network connectivity for your workload: data layer, monitoring and configuration management, and security. To learn more, see the Hybrid Networking Lens documentation.

Updated AWS Well-Architected Lens

Machine Learning Lens (October 13, 2021)
The Machine Learning (ML) Lens introduces a set of established and repeatable best practices across the ML lifecycle phases. You can apply this guidance and architectural principles when designing your ML workloads or after your workloads have entered production as part of continuous improvement. The Lens includes guidance and resources on implementing the best practices on AWS. To learn more, see the ML Lens documentation.

Data Analytics Lens (October 29, 2021)
The Data Analytics Lens is a collection of customer-proven best practices for designing well-architected analytics workloads. It contains insights that AWS has gathered from real-world case studies and helps you learn the key design elements of well-architected analytics workloads, along with recommendations for improvement. For more information about building your own data analytics workload, see the Data Analytics Lens whitepaper.

Management and Governance Lens (December 17, 2021)
The Management and Governance Lens (M&G Lens) provides clear guidance to help you prepare your environment, regardless of your stage of cloud adoption, with a focus on eight different functions. Those functions are controls and guardrails, network connectivity, identity management, security management, monitoring and observability, cloud financial management, service management, and sourcing and distribution. To learn more, see the M&G Lens documentation.

To get started with your favorite lenses, visit the AWS Well-Architected page. You can learn, measure, and build using architectural best practices and tools.

To review your workloads using the AWS Well-Architected Framework, we recommend using the AWS Well-Architected Tool, a self-service tool designed to help you review AWS workloads at any time, without the need for an AWS Solutions Architect.

It provides a mechanism for regularly evaluating your workloads, identifying high-risk issues, and recording your improvements applying your favorite Lenses. You can also leverage Custom Lenses to record and track progress towards your organization’s internal best practices.

If you want to train these best practices, AWS Well-Architected Labs provides codes and documentation in the format of hands-on labs to help you learn, measure, and build using architectural best practices categorized into levels. Also, you can access an ecosystem of hundreds of members in the AWS Well-Architected Partner Program in your area to help analyze and review your applications.

You can refer to the AWS Architecture Center, a collection of reference architecture patterns, vetted architecture solutions, and best practices. If you’re new to AWS, use the Architect Learning Plan to learn how to design applications and systems on AWS. Build technical skills as you progress along the path toward AWS Certification.

This is My Architecture is a video series that showcases innovative architectural solutions on AWS by customers and partners. We would love to hear more from you, especially about your success stories in building your applications on AWS Well-Architected Framework. Please share with your account team to introduce your stories.

Channy

New – Sustainability Pillar for AWS Well-Architected Framework

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/sustainability-pillar-well-architected-framework/

The AWS Well-Architected Framework has been helping AWS customers improve their cloud architectures since 2015. The framework consists of design principles, questions, and best practices across multiple pillars: Operational Excellence, Security, Reliability, Performance Efficiency, and Cost Optimization.

Today we are introducing a new Sustainability Pillar to help organizations learn, measure, and improve their workloads using environmental best practices for cloud computing.

Similar to the other pillars, the Sustainability Pillar contains questions aimed at evaluating the design, architecture, and implementation of your workloads to reduce their energy consumption and improve their efficiency. The pillar is designed as a tool to track your progress toward policies and best practices that support a more sustainable future, not just a simple checklist.

The Shared Responsibility Model of Cloud Sustainability
The shared responsibility model also applies to sustainability. AWS is responsible for the sustainability of the cloud, while AWS customers are responsible for sustainability in the cloud.

The sustainability of the cloud allows AWS customers to reduce associated energy usage by nearly 80% with respect to a typical on-premises deployment. This is possible because of the much higher server utilization, power and cooling efficiency, custom data center design, and continued progress on the path to powering AWS operations with 100% renewable energy by 2025. But we can achieve much more by collectively designing sustainable architectures.

We are introducing the new Sustainability Pillar to help organizations improve their sustainability in the cloud. This is a continuous effort focused on energy reduction and efficiency of all types of workloads. In practice, the pillar helps developers and cloud architects surface the trade-offs, highlight patterns and best practices, and avoid anti-patterns. For example, selecting an efficient programming language, adopting modern algorithms, using efficient data storage techniques, and deploying correctly sized and efficient infrastructure.

Specifically, the pillar is designed to support organizations in developing a better understanding of the state of their workloads, as well as the impact related to defined sustainability targets, how to measure against these targets, and how to model where they cannot directly measure.

In addition to building sustainable workloads in the cloud, you can use AWS technology to solve broader sustainability challenges. For example, reducing the environmental incidents caused by industrial equipment failure using Amazon Monitron to detect abnormal behavior and conduct preventative maintenance. We call this sustainability through the cloud.

Well-Architected Design Principles for Sustainability in the Cloud
The Sustainability Pillar includes design principles and operational guidance, as well as architectural and software patterns.

The design principles will facilitate good design for sustainability:

  • Understand your impact – Measure business outcomes and the related sustainability impact to establish performance indicators, evaluate improvements, and estimate the impact of proposed changes over time.
  • Establish sustainability goals – Set long-term goals for each workload, model return on investment (ROI) and give owners the resources to invest in sustainability goals. Plan for growth and design your architecture to reduce the impact per unit of work such as per user or per operation.
  • Maximize utilization – Right size each workload to maximize the energy efficiency of the underlying hardware, and minimize idle resources.
  • Anticipate and adopt new, more efficient hardware and software offerings – Support upstream improvements by your partners, continually evaluate hardware and software choices for efficiencies, and design for flexibility to adopt new technologies over time.
  • Use managed services – Shared services reduce the amount of infrastructure needed to support a broad range of workloads. Leverage managed services to help minimize your impact and automate sustainability best practices such as moving infrequent accessed data to cold storage and adjusting compute capacity.
  • Reduce the downstream impact of your cloud workloads – Reduce the amount of energy or resources required to use your services and reduce the need for your customers to upgrade their devices; test using device farms to measure impact and test directly with customers to understand the actual impact on them.

Well-Architected Best Practices for Sustainability
The design principles summarized above correspond to concrete architectural best practices that development teams can apply every day.

Some examples of architectural best practices for sustainability:

  • Optimize geographic placement of workloads for user locations
  • Optimize areas of code that consume the most time or resources
  • Optimize impact on customer devices and equipment
  • Implement a data classification policy
  • Use lifecycle policies to delete unnecessary data
  • Minimize data movement across networks
  • Optimize your use of GPUs
  • Adopt development and testing methods that allow rapid introduction of potential sustainability improvements
  • Increase the utilization of your build environments

Many of these best practices are generic and apply to all workloads, while others are specific to some use cases, verticals, and compute platforms. I’d highly encourage you to dive into these practices and identify the areas where you can achieve the most impact immediately.

Transforming sustainability into a non-functional requirement can result in cost effective solutions and directly translate to cost savings on AWS, as you only pay for what you use. In some cases, meeting these non-functional targets might involve tradeoffs in terms of uptime, availability, or response time. Where minor tradeoffs are required, the sustainability improvements are likely to outweigh the change in quality of service. It’s important to encourage teams to continuously experiment with sustainability improvements and embed proxy metrics in their team goals.

Available Now
The AWS Well-Architected Sustainability Pillar is a new addition to the existing framework. By using the design principles and best practices defined in the Sustainability Pillar Whitepaper, you can make informed decisions balancing security, cost, performance, reliability, and operational excellence with sustainability outcomes for your workloads on AWS.

Learn more about the new Sustainability Pillar.

Alex

Announcing AWS Well-Architected Custom Lenses: Extend the Well-Architected Framework with Your Internal Best Practices

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/well-architected-custom-lenses-internal-best-practices/

We launched the AWS Well-Architected Framework back in 2015 to help you review workloads against architectural best practices, and across pillars such as operational excellence, security, reliability, performance efficiency, and cost optimization. In 2017, we extended the framework with the concept of “lenses” to optimize specific workload types such as the Serverless Lens, the SaaS Lens, and the Foundational Technical Review (FTR) Lens for APN Partners. In 2018, we launched the AWS Well-Architected Tool, a self-service tool designed to help you review AWS workloads at any time, without the need for an AWS Solutions Architect.

Today, I’m happy to announce the general availability of AWS Well-Architected Custom Lenses, a new feature of the AWS Well-Architected Tool that lets you bring your own best practices to complement the existing framework based on your industry, operational plans, and internal processes. Custom Lenses provide a consolidated view and a consistent way to measure and improve your workloads on AWS without relying on external spreadsheets or third-party systems.

In addition to AWS Well-Architected Lenses, now you can create and share custom lenses and include them in your workload reviews, ultimately tailoring the review to your organizational needs. For example, you could define a custom lens to review your workloads against PCI compliance, SOC 2 compliance, or other national or industry regulations. As an AWS Partner, you might include ad-hoc best practices in your custom lenses when reviewing workloads with customers from different industries and segments, ultimately making the review process easier, faster, and more comprehensive.

How to Define a new Custom Lens
You author a new custom lens by editing a JSON preset template, where you define questions, choices, helpful resources, improvement plans, and risk rules.

Here’s how it works: download the template from the AWS Well-Architected Tool, work on it locally, and then re-upload it.

The JSON structure is composed of multiple pillars. Each pillar might contain multiple questions, each with its own choices and risk rules.

Your JSON file will look like this:

{
    "schemaVersion": "2021-11-01",
    "name": "My Test Lens",
    "description": "This is a description of my test lens.",
    "pillars": [
        {
            "id": "pillar_red",
            "name": "Red Pillar",
            "questions": [
                {
                    "id": "pillar_1_q1",
                    "title": "How do you get started with this pillar?",
                    "description": "Optional description.",
                    "choices": [
                        {
                            "id": "choice1",
                            "title": "Best practice #1",
                            "helpfulResource": {
                                "displayText": "This is helpful text for the first choice.",
                                "url": "https://aws.amazon.com"
                            },
                            "improvementPlan": {
                                "displayText": "This is text that will be shown for improvement of this choice."
                            }
                        },
                        {
                            "id": "choice2",
                            "title": "Best practice #2",
                            ...
                        }
                    ],
                    "riskRules": [
                        {
                            "condition": "choice1 && choice2",
                            "risk": "NO_RISK"
                        },
                        {
                            "condition": "choice1 && !choice2",
                            "risk": "MEDIUM_RISK"
                        },
                        {
                            "condition": "default",
                            "risk": "HIGH_RISK"
                        }
                    ]
                }
            ]
            ...
        },
        ...
    ]
}

Once you’re ready to submit your JSON file, proceed with the upload.

And don’t worry about making it perfect on the first try. You’ll be able to improve it and add new versions.

AWS Well-Architected Custom Lenses in Action
You find the list of custom lenses and their latest version in the new Custom Lenses section.

Each custom lens has an owner and can be shared with multiple AWS accounts too.

Before using this new custom lens in a workload review, you’ll need to publish it and assign it a version.

Select Publish lens and provide a version name such as 1.0.

Now you can create a new workload review and apply both AWS-owned lenses and your own custom lenses, in addition to the main framework.

During the workload review, you will go through each pillar and questions of the custom lens, using the same user interface provided by the AWS Well-Architected Tool.

Last but not least, you can share your custom lens with other AWS Identity and Access Management (IAM) principals such as AWS accounts, IAM users, and IAM roles.

Available Today at No Charge
Custom Lenses are available today in all AWS Regions where the AWS Well-Architected Tool is available, at no cost. You can define up to five custom lenses and share them across AWS Accounts, in addition to the existing Well-Architected Framework and AWS-owned Lenses.

Check out the technical documentation here.

We’re looking forward to hearing your feedback and iterating quickly to improve the authoring and sharing experience based on your needs.

Alex