Tag Archives: AWS Well-Architected Framework

Top 15 Architecture Blog Posts of 2020

Post Syndicated from Jane Scolieri original https://aws.amazon.com/blogs/architecture/top-15-architecture-blog-posts-of-2020/

The goal of the AWS Architecture Blog is to highlight best practices and provide architectural guidance. We publish thought leadership pieces that encourage readers to discover other technical resources, such as AWS Solutions, other AWS blogs, videos, reference architectures, whitepapers, guides, Training & Certification, case studies, and the AWS Architecture Monthly Magazine. We welcome your contributions!

Field Notes is a series of posts within the Architecture blog channel that provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

We would like to thank you, our readers, for spending time on our blog this last year. Much appreciation also goes to our hard-working AWS Solutions Architects and other blog post writers. Below are the top 15 Architecture & Field Notes blog posts written in 2020.

#15: Field Notes: Choosing a Rehost Migration Tool – CloudEndure or AWS SMS

by Ebrahim (EB) Khiyami

In this post, Ebrahim outlines considerations and patterns to help you decide, based on your migration requirements, when to choose one tool over the other.

Read Ebrahim’s post.

#14: Architecting for Reliable Scalability

by Marwan Al Shawi

In this post, Marwan explains how to architect your solution or application to reliably scale, when to scale and how to avoid complexity. He discusses several principles including modularity, horizontal scaling, automation, filtering and security.

Read Marwan’s post.

#13: Field Notes: Building an Autonomous Driving and ADAS Data Lake on AWS

by Junjie Tang and Dean Phillips

In this post, Junjie and Dean explain how to build an Autonomous Driving Data Lake using this Reference Architecture. They cover all steps in the workflow from how to ingest the data, to moving it into an organized data lake construct.

Read Junjie’s and Dean’s post.

#12: Building a Self-Service, Secure, & Continually Compliant Environment on AWS

by Japjot Walia and Jonathan Shapiro-Ward

In this post, Japjot and Jonathan provide a reference architecture for highly regulated enterprise organizations to help them maintain their security and compliance posture. This blog post provides an overview of a solution in which AWS Professional Services engaged with a major global systemically important bank (G-SIB) customer to help develop ML capabilities and implement a Defense in Depth (DiD) security strategy.

Read Japjot’s and Jonathan’s post.

#11: Introduction to Messaging for Modern Cloud Architecture

by Sam Dengler

In this post, Sam focuses on best practices when introducing messaging patterns into your applications. He reviews some core messaging concepts and shows how they can be used to address challenges when designing modern cloud architectures.

Read Sam’s post.

#10: Building a Scalable Document Pre-Processing Pipeline

by Joel Knight

In this post, Joel presents an overview of an architecture built for Quantiphi Inc. This pipeline performs pre-processing of documents, and is reusable for a wide array of document processing workloads.

Read Joel’s post.

#9: Introducing the Well-Architected Framework for Machine Learning

by Shelbee Eigenbrode, Bardia Nikpourian, Sireesha Muppala, and Christian Williams

In the Machine Learning Lens whitepaper, the authors focus on how to design, deploy, and architect your machine learning workloads in the AWS Cloud. The whitepaper describes the general design principles and the five pillars of the Framework as they relate to ML workloads.

Read the post.

#8: BBVA: Helping Global Remote Working with Amazon AppStream 2.0

by Jose Luis Prieto

In this post, Jose explains why BBVA chose Amazon AppStream 2.0 to accommodate the remote work experience. BBVA built a global solution reducing implementation time by 90% compared to on-premises projects, and is meeting its operational and security requirements.

Read Jose’s post.

#7: Field Notes: Serverless Container-based APIs with Amazon ECS and Amazon API Gateway

by Simone Pomata

In this post, Simone guides you through the details of the option based on Amazon API Gateway and AWS Cloud Map, and how to implement it. First you learn how the different components (Amazon ECS, AWS Cloud Map, API Gateway, etc.) work together, then you launch and test a sample container-based API.

Read Simone’s post.

#6: Mercado Libre: How to Block Malicious Traffic in a Dynamic Environment

by Gaston Ansaldo and Matias Ezequiel De Santi

In this post, readers will learn how to architect a solution that can ingest, store, analyze, detect and block malicious traffic in an environment that is dynamic and distributed in nature by leveraging various AWS services like Amazon CloudFront, Amazon Athena and AWS WAF.

Read Gaston’s and Matias’ post.

#5: Announcing the New Version of the Well-Architected Framework

by Rodney Lester

In this post, Rodney announces the availability of a new version of the AWS Well-Architected Framework, and focuses on such issues as removing perceived repetition, adding content areas to explicitly call out previously implied best practices, and revising best practices to provide clarity.

Read Rodney’s post.

#4: Serverless Stream-Based Processing for Real-Time Insights

by Justin Pirtle

In this post, Justin provides an overview of streaming messaging services and AWS serverless stream processing capabilities. He shows how they help you achieve low-latency, near real-time data processing in your applications.

Read Justin’s post.

#3: Field Notes: Working with Route Tables in AWS Transit Gateway

by Prabhakaran Thirumeni

In this post, Prabhakaran explains the packet flow if both source and destination network are associated to the same or different AWS Transit Gateway Route Table. He outlines a scenario with a substantial number of VPCs, and how to make it easier for your network team to manage access for a growing environment.

Read Prabhakaran’s post.

#2: Using VPC Sharing for a Cost-Effective Multi-Account Microservice Architecture

by Anandprasanna Gaitonde and Mohit Malik

Anand and Mohit present a cost-effective approach for microservices that require a high degree of interconnectivity and are within the same trust boundaries. This approach requires less VPC management while still using separate accounts for billing and access control, and does not sacrifice scalability, high availability, fault tolerance, and security.

Read Anand’s and Mohit’s post.

#1: Serverless Architecture for a Web Scraping Solution

by Dzidas Martinaitis

You may wonder whether serverless architectures are cost-effective or expensive. In this post, Dzidas analyzes a web scraping solution. The project can be considered a standard extract, transform, load process without a user interface and can be packed into a self-contained function or a library.

Read Dzidas’ post.

Thank You

Thanks again to all our readers and blog post writers! We look forward to learning and building amazing things together in 2021.

New – SaaS Lens in AWS Well-Architected Tool

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-saas-lens-in-aws-well-architected-tool/

To help you build secure, high-performing, resilient, and efficient solutions on AWS, in 2015 we publicly launched the AWS Well-Architected Framework. It started as a single whitepaper but has expanded to include domain-specific lenses, hands-on labs, and the AWS Well-Architected Tool (available at no cost in the AWS Management Console) that provides a mechanism for regularly evaluating your workloads, identifying high risk issues, and recording your improvements.

To offer more workload-specific advice, in 2017 we extended the framework with the concept of “lens” to go beyond a general perspective and enter specific technology domains. Now, to help accelerate building Software-as-a-Service (SaaS) solutions, the AWS SaaS Factory team has led an effort to build a new AWS Well-Architected SaaS Lens.

SaaS is a licensing and delivery model by which software is centrally managed and hosted by a provider and available to customers on a subscription basis. In this way, software providers can innovate rapidly, optimize their costs, and gain operational efficiencies. At the same time, customers benefit from simplified IT management, speed, and a pay-for-what-you-use business model.

The Well-Architected SaaS Lens adds questions to the tool that are tailored to SaaS workloads and intended to drive critical thinking for developing and operating SaaS workloads. Each question has a list of best practices, and each best practice has a list of improvement plans to help guide you in implementing them. AWS Solution Architects from the AWS SaaS Factory Program, having worked with thousands of software developers and AWS Partners, view these well-architected patterns as a key component of building and operating a SaaS architecture on AWS.

Using the SaaS Lens in the Well-Architected Tool
In the Well-Architected Tool console, I start by defining my workload. Today, I’m reviewing a pre-production environment of a SaaS application. It’s just a minimum viable product (MVP) version of what I want to build, with just enough features to be usable and to gather initial feedback.

Now, I can choose which lenses to apply. The AWS Well-Architected Framework is there by default. I select the SaaS Lens. This adds a set of questions that help me understand how to design, deploy, and architect my SaaS workload following the framework best practices. Other lenses are available in the tool as well, for example the Serverless Lens described here.
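For teams that prefer to automate reviews, the Well-Architected Tool also exposes an API (available in boto3 as the `wellarchitected` client). The sketch below shows roughly how a workload could be defined with both lenses attached; the SaaS Lens alias and the parameter values are assumptions for illustration, so verify them against the current API reference before relying on them.

```python
# Sketch of defining a Well-Architected workload with the SaaS Lens attached.
# The lens alias below is an assumption -- confirm it against the current
# AWS Well-Architected Tool API reference before relying on it.
SAAS_LENS_ALIAS = "softwareasaservice"  # assumed alias for the SaaS Lens

def workload_params(name, region, owner):
    """Build create_workload arguments for a pre-production SaaS MVP."""
    return {
        "WorkloadName": name,
        "Description": "MVP review of a multi-tenant SaaS application",
        "Environment": "PREPRODUCTION",   # this is an MVP, not production
        "ReviewOwner": owner,
        "AwsRegions": [region],
        # The standard framework lens is applied by default; the SaaS Lens
        # is requested alongside it.
        "Lenses": ["wellarchitected", SAAS_LENS_ALIAS],
    }

# Illustrative call (requires AWS credentials):
#   import boto3
#   wa = boto3.client("wellarchitected")
#   workload_id = wa.create_workload(
#       **workload_params("saas-mvp", "us-east-1", "owner@example.com")
#   )["WorkloadId"]
```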

Now, I start my review. Many questions in the SaaS Lens are focused on how you are managing a multi-tenant application. This is the first question for the Operational Excellence pillar. I can also add some notes to explain my answer better or take note of what I want to improve.

I don’t need to answer all questions to start improving my SaaS application. For example, this is the improvement plan based on my answer to the previous question. For each point here, I can click and get more information on how to implement that on AWS.

Moving to the Reliability pillar, I feel more confident because of the techniques I used to separate individual tenants of my SaaS application in their own “sandbox” environment.

As I expected, no risks are detected this time!

When I finish reviewing the SaaS Lens for my workload, I get an overview of the detected risks. Here, I can also save a milestone that I can use later to compare my status and estimate my improvements.

Just below that, I get a suggestion on what to focus on next. Again, I can click through and get in-depth suggestions on how to mitigate the risk.

As often happens in IT services, this is an iterative process. The AWS Well-Architected Tool helps quantify the risks and gives me a path to follow to continuously improve my SaaS application.

Available Now
The SaaS Lens is available today in all regions where the AWS Well-Architected Tool is offered, as described in the AWS Regional Services List. It can be applied to existing workloads, or used for new workloads you define in the tool.

There is no cost to use the AWS Well-Architected Tool; you can use it to improve the application you are working on, or to get visibility into multiple workloads used by the department or area you work with.

Learn more about the new SaaS Lens and get started today with the AWS Well-Architected Tool!


Integrating CloudEndure Disaster Recovery into your security incident response plan

Post Syndicated from Gonen Stein original https://aws.amazon.com/blogs/security/integrating-cloudendure-disaster-recovery-into-your-security-incident-response-plan/

An incident response plan (also known as procedure) contains the detailed actions an organization takes to prepare for a security incident in its IT environment. It also includes the mechanisms to detect, analyze, contain, eradicate, and recover from a security incident. Every incident response plan should contain a section on recovery, which outlines scenarios ranging from single component to full environment recovery. This recovery section should include disaster recovery (DR), with procedures to recover your environment from complete failure. Effective recovery from an IT disaster requires tools that can automate preparation, testing, and recovery processes. In this post, I explain how to integrate CloudEndure Disaster Recovery into the recovery section of your incident response plan. CloudEndure Disaster Recovery is an Amazon Web Services (AWS) DR solution that enables fast, reliable recovery of physical, virtual, and cloud-based servers on AWS. This post also discusses how you can use CloudEndure Disaster Recovery to reduce downtime and data loss when responding to a security incident, and best practices for maintaining your incident response plan.

How disaster recovery fits into a security incident response plan

The AWS Well-Architected Framework security pillar provides guidance to help you apply best practices and current recommendations in the design, delivery, and maintenance of secure AWS workloads. It includes a recommendation to integrate tools to secure and protect your data. A secure data replication and recovery tool helps you protect your data if there’s a security incident and quickly return to normal business operation as you resolve the incident. The recovery section of your incident response plan should define recovery point objectives (RPOs) and recovery time objectives (RTOs) for your DR-protected workloads. RPO is the window of time that data loss can be tolerated due to a disruption. RTO is the amount of time permitted to recover workloads after a disruption.
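To make these two objectives concrete, the helper below (an illustrative sketch, not part of any AWS tooling) checks whether a single recovery event stayed within the RPO and RTO defined in the plan:

```python
from datetime import datetime, timedelta

def recovery_within_objectives(last_replicated, disruption_start,
                               service_restored, rpo, rto):
    """Return (rpo_met, rto_met) for one recovery event.

    last_replicated:  timestamp of the newest usable recovery point
    disruption_start: when the disruption began
    service_restored: when workloads were back in service
    rpo, rto:         the objectives from the incident response plan
    """
    data_loss_window = disruption_start - last_replicated  # data at risk
    downtime = service_restored - disruption_start         # time to recover
    return data_loss_window <= rpo, downtime <= rto

# Example: 30 s of potential data loss against a 60 s RPO,
# and 12 min of downtime against a 15 min RTO.
t0 = datetime(2020, 6, 1, 9, 0, 0)
rpo_met, rto_met = recovery_within_objectives(
    last_replicated=t0 - timedelta(seconds=30),
    disruption_start=t0,
    service_restored=t0 + timedelta(minutes=12),
    rpo=timedelta(seconds=60),
    rto=timedelta(minutes=15),
)
# Both objectives are met in this example.
```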

Your DR response to a security incident can vary based on the type of incident you encounter. For example, your DR plan for responding to a security incident such as ransomware—which involves data corruption—should describe how to recover workloads on your secondary DR site using a recovery point prior to the data corruption. This use case will be discussed further in the next section.

In addition to tools and processes, your security incident response plan should define the roles and responsibilities necessary during an incident. This includes the people and roles in your organization who perform incident mitigation steps, in addition to those who need to be informed and consulted. This can include technology partners, application owners, or subject matter experts (SMEs) outside of your organization who can offer additional expertise. DR-related roles for your incident response plan include:

  • A person who analyzes the situation and provides visibility to decision-makers.
  • A person who decides whether or not to trigger a DR response.
  • A person who actively triggers the DR response.

Be sure to include all of the stakeholders you identify in your documented security incident response procedures and runbooks. Test your plan to verify that the people in these roles have the pre-provisioned access they need to perform their defined role.

How to use CloudEndure Disaster Recovery during a security incident

CloudEndure Disaster Recovery continuously replicates your servers—including OS, system state configuration, databases, applications, and files—to a staging area in your target AWS Region. The staging area contains low-cost resources automatically provisioned and managed by CloudEndure Disaster Recovery. This reduces the cost of provisioning duplicate resources during normal operation. Your fully provisioned recovery environment is launched only during an incident or drill.

If your organization experiences a security incident that can be remediated using DR, you can use CloudEndure Disaster Recovery to perform failover to your target AWS Region from your source environment. When you perform failover, CloudEndure Disaster Recovery orchestrates the recovery of your environment in your target AWS Region. This enables quick recovery, with RPOs of seconds and RTOs of minutes.

To deploy CloudEndure Disaster Recovery, you must first install the CloudEndure agent on the servers in your environment that you want to replicate for DR, and then initiate data replication to your target AWS Region. Once data replication is complete and your data is in sync, you can launch machines in your target AWS Region from the CloudEndure User Console. CloudEndure Disaster Recovery enables you to launch target machines in either Test Mode or Recovery Mode. Your launched machines behave the same way in either mode; the only difference is how the machine lifecycle is displayed in the CloudEndure User Console. Launch machines by opening the Machines page, shown in the following figure, and selecting the machines you want to launch. Then select either Test Mode or Recovery Mode from the Launch Target Machines menu.

Figure 1: Machines page on the CloudEndure User Console

You can launch your entire environment, a group of servers comprising one or more applications, or a single server in your target AWS Region. When you launch machines from the CloudEndure User Console, you’re prompted to choose a recovery point from the Choose Recovery Point dialog box (shown in the following figure).

Use point-in-time recovery to respond to security incidents that involve data corruption, such as ransomware. Your incident response plan should include a mechanism to determine when data corruption occurred. Knowing how to determine which recovery point to choose in the CloudEndure User Console helps you minimize response time during a security incident. Each recovery point is a point-in-time snapshot of your servers that you can use to launch recovery machines in your target AWS Region. Select the latest recovery point before the data corruption to restore your workloads on AWS, and then choose Continue With Launch.
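The selection rule described above (take the newest recovery point that precedes the corruption) can be sketched as a small helper. This is purely illustrative, since recovery points are chosen manually in the CloudEndure User Console, but it shows the logic an incident response runbook should encode:

```python
from datetime import datetime

def latest_point_before(recovery_points, corruption_time):
    """Return the newest recovery point taken strictly before the
    moment data corruption is believed to have started, or None if
    no recovery point predates the corruption."""
    candidates = [p for p in recovery_points if p < corruption_time]
    return max(candidates) if candidates else None

# Hourly recovery points from 08:00 to 11:00; corruption began at 10:30.
points = [datetime(2020, 6, 1, h, 0) for h in (8, 9, 10, 11)]
chosen = latest_point_before(points, datetime(2020, 6, 1, 10, 30))
# chosen is the 10:00 recovery point: the latest one before the corruption.
```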

Figure 2: Selection of an earlier recovery point from the Choose Recovery Point dialog box

Run your recovered workloads in your target AWS Region until you’ve resolved the security incident. When the incident is resolved, you can perform failback to your primary environment using CloudEndure Disaster Recovery. You can learn more about CloudEndure Disaster Recovery setup, operation, and recovery by taking this online CloudEndure Disaster Recovery Technical Training.

Test and maintain the recovery section of your incident response plan

Your entire incident response plan must be kept accurate and up to date in order to effectively remediate security incidents if they occur. A best practice for achieving this is through frequently testing all sections of your plan, including your tools. When you first deploy CloudEndure Disaster Recovery, begin running tests as soon as all of your replicated servers are in sync on your target AWS Region. DR solution implementation is generally considered complete when all initial testing has succeeded.

By correctly configuring the networking and security groups in your target AWS Region, you can use CloudEndure Disaster Recovery to launch a test workload in an isolated environment without impacting your source environment. You can run tests as often as you want. Tests don’t incur additional fees beyond payment for the fully provisioned resources generated during tests.
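As an illustration of that network isolation, the sketch below builds parameters for a security group dedicated to test launches. The VPC ID and names are placeholders, and the actual boto3 calls are shown in comments only:

```python
def isolated_test_sg_params(vpc_id):
    """Parameters for a security group used only by DR test machines.
    A newly created security group has no inbound rules, which keeps
    test launches unreachable from the source environment."""
    return {
        "GroupName": "dr-test-isolated",  # placeholder name
        "Description": "Isolated SG for CloudEndure test launches",
        "VpcId": vpc_id,
    }

# Illustrative creation (requires AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   sg_id = ec2.create_security_group(
#       **isolated_test_sg_params("vpc-0123456789abcdef0"))["GroupId"]
#   # Optionally remove the default allow-all egress rule as well:
#   ec2.revoke_security_group_egress(
#       GroupId=sg_id,
#       IpPermissions=[{"IpProtocol": "-1",
#                       "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}])
```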

Testing involves two main components: launching the machines you wish to test on AWS, and performing user acceptance testing (UAT) on the launched machines.

  1. Launch machines to test.
    Select the machines to test from the Machines page of the CloudEndure User Console by selecting the check box next to the machine. Then choose Test Mode from the Launch Target Machines menu, as shown in the following figure. You can select the latest recovery point or an earlier recovery point.
    Figure 3: Select Test Mode to launch selected machines


    The following figure shows the CloudEndure User Console. The Disaster Recovery Lifecycle column shows that the machines have been Tested Recently.

    Figure 4: Machines launched in Test Mode display purple icons in the Status column and Tested Recently in the Disaster Recovery Lifecycle column

  2. Perform UAT testing.
    Begin UAT testing when the machine launch job is successfully completed and your target machines have booted.

After you’ve successfully deployed, configured, and tested CloudEndure Disaster Recovery on your source environment, add it to your ongoing change management processes so that your incident response plan remains accurate and up-to-date. This includes deploying and testing CloudEndure Disaster Recovery every time you add new servers to your environment. In addition, monitor for changes to your existing resources and make corresponding changes to your CloudEndure Disaster Recovery configuration if necessary.

How CloudEndure Disaster Recovery keeps your data secure

CloudEndure Disaster Recovery has multiple mechanisms to keep your data secure without introducing new security risks. Data replication is performed using AES 256-bit encryption in transit. Data at rest can be encrypted by using Amazon Elastic Block Store (Amazon EBS) encryption with an AWS managed key or a customer managed key. Amazon EBS encryption is supported by all volume types, and includes built-in key management infrastructure that has no performance impact. Replication traffic is transmitted directly from your source servers to your target AWS Region, and can be restricted to private connectivity such as AWS Direct Connect or a VPN. CloudEndure Disaster Recovery is ISO 27001 and GDPR compliant and HIPAA eligible.
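As a small illustration of the at-rest option, the helper below builds an EC2 block-device mapping with EBS encryption enabled. The device name, volume size, and key ARN are placeholders, and the actual API call is shown only in a comment:

```python
def encrypted_root_volume(kms_key_id=None):
    """Block-device mapping that encrypts the root EBS volume at rest.
    Without KmsKeyId, the AWS managed key for EBS is used; passing a
    key ARN selects a customer managed key instead."""
    ebs = {"VolumeSize": 100, "VolumeType": "gp2", "Encrypted": True}
    if kms_key_id:
        ebs["KmsKeyId"] = kms_key_id  # customer managed key (optional)
    return [{"DeviceName": "/dev/xvda", "Ebs": ebs}]

# Illustrative use when launching an EC2 instance (requires credentials):
#   boto3.client("ec2").run_instances(
#       ..., BlockDeviceMappings=encrypted_root_volume())
```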


Each organization tailors its incident response plan to meet its unique security requirements. As described in this post, you can use CloudEndure Disaster Recovery to improve your organization’s incident response plan. I also explained how to recover from an earlier point in time when you respond to security incidents involving data corruption, and how to test your servers as part of maintaining the DR section of your incident response plan. By following the guidance in this post, you can improve your IT resilience and recover more quickly from security incidents. You can also reduce your DR operational costs by avoiding duplicate provisioning of your DR infrastructure.

Visit the CloudEndure Disaster Recovery product page if you would like to learn more. You can also view the AWS Raise the Bar on Data Protection and Security webinar series for additional information on how to protect your data and improve IT resilience on AWS.



Gonen Stein

Gonen is the Head of Product Strategy for CloudEndure, an AWS company. He combines his expertise in business, cloud infrastructure, storage, and information security to assist enterprise organizations with developing and deploying IT resilience and business continuity strategies in the cloud.

AWS Well-Architected for Financial Services

Post Syndicated from Arjun Chakraborty original https://aws.amazon.com/blogs/architecture/aws-well-architected-for-financial-services/

Cloud is part of the new normal in the financial services industry. Higher customer expectations have raised the stakes, pushing institutions to streamline operations, lower costs, and accelerate innovation. The conversation is no longer about whether to embrace the cloud, but rather how quickly it can be done. To address the need to provide tailored advice, AWS added the concept of AWS Well-Architected Lenses in 2017. AWS is now happy to announce the Financial Services Lens, the first industry-specific guidance for the AWS Well-Architected Framework. This post provides an introduction to its purpose, the topics it covers, and the common scenarios it includes.

In the Financial Services Lens, we focus on how to design, deploy, and architect financial services industry workloads that promote resiliency, security, and operational performance in line with the risk and control objectives that firms define for themselves. This includes companies and industries that are required to meet the regulatory and compliance requirements of supervisory authorities.

While we recommend that AWS customers start with the standard AWS Well-Architected Framework as a foundation for their architecture review, the Financial Services lens provides additional best practices for the resiliency, security, and operational performance requirements of financial institutions. These best practices are based on our experience working with financial services customers globally.

The guidance inside the lens can be grouped under the following:


General Design principles to improve security, data privacy, and resiliency in the cloud

These principles include both general design principles for unlocking the true value of the cloud and specific best practices for your AWS environments. For example, we advocate a Security by Design (SbD) approach, wherein firms implement their architectures as repeatable templates that have control objectives, security baselines, and audit capabilities baked in. The lens also provides specific guidance on how to implement this approach, such as how to protect your compute infrastructure and how to protect your data with encryption.
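One way to bake a control objective into a repeatable template is to check it mechanically before the template is approved for reuse. The sketch below is a hypothetical example of such a check; both the control (an encryption-at-rest rule for S3 buckets) and the sample template are illustrative, not part of the lens itself:

```python
def unencrypted_buckets(template):
    """Return logical IDs of S3 buckets in a CloudFormation-style
    template dict that do not declare server-side encryption -- one
    example of a control objective checked before a template is
    approved for reuse."""
    failing = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::S3::Bucket":
            continue
        if "BucketEncryption" not in resource.get("Properties", {}):
            failing.append(logical_id)
    return failing

# Hypothetical template: one compliant bucket, one that fails the control.
template = {
    "Resources": {
        "GoodBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketEncryption": {
                "ServerSideEncryptionConfiguration": [
                    {"ServerSideEncryptionByDefault":
                     {"SSEAlgorithm": "aws:kms"}}]}},
        },
        "BadBucket": {"Type": "AWS::S3::Bucket", "Properties": {}},
    }
}
# unencrypted_buckets(template) flags only "BadBucket".
```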

Guardrails to confidently build and deploy financial applications on AWS

The lens provides best practices in a number of areas including Identity and Access Management (IAM), networking, secrets management, data privacy, resiliency, and performance. When taken together with the standard Well-Architected Framework, it will help accelerate migration or new development of applications on the cloud while making sure that security, risk, and compliance needs are met as well.

Techniques to build transparency and auditability into your AWS environment

With AWS, customers have access to higher-fidelity observability of what happens in their environments compared to traditional on-premises environments. The lens identifies a number of best practices to harness this capability for better business and operational outcomes. We’ve also provided additional guidance on how to gather, store, and secure evidence for audit requirements associated with cloud environments.

Recommendations for controls to help expedite adoption of new AWS services

“Allow listing” new AWS services, or certifying that a new service implements all of a firm’s internal security and control standards, is a time-consuming process for many financial services customers. It also holds teams back from leveraging innovation in new AWS services and features. In this lens, we discuss a number of best practices in the areas of IAM privileges, private connectivity, and encryption, which should help expedite the allow listing of new services.
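One common mechanism for enforcing an allow list is a service control policy that denies actions outside the approved services. The sketch below builds such a policy document; it is illustrative only and should be validated against the AWS Organizations documentation before use:

```python
import json

def allow_list_scp(approved_service_prefixes):
    """Build a service-control-policy-style document that denies any
    action outside the approved services. Illustrative only; validate
    the syntax against the AWS Organizations SCP documentation."""
    not_actions = [f"{svc}:*" for svc in approved_service_prefixes]
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnapprovedServices",
            "Effect": "Deny",
            # Deny everything that is NOT one of the approved services.
            "NotAction": not_actions,
            "Resource": "*",
        }],
    }, indent=2)

# Example: only Amazon S3 and DynamoDB have been allow listed so far.
policy = allow_list_scp(["s3", "dynamodb"])
```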

The Financial Services Lens whitepaper is intended to provide guidance for common industry workloads, such as:

  • Financial data on the cloud
  • Building data lakes for use cases such as regulatory reporting
  • Artificial intelligence (AI) and machine learning (ML)
  • Grid computing for risk and modeling
  • Building a platform for ML and AI
  • Open banking
  • Digital user engagement

If any of these scenarios fits your needs, building to the principles of the Financial Services lens in the AWS Well-Architected Framework can improve security, resiliency, and operational efficiency for all financial services customers on AWS, and can also assist in meeting regulatory and compliance obligations.


We strive to provide you with a consistent approach to evaluate architectures and implement designs that will scale over time. AWS is committed to maintaining the Financial Services Lens whitepaper as a living document; as the technology and regulatory landscape evolves and new AWS services come online, we’ll update the Financial Services Lens appropriately. We are also actively working toward implementing the Financial Services Lens in the Well-Architected Tool, so that teams can run Well-Architected reviews against the best practices included in the whitepaper.

Special thanks to the following individuals who contributed to building this resource, among many others who helped with review and implementation: Ilya Epshteyn, Misha Goussev, Som Chatterjee, James Craig, Anjana Kandalam, Roberto Silva, Chris Redmond, Pawan Agnihotri, Rahul Prabhakar, Jaswinder Hayre, Jennifer Code, Igor Kleyman, John McDonald, and John Kain.

Liberty IT Adopts Serverless Best Practices Using AWS Cloud Development Kit

Post Syndicated from Andrew Robinson original https://aws.amazon.com/blogs/architecture/liberty-it-adopts-serverless-best-practices-using-aws-cdk/

This post was co-written with Matthew Coulter, Lead Technical Architect of Global Risk at Liberty Mutual

Liberty IT Solutions, part of Liberty Mutual Group, has been using AWS CloudFormation to deploy serverless applications on AWS for the last four years. These deployments typically involve defining, integrating, and monitoring services such as AWS Lambda, Amazon API Gateway, and Amazon DynamoDB.

In this post, we will explore how Liberty took the AWS Cloud Development Kit (CDK), the AWS Well-Architected Framework, and the Serverless Application Lens for the AWS Well-Architected Framework and built a set of CDK patterns to help developers launch serverless resources.

Since each team was responsible for building and maintaining these applications, managing CloudFormation templates quickly became a challenge as they grew larger. It was also becoming increasingly time consuming to extract the relevant parts of a template for re-use. Even when teams were building similar applications with similar requirements, everything was built from scratch for each application.

Before we dive in, let’s cover some of the terminology in this blog post and provide you with some additional reading:

  • AWS CDK was launched as a software development framework to help you define cloud infrastructure as code and provision it through AWS CloudFormation.
  • AWS Well-Architected was developed to help cloud architects build secure, high performing, resilient, and efficient infrastructures for their applications.
  • Serverless Application Lens focuses on how to design, deploy, and architect your serverless application workloads on the AWS Cloud.


The model Liberty used to develop these applications was based on small components that led to repeatable patterns, which made them ideal for sharing architectures across teams.

For example, if different teams built a web application using API Gateway, AWS Lambda, and Amazon DynamoDB, they could build a reusable pattern that included best practices after they battle tested it. This known infrastructure pattern could then be easily shared across multiple teams, ultimately increasing developer productivity.
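The idea of a shareable pattern can be sketched in plain Python as a function that emits a CloudFormation-style fragment for the three services. Everything here is illustrative and deliberately sparse (a real Lambda resource also needs code and an execution role, for example); Liberty's actual patterns are CDK constructs with battle-tested, compliance-checked defaults baked in:

```python
def web_api_pattern(name):
    """Minimal CloudFormation-style fragment for the API Gateway +
    Lambda + DynamoDB pattern. Properties are deliberately sparse and
    illustrative; a real pattern bakes in the team's tested defaults
    (tracing, least-privilege roles, encryption, alarms)."""
    return {
        f"{name}Api": {
            "Type": "AWS::ApiGateway::RestApi",
            "Properties": {"Name": f"{name}-api"},
        },
        f"{name}Function": {
            "Type": "AWS::Lambda::Function",
            "Properties": {
                "Runtime": "python3.8",
                "Handler": "index.handler",
                # A sensible default every consumer of the pattern gets:
                "TracingConfig": {"Mode": "Active"},
            },
        },
        f"{name}Table": {
            "Type": "AWS::DynamoDB::Table",
            "Properties": {
                "BillingMode": "PAY_PER_REQUEST",
                "SSESpecification": {"SSEEnabled": True},
            },
        },
    }

# Any team can now import web_api_pattern and get the same defaults.
resources = web_api_pattern("Orders")
```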

Due to compliance requirements and controls in place, Liberty built these applications using CloudFormation with specific configurations. Teams frequently had to reverse engineer their CloudFormation templates from previous proofs of concept (PoCs), as there were no high-level abstractions available. If similar examples existed elsewhere, teams would have to engineer them back into CloudFormation with their controls in place in order to use them in production.

That’s where CDK comes in

Matt Coulter at Liberty (and co-author of this blog post), said, “After researching CDK, we thought it would be a good fit for producing the patterns we wanted to, and providing an open source framework for others to contribute to the patterns, and to build their own.”

In an initial PoC, Liberty was able to take more than 2,000 lines of YAML CloudFormation down to just 14 lines of code, which came packaged with its own unit tests. This could then be published to NPM or GitHub to share with the rest of the team and with no loss of functionality over the original CloudFormation template.

Discovering and implementing AWS Serverless best practices

Liberty wanted to start the patterns using industry-wide best practices before applying its own specific best practices for compliance requirements.

The ideal starting point for this was the AWS Well-Architected Framework, specifically the Serverless Application Lens. The Serverless Lens is more focused and specific, and it provides guidance that speaks more closely to developers than the broader framework does. Using the Serverless Lens saved Liberty a significant time investment in discovering and implementing those best practices across the patterns it had built, and it allowed for faster uptake.

Let’s take a look at one of the patterns in more detail.

The X-Ray tracer pattern

This pattern introduces the concept of distributed tracing: as workloads scale, it can become more challenging to find anomalies and figure out where they occur within your application. Distributed tracing is not defined by the components used, but by how information is sent back to AWS X-Ray. The pattern includes some common use cases with different data flows through AWS Serverless services.

Common use cases with different data flows through AWS Serverless services

After this pattern has been deployed, a service map will be generated in X-Ray showing the interconnectivity between the different resources deployed. (Your map may look slightly different than this one.)

Service map in X-Ray showing interconnectivity between different resources deployed

This pattern provides you with an API Gateway endpoint you can use to trigger the flow and see the results in near real time in the X-Ray service map. The pattern also deliberately includes an SSL certificate error in a Lambda function that connects to an external HTTP endpoint. When this error occurs, you can see it in the X-Ray service map. You can then go into the trace details, which show the specific error:

Trace details showing the specific error
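The service map and trace details shown above are assembled from segment documents that instrumented code sends to X-Ray. As a rough illustration of that format, here is a minimal sketch that builds a segment document recording an SSL failure. The field names follow the published X-Ray segment document schema, but the segment name and error message are invented for this example:

```python
import json
import secrets
import time

def make_segment(name, error_message=None):
    """Build a minimal X-Ray segment document (illustrative sketch only)."""
    now = time.time()
    segment = {
        # X-Ray trace IDs are "1-<8 hex epoch digits>-<24 hex digits>"
        "trace_id": f"1-{int(now):x}-{secrets.token_hex(12)}",
        "id": secrets.token_hex(8),  # 16-hex-digit segment ID
        "name": name,
        "start_time": now,
        "end_time": now + 0.05,
    }
    if error_message:
        segment["fault"] = True  # marks the segment as a fault in the service map
        segment["cause"] = {"exceptions": [{"message": error_message}]}
    return segment

segment = make_segment("external-http-call", "SSL certificate verify failed")
print(json.dumps(segment, indent=2))
```

In the deployed pattern, the X-Ray SDK and service build and ship these documents for you; the sketch only shows why a failed call surfaces as a fault you can drill into from the service map.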

Want to know more?

Liberty has deployed more than 1,000 applications into non-production environments using CDK Patterns, and more than 100 applications into production. This saved its developers time in deploying best-of-breed serverless applications for the company and has encouraged sharing across the entire organization.

All of these patterns are now available at CDK Patterns under the MIT License. There are currently 17 serverless patterns available in both TypeScript and Python, and you can browse the patterns by any of the five Well-Architected Pillars.

You can also reach out to Liberty IT about this project on Twitter, and you can contribute to the patterns directly on GitHub.


Announcing the New Version of the Well-Architected Framework

Post Syndicated from Rodney Lester original https://aws.amazon.com/blogs/architecture/announcing-the-new-version-of-the-well-architected-framework/

“Nothing is constant except for change.” – Unknown

We are announcing the availability of a new version of the AWS Well-Architected Framework. We’ve made changes in response to your feedback, as well as to industry trends that are becoming widely adopted. We focused on three themes: removing perceived repetition, adding content areas to explicitly call out previously implied best practices, and revising best practices to provide clarity. These changes are intended to make it easier for you to understand and implement the best practices in your workloads. In addition to updating all the Well-Architected Pillar whitepapers to align more closely with these best practices, we have also improved the usability of the AWS Well-Architected Tool (AWS WA Tool).


This is our first Framework update since we released the AWS WA Tool in the AWS Management Console at re:Invent 2018. However, this is actually the eighth version of the Framework since it was introduced in 2012. The Framework consists of design principles, questions, and best practices that can be achieved in many different ways and are written in a manner that is much broader than a “checklist.” The best practices can be implemented using technology (AWS or third party) or by processes, which can also be automated. The Framework is written to allow you to explore whether you are actually accomplishing each best practice, rather than simply giving you “yes” or “no” questions to indicate if you have performed a particular action. Evaluating your design, architecture, and implementation should not be a checklist exercise but an evaluation of your progress toward the desired outcome.

We revise the Framework every year through a Kaizen process, where we collect the data about what worked for customers, what could be improved to provide clarity, and what should be added or removed. The first three versions were purely internal releases—used by our Solutions Architects, Professional Services Consultants, and Enterprise Support teams to have conversations with customers about building and operating the best cloud architectures.

With the fourth version in 2015, we published our guidance as a whitepaper. This increased its visibility and exposed a blind spot: we had not documented the operational best practices required to achieve what the other pillars describe. The fifth version, released the following year, added a new pillar to address that omission: Operational Excellence. The Framework was revised two more times, once in 2017, when we released the pillar-specific whitepapers and AWS Well-Architected Lenses, and again in 2018, when we launched the AWS WA Tool.

What’s new

We have added more topics to the Operational Excellence pillar. These specifically relate to the structure of your organization and how your culture supports those organizational choices. There are also additional best practices that we have identified and clarified in the existing areas of Prepare, Operate, and Evolve.

Operational Excellence

The Security pillar has one fewer question, which we accomplished by refining the best practices and removing duplication. Most of the changes are in identity and access management and in how to operate your workloads securely.

Security pillar

The Reliability pillar has three more questions, based around your workload architecture. These have been covered as part of failure management in the Reliability Pillar whitepaper since 2017, but they have elicited enough discussion and action with customers that we now explicitly call them out.

Reliability pillar

The number of questions in the Performance Efficiency pillar did not change, but we clarified the best practices.

Performance efficiency

The Cost Optimization pillar has introduced a new section on Cloud Financial Management (CFM), which added a new question along with the best practices associated with it.

Cost optimization

These changes make the best practices clearer, remove duplication, and explicitly call out the new best practices that we have identified as an important part of having great cloud workloads. Expect to see additional blog posts explaining the changes to each pillar in more detail.

With this release, the Framework is now available in ten languages: English, Spanish, French, German, Italian, Japanese, Korean, Brazilian Portuguese, Simplified Chinese, and Traditional Chinese.

Learn, measure, improve, & iterate

It’s a best practice to regularly review your workloads—even those that have not had major changes. Our internal standard is to perform reviews at least annually. We encourage you to review your existing workloads at least once this year, and to create milestones for your workloads as they evolve. Use the Framework to guide your design and architecture of new workloads, or of workloads that you are planning on moving to the cloud. The greatest successes have come from customers that take these best practices into consideration as early as possible in the process. In effective organizations, every best practice is considered and prioritized.

Continue to learn, measure, and improve your cloud workloads with the AWS Well-Architected Framework and use the AWS Well-Architected Tool to help document your progress to having Well-Architected workloads.

Learn more about the new version of Well-Architected and its pillars

What’s New in the Well-Architected Operational Excellence Pillar?

Post Syndicated from Brian Carlson original https://aws.amazon.com/blogs/architecture/whats-new-in-the-well-architected-operational-excellence-pillar/

We recently released an updated version of the Operational Excellence pillar of the AWS Well-Architected Framework, which includes expanded guidance on operating models and organizational culture, as well as some other refinements.

Gerald Weinberg, in his 1985 book, The Secrets of Consulting, defined The Second Law of Consulting as “No matter how it looks at first, it’s always a people problem.” I appreciate the humor and wit with which he approached software development.

Organizations, teams, and team members have always been a significant part of the Operational Excellence (OE) pillar. They now have their own dedicated section, Organization, which includes guidance on Organization Priorities, Operating Model, and Organizational Culture.

How do you structure your organization to support your business outcomes?

Regardless of the choice of methodology—DevOps, ISO, ITIL, or something else—we all ultimately perform the same activities. The single most significant difference is in how we organize responsibilities and teams. Understanding the relationships between individuals and teams is essential to the smooth flow of work through your organization.

Starting from a simplified view of an organization limited to the engineering and operations of both applications and infrastructure (or platform), we explore the tradeoffs between common operating models.

Operational Excellence

Using variations on this 2 by 2 diagram, where responsibilities might span multiple quadrants, we discuss who does what, and the relationships between teams, governance, and decision-making. A well-defined set of responsibilities reduces the number of conflicting and redundant efforts. Business outcomes are easier to achieve when there is strong alignment and relationships between business, development, and operations teams.

Your teams might use different operating models, based on their capabilities and needs. You can be successful with any operating model, but some operating models have advantages, or are better suited, to your individual teams. Mapping who does what across your teams can provide significant insight to how they contribute or impact the flow of work. These insights can lead to the identification of opportunities for improvement as well as opportunities to further leverage the existing capabilities and innovation of your teams.
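Mapping who does what can be done very simply: record which team owns each quadrant of the diagram, then flag quadrants that are unowned or contested. The following sketch illustrates the idea; the team names and quadrant labels are invented for this example:

```python
# Quadrants of the simplified 2 by 2 view: application vs. platform,
# engineering vs. operations (labels are illustrative).
quadrants = [
    "application engineering",
    "application operations",
    "platform engineering",
    "platform operations",
]

# Hypothetical mapping of teams to the quadrants they claim.
ownership = {
    "product-team": {"application engineering", "application operations"},
    "platform-team": {"platform engineering"},
    "sre-team": {"application operations"},
}

def audit(quadrants, ownership):
    """Flag unowned and contested quadrants in the responsibility map."""
    unowned, contested = [], []
    for q in quadrants:
        owners = [team for team, claimed in ownership.items() if q in claimed]
        if not owners:
            unowned.append(q)
        elif len(owners) > 1:
            contested.append(q)
    return unowned, contested

unowned, contested = audit(quadrants, ownership)
print("unowned:", unowned)      # platform operations has no owner
print("contested:", contested)  # application operations is claimed twice
```

Unowned quadrants are gaps in your operating model; contested ones are the conflicting and redundant efforts a well-defined set of responsibilities is meant to reduce.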

How does your organizational culture support your business outcomes?

The standards set by your leadership shape your organizational culture, so we begin by looking at Executive Sponsorship. Senior leadership sets expectations for the organization and evaluates success. When senior leadership is the sponsor, advocate, and driver for the adoption of best practices, as well as the evolution of the organization, both are more likely to happen. Our experience has shown that strong executive sponsorship is key to success.

“Diverse opinions are encouraged and sought within and across teams” is our top best practice in Organizational Culture. Leverage cross-organizational diversity to seek multiple unique perspectives. Use these perspectives to increase innovation, challenge your assumptions, and reduce the risk of confirmation bias. Grow inclusion, diversity, and accessibility within your teams to gain beneficial perspectives. These actions call back to the strong alignment and relationships across teams that we discussed with operating models. When you engage diverse perspectives and experiences, the insights you gain are more likely to identify new challenges, new opportunities, and new solutions.

While small in scope, other changes to OE represent significant areas for improvement.

We start with the subtle addition of “Evaluate governance requirements” alongside the existing “Evaluate compliance requirements.” With compliance, the focus is on obligations that come from outside your organization, for example, the regulatory requirements for your industry. With governance, the focus is on the requirements that your organization applies to you and your workload.

“Use a process for root cause analysis” has been renamed “Perform post-incident analysis” and moved to “How do you evolve operations?” This change makes it part of continual improvement, where it makes more sense based on the timing of the activity and it gets more emphasis. Regardless of what you call it, if your goal is to identify the contributing factors and mitigate or prevent recurrence, you are doing the right thing.

The final addition to OE is “Perform Knowledge Management.” It’s a difficult challenge that can have simple solutions. Your team members need to be able to discover the information that they are looking for, access it, and identify that it’s current and complete. It’s much better to capture institutional knowledge now, so that the burden carried by key team members can be shared. If a key team member is tragically struck by a winning lottery ticket and forced to become a person of leisure, what will you do? Perhaps right now is the best time to get started on a knowledge management effort.

This update to the Operational Excellence pillar of the AWS Well-Architected Framework gives you and your teams the tools and information you need to improve your operations and governance. Together with the AWS Well-Architected Tool, use them to continue to learn, measure, and improve your cloud workloads.

Special thanks to everyone across the AWS operations community who contributed their diverse perspectives and experiences to improving the Operational Excellence pillar.

Learn more about the new version of Well-Architected and its pillars

What’s New in the Well-Architected Reliability Pillar?

Post Syndicated from Seth Eliot original https://aws.amazon.com/blogs/architecture/whats-new-in-the-well-architected-reliability-pillar/

The new version of the Reliability pillar for AWS Well-Architected includes expanded content across all areas of reliability. Guidance on distributed system architecture has been reorganized and expanded, and new best practices have been added as part of the Well-Architected Review. There is a sharper focus on chaos engineering with more explanation and examples. We’ve added more details on using fault isolation to protect your workloads using Availability Zones, and beyond.

In the AWS Well-Architected Tool, new reliability best practices have been added, and existing ones updated. We have completely updated the Reliability Pillar whitepaper to align to the questions and best practices found in the tool. Additionally, we added the latest guidance on implementing the best practices using the newest AWS resources and partner technologies, such as AWS Transit Gateway, AWS Service Quotas, and CloudEndure Disaster Recovery.

The whitepaper provides clearer definitions to help you better understand the relationships among reliability, resiliency, and availability. The focus remains on resiliency, and how to design this into your workloads so that they are able to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions, such as misconfigurations or transient network issues.

The Amazon Builders’ Library, launched at re:Invent 2019, shares in-depth articles on how Amazon builds and runs resilient workloads. Our updated Reliability pillar draws extensively on this information, incorporating it across multiple best practices and linking back to specific Amazon Builders’ Library articles. The AWS Well-Architected hands-on reliability labs now include Implementing Health Checks and Managing Dependencies to Improve Reliability, which lets you exercise the practices demonstrated in the library’s Implementing health checks article firsthand. We also expanded the suite of Well-Architected Reliability labs with new labs on data backup, data replication, and automated infrastructure deployment.

Implementing Health Checks and Managing Dependencies to Improve Reliability-2

The new Implementing Health Checks and Managing Dependencies to Improve Reliability lab shows you how to implement practices to detect dependency failures and remain resilient despite them.

Prior to this version of the Reliability pillar, we had identified three best practice areas: Foundations, Change Management, and Failure Management. In this new version, we added a fourth area:

  • Workload Architecture: Specific patterns to follow as you design and implement software architecture for your distributed systems.

This new area covers best practices related to service-oriented architecture, microservices architectures, and distributed systems. We also added these to the AWS Well-Architected Tool, so that you can review your workloads and understand whether they are using these best architectural practices. The whitepaper content for these has also been expanded, and it draws on Amazon Builders’ Library articles, including Challenges with distributed systems and Timeouts, retries, and backoff with jitter.
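The Timeouts, retries, and backoff with jitter article describes retrying failed calls with capped exponential backoff and "full jitter" so that retries from many clients do not arrive in synchronized waves. A minimal sketch of that retry loop follows; the flaky_call function is a stand-in for any dependency call, and the parameter values are illustrative:

```python
import random
import time

def retry_with_jitter(call, max_attempts=5, base=0.1, cap=5.0, sleep=time.sleep):
    """Retry `call` with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Full jitter: sleep uniformly in [0, min(cap, base * 2**attempt)]
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

# Stand-in dependency that fails twice, then succeeds.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry_with_jitter(flaky_call, sleep=lambda s: None))  # → ok
```

The `sleep` parameter is injected only so the example runs instantly; in real code the default `time.sleep` applies, and the retried call should also carry its own timeout.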

The previous version helped you to understand the important role of Availability Zones in a reliable architecture. In the new version, we expanded on this by adding more detail on using bulkhead architectures, such as cell-based architecture (used across AWS), where each cell is a complete, independent instance of the service.
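In a cell-based architecture, each customer is consistently routed to one cell, so a cell failure is contained to that cell's customers. The assignment step can be sketched as deterministic hashing of the customer ID; the cell names here are hypothetical:

```python
import hashlib

CELLS = ["cell-1", "cell-2", "cell-3"]  # independent instances of the service

def cell_for(customer_id):
    """Deterministically map a customer to one cell (the bulkhead)."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return CELLS[int(digest, 16) % len(CELLS)]

# The same customer always lands in the same cell, so a failure in one
# cell affects only the customers mapped to it.
print(cell_for("customer-42") == cell_for("customer-42"))  # → True
```

A production system would typically use consistent hashing or an explicit mapping table rather than plain modulo, so that adding a cell does not reshuffle existing customers.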

Best practices on how you implement change have always been an important part of the Reliability pillar. We now have more practical guidance on reliable deployment, including runbooks and pipeline tests. The new best practice on immutable infrastructure expands on our previous guidance on deployment automation using canary deployment or blue/green deployment.
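A canary deployment sends a small fraction of traffic to the new version and increases that share only as the new version proves healthy. The weighted routing step can be simulated with the standard library; the version names and the 10% canary weight are illustrative:

```python
import random

def route(weights, rng):
    """Pick a version for one request according to the current weights."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions])[0]

rng = random.Random(0)  # seeded so the example is reproducible
weights = {"v1": 90, "v2-canary": 10}  # 10% of traffic to the canary

counts = {"v1": 0, "v2-canary": 0}
for _ in range(10_000):
    counts[route(weights, rng)] += 1

# Roughly 10% of requests should reach the canary version.
print(counts["v2-canary"] / 10_000)
```

In a blue/green deployment the same mechanism flips the weights all at once; the canary approach instead walks them up in stages, rolling back if health checks or alarms fire.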

We’ve also expanded coverage of Chaos Engineering. You can’t consider your workload to be resilient until you hypothesize how your workload will react to failures, inject those failures to test your design, and then compare your hypothesis to the testing results. While Chaos Monkey popularized the constructive use of chaos in 2010, Amazon has been purposely injecting failures since the early 2000s to increase resiliency and ensure readiness under the most adverse of circumstances. This history and experience are all the more applicable today in the cloud, where you can both design for recovery and test those designs. This is an often-overlooked best practice, but our most successful resiliency customers recognize it as a necessary and powerful tool.
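The hypothesize-inject-compare loop can be illustrated with a tiny fault-injection wrapper. This is a toy sketch of the idea, not any particular chaos tool; the function names and the 30% failure rate are invented for the example:

```python
import random

def inject_faults(func, failure_rate, rng, exc=TimeoutError):
    """Wrap a call so it fails with the given probability (a chaos experiment)."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise exc("injected fault")
        return func(*args, **kwargs)
    return wrapper

def get_user(user_id):
    return {"id": user_id}

rng = random.Random(42)  # seeded so the experiment is reproducible
chaotic_get_user = inject_faults(get_user, failure_rate=0.3, rng=rng)

# Hypothesis: callers survive dependency failures by degrading gracefully.
results = []
for i in range(100):
    try:
        results.append(chaotic_get_user(i))
    except TimeoutError:
        results.append({"id": i, "fallback": True})  # fallback path exercised

print(sum("fallback" in r for r in results), "of 100 calls used the fallback")
```

If every injected fault lands on the fallback path and none escapes as an unhandled error, the test supports the hypothesis; any other outcome is a design gap found before a real outage finds it for you.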

This update to the Reliability pillar of the AWS Well-Architected Framework gives you and your teams the tools and information you need to understand your workload reliability. Together with the AWS Well-Architected Tool, start creating a plan today and continue to learn, measure, and improve your cloud workloads.

A huge thank you to everyone who gives us feedback on the tool and whitepapers, and a special thank you to Stephen Beck, Adrian Hornsby, Mahanth Jayadeva, Krupakar Pasupuleti, Jon Steele, and Jon Wright for their help with this update.

Learn more about the new version of Well-Architected and its pillars

What’s New in the Well-Architected Performance Efficiency Pillar?

Post Syndicated from Eric Pullen original https://aws.amazon.com/blogs/architecture/whats-new-in-the-well-architected-performance-efficiency-pillar/

We recently published a significant update to the AWS Well-Architected Framework, and as part of that update, the Performance Efficiency Pillar whitepaper has been streamlined and improved. The questions in the whitepaper are now better aligned with those in the AWS Well-Architected Tool, making it easy to cross-reference between them when reviewing workloads. With clearer headings and each best practice highlighted, you can quickly find what you need and then dive deep on any topic.

We also updated the Performance Efficiency pillar to reflect the evolution of AWS and the best practices we are seeing in the field. In each best practice, we have added—and in many cases expanded on—these innovations to improve your performance in the cloud. New services, such as the AWS Nitro System, AWS Global Accelerator, AWS Local Zones, and AWS Outposts, can help you achieve this goal.

AWS Outposts network

The Performance Efficiency pillar now describes hybrid networking options, such as the AWS Outposts network setup shown here.


Enterprise environments are often a mix of cloud, on-premises data centers, and edge locations. To address this, we modified the networking best practice to include performance considerations for hybrid cloud architectures. We also highlight AWS services that can increase your networking performance.

In addition to those changes, we have also added a focus on common performance bottlenecks in cloud architectures. New explanations address common decisions, such as compute selection, storage selection, and database selection. For database selection, we added new information to help you choose among our 15 purpose-built database engines, which include relational, key-value, document, in-memory, graph, time series, and ledger databases. We now stress that performance-optimized workloads should pick the best database to solve a specific problem or group of problems, rather than defaulting to a one-size-fits-all monolithic database.

Lastly, we have included new information regarding performance monitoring. For example, we describe using CloudWatch Synthetics to check the latency of your workload endpoints, AWS X-Ray to capture performance-related metrics, and CloudWatch ServiceLens to integrate traces, metrics, logs, and alarms into one place to measure against critical key performance indicators (KPIs). These services allow you to gain a deeper insight into workload performance and ensure that the services are performing as expected.
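At its core, a Synthetics-style canary runs a scripted check on a schedule, records the latency of each run, and compares it against a KPI threshold. That measurement loop can be sketched with the standard library; the endpoint call here is a local stand-in, and the 500 ms threshold is an invented KPI:

```python
import time

def check_endpoint():
    """Stand-in for an HTTP call to the workload endpoint."""
    time.sleep(0.01)  # simulate 10 ms of work at the endpoint
    return 200        # simulated HTTP status code

def measure(check, samples=5):
    """Collect latency samples, as a scheduled canary run would."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        status = check()
        latencies.append(time.perf_counter() - start)
        assert status == 200  # a failed check would be its own alarm
    return latencies

latencies = measure(check_endpoint)
worst_ms = max(latencies) * 1000
kpi_ms = 500  # hypothetical latency KPI
print(f"max latency {worst_ms:.1f} ms, within KPI: {worst_ms < kpi_ms}")
```

The managed services add what the sketch leaves out: scheduling, retention, alarming, and the correlation of these latencies with traces and logs in ServiceLens.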

This update to the Performance Efficiency pillar of the AWS Well-Architected Framework gives you additional insight into the tools and services needed to continuously improve your workload performance. I encourage you to use the AWS Well-Architected Tool to continually measure and improve your cloud workloads over time, measuring the impact of your performance improvements, and documenting those changes as milestones in the tool.

A special thank you to Tom Adamski, Nick Matthews, Steve Seymour, and Allan Clark for their help on this update. We would also like to thank everyone who has given us feedback on the AWS Well-Architected Tool and whitepapers. We would love to hear your feedback on our latest update!

Learn more about the new version of Well-Architected and its pillars

Introducing the Well-Architected Framework for Machine Learning

Post Syndicated from Shelbee Eigenbrode original https://aws.amazon.com/blogs/architecture/introducing-the-well-architected-framework-for-machine-learning/

We have published a new whitepaper, Machine Learning Lens, to help you design your machine learning (ML) workloads following cloud best practices. This whitepaper gives you an overview of the iterative phases of ML and introduces you to the ML and artificial intelligence (AI) services available on AWS using scenarios and reference architectures.

How often are you asking yourself “Am I doing this right?” when building and running applications in the cloud? You are not alone. To help you answer this question, we released the AWS Well-Architected Framework in 2015. The Framework is a formal approach for comparing your workload against AWS best practices and getting guidance on how to improve it.

We added “lenses” in 2017 that extended the Framework beyond its general perspective to provide guidance for specific technology domains, such as the Serverless Applications Lens, High Performance Computing (HPC) Lens, and IoT (Internet of Things) Lens. The new Machine Learning Lens further extends that guidance to include best practices specific to machine learning workloads.

A question we often hear is: “Should I use both the Lens whitepaper and the AWS Well-Architected Framework whitepaper when reviewing a workload?” Yes! We purposely exclude topics covered by the Framework from the Lens whitepapers. To fully evaluate your workload, use the Framework along with any applicable Lens whitepapers.

Applying the Machine Learning Lens

In the Machine Learning Lens, we focus on how to design, deploy, and architect your machine learning workloads in the AWS Cloud. The whitepaper starts by describing the general design principles for ML workloads. We then discuss the design principles for each of the five pillars of the Framework—operational excellence, security, reliability, performance efficiency, and cost optimization—as they relate to ML workloads.

Although the Lens is written to provide guidance across all pillars, each pillar is designed to be consumable by itself. This design results in some intended redundancy across the pillars. The following figure shows the structure of the whitepaper.

ML Lens-Well Architected (1)

The primary components of the AWS Well-Architected Framework are Pillars, Design Principles, Questions, and Best Practices. These components are illustrated in the following figure and are outlined in the AWS Well-Architected Framework whitepaper.

WA-Failure Management (2)

The Machine Learning Lens follows this pattern, with Design Principles, Questions, and Best Practices tailored for machine learning workloads. Each pillar has a set of questions, mapped to the design principles, which drives best practices for ML workloads.

ML Lens-Well Architected (2)

To review your ML workloads, start by answering the questions in each pillar. Identify opportunities for improvement as well as critical items for remediation. Then make a prioritized plan to address the improvement opportunities and remediation items.
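The review output lends itself to a simple ordering: sort the findings so that high-risk remediation items come before lower-risk improvement opportunities. The pillar names below are the real Framework pillars, but the findings and risk labels are invented for illustration:

```python
# Findings from a hypothetical ML workload review: (pillar, finding, risk).
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}

findings = [
    ("Reliability", "no rollback path for the retraining pipeline", "high"),
    ("Cost Optimization", "oversized training instances", "medium"),
    ("Security", "training data bucket not encrypted", "high"),
    ("Performance Efficiency", "no endpoint latency alarm", "low"),
]

def prioritized_plan(findings):
    """Sort findings so high-risk remediation items are addressed first."""
    return sorted(findings, key=lambda f: RISK_ORDER[f[2]])

for pillar, finding, risk in prioritized_plan(findings):
    print(f"[{risk:>6}] {pillar}: {finding}")
```

Because the sort is stable, findings with equal risk keep their review order, so you can record them in the order they were discussed and still get a sensible plan.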

We recommend that you regularly evaluate your workloads. Use the Lens throughout the lifecycle of your ML workload—not just at the end when you are about to go into production. When used during the design and implementation phase, the Machine Learning Lens can help you identify potential issues early, so that you have more time to address them.

Available now

The Machine Learning Lens is available now. Start using it today to review your existing workloads or to guide you in building new ones. Use the Lens to ensure that your ML workloads are architected with operational excellence, security, reliability, performance efficiency, and cost optimization in mind.