Tag Archives: AWS Well-Architected Framework

Know before you go – AWS re:Invent 2025 guide to Well-Architected and Cloud Optimization sessions

Post Syndicated from Anitha Selvan original https://aws.amazon.com/blogs/architecture/know-before-you-go-aws-reinvent-2025-guide-to-well-architected-and-cloud-optimization-sessions/

Are you ready to maximize your Well-Architected and Cloud Optimization learning and networking time at re:Invent 2025? We have put together this comprehensive guide to help you plan your schedule and make the most of the Well-Architected and cloud optimization sessions available this year. These sessions will deliver the practical guidance your teams need to lead strategic cloud initiatives, design next-generation architectures, optimize costs, or secure AI-powered systems.

Key themes at re:Invent for Well-Architected and Cloud Optimization – You can expect to see the following themes at re:Invent 2025

AI-powered architecture and governance

The sessions in this theme showcase how AWS is integrating AI technologies to transform traditional architectural practices. From using AI services for automated Well-Architected reviews to implementing self-evolving systems with agentic AI, these sessions demonstrate how you can use AI to automate architectural decisions, streamline governance processes, and scale best practices across the enterprise.

Sessions: ARC324-R, ARC317-R, SPS320, ARC302-R (session details are posted in the following section)

Well-Architected Framework evolution and implementation

These sessions highlight how the AWS Well-Architected Framework has evolved beyond its original scope to address modern architectural challenges. Attendees will learn how to implement the framework principles across different domains—from IoT security to backup strategies—while focusing on enterprise-scale governance and compliance.

Sessions: ARC204, SEC337, STG313-R, ARC323-R (session details are posted in the following section)

Cost optimization and FinOps

The cost optimization track focuses on innovative approaches to cloud financial management, with a strong emphasis on AI-powered FinOps solutions. Sessions range from hands-on workshops like the Frugal Architect GameDay to chalk talks on establishing effective cost governance models.

Sessions: ARC318-R, COP309-R, ARC309, DEV318 (session details are posted in the following section)

Session formats to fit your learning style

This year’s catalog features an exciting mix of content across different formats: from breakout, chalk talks, workshops, builder sessions to code talks.

Breakout sessions – Stay in the know

Sit back and enjoy these presentations to stay current with the latest solution enhancements and practical applications. AWS experts and guest speakers will share valuable insights and real-world examples.

From ideas to impact: Architecting with cloud best practices

ARC204 | Breakout session | December 1, 8:30 AM

Discover how foundational frameworks like the AWS Well-Architected Framework, AWS Cloud Adoption Framework, and AWS Cloud Operating Model evolved through customer feedback and real-world learnings from thousands of organizations, transforming from structured guidance into dynamic insights for optimizing cloud environments. Learn practical strategies for applying unified best practices to accelerate cloud transformation while managing large-scale architectural changes and maintaining operational excellence.

Build a well-architected foundation for scaling generative AI and agentic apps

AIM310 | Breakout session | December 1, 10:00 AM

Move beyond proof-of-concepts to build a production-ready foundation supporting all AI applications across your organization, addressing the critical transition from experimentation to enterprise-scale AI deployment. Navigate model access and management, tool discovery, memory and state handling, and observability at scale while building foundations that seamlessly integrate model access, orchestration workflows, agents, and tools with enterprise-grade governance controls.

AI-Powered Enterprise Architecture with ServiceNow & AWS 

ARC337-S | Breakout session | December 2, 3:00 PM

Enterprises face a core challenge: translating architectural vision into resilient cloud reality. See how integrating ServiceNow’s Enterprise Architecture Workspace with the AWS Well-Architected Tool transforms traditional design processes. Through elegant “shift-left” methodologies, architects gain contextual insights that seamlessly blend enterprise modeling with cloud best practices. This presentation is brought to you by ServiceNow, an AWS Partner.

The AI revolution in customer support: Building predictive service systems

SPS315 | Breakout session | December 3, 5:30 PM

Discover how AWS is using generative AI to transform customer support from reactive to proactive. We’ll show how large language models and AI agents are improving customer satisfaction and efficiency. Topics include smart case routing, context-aware support, early problem detection, and responsible AI use. We’ll share real results and discuss balancing AI capabilities with human touch.

Optimize AWS Costs: Developer Tools and Techniques

DEV318 | Breakout session | December 1, 3:00 PM

As cloud applications grow in complexity, optimizing costs becomes crucial for developers. This session explores AWS native tools and coding practices that reduce expenses without compromising performance or scalability.

Chalk talks

AWS speakers set the stage at the beginning of the talk and then open up for discussion. Bring your questions and dive deep into the topic with AWS experts and other customers.

Architecting agentic systems: Self-evolving patterns with AWS AI

ARC324-R | Chalk talk | December 2, 1:30 PM

Learn to architect self-evolving systems using agentic AI that align with AWS Well-Architected principles, exploring cutting-edge patterns for systems that adapt, heal, and optimize themselves autonomously while maintaining architectural integrity. Implement autonomous monitoring and self-healing capabilities with Amazon Bedrock Agents, design AI-driven security controls and automated recovery mechanisms and create systems that continuously adapt to workload patterns while maintaining reliability and performance standards.

Building Well-Architected agentic AI applications

ARC317-R/R1 | Chalk talk | December 2, 3:00 PM and December 4, 1:00 PM

Navigate generative AI agent development with robust architectural practices for security and compliance, focusing on proven patterns for building production-ready agentic AI applications that meet enterprise requirements. Design agent architectures with guardrails, monitoring systems, and access controls using the AWS Well-Architected Generative AI Lens while implementing governance patterns that ensure regulatory compliance and enable systems to scale from prototype to enterprise-wide deployment.

Using generative AI to automate architectural guidance

ARC315 | Chalk talk | December 1, 4:30 PM

Replace time-intensive manual processes with AI-powered systems that generate strategic recommendations, design principles, and best practices at scale while maintaining quality and consistency. Generate organization-specific design principles using AI analysis of architectural patterns, implement AI-driven guidance systems with effective quality control mechanisms, and build knowledge bases that feed AI-powered architectural guidance while maintaining human oversight and addressing ethical considerations.

Agentic architecting: From prototype to production-ready systems

ARC330-R/R1 | Chalk talk | December 2, 5:30PM and December 4, 2:30 PM

Transform prototypes into production-ready systems by incorporating security, monitoring, and CI/CD through agentic architecting, focusing on practical challenges of moving from experimental AI systems to production-grade architectures. Use AI agents to generate and optimize AWS CDK infrastructure and application code, implement automated security improvements and CI/CD pipeline creation, and maintain AWS Well-Architected principles while enabling teams to focus on business logic as AI handles infrastructure complexity.

AI-powered FinOps: Agent-based cloud cost management

ARC318-R/R1 | Chalk talk | December 1, 4:00 PM and December 3, 4:00 PM

Learn how intelligent agents tackle fragmented cost data and optimization processes in complex multi-account environments, moving beyond traditional FinOps approaches to autonomous, intelligent financial optimization. Architect solutions using Amazon OpenSearch Service for data aggregation and Amazon Bedrock for contextual reasoning to design secure, scalable FinOps solutions that continuously optimize costs while delivering measurable business outcomes.

Supercharge your well-architected reviews with AWS Generative AI

SPS320 | Chalk talk | December 3, 4:00 PM

Discover how Koch Industries revolutionized AWS Well-Architected reviews using generative AI, transforming weeks-long manual processes into automated, intelligent systems. Automate architectural assessments using Amazon Bedrock Knowledge Bases and Model Context Protocol (MCP) to scale best practice reviews and optimize workloads in minutes instead of days while achieving more comprehensive, consistent, and actionable recommendations through proven change management and organizational adoption strategies.

Architecting enterprise-scale governance beyond AWS Control Tower

ARC323-R/R1 | Chalk talk | December 3, 11:30 AM and December 4, 2:00PM

Discover advanced governance strategies that build upon AWS Control Tower for enterprise-scale environments requiring sophisticated compliance, security, and operational controls. Implement infrastructure across six Well-Architected Foundations capabilities with critical trade-off understanding, build efficient multi-account structures balancing security requirements with innovation needs, and architect automated compliance checks and policy enforcement at scale while enabling self-service capabilities with centralized governance and security controls.

Securing IoT Workloads with AWS IoT Lens and AWS Security Reference Architecture

SEC337 | Chalk talk | December 3, 11:30 AM

Industrial environments are reaching new levels of connectivity, automation, efficiency, and real-time data insights. However, this increased connectivity also introduces significant security challenges. Unaddressed security concerns can expose vulnerabilities and slow down companies looking to accelerate digital transformation using IoT and cloud. This chalk talk explores relevant techniques, architecture patterns, best practices and AWS security services for securing complex OT/IT environments, IoT devices, edge and cloud using the AWS Well-Architected IoT Lens and AWS Security Reference Architecture (SRA).

Establishing effective cost governance

COP309-R/R1 | Chalk talk | December 3, 3:00 PM and December 4, 12:30 PM

Generative AI agent development demands robust architectural practices for security and compliance. This chalk talk explores proven patterns for architecting secure, efficient AI agents using the AWS Well-Architected Generative AI Lens. Through collaborative discussion and whiteboarding, examine architectural governance and best practices for production environments. Learn to design agent architectures incorporating guardrails, monitoring systems, access controls, and sustainable deployment practices. Gain actionable insights for building secure, efficient, and cost-effective agentic AI applications that scale.

Break down monoliths, modernizing applications on Amazon ECS

CNS346 | Chalk talk | December 2, 4:30 PM

Join this interactive chalk talk to solve a common challenge where monolithic applications take months to deploy new features, and scaling becomes increasingly difficult. We’ll start with a real scenario, an application running on servers with a shared database. Together, we’ll design the modernization path using Amazon ECS and Well-Architected Framework principles. You’ll explore common architecture patterns, containerization strategies, CI/CD automation, and blue/green deployment approaches for ECS. After this session, you’ll walk away with a practical roadmap to transform your monolithic application into scalable microservices. Bring your curiosity and help us build the architecture live.

Hands-on workshop and Builders’ sessions

AWS speakers will introduce the use-case and tools designed to tackle the challenge. You will follow instructions, complete the tasks, and walk away with better understanding of the capabilities.

AI-powered Well-Architected reviews: Building automated governance

ARC302-R | Builders’ session | December 1, 9:00 AM; December 2, 11:30 AM and December 3, 4:00 PM

Build intelligent systems that automate AWS Well-Architected Framework reviews using generative AI, transforming manual architectural assessments into continuous, intelligent governance processes. Evaluate architecture against Well-Architected pillars while incorporating organization-specific requirements, implement continuous analysis of architecture and infrastructure as code templates, and enhance AI understanding of architectural context using Model Context Protocol servers to transform time-intensive reviews into scalable, automated processes with consistent governance.

AI-powered troubleshooting: From chaos to Well-Architected

ARC301-R | Builders’ session | December 1, 8:30 AM; December 2, 3:00 PM and December 3, 10:00 AM

Tackle complex scenarios using AI-powered tools to diagnose and resolve architectural problems, gaining practical experience using AI to transform poorly designed systems into well-architected solutions. Troubleshoot and optimize architectures with scaling bottlenecks and database inefficiencies using Amazon Q, apply Well-Architected principles to enhance performance and security under pressure, and accelerate problem identification and resolution while building AWS optimization expertise and learning to identify architectural anti-patterns before they become critical issues.

The Frugal Architect GameDay: Building cost-aware architectures

ARC309 | Workshop | December 1, 8:00 AM

Compete to implement cost efficiency improvements across multiple AWS services in this interactive GameDay, applying the Laws of the Frugal Architect to real-world scenarios for practical experience in transforming high-cost infrastructure into efficient, sustainable architectures. Address challenges spanning compute, networking, storage, serverless, and observability domains while learning to reduce cloud unit costs and improve profitability without compromising service quality through gamified scenarios that build rapid cost optimization decision-making skills.

Optimize AWS Backup using AI evaluation and Well-Architected best practices

STG313-R | Builders’ session | December 2, 1:30 PM and December 3, 8:30 AM

Enhance AWS Backup implementation using the AWS Backup Evaluator Solution, an AI agent that synthesizes data from multiple sources to provide intelligent backup optimization recommendations. Assess backup environments against the Well-Architected AWS Backup lens using Strands Agents SDK, create unified visibility across backup landscapes to identify optimization opportunities, and implement AI agents that continuously monitor backup efficiency while aligning with AWS best practices to enhance efficiency and cost-effectiveness.


Visit the AWS Cloud Support kiosk in the Venetian

Important notes:

Session dates, times, and locations listed in the post are subject to change as we continue to optimize the schedule on session popularity and venue capacity. Please check this blog post and the re:Invent session catalog regularly for the most up-to-date information about your registered sessions and newly added activities. For a full view of Well-architected content, including sessions with partners, explore the AWS re:Invent catalog and filter on the Well-Architected Framework area of interest.

Remember to reserve your seats early as popular sessions fill up quickly and bring your laptop for hands-on builders’ sessions and workshops. Register today


About the authors

Maximizing Business Value Through Strategic Cloud Optimization

Post Syndicated from Ryan Dsouza original https://aws.amazon.com/blogs/architecture/maximizing-business-value-through-strategic-cloud-optimization/

As cloud adoption continues to accelerate, organizations are realizing that the journey to the cloud is just the beginning. The real challenge—and opportunity—lies in optimizing cloud usage to drive maximum business value. At AWS, we’re committed to helping our customers navigate this journey successfully. Let’s explore some key insights and best practices for cloud optimization from the recent MIT Technology Review publication, Driving business value by optimizing the cloud.

The cloud optimization imperative

Recent data shows that global cloud infrastructure spending reached $84 billion in Q3 2024, marking a 23% year-over-year increase. This growth underscores the critical role of the cloud in driving business agility and innovation. However, to truly harness the power of the cloud, organizations must strike the right balance between cost, security, resilience, and innovation.

André Dufour, AWS Director and General Manager for AWS Cloud Optimization, emphasizes that cloud optimization involves making cloud spending efficient so that freed-up resources can be redirected to fund new innovations, such as generative AI initiatives.

Cloud optimization should be viewed as a continuous process rather than a one-time event, requiring regular assessment whenever business conditions or technical requirements change significantly. The approach should be comprehensive, addressing not just costs but all six pillars (operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability), while also recognizing that different workloads require tailored optimization strategies rather than a one-size-fits-all approach.

Best practices for success

Consider the following best practices:

  • Upskill your team – Empower your employees with cloud, cost management, and optimization skills. As Dufour notes, “Every engineer or builder plays a role in cloud optimization.”
  • Establish a cloud center of excellence – Create a centralized body responsible for developing and distributing cloud best practices throughout your organization.
  • Align finance and business – Make cloud KPIs business-centric rather than purely technical, so cloud optimization efforts support overall business goals.
  • Embrace automation – Use tools to automate cloud provisioning, monitoring, and optimization, reducing human error and effort.
  • Use AI services and solutions for efficiency – Use AI technologies to automate visualization, enhance decision-making, and optimize resource utilization.

Real-world success stories

Our customers are already seeing significant benefits from strategic cloud optimization:

  • DreamCasino achieved 30% cost savings and a 50% reduction in API response times, enabling expansion into new markets
  • BMC Software reduced cloud costs by 25% while improving security and reliability, reinvesting savings into new business opportunities
  • Even within AWS, our use of Amazon Q for application modernization saved an estimated 4,500 years of development work and $260 million in performance benefits

Business impact

Effective cloud optimization delivers more than just cost savings. It enables the following:

  • Faster innovation through reinvestment of saved resources
  • Enhanced security and operational efficiency
  • Improved ability to scale and adapt to business needs
  • Better customer experiences and faster time-to-market
  • The capability to make informed architecture and design decisions by balancing trade-offs across AWS Well-Architected pillars

AWS resources for your optimization journey

To help you accelerate your cloud optimization efforts, AWS provides several tools and resources:

Getting started

To get started, consider the following steps:

Additionally, you can engage with the AWS Cloud Optimization Success (COS) team for more detailed guidance and to help identify what to do next in your cloud optimization journey. The COS team has Solutions Architects who specialize in the Cloud Adoption Framework and Well-Architected Framework and deliver workshops and training sessions though customer and partner engagements. The team can help drive adoption of AWS services through the use of the Well-Architected and Cloud Adoption Frameworks and support other services like AWS Trusted Advisor and AWS Health to optimize cost and cloud architectures. Whether you’re just starting or looking to enhance existing implementations, the AWS COS team provides the guidance, tools, and expertise you need to succeed.

Conclusion

At AWS, we’re dedicated to helping you optimize your cloud journey. By implementing these strategies and best practices, you can unlock the full potential of the cloud, driving innovation and growth while maintaining security and operational excellence.

Ready to take your cloud optimization to the next level? Refer to the resources included in this post and contact your AWS COS team to learn how we can help you maximize the value of your cloud investments.


About the Authors

Introducing new regional implementations of Landing Zone Accelerator on AWS to support digital sovereignty

Post Syndicated from Max Peterson original https://aws.amazon.com/blogs/security/introducing-new-regional-implementations-of-landing-zone-accelerator-on-aws-to-support-digital-sovereignty/

Customers often tell me that they want a simpler path to meet the compliance and industry regulatory mandates they have in their geographic regions. In our deep engagements with partners and customers, we have learned that one of the greatest challenges for customers is the translation of security and compliance requirements into distinct technical controls. At Amazon Web Services (AWS), security is our top priority, and we understand that protecting your data in a world with changing regulations, technology, and risks takes teamwork. As we’ve said, security is foundational to sovereignty.

AWS helps organizations to develop and evolve security, identity, and compliance into key business enablers; that’s why we’re committed to working with national cyber authorities and regulators to help define and establish how their compliance standards can be translated into security best practices in the cloud. We’re responding to customer requests to create locally tailored approaches aligned to their own regional standards and guidance as established by in-region authorities.

Architectural best practice, locally tailored

Since its launch in 2022, Landing Zone Accelerator on AWS has been instrumental in helping thousands of customers deploy cloud foundations that align with multiple global compliance frameworks and AWS best practices, including the Baseline Informatiebeveiliging Overheid (BIO) in the Netherlands, and the Esquema Nacional de Seguridad (ENS) in Spain. AWS is committed to expanding our regional implementations to help customers meet specific national and regional standards and digital sovereignty goals.

In March, I was proud to share the news of the cooperation agreement between the Federal Office for Information Security (BSI) and AWS, where AWS committed to help advance digital sovereignty and cybersecurity best practices and standards in Germany and across the European Union. With that in mind, I’m excited to share that our next regional implementation of Landing Zone Accelerator on AWS will support customers with workloads in Germany. The C5-ready Landing Zone Accelerator is designed to help customers meet their Cloud Computing Compliance Criteria Catalogue (C5) compliance objectives in the cloud. This will be available to our customers in Q3-2025, and at launch, our regional implementations will also be available in AWS European Sovereign Cloud.

The C5 attestation scheme is backed by the German government and was introduced by the BSI in 2016. AWS has adhered to the C5 requirements since their inception. C5 helps organizations demonstrate operational security against common cybersecurity threats when using cloud services through the German government’s Security Recommendations for Cloud Computing Providers.

For many customers in Germany, adherence to C5 is a requirement, and this is evidenced through a compliance assessment by an authorized assessor. Preparing for this assessment is critical for a successful outcome and is why AWS has partnered with AWS Global Security & Compliance (GSCA) Partner Schellman to provide the assessor insight as to how the C5-ready Landing Zone Accelerator can accelerate and simplify the path to C5 adoption for AWS customers.

AWS Partner Schellman: Proven Track Record in C5 Assessments

As one of the few firms with deep expertise and experience in C5 assessments, Schellman has completed several dozen evaluations across a wide range of clients—from agile startups to global enterprises. This diverse portfolio underscores Schellman’s capabilities, deep technical expertise, and unwavering commitment to security assurance.

“Our team has seen firsthand how the C5 standard fosters transparency and builds trust in cloud services. We’re proud to support our clients not just in understanding C5, but in strategically leveraging it to improve security and competitiveness on a global scale.”
Jeff Schiess, Managing Director, Schellman

Lowering the Barrier to Entry – Schellman recognizes that achieving C5 compliance can sometimes be intimidating, particularly for organizations new to the framework. To that end, Schellman has performed an assessment against the foundational infrastructure provided by LZA on AWS, designed to simplify the C5 journey. The LZA provides preconfigured infrastructure templates and security baselines that significantly reduce the complexity of establishing C5-compliant cloud environments.

“With the Landing Zone Accelerator, organizations can build on a C5-ready foundation right from the start. It’s a practical, scalable solution for companies that might otherwise find the C5 standard overwhelming.”
Kristen Wilbur, Principal, Schellman

Sovereign by design

Landing Zone Accelerator on AWS automatically implements hundreds of security capabilities that map to control requirements across geographic compliance frameworks. This saves customers hundreds of hours in planning and implementing secure networking and account configurations by providing them with a foundation based on the AWS Well-Architected Security Pillar and AWS security best practices. Meeting compliance requirements, having verifiable access controls and data transfer restrictions, independence and choice over the technology stack, and surviving large-scale disruptions are some of the key capabilities that customers require of a sovereign-by-design workload. However, for many customers, translating regulatory requirements into a set of discrete technical controls and applying them consistently across one or more AWS accounts and AWS Regions can be time-intensive and challenging.

We provide customers and partners with detailed guidance on how to configure Landing Zone Accelerator on AWS in accordance with their local security and compliance requirements, including digital sovereignty requirements. This includes control mapping to local regulations or policies that shows customers how controls implemented in a landing zone are mapped to the specific requirements, calling out where customers are required to do more to meet these as part of our shared responsibility model—this includes organizational policies and procedures where customers must implement additional controls within their application or workload to meet local requirements.

Control over the location of your data

Landing Zone Accelerator on AWS provides customers with a choice of configurable preventative, detective, and proactive controls to help customers meet their data residency, security, and compliance objectives, whether you’re a public sector customer wanting to keep data in a single Region or navigating the complex needs of multi-national organizations with operations subject to differing digital sovereignty requirements.

Verifiable control over data access

Landing Zone Accelerator on AWS goes beyond just provisioning a secure, multi-account environment. It establishes a well-structured, multi-account architecture using AWS Organizations. This logically isolates workloads, management functions, and security controls into dedicated organizational units (OUs). This not only enhances security and operational efficiency, but also helps customers to enforce consistent data residency, access management, and compliance policies across their entire cloud footprint. These powerful guardrails empower customers to quickly harness the innovative potential of cloud technologies, whilst delivering business value from an established security and compliance baseline.

By providing this automated approach, AWS empowers organizations to rapidly deploy cloud environments tailored to their specific local requirements in days instead of weeks; with robust security, compliance, and operational guardrails in place from the outset. Landing Zone Accelerator on AWS is designed to simplify the path to cloud adoption and compliance for organizations, particularly those in regulated industries or with sovereignty requirements. This approach marks a shift from the previous heavy lift required for organizations to migrate workloads to the cloud while meeting their needs.

Partners at the core

There is a lot of complexity involved with navigating the evolving digital sovereignty landscape—but you don’t have to do it alone. Our AWS Digital Sovereignty Competency connects customers with trusted partners with demonstrated expertise to advise and architect for their customers’ digital sovereignty needs while taking advantage of the full potential of the AWS Cloud. As part of the competency, AWS is supporting partners to navigate customer challenges across four pillars: data residency, data protection, access control, and survivability.

Customers have told me about how challenging it can be to architect to address their sovereignty needs, often requiring manual iteration and longer time to value. Using Landing Zone Accelerator on AWS is one of the ways AWS and AWS Partners can work together to address customers’ sovereignty needs with a repeatable approach that helps our customers and partners move faster. I’m excited by how regional implementations of Landing Zone Accelerator on AWS is helping AWS Sovereignty Partners, such as Atos and SVA, to move faster without compromise.

“Compliance with regulations like C5 is essential for customers in the public sector and regulated industries, who prioritize digital sovereignty, and this is central to our Cloud for Clinics initiative with AWS in the German Healthcare market. The availability of the C5 LZA significantly reduces the technical complexity, giving us a common technical platform to build on reducing time to market. Atos is driving the operational rollout and expanding the scope of compliance mappings to further streamline customer compliance. At the same time, we are incorporating essential managed services like SOC/SIEM which we believe will make compliant cloud adoption easier to drive innovation by the Public Sector, Healthcare institutions or customers in regulated industries like Financial Services and Utilities.”
Boris Hecker, Managing Director, ATOS Germany

“Compliance with BSI C5 criteria for customers from the public sector and regulated industries is a basic requirement for the use of public cloud services. Implementing the regulations is often complex, time-consuming and resource-intensive. For this reason, customers are looking for solutions that they can tailor to the specific requirements of their industry; while ensuring they meet compliance standards. SVA supports customers in maintaining the balance between innovation and compliance with customized, C5-certified, managed services. We rely on solutions such as the Landing Zone Accelerator on AWS to reconcile the use of market-leading public cloud infrastructure with regulatory requirements.”
Patrick Glawe, Hyperscaler Lead at SVA

For more information, see Landing Zone Accelerator on AWS and AWS Digital Sovereignty Competency Partners

Max Peterson

Max Peterson

Max is the Vice President of AWS Sovereign Cloud. He leads efforts to ensure that all AWS customers around the world have the most advanced set of sovereignty controls, privacy safeguards, and security features available in the cloud. Before his current role, Max served as the VP of AWS Worldwide Public Sector (WWPS) and created and led the WWPS International Sales division, with a focus on empowering government, education, healthcare, aerospace and satellite, and nonprofit organizations to drive rapid innovation while meeting evolving compliance, security, and policy requirements. Max has over 30 years of public sector experience and served in other technology leadership roles before joining Amazon. Max has earned both a Bachelor of Arts in Finance and Master of Business Administration in Management Information Systems from the University of Maryland.

Build and operate an effective architecture review board

Post Syndicated from Darrin Weber original https://aws.amazon.com/blogs/architecture/build-and-operate-an-effective-architecture-review-board/

The rapid change of pace in computing landscapes because of cloud, artificial intelligence, and technology innovation has challenged organizations to keep up while making sure that their initiatives and projects remain compliant with enterprise guidelines and policies. An effective architecture review board (ARB) can help an organization maintain compliance with enterprise guardrails while accelerating implementation of initiatives in their project pipeline.

In this post, we identify the components of an efficient architecture review process, define what an ARB is, and describe how to build and operate an effective enterprise ARB.

What is an architecture review board?

An ARB is a multi-disciplinary team responsible for reviewing solution architectures to help ensure compliance with enterprise guidelines, best practices, and supportability. Team members include stakeholders from different disciplines throughout your organization, which typically include Security, Development, Enterprise Architecture, Infrastructure, and Operations. Including a broad set of stakeholders reduces the amount of project recycle that happens when stakeholder representation is overlooked.

An ARB isn’t a standalone group, it operates within the context of your project implementation process, reviewing solution architectures, custom development, and purchased solutions to maintain enterprise compliance and alignment with goals. As shown in the following diagram, architecture review typically occurs after the design phase—before a build or purchase decision—and again before deployment to validate that the reviewed architecture matches the solution that was built.

Project implementation process with architecture review checkpoints

Most organizations recognize the benefits and value of establishing an ARB. However, they often struggle to define and operate one in a manner that maximizes the benefits, integrates with overall project execution processes, and satisfies the needs of all the stakeholders. An efficient architecture review process imparts organizational benefits such as reduced costs, minimized security events, and diminished technical debt.

Life without a formal architecture review process

One of the most pronounced issues with implementing and maintaining software architecture is the difficulty in achieving human consensus. In any organization, you’ll find a diverse range of team members—each with their own priorities, perspectives, and pain points. Without a formalized review process, these differences can lead to prolonged debates and stalled projects. We often find that many members tend to fall into one of these personas:

The Not Invented Here The Not Invented Here – This individual doesn’t trust any software unless it was built and operated by members of their company. They’re generally wary of any cloud solution and will expend development time to avoid capital expenditure.
The Wait a Minute The Wait a Minute – This individual has good feedback and their input is welcome, but they tend to wait until the last minute before providing any feedback, making it difficult to have productive conversations and act on any constructive criticism.
The Bottleneck The Bottleneck – This individual craves control and insists that all reviews, decisions, and conversations go through them. This makes scaling the architecture review process very challenging and decisions will often come down to the whim of this one person.
The Creative The Creative – This individual has passion for software and for creating things, but will often choose complexity over simplicity and turn their architectures into art projects.
The Perfectionist The Perfectionist – This individual tends to let the perfect be the enemy of the good. While their intentions are pure, this approach can result in delayed decision making and debates on topics that might not be worth the time of the board.
The Historian The Historian – This individual has been at the company for a long time and remembers every success and failure along the way. While the context this individual brings to the table is invaluable, teams must guard against only looking to the past as they try to shape the future.

Benefits of an architecture review board

Establishing an ARB within your organization can yield substantial benefits, enhancing both the quality and efficiency of your architecture. Some key advantages are:

Improved compliance

By systematically reviewing architectural decisions, the ARB helps ensure that designs adhere to company best practices, open standards, and regulatory requirements as set forth by your enterprise architecture.

Reduced technical debt

Technical debt—taking shortcuts in the development process that lead to future complications—is a common issue in software development. The ARB helps identify and mitigate technical debt early in the design phase. By enforcing architectural standards and promoting best practices, the board helps ensure that decisions are made with long-term sustainability in mind. This approach results in more robust, maintainable codebases and reduces the likelihood of future rework.

Efficiency with lowered costs

While a formal architecture review sounds like it might have the potential for increased red tape and lowered efficiency, the ARB instead contributes to operational efficiency by standardizing architectural practices across the organization. This uniformity allows for better resource allocation, faster deployment cycles, and more predictable project timelines. By catching potential issues early in the design phase, the ARB helps avoid costly rewrites and rework, which can lead to significant cost savings over time.

Supportability

Designing for supportability is crucial for the long-term success of any application. The ARB makes sure that architectures are built with maintainability in mind, making it easier for operations teams to manage and troubleshoot systems. This focus on supportability leads to fewer downtime incidents, faster resolution times, and overall higher system reliability. By making sure the composition of the ARB crosses all parts of the organization, supportability concerns can be surfaced earlier and help ensure that changes are properly socialized.

Security

Above all, security is the most critical output of an effective ARB. The ARB plays a pivotal role in embedding security into the architectural fabric from the outset. By conducting thorough security reviews and incorporating security best practices into every design, the ARB makes sure that applications are resilient against unintended disclosure, inadvertent access, and threat actors. This proactive approach not only protects sensitive data, but also builds trust with your customers and stakeholders.

Steps for effective architecture review boards

Whether looking to establish a new architecture review process or improve the effectiveness of a current ARB, we’ve identified eight key steps to make sure that an ARB operates in a way which realizes the benefits of a robust architecture review process while maintaining enterprise compliance. With the exception of leadership support, the steps aren’t presented in a particular order and can be implemented in parallel or in whatever order fits your organization and resource availability.

Leadership support

Identifying a sponsor on the executive leadership team is crucial to the success of the ARB. An executive sponsor fosters participation from stakeholders, representing key organizations such as Security, Development, and Operations, along with gaining their commitment to the review processes. The executive sponsor helps embed the ARB function within the enterprise’s project implementation process. Supported by the executive sponsor, the ARB’s reviews serve as a formal gate within the project process, reducing attempts to bypass the review processes.

Single source for guidance, policies, and best practices

Establish a single, well-known repository or index so that the entire enterprise has a single source of truth that establishes the basis for designing and reviewing architecture. A common repository doesn’t need to be complex. It can be a central document location, wiki, or file share that’s quickly discoverable. Commonly, an enterprise’s collection of guidelines and policies are dispersed and managed by each organization using different mechanisms and repositories. Best practices are often treated as folklore passed between team members. Project teams and ARB stakeholders need to share a common understanding of the enterprise’s collective intelligence consisting of guidelines, policies, and best practices.

As the project community’s collective understanding of the enterprise guidelines and policies grows, initial solution designs are better aligned, and reviews through the ARB accelerate. After a common repository is established, consider using generative AI to create a natural language chatbot, a design chatbot, to simplify access to the collective guidelines, policies, and best practices. See Amazon Bedrock or Amazon Q – Generative AI Assistant.

Defined stakeholders

Make sure that your disciplines have defined stakeholders on the ARB. A good starting point is to identify stakeholders from the Security, Enterprise Architecture, Development, Infrastructure, and Operations teams. Broad representation on the board minimizes recycles and delays later in the project, which can occur when stakeholders aren’t engaged in the review process from the beginning. A stakeholder’s responsibility is to focus on their area of subject matter expertise and commit a portion of their time to the ARB. Consider rotating stakeholders periodically to distribute knowledge and workload through the organization.

Gated process with documented decisions

As previously described, architecture reviews typically occur after design and before solution implementation or purchase. Optionally, another architecture review takes place before deployment to validate that the solution matches what was reviewed and approved. It’s important to complete the review before implementation or the purchase decision and to get stakeholder sign off. Otherwise, projects risk rework and delay later in the process, often impacting cost or schedule to a greater degree. Document each ARB action, including approvals, reasons for recycles, exceptions required, follow-ups needed, and so on. Documented decisions should be added to the project’s overall lifecycle documentation to benefit future inspection of project or similar solution architectures.

Establish an exception process

There will always be exceptions to your enterprise guidelines or policies. Plan for exceptions with a defined process for reviewing, escalating, and gaining approval. Include leadership from both IT and business areas in the assessment and sign-off on an exception. Most importantly, set expiration dates on the exceptions–they should not be granted indefinitely. Exceptions are typically granted to accommodate a temporary nonconformance to provide time to plan for and implement a better, long-term solution.

Architecture central repository

Establish a well known, central repository for solution architecture documents. Solution documentation should be treated as living artifacts that are maintained for the lifecycle of the use case. A central architecture repository benefits teams responsible for operating and maintaining solutions, along with design teams chartered with new solution design. After a repository is established, consider including your architecture documentation in the generative AI design chatbot mentioned previously.

Automate review process

Employ automated architecture review processes wherever possible. Automated review processes allow stakeholders to focus their time on their subject matter expertise instead of administrative tasks. Consider separate review processes based on an initiative’s complexity, cost, and impact. Schedule live meetings with the ARB for the most complex and impactful solutions, and use offline mechanisms, such as email, for other efforts. Define a universal architecture template to capture areas of interest for review and automate the Q&A and sign-off processes. Consider using generative AI to do initial automated design reviews against enterprise core best practices and policies to further streamline stakeholder review processes.

Architecture review process shepherd

Identify a shepherd to help ensure that solution architectures are reviewed and the ARB review processes are broadly understood. The shepherd functions as a liaison with executive sponsors for exceptions. While the shepherd can also be a stakeholder on the board, the shepherd is not the single overall decision maker. The shepherd champions the continuous improvement of the architecture review process and mechanisms.

Conclusion

In this post, we explored the benefits of establishing an architecture review board within an organization, emphasizing its role in maintaining compliance, reducing technical debt, and enhancing operational efficiency. We discussed the challenges organizations face in setting up an effective ARB and provided guidance on the essential components and steps required to build and operate a successful ARB. By following the outlined steps, organizations can maximize the benefits of an ARB, making sure that architectural decisions align with enterprise goals and standards while fostering a culture of continuous improvement and stakeholder collaboration.

For additional guidance on garnering the leadership support necessary for an effective ARB, see Well-Architected Framework: Provide executive sponsorship. For more details on the review process, see Well-Architected Framework: The review process and AWS Well-Architected Tool, an AWS Management Console-based service that provides a consistent process for measuring your architecture using AWS best practices. If you’re interested in establishing a natural language chatbot interface for your enterprise architecture information, see Amazon Bedrock, Amazon Q Business, or Build a contextual chatbot application using Amazon Bedrock Knowledge Bases.


About the authors

Announcing the Well-Architected Data Residency with Hybrid Cloud Services Lens

Post Syndicated from Enrico Liguori original https://aws.amazon.com/blogs/architecture/announcing-the-well-architected-data-residency-with-hybrid-cloud-services-lens/

As organizations increasingly take advantage of the benefits of hybrid cloud architectures, they are often faced with the challenge of balancing data residency requirements with the flexibility of the cloud. To help you navigate this complex landscape, we are excited to introduce the AWS Well-Architected Data Residency with Hybrid Cloud Services (DRHC) Lens paper.

The AWS Well-Architected Framework provides a consistent approach for evaluating architectures and applying best practices to build reliable, secure, efficient, and cost-effective systems in the cloud. The framework is based on six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability.

What’s in the Well-Architected Data Residency with Hybrid Cloud Services Lens?

The DRHC Lens identifies a set of key design principles to help you in the development and deployment of Well-Architected hybrid cloud workloads:

  • Classify data and workloads – Determine which data sources and workloads must remain on premises or in a specific geography to comply with data residency requirements, and which can be moved to an AWS Region.
  • Establish operational practices for data sovereignty – Develop and implement distinct processes and procedures for data and workloads that are subject to data residency or data sovereignty regulations.
  • Use in-Region services whenever possible – Data residency and data sovereignty regulations often apply only to specific elements of an application, or to specific data. For those elements that aren’t in scope for these regulations, use the broad set of services available in Regions, such as identity, automation, and monitoring services.
  • Automate infrastructure – Build orchestration and automation frameworks that adapt to data residency or data sovereignty classifications to deploy compliant, repeatable, and verifiable application stacks. Automation is critical for eliminating manual processes that might introduce mistakes that can lead to compliance failures.
  • Implement robust security measures – The protection and integrity of data is critical to complying with corporate, data-residency, and data-sovereignty regulations. Use all of the tools and services available across AWS Regions, AWS Local Zones, and on AWS Outposts, including specific preventative controls to meet data residency regualations, when developing and deploying your applications.

In addition to these design principles, the lens provides detailed guidance across the six pillars of the Well-Architected Framework:

  • Operational Excellence – Achieve effective system operations, gain operational insights, and implement continuous process improvement for hybrid cloud workloads.
  • Security – Establish control objectives, implement access controls and governance, and configure detection mechanisms to protect data across on-premises and cloud environments.
  • Reliability – Deploy multiple Outposts or Local Zones for high availability, plan for disaster recovery, and maintain operations during on-premises maintenance.
  • Performance Efficiency – Carefully select services, Regions, and configurations that align with latency, bandwidth, and data residency requirements.
  • Cost Optimization – Implement a tagging strategy, monitor and manage Outposts capacity, and optimize network configurations to reduce costs.
  • Sustainability – Anchor Outposts and Local Zones to sustainable Regions, scale infrastructure to match demand, and optimize storage to reduce energy consumption.

Who should use the DRHC Lens?

All AWS customers with data residency requirements, including government agencies, healthcare providers, financial institutions, and public sector industries who are looking for guidance on these architectural patterns, can use the DRHC Lens. When designing and maintaining your architecture with data residency in mind, the following roles can benefit from this lens:

  • Business and technology leaders — For strategic decision-making around data sovereignty requirements
  • Chief technology officers — For aligning technology choices with data residency compliance needs
  • Solutions architects and data residency specialists — For designing compliant hybrid architectures following Well-Architected principles
  • Security and compliance teams — For making sure data handling meets regulatory requirements across Regions
  • Operations teams — For maintaining daily compliance with data residency controls
  • Data engineers and edge computing specialists — For implementing technical controls that enforce data locality requirements

Conclusion

The new Well-Architected Data Residency with Hybrid Cloud Services Lens is available now. Use the lens whitepaper to adopt your hybrid cloud workloads according to the tenants of the Well-Architected Framework while maintaining data sovereignty requirements.

Applying the Data Residency with Hybrid Cloud Services Lens to your architecture helps validate your data handling practices and provides actionable recommendations to address gaps. AWS will continue to update the whitepaper as new services and features are released. Using the lens can help you consistently apply the latest patterns to meet your requirements. This allows you to deploy applications to meet your business objectives and requirements.


About the Authors

Realizing twelve-factors with the AWS Well-Architected Framework

Post Syndicated from Michael Phorn original https://aws.amazon.com/blogs/architecture/realizing-twelve-factors-with-the-aws-well-architected-framework/

Organizations that are interested in improving their development velocity that follow the principles of the twelve-factor app might find benefits in understanding how to realize those concepts on Amazon Web Services (AWS). In this post, I will help you correlate the twelve-factors app concepts as you architect solutions on AWS.

Twelve-factors

Let’s start with a quick recap of twelve-factors. The Twelve-Factor App was published in 2011 by Adam Wiggins as a collaboration between developers at Heroku. He published it at a time when developers were shifting from a paradigm of writing software-as-a-service (SaaS) applications in their own cloud environments to having the applications hosted on a cloud provider, such as AWS. Their intent was to provide “a broad set of conceptual solutions” for building applications that were portable and resilient. The principles centered around reducing the software lifecycle burden, including application introduction, maintenance and operations, through sunsetting. These principles were captured in the following 12 factors:

  1. Codebase
  2. Dependencies
  3. Config
  4. Backing services
  5. Build, release, run
  6. Processes
  7. Port binding
  8. Concurrency
  9. Disposability
  10. Development and production environment parity
  11. Logs
  12. Admin process

These principles are portable and resilient application best practices.

At AWS, we have the Well-Architected Framework to capture cloud and architecture best practices, which contains similar practices to twelve-factors. The Framework comes from years of AWS Solutions Architect collective experience of building solutions across business verticals and use cases. The results are architectures that support secure, high-performing, resilient, and cost-effective systems in the cloud. If you’re responsible for the underlying infrastructure or the application, the Framework helps you, the CTO, the architect, the developer, or operations team member, understand the benefits and trade-offs of decisions that have to be made.

A brief history of the AWS Well-Architected Framework

AWS published the first version of the Framework in 2012, and we released the AWS Well-Architected Framework whitepaper in 2015. Following the initial introduction, we added the Operational Excellence pillar in 2016 and released pillar-specific whitepapers and AWS Well-Architected Lenses in 2017. The following year, the AWS Well-Architected Tool was launched.

While twelve-factors focuses on application characteristics, the AWS Well-Architected Framework provides architectural guidance. When your architecture undergoes a Well-Architected review, you can meet the guidance for a twelve-factors application more easily. With some factors, the Framework helps the application developer delegate some responsibility from the application to the infrastructure. Both frameworks aim to help you deliver applications and services that are robust, scalable, and cloud centered. The AWS Well-Architected Framework helps you reinforce these mechanisms.

The six pillars of the AWS Well-Architected Framework

Let’s explore the six pillars of the AWS Well-Architected Framework, what each aims to achieve, and where the twelve-factors concepts intersect with AWS guidance.

The following figures shows the twelve factors and how they map to processes in AWS, which are described in this section.

1. Operational excellence

The operational excellence pillar helps you review your organization’s ability to support development and run workloads efficiently. You can use the topics in this pillar to evaluate how you operate your solutions. The pillar guides you through inspection of organizational structure, inspection of your mechanisms, and identification of obstacles and roadblocks that might slow your ability to innovate. The results include a feedback loop of continuous improvement for operating the infrastructure and solutions.

The factors you capture through operational excellence are codebase (I) and development and production environment parity (X). Codebase prescribes that there is exactly one codebase used to deploy everywhere, which echoes the purpose of reducing the operational burden of maintaining your software. The argument for a single code base is for consistency, traceability, and efficiency across a unified development lifecycle. The second factor is development and production environment parity, which encourages developers to create smaller but more frequent deployments. It also encourages developers to maintain parity not just of the core software, but also the backing services between environments. Parity of environments is conducive to smoother development and deployment processes. Additionally, this parity helps developers catch issues in a non-production environment more consistently.

AWS services that can help you achieve operational excellence are AWS CodeConnections, AWS CodePipeline, AWS CloudFormation, AWS Systems Manager, Amazon CloudWatch, AWS Config, AWS CloudTrail, Amazon EventBridge, AWS X-Ray, AWS Organizations, AWS Control Tower, AWS Trusted Advisor, AWS Service Catalog, AWS Proton, Amazon CodeGuru (Preview), AWS Lambda, Amazon Simple Queue Service (Amazon SQS), Amazon Simple Notification Service (Amazon SNS), and AWS StepFunctions

2. Security

The security pillar describes how to use AWS Cloud technologies to protect data, systems, and assets that improve your security posture. At AWS, we advocate the shared responsibility model, which applies to the security pillar. AWS is responsible for providing a secure environment for managing and operating your systems and solutions, but it is your responsibility to implement those best practices in the context of your requirements. The security pillar describes best practices such as reviewing how you manage identities for people and machines, which helps you store secrets securely.

The config (III) factor can be mapped to the security pillar, which advises you to store variables and items that depend upon the environment as environment variables. This allows you to move between deployments without having to update your code. Configuration settings such as database connection strings, API keys, credentials, and other sensitive information should be separated from the application code. At AWS, we provide services that can be used to securely meet this requirement, including AWS Secrets ManagerAWS Systems Manager Parameter Store, AWS Certificate Manager, and AWS Key Management Service (KMS).

AWS services that can help you achieve security are AWS Identity and Access Management (IAM), Amazon GuardDuty, AWS Shield, AWS Web Application Firewall (WAF), Amazon Inspector, AWS CloudHSM, Amazon Macie, AWS Security Hub, AWS Config, AWS CloudTrail, Amazon VPC (Virtual Private Cloud), AWS Direct Connect, Amazon Cognito, AWS Firewall Manager, AWS Network Firewall, and AWS IAM Access Analyzer.

3. Reliability

The reliability pillar encompasses the ability of a workload to perform its intended function correctly and consistently when it’s expected to. Reliability means that your architecture and systems:

  • Appropriately scale resources to meet demands
  • Mitigate disruptions caused by misconfiguration or transient network issues
  • Recover when disruptions do occur

Automation of scaling and recovery are best practices within the reliability pillar.

Because twelve-factors helps developers deliver a reliable application, multiple factors are categorized under the reliability pillar of the AWS Well-Architected Framework. Backing services (IV) explains that you should have flexibility for integrating services. This way, when your system experiences issues with availability, the application can replace the troubled service without code changes. You should choose the right resource that provides scalability while optimizing costs. Dependencies (III) describes that applications declare and isolate dependencies to become modular and self-contained. This speeds up recovery by simplifying the setup for handlers of the application code. Applications that adhere to the processes (VI) factor run as a collection of stateless processes to support scaling. This is equivalent to creating microservices that can scale up or down depending upon the workload or bring in additional instances when one fails. Disposability (IX) suggests that an application’s processes can be started and stopped rapidly, which makes the application resilient to failures and capable of being adapted to elastically scale.

AWS services that can help you achieve reliability are Amazon EC2 Auto Scaling, Elastic Load Balancing (ELB), Amazon RDS Multi-AZ, Amazon Simple Storage Service (Amazon S3), AWS CloudFormation, Amazon Route53, AWS Shield, AWS Backup, Amazon CloudWatch, AWS Systems Manager, AWS Global Accelerator, Amazon Aurora, AWS Lambda, Amazon DynamoDB, and AWS Transit Gateway.

4. Performance efficiency

The principles under the performance efficiency pillar focus on using computing resources to build architectures on AWS that efficiently deliver sustained performance as demand changes and technologies evolve. Topics in this pillar include simplifying the consumption of technologies that align with your goals, the ability to go global in minutes, and reducing the time and effort needed to deliver a service.

The concurrency (VIII) factor prioritizes management of processes, which should be stateless and allow for horizontal scaling, promoting performance efficiency. The backing services (IV) factor also falls under this category because it dictates flexibility in integration. This flexibility enables the application to maximize performance by using the right resource that meets scalability and performance requirements.

AWS services that can help you achieve performance efficiency are Amazon Elastic Compute Cloud (Amazon EC2), Amazon EC2 AutoScaling, Amazon Elastic Block Store (Amazon EBS), Amazon S3, Amazon Aurora, Amazon DynamoDB, Amazon ElastiCache, Amazon CloudFront,Application Auto ScalingElastic Load Balancing (ELB), AWS Lambda, Amazon API GatewayAWS Step Functions, Amazon SQS, Amazon Kinesis, AWS Global Accelerator, Amazon Aurora, AWS X-Ray, Amazon CloudWatch, and AWS Compute Optimizer.

5. Cost optimization

The cost optimization pillar provides guidance for the architecture’s ability to operate systems and deliver the business value at the lowest price point. The cost optimization reviews help you avoid unnecessary costs, analyze and attribute expenditure, and use appropriate pricing models.

The relationship of the build, release, run (V) factor advocates for the process separation and strict discipline around efficient handling of application deployments. This aligns to the cost optimization pillar because cost effective operations are typically a result of well-designed processes and mechanisms. AWS services that can support the build, release, run factor are, AWS CodeBuild and AWS CodeDeploy.

Other AWS services that can help you with cost optimization are AWS Cost Explorer, AWS Budgets, AWS Data Exports, AWS Trusted Advisor, AWS Compute Optimizer, EC2 Spot Instances, AWS Savings Plans, Amazon S3 Intelligent-Tiering, AWS Lambda, Amazon Aurora, Application Auto Scaling, AWS Organizations, AWS Resource Groups, Tag Editor, AWS Marketplace, AWS License Manager, AWS Glue, and Amazon Athena.

6. Sustainability

The sustainability pillar focuses on minimizing the environmental impact of running workloads in the cloud. Topics in this include reviewing the lifecycle of your data and retention policies as a methodology to use only what is needed.

The disposability (IX) factor is aligned to sustainability because it highlights an application’s ability to rapidly start and shut down at a moment’s notice. This provides agility and optimized use of resources during the life of the application.

AWS services that can help you achieve sustainability are AWS Customer Carbon Footprint ToolAmazon EC2 AutoScalingAWS Lambda, Amazon EC2 Spot Instances, Amazon EBS gp3 volumes, Amazon S3 Intelligent-Tiering, Amazon S3 Lifecycle configurations, AWS Graviton processors, Amazon Aurora Serverless, Amazon RDS Multi-AZ deployments, AWS Compute Optimizer, AWS Well Architected Tool, and Amazon CloudWatch.

Remaining factors

Port binding, logs, and admin processes aren’t specifically categorized into the pillars of the AWS Well-Architected Framework. However, these factors can be addressed as an essential part of the services that AWS delivers.

The seventh factor: Port binding

The port binding factor says that an application should be bound to a specific port when it’s hosted as a web application with the intention of making the application completely self-contained. In the context of AWS, we offer you different ways to achieve this principle, which are dependent upon the way your application is deployed on AWS. When implementing port binding on AWS, we offer features such as security, service discovery, and dynamic port mapping to simplify and secure your applications through services like Amazon EC2, Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), AWS Elastic Beanstalk, AWS App Runner.

The eleventh factor: Logs

The logs factor dictates that an application should treat its running process as an event stream out to files that are managed completely by the execution environment. AWS offers many types of logging to capture different aspects of your application and the supporting infrastructure. CloudWatch is a centralized logging management service that monitors, stores, and provides access to log files from AWS services. For more detail, see AWS services for logging and monitoring.

The twelfth factor: Admin processes

The admin processes factor advises application developers to perform administrative tasks in an isolated manner to minimize the impact on the main application. At AWS, this factor is realized as a separation of the control plane and the data plane. The control plane is responsible for managing, configuring, and controlling the network or system infrastructure, while the data plane is responsible for the handling the actual user data or traffic. This separation is an inherent part of AWS services. We believe this separation allows AWS to deliver services that are scalable, highly available, secure, and efficient.

Applying the AWS Well-Architected Framework

The Framework shouldn’t be treated as a checklist that you review after development is complete. Instead, a review should be explored during the design phase to help you learn and apply architectural best practices. By the end of development, architects should have built a solution that facilitates faster, lower-risk service building and deployment. The Framework is not a static document, and as AWS evolves, architects continue to learn from working with customers and refine the definition of well-architected.

Conclusion

If you are familiar with twelve-factors or want to develop a twelve-factors app on AWS, read more about the AWS Well-Architected Framework. Consider starting a review project on your own to explore the detailed questions underneath each category or if you have specific workload that you’re already working on. You can use one of the many AWS Well-Architected Tool lenses to focus on applying these best practices to the services that you’re using. To get started on a lens review, see AWS Well-Architected Tool, which is accessible at no charge through the AWS Management Console.


About the author

Top Architecture Blog Posts of 2024

Post Syndicated from Andrea Courtright original https://aws.amazon.com/blogs/architecture/top-architecture-blog-posts-of-2024/

Well, it’s been another historic year! We’ve watched in awe as the use of real-world generative AI has changed the tech landscape, and while we at the Architecture Blog happily participated, we also made every effort to stay true to our channel’s original scope, and your readership this last year has proven that decision was the right one.

AI/ML carries itself in the top posts this year, but we’re also happy to see that foundational topics like resiliency and cost optimization are still of great interest to our audience.

(By the way, if you were hoping for more AI/ML content, head on over to our sister channel, the AWS Machine Learning Blog!).

Without further ado, here are our top posts from 2024!

#10 Deploy Stable Diffusion ComfyUI on AWS elastically and efficiently

This post helps you get started using ComfyUI, and was so successful that we followed it up later in the year with How to build custom nodes workflow with ComfyUI on EKS!

Architecture for deploying stable diffusion on ComfyUI

Figure 1. Architecture for deploying stable diffusion on ComfyUI

#9 Let’s Architect! Designing Well-Architected systems

In keeping with Let’s Architect! series, we have our first of three favorites for the year. This set of resources helps you apply Well-Architected standards in practice.

Let's Architect

Figure 2. Let’s Architect

#8 Let’s Architect! Learn About Machine Learning on AWS

As I said, Let’s Architect! has a winning series, and they’ve got a finger on the pulse of the tech world. This post about machine learning showcases some of the most exciting things happening at AWS.

Let's Architect

Figure 3. Let’s Architect

If you’re more interested in generative AI, you can also take a look at another post from 2024: Let’s Architect! GenAI

#7 Creating an organizational multi-Region failover strategy

Preparedness is another common theme in this year’s favorites. Michael, John, and Saurabh are well-versed in multi-Region architecture, and they’re here to share some strategies to contain failure impact.

When the application experiences an impairment using S3 resources in the primary Region, it fails over to use an S3 bucket in the secondary Region.

Figure 4. When the application experiences an impairment using S3 resources in the primary Region, it fails over to use an S3 bucket in the secondary Region.

#6 Building a three-tier architecture on a budget

Let’s talk cost optimization. This post about a three-tier architecture that relies on the AWS Free Tier is a must-read for anyone looking for tips to help them avoid unnecessary costs (and that’s everyone).

Example of a three-tier architecture on AWS

Figure 5. Example of a three-tier architecture on AWS

#5 Announcing updates to the AWS Well-Architected Framework guidance

As usual, Haleh & team are pros at making sure the Well-Architected Framework is current and relevant. Take a look at the enhanced and expanded guidance in all six pillars.

Well-Architected logo

Figure 6. Well-Architected logo

#4 Let’s Architect! Serverless developer experience in AWS

One more winning post from Luca, Federica, Vittorio, and Zamira! This collection of developer resources includes new ideas in AWS Lambda, Amazon Q Developer, and Amazon DynamoDB.

Let's Architect

Figure 7. Let’s Architect

#3 London Stock Exchange Group uses chaos engineering on AWS to improve resilience

This post from April 1 was not an April Fool’s joke! See how LSEG designed failure scenarios to test their resilience and observability.

Chaos engineering pattern for hybrid architecture (3-tier application)

Figure 8. Chaos engineering pattern for hybrid architecture (3-tier application)

#2 Achieving Frugal Architecture using the AWS Well-Architected Framework Guidance

Frugality AND Well-Architected? What a winning combo! This post, inspired by the 2023 re:Invent keynote, outlines the seven laws of Frugal Architecture.

Well-Architected logo

Figure 9. Well-Architected logo

#1 How an insurance company implements disaster recovery of 3-tier applications

And finally, our number one post of the year! Amit and Luiz showcase a customer solution with real-world applications that builds on the guidelines of other posts in this list! Well done!

The Pilot Light scenario for a 3-tier application that has application servers and a database deployed in two Regions

Figure 10. The Pilot Light scenario for a 3-tier application that has application servers and a database deployed in two Regions

Thank you!

As always, thanks to our contributors for their dedication and desire to share, and to you, our readers! We would be nothing with you. Literally.

For other top post lists, see our Top 10 and Top 5 posts from previous years.

Announcing updates to the AWS Well-Architected Framework guidance

Post Syndicated from Haleh Najafzadeh original https://aws.amazon.com/blogs/architecture/announcing-updates-to-the-aws-well-architected-framework-guidance-3/

We are excited to announce the availability of enhanced and expanded guidance for the AWS Well-Architected Framework with the following six pillars: Operational ExcellenceSecurityReliabilityPerformance EfficiencyCost Optimization, and Sustainability

This release includes new best practices and improved prescriptive implementation guidance for the existing best practices. This includes enhanced recommendations and steps on reusable architecture patterns focused on specific business outcomes.

A brief history

The Well-Architected Framework is a collection of best practices that allow customers to evaluate and improve the design, implementation, and operations of their workloads in the cloud.

2024 AWS Well-Architected guidance timeline

Figure 1. 2024 AWS Well-Architected guidance timeline

In 2012, we published the first version of the Framework. In 2015, we released the AWS Well-Architected Framework whitepaper. We added the Operational Excellence pillar in 2016. We released the pillar-specific whitepapers and AWS Well-Architected Lenses in 2017. The following year, the AWS Well-Architected Tool was launched.

In 2020, we released the new version of the Well-Architected Framework guidance, more lenses, and an API integration with the AWS Well-Architected Tool. We added the sixth pillar, Sustainability in 2021. In 2022, dedicated pages were introduced for each consolidated best practices across all six pillars, with several best practices updated with improved prescriptive guidance. By December 2023, we improved more than 75% of the Framework’s best practices. As of November 2024, we’ve refreshed 100% of the Framework’s best practices at least once since October 2022.

What’s new

The Well-Architected Framework supports customers as they mature in their cloud journey by providing guidance to help achieve more operable, secure, sustainable, scalable, and resilient environment and workload solutions.

The content updates and prescriptive guidance improvements in this release provide more complete coverage across AWS, helping customers make informed decisions when developing implementation plans. We added or expanded on guidance for the following services in this update: Amazon API Gateway, Amazon CloudFront, Amazon CloudWatch, Amazon CodeGuru, Amazon Cognito, Amazon GuardDuty, Amazon Inspector, Amazon Macie, Amazon Q Business, Amazon Q Developers, Amazon Redshift, Amazon S3, AWS Certificate Manager, AWS CloudFormation, AWS CloudTrail, AWS CodeBuild, AWS CodeDeploy, AWS CodePipeline, AWS Config, AWS Control Tower, AWS Customer Carbon Footprint Tool, AWS Glue, AWS Health, AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), AWS OpenSearch, AWS Organizations, AWS Resource Access Manager, AWS Secrets Manager, AWS Security Hub, AWS Step Functions, AWS Systems Manager, AWS Trusted Advisor, AWS Verified Access, and AWS WAF.

Pillar updates

Operational Excellence

In the Operational Excellence Pillar, we updated five best practices across four questions. This includes OPS02, OPS05, OPS09, and OPS10. The updates in this release include improved prescriptive guidance on multiple AWS services. OPS02-BP02 updates leverage Amazon Q Business for improving workforce collaboration and productivity. OPS05-BP08 updates demonstrate AWS Organizations and AWS Control Tower capabilities that enable updates to a multi-environment setup while meeting governance and policy requirements. OPS09-BP01 and OPS09-BP02 have updated guidance and resources for developing operational key performance indicators (KPIs). OPS10-BP02 has been updated with information on AWS Health, including its planned lifecycle events feature, for integrating into an incident management workflow.

Security

In the Security Pillar, we updated 43 best practices across nine questions. This includes SEC02, SEC03, SEC04, SEC06, SEC07, SEC08, SEC09, SEC10, and SEC11. All best practices in SEC03 (Permissions management) were revised, with updates to guidance on Attribute Based Access Control (ABAC), AWS IAM Access Analyzer, and emergency access processes. SEC02 (Identity management) also saw updates to all six of its best practices, including refinements to guidance on identity federation and secrets management. SEC07 through SEC11 received updates to guidance on data protection, incident response, and application security. Key updates include replacing the security information and event management SIEM solution on AWS OpenSearch recommendation with AWS CloudTrail Lake in SEC04 (Detection), expanded guidance on AWS S3 Object Lock and AWS S3 Glacier Vault Lock in SEC08 (Protecting data at rest), and the addition of recommendations for Mutual Transport Layer Security (mTLS) and private certificates in SEC09 (Protecting data in transit). Overall, these changes reflect AWS’s commitment to providing up-to-date, comprehensive security guidance in line with evolving best practices and new service capabilities.

Reliability

In the Reliability Pillar, we updated 14 best practices across nine questions. This includes REL01, REL02, REL04, REL06, REL07, REL08, REL10, REL12, and REL13. We expanded and clarified our guidance throughout the Pillar and added detailed implementation steps to each best practice that did not previously have them. We refreshed our multi-location deployment guidance by merging REL10-BP02 into REL10-BP01, while improving the prescriptive guidance of this best practice with a new title of Deploy the workload to multiple locations. We updated our idempotent operations guidance in REL04-BP04 to provide detailed technical guidance for builders who wish to provide idempotent APIs and updated the title to Make mutating operations idempotent. We merged functional testing guidance by migrating the content previously published under REL12-BP03 to REL08-BP02 (Integrate functional testing as part of your deployment) and expanded our guidance on testing in CI/CD pipelines. We refreshed REL07-BP01 to emphasize infrastructure as code (IaC) as a cornerstone of automated resource management and scaling. We improved our guidance in REL06-BP02 on how to use system and application logs to improve workload observability. We also refreshed our links to relevant resources including documents, videos, and presentations.

Performance Efficiency

In the Performance Efficiency Pillar, we updated the resources of PERF03-BP04 with the latest services.

Sustainability

In the Sustainability Pillar, we updated 10 best practices across six questions. This includes SUS01, SUS03, SUS04, SUS05, and SUS06. Best practices SUS01-BP01, SUS03-BP02, SUS03-BP05, SUS04-BP03, SUS04-BP05, SUS04-BP06, SUS04-BP07, SUS04-BP08, SUS05-BP04, and SUS06-BP02 now offer improved prescriptive guidance. Additionally, we added a new best practice, SUS06-BP01 Communicate and cascade your sustainability goals, which highlights the critical role of the central IT team in cascading sustainability goals and objectives across the broader organization. By strategically leveraging the cloud, implementing resource-efficient practices, and employing sustainability-focused tools and analytics, IT teams can play a pivotal role in driving meaningful reductions in the organization’s environmental impact.

Conclusion

This release includes updates and improvements to the Framework guidance totaling 78 best practices. As of this release, we’ve updated 100% of the existing Framework best practices at least once since October 2022. With this release, we have refreshed 100% of all the pillars of the Framework including the Reliability Pillar, with 14 of its best practice updated for the first time since major Framework improvements started in 2022.

Updates in this release will be incorporated into the AWS Well-Architected Tool in future releases, which you can use to review your workloads, address important design considerations, and help you follow the AWS Well-Architected Framework guidance.

The content will be available in 11 languages: English, Spanish, French, German, Italian, Japanese, Korean, Indonesian, Brazilian Portuguese, Simplified Chinese, and Traditional Chinese.

Ready to get started? Review the updated AWS Well-Architected Framework Pillar best practices and pillar-specific whitepapers.

Have questions about some of the new best practices or most recent updates? Join our growing community on AWS re:Post.

Let’s Architect! Building multi-tenant SaaS systems

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-building-multi-tenant-saas-systems/

Software as a Service (SaaS) applications offer a transformative solution for businesses worldwide, delivering on-demand software solutions to a global audience. However, building a successful SaaS platform demands on meticulous architectural planning, especially given the inherent challenges of multi-tenancy. It’s also essential to ensure that each tenant’s data remains isolated and protected from unauthorized access and that multi-tenant systems are cost-optimized and can sustain the scaling of the SaaS business provider.

In this blog post, we will explore some of the key elements and best practices for designing and deploying secure and efficient SaaS systems on AWS.

Building cost-optimized multi-tenant SaaS architectures

Cost is a key factor to consider when we design new systems. Multi-tenancy requires teams to think beyond the basics of auto scaling, adopting strategies to allow their architecture to support a complex cost-scaling challenges. In this session, the speaker covers some design patterns for distributed systems to support the continually evolving scale needs of the environment, while optimizing the cost of the infrastructure.

The architectural model chosen for deploying multi-tenant systems—pooled, siloed, or mixed—significantly influences the cost optimization strategy. Each approach offers distinct trade-offs in terms of resource allocation, scalability, and cost efficiency.

Figure 1. The architectural model chosen for deploying multi-tenant systems—pooled, siloed, or mixed—significantly influences the cost-optimization strategy. Each approach offers distinct trade-offs in terms of resource allocation, scalability, and cost efficiency.

Take me to this video

Well-Architected SaaS Lens

The SaaS Lens for the AWS Well-Architected Framework empowers customers to assess and enhance their cloud-based architectures, fostering a deeper understanding of the business implications of their design choices. By bringing together technical leadership and diverse teams to discuss strategies for improving various aspects of the system, the AWS Well-Architected Framework facilitates collaborative decision-making. Moreover, the AWS account team can provide valuable support in conducting these assessments, offering expert guidance and insights. The AWS SaaS Lens specifically focuses on how to design, deploy, and architect multi-tenant SaaS application workloads within the AWS Cloud.

The microservices running in a multi-tenant environment must be able to reference and apply tenant context within each service. At the same time, it’s also our goal to limit the degree to which developers need to introduce any tenant awareness into their code.

Figure 2. The microservices running in a multi-tenant environment must be able to reference and apply tenant context within each service. At the same time, it’s also our goal to limit the degree to which developers need to introduce any tenant awareness into their code.

Take me to this well-architected framework

SaaS anywhere: Designing distributed multi-tenant architectures

Not every SaaS provider has the luxury of running all the moving parts of their solution within their own infrastructure. SaaS teams might support a range of diverse system models, where architectures might include customer-hosted data, edge deployment for parts of the application, and on-premises components. In this session, you can learn the strategies to support the complexities of this distributed model without undermining the resilience, operational efficiency, and agility goals of your solution. The video covers how this influences the onboarding, deployment, and profile management of the SaaS environment.

In this architectural pattern, tenants are demanding to have the ML workload in their environment. So, the SaaS provider only manages the SaaS Control plane where tenants deploy the application plane in their environment, including the ML workload and the necessary components around it.

Figure 3. In this architectural pattern, tenants are demanding to have the ML workload in their environment. So, the SaaS provider only manages the SaaS control plane where tenants deploy the application plane in their environment, including the ML workload and the necessary components around it.

Take me to this video

Deploying multi-tenant SaaS applications on Amazon ECS and AWS Fargate

Containers are frequently employed in multi-tenant SaaS environments to enhance scalability, isolation, and resource efficiency. Developing such systems requires addressing multiple challenges, including tenant isolation, tenant on-boarding, tenant-specific metering, monitoring, and other factors related to multi-tenancy. This session explores how to effectively manage all of these aspects when deploying solutions on AWS Fargate.

Microservices architecture can enhance security isolation by dividing applications into smaller, independent services, reducing the potential impact of a breach.

Figure 4. Microservices architecture can enhance security isolation by dividing applications into smaller, independent services, reducing the potential impact of a breach.

Take me to this video

AWS Serverless SaaS Workshop

Serverless helps to create multi-tenant architectures thanks to services like AWS Lambda that isolate your business logic per request, making them the perfect companion to run a SaaS platform. This workshop provides a hands-on introduction to creating serverless multi-tenant SaaS applications, helping you get started and gain practical experience.

This is the high level architecture of the web application you will use in the AWS Serverless SaaS Workshop. In the labs, you will use this web application to add features that are needed to build this final SaaS application.

Figure 5. This is the high-level architecture of the web application you will use in the AWS Serverless SaaS Workshop. In the labs, you will use this web application to add features that are needed to build this final SaaS application.

Take me to this workshop

See you next time!

Thanks for reading! Multi-tenant SaaS architectures require a careful design of your system. In this post, you have discovered key elements for properly designing your next SaaS workloads. In the next blog, we will talk about modern data architectures.

To revisit any of our previous posts or explore the entire series, visit the Let’s Architect! page.

Achieving Frugal Architecture using the AWS Well-Architected Framework guidance

Post Syndicated from Ashley DeLoach original https://aws.amazon.com/blogs/architecture/achieving-frugal-architecture-using-the-aws-well-architected-framework-guidance/

As part of the re:Invent 2023 keynote, Dr. Werner Vogels introduced the Frugal Architect mindset. This mindset emphasizes the importance of continuous learning, curiosity, and regular revision of architectural choices with a focus on cost and sustainability. Cost and sustainability should be treated as critical non-functional requirements, alongside factors like security, compliance, and performance. The Frugal Architect approach involves measuring and optimizing cost at every stage of the development process, which allows for innovation in parallel with promoting responsible resource usage. In the rapidly-evolving technology landscape, builders should adopt the Frugal Architect mindset to balance innovation with cost efficiency and environmental sustainability.

This blog discusses how the six pillars of the AWS Well-Architected Framework (operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability) align with the seven Frugal Architect laws. It demonstrates how adhering to the principles and best practices outlined in these pillars can help architects and builders effectively implement the Frugal Architect laws in their projects. The Well-Architected Framework provides a comprehensive set of guidelines that embed the concepts of frugality, efficiency, and cost effectiveness, which are the core tenets of the Frugal Architect laws. By following the Framework’s pillars, architects can build secure, reliable, efficient, and cost-optimized systems and promote sustainability.

Make Cost a Non-functional Requirement (Law 1)

Non-functional requirements are criteria that evaluate a system’s operation instead of its specific features or functionality. This includes aspects like accessibility, availability, scalability, security, portability, maintainability, and compliance. However, one crucial non-functional requirement that is often overlooked is cost. Consider implications early on and throughout the design, development, and operation of your systems. Organizations can strike a balance between desired features, time-to-market, and operational efficiency through early prioritization of cost considerations. The Frugal Architect argues that you should treat cost as a fundamental non-functional requirement that should be given upfront consideration when planning and initiating system development projects.

The Cost Optimization Pillar of the AWS Well-Architected Framework provides guidance on how to optimize costs when using AWS Cloud services. It emphasizes treating cost as a key requirement, not an afterthought. The main principles focus on the importance of a robust financial management processes, adoption of a cloud consumption model that allows for flexible scaling and pay-per-use billing, continual measurement of outputs against costs to optimize efficiency, use of managed services to minimize operational overhead, and implementation of transparent cost attribution to tie cloud spending to revenue sources and workloads. Organizations that follow these practices can effectively manage and optimize their costs and benefit from the scalability and agility of cloud computing.

These cost optimization principles can help organizations maximize the financial benefits of using the AWS Cloud and avoid wasteful spending. Cost optimization is an ongoing process that includes rightsizing, higher output for the same cost, and use of the most cost-effective AWS services. The pillar promotes a disciplined approach to evaluate trade-offs between cost and other optimization areas like performance or reliability. Overall, you can use this pillar to make informed decisions to provision and operate AWS services cost-effectively.

Systems that Last Align Cost to Business (Law 2)

The durability and longevity of a system are closely tied to how well its costs align with the underlying business model. During the creation of a system, consider revenue sources and profit drivers. The key is to identify the primary dimension or aspect that generates revenue, and then verify that the system architecture supports and optimizes for that revenue-generating dimension. Essentially, revenue and profitability considerations should be the primary forces behind cost decisions in system design.

The AWS Well-Architected Cost Optimization Pillar provides practices and guidance for organizations to accurately monitor their AWS costs and usage. This visibility helps users understand the profitability of different business units and products, which facilitates informed decisions on resource allocation across the organization. Organizations can implement these practices to gain insights into their AWS spending patterns, which aids in development of effective cost optimization strategies. Overall, accurate expenditure analysis and attribution are crucial for organizations to optimize cloud costs, measure ROI, and make data-driven resource allocation decisions.

It’s important to accurately identify and attribute cloud costs to specific workloads. The cloud allows for transparent cost attribution, which helps organizations link costs to individual revenue streams and workload owners. This granular cost attribution data empowers workload owners to measure return on investment (ROI) for their workloads. With detailed cost information, workload owners can optimize resource utilization and reduce costs by rightsizing resources, eliminating waste, and making informed decisions. Organizations must use accurate cost attribution to understand where their cloud spending is going and verify that resources are being used efficiently across different workloads and revenue streams.

Architecting is a Series of Trade-Offs (Law 3)

Architectural decisions involve trade-offs, particularly between cost, resilience, and performance. Systems will inevitably fail, so investment in resilience is important but may impact performance. It’s important to find the right balance between technical requirements and business needs and align with risk tolerance and budget constraints. Frugality is about maximizing value, not just minimizing spend. Frugality means that you determine what you’re can pay for based on your priorities and make informed trade-off decisions. Ultimately, architectural choices require careful consideration of the tensions between different non-functional requirements.

The AWS Well-Architected Framework helps you make architectural trade-offs through its design principles and practices across its six pillars with your business requirements in mind. As you architect workloads, you make trade-offs between pillars based on your business context. You might optimize to improve the sustainability impact and reduce cost at the expense of reliability in development environments, or for mission-critical solutions. You might optimize reliability with increased costs and sustainability impact. In ecommerce solutions, performance can affect revenue and customer propensity to buy. Security generally is not a viable trade-off against the other pillars.

Rather than optimizing for any single pillar, the Framework guides a holistic evaluation across all pillars to determine the right architectural approach. Organizations can use AWS best practices while they find the optimal balance that aligns with their unique requirements. The key is making intentional trade-off decisions instead of following any uniform approach.

Unobserved Systems Lead to Unknown Costs (Law 4)

Without proper observation and measurement, the true operational costs of a system remain hidden, and wasteful practices can persist unnoticed. Just as exposing a utility meter prompts more mindful usage, visibility increases into costs can drive more sustainable behaviors. While implementing comprehensive monitoring requires upfront investment, the long-term benefits of conserving resources and optimizing efficiency make it a worthwhile endeavor. Ultimately, you should maintain cost awareness to foster a culture of responsible, sustainable practices.

The Operational Excellence Pillar of the AWS Well-Architected Framework emphasizes the importance of observability to gain actionable insights into workloads. This involves creation of key performance indicators (KPIs) and use of observability data telemetry to comprehensively understand workload behavior, performance, reliability, cost, and health. Organizations can implement observability best practices to make informed decisions and take prompt action when business outcomes are at risk due to issues with workload operation. Observability data provides visibility into the current state and helps identify areas for improvement. This means that organizations can be proactive in performance optimization, reliability enhancement, and cost reduction based on the actionable insights derived from observability telemetry data. Overall, observability is crucial for maintenance of operational excellence through the use of data-driven decision-making and continuous improvement of workloads.

Overall, monitoring guidance is a core component across multiple pillars of the Well-Architected Framework, as it helps organizations effectively manage and optimize their cloud workloads. For more detail on the monitoring principles of the AWS Well-Architected Framework, see Cost-Aware Architectures Implement Cost Controls (Law 5).

Cost-Aware Architectures Implement Cost Controls (Law 5)

The key aspects of frugal architecture combine granular controls with robust monitoring to identify areas for optimization. This helps you optimize costs and maintain a good user experience. With a robust monitoring system, you can take action where improvements are needed.

The AWS Well-Architected Framework aligns with the concept of frugality, which focuses on maximizing value rather than just minimizing spending. The Framework helps businesses achieve maximum value by making architectural choices that meet their specific requirements.

The Cost Optimization Pillar emphasizes the continual monitoring of usage and costs to identify opportunities for efficiency improvements and cost savings. This includes expenditure analysis, adoption of consumption-based models, and implementation of cloud financial management practices.

The Security Pillar, Reliability Pillar, and Performance Efficiency Pillar reinforce the importance of monitoring systems, workloads, and costs in real-time to maintain security, automatically recover from failures, and optimize performance relative to cost.

The Sustainability Pillar focuses on measurement of a workload’s current and forecasted environmental impact. It recommends continual evaluation of new hardware and software offerings that can reduce the environmental footprint.

Overall, monitoring guidance spans multiple Well-Architected pillars to maximize value through optimization of cost, performance, security, reliability, and sustainability.

Cost Optimization is Incremental (Law 6)

Cost efficiency is a continuous process, not a one-time goal. Regularly monitor your systems to identify inefficient patterns and areas for optimization. Revisit and refine systems periodically to find additional opportunities for improvement and further reduce costs over time.

The Cost Optimization Pillar covers principles like analysis and attribution of expenditure, measurement of overall efficiency, adoption of a consumption model, and implementation of cloud financial management practices.

Additionally, the Operational Excellence Pillar provides principles that apply not just to cost optimization but all pillars. These include observability for actionable insights, safe automation where possible, frequent small reversible changes, frequent refinement of operations procedures, anticipation of failure, and documentation and distribution of learning from operational events and metrics.

Organizations can follow these AWS Well-Architected Framework principles and their practices to continuously improve their cloud architectures and operations and optimize costs effectively.

Unchallenged Success Leads to Assumptions (Law 7)

We should continue to reevaluate past approaches, even those that were previously successful. Just because something worked before does not mean that it is still the best method. Grace Hopper, a computer scientist, mathematician, and United States Navy rear admiral, cautioned against blind adherence to tradition, saying that “we’ve always done it this way” is a dangerous mindset. We must be willing to question the old ways and explore new and potentially better methods.

The AWS Well-Architected Framework advocates for an evolutionary architecture approach to system design. Traditional architectures are often designed as static, with only a few major version updates during the system’s lifetime. However, as businesses and requirements change over time, initial architectural decisions can limit the ability to adapt and evolve the system. Cloud computing enables capabilities like automated testing and lower-risk design changes, which allows systems to evolve continually rather than being constrained by the original design. An evolutionary architecture positions businesses to take advantage of new innovations and changes as part of standard practice. Rather than being locked into original architectural choices, an evolutionary approach fosters ongoing adaptation and modernization as requirements shift. This contrasts with traditional fixed architectures that make it difficult to evolve over time and provides greater flexibility to evolve systems iteratively.

The Operational Excellence Pillar includes implementation of observability to understand system behavior, safe automation of processes, frequent but reversible changes, regular refinement of operations procedures, proactive anticipation potential failures proactively, and distribution of learnings from operational events and metrics to drive continuous improvement.

Overall, the Well-Architected Framework provides guidance on evolutionary architecture and operations processes to effectively manage increasing software complexity over time.

Conclusion

Frugality is about maximizing value, rather than just minimizing costs. Following AWS Well-Architected Framework best practices regarding security, reliability, and operational excellence can help realize frugal yet robust architectures. True frugality involves optimizing costs by aligning spending with areas that deliver the highest business value and impact. The Well-Architected Framework provides guidance for making architectural decisions that increase efficiency, lower risks, and maximize return on cloud investments. This involves determining priorities, understanding sources of value, and making informed trade-off decisions based on those priorities. It’s important to avoid indiscriminate cost-cutting and instead focus on resources on what matters most to drive value for the organization. By following Well-Architected best practices, companies can practice frugality in a strategic way that balances optimization with business goals.

Start your Frugal Architecture journey with AWS Well-Architected today by reading the documentation or visiting the AWS Well-Architected Tool in the console.

Introducing the Māori Data Lens for the Well-Architected Framework

Post Syndicated from Craig Hind original https://aws.amazon.com/blogs/architecture/introducing-the-maori-data-lens-for-the-well-architected-framework/

In Aotearoa New Zealand, we have been listening and learning to better understand Māori aspirations when using cloud technology. We have been learning from Māori customers, partners, and advisors who have helped us on this journey. A common theme was how to safeguard Māori data in a digital world. Together with a group of Māori advisers, we are excited to introduce the first iteration of a Māori Data Lens for the AWS Well-Architected Framework. This lens is the first of its kind for AWS globally that focuses on indigenous data, specifically Māori data considerations.

An AWS Well-Architected Framework lens is designed to provide a technology, industry, or domain specific perspective aligned with the AWS Well-Architected Framework. The Māori Data Lens allows customers to apply important Māori data considerations when designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. This lens is designed to be a living resource that can grow and adapt alongside the evolving questions and considerations Māori have about how to secure and protect their data as a taonga (treasure). We hope this lens will be valuable in empowering individuals and organisations to design, build, and operate applications and workloads in the AWS Cloud in ways that can align with Māori values and expectations across the six pillars of the Well-Architected Framework: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability. These are not a rigid set of rules, but instead a set of guiding principles.

This lens is designed to complement invaluable te ao Māori knowledge and expertise. It’s important to consult, build trust, and reflect Māori voices in digital and technology choices. AWS customers can consult and partner with their Māori customers to build systems in a way that responsibly interact with their Māori data. This lens is a framework of practical questions and considerations. When combined with Māori knowledge and expertise, AWS customers can begin to use cloud technology in a way that empowers adherence to important cultural and ethical dimensions for safeguarding Māori data. As a starting point, we have sought to align the insights shared with us to our AWS best practices for architecting secure, reliable, and cost-effective applications in the AWS Cloud. Together, these guidelines support the durability and protection of Māori data.

At AWS, we have always believed it is essential that our customers have control over their data. We strive to give customers choice in how to secure and manage their data in the cloud in accordance with their needs. This has been true from the very beginning, when we were the only major cloud provider to allow our customers to control the geographic location of their data, never moving customer data without explicit instruction from the customer.

The launch of an AWS Local Zone in Auckland in 2023 and the coming AWS Region in Aotearoa gives all New Zealanders the choice to store their data onshore in Aotearoa New Zealand in the AWS Cloud without compromising on performance, innovation, scale, or security. We also know that some customers have needs that go beyond where their data is stored. We’re committed to expanding our understanding and our capability to help all customers meet their particular needs and best serve their own customers, to protect their data, and to meet legal and regulatory requirements.

Advice and feedback to date has been instrumental in helping to shape this resource, and we are deeply grateful for the ongoing partnership and insights. We recognise there are different perspectives, and that tikanga (protocol/practice) and experience among Māori on this topic continues to evolve. We welcome feedback on enhancing this resource to better serve the needs of our Māori customers and partners. To provide feedback, reach out to us using the feedback feature on the lens document or your local AWS account team.

We’d like to thank AWS partner HTK Group, as well as Māori technology and data experts who advised us on this work including Renata Hakiwai, Lee Timutimu, Nikora Ngaropo, Ngapera Riley, Wade Reweti, Atawhai Tibble and Eli Pohio. In their words:

“In this rapidly evolving digital landscape, the importance of understanding, organising, and harnessing data cannot be overstated. For our Māori communities, this holds even greater significance, as data can be a taonga or a treasure that represents the collective wisdom and knowledge passed down through generations.

Nā tō rourou, nā taku rourou, ka ora ai te iwi. With your food basket and my food basket, the people will thrive.

Mauri ora!”

Read the Māori Data Lens, or contact your AWS account team for more information.

Let’s Architect! Designing Well-Architected systems

Post Syndicated from Vittorio Denti original https://aws.amazon.com/blogs/architecture/lets-architect-well-architected-systems/

The design of cloud workloads can be a complex task, where a perfect and universal solution doesn’t exist. We should balance all the different trade-offs and find an optimal solution based on our context. But how does it work in practice? Which guiding principles should we follow? Which are the most important areas we should focus on?

In this blog, we will try to answer some of these questions by sharing a set of resources related to the AWS Well-Architected Framework. The Framework shares a set of methods to help you understand the pros and cons of decisions you make while building cloud systems. By following this resource, you will learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems in the cloud. The framework is constantly updated; it evolves as the technology landscape changes. Check out the latest updates from June 2024.

Build secure applications on AWS the Well-Architected way

The AWS Well-Architected Framework is constantly updated across all six pillars. The security pillar added a new best practice area: application security (AppSec). In this session, you can learn about the best practices highlighted in this area. Review four key domains: organization and culture, security of the pipeline, security in the pipeline, and dependency management. Each area provides a set of principles that you can implement and provides a complete view of how you design, develop, build, deploy, and operate secure workloads in the cloud.

Security should be part of the end-to-end development process, and implementing best practices both in the application code as well as in the underlying infrastructure components.

Figure 1. Security should be part of the end-to-end development process, and implementing best practices both in the application code as well as in the underlying infrastructure components.

Take me to this video

Announcing the AWS Well-Architected Mergers and Acquisitions Lens

How can we integrate different systems as a consequence of an acquisition? Mergers and acquisitions operations bring different people with different backgrounds together, with a need of driving systems convergence. Both organization and technical challenges can arise in this scenario. The Mergers and Acquisitions (M&A) Lens is a collection of customer-proven design principles, best practices, and prescriptive guidance to help you integrate the IT systems of two or more organizations. This lens helps companies follow AWS prescribed best practices during technical integration, drive cost optimization, and expedite merger and acquisition value realization.

If the seller company runs on another cloud platform or on-premises, the acquirer should plan a cloud migration while guaranteeing continuity of service.

Figure 2. If the seller company runs on another cloud platform or on-premises, the acquirer should plan a cloud migration while guaranteeing continuity of service.

Take me to this blog

AWS Well-Architected Labs

One of the best ways to become familiar with new concepts and methodologies consist of doing hands-on work to absorb the techniques properly. For each Let’s Architect! blog, we tend to share at least one workshop associated with the topic. The AWS Well-Architected Framework covers six different pillars, so today we share the AWS Well-Architected Labs to cover each area of the framework. Feel free to jump across the different workshops and start building!

Sustainability is one of the pillars in the framework. Asynchronous and scheduled processing are key techniques for improving the sustainability and costs of cloud architectures.

Figure 3. Sustainability is one of the pillars in the framework. Asynchronous and scheduled processing are key techniques for improving the sustainability and costs of cloud architectures.

Take me to this workshop

Gain confidence in system correctness and resilience with formal methods

Distributed systems are difficult to design. It’s even more difficult to test them and prove they are working. Formal methods enable the early discovery of design bugs that can escape the guardrails of design reviews and automated testing only to get uncovered in production. This video shows how AWS uses P, an open source, state machine–based programming language for formal modelling and analysis of distributed systems.

You can learn from AWS engineers and architects how to use P for your own applications to find bugs early in the development process and increase developer velocity. This tool is used in AWS to reason out the correctness of cloud services (for example, Amazon Simple Storage Service and Amazon DynamoDB).

An example of a distributed system for processing transactions.

Figure 4. An example of a distributed system for processing transactions.

Take me to this video

See you next time!

Thanks for reading! Hopefully, you got interesting insights into the methodologies for designing Well-Architected systems. In the next blog, we will talk about multi-region architectures. We will understand when they are actually needed, and which design principles should be applied.

To revisit any of our previous posts or explore the entire series, visit the Let’s Architect! page.

AWS Weekly Roundup: AI21 Labs’ Jamba-Instruct in Amazon Bedrock, Amazon WorkSpaces Pools, and more (July 1, 2024)

Post Syndicated from Esra Kayabali original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-ai21-labs-jamba-instruct-in-amazon-bedrock-amazon-workspaces-pools-and-more-july-1-2024/

AWS Summit New York is 10 days away, and I am very excited about the new announcements and more than 170 sessions. There will be A Night Out with AWS event after the summit for professionals from the media and entertainment, gaming, and sports industries who are existing Amazon Web Services (AWS) customers or have a keen interest in using AWS Cloud services for their business. You’ll have the opportunity to relax, collaborate, and build new connections with AWS leaders and industry peers.

Let’s look at the last week’s new announcements.

Last week’s launches
Here are the launches that got my attention.

AI21 Labs’ Jamba-Instruct now available in Amazon Bedrock – AI21 Labs’ Jamba-Instruct is an instruction-following large language model (LLM) for reliable commercial use, with the ability to understand context and subtext, complete tasks from natural language instructions, and ingest information from long documents or financial filings. With strong reasoning capabilities, Jamba-Instruct can break down complex problems, gather relevant information, and provide structured outputs to enable uses like Q&A on calls, summarizing documents, building chatbots, and more. For more information, visit AI21 Labs in Amazon Bedrock and the Amazon Bedrock User Guide.

Amazon WorkSpaces Pools, a new feature of Amazon WorkSpaces – You can now create a pool of non-persistent virtual desktops using Amazon WorkSpaces and save costs by sharing them across users who receive a fresh desktop each time they sign in. WorkSpaces Pools provides the flexibility to support shared environments like training labs and contact centers, and some user settings like bookmarks and files stored in a central storage repository such as Amazon Simple Storage Service (Amazon S3) or Amazon FSx can be saved for improved personalization. You can use AWS Auto Scaling to automatically scale the pool of virtual desktops based on usage metrics or schedules. For pricing information, refer to the Amazon WorkSpaces Pricing page.

API-driven, OpenLineage-compatible data lineage visualization in Amazon DataZone (preview)Amazon DataZone introduces a new data lineage feature that allows you to visualize how data moves from source to consumption across organizations. The service captures lineage events from OpenLineage-enabled systems or through API to trace data transformations. Data consumers can gain confidence in an asset’s origin, and producers can assess the impact of changes by understanding its consumption through the comprehensive lineage view. Additionally, Amazon DataZone versions lineage with each event to enable visualizing lineage at any point in time or comparing transformations across an asset or job’s history. To learn more, visit Amazon DataZone, read my News Blog post, and get started with data lineage documentation.

Knowledge Bases for Amazon Bedrock now offers observability logs – You can now monitor knowledge ingestion logs through Amazon CloudWatch, S3 buckets, or Amazon Data Firehose streams. This provides enhanced visibility into whether documents were successfully processed or encountered failures during ingestion. Having these comprehensive insights promptly ensures that you can efficiently determine when your documents are ready for use. For more details on these new capabilities, refer to the Knowledge Bases for Amazon Bedrock documentation.

Updates and expansion to the AWS Well-Architected Framework and Lens Catalog – We announced updates to the AWS Well-Architected Framework and Lens Catalog to provide expanded guidance and recommendations on architectural best practices for building secure and resilient cloud workloads. The updates reduce redundancies and enhance consistency in resources and framework structure. The Lens Catalog now includes the new Financial Services Industry Lens and updates to the Mergers and Acquisitions Lens. We also made important updates to the Change Enablement in the Cloud whitepaper. You can use the updated Well-Architected Framework and Lens Catalog to design cloud architectures optimized for your unique requirements by following current best practices.

Cross-account machine learning (ML) model sharing support in Amazon SageMaker Model RegistryAmazon SageMaker Model Registry now integrates with AWS Resource Access Manager (AWS RAM), allowing you to easily share ML models across AWS accounts. This helps data scientists, ML engineers, and governance officers access models in different accounts like development, staging, and production. You can share models in Amazon SageMaker Model Registry by specifying the model in the AWS RAM console and granting access to other accounts. This new feature is now available in all AWS Regions where SageMaker Model Registry is available except GovCloud Regions. To learn more, visit the Amazon SageMaker Developer Guide.

AWS CodeBuild supports Arm-based workloads using AWS Graviton3AWS CodeBuild now supports natively building and testing Arm workloads on AWS Graviton3 processors without additional configuration, providing up to 25% higher performance and 60% lower energy usage than previous Graviton processors. To learn more about CodeBuild’s support for Arm, visit our AWS CodeBuild User Guide.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

We launched existing services and instance types in additional Regions:

Other AWS news
Here are some additional news items that you might find interesting:

Top reasons to build and scale generative AI applications on Amazon Bedrock – Check out Jeff Barr’s video, where he discusses why our customers are choosing Amazon Bedrock to build and scale generative artificial intelligence (generative AI) applications that deliver fast value and business growth. Amazon Bedrock is becoming a preferred platform for building and scaling generative AI due to its features, innovation, availability, and security. Leading organizations across diverse sectors use Amazon Bedrock to speed their generative AI work, like creating intelligent virtual assistants, creative design solutions, document processing systems, and a lot more.

Four ways AWS is engineering infrastructure to power generative AI – We continue to optimize our infrastructure to support generative AI at scale through innovations like delivering low-latency, large-scale networking to enable faster model training, continuously improving data center energy efficiency, prioritizing security throughout our infrastructure design, and developing custom AI chips like AWS Trainium to increase computing performance while lowering costs and energy usage. Read the new blog post about how AWS is engineering infrastructure for generative AI.

AWS re:Inforce 2024 re:Cap – It’s been 2 weeks since AWS re:Inforce 2024, our annual cloud-security learning event. Check out the summary of the event prepared by Wojtek.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. To learn more about future AWS Summit events, visit the AWS Summit page. Register in your nearest city: New York (July 10), Bogotá (July 18), and Taipei (July 23–24).

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world. Upcoming AWS Community Days are in Cameroon (July 13), Aotearoa (August 15), and Nigeria (August 24).

Browse all upcoming AWS led in-person and virtual events and developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Esra

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Announcing updates to the AWS Well-Architected Framework guidance

Post Syndicated from Haleh Najafzadeh original https://aws.amazon.com/blogs/architecture/announcing-updates-to-the-aws-well-architected-framework-guidance-2/

We are excited to announce the availability of an enhanced AWS Well-Architected Framework. In this update, you’ll find expanded guidance across all six pillars of the Framework: Operational ExcellenceSecurityReliabilityPerformance EfficiencyCost Optimization, and Sustainability.

In this release, we updated the implementation guidance for the new and existing best practices to be more prescriptive. This includes enhanced recommendations and steps on reusable architecture patterns focused on specific business outcomes.

A brief history

The Well-Architected Framework is a collection of best practices that allow customers to evaluate and improve the design, implementation, and operations of their workloads in the cloud.

2024 AWS Well-Architected guidance timeline

Figure 1. 2024 AWS Well-Architected guidance timeline

In 2012, we published the first version of the Framework. In 2015, we released the AWS Well-Architected Framework whitepaper. We added the Operational Excellence pillar in 2016. We released pillar-specific whitepapers and AWS Well-Architected Lenses in 2017. The following year, the AWS Well-Architected Tool was launched.

In 2020, we released the new version of the Well-Architected Framework guidance, more lenses, and an API integration with the AWS Well-Architected Tool. We added the sixth pillar, Sustainability, in 2021. In 2022, dedicated HTML pages were introduced for each consolidated best practice across all six pillars, with several best practices updated with improved prescriptive guidance. By December 2023, we improved more than 75% of the Framework’s best practices. As of June 2024, more than 95% of the Framework’s best practices have been refreshed at least once.

What’s new

The Well-Architected Framework supports customers as they mature in their cloud journey by providing guidance to help achieve accurate business, environment, and workload solutions. Well-Architected is committed to providing such information to customers by continually evolving and updating our guidance.

The content updates and prescriptive guidance improvements in this release provide more complete coverage across AWS, helping customers make informed decisions when developing implementation plans. We added or expanded on guidance for the following services in this update: Amazon Chime, Amazon CloudWatch, Amazon CodeGuru Security, Amazon CodeWhisperer, Amazon Devops Guru, Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), Amazon ElastiCache, Amazon EventBridge, Amazon GuardDuty, Amazon Q, Amazon Route 53, Amazon Security Lake, Amazon Simple Notification Service (Amazon SNS), AWS Billing and Cost Management, AWS Budgets, AWS Compute Optimizer, AWS Config, AWS Control Tower, AWS Cost Optimization Hub, AWS Customer Carbon Footprint Tool, AWS Data Exports, AWS Data Lifecycle Manager, AWS Elastic Disaster Recovery, AWS Fault Injection Service, AWS Global Accelerator, AWS Health, AWS Local Zones, AWS Organizations, AWS Outposts, AWS Resilience Hub, AWS Security Hub, AWS Systems Manager, AWS Trusted Advisor, and AWS Wickr.

Pillar updates

Operational Excellence

In the Operational Excellence Pillar, we updated 30 best practices across six questions. This includes OPS01, OPS02, OPS03, OPS07, OPS10, and OPS11. This update includes a refreshed structure and improved prescriptive guidance with updates on observability, generative AI capabilities, operating models, and the evolution of operational practices.

As part of this update, we consolidated four best practices into two (OPS01-BP07 merged into OPS01-BP06, OPS03-BP08 merged into OPS03-BP04) and changed the titles of seven best practices. Additionally, we added one new design principal to highlight the importance of aligning operating models to business outcomes and reordered design principles according to their priority from foundational to specialized. We updated three design principles and changed the title of one design principle. We’ve also updated the operating model guidance section of the pillar to be more prescriptive, showcasing pathways to evolving operating models.

The implementation guidance in best practices includes guidance on implementing generative AI capabilities with Amazon Q (Q Developer, Q Business, Q in QuickSight), the latest capabilities from Amazon CloudWatch Network Monitor, Amazon CloudWatch Internet Monitor, Amazon CloudWatch Logs, Amazon CloudWatch best practice alarms, cross-account observability, log-based alarms, log data protection, and AWS Health.

Security

In the Security Pillar, we updated 28 best practices across 10 questions. This includes SEC01, SEC02, SEC03, SEC04, SEC05, SEC06, SEC07, SEC08, SEC09, and SEC10. Best practice updates include removing duplication, clarifying desired outcomes, and providing robust prescriptive implementation guidance. As part of this update, we merged SEC01-BP05 into SEC01-BP04. We deleted two practices, SEC08-BP05 and SEC09-BP03, to remove the duplication of guidance covered across other existing practices. We updated the titles for 14 practices and changed the order of nine practices to improve clarity and flow.

Reliability

In the Reliability Pillar, we updated 11 best practices across six questions. This includes REL02, REL04, REL05, REL06, REL07, and REL08, with three best practices changing titles including REL04-BP01, REL05-BP06, and REL06-BP05. We improved resources available in best practices to include more recent blog posts, technical talks, and presentations. We also improved the prescriptive guidance by expanding on implementation steps. New services and service features added to the best practices guidance for AWS Resilience Hub, Amazon Route 53, Amazon Route53 Application Recovery Controller, AWS Fault Injection Service, and Amazon CloudWatch Synthetics.

Performance Efficiency

In the Performance Efficiency Pillar, we updated nine best practices across three questions. This includes PERF01, PERF03, and PERF05. We improved the prescriptive guidance on these best practices and added pillar-specific guidance on services including Amazon Devops Guru and Amazon ElastiCache Serverless. We’ve updated the resources section of all best practices with new and relevant resources.

Cost Optimization

In the Cost Optimization Pillar, we updated eight best practices across five questions. This includes COST01, COST02, COST03, COST05, and COST11. One new best practice added in COST06 highlights the benefits of using shared resources for organizational cost optimization. The improved best practices include guidance on AWS services and features including the AWS Cost Optimization Hub, AWS Billing and Cost Management features, and AWS Data Exports. These updates also cover sample key performance indicators (KPIs) for tracking optimization efforts, elaborate on the use of cost allocation tags, and discuss the split cost allocation for Amazon EKS and Amazon ECS to separate costs of containerized workloads. Additionally, the updates offer improved prescriptive and clear guidance on budgeting and forecasting. Finally, you’ll find guidance on using automations to reduce costs.

Sustainability

In the Sustainability Pillar, we updated 18 best practices across five questions. This includes SUS01, SUS02, SUS03, SUS04, SUS05, and SUS06. We improved the prescriptive guidance on these best practices, and added Pillar-specific guidance on services, including AWS Local Zones, AWS Outposts, Amazon Chime, AWS Wickr, Amazon CodeWhisperer, and AWS Customer Carbon Footprint Tool. We’ve expanded lists of resources across all best practices with new and relevant resources.

Conclusion

This release includes updates and improvements to the Framework guidance totaling 105 best practices. As of this release, we’ve updated 95% of the existing Framework best practices at least once since October 2022. With this release, we have refreshed 100% of the Operational Excellence, Security, Performance Efficiency, Cost Optimization, and Sustainability Pillars, as well as 79% of Reliability Pillar best practices. Best practice updates in this release across Operational Excellence, Security, and Reliability (a total of 66) are first-time updates since major Framework improvements started in 2022.

The content is available in 11 languages: English, Spanish, French, German, Italian, Japanese, Korean, Indonesian, Brazilian Portuguese, Simplified Chinese, and Traditional Chinese.

Updates in this release are also available in the AWS Well-Architected Tool, which you can use to review your workloads, address important design considerations, and help you follow the AWS Well-Architected Framework guidance.

Ready to get started? Review the updated AWS Well-Architected Framework Pillar best practices and pillar-specific whitepapers.

Have questions about some of the new best practices or most recent updates? Join our growing community on AWS re:Post.

Top Architecture Blog Posts of 2023

Post Syndicated from Andrea Courtright original https://aws.amazon.com/blogs/architecture/top-architecture-blog-posts-of-2023/

2023 was a rollercoaster year in tech, and we at the AWS Architecture Blog feel so fortunate to have shared in the excitement. As we move into 2024 and all of the new technologies we could see, we want to take a moment to highlight the brightest stars from 2023.

As always, thanks to our readers and to the many talented and hardworking Solutions Architects and other contributors to our blog.

I give you our 2023 cream of the crop!

#10: Build a serverless retail solution for endless aisle on AWS

In this post, Sandeep and Shashank help retailers and their customers alike in this guided approach to finding inventory that doesn’t live on shelves.

Building endless aisle architecture for order processing

Figure 1. Building endless aisle architecture for order processing

Check it out!

#9: Optimizing data with automated intelligent document processing solutions

Who else dreads wading through large amounts of data in multiple formats? Just me? I didn’t think so. Using Amazon AI/ML and content-reading services, Deependra, Anirudha, Bhajandeep, and Senaka have created a solution that is scalable and cost-effective to help you extract the data you need and store it in a format that works for you.

AI-based intelligent document processing engine

Figure 2: AI-based intelligent document processing engine

Check it out!

#8: Disaster Recovery Solutions with AWS managed services, Part 3: Multi-Site Active/Passive

Disaster recovery posts are always popular, and this post by Brent and Dhruv is no exception. Their creative approach in part 3 of this series is most helpful for customers who have business-critical workloads with higher availability requirements.

Warm standby with managed services

Figure 3. Warm standby with managed services

Check it out!

#7: Simulating Kubernetes-workload AZ failures with AWS Fault Injection Simulator

Continuing with the theme of “when bad things happen,” we have Siva, Elamaran, and Re’s post about preparing for workload failures. If resiliency is a concern (and it really should be), the secret is test, test, TEST.

Architecture flow for Microservices to simulate a realistic failure scenario

Figure 4. Architecture flow for Microservices to simulate a realistic failure scenario

Check it out!

#6: Let’s Architect! Designing event-driven architectures

Luca, Laura, Vittorio, and Zamira weren’t content with their four top-10 spots last year – they’re back with some things you definitely need to know about event-driven architectures.

Let's Architect

Figure 5. Let’s Architect artwork

Check it out!

#5: Use a reusable ETL framework in your AWS lake house architecture

As your lake house increases in size and complexity, you could find yourself facing maintenance challenges, and Ashutosh and Prantik have a solution: frameworks! The reusable ETL template with AWS Glue templates might just save you a headache or three.

Reusable ETL framework architecture

Figure 6. Reusable ETL framework architecture

Check it out!

#4: Invoking asynchronous external APIs with AWS Step Functions

It’s possible that AWS’ menagerie of services doesn’t have everything you need to run your organization. (Possible, but not likely; we have a lot of amazing services.) If you are using third-party APIs, then Jorge, Hossam, and Shirisha’s architecture can help you maintain a secure, reliable, and cost-effective relationship among all involved.

Invoking Asynchronous External APIs architecture

Figure 7. Invoking Asynchronous External APIs architecture

Check it out!

#3: Announcing updates to the AWS Well-Architected Framework

The Well-Architected Framework continues to help AWS customers evaluate their architectures against its six pillars. They are constantly striving for improvement, and Haleh’s diligence in keeping us up to date has not gone unnoticed. Thank you, Haleh!

Well-Architected logo

Figure 8. Well-Architected logo

Check it out!

#2: Let’s Architect! Designing architectures for multi-tenancy

The practically award-winning Let’s Architect! series strikes again! This time, Luca, Laura, Vittorio, and Zamira were joined by Federica to discuss multi-tenancy and why that concept is so crucial for SaaS providers.

Let's Architect

Figure 9. Let’s Architect

Check it out!

And finally…

#1: Understand resiliency patterns and trade-offs to architect efficiently in the cloud

Haresh, Lewis, and Bonnie revamped this 2022 post into a masterpiece that completely stole our readers’ hearts and is among the top posts we’ve ever made!

Resilience patterns and trade-offs

Figure 10. Resilience patterns and trade-offs

Check it out!

Bonus! Three older special mentions

These three posts were published before 2023, but we think they deserve another round of applause because you, our readers, keep coming back to them.

Thanks again to everyone for their contributions during a wild year. We hope you’re looking forward to the rest of 2024 as much as we are!

Announcing the AWS Well-Architected Framework DevOps Guidance

Post Syndicated from Michael Rhyndress original https://aws.amazon.com/blogs/devops/announcing-the-aws-well-architected-framework-devops-guidance/

Today, Amazon Web Services (AWS) announced the launch of the AWS Well-Architected Framework DevOps Guidance. The AWS DevOps Guidance introduces the AWS DevOps Sagas—a collection of modern capabilities that together form a comprehensive approach to designing, developing, securing, and efficiently operating software at cloud scale. Taking the learnings from Amazon’s own transformation journey and our experience managing global cloud services, the AWS DevOps Guidance was built to equip organizations of all sizes with best practice culture, processes, and technical capabilities that help to deliver business value and applications more securely and at a higher velocity.

A Glimpse into Amazon’s DevOps Transformation

In the early 2000s, Amazon went through its own DevOps transformation which led to an online bookstore forming the AWS cloud computing division. Today, AWS provides a wide range of products and services for global customers that are powered by that same innovative DevOps approach. Due to the positive effects of this transformation, AWS recognizes the significance of DevOps and has been at the forefront of its adoption and implementation.

Amazon’s own journey, along with the collective experience gained from assisting customers as they modernize and migrate to the cloud, provided insight into the capabilities which we believe make DevOps adoption successful. With these learnings, we created the DevOps Sagas to help our customers sustainably adopt and practice DevOps through the implementation of an interconnected set of capabilities. Each DevOps Saga includes prescriptive guidance for capabilities that provide indicators of success, metrics to measure, and common anti-patterns to avoid.

Introducing The DevOps Sagas

The DevOps Sagas are core domains within the software delivery process that collectively form AWS DevOps best practices. Together, they encompass a collection of modern capabilities representing a comprehensive approach to designing, developing, securing, and efficiently operating software at cloud scale. You can use the DevOps Sagas as a common definition of what DevOps means to your organization by aligning on a shared understanding within your organization and to consistently measure DevOps adoption over time. The 5 DevOps Sagas are:

  • Organizational Adoption Saga: Inspires the formation of a customer-centric, adaptive culture focused on optimizing people-driven processes, personal and professional development, and improving developer experience to set the foundation for successful DevOps adoption.
  • Development Lifecycle Saga: Aims to enhance the organization’s capacity to develop, review, and deploy workloads swiftly and securely. It leverages feedback loops, consistent deployment methods, and an ‘everything-as-code’ approach to attain efficiency in deployment.
  • Quality Assurance Saga: Advocates for a proactive, test-first methodology integrated into the development process to ensure that applications are well-architected by design, secure, cost-efficient, sustainable, and delivered with increased agility through automation.
  • Automated Governance Saga: Facilitates directive, detective, preventive, and responsive measures at all stages of the development process. It emphasizes risk management, business process adherence, and application and infrastructure compliance at scale through automated processes, policies, and guardrails.
  • Observability Saga: Presents an approach to incorporating observability within environment and workloads, allowing teams to detect and address issues, improve performance, reduce costs, and ensure alignment with business objectives and customer needs.

DevOps Sagas display image defining the definition of Sagas, Capabilities, Indicators, Metrics, and Anti-Patterns. AWS DevOps Sagas provides foundational DevOps capabilities, indicators, and metrics aligned to AWS best practices. Sagas are core domains that collectively form AWS DevOps best practices. Capabilities are individual practices with differentiated outcomes that form a Saga. Indicators objectively measure qualities of each capability. Metrics quantify and measure proficiency of each capability. Anti-patterns avoid behaviors that may seem beneficial but lead to inefficient outcomes.

Who should use the AWS DevOps Guidance?

We recognize that every organization is unique and that there is no one-size-fits-all approach to practicing DevOps. The recommendations and examples provided can be tailored to suit your organization’s environment, quality, and security needs. The AWS DevOps Guidance is designed for a wide range of professionals and organizations, including startups exploring DevOps for the first time, established enterprises refining their processes, public sector companies, cloud-native businesses, and customers migrating to the AWS Cloud. Whether you are steering strategic direction as a Chief Technology Officer (CTO) or Chief Information Security Officer (CISO), a developer or architect actively engaged in designing and deploying workloads, or in a compliance role overseeing quality assurance, auditing, or governance, this guidance is tailored to help you.

Next Steps

With the release of the AWS DevOps Guidance, we encourage you, our customers, to download and read the document, as well as implement and test your workloads in accordance with the recommendations within. Use the AWS DevOps Guidance in tandem with the AWS Well-Architected Framework to conduct an assessment of your organization and individual workload’s adherence to DevOps best practices to pinpoint areas of strength and opportunities for improvement. Collaborate with your teams – from developers to operations and decision-makers – to share insights from your assessment. Use the insights gained from the AWS DevOps Guidance to prioritize areas of improvement and iteratively improve your DevOps capabilities.

Find the AWS DevOps Guidance on the AWS Well-Architected website or contact your AWS account team for more information. As with the AWS Well-Architected Framework and other industry and technology guidance, we recommend leveraging the AWS DevOps Guidance early and often – as you approach architectural and service design decisions, and whenever you carry out Well-Architected reviews. As you use the AWS DevOps Guidance, we would appreciate your comments and feedback to help us improve as best practices and technology evolve. We will continually refresh the content as we identify new best practices, metrics, and common scenarios.

Announcing updates to the AWS Well-Architected Framework guidance

Post Syndicated from Haleh Najafzadeh original https://aws.amazon.com/blogs/architecture/announcing-updates-to-the-aws-well-architected-framework-guidance/

We are excited to announce the availability of improved AWS Well-Architected Framework guidance. In this update, we have made changes across all six pillars of the framework: Operational ExcellenceSecurityReliabilityPerformance EfficiencyCost Optimization, and Sustainability.

In this release, we have made the implementation guidance for the new and updated best practices more prescriptive, including enhanced recommendations and steps on reusable architecture patterns targeting specific business outcomes in the Amazon Web Services (AWS) Cloud.

A brief history

The Well-Architected Framework is a collection of best practices that allow customers to evaluate and improve the design, implementation, and operations of their workloads in the cloud.

In 2012, the first version of the framework was published, leading to the 2015 release of the guidance whitepaper. We added the Operational Excellence pillar in 2016. The pillar-specific whitepapers and AWS Well-Architected Lenses were released in 2017, and the following year, the AWS Well-Architected Tool was launched.

In 2020, Well-Architected Framework guidance had a new release, along with more lenses, as well as API integration with the AWS Well-Architected Tool. The sixth pillar, Sustainability, was added in 2021. In 2022, dedicated pages were introduced for each consolidated best practice across all six pillars, with several best practices updated with improved prescriptive guidance. By April 2023, more than 50% of the Framework’s best practices have had their prescriptive guidance improved.

A brief history of the AWS Well-Architected Framework

A brief history of the AWS Well-Architected Framework

What’s new

As customers mature in their journey, they are seeking guidance to achieve accurate solutions that is prescriptive to their business, environments, and workloads. AWS Well-Architected is committed to providing such information to customers by continually evolving and updating our guidance.

The content updates and improvements in this release focus on having more complete coverage across the AWS service portfolio, helping customers make more informed decisions when developing implementation plans. Services that were added or expanded in coverage include: AWS Elastic Disaster Recovery, AWS Trusted Advisor, AWS Resilience Hub, AWS Config, AWS Security Hub, Amazon GuardDuty, AWS Organizations, AWS Control Tower, AWS Compute Optimizer, AWS Budgets, Amazon CodeWhisperer, Amazon CodeGuru, Amazon EventBridge, Amazon CloudWatch, Amazon Simple Notification Service, AWS Systems Manager, Amazon ElastiCache, and AWS Global Accelerator.

Pillar updates

Operational Excellence

The Operational Excellence Pillar has received updates to two of the five Design Principles and has a new Design Principle on observability, which highlights its importance and relevance throughout the pillar content. All 10 best practices in OPS05 have been updated, and we have consolidated 28 best practices into 16, across four questions (OPS04, OPS06, OPS08, and OPS09), as well as improving prescriptive guidance.

Security

In the Security Pillar, the Incident response in SEC10 underwent an update to align with the AWS Security Incident Response Guide, while introducing one new best practice, and improving the prescriptive guidance for others. Two best practices in SEC08 and SEC09 have received improved prescriptive guidance on securing workloads at rest and in transit.

Reliability

The Reliability Pillar has received prescriptive guidance improvements to one best practice in REL06, and six best practices in REL11, focused on how to best monitor, failover, remediate, and limit impacts of failures. The update addresses a wide variety of managed services and designs, including multi-Region-based resilience.

Performance Efficiency

The Performance Efficiency Pillar has been completely restructured, consolidating and merging guidance to reduce the number of best practices by 10 and the number of questions by three. We have added best practices around efficient caching and optimizing hardware acceleration. We have also improved the implementation guidance in all 32 best practices of the newly restructured Pillar.

Cost Optimization

The Cost Optimization Pillar has 10 best practices with improved implementation prescriptive guidance.

Sustainability

The Sustainability Pillar has received updates to the risk levels of seven best practices.

Conclusion

This Well-Architected release includes updates and improvements to 90 best practices: Operational Excellence (26), Security (8), Reliability (7), Performance Efficiency (32), Cost Optimization (10), and Sustainability (7). These changes are in addition to the 151 improved best practices released in 2023 (127 in April 10, 2023; and 24 in July 13, 2023), resulting in more than 73% of the existing Framework best practices updated at least once in the last year.

As of this release, 100% of Performance Efficiency, Cost Optimization, and Sustainability; 63% of Operational Excellence; 60% of Security; and 50% of Reliability Pillar content have been refreshed at least once since October 2022.

The content is available in 11 languages: English, Spanish, French, German, Italian, Japanese, Korean, Indonesian, Brazilian Portuguese, Simplified Chinese, and Traditional Chinese.

Updates in this release are also available in the AWS Well-Architected Tool, which can be used to review your workloads, address important design considerations, and help ensure that you follow the best practices and guidance of the AWS Well-Architected Framework.

Ready to get started? Review the updated AWS Well-Architected Framework Pillar best practices, as well as pillar-specific whitepapers.

Have questions about some of the new best practices or most recent updates? Join our growing community on AWS re:Post.

Let’s Architect! Resiliency in architectures

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-resiliency-in-architectures/

What is “resiliency”, and why does it matter? When we discussed this topic in an early 2022 edition of Let’s Architect!, we referenced the AWS Well-Architected Framework, which defines resilience as having “the capability to recover when stressed by load, accidental or intentional attacks, and failure of any part in the workload’s components.” Businesses rely heavily on the availability and performance of their digital services. Resilience has emerged as critical for any efficiently architected system, which is why it is a fundamental role in ensuring the reliability and availability of workloads hosted on the AWS Cloud platform.

In this newer edition of Let’s Architect!, we share some best practices for putting together resilient architectures, focusing on providing continuous service and avoiding disruptions. Ensuring uninterrupted operations is likely a primary objective when it comes to building a resilient architecture.

Understand resiliency patterns and trade-offs to architect efficiently in the cloud

In this AWS Architecture Blog post, the authors introduce five resilience patterns. Each of these patterns comes with specific strengths and trade-offs, allowing architects to personalize their resilience strategies according to the unique requirements of their applications and business needs. By understanding these patterns and their implications, organizations can design resilient cloud architectures that deliver high availability and efficient recovery from potential disruptions.

Take me to this Architecture Blog post!

Resilience patterns and tradeoffs

Resilience patterns and tradeoffs

Timeouts, retries, and backoff with jitter

Marc Broker discusses the inevitability of failures and the importance of designing systems to withstand them. He highlights three essential tools for building resilience: timeouts, retries, and backoff. By embracing these three techniques, we can create robust systems that maintain high availability in the face of failures. Timeouts, backoff, and jitter are fundamental to spread the traffic coming from clients and avoid overloading your systems. Building resilience is a fundamental aspect of ensuring the reliability and performance of AWS services in the ever-changing and dynamic technological landscape.

Take me to the Amazon Builders’ Library!

The Amazon Builder’s Library is a collection of technical resources produced by engineers at Amazon

The Amazon Builder’s Library is a collection of technical resources produced by engineers at Amazon

Prepare & Protect Your Applications From Disruption With AWS Resilience Hub

The AWS Resilience Hub not only protects businesses from potential downtime risks but also helps them build a robust foundation for their applications, ensuring uninterrupted service delivery to customers and users.

In this AWS Online Tech Talk, led by the Principal Product Manager of AWS Resilience Hub, the importance of a resilience hub to protect mission-critical applications from downtime risks is emphasized. The AWS Resilience Hub is showcased as a centralized platform to define, validate, and track application resilience. The talk includes strategies to avoid disruptions caused by software, infrastructure, or operational issues, plus there’s also a demo demonstrating how to apply these techniques effectively.

If you are interested in delving deeper into the services discussed in the session, AWS Resilience Hub is a valuable resource for monitoring and implementing resilient architectures.

Take me to this AWS Online Tech Talk!

AWS Resilience Hub recommendations

AWS Resilience Hub recommendations

Data resiliency design patterns with AWS

In this re:Invent 2022 session, data resiliency, why it matters to customers, and how you can incorporate it into your application architecture is discussed in depth. This session kicks off with the comprehensive overview of data resiliency, breaking down its core components and illustrating its critical role in modern application development. It, then, covers application data resiliency and protection designs, plus extending from the native data resiliency capabilities of AWS storage through DR solutions using AWS Elastic Disaster Recovery.

Take me to this re:Invent 2022 video!

Asynchronous cross-region replication

Asynchronous cross-region replication

See you next time!

Thanks for joining our discussion on architecture resiliency! See you in two weeks when we’ll talk about security on AWS.

To find all the blogs from this series, visit the Let’s Architect! list of content on the AWS Architecture Blog.

Introducing the latest Machine Learning Lens for the AWS Well-Architected Framework

Post Syndicated from Raju Patil original https://aws.amazon.com/blogs/architecture/introducing-the-latest-machine-learning-lens-for-the-aws-well-architected-framework/

Today, we are delighted to introduce the latest version of the AWS Well-Architected Machine Learning (ML) Lens whitepaper. The AWS Well-Architected Framework provides architectural best practices for designing and operating ML workloads on AWS. It is based on six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and—a new addition to this revision—Sustainability. The ML Lens uses the Well-Architected Framework to outline the steps for performing an AWS Well-Architected review for your ML implementations.

The ML Lens provides a consistent approach for customers to evaluate ML architectures, implement scalable designs, and identify and mitigate technical risks. It covers common ML implementation scenarios and identifies key workload elements to allow you to architect your cloud-based applications and workloads according to the AWS best practices that we have gathered from supporting thousands of customer implementations.

The new ML Lens joins a collection of Well-Architected lenses that focus on specialized workloads such as the Internet of Things (IoT), games, SAP, financial services, and SaaS technologies. You can find more information in AWS Well-Architected Lenses.

What is the Machine Learning Lens?

Let’s explore the ML Lens across ML lifecycle phases, as the following figure depicts.

Machine Learning Lens

Figure 1. Machine Learning Lens

The Well-Architected ML Lens whitepaper focuses on the six pillars of the Well-Architected Framework across six phases of the ML lifecycle. The six phases are:

  1. Defining your business goal
  2. Framing your ML problem
  3. Preparing your data sources
  4. Building your ML model
  5. Entering your deployment phase
  6. Establishing the monitoring of your ML workload

Unlike the traditional waterfall approach, an iterative approach is required to achieve a working prototype based on the six phases of the ML lifecycle. The whitepaper provides you with a set of established cloud-agnostic best practices in the form of Well-Architected Pillars for each ML lifecycle phase. You can also use the Well-Architected ML Lens wherever you are on your cloud journey. You can choose either to apply this guidance during the design of your ML workloads, or after your workloads have entered production as a part of the continuous improvement process.

What’s new in the Machine Learning Lens?

  1. Sustainability Pillar: As building and running ML workloads becomes more complex and consumes more compute power, refining compute utilities and assessing your carbon footprint from these workloads grows to critical importance. The new pillar focuses on long-term environmental sustainability and presents design principles that can help you build ML architectures that maximize efficiency and reduce waste.
  2. Improved best practices and implementation guidance: Notably, enhanced guidance to identify and measure how ML will bring business value against ML operational cost to determine the return on investment (ROI).
  3. Updated guidance on new features and services: A set of updated ML features and services announced to-date have been incorporated into the ML Lens whitepaper. New additions include, but are not limited to, the ML governance features, the model hosting features, and the data preparation features. These and other improvements will make it easier for your development team to create a well-architected ML workloads in your enterprise.
  4. Updated links: Many documents, blogs, instructional and video links have been updated to reflect a host of new products, features, and current industry best practices to assist your ML development.

Who should use the Machine Learning Lens?

The Machine Learning Lens is of use to many roles, including:

  • Business leaders for a broader appreciation of the end-to-end implementation and benefits of ML
  • Data scientists to understand how the critical modeling aspects of ML fit in a wider context
  • Data engineers to help you use your enterprise’s data assets to their greatest potential through ML
  • ML engineers to implement ML prototypes into production workloads reliably, securely, and at scale
  • MLOps engineers to build and manage ML operation pipelines for faster time to market
  • Risk and compliance leaders to understand how the ML can be implemented responsibly by providing compliance with regulatory and governance requirements

Machine Learning Lens components

The Lens includes four focus areas:

1. The Well-Architected Machine Learning Design Principles

A set of best practices that are used as the basis for developing a Well-Architected ML workload.

2. The Machine Learning Lifecycle and the Well Architected Framework Pillars

This considers all aspects of the Machine Learning Lifecycle and reviews design strategies to align to pillars of the overall Well-Architected Framework.

  • The Machine Learning Lifecycle phases referenced in the ML Lens include:
    • Business goal identification – identification and prioritization of the business problem to be addressed, along with identifying the people, process, and technology changes that may be required to measure and deliver business value.
    • ML problem framing – translating the business problem into an analytical framing, i.e., characterizing the problem as an ML task, such as classification, regression, or clustering, and identifying the technical success metrics for the ML model.
    • Data processing – garnering and integrating datasets, along with necessary data transformations needed to produce a rich set of features.
    • Model development – iteratively training and tuning your model, and evaluating candidate solutions in terms of the success metrics as well as including wider considerations such as bias and explainability.
    • Model deployment – establishing the mechanism to flow data though the model in a production setting to make inferences based on production data.
    • Model monitoring – tracking the performance of the production model and the characteristics of the data used for inference.
  • The Well-Architected Framework Pillars are:
    • Operational Excellence – ability to support ongoing development, run operational workloads effectively, gain insight into your operations, and continuously improve supporting processes and procedures to deliver business value.
    • Security – ability to protect data, systems, and assets, and to take advantage of cloud technologies to improve your security.
    • Reliability – ability of a workload to perform its intended function correctly and consistently, and to automatically recover from failure situations.
    • Performance Efficiency – ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as system demand changes and technologies evolve.
    • Cost Optimization – ability to run systems to deliver business value at the lowest price point.
    • Sustainability – addresses the long-term environmental, economic, and societal impact of your business activities.

3. Cloud-agnostic best practices

These are best practices for each ML lifecycle phase across the Well-Architected Framework pillars irrespective of your technology setting. The best practices are accompanied by:

  • Implementation guidance – the AWS implementation plans for each best practice with references to AWS technologies and resources.
  • Resources – a set of links to AWS documents, blogs, videos, and code examples as supporting resources to the best practices and their implementation plans.

4. Indicative ML Lifecycle architecture diagrams to illustrate processes, technologies, and components that support many of these best practices.

What are the next steps?

The new Well-Architected Machine Learning Lens whitepaper is available now. Use the Lens whitepaper to determine that your ML workloads are architected with operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability in mind.

If you require support on the implementation or assessment of your Machine Learning workloads, please contact your AWS Solutions Architect or Account Representative.

Special thanks to everyone across the AWS Solution Architecture, AWS Professional Services, and Machine Learning communities, who contributed to the Lens. These contributions encompassed diverse perspectives, expertise, backgrounds, and experiences in developing the new AWS Well-Architected Machine Learning Lens.

Implementing AWS Well-Architected best practices for Amazon SQS – Part 3

Post Syndicated from Pascal Vogel original https://aws.amazon.com/blogs/compute/implementing-aws-well-architected-best-practices-for-amazon-sqs-part-3/

This blog is written by Chetan Makvana, Senior Solutions Architect and Hardik Vasa, Senior Solutions Architect.

This is the third part of a three-part blog post series that demonstrates best practices for Amazon Simple Queue Service (Amazon SQS) using the AWS Well-Architected Framework.

This blog post covers best practices using the Performance Efficiency Pillar, Cost Optimization Pillar, and Sustainability Pillar. The inventory management example introduced in part 1 of the series will continue to serve as an example.

See also the other two parts of the series:

Performance Efficiency Pillar

The Performance Efficiency Pillar includes the ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve. It recommends best practices to use trade-offs to improve performance, such as learning about design patterns and services and identify how tradeoffs impact customers and efficiency.

By adopting these best practices, you can optimize the performance of SQS by employing appropriate configurations and techniques while considering trade-offs for the specific use case.

Best practice: Use action batching or horizontal scaling or both to increase throughput

For achieving high throughput in SQS, optimizing the performance of your message processing is crucial. You can use two techniques: horizontal scaling and action batching.

When dealing with high message volume, consider horizontally scaling the message producers and consumers by increasing the number of threads per client, by adding more clients, or both. By distributing the load across multiple threads or clients, you can handle a high number of messages concurrently.

Action batching distributes the latency of the batch action over the multiple messages in a batch request, rather than accepting the entire latency for a single message. Because each round trip carries more work, batch requests make more efficient use of threads and connections, improving throughput. You can combine batching with horizontal scaling to provide throughput with fewer threads, connections, and requests than individual message requests.

In the inventory management example that we introduced in part 1, this scaling behavior is managed by AWS for the AWS Lambda function responsible for backend processing. When a Lambda function subscribes to an SQS queue, Lambda polls the queue as it waits for the inventory updates requests to arrive. Lambda consumes messages in batches, starting at five concurrent batches with five functions at a time. If there are more messages in the queue, Lambda adds up to 60 functions per minute, up to 1,000 functions, to consume those messages.

This means that Lambda can scale up to 1,000 concurrent Lambda functions processing messages from the SQS queue. Batching enables the inventory management system to handle a high volume of inventory update messages efficiently. This ensures real-time visibility into inventory levels and enhances the accuracy and responsiveness of inventory management operations.

Best practice: Trade-off between SQS standard and First-In-First-Out (FIFO) queues

SQS supports two types of queues: standard queues and FIFO queues. Understanding the trade-offs between SQS standard and FIFO queues allows you to make an informed choice that aligns with your application’s requirements and priorities. While SQS standard queues support a nearly unlimited throughput, it sacrifices strict message ordering and occasionally delivers messages in an order different from the one they were sent in. If maintaining the exact order of events is not critical for your application, utilizing SQS standard queues can provide significant benefits in terms of throughput and scalability.

On the other hand, SQS FIFO queues guarantee message ordering and exactly-once processing. This makes them suitable for applications where maintaining the order of events is crucial, such as financial transactions or event-driven workflows. However, FIFO queues have a lower throughput compared to standard queues. They can handle up to 3,000 transactions per second (TPS) per API method with batching, and 300 TPS without batching. Consider using FIFO queues only when the order of events is important for the application, otherwise use standard queues.

In the inventory management example, since the order of inventory records is not crucial, the potential out-of-order message delivery that can occur with SQS standard queues is unlikely to impact the inventory processing. This allows you to take advantage of the benefits provided by SQS standard queues, including their ability to handle a high number of transactions per second.

Cost Optimization Pillar

The Cost Optimization Pillar includes the ability to run systems to deliver business value at the lowest price. It recommends best practices to build and operate cost-aware workloads that achieve business outcomes while minimizing costs and allowing your organization to maximize its return on investment.

Best practice: Configure cost allocation tags for SQS to organize and identify SQS for cost allocation

A well-defined tagging strategy plays a vital role in establishing accurate chargeback or showback models. By assigning appropriate tags to resources, such as SQS queues, you can precisely allocate costs to different teams or applications. This level of granularity ensures fair and transparent cost allocation, enabling better financial management and accountability.

In the inventory management example, tagging the SQS queue allows for specific cost tracking under the Inventory department, enabling a more accurate assessment of expenses. The following code snippet shows how to tag the SQS queue using AWS Could Development Kit (AWS CDK).

# Create the SQS queue with DLQ setting
queue = sqs.Queue(
    self,
    "InventoryUpdatesQueue",
    visibility_timeout=Duration.seconds(300),
)

Tags.of(queue).add("department", "inventory")

Best practice: Use long polling

SQS offers two methods for receiving messages from a queue: short polling and long polling. By default, queues use short polling, where the ReceiveMessage request queries a subset of servers to identify available messages. Even if the query found no messages, SQS sends the response right away.

In contrast, long polling queries all servers in the SQS infrastructure to check for available messages. SQS responds only after collecting at least one message, respecting the specified maximum. If no messages are immediately available, the request is held open until a message becomes available or the polling wait time expires. In such cases, an empty response is sent.

Short polling provides immediate responses, making it suitable for applications that require quick feedback or near-real-time processing. On the other hand, long polling is ideal when efficiency is prioritized over immediate feedback. It reduces API calls, minimizes network traffic, and improves resource utilization, leading to cost savings.

In the inventory management example, long polling enhances the efficiency of processing inventory updates. It collects and retrieves available inventory update messages in a batch of 10, reducing the frequency of API requests. This batching approach optimizes resource utilization, minimizes network traffic, and reduces excessive API consumption, resulting in cost savings. You can configure this behavior using batch size and batch window:

# Add the SQS queue as a trigger to the Lambda function
sqs_to_dynamodb_function.add_event_source_mapping(
    "MyQueueTrigger", event_source_arn=queue.queue_arn, batch_size=10
)

Best practice: Use batching

Batching messages together allows you to send or retrieve multiple messages in a single API call. This reduces the number of API requests required to process or retrieve messages compared to sending or retrieving messages individually. Since SQS pricing is based on the number of API requests, reducing the number of requests can lead to cost savings.

To send, receive, and delete messages, and to change the message visibility timeout for multiple messages with a single action, use Amazon SQS batch API actions. This also helps with transferring less data, effectively reducing the associated data transfer costs, especially if you have many messages.

In the context of the inventory management example, the CSV processing Lambda function groups 10 inventory records together in each API call, forming a batch. By doing so, the number of API requests is reduced by a factor of 10 compared to sending each record separately. This approach optimizes the utilization of API resources, streamlines message processing, and ultimately contributes to cost efficiency. Following is the code snippet from the CSV processing Lambda function showcasing the use of SendMessageBatch to send 10 messages with a single action.

# Parse the CSV records and send them to SQS as batch messages
csv_reader = csv.DictReader(csv_content.splitlines())
message_batch = []
for row in csv_reader:
    # Convert the row to JSON
    json_message = json.dumps(row)

    # Add the message to the batch
    message_batch.append(
        {"Id": str(len(message_batch) + 1), "MessageBody": json_message}
    )

    # Send the batch of messages when it reaches the maximum batch size (10 messages)
    if len(message_batch) == 10:
        sqs_client.send_message_batch(QueueUrl=queue_url, Entries=message_batch)
        message_batch = []
        print("Sent messages in batch")

Best practice: Use temporary queues

In case of short-lived, lightweight messaging with synchronous two-way communication, you can use temporary queues. The temporary queue makes it easy to create and delete many temporary messaging destinations without inflating your AWS bill. The key concept behind this is the virtual queue. Virtual queues let you multiplex many low-traffic queues onto a single SQS queue. Creating a virtual queue only instantiates a local buffer to hold messages for consumers as they arrive; there is no API call to SQS, and no costs associated with creating a virtual queue.

The inventory management example does not use temporary queues. However, in use cases that involve short-lived, lightweight messaging with synchronous two-way communication, adopting the best practice of using temporary queues and virtual queues can enhance the overall efficiency, reduce costs, and simplify the management of messaging destinations.

Sustainability Pillar

The Sustainability Pillar provides best practices to meet sustainability targets for your AWS workloads. It encompasses considerations related to energy efficiency and resource optimization.

Best practice: Use long polling

Besides its cost optimization benefits explained as part of the Cost Optimization Pillar, long polling also plays a crucial role in improving resource efficiency by reducing API requests, minimizing network traffic, and optimizing resource utilization.

By collecting and retrieving available messages in a batch, long polling reduces the frequency of API requests, resulting in improved resource utilization and minimized network traffic. By reducing excessive API consumption through long polling, you can effectively use resources. It collects and retrieves messages in batches, reducing excessive API consumption and unnecessary network traffic.

By reducing API calls, it optimizes data transfer and infrastructure operations. Additionally, long polling’s batching approach optimizes resource allocation, utilizing system resources more efficiently and improving energy efficiency. This enables the inventory management system to handle high message volumes effectively while operating in a cost-efficient and resource-efficient manner.

Conclusion

This blog post explores best practices for SQS using the Performance Efficiency Pillar, Cost Optimization Pillar, and Sustainability Pillar of the AWS Well-Architected Framework. We cover techniques such as batch processing, message batching, and scaling considerations. We also discuss important considerations, such as resource utilization, minimizing resource waste, and reducing cost.

This three-part blog post series covers a wide range of best practices, spanning the Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability Pillars of the AWS Well-Architected Framework. By following these guidelines and leveraging the power of the AWS Well-Architected Framework, you can build robust, secure, and efficient messaging systems using SQS.

For more serverless learning resources, visit Serverless Land.