Tag Archives: Creating a Multi-Region Application with AWS Services series

Creating a Multi-Region Application with AWS Services – Part 3, Application Management and Monitoring

Post Syndicated from Joe Chapman original https://aws.amazon.com/blogs/architecture/creating-a-multi-region-application-with-aws-services-part-3-application-management-and-monitoring/

In Part 1 of this series, we built a foundation for your multi-Region application using AWS compute, networking, and security services. In Part 2, we integrated AWS data and replication services to move and sync data between AWS Regions.

In Part 3, we cover AWS services and features used for messaging, deployment, monitoring, and management.

Developer tools

Automation that uses infrastructure as code (IaC) removes manual steps to create and configure infrastructure. It offers a repeatable template that can deploy consistent environments in different Regions.

IaC with AWS CloudFormation StackSets uses a single template to create, update, and delete stacks across multiple accounts and Regions in a single operation. When writing an AWS CloudFormation template, you can change the deployment behavior by pairing parameters with conditional logic. For example, you can set a “standby” parameter that, when “true,” limits the number of Amazon Elastic Compute Cloud (Amazon EC2) instances in an Amazon EC2 Auto Scaling group deployed to a standby Region.

Applications with deployments that span multiple Regions can use cross-Region actions in AWS CodePipeline for a consistent release pipeline. This way you won’t need to set up different actions in each Region. EC2 Image Builder and Amazon Elastic Container Registry (Amazon ECR) have cross-Region copy features to help with consistent AMI and image deployments, as covered in Part 1.

Event-driven architecture

Decoupled, event-driven applications produce a more extensible and maintainable architecture by having each component perform its specific task independently.

Amazon EventBridge, a serverless event bus, can send events between AWS resources. By utilizing cross-Region event routing, you can share events between workloads in different Regions (Figure 1) and accounts. For example, you can share health and utilization events across Regions to determine which Regional workload deployment is best suited for requests.

EventBridge routing events from one Region to event buses in other Regions

Figure 1. EventBridge routing events from one Region to event buses in other Regions

If your event-driven application relies on pub/sub messaging, Amazon Simple Notification Service (Amazon SNS) can fan out to multiple destinations. When the destination targets are Amazon Simple Queue Service (Amazon SQS) queues or AWS Lambda functions, Amazon SNS can notify recipients in different Regions. For example, you can send messages to a central SQS queue that processes orders for a multi-Region application.

Monitoring and observability

Observability becomes even more important as the number of resources and deployment locations increases. Being able to quickly identify the impact and root cause of an issue will influence recovery activities, and ensuring your observability stack is resilient to failures will help you make these decisions. When building on AWS, you can pair the health of AWS services with your application metrics to obtain a more complete view of the health of your infrastructure.

AWS Health dashboards and APIs show account-specific events and scheduled activities that may affect your resources. These events cover all Regions, and can expand to include all accounts in your AWS Organization. EventBridge can monitor events from AWS Health to take immediate actions based on an event. For example, if multiple services are reporting as degraded, you could set the EventBridge event target to an AWS Systems Manager automated runbook that prepares your disaster recovery (DR) application for failover.

AWS Trusted Advisor offers actionable alerts to optimize cost, increase performance, and improve security and fault tolerance. Trusted Advisor shows results across all Regions and can generate a report that shows an aggregated view of all check results across all accounts within an organization.

To maintain visibility over an application deployed across multiple Regions and accounts, you can create a Trusted Advisor dashboard and an operations dashboard with AWS Systems Manager Explorer. The operations dashboard offers a unified view of resources, such as Amazon EC2, Amazon CloudWatch, and AWS Config data. You can combine the metadata with Amazon Athena to create a multi-Region and multi-account inventory view of resources.

You can view metrics from applications and resources deployed across multiple Regions in the CloudWatch console. This makes it easy to create graphs and dashboards for multi-Region applications. Cross-account functionality is also available in CloudWatch, so you can create a centralized view of dashboards, alarms, and metrics across your organization.

Amazon OpenSearch Service aggregates unstructured and semi-structured log files, messages, metrics, documents, configuration data, and more. Cross-cluster replication replicates indices, mappings, and metadata in an active-passive setup from one OpenSearch Service domain to another. This reduces latency across Regions and ensures high availability of your data.

AWS Resilience Hub assesses and tracks the resiliency of your application. It checks how well an application will maintain availability when performing a Regional failover. For example, it can check if an application has cross-Region replication configured on Amazon Simple Storage Service (Amazon S3) buckets or that Amazon Relational Database Service (Amazon RDS) instances have a cross-Region read-replica. Figure 2 shows an output of a Resilience Hub assessment. It recommends use of Route 53 Application Recovery Controller (covered in Part 1) to ensure the Amazon EC2 Auto Scaling group in a Region is scaled and ready to accept traffic before we fail over to it.

Resilience Hub recommendations

Figure 2. Resilience Hub recommendations

Management: Governance

Growing an application into a new country means there may be additional data privacy laws and regulations to follow. These will vary depending on the country, and we encourage you to investigate with your legal team to fully understand how this affects your application.

AWS Control Tower supports data compliance by providing guardrails to control and meet data residency requirements. These guardrails are a collection of Service Control Policies (SCPs) and AWS Config rules. You can implement them independently of AWS Control Tower if needed. Additional security-centric multi-Region services are covered in part 1.

AWS Config provides a detailed view of the configuration and history of AWS resources. An AWS Config aggregator collects configuration and compliance data from multiple accounts and Regions into a central account. This centralized view offers a comprehensive view of the compliance and actions on resources, regardless of which account or Region they reside in.

Management: Operations

Several AWS Systems Manager capabilities allow for easier administration of AWS resources, especially as applications grow. Systems Manager Automation simplifies common maintenance and deployment tasks for AWS resources with automated runbooks. These runbooks automate actions on resources across Regions and accounts. You can pair Systems Manager Automation with Systems Manager Patch Manager to ensure instances maintain the latest patches across accounts and Regions. Figure 3 shows Systems Manager running several automation documents on a multi-Region architecture.

Using Systems Manager automation from a central operations AWS account to automate actions across multiple Regions

Figure 3. Using Systems Manager automation from a central operations AWS account to automate actions across multiple Regions

Bringing it together

At the end of each part of this blog series, we build on a sample application based on the services covered. This shows you how to bring these services together to build a multi-Region application with AWS services. We don’t use every service mentioned, just those that fit the use case.

We built this example to expand to a global audience. It requires high availability across Regions, and favors performance over strict consistency. We have chosen the following services covered in this post to accomplish our goals, building on our foundation from part 1 and part 2:

  • CloudFormation StackSets to deploy everything with IaC. This ensures the infrastructure is deployed consistently across Regions.
  • AWS Config rules provide a centralized place to monitor, record, and evaluate the configuration of our resources.
  • For added observability, we created dashboards with CloudWatch dashboard, Personal Health dashboard, and Trusted Advisor dashboard.
Building an application with multi-Region services

Figure 4. Building an application with multi-Region services

While our primary objective is expanding to a global audience, we note that some of the services such as CloudFormation StackSets rely on Region 1. Each Regional deployment is set up for static stability, but if there were an outage in Region 1 for an extended period of time, our DR playbook would outline how to make CloudFormation changes in Region 2.

Summary

Many AWS services have features to help you build and manage a multi-Region architecture, but identifying those capabilities across 200+ services can be overwhelming.

In this 3-part blog series, we’ve explored AWS services with features to assist you in building multi-Region applications. In Part 1, we built a foundation with AWS security, networking, and compute services. In Part 2, we added in data and replication strategies. Finally, in Part 3, we examined application and management layers.

Ready to get started? We’ve chosen some AWS Solutions, AWS Blogs, and Well-Architected labs to help you!

Other posts in this series

Related information

Creating a Multi-Region Application with AWS Services – Part 2, Data and Replication

Post Syndicated from Joe Chapman original https://aws.amazon.com/blogs/architecture/creating-a-multi-region-application-with-aws-services-part-2-data-and-replication/

In Part 1 of this blog series, we looked at how to use AWS compute, networking, and security services to create a foundation for a multi-Region application.

Data is at the center of many applications. In this post, Part 2, we will look at AWS data services that offer native features to help get your data where it needs to be.

In Part 3, we’ll look at AWS application management and monitoring services to help you build, monitor, and maintain a multi-Region application.

Considerations with replicating data

Data replication across the AWS network can happen quickly, but we are still limited by the speed of light. For this reason, data consistency must be considered when building a multi-Region application. Generally speaking, the longer a physical distance is, the longer it will take the data to get there.

When building a distributed system, consider the consistency, availability, partition tolerance (CAP) theorem. This theorem states that an application can only pick 2 out of the 3, and tradeoffs should be considered.

  • Consistency – all clients always have the same view of data
  • Availability – all clients can always read and write data
  • Partition Tolerance – the system will continue to work despite physical partitions

CAP diagram

Achieving consistency and availability is common for single-Region applications. For example, when an application connects to a single in-Region database. However, this becomes more difficult with multi-Region applications due to the latency added by transferring data over long distances. For this reason, highly distributed systems will typically follow an eventual consistency approach, favoring availability and partition tolerance.

Replicating objects and files

To ensure objects are in multiple Regions, Amazon Simple Storage Service (Amazon S3) can be set up to replicate objects across AWS Regions automatically with one-way or two-way replication. A subset of objects in an S3 bucket can be replicated with S3 replication rules. If low replication lag is critical, S3 Replication Time Control can help meet requirements by replicating 99.99% of objects within 15 minutes, and most within seconds. To monitor the replication status of objects, Amazon S3 events and metrics will track replication and can send an alert if there’s an issue.

Traditionally, each S3 bucket has its own single, Regional endpoint. To simplify connecting to and managing multiple endpoints, S3 Multi-Region Access Points create a single global endpoint spanning multiple S3 buckets in different Regions. When applications connect to this endpoint, it will route over the AWS network using AWS Global Accelerator to the bucket with the lowest latency. Failover routing is also automatically handled if the connectivity or availability to a bucket changes.

For files stored outside of Amazon S3, AWS DataSync simplifies, automates, and accelerates moving file data across Regions and accounts. It supports homogeneous and heterogeneous file migrations across Elastic File System (Amazon EFS), Amazon FSx, AWS Snowcone, and Amazon S3. It can even be used to sync on-premises files stored on NFS, SMB, HDFS, and self-managed object storage to AWS for hybrid architectures.

File and object replication should be expected to be eventually consistent. The rate at which a given dataset can transfer is a function of the amount of data, I/O bandwidth, network bandwidth, and network conditions.

Copying backups

Scheduled backups can be set up with AWS Backup, which automates backups of your data to meet business requirements. Backup plans can automate copying backups to one or more AWS Regions or accounts. A growing number of services are supported, and this can be especially useful for services that don’t offer real-time replication to another Region such as Amazon Elastic Block Store (Amazon EBS) and Amazon Neptune.

Figure 1 shows how these data transfer services can be combined for each resource.

Storage replication services

Figure 1. Storage replication services

Spanning non-relational databases across Regions

Amazon DynamoDB global tables provide multi-Region and multi-writer features to help you build global applications at scale. A DynamoDB global table is the only AWS managed offering that allows for multiple active writers in a multi-Region topology (active-active and multi-Region). This allows for applications to read and write in the Region closest to them, with changes automatically replicated to other Regions.

Global reads and fast recovery for Amazon DocumentDB (with MongoDB compatibility) can be achieved with global clusters. These clusters have a primary Region that handles write operations. Dedicated storage-based replication infrastructure enables low-latency global reads with a lag of typically less than one second.

Keeping in-memory caches warm with the same data across Regions can be critical to maintain application performance. Amazon ElastiCache for Redis offers global datastore to create a fully managed, fast, reliable, and secure cross-Region replica for Redis caches and databases. With global datastore, writes occurring in one Region can be read from up to two other cross-Region replica clusters – eliminating the need to write to multiple caches to keep them warm.

Spanning relational databases across Regions

For applications that require a relational data model, Amazon Aurora global database provides for scaling of database reads across Regions in Aurora PostgreSQL-compatible and MySQL-compatible editions. Dedicated replication infrastructure utilizes physical replication to achieve consistently low replication lag that outperforms the built-in logical replication database engines offer, as shown in Figure 2.

SysBench OLTP (write-only) stepped every 600 seconds on R4.16xlarge

Figure 2. SysBench OLTP (write-only) stepped every 600 seconds on R4.16xlarge

With Aurora global database, one primary Region is designated as the writer, and secondary Regions are dedicated to reads. Aurora MySQL supports write forwarding, which forwards write requests from a secondary Region to the primary Region to simplify logic in application code. Failover testing can happen by utilizing managed planned failover, which will change the active write cluster to another Region while keeping the replication topology intact. All databases discussed in this post employ eventual consistency when used across Regions, but Aurora PostgreSQL has an option to set the maximum a replica lag allowed with managed recovery point objective (managed RPO).

Logical replication, which utilizes a database engine’s built-in replication technology, can be set up for Amazon Relational Database Service (Amazon RDS) for MariaDB, MySQL, Oracle, PostgreSQL, and Aurora databases. A cross-Region read replica will receive these changes from the writer in the primary Region. For applications built on RDS for Microsoft SQL Server, cross-Region replication can be achieved by utilizing the AWS Database Migration Service. Cross-Region replicas allow for quicker local reads and can reduce data loss and recovery times in the case of a disaster by being promoted to a standalone instance.

For situations where a longer RPO and recovery time objective (RTO) are acceptable, backups can be copied across Regions. This is true for all of the relational and non-relational databases mentioned in this post, except for ElastiCache for Redis. Amazon Redshift can also automatically do this for your data warehouse. Backup copy times will vary depending on size and change rates.

A purpose-built database strategy offers many benefits, Figure 3 forms a purpose-built global database architecture.

Purpose-built global database architecture

Figure 3. Purpose-built global database architecture

Summary

Data is at the center of almost every application. In this post, we reviewed AWS services that offer cross-Region data replication to get your data where it needs to be quickly. Whether you need faster local reads, an active-active database, or simply need your data durably stored in a second Region, we have a solution for you. In the 3rd and final post of this series, we’ll cover application management and monitoring features.

Ready to get started? We’ve chosen some AWS Solutions, AWS Blogs, and Well-Architected labs to help you!

Related posts