Tag Archives: Architecture

Five Talent Collaborates with Customers Using the AWS Well-Architected Tool

Post Syndicated from Scott Sprinkel original https://aws.amazon.com/blogs/architecture/five-talent-collaborates-with-customers-using-the-aws-well-architected-tool/

Since its launch at re:Invent 2018, the AWS Well-Architected Tool (AWS W-A Tool) has provided a consistent process for documenting and measuring architecture workloads using the best practices from the AWS Well-Architected Framework. However, sharing workload reports for a collaborative work experience was time-consuming.


The new workload sharing feature solves these issues by offering a simple way to share workloads with other AWS accounts and AWS Identity and Access Management (IAM) users. Companies can leverage workload sharing to securely and efficiently collaborate and provide feedback about architecture implementation and design without sharing confidential account details through emails and PDFs. Multiple people across multiple organizations can now review a workload simultaneously and provide feedback and input.

Five Talent, a partner on the AWS Partner Network (APN), uses the workload sharing feature for a more collaborative Well-Architected review experience. With workload sharing, the company provides its clients with improved efficiency, transparency, and security.

“The new sharing feature has increased the efficiency across client and partner teams, which decreases the average time to remediate the high risks. By sharing the reviews in the AWS Console, we can protect sensitive customer data while staying informed in real time.” – Ryan Comingdeer, Chief Talent Officer, Five Talent.

Previously, Five Talent defined workloads in its AWS account and asked its clients to submit their workload information in a custom-built webform. Five Talent then generated and sent PDF reports via email. This was problematic for several reasons: Five Talent and its clients couldn’t control who had access to the report PDFs, it had no way to expedite high-risk issue (HRI) remediation, and its recommendations could easily get lost in email correspondence. The workload sharing feature solves these problems and builds customer confidence through the ability for multiple people to work on reviews collaboratively.

Five Talent added extra security by customizing its workload sharing access controls based on its customer needs. The company is using the notes sections in the workload for secure and accurate communication. It shares links to documentation that can help clients take initiative and remediate HRIs — enabling quicker remediation and more transparent review cycles. Five Talent also highlights milestones in the AWS W-A Tool, enabling customers to prioritize HRIs without sorting through lengthy PDFs and email threads, which ultimately expedites the review and revision process.

The workload sharing feature has helped Five Talent drive down HRIs without requiring direct access to the AWS account where the workload is defined. This transparency and ability to work simultaneously helps keep all teams accountable while reinforcing the principles of the Well-Architected Framework.
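The sharing workflow described here is console-based; the AWS CLI has since added wellarchitected commands that expose the same capability. A minimal sketch, assuming a placeholder workload ID and account number:

  # Hedged sketch: grant another AWS account read-only access to a workload.
  aws wellarchitected create-workload-share \
      --workload-id 11112222333344445555666677778888 \
      --shared-with 123456789012 \
      --permission-type READONLY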

Sign in to the Console and check out the new Shares tab in the AWS Well-Architected Tool, or visit the workload shares documentation to learn more.

Binge-Watch Live This is My Architecture Videos from AWS re:Invent

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/binge-watch-live-videos-from-aws-reinvent-2019/

AWS re:Invent 2019 was a whirlwind of activity, especially in the Expo Hall, where the AWS team spent four days filming 12 live This is My Architecture videos for Twitch. Watch one a day for the next two weeks…or eat them all in one sitting. Whichever you do, you’re guaranteed to learn something new.

Accolade

Discover security and operational excellence in healthcare with Accolade.

AWS Solution Builders

Get multi-region Availability with Amazon DynamoDB, Amazon S3, and Amazon Cognito.

EcoFit

EcoFit offers responsive, AWS Lambda-based microservices at scale.

Splunk

Splunk explains data at scale by decoupling compute and storage.

Crownpeak

Crownpeak uses AWS Lambda for its decoupled content deployment architecture.

Formula One Group

Learn how Formula One Group is using Amazon SageMaker to deliver real-time insights to fans.

Adobe

Adobe is simplifying networking across thousands of AWS accounts with AWS Transit Gateway.

The Trade Desk

The Trade Desk offers real-time ad bidding in the cloud with AWS Global Accelerator.

Mueller Water Products

Learn all about scalable ingestion of sensor data for municipal water conservation with Mueller Water Products.

NextRoll

NextRoll is driving OpEx efficiency for ad bidding engines.

Pason Systems

Explore petabyte-scale drilling datamart on AWS with Pason Systems.

UltraServe

UltraServe built an application vending machine with runtime event control.

 

Be sure to visit the AWS channel on Twitch for more in-depth videos and interviews.

TMA Special: Connecting Taza Chocolate’s Legacy Equipment to the Cloud

Post Syndicated from Todd Escalona original https://aws.amazon.com/blogs/architecture/tma-special-connecting-taza-chocolates-legacy-equipment-to-the-cloud/

As a “bean to bar” chocolate manufacturer, Taza Chocolate uses traditional stone ground mills for the production of its famous chocolate discs. The analog, mid-century machines that the company imported from Central America were never built to connect to the cloud.

Along comes Tulip Interfaces, an AWS Industrial Software Competency Partner that makes the human and machine interaction easier by replacing paper processes with digital automation. Tulip retrofitted Taza’s legacy equipment with Internet of Things (IoT) sensors and connected it back to the AWS cloud.

Taza’s AWS cloud integration begins with Tulip’s own physical gateway that connects systems and machinery on the plant floor. Tulip then deploys IoT sensors to the machinery and passes outputs to the AWS cloud using an encrypted web socket where Tulip’s Kubernetes workers, managed by Kops, automatically schedule services across highly available instances and processes requests.

All job completion data is then fed to an Amazon RDS Multi-AZ PostgreSQL database that allows Taza to run visualizations and analytics for more insight using Prometheus and Grafana. In addition, all of the application definition metadata is contained in a MongoDB database service running on Amazon Elastic Compute Cloud (EC2) instances, which in turn is VPC-peered with the Kubernetes clusters. On top of this backend, Tulip uses a player application to stream metrics in near real time to a dashboard down on the shop floor, where they can be easily examined to help guide operations and foster continuous improvement efforts in manufacturing.

Taza has realized many benefits from monitoring machine availability, performance, and ambient conditions, as well as from overall process enhancements.

In this special, on-site This is My Architecture video, AWS Solutions Architect Evangelist Todd Escalona takes us on his journey through the Taza Chocolate factory where he meets with Taza’s Director of Manufacturing, Rich Moran, and Tulip’s DevOps lead, John Defreitas, to further explore how Tulip enables Taza Chocolate’s legacy equipment for cloud-based plant automation.

Check out more videos in the This Is My Architecture series.

Top 10 Architecture Blog Posts of 2019

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/top-10-architecture-blog-posts-of-2019/

As we wind our way toward 2020, I want to take a moment to first thank you, our readers, for spending time on our blog. We grew our audience quite a bit this year and the credit goes to our hard-working Solutions Architects and other blog post writers. Below are the top 10 Architecture blog posts written in 2019.

#10: How to Architect APIs for Scale and Security

by George Mao

George Mao, a Specialist Solutions Architect at AWS, focuses on serverless computing and has FIVE posts in the top ten this year. Way to go, George!

This post was the first in a series that focused on best practices and concepts you should be familiar with when you architect APIs for your applications.

Read George’s post.

#9: From One to Many: Evolving VPC Guidance

by Androski Spicer

Since its inception, the Amazon Virtual Private Cloud (VPC) has acted as the embodiment of security and privacy for customers who are looking to run their applications in a controlled, private, secure, and isolated environment.

This logically isolated space has evolved, and in its evolution has increased the avenues that customers can take to create and manage multi-tenant environments with multiple integration points for access to resources on-premises.

Read Androski’s post.

#8: Things to Consider When You Build REST APIs with Amazon API Gateway

by George Mao


This post dives deeper into the things an API architect or developer should consider when building REST APIs with Amazon API Gateway.

Read George’s post.

#7: How to Design Your Serverless Apps for Massive Scale

by George Mao


Serverless is one of the hottest design patterns in the cloud today, allowing you to focus on building and innovating, rather than worrying about the heavy lifting of server and OS operations. In this series of posts, we’ll discuss topics that you should consider when designing your serverless architectures. First, we’ll look at architectural patterns designed to achieve massive scale with serverless.

Read George’s post.

#6: Best Practices for Developing on AWS Lambda

by George Mao


One of the benefits of using Lambda is that you don’t have to worry about server and infrastructure management. This means AWS will handle the heavy lifting needed to execute your AWS Lambda functions. Take advantage of this architecture with the tips in this post.

Read George’s post.

#5: Stream Amazon CloudWatch Logs to a Centralized Account for Audit and Analysis

by David Bailey

Figure 1 - Initial Landing Zone logging account resources

A key component of enterprise multi-account environments is logging. Centralized logging provides a single point of access to all salient logs generated across accounts and regions, and is critical for auditing, security and compliance. While some customers use the built-in ability to push Amazon CloudWatch Logs directly into Amazon Elasticsearch Service for analysis, others would prefer to move all logs into a centralized Amazon Simple Storage Service (Amazon S3) bucket location for access by several custom and third-party tools. In this blog post, David Bailey will show you how to forward existing and any new CloudWatch Logs log groups created in the future to a cross-account centralized logging Amazon S3 bucket.

Read David’s post.

#4: Updates to Serverless Architectural Patterns and Best Practices

by Drew Dennis

Drew wrote this post at about the halfway point between re:Invent 2018 and re:Invent 2019, where he revisited some of the recent serverless announcements we’ve made. These are all complementary to the patterns discussed in the re:Invent architecture track’s Serverless Architectural Patterns and Best Practices session.

Read Drew’s post.

#3: Understanding the Different Ways to Invoke Lambda Functions

by George Mao


In George’s first post of this series (#7 on this list), he talked about general design patterns to enable massive scale with serverless applications. In this post, he’ll review the different ways you can invoke Lambda functions and what you should be aware of with each invocation model.

Read George’s post.

#2: Using API Gateway as a Single Entry Point for Web Applications and API Microservices

by Anandprasanna Gaitonde and Mohit Malik

In this post, Anand and Mohit talk about a reference architecture that allows API Gateway to act as single entry point for external-facing, API-based microservices and web applications across multiple external customers by leveraging a different subdomain for each one.

Read Anand’s and Mohit’s post.

#1: 10 Things Serverless Architects Should Know

by Justin Pirtle

Building on the first three parts of the AWS Lambda scaling and best practices series where you learned how to design serverless apps for massive scale, AWS Lambda’s different invocation models, and best practices for developing with AWS Lambda, Justin invited you to take your serverless knowledge to the next level by reviewing 10 topics to deepen your serverless skills.

Read Justin’s post.

Thank You

Thanks again to all our readers and blog post writers. We look forward to learning and building amazing things together in the coming year.


re:Invent 2019: Introducing the Amazon Builders’ Library (Part II)

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/reinvent-2019-introducing-the-amazon-builders-library-part-ii/

In last week’s post, I told you about a new site we introduced at re:Invent at the beginning of this month, the Amazon Builders’ Library, a site that’s chock-full of articles by senior technical leaders that help you understand the underpinnings of both Amazon.com and AWS.

Below are four more architecture-based articles that describe how Amazon develops, architects, releases, and operates technology.

Caching Challenges and Strategies


Over years of building services at Amazon we’ve experienced various versions of the following scenario: We build a new service, and this service needs to make some network calls to fulfill its requests. Perhaps these calls are to a relational database, or an AWS service like Amazon DynamoDB, or to another internal service. In simple tests or at low request rates the service works great, but we notice a problem on the horizon. The problem might be that calls to this other service are slow or that the database is expensive to scale out as call volume increases. We also notice that many requests are using the same downstream resource or the same query results, so we think that caching this data could be the answer to our problems. We add a cache and our service appears much improved. We observe that request latency is down, costs are reduced, and small downstream availability drops are smoothed over. After a while, no one can remember life before the cache. Dependencies reduce their fleet sizes accordingly, and the database is scaled down. Just when everything appears to be going well, the service could be poised for disaster. There could be changes in traffic patterns, failure of the cache fleet, or other unexpected circumstances that could lead to a cold or otherwise unavailable cache. This in turn could cause a surge of traffic to downstream services that can lead to outages both in our dependencies and in our service.

Read the full article by Matt Brinkley, Principal Engineer, and Jas Chhabra, Principal Engineer

Avoiding Fallback in Distributed Systems


Critical failures prevent a service from producing useful results. For example, in an ecommerce website, if a database query for product information fails, the website cannot display the product page successfully. Amazon services must handle the majority of critical failures in order to be reliable.

This article covers fallback strategies and why we almost never use them at Amazon. You might find this surprising. After all, engineers often use the real world as a starting point for their designs. And in the real world, fallback strategies must be planned in advance and used when necessary. Let’s say an airport’s display boards go out. A contingency plan (such as humans writing flight information on whiteboards) must be in place to handle this situation, because passengers still need to find their gates. But consider how awful the contingency plan is: the difficulty of reading the whiteboards, the difficulty of keeping them up-to-date, and the risk that humans will add incorrect information. The whiteboard fallback strategy is necessary but it’s riddled with problems.

Read the full article by Jacob Gabrielson, Senior Principal Engineer

Leader Elections in Distributed Systems


Leader election is the simple idea of giving one thing (a process, host, thread, object, or human) in a distributed system some special powers. Those special powers could include the ability to assign work, the ability to modify a piece of data, or even the responsibility of handling all requests in the system.

Leader election is a powerful tool for improving efficiency, reducing coordination, simplifying architectures, and reducing operations. On the other hand, leader election can introduce new failure modes and scaling bottlenecks. In addition, leader election may make it more difficult for you to evaluate the correctness of a system.

Because of these complications, we carefully consider other options before implementing leader election. For data processing and workflows, workflow services like AWS Step Functions can achieve many of the same benefits as leader election and avoid many of its risks. For other systems, we often implement idempotent APIs, optimistic locking, and other patterns that make a single leader unnecessary.
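As one concrete illustration of those leaderless alternatives, optimistic locking can be implemented with a DynamoDB conditional write; the table, key, and version attribute below are hypothetical, and the write simply fails if another writer has already bumped the version:

  # Hypothetical sketch of optimistic locking: update an item only if its version
  # still matches the value we read, so no single leader has to serialize writes.
  aws dynamodb update-item \
      --table-name config-items \
      --key '{"item_id": {"S": "shared-config"}}' \
      --update-expression "SET body = :new_body, version = :next" \
      --condition-expression "version = :expected" \
      --expression-attribute-values '{
          ":new_body": {"S": "updated value"},
          ":next":     {"N": "43"},
          ":expected": {"N": "42"}
      }'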

In this article, I discuss some of the pros and cons of leader election in general and how Amazon approaches leader election in our distributed systems, including insights into leader failure.

Read the full article by Marc Brooker, Senior Principal Engineer

Workload Isolation Using Shuffle-Sharding


Not long after AWS began offering services, AWS customers made clear that they wanted to be able to use our Amazon Simple Storage Service (S3), Amazon CloudFront, and Elastic Load Balancing services at the “root” of their domain, that is, for names like “amazon.com” and not just for names like “www.amazon.com”.

That may seem very simple. However, due to a design decision in the DNS protocol, made back in the 1980s, it’s harder than it seems. DNS has a feature called CNAME that allows the owner of a domain to offload a part of their domain to another provider to host, but it doesn’t work at the root or top level of a domain. To serve our customers’ needs, we’d have to actually host our customers’ domains. When we host a customer’s domain, we can return whatever the current set of IP addresses are for Amazon S3, Amazon CloudFront, or Elastic Load Balancing. These services are constantly expanding and adding IP addresses, so it’s not something that customers could easily hard-code in their domain configurations either.

It’s no small task to host DNS. If DNS is having problems, an entire business can be offline. However, after we identified the need, we set out to solve it in the way that’s typical at Amazon—urgently. We carved out a small team of engineers, and we got to work.

Read the full article by Colm MacCárthaigh, Senior Principal Engineer

Want to learn more about the Amazon Builders’ Library? Visit our FAQ.

Next week we’ll wrap up 2019 with a top ten list of the most-visited Architecture blog posts of 2019.

re:Invent 2019: Introducing the Amazon Builders’ Library (Part I)

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/reinvent-2019-introducing-the-amazon-builders-library-part-i/

Today, I’m going to tell you about a new site we launched at re:Invent, the Amazon Builders’ Library, a collection of living articles covering topics across architecture, software delivery, and operations. You get to peek under the hood of how Amazon architects, releases, and operates the software underpinning Amazon.com and AWS.

Want to know how Amazon.com does what it does? This is for you. In this two-part series (the next one coming December 23), I’ll highlight some of the best architecture articles written by Amazon’s senior technical leaders and engineers.

Avoiding insurmountable queue backlogs


In queueing theory, the behavior of queues when they are short is relatively uninteresting. After all, when a queue is short, everyone is happy. It’s only when the queue is backlogged, when the line to an event goes out the door and around the corner, that people start thinking about throughput and prioritization.

In this article, I discuss strategies we use at Amazon to deal with queue backlog scenarios – design approaches we take to drain queues quickly and to prioritize workloads. Most importantly, I describe how to prevent queue backlogs from building up in the first place. In the first half, I describe scenarios that lead to backlogs, and in the second half, I describe many approaches used at Amazon to avoid backlogs or deal with them gracefully.

Read the full article by David Yanacek – Principal Engineer

Timeouts, retries, and backoff with jitter


Whenever one service or system calls another, failures can happen. These failures can come from a variety of factors. They include servers, networks, load balancers, software, operating systems, or even mistakes from system operators. We design our systems to reduce the probability of failure, but it’s impossible to build systems that never fail. So at Amazon, we design our systems to tolerate and reduce the probability of failure, and avoid magnifying a small percentage of failures into a complete outage. To build resilient systems, we employ three essential tools: timeouts, retries, and backoff.
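To make the pattern concrete, here is a minimal bash sketch of capped exponential backoff with full jitter around an arbitrary AWS CLI call; the command, attempt limit, and delay caps are illustrative assumptions rather than anything prescribed by the article:

  # Sketch only: retry a flaky call with capped exponential backoff and full jitter.
  max_attempts=5
  base_delay=1    # seconds
  max_delay=20    # cap on the backoff window

  for (( attempt=0; attempt<max_attempts; attempt++ )); do
    if aws s3 cp ./payload.json s3://my-example-bucket/payload.json; then
      exit 0      # success, stop retrying
    fi
    # Sleep a random amount between 1 and min(max_delay, base_delay * 2^attempt) seconds.
    window=$(( base_delay * (2 ** attempt) ))
    (( window > max_delay )) && window=$max_delay
    sleep $(( RANDOM % window + 1 ))
  done
  echo "request failed after ${max_attempts} attempts" >&2
  exit 1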

Read the full article by Marc Brooker, Senior Principal Engineer

Challenges with distributed systems


The moment we added our second server, distributed systems became the way of life at Amazon. When I started at Amazon in 1999, we had so few servers that we could give some of them recognizable names like “fishy” or “online-01”. However, even in 1999, distributed computing was not easy. Then as now, challenges with distributed systems involved latency, scaling, understanding networking APIs, marshalling and unmarshalling data, and the complexity of algorithms such as Paxos. As the systems quickly grew larger and more distributed, what had been theoretical edge cases turned into regular occurrences.

Developing distributed utility computing services, such as reliable long-distance telephone networks, or Amazon Web Services (AWS) services, is hard. Distributed computing is also weirder and less intuitive than other forms of computing because of two interrelated problems. Independent failures and nondeterminism cause the most impactful issues in distributed systems. In addition to the typical computing failures most engineers are used to, failures in distributed systems can occur in many other ways. What’s worse, it’s not always possible to know whether something failed.

Read the full article by Jacob Gabrielson, Senior Principal Engineer

Static stability using Availability Zones


At Amazon, the services we build must meet extremely high availability targets. This means that we need to think carefully about the dependencies that our systems take. We design our systems to stay resilient even when those dependencies are impaired. In this article, we’ll define a pattern that we use called static stability to achieve this level of resilience. We’ll show you how we apply this concept to Availability Zones, a key infrastructure building block in AWS and therefore a bedrock dependency on which all of our services are built.

Read the full article by Becky Weiss, Senior Principal Engineer, and Mike Furr, Principal Engineer

Check back in two weeks to read about some other architecture-based expert articles that let you in on how Amazon does what it does.

Decoupled Serverless Scheduler To Run HPC Applications At Scale on EC2

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/decoupled-serverless-scheduler-to-run-hpc-applications-at-scale-on-ec2/

This post was written by Ludvig Nordstrom and Mark Duffield on November 27, 2019.

In this blog post, we dive into a cloud-native approach for running HPC applications at scale on EC2 Spot Instances, using a decoupled serverless scheduler. This architecture is ideal for many workloads in the HPC and EDA industries, and can be used for any batch job workload.

At the end of this blog post, you will have two takeaways.

  1. A highly scalable environment that can run on hundreds of thousands of cores across EC2 Spot Instances.
  2. A fully serverless architecture for job orchestration.

We discuss deploying and running a pre-built serverless job scheduler that can run both Windows and Linux applications using any executable file format for your application. This environment provides high performance, scalability, cost efficiency, and fault tolerance. We introduce best practices and benefits to creating this environment, and cover the architecture, running jobs, and integration into existing environments.

A quick note about the term cloud native: we use the term loosely in this blog. Here, cloud native means we use AWS services (including serverless and microservices) to build out our compute environment, instead of a traditional lift-and-shift method.

Let’s get started!

 

Solution overview

This blog goes over the deployment process, which leverages AWS CloudFormation. This allows you to use infrastructure as code to automatically build out your environment. There are two parts to the solution: the Serverless Scheduler and Resource Automation. Below are quick summaries of each part of the solution.

Part 1 – The serverless scheduler

This first part of the blog builds out a serverless workflow to get jobs from SQS and run them across EC2 instances. The CloudFormation template being used for Part 1 is serverless-scheduler-app.template, and here is the Reference Architecture:

 


    Figure 1: Serverless Scheduler Reference Architecture (grayed-out area is covered in Part 2).

Read the GitHub Repo if you want to look at the Step Functions workflow contained in the preceding image. The walkthrough explains how the serverless application retrieves and runs jobs on its worker, updates the DynamoDB job monitoring table, and manages the worker for its lifetime.

 

Part 2 – Resource automation with serverless scheduler


This part of the solution relies on the serverless scheduler built in Part 1 to run jobs on EC2. Part 2 simplifies submitting and monitoring jobs, and retrieving results for users. Jobs are spread across our cost-optimized Spot Instances. Amazon EC2 Auto Scaling automatically scales up the compute resources when jobs are submitted, then terminates them when jobs are finished. Both of these save you money.

The CloudFormation template used in Part 2 is resource-automation.template. Building on Figure 1, the additional resources launched with Part 2 are noted in the following image: an S3 bucket, an Auto Scaling group, and two Lambda functions.


 

Figure 2: Resource Automation using Serverless Scheduler

                               

Introduction to decoupled serverless scheduling

HPC schedulers traditionally run in a classic master and worker node configuration. A scheduler on the master node orchestrates jobs on worker nodes. This design has been successful for decades; however, many powerful schedulers are evolving to meet the demands of HPC workloads. This scheduler design evolved from a necessity to run orchestration logic on one machine, but there are now options to decouple this logic.

What are the possible benefits that decoupling this logic could bring? First, we avoid a number of shortfalls in the environment such as the need for all worker nodes to communicate with a single master node. This single source of communication limits scalability and creates a single point of failure. When we split the scheduler into decoupled components both these issues disappear.

Second, in an effort to work around these pain points, traditional schedulers had to create extremely complex logic to manage all workers concurrently in a single application. This stifled the ability to customize and improve the code – restricting changes to be made by the software provider’s engineering teams.

Serverless services, such as AWS Step Functions and AWS Lambda, fix these major issues. They allow you to decouple the scheduling logic to have a one-to-one mapping with each worker, and instead share an Amazon Simple Queue Service (SQS) job queue. We define our scheduling workflow in AWS Step Functions. Then the workflow scales out to potentially thousands of “state machines.” These state machines act as wrappers around each worker node and manage each worker node individually. Our code is less complex because we only consider one worker and its job.

We illustrate the differences between a traditional shared scheduler and decoupled serverless scheduler in Figures 3 and 4.

 


Figure 3: Traditional Scheduler Model

 


Figure 4: Decoupled Serverless Scheduler on each instance

 

Each decoupled serverless scheduler will:

  • Retrieve and pass jobs to its worker
  • Monitor its worker’s health and take action if needed
  • Confirm job success by checking output logs and retry jobs if needed
  • Terminate the worker when the job queue is empty, just before terminating itself

With this new scheduler model, there are many benefits. Decoupling schedulers into smaller schedulers increases fault tolerance because any issue only affects one worker. Additionally, each scheduler consists of independent AWS Lambda functions, which maintain state on separate hardware and build retry logic into the service. Scalability also increases because jobs are not dependent on a master node, which enables geographic distribution of jobs. This geographic distribution allows you to optimize use of low-cost Spot Instances. Also, when decoupling the scheduler, workflow complexity decreases and you can customize scheduler logic. You can leverage lower-latency job monitoring and customize automated responses to job events as they happen.

 

Benefits

  • Fully managed – With Part 2 (Resource Automation) deployed, resources for a job are fully managed. When a job is submitted, resources launch and run the job. When the job is done, worker nodes automatically shut down. This prevents you from incurring continuous costs.

 

  • Performance – Your application runs on EC2, which means you can choose any of the high performance instance types. Input files are automatically copied from Amazon S3 into local Amazon EC2 Instance Store for high performance storage during execution. Result files are automatically moved to S3 after each job finishes.

 

  • Scalability – A worker node combined with a scheduler state machine becomes a stateless entity. You can spin up as many of these entities as you want and point them at an SQS queue. You can even distribute worker and state machine pairs across multiple AWS Regions. These two components, paired with fully managed services, optimize your architecture for scalability to meet your desired number of workers.

 

  • Fault Tolerance – The solution is completely decoupled, which means each worker has its own state machine that handles scheduling for that worker. Likewise, each state machine is decoupled into the Lambda functions that make it up. Additionally, the scheduler workflow includes a Lambda function that confirms each successful job or resubmits failed jobs.

 

  • Cost Efficiency – This fault-tolerant environment is perfect for EC2 Spot Instances. This means you can save up to 90% on your workloads compared to On-Demand Instance pricing. The scheduler workflow ensures little to no idle worker time by closely monitoring jobs and sending new jobs as others finish. Because the scheduler is serverless, you only incur costs for the resources required to launch and run jobs. Once a job is complete, all resources are terminated automatically.

 

  • Agility – You can use AWS fully managed Developer Tools to quickly release changes and customize workflows. The reduced complexity of a decoupled scheduling workflow means that you don’t have to spend time managing a scheduling environment, and can instead focus on your applications.

 

 

Part 1 – serverless scheduler as a standalone solution

 

If you use the serverless scheduler as a standalone solution, you can build clusters and leverage shared storage such as FSx for Lustre, EFS, or S3. Additionally, you can use AWS CloudFormation to deploy more complex compute architectures that suit your application. So, the EC2 instances that run the serverless scheduler can be launched in any number of ways. The scheduler only requires the instance ID and the SQS job queue name.

 

Submitting Jobs Directly to serverless scheduler

The serverless scheduler app is a fully built AWS Step Functions workflow that pulls jobs from an SQS queue and runs them on an EC2 instance. The jobs submitted to SQS consist of an AWS Systems Manager Run Command, and work with any SSM document and command that you choose for your jobs. Examples of SSM documents are AWS-RunShellScript and AWS-RunPowerShellScript. Feel free to read more about Running Commands Using Systems Manager Run Command.

The following code shows the format of a job submitted to SQS in JSON.

  {
      "job_id": "jobId_0",
      "retry": "3",
      "job_success_string": " ",
      "ssm_document": "AWS-RunPowerShellScript",
      "commands":
          [
              "cd C:\\ProgramData\\Amazon\\SSM; mkdir Result",
              "Copy-S3object -Bucket my-bucket -KeyPrefix jobs/date/jobId_0 -LocalFolder .\\",
              "C:\\ProgramData\\Amazon\\SSM\\jobId_0.bat",
              "Write-S3object -Bucket my-bucket -KeyPrefix jobs/date/jobId_0 -Folder .\\Result\\"
          ]
  }
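If you save the job definition above to a file, submitting it to the scheduler is a single SQS call. A minimal sketch, assuming a placeholder queue URL and file name:

  # Sketch only: push the JSON job definition onto the scheduler's SQS job queue.
  aws sqs send-message \
      --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-sqs-job-queue-name \
      --message-body file://jobId_0.json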

 

Any EC2 instance associated with a serverless scheduler receives jobs from its designated SQS queue until the queue is empty; then the EC2 resource automatically terminates. If a job fails, it is retried up to the number of times specified in the job definition. You can include a specific string value so that the scheduler searches the job execution output and confirms the successful completion of the job.

 

Tagging EC2 workers to get a serverless scheduler state machine

In Part 1 of the deployment, you must manage your EC2 Instance launch and termination. When launching an EC2 Instance, tag it with a specific tag key that triggers a state machine to manage that instance. The tag value is the name of the SQS queue that you want your state machine to poll jobs from.

In the following example, “my-scheduler-cloudformation-stack-name” is the tag key that the serverless scheduler app looks for on any new EC2 instance that starts. Next, “my-sqs-job-queue-name” is the default job queue created with the scheduler, but you can change this to any queue name you want to retrieve jobs from when an instance is launched.

{"my-scheduler-cloudformation-stack-name":"my-sqs-job-queue-name"}
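For illustration, a worker launched from the CLI might be tagged at creation time as shown below; the AMI, instance type, and subnet are placeholders for your own values:

  # Sketch only: launch a worker and tag it so a scheduler state machine adopts it.
  aws ec2 run-instances \
      --image-id ami-0123456789abcdef0 \
      --instance-type c5.large \
      --subnet-id subnet-0123456789abcdef0 \
      --tag-specifications 'ResourceType=instance,Tags=[{Key=my-scheduler-cloudformation-stack-name,Value=my-sqs-job-queue-name}]'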

 

Monitor jobs in DynamoDB

You can monitor job status in the following DynamoDB table. In the table you can find the job_id, commands sent to Amazon EC2, job status, job output logs from Amazon EC2, and retries, among other things.

Alternatively, you can query DynamoDB for a given job_id via the AWS Command Line Interface:

aws dynamodb get-item --table-name job-monitoring \
                      --key '{"job_id": {"S": "/my-jobs/my-job-id.bat"}}'

 

Using the “job_success_string” parameter

For the prior DynamoDB table, we submitted two identical jobs using an example script that you can also use. The command sent to the instance is “echo Hello World.” The output from this job should be “Hello World.” We also specified three allowed job retries. In the following image, there are two jobs in the SQS queue before they ran. Look closely at the different “job_success_strings” for each and the identical command sent to both:

Example DynamoDB CLI output showing the two submitted jobs and their job information.

From the image, we see that Job2 was successful and Job1 retried three times before being permanently labeled as failed. We forced this outcome to demonstrate how the job success string works: Job1 was submitted with “job_success_string” set to “Hello EVERYONE”, which will not appear in the job output “Hello World.” For Job2 we set “job_success_string” to “Hello” because we knew this string would be in the output log.

Job outputs commonly have text that only appears if job succeeded. You can also add this text yourself in your executable file. With “job_success_string,” you can confirm a job’s successful output, and use it to identify a certain value that you are looking for across jobs.

 

Part 2 – Resource Automation with the serverless scheduler

The additional services we deploy in Part 2 integrate with existing architectures to launch resources for your serverless scheduler. These services allow you to submit jobs simply by uploading input files and executable files to an S3 bucket.

Likewise, these additional resources can use any executable file format you want, including proprietary application level scripts. The solution automates everything else. This includes creating and submitting jobs to SQS job queue, spinning up compute resources when new jobs come in, and taking them back down when there are no jobs to run. When jobs are done, result files are copied to S3 for the user to retrieve. Similar to Part 1, you can still view the DynamoDB table for job status.

This architecture makes it easy to scale out to different teams and departments, and you can submit potentially hundreds of thousands of jobs while you remain in control of resources and cost.

 

Deeper Look at the S3 Architecture

The following diagram shows how you can submit jobs, monitor progress, and retrieve results. To submit jobs, upload all the needed input files and an executable script to S3. The suffix of the executable file (uploaded last) triggers an S3 event to start the process, and this suffix is configurable.

The S3 key of the executable file acts as the job id, and is kept as a reference to that job in DynamoDB. The Lambda (#2 in diagram below) uses the S3 key of the executable to create three SSM Run Commands.

  1. Synchronize all files in the same S3 folder to a working directory on the EC2 Instance.
  2. Run the executable file on EC2 Instances within a specified working directory.
  3. Synchronize the EC2 instance’s working directory back to the S3 bucket, including newly generated result files.

This Lambda (#2) then places the job on the SQS queue using the scheduler’s JSON-formatted job definition shown earlier.
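On a Linux worker, those three generated steps might look roughly like the following; the bucket, prefix, and paths are illustrative assumptions rather than the template's actual output:

  # Sketch only: the three commands the Lambda assembles into an SSM Run Command.
  aws s3 sync s3://my-example-bucket/2019/12/16/jobId_0/ /opt/jobs/jobId_0/     # 1. pull job files
  bash /opt/jobs/jobId_0/jobId_0.sh                                             # 2. run the executable
  aws s3 sync /opt/jobs/jobId_0/ s3://my-example-bucket/2019/12/16/jobId_0/     # 3. push results back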

IMPORTANT: Each set of job files should be given a unique job folder in S3 or more files than needed might be moved to the EC2 Instance.

 


Figure 5: Resource Automation using Serverless Scheduler – A deeper look

 

The EC2 and Step Functions workflow uses a Lambda function (#3 in the prior diagram) and the Auto Scaling group to scale out based on the number of jobs in the queue, up to the maximum number of workers (plus state machines) defined in the Auto Scaling group. When the job queue is empty, the number of running instances scales down to zero as the workers finish their remaining jobs.
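Conceptually, the scaling Lambda only has to map queue depth to worker count. A minimal sketch of that decision, with a hypothetical queue URL, Auto Scaling group name, and a one-job-per-worker cap of ten:

  # Sketch only: size the worker fleet from the current SQS backlog (capped at 10).
  backlog=$(aws sqs get-queue-attributes \
      --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/my-sqs-job-queue-name \
      --attribute-names ApproximateNumberOfMessages \
      --query 'Attributes.ApproximateNumberOfMessages' --output text)

  aws autoscaling set-desired-capacity \
      --auto-scaling-group-name my-scheduler-workers \
      --desired-capacity $(( backlog < 10 ? backlog : 10 ))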

 

Process Submitting Jobs and Retrieving Results

  1. As seen in step 1, upload input file(s) and an executable file into a unique job folder in S3 (such as /year/month/day/jobid/~job-files); a minimal upload sketch follows this list. Upload the executable file last because it automatically starts the job. You can also use a script to upload multiple files at a time, but each job needs a unique directory. There are many ways to make S3 buckets available to users, including AWS Storage Gateway, AWS Transfer for SFTP, AWS DataSync, the AWS Console, or any of the AWS SDKs leveraging S3 API calls.
  2. You can monitor job status by accessing the DynamoDB table directly via the AWS Management Console, or by using the AWS CLI to call DynamoDB via an API call.
  3. As seen in step 5, you can retrieve result files for jobs from the same S3 directory where you left the input files. The DynamoDB table confirms when jobs are done. The SQS output queue can be used by applications that must automatically poll and retrieve results.
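A minimal sketch of the upload in step 1, assuming a placeholder bucket, a date-based job prefix, and a .sh executable suffix:

  # Sketch only: upload the input files first, then the executable (which starts the job).
  aws s3 cp ./input.dat  s3://my-example-bucket/2019/12/16/jobId_0/input.dat
  aws s3 cp ./jobId_0.sh s3://my-example-bucket/2019/12/16/jobId_0/jobId_0.sh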

You no longer need to create or manage compute nodes directly; compute resources automatically scale up from zero when jobs come in, and then back down to zero when jobs are finished.

 

Deployment

Read the GitHub Repo for deployment instructions. Below are CloudFormation templates to help:

The original post includes Launch Stack links for the following AWS Regions: eu-north-1, ap-south-1, eu-west-3, eu-west-2, eu-west-1, ap-northeast-3, ap-northeast-2, ap-northeast-1, sa-east-1, ca-central-1, ap-southeast-1, ap-southeast-2, eu-central-1, us-east-1, us-east-2, us-west-1, and us-west-2.

 

 

Additional Points on Usage Patterns

 

  • While the two solutions in this blog are aimed at HPC applications, they can be used to run any batch jobs. Many customers that run large data processing batch jobs in their data lakes could use the serverless scheduler.

 

  • You can build pipelines of different applications when the output of one job triggers another to do something else – an example being pre-processing, meshing, simulation, post-processing. You simply deploy the Resource Automation template several times, and tailor it so that the output bucket for one step is the input bucket for the next step.

 

  • You might look to use the “job_success_string” parameter for iteration/verification in cases where a shotgun approach is needed to run thousands of jobs, and only one has a chance of producing the right result. In this case the “job_success_string” would identify the successful job from the potentially hundreds of thousands pushed to the SQS job queue.

 

Scale-out across teams and departments

Because all services used are serverless, you can deploy as many run environments as needed without increasing overall costs. Serverless workloads only accumulate cost when the services are used. So, you could deploy ten job environments and run one job in each, and your costs would be the same as if you had one job environment running ten jobs.

 

All you need is an S3 bucket to upload jobs to and an associated AMI that has the right applications and license configuration. Because a job configuration is passed to the scheduler at each job start, you can add new teams by creating an S3 bucket and pointing S3 events to a default Lambda function that pulls configurations for each job start.
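Wiring a new team's bucket to that default Lambda function is a one-time notification configuration. A hedged sketch, with a placeholder bucket name, function ARN, and suffix filter:

  # Sketch only: send ObjectCreated events for executable uploads to the job-start Lambda.
  aws s3api put-bucket-notification-configuration \
      --bucket my-new-team-jobs-bucket \
      --notification-configuration '{
        "LambdaFunctionConfigurations": [{
          "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:start-job",
          "Events": ["s3:ObjectCreated:*"],
          "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".sh"}]}}
        }]
      }'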

 

Set up a CI/CD pipeline to start continuous improvement of the scheduler

If you are advanced, we encourage you to clone the git repo and customize this solution. The serverless scheduler is less complex than other schedulers, because you only think about one worker and the process of one job’s run.

Ways you could tailor this solution:

  • Add intelligent job scheduling using Amazon SageMaker – It is hard to find data as ready for ML as log data, because every job you run has different run times and resource consumption. So, you could tailor this solution to predict the best instance to use with ML when workloads are submitted.
  • Add Custom Licensing Checkout Logic – Simply add one Lambda function to your Step Functions workflow to make an API call to a license server before continuing with one or more jobs. You can start a new worker when you have a license checked out; if a license is not available, the instance can terminate so you don’t incur costs while waiting for licenses.
  • Add Custom Metrics to DynamoDB – You can easily add metrics to DynamoDB because the solution already has baseline logging and monitoring capabilities.
  • Run on other AWS services – There is a Lambda function in the Step Functions workflow called “Start_Job”. You can tailor this Lambda to run your jobs on Amazon SageMaker, Amazon EMR, Amazon EKS, or Amazon ECS instead of EC2.

 

Conclusion

 

Although HPC workloads and EDA flows may still be dependent on current scheduling technologies, we illustrated the possibilities of decoupling your workloads from your existing shared scheduling environments. This post went deep into decoupled serverless scheduling, and we understand that it is difficult to unwind decades of dependencies. However, leveraging numerous AWS Services encourages you to think completely differently about running workloads.

But more importantly, it encourages you to Think Big. With this solution you can get up and running quickly, fail fast, and iterate. You can do this while scaling to your required number of resources, when you want them, and only pay for what you use.

Serverless computing catalyzes change across all industries, but that change is not obvious in the HPC and EDA industries. This solution is an opportunity for customers to take advantage of the nearly limitless capacity that AWS provides.

Please reach out with questions about HPC and EDA on AWS. You now have the architecture and the instructions to build your Serverless Decoupled Scheduling environment.  Go build!


About the Authors and Contributors

Authors

Ludvig Nordstrom is a Senior Solutions Architect at AWS.

Mark Duffield is a Tech Lead in Semiconductors at AWS.

Contributors

Steve Engledow is a Senior Solutions Builder at AWS.

Arun Thomas is a Senior Solutions Builder at AWS.


AWS Architecture Monthly Magazine: Manufacturing

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/aws-architecture-monthly-magazine-manufacturing/

Architecture Monthly Magazine - Nov-Dec 2019

For more than 25 years, Amazon has designed and manufactured smart products and distributed billions of products through its globally connected distribution network using cutting edge automation, machine learning and AI, and robotics, with AWS at its core. From product design to smart factory and smart products, AWS helps leading manufacturers transform their manufacturing operations with the most comprehensive and advanced set of cloud solutions available today, while taking advantage of the highest level of security.

In this Manufacturing-themed end-of-year issue of the AWS Architecture Monthly magazine, Steve Blackwell, AWS Manufacturing Tech Leader, talks about how manufacturers can experiment with and take advantage of emerging technologies using three main architectural patterns: demand forecasting, smart factories, and extending the manufacturing value chain with smart products.

In This Issue

We’ve assembled architectural best practices about Manufacturing from all over AWS, and we’ve made sure that a broad audience can appreciate it. Note that this will be our last issue of the year. We’ll be back in January with highlights and insights about AWS re:Invent 2019 (December 2-6 in Las Vegas).

  • Case Study: iRobot Ready to Unlock the Next Generation of Smart Homes Using the AWS Cloud
  • Ask an Expert: Steve Blackwell, Manufacturing Tech Leader
  • Blog Post: Reinventing the IoT Platform for Discrete Manufacturers
  • Solution: Smart Product Solution
  • AWS Coffee Break: IoT Helps Manufacturing Hit the Right Note
  • Whitepaper: Practical Ways To Achieve Smarter, Faster, and More Responsive Operations
  • Reference Architecture: EDA on AWS with IBM Spectrum LSF

How to Access the Magazine

We hope you’re enjoying Architecture Monthly, and we’d like to hear from you—leave us a star rating and comment on the Amazon Kindle Newsstand page or contact us anytime at [email protected].

Serverless at AWS re:Invent 2019

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/serverless-at-aws-reinvent-2019/

Our annual AWS re:Invent conference is just two weeks away! We can’t wait to meet you for an AWSome week in Las Vegas. The Serverless team is now hard at work preparing to deliver over 130 sessions at re:Invent. Come meet us and learn about how to use the newest Serverless innovations to build and architect for modern applications.


Breakouts, Talks, Builders, & Demos!

To find any Serverless session, you can search our agenda for the keyword “SVS” or visit our re:Invent 2019 Session Catalog. Let’s take a look at some of the Architecture-focused sessions you might want to join:

Workshops

  • SVS305-R: How to secure your Serverless APIs
    You’ll get hands on with Amazon API Gateway and learn how to architect for scale and security.
  • SVS303-R: Monolith to Serverless
    This workshop shows you how to re-architect monolithic applications to AWS Lambda-based microservices.

Breakouts

  • SVS308: Moving to event-driven architectures
    Learn about the new event-driven world and how our newest tools help you develop event-centric applications.
  • SVS407: Architecting and operating resilient Serverless systems
    This is an excellent session to learn best practice patterns for building reliable applications.
  • SVS401: Optimizing your Serverless applications
    Learn how to choose the correct services in your architecture and how to design your Lambda functions and APIs for security and scale.

Chalk Talks

  • SVS338: API Patterns and architectures (REST vs GraphQL APIs)
    We’ll help you evaluate your choices for modern APIs. Come learn how to choose between REST and GraphQL APIs.
  • SVS213: Thinking Serverless
    How do you go from a flowchart to a Serverless application? Come to this session to learn the techniques you can use to design Serverless architectures.
  • SVS323: Mastering AWS Lambda streaming event sources
    This talk will go in depth on the common architecture patterns for consuming and scaling Amazon Kinesis and Amazon DynamoDB streams with AWS Lambda.

Builders Sessions

  • SVS330: Build secure Serverless mobile or web applications
    Get hands on experience building a serverless web application using AWS AppSync, AWS Lambda, Amazon API Gateway, and Amazon DynamoDB.

Come Meet Us

Don’t forget to come stop by our Serverless expert booth in the main Expo Hall. We will have many people from the Serverless team ready to speak with you!

Our Serverless team, including specialist solutions architects and developer advocates, will be onsite throughout the week. We’d love to meet you, hear about your projects, and help with any architecture questions. Reach out to Sam Dengler, Brian McNamara, Chris Munns, Eric Johnson, James Beswick, and me, George Mao. See you onsite!

See You in Las Vegas!

I can’t wait to meet you in Las Vegas and hear about your projects. Please reach out to us and let’s chat about Serverless! As a side note, reserved seating is available for all sessions, so be sure to log in to your re:Invent account to reserve a seat and join us for all kinds of Serverless architecture discussions and hands-on training.

FogHorn: Edge-to-Edge Communication and Deep Learning

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/foghorn-edge-to-edge-communication-and-deep-learning/

FogHorn is an intelligent Internet of Things (IoT) edge solution that delivers data processing and real-time inference where data is created. Referring to itself as “the only ‘real’ edge intelligence solution in the market today,” FogHorn is powered by a hyper-efficient Complex Event Processor (CEP) and delivers comprehensive data enrichment and real-time analytics on high volumes, varieties, and velocities of streaming sensor data, and is optimized for constrained compute footprints and limited connectivity.

Andrea Sabet, AWS Solutions Architect, speaks with Ramya Ravichandar, Vice President of Products at FogHorn, about how FogHorn integrates with IoT MQTT for edge-to-edge communication, as well as with Amazon SageMaker for deep learning model deployment. The edgefication process involves running inference with real-time streaming data against a trained deep learning model. Drifts in the model accuracy trigger a callback to SageMaker for retraining.

Check out more videos in the This Is My Architecture series.

 

Architecting a Low-Cost Web Content Publishing System

Post Syndicated from Craig Jordan original https://aws.amazon.com/blogs/architecture/architecting-a-low-cost-web-content-publishing-system/

Introduction

When an IT team first contemplates reducing the on-premises hardware it manages to support its workloads, it often feels a tension between wanting to use cloud-native services and taking a lift-and-shift approach. Cloud-native services based on serverless designs could reduce costs and enable a solution that is easier to operate, but they can appear disruptive to end-user processes and tools. A lift-and-shift migration, though it can eliminate on-premises hardware and maintain existing workflows, doesn’t eliminate the need to manage a server infrastructure, does nothing to improve a team’s agility in releasing enhancements after migration to the cloud, and may not optimize the cost of the resulting solution. Rather than settling for an either/or option that sacrifices cost savings and ease of operation in order to be non-intrusive to their web authors’ daily work, the University of Saint Thomas, Minnesota team implemented a creative hybrid approach that both avoids end-user disruption and achieves the cost savings, agility, and simplified administration that a cloud-native solution can provide.

The Situation

University of St. Thomas wanted to reduce on-premises management of hardware for its university website. In addition, by migrating this functionality to the cloud, the team intended to increase the website’s availability. The on-premises solution was deployed on an IIS server maintained by the IT team, but the content of the website was authored by staff members in departments across the university using two different Content Management Systems (CMS). The publishing process from these tools to the web server worked well, and there was no appetite for eliminating either the distributed nature of the website’s development or the content management systems that the authors were comfortable with.

The IT team hoped to implement a serverless solution utilizing only Amazon Simple Storage Service (S3) to host the static website content. Not only would that reduce the cost of the solution, it would also eliminate having to manage web servers. One of the two content management systems could publish directly to S3, but unfortunately the other CMS could not.

A lift-and-shift migration approach would move the website onto an IIS server in Amazon Elastic Compute Cloud (EC2) and update the publishing process to write its outputs to this new server. This solution would avoid any impact to the authors because all the change would be accomplished behind the scenes by the IT team. However, this approach did not achieve the team’s goals of creating a solution that cost less and was easier to manage than the current on-premises one.

Rather than giving up on creating a cloud-native solution, the team worked from the constraints on the edges of the solution toward the middle.

Solution

Achieving the cost savings, management ease, and high availability for the solution depended upon using S3 to store the website’s contents (#1 in the diagram). If the CMS tools could have published directly to S3, the solution would have been completed by simply adjusting the CMS tools to target their output to S3. However, only one of the two CMS tools could do this. The other one expects to publish its output to a file system that is accessible to the on-premises server where the CMS tool runs. The team solved this problem by launching a t3.small EC2 instance (#2) to sit between the CMS tools and the S3 bucket that would store the website’s production content. Initially, it seemed like using two simple file sync processes could keep the file system of the EC2 instance synchronized with the CMS files. However, when the team first attempted this approach to build a copy of the website on EC2’s file system, they discovered that one of the sync processes would delete the other tool’s output rather than ignoring it when synchronizing updates from its tool to EC2.

To overcome this issue, the team created separate website roots in the EC2 file system into which each CMS would synchronize. Using Unionfs, a Linux utility that combines multiple directories into a single logical directory, the team created a unified root folder for the website (#3) that could easily be pushed to S3 using the AWS CLI’s S3 commands.

With this much of the solution in place, the team had successfully created a new architecture for their website that was nearly as inexpensive as a static website hosted on S3, but that also maintained the tools and processes their website authors were familiar with.

There was just one more technical issue to address: the IIS site contained internal metadata that redirected its users from virtual directories to the physical content located elsewhere in the website. For example, https://..../law might be redirected to https://..../lawschool/. To achieve similar functionality, the IT team created one HTML file for each of these redirects and added them to a third website root directory on the EC2 instance (#4). These files contain the static HTML needed to redirect the user’s browser to the desired endpoint. Blending this directory with the other two through Unionfs creates a single logical copy of the website’s contents that can be synchronized out to S3 with an S3 sync CLI command.
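For readers who want a concrete picture of this step, the following is a minimal Python sketch of how the union mount and synchronization could be scripted on the EC2 instance. The directory paths, bucket name, and unionfs-fuse mount options shown are assumptions for illustration, not the university’s actual configuration; consult the utility’s documentation for the exact mount syntax on your distribution.

import subprocess

# Assumed paths: one root per CMS plus one for the redirect pages (all hypothetical).
CMS1_ROOT = "/srv/www/cms1"
CMS2_ROOT = "/srv/www/cms2"
REDIRECT_ROOT = "/srv/www/redirects"
UNION_ROOT = "/srv/www/union"
BUCKET = "s3://example-university-website"  # hypothetical bucket name

def mount_union():
    # Combine the three roots into one logical directory with unionfs-fuse.
    subprocess.run(
        ["unionfs", f"{CMS1_ROOT}=RO:{CMS2_ROOT}=RO:{REDIRECT_ROOT}=RO", UNION_ROOT],
        check=True,
    )

def sync_to_s3():
    # Push the unified view of the site to the production bucket.
    subprocess.run(["aws", "s3", "sync", UNION_ROOT, BUCKET, "--delete"], check=True)

if __name__ == "__main__":
    mount_union()
    sync_to_s3()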

A final enhancement was to place an Amazon CloudFront distribution (#5) in front of the website to cache its contents, improving response times for website visitors. The distribution’s object caching TTLs are left at their defaults. The publishing process runs every 15 minutes, so to ensure that website visitors receive the latest content, the team wrote an AWS Lambda function (#6) that invalidates the cache each time an object is created in or removed from the S3 bucket, using S3 event notifications.
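A minimal version of such a Lambda function might look like the following Python sketch. It assumes the distribution ID is supplied through an environment variable and invalidates the changed object’s path; whether to invalidate per object or use a blanket /* path is a design choice, not something prescribed by the original solution.

import os
import time
import boto3
from urllib.parse import unquote_plus

cloudfront = boto3.client("cloudfront")
DISTRIBUTION_ID = os.environ["DISTRIBUTION_ID"]  # assumed environment variable

def handler(event, context):
    # Build one invalidation path per S3 object referenced in the event notification.
    paths = [
        "/" + unquote_plus(record["s3"]["object"]["key"])
        for record in event.get("Records", [])
    ]
    if not paths:
        return {"invalidated": []}
    cloudfront.create_invalidation(
        DistributionId=DISTRIBUTION_ID,
        InvalidationBatch={
            "Paths": {"Quantity": len(paths), "Items": paths},
            "CallerReference": str(time.time()),  # must be unique per request
        },
    )
    return {"invalidated": paths}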

Conclusion

The University of Saint Thomas IT team found a creative way to implement a new solution for their university website that reduces the time and effort required to manage servers, achieves operational simplicity and cost savings by using cloud-native services, and yet doesn’t interfere with the web authoring tools and processes their customers were happy with. The mix of server-based and serverless components in their design illustrates how flexible cloud architectures can be and highlights the ingenuity of the team that built it.

Acknowledgements

Thank you to the following people at the University of Saint Thomas:

This solution was architected by Julian Mino, Cloud Architect. The creative use of Unionfs was suggested by William Bear, AVP for Applications and Infrastructure and former Linux administrator. Vicky Vue, Systems Engineer, and Keith Ketchmark, Sr. Systems Engineer, implemented the solution using Terraform, Ansible and Python. Daniel Strojny (Associate Director, Networks & IT Operations) helped resolve some internal DNS issues the team encountered.

Architecture Monthly Magazine: Architecting for Financial Services

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/architecture-monthly-magazine-architecting-for-financial-services/

This month’s Architecture Monthly magazine delves into the high-stakes world of banking, insurance, and securities. From capital markets and insurance, to global investment banks, payments, and emerging fintech startups, AWS helps customers innovate, modernize, and transform.

We’re featuring two field experts in October’s issue. First, we interviewed Ed Pozarycki, a Solutions Architect manager in the AWS Financial Services vertical, who spoke to us about patterns, trends, and the special challenges architects face when building systems for financial organizations. And this month we’re rolling out a new feature: Ask an Expert, where we’ll ask AWS professionals three questions about the current magazine’s theme. In this issue, Lana Kalashnyk, Principal Blockchain Architect, told us three things to know about blockchain and cryptocurrencies.

In October’s Issue

For October’s magazine, we’ve assembled architectural best practices about financial services from all over AWS, and we’ve made sure that a broad audience can appreciate it.

  • Interview: Ed Pozarycki, Solutions Architecture Manager, Financial Services
  • Blog post: Tips For Building a Cloud Security Operating Model in the Financial Services Industry
  • Case study: Aon Securities, Inc.
  • Ask an Expert: 3 Things to Know About Blockchain & Cryptocurrencies
  • On-demand webinar: The New Age of Banking & Transforming Customer Experiences
  • Whitepaper: Financial Services Grid Computing on AWS

How to Access the Magazine

We hope you’re enjoying Architecture Monthly, and we’d like to hear from you—leave us a star rating and comment on the Amazon Kindle page or contact us anytime at [email protected].

Financial Services at re:Invent

We have a full re:Invent program planned for the Financial Services industry in December, including leadership, breakout, and builder sessions, plus chalk talks and workshops. Register today.

Using API Gateway as a Single Entry Point for Web Applications and API Microservices

Post Syndicated from Anandprasanna Gaitonde original https://aws.amazon.com/blogs/architecture/using-api-gateway-as-a-single-entry-point-for-web-applications-and-api-microservices/

Introduction

The benefits of high availability, scalability, and elasticity that AWS offers have proven to be a boon for Software-as-a-Service (SaaS) providers. AWS has also made it seamless to adopt microservices architectures for modernizing these SaaS applications, as well as providing API-based access for external applications.

An API management layer such as Amazon API Gateway is a natural choice for customers to expose APIs externally in a secure and highly scalable manner. However, as they adopt the cloud for their software applications and services, these providers may spin up redundant AWS environments to support them for multiple customers. This is typically driven by some unique requirements for each of their customers.

However, there is potential to create a multi-tenant microservices architecture using the capabilities of API Gateway. This architecture uses the same instance of a microservice to serve different customers, leading to better utilization of the environment and a lower overall cost. This configuration requires providers to support white-labeling of domains to cater to each of their customers, as well as identification of the customer domain for handling customized business logic for each customer in the backend microservices.

This blog post describes a reference architecture that allows API Gateway to act as a single entry point for external-facing, API-based microservices and web applications across multiple external customers by leveraging a different subdomain for each one.

Amazon API Gateway: A Single Entry-Point

Using a single API Gateway in the architecture across multiple web portal applications and microservices is an important consideration towards the goal of reusability of components and cost optimization.

Amazon API Gateway provides a highly scalable solution to create and publish RESTful and WebSocket APIs. It provides flexibility in choosing backend technologies such as AWS Lambda functions, AWS Step Functions state machines, or HTTP(S) endpoints hosted on AWS Elastic Beanstalk or Amazon EC2, as well as HTTP-based services hosted outside AWS.

API Gateway allows for handling common API management tasks such as security, caching, throttling, and monitoring. While its primary objective is to provide that abstraction layer on top of your backend APIs and microservices, it can also allow backends to be simple web applications for web portal access or Amazon S3 buckets for providing access to static web content or documents.

Along with the above capabilities, the following key features of API Gateway help create the architecture described here.

  1. Custom domain name support:
    When an API is deployed using API Gateway, the default API endpoint domain name is not user friendly: https://api-id.execute-api.region.amazonaws.com/stage, where api-id is generated by API Gateway, region is specified by you when creating the API, and stage is specified by you when deploying the API. Because the default endpoint is difficult to recall, API Gateway lets you specify a custom domain name such as customer1.example.com via its integration with AWS Certificate Manager, which allows for SSL certificate-based validation of the subdomains. API Gateway allows you to map multiple subdomains to a single API endpoint, letting you white-label the domains based on an external customer’s requirement (a boto3 sketch follows this list).
  2. API request/response transformation:
    API Gateway allows you to specify the integration for each path of the API endpoint separately. This lets you route API requests for each path to a separate backend endpoint and, at the same time, apply request/response transformations, such as custom header insertion or modification of existing headers, to manage any custom handling of APIs.
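As a rough illustration of the custom domain feature, the snippet below uses the API Gateway control-plane API (via boto3) to register a white-labeled subdomain and map it to a deployed stage. The certificate ARN, API ID, and stage name are placeholders, and an edge-optimized domain is assumed; a regional domain would use regionalCertificateArn instead.

import boto3

apigw = boto3.client("apigateway")

# Placeholders -- substitute your own values.
CERT_ARN = "arn:aws:acm:us-east-1:111122223333:certificate/EXAMPLE"
REST_API_ID = "abc123defg"
STAGE = "prod"

# Register the white-labeled custom domain, backed by the ACM certificate for *.example.com.
domain = apigw.create_domain_name(
    domainName="customer1.example.com",
    certificateArn=CERT_ARN,
    endpointConfiguration={"types": ["EDGE"]},
)

# Map the root base path of the custom domain to the deployed stage of the API.
apigw.create_base_path_mapping(
    domainName="customer1.example.com",
    basePath="",
    restApiId=REST_API_ID,
    stage=STAGE,
)

# The target to point the customer1 CNAME record at:
print(domain["distributionDomainName"])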

Architecture and Its Benefits

The architecture shown in the diagram below uses the features described above.

This architecture is an example of a typical SaaS provider that wants to offer its services to other enterprises and needs to support white-labeled domains for its web and API infrastructure. This is achieved using the following steps:

    1. A single domain, example.com, is registered with a domain registrar, and subdomains such as customer1.example.com and customer2.example.com are created by adding CNAME records to the DNS information held at the registrar. The DNS can be handled by AWS’s own DNS and registrar service, Amazon Route 53, or by any third-party domain name provider.
    2. Once that is complete, use AWS Certificate Manager (ACM) to create a certificate for the domains example.com and *.example.com. This ensures that the ACM certificate, once applied to API Gateway, allows multiple subdomains to be served by it.
    3. Using the certificate created in ACM, create custom domains for the API endpoint, specifying base path mappings as needed. In this example, the following two subdomains are created as custom domains using this capability: customer1.example.com and customer2.example.com.
      Note: Make sure to add CNAME records for customer1 and customer2 at your DNS provider to point to the target domain name created within your API Gateway for each of the two customer sub-domains.
    4. The API Endpoint is then configured with the following API resources:
      1. HTTP integration of /service1 to route traffic to the ELB endpoint of microservice hosted on an ECS cluster
      2. HTTP integration of /service2 to route traffic to the ELB endpoint of web application hosted on an EC2 cluster
    5. API Gateway allows you to capture the FQDN of the URL and map it to custom headers or query string parameters, which are then sent to the backend service integrated with the corresponding API resource and HTTP method. For example, we can create a custom header called “Customer” to forward customer1 or customer2 to the backend application for customer-specific business logic, as sketched below. This is done using the Method Request parameters and Integration Request configuration within API Gateway.
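A hedged sketch of that header mapping follows. The resource ID, backend URI, and header name are placeholders; the mapping from the context.domainName variable is one way to capture the FQDN the caller used, and an HTTP (non-proxy) integration is assumed, so the matching method and integration responses (omitted here for brevity) would also need to be configured.

import boto3

apigw = boto3.client("apigateway")

# Placeholders for the API, the /service1 resource, and the backend ELB endpoint.
REST_API_ID = "abc123defg"
RESOURCE_ID = "res456"
BACKEND_URI = "http://service1-elb.example.com/service1"

# Map the FQDN the caller used (context.domainName) onto a custom "Customer" header
# that the backend microservice can inspect for per-customer business logic.
apigw.put_integration(
    restApiId=REST_API_ID,
    resourceId=RESOURCE_ID,
    httpMethod="ANY",
    type="HTTP",
    integrationHttpMethod="ANY",
    uri=BACKEND_URI,
    requestParameters={
        "integration.request.header.Customer": "context.domainName",
    },
)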

Summary

As you can see, this is one approach to using API Gateway as a single entry point for API-based microservices and web application assets. It allows you to use infrastructure more cost effectively without losing the advantages of scaling when demand for your applications grows. You can read more about working with API Gateway and Route 53 DNS in the AWS documentation and use these capabilities to create architectures that suit your specific requirements.

Automated Disaster Recovery using CloudEndure

Post Syndicated from Ryan Jaeger original https://aws.amazon.com/blogs/architecture/automated-disaster-recovery-using-cloudendure/

There are any number of events that can cause IT outages and impact business continuity. These include unexpected infrastructure or application outages caused by flooding, earthquakes, fires, hardware failures, or even malicious attacks. Cloud computing opens a new door to support disaster recovery strategies, with benefits such as elasticity, agility, speed to innovate, and cost savings—all of which aid new disaster recovery solutions.

With AWS, organizations can acquire IT resources on demand and pay only for the resources they use. Automating disaster recovery (DR) has always been challenging. This blog post shows how you can use automation to orchestrate recovery and eliminate manual processes. CloudEndure Disaster Recovery, an AWS company, Amazon Route 53, and AWS Lambda are the building blocks to deliver a cost-effective automated DR solution. The example in this post demonstrates how you can recover a production web application with sub-second Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs) in minutes.

As part of a DR strategy, knowing RPOs and RTOs will determine what kind of solution architecture you need. The RPO represents the point in time of the last recoverable data point (for example, the “last backup”). Any disaster after that point would result in data loss.

The time from the outage to restoration is the RTO. Minimizing RTO and RPO is a cost tradeoff. Restoring from backups and recreating infrastructure after the event is the lowest cost but highest RTO. Conversely, the highest cost and lowest RTO is a solution running a duplicate auto-failover environment.

Solution Overview

CloudEndure is an automated IT resilience solution that lets you recover your environment from unexpected infrastructure or application outages, data corruption, ransomware, or other malicious attacks. It utilizes block-level Continuous Data Protection (CDP), which ensures that target machines are spun up in their most current state during a disaster or drill, so that you can achieve sub-second RPOs. In the event of a disaster, CloudEndure triggers a highly automated machine conversion process and a scalable orchestration engine that can spin up machines in the target AWS Region within minutes, enabling you to achieve RTOs in minutes. The CloudEndure solution uses a software agent that installs on physical or virtual servers. It connects to a self-service, web-based user console, which then issues an API call to the selected target AWS Region to create a Staging Area in the customer’s AWS account designated to receive the source machine’s replicated data.

Architecture

In the above example, a webserver and database server have the CloudEndure Agent installed, and the disk volumes on each server are replicated to a staging environment in the customer’s AWS account. The CloudEndure Replication Server receives the encrypted data replication traffic and writes it to the appropriate corresponding EBS volumes. It’s also possible to configure data replication traffic to use VPN or AWS Direct Connect.

With this current setup, if an infrastructure or application outage occurs, a failover to AWS is executed by manually starting the process from the CloudEndure Console. When this happens, CloudEndure creates EC2 instances from the synchronized target EBS volumes. After the failover completes, additional manual steps are needed to change the website’s DNS entry to point to the IP address of the failed over webserver.

Could the CloudEndure failover and DNS update be automated? Yes.

Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service with three main functions: domain registration, DNS routing, and health checking. A configured Route 53 health check monitors the endpoint of a webserver. If the health check fails over a specified period, an alarm is raised to execute an AWS Lambda function that starts the CloudEndure failover process. In addition to health checks, Route 53 DNS Failover allows the DNS record for the webserver to be automatically updated based on a healthy endpoint. The previously manual process of updating the DNS record to point to the restored web server is now automated. You can also build Route 53 DNS Failover configurations that support decision trees to handle complex configurations.

To illustrate this, the following builds on the example by having a primary, secondary, and tertiary DNS Failover choice for the web application:

How Health Checks Work in Complex Amazon Route 53 Configurations

When the CloudEndure failover action executes, it takes several minutes until the target EC2 instance is launched and configured by CloudEndure. A static web page hosted on S3 can be returned to end users to improve communication while the failover is happening.

To support this example, the Amazon Route 53 DNS failover decision tree can be configured with a primary, secondary, and tertiary failover (a record set sketch follows the list). The decision tree logic for the scenario is the following:

  1. If the primary health check passes, return the primary webserver.
  2. Else, if the secondary health check passes, return the failover webserver.
  3. Else, return the S3 static site.
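The sketch below shows, with boto3, how the primary and secondary failover records for such a decision tree could be defined. The hosted zone ID, record values, and health check IDs are placeholders, and the tertiary S3 fallback (typically an alias record pointing at the bucket’s website endpoint) is omitted for brevity.

import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z1EXAMPLE"                  # placeholder
PRIMARY_HEALTH_CHECK = "11111111-aaaa-bbbb"   # placeholder health check IDs
SECONDARY_HEALTH_CHECK = "22222222-cccc-dddd"

def failover_record(identifier, role, ip_address, health_check_id):
    # One failover record set: Route 53 serves it only while its health check passes.
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "www.example.com",
            "Type": "A",
            "SetIdentifier": identifier,
            "Failover": role,  # PRIMARY or SECONDARY
            "TTL": 60,
            "ResourceRecords": [{"Value": ip_address}],
            "HealthCheckId": health_check_id,
        },
    }

route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Changes": [
            failover_record("primary-web", "PRIMARY", "203.0.113.10", PRIMARY_HEALTH_CHECK),
            failover_record("failover-web", "SECONDARY", "203.0.113.20", SECONDARY_HEALTH_CHECK),
        ]
    },
)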

When the Route 53 health check monitoring the primary endpoint for the webserver fails, a CloudWatch alarm is configured to enter the ALARM state after a set time. This CloudWatch alarm then executes a Lambda function that calls the CloudEndure API to begin the failover.
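A hedged outline of that Lambda function is below. The CloudEndure API paths, project ID, and request payload shown here are assumptions for illustration only; the real API requires authenticating with an API token and identifying the machines to launch, so treat this as a sketch of the automation rather than a working client.

import json
import os
import urllib.request

# All of these values are hypothetical placeholders.
CLOUDENDURE_API = "https://console.cloudendure.com/api/latest"
PROJECT_ID = os.environ["CLOUDENDURE_PROJECT_ID"]
SESSION_COOKIE = os.environ["CLOUDENDURE_SESSION"]  # obtained by a prior login call

def handler(event, context):
    # Triggered by the CloudWatch alarm on the failed Route 53 health check.
    payload = json.dumps({
        "launchType": "RECOVERY",                           # assumed field name
        "items": [{"machineId": os.environ["MACHINE_ID"]}], # assumed field name
    }).encode("utf-8")

    request = urllib.request.Request(
        f"{CLOUDENDURE_API}/projects/{PROJECT_ID}/launchMachines",  # assumed path
        data=payload,
        headers={"Content-Type": "application/json", "Cookie": SESSION_COOKIE},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return {"statusCode": response.status}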

In the screenshot below, both health checks are reporting “Unhealthy” while the primary health check is in a state of ALARM. At that point, the DNS failover logic should be returning the path to the static S3 site, and the Lambda function should have executed to start the CloudEndure failover.

The following architecture illustrates the completed scenario:

Conclusion

Having a disaster recovery strategy is critical for business continuity. The benefits of AWS combined with CloudEndure Disaster Recovery create a non-disruptive DR solution that provides minimal RTO and RPO while reducing total cost of ownership for customers. CloudWatch alarms combined with AWS Lambda for serverless computing are building blocks for a variety of automation scenarios.

 

 

New Zealand Internet Connectivity to AWS

Post Syndicated from Cameron Tod original https://aws.amazon.com/blogs/architecture/new-zealand-internet-connectivity-to-aws/

Amazon Web Services (AWS) serves more than a million private and public sector organizations all over the world from its extensive and expanding global infrastructure.

Like organizations in other countries, organizations all around New Zealand are using AWS to change the way they operate. For example, Xero, a Wellington-based online accountancy software vendor, now serves customers in more than 100 countries, while the Department of Conservation provides its end users with virtual desktops running in Amazon WorkSpaces.

New Zealand doesn’t currently have a dedicated AWS Region. Geographically, the closest is Asia Pacific (Sydney), which is 2,000 kilometers (km) away, across a deep sea. While customers rely on AWS for business-critical workloads, they are well-served by New Zealand’s international connectivity.

To connect to Amazon’s network, our New Zealand customers have a range of options:

  • Public internet endpoints
  • Managed or software Virtual Private Networks (VPN)
  • AWS Direct Connect (DX).

All rely on the extensive internet infrastructure connecting New Zealand to the world.

International Connectivity

The vast majority of internet traffic is carried over physical cables, while the percentage of traffic moving over satellite or wireless links is small by comparison.

Historically, cables were funded and managed by consortia of telecommunication providers. More recently, large infrastructure and service providers like AWS have contributed to or are building their own cable networks.

There are currently about 400 submarine cables in service globally. Modern submarine cables are fiber-optic, run for thousands of kilometers, and are protected by steel strands, plastic sheathing, copper, and a chemical water barrier. Over that distance, the signal can weaken—or attenuate—so signal repeaters are installed approximately every 50km to mitigate attenuation. Repeaters are powered by a charge running over the copper sheathing in the cable.

An example of submarine cable composition. Source: Wikimedia Commons

For most of their run, these cables are about as thick as a standard garden hose. They are thicker, however, closer to shore and in areas where there’s a greater risk of damage by fishing nets, boat anchors, etc.

Cables can—and do—break, but redundancy is built into the network. According to Telegeography, there are 100 submarine cable faults globally every year. However, most faults don’t impact users meaningfully.

New Zealand is served by four main cables:

  1. Hawaiki: Sydney -> Mangawhai (Northland, NZ) -> Kapolei (Hawaii, USA) -> Hillsboro, Oregon (USA) – 44 Terabits per second (Tbps)
  2. Tasman Global Access: Raglan (New Zealand) -> Narrabeen (NSW, Australia) – 20 Tbps
  3. Southern Cross A: Whenuapai (Auckland, New Zealand) -> Alexandria (NSW, Australia) – 1.2 Tbps
  4. Southern Cross B: Takapuna (Auckland, New Zealand) -> Spencer Beach (Hawaii, USA) – 1.2 Tbps
A map of major submarine cables connecting to New Zealand. Source: submarinecablemap.com

The four cables combined currently deliver 66 Tbps of available capacity. The Southern Cross NEXT cable is due to come online in 2020, which will add another 72 Tbps. These are, of course, potential capacities; it’s likely the “lit” capacity—the proportion of the cables’ overall capacity that is actually in use—is much lower.

Connecting to AWS from New Zealand

While understanding the physical infrastructure is useful background, in practice customers don’t deal with these details directly. Connectivity options are evaluated on the basis of AWS and partner offerings.

Customers connect to AWS in three main ways: over public endpoints, via site-to-site VPNs, and via Direct Connect (DX), all typically provided by partners.

Public Internet Endpoints

Customers can connect to public endpoints for AWS services over the public internet. Some services, like Amazon CloudFront, Amazon API Gateway, and Amazon WorkSpaces are generally used in this way.

Network-level access can be controlled via various means depending on the service, whether that is Endpoint Policies for API Gateway, Security Groups, and Network Access Control Lists for Amazon Virtual Private Cloud (VPC), or Resource Policies for services such as Amazon S3, Amazon Simple Queue Service (SQS), or Amazon Key Management Service (KMS).

All services offer TLS or IPsec connectivity for secure encryption-in-motion.

Site-to-Site Virtual Private Network

Many organizations use a VPN to connect to AWS. It’s the simplest and lowest cost entry point to expose resources deployed in private ranges in an Amazon VPC. Amazon VPC allows customers to provision a logically isolated network segment, with fine-grained control of IP ranges, filtering rules, and routing.

AWS offers a managed site-to-site VPN service, which creates secure, redundant Internet Protocol Security (IPSec) VPNs, and also handles maintenance and high-availability while integrating with Amazon CloudWatch for robust monitoring.

If using an AWS managed VPN, the AWS endpoints have publicly routable IPs. They can be connected to over the public internet or via a Public Virtual Interface over DX (outlined below).

Customers can also deploy VPN appliances onto Amazon Elastic Compute Cloud (EC2) instances running in their VPC. These may be self-managed or provided by Amazon Marketplace sellers.

AWS also offers AWS Client VPN, for direct user access to AWS resources.

AWS Direct Connect

While connectivity over the internet is secure and flexible, it has one major disadvantage: it’s unpredictable. By design, traffic traversing the internet can take any path to reach its destination. Most of the time it works but occasionally routing conditions may reduce capacity or increase latency.

DX connections are either 1 or 10 Gigabits per second (Gbps). This capacity is dedicated to the customer; it isn’t shared, as other network users are never routed over the connection. This means customers can rely on consistent latency and bandwidth. The DX per-Gigabit transfer cost is lower than other egress mechanisms. For customers transferring large volumes of data, DX may be more cost effective than other means of connectivity.

Customers may publish their own IEEE 802.1Q Virtual Local Area Network (VLAN) tags across the DX connection and advertise routes via Border Gateway Protocol (BGP). A dedicated connection supports up to 50 private or public virtual interfaces. New Zealand does not have a physical point of presence for DX—users must procure connectivity to our Sydney Region. Many AWS Partner Network (APN) members in New Zealand offer this connectivity.

For customers who don’t want or need to manage VLANs to AWS—or prefer 1 Gbps or smaller links—APN partners offer hosted connections or hosted virtual interfaces. For more detail, please review our AWS Direct Connect Partners page.

Performance

There are physical limits to latency dictated by the speed of light, and the medium through which optical signals travel. Southern Cross publishes latency statistics, and it sees one-way latency of approximately 11 milliseconds (ms) over the 2,276km Alexandria to Whenuapai link. Double that for a round-trip to 22 ms.
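As a back-of-the-envelope check on those figures, light in optical fiber travels at roughly two-thirds of its speed in a vacuum, so the propagation delay over the 2,276 km link can be estimated as follows (a simplification that ignores repeaters, routing hops, and equipment latency).

# Rough propagation-delay estimate for the Alexandria-to-Whenuapai link.
SPEED_OF_LIGHT_KM_S = 300_000   # in a vacuum
FIBER_FACTOR = 2 / 3            # approximate slowdown in optical fiber
DISTANCE_KM = 2_276

one_way_ms = DISTANCE_KM / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR) * 1000
round_trip_ms = 2 * one_way_ms

print(f"one-way ~ {one_way_ms:.1f} ms, round-trip ~ {round_trip_ms:.1f} ms")
# one-way ~ 11.4 ms, round-trip ~ 22.8 ms -- in line with the published ~11 ms / ~22 ms figures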

In practice, we see customers achieving round-trip times from user workstations to Sydney in approximately 30-50 ms, assuming fair-weather internet conditions or DX links. Latency in Auckland (the largest city) tends to be on the lower end of that spectrum, while the rest of the country tends towards the higher end.

Bandwidth constraints are more often dictated by client hardware, but AWS and our partners offer up to 10 Gbps links, or smaller as required. For customers that require more than 10 Gbps over a single link, AWS supports Link Aggregation Groups (LAG).

As outlined above, there are a range of ways for customers to adopt AWS via secure, reliable, and performant networks. To discuss your use case, please contact an AWS Solutions Architect.

 

New Issue of Architecture Monthly: Games

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/new-issue-of-architecture-monthly-games/

This month’s Architecture Monthly magazine is all about games—not Scrabble, not Uno, not Twister, and certainly not hide-and-seek.

No, we’re talking the big business of online, multiplayer games. And did you know that approximately 90% of large, public game companies are running on the AWS cloud? Yep, I’m talking Epic (ever heard of Fortnite?), Ubisoft, Nintendo, and more. I had the opportunity to sit down with a senior tech leader for AWS Games, who talked about why companies are moving to the cloud from on-premises, and it’s about a whole lot more than just games for entertainment. We got into the big-money world of competitive eSports as well as the gamification of learning processes and economics.

Consider Twitch, often defined as Amazon’s live streaming platform for gamers. But Twitch is much more than a gaming platform. For example, AWS Live Video on Twitch offers live streaming about everything from how to develop serverless apps and robots to interactive quiz shows that help you prepare for AWS Certification exams. And of course, you can also learn about the technology that powers your favorite video games.

September’s Issue

For September’s issue, we’ve assembled architectural best practices about games from all over AWS, and we’ve made sure that a broad audience can appreciate it.

How to Access the Magazine

We hope you’re enjoying Architecture Monthly, and we’d like to hear from you—leave us a star rating and comment on the Amazon Kindle page or contact us anytime at [email protected].

One to Many: Evolving VPC Design

Post Syndicated from Androski Spicer original https://aws.amazon.com/blogs/architecture/one-to-many-evolving-vpc-design/

Since its inception, the Amazon Virtual Private Cloud (VPC) has acted as the embodiment of security and privacy for customers who are looking to run their applications in a controlled, private, secure, and isolated environment.

This logically isolated space has evolved, and in its evolution has increased the avenues that customers can take to create and manage multi-tenant environments with multiple integration points for access to resources on-premises.

This blog is a two-part series that begins with a look at the Amazon VPC as a single unit of networking in the AWS Cloud but eventually takes you to a world in which simplified architectures for establishing a global network of VPCs are possible.

From One VPC: Single Unit of Networking

To be successful with Amazon Virtual Private Cloud, you first have to define success for today and what success might look like as your organization’s adoption of the AWS Cloud increases and matures. In essence, your VPCs should be designed to satisfy the needs of your applications today and must be scalable to accommodate future needs.

Classless Inter-Domain Routing (CIDR) notation is used to denote the size of your VPC. AWS allows you to specify a CIDR block between /16 and /28. The largest, /16, provides you with 65,536 IP addresses, and the smallest allowed CIDR block, /28, provides you with 16 IP addresses. Note that the first four IP addresses and the last IP address in each subnet CIDR block are not available for you to use and cannot be assigned to an instance.

Amazon VPC supports both IPv4 and IPv6. You must specify an IPv4 CIDR range when creating a VPC; specifying an IPv6 range is optional.

Customers can specify ANY IPv4 address space for their VPC. This includes but is not limited to RFC 1918 addresses.

After creating your VPC, you divide it into subnets. In an AWS VPC, subnets are not isolation boundaries around your application. Rather, they are containers for routing policies.

Isolation is achieved by attaching an AWS Security Group (SG) to the EC2 instances that host your application. SGs are stateful firewalls, meaning that connections are tracked to ensure return traffic is allowed. They control inbound and outbound access to the elastic network interfaces that are attached to an EC2 instance. These should be tightly configured, only allowing access as needed.
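For illustration, a tightly scoped security group like the one described might be created as in the following boto3 sketch. The VPC ID, port, and source CIDR are placeholders; in practice you would often reference another security group as the source rather than a CIDR block.

import boto3

ec2 = boto3.client("ec2")

# Placeholder for this sketch.
VPC_ID = "vpc-0123456789abcdef0"

sg = ec2.create_security_group(
    GroupName="app-tier-sg",
    Description="Allow HTTPS to the application tier only",
    VpcId=VPC_ID,
)

# Allow inbound HTTPS from the load balancer subnet only; everything else stays closed.
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "10.0.0.0/24", "Description": "load balancer subnet"}],
    }],
)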

It is our best practice to create subnets in categories. There are two main categories: public subnets and private subnets. At a minimum, they should be designed as outlined in the diagrams below for IPv4 and IPv6 subnet design.

Recommended IPv4 subnet design pattern

Recommended IPv6 subnet design pattern

Subnet types are distinguished by whether applications and users on the internet can directly initiate access to the infrastructure within a subnet.

Public Subnets

Public subnets are attached to a route table that has a default route to the Internet via an Internet gateway.

Resources in a public subnet can have a public IP or Elastic IP (EIP) that is NATed to the Elastic Network Interface (ENI) of the virtual machines or containers that host your application(s). This is a one-to-one NAT performed by the Internet gateway.

Illustration of public subnet access path to the Internet through the Internet Gateway (IGW)

Private Subnets

A private subnet contains infrastructure that isn’t directly accessible from the Internet. Unlike the public subnet, this infrastructure only has private IPs.

Infrastructure in a private subnet gains access to resources or users on the Internet through some form of NAT infrastructure.

AWS natively provides NAT capability through the use of the NAT Gateway service. Customers can also create NAT instances that they manage or leverage third-party NAT appliances from the AWS Marketplace.

In most scenarios, it is recommended to use the AWS NAT Gateway as it is highly available (in a single Availability Zone) and is provided as a managed service by AWS. It supports 5 Gbps of bandwidth per NAT gateway and automatically scales up to 45 Gbps.

An AWS NAT gateway’s high availability is confined to a single Availability Zone. For high availability across AZs, it is recommended to have a minimum of two NAT gateways (in different AZs). This allows you to switch to an available NAT gateway in the event that one should become unavailable.

This approach allows you to zone your Internet traffic, reducing cross Availability Zone connections to the Internet. More details on NAT gateway are available here.

Illustration of an environment with a single NAT Gateway (NAT-GW)

Illustration of high availability with a multiple NAT Gateways (NAT-GW) attached to their own route table

Illustration of the failure of one NAT Gateway and the fail over to an available NAT Gateway by the manual changing of the default route next hop in private subnet A route table
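The manual route change shown in the last illustration can also be scripted; a minimal boto3 sketch is below. The route table and NAT gateway IDs are placeholders, and in production this would usually be wrapped in a Lambda function triggered by monitoring rather than run by hand.

import boto3

ec2 = boto3.client("ec2")

# Placeholders: the private subnet A route table and the surviving NAT gateway in another AZ.
ROUTE_TABLE_ID = "rtb-0abc123456789def0"
HEALTHY_NAT_GATEWAY_ID = "nat-0123456789abcdef0"

# Point the default route of private subnet A at the NAT gateway that is still available.
ec2.replace_route(
    RouteTableId=ROUTE_TABLE_ID,
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=HEALTHY_NAT_GATEWAY_ID,
)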

AWS-allocated IPv6 addresses are Global Unicast Addresses by default. That said, you can privatize these subnets by using an Egress-Only Internet Gateway (E-IGW) instead of a regular Internet gateway. E-IGWs are purpose-built to prevent users and applications on the Internet from initiating access to infrastructure in your IPv6 subnet(s).

Illustration of internet access for hybrid IPv6 subnets through an Egress-Only Internet Gateway (E-IGW)

Applications hosted on instances living within a private subnet can have different access needs. Some require access to the Internet, while others require access to databases, applications, and users that are on-premises. For this type of access, AWS provides two avenues: the Virtual Gateway and the Transit Gateway. The Virtual Gateway can only support a single VPC at a time, while the Transit Gateway is built to simplify the interconnectivity of tens to hundreds of VPCs and then aggregate their connectivity to resources on-premises. Given that we are looking at the VPC as a single unit of networking, all diagrams below contain illustrations of the Virtual Gateway, which acts as a WAN concentrator for your VPC.

Illustration of private subnets connecting to data center via a Virtual Gateway (VGW)

 

Illustration of private subnets connecting to Data Center via a VGW

 

Illustration of private subnets connecting to Data Center using AWS Direct Connect as primary and IPsec as backup

The above diagram illustrates a WAN connection between a VGW attached to a VPC and a customer’s data center.

AWS provides two options for establishing a private connectivity between your VPC and on-premises network: AWS Direct Connect and AWS Site-to-Site VPN.

The AWS Site-to-Site VPN configuration leverages IPsec, with each connection providing two redundant IPsec tunnels. AWS supports both static routing and dynamic routing (through the use of BGP).

BGP is recommended, as it allows dynamic route advertisement, high availability through failure detection, and fail over between tunnels in addition to decreased management complexity.

VPC Endpoints: Gateway & Interface Endpoints

Applications running inside your subnet(s) may need to connect to AWS public services (like Amazon S3, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Amazon API Gateway, etc.) or to applications in another VPC that lives in another account. For example, you may have a database in another account that you would like to expose to applications that live in a completely different account and subnet.

For these scenarios you have the option to leverage an Amazon VPC Endpoint.

There are two types of VPC Endpoints: Gateway Endpoints and Interface Endpoints.

Gateway Endpoints only support Amazon S3 and Amazon DynamoDB. Upon creation, a gateway is added to your specified route table(s) and acts as the destination for all requests to the service it is created for.

Interface Endpoints differ significantly and can only be created for services that are powered by AWS PrivateLink.

Upon creation, AWS creates an interface endpoint consisting of one or more Elastic Network Interfaces (ENIs). Each AZ can support one interface endpoint ENI. This acts as a point of entry for all traffic destined to a specific PrivateLink service.

When an interface endpoint is created, associated DNS entries are created that point to the endpoint and each ENI that the endpoint contains. To access the PrivateLink service you must send your request to one of these hostnames.

As illustrated below, ensure the Private DNS feature is enabled for AWS public and Marketplace services:

Since interface endpoints leverage ENIs, customers can use cloud techniques they are already familiar with. The interface endpoint can be configured with a restrictive security group. These endpoints can also be easily accessed from both inside and outside the VPC. Access from outside a VPC can be accomplished through Direct Connect and VPN.
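As an illustration, an interface endpoint with private DNS and a restrictive security group could be created as follows. The service name shown is Amazon SQS in us-east-1; the VPC, subnet, and security group IDs are placeholders.

import boto3

ec2 = boto3.client("ec2")

endpoint = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",                   # placeholder
    ServiceName="com.amazonaws.us-east-1.sqs",       # the PrivateLink service to reach
    SubnetIds=["subnet-0aaa1111", "subnet-0bbb2222"],  # one ENI per Availability Zone
    SecurityGroupIds=["sg-0ccc3333"],                # restrict who can reach the endpoint
    PrivateDnsEnabled=True,                          # resolve the service's public DNS name privately
)

# DNS names that resolve to the endpoint's ENIs.
print(endpoint["VpcEndpoint"]["DnsEntries"])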

Illustration of a solution that leverages an interface and gateway endpoint

Customers can also create AWS Endpoint services for their applications or services running on-premises. This allows access to these services via an interface endpoint which can be extended to other VPCs (even if the VPCs themselves do not have Direct Connect configured).

VPC Sharing

At re:Invent 2018, AWS launched the feature VPC sharing, which helps customers control VPC sprawl by decoupling the boundary of an AWS account from the underlying VPC network that supports its infrastructure.

VPC sharing uses AWS Resource Access Manager (RAM) to share subnets across accounts within the same AWS organization.

VPC sharing allows customers to centralize the management of the network, its IP space, and the access paths to resources external to the VPC. This centralization and reuse (of VPC components such as NAT gateways and Direct Connect connections) reduces the cost of managing and maintaining the environment.

Great, but there are times when a customer needs to build networks with multiple VPCs in and across AWS regions. How should this be done and what are the best practices?

This will be answered in part two of this blog.

 

 

Building a Serverless FHIR Interface on AWS

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/building-a-serverless-fhir-interface-on-aws/

This post is courtesy of Mithun Mallick, Senior Solutions Architect (Messaging), and Navneet Srivastava, Senior Solutions Architect.

Technology is revolutionizing the healthcare industry but it can be a challenge for healthcare providers to take full advantage because of software systems that don’t easily communicate with each other. A single patient visit involves multiple systems such as practice management, electronic health records, and billing. When these systems can’t operate together, it’s harder to leverage them to improve patient care.

To help make it easier to exchange data between these systems, Health Level Seven International (HL7) developed Fast Healthcare Interoperability Resources (FHIR), an interoperability standard for the electronic exchange of healthcare information. In this post, I will show you the AWS services you can use to build a serverless FHIR interface in the cloud.

In FHIR, resources are your basic building blocks. A resource is an exchangeable piece of content that has a common way to define and represent it, a set of common metadata, and a human readable part. Each resource type has the same set of operations, called interactions, that you use to manage the resources in a granular fashion. For more information, see the FHIR overview.

FHIR Serverless Architecture

My FHIR architecture features a server with its own data repository and a simple consumer application that displays Patient and Observation data. To make it easier to build, my server only supports the JSON content type over HTTPS, and it only supports the Bundle, Patient, and Observation FHIR resource types. In a production environment, your server should support all resource types.

For this architecture, the server supports the following interactions:

  • Posting bundles as collections of Patients and Observations
  • Searching Patients and Observations
  • Updating and reading Patients
  • Creating a CapabilityStatement

You can expand this architecture to support all FHIR resource types, interactions, and data formats.

The following diagram shows how the described services work together to create a serverless FHIR messaging interface.

 

Services work together to create a serverless FHIR messaging interface.

 

Amazon API Gateway

In Amazon API Gateway, you create the REST API that acts as a “front door” for the consumer application to access the data and business logic of this architecture. I used API Gateway to host the API endpoints. I created the resource definitions and API methods in the API Gateway.

For this architecture, the FHIR resources map to the resource definitions in API Gateway. The Bundle FHIR resource type maps to the Bundle API Gateway resource. The observation FHIR resource type maps to the observation API Gateway resource. And, the Patient FHIR resource type maps to the Patient API Gateway resource.

To keep the API definitions simple, I used the ANY method. The ANY method handles the various URL mappings in the AWS Lambda code, and uses Lambda proxy integration to send requests to the Lambda function.

You can use the ANY method to handle HTTP methods, such as:

  • POST to represent the interaction to create a Patient resource type
  • GET to read a Patient instance based on a patient ID, or to search based on predefined parameters

We chose Amazon DynamoDB because it provides the input data representation and query patterns necessary for a FHIR data repository. For this architecture, each resource type is stored in its own Amazon DynamoDB table. Metadata for resources stored in the repository is also stored in its own table.

We set up global secondary indexes on the patient and observations tables in order to perform searches and retrieve observations for a patient. In this architecture, the patient id is stored as a patient reference id in the observation table. The patientRefid-index allows you to retrieve observations based on the patient id without performing a full scan of the table.
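To make that query pattern concrete, retrieving all observations for a patient through the global secondary index might look like the following boto3 sketch. The table name and the exact key attribute name are assumptions based on the description above.

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
observations = dynamodb.Table("observation")  # assumed table name

def observations_for_patient(patient_id):
    # Query the GSI instead of scanning the whole table.
    response = observations.query(
        IndexName="patientRefid-index",
        KeyConditionExpression=Key("patientRefid").eq(patient_id),  # assumed key attribute name
    )
    return response["Items"]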

We chose Amazon S3 to store archived FHIR messages because of its low cost and high durability.

Processing FHIR Messages

Each Amazon API Gateway request in this architecture is backed by an AWS Lambda function containing the Jersey RESTful web services framework, the AWS serverless Java container framework, and the HAPI FHIR library.

The AWS serverless Java framework provides a base implementation for the handleRequest method in LambdaHandler class. It uses the serverless Java container initialized in the global scope to proxy requests to our jersey application.

The handler method calls a proxy class and passes the stream classes along with the context.

This source code from the LambdaHandler class shows the handleRequest method:

// Main entry point of the Lambda function, uses the serverless-java-container initialized in the global scope
// to proxy requests to our jersey application
public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context)
        throws IOException {

    handler.proxyStream(inputStream, outputStream, context);

    // just in case it wasn't closed by the mapper
    outputStream.close();
}

The resource implementations classes are in the com.amazonaws.lab.resources package. This package defines the URL mappings necessary for routing the REST API calls.

The following method from the PatientResource class implements the GET patient interaction based on a patient id. The annotations describe the HTTP method called, as well as the path that is used to make the call. This method is invoked when a request is sent with the URL pattern: Patient/{id}. It retrieves the Patient resource type based on the id sent as part of the URL.

	@GET
	@Path("/{id}")
public Response gETPatientid(@Context SecurityContext securityContext,
			@ApiParam(value = "", required = true) @PathParam("id") String id, @HeaderParam("Accept") String accepted) {
…
}

Deploying the FHIR Interface

To deploy the resources for this architecture, we used an AWS Serverless Application Model (SAM) template. During deployment, SAM templates are expanded and transformed into AWS CloudFormation syntax. The template launches and configures all the services that make up the architecture.

Building the Consumer Application

For our architecture, we wrote a simple Node.js client application that calls the APIs on the FHIR server to get a list of patients and related observations. You can build more advanced applications for this architecture. For example, you could build a patient-focused application that displays vitals and immunization charts. Or, you could build a backend/mid-tier application that consumes a large number of messages and transforms them for downstream analytics.

This is the code we used to get the token from Amazon Cognito:

token = authcognito.token();

//Setting url to call FHIR server

     var options = {
       url: "https://<FHIR SERVER>",
       host: "FHIR SERVER",
       path: "Prod/Patient",
       method: "GET",
       headers: {
         "Content-Type": "application/json",
         "Authorization": token
         }
       }

This is the code we used to call the FHIR server:

request(options, function(err, response, body) {
  if (err) {
    console.log("In error");
    console.log(err);
  } else {
    let patientlist = JSON.parse(body);

    console.log(patientlist);
    res.json(patientlist["entry"]);
  }
});
 

We used AWS CloudTrail and AWS X-Ray for logging and debugging.

The screenshots below display the results:

Conclusion

In this post, we demonstrated how to build a serverless FHIR architecture. We used Amazon API Gateway and AWS Lambda to ingest and process FHIR resources, and Amazon DynamoDB and Amazon S3 to provide a repository for the resources. Amazon Cognito provides secure access to the API Gateway. We also showed you how to build a simple consumer application that displays patient and observation data. You can modify this architecture for your individual use case.

About the authors

Mithun Mallick is a Sr. Solutions Architect responsible for helping customers in the HCLS industry build secure, scalable, and cost-effective solutions on AWS. Mithun helps develop and implement strategic plans to engage customers and partners in the industry and works with the community of technically focused HCLS specialists within AWS. He has hands-on experience with messaging standards like X12, HL7, and FHIR. Mithun has an M.B.A. from CSU (Ft. Collins, CO) and a bachelor’s degree in Computer Engineering. He holds several associate and professional certifications for architecting on AWS.

 

 

Navneet Srivastava, a Sr. Solutions Architect, is responsible for helping provider organizations and healthcare companies deploy electronic medical records, devices, and AI/ML-based applications while educating customers about how to build secure, scalable, and cost-effective AWS solutions. He develops strategic plans to engage customers and partners, and works with a community of technically focused HCLS specialists within AWS. He is skilled in AI, ML, big data, and healthcare-related technologies. Navneet has an M.B.A. from NYIT and a bachelor’s degree in Software Engineering, and holds several associate and professional certifications for architecting on AWS.

Top Resources for API Architects and Developers

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/top-resources-for-api-architects-and-developers/

We hope you’ve enjoyed reading our series on API architecture and development. We wrote about best practices for REST APIs with Amazon API Gateway  and GraphQL APIs with AWS AppSync. This post will cover the top resources that all API developers should be aware of.

Tech Talks, Webinars, and Twitch Live Stream

The technical staff at AWS have produced a variety of digital media that cover new service launches, best practices, and customer questions. Be sure to review these videos for tips and tricks on building APIs:

  • Happy Little APIs: This is a multi-part series produced by our awesome Developer Advocate, Eric Johnson. He leads a series of talks that demonstrate how to build a real-world API.
  • API Gateway’s WebSocket webinar: API Gateway now supports real-time APIs with WebSockets. This webinar covers how to use this feature and why you should let API Gateway manage your real-time APIs.
  • Best practices for building enterprise-grade APIs: API Gateway reduces the time it takes to build and deploy REST APIs, but there are strategies that can make development, security, and management easier.
  • An Intro to AWS AppSync and GraphQL: AppSync helps you build sophisticated data applications with real-time and offline capabilities.

Gain Experience With Hands-On Workshops and Examples

One of the easiest ways to get started with Serverless REST API development is to use the Serverless Application Model (SAM). SAM lets you run APIs and Lambda functions locally on your machine for easy development and testing.

For example, you can configure API Gateway as an Event source for Lambda with just a few lines of code:

Type: Api
Properties:
  Path: /photos
  Method: post

There are many great examples on our GitHub page to help you get started with authorization (IAM and Amazon Cognito), request and response handling, various policies, and CORS configurations for API Gateway.

If you’re working with GraphQL, you should review the Amplify Framework. This is an official AWS project that helps you quickly build web applications with built-in authentication and backend APIs using REST or GraphQL. With just a few lines of code, you can have Amplify add all required configurations for your GraphQL API. You have two options to integrate your application with an AppSync API:

  1. Directly using the Amplify GraphQL Client
  2. Using the AWS AppSync SDK

An excellent walk through of the Amplify toolkit is available here, including an example showing how to create a single page web app using ReactJS powered by an AppSync GraphQL API.

Finally, if you are interested in a full hands on experience, take a look at:

  • The Amazon API Gateway WildRydes workshop. This workshop teaches you how to build a functional single page web app with a REST backend, powered by API Gateway.
  • The AWS AppSync GraphQL Photo Workshop. This workshop teaches you how to use Amplify to quickly build a Photo sharing web app, powered by AppSync.

Useful Documentation

The official AWS documentation is the source of truth for architects and developers. Get started with the API Gateway developer guide. API Gateway currently has two APIs (V1 and V2) for managing the service. Here is where you can view the SDK and CLI reference.

Get started with the AppSync developer guide, and review the AppSync management API.

Summary

As an API architect, your job is not only to design and implement the best API for your use case, but also to figure out which type of API is most cost-effective for your product. For example, an application with high request volume (“chatty”) may benefit from a GraphQL implementation instead of REST.

API Gateway currently charges $3.50 / million requests and provides a free tier of 1 Million requests per month. There is tiered pricing that will reduce your costs as request volume rises. AppSync currently charges $4.00 / million for Query and Mutation requests.

While AppSync pricing per request is slightly higher, keep in mind that the nature of GraphQL APIs typically results in significantly fewer overall requests.
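As a purely illustrative comparison using the list prices quoted above, and assuming (for the sake of the example) a 4-to-1 reduction in request count when chatty REST calls are consolidated into GraphQL queries, while ignoring the free tier and tiered discounts:

# Illustrative monthly cost comparison at the list prices quoted above.
REST_PRICE_PER_MILLION = 3.50      # API Gateway REST requests
GRAPHQL_PRICE_PER_MILLION = 4.00   # AppSync Query/Mutation requests

rest_requests_millions = 100       # assumed monthly volume for a chatty REST client
graphql_requests_millions = 25     # assumed 4:1 consolidation with GraphQL

rest_cost = rest_requests_millions * REST_PRICE_PER_MILLION            # $350
graphql_cost = graphql_requests_millions * GRAPHQL_PRICE_PER_MILLION   # $100

print(f"REST: ${rest_cost:.2f}  GraphQL: ${graphql_cost:.2f}")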

Finally, we encourage you to join us in the coming weeks — we will be starting a series of posts covering messaging best practices.

About the Author

George Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and a Master of IT from Virginia Tech.

Wag!: Why Even Your Dog Loves a Canary Deployment

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/wag-why-even-your-dog-loves-a-canary-deployment/

Since August 26 was National Dog Day, we thought it might be a great time to talk about why Wag!, an on-demand dog-walking, boarding, and pet-sitting service that’s available in 43 states and 100 cities, deployed blue-green (or canary) architecture for increased availability and reduced risk using Amazon ECS.

Last June, Dave Bullock, Director of Engineering at Wag Labs Inc., talked with AWS Senior Solutions Architect Peter Tilsen about how the company needed a faster solution for updates and rollbacks than what the previous solution, AWS OpsWorks, could provide. What used to take up to 10 minutes for a rolling deployment across all of their cluster instances is now done in a few minutes.

Listen to Dave as he shows us how Wag! runs canary deployments (a technique for releasing applications by shifting traffic between two identical environments running different versions of the application) of containerized applications with ECS.

Check out more videos in the This Is My Architecture series.

About the author

Annik Stahl is a Senior Program Manager in AWS, specializing in blog and magazine content as well as customer ratings and satisfaction. Having been the face of Microsoft Office for 10 years as the Crabby Office Lady columnist, she loves getting to know her customers and wants to hear from you.