Tag Archives: Architecture

Using VPC Sharing for a Cost-Effective Multi-Account Microservice Architecture

Post Syndicated from Anandprasanna Gaitonde original https://aws.amazon.com/blogs/architecture/using-vpc-sharing-for-a-cost-effective-multi-account-microservice-architecture/

Introduction

Many cloud-native organizations building modern applications have adopted a microservice architecture because of its flexibility, performance, and scalability. Even customers with legacy and monolithic application stacks are embarking on an application modernization journey and opting for this type of architecture. A microservice architecture allows applications to be composed of several loosely coupled, discrete services that are independently deployable, scalable, and maintainable. These applications can comprise a large number of microservices, which often span multiple business units within an organization. These customers typically have a multi-account AWS environment with each AWS account belonging to an individual business unit. Their microservice implementations reside in the Virtual Private Clouds (VPCs) of their respective AWS accounts. You can set up a multi-account AWS environment incorporating best practices using AWS Landing Zone or AWS Control Tower.

This type of multi-account, multi-VPC architecture provides a good boundary and isolation for individual microservices and achieves a highly available, scalable, and secure architecture. However, for microservices that require a high degree of interconnectivity and are within the same trust boundaries, you can use other AWS capabilities to optimize cost and network management complexity.

This blog presents a cost-effective approach that requires less VPC management while still using separate accounts for billing and access control. This approach does not sacrifice scalability, high availability, fault tolerance, or security. To achieve a similar microservice architecture, you can share a VPC across AWS accounts using AWS Resource Access Manager (AWS RAM) and Network Load Balancer (NLB) support in a shared Amazon Virtual Private Cloud (VPC). This allows multiple microservices to coexist in the same VPC, even though they are developed by different business units.

Microservices architecture in a multi-VPC approach

In this architecture, microservices deployed across multiple VPCs use privately exposed endpoints for better security posture instead of going over the internet. This requires the customers to enable inter-VPC communication using the various networking capabilities of AWS as shown below:

microservices deployed across multiple VPCs use privately exposed endpoints

In the above reference architecture, we created a VPC in Account A, which is hosting the front end of the application across a fleet of Amazon Elastic Compute Cloud (Amazon EC2) instances using an AWS Auto Scaling group. For simplicity, we’ve illustrated a single public and private subnet for the application front end. In reality, this spans across multiple subnets across multiple Availability Zones (AZ) to support a highly available and fault-tolerant configuration.

To ensure security, the application must communicate privately with microservices mS1 and mS2, deployed in the VPCs of Account B and Account C, respectively. For high availability, these microservices are also implemented using a fleet of Amazon EC2 instances, with the Auto Scaling group spanning multiple subnets and Availability Zones. For high-performance load balancing, they are fronted by a Network Load Balancer.

While this architecture shows an implementation using Amazon EC2, it can also use containerized services deployed using Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). These microservices may have interdependencies and invoke each other’s APIs to service the requests of the application layer. This application-to-mS and mS-to-mS communication can be achieved using one of several connectivity options.

When only a few VPC interconnections are required, Amazon VPC peering and AWS PrivateLink may be viable options. For a higher number of VPC interconnections, we recommend AWS Transit Gateway for better manageability of connections and routing through a centralized resource. However, depending on the amount of traffic, this can introduce significant costs to your architecture.

Alternative approach to microservice architecture using Network Load Balancers in a shared VPC

The above architecture pattern allows your individual microservice teams to continue to own their AWS resources that host their microservice implementation. But they can deploy them in a shared VPC owned by the central account, eliminating the need for inter-VPC network connections. You can share Amazon VPCs to use the implicit routing within a VPC for applications that require a high degree of interconnectivity and are within the same trust boundaries.

This architecture uses AWS RAM, which allows you to share the VPC Subnets from AWS Account A to participating AWS accounts within your AWS organization. When the subnets are shared, participant AWS accounts (Account B and Account C) can see the shared subnets in their own environment. They can then deploy their Amazon EC2 instances in those subnets. This is depicted in the diagram where the visibility of the shared subnets (SS1 and SS2) is extended to the participating accounts (Account B and Account C).

You can also deploy the NLB in these shared subnets. Then, each participant account owns all the AWS resources for their microservice stack, but it’s deployed in the VPC of Account A.

This allows your individual microservice teams to maintain control over load balancer configurations and Auto Scaling policies based on their specific microservices’ needs. At the same time, using AWS RAM, they are able to effectively use the existing VPC environment of Account A.

This architecture presents several benefits over the multi-VPC architecture discussed earlier:

  • You can deploy the entire application, including the individual microservices, into a single shared VPC, while still allowing individual microservice teams control over their AWS resources deployed in that VPC.
  • Since the entire architecture now resides in a single VPC, it doesn’t require other networking connectivity features. It can rely on intra-VPC traffic for communication between the application (API) layer and microservices.
  • This leads to a reduction in the cost of the architecture. While the AWS RAM functionality is free of charge, this also reduces the data transfer and per-connection costs incurred by other options such as VPC peering, AWS PrivateLink, and AWS Transit Gateway.
  • This maintains the isolation across the individual microservices and the application layer. Participants can’t view, modify, or delete resources that belong to others or the VPC owner.
  • This also leads to effective utilization of your VPC CIDR block resources.
  • Since multiple subnets belonging to different Availability Zones are shared, the application and individual mS continue to take advantage of scalability, availability, and fault tolerance.

The following illustration shows how you can configure AWS RAM to set up the VPC subnet resource shares between owner Account A and participating Account B. The example below shows the sharing of private subnet SS1 using this method:

Accounts A and B Resource Share
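
The resource share shown in the console can also be described in code. The following is a minimal sketch, not the configuration used in this post: it only assembles the arguments for the AWS RAM CreateResourceShare operation. The region, account IDs, subnet ID, and share name are hypothetical placeholders.

```python
def subnet_arn(region: str, owner_account_id: str, subnet_id: str) -> str:
    """Build the ARN that identifies a VPC subnet owned by the sharing account."""
    return f"arn:aws:ec2:{region}:{owner_account_id}:subnet/{subnet_id}"


def build_resource_share_request(name, region, owner_account_id,
                                 subnet_ids, participant_account_ids):
    """Assemble keyword arguments for the RAM CreateResourceShare operation."""
    return {
        "name": name,
        "resourceArns": [
            subnet_arn(region, owner_account_id, subnet_id)
            for subnet_id in subnet_ids
        ],
        "principals": list(participant_account_ids),
        # Restrict the share to principals inside the AWS organization.
        "allowExternalPrincipals": False,
    }


# All values below are hypothetical placeholders.
request = build_resource_share_request(
    name="shared-subnet-ss1",
    region="us-east-1",
    owner_account_id="111111111111",          # Account A (VPC owner)
    subnet_ids=["subnet-0abc1234"],           # private subnet SS1
    participant_account_ids=["222222222222"], # Account B
)
print(request)

# With boto3 installed and credentials for Account A, the call would be:
#   boto3.client("ram").create_resource_share(**request)
```

Keeping the request as plain data makes it easy to review which subnets are shared with which accounts before anything is created.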

Once this subnet is shared, the participating Account B can launch the Network Load Balancer for its microservice mS1 in the shared VPC subnet as shown below:

Account B can launch its Network Load Balancer of its microservice ms1 in the shared VPC subnet

While this architecture has many advantages, there are important considerations:

  • This style of architecture is suitable when you are certain that the number of microservices is small enough to coexist in a single VPC without depleting the CIDR block of the shared subnets of the VPC.
  • If the traffic between these microservices is insignificant, then the cost benefit of this architecture over other options may not be substantial. This is due to the effect of traffic flow on data transfer cost.
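
To make the first consideration concrete, you can estimate how many addresses a shared subnet can actually hand out. The sketch below relies on the fact that AWS reserves five IP addresses in every subnet (the first four and the last one); the subnet CIDR blocks are hypothetical.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Number of addresses available to resources in a subnet."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def remaining_capacity(cidr: str, addresses_in_use: int) -> int:
    """Addresses still free after accounting for what is already deployed."""
    return usable_addresses(cidr) - addresses_in_use


# Hypothetical CIDR blocks for the shared subnets SS1 and SS2.
shared_subnets = {"SS1": "10.0.1.0/24", "SS2": "10.0.2.0/24"}
for name, cidr in shared_subnets.items():
    print(name, cidr, usable_addresses(cidr))
```

Every EC2 instance, load balancer node, and container network interface from every participant account draws from the same pool, so this number is shared by all microservice teams.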

Conclusion

AWS Cloud provides several options to build a microservices architecture. It is important to look at the characteristics of your application to determine which architectural choices to opt for. AWS RAM and the ability to deploy AWS resources (including Network Load Balancers) in a shared VPC help you eliminate inter-VPC traffic and associated networking costs, without sacrificing high availability, scalability, fault tolerance, or security for your application.

Formula 1: Using Amazon SageMaker to Deliver Real-Time Insights to Fans

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/formula-1-using-amazon-sagemaker-to-deliver-real-time-insights-to-fans-live/

The Formula One Group (F1) is responsible for the promotion of the FIA Formula One World Championship, a series of auto racing events in 21 countries where professional drivers race single-seat cars on custom tracks or through city courses in pursuit of the World Championship title.

Formula 1 works with AWS to enhance its race strategies, data tracking systems, and digital broadcasts through a wide variety of AWS services—including Amazon SageMaker, AWS Lambda, and AWS analytics services—to deliver new race metrics that change the way fans and teams experience racing.

In this special live segment of This is My Architecture, you’ll get a look at what’s under the hood of Formula 1’s F1 Insights. Hear about the machine learning algorithms the company trains on Amazon SageMaker and how inferences are made during races to deliver insights to fans.

For more content like this, subscribe to our YouTube channels This is My Architecture, This is My Code, and This is My Model, or visit the This is My Architecture AWS website, which has search functionality and the ability to filter by industry, language, and service.

Delve into the Formula 1 case study to learn more about how AWS fuels analytics through machine learning.

AWS Architecture Monthly Magazine: Data Lakes

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/aws-architecture-monthly-magazine-data-lakes/

A data lake is the fastest way to get answers from all your data to all your users. It’s a centralized repository that allows you to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data, and run different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning—to guide better decisions.

In This Issue

In this month’s Architecture Monthly, we speak to AWS Analytics Tech Leader, Taz Sayed, about general architecture trends in data lakes, the questions customers need to ask themselves before considering a data lake, and we get his outlook on the role the cloud will play in future development efforts.

We also introduce you to two companies that are utilizing data lakes for deep analytics, point you to an AWS managed solution, provide some real-world videos, and more.

  • Ask an Expert: Taz Sayed, Tech Leader, AWS Analytics
  • Blog: Kayo Sports builds real-time view of the customer on AWS
  • Case Study: Yulu Uses a Data Lake on AWS to Pedal a Change
  • Solution: Data Lake on AWS
  • Managed Solution: AWS Lake Formation
  • Whitepaper: Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility

How to Access the Magazine

We hope you’re enjoying Architecture Monthly, and we’d like to hear from you. Leave us a star rating and a comment on the Amazon Kindle Newsstand page or contact us anytime at [email protected].

Introduction to Messaging for Modern Cloud Architecture

Post Syndicated from Sam Dengler original https://aws.amazon.com/blogs/architecture/introduction-to-messaging-for-modern-cloud-architecture/

We hope you’ve enjoyed reading our posts on best practices for your serverless applications. The posts in the following series will focus on best practices when introducing messaging patterns into your applications. Let’s review some core messaging concepts and see how they can be used to address challenges when designing modern cloud architectures.

Introduction

Applications can communicate information with each other using messages, a mechanism for packaging a data payload and associated metadata. The application that sends a message is called the producer and the application that receives the message is called the consumer. Producers and consumers exchange messages using a variety of transportation channels, for example point-to-point requests, message queues, subscription topics, or event buses. These transportation channels have different characteristics that make them useful when implementing message communication patterns. Dependencies emerge when producers and consumers exchange messages, which is called coupling.

Synchronous Communication

synchronous communication

Message communication is called synchronous when the producer sends a message to the consumer and waits for a response before the producer continues its processing logic. An example of synchronous communication over a point-to-point channel is when an HTTP client makes a request to an HTTP service, waits for the service to process the request, and then applies logic to the HTTP response to determine how to proceed.

Synchronous communication patterns are more straightforward to implement; however, they create tight coupling between producers and consumers. Tight coupling can cause problems because traffic spikes and failures propagate directly throughout the application. For example, in a three-tier architecture, when the application experiences a spike in client traffic, the web tier directly translates the traffic spike into pressure on downstream resources (the logic and data tiers), which may not scale to meet the sudden demand. Likewise, a downstream resource failure in the logic or data tier directly prevents the web tier from responding to client requests. Applications can mimic a synchronous experience, for example a status spinner, using asynchronous communication with a polling or push notification strategy.
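
The tight coupling described above can be illustrated with a small sketch. The three functions below are hypothetical stand-ins for the web, logic, and data tiers, using plain function calls in place of HTTP requests; each caller blocks until the tier below it answers, so a failure in the data tier surfaces immediately in the web tier.

```python
class ServiceUnavailable(Exception):
    """Raised by a tier that cannot serve the request."""


def data_tier(request: str, healthy: bool = True) -> str:
    if not healthy:
        raise ServiceUnavailable("data tier is down")
    return f"record for {request}"


def logic_tier(request: str, data_healthy: bool = True) -> str:
    # The logic tier waits for the data tier before it can continue.
    return data_tier(request, healthy=data_healthy).upper()


def web_tier(request: str, data_healthy: bool = True):
    # The web tier waits for the logic tier; a downstream failure
    # directly becomes an error response to the client.
    try:
        return 200, logic_tier(request, data_healthy)
    except ServiceUnavailable as error:
        return 503, str(error)


print(web_tier("order-1"))
print(web_tier("order-1", data_healthy=False))
```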

Asynchronous Communication

Asynchronous communication

Message communication is called asynchronous when the producer sends a message to the consumer and proceeds without waiting for the response. An example of asynchronous communication over a message queue channel is when a client publishes a message to a queue, and after the queue acknowledges receipt of the message, the publisher proceeds without waiting for the consumer to process the message.

Asynchronous communication patterns are implemented using transportation channels such as queues, topics, and event buses to create loose coupling between producers and consumers. Loose coupling increases an architecture’s resiliency to failure and ability to handle traffic spikes because it creates an indirection between producer and consumer communication, enabling them to operate independently of each other. Using the three-tier architecture example, a message queue can be introduced between the web, logic, and data tiers to enable each to scale independently of the others. When the application experiences a spike in client traffic, the web tier translates the traffic spike into more messages on the queue for processing; however, the logic tier may continue to process messages off the queue without being directly impacted.
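
As a minimal sketch of this indirection, the example below uses Python's in-process queue.Queue as a stand-in for a managed message queue. The producer returns as soon as the queue accepts each message, and the consumer drains the backlog later at its own pace. The message names and the None sentinel used to stop the consumer are assumptions made for illustration.

```python
import queue
import threading


def producer(channel: queue.Queue, messages) -> None:
    """Web tier stand-in: publish and move on without waiting for processing."""
    for message in messages:
        channel.put(message)


def consumer(channel: queue.Queue, results: list) -> None:
    """Logic tier stand-in: process messages until the stop sentinel arrives."""
    while True:
        message = channel.get()
        if message is None:
            break
        results.append(message.upper())


channel = queue.Queue()
results = []

# A traffic spike arrives before any consumer is running.
producer(channel, ["order-1", "order-2", "order-3"])
backlog = channel.qsize()

# The consumer starts later and works through the backlog independently.
worker = threading.Thread(target=consumer, args=(channel, results))
worker.start()
channel.put(None)  # sentinel: no more messages
worker.join()

print(backlog, results)
```

The queue absorbs the spike as a backlog instead of passing the pressure straight through to the consumer.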

Considerations and Next Steps

Although asynchronous communication patterns can benefit modern cloud architectures, there are tradeoffs to consider. Asynchronous messaging adds latency to end-to-end processing time due to the addition of middleware. Producers and consumers take a dependency on the middleware stack, which must also scale to meet demand and be resilient to failure. Care must be taken to appropriately configure producers, consumers, and middleware to handle errors so that messages are not lost, more monitoring is required to ensure proper operations, and multiple logs must be correlated to troubleshoot and diagnose problems.
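
One common way to handle errors so that messages are not lost is to retry a failed message a limited number of times and then set it aside for inspection. The sketch below shows the idea in plain Python; the retry limit, the message values, and the in-memory dead-letter list are hypothetical and stand in for features a messaging service would provide.

```python
from collections import deque

MAX_RECEIVE_COUNT = 3  # hypothetical retry limit


def drain(messages, handler, max_receive_count=MAX_RECEIVE_COUNT):
    """Process messages, retrying failures and parking repeat offenders."""
    pending = deque((message, 0) for message in messages)
    processed, dead_letters = [], []
    while pending:
        message, attempts = pending.popleft()
        try:
            processed.append(handler(message))
        except Exception:
            attempts += 1
            if attempts >= max_receive_count:
                # Keep the message for later diagnosis instead of dropping it.
                dead_letters.append(message)
            else:
                pending.append((message, attempts))
    return processed, dead_letters


def handler(message: str) -> str:
    if message == "corrupt":
        raise ValueError("cannot parse message")
    return message.upper()


processed, dead_letters = drain(["a", "corrupt", "b"], handler)
print(processed, dead_letters)
```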

Amazon MQ, Amazon Kinesis, Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS), and Amazon EventBridge are highly available, large-scale, failure-resistant managed services that can be used to implement asynchronous messaging patterns. You can explore these services at the AWS Messaging page and their integration into Serverless Architectures in the free new digital course, Architecting Serverless Solutions. You can also visit the AWS Event-Driven Architecture page to learn how to apply messaging patterns to build event-driven solutions. The upcoming posts in this series will explore these AWS services to help ensure message patterns are implemented using best practices when applied to modern cloud architecture.

Updated Certification Exam Validates AWS Architect Skills

Post Syndicated from Beth Shepherd original https://aws.amazon.com/blogs/architecture/updated-certification-exam-validates-aws-architect-skills/

We are excited to announce a new version of the AWS Certified Solutions Architect — Associate certification exam. This certification validates an individual’s ability to design and deploy well-architected solutions on AWS that meet customer requirements. The new exam version includes updated content across all domains as well as new objectives in categories such as databases, cost optimization, and security.

What Does this Certification Represent?

The solutions architect role and skill set is critical for AWS customers and partners, so we keep our exams updated to reflect the rapid pace of innovation on the AWS platform and the latest in best practices for architecting on the AWS Cloud.

To update this certification exam, we work with experienced experts in AWS architecture to set the bar for competency in an Associate-level solutions architect role. This certification demonstrates an individual meets that bar. Individuals who have AWS Certified Solutions Architect — Associate report an average salary in the top 10 for highest paying IT certifications globally.

When candidates earn this credential, they are demonstrating knowledge of core AWS services and best practices along with the ability to design and implement solutions on AWS that are available, cost-efficient, fault-tolerant, secure, and scalable. The new version of the exam covers the following domains:

  • Design Resilient Architectures
  • Design High-Performing Architectures
  • Design Secure Applications and Architectures
  • Design Cost-Optimized Architectures

Who Should Take the Exam?

We recommend candidates have one year of hands-on experience designing solutions on AWS, hands-on experience with AWS services such as compute, networking, storage, and database, and experience with AWS deployment and management services. For a comprehensive list of recommended knowledge and experience, see the exam guide.

How to Prepare for the Exam?

If you are new to architecting on AWS or looking to add to your skills, AWS Training and Certification also provides a learning path and comprehensive ramp-up guide for the architect role. The architect learning path includes classroom course offerings such as AWS Technical Essentials, Architecting on AWS, and Exam Readiness: AWS Certified Solutions Architect — Associate. The architect ramp-up guide starts with whitepapers and training options designed to help you learn the fundamentals of AWS Cloud. Once you have the basics, the guide moves on to cloud architect fundamentals, such as the AWS Well-Architected Framework, and how to gain exposure to best practices with hands-on labs, whitepapers, and more.

Architect learning path diagram

Resources for Organizations

If you’re an organization looking to challenge your team to meet the bar set by experts in AWS architecture, AWS Training and Certification offers the resources you need to develop your team, innovate in the cloud, and transform your organization. Options range from comprehensive training plans to certification exam vouchers for your team. Learn more about AWS Training and Certification.

Get AWS Certified

Sign up today to take the latest AWS Certified Solutions Architect — Associate exam (SAA-C02) at testing centers worldwide for $150 USD. The previous version of the exam will be available through March 22, 2020.

NextGen Healthcare: Build and Deployment Pipelines with AWS

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/nextgen-healthcare-build-and-deployment-pipelines-with-aws/

Owen Zacharias, Vice President of Application Delivery at NextGen Healthcare, explains to AWS Solutions Architect Andrea Sabet how his company developed a series of build and deployment pipelines using native AWS services in the highly regulated healthcare sector.

Learn how the following services can be used to build and deploy infrastructure and application code:

Discover how AWS resources can be rapidly created and updated as part of a CI/CD pipeline while ensuring HIPAA compliance through approved/vetted AWS Identity and Access Management (IAM) roles that AWS CloudFormation is permitted to assume.

February’s AWS Architecture Monthly magazine is all about healthcare. Check it out on Kindle Newsstand, download the PDF, or see it on Flipboard.

Check out more videos in the This Is My Architecture series.

Building a serverless URL shortener app without AWS Lambda – part 2

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/building-a-serverless-url-shortener-app-without-lambda-part-2/

This post is the second installment of a three-part series on building a serverless URL shortener without using AWS Lambda. The purpose of the series is to highlight the power of Amazon API Gateway and its ability to integrate directly with backend services like Amazon DynamoDB. The result is a low latency, highly available application that is built with managed services and requires minimal code.

In part one, I cover using Apache Velocity Templating Language (VTL) and Amazon API Gateway to manage business logic usually processed by an AWS Lambda function. Now I discuss several methods for securing the API Gateway and any resources behind it. Whether building a functionless application as described in part one, or proxying a compute layer with API Gateway, this post offers some best practices for configuring API Gateway security.

To refer to the full application, visit https://github.com/aws-samples/amazon-api-gateway-url-shortener. The template.yaml file is the AWS SAM configuration for the application, and the api.yaml is the OpenAPI configuration for the API. I include instructions on how to deploy the full application, together with a simple web client, in the README.md file.

There are several steps to secure the API. First, I use AWS Identity and Access Management (IAM) to ensure I practice least privilege access to the application services. Additionally, I enable authentication and authorization, enforce request validation, and configure Cross-Origin Resource Sharing (CORS).

Secure functionless architecture

IAM least privileges

When configuring API Gateway to limit access to services, I create two specific roles for interaction between API Gateway and Amazon DynamoDB.

The first, DDBReadRole, limits actions to GetItem, Scan, and Query. This role is applied to all GET methods on the API. For POST, PUT, and DELETE methods, there is a separate role called DDBCrudRole that allows only the DeleteItem and UpdateItem actions. Additionally, the SAM template dynamically assigns these roles to a specific table, so the roles can perform these actions only on that table.
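
To illustrate what such least-privilege roles allow, here is a minimal sketch that builds the two policy documents as Python dictionaries. It is not the SAM template from the repository; the table ARN is a hypothetical placeholder, and only the action lists come from the description above.

```python
import json


def table_policy(table_arn: str, actions) -> dict:
    """IAM policy document allowing only the given DynamoDB actions on one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [f"dynamodb:{action}" for action in actions],
                "Resource": table_arn,
            }
        ],
    }


# Hypothetical table ARN; the real template resolves it dynamically.
TABLE_ARN = "arn:aws:dynamodb:us-east-1:111111111111:table/LinkTable"

read_policy = table_policy(TABLE_ARN, ["GetItem", "Scan", "Query"])
crud_policy = table_policy(TABLE_ARN, ["DeleteItem", "UpdateItem"])

print(json.dumps(read_policy, indent=2))
```

Because each statement names a single table ARN and a short action list, a misconfigured API method cannot reach other tables or perform other operations.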

Authentication and authorization

For authentication, I configure user management with Amazon Cognito. I then configure an API Gateway Cognito authorizer to manage request authorization. Finally, I configure the client for secure requests.

Configuring Cognito for authentication

For authentication, Cognito provides user directories called user pools that allow user creation and authentication. For the user interface, developers have the option of building their own with AWS Amplify or having Cognito host the authentication pages. For simplicity, I opt for Cognito hosted pages. The workflow looks like this:

Cognito authentication flow

To set up the Cognito service, I follow these steps:

  1. Create a Cognito user pool. The user pool defines the user data and registration flows. This application is configured to use an email address as the primary user name. It also requires email validation at time of registration.
  2. Create a Cognito user pool client. The client application is connected to the user pool and has permission to call unauthenticated APIs to register and log in users. The client application configures the callback URLs as well as the identity providers, authentication flows, and OAuth scopes.
  3. Create a Cognito domain for the registration and login pages. I configure the domain to use the standard Cognito domains with a subdomain of shortener. I could also configure this to match a custom domain.
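
With the domain and app client in place, the client sends users to the Cognito hosted login page. The following sketch shows how such a URL could be assembled. The shortener subdomain comes from the steps above; the region, client ID, callback URL, scopes, and the choice of the authorization code response type are assumptions for illustration.

```python
from urllib.parse import urlencode


def hosted_login_url(domain_prefix: str, region: str, client_id: str,
                     callback_url: str, scopes=("openid", "email")) -> str:
    """Build the URL of the Cognito hosted login page for an app client."""
    query = urlencode(
        {
            "client_id": client_id,
            "response_type": "code",  # assumed authentication flow
            "scope": " ".join(scopes),
            "redirect_uri": callback_url,
        }
    )
    return f"https://{domain_prefix}.auth.{region}.amazoncognito.com/login?{query}"


# Region, client ID, and callback URL are hypothetical placeholders.
login_url = hosted_login_url(
    domain_prefix="shortener",
    region="us-east-1",
    client_id="exampleclientid",
    callback_url="https://example.com/callback",
)
print(login_url)
```

The callback URL must match one of the callback URLs configured on the user pool client, otherwise Cognito refuses the redirect.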

Configuring the Cognito authorizer

Next, I integrate the user pool with API Gateway by creating a Cognito authorizer. The authorizer allows API Gateway to verify an incoming request with the user pool to allow or deny access. To configure the authorizer, I follow these steps:

  1. Create the authorizer on the API Gateway. I create a new authorizer and connect it to the proper Cognito user pool. I also set the header name to Authorization.
  2. Next I attach the authorizer to each resource and method needing authorization by this particular authorizer.

Configure the client for secure requests

The last step for authorized requests is to configure the client. As explained above, the client interacts with Amazon Cognito to authenticate and obtain temporary credentials. The truncated temporary credentials follow the format:

{
  "id_token": "eyJraWQiOiJnZ0pJZzBEV3F4SVUwZngreklE…",
  "access_token": "eyJraWQiOiJydVVHemFuYjJ0VlZicnV1…",
  "refresh_token": "eyJjdHkiOiJKV1QiLCJlbmMiOiJBMjU…",
  "expires_in": 3600,
  "token_type": "Bearer"
}

For the client to access any API Gateway resources that require authentication, it must include the Authorization header with the value set to the id_token. API Gateway treats it as a standard JSON Web Token (JWT) and decodes it for authorization.
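
As a rough sketch of the client side, the example below attaches an id_token to a request and decodes the token's claims. The API URL and the token contents are hypothetical stand-ins built inside the example, and the decoding step skips signature verification, which a real authorizer must not do.

```python
import base64
import json
from urllib.request import Request


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_claims(jwt_token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = jwt_token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def authorized_request(url: str, id_token: str, method: str = "GET") -> Request:
    """Build a request that carries the id_token in the Authorization header."""
    return Request(url, method=method, headers={"Authorization": id_token})


# Stand-in token for illustration; a real id_token is issued by Cognito.
header = b64url(json.dumps({"alg": "none"}).encode())
claims = b64url(json.dumps({"email": "user@example.com"}).encode())
id_token = f"{header}.{claims}.signature"

request = authorized_request("https://api.example.com/app", id_token)
print(request.get_header("Authorization") == id_token)
print(decode_claims(id_token))
```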

Request validation

The next step in securing the application is to validate the request payload to ensure it contains the expected data. When creating a new short link, the POST method request body must match the following:

{
  "id": "short link",
  "url": "target url"
}

To configure request validation, I first create a schema defining the expected POST method body payload. The schema looks like this:

{
  "required" : [ "id", "url" ],
  "type" : "object",
  "properties" : {
    "id" : { "type" : "string"},
    "url" : {
      "pattern" : "^https?://[[email protected]:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([[email protected]:%_\\+.~#?&//=]*)",
      "type" : "string"
    }
  }
}

The schema requires both id and url, and requires that they are both strings. It also uses a regex pattern to ensure that the url is in a valid format.
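
The checks this schema expresses can be sketched in a few lines of Python. This is only an illustration of the rules, not how API Gateway implements validation, and the URL pattern is a simplified stand-in rather than the pattern from the schema. The 200 status is a placeholder for "request accepted".

```python
import re

# Simplified stand-in pattern (an assumption, not the schema's regex).
URL_PATTERN = re.compile(r"^https?://[^\s/]+\.[a-z]{2,6}\b")


def validate_body(body) -> list:
    """Return a list of problems; an empty list means the body is valid."""
    if not isinstance(body, dict):
        return ["body must be an object"]
    problems = []
    for field in ("id", "url"):
        if field not in body:
            problems.append(f"missing required property: {field}")
        elif not isinstance(body[field], str):
            problems.append(f"{field} must be a string")
    url = body.get("url")
    if isinstance(url, str) and not URL_PATTERN.search(url):
        problems.append("url does not match the expected pattern")
    return problems


def status_code(body) -> int:
    """400 when validation fails, otherwise a placeholder 200."""
    return 400 if validate_body(body) else 200


print(status_code({"id": "aws", "url": "https://aws.amazon.com/blogs"}))
print(validate_body({"id": "aws"}))
```

Rejecting malformed requests at the gateway means the backend integration never sees them.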

Next, I create request validator definitions. Using the OpenAPI extensibility markup, I create three validation options: all, params-only, and body-only. Here is the markup:

 

x-amazon-apigateway-request-validators:
  all:
    validateRequestBody: true
    validateRequestParameters: true
  body:
    validateRequestBody: true
    validateRequestParameters: false
  params:
    validateRequestBody: false
    validateRequestParameters: true

These definitions appear in the OpenAPI template and are mapped to the choices on the console.

Attaching validation to methods

With the validation definitions in place, and the schema defined, I then attach the schema to the POST method and require validation of the request body against the schema. If the conditions of the schema are not met, API Gateway rejects the request with a status code of 400 and an error message stating, “Invalid request body”.

CORS

Cross-Origin Resource Sharing is a mechanism for allowing applications from different domains to communicate. The limitations are based on exchanged headers between the client and the server. For example, the server passes the Access-Control-Allow-Origin header, which indicates which client domain is allowed to interact with the server. The client passes the Origin header that indicates what domain the request is coming from. If the two headers do not match exactly, then the request is rejected.

It is also possible to use a wildcard value for many of the allowed values. For Origin, this means that any client domain can connect to the backend domain. While wildcards are possible, using them misses an opportunity to add another layer of security to the application. In light of this, I configure CORS to restrict API access to the client application. To help understand the different CORS settings required, here is a layout of the API endpoints:

API resource and methods structure

When an endpoint requires authorization, or a method other than GET is used, browsers perform a pre-flight OPTIONS check. This means they make a request to the server to find out what the server allows.

To accommodate this, I configure an OPTIONS response using an API Gateway mock endpoint. This is the header configuration for the /app OPTIONS call:

Access-Control-Allow-Methods: 'POST, GET, OPTIONS'
Access-Control-Allow-Headers: 'authorization, content-type'
Access-Control-Allow-Origin: '<client-domain>'

The configuration for the /app/{linkId} OPTIONS call is similar:

Access-Control-Allow-Methods: 'PUT, DELETE, OPTIONS'
Access-Control-Allow-Headers: 'authorization, content-type'
Access-Control-Allow-Origin: '<client-domain>'

In addition to the OPTIONS call, I also add the browser-required Access-Control-Allow-Origin header to the response of the PUT, POST, and DELETE methods.
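
The effect of these headers can be sketched as follows. The first function builds the header set returned by the mock OPTIONS endpoint, and the second is a deliberately simplified model of the check a browser performs against it; real browsers apply additional rules. The client origin is a hypothetical stand-in for <client-domain>.

```python
# Hypothetical stand-in for <client-domain>.
ALLOWED_ORIGIN = "https://client.example.com"


def preflight_headers(methods) -> dict:
    """Headers returned by the mock OPTIONS endpoint for one resource."""
    return {
        "Access-Control-Allow-Methods": ", ".join(methods),
        "Access-Control-Allow-Headers": "authorization, content-type",
        "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
    }


def browser_allows(origin: str, method: str, headers: dict) -> bool:
    """Simplified model of the browser's decision after a pre-flight check."""
    allowed_origin = headers.get("Access-Control-Allow-Origin")
    allowed_methods = [
        item.strip()
        for item in headers.get("Access-Control-Allow-Methods", "").split(",")
    ]
    return allowed_origin in ("*", origin) and method in allowed_methods


app_options = preflight_headers(["POST", "GET", "OPTIONS"])      # /app
link_options = preflight_headers(["PUT", "DELETE", "OPTIONS"])   # /app/{linkId}

print(browser_allows("https://client.example.com", "POST", app_options))
print(browser_allows("https://other.example.net", "POST", app_options))
```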

Adding a header to the response is a two-step process. Because the response to the client is modeled at the Method Response, I first set the expected header here:

Response headers

The Integration Response is responsible for mapping the data from the integrated backend service to the proper values, so I map the value of the header here:

Response header values

With the proper IAM roles in place, authentication and authorization configured, and data validation enabled, I now have a secure backend to my serverless URL Shortener. Additionally, by making proper use of CORS I have given my test client access to the API to provide a full-stack application.

Conclusion

In this post, I demonstrate configuring built-in features of API Gateway to secure applications fronted with API Gateway. While this is not an exhaustive list of API Gateway features, it is a good starting point for API security and what can be done at the API Gateway level. In part three, I discuss how to observe and improve the performance of the application, as well as reporting on internal application metrics.

Continue to part three.

Happy coding!

Building a serverless URL shortener app without AWS Lambda – part 1

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/building-a-serverless-url-shortener-app-without-lambda-part-1/

When building applications, developers often use a standard multi-tier architecture pattern that generally includes a presentation, processing, and data tier. When building such an application using serverless technologies on AWS, it might look like the following:

Serverless architecture

In this three-part series, I am going to challenge you to approach this a different way by building a functionless or “backend-less” URL shortener application, that looks like this:

Functionless architecture

In part one, I discuss configuring a service integration between Amazon API Gateway and Amazon DynamoDB, removing the need for AWS Lambda entirely. I also demonstrate using Apache’s Velocity Templating Language (VTL) to apply business logic and modify the API request and response as needed. In part two, I show how to use API Gateway to increase security. In part three, I demonstrate how to improve response time and configure observability to get insights into application performance and client usage.

At AWS re:Invent 2019, the new HTTP API for Amazon API Gateway was announced. At the time of this writing, this new service does not support VTL or some of the other features discussed, so instead I use a REST API. When HTTP API gains feature parity, we will publish an additional follow up to this post.

Throughout this blog series, there are deep links to AWS SAM and OpenAPI configurations to show how to build this application using infrastructure as code (IaC). To refer to the full application, visit https://github.com/aws-samples/amazon-api-gateway-url-shortener. The template.yaml file is the AWS SAM configuration for the application, and the api.yaml is the OpenAPI configuration for the API. I have included instructions on how to deploy the full application, including a simple web client, in the README.md file.

Why would I do this?

AWS Lambda is the standard compute resource for serverless applications. With a Lambda function, I can process complex business logic in any of the AWS supported runtimes or even in my own custom runtime. However, do I really need to use a Lambda function when the business logic is minimal, and the main purpose becomes the transportation of data? Instead, I can turn to API Gateway to transport the data and process minimal amounts of business logic, as needed, with VTL. This allows me to minimize my application resources and cost.

API Gateway service integration

While each request to an API Gateway REST endpoint follows the same path, to understand how service integrations work, I show the integration for /app – POST. This represents the lifecycle of a request made to http://myexampleapi.com/app using a POST method. The purpose of this endpoint is to post new short links to the database.

API Gateway request lifecycle

The Method Request and Method Response mainly handle authorization, modeling, and validation, and are covered in detail in part two of this blog. For now, I focus on the Integration Request and Integration Response. The Integration Request is responsible for service integrations, and looks like this:

POST integration request

The Integration type is AWS Service and the AWS Region is my closest Region, us-west-2. For AWS Service, I choose DynamoDB from the long list of available services. For the HTTP Method, when interacting with the DynamoDB API, the POST method is required to take action on the underlying table.

For the Action, I choose UpdateItem. The action is the same here as you would use in the CLI or SDK to interact with DynamoDB. Generally, when adding new items to the DynamoDB table, I use the PutItem command. However, in this instance I must use UpdateItem to get a specific set of return data from DynamoDB.

When creating a new record in DynamoDB, the PutItem action does not return the completed record in the same request. If I want to obtain the new record, I need to make a secondary call to DynamoDB to fetch it. However, the API Gateway request lifecycle does not have the ability to call the database a second time, so I need to make sure I get everything I need the first time around. UpdateItem updates an existing item or creates a new one if it doesn’t exist. Additionally, it can return the newly created item, which I can then return to the client.
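To make the integration settings above concrete, here is a minimal sketch that assembles them into a Python dictionary resembling an x-amazon-apigateway-integration entry. The function name is hypothetical, and the role ARN uses a placeholder account ID; neither comes from the original template.

```python
def dynamodb_integration(region: str, action: str, role_arn: str) -> dict:
    """Build illustrative settings for an AWS service integration with DynamoDB."""
    return {
        "type": "aws",         # Integration type: AWS Service
        "httpMethod": "POST",  # DynamoDB API actions are invoked with POST
        # Service integrations address the target action through a URI like this.
        "uri": f"arn:aws:apigateway:{region}:dynamodb:action/{action}",
        "credentials": role_arn,  # execution role that API Gateway assumes
    }


integration = dynamodb_integration(
    region="us-west-2",
    action="UpdateItem",
    role_arn="arn:aws:iam::123456789012:role/DDBCrudRole",  # placeholder account ID
)
print(integration["uri"])
```

Note that the POST here is the method API Gateway uses toward DynamoDB; it is independent of the HTTP method the client used to call the API.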

Finally, I configure the execution role. On this method, API Gateway needs permission to read and write from DynamoDB. Here is the policy section of the DDBCrudRole:

Policies:
  - PolicyName: DDBCrudPolicy
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        Action:
          - dynamodb:DeleteItem
          - dynamodb:UpdateItem
        Effect: Allow
        Resource: !GetAtt LinkTable.Arn

This simple policy is used for all create, read, update, and delete (CRUD) operations, and UpdateItem is used for both create and update. This policy is part of the SAM template, and dynamically references the DynamoDB table ARN for the resource. This follows the principle of least privilege, only allowing access to the required table.

Modifying the request

Now that I have configured the integration from API Gateway to DynamoDB, I modify the incoming request to a format that DynamoDB understands. Further down the page on the Integration Request, you see the Mapping Template option:

Mapping templates

API Gateway evaluates the Content-Type of the incoming request and looks for a matching template to apply. I have created a template for application/json to match the incoming body. Here is a summarized version of the template:

{
  "TableName": "URLShortener-LinkTable-QTK7WFAJ11YS",
  "ConditionExpression":"attribute_not_exists(id)",
  "Key": {
    "id": { "S": $input.json('$.id') }
  },
  "ExpressionAttributeNames": {
    "#u": "url",
    "#o": "owner",
    "#ts": "timestamp"
  },
  "ExpressionAttributeValues":{
    ":u": {"S": $input.json('$.url')},
    ":o": {"S": "$context.authorizer.claims.email"},
    ":ts": {"S": "$context.requestTime"}
  },
  "UpdateExpression": "SET #u = :u, #o = :o, #ts = :ts",
  "ReturnValues": "ALL_NEW"
}

If you have worked with the DynamoDB SDK, this might look familiar. The TableName indicates which table to use in the call. The ConditionExpression value ensures that the id passed does not already exist. The value for id is extracted from the request body using $input.json('$.id').

To avoid colliding with reserved words, DynamoDB has the concept of ExpressionAttributeNames and ExpressionAttributeValues. In the ExpressionAttributeValues I have set ‘:o’ to $context.authorizer.claims.email. This extracts the authenticated user’s email from the request context and maps it to owner. This allows me to uniquely group a single user’s links into a global secondary index (GSI). Querying the GSI is much more efficient than scanning the entire table.

I also retrieve the requestTime from the context object, allowing me to place a timestamp in the record. I set the ReturnValues to return all new values for the record.  Finally, the UpdateExpression maps the values to the proper names and inserts the item into DynamoDB.
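For readers more at home in an SDK than in VTL, the following sketch builds the same UpdateItem parameters in Python. It only constructs the dictionary and makes no AWS call. The function name, table name, and sample values are assumptions for illustration; the resulting keys should match the keyword arguments that boto3's update_item accepts.

```python
import json


def build_update_item_request(table_name, link_id, url, owner, request_time):
    """Assemble UpdateItem parameters equivalent to the mapping template."""
    return {
        "TableName": table_name,
        # Fail the write if an item with this id already exists.
        "ConditionExpression": "attribute_not_exists(id)",
        "Key": {"id": {"S": link_id}},
        # Placeholders keep reserved words such as "url" out of the expression.
        "ExpressionAttributeNames": {"#u": "url", "#o": "owner", "#ts": "timestamp"},
        "ExpressionAttributeValues": {
            ":u": {"S": url},
            ":o": {"S": owner},
            ":ts": {"S": request_time},
        },
        "UpdateExpression": "SET #u = :u, #o = :o, #ts = :ts",
        # Ask DynamoDB to return the item as it looks after the update.
        "ReturnValues": "ALL_NEW",
    }


# Hypothetical values standing in for the request body and request context.
params = build_update_item_request(
    table_name="LinkTable",
    link_id="aws",
    url="http://aws.amazon.com",
    owner="owner@example.com",
    request_time="27/Dec/2019:21:21:17 +0000",
)
print(json.dumps(params, indent=2))
```

In the real application the owner and timestamp come from the API Gateway context rather than from the caller, which is why the template reads them from $context instead of the request body.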

Modifying the response

Before I discuss the Integration Response, let’s examine the Method Response:

Method response

The Method Response is responsible for modeling the response to the client. In most cases, DynamoDB returns a status code of either 200 or 400. Therefore, I configure a 200 response and a 400 response.

When DynamoDB returns a 200 response, the data looks like the following:

{
  "Attributes": {
    "id": {"S": "aws"},
    "owner": {"S": "[email protected]"},
    "timestamp": {"S": "27/Dec/2019:21:21:17 +0000"},
    "url": {"S": "http://aws.amazon.com"}
  }
}

In the Integration Response, I have a template that converts this to a structure that the client is expecting. The template looks like this:

#set($inputRoot = $input.path('$'))
{
  "id":"$inputRoot.Attributes.id.S",
  "url":"$inputRoot.Attributes.url.S",
  "timestamp":"$inputRoot.Attributes.timestamp.S",
  "owner":"$inputRoot.Attributes.owner.S"
}

This template has a variable called $inputRoot to contain the root data. I then build out the return object, formatted for the client:

{
  "id": "aws",
  "url": "http://aws.amazon.com",
  "timestamp": "27/Dec/2019:21:21:17 +0000",
  "owner": "[email protected]"
}
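The same transformation can be sketched in Python to show what the template does. This assumes the UpdateItem response nests the record under an Attributes key, as the $inputRoot.Attributes paths in the template imply; the function name and sample values are hypothetical.

```python
def flatten_attributes(dynamodb_response: dict) -> dict:
    """Convert DynamoDB's typed string attributes into a plain object for the client."""
    attributes = dynamodb_response["Attributes"]
    # Each attribute arrives as {"S": "<value>"}; keep only the string value.
    return {
        name: attributes[name]["S"]
        for name in ("id", "url", "timestamp", "owner")
    }


# Sample response with a placeholder owner address.
sample = {
    "Attributes": {
        "id": {"S": "aws"},
        "owner": {"S": "owner@example.com"},
        "timestamp": {"S": "27/Dec/2019:21:21:17 +0000"},
        "url": {"S": "http://aws.amazon.com"},
    }
}
print(flatten_attributes(sample))
```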

For a 400 status, I must evaluate the issue and respond accordingly. The mapping template looks like this:

#set($inputRoot = $input.path('$')) 
#if($inputRoot.toString().contains("ConditionalCheckFailedException")) 
  #set($context.responseOverride.status = 200)
  {"error": true,"message": "URL link already exists"} 
#end

This template checks for the string “ConditionalCheckFailedException”. If it exists, then I know that the conditional check “attribute_not_exists(id)” from the UpdateItem template in the Integration Request failed. To return a 200 response, I use the “#set($context.responseOverride.status = 200)” override and set the response body to the error details.
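The logic of this error template can be mirrored in a few lines of Python. This is a sketch of the decision only, not of how API Gateway evaluates VTL; the function name and the sample error body are illustrative assumptions.

```python
def map_error_response(raw_body: str):
    """Return (status, body) for a 400 integration response, mirroring the template."""
    if "ConditionalCheckFailedException" in raw_body:
        # The id already exists: override the status and explain why.
        return 200, {"error": True, "message": "URL link already exists"}
    # The template emits no body for other errors; None stands in for that here.
    return 400, None


# Illustrative error body for a failed conditional check.
duplicate = '{"__type": "com.amazonaws.dynamodb.v20120810#ConditionalCheckFailedException"}'
print(map_error_response(duplicate))
print(map_error_response('{"__type": "SomeOtherException"}'))
```

Matching on a substring is crude, but it follows the template, which converts the whole payload to a string before checking it.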

With my integration and mapping templates in place for the /app – POST method, I now have the ability to create new short links for my URL shortener. Taking this same approach for reading, updating, and deleting short links, I now have a fully functioning backend for the URL shortener that only uses API Gateway and DynamoDB.

What we have built so far

Conclusion

In this post, I walked through using VTL to manage simple business logic at the processing tier with API Gateway. I covered configuring the service integration with DynamoDB and modifying the request and response payloads as needed. In part 2, I discuss different options for configuring Amazon API Gateway security.

To deploy the URL shortener, visit https://github.com/aws-samples/amazon-api-gateway-url-shortener. The README.md file contains instructions for launching the application.

Continue to part two.

Happy coding!

New – Serverless Lens in AWS Well-Architected Tool

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-serverless-lens-in-aws-well-architected-tool/

When you build and run applications in the cloud, how often are you asking yourself “Am I doing this right?”

This is actually a very good question, and to let you get a good answer, we released publicly in 2015 the AWS Well-Architected Framework, a formal approach to compare your workload against our best practices, and get guidance on how to improve. Today, the Well-Architected Framework gives a consistent way for customers and partners to design and evaluate cloud architectures, and is based on five pillars:

  • Operational Excellence
  • Security
  • Reliability
  • Performance Efficiency
  • Cost Optimization

To provide more workload-specific advice, in 2017 we extended the framework with the concept of “lens” to go beyond a general perspective, and enter specific technology domains. Currently, there are three lenses that you can use:

  • Serverless
  • High Performance Computing (HPC)
  • IoT (Internet of Things)

The first step in improving something is deciding what to measure and how. To let you review your workloads in a more structured way, we launched in 2018 the AWS Well-Architected Tool, a free tool available in the AWS Management Console, where you can define your workload, and answer questions regarding the five pillars.

You can use the Well-Architected Tool in different ways. For example:

  • If you’re working on a specific application, you can use the tool to assess risks and find areas for improvement.
  • If you’re responsible for multiple applications, you can use the tool to get visibility on the current status for all of them.

Today, I am happy to announce that we added the ability to apply lenses to the Well-Architected Tool, and the first one to be available is the Serverless Lens!

Using the Serverless Lens in AWS Well-Architected Tool
In the Well-Architected Tool console, I start by defining my workload. I am currently building the backend for a mobile app using the Amplify Framework. It’ll be a simple game, but I am going to use DynamoDB Global Tables to store data for my users, and the application will be running in two AWS Regions. Adding the AWS account IDs is optional, but can be useful to understand the application deployment in a multi-account setup.

Now, I can choose which lenses to apply. The AWS Well-Architected Framework is there by default. I select the Serverless Lens. This is adding a set of additional questions that help me understand how to design, deploy, and architect my serverless app following the framework best practices.

When the workload is defined, I start my review. I jump straight to the Serverless Lens. The new questions are distributed across the five pillars. For example, one of my favorite questions is around performance:

For each question, there are resources on the right side of the console that help me understand the possible answers and the terminology used. I select the activities and the technology choices that are part of my implementation, specifically:

  • I am using data streams (like those provided by Amazon Kinesis, or DynamoDB Streams) and asynchronous function invocations to improve concurrency.
  • I am caching user data in memory to reduce database accesses. I could also use the /tmp of the Lambda functions, or external data stores like Amazon ElastiCache.
  • I am removing functions when a service integration can natively do the job, for example when I need to call Kinesis Data Firehose from the Amazon API Gateway (this is optimizing my costs, too).
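As an illustration of the in-memory caching choice above, here is a minimal sketch of a Lambda-style handler in Python that keeps user data in a module-level dictionary. The handler shape, the event field, and the database stand-in are all hypothetical; the cache only helps while the same execution environment stays warm.

```python
# Module-level state survives between invocations of a warm execution environment.
_user_cache = {}


def load_user_from_database(user_id):
    """Hypothetical stand-in for a database read; counts how often it is called."""
    load_user_from_database.calls += 1
    return {"id": user_id, "score": 0}


load_user_from_database.calls = 0


def handler(event, context):
    """Return user data, reading from the database only on a cache miss."""
    user_id = event["userId"]
    if user_id not in _user_cache:
        _user_cache[user_id] = load_user_from_database(user_id)
    return _user_cache[user_id]


# Two invocations for the same user trigger a single database read.
handler({"userId": "u1"}, None)
handler({"userId": "u1"}, None)
print(load_user_from_database.calls)
```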

I save and exit, and even if I answered just one question, I already get some feedback from the tool. From the workload overview, I select the Serverless Lens. There, I notice that I have a high risk that I need to mitigate.

Just below, I have a suggestion on how to address the risk, including specific recommendations based on the question raising the risk. For a serverless application, it is important to balance performance and costs, using the right capacity unit that is automatically scaled by the platform.

I click on the first recommendation, and I receive specific action items for my improvement plan. This is covering the different architectural components I can use in my serverless apps, such as Lambda functions, DynamoDB tables, or API Gateway endpoints. In my case, I am going to follow the suggestion to use the Lambda Power Tuning open-source tool to fine-tune the memory/power configuration of my Lambda functions.

Before working on my improvement plan, I go on and answer all questions. I can now see the full report in the AWS console, or download it in PDF format to share it with other stakeholders. In this way, we can work together to plan the necessary improvements and have a successful serverless app.

Once we have made the improvements, I can go back and mark the correct answers to remove the high risk issue. Great architectures come as a result of multiple iterations.

Available Now
The Serverless Lens is available today in all regions where the Well-Architected Tool is offered, as described in the AWS Region Table. It can be applied to existing workloads, or used for new workloads you define in the tool.

There is no cost for using the AWS Well-Architected Tool. You can use it to improve the application you are working on, or to get visibility into multiple workloads used by the department or area you are working in.

As a CIO/CTO, you can use it as a dashboard describing the status of all the applications you are responsible for. To make this easier, you can share a workload with another AWS account, which you can then use to get a single view across multiple applications.

Since the output of the tool is a report with risks and how to address them, you should use the tool during the overall lifecycle of your application, especially during the design and implementation phase, and not just when you are going into production, because by then it may be too late to implement some of the suggestions you get.

Danilo

Nike: A Social Graph at Scale with Amazon Neptune

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/nike-a-social-graph-at-scale-with-amazon-neptune/

Getting a graph database to be performant and easy to use is very different from making a NoSQL (non-relational) database high-performing. Listen in as Todd Escalona of AWS talks with Marc Wangenheim, Senior Engineering Manager at Nike, about how the company powers a number of applications via a social graph, built on Amazon Neptune, that effectively maps millions of relationships among its users. They take a closer look at the underlying property graph that represents highly connected data, which allows users to select their interests such as basketball or training. These interest selections then drive personalized recommendations and curated content for consumers, based on entries in their graph.

 

Learn more about Amazon Neptune.

Check out more videos in the This Is My Architecture series.

 

Five Talent Collaborates with Customers Using the AWS Well-Architected Tool

Post Syndicated from Scott Sprinkel original https://aws.amazon.com/blogs/architecture/five-talent-collaborates-with-customers-using-the-aws-well-architected-tool/

Since its launch at re:Invent 2018, the AWS Well-Architected Tool (AWS W-A Tool) has provided a consistent process for documenting and measuring architecture workloads using the best practices from the AWS Well-Architected Framework. However, sharing workload reports for collaborative work experience was time consuming.

Well-Architected Tool

The new workload sharing feature solves these issues by offering a simple way to share workloads with other AWS accounts and AWS Identity and Access Management (IAM) users. Companies can leverage workload sharing to securely and efficiently collaborate and provide feedback about architecture implementation and design without sharing confidential account details through emails and PDFs. Multiple people across multiple organizations can now review a workload simultaneously and provide feedback and input.

Five Talent, a partner on the AWS Partner Network (APN), uses the workload sharing feature for a more collaborative Well-Architected review experience. With workload sharing, the company provides its clients with improved efficiency, transparency, and security.

“The new sharing feature has increased the efficiency across client and partner teams, which decreases the average time to remediate the high risks. By sharing the reviews in the AWS Console, we can protect sensitive customer data while staying informed in real time.” – Ryan Comingdeer, Chief Talent Officer, Five Talent.

Previously, Five Talent defined workloads in its AWS account and asked its clients to submit their workload information in a custom-built webform. Five Talent then generated and sent PDF reports via email. This was problematic for several reasons: Five Talent and its clients couldn’t control who had access to the report PDFs, it had no way to expedite high risk issue (HRI) remediation, and its recommendations could easily get lost in email correspondence. The workload sharing feature solves these problems and builds customer confidence through the ability for multiple people to work on reviews collaboratively.

Five Talent added extra security by customizing its workload sharing access controls based on its customer needs. The company is using the notes sections in the workload for secure and accurate communication. It shares links to documentation that can help clients take initiative and remediate HRIs — enabling quicker remediation and more transparent review cycles. Five Talent also highlights milestones in the AWS W-A Tool, enabling customers to prioritize HRIs without sorting through lengthy PDFs and email threads, which ultimately expedites the review and revision process.

The workload sharing feature has helped Five Talent drive down HRIs without requiring direct access to the AWS account where the workload is defined. This transparency and ability to work simultaneously helps keep all teams accountable while reinforcing the principles of the Well-Architected Framework.

Sign in to the Console and check out the new Shares tab in the AWS Well-Architected Tool, or visit the workload shares documentation to learn more.

Binge-Watch Live This is My Architecture Videos from AWS re:Invent

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/binge-watch-live-videos-from-aws-reinvent-2019/

AWS re:Invent 2019 was a whirlwind of activity, especially in the Expo Hall, where the AWS team spent four days filming 12 live This is My Architecture videos for Twitch. Watch one a day for the next two weeks…or eat them all in one sitting. Whichever you do, you’re guaranteed to learn something new.

Accolade

Discover security and operational excellence in healthcare with Accolade.

AWS Solution Builders

Get multi-region Availability with Amazon DynamoDB, Amazon S3, and Amazon Cognito.

EcoFit

EcoFit offers responsive, AWS Lambda-based microservices at scale.

Splunk

Splunk explains data at scale by decoupling compute and storage.

Crownpeak

Crownpeak uses AWS Lambda for its decoupled content deployment architecture.

Formula One Group

Learn how Formula One Group is using Amazon SageMaker to deliver real-time insights to fans.

Adobe

Adobe is simplifying networking across thousands of AWS accounts with AWS Transit Gateway.

The Trade Desk

The Trade Desk offers real-time ad bidding in the cloud with AWS Global Accelerator.

Mueller Water Products

Learn all about scalable ingestion of sensor data for municipal water conservation with Mueller Water Products.

NextRoll

NextRoll is driving OpEx efficiency for ad bidding engines.

Pason Systems

Explore petabyte-scale drilling datamart on AWS with Pason Systems.

UltraServe

Application vending machine with runtime event control at UltraServe.

 

Be sure to visit the AWS channel on Twitch for more in-depth videos and interviews.

TMA Special: Connecting Taza Chocolate’s Legacy Equipment to the Cloud

Post Syndicated from Todd Escalona original https://aws.amazon.com/blogs/architecture/tma-special-connecting-taza-chocolates-legacy-equipment-to-the-cloud/

As a “bean to bar” chocolate manufacturer, Taza Chocolate uses traditional stone ground mills for the production of its famous chocolate discs. The analog, mid-century machines that the company imported from Central America were never built to connect to the cloud.

Along comes Tulip Interfaces, an AWS Industrial Software Competency Partner that makes the human and machine interaction easier by replacing paper processes with digital automation. Tulip retrofitted Taza’s legacy equipment with Internet of Things (IoT) sensors and connected it back to the AWS cloud.

Taza’s AWS cloud integration begins with Tulip’s own physical gateway that connects systems and machinery on the plant floor. Tulip then deploys IoT sensors to the machinery and passes outputs to the AWS cloud using an encrypted web socket, where Tulip’s Kubernetes workers, managed by Kops, automatically schedule services across highly available instances and process requests.

All job completion data is then fed to an Amazon RDS Multi-AZ PostgreSQL database that allows Taza to run visualizations and analytics for more insight using Prometheus and Grafana. In addition, all of the application definition metadata is contained in a MongoDB database service running on Amazon Elastic Compute Cloud (Amazon EC2) instances, which in turn is VPC-peered with the Kubernetes clusters. On top of this backend, Tulip uses a player application to stream metrics in near real time; these are displayed on a dashboard on the shop floor and can be easily examined to help guide operations and foster continuous improvement efforts in manufacturing.

Taza has realized many benefits from monitoring machine availability, performance, ambient conditions as well as overall process enhancements.

In this special, on-site This is My Architecture video, AWS Solutions Architect Evangelist Todd Escalona takes us on his journey through the Taza Chocolate factory where he meets with Taza’s Director of Manufacturing, Rich Moran, and Tulip’s DevOps lead, John Defreitas, to further explore how Tulip enables Taza Chocolate’s legacy equipment for cloud-based plant automation.

Check out more videos in the This Is My Architecture series.

Top 10 Architecture Blog Posts of 2019

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/top-10-architecture-blog-posts-of-2019/

As we wind our way toward 2020, I want to take a moment to first thank you, our readers, for spending time on our blog. We grew our audience quite a bit this year and the credit goes to our hard-working Solutions Architects and other blog post writers. Below are the top 10 Architecture blog posts written in 2019.

#10: How to Architect APIs for Scale and Security

by George Mao

George Mao, a Specialist Solutions Architect at AWS, focuses on serverless computing and has FIVE posts in the top ten this year. Way to go, George!

This post was the first in a series that focused on best practices and concepts you should be familiar with when you architect APIs for your applications.

Read George’s post.

#9: From One to Many: Evolving VPC Guidance

by Androski Spicer

Since its inception, the Amazon Virtual Private Cloud (VPC) has acted as the embodiment of security and privacy for customers who are looking to run their applications in a controlled, private, secure, and isolated environment.

This logically isolated space has evolved, and in its evolution has increased the avenues that customers can take to create and manage multi-tenant environments with multiple integration points for access to resources on-premises.

Read Androski’s post.

#8: Things to Consider When You Build REST APIs with Amazon API Gateway

by George Mao

This post dives deeper into the things an API architect or developer should consider when building REST APIs with Amazon API Gateway.

Read George’s post.

#7: How to Design Your Serverless Apps for Massive Scale

by George Mao

Serverless is one of the hottest design patterns in the cloud today, allowing you to focus on building and innovating, rather than worrying about the heavy lifting of server and OS operations. In this series of posts, we’ll discuss topics that you should consider when designing your serverless architectures. First, we’ll look at architectural patterns designed to achieve massive scale with serverless.

Read George’s post.

#6: Best Practices for Developing on AWS Lambda

by George Mao

RDS instance: When to VPC enable a Lambda function

One of the benefits of using Lambda is that you don’t have to worry about server and infrastructure management. This means AWS will handle the heavy lifting needed to execute your AWS Lambda functions. Take advantage of this architecture with the tips in this post.

Read George’s post.

#5: Stream Amazon CloudWatch Logs to a Centralized Account for Audit and Analysis

by David Bailey

Figure 1 - Initial Landing Zone logging account resources

A key component of enterprise multi-account environments is logging. Centralized logging provides a single point of access to all salient logs generated across accounts and regions, and is critical for auditing, security and compliance. While some customers use the built-in ability to push Amazon CloudWatch Logs directly into Amazon Elasticsearch Service for analysis, others would prefer to move all logs into a centralized Amazon Simple Storage Service (Amazon S3) bucket location for access by several custom and third-party tools. In this blog post, David Bailey will show you how to forward existing and any new CloudWatch Logs log groups created in the future to a cross-account centralized logging Amazon S3 bucket.

Read David’s post.

#4: Updates to Serverless Architectural Patterns and Best Practices

by Drew Dennis

Drew wrote this post at about the halfway point between re:Invent 2018 and re:Invent 2019, where he revisited some of the recent serverless announcements we’ve made. These are all complementary to the patterns discussed in the re:Invent architecture track’s Serverless Architectural Patterns and Best Practices session.

Read Drew’s post.

#3: Understanding the Different Ways to Invoke Lambda Functions

by George Mao

Invoking Lambda

In George’s first post of this series (#7 on this list), he talked about general design patterns to enable massive scale with serverless applications. In this post, he’ll review the different ways you can invoke Lambda functions and what you should be aware of with each invocation model.

Read George’s post.

#2: Using API Gateway as a Single Entry Point for Web Applications and API Microservices

by Anandprasanna Gaitonde and Mohit Malik

In this post, Anand and Mohit talk about a reference architecture that allows API Gateway to act as single entry point for external-facing, API-based microservices and web applications across multiple external customers by leveraging a different subdomain for each one.

Read Anand’s and Mohit’s post.

#1: 10 Things Serverless Architects Should Know

by Justin Pirtle

Building on the first three parts of the AWS Lambda scaling and best practices series where you learned how to design serverless apps for massive scale, AWS Lambda’s different invocation models, and best practices for developing with AWS Lambda, Justin invited you to take your serverless knowledge to the next level by reviewing 10 topics to deepen your serverless skills.

Read Justin’s post.

Thank You

Thanks again to all our readers and blog post writers. We look forward to learning and building amazing things together in the coming year.

Best of 2019

re:Invent 2019: Introducing the Amazon Builders’ Library (Part II)

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/reinvent-2019-introducing-the-amazon-builders-library-part-ii/

In last week’s post, I told you about a new site we introduced at re:Invent at the beginning of this month, the Amazon Builders’ Library, a site that’s chock-full of articles by senior technical leaders that help you understand the underpinnings of both Amazon.com and AWS.

Below are four more architecture-based articles that describe how Amazon develops, architects, releases, and operates technology.

Caching Challenges and Strategies

Caching challenges

Over years of building services at Amazon we’ve experienced various versions of the following scenario: We build a new service, and this service needs to make some network calls to fulfill its requests. Perhaps these calls are to a relational database, or an AWS service like Amazon DynamoDB, or to another internal service. In simple tests or at low request rates the service works great, but we notice a problem on the horizon. The problem might be that calls to this other service are slow or that the database is expensive to scale out as call volume increases. We also notice that many requests are using the same downstream resource or the same query results, so we think that caching this data could be the answer to our problems. We add a cache and our service appears much improved. We observe that request latency is down, costs are reduced, and small downstream availability drops are smoothed over. After a while, no one can remember life before the cache. Dependencies reduce their fleet sizes accordingly, and the database is scaled down. Just when everything appears to be going well, the service could be poised for disaster. There could be changes in traffic patterns, failure of the cache fleet, or other unexpected circumstances that could lead to a cold or otherwise unavailable cache. This in turn could cause a surge of traffic to downstream services that can lead to outages both in our dependencies and in our service.

Read the full article by Matt Brinkley, Principal Engineer, and Jas Chhabra, Principal Engineer

Avoiding Fallback in Distributed Systems

Avoiding fallback

Critical failures prevent a service from producing useful results. For example, in an ecommerce website, if a database query for product information fails, the website cannot display the product page successfully. Amazon services must handle the majority of critical failures in order to be reliable.

This article covers fallback strategies and why we almost never use them at Amazon. You might find this surprising. After all, engineers often use the real world as a starting point for their designs. And in the real world, fallback strategies must be planned in advance and used when necessary. Let’s say an airport’s display boards go out. A contingency plan (such as humans writing flight information on whiteboards) must be in place to handle this situation, because passengers still need to find their gates. But consider how awful the contingency plan is: the difficulty of reading the whiteboards, the difficulty of keeping them up-to-date, and the risk that humans will add incorrect information. The whiteboard fallback strategy is necessary but it’s riddled with problems.

Read the full article by Jacob Gabrielson, Senior Principal Engineer

Leader Elections in Distributed Systems

Leader election is the simple idea of giving one thing (a process, host, thread, object, or human) in a distributed system some special powers. Those special powers could include the ability to assign work, the ability to modify a piece of data, or even the responsibility of handling all requests in the system.

Leader election is a powerful tool for improving efficiency, reducing coordination, simplifying architectures, and reducing operations. On the other hand, leader election can introduce new failure modes and scaling bottlenecks. In addition, leader election may make it more difficult for you to evaluate the correctness of a system.

Because of these complications, we carefully consider other options before implementing leader election. For data processing and workflows, workflow services like AWS Step Functions can achieve many of the same benefits as leader election and avoid many of its risks. For other systems, we often implement idempotent APIs, optimistic locking, and other patterns that make a single leader unnecessary.

In this article, I discuss some of the pros and cons of leader election in general and how Amazon approaches leader election in our distributed systems, including insights into leader failure.

Read the full article by Marc Brooker, Senior Principal Engineer

Workload Isolation Using Shuffle-Sharding

Not long after AWS began offering services, AWS customers made clear that they wanted to be able to use our Amazon Simple Storage Service (S3), Amazon CloudFront, and Elastic Load Balancing services at the “root” of their domain, that is, for names like “amazon.com” and not just for names like “www.amazon.com”.

That may seem very simple. However, due to a design decision in the DNS protocol, made back in the 1980s, it’s harder than it seems. DNS has a feature called CNAME that allows the owner of a domain to offload a part of their domain to another provider to host, but it doesn’t work at the root or top level of a domain. To serve our customers’ needs, we’d have to actually host our customers’ domains. When we host a customer’s domain, we can return whatever the current set of IP addresses are for Amazon S3, Amazon CloudFront, or Elastic Load Balancing. These services are constantly expanding and adding IP addresses, so it’s not something that customers could easily hard-code in their domain configurations either.

It’s no small task to host DNS. If DNS is having problems, an entire business can be offline. However, after we identified the need, we set out to solve it in the way that’s typical at Amazon: urgently. We carved out a small team of engineers, and we got to work.

Read the full article by Colm MacCárthaigh, Senior Principal Engineer

Want to learn more about the Amazon Builders’ Library? Visit our FAQ.

Next week we’ll wrap up 2019 with a top ten list of the most-visited Architecture blog posts of 2019.

re:Invent 2019: Introducing the Amazon Builders’ Library (Part I)

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/reinvent-2019-introducing-the-amazon-builders-library-part-i/

Today, I’m going to tell you about a new site we launched at re:Invent, the Amazon Builders’ Library, a collection of living articles covering topics across architecture, software delivery, and operations. You get to peek under the hood of how Amazon architects, releases, and operates the software underpinning Amazon.com and AWS.

Want to know how Amazon.com does what it does? This is for you. In this two-part series (the next one coming December 23), I’ll highlight some of the best architecture articles written by Amazon’s senior technical leaders and engineers.

Avoiding insurmountable queue backlogs

In queueing theory, the behavior of queues when they are short is relatively uninteresting. After all, when a queue is short, everyone is happy. It’s only when the queue is backlogged, when the line to an event goes out the door and around the corner, that people start thinking about throughput and prioritization.

In this article, I discuss strategies we use at Amazon to deal with queue backlog scenarios – design approaches we take to drain queues quickly and to prioritize workloads. Most importantly, I describe how to prevent queue backlogs from building up in the first place. In the first half, I describe scenarios that lead to backlogs, and in the second half, I describe many approaches used at Amazon to avoid backlogs or deal with them gracefully.

Read the full article by David Yanacek – Principal Engineer

Timeouts, retries, and backoff with jitter

Whenever one service or system calls another, failures can happen. These failures can come from a variety of factors. They include servers, networks, load balancers, software, operating systems, or even mistakes from system operators. We design our systems to reduce the probability of failure, but it’s impossible to build systems that never fail. So at Amazon, we design our systems to tolerate and reduce the probability of failure, and to avoid magnifying a small percentage of failures into a complete outage. To build resilient systems, we employ three essential tools: timeouts, retries, and backoff.
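As a rough illustration of retries combined with backoff and jitter, the sketch below computes a capped exponential delay and draws the actual wait uniformly below that ceiling (often called "full jitter"). The function names, base delay, cap, and attempt count are assumptions chosen for the example, not values from the article, and timeouts are left out for brevity.

```python
import random
import time


def backoff_with_jitter(attempt, base=0.1, cap=10.0):
    """Return a delay in seconds for a 0-based retry attempt.

    The ceiling doubles with each attempt up to `cap`; the delay is
    drawn uniformly from [0, ceiling] so that clients do not retry in
    lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)


def call_with_retries(operation, max_attempts=3, sleep=time.sleep):
    """Call `operation` until it succeeds or attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            sleep(backoff_with_jitter(attempt))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.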

Read the full article by Marc Brooker, Senior Principal Engineer

Challenges with distributed systems

The moment we added our second server, distributed systems became the way of life at Amazon. When I started at Amazon in 1999, we had so few servers that we could give some of them recognizable names like “fishy” or “online-01”. However, even in 1999, distributed computing was not easy. Then as now, challenges with distributed systems involved latency, scaling, understanding networking APIs, marshalling and unmarshalling data, and the complexity of algorithms such as Paxos. As the systems quickly grew larger and more distributed, what had been theoretical edge cases turned into regular occurrences.

Developing distributed utility computing services, such as reliable long-distance telephone networks, or Amazon Web Services (AWS) services, is hard. Distributed computing is also weirder and less intuitive than other forms of computing because of two interrelated problems. Independent failures and nondeterminism cause the most impactful issues in distributed systems. In addition to the typical computing failures most engineers are used to, failures in distributed systems can occur in many other ways. What’s worse, it’s impossible always to know whether something failed.

Read the full article by Jacob Gabrielson, Senior Principal Engineer

Static stability using Availability Zones

At Amazon, the services we build must meet extremely high availability targets. This means that we need to think carefully about the dependencies that our systems take. We design our systems to stay resilient even when those dependencies are impaired. In this article, we’ll define a pattern that we use called static stability to achieve this level of resilience. We’ll show you how we apply this concept to Availability Zones, a key infrastructure building block in AWS and therefore a bedrock dependency on which all of our services are built.

Read the full article by Becky Weiss, Senior Principal Engineer, and Mike Furr, Principal Engineer

Check back in two weeks to read about some other architecture-based expert articles that let you in on how Amazon does what it does.

Decoupled Serverless Scheduler To Run HPC Applications At Scale on EC2

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/decoupled-serverless-scheduler-to-run-hpc-applications-at-scale-on-ec2/

This post is written by Ludvig Nordstrom and Mark Duffield | on November 27, 2019

In this blog post, we dive into a cloud native approach for running HPC applications at scale on EC2 Spot Instances, using a decoupled serverless scheduler. This architecture is ideal for many workloads in the HPC and EDA industries, and can be used for any batch job workload.

At the end of this blog post, you will have two takeaways.

  1. A highly scalable environment that can run on hundreds of thousands of cores across EC2 Spot Instances.
  2. A fully serverless architecture for job orchestration.

We discuss deploying and running a pre-built serverless job scheduler that can run both Windows and Linux applications using any executable file format for your application. This environment provides high performance, scalability, cost efficiency, and fault tolerance. We introduce best practices and benefits of creating this environment, and cover the architecture, running jobs, and integration into existing environments.

A quick note about the term cloud native: we use the term loosely in this blog. Here, cloud native means we use AWS Services (including serverless and microservices) to build out our compute environment, instead of a traditional lift-and-shift method.

Let’s get started!

 

Solution overview

This blog goes over the deployment process, which leverages AWS CloudFormation. This allows you to use infrastructure as code to automatically build out your environment. There are two parts to the solution: the Serverless Scheduler and Resource Automation. Below are quick summaries of each part of the solution.

Part 1 – The serverless scheduler

This first part of the blog builds out a serverless workflow to get jobs from SQS and run them across EC2 instances. The CloudFormation template being used for Part 1 is serverless-scheduler-app.template, and here is the Reference Architecture:

 

    Figure 1: Serverless Scheduler Reference Architecture (grayed-out area is covered in Part 2).

Read the GitHub Repo if you want to look at the Step Functions workflow contained in the preceding image. The walkthrough explains how the serverless application retrieves and runs jobs on its worker, updates the DynamoDB job monitoring table, and manages the worker for its lifetime.

 

Part 2 – Resource automation with serverless scheduler


This part of the solution relies on the serverless scheduler built in Part 1 to run jobs on EC2. Part 2 simplifies submitting and monitoring jobs, and retrieving results for users. Jobs are spread across our cost-optimized Spot Instances. AWS Auto Scaling automatically scales up the compute resources when jobs are submitted, then terminates them when jobs are finished. Both of these save you money.

The CloudFormation template used in Part 2 is resource-automation.template. Building on Figure 1, the additional resources launched with Part 2 are noted in the following image: an S3 bucket, an Auto Scaling group, and two Lambda functions.

 

Figure 2: Resource Automation using Serverless Scheduler

                               

Introduction to decoupled serverless scheduling

HPC schedulers traditionally run in a classic master and worker node configuration. A scheduler on the master node orchestrates jobs on worker nodes. This design has been successful for decades; however, many powerful schedulers are evolving to meet the demands of HPC workloads. This scheduler design evolved from a necessity to run orchestration logic on one machine, but there are now options to decouple this logic.

What are the possible benefits that decoupling this logic could bring? First, we avoid a number of shortfalls in the environment, such as the need for all worker nodes to communicate with a single master node. This single point of communication limits scalability and creates a single point of failure. When we split the scheduler into decoupled components, both of these issues disappear.

Second, in an effort to work around these pain points, traditional schedulers had to create extremely complex logic to manage all workers concurrently in a single application. This stifled the ability to customize and improve the code, restricting changes to those made by the software provider’s engineering teams.

Serverless services, such as AWS Step Functions and AWS Lambda, fix these major issues. They allow you to decouple the scheduling logic so that it has a one-to-one mapping with each worker, while the workers share an Amazon Simple Queue Service (SQS) job queue. We define our scheduling workflow in AWS Step Functions. Then the workflow scales out to potentially thousands of “state machines.” These state machines act as wrappers around each worker node and manage each worker node individually. Our code is less complex because we only consider one worker and its job.

We illustrate the differences between a traditional shared scheduler and decoupled serverless scheduler in Figures 3 and 4.

 

Figure 3: Traditional Scheduler Model

 

Figure 4: Decoupled Serverless Scheduler on each instance

 

Each decoupled serverless scheduler will:

  • Retrieve and pass jobs to its worker
  • Monitor its worker’s health and take action if needed
  • Confirm job success by checking output logs and retry jobs if needed
  • Terminate the worker when the job queue is empty, just before terminating itself
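The four responsibilities above can be sketched as a single loop for one worker. This is only an illustration under stated assumptions: the real scheduler is a Step Functions workflow, whereas here an in-memory `queue.Queue` stands in for the SQS job queue, `run_job` stands in for running commands on the worker, and the status strings are hypothetical. It also assumes that "retry" counts re-runs after the first attempt.

```python
import queue


def run_worker_scheduler(job_queue, run_job, worker_is_healthy=lambda: True):
    """Drain `job_queue` for one worker and return a {job_id: status} report.

    job_queue         -- queue.Queue of job dicts (stand-in for SQS)
    run_job           -- callable(job) -> output log text (stand-in for the worker)
    worker_is_healthy -- callable() -> bool (stand-in for a health check)
    """
    report = {}
    while True:
        try:
            job = job_queue.get_nowait()       # retrieve the next job
        except queue.Empty:
            break                               # nothing left to do
        if not worker_is_healthy():
            job_queue.put(job)                  # hand the job back to the queue
            report["worker"] = "terminated: unhealthy"
            return report
        attempts_left = int(job.get("retry", 0)) + 1   # first run plus retries
        status = "Failed"
        while attempts_left > 0:
            output = run_job(job)
            # Confirm success by searching the output log for the success string.
            if job.get("job_success_string", "") in output:
                status = "Complete"
                break
            attempts_left -= 1
        report[job["job_id"]] = status
    report["worker"] = "terminated: queue empty"
    return report
```

Because the loop only ever reasons about one worker and one job at a time, it stays small, which is the main point of decoupling the scheduler.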

With this new scheduler model, there are many benefits. Decoupling schedulers into smaller schedulers increases fault tolerance because any issue only affects one worker. Additionally, each scheduler consists of independent AWS Lambda functions, which maintain state on separate hardware and build retry logic into the service. Scalability also increases, because jobs are not dependent on a master node, which enables the geographic distribution of jobs. This geographic distribution allows you to optimize use of low-cost Spot Instances. Also, when decoupling the scheduler, workflow complexity decreases and you can customize scheduler logic. You can leverage lower latency job monitoring and customize automated responses to job events as they happen.

 

Benefits

  • Fully managed – With Part 2 (Resource Automation) deployed, resources for a job are managed. When a job is submitted, resources launch and run the job. When the job is done, worker nodes automatically shut down. This prevents you from incurring continuous costs.

 

  • Performance – Your application runs on EC2, which means you can choose any of the high performance instance types. Input files are automatically copied from Amazon S3 into local Amazon EC2 Instance Store for high performance storage during execution. Result files are automatically moved to S3 after each job finishes.

 

  • Scalability – A worker node combined with a scheduler state machine becomes a stateless entity. You can spin up as many of these entities as you want, and point them to an SQS queue. You can even distribute worker and state machine pairs across multiple AWS Regions. These two components paired with fully managed services optimize your architecture for scalability to meet your desired number of workers.

 

  • Fault Tolerance – The solution is completely decoupled, which means each worker has its own state machine that handles scheduling for that worker. Likewise, each state machine is decoupled into Lambda functions that make up your state machine. Additionally, the scheduler workflow includes a Lambda function that confirms each successful job or resubmits jobs.

 

  • Cost Efficiency – This fault tolerant environment is perfect for EC2 Spot Instances. This means you can save up to 90% on your workloads compared to On-Demand Instance pricing. The scheduler workflow ensures little to no idle time of workers by closely monitoring and sending new jobs as jobs finish. Because the scheduler is serverless, you only incur costs for the resources required to launch and run jobs. Once the job is complete, all resources are terminated automatically.

 

  • Agility – You can use AWS fully managed Developer Tools to quickly release changes and customize workflows. The reduced complexity of a decoupled scheduling workflow means that you don’t have to spend time managing a scheduling environment, and can instead focus on your applications.

 

 

Part 1 – serverless scheduler as a standalone solution

 

If you use the serverless scheduler as a standalone solution, you can build clusters and leverage shared storage such as FSx for Lustre, EFS, or S3. Additionally, you can use AWS CloudFormation to deploy more complex compute architectures that suit your application. So, the EC2 Instances that run the serverless scheduler can be launched in any number of ways. The scheduler only requires the instance ID and the SQS job queue name.

 

Submitting Jobs Directly to serverless scheduler

The serverless scheduler app is a fully built AWS Step Functions workflow that pulls jobs from an SQS queue and runs them on an EC2 Instance. The jobs submitted to SQS consist of an AWS Systems Manager Run Command, and work with any SSM Document and command that you choose for your jobs. Examples of SSM documents are AWS-RunShellScript and AWS-RunPowerShellScript. Feel free to read more about Running Commands Using Systems Manager Run Command.

The following code shows the format of a job submitted to SQS in JSON.

  {
    "job_id": "jobId_0",
    "retry": "3",
    "job_success_string": " ",
    "ssm_document": "AWS-RunPowerShellScript",
    "commands":
        [
            "cd C:\\ProgramData\\Amazon\\SSM; mkdir Result",
            "Copy-S3Object -Bucket my-bucket -KeyPrefix jobs/date/jobId_0 -LocalFolder .\\",
            "C:\\ProgramData\\Amazon\\SSM\\jobId_0.bat",
            "Write-S3Object -Bucket my-bucket -KeyPrefix jobs/date/jobId_0 -Folder .\\Result\\"
        ]
  }
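If you generate job definitions programmatically, a small helper avoids hand-escaping backslashes and stray commas. The sketch below builds a message body with the same five fields as the job format shown above. The helper name and its default values are assumptions for illustration; the resulting string could then be sent to the job queue, for example as the `MessageBody` of a boto3 SQS `send_message` call.

```python
import json


def build_job_message(job_id, commands, ssm_document="AWS-RunShellScript",
                      retry=3, job_success_string=" "):
    """Return the JSON text for one job definition."""
    if not commands:
        raise ValueError("a job needs at least one command")
    job = {
        "job_id": job_id,
        "retry": str(retry),            # the example job passes retry as a string
        "job_success_string": job_success_string,
        "ssm_document": ssm_document,
        "commands": list(commands),
    }
    # json.dumps takes care of escaping backslashes in Windows paths.
    return json.dumps(job, indent=2)
```

For example, `build_job_message("jobId_0", ["echo Hello World"])` returns a body for a one-command Linux job.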

 

Any EC2 Instance associated with a serverless scheduler receives jobs picked up from a designated SQS queue until the queue is empty. Then, the EC2 resource automatically terminates. If a job fails, it is retried until it reaches the number of retries specified in the job definition. You can include a specific string value that the scheduler searches for in job execution outputs to confirm the successful completion of jobs.

 

Tagging EC2 workers to get a serverless scheduler state machine

In Part 1 of the deployment, you must manage your EC2 Instance launch and termination. When launching an EC2 Instance, tag it with a specific tag key that triggers a state machine to manage that instance. The tag value is the name of the SQS queue that you want your state machine to poll jobs from.

In the following example, “my-scheduler-cloudformation-stack-name” is the tag key that the serverless scheduler app looks for on any new EC2 instance that starts. Next, “my-sqs-job-queue-name” is the default job queue created with the scheduler. However, you can change this to any queue name you want to retrieve jobs from when an instance is launched.

{"my-scheduler-cloudformation-stack-name":"my-sqs-job-queue-name"}
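To make the tag convention concrete, here is a minimal sketch of how a scheduler could decide which queue a newly started instance should poll. The function name is hypothetical; the input is assumed to be the list of `{"Key": ..., "Value": ...}` dictionaries that EC2 returns when describing an instance's tags.

```python
def queue_for_instance(tags, scheduler_tag_key):
    """Return the SQS queue name an instance should poll, or None.

    tags              -- list of {"Key": ..., "Value": ...} dicts
    scheduler_tag_key -- the tag key the scheduler watches for
    """
    for tag in tags:
        if tag.get("Key") == scheduler_tag_key:
            # An empty tag value means there is no queue to poll.
            return tag.get("Value") or None
    return None
```

An instance without the scheduler's tag key is simply ignored, which lets managed and unmanaged instances coexist in the same account.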

 

Monitor jobs in DynamoDB

You can monitor job status in the DynamoDB job monitoring table. In the table you can find job_id, commands sent to Amazon EC2, job status, job output logs from Amazon EC2, and retries, among other things.

Alternatively, you can query DynamoDB for a given job_id via the AWS Command Line Interface:

aws dynamodb get-item --table-name job-monitoring \
                      --key '{"job_id": {"S": "/my-jobs/my-job-id.bat"}}'
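The CLI prints the item in DynamoDB's typed JSON, where every attribute is wrapped in a type tag such as `{"S": ...}` or `{"N": ...}`. The sketch below flattens that output into a plain Python dictionary. It handles only the string, number, boolean, null, list, and map types, and any attribute names other than job_id in the usage are assumptions, since the table schema is not listed here.

```python
import json
from decimal import Decimal


def unmarshal(value):
    """Convert one typed attribute value ({"S": "x"}, {"N": "1"}, ...) to Python."""
    (type_tag, raw), = value.items()   # each attribute value has exactly one type tag
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)            # numbers arrive as strings
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [unmarshal(v) for v in raw]
    if type_tag == "M":
        return {k: unmarshal(v) for k, v in raw.items()}
    raise ValueError(f"unsupported attribute type: {type_tag}")


def parse_get_item(cli_output):
    """Return the item from `aws dynamodb get-item` output, or None if absent."""
    item = json.loads(cli_output).get("Item")
    if item is None:
        return None
    return {name: unmarshal(value) for name, value in item.items()}
```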

 

Using the “job_success_string” parameter

For the prior DynamoDB table, we submitted two identical jobs using an example script that you can also use. The command sent to the instance is “echo Hello World.” The output from this job should be “Hello World.” We also specified three allowed job retries. The following image shows the two jobs in the SQS queue before they ran. Look closely at the different “job_success_string” values for each and the identical command sent to both:

DynamoDB CLI info This shows an example DynamoDB CLI output with job information.

From the image we see that Job2 was successful and Job1 retried three times before being permanently labeled as failed. We forced this outcome to demonstrate how the job success string works by submitting Job1 with “job_success_string” as “Hello EVERYONE”, which does not appear in the job output “Hello World.” In Job2 we set “job_success_string” as “Hello” because we knew this string would be in the output log.

Job outputs commonly have text that only appears if the job succeeded. You can also add this text yourself in your executable file. With “job_success_string,” you can confirm a job’s successful output, and use it to identify a certain value that you are looking for across jobs.
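The Job1 and Job2 outcome can be reproduced with a one-line check. This sketch assumes the scheduler performs a plain, case-sensitive substring search of the output log; the function name is hypothetical.

```python
def job_succeeded(output_log, job_success_string):
    """True when the success string occurs anywhere in the job's output log."""
    return job_success_string in output_log


output = "Hello World"            # what "echo Hello World" prints
results = {
    "Job1": job_succeeded(output, "Hello EVERYONE"),
    "Job2": job_succeeded(output, "Hello"),
}
print(results)  # {'Job1': False, 'Job2': True}
```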

 

Part 2 – Resource Automation with the serverless scheduler

The additional services we deploy in Part 2 integrate with existing architectures to launch resources for your serverless scheduler. These services allow you to submit jobs simply by uploading input files and executable files to an S3 bucket.

Likewise, these additional resources can use any executable file format you want, including proprietary application level scripts. The solution automates everything else. This includes creating and submitting jobs to the SQS job queue, spinning up compute resources when new jobs come in, and taking them back down when there are no jobs to run. When jobs are done, result files are copied to S3 for the user to retrieve. Similar to Part 1, you can still view the DynamoDB table for job status.

This architecture makes it easy to scale out to different teams and departments, and you can submit potentially hundreds of thousands of jobs while you remain in control of resources and cost.

 

Deeper Look at the S3 Architecture

The following diagram shows how you can submit jobs, monitor progress, and retrieve results. To submit jobs, upload all the needed input files and an executable script to S3. The suffix of the executable file (uploaded last) triggers an S3 event to start the process, and this suffix is configurable.

The S3 key of the executable file acts as the job id, and is kept as a reference to that job in DynamoDB. The Lambda (#2 in diagram below) uses the S3 key of the executable to create three SSM Run Commands.

  1. Synchronize all files in the same S3 folder to a working directory on the EC2 Instance.
  2. Run the executable file on EC2 Instances within a specified working directory.
  3. Synchronize the EC2 Instance’s working directory back to the S3 bucket, so that newly generated result files are included.

This Lambda (#2) then places the job on the SQS queue using the scheduler’s JSON-formatted job definition seen above.
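The three commands derived from the executable's S3 key can be sketched as follows. This is an illustration rather than the Lambda's actual code: the function name and working directory are hypothetical, and it assumes a Linux worker with the AWS CLI installed and a shell-script executable.

```python
import posixpath


def build_job_commands(bucket, executable_key, working_dir="/tmp/job"):
    """Return (job_id, commands) for an executable uploaded to S3.

    The commands mirror the three steps above: sync the job folder down,
    run the executable, then sync the working directory back up.
    """
    key = executable_key.lstrip("/")
    folder = posixpath.dirname(key)
    executable = posixpath.basename(key)
    if not folder:
        raise ValueError("the executable must live in its own job folder")
    s3_folder = f"s3://{bucket}/{folder}"
    commands = [
        f"aws s3 sync {s3_folder} {working_dir}",
        f"cd {working_dir} && bash ./{executable}",
        f"aws s3 sync {working_dir} {s3_folder}",
    ]
    # The S3 key of the executable doubles as the job id.
    return executable_key, commands
```

Because both sync commands operate on the whole folder, the sketch also shows why each job needs its own folder: any unrelated file in the same prefix would be copied to the worker too.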

IMPORTANT: Each set of job files should be given a unique job folder in S3 or more files than needed might be moved to the EC2 Instance.

 

Figure 5: Resource Automation using Serverless Scheduler – A deeper look

 

The EC2 and Step Functions workflow uses the Lambda function (#3 in the prior diagram) and the Auto Scaling group to scale out based on the number of jobs in the queue, up to a maximum number of workers (each with its state machine) as defined in the Auto Scaling group. When the job queue is empty, the number of running instances scales down to 0 as they finish their remaining jobs.
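The scale-out decision can be sketched as a small pure function. It assumes one worker per queued job up to the group maximum, and it never lowers the target, because workers shut themselves down once the queue is empty. The function name and the one-worker-per-job ratio are assumptions for illustration.

```python
def scale_out_target(jobs_in_queue, running_workers, max_workers):
    """Return the desired worker count for the Auto Scaling group.

    Only scales out: scale-in happens when each worker finishes its
    remaining jobs and terminates on its own.
    """
    if min(jobs_in_queue, running_workers, max_workers) < 0:
        raise ValueError("counts must be non-negative")
    wanted = min(jobs_in_queue, max_workers)
    return max(wanted, min(running_workers, max_workers))
```

With a maximum of 10 workers, 500 queued jobs yield a target of 10, while an empty queue leaves the current worker count untouched.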

 

Process for Submitting Jobs and Retrieving Results

  1. As seen in step 1, upload input file(s) and an executable file into a unique job folder in S3 (such as /year/month/day/jobid/~job-files). Upload the executable file last because it automatically starts the job. You can also use a script to upload multiple files at a time, but each job needs a unique directory. There are many ways to make S3 buckets available to users, including AWS Storage Gateway, AWS Transfer for SFTP, AWS DataSync, the AWS Console, or any one of the AWS SDKs leveraging S3 API calls.
  2. You can monitor job status by accessing the DynamoDB table directly via the AWS Management Console, or by using the AWS CLI to call DynamoDB via an API call.
  3. As seen in step 5, you can retrieve result files for jobs from the same S3 directory where you left the input files. The DynamoDB table confirms when jobs are done. The SQS output queue can be used by applications that must automatically poll and retrieve results.
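If you script the upload in step 1, two details matter: every job gets its own folder, and the executable goes up last so that it triggers the job only after its inputs are in place. The helpers below sketch both. The names, the date-based folder layout, and the ".sh" suffix are assumptions for illustration (the suffix that triggers the S3 event is configurable).

```python
from datetime import date


def job_folder(job_id, day):
    """Return a unique S3 prefix of the form year/month/day/jobid."""
    return f"{day:%Y/%m/%d}/{job_id}"


def upload_order(filenames, executable_suffix=".sh"):
    """Return filenames with input files first and the executable last."""
    inputs = [f for f in filenames if not f.endswith(executable_suffix)]
    executables = [f for f in filenames if f.endswith(executable_suffix)]
    if len(executables) != 1:
        raise ValueError("expected exactly one executable file per job folder")
    return inputs + executables
```

An upload script would then iterate over `upload_order(...)` and write each file under `job_folder(...)`.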

You no longer need to create or access compute nodes, because compute resources automatically scale up from zero when jobs come in, and then back down to zero when jobs are finished.

 

Deployment

Read the GitHub Repo for deployment instructions. Below are CloudFormation templates to help:

AWS Region
eu-north-1
ap-south-1
eu-west-3
eu-west-2
eu-west-1
ap-northeast-3
ap-northeast-2
ap-northeast-1
sa-east-1
ca-central-1
ap-southeast-1
ap-southeast-2
eu-central-1
us-east-1
us-east-2
us-west-1
us-west-2

 

 

Additional Points on Usage Patterns

 

  • While the two solutions in this blog are aimed at HPC applications, they can be used to run any batch jobs. Many customers that run large data processing batch jobs in their data lakes could use the serverless scheduler.

 

  • You can build pipelines of different applications when the output of one job triggers another to do something else – an example being pre-processing, meshing, simulation, post-processing. You simply deploy the Resource Automation template several times, and tailor it so that the output bucket for one step is the input bucket for the next step.

 

  • You might use the “job_success_string” parameter for iteration or verification in cases where a shot-gun approach is needed to run thousands of jobs, and only one has a chance of producing the right result. In this case the “job_success_string” would identify the successful job from potentially hundreds of thousands pushed to the SQS job queue.

 

Scale-out across teams and departments

Because all services used are serverless, you can deploy as many run environments as needed without increasing overall costs. Serverless workloads only accumulate cost when the services are used. So, you could deploy ten job environments and run one job in each, and your costs would be the same as if you had one job environment running ten jobs.

 

All you need is an S3 bucket to upload jobs to and an associated AMI that has the right applications and license configuration. Because a job configuration is passed to the scheduler at each job start, you can add new teams by creating an S3 bucket and pointing S3 events to a default Lambda function that pulls configurations for each job start.

 

Set up a CI/CD pipeline to start continuous improvement of the scheduler

If you are an advanced user, we encourage you to clone the git repo and customize this solution. The serverless scheduler is less complex than other schedulers, because you only think about one worker and the process of one job’s run.

Ways you could tailor this solution:

  • Add intelligent job scheduling using Amazon SageMaker – It is hard to find data as ready for ML as log data because every job you run has different run times and resource consumption. So, you could tailor this solution to use ML to predict the best instance type when workloads are submitted.
  • Add Custom Licensing Checkout Logic – Simply add one Lambda function to your Step Functions workflow to make an API call to a license server before continuing with one or more jobs. You can start a new worker when you have a license checked out; if a license is not available, the instance can terminate so that you don’t pay while waiting for licenses.
  • Add Custom Metrics to DynamoDB – You can easily add metrics to DynamoDB because the solution already has baseline logging and monitoring capabilities.
  • Run on other AWS Services – There is a Lambda function in the Step Functions workflow called “Start_Job”. You can tailor this Lambda to run your jobs on Amazon SageMaker, Amazon EMR, Amazon EKS, or Amazon ECS instead of EC2.

 

Conclusion

 

Although HPC workloads and EDA flows may still be dependent on current scheduling technologies, we illustrated the possibilities of decoupling your workloads from your existing shared scheduling environments. This post went deep into decoupled serverless scheduling, and we understand that it is difficult to unwind decades of dependencies. However, leveraging numerous AWS Services encourages you to think completely differently about running workloads.

But more importantly, it encourages you to Think Big. With this solution you can get up and running quickly, fail fast, and iterate. You can do this while scaling to your required number of resources, when you want them, and only pay for what you use.

Serverless computing catalyzes change across all industries, but that change is not obvious in the HPC and EDA industries. This solution is an opportunity for customers to take advantage of the nearly limitless capacity that AWS offers.

Please reach out with questions about HPC and EDA on AWS. You now have the architecture and the instructions to build your Serverless Decoupled Scheduling environment.  Go build!


About the Authors and Contributors

Authors 

 

Ludvig Nordstrom is a Senior Solutions Architect at AWS

 

 

 

 

Mark Duffield is a Tech Lead in Semiconductors at AWS

 

 

 

Contributors

 

Steve Engledow is a Senior Solutions Builder at AWS

 

 

 

 

Arun Thomas is a Senior Solutions Builder at AWS

 

 

AWS Architecture Monthly Magazine: Manufacturing

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/aws-architecture-monthly-magazine-manufacturing/

Architecture Monthly Magazine - Nov-Dec 2019

For more than 25 years, Amazon has designed and manufactured smart products and distributed billions of products through its globally connected distribution network using cutting edge automation, machine learning and AI, and robotics, with AWS at its core. From product design to smart factory and smart products, AWS helps leading manufacturers transform their manufacturing operations with the most comprehensive and advanced set of cloud solutions available today, while taking advantage of the highest level of security.

In this Manufacturing-themed end-of-year issue of the AWS Architecture Monthly magazine, Steve Blackwell, AWS Manufacturing Tech Leader, talks about how manufacturers can experiment with and take advantage of emerging technologies using three main architectural patterns: demand forecasting, smart factories, and extending the manufacturing value chain with smart products.

In This Issue

We’ve assembled architectural best practices about Manufacturing from all over AWS, and we’ve made sure that a broad audience can appreciate it. Note that this will be our last issue of the year. We’ll be back in January with highlights and insights about AWS re:Invent 2019 (December 2-6 in Las Vegas).

  • Case Study: iRobot Ready to Unlock the Next Generation of Smart Homes Using the AWS Cloud
  • Ask an Expert: Steve Blackwell, Manufacturing Tech Leader
  • Blog Post: Reinventing the IoT Platform for Discrete Manufacturers
  • Solution: Smart Product Solution
  • AWS Coffee Break: IoT Helps Manufacturing Hit the Right Note
  • Whitepaper: Practical Ways To Achieve Smarter, Faster, and More Responsive Operations
  • Reference Architecture: EDA on AWS with IBM Spectrum LSF

How to Access the Magazine

We hope you’re enjoying Architecture Monthly, and we’d like to hear from you. Leave us a star rating and a comment on the Amazon Kindle Newsstand page, or contact us anytime at [email protected].

Serverless at AWS re:Invent 2019

Post Syndicated from George Mao original https://aws.amazon.com/blogs/architecture/serverless-at-aws-reinvent-2019/

Our annual AWS re:Invent conference is just two weeks away! We can’t wait to meet you for an AWSome week in Las Vegas. The Serverless team is now hard at work preparing to deliver over 130 sessions at re:Invent. Come meet us and learn about how to use the newest Serverless innovations to build and architect for modern applications.


Breakouts, Talks, Builders, & Demos!

To find any Serverless session, you can search our Agenda for the keyword “SVS” or visit our re:Invent 2019 Session Catalog. Let’s take a look at some of the Architecture-focused sessions you might want to join:

Workshops

  • SVS305-R: How to secure your Serverless APIs
    You’ll get hands-on with Amazon API Gateway and learn how to architect for scale and security.
  • SVS303-R: Monolith to Serverless
    This workshop shows you how to re-architect monolithic applications to AWS Lambda-based microservices.

Breakouts

  • SVS308: Moving to event-driven architectures
    Learn about the new event-driven world and how our newest tools help you develop event-centric applications.
  • SVS407: Architecting and operating resilient Serverless systems
    This is an excellent session to learn best practice patterns for building reliable applications.
  • SVS401: Optimizing your Serverless applications
    Learn how to choose the correct services in your architecture and how to design your Lambda functions and APIs for security and scale.

Chalk Talks

  • SVS338: API Patterns and architectures (REST vs GraphQL APIs)
    We’ll help you evaluate your choices for modern APIs. Come learn how to choose between REST and GraphQL APIs.
  • SVS213: Thinking Serverless
    How do you go from a flowchart to a Serverless application? Come to this session to learn the techniques you can use to design Serverless architectures.
  • SVS323: Mastering AWS Lambda streaming event sources
    This talk will go in depth on the common architecture patterns for consuming and scaling Amazon Kinesis and Amazon DynamoDB streams with AWS Lambda.

Builders Sessions

  • SVS330: Build secure Serverless mobile or web applications
    Get hands-on experience building a serverless web application using AWS AppSync, AWS Lambda, Amazon API Gateway, and Amazon DynamoDB.

Come Meet Us

Don’t forget to stop by our Serverless expert booth in the main Expo Hall. We will have many people from the Serverless team ready to speak with you!

Our Serverless team, including specialist solutions architects and developer advocates, will be onsite throughout the week. We’d love to meet you, hear about your projects, and help with any architecture questions. Reach out to Sam Dengler, Brian McNamara, Chris Munns, Eric Johnson, James Beswick, and me, George Mao. See you onsite!

See You in Las Vegas!

I can’t wait to meet you in Las Vegas and hear about your projects. Please reach out to us and let’s chat about Serverless! As a side note, reserved seating is available for all sessions, so be sure to log in to your re:Invent account to reserve a seat and join us for all kinds of Serverless architecture discussions and hands-on training.

FogHorn: Edge-to-Edge Communication and Deep Learning

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/foghorn-edge-to-edge-communication-and-deep-learning/

FogHorn is an intelligent Internet of Things (IoT) edge solution that delivers data processing and real-time inference where data is created. Referring to itself as “the only ‘real’ edge intelligence solution in the market today,” FogHorn is powered by a hyper-efficient Complex Event Processor (CEP). It delivers comprehensive data enrichment and real-time analytics on high volumes, varieties, and velocities of streaming sensor data, and it is optimized for constrained compute footprints and limited connectivity.

Andrea Sabet, AWS Solutions Architect, speaks with Ramya Ravichandar, Vice President of Products at FogHorn, about how FogHorn integrates with IoT MQTT for edge-to-edge communication as well as Amazon SageMaker for deep learning model deployment. The edgefication process involves running inference with real-time streaming data against a trained deep learning model. Drifts in the model accuracy trigger a callback to SageMaker for retraining.
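
The drift-triggered retraining loop described above can be sketched roughly as follows. This is a minimal illustration and not FogHorn's implementation: the `DriftMonitor` class, the window size, the accuracy threshold, and the `on_drift` callback are all hypothetical names and values chosen for the example. In a real deployment the callback would presumably contact SageMaker to start retraining; here it only records the request.

```python
from collections import deque
from typing import Callable, Deque, List


class DriftMonitor:
    """Hypothetical helper: tracks rolling accuracy of edge inferences
    and invokes a callback when accuracy drops below a threshold."""

    def __init__(self, window: int, threshold: float,
                 on_drift: Callable[[float], None]) -> None:
        # Keep only the most recent `window` correctness results.
        self.results: Deque[bool] = deque(maxlen=window)
        self.threshold = threshold
        self.on_drift = on_drift

    def record(self, predicted: int, actual: int) -> None:
        """Record one inference result and check for drift."""
        self.results.append(predicted == actual)
        # Evaluate only once a full window of results is available.
        if len(self.results) == self.results.maxlen:
            accuracy = sum(self.results) / len(self.results)
            if accuracy < self.threshold:
                self.on_drift(accuracy)
                # Start a fresh window after requesting retraining.
                self.results.clear()


# Stand-in for the retraining callback (assumption: a real system
# would notify the training service instead of appending to a list).
retrain_requests: List[float] = []
monitor = DriftMonitor(window=4, threshold=0.75,
                       on_drift=retrain_requests.append)

# Simulated (predicted, actual) pairs from a sensor stream.
for predicted, actual in [(1, 1), (0, 1), (1, 0), (1, 1)]:
    monitor.record(predicted, actual)
```

Clearing the window after each callback is one possible design choice; it avoids sending repeated retraining requests for the same run of poor predictions.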

Check out more videos in the This Is My Architecture series.