Tag Archives: Amazon DynamoDB

ICYMI: Serverless pre:Invent 2019

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/icymi-serverless-preinvent-2019/

With Contributions from Chris Munns – Sr Manager – Developer Advocacy – AWS Serverless

The last two weeks have been a frenzy of AWS service and feature launches, building up to AWS re:Invent 2019. As a lot has been announced, we thought we’d ship an ICYMI post summarizing the serverless service-specific features that have been launched. We’ve also dropped in some announcements from services that are commonly used in serverless application architectures or development.

AWS re:Invent

We also want you to know that we’ll be talking about many of these features (as well as those coming) in sessions at re:Invent.

Here’s what’s new!

AWS Lambda

On September 3, AWS Lambda started rolling out a major improvement to how AWS Lambda functions work with your Amazon VPC networks. This change brings both scale and performance improvements, and addresses several of the limitations of the previous networking model with VPCs.

On November 25, Lambda announced that the rollout of this new capability has completed in 6 additional regions including US East (Virginia) and US West (Oregon).

New VPC to VPC NAT for Lambda functions

On November 18, Lambda announced three new runtime updates. Lambda now supports Node.js 12, Java 11, and Python 3.8. Each of these new runtimes has new language features and benefits so be sure to check out the linked release posts. These new runtimes are all based on an Amazon Linux 2 execution environment.
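If you want to move an existing function onto one of the new runtimes, it is a one-line change; a hedged AWS CLI sketch (the function name is a placeholder):

# Hypothetical example: switch an existing function to the Node.js 12 runtime
aws lambda update-function-configuration \
    --function-name my-function \
    --runtime nodejs12.x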

Lambda has released a number of controls for both stream and async based invocations:

  • For Lambda functions consuming events from Amazon Kinesis Data Streams or Amazon DynamoDB Streams, it’s now possible to limit the retry count, limit the age of records being retried, configure a failure destination, or split a batch to isolate a problem record. These capabilities will help you deal with potential “poison pill” records that would previously cause streams to pause in processing.
  • For asynchronous Lambda invocations, you can now set the maximum event age and retry attempts on the event. If either configured condition is met, the event can be routed to a dead letter queue (DLQ), Lambda destination, or it can be discarded.

In addition to the above controls, Lambda Destinations is a new feature that allows developers to designate an asynchronous target for Lambda function invocation results. You can set one destination for a success, and another for a failure. This unlocks useful patterns for distributed event-based applications and reduces the code needed to send function results to a destination manually.
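As a rough sketch of how you might wire up a destination pair from the AWS CLI (the function name and ARNs are placeholders, not part of the announcement):

# Hypothetical example: route successful results to an SQS queue and failures to an SNS topic
aws lambda put-function-event-invoke-config \
    --function-name my-function \
    --destination-config '{
        "OnSuccess": {"Destination": "arn:aws:sqs:us-east-1:123456789012:success-queue"},
        "OnFailure": {"Destination": "arn:aws:sns:us-east-1:123456789012:failure-topic"}
    }'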

Lambda Destinations

Lambda also now supports setting a Parallelization Factor, which allows you to set multiple Lambda invocations per shard for Amazon Kinesis Data Streams and Amazon DynamoDB Streams. This allows for faster processing without the need to increase your shard count, while still guaranteeing the order of records processed.

Lambda Parallelization Factor diagram

Lambda now supports Amazon SQS FIFO queues as an event source. FIFO queues guarantee the order of record processing, unlike standard queues, which are unordered. FIFO queues support message grouping via the MessageGroupId attribute, which allows for parallel Lambda consumers of a single FIFO queue. This allows for high throughput of record processing by Lambda.
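For illustration, subscribing a function to a FIFO queue is a single event source mapping; a hedged CLI sketch (the queue ARN and function name are placeholders):

# Hypothetical example: map an SQS FIFO queue to a Lambda function
# (for SQS event sources, the maximum batch size is 10 records)
aws lambda create-event-source-mapping \
    --function-name my-function \
    --batch-size 10 \
    --event-source-arn arn:aws:sqs:us-east-1:123456789012:my-queue.fifo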

Lambda now supports Environment Variables in AWS China (Beijing) Region, operated by Sinnet and the AWS China (Ningxia) Region, operated by NWCD.

Lastly, you can now view percentile statistics for the duration metric of your Lambda functions. Percentile statistics tell you the relative standing of a value in a dataset, and are useful when applied to metrics that exhibit large variances. They can help you understand the distribution of a metric, spot outliers, and find hard-to-spot situations that create a poor customer experience for a subset of your users.
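For example, you could pull the p95 and p99 duration for a single function with the CloudWatch CLI; a hedged sketch (the function name and time range are placeholders):

# Hypothetical example: retrieve p95/p99 duration for one function over one day
aws cloudwatch get-metric-statistics \
    --namespace AWS/Lambda \
    --metric-name Duration \
    --dimensions Name=FunctionName,Value=my-function \
    --start-time 2019-11-24T00:00:00Z \
    --end-time 2019-11-25T00:00:00Z \
    --period 3600 \
    --extended-statistics p95 p99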

AWS SAM CLI

AWS SAM CLI deploy command

The SAM CLI team simplified the bucket management and deployment process in the SAM CLI. You no longer need to manage a bucket for deployment artifacts – SAM CLI handles this for you. The deployment process has also been streamlined from multiple flagged commands to a single command, sam deploy.
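A typical deployment loop now looks roughly like this hedged sketch; the first guided run prompts for a stack name and region and saves your answers for later runs:

# Build the application, then deploy it
sam build

# First deployment: answer the guided prompts; the choices are saved for reuse
sam deploy --guided

# Subsequent deployments reuse the saved configuration
sam deploy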

AWS Step Functions

One of the powerful features of Step Functions is its ability to integrate directly with AWS services without you needing to write complicated application code. Step Functions has expanded its integration with Amazon SageMaker to simplify machine learning workflows, and added a new integration with Amazon EMR, making it faster to build and easier to monitor EMR big data processing workflows.

Step Functions step with EMR

Step Functions now provides the ability to track state transition usage by integrating with AWS Budgets, allowing you to monitor and react to usage and spending trends on your AWS accounts.

You can now view CloudWatch Metrics for Step Functions at a one-minute frequency. This makes it easier to set up detailed monitoring for your workflows. You can use one-minute metrics to set up CloudWatch Alarms based on your Step Functions API usage, Lambda functions, service integrations, and execution details.
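As an illustration of the one-minute granularity, here is a hedged CLI sketch of an alarm on failed executions of a state machine (the ARNs and threshold are placeholders):

# Hypothetical example: alarm if any execution fails within a one-minute period
aws cloudwatch put-metric-alarm \
    --alarm-name my-state-machine-failures \
    --namespace AWS/States \
    --metric-name ExecutionsFailed \
    --dimensions Name=StateMachineArn,Value=arn:aws:states:us-east-1:123456789012:stateMachine:my-state-machine \
    --statistic Sum \
    --period 60 \
    --evaluation-periods 1 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts-topic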

AWS Step Functions now supports higher throughput workflows, making it easier to coordinate applications with high event rates.

In US East (N. Virginia), US West (Oregon), and EU (Ireland), throughput has increased from 1,000 state transitions per second to 1,500 state transitions per second with bucket capacity of 5,000 state transitions. The default start rate for state machine executions has also increased from 200 per second to 300 per second, with bucket capacity of up to 1,300 starts in these regions.

In all other regions, throughput has increased from 400 state transitions per second to 500 state transitions per second with bucket capacity of 800 state transitions. The default start rate for AWS Step Functions state machine executions has also increased from 25 per second to 150 per second, with bucket capacity of up to 800 state machine executions.

Amazon SNS

Amazon SNS now supports the use of dead letter queues (DLQs) to help capture unhandled events. By enabling a DLQ, you can catch events that are not processed and re-submit them, or analyze them to locate processing issues.
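The DLQ is configured per subscription through a redrive policy; a hedged CLI sketch (the ARNs are placeholders, and the queue policy must also allow the topic to send messages to it):

# Hypothetical example: attach an SQS queue as the DLQ for an SNS subscription
aws sns set-subscription-attributes \
    --subscription-arn arn:aws:sns:us-east-1:123456789012:my-topic:12345678-aaaa-bbbb-cccc-123456789012 \
    --attribute-name RedrivePolicy \
    --attribute-value '{"deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:my-dlq"}'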

Amazon CloudWatch

CloudWatch announced Amazon CloudWatch ServiceLens to provide a “single pane of glass” to observe health, performance, and availability of your application.

CloudWatch ServiceLens

CloudWatch also announced a preview of a capability called Synthetics. CloudWatch Synthetics allows you to test your application endpoints and URLs using configurable scripts that mimic what a real customer would do. This enables the outside-in view of your customers’ experiences, and your service’s availability from their point of view.

On November 18, CloudWatch launched Embedded Metric Format to help you ingest complex, high-cardinality application data in the form of logs and easily generate actionable metrics from them. You can publish these metrics from your Lambda function by using the PutLogEvents API, or, for Node.js or Python based applications, by using an open source library.
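At its core, an EMF record is a JSON log line with an _aws metadata object that tells CloudWatch which fields to extract as metrics. A minimal sketch of the structure, with illustrative namespace, dimension, and metric names:

{
  "_aws": {
    "Timestamp": 1574438400000,
    "CloudWatchMetrics": [
      {
        "Namespace": "MyApplication",
        "Dimensions": [["ServiceName"]],
        "Metrics": [{"Name": "ProcessingLatency", "Unit": "Milliseconds"}]
      }
    ]
  },
  "ServiceName": "order-service",
  "ProcessingLatency": 83,
  "RequestId": "id-1234"
}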

Lastly, CloudWatch announced a preview of Contributor Insights, a capability to identify who or what is impacting your system or application performance by identifying outliers or patterns in log data.

AWS X-Ray

X-Ray announced trace maps, which enable you to map the end-to-end path of a single request. Identifiers show issues and how they affect other services in the request’s path. These can help you identify and isolate service points that are causing degradation or failures.

X-Ray also announced support for Amazon CloudWatch Synthetics, currently in preview. X-Ray supports tracing canary scripts throughout the application providing metrics on performance or application issues.

X-Ray Service map with CloudWatch Synthetics

Amazon DynamoDB

DynamoDB announced support for customer managed customer master keys (CMKs) to encrypt data in DynamoDB. This allows you to bring your own key (BYOK), giving you full control over how you encrypt and manage the security of your DynamoDB data.
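Switching a table from the default encryption to a customer managed CMK is a single update; a hedged CLI sketch (the table name and key ARN are placeholders):

# Hypothetical example: encrypt a table with a customer managed KMS key
aws dynamodb update-table \
    --table-name my-table \
    --sse-specification Enabled=true,SSEType=KMS,KMSMasterKeyId=arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab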

It is now possible to add global replicas to existing DynamoDB tables to provide enhanced availability across the globe.

Currently in preview is another new DynamoDB capability to identify frequently accessed keys and database traffic trends. With this, you can more easily identify “hot keys” and understand usage of your DynamoDB tables.

CloudWatch Contributor Insights for DynamoDB

Last but far from least for DynamoDB is adaptive capacity, a feature that helps you handle imbalanced workloads by automatically isolating frequently accessed items and shifting data across partitions to rebalance them. This helps reduce cost by enabling you to provision throughput for a more balanced workload instead of over-provisioning for uneven data access patterns.

AWS Serverless Application Repository

The AWS Serverless Application Repository (SAR) now offers Verified Author badges. These badges enable consumers to quickly and reliably know who you are. The badge will appear next to your name in the SAR and will deep-link to your GitHub profile.

SAR Verified developer badges

AWS Code Services

AWS CodeCommit launched the ability for you to enforce rule workflows for pull requests, making it easier to ensure that code has passed through specific rule requirements. You can now create an approval rule specifically for a pull request, or create approval rule templates to be applied to all future pull requests in a repository.

AWS CodeBuild added beta support for test reporting. With test reporting, you can now view the detailed results, trends, and history for tests executed on CodeBuild for any framework that supports the JUnit XML or Cucumber JSON test format.

CodeBuild test trends

AWS Amplify and AWS AppSync

Instead of trying to summarize all the awesome things that our peers over in the Amplify and AppSync teams have done recently we’ll instead link you to their own recent summary: “A round up of the recent pre-re:Invent 2019 AWS Amplify Launches”.

AWS AppSync

Still looking for more?

We only covered a small bit of all the awesome new things that were recently announced. Keep your eyes peeled for more exciting announcements next week during re:Invent and for a future ICYMI Serverless Q4 roundup. We’ll also be kicking off a fresh series of Tech Talks in 2020 with new content helping to dive deeper on everything new coming out of AWS for serverless application developers.

Decoupled Serverless Scheduler To Run HPC Applications At Scale on EC2

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/decoupled-serverless-scheduler-to-run-hpc-applications-at-scale-on-ec2/

This post is written by Ludvig Nordstrom and Mark Duffield | on November 27, 2019

In this blog post, we dive into a cloud native approach for running HPC applications at scale on EC2 Spot Instances, using a decoupled serverless scheduler. This architecture is ideal for many workloads in the HPC and EDA industries, and can be used for any batch job workload.

At the end of this blog post, you will have two takeaways.

  1. A highly scalable environment that can run on hundreds of thousands of cores across EC2 Spot Instances.
  2. A fully serverless architecture for job orchestration.

We discuss deploying and running a pre-built serverless job scheduler that can run both Windows and Linux applications using any executable file format for your application. This environment provides high performance, scalability, cost efficiency, and fault tolerance. We introduce best practices and benefits of creating this environment, and cover the architecture, running jobs, and integration into existing environments.

A quick note about the term cloud native: we use the term loosely in this blog. Here, cloud native means we use AWS services (including serverless and microservices) to build out our compute environment, instead of a traditional lift-and-shift method.

Let’s get started!

 

Solution overview

This blog goes over the deployment process, which leverages AWS CloudFormation. This allows you to use infrastructure as code to automatically build out your environment. There are two parts to the solution: the Serverless Scheduler and Resource Automation. Below are quick summaries of each part of the solution.

Part 1 – The serverless scheduler

This first part of the blog builds out a serverless workflow to get jobs from SQS and run them across EC2 instances. The CloudFormation template being used for Part 1 is serverless-scheduler-app.template, and here is the Reference Architecture:

 

    Figure 1: Serverless Scheduler Reference Architecture (grayed-out area is covered in Part 2).

Read the GitHub Repo if you want to look at the Step Functions workflow shown in the preceding image. The walkthrough explains how the serverless application retrieves and runs jobs on its worker, updates the DynamoDB job monitoring table, and manages the worker for its lifetime.

 

Part 2 – Resource automation with serverless scheduler


This part of the solution relies on the serverless scheduler built in Part 1 to run jobs on EC2. Part 2 simplifies submitting and monitoring jobs, and retrieving results for users. Jobs are spread across our cost-optimized Spot Instances. EC2 Auto Scaling automatically scales up the compute resources when jobs are submitted, then terminates them when jobs are finished. Both of these save you money.

The CloudFormation template used in Part 2 is resource-automation.template. Building on Figure 1, the additional resources launched with Part 2 are noted in the following image: an S3 bucket, an Auto Scaling group, and two Lambda functions.


 

Figure 2: Resource Automation using Serverless Scheduler

                               

Introduction to decoupled serverless scheduling

HPC schedulers traditionally run in a classic master and worker node configuration. A scheduler on the master node orchestrates jobs on worker nodes. This design has been successful for decades; however, many powerful schedulers are evolving to meet the demands of HPC workloads. This scheduler design evolved from a necessity to run orchestration logic on one machine, but there are now options to decouple this logic.

What are the possible benefits that decoupling this logic could bring? First, we avoid a number of shortfalls in the environment such as the need for all worker nodes to communicate with a single master node. This single source of communication limits scalability and creates a single point of failure. When we split the scheduler into decoupled components both these issues disappear.

Second, in an effort to work around these pain points, traditional schedulers had to create extremely complex logic to manage all workers concurrently in a single application. This stifled the ability to customize and improve the code – restricting changes to be made by the software provider’s engineering teams.

Serverless services, such as AWS Step Functions and AWS Lambda, fix these major issues. They allow you to decouple the scheduling logic to have a one-to-one mapping with each worker, and instead share an Amazon Simple Queue Service (SQS) job queue. We define our scheduling workflow in AWS Step Functions. Then the workflow scales out to potentially thousands of “state machines.” These state machines act as wrappers around each worker node and manage each worker node individually. Our code is less complex because we only consider one worker and its job.

We illustrate the differences between a traditional shared scheduler and decoupled serverless scheduler in Figures 3 and 4.

 


Figure 3: Traditional Scheduler Model

 


Figure 4: Decoupled Serverless Scheduler on each instance

 

Each decoupled serverless scheduler will:

  • Retrieve and pass jobs to its worker
  • Monitor its worker’s health and take action if needed
  • Confirm job success by checking output logs and retry jobs if needed
  • Terminate the worker when the job queue is empty, just before also terminating itself

With this new scheduler model, there are many benefits. Decoupling schedulers into smaller schedulers increases fault tolerance because any issue only affects one worker. Additionally, each scheduler consists of independent AWS Lambda functions, which maintain state on separate hardware and build retry logic into the service. Scalability also increases, because jobs are not dependent on a master node, which enables the geographic distribution of jobs. This geographic distribution allows you to optimize use of low-cost Spot Instances. Also, when decoupling the scheduler, workflow complexity decreases and you can customize scheduler logic. You can leverage lower latency job monitoring and customize automated responses to job events as they happen.

 

Benefits

  • Fully managed – With Part 2 (Resource Automation) deployed, resources for a job are fully managed. When a job is submitted, resources launch and run the job. When the job is done, worker nodes automatically shut down. This prevents you from incurring continuous costs.

 

  • Performance – Your application runs on EC2, which means you can choose any of the high performance instance types. Input files are automatically copied from Amazon S3 into local Amazon EC2 Instance Store for high performance storage during execution. Result files are automatically moved to S3 after each job finishes.

 

  • Scalability – A worker node combined with a scheduler state machine becomes a stateless entity. You can spin up as many of these entities as you want, and point them to an SQS queue. You can even distribute worker and state machine pairs across multiple AWS Regions. These two components paired with fully managed services optimize your architecture for scalability to meet your desired number of workers.

 

  • Fault Tolerance – The solution is completely decoupled, which means each worker has its own state machine that handles scheduling for that worker. Likewise, each state machine is decoupled into Lambda functions that make up your state machine. Additionally, the scheduler workflow includes a Lambda function that confirms each successful job or resubmits jobs.

 

  • Cost Efficiency – This fault tolerant environment is perfect for EC2 Spot Instances. This means you can save up to 90% on your workloads compared to On-Demand Instance pricing. The scheduler workflow ensures little to no idle time of workers by closely monitoring and sending new jobs as jobs finish. Because the scheduler is serverless, you only incur costs for the resources required to launch and run jobs. Once the job is complete, all are terminated automatically.

 

  • Agility – You can use AWS fully managed Developer Tools to quickly release changes and customize workflows. The reduced complexity of a decoupled scheduling workflow means that you don’t have to spend time managing a scheduling environment, and can instead focus on your applications.

 

 

Part 1 – serverless scheduler as a standalone solution

 

If you use the serverless scheduler as a standalone solution, you can build clusters and leverage shared storage such as FSx for Lustre, EFS, or S3. Additionally, you can use AWS CloudFormation to deploy more complex compute architectures that suit your application. The EC2 instances that run the serverless scheduler can be launched in any number of ways; the scheduler only requires the instance ID and the SQS job queue name.

 

Submitting Jobs Directly to serverless scheduler

The serverless scheduler app is a fully built AWS Step Functions workflow to pull jobs from an SQS queue and run them on an EC2 instance. The jobs submitted to SQS consist of an AWS Systems Manager Run Command, and work with any SSM document and command that you choose for your jobs. Examples of SSM Run Command documents are AWS-RunShellScript and AWS-RunPowerShellScript. Feel free to read more about Running Commands Using Systems Manager Run Command.

The following code shows the format of a job submitted to SQS in JSON.

  {
    "job_id": "jobId_0",
    "retry": "3",
    "job_success_string": " ",
    "ssm_document": "AWS-RunPowerShellScript",
    "commands":
        [
            "cd C:\\ProgramData\\Amazon\\SSM; mkdir Result",
            "Copy-S3object -Bucket my-bucket -KeyPrefix jobs/date/jobId_0 -LocalFolder .\\",
            "C:\\ProgramData\\Amazon\\SSM\\jobId_0.bat",
            "Write-S3object -Bucket my-bucket -KeyPrefix jobs/date/jobId_0 -Folder .\\Result\\"
        ]
  }

 

Any EC2 instance associated with the serverless scheduler receives jobs picked up from a designated SQS queue until the queue is empty. Then, the EC2 resource automatically terminates. If a job fails, it is retried until it reaches the number of retries specified in the job definition. You can include a specific string value so that the scheduler searches for it in the job execution output and confirms the successful completion of jobs.

 

Tagging EC2 workers to get a serverless scheduler state machine

In Part 1 of the deployment, you must manage your EC2 Instance launch and termination. When launching an EC2 Instance, tag it with a specific tag key that triggers a state machine to manage that instance. The tag value is the name of the SQS queue that you want your state machine to poll jobs from.

In the following example, “my-scheduler-cloudformation-stack-name” is the tag key that the serverless scheduler app watches for on any new EC2 instance that starts. Next, “my-sqs-job-queue-name” is the default job queue created with the scheduler. However, you can change this to any queue name you want to retrieve jobs from when an instance is launched.

{"my-scheduler-cloudformation-stack-name":"my-sqs-job-queue-name"}
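If you launch workers outside of CloudFormation, you could apply the tag with the AWS CLI; a hedged sketch (the instance ID is a placeholder):

# Hypothetical example: tag a running worker so a scheduler state machine picks it up
aws ec2 create-tags \
    --resources i-0123456789abcdef0 \
    --tags Key=my-scheduler-cloudformation-stack-name,Value=my-sqs-job-queue-name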

 

Monitor jobs in DynamoDB

You can monitor job status in the following DynamoDB table. In the table you can find the job_id, commands sent to Amazon EC2, job status, job output logs from Amazon EC2, and retries, among other things.

Alternatively, you can query DynamoDB for a given job_id via the AWS Command Line Interface:

aws dynamodb get-item --table-name job-monitoring \

                      --key '{"job_id": {"S": "/my-jobs/my-job-id.bat"}}'

 

Using the “job_success_string” parameter

For the prior DynamoDB table, we submitted two identical jobs using an example script that you can also use. The command sent to the instance is “echo Hello World,” so the output from this job should be “Hello World.” We also specified three allowed job retries. In the following image, there are two jobs in the SQS queue before they ran. Look closely at the different “job_success_string” values for each and the identical command sent to both:

Example DynamoDB CLI output with job information.

From the image, we see that Job2 was successful and Job1 retried three times before being permanently labelled as failed. We forced this outcome to demonstrate how the job success string works: we submitted Job1 with “job_success_string” set to “Hello EVERYONE,” which does not appear in the job output “Hello World.” For Job2, we set “job_success_string” to “Hello” because we knew this string would be in the output log.

Job outputs commonly have text that only appears if the job succeeded. You can also add this text yourself in your executable file. With “job_success_string,” you can confirm a job’s successful output, and use it to identify a certain value that you are looking for across jobs.

 

Part 2 – Resource Automation with the serverless scheduler

The additional services we deploy in Part 2 integrate with existing architectures to launch resources for your serverless scheduler. These services allow you to submit jobs simply by uploading input files and executable files to an S3 bucket.

Likewise, these additional resources can use any executable file format you want, including proprietary application level scripts. The solution automates everything else. This includes creating and submitting jobs to SQS job queue, spinning up compute resources when new jobs come in, and taking them back down when there are no jobs to run. When jobs are done, result files are copied to S3 for the user to retrieve. Similar to Part 1, you can still view the DynamoDB table for job status.

This architecture makes it easy to scale out to different teams and departments, and you can submit potentially hundreds of thousands of jobs while you remain in control of resources and cost.

 

Deeper Look at the S3 Architecture

The following diagram shows how you can submit jobs, monitor progress, and retrieve results. To submit jobs, upload all the needed input files and an executable script to S3. The suffix of the executable file (uploaded last) triggers an S3 event to start the process, and this suffix is configurable.
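The provided CloudFormation template wires this trigger up for you; purely for illustration, an equivalent suffix-filtered S3 notification could be configured like this hedged sketch (bucket name, function ARN, and suffix are placeholders):

# Hypothetical example: invoke the job-submission Lambda whenever a .bat file lands in the bucket
aws s3api put-bucket-notification-configuration \
    --bucket my-job-bucket \
    --notification-configuration '{
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:submit-job",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".bat"}]}}
            }
        ]
    }'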

The S3 key of the executable file acts as the job id, and is kept as a reference to that job in DynamoDB. The Lambda (#2 in diagram below) uses the S3 key of the executable to create three SSM Run Commands.

  1. Synchronize all files in the same S3 folder to a working directory on the EC2 Instance.
  2. Run the executable file on EC2 Instances within a specified working directory.
  3. Synchronize the EC2 Instances working directory back to the S3 bucket where newly generated result files are included.

This Lambda function (#2) then places the job on the SQS queue using the scheduler’s JSON-formatted job definition seen above.

IMPORTANT: Each set of job files should be given a unique job folder in S3 or more files than needed might be moved to the EC2 Instance.

 


Figure 5: Resource Automation using Serverless Scheduler – A deeper look

 

The EC2 and Step Functions workflow uses the Lambda function (#3 in the prior diagram) and the Auto Scaling group to scale out based on the number of jobs in the queue, up to a maximum number of workers (plus state machines), as defined in the Auto Scaling group. When the job queue is empty, the number of running instances scales down to 0 as they finish their remaining jobs.

 

Process Submitting Jobs and Retrieving Results

  1. As seen in step 1, upload input file(s) and an executable file into a unique job folder in S3 (such as /year/month/day/jobid/~job-files). Upload the executable file last because it automatically starts the job. You can also use a script to upload multiple files at a time, but each job needs a unique directory. There are many ways to make S3 buckets available to users, including AWS Storage Gateway, AWS Transfer for SFTP, AWS DataSync, the AWS Management Console, or any of the AWS SDKs leveraging S3 API calls.
  2. You can monitor job status by accessing the DynamoDB table directly via the AWS Management Console, or use the AWS CLI to call DynamoDB via an API call.
  3. As seen in step 5, you can retrieve result files for jobs from the same S3 directory where you left the input files. The DynamoDB table confirms when jobs are done. The SQS output queue can be used by applications that must automatically poll and retrieve results.

You no longer need to create or access compute nodes as compute resources. These automatically scale up from zero when jobs come in, and then back down to zero when jobs are finished.

 

Deployment

Read the GitHub Repo for deployment instructions. Below are CloudFormation templates to help:

The CloudFormation Launch Stack links are available in the following AWS Regions: eu-north-1, ap-south-1, eu-west-3, eu-west-2, eu-west-1, ap-northeast-3, ap-northeast-2, ap-northeast-1, sa-east-1, ca-central-1, ap-southeast-1, ap-southeast-2, eu-central-1, us-east-1, us-east-2, us-west-1, and us-west-2.

 

 

Additional Points on Usage Patterns

 

  • While the two solutions in this blog are aimed at HPC applications, they can be used to run any batch jobs. Many customers that run large data processing batch jobs in their data lakes could use the serverless scheduler.

 

  • You can build pipelines of different applications when the output of one job triggers another to do something else – an example being pre-processing, meshing, simulation, post-processing. You simply deploy the Resource Automation template several times, and tailor it so that the output bucket for one step is the input bucket for the next step.

 

  • You might use the “job_success_string” parameter for iteration/verification in cases where a shotgun approach is needed to run thousands of jobs, and only one has a chance of producing the right result. In this case, the “job_success_string” would identify the successful job from the potentially hundreds of thousands pushed to the SQS job queue.

 

Scale-out across teams and departments

Because all services used are serverless, you can deploy as many run environments as needed without increasing overall costs. Serverless workloads only accumulate cost when the services are used. So, you could deploy ten job environments and run one job in each, and your costs would be the same as if you had one job environment running ten jobs.

 

All you need is an S3 bucket to upload jobs to and an associated AMI that has the right applications and license configuration. Because a job configuration is passed to the scheduler at each job start, you can add new teams by creating an S3 bucket and pointing S3 events to a default Lambda function that pulls configurations for each job start.

 

Setup CI/CD pipeline to start continuous improvement of scheduler

If you are advanced, we encourage you to clone the git repo and customize this solution. The serverless scheduler is less complex than other schedulers, because you only think about one worker and the process of one job’s run.

Ways you could tailor this solution:

  • Add intelligent job scheduling using Amazon SageMaker – It is hard to find data as ready for ML as log data, because every job you run has different run times and resource consumption. So, you could tailor this solution to use ML to predict the best instance to use when workloads are submitted.
  • Add Custom Licensing Checkout Logic – Simply add one Lambda function to your Step Functions workflow to make an API call to a license server before continuing with one or more jobs. You can start a new worker when you have a license checked out; if a license is not available, the instance can terminate to avoid accruing costs while waiting for licenses.
  • Add Custom Metrics to DynamoDB – You can easily add metrics to DynamoDB because the solution already has baseline logging and monitoring capabilities.
  • Run on other AWS services – There is a Lambda function in the Step Functions workflow called “Start_Job”. You can tailor this Lambda function to run your jobs on Amazon SageMaker, Amazon EMR, Amazon EKS, or Amazon ECS instead of EC2.

 

Conclusion

 

Although HPC workloads and EDA flows may still be dependent on current scheduling technologies, we illustrated the possibilities of decoupling your workloads from your existing shared scheduling environments. This post went deep into decoupled serverless scheduling, and we understand that it is difficult to unwind decades of dependencies. However, leveraging numerous AWS Services encourages you to think completely differently about running workloads.

But more importantly, it encourages you to Think Big. With this solution you can get up and running quickly, fail fast, and iterate. You can do this while scaling to your required number of resources, when you want them, and only pay for what you use.

Serverless computing catalyzes change across all industries, but that change is not obvious in the HPC and EDA industries. This solution is an opportunity for customers to take advantage of the nearly limitless capacity that AWS offers.

Please reach out with questions about HPC and EDA on AWS. You now have the architecture and the instructions to build your Serverless Decoupled Scheduling environment.  Go build!


About the Authors and Contributors

Authors

Ludvig Nordstrom is a Senior Solutions Architect at AWS

Mark Duffield is a Tech Lead in Semiconductors at AWS

Contributors

Steve Engledow is a Senior Solutions Builder at AWS

Arun Thomas is a Senior Solutions Builder at AWS

New AWS Lambda controls for stream processing and asynchronous invocations

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/new-aws-lambda-controls-for-stream-processing-and-asynchronous-invocations/

Today AWS Lambda is introducing new controls for asynchronous and stream processing invocations. These new features allow you to customize responses to Lambda function errors and build more resilient event-driven and stream-processing applications.

Stream processing function invocations

When processing data from event sources such as Amazon Kinesis Data Streams, and Amazon DynamoDB Streams, Lambda reads records in batches via shards. A shard is a uniquely identified sequence of data records. Your function is then invoked to process records from the batch “in order.” If an error is returned, Lambda retries the batch until processing succeeds or the data expires. This retry behavior is desirable in many cases, but not all:

  1. Until the issue is resolved, no data in the shard is processed. A single malformed record can prevent processing on an entire shard, since the “in order” guarantee ensures that failed batches are retried until the data record expires. This is often referred to as a “poison pill” event in that it prevents the rest of the system from processing data.
  2. In some cases, it may not be helpful to retry if subsequent invocations will also fail.
  3. If the function is not idempotent, retrying might produce unexpected side effects, and may result in duplicate outputs (for example, multiple database entries or business transactions).
Figure 1: Default stream invocation retry logs.

The new Lambda controls for stream processing invocations

With the new customizable controls, users are able to control how function errors and retries impact stream processing function invocations. A new event source mapping configuration subresource (shown below) allows developers to clearly specify which controls apply to stream invocations.

DestinationConfig: {
    OnFailure: {
       Destination: SNS/SQS arn (String)
    }
}
Figure 2: Stream processing destination configuration.
{
    "MaximumRetryAttempts": integer,
    "BisectBatchOnFunctionError" : boolean,
    "MaximumRecordAgeInSeconds" : integer
}
Figure 3: Stream processing failure configuration.

MaximumRetryAttempts

Set a maximum number of retry attempts for batches before they can be skipped to unblock processing and prevent duplicate outputs.

Minimum: 0 | maximum: 10,000 | default: 10,000

MaximumRecordAgeInSeconds

Define a maximum record age in seconds, with expired records skipped to allow processing to continue. Data records that are not successfully processed within the defined age of submission are skipped.

Minimum: 60 | default/maximum: 604,800

BisectBatchOnFunctionError

This gives the option to recursively split the failed batch and retry on a smaller subset of records, eventually isolating the records causing the error.

Default: false

On-failure Destination

When either MaximumRetryAttempts or MaximumRecordAgeInSeconds reaches its specified value, the record is skipped. If a Destination is set, metadata about the skipped records is sent to the target ARN (for example, Amazon SQS or Amazon SNS). If no target is configured, the record is dropped.
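Putting these together, an existing event source mapping can be updated with all of the new controls in one call; a hedged CLI sketch (the UUID and queue ARN are placeholders):

# Hypothetical example: retry once, skip records older than an hour,
# bisect failing batches, and send skipped-record metadata to SQS
aws lambda update-event-source-mapping \
    --uuid 2b733gdc-8ac3-cdf5-af3a-1827b3b11284 \
    --maximum-retry-attempts 1 \
    --maximum-record-age-in-seconds 3600 \
    --bisect-batch-on-function-error \
    --destination-config '{"OnFailure": {"Destination": "arn:aws:sqs:us-east-1:123456789012:failed-records"}}'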

Getting started with error handling for stream processing invocations

Here you will see how to use the new controls to create a customized error handling configuration for stream processing invocations. You will set the maximum retry attempts to 1, the maximum record age to 60s, and send the metadata of skipped records to an Amazon Simple Queue Service (SQS) queue. First create a Kinesis stream to put records into, and create an SQS queue to hold the metadata of retry-exhausted or expired data records.

  1. Go to the Lambda console and choose Create function.
  2. Ensure that Author from scratch is selected, and enter a function name such as “customStreamErrorExample.”
  3. For Runtime, select Node.js 12.x.
  4. Choose Create function.
    To enable Kinesis as an event trigger and to send metadata of a failed invocation to the SQS standard queue, you must give the Lambda execution role the required permissions.
  5. Scroll down to the Execution Role section and select the view link below the Existing role drop-down.
    Existing role selection
  6. Choose Add inline policy > JSON, then paste the following into the text box replacing {yourSQSarn} with the ARN of the SQS queue you created earlier and replacing {yourKinesisarn} with the ARN of the stream you created earlier. Choose Review policy.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "sqs:SendMessage",
                    "kinesis:GetShardIterator",
                    "kinesis:GetRecords",
                    "kinesis:DescribeStream"
                ],
                "Resource": [
                    "{yourSQSarn}",
                    "{yourKinesisarn}"
                ]
            },
            {
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": "kinesis:ListStreams",
                "Resource": "*"
            }
        ]
    }
    
  7. Give the policy a name such as CustomSQSKinesis and choose Create policy.
  8. Navigate back to your Lambda function and choose Add trigger.
  9. Select Kinesis from the drop-down. Select your stream in the Kinesis stream dropdown.
  10. To see the new control options, choose the drop-down arrow on the Additional settings section.
  11. Paste the ARN of the previously created SQS queue (refer to this link to find the ARN).
  12. Set the Maximum retry attempts to 1. Set the Maximum record age to 60. Choose Add.
    The Kinesis trigger has now been configured and the Lambda function has the required permissions to write to SQS and CloudWatch Logs.
  13. Choose the Lambda icon in the Designer section. Scroll down to the Function code section and paste the following code:
    exports.handler = async (event) => {
        // TODO implement
        console.log(event)
        const response = {
            statusCode: 200,
            body: event,dfh
        };
        return response;
    };

    This code has a syntax error in order to force a failure response.

    Next, you will trigger the Lambda function by adding a record to the Kinesis Stream via the AWS CLI. See how to install the AWS CLI if you have not previously done so.

  14. In your preferred terminal run the following command, replacing {YourStreamName} with the name of the stream you have created.
    aws kinesis put-record --stream-name {YourStreamName} --partition-key 123 --data testdata

    You should see a response similar to this (your sequence number will be different):
    Terminal response

  15. In the AWS Lambda console, choose Monitoring > View logs in CloudWatch and select the most recent log group.
  16. As you can see, the Lambda function failed as expected, but this time with only a single retry attempt. This is because the maximum retry attempts value was set to 1 and the record is younger than the maximum record age, which was set to 60.
  17. In the AWS Management Console navigate to your SQS queue, Services > SQS.
  18. Select your queue and choose Queue Actions > View/Delete Messages > Start Polling for Messages.

Here you will see the metadata of your failed invocation. It has successfully been sent to the SQS destination for further investigation or processing.

Asynchronous function invocations

When a function is asynchronously invoked, Lambda sends the event to a queue before it is processed by your function. Invocations that result in an exception in the function code are retried twice with a delay of one minute before the first retry and two minutes before the second. Some invocations may never run due to a throttle or service fault: these are retried with exponential backoff until the invocation is six hours old.

Default asynchronous invocation retry logs

This retry behaviour, shown above, works well in situations where every invocation must run at least once. However, in situations with a large volume of errors, subsequent invocations are increasingly delayed as the backlog grows, because each new error places another retry event back into the event queue. If it were possible to skip the event without retrying, this delay would be eliminated and new events could continue to be processed.

In order to avoid this default auto-retry policy, there are several widely used approaches:

  • Function error handling with AWS Step Functions
  • Using the event handler for error routing logic.
  • Third party monitoring services for exception alerts

These workarounds require extra resources, code, or custom logic and add latency to your applications. Retries caused by system failures due to timeouts, lack of memory, early exits, etc. could still occur.

New Lambda controls for asynchronous event invocations

New asynchronous event invoke controls mean that functions can now skip the processing of certain events and discard unwanted or backlogged requests from the asynchronous event queue without the need for “workarounds.” The controls will be accessible from the console and from a new Event Invoke Config subresource, listed in detail below:

{
    "MaximumEventAgeInSeconds": integer,
    "MaximumRetryAttempts": integer
}
Figure 4: Asynchronous event invoke configuration.

MaximumRetryAttempts

Set a maximum number of retry attempts for events before they can be skipped to unblock processing and prevent duplicate outputs.

Minimum: 0 | maximum: 2 | default: 2

MaximumEventAgeInSeconds

Define a maximum event age in seconds, with expired events skipped to allow processing to continue. Events that are not successfully processed within the defined age of submission are written to the function’s configured dead letter queue and/or On-failure destination for asynchronous invocations. If neither is configured, the event is discarded.

Minimum: 60 | maximum: 21600 (6 hours)
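The same two values can be set from the AWS CLI in a single call; a hedged sketch (the function name is a placeholder):

# Hypothetical example: never retry, and drop events older than one hour
aws lambda put-function-event-invoke-config \
    --function-name my-function \
    --maximum-retry-attempts 0 \
    --maximum-event-age-in-seconds 3600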

These controls can also be configured from the AWS Lambda console, in the “Failure handling configuration” section shown below:

Edit failure handling configurations

When the same Lambda function is run with maximum retry attempts set to 0, the Amazon CloudWatch Logs show that the function is only invoked a single time, and not retried after the initial error.

CloudWatch Logs

Conclusion

The addition of these new Lambda controls allows you to handle function failures and retries appropriately for both asynchronous and stream processing. This eliminates the need for custom-built logic, alerts, and added overhead.

Developers understand the needs of their own applications. These new features allow them to decide how to react to errors on a per-function basis. Whether that means real-time stream processing, non-blocking event queues, or more retries, it depends on the application’s requirements. With the new controls in place, the power is in your hands.

You can get started with the new controls for stream processing and asynchronous invocations via the AWS Management Console, AWS CLI, AWS SAM, AWS CloudFormation, or AWS SDK for Lambda.

New AWS Lambda scaling controls for Kinesis and DynamoDB event sources

Post Syndicated from Moheeb Zara original https://aws.amazon.com/blogs/compute/new-aws-lambda-scaling-controls-for-kinesis-and-dynamodb-event-sources/

AWS Lambda is introducing a new scaling parameter for Amazon Kinesis Data Streams and Amazon DynamoDB Streams event sources. Parallelization Factor can be set to increase concurrent Lambda invocations for each shard, which by default is 1. This allows for faster stream processing without the need to over-scale the number of shards, while still guaranteeing order of records processed.

There are two common optimization scenarios: high traffic and low traffic. For example, an online business might experience seasonal spikes in traffic. The following features help ensure that your business can scale appropriately to withstand the upcoming holiday season.

Handling high traffic with Parallelization Factor

A diagram showing how Parallelization Factor maintains order.

Each shard has uniquely identified sequences of data records. Each record contains a partition key to guarantee order, and records are organized separately into shards based on that key. The records from each shard must be polled to guarantee that records with the same partition key are processed in order.

When there is a high volume of data traffic, you want to process records as fast as possible. Before this release, customers were solving this by updating the number of shards on a Kinesis data stream. Increasing the number of shards increases the number of functions processing data from those shards. One Lambda function invocation processes one shard at a time.

You can now use the new Parallelization Factor to specify the number of concurrent batches that Lambda polls from a single shard. This feature introduces more flexibility in scaling options for Lambda and Kinesis. The default factor of one exhibits normal behavior. A factor of two allows up to 200 concurrent invocations on 100 Kinesis data shards. The Parallelization Factor can be scaled up to 10.

Each parallelized shard contains messages with the same partition key. This means record processing order is still maintained, and each batch within a parallelized shard must complete before the next one is processed.

Using Parallelization Factor

Since Parallelization Factor is quickly set on an event source mapping, it can be increased or decreased on demand. Fully automated scaling of stream processing is now possible.

For example, Amazon CloudWatch can be used to monitor changes in traffic. High traffic can cause the IteratorAge metric to increase, and an alarm can be created if this occurs for some specified period of time. The alarm can trigger a Lambda function that uses the UpdateEventSourceMapping API to increase the Parallelization Factor. In the same way, an alarm can be set to reduce the factor if traffic decreases.
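As a rough sketch of the alarm half of that automation (function name, thresholds, and the notification target are placeholders):

# Hypothetical example: alarm when records wait more than 60 seconds to be processed
# (IteratorAge is reported in milliseconds)
aws cloudwatch put-metric-alarm \
    --alarm-name my-function-iterator-age \
    --namespace AWS/Lambda \
    --metric-name IteratorAge \
    --dimensions Name=FunctionName,Value=my-function \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 5 \
    --threshold 60000 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions arn:aws:sns:us-east-2:123456789012:scale-up-topic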

You can enable Parallelization Factor in the AWS Lambda console by creating or updating a Kinesis or DynamoDB event source. Choose Additional settings and set the Concurrent batches per shard to the desired factor, between 1 and 10.

Configuring the Parallelization Factor from the AWS Lambda console.

Configuring the Parallelization Factor from the AWS Lambda console.

You can also enable this feature from the AWS CLI using the --parallelization-factor parameter when creating or updating an event source mapping.

$ aws lambda create-event-source-mapping --function-name my-function \
--parallelization-factor 2 --batch-size 100 --starting-position AT_TIMESTAMP --starting-position-timestamp 1541139109 \
--event-source-arn arn:aws:kinesis:us-east-2:123456789012:stream/lambda-stream
{
	"UUID": "2b733gdc-8ac3-cdf5-af3a-1827b3b11284",
	"ParallelizationFactor": 2,
	"BatchSize": 100,
	"MaximumBatchingWindowInSeconds": 0,
	"EventSourceArn": "arn:aws:kinesis:us-east-2:123456789012:stream/lambda-stream",
	"FunctionArn": "arn:aws:lambda:us-east-2:123456789012:function:my-function",
	"LastModified": 1541139209.351,
	"LastProcessingResult": "No records processed",
	"State": "Creating",
	"StateTransitionReason": "User action"
}

Handling low traffic with Batch Window

Previously, you could use Batch Size to handle low volumes, or handle tasks that were not time sensitive. Batch Size configures the number of records to read from a shard, up to 10,000. The payload limit of a single invocation is 6 MB.

In September, we launched Batch Window, which allows you to fine tune when Lambda invocations occur. Lambda normally reads records from a Kinesis data stream at a particular interval. This feature is ideal in situations where data is sparse and batches of data take time to build up.

Using Batch Window, you can set your function to wait up to 300 seconds for a batch to build before processing it. This means you can also set your function to process on certain conditions, such as reaching the payload size, or Batch Size reaching its maximum value. With Batch Window, you can manage the average number of records processed by the function with each invocation. This allows you to increase the efficiency of each invocation and reduce the total number.

Batch Window is set when adding a new event trigger in the AWS Lambda console.

Adding an event source trigger in the AWS Lambda console

Adding an event source trigger in the AWS Lambda console

It can also be set using AWS CLI with the --maximum-batching-window-in-seconds parameter.

$ aws lambda create-event-source-mapping --function-name my-function \
--maximum-batching-window-in-seconds 300 --batch-size 100 --starting-position AT_TIMESTAMP --starting-position-timestamp 1541139109 \
--event-source-arn arn:aws:kinesis:us-east-2:123456789012:stream/lambda-stream
{
	"UUID": "2b733gdc-8ac3-cdf5-af3a-1827b3b11284",
	"BatchSize": 100,
	"MaximumBatchingWindowInSeconds": 300,
	"EventSourceArn": "arn:aws:kinesis:us-east-2:123456789012:stream/lambda-stream",
	"FunctionArn": "arn:aws:lambda:us-east-2:123456789012:function:my-function",
	"LastModified": 1541139209.351,
	"LastProcessingResult": "No records processed",
	"State": "Creating",
	"StateTransitionReason": "User action"
}

Conclusion

You now have new options for managing scale in Amazon Kinesis and Amazon DynamoDB stream processing.  The Batch Window parameter allows you to tune how long to wait before processing a batch, ideal for low traffic or tasks that aren’t time sensitive. The Parallelization Factor parameter enables faster stream processing of ordered records at high volume, using concurrent Lambda invocations per shard. Both of these features can lead to more efficient stream processing.

New – Convert Your Single-Region Amazon DynamoDB Tables to Global Tables

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-convert-your-single-region-amazon-dynamodb-tables-to-global-tables/

Hundreds of thousands of AWS customers are using Amazon DynamoDB. In 2017, we launched DynamoDB global tables, a fully managed solution to deploy multi-region, multi-master DynamoDB tables without having to build and maintain your own replication solution. When you create a global table, you specify the AWS Regions where you want the table to be available. DynamoDB performs all of the necessary tasks to create identical tables in these regions and propagate ongoing data changes to all of them.

AWS customers are using DynamoDB global tables for two main reasons: to provide a low-latency experience to their clients and to facilitate their backup or disaster recovery process. Latency is the time it takes for a piece of information to travel through the network and back. Lower latency apps have higher customer engagement and generate more revenue. Deploying your backend to multiple regions close to your customers allows you to reduce the latency in your app. Having a full copy of your data in another region makes it easy to switch traffic to that other region in case you would break your regional setup or in case of an exceedingly rare regional failure. As our CTO, Dr. Werner Vogels wrote: “failures are a given, and everything will eventually fail over time.”

Starting today, you can convert your existing DynamoDB tables to global tables with a few clicks in the AWS Management Console, or using the AWS Command Line Interface (CLI), or the Amazon DynamoDB API. Previously, only empty tables could be converted to global tables. You had to guess your regional usage of a table at the time you created it. Now you can go global, or you can extend existing global tables to additional regions at any time.

Your applications can continue to use the table while we set up the replication. When you add a region to your table, DynamoDB begins populating the new replica using a snapshot of your existing table. Your applications can continue writing to your existing region while DynamoDB builds the new replica, and all in-flight updates will be eventually replicated to your new replica.

To create a DynamoDB global table using the AWS Command Line Interface (CLI), I first create a local table in the US West (Oregon) Region (us-west-2):

aws dynamodb create-table --region us-west-2 \
                          --table-name demo-global-table \
                          --key-schema AttributeName=id,KeyType=HASH \
                          --attribute-definitions AttributeName=id,AttributeType=S \
                          --billing-mode PAY_PER_REQUEST

The command returns:

{
    "TableDescription": {
        "AttributeDefinitions": [
            {
                "AttributeName": "id",
                "AttributeType": "S"
            }
        ],
        "TableName": "demo-global-table",
        "KeySchema": [
            {
                "AttributeName": "id",
                "KeyType": "HASH"
            }
        ],
        "TableStatus": "CREATING",
        "CreationDateTime": 1570278914.419,
        "ProvisionedThroughput": {
            "NumberOfDecreasesToday": 0,
            "ReadCapacityUnits": 0,
            "WriteCapacityUnits": 0
        },
        "TableSizeBytes": 0,
        "ItemCount": 0,
        "TableArn": "arn:aws:dynamodb:us-west-2:400000000003:table/demo-global-table",
        "TableId": "0a04bd34-bbff-42dd-ae18-78d05ce641fd",
        "BillingModeSummary": {
            "BillingMode": "PAY_PER_REQUEST"
        }
    }
}

Once the table is created, I insert some items:

aws dynamodb batch-write-item --region us-west-2 --request-items file://./batch-write-items.json

(The json file is available as a gist)

Then, I update the table to add an additional region, the US East (N. Virginia) Region (us-east-1):

aws dynamodb update-table --table-name demo-global-table \
  --cli-input-json \
  '{
    "ReplicaUpdates":
    [
      {
        "Create": {
          "RegionName": "us-east-1"
        }
      }
    ]
  }'

The command returns a long JSON, the attributes you need to pay attention to are:

{
...
        "TableStatus": "UPDATING",
        "TableSizeBytes": 124,
        "ItemCount": 3,
        "StreamSpecification": {
            "StreamEnabled": true,
            "StreamViewType": "NEW_AND_OLD_IMAGES"
        },
        "LatestStreamLabel": "2019-10-22T19:33:37.819",
        "LatestStreamArn": "arn:aws:dynamodb:us-west-2:400000000003:table/demo-global-table/stream/2019-10-22T19:33:37.819"
    }
...
}

I can make the same update in the AWS Management Console: I select the table to update and choose Global Tables.

Enabling streaming is a requirement for global tables. I first click on Enable stream, then Add region:

I choose the region I want to replicate to, for this example EU West (Ireland) and click Create replica.

DynamoDB asynchronously replicates the table to the new region. I monitor the progress of the replication in the AWS Management Console. The table’s status eventually changes from Creating to Active. I can also check the status by calling the DescribeTable API and verifying that TableStatus = Active.
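For example, the replica created in EU (Ireland) above could be checked from the CLI with this hedged sketch:

# Check the status of the table in the new region
aws dynamodb describe-table \
    --region eu-west-1 \
    --table-name demo-global-table \
    --query 'Table.TableStatus' \
    --output text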

After a while, I can query the table in the new region:

aws dynamodb get-item --region eu-west-1 --table-name demo-global-table --key '{"id" : {"S" : "0123456789"}}'

{
    "Item": {
        "firstname": {
            "S": "Jeff"
        },
        "id": {
            "S": "0123456789"
        },
        "lastname": {
            "S": "Barr"
        }
    }
}

Starting today, you can update existing local tables to global tables. In a few weeks, we’ll release a tool that enables you to update your existing global tables to take advantage of this new capability. The update itself will take a few minutes at most. Your table will be available for your applications during the update process.

Other Improvements
We are also simplifying the internal mechanism used for data synchronization. Previously, DynamoDB global tables leveraged DynamoDB Streams and added three attributes (aws:rep:*) to your schema to keep your data in sync. DynamoDB now manages replication natively. It does not expose synchronization attributes in your data, and it does not consume additional write capacity:

  • Only one write operation occurs in each region of your global table, which reduces the consumed replicated write capacity that is required on your table.
  • Because of that, a second DynamoDB Streams record is no longer published.
  • The three aws:rep:* attributes that were previously populated are no longer inserted in the item record.

These changes have two consequences for your apps. First, they reduce your DynamoDB costs when using global tables, because no extra write capacity is required to manage the synchronization. Second, if your application relies on the three technical attributes (aws:rep:*), it requires a slight code change. In particular, DynamoDB Mapper must not require the aws:rep:* attributes to exist in the item record.

With this change, we are also updating the UpdateTable API. Any operation that modifies global secondary indexes (GSIs), billing mode, server-side encryption, or write capacity units on a global table is applied to all other replicas asynchronously.
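As an illustration, a hedged boto3 sketch of such an UpdateTable call might look like the following; switching billing modes is only an example, and the capacity values are placeholders:

import boto3

dynamodb = boto3.client("dynamodb", region_name="us-west-2")

# Change the billing mode on one replica; the change propagates to the
# other replicas of the global table asynchronously.
dynamodb.update_table(
    TableName="demo-global-table",
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)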

Availability
The improved Amazon DynamoDB global tables are available today in 13 AWS Regions, and more Regions are planned for the future. As of today, the list of AWS Regions is us-east-1 (Northern Virginia), us-west-2 (Oregon), us-east-2 (Ohio), us-west-1 (Northern California), ap-northeast-2 (Seoul), ap-southeast-1 (Singapore), ap-southeast-2 (Sydney), ap-northeast-1 (Tokyo), eu-central-1 (Frankfurt), eu-west-1 (Ireland), eu-west-2 (London), GovCloud (US-East), and GovCloud (US-West).

There is no change in pricing. You pay only for the resources you use in the additional regions and the data transfer between regions.

This update addresses the most common feedback that we have heard from you and will serve as the platform on which we will build additional features in the future. Continue to tell us how you are using global tables and what is important for your apps.

— seb

ICYMI: Serverless Q3 2019

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-q3-2019/

This post is courtesy of Julian Wood, Senior Developer Advocate – AWS Serverless

Welcome to the seventh edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

ICYMI calendar

Launches/New products

Amazon EventBridge was technically launched in this quarter, but we were so excited to let you know that we squeezed it into the Q2 2019 update. If you missed it, EventBridge is the serverless event bus that connects application data from your own apps, SaaS, and AWS services. This allows you to create powerful event-driven serverless applications using a variety of event sources.

The AWS Bahrain Region has opened, the official name is Middle East (Bahrain) and the API name is me-south-1. AWS Cloud now spans 22 geographic Regions with 69 Availability Zones around the world.

AWS Lambda

In September we announced dramatic improvements in cold starts for Lambda functions inside a VPC. With this announcement, you see faster function startup performance and more efficient usage of elastic network interfaces, drastically reducing VPC cold starts.

VPC to VPC NAT

These improvements are rolling out to all existing and new VPC functions at no additional cost. Rollout is ongoing; you can track the status from the announcement post.

AWS Lambda now supports a custom batch window for Kinesis and DynamoDB event sources, which helps you fine-tune Lambda invocations for cost optimization.
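As a sketch of how you might set a batch window with boto3 (the event source mapping UUID below is a placeholder, not a real resource):

import boto3

lambda_client = boto3.client("lambda")

# Wait up to 60 seconds to assemble a fuller batch before invoking the function
lambda_client.update_event_source_mapping(
    UUID="11111111-2222-3333-4444-555555555555",
    BatchSize=500,
    MaximumBatchingWindowInSeconds=60,
)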

You can now deploy Amazon Machine Images (AMIs) and Lambda functions together from the AWS Marketplace using AWS CloudFormation with just a few clicks.

AWS IoT Events actions now support AWS Lambda as a target. Previously you could only define actions to publish messages to SNS and MQTT. Now you can define actions to invoke AWS Lambda functions and even more targets, such as Amazon Simple Queue Service and Amazon Kinesis Data Firehose, and republish messages to IoT Events.

The AWS Lambda Console now shows recent invocations using CloudWatch Logs Insights. From the monitoring tab in the console, you can view duration, billing, and memory statistics for the 10 most recent invocations.

AWS Step Functions

AWS Step Functions example

AWS Step Functions has now been extended to support probably its most requested feature, Dynamic Parallelism, which allows steps within a workflow to be executed in parallel, with a new Map state type.

One way to use the new Map state is for fan-out or scatter-gather messaging patterns in your workflows (a minimal sketch follows the list below):

  • Fan-out is applied when delivering a message to multiple destinations, and can be useful in workflows such as order processing or batch data processing. For example, you can retrieve arrays of messages from Amazon SQS and Map sends each message to a separate AWS Lambda function.
  • Scatter-gather broadcasts a single message to multiple destinations (scatter), and then aggregates the responses back for the next steps (gather). This is useful in file processing and test automation. For example, you can transcode ten 500-MB media files in parallel, and then join to create a 5-GB file.
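To make the Map state concrete, here is a minimal boto3 sketch that creates a fan-out state machine; the Lambda function and IAM role ARNs are placeholders, not resources from this post:

import json
import boto3

# A fan-out workflow: the Map state runs one Lambda task per message in the input array.
definition = {
    "StartAt": "ProcessMessages",
    "States": {
        "ProcessMessages": {
            "Type": "Map",
            "ItemsPath": "$.messages",
            "MaxConcurrency": 10,
            "Iterator": {
                "StartAt": "ProcessOne",
                "States": {
                    "ProcessOne": {
                        "Type": "Task",
                        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessMessage",
                        "End": True
                    }
                }
            },
            "End": True
        }
    }
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="fan-out-demo",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",
)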

Another important update is AWS Step Functions adds support for nested workflows, which allows you to orchestrate more complex processes by composing modular, reusable workflows.

AWS Amplify

A new Predictions category has been added to the Amplify Framework to quickly add machine learning capabilities to your web and mobile apps.

Amplify framework

With a few lines of code you can add and configure AI/ML services to configure your app to:

  • Identify text, entities, and labels in images using Amazon Rekognition, or identify text in scanned documents to get the contents of fields in forms and information stored in tables using Amazon Textract.
  • Convert text into a different language using Amazon Translate, text to speech using Amazon Polly, and speech to text using Amazon Transcribe.
  • Interpret text to find the dominant language, the entities, the key phrases, the sentiment, or the syntax of unstructured text using Amazon Comprehend.

AWS Amplify CLI (part of the open source Amplify Framework) has added local mocking and testing. This allows you to mock some of the most common cloud services and test your application 100% locally.

For this first release, the Amplify CLI supports local mocking, which you start with:

amplify mock

AWS CloudFormation

The CloudFormation team has released the much-anticipated CloudFormation Coverage Roadmap.

Styled after the popular AWS Containers Roadmap, the CloudFormation Coverage Roadmap provides transparency about our priorities, and the opportunity to provide your input.

The roadmap contains four columns:

  • Shipped – Available for use in production in all public AWS Regions.
  • Coming Soon – Generally a few months out.
  • We’re working on It – Work in progress, but further out.
  • Researching – We’re thinking about the right way to implement the coverage.

AWS CloudFormation roadmap

Amazon DynamoDB

NoSQL Workbench for Amazon DynamoDB has been released in preview. This is a free, client-side application available for Windows and macOS. It helps you more easily design and visualize your data model, run queries on your data, and generate the code for your application.

Amazon Aurora

Amazon Aurora Serverless is a dynamically scaling version of Amazon Aurora. It automatically starts up, shuts down, and scales up or down, based on your application workload.

Aurora Serverless has had a MySQL-compatible edition for a while; now we're excited to bring more serverless joy to databases with the PostgreSQL-compatible edition, which is now generally available.

We also have a useful post on Reducing Aurora PostgreSQL storage I/O costs.

AWS Serverless Application Repository

The AWS Serverless Application Repository has had some useful SAR apps added by Serverless Developer Advocate James Beswick.

  • S3 Auto Translator which automatically converts uploaded objects into other languages specified by the user, using Amazon Translate.
  • Serverless S3 Uploader allows you to upload JPG files to Amazon S3 buckets from your web applications using presigned URLs.

Serverless posts

July

August

September

Tech talks

We hold several AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page.

Here are the ones from Q3:

Twitch

July

August

September

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

AWS re:Invent

AWS re:Invent

December 2 – 6 in Las Vegas, Nevada is peak AWS learning time with AWS re:Invent 2019. Join tens of thousands of AWS customers to learn, share ideas, and see exciting keynote announcements.

Be sure to take a look at the growing catalog of serverless sessions this year. Make sure to book time for Builders Sessions, Chalk Talks, and Workshops as these sessions will fill up quickly. The schedule is updated regularly so if your session is currently fully booked, a repeat may be scheduled.

Register for AWS re:Invent now!

What did we do at AWS re:Invent 2018? Check out our recap here: AWS re:Invent 2018 Recap at the San Francisco Loft.

Our friends at IOPipe have written 5 tips for avoiding serverless FOMO at this year’s re:Invent.

AWS Serverless Heroes

We are excited to welcome some new AWS Serverless Heroes to help grow the serverless community. We look forward to some amazing content to help you with your serverless journey.

Still looking for more?

The Serverless landing page has much more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

 

Building a Serverless FHIR Interface on AWS

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/building-a-serverless-fhir-interface-on-aws/

This post is courtesy of Mithun Mallick, Senior Solutions Architect (Messaging), and Navneet Srivastava, Senior Solutions Architect.

Technology is revolutionizing the healthcare industry but it can be a challenge for healthcare providers to take full advantage because of software systems that don’t easily communicate with each other. A single patient visit involves multiple systems such as practice management, electronic health records, and billing. When these systems can’t operate together, it’s harder to leverage them to improve patient care.

To help make it easier to exchange data between these systems, Health Level Seven International (HL7) developed the Fast Healthcare Interoperability Resources (FHIR), an interoperability standard for the electronic exchange of healthcare information. In this post, I will show you the AWS services you use to build a serverless FHIR interface on the cloud.

In FHIR, resources are your basic building blocks. A resource is an exchangeable piece of content that has a common way to define and represent it, a set of common metadata, and a human readable part. Each resource type has the same set of operations, called interactions, that you use to manage the resources in a granular fashion. For more information, see the FHIR overview.

FHIR Serverless Architecture

My FHIR architecture features a server with its own data repository and a simple consumer application that displays Patient and Observation data. To make it easier to build, my server only supports the JSON content type over HTTPS, and it only supports the Bundle, Patient, and Observation FHIR resource types. In a production environment, your server should support all resource types.

For this architecture, the server supports the following interactions:

  • Posting bundles as collections of Patients and Observations
  • Searching Patients and Observations
  • Updating and reading Patients
  • Creating a CapabilityStatement

You can expand this architecture to support all FHIR resource types, interactions, and data formats.

The following diagram shows how the described services work together to create a serverless FHIR messaging interface.

 

Services work together to create a serverless FHIR messaging interface.

 

Amazon API Gateway

In Amazon API Gateway, you create the REST API that acts as a “front door” for the consumer application to access the data and business logic of this architecture. I used API Gateway to host the API endpoints. I created the resource definitions and API methods in the API Gateway.

For this architecture, the FHIR resources map to the resource definitions in API Gateway. The Bundle FHIR resource type maps to the Bundle API Gateway resource. The observation FHIR resource type maps to the observation API Gateway resource. And, the Patient FHIR resource type maps to the Patient API Gateway resource.

To keep the API definitions simple, I used the ANY method. The ANY method handles the various URL mappings in the AWS Lambda code, and uses Lambda proxy integration to send requests to the Lambda function.

You can use the ANY method to handle HTTP methods, such as:

  • POST to represent the interaction to create a Patient resource type
  • GET to read a Patient instance based on a patient ID, or to search based on predefined parameters

We chose Amazon DynamoDB because it provides the input data representation and query patterns necessary for a FHIR data repository. For this architecture, each resource type is stored in its own Amazon DynamoDB table. Metadata for resources stored in the repository is also stored in its own table.

We set up global secondary indexes on the patient and observations tables in order to perform searches and retrieve observations for a patient. In this architecture, the patient id is stored as a patient reference id in the observation table. The patientRefid-index allows you to retrieve observations based on the patient id without performing a full scan of the table.
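As an illustration, a query against that index with boto3 might look like the following sketch; the table name, attribute name, and patient id are assumptions based on the description above:

import boto3
from boto3.dynamodb.conditions import Key

# Table and attribute names are illustrative only
observations = boto3.resource("dynamodb").Table("observation")

# Retrieve all observations for one patient via the GSI, avoiding a full table scan
response = observations.query(
    IndexName="patientRefid-index",
    KeyConditionExpression=Key("patientRefid").eq("patient-1234"),
)
for item in response["Items"]:
    print(item)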

We chose Amazon S3 to store archived FHIR messages because of its low cost and high durability.

Processing FHIR Messages

Each Amazon API Gateway request in this architecture is backed by an AWS Lambda function containing the Jersey RESTful web services framework, the AWS serverless Java container framework, and the HAPI FHIR library.

The AWS serverless Java framework provides a base implementation for the handleRequest method in the LambdaHandler class. It uses the serverless Java container initialized in the global scope to proxy requests to our Jersey application.

The handler method calls a proxy class and passes the stream classes along with the context.

This source code from the LambdaHandler class shows the handleRequest method:

// Main entry point of the Lambda function, uses the serverless-java-container initialized in the global scope
// to proxy requests to our Jersey application
public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context)
    throws IOException {

    handler.proxyStream(inputStream, outputStream, context);

    // just in case it wasn't closed by the mapper
    outputStream.close();
}

The resource implementations classes are in the com.amazonaws.lab.resources package. This package defines the URL mappings necessary for routing the REST API calls.

The following method from the PatientResource class implements the GET patient interaction based on a patient id. The annotations describe the HTTP method called, as well as the path that is used to make the call. This method is invoked when a request is sent with the URL pattern: Patient/{id}. It retrieves the Patient resource type based on the id sent as part of the URL.

@GET
@Path("/{id}")
public Response gETPatientid(@Context SecurityContext securityContext,
        @ApiParam(value = "", required = true) @PathParam("id") String id,
        @HeaderParam("Accept") String accepted) {
    …
}

Deploying the FHIR Interface

To deploy the resources for this architecture, we used an AWS Serverless Application Model (SAM) template. During deployment, SAM templates are expanded and transformed into AWS CloudFormation syntax. The template launches and configures all the services that make up the architecture.

Building the Consumer Application

For our architecture, we wrote a simple Node.js client application that calls the APIs on the FHIR server to get a list of patients and related observations. You can build more advanced applications for this architecture. For example, you could build a patient-focused application that displays vitals and immunization charts. Or, you could build a backend/mid-tier application that consumes a large number of messages and transforms them for downstream analytics.

This is the code we used to get the token from Amazon Cognito:

token = authcognito.token();

//Setting url to call FHIR server

var options = {
  url: "https://<FHIR SERVER>",
  host: "FHIR SERVER",
  path: "Prod/Patient",
  method: "GET",
  headers: {
    "Content-Type": "application/json",
    "Authorization": token
  }
};

This is the code we used to call the FHIR server:

request(options, function(err, response, body) {
  if (err) {
    console.log("In error");
    console.log(err);
  } else {
    let patientlist = JSON.parse(body);

    console.log(patientlist);
    res.json(patientlist["entry"]);
  }
});

We used AWS CloudTrail and AWS X-Ray for logging and debugging.

The screenshots below display the results:

Conclusion

In this post, we demonstrated how to build a serverless FHIR architecture. We used Amazon API Gateway and AWS Lambda to ingest and process FHIR resources, and Amazon DynamoDB and Amazon S3 to provide a repository for the resources. Amazon Cognito provides secure access to the API Gateway. We also showed you how to build a simple consumer application that displays patient and observation data. You can modify this architecture for your individual use case.

About the authors

Mithun Mallick is a Sr. Solutions Architect responsible for helping customers in the HCLS industry build secure, scalable, and cost-effective solutions on AWS. Mithun helps develop and implement strategic plans to engage customers and partners in the industry, and works with the community of technically focused HCLS specialists within AWS. He has hands-on experience with messaging standards like X12, HL7, and FHIR. Mithun has an M.B.A. from CSU (Ft. Collins, CO) and a bachelor's in Computer Engineering. He holds several associate and professional certifications for architecting on AWS.

 

 

Navneet Srivastava, a Sr. Solutions Architect, is responsible for helping provider organizations and healthcare companies deploy electronic medical records, devices, and AI/ML-based applications while educating customers about how to build secure, scalable, and cost-effective AWS solutions. He develops strategic plans to engage customers and partners, and works with a community of technically focused HCLS specialists within AWS. He is skilled in AI, ML, Big Data, and healthcare-related technologies. Navneet has an M.B.A. from NYIT and a bachelor's in Software Engineering, and holds several associate and professional certifications for architecting on AWS.

NoSQL Workbench for Amazon DynamoDB – Available in Preview

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/nosql-workbench-for-amazon-dynamodb-available-in-preview/

I am always impressed by the flexibility of Amazon DynamoDB, providing our customers a fully-managed key-value and document database that can easily scale from a few requests per month to millions of requests per second.

The DynamoDB team released so many great features recently, from on-demand capacity, to support for native ACID transactions. Here’s a great recap of other recent DynamoDB announcements such as global tables, point-in-time recovery, and instant adaptive capacity. DynamoDB now encrypts all customer data at rest by default.

However, switching mindset from a relational database to NoSQL is not that easy. Last year we had two amazing talks at re:Invent that can help you understand how DynamoDB works, and how you can use it for your use cases:

To help you even further, we are introducing today in preview NoSQL Workbench for Amazon DynamoDB, a free, client-side application available for Windows and macOS to help you design and visualize your data model, run queries on your data, and generate the code for your application!

The three main capabilities provided by the NoSQL Workbench are:

  • Data modeler — to build new data models, adding tables and indexes, or to import, modify, and export existing data models.
  • Visualizer — to visualize data models based on their applications’ access patterns, with sample data that you can add manually or import via a SQL query.
  • Operation builder — to define and execute data-plane operations or generate ready-to-use sample code for them.

To see how this new tool can simplify working with DynamoDB, let’s build an application to retrieve information on customers and their orders.

Using the NoSQL Workbench
In the Data modeler, I start by creating a CustomerOrders data model, and I add a table, CustomerAndOrders, to hold my customer data and the information on their orders. You can use this tool to create a simple data model where customers and orders are in two distinct tables, each one with their own primary keys. There would be nothing wrong with that. Here I’d like to show how this tool can also help you use more advanced design patterns. By having the customer and order data in a single table, I can construct queries that return all the data I need with a single interaction with DynamoDB, speeding up the performance of my application.

As partition key, I use the customerId. This choice provides an even distribution of data across multiple partitions. The sort key in my data model will be an overloaded attribute, in the sense that it can hold different data depending on the item:

  • A fixed string, for example customer, for the items containing the customer data.
  • The order date, written using ISO 8601 strings such as 20190823, for the items containing orders.

By overloading the sort key with these two possible values, I am able to run a single query that returns the customer data and the most recent orders. For this reason, I use a generic name for the sort key. In this case, I use sk.

Apart from the partition key and the optional sort key, DynamoDB has a flexible schema, and the other attributes can be different for each item in a table. However, with this tool I have the option to describe in the data model all the possible attributes I am going to use for a table. In this way, I can check later that all the access patterns I need for my application work well with this data model.

For this table, I add the following attributes:

  • customerName and customerAddress, for the items in the table containing customer data.
  • orderId and deliveryAddress, for the items in the table containing order data.

I am not adding a orderDate attribute, because for this data model the value will be stored in the sk sort key. For a real production use case, you would probably have much more attributes to describe your customers and orders, but I am trying to keep things simple enough here to show what you can do, without getting lost in details.

Another access pattern for my application is to be able to get a specific order by ID. For that, I add a global secondary index to my table, with orderId as partition key and no sort key.

I add the table definition to the data model, and move on to the Visualizer. There, I update the table by adding some sample data. I add data manually, but I could import a few rows from a table in a MySQL database, for example to simplify a NoSQL migration from a relational database.

Now, I visualize my data model with the sample data to have a better understanding of what to expect from this table. For example, if I select a customerId, and I query for all the orders greater than a specific date, I also get the customer data at the end, because the string customer, stored in the sk sort key, is always greater than any date written in ISO 8601 syntax.

In the Visualizer, I can also see how the global secondary index on the orderId works. Interestingly, items without an orderId are not part of this index, so I get only 4 of the 6 items that are part of my sample data. This happens because DynamoDB writes a corresponding index entry only if the index key attributes are present in the item. If those attributes don’t appear in every table item, the index is said to be sparse. Sparse indexes are useful for queries over a subsection of a table.

I now commit my data model to DynamoDB. This step creates server-side resources such as tables and global secondary indexes for the selected data model, and loads the sample data. To do so, I need AWS credentials for an AWS account. I have the AWS Command Line Interface (CLI) installed and configured in the environment where I am using this tool, so I can just select one of my named profiles.

I move to the Operation builder, where I see all the tables in the selected AWS Region. I select the newly created CustomerAndOrders table to browse the data and build the code for the operations I need in my application.

In this case, I want to run a query that, for a specific customer, selects all orders more recent that a date I provide. As we saw previously, the overloaded sort key would also return the customer data as last item. The Operation builder can help you use the full syntax of DynamoDB operations, for example adding conditions and child expressions. In this case, I add the condition to only return orders where the deliveryAddress contains Seattle.

I have the option to execute the operation on the DynamoDB table, but this time I want to use the query in my application. To generate the code, I select between Python, JavaScript (Node.js), or Java.
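For reference, a Python query along these lines might look like the following sketch; it is not the exact code the Workbench generates, and the customer id and date are sample placeholders:

import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("CustomerAndOrders")

# Orders for one customer placed after a given date, filtered to Seattle deliveries
response = table.query(
    KeyConditionExpression=Key("customerId").eq("cust-0001") & Key("sk").gt("20190801"),
    FilterExpression=Attr("deliveryAddress").contains("Seattle"),
)
for item in response["Items"]:
    print(item)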

You can use the Operation builder to generate the code for all the access patterns that you plan to use with your application, using all the advanced features that DynamoDB provides, including ACID transactions.

Available Now
You can find how to set up NoSQL Workbench for Amazon DynamoDB (Preview) for Windows and macOS here.

We welcome your suggestions in the DynamoDB discussion forum. Let us know what you build with this new tool and how we can help you more!

Learn about AWS Services & Solutions – September AWS Online Tech Talks

Post Syndicated from Jenny Hang original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-september-aws-online-tech-talks/

Learn about AWS Services & Solutions – September AWS Online Tech Talks

AWS Tech Talks

Join us this September to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

 

Compute:

September 23, 2019 | 11:00 AM – 12:00 PM PT – Build Your Hybrid Cloud Architecture with AWS – Learn about the extensive range of services AWS offers to help you build a hybrid cloud architecture best suited for your use case.

September 26, 2019 | 1:00 PM – 2:00 PM PT – Self-Hosted WordPress: It’s Easier Than You Think – Learn how you can easily build a fault-tolerant WordPress site using Amazon Lightsail.

October 3, 2019 | 11:00 AM – 12:00 PM PT – Lower Costs by Right Sizing Your Instance with Amazon EC2 T3 General Purpose Burstable Instances – Get an overview of T3 instances, understand what workloads are ideal for them, and understand how the T3 credit system works so that you can lower your EC2 instance costs today.

 

Containers:

September 26, 2019 | 11:00 AM – 12:00 PM PT – Develop a Web App Using Amazon ECS and AWS Cloud Development Kit (CDK) – Learn how to build your first app using CDK and AWS container services.

 

Data Lakes & Analytics:

September 26, 2019 | 9:00 AM – 10:00 AM PT – Best Practices for Provisioning Amazon MSK Clusters and Using Popular Apache Kafka-Compatible Tooling – Learn best practices on running Apache Kafka production workloads at a lower cost on Amazon MSK.

 

Databases:

September 25, 2019 | 1:00 PM – 2:00 PM PT – What’s New in Amazon DocumentDB (with MongoDB compatibility) – Learn what’s new in Amazon DocumentDB, a fully managed MongoDB compatible database service designed from the ground up to be fast, scalable, and highly available.

October 3, 2019 | 9:00 AM – 10:00 AM PT – Best Practices for Enterprise-Class Security, High-Availability, and Scalability with Amazon ElastiCache – Learn about new enterprise-friendly Amazon ElastiCache enhancements like customer managed key and online scaling up or down to make your critical workloads more secure, scalable and available.

 

DevOps:

October 1, 2019 | 9:00 AM – 10:00 AM PT – CI/CD for Containers: A Way Forward for Your DevOps Pipeline – Learn how to build CI/CD pipelines using AWS services to get the most out of the agility afforded by containers.

 

Enterprise & Hybrid:

September 24, 2019 | 1:00 PM – 2:30 PM PT – Virtual Workshop: How to Monitor and Manage Your AWS Costs – Learn how to visualize and manage your AWS cost and usage in this virtual hands-on workshop.

October 2, 2019 | 1:00 PM – 2:00 PM PT – Accelerate Cloud Adoption and Reduce Operational Risk with AWS Managed Services – Learn how AMS accelerates your migration to AWS, reduces your operating costs, improves security and compliance, and enables you to focus on your differentiating business priorities.

 

IoT:

September 25, 2019 | 9:00 AM – 10:00 AM PT – Complex Monitoring for Industrial with AWS IoT Data Services – Learn how to solve your complex event monitoring challenges with AWS IoT Data Services.

 

Machine Learning:

September 23, 2019 | 9:00 AM – 10:00 AM PT – Training Machine Learning Models Faster – Learn how to train machine learning models quickly and with a single click using Amazon SageMaker.

September 30, 2019 | 11:00 AM – 12:00 PM PT – Using Containers for Deep Learning Workflows – Learn how containers can help address challenges in deploying deep learning environments.

October 3, 2019 | 1:00 PM – 2:30 PM PT – Virtual Workshop: Getting Hands-On with Machine Learning and Ready to Race in the AWS DeepRacer League – Join DeClercq Wentzel, Senior Product Manager for AWS DeepRacer, for a presentation on the basics of machine learning and how to build a reinforcement learning model that you can use to join the AWS DeepRacer League.

 

AWS Marketplace:

September 30, 2019 | 9:00 AM – 10:00 AM PT – Advancing Software Procurement in a Containerized World – Learn how to deploy applications faster with third-party container products.

 

Migration:

September 24, 2019 | 11:00 AM – 12:00 PM PT – Application Migrations Using AWS Server Migration Service (SMS) – Learn how to use AWS Server Migration Service (SMS) for automating application migration and scheduling continuous replication, from your on-premises data centers or Microsoft Azure to AWS.

 

Networking & Content Delivery:

September 25, 2019 | 11:00 AM – 12:00 PM PT – Building Highly Available and Performant Applications using AWS Global Accelerator – Learn how to build highly available and performant architectures for your applications with AWS Global Accelerator, now with source IP preservation.

September 30, 2019 | 1:00 PM – 2:00 PM PT – AWS Office Hours: Amazon CloudFront – Just getting started with Amazon CloudFront and Lambda@Edge? Get answers directly from our experts during AWS Office Hours.

 

Robotics:

October 1, 2019 | 11:00 AM – 12:00 PM PT – Robots and STEM: AWS RoboMaker and AWS Educate Unite! – Come join members of the AWS RoboMaker and AWS Educate teams as we provide an overview of our education initiatives and walk you through the newly launched RoboMaker Badge.

 

Security, Identity & Compliance:

October 1, 2019 | 1:00 PM – 2:00 PM PT – Deep Dive on Running Active Directory on AWS – Learn how to deploy Active Directory on AWS and start migrating your windows workloads.

 

Serverless:

October 2, 2019 | 9:00 AM – 10:00 AM PT – Deep Dive on Amazon EventBridge – Learn how to optimize event-driven applications, and use rules and policies to route, transform, and control access to these events that react to data from SaaS apps.

 

Storage:

September 24, 2019 | 9:00 AM – 10:00 AM PT – Optimize Your Amazon S3 Data Lake with S3 Storage Classes and Management Tools – Learn how to use the Amazon S3 Storage Classes and management tools to better manage your data lake at scale and to optimize storage costs and resources.

October 2, 2019 | 11:00 AM – 12:00 PM PT – The Great Migration to Cloud Storage: Choosing the Right Storage Solution for Your Workload – Learn more about AWS storage services and identify which service is the right fit for your business.

 

 

ICYMI: Serverless Q2 2019

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/icymi-serverless-q2-2019/

This post is courtesy of Moheeb Zara, Senior Developer Advocate – AWS Serverless

Welcome to the sixth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

April - June 2019

Amazon EventBridge

Before we dive in to all that happened in Q2, we’re excited about this quarter’s launch of Amazon EventBridge, the serverless event bus that connects application data from your own apps, SaaS, and AWS services. This allows you to create powerful event-driven serverless applications using a variety of event sources.

Our very own AWS Solutions Architect, Mike Deck, sat down with AWS Serverless Hero Jeremy Daly and recorded a podcast on Amazon EventBridge. It’s a worthy listen if you’re interested in exploring all the features offered by this launch.

Now, back to Q2, here’s what’s new.

AWS Lambda

Lambda Monitoring

Amazon CloudWatch Logs Insights now allows you to see statistics from recent invocations of your Lambda functions in the Lambda monitoring tab.

Additionally, as of June, you can monitor the Lambda@Edge functions associated with your Amazon CloudFront distributions directly from your Amazon CloudFront console. This includes a revamped monitoring dashboard for CloudFront distributions and Lambda@Edge functions.

AWS Step Functions

Step Functions

AWS Step Functions now supports workflow execution events, which help in building and monitoring event-driven serverless workflows. Execution event notifications can be delivered automatically to CloudWatch Events/Amazon EventBridge when a workflow starts or completes. This allows services such as AWS Lambda, Amazon SNS, Amazon Kinesis, or AWS Step Functions to respond to these events.

Additionally you can use callback patterns to automate workflows for applications with human activities and custom integrations with third-party services. You create callback patterns in minutes with less code to write and maintain, run without servers and infrastructure to manage, and scale reliably.

Amazon API Gateway

API Gateway Tag Based Control

Amazon API Gateway now offers tag-based access control for WebSocket APIs using AWS Identity and Access Management (IAM) policies, allowing you to categorize API Gateway resources for WebSocket APIs by purpose, owner, or other criteria. With the addition of tag-based access control to WebSocket resources, you can now give permissions to WebSocket resources at various levels by creating policies based on tags. For example, you can grant full access to admins while limiting access for developers.

You can now enforce a minimum Transport Layer Security (TLS) version and cipher suites through a security policy for connecting to your Amazon API Gateway custom domain.

In addition, Amazon API Gateway now allows you to define VPC Endpoint policies, enabling you to specify which Private APIs a VPC Endpoint can connect to. This enables granular security control using VPC Endpoint policies.

AWS Amplify

Amplify CLI (part of the open source Amplify Framework) now includes support for adding and configuring AWS Lambda triggers for events when using Amazon Cognito, Amazon Simple Storage Service, and Amazon DynamoDB as event sources. This means you can set up custom authentication flows for mobile and web applications via the Amplify CLI, using an Amazon Cognito User Pool as an authentication provider.

Amplify Console

Amplify Console,  a Git-based workflow for continuous deployment and hosting for fullstack serverless web apps, launched several updates to the build service including SAM CLI and custom container support.

Amazon Kinesis

Amazon Kinesis Data Firehose can now utilize AWS PrivateLink to securely ingest data. AWS PrivateLink provides private connectivity between VPCs, AWS services, and on-premises applications, securely over the Amazon network. When AWS PrivateLink is used with Amazon Kinesis Data Firehose, all traffic to a Kinesis Data Firehose from a VPC flows over a private connection.

You can now assign AWS resource tags to applications in Amazon Kinesis Data Analytics. These key/value tags can be used to organize and identify resources, create cost allocation reports, and control access to resources within Amazon Kinesis Data Analytics.

Amazon Kinesis Data Firehose is now available in the AWS GovCloud (US-East), Europe (Stockholm), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and EU (London) regions.

For a complete list of where Amazon Kinesis Data Analytics is available, please see the AWS Region Table.

AWS Cloud9

Cloud9 Quick Starts

Amazon Web Services (AWS) Cloud9 integrated development environment (IDE) now has a Quick Start which deploys in the AWS cloud in about 30 minutes. This enables organizations to provide developers a powerful cloud-based IDE that can edit, run, and debug code in the browser and allow easy sharing and collaboration.

AWS Cloud9 is also now available in the EU (Frankfurt) and Asia Pacific (Tokyo) regions. For a current list of supported regions, see AWS Regions and Endpoints in the AWS documentation.

Amazon DynamoDB

You can now tag Amazon DynamoDB tables when you create them. Tags are labels you can attach to AWS resources to make them easier to manage, search, and filter.  Tagging support has also been extended to the AWS GovCloud (US) Regions.

DynamoDBMapper now supports Amazon DynamoDB transactional API calls. This support is included within the AWS SDK for Java. These transactional APIs provide developers atomic, consistent, isolated, and durable (ACID) operations to help ensure data correctness.

Amazon DynamoDB now applies adaptive capacity in real time in response to changing application traffic patterns, which helps you maintain uninterrupted performance indefinitely, even for imbalanced workloads.

AWS Training and Certification has launched Amazon DynamoDB: Building NoSQL Database–Driven Applications, a new self-paced, digital course available exclusively on edX.

Amazon Aurora

Amazon Aurora Serverless MySQL 5.6 can now be accessed using the built-in Data API, enabling you to access Aurora Serverless with web services-based applications, including AWS Lambda, AWS AppSync, and AWS Cloud9. For more, check out this post.
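As a sketch of what a Data API call might look like from Python, with the cluster ARN, secret ARN, database name, and SQL as placeholders:

import boto3

rds_data = boto3.client("rds-data")

# The Data API requires the cluster ARN and a Secrets Manager secret for credentials
result = rds_data.execute_statement(
    resourceArn="arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-serverless",
    secretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:my-db-secret",
    database="mydb",
    sql="SELECT id, name FROM customers LIMIT 10",
)
print(result["records"])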

Sharing snapshots of Aurora Serverless DB clusters with other AWS accounts or publicly is now possible. We are also giving you the ability to copy Aurora Serverless DB cluster snapshots across AWS regions.

You can now set the minimum capacity of your Aurora Serverless DB clusters to 1 Aurora Capacity Unit (ACU). With Aurora Serverless, you specify the minimum and maximum ACUs for your Aurora Serverless DB cluster instead of provisioning and managing database instances. Each ACU is a combination of processing and memory capacity. By setting the minimum capacity to 1 ACU, you can keep your Aurora Serverless DB cluster running at a lower cost.

AWS Serverless Application Repository

The AWS Serverless Application Repository is now available in 17 regions with the addition of the AWS GovCloud (US-West) region.

Region support includes Asia Pacific (Mumbai, Singapore, Sydney, Tokyo), Canada (Central), EU (Frankfurt, Ireland, London, Paris, Stockholm), South America (São Paulo), US West (N. California, Oregon), and US East (N. Virginia, Ohio).

Amazon Cognito

Amazon Cognito has launched a new API – AdminSetUserPassword – for the Cognito User Pool service that provides a way for administrators to set temporary or permanent passwords for their end users. This functionality is available for end users even when their verified phone or email are unavailable.

Serverless Posts

April

May

June

Events

Events this quarter

Senior Developer Advocates for AWS Serverless spoke at several conferences this quarter. Here are some recordings worth watching!

Tech Talks

We hold several AWS Online Tech Talks covering serverless topics throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. Here are the ones from Q2.

Twitch

Twitch Series

In April, we started a 13-week deep dive into building APIs on AWS as part of our Twitch Build On series. The Building Happy Little APIs series covers the common and not-so-common use cases for APIs on AWS and the features available to customers as they look to build secure, scalable, efficient, and flexible APIs.

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

Build with Serverless on Twitch

Serverless expert and AWS Specialist Solutions Architect, Heitor Lessa, has been hosting a weekly Twitch series since April. Join him and others as they build an end-to-end airline booking solution using serverless. The final episode airs on Wednesday, August 7th at 8:00am PT.

Here’s a recap of the last quarter:

AWS re:Invent

AWS re:Invent 2019

AWS re:Invent 2019 is around the corner! From December 2 – 6 in Las Vegas, Nevada, join tens of thousands of AWS customers to learn, share ideas, and see exciting keynote announcements. Be sure to take a look at the growing catalog of serverless sessions this year.

Register for AWS re:Invent now!

What did we do at AWS re:Invent 2018? Check out our recap here: AWS re:Invent 2018 Recap at the San Francisco Loft

AWS Serverless Heroes

We urge you to explore the efforts of our AWS Serverless Heroes Community. This is a worldwide network of AWS Serverless experts with a diverse background of experience. For example, check out this post from last month where Marcia Villalba demonstrates how to set up unit tests for serverless applications.

Still looking for more?

The Serverless landing page has lots of information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

How to export an Amazon DynamoDB table to Amazon S3 using AWS Step Functions and AWS Glue

Post Syndicated from Joe Feeney original https://aws.amazon.com/blogs/big-data/how-to-export-an-amazon-dynamodb-table-to-amazon-s3-using-aws-step-functions-and-aws-glue/

In typical AWS fashion, not a week had gone by after I published How Goodreads offloads Amazon DynamoDB tables to Amazon S3 and queries them using Amazon Athena on the AWS Big Data blog when the AWS Glue team released the ability for AWS Glue crawlers and AWS Glue ETL jobs to read from DynamoDB tables natively. I was actually pretty excited about this. Less code means fewer bugs. The original architecture had been around for at least 18 months and could be simplified significantly with a little bit of work.

Refactoring the data pipeline

The AWS Data Pipeline architecture outlined in my previous blog post is just under two years old now. We had used data pipelines as a way to back up Amazon DynamoDB data to Amazon S3 in case of a catastrophic developer error. However, with DynamoDB point-in-time recovery we have a better, native mechanism for disaster recovery. Additionally, with data pipelines we still own the operations associated with the clusters themselves, even if they are transient. A common challenge is keeping our clusters up with recent releases of Amazon EMR to help mitigate any outstanding bugs. Another is the inefficiency of needing to spin up an EMR cluster for each DynamoDB table.

I decided to take a step back and list the capabilities I wanted to have in the next iteration:

  • Export tables using AWS Glue instead of EMR.
    • AWS Glue provides a serverless ETL environment where I don’t have to worry about the underlying infrastructure. This minimizes operational tasks like keeping up with the EMR release tags.
  • Use a workflow solution that works across services like AWS Glue and Amazon Athena.
    • In the first iteration, the workflow was spread across various services. Unless you had the entire pipeline in your head, it was difficult to get a bird’s-eye view of how the pipeline was progressing.
  • Ability to select different formats.
    • For data engineering, I prefer Apache Parquet. However, customers might prefer a different format.
  • Add exported data to Athena.
    • I find that the easier it is for the data to be queried, the more likely it’s used.

Architecture overview

At a high level, this is the architecture:

  • We’re using AWS Step Functions as the workflow engine.
    • Each step is either a built-in Step Functions state, a service integration, or a simple Python AWS Lambda function. For example, GlueStartJobRun uses the synchronous job run service integration, as discussed in the documentation.
    • We get a visual representation of the entire pipeline.
    • It’s quick to onboard new developers.
  • An event in Amazon CloudWatch Events, which is disabled to start, triggers a Step Functions state machine with a JSON payload that contains the following:
    • AWS Glue job name
    • Export destination
    • DynamoDB table name
    • Desired read percentage
    • AWS Glue crawler name
  • AWS Glue exports a DynamoDB table in your preferred format to S3 as snapshots_your_table_name. The data is partitioned by the snapshot_timestamp
  • An AWS Glue crawler adds or updates your data’s schema and partitions in the AWS Glue Data Catalog.
  • Finally, we create an Athena view that only has data from the latest export snapshot.

A simple AWS Glue ETL job

The script that I created accepts AWS Glue ETL job arguments for the table name, read throughput, output, and format. Behind the scenes, AWS Glue scans the DynamoDB table. AWS Glue makes sure that every top-level attribute makes it into the schema, no matter how sparse your attributes are (as discussed in the DynamoDB documentation).

Here’s the script:

import sys
import datetime
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext

ARG_TABLE_NAME = "table_name"
ARG_READ_PERCENT = "read_percentage"
ARG_OUTPUT = "output_prefix"
ARG_FORMAT = "output_format"

PARTITION = "snapshot_timestamp"

args = getResolvedOptions(sys.argv,
  [
    'JOB_NAME',
    ARG_TABLE_NAME,
    ARG_READ_PERCENT,
    ARG_OUTPUT,
    ARG_FORMAT
  ]
)

table_name = args[ARG_TABLE_NAME]
read = args[ARG_READ_PERCENT]
output_prefix = args[ARG_OUTPUT]
fmt = args[ARG_FORMAT]

print("Table name:", table_name)
print("Read percentage:", read)
print("Output prefix:", output_prefix)
print("Format:", fmt)

date_str = datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M')
output = "%s/%s=%s" % (output_prefix, PARTITION, date_str)

sc = SparkContext()
glueContext = GlueContext(sc)

table = glueContext.create_dynamic_frame.from_options(
  "dynamodb",
  connection_options={
    "dynamodb.input.tableName": table_name,
    "dynamodb.throughput.read.percent": read
  }
)

glueContext.write_dynamic_frame.from_options(
  frame=table,
  connection_type="s3",
  connection_options={
    "path": output
  },
  format=fmt,
  transformation_ctx="datasink"
)

There’s not a lot here. We’re creating a DynamicFrameReader of connection type dynamodb and passing in the table name and desired maximum read throughput consumption. We pass that data frame to a DynamicFrameWriter that writes the table to S3 in the specified format.

Athena views

Most teams at Amazon own applications that have multiple DynamoDB tables, including my own team. Our current application uses five primary tables. Ideally, at the end of an export workflow you can write simple, obvious queries across a consistent view of your tables. However, each exported table is partitioned by the timestamp from when the table was exported. This makes querying across one or more tables very cumbersome, because you have to add a WHERE snapshot_timestamp = clause to every table reference in your query. Additionally, each table might have a different snapshot_timestamp value for any given day!

The final step in this export workflow creates an Athena view that adds that WHERE clause for you. This means that you can interact with your DynamoDB exports as if they were one sane view of your exported DynamoDB tables.
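As an illustration, querying that view from Python might look like the following sketch; the database and view names match the ones used later in this walkthrough, and the results bucket is a placeholder:

import boto3

athena = boto3.client("athena")

# No snapshot_timestamp predicate is needed; the view already pins the latest snapshot
response = athena.start_query_execution(
    QueryString="SELECT * FROM reviews LIMIT 10",
    QueryExecutionContext={"Database": "dynamodb_exports"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
print(response["QueryExecutionId"])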

Setting up the infrastructure

The AWS CloudFormation stacks I create are split into two stacks. The common stack contains shared infrastructure, and you need only one of these per AWS Region. The table stacks are designed in such a way that you can create one per table-format combination in any given AWS Region. It contains the CloudWatch event logic and AWS Glue components needed to export and transform DynamoDB tables.

Creating the common stack

The common stack contains the majority of the infrastructure. That includes the Step Functions state machine and Lambda functions to trigger and check the state of asynchronous jobs. It also includes IAM roles that the export stacks use, and the S3 bucket to store the exports.

To create the common stack, do the following:

  1. Choose this Launch Stack
  2. Choose I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  3. Choose Create Stack.

Creating the table export stack

If you don’t have a DynamoDB table to export, follow the original blog post. Start with the Working with the Reviews stack section and continue until you’ve added the two Items to the table. Otherwise, feel free to point this CloudFormation stack at your favorite DynamoDB table that is using provisioned throughput. Tables that use on-demand throughput are not currently supported.

Because so much of this architecture is shareable, there’s not much in the table export stack. This stack defines the CloudWatch event used to trigger the Step Functions state machine with a JSON payload containing all the necessary metadata. Additionally, it contains the AWS Glue ETL job that exports the table and the AWS Glue Crawler that updates metadata in the AWS Glue Data Catalog.

Technically, you can define the AWS Glue ETL job in the common stack because it’s already parameterized. However, the default limit for concurrent runs for an AWS Glue job is three. This is a soft limit, but with this architecture you have headroom to export up to 25 tables before asking for a limit increase.

To create the table export stack, do the following:

  1. Choose this Launch Stack
  2. Choose an output format from the list. All the available formats are supported by Athena natively.
  3. Enter your DynamoDB table name.
  4. Enter the percentage of Read Capacity Units (RCUs) that the job should consume from your table’s currently provisioned throughput. This percentage is expressed as a float between 0.1 and 1.0 inclusive. The default is 0.25 (25 percent).

As an example: Suppose that your table’s RCUs are set to 100 and you use the default of 0.25 (25 percent). Then the AWS Glue job consumes 25 RCUs while running.

  5. Choose Create.

Kicking off a state machine execution

To demonstrate how this works, we run the DynamoDB export state machine manually by passing it the JSON payload that the CloudWatch event would pass to Step Functions.

Getting the JSON payload from CloudWatch Events

To get the JSON payload, do the following:

  1. Open CloudWatch in the AWS Management Console.
  2. In the left column under Events, choose Rules.
  3. Choose your rule from the list. It is prefixed by AWSBigDataBlog-.
  4. For Actions, choose Edit.
  5. Copy the JSON payload from the Configure input section of Targets.
  6. Choose Cancel to exit edit mode.

Starting a state machine execution

To start an execution of the state machine, take the following steps:

  1. Open Step Functions in the console.
  2. Choose the DynamoDBExportAndAthenaLoad state machine.
  3. Choose Start execution.
  4. Paste the JSON payload into the Input field.
  5. Choose Start execution.

There are a few ways to follow along with the execution. As steps are entered and exited, entries are added to the Execution event history list. This is a great way to see what state (event in Lambda speak) is passed to each step, in case you need to debug.

You can also expand the Visual workflow. It’s a great high-level view to see how the workflow is progressing.

After the workflow is finished, you see two new tables under the dynamodb_exports database in your AWS Glue Data Catalog. Your DynamoDB snapshots table name is prefixed with snapshots_. The schema is formatted for the AWS Glue Data Catalog (lowercase and hyphens transformed to underscores). You also have a view table with the same table name formatted for AWS Glue Data Catalog but without the snapshots_ prefix.

Querying your data

To showcase how having a separate view table of the most recent snapshot of a table is useful, I use the Reviews table from the previous blog post. The table has two items. I have also run the export workflow twice. As you can see when you preview the table, there are four items total. That’s because each snapshot contains two items.

From the items, the latest snapshot_timestamp is 2019-01-11T23:26. When I run the same preview query against the view table reviews, we see that there are only two items, which is what we expect. The view takes care of specifying the where snapshot_timestamp=… clause so you don’t have to.

Wrapping up

In this post, I showed you how to use AWS Glue’s DynamoDB integration and AWS Step Functions to create a workflow that exports your DynamoDB tables to S3 in your preferred format. I also showed how to create an Athena view for each table’s latest snapshot, giving you a consistent view of your DynamoDB table exports.


About the Author

Joe Feeney is a Software Engineer at Amazon Go, where he does secret stuff and he’s quite chuffed with that. He enjoys embarrassing his family by taking Mario Kart entirely too seriously.

 

 

 

Getting started with serverless

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/getting-started-with-serverless/

This post is contributed by Maureen Lonergan, Director, AWS Training and Certification

We consistently hear from customers that they’re interested in building serverless applications to take advantage of the increased agility and decreased total cost of ownership (TCO) that serverless delivers. But we also know that serverless may be intimidating for those who are more accustomed to using instances or containers for compute.

Since we launched AWS Lambda in 2014, our serverless portfolio has expanded beyond event-driven computing. We now have serverless databases, integration, and orchestration tools. This enables you to build end-to-end serverless applications—but it also means that you must learn how to build using a new serverless operational model.

For this reason, AWS Training and Certification is pleased to offer a new course through Coursera entitled AWS Fundamentals: Building Serverless Applications.

This scenario-based course, developed by the experts at AWS, will:

  • Introduce the AWS serverless framework and architecture in the context of a real business problem.
  • Provide the foundational knowledge to become more proficient in choosing and creating serverless solutions using AWS.
  • Provide demonstrations of the AWS services needed for deploying serverless solutions.
  • Help you develop skills in building and deploying serverless solutions using real-world examples of a serverless website and chatbot.

The syllabus allocates more than nine hours of video content and reading material over four weekly lessons. Each lesson has an estimated 2–3 hours per week of study time (though you can set your own pace and deadlines), with suggested exercises in the AWS Management Console. There is an end-of-course assessment that covers all the learning objectives and content.

The course is on-demand and 100% digital; you can even audit it for free. A completion certificate and access to the graded assessments are available for $49.

What can you expect?

In this course you will learn to use the AWS Serverless portfolio to create a chatbot that answers the question, “Can I let my cat outside?” You will build an application using every one of the concepts and services discussed in the class, including:

At the end of the class, you can audibly interact with the application to ask that essential question, “Can my cat go out in Denver?” (See the conversation in the following screenshot.)

Serverless Coursera training app

Across the four weeks of the course, you learn:

  1. What serverless computing is and how to create a chatbot with Amazon Lex using an S3 bucket to host a web application.
  2. How to build a highly scalable API with API Gateway and use Amazon CloudFront as a content delivery network (CDN) for your site and API.
  3. How to use Lambda to build serverless functions that write data to DynamoDB.
  4. How to apply lessons from the previous weeks to extend and add functionality to the chatbot.

Serverless Coursera training

AWS Fundamentals: Building Serverless Applications is now available. This course complements other standalone digital courses by AWS Training and Certification. They include the highly recommended Introduction to Serverless Development, as well as the following:

ICYMI: Serverless Q1 2019

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/icymi-serverless-q1-2019/

Welcome to the fifth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

If you didn’t see them, check our previous posts for what happened in 2018:

So, what might you have missed this past quarter? Here’s the recap.

Amazon API Gateway

Amazon API Gateway improved the experience for publishing APIs on the API Gateway Developer Portal. In addition, we added features such as search, a feedback mechanism, and SDK generation.

Last year, API Gateway announced support for WebSockets. As of early February 2019, it is now possible to build WebSocket-enabled APIs via AWS CloudFormation and AWS Serverless Application Model (AWS SAM). The following diagram shows an example application.

WebSockets

API Gateway is also now supported in AWS Config. This feature enhancement allows API administrators to track changes to their API configuration automatically. With the power of AWS Config, you can automate alerts—and even remediation—with triggered Lambda functions.

In early January, API Gateway also announced a service level agreement (SLA) of 99.95% availability.

AWS Step Functions

Step Functions Local

AWS Step Functions added the ability to tag Step Function resources and provide access control with tag-based permissions. With this feature, developers can use tags to define access via AWS Identity and Access Management (IAM) policies.

In addition to tag-based permissions, Step Functions was one of 10 additional services to have support from the Resource Group Tagging API, which allows a single central point of administration for tags on resources.

In early February, Step Functions released the ability to develop and test applications locally using a local Docker container. This new feature allows you to innovate faster by iterating on your state machines locally.

In late January, Step Functions joined the family of services offering SLAs with an SLA of 99.9% availability. They also increased their service footprint to include the AWS China (Ningxia) and AWS China (Beijing) Regions.

AWS SAM Command Line Interface

AWS SAM Command Line Interface (AWS SAM CLI) released the AWS Toolkit for Visual Studio Code and the AWS Toolkit for IntelliJ. These toolkits are open source plugins that make it easier to develop applications on AWS. The toolkits provide an integrated experience for developing serverless applications in Node.js (Visual Studio Code) as well as Java and Python (IntelliJ), with more languages and features to come.

The toolkits help you get started fast with built-in project templates that leverage AWS SAM to define and configure resources. They also include an integrated experience for step-through debugging of serverless applications and make it easy to deploy your applications from the integrated development environment (IDE).

AWS Serverless Application Repository

AWS Serverless Application Repository applications can now be published to the application repository using AWS CodePipeline. This allows you to update applications in the AWS Serverless Application Repository with a continuous integration and continuous delivery (CICD) process. The CICD process is powered by a pre-built application that publishes other applications to the AWS Serverless Application Repository.

AWS Event Fork Pipelines

Event Fork Pipelines

AWS Event Fork Pipelines is now available in AWS Serverless Application Repository. AWS Event Fork Pipelines is a suite of nested open-source applications based on AWS SAM. You can deploy Event Fork Pipelines directly from AWS Serverless Application Repository into your AWS account. These applications help you build event-driven serverless applications by providing pipelines for common event-handling requirements.

AWS Cloud9

Cloud9

AWS Cloud9 announced that, in addition to Amazon Linux, you can now select Ubuntu as the operating system for your AWS Cloud9 environment. Before this announcement, you would have to stand up an Ubuntu server and connect AWS Cloud9 to the instance by using SSH. With native support for Ubuntu, you can take advantage of AWS Cloud9 features, such as instance lifecycle management for cost efficiency and preconfigured tooling environments.

AWS Cloud9 also added support for AWS CloudTrail, which allows you to monitor and react to changes made to your AWS Cloud9 environment.

Amazon Kinesis Data Analytics

Amazon Kinesis Data Analytics now supports CloudTrail logging. CloudTrail captures changes made to Kinesis Data Analytics and delivers the logs to an Amazon S3 bucket. This makes it easy for administrators to understand changes made to the application and who made them.

Amazon DynamoDB

Amazon DynamoDB removed the associated costs of DynamoDB Streams used in replicating data globally. Because global tables use streams to replicate data between Regions, this change translates to cost savings for global tables. However, DynamoDB streaming costs remain the same for your applications reading from a replica table’s stream.

DynamoDB added the ability to switch encryption keys used to encrypt data. DynamoDB, by default, encrypts all data at rest. You can use the default encryption, the AWS-owned customer master key (CMK), or the AWS managed CMK to encrypt data. It is now possible to change between the AWS-owned CMK and the AWS managed CMK without having to modify code or applications.
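
The switch itself is an UpdateTable call on the table's server-side encryption settings. A minimal boto3 sketch might look like the following; the table name is a placeholder.

import boto3

dynamodb = boto3.client("dynamodb")

# Move the table ("Reviews" is a placeholder) to the AWS managed CMK for DynamoDB.
dynamodb.update_table(
    TableName="Reviews",
    SSESpecification={
        "Enabled": True,
        "SSEType": "KMS",
    },
)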

Amazon DynamoDB Local, a local installable version of DynamoDB, has added support for transactional APIs, on-demand capacity, and as many as 20 global secondary indexes per table.

AWS Amplify

Amplify Deploy

AWS Amplify added support for OAuth 2.0 Authorization Code Grant flows in the native (iOS and Android) and React Native libraries. Previously, you would have to use third-party libraries and handwritten logic to achieve these use cases.

Additionally, Amplify launched the ability to perform instant cache invalidation and delta deployments on every code commit. To achieve this, Amplify creates unique references to all the build artifacts on each deploy. Amplify has also added the ability to detect and upload only modified artifacts at the time of release to help reduce deployment time.

Amplify also added features for multiple environments, custom resolvers, larger data models, and IAM roles, including multi-factor authentication (MFA).

AWS AppSync

AWS AppSync increased its availability footprint to the EU (London) Region.

Amazon Cognito

Amazon Cognito increased its service footprint to include the Canada (central) Region. It also published an SLA of 99.9% availability.

Amazon Aurora

Amazon Aurora Serverless increases performance visibility by publishing logs to Amazon CloudWatch.

AWS CodePipeline

CodePipeline

AWS CodePipeline announced support for deploying static files to Amazon S3. While this may not usually fall under the serverless blogs and announcements, if you’re a developer who builds single-page applications or hosts static websites, this makes your life easier. Your static site can now be part of your CICD process without custom coding.

Serverless Posts

January:

February:

March:

Tech talks

We hold several AWS Online Tech Talks covering serverless topics throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. Here are the three tech talks that we delivered in Q1:

Whitepapers

Security Overview of AWS Lambda: This whitepaper presents a deep dive into the Lambda service through a security lens. It provides a well-rounded picture of the service, which can be useful for new adopters, as well as deepening understanding of Lambda for current users. Read the full whitepaper.

Twitch

AWS Launchpad Santa Clara

There is always something going on at our Twitch channel! Be sure to follow us so you don’t miss anything! For information about upcoming broadcasts and recent livestreams, keep an eye on AWS on Twitch for more Serverless videos and on the Join us on Twitch AWS page.

In other news

Building Happy Little APIs

Twitch Series: Building Happy Little APIs

In April, we started a 13-week deep dive into building APIs on AWS as part of our Twitch Build On series. The Building Happy Little APIs series covers the common and not-so-common use cases for APIs on AWS and the features available to customers as they look to build secure, scalable, efficient, and flexible APIs.

Twitch series: Build on Serverless: Season 2

Build On Serverless

Join Heitor Lessa across 14 weeks, nearly every Wednesday from April 24 – August 7 at 8AM PST/11AM EST/3PM UTC. Heitor is live-building a full-stack, serverless airline-booking application using a bunch of services: Lambda, Amplify, API Gateway, Amazon Cognito, AWS SAM, CloudWatch, AWS AppSync, and others. See the episode guide and sign up for stream reminders.

2019 AWS Summits

AWS Summit

The schedule is in full swing for the 2019 AWS Global Summits, held in major cities around the world. These free events bring the cloud computing community together to connect, collaborate, and learn about AWS. They attract technologists from all industries and skill levels who want to discover how AWS can help them innovate quickly and deliver flexible, reliable solutions at scale. Get notified when to register and learn more at the AWS Global Summit Program website.

Still looking for more?

The Serverless landing page has lots of information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. Check it out!

Learn about AWS Services & Solutions – April AWS Online Tech Talks

Post Syndicated from Robin Park original https://aws.amazon.com/blogs/aws/learn-about-aws-services-solutions-april-aws-online-tech-talks/

AWS Tech Talks

Join us this April to learn about AWS services and solutions. The AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. These tech talks, led by AWS solutions architects and engineers, feature technical deep dives, live demonstrations, customer examples, and Q&A with AWS experts. Register Now!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

Blockchain

May 2, 2019 | 11:00 AM – 12:00 PM PT – How to Build an Application with Amazon Managed Blockchain – Learn how to build an application on Amazon Managed Blockchain with the help of demo applications and sample code.

Compute

April 29, 2019 | 1:00 PM – 2:00 PM PT – How to Optimize Amazon Elastic Block Store (EBS) for Higher Performance – Learn how to optimize performance and spend on your Amazon Elastic Block Store (EBS) volumes.

May 1, 2019 | 11:00 AM – 12:00 PM PT – Introducing New Amazon EC2 Instances Featuring AMD EPYC and AWS Graviton Processors – See how new Amazon EC2 instance offerings that feature AMD EPYC processors and AWS Graviton processors enable you to optimize performance and cost for your workloads.

Containers

April 23, 2019 | 11:00 AM – 12:00 PM PT – Deep Dive on AWS App Mesh – Learn how AWS App Mesh makes it easy to monitor and control communications for services running on AWS.

March 22, 2019 | 9:00 AM – 10:00 AM PT – Deep Dive Into Container Networking – Dive deep into microservices networking and how you can build, secure, and manage the communications into, out of, and between the various microservices that make up your application.

Databases

April 23, 2019 | 1:00 PM – 2:00 PM PT – Selecting the Right Database for Your Application – Learn how to develop a purpose-built strategy for databases, where you choose the right tool for the job.

April 25, 2019 | 9:00 AM – 10:00 AM PT – Mastering Amazon DynamoDB ACID Transactions: When and How to Use the New Transactional APIs – Learn how the new Amazon DynamoDB transactional APIs simplify the developer experience of making coordinated, all-or-nothing changes to multiple items both within and across tables.

DevOps

April 24, 2019 | 9:00 AM – 10:00 AM PT – Running .NET applications with AWS Elastic Beanstalk Windows Server Platform V2 – Learn about the easiest way to get your .NET applications up and running on AWS Elastic Beanstalk.

Enterprise & Hybrid

April 30, 2019 | 11:00 AM – 12:00 PM PT – Business Case Teardown: Identify Your Real-World On-Premises and Projected AWS Costs – Discover tools and strategies to help you as you build your value-based business case.

IoT

April 30, 2019 | 9:00 AM – 10:00 AM PT – Building the Edge of Connected Home – Learn how AWS IoT edge services are enabling smarter products for the connected home.

Machine Learning

April 24, 2019 | 11:00 AM – 12:00 PM PT – Start Your Engines and Get Ready to Race in the AWS DeepRacer League – Learn more about reinforcement learning, how to build a model, and compete in the AWS DeepRacer League.

April 30, 2019 | 1:00 PM – 2:00 PM PT – Deploying Machine Learning Models in Production – Learn best practices for training and deploying machine learning models.

May 2, 2019 | 9:00 AM – 10:00 AM PT – Accelerate Machine Learning Projects with Hundreds of Algorithms and Models in AWS Marketplace – Learn how to use third party algorithms and model packages to accelerate machine learning projects and solve business problems.

Networking & Content Delivery

April 23, 2019 | 9:00 AM – 10:00 AM PT – Smart Tips on Application Load Balancers: Advanced Request Routing, Lambda as a Target, and User Authentication – Learn tips and tricks about important Application Load Balancer (ALB) features that were recently launched.

Productivity & Business Solutions

April 29, 2019 | 11:00 AM – 12:00 PM PT – Learn How to Set up Business Calling and Voice Connector in Minutes with Amazon Chime – Learn how Amazon Chime Business Calling and Voice Connector can help you with your business communication needs.

May 1, 2019 | 1:00 PM – 2:00 PM PT – Bring Voice to Your Workplace – Learn how you can bring voice to your workplace with Alexa for Business.

Serverless

April 25, 2019 | 11:00 AM – 12:00 PM PT – Modernizing .NET Applications Using the Latest Features on AWS Development Tools for .NET – Get a deep dive and demonstration of the latest updates to the AWS SDK and tools for .NET to make development even easier, more powerful, and more productive.

May 1, 2019 | 9:00 AM – 10:00 AM PT – Customer Showcase: Improving Data Processing Workloads with AWS Step Functions’ Service Integrations – Learn how innovative customers like SkyWatch are coordinating AWS services using AWS Step Functions to improve productivity.

Storage

April 24, 2019 | 1:00 PM – 2:00 PM PT – Amazon S3 Glacier Deep Archive: The Cheapest Storage in the Cloud – See how Amazon S3 Glacier Deep Archive offers the lowest cost storage in the cloud, at prices significantly lower than storing and maintaining data in on-premises magnetic tape libraries or archiving data offsite.

This Is My Architecture: Mobile Cryptocurrency Mining

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/this-is-my-architecture-mobile-cryptocurrency-mining/

In North America, approximately 95% of adults over the age of 25 have a bank account. In the developing world, that number is only about 52%. Cryptocurrencies can provide a platform for millions of unbanked people in the world to achieve financial freedom on a more level financial playing field.

Electroneum, a cryptocurrency company located in England, built its cryptocurrency mobile back end on AWS and is using the power of blockchain to unlock the global digital economy for millions of people in the developing world.

Electroneum’s cryptocurrency mobile app allows Electroneum customers in developing countries to transfer ETN (Electroneum’s cryptocurrency) and pay for goods using their smartphones. Listen in to the discussion between AWS Solutions Architect Toby Knight and Electroneum CTO Barry Last as they explain how the company built its solution. Electroneum’s app is a web application that uses a feedback loop between its web servers and AWS WAF (a web application firewall) to automatically block malicious actors. The system then uses Athena, with a gamified approach, to provide an additional layer of blocking to prevent DDoS attacks. Finally, Electroneum built a serverless, instant payments system using AWS API Gateway, AWS Lambda, and Amazon DynamoDB to help its customers avoid the usual delays in confirming cryptocurrency transactions.

 

Deploying a personalized API Gateway serverless developer portal

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/deploying-a-personalized-api-gateway-serverless-developer-portal/

This post is courtesy of Drew Dresser, Application Architect – AWS Professional Services

Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. Customers of these APIs often want a website to learn and discover APIs that are available to them. These customers might include front-end developers, third-party customers, or internal system engineers. To produce such a website, we have created the API Gateway serverless developer portal.

The API Gateway serverless developer portal (developer portal or portal, for short) is an application that you use to make your API Gateway APIs available to your customers by enabling self-service discovery of those APIs. Your customers can use the developer portal to browse API documentation, register for, and immediately receive their own API key that they can use to build applications, test published APIs, and monitor their own API usage.

Over the past few months, the team has been hard at work contributing to the open source project, available on GitHub. The developer portal was relaunched on October 29, 2018, and the team will continue to push features and take customer feedback from the open source community. Today, we’re happy to highlight some key functionality of the new portal.

Benefits for API publishers

API publishers use the developer portal to expose the APIs that they manage. As an API publisher, you need to set up, maintain, and enable the developer portal. The new portal has the following benefits:

Benefits for API consumers

API consumers use the developer portal as traditional application users. An API consumer needs to understand the APIs being published. API consumers might be front-end developers, distributed system engineers, or third-party customers. The new developer portal comes with the following benefits for API consumers:

  • Explore – API consumers can quickly page through lists of APIs. When they find one they’re interested in, they can immediately see documentation on that API.
  • Learn – API consumers might need to drill down deeper into an API to learn its details. They want to learn how to form requests and what they can expect as a response.
  • Test – Through the developer portal, API consumers can get an API key and invoke the APIs directly. This enables developers to develop faster and with more confidence.

Architecture

The developer portal is a completely serverless application. It leverages Amazon API Gateway, Amazon Cognito User Pools, AWS Lambda, Amazon DynamoDB, and Amazon S3. Serverless architectures enable you to build and run applications without needing to provision, scale, and manage any servers. The developer portal is broken down into multiple microservices, each with a distinct responsibility, as shown in the following image.

Identity management for the developer portal is performed by Amazon Cognito and a Lambda function in the Login & Registration microservice. An Amazon Cognito User Pool is configured out of the box to enable users to register and log in. Additionally, you can deploy the developer portal to use a UI hosted by Amazon Cognito, which you can customize to match your style and branding.

Requests are routed to static content served from Amazon S3 and built using React. The React app communicates to the Lambda backend via API Gateway. The Lambda function is built using the aws-serverless-express library and contains the business logic behind the APIs. The business logic of the web application queries and adds data to the API Key Creation and Catalog Update microservices.

To maintain the API catalog, the Catalog Update microservice uses an S3 bucket and a Lambda function. When an API’s Swagger file is added or removed from the bucket, the Lambda function triggers and maintains the API catalog by updating the catalog.json file in the root of the S3 bucket.
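
The portal's own implementation lives in the GitHub project; purely as an illustration of the pattern, a simplified S3-triggered Lambda function that rebuilds a catalog file might look like the following sketch (the bucket name and catalog format are assumptions, not the portal's actual code).

import json
import boto3

s3 = boto3.client("s3")
BUCKET = "apigw-dev-portal-artifacts"  # placeholder artifacts bucket name

def handler(event, context):
    # Rebuild catalog.json from the Swagger files currently under the catalog/ prefix.
    entries = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix="catalog/"):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".json"):
                entries.append({"key": obj["Key"]})
    s3.put_object(Bucket=BUCKET, Key="catalog.json", Body=json.dumps(entries))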

To manage the mapping between API keys and customers, the application uses the API Key Creation microservice. The service updates API Gateway with API key creations or deletions and then stores the results in a DynamoDB table that maps customers to API keys.
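
Again as an illustration rather than the portal's actual code, the core of such a service could combine an API Gateway call with a DynamoDB write, roughly like this sketch (the table and attribute names are assumptions).

import boto3

apigateway = boto3.client("apigateway")
customers = boto3.resource("dynamodb").Table("DevPortalCustomers")  # placeholder table name

def create_key_for_customer(customer_id):
    # Create the API key, then record which customer owns it.
    key = apigateway.create_api_key(name=customer_id, enabled=True)
    customers.put_item(Item={"CustomerId": customer_id, "ApiKeyId": key["id"]})
    return key["id"]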

Deploying the developer portal

You can deploy the developer portal using AWS SAM, the AWS SAM CLI, or the AWS Serverless Application Repository. To deploy with AWS SAM, you can simply clone the repository and then deploy the application using two commands from your CLI. For detailed instructions for getting started with the portal, see Use the Serverless Developer Portal to Catalog Your API Gateway APIs in the Amazon API Gateway Developer Guide.

Alternatively, you can deploy using the AWS Serverless Application Repository as follows:

  1. Navigate to the api-gateway-dev-portal application and choose Deploy in the top right.
  2. On the Review page, for ArtifactsS3BucketName and DevPortalSiteS3BucketName, enter globally unique names. Both buckets are created for you.
  3. To deploy the application with these settings, choose Deploy.
  4. After the stack is complete, get the developer portal URL by choosing View CloudFormation Stack. Under Outputs, choose the URL in Value.

The URL opens in your browser.

You now have your own serverless developer portal application that is deployed and ready to use.

Publishing a new API

With the developer portal application deployed, you can publish your own API to the portal.

To get started:

  1. Create the PetStore API, which is available as a sample API in Amazon API Gateway. The API must be created and deployed and include a stage.
  2. Create a Usage Plan, which is required so that API consumers can create API keys in the developer portal. The API key is used to test actual API calls.
  3. On the API Gateway console, navigate to the Stages section of your API.
  4. Choose Export.
  5. For Export as Swagger + API Gateway Extensions, choose JSON. Save the file with the following format: apiId_stageName.json.
  6. Upload the file to the S3 bucket dedicated for artifacts in the catalog path. In this post, the bucket is named apigw-dev-portal-artifacts. To perform the upload, run the following command.
    aws s3 cp apiId_stageName.json s3://yourBucketName/catalog/apiId_stageName.json

Uploading the file to the artifacts bucket with a catalog/ key prefix automatically makes it appear in the developer portal.

This might be familiar. It’s your PetStore API documentation displayed in the OpenAPI format.

With an API deployed, you’re ready to customize the portal’s look and feel.

Customizing the developer portal

Adding a customer’s own look and feel to the developer portal is easy, and it creates a great user experience. You can customize the domain name, text, logo, and styling. For a more thorough walkthrough of customizable components, see Customization in the GitHub project.

Let’s walk through a few customizations to make your developer portal more familiar to your API consumers.

Customizing the logo and images

To customize logos, images, or content, you need to modify the contents of the your-prefix-portal-static-assets S3 bucket. You can edit files using the CLI or the AWS Management Console.

Start customizing the portal by using the console to upload a new logo in the navigation bar.

  1. Upload the new logo to your bucket with a key named custom-content/nav-logo.png.
    aws s3 cp {myLogo}.png s3://yourPrefix-portal-static-assets/custom-content/nav-logo.png
  2. Modify object permissions so that the file is readable by everyone because it’s a publicly available image. The new navigation bar looks something like this:

Another neat customization that you can make is the image for a particular API and stage. Maybe you want your PetStore API to have a dog picture to represent the friendliness of the API. To add an image:

  1. Use the command line to copy the image directly to the S3 bucket location.
    aws s3 cp my-image.png s3://yourPrefix-portal-static-assets/custom-content/api-logos/apiId-stageName.png
  2. Modify object permissions so that the file is readable by everyone.

Customizing the text

Next, make sure that the text of the developer portal welcomes your pet-friendly customer base. The YAML files in the static assets bucket under /custom-content/content-fragments/ determine the portal’s text content.

To edit the text:

  1. On the AWS Management Console, navigate to the website content S3 bucket and then navigate to /custom-content/content-fragments/.
  2. Home.md is the content displayed on the home page, APIs.md controls the tab text on the navigation bar, and GettingStarted.md contains the content of the Getting Started tab. All three files are written in markdown. Download one of them to your local machine so that you can edit the contents. The following image shows Home.md edited to contain custom text:
  3. After editing and saving the file, upload it back to S3, which results in a customized home page. The following image reflects the configuration changes in Home.md from the previous step:

Customizing the domain name

Finally, many customers want to give the portal a domain name that they own and control.

To customize the domain name:

  1. Use AWS Certificate Manager to request and verify a managed certificate for your custom domain name. For more information, see Request a Public Certificate in the AWS Certificate Manager User Guide.
  2. Copy the Amazon Resource Name (ARN) so that you can pass it to the developer portal deployment process. That process now includes the certificate ARN and a property named UseRoute53Nameservers. If the property is set to true, the template creates a hosted zone and record set in Amazon Route 53 for you. If the property is set to false, the template expects you to use your own name server hosting.
  3. If you deployed using the AWS Serverless Application Repository, navigate to the Application page and deploy the application along with the certificate ARN.

After the developer portal is deployed and your CNAME record has been added, the website is accessible from the custom domain name as well as the new Amazon CloudFront URL.

Customizing the logo, text content, and domain name are great tools to make the developer portal feel like an internally developed application. In this walkthrough, you completely changed the portal’s appearance to enable developers and API consumers to discover and browse APIs.

Conclusion

The developer portal is available to use right away. Support and feature enhancements are tracked in the public GitHub. You can contribute to the project by following the Code of Conduct and Contributing guides. The project is open-sourced under the Amazon Open Source Code of Conduct. We plan to continue to add functionality and listen to customer feedback. We can’t wait to see what customers build with API Gateway and the API Gateway serverless developer portal.

Handling AWS Chargebacks for Enterprise Customers

Post Syndicated from Varad Ram original https://aws.amazon.com/blogs/architecture/handling-aws-chargebacks-for-enterprise-customers/

As the AWS product portfolio and feature set grow, you, as an enterprise customer, are likely to migrate existing workloads to AWS and build new products on it. To help you keep your cloud charges simple, you can use consolidated billing. Consolidated billing can, however, create complexity for your internal chargebacks, especially if some of your resources and services are not tagged correctly. To help your individual teams and business units normalize and reduce their costs as your AWS implementation grows, you can implement chargebacks transparently and automate billing.

This blog post includes a walkthrough of an end-to-end mechanism that you can use to automate your consolidated billing charges for either your existing AWS accounts, or for newly created accounts.

Walkthrough

Prerequisites for implementation:

  • One account that is the payer account, which consolidates billing and links all other accounts (including admin accounts)
  • An understanding of billing, Detailed Billing Report (DBR), Cost and Usage Report (CUR), and blended and unblended costs
  • Activate propagation of necessary cost allocation tags to consolidated billing
  • Access to reservations across the linked accounts
  • Read permission on the source bucket and write permission to the transformed bucket
  • An automated method (such as database access or an API) to verify the cost centers tagged to AWS resources
  • Permissions to get access to the services described in this solution on the account targeted for this automation

Before you begin, it is important to understand the blended costs and unblended costs in consolidated billing. Blended costs are calculated based on the blended rate (the average rates for the reserved and on-demand instances that are used by your member accounts) for each service your accounts used, multiplied by the account usage of those services. Unblended costs are the charges for those services broken out for each linked account.

Based on your organization’s strategy for savings (centralized or not), you could consider either the blended or unblended costs. The consolidated billing files that include the information for the chargeback are the Detailed Billing Report (DBR) and Cost and Usage Report (CUR). Both of these reports provide both the blended and unblended rates as separate columns.

To help you create and maintain your AWS accounts, you can use AWS Account Vending Machine (AVM). You can launch AVM from either the AWS Landing Zone or with a custom solution. AVM keeps all your account information in a DynamoDB table (such as the account number, root mail ID, default cost center, name of the owner, etc.) and maintains reservation-related data (such as invoice ID, instance type, region, amount, cost center, etc.) in another table. To enable your account administrator to add invoice details for all your reservations, you can use a web page hosted on AWS Lambda, Amazon Simple Storage Service (Amazon S3), or a web server.

To begin the billing transformation, add a trigger on the S3 bucket that contains the raw AWS billing files so that it pushes PutObject messages into Amazon Simple Queue Service (SQS). Your billing transformation program (written in Python, Node.js, Java, .NET, etc., using the AWS SDK) then runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance, in containers, or on Lambda (if the bill can be processed within 15 minutes and within Lambda's file size restrictions).

The billing transformation program must do the following (a simplified code sketch follows the list):

  • Cache the Account details and reservation DynamoDB tables
  • Verify if there are any messages in SQS
  • Ignore if the file is not a DBR or CUR file (process either of them, not both)
  • Download the file, unzip, and read row-by-row; for a DBR file, consider only the “LineItem” RecordType
  • Add two new columns: Bill_CostCenter and Bill_Notes
    • If there is a valid value in the CostCenter tag (verified with internal automation processes), add the same value to the Bill_CostCenter column and any notes to the Bill_Notes column
    • If the CostCenter is invalid, get the default Cost Center from the cached account details and add the information to the Bill_CostCenter and Bill_Notes columns
    • If the row is a reservation invoice, the cost center information comes from the reservation table and is added to the correct column
  • Cache consolidation of cost centers with the blended or unblended cost of each row
  • Write each of these processed line items into a new file
  • Handle exceptions by the normal organization practices (for example, email the owner of the cost center or the finance team)
  • Push the new file into the transformed Amazon S3 bucket
  • Write the consolidated lines into a different file and upload to Transformed Amazon S3 bucket
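
The following Python sketch condenses the core row transformation described above. It is not production code: the bucket name, the cached account defaults, the validation check, and the DBR column names are placeholders or assumptions, and the SQS polling, reservation handling, and consolidation steps are omitted for brevity.

import csv
import gzip
import io
import boto3

s3 = boto3.client("s3")

TRANSFORMED_BUCKET = "transformed-billing-bucket"   # placeholder bucket name
ACCOUNT_DEFAULTS = {"111122223333": "CC-1001"}      # placeholder cache of the account DynamoDB table

def cost_center_is_valid(cost_center):
    # Placeholder for the internal automation (database or API lookup).
    return cost_center.startswith("CC-")

def transform_billing_file(bucket, key):
    # Assumes a gzip-compressed CSV billing file.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    rows = csv.DictReader(io.StringIO(gzip.decompress(body).decode("utf-8")))

    out = io.StringIO()
    writer = None
    for row in rows:
        if row.get("RecordType") != "LineItem":     # DBR files: keep only line items
            continue
        tagged = row.get("user:CostCenter", "") or ""
        if cost_center_is_valid(tagged):
            row["Bill_CostCenter"], row["Bill_Notes"] = tagged, "from CostCenter tag"
        else:
            default = ACCOUNT_DEFAULTS.get(row.get("LinkedAccountId", ""), "UNALLOCATED")
            row["Bill_CostCenter"], row["Bill_Notes"] = default, "account default applied"
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=list(row.keys()))
            writer.writeheader()
        writer.writerow(row)

    s3.put_object(Bucket=TRANSFORMED_BUCKET, Key=key, Body=out.getvalue())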

Figure 1 – Architecture of processing a billing chargeback

 

Figure 2 – Validating the Cost Center process

After you have the consolidated billing file aggregated by cost center, you can easily see and handle your internal chargebacks. To further simplify your chargeback model, you can get help from AWS Technical Account Managers and Billing Concierge, if your organization would like AWS to provide custom invoices from the consolidated billing file.

Because the cost centers in your organization can expire over time, it’s important to validate them frequently with automation, such as a Lambda program.

Improvements

If your organization has a more complex chargeback structure, you can extend the logic described above to support deeper and broader chargeback codes, or implement a hierarchical chargeback structure.

You can also extend the transformation logic to support several chargeback codes (such as comma separated or with additional tags) if you have multiple teams or projects that want to share a resource.

Summary

As enterprise organizations grow and consume more cloud services, the cost optimization process grows and evolves with them. Sophisticated chargeback models enable the teams and business units in the organization to be accountable and to take the steps necessary to normalize the usage and costs of AWS services.

About the Author

Varad Ram likes to help customers adopt cloud technologies, and he is particularly interested in Artificial Intelligence. He believes Deep Learning will power future technology growth. In his spare time, his daughter and toddler son keep him busy biking and hiking.

ICYMI: Serverless Q4 2018

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/icymi-serverless-q4-2018/

This post is courtesy of Eric Johnson, Senior Developer Advocate – AWS Serverless

Welcome to the fourth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

This edition of ICYMI includes all announcements from AWS re:Invent 2018!

If you didn’t see them, check our Q1 ICYMI, Q2 ICYMI, and Q3 ICYMI posts for what happened then.

So, what might you have missed this past quarter? Here’s the recap.

New features

AWS Lambda introduced the Lambda runtime API and Lambda layers, which enable developers to bring their own runtime and share common code across Lambda functions. With the release of the runtime API, we can now support runtimes from AWS partners such as the PHP runtime from Stackery and the Erlang and Elixir runtimes from Alert Logic. Using layers, partners such as Datadog and Twistlock have also simplified the process of using their Lambda code libraries.

To meet the demand of larger Lambda payloads, Lambda doubled the payload size of asynchronous calls to 256 KB.

In early October, Lambda also raised the maximum execution duration, enabling Lambda functions that can run for up to 15 minutes.

Lambda also rolled out native support for Ruby 2.5 and Python 3.7.

You can now process Amazon Kinesis streams up to 68% faster with AWS Lambda support for Kinesis Data Streams enhanced fan-out and HTTP/2 for faster streaming.

Lambda also released a new Application view in the console. It’s a high-level view of all of the resources in your application. It also gives you a quick view of deployment status with the ability to view service metrics and custom dashboards.

Application Load Balancers added support for targeting Lambda functions. ALBs can provide a simple HTTP/S front end to Lambda functions. ALB features such as host- and path-based routing are supported to allow flexibility in triggering Lambda functions.

Amazon API Gateway added support for AWS WAF. You can use AWS WAF for your Amazon API Gateway APIs to protect from attacks such as SQL injection and cross-site scripting (XSS).

API Gateway has also improved parameter support by adding support for multi-value parameters. You can now pass multiple values for the same key in the header and query string when calling the API. Returning multiple headers with the same name in the API response is also supported. For example, you can send multiple Set-Cookie headers.
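
In a Lambda proxy integration, the repeated values show up alongside the single-value fields. The following handler sketch illustrates the shape of the event and response; the parameter names and cookie values are only examples.

def handler(event, context):
    # For a request like GET /pets?type=dog&type=cat, the proxy event carries
    # both the single-value map and the full list of values.
    last_value = (event.get("queryStringParameters") or {}).get("type")
    all_values = (event.get("multiValueQueryStringParameters") or {}).get("type", [])

    return {
        "statusCode": 200,
        # Returning multiple headers with the same name uses multiValueHeaders,
        # for example several Set-Cookie headers.
        "multiValueHeaders": {
            "Set-Cookie": ["sessionId=abc123", "theme=dark"],
        },
        "body": ",".join(all_values) or (last_value or ""),
    }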

In October, API Gateway relaunched the Serverless Developer Portal. It provides a catalog of published APIs and associated documentation that enable self-service discovery and onboarding. You can customize it for branding through either custom domain names or logo/styling updates. In November, we made it easier to launch the developer portal from the Serverless Application Repository.

In the continuous effort to decrease customer costs, API Gateway introduced tiered pricing. The tiered pricing model reduces the cost of API Gateway at scale, with an API Requests price as low as $1.51 per million requests at the highest tier.

Last but definitely not least, API Gateway released support for WebSocket APIs in mid-December as a final holiday gift. With this new feature, developers can build bidirectional communication applications without having to provision and manage any servers. This has been a long-awaited and highly anticipated announcement for the serverless community.

AWS Step Functions added eight new service integrations. With this release, the steps of your workflow can exist on Amazon ECS, AWS Fargate, Amazon DynamoDB, Amazon SNS, Amazon SQS, AWS Batch, AWS Glue, and Amazon SageMaker. This is in addition to the services that Step Functions already supports: AWS Lambda and Amazon EC2.

Step Functions expanded by announcing availability in the EU (Paris) and South America (São Paulo) Regions.

The AWS Serverless Application Repository increased its functionality by supporting more resources in the repository. The Serverless Application Repository now supports Application Auto Scaling, Amazon Athena, AWS AppSync, AWS Certificate Manager, Amazon CloudFront, AWS CodeBuild, AWS CodePipeline, AWS Glue, AWS Identity and Access Management, Amazon SNS, Amazon SQS, AWS Systems Manager, and AWS Step Functions.

The Serverless Application Repository also released support for nested applications. Nested applications enable you to build highly sophisticated serverless architectures by reusing services that are independently authored and maintained but easily composed using AWS SAM and the Serverless Application Repository.

AWS SAM made authorization simpler by introducing SAM support for authorizers. Enabling authorization for your APIs is as simple as defining an Amazon Cognito user pool or an API Gateway Lambda authorizer as a property of your API in your SAM template.

AWS SAM CLI introduced two new commands. First, you can now build locally with the sam build command. This functionality allows you to compile deployment artifacts for Lambda functions written in Python. Second, the sam publish command allows you to publish your SAM application to the Serverless Application Repository.

Our SAM tooling team also released the AWS Toolkit for PyCharm, which provides an integrated experience for developing serverless applications in Python.

The AWS Toolkits for Visual Studio Code (Developer Preview) and IntelliJ (Developer Preview) are still in active development and will include similar features when they become generally available.

AWS SAM and the AWS SAM CLI implemented support for Lambda layers. Using a SAM template, you can manage your layers, and using the AWS SAM CLI, you can develop and debug Lambda functions that are dependent on layers.

Amazon DynamoDB added support for transactions, allowing developers to enforce all-or-nothing operations. In addition to transaction support, Amazon DynamoDB Accelerator also added support for DynamoDB transactions.
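
As a minimal boto3 sketch (table, key, and attribute names are placeholders), a transaction that writes an order and decrements inventory only if both operations can succeed looks like this.

import boto3

dynamodb = boto3.client("dynamodb")

# Both writes succeed or neither does ("Orders" and "Inventory" are placeholder tables).
dynamodb.transact_write_items(
    TransactItems=[
        {"Put": {"TableName": "Orders",
                 "Item": {"OrderId": {"S": "1001"}, "Sku": {"S": "widget"}}}},
        {"Update": {"TableName": "Inventory",
                    "Key": {"Sku": {"S": "widget"}},
                    "UpdateExpression": "SET InStock = InStock - :one",
                    "ConditionExpression": "InStock >= :one",
                    "ExpressionAttributeValues": {":one": {"N": "1"}}}},
    ]
)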

Amazon DynamoDB also announced Amazon DynamoDB on-demand, a flexible new billing option for DynamoDB capable of serving thousands of requests per second without capacity planning. DynamoDB on-demand offers simple pay-per-request pricing for read and write requests so that you only pay for what you use, making it easy to balance costs and performance.
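
Opting in at table creation time is a matter of choosing the billing mode; here is a minimal boto3 sketch with a placeholder table and key name.

import boto3

dynamodb = boto3.client("dynamodb")

# No provisioned capacity to plan; the table scales with request volume.
dynamodb.create_table(
    TableName="Events",  # placeholder table name
    AttributeDefinitions=[{"AttributeName": "EventId", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "EventId", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)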

AWS Amplify released the Amplify Console, which is a continuous deployment and hosting service for modern web applications with serverless backends. Modern web applications include single-page app frameworks such as React, Angular, and Vue and static-site generators such as Jekyll, Hugo, and Gatsby.

Amazon SQS announced support for Amazon VPC Endpoints using PrivateLink.

Serverless blogs

October

November

December

Tech talks

We hold several Serverless tech talks throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. Here are the three tech talks that we delivered in Q4:

Twitch

We’ve been so busy livestreaming on Twitch that you’re most certainly missing out if you aren’t following along!

For information about upcoming broadcasts and recent livestreams, keep an eye on AWS on Twitch for more Serverless videos and on the Join us on Twitch AWS page.

New Home for SAM Docs

This quarter, we moved all SAM docs to https://docs.aws.amazon.com/serverless-application-model. Everything you need to know about SAM is there. If you don’t find what you’re looking for, let us know!

In other news

The schedule is out for the 2019 AWS Global Summits. These free events bring the cloud computing community together to connect, collaborate, and learn about AWS. Summits are held in major cities around the world and attract technologists from all industries and skill levels who want to discover how AWS can help them innovate quickly and deliver flexible, reliable solutions at scale. Get notified when to register and learn more at the AWS Global Summits website.

Still looking for more?

The Serverless landing page has lots of information. The resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. Check it out!

Pick the Right Tool for your IT Challenge

Post Syndicated from Markus Ostertag original https://aws.amazon.com/blogs/aws/pick-the-right-tool-for-your-it-challenge/

This guest post is by AWS Community Hero Markus Ostertag. As CEO of the Munich-based ad-tech company Team Internet AG, Markus is always trying to find the best ways to leverage the cloud, loves to work with cutting-edge technologies, and is a frequent speaker at AWS events and the AWS user group Munich that he co-founded in 2014.

Picking the right tools or services for a job is a huge challenge in IT—every day and in every kind of business. With this post, I want to share some strategies and examples that we at Team Internet used to leverage the huge “tool box” of AWS to build better solutions and solve problems more efficiently.

Use existing resources or build something new? A hard decision

The usual day-to-day work of an IT engineer, architect, or developer is building a solution for a problem or transferring a business process into software. To achieve this, we usually tend to use already existing architectures or resources and build an “add-on” to them.

With the rise of microservices, we all learned that modularization and decoupling are important for being scalable and extendable. This brought us to a different type of software architecture. In reality, we still tend to use already existing resources, like the same database or existing (maybe not fully used) Amazon EC2 instances, because that seems easier than building something new.

Stacks as “next level microservices”?

We at Team Internet are not using the vocabulary of microservices but tend to speak about stacks and building blocks for the different use cases. Our approach is matching the idea of microservices to everything, including the database and other resources that are necessary for the specific problem we need to address.

It’s not about “just” dividing the software and code into different modules. The whole infrastructure is separated based on different needs. Each of those parts of the full architecture is our stack, which is as independent as possible from everything else in the whole system. It only communicates loosely with the other stacks or parts of the infrastructure.

Benefits of this mindset = independence and flexibility

  • Choosing the right parts. For every use case, we can choose the components or services that are best suited for the specific challenges and don’t need to work around limitations. This is especially true for databases, as we can choose from the whole palette instead of trying to squeeze requirements into a DBMS that isn’t built for that. We can differentiate the different needs of workloads like write-heavy vs. read-heavy or structured vs. unstructured data.
  • Rebuilding at will. We’re flexible in rebuilding whole stacks as they’re only loosely coupled. Because of this, a team can build a proof-of-concept with new ideas or services and run them in parallel on production workload without interfering or harming the production system.
  • Lowering costs. Because the operational overhead of running multiple resources is done by AWS (“No undifferentiated heavy lifting”), we just need to look at the service pricing. Most of the price schemes at AWS are supporting the stacks. For databases, you either pay for throughput (Amazon DynamoDB) or per instance (Amazon RDS, etc.). On the throughput level, it’s simple as you just split the throughput you did on one table to several tables without any overhead. On the instance level, the pricing is linear so that an r4.xlarge is half the price of an r4.2xlarge. So why not run two r4.xlarge and split the workload?
  • Designing for resilience. This approach also helps your architecture to be more reliable and resilient by default. As the different stacks are independent from each other, the scaling is much more granular. Scaling on larger systems is often provided with a higher “security buffer,” and failures (hardware, software, fat fingers, etc.) only happen on a small part of the whole system.
  • Taking ownership. A nice side effect we’re seeing now as we use this methodology is the positive effect on ownership and responsibility for our teams. Because of those stacks, it is easier to pinpoint and fix issues but also to be transparent and clear on who is responsible for which stack.

Benefits demand efforts, even with the right tool for the job

Every approach has its downsides. Here, it is obviously the additional development and architecture effort that needs to be taken to build such systems.

Therefore, we decided to always keep in mind the goal of a perfect system, with independent stacks and reliable, loosely coupled processes between them. In reality, we sometimes break our own rules and cheat here and there. Even when we do, having this approach in mind helps us build better systems and at least know exactly at what point we risk losing the benefits. I hope the explanation and insights here help you pick the right tool for the job.

Building Simpler Genomics Workflows on AWS Step Functions

Post Syndicated from Christie Gifrin original https://aws.amazon.com/blogs/compute/building-simpler-genomics-workflows-on-aws-step-functions/

This post is courtesy of Ryan Ulaszek, AWS Genomics Partner Solutions Architect and Aaron Friedman, AWS Healthcare and Life Sciences Partner Solutions Architect

In 2017, we published a four part blog series on how to build a genomics workflow on AWS. In part 1, we introduced a general architecture highlighting three common layers: job, batch and workflow.  In part 2, we described building the job layer with Docker and Amazon Elastic Container Registry (Amazon ECR).  In part 3, we tackled the batch layer and built a batch engine using AWS Batch.  In part 4, we built out the workflow layer using AWS Step Functions and AWS Lambda.

Since then, we’ve worked with many AWS customers and APN partners to implement this solution in genomics as well as in other workloads-of-interest. Today, we wanted to highlight a new feature in Step Functions that simplifies how customers and partners can build high-throughput genomics workflows on AWS.

Step Functions now supports native integration with AWS Batch, which simplifies how you can create an AWS Batch state that submits an asynchronous job and waits for that job to finish.

Before, you needed to build a state machine building block that submitted a job to AWS Batch, and then polled and checked its execution. Now, you can just submit the job to AWS Batch using the new AWS Batch task type.  Step Functions waits to proceed until the job is completed. This reduces the complexity of your state machine and makes it easier to build a genomics workflow with asynchronous AWS Batch steps.

The new integrations include support for the following API actions:

  • AWS Batch SubmitJob
  • Amazon SNS Publish
  • Amazon SQS SendMessage
  • Amazon ECS RunTask
  • AWS Fargate RunTask
  • Amazon DynamoDB
    • PutItem
    • GetItem
    • UpdateItem
    • DeleteItem
  • Amazon SageMaker
    • CreateTrainingJob
    • CreateTransformJob
  • AWS Glue
    • StartJobRun

You can also pass parameters to the service API.  To use the new integrations, the role that you assume when running a state machine needs to have the appropriate permissions.  For more information, see the AWS Step Functions Developer Guide.

Using a job status poller

In our 2017 post series, we created a job poller “pattern” with two separate Lambda functions. When the job finishes, the state machine proceeds to the next step and operates according to the necessary business logic.  This is a useful pattern to manage asynchronous jobs when a direct integration is unavailable.

The steps in this building block state machine are as follows:

  1. A job is submitted through a Lambda function.
  2. The state machine queries the AWS Batch API for the job status in another Lambda function.
  3. The job status is checked to see if the job has completed.  If the job status equals SUCCEEDED, the final job status is logged. If the job status equals FAILED, the execution of the state machine ends. In all other cases, wait 30 seconds and go back to Step 2.

Both the Submit Job and Get Job Status Lambda functions are available as example Lambda functions in the console.  The job status poller is available in the Step Functions console as a sample project.

Here is the JSON representing this state machine.

{
  "Comment": "A simple example that submits a job to AWS Batch",
  "StartAt": "SubmitJob",
  "States": {
    "SubmitJob": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<account-id>::function:batchSubmitJob",
      "Next": "GetJobStatus"
    },
    "GetJobStatus": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<account-id>:function:batchGetJobStatus",
      "Next": "CheckJobStatus",
      "InputPath": "$",
      "ResultPath": "$.status"
    },
    "CheckJobStatus": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.status",
          "StringEquals": "FAILED",
          "End": true
        },
        {
          "Variable": "$.status",
          "StringEquals": "SUCCEEDED",
          "Next": "GetFinalJobStatus"
        }
      ],
      "Default": "Wait30Seconds"
    },
    "Wait30Seconds": {
      "Type": "Wait",
      "Seconds": 30,
      "Next": "GetJobStatus"
    },
    "GetFinalJobStatus": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<account-id>:function:batchGetJobStatus",
      "End": true
    }
  }
}

With Step Functions Service Integrations

With Step Functions service integrations, it is now simpler to submit and wait for an AWS Batch job, or any other supported service.

The following code block is the JSON representing the new state machine for an asynchronous batch job. If you are familiar with the AWS Batch SubmitJob API action, you may notice that the parameters are consistent with what you would see in that API call. You can also use the optional AWS Batch parameters in addition to JobDefinition, JobName, and JobQueue.

{
  "StartAt": "RunIsaacJob",
  "States": {
    "RunIsaacJob": {
      "Type": "Task",
      "Resource": "arn:aws:states:::batch:submitJob.sync",
      "Parameters": {
        "JobDefinition": "Isaac",
        "JobName.$": "$.isaac.JobName",
        "JobQueue": "HighPriority",
        "Parameters.$": "$.isaac"
      },
      "TimeoutSeconds": 900,
      "HeartbeatSeconds": 60,
      "InputPath": "$",
      "ResultPath": "$.status",
      "Retry": [
        {
          "ErrorEquals": [ "States.Timeout" ],
          "IntervalSeconds": 3,
          "MaxAttempts": 2,
          "BackoffRate": 1.5
        }
      ],
      "End": true
    }
  }
}

Here is an example of the workflow input JSON.  Pass all of the container parameters that were being constructed in the submit job Lambda function.

{
  "isaac": {
    "WorkingDir": "/scratch",
    "JobName": "isaac-1",
    "FastQ1S3Path": "s3://aws-batch-genomics-resources/fastq/SRR1919605_1.fastq.gz",
    "FastQ2S3Path": "s3://aws-batch-genomics-resources/fastq/SRR1919605_2.fastq.gz",
    "BAMS3FolderPath": "s3://batch-genomics-pipeline-jobresultsbucket-1kzdu216m2b0k/NA12878_states_3/bam",
    "ReferenceS3Path": "s3://aws-batch-genomics-resources/reference/isaac/"
  }
}
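
Starting the workflow is then a single StartExecution call. Here is a minimal boto3 sketch; the state machine ARN is a placeholder for whatever your deployment produces.

import json
import boto3

sfn = boto3.client("stepfunctions")

# The workflow input shown above, as a Python dict
workflow_input = {
    "isaac": {
        "WorkingDir": "/scratch",
        "JobName": "isaac-1",
        "FastQ1S3Path": "s3://aws-batch-genomics-resources/fastq/SRR1919605_1.fastq.gz",
        "FastQ2S3Path": "s3://aws-batch-genomics-resources/fastq/SRR1919605_2.fastq.gz",
        "BAMS3FolderPath": "s3://batch-genomics-pipeline-jobresultsbucket-1kzdu216m2b0k/NA12878_states_3/bam",
        "ReferenceS3Path": "s3://aws-batch-genomics-resources/reference/isaac/",
    }
}

response = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:<account-id>:stateMachine:GenomicWorkflow",  # placeholder ARN
    input=json.dumps(workflow_input),
)
print(response["executionArn"])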

When you deploy the job definition, add the command attribute that was previously constructed in the Lambda function that launched the AWS Batch job.

IsaacJobDefinition:
    Type: AWS::Batch::JobDefinition
    Properties:
      JobDefinitionName: "Isaac"
      Type: container
      RetryStrategy:
        Attempts: 1
      Parameters:
        BAMS3FolderPath: !Sub "s3://${JobResultsBucket}/NA12878_states_1/bam"
        FastQ1S3Path: "s3://aws-batch-genomics-resources/fastq/SRR1919605_1.fastq.gz"
        FastQ2S3Path: "s3://aws-batch-genomics-resources/fastq/SRR1919605_2.fastq.gz"
        ReferenceS3Path: "s3://aws-batch-genomics-resources/reference/isaac/"
        WorkingDir: "/scratch"
      ContainerProperties:
        Image: "rulaszek/isaac"
        Vcpus: 32
        Memory: 80000
        JobRoleArn:
          Fn::ImportValue: !Sub "${RoleStackName}:ECSTaskRole"
        Command:
          - "--bam_s3_folder_path"
          - "Ref::BAMS3FolderPath"
          - "--fastq1_s3_path"
          - "Ref::FastQ1S3Path"
          - "--fastq2_s3_path"
          - "Ref::FastQ2S3Path"
          - "--reference_s3_path"
          - "Ref::ReferenceS3Path"
          - "--working_dir"
          - "Ref::WorkingDir"
        MountPoints:
          - ContainerPath: "/scratch"
            ReadOnly: false
            SourceVolume: docker_scratch
        Volumes:
          - Name: docker_scratch
            Host:
              SourcePath: "/docker_scratch"

The key-value pairs passed into the workflow through Parameters.$ are matched by key to the Ref:: placeholders in the job definition's command, and the values are substituted at runtime. The resulting Docker run looks like the following:

docker run <isaac_container_uri> --bam_s3_folder_path s3://batch-genomics-pipeline-jobresultsbucket-1kzdu216m2b0k/NA12878_states_3/bam \
                                 --fastq1_s3_path s3://aws-batch-genomics-resources/fastq/SRR1919605_1.fastq.gz \
                                 --fastq2_s3_path s3://aws-batch-genomics-resources/fastq/SRR1919605_2.fastq.gz \
                                 --reference_s3_path s3://aws-batch-genomics-resources/reference/isaac/ \
                                 --working_dir /scratch
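
The substitution itself is performed by AWS Batch, but conceptually it is simple token replacement. Here is a minimal Python sketch of the idea (an illustration, not the actual AWS Batch implementation):

def render_command(command_template, parameters):
    """Replace Ref::<Key> tokens in a job definition command with parameter values."""
    rendered = []
    for token in command_template:
        if token.startswith("Ref::"):
            key = token[len("Ref::"):]
            rendered.append(parameters.get(key, token))  # leave the token as-is if no value is supplied
        else:
            rendered.append(token)
    return rendered

command = ["--bam_s3_folder_path", "Ref::BAMS3FolderPath", "--working_dir", "Ref::WorkingDir"]
params = {"BAMS3FolderPath": "s3://my-results/bam", "WorkingDir": "/scratch"}
print(render_command(command, params))
# ['--bam_s3_folder_path', 's3://my-results/bam', '--working_dir', '/scratch']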

Genomics workflow: Before and after

Overall, service integrations dramatically simplify your genomics workflow. The following workflow is a simple genomics secondary-analysis pipeline, which we highlighted in our original post series.

The first step aligns the sample against a reference genome.  When alignment is complete, variant calling and QA metrics are calculated in two parallel steps.  When variant calling is complete, variant annotation is performed.  Before, our genomics workflow looked like this:

Now it looks like this:

Here is the new workflow JSON:

{
   "Comment":"A simple genomics secondary-analysis workflow",
   "StartAt":"RunIsaacJob",
   "States":{
      "RunIsaacJob":{
         "Type":"Task",
         "Resource":"arn:aws:states:::batch:submitJob.sync",
         "Parameters":{
            "JobDefinition":"Isaac",
            "JobName.$":"$.isaac.JobName",
            "JobQueue":"HighPriority",
            "Parameters.$": "$.isaac"
         },
         "TimeoutSeconds": 900,
         "HeartbeatSeconds": 60,
         "Next":"Parallel",
         "InputPath":"$",
         "ResultPath":"$.status",
         "Retry" : [
            {
              "ErrorEquals": [ "States.Timeout" ],
              "IntervalSeconds": 3,
              "MaxAttempts": 2,
              "BackoffRate": 1.5
            }
         ]
      },
      "Parallel":{
         "Type":"Parallel",
         "Next":"FinalState",
         "Branches":[
            {
               "StartAt":"RunStrelkaJob",
               "States":{
                  "RunStrelkaJob":{
                     "Type":"Task",
                     "Resource":"arn:aws:states:::batch:submitJob.sync",
                     "Parameters":{
                        "JobDefinition":"Strelka",
                        "JobName.$":"$.strelka.JobName",
                        "JobQueue":"HighPriority",
                        "Parameters.$": "$.strelka"
                     },
                     "TimeoutSeconds": 900,
                     "HeartbeatSeconds": 60,
                     "Next":"RunSnpEffJob",
                     "InputPath":"$",
                     "ResultPath":"$.status",
                     "Retry" : [
                        {
                          "ErrorEquals": [ "States.Timeout" ],
                          "IntervalSeconds": 3,
                          "MaxAttempts": 2,
                          "BackoffRate": 1.5
                        }
                     ]
                  },
                  "RunSnpEffJob":{
                     "Type":"Task",
                     "Resource":"arn:aws:states:::batch:submitJob.sync",
                     "Parameters":{
                        "JobDefinition":"SNPEff",
                        "JobName.$":"$.snpeff.JobName",
                        "JobQueue":"HighPriority",
                        "Parameters.$": "$.snpeff"
                     },
                     "TimeoutSeconds": 900,
                     "HeartbeatSeconds": 60,
                     "Retry" : [
                        {
                          "ErrorEquals": [ "States.Timeout" ],
                          "IntervalSeconds": 3,
                          "MaxAttempts": 2,
                          "BackoffRate": 1.5
                        }
                     ],
                     "End":true
                  }
               }
            },
            {
               "StartAt":"RunSamtoolsStatsJob",
               "States":{
                  "RunSamtoolsStatsJob":{
                     "Type":"Task",
                     "Resource":"arn:aws:states:::batch:submitJob.sync",
                     "Parameters":{
                        "JobDefinition":"SamtoolsStats",
                        "JobName.$":"$.samtools.JobName",
                        "JobQueue":"HighPriority",
                        "Parameters.$": "$.samtools"
                     },
                     "TimeoutSeconds": 900,
                     "HeartbeatSeconds": 60,
                     "End":true,
                     "Retry" : [
                        {
                          "ErrorEquals": [ "States.Timeout" ],
                          "IntervalSeconds": 3,
                          "MaxAttempts": 2,
                          "BackoffRate": 1.5
                        }
                     ]
                  }
               }
            }
         ]
      },
      "FinalState":{
         "Type":"Pass",
         "End":true
      }
   }
}

Here is the new AWS CloudFormation template for deploying the AWS Batch job definitions for each tool:

AWSTemplateFormatVersion: 2010-09-09

Description: Batch job definitions for batch genomics

Parameters:
  RoleStackName:
    Description: "Stack that deploys roles for genomic workflow"
    Type: String
  VPCStackName:
    Description: "Stack that deploys vps for genomic workflow"
    Type: String
  JobResultsBucket:
    Description: "Bucket that holds workflow job results"
    Type: String

Resources:
  IsaacJobDefinition:
    Type: AWS::Batch::JobDefinition
    Properties:
      JobDefinitionName: "Isaac"
      Type: container
      RetryStrategy:
        Attempts: 1
      Parameters:
        BAMS3FolderPath: !Sub "s3://${JobResultsBucket}/NA12878_states_1/bam"
        FastQ1S3Path: "s3://aws-batch-genomics-resources/fastq/SRR1919605_1.fastq.gz"
        FastQ2S3Path: "s3://aws-batch-genomics-resources/fastq/SRR1919605_2.fastq.gz"
        ReferenceS3Path: "s3://aws-batch-genomics-resources/reference/isaac/"
        WorkingDir: "/scratch"
      ContainerProperties:
        Image: "rulaszek/isaac"
        Vcpus: 32
        Memory: 80000
        JobRoleArn:
          Fn::ImportValue: !Sub "${RoleStackName}:ECSTaskRole"
        Command:
          - "--bam_s3_folder_path"
          - "Ref::BAMS3FolderPath"
          - "--fastq1_s3_path"
          - "Ref::FastQ1S3Path"
          - "--fastq2_s3_path"
          - "Ref::FastQ2S3Path"
          - "--reference_s3_path"
          - "Ref::ReferenceS3Path"
          - "--working_dir"
          - "Ref::WorkingDir"
        MountPoints:
          - ContainerPath: "/scratch"
            ReadOnly: false
            SourceVolume: docker_scratch
        Volumes:
          - Name: docker_scratch
            Host:
              SourcePath: "/docker_scratch"

  StrelkaJobDefinition:
    Type: AWS::Batch::JobDefinition
    Properties:
      JobDefinitionName: "Strelka"
      Type: container
      RetryStrategy:
        Attempts: 1
      Parameters:
        BAMS3Path: !Sub "s3://${JobResultsBucket}/NA12878_states_1/bam/sorted.bam"
        BAIS3Path: !Sub "s3://${JobResultsBucket}/NA12878_states_1/bam/sorted.bam.bai"
        ReferenceS3Path: "s3://aws-batch-genomics-resources/reference/hg38.fa"
        ReferenceIndexS3Path: "s3://aws-batch-genomics-resources/reference/hg38.fa.fai"
        VCFS3Path: !Sub "s3://${JobResultsBucket}/NA12878_states_1/vcf"
        WorkingDir: "/scratch"
      ContainerProperties:
        Image: "rulaszek/strelka"
        Vcpus: 32
        Memory: 32000
        JobRoleArn:
          Fn::ImportValue: !Sub "${RoleStackName}:ECSTaskRole"
        Command:
          - "--bam_s3_path"
          - "Ref::BAMS3Path"
          - "--bai_s3_path"
          - "Ref::BAIS3Path"
          - "--reference_s3_path"
          - "Ref::ReferenceS3Path"
          - "--reference_index_s3_path"
          - "Ref::ReferenceIndexS3Path"
          - "--vcf_s3_path"
          - "Ref::VCFS3Path"
          - "--working_dir"
          - "Ref::WorkingDir"
        MountPoints:
          - ContainerPath: "/scratch"
            ReadOnly: false
            SourceVolume: docker_scratch
        Volumes:
          - Name: docker_scratch
            Host:
              SourcePath: "/docker_scratch"

  SnpEffJobDefinition:
    Type: AWS::Batch::JobDefinition
    Properties:
      JobDefinitionName: "SNPEff"
      Type: container
      RetryStrategy:
        Attempts: 1
      Parameters:
        VCFS3Path: !Sub "s3://${JobResultsBucket}/NA12878_states_1/vcf/variants/genome.vcf.gz"
        AnnotatedVCFS3Path: !Sub "s3://${JobResultsBucket}/NA12878_states_1/vcf/genome.anno.vcf"
        CommandArgs: " -t hg38 "
        WorkingDir: "/scratch"
      ContainerProperties:
        Image: "rulaszek/snpeff"
        Vcpus: 4
        Memory: 10000
        JobRoleArn:
          Fn::ImportValue: !Sub "${RoleStackName}:ECSTaskRole"
        Command:
          - "--annotated_vcf_s3_path"
          - "Ref::AnnotatedVCFS3Path"
          - "--vcf_s3_path"
          - "Ref::VCFS3Path"
          - "--cmd_args"
          - "Ref::CommandArgs"
          - "--working_dir"
          - "Ref::WorkingDir"
        MountPoints:
          - ContainerPath: "/scratch"
            ReadOnly: false
            SourceVolume: docker_scratch
        Volumes:
          - Name: docker_scratch
            Host:
              SourcePath: "/docker_scratch"

  SamtoolsStatsJobDefinition:
    Type: AWS::Batch::JobDefinition
    Properties:
      JobDefinitionName: "SamtoolsStats"
      Type: container
      RetryStrategy:
        Attempts: 1
      Parameters:
        ReferenceS3Path: "s3://aws-batch-genomics-resources/reference/hg38.fa"
        BAMS3Path: !Sub "s3://${JobResultsBucket}/NA12878_states_1/bam/sorted.bam"
        BAMStatsS3Path: !Sub "s3://${JobResultsBucket}/NA12878_states_1/bam/sorted.bam.stats"
        WorkingDir: "/scratch"
      ContainerProperties:
        Image: "rulaszek/samtools-stats"
        Vcpus: 4
        Memory: 10000
        JobRoleArn:
          Fn::ImportValue: !Sub "${RoleStackName}:ECSTaskRole"
        Command:
          - "--bam_s3_path"
          - "Ref::BAMS3Path"
          - "--bam_stats_s3_path"
          - "Ref::BAMStatsS3Path"
          - "--reference_s3_path"
          - "Ref::ReferenceS3Path"
          - "--working_dir"
          - "Ref::WorkingDir"
        MountPoints:
          - ContainerPath: "/scratch"
            ReadOnly: false
            SourceVolume: docker_scratch
        Volumes:
          - Name: docker_scratch
            Host:
              SourcePath: "/docker_scratch"

Here is the new CloudFormation template that deploys the new workflow:

AWSTemplateFormatVersion: 2010-09-09

Description: State machine for batch genomics

Parameters:
  RoleStackName:
    Description: "Stack that deploys roles for genomic workflow"
    Type: String
  VPCStackName:
    Description: "Stack that deploys vps for genomic workflow"
    Type: String

Resources:
  # Step Functions state machine
  GenomicWorkflow:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      RoleArn:
        Fn::ImportValue: !Sub "${RoleStackName}:StatesExecutionRole"
      DefinitionString: !Sub |-
        {
           "Comment":"A simple example that submits a job to AWS Batch",
           "StartAt":"RunIsaacJob",
           "States":{
              "RunIsaacJob":{
                 "Type":"Task",
                 "Resource":"arn:aws:states:::batch:submitJob.sync",
                 "Parameters":{
                    "JobDefinition":"Isaac",
                    "JobName.$":"$.isaac.JobName",
                    "JobQueue":"HighPriority",
                    "Parameters.$": "$.isaac"
                 },
                 "TimeoutSeconds": 900,
                 "HeartbeatSeconds": 60,
                 "Next":"Parallel",
                 "InputPath":"$",
                 "ResultPath":"$.status",
                 "Retry" : [
                    {
                      "ErrorEquals": [ "States.Timeout" ],
                      "IntervalSeconds": 3,
                      "MaxAttempts": 2,
                      "BackoffRate": 1.5
                    }
                 ]
              },
              "Parallel":{
                 "Type":"Parallel",
                 "Next":"FinalState",
                 "Branches":[
                    {
                       "StartAt":"RunStrelkaJob",
                       "States":{
                          "RunStrelkaJob":{
                             "Type":"Task",
                             "Resource":"arn:aws:states:::batch:submitJob.sync",
                             "Parameters":{
                                "JobDefinition":"Strelka",
                                "JobName.$":"$.strelka.JobName",
                                "JobQueue":"HighPriority",
                                "Parameters.$": "$.strelka"
                             },
                             "TimeoutSeconds": 900,
                             "HeartbeatSeconds": 60,
                             "Next":"RunSnpEffJob",
                             "InputPath":"$",
                             "ResultPath":"$.status",
                             "Retry" : [
                                {
                                  "ErrorEquals": [ "States.Timeout" ],
                                  "IntervalSeconds": 3,
                                  "MaxAttempts": 2,
                                  "BackoffRate": 1.5
                                }
                             ]
                          },
                          "RunSnpEffJob":{
                             "Type":"Task",
                             "Resource":"arn:aws:states:::batch:submitJob.sync",
                             "Parameters":{
                                "JobDefinition":"SNPEff",
                                "JobName.$":"$.snpeff.JobName",
                                "JobQueue":"HighPriority",
                                "Parameters.$": "$.snpeff"
                             },
                             "TimeoutSeconds": 900,
                             "HeartbeatSeconds": 60,
                             "Retry" : [
                                {
                                  "ErrorEquals": [ "States.Timeout" ],
                                  "IntervalSeconds": 3,
                                  "MaxAttempts": 2,
                                  "BackoffRate": 1.5
                                }
                             ],
                             "End":true
                          }
                       }
                    },
                    {
                       "StartAt":"RunSamtoolsStatsJob",
                       "States":{
                          "RunSamtoolsStatsJob":{
                             "Type":"Task",
                             "Resource":"arn:aws:states:::batch:submitJob.sync",
                             "Parameters":{
                                "JobDefinition":"SamtoolsStats",
                                "JobName.$":"$.samtools.JobName",
                                "JobQueue":"HighPriority",
                                "Parameters.$": "$.samtools"
                             },
                             "TimeoutSeconds": 900,
                             "HeartbeatSeconds": 60,
                             "End":true,
                             "Retry" : [
                                {
                                  "ErrorEquals": [ "States.Timeout" ],
                                  "IntervalSeconds": 3,
                                  "MaxAttempts": 2,
                                  "BackoffRate": 1.5
                                }
                             ]
                          }
                       }
                    }
                 ]
              },
              "FinalState":{
                 "Type":"Pass",
                 "End":true
              }
           }
        }

Outputs:
  GenomicsWorkflowArn:
    Description: GenomicWorkflow ARN
    Value: !Ref GenomicWorkflow
  StackName:
    Description: StackName
    Value: !Sub ${AWS::StackName}
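
Each branch of the deployed workflow reads its own block of the execution input ($.isaac, $.strelka, $.snpeff, $.samtools). Here is a sketch of a complete execution input as a Python dict; the JobName values and the results bucket are placeholders, and the S3 paths mirror the defaults in the job definitions above.

workflow_input = {
    "isaac": {
        "JobName": "isaac-1",
        "WorkingDir": "/scratch",
        "FastQ1S3Path": "s3://aws-batch-genomics-resources/fastq/SRR1919605_1.fastq.gz",
        "FastQ2S3Path": "s3://aws-batch-genomics-resources/fastq/SRR1919605_2.fastq.gz",
        "BAMS3FolderPath": "s3://<job-results-bucket>/NA12878_states_1/bam",
        "ReferenceS3Path": "s3://aws-batch-genomics-resources/reference/isaac/",
    },
    "strelka": {
        "JobName": "strelka-1",
        "WorkingDir": "/scratch",
        "BAMS3Path": "s3://<job-results-bucket>/NA12878_states_1/bam/sorted.bam",
        "BAIS3Path": "s3://<job-results-bucket>/NA12878_states_1/bam/sorted.bam.bai",
        "ReferenceS3Path": "s3://aws-batch-genomics-resources/reference/hg38.fa",
        "ReferenceIndexS3Path": "s3://aws-batch-genomics-resources/reference/hg38.fa.fai",
        "VCFS3Path": "s3://<job-results-bucket>/NA12878_states_1/vcf",
    },
    "snpeff": {
        "JobName": "snpeff-1",
        "WorkingDir": "/scratch",
        "VCFS3Path": "s3://<job-results-bucket>/NA12878_states_1/vcf/variants/genome.vcf.gz",
        "AnnotatedVCFS3Path": "s3://<job-results-bucket>/NA12878_states_1/vcf/genome.anno.vcf",
        "CommandArgs": " -t hg38 ",
    },
    "samtools": {
        "JobName": "samtools-stats-1",
        "WorkingDir": "/scratch",
        "BAMS3Path": "s3://<job-results-bucket>/NA12878_states_1/bam/sorted.bam",
        "BAMStatsS3Path": "s3://<job-results-bucket>/NA12878_states_1/bam/sorted.bam.stats",
        "ReferenceS3Path": "s3://aws-batch-genomics-resources/reference/hg38.fa",
    },
}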

Conclusion

AWS Step Functions service integrations are a great way to simplify creating complex workflows with asynchronous steps. While we highlighted the use case with AWS Batch today, there are many other ways that healthcare and life sciences customers can use this new feature, such as with message processing.

For more information about how AWS can enable your genomics workloads, be sure to check out the AWS Genomics page.

We’ve updated the open-source project to take advantage of the new AWS Batch integration in Step Functions. You can find the changes in the aws-batch-genomics/tree/v2.0.0 folder.

Original posts in this four-part series:

Happy coding!