Tag Archives: Amazon Aurora

Dream11: Scaling a Fantasy Sports Platform with 5M Daily Active Users

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/scaling-fantasy-sports-platform/

Founded in 2008, Dream11 is India’s leading sports-tech startup, with a growing base of more than 45 million users playing fantasy cricket, football, kabaddi, and basketball.

Dream11 uses Amazon Aurora with Amazon ElastiCache to serve 1 million concurrent users within a 50ms response time, at an average of 3 million requests per minute (rpm), which can surge to 3X within a 30-second span. In this video, we shed light on our architecture, along with our battle plan to handle transactions without locking. We also talk about the features of Aurora and ElastiCache that helped Dream11 handle 5 million daily active users with 4X year-over-year growth.
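To put the quoted traffic figures in per-second terms, the short calculation below converts requests per minute into requests per second. It assumes the 3X surge is measured against the 3 million rpm average; the shape of the surge inside the 30-second window is not described, so it is not modeled.

```python
def requests_per_second(requests_per_minute: int) -> float:
    """Convert a requests-per-minute figure into requests per second."""
    return requests_per_minute / 60


AVERAGE_RPM = 3_000_000       # average load quoted in the post
SURGE_RPM = 3 * AVERAGE_RPM   # assumption: "3X" is relative to the average

print(f"average: {requests_per_second(AVERAGE_RPM):,.0f} req/s")  # average: 50,000 req/s
print(f"surge:   {requests_per_second(SURGE_RPM):,.0f} req/s")    # surge:   150,000 req/s
```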

For more content like this, subscribe to our YouTube channels This is My Architecture, This is My Code, and This is My Model, or visit the This is My Architecture page on AWS, which has search functionality and the ability to filter by industry, language, and service.

Introducing the new Serverless LAMP stack

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/introducing-the-new-serverless-lamp-stack/

This is the first in a series of posts for PHP developers. The series will explain how to use serverless technologies with PHP. It covers the available tools, frameworks and strategies to build serverless applications, and why now is the right time to start.

In future posts, I will demonstrate how to use AWS Lambda for web applications built with PHP frameworks such as Laravel and Symfony. I will show how to move from using Lambda as a replacement for web hosting functionality to a decoupled, event-driven approach. I will also cover how to combine multiple Lambda functions of minimal scope with other serverless services to create performant, scalable microservices.

In this post, you learn how to use PHP with Lambda via the custom runtime API. Visit this GitHub repository for the sample code.

The Serverless LAMP stack


The challenges with traditional PHP applications

Scalability is an inherent challenge with the traditional LAMP stack. A scalable application is one that can handle highly variable levels of traffic. PHP applications are often scaled horizontally, by adding more web servers as needed. This is managed via a load balancer, which directs requests to various web servers. Each additional server brings additional overhead with networking, administration, storage capacity, backup and restore systems, and an update to asset management inventories. Additionally, each horizontally scaled server runs independently. This can result in configuration synchronization challenges.

Horizontal scaling with traditional LAMP stack applications.


New storage challenges arise because each server has its own disks and filesystem, which often requires developers to add a mechanism for sharing user sessions across servers. With serverless technologies, scalability is managed for the developer.

If traffic surges, the services scale to meet the demand without having to deploy additional servers. This allows applications to quickly transition from prototype to production.

The serverless LAMP architecture

A traditional web application can be split into two components:

  • The static assets (media files, CSS, JavaScript)
  • The dynamic application (PHP, MySQL)

A serverless approach to serving these two components is illustrated below:

The serverless LAMP stack


All requests for dynamic content (anything excluding /assets/*) are forwarded to Amazon API Gateway. This is a fully managed service for creating, publishing, and securing APIs at any scale. It acts as the “front door” to the PHP application, routing requests downstream to Lambda functions. The Lambda functions contain the business logic and interaction with the MySQL database. You can pass the input to the Lambda function as any combination of request headers, path variables, query string parameters, and body.
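As a rough illustration of the request split and of where API Gateway places each kind of input, the sketch below assumes the Lambda proxy integration event format (version 1.0), in which request headers, path variables, query string parameters, and the body arrive as `headers`, `pathParameters`, `queryStringParameters`, and `body`. The function names are hypothetical, and request bodies are assumed to be JSON.

```python
import base64
import json
from typing import Any, Dict


def is_static_asset(path: str) -> bool:
    """Paths under /assets/ are static files; everything else is dynamic and goes to API Gateway."""
    return path.startswith("/assets/")


def extract_inputs(event: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the inputs a handler can read from an API Gateway proxy (v1.0) event."""
    body = event.get("body")
    if body and event.get("isBase64Encoded"):
        # API Gateway base64-encodes binary payloads; decode before parsing.
        body = base64.b64decode(body).decode("utf-8")
    return {
        # Absent maps arrive as null, so normalize them to empty dicts.
        "headers": event.get("headers") or {},
        "path_parameters": event.get("pathParameters") or {},
        "query": event.get("queryStringParameters") or {},
        "body": json.loads(body) if body else None,  # assumption: JSON request bodies
    }


event = {
    "path": "/users/42",
    "headers": {"Accept": "application/json"},
    "pathParameters": {"id": "42"},
    "queryStringParameters": None,
    "body": '{"name": "world"}',
    "isBase64Encoded": False,
}
print(is_static_asset(event["path"]), extract_inputs(event)["body"])
```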

Notable AWS features for PHP developers

Amazon Aurora Serverless

During re:Invent 2017, AWS announced Aurora Serverless, an on-demand, serverless relational database with a pay-per-use cost model. It takes the responsibility for provisioning and scaling the relational database off the developer.

Lambda Layers and custom runtime API

At re:Invent 2018, AWS announced two new Lambda features. These enable developers to build custom runtimes, and share and manage common code between functions.

Improved VPC networking for Lambda functions

In September 2019, AWS announced significant improvements to cold starts for Lambda functions inside a VPC. More efficient use of elastic network interfaces results in faster function startup.

Amazon RDS Proxy

At re:Invent 2019, AWS announced the launch of a new service called Amazon RDS Proxy: a fully managed database proxy that sits between your application and your relational database. It efficiently pools and shares database connections to improve the scalability of your application.

Significant moments in the serverless LAMP stack timeline

Combining these services, it is now possible to build secure, performant, and scalable serverless applications with PHP and relational databases.

Custom runtime API

The custom runtime API is a simple interface to enable Lambda function execution in any programming language or a specific language version. The custom runtime API requires an executable text file called a bootstrap. The bootstrap file is responsible for the communication between your code and the Lambda environment.

To create a custom runtime, you must first compile the required version of PHP in an Amazon Linux environment compatible with the Lambda execution environment. To do this, follow these step-by-step instructions.

The bootstrap file

The file below is an example of a basic PHP bootstrap file. It is for explanation purposes only: there is no error handling and there are no abstractions. To handle exceptions appropriately, consult the runtime API documentation as you build production custom runtimes.

#!/opt/bin/php
<?php

// This invokes Composer's autoloader so that we'll be able to use Guzzle and any other 3rd party libraries we need.
require __DIR__ . '/vendor/autoload.php';

// This is the request processing loop. Barring unrecoverable failure, this loop runs until the environment shuts down.
do {
    // Ask the runtime API for a request to handle.
    $request = getNextRequest();

    // Obtain the function name from the _HANDLER environment variable and ensure the function's code is available.
    $handlerFunction = array_slice(explode('.', $_ENV['_HANDLER']), -1)[0];
    require_once $_ENV['LAMBDA_TASK_ROOT'] . '/src/' . $handlerFunction . '.php';

    // Execute the desired function and obtain the response.
    $response = $handlerFunction($request['payload']);

    // Submit the response back to the runtime API.
    sendResponse($request['invocationId'], $response);
} while (true);

function getNextRequest()
{
    $client = new \GuzzleHttp\Client();
    $response = $client->get('http://' . $_ENV['AWS_LAMBDA_RUNTIME_API'] . '/2018-06-01/runtime/invocation/next');

    return [
      'invocationId' => $response->getHeader('Lambda-Runtime-Aws-Request-Id')[0],
      'payload' => json_decode((string) $response->getBody(), true)
    ];
}

function sendResponse($invocationId, $response)
{
    $client = new \GuzzleHttp\Client();
    $client->post(
        'http://' . $_ENV['AWS_LAMBDA_RUNTIME_API'] . '/2018-06-01/runtime/invocation/' . $invocationId . '/response',
        ['body' => $response]
    );
}

The #!/opt/bin/php declaration instructs the program loader to use the PHP binary compiled for Amazon Linux.

The bootstrap file performs the following tasks, in an operational loop:

  1. Obtains the next request.
  2. Executes the code to handle the request.
  3. Returns a response.
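For readers who want to see the same three-step loop outside PHP, here is a minimal Python sketch that talks to the runtime API endpoints used by the bootstrap above (`/2018-06-01/runtime/invocation/next` and `/2018-06-01/runtime/invocation/{request id}/response`). The HTTP calls are passed in as functions so the loop logic can be exercised without Lambda. The `handlers` mapping and `run_forever` are hypothetical names, and, unlike the PHP example, the handler result is JSON-encoded before it is posted. Error reporting is omitted, as in the original.

```python
import json
import os
import urllib.request
from typing import Any, Callable, Dict, Mapping, Tuple

API_VERSION = "2018-06-01"
Fetch = Callable[[str], Tuple[Mapping[str, str], bytes]]
Post = Callable[[str, bytes], None]


def handler_name(handler_setting: str) -> str:
    """Like the PHP bootstrap, keep only the text after the last '.' in _HANDLER."""
    return handler_setting.split(".")[-1]


def handle_one(fetch: Fetch, post: Post, handlers: Dict[str, Callable[[Any], Any]],
               runtime_api: str, handler_setting: str) -> None:
    """One loop iteration: get the next event, run the handler, post the response."""
    base = f"http://{runtime_api}/{API_VERSION}/runtime/invocation"
    headers, body = fetch(f"{base}/next")                                # 1. obtain the next request
    request_id = headers["Lambda-Runtime-Aws-Request-Id"]
    result = handlers[handler_name(handler_setting)](json.loads(body))  # 2. execute the handler
    post(f"{base}/{request_id}/response", json.dumps(result).encode())  # 3. return the response


def urllib_fetch(url: str) -> Tuple[Mapping[str, str], bytes]:
    # The "next" call blocks until Lambda has an event for this execution environment.
    with urllib.request.urlopen(url) as resp:
        return resp.headers, resp.read()


def urllib_post(url: str, data: bytes) -> None:
    req = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(req):
        pass


def run_forever(handlers: Dict[str, Callable[[Any], Any]]) -> None:
    """Hypothetical entry point; reads the same environment variables as the PHP bootstrap."""
    while True:
        handle_one(urllib_fetch, urllib_post, handlers,
                   os.environ["AWS_LAMBDA_RUNTIME_API"], os.environ["_HANDLER"])
```

Injecting `fetch` and `post` keeps the loop free of I/O, so a fake runtime API (two small functions) is enough to test it locally.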

Follow these steps to package the bootstrap and compiled PHP binary together into a `runtime.zip`.

Libraries and dependencies

The runtime bootstrap uses an HTTP-based local interface to retrieve the event payload for each Lambda function invocation and to return the function's response. This bootstrap file uses Guzzle, a popular PHP HTTP client, to make requests to the custom runtime API. The Guzzle package is installed using the Composer package manager. Installing packages this way provides a mechanism for incorporating additional libraries and dependencies as the application evolves.

Follow these steps to create and package the runtime dependencies into a `vendors.zip` binary.

Lambda Layers provides a mechanism to centrally manage code and data that is shared across multiple functions. When a Lambda function is configured with a layer, the layer’s contents are put into the /opt directory of the execution environment. You can include a custom runtime in your function’s deployment package or as a layer. Lambda executes the bootstrap file in your deployment package, if available. If not, Lambda looks for a runtime in the function’s layers. There are also several open source PHP runtime layers available today.
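The bootstrap lookup order can be modeled in a few lines. This is a simplified sketch, not Lambda's implementation: it assumes the deployment package is extracted to `/var/task` (the usual value of `LAMBDA_TASK_ROOT`) and layers to `/opt`, and it takes the file check as a parameter so it can be tried locally.

```python
import os
from typing import Callable, Optional


def resolve_bootstrap(task_root: str = "/var/task", layer_root: str = "/opt",
                      is_file: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Return the bootstrap Lambda would run: the function's own first, then one from a layer."""
    for root in (task_root, layer_root):
        candidate = os.path.join(root, "bootstrap")
        if is_file(candidate):
            return candidate
    return None  # no custom runtime found


# With only the runtime layer attached, the layer's bootstrap (and its #!/opt/bin/php) is used.
print(resolve_bootstrap(is_file=lambda p: p == "/opt/bootstrap"))  # /opt/bootstrap
```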

The following steps show how to publish the `runtime.zip` and `vendors.zip` archives created earlier as Lambda layers and use them to build a Lambda function with a PHP runtime:

  1. Use the AWS Command Line Interface (CLI) to publish layers from the archives created earlier:
    aws lambda publish-layer-version \
        --layer-name PHP-example-runtime \
        --zip-file fileb://runtime.zip \
        --region eu-west-1

    aws lambda publish-layer-version \
        --layer-name PHP-example-vendor \
        --zip-file fileb://vendors.zip \
        --region eu-west-1

  2. Make note of each command’s LayerVersionArn output value (for example arn:aws:lambda:eu-west-1:XXXXXXXXXXXX:layer:PHP-example-runtime:1), which you’ll need for the next steps.
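Each LayerVersionArn has a fixed colon-separated structure, and the trailing number is the layer version that `publish-layer-version` just created. A small helper such as the hypothetical one below can sanity-check the values before you paste them into the console; the runtime and vendor ARNs should differ only in the layer name (and possibly the version).

```python
from typing import NamedTuple


class LayerVersionArn(NamedTuple):
    partition: str
    region: str
    account: str
    layer_name: str
    version: int


def parse_layer_version_arn(arn: str) -> LayerVersionArn:
    """Split arn:partition:lambda:region:account:layer:name:version into its parts."""
    parts = arn.split(":")
    if len(parts) != 8 or parts[0] != "arn" or parts[2] != "lambda" or parts[5] != "layer":
        raise ValueError(f"not a Lambda layer version ARN: {arn}")
    return LayerVersionArn(parts[1], parts[3], parts[4], parts[6], int(parts[7]))


arn = parse_layer_version_arn("arn:aws:lambda:eu-west-1:XXXXXXXXXXXX:layer:PHP-example-runtime:1")
print(arn.layer_name, arn.version)  # PHP-example-runtime 1
```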

Creating a PHP Lambda function

You can create a Lambda function via the AWS CLI, the AWS Serverless Application Model (SAM), or directly in the AWS Management Console. To do this using the console:

  1. Navigate to the Lambda section of the AWS Management Console and choose Create function.
  2. Enter “PHPHello” into the Function name field, and choose Provide your own bootstrap in the Runtime field. Then choose Create function.
  3. Right-click bootstrap.sample and choose Delete.
  4. Choose the layers icon and choose Add a layer.
  5. Choose Provide a layer version ARN, then copy and paste the ARN of the custom runtime layer from step 1 into the Layer version ARN field.
  6. Repeat steps 4 and 5 for the vendor ARN.
  7. In the Function Code section, create a new folder called src and inside it create a new file called index.php.
  8. Paste the following code into index.php:
    <?php
    // index function: returns a greeting built from the event's "name" field
    function index($data)
    {
        return "Hello, " . $data['name'];
    }
  9. Insert “index” into the Handler input field. This instructs Lambda to run the index function when invoked.
  10. Choose Save at the top right of the page.
  11. Choose Test at the top right of the page, and enter “PHPTest” into the Event name field. Enter the following into the event payload field and then choose Create: { "name": "world" }
  12. Choose Test and select the dropdown next to the Execution result heading.

You can see that the event payload’s “name” value is used to return “Hello, world”. This value is taken from the $data['name'] parameter provided to the Lambda function. The log output provides details about the actual duration, billed duration, and amount of memory used to execute the code.

Conclusion

This post explains how to create a Lambda function with a PHP runtime using Lambda Layers and the custom runtime API. It introduces the architecture for a serverless LAMP stack that scales with application traffic.

Lambda allows functions with mixed runtimes to interact with each other. Now, PHP developers can join other serverless development teams in focusing on shipping code. With serverless technologies, you no longer have to think about restarting web hosts, scaling, or hosting.

Start building your own custom runtime for Lambda.

Building a Scalable Document Pre-Processing Pipeline

Post Syndicated from Joel Knight original https://aws.amazon.com/blogs/architecture/building-a-scalable-document-pre-processing-pipeline/

In a recent customer engagement, Quantiphi, Inc., a member of the Amazon Web Services Partner Network, built a solution capable of pre-processing tens of millions of PDF documents before sending them for inference by a machine learning (ML) model. While the customer’s use case—and hence the ML model—was very specific to their needs, the pipeline that does the pre-processing of documents is reusable for a wide array of document processing workloads. This post will walk you through the pre-processing pipeline architecture.

Pre-processing pipeline architecture

Architectural goals

Quantiphi established the following goals prior to starting:

  • Loose coupling to enable independent scaling of compute components, flexible selection of compute services, and agility as the customer’s requirements evolved.
  • Work backwards from business requirements when making decisions affecting scale and throughput and not simply because “fastest is best.” Scale components only where it makes sense and for maximum impact.
  • Log everything at every stage to enable troubleshooting when something goes wrong, provide a detailed audit trail, and facilitate cost optimization exercises by identifying usage and load of every compute component in the architecture.

Document ingestion

The documents are initially stored in a staging bucket in Amazon Simple Storage Service (Amazon S3). The processing pipeline is kicked off when the “trigger” AWS Lambda function is called. This Lambda function passes parameters, such as the name of the staging S3 bucket and the path(s) within the bucket to be processed, to the “ingestion app.”

The ingestion app is a simple application that runs a web service for triggering a batch, and it lists the documents under the S3 bucket path(s) received via that web service. As the app processes the list of documents, it feeds the document path, S3 bucket name, and some additional metadata to the “ingest” Amazon Simple Queue Service (Amazon SQS) queue. The ingestion app also starts the audit trail for each document by writing a record to the Amazon Aurora database. As the document moves downstream, additional records are added to the database. Records are joined together by a unique ID that the ingestion app assigns to each document and that is passed along throughout the pipeline.
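The key detail of the ingestion step is that each document gets its unique ID before it enters the pipeline, so the SQS message and the first audit record share it. The sketch below shows that idea with hypothetical field names and a random UUID as the ID; the post does not say which ID scheme or message schema Quantiphi used, and the actual calls to SQS and Aurora are left out.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class IngestItem:
    document_id: str
    message_body: str             # what would be sent to the "ingest" SQS queue
    audit_record: Dict[str, str]  # first row of the document's audit trail in Aurora


def build_ingest_items(bucket: str, keys: Iterable[str], batch_id: str) -> List[IngestItem]:
    """Assign each listed document an ID and build its queue message and audit record."""
    items = []
    for key in keys:
        document_id = str(uuid.uuid4())  # assumption: random UUIDs as the pipeline-wide ID
        message = {"document_id": document_id, "bucket": bucket, "key": key, "batch_id": batch_id}
        audit = {"document_id": document_id, "stage": "ingested", "key": key}
        items.append(IngestItem(document_id, json.dumps(message), audit))
    return items


items = build_ingest_items("staging-bucket", ["batch-7/a.pdf", "batch-7/b.pdf"], "batch-7")
print(len(items), json.loads(items[0].message_body)["bucket"])
```

Each `message_body` would then be sent to the ingest queue (for example with the SQS `SendMessage` API) and each `audit_record` written to the audit table, so later stages only need to carry `document_id` forward.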

Chunking the documents

In order to maximize grip and control, the architecture is built to submit single-page files to the ML model. This enables correlating an inference failure to a specific page instead of a whole document (which may be many pages long). It also makes identifying the location of features within the inference results an easier task. Since the documents being processed can have varied sizes, resolutions, and page count, a big part of the pre-processing pipeline is to chunk a document up into its component pages prior to sending it for inference.

The “chunking orchestrator” app repeatedly pulls a message from the ingest queue and retrieves the document named therein from the S3 bucket. The PDF document is then classified along two metrics:

  • File size
  • Number of pages

We use these metrics to determine which chunking queue the document is sent to:

  • Large: Greater than 10MB in size or greater than 10 pages
  • Small: Less than or equal to 10MB and between 2 and 10 pages
  • Single page: Less than or equal to 10MB and exactly one page
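The three routing rules translate into a small function. In this sketch the single-page check comes after the large check and before the small one, so a one-page document of 10MB or less is routed only to the single-page queue. Reading “10MB” as 10 MiB is an assumption, and the returned queue names are placeholders.

```python
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" interpreted as 10 MiB
MAX_PAGES = 10


def chunking_queue(size_bytes: int, page_count: int) -> str:
    """Pick the chunking queue for a PDF from its file size and page count."""
    if page_count < 1:
        raise ValueError("a PDF must have at least one page")
    if size_bytes > MAX_BYTES or page_count > MAX_PAGES:
        return "large"
    if page_count == 1:
        return "single_page"
    return "small"


print(chunking_queue(50 * 1024 * 1024, 3))  # large
print(chunking_queue(2 * 1024 * 1024, 10))  # small
print(chunking_queue(300 * 1024, 1))        # single_page
```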

Each of these queues is serviced by an appropriately sized compute service that breaks the document down into smaller pieces, and ultimately, into individual pages.

  • Amazon Elastic Compute Cloud (Amazon EC2) processes large documents, primarily because of the high memory footprint needed to read large, multi-gigabyte PDF files into memory. These workers output smaller PDF documents, which are stored in Amazon S3. The name and location of these smaller documents are submitted to the “small documents” queue.
  • Small documents are processed by a Lambda function that decomposes the document into single pages that are stored in Amazon S3. The name and location of these single page files is sent to the “single page” queue.
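Both chunking steps come down to cutting a document's pages into consecutive ranges. The hypothetical helper below computes those ranges: an EC2 worker might call it with a chunk size of 10 so that each output qualifies for the small queue by page count (the post does not state the chunk size, and a 10-page piece can still exceed 10MB), while the small-document function would call it with a chunk size of 1. The PDF library that does the actual splitting is not named in the post and is not shown.

```python
from typing import List, Tuple


def page_ranges(total_pages: int, pages_per_chunk: int) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (first_page, last_page) ranges covering the whole document."""
    if total_pages < 1 or pages_per_chunk < 1:
        raise ValueError("page counts must be positive")
    return [
        (first, min(first + pages_per_chunk - 1, total_pages))
        for first in range(1, total_pages + 1, pages_per_chunk)
    ]


print(page_ranges(25, 10))  # [(1, 10), (11, 20), (21, 25)]
print(page_ranges(3, 1))    # [(1, 1), (2, 2), (3, 3)]
```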

Dead-letter queues (DLQs) hold messages from their respective size queues that are not processed successfully. If messages start landing in the DLQs, it’s an indication that there is a problem in the pipeline. For example, if messages start landing in the “small” or “single page” DLQ, it could indicate that the Lambda function processing that queue has reached its maximum run time.
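In Amazon SQS, a source queue is attached to its DLQ through the queue's `RedrivePolicy` attribute, which names the DLQ's ARN and `maxReceiveCount`, the number of receives after which a message is moved to the DLQ. The sketch below builds that attribute; the ARN and the count of 3 are placeholders, not values from the post.

```python
import json
from typing import Dict


def redrive_attributes(dlq_arn: str, max_receive_count: int) -> Dict[str, str]:
    """Build the Attributes map for SQS CreateQueue/SetQueueAttributes that attaches a DLQ."""
    policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receive_count)}
    return {"RedrivePolicy": json.dumps(policy)}


attrs = redrive_attributes("arn:aws:sqs:eu-west-1:123456789012:small-docs-dlq", 3)
print(attrs["RedrivePolicy"])
```

With boto3, this map could be passed as `Attributes=attrs` to `set_queue_attributes`, together with the source queue's URL.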

An Amazon CloudWatch Alarm monitors the depth of each DLQ. Upon seeing DLQ activity, a notification is sent via Amazon Simple Notification Service (Amazon SNS) so an administrator can then investigate and make adjustments such as tuning the sizing thresholds to ensure the Lambda functions can finish before reaching their maximum run time.
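A DLQ-depth alarm of this kind can be expressed as parameters for the CloudWatch `PutMetricAlarm` API, watching the SQS metric `ApproximateNumberOfMessagesVisible` and notifying an SNS topic. The alarm name, period, and threshold below are illustrative choices, and the topic ARN is a placeholder.

```python
from typing import Any, Dict


def dlq_alarm_params(queue_name: str, topic_arn: str) -> Dict[str, Any]:
    """Keyword arguments for CloudWatch PutMetricAlarm: alarm as soon as the DLQ is non-empty."""
    return {
        "AlarmName": f"{queue_name}-has-messages",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 300,                                # seconds per evaluation window (illustrative)
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",  # i.e. at least one visible message
        "AlarmActions": [topic_arn],                   # SNS topic that notifies the administrator
    }


params = dlq_alarm_params("small-docs-dlq", "arn:aws:sns:eu-west-1:123456789012:pipeline-alerts")
print(params["MetricName"], params["Dimensions"])
```

These could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`, with one alarm per DLQ.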

In order to ensure no documents are left behind in the active run, there is a failsafe in the form of an Amazon EC2 worker that retrieves and processes messages from the DLQs. This failsafe app breaks a PDF all the way down into individual pages and then does image conversion.

Documents that don’t land in a DLQ make it to the “single page” queue. This queue drives each page through the “image conversion” Lambda function, which converts the single-page file from PDF to PNG format. These PNG files are stored in Amazon S3.

Sending for inference

At this point, the documents have been chunked up and are ready for inference.

When the single-page image files land in Amazon S3, an S3 Event Notification places a message in a “converted image” SQS queue, which in turn triggers the “model endpoint” Lambda function. This function calls an API endpoint on Amazon API Gateway that fronts the Amazon SageMaker inference endpoint. Placing API Gateway in front of the SageMaker endpoint avoided the throttling that high volumes of concurrent calls to the Amazon SageMaker API would otherwise cause during Lambda function execution. This pattern also doubled inference throughput. The Lambda function passes the document’s S3 bucket name and path to the API, which in turn passes them to the auto scaling SageMaker endpoint. The function reads the inference results passed back from API Gateway and stores them in Amazon Aurora.
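When S3 notifications flow through SQS, the “model endpoint” function receives SQS records whose `body` is the S3 notification JSON, and the object keys in that JSON are URL-encoded (spaces arrive as `+`). The sketch below extracts the bucket and key pairs the function would forward to API Gateway; the HTTP call to the API and the Aurora write are omitted, and the example record is made up.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def objects_from_sqs_event(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for every S3 object referenced in an SQS-triggered Lambda event."""
    objects = []
    for sqs_record in event["Records"]:
        notification = json.loads(sqs_record["body"])
        # S3 test events carry no "Records" list, so default to an empty one.
        for s3_record in notification.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


s3_notification = {"Records": [{"s3": {"bucket": {"name": "converted-images"},
                                       "object": {"key": "doc+42/page-0001.png"}}}]}
event = {"Records": [{"body": json.dumps(s3_notification)}]}
print(objects_from_sqs_event(event))  # [('converted-images', 'doc 42/page-0001.png')]
```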

The inference results as well as all the telemetry collected as the document was processed can be queried from the Amazon Aurora database to build reports showing number of documents processed, number of documents with failures, and number of documents with or without whatever feature(s) the ML model is trained to look for.

Summary

This architecture is able to take PDF documents that range in size from single page up to thousands of pages or gigabytes in size, pre-process them into single page image files, and then send them for inference by a machine learning model. Once triggered, the pipeline is completely automated and is able to scale to tens of millions of pages per batch.

In keeping with the architectural goals of the project, Amazon SQS is used throughout to build a loosely coupled system that promotes agility, scalability, and resiliency. Loose coupling also enables a high degree of grip and control over the system, making it easier to respond to changes in business needs and to focus tuning efforts for maximum impact. And with every compute component logging everything it does, the system provides a high degree of auditability and introspection, which facilitates performance monitoring and detailed cost optimization.

ICYMI: Serverless Q4 2019

Post Syndicated from Rob Sutter original https://aws.amazon.com/blogs/compute/icymi-serverless-q4-2019/

Welcome to the eighth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

The three months comprising the fourth quarter of 2019

AWS re:Invent

AWS re:Invent 2019

re:Invent 2019 dominated the fourth quarter at AWS. The serverless team presented a number of talks, workshops, and builder sessions to help customers increase their skills and deliver value more rapidly to their own customers.

Serverless talks from re:Invent 2019

Chris Munns presenting 'Building microservices with AWS Lambda' at re:Invent 2019

We presented dozens of sessions showing how customers can improve their architecture and agility with serverless. Here are some of the most popular.

Videos

Decks

You can also find decks for many of the serverless presentations and other re:Invent presentations on our AWS Events Content.

AWS Lambda

For developers needing greater control over performance of their serverless applications at any scale, AWS Lambda announced Provisioned Concurrency at re:Invent. This feature enables Lambda functions to execute with consistent start-up latency, making them ideal for building latency-sensitive applications.

As the following graph shows, Provisioned Concurrency reduces tail latency, directly improving response times and providing a more responsive end-user experience.

Graph showing performance enhancements with AWS Lambda Provisioned Concurrency

Lambda rolled out enhanced VPC networking to 14 additional Regions around the world. This change brings dramatic improvements to startup performance for Lambda functions running in VPCs due to more efficient usage of elastic network interfaces.

Illustration of AWS Lambda VPC to VPC NAT

New VPC to VPC NAT for Lambda functions

Lambda now supports three additional runtimes: Node.js 12, Java 11, and Python 3.8. Each of these new runtimes has new version-specific features and benefits, which are covered in the linked release posts. Like the Node.js 10 runtime, these new runtimes are all based on an Amazon Linux 2 execution environment.

Lambda released a number of controls for both stream and async-based invocations:

  • You can now configure error handling for Lambda functions consuming events from Amazon Kinesis Data Streams or Amazon DynamoDB Streams. It’s now possible to limit the retry count, limit the age of records being retried, configure a failure destination, or split a batch to isolate a problem record. These capabilities help you deal with potential “poison pill” records that would previously cause streams to pause in processing.
  • For asynchronous Lambda invocations, you can now set the maximum event age and retry attempts on the event. If either configured condition is met, the event can be routed to a dead-letter queue (DLQ) or a Lambda destination, or it can be discarded.

AWS Lambda Destinations is a new feature that allows developers to designate an asynchronous target for Lambda function invocation results. You can set separate destinations for success and failure. This unlocks new patterns for distributed event-based applications and can replace custom code previously used to manage routing results.

Illustration depicting AWS Lambda Destinations with success and failure configurations

Lambda Destinations
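The asynchronous-invocation controls and Destinations map onto the parameters of the Lambda `PutFunctionEventInvokeConfig` API. The sketch below builds those parameters and checks them against the documented ranges (0 to 2 retry attempts, 60 to 21,600 seconds of event age); the function name and destination ARN are placeholders.

```python
from typing import Any, Dict, Optional


def async_invoke_config(function_name: str, max_retries: int, max_age_seconds: int,
                        on_success: Optional[str] = None,
                        on_failure: Optional[str] = None) -> Dict[str, Any]:
    """Keyword arguments for Lambda PutFunctionEventInvokeConfig."""
    if not 0 <= max_retries <= 2:
        raise ValueError("MaximumRetryAttempts must be between 0 and 2")
    if not 60 <= max_age_seconds <= 21600:
        raise ValueError("MaximumEventAgeInSeconds must be between 60 and 21600")
    params: Dict[str, Any] = {
        "FunctionName": function_name,
        "MaximumRetryAttempts": max_retries,
        "MaximumEventAgeInSeconds": max_age_seconds,
    }
    destinations = {}
    if on_success:
        destinations["OnSuccess"] = {"Destination": on_success}
    if on_failure:
        destinations["OnFailure"] = {"Destination": on_failure}
    if destinations:
        params["DestinationConfig"] = destinations
    return params


params = async_invoke_config("order-processor", 1, 3600,
                             on_failure="arn:aws:sqs:us-east-1:123456789012:order-failures")
print(params)
```

With boto3 these would be passed as `boto3.client("lambda").put_function_event_invoke_config(**params)`.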

Lambda also now supports setting a Parallelization Factor, which allows you to set multiple Lambda invocations per shard for Kinesis Data Streams and DynamoDB Streams. This enables faster processing without the need to increase your shard count, while still guaranteeing the order of records processed.

Illustration of multiple AWS Lambda invocations per Kinesis Data Streams shard

Lambda Parallelization Factor diagram
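For Kinesis Data Streams and DynamoDB Streams, the error-handling options and the Parallelization Factor are settings on the event source mapping. Here is a sketch of `CreateEventSourceMapping` parameters that combines them; the stream ARN, function name, failure destination, and chosen values are placeholders, and only the 1 to 10 range of `ParallelizationFactor` is checked.

```python
from typing import Any, Dict


def stream_mapping_params(stream_arn: str, function_name: str, failure_destination: str,
                          parallelization_factor: int = 2) -> Dict[str, Any]:
    """Keyword arguments for Lambda CreateEventSourceMapping on a Kinesis or DynamoDB stream."""
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",
        "MaximumRetryAttempts": 2,           # stop retrying a failing batch after 2 retries
        "MaximumRecordAgeInSeconds": 3600,   # skip records older than one hour
        "BisectBatchOnFunctionError": True,  # split a failing batch to isolate the bad record
        "ParallelizationFactor": parallelization_factor,  # concurrent batches per shard
        "DestinationConfig": {"OnFailure": {"Destination": failure_destination}},
    }


params = stream_mapping_params("arn:aws:kinesis:us-east-1:123456789012:stream/clicks",
                               "click-processor",
                               "arn:aws:sqs:us-east-1:123456789012:click-failures",
                               parallelization_factor=4)
print(params["ParallelizationFactor"])
```

These could be passed to `boto3.client("lambda").create_event_source_mapping(**params)`. With a factor above 1, records that share a partition key are still processed in order.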

Lambda introduced Amazon SQS FIFO queues as an event source. Unlike standard queues, “first in, first out” (FIFO) queues guarantee the order in which records are processed. Ordering is preserved within each message group, identified by the MessageGroupId attribute, and different message groups can be handled by parallel Lambda consumers of a single FIFO queue, enabling high-throughput record processing by Lambda.
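As a toy model of the FIFO ordering guarantee (not how SQS or Lambda implement it), the function below partitions messages by `MessageGroupId` while keeping arrival order within each group. Each resulting list must be processed in order, but different lists are independent and could go to different consumers in parallel. The group IDs and message bodies are made up.

```python
from typing import Dict, List, Tuple


def partition_by_group(messages: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each MessageGroupId to its message bodies, preserving arrival order within the group."""
    groups: Dict[str, List[str]] = {}
    for group_id, body in messages:
        groups.setdefault(group_id, []).append(body)
    return groups


arrivals = [("customer-1", "create"), ("customer-2", "create"), ("customer-1", "update")]
print(partition_by_group(arrivals))
# {'customer-1': ['create', 'update'], 'customer-2': ['create']}
```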

Lambda now supports Environment Variables in the AWS China (Beijing) Region and the AWS China (Ningxia) Region.

You can now view percentile statistics for the duration metric of your Lambda functions. Percentile statistics show the relative standing of a value in a dataset, and are useful when applied to metrics that exhibit large variances. They can help you understand the distribution of a metric, discover outliers, and find hard-to-spot situations that affect customer experience for a subset of your users.
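To see why percentiles help with high-variance metrics, the sketch below applies the nearest-rank method to made-up durations. CloudWatch's own percentile computation may differ in detail; this only illustrates the idea.

```python
import math
from typing import Sequence


def nearest_rank_percentile(values: Sequence[float], p: int) -> float:
    """Smallest value such that at least p percent of the data is less than or equal to it."""
    if not values or not 0 < p <= 100:
        raise ValueError("need data and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


durations_ms = [10] * 98 + [500, 900]  # 98 fast invocations, 2 slow ones
print(sum(durations_ms) / len(durations_ms))      # 23.8
print(nearest_rank_percentile(durations_ms, 50))  # 10
print(nearest_rank_percentile(durations_ms, 99))  # 500
```

The average (23.8 ms) describes no real invocation, while p50 and p99 separate the typical case from the slow tail.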

Amazon API Gateway

Screen capture of creating an Amazon API Gateway HTTP API in the AWS Management Console

Amazon API Gateway announced the preview of HTTP APIs. In addition to significant performance improvements, most customers see an average cost savings of 70% when compared with API Gateway REST APIs. With HTTP APIs, you can create an API in four simple steps. Once the API is created, additional configuration for CORS and JWT authorizers can be added.

AWS SAM CLI

Screen capture of the new 'sam deploy' process in a terminal window

The AWS SAM CLI team simplified the bucket management and deployment process in the SAM CLI. You no longer need to manage a bucket for deployment artifacts – SAM CLI handles this for you. The deployment process has also been streamlined from multiple flagged commands to a single command, sam deploy.

AWS Step Functions

One powerful feature of AWS Step Functions is its ability to integrate directly with AWS services without you needing to write complicated application code. In Q4, Step Functions expanded its integration with Amazon SageMaker to simplify machine learning workflows. Step Functions also added a new integration with Amazon EMR, making EMR big data processing workflows faster to build and easier to monitor.

Screen capture of an AWS Step Functions step with Amazon EMR

Step Functions step with EMR

Step Functions now provides the ability to track state transition usage by integrating with AWS Budgets, allowing you to monitor trends and react to usage on your AWS account.

You can now view CloudWatch Metrics for Step Functions at a one-minute frequency. This makes it easier to set up detailed monitoring for your workflows. You can use one-minute metrics to set up CloudWatch Alarms based on your Step Functions API usage, Lambda functions, service integrations, and execution details.

Step Functions now supports higher throughput workflows, making it easier to coordinate applications with high event rates. This increases the limits to 1,500 state transitions per second and a default start rate of 300 state machine executions per second in US East (N. Virginia), US West (Oregon), and Europe (Ireland). Click the above link to learn more about the limit increases in other Regions.

Screen capture of choosing Express Workflows in the AWS Management Console

Step Functions released AWS Step Functions Express Workflows. With the ability to support event rates greater than 100,000 per second, this feature is designed for high-performance workloads at a reduced cost.

Amazon EventBridge

Illustration of the Amazon EventBridge schema registry and discovery service

Amazon EventBridge announced the preview of the Amazon EventBridge schema registry and discovery service. This service allows developers to automate the discovery and cataloging of event schemas for use in their applications. Additionally, once a schema is stored in the registry, you can generate and download a code binding that represents the schema as an object in your code.

Amazon SNS

Amazon SNS now supports the use of dead-letter queues (DLQs) to help capture unhandled events. By enabling a DLQ, you can catch events that are not processed and re-submit them or analyze them to locate processing issues.

Amazon CloudWatch

Amazon CloudWatch announced Amazon CloudWatch ServiceLens to provide a “single pane of glass” to observe health, performance, and availability of your application.

Screenshot of Amazon CloudWatch ServiceLens in the AWS Management Console

CloudWatch ServiceLens

CloudWatch also announced a preview of a capability called Synthetics. CloudWatch Synthetics allows you to test your application endpoints and URLs using configurable scripts that mimic what a real customer would do. This enables the outside-in view of your customers’ experiences, and your service’s availability from their point of view.

CloudWatch introduced Embedded Metric Format, which helps you ingest complex high-cardinality application data as logs and easily generate actionable metrics. You can publish these metrics from your Lambda function by using the PutLogEvents API or using an open source library for Node.js or Python applications.
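In embedded metric format, a log line is a JSON object whose `_aws` member declares which top-level fields are metrics and which are dimensions. When a Lambda function prints such a line to standard output, it is written to CloudWatch Logs, and CloudWatch extracts the metrics from it. The sketch below builds one record; the namespace, dimension, and metric names are illustrative.

```python
import json
import time
from typing import Dict


def emf_line(namespace: str, dimensions: Dict[str, str], metrics: Dict[str, float],
             unit: str = "Milliseconds") -> str:
    """Serialize one embedded-metric-format record with a single dimension set."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],  # one dimension set containing all given keys
                "Metrics": [{"Name": name, "Unit": unit} for name in metrics],
            }],
        },
        **dimensions,  # dimension values live at the top level
        **metrics,     # metric values live at the top level too
    }
    return json.dumps(record)


# Inside a Lambda handler, printing is enough for the line to reach CloudWatch Logs.
print(emf_line("MyApp", {"Service": "checkout"}, {"ProcessingLatency": 42.0}))
```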

Finally, CloudWatch announced a preview of Contributor Insights, a capability to identify who or what is impacting your system or application performance by identifying outliers or patterns in log data.

AWS X-Ray

AWS X-Ray announced trace maps, which enable you to map the end-to-end path of a single request. Identifiers show issues and how they affect other services in the request’s path. These can help you to identify and isolate service points that are causing degradation or failures.

X-Ray also announced support for Amazon CloudWatch Synthetics, currently in preview. With this support, X-Ray traces canary scripts throughout the application, providing metrics on performance or application issues.

Screen capture of AWS X-Ray Service map in the AWS Management Console

X-Ray Service map with CloudWatch Synthetics

Amazon DynamoDB

Amazon DynamoDB announced support for customer-managed customer master keys (CMKs) to encrypt data in DynamoDB. This allows you to bring your own key (BYOK), giving you full control over how you encrypt and manage the security of your DynamoDB data.

It is now possible to add global replicas to existing DynamoDB tables to provide enhanced availability across the globe.

Another new DynamoDB capability to identify frequently accessed keys and database traffic trends is currently in preview. With this, you can now more easily identify “hot keys” and understand usage of your DynamoDB tables.

Screen capture of Amazon CloudWatch Contributor Insights for DynamoDB in the AWS Management Console

CloudWatch Contributor Insights for DynamoDB

DynamoDB also released adaptive capacity. Adaptive capacity helps you handle imbalanced workloads by automatically isolating frequently accessed items and shifting data across partitions to rebalance them. This helps reduce cost by enabling you to provision throughput for a more balanced workload instead of overprovisioning for uneven data access patterns.

Amazon RDS

Amazon Relational Database Service (Amazon RDS) announced a preview of Amazon RDS Proxy to help developers manage database connections for serverless applications.

Illustration of Amazon RDS Proxy

The RDS Proxy maintains a pool of established connections to your RDS database instances. This pool enables you to support a large number of application connections so your application can scale without compromising performance. It also increases security by enabling IAM authentication for database access and enabling you to centrally manage database credentials using AWS Secrets Manager.
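The pooling idea is easiest to see in a client-side form. The sketch below is a generic, thread-safe pool that keeps at most `max_size` connections open and reuses idle ones. It only illustrates connection pooling and is not how RDS Proxy is built: RDS Proxy does this as a managed service sitting between many Lambda invocations and the database. The connection factory is a stand-in for a real database driver.

```python
import queue
import threading
from typing import Callable, Generic, Optional, TypeVar

Conn = TypeVar("Conn")


class ConnectionPool(Generic[Conn]):
    """Hand out at most max_size connections, reusing idle ones before opening new ones."""

    def __init__(self, factory: Callable[[], Conn], max_size: int) -> None:
        self._factory = factory
        self._idle: "queue.LifoQueue[Conn]" = queue.LifoQueue()
        self._slots = threading.Semaphore(max_size)  # caps the number of connections in use
        self._lock = threading.Lock()
        self.created = 0

    def acquire(self, timeout: Optional[float] = None) -> Conn:
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no database connection available")
        try:
            return self._idle.get_nowait()  # reuse an established connection
        except queue.Empty:
            pass
        try:
            conn = self._factory()  # open a new connection only when none is idle
        except Exception:
            self._slots.release()  # give the slot back if connecting fails
            raise
        with self._lock:
            self.created += 1
        return conn

    def release(self, conn: Conn) -> None:
        self._idle.put(conn)
        self._slots.release()


pool = ConnectionPool(factory=object, max_size=2)  # object() stands in for a DB connection
first = pool.acquire()
pool.release(first)
second = pool.acquire()
print(second is first, pool.created)  # True 1
```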

AWS Serverless Application Repository

The AWS Serverless Application Repository (SAR) now offers Verified Author badges. These badges enable consumers to quickly and reliably know who you are. The badge appears next to your name in the SAR and links to your GitHub profile.

SAR Verified developer badges

AWS Developer Tools

AWS CodeCommit launched the ability to enforce approval rule workflows for pull requests, making it easier to ensure that code has passed through specific rule requirements. You can now create an approval rule specifically for a pull request, or create approval rule templates to be applied to all future pull requests in a repository.
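For illustration, the sketch below assembles the JSON content of an approval rule template that requires a number of approvals from a pool of IAM identities for pull requests into given branches. The field names follow the approval rule content format as we understand it; treat the values (branch names, ARNs) as hypothetical, and check the CodeCommit documentation before relying on it.

```python
import json
from typing import List


def approval_rule_template_content(
    approvals_needed: int, pool_member_arns: List[str], branches: List[str]
) -> str:
    """Build JSON content for a CodeCommit approval rule template."""
    content = {
        "Version": "2018-11-08",
        # The template applies only to pull requests targeting these branches.
        "DestinationReferences": [f"refs/heads/{branch}" for branch in branches],
        "Statements": [
            {
                "Type": "Approvers",
                "NumberOfApprovalsNeeded": approvals_needed,
                # Only approvals from identities matching these ARNs are counted.
                "ApprovalPoolMembers": pool_member_arns,
            }
        ],
    }
    return json.dumps(content)
```

The resulting string is what you would supply as the template content when creating the template, for example through the CodeCommit `CreateApprovalRuleTemplate` API.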

AWS CodeBuild added beta support for test reporting. With test reporting, you can now view the detailed results, trends, and history for tests executed on CodeBuild for any framework that supports the JUnit XML or Cucumber JSON test format.

CodeBuild test trends in the AWS Management Console
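CodeBuild reads reports that your test framework already writes. If a test runner cannot emit JUnit XML itself, a minimal sketch like the one below can write one; the suite and test names are hypothetical, and the file still has to be referenced from the `reports` section of your buildspec, which is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# (test name, duration in seconds, failure message or None if the test passed)
Result = Tuple[str, float, Optional[str]]


def junit_xml(suite_name: str, results: List[Result]) -> str:
    """Render test results as a minimal JUnit XML <testsuite> document."""
    failures = sum(1 for _, _, message in results if message is not None)
    suite = ET.Element(
        "testsuite", name=suite_name, tests=str(len(results)), failures=str(failures)
    )
    for name, seconds, message in results:
        case = ET.SubElement(
            suite, "testcase", classname=suite_name, name=name, time=f"{seconds:.3f}"
        )
        if message is not None:
            # A <failure> child element marks this test case as failed.
            ET.SubElement(case, "failure", message=message)
    return ET.tostring(suite, encoding="unicode")
```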

Amazon CodeGuru

AWS announced a preview of Amazon CodeGuru at re:Invent 2019. CodeGuru is a machine learning service that makes code reviews more effective and helps developers write code that is more secure, performant, and consistent.

AWS Amplify and AWS AppSync

AWS Amplify added iOS and Android as supported platforms. Now developers can build iOS and Android applications using the Amplify Framework with the same category-based programming model that they use for JavaScript apps.

Screen capture of 'amplify init' for an iOS application in a terminal window

The Amplify team has also improved offline data access and synchronization by announcing Amplify DataStore. Developers can now create applications that allow users to continue to access and modify data, without an internet connection. Upon connection, the data synchronizes transparently with the cloud.

For a summary of Amplify and AppSync announcements before re:Invent, read: “A round up of the recent pre-re:Invent 2019 AWS Amplify Launches”.

Illustration of AWS AppSync integrations with other AWS services

Q4 serverless content

Blog posts

October

November

December

Tech talks

We hold several AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page.

Here are the ones from Q4:

Twitch

October

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

AWS Serverless Heroes

We are excited to welcome some new AWS Serverless Heroes to help grow the serverless community. We look forward to some amazing content to help you with your serverless journey.

AWS Serverless Application Repository (SAR) Apps

In this edition of ICYMI, we are introducing a section devoted to SAR apps written by the AWS Serverless Developer Advocacy team. You can run these applications and review their source code to learn more about serverless and to see examples of suggested practices.

Still looking for more?

The Serverless landing page has much more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. We’re also kicking off a fresh series of Tech Talks in 2020 with new content providing greater detail on everything new coming out of AWS for serverless application developers.

Throughout 2020, the AWS Serverless Developer Advocates are crossing the globe to tell you more about serverless, and to hear more about what you need. Follow this blog to keep up on new launches and announcements, best practices, and examples of serverless applications in action.

You can also follow all of us on Twitter to see the latest news, follow conversations, and interact with the team.

Chris Munns: @chrismunns
Eric Johnson: @edjgeek
James Beswick: @jbesw
Moheeb Zara: @virgilvox
Ben Smith: @benjamin_l_s
Rob Sutter: @rts_rob
Julian Wood: @julian_wood

Happy coding!

Urgent & Important – Rotate Your Amazon RDS, Aurora, and DocumentDB Certificates

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/urgent-important-rotate-your-amazon-rds-aurora-and-documentdb-certificates/

You may have already received an email or seen a console notification, but I don’t want you to be taken by surprise!

Rotate Now
If you are using Amazon Aurora, Amazon Relational Database Service (RDS), or Amazon DocumentDB and are taking advantage of SSL/TLS certificate validation when you connect to your database instances, you need to download & install a fresh certificate, rotate the certificate authority (CA) for the instances, and then reboot the instances.

If you are not using SSL/TLS connections or certificate validation, you do not need to make any updates, but I recommend that you do so in order to be ready in case you decide to use SSL/TLS connections in the future. In this case, you can use a new CLI option that rotates and stages the new certificates but avoids a restart.

The new certificate (CA-2019) is available as part of a certificate bundle that also includes the old certificate (CA-2015) so that you can make a smooth transition without getting into a chicken and egg situation.

What’s Happening?
The SSL/TLS certificates for RDS, Aurora, and DocumentDB expire and are replaced every five years as part of our standard maintenance and security discipline. Here are some important dates to know:

September 19, 2019 – The CA-2019 certificates were made available.

January 14, 2020 – Instances created on or after this date will have the new (CA-2019) certificates. You can temporarily revert to the old certificates if necessary.

February 5 to March 5, 2020 – RDS will stage (install but not activate) new certificates on existing instances. Restarting the instance will activate the certificate.

March 5, 2020 – The CA-2015 certificates will expire. Applications that use certificate validation but have not been updated will lose connectivity.

How to Rotate
Earlier this month I created an Amazon RDS for MySQL database instance and set it aside in preparation for this blog post. As you can see from the screen shot above, the RDS console lets me know that I need to perform a Certificate update.

I visit Using SSL/TLS to Encrypt a Connection to a DB Instance and download a new certificate. If my database client knows how to handle certificate chains, I can download the root certificate and use it for all regions. If not, I download a certificate that is specific to the region where my database instance resides. I decide to download a bundle that contains the old and new root certificates:

Next, I update my client applications to use the new certificates. This process is specific to each app and each database client library, so I don’t have any details to share.
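The client-side change depends on your driver, but for Python clients that accept an `ssl.SSLContext`, a minimal sketch could look like this. It assumes you saved the downloaded bundle locally (the path is hypothetical); passing `None` falls back to the system trust store instead.

```python
import ssl
from typing import Optional


def rds_tls_context(ca_bundle_path: Optional[str]) -> ssl.SSLContext:
    """Build a client-side TLS context that validates the database certificate."""
    # With a cafile, only the certificates in that bundle are trusted; with
    # None, the system's default trust store is loaded instead.
    context = ssl.create_default_context(cafile=ca_bundle_path)
    # Both settings are already the defaults for create_default_context; they
    # are set explicitly so certificate validation is clearly required.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context
```

Because the bundle contains both the CA-2015 and CA-2019 roots, a context built this way keeps working before and after the instance switches to the new CA.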

Once the client application has been updated, I change the certificate authority (CA) to rds-ca-2019. I can Modify the instance in the console, and select the new CA:

I can also do this via the CLI:

$ aws rds modify-db-instance --db-instance-identifier database-1 \
  --ca-certificate-identifier rds-ca-2019

The change will take effect during the next maintenance window. I can also apply it immediately:

$ aws rds modify-db-instance --db-instance-identifier database-1 \
  --ca-certificate-identifier rds-ca-2019 --apply-immediately

After my instance has been rebooted (either immediately or during the maintenance window), I test my application to ensure that it continues to work as expected.

If I am not using SSL and want to avoid a restart, I use --no-certificate-rotation-restart:

$ aws rds modify-db-instance --db-instance-identifier database-1 \
  --ca-certificate-identifier rds-ca-2019 --no-certificate-rotation-restart

The database engine will pick up the new certificate during the next planned or unplanned restart.

I can also use the RDS ModifyDBInstance API function or a CloudFormation template to change the certificate authority.
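As a sketch assuming the AWS SDK for Python (boto3), the function below builds `ModifyDBInstance` request arguments covering the three CLI variants above; `CertificateRotationRestart=False` corresponds to the `--no-certificate-rotation-restart` option.

```python
from typing import Any, Dict


def rotate_ca_request(
    instance_id: str, apply_immediately: bool = False, restart: bool = True
) -> Dict[str, Any]:
    """Build ModifyDBInstance arguments that switch an instance to rds-ca-2019."""
    request: Dict[str, Any] = {
        "DBInstanceIdentifier": instance_id,
        "CACertificateIdentifier": "rds-ca-2019",
        # False (the API default) defers the change to the next maintenance window.
        "ApplyImmediately": apply_immediately,
    }
    if not restart:
        # Stage the certificate without a reboot; the next planned or
        # unplanned restart activates it.
        request["CertificateRotationRestart"] = False
    return request
```

For example, `boto3.client("rds").modify_db_instance(**rotate_ca_request("database-1", apply_immediately=True))` would correspond to the second CLI example.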

Once again, all of this must be completed by March 5, 2020 or your applications may be unable to connect to your database instance using SSL or TLS.

Things to Know
Here are a couple of important things to know:

Amazon Aurora Serverless – AWS Certificate Manager (ACM) is used to manage certificate rotations for this database engine, and no action is necessary.

Regions – Rotation is needed for database instances in all commercial AWS regions except Asia Pacific (Hong Kong), Middle East (Bahrain), and China (Ningxia).

Cluster Scaling – If you add more nodes to an existing cluster, the new nodes will receive the CA-2019 certificate if one or more of the existing nodes already have it. Otherwise, the CA-2015 certificate will be used.

Learning More
Here are some links to additional information:

Jeff;

 

New for Amazon Redshift – Data Lake Export and Federated Query

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-amazon-redshift-data-lake-export-and-federated-queries/

A data warehouse is a database optimized to analyze relational data coming from transactional systems and line of business applications. Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze data using standard SQL and existing Business Intelligence (BI) tools.

To get information from unstructured data that would not fit in a data warehouse, you can build a data lake. A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. With a data lake built on Amazon Simple Storage Service (S3), you can easily run big data analytics and use machine learning to gain insights from your semi-structured (such as JSON, XML) and unstructured datasets.

Today, we are launching two new features to help you improve the way you manage your data warehouse and integrate with a data lake:

  • Data Lake Export to unload data from a Redshift cluster to S3 in Apache Parquet format, an efficient open columnar storage format optimized for analytics.
  • Federated Query to query, from a Redshift cluster, across data stored in the cluster, in your S3 data lake, and in one or more Amazon Relational Database Service (RDS) for PostgreSQL and Amazon Aurora PostgreSQL databases.

This architectural diagram gives a quick summary of how these features work and how they can be used together with other AWS services.

Let’s look more closely at the interactions in the diagram, starting with how you can use these features and the advantages they provide.

Using Redshift Data Lake Export

You can now unload the result of a Redshift query to your S3 data lake in Apache Parquet format. Compared to text formats, Parquet is up to 2x faster to unload and consumes up to 6x less storage in S3. This lets you save the results of the data transformation and enrichment you have done in Redshift into your S3 data lake in an open format.

You can then analyze the data in your data lake with Redshift Spectrum, a feature of Redshift that allows you to query data directly from files on S3. Or you can use different tools such as Amazon Athena, Amazon EMR, or Amazon SageMaker.

To try this new feature, I create a new cluster from the Redshift console, and follow this tutorial to load sample data that keeps track of sales of musical events across different venues. I want to correlate this data with social media comments on the events stored in my data lake. To understand their relevance, each event should have a way of comparing its relative sales to other events.

Let’s build a query in Redshift to export the data to S3. My data is stored across multiple tables, so I need a query that gives me a single view of what is going on with sales. I want to join the content of the sales and date tables, adding the gross sales for each event (total_price in the query) and its percentile in terms of all-time gross sales compared to all events.

To export the result of the query to S3 in Parquet format, I use the following SQL command:

UNLOAD ('SELECT sales.*, date.*, total_price, percentile
           FROM sales, date,
                (SELECT eventid, total_price, ntile(1000) over(order by total_price desc) / 10.0 as percentile
                   FROM (SELECT eventid, sum(pricepaid) total_price
                           FROM sales
                       GROUP BY eventid)) as percentile_events
          WHERE sales.dateid = date.dateid
            AND percentile_events.eventid = sales.eventid')
TO 's3://MY-BUCKET/DataLake/Sales/'
FORMAT AS PARQUET
CREDENTIALS 'aws_iam_role=arn:aws:iam::123412341234:role/myRedshiftRole';

To give Redshift write access to my S3 bucket, I am using an AWS Identity and Access Management (IAM) role. I can see the result of the UNLOAD command using the AWS Command Line Interface (CLI). As expected, the output of the query is exported using the Parquet columnar data format:

$ aws s3 ls s3://MY-BUCKET/DataLake/Sales/
2019-11-25 14:26:56 1638550 0000_part_00.parquet
2019-11-25 14:26:56 1635489 0001_part_00.parquet
2019-11-25 14:26:56 1624418 0002_part_00.parquet
2019-11-25 14:26:56 1646179 0003_part_00.parquet
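To make the percentile column concrete, the sketch below reproduces in Python how `ntile(1000) over(order by total_price desc) / 10.0` is computed: NTILE splits the ordered rows into 1,000 buckets as evenly as possible, and dividing by 10.0 turns the bucket number into a value between 0.1 and 100. Because the ordering is descending, the highest-grossing events get the smallest values, so 0.1 reads as “top 0.1%”. The event IDs and totals in the test are made up, and tied totals may be ordered differently than in Redshift.

```python
from typing import Dict, List


def ntile(n: int, row_count: int) -> List[int]:
    """Bucket numbers (1..n) that SQL NTILE(n) assigns to row_count ordered rows."""
    size, extra = divmod(row_count, n)
    buckets: List[int] = []
    for i in range(row_count):
        if i < extra * (size + 1):
            # The first `extra` buckets each hold one row more than the rest.
            buckets.append(i // (size + 1) + 1)
        else:
            # Remaining buckets hold `size` rows each (size > 0 whenever we get here).
            buckets.append(extra + (i - extra * (size + 1)) // size + 1)
    return buckets


def event_percentiles(totals: Dict[int, float]) -> Dict[int, float]:
    """Mirror ntile(1000) over(order by total_price desc) / 10.0 for each event."""
    ranked = sorted(totals, key=totals.get, reverse=True)
    return {eventid: tile / 10.0 for eventid, tile in zip(ranked, ntile(1000, len(ranked)))}
```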

To optimize access to data, I can specify one or more partition columns so that unloaded data is automatically partitioned into folders in my S3 bucket. For example, I can unload sales data partitioned by year, month, and day. This enables my queries to take advantage of partition pruning and skip scanning irrelevant partitions, improving query performance and minimizing cost.

To use partitioning, I add the PARTITION BY option to the previous SQL command, followed by the columns I want to use to partition the data into different directories. In my case, I want to partition the output based on the year and the calendar date (caldate in the query) of the sales.

UNLOAD ('SELECT sales.*, date.*, total_price, percentile
           FROM sales, date,
                (SELECT eventid, total_price, ntile(1000) over(order by total_price desc) / 10.0 as percentile
                   FROM (SELECT eventid, sum(pricepaid) total_price
                           FROM sales
                       GROUP BY eventid)) as percentile_events
          WHERE sales.dateid = date.dateid
            AND percentile_events.eventid = sales.eventid')
TO 's3://MY-BUCKET/DataLake/SalesPartitioned/'
FORMAT AS PARQUET
PARTITION BY (year, caldate)
CREDENTIALS 'aws_iam_role=arn:aws:iam::123412341234:role/myRedshiftRole';

This time, the output of the query is stored in multiple partitions. For example, here’s the content of a folder for a specific year and date:

$ aws s3 ls s3://MY-BUCKET/DataLake/SalesPartitioned/year=2008/caldate=2008-07-20/
2019-11-25 14:36:17 11940 0000_part_00.parquet
2019-11-25 14:36:17 11052 0001_part_00.parquet
2019-11-25 14:36:17 11138 0002_part_00.parquet
2019-11-25 14:36:18 12582 0003_part_00.parquet
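Redshift creates these folders for you. The sketch below only illustrates the `column=value` folder naming visible in the listing above and why it enables partition pruning: a reader can select the folders for one year from their paths alone, without opening any files. The function names are hypothetical.

```python
from datetime import date
from typing import Iterable, List


def partition_prefix(base: str, year: int, caldate: date) -> str:
    """Folder for one (year, caldate) partition, using column=value naming."""
    return f"{base}year={year}/caldate={caldate.isoformat()}/"


def prune(prefixes: Iterable[str], year: int) -> List[str]:
    """Keep only the folders for one year; the others never need to be read."""
    wanted = f"year={year}"
    # Compare whole path segments so "year=2008" cannot match "year=20081".
    return [p for p in prefixes if wanted in p.split("/")]
```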

Optionally, I can use AWS Glue to set up a Crawler that (on demand or on a schedule) looks for data in my S3 bucket to update the Glue Data Catalog. When the Data Catalog is updated, I can easily query the data using Redshift Spectrum, Athena, or EMR.

The sales data is now ready to be processed together with the unstructured and semi-structured (JSON, XML, Parquet) data in my data lake. For example, I can now use Apache Spark with EMR, or any SageMaker built-in algorithm, to access the data and get new insights.

Using Redshift Federated Query
You can now also access data in RDS and Aurora PostgreSQL stores directly from your Redshift data warehouse. In this way, you can access data as soon as it is available. Straight from Redshift, you can now perform queries processing data in your data warehouse, transactional databases, and data lake, without requiring ETL jobs to transfer data to the data warehouse.

Redshift leverages its advanced optimization capabilities to push down and distribute a significant portion of the computation directly into the transactional databases, minimizing the amount of data moving over the network.

Using this syntax, you can add an external schema from an RDS or Aurora PostgreSQL database to a Redshift cluster:

CREATE EXTERNAL SCHEMA IF NOT EXISTS online_system
FROM POSTGRES
DATABASE 'online_sales_db' SCHEMA 'online_system'
URI 'my-hostname' port 5432
IAM_ROLE 'iam-role-arn'
SECRET_ARN 'ssm-secret-arn';

Schema and port are optional here: the schema defaults to public if left unspecified, and the default port for PostgreSQL databases is 5432. Redshift uses AWS Secrets Manager to manage the credentials for connecting to the external databases.

With this command, all tables in the external schema are available and can be used by Redshift for any complex SQL query processing data in the cluster or, using Redshift Spectrum, in your S3 data lake.

Coming back to the sales data example I used before, I can now correlate the trends of my historical data of musical events with real-time sales. In this way, I can understand if an event is performing as expected or not, and calibrate my marketing activities without delays.

For example, after I define the online commerce database as the online_system external schema in my Redshift cluster, I can compare previous sales with what is in the online commerce system with this simple query:

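SELECT s.eventid, s.total_price, o.online_total_price
  FROM (SELECT eventid, sum(pricepaid) total_price
          FROM sales
         GROUP BY eventid) s
  JOIN (SELECT online_eventid, sum(online_pricepaid) online_total_price
          FROM online_system.current_sales
         GROUP BY online_eventid) o
    ON s.eventid = o.online_eventid;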

Redshift doesn’t import the database or schema catalog in its entirety. When a query is run, it localizes the metadata for the Aurora and RDS tables (and views) that are part of the query. This localized metadata is then used for query compilation and plan generation.

Available Now
Amazon Redshift data lake export is a new tool to improve your data processing pipeline and is supported with Redshift release version 1.0.10480 or later. Refer to the AWS Region Table for Redshift availability, and check the version of your clusters.

The new federation capability in Amazon Redshift is released as a public preview and allows you to bring together data stored in Redshift, S3, and one or more RDS and Aurora PostgreSQL databases. When creating a cluster in the Amazon Redshift management console, you can pick one of three maintenance tracks: Current, Trailing, or Preview. To participate in the Federated Query public preview, choose preview_features within the Preview track.

These features simplify data processing and analytics, giving you more tools to react quickly, and a single point of view for your data. Let me know what you are going to use them for!

Danilo

New for Amazon Aurora – Use Machine Learning Directly From Your Databases

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-amazon-aurora-use-machine-learning-directly-from-your-databases/

Machine Learning allows you to get better insights from your data. But where is most of the structured data stored? In databases! Today, in order to use machine learning with data in a relational database, you need to develop a custom application to read the data from the database and then apply the machine learning model. Developing this application requires a mix of skills to be able to interact with the database and use machine learning. This is a new application, and now you have to manage its performance, availability, and security.

Can we make it easier to apply machine learning to data in a relational database? Even for existing applications?

Starting today, Amazon Aurora is natively integrated with two AWS machine learning services:

  • Amazon SageMaker, a service providing you with the ability to build, train, and deploy custom machine learning models quickly.
  • Amazon Comprehend, a natural language processing (NLP) service that uses machine learning to find insights in text.

Using this new functionality, you can use a SQL function in your queries to apply a machine learning model to the data in your relational database. For example, you can detect the sentiment of a user comment using Comprehend, or apply a custom machine learning model built with SageMaker to estimate the risk of “churn” for your customers. Churn is a word mixing “change” and “turn” and is used to describe customers that stop using your services.

You can store the output of a large query including the additional information from machine learning services in a new table, or use this feature interactively in your application by just changing the SQL code run by the clients, with no machine learning experience required.

Let’s see a couple of examples of what you can do from an Aurora database, first by using Comprehend, then SageMaker.

Configuring Database Permissions
The first step is to give the database permissions to access the services you want to use: Comprehend, SageMaker, or both. In the RDS console, I create a new Aurora MySQL 5.7 database. When it is available, in the Connectivity & security tab of the regional endpoint, I look for the Manage IAM roles section.

There I connect Comprehend and SageMaker to this database cluster. For SageMaker, I need to provide the Amazon Resource Name (ARN) of the endpoint of a deployed machine learning model. If you want to use multiple endpoints, you need to repeat this step. The console takes care of creating the service roles for the Aurora database to access those services in order for the new machine learning integration to work.

Using Comprehend from Amazon Aurora
I connect to the database using a MySQL client. To run my tests, I create a table storing comments for a blogging platform and insert a few sample records:

CREATE TABLE IF NOT EXISTS comments (
       comment_id INT AUTO_INCREMENT PRIMARY KEY,
       comment_text VARCHAR(255) NOT NULL
);

INSERT INTO comments (comment_text)
VALUES ("This is very useful, thank you for writing it!");
INSERT INTO comments (comment_text)
VALUES ("Awesome, I was waiting for this feature.");
INSERT INTO comments (comment_text)
VALUES ("An interesting write up, please add more details.");
INSERT INTO comments (comment_text)
VALUES ("I don’t like how this was implemented.");

To detect the sentiment of the comments in my table, I can use the aws_comprehend_detect_sentiment and aws_comprehend_detect_sentiment_confidence SQL functions:

SELECT comment_text,
       aws_comprehend_detect_sentiment(comment_text, 'en') AS sentiment,
       aws_comprehend_detect_sentiment_confidence(comment_text, 'en') AS confidence
  FROM comments;

The aws_comprehend_detect_sentiment function returns the most probable sentiment for the input text: POSITIVE, NEGATIVE, or NEUTRAL. The aws_comprehend_detect_sentiment_confidence function returns the confidence of the sentiment detection, between 0 (not confident at all) and 1 (fully confident).

Using SageMaker Endpoints from Amazon Aurora
Similarly to what I did with Comprehend, I can access a SageMaker endpoint to enrich the information stored in my database. To see a practical use case, let’s implement the customer churn example mentioned at the beginning of this post.

Mobile phone operators have historical records on which customers ultimately churned and which continued using the service. We can use this historical information to build a machine learning model. As input for the model, we look at the current subscription plan, how much the customer talks on the phone at different times of day, and how often the customer has called customer service.

Here’s the structure of my customer table:

SHOW COLUMNS FROM customers;

To be able to identify customers at risk of churn, I train a model following this sample SageMaker notebook using the XGBoost algorithm. When the model has been created, it’s deployed to a hosted endpoint.

When the SageMaker endpoint is in service, I go back to the Manage IAM roles section of the console to give the Aurora database permissions to access the endpoint ARN.

Now I create a new will_churn SQL function that passes the parameters required by the model to the endpoint:

CREATE FUNCTION will_churn (
       state varchar(2048), acc_length bigint(20),
       area_code bigint(20), int_plan varchar(2048),
       vmail_plan varchar(2048), vmail_msg bigint(20),
       day_mins double, day_calls bigint(20),
       eve_mins double, eve_calls bigint(20),
       night_mins double, night_calls bigint(20),
       int_mins double, int_calls bigint(20),
       cust_service_calls bigint(20))
RETURNS varchar(2048) CHARSET latin1
       alias aws_sagemaker_invoke_endpoint
       endpoint name 'estimate_customer_churn_endpoint_version_123';

As you can see, the model looks at the customer’s phone subscription details and service usage patterns to identify the risk of churn. Using the will_churn SQL function, I run a query over my customers table to flag customers based on my machine learning model. To store the result of the query, I create a new customers_churn table:

CREATE TABLE customers_churn AS
SELECT *, will_churn(state, acc_length, area_code, int_plan,
       vmail_plan, vmail_msg, day_mins, day_calls,
       eve_mins, eve_calls, night_mins, night_calls,
       int_mins, int_calls, cust_service_calls) will_churn
  FROM customers;

Let’s see a few records from the customers_churn table:

SELECT * FROM customers_churn LIMIT 7;

I am lucky the first 7 customers are apparently not going to churn. But what happens overall? Since I stored the results of the will_churn function, I can run a SELECT GROUP BY statement on the customers_churn table.

SELECT will_churn, COUNT(*) FROM customers_churn GROUP BY will_churn;

Starting from there, I can dive deep to understand what brings my customers to churn.

If I create a new version of my machine learning model, with a new endpoint ARN, I can recreate the will_churn function without changing my SQL statements.

Available Now
The new machine learning integration is available today for Aurora MySQL 5.7, with the SageMaker integration generally available and the Comprehend integration in preview. You can learn more in the documentation. We are working on other engines and versions: Aurora MySQL 5.6 and Aurora PostgreSQL 10 and 11 are coming soon.

The Aurora machine learning integration is available in all regions in which the underlying services are available. For example, if both Aurora MySQL 5.7 and SageMaker are available in a region, then you can use the integration for SageMaker. For a complete list of services availability, please see the AWS Regional Table.

There’s no additional cost for using the integration; you just pay for the underlying services at your normal rates. Pay attention to the size of your queries when using Comprehend. For example, if you do sentiment analysis on user feedback in your customer service web page to contact those who made particularly positive or negative comments, and people are making 10,000 comments a day, you’d pay $3/day. To optimize your costs, remember to store results.
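The $3/day figure can be reproduced with a small estimate. This sketch assumes Comprehend pricing of $0.0001 per 100-character unit with a 3-unit minimum per request (check the current pricing page), one request per comment, and comments of at most 300 characters, so each one is billed at the minimum. `Decimal` avoids floating-point rounding in the result.

```python
import math
from decimal import Decimal
from typing import Iterable

# Assumed pricing: USD 0.0001 per 100-character unit, 3-unit minimum per request.
PRICE_PER_UNIT = Decimal("0.0001")
MIN_UNITS = 3


def sentiment_cost(comment_lengths: Iterable[int]) -> Decimal:
    """Estimate the Comprehend sentiment cost for comments of the given lengths."""
    units = sum(max(MIN_UNITS, math.ceil(chars / 100)) for chars in comment_lengths)
    return units * PRICE_PER_UNIT
```

With 10,000 comments of 200 characters each, every comment is billed at 3 units, for 30,000 units × $0.0001 = $3.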

It’s never been easier to apply machine learning models to data stored in your relational databases. Let me know what you are going to build with this!

Danilo

ICYMI: Serverless Q3 2019

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-q3-2019/

This post is courtesy of Julian Wood, Senior Developer Advocate – AWS Serverless

Welcome to the seventh edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

ICYMI calendar

Launches/New products

Amazon EventBridge technically launched this quarter, but we were so excited to let you know that we squeezed it into the Q2 2019 update. If you missed it, EventBridge is the serverless event bus that connects application data from your own apps, SaaS applications, and AWS services. This allows you to create powerful event-driven serverless applications using a variety of event sources.
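Rules on an event bus select events with event patterns. As a simplified sketch covering only exact-value matching (real patterns support more operators and also match against arrays in the event), a pattern matches when every field it names is present in the event with one of the listed values, recursing into nested objects. The EC2 state-change example in the test is illustrative.

```python
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Exact-value subset of EventBridge event pattern matching."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # A nested pattern object must match the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # A list in the pattern means "any one of these values".
            return False
    # Fields in the event that the pattern does not mention are ignored.
    return True
```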

The AWS Bahrain Region has opened. Its official name is Middle East (Bahrain) and its API name is me-south-1. The AWS Cloud now spans 22 geographic Regions with 69 Availability Zones around the world.

AWS Lambda

In September we announced dramatic improvements in cold starts for Lambda functions inside a VPC. With this announcement, you see faster function startup performance and more efficient usage of elastic network interfaces, drastically reducing VPC cold starts.

VPC to VPC NAT

These improvements are rolling out to all existing and new VPC functions at no additional cost. The rollout is ongoing, and you can track its status from the announcement post.

AWS Lambda now supports a custom batch window for Kinesis and DynamoDB event sources, which helps you fine-tune Lambda invocations for cost optimization.
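The batch window sets how long Lambda may wait to gather records before invoking your function (in the Lambda API this is the `MaximumBatchingWindowInSeconds` setting on the event source mapping). The simplified sketch below groups timestamped records the same way: a batch is sent when it reaches the batch size or when the window has elapsed. Unlike the real service, it only notices an expired window when the next record arrives, and the record data is hypothetical.

```python
from typing import Iterable, List, Tuple

Record = Tuple[float, str]  # (arrival time in seconds, payload)


def batch_records(records: Iterable[Record], batch_size: int, window_s: float) -> List[List[str]]:
    """Group records into invocations, flushing on batch size or window expiry."""
    batches: List[List[str]] = []
    current: List[str] = []
    window_start = 0.0
    for arrived, payload in records:
        if current and arrived - window_start >= window_s:
            # The window expired before this record arrived: invoke with what we have.
            batches.append(current)
            current = []
        if not current:
            # The first record of a new batch opens a new window.
            window_start = arrived
        current.append(payload)
        if len(current) == batch_size:
            # A full batch is invoked immediately, without waiting for the window.
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches
```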

You can now deploy Amazon Machine Images (AMIs) and Lambda functions together from the AWS Marketplace using AWS CloudFormation with just a few clicks.

AWS IoT Events actions now support AWS Lambda as a target. Previously you could only define actions to publish messages to SNS and MQTT. Now you can define actions to invoke AWS Lambda functions and even more targets, such as Amazon Simple Queue Service and Amazon Kinesis Data Firehose, and republish messages to IoT Events.

The AWS Lambda Console now shows recent invocations using CloudWatch Logs Insights. From the monitoring tab in the console, you can view duration, billing, and memory statistics for the 10 most recent invocations.

AWS Step Functions

AWS Step Functions example

AWS Step Functions now supports what is probably its most requested feature, dynamic parallelism, which allows steps within a workflow to run in parallel using a new Map state type.
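In Amazon States Language, a Map state runs the same sub-workflow (its `Iterator`) once per element of an input array. As a hedged sketch, the Python below builds such a definition; the Lambda ARN and the `$.items` input path are hypothetical, and a `MaxConcurrency` of 0 means no limit on parallel iterations.

```python
import json


def build_map_state_machine(worker_lambda_arn: str, max_concurrency: int = 0) -> str:
    """Return an ASL definition that runs one Lambda task per item in $.items."""
    definition = {
        "StartAt": "ProcessAll",
        "States": {
            "ProcessAll": {
                "Type": "Map",
                # The array in the execution input to iterate over.
                "ItemsPath": "$.items",
                "MaxConcurrency": max_concurrency,
                # The sub-workflow executed for each array element.
                "Iterator": {
                    "StartAt": "ProcessOne",
                    "States": {
                        "ProcessOne": {
                            "Type": "Task",
                            "Resource": worker_lambda_arn,
                            "End": True,
                        }
                    },
                },
                "End": True,
            }
        },
    }
    return json.dumps(definition, indent=2)
```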

One way to use the new Map state is for fan-out or scatter-gather messaging patterns in your workflows:

  • Fan-out is applied when delivering a message to multiple destinations, and can be useful in workflows such as order processing or batch data processing. For example, you can retrieve arrays of messages from Amazon SQS and Map sends each message to a separate AWS Lambda function.
  • Scatter-gather broadcasts a single message to multiple destinations (scatter), and then aggregates the responses back for the next steps (gather). This is useful in file processing and test automation. For example, you can transcode ten 500-MB media files in parallel, and then join to create a 5-GB file.
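Outside Step Functions, the scatter-gather pattern from the second bullet can be sketched locally in Python: run a worker on each item in parallel, then combine the results in input order. Here threads stand in for the parallel Lambda invocations a Map state would make, and `worker` and `combine` are hypothetical placeholders (for example, transcoding a segment and concatenating the outputs).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")
G = TypeVar("G")


def scatter_gather(
    items: Iterable[T],
    worker: Callable[[T], R],
    combine: Callable[[List[R]], G],
    max_workers: int = 4,
) -> G:
    """Run worker on every item in parallel (scatter), then merge results (gather)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even if workers finish out of order.
        results = list(pool.map(worker, items))
    return combine(results)
```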

Another important update: AWS Step Functions now supports nested workflows, which allow you to orchestrate more complex processes by composing modular, reusable workflows.

AWS Amplify

A new Predictions category has been added to the Amplify Framework so you can quickly add machine learning capabilities to your web and mobile apps.

Amplify framework

With a few lines of code, you can add and configure AI/ML services so your app can:

  • Identify text, entities, and labels in images using Amazon Rekognition, or identify text in scanned documents to get the contents of fields in forms and information stored in tables using Amazon Textract.
  • Convert text into a different language using Amazon Translate, text to speech using Amazon Polly, and speech to text using Amazon Transcribe.
  • Interpret text to find the dominant language, the entities, the key phrases, the sentiment, or the syntax of unstructured text using Amazon Comprehend.

AWS Amplify CLI (part of the open source Amplify Framework) has added local mocking and testing. This allows you to mock some of the most common cloud services and test your application 100% locally.

For this first release, the Amplify CLI can mock locally:

amplify mock

AWS CloudFormation

The CloudFormation team has released the much-anticipated CloudFormation Coverage Roadmap.

Styled after the popular AWS Containers Roadmap, the CloudFormation Coverage Roadmap provides transparency about our priorities, and the opportunity to provide your input.

The roadmap contains four columns:

  • Shipped – Available for use in production in all public AWS Regions.
  • Coming Soon – Generally a few months out.
  • We’re Working on It – Work in progress, but further out.
  • Researching – We’re thinking about the right way to implement the coverage.

AWS CloudFormation roadmap

Amazon DynamoDB

NoSQL Workbench for Amazon DynamoDB has been released in preview. This is a free, client-side application available for Windows and macOS. It helps you more easily design and visualize your data model, run queries on your data, and generate the code for your application.

Amazon Aurora

Amazon Aurora Serverless is a dynamically scaling version of Amazon Aurora. It automatically starts up, shuts down, and scales up or down, based on your application workload.

Aurora Serverless has had a MySQL-compatible edition for a while. Now we’re excited to bring more serverless joy to databases: the PostgreSQL-compatible edition is now generally available.

We also have a useful post on Reducing Aurora PostgreSQL storage I/O costs.

AWS Serverless Application Repository

Serverless Developer Advocate James Beswick has added some useful apps to the AWS Serverless Application Repository (SAR):

  • S3 Auto Translator automatically translates uploaded objects into other languages specified by the user, using Amazon Translate.
  • Serverless S3 Uploader allows you to upload JPG files to Amazon S3 buckets from your web applications using presigned URLs.
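Presigned URLs let a browser upload directly to S3 without holding AWS credentials. The minimal sketch below shows the idea with boto3's `generate_presigned_url`. It is not the app's actual code. The helper name, the bucket and key, and the 5-minute expiry are assumptions. The helper takes the S3 client as a parameter so it can be exercised without AWS access.

```python
def presigned_jpeg_upload_url(s3_client, bucket, key, expires_in=300):
    """Return a URL that allows one HTTP PUT of a JPEG object to bucket/key."""
    return s3_client.generate_presigned_url(
        ClientMethod="put_object",
        # The browser must send the same Content-Type header when it uploads.
        Params={"Bucket": bucket, "Key": key, "ContentType": "image/jpeg"},
        ExpiresIn=expires_in,  # seconds until the signature stops working
        HttpMethod="PUT",
    )

# Usage (requires boto3 and AWS credentials):
#   url = presigned_jpeg_upload_url(boto3.client("s3"), "my-upload-bucket", "uploads/photo.jpg")
```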

Serverless posts

July

August

September

Tech talks

We hold several AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page.

Here are the ones from Q3:

Twitch

July

August

September

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

AWS re:Invent


December 2 – 6 in Las Vegas, Nevada is peak AWS learning time with AWS re:Invent 2019. Join tens of thousands of AWS customers to learn, share ideas, and see exciting keynote announcements.

Be sure to take a look at the growing catalog of serverless sessions this year. Make sure to book time for Builders Sessions, Chalk Talks, and Workshops, as these sessions will fill up quickly. The schedule is updated regularly, so if your session is currently fully booked, a repeat may be scheduled.

Register for AWS re:Invent now!

What did we do at AWS re:Invent 2018? Check out our recap here: AWS re:Invent 2018 Recap at the San Francisco Loft.

Our friends at IOPipe have written 5 tips for avoiding serverless FOMO at this year’s re:Invent.

AWS Serverless Heroes

We are excited to welcome some new AWS Serverless Heroes to help grow the serverless community. We look forward to some amazing content to help you with your serverless journey.

Still looking for more?

The Serverless landing page has much more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

 

ICYMI: Serverless Q2 2019

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/icymi-serverless-q2-2019/

This post is courtesy of Moheeb Zara, Senior Developer Advocate – AWS Serverless

Welcome to the sixth edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

April - June 2019

Amazon EventBridge

Before we dive into all that happened in Q2, we’re excited about this quarter’s launch of Amazon EventBridge. It is a serverless event bus that connects application data from your own apps, SaaS applications, and AWS services. This allows you to create powerful event-driven serverless applications using a variety of event sources.

Our very own AWS Solutions Architect, Mike Deck, sat down with AWS Serverless Hero Jeremy Daly and recorded a podcast on Amazon EventBridge. It’s a worthy listen if you’re interested in exploring all the features offered by this launch.

Now, back to Q2, here’s what’s new.

AWS Lambda

Lambda Monitoring

Amazon CloudWatch Logs Insights now allows you to see statistics from recent invocations of your Lambda functions in the Lambda monitoring tab.

Additionally, as of June, you can monitor the Lambda@Edge functions associated with your Amazon CloudFront distributions directly from your Amazon CloudFront console. This includes a revamped monitoring dashboard for CloudFront distributions and Lambda@Edge functions.

AWS Step Functions

Step Functions

AWS Step Functions now supports workflow execution events, which help in building and monitoring event-driven serverless workflows. Execution event notifications are delivered automatically to CloudWatch Events/Amazon EventBridge when an execution starts or completes. This allows services such as AWS Lambda, Amazon SNS, Amazon Kinesis, or AWS Step Functions to respond to these events.
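Reacting to these events starts with an event pattern. The minimal sketch below matches unsuccessful executions of a single state machine. It assumes the `aws.states` source and the "Step Functions Execution Status Change" detail type used for these notifications. The helper name and rule name are hypothetical.

```python
import json


def execution_failure_pattern(state_machine_arn):
    """Event pattern for failed, timed-out, or aborted executions of one state machine."""
    return {
        "source": ["aws.states"],
        "detail-type": ["Step Functions Execution Status Change"],
        "detail": {
            "status": ["FAILED", "TIMED_OUT", "ABORTED"],
            "stateMachineArn": [state_machine_arn],
        },
    }

# Usage (requires boto3 and AWS credentials); targets such as a Lambda
# function are then attached to the rule with put_targets:
#   events = boto3.client("events")
#   events.put_rule(Name="sfn-failures",
#                   EventPattern=json.dumps(execution_failure_pattern(arn)))
```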

Additionally, you can use callback patterns to automate workflows for applications with human activities and custom integrations with third-party services. You can create callback patterns in minutes with less code to write and maintain. They run without servers or infrastructure to manage, and they scale reliably.
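A callback pattern has two halves. A Task state sends out a task token and pauses. Later, an external worker returns that token. The minimal sketch below assumes the SQS `sendMessage.waitForTaskToken` integration for the first half and boto3's `send_task_success` / `send_task_failure` for the second. The queue URL, helper names, and output shape are hypothetical.

```python
import json


def approval_state(queue_url):
    """Task state that sends the task token to SQS and pauses until a callback."""
    return {
        "Type": "Task",
        # .waitForTaskToken pauses the execution until the token is returned.
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
            "QueueUrl": queue_url,
            "MessageBody": {
                "Input.$": "$",
                # The context object exposes the token the worker must send back.
                "TaskToken.$": "$$.Task.Token",
            },
        },
        "End": True,
    }


def complete_task(sfn_client, task_token, approved):
    """Called by the human or third-party side once it has made a decision."""
    if approved:
        sfn_client.send_task_success(
            taskToken=task_token, output=json.dumps({"approved": True})
        )
    else:
        sfn_client.send_task_failure(
            taskToken=task_token, error="Rejected", cause="Rejected by the approver"
        )
```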

Amazon API Gateway

API Gateway Tag Based Control

Amazon API Gateway now offers tag-based access control for WebSocket APIs using AWS Identity and Access Management (IAM) policies. This lets you categorize API Gateway resources for WebSocket APIs by purpose, owner, or other criteria. With tag-based access control for WebSocket resources, you can grant permissions at various levels by writing policies based on tags. For example, you can grant full access to admins while limiting access for developers.

You can now enforce a minimum Transport Layer Security (TLS) version and cipher suites through a security policy for connecting to your Amazon API Gateway custom domain.

In addition, Amazon API Gateway now allows you to define VPC Endpoint policies, enabling you to specify which Private APIs a VPC Endpoint can connect to. This enables granular security control using VPC Endpoint policies.

AWS Amplify

Amplify CLI (part of the open source Amplify Framework) now supports adding and configuring AWS Lambda triggers for events when Amazon Cognito, Amazon Simple Storage Service (Amazon S3), or Amazon DynamoDB is the event source. This means you can set up custom authentication flows for mobile and web applications through the Amplify CLI, with an Amazon Cognito User Pool as the authentication provider.

Amplify Console

Amplify Console, a Git-based workflow for continuous deployment and hosting of full-stack serverless web apps, launched several updates to its build service, including SAM CLI and custom container support.

Amazon Kinesis

Amazon Kinesis Data Firehose can now utilize AWS PrivateLink to securely ingest data. AWS PrivateLink provides private connectivity between VPCs, AWS services, and on-premises applications, securely over the Amazon network. When AWS PrivateLink is used with Amazon Kinesis Data Firehose, all traffic to a Kinesis Data Firehose from a VPC flows over a private connection.

You can now assign AWS resource tags to applications in Amazon Kinesis Data Analytics. These key/value tags can be used to organize and identify resources, create cost allocation reports, and control access to resources within Amazon Kinesis Data Analytics.

Amazon Kinesis Data Firehose is now available in the AWS GovCloud (US-East), Europe (Stockholm), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and EU (London) regions.

For a complete list of where Amazon Kinesis Data Analytics is available, please see the AWS Region Table.

AWS Cloud9

Cloud9 Quick Starts

Amazon Web Services (AWS) Cloud9 integrated development environment (IDE) now has a Quick Start which deploys in the AWS cloud in about 30 minutes. This enables organizations to provide developers a powerful cloud-based IDE that can edit, run, and debug code in the browser and allow easy sharing and collaboration.

AWS Cloud9 is also now available in the EU (Frankfurt) and Asia Pacific (Tokyo) regions. For a current list of supported regions, see AWS Regions and Endpoints in the AWS documentation.

Amazon DynamoDB

You can now tag Amazon DynamoDB tables when you create them. Tags are labels you can attach to AWS resources to make them easier to manage, search, and filter.  Tagging support has also been extended to the AWS GovCloud (US) Regions.
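Tagging at creation time means passing `Tags` to the same `CreateTable` call. The minimal boto3 sketch below assumes an on-demand table with a hypothetical single-attribute key named `pk`. The table name and tags are illustrative.

```python
def create_tagged_table(dynamodb_client, table_name, tags):
    """Create an on-demand table whose tags are applied by the same CreateTable call."""
    return dynamodb_client.create_table(
        TableName=table_name,
        # Hypothetical single-attribute key; real tables define their own schema.
        AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
        KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
        BillingMode="PAY_PER_REQUEST",
        # DynamoDB expects tags as a list of Key/Value pairs.
        Tags=[{"Key": key, "Value": value} for key, value in sorted(tags.items())],
    )

# Usage (requires boto3 and AWS credentials):
#   create_tagged_table(boto3.client("dynamodb"), "orders", {"team": "payments"})
```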

DynamoDBMapper now supports Amazon DynamoDB transactional API calls. This support is included within the AWS SDK for Java. These transactional APIs provide developers atomic, consistent, isolated, and durable (ACID) operations to help ensure data correctness.

Amazon DynamoDB now applies adaptive capacity in real time in response to changing application traffic patterns, which helps you maintain uninterrupted performance indefinitely, even for imbalanced workloads.

AWS Training and Certification has launched Amazon DynamoDB: Building NoSQL Database–Driven Applications, a new self-paced, digital course available exclusively on edX.

Amazon Aurora

Amazon Aurora Serverless MySQL 5.6 can now be accessed using the built-in Data API. This lets you reach Aurora Serverless from web services-based applications, including AWS Lambda, AWS AppSync, and AWS Cloud9. For more, check out this post.

Sharing snapshots of Aurora Serverless DB clusters with other AWS accounts or publicly is now possible. We are also giving you the ability to copy Aurora Serverless DB cluster snapshots across AWS regions.
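The minimal boto3 sketch below covers both operations. Sharing adds account IDs to the snapshot's `restore` attribute. A cross-Region copy runs against a client in the destination Region. The helper names, identifiers, and Regions are hypothetical.

```python
def share_cluster_snapshot(rds_client, snapshot_id, account_ids):
    """Let other AWS accounts restore a manual DB cluster snapshot.

    Passing ["all"] instead of account IDs makes an unencrypted snapshot public.
    """
    return rds_client.modify_db_cluster_snapshot_attribute(
        DBClusterSnapshotIdentifier=snapshot_id,
        AttributeName="restore",
        ValuesToAdd=list(account_ids),
    )


def copy_cluster_snapshot(target_rds_client, source_snapshot_arn, target_snapshot_id, source_region):
    """Copy a snapshot into the Region that target_rds_client was created for."""
    return target_rds_client.copy_db_cluster_snapshot(
        SourceDBClusterSnapshotIdentifier=source_snapshot_arn,  # full ARN for cross-Region copies
        TargetDBClusterSnapshotIdentifier=target_snapshot_id,
        SourceRegion=source_region,
    )
```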

You can now set the minimum capacity of your Aurora Serverless DB clusters to 1 Aurora Capacity Unit (ACU). With Aurora Serverless, you specify the minimum and maximum ACUs for your Aurora Serverless DB cluster instead of provisioning and managing database instances. Each ACU is a combination of processing and memory capacity. By setting the minimum capacity to 1 ACU, you can keep your Aurora Serverless DB cluster running at a lower cost.
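On an existing cluster, the capacity range is part of the cluster's `ScalingConfiguration`. The minimal boto3 sketch below lowers the floor to 1 ACU. The helper name, cluster identifier, and the default maximum of 4 ACUs are assumptions.

```python
def set_serverless_capacity(rds_client, cluster_id, min_acu=1, max_acu=4):
    """Change the ACU range of an existing Aurora Serverless cluster."""
    if min_acu > max_acu:
        raise ValueError("min_acu must not exceed max_acu")
    return rds_client.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        # A 1-ACU floor keeps a running but lightly used cluster as cheap as possible.
        ScalingConfiguration={"MinCapacity": min_acu, "MaxCapacity": max_acu},
        ApplyImmediately=True,
    )
```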

AWS Serverless Application Repository

The AWS Serverless Application Repository is now available in 17 regions with the addition of the AWS GovCloud (US-West) region.

Region support includes Asia Pacific (Mumbai, Singapore, Sydney, Tokyo), Canada (Central), EU (Frankfurt, Ireland, London, Paris, Stockholm), South America (São Paulo), US West (N. California, Oregon), and US East (N. Virginia, Ohio).

Amazon Cognito

Amazon Cognito has launched a new API – AdminSetUserPassword – for the Cognito User Pool service that provides a way for administrators to set temporary or permanent passwords for their end users. This functionality is available for end users even when their verified phone or email are unavailable.
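From boto3, the new API is `admin_set_user_password` on the `cognito-idp` client. The minimal sketch below wraps it. The helper name is hypothetical, and the pool ID and user name would come from your own user pool.

```python
def admin_reset_password(cognito_client, user_pool_id, username, password, permanent=False):
    """Set a user's password on their behalf.

    With permanent=False the password is temporary, and the user must choose a
    new one at next sign-in. With permanent=True it is used as-is.
    """
    return cognito_client.admin_set_user_password(
        UserPoolId=user_pool_id,
        Username=username,
        Password=password,
        Permanent=permanent,
    )
```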

Serverless Posts

April

May

June

Events

Events this quarter

Senior Developer Advocates for AWS Serverless spoke at several conferences this quarter. Here are some recordings worth watching!

Tech Talks

We hold several AWS Online Tech Talks covering serverless topics throughout the year, so look out for them in the Serverless section of the AWS Online Tech Talks page. Here are the ones from Q2.

Twitch

Twitch Series

In April, we started a 13-week deep dive into building APIs on AWS as part of our Twitch Build On series. The Building Happy Little APIs series covers the common and not-so-common use cases for APIs on AWS and the features available to customers as they look to build secure, scalable, efficient, and flexible APIs.

There are also a number of other helpful video series covering Serverless available on the AWS Twitch Channel.

Build with Serverless on Twitch

Serverless expert and AWS Specialist Solutions Architect Heitor Lessa has been hosting a weekly Twitch series since April. Join him and others as they build an end-to-end airline booking solution using serverless. The final episode airs on Wednesday, August 7th at 8:00am PT.

Here’s a recap of the last quarter:

AWS re:Invent

AWS re:Invent 2019

AWS re:Invent 2019 is around the corner! From December 2 – 6 in Las Vegas, Nevada, join tens of thousands of AWS customers to learn, share ideas, and see exciting keynote announcements. Be sure to take a look at the growing catalog of serverless sessions this year.

Register for AWS re:Invent now!

What did we do at AWS re:Invent 2018? Check out our recap here: AWS re:Invent 2018 Recap at the San Francisco Loft

AWS Serverless Heroes

We urge you to explore the efforts of our AWS Serverless Heroes Community. This is a worldwide network of AWS Serverless experts with a diverse background of experience. For example, check out this post from last month where Marcia Villalba demonstrates how to set up unit tests for serverless applications.

Still looking for more?

The Serverless landing page has lots of information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

Amazon Aurora PostgreSQL Serverless – Now Generally Available

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-aurora-postgresql-serverless-now-generally-available/

The database is usually the most critical part of a software architecture, and managing databases, especially relational ones, has never been easy. For this reason, we created Amazon Aurora Serverless, an auto-scaling version of Amazon Aurora that automatically starts up, shuts down and scales up or down based on your application workload.

The MySQL-compatible edition of Aurora Serverless has been available for some time now. I am pleased to announce that the PostgreSQL-compatible edition of Aurora Serverless is generally available today.

Before moving on with details, I take the opportunity to congratulate the Amazon Aurora development team that has just won the 2019 Association for Computing Machinery’s (ACM) Special Interest Group on Management of Data (SIGMOD) Systems Award!

When you create a database with Aurora Serverless, you set the minimum and maximum capacity. Your client applications transparently connect to a proxy fleet that routes the workload to a pool of resources that are automatically scaled. Scaling is very fast because resources are “warm” and ready to be added to serve your requests.

 

There is no change with Aurora Serverless on how storage is managed by Aurora. The storage layer is independent from the compute resources used by the database. There is no need to provision storage in advance. The minimum storage is 10GB and, based on the database usage, the Amazon Aurora storage will automatically grow, up to 64 TB, in 10GB increments with no impact to database performance.

Creating an Aurora Serverless PostgreSQL Database
Let’s start an Aurora Serverless PostgreSQL database and see the automatic scalability at work. From the Amazon RDS console, I choose to create a database using Amazon Aurora as the engine. Currently, Aurora Serverless is compatible with PostgreSQL version 10.5. When I select that version, the serverless option becomes available.

I give the new DB cluster an identifier, choose my master username, and let Amazon RDS generate a password for me. I will be able to retrieve my credentials during database creation.

I can now select the minimum and maximum capacity for my database, in terms of Aurora Capacity Units (ACUs). In the additional scaling configuration, I choose to pause compute capacity after 5 minutes of inactivity. Based on my settings, Aurora Serverless automatically creates scaling rules with thresholds for CPU utilization, connections, and available memory.
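The same console choices can be expressed with boto3's `create_db_cluster`. This is a minimal sketch under several assumptions. The engine version string "10.5" mirrors the walkthrough and should be checked against what is currently offered. The default 2 to 8 ACU range is illustrative. Unlike the console, the API call needs an explicit master password.

```python
def create_serverless_postgres(rds_client, cluster_id, username, password,
                               min_acu=2, max_acu=8, pause_after_seconds=300):
    """Create an Aurora Serverless PostgreSQL cluster that pauses when idle."""
    return rds_client.create_db_cluster(
        DBClusterIdentifier=cluster_id,
        Engine="aurora-postgresql",
        EngineVersion="10.5",  # version named in the walkthrough; confirm it is still offered
        EngineMode="serverless",
        MasterUsername=username,
        MasterUserPassword=password,
        ScalingConfiguration={
            "MinCapacity": min_acu,
            "MaxCapacity": max_acu,
            "AutoPause": True,                             # pause compute when idle
            "SecondsUntilAutoPause": pause_after_seconds,  # 300 s = 5 minutes
        },
    )
```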

Testing Some Load on the Database
To generate some load on the database I am using sysbench on an EC2 instance. There are a couple of Lua scripts bundled with sysbench that can help generate an online transaction processing (OLTP) workload:

  • The first script, parallel_prepare.lua, generates 100,000 rows per table for 24 tables.
  • The second script, oltp.lua, generates workload against those data using 64 worker threads.

By using those scripts, I start generating load on my database cluster. As you can see from this graph, taken from the RDS console monitoring tab, the serverless database capacity grows and shrinks to follow my requirements. The metric shown on this graph is the number of ACUs used by the database cluster. First it scales up to accommodate the sysbench workload. When I stop the load generator, it scales down and then pauses.

Available Now
Aurora Serverless PostgreSQL is available now in US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo). With Aurora Serverless, you pay on a per-second basis for the database capacity you use when the database is active, plus the usual Aurora storage costs.

For more information on Amazon Aurora, I recommend this great post explaining why and how it was created:

Amazon Aurora ascendant: How we designed a cloud-native relational database

It’s never been so easy to use a relational database in production. I am so excited to see what you are going to use it for!

New – Data API for Amazon Aurora Serverless

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-data-api-for-amazon-aurora-serverless/

If you have ever written code that accesses a relational database, you know the drill. You open a connection, use it to process one or more SQL queries or other statements, and then close the connection. You probably used a client library that was specific to your operating system, programming language, and your database. At some point you realized that creating connections took a lot of clock time and consumed memory on the database engine, and soon after found out that you could (or had to) deal with connection pooling and other tricks. Sound familiar?

The connection-oriented model that I described above is adequate for traditional, long-running programs where the setup time can be amortized over hours or even days. It is not, however, a great fit for serverless functions that are frequently invoked and that run for time intervals that range from milliseconds to minutes. Because there is no long-running server, there’s no place to store a connection identifier for reuse.

Aurora Serverless Data API
In order to resolve this mismatch between serverless applications and relational databases, we are launching a Data API for the MySQL-compatible version of Amazon Aurora Serverless. This API frees you from the complexity and overhead that come along with traditional connection management, and gives you the power to quickly and easily execute SQL statements that access and modify your Amazon Aurora Serverless Database instances.

The Data API is designed to meet the needs of both traditional and serverless apps. It takes care of managing and scaling long-term connections to the database and returns data in JSON form for easy parsing. All traffic runs over secure HTTPS connections. It includes the following functions:

ExecuteStatement – Run a single SQL statement, optionally within a transaction.

BatchExecuteStatement – Run a single SQL statement across an array of data, optionally within a transaction.

BeginTransaction – Begin a transaction, and return a transaction identifier. Transactions are expected to be short (generally 2 to 5 minutes).

CommitTransaction – End a transaction and commit the operations that took place within it.

RollbackTransaction – End a transaction without committing the operations that took place within it.

Each function must run to completion within 1 minute, and can return up to 1 megabyte of data.
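BatchExecuteStatement runs one statement once per entry in `parameterSets`, using named `:placeholders`. The minimal boto3 sketch below assumes hypothetical `first_name` and `last_name` columns alongside `userid` in the `users` table. The helper name is also illustrative. An optional transaction ID ties the batch to a transaction started with BeginTransaction.

```python
def batch_insert_users(data_client, resource_arn, secret_arn, users, transaction_id=None):
    """Insert many rows with one BatchExecuteStatement call."""
    # Hypothetical column names; adjust them to the real table definition.
    sql = ("INSERT INTO users (userid, first_name, last_name) "
           "VALUES (:userid, :first, :last)")
    # One parameter set per row; each value is a typed field such as stringValue.
    parameter_sets = [
        [
            {"name": "userid", "value": {"stringValue": u["userid"]}},
            {"name": "first", "value": {"stringValue": u["first"]}},
            {"name": "last", "value": {"stringValue": u["last"]}},
        ]
        for u in users
    ]
    kwargs = dict(resourceArn=resource_arn, secretArn=secret_arn,
                  database="users", sql=sql, parameterSets=parameter_sets)
    if transaction_id is not None:
        kwargs["transactionId"] = transaction_id
    return data_client.batch_execute_statement(**kwargs)
```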

Using the Data API
I can use the Data API from the Amazon RDS Console, the command line, or by writing code that calls the functions that I described above. I’ll show you all three in this post.

The Data API is really easy to use! The first step is to enable it for the desired Amazon Aurora Serverless database. I open the Amazon RDS Console, find & select the cluster, and click Modify:

Then I scroll down to the Network & Security section, click Data API, and Continue:

On the next page I choose to apply the settings immediately, and click Modify cluster:

Now I need to create a secret to store the credentials that are needed to access my database. I open the Secrets Manager Console and click Store a new secret. I leave Credentials for RDS selected, enter a valid database user name and password, optionally choose a non-default encryption key, and then select my serverless database. Then I click Next:

I name my secret and tag it, and click Next to configure it:

I use the default values on the next page, click Next again, and now I have a brand new secret:

Now I need two ARNs, one for the database and one for the secret. I fetch both from the console, first for the database:

And then for the secret:

The pair of ARNs (database and secret) provides me with access to my database, and I will protect them accordingly!

Using the Data API from the Amazon RDS Console
I can use the Query Editor in the Amazon RDS Console to run queries that call the Data API. I open the console and click Query Editor, and create a connection to the database. I select the cluster, enter my credentials, and pre-select the table of interest. Then I click Connect to database to proceed:

I enter a query and click Run, and view the results within the editor:

Using the Data API from the Command Line
I can exercise the Data API from the command line:

$ aws rds-data execute-statement \
  --secret-arn "arn:aws:secretsmanager:us-east-1:123456789012:secret:aurora-serverless-data-api-sl-admin-2Ir1oL" \
  --resource-arn "arn:aws:rds:us-east-1:123456789012:cluster:aurora-sl-1" \
  --database users \
  --sql "show tables" \
  --output json

I can use jq to pick out the part of the result that is of interest to me:

... | jq .records
[
  {
    "values": [
      {
        "stringValue": "users"
      }
    ]
  }
]

I can query the table and get the results (the SQL statement is "select * from users where userid='jeffbarr'"):

... | jq .records
[
  {
    "values": [
      {
        "stringValue": "jeffbarr"
      },
      {
        "stringValue": "Jeff"
      },
      {
        "stringValue": "Barr"
      }
    ]
  }
]

If I specify --include-result-metadata, the query also returns data that describes the columns of the result (I’ll show only the first one in the interest of frugality):

... | jq .columnMetadata[0]
{
  "type": 12,
  "name": "userid",
  "label": "userid",
  "nullable": 1,
  "isSigned": false,
  "arrayBaseColumnType": 0,
  "scale": 0,
  "schemaName": "",
  "tableName": "users",
  "isCaseSensitive": false,
  "isCurrency": false,
  "isAutoIncrement": false,
  "precision": 15,
  "typeName": "VARCHAR"
}

The Data API also allows me to wrap a series of statements in a transaction, and then either commit or roll back. Here’s how I do that (I’m omitting --secret-arn and --resource-arn for clarity):

$ ID=$(aws rds-data begin-transaction --database users --output json | jq -r .transactionId)
$ echo $ID
ATP6Gz88GYNHdwNKaCt/vGhhKxZs2QWjynHCzGSdRi9yiQRbnrvfwF/oa+iTQnSXdGUoNoC9MxLBwyp2XbO4jBEtczBZ1aVWERTym9v1WVO/ZQvyhWwrThLveCdeXCufy/nauKFJdl79aZ8aDD4pF4nOewB1aLbpsQ==

$ aws rds-data execute-statement --transaction-id $ID --database users --sql "..."
$ ...
$ aws rds-data execute-statement --transaction-id $ID --database users --sql "..."
$ aws rds-data commit-transaction --transaction-id $ID

If I decide not to commit, I invoke rollback-transaction instead.
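In Python, the begin/commit/rollback sequence fits naturally into a context manager. The minimal sketch below uses the boto3 `rds-data` calls `begin_transaction`, `commit_transaction`, and `rollback_transaction`. The helper name is hypothetical, and the ARNs are the same pair described earlier. It commits when the block finishes normally and rolls back if the block raises an exception.

```python
from contextlib import contextmanager


@contextmanager
def data_api_transaction(client, resource_arn, secret_arn, database):
    """Yield a transaction ID; commit on success, roll back on any exception."""
    tx_id = client.begin_transaction(
        resourceArn=resource_arn, secretArn=secret_arn, database=database
    )["transactionId"]
    try:
        yield tx_id
    except Exception:
        client.rollback_transaction(
            resourceArn=resource_arn, secretArn=secret_arn, transactionId=tx_id
        )
        raise
    else:
        client.commit_transaction(
            resourceArn=resource_arn, secretArn=secret_arn, transactionId=tx_id
        )

# Usage (requires boto3 and AWS credentials):
#   with data_api_transaction(client, cluster_arn, secret_arn, "users") as tx:
#       client.execute_statement(resourceArn=cluster_arn, secretArn=secret_arn,
#                                database="users", sql="...", transactionId=tx)
```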

Using the Data API with Python and Boto
Since this is an API, programmatic access is easy. Here’s some very simple Python / Boto code:

import boto3

# Data API client; credentials come from the usual boto3 sources.
client = boto3.client('rds-data')

# ExecuteStatement runs one SQL statement and returns its rows in response['records'].
response = client.execute_statement(
    secretArn   = 'arn:aws:secretsmanager:us-east-1:123456789012:secret:aurora-serverless-data-api-sl-admin-2Ir1oL',
    database    = 'users',
    resourceArn = 'arn:aws:rds:us-east-1:123456789012:cluster:aurora-sl-1',
    sql         = 'select * from users'
)

# Each record is a list of fields, such as {'stringValue': 'jeffbarr'}.
for user in response['records']:
    userid     = user[0]['stringValue']
    first_name = user[1]['stringValue']
    last_name  = user[2]['stringValue']
    print(userid + ' ' + first_name + ' ' + last_name)

And the output:

$ python data_api.py
jeffbarr Jeff Barr
carmenbarr Carmen Barr

Genuine, production-quality code would reference the table columns symbolically using the metadata that is returned as part of the response.
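Here is a minimal sketch of that approach. It assumes the statement was run with `includeResultMetadata=True`, so the response carries `columnMetadata` next to `records`. The helper name is hypothetical. Each field is treated as a one-key dict, such as `{"stringValue": ...}` or `{"isNull": True}`.

```python
def records_to_dicts(response):
    """Map each Data API record to a dict keyed by column label."""
    names = [col.get("label") or col.get("name") for col in response["columnMetadata"]]
    rows = []
    for record in response["records"]:
        row = {}
        for name, field in zip(names, record):
            if field.get("isNull"):
                row[name] = None  # SQL NULL
            else:
                (value,) = field.values()  # unwrap stringValue, longValue, ...
                row[name] = value
        rows.append(row)
    return rows
```

With this helper, the loop in the earlier script could print `row["userid"]` instead of relying on column positions.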

By the way, my Amazon Aurora Serverless cluster was configured to scale capacity all the way down to zero when not active. Here’s what the scaling activity looked like while I was writing this post and running the queries:

Now Available
You can make use of the Data API today in the US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland) Regions. There is no charge for the API, but you will pay the usual price for data transfer out of AWS.

Jeff;

New – Parallel Query for Amazon Aurora

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-parallel-query-for-amazon-aurora/

Amazon Aurora is a relational database that was designed to take full advantage of the abundance of networking, processing, and storage resources available in the cloud. While maintaining compatibility with MySQL and PostgreSQL on the user-visible side, Aurora makes use of a modern, purpose-built distributed storage system under the covers. Your data is striped across hundreds of storage nodes distributed over three distinct AWS Availability Zones, with two copies per zone, on fast SSD storage. Here’s what this looks like (extracted from Getting Started with Amazon Aurora):

New Parallel Query
When we launched Aurora we also hinted at our plans to apply the same scale-out design principle to other layers of the database stack. Today I would like to tell you about our next step along that path.

Each node in the storage layer pictured above also includes plenty of processing power. Aurora is now able to make great use of that processing power by taking your analytical queries (generally those that process all or a large part of a good-sized table) and running them in parallel across hundreds or thousands of storage nodes, with speed benefits approaching two orders of magnitude. Because this new model reduces network, CPU, and buffer pool contention, you can run a mix of analytical and transactional queries simultaneously on the same table while maintaining high throughput for both types of queries.

The instance class determines the number of parallel queries that can be active at a given time:

  • db.r*.large – 1 concurrent parallel query session
  • db.r*.xlarge – 2 concurrent parallel query sessions
  • db.r*.2xlarge – 4 concurrent parallel query sessions
  • db.r*.4xlarge – 8 concurrent parallel query sessions
  • db.r*.8xlarge – 16 concurrent parallel query sessions
  • db.r4.16xlarge – 16 concurrent parallel query sessions
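The list above can be turned into a small lookup table, which helps when you plan how many analytical sessions an instance can run at once. This is a minimal sketch transcribed from the launch list. It treats any `db.r*` family the same way, as the list's `db.r*` notation suggests, and it rejects sizes the list does not mention. The helper name is hypothetical.

```python
# Concurrent parallel query sessions per instance size, from the launch list.
PQ_SESSIONS_BY_SIZE = {
    "large": 1,
    "xlarge": 2,
    "2xlarge": 4,
    "4xlarge": 8,
    "8xlarge": 16,
    "16xlarge": 16,  # listed for db.r4.16xlarge
}


def max_parallel_query_sessions(instance_class):
    """Return the concurrent parallel query limit for a db.r* instance class."""
    parts = instance_class.split(".")
    if len(parts) != 3 or parts[0] != "db" or not parts[1].startswith("r"):
        raise ValueError(f"not a db.r* instance class: {instance_class}")
    size = parts[2]
    if size not in PQ_SESSIONS_BY_SIZE:
        raise ValueError(f"size not covered by the launch list: {size}")
    return PQ_SESSIONS_BY_SIZE[size]
```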

You can use the aurora_pq parameter to enable and disable the use of parallel queries at the global and the session level.

Parallel queries enhance the performance of over 200 types of single-table predicates and hash joins. The Aurora query optimizer will automatically decide whether to use Parallel Query based on the size of the table and the amount of table data that is already in memory; you can also use the aurora_pq_force session variable to override the optimizer for testing purposes.

Parallel Query in Action
You will need to create a fresh cluster in order to make use of the Parallel Query feature. You can create one from scratch, or you can restore a snapshot.

To create a cluster that supports Parallel Query, I simply choose Provisioned with Aurora parallel query enabled as the Capacity type:

I used the CLI to restore a 100 GB snapshot for testing, and then explored one of the queries from the TPC-H benchmark. Here’s the basic query:

SELECT
  l_orderkey,
  SUM(l_extendedprice * (1-l_discount)) AS revenue,
  o_orderdate,
  o_shippriority

FROM customer, orders, lineitem

WHERE
  c_mktsegment='AUTOMOBILE'
  AND c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND o_orderdate < date '1995-03-13'
  AND l_shipdate > date '1995-03-13'

GROUP BY
  l_orderkey,
  o_orderdate,
  o_shippriority

ORDER BY
  revenue DESC,
  o_orderdate LIMIT 15;

The EXPLAIN command shows the query plan, including the use of Parallel Query:

+----+-------------+----------+------+-------------------------------+------+---------+------+-----------+--------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table    | type | possible_keys                 | key  | key_len | ref  | rows      | Extra                                                                                                                          |
+----+-------------+----------+------+-------------------------------+------+---------+------+-----------+--------------------------------------------------------------------------------------------------------------------------------+
|  1 | SIMPLE      | customer | ALL  | PRIMARY                       | NULL | NULL    | NULL |  14354602 | Using where; Using temporary; Using filesort                                                                                   |
|  1 | SIMPLE      | orders   | ALL  | PRIMARY,o_custkey,o_orderdate | NULL | NULL    | NULL | 154545408 | Using where; Using join buffer (Hash Join Outer table orders); Using parallel query (4 columns, 1 filters, 1 exprs; 0 extra)   |
|  1 | SIMPLE      | lineitem | ALL  | PRIMARY,l_shipdate            | NULL | NULL    | NULL | 606119300 | Using where; Using join buffer (Hash Join Outer table lineitem); Using parallel query (4 columns, 1 filters, 1 exprs; 0 extra) |
+----+-------------+----------+------+-------------------------------+------+---------+------+-----------+--------------------------------------------------------------------------------------------------------------------------------+
3 rows in set (0.01 sec)

Here is the relevant part of the Extra column:

Using parallel query (4 columns, 1 filters, 1 exprs; 0 extra)

The query runs in less than 2 minutes when Parallel Query is used:

+------------+-------------+-------------+----------------+
| l_orderkey | revenue     | o_orderdate | o_shippriority |
+------------+-------------+-------------+----------------+
|   92511430 | 514726.4896 | 1995-03-06  |              0 |
|  593851010 | 475390.6058 | 1994-12-21  |              0 |
|  188390981 | 458617.4703 | 1995-03-11  |              0 |
|  241099140 | 457910.6038 | 1995-03-12  |              0 |
|  520521156 | 457157.6905 | 1995-03-07  |              0 |
|  160196293 | 456996.1155 | 1995-02-13  |              0 |
|  324814597 | 456802.9011 | 1995-03-12  |              0 |
|   81011334 | 455300.0146 | 1995-03-07  |              0 |
|   88281862 | 454961.1142 | 1995-03-03  |              0 |
|   28840519 | 454748.2485 | 1995-03-08  |              0 |
|  113920609 | 453897.2223 | 1995-02-06  |              0 |
|  377389669 | 453438.2989 | 1995-03-07  |              0 |
|  367200517 | 453067.7130 | 1995-02-26  |              0 |
|  232404000 | 452010.6506 | 1995-03-08  |              0 |
|   16384100 | 450935.1906 | 1995-03-02  |              0 |
+------------+-------------+-------------+----------------+
15 rows in set (1 min 53.36 sec)

I can disable Parallel Query for the session (I can use an RDS custom cluster parameter group for a longer-lasting effect):

set SESSION aurora_pq=OFF;

The query runs considerably slower without it:

+------------+-------------+-------------+----------------+
| l_orderkey | o_orderdate | revenue     | o_shippriority |
+------------+-------------+-------------+----------------+
|   92511430 | 1995-03-06  | 514726.4896 |              0 |
...
|   16384100 | 1995-03-02  | 450935.1906 |              0 |
+------------+-------------+-------------+----------------+
15 rows in set (1 hour 25 min 51.89 sec)
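
The two timings differ by more than an order of magnitude. The arithmetic, as a short Python sketch (the durations are copied from the two result sets, and the resulting ratio describes only this one run):

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: float = 0.0) -> float:
    """Convert an h/m/s duration, as printed by the mysql client, to seconds."""
    return hours * 3600 + minutes * 60 + seconds


with_pq = to_seconds(minutes=1, seconds=53.36)               # 113.36 s
without_pq = to_seconds(hours=1, minutes=25, seconds=51.89)  # 5151.89 s
speedup = without_pq / with_pq
print(f"{speedup:.1f}x faster with Parallel Query")  # 45.4x faster with Parallel Query
```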

This was on a db.r4.2xlarge instance; other instance sizes, data sets, access patterns, and queries will perform differently. I can also override the query optimizer and insist on the use of Parallel Query for testing purposes:

set SESSION aurora_pq_force=ON;

Things to Know
Here are a couple of things to keep in mind when you start to explore Amazon Aurora Parallel Query:

Engine Support – We are launching with support for MySQL 5.6, and are working on support for MySQL 5.7 and PostgreSQL.

Table Formats – The table row format must be COMPACT; partitioned tables are not supported.

Data Types – The TEXT, BLOB, and GEOMETRY data types are not supported.

DDL – The table cannot have any pending fast online DDL operations.

Cost – You can use Parallel Query at no extra charge. However, because it accesses storage directly, your I/O costs may increase.
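
The restrictions listed above lend themselves to a pre-flight check. The following Python sketch assumes you have already collected each table's row format, partitioning, column types, and DDL state (for example from information_schema); the TableInfo structure and its field names are hypothetical, and only the exact type names from the list are checked.

```python
from dataclasses import dataclass, field

# Column types the post lists as unsupported by Parallel Query at launch.
# Only these exact names are checked; variants are not covered by this sketch.
UNSUPPORTED_TYPES = {"TEXT", "BLOB", "GEOMETRY"}


@dataclass
class TableInfo:
    """Hypothetical summary of a table's metadata; field names are illustrative."""
    name: str
    row_format: str
    partitioned: bool
    column_types: list = field(default_factory=list)
    pending_fast_ddl: bool = False


def pq_blockers(table: TableInfo) -> list:
    """Return the reasons (if any) the table would not be eligible for Parallel Query."""
    reasons = []
    if table.row_format.upper() != "COMPACT":
        reasons.append(f"row format is {table.row_format}, not COMPACT")
    if table.partitioned:
        reasons.append("table is partitioned")
    bad = sorted({t.upper() for t in table.column_types} & UNSUPPORTED_TYPES)
    if bad:
        reasons.append("unsupported column types: " + ", ".join(bad))
    if table.pending_fast_ddl:
        reasons.append("pending fast online DDL operation")
    return reasons


lineitem = TableInfo("lineitem", "COMPACT", False, ["INT", "DECIMAL", "DATE"])
notes = TableInfo("notes", "DYNAMIC", True, ["INT", "TEXT"])
print(pq_blockers(lineitem))  # []
print(pq_blockers(notes))
```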

Give it a Shot
This feature is available now and you can start using it today!

Jeff;

Migrating a multi-tier application from a Microsoft Hyper-V environment using AWS SMS and AWS Migration Hub

Post Syndicated from Martin Yip original https://aws.amazon.com/blogs/compute/migrating-a-multi-tier-application-from-a-microsoft-hyper-v-environment-using-aws-sms-and-aws-migration-hub/

Shane Baldacchino is a Solutions Architect at Amazon Web Services

Many customers ask for guidance to migrate end-to-end solutions running in their on-premises data center to AWS. This post provides an overview of moving a common blogging platform, WordPress, running on an on-premises virtualized Microsoft Hyper-V platform to AWS, including re-pointing the DNS records associated with the website.

AWS Server Migration Service (AWS SMS) is an agentless service that makes it easier and faster for you to migrate thousands of on-premises workloads to AWS. In November 2017, AWS added support for Microsoft’s Hyper-V hypervisor. AWS SMS allows you to automate, schedule, and track incremental replications of live server volumes, making it easier for you to coordinate large-scale server migrations. In this post, I guide you through migrating your multi-tier workloads using both AWS SMS and AWS Migration Hub.

Migration Hub provides a single location to track the progress of application migrations across multiple AWS and partner solutions. In this post, you use AWS SMS as a mechanism to migrate the virtual machines (VMs) and track them via Migration Hub. You can also use other third-party tools in Migration Hub, and choose the migration tools that best fit your needs. Migration Hub allows you to get progress updates across all migrations, identify and troubleshoot any issues, and reduce the overall time and effort spent on your migration projects.

Migration Hub and AWS SMS are both free. You pay only for the cost of the individual migration tools that you use, and any resources being consumed on AWS.

Walkthrough

For this walkthrough, the WordPress blog is currently running as a two-tier stack in a corporate data center. The example environment is multi-tier and polyglot in nature. The frontend uses Windows Server 2016 (running IIS 10 with PHP as an ISAPI extension) and the backend is supported by a MySQL server running on Ubuntu 16.04 LTS. All systems are hosted on a virtualized platform. As the environment consists of multiple servers, you can use Migration Hub to group the servers together as an application and manage the holistic process of migrating the application.

The key elements of this migration process involve the following steps:

  1. Establish your AWS environment.
  2. Replicate your database.
  3. Download the SMS Connector from the AWS Management Console.
  4. Configure AWS SMS and Hyper-V permissions.
  5. Install and configure the SMS Connector appliance.
  6. Configure Hyper-V host permissions.
  7. Import your virtual machine inventory and create a replication job.
  8. Use AWS Migration Hub to track progress.
  9. Launch your Amazon EC2 instance.
  10. Change your DNS records to resolve the WordPress blog to your EC2 instance.

Before you start, ensure that the operating systems of your source servers and your hypervisor version are supported by AWS SMS. For more information, see the Server Migration Service FAQ. This post focuses on the Microsoft Hyper-V hypervisor.

Establish your AWS environment

First, establish your AWS environment. If your organization is new to AWS, this may include account or subaccount creation, a new virtual private cloud (VPC), and associated subnets, route tables, internet gateways, and so on. Think of this phase as setting up your software-defined data center. For more information, see Getting Started with Amazon EC2 Linux Instances.

The blog is a two-tier stack, so go with two private subnets. Because you want it to be highly available, use multiple Availability Zones. An Availability Zone resides within an AWS Region. Each Availability Zone is isolated, but the zones within a Region are connected through low-latency links. This allows architects and solution designers to build highly available solutions.
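
As a sketch of this subnet layout, Python's standard ipaddress module can carve one private subnet per Availability Zone out of the VPC's CIDR block. The 10.1.0.0/16 range, the /24 subnet size, and the AZ names are hypothetical; pick a range that does not overlap your on-premises network (the walkthrough's connector sits on 10.0.0.x).

```python
import ipaddress


def plan_subnets(vpc_cidr: str, azs: list, new_prefix: int = 24) -> dict:
    """Carve one equal-sized private subnet per Availability Zone out of a VPC CIDR."""
    vpc = ipaddress.ip_network(vpc_cidr)
    available = 2 ** (new_prefix - vpc.prefixlen)
    if len(azs) > available:
        raise ValueError(f"{vpc_cidr} only holds {available} /{new_prefix} subnets")
    # subnets() yields consecutive, non-overlapping blocks in address order.
    return {az: str(net) for az, net in zip(azs, vpc.subnets(new_prefix=new_prefix))}


plan = plan_subnets("10.1.0.0/16", ["us-east-1a", "us-east-1b"])
print(plan)  # {'us-east-1a': '10.1.0.0/24', 'us-east-1b': '10.1.1.0/24'}
```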

Replicate your database

WordPress uses a MySQL relational database. You could continue to run MySQL yourself on EC2 instances and take on the work of maintaining and scaling the database. For this walkthrough, however, I am using this opportunity to migrate to Amazon Aurora on Amazon RDS, because it is MySQL-compatible. Amazon Aurora is a high-performance database engine, and it frees you to focus on application development by managing time-consuming database administration tasks, including backups, software patching, monitoring, scaling, and replication.

Use AWS Database Migration Service (AWS DMS) to migrate your MySQL database to Amazon Aurora easily and securely. You can send the results from AWS DMS to Migration Hub. This allows you to create a single pane view of your application migration.

After you have created an AWS DMS replication instance, configure the source and target endpoints and create a replication task.

AWS DMS can load the data currently in the database and, by reading the MySQL binary log (binlog), also capture all subsequent changes in near-real time. For more information, see Migrating a MySQL-Compatible Database to Amazon Aurora.

Finally, the task status confirms that the existing data in your WordPress blog database, along with ongoing changes, is being replicated from MySQL into Amazon Aurora.
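
Ongoing replication depends on the source MySQL server writing a suitable binlog. The settings checked in this Python sketch (log_bin=ON, binlog_format=ROW, binlog_row_image=FULL) are the ones we understand AWS DMS to require for a MySQL change data capture source; the list is not exhaustive (binlog retention, for example, also matters), so confirm it against the current DMS documentation. The variables dictionary stands in for the output of SHOW VARIABLES.

```python
# Source settings we understand AWS DMS to expect for ongoing (CDC) replication
# from self-managed MySQL; verify against current DMS docs for your versions.
REQUIRED = {
    "log_bin": "ON",
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
}


def check_binlog_settings(variables: dict) -> list:
    """Compare server variables (e.g. from SHOW VARIABLES) with REQUIRED.

    Returns human-readable problems; an empty list means every check passed."""
    problems = []
    for name, expected in REQUIRED.items():
        actual = variables.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif str(actual).upper() != expected:
            problems.append(f"{name}={actual} (expected {expected})")
    return problems


print(check_binlog_settings(
    {"log_bin": "ON", "binlog_format": "MIXED", "binlog_row_image": "FULL"}
))  # ['binlog_format=MIXED (expected ROW)']
```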

Download the SMS Connector from the AWS Management Console

Now, use AWS SMS to migrate your IIS/PHP frontend. AWS SMS is delivered as a virtual appliance that can be deployed in your Hyper-V environment.

To download the SMS Connector, log in to the console and choose Server Migration Service, Connectors, SMS Connector setup guide. Download the VHD file for SCVMM/Hyper-V.

Configure SMS

Your hypervisor and AWS SMS need an appropriate user with sufficient privileges to perform migrations:

Launch a new VM in Hyper-V based on the SMS Connector that you downloaded. To configure the connector, connect to it via HTTPS. You can obtain the SMS Connector IP address from within Hyper-V. By default, the SMS Connector uses DHCP to obtain a valid IP address.

Connect to the SMS Connector via HTTPS. In the example above, the connector IP address is 10.0.0.88, so in your browser you enter https://10.0.0.88. Because the SMS Connector can only work with one hypervisor at a time, you must specify which hypervisor it interfaces with. For the purpose of this post, the examples use Microsoft Hyper-V.

Configure the connector with the IAM and hypervisor credentials that you created earlier.

After you have entered both your AWS and Hyper-V credentials and the connectivity and authentication checks have passed, you are redirected to the home page of your SMS Connector, which shows its connectivity status and health.

Configure Hyper-V host permissions

You must also modify your Hyper-V hosts to allow WinRM connectivity. AWS provides a downloadable PowerShell script that configures your Windows environment to support WinRM communications with the SMS Connector. The same script configures either standalone Hyper-V or SCVMM.

Execute the PowerShell script and follow the prompts. In the following example, Reconfigure Hyper-V not managed by SCVMM (Standalone Hyper-V)… was selected.

Import your virtual machine inventory and create a replication job

You have now configured the SMS Connector and your Microsoft Hyper-V hosts. Switch to the console to import your server catalog to AWS SMS. Within AWS SMS, choose Connectors, Import Server Catalog.

This process can take a few minutes, depending on the number of machines in your Hyper-V inventory.

Select the server to migrate and choose Create replication job. The console guides you through the process. The time that the initial replication takes depends on the available bandwidth and the size of your VM. After the initial seed replication, network bandwidth usage is minimized because AWS SMS replicates only the incremental changes occurring on the VM.
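
Because the seed replication time is driven mainly by VM size and available bandwidth, a back-of-the-envelope estimate is easy to compute. In this Python sketch, the 80% efficiency factor is an assumption standing in for protocol overhead and competing traffic; it is not an AWS SMS figure.

```python
def estimate_seed_hours(vm_size_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Rough estimate of the initial (seed) replication time in hours.

    efficiency is an assumed fraction of the link actually usable for the
    transfer; tune it from observed throughput on your own network."""
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = vm_size_gb * 8 * 1000**3                      # decimal gigabytes -> bits
    seconds = bits / (bandwidth_mbps * 1000**2 * efficiency)
    return seconds / 3600


# A hypothetical 100 GB VM over a 100 Mbps link.
print(f"{estimate_seed_hours(100, 100):.1f} hours")  # 2.8 hours
```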

Use Migration Hub to track progress

You have now successfully started your database migration via AWS DMS, set up your SMS Connector, configured your Microsoft Hyper-V environment, and started a replication job.

You can now track the collective progress of your application migration. To track migration progress, connect AWS DMS and AWS SMS to Migration Hub.

To do this, navigate to Migration Hub in the AWS Management Console. Under Migrate and Tools, connect both services so that the migration status of these services is sent to Migration Hub.

You can then group your servers into an application in Migration Hub and collectively track the progress of your migration. In this example, I created an application, Company Blog, and added in my servers from both AWS SMS and AWS DMS.

The progress updates from linked services are automatically sent to Migration Hub so that you can track tasks in progress. The dashboard reflects any status changes that occur in the linked services. You can see from the following image that one server is complete while another is in progress.

Using Migration Hub, you can view the migration progress of all applications. This allows you to quickly get progress updates across all of your migrations, easily identify and troubleshoot any issues, and reduce the overall time and effort spent on your migration projects.

Launch your EC2 instance

When your replication task is complete, the artifact created by AWS SMS is a custom AMI that you can use to deploy an EC2 instance. Follow the usual process to launch your EC2 instance, using the custom AMI created by AWS SMS, noting that you may need to replace any host-based firewalls with security groups and NACLs.

When you create an EC2 instance, ensure that you pick the most suitable EC2 instance type and size to match your performance requirements while optimizing for cost.

While your new EC2 instance is a replica of your on-premises VM, you should always validate that applications are functioning. How you do this differs from application to application. You can use a combination of approaches, such as editing your local hosts file to test the application by name, or connecting over SSH, RDP, or Telnet.

From the RDS console, get your connection details and update your WordPress configuration file to point to the Amazon Aurora database. Because WordPress expects a MySQL database and Amazon Aurora is MySQL-compatible, this change of database engine is transparent to WordPress.
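
WordPress reads its database host from the DB_HOST constant in wp-config.php. The Python sketch that follows rewrites that single value; the endpoint name is hypothetical, and the regular expression assumes the conventional define( 'DB_HOST', '...' ); form. Keep a backup of the original file before editing it in place.

```python
import re

# Matches e.g. define( 'DB_HOST', 'localhost' ); with single or double quotes.
DB_HOST_RE = re.compile(r"""(define\(\s*['"]DB_HOST['"]\s*,\s*['"])([^'"]*)(['"]\s*\))""")


def point_to_aurora(config_text: str, endpoint: str) -> str:
    """Return wp-config.php text with DB_HOST replaced by the Aurora endpoint."""
    # A function replacement avoids backreference surprises in the endpoint string.
    new_text, count = DB_HOST_RE.subn(lambda m: m.group(1) + endpoint + m.group(3), config_text)
    if count != 1:
        raise ValueError(f"expected exactly one DB_HOST definition, found {count}")
    return new_text


sample = "define( 'DB_NAME', 'wordpress' );\ndefine( 'DB_HOST', '10.0.0.42' );\n"
updated = point_to_aurora(sample, "blog.cluster-abc123.us-east-1.rds.amazonaws.com")
print(updated)
```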

Change your DNS records to resolve the WordPress blog to your EC2 instance

With your WordPress application validated, and your Amazon Aurora database still receiving changes from your on-premises data center via AWS DMS, you can now update your DNS zone using Amazon Route 53. You can manage Amazon Route 53 through the console, an SDK, or the AWS CLI.

For this walkthrough, use Windows PowerShell for AWS to update the DNS zone. The example UPSERTs the A record in the zone so that it resolves to the Amazon EC2 instance created with AWS SMS.
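
The post performs this step with Windows PowerShell for AWS. As an alternative, here is a Python sketch of the same UPSERT using the request shape of boto3's Route 53 change_resource_record_sets call. The record name, IP address (from the documentation range 203.0.113.0/24), hosted zone ID, and TTL are hypothetical, and a stand-in client replaces boto3.client("route53") so the sketch runs without AWS credentials.

```python
from types import SimpleNamespace


def build_upsert_a_record(name: str, ip_address: str, ttl: int = 300) -> dict:
    """Build a Route 53 ChangeBatch that creates or updates (UPSERTs) an A record."""
    return {
        "Comment": "Point the blog at the migrated EC2 instance",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip_address}],
                },
            }
        ],
    }


def apply_change(route53_client, hosted_zone_id: str, change_batch: dict) -> dict:
    """Submit the change; route53_client would normally be boto3.client("route53")."""
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=change_batch
    )


batch = build_upsert_a_record("blog.example.com.", "203.0.113.10", ttl=60)
# Stand-in client that just echoes the request, for an offline dry run.
fake_route53 = SimpleNamespace(change_resource_record_sets=lambda **kw: kw)
result = apply_change(fake_route53, "Z0000000EXAMPLE", batch)
```

Lowering the record's TTL some time before the cutover shortens the window during which resolvers keep serving the old on-premises address.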

As cached DNS records expire according to their TTL, end users gradually begin resolving the WordPress blog to AWS.

Summary

You have now successfully migrated your WordPress blog to AWS using AWS migration services, specifically the AWS SMS Hyper-V/SCVMM Connector. Your blog now resolves to AWS. After validation, you are ready to decommission your on-premises resources.

Many architectures can be extended to take advantage of the inherent benefits of AWS with little effort. For example, you can put an Application Load Balancer in front of your web tier and use Amazon CloudWatch metrics to drive scaling policies that add or remove EC2 instances. This removes the single point of failure of running on a single EC2 instance.

Aurora Serverless MySQL Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/aurora-serverless-ga/

You may have heard of Amazon Aurora, a custom-built MySQL- and PostgreSQL-compatible database born and built in the cloud. You may have also heard of serverless, which allows you to build and run applications and services without thinking about instances. These are two pieces of the growing AWS technology story that we’re really excited to be working on. Last year, at AWS re:Invent, we announced a preview of a new capability for Aurora called Aurora Serverless. Today, I’m pleased to announce that Aurora Serverless for Aurora MySQL is generally available. Aurora Serverless is on-demand, auto-scaling, serverless Aurora. You don’t have to think about instances or scaling, and you pay only for what you use.

This paradigm is great for applications with unpredictable load or infrequent demand. I’m excited to show you how this all works. Let me show you how to launch a serverless cluster.

Creating an Aurora Serverless Cluster

First, I’ll navigate to the Amazon Relational Database Service (RDS) console and select the Clusters sub-console. From there, I’ll click the Create database button in the top right corner to get to this screen.

From the screen above, I select my engine type and click Next. For now, only Aurora MySQL 5.6 is supported.

Now comes the fun part. I specify my capacity type as Serverless and all of the instance selection and configuration options go away. I only have to give my cluster a name and a master username/password combo and click next.

From here I can select a number of options. I can specify the minimum and maximum number of Aurora Compute Units (ACU) to be consumed. These are billed per-second, with a 5-minute minimum, and my cluster will autoscale between the specified minimum and maximum number of ACUs. The rules and metrics for autoscaling will be automatically created by Aurora Serverless and will include CPU utilization and number of connections. When Aurora Serverless detects that my cluster needs additional capacity it will grab capacity from a warm pool of resources to meet the need. This new capacity can start serving traffic in seconds because of the separation of the compute layer and storage layer intrinsic to the design of Aurora.
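
The capacity bounds and the billing rule can be expressed in a few lines. This Python sketch assumes the 5-minute minimum applies to each period the cluster is active (for example, each time it resumes from a pause); treat the exact metering as an assumption and check current Aurora pricing.

```python
def target_capacity(desired_acu: int, min_acu: int, max_acu: int) -> int:
    """Keep a requested capacity within the cluster's configured ACU range."""
    return max(min_acu, min(desired_acu, max_acu))


def billed_acu_seconds(acu: int, seconds_used: float, minimum_seconds: int = 300) -> float:
    """ACU-seconds charged for one active period at a fixed capacity.

    Per-second billing with an assumed 5-minute (300 s) minimum per period."""
    return acu * max(seconds_used, minimum_seconds)


print(target_capacity(64, 2, 16))     # 16: capped at the configured maximum
print(billed_acu_seconds(4, 90))      # 1200: 90 s of use is billed as 300 s
print(billed_acu_seconds(4, 3600))    # 14400
```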

The cluster can even automatically scale down to zero if my cluster isn’t seeing any activity. This is perfect for development databases that might go long periods of time with little or no use. When the cluster is paused I’m only charged for the underlying storage. If I want to manually scale up or down, pre-empting a large spike in traffic, I can easily do that with a single API call.
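
The single API call for manual scaling corresponds to the RDS ModifyCurrentDBClusterCapacity operation (modify_current_db_cluster_capacity in boto3). Here is a Python sketch; the cluster name and capacity are hypothetical, and a stand-in client replaces boto3.client("rds") so it runs offline.

```python
from types import SimpleNamespace


def scale_now(rds_client, cluster_id: str, capacity: int, timeout_seconds: int = 300) -> dict:
    """Ask Aurora Serverless to change capacity right away (e.g. ahead of a spike).

    rds_client would normally be boto3.client("rds")."""
    return rds_client.modify_current_db_cluster_capacity(
        DBClusterIdentifier=cluster_id,
        Capacity=capacity,
        SecondsBeforeTimeout=timeout_seconds,
        # If no safe scaling point is found in time, roll back instead of forcing
        # the change (forcing may interrupt open connections and transactions).
        TimeoutAction="RollbackCapacityChange",
    )


# Stand-in client that echoes the request, for an offline dry run.
fake_rds = SimpleNamespace(modify_current_db_cluster_capacity=lambda **kw: kw)
print(scale_now(fake_rds, "dev-serverless", 8))
```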

Finally, I click Create database in the bottom right and wait for my cluster to become available – which happens quite quickly. For now we only support a limited number of cluster parameters with plans to enable more customized options as we iterate on customer feedback.

Now, the console provides a wealth of data, similar to any other RDS database.

From here, I can connect to my cluster like any other MySQL database. I could run a tool like sysbench or mysqlslap to generate some load and trigger a scaling event or I could just wait for the service to scale down and pause.

If I scroll down or select the events subconsole I can see a few different autoscaling events happening including pausing the instance at one point.

The best part about this? When I’m done writing the blog post I don’t need to remember to shut this server down! When I’m ready to use it again I just make a connection request and my cluster starts responding in seconds.

How Aurora Serverless Works

I want to dive a bit deeper into what exactly is happening behind the scenes to enable this functionality. When you provision an Aurora Serverless database the service does a few things:

  • It creates an Aurora storage volume replicated across multiple AZs.
  • It creates an endpoint in your VPC for the application to connect to.
  • It configures a network load balancer (invisible to the customer) behind that endpoint.
  • It configures multi-tenant request routers to route database traffic to the underlying instances.
  • It provisions the initial minimum instance capacity.

When the cluster needs to autoscale up or down or resume after a pause, Aurora grabs capacity from a pool of already available nodes and adds them to the request routers. This process takes almost no time and since the storage is shared between nodes Aurora can scale up or down in seconds for most workloads. The service currently has autoscaling cooldown periods of 1.5 minutes for scaling up and 5 minutes for scaling down. Scaling operations are transparent to the connected clients and applications since existing connections and session state are transferred to the new nodes. The only difference with pausing and resuming is a higher latency for the first connection, typically around 25 seconds.
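
To make the cooldown behavior concrete, here is a deliberately simplified Python model: it clamps each request to the configured range and refuses a change if the previous scaling event happened less than 90 seconds (scaling up) or 300 seconds (scaling down) earlier. The real Aurora Serverless scaling logic, including how it chooses a target capacity, is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

SCALE_UP_COOLDOWN = 90     # 1.5 minutes, as stated in the post
SCALE_DOWN_COOLDOWN = 300  # 5 minutes


@dataclass
class Scaler:
    capacity: int
    min_acu: int
    max_acu: int
    last_scale_time: Optional[float] = None

    def request(self, now: float, desired: int) -> int:
        """Apply a desired capacity at time `now` (seconds) if the cooldown allows."""
        desired = max(self.min_acu, min(desired, self.max_acu))
        if desired == self.capacity:
            return self.capacity
        cooldown = SCALE_UP_COOLDOWN if desired > self.capacity else SCALE_DOWN_COOLDOWN
        if self.last_scale_time is not None and now - self.last_scale_time < cooldown:
            return self.capacity  # still cooling down; keep the current capacity
        self.capacity = desired
        self.last_scale_time = now
        return self.capacity


s = Scaler(capacity=2, min_acu=2, max_acu=16)
results = [s.request(t, d) for t, d in [(0, 4), (60, 8), (100, 8), (200, 2), (400, 2)]]
print(results)  # [4, 4, 8, 8, 2]
```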

Available Now

Aurora Serverless for Aurora MySQL is available now in US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland). If you’re interested in learning more about the Aurora engine there’s a great design paper available. If you’re interested in diving a bit deeper on exactly how Aurora Serverless works then look forward to more detail in future posts!

I personally believe this is one of the really exciting points in the evolution of the database story and I can’t wait to see what customers build with it!

Randall

AWS Online Tech Talks – June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-june-2018/

Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

Analytics & Big Data

June 18, 2018 | 11:00 AM – 11:45 AM PT – Get Started with Real-Time Streaming Data in Under 5 Minutes – Learn how to use Amazon Kinesis to capture, store, and analyze streaming data in real time, including IoT device data, VPC flow logs, and clickstream data.

June 20, 2018 | 11:00 AM – 11:45 AM PT – Insights For Everyone – Deploying Data across your Organization – Learn how to deploy data at scale using AWS Analytics and QuickSight’s new reader role and usage-based pricing.

AWS re:Invent

June 13, 2018 | 05:00 PM – 05:30 PM PT – Episode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar.

Compute

June 25, 2018 | 01:00 PM – 01:45 PM PT – Accelerating Containerized Workloads with Amazon EC2 Spot Instances – Learn how to efficiently deploy containerized workloads and easily manage clusters at any scale at a fraction of the cost with Spot Instances.

June 26, 2018 | 01:00 PM – 01:45 PM PT – Ensuring Your Windows Server Workloads Are Well-Architected – Get the benefits, best practices, and tools for running your Microsoft workloads on AWS using a well-architected approach.

Containers

June 25, 2018 | 09:00 AM – 09:45 AM PT – Running Kubernetes on AWS – Learn the basics of running Kubernetes on AWS, including how to set up masters, networking, and security, and how to add auto-scaling to your cluster.

Databases

June 18, 2018 | 01:00 PM – 01:45 PM PT – Oracle to Amazon Aurora Migration, Step by Step – Learn how to migrate your Oracle database to Amazon Aurora.

DevOps

June 20, 2018 | 09:00 AM – 09:45 AM PT – Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tools – Learn how to set up a CI/CD pipeline for deploying containers using the AWS Developer Tools.

Enterprise & Hybrid

June 18, 2018 | 09:00 AM – 09:45 AM PT – De-risking Enterprise Migration with AWS Managed Services – Learn how enterprise customers are de-risking cloud adoption with AWS Managed Services.

June 19, 2018 | 11:00 AM – 11:45 AM PT – Launch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the setup of best-practice baselines when setting up new AWS environments.

June 21, 2018 | 11:00 AM – 11:45 AM PT – Leading Your Team Through a Cloud Transformation – Learn how you can help lead your organization through a cloud transformation.

June 21, 2018 | 01:00 PM – 01:45 PM PT – Enabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.

June 28, 2018 | 01:00 PM – 01:45 PM PT – Fireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device.

IoT

June 27, 2018 | 11:00 AM – 11:45 AM PT – AWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.

Machine Learning

June 19, 2018 | 09:00 AM – 09:45 AM PT – Integrating Amazon SageMaker into your Enterprise – Learn how to integrate Amazon SageMaker and other AWS Services within an Enterprise environment.

June 21, 2018 | 09:00 AM – 09:45 AM PT – Building Text Analytics Applications on AWS using Amazon Comprehend – Learn how you can unlock the value of your unstructured data with NLP-based text analytics.

Management Tools

June 20, 2018 | 01:00 PM – 01:45 PM PT – Optimizing Application Performance and Costs with Auto Scaling – Learn how selecting the right scaling option can help optimize application performance and costs.

Mobile

June 25, 2018 | 11:00 AM – 11:45 AM PT – Drive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.

Security, Identity & Compliance

June 26, 2018 | 09:00 AM – 09:45 AM PT – Understanding AWS Secrets Manager – Learn how AWS Secrets Manager helps you rotate and manage access to secrets centrally.

June 28, 2018 | 09:00 AM – 09:45 AM PT – Using Amazon Inspector to Discover Potential Security Issues – See how Amazon Inspector can be used to discover security issues on your instances.

Serverless

June 19, 2018 | 01:00 PM – 01:45 PM PT – Productionize Serverless Application Building and Deployments with AWS SAM – Learn expert tips and techniques for building and deploying serverless applications at scale with AWS SAM.

Storage

June 26, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using AWS Storage Gateway to connect your applications to scalable and reliable AWS storage services.

June 27, 2018 | 01:00 PM – 01:45 PM PT – Changing the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances.

June 28, 2018 | 11:00 AM – 11:45 AM PT – Big Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed with automatic minor version upgrades, backups, encryption, and fail-over. I wrote about Neptune in detail for AWS re:Invent last year and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.

Now that Amazon Neptune is generally available there are a few changes from the preview:

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and clicking create cluster. Of course you can also launch with CloudFormation, the CLI, or the SDKs.

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.

Additional Resources

We’ve created two repos with some additional tools and examples here. You can expect continuous development on these repos as we add additional tools and examples.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune compatible CSVs for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.
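
For a sense of what the GraphML conversion involves, here is a minimal Python sketch (not the tool in the repo) that turns GraphML nodes and edges into the two CSV files of Neptune's Gremlin bulk-load format, using the ~id, ~label, ~from, and ~to system columns. It assumes standard namespaced GraphML with labels stored under labelV/labelE data keys, as TinkerPop's GraphML writer does; the default labels are placeholders, and all other properties are ignored.

```python
import csv
import io
import xml.etree.ElementTree as ET

GRAPHML = "{http://graphml.graphdrawing.org/xmlns}"


def _label(element, key, default):
    """Read a <data key=...> child, as written by TinkerPop's GraphML writer."""
    for data in element.findall(GRAPHML + "data"):
        if data.get("key") == key:
            return data.text or default
    return default


def graphml_to_neptune_csv(graphml_text: str):
    """Return (vertices_csv, edges_csv) strings in Neptune's Gremlin load format."""
    root = ET.fromstring(graphml_text)
    vertices, edges = io.StringIO(), io.StringIO()
    vw = csv.writer(vertices, lineterminator="\n")
    ew = csv.writer(edges, lineterminator="\n")
    vw.writerow(["~id", "~label"])
    ew.writerow(["~id", "~from", "~to", "~label"])
    for node in root.iter(GRAPHML + "node"):
        vw.writerow([node.get("id"), _label(node, "labelV", "vertex")])
    for i, edge in enumerate(root.iter(GRAPHML + "edge")):
        edge_id = edge.get("id") or f"e{i}"  # GraphML edge ids are optional
        ew.writerow([edge_id, edge.get("source"), edge.get("target"),
                     _label(edge, "labelE", "edge")])
    return vertices.getvalue(), edges.getvalue()


sample = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="directed">
    <node id="p1"><data key="labelV">player</data></node>
    <node id="g1"><data key="labelV">game</data></node>
    <edge id="e1" source="p1" target="g1"><data key="labelE">likes</data></edge>
  </graph>
</graphml>"""
v, e = graphml_to_neptune_csv(sample)
print(v)
print(e)
```

The two strings would be written to separate files in S3 before starting a Neptune bulk load.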

Purpose Built Databases

There’s an industry trend where we’re moving more and more onto purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more, we get to choose the best database for the job or even enable entirely new use cases. Amazon Neptune is perfect for workloads where the data is highly connected across data-rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on Twitter to provide any feedback!

Randall

Amazon Aurora Backtrack – Turn Back Time

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-aurora-backtrack-turn-back-time/

We’ve all been there! You need to make a quick, seemingly simple fix to an important production database. You compose the query, give it a once-over, and let it run. Seconds later you realize that you forgot the WHERE clause, dropped the wrong table, or made another serious mistake, and interrupt the query, but the damage has been done. You take a deep breath, whistle through your teeth, and wish that reality came with an Undo option. Now what?

New Amazon Aurora Backtrack
Today I would like to tell you about the new backtrack feature for Amazon Aurora. This is as close as we can come, given present-day technology, to an Undo option for reality.

This feature can be enabled at launch time for all newly-launched Aurora database clusters. To enable it, you simply specify how far back in time you might want to rewind, and use the database as usual (this is on the Configure advanced settings page):

Aurora uses a distributed, log-structured storage system (read Design Considerations for High Throughput Cloud-Native Relational Databases to learn a lot more); each change to your database generates a new log record, identified by a Log Sequence Number (LSN). Enabling the backtrack feature provisions a FIFO buffer in the cluster for storage of LSNs. This allows for quick access and recovery times measured in seconds.
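
The buffer's role can be illustrated with a toy model: keep (timestamp, LSN) pairs in arrival order, evict entries older than the backtrack window, and look up the last LSN at or before a target time. This Python sketch is only a conceptual illustration under those assumptions; it is not how Aurora's storage layer actually implements backtrack.

```python
from bisect import bisect_right
from collections import deque


class BacktrackWindow:
    """Toy FIFO of (timestamp, LSN) pairs kept for a fixed backtrack window."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.records = deque()  # (timestamp, lsn), appended in time order

    def append(self, timestamp: float, lsn: int) -> None:
        self.records.append((timestamp, lsn))
        # FIFO eviction: drop records that fall outside the backtrack window.
        while self.records and self.records[0][0] < timestamp - self.window:
            self.records.popleft()

    def lsn_at(self, target_time: float) -> int:
        """Return the last LSN written at or before target_time."""
        times = [t for t, _ in self.records]
        idx = bisect_right(times, target_time)
        if idx == 0:
            raise ValueError("target time is outside the backtrack window")
        return self.records[idx - 1][1]


w = BacktrackWindow(window_seconds=3600)
for t, lsn in [(0, 100), (1800, 200), (3000, 300), (4000, 400)]:
    w.append(t, lsn)
print(w.lsn_at(3500))  # 300; the record at t=0 has already been evicted
```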

After that regrettable moment when all seems lost, you simply pause your application, open up the Aurora Console, select the cluster, and click Backtrack DB cluster:

Then you select Backtrack and choose the point in time just before your epic fail, and click Backtrack DB cluster:

Then you wait for the rewind to take place, unpause your application, and proceed as if nothing had happened. When you initiate a backtrack, Aurora will pause the database, close any open connections, drop uncommitted writes, and wait for the backtrack to complete. Then it will resume normal operation and begin to accept requests. The instance state will be backtracking while the rewind is underway:

The console will let you know when the backtrack is complete:

If it turns out that you went back a bit too far, you can backtrack to a later time. Other Aurora features such as cloning, backups, and restores continue to work on an instance that has been configured for backtrack.

I’m sure you can think of some creative and non-obvious use cases for this cool new feature. For example, you could use it to restore a test database after running a test that makes changes to the database. You can initiate the restoration from the API or the CLI, making it easy to integrate into your existing test framework.
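
Using backtrack as a test-reset step could look like the following Python sketch, built on the RDS BacktrackDBCluster operation (backtrack_db_cluster in boto3). The cluster name and checkpoint handling are hypothetical, and a stand-in client replaces boto3.client("rds") so the sketch runs offline.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def rewind_test_cluster(rds_client, cluster_id: str, checkpoint: datetime) -> dict:
    """Backtrack a test cluster to a checkpoint recorded before the tests ran.

    rds_client would normally be boto3.client("rds"). Aurora closes open
    connections and drops uncommitted writes during the rewind, so pause the
    test suite before calling this."""
    if checkpoint.tzinfo is None:
        raise ValueError("checkpoint must be timezone-aware (UTC recommended)")
    return rds_client.backtrack_db_cluster(
        DBClusterIdentifier=cluster_id,
        BacktrackTo=checkpoint,
    )


checkpoint = datetime.now(timezone.utc)  # taken before the tests mutate data
# ... run the (hypothetical) integration tests here ...
fake_rds = SimpleNamespace(backtrack_db_cluster=lambda **kw: kw)
result = rewind_test_cluster(fake_rds, "test-cluster", checkpoint)
```

Because the rewind is asynchronous, a real test harness would also wait for the cluster to leave the backtracking state before running the next suite.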

Things to Know
This option applies to newly created MySQL-compatible Aurora database clusters and to MySQL-compatible clusters that have been restored from a backup. You must opt-in when you create or restore a cluster; you cannot enable it for a running cluster.

This feature is available now in all AWS Regions where Amazon Aurora runs, and you can start using it today.

Jeff;

Creating a 1.3 Million vCPU Grid on AWS using EC2 Spot Instances and TIBCO GridServer

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/creating-a-1-3-million-vcpu-grid-on-aws-using-ec2-spot-instances-and-tibco-gridserver/

Many of my colleagues are fortunate to be able to spend a good part of their day sitting down with and listening to our customers, doing their best to understand ways that we can better meet their business and technology needs. This information is treated with extreme care and is used to drive the roadmap for new services and new features.

AWS customers in the financial services industry (often abbreviated as FSI) are looking ahead to the Fundamental Review of the Trading Book (FRTB) regulations that will come into effect between 2019 and 2021. Among other things, these regulations mandate a new approach to the “value at risk” calculations that each financial institution must perform in the four-hour time window after trading ends in New York and begins in Tokyo. Today, our customers report this mission-critical calculation consumes on the order of 200,000 vCPUs, growing to between 400K and 800K vCPUs in order to meet the FRTB regulations. While there’s still some debate about the magnitude and frequency with which they’ll need to run this expanded calculation, the overall direction is clear.

Building a Big Grid
To make sure that we are ready to help our FSI customers meet these new regulations, we worked with TIBCO to set up and run a proof-of-concept grid in the AWS Cloud. The periodic nature of the calculation, along with the amount of processing power and storage needed to run it to completion within four hours, makes it a great fit for an environment where a vast amount of cost-effective compute power is available on demand.

Our customers are already using the TIBCO GridServer on-premises and want to use it in the cloud. This product is designed to run grids at enterprise scale. It runs apps in a virtualized fashion, and accepts requests for resources, dynamically provisioning them on an as-needed basis. The cloud version supports Amazon Linux as well as the PostgreSQL-compatible edition of Amazon Aurora.

Working together with TIBCO, we set out to create a grid substantially larger than the current high-end prediction of 800K vCPUs: we added a 50% safety factor and then rounded up, arriving at a target of 1.3 million vCPUs (5x the size of the largest on-premises grid). With that target in mind, the account limits were raised as follows:

  • Spot Instance Limit – 120,000
  • EBS Volume Limit – 120,000
  • EBS Capacity Limit – 2 PB
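
The sizing logic above can be checked with a few lines of arithmetic. This Python sketch uses only the figures in the post. The "minimum average vCPUs per instance" line is an added planning check rather than something the post describes: it shows how large each instance has to be, on average, for 1.3 million vCPUs to fit under a 120,000-instance limit.

```python
import math

predicted_peak_vcpus = 800_000  # high-end FRTB prediction quoted above
safety_factor = 1.5             # the 50% safety factor
spot_instance_limit = 120_000   # raised Spot Instance limit from the list above
target_vcpus = 1_300_000        # the rounded-up target chosen for the test

# 1,200,000 vCPUs before rounding up.
with_margin = predicted_peak_vcpus * safety_factor
# The post calls the target 5x the largest on-premises grid,
# which implies a grid of roughly 260,000 vCPUs.
implied_largest_on_prem = target_vcpus / 5
# Average vCPUs per instance (rounded up to a whole vCPU) needed to reach
# the target without exceeding the instance limit.
min_avg_vcpus_per_instance = math.ceil(target_vcpus / spot_instance_limit)

print(f"Prediction plus margin: {with_margin:,.0f} vCPUs")
print(f"Implied largest on-premises grid: about {implied_largest_on_prem:,.0f} vCPUs")
print(f"Minimum average vCPUs per instance: {min_avg_vcpus_per_instance}")
```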

If you plan to create a grid of this size, you should also bring your friendly local AWS Solutions Architect into the loop as early as possible. They will review your plans, provide you with architecture guidance, and help you to schedule your run.

Running the Grid
We hit the Go button and launched the grid, watching as it bid for and obtained Spot Instances, each of which booted, initialized, and joined the grid within two minutes. The test workload used the Strata open source analytics & market risk library from OpenGamma and was set up with their assistance.

The grid grew to 61,299 Spot Instances (1.3 million vCPUs drawn from 34 instance types spanning 3 generations of EC2 hardware) as planned. Just 1,937 instances were reclaimed and automatically replaced during the run. The grid cost $30,000 per hour to run, at an average hourly cost of $0.078 per vCPU. If the same instances had been used in On-Demand form, the hourly cost to run the grid would have been approximately $93,000.
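
Here is a short Python sketch that puts the run figures above in proportion. It uses only the numbers stated in the post and treats 1.3 million vCPUs as exact, although it is a rounded figure. The four-hour totals are hypothetical: they assume a run that uses the whole calculation window, which the post does not report.

```python
spot_instances = 61_299
reclaimed_instances = 1_937
grid_vcpus = 1_300_000              # rounded figure from the post
spot_cost_per_hour = 30_000         # USD
on_demand_cost_per_hour = 93_000    # USD, approximate

avg_vcpus_per_instance = grid_vcpus / spot_instances           # about 21.2
reclaim_rate = reclaimed_instances / spot_instances            # about 3.2%
savings_vs_on_demand = 1 - spot_cost_per_hour / on_demand_cost_per_hour  # about 67.7%

# Hypothetical: cost of a run that uses the full four-hour window.
window_hours = 4
four_hour_spot = window_hours * spot_cost_per_hour             # 120,000
four_hour_on_demand = window_hours * on_demand_cost_per_hour   # 372,000

print(f"Average vCPUs per instance: {avg_vcpus_per_instance:.1f}")
print(f"Instances reclaimed: {reclaim_rate:.1%}")
print(f"Savings versus On-Demand: {savings_vs_on_demand:.1%}")
print(f"Four-hour run: ${four_hour_spot:,} Spot vs ${four_hour_on_demand:,} On-Demand")
```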

Despite the scale of the grid, prices for the EC2 instances did not move during the bidding process. This is due to the overall size of the AWS Cloud and the smooth price change model that we launched late last year.

To give you a sense of the compute power, we computed that this grid would have taken the #1 position on the TOP 500 supercomputer list in November 2007 by a considerable margin, and the #2 position in June 2008. Today, it would occupy position #360 on the list.

I hope that you enjoyed this AWS success story, and that it gives you an idea of the scale that you can achieve in the cloud!

Jeff;