Tag Archives: AWS re:Invent

AWS Backup – Automate and Centrally Manage Your Backups

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-backup-automate-and-centrally-manage-your-backups/

AWS gives you the power to easily and dynamically create file systems, block storage volumes, relational databases, NoSQL databases, and other resources that store precious data. You can create them on a moment’s notice as the need arises, giving you access to as much storage as you need and opening the door to large-scale cloud migration. When you bring your sensitive data to the cloud, you need to make sure that you continue to meet business and regulatory compliance requirements, and you definitely want to make sure that you are protected against application errors.

While you can build your own backup tools using the built-in snapshot operations built in to many of the services that I listed above, creating an enterprise wide backup strategy and the tools to implement it still takes a lot of work. We are changing that.

New AWS Backup
AWS Backup is designed to help you automate and centrally manage your backups. You can create policy-driven backup plans, monitor the status of on-going backups, verify compliance, and find / restore backups, all using a central console. Using a combination of the existing AWS snapshot operations and new, purpose-built backup operations, Backup backs up EBS volumes, EFS file systems, RDS & Aurora databases, DynamoDB tables, and Storage Gateway volumes to Amazon Simple Storage Service (S3), with the ability to tier older backups to Amazon Glacier. Because Backup includes support for Storage Gateway volumes, you can include your existing, on-premises data in the backups that you create.

Each backup plan includes one or more backup rules. The rules express the backup schedule, frequency, and backup window. Resources to be backed-up can be identified explicitly or in a policy-driven fashion using tags. Lifecycle rules control storage tiering and expiration of older backups. Backup gathers the set of snapshots and the metadata that goes along with the snapshots into collections that define a recovery point. You get lots of control so that you can define your daily / weekly / monthly backup strategy, the ability to rest assured that your critical data is being backed up in accord with your requirements, and the ability to restore that data on an as-needed data. Backups are grouped into vaults, each encrypted by a KMS key.

Using AWS Backup
You can get started with AWS Backup in minutes. Open the AWS Backup Console and click Create backup plan:

I can build a plan from scratch, start from an existing plan or define one using JSON. I’ll Build a new plan, and start by giving my plan a name:

Now I create the first rule for my backup plan. I call it MainBackup, indicate that I want it to run daily, define the lifecycle (transition to cold storage after 1 month, expire after 6 months), and select the Default vault:

I can tag the recovery points that are created as a result of this rule, and I can also tag the backup plan itself:

I’m all set, so I click Create plan to move forward:

At this point my plan exists and is ready to run, but it has just one rule and does not have any resource assignments (so there’s nothing to back up):

Now I need to indicate which of my resources are subject to this backup plan I click Assign resources, and then create one or more resource assignments. Each assignment is named and references an IAM role that is used to create the recovery point. Resources can be denoted by tag or by resource ID, and I can use both in the same assignment. I enter all of the values and click Assign resources to wrap up:

The next step is to wait for the first backup job to run (I cheated by editing my backup window in order to get this post done as quickly as possible). I can peek at the Backup Dashboard to see the overall status:

Backups On Demand
I also have the ability to create a recovery point on demand for any of my resources. I choose the desired resource and designate a vault, then click Create an on-demand backup:

I indicated that I wanted to create the backup right away, so a job is created:

The job runs to completion within minutes:

Inside a Vault
I can also view my collection of vaults, each of which contains multiple recovery points:

I can examine see the list of recovery points in a vault:

I can inspect a recovery point, and then click Restore to restore my table (in this case):

I’ve shown you the highlights, and you can discover the rest for yourself!

Things to Know
Here are a couple of things to keep in mind when you are evaluating AWS Backup:

Services – We are launching with support for EBS volumes, RDS databases, DynamoDB tables, EFS file systems, and Storage Gateway volumes. We’ll add support for additional services over time, and welcome your suggestions. Backup uses the existing snapshot operations for all services except EFS file systems.

Programmatic Access – You can access all of the functions that I showed you above using the AWS Command Line Interface (CLI) and the AWS Backup APIs. The APIs are powerful integration points for your existing backup tools and scripts.

Regions – Backups work within the scope of a particular AWS Region, with plans in the works to enable several different types of cross-region functionality in 2019.

Pricing – You pay the normal AWS charges for backups that are created using the built-in AWS snapshot facilities. For Amazon EFS, there’s a low, per-GB charge for warm storage and an even lower charge for cold storage.

Available Now
AWS Backup is available now and you can start using it today!




Behind the Scenes & Under the Carpet – The CenturyLink Network that Powered AWS re:Invent 2018

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/behind-the-scenes-under-the-carpet-the-centurylink-network-that-powered-aws-reinvent-2018/

If you are a long-time reader, you may have already figured out that I am fascinated by the behind-the-scenes and beneath-the-streets activities that enable and power so much of our modern world. For example, late last year I told you how The AWS Cloud Goes Underground at re:Invent and shared some information about the communication and network infrastructure that was used to provide top-notch connectivity to re:Invent attendees and to those watching the keynotes and live streams from afar.

Today, with re:Invent 2018 in the rear-view mirror (and planning for next year already underway), I would like to tell you how 5-time re:Invent Network Services Provider CenturyLink designed and built a redundant, resilient network that used AWS Direct Connect to provide 180 Gbps of bandwidth and supported over 81,000 devices connected across eight venues. Above the ground, we worked closely with ShowNets to connect their custom network and WiFi deployment in each venue to the infrastructure provided by CenturyLink.

The 2018 re:Invent Network
This year, the network included diverse routes to multiple AWS regions, with a brand-new multi-node metro fiber ring that encompassed the Sands Expo, Wynn Resort, Circus Circus, Mirage, Vdara, Bellagio, Aria, and MGM Grand facilities. Redundant 10 Gbps connections to each venue and to multiple AWS Direct Connect locations were used to ensure high availability. The network was provisioned using CenturyLink Cloud Connect Dynamic Connections.

Here’s a network diagram (courtesy of CenturyLink) that shows the metro fiber ring and the connectivity:

The network did its job, and supported keynotes, live streams, breakout sessions, hands-on labs, hackathons, workshops, and certification exams. Here are the final numbers, as measured on-site at re:Invent 2018:

  • Live Streams – Over 60K views from over 100 countries.
  • Peak Data Transfer – 9.5 Gbps across six 10 Gbps connections.
  • Total Data Transfer – 160 TB.

Thanks again to our Managed Service Partner for building and running the robust network that supported our customers, partners, and employees at re:Invent!


AWS re:Invent Security Recap: Launches, Enhancements, and Takeaways

Post Syndicated from Stephen Schmidt original https://aws.amazon.com/blogs/security/aws-reinvent-security-recap-launches-enhancements-and-takeaways/

For more from Steve, follow him on Twitter

Customers continue to tell me that our AWS re:Invent conference is a winner. It’s a place where they can learn, meet their peers, and rediscover the art of the possible. Of course, there is always an air of anticipation around what new AWS service releases will be announced. This time around, we went even bigger than we ever have before. There were over 50,000 people in attendance, spread across the Las Vegas strip, with over 2,000 breakout sessions, and jam packed hands-on learning opportunities with multiple day hackathons, workshops, and bootcamps.

A big part of all this activity included sharing knowledge about the latest AWS Security, Identity and Compliance services and features, as well as announcing new technology that we’re excited to be adopted so quickly across so many use-cases.

Here are the top Security, Identity and Compliance releases from re:invent 2018:

Keynotes: All that’s new

New AWS offerings provide more prescriptive guidance

The AWS re:Invent keynotes from Andy Jassy, Werner Vogels, and Peter DeSantis, as well as my own leadership session, featured the following new releases and service enhancements. We continue to strive to make architecting easier for developers, as well as our partners and our customers, so they stay secure as they build and innovate in the cloud.

  • We launched several prescriptive security services to assist developers and customers in understanding and managing their security and compliance postures in real time. My favorite new service is AWS Security Hub, which helps you centrally manage your security and compliance controls. With Security Hub, you now have a single place that aggregates, organizes, and prioritizes your security alerts, or findings, from multiple AWS services, such as Amazon GuardDuty, Amazon Inspector, and Amazon Macie, as well as from AWS Partner solutions. Findings are visually summarized on integrated dashboards with actionable graphs and tables. You can also continuously monitor your environment using automated compliance checks based on the AWS best practices and industry standards your organization follows. Get started with AWS Security Hub with just a few clicks in the Management Console and once enabled, Security Hub will begin aggregating and prioritizing findings. You can enable Security Hub on a single account with one click in the AWS Security Hub console or a single API call.
  • Another prescriptive service we launched is called AWS Control Tower. One of the first things customers think about when moving to the cloud is how to set up a landing zone for their data. AWS Control Tower removes the guesswork, automating the set-up of an AWS landing zone that is secure, well-architected and supports multiple accounts. AWS Control Tower does this by using a set of blueprints that embody AWS best practices. Guardrails, both mandatory and recommended, are available for high-level, rule-based governance, allowing you to have the right operational control over your accounts. An integrated dashboard enables you to keep a watchful eye over the accounts provisioned, the guardrails that are enabled, and your overall compliance status. Sign up for the Control Tower preview, here.
  • The third prescriptive service, called AWS Lake Formation, will reduce your data lake build time from months to days. Prior to AWS Lake Formation, setting up a data lake involved numerous granular tasks. Creating a data lake with Lake Formation is as simple as defining where your data resides and what data access and security policies you want to apply. Lake Formation then collects and catalogs data from databases and object storage, moves the data into your new Amazon S3 data lake, cleans and classifies data using machine learning algorithms, and secures access to your sensitive data. Get started with a preview of AWS Lake Formation, here.
  • Next up, IoT Greengrass enables enhanced security through hardware root of trusted private key storage on hardware secure elements including Trusted Platform Modules (TPMs) and Hardware Security Modules (HSMs). Storing your private key on a hardware secure element adds hardware root of trust level-security to existing AWS IoT Greengrass security features that include X.509 certificates for TLS mutual authentication and encryption of data both in transit and at rest. You can also use the hardware secure element to protect secrets that you deploy to your AWS IoT Greengrass device using AWS IoT Greengrass Secrets Manager. To try these security enhancements for yourself, check out https://aws.amazon.com/greengrass/.
  • You can now use the AWS Key Management Service (KMS) custom key store feature to gain more control over your KMS keys. Previously, KMS offered the ability to store keys in shared HSMs managed by KMS. However, we heard from customers that their needs were more nuanced. In particular, they needed to manage keys in single-tenant HSMs under their exclusive control. With KMS custom key store, you can configure your own CloudHSM cluster and authorize KMS to use it as a dedicated key store for your keys. Then, when you create keys in KMS, you can choose to generate the key material in your CloudHSM cluster. Get started with KMS custom key store by following the steps in this blog post.
  • We’re excited to announce the release of ATO on AWS to help customers and partners speed up the FedRAMP approval process (which has traditionally taken SaaS providers up to 2 years to complete). We’ve already had customers, such as Smartsheet, complete the process in less than 90 days with ATO on AWS. Customers will have access to training, tools, pre-built CloudFormation templates, control implementation details, and pre-built artifacts. Additionally, customers are able to access direct engagement and guidance from AWS compliance specialists and support from expert AWS consulting and technology partners who are a part of our Security Automation and Orchestration (SAO) initiative, including GitHub, Yubico, RedHat, Splunk, Allgress, Puppet, Trend Micro, Telos, CloudCheckr, Saint, Center for Internet Security (CIS), OKTA, Barracuda, Anitian, Kratos, and Coalfire. To get started with ATO on AWS, contact the AWS partner team at [email protected].
  • Finally, I announced our first conference dedicated to cloud security, identity and compliance: AWS re:Inforce. The inaugural AWS re:Inforce, a hands-on gathering of like-minded security professionals, will take place in Boston, MA on June 25th and 26th, 2019 at the Boston Convention and Exhibition Center. The cost for a full conference pass will be $1,099. I’m hoping to see you all there. Sign up here to be notified of when registration opens.

Key re:Invent Takeaways

AWS is here to help you build

  1. Customers want to innovate, and cloud needs to securely enable this. Companies need to able to innovate to meet rapidly evolving consumer demands. This means they need cloud security capabilities they can rely on to meet their specific security requirements, while allowing them to continue to meet and exceed customer expectations. AWS Lake Formation, AWS Control Tower, and AWS Security Hub aggregate and automate otherwise manual processes involved with setting up a secure and compliant cloud environment, giving customers greater flexibility to innovate, create, and manage their businesses.
  2. Cloud Security is as much art as it is science. Getting to what you really need to know about your security posture can be a challenge. At AWS, we’ve found that the sweet spot lies in services and features that enable you to continuously gain greater depth of knowledge into your security posture, while automating mission critical tasks that relieve you from having to constantly monitor your infrastructure. This manifests itself in having an end-to-end automated remediation workflow. I spent some time covering this in my re:Invent session, and will continue to advocate using a combination of services, such as AWS Lambda, WAF, S3, AWS CloudTrail, and AWS Config to proactively identify, mitigate, and remediate threats that may arise as your infrastructure evolves.
  3. Remove human access to data. I’ve set a goal at AWS to reduce human access to data by 80%. While that number may sound lofty, it’s purposeful, because the only way to achieve this is through automation. There have been a number of security incidents in the news across industries, ranging from inappropriate access to personal information in healthcare, to credential stuffing in financial services. The way to protect against such incidents? Automate key security measures and minimize your attack surface by enabling access control and credential management with services like AWS IAM and AWS Secrets Manager. Additional gains can be found by leveraging threat intelligence through continuous monitoring of incidents via services such as Amazon GuardDuty, Amazon Inspector, and Amazon Macie (intelligence from these services will now be available in AWS Security Hub).
  4. Get your leadership on board with your security plan. We offer 500+ security services and features; however, new services and technology can’t be wholly responsible for implementing reliable security measures. Security teams need to set expectations with leadership early, aligning on a number of critical protocols, including how to restrict and monitor human access to data, patching and log retention duration, credential lifespan, blast radius reduction, embedded encryption throughout AWS architecture, and canaries and invariants for security functionality. It’s also important to set security Key Performance Indicators (KPIs) to continuously track. At AWS, we monitor the number of AppSec reviews, how many security checks we can automate, third-party compliance audits, metrics on internal time spent, and conformity with Service Level Agreements (SLAs). While the needs of your business may vary, we find baseline KPIs to be consistent measures of security assurance that can be easily communicated to leadership.

Final Thoughts

Queen’s famous lyric, “I want it all, I want it all, and I want it now,” accurately captures the sentiment at re:Invent this year. Security will always be job zero for us, and we continue to iterate on behalf of customers so they can securely build, experiment and create … right now! AWS is trusted by many of the world’s most risk-sensitive organizations precisely because we have demonstrated this unwavering commitment to putting security above all. Still, I believe we are in the early days of innovation and adoption of the cloud, and I look forward to seeing both the gains and use cases that come out of our latest batch of tools and services.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Steve Schmidt

Steve is Vice President and Chief Information Security Officer for AWS. His duties include leading product design, management, and engineering development efforts focused on bringing the competitive, economic, and security benefits of cloud computing to business and government customers. Prior to AWS, he had an extensive career at the Federal Bureau of Investigation, where he served as a senior executive and section chief. He currently holds five patents in the field of cloud security architecture. Follow Steve on Twitter

New – EC2 P3dn GPU Instances with 100 Gbps Networking & Local NVMe Storage for Faster Machine Learning + P3 Price Reduction

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-ec2-p3dn-gpu-instances-with-100-gbps-networking-local-nvme-storage-for-faster-machine-learning-p3-price-reduction/

Late last year I told you about Amazon EC2 P3 instances and also spent some time discussing the concept of the Tensor Core, a specialized compute unit that is designed to accelerate machine learning training and inferencing for large, deep neural networks. Our customers love P3 instances and are using them to run a wide variety of machine learning and HPC workloads. For example, fast.ai set a speed record for deep learning, training the ResNet-50 deep learning model on 1 million images for just $40.

Raise the Roof
Today we are expanding the P3 offering at the top end with the addition of p3dn.24xlarge instances, with 2x the GPU memory and 1.5x as many vCPUs as p3.16xlarge instances. The instances feature 100 Gbps network bandwidth (up to 4x the bandwidth of previous P3 instances), local NVMe storage, the latest NVIDIA V100 Tensor Core GPUs with 32 GB of GPU memory, NVIDIA NVLink for faster GPU-to-GPU communication, AWS-custom Intel® Xeon® Scalable (Skylake) processors running at 3.1 GHz sustained all-core Turbo, all built atop the AWS Nitro System. Here are the specs:4

Model NVIDIA V100 Tensor Core GPUs GPU Memory NVIDIA NVLink vCPUs Main Memory Local Storage Network Bandwidth EBS-Optimized Bandwidth
p3dn.24xlarge 8 256 GB 300 GB/s 96 768 GiB 2 x 900 GB NVMe SSD 100 Gbps 14 Gbps

If you are doing large-scale training runs using MXNet, TensorFlow, PyTorch, or Keras, be sure to check out the Horovod distributed training framework that is included in the Amazon Deep Learning AMIs. You should also take a look at the new NVIDIA AI Software containers in the AWS Marketplace; these containers are optimized for use on P3 instances with V100 GPUs.

With a total of 256 GB of GPU memory (twice as much as the largest of the current P3 instances), the p3dn.24xlarge allows you to explore bigger and more complex deep learning algorithms. You can rotate and scale your training images faster than ever before, while also taking advantage of the Intel AVX-512 instructions and other leading-edge Skylake features. Your GPU code can scale out across multiple GPUs and/or instances using NVLink and the NVLink Collective Communications Library (NCCL). Using NCCL will also allow you to fully exploit the 100 Gbps of network bandwidth that is available between instances when used within a Placement Group.

In addition to being a great fit for distributed machine learning training and image classification, these instances provide plenty of power for your HPC jobs. You can render 3D images, transcode video in real time, model financial risks, and much more.

You can use existing AMIs as long as they include the ENA, NVMe, and NVIDIA drivers. You will need to upgrade to the latest ENA driver to get 100 Gbps networking; if you are using the Deep Learning AMIs, be sure to use a recent version that is optimized for AVX-512.

Available Today
The p3dn.24xlarge instances are available now in the US East (N. Virginia) and US West (Oregon) Regions and you can start using them today in On-Demand, Spot, and Reserved Instance form.

Bonus – P3 Price Reduction
As part of today’s launch we are also reducing prices for the existing P3 instances. The following prices went in to effect on December 6, 2018:

  • 20% reduction for all prices (On-Demand and RI) and all instance sizes in the Asia Pacific (Tokyo) Region.
  • 15% reduction for all prices (On-Demand and RI) and all instance sizes in the Asia Pacific (Sydney), Asia Pacific (Singapore), and Asia Pacific (Seoul) Regions.
  • 15% reduction for Standard RIs with a three-year term for all instance sizes in all regions except Asia Pacific (Tokyo), Asia Pacific (Sydney), Asia Pacific (Singapore), and Asia Pacific (Seoul).

The percentages apply to instances running Linux; slightly smaller percentages apply to instances that run Microsoft Windows and other operating systems.

These reductions will help to make your machine learning training and inferencing even more affordable, and are being brought to you as we pursue our goal of putting machine learning in the hands of every developer.




New – AWS Well-Architected Tool – Review Workloads Against Best Practices

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-aws-well-architected-tool-review-workloads-against-best-practices/

Back in 2015 we launched the AWS Well-Architected Framework and I asked Are You Well-Architected? The framework includes five pillars that encapsulate a set of core strategies and best practices for architecting systems in the cloud:

Operational Excellence – Running and managing systems to deliver business value.

Security – Protecting information and systems.

Reliability – Preventing and quickly recovering from failures.

Performance Efficiency – Using IT and compute resources efficiently.

Cost Optimization – Avoiding un-needed costs.

I think of it as a way to make sure that you are using the cloud right, and that you are using it well.

AWS Solutions Architects (SA) work with our customers to perform thousands of Well-Architected reviews every year! Even at that pace, the demand for reviews always seems to be a bit higher than our supply of SAs. Our customers tell us that the reviews are of great value and use the results to improve their use of AWS over time.

New AWS Well-Architected Tool
In order to make the Well-Architected reviews open to every AWS customer, we are introducing the AWS Well-Architected Tool. This is a self-service tool that is designed to help architects and their managers to review AWS workloads at any time, without the need for an AWS Solutions Architect.

The AWS Well-Architected Tool helps you to define your workload, answer questions designed to review the workload against the best practices specified by the five pillars, and to walk away with a plan that will help you to do even better over time. The review process includes educational content that focuses on the most current set of AWS best practices.

Let’s take a quick tour…

AWS Well-Architected Tool in Action
I open the AWS Well-Architected Tool Console and click Define workload to get started:

I begin by naming and defining my workload. I choose an industry type and an industry, list the regions where I operate, indicate if this is a pre-production or production workload, and optionally enter a list of AWS account IDs to define the span of the workload. Then I click Define workload to move ahead:

I am ready to get started, so I click Start review:

The first pillar is Operational Excellence. There are nine questions, each with multiple-choice answers. Helpful resources are displayed on the side:

I can go through the pillars and questions in order, save and exit, and so forth. After I complete my review, I can consult the improvement plan for my workload:

I can generate a detailed PDF report that summarizes my answers:

I can review my list of workloads:

And I can see the overall status in the dashboard:

Available Now
The AWS Well-Architected Tool is available now and you can start using it today for workloads in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions at no charge.


New – Compute, Database, Messaging, Analytics, and Machine Learning Integration for AWS Step Functions

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-compute-database-messaging-analytics-and-machine-learning-integration-for-aws-step-functions/

AWS Step Functions is a fully managed workflow service for application developers. You can think & work at a high level, connecting and coordinating activities in a reliable and repeatable way, while keeping your business logic separate from your workflow logic. After you design and test your workflows (which we call state machines), you can deploy them at scale, with tens or even hundreds of thousands running independently and concurrently. Step Functions tracks the status of each workflow, takes care of retrying activities on transient failures, and also simplifies monitoring and logging. To learn more, step through the Create a Serverless Workflow with AWS Step Functions and AWS Lambda tutorial.

Since our launch at AWS re:Invent 2016, our customers have made great use of Step Functions (my post, Things go Better with Step Functions describes a real-world use case). Our customers love the fact that they can easily call AWS Lambda functions to implement their business logic, and have asked us for even more options.

More Integration, More Power
Today we are giving you the power to use eight more AWS services from your Step Function state machines. Here are the new actions:

DynamoDB – Get an existing item from an Amazon DynamoDB table; put a new item into a DynamoDB table.

AWS Batch – Submit a AWS Batch job and wait for it to complete.

Amazon ECS – Run an Amazon ECS or AWS Fargate task using a task definition.

Amazon SNS – Publish a message to an Amazon Simple Notification Service (SNS) topic.

Amazon SQS – Send a message to an Amazon Simple Queue Service (SQS) queue.

AWS Glue – Start a AWS Glue job run.

Amazon SageMaker – Create an Amazon SageMaker training job; create a SageMaker transform job (learn more by reading New Features for Amazon SageMaker: Workflows, Algorithms, and Accreditation).

You can use these actions individually or in combination with each other. To help you get started, we’ve built some cool samples that will show you how to manage a batch job, manage a container task, copy data from DynamoDB, retrieve the status of a Batch job, and more. For example, here’s a visual representation of the sample that copies data from DynamoDB to SQS:

The sample (available to you as an AWS CloudFormation template) creates all of the necessary moving parts including a Lambda function that will populate (seed) the table with some test data. After I create the stack I can locate the state machine in the Step Functions Console and execute it:

I can inspect each step in the console; the first one (Seed the DynamoDB Table) calls a Lambda function that creates some table entries and returns a list of keys (message ids):

The third step (Send Message to SQS) starts with the following input:

And delivers this output, including the SQS MessageId:

As you can see, the state machine took care of all of the heavy lifting — calling the Lambda function, iterating over the list of message IDs, and calling DynamoDB and SQS for each one. I can run many copies at the same time:

I’m sure you can take this example as a starting point and build something awesome with it; be sure to check out the other samples and templates for some ideas!

If you are already building and running your own state machines, you should know about Magic ARNs and Parameters:

Magic ARNs – Each of these new operations is represented by a special “magic” (that’s the technical term Tim used) ARN. There’s one for sending to SQS, another one for running a batch job, and so forth.

Parameters – You can use the Parameters field in a Task state to control the parameters that are passed to the service APIs that implement the new functions. Your state machine definitions can include static JSON or references (in JsonPath form) to specific elements in the state input.

Here’s how the Magic ARNs and Parameters are used to define a state:

   "Read Next Message from DynamoDB": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:getItem",
      "Parameters": {
        "TableName": "StepDemoStack-DDBTable-1DKVAVTZ1QTSH",
        "Key": {
          "MessageId": {"S.$": "$.List[0]"}
      "ResultPath": "$.DynamoDB",
      "Next": "Send Message to SQS"

Available Now
The new integrations are available now and you can start using them today in all AWS Regions where Step Functions are available. You pay the usual charge for each state transition and for the AWS services that you consume.


New – Hibernate Your EC2 Instances

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-hibernate-your-ec2-instances/

As you know, you can easily build highly scalable AWS applications that launch fresh EC2 instances on an as-needed basis. While the instances can be up and running in a matter of seconds, booting the operating system and the application can take considerable time. Also, caches and other memory-centric application components can take some time (sometimes tens of minutes) to preload or warm up. Both of these factors impose a delay that can force you to over-provision in case you need incremental capacity very quickly.

Hibernation for EC2 Instances
Today we are giving you the ability to launch EC2 instances, set them up as desired, hibernate them, and then bring them back to life when you need them. The hibernation process stores the in-memory state of the instance, along with its private and elastic IP addresses, allowing it to pick up exactly where it left off.

This feature is available today and you can use it on freshly launched M3, M4, M5, C3, C4, C5, R3, R4, and R5 instances running Amazon Linux 1 (support for Amazon Linux 2 is in the works and will be ready soon). It applies to On-Demand instances and instances running with Reserved Instance coverage.

When an instance is instructed to hibernate, it writes the in-memory state to a file in the root EBS volume and then (in effect) shuts itself down. The AMI used to launch the instance must be encrypted, as must the root EBS volume of the instance. The encryption ensures proper protection for sensitive data when it is copied from memory to the EBS volume.

While the instance is in hibernation, you pay only for the EBS volumes and Elastic IP Addresses attached to it; there are no other hourly charges (just like any other stopped instance).

Hibernation in Action
In order to check out this feature I launch a c4.large instance, and select hibernation as a stop behavior:

I also expand my instance’s root volume, adding 10 GB + the memory size of the instance to the desired size:

I also create and associate an Elastic IP address with my instance since the public IP address will change. My instance is up and running, and I can check the uptime:

Then I select the instance in the EC2 Console and choose Stop – Hibernate from the Instance State menu (API and CLI support is also available):

The instance state transitions from running to stopping, and then to stopped, in seconds:

The console provides additional information about the transition:

The SSH connection to the instance drops, since it is no longer running:

Later, when I am ready to proceed, I click Start:

This time the state goes from stopped to pending, and then to running, again in seconds, and I can reconnect. I can then use uptime to see that the instance has not been rebooted, but has continued from where it left off:

If I was using this instance interactively, I could use a session manager such as screen, tmux, or mosh to make this totally seamless. The most interesting use cases for hibernation revolve around long-running processes and services that take a lot of time to initialize before they are ready to accept traffic where this would not be a concern.

Things to Know
As you can see, hibernation is really easy to use, and I hope that you are already thinking of some ways to apply it to your application. Here are a couple of things to keep in mind:

Instance Type – You can enable and use hibernation on freshly launches instances of the types that I listed above.

Root Volume Size – The root volume must have free space equal to the amount of RAM on the instance in order for the hibernation to succeed.

Operating Systems – The newest Amazon Linux 1 AMIs are configured for hibernation, with many others in the works. You will need to create an encrypted AMI, using one of these AMIs as a base. You can also follow our directions to customize and use your own AMI.

Modifications – You cannot modify the instance size or type while it is in hibernation, but you can modify the user data and the EBS Optimization setting.

Pricing – While the instance is in hibernation, you pay only for the EBS storage and any Elastic IP addresses attached to the instance.

Performance – The time to hibernate or resume is dependent on the memory size of the instance, the amount of in-memory data to be saved, and the throughput of the root EBS volume.

Coming Soon – We are working on support for Amazon Linux 2, Ubuntu, Windows Server 2008 R2, Windows Server 2012, Windows Server 2012 R2, Windows Server 2016, along with the SQL Server variants of the Windows AMIs.

Available Now
This feature is available now in the US East (N. Virginia, Ohio), US West (N. California, Oregon), Canada (Central), South America (São Paulo), Asia Pacific (Mumbai, Seoul, Singapore, Sydney, Tokyo), and EU (Frankfurt, London, Ireland, Paris) Regions.


New AWS License Manager – Manage Software Licenses and Enforce Licensing Rules

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-aws-license-manager-manage-software-licenses-and-enforce-licensing-rules/

When you make use of commercial, licensed software in the AWS Cloud using a BYOL (Bring Your Own License) strategy, you need to make sure that you stay within the provisions of the license, while also avoiding expensive over-provisioning. This can be a challenge when it is so easy to launch instances on demand whenever you need them!

New AWS License Manager
Today we are launching AWS License Manager. You can define your licensing rules, taking in to account any enterprise agreements and other terms that govern your use of the licensed software. Then you associate them with your deployment mechanism (golden AMIs or Launch Templates) so that EC2 instances launched via the mechanism will be automatically tracked. You can also discover existing usage across one or more AWS accounts, and track all usage through the AWS Management Console.

Let’s take a quick tour, assuming that I own a 100-vCPU license for an enterprise database server.

The first step is to define one or more License Configurations. I open the License Manager Console and click Create license configuration to get started:

I enter a name and description for my configuration, indicate that the license is based on vCPUs (and limited to 100), and that I want to enforce the license:

I can also create rules for the license. The rules control the applicability of the license with respect to this configuration. I can specify a minimum and/or maximum number of vCPUs, and any desired EC2 tenancy (shared, dedicated host, or dedicated instance). Here’s a rule that specifies 4-64 vCPUs, and shared tenancy:

I confirm that the rule is defined as desired, and click Submit to move ahead. My license configuration is ready, as are some others created by colleagues:

After I create my license configuration, I can associate it with an AMI by selecting the configuration and clicking Associate AMI in the Actions menu. I pick one or more AMIs and click Associate:

I can see my overall license usage at a glance (this is a central dashboard that works across multiple accounts and in conjunction with AWS Organizations):

I can click Settings to link to my AWS Organizations accounts, set up a cross-account inventory search and arrange to receive SNS alerts when the usage limit for a license has been breached:

Going Further
Here are a couple of other things to know about AWS License Manager:

Supported License TypesAWS License Manager supports any license based on vCPUs, physical cores, and physical sockets, and is not tied to any software vendor.

Cross-Account UsageAWS License Manager works hand-in-glove with AWS Organizations. You can sign it to your Master account, link all of the accounts with a click, and share license configurations across your Organization. You will be able to use the dashboard to see an Organization-wide view of your license usage.

Multi-Account Software DiscoveryAWS License Manager also works with AWS Systems Manager, and works across accounts within an Organization. The discovered data is stored in an S3 bucket and an Amazon Athena database (encrypted in both places), and is processed by a AWS Glue job.

Programmatic Access – You can create and manage license configurations from the Console, APIs, or the AWS Command Line Interface (CLI). Interesting functions include CreateLicenseConfiguration, GetLicenseConfiguration, ListResourceInventory, and ListUsageForLicenseConfiguration.

Pricing – You can use AWS License Manager at no charge. Behind the scenes, AWS License Manager stores inventory data in an S3 bucket and an Amazon Athena database, and processes it using a AWS Glue job. You’ll pay the usual AWS prices for these resources and services.

Available Now
AWS License Manager is available now and you can start using it today in the US East (N. Virginia), US West (Oregon), US East (Ohio), Europe (Ireland), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), Europe (Frankfurt), Asia Pacific (Seoul), Asia Pacific (Mumbai), and Europe (London) Regions.

AWS Launches, Previews, and Pre-Announcements at re:Invent 2018 – Andy Jassy Keynote

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-previews-and-pre-announcements-at-reinvent-2018-andy-jassy-keynote/

As promised in Welcome to AWS re:Invent 2018, here’s a summary of the launches, previews, and pre-announcements from Andy Jassy’s keynote. I have included links to allow you to sign up for previews, as appropriate.

(photo from AWS Community Hero Eric Hammond)

Here are the blog posts that we wrote for today’s launches:

S3 Glacier Deep Archive
This new storage class for Amazon Simple Storage Service (S3) is designed for long-term data archival and is the lowest cost storage from any cloud provider. Priced from just $0.00099/GB-mo (less than one-tenth of one cent, or $1.01 per TB-mo), the cost is comparable to tape archival services. Data can be retrieved in 12 hours or less, and there will also be a bulk retrieval option that will allow you to inexpensively retrieve even petabytes of data within 48 hours.

Control Tower
This service helps you automate the set up a well-architected multi-account AWS environment using a set of blueprints that embody AWS best practices. Guardrails, both mandatory and recommended, are available for high-level, rule-based governance. You will have access to an integrated dashboard so that you can keep a watchful eye over the accounts provisioned, the guardrails that are enabled, and your overall compliance status. Learn more.

Amazon Textract
This Optical Character Recognition (OCR) service will help you to extract text and data from virtually any document. Powered by Machine Learning, it will identify bounding boxes, detect key-value pairs, and make sense of tables, while eliminating manual effort and lowering your document-processing costs. Sign up for the preview.

AWS Outposts
This service will bring AWS to your existing data center, providing a consistent, seamless experience across on-premises and the cloud, and giving you the ability to run on-premises applications with the exact same Application Programming Interfaces (APIs), consoles, features, hardware, and tools that you use on AWS. Sign up for the preview.

Amazon RDS on VMware
This is a fully managed service for on-premises databases. You can set up, run, and scale databases in VMware vSphere using the same tools already enjoyed by hundreds of thousands of Amazon Relational Database Service (RDS) customers. You can build low-cost high-availability hybrid environments, implement disaster recovery to AWS, and do long-term archival in Amazon Simple Storage Service (S3). Sign up for the preview!

Amazon Quantum Ledger Database
This fully managed ledger database will allow you to track and verify the complete history of changes to your application data. It uses an immutable journal that maintains a sequenced, cryptographically verifiable record of all changes that cannot be deleted or modified. It is scalable and easy to use, supports SQL queries, and lets it run 2-3x faster than common blockchain frameworks. Sign up for the preview.

AWS Managed Blockchain
This fully managed ledger database will allow you to track and verify the complete history of changes to your application data. It uses an immutable journal that maintains a sequenced, cryptographically verifiable record of all changes that cannot be deleted or modified. It is scalable and easy to use, supports SQL queries, and lets it run 2-3x faster than common blockchain frameworks. Sign up for the preview.

Amazon Timestream
This a fast, scalable, fully managed time-series database that you can use to store and analyze trillions of events per day at 1/10th the cost of a relational database. It is optimized for data that arrives in time order and for queries that include a time interval. It is a great fit for IoT, industrial telemetry, app monitoring, and DevOps data. Timestream automates rollups, retention, tiering, and compression so time-series data can be efficiently stored and processed. Timestream’s query engine adapts to the location and format of data making it easier and faster to query time-series data. Learn more.

AWS Lake Formation
This fully managed service will help you to build, secure, and manage a data lake. You’ll be able to point it at your data sources, have it crawl the sources, and pull the data into Amazon Simple Storage Service (S3). Lake Formation uses Machine Learning to identify and de-duplicate data, and also performs format changes in order to accelerate analytical processing. You will also be able to define and centrally manage consistent security policies across your data lake and the services that you use to analyze and process the data. Sign up for the preview.

AWS Security Hub
This service will allow you to to centrally view & manage security alerts and automate compliance checks within and across AWS accounts. It will aggregate security findings from AWS and partner services and present you with built-in and customizable insights that are unique to your environment. Try the preview!

Stay Tuned
I am looking forward to writing about each of these services when they are ready to launch, so stay tuned!



Amazon SageMaker Neo – Train Your Machine Learning Models Once, Run Them Anywhere

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-sagemaker-neo-train-your-machine-learning-models-once-run-them-anywhere/

Machine learning (ML) is split in two distinct phases: training and inference. Training deals with building the model, i.e. running a ML algorithm on a dataset in order to identify meaningful patterns. This often requires large amounts of storage and computing power, making the cloud a natural place to train ML jobs with services such as Amazon SageMaker and the AWS Deep Learning AMIs.

Inference deals with using the model, i.e. predicting results for data samples that the model has never seen. Here, the requirements are different: developers are typically concerned with optimizing latency (how long does a single prediction take?) and throughput (how many predictions can I run in parallel?). Of course, the hardware architecture of your prediction environment has a very significant impact on such metrics, especially if you’re dealing with resource-constrained devices: as a Raspberry Pi enthusiast, I often wish the little fellow packed a little more punch to speed up my inference code.

Tuning a model for a specific hardware architecture is possible, but the lack of tooling makes this an error-prone and time-consuming process. Minor changes to the ML framework or the model itself usually require the user to start all over again. Unfortunately, this forces most ML developers to deploy the same model everywhere regardless of the underlying hardware, thus missing out on significant performance gains.

Well, no more. Today, I’m very happy to announce Amazon SageMaker Neo, a new capability of Amazon SageMaker that enables machine learning models to train once and run anywhere in the cloud and at the edge with optimal performance.

Introducing Amazon SageMaker Neo

Without any manual intervention, Amazon SageMaker Neo optimizes models deployed on Amazon EC2 instances, Amazon SageMaker endpoints and devices managed by AWS Greengrass.

Here are the supported configurations:

  • Frameworks and algorithms: TensorFlow, Apache MXNet, PyTorch, ONNX, and XGBoost.
  • Hardware architectures: ARM, Intel, and NVIDIA starting today, with support for Cadence, Qualcomm, and Xilinx hardware coming soon. In addition, Amazon SageMaker Neo is released as open source code under the Apache Software License, enabling hardware vendors to customize it for their processors and devices.

The Amazon SageMaker Neo compiler converts models into an efficient common format, which is executed on the device by a compact runtime that uses less than one-hundredth of the resources that a generic framework would traditionally consume. The Amazon SageMaker Neo runtime is optimized for the underlying hardware, using specific instruction sets that help speed up ML inference.

This has three main benefits:

  • Converted models perform at up to twice the speed, with no loss of accuracy.
  • Sophisticated models can now run on virtually any resource-limited device, unlocking innovative use cases like autonomous vehicles, automated video security, and anomaly detection in manufacturing.
  • Developers can run models on the target hardware without dependencies on the framework.

Under the hood

Most machine learning frameworks represent a model as a computational graph: a vertex represents an operation on data arrays (tensors) and an edge represents data dependencies between operations. The Amazon SageMaker Neo compiler exploits patterns in the computational graph to apply high-level optimizations including operator fusion, which fuses multiple small operations together; constant-folding, which statically pre-computes portions of the graph to save execution costs; a static memory planning pass, which pre-allocates memory to hold each intermediate tensor; and data layout transformations, which transform internal data layouts into hardware-friendly forms. The compiler then produces efficient code for each operator.

Once a model has been compiled, it can be run by the Amazon SageMaker Neo runtime. This runtime takes about 1MB of disk space, compared to the 500MB-1GB required by popular deep learning libraries. An application invokes a model by first loading the runtime, which then loads the model definition, model parameters, and precompiled operations.

I can’t wait to try this on my Raspberry Pi. Let’s get to work.

Downloading a pre-trained model

Plenty of pre-trained models are available in the Apache MXNet, Gluon CV or TensorFlow model zoos: here, I’m using a 50-layer model based on the ResNet architecture, pre-trained with Apache MXNet on the ImageNet dataset.

First, I’m downloading the 227MB model as well as the JSON file defining its different layers. This file is particularly important: it tells me that the input symbol is called ‘data’ and that its shape is [1, 3, 224, 224], i.e. 1 image, 3 channels (red, green and blue), 224×224 pixels. I’ll need to make sure that images passed to the model have this exact shape. The output shape is [1, 1000], i.e. a vector containing the probability for each one of the 1,000 classes present in the ImageNet dataset.

To define a performance baseline, I use this model and a vanilla unoptimized version of Apache MXNet 1.2 to predict a few images: on average, inference takes about 6.5 seconds and requires about 306 MB of RAM.

That’s pretty slow: let’s compile the model and see how fast it gets.

Compiling the model for the Raspberry Pi

First, let’s store both model files in a compressed TAR archive and upload it to an Amazon S3 bucket.

$ tar cvfz model.tar.gz resnet50_v1-symbol.json resnet50_v1-0000.params
a resnet50_v1-symbol.json
a resnet50_v1-0000.paramsresnet50_v1-0000.params
$ aws s3 cp model.tar.gz s3://jsimon-neo/
upload: ./model.tar.gz to s3://jsimon-neo/model.tar.gz

Then, I just have to write a simple configuration file for my compilation job. If you’re curious about other frameworks and hardware targets, ‘aws sagemaker create-compilation-job help‘ will give you the exact syntax to use.

    "CompilationJobName": "resnet50-mxnet-raspberrypi",
    "InputConfig": {
        "S3Uri": "s3://jsimon-neo/model.tar.gz",
        "DataInputConfig": "{\"data\": [1, 3, 224, 224]}",
        "Framework": "MXNET"
    "OutputConfig": {
        "S3OutputLocation": "s3://jsimon-neo/",
        "TargetDevice": "rasp3b"
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 300

Launching the compilation process takes a single command.

$ aws sagemaker create-compilation-job --cli-input-json file://job.json

Compilation is complete in seconds. Let’s figure out the name of the compilation artifact, fetch it from Amazon S3 and extract it locally

$ aws sagemaker describe-compilation-job \
--compilation-job-name resnet50-mxnet-raspberrypi \
--query "ModelArtifacts"
"S3ModelArtifacts": "s3://jsimon-neo/model-rasp3b.tar.gz"
$ aws s3 cp s3://jsimon-neo/model-rasp3b.tar.gz .
$ tar xvfz model-rasp3b.tar.gz
x compiled.params
x compiled_model.json
x compiled.so

As you can see, the artifact contains:

  • The original model and symbol files.
  • A shared object file storing compiled, hardware-optimized, operators used by the model.

For convenience, let’s rename them to ‘model.params’, ‘model.json’ and ‘model.so’, and then copy them on the Raspberry pi in a ‘resnet50’ directory.

$ mkdir resnet50
$ mv compiled.params resnet50/model.params
$ mv compiled_model.json resnet50/model.json
$ mv compiled.so resnet50/model.so
$ scp -r resnet50 [email protected]:~

Setting up the inference environment on the Raspberry Pi

Before I can predict images with the model, I need to install the appropriate runtime on my Raspberry Pi. Pre-built packages are available [neopackages]: I just have to download the one for ‘armv7l’ architectures and to install it on my Pi with the provided script. Please note that I don’t need to install any additional deep learning framework (Apache MXNet in this case), saving up to 1GB of persistent storage.

$ scp -r dlr-1.0-py2.py3-armv7l [email protected]:~
<ssh to the Pi>
$ cd dlr-1.0-py2.py3-armv7l
$ sh ./install-py3.sh

We’re all set. Time to predict images!

Using the Amazon SageMaker Neo runtime

On the Pi, the runtime is available as a Python package named ‘dlr’ (deep learning runtime). Using it to predict images is what you would expect:

  • Load the model, defining its input and output symbols.
  • Load an image.
  • Predict!

Here’s the corresponding Python code.

import os
import numpy as np
from dlr import DLRModel

# Load the compiled model
input_shape = {'data': [1, 3, 224, 224]} # A single RGB 224x224 image
output_shape = [1, 1000]                 # The probability for each one of the 1,000 classes
device = 'cpu'                           # Go, Raspberry Pi, go!
model = DLRModel('resnet50', input_shape, output_shape, device)

# Load names for ImageNet classes
synset_path = os.path.join(model_path, 'synset.txt')
with open(synset_path, 'r') as f:
    synset = eval(f.read())

# Load an image stored as a numpy array
image = np.load('dog.npy').astype(np.float32)
input_data = {'data': image}

# Predict 
out = model.run(input_data)
top1 = np.argmax(out[0])
prob = np.max(out)
print("Class: %s, probability: %f" % (synset[top1], prob))

Let’s give it a try on this image. Aren’t chihuahuas and Raspberry Pis made for one another?

(1, 3, 224, 224)
Class: Chihuahua, probability: 0.901803

The prediction is correct, but what about speed and memory consumption? Well, this prediction takes about 0.85 second and requires about 260MB of RAM: with Amazon SageMaker Neo, it’s now 5 times faster and 15% more RAM-efficient than with a vanilla model.

This impressive performance gain didn’t require any complex and time-consuming work: all we had to do was to compile the model. Of course, your mileage will vary depending on models and hardware architectures, but you should see significant improvements across the board, including on Amazon EC2 instances such as the C5 or P3 families.

Now available

I hope this post was informative. Compiling models with Amazon SageMaker Neo is free of charge, you will only pay for the underlying resource using the model (Amazon EC2 instances, Amazon SageMaker instances and devices managed by AWS Greengrass).

The service is generally available today in US-East (N. Virginia), US-West (Oregon) and Europe (Ireland). Please start exploring and let us know what you think. We can’t wait to see what you will build!


Amazon Personalize – Real-Time Personalization and Recommendation for Everyone

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-personalize-real-time-personalization-and-recommendation-for-everyone/

Machine learning definitely offers a wide range of exciting topics to work on, but there’s nothing quite like personalization and recommendation.

At first glance, matching users to items that they may like sounds like a simple problem. However, the task of developing an efficient recommender system is challenging. Years ago, Netflix even ran a movie recommendation competition with a $1 Million award! Indeed, building, optimizing and deploying real-time personalization today requires specialized expertise in analytics, applied machine learning, software engineering, and systems operations. Few organizations have the knowledge, skills, and experience to overcome these challenges, and they either abandon the idea of using recommendation or build under-performing models.

For over 20 years, Amazon.com has built recommender systems at scale, integrating personalized recommendations across the buying experience – from product discovery to checkout.

To help all AWS customers do the same, we are very happy to announce Amazon Personalize, a fully-managed service that puts personalization and recommendation in the hands of developers with little machine learning experience.

Introducing Amazon Personalize

How does Amazon Personalize simplify personalization and recommendation? As explained in a previous blog post, you could already build recommendation models on Amazon SageMaker using algorithms such as Factorization Machines. However, it’s fair to say that this requires extensive data preparation and expert tuning in order to get good results.

Creating a recommendation model with Amazon Personalize is much simpler. Using AutoML, a new process that automates complex machine learning tasks, Personalize performs and accelerates the difficult work required to design, train, and deploy a machine learning model.

Amazon Personalize supports both datasets stored in Amazon S3 and streaming data sets, e.g. events sent in real-time from a JavaScript tracker or server-side. The high-level process looks like this:

  1. Create a schema describing the dataset, using Personalize-reserved keywords for user ids, item ids, etc.
  2. Create a dataset group that contains datasets used for building the model and for predicting: user-item interactions (aka “who liked what”), users and items. The last two are optional, as we will see in the example below.
  3. Send data to Personalize.
  4. Create a solution, i.e. select a recommendation recipe and train it on the dataset group.
  5. Create a campaign to predict new samples.

With data stored in Amazon S3, sending data to Personalize simply means adding your data files to the dataset group. Ingestion is triggered automatically.

Working with streaming data is different. One way to send events would be to use the AWS Amplify JavaScript library, which is integrated with the event tracking service in Personalize. Another way would be to send them server-side via the AWS SDK in your favourite language: ingestion can happen from any source with the code hosted inside of AWS (e.g. in Amazon EC2 or AWS Lambda) or outside.

Time for an example. Let’s build a solution based on the MovieLens dataset!

The MovieLens dataset

MovieLens is a well-known dataset storing movies recommendations. It comes in different sizes and formats: here, we will use ml-20m, which contains 20 million ratings applied to 27,000 movies by 138,000 users.

This dataset contains a file named ‘ratings.csv’ storing user-item interactions. The first lines look like this.


It reads like this: user 1 gave movie 2 a 3.5 rating. Same for movies 29, 32, 47, 50 and so on! This is exactly what we need to build a recommendation model. Let’s get to work.

Creating a schema for the dataset

The first step is to create an Avro schema for this dataset. This is pretty straightforward, we just need to use some of the keywords defined in Amazon Personalize.

{"type": "record", 
"name": "Interactions", 
"namespace": "com.amazonaws.personalize.schema",
    {"name": "ITEM_ID", "type": "string"},
    {"name": "USER_ID", "type": "string"},
    {"name": "TIMESTAMP", "type": "long"}
"version": "1.0"}

Preparing the dataset

Once we’ve downloaded and unzipped the dataset, let’s load the ‘ratings.csv’ file and apply the following processing:

  • Shuffle reviews.
  • Keep only movies rated 4 and above, and drop the ratings columns: we just want our model to recommend movies that users should really like.
  • Rename columns to the names used in the schema.
  • Keep only 100,000 interactions to minimize training time (this is just a demo after all!).

All of this is easily achieved with the Pandas Python library, the Swiss Army knife for columnar data processing. While we’re at it, we’ll also upload the processed file to an Amazon S3 bucket.

import pandas, boto3 
from sklearn.utils import shuffle
ratings = pandas.read_csv('ratings.csv')
ratings = shuffle(ratings)
ratings = ratings[ratings['rating']>3.6]
ratings = ratings.drop(columns='rating')
ratings.columns = ['USER_ID','ITEM_ID','TIMESTAMP']
ratings = ratings[:100000]
s3 = boto3.client('s3')

Creating the dataset group

First, we need to create a dataset group containing the user-item dataset as well as its schema. Let’s do this with the AWS CLI: as you’ll see, a lot of these CLI operations require Amazon Resource Names (ARNs) output by a previous call, so make sure you keep track of everything when you experiment.

$ aws personalize create-dataset-group --name jsimon-ml20m-dataset-group
$ aws personalize create-schema --name jsimon-ml20m-schema \
--schema file://jsimon-ml20m-schema.json
$ aws personalize create-dataset --schema-arn $SCHEMA_ARN \
--dataset-group-arn $DATASET_GROUP_ARN \
--dataset-type INTERACTIONS 

Importing datasets

In this simple example, we’ll import data on-demand. It’s also possible to schedule import jobs in order to load new data regularly. We need to pass a role allowing data to be read from the Amazon S3 bucket.

$ aws personalize create-dataset-import-job --job-name jsimon-ml20m-job \
--role-arn $ROLE_ARN
--dataset-arn $DATASET_ARN \
--data-source dataLocation=s3://jsimon-ml20m/ratings.processed.csv

This will take a little while and we can use the describe-dataset-import-job API to check for completion. Plenty of information is returned, but let’s just query the import status.

$ aws personalize describe-dataset-import-job \
--dataset-import-job-arn $DATASET_IMPORT_JOB_ARN \
--query "datasetImportJob.latestDatasetImportJobRun.status"

Putting it all together: creating a solution

Once datasets have been imported, we need to select a recipe to cook our recommendation model. A recipe is much more than an algorithm: it also includes predefined feature transformation, initial parameters for the algorithm as well as automatic model tuning. Thus, recipes remove the need to have expertise in personalization.

Amazon Personalize comes with several recipes suitable for different use cases, and advanced users can also add their own recipes.

Here’s the list of available recipes.


Recommendation experts will certainly enjoy the flexibility that they bring, but what about developers who are new to the topic?

As mentioned earlier, Amazon Personalize supports AutoML, a new technique that automatically searches for the most optimal recipe, so let’s enable it. Hyper parameter optimization is enabled by default. Last but not least, Amazon Personalize solutions can scale automatically according to incoming traffic: we simply need to define the minimum number to transactions per second (TPS) that we want to support.

Thus, we can create the solution like so:

$ aws personalize create-solution --name jsimon-ml20m-solution \
--minTPS 10 --perform-auto-ml \
--dataset-group-arn $DATASET_GROUP_ARN \
--query 'solution.status'

This will take a little while as the optimal recipe is selected, trained and tuned. Once all of this is complete, we can look at solution metrics.

$ aws personalize get-metrics --solution-arn $SOLUTION_ARN

Recommending new items in real-time

If we’re happy with the model, we can now create a campaign in order to deploy it. It will be updated automatically every time the solution is deployed.

$ aws personalize create-campaign --name jsimon-ml20m-solution \
--solution-arn $SOLUTION_ARN --update-mode AUTO

Now, let’s recommend some movies.

$ aws personalize-rec get-recommendations --campaign-arn $CAMPAIGN_ARN \
--user-id $USER_ID --query "itemList[*].itemId"
["1210", "260", "2571", "110", "296", "1193", ...]

That’s it! As you can see, we successfully built a recommendation model with a few API calls. All we had to do was define a schema and upload the dataset. We relied on Amazon Personalize to select the best recipe with AutoML, and to optimize its hyper parameters. The solution was trained and deployed on fully-managed infrastructure, letting us focus even more on building our application.

Sign up for the preview now!

I hope this post was informative. We just scratched the surface of what Amazon Personalize can do. The service is available in preview in US-East (Virginia) and US-West (Oregon).

There is no charge for the service during the preview. Once the preview is complete, the service will be part of the AWS free tier. For the first two months after sign-up, you will be offered:
1. Data processing and storage: Up to 20 GB per month
2. Training: Up to 100 training hours per month
3. Inference: Up to 50 TPS-hours of real-time recommendations per month

To get started, visit aws.amazon.com/personalize/. Now it’s your turn to try it and let us know what you think.


AWS DeepRacer – Go Hands-On with Reinforcement Learning at re:Invent

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-deepracer-go-hands-on-with-reinforcement-learning-at-reinvent/

Reinforcement Learning is a type of machine learning that works when an “agent” is allowed to act on a trial-and-error basis within an interactive environment, using feedback from those actions to learn over time in order to reach a predetermined goal or to maximize some type of score or reward. This stands in contrast to other forms of machine learning such as Supervised Learning, where a set of facts (ground truths) are used to train a model so that it can make inferences.

We want you to get some hands-on experience with Reinforcement Learning at AWS re:Invent and I would like to tell you all about it today. This combination of hardware and software will help you get things (literally) moving!

AWS DeepRacer
Let’s talk about the hardware and software first. AWS DeepRacer is a 1/18th scale radio-controlled, four-wheel drive car:

There’s an Intel Atom® processor onboard, a 4 megapixel camera with 1080p resolution, fast (802.11ac) WiFi, multiple USB ports, and enough battery power to last for about 2 hours. The Atom processor runs Ubuntu 16.04 LTS, ROS (Robot Operating System), and the Intel OpenVino™ computer vision toolkit.

AWS DeepRacer includes a fully-configured cloud environment that you can use to train your Reinforcement Learning models. It takes advantage of the new Reinforcement Learning feature in Amazon SageMaker and also includes a 3D simulation environment powered by AWS RoboMaker. You can train an autonomous driving model against a collection of predefined race tracks included with the simulator and then evaluate them virtually or download them to a AWS DeepRacer car and verify performance in the real world.

Reinforcement Learning is one of the technologies that are used to make self-driving cars a reality; the AWS DeepRacer is the perfect vehicle (so to speak) for you to go hands-on and learn all about it. We’re ramping up volume production and you will be able to buy one of your very own very soon.

You can pre-order your very own AWS DeepRacer today and sign up to be part of the preview at aws.amazon.com/deepracer.

AWS DeepRacer & Reinforcement Learning at re:Invent
My colleagues have created an incredible program that will get you started with AWS DeepRacer and Reinforcement Learning!

re:Invent attendees can attend a workshop that will teach you the fundamentals of Reinforcement Learning and then show you how to create, train, and tweak an autonomous driving model for an AWS DeepRacer. You’ll create, train, and refine your model on an online simulator and then load it into a genuine AWS DeepRacer for a spin around one of our test tracks. Your goal: Get your AWS DeepRacer around the track as quickly and accurately as possible. There will be a competition every hour, with the chance to win AWS DeepRacers and AWS credits.

Start Your Engines
If you’re here at re:Invent consider yourselves under starters’ orders, because the very first AWS DeepRacer League will take place over the next 24 hours in the AWS DeepRacer workshops and at the MGM Speedway. You will use Amazon SageMaker, AWS RoboMaker, and other AWS services while you learn about Reinforcement Learning. There are 6 main tracks (and a pit area for each), a hacker garage, 2 extra tracks that you can use for training and experimentation, and a DJ to keep you revved up.

From 11:30 AM to 10 PM today (November 28th) every lap time will be entered onto the Speedway Leaderboard. The top 3 developers with the fastest times over the course of the day’s racing will advance to the 2018 grand finale where they will compete to become the AWS DeepRacer 2018 Champion.

The final race will take place on the AWS re:Invent International Speedway at 8 AM on Thursday, just before Werner’s keynote. You will get to race, learn, win prizes, and collect some swag!

AWS DeepRacer League
We want to make sure that developers all over the world have the same opportunity to get involved with AWS DeepRacer as re:Invent attendees. To that end I am excited to announce the AWS DeepRacer League – the world’s first global autonomous racing league, open to anyone. In 2019 there will be a series of live racing events at AWS Global Summits around the world, and we’ll also have virtual events and tournaments throughout the year. Winners and top scorers will advance to the AWS DeepRacer 2019 Championship Cup at re:invent 2019. I’ll have more detail on that soon, or you can check the AWS DeepRacer site for the latest updates.

I’ll have more details soon, so stay tuned and happy racing!




Amazon SageMaker RL – Managed Reinforcement Learning with Amazon SageMaker

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-sagemaker-rl-managed-reinforcement-learning-with-amazon-sagemaker/

In the last few years, machine learning (ML) has generated a lot of excitement. Indeed, from medical image analysis to self-driving trucks, the list of complex tasks that ML models can successfully accomplish keeps growing, but what makes these models so smart?

In a nutshell, you can train a model in several different ways of which these are three:

  1. Supervised learning: run an algorithm on a labelled data set, i.e. a data set containing samples and answers. Gradually, the model will learn how to correctly predict the right answer. Regression and classification are examples of supervised learning.
  2. Unsupervised learning: run an algorithm on an unlabelled data set, i.e. a data set containing samples only. Here, the model will progressively learn patterns in data and organize samples accordingly. Clustering and topic modeling are examples of unsupervised learning.
  3. Reinforcement learning: this one is quite different. Here, a computer program (aka an agent) interacts with its environment: most of the time, this takes place in a simulator. The agent receives a positive or negative reward for actions that it takes: rewards are computed by a user-defined function which outputs a numeric representation of the actions that should be incentivized. By trying to maximize positive rewards, the agent learns an optimal strategy for decision making.

Launched at AWS re:Invent 2017, Amazon SageMaker is helping customers quickly build, train and deploy ML models. Today, with the launch of Amazon SageMaker RL, we’re happy to extend the advantages of Amazon SageMaker to reinforcement learning, making it easier for all developers and data scientists regardless of their ML expertise.

A quick primer on reinforcement learning

Reinforcement learning (RL) can sound very confusing at first, so let’s take an example. Imagine an agent learning to navigate a maze. The simulator allows it to move in certain directions but blocks it from going through walls: using RL to learn a policy, the agent soon starts to take increasingly relevant actions.

One critical thing to understand is that the RL model isn’t trained on a predefined set of labelled mazes (that would be supervised learning). Instead, the agent discovers its environment (the current maze) one step at at time, moves one more step and receives a reward: stepping into a dead end is a negative reward, moving one step closer to the exit is a positive reward. Once a number of different mazes have been processed, the agent learns the action/reward data points and trains a model to make better decisions next time around. This cycle of exploring and training is central to RL: given enough mazes and enough training time, we would soon enough know how to navigate any maze.

RL is particularly suitable for complex, unpredictable, environments that can be simulated and where building a prior dataset would either be infeasible or prohibitively expensive: autonomous vehicles, games, portfolio management, inventory management, robotics or industrial control systems. For instance, researchers have shown that applying RL-based control to HVAC systems can result in 20% – 40% cost savings compared to typical rule-based systems [1], not to mention the large reduction in ecological footprint.

Introducing Amazon SageMaker RL

Amazon SageMaker RL builds on top of Amazon SageMaker, adding pre-packaged RL toolkits and making it easy to integrate any simulation environment. As you would expect, training and prediction infrastructure is fully managed, so that you can focus on your RL problem and not on managing servers.

Today, you can use containers provided by SageMaker for Apache MXNet and Tensorflow that include Open AI Gym, Intel Coach and Berkeley Ray RLLib. As usual with Amazon SageMaker, you can easily create your own custom environment using other RL libraries such as TensorForce or StableBaselines.

When it comes to simulation environments, Amazon SageMaker RL supports the following options:

  • First party simulators for AWS RoboMaker and Amazon Sumerian.
  • Open AI Gym environments and open source simulation environments that are developed using Gym interfaces, such as Roboschool or EnergyPlus.
  • Customer-developed simulation environments using the Gym interface.
  • Commercial simulators such as MATLAB and Simulink (customers will need to manage their own licenses).

Amazon SageMaker RL also comes with a collection of Jupyter notebooks, just like Amazon SageMaker does. They are available on Github, featuring both simple examples (cartpole, simple corridor) as well as advanced ones in a variety of domains such as robotics, operations research, finance, and more. You can easily extend these notebooks and customize them for your own business problem.

In addition, you’ll find examples showing you how to scale RL using either homogeneous or heterogeneous scaling. The latter is particularly important for many RL applications where simulation runs on CPUs and training on GPUs. Your simulation environment can also run locally or remotely in a different network and SageMaker will set everything up for you.

Don’t worry, this is easier than it seems. Let’s look at an example.

Predictive Auto Scaling with Amazon SageMaker RL

Auto Scaling allows you to dynamically scale your service (such as Amazon EC2), adding or removing capacity automatically according to conditions you define. Today, this typically requires setting up thresholds, alarms, scaling policies, etc.

Let’s see how we could optimize this process with a RL model and a custom simulator, pretending to scale your Amazon EC2 capacity (of course, this is just a toy example). For the sake of brevity, I will only highlight the most important code snippets: you’ll find the complete example on Github.

Here, the name of the game is to adapt the instance capacity to the load profile. We don’t want to be under-provisioned (losing traffic) or over-provisioned (wasting money): we want to be ‘just right’.

In RL terms:

  • The environment contains the load profile and the number of running instances.
  • At each step, the agent can take two actions: add instances and remove instances. Adding instances helps process more transactions, but they cost money and need a few minutes to come online. Removing instances saves money but reduces the overall processing capacity.
  • The reward is a combination of the cost for running instances and the value for completing successful transactions, with a big penalty for insufficient capacity.

Setting up the simulation

First, we need a simulator in order to generate load profiles similar to what you would observe on a high-traffic web server: let’s use a very simple Python program for that. Here’s an example plotting transactions per minute (tpm) over a 3-day period: mostly periodic with sharp unpredictable spikes.

Load profile

This is the initial state:

config_defaults = {
            "warmup_latency": 5,       # It takes 5 minutes for a new machine to warm up and become available.
            "tpm_per_machine": 300,    # Each machine can process 300 transactions per minute (tpm) on average
            "tpm_sigma": 30,           # Machine's TPM capacity is variable with +/- 30 standard deviation
            "machine_cost": 0.05,      # Machines cost $0.05/min
            "transaction_val": 0.90,   # Successful transactions are worth $0.90 per thousand (CPM)
            "downtime_cost": 200,      # Downtime is assumed to cost the business $200/min beyond incomplete transactions
            "downtime_percent": 99.5,  # Downtime is defined as availability dropping below 99.5%
            "initial_machines": 50,    # How many machines are initially turned on
            "max_time_steps": 1000,    # Maximum number of timesteps per episode

Computing the reward

This is quite straightforward! The current load is compared to the current capacity, we deduct the cost of any lost transaction and we apply a large penalty for losing more than 0.5% (a pretty strict definition of downtime!).

def _react_to_load(self):
        self.capacity = int(self.active_machines * np.random.normal(self.tpm_per_machine, self.tpm_sigma))
        if self.current_load <= self.capacity:
            # All transactions succeed
            self.failed = 0
            succeeded = self.current_load
            # Some transactions failed
            self.failed = self.current_load - self.capacity
            succeeded = self.capacity
        reward = succeeded * self.transaction_val / 1000.0  # divide by thousand for CPM
        percent_success = 100.0 * succeeded / (self.current_load + 1e-20)
        if percent_success < self.downtime_percent:
            self.is_down = 1
            reward -= self.downtime_cost
            self.is_down = 0
        reward -= self.active_machines * self.machine_cost
        return reward

Stepping through the simulation

Here’s how the agent goes through each time step initiated by the RL framework. As explained above, the model will initially predict random actions, but after a few training rounds, it’ll get much smarter.

def step(self, action):
        # First, react to the actions and adjust the fleet
        turn_on_machines = int(action[0])
        turn_off_machines = int(action[1])
        self.active_machines = max(0, self.active_machines - turn_off_machines)
        warmed_up_machines = self.warmup_queue[0]
        self.active_machines = min(self.active_machines + warmed_up_machines, self.max_machines)
        self.warmup_queue = self.warmup_queue[1:] + [turn_on_machines]
        # Now react to the current load and calculate reward
        self.current_load = self.load_simulator.time_step_load()
        reward = self._react_to_load()
        self.t += 1
        done = self.t > self.max_time_steps
        return self._observation(), reward, done, {}

Training on Amazon SageMaker

Now, we’re ready to train our model, just like any other SageMaker model: passing the image name (here, the TensorFlow container for Intel Coach), the instance type, etc.

rlestimator = RLEstimator(role=role,

In the training log, we see that the agent first explores its environment without any training: this is called the heatup phase and it’s used to generate an initial dataset to learn from.

## simple_rl_graph: Starting heatup
Heatup> Name=main_level/agent, Worker=0, Episode=1, Total reward=-39771.13, Steps=1001, Training iteration=0
Heatup> Name=main_level/agent, Worker=0, Episode=2, Total reward=-3089.54, Steps=2002, Training iteration=0
Heatup> Name=main_level/agent, Worker=0, Episode=3, Total reward=-43205.29, Steps=3003, Training iteration=0
Heatup> Name=main_level/agent, Worker=0, Episode=4, Total reward=-24542.07, Steps=4004, Training iteration=0

Once the heatup phase is complete, the model goes through repeated cycles of learning (aka ‘policy training’) and exploration based on what it has learned (aka ‘training’).

Policy training> Surrogate loss=-0.09095033258199692, KL divergence=0.0003891458618454635, Entropy=2.8382163047790527, training epoch=0, learning_rate=0.0003
Policy training> Surrogate loss=-0.1263471096754074, KL divergence=0.00145535240881145, Entropy=2.836780071258545, training epoch=1, learning_rate=0.0003
Policy training> Surrogate loss=-0.12835979461669922, KL divergence=0.0022696126252412796, Entropy=2.835214376449585, training epoch=2, learning_rate=0.0003
Policy training> Surrogate loss=-0.12992703914642334, KL divergence=0.00254297093488276, Entropy=2.8339898586273193, training epoch=3, learning_rate=0.0003
Training> Name=main_level/agent, Worker=0, Episode=152, Total reward=-54843.29, Steps=152152, Training iteration=1
Training> Name=main_level/agent, Worker=0, Episode=153, Total reward=-51277.82, Steps=153153, Training iteration=1
Training> Name=main_level/agent, Worker=0, Episode=154, Total reward=-26061.17, Steps=154154, Training iteration=1 

Once the model hits the number of epochs that we set, training is complete. In this case, we trained for 18 minutes: let’s see how well our model learned.

Visualizing training

One way to find out is to plot the rewards received by the agent after each exploration iteration. As expected, rewards in the heatup phase (150 iterations) are extremely negative because the agent hasn’t been trained at all. Then, as soon as training is applied, rewards start to improve rapidly.

Rewards vs iterations

Here’s a zoom on post-heatup iterations. As you can see, about halfway through, the agent starts receiving pretty consistent positive rewards, showing that it’s able to apply efficient scaling to the load profiles that it discovers.

Rewards vs iterations

Deploying the model

If we’re happy with the model, we can then deploy it just like any SageMaker model and use the newly-created HTTPS endpoint to predict. Alternatively, if you are training a robot then you can also deploy on Edge devices using AWS Greengrass.

Now available

I hope this post was informative. We’ve barely scratched the surface of what Amazon SageMaker RL can do. You can use it today in all regions where Amazon SageMaker is available. Please start exploring and let us know what you think. We can’t wait to see what you will build!


[1] “Deep Reinforcement Learning for Building HVAC Control”, T. Wei, Y. Wang and Q. Zhu, DAC’17, June 18-22, 2017, Austin, TX, USA.

NEW – Machine Learning algorithms and model packages now available in AWS Marketplace

Post Syndicated from Shaun Ray original https://aws.amazon.com/blogs/aws/new-machine-learning-algorithms-and-model-packages-now-available-in-aws-marketplace/

At AWS, our mission is to put machine learning in the hands of every developer. That’s why in 2017 we launched Amazon SageMaker. Since then it has become one of the fastest growing services in AWS history, used by thousands of customers globally. Customers using Amazon SageMaker can use optimized algorithms offered in Amazon SageMaker, to run fully-managed MXNet, TensorFlow, PyTorch, and Chainer algorithms, or bring their own algorithms and models. When it comes to building their own machine learning model, many customers spend significant time developing algorithms and models that are solutions to problems that have already been solved.


Introducing Machine Learning in AWS Marketplace

I am pleased to announce the new Machine Learning category of products offered by AWS Marketplace, which includes over 150+ algorithms and model packages, with more coming every day. AWS Marketplace offers a tailored selection for vertical industries like retail (35 products), media (19 products), manufacturing (17 products), HCLS (15 products), and more. Customers can find solutions to critical use cases like breast cancer prediction, lymphoma classifications, hospital readmissions, loan risk prediction, vehicle recognition, retail localizer, botnet attack detection, automotive telematics, motion detection, demand forecasting, and speech recognition.

Customers can search and browse a list of algorithms and model packages in AWS Marketplace. Once customers have subscribed to a machine learning solution, they can deploy it directly from the SageMaker console, a Jupyter Notebook, the SageMaker SDK, or the AWS CLI. Amazon SageMaker protects buyers data by employing security measures such as static scans, network isolation, and runtime monitoring.

The intellectual property of sellers on the AWS Marketplace is protected by encrypting the algorithms and model package artifacts in transit and at rest, using secure (SSL) connections for communications, and ensuring role based access for deployment of artifacts. AWS provides a secure way for the sellers to monetize their work with a frictionless self-service process to publish their algorithms and model packages.


Machine Learning category in Action

Having tried to build my own models in the past, I sure am excited about this feature. After browsing through the available algorithms and model packages from AWS Marketplace, I’ve decided to try the Deep Vision vehicle recognition model, published by Deep Vision AI. This model will allow us to identify the make, model and type of car from a set of uploaded images. You could use this model for insurance claims, online car sales, and vehicle identification in your business.

I continue to subscribe and accept the default options of recommended instance type and region. I read and accept the subscription contract, and I am ready to get started with our model.

My subscription is listed in the Amazon SageMaker console and is ready to use. Deploying the model with Amazon SageMaker is the same as any other model package, I complete the steps in this guide to create and deploy our endpoint.

With our endpoint deployed I can start asking the model questions. In this case I will be using a single image of a car; the model is trained to detect the model, maker, and year information from any angle. First, I will start off with a Volvo XC70 and see what results I get:


{'result': [{'mmy': {'make': 'Volvo', 'score': 0.97, 'model': 'Xc70', 'year': '2016-2016'}, 'bbox': {'top': 146, 'left': 50, 'right': 1596, 'bottom': 813}, 'View': 'Front Left View'}]}

My model has detected the make, model and year correctly for the supplied image. I was recently on holiday in the UK and stayed with a relative who had a McLaren 570s supercar. The thought that crossed my mind as the gulf-wing doors opened for the first time and I was about to be sitting in the car, was how much it would cost for the insurance excess if things went wrong! Quite apt for our use case today.


{'result': [{'mmy': {'make': 'Mclaren', 'score': 0.95, 'model': '570S', 'year': '2016-2017'}, 'bbox': {'top': 195, 'left': 126, 'right': 757, 'bottom': 494}, 'View': 'Front Right View'}]}

The score (0.95) measures how confident the model is that the result is right. The range of the score is 0.0 to 1.0. My score is extremely accurate for the McLaren car, with the make, model and year all correct. Impressive results for a relatively rare type of car on the road. I test a few more cars given to me by the launch team who are excitedly looking over my shoulder and now it’s time to wrap up.

Within ten minutes, I have been able to choose a model package, deploy an endpoint and accurately detect the make, model and year of vehicles, with no data scientists, expensive GPU’s for training or writing any code. You can be sure I will be subscribing to a whole lot more of these models from AWS Marketplace throughout re:Invent week and trying to solve other use cases in less than 15 minutes!

Access for the machine learning category in AWS Marketplace can be achieved through the Amazon SageMaker console, or directly through AWS Marketplace itself. Once an algorithm or model has been successfully subscribed to, it is accessible via the console, SDK, and AWS CLI. Algorithms and models from the AWS Marketplace can be deployed just like any other model or algorithm, by selecting the AWS Marketplace option as your package source. Once you have chosen an algorithm or model, you can deploy it to Amazon SageMaker by following this guide.


Availability & Pricing

Customers pay a subscription fee for the use of an algorithm or model package and the AWS resource fee. AWS Marketplace provides a consolidated monthly bill for all purchased subscriptions.

At launch, AWS Marketplace for Machine Learning includes algorithms and models from Deep Vision AI Inc, Knowledgent, RocketML, Sensifai, Cloudwick Technologies, Persistent Systems, Modjoul, H2Oai Inc, Figure Eight [Crowdflower], Intel Corporation, AWS Gluon Model Zoos, and more with new sellers being added regularly. If you are interested in selling machine learning algorithms and model packages, please reach out to [email protected]



Amazon SageMaker Ground Truth – Build Highly Accurate Datasets and Reduce Labeling Costs by up to 70%

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-sagemaker-ground-truth-build-highly-accurate-datasets-and-reduce-labeling-costs-by-up-to-70/

In 1959, Arthur Samuel defined machine learning as a “field of study that gives computers the ability to learn without being explicitly programmed”. However, there is no deus ex machina: the learning process requires an algorithm (“how to learn”) and a training dataset (“what to learn from”).

Today, most machine learning tasks use a technique called supervised learning: an algorithm learns patterns or behaviours from a labeled dataset. A labeled dataset containing data samples as well as the correct answer for each one of them, aka ‘ground truth’. Depending on the problem at hand, one could use labeled images (“this is a dog”, “this is a cat”), labeled text (“this is spam”, “this isn’t”), etc.

Fortunately, developers and data scientists can now rely on a vast collection of off-the-shelf algorithms (as illustrated by the built-in algorithms in Amazon SageMaker) and of reference datasets. Deep learning has popularized image datasets such as MNIST, CIFAR-10 or ImageNet, and more are also available for tasks like machine translation or text classification. These reference datasets are extremely useful for beginners and experienced practitioners alike, but a lot of companies and organizations still need to train machine learning models on their own dataset: think about medical imaging, autonomous driving, etc.

Building such datasets is a complex problem, particularly when working at scale. How long would it take one person to label one thousand images or documents? ‘Quite some time’ is probably the answer! Now imagine having to label one million images or documents: how many people would you now need? For most companies and organizations, this is a moot point, as they would never be able to muster enough people anyway.

Well, no more! Today, I’m very happy to announce Amazon SageMaker Ground Truth, a new capability of Amazon SageMaker that makes it easy for customers to to efficiently and accurately label the datasets required for training machine learning systems.

Introducing Amazon SageMaker Ground Truth

Amazon SageMaker Ground Truth helps you build datasets for:

  • Text classification.
  • Image classification, i.e categorizing images in specific classes.
  • Object detection, i.e. locating objects in images with bounding boxes.
  • Semantic segmentation, i.e. locating objects in images with pixel-level precision.
  • Custom user-defined tasks.

Amazon SageMaker Ground Truth can optionally use active learning to automate the labeling of your input data. Active learning is a machine learning technique that identifies data that needs to be labeled by humans and data that can be labeled by machine. Automated data labeling incurs Amazon SageMaker training and inference costs, but it can help to reduce the cost (up to 70%) and time that it takes to label your dataset over having humans label your complete dataset.

When manual effort is required, you can choose to use a crowdsourced Amazon Mechanical Turk workforce of over 500,000 workers, a private workforce of your own workers, or one of the curated third party vendors listed on the AWS Marketplace.

Let’s look at the high-level steps required to label a dataset:

  • Store your data in Amazon S3,
  • Create a labeling workforce,
  • Create a labeling job,
  • Get to work,
  • Visualize results.

How about an example? Let me show you how to label images from the CBCL StreetScenes dataset. This dataset contains 3548 images such as this one. For the sake of brevity, I will only use the first 10 images and annotate cars only.

Street scene

Storing data in Amazon S3

The first step is to create a manifest file for the dataset. This is a simple JSON file listing all images present in the dataset. Mine looks like this: please note that each line corresponds to a single object and is an independent JSON document.

{"source-ref": "s3://jsimon-groundtruth-demo/SSDB00001.JPG"}
{"source-ref": "s3://jsimon-groundtruth-demo/SSDB00002.JPG"}
{"source-ref": "s3://jsimon-groundtruth-demo/SSDB00003.JPG"}
{"source-ref": "s3://jsimon-groundtruth-demo/SSDB00004.JPG"}
{"source-ref": "s3://jsimon-groundtruth-demo/SSDB00005.JPG"}
{"source-ref": "s3://jsimon-groundtruth-demo/SSDB00006.JPG"}
{"source-ref": "s3://jsimon-groundtruth-demo/SSDB00007.JPG"}
{"source-ref": "s3://jsimon-groundtruth-demo/SSDB00008.JPG"}
{"source-ref": "s3://jsimon-groundtruth-demo/SSDB00009.JPG"}
{"source-ref": "s3://jsimon-groundtruth-demo/SSDB00010.JPG"}

Then, I simply copy the manifest file and the corresponding images to an Amazon S3 bucket.

Creating a labeling workforce

Amazon SageMaker Ground Truth gives us different options:

  • Public workforce, backed by Amazon Mechanical Turk,
  • Private workforce, backed by internal resources,
  • Vendor workforce, backed by third-party resources.

The first option is probably the most scalable one. However, the last two may be a better fit if your job requires confidentiality, service guarantees, or special skills.

I can only count on myself here, so I create a private team authenticated by a new Amazon Cognito group. Indeed, authentication is required before any worker can access the dataset.

Work force

Then, I add myself to the team by entering my email address. A few seconds later, I receive an invitation containing credentials and a URL. This URL also can be found on the labeling workforces dashboard.

Once I’ve clicked on the link and changed my password, I am registered as a verified worker for this team.

The one-man team is now ready. It’s time to create the labeling job itself.

Creating a labeling job

As you would expect, I have to define the location of the manifest file and of the dataset.


Then, I can decide whether I want to use the full dataset or a subset: I could even write a SQL query to filter the files. Here, let’s use the full dataset, as it only has 10 images.

Data set

Next, I have to select the type of the labeling job. As stated earlier, there are multiple options available and here I’m interested in adding bounding boxes to my images.

Next, I select the team that I want to assign to the job. This is where I could select automated data labeling. I could also decide to ask multiple workers to label the same image to increase accuracy.

Labeling job

Finally, I can provide additional instructions to workers, detailing the specific task that needs to be performed and giving them a couple of examples.

Labeling job

That’s it. Our labeling job is now ready. Time for the team (well… me, really) to get to work.

Labeling job

Labeling images

Logging into the URL I received by email, I see the list of jobs I’m assigned to.


When I click on the ‘Start working’ button, I see instructions as well as a first image to work on. Using the toolbox, I can draw boxes, zoom in and out, etc. This is pretty intuitive, but drawing boxes that fit just right takes time and care. Now I understand why this is such a time-consuming process… and I have only ten images to go!

Here’s a zoom on another image. Can you see all seven cars?


Once I’m done with all ten images, I can take a well-deserved break and enjoy the completion of the labeling job.

Labeling job

Visualizing results

Annotated images are visible directly in the AWS console, which comes in handy for sanity checks. I can also click on any image and see the list of labels that have been applied.

Of course, our purpose is to use this information to train machine learning models: we can find it in the augmented manifest file stored in our bucket. For example, here’s what the manifest has to say about the first image, where I labeled five cars.

"source-ref": "s3://jsimon-groundtruth-demo/SSDB00001.JPG",
"GroundTruthDemo": {
  "annotations": [
    {"class_id": 0, "width": 54, "top": 482, "height": 39, "left": 337},
    {"class_id": 0, "width": 69, "top": 495, "height": 53, "left": 461},
    {"class_id": 0, "width": 52, "top": 482, "height": 41, "left": 523},
    {"class_id": 0, "width": 71, "top": 481, "height": 62, "left": 589},
    {"class_id": 0, "width": 347, "top": 479, "height": 120, "left": 573}
  "image_size": [{"width": 1280, "depth": 3, "height": 960}
"GroundTruthDemo-metadata": {
  "job-name": "labeling-job/groundtruthdemo",
  "class-map": {"0": "Car"},
  "human-annotated": "yes",
  "objects": [
    {"confidence": 0.94},
    {"confidence": 0.94},
    {"confidence": 0.94},
    {"confidence": 0.94},
    {"confidence": 0.94}
  "creation-date": "2018-11-26T04:01:09.038134",
  "type": "groundtruth/object-detection"

This has all the information required to train an object detection model, such as the built-in Single-Shot Detector available in Amazon SageMaker, but this is another story!

Now available!

I hope this post was informative. We just scratched the surface of what Amazon SageMaker Ground Truth can do. The service is available today in US-East (Virginia), US-Central (Ohio), US-West (Oregon), Europe (Ireland) and Asia Pacific (Tokyo). Now it’s your turn to try it, and let us know what you think!


Amazon Elastic Inference – GPU-Powered Deep Learning Inference Acceleration

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-elastic-inference-gpu-powered-deep-learning-inference-acceleration/

One of the reasons for the recent progress of Artificial Intelligence and Deep Learning is the fantastic computing capabilities of Graphics Processing Units (GPU). About ten years ago, researchers learned how to harness their massive hardware parallelism for Machine Learning and High Performance Computing: curious minds will enjoy the seminal paper (PDF) published in 2009 by Stanford University.

Today, GPUs help developers and data scientists train complex models on massive data sets for medical image analysis or autonomous driving. For instance, the Amazon EC2 P3 family lets you use up to eight NVIDIA V100 GPUs in the same instance, for up to 1 PetaFLOP of mixed-precision performance: can you believe that 10 years ago this was the performance of the fastest supercomputer ever built?

Of course, training a model is half the story: what about inference, i.e. putting the model to work and predicting results for new data samples? Unfortunately, developers are often stumped when the time comes to pick an instance type and size. Indeed, for larger models, the inference latency of CPUs may not meet the needs of online applications, while the cost of a full-fledged GPU may not be justified. In addition, resources like RAM and CPU may be more important to the overall performance of your application than raw inference speed.

For example, let’s say your power-hungry application requires a c5.9xlarge instance ($1.53 per hour in us-east-1): a single inference call with an SSD model would take close to 400 milliseconds, which is certainly too slow for real-time interaction. Moving your application to a p2.xlarge instance (the most inexpensive general-purpose GPU instance at $0.90 per hour in us-east-1) would improve inference performance to 180 milliseconds: then again, this would impact application performance as p2.xlarge has less vCPUs and less RAM.

Well, no more compromising. Today, I’m very happy to announce Amazon Elastic Inference, a new service that lets you attach just the right amount of GPU-powered inference acceleration to any Amazon EC2 instance. This is also available for Amazon SageMaker notebook instances and endpoints, bringing acceleration to built-in algorithms and to deep learning environments.

Pick the best CPU instance type for your application, attach the right amount of GPU acceleration and get the best of both worlds! Of course, you can use EC2 Auto Scaling to add and remove accelerated instances whenever needed.

Introducing Amazon Elastic Inference

Amazon Elastic Inference supports popular machine learning frameworks TensorFlow, Apache MXNet and ONNX (applied via MXNet). Changes to your existing code are minimal, but you will need to use AWS-optimized builds which automatically detect accelerators attached to instances, ensure that only authorized access is allowed, and distribute computation across the local CPU resource and the attached accelerator. These builds are available in the AWS Deep Learning AMIs, on Amazon S3 so you can build it into your own image or container, and provided automatically when you use Amazon SageMaker.

Amazon Elastic Inference is available in three sizes, making it efficient for a wide range of inference models including computer vision, natural language processing, and speech recognition.

  • eia1.medium: 8 TeraFLOPs of mixed-precision performance.
  • eia1.large: 16 TeraFLOPs of mixed-precision performance.
  • eia1.xlarge: 32 TeraFLOPs of mixed-precision performance.

This lets you select the best price/performance ratio for your application. For instance, a c5.large instance configured with eia1.medium acceleration will cost you $0.22 an hour (us-east-1). This combination is only 10-15% slower than a p2.xlarge instance, which hosts a dedicated NVIDIA K80 GPU and costs $0.90 an hour (us-east-1). Bottom line: you get a 75% cost reduction for equivalent GPU performance, while picking the exact instance type that fits your application.

Let’s dive in and look at Apache MXNet and TensorFlow examples on an Amazon EC2 instance.

Setting up Amazon Elastic Inference

Here are the high-level steps required to use the service with an Amazon EC2 instance.

  1. Create a security group for the instance allowing only incoming SSH traffic.
  2. Create an IAM role for the instance, allowing it to connect to the Amazon Elastic Inference service.
  3. Create a VPC endpoint for Amazon Elastic Inference in the VPC where the instance will run, attaching a security group allowing only incoming HTTPS traffic from the instance. Please note that you’ll only have to do this once per VPC and that charges for the endpoint are included in the cost of the accelerator.

VPC endpoint

Creating an accelerated instance

Now that the endpoint is available, let’s use the AWS CLI to fire up a c5.large instance with the AWS Deep Learning AMI.

aws ec2 run-instances --image-id $AMI_ID \
--key-name $KEYPAIR_NAME --security-group-ids $SG_ID \
--subnet-id $SUBNET_ID --instance-type c5.large \
--elastic-inference-accelerator Type=eia1.large

That’s it! You don’t need to learn any new APIs to use Amazon Elastic Inference: simply pass an extra parameter describing the accelerator type. After a few minutes, the instance is up and we can connect to it.

Accelerating Apache MXNet

In this classic example, we will load a large pre-trained convolution neural network on the Amazon Elastic Inference Accelerator (if you’re not familiar with pre-trained models, I covered the topic in a previous post). Specifically, we’ll use a ResNet-152 network trained on the ImageNet dataset.

Then, we’ll simply classify an image on the Amazon Elastic Inference Accelerator

import mxnet as mx
import numpy as np
from collections import namedtuple
Batch = namedtuple('Batch', ['data'])

# Download model (ResNet-152 trained on ImageNet) and ImageNet categories

# Set compute context to Elastic Inference Accelerator
# ctx = mx.gpu(0) # This is how we'd predict on a GPU
ctx = mx.eia()    # This is how we predict on an EI accelerator

# Load pre-trained model
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1,3,224,224))],
mod.set_params(arg_params, aux_params, allow_missing=True)

# Load ImageNet category labels
with open('synset.txt', 'r') as f:
    labels = [l.rstrip() for l in f]

# Download and load test image
fname = mx.test_utils.download('https://github.com/dmlc/web-data/blob/master/mxnet/doc/tutorials/python/predict_image/dog.jpg?raw=true')
img = mx.image.imread(fname)

# Convert and reshape image to (batch=1, channels=3, width, height)
img = mx.image.imresize(img, 224, 224) # Resize to training settings
img = img.transpose((2, 0, 1)) # Channels 
img = img.expand_dims(axis=0)  # Batch size
# img = img.as_in_context(ctx) # Not needed: data is loaded automatically to the EIA

# Predict the image
prob = mod.get_outputs()[0].asnumpy()

# Print the top 3 classes
prob = np.squeeze(prob)
a = np.argsort(prob)[::-1]
for i in a[0:3]:
    print('probability=%f, class=%s' %(prob[i], labels[i]))

As you can see, there are only a couple of differences:

  • I set the compute context to mx.eia(). No numbering is required, as only one Amazon Elastic Inference accelerator may be attached on an Amazon EC2 instance.
  • I did not explicitly load the image on the Amazon Elastic Inference accelerator, as I would have done with a GPU. This is taken care of automatically.

Running this example produces the following result.

probability=0.979113, class=n02110958 pug, pug-dog
probability=0.003781, class=n02108422 bull mastiff
probability=0.003718, class=n02112706 Brabancon griffon

What about performance? On our c5.large instance, this prediction takes about 0.23 second on the CPU, and only 0.031 second on its eia1.large accelerator. For comparison, it takes about 0.015 second on a p3.2xlarge instance equipped with a full-fledged NVIDIA V100 GPU. If we use a eia1.medium accelerator instead, this prediction takes 0.046 second, which is just as fast as a p2.xlarge (0.042 second) but at a 75% discount!

Accelerating TensorFlow

You can use TensorFlow Serving to serve accelerated predictions: it’s a model server which loads saved models and serves high-performance prediction through REST APIs and gRPC.

Amazon Elastic Inference includes an accelerated version of TensorFlow Serving, which you would use like this.

$ ei_tensorflow_model_server --model_name=resnet --model_base_path=$MODEL_PATH --port=9000
$ python resnet_client.py --server=localhost:9000

Now Available

I hope this post was informative. Amazon Elastic Inference is available now in US East (N. Virginia and Ohio), US West (Oregon), EU (Ireland) and Asia Pacific (Seoul and Tokyo). You can start building applications with it today!


New – Amazon FSx for Lustre

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-fsx-for-lustre/

A pebibyte (PiB – 1,125,899,906,842,624 bytes) is an impressive amount of data, slightly less than half of the estimated memory capacity of a human brain. Data lakes, High-Performance Computing (HPC), and Electronic Design Automation (EDA) applications traditionally work at this scale, as do more recent data-intensive applications such as Machine Learning and media processing.

Amazon FSx for Lustre
Today we are launching Amazon FSx for Lustre, designed to meet the needs of these applications and others that you will undoubtedly dream up. Based on the mature and popular Lustre open source project, Amazon FSx for Lustre is a highly parallel file system that supports sub-millisecond access to petabyte-scale file systems. Thousands of simultaneous clients (EC2 instances and on-premises servers) can drive millions of IOPS (Input/Output Operations per Second) and transfer hundreds of gigibytes of data per second.

You can create a file system in minutes, mount it on any number of clients, and start accessing it right away. This is a fully managed service so there’s nothing to maintain and nothing to administer. You can build standalone file systems for ephemeral use, or you can seamlessly join them to an S3 bucket and then access the contents of the bucket as if it were a Lustre file system. Each file system is backed by NVMe SSD storage, provisioned in increments of 3.6 TiB, and designed to deliver 200 Mbps of aggregate throughput at 10,000 IOPS for every 1 TiB of provisioned capacity.

Creating a Lustre File System
You can create a Lustre file system from the AWS Management Console, CLI, or by calling the CreateFileSystem function. I’ll use the CLI today; I simply specify the subnets for the Lustre endpoints and the desired storage capacity:

$ aws fsx create-file-system --file-system-type LUSTRE --storage-capacity 3600 --subnet-ids subnet-009a1149
|                                      CreateFileSystem                                      |
||                                        FileSystem                                        ||
||  CreationTime   |  1542666225.28                                                         ||
||  DNSName        |  fs-00a2e062546ff4fce.fsx.us-east-1.amazonaws.com                      ||
||  FileSystemId   |  fs-00a2e062546ff4fce                                                  ||
||  FileSystemType |  LUSTRE                                                                ||
||  Lifecycle      |  CREATING                                                              ||
||  OwnerId        |  012345678912                                                          ||
||  ResourceARN    |  arn:aws:fsx:us-east-1:012345678912:file-system/fs-00a2e062546ff4fce   ||
||  StorageCapacity|  3600                                                                  ||
||  VpcId          |  vpc-e68d9c81                                                          ||
|||                                   LustreConfiguration                                  |||
|||  WeeklyMaintenanceStartTime                                    |  5:09:00              |||
|||                                        SubnetIds                                       |||
|||  subnet-009a1149                                                                       |||

This takes about 5 minutes and then it becomes AVAILABLE:

$ aws fsx describe-file-systems --file-system-id fs-00a2e062546ff4fce | grep Lifecycle
||  Lifecycle      |  AVAILABLE                                                             ||

My EC2 instance already has the Lustre kernel modules and the Lustre client installed:

I create a mount point and mount my Lustre file system:

$ sudo mkdir /fsx
$ sudo mount -t lustre [email protected]:/fsx /fsx

And my 3.4 TiB Lustre file system is ready to use:

I can also create a file system that sits in front of an S3 bucket (or a prefixed section of an S3 bucket). This allows me to treat my bucket as a data lake, and to process it using tools and applications that are file-based. I simply include the bucket name as the ImportPath when I create the file system:

$ aws fsx create-file-system --file-system-type LUSTRE --storage-capacity 3600 \
  --subnet-ids subnet-009a1149 --lustre-configuration ImportPath=s3://jbarr-src

My bucket has about 1 million files inside, so the creation process takes about 30 minutes (the team told me that this takes about 500 files per second). Here is my bucket:

And here is what it looks like from my EC2 instance:

At this point, the Lustre file system contains all of the metadata (names, dates, sizes, and so forth) for my objects but it does not have the actual file data. This data is copied from S3 on an as-needed basis. As a result, this command will not access S3:

$ find . -type f

And this one will, with a small latency penalty for each access because objects are copied from S3 to the file system on an as-needed basis:

$ find . -type f -exec grep -l -i main {} \;

If I understand my code’s access pattern, I can use the hsm_restore option of the lfs command to pre-load them. Perhaps I plan to analyze all of the C header files:

$ find . -type f -name '*.h' -print0 | \
  xargs -0 -n 50 -P 8 sudo lfs hsm_restore

Any changes that I make to the files remain within the file system. I can export changed files back to S3 using the hsm_archive option of the lfs command:

$ sudo lfs hsm_archive README.md
$ sudo lfs hsm_action README.md

The first command initiates the export operation and the second one indicates that it is complete by printing NOOP. The changed files are written to the same bucket, prefixed by the ExportPath of the file system:

I can discover the ExportPath from the command line:

$ aws fsx describe-file-systems --file-system-id fs-086f5160a68bc158b | grep Path
||||  ExportPath       |  s3://jbarr-src/FSxLustre20181120T005845Z                        ||||
||||  ImportPath       |  s3://jbarr-src                                                  ||||

Each file system publishes a rich set of metrics to CloudWatch:

There’s a lot more, but I’m just about out of space! For example, I didn’t show you the scale that you can achieve using Amazon FSx for Lustre. I used one client, but could just have easily used thousands.

Things to Know
Here are a couple of interesting things to keep in mind regarding Amazon FSx for Lustre:

Console Access – I wrote this post using the CLI; a full console is also available.

Regions – You can create Lustre file systems in the US East (N. Virginia), US West (Oregon), US East (Ohio), and Europe (Ireland) Regions.

Pricing – Pricing is based on the amount of storage that you have provisioned, and starts at $0.14 per GiB per month in the US East (N. Virginia), US West (Oregon), and Europe (Ireland) Regions.

Access – You can access your file systems from EC2 instances. You can also use AWS Direct Connect to connect your existing data center or colo to AWS, and access your file systems from there.

Security – Access to each file system goes through a security group, with IAM policies for fine-grained access control. Data at rest is encrypted using a 256-bit block cypher and keys managed by Amazon FSx for Lustre.

Available Now
Amazon FSx for Lustre is available now and you can start using it today!




New – Amazon FSx for Windows File Server – Fast, Fully Managed, and Secure

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-fsx-for-windows-file-server-fast-fully-managed-and-secure/

Organizations that want to run Windows applications on the cloud are commonly looking for network file storage that’s fully compatible with their applications and their Windows environments. For example, enterprises use Active Directory for identification and Windows Access Control Lists for fine-grained control over access to folders and files, and their applications typically rely on storage that provides full Windows file system (NTFS file system) compatibility.

Amazon FSx for Windows File Server
Amazon FSx for Windows File Server fits all of these needs, and more. It was designed from the ground up to work with your existing Windows applications and environments, making lift-and-shift of your Windows workloads to the cloud super-easy. You get a native Windows file system backed by fully-managed Windows file servers, accessible via the widely adopted SMB (Server Message Block) protocol. Built on SSD storage, Amazon FSx for Windows File Server delivers the throughput, IOPS, and consistent sub-millisecond performance that you (and your Windows applications) expect.

Here are the most important things to know:

Accessibility & Protocol Support – You can access your shares from Amazon Elastic Compute Cloud (EC2) instances, Amazon WorkSpaces virtual desktops, Amazon AppStream 2.0 applications, and VMware Cloud on AWS. Versions 2.0 through 3.1.1 of SMB are supported, allowing you to use Windows versions starting from Windows 7 and Windows Server 2008, and current versions of Linux (via Samba). Active Directory integration is built in, allowing you to easily integrate with your existing enterprise environment.

Performance and TunabilityAmazon FSx for Windows File Server delivers consistent, sub-millisecond latency. You can set the file system size and throughput (in megabytes per second) independently, with plenty of latitude in each dimension. File systems can be as big as 64 TB, and can deliver up to 2,048 MB/second of throughput.

Management – Your file systems are fully managed and data is stored in redundant form within an AWS Availability Zone. You don’t have to worry about attaching and formatting additional storage devices, updating Windows Server, or recovering from hardware failures. Incremental file-system consistent backups are taken automatically every day, with the option to take additional backups when needed.

Security – You get multiple levels of access control and data protection. File system endpoints are created within Virtual Private Clouds (VPCs) and access is governed by Security Groups. Windows ACLs are used to control access to folders and files; IAM roles are used to control access to administrative functions, with administrative activities logged to AWS CloudTrail. Your data is encrypted in transit and (using a KMS key that you can control) at rest. The service is PCI-DSS compliant and can be used to build HIPAA-compliant applications.

Multi-AZ Deployment – You create file systems in distinct AWS Availability Zones, and can use Microsoft DFS to set up automatic replication and failover between them. You can also use Microsoft DFS Namespaces to create shared, common namespaces that span multiple file systems and provide up to 300 PB of storage.

Creating a File System
Amazon FSx for Windows File Server is easy to use. I start by confirming that I have a Active Directory with a Domain Controller in the VPC subnet (subnet-009a1149) where I plan to create my file system’s endpoints:

For testing purposes, I also have an EC2 instance running Windows in the same subnet:

I open the Amazon FSx Console, and click Create file system:

I choose my file system option:

I specify a name, size, optional throughput, and other parameters for my new file system, and click Review summary to proceed:

On another browser tab I verify that the security group for the file system is configured to allow connections from my EC2 instance on the desired ports (135, 445, and 55555):

On the next page I review the settings and the estimated monthly costs, and click Create file system. My file system starts out in the Creating status and transitions to Available in minutes:

I can see an overview at a glance:

And I can click the Network & Security tab to get the DNS name for my file system:

I copy the DNS name, hop over to my EC2 instance, open Explorer, and Map my file system (a shared named share is created automatically):

Then I can use it like any other share (I’m sure that your use case is better than mine, but perhaps not as historically significant):

Each file system includes one share (named share) automatically. I can connect to the file system and create additional shares using the standard Windows tools and wizards:

File-system consistent backups are made daily during the backup window for the file system, and are retained for up to 35 days, as specified when the file system was created. I can also make backups on an as-needed basis:

Available Now
Amazon FSx for Windows File Server is available now and you can start using it today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions, with expansion to other Regions planned for the coming months. Pricing is based on the amount of storage and throughput that you configure.



Build a Continuous Delivery Pipeline for Your Container Images with Amazon ECR as Source

Post Syndicated from Daniele Stroppa original https://aws.amazon.com/blogs/devops/build-a-continuous-delivery-pipeline-for-your-container-images-with-amazon-ecr-as-source/

Today, we are launching support for Amazon Elastic Container Registry (Amazon ECR) as a source provider in AWS CodePipeline. You can now initiate an AWS CodePipeline pipeline update by uploading a new image to Amazon ECR. This makes it easier to set up a continuous delivery pipeline and use the AWS Developer Tools for CI/CD.

You can use Amazon ECR as a source if you’re implementing a blue/green deployment with AWS CodeDeploy from the AWS CodePipeline console. For more information about using the Amazon Elastic Container Service (Amazon ECS) console to implement a blue/green deployment without CodePipeline, see Implement Blue/Green Deployments for AWS Fargate and Amazon ECS Powered by AWS CodeDeploy.

This post shows you how to create a complete, end-to-end continuous deployment (CD) pipeline with Amazon ECR and AWS CodePipeline. It walks you through setting up a pipeline to build your images when the upstream base image is updated.


To follow along, you must have these resources in place:

  • A source control repository with your base image Dockerfile and a Docker image repository to store your image. In this walkthrough, we use a simple Dockerfile for the base image:
    FROM alpine:3.8

    RUN apk update

    RUN apk add nodejs
  • A source control repository with your application Dockerfile and source code and a Docker image repository to store your image. For the application Dockerfile, we use our base image and then add our application code:
    FROM 012345678910.dkr.ecr.us-east-1.amazonaws.com/base-image

    ENV PORT=80


    COPY app.js /app/

    CMD ["node", "/app/app.js"]

This walkthrough uses AWS CodeCommit for the source control repositories and Amazon ECR  for the Docker image repositories. For more information, see Create an AWS CodeCommit Repository in the AWS CodeCommit User Guide and Creating a Repository in the Amazon Elastic Container Registry User Guide.

Note: The source control repositories and image repositories must be created in the same AWS Region.

Set up IAM service roles

In this walkthrough you use AWS CodeBuild and AWS CodePipeline to build your Docker images and push them to Amazon ECR. Both services use Identity and Access Management (IAM) service roles to makes calls to Amazon ECR API operations. The service roles must have a policy that provides permissions to make these Amazon ECR calls. The following procedure helps you attach the required permissions to the CodeBuild service role.

To create the CodeBuild service role

  1. Follow these steps to use the IAM console to create a CodeBuild service role.
  2. On step 10, make sure to also add the AmazonEC2ContainerRegistryPowerUser policy to your role.

CodeBuild service role policies

Create a build specification file for your base image

A build specification file (or build spec) is a collection of build commands and related settings, in YAML format, that AWS CodeBuild uses to run a build. Add a buildspec.yml file to your source code repository to tell CodeBuild how to build your base image. The example build specification used here does the following:

  • Pre-build stage:
    • Sign in to Amazon ECR.
    • Set the repository URI to your ECR image and add an image tag with the first seven characters of the Git commit ID of the source.
  • Build stage:
    • Build the Docker image and tag the image with latest and the Git commit ID.
  • Post-build stage:
    • Push the image with both tags to your Amazon ECR repository.
version: 0.2

      - echo.Logging in to Amazon ECR...
      - aws --version
      - $(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email)
      - REPOSITORY_URI=012345678910.dkr.ecr.us-east-1.amazonaws.com/base-image
      - IMAGE_TAG=${COMMIT_HASH:=latest}
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $REPOSITORY_URI:latest .
      - docker tag $REPOSITORY_URI:latest $REPOSITORY_URI:$IMAGE_TAG
      - echo Build completed on `date`
      - echo Pushing the Docker images...
      - docker push $REPOSITORY_URI:latest
      - docker push $REPOSITORY_URI:$IMAGE_TAG

To add a buildspec.yml file to your source repository

  1. Open a text editor and then copy and paste the build specification above into a new file.
  2. Replace the REPOSITORY_URI value (012345678910.dkr.ecr.us-east-1.amazonaws.com/base-image) with your Amazon ECR repository URI (without any image tag) for your Docker image. Replace base-image with the name for your base Docker image.
  3. Commit and push your buildspec.yml file to your source repository.
    git add .
    git commit -m "Adding build specification."
    git push

Create a build specification file for your application

Add a buildspec.yml file to your source code repository to tell CodeBuild how to build your source code and your application image. The example build specification used here does the following:

  • Pre-build stage:
    • Sign in to Amazon ECR.
    • Set the repository URI to your ECR image and add an image tag with the first seven characters of the CodeBuild build ID.
  • Build stage:
    • Build the Docker image and tag the image with latest and the Git commit ID.
  • Post-build stage:
    • Push the image with both tags to your ECR repository.
version: 0.2

      - echo Logging in to Amazon ECR...
      - aws --version
      - $(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email)
      - REPOSITORY_URI=012345678910.dkr.ecr.us-east-1.amazonaws.com/hello-world
      - IMAGE_TAG=build-$(echo $CODEBUILD_BUILD_ID | awk -F":" '{print $2}')
      - echo Build started on `date`
      - echo Building the Docker image...
      - docker build -t $REPOSITORY_URI:latest .
      - docker tag $REPOSITORY_URI:latest $REPOSITORY_URI:$IMAGE_TAG
      - echo Build completed on `date`
      - echo Pushing the Docker images...
      - docker push $REPOSITORY_URI:latest
      - docker push $REPOSITORY_URI:$IMAGE_TAG
    - imageDetail.json

To add a buildspec.yml file to your source repository

  1. Open a text editor and then copy and paste the build specification above into a new file.
  2. Replace the REPOSITORY_URI value (012345678910.dkr.ecr.us-east-1.amazonaws.com/hello-world) with your Amazon ECR repository URI (without any image tag) for your Docker image. Replace hello-world with the container name in your service’s task definition that references your Docker image.
  3. Commit and push your buildspec.yml file to your source repository.
    git add .
    git commit -m "Adding build specification."
    git push

Create a continuous deployment pipeline for your base image

Use the AWS CodePipeline wizard to create your pipeline stages:

  1. Open the AWS CodePipeline console at https://console.aws.amazon.com/codepipeline/.
  2. On the Welcome page, choose Create pipeline.
    If this is your first time using AWS CodePipeline, an introductory page appears instead of Welcome. Choose Get Started Now.
  3. On the Step 1: Name page, for Pipeline name, type the name for your pipeline and choose Next step. For this walkthrough, the pipeline name is base-image.
  4. On the Step 2: Source page, for Source provider, choose AWS CodeCommit.
    1. For Repository name, choose the name of the AWS CodeCommit repository to use as the source location for your pipeline.
    2. For Branch name, choose the branch to use, and then choose Next step.
  5. On the Step 3: Build page, choose AWS CodeBuild, and then choose Create project.
    1. For Project name, choose a unique name for your build project. For this walkthrough, the project name is base-image.
    2. For Operating system, choose Ubuntu.
    3. For Runtime, choose Docker.
    4. For Version, choose aws/codebuild/docker:17.09.0.
    5. For Service role, choose Existing service role, choose the CodeBuild service role you’ve created earlier, and then clear the Allow AWS CodeBuild to modify this service role so it can be used with this build project box.
    6. Choose Continue to CodePipeline.
    7. Choose Next.
  6. On the Step 4: Deploy page, choose Skip and acknowledge the pop-up warning.
  7. On the Step 5: Review page, review your pipeline configuration, and then choose Create pipeline.

Base image pipeline

Create a continuous deployment pipeline for your application image

The execution of the application image pipeline is triggered by changes to the application source code and changes to the upstream base image. You first create a pipeline, and then edit it to add a second source stage.

    1. Open the AWS CodePipeline console at https://console.aws.amazon.com/codepipeline/.
    2. On the Welcome page, choose Create pipeline.
    3. On the Step 1: Name page, for Pipeline name, type the name for your pipeline, and then choose Next step. For this walkthrough, the pipeline name is hello-world.
    4. For Service role, choose Existing service role, and then choose the CodePipeline service role you modified earlier.
    5. On the Step 2: Source page, for Source provider, choose Amazon ECR.
      1. For Repository name, choose the name of the Amazon ECR repository to use as the source location for your pipeline. For this walkthrough, the repository name is base-image.

Amazon ECR source configuration

  1. On the Step 3: Build page, choose AWS CodeBuild, and then choose Create project.
    1. For Project name, choose a unique name for your build project. For this walkthrough, the project name is hello-world.
    2. For Operating system, choose Ubuntu.
    3. For Runtime, choose Docker.
    4. For Version, choose aws/codebuild/docker:17.09.0.
    5. For Service role, choose Existing service role, choose the CodeBuild service role you’ve created earlier, and then clear the Allow AWS CodeBuild to modify this service role so it can be used with this build project box.
    6. Choose Continue to CodePipeline.
    7. Choose Next.
  2. On the Step 4: Deploy page, choose Skip and acknowledge the pop-up warning.
  3. On the Step 5: Review page, review your pipeline configuration, and then choose Create pipeline.

The pipeline will fail, because it is missing the application source code. Next, you edit the pipeline to add an additional action to the source stage.

  1. Open the AWS CodePipeline console at https://console.aws.amazon.com/codepipeline/.
  2. On the Welcome page, choose your pipeline from the list. For this walkthrough, the pipeline name is hello-world.
  3. On the pipeline page, choose Edit.
  4. On the Editing: hello-world page, in Edit: Source, choose Edit stage.
  5. Choose the existing source action, and choose the edit icon.
    1. Change Output artifacts to BaseImage, and then choose Save.
  6. Choose Add action, and then enter a name for the action (for example, Code).
    1. For Action provider, choose AWS CodeCommit.
    2. For Repository name, choose the name of the AWS CodeCommit repository for your application source code.
    3. For Branch name, choose the branch.
    4. For Output artifacts, specify SourceArtifact, and then choose Save.
  7. On the Editing: hello-world page, choose Save and acknowledge the pop-up warning.

Application image pipeline

Test your end-to-end pipeline

Your pipeline should have everything for running an end-to-end native AWS continuous deployment. Now, test its functionality by pushing a code change to your base image repository.

  1. Make a change to your configured source repository, and then commit and push the change.
  2. Open the AWS CodePipeline console at https://console.aws.amazon.com/codepipeline/.
  3. Choose your pipeline from the list.
  4. Watch the pipeline progress through its stages. As the base image is built and pushed to Amazon ECR, see how the second pipeline is triggered, too. When the execution of your pipeline is complete, your application image is pushed to Amazon ECR, and you are now ready to deploy your application. For more information about continuously deploying your application, see Create a Pipeline with an Amazon ECR Source and ECS-to-CodeDeploy Deployment in the AWS CodePipeline User Guide.


In this post, we showed you how to create a complete, end-to-end continuous deployment (CD) pipeline with Amazon ECR and AWS CodePipeline. You saw how to initiate an AWS CodePipeline pipeline update by uploading a new image to Amazon ECR. Support for Amazon ECR in AWS CodePipeline makes it easier to set up a continuous delivery pipeline and use the AWS Developer Tools for CI/CD.

New – Amazon CloudWatch Logs Insights – Fast, Interactive Log Analytics

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-cloudwatch-logs-insights-fast-interactive-log-analytics/

Many AWS services create logs. Off the top of my head there are VPC Flow Logs, Route 53 Logs, Lambda Logs, CloudTrail Logs (for AWS API calls), RDS Logs, IoT Logs, ECS Logs, API Gateway Logs, and S3 Server Access Logs, EC2 Instance Logs (via the CloudWatch Agent), to name a few. The services that you run on your EC2 instances (Apache, Tomcat, NGINX, and the like) also produce logs, and your application code probably does the same.

Embedded within these logs are the data points, patterns, trends, and insights that you can use to understand how your applications and AWS resources are behaving, identify room for improvement, and to address operational issues. But, as usual, there’s a catch. The breadth of formats and data elements and the sheer size of the raw logs can make analysis difficult. When individual AWS customers routinely generate 100 terabytes or more of log files each day, old-school tools such as find and grep no longer suffice!

CloudWatch Logs Insights
The new CloudWatch Logs Insights will help! This is a fully managed service that is designed to work at cloud scale, with no setup or maintenance required. It plows through massive logs in seconds, and gives you fast, interactive queries and visualizations. It can can handle any log format, and auto-discovers fields from JSON logs. As you will see, it is very flexible, and will quickly become one of your favorite tools for diving in to your logs.

CloudWatch Logs Insights includes a sophisticated ad-hoc query language, with commands to fetch desired event fields, filter based on conditions, calculate aggregate statistics including percentiles and time series aggregations, sort on any desired file, and limit the number of events returned by a query. You can also use regular expressions to extract data from an event field, creating one or more ephemeral fields that can be further processed by the query. You can visualize query results using line and stacked area charts, and you can add queries to a CloudWatch Dashboard. There’s even a rich set of sample queries to get you started.

Insights in Action
To get started, I open the CloudWatch Console and click Insights:

Then I choose the desired Log Group using the menu:

I can enter a query, or I can choose one of the samples:

As you can see, sample queries are supplied for several different types of logs. I pick the first one, click Run query, the logs are scanned and the results are visible within seconds:

I can add a filter to my query and run it again. Perhaps I want to focus on EC2 API calls, so I use a pipe ( | ) and the filter command:

I can filter by an absolute or relative time range:

I can also generate visualizations. Here’s a simple one: Amazon RDS memory usage metrics for the last 30 minutes, grouped into 1-minute bins:

CloudWatch Logs Insights discovers all of the fields in the events and tells me how common they are in the selected log:

I can use this to build my queries interactively:

For queries that do not do any aggregation, I can expand an event and see all of the fields:

The query language supports six types of commands:

fields – Retrieves one or more log fields. It can also make use of functions such as abs, sqrt, strlen, trim, and more.

filter – Retrieves log fields based on one or more conditions built from Boolean operators, comparison operators, and regular expressions.

stats – Calculates aggregate statistics such as sum, avg, count, min, max, and percentile for a log field, across a given time interval (specified using the optional by modifier).

sort – Sorts logs events in ascending or descending order.

limit – Limits the number of log events returned by a query.

parse – Extracts data from a log field, creating one or more ephemeral fields that can be further processed by the query.

The language also supports a rich set of arithmetic & comparison operators, numeric functions, string functions, date/time functions, and aggregation functions.

As usual, I have shown you a fairly simply subset of the functionality and power that is available to you. Here are a couple of things that you can try on your own:

Add to Dashboard – After you have created an insightful query, click Add to Dashboard, then select an existing dashboard or create a new one:

Copy Query Results – After your have used CloudWatch Logs Insights to discover an issue, click the Action menu and choose Copy query results:

Then you can paste the results into your ticketing system for resolution.

API and CLI Access – In addition to console access, this feature is accessible via the AWS Command Line Interface (CLI) and the AWS SDKs.

CloudWatch Integration – You can write a bit of glue code to run queries, use the results to publish Custom Metrics. Then you can visualize them, set alarms, and so forth, all with the goal of simplifying and accelerating your troubleshooting.

Available Now
CloudWatch Logs Insights is available now in the US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), Europe (Ireland), Europe (Frankfurt), Europe (London), Europe (Paris), Asia Pacific (Tokyo), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Sydney), and South America (São Paulo) Regions and you can start using it today.

Pricing is based on the amount of ingested log data scanned for each query; you pay $0.005 per GB in US East (N. Virginia), with similar prices in the other regions.