Tag Archives: announcements

How to automate AWS Managed Microsoft AD scaling based on utilization metrics

Post Syndicated from Dennis Rothmel original https://aws.amazon.com/blogs/security/how-to-automate-aws-managed-microsoft-ad-scaling-based-on-utilization-metrics/

AWS Directory Service for Microsoft Active Directory (AWS Managed Microsoft AD), provides a fully managed service for Microsoft Active Directory (AD) in the AWS cloud. When you create your directory, AWS deploys two domain controllers in separate Availability Zones that are exclusively yours for high availability. For use cases requiring even higher resilience and performance, in a specific Region or during specific hours, AWS Managed Microsoft AD allows you to scale by deploying additional domain controllers to meet your needs. These domain controllers can help load-balance, increase overall performance, or simply provide additional nodes to protect against temporary availability issues. AWS Managed Microsoft AD allows you to define the correct number of domain controllers for your directory based on your individual use case.

This post will walk you through how to automate scaling in AWS Managed Microsoft AD using utilization metrics from your directory. You’ll do this using Amazon CloudWatch Alarms, SNS notifications, and a Lambda function to increase the number of domain controllers in your directory based on utilization peaks.

Simplified directory scaling

AWS Managed Microsoft AD has now simplified this directory scaling process by integrating with Amazon CloudWatch metrics. This new integration enables you to:

  1. Analyze your directory to identify expected average and peak directory utilization
  2. Scale your directory based on utilization data to adequately address the expected load
  3. Automate the addition of domain controllers to handle unexpected load.

Integration is available for both domain controller utilization metrics such as CPU, Memory, Disk and Network, and for AD-specific metrics, such as LDAP searches, binds, DNS queries, and Directory reads/writes. Analyzing this data over time to identify expected average and peak utilization on your directory can help you deploy additional domain controllers in Regions that need them. Once you’ve established this utilization baseline, you can deploy additional domain controllers to service this load, and configure alarms for anything exceeding this baseline.

Solution overview

In this example, our AWS Managed Microsoft AD has the default two domain controllers; once your utilization threshold is reached, you’ll add one additional domain controller (domain controller 3 in the diagram) to cover this additional load. 

Figure 1: Solution overview

Figure 1: Solution overview

To create a CloudWatch Alarm with SNS topic notifications

  1. In the AWS Console, navigate to CloudWatch
  2. Choose Metrics to see the Browse Metrics panel
  3. Choose the Directory Service namespace, then choose AWS Managed Microsoft AD.
  4. In the Directory ID column, select your directory and check search for this only.
  5. From the Metric Category column, select Processor from Metric Category and check add to search. This view will show the processor utilization for your directory.
  6.  

    Figure 2. Processor utilization metrics

    Figure 2. Processor utilization metrics

  7. To see the average utilization across all domain controllers, choose Add Math, then All Functions, then AVG to create a metric math expression for average CPU utilization across all domain controllers.
  8.  

    Figure 3. Adding a math function to compute average

    Figure 3. Adding a math function to compute average

  9. Next, choose the Graphed Metrics tab in the CloudWatch metrics console, select the newly created expression, then select the bell icon from the Actions column to create a CloudWatch alarm based on this metric.
  10. Figure 4. Create a CloudWatch Alarm using Metric Math Expression

    Figure 4. Create a CloudWatch Alarm using Metric Math Expression

  11. Configure the threshold alarm to trigger when CPU utilization exceeds 70%.
  12.  In the Metrics section, under Period, choose 1 Hour.
     In the Conditions section, under Threshold Type, choose Static. Under Define the alarm condition, choose Greater than threshold. Under Define the threshold value, enter 70. See Figure 5 for an image of how alarm parameters should look on your screen. Choose Next to Configure actions

    Figure 5. Configure the alarm parameters

    Figure 5. Configure the alarm parameters

  13. On the Configure actions screen, configure the actions using the parameters listed below to send an email notification when the alarm state is triggered. See Figure 6 for an image of how email notifications are configured.
  14.   In the Notification section, set Alarm state trigger to In alarm.   Set Select an SNS topic to Create topic.  Fill in the name of the alarm in the Create a new topic field, and add the email where notifications should be sent to the Email endpoints that will receive notification field. An email address is required to create the SNS topic and you should use an email address that’s accessible by your operations team. This SNS topic will be used to trigger the Lambda automation described in a later section. Note: make a note of the SNS topic name you chose; you will use it later when creating the Lambda function in the To create an AWS Lambda function to automate scale out procedure below. 

    Figure 6. Create SNS topic and email notification

    Figure 6. Create SNS topic and email notification

  15. In the Alarm name field, provide a name for the alarm. You can optionally also add an Alarm description. Choose Next.
  16. Review your configuration, and choose Create alarm to create the alarm.

Once you’ve completed these steps, you will now have an alarm implemented for when domain controller CPU utilization exceeds an average of 70% across both domain controllers. This will trigger an SNS topic when your directory is experiencing a heavy load, which will be used to start the Lambda automation and will send an informational email notification. In the next section, we’ll configure an AWS Lambda function to automate the addition of a domain controller based on this SNS topic.

For additional details on CloudWatch Alarms, please see the Amazon CloudWatch documentation.

To create an AWS Lambda function to automate scale out

The sample Lambda function shown below checks the number of domain controllers in this Region, and increases that by adding one additional domain controller. This procedure describes how to configure the IAM role required for this Lambda function, then how to deploy the Lambda function to execute when the alarm is triggered to automatically add a domain controller when your load exceeds your typical usage baseline.

Note: For additional details on Lambda creation, please see the AWS Lambda documentation.

To automate scale-out using AWS Lambda

  1. In the AWS Console, navigate to IAM and choose Policies, then choose Create Policy.
  2. Choose the JSON tab, and create a new IAM role using the policy provided in JSON below.
  3. For more details on this configuration, see the AWS Directory Service documentation.

    Sample policy

     

    {
    	"Version":"2012-10-17",
    	"Statement":[
    	{
    		"Effect":"Allow",
    		"Action":[
    			"ds:DescribeDomainControllers",
    			"ds:UpdateNumberOfDomainControllers",
    			"ec2:DescribeSubnets",
    			"ec2:DescribeVpcs",
    			"ec2:CreateNetworkInterface",
    			"ec2:DescribeNetworkInterfaces",
    			"ec2:DeleteNetworkInterface"
    		],
    		"Resource":"*"
    	}
    	]
    }
    

  4. Choose Next:Tags to add tags (optional) before choosing Next:Review.
  5. On the Create Policy screen, provide a name in the Name field. You can optionally also add a description. Choose Create policy to complete creating the new policy.
  6.  
    Note: make a note of the policy name you chose; you will use it later when updating the execution role for the Lambda function.  

    Figure 7. Provide a name to create the IAM policy

    Figure 7. Provide a name to create the IAM policy

  7. In the AWS Console, navigate to Lambda and choose Create Function
  8. On the Create Function screen, select Author from Scratch and provide a Name, then choose Create Function.
  9. Figure 8. Create a Lambda function

    Figure 8. Create a Lambda function

  10. Once created, on the Lambda function’s page, choose the Configuration tab, then choose Permissions from the sidebar and choose the execution role name linked under Role name. This will open the IAM console in another tab, preloaded to your Lambda execution role.
  11.  

    Figure 9: Select the Execution Role

    Figure 9: Select the Execution Role

  12. On the execution role screen, choose Attach policies and select the IAM policy you’ve just created (e.g. DirectoryService-DCNumber Update). On the Attach Permissions screen, choose Attach policy to complete updating the execution role. Once completed this step, you may close this tab and return the previous browser tab.
  13.  

    Figure 10. Select and attach the IAM policy

    Figure 10. Select and attach the IAM policy

  14. On the Lambda function screen, choose the Configuration tab, then choose Triggers from the sidebar.
  15. On the Add Trigger screen, choose the pulldown under Trigger configuration and select SNS. On the SNS topic box, select the SNS topic you created in Step 9 of the To create a CloudWatch Alarm with SNS topic notifications procedure above. Then choose Add to complete the trigger configuration.
  16. On the Lambda function screen, choose the Configuration tab, then choose Environment variables from the sidebar.
  17. On the Environment variables card, click Edit.
  18. On the Edit environment variables screen, choose Add environment variables and use the Key “DIRECTORY_ID” and the Value will be the directory ID for you AWS Managed Microsoft AD.
  19.  

    Figure 11. The "Edit environment variables" screen

    Figure 11. The “Edit environment variables” screen

  20. On the Lambda function screen, choose the Code tab to open the in-browser code editor experience inside the Code source card. Paste in the sample Lambda function code given below to complete the implementation.
  21.  

    Figure 12. Paste sample code to complete the Lambda function setup

    Figure 12. Paste sample code to complete the Lambda function setup

Sample Lambda function code

The sample Lambda function given below automates adding another domain controller to your directory. When your CloudWatch alarm triggers, you will receive a notification email, and an additional domain controller will be deployed to provide the added capacity to support the increase in directory usage.

Note: The example code contains a variable for the maximum number of domain controllers (maxDcNum), to prevent you from over provisioning in the event of a missed configuration. This value is set to 3 for this blog post’s example and can be increased to suit your use case. 

import json
import boto3

maxDcNum = 10
minDcNum = 2
region   = "us-east-1"
dsId = "d-906752246f"

ds = boto3.client('ds', region_name=region)

def lambda_handler(event, context):
    
    ## get the current number of domain controllers
    dcs = ds.describe_domain_controllers(DirectoryId = dsId)

    DomainControllers = dcs["DomainControllers"]
    
    DCcount = len(DomainControllers)
    print(">>> Current number of DCs:" + str(DCcount))

    #increase the number of DCs
    if DCcount < maxDcNum:
        NewDCnumber = DCcount + 1 
        response = ds.update_number_of_domain_controllers(DirectoryId = dsId, DesiredNumber =  NewDCnumber);    

        return {
            'statusCode': 200,
            'body': json.dumps("New DC number will be " + str(NewDCnumber))
        }
    else:
        return {
            'statusCode': 200,
            'body': json.dumps("Max number of DCs reached. The number of DCs is" + str(DCcount))
        }

Note: When testing this Lambda function, remember that this will increase the number of domain controllers for your directory in that Region. If the additional domain controller is not needed, please reduce the count after the test to avoid costs for an additional domain controller. The same principles used in this article to automate the addition of domain controllers can be applied to automate the reduction of domain controllers and you should consider automating the reduction to optimize for resilience, performance and cost.

Conclusion

In this post, you’ve implemented alarms based on thresholds in Domain Controller utilization using AWS CloudWatch and automation to increase the number of domain controllers using AWS Lambda functions. This solution helps to cost-effectively improve resilience and performance of your directory, by scaling your directory based on historical load patterns.

To learn more about using AWS Managed Microsoft AD, visit the AWS Directory Service documentation. For general information and pricing, see the AWS Directory Service home page. If you have comments about this blog post, submit a comment in the Comments section below. If you have implementation or troubleshooting questions, start a new thread on the Directory Service forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Dennis Rothmel

Dennis is a Sr. Product Manager from AWS Identity with a passion for modernization and automation. He has worked globally across Asia Pacific, Europe, and America, and enjoys the travel, languages, cultures and foods that come with global working.

Author

Vladimir Provorov

Vladimir is a Product Solutions Architect from AWS Identity focused on Workforce Identity and Directory Service. He works on developing new features to make Enterprise Identity simpler and more scalable. He is excited to travel and explore the world with his family.

New – Enhanced Dead-letter Queue Management Experience for Amazon SQS Standard Queues

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/enhanced-dlq-management-sqs/

Hundreds of thousands of customers use Amazon Simple Queue Service (SQS) to build message-based applications to decouple and scale microservices, distributed systems, and serverless apps. When a message cannot be successfully processed by the queue consumer, you can configure SQS to store it in a dead-letter queue (DLQ).

As a software developer or architect, you’d like to examine and review unconsumed messages in your DLQs to figure out why they couldn’t be processed, identify patterns, resolve code errors, and ultimately reprocess these messages in the original queue. The life cycle of these unconsumed messages is part of your error-handling workflow, which is often manual and time consuming.

Today, I’m happy to announce the general availability of a new enhanced DLQ management experience for SQS standard queues that lets you easily redrive unconsumed messages from your DLQ to the source queue.

This new functionality is available in the SQS console and helps you focus on the important phase of your error handling workflow, which consists of identifying and resolving processing errors. With this new development experience, you can easily inspect a sample of the unconsumed messages and move them back to the original queue with a click, and without writing, maintaining, and securing any custom code. This new experience also takes care of redriving messages in batches, reducing overall costs.

DLQ and Lambda Processor Setup
If you’re already comfortable with the DLQ setup, then skip the setup and jump into the new DLQ redrive experience.

First, I create two queues: the source queue and the dead-letter queue.

I edit the source queue and configure the Dead-letter queue section. Here, I pick the DLQ and configure the Maximum receives, which is the number of times after which a message is reprocessed before being sent to the DLQ. For this demonstration, I’ve set it to one. This means that every failed message goes to the DLQ immediately. In a real-world environment, you might want to set a higher number depending on your requirements and based on what a failure means with respect to your application.

I also edit the DLQ to make sure that only my source queue is allowed to use this DLQ. This configuration is optional: when this Redrive allow policy is disabled, any SQS queue can use this DLQ. There are cases where you want to reuse a single DLQ for multiple queues. But usually it’s considered best practices to setup independent DLQs per source queue to simplify the redrive phase without affecting cost. Keep in mind that you’re charged based on the number of API calls, not the number of queues.

Once the DLQ is correctly set up, I need a processor. Let’s implement a simple message consumer using AWS Lambda.

The Lambda function written in Python will iterate over the batch of incoming messages, fetch two values from the message body, and print the sum of these two values.

import json

def lambda_handler(event, context):
    for record in event['Records']:
        payload = json.loads(record['body'])

        value1 = payload['value1']
        value2 = payload['value2']

        value_sum = value1 + value2
        print("the sum is %s" % value_sum)
        
    return "OK"

The code above assumes that each message’s body contains two integer values that can be summed, without dealing with any validation or error handling. As you can imagine, this will lead to trouble later on.

Before processing any messages, you must grant this Lambda function enough permissions to read messages from SQS and configure its trigger. For the IAM permissions, I use the managed policy named AWSLambdaSQSQueueExecutionRole, which grants permissions to invoke sqs:ReceiveMessage, sqs:DeleteMessage, and sqs:GetQueueAttributes.

I use the Lambda console to set up the SQS trigger. I could achieve the same from the SQS console too.

Now I’m ready to process new messages using Send and receive messages for my source queue in the SQS console. I write {"value1": 10, "value2": 5} in the message body, and select Send message.

When I look at the CloudWatch logs of my Lambda function, I see a successful invocation.

START RequestId: 637888a3-c98b-5c20-8113-d2a74fd9edd1 Version: $LATEST
the sum is 15
END RequestId: 637888a3-c98b-5c20-8113-d2a74fd9edd1
REPORT RequestId: 637888a3-c98b-5c20-8113-d2a74fd9edd1	Duration: 1.31 ms	Billed Duration: 2 ms	Memory Size: 128 MB	Max Memory Used: 39 MB	Init Duration: 116.90 ms	

Troubleshooting powered by DLQ Redrive
Now what if a different producer starts publishing messages with the wrong format? For example, {"value1": "10", "value2": 5}. The first number is a string and this is quite likely to become a problem in my processor.

In fact, this is what I find in the CloudWatch logs:

START RequestId: 542ac2ca-1db3-5575-a1fb-98ce9b30f4b3 Version: $LATEST
[ERROR] TypeError: can only concatenate str (not "int") to str
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 8, in lambda_handler
    value_sum = value1 + value2
END RequestId: 542ac2ca-1db3-5575-a1fb-98ce9b30f4b3
REPORT RequestId: 542ac2ca-1db3-5575-a1fb-98ce9b30f4b3	Duration: 1.69 ms	Billed Duration: 2 ms	Memory Size: 128 MB	Max Memory Used: 39 MB	

To figure out what’s wrong in the offending message, I use the new SQS redrive functionality, selecting DLQ redrive in my dead-letter queue.

I use Poll for messages and fetch all unconsumed messages from the DLQ.

And then I inspect the unconsumed message by selecting it.

The problem is clear, and I decide to update my processing code to handle this case properly. In the ideal world, this is an upstream issue that should be fixed in the message producer. But let’s assume that I can’t control that system and it’s critically important for the business that I process this new type of messages.

Therefore, I update the processing logic as follows:

import json

def lambda_handler(event, context):
    for record in event['Records']:
        payload = json.loads(record['body'])
        value1 = int(payload['value1'])
        value2 = int(payload['value2'])
        value_sum = value1 + value2
        print("the sum is %s" % value_sum)
        # do some more stuff
        
    return "OK"

Now that my code is ready to process the unconsumed message, I start a new redrive task from the DLQ to the source queue.

By default, SQS will redrive unconsumed messages to the source queue. But you could also specify a different destination and provide a custom velocity to set the maximum number of messages per second.

I wait for the redrive task to complete by monitoring the redrive status in the console. This new section always shows the status of most recent redrive task.

The message has been moved back to the source queue and successfully processed by my Lambda function. Everything looks fine in my CloudWatch logs.

START RequestId: 637888a3-c98b-5c20-8113-d2a74fd9edd1 Version: $LATEST
the sum is 15
END RequestId: 637888a3-c98b-5c20-8113-d2a74fd9edd1
REPORT RequestId: 637888a3-c98b-5c20-8113-d2a74fd9edd1	Duration: 1.31 ms	Billed Duration: 2 ms	Memory Size: 128 MB	Max Memory Used: 39 MB	Init Duration: 116.90 ms	

Available Today at No Additional Cost
Today you can start leveraging the new DLQ redrive experience to simplify your development and troubleshooting workflows, without any additional cost. This new console experience is available in all AWS Regions where SQS is available, and we’re looking forward to hearing your feedback.

Alex

New – Amazon VPC Network Access Analyzer

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-vpc-network-access-analyzer/

If you are a member of your organization’s networking, cloud operations, or security teams, you are going to love this new feature. The new Amazon VPC Network Access Analyzer helps you identify network configurations that lead to unintended network access. As you will see in a moment, it will point out ways that you can improve your security posture while still letting you and your organization be agile and flexible. In contrast to manual checking of network configurations, which is error prone and hard to scale, this tool lets you analyze your AWS networks of any size and complexity.

Introducing Network Access Analyzer
Network Access Analyzer takes advantage of our automated reasoning technology that already powers AWS IAM Access Analyzer, Amazon VPC Reachability Analyzer, Amazon Inspector Network Reachability, and other provable security tools.

This new tool uses Network Access Scopes to specify the desired connectivity between your AWS resources. You can get started with a set of Amazon-created scopes, and then either copy & customize them, or create your own from scratch. The scopes are high-level and independent of any particular network architecture or configuration, and can be thought of as a language for specifying the proper level of access & connectivity for your network. You can, for example, create a scope to verify that all web apps use a firewall to access Internet resources, or to indicate that AWS resources used by your Finance team are separate, distinct, and unreachable from the resources used by your Development team.

To evaluate your network against a particular scope, you select it and initiate an analysis. It runs for a few minutes and then generates a set of findings, each of which indicates an unexpected network path between the AWS resources defined in the scope. You can analyze the findings, adjust your configuration or modify the scope in response to the findings, and re-run the analysis, all in just a few minutes.

The analysis process examines a very wide range of AWS resources including Security Groups, CIDR blocks, prefix lists, Elastic Network Interfaces, EC2 instances, Load Balancers, VPC, VPC subnets, VPC endpoints, VPC endpoint services, Transit Gateways, NAT Gateways, Internet Gateways, VPN Gateways, Peering Connections, and Network Firewalls. Your scopes can use Resource Groups to reference all resources that are tagged in a particular way.

Using Network Access Analyzer
To get started, I open the VPC Console, find the Network Analysis section on the left-side navigation, and click Network Access Analyzer:

I can see all of my scopes. Initially, I have four, all created by Amazon and ready to use:

To conduct an analysis, I select a scope (AWS-VPC-Ingress (Amazon created)) and click Analyze. The scope’s description reads:

“Identify ingress paths into your VPCs from Internet Gateways, Peering Connections, VPC Service Endpoints, VPN and Transit Gateways.”

The analysis runs for a couple of minutes and displays the findings as soon as it is done:

There’s a lot of very useful information here! The spectrum chart provides an overview of the resources that are in the findings. I can hover my mouse over any of the segments to learn more, or click on one in order to filter the findings and show only those that reference a particular resource or resource type:

For example, I click VPC Peering Connections and I can see all of the findings that reference the VPC peering connection:

As you can see, the Path details highlight the VPC peering connection in the path! The next step is to examine the findings, decide which ones are expected, and to add them to the scope so that they are excluded from future findings (more on that in a bit).

Inside a Network Access Scope
Let’s take a quick look inside of the Network Access Scope that I used above, and then build another scope from scratch using the visual builder. Each scope is represented in JSON format, and indicates what is considered in-scope (acceptable) traffic between sources and destinations:

{
          "networkInsightsAccessScopeId": "nis-070dc1d37ca315e86",
          "matchPaths": [
                    {
                              "source": {
                                        "resourceStatement": {
                                                  "resources": [],
                                                  "resourceTypes": [
                                                            "AWS::EC2::InternetGateway",
                                                            "AWS::EC2::VPCPeeringConnection",
                                                            "AWS::EC2::VPCEndpointService",
                                                            "AWS::EC2::TransitGatewayAttachment",
                                                            "AWS::EC2::VPNGateway"
                                                  ]
                                        }
                              },
                              "destination": {
                                        "resourceStatement": {
                                                  "resources": [],
                                                  "resourceTypes": [
                                                            "AWS::EC2::NetworkInterface"
                                                  ]
                                        }
                              }
                    }
          ],
          "excludePaths": []
}

The matchPaths element contains source and destination elements. Each of these elements, in turn, identifies AWS resource types and specific resources. While not shown here, scopes can also contain source and destination IP addresses, ports, prefix lists, and traffic types (TCP or UDP). The excludePaths can contain resource types, specific resources, and so forth. I could, for example, define sources and destinations that match all Internet Gateway ingress traffic, but exclude traffic that flows through a Load Balancer, or I could exclude SSH traffic destined for my bastion instances.

Building a Network Access Scope
I can build a new scope in three ways. I can Duplicate and modify an existing one, I can start from scratch and use the visual builder, or I can write my own JSON and use either the CLI or the API to create a scope. I click Create Network Access Scope to use the builder:

I can start with one of five predefined templates, or I can build my own:

I enter a name and a description:

Then I define the source and destinations by resource type, id, traffic type, and so forth:

I have many options for matching the traffic type. This allows me to create scopes for very specific purposes:

I can use a similar interface to add any optional exclusions.

Things to Know
This is a very powerful tool and one that I think you are going to love. Here are a couple of things to know about it:

Pricing – You pay $0.002 for each Elastic Network Interface (ENI) analyzed as part of an assessment.

Regions – Network Access Analyzer is available in the US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Africa (Cape Town), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Milan), Europe (Paris), Europe (Stockholm), South America (São Paulo), and Middle East (Bahrain) Regions.

In the Works – We have lots of additional features on the product roadmap including support for AWS Organizations, the ability to run your analyses on a regular schedule, and support for IPv6 address ranges and resources.

Jeff;

New AWS Scholarship Program Helps Underrepresented and Underserved Students Prep for Careers in AI and ML

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/new-aws-scholarship-program-helps-underrepresented-and-underserved-students-prep-for-careers-in-ai-and-ml/

As a woman working in information technology (IT) for many years, it has always been close to my heart to challenge long-standing gender stereotypes and inspire more young learners to consider a career in tech. With artificial intelligence (AI) and machine learning (ML) defining the future of technology, this future also depends on diverse representation.

The World Economic Forum estimates that technological advances and automation will create 97 million new technology jobs by 2025, including in the field of AI and ML. Yet, according to their research, women make up just 32% of AI jobs globally. The Pew Research Center found that Black and Hispanic workers in the U.S. comprise just 9% and 8% of workers in the science, technology, engineering, and mathematics (STEM) careers respectively.

At Amazon, we believe that technology should be built in a way that’s inclusive, diverse, and equitable. We already offer STEM programs to enable underrepresented groups to build or advance their technical skills and open up new career possibilities. Programs such as We Power Tech, Amazon Future Engineer, AWS Girls’ Tech Day, and AWS GetIT aim to build a future of tech that is inclusive, diverse, and accessible.

Today, I am excited to announce the launch of the AWS AI & ML Scholarship program in collaboration with Intel and Udacity, designed to prepare underrepresented and underserved students globally for careers in ML.

AWS AI & ML Scholarship Program Debuts with AWS DeepRacer Student
The AWS AI & ML Scholarship program is launching as part of the all-new AWS DeepRacer Student service and Student League. This is a new student division of the popular AWS DeepRacer program, a cloud-based 3D car racing simulator that provides a fun way to learn about ML and reinforcement learning (RL). Through DeepRacer Student, you have access to free online trainings to learn the ML and RL basics. You also have access to 10 hours of model training and 5 GB of storage per month to participate in the DeepRacer Student League, a global autonomous racing competition exclusively for AWS AI & ML students.

As part of DeepRacer Student, the AWS AI & ML Scholarship program in collaboration with Intel and Udacity is geared toward underserved and underrepresented high school and college students globally who are at least 16 years old. These are students who may have faced financial barriers growing up or are part of underrepresented groups, including women, people with disabilities, LGBTQ+ persons, as well as people of color. Students in these groups who take part in the DeepRacer Student League are eligible for a chance to win one or two of 2,500 annual scholarships from Udacity, an online learning platform focused on technology skills.

How to Qualify for the AWS AI & ML Scholarship
AWS DeepRacer Student League LogoTo be considered for a scholarship, you must successfully finish all AWS DeepRacer Student learning modules and achieve a score of at least 80% on all course quizzes, reach a certain lap time performance with your DeepRacer car in the Student League, and submit an essay.

Each year, 2,000 students will win a scholarship to the AI Programming with Python Udacity Nanodegree program. Udacity Nanodegrees are massive open online courses (MOOCs) designed to bridge the gap between learning and career goals. This aims to equip students with programming and ML fundamentals to solve real world problems with ML. The top 500 participants in this first Nanodegree will be eligible to join a second customized Nanodegree program curated specifically for AWS AI & ML Scholarship program recipients.

Coaching and Mentorship Opportunities for Scholarship Recipients
Scholarship recipients not only get access to the educational content, but also receive up to 85 hours of support from Udacity instructors to make sure that students successfully learn and progress through the Nanodegree. Instructors and students meet weekly in small groups, review the content for the week, answer questions, and work on a case study as a group exercise.

The top 500 participants who advance into the second Nanodegree will receive 12 months of mentorship from tenured technology leaders at Amazon and Intel to help prepare for a tech career.

All scholarship recipients will be given exclusive access to Ask-Me-Anythings (AMAs), fireside chats, and office hours with AI/ML professionals and diversity experts from Amazon, Intel, and AWS collaborators such as Girls in Tech. These will help familiarize students with different job functions in the AI/ML field.

Enroll via AWS DeepRacer Student
To enroll in the AWS AI & ML Scholarship program, first sign up at the AWS DeepRacer Student service with a valid email address. Note that this student player account is separate from an AWS account and doesn’t require you to provide any billing or credit card information.

AWS DeepRacer Student Sign Up

You will receive a verification code via email. Once you have verified your email address, log in to the DeepRacer Student service and complete the sign-up process.

AWS DeepRacer Student Complete Sign Up

The form will ask for some additional information, including your school, major, and planned year of graduation. You must also self-certify that you are a student enrolled in either high school, university, or community college.

AWS DeepRacer Student Complete Sign Up with Personal Information

Next, enroll in the AWS AI & ML Scholarship program by selecting the corresponding checkbox. The scholarship qualification will start on March 1, 2022.

AWS DeepRacer Student Opt In To Scholarship

Welcome to AWS DeepRacer Student! Start learning the fundamentals of ML through the provided online trainings. You can also participate in the pre-season DeepRacer Student League before the qualifying races start on March 1, 2022.

AWS DeepRacer Student Dashboard

Available Now
Start preparing for the AWS AI & ML Scholarship program by signing up for AWS DeepRacer Student, available globally today, and opt in to the scholarship program. Start learning with learning modules, train your first AWS DeepRacer model, and see how it performs in AWS DeepRacer Student League Pre-season happening now. Join the AWS Slack community to connect with experts and ask questions.

Sign up for AWS DeepRacer Student and the AWS AI & ML Scholarship program today.

Antje

Now in Preview – Amazon SageMaker Studio Lab, a Free Service to Learn and Experiment with ML

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/now-in-preview-amazon-sagemaker-studio-lab-a-free-service-to-learn-and-experiment-with-ml/

Our mission at AWS is to make machine learning (ML) more accessible. Through many conversations over the past years, I learned about barriers that many ML beginners face. Existing ML environments are often too complex for beginners, or too limited to support modern ML experimentation. Beginners want to quickly start learning and not worry about spinning up infrastructure, configuring services, or implementing billing alarms to avoid going over budget. This emphasizes another barrier for many people: the need to provide billing and credit card information at sign-up.

What if you could have a predictable and controlled environment for hosting your Jupyter notebooks in which you can’t accidentally run up a big bill? One that doesn’t require billing and credit card information at all at sign-up?

Today, I am extremely happy to announce the public preview of Amazon SageMaker Studio Lab, a free service that enables anyone to learn and experiment with ML without needing an AWS account, credit card, or cloud configuration knowledge.

At AWS, we believe technology has the power to solve the world’s most pressing issues. And, we proudly support the new and innovative ways that our customers are using these technologies to deliver social impacts.

This is why I am also excited to announce the launch of the AWS Disaster Response Hackathon using Amazon SageMaker Studio Lab. The hackathon, starting today and running through February 7, 2022, is a great way to start learning ML while doing good in the world. I will share more details on how to get involved at the end of the post.

Getting Started with Amazon SageMaker Studio Lab
Studio Lab is based on open-source JupyterLab and gives you free access to AWS compute resources to quickly start learning and experimenting with ML. Studio Lab is also simple to set up. In fact, the only configuration you have to do is one click to choose whether you need a CPU or GPU instance for your project. Let me show you.

The first step is to request a free Studio Lab account here.

Amazon SageMaker Studio Lab

When your account request is approved, you will receive an email with a link to the Studio Lab account registration page. You can now create your account with your approved email address and set a password and your username. This account is separate from an AWS account and doesn’t require you to provide any billing information.

Amazon SageMaker Studio Lab - Create Account

Once you have created your account and verified your email address, you can sign in to Studio Lab. Now, you can select the compute type for your project. You can choose between 12 hours of CPU or 4 hours of GPU per user session, with an unlimited number of user sessions available to you. Furthermore, you get a minimum of 15 GB of persistent storage per project. When your session expires, Studio Lab will take a snapshot of your environment. This enables you to pick up right where you left off. Let’s select CPU for this demo, and choose Start runtime.

Amazon SageMaker Studio Lab - Select Compute

Once the instance is running, select Open project to go to your free Studio Lab environment and start building. No additional configuration is required.

Amazon SageMaker Studio Lab - Open Project

Amazon SageMaker Studio Lab Environment

Customize your environment
Studio Lab comes with a Python base image to get you started. The image only has a few libraries pre-installed to save the available space for the frameworks and libraries that you actually need.

Amazon SageMaker Studio Lab - Select Kernel

You can customize the Conda environment and install additional packages using the %conda install <package> or %pip install <package> command right from within your notebook. You can also create entirely new, custom Conda environments, or install open-source JupyterLab and Jupyter Server extensions. For detailed instructions, see the Studio Lab documentation.

GitHub integration
Studio Lab is tightly integrated with GitHub and offers full support for the Git command line. This lets you easily clone, copy, and save your projects. Moreover, you can add an Open in Studio Lab badge to the README.md file or notebooks in your public GitHub repo to share your work with others.

Open in Amazon SageMaker Studio Lab Badge

This will let everyone open and view the notebook in Studio Lab. If they have a Studio Lab account, then they can also run the notebook. Add the following markdown to the top of your README.md file or notebook to add the Open in Studio Lab badge:

[![Open In Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/org/repo/blob/master/path/to/notebook.ipynb)

Replace org, repo, path and the notebook filename with those for your repo. Then, when you click the Open in Studio Lab badge, it will preview the notebook in Studio Lab. If your repo is private within a GitHub account or organization and you would like other people to use it, then you must additionally install the Amazon SageMaker GitHub App at the GitHub account or organization level.

Amazon SageMaker Studio Lab Notebook Preview

If you have a Studio Lab account, you can click Copy to project and choose to either copy just the notebook or to clone the entire repo into your Studio Lab account. Moreover, Studio Lab can check if the repository contains a Conda environment file and build the custom Conda environment for you.

Learn the Fundamentals of ML
If you are new to ML, then Studio Lab provides access to free, educational content to get you started. Dive into Deep Learning (D2L) is a free interactive book that teaches the ideas, the math, and the code behind ML and DL. The AWS Machine Learning University (MLU) gives you access to the same ML courses used to train Amazon’s own developers on ML. Hugging Face is a large open source community and a hub for pre-trained deep learning (DL) models. This is mainly aimed at natural language processing. In just a few clicks, you can import the relevant notebooks from D2L, MLU, and Hugging Face into your Studio Lab environment.

Join the AWS Disaster Response Hackathon using Amazon SageMaker Studio Lab
The frequency and severity of natural disasters are increasing. This year alone, we have seen significant wildfires across the Western United States and in countries like Greece and Turkey; major floods across Europe; and Hurricane Ida’s impact to the coast of Louisiana. In response, governments, businesses, nonprofits, and international organizations are placing more emphasis on disaster preparedness and response than ever before.

AWS Disaster Response Hackathon

Through the AWS Disaster Response Hackathon offering a total of $54,000 USD in prices, we hope to stimulate ways of applying ML to solve pressing challenges in natural disaster preparedness and response.

Join the hackathon today, start building, and don’t forget to submit your project before February 7, 2022. This hackathon is also an attempt to set the Guinness World Record for the “largest machine learning competition.”

Join the Preview
You can request a free Amazon SageMaker Studio Lab account starting today. The number of new account registrations will be limited to ensure a high quality of experience for all customers. You can find sample notebooks in the Studio Lab GitHub repository. Give it a try and let us know your feedback.

Request a free Amazon SageMaker Studio Lab account.

Antje

Announcing Amazon SageMaker Inference Recommender

Post Syndicated from Sean M. Tracey original https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-inference-recommender/

Today, we’re pleased to announce Amazon SageMaker Inference Recommender — a brand-new Amazon SageMaker Studio capability that automates load testing and optimizes model performance across machine learning (ML) instances. Ultimately, it reduces the time it takes to get ML models from development to production and optimizes the costs associated with their operation.

SageMaker Inference Recommender Banner Image

Until now, no service has provided MLOps Engineers with a means to pick the optimal ML instances for their model. To optimize costs and maximize instance utilization, MLOps Engineers would have to use their experience and intuition to select an ML instance type that would serve them and their model well, given the requirements to run them. Moreover, given the vast array of ML instances available, and the practically infinite nuances of each model, choosing the right instance type could take more than a few attempts to get it right. SageMaker Inference Recommender now lets MLOps Engineers and get recommendations for the best available instance type to run their model. Once an instance has been selected, their model can be instantly deployed to the selected instance type with only a few clicks. Gone are the days of writing custom scripts to run performance benchmarks and load testing.

For MLOps Engineers who want to get data on how their model will perform ahead of pushing to a production environment, SageMaker Inference Recommender also lets them run a load test against their model in a simulated environment. Ahead of deployment, they can specify parameters, such as required throughput, sample payloads, and latency constraints, and test their model against these constraints on a selected set of instances. This lets MLOps Engineers gather data on how well their model will perform in the real world, thereby enabling them to feel confident in pushing it to production—or highlighting potential issues that must be addressed before putting it out into the world.

SageMaker Inference Recommender has even more tricks up its sleeve to make the lives of MLOps Engineers easier and make sure that their models continue to operate optimally. MLOps Engineers can use SageMaker Inference Recommender benchmarking features to perform custom load tests that estimate model performance when accessed under load in a production environment given certain requirements. Results from these tests can be loaded with either SageMaker Studio or the AWS SDK or AWS CLI, giving MLOps Engineers an overview of model performance, comparisons of numerous configurations, and the ability to share the results with any stakeholders.

Find Out More
MLOps Engineers can get started with Amazon SageMaker Inference Recommender through Amazon SageMaker Studio, AWS SDKs and CLI . Amazon SageMaker Inference Recommender is available in all AWS commercial regions where SageMaker is available (except for KIX). To find out more information, you can visit the Amazon SageMaker Inference Recommender landing page.

To get started, see the SageMaker Inference Recommender documentation.

New – Introducing SageMaker Training Compiler

Post Syndicated from Sean M. Tracey original https://aws.amazon.com/blogs/aws/new-introducing-sagemaker-training-compiler/

An image explaining the benefits of using Amazon SageMaker Training CompilerToday, we’re pleased to announce Amazon SageMaker Training Compiler, a new Amazon SageMaker capability that can accelerate the training of deep learning (DL) models by up to 50%.

As DL models grow in complexity, so too does the time it can take to optimize and train them. For example, it can take 25,000 GPU-hours to train popular natural language processing (NLP) model “RoBERTa“. Although there are techniques and optimizations that customers can apply to reduce the time it can take to train a model, these also take time to implement and require a rare skillset. This can impede innovation and progress in the wider adoption of artificial intelligence (AI).

How has this been done to date?
Typically, there are three ways to speed up training:

  1. Using more powerful, individual machines to process the calculations
  2. Distributing compute across a cluster of GPU instances to train the model in parallel
  3. Optimizing model code to run more efficiently on GPUs by utilizing less memory and compute.

In practice, optimizing machine learning (ML) code is difficult, time-consuming, and a rare skill set to acquire. Data scientists typically write their training code in a Python-based ML framework, such as TensorFlow or PyTorch, relying on ML frameworks to convert their Python code into mathematical functions that can run on GPUs, commonly known as kernels. However, this translation from the Python code of a user is often inefficient because ML frameworks use pre-built, generic GPU kernels, instead of creating kernels specific to the code and model of the user.

It can take even the most skilled GPU programmers months to create custom kernels for each new model and optimize them. We built SageMaker Training Compiler to solve this problem.

Today’s launch lets SageMaker Training Compiler automatically compile your Python training code and generate GPU kernels specifically for your model. Consequently, the training code will use less memory and compute, and therefore train faster. For example, when fine-tuning Hugging Face’s GPT-2 model, SageMaker Training Compiler reduced training time from nearly 3 hours to 90 minutes.

Automatically Optimizing Deep Learning Models
So, how have we achieved this acceleration? SageMaker Training Compiler accelerates training jobs by converting DL models from their high-level language representation to hardware-optimized instructions that train faster than jobs with off-the-shelf frameworks. Under the hood, SageMaker Training Compiler makes incremental optimizations beyond what the native PyTorch and TensorFlow frameworks offer to maximize compute and memory utilization on SageMaker GPU instances.

More specifically, SageMaker Training Compiler uses graph-level optimization (operator fusion, memory planning, and algebraic simplification), data flow-level optimizations (layout transformation, common sub-expression elimination), and back-end optimizations (memory latency hiding, loop oriented optimizations) to produce an optimized model that efficiently uses hardware resources. As a result, training is accelerated by up to 50%, and the returned model is the same as if SageMaker Training Compiler had not been used.

But how do you use SageMaker Training Compiler with your models? It can be as simple as adding two lines of code!

SageMaker Training Compiler Code Changes

The shortened training times mean that customers gain more time for innovating and deploying their newly-trained models at a reduced cost and a greater ability to experiment with larger models and more data.

Getting the most from SageMaker Training Compiler
Although many DL models can benefit from SageMaker Training Compiler, larger models with longer training will realize the greatest time and cost savings. For example, training time and costs fell by 30% on a long-running RoBERTa-base fine-tuning exercise.

Jorge Lopez Grisman, a Senior Data Scientist at Quantum Health – an organization on a mission to “make healthcare navigation smarter, simpler, and more cost-effective for everyone” – said:

“Iterating with NLP models can be a challenge because of their size: long training times bog down workflows and high costs can discourage our team from trying larger models that might offer better performance. Amazon SageMaker Training Compiler is exciting because it has the potential to alleviate these frictions. Achieving a speedup with SageMaker Training Compiler is a real win for our team that will make us more agile and innovative moving forward.”

Further Resources
To learn more about how Amazon SageMaker Training Compiler can benefit you, you can visit our page here. And to get started see our technical documentation here.

Announcing Amazon SageMaker Ground Truth Plus

Post Syndicated from Sean M. Tracey original https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-ground-truth-plus/

Today, we’re pleased to announce the latest service in the Amazon SageMaker suite that will make labeling datasets easier than ever before. Ground Truth Plus is a turn-key service that uses an expert workforce to deliver high-quality training datasets fast, and reduces costs by up to 40 percent.

The Challenges of Machine Learning Model Creation
One of the biggest challenges in building and training machine learning (ML) models is sourcing enough high-quality, labeled data at scale to feed into and train those models so that they can make an accurate prediction.

On the face of it, labeling data might seem like a fairly straightforward task…

  • Step 1: Get data
  • Step 2: Label it

…but this is far from the reality.

Even before you have labelers begin annotations, you need a custom labeling workflow and user interface specific to your project so that you get a high-quality dataset. This relies on a combination of robust tooling and skilled workers, and the effort spent can be significant.

Once the data labeling workflow and user interface has been constructed, a workforce to use those systems must be organized and trained – and this is all before a single point of data has been labeled!

Finally, once the labeling systems have been built, the workflows designed, and the workforce trained and deployed, the process of passing data through that system must be monitored and checked to ensure a consistent, high-quality output. After enough data has been passed through and labeled by the system, you have arrived at the point you’ve been trying to get to all along: you finally have enough data to train the ML model.

Each of these steps represents a significant investment in time, costs, and energy. You could be spending these resources building ML models instead of labeling and managing data, and using Ground Truth Plus can help free you up to do just that.

Introducing Amazon SageMaker Ground Truth Plus
Amazon SageMaker Ground Truth Plus enables you to easily create high-quality training datasets without having to build labeling applications and manage the labeling workforce on your own. Which means you don’t even need to have deep ML expertise or extensive knowledge of workflow design and quality management. You simply provide data along with labeling requirements and Ground Truth Plus sets up the data labeling workflows and manages them on your behalf in accordance with your requirements.

For example, if you need medical experts to label radiology images, you can specify that in the guidelines you provide to Ground Truth Plus. The service will then automatically select labelers trained in radiology to label your data, and from there an expert workforce that is trained on a variety of ML tasks will start labeling the data. Ground Truth Plus brings ML-powered automation to data labeling, which increases the quality of the output dataset and decreases the data labeling costs.

Amazon SageMaker Ground Truth Plus uses a multi-step labeling workflow including ML techniques for active learning, pre-labeling, and machine validation. This reduces the time required to label datasets for a variety of use cases including computer vision and natural language processing. Finally, Ground Truth Plus provides transparency into data labeling operations and quality management through interactive dashboards and user interfaces. This lets you monitor the progress of training datasets across multiple projects, track project metrics such as daily throughput, inspect labels for quality, and provide feedback on the labeled data.

How Does It Work?
A SageMaker Ground Truth Plus screenshot showing the request formFirst, let’s head to the new Ground Truth Plus console and fill out a form outlining the requirements for the data labeling project. Following that, our team of AWS Experts will schedule a call to discuss your data labeling project.

After the call, you simply upload data in an Amazon Simple Storage Service (Amazon S3) bucket for labeling.

Once the data has been uploaded, our experts will set-up the data labeling workflow per your requirements and create a team of labelers with the expertise necessary to label your data effectively. This helps make sure that you have the best people possible working on your projects.

These expert labelers use the Ground Truth Plus tools we’ve built to label these datasets quickly and effectively.

Initially, labelers will annotate the data you’ve uploaded, much like the following example image that we’ve uploaded from the CBCL StreetScenes dataset. However, as the labelers start to submit examples of labeled data, something cool begins happening: our ML systems kick in and start to pre-label the images on behalf of the expert workforce!

An example of the raw dataset used to demonstrate Amazon SageMaker Ground Truth Plus functionality

As more and more data is labeled by the expert workforce, the ML model becomes better at pre-labeling those images. This means that there’s less need for a human to spend as much time creating each individual label for every object of interest in a dataset. Less time spent on labeling means lower costs for you, and it also means a quicker turnaround in creating a dataset that can be used for training a model – all without sacrificing quality.

A screenshot showing one of the labelling interfaces for SageMaker Ground Truth Plus

As the process continues, these ML models will also start to highlight potential areas of interest that the labeling workforce may have missed or incorrectly labeled through machine validation (indicated below by the purple box). Once an area of interest has been highlighted, a human labeler can view and either confirm or delete the suggestion that the model has made. This iteratively improves the pre-labeling and machine validation stages, further reducing the time needed by a labeler to manually label the data, and ensures a high-quality output throughout the process.

A screenshot showing one of the labelling interfaces filed in my a machine learning model for SageMaker Ground Truth Plus

While this is all going on, you can monitor the progress and output of the project using the Ground Truth Plus Project Portal. Within this portal, you can track the amount of data labeled on a day-by-day basis, and make sure that the project is progressing at an acceptable rate.

A screenshot showing the metrics dashboard enabling users to track the progress of their labelling jobs in SageMaker Ground Truth Plus

With each batch of images uploaded and labeled, you can decide whether to accept them or send them back for relabeling if something has been missed.

Finally, when the labeling process has completed, you can retrieve the labeled data from a secure S3 bucket and get to the business of training models.

Find out more
Today, Amazon SageMaker Ground Truth Plus is available in the N. Virginia (us-east-1) region.

To learn more:

New DynamoDB Table Class – Save Up To 60% in Your DynamoDB Costs

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/new-dynamodb-table-class-save-up-to-60-in-your-dynamodb-costs/

Today we are announcing Amazon DynamoDB Standard-Infrequent Access (DynamoDB Standard-IA). A new table class for DynamoDB that reduces storage costs by 60 percent compared to existing DynamoDB Standard tables, and that delivers the same performance, durability, and scaling.

Nowadays, many customers are moving their infrequently accessed data between DynamoDB and Amazon Simple Storage Service (Amazon S3). This means that customers are developing a process to migrate the data and build complex applications that must support two different APIs—one for DynamoDB and another for Amazon S3.

DynamoDB Standard-IA table class is designed for customers who want a cost-optimized solution for storing infrequently accessed data in DynamoDB without changing any application code. Using this new table class, you get the single-digit millisecond read and write performance from DynamoDB and use all of the same APIs.

When you use DynamoDB Standard-IA table class, you will save up to 60 percent in storage costs as compared to using the DynamoDB Standard table class. However, DynamoDB reads and writes for this new table class are priced higher than the Standard tables. Therefore, it is important to understand your use cases before applying this new table class to your tables.

DynamoDB Standard-IA is a great solution if you must store terabytes of data for several years where the data must be highly available, but it is not frequently accessed. An example is social media applications where end users rarely access their old posts. However, these posts remain stored, because if someone scrolls on a profile to see an old photo from 2009, they should be able to retrieve it as fast as if it was a newer post.

E-commerce sites are another good use case. These sites might have a lot of products that are not frequently accessed, but administrators of the site still want to have them available in their store just in case someone wants to buy them. Furthermore, this is a good solution for storing a customer’s previous orders. DynamoDB Standard-IA table offers the ability to retain historical orders at a lower cost.

Get started using DynamoDB Standard-IA
Get started using DynamoDB Standard-IA by evaluating the best class for your existing tables.

Go to the table page and select Update the table class in the Actions dropdown to change the table class. Then, choose the new table class and save the changes. You can change the table class for an existing table to be Standard-IA or Standard twice every 30-days with no impact on performance or availability. All of the features of DynamoDB are available when using a table in the Standard-IA table class.

Moreover, you can also create a new table with the DynamoDB Standard-IA table class.

Update table class

Availability and Pricing
DynamoDB Standard-IA is available in all of the AWS Regions, except the China Regions and AWS GovCloud.

For example, DynamoDB Standard-IA storage pricing in US East (N. Virginia) is now $0.10 per GB (60 percent less than DynamoDB Standard), while reads and writes are 25 percent higher.

For more information about this feature and its pricing, see the DynamoDB Standard-IA Feature page and the DynamoDB pricing page.

Marcia

New – Amazon DevOps Guru for RDS to Detect, Diagnose, and Resolve Amazon Aurora-Related Issues using ML

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/new-amazon-devops-guru-for-rds-to-detect-diagnose-and-resolve-amazon-aurora-related-issues-using-ml/

Today we are announcing Amazon DevOps Guru for RDS, a new capability for Amazon DevOps Guru. It allows developers to easily detect, diagnose, and resolve performance and operational issues in Amazon Aurora.

Hundreds of thousands of customers nowadays are using Amazon Aurora because it is highly available, scalable, and durable. But as applications grow in size and complexity, it becomes more challenging for these customers to detect and resolve operational and performance issues quickly.

During last year’s re:Invent, we announced DevOps Guru, a service that uses machine learning (ML) to automatically detect and alert customers of application issues, including database problems. Today we are announcing DevOps Guru for RDS to help developers using Amazon Aurora databases to detect, diagnose, and resolve database performance issues fast and at scale. Now developers will have enough information to determine the exact cause for a database performance issue. This launch will save developers and engineers many hours of work trying to uncover and remediate the performance-related database issues.

DevOps Guru for RDS uses ML to automatically identify and analyze a wide range of performance-related database issues, such as over-utilization of host resources, database bottlenecks, or misbehavior of SQL queries. It also recommends solutions to remediate the issues it finds. To use this capability, you don’t need to be a database or ML expert.

When an issue is detected, DevOps Guru for RDS displays the finding in the DevOps Guru console and sends notifications using Amazon EventBridge or Amazon Simple Notification Service (SNS). This allows developers to automatically manage and take real-time action on the issues.

How DevOps Guru for RDS Works
DevOps Guru for RDS uses anomaly detection on the database load (DB load) performance metric to detect issues. DB load is measured in units of Average Active Sessions (AAS). DB load measures the level of activity in your database, making it a great metric to understand the health of your database. If the DB load is high, this can result in performance issues. This metric can be compared to the number of virtual CPUs (vCPUs), and if the DB load is higher than that number, issues can arise.

The most useful dimensions for this metric are the wait events and the top SQL. The wait event describes what the system conditions that are currently running SQLs are waiting on. The most common reasons why a statement is waiting is that it is waiting for the CPU, waiting for a read or write, or waiting for a locked resource. The top SQL dimension shows which queries are contributing the most to DB load.

The following image is an example of a finding that DevOps Guru for RDS reported. The graph shows that from the AAS, most of them were waiting for access to a table or for CPU.

Example of anomaly detection
If you continue scrolling on the DevOps Guru for RDS analysis page, you can discover the cause for the problem and some recommendations to fix it. In this particular example, two problems were detected: high-load wait events and CPU capacity exceeded.

DevOps Guru for RDS looks more in-depth into these problems. First, it looks at the high-load wait events, where there were 27 AAS for the IO and CPU wait types, which is 99 percent of the total DB load.

Second, it tells us that the running tasks exceeded six processes. This database only has two vCPUs, and the recommended number of running processes should be a maximum of four (2x vCPUs). DevOps Guru for RDS also makes recommendations to fix these issues.

Recommendations

In another anomaly, the graph shows that there was a high load of wait events, and one SQL query was found to require further investigation. You can even see the exact SQL query if you click on the SQL digest IDs. The insight’s analysis and recommendation section is full of information on how to investigate further and fix the issue. You can get a lot of detailed information by clicking on the wait event, for example, on the wait event wait/io/table/sql/handler or in the View troubleshooting doc link.

Analysis and recommendations

Get started with DevOps Guru for RDS
To get started with this new capability of DevOps Guru, make sure that Performance Insights is enabled for your Amazon Aurora DB instances. It supports Amazon Aurora with MySQL- and PostgreSQL-compatibility. For instructions on how to enable Performance Insights, see Enabling and disabling Performance Insights.

The next step is to enable DevOps Guru to start monitoring your AWS resources. You can specify the resources you want to be covered by DevOps Guru.

If you are already using DevOps Guru, whenever there is a new insight for an Amazon Aurora database resource, you will see it in the console.

To see the detailed database analysis, navigate to the Insight page and select the new View analysis button under the DB load aggregated metric. That button will take you to the detailed analysis by DevOps Guru for RDS.

View analysis

Pricing and Availability
DevOps Guru for RDS is offered to customers at no additional charge, as part of the existing price that DevOps Guru charges customers for RDS resources.

DevOps Guru for RDS is available in all Regions where DevOps Guru is available, US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

Learn more about DevOps Guru for RDS and check out the talk at AWS re:Invent “Automatically detect and resolve performance issues with Amazon DevOps Guru for RDS” (Session Id 15877).

Marcia

Enhanced Amazon S3 Integration for Amazon FSx for Lustre

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/enhanced-amazon-s3-integration-for-amazon-fsx-for-lustre/

Today, we are announcing two additional capabilities of Amazon FSx for Lustre. First, a full bi-directional synchronization of your file systems with Amazon Simple Storage Service (Amazon S3), including deleted files and objects. Second, the ability to synchronize your file systems with multiple S3 buckets or prefixes.

Lustre is a large scale, distributed parallel file system powering the workloads of most of the largest supercomputers. It is popular among AWS customers for high-performance computing workloads, such as meteorology, life-science, and engineering simulations. It is also used in media and entertainment, as well as the financial services industry.

I had my first hands-on Lustre file systems when I was working for Sun Microsystems. I was a pre-sales engineer and worked on some deals to sell multimillion-dollar compute and storage infrastructure to financial services companies. Back then, having access to a Lustre file system was a luxury. It required expensive compute, storage, and network hardware. We had to wait weeks for delivery. Furthermore, it required days to install and configure a cluster.

Fast forward to 2021, I may create a petabyte-scale Lustre cluster and attach the file system to compute resources running in the AWS cloud, on-demand, and only pay for what I use. There is no need to know about Storage Area Networks (SAN), Fiber Channel (FC) fabric, and other underlying technologies.

Modern applications use different storage options for different workloads. It is common to use S3 object storage for data transformation, preparation, or import/export tasks. Other workloads may require POSIX file-systems to access the data. FSx for Lustre lets you synchronize objects stored on S3 with the Lustre file system to meet these requirements.

When you link your S3 bucket to your file system, FSx for Lustre transparently presents S3 objects as files and lets you to write results back to S3.

Full Bi-Directional Synchronization with Multiple S3 Buckets
If your workloads require a fast, POSIX-compliant file system access to your S3 buckets, then you can use FSx for Lustre to link your S3 buckets to a file system and keep data synchronized between the file system and S3 in both directions. However, until today, there were a couple limitations. First, you had to manually configure a task to export data back from FSx for Lustre to S3. Second, deleted files on S3 were not automatically deleted from the file system. And third, an FSx for Lustre file system was synchronized with one S3 bucket only. We are addressing these three challenges with this launch.

Starting today, when you configure an automatic export policy for your data repository association, files on your FSx for Lustre file system are automatically exported to your data repository on S3. Next, deleted objects on S3 are now deleted from the FSx for Lustre file system. The opposite is also available: deleting files on FSx for Lustre triggers the deletion of corresponding objects on S3. Finally, you may now synchronize your FSx for Lustre file system with multiple S3 buckets. Each bucket has a different path at the root of your Lustre file system. For example your S3 bucket logs may be mapped to /fsx/logs and your other financial_data bucket may be mapped to /fsx/finance.

These new capabilities are useful when you must concurrently process data in S3 buckets using both a file-based and an object-based workflow, as well as share results in near real time between these workflows. For example, an application that accesses file data can do so by using an FSx for Lustre file system linked to your S3 bucket, while another application running on Amazon EMR may process the same files from S3.

Moreover, you may link multiple S3 buckets or prefixes to a single FSx for Lustre file system, thereby enabling a unified view across multiple datasets. Now you can create a single FSx for Lustre file system and easily link multiple S3 data repositories (S3 buckets or prefixes). This is convenient when you use multiple S3 buckets or prefixes to organize and manage access to your data lake, access files from a public S3 bucket (such as these hundreds of public datasets) and write job outputs to a different S3 bucket, or when you want to use a larger FSx for Lustre file system linked to multiple S3 datasets to achieve greater scale-out performance.

How It Works
Let’s create an FSx for Lustre file system and attach it to an Amazon Elastic Compute Cloud (Amazon EC2) instance. I make sure that the file system and instance are in the same VPC subnet to minimize data transfer costs. The file system security group must authorize access from the instance.

I open the AWS Management Console, navigate to FSx, and select Create file system. Then, I select Amazon FSx for Lustre. I am not going through all of the options to create a file system here, you can refer to the documentation to learn how to create a file system. I make sure that Import data from and export data to S3 is selected.

Lustre - enable S3 synchronizationIt takes a few minutes to create the file system. Once the status is ✅ Available, I navigate to the Data repository tab, and then select Create data repository association.

I choose a Data Repository path (my source S3 bucket) and a file system path (where in the file system that bucket will be imported).

FsX Lustre Data repository

Then, I choose the Import policy and Export policy. I may synchronize the creation of file/objects, their updates, and when they are deleted. I select Create.

FsX Lustre Data repository import policies

When I use automatic import, I also make sure to provide an S3 bucket in the same AWS Region as the FSx for Lustre cluster. FSx for Lustre supports linking to an S3 bucket in a different AWS Region for automatic export and all other capabilities.

Using the console, I see the list of Data repository associations. I wait for the import task status to become ✅ Succeeded. If I link the file system to an S3 bucket with a large number of objects, then I may choose to skip Importing metadata from repository while creating the data repository association, and then load metadata from selected prefixes in my S3 buckets that are required for my workload using an Import task.

FsX for lustre - meta data repository tasks

I create an EC2 instance in the same VPC subnet. Furthermore, I make sure that the FSx for Lustre cluster security group authorizes ingress traffic from the EC2 instance. I use SSH to connect to the instance, and then type the following commands (commands are prefixed with the $ sign that is part of my shell prompt).

# check kernel version, minimum version 4.14.104-95.84 is required 
$ uname -r
4.14.252-195.483.amzn2.aarch64

# install lustre client 
$ sudo amazon-linux-extras install -y lustre2.10
Installing lustre-client
...
Installed:
  lustre-client.aarch64 0:2.10.8-5.amzn2                                                                                                                        

Complete!

# create a mount point 
$ sudo mkdir /fsx

# mount the file system 
$ sudo mount -t lustre -o noatime,flock fs-00...9d.fsx.us-east-1.amazonaws.com@tcp:/ny345bmv /fsx

# verify mount succeeded
$ mount 
...
172.0.0.0@tcp:/ny345bmv on /fsx type lustre (rw,noatime,flock,lazystatfs)

Then, I verify that the file system contains the S3 objects, and I create a new file using the touch command.

Fsx Lustre - check file system

I switch to the AWS Console, under S3 and then my bucket name, and I verify that the file has been synchronized.

Fsx Lustre - check s3

Using the console, I delete the file from S3. And, unsurprisingly, after a few seconds, the file is also deleted from the FSx file system.

Fsx Lustre - check file systems - deleted

Pricing and Availability
These new capabilities are available at no additional cost on Amazon FSx for Lustre file systems. Automatic export and multiple repositories are only available on Persistent 2 file systems in US East (N. Virginia), US East (Ohio), US West (Oregon), Canada (Central), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland). Automatic import with support for deleted and moved objects in S3 is available on file systems created after July 23, 2020 in all regions where FSx for Lustre is available.

You can configure your file system to automatically import S3 updates by using the AWS Management Console, the AWS Command Line Interface (CLI), and AWS SDKs.

Learn more about using S3 data repositories with Amazon FSx for Lustre file systems.

One More Thing
One more thing while you are reading. Today, we also launched the next generation of FSx for Lustre file systems. FSx for Lustre next-gen file systems are built on AWS Graviton processors. They are designed to provide you with up to 5x higher throughput per terabyte (up to 1 GB/s per terabyte) and reduce your cost of throughput by up to 60% as compared to previous generation file systems. Give it a try today!

— seb

PS : my colleague Michael recorded a demo video to show you the enhanced S3 integration for FSx for Lustre in action. Check it out today.

Preview – AWS Backup Adds Support for Amazon S3

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/preview-aws-backup-adds-support-for-amazon-s3/

Starting today, you can preview AWS Backup for Amazon Simple Storage Service (Amazon S3).

AWS Backup is a fully managed, policy-based service that lets you to centralize and automate the backup and restore of your applications spanning across 12 AWS services: Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Block Store (EBS) volumes, Amazon Relational Database Service (RDS) databases (including Amazon Aurora clusters), Amazon DynamoDB tables, Amazon Neptune databases, Amazon DocumentDB (with MongoDB compatibility) databases, Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Lustre file systems, Amazon FSx for Windows File Server file systems, AWS Storage Gateway volumes, and now Amazon S3 (in preview).

Modern workloads and systems are leveraging different storage options for different functionalities. In the 21st century, it is normal to build applications relying on non-relational and relational databases, shared file storage, and object storage, just to name of few. When operating and managing these applications, you told us that you wanted centralized protection and provable compliance for application data stored in S3 alongside other AWS services for storage, compute, and databases.

I can see three benefits when integrating Amazon Simple Storage Service (Amazon S3) with your data protection policies in AWS Backup.

First, it lets you centrally manage your applications backups: AWS Backup provides an automated solution to centrally configure backup policies, thereby helping you simplify backup lifecycle management. This also makes it easy to ensure that your application data across AWS services (including S3) is centrally backed up.

Second, it lets you easily restore your data: AWS Backup provides a single-click-restore experience for your S3 data. This lets you perform point-in-time restores of your S3 buckets and objects to a new or existing S3 bucket.

Finally, it improves backup compliance: AWS Backup provides built-in dashboards that let you to track backup and restore operations for S3.

AWS Backup for S3 (Preview) lets you create continuous point-in-time backups along with periodic backups of S3 buckets, including object data, object tags, access control lists (ACLs), and user-defined metadata. The first backup is a full snapshot, while subsequent backups are incremental. If there is a data disruption event, then you choose a backup from the backup vault, and restore an S3 bucket (or individual S3 objects) to a new or existing S3 bucket. AWS Backup is integrated with AWS Organizations, which let you use a single policy across AWS accounts (within your Organizations) to automate backup creation and backup access management.

Furthermore, you can turn on AWS Backup Vault Lock to enable delete protection of the data that you protect with AWS Backup, and thereby improving protection of your immutable backups from accidental deletion or malicious re-encryption.

How to Get Started
AWS Backup works with versioned S3 buckets. Before you get started, turn on S3 Versioning on your buckets to backup.

I must enable S3 in AWS Backup Settings when I use this feature for the first time. Using the AWS Management Console, I navigate to AWS Backup, then select Settings and Configure resources. I enable S3, and select Confirm. This is a one-time operation.

AWS Backup - optin S3

For this demo, I already have an existing backup plan, and I want to add an S3 bucket to this plan. If you want to create a new backup plan, then you can refer to AWS Backup‘s technical documentation.

To start including my S3 objects in my backup plan, I open the AWS Management Console, navigate to Backup plans, and select Assign resources.

AWS Backup Add Resources

I give a name to my Resource assignment. I select Include specific resources types, then I select S3 as Resource type and one or several S3 Bucket names. When I am done, I select Assign resources.

Alternatively, I may use tags or resource IDs to assign S3 resources.

If you have thousands of S3 buckets, I recommend using tags to assign the S3 buckets to a backup plan. AWS Backup matches the tags in S3 buckets to the ones assigned to the backup plan, and it centrally backs up the S3 resources along with other AWS services that your application uses.

The other options are not different from what you know already.

AWS Backup - backup plan for S3

The Bucket names list in the previous screenshot only shows the S3 buckets in the same Region.

Alternatively, I may also create on-demand backups. I navigate to the Protected resources section, and select Create on-demand backup.

I select S3 as the Resource type, and select the Bucket name. As per usual, I choose a Backup Window, a Retention period, a Backup vault, and an IAM role. Then, I select Create on-demand backup.

AWS Backup - on-demand backup for S3After a while, depending on the size of my bucket, the backup is ✅ Completed.

AWS Backup for S3 - Backup completed

All of the backups are encrypted and stored securely in a backup vault that I selected in the backup plan.

A backup vault (or backup storage vault) is an encrypted logical construct in my AWS account that stores and organizes my backups (recovery points). I may create new backup vaults in every AWS Region where AWS Backup is available. I may enable AWS Backup Vault Lock (delete-protection capability) on the backup vault to avoid accidental deletions and prevent malicious actors from re-encrypting my data. AWS Backup stores my continuous backups and periodic snapshots in the backup vault of my preference, and it lets me browse and restore as per my requirements.

How to Restore Objects
Let’s try to restore this backup.

The restore operation is very flexible. I may restore entire S3 buckets or individual S3 objects. I may restore the backups to the source S3 bucket, or to another existing bucket. Furthermore, I may create a new S3 bucket during restore. The S3 buckets must have Versioning enabled. Also, I may change the encryption key during restore.

I navigate to Backup vaults to restore the S3 bucket I just backed up. In the Backups section, I select the Recovery point ID that I want to restore, and I select Restore from the Actions menu.

AWS Backup for S3 - restore

Before starting the restore, I may select a few options:

  • The Restore time: I may restore my continuous backup to a point-in-time in the last 35 days, while I can restore my periodic backups to their original state.
  • The Restore type: I may choose to restore the entire bucket or a subset of objects within it.
  • The Restore destination: I may choose to restore on the same bucket, on another one, or create a new bucket during restore.
  • The Restored object encryption: this lets me select the key I want to use to encrypt the restored objects in the bucket.

I select Restore backup to start the restore.

AWS Backup for S3 - restore optionsI can monitor the progress in the Jobs section, under the Restore jobs tab.

AWS Backup S3 - restore Jobs

When the status turns green to ✅ Completed, my objects are ready to use!

Generally, the most comprehensive data-protection strategies include regular testing and validation of your restore procedures before you need them. Testing your restores also helps to prepare and maintain recovery runbooks. In turn, that ensures operational readiness during a disaster recovery exercise, or an actual data loss scenario.

Availability and Pricing
The preview is available in the US West (Oregon) Region only.

During the preview, there are no charges for creating and storing backups. You will pay the AWS charges for underlying resources, such as S3 storage, API usage, and versioning.

Send us an email at [email protected] including your AWS account ID to register for the preview.

Go ahead and apply to the preview program today.

— seb

Amazon S3 Glacier is the Best Place to Archive Your Data – Introducing the S3 Glacier Instant Retrieval Storage Class

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/amazon-s3-glacier-is-the-best-place-to-archive-your-data-introducing-the-s3-glacier-instant-retrieval-storage-class/

Today we are announcing the Amazon S3 Glacier Instant Retrieval storage class. This new archive storage class delivers the lowest cost storage for long-lived data that is rarely accessed and requires millisecond retrieval.

We are also excited to announce that S3 Intelligent-Tiering now automatically optimizes storage costs for rarely accessed data that needs immediate retrieval with the new Archive Instant Access tier, which is ideal for data with unknown or changing access patterns. For existing customers, this will provide an immediate savings of 68 percent for data that hasn’t been accessed for more than 90 days, with no action needed. The Frequent, Infrequent, and now Archive Instant Access tiers are designed for the same milliseconds access time and high-throughput performance.

In addition, we are announcing the new name for the existing Amazon S3 Glacier storage class and several price reductions.

Amazon S3 Glacier Instant Retrieval
The Amazon S3 Glacier storage classes are extremely low-cost and built for data archiving. They are secure and durable, and they are designed to provide the lowest cost for data that does not require immediate access, with retrieval options from minutes to hours.

Many customers need to store rarely accessed data for several years. However the data must be highly available and immediately accessible. Today, these customers use the S3 Standard-Infrequent Access (S3 Standard-IA) storage class. This storage class offers low cost for storage and allows customers to retrieve their data instantly.

S3 Glacier Instant Retrieval is a new storage class that delivers the fastest access to archive storage, with the same low latency and high-throughput performance as the S3 Standard and S3 Standard-IA storage classes. You can save up to 68 percent on storage costs as compared with using the S3 Standard-IA storage class when you use the S3 Glacier Instant Retrieval storage class and pay a low price to retrieve data. For example, in the US East (N. Virginia) Region, S3 Glacier Instant Retrieval storage pricing is $0.004 per GB-month and data retrieval is $0.03 per GB. Learn more about pricing for your Region.

Media archives, medical images, or user-generated content are just a few examples of ideal use cases for S3 Glacier Instant Retrieval. Once created, this content is rarely accessed, but when it is needed it must be available in milliseconds.

To get started using the new storage class from the Amazon S3 console, upload an object as you would normally, and select the S3 Glacier Instant Retrieval storage class.

Upload object with the new storage class

This feature is available programmatically from AWS SDKs, AWS Command Line Interface (CLI), and AWS CloudFormation.

In my opinion, the easiest way to store data in S3 Glacier Instant Retrieval is to use the S3 PUT API using the CLI. When using this API, set the storage class to GLACIER_IR.

aws s3api put-object --bucket <bucket-name> --key <object-key> --body <name-file> --storage-class GLACIER_IR

When the object is uploaded to Amazon S3, verify the storage class in the list of objects or on the object details page.

Storage classes

For data that already exists in Amazon S3, you can use S3 Lifecycle to transition data from the S3 Standard and S3 Standard-IA storage classes into S3 Glacier Instant Retrieval.

New Archive Instant Access Tier in S3 Intelligent-Tiering
S3 Intelligent-Tiering is a storage class that automatically moves objects between access tiers to optimize costs. This is the recommended storage class for data with unpredictable or changing access patterns, such as in data lakes, analytics, or user-generated content.

Until today, there were two low latency access tiers optimized for frequent and infrequent access, and two optional archive access tiers designed for asynchronous access optimized for rare access at a low cost.

Beginning today, the Archive Instant Access tier is added as a new access tier in the S3 Intelligent-Tiering storage class. You will start seeing automatic costs savings for your storage in S3 Intelligent-Tiering for rarely accessed objects.

The Archive Instant Access tier joins the group of low latency access tiers. This new tier is optimized for data that is not accessed for months at a time but, when it is needed, is available within milliseconds.

S3 Intelligent-Tiering automatically stores objects in three access tiers that deliver the same performance as the S3 Standard storage class:

  • Frequent Access tier
  • Infrequent Access tier
  • Archive Instant Access (new)

For a small monitoring and automation charge, S3 Intelligent-Tiering monitors access patterns and moves objects between the different access tiers. Objects that have not been accessed for 30 consecutive days are moved from the Frequent Access tier to the Infrequent Access tier for savings of 40 percent. When an object hasn’t been accessed for 90 consecutive days, S3 Intelligent-Tiering will move the object from the Infrequent Access tier to the Archive Instant Access tier, with a savings of 68 percent. If the data is accessed later, it is automatically moved back to the Frequent Access tier. No tiering charges apply when objects are moved between access tiers within the S3 Intelligent-Tiering storage class.

S3 Intelligent-Tiering access tiers

To get started with this new access tier, select Intelligent-Tiering as the storage class for an object when uploading an object using the S3 console. After 90 days of inactivity (30 days in Frequent Access tier and 60 days in Infrequent Access tier), S3 Intelligent-Tiering will automatically move the object to the Archive Instant Access tier. The introduction of the new Archive Instant Access tier has no impact on performance when you retrieve objects.

New name for the Amazon S3 Glacier storage class – S3 Glacier Flexible Retrieval
The existing Amazon S3 Glacier storage class is now named S3 Glacier Flexible Retrieval. This storage class now has free bulk retrievals in 5 to 12 hours, and the storage price has been reduced by 10 percent in all Regions, effective December 1, 2021. S3 Glacier Flexible Retrieval is now even more cost-effective, and the free bulk retrievals make it ideal for retrieving large data volumes.

These are the Amazon S3 archive storage classes:

  • S3 Glacier Instant Retrieval: The newest storage class is optimized for long-lived data that is rarely accessed (typically once per quarter). However when data is needed, it is available within milliseconds. For example, medical images and news media assets are perfect for this storage class.
  • S3 Glacier Flexible Retrieval: This newly renamed storage class is optimized for archiving data that can be retrieved in minutes or with free bulk retrievals in 5 to 12 hours. This storage class is ideal for backups and disaster recovery use cases, where you have large amounts of long-term, rarely accessed data, and you don’t want to worry about retrieval costs when you need the data.
  • S3 Glacier Deep Archive: This storage class is the lowest-cost storage in the cloud and is optimized for archiving data that can be restored in at least 12 hours. It’s great for storing your compliance archives or for digital media preservation.

Amazon S3 has reduced storage prices!
We are excited to announce that Amazon S3 has reduced storage prices of up to 31 percent in the S3 Standard-IA and S3 One Zone-IA storage classes across 9 AWS Regions: US West (N. California), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Osaka), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and South America (São Paulo). These price reductions are effective December 1, 2021.

Learn more about price reduction details.

Available Now
The new storage class, S3 Glacier Instant Retrieval, and the new Archive Instant Access tier in S3 Intelligent-Tiering are available today (November 30, 2021) in all AWS Regions.

The price cut for S3 Glacier and free bulk retrievals in all AWS Regions, and the S3 Standard-Infrequent Access/One Zone-Infrequent storage class in nine Regions will be effective on December 1, 2021.

Learn more about the storage classes changes and all the storage classes.

Marcia

New – Simplify Access Management for Data Stored in Amazon S3

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/new-simplify-access-management-for-data-stored-in-amazon-s3/

Today, we are introducing a couple new features that simplify access management for data stored in Amazon Simple Storage Service (Amazon S3). First, we are introducing a new Amazon S3 Object Ownership setting that lets you disable access control lists (ACLs) to simplify access management for data stored in Amazon S3. Second, the Amazon S3 console policy editor now reports security warnings, errors, and suggestions powered by IAM Access Analyzer as you author your S3 policies.

Since launching 15 years ago, Amazon S3 buckets have been private by default. At first, the only way to grant access to objects was using ACLs. In 2011, AWS Identity and Access Management (IAM) was announced, which allowed the use of policies to define permissions and control access to buckets and objects in Amazon S3. Nowadays, you have several ways to control access to your data in Amazon S3, including IAM policies, S3 bucket policies, S3 Access Points policies, S3 Block Public Access, and ACLs.

ACLs are an access control mechanism in which each bucket and object has an ACL attached to it. ACLs define which AWS accounts or groups are granted access as well as the type of access. When an object is created, the ownership of it belongs to the creator.  This ownership information is embedded in the object ACL. When you upload an object to a bucket owned by another AWS account, and you want the bucket owner to access the object, then permissions need to be granted in the ACL. In many cases, ACLs and other kinds of policies are used within the same bucket.

The new Amazon S3 Object Ownership setting, Bucket owner enforced, lets you disable all of the ACLs associated with a bucket and the objects in it. When you apply this bucket-level setting, all of the objects in the bucket become owned by the AWS account that created the bucket, and ACLs are no longer used to grant access. Once applied, ownership changes automatically, and applications that write data to the bucket no longer need to specify any ACL. As a result, access to your data is based on policies. This simplifies access management for data stored in Amazon S3.

With this launch, when creating a new bucket in the Amazon S3 console, you can choose whether ACLs are enabled or disabled. In the Amazon S3 console, when you create a bucket, the default selection is that ACLs are disabled. If you wish to keep ACLs enabled, you can choose other configurations for Object Ownership, specifically:

  • Bucket owner preferred: All new objects written to this bucket with the bucket-owner-full-controlled canned ACL will be owned by the bucket owner. ACLs are still used for access control.
  • Object writer: The object writer remains the object owner. ACLs are still used for access control.

Options for object ownership

For existing buckets, you can view and manage this setting in the Permissions tab.

Before enabling the Bucket owner enforced setting for Object Ownership on an existing bucket, you must migrate access granted to other AWS accounts from the bucket ACL to the bucket policy. Otherwise, you will receive an error when enabling the setting. This helps you ensure applications writing data to your bucket are uninterrupted. Make sure to test your applications after you migrate the access.

Policy validation in the Amazon S3 console
We are also introducing policy validation in the Amazon S3 console to help you out when writing resource-based policies for Amazon S3. This simplifies authoring access control policies for Amazon S3 buckets and access points with over 100 actionable policy checks powered by IAM Access Analyzer.

To access policy validation in the Amazon S3 console, first go to the detail page for a bucket. Then, go to the Permissions tab and edit the bucket policy.

Accessing the IAM Policy Validation in S3 consoleWhen you start writing your policy, you see that, as you type, different findings appear at the bottom of the screen. Policy checks from IAM Access Analyzer are designed to validate your policies and report security warnings, errors, and suggestions as findings based on their impact to help you make your policy more secure.

You can also perform these checks and validations using the IAM Access Analyzer’s ValidatePolicy API.

Example of policy suggestion

Availability
Amazon S3 Object Ownership is available at no additional cost in all AWS Regions, excluding the AWS China Regions and AWS GovCloud Regions. IAM Access Analyzer policy validation in the Amazon S3 console is available at no additional cost in all AWS Regions, including the AWS China Regions and AWS GovCloud Regions.

Get started with Amazon S3 Object Ownership through the Amazon S3 console, AWS Command Line Interface (CLI), Amazon S3 REST API, AWS SDKs, or AWS CloudFormation. Learn more about this feature on the documentation page.

And to learn more and get started with policy validation in the Amazon S3 console, see the Access Analyzer policy validation documentation.

Marcia

New for AWS Backup – Support for VMware and VMware Cloud on AWS

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-backup-support-for-vmware-and-vmware-cloud-on-aws/

Today, I am happy to announce AWS Backup support for VMware, a new capability that enables you to centralize and automate data protection of virtual machines (VMs) running on VMware on premises and VMware CloudTM on AWS. You can now use a single, centrally managed policy in AWS Backup to protect these VMware environments together with 12 AWS compute, storage, and database services already supported by AWS Backup. You can then use AWS Backup to restore VMware workloads to on-premises data centers and VMware Cloud on AWS.

While doing so, AWS Backup Audit Manager lets you consistently demonstrate compliance by monitoring backup, copy, and restore operations and generating auditor-ready reports to satisfy your data governance and regulatory requirements.

Let’s see how this works in practice.

Using AWS Backup Support for VMware
There are three steps to back up VMware virtual machines (VMs) with AWS Backup:

  1. Create a gateway to connect AWS Backup to your hypervisor.
  2. Connect to your hypervisor through the gateway.
  3. Assign virtual machines managed by your hypervisor to a backup plan.

AWS Back Support for VMware diagram

On the left pane of the AWS Backup console, there is a new External resources section. There, I choose Gateways and then Create gateway. This AWS Backup gateway helps with discovery of the on-premises VMware environment and acts as a cloud gateway to send and receive data.

I download the Open Virtualization Format (OVF) file of the AWS Backup gateway and follow the instructions to deploy the gateway using the VMware vSphere client. I am using an internal test and development VMware environment for this walkthrough.

VMware vCenter screenshot.

After deploying the gateway in my VMware environment, I come back to the AWS Backup console. I write a name for the gateway (for simplicity, I use the same name of the gateway VM) and the IP address of the gateway VM. Optionally, I can add tags to help organize and track my setup. I go on and create the gateway.

Console screenshot.

Now, I choose Add hypervisor. I write a name for the hypervisor and the IP address of the VMware vCenter server host.

Console screenshot.

I enter the username and password of a service account that I created for AWS Backup on the Active Directory domain. The username should include the domain (for example, username@domain). Then, I choose the encryption key to protect the service account credentials. If I don’t choose my own AWS Key Management Service (KMS) key, AWS Backup encrypts the username and password using a key that AWS owns and manages.

Console screenshot.

I select the gateway to connect to the hypervisor and choose Test gateway connection. This test helps ensure that the gateway can communicate with the hypervisor before I complete the configuration. Optionally, I can add tags to help organize and track my setup. I go on and add the hypervisor.

Console screenshot.

After a few minutes, the hypervisor is online, and I see the VMs managed by vCenter in the AWS Backup console. I can now use these virtual machines as resources in my backup plans in the same way as the other AWS compute, storage, and database resources supported by AWS Backup.

Console screenshot.

I create a new backup plan and start with a template. The rules of the template enforce daily backups with five weeks of retention and monthly backups with one year of retention. I can customize these rules based on my requirements.

Console screenshot.

Then, I choose to assign resources to the backup plan, and I select three VMs.

Console screenshot.

If you need, you can create an on-demand backup in the Protected resources section of the console. For example, here I am starting the on-demand backup for one of the VMs.

Console screenshot.

When a backup is complete, VMs are added to the list of the protected resources, and I can initiate a restore.

Console screenshot.

I select the backup and choose Restore. Then, I enter the restore location, which can be the same VMware environment I used for the backup or another (for example, on VMware Cloud on AWS). Below, I specify name, path, compute resource name, and datastore to use for the restore. Then, I choose Restore backup.

Console screenshot.

I monitor the status of my backup and restore jobs from the AWS Backup console. To monitor backup and restore metrics over a period of time, I can use Amazon CloudWatch metrics, logs, and alarms. I can also send events to Amazon EventBridge to receive notifications once a job completes or fails.

Availability and Pricing
AWS Backup support for VMware is available in the US East (N. Virginia, Ohio), US West (N. California, Oregon), GovCloud (US-East, US-West), Canada (Central), Europe (Frankfurt, Ireland, London, Milan, Paris, Stockholm), South America (São Paulo), Asia Pacific (Hong Kong, Mumbai, Seoul, Singapore, Sydney, Tokyo, Osaka), Middle East (Bahrain), and Africa (Cape Town) Regions. Please see the AWS Regional Services List for more information.

AWS Backup supports VMware ESXi 6.7.x and 7.0.x VMs running on NFS, VMFS, and VSAN data stores on premises and in VMware Cloud on AWS. In addition, AWS Backup supports both SCSI Hot-Add and Network Block Device (NBD) transport modes for copying data from source VMs to AWS.

With AWS Backup support for VMware, you pay using the same dimensions that AWS Backup uses today: backup storage, restore, and cross-region data transfer. For more information, see the AWS Backup pricing page.

Your VM backups are stored in a backup vault. All backups stored and managed by AWS Backup are replicated to 3 Availability Zones (AZs) in the Region and designed for 99.999999999 percent (11 9s) durability and 99.99 percent (4 9s) of service availability.

AWS Backup supports first full, then incremental-forever, backups of VMs that you can create on-demand or via a schedule configured in your backup plan. AWS Backup always does full restores even though backups are stored as incremental, enabling you to benefit from storage efficiency cost savings while easily performing restores.

Centrally protect your VMware environments and your AWS compute, storage, and database resources with AWS Backup.

Danilo

New – Amazon FSx for OpenZFS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-fsx-for-openzfs/

Last month, my colleague Bill Vass said that we are “slowly adding additional file systems” to Amazon FSx. I’d question Bill’s definition of slow, given that his team has launched Amazon FSx for Lustre, Amazon FSx for Windows File Server, and Amazon FSx for NetApp ONTAP in less than three years.

Amazon FSx for OpenZFS
Today I am happy to announce Amazon FSx for OpenZFS, the newest addition to the Amazon FSx family. Just like the other members of the family, this new addition lets you use a popular file system without having to deal with hardware provisioning, software configuration, patching, backups, and the like. You can create a file system in minutes and begin to enjoy the benefits of OpenZFS right away: transparent compression, continuous integrity verification, snapshots, and copy-on-write. Even better, you get all of these benefits without having to develop the specialized expertise that has traditionally been needed to set up and administer OpenZFS.

FSx for OpenZFS is powered by the AWS Graviton family processors and AWS SRD (Scalable Reliable Datagram) Networking, and can deliver up to 1 million IOPS with latencies of 100-200 microseconds, along with up to 4 GB/second of uncompressed throughput, up to 12 GB/second of compressed throughput, and up to 12.5 GB/second throughput to cached data. FSx for OpenZFS supports the OpenZFS Adaptive Replacement Cache (ARC) and uses memory in the file server to provide faster performance. It also supports advanced NFS performance features such as session trunking and NFS delegation, allowing you to get very high throughput and IOPS from a single client, while still safely caching frequently accessed data on the client side.

FSx for OpenZFS volumes can be accessed from cloud or on-premises Linux, MacOS, and Windows clients via industry-standard NFS protocols (v3, v4, v4.1, and v4.2). Cloud clients can be Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (EKS) clusters, Amazon WorkSpaces virtual desktops, and VMware Cloud on AWS. Your data is stored in encrypted form and replicated within an AWS Availability Zone, with components replaced automatically and transparently as necessary.

You can use FSx for OpenZFS to address your highly demanding machine learning, EDA (Electronic Design Automation), media processing, financial analytics, code repository, DevOps, and web content management workloads. With performance that is close to local storage, FSx for OpenZFS is great for these and other latency-sensitive workloads that manipulate and sequentially access many small files. Finally, because you can create, mount, use, and delete file systems as needed, you can now use OpenZFS in a dynamic, agile fashion.

Using Amazon FSx for OpenZFS
I can create an OpenZFS file system using the AWS Management Console, CLI, APIs, or AWS CloudFormation. From the FSx Console I click Create file system and choose Amazon FSx for OpenZFS:

I can choose Quick create (and use recommended best-practice configurations), or Standard create (and set all of the configuration options myself). I’ll take the easy route and use the recommended best practices to get started. I enter a name (Jeff-OpenZFS) select the amount of SSD storage that I need, choose a VPC & subnet, and click Next:

The console shows me that I can edit many of the attributes of my file system later if necessary. I review the settings and click Create file system:

My file system is ready within a minute or two, and I click Attach to get the proper commands to mount it to my client:

To be more precise, I am mounting the root volume (/fsx) of my file system. Once it is mounted, I can use it as I would any other file system. After I add some files to it, I can use the Action menu in the console to create a backup:

I can restore the backup to a new file system:

As I noted earlier, each file system can deliver up to 4 gigabytes per second of throughput for uncompressed data. I can look at total throughput and other metrics in the console:

I can set throughput capacity of each volume when I create it, and then change it later if necessary:

Changes take effect within minutes. The file system remains active and mounted while the change is put into effect, but some operations may pause momentarily:

A single OpenZFS file system can contain multiple volumes, each with separate quotas (overall volume storage, per-user storage, and per-group storage) and compression settings. When I use the quick create option a root volume named fsx is created for me; I can click Create volume to create more volumes at any time:

The new volume exists within the namespace hierarchy of the parent, and can be mounted separately or accessed from the parent.

Things to Know
Here are a couple of quick facts and to wrap up this post:

Pricing – Pricing is based on the provisioned storage capacity, throughput, and IOPS.

Regions – Amazon FSx for OpenZFS is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Canada (Central), Asia Pacific (Tokyo), and Europe (Frankfurt) Regions.

In the Works – We are working on additional features including storage scaling, IOPS scaling, a high availability option and another storage class.

Now Available
Amazon FSx for OpenZFS is available now and you can start using it today!

Jeff;

AWS Nitro SSD – High Performance Storage for your I/O-Intensive Applications

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-nitro-ssd-high-performance-storage-for-your-i-o-intensive-applications/

We love to solve difficult problems for our customers! As you have seen through the years, innovation at AWS takes many forms, and encompasses both hardware and software.

One of my favorite examples of customer-driven innovation is AWS Nitro System, which I first wrote about back in mid-2018. In that post I told you how Nitro System would allow us to innovate more quickly than ever, with the goal of creating instances that would run even more types of workloads. I also shared the basic building blocks, as they existed at that time, including Nitro Cards to accelerate and offload network and storage I/O, the Nitro Security Chip to monitor and protect hardware resources, and the Nitro Hypervisor to manage memory and CPU allocation with very low overhead.

Today I would like to tell you about one more building block!

AWS Nitro SSD
For decades, traditional hard drives (sometimes jokingly referred to as spinning rust) were the primary block storage devices. Today, while spinning rust still has its place, most high-performance storage is based on more modern Solid State Drives (SSD). Open up an SSD and you will find lots of flash memory and a firmware-driven processor that manages access to the memory and supports higher-level functions such as block mapping, encryption, caching, wear leveling, and so forth.

The scale of the AWS Cloud and the range of customer use cases that it supports gives us some valuable insights into the ways that today’s applications, database engines, and operating systems make use of block storage. As a result, after delivering several generations of EC2 instances we saw an opportunity to do better. Our goal was to allow I/O-intensive workloads (relational databases, NoSQL databases, data warehouses, search engines, and analytics engines to name a few) to run faster and with more predictable performance.

Today I would like to tell you about the AWS Nitro SSD. The first generation of these devices were used to power io2 Block Express EBS volumes, and allow us to give you EBS volumes with lots of IOPS, plenty of throughput, and a maximum volume size of 64 TiB. The Im4gn and Is4gen instances that I wrote about earlier today make use of the second generation of AWS Nitro SSDs, as will many future EC2 instances, including the I4i instances that we preannounced today.

The AWS Nitro SSDs are designed to be installed and to operate at cloud scale. While this sounds like a simple exercise in manufacturing and installing more devices, the reality is a lot more complex and a lot more interesting. As I noted earlier, the firmware inside of each device is responsible for implementing many lower-level functions. As our customers push the devices to their limits, they expect us to be able to diagnose and resolve any performance inconsistencies they observe. Building our own devices allows us to design in operational telemetry and diagnostics, along with mechanisms that enable us to install firmware updates at cloud scale & at cloud speed. Taking this even further, we developed our own code to manage the instance-level storage in order to further improve the reliability and debug-ability, and to deliver consistent performance.

On the performance side, our deep understanding of cloud workloads led us to engineer the devices so that they can deliver maximum performance under a sustained, continuous load. SSDs are built from fast, dense flash memory. Due to the characteristics of this semiconductor memory, each cell can only be written, erased, and then rewritten a limited number of times. In order to make the devices last as long as possible, the firmware is responsible for a process known as wear leveling. I don’t understand the details, but I assume that this includes some sort of mapping from logical block numbers to physical cells in a way that evens out the number of cycles over time. There’s some housekeeping (a form of garbage collection) involved in this process, and garden-variety SSDs can slow down (creating latency spikes) at unpredictable times when dealing with a barrage of writes. We also took advantage of our database expertise and built a very sophisticated, power-fail-safe journal-based database into the SSD firmware.

The second generation of AWS Nitro SSDs were designed to avoid latency spikes and deliver great I/O performance on real-world workloads. Our benchmarks show instances that use the AWS Nitro SSDs, such as the new Im4gn and Is4gen, deliver 75% lower latency variability than I3 instances, giving you more consistent performance.

Putting all of this together, there’s a very tight, rapidly rotating flywheel in action here because the team that builds the Nitro SSDs is part of the AWS storage team, and also has operational responsibilities. Like all teams at AWS, they watch the metrics day-in and day-out, and can efficiently deploy new firmware using a CI/CD model.

Join the Team
As is always the case, there’s always more innovation ahead, and we have some awesome positions on the teams that design the AWS Nitro SSDs. For example:

Jeff;

New Storage-Optimized Amazon EC2 Instances (Im4gn and Is4gen) Powered by AWS Graviton2 Processors

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-storage-optimized-amazon-ec2-instances-im4gn-and-is4gen-powered-by-aws-graviton2-processors/

EC2 storage-optimized instances are designed to deliver high disk I/O performance, and plenty of storage. Our customers use them to host high-performance real-time databases, distributed file systems, data warehouses, key-value stores, and more. Over the years we have released multiple generations of storage-optimized instances including the HS1 (2012) , D2 (2015), I2 (2013) , I3 (2017), I3en (2019), and D3/D3en (2020).

As I look back on all of these launches, it is interesting to see how we continue to provide an ever-increasing set of options that make each successive generation an even better fit for the diverse (and also ever-increasing) needs of our customers. HS1 instances were available in just one size, D2 and I2 in four, I3 in six, and I3en in eight. These instances give our customers the freedom to choose the size that best meets their current needs while also giving them room to scale up or down if those needs happen to change.

Im4gn and Is4gen
Today I am happy to introduce the two newest families of storage-optimized instances, Im4gn and Is4gen, powered by Graviton2 processors. Both instances offer up to 30 TB of NVMe storage using AWS Nitro SSD devices that are custom-built by AWS. As part of our drive to innovate on behalf of our customers, we turned our attention to storage and designed devices that were optimized to support high-speed access to large amounts of data. The AWS Nitro SSDs reduce I/O latency by up to 60% and also reduce latency variability by up to 75% when compared to the third generation of storage-optimized instances. As a result you get faster and more predictable performance for your I/O-intensive EC2 workloads.

Im4gn instances are a great fit for applications that require large amounts of dense SSD storage and high compute performance, but are not especially memory intensive such as social games, session storage, chatbots, and search engines. Here are the specs:

Instance Name vCPUs
Memory Local NVMe Storage
(AWS Nitro SSD)
Read Throughput
(128 KB Blocks)
EBS-Optimized Bandwidth Network Bandwidth
im4gn.large 2 8 GiB 937 GB 250 MB/s Up to 9.5 Gbps Up to 25 Gbps
im4gn.xlarge 4 16 GiB 1.875 TB 500 MB/s Up to 9.5 Gbps Up to 25 Gbps
im4gn.2xlarge 8 32 GiB 3.75 TB 1 GB/s Up to 9.5 Gbps Up to 25 Gbps
im4gn.4xlarge 16 64 GiB 7.5 TB 2 GB/s 9.5 Gbps 25 Gbps
im4gn.8xlarge 32 128 GiB 15 TB
(2 x 7.5 TB)
4 GB/s 19 Gbps 50 Gbps
im4gn.16xlarge 64 256 GiB 30 TB
(4 x 7.5 TB)
8 GB/s 38 Gbps 100 Gbps

Im4gn instances provide up to 40% better price performance and up to 44% lower cost per TB of storage compared to I3 instances. The new instances are available in the AWS US West (Oregon), US East (Ohio), US East (N. Virginia), and Europe (Ireland) Regions as On-Demand, Spot, Savings Plan, and Reserved instances.

Is4gen instances are a great fit for applications that do large amounts of random I/O to large amounts of SSD storage. This includes shared file systems, stream processing, social media monitoring, and streaming platforms, all of which can use the increased storage density to retain more data locally. Here are the specs:

Instance Name vCPUs
Memory Local NVMe Storage
(AWS Nitro SSD)
Read Throughput
(128 KB Blocks)
EBS-Optimized Bandwidth Network Bandwidth
is4gen.medium 1 6 GiB 937 GB 250 MB/s Up to 9.5 Gbps Up to 25 Gbps
is4gen.large 2 12 GiB 1.875 TB 500 MB/s Up to 9.5 Gbps Up to 25 Gbps
is4gen.xlarge 4 24 GiB 3.75 TB 1 GB/s Up to 9.5 Gbps Up to 25 Gbps
is4gen.2xlarge 8 48 GiB 7.5 TB 2 GB /s Up to 9.5 Gbps Up to 25 Gbps
is4gen.4xlarge 16 96 GiB 15 TB
(2 x 7.5 TB)
4 GB/s 9.5 Gbps 25 Gbps
is4gen.8xlarge 32 192 GiB 30 TB
(4 x 7.5 TB)
8 GB/s 19 Gbps 50 Gbps

Is4gen instances provide 15% lower cost per TB of storage and up to 48% better compute performance compared to I3en instances. The new instances are available in the AWS US West (Oregon), US East (Ohio), US East (N. Virginia), and Europe (Ireland) Regions as On-Demand, Spot, Savings Plan, and Reserved instances.

Available Now
As I never get tired of saying, these new instances are available now and you can start using them today. You can use Amazon Linux 2, Ubuntu 18.04.05 (and newer), Red Hat Enterprise Linux 8.0, and SUSE Enterprise Server 15 (and newer) AMIs, along with the container-optimized ECS and EKS AMIs. Learn more about the Im4gn and Is4gen instances.

Jeff;

PS – As of this launch twelve EC2 instance types are now powered by Graviton2 processors! To learn more, visit the Graviton2 page.

New for AWS Control Tower – Region Deny and Guardrails to Help You Meet Data Residency Requirements

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-control-tower-region-deny-and-guardrails-to-help-you-meet-data-residency-requirements/

Many customers, such as those in highly regulated industries and the public sector, want to have control over where their data is stored and processed. AWS already offers many tools and features to comply with local laws and regulations, but we want to provide a simplified way to translate data residency requirements into controls that can be applied to single- and multi-account environments.

Starting today, you can use AWS Control Tower to deploy data residency preventive and detective controls, referred to as guardrails. These guardrails will prevent provisioning resources in unwanted AWS Regions by restricting access to AWS APIs through service control policies (SCPs) built and managed by AWS Control Tower. In this way, content cannot be created or transferred outside of your selected Regions at the infrastructure level. In this context, content can be software (including machine images), data, text, audio, video, or images hosted on AWS for processing or storage. For example, AWS customers in Germany can deny access to AWS services in Regions outside of Frankfurt with the exception of global services such as AWS Identity and Access Management (IAM) and AWS Organizations.

AWS Control Tower also offers guardrails to further control data residency in underlying AWS service options, for example, blocking Amazon Simple Storage Service (Amazon S3) cross-region replication or blocking the creation of internet gateways.

The AWS account used for managing AWS Control Tower is not restricted by the new Region deny settings. That account can be used for remediation if you have data in an unwanted Region before enabling Region deny.

Detective guardrails are implemented via AWS Config rules and can further detect unexpected configuration changes that should not be allowed.

You still retain a shared responsibility model for data residency at the application level, but these controls can help you restrict what infrastructure and application teams can do on AWS.

Using Data Residency Guardrails in AWS Control Tower
To use the new data residency guardrails, you need to have created a landing zone using AWS Control Tower. See Plan your AWS Control Tower landing zone for more information.

To see all the new controls that are available, I select Guardrails on the left pane of the AWS Control Tower console and then find those in the Data Residency category. I sort results by Behavior. Guardrails that have a Prevention behavior are implemented as SCPs. Those that have a Detection behavior are implemented as AWS Config rules.

Console screenshot.

The most interesting guardrail is probably the one denying access to AWS based on the requested AWS Region. I choose it from the list and find that it is different from the other guardrails because it affects all Organizational Units (OUs) and cannot be activated here but must be activated in the landing zone settings.

Console screenshot.

Below the Overview, in the Guardrail components, there is a link to the full SCP for this guardrail, and I can see the list of the AWS APIs that, when this setting is enabled, are still going to be allowed towards non-governed Regions. Depending on your requirements, some of those services, such as Amazon CloudFront or AWS Global Accelerator, can be further limited by a custom SCP.

In the Landing zone settings, the Region deny guardrail is currently not enabled. I choose Modify settings and then enable the Region deny settings.

Console screenshot.

Below the Region deny settings, there is the list of AWS Regions governed by the landing zone. Those will be the regions allowed when I enable Region deny.

Console screenshot.

In my case, I have four governed Regions, two in the US and two in Europe:

  • US East (N. Virginia), which is also the home Region for the landing zone
  • US West (Oregon)
  • Europe (Ireland)
  • Europe (Frankfurt)

I choose Update landing zone at the bottom. The update of the landing zone takes a few minutes to complete. Now, the vast majority of the AWS APIs are blocked if they are not directed to one of those governed Regions. Let’s do a few tests.

Testing Region Deny in a Sandbox Account
Using AWS Single Sign-On, I copy the AWS credentials to use the sandbox account with AWSAdministratorAccess permissions. In a terminal, I paste the commands setting the environment variables to use those credentials.

Console screenshot.

Now, I try to start a new Amazon Elastic Compute Cloud (Amazon EC2) instance in US East (Ohio), one of the non-governed Regions. In a landing zone, the default VPC is replaced by a VPC managed by AWS Control Tower. To start the instance, I need to specify a VPC subnet. Let’s find a subnet ID that I can use.

aws ec2 describe-subnets --query 'Subnets[0].SubnetId' --region us-east-2

An error occurred (UnauthorizedOperation) when calling the DescribeSubnets operation:
You are not authorized to perform this operation.

As expected, I am not authorized to perform this operation in US East (Ohio). Let’s try to start an EC2 instance without passing the subnet ID.

aws ec2 run-instances --image-id ami-0dd0ccab7e2801812 --region us-east-2 \
    --instance-type t3.small                                     

An error occurred (UnauthorizedOperation) when calling the RunInstances operation:
You are not authorized to perform this operation.
Encoded authorization failure message: <ENCODED MESSAGE>

Again, I am not authorized. More information is included in the encoded authorization failure message that I can decode as described in this article:

aws sts decode-authorization-message --encoded-message <ENCODED MESSAGE>

The decoded message (that I have omitted for brevity) tells me that there was an explicit deny to my request and includes the full SCP that caused the deny. This information is really useful for debugging these kind of errors.

Now, let’s try in US East (N. Virginia), one of the four governed regions.

aws ec2 describe-subnets --query 'Subnets[0].SubnetId' --region us-east-1
"subnet-0f3580c0c5e56c210"

This time, the command returns the subnet ID of the first subnet returned by the request. Let’s start an instance in US East (N. Virginia) using this subnet.

aws ec2 run-instances --image-id  ami-04ad2567c9e3d7893 --region us-east-1 \
    --instance-type t3.small --subnet-id subnet-0f3580c0c5e56c210

As expected, it works, and I can see the EC2 instance running in the console.

Console screenshot.

Similarly, APIs for other AWS services are limited by the Region deny settings. For example, I can’t create an S3 bucket in a non-governed Region.

Console screenshot.

When I try to create the bucket, I get an access denied error.

Console screenshot.

As expected, the creation of an S3 bucket works in a governed Region.

Even if someone gives this account access to a bucket in a non-governed Region, I would not be able to copy any data into that bucket.

Other preventive guardrails can enforce data residency, for example:

  • Disallow cross-region networking for Amazon EC2, Amazon CloudFront, and AWS Global Accelerator
  • Disallow internet access for an Amazon VPC instance managed by a customer
  • Disallow Amazon Virtual Private Network (VPN) connections

Now, let’s see how detective guardrails work.

Testing Detective Guardrails in a Sandbox Account
I enable the following guardrails for all accounts in the sandbox OU:

  • Detect whether Amazon EBS snapshots are restorable by all AWS accounts
  • Detect whether public routes exist in the route table for an internet gateway

Now, I want to see what happens if I go against these guardrails. In the EC2 console, I create an EBS snapshot for the volume of the EC2 instance I started before. Then, I modify permissions to share it with all AWS accounts.

Console screenshot.

Then, in the VPC console, I create an internet gateway, attach it to the AWS Control Tower managed VPC, and update the route table of one of the private subnets to use the internet gateway.

Console screenshot.

After a few minutes, the noncompliant resources in the sandbox account are found by the detective guardrails.

Console screenshot.

I look at the information provided by the guardrails and update my configuration to fix the issues. In a multi-account setup I’d contact the account owner and ask for remediation.

Availability and Pricing
You can use data-residency guardrails to control resources in any AWS Region. To create a landing zone, you should start from one of the Regions where AWS Control Tower is offered. For more information, see the AWS Regional Services List. There is no additional cost for this feature. You pay the costs of other services used, such as AWS Config.

This feature provides you with a framework of controls and guidance for setting up a multi-account environment that addresses data residency requirements. Depending on your use case, you may use any subset of the new data residency guardrails.

Set up guardrails based on your data residency requirements with AWS Control Tower.

Danilo

Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-canvas-a-visual-no-code-machine-learning-capability-for-business-analysts/

As an organization facing business problems and dealing with data on a daily basis, the ability to build systems that can predict business outcomes becomes very important. This ability lets you solve problems and move faster by automating slow processes and embedding intelligence in your IT systems.

But how do you make sure that all teams and individual decision makers in the organization are empowered to create these machine learning (ML) systems at scale, and without depending on other data science and data engineering teams? As a business user or data analyst, you’d like to build and use prediction systems based on the data that you analyze and process every day, without having to learn about hundreds of algorithms, training parameters, evaluation metrics, and deployment best practices.

Today, I’m excited to announce the general availability of Amazon SageMaker Canvas, a new visual, no code capability that allows business analysts to build ML models and generate accurate predictions without writing code or requiring ML expertise. Its intuitive user interface lets you browse and access disparate data sources in the cloud or on-premises, combine datasets with the click of a button, train accurate models, and then generate new predictions once new data is available.

SageMaker Canvas leverages the same technology as Amazon SageMaker to automatically clean and combine your data, create hundreds of models under the hood, select the best performing one, and generate new individual or batch predictions. It supports multiple problem types such as binary classification, multi-class classification, numerical regression, and time series forecasting. These problem types let you address business-critical use cases, such as fraud detection, churn reduction, and inventory optimization, without writing a single line of code.

SageMaker Canvas in Action
Imagine that I’m an e-commerce manager who needs to predict whether or not a product will be shipped on time. The datasets at my disposal consist of a product catalog and the historical shipping dataset, both in CSV format.

First, I enter the SageMaker Canvas application where all of my models and datasets are created and inspected.

I select Import, and upload two CSV files: ProductData.csv and ShippingData.csv. I have 120 products and 10,000 shipping records.

I could also fetch data from Amazon Simple Storage Service (Amazon S3) or connect to other cloud or on-premises data sources, such as Amazon Redshift or Snowflake. For this use case, I prefer to upload 1.6 MB of data directly from my computer.

Before confirming the import, I have a chance to preview the two datasets, their columns, and their respective values. For example, each product has a ComputerBrand, ScreenSize, and PackageWeight. In addition to useful columns such as ShippingOrigin, OrderDate, and ShippingPriority, each record in the shipping dataset also contains OnTimeDelivery, which is either On Time or Late. This column will be used by SageMaker Canvas to generate a prediction model based on historical data.

After a few seconds of processing, the datasets are ready, and I decide to join them to create a single dataset containing both product and shipping information. This is an optional step that often lets you increase the precision of a prediction model.

Now I can simply drag and drop the two datasets: SageMaker Canvas will automatically identify the shared ProductId column and apply an Inner Join transformation.

The join preview lets me visualize the resulting columns, identify missing or invalid values, and optionally deselect unwanted columns.

I select Save joined data and provide a new name for this joined dataset, which now includes 16 columns and 10,000 records.

Next, I want to create a model and start by selecting New model in the Models section on the left menu. I call it On Time Prediction Model.

The first step is selecting a dataset.

I select a target column that my model will predict: OnTimeDelivery.

SageMaker Canvas shows me the value distribution and already recommends the most appropriate model type: two categories classification.

Before proceeding with the model training, I have the option to generate an analysis report. This analysis gives me two very important pieces of information: the estimated accuracy and the impact of each column.

The estimated accuracy of 99.9% gives me confidence, but then I notice that the highest impact is provided by the ActualShippingDays column. Unfortunately, this column is not available in advance and I can’t use it for my predictions. So I deselect it and run the analysis again.

The new estimated accuracy is 94.2%, which is still pretty high. The most impactful columns are ShippingPriority, YShippingDistance, XShippingDistance, and Carrier. This is great because all of this information is available in advance and can be used for a prediction. On the other hand, product-related columns, such PackageWeight and ScreenSize, have very small impacts on the prediction. This means that in the future I could simplify the overall process by feeding only shipping information into the training and prediction phases.

I’m happy with the analysis insights. Therefore, I decide to proceed and build a prediction model by selecting the Standard build option.

Now I can go for a walk, attend a few productive meetings, or simply spend some time with family. SageMaker Canvas is doing all of the work for me, training hundreds of models behind the scenes. It will select the best performing one, so that I can start generating accurate predictions in a couple of hours. Of course, the training duration will vary depending on the dataset size and problem type.

After about an hour and a half, the model is ready and the console lets me analyze its accuracy and the column impacts visually. I’m also happy to see that the model predicts the correct value 95.8% of the time, which is even higher than the estimated accuracy.

Optionally, I could also inspect advanced metrics such as Precision, Recall, F1 Score, and so on. These metrics help me understand how the model is performing and what kind of false positives and false negatives I can expect from this model.

From here, I could share the model into Amazon SageMaker Studio or continue using the Canvas UI to generate new predictions.

I decide to continue with the intuitive UI and select Predict. Now I can work with individual records or with a dataset for batch predictions.

When selecting Single prediction, SageMaker Canvas simplifies my life and lets me start from an existing record. I modify the column values and get immediate feedback on the prediction and the corresponding feature importance.

This quick feedback loop and intuitive UI allows me to use the ML model without having to write custom code. In case I decide to integrate the model into an automated production system, the Amazon SageMaker Studio integration lets me share the model easily with other data scientists in my team.

Generally Available Today
SageMaker Canvas is generally available in US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Frankfurt), and Europe (Ireland). You can start using it with your local datasets, as well as data already stored on Amazon S3, Amazon Redshift, or Snowflake. With just a few clicks, you’ll prepare and join your datasets, analyze estimated accuracy, verify which columns are impactful, train the best performing model, and generate new individual or batch predictions. We’re excited to hear your feedback and help you solve even more business problems with ML.

Alex