
New – Amazon FSx for OpenZFS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-fsx-for-openzfs/

Last month, my colleague Bill Vass said that we are “slowly adding additional file systems” to Amazon FSx. I’d question Bill’s definition of slow, given that his team has launched Amazon FSx for Lustre, Amazon FSx for Windows File Server, and Amazon FSx for NetApp ONTAP in less than three years.

Amazon FSx for OpenZFS
Today I am happy to announce Amazon FSx for OpenZFS, the newest addition to the Amazon FSx family. Just like the other members of the family, this new addition lets you use a popular file system without having to deal with hardware provisioning, software configuration, patching, backups, and the like. You can create a file system in minutes and begin to enjoy the benefits of OpenZFS right away: transparent compression, continuous integrity verification, snapshots, and copy-on-write. Even better, you get all of these benefits without having to develop the specialized expertise that has traditionally been needed to set up and administer OpenZFS.

FSx for OpenZFS is powered by the AWS Graviton family processors and AWS SRD (Scalable Reliable Datagram) Networking, and can deliver up to 1 million IOPS with latencies of 100-200 microseconds, along with up to 4 GB/second of uncompressed throughput, up to 12 GB/second of compressed throughput, and up to 12.5 GB/second throughput to cached data. FSx for OpenZFS supports the OpenZFS Adaptive Replacement Cache (ARC) and uses memory in the file server to provide faster performance. It also supports advanced NFS performance features such as session trunking and NFS delegation, allowing you to get very high throughput and IOPS from a single client, while still safely caching frequently accessed data on the client side.

FSx for OpenZFS volumes can be accessed from cloud or on-premises Linux, MacOS, and Windows clients via industry-standard NFS protocols (v3, v4, v4.1, and v4.2). Cloud clients can be Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (EKS) clusters, Amazon WorkSpaces virtual desktops, and VMware Cloud on AWS. Your data is stored in encrypted form and replicated within an AWS Availability Zone, with components replaced automatically and transparently as necessary.

You can use FSx for OpenZFS to address your highly demanding machine learning, EDA (Electronic Design Automation), media processing, financial analytics, code repository, DevOps, and web content management workloads. With performance that is close to local storage, FSx for OpenZFS is great for these and other latency-sensitive workloads that manipulate and sequentially access many small files. Finally, because you can create, mount, use, and delete file systems as needed, you can now use OpenZFS in a dynamic, agile fashion.

Using Amazon FSx for OpenZFS
I can create an OpenZFS file system using the AWS Management Console, CLI, APIs, or AWS CloudFormation. From the FSx Console I click Create file system and choose Amazon FSx for OpenZFS:

I can choose Quick create (and use recommended best-practice configurations), or Standard create (and set all of the configuration options myself). I’ll take the easy route and use the recommended best practices to get started. I enter a name (Jeff-OpenZFS) select the amount of SSD storage that I need, choose a VPC & subnet, and click Next:

The console shows me that I can edit many of the attributes of my file system later if necessary. I review the settings and click Create file system:
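If you prefer the CLI or the API, the equivalent request is a single call. Here's a minimal sketch, assuming the single-AZ deployment type that Quick create uses; the storage capacity, throughput capacity, subnet ID, and security group ID shown here are illustrative placeholders:

# Create a small FSx for OpenZFS file system (values are placeholders)
aws fsx create-file-system \
    --file-system-type OPENZFS \
    --storage-capacity 64 \
    --subnet-ids subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --open-zfs-configuration DeploymentType=SINGLE_AZ_1,ThroughputCapacity=64 \
    --tags Key=Name,Value=Jeff-OpenZFS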

My file system is ready within a minute or two, and I click Attach to get the proper commands to mount it to my client:
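The Attach dialog shows commands specific to my file system; in general they look like this sketch, assuming an NFS v4.1 mount from a Linux client (the file system DNS name is a placeholder for the one displayed in the console):

# Create a mount point and mount the file system's root volume over NFS v4.1
sudo mkdir -p /fsx
sudo mount -t nfs -o nfsvers=4.1 fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com:/fsx /fsx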

To be more precise, I am mounting the root volume (/fsx) of my file system. Once it is mounted, I can use it as I would any other file system. After I add some files to it, I can use the Action menu in the console to create a backup:

I can restore the backup to a new file system:

As I noted earlier, each file system can deliver up to 4 gigabytes per second of throughput for uncompressed data. I can look at total throughput and other metrics in the console:

I can set throughput capacity of each volume when I create it, and then change it later if necessary:

Changes take effect within minutes. The file system remains active and mounted while the change is put into effect, but some operations may pause momentarily:

A single OpenZFS file system can contain multiple volumes, each with separate quotas (overall volume storage, per-user storage, and per-group storage) and compression settings. When I use the quick create option a root volume named fsx is created for me; I can click Create volume to create more volumes at any time:

The new volume exists within the namespace hierarchy of the parent, and can be mounted separately or accessed from the parent.
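Volumes can also be created programmatically. Here's a hedged sketch using the CLI; the parent volume ID is a placeholder, and I'm assuming the quota and compression options exposed by the CreateVolume API:

# Create a child volume with a 100 GiB quota and Zstandard compression (IDs are placeholders)
aws fsx create-volume \
    --volume-type OPENZFS \
    --name project-data \
    --open-zfs-configuration ParentVolumeId=fsvol-0123456789abcdef0,StorageCapacityQuotaGiB=100,DataCompressionType=ZSTD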

Things to Know
Here are a couple of quick facts to wrap up this post:

Pricing – Pricing is based on the provisioned storage capacity, throughput, and IOPS.

Regions – Amazon FSx for OpenZFS is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Canada (Central), Asia Pacific (Tokyo), and Europe (Frankfurt) Regions.

In the Works – We are working on additional features including storage scaling, IOPS scaling, a high availability option, and another storage class.

Now Available
Amazon FSx for OpenZFS is available now and you can start using it today!

Jeff;

AWS Nitro SSD – High Performance Storage for your I/O-Intensive Applications

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-nitro-ssd-high-performance-storage-for-your-i-o-intensive-applications/

We love to solve difficult problems for our customers! As you have seen through the years, innovation at AWS takes many forms, and encompasses both hardware and software.

One of my favorite examples of customer-driven innovation is AWS Nitro System, which I first wrote about back in mid-2018. In that post I told you how Nitro System would allow us to innovate more quickly than ever, with the goal of creating instances that would run even more types of workloads. I also shared the basic building blocks, as they existed at that time, including Nitro Cards to accelerate and offload network and storage I/O, the Nitro Security Chip to monitor and protect hardware resources, and the Nitro Hypervisor to manage memory and CPU allocation with very low overhead.

Today I would like to tell you about one more building block!

AWS Nitro SSD
For decades, traditional hard drives (sometimes jokingly referred to as spinning rust) were the primary block storage devices. Today, while spinning rust still has its place, most high-performance storage is based on more modern Solid State Drives (SSD). Open up an SSD and you will find lots of flash memory and a firmware-driven processor that manages access to the memory and supports higher-level functions such as block mapping, encryption, caching, wear leveling, and so forth.

The scale of the AWS Cloud and the range of customer use cases that it supports gives us some valuable insights into the ways that today’s applications, database engines, and operating systems make use of block storage. As a result, after delivering several generations of EC2 instances we saw an opportunity to do better. Our goal was to allow I/O-intensive workloads (relational databases, NoSQL databases, data warehouses, search engines, and analytics engines to name a few) to run faster and with more predictable performance.

Today I would like to tell you about the AWS Nitro SSD. The first generation of these devices were used to power io2 Block Express EBS volumes, and allow us to give you EBS volumes with lots of IOPS, plenty of throughput, and a maximum volume size of 64 TiB. The Im4gn and Is4gen instances that I wrote about earlier today make use of the second generation of AWS Nitro SSDs, as will many future EC2 instances, including the I4i instances that we preannounced today.

The AWS Nitro SSDs are designed to be installed and to operate at cloud scale. While this sounds like a simple exercise in manufacturing and installing more devices, the reality is a lot more complex and a lot more interesting. As I noted earlier, the firmware inside of each device is responsible for implementing many lower-level functions. As our customers push the devices to their limits, they expect us to be able to diagnose and resolve any performance inconsistencies they observe. Building our own devices allows us to design in operational telemetry and diagnostics, along with mechanisms that enable us to install firmware updates at cloud scale & at cloud speed. Taking this even further, we developed our own code to manage the instance-level storage in order to further improve the reliability and debug-ability, and to deliver consistent performance.

On the performance side, our deep understanding of cloud workloads led us to engineer the devices so that they can deliver maximum performance under a sustained, continuous load. SSDs are built from fast, dense flash memory. Due to the characteristics of this semiconductor memory, each cell can only be written, erased, and then rewritten a limited number of times. In order to make the devices last as long as possible, the firmware is responsible for a process known as wear leveling. I don’t understand the details, but I assume that this includes some sort of mapping from logical block numbers to physical cells in a way that evens out the number of cycles over time. There’s some housekeeping (a form of garbage collection) involved in this process, and garden-variety SSDs can slow down (creating latency spikes) at unpredictable times when dealing with a barrage of writes. We also took advantage of our database expertise and built a very sophisticated, power-fail-safe journal-based database into the SSD firmware.

The second generation of AWS Nitro SSDs were designed to avoid latency spikes and deliver great I/O performance on real-world workloads. Our benchmarks show instances that use the AWS Nitro SSDs, such as the new Im4gn and Is4gen, deliver 75% lower latency variability than I3 instances, giving you more consistent performance.

Putting all of this together, there’s a very tight, rapidly rotating flywheel in action here because the team that builds the Nitro SSDs is part of the AWS storage team, and also has operational responsibilities. Like all teams at AWS, they watch the metrics day-in and day-out, and can efficiently deploy new firmware using a CI/CD model.

Join the Team
As is always the case, there's more innovation ahead, and we have some awesome open positions on the teams that design the AWS Nitro SSDs.

Jeff;

New Storage-Optimized Amazon EC2 Instances (Im4gn and Is4gen) Powered by AWS Graviton2 Processors

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-storage-optimized-amazon-ec2-instances-im4gn-and-is4gen-powered-by-aws-graviton2-processors/

EC2 storage-optimized instances are designed to deliver high disk I/O performance, and plenty of storage. Our customers use them to host high-performance real-time databases, distributed file systems, data warehouses, key-value stores, and more. Over the years we have released multiple generations of storage-optimized instances including the HS1 (2012), I2 (2013), D2 (2015), I3 (2017), I3en (2019), and D3/D3en (2020).

As I look back on all of these launches, it is interesting to see how we continue to provide an ever-increasing set of options that make each successive generation an even better fit for the diverse (and also ever-increasing) needs of our customers. HS1 instances were available in just one size, D2 and I2 in four, I3 in six, and I3en in eight. These instances give our customers the freedom to choose the size that best meets their current needs while also giving them room to scale up or down if those needs happen to change.

Im4gn and Is4gen
Today I am happy to introduce the two newest families of storage-optimized instances, Im4gn and Is4gen, powered by Graviton2 processors. Both instances offer up to 30 TB of NVMe storage using AWS Nitro SSD devices that are custom-built by AWS. As part of our drive to innovate on behalf of our customers, we turned our attention to storage and designed devices that were optimized to support high-speed access to large amounts of data. The AWS Nitro SSDs reduce I/O latency by up to 60% and also reduce latency variability by up to 75% when compared to the third generation of storage-optimized instances. As a result you get faster and more predictable performance for your I/O-intensive EC2 workloads.

Im4gn instances are a great fit for applications that require large amounts of dense SSD storage and high compute performance, but are not especially memory intensive, such as social games, session storage, chatbots, and search engines. Here are the specs:

Instance Name  | vCPUs | Memory  | Local NVMe Storage (AWS Nitro SSD) | Read Throughput (128 KB Blocks) | EBS-Optimized Bandwidth | Network Bandwidth
im4gn.large    | 2     | 8 GiB   | 937 GB                             | 250 MB/s                        | Up to 9.5 Gbps          | Up to 25 Gbps
im4gn.xlarge   | 4     | 16 GiB  | 1.875 TB                           | 500 MB/s                        | Up to 9.5 Gbps          | Up to 25 Gbps
im4gn.2xlarge  | 8     | 32 GiB  | 3.75 TB                            | 1 GB/s                          | Up to 9.5 Gbps          | Up to 25 Gbps
im4gn.4xlarge  | 16    | 64 GiB  | 7.5 TB                             | 2 GB/s                          | 9.5 Gbps                | 25 Gbps
im4gn.8xlarge  | 32    | 128 GiB | 15 TB (2 x 7.5 TB)                 | 4 GB/s                          | 19 Gbps                 | 50 Gbps
im4gn.16xlarge | 64    | 256 GiB | 30 TB (4 x 7.5 TB)                 | 8 GB/s                          | 38 Gbps                 | 100 Gbps

Im4gn instances provide up to 40% better price performance and up to 44% lower cost per TB of storage compared to I3 instances. The new instances are available in the AWS US West (Oregon), US East (Ohio), US East (N. Virginia), and Europe (Ireland) Regions as On-Demand, Spot, Savings Plan, and Reserved instances.

Is4gen instances are a great fit for applications that do large amounts of random I/O to large amounts of SSD storage. This includes shared file systems, stream processing, social media monitoring, and streaming platforms, all of which can use the increased storage density to retain more data locally. Here are the specs:

Instance Name  | vCPUs | Memory  | Local NVMe Storage (AWS Nitro SSD) | Read Throughput (128 KB Blocks) | EBS-Optimized Bandwidth | Network Bandwidth
is4gen.medium  | 1     | 6 GiB   | 937 GB                             | 250 MB/s                        | Up to 9.5 Gbps          | Up to 25 Gbps
is4gen.large   | 2     | 12 GiB  | 1.875 TB                           | 500 MB/s                        | Up to 9.5 Gbps          | Up to 25 Gbps
is4gen.xlarge  | 4     | 24 GiB  | 3.75 TB                            | 1 GB/s                          | Up to 9.5 Gbps          | Up to 25 Gbps
is4gen.2xlarge | 8     | 48 GiB  | 7.5 TB                             | 2 GB/s                          | Up to 9.5 Gbps          | Up to 25 Gbps
is4gen.4xlarge | 16    | 96 GiB  | 15 TB (2 x 7.5 TB)                 | 4 GB/s                          | 9.5 Gbps                | 25 Gbps
is4gen.8xlarge | 32    | 192 GiB | 30 TB (4 x 7.5 TB)                 | 8 GB/s                          | 19 Gbps                 | 50 Gbps

Is4gen instances provide 15% lower cost per TB of storage and up to 48% better compute performance compared to I3en instances. The new instances are available in the AWS US West (Oregon), US East (Ohio), US East (N. Virginia), and Europe (Ireland) Regions as On-Demand, Spot, Savings Plan, and Reserved instances.

Available Now
As I never get tired of saying, these new instances are available now and you can start using them today. You can use Amazon Linux 2, Ubuntu 18.04.05 (and newer), Red Hat Enterprise Linux 8.0, and SUSE Enterprise Server 15 (and newer) AMIs, along with the container-optimized ECS and EKS AMIs. Learn more about the Im4gn and Is4gen instances.
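Because these instances are Graviton2-based, be sure to launch a 64-bit Arm AMI. As a quick sketch, one way to do that is to look up the latest Amazon Linux 2 arm64 AMI through its public SSM parameter and pass it to run-instances (the subnet ID and key pair name below are placeholders):

# Look up the latest Amazon Linux 2 arm64 AMI via its public SSM parameter
AMI_ID=$(aws ssm get-parameters \
    --names /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-arm64-gp2 \
    --query 'Parameters[0].Value' --output text)

# Launch an im4gn.xlarge instance (subnet ID and key pair name are placeholders)
aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type im4gn.xlarge \
    --subnet-id subnet-0123456789abcdef0 \
    --key-name my-key-pair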

Jeff;

PS – As of this launch twelve EC2 instance types are now powered by Graviton2 processors! To learn more, visit the Graviton2 page.

Machine Learning-Powered Amazon Connect, Now With Call Summarization

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/machine-learning-powered-amazon-connect-now-with-call-summarization/

At AWS our mission is to make machine learning (ML) accessible to data scientists, developers, and business users. To help businesses easily leverage the power of ML, we create purpose-built solutions that embed ML and deep learning technologies directly into a business process to address real customer needs, rather than leaving companies to sort it out on their own.

One place where we have seen ML have an impact is within the contact center—the place you receive and respond to customer inquiries and issues. Because of the growing role of customer experience (CX) and the increase in contactless commerce via phone or email, contact centers are essential to maintaining the human connections that businesses depend on. However, analog or outdated methods make it difficult to address every customer need in an effective way that delivers timely resolutions, creates great experiences, and fosters customer loyalty.

Embedding AWS ML technologies into a cloud contact center solution helps decrease the friction of calls, chats, and other engagements. It also makes it possible to automate outdated processes.

Amazon Connect is an easy-to-use, cloud-based, ML-powered contact center service that helps companies of any size deliver superior customer service at a lower cost.

Let me take three examples with Voice ID, Wisdom, and Contact Lens.

Amazon Connect Voice ID
ML capabilities can help streamline the customer experience for authentication. Instead of asking customers to repeat their email address and their mother’s maiden name several times, ML-powered voice identification establishes a digital voice print associated with each customer’s unique voice, and then recognizes that voice print at the beginning of each subsequent call. Voice identification provides a confidence score that may be used to automate authentication workflows.

Amazon Connect Wisdom
ML might also help search the vast documentation and knowledge base to find the most relevant answers to the questions raised by the customer. ML helps resolve customer issues faster and better.

Contact Lens for Amazon Connect
ML technologies also shine at analyzing the tone and content of a conversation, capturing customer sentiment in the moment, and learning from it. ML can help transcribe calls, track customer sentiment, detect common issues and customer trends, or even pinpoint discrepancies.

At just about the same time last year, I announced the addition of real-time capabilities for Contact Lens. This lets supervisors identify when to assist an agent on live calls so that they can provide guidance via chat or have the agent transfer the call. Last September, we added support for eight new languages, ending up with a total of 21 languages for post-call analytics and 12 languages for both post-call and real-time analytics.

Contact Lens Adds Call Summarization
But we didn’t stop there. Today, I am pleased to announce the addition of a new capability that helps you improve customer experience and agent and supervisor productivity by automatically summarizing the important aspects of each customer call.

You told us that keeping notes of customer conversations is time consuming, especially for agents who must take notes during the call and manually import them into your CRM tool afterward. In the end, this means more time for us, the customers, waiting in the queue for an agent to become available. Automatically generated call transcripts don’t save time for supervisors either: it is time consuming for supervisors to read full call transcripts to understand what happened during a customer conversation.

How it Works
Starting today, Contact Lens has added a summary of the key moments in a conversation. It is enabled by default, and there is no additional configuration step. You may toggle the Show transcript summary button to show or hide the summary when you don’t need it.

Contact Lens - Show Transcript Summary - Toggle button

Once a call is analyzed, the summary is available on the contact detail page.

Contact Lens identifies and summarizes the sections corresponding to Issue (e.g., lost package), Outcome (e.g., customer refund), and Action item (e.g., send a follow-up mail confirming the refund was processed). A manager can quickly see where there’s an action to send a customer a follow-up email and take action to ensure it happens.

Contact Lens Call Summary Example

The call summary is also available in JSON format. Contact Lens uploads these files to the S3 bucket of your choice. Having access to the JSON lets you import the summaries programmatically into your CRM or other tools.

... redacted for brevity ...

"IssuesDetected": [
{
   "CharacterOffsets": {
      "BeginOffsetChar": 31,
      "EndOffsetChar": 73
   },
   "Text": "I would like to cancel my subscription"
}
]
...
"ActionItemsDetected": [
 {
   "CharacterOffsets": {
      "BeginOffsetChar": 32,
      "EndOffsetChar": 116
   },
   "Text": "I will send you an email with details"
 }
 ]
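For example, here is a hedged sketch of pulling one of these analysis files from the bucket you configured and listing the detected issues with the AWS CLI and jq; the bucket name and object key are placeholders, and the exact path depends on your instance configuration:

# Copy a Contact Lens analysis file from S3 (bucket and key are placeholders)
aws s3 cp s3://my-connect-analytics-bucket/Analysis/Voice/2021/11/24/contact-id_analysis.json .

# Print the text of every detected issue, wherever it appears in the document
jq -r '.. | objects | .IssuesDetected? // empty | .[].Text' contact-id_analysis.json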

Availability and Pricing
Call summarization by Contact Lens is available in all AWS Regions where Contact Lens is available today. We support post-call analytics in the US West (Oregon), US East (N. Virginia), Canada (Central), Europe (London), Europe (Frankfurt), Asia Pacific (Singapore), Asia Pacific (Seoul), Asia Pacific (Tokyo), and Asia Pacific (Sydney) regions. We support real-time analytics in the US West (Oregon), US East (N. Virginia), Canada (Central), Europe (London), Europe (Frankfurt), Asia Pacific (Seoul), Asia Pacific (Tokyo), and Asia Pacific (Sydney) regions.

Call summarization comes at no additional cost on top of the usual charges for Contact Lens, which is why we chose to enable it by default. Contact Lens is priced at $0.015 per minute of voice conversation analyzed. Most of our Contact Lens customers analyze millions of conversation minutes per month. The price drops to $0.0125 per minute when you analyze more than 5 million minutes per month.

If you do not have Contact Lens enabled on your call center, go ahead and start using it today.

— seb

New for AWS Control Tower – Region Deny and Guardrails to Help You Meet Data Residency Requirements

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-control-tower-region-deny-and-guardrails-to-help-you-meet-data-residency-requirements/

Many customers, such as those in highly regulated industries and the public sector, want to have control over where their data is stored and processed. AWS already offers many tools and features to comply with local laws and regulations, but we want to provide a simplified way to translate data residency requirements into controls that can be applied to single- and multi-account environments.

Starting today, you can use AWS Control Tower to deploy data residency preventive and detective controls, referred to as guardrails. These guardrails will prevent provisioning resources in unwanted AWS Regions by restricting access to AWS APIs through service control policies (SCPs) built and managed by AWS Control Tower. In this way, content cannot be created or transferred outside of your selected Regions at the infrastructure level. In this context, content can be software (including machine images), data, text, audio, video, or images hosted on AWS for processing or storage. For example, AWS customers in Germany can deny access to AWS services in Regions outside of Frankfurt with the exception of global services such as AWS Identity and Access Management (IAM) and AWS Organizations.

AWS Control Tower also offers guardrails to further control data residency in underlying AWS service options, for example, blocking Amazon Simple Storage Service (Amazon S3) cross-region replication or blocking the creation of internet gateways.

The AWS account used for managing AWS Control Tower is not restricted by the new Region deny settings. That account can be used for remediation if you have data in an unwanted Region before enabling Region deny.

Detective guardrails are implemented via AWS Config rules and can further detect unexpected configuration changes that should not be allowed.

You still retain a shared responsibility model for data residency at the application level, but these controls can help you restrict what infrastructure and application teams can do on AWS.

Using Data Residency Guardrails in AWS Control Tower
To use the new data residency guardrails, you need to have created a landing zone using AWS Control Tower. See Plan your AWS Control Tower landing zone for more information.

To see all the new controls that are available, I select Guardrails on the left pane of the AWS Control Tower console and then find those in the Data Residency category. I sort results by Behavior. Guardrails that have a Prevention behavior are implemented as SCPs. Those that have a Detection behavior are implemented as AWS Config rules.

Console screenshot.

The most interesting guardrail is probably the one denying access to AWS based on the requested AWS Region. I choose it from the list and find that it is different from the other guardrails because it affects all Organizational Units (OUs) and cannot be activated here but must be activated in the landing zone settings.

Console screenshot.

Below the Overview, in the Guardrail components, there is a link to the full SCP for this guardrail, and I can see the list of the AWS APIs that, when this setting is enabled, are still going to be allowed towards non-governed Regions. Depending on your requirements, some of those services, such as Amazon CloudFront or AWS Global Accelerator, can be further limited by a custom SCP.
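The exact policy is built and maintained by AWS Control Tower, but a simplified sketch of this kind of Region deny SCP looks like the following; the exempted global services and allowed Regions shown here are illustrative, not the full list used by the guardrail:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyRequestsOutsideGovernedRegions",
      "Effect": "Deny",
      "NotAction": [
        "iam:*",
        "organizations:*",
        "route53:*",
        "cloudfront:*",
        "support:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["us-east-1", "us-west-2", "eu-west-1", "eu-central-1"]
        }
      }
    }
  ]
}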

In the Landing zone settings, the Region deny guardrail is currently not enabled. I choose Modify settings and then enable the Region deny settings.

Console screenshot.

Below the Region deny settings, there is the list of AWS Regions governed by the landing zone. Those will be the regions allowed when I enable Region deny.

Console screenshot.

In my case, I have four governed Regions, two in the US and two in Europe:

  • US East (N. Virginia), which is also the home Region for the landing zone
  • US West (Oregon)
  • Europe (Ireland)
  • Europe (Frankfurt)

I choose Update landing zone at the bottom. The update of the landing zone takes a few minutes to complete. Now, the vast majority of the AWS APIs are blocked if they are not directed to one of those governed Regions. Let’s do a few tests.

Testing Region Deny in a Sandbox Account
Using AWS Single Sign-On, I copy the AWS credentials to use the sandbox account with AWSAdministratorAccess permissions. In a terminal, I paste the commands setting the environment variables to use those credentials.

Console screenshot.

Now, I try to start a new Amazon Elastic Compute Cloud (Amazon EC2) instance in US East (Ohio), one of the non-governed Regions. In a landing zone, the default VPC is replaced by a VPC managed by AWS Control Tower. To start the instance, I need to specify a VPC subnet. Let’s find a subnet ID that I can use.

aws ec2 describe-subnets --query 'Subnets[0].SubnetId' --region us-east-2

An error occurred (UnauthorizedOperation) when calling the DescribeSubnets operation:
You are not authorized to perform this operation.

As expected, I am not authorized to perform this operation in US East (Ohio). Let’s try to start an EC2 instance without passing the subnet ID.

aws ec2 run-instances --image-id ami-0dd0ccab7e2801812 --region us-east-2 \
    --instance-type t3.small                                     

An error occurred (UnauthorizedOperation) when calling the RunInstances operation:
You are not authorized to perform this operation.
Encoded authorization failure message: <ENCODED MESSAGE>

Again, I am not authorized. More information is included in the encoded authorization failure message that I can decode as described in this article:

aws sts decode-authorization-message --encoded-message <ENCODED MESSAGE>

The decoded message (which I have omitted for brevity) tells me that there was an explicit deny on my request and includes the full SCP that caused the deny. This information is really useful for debugging this kind of error.

Now, let’s try in US East (N. Virginia), one of the four governed regions.

aws ec2 describe-subnets --query 'Subnets[0].SubnetId' --region us-east-1
"subnet-0f3580c0c5e56c210"

This time, the command returns the subnet ID of the first subnet returned by the request. Let’s start an instance in US East (N. Virginia) using this subnet.

aws ec2 run-instances --image-id  ami-04ad2567c9e3d7893 --region us-east-1 \
    --instance-type t3.small --subnet-id subnet-0f3580c0c5e56c210

As expected, it works, and I can see the EC2 instance running in the console.

Console screenshot.

Similarly, APIs for other AWS services are limited by the Region deny settings. For example, I can’t create an S3 bucket in a non-governed Region.

Console screenshot.

When I try to create the bucket, I get an access denied error.

Console screenshot.

As expected, the creation of an S3 bucket works in a governed Region.

Even if someone gives this account access to a bucket in a non-governed Region, I would not be able to copy any data into that bucket.
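As a hedged sketch, a cross-Region copy like the following (the bucket name is a placeholder) is expected to be rejected:

# Attempt to copy data into a bucket in a non-governed Region (bucket name is a placeholder);
# with the Region deny guardrail enabled, the underlying PutObject call is denied by the SCP
aws s3 cp ./report.csv s3://partner-bucket-in-ap-southeast-1/report.csv --region ap-southeast-1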

Other preventive guardrails can enforce data residency, for example:

  • Disallow cross-region networking for Amazon EC2, Amazon CloudFront, and AWS Global Accelerator
  • Disallow internet access for an Amazon VPC instance managed by a customer
  • Disallow Amazon Virtual Private Network (VPN) connections

Now, let’s see how detective guardrails work.

Testing Detective Guardrails in a Sandbox Account
I enable the following guardrails for all accounts in the sandbox OU:

  • Detect whether Amazon EBS snapshots are restorable by all AWS accounts
  • Detect whether public routes exist in the route table for an internet gateway

Now, I want to see what happens if I go against these guardrails. In the EC2 console, I create an EBS snapshot for the volume of the EC2 instance I started before. Then, I modify permissions to share it with all AWS accounts.

Console screenshot.

Then, in the VPC console, I create an internet gateway, attach it to the AWS Control Tower managed VPC, and update the route table of one of the private subnets to use the internet gateway.

Console screenshot.

After a few minutes, the noncompliant resources in the sandbox account are found by the detective guardrails.

Console screenshot.

I look at the information provided by the guardrails and update my configuration to fix the issues. In a multi-account setup I’d contact the account owner and ask for remediation.

Availability and Pricing
You can use data-residency guardrails to control resources in any AWS Region. To create a landing zone, you should start from one of the Regions where AWS Control Tower is offered. For more information, see the AWS Regional Services List. There is no additional cost for this feature. You pay the costs of other services used, such as AWS Config.

This feature provides you with a framework of controls and guidance for setting up a multi-account environment that addresses data residency requirements. Depending on your use case, you may use any subset of the new data residency guardrails.

Set up guardrails based on your data residency requirements with AWS Control Tower.

Danilo

New – AWS Outposts Servers in Two Form Factors

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-aws-outposts-servers-in-two-form-factors/

AWS Outposts gives you on-premises compute and storage that is monitored and managed by AWS, and controlled by the same, familiar AWS APIs. You may already know about the AWS Outposts rack, which occupies a full 42U rack.

Last year I told you that we were working on new sizes of Outposts suitable for locations such as branch offices, factories, retail stores, health clinics, hospitals, and cell sites that are space-constrained and need access to low-latency compute capacity. Today we are launching three AWS Outposts servers, all powered by AWS Nitro System and with your choice of x86 or Arm/Graviton2 processors. Here’s an overview:

Name / Rack Size (Catalog ID) | EC2 Instance Capacity | Processor / Architecture | vCPUs | Memory  | Local NVMe SSD Storage
Outposts 1U (STBKRBE)         | c6gd.16xlarge         | Graviton2 / Arm          | 64    | 128 GiB | 3.8 TB (2 x 1.9 TB)
Outposts 2U (LMXAD41)         | c6id.16xlarge         | Intel Ice Lake / x86     | 64    | 128 GiB | 3.8 TB (2 x 1.9 TB)
Outposts 2U (KOSKFSF)         | c6id.32xlarge         | Intel Ice Lake / x86     | 128   | 256 GiB | 7.6 TB (4 x 1.9 TB)

You can create VPC subnets on each Outpost, and you can launch Amazon Elastic Compute Cloud (Amazon EC2) instances from EBS-backed AMIs in the parent region. The c6gd.16xlarge model supports six instance sizes, as follows:

Instance Name | vCPUs | Memory  | Local Storage
c6gd.large    | 2     | 4 GiB   | 118 GB
c6gd.xlarge   | 4     | 8 GiB   | 237 GB
c6gd.2xlarge  | 8     | 16 GiB  | 474 GB
c6gd.4xlarge  | 16    | 32 GiB  | 950 GB
c6gd.8xlarge  | 32    | 64 GiB  | 1.9 TB
c6gd.16xlarge | 64    | 128 GiB | 3.8 TB

The c6id.16xlarge model supports all but the largest of the following instance sizes, and the c6id.32xlarge supports all of them:

Instance Name | vCPUs | Memory  | Local Storage
c6id.large    | 2     | 4 GiB   | 118 GB
c6id.xlarge   | 4     | 8 GiB   | 237 GB
c6id.2xlarge  | 8     | 16 GiB  | 474 GB
c6id.4xlarge  | 16    | 32 GiB  | 950 GB
c6id.8xlarge  | 32    | 64 GiB  | 1.9 TB
c6id.16xlarge | 64    | 128 GiB | 3.8 TB
c6id.32xlarge | 128   | 256 GiB | 7.6 TB

Within each of your Outposts servers, you can launch any desired mix of instance sizes as long as you remain within the overall processing and storage available. You can create Amazon Elastic Container Service (Amazon ECS) clusters (Amazon Elastic Kubernetes Service (EKS) is coming soon), and the code you run on-premises can make use of the entire lineup of services in the AWS Cloud.

Each Outposts server connects to the cloud via the public Internet or across a private AWS Direct Connect line. Additionally, each Outposts server supports a Local Network Interface (LNI) that provides a Layer 2 presence on your local network for AWS service endpoints.

Outposts servers incorporate many powerful Nitro features, including high-speed networking and enhanced security. The security model is locked down: administrative access is blocked, which protects against tampering and human error. Additionally, data at rest is protected by a NIST-compliant physical security key.

While I was writing this post, I stopped in to say hello to the design and development team, and met with my colleague Bianca Nagy to learn more about the Outposts server:

Ordering Outposts Servers
Let’s walk through the process of ordering an Outposts server from the AWS Management Console. I visit the AWS Outposts Console, make sure that I am in the desired AWS Region, and click Place order to get started:

I click Servers, and then choose the desired configuration. I pick the c6gd.16xlarge, and click Next to proceed:

Then I create a new Outpost:

And a new Site:

Then I review my payment options and select my shipping address:

On the next page I review all of my options, click Place order, and await delivery:

In general, we expect to be able to deliver Outposts servers in two to six weeks, starting in the first quarter of 2022. After you receive yours, you or a member of your IT team can mount it in a 19″ rack or position it on a flat surface, cable it to power and networking, and power the device on. You then use a set of temporary AWS credentials to confirm the identity of the device, and to verify that the device is able to use DHCP to obtain an IP address. Once the device has established connectivity to the designated AWS parent region, we will finalize the provisioning of EC2 instance capacity and make it available to you.

After that, you are ready to launch instances and to deploy your on-premises applications.

We will monitor hardware performance and will contact you if your device is in need of maintenance. We will ship a replacement device for arrival within 2 business days. You can migrate your workloads to a redundant device, and use tracking information & notifications to track delivery status. When the replacement arrives, you install it and then destroy the physical security key in the old one before shipping it back to AWS.

Outposts API Update
We are also enhancing the Outposts API as part of this launch. Here are some of the new functions:

ListCatalogItems – Get a list of items in the Outposts catalog, with optional filtering by EC2 family or supported storage options.

GetCatalogItem – Get full information about a single item in the Outposts catalog.

GetSiteAddress – Get the physical address of a site where an Outposts rack or server is installed.

You can use the information returned by GetCatalogItem to place an order that contains the desired quantity of one or more catalog items.
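Here's a quick sketch of what that looks like from the command line, assuming the CLI equivalents of these API functions; the catalog item ID is one of the ones from the table above:

# List the items in the Outposts catalog
aws outposts list-catalog-items

# Get the details of a single catalog item, such as the 1U Graviton2 server above
aws outposts get-catalog-item --catalog-item-id STBKRBE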

Things to Know
Here are a couple of important things to know about Outposts servers:

Availability – Outposts servers are available for order to most locations where Outposts racks are available (currently 23 regions and 49 countries), with more to follow in 2022.

Ordering at Scale – I showed you the console-based ordering process above, and also gave you a glimpse at the Outposts API. If you need hundreds or thousands of devices, get in touch and we will give you a template that you can fill in and then upload.

re:Invent 2021 Outposts Server Selfie Challenge
If you attend AWS re:Invent, be sure to visit the AWS Hybrid kiosk in the AWS Booth (#1719) to see the new Outposts Servers up close and personal. While you are there, take a fun & creative selfie, tag it with #AWSOutposts & #AWSPromotion, and share it on Twitter. I will post my three favorites at the end of the show!

Jeff;

Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-canvas-a-visual-no-code-machine-learning-capability-for-business-analysts/

For an organization facing business problems and dealing with data on a daily basis, the ability to build systems that can predict business outcomes is very important. This ability lets you solve problems and move faster by automating slow processes and embedding intelligence in your IT systems.

But how do you make sure that all teams and individual decision makers in the organization are empowered to create these machine learning (ML) systems at scale, and without depending on other data science and data engineering teams? As a business user or data analyst, you’d like to build and use prediction systems based on the data that you analyze and process every day, without having to learn about hundreds of algorithms, training parameters, evaluation metrics, and deployment best practices.

Today, I’m excited to announce the general availability of Amazon SageMaker Canvas, a new visual, no code capability that allows business analysts to build ML models and generate accurate predictions without writing code or requiring ML expertise. Its intuitive user interface lets you browse and access disparate data sources in the cloud or on-premises, combine datasets with the click of a button, train accurate models, and then generate new predictions once new data is available.

SageMaker Canvas leverages the same technology as Amazon SageMaker to automatically clean and combine your data, create hundreds of models under the hood, select the best performing one, and generate new individual or batch predictions. It supports multiple problem types such as binary classification, multi-class classification, numerical regression, and time series forecasting. These problem types let you address business-critical use cases, such as fraud detection, churn reduction, and inventory optimization, without writing a single line of code.

SageMaker Canvas in Action
Imagine that I’m an e-commerce manager who needs to predict whether or not a product will be shipped on time. The datasets at my disposal consist of a product catalog and the historical shipping dataset, both in CSV format.

First, I enter the SageMaker Canvas application where all of my models and datasets are created and inspected.

I select Import, and upload two CSV files: ProductData.csv and ShippingData.csv. I have 120 products and 10,000 shipping records.

I could also fetch data from Amazon Simple Storage Service (Amazon S3) or connect to other cloud or on-premises data sources, such as Amazon Redshift or Snowflake. For this use case, I prefer to upload 1.6 MB of data directly from my computer.

Before confirming the import, I have a chance to preview the two datasets, their columns, and their respective values. For example, each product has a ComputerBrand, ScreenSize, and PackageWeight. In addition to useful columns such as ShippingOrigin, OrderDate, and ShippingPriority, each record in the shipping dataset also contains OnTimeDelivery, which is either On Time or Late. This column will be used by SageMaker Canvas to generate a prediction model based on historical data.

After a few seconds of processing, the datasets are ready, and I decide to join them to create a single dataset containing both product and shipping information. This is an optional step that often lets you increase the precision of a prediction model.

Now I can simply drag and drop the two datasets: SageMaker Canvas will automatically identify the shared ProductId column and apply an Inner Join transformation.

The join preview lets me visualize the resulting columns, identify missing or invalid values, and optionally deselect unwanted columns.

I select Save joined data and provide a new name for this joined dataset, which now includes 16 columns and 10,000 records.

Next, I want to create a model and start by selecting New model in the Models section on the left menu. I call it On Time Prediction Model.

The first step is selecting a dataset.

I select a target column that my model will predict: OnTimeDelivery.

SageMaker Canvas shows me the value distribution and already recommends the most appropriate model type: two categories classification.

Before proceeding with the model training, I have the option to generate an analysis report. This analysis gives me two very important pieces of information: the estimated accuracy and the impact of each column.

The estimated accuracy of 99.9% gives me confidence, but then I notice that the highest impact is provided by the ActualShippingDays column. Unfortunately, this column is not available in advance and I can’t use it for my predictions. So I deselect it and run the analysis again.

The new estimated accuracy is 94.2%, which is still pretty high. The most impactful columns are ShippingPriority, YShippingDistance, XShippingDistance, and Carrier. This is great because all of this information is available in advance and can be used for a prediction. On the other hand, product-related columns, such as PackageWeight and ScreenSize, have very small impacts on the prediction. This means that in the future I could simplify the overall process by feeding only shipping information into the training and prediction phases.

I’m happy with the analysis insights. Therefore, I decide to proceed and build a prediction model by selecting the Standard build option.

Now I can go for a walk, attend a few productive meetings, or simply spend some time with family. SageMaker Canvas is doing all of the work for me, training hundreds of models behind the scenes. It will select the best performing one, so that I can start generating accurate predictions in a couple of hours. Of course, the training duration will vary depending on the dataset size and problem type.

After about an hour and a half, the model is ready and the console lets me analyze its accuracy and the column impacts visually. I’m also happy to see that the model predicts the correct value 95.8% of the time, which is even higher than the estimated accuracy.

Optionally, I could also inspect advanced metrics such as Precision, Recall, F1 Score, and so on. These metrics help me understand how the model is performing and what kind of false positives and false negatives I can expect from this model.

From here, I could share the model into Amazon SageMaker Studio or continue using the Canvas UI to generate new predictions.

I decide to continue with the intuitive UI and select Predict. Now I can work with individual records or with a dataset for batch predictions.

When selecting Single prediction, SageMaker Canvas simplifies my life and lets me start from an existing record. I modify the column values and get immediate feedback on the prediction and the corresponding feature importance.

This quick feedback loop and intuitive UI allows me to use the ML model without having to write custom code. In case I decide to integrate the model into an automated production system, the Amazon SageMaker Studio integration lets me share the model easily with other data scientists in my team.

Generally Available Today
SageMaker Canvas is generally available in US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Frankfurt), and Europe (Ireland). You can start using it with your local datasets, as well as data already stored on Amazon S3, Amazon Redshift, or Snowflake. With just a few clicks, you’ll prepare and join your datasets, analyze estimated accuracy, verify which columns are impactful, train the best performing model, and generate new individual or batch predictions. We’re excited to hear your feedback and help you solve even more business problems with ML.

Alex

Amazon Kinesis Data Streams On-Demand – Stream Data at Scale Without Managing Capacity

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/amazon-kinesis-data-streams-on-demand-stream-data-at-scale-without-managing-capacity/

Today we are launching Amazon Kinesis Data Streams On-demand, a new capacity mode. This capacity mode eliminates capacity provisioning and management for streaming workloads.

Kinesis Data Streams is a fully managed, serverless service for real-time processing of streamed data at a massive scale. Kinesis Data Streams can take any amount of data, from any number of sources, and scale up and down as needed. Creating a new data stream has been easy since we announced Kinesis Data Streams in November 2013: to get started, you only need to specify the number of shards to provision for your stream.

Shards are the way to define capacity in Kinesis Data Streams. Each shard can ingest 1 MB/s and 1,000 records/second, and egress up to 2 MB/s. You can add or remove shards using the Kinesis Data Streams APIs to adjust the stream capacity to the throughput needs of your workload. This lets you make sure that producer and consumer applications don’t experience any throttling.
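In provisioned mode, resharding is an explicit operation. For example, here is a minimal sketch of scaling a stream to 10 shards with the CLI (the stream name is a placeholder):

# Scale a provisioned stream to 10 shards with uniform scaling
aws kinesis update-shard-count \
    --stream-name my-data-stream \
    --target-shard-count 10 \
    --scaling-type UNIFORM_SCALING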

As customers adopt data streaming broadly, workloads with data traffic that can increase by millions of events in a few minutes are becoming more common. For these volatile traffic patterns, customers carefully plan capacity, monitor throughput, and in some cases develop processes that automatically change the Kinesis Data Streams stream capacity.

Kinesis Data Streams On-Demand Mode
That is why today we are announcing Kinesis Data Streams On-demand. This new capacity mode eliminates the need for provisioning and managing the capacity for streaming data. With Kinesis Data Streams On-demand, the stream automatically scales its capacity in response to varying data traffic. Customers are charged per gigabyte of data written, read, and stored in the stream, in a pay-per-throughput fashion.

Data streams in the on-demand mode have the same high durability, high availability, low latency, security, and deep AWS integrations that Kinesis Data Streams already provides. Moreover, there are no new APIs to write or read data. All existing Kinesis Data Streams integrations work in the on-demand mode.

Kinesis Data Streams uses the partition key to distribute data across shards. That is why when using Kinesis Data Streams On-demand, you still must specify a partition key for each record to write data into a data stream, as you do today in Kinesis Data Streams using the provisioned mode. In Kinesis Data Streams On-demand, the data stream automatically adapts to handle uneven data distribution patterns. But you must be careful that no partition key exceeds a shard’s limits. If this happens, then you will receive write throttles, and then you can retry these requests.

When a new data stream is created using Kinesis Data Streams On-demand, it gets created with the default capacity of 4 MB/s and 4,000 records per second for writes. Kinesis Data Streams On-demand can automatically scale up to 200 MB/s and 200,000 records per second for writes.

Kinesis Data Streams On-demand accommodates up to double its previous peak write throughput observed in the last 30 days. As your data stream’s write throughput hits a new peak, Kinesis Data Streams automatically scales the stream’s capacity.

For example, if your data stream has a write throughput that varies between 10 MB/s and 40 MB/s, Kinesis Data Streams will make sure that you can easily burst to double the peak—80 MB/s. And, if later on that same data stream reaches a new peak of 50 MB/s, then Kinesis Data Streams will make sure that there is enough capacity to ingest 100 MB/s. However, write throttling can occur if your traffic grows more than double the previous peak in less than 15 minutes.

When to Use Kinesis Data Streams On-demand
On-demand mode is great for customers that have an unknown or variable workload, or who simply don’t want to deal with capacity management. On-demand mode works best for workloads that have even partition key distribution. For example, you run a mobile game that has variable traffic through the week or day, as customers play mostly on nights or weekends. Or, you run a streaming platform that hosts live shows, and you see a sudden increase in demand depending on the guests you have.

In addition, you can switch between on-demand and provisioned mode twice a day. For example, you run an e-commerce site with predictable traffic. But, starting next month, there will be many marketing campaigns launched globally. You don’t know the impact that those will have on the site traffic. Switch your Kinesis Data Streams to on-demand mode, and you can enjoy automated capacity planning and management for your data streams.

Get Started with Kinesis Data Streams On-demand
Create a new data stream with Kinesis Data Streams On-demand from the AWS console, AWS SDKs, AWS Command Line Interface (CLI), and AWS CloudFormation.

To create one from the console, visit the Kinesis console and choose Create data stream. When selecting the capacity mode, select On-demand.

Creating a data stream

At the end of the page, all of the settings for the new data stream are presented. These settings can be changed after the data stream has been created.

Data stream settings
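The equivalent with the CLI is a single call, and switching an existing stream is one more; the stream name, account ID, and ARN below are placeholders:

# Create a new data stream in on-demand capacity mode
aws kinesis create-stream \
    --stream-name my-data-stream \
    --stream-mode-details StreamMode=ON_DEMAND

# Switch an existing provisioned stream to on-demand mode
aws kinesis update-stream-mode \
    --stream-arn arn:aws:kinesis:us-east-1:123456789012:stream/my-data-stream \
    --stream-mode-details StreamMode=ON_DEMAND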

Let’s See This in Action!
For this demo, I want to show you how the new Kinesis Data Streams capability works. This situation is best described by looking at the following Amazon CloudWatch graphs. The green line represents the bytes ingested successfully into the stream, and the red line shows the percentage of traffic that is throttled.

First, we will start with a stream provisioned with five shards. For the first three minutes, we are sending a load of 4 MB/s. You can see that the stream can handle the load.

At the time stamp 21:19, we increase the load to 12 MB/s. Now the stream cannot handle the load, and the throttles start (the red line climbs until about 60 percent of requests are being throttled).

Increase the load on a provisioned stream

At the time stamp 21:23, we change the stream capacity from provisioned to on-demand. You can do that on-the-fly without affecting the stream. See that it takes a very short time for the stream to handle the load when converting from one capacity mode to the other.

In a few minutes (time stamp 21:24) the throttles start to drop as the stream starts scaling up. The stream capacity doubles to 10 shards first (time stamp 21:26), and the stream keeps scaling up until each shard has a load of less than 0.5 MB/s. In this way, if the stream suddenly receives double the amount of load, then it has the capacity ready to handle it.

Change to on-demand mode

At the time stamp 21:26, the load in the stream is increased to 18 MB/s. You can see the green line climbing to 350,000 records – there are no throttles, and the stream ends this demo with 40 open shards. This means that if suddenly the stream receives a load of 40 MB/s, then it could handle it with no problem.

Increase the load

Available Now!
Amazon Kinesis Data Streams On-demand is available globally in all commercial Regions.

You can learn more about the capacity modes in the Amazon Kinesis Data Streams Developer Guide.

Marcia

Introducing Amazon Redshift Serverless – Run Analytics At Any Scale Without Having to Manage Data Warehouse Infrastructure

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-redshift-serverless-run-analytics-at-any-scale-without-having-to-manage-infrastructure/

We’re seeing the use of data analytics expanding among new audiences within organizations, for example with users like developers and line of business analysts who don’t have the expertise or the time to manage a traditional data warehouse. Also, some customers have variable workloads with unpredictable spikes, and it can be very difficult for them to constantly manage capacity.

With Amazon Redshift, you use SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes. Today, I am happy to introduce the public preview of Amazon Redshift Serverless, a new capability that makes it super easy to run analytics in the cloud with high performance at any scale. Just load your data and start querying. There is no need to set up and manage clusters. You pay for the duration in seconds when your data warehouse is in use, for example, while you are querying or loading data. There is no charge when your data warehouse is idle.

Amazon Redshift Serverless automatically provisions the right compute resources for you to get started. As your demand evolves with more concurrent users and new workloads, your data warehouse scales seamlessly and automatically to adapt to the changes. You can optionally specify the base data warehouse size to have additional control on cost and application-specific SLAs.

With the new serverless option, you can continue to query data in other AWS data stores, such as Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Aurora and Amazon Relational Database Service (RDS) databases.

Amazon Redshift Serverless is ideal when it is difficult to predict compute needs, such as variable workloads, periodic workloads with idle time, and steady-state workloads with spikes. This approach is also a good fit for ad-hoc analytics, where you want to get started quickly, and for test and development environments.

Let’s see how this works in practice.

Using Amazon Redshift Serverless
I go to the Amazon Redshift console and choose the new serverless option. The first time, I set up the serverless endpoint and configure networking and security.

I confirm the default settings that use all subnets in my default Amazon Virtual Private Cloud (VPC) and its default security group. Data is always encrypted, and I use the default AWS-owned key. Optionally, I can customize all settings. Now or later, I can associate the AWS Identity and Access Management (IAM) roles that grant permissions to access other AWS resources, for example, to load data from an S3 bucket. The configuration of the serverless endpoint will be shared by all my serverless data warehouses in the same AWS account and Region.

Console screenshot.

To query data, I use Amazon Redshift Query Editor V2, a new free web-based tool that we made available a few months back. The query editor provides quick access to a few sample datasets to make it easy to learn Amazon Redshift’s SQL capabilities: TPC-H, TPC-DS, and tickit, a dataset containing information on ticket sales for events.

For a quick test, I use the tickit sample dataset so I don’t need to load any data. I prepare a query to get the list of tickets sold per date, sorted to see the dates with more sales first:

SELECT caldate, sum(qtysold) as sumsold
FROM   tickit.sales, tickit.date
WHERE  sales.dateid = date.dateid 
GROUP BY caldate
ORDER BY sumsold DESC;

By using the web-based query editor, I don’t need to configure a SQL client or set up the network permissions to reach the serverless endpoint. Instead, I just write my SQL query and run it.

Console screenshot.

I am a visual person. I enable the Chart option on the right of the result table and select a bar chart.

Console screenshot.

Satisfied with the clarity of the chart, I export it as an image file. In this way, I can quickly share it or include it in a report.

Bar chart

Amazon Redshift Serverless supports all of the rich SQL functionality of Amazon Redshift, such as semi-structured data support. I can use any JDBC/ODBC-compliant tool or the Amazon Redshift Data API to query my data. To migrate data, I can take a snapshot of an Amazon Redshift provisioned cluster and restore it as serverless. Then, I just need to update my SQL applications to use the new serverless endpoint.
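
If I want to script that access, the same tickit query can go through the Data API from the AWS CLI. The following is a minimal sketch; the database name is an assumption, and the way the call targets the serverless endpoint (for example, a workgroup or database parameter) may differ during the preview, so check the Data API reference:

# Submit the query asynchronously and capture the statement ID (database name is an assumption)
STATEMENT_ID=$(aws redshift-data execute-statement \
    --database dev \
    --sql "SELECT caldate, SUM(qtysold) AS sumsold FROM tickit.sales, tickit.date WHERE sales.dateid = date.dateid GROUP BY caldate ORDER BY sumsold DESC;" \
    --query 'Id' --output text)

# Check that the statement finished, then fetch the result set as JSON
aws redshift-data describe-statement --id "$STATEMENT_ID"
aws redshift-data get-statement-result --id "$STATEMENT_ID"

Because the Data API returns results as JSON over HTTPS, I can feed them into a Lambda function or a notebook without managing database connections or drivers.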

Availability and Pricing
Amazon Redshift Serverless is available in public preview in the following AWS Regions: US East (N. Virginia), US West (N. California, Oregon), Europe (Frankfurt, Ireland), Asia Pacific (Tokyo).

With Amazon Redshift Serverless, you pay separately for the compute and storage you use. Compute capacity is measured in Redshift Processing Units (RPUs), and you pay for the workloads in RPU-hours with per-second billing. For storage, you pay for data stored in Amazon Redshift-managed storage and storage used for snapshots, similar to what you’d pay with a provisioned cluster using RA3 instances.

To control your costs, you can specify usage limits and define actions that Amazon Redshift automatically takes if those limits are reached. You can specify usage limits in RPU-hours, associated with a daily, weekly, or monthly duration. Setting higher usage limits can improve the overall throughput of the system, especially for workloads that need to handle high concurrency while maintaining consistently high performance.

Compute resources automatically shut down behind the scenes when there is no activity and resume when you are loading data or when queries come in. When accessing your S3 data lake via the new serverless endpoint, you do not pay for Amazon Redshift Spectrum separately. You have a unified serverless experience and pay for data lake queries also in RPU-seconds. For more information, see the Amazon Redshift pricing page.

The serverless endpoint is configured at the AWS account level. If you have multiple teams or projects and want to manage costs separately, you can use separate AWS accounts. You can share data between your provisioned clusters and serverless endpoint and between serverless endpoints across accounts.

To help you get started, we provide $500 in AWS credits upfront to try the Amazon Redshift Serverless public preview. You get the credits when you first create a database with Amazon Redshift Serverless. These credits are used to cover your costs for compute, storage, and snapshot usage of Amazon Redshift Serverless only.

Start using Amazon Redshift Serverless today to run and scale analytics without having to provision and manage data warehouse clusters.

Danilo

Announcing Amazon EMR Serverless (Preview): Run big data applications without managing servers

Post Syndicated from Damon Cortesi original https://aws.amazon.com/blogs/big-data/announcing-amazon-emr-serverless-preview-run-big-data-applications-without-managing-servers/

Today we’re happy to announce Amazon EMR Serverless, a new option in Amazon EMR that makes it easy and cost-effective for data engineers and analysts to run petabyte-scale data analytics in the cloud. With EMR Serverless, you can run applications built using open-source frameworks such as Apache Spark, Hive, and Presto, without having to configure, manage, optimize, or secure clusters. EMR Serverless automatically provisions and scales the compute and memory resources required by your applications, and you only pay for the resources that your applications use.

In this post, we discuss the benefits of EMR Serverless, walk you through the core concepts of EMR Serverless and how you can use it, and show you a quick demo.

Overview of EMR Serverless

Tens of thousands of customers use Amazon EMR, a managed service for running open-source analytics frameworks such as Apache Spark, Hive, and Presto, for large-scale data analytics applications. With Amazon EMR, you can provision clusters of any size in minutes. Amazon EMR automatically installs and configures the frameworks you choose, and provides a performance-optimized runtime that is compatible with, and over twice as fast as, standard open-source frameworks.

Amazon EMR customers have full control over cluster configuration. The ability to customize clusters allows you to optimize for cost and performance based on workload requirements. For example, you can use Amazon Elastic Compute Cloud (Amazon EC2) memory optimized instances to run SQL workloads with low latency, or use the EC2 Graviton2-based instances to improve performance. You can also use EC2 Spot Instances, which are integrated in Amazon EMR so that you can take advantage of unused EC2 capacity in the AWS Cloud to obtain instances at up to a 90% discount compared to On-Demand prices. If you run your applications on Kubernetes, you can use Amazon EMR on Amazon EKS to run your Amazon EMR analytics applications on Amazon Elastic Kubernetes Service (Amazon EKS) clusters.

However, tuning clusters for optimal cost and performance requires engineers to have deep knowledge of the underlying analytics frameworks. Furthermore, the specific compute and memory resources needed to optimally run applications depend on various factors, such as the schedule and complexity of data processing jobs and the volume of data being processed. When these characteristics change over time, you need to reevaluate and reconfigure clusters. In addition, administrators have to secure and monitor the clusters to ensure that they’re compliant with corporate security policies, and adjust security settings each time the cluster is reconfigured. Many customers don’t need this level of customization and control, and want a simpler way to process data using open-source frameworks on the Amazon EMR performance-optimized runtime.

With this in mind, we built EMR Serverless. With EMR Serverless, you can get all the benefits of running Amazon EMR, but with a serverless environment. We had the following goals in mind when we built EMR Serverless:

  • Provide a simpler experience – EMR Serverless is simple to use because you don’t have to configure, optimize, operate, or secure clusters. You don’t have to worry about instance types or cluster sizes, or about applying OS patches. You simply specify the framework and version that you want to use for your application, and submit your data processing jobs. You still get all the benefits that you expect out of Amazon EMR—open-source compatibility, open-source version currency, and performance-optimized runtime—but without the need to manage clusters.
  • No need to guess cluster sizes – EMR Serverless eliminates the need to right-size clusters for varying jobs and data sizes. With EMR Serverless, you create an application using an open-source framework version, and submit jobs to the application. EMR Serverless automatically adds and removes workers at different stages of processing your job. As a result, you don’t have to reconfigure when data volumes change, and you only pay for what your jobs require. You can control costs by specifying the minimum and maximum number of concurrent workers, and the vCPU and memory per worker.
  • Retain Amazon EMR’s performance-optimized runtime and open-source currency – EMR Serverless includes the Amazon EMR performance-optimized runtime for Apache Spark, Hive, and Presto. The Amazon EMR runtime is API-compatible and over twice as fast as standard open-source frameworks, so your jobs run faster and incur lower compute costs.
  • Seamless integration with EMR Studio – EMR Serverless includes EMR Studio, which provides fully managed serverless Jupyter Notebooks and familiar open-source tools such as Spark UI and Tez UI to help you develop, visualize, and debug your applications.
  • Automatic and fine-grained scaling – EMR Serverless automatically scales up workers at each stage of processing your job and scales them down when they’re not required. You’re charged for aggregate vCPU, memory, and storage resources used from the time a worker starts running until it stops, rounded up to the nearest second with a 1-minute minimum. For example, your job may require 10 workers for the first 10 minutes of processing the job, and 50 workers for the next 5 minutes. With fine-grained automatic scaling, you only incur cost for 10 workers for 10 minutes and 50 workers for 5 minutes. As a result, you don’t have to pay for underutilized resources.
  • Resilience to Availability Zone failures – EMR Serverless is a Regional service. When you submit jobs to an EMR Serverless application, it can run in any Availability Zone in the Region. A job is run in a single Availability Zone to avoid the performance implications of network traffic across Availability Zones. If an Availability Zone is impaired, a job submitted to your EMR Serverless application is automatically run in a different (healthy) Availability Zone. When using resources in a private VPC, we recommend that you specify the VPC configuration for multiple Availability Zones so that EMR Serverless can automatically select a healthy one.
  • Enable shared applications – When you submit jobs to an EMR Serverless application, you can specify the AWS Identity and Access Management (IAM) role that must be used by the job to access AWS resources such as Amazon Simple Storage Service (Amazon S3) objects. As a result, different IAM principals can run jobs on a single EMR Serverless application, and each job can only access the AWS resources that the IAM principal is allowed to access. This enables you to set up scenarios where a single application with a pre-initialized pool of workers is made available to multiple tenants wherein each tenant can submit jobs using a different IAM role but use the common pool of pre-initialized workers to immediately process requests.
  • Enable interactive applications – Interactive applications that allow data scientists and analysts to run interactive SQL queries for data exploration require a fast response time to user requests. For such interactive applications, EMR Serverless allows you to pre-initialize a pool of workers. You can start your EMR Serverless application and pre-initialize the pool of workers as soon as a user starts the application, and stop the application to stop workers when no interactive users are active. If processing user requests requires more workers than what have been pre-initialized, EMR Serverless automatically adds more workers up to the maximum concurrent limits that you specify. Therefore, by controlling the number of workers to pre-initialize and the maximum concurrent workers, you can optimize user experience and cost for your interactive applications.
  • Make it easy to switch from one deployment model to another – The same Amazon EMR releases are provided for applications using EMR clusters, Amazon EMR on EKS, and EMR Serverless. When you build an application using an Amazon EMR release (for example a Spark job using Amazon EMR release 6.4), you can choose to run it on an EMR cluster, Amazon EMR on EKS, or EMR Serverless without having to rewrite the application. This allows you to build applications for a given framework version, and retain the flexibility to change the deployment model based on future operational needs.

Core concepts

In this section, we discuss the core concepts in EMR Serverless: applications, jobs, workers, and pre-initialized workers.

Application

With EMR Serverless, you can create one or more applications that use open-source analytics frameworks. To create an application, you specify the open-source framework that you want to use (for example, Apache Spark or Apache Hive), the Amazon EMR release for the open-source framework version (for example, Amazon EMR release 6.4, which corresponds to Apache Spark 3.1.2), and a name for your application. After you create an application, you can submit data processing jobs or interactive requests to your application.
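
As a rough sketch of what this looks like outside the console, the following AWS CLI call creates a Spark application pinned to Amazon EMR release 6.4. The application name is mine, and the exact CLI surface may change while EMR Serverless is in preview:

# Create a Spark application on Amazon EMR release 6.4 and save its ID for later job submissions
APPLICATION_ID=$(aws emr-serverless create-application \
    --type SPARK \
    --name my-spark-app \
    --release-label emr-6.4.0 \
    --query 'applicationId' --output text)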

The following are a few examples where you may want to create multiple applications:

  • To use different open-source frameworks (for example, Hive or Spark)
  • To use different versions of open-source frameworks for different use cases (for example, use a newer version of Spark for a new application without having to upgrade older applications)
  • To perform A/B testing when upgrading from one version to another (for example, migrating from Spark 2.4 to Spark 3.1)
  • To maintain separate logical environments for test and production scenarios
  • To provide separate logical environments for different teams with independent cost controls and usage tracking
  • To logically separate different line-of-business applications (for example, finance vs. marketing)

Job

A job is a request submitted to an EMR Serverless application that is asynchronously run and tracked through completion. You can run multiple jobs concurrently in an application.
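
Submitting a job is a single call against the application. Here is a hedged AWS CLI sketch; the script location, output path, and IAM role ARN are placeholders, and the preview CLI may differ slightly from what is shown:

# Submit a Spark job to the application created earlier; EMR Serverless scales workers as needed
aws emr-serverless start-job-run \
    --application-id "$APPLICATION_ID" \
    --execution-role-arn arn:aws:iam::123456789012:role/EMRServerlessJobRole \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://my-bucket/scripts/wordcount.py",
            "entryPointArguments": ["s3://my-bucket/output/"]
        }
    }'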

Workers

An EMR Serverless application internally uses workers to run your jobs. By default, each application uses workers with 4 vCPUs, 30 GB of memory, and 20 GB of local storage per worker. You can customize this configuration.

Pre-initialized workers

EMR Serverless provides an optional feature to pre-initialize workers when your application starts up, so that the workers are ready to process requests immediately when a job is submitted to the application. Pre-initialized workers allow you to maintain a warm pool of workers for the application so that it can provide a sub-second response to start processing requests.
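
To sketch how this might look, the application can be created with an initial capacity configuration. The JSON shape below is an assumption based on the worker defaults described above, so treat it as illustrative rather than a definitive API reference:

# Pre-initialize one Spark driver and three executors so jobs start against warm workers
aws emr-serverless create-application \
    --type SPARK \
    --name my-interactive-app \
    --release-label emr-6.4.0 \
    --initial-capacity '{
        "DRIVER":   {"workerCount": 1, "workerConfiguration": {"cpu": "4vCPU", "memory": "30GB"}},
        "EXECUTOR": {"workerCount": 3, "workerConfiguration": {"cpu": "4vCPU", "memory": "30GB"}}
    }'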

Common usage patterns applied to EMR Serverless

Now let’s examine some common usage scenarios and how EMR Serverless provides you a simple solution.

Pattern #1: Data pipelines

Data pipelines are the backbone of your analytics workloads. A common pattern with data pipelines is to start a cluster, run a job, and stop the cluster when the job is complete. Because data is separated from compute, the inputs and outputs for each job are persisted separately from the cluster (for example, in Amazon S3). These steps are frequently automated using workflow orchestration applications such as Apache Airflow. You can also use AWS services such as AWS Step Functions and Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to create such workflows.

Although automating these steps isn’t complex, data engineers have to spend time determining the appropriate EC2 instance and cluster size. They have to determine the Availability Zone where the cluster is run, and handle failover. They have to test their applications when adopting OS updates. When data sizes change over time, they have to resize clusters, or use features like Amazon EMR managed scaling that automatically resize clusters. EMR Serverless provides a simpler solution by eliminating the need for you to handle these scenarios. You simply choose the open-source framework and version for your application, and submit jobs. You don’t have to worry about instance selection, cluster sizes, cluster startup, cluster resize, stopping nodes, Availability Zone failover, or OS updates.

Pattern #2: Shared clusters

Another common pattern is for teams to use a shared long-running cluster to run multiple jobs. In this case, engineers implement queues in Apache YARN for different workloads on a common cluster, and set up rules to automatically scale the cluster up or down based on overall workload. With Amazon EMR on EC2 clusters, you can use Amazon EMR managed scaling, a feature that automatically scales clusters up or down depending on the workload. With EMR Serverless, workers are assigned to each job when required, so your jobs get the resources they need. Moreover, because you only pay for the workers that your jobs require, you don’t incur cost for over-provisioned resources. Finally, because each job can specify the IAM role that should be used to access AWS resources when running the job, you don’t have to set up complex configurations to manage queues and permissions.

Pattern #3: Interactive workloads

A third pattern of use is when teams keep a cluster of instances available to support interactive analysis. In this case, the cluster is set up and initialized with applications that wait for interactive user requests. Applications are pre-initialized so that they can immediately start processing user requests and provide an interactive user experience. EMR Serverless enables this scenario without requiring you to manage clusters. You can specify the number of workers that you want to pre-initialize when you start an EMR Serverless application. Subsequently, when users submit requests, the pre-initialized workers can be used to immediately process user requests. If processing the user requests requires more workers than what you have chosen to pre-initialize, EMR Serverless automatically adds more workers (up to the maximum concurrent limit that you specify). When the requests are processed, EMR Serverless automatically reverts back to maintaining the pre-initialized workers that you specified. You can control when the pre-initialized workers start by controlling when to start and stop your EMR Serverless application. For example, you can start your application when users begin interactive analysis and turn it off when there are no user requests and the application remains idle.

Demo

Conclusion

In this post, we discussed the core concepts and common usage patterns of EMR Serverless, and showed you a quick demo. EMR Serverless is in preview, and during the preview you can run workloads using Spark 3.1.2 and Hive 2.0 through the API, the AWS Command Line Interface (AWS CLI), and the SDKs. Sign up for the preview now, and for more information, see the EMR Serverless documentation.


About the Authors

Damon Cortesi is a Principal Developer Advocate with Amazon Web Services.

Mehul Y. Shah is the GM for Amazon EMR.

Abhishek Sinha is a Principal Product Manager at Amazon Web Services.

AWS Lake Formation – General Availability of Cell-Level Security and Governed Tables with Automatic Compaction

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-lake-formation-general-availability-of-cell-level-security-and-governed-tables-with-automatic-compaction/

A data lake can help you break down data silos and combine different types of analytics into a centralized repository. You can store all of your structured and unstructured data in this repository. However, setting up and managing data lakes involve a lot of manual, complicated, and time-consuming tasks. AWS Lake Formation makes it easy to set up a secure data lake in days instead of weeks or months.

Today, I am excited to share the general availability of some new features that simplify even further loading data, optimizing storage, and managing access to a data lake:

  • Governed Tables – A new type of Amazon Simple Storage Service (Amazon S3) table that makes it simple and reliable to ingest and manage data at any scale. Governed tables support ACID transactions that let multiple users concurrently and reliably insert and delete data across multiple governed tables. ACID transactions also let you run queries that return consistent and up-to-date data. In case of errors in your extract, transform, and load (ETL) processes, or during an update, changes are not committed and will not be visible.
  • Storage Optimization with Automatic Compaction for governed tables – When this option is enabled, Lake Formation automatically compacts small S3 objects in your governed tables into larger objects to optimize access via analytics engines, such as Amazon Athena and Amazon Redshift Spectrum. By using automatic compaction, you don’t have to implement custom ETL jobs that read, merge, and compress data into new files, and then replace the original files.
  • Granular Access Control with Row and Cell-Level Security – You can control access to specific rows and columns in query results and within AWS Glue ETL jobs based on the identity of who is performing the action. In this way, you don’t have to create (and keep updated) subsets of your data for different roles and regulations. This works for both governed and traditional S3 tables.

Using Governed Tables, ACID Transactions, and Automatic Compaction
In the Lake Formation console, I can enable governed data access and management at table creation. Automatic compaction is enabled by default, and it can be disabled using the AWS Command Line Interface (CLI) or AWS SDKs.

Console screenshot.

Governed tables have a manifest that tracks the S3 objects that are part of the table’s data. I can use the UpdateTableObjects API to keep the manifest updated when adding new objects to the table, and I can call it using the AWS CLI and SDKs. This API is implicitly used by the AWS Glue ETL library.

Moreover, I have access to new Lake Formation APIs to start, commit, or cancel a transaction. I can use these APIs to wrap data loading, data transformation, and output consistent and up-to-date data.
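
To make that flow concrete, here is a hedged AWS CLI sketch of wrapping a data load in a transaction. The database, table, and S3 object details are placeholders, and the exact parameter shapes may differ from what is shown, so verify them against the Lake Formation API reference:

# Start a read/write transaction and capture its ID
TX_ID=$(aws lakeformation start-transaction \
    --transaction-type READ_AND_WRITE \
    --query 'TransactionId' --output text)

# Register a newly written S3 object in the governed table's manifest (ETag and size are placeholders)
aws lakeformation update-table-objects \
    --database-name sales_db \
    --table-name orders_governed \
    --transaction-id "$TX_ID" \
    --write-operations '[{"AddObject": {"Uri": "s3://my-data-lake/orders/part-0001.parquet", "ETag": "0123456789abcdef0123456789abcdef", "Size": 1048576}}]'

# Commit so that queries see a consistent view; cancel-transaction would roll it back instead
aws lakeformation commit-transaction --transaction-id "$TX_ID"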

Using Row and Cell-Level Security
There are many use cases where, for a table, you want to restrict access to specific columns, rows, or a combination that depends on the role of the user accessing the data. For example, a company with offices in the US, Germany, and France can create a filter for analysts based in the European Union (EU) to limit access to EU-based customers.

Console screenshot.

The filter can enforce that some columns, such as date of birth (dob) and phone, are not accessible to those analysts. Moreover, access to individual rows can be filtered by using filter expressions. You can configure row filter expressions with a SQL-compatible syntax based on the open-source PartiQL language. In this case, only rows with country equal to Germany or France (country='DE' OR country='FR') are visible.

Console screenshot.
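
For automation, the same filter can be expressed through the Lake Formation API. The sketch below uses names I made up (catalog ID, database, table, and filter name), and the request shape is an assumption, so double-check it against the CreateDataCellsFilter documentation:

# Create a data cells filter that hides dob and phone and keeps only EU rows
cat > filter.json <<'EOF'
{
    "TableCatalogId": "123456789012",
    "DatabaseName": "customers_db",
    "TableName": "customers",
    "Name": "eu-analysts-filter",
    "RowFilter": {"FilterExpression": "country = 'DE' OR country = 'FR'"},
    "ColumnWildcard": {"ExcludedColumnNames": ["dob", "phone"]}
}
EOF

aws lakeformation create-data-cells-filter --table-data file://filter.json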

Availability and Pricing
These new features are available today in the following AWS Regions: US East (N. Virginia), US West (Oregon), Europe (Ireland), US East (Ohio), and Asia Pacific (Tokyo).

When querying governed tables, or tables secured with row and cell-level security, you pay by the amount of data scanned (with a 10MB minimum). When using governed tables, transaction metadata is charged by the number of S3 objects tracked, and you pay for the number of transaction requests. Automatic compaction is charged based on the data processed. For more information, see the AWS Lake Formation pricing page.

While implementing these features, we introduced a new Lake Formation Storage API that is integrated with tools such as AWS Glue, Amazon Athena, Amazon Redshift Spectrum, and Amazon QuickSight. You can use this storage API directly in your applications to query tables with a SQL-like syntax (joins are not supported) and get the benefits of governed tables and cell-level security.

See the detailed blog series published during the preview to learn more:

Effective data lakes using AWS Lake Formation

Take advantage of these new features to simplify the creation and management of your data lake.

Danilo

Join the Preview – Amazon EC2 C7g Instances Powered by New AWS Graviton3 Processors

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/join-the-preview-amazon-ec2-c7g-instances-powered-by-new-aws-graviton3-processors/

We announced the first generation AWS-designed Graviton processor in late 2018, and followed it up with the second generation Graviton2 a year later. Today, AWS customers make use of twelve different Graviton2-powered instances including the new X2gd instances that are designed for memory-intensive workloads. All Graviton processors include dedicated cores & caches for each vCPU, along with additional security features courtesy of the AWS Nitro System; the Graviton2 processors add support for always-on memory encryption.

C7g in the Works
I am thrilled to tell you about our upcoming C7g instances. Powered by new Graviton3 processors, these instances are going to be a great match for your compute-intensive workloads: HPC, batch processing, electronic design automation (EDA), media encoding, scientific modeling, ad serving, distributed analytics, and CPU-based machine learning inferencing.

While we are still optimizing these instances, it is clear that the Graviton3 is going to deliver amazing performance. In comparison to the Graviton2, the Graviton3 will deliver up to 25% more compute performance and up to twice as much floating point & cryptographic performance. On the machine learning side, Graviton3 includes support for bfloat16 data and will be able to deliver up to 3x better performance.

Graviton3 processors also include a new pointer authentication feature that is designed to improve security. Before return addresses are pushed onto the stack, they are first signed with a secret key and additional context information, including the current value of the stack pointer. When the signed addresses are popped off the stack, they are validated before being used. An exception is raised if the address is not valid, thereby blocking attacks that work by overwriting the stack contents with the address of harmful code. We are working with operating system and compiler developers to add additional support for this feature, so please get in touch if this is of interest to you.

C7g instances will be available in multiple sizes (including bare metal), and are the first in the cloud industry to be equipped with DDR5 memory. In addition to drawing less power, this memory delivers 50% higher bandwidth than the DDR4 memory used in the current generation of EC2 instances.

On the network side, C7g instances will offer up to 30 Gbps of network bandwidth and Elastic Fabric Adapter (EFA) support.

Join the Preview
We are now running a preview of the C7g instances so that you can be among the first to experience all of this power. Sign up now, take an instance for a spin, and let me know what you think!

Jeff;

New – Use Amazon S3 Event Notifications with Amazon EventBridge

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-use-amazon-s3-event-notifications-with-amazon-eventbridge/

We launched Amazon EventBridge in mid-2019 to make it easy for you to build powerful, event-driven applications at any scale. Since that launch, we have added several important features including a Schema Registry, the power to Archive and Replay Events, support for Cross-Region Event Bus Targets, and API Destinations to allow you to send events to any HTTP API. With support for a very long list of destinations and the ability to do pattern matching, filtering, and routing of events, EventBridge is an incredibly powerful and flexible architectural component.

S3 Event Notifications
Today we are making it even easier for you to use EventBridge to build applications that react quickly and efficiently to changes in your S3 objects. This is a new, “directly wired” model that is faster, more reliable, and more developer-friendly than ever. You no longer need to make additional copies of your objects or write specialized, single-purpose code to process events.

At this point you might be thinking that you already had the ability to react to changes in your S3 objects, and wondering what’s going on here. Back in 2014 we launched S3 Event Notifications to SNS Topics, SQS Queues, and Lambda functions. This was (and still is) a very powerful feature, but using it at enterprise-scale can require coordination between otherwise-independent teams and applications that share an interest in the same objects and events. Also, EventBridge can already extract S3 API calls from CloudTrail logs and use them to do pattern matching & filtering. Again, very powerful and great for many kinds of apps (with a focus on auditing & logging), but we always want to do even better.

Net-net, you can now configure S3 Event Notifications to directly deliver to EventBridge! This new model gives you several benefits including:

Advanced Filtering – You can filter on many additional metadata fields, including object size, key name, and time range. This is more efficient than using Lambda functions that need to make calls back to S3 to get additional metadata in order to make decisions on the proper course of action. S3 only publishes events that match a rule, so you save money by only paying for events that are of interest to you.

Multiple Destinations – You can route the same event notification to your choice of 18 AWS services including Step Functions, Kinesis Firehose, Kinesis Data Streams, and HTTP targets via API Destinations. This is a lot easier than creating your own fan-out mechanism, and will also help you to deal with those enterprise-scale situations where independent teams want to do their own event processing.

Fast, Reliable Invocation – Patterns are matched (and targets are invoked) quickly and directly. Because S3 provides at-least-once delivery of events to EventBridge, your applications will be more reliable.

You can also take advantage of other EventBridge features, including the ability to archive and then replay events. This allows you to reprocess events in case of an error or if you add a new target to an event bus.

Getting Started
I can get started in minutes. I start by enabling EventBridge notifications on one of my S3 buckets (jbarr-public in this case). I open the S3 Console, find my bucket, open the Properties tab, scroll down to Event notifications, and click Edit:

I select On, click Save changes, and I’m ready to roll:
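
If I want to script this step instead of using the console, a minimal sketch with the AWS CLI looks like this. The empty EventBridgeConfiguration block is how I expect the setting to be expressed, so confirm it against the s3api reference:

# Turn on EventBridge delivery for all supported event types on the bucket
aws s3api put-bucket-notification-configuration \
    --bucket jbarr-public \
    --notification-configuration '{"EventBridgeConfiguration": {}}'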

Now I use the EventBridge Console to create a rule. I start, as usual, by entering a name and a description:

Then I define a pattern that matches the bucket and the events of interest:
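
A roughly equivalent setup from the command line might look like the following sketch (the rule name is mine, and the SNS topic ARN is a placeholder for the topic I use later in this walkthrough):

# Create an EventBridge rule that matches Object Created events for the bucket
aws events put-rule \
    --name s3-bucket-action \
    --event-pattern '{
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["jbarr-public"]}}
    }'

# Send matching events to the SNS topic used as the target
aws events put-targets \
    --rule s3-bucket-action \
    --targets 'Id=sns-target,Arn=arn:aws:sns:us-east-1:123456789012:BucketAction'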

One pattern can match one or more buckets and one or more events; the following events are supported:

  • Object Created
  • Object Deleted
  • Object Restore Initiated
  • Object Restore Completed
  • Object Restore Expired
  • Object Tags Added
  • Object Tags Deleted
  • Object ACL Updated
  • Object Storage Class Changed
  • Object Access Tier Changed

Then I choose the default event bus, and set the target to an SNS topic (BucketAction) which publishes the messages to my Amazon email address:

I click Create, and I am all set. To test it out, I simply upload some files to my bucket and await the messages:

The message contains all of the interesting and relevant information about the event, and (after some unquoting and formatting), looks like this:

{
    "version": "0",
    "id": "2d4eba74-fd51-3966-4bfa-b013c9da8ff1",
    "detail-type": "Object Created",
    "source": "aws.s3",
    "account": "348414629041",
    "time": "2021-11-13T00:00:59Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:s3:::jbarr-public"
    ],
    "detail": {
        "version": "0",
        "bucket": {
            "name": "jbarr-public"
        },
        "object": {
            "key": "eb_create_rule_mid_1.png",
            "size": 99797,
            "etag": "7a72374e1238761aca7778318b363232",
            "version-id": "a7diKodKIlW3mHIvhGvVphz5N_ZcL3RG",
            "sequencer": "00618F003B7286F496"
        },
        "request-id": "4Z2S00BKW2P1AQK8",
        "requester": "348414629041",
        "source-ip-address": "72.21.198.68",
        "reason": "PutObject"
    }
}

My initial event pattern was very simple, and matched only the bucket name. I can use content-based filtering to write more complex and more interesting patterns. For example, I could use numeric matching to set up a pattern that matches events for objects that are smaller than 1 megabyte:

{
    "source": [
        "aws.s3"
    ],
    "detail-type": [
        "Object Created",
        "Object Deleted",
        "Object Tags Added",
        "Object Tags Deleted"
    ],

    "detail": {
        "bucket": {
            "name": [
                "jbarr-public"
            ]
        },
        "object" : {
            "size": [{"numeric" :["<=", 1048576 ] }]
        }
    }
}

Or, I could use prefix matching to set up a pattern that looks for objects uploaded to a “subfolder” (which doesn’t really exist) of a bucket:

"object": {
  "key" : [{"prefix" : "uploads/"}]
}

You can use all of this in conjunction with all of the existing EventBridge features, including Archive/Replay. You can also access the CloudWatch metrics for each of your rules:

Available Now
This feature is available now and you can start using it today in all commercial AWS Regions. You pay $1 for every 1 million events that match a rule; check out the EventBridge Pricing page for more information.

Jeff;

New – Amazon EBS Snapshots Archive

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-amazon-ebs-snapshots-archive/

I am pleased to announce the availability of Amazon EBS Snapshots Archive, a new storage tier for the long-term retention of Amazon Elastic Block Store (EBS) snapshots of your EBS volumes.

In a nutshell, EBS is an easy-to-use high-performance block storage service for your Amazon Elastic Compute Cloud (Amazon EC2) instances. An EBS volume mounted to your EC2 instances lets you boot an operating system and store data for your most performance-demanding workloads. You may use EBS snapshots to create point-in-time copies of your volume data. The first snapshot of a volume contains all of the data written into that volume. Subsequent snapshots are incremental. Snapshots are stored on Amazon Simple Storage Service (Amazon S3), and they may be shared between AWS accounts and AWS Regions.

The ability to take frequent snapshots and easily restore volumes makes EBS snapshots an obvious choice for your data management strategy, alongside other backup options. The incremental nature of snapshots makes them cost-effective for daily and weekly backups that need immediate restores. However, you were telling us that business compliance and regulatory needs have meant that you needed to retain EBS snapshots for longer periods of time (months or years). For example, snapshots taken at the end of a project, or snapshots for test and development preserved for future project releases. The vast majority of these snapshots are taken and never read. For these snapshots, you are looking to lower your storage costs. Today, to benefit from lower storage costs, you may have written complex scripts involving temporary EC2 instances to restore snapshots, mount the corresponding volumes, and transfer the data to lower-cost storage tiers, such as Amazon Glacier.

EBS Snapshots Archive provides a low-cost storage tier to archive full, point-in-time copies of EBS Snapshots that you must retain for 90 days or more for regulatory and compliance reasons, or for future project releases. Now, you can easily archive and manage EBS Snapshots, thereby eliminating the need for custom scripts and third-party tools to manage these snapshots. This lets you move your rarely accessed snapshots to EBS Snapshots Archive to achieve up to 75% lower storage costs, and avoid licensing costs for third-party tools. Furthermore, you can retrieve an archived snapshot within 24-72 hours, and, once restored, use the snapshot to recover an EBS volume.

As per usual, let me show you how it works.

How to Get Started
I have a snapshot available in the US East (N. Virginia) Region, and I want to archive this snapshot for compliance reasons. I open the AWS Management Console, navigate to EC2, then to Snapshots. I select the snapshot I want to archive, and select the Actions menu. I select the Archive snapshot menu option.

EBS Snapshot Archive - create snapshot

I carefully read the confirmation message :-), and I select Archive snapshot.

EBS Snapshot Archive - create snapshot - confirmation

I may monitor the progress of the archive operation with the new Storage Tier tab at the bottom of the screen. After some time, depending on the size of the snapshot, the Tiering status becomes ✅ Archival completed.

EBS Snapshot Archive - create snapshot - archival completed

Archived snapshots stay visible in the console. The new Storage tier column indicates the tier used for storage (Standard or Archive).
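
If I prefer to script the archival step, a minimal sketch with the AWS CLI looks like this. The snapshot ID is a placeholder, so check the EC2 CLI reference for the exact parameters:

# Move a snapshot from the standard tier to the archive tier
aws ec2 modify-snapshot-tier \
    --snapshot-id snap-0123456789abcdef0 \
    --storage-tier archive

# Check the tiering progress for the snapshot
aws ec2 describe-snapshot-tier-status \
    --filters Name=snapshot-id,Values=snap-0123456789abcdef0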

How do I Restore a Volume?
Restoring a volume from EBS Snapshots Archive is a two-step process. First, I retrieve the snapshot from EBS Snapshots Archive to its original snapshot ID, using the RestoreSnapshotTier API call or the AWS Management Console. It takes between 24-72 hours to retrieve the snapshot from the archive, depending on the snapshot size. Once retrieved, the snapshot appears as a regular snapshot in my account. At this stage, I hydrate the retrieved snapshot into an EBS volume using the default snapshot restore or Fast Snapshot Restore (FSR) for expedited restores, just like usual.

A CloudWatch event is generated when the snapshot is restored. You may listen to this event to avoid polling for the status with the API.

A CreateVolume API call on an archived snapshot will fail. You must restore a snapshot from archive before you use it to create a volume.

Using the AWS Management Console, I select the snapshot that I want to restore, I select the Actions menu, and then I select the Restore snapshot from archive menu option.

EBS Snapshot Archive - create snapshot - restore archive

I have the choice to restore the snapshot permanently, or just temporarily. At the end of the temporary duration, the standard tier snapshot is deleted, and only the archive is preserved.

EBS Snapshot Archive - create snapshot - restore archive - confirmation

After a while, depending on the snapshot size, the archive is restored to standard storage and may be used to recreate a volume, just like usual. I may monitor the progress of the retrieval and the lifetime for temporarily restored archives in the new Storage tier tab in the bottom half of the screen. Temporary restored snapshots may be kept for up to 180 days.
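
The same retrieval can be automated. Here is a hedged CLI sketch of both restore modes; the snapshot ID is a placeholder:

# Retrieve the archived snapshot to the standard tier for 10 days, then re-archive it automatically
aws ec2 restore-snapshot-tier \
    --snapshot-id snap-0123456789abcdef0 \
    --temporary-restore-days 10

# Or make the restore permanent instead
aws ec2 restore-snapshot-tier \
    --snapshot-id snap-0123456789abcdef0 \
    --permanent-restore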

Pricing and Availability
EBS Snapshots Archive is available for you today in 17 AWS Regions. At the time of launch, it is not available in the two Regions in China, Asia Pacific (Seoul), Asia Pacific (Osaka), Canada (Central), and South America (São Paulo).

As per usual, you pay as-you-go, with no minimum or fixed fees. There are two metrics that influence EBS Snapshots Archive billing: data storage and data retrieval. We charge you $0.0125 per GB-month of stored data and $0.03 per GB retrieved. You are charged for a 90-day period at minimum. This means that if you delete a snapshot archive or permanently restore it less than 90 days after creation, then we charge for the full 90-day period. The EBS pricing page has the details.

Go ahead and start to configure long-term storage for your EBS snapshots today.

— seb

New – AWS Control Tower Account Factory for Terraform

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-aws-control-tower-account-factory-for-terraform/

AWS Control Tower makes it easier to set up and manage a secure, multi-account AWS environment. AWS Control Tower uses AWS Organizations to create what is called a landing zone, bringing ongoing account management and governance based on our experience working with thousands of customers.

If you use AWS CloudFormation to manage your infrastructure as code, you can customize your AWS Control Tower landing zone using Customizations for AWS Control Tower, a solution that helps you deploy custom templates and policies to individual accounts and organizational units (OUs) within your organization.

But what if you use Terraform to manage your AWS infrastructure?

Today, I am happy to share the availability of AWS Control Tower Account Factory for Terraform (AFT), a new Terraform module maintained by the AWS Control Tower team that allows you to provision and customize AWS accounts through Terraform using a deployment pipeline. The source code for the development pipeline can be stored in AWS CodeCommit, GitHub, GitHub Enterprise, or BitBucket. With AFT, you can automate the creation of fully functional accounts that have access to all the resources they need to be productive. The module works with Terraform open source, Terraform Enterprise, and Terraform Cloud.

Architectural diagram.

Let’s see how this works in practice.

Using AWS Control Tower Account Factory for Terraform
First, I create a main.tf file that uses the AWS Control Tower Account Factory for Terraform (AFT) module:

module "aft" {
  source = "git@github.com:aws-ia/terraform-aws-control_tower_account_factory.git"

  # Required Parameters
  ct_management_account_id    = "123412341234"
  log_archive_account_id      = "234523452345"
  audit_account_id            = "345634563456"
  aft_management_account_id   = "456745674567"
  ct_home_region              = "us-east-1"
  tf_backend_secondary_region = "us-west-2"

  # Optional Parameters
  terraform_distribution = "oss"
  vcs_provider           = "codecommit"

  # Optional Feature Flags
  aft_feature_delete_default_vpcs_enabled = false
  aft_feature_cloudtrail_data_events      = false
  aft_feature_enterprise_support          = false
}

The first six parameters are required. As a prerequisite, I need to pass the ID of four AWS accounts in my AWS organization:

  • ct_management_account_id – AWS Control Tower management account
  • log_archive_account_id – Log Archive account
  • audit_account_id – Audit account
  • aft_management_account_id – AFT management account

Then, I have to pass two AWS Regions:

  • ct_home_region – The Region from which this module will be executed. This must be the same Region where AWS Control Tower is deployed.
  • tf_backend_secondary_region – The backend primary Region is the same as the AFT Region. This parameter defines the secondary Region to replicate to. AFT creates a backend to track its own state, and that backend is also used by Terraform when using the open-source version.

The other parameters are optional and are set to their default value in the previous main.tf file:

  • terraform_distribution – To select between Terraform open source (default), Enterprise, or Cloud
  • vcs_provider – To choose the version control system to use between AWS CodeCommit (default), GitHub, GitHub Enterprise, or BitBucket.

These feature flags are disabled by default and can be omitted unless you want to enable them:

  • aft_feature_delete_default_vpcs_enabled – To automatically delete the default VPC for new accounts.
  • aft_feature_cloudtrail_data_events – To enable AWS CloudTrail data events for new accounts. Be aware that this option, usually required for compliance in highly regulated environments, can have an impact on your costs.
  • aft_feature_enterprise_support – To automatically enroll new accounts with Enterprise Support (if you have an Enterprise Support Plan).

First, I initialize the project and download the plugins:

terraform init

Then, I use AWS Single Sign-On to log in with the AWS Control Tower management account and start the deployment:

terraform apply

I confirm with a yes and, after some time, the deployment is complete.

Now, I use AWS SSO again to log in with the AFT management account. In the AWS CodeCommit console, I find four repositories that I can use to customize the accounts created with AFT.

Console screenshot.

These repositories are used by pipelines managed by AWS CodePipeline to automate the account creation:

  • aft-account-request – This is where I place requests for accounts provisioned and managed by AFT.
  • aft-global-customizations – I can use this repository to customize all provisioned accounts with customer-defined resources. The resources can be created through Terraform or through Python.
  • aft-account-customizations – Here, I can customize provisioned accounts depending on the value of the account_customizations_name parameter in the aft-account-request repository. In this way, I can create different sets of customizations depending on the role the account will be used for.
  • aft-account-provisioning-customizations – This repository uses AWS Step Functions to customize the provisioning process for new accounts and simplify the integration with additional environments. State machines can use AWS Lambda functions, Amazon Elastic Container Service (Amazon ECS) or AWS Fargate tasks, custom activities hosted either on AWS or on-premises, or Amazon Simple Notification Service (SNS) and Amazon Simple Queue Service (SQS) to communicate with external applications.

Currently, these four repositories are all empty. To start, I use the code in the sources/aft-customizations-repos folder in the GitHub repo of the AFT Terraform module.

Using the example in the aft-account-request repository, I prepare a template to create a couple of AWS accounts. One of the two accounts is for a software developer.

To help software developers be productive quickly, I create a specific account customization. In the template, I set the parameter account_customizations_name equal to developer-customization.

Then, in the aft-account-customizations repository, I create a developer-customization folder where I put a Terraform template to automatically create an AWS Cloud9 EC2-based development environment for new accounts of that type. Optionally, I can extend that with my Python code, for example, to invoke internal or external APIs. Using this approach, all new accounts for software developers will have their development environment ready as they go through the delivery pipeline.

I push the changes to the main branch (first for the aft-account-customizations repository, then for the aft-account-request). This triggers the execution of the pipeline. After a few minutes, the two new accounts are ready to be used.

You can customize accounts created by AFT based on your unique requirements. For example, you can provide each account with its own specific security setup (such as IAM roles or security groups) and storage (for example, pre-configured Amazon Simple Storage Service (Amazon S3) buckets).

Availability and Pricing
AWS Control Tower Account Factory for Terraform (AFT) works in any Region where AWS Control Tower is available. There are no additional costs when using AFT. You pay for the services used by the solution. For example, when you set up AWS Control Tower, you will begin to incur costs for AWS services configured to set up your landing zone and mandatory guardrails.

When building this solution, we worked together with HashiCorp. Armon Dadgar, HashiCorp Co-Founder and CTO, told us: “Managing cloud environments with hundreds or thousands of users can be a complex and time-consuming process. Using a software delivery pipeline integrating Terraform and AWS Control Tower makes it easier to achieve consistent governance and compliance requirements across all accounts.”

The pipeline provides an account creation process that monitors when account provisioning is complete and then triggers additional Terraform modules to enhance the account with further customizations. You can configure the pipeline to use your own custom Terraform modules or pick from pre-published Terraform modules for common products and configurations.

Simplify and standardize AWS account creation using AWS Control Tower Account Factory for Terraform.

Danilo

New – Recycle Bin for EBS Snapshots

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-recycle-bin-for-ebs-snapshots/

It is easy to create EBS Snapshots, and just as easy to either delete them manually or to use the Data Lifecycle Manager to delete them automatically in accord with your organization’s retention model. Sometimes, as it turns out, it is a bit too easy to delete snapshots, and a well-intended cleanup effort or a wayward script can sometimes go a bit overboard!

New Recycle Bin
In order to give you more control over the deletion process, we are launching a Recycle Bin for EBS Snapshots. As you will see in a moment, you can now set up rules to retain deleted snapshots so that you can recover them after an accidental deletion. You can think of this as a two-level model, where individual AWS users are responsible for the initial deletion, and then a designated “Recycle Bin Administrator” (as specified by an IAM role) manages retention and recovery.

Rules can apply to all snapshots, or to snapshots that include a specified set of tag/value pairs. Each rule specifies a retention period (between one day and one year), after which the snapshot is permanently deleted.

Let’s Recycle!
I open the Recycle Bin Console, select the region of interest, and click Create retention rule to begin:

I call my first rule KeepAll, and set it to retain all deleted EBS snapshots for 4 days:

I add a tag (User) to the rule, and click Create retention rule:

Because Apply to all resources is checked, this is a general rule that applies when there are no applicable rules that specify one or more tags.

Then I create a second rule (KeepDev) that retains snapshots tagged with a Mode of Dev for just one day:

If two different tag-based rules match the same resource, then the one with the longer retention period applies.

Here are my retention rules:

Here are my EBS snapshots. As you can see, the first three are tagged with a Mode of Dev:

In an effort to save several cents per month, I impulsively delete them all:

And they are gone:

Later in the day, a member of my developer team messages me in a panic and lets me know that they desperately need the latest snapshot of the development server’s code. I open the Recycle Bin and I locate the snapshot (DevServer_2021_10_6):

I select the snapshot and click Recover:

Then I confirm my intent:

And the snapshot is available once again:

As has always been the case, Fast Snapshot Restore is disabled when a snapshot is deleted. With this launch, it will remain disabled when a snapshot is restored.

All of this functionality (creating rules, listing resources in the Recycle Bin, and restoring them) is also available from the CLI and via the Recycle Bin APIs.
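
As a quick sketch, the two retention rules above could also be created with the Recycle Bin CLI. The parameter shapes below are what I expect the rbin commands to accept, so double-check them against the CLI reference:

# A general rule that keeps all deleted EBS snapshots for 4 days
aws rbin create-rule \
    --resource-type EBS_SNAPSHOT \
    --retention-period RetentionPeriodValue=4,RetentionPeriodUnit=DAYS \
    --description "KeepAll"

# A tag-scoped rule that keeps snapshots tagged Mode=Dev for 1 day
aws rbin create-rule \
    --resource-type EBS_SNAPSHOT \
    --retention-period RetentionPeriodValue=1,RetentionPeriodUnit=DAYS \
    --resource-tags ResourceTagKey=Mode,ResourceTagValue=Dev \
    --description "KeepDev"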

Things to Know
Here are a couple of things to know about the new Recycle Bin:

IAM Support – As I mentioned earlier, you can use AWS Identity and Access Management (IAM) to grant access to this feature, and should consider creating an empowered user known as the Recycle Bin Administrator.

Rule Changes – You can make changes to your retention rules at any time, but be aware that the rules are evaluated (and the retention period is set) when you delete a snapshot. Changing a rule after an item has been deleted will not alter the retention period for the item.

Pricing – Resources that are in the Recycle Bin are charged the usual price, but be aware that creating rules with long retention periods could increase your AWS bill. On a related note, be sure that keeping deleted snapshots around does not violate your organization’s data retention policies. There is no charge for deleting or recovering a resource.

In the Bin – Resources in the Recycle Bin are immutable. If a resource is recovered, all of its existing metadata (tags and so forth) is also recovered intact.

Recycling – We will do our best to recycle all of the zeroes and all of the ones when a resource in your Recycle Bin reaches the end of its retention period!

Jeff;

Introducing Karpenter – An Open-Source High-Performance Kubernetes Cluster Autoscaler

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/introducing-karpenter-an-open-source-high-performance-kubernetes-cluster-autoscaler/

Today we are announcing that Karpenter is ready for production. Karpenter is an open-source, flexible, high-performance Kubernetes cluster autoscaler built with AWS. It helps improve your application availability and cluster efficiency by rapidly launching right-sized compute resources in response to changing application load. Karpenter also provides just-in-time compute resources to meet your application’s needs and will soon automatically optimize a cluster’s compute resource footprint to reduce costs and improve performance.

Before Karpenter, Kubernetes users needed to dynamically adjust the compute capacity of their clusters to support applications using Amazon EC2 Auto Scaling groups and the Kubernetes Cluster Autoscaler. Nearly half of Kubernetes customers on AWS report that configuring cluster auto scaling using the Kubernetes Cluster Autoscaler is challenging and restrictive.

When Karpenter is installed in your cluster, Karpenter observes the aggregate resource requests of unscheduled pods and makes decisions to launch new nodes and terminate them to reduce scheduling latencies and infrastructure costs. Karpenter does this by observing events within the Kubernetes cluster and then sending commands to the underlying cloud provider’s compute service, such as Amazon EC2.

Karpenter is an open-source project licensed under the Apache License 2.0. It is designed to work with any Kubernetes cluster running in any environment, including all major cloud providers and on-premises environments. We welcome contributions to build additional cloud providers or to improve core project functionality. If you find a bug, have a suggestion, or have something to contribute, please engage with us on GitHub.

Getting Started with Karpenter on AWS
To get started with Karpenter in any Kubernetes cluster, ensure there is some compute capacity available, and install it using the Helm charts provided in the public repository. Karpenter also requires permissions to provision compute resources on the provider of your choice.

Once installed in your cluster, the default Karpenter provisioner observes incoming Kubernetes pods that cannot be scheduled due to insufficient compute resources in the cluster, and automatically launches new resources to meet their scheduling and resource requirements.

I want to show a quick start using Karpenter in an Amazon EKS cluster based on Getting Started with Karpenter on AWS. It requires the installation of AWS Command Line Interface (AWS CLI), kubectl, eksctl, and Helm (the package manager for Kubernetes). After setting up these tools, create a cluster with eksctl. This example configuration file specifies a basic cluster with one initial node.

cat <<EOF > cluster.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eks-karpenter-demo
  region: us-east-1
  version: "1.20"
managedNodeGroups:
  - instanceType: m5.large
    amiFamily: AmazonLinux2
    name: eks-karpenter-demo-ng
    desiredCapacity: 1
    minSize: 1
    maxSize: 5
EOF
$ eksctl create cluster -f cluster.yaml

Karpenter itself can run anywhere, including on self-managed node groups, managed node groups, or AWS Fargate. Karpenter will provision EC2 instances in your account.

Next, you need to create necessary AWS Identity and Access Management (IAM) resources using the AWS CloudFormation template and IAM Roles for Service Accounts (IRSA) for the Karpenter controller to get permissions like launching instances following the documentation. You also need to install the Helm chart to deploy Karpenter to your cluster.

$ helm repo add karpenter https://charts.karpenter.sh
$ helm repo update
$ helm upgrade --install --skip-crds karpenter karpenter/karpenter --namespace karpenter \
  --create-namespace --set serviceAccount.create=false --version 0.5.0 \
  --set controller.clusterName=eks-karpenter-demo \
  --set controller.clusterEndpoint=$(aws eks describe-cluster --name eks-karpenter-demo --query "cluster.endpoint" --output json) \
  --wait # for the defaulting webhook to install before creating a Provisioner

Karpenter provisioners are a Kubernetes resource that enables you to configure the behavior of Karpenter in your cluster. When you create a default provisioner, without further customization besides what is needed for Karpenter to provision compute resources in your cluster, Karpenter automatically discovers node properties such as instance types, zones, architectures, operating systems, and purchase types of instances. You don’t need to define these spec:requirements if there is no explicit business requirement.

cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Requirements that constrain the parameters of provisioned nodes.
  # Operators { In, NotIn } are supported to enable including or excluding values.
  requirements:
    - key: node.k8s.aws/instance-type # If not included, all instance types are considered
      operator: In
      values: ["m5.large", "m5.2xlarge"]
    - key: "topology.kubernetes.io/zone" # If not included, all zones are considered
      operator: In
      values: ["us-east-1a", "us-east-1b"]
    - key: "kubernetes.io/arch" # If not included, all architectures are considered
      operator: In
      values: ["arm64", "amd64"]
    - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
      operator: In
      values: ["spot", "on-demand"]
  provider:
    instanceProfile: KarpenterNodeInstanceProfile-eks-karpenter-demo
  ttlSecondsAfterEmpty: 30
EOF

The ttlSecondsAfterEmpty value configures Karpenter to terminate empty nodes. If this value is disabled, nodes will never scale down due to low utilization. To learn more, see Provisioner custom resource definitions (CRDs) on the Karpenter site.
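
You can check that the provisioner was created with standard kubectl commands against the new custom resource:

$ kubectl get provisioners
$ kubectl describe provisioner default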

Karpenter is now active and ready to begin provisioning nodes in your cluster. Create some pods using a deployment, and watch Karpenter provision nodes in response.

$ kubectl create deployment inflate \
          --image=public.ecr.aws/eks-distro/kubernetes/pause:3.2

Let’s scale the deployment and check out the logs of the Karpenter controller.

$ kubectl scale deployment inflate --replicas 10
$ kubectl logs -f -n karpenter $(kubectl get pods -n karpenter -l karpenter=controller -o name)
2021-11-23T04:46:11.280Z        INFO    controller.allocation.provisioner/default       Starting provisioning loop      {"commit": "abc12345"}
2021-11-23T04:46:11.280Z        INFO    controller.allocation.provisioner/default       Waiting to batch additional pods        {"commit": "abc12345"}
2021-11-23T04:46:12.452Z        INFO    controller.allocation.provisioner/default       Found 9 provisionable pods      {"commit": "abc12345"}
2021-11-23T04:46:13.689Z        INFO    controller.allocation.provisioner/default       Computed packing for 10 pod(s) with instance type option(s) [m5.large]  {"commit": "abc12345"}
2021-11-23T04:46:16.228Z        INFO    controller.allocation.provisioner/default       Launched instance: i-01234abcdef, type: m5.large, zone: us-east-1a, hostname: ip-192-168-0-0.ec2.internal    {"commit": "abc12345"}
2021-11-23T04:46:16.265Z        INFO    controller.allocation.provisioner/default       Bound 9 pod(s) to node ip-192-168-0-0.ec2.internal  {"commit": "abc12345"}
2021-11-23T04:46:16.265Z        INFO    controller.allocation.provisioner/default       Watching for pod events {"commit": "abc12345"}

The provisioner’s controller listens for pod events; in this case it launched a new instance and bound the provisionable pods to the new node.

Now, delete the deployment. After 30 seconds (ttlSecondsAfterEmpty = 30), Karpenter should terminate the empty nodes.

$ kubectl delete deployment inflate
$ kubectl logs -f -n karpenter $(kubectl get pods -n karpenter -l karpenter=controller -o name)
2021-11-23T04:46:18.953Z        INFO    controller.allocation.provisioner/default       Watching for pod events {"commit": "abc12345"}
2021-11-23T04:49:05.805Z        INFO    controller.Node Added TTL to empty node ip-192-168-0-0.ec2.internal {"commit": "abc12345"}
2021-11-23T04:49:35.823Z        INFO    controller.Node Triggering termination after 30s for empty node ip-192-168-0-0.ec2.internal {"commit": "abc12345"}
2021-11-23T04:49:35.849Z        INFO    controller.Termination  Cordoned node ip-192-168-0-0.ec2.internal   {"commit": "abc12345"}
2021-11-23T04:49:36.521Z        INFO    controller.Termination  Deleted node ip-192-168-0-0.ec2.internal    {"commit": "abc12345"}

If you delete a node with kubectl, Karpenter will gracefully cordon, drain, and shut down the corresponding instance. Under the hood, Karpenter adds a finalizer to the node object, which blocks deletion until all pods are drained and the instance is terminated.
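
As a quick illustration, you can view the finalizer on a Karpenter-provisioned node and trigger a clean termination yourself; the node name below is the example one from the logs above:

# Show the finalizer that Karpenter added to the node object
$ kubectl get node ip-192-168-0-0.ec2.internal -o jsonpath='{.metadata.finalizers}'
# Deleting the node cordons and drains it, then terminates the backing EC2 instance
$ kubectl delete node ip-192-168-0-0.ec2.internal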

Things to Know
Here are a couple of things to keep in mind about Karpenter features:

Accelerated Computing: Karpenter works with all kinds of Kubernetes applications, but it performs particularly well for use cases that require rapidly provisioning and deprovisioning large numbers of diverse compute resources. Examples include batch jobs that train machine learning models, run simulations, or perform complex financial calculations. You can use the nvidia.com/gpu, amd.com/gpu, and aws.amazon.com/neuron custom resources for use cases that require accelerated EC2 instances.
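
For example, a pod that requests a GPU can be declared as follows; this is a minimal sketch, and Karpenter can only satisfy it if the provisioner allows GPU-capable instance types and the NVIDIA device plugin advertises the resource on the node:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  containers:
    - name: app
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
      resources:
        limits:
          nvidia.com/gpu: 1 # extended resource advertised by the NVIDIA device plugin
EOF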

Provisioners Compatibility: Karpenter provisioners are designed to work alongside static capacity management solutions like Amazon EKS managed node groups and EC2 Auto Scaling groups. You may choose to manage the entirety of your capacity using provisioners, a mixed model with both dynamically and statically managed capacity, or a fully static approach. We recommend not running Kubernetes Cluster Autoscaler at the same time as Karpenter because both systems scale up nodes in response to unschedulable pods. If configured together, both systems will race to launch or terminate instances for these pods.

Join our Karpenter Community
Karpenter’s community is open to everyone. Give it a try, and join our working group meeting, or follow our roadmap for future releases that interest you. As I said, we welcome any contributions such as bug reports, new features, corrections, or additional documentation.

To learn more about Karpenter, see the documentation and demo video from AWS Container Day.

Channy

New – AWS Marketplace for Containers Anywhere to Deploy Your Kubernetes Cluster in Any Environment

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-aws-marketplace-for-containers-anywhere-to-deploy-your-kubernetes-cluster-in-any-environment/

More than 300,000 customers use AWS Marketplace today to find, subscribe to, and deploy third-party software packaged as Amazon Machine Images (AMIs), software-as-a-service (SaaS), and containers. Customers can find and subscribe to containerized third-party applications in AWS Marketplace and deploy them on Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS).

Many customers that run Kubernetes applications on AWS also want to deploy them on premises because of constraints such as latency and data governance requirements. And once they have deployed a Kubernetes application, they need additional tools to govern it through license tracking, billing, and upgrades.

Today, we announce AWS Marketplace for Containers Anywhere, a set of capabilities that allows AWS customers to find, subscribe to, and deploy third-party Kubernetes applications from AWS Marketplace on any Kubernetes cluster in any environment. This capability makes the AWS Marketplace more useful for customers who run containerized workloads.

With this launch, you can deploy third-party Kubernetes applications to on-premises environments using Amazon EKS Anywhere, or to any self-managed Kubernetes cluster on premises or in Amazon Elastic Compute Cloud (Amazon EC2). This gives you a single catalog for finding container images regardless of where you eventually plan to deploy them.

With AWS Marketplace for Containers Anywhere, you get the same benefits as with any other product in AWS Marketplace, including consolidated billing, flexible payment options, and lower pricing for long-term contracts. You can find vetted, security-scanned, third-party Kubernetes applications, manage upgrades with a few clicks, and track all licenses and bills. You can move applications between environments without purchasing duplicate licenses. After you have subscribed to an application using this feature, you can migrate it to AWS by deploying the Helm charts provided by the independent software vendor (ISV) onto your Kubernetes clusters on AWS without changing your licenses.

Getting Started with AWS Marketplace for Containers Anywhere
You can get started by visiting AWS Marketplace. Filter the catalog on the Helm Chart delivery method to find Kubernetes-based applications that you can deploy on AWS and on premises.

When you have chosen a product to subscribe to, select Continue to Subscribe.

Once you accept the seller’s end user license agreement (EULA), select Create Contract and Continue to Configuration.

You can configure the software deployment using the dropdowns. Once the Fulfillment option and Software Version are selected, choose Continue to Launch.

To deploy on Amazon EKS, you have the option to deploy the application on a new EKS cluster or to copy and paste commands into an existing cluster. You can also deploy into self-managed Kubernetes on EC2 by choosing the self-managed Kubernetes option in the supported services.

To deploy on premises or in EC2, you can select EKS Anywhere and then take an additional step to request a license token on the AWS Marketplace launch page. You will then use commands provided by AWS Marketplace to download the container images and Helm charts from the AWS Marketplace Elastic Container Registry (ECR), create the service account, and apply the token for IAM Roles for Service Accounts on your cluster.
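
The exact commands are generated for each product on the launch page, but the general shape is an OCI registry login followed by a chart pull; every value in angle brackets below is a placeholder that AWS Marketplace fills in for you:

$ export HELM_EXPERIMENTAL_OCI=1 # needed for OCI registry support in Helm versions before 3.8
$ aws ecr get-login-password --region us-east-1 | \
  helm registry login --username AWS --password-stdin <marketplace-ecr-registry>
$ helm pull oci://<marketplace-ecr-registry>/<seller>/<chart> --version <chart-version>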

To upgrade or renew your existing software licenses, you can go to the AWS Marketplace website for a self-service upgrade or renewal experience. You can also negotiate a private offer directly with ISVs to upgrade and renew the application. After you subscribe to the new offer, the license is automatically updated in AWS License Manager. You can view all the licenses you have purchased from AWS Marketplace using AWS License Manager, including the application capabilities you’re entitled to and the expiration date.

Launch Partners of AWS Marketplace for Containers Anywhere
Here is the list of launch partners that support an on-premises deployment option. Try them out today!

  • D2iQ delivers the leading independent platform for enterprise-grade Kubernetes implementations at scale and across environments, including cloud, hybrid, edge, and air-gapped.
  • HAProxy Technologies offers widely used software load balancers to deliver websites and applications with the utmost performance, observability, and security at any scale and in any environment.
  • Isovalent builds open-source software and enterprise solutions such as Cilium and eBPF solving networking, security, and observability needs for modern cloud-native infrastructure.
  • JFrog‘s “liquid software” mission is to power the world’s software updates through the seamless, secure flow of binaries from developers to the edge.
  • Kasten by Veeam provides Kasten K10, a data management platform purpose-built for Kubernetes, an easy-to-use, scalable, and secure system for backup and recovery, disaster recovery, and application mobility.
  • Nirmata, the creator of Kyverno, provides open source and enterprise solutions for policy-based security and automation of production Kubernetes workloads and clusters.
  • Palo Alto Networks, the global cybersecurity leader, is shaping the cloud-centric future with technology that is transforming the way people and organizations operate.
  • Prosimo‘s SaaS combines cloud networking, performance, security, AI powered observability and cost management to reduce enterprise cloud deployment complexity and risk.
  • Solodev is an enterprise CMS and digital ecosystem for building custom cloud apps, from content to crypto. Get access to DevOps, training, and 24/7 support—powered by AWS.
  • Trilio, a leader in cloud-native data protection for Kubernetes, OpenStack, and Red Hat Virtualization environments, offers solutions for backup and recovery, migration, and application mobility.

If you are interested in offering your Kubernetes application on AWS Marketplace, register and modify your product to integrate with AWS License Manager APIs using the provided AWS SDK. Integrating with AWS License Manager will allow the application to check licenses procured through AWS Marketplace.
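
As a rough, hypothetical sketch, a license check from inside your application maps to the License Manager CheckoutLicense operation; the SKU, fingerprint, and entitlement name below are made-up placeholder values, not a specific product's integration:

# Hypothetical license check (all values are placeholders)
$ aws license-manager checkout-license \
  --product-sku "example-product-sku" \
  --checkout-type PROVISIONAL \
  --key-fingerprint "example-issuer-fingerprint" \
  --entitlements "Name=ExampleEntitlement,Unit=Count,Value=1" \
  --client-token "$(uuidgen)"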

Next, you would create a new container product on AWS Marketplace with a contract offer by submitting details of the listing, including the product information, license options, and pricing. The details would be reviewed, approved, and published by AWS Marketplace Technical Account Managers. You would then submit the new container image to AWS Marketplace ECR and add it to a newly created container product through the self-service Marketplace Management Portal. All container images are scanned for Common Vulnerabilities and Exposures (CVEs).

Finally, the product listing and container images would be published and accessible by customers on AWS Marketplace’s customer website. To learn more details about creating container products on AWS Marketplace, visit Getting started as a seller and Container-based products in the AWS documentation.

Available Now
AWS Marketplace for Containers Anywhere is available now in all Regions that support AWS Marketplace. You can start using it directly from the products of our launch partners.

Give it a try, and please send us feedback either in the AWS forum for AWS Marketplace or through your usual AWS support contacts.

Channy

Announcing AWS Data Exchange for APIs: Find, Subscribe to, and Use Third-party APIs with Consistent Authentication

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/data-exchange-for-apis-find-subscribe-use-third-party-apis-consistent-authentication/

Data is at the center of many processes and products, whether it’s a large-scale dataset used to train machine learning models, a relational database, or an API-based integration. AWS Data Exchange lets you discover, subscribe to, and use hundreds of file-based datasets via Amazon Simple Storage Service (Amazon S3) offered by third parties such as Reuters, Foursquare, Change Healthcare, Vortexa, IMDb, and many more. Additionally, AWS Data Exchange for Amazon Redshift makes it even easier to ingest third-party data in your Amazon Redshift data warehouse, without any manual processing or transformation.

However, in many cases your data projects require more than static datasets because you need frequent and synchronous retrieval of small amounts of information – for example, you might need to fetch a stock price every hour. Data APIs let you answer specific questions quickly and without having to build ad-hoc data pipelines to ingest, process, and analyze bulk datasets. But each API provider has its own ease of use, SDK, documentation, and authentication mechanisms, which makes this harder than it needs to be.

Today, I’m happy to announce the general availability of AWS Data Exchange for APIs, a new capability that lets you find, subscribe to, and use third-party APIs with consistent access via AWS SDKs, as well as consistent AWS-native authentication and governance. This simplifies life for developers and IT administrators who have to integrate and secure access to multiple third-party APIs.

Now you can make RESTful or GraphQL API calls directly to AWS Data Exchange and receive synchronous responses that contain the information you need, using the AWS SDK in the programming language of your choice. We take care of integrating with the API provider, implementing proper authentication, managing the API subscription, and ensuring charges appear on your AWS bill. You can manage API access centrally with AWS Identity and Access Management (IAM).

As a data provider, you make your API discoverable by millions of AWS customers by listing it in the AWS Data Exchange catalog using an OpenAPI specification and fronting it with an Amazon API Gateway endpoint.

AWS Data Exchange for APIs in Action
First, I look for an API product in the AWS Data Exchange catalog, review its subscription terms, support information, and auto-renewal. Each API product might include multiple public or private subscription offers and periods.

I select Subscribe and a couple of minutes later I’m successfully subscribed.

Within the API product, I select an entitled data set and its latest revision.

Each API revision contains one or more API assets that correspond to a specific API endpoint and a unique Asset ARN.

AWS Data Exchange takes care of invoking API endpoints with the correct authentication.

All I need to do is check the Integration notes, which include instructions and code snippets based on the AWS Command Line Interface (CLI).
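
For reference, the CLI call has roughly this shape; it maps to the SendApiAsset operation, the IDs are the ones from my subscription, and the path and query string depend entirely on the provider's API:

$ aws dataexchange send-api-asset \
  --data-set-id 4b3fbabc31171662851531b8576a3411 \
  --revision-id e8e78e921af12c76499edc40f92e3082 \
  --asset-id 557d858c317efdfb5b6c9a2860ec4a03 \
  --method GET \
  --path "/"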

Of course, I could implement the very same API call with my favorite programming language using one of the AWS SDKs.

For example, here’s how I’d implement a simple wrapper function in Python:

import json
import boto3

# boto3 signs the request with my IAM credentials; AWS Data Exchange then
# forwards it to the provider's API with the correct provider-side authentication
adx = boto3.client('dataexchange')

def get_api_response(path, method="GET", querystring=None, headers=None, body=None):
    # DataSetId, RevisionId, and AssetId identify the entitled API asset
    return adx.send_api_asset(
        DataSetId="4b3fbabc31171662851531b8576a3411",
        RevisionId="e8e78e921af12c76499edc40f92e3082",
        AssetId="557d858c317efdfb5b6c9a2860ec4a03",
        Method=method,
        Path=path,
        QueryStringParameters=querystring or {},
        RequestHeaders=headers or {},
        Body=json.dumps(body or {}),
    )

Please note that there are no hard-coded credentials in the code above because all the authorization happens via AWS Identity and Access Management (IAM).
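
For example, a minimal identity-based policy that allows calling subscribed APIs could look like the following sketch; in practice you would scope Resource down to specific data set ARNs instead of using a wildcard:

$ cat <<'EOF' > adx-send-api-asset.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "dataexchange:SendApiAsset",
      "Resource": "*"
    }
  ]
}
EOF
$ aws iam put-user-policy --user-name example-user \
  --policy-name adx-send-api-asset \
  --policy-document file://adx-send-api-asset.json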

And that’s how you make your first API call via AWS Data Exchange for APIs.

Available Today
AWS Data Exchange for APIs is generally available in all AWS Regions where AWS Data Exchange is available. We’re looking forward to helping you simplify and centralize the management and governance of third-party APIs while we take care of the undifferentiated heavy lifting for you.

Today you can start integrating third-party APIs such as Infutor, Variety Business Intelligence, IMDb, PeopleDataLabs, Neustar, Experian, Foursquare, PredictHQ, WeatherTrends International, and many more.

If you’re a developer, check out the new AWS Data Exchange for APIs documentation to learn more about subscribing and using APIs. If you’re an API provider, check out the new publishing documentation to learn more about publishing new APIs on the AWS Data Exchange catalog.

Alex

AWS Security Profiles: Jenny Brinkley, Director, AWS Security

Post Syndicated from Maddie Bacon original https://aws.amazon.com/blogs/security/aws-security-profiles-jenny-brinkley-director-aws-security/

In the week leading up to AWS re:Invent 2021, we’ll share conversations we’ve had with people at AWS who will be presenting, and get a sneak peek at their work.


How long have you been at AWS, and what do you do in your current role?

I’ve been at AWS for 5½ years. I get to focus on the future of security and compliance. It gives me a lot of space to experiment and try new things, which is how I like to operate.

How did you get started in AWS Security?

I joined AWS through a startup acquisition, and I actually didn’t think I was going to go with the acquisition. I thought AWS would be way too big and move way too slow. I love being in environments where I get to move fast and be entrepreneurial. I started on the product side. I was able to learn what it takes to build and ship products at the scale of AWS – which is on another level and mind-blowing.

Then, like others at AWS, I was able to reinvent myself, find different passions, and experiment with new things. One of those areas for me was compliance. I started to get perspective on how that space was being defined by regulatory activity for the cloud, and it started opening my mind in different ways.

I started thinking, how do you make compliance easier for customers? How do you work with regulated entities to understand how to audit, and to understand the function of how the cloud operates? From there, my career has been about changing how to think about product, about how to make security easier. Layering in this compliance aspect, too, means I get to play in all these different worlds, work with internal and external customers, and work to simplify security, while also understanding where and how compliance fits in, without slowing down innovation.

How do you explain your job to non-tech friends?

I explain my work as removing the fear around security. You go see images of people in hoodies, with darkened faces, and binary code running behind them, and my job is to break that perception and walk in the light – yes, that’s my nod to Olivia Pope in Scandal. I love the idea of that gladiator mentality. You’re going in and solving the big problems, but you’re also creating more visibility and transparency around how security operates. And you’re doing this without making anyone afraid that they’re being watched or monitored, and without holding back innovation. My job is to provide that transparency and clarity, and give people prescriptive guidance on how to operate securely on AWS.

What are you currently working on that you’re excited about?

So much! That’s what I really love about my job – I get to play in a lot of spaces, and the context switching is something that really fuels me. One of the top projects I’m working on is something we just released in response to an ask from the White House, which I feel really privileged to work on. We released a new Cybersecurity Awareness training which is now available to everyone in the world. You can access this training right now, and you can share it with your grandparents or implement it in your corporation or small business. We were able to take a training product we built for all Amazon employees–and then externalize it. The size and the scope is something I’m really excited about. Making security easier for everybody is a big mission for us.

Another big area is up-skilling. You hear a lot about security jobs being the future, so we’re building everything from apprenticeships to new learning paths for anyone interested in security. We’re thinking about how we can build quick learning modules for people to listen to on the go. That’s something I get really excited about in this job – creating opportunities for people to understand that security jobs and opportunities are vast. If you’re curious and want to learn new things, AWS is endless.

You’re presenting at re:Invent this year – can you give readers a sneak peek at what you’re covering?

I am partnering with Eric Brandwine, AWS VP/Distinguished Engineer, for a session called Introverts and extroverts collide: Build an inclusive workforce (SEC204). Eric and I are night and day in terms of how we work. In our talk, we’ll touch on some of the challenges we had when we first started working together, and how we found value in our different approaches.

We’ll be discussing how he solves problems with technology and how I solve problems regarding people, and thinking about how that empathetic layer resonates between the two perspectives. Not every problem needs technology, and not every problem needs a people-focused solution. But, humans are behind any of those aspects of impact.

We’ll give prescriptive guidance to customers on how they should think about their security culture as it relates to people and as it relates to technology. We’ll talk about how those two worlds can blend together in a way that empowers an entire organization to prioritize security, and that they shouldn’t be afraid of it. We want to help bridge the gaps between the technologists and the empathetic individuals who think about how the technology lands in use cases across a business.

From your perspective, what’s the most important thing leaders can do to create an inclusive work environment?

Listening. Sitting back, getting the feedback, being vulnerable, asking the questions. So much of what we need to do now is practice that listening skill, really understand the motivations of our teams, and then try to create these safe working environments where people feel comfortable sharing their perspectives. It’s not that you’re going to act on everything everyone’s talking about, but at least you get diverse perspectives and points of view to help create an inclusive work environment that makes everyone want to show up, support each other, and do the best work possible.

What’s your favorite Leadership Principle at Amazon and why?

I have two. One is Learn and Be Curious because that is how I like to operate. I think, “what if…” or “why can’t we…”. Then Think Big pairs with “why can’t we…” The culture within AWS really supports that. On a daily basis, we can flip the script on how we think about our jobs and how we position the business.

If you’re entrepreneurial and like to create, this place is like a magic playground. Some people look at my job and they’re so confused with all the different things I get to do – but it goes back to that context switching. I believe that Learn and Be Curious and Think Big fit in that realm for me–I feel like I can be anything, I can do anything. I also had parents who told me as a kid that I could do anything and be anything, so I think that’s just who I am. Those two leadership principles help me to produce and do my best work.

What’s the thing you’re most proud of in your career?

That’s hard. It’s a couple of things. I’ve had a lot of incredible opportunities. One of which was being involved in a startup. We raised the money quickly, we worked with incredible customers, we solved really challenging business issues. The fact that I was able to bring that here to AWS, in a way that now hundreds of thousands of people get to see the kind of work we’re able to produce, is pretty cool.

But honestly, working with some of our new hires who are just getting into the workforce–especially with our diverse candidates–I’m at a place in my career where I want to create opportunities for others. I’m working to create safe spaces for people to operate and do their best work and really break down barriers for people who might not otherwise get those opportunities. That’s what I’m most excited about for the future, and also the most proud about–giving people opportunities to work in careers they never thought were available to them. I love that, and I get to do it daily.

If you had to pick any other job, what would you want to do?

Sports agent. I think I’d be so good at it. I would love to go work with young athletes, especially with the new NCAA ruling that college athletes can get paid for the use of their likeness. I would love to help them develop really interesting business plans.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.


Jenny Brinkley

Jenny leads efforts for AWS Security to understand where compliance and security is headed. In her role as a Director, she helps teams understand how to consider security when building their services and deliverables. Prior to joining AWS, Jenny co-founded a security start-up, harvest.ai, that was acquired by AWS in April 2016.

Author

Maddie Bacon

Maddie (she/her) is a technical writer for AWS Security with a passion for creating meaningful content. She previously worked as a security reporter and editor at TechTarget and has a BA in Mathematics. In her spare time, she enjoys reading, traveling, and all things Harry Potter.