All posts by Danilo Poccia

Introducing Amazon Neptune Serverless – A Fully Managed Graph Database that Adjusts Capacity for Your Workloads

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-neptune-serverless-a-fully-managed-graph-database-that-adjusts-capacity-for-your-workloads/

Amazon Neptune is a fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. With Neptune, you can use open and popular graph query languages to execute powerful queries that are easy to write and perform well on connected data. You can use Neptune for graph use cases such as recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security.

Neptune has always been fully managed and handles time-consuming tasks such as provisioning, patching, backup, recovery, failure detection, and repair. However, managing database capacity for optimal cost and performance requires you to monitor and reconfigure capacity as workload characteristics change. Also, many applications have variable or unpredictable workloads where the volume and complexity of database queries can change significantly. For example, a knowledge graph application for social media may see a sudden spike in queries if it suddenly becomes popular.

Introducing Amazon Neptune Serverless
Today, we’re making that easier with the launch of Amazon Neptune Serverless. Neptune Serverless scales automatically as your queries and your workloads change, adjusting capacity in fine-grained increments to provide just the right amount of database resources that your application needs. In this way, you pay only for the capacity you use. You can use Neptune Serverless for development, test, and production workloads and optimize your database costs compared to provisioning for peak capacity.

With Neptune Serverless you can quickly and cost-effectively deploy graphs for your modern applications. You can start with a small graph, and as your workload grows, Neptune Serverless will automatically and seamlessly scale your graph databases to provide the performance you need. You no longer need to manage database capacity and you can now run graph applications without the risk of higher costs from over-provisioning or insufficient capacity from under-provisioning.

With Neptune Serverless, you can continue to use the same query languages (Apache TinkerPop Gremlin, openCypher, and RDF/SPARQL) and features (such as snapshots, streams, high availability, and database cloning) already available in Neptune.

Let’s see how this works in practice.

Creating an Amazon Neptune Serverless Database
In the Neptune console, I choose Databases in the navigation pane and then Create database. For Engine type, I select Serverless and enter my-database as the DB cluster identifier.

Next, I configure the range of capacity, expressed in Neptune capacity units (NCUs), that Neptune Serverless can use based on my workload. I can then choose a template that configures some of the next options for me. I choose the Production template, which by default creates a read replica in a different Availability Zone. The Development and Testing template would optimize my costs by not creating a read replica and by giving access to DB instances that provide burstable capacity.

For Connectivity, I use my default VPC and its default security group.

Finally, I choose Create database. After a few minutes, the database is ready to use. In the list of databases, I choose the DB identifier to get the Writer and Reader endpoints that I am going to use later to access the database.
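If you prefer to script this instead of using the console, the following is a minimal sketch using boto3. It assumes the Neptune `create_db_cluster` call with a `ServerlessV2ScalingConfiguration` range expressed in NCUs and the `db.serverless` instance class; the NCU values you pass and the `-writer` instance name are hypothetical, and depending on your account defaults you may also need to pass networking settings or an engine version that supports serverless.

```python
# Sketch: create a Neptune Serverless cluster and a writer instance with boto3.
# The NCU range passed in is illustrative; pick values that fit your workload.


def serverless_cluster_kwargs(cluster_id, min_ncu, max_ncu):
    """Arguments for neptune.create_db_cluster with a serverless NCU range."""
    if not 0 < min_ncu <= max_ncu:
        raise ValueError("expected 0 < min_ncu <= max_ncu")
    return {
        "DBClusterIdentifier": cluster_id,
        "Engine": "neptune",
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": min_ncu,
            "MaxCapacity": max_ncu,
        },
    }


def serverless_instance_kwargs(cluster_id, instance_id):
    """Arguments for neptune.create_db_instance; db.serverless selects serverless capacity."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": "db.serverless",
        "Engine": "neptune",
        "DBClusterIdentifier": cluster_id,
    }


def create_serverless_database(cluster_id, min_ncu, max_ncu):
    """Create the cluster, then one writer instance (hypothetical '-writer' suffix)."""
    import boto3  # imported lazily so the helpers above work without boto3

    neptune = boto3.client("neptune")
    neptune.create_db_cluster(**serverless_cluster_kwargs(cluster_id, min_ncu, max_ncu))
    neptune.create_db_instance(
        **serverless_instance_kwargs(cluster_id, f"{cluster_id}-writer")
    )
```

Adding a second `db.serverless` instance to the same cluster would give you a read replica similar to the one the Production template creates.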

Using Amazon Neptune Serverless
There is no difference in the way you use Neptune Serverless compared to a provisioned Neptune database. I can use any of the query languages supported by Neptune. For this walkthrough, I choose to use openCypher, a declarative query language for property graphs originally developed by Neo4j that was open-sourced in 2015 and contributed to the openCypher project.

To connect to the database, I start an Amazon Elastic Compute Cloud (Amazon EC2) instance running Amazon Linux in the same AWS Region and associate with it the default security group and a second security group that gives me SSH access.

With a property graph I can represent connected data. In this case, I want to create a simple graph that shows how some AWS services are part of a service category and implement common enterprise integration patterns.

I use curl to access the Writer openCypher HTTPS endpoint and create a few nodes that represent patterns, services, and service categories. The following commands are split into multiple lines in order to improve readability.

curl https://<my-writer-endpoint>:8182/openCypher \
-d "query=CREATE (mq:Pattern {name: 'Message Queue'}),
(pubSub:Pattern {name: 'Pub/Sub'}),
(eventBus:Pattern {name: 'Event Bus'}),
(workflow:Pattern {name: 'WorkFlow'}),
(applicationIntegration:ServiceCategory {name: 'Application Integration'}),
(sqs:Service {name: 'Amazon SQS'}), (sns:Service {name: 'Amazon SNS'}),
(eventBridge:Service {name: 'Amazon EventBridge'}), (stepFunctions:Service {name: 'AWS Step Functions'}),
(sqs)-[:IMPLEMENT]->(mq), (sns)-[:IMPLEMENT]->(pubSub),
(eventBridge)-[:IMPLEMENT]->(eventBus),
(stepFunctions)-[:IMPLEMENT]->(workflow),
(applicationIntegration)-[:CONTAIN]->(sqs),
(applicationIntegration)-[:CONTAIN]->(sns),
(applicationIntegration)-[:CONTAIN]->(eventBridge),
(applicationIntegration)-[:CONTAIN]->(stepFunctions);"
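The curl command sends the query as a form-encoded POST body. As a sketch, the same request can be built with the Python standard library. The endpoint name is a placeholder, and the code assumes IAM database authentication is disabled on the cluster, as in the curl examples (with IAM authentication enabled, requests must be signed with Signature Version 4).

```python
# Sketch: POST an openCypher query to a Neptune endpoint, like `curl -d "query=..."`.
import json
import urllib.parse
import urllib.request


def build_opencypher_request(endpoint, query, port=8182):
    """Build (but do not send) a form-encoded POST request to the /openCypher path."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"https://{endpoint}:{port}/openCypher",
        data=body,  # a non-None body makes urllib use POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def run_opencypher(endpoint, query):
    """Send the query and decode the JSON response (needs network access to the VPC)."""
    request = build_opencypher_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Run it from a machine that can reach the cluster, such as the EC2 instance described above.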

This is a visual representation of the nodes and their relationships for the graph created by the previous command. The type (such as Service or Pattern) and properties (such as name) are shown inside each node. The arrows represent the relationships (such as CONTAIN or IMPLEMENT) between the nodes.

Visualization of graph data.

Now, I query the database to get some insights. To query the database, I can use either a Writer or a Reader endpoint. First, I want to know the name of the service implementing the “Message Queue” pattern. Note how openCypher syntax resembles SQL: MATCH describes the graph pattern to look for, and RETURN plays the role of SELECT.

curl https://<my-endpoint>:8182/openCypher \
-d "query=MATCH (s:Service)-[:IMPLEMENT]->(p:Pattern {name: 'Message Queue'}) RETURN s.name;"
{
  "results" : [ {
    "s.name" : "Amazon SQS"
  } ]
}
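The response is a JSON document with a `results` array containing one object per row, keyed by the expression in the RETURN clause. A small helper (the name `column_values` is illustrative) can extract one column from it:

```python
import json


def column_values(response_text, column):
    """Return the values of one returned column, in row order."""
    rows = json.loads(response_text)["results"]
    return [row[column] for row in rows]


sample = '{"results": [{"s.name": "Amazon SQS"}]}'
print(column_values(sample, "s.name"))  # ['Amazon SQS']
```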

I use the following query to see how many services are in the “Application Integration” category. This time, I use the WHERE clause to filter results.

curl https://<my-endpoint>:8182/openCypher \
-d "query=MATCH (c:ServiceCategory)-[:CONTAIN]->(s:Service) WHERE c.name='Application Integration' RETURN count(s);"
{
  "results" : [ {
    "count(s)" : 4
  } ]
}
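To make the pattern matching in these two queries concrete, here is an illustrative in-memory model of the same graph in plain Python. It is not how Neptune stores data or executes openCypher; it only shows which nodes and relationships each MATCH pattern selects.

```python
# Nodes: variable -> (label, name); edges: (source, relationship type, target).
NODES = {
    "mq": ("Pattern", "Message Queue"),
    "pubSub": ("Pattern", "Pub/Sub"),
    "eventBus": ("Pattern", "Event Bus"),
    "workflow": ("Pattern", "WorkFlow"),
    "applicationIntegration": ("ServiceCategory", "Application Integration"),
    "sqs": ("Service", "Amazon SQS"),
    "sns": ("Service", "Amazon SNS"),
    "eventBridge": ("Service", "Amazon EventBridge"),
    "stepFunctions": ("Service", "AWS Step Functions"),
}
EDGES = [
    ("sqs", "IMPLEMENT", "mq"),
    ("sns", "IMPLEMENT", "pubSub"),
    ("eventBridge", "IMPLEMENT", "eventBus"),
    ("stepFunctions", "IMPLEMENT", "workflow"),
    ("applicationIntegration", "CONTAIN", "sqs"),
    ("applicationIntegration", "CONTAIN", "sns"),
    ("applicationIntegration", "CONTAIN", "eventBridge"),
    ("applicationIntegration", "CONTAIN", "stepFunctions"),
]


def implementers(pattern_name):
    # MATCH (s:Service)-[:IMPLEMENT]->(p:Pattern {name: pattern_name}) RETURN s.name
    return [
        NODES[src][1]
        for src, rel, dst in EDGES
        if rel == "IMPLEMENT"
        and NODES[src][0] == "Service"
        and NODES[dst] == ("Pattern", pattern_name)
    ]


def count_contained(category_name):
    # MATCH (c:ServiceCategory)-[:CONTAIN]->(s:Service) WHERE c.name=... RETURN count(s)
    return sum(
        1
        for src, rel, dst in EDGES
        if rel == "CONTAIN"
        and NODES[src] == ("ServiceCategory", category_name)
        and NODES[dst][0] == "Service"
    )


print(implementers("Message Queue"))  # ['Amazon SQS']
print(count_contained("Application Integration"))  # 4
```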

There are many options now that I have this graph database up and running. I can add more data (services, categories, patterns) and more relationships between the nodes. I can focus on my application and let Neptune Serverless manage capacity and infrastructure for me.

Availability and Pricing
Amazon Neptune Serverless is available today in the following AWS Regions: US East (Ohio, N. Virginia), US West (N. California, Oregon), Asia Pacific (Tokyo), and Europe (Ireland, London).

With Neptune Serverless, you only pay for what you use. The database capacity is adjusted to provide the right amount of resources you need in terms of Neptune capacity units (NCUs). Each NCU is a combination of approximately 2 gibibytes (GiB) of memory with corresponding CPU and networking. The use of NCUs is billed per second. For more information, see the Neptune pricing page.
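Because NCU usage is billed per second, the bill is effectively the sum of the NCUs held during each second. A minimal sketch of that arithmetic follows; the per-NCU-hour price is a placeholder parameter, so check the Neptune pricing page for the real rate in your Region.

```python
# Sketch of per-second NCU billing arithmetic. The price is a placeholder parameter.
SECONDS_PER_HOUR = 3600
GIB_PER_NCU = 2  # approximate memory per NCU, as described above


def ncu_hours(samples):
    """samples: iterable of (duration_seconds, ncus) pairs at constant capacity."""
    return sum(seconds * ncus for seconds, ncus in samples) / SECONDS_PER_HOUR


def estimated_cost(samples, price_per_ncu_hour):
    """Multiply consumed NCU-hours by a placeholder hourly price per NCU."""
    return ncu_hours(samples) * price_per_ncu_hour


def approx_memory_gib(ncus):
    """Rough memory available at a given capacity."""
    return ncus * GIB_PER_NCU


# One hour at 2 NCUs, then 30 minutes at 4 NCUs (about 8 GiB of memory).
usage = [(3600, 2.0), (1800, 4.0)]
print(ncu_hours(usage))  # 4.0
```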

Having a serverless graph database opens many new possibilities. To learn more, see the Neptune Serverless documentation. Let us know what you build with this new capability!

Simplify the way you work with highly connected data using Neptune Serverless.

Danilo

AWS Week in Review – October 3, 2022

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-week-in-review-october-3-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

A new week and a new month just started. Curious about the most significant AWS news from the previous seven days? I’ve got you covered with this post.

Last Week’s Launches
Here are the launches that got my attention last week:

Amazon File Cache – A high-performance cache on AWS that accelerates and simplifies demanding cloud bursting and hybrid workflows by giving access to files through a fast and familiar POSIX interface, whether the original files live on premises on any file system that can be accessed through NFS v3 or in Amazon S3.

Amazon Data Lifecycle Manager – You can now automatically archive Amazon EBS snapshots to save up to 75 percent on storage costs for those EBS snapshots that you intend to retain for more than 90 days and rarely access.

AWS App Runner – You can now build and run web applications and APIs from source code using the new Node.js 16 managed runtime.

AWS Copilot – The CLI for containerized apps adds IAM permission boundaries, support for FIFO SNS/SQS for the Copilot worker-service pattern, and the option to use Amazon CloudFront for low-latency content delivery and fast TLS termination for public load-balanced web services.

Bottlerocket – The Linux-based operating system purpose-built to run container workloads is now supported by Amazon Inspector, which can recommend a Bottlerocket update if it finds a vulnerability.

Amazon SageMaker Canvas – Now supports mathematical functions and operators for richer data exploration and to understand the relationships between variables in your data.

AWS Compute Optimizer – Now provides cost and performance optimization recommendations for 37 new EC2 instance types, including bare metal instances (m6g.metal) and compute optimized instances (c7g.2xlarge, hpc6a.48xlarge), and new memory metrics for Windows instances.

AWS Budgets – Use a simplified 1-click workflow for common budgeting scenarios with step-by-step tutorials on how to use each template.

Amazon Connect – Now provides an updated flow designer UI that makes it easier and faster to build personalized and automated end-customer experiences, as well as a queue dashboard to view and compare real-time queue performance through time series graphs.

Amazon WorkSpaces – You can now provision Ubuntu desktops and use virtual desktops for new categories of workloads, such as for your developers, engineers, and data scientists.

Amazon WorkSpaces Core – A fully managed infrastructure-only solution for third-party Virtual Desktop Infrastructure (VDI) management software that simplifies VDI migration and combines your current VDI software with the security and reliability of AWS. Read more about it in this Desktop and Application Streaming blog post.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
A few more blog posts you might have missed:

Introducing new language extensions in AWS CloudFormation – In this Cloud Operations & Migrations blog post, we introduce the new language transform that enhances the CloudFormation core language with intrinsic functions that simplify handling JSON strings (Fn::ToJsonString), array lengths (Fn::Length), and update and deletion policies.

Building a GraphQL API with Java and AWS Lambda – This post shows different options for resolving GraphQL queries using serverless technologies on AWS.

For AWS open-source news and updates, here’s the latest newsletter curated by Ricardo to bring you the most recent updates on open-source projects, posts, events, and more.

Upcoming AWS Events
As usual, there are many opportunities to meet:

AWS Summits – Connect, collaborate, and learn about AWS at these free in-person events: Bogotá (October 4) and Singapore (October 6).

AWS Community Days – AWS Community Day events are community-led conferences to share and learn together. Join us in Amersfoort, Netherlands (today, October 3), Warsaw, Poland (October 14), and Dresden, Germany (October 19).

That’s all from me for this week. Come back next Monday for another Week in Review!

Danilo

AWS Week in Review – September 5, 2022

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-week-in-review-september-5-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

As a new week begins, let’s quickly look back at the most significant AWS news from the previous seven days.

Last Week’s Launches
Here are the launches that got my attention last week:

AWS announces open-sourced credentials-fetcher to simplify Microsoft AD access from Linux containers. You can find more in the What’s New post.

AWS Step Functions now has 14 new intrinsic functions that help you process data more efficiently and make it easier to perform data processing tasks such as array manipulation, JSON object manipulation, and math functions within your workflows without having to invoke downstream services or add Task states.

AWS SAM CLI esbuild support is now generally available. You can now use esbuild in the SAM CLI build workflow for your JavaScript applications.

Amazon QuickSight launches a new user interface for dataset management that replaces the existing popup dialog modal with a full-page experience, providing a clearer breakdown of dataset management categories.

AWS GameKit adds Unity support. With this release for Unity, you can integrate cloud-based game features into Win64, macOS, Android, or iOS games from both the Unreal and Unity engines with just a few clicks.

AWS and VMware announce VMware Cloud on AWS integration with Amazon FSx for NetApp ONTAP. Read more in Veliswa’s blog post.

The AWS Region in the United Arab Emirates (UAE) is now open. More info in Marcia’s blog post.

View of Abu Dhabi in the United Arab Emirates

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
A few more blog posts you might have missed:

Easy analytics and cost-optimization with Amazon Redshift Serverless – Four different use cases of Redshift Serverless are discussed in this post.

Building cost-effective AWS Step Functions workflows – In this blog post, Ben explains the difference between Standard and Express Workflows, including costs, migrating from Standard to Express, and some interesting ways of using both together.

How to subscribe to the new Security Hub Announcements topic for Amazon SNS – You can now receive updates about new Security Hub services and features, newly supported standards and controls, and other Security Hub changes.

Deploying AWS Lambda functions using AWS Controllers for Kubernetes (ACK) – With the ACK service controller for AWS Lambda, you can provision and manage Lambda functions with kubectl and custom resources.

For AWS open-source news and updates, here’s the latest newsletter curated by Ricardo to bring you the most recent updates on open-source projects, posts, events, and more.

Upcoming AWS Events
Depending on where you are on this planet, there are many opportunities to meet and learn:

AWS Summits – Come together to connect, collaborate, and learn about AWS. Registration is open for the following in-person AWS Summits: Ottawa (September 8), New Delhi (September 9), Mexico City (September 21–22), Bogotá (October 4), and Singapore (October 6).

AWS Community Days – AWS Community Day events are community-led conferences to share and learn with one another. In September, the AWS community in the US will run events in the Bay Area, California (September 9) and Arlington, Virginia (September 30). In Europe, Community Day events will be held in October. Join us in Amersfoort, Netherlands (October 3), Warsaw, Poland (October 14), and Dresden, Germany (October 19).

That’s all from me for this week. Come back next Monday for another Week in Review!

Danilo

Graviton Fast Start – A New Program to Help Move Your Workloads to AWS Graviton

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/graviton-fast-start-a-new-program-to-help-move-your-workloads-to-aws-graviton/

With the Graviton Challenge last year, we helped customers migrate to Graviton-based EC2 instances and get up to 40 percent price performance benefit in as little as 4 days. Tens of thousands of customers, including 48 of the top 50 Amazon Elastic Compute Cloud (Amazon EC2) customers, use AWS Graviton processors for their workloads. In addition to EC2, many AWS managed services can run your workloads on Graviton. For most customers, adoption is easy, requiring minimal code changes. However, the effort and time required to move workloads to Graviton depends on a few factors, including your software development environment and the technology stack on which your application is built.

This year, we want to take it a step further and make it even easier for customers to adopt Graviton not only through EC2, but also through managed services. Today, we are launching AWS Graviton Fast Start, a new program that makes it even easier to move your workloads to AWS Graviton by providing step-by-step directions for EC2 and other managed services that support the Graviton platform:

  • Amazon Elastic Compute Cloud (Amazon EC2) – EC2 provides the most flexible environment for a migration and can support many kinds of workloads, such as web apps, custom databases, or analytics. You have full control over the interpreted or compiled code running in the EC2 instance. You can also use many open-source and commercial software products that support the Arm64 architecture.
  • AWS Lambda – Migrating your serverless functions can be really easy, especially if you use an interpreted runtime such as Node.js or Python. Most of the time, you only have to check the compatibility of your software dependencies. I have shown a few examples in this blog post.
  • AWS Fargate – Fargate works best if your applications are already running in containers or if you are planning to containerize them. By using multi-architecture container images or images that have Arm64 in their image manifest, you get the serverless benefits of Fargate and the price-performance advantages of Graviton.
  • Amazon Aurora – Relational databases are at the core of many applications. If you need a database compatible with PostgreSQL or MySQL, you can use Amazon Aurora to have a highly performant and globally available database powered by Graviton.
  • Amazon Relational Database Service (RDS) – Similarly to Aurora, Amazon RDS engines such as PostgreSQL, MySQL, and MariaDB can provide a fully managed relational database service using Graviton-based instances.
  • Amazon ElastiCache – When your workload requires ultra-low latency and high throughput, you can speed up your applications with ElastiCache and have a fully managed in-memory cache running on Graviton and compatible with Redis or Memcached.
  • Amazon EMR – With Amazon EMR, you can run large-scale distributed data processing jobs, interactive SQL queries, and machine learning applications on Graviton using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.

Here’s some feedback we got from customers running their workloads on Graviton:

  • Formula 1 racing told us that Graviton2-based C6gn instances provided the best price performance benefits for some of their computational fluid dynamics (CFD) workloads. More recently, they found that Graviton3 C7g instances are 40 percent faster for the same simulations and expect Graviton3-based instances to become the optimal choice to run all of their CFD workloads.
  • Honeycomb has 100 percent of their production workloads running on Graviton using EC2 and Lambda. They have tested the high-throughput telemetry ingestion workload they use for their observability platform against early preview instances of Graviton3 and have seen a 35 percent performance increase for their workload over Graviton2. They were able to run 30 percent fewer instances of C7g than C6g serving the same workload and with 30 percent reduced latency. With these instances in production, they expect over 50 percent price performance improvement over x86 instances.
  • Twitter is working on a multi-year project to leverage Graviton-based EC2 instances to deliver Twitter timelines. As part of their ongoing effort to drive further efficiencies, they tested the new Graviton3-based C7g instances. Across a number of benchmarks representative of their workloads, they found Graviton3-based C7g instances deliver 20-80 percent higher performance compared to Graviton2-based C6g instances, while also reducing tail latencies by as much as 35 percent. They are excited to utilize Graviton3-based instances in the future to realize significant price performance benefits.

With all these options, getting the benefits of running all or part of your workload on AWS Graviton can be easier than you expect. To help you get started, there’s also a free trial on the Graviton-based T4g instances for up to 750 hours per month through December 31st, 2022.

Visit AWS Graviton Fast Start to get step-by-step directions on how to move your workloads to AWS Graviton.

Danilo

New for AWS Global Accelerator – Internet Protocol Version 6 (IPv6) Support

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-global-accelerator-internet-protocol-version-6-ipv6-support/

IPv6 adoption has consistently increased over the last few years, especially among mobile networks. The main reasons to move to IPv6 are:

  • The scarcity of IPv4 addresses can limit the ability to scale up public-facing web and application servers.
  • IPv6 users from mobile networks experience better performance when their network traffic doesn’t need to manage IPv6 to IPv4 translation.
  • You might need to comply with regulatory rules (such as the Federal Acquisition Regulation in US) to run specific internet traffic over IPv6.

Based on this, we found that we could help improve the network path that your customers use to reach your applications by adding IPv6 support to AWS Global Accelerator. Global Accelerator uses the AWS global network to route network traffic and keep packet loss, jitter, and latency consistently low. Customers like Atlassian, New Relic, and SkyScanner already use Global Accelerator to improve the global availability and performance of their applications.

Global Accelerator provides two global static public IPs that act as a fixed entry point to your application. You can update your application endpoints without making user-facing changes to the IP address. If you configure more than one application endpoint, Global Accelerator automatically reroutes your traffic to your nearest healthy available endpoint to mitigate endpoint failure.
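As a deliberately simplified model of the failover behavior described above, the sketch below picks the nearest endpoint that passes health checks. The `distance_ms` proximity value is a hypothetical stand-in, and the model ignores Global Accelerator features such as endpoint weights and traffic dials.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    distance_ms: float  # hypothetical proximity metric from the client's edge location
    healthy: bool


def pick_endpoint(endpoints):
    """Return the nearest healthy endpoint, or None if no endpoint is healthy."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.distance_ms) if healthy else None
```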

Starting today, you can provide better network performance by routing IPv6 traffic through Global Accelerator to your application endpoints running in AWS Regions. Global Accelerator now supports two types of accelerators: dual-stack and IPv4-only. With a dual-stack accelerator, you are provided with a pair of IPv4 and IPv6 global static IP addresses that can serve both IPv4 and IPv6 traffic.

For existing IPv4-only accelerators, you can update your accelerators to dual-stack to serve both IPv4 and IPv6 traffic. This update enables your accelerator to serve IPv6 traffic and doesn’t impact existing IPv4 traffic served by the accelerator.

Dual-stack accelerators supporting both IPv6 and IPv4 traffic require dual-stack endpoints in the back end. For example, Application Load Balancers (ALBs) can have their IP address type configured as either IPv4-only or dual stack, allowing them to accept both IPv4 and IPv6 client connections. Today, dual-stack ALBs are supported as endpoints for dual-stack accelerators.

Deploying a Dual-Stack Application
To test this new feature, I need a dual-stack application with an ALB entry point. The application must be deployed in Amazon Virtual Private Cloud (Amazon VPC) and support IPv6 traffic. I don’t happen to have IPv6-ready VPCs in my account. I can follow these instructions to migrate an existing VPC that supports IPv4 only to IPv6, or I can create a VPC that supports IPv6 addressing. For this post, I choose to create a VPC.

In the AWS Management Console, I navigate to the Amazon VPC Dashboard. I choose Launch VPC Wizard. In the wizard, I enter a value for the Name tag. This value will be used to auto-generate Name tags for all resources in the VPC. Then, I select the option to associate an Amazon-provided IPv6 CIDR block. I leave all other options to their default values and choose Create VPC.

After less than a minute, the VPC is ready. I edit the settings of both public subnets to enable the Auto-assign IP settings to automatically request both a public IPv4 address and an IPv6 address for new network interfaces in this subnet.

Now, I want to deploy an application in this VPC. The application will be the endpoint for my accelerator. I view and download the WordPress scalable and durable AWS CloudFormation template from the Sample solutions section of the CloudFormation documentation. This template deploys a full WordPress website behind an ALB. The web tier is scalable and implemented as an EC2 Auto Scaling group. The MySQL database is managed by Amazon Relational Database Service (RDS).

Before deploying the stack, I edit the template to make a few changes. First, I add a DBSubnetGroup resource:

"DBSubnetGroup" : {
  "Type": "AWS::RDS::DBSubnetGroup",
  "Properties": {
    "DBSubnetGroupDescription" : "DB subnet group",
    "SubnetIds" : { "Ref" : "Subnets"}
  }
},

Then, I add the DBSubnetGroupName property to the DBInstance resource. In this way, the database created by the template will be deployed in the same subnets (and VPC) as the web servers.

"DBSubnetGroupName" : { "Ref" : "DBSubnetGroup" },

The last change adds the IpAddressType property to the ApplicationLoadBalancer resource:

"IpAddressType": "dualstack",

Because IpAddressType is set to dualstack, the ALB created by the stack will also have IPv6 addresses and will be ready to be used with the new dual-stack option of Global Accelerator.
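If you edit templates often, the three changes can also be applied with a short script. This sketch assumes the downloaded template is JSON and uses the DBInstance and ApplicationLoadBalancer resource names and the Subnets parameter mentioned above; the file paths passed to the hypothetical `edit_template_file` helper are up to you.

```python
import json


def apply_dual_stack_edits(template):
    """Apply the three template changes described above, in place, and return the template."""
    resources = template["Resources"]
    # 1. A DB subnet group built from the same Subnets parameter as the web tier.
    resources["DBSubnetGroup"] = {
        "Type": "AWS::RDS::DBSubnetGroup",
        "Properties": {
            "DBSubnetGroupDescription": "DB subnet group",
            "SubnetIds": {"Ref": "Subnets"},
        },
    }
    # 2. Place the database in that subnet group (same subnets and VPC as the web servers).
    resources["DBInstance"]["Properties"]["DBSubnetGroupName"] = {"Ref": "DBSubnetGroup"}
    # 3. Make the load balancer dual-stack so that it also gets IPv6 addresses.
    resources["ApplicationLoadBalancer"]["Properties"]["IpAddressType"] = "dualstack"
    return template


def edit_template_file(src_path, dst_path):
    """Read a JSON template, apply the edits, and write the result to a new file."""
    with open(src_path) as f:
        template = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(apply_dual_stack_edits(template), f, indent=2)
```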

In the CloudFormation console, I create a stack and upload the template I just edited. In the template parameters, I enter a database user and password to use. For the VpcId parameter, I select the IPv6-ready VPC I just created. For the Subnets parameter, I select the two public subnets of the same VPC. After that, I go to the next steps and create the stack.

After a few minutes, the stack creation is complete. To access the website, I need to open network access to the load balancer. In the EC2 console, I create a security group that allows public access using the HTTP and HTTPS protocols (ports 80 and 443).

I choose Load balancers from the navigation pane and select the ALB used by my application. In the Security section, I choose Edit security groups and add the security group I just created to allow web access.

Now, I look for the dual-stack (A or AAAA Record) DNS name of the load balancer. I open a browser and connect using the DNS name to complete the configuration of WordPress.

When connecting again to the endpoint, I see my new (and empty) WordPress website.

Using Dual-Stack Accelerators with Support for Both IPv6 and IPv4 traffic
Now that my application is ready, I add a dual-stack accelerator in front of the dual-stack ALB. In the Global Accelerator console, I choose Create accelerator. I enter a name for the accelerator and choose the Standard accelerator type.

To route both IPv4 and IPv6 through this accelerator, I select the Dual-stack option for the IP address type.

Then I add a listener for port 80 using the TCP protocol.

For that listener, I configure an endpoint group in the AWS Region where I have my application deployed.

I choose Application Load Balancer for the Endpoint type and select the ALB in the CloudFormation stack.

Then, I choose Create accelerator. After a few minutes, the accelerator is deployed, and I have a dual-stack DNS name to reach the ALB using IPv4 or IPv6 depending on the network used by the client.
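The same accelerator can also be created with the AWS SDK. Below is a sketch using boto3, assuming the `create_accelerator`, `create_listener`, and `create_endpoint_group` operations of the `globalaccelerator` client; the Global Accelerator API is called through the US West (Oregon) Region regardless of where the endpoints run. Names and ARNs are placeholders, and client IP preservation is enabled because IPv6 ALB endpoints require it.

```python
import uuid


def _token(token):
    # Reuse the same token when retrying a call so the operation stays idempotent.
    return token or str(uuid.uuid4())


def accelerator_kwargs(name, token=None):
    return {"Name": name, "IpAddressType": "DUAL_STACK", "Enabled": True,
            "IdempotencyToken": _token(token)}


def listener_kwargs(accelerator_arn, port=80, token=None):
    return {"AcceleratorArn": accelerator_arn,
            "PortRanges": [{"FromPort": port, "ToPort": port}],
            "Protocol": "TCP", "IdempotencyToken": _token(token)}


def endpoint_group_kwargs(listener_arn, region, alb_arn, token=None):
    return {"ListenerArn": listener_arn, "EndpointGroupRegion": region,
            "EndpointConfigurations": [
                {"EndpointId": alb_arn, "ClientIPPreservationEnabled": True}],
            "IdempotencyToken": _token(token)}


def create_dual_stack_accelerator(name, region, alb_arn):
    """Create accelerator, TCP/80 listener, and ALB endpoint group; return the dual-stack DNS name."""
    import boto3  # imported lazily so the builders above work without boto3

    ga = boto3.client("globalaccelerator", region_name="us-west-2")
    acc = ga.create_accelerator(**accelerator_kwargs(name))["Accelerator"]
    lst = ga.create_listener(**listener_kwargs(acc["AcceleratorArn"]))["Listener"]
    ga.create_endpoint_group(**endpoint_group_kwargs(lst["ListenerArn"], region, alb_arn))
    return acc.get("DualStackDnsName")
```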

Now, my customers can use the IPv4 and IPv6 addresses or, even better, the dual-stack DNS name of the accelerator to connect to the WordPress website. If there is a front-end or mobile application my customers use to connect to the WordPress REST APIs, I can use the dual-stack DNS name so that clients will connect using their preferred IPv4 or IPv6 route.
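To see which address families a client can use, you can resolve the dual-stack DNS name and group the results by family. A minimal sketch with the Python standard library follows; pass the accelerator's dual-stack DNS name to `resolve`, and keep in mind that the answer depends on the client's resolver and network.

```python
import ipaddress
import socket


def split_by_family(addresses):
    """Split address strings into (IPv4 list, IPv6 list), normalizing their text form."""
    v4, v6 = [], []
    for address in addresses:
        ip = ipaddress.ip_address(address)
        (v4 if ip.version == 4 else v6).append(str(ip))
    return v4, v6


def resolve(host, port=80):
    """Resolve a host name (A and AAAA records) and group the unique addresses by family."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # info[4] is the sockaddr tuple; its first element is the address for IPv4 and IPv6.
    return split_by_family(sorted({info[4][0] for info in infos}))
```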

To understand if the communication between Global Accelerator and the ALB is working, I can monitor the new FlowsDrop Amazon CloudWatch metric. This metric tells me if Global Accelerator is unable to route IPv6 traffic through the endpoint. For example, that can happen if, after the creation of the accelerator, the configuration of the ALB is updated to use IPv4 only.

Availability and Pricing
You can configure dual-stack accelerators using the AWS Management Console, the AWS Command Line Interface (CLI), and AWS SDKs. You can use dual-stack accelerators to optimize access to your applications deployed in any commercial AWS Region.

Protocol translation is not supported in either direction (IPv4 to IPv6 or IPv6 to IPv4). For example, Global Accelerator will not allow me to configure a dual-stack accelerator with an IPv4-only ALB endpoint. Also, for IPv6 ALB endpoints, client IP preservation must be enabled.
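The two constraints above can be expressed as a pre-flight check. This is an illustrative sketch, not a Global Accelerator API: it treats the accelerator values IPV4/DUAL_STACK and the load balancer values ipv4/dualstack as plain strings and encodes only the rules stated here.

```python
def check_endpoint(accelerator_ip_type, endpoint_ip_type, client_ip_preservation):
    """Return a list of problems; an empty list means the stated rules are satisfied."""
    problems = []
    if accelerator_ip_type == "DUAL_STACK" and endpoint_ip_type != "dualstack":
        # No IPv4/IPv6 translation: IPv6 traffic needs an endpoint that speaks IPv6.
        problems.append("dual-stack accelerator needs a dual-stack ALB endpoint")
    if (accelerator_ip_type == "DUAL_STACK" and endpoint_ip_type == "dualstack"
            and not client_ip_preservation):
        problems.append("IPv6 ALB endpoints require client IP preservation")
    return problems
```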

There are no additional costs for using dual-stack accelerators. You pay for the hours and the amount of data transfer in the dominant direction used by traffic to or from the accelerator. Data transfer costs depend on the location of your clients and the AWS Regions where you are running your applications. For more information, see the Global Accelerator pricing page.
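A rough sketch of the arithmetic: the fixed fee accrues per hour, and data transfer is counted only in the dominant direction, that is, whichever of inbound or outbound traffic is larger. All prices are placeholder parameters; `price_per_gb` stands in for the rate that applies to your client locations and Regions.

```python
def accelerator_cost(hours, hourly_fee, gb_in, gb_out, price_per_gb):
    """Placeholder prices: fixed hourly fee plus data transfer in the dominant direction."""
    dominant_gb = max(gb_in, gb_out)
    return hours * hourly_fee + dominant_gb * price_per_gb


# 10 hours, 30 GB in, 100 GB out: only the 100 GB outbound is counted.
print(accelerator_cost(10, 0.5, 30, 100, 0.25))  # 30.0
```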

Optimize the IPv6 and IPv4 network paths used by your customers to reach your applications with AWS Global Accelerator.

Danilo

New for Amazon GuardDuty – Malware Detection for Amazon EBS Volumes

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-amazon-guardduty-malware-detection-for-amazon-ebs-volumes/

With Amazon GuardDuty, you can monitor your AWS accounts and workloads to detect malicious activity. Today, we are adding to GuardDuty the capability to detect malware. Malware is malicious software that is used to compromise workloads, repurpose resources, or gain unauthorized access to data. When you have GuardDuty Malware Protection enabled, a malware scan is initiated when GuardDuty detects that one of your EC2 instances or container workloads running on EC2 is doing something suspicious. For example, a malware scan is triggered when an EC2 instance is communicating with a command-and-control server that is known to be malicious or is performing denial of service (DoS) or brute-force attacks against other EC2 instances.

GuardDuty supports many file system types and scans file formats known to be used to spread or contain malware, including Windows and Linux executables, PDF files, archives, binaries, scripts, installers, email databases, and plain emails.

When potential malware is identified, actionable security findings are generated with information such as the threat and file name, the file path, the EC2 instance ID, resource tags and, in the case of containers, the container ID and the container image used. GuardDuty supports container workloads running on EC2, including customer-managed Kubernetes clusters or individual Docker containers. If the container is managed by Amazon Elastic Kubernetes Service (EKS) or Amazon Elastic Container Service (Amazon ECS), the findings also include the cluster name and the task or pod ID so application and security teams can quickly find the affected container resources.

As with all other GuardDuty findings, malware detections are sent to the GuardDuty console, pushed through Amazon EventBridge, routed to AWS Security Hub, and made available in Amazon Detective for incident investigation.
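Because malware findings travel through Amazon EventBridge like any other GuardDuty finding, downstream automation can pick them out of the event stream. The sketch below filters events locally in Python. It assumes the standard GuardDuty event envelope (source aws.guardduty, detail-type GuardDuty Finding) and, as an assumption to verify against the GuardDuty finding types reference, that Malware Protection finding types end in MaliciousFile (for example, Execution:EC2/MaliciousFile). The sample events are made up.

```python
from typing import Any, Dict, List


def is_malware_finding(event: Dict[str, Any]) -> bool:
    """Return True for GuardDuty findings that look like malware detections.

    Assumption: malware finding types end with "/MaliciousFile", such as
    "Execution:EC2/MaliciousFile"; check the finding types reference.
    """
    if event.get("source") != "aws.guardduty":
        return False
    if event.get("detail-type") != "GuardDuty Finding":
        return False
    finding_type = event.get("detail", {}).get("type", "")
    return finding_type.endswith("/MaliciousFile")


def malware_findings(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the malware findings from a batch of events."""
    return [e for e in events if is_malware_finding(e)]


# Hypothetical sample events, for illustration only.
sample = [
    {"source": "aws.guardduty", "detail-type": "GuardDuty Finding",
     "detail": {"type": "Execution:EC2/MaliciousFile", "severity": 8}},
    {"source": "aws.guardduty", "detail-type": "GuardDuty Finding",
     "detail": {"type": "Recon:EC2/Portscan", "severity": 5}},
]
print([e["detail"]["type"] for e in malware_findings(sample)])
# ['Execution:EC2/MaliciousFile']
```

In a real deployment, the same source and detail-type match would normally live in an EventBridge rule pattern rather than in application code.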

How GuardDuty Malware Protection Works
When you enable malware protection, you set up an AWS Identity and Access Management (IAM) service-linked role that grants GuardDuty permissions to perform malware scans. When a malware scan is initiated for an EC2 instance, GuardDuty Malware Protection uses those permissions to take a snapshot of the attached Amazon Elastic Block Store (EBS) volumes that are less than 1 TB in size and then restore the EBS volumes in an AWS service account in the same AWS Region to scan them for malware. You can use tagging to include or exclude EC2 instances from those permissions and from scanning. In this way, you don’t need to deploy security software or agents to monitor for malware, and scanning the volumes doesn’t impact running workloads. The EBS volumes in the service account and the snapshots in your account are deleted after the scan. Optionally, you can preserve the snapshots when malware is detected.
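The scan flow described above can be summarized as two small decisions. First, only attached volumes under the 1 TB limit are snapshotted. Second, the snapshots in your account are deleted after the scan unless malware was found and retention is enabled. This is a hypothetical model of the behavior the post describes, not GuardDuty code. The Volume type, the function names, and treating 1 TB as 1,024 GiB are all assumptions.

```python
from dataclasses import dataclass
from typing import List

# Assumption: "less than 1 TB" is modeled as strictly under 1,024 GiB.
MAX_SCANNABLE_GIB = 1024


@dataclass
class Volume:
    volume_id: str  # hypothetical IDs in the example below
    size_gib: int


def plan_scan(volumes: List[Volume]) -> List[str]:
    """Return IDs of attached volumes small enough to snapshot and scan."""
    return [v.volume_id for v in volumes if v.size_gib < MAX_SCANNABLE_GIB]


def snapshot_disposition(malware_found: bool, retain_on_malware: bool) -> str:
    """Decide what happens to a snapshot in your account after the scan."""
    if malware_found and retain_on_malware:
        return "retain"
    return "delete"


vols = [Volume("vol-a", 100), Volume("vol-b", 2048)]
print(plan_scan(vols))  # ['vol-a']
print(snapshot_disposition(malware_found=True, retain_on_malware=False))  # delete
```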

The service-linked role grants GuardDuty access to AWS Key Management Service (AWS KMS) keys used to encrypt EBS volumes. If the EBS volumes attached to a potentially compromised EC2 instance are encrypted with a customer-managed key, GuardDuty Malware Protection uses the same key to encrypt the replica EBS volumes as well. If the volumes are not encrypted, GuardDuty uses its own key to encrypt the replica EBS volumes and ensure privacy. Volumes encrypted with EBS-managed keys are not supported.
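The key-selection rules in this paragraph can be written as a small decision function. This sketch only restates those rules. It does not call AWS KMS, and the names choose_replica_key, KeyType, UnsupportedVolumeError, and the "guardduty-owned-key" placeholder are hypothetical.

```python
from enum import Enum
from typing import Optional


class KeyType(Enum):
    UNENCRYPTED = "unencrypted"
    CUSTOMER_MANAGED = "customer-managed"
    EBS_MANAGED = "ebs-managed"  # the "EBS-managed keys" the post mentions


class UnsupportedVolumeError(Exception):
    """Raised for volumes that Malware Protection does not support."""


def choose_replica_key(key_type: KeyType, key_id: Optional[str] = None) -> str:
    """Pick the key used to encrypt the replica volume."""
    if key_type is KeyType.CUSTOMER_MANAGED:
        if key_id is None:
            raise ValueError("a customer-managed key needs a key ID")
        return key_id  # reuse the volume's own customer-managed key
    if key_type is KeyType.UNENCRYPTED:
        return "guardduty-owned-key"  # placeholder for GuardDuty's own key
    raise UnsupportedVolumeError("volumes encrypted with EBS-managed keys are not supported")


print(choose_replica_key(KeyType.CUSTOMER_MANAGED, "my-cmk-id"))  # my-cmk-id
print(choose_replica_key(KeyType.UNENCRYPTED))  # guardduty-owned-key
```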

Security in the cloud is a shared responsibility between you and AWS. As a guardrail, the service-linked role used by GuardDuty Malware Protection cannot perform any operation on your resources (such as EBS snapshots and volumes, EC2 instances, and KMS keys) if the resource has the GuardDutyExcluded tag. Once you mark your snapshots with GuardDutyExcluded set to true, the GuardDuty service won’t be able to access these snapshots. The GuardDutyExcluded tag supersedes any inclusion tag. Permissions also restrict how GuardDuty can modify your snapshots so that they cannot be made public while shared with the GuardDuty service account.

The EBS volumes created by GuardDuty are always encrypted. GuardDuty can use KMS keys only on EBS snapshots that have a GuardDuty scan ID tag, which GuardDuty adds when it creates snapshots after an EC2 finding. The KMS keys that are shared with the GuardDuty service account cannot be invoked from any context other than the Amazon EBS service. Once the scan completes successfully, the KMS key grant is revoked and the volume replica in the GuardDuty service account is deleted, ensuring that the GuardDuty service cannot access your data after the scan operation completes.
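To make these guardrails concrete, the sketch below models a key grant that works only through Amazon EBS, only for snapshots carrying the matching scan ID tag, and only until the scan completes. Everything here is hypothetical: the ScanGrant class, the GuardDutyScanId tag key, and the "ebs" caller label stand in for real KMS grants and conditions and are not the GuardDuty or KMS API.

```python
from typing import Dict

SCAN_ID_TAG = "GuardDutyScanId"  # hypothetical tag key, for illustration only
ALLOWED_CALLER = "ebs"           # hypothetical label for the Amazon EBS service


class ScanGrant:
    """Toy model of a key grant scoped to a single malware scan."""

    def __init__(self, scan_id: str) -> None:
        self.scan_id = scan_id
        self.revoked = False

    def can_use_key(self, snapshot_tags: Dict[str, str], caller: str) -> bool:
        """Allow key use only via EBS, on snapshots tagged with this scan ID."""
        if self.revoked or caller != ALLOWED_CALLER:
            return False
        return snapshot_tags.get(SCAN_ID_TAG) == self.scan_id

    def complete_scan(self) -> None:
        """Revoke the grant once the scan completes successfully."""
        self.revoked = True


grant = ScanGrant("scan-123")
print(grant.can_use_key({SCAN_ID_TAG: "scan-123"}, "ebs"))     # True
print(grant.can_use_key({SCAN_ID_TAG: "scan-123"}, "lambda"))  # False
print(grant.can_use_key({}, "ebs"))                            # False
grant.complete_scan()
print(grant.can_use_key({SCAN_ID_TAG: "scan-123"}, "ebs"))     # False
```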

Enabling Malware Protection for an AWS Account
If you’re not using GuardDuty yet, Malware Protection is enabled by default when you activate GuardDuty for your account. Because I am already using GuardDuty, I need to enable Malware Protection from the console. If you’re using AWS Organizations, your delegated administrator accounts can enable this for existing member accounts and configure whether new AWS accounts in the organization should be automatically enrolled.

In the GuardDuty console, I choose Malware Protection under Settings in the navigation pane. There, I choose Enable and then Enable Malware Protection.

Console screenshot.

Snapshots are automatically deleted after they are scanned. In General settings, I have the option to keep the snapshots in which malware is detected in my AWS account, so that they remain available for further analysis.

Console screenshot.

In Scan options, I can configure a list of inclusion tags, so that only EC2 instances with those tags are scanned, or exclusion tags, so that EC2 instances with tags in the list are skipped.
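Putting the scan options together with the GuardDutyExcluded guardrail described earlier, the decision of whether an instance is scanned can be sketched as follows. This is a hypothetical model under stated assumptions. Tags are compared as exact key/value pairs, only one of the two lists is configured at a time, and GuardDutyExcluded is matched as the string value "true" and always wins.

```python
from typing import Dict, Optional, Set, Tuple

Tag = Tuple[str, str]


def is_scan_eligible(
    instance_tags: Dict[str, str],
    inclusion: Optional[Set[Tag]] = None,
    exclusion: Optional[Set[Tag]] = None,
) -> bool:
    """Decide whether an instance would be scanned under the stated assumptions."""
    if inclusion and exclusion:
        raise ValueError("configure either inclusion or exclusion tags, not both")
    # Assumption: the guardrail tag is matched as GuardDutyExcluded=true.
    if instance_tags.get("GuardDutyExcluded") == "true":
        return False  # supersedes any inclusion tag
    tags = set(instance_tags.items())
    if inclusion:
        return bool(tags & inclusion)  # scan only instances with a listed tag
    if exclusion:
        return not (tags & exclusion)  # skip instances with a listed tag
    return True  # no list configured


print(is_scan_eligible({"env": "prod"}, inclusion={("env", "prod")}))  # True
print(is_scan_eligible({"env": "prod", "GuardDutyExcluded": "true"},
                       inclusion={("env", "prod")}))  # False
print(is_scan_eligible({"env": "dev"}, exclusion={("env", "dev")}))  # False
```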

Console screenshot.

Testing Malware Protection GuardDuty Findings
To generate several Amazon GuardDuty findings, including the new Malware Protection findings, I clone the Amazon GuardDuty Tester repo:

$ git clone https://github.com/awslabs/amazon-guardduty-tester

First, I create an AWS CloudFormation stack using the guardduty-tester.template file. When the stack is ready, I follow the instructions to configure my SSH client to log in to the tester instance through the bastion host. Then, I connect to the tester instance:

$ ssh tester

From the tester instance, I start the guardduty_tester.sh script to generate the findings:

$ ./guardduty_tester.sh 

***********************************************************************
* Test #1 - Internal port scanning                                    *
* This simulates internal reconaissance by an internal actor or an   *
* external actor after an initial compromise. This is considered a    *
* low priority finding for GuardDuty because its not a clear indicator*
* of malicious intent on its own.                                     *
***********************************************************************


Starting Nmap 6.40 ( http://nmap.org ) at 2022-05-19 09:36 UTC
Nmap scan report for ip-172-16-0-20.us-west-2.compute.internal (172.16.0.20)
Host is up (0.00032s latency).
Not shown: 997 filtered ports
PORT     STATE  SERVICE
22/tcp   open   ssh
80/tcp   closed http
5050/tcp closed mmcc
MAC Address: 06:25:CB:F4:E0:51 (Unknown)

Nmap done: 1 IP address (1 host up) scanned in 4.96 seconds

-----------------------------------------------------------------------

***********************************************************************
* Test #2 - SSH Brute Force with Compromised Keys                     *
* This simulates an SSH brute force attack on an SSH port that we    *
* can access from this instance. It uses (phony) compromised keys in  *
* many subsequent attempts to see if one works. This is a common      *
* techique where the bad actors will harvest keys from the web in     *
* places like source code repositories where people accidentally leave*
* keys and credentials (This attempt will not actually succeed in     *
* obtaining access to the target linux instance in this subnet)       *
***********************************************************************

2022-05-19 09:36:29 START
2022-05-19 09:36:29 Crowbar v0.4.3-dev
2022-05-19 09:36:29 Trying 172.16.0.20:22
2022-05-19 09:36:33 STOP
2022-05-19 09:36:33 No results found...
2022-05-19 09:36:33 START
2022-05-19 09:36:33 Crowbar v0.4.3-dev
2022-05-19 09:36:33 Trying 172.16.0.20:22
2022-05-19 09:36:37 STOP
2022-05-19 09:36:37 No results found...
2022-05-19 09:36:37 START
2022-05-19 09:36:37 Crowbar v0.4.3-dev
2022-05-19 09:36:37 Trying 172.16.0.20:22
2022-05-19 09:36:41 STOP
2022-05-19 09:36:41 No results found...
2022-05-19 09:36:41 START
2022-05-19 09:36:41 Crowbar v0.4.3-dev
2022-05-19 09:36:41 Trying 172.16.0.20:22
2022-05-19 09:36:45 STOP
2022-05-19 09:36:45 No results found...
2022-05-19 09:36:45 START
2022-05-19 09:36:45 Crowbar v0.4.3-dev
2022-05-19 09:36:45 Trying 172.16.0.20:22
2022-05-19 09:36:48 STOP
2022-05-19 09:36:48 No results found...
2022-05-19 09:36:49 START
2022-05-19 09:36:49 Crowbar v0.4.3-dev
2022-05-19 09:36:49 Trying 172.16.0.20:22
2022-05-19 09:36:52 STOP
2022-05-19 09:36:52 No results found...
2022-05-19 09:36:52 START
2022-05-19 09:36:52 Crowbar v0.4.3-dev
2022-05-19 09:36:52 Trying 172.16.0.20:22
2022-05-19 09:36:56 STOP
2022-05-19 09:36:56 No results found...
2022-05-19 09:36:56 START
2022-05-19 09:36:56 Crowbar v0.4.3-dev
2022-05-19 09:36:56 Trying 172.16.0.20:22
2022-05-19 09:37:00 STOP
2022-05-19 09:37:00 No results found...
2022-05-19 09:37:00 START
2022-05-19 09:37:00 Crowbar v0.4.3-dev
2022-05-19 09:37:00 Trying 172.16.0.20:22
2022-05-19 09:37:04 STOP
2022-05-19 09:37:04 No results found...
2022-05-19 09:37:04 START
2022-05-19 09:37:04 Crowbar v0.4.3-dev
2022-05-19 09:37:04 Trying 172.16.0.20:22
2022-05-19 09:37:08 STOP
2022-05-19 09:37:08 No results found...
2022-05-19 09:37:08 START
2022-05-19 09:37:08 Crowbar v0.4.3-dev
2022-05-19 09:37:08 Trying 172.16.0.20:22
2022-05-19 09:37:12 STOP
2022-05-19 09:37:12 No results found...
2022-05-19 09:37:12 START
2022-05-19 09:37:12 Crowbar v0.4.3-dev
2022-05-19 09:37:12 Trying 172.16.0.20:22
2022-05-19 09:37:16 STOP
2022-05-19 09:37:16 No results found...
2022-05-19 09:37:16 START
2022-05-19 09:37:16 Crowbar v0.4.3-dev
2022-05-19 09:37:16 Trying 172.16.0.20:22
2022-05-19 09:37:20 STOP
2022-05-19 09:37:20 No results found...
2022-05-19 09:37:20 START
2022-05-19 09:37:20 Crowbar v0.4.3-dev
2022-05-19 09:37:20 Trying 172.16.0.20:22
2022-05-19 09:37:23 STOP
2022-05-19 09:37:23 No results found...
2022-05-19 09:37:23 START
2022-05-19 09:37:23 Crowbar v0.4.3-dev
2022-05-19 09:37:23 Trying 172.16.0.20:22
2022-05-19 09:37:27 STOP
2022-05-19 09:37:27 No results found...
2022-05-19 09:37:27 START
2022-05-19 09:37:27 Crowbar v0.4.3-dev
2022-05-19 09:37:27 Trying 172.16.0.20:22
2022-05-19 09:37:31 STOP
2022-05-19 09:37:31 No results found...
2022-05-19 09:37:31 START
2022-05-19 09:37:31 Crowbar v0.4.3-dev
2022-05-19 09:37:31 Trying 172.16.0.20:22
2022-05-19 09:37:34 STOP
2022-05-19 09:37:34 No results found...
2022-05-19 09:37:35 START
2022-05-19 09:37:35 Crowbar v0.4.3-dev
2022-05-19 09:37:35 Trying 172.16.0.20:22
2022-05-19 09:37:38 STOP
2022-05-19 09:37:38 No results found...
2022-05-19 09:37:38 START
2022-05-19 09:37:38 Crowbar v0.4.3-dev
2022-05-19 09:37:38 Trying 172.16.0.20:22
2022-05-19 09:37:42 STOP
2022-05-19 09:37:42 No results found...
2022-05-19 09:37:42 START
2022-05-19 09:37:42 Crowbar v0.4.3-dev
2022-05-19 09:37:42 Trying 172.16.0.20:22
2022-05-19 09:37:46 STOP
2022-05-19 09:37:46 No results found...

-----------------------------------------------------------------------

***********************************************************************
* Test #3 - RDP Brute Force with Password List                        *
* This simulates an RDP brute force attack on the internal RDP port  *
* of the windows server that we installed in the environment.  It uses*
* a list of common passwords that can be found on the web. This test  *
* will trigger a detection, but will fail to get into the target      *
* windows instance.                                                   *
***********************************************************************

Sending 250 password attempts at the windows server...
Hydra v9.4-dev (c) 2022 by van Hauser/THC & David Maciejak - Please do not use in military or secret service organizations, or for illegal purposes (this is non-binding, these *** ignore laws and ethics anyway).

Hydra (https://github.com/vanhauser-thc/thc-hydra) starting at 2022-05-19 09:37:46
[WARNING] rdp servers often don't like many connections, use -t 1 or -t 4 to reduce the number of parallel connections and -W 1 or -W 3 to wait between connection to allow the server to recover
[INFO] Reduced number of tasks to 4 (rdp does not like many parallel connections)
[WARNING] the rdp module is experimental. Please test, report - and if possible, fix.
[DATA] max 4 tasks per 1 server, overall 4 tasks, 1792 login tries (l:7/p:256), ~448 tries per task
[DATA] attacking rdp://172.16.0.24:3389/
[STATUS] 1099.00 tries/min, 1099 tries in 00:01h, 693 to do in 00:01h, 4 active
1 of 1 target completed, 0 valid password found
Hydra (https://github.com/vanhauser-thc/thc-hydra) finished at 2022-05-19 09:39:23

-----------------------------------------------------------------------

***********************************************************************
* Test #4 - CryptoCurrency Mining Activity                            *
* This simulates interaction with a cryptocurrency mining pool which *
* can be an indication of an instance compromise. In this case, we are*
* only interacting with the URL of the pool, but not downloading      *
* any files. This will trigger a threat intel based detection.        *
***********************************************************************

Calling bitcoin wallets to download mining toolkits

-----------------------------------------------------------------------

***********************************************************************
* Test #5 - DNS Exfiltration                                          *
* A common exfiltration technique is to tunnel data out over DNS      *
* to a fake domain.  Its an effective technique because most hosts    *
* have outbound DNS ports open.  This test wont exfiltrate any data,  *
* but it will generate enough unusual DNS activity to trigger the     *
* detection.                                                          *
***********************************************************************

Calling large numbers of large domains to simulate tunneling via DNS

***********************************************************************
* Test #6 - Fake domain to prove that GuardDuty is working            *
* This is a permanent fake domain that customers can use to prove that*
* GuardDuty is working.  Calling this domain will always generate the *
* Backdoor:EC2/C&CActivity.B!DNS finding type                         *
***********************************************************************

Calling a well known fake domain that is used to generate a known finding

; <<>> DiG 9.11.4-P2-RedHat-9.11.4-26.P2.amzn2.5.2 <<>> GuardDutyC2ActivityB.com any
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11495
;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;GuardDutyC2ActivityB.com.	IN	ANY

;; ANSWER SECTION:
GuardDutyC2ActivityB.com. 6943	IN	SOA	ns1.markmonitor.com. hostmaster.markmonitor.com. 2018091906 86400 3600 2592000 172800
GuardDutyC2ActivityB.com. 6943	IN	NS	ns3.markmonitor.com.
GuardDutyC2ActivityB.com. 6943	IN	NS	ns5.markmonitor.com.
GuardDutyC2ActivityB.com. 6943	IN	NS	ns7.markmonitor.com.
GuardDutyC2ActivityB.com. 6943	IN	NS	ns2.markmonitor.com.
GuardDutyC2ActivityB.com. 6943	IN	NS	ns4.markmonitor.com.
GuardDutyC2ActivityB.com. 6943	IN	NS	ns6.markmonitor.com.
GuardDutyC2ActivityB.com. 6943	IN	NS	ns1.markmonitor.com.

;; Query time: 27 msec
;; SERVER: 172.16.0.2#53(172.16.0.2)
;; WHEN: Thu May 19 09:39:23 UTC 2022
;; MSG SIZE  rcvd: 238


*****************************************************************************************************
Expected GuardDuty Findings

Test 1: Internal Port Scanning
Expected Finding: EC2 Instance  i-011e73af27562827b  is performing outbound port scans against remote host. 172.16.0.20
Finding Type: Recon:EC2/Portscan

Test 2: SSH Brute Force with Compromised Keys
Expecting two findings - one for the outbound and one for the inbound detection
Outbound:  i-011e73af27562827b  is performing SSH brute force attacks against  172.16.0.20
Inbound:  172.16.0.25  is performing SSH brute force attacks against  i-0bada13e0aa12d383
Finding Type: UnauthorizedAccess:EC2/SSHBruteForce

Test 3: RDP Brute Force with Password List
Expecting two findings - one for the outbound and one for the inbound detection
Outbound:  i-011e73af27562827b  is performing RDP brute force attacks against  172.16.0.24
Inbound:  172.16.0.25  is performing RDP brute force attacks against  i-0191573dec3b66924
Finding Type : UnauthorizedAccess:EC2/RDPBruteForce

Test 4: Cryptocurrency Activity
Expected Finding: EC2 Instance  i-011e73af27562827b  is querying a domain name that is associated with bitcoin activity
Finding Type : CryptoCurrency:EC2/BitcoinTool.B!DNS

Test 5: DNS Exfiltration
Expected Finding: EC2 instance  i-011e73af27562827b  is attempting to query domain names that resemble exfiltrated data
Finding Type : Trojan:EC2/DNSDataExfiltration

Test 6: C&C Activity
Expected Finding: EC2 instance  i-011e73af27562827b  is querying a domain name associated with a known Command & Control server. 
Finding Type : Backdoor:EC2/C&CActivity.B!DNS

After a few minutes, the findings appear in the GuardDuty console. At the top, I see the malicious files found by the new Malware Protection capability. One of the findings is related to an EC2 instance, the other to an ECS cluster.

Console screenshot.

First, I select the finding related to the EC2 instance. In the panel, I see the information on the instance and the malicious file, such as the file name and path. In the Malware scan details section, the Trigger finding ID points to the original GuardDuty finding that triggered the malware scan. In my case, the original finding was that this EC2 instance was performing RDP brute force attacks against another EC2 instance.

Console screenshot.

Here, I choose Investigate with Detective and, directly from the GuardDuty console, I go to the Detective console to visualize AWS CloudTrail and Amazon Virtual Private Cloud (Amazon VPC) flow data for the EC2 instance, the AWS account, and the IP address affected by the finding. Using Detective, I can analyze, investigate, and identify the root cause of suspicious activities found by GuardDuty.

Console screenshot.

When I select the finding related to the ECS cluster, I have more information on the resource affected, such as the details of the ECS cluster, the task, the containers, and the container images.

Console screenshot.

Using the GuardDuty tester scripts makes it easier to test the overall integration of GuardDuty with other security frameworks you use so that you can be ready when a real threat is detected.

Comparing GuardDuty Malware Protection with Amazon Inspector
At this point, you might ask yourself how GuardDuty Malware Protection relates to Amazon Inspector, a service that scans AWS workloads for software vulnerabilities and unintended network exposure. The two services complement each other and offer different layers of protection:

  • Amazon Inspector offers proactive protection by identifying and remediating known software and application vulnerabilities that serve as an entry point for attackers to compromise resources and install malware.
  • GuardDuty Malware Protection detects malware that is found to be present on actively running workloads. At that point, the system has already been compromised, but GuardDuty can help limit the duration of an infection so that you can take action before a system compromise results in a business-impacting event.

Availability and Pricing
Amazon GuardDuty Malware Protection is available today in all AWS Regions where GuardDuty is available, excluding the AWS China (Beijing), AWS China (Ningxia), AWS GovCloud (US-East), and AWS GovCloud (US-West) Regions.

At launch, GuardDuty Malware Protection is integrated with these partner offerings:

With GuardDuty, you don’t need to deploy security software or agents to monitor for malware. You pay only for the GB scanned in the file systems (not for the size of the EBS volumes) and for the EBS snapshots during the time they are kept in your account. All EBS snapshots created by GuardDuty are automatically deleted after they are scanned unless you enable snapshot retention when malware is found. For more information, see GuardDuty pricing and EBS pricing. Note that GuardDuty only scans EBS volumes less than 1 TB in size. To help you control costs and avoid repeating alarms, the same volume is not scanned more often than once every 24 hours.
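The 24-hour rule can be sketched as a per-volume throttle: a scan request is skipped if the same volume was scanned less than 24 hours earlier. The ScanThrottle class is hypothetical and only models the rule as stated. It says nothing about how GuardDuty tracks scans internally.

```python
from datetime import datetime, timedelta
from typing import Dict

RESCAN_INTERVAL = timedelta(hours=24)


class ScanThrottle:
    """Remember when each volume was last scanned and skip early repeats."""

    def __init__(self) -> None:
        self.last_scan: Dict[str, datetime] = {}

    def should_scan(self, volume_id: str, now: datetime) -> bool:
        """Return True (and record the scan) if the volume may be scanned now."""
        last = self.last_scan.get(volume_id)
        if last is not None and now - last < RESCAN_INTERVAL:
            return False
        self.last_scan[volume_id] = now
        return True


throttle = ScanThrottle()
t0 = datetime(2022, 7, 1, 9, 0)
print(throttle.should_scan("vol-a", t0))                        # True
print(throttle.should_scan("vol-a", t0 + timedelta(hours=3)))   # False
print(throttle.should_scan("vol-a", t0 + timedelta(hours=24)))  # True
```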

Detect malicious activity and protect your applications from malware with Amazon GuardDuty.

Danilo

Amazon Redshift Serverless – Now Generally Available with New Capabilities

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-redshift-serverless-now-generally-available-with-new-capabilities/

Last year at re:Invent, we introduced the preview of Amazon Redshift Serverless, a serverless option of Amazon Redshift that lets you analyze data at any scale without having to manage data warehouse infrastructure. You just need to load and query your data, and you pay only for what you use. This allows more companies to build a modern data strategy, especially for use cases where analytics workloads are not running 24/7 and the data warehouse is not active all the time. It is also applicable to companies where the use of data expands within the organization and users in new departments want to run analytics without having to take ownership of data warehouse infrastructure.

Today, I am happy to share that Amazon Redshift Serverless is generally available and that we added many new capabilities. We are also reducing Amazon Redshift Serverless compute costs compared to the preview.

You can now create multiple serverless endpoints per AWS account and Region using namespaces and workgroups:

  • A namespace is a collection of database objects and users, such as database name and password, permissions, and encryption configuration. This is where your data is managed and where you can see how much storage is used.
  • A workgroup is a collection of compute resources, including network and security settings. Each workgroup has a serverless endpoint to which you can connect your applications. When configuring a workgroup, you can set up private or publicly accessible endpoints.

Each namespace can have only one workgroup associated with it. Conversely, each workgroup can be associated with only one namespace. You can have a namespace without any workgroup associated with it, for example, to use it only for sharing data with other namespaces in the same or another AWS account or Region.
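The one-to-one rule between namespaces and workgroups, with namespaces allowed to exist on their own, can be modeled with a small in-memory registry. This is a hypothetical sketch for reasoning about the constraint. The ServerlessRegistry class and its method names are not part of any AWS SDK.

```python
from typing import Dict, Optional


class ServerlessRegistry:
    """Toy model: each namespace has at most one workgroup and vice versa."""

    def __init__(self) -> None:
        self.namespaces: Dict[str, Optional[str]] = {}  # namespace -> workgroup
        self.workgroups: Dict[str, str] = {}            # workgroup -> namespace

    def create_namespace(self, name: str) -> None:
        if name in self.namespaces:
            raise ValueError(f"namespace {name} already exists")
        self.namespaces[name] = None  # a namespace may have no workgroup

    def create_workgroup(self, name: str, namespace: str) -> None:
        if namespace not in self.namespaces:
            raise ValueError(f"unknown namespace {namespace}")
        if self.namespaces[namespace] is not None:
            raise ValueError(f"namespace {namespace} already has a workgroup")
        if name in self.workgroups:
            raise ValueError(f"workgroup {name} is already associated")
        self.namespaces[namespace] = name
        self.workgroups[name] = namespace


reg = ServerlessRegistry()
reg.create_namespace("sales")
reg.create_namespace("shared-data")  # kept without a workgroup, for data sharing
reg.create_workgroup("sales-wg", "sales")
```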

In your workgroup configuration, you can now use query monitoring rules to help keep your costs under control. Also, Amazon Redshift Serverless now scales data warehouse capacity more intelligently to deliver fast performance for demanding and unpredictable workloads.

Let’s see how this works with a quick demo. Then, I’ll show you what you can do with namespaces and workgroups.

Using Amazon Redshift Serverless
In the Amazon Redshift console, I select Redshift serverless in the navigation pane. To get started, I choose Use default settings to configure a namespace and a workgroup with the most common options. For example, I’ll be able to connect using my default VPC and default security group.

Console screenshot.

With the default settings, the only option left to configure is Permissions. Here, I can specify how Amazon Redshift can interact with other services such as Amazon S3, Amazon CloudWatch Logs, Amazon SageMaker, and AWS Glue. To load data later, I give Amazon Redshift access to an S3 bucket. I choose Manage IAM roles and then Create IAM role.

Console screenshot.

When creating the IAM role, I select the option to give access to specific S3 buckets and pick an S3 bucket in the same AWS Region. Then, I choose Create IAM role as default to complete the creation of the role and to automatically use it as the default role for the namespace.

Console screenshot.

I choose Save configuration, and after a few minutes the database is ready for use. In the Serverless dashboard, I choose Query data to open the Redshift query editor v2. There, I follow the instructions in the Amazon Redshift Database Developer Guide to load a sample database. If you want to do a quick test, a few sample databases (including the one I am using here) are already available in the sample_data_dev database. Note also that loading data into Amazon Redshift is not required for running queries. I can use data from an S3 data lake in my queries by creating an external schema and an external table.

The sample database consists of seven tables and tracks sales activity for a fictional “TICKIT” website, where users buy and sell tickets for sporting events, shows, and concerts.

Sample database tables relations

To configure the database schema, I run a few SQL commands to create the users, venue, category, date, event, listing, and sales tables.

Console screenshot.

Then, I download the tickitdb.zip file that contains the sample data for the database tables. I unzip it and upload the extracted files to a tickit folder in the same S3 bucket I used when configuring the IAM role.

Now, I can use the COPY command to load the data from the S3 bucket into my database. For example, to load data into the users table:

copy users from 's3://MYBUCKET/tickit/allusers_pipe.txt' iam_role default;

The file containing the data for the sales table uses tab-separated values:

copy sales from 's3://MYBUCKET/tickit/sales_tab.txt' iam_role default delimiter '\t' timeformat 'MM/DD/YYYY HH:MI:SS';
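To see what the delimiter and timeformat options in this COPY command describe, the sketch below parses one tab-separated line the same way in Python, mapping the Redshift pattern MM/DD/YYYY HH:MI:SS to the strptime format %m/%d/%Y %H:%M:%S. It assumes that HH here denotes a 24-hour clock and that saletime is the last column of the sales table; the sample line is illustrative, not copied from the data file.

```python
from datetime import datetime

# Redshift timeformat 'MM/DD/YYYY HH:MI:SS' written as a strptime pattern
# (assumption: HH is read as a 24-hour hour).
REDSHIFT_TIMEFORMAT = "%m/%d/%Y %H:%M:%S"


def parse_sales_line(line: str) -> dict:
    """Split a tab-delimited line and parse its trailing timestamp field."""
    fields = line.rstrip("\n").split("\t")
    *values, saletime = fields  # assumption: saletime is the last column
    return {"values": values,
            "saletime": datetime.strptime(saletime, REDSHIFT_TIMEFORMAT)}


# Illustrative line shaped like the sales data (values are made up).
sample = "1\t1\t100\t200\t300\t1875\t4\t728.00\t109.20\t2/18/2008 02:36:48\n"
row = parse_sales_line(sample)
print(row["saletime"])  # 2008-02-18 02:36:48
```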

After I load data in all tables, I start running some queries. For example, the following query joins five tables to find the top five sellers for events based in California (note that the sample data is for the year 2008):

select sellerid, username, (firstname ||' '|| lastname) as sellername, venuestate, sum(qtysold)
from sales, date, users, event, venue
where sales.sellerid = users.userid
and sales.dateid = date.dateid
and sales.eventid = event.eventid
and event.venueid = venue.venueid
and year = 2008
and venuestate = 'CA'
group by sellerid, username, sellername, venuestate
order by 5 desc
limit 5;

Console screenshot.

Now that my database is ready, let’s see what I can do by configuring Amazon Redshift Serverless namespaces and workgroups.

Using and Configuring Namespaces
Namespaces are collections of database data and their security configurations. In the navigation pane of the Amazon Redshift console, I choose Namespace configuration. In the list, I choose the default namespace that I just created.

In the Data backup tab, I can create or restore a snapshot or restore data from one of the recovery points that are automatically created every 30 minutes and kept for 24 hours. That can be useful to recover data in case of accidental writes or deletes.
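With a recovery point every 30 minutes and a 24-hour retention, 48 recovery points (24 h / 0.5 h) are available at any moment. The sketch below lists the timestamps that would still be restorable, assuming points land exactly on the half hour and are kept for strictly less than 24 hours. The real schedule is managed by the service, and the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

INTERVAL = timedelta(minutes=30)
RETENTION = timedelta(hours=24)


def available_recovery_points(now: datetime) -> List[datetime]:
    """Return recovery point times still retained at `now`, newest first.

    Assumption: points are created on every half hour and kept for
    strictly less than 24 hours.
    """
    # Round down to the most recent half-hour boundary.
    latest = now.replace(minute=(now.minute // 30) * 30, second=0, microsecond=0)
    points = []
    t = latest
    while now - t < RETENTION:
        points.append(t)
        t -= INTERVAL
    return points


pts = available_recovery_points(datetime(2022, 7, 12, 10, 45))
print(len(pts), pts[0], pts[-1])
# 48 2022-07-12 10:30:00 2022-07-11 11:00:00
```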

Console screenshot.

In the Security and encryption tab, I can update permissions and encryption settings, including the AWS Key Management Service (AWS KMS) key used to encrypt and decrypt my resources. In this tab, I can also enable audit logging and export the user, connection, and user activity logs.

Console screenshot.

In the Datashares tab, I can create a datashare to share data with other namespaces and AWS accounts in the same or different Regions. In this tab, I can also create a database from a share I receive from other namespaces or AWS accounts, and I can see the subscriptions for datashares managed by AWS Data Exchange.

Console screenshot.

When I create a datashare, I can select which objects to include. For example, here I want to share only the date and event tables because they don’t contain sensitive data.

Console screenshot.

Using and Configuring Workgroups
Workgroups are collections of compute resources and their network and security settings. They provide the serverless endpoint for the namespace they are configured for. In the navigation pane of the Amazon Redshift console, I choose Workgroup configuration. In the list, I choose the default workgroup that I just created.

In the Data access tab, I can update the network and security settings (for example, change the VPC, the subnets, or the security group) or make the endpoint publicly accessible. In this tab, I can also enable Enhanced VPC routing to route network traffic between my serverless database and the data repositories I use (for example, the S3 buckets used to load or unload data) through a VPC instead of the internet. To access serverless endpoints that are in another VPC or subnet, I can create a VPC endpoint managed by Amazon Redshift.

Console screenshot.

In the Limits tab, I can configure the base capacity (expressed in Redshift processing units, or RPUs) used to process my queries. Amazon Redshift Serverless scales the capacity to deal with a higher number of users. Here I also have the option to increase the base capacity to speed up my queries or decrease it to reduce costs.

In this tab, I can also set Usage limits to configure daily, weekly, and monthly thresholds to keep my costs predictable. For example, I configured a daily limit of 200 RPU-hours and a monthly limit of 2,000 RPU-hours for my compute resources. To control the data-transfer costs for cross-Region datashares, I configured a daily limit of 3 TB and a weekly limit of 10 TB. Finally, to limit the resources used by each query, I use Query limits to time out queries running for more than 60 seconds.
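The limits configured above can be expressed as a small check that compares observed usage against each threshold. This is a hypothetical sketch, not the Amazon Redshift API. The limit names, the usage dictionary, and the functions are illustrative, and the service itself decides what action to take when a limit is reached.

```python
from typing import Dict, List

# Thresholds from this walkthrough (RPU-hours for compute, TB for datashares).
LIMITS: Dict[str, float] = {
    "compute_rpu_hours_daily": 200,
    "compute_rpu_hours_monthly": 2000,
    "datashare_tb_daily": 3,
    "datashare_tb_weekly": 10,
}
QUERY_TIMEOUT_SECONDS = 60


def reached_limits(usage: Dict[str, float]) -> List[str]:
    """Return the names of limits whose usage has reached the threshold."""
    return sorted(name for name, limit in LIMITS.items()
                  if usage.get(name, 0) >= limit)


def query_should_time_out(elapsed_seconds: float) -> bool:
    """Queries running for more than the limit are timed out."""
    return elapsed_seconds > QUERY_TIMEOUT_SECONDS


print(reached_limits({"compute_rpu_hours_daily": 210, "datashare_tb_weekly": 4}))
# ['compute_rpu_hours_daily']
```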

Console screenshot.

Availability and Pricing
Amazon Redshift Serverless is generally available today in the US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Stockholm), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo) AWS Regions.

You can connect to a workgroup endpoint using your favorite client tools via JDBC/ODBC or with the Amazon Redshift query editor v2, a web-based SQL client application available on the Amazon Redshift console. When using web services-based applications (such as AWS Lambda functions or Amazon SageMaker notebooks), you can access your database and perform queries using the built-in Amazon Redshift Data API.
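For the Data API route, here is a minimal Python sketch. It assumes a boto3 client for the redshift-data service, whose execute_statement operation accepts a WorkgroupName for serverless endpoints and whose describe_statement and get_statement_result operations take the statement Id. The workgroup and database names in the usage comment are placeholders. The client is passed in as a parameter so the polling logic can be exercised without AWS credentials, and result pagination (NextToken) is ignored for brevity.

```python
import time
from typing import Any, Dict, List

TERMINAL_STATES = {"FINISHED", "FAILED", "ABORTED"}


def run_query(client: Any, workgroup: str, database: str, sql: str,
              poll_seconds: float = 1.0) -> List[List[Dict[str, Any]]]:
    """Submit SQL through the Data API and wait for its records.

    `client` is expected to behave like boto3.client("redshift-data").
    """
    statement = client.execute_statement(
        WorkgroupName=workgroup, Database=database, Sql=sql
    )
    statement_id = statement["Id"]
    while True:
        status = client.describe_statement(Id=statement_id)
        if status["Status"] in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)
    if status["Status"] != "FINISHED":
        raise RuntimeError(
            f"statement {statement_id} ended as {status['Status']}: "
            f"{status.get('Error', '')}"
        )
    # Only the first page of results is returned in this sketch.
    return client.get_statement_result(Id=statement_id)["Records"]


# Usage with real credentials (placeholder names):
#   import boto3
#   records = run_query(boto3.client("redshift-data"), "default", "dev",
#                       "select count(*) from sales")
```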

With Amazon Redshift Serverless, you pay only for the compute capacity your database consumes when active. The compute capacity scales up or down automatically based on your workload and shuts down during periods of inactivity to save time and costs. Your data is stored in managed storage, and you pay a GB-month rate.

To give you improved price performance and the flexibility to use Amazon Redshift Serverless for an even broader set of use cases, we are lowering the price from $0.50 to $0.375 per RPU-hour for the US East (N. Virginia) Region. Similarly, we are lowering the price in other Regions by an average of 25 percent from the preview price. For more information, see the Amazon Redshift pricing page.
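As a quick check on these numbers, the sketch below computes the percentage reduction in US East (N. Virginia) and the most that the daily and monthly RPU-hour limits from the earlier example would cost at the new rate, assuming usage does not go past those limits. It covers compute only; managed storage is billed separately at a GB-month rate.

```python
OLD_PRICE = 0.5    # USD per RPU-hour during the preview, US East (N. Virginia)
NEW_PRICE = 0.375  # USD per RPU-hour at general availability, same Region


def reduction_percent(old: float, new: float) -> float:
    """Percentage decrease from old to new."""
    return (old - new) / old * 100


def max_compute_cost(rpu_hour_limit: float, price: float = NEW_PRICE) -> float:
    """Compute spend if usage stops exactly at the RPU-hour limit."""
    return rpu_hour_limit * price


print(reduction_percent(OLD_PRICE, NEW_PRICE))  # 25.0
print(max_compute_cost(200))   # 75.0  (daily limit)
print(max_compute_cost(2000))  # 750.0 (monthly limit)
```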

To help you gain hands-on experience with your own use cases, we are also providing $300 in AWS credits for 90 days to try Amazon Redshift Serverless. These credits cover your costs for compute, storage, and snapshot usage of Amazon Redshift Serverless only.

Get insights from your data in seconds with Amazon Redshift Serverless.

Danilo

AWS Week in Review – June 27, 2022

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-week-in-review-june-27-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

It’s the beginning of a new week, and I’d like to start with a recap of the most significant AWS news from the previous 7 days. Last week was special because I had the privilege to be at the very first EMEA AWS Heroes Summit in Milan, Italy. It was a great opportunity for mutual learning as this community of experts shared their thoughts with AWS developer advocates, product managers, and technologists on topics such as containers, serverless, and machine learning.

Participants at the EMEA AWS Heroes Summit 2022

Last Week’s Launches
Here are the launches that got my attention last week:

Amazon Connect Cases (available in preview) – This new capability of Amazon Connect provides built-in case management for your contact center agents to create, collaborate on, and resolve customer issues. Learn more in this blog post that shows how to simplify case management in your contact center.

Many updates for Amazon RDS and Amazon Aurora – Amazon RDS Custom for Oracle now supports Oracle Database 12.2 and 18c, and Amazon RDS Multi-AZ deployments with one primary and two readable standby database instances now support M5d and R5d instances and are available in more Regions. There is also a Regional expansion for RDS Custom. Finally, PostgreSQL 14, a new major version, is now supported by Amazon Aurora PostgreSQL-Compatible Edition.

AWS WAF Captcha is now generally available – You can use AWS WAF Captcha to block unwanted bot traffic by requiring users to successfully complete challenges before their web requests are allowed to reach resources.

Private IP VPNs with AWS Site-to-Site VPN – You can now deploy AWS Site-to-Site VPN connections over AWS Direct Connect using private IP addresses. This way, you can encrypt traffic between on-premises networks and AWS via Direct Connect connections without the need for public IP addresses.

AWS Center for Quantum Networking – Research and development of quantum computers have the potential to revolutionize science and technology. To address fundamental scientific and engineering challenges and develop new hardware, software, and applications for quantum networks, we announced the AWS Center for Quantum Networking.

Simpler access to sustainability data, plus a global hackathon – The Amazon Sustainability Data Initiative catalog of datasets is now searchable and discoverable through AWS Data Exchange. As part of a new collaboration with the International Research Centre in Artificial Intelligence, under the auspices of UNESCO, you can use the power of the cloud to help the world become sustainable by participating in the Amazon Sustainability Data Initiative Global Hackathon.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
A couple of takeaways from the Amazon re:MARS conference:

Amazon CodeWhisperer (preview) – Amazon CodeWhisperer is a coding companion powered by machine learning with support for multiple IDEs and languages.

Synthetic data generation with Amazon SageMaker Ground Truth – Generate labeled synthetic image data that you can combine with real-world data to create more complete training datasets for your ML models.

Some other updates you might have missed:

AstraZeneca’s drug design program built using AWS wins innovation award – AstraZeneca received the BioIT World Innovative Practice Award at the 20th anniversary of the Bio-IT World Conference for its novel augmented drug design platform built on AWS. More in this blog post.

Large object storage strategies for Amazon DynamoDB – A blog post showing different options for handling large objects within DynamoDB and the benefits and disadvantages of each approach.

Amazon DevOps Guru for RDS under the hood – Some details of how DevOps Guru for RDS works, with a specific focus on its scalability, security, and availability.

AWS open-source news and updates – A newsletter curated by my colleague Ricardo to bring you the latest open-source projects, posts, events, and more.

Upcoming AWS Events
It’s AWS Summits season and here are some virtual and in-person events that might be close to you:

On June 30, the AWS User Group Ukraine is running an AWS Tech Conference to discuss digital transformation with AWS. Join to learn from many sessions including a fireside chat with Dr. Werner Vogels, CTO at Amazon.com.

That’s all from me for this week. Come back next Monday for another Week in Review!

Danilo

New for AWS DataSync – Move Data Between AWS and Google Cloud Storage or AWS and Microsoft Azure Files

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-datasync-move-data-between-aws-and-google-cloud-storage-or-aws-and-microsoft-azure-files/

Moving data to and from AWS Storage services can be automated and accelerated with AWS DataSync. For example, you can use DataSync to migrate data to AWS, replicate data for business continuity, and move data for analysis and processing in the cloud. You can use DataSync to transfer data to and from AWS Storage services, including Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS), and Amazon FSx. DataSync also integrates with Amazon CloudWatch and AWS CloudTrail for logging, monitoring, and alerting.

Today, we added a new capability to DataSync: migrating data between AWS Storage services and either Google Cloud Storage or Microsoft Azure Files. In this way, you can simplify your data processing or storage consolidation tasks. This also helps if you need to import, share, and exchange data with customers, vendors, or partners who use Google Cloud Storage or Microsoft Azure Files. DataSync provides end-to-end security, including encryption and integrity validation, to ensure your data arrives securely, intact, and ready to use.

Let’s see how this works in practice.

Preparing the DataSync Agent
First, I need a DataSync agent to read from, or write to, storage located in Google Cloud Storage or Azure Files. I deploy the agent on an Amazon Elastic Compute Cloud (Amazon EC2) instance. The latest DataSync Amazon Machine Image (AMI) ID is stored in the Parameter Store, a capability of AWS Systems Manager. I use the AWS Command Line Interface (CLI) to get the value of the /aws/service/datasync/ami parameter:

aws ssm get-parameter --name /aws/service/datasync/ami --region us-east-1
{
    "Parameter": {
        "Name": "/aws/service/datasync/ami",
        "Type": "String",
        "Value": "ami-0e244fe801cf5a510",
        "Version": 54,
        "LastModifiedDate": "2022-05-11T14:08:09.319000+01:00",
        "ARN": "arn:aws:ssm:us-east-1::parameter/aws/service/datasync/ami",
        "DataType": "text"
    }
}
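If you prefer to script this step, the AMI ID sits in the Parameter.Value field of the output. Here’s a minimal Python sketch that extracts it from the captured CLI output; the function name is hypothetical, and the sample is a trimmed copy of the output above. With the AWS CLI alone, the --query Parameter.Value --output text options return just the value.

```python
import json


def ami_id_from_ssm_output(raw: str) -> str:
    """Return the AMI ID from the JSON printed by `aws ssm get-parameter`."""
    payload = json.loads(raw)
    value = payload["Parameter"]["Value"]
    # EC2 image IDs start with "ami-"; fail loudly on anything else.
    if not value.startswith("ami-"):
        raise ValueError(f"unexpected parameter value: {value!r}")
    return value


# Trimmed copy of the get-parameter output shown above.
sample_output = """
{
    "Parameter": {
        "Name": "/aws/service/datasync/ami",
        "Type": "String",
        "Value": "ami-0e244fe801cf5a510",
        "Version": 54
    }
}
"""

print(ami_id_from_ssm_output(sample_output))  # ami-0e244fe801cf5a510
```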

Using the EC2 console, I start an EC2 instance using the AMI ID specified in the Value property of the parameter. For networking, I use a public subnet and the option to auto-assign a public IP address. The EC2 instance needs network access to both the source and the destination of a data moving task. The instance must also be able to receive HTTP traffic from DataSync to activate the agent.

When using AWS DataSync in a virtual private cloud (VPC) based on the Amazon VPC service, it is a best practice to use VPC endpoints to connect the agent with the DataSync service. In the VPC console, I choose Endpoints on the navigation pane and then Create endpoint. I enter a name for the endpoint and select the AWS services category.

Console screenshot.

In the Services section, I look for DataSync.

Console screenshot.

Then, I select the same VPC where I started the EC2 instance.

Console screenshot.

To reduce cross-AZ traffic, I choose the same subnet used for the EC2 instance.

Console screenshot.

The DataSync agent running on the EC2 instance needs network access to the VPC endpoint. For simplicity, I use the default security group of the VPC for both. I create the VPC endpoint and, after a few minutes, it’s ready to be used.
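The same endpoint can be described as an EC2 CreateVpcEndpoint request. This Python sketch only builds the request parameters, so it runs without AWS credentials; you could pass the result to boto3’s ec2.create_vpc_endpoint(**params). The IDs are hypothetical placeholders, and the com.amazonaws.<region>.datasync service name is the one we assume you’ll see when searching for DataSync in the Services list.

```python
def datasync_endpoint_params(region: str, vpc_id: str, subnet_id: str,
                             security_group_id: str) -> dict:
    """Build CreateVpcEndpoint parameters for a DataSync interface endpoint."""
    return {
        "VpcEndpointType": "Interface",  # AWS PrivateLink interface endpoint
        "VpcId": vpc_id,  # same VPC as the agent's EC2 instance
        "ServiceName": f"com.amazonaws.{region}.datasync",
        "SubnetIds": [subnet_id],  # same subnet as the agent, to reduce cross-AZ traffic
        "SecurityGroupIds": [security_group_id],  # default security group, shared with the agent
    }


# Hypothetical IDs for illustration only.
params = datasync_endpoint_params("us-east-1", "vpc-11111111",
                                  "subnet-22222222", "sg-33333333")
print(params["ServiceName"])  # com.amazonaws.us-east-1.datasync
```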

Console screenshot.

In the AWS DataSync console, I choose Agents from the navigation pane and then Create agent. I select Amazon EC2 for the Hypervisor.

Console screenshot.

I choose VPC endpoints using AWS PrivateLink for the Endpoint type. I select the VPC endpoint I created before and the same Subnet and Security group I used for the VPC endpoint.

I choose the option to Automatically get the activation key and type the public IP of the EC2 instance. Then, I choose Get key.

Console screenshot.

After the DataSync agent has been activated, I don’t need HTTP access anymore, and I remove that from the security groups of the EC2 instance. Now that the DataSync agent is active, I can configure tasks and locations to move my data.
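For reference, activating the agent outside the console maps to the DataSync CreateAgent API. This Python sketch assembles its parameters from the values used above (activation key, VPC endpoint, subnet, and security group), converting the subnet and security group IDs into the ARNs the API expects. It doesn’t call AWS; the result could be passed to boto3’s datasync.create_agent(**params). The activation key, account ID, agent name, and resource IDs are hypothetical.

```python
def create_agent_params(activation_key: str, agent_name: str, vpc_endpoint_id: str,
                        region: str, account_id: str, subnet_id: str,
                        security_group_id: str) -> dict:
    """Build DataSync CreateAgent parameters for an agent that uses a VPC endpoint."""
    ec2_prefix = f"arn:aws:ec2:{region}:{account_id}"
    return {
        "ActivationKey": activation_key,
        "AgentName": agent_name,
        "VpcEndpointId": vpc_endpoint_id,
        # Same subnet and security group chosen for the VPC endpoint.
        "SubnetArns": [f"{ec2_prefix}:subnet/{subnet_id}"],
        "SecurityGroupArns": [f"{ec2_prefix}:security-group/{security_group_id}"],
    }


params = create_agent_params("EXAMPLE-ACTIVATION-KEY", "multicloud-agent",
                             "vpce-44444444", "us-east-1", "111122223333",
                             "subnet-22222222", "sg-33333333")
print(params["SubnetArns"][0])  # arn:aws:ec2:us-east-1:111122223333:subnet/subnet-22222222
```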

Moving Data from Google Cloud Storage to Amazon S3
I have a few images in a Google Cloud Storage bucket, and I want to synchronize those files with an S3 bucket. In the Google Cloud console, I open the settings of the bucket. There, I create a service account with Storage Object Viewer permissions and write down the credentials (access key and secret) to access the bucket programmatically.

Back in the AWS DataSync console, I choose Tasks and then Create task.

To configure the source of the task, I create a location. I select Object storage for the Location type and choose the agent I just created. For the Server, I use storage.googleapis.com. Then, I enter the name of the Google Cloud bucket and the folder where my images are stored.

Console screenshot.

For authentication, I enter the access key and the secret I retrieved when I created the service account. I choose Next.
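The Google Cloud Storage location corresponds to an object storage location in the DataSync API (CreateLocationObjectStorage). A minimal Python sketch of its parameters follows; the bucket, folder, agent ARN, and credentials are hypothetical, and the HTTPS protocol and port 443 are our assumptions rather than values shown in the walkthrough. In real code, load the access key and secret from a secure store instead of hard-coding them.

```python
def gcs_location_params(agent_arn: str, bucket: str, folder: str,
                        access_key: str, secret_key: str) -> dict:
    """Build CreateLocationObjectStorage parameters for a Google Cloud Storage bucket."""
    return {
        "ServerHostname": "storage.googleapis.com",
        "ServerProtocol": "HTTPS",  # assumption: TLS on the default HTTPS port
        "ServerPort": 443,
        "BucketName": bucket,
        "Subdirectory": folder,  # folder (object prefix) holding the images
        "AccessKey": access_key,  # credentials created for the service account
        "SecretKey": secret_key,
        "AgentArns": [agent_arn],  # the agent running on EC2
    }


params = gcs_location_params(
    "arn:aws:datasync:us-east-1:111122223333:agent/agent-0example",
    "my-gcs-bucket", "/images", "EXAMPLE_ACCESS_KEY", "EXAMPLE_SECRET")
print(params["ServerHostname"])  # storage.googleapis.com
```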

Console screenshot.

To configure the destination of the task, I create another location. This time, I select Amazon S3 for the Location Type. I choose the destination S3 bucket and enter a folder that will be used as a prefix for the files transferred to the bucket. I use the Autogenerate button to create the IAM role that will give DataSync permissions to access the S3 bucket.
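In API terms, the destination is an S3 location (CreateLocationS3): the bucket ARN, the folder used as a prefix, and the IAM role that grants DataSync access, which the Autogenerate button creates for you in the console. This Python sketch builds those parameters; the bucket, prefix, and role ARN are hypothetical.

```python
def s3_location_params(bucket: str, prefix: str, role_arn: str) -> dict:
    """Build CreateLocationS3 parameters for the destination bucket."""
    return {
        "S3BucketArn": f"arn:aws:s3:::{bucket}",  # S3 bucket ARNs omit Region and account
        "Subdirectory": prefix,  # prefix for the transferred files
        "S3Config": {"BucketAccessRoleArn": role_arn},  # role DataSync assumes to access S3
    }


params = s3_location_params("my-destination-bucket", "/from-gcs",
                            "arn:aws:iam::111122223333:role/ExampleDataSyncS3Role")
print(params["S3BucketArn"])  # arn:aws:s3:::my-destination-bucket
```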

Console screenshot.

In the next step, I configure the task settings. I enter a name for the task. Optionally, I can fine-tune how DataSync verifies the integrity of the transferred data or set a bandwidth limit for the task.

Console screenshot.

I can also choose what data to scan and what to transfer. By default, all source data is scanned, and only data that has changed is transferred. In the Additional settings, I disable Copy object tags because tags are currently not supported with Google Cloud Storage.

Console screenshot.

I can select the schedule used to run this task. For now, I leave it Not scheduled, and I will start it manually.

Console screenshot.

For logging, I use the Autogenerate button to create a log group for DataSync. I choose Next.
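The task choices from the previous steps map onto the Options of the DataSync CreateTask API. This Python sketch is one way to express them: transfer only changed data, skip object tags because Google Cloud Storage doesn’t support them, leave bandwidth unlimited (-1), send logs to a CloudWatch log group, and omit Schedule so the task runs only when started manually. The VerifyMode value is an illustrative choice, and all ARNs are hypothetical.

```python
def create_task_params(source_arn: str, destination_arn: str, name: str,
                       log_group_arn: str, bytes_per_second: int = -1) -> dict:
    """Build DataSync CreateTask parameters mirroring the console choices."""
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": name,
        "CloudWatchLogGroupArn": log_group_arn,
        # No "Schedule" key: the task is not scheduled and is started manually.
        "Options": {
            "VerifyMode": "ONLY_FILES_TRANSFERRED",  # illustrative integrity-check setting
            "BytesPerSecond": bytes_per_second,  # -1 means no bandwidth limit
            "TransferMode": "CHANGED",  # scan everything, transfer only what changed
            "ObjectTags": "NONE",  # "Copy object tags" disabled for Google Cloud Storage
        },
    }


params = create_task_params(
    "arn:aws:datasync:us-east-1:111122223333:location/loc-0source",
    "arn:aws:datasync:us-east-1:111122223333:location/loc-0destination",
    "gcs-to-s3-images",
    "arn:aws:logs:us-east-1:111122223333:log-group:/aws/datasync",
)
print(params["Options"]["TransferMode"])  # CHANGED
```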

Console screenshot.

I review the configurations and create the task. Now, I start the data moving task from the console. After a few minutes, the files are synced with my S3 bucket and I can access them from the S3 console.
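Starting the task from a script instead of the console means calling StartTaskExecution and then checking the execution status until it finishes. Here’s a minimal Python polling sketch. To keep it testable without AWS, it takes the describe call as a function; with boto3 you might pass lambda arn: client.describe_task_execution(TaskExecutionArn=arn), where client is a DataSync client. The wait_for_execution name and the 30-second interval are our own choices.

```python
import time
from typing import Callable, Dict

# Task execution statuses after which no further progress happens.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_execution(describe: Callable[[str], Dict], execution_arn: str,
                       poll_seconds: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a DataSync task execution until it succeeds or fails; return the final status."""
    while True:
        status = describe(execution_arn)["Status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)  # still queued, launching, preparing, transferring, or verifying


# Offline usage example with a fake describe function that walks through statuses.
statuses = iter(["LAUNCHING", "TRANSFERRING", "VERIFYING", "SUCCESS"])
final = wait_for_execution(
    lambda arn: {"Status": next(statuses)},
    "arn:aws:datasync:us-east-1:111122223333:task/task-0example/execution/exec-0example",
    sleep=lambda seconds: None,
)
print(final)  # SUCCESS
```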

Console screenshot.

Moving Data from Azure Files to Amazon FSx for Windows File Server
I take a lot of pictures, and I also have a few images in an Azure file share. I want to synchronize those files with an Amazon FSx for Windows file system. In the Azure console, I select the file share and choose the Connect button to generate a PowerShell script that checks if this storage account is accessible over the network.

$connectTestResult = Test-NetConnection -ComputerName <SMB_SERVER> -Port 445
if ($connectTestResult.TcpTestSucceeded) {
    # Save the password so the drive will persist on reboot
    cmd.exe /C "cmdkey /add:`"danilopsync.file.core.windows.net`" /user:`"localhost\<USER>`" /pass:`"<PASSWORD>`""
    # Mount the drive
    New-PSDrive -Name Z -PSProvider FileSystem -Root "\\danilopsync.file.core.windows.net\<SHARE_NAME>" -Persist
} else {
    Write-Error -Message "Unable to reach the Azure storage account via port 445. Check to make sure your organization or ISP is not blocking port 445, or use Azure P2S VPN, Azure S2S VPN, or Express Route to tunnel SMB traffic over a different port."
}

From this script, I grab the information I need to configure the DataSync location:

  • SMB Server
  • Share Name
  • User
  • Password
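Copying these values by hand works fine; if you’d rather extract them from the generated script, a few regular expressions are enough. This Python sketch is written against the script format above and assumes that the DataSync User is the part after localhost\ in the /user: argument; verify that against your own setup. The sample account name, share, and key are hypothetical.

```python
import re

# Patterns for the lines of the Azure-generated PowerShell script.
ROOT_RE = re.compile(r'-Root\s+"\\\\([^\\"]+)\\([^"]+)"')  # matches "\\server\share"
USER_RE = re.compile(r'/user:`"([^`]+)`"')
PASS_RE = re.compile(r'/pass:`"([^`]+)`"')


def smb_settings_from_script(script: str) -> dict:
    """Extract SMB server, share name, user, and password from the Connect script."""
    root = ROOT_RE.search(script)
    user = USER_RE.search(script)
    password = PASS_RE.search(script)
    if not (root and user and password):
        raise ValueError("script does not match the expected format")
    return {
        "server": root.group(1),
        "share": root.group(2),
        "user": user.group(1).split("\\")[-1],  # drop the "localhost\" prefix
        "password": password.group(1),
    }


sample_script = r'''
cmd.exe /C "cmdkey /add:`"myaccount.file.core.windows.net`" /user:`"localhost\myaccount`" /pass:`"example-key`""
New-PSDrive -Name Z -PSProvider FileSystem -Root "\\myaccount.file.core.windows.net\images" -Persist
'''
print(smb_settings_from_script(sample_script))
```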

Back in the AWS DataSync console, I choose Tasks and then Create task.

To configure the source of the task, I create a location. I select Server Message Block (SMB) for the Location Type and the agent I created before. Then, I use the information I found in the script to enter the SMB Server address, the Share name, and the User/Password to use for authentication.
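The SMB location corresponds to the DataSync CreateLocationSmb API. In this Python sketch, the share name becomes the Subdirectory path (our reading of the API, where the share is given as a path such as /share-name); the agent ARN and credentials are hypothetical.

```python
def smb_location_params(agent_arn: str, server: str, share: str,
                        user: str, password: str) -> dict:
    """Build CreateLocationSmb parameters for an Azure Files share."""
    return {
        "ServerHostname": server,  # SMB server from the Azure script
        "Subdirectory": "/" + share.strip("/"),  # share name expressed as a path
        "User": user,
        "Password": password,  # in practice, read this from a secret store
        "AgentArns": [agent_arn],
    }


params = smb_location_params(
    "arn:aws:datasync:us-east-1:111122223333:agent/agent-0example",
    "myaccount.file.core.windows.net", "images", "myaccount", "example-key")
print(params["Subdirectory"])  # /images
```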

Console screenshot.

To configure the destination of the task, I again create a location. This time, I choose Amazon FSx for the Location type. I select an FSx for Windows file system that I created before and use the default share name. I use the default security group to connect to the file system. Because I am using AWS Directory Service for Microsoft Active Directory with FSx for Windows File Server, I use the credentials of a user member of the AWS Delegated FSx Administrators and Domain Admins groups. For more information, see Creating a location for FSx for Windows File Server in the documentation.
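Through the API, the FSx destination is created with CreateLocationFsxWindows, which takes the file system ARN, the security group ARNs, and the credentials of the user with the group memberships described above. A Python sketch of those parameters follows; the IDs, user name, domain, and password are hypothetical, and Subdirectory is left out (add it if you need a specific share or folder).

```python
def fsx_windows_location_params(region: str, account_id: str, file_system_id: str,
                                security_group_id: str, user: str, domain: str,
                                password: str) -> dict:
    """Build CreateLocationFsxWindows parameters for an FSx for Windows File Server file system."""
    return {
        "FsxFilesystemArn": f"arn:aws:fsx:{region}:{account_id}:file-system/{file_system_id}",
        # Default security group used to connect to the file system.
        "SecurityGroupArns": [f"arn:aws:ec2:{region}:{account_id}:security-group/{security_group_id}"],
        "User": user,  # member of AWS Delegated FSx Administrators and Domain Admins
        "Domain": domain,  # the Active Directory domain of that user
        "Password": password,
    }


params = fsx_windows_location_params("us-east-1", "111122223333", "fs-0123456789abcdef0",
                                     "sg-33333333", "Admin", "corp.example.com",
                                     "example-password")
print(params["FsxFilesystemArn"])  # arn:aws:fsx:us-east-1:111122223333:file-system/fs-0123456789abcdef0
```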

Console screenshot.

In the next step, I enter a name for the task and leave all other options at their default values, in the same way I did for the previous task.

Console screenshot.

I review the configurations and create the task. Now, I start the data moving task from the console. After a few minutes, the files are synced with my FSx for Windows file system share. I mount the file system share on a Windows EC2 instance and see that my images are there.

EC2 screenshot.

When creating a task, I can reuse existing locations. For example, if I want to synchronize files from Azure Files to my S3 bucket, I can quickly select the two corresponding locations I created for this post.

Availability and Pricing
You can move your data using the AWS DataSync console, AWS Command Line Interface (CLI), or AWS SDKs to create tasks that move data between AWS storage and Google Cloud Storage buckets or Azure Files file systems. As your tasks run, you can monitor progress from the DataSync console or by using CloudWatch.

There are no changes to DataSync pricing with these new capabilities. Moving data to and from Google Cloud or Microsoft Azure is charged at the same rate as all other data sources supported by DataSync today.

You may be subject to data transfer out fees by Google Cloud or Microsoft Azure. Because DataSync compresses data in flight when copying between the agent and AWS, you may be able to reduce egress fees by deploying the DataSync agent in a Google Cloud or Microsoft Azure environment.

When using DataSync to move data from AWS to Google Cloud or Microsoft Azure, you are charged for data transfer out from EC2 to the internet. See Amazon EC2 pricing for more information.

Automate and accelerate the way you move data with AWS DataSync.

Danilo

AWS Week in Review – May 9, 2022

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-week-in-review-may-9-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Another week starts, and here’s a collection of the most significant AWS news from the previous seven days. This week is also the one-year anniversary of CloudFront Functions. It’s exciting to see what customers have built during this first year.

Last Week’s Launches
Here are some launches that caught my attention last week:

Amazon RDS supports PostgreSQL 14 with three levels of cascaded read replicas – That’s 5 replicas per instance, supporting a maximum of 155 read replicas per source instance with up to 30X more read capacity. You can now build a more robust disaster recovery architecture with the capability to create Single-AZ or Multi-AZ cascaded read replica DB instances in the same Region or in a different Region.
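The 155 figure follows from the cascade: the source can have 5 read replicas, each of those can have 5 of its own, and each of those 5 more, so 5 + 5² + 5³ = 5 + 25 + 125 = 155. A tiny Python check of that arithmetic (the function name is ours):

```python
def max_cascaded_replicas(per_instance: int, levels: int) -> int:
    """Count replicas when every instance at each level has `per_instance` replicas."""
    return sum(per_instance ** level for level in range(1, levels + 1))


print(max_cascaded_replicas(5, 3))  # 155
```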

Amazon RDS on AWS Outposts storage auto scaling – AWS Outposts extends AWS infrastructure, services, APIs, and tools to virtually any datacenter. With Amazon RDS on AWS Outposts, you can deploy managed DB instances in your on-premises environments. Now, you can turn on storage auto scaling when you create or modify DB instances by selecting a checkbox and specifying the maximum database storage size.

Amazon CodeGuru Reviewer suppression of files and folders in code reviews – With CodeGuru Reviewer, you can use automated reasoning and machine learning to detect potential code defects that are difficult to find and get suggestions for improvements. Now, you can prevent CodeGuru Reviewer from generating unwanted findings on certain files like test files, autogenerated files, or files that have not been recently updated.

Amazon EKS console now supports all standard Kubernetes resources to simplify cluster management – To make it easy to visualize and troubleshoot your applications, you can now use the console to see all standard Kubernetes API resource types (such as service resources, configuration and storage resources, authorization resources, policy resources, and more) running on your Amazon EKS cluster. More info in the blog post Introducing Kubernetes Resource View in Amazon EKS console.

AWS AppConfig feature flag Lambda Extension support for Arm/Graviton2 processors – Using AWS AppConfig, you can create feature flags or other dynamic configuration and safely deploy updates. The AWS AppConfig Lambda Extension allows you to access this feature flag and dynamic configuration data in your Lambda functions. You can now use the AWS AppConfig Lambda Extension from Lambda functions using the Arm/Graviton2 architecture.

AWS Serverless Application Model (SAM) CLI now supports enabling AWS X-Ray tracing – With the AWS SAM CLI you can initialize, build, package, test on local and cloud, and deploy serverless applications. With AWS X-Ray, you have an end-to-end view of requests as they travel through your application, making them easier to monitor and troubleshoot. Now, you can enable tracing by simply adding a flag to the sam init command.

Amazon Kinesis Video Streams image extraction – With Amazon Kinesis Video Streams you can capture, process, and store media streams. Now, you can also request images via API calls or configure automatic image generation based on metadata tags in ingested video. For example, you can use this to generate thumbnails for playback applications or to have more data for your machine learning pipelines.

AWS GameKit supports Android, iOS, and macOS games developed with Unreal Engine – With AWS GameKit, you can build AWS-powered game features directly from the Unreal Editor with just a few clicks. Now, the AWS GameKit plugin for Unreal Engine supports building games for the Win64, macOS, Android, and iOS platforms.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other updates you might have missed:

🎂 One-year anniversary of CloudFront Functions – I can’t believe it’s been one year since we launched CloudFront Functions. Now, we have tens of thousands of developers actively using CloudFront Functions, with trillions of invocations per month. You can use CloudFront Functions for HTTP header manipulation, URL rewrites and redirects, cache key manipulations/normalization, access authorization, and more. See some examples in this repo. Let’s see what customers built with CloudFront Functions:

  • CloudFront Functions enables Formula 1 to authenticate users with more than 500K requests per second. The solution is using CloudFront Functions to evaluate if users have access to view the race livestream by validating a token in the request.
  • Cloudinary is a media management company that helps its customers deliver content such as videos and images to users worldwide. For them, Lambda@Edge remains an excellent solution for applications that require heavy compute operations, but lightweight operations that require high scalability can now be run using CloudFront Functions. With CloudFront Functions, Cloudinary and its customers are seeing significantly increased performance. For example, one of Cloudinary’s customers began using CloudFront Functions, and in about two weeks it was seeing 20–30 percent better response times. The customer also estimates that they will see 75 percent cost savings.
  • Based in Japan, DigitalCube is a web hosting provider for WordPress websites. Previously, DigitalCube spent several hours completing each of its update deployments. Now, they can deploy updates across thousands of distributions quickly. Using CloudFront Functions, they’ve reduced update deployment times from 4 hours to 2 minutes. In addition, faster updates and less maintenance work result in better quality throughout DigitalCube’s offerings. It’s now easier for them to test on AWS because they can run tests that affect thousands of distributions without having to scale internally or introduce downtime.
  • Amazon.com is using CloudFront Functions to change the way it delivers static assets to customers globally. CloudFront Functions allows them to experiment with hyper-personalization at scale and optimal latency performance. They have been working closely with the CloudFront team during product development, and they like how it is easy to create, test, and deploy custom code and implement business logic at the edge.

AWS open-source news and updates – A newsletter curated by my colleague Ricardo to bring you the latest open-source projects, posts, events, and more. Read the latest edition here.

Reduce log-storage costs by automating retention settings in Amazon CloudWatch – By default, CloudWatch Logs stores your log data indefinitely. This blog post shows how you can reduce log-storage costs by establishing a log-retention policy and applying it across all of your log groups.
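The core of that idea fits in a short loop: list every log group and set a retention policy on those that don’t already have it. This Python sketch is not the solution from that blog post; it uses the describe_log_groups and put_retention_policy operations of a boto3 CloudWatch Logs client, takes the client as a parameter so it can be exercised with a fake, and the 30-day value in the usage comment is only an illustration. Retention must be one of the values CloudWatch Logs accepts, such as 30 or 90 days.

```python
from typing import List


def apply_retention(logs_client, retention_days: int) -> List[str]:
    """Set retentionInDays on every log group that differs; return the names updated."""
    updated = []
    kwargs = {}
    while True:
        page = logs_client.describe_log_groups(**kwargs)
        for group in page.get("logGroups", []):
            # Groups that never expire have no retentionInDays key at all.
            if group.get("retentionInDays") != retention_days:
                logs_client.put_retention_policy(
                    logGroupName=group["logGroupName"], retentionInDays=retention_days)
                updated.append(group["logGroupName"])
        token = page.get("nextToken")
        if not token:
            return updated
        kwargs = {"nextToken": token}  # fetch the next page of log groups


# Usage with boto3 (requires AWS credentials):
#   import boto3
#   apply_retention(boto3.client("logs"), 30)
```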

Observability for AWS App Runner VPC networking – With AWS App Runner, you can quickly deploy web applications and APIs at any scale, and with X-Ray support you can add tracing without having to manage sidecars or agents. Here’s an example of how you can instrument your applications with the AWS Distro for OpenTelemetry (ADOT).

Upcoming AWS Events
It’s AWS Summits season and here are some virtual and in-person events that might be close to you:

You can now register for re:MARS to get fresh ideas on topics such as machine learning, automation, robotics, and space. The conference will be in person in Las Vegas, June 21–24.

That’s all from me for this week. Come back next Monday for another Week in Review!

Danilo

New AWS Wavelength Zone in Toronto – The First in Canada

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-aws-wavelength-zone-in-toronto-the-first-in-canada/

Wireless communication has put us closer to each other. 5G networks extend what we can achieve to new use cases that need end-to-end low latency. With AWS Wavelength, you can deploy AWS compute and storage services within telecommunications providers’ data centers at the edge of the 5G networks. Your applications can then deliver single-digit millisecond latencies to mobile devices and end users and, at the same time, seamlessly access AWS services in the closest AWS Region.

For example, low latency enables new use cases such as:

  • Delivery of high-resolution and high-fidelity live video streaming.
  • Improved experience for augmented/virtual reality (AR/VR) applications.
  • Running machine learning (ML) inference at the edge for applications in medical diagnostics, retail, and factories.
  • Connected vehicle applications with near real-time connectivity with the cloud to improve driver assistance, autonomous driving, and in-vehicle entertainment experiences.

We opened the first AWS Wavelength Zones in 2020 in the US, and then we expanded to new countries, such as Japan, South Korea, the United Kingdom, and Germany. Today, I am happy to share that, in partnership with Bell Canada, we are expanding in a new country with a Wavelength Zone in Toronto.

What You Can Do with AWS Wavelength
As an example of what is possible with Wavelength, let’s look at food deliveries in Toronto. Most deliveries are made within 2 km, and a significant number are for just one item, such as a cup of coffee. Using a car for these deliveries is slow, expensive, and has a large carbon footprint. A better solution is provided by Tiny Mile: they use small remote-controlled robots to deliver small food orders such as coffees and sandwiches at one-tenth the cost of conventional delivery services.

Tiny Mile robot image.

Their remote staff uses the camera feed from the robots to understand the environment, read signage, and drive the robots. To scale up more efficiently, Tiny Mile can now use Bell’s public Multi-access Edge Computing (MEC) solution, delivered through AWS Wavelength, to process data and analyze the video feed in almost real time to detect obstacles and avoid collisions without manual intervention. Having computation at the edge also reduces the weight and the costs of the robots (they don’t need expensive computers onboard) and increases the amount of cargo they can carry.

Using a Wavelength Zone
I follow the instructions in Get started with AWS Wavelength in the documentation. First, I opt in to use the new Wavelength Zone. In the EC2 console for the Canada (Central) Region, I enable New EC2 Experience in the upper-left corner. In the navigation pane, I choose EC2 Dashboard. In the Account attributes section, I choose Zones. There, I enable the Canada (BELL) Wavelength Zone.

Console screenshot.

Now, I can configure networking to use the Wavelength Zone. I can either create an Amazon Virtual Private Cloud (VPC) or extend an existing VPC to include a subnet in a Wavelength Zone. In this case, I want to use a new VPC. In the VPC console, I choose Your VPCs and then Create VPC. I select the VPC only option to create subnets later. I enter a name for the VPC and choose the IPv4 CIDR block that will be used for the private addresses of the resources in this VPC. Then, I complete the creation of the VPC.

Console screenshot.

In the navigation pane, I choose Carrier Gateways and then Create carrier gateway. I enter a name and select the VPC I just created. I enable Route subnet traffic to the carrier gateway to automatically route traffic from subnets to the carrier gateway.

Console screenshot.

In the Subnets to route section, I configure a subnet residing in the Canada (BELL) – Toronto Wavelength Zone. For the subnet IPv4 CIDR Block, I use a block within the VPC range. Then, I complete the creation of the carrier gateway.
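The subnet’s CIDR block must fall inside the VPC’s range. Python’s standard ipaddress module can check this before you create anything; the CIDR blocks in the example are hypothetical.

```python
import ipaddress


def subnet_fits_vpc(vpc_cidr: str, subnet_cidr: str) -> bool:
    """Return True if subnet_cidr lies entirely within vpc_cidr."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnet = ipaddress.ip_network(subnet_cidr)
    return subnet.subnet_of(vpc)


print(subnet_fits_vpc("10.0.0.0/16", "10.0.1.0/24"))  # True
print(subnet_fits_vpc("10.0.0.0/16", "10.1.0.0/24"))  # False
```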

Console screenshot.

Now that networking is configured, I can deploy the portions of my application that require ultra-low latency in the Wavelength Zone and then connect that back to the rest of the application and the cloud services running in the Canada (Central) Region.

To run an EC2 instance in the Wavelength Zone, I use the AWS Command Line Interface (CLI) run-instances command. In this way, I can pass an option to automatically allocate and associate the Carrier IP address with the network interface of the EC2 instance. Another option is to allocate the carrier address and associate it with the network interface after I create the instance. The Carrier IP address is only valid within the telecommunications provider’s network. The carrier gateway uses NAT to translate the Carrier IP address and send traffic to the internet or to mobile devices.

aws ec2 --region ca-central-1 run-instances \
    --network-interfaces '[{"DeviceIndex":0, "AssociateCarrierIpAddress": true, "SubnetId": "subnet-0d753f7203c2cfd42"}]' \
    --image-id ami-01d29fca5bdf8f4b4 --instance-type t3.medium

To discover the IP associated with the EC2 instance in the carrier network, I use the describe-instances command:

aws ec2 --region ca-central-1 describe-instances

In the NetworkInterfaces section of the output, I find the Association and the CarrierIp:

"Association": {
  "CarrierIp": "207.61.170.56",
  "IpOwnerId": "amazon",
  "PublicDnsName": ""
}
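To pick out the Carrier IP without reading the whole output, you can filter it. With the AWS CLI, a JMESPath expression such as --query "Reservations[].Instances[].NetworkInterfaces[].Association.CarrierIp" should do it; in Python, this sketch walks the same structure of the describe-instances JSON. The instance ID in the sample is hypothetical, and the Carrier IP is the one from the output above.

```python
import json
from typing import Dict


def carrier_ips(describe_output: str) -> Dict[str, str]:
    """Map instance IDs to Carrier IP addresses found in describe-instances JSON."""
    data = json.loads(describe_output)
    found = {}
    for reservation in data.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            for eni in instance.get("NetworkInterfaces", []):
                carrier_ip = eni.get("Association", {}).get("CarrierIp")
                if carrier_ip:  # only interfaces with a Carrier IP association
                    found[instance["InstanceId"]] = carrier_ip
    return found


sample = json.dumps({"Reservations": [{"Instances": [{
    "InstanceId": "i-0123456789abcdef0",
    "NetworkInterfaces": [{"Association": {
        "CarrierIp": "207.61.170.56", "IpOwnerId": "amazon", "PublicDnsName": ""}}],
}]}]})
print(carrier_ips(sample))  # {'i-0123456789abcdef0': '207.61.170.56'}
```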

Now that the EC2 instance is running in the Wavelength Zone, I can deploy a portion of my application in the EC2 instance so that application traffic can be processed at very low latency without leaving the mobile network.

Architectural diagram.

For my next steps, I look at Deploying your first 5G enabled application with AWS Wavelength and follow the walkthrough for a common Wavelength use case: implementing machine learning inference at the edge.

Availability and Pricing
The new Wavelength Zone in Toronto, Canada, is embedded in Bell Canada’s 5G network and is available today. EC2 instances and other AWS resources in Wavelength Zones have different prices than in the parent Region. See the Wavelength pricing page for more information.

AWS Wavelength is part of AWS for the Edge services that help you deliver data processing, analysis, and storage outside AWS data centers and closer to your endpoints. These capabilities allow you to process and store data close to where it’s generated, enabling low-latency, intelligent, and real-time responsiveness.

Start using AWS Wavelength to deliver ultra-low-latency applications for 5G devices.

Danilo

AWS Week in Review – March 21, 2022

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-week-in-review-march-21-2022/

This post is part of our Week in Review series. Check back each week for a quick round up of interesting news and announcements from AWS!

Another week, another round up of the most significant AWS launches from the previous seven days! Among the news, we have new AWS Heroes and a cost reduction. There are also improvements for customers using AWS Lambda and Amazon Elastic Kubernetes Service (EKS), and a new database-to-database connectivity option for Amazon Relational Database Service (RDS).

Last Week’s Launches
Here are some launches that caught my attention last week:

AWS Billing Conductor – This new tool provides customizable pricing and cost visibility for your end customers or business units and helps when you have specific showback and chargeback needs. To get started, see Getting Started with AWS Billing Conductor. And yes, you can call it “ABC.”

Cost Reduction for Amazon Route 53 Resolver DNS Firewall – Starting from the beginning of March, we are introducing a new tiered pricing structure that reduces query processing fees as your query volume increases. We are also implementing internal optimizations to reduce the number of DNS queries for which you are charged without affecting the number of DNS queries that are inspected or introducing any other changes to your security posture. For more info, see the What’s New.

Share Test Events in the Lambda Console With Other Developers – You can now share the test events you create in the Lambda console with other team members and have a consistent set of test events across your team. This new capability is based on Amazon EventBridge schemas and is available in the AWS Regions where both Lambda and EventBridge are available. Have a look at the What’s New for more details.

Use containerd with Windows Worker Nodes Managed by Amazon EKS – containerd is a container runtime that manages the complete container lifecycle on its host system with an emphasis on simplicity, robustness, and portability. In this way, Windows worker nodes can get performance, security, and stability benefits similar to those available for Linux worker nodes. Here’s the What’s New with more info.

Amazon RDS for PostgreSQL databases can now connect and retrieve data from MySQL databases – You can connect your RDS PostgreSQL databases to Amazon Aurora MySQL-compatible, MySQL, and MariaDB databases. This capability works by adding support for mysql_fdw, an extension that implements a Foreign Data Wrapper (FDW) for MySQL. Foreign Data Wrappers are libraries that PostgreSQL databases can use to communicate with an external data source. Find more info in the What’s New.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
New AWS Heroes – It’s great to see both new and familiar faces joining the AWS Heroes program, a worldwide initiative that acknowledges individuals who have truly gone above and beyond to share knowledge in technical communities. Get to know them in the blog post!

More Than 400 Points of Presence for Amazon CloudFront – Impressive growth here, doubling the Points of Presence we had in October 2019. This number includes edge locations and mid-tier caches in AWS Regions. Did you know that edge locations are connected to the AWS Regions through the AWS network backbone? It’s a fully redundant network of multiple 100GbE parallel fiber links that circles the globe and connects with tens of thousands of networks for improved origin fetches and dynamic content acceleration.

AWS Open Source News and Updates – A newsletter curated by my colleague Ricardo where he brings you the latest open-source projects, posts, events, and much more. This week he is also sharing a short list of some of the open-source roles currently open across Amazon and AWS, covering a broad range of open-source technologies. Read edition #105 here.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

The AWS Summits Are Back – Don’t forget to register for the AWS Summits in Brussels (on March 31) and Paris (on April 12). More summits are coming in the next weeks, and we’ll let you know in these weekly posts.

That’s all from me for this week. Come back next Monday for another Week in Review!

Danilo

New for Amazon CodeGuru Reviewer – Detector Library and Security Detectors for Log-Injection Flaws

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-amazon-codeguru-reviewer-detector-library-and-security-detectors-for-log-injection-flaws/

Amazon CodeGuru Reviewer is a developer tool that detects security vulnerabilities in your code and provides intelligent recommendations to improve code quality. For example, CodeGuru Reviewer introduced Security Detectors for Java and Python code to identify security risks from the top ten Open Web Application Security Project (OWASP) categories and follow security best practices for AWS APIs and common crypto libraries. At re:Invent, CodeGuru Reviewer introduced a secrets detector to identify hardcoded secrets and suggest remediation steps to secure your secrets with AWS Secrets Manager. These capabilities help you find and remediate security issues before you deploy.

Today, I am happy to share two new features of CodeGuru Reviewer:

  • A new Detector Library describes in detail the detectors that CodeGuru Reviewer uses when looking for possible defects and includes code samples for both Java and Python.
  • New security detectors have been introduced for detecting log-injection flaws in Java and Python code, similar to what happened with the recent Apache Log4j vulnerability we described in this blog post.

Let’s see these new features in more detail.

Using the Detector Library
To help you understand more clearly which detectors CodeGuru Reviewer uses to review your code, we are now sharing a Detector Library where you can find detailed information and code samples.

These detectors help you build secure and efficient applications on AWS. In the Detector Library, you can find detailed information about CodeGuru Reviewer’s security and code quality detectors, including descriptions, their severity and potential impact on your application, and additional information that helps you mitigate risks.

Note that each detector looks for a wide range of code defects. We include one noncompliant and one compliant code example for each detector. However, CodeGuru uses machine learning and automated reasoning to identify possible issues. For this reason, each detector can find a range of defects in addition to the explicit code example shown on the detector’s description page.

Let’s have a look at a few detectors. One detector looks for insecure cross-origin resource sharing (CORS) policies that are too permissive and may lead to loading content from untrusted or malicious sources.

Detector Library screenshot.

Another detector checks for improper input validation that can enable attacks and lead to unwanted behavior.

Detector Library screenshot.

Specific detectors help you use the AWS SDK for Java and the AWS SDK for Python (Boto3) in your applications. For example, there are detectors that can detect hardcoded credentials, such as passwords and access keys, or inefficient polling of AWS resources.

New Detectors for Log-Injection Flaws
Following the recent Apache Log4j vulnerability, we introduced new detectors in CodeGuru Reviewer that check whether you’re logging anything that is not sanitized and possibly executable. These detectors cover the issue described in CWE-117: Improper Output Neutralization for Logs.

These detectors work with Java and Python code and, for Java, are not limited to the Log4j library. They don’t work by looking at the version of the libraries you use, but check what you are actually logging. In this way, they can protect you if similar bugs happen in the future.

Detector Library screenshot.

According to these detectors, user-provided inputs must be sanitized before they are logged. This prevents an attacker from using this input to break the integrity of your logs, forge log entries, or bypass log monitors.
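As an illustration of the CWE-117 issue, here is a minimal Python sketch. It assumes that neutralizing carriage returns and line feeds is enough for your log format, because those characters are what let one input forge an extra log line. The function names and the "bookcase" logger name are hypothetical. This is not the remediation text from the Detector Library, and it is not guaranteed to satisfy the CodeGuru detector.

```python
import logging


def sanitize_for_log(value: str) -> str:
    """Neutralize CR/LF so user input cannot start a forged log line (CWE-117)."""
    # Replace the raw control characters with visible escape sequences
    return value.replace("\r", "\\r").replace("\n", "\\n")


logger = logging.getLogger("bookcase")  # hypothetical logger name


def log_failed_login(username: str) -> None:
    # The logging module formats %s lazily; the user input is sanitized first
    logger.warning("Failed login for user %s", sanitize_for_log(username))
```

With this helper, an input such as `"alice\nINFO Login succeeded for admin"` is written as a single line containing a literal `\n` rather than as two separate entries. Other control characters, and the structured (JSON) log formats you may use, may require additional escaping.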

Availability and Pricing
These new features are available today in all AWS Regions where Amazon CodeGuru is offered. For more information, see the AWS Regional Services List.

The Detector Library is free to browse as part of the documentation. For the new detectors looking for log-injection flaws, standard pricing applies. See the CodeGuru pricing page for more information.

Start using Amazon CodeGuru Reviewer today to improve the security of your code.

Danilo

New for App Runner – VPC Support

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-app-runner-vpc-support/

With AWS App Runner, you can quickly deploy web applications and APIs at any scale. You can start with your source code or a container image, and App Runner will fully manage all infrastructure including servers, networking, and load balancing for your application. If you want, App Runner can also configure a deployment pipeline for you.

Starting today, App Runner enables your services to communicate with databases and other applications hosted in an Amazon Virtual Private Cloud (VPC). For example, you can now connect App Runner services to databases in Amazon Relational Database Service (RDS), Redis or Memcached caches in Amazon ElastiCache, or your own applications running in Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (EKS), Amazon Elastic Compute Cloud (Amazon EC2), or on-premises and connected via AWS Direct Connect.

Previously, in order for your App Runner application to connect to these resources, they needed to be publicly accessible over the internet. With this feature, App Runner applications can connect to private endpoints in your VPC, and you can enable a more secure and compliant environment by removing public access to these resources.

Within App Runner, you can now create VPC connectors that specify which VPC, subnets, and security groups to use for private networking. Once configured, you can use a VPC connector with one or more App Runner services.
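The same setup can be sketched with the AWS SDK for Python (Boto3) instead of the console. This minimal sketch assumes the App Runner client's `create_vpc_connector` and `update_service` operations and the `NetworkConfiguration`/`EgressConfiguration` request shape. The connector name, subnet and security group IDs, and helper function names are hypothetical. The two helpers only build request arguments, so you can check them without AWS credentials. `connect_service_to_vpc` makes the actual calls.

```python
def vpc_connector_request(name, subnet_ids, security_group_ids):
    """Keyword arguments for App Runner's CreateVpcConnector call."""
    if not subnet_ids:
        raise ValueError("a VPC connector needs at least one subnet")
    return {
        "VpcConnectorName": name,
        "Subnets": list(subnet_ids),
        "SecurityGroups": list(security_group_ids),
    }


def vpc_egress_configuration(connector_arn):
    """NetworkConfiguration that routes a service's outbound traffic through a VPC connector."""
    return {"EgressConfiguration": {"EgressType": "VPC", "VpcConnectorArn": connector_arn}}


def connect_service_to_vpc(service_arn, name, subnet_ids, security_group_ids):
    import boto3  # imported here so the helpers above work without the SDK installed

    apprunner = boto3.client("apprunner")
    response = apprunner.create_vpc_connector(
        **vpc_connector_request(name, subnet_ids, security_group_ids)
    )
    connector_arn = response["VpcConnector"]["VpcConnectorArn"]
    # Point the existing service's outbound traffic at the new connector
    apprunner.update_service(
        ServiceArn=service_arn,
        NetworkConfiguration=vpc_egress_configuration(connector_arn),
    )
    return connector_arn
```

Because a connector is a separate resource, you can reuse the returned ARN with `vpc_egress_configuration` for other services that have the same networking requirements.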

When connected to a VPC, all outbound traffic from your App Runner service will be routed based on the VPC routing rules. Services will not have access to the public internet (including AWS APIs) unless allowed by a route to a NAT Gateway. You can also set up VPC endpoints to connect to AWS APIs such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB to avoid NAT traffic.

The VPC connectors in App Runner work similarly to VPC networking in AWS Lambda and are based on AWS Hyperplane, the internal Amazon network function virtualization system behind AWS services and resources like Network Load Balancer, NAT Gateway, and AWS PrivateLink.

Let’s see how this works in practice with a web application connected to an RDS database.

Preparing the Amazon RDS Database
I start by configuring a database for my application. To simplify capacity management for this database, I use Amazon Aurora Serverless. In the RDS console, I create an Amazon Aurora MySQL-Compatible database. For the Capacity type, I choose Serverless. For networking, I use my default VPC and the default security group. I don’t need to make the database publicly accessible because I am going to connect using private VPC networking. To simplify connecting later, I enable AWS Identity and Access Management (IAM) database authentication.

I start an Amazon Linux EC2 instance in the same VPC. To connect from the EC2 instance to the database, I need a MySQL client. I install MariaDB, a community-developed branch of MySQL:

sudo yum install mariadb

Then, I connect to the database using the admin user.

mysql -h <DATABASE_HOST> -u admin -p

I enter the admin user password to log in. Then, I create a new user (bookuser) that is configured to use IAM authentication.

CREATE USER bookuser IDENTIFIED WITH AWSAuthenticationPlugin AS 'RDS'; 

I create the bookcase database and give permissions to the bookuser user to query the bookcase database.

CREATE DATABASE bookcase;
GRANT SELECT ON bookcase.* TO 'bookuser'@'%';
-- Switch to the new database so the following tables are created in it
USE bookcase;

To store information about some of my books, I create the authors and books tables.

CREATE TABLE authors (
  authorId INT,
  name varchar(255)
);

CREATE TABLE books (
  bookId INT,
  authorId INT,
  title varchar(255),
  year INT
);

Then, I insert some values in the two tables:

INSERT INTO authors VALUES (1, "Isaac Asimov");
INSERT INTO authors VALUES (2, "Robert A. Heinlein");
INSERT INTO books VALUES (1, 1, "Foundation", 1951);
INSERT INTO books VALUES (2, 1, "Foundation and Empire", 1952);
INSERT INTO books VALUES (3, 1, "Second Foundation", 1953);
INSERT INTO books VALUES (4, 2, "Stranger in a Strange Land", 1961);
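To preview what the application’s join query returns before deploying anything, here is a small local sketch. It uses Python’s built-in sqlite3 module with an in-memory database as a stand-in for Aurora MySQL; this is an assumption for illustration only, since SQLite is not the engine used in this walkthrough. The schema, rows, and `SELECT ... ORDER BY year` query mirror the ones above. The function name is hypothetical.

```python
import sqlite3


def sample_books():
    """Recreate the bookcase tables in memory and run the app's join query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (authorId INT, name VARCHAR(255));
        CREATE TABLE books (bookId INT, authorId INT, title VARCHAR(255), year INT);
        """
    )
    conn.executemany(
        "INSERT INTO authors VALUES (?, ?)",
        [(1, "Isaac Asimov"), (2, "Robert A. Heinlein")],
    )
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?, ?)",
        [
            (1, 1, "Foundation", 1951),
            (2, 1, "Foundation and Empire", 1952),
            (3, 1, "Second Foundation", 1953),
            (4, 2, "Stranger in a Strange Land", 1961),
        ],
    )
    # Same join and ordering used by server.py
    rows = conn.execute(
        "SELECT name, title, year FROM authors, books "
        "WHERE authors.authorId = books.authorId ORDER BY year"
    ).fetchall()
    conn.close()
    return rows
```

Each returned row is an (author, title, year) tuple in publication order. These are the values that the web page renders as list items.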

Preparing the Application Source Code Repository
With App Runner, I can deploy a new service from code hosted in a source code repository or using a container image. In this example, I use a private project that I have on GitHub.

It’s a very simple Python web application connecting to the database I just created. This is the source code of the app (server.py):

from wsgiref.simple_server import make_server
from pyramid.config import Configurator
from pyramid.response import Response
import os
import boto3
import mysql.connector

DATABASE_REGION = 'us-east-1'
DATABASE_CERT = 'cert/us-east-1-bundle.pem'
DATABASE_HOST = os.environ['DATABASE_HOST']
DATABASE_PORT = int(os.environ['DATABASE_PORT'])  # environment variables are strings
DATABASE_USER = os.environ['DATABASE_USER']
DATABASE_NAME = os.environ['DATABASE_NAME']

os.environ['LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN'] = '1'

PORT = int(os.environ.get('PORT', '8080'))  # fall back to App Runner's default port 8080

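# The RDS client generates a short-lived IAM authentication token
# (valid for 15 minutes) that is used in place of a database password
rds = boto3.client('rds')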

try:
    token = rds.generate_db_auth_token(
        DBHostname=DATABASE_HOST,
        Port=DATABASE_PORT,
        DBUsername=DATABASE_USER,
        Region=DATABASE_REGION
    )
    mydb = mysql.connector.connect(
        host=DATABASE_HOST,
        user=DATABASE_USER,
        passwd=token,
        port=DATABASE_PORT,
        database=DATABASE_NAME,
        ssl_ca=DATABASE_CERT
    )
except Exception as e:
    print('Database connection failed due to {}'.format(e))          

def all_books(request):
    mycursor = mydb.cursor()
    mycursor.execute('SELECT name, title, year FROM authors, books WHERE authors.authorId = books.authorId ORDER BY year')
    title = 'Books'
    message = '<html><head><title>' + title + '</title></head><body>'
    message += '<h1>' + title + '</h1>'
    message += '<ul>'
    for (name, title, year) in mycursor:
        message += '<li>' + name + ' - ' + title + ' (' + str(year) + ')</li>'
    message += '</ul>'
    message += '</body></html>'
    return Response(message)

if __name__ == '__main__':

    with Configurator() as config:
        config.add_route('all_books', '/')
        config.add_view(all_books, route_name='all_books')
        app = config.make_wsgi_app()
    server = make_server('0.0.0.0', PORT, app)
    server.serve_forever()

The application uses the AWS SDK for Python (boto3) for IAM database authentication, the Pyramid web framework, and the MySQL connector for Python. The requirements.txt file describes the application dependencies:

boto3
pyramid==2.0
mysql-connector-python

To use SSL/TLS encryption when connecting to the database, I download a certificate bundle and add it to my source code repository.

Using VPC Support in AWS App Runner
In the App Runner console, I select Source code repository and the branch to use.

Console screenshot.

For the deployment settings, I choose Manual. Optionally, I could have selected the Automatic deployment trigger to have every push to this branch deploy a new version of my service.

Console screenshot.

Then, I configure the build. This is a very simple application, so I pass the build and start commands in the console:

Build command: pip install -r requirements.txt
Start command: python server.py

For more advanced use cases, I would add an apprunner.yaml configuration file to my repository as in this sample application.

Console screenshot.

In the service configuration, I add the environment variables used by the application to connect to the database. I don’t need to pass a database password here because I am using IAM authentication.

Console screenshot.

In the Security section, I select an IAM role that gives permissions to connect to the database using IAM database authentication as described in Creating and using an IAM policy for IAM database access.

Console screenshot.

Here’s the IAM policy attached to the role. I find the database Resource ID in the Configuration tab of the RDS console.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "rds-db:connect"
            ],
            "Resource": [
                "arn:aws:rds-db:<REGION>:<ACCOUNT>:dbuser:<DB_RESOURCE_ID>/<DB_USER>"
            ]
        }
    ]
}

For the role trust policy, I follow the instructions for instance roles in How App Runner works with IAM.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "tasks.apprunner.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
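If you prefer to create the role programmatically, the two JSON documents above can be generated and attached with Boto3. This is a minimal sketch that assumes the IAM client’s `create_role` and `put_role_policy` operations. The inline policy name `rds-db-connect` and the helper names are hypothetical. The region, account ID, and database Resource ID are placeholders that you supply.

```python
import json


def rds_connect_policy(region, account_id, db_resource_id, db_user):
    """Permissions policy allowing IAM database authentication as one database user."""
    arn = f"arn:aws:rds-db:{region}:{account_id}:dbuser:{db_resource_id}/{db_user}"
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": ["rds-db:connect"], "Resource": [arn]}],
    }


def apprunner_instance_trust_policy():
    """Trust policy that lets App Runner service instances assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "tasks.apprunner.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_instance_role(role_name, permissions_policy):
    import boto3  # imported here so the policy builders work without the SDK

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(apprunner_instance_trust_policy()),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="rds-db-connect",  # hypothetical inline policy name
        PolicyDocument=json.dumps(permissions_policy),
    )
```

Scoping the `Resource` to a single `dbuser` ARN means the role can authenticate only as `bookuser` on that one database.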

For Networking, I select the new option to use a Custom VPC for outgoing network traffic and then add a new VPC connector.

Console screenshot.

To add a new VPC connector, I enter a name and then select the VPC, subnets, and security groups to use. Here, I select all the subnets of my default VPC and the default security group. In this way, the App Runner service will be able to connect to the RDS database.

Console screenshot.

Next time, when I configure another application with the same VPC networking requirements, I can just select the VPC connector I created before.

Console screenshot.

I review all the settings and then create and deploy the service.

After a few minutes, the service is running, and I choose the default domain to open a new tab in my browser. The application is connected to the database using VPC networking and performs a SQL query to join the books and authors tables and provide some reading suggestions. It works!

Browser screenshot.

Availability and Pricing
VPC connectors are available in all AWS Regions where AWS App Runner is offered. For more information, see the Regional Services List. There is no additional cost for using this feature, but you pay the standard pricing for data transmission or any NAT gateway or VPC endpoints you set up. You can set up VPC connectors with the AWS Management Console, AWS Command Line Interface (CLI), AWS SDKs, and AWS CloudFormation.

With VPC connectors, you can deploy your applications using App Runner and connect them to your private databases, caches, and applications running in a VPC or on-premises and connected via AWS Direct Connect.

Build and run web applications at any scale and connect to your private VPC resources with AWS App Runner.

Danilo

New for AWS Backup – Support for VMware and VMware Cloud on AWS

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-backup-support-for-vmware-and-vmware-cloud-on-aws/

Today, I am happy to announce AWS Backup support for VMware, a new capability that enables you to centralize and automate data protection of virtual machines (VMs) running on VMware on premises and VMware Cloud on AWS. You can now use a single, centrally managed policy in AWS Backup to protect these VMware environments together with 12 AWS compute, storage, and database services already supported by AWS Backup. You can then use AWS Backup to restore VMware workloads to on-premises data centers and VMware Cloud on AWS.

In addition, AWS Backup Audit Manager lets you consistently demonstrate compliance by monitoring backup, copy, and restore operations and generating auditor-ready reports to satisfy your data governance and regulatory requirements.

Let’s see how this works in practice.

Using AWS Backup Support for VMware
There are three steps to back up VMware virtual machines (VMs) with AWS Backup:

  1. Create a gateway to connect AWS Backup to your hypervisor.
  2. Connect to your hypervisor through the gateway.
  3. Assign virtual machines managed by your hypervisor to a backup plan.

AWS Backup Support for VMware diagram

On the left pane of the AWS Backup console, there is a new External resources section. There, I choose Gateways and then Create gateway. This AWS Backup gateway helps with discovery of the on-premises VMware environment and acts as a cloud gateway to send and receive data.

I download the Open Virtualization Format (OVF) file of the AWS Backup gateway and follow the instructions to deploy the gateway using the VMware vSphere client. I am using an internal test and development VMware environment for this walkthrough.

VMware vCenter screenshot.

After deploying the gateway in my VMware environment, I come back to the AWS Backup console. I enter a name for the gateway (for simplicity, I use the same name as the gateway VM) and the IP address of the gateway VM. Optionally, I can add tags to help organize and track my setup. Then, I create the gateway.

Console screenshot.

Now, I choose Add hypervisor. I write a name for the hypervisor and the IP address of the VMware vCenter server host.

Console screenshot.

I enter the username and password of a service account that I created for AWS Backup on the Active Directory domain. The username should include the domain (for example, username@domain). Then, I choose the encryption key to protect the service account credentials. If I don’t choose my own AWS Key Management Service (KMS) key, AWS Backup encrypts the username and password using a key that AWS owns and manages.

Console screenshot.

I select the gateway to connect to the hypervisor and choose Test gateway connection. This test helps ensure that the gateway can communicate with the hypervisor before I complete the configuration. Optionally, I can add tags to help organize and track my setup. Then, I add the hypervisor.

Console screenshot.

After a few minutes, the hypervisor is online, and I see the VMs managed by vCenter in the AWS Backup console. I can now use these virtual machines as resources in my backup plans in the same way as the other AWS compute, storage, and database resources supported by AWS Backup.

Console screenshot.

I create a new backup plan and start with a template. The rules of the template enforce daily backups with five weeks of retention and monthly backups with one year of retention. I can customize these rules based on my requirements.

Console screenshot.
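The same two rules can be written as a backup plan document for the AWS Backup API. This minimal sketch assumes Boto3’s `create_backup_plan(BackupPlan=...)` request shape. The rule names, the `Default` vault, and the 05:00 UTC cron schedules are illustrative assumptions and are not the template’s exact values. Five weeks of retention is 5 × 7 = 35 days, and one year is 365 days.

```python
def daily_monthly_plan(plan_name, vault_name="Default"):
    """Backup plan with daily backups kept 5 weeks and monthly backups kept 1 year."""
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": "DailyBackups",  # hypothetical rule name
                "TargetBackupVaultName": vault_name,
                "ScheduleExpression": "cron(0 5 ? * * *)",  # every day at 05:00 UTC
                "Lifecycle": {"DeleteAfterDays": 5 * 7},  # five weeks
            },
            {
                "RuleName": "MonthlyBackups",  # hypothetical rule name
                "TargetBackupVaultName": vault_name,
                "ScheduleExpression": "cron(0 5 1 * ? *)",  # first day of each month
                "Lifecycle": {"DeleteAfterDays": 365},  # one year
            },
        ],
    }


def create_plan(plan):
    import boto3  # imported here so the plan builder works without the SDK

    backup = boto3.client("backup")
    return backup.create_backup_plan(BackupPlan=plan)["BackupPlanId"]
```

Keeping the plan as data makes it straightforward to adjust retention or schedules before creating the plan. VMs are then assigned to the plan in a separate resource-assignment step, as shown next.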

Then, I choose to assign resources to the backup plan, and I select three VMs.

Console screenshot.

If needed, you can create an on-demand backup in the Protected resources section of the console. For example, here I am starting an on-demand backup for one of the VMs.

Console screenshot.

When a backup is complete, VMs are added to the list of the protected resources, and I can initiate a restore.

Console screenshot.

I select the backup and choose Restore. Then, I enter the restore location, which can be the same VMware environment I used for the backup or another one (for example, VMware Cloud on AWS). Next, I specify the name, path, compute resource name, and datastore to use for the restore. Finally, I choose Restore backup.

Console screenshot.

I monitor the status of my backup and restore jobs from the AWS Backup console. To monitor backup and restore metrics over a period of time, I can use Amazon CloudWatch metrics, logs, and alarms. I can also send events to Amazon EventBridge to receive notifications once a job completes or fails.
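For the EventBridge notifications, a rule needs an event pattern. This minimal sketch assumes that AWS Backup emits events with source `aws.backup`, detail-type `Backup Job State Change`, and a `detail.state` field with values such as `COMPLETED` and `FAILED`. Confirm these names, and the corresponding fields for restore jobs, against the AWS Backup EventBridge documentation. The `matches` helper is a deliberately simplified, hypothetical matcher for exact values only, not EventBridge’s full matching logic. `create_notification_rule` assumes the EventBridge `put_rule(Name=..., EventPattern=...)` call.

```python
import json


def backup_job_event_pattern(states=("COMPLETED", "FAILED")):
    """EventBridge pattern for AWS Backup backup-job state changes (assumed field names)."""
    return {
        "source": ["aws.backup"],
        "detail-type": ["Backup Job State Change"],
        "detail": {"state": list(states)},
    }


def matches(pattern, event):
    """Simplified matcher: exact values in lists, nested objects; no prefix/numeric filters."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def create_notification_rule(rule_name):
    import boto3  # imported here so the pattern helpers work without the SDK

    events = boto3.client("events")
    return events.put_rule(Name=rule_name, EventPattern=json.dumps(backup_job_event_pattern()))
```

You would then add a target, such as an Amazon SNS topic, to the rule to receive the notifications.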

Availability and Pricing
AWS Backup support for VMware is available in the US East (N. Virginia, Ohio), US West (N. California, Oregon), GovCloud (US-East, US-West), Canada (Central), Europe (Frankfurt, Ireland, London, Milan, Paris, Stockholm), South America (São Paulo), Asia Pacific (Hong Kong, Mumbai, Seoul, Singapore, Sydney, Tokyo, Osaka), Middle East (Bahrain), and Africa (Cape Town) Regions. Please see the AWS Regional Services List for more information.

AWS Backup supports VMware ESXi 6.7.x and 7.0.x VMs running on NFS, VMFS, and VSAN data stores on premises and in VMware Cloud on AWS. In addition, AWS Backup supports both SCSI Hot-Add and Network Block Device (NBD) transport modes for copying data from source VMs to AWS.

With AWS Backup support for VMware, you pay using the same dimensions that AWS Backup uses today: backup storage, restore, and cross-region data transfer. For more information, see the AWS Backup pricing page.

Your VM backups are stored in a backup vault. All backups stored and managed by AWS Backup are replicated to 3 Availability Zones (AZs) in the Region and designed for 99.999999999 percent (11 9s) durability and 99.99 percent (4 9s) of service availability.

AWS Backup takes a first full backup of each VM, followed by incremental-forever backups, which you can create on demand or on a schedule configured in your backup plan. AWS Backup always performs full restores even though backups are stored as incremental, so you benefit from the storage cost savings of incremental backups while still restoring easily.
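To see why incremental storage and full restores are compatible, here is a toy model of an incremental-forever backup chain. It assumes a disk made of fixed, numbered blocks that are never deleted. It says nothing about how AWS Backup actually stores or deduplicates VM data. Each backup stores only the blocks that changed since the previous one, and a restore rebuilds the complete disk by layering every backup up to the chosen recovery point.

```python
def incremental_backups(snapshots):
    """Given successive disk states (block index -> bytes), return what each backup stores."""
    stored = []
    previous = {}
    for disk in snapshots:
        # First pass: previous is empty, so every block is "changed" (a full backup)
        changed = {i: data for i, data in disk.items() if previous.get(i) != data}
        stored.append(changed)
        previous = disk
    return stored


def full_restore(stored, point):
    """Rebuild the complete disk at a recovery point by applying backups 0..point in order."""
    disk = {}
    for delta in stored[: point + 1]:
        disk.update(delta)
    return disk
```

In this model, storage grows with the amount of changed data. Every restore still returns a complete disk image, which corresponds to the full-restore behavior described above.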

Centrally protect your VMware environments and your AWS compute, storage, and database resources with AWS Backup.

Danilo

New for AWS Control Tower – Region Deny and Guardrails to Help You Meet Data Residency Requirements

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-control-tower-region-deny-and-guardrails-to-help-you-meet-data-residency-requirements/

Many customers, such as those in highly regulated industries and the public sector, want to have control over where their data is stored and processed. AWS already offers many tools and features to comply with local laws and regulations, but we want to provide a simplified way to translate data residency requirements into controls that can be applied to single- and multi-account environments.

Starting today, you can use AWS Control Tower to deploy data residency preventive and detective controls, referred to as guardrails. These guardrails will prevent provisioning resources in unwanted AWS Regions by restricting access to AWS APIs through service control policies (SCPs) built and managed by AWS Control Tower. In this way, content cannot be created or transferred outside of your selected Regions at the infrastructure level. In this context, content can be software (including machine images), data, text, audio, video, or images hosted on AWS for processing or storage. For example, AWS customers in Germany can deny access to AWS services in Regions outside of Frankfurt with the exception of global services such as AWS Identity and Access Management (IAM) and AWS Organizations.

AWS Control Tower also offers guardrails to further control data residency in underlying AWS service options, for example, blocking Amazon Simple Storage Service (Amazon S3) cross-region replication or blocking the creation of internet gateways.

The AWS account used for managing AWS Control Tower is not restricted by the new Region deny settings. That account can be used for remediation if you have data in an unwanted Region before enabling Region deny.

Detective guardrails are implemented via AWS Config rules and can further detect unexpected configuration changes that should not be allowed.

Data residency at the application level remains part of the shared responsibility model, but these controls can help you restrict what infrastructure and application teams can do on AWS.

Using Data Residency Guardrails in AWS Control Tower
To use the new data residency guardrails, you need to have created a landing zone using AWS Control Tower. See Plan your AWS Control Tower landing zone for more information.

To see all the new controls that are available, I select Guardrails on the left pane of the AWS Control Tower console and then find those in the Data Residency category. I sort results by Behavior. Guardrails that have a Prevention behavior are implemented as SCPs. Those that have a Detection behavior are implemented as AWS Config rules.

Console screenshot.

The most interesting guardrail is probably the one denying access to AWS based on the requested AWS Region. I choose it from the list and find that it is different from the other guardrails because it affects all Organizational Units (OUs) and cannot be activated here but must be activated in the landing zone settings.

Console screenshot.

Below the Overview, in the Guardrail components, there is a link to the full SCP for this guardrail, and I can see the list of the AWS APIs that, when this setting is enabled, are still going to be allowed towards non-governed Regions. Depending on your requirements, some of those services, such as Amazon CloudFront or AWS Global Accelerator, can be further limited by a custom SCP.
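To build intuition for how a Region deny SCP behaves, here is a rough local model. It assumes a single Deny statement that uses `NotAction` for exempt global services and a `StringNotEquals` condition on the `aws:RequestedRegion` key. The exemption list is abbreviated and hypothetical: it contains only services mentioned in this post, whereas the SCP that AWS Control Tower manages is longer and should be read in the console. The Regions are the four that this walkthrough governs. `is_denied` is a simplified evaluator and not the full IAM policy evaluation logic.

```python
from fnmatch import fnmatchcase

# The four governed Regions used in this walkthrough
GOVERNED_REGIONS = ["us-east-1", "us-west-2", "eu-west-1", "eu-central-1"]

# Abbreviated, illustrative exemptions (global services mentioned in this post)
EXEMPT_ACTIONS = ["iam:*", "organizations:*", "cloudfront:*", "globalaccelerator:*"]


def region_deny_scp(allowed_regions, exempt_actions):
    """Simplified SCP denying every non-exempt action outside the allowed Regions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideGovernedRegions",
                "Effect": "Deny",
                "NotAction": list(exempt_actions),
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": list(allowed_regions)}
                },
            }
        ],
    }


def is_denied(policy, action, region):
    """Rough evaluation of the Deny statement above; not the full IAM evaluation logic."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Deny":
            continue
        # IAM action names are case-insensitive, so compare in lowercase
        exempt = any(fnmatchcase(action.lower(), p.lower()) for p in stmt["NotAction"])
        allowed = stmt["Condition"]["StringNotEquals"]["aws:RequestedRegion"]
        if not exempt and region not in allowed:
            return True
    return False


policy = region_deny_scp(GOVERNED_REGIONS, EXEMPT_ACTIONS)
```

In this model, `ec2:RunInstances` is denied in `us-east-2` but not in `us-east-1`, and `iam:*` actions pass everywhere. This matches the behavior shown in the tests below.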

In the Landing zone settings, the Region deny guardrail is currently not enabled. I choose Modify settings and then enable the Region deny settings.

Console screenshot.

Below the Region deny settings, there is the list of AWS Regions governed by the landing zone. Those will be the Regions allowed when I enable Region deny.

Console screenshot.

In my case, I have four governed Regions, two in the US and two in Europe:

  • US East (N. Virginia), which is also the home Region for the landing zone
  • US West (Oregon)
  • Europe (Ireland)
  • Europe (Frankfurt)

I choose Update landing zone at the bottom. The update of the landing zone takes a few minutes to complete. Now, the vast majority of the AWS APIs are blocked if they are not directed to one of those governed Regions. Let’s do a few tests.

Testing Region Deny in a Sandbox Account
Using AWS Single Sign-On, I copy the AWS credentials to use the sandbox account with AWSAdministratorAccess permissions. In a terminal, I paste the commands setting the environment variables to use those credentials.

Console screenshot.

Now, I try to start a new Amazon Elastic Compute Cloud (Amazon EC2) instance in US East (Ohio), one of the non-governed Regions. In a landing zone, the default VPC is replaced by a VPC managed by AWS Control Tower. To start the instance, I need to specify a VPC subnet. Let’s find a subnet ID that I can use.

aws ec2 describe-subnets --query 'Subnets[0].SubnetId' --region us-east-2

An error occurred (UnauthorizedOperation) when calling the DescribeSubnets operation:
You are not authorized to perform this operation.

As expected, I am not authorized to perform this operation in US East (Ohio). Let’s try to start an EC2 instance without passing the subnet ID.

aws ec2 run-instances --image-id ami-0dd0ccab7e2801812 --region us-east-2 \
    --instance-type t3.small                                     

An error occurred (UnauthorizedOperation) when calling the RunInstances operation:
You are not authorized to perform this operation.
Encoded authorization failure message: <ENCODED MESSAGE>

Again, I am not authorized. More information is included in the encoded authorization failure message that I can decode as described in this article:

aws sts decode-authorization-message --encoded-message <ENCODED MESSAGE>

The decoded message (that I have omitted for brevity) tells me that there was an explicit deny to my request and includes the full SCP that caused the deny. This information is really useful for debugging these kinds of errors.
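When you debug many of these errors, a small script can turn the decoded JSON into a one-line verdict. This minimal sketch assumes Boto3’s `sts.decode_authorization_message(EncodedMessage=...)` call, which requires the `sts:DecodeAuthorizationMessage` permission. It also assumes that the decoded JSON has top-level `allowed` and `explicitDeny` fields and a `context.action` field. Treat those field names as an assumption to verify against the STS documentation. The helper names are hypothetical.

```python
import json


def summarize_decoded_message(decoded_message: str) -> str:
    """Turn the JSON returned by DecodeAuthorizationMessage into a one-line verdict."""
    info = json.loads(decoded_message)
    action = info.get("context", {}).get("action", "unknown action")
    if info.get("allowed"):
        verdict = "allowed"
    elif info.get("explicitDeny"):
        verdict = "explicit deny"  # a Deny statement, such as an SCP, matched the request
    else:
        verdict = "implicit deny"  # no statement allowed the request
    return f"{action}: {verdict}"


def decode(encoded_message: str) -> str:
    import boto3  # imported here so the summarizer works without the SDK

    sts = boto3.client("sts")
    return sts.decode_authorization_message(EncodedMessage=encoded_message)["DecodedMessage"]
```

An explicit deny points to a policy, such as the Region deny SCP, rather than to a missing permission in the role.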

Now, let’s try in US East (N. Virginia), one of the four governed Regions.

aws ec2 describe-subnets --query 'Subnets[0].SubnetId' --region us-east-1
"subnet-0f3580c0c5e56c210"

This time, the command returns the subnet ID of the first subnet returned by the request. Let’s start an instance in US East (N. Virginia) using this subnet.

aws ec2 run-instances --image-id ami-04ad2567c9e3d7893 --region us-east-1 \
    --instance-type t3.small --subnet-id subnet-0f3580c0c5e56c210

As expected, it works, and I can see the EC2 instance running in the console.

Console screenshot.

Similarly, APIs for other AWS services are limited by the Region deny settings. For example, I can’t create an S3 bucket in a non-governed Region.

Console screenshot.

When I try to create the bucket, I get an access denied error.

Console screenshot.

As expected, the creation of an S3 bucket works in a governed Region.

Even if someone gives this account access to a bucket in a non-governed Region, I would not be able to copy any data into that bucket.

Other preventive guardrails can enforce data residency, for example:

  • Disallow cross-region networking for Amazon EC2, Amazon CloudFront, and AWS Global Accelerator
  • Disallow internet access for an Amazon VPC instance managed by a customer
  • Disallow Amazon Virtual Private Network (VPN) connections

Now, let’s see how detective guardrails work.

Testing Detective Guardrails in a Sandbox Account
I enable the following guardrails for all accounts in the sandbox OU:

  • Detect whether Amazon EBS snapshots are restorable by all AWS accounts
  • Detect whether public routes exist in the route table for an internet gateway
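To illustrate what these two guardrails look for, here is a minimal sketch of equivalent checks over EC2 API responses. It is not the AWS Config rule implementation used by AWS Control Tower. It assumes the Boto3 response shapes of `describe_snapshot_attribute` (a `CreateVolumePermissions` list, where `Group: "all"` means public) and `describe_route_tables` (`Routes` with `GatewayId` and destination CIDR fields). The helper names are hypothetical, and the scan ignores pagination for brevity.

```python
def snapshot_is_public(attribute_response):
    """True if a createVolumePermission attribute response shares the snapshot with all accounts."""
    permissions = attribute_response.get("CreateVolumePermissions", [])
    return any(p.get("Group") == "all" for p in permissions)


def public_igw_routes(route_table):
    """Routes in one DescribeRouteTables entry that send all traffic to an internet gateway."""
    return [
        route
        for route in route_table.get("Routes", [])
        if route.get("GatewayId", "").startswith("igw-")
        and (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            or route.get("DestinationIpv6CidrBlock") == "::/0"
        )
    ]


def scan(region="us-east-1"):
    import boto3  # imported here so the checks above work without the SDK

    ec2 = boto3.client("ec2", region_name=region)
    public_snapshots = [
        s["SnapshotId"]
        for s in ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]
        if snapshot_is_public(
            ec2.describe_snapshot_attribute(
                Attribute="createVolumePermission", SnapshotId=s["SnapshotId"]
            )
        )
    ]
    risky_route_tables = [
        t["RouteTableId"]
        for t in ec2.describe_route_tables()["RouteTables"]
        if public_igw_routes(t)
    ]
    return public_snapshots, risky_route_tables
```

The `local` route that every VPC route table contains is ignored because its `GatewayId` is `local`, not an `igw-` identifier.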

Now, I want to see what happens if I go against these guardrails. In the EC2 console, I create an EBS snapshot for the volume of the EC2 instance I started before. Then, I modify permissions to share it with all AWS accounts.

Console screenshot.

Then, in the VPC console, I create an internet gateway, attach it to the AWS Control Tower managed VPC, and update the route table of one of the private subnets to use the internet gateway.

Console screenshot.

After a few minutes, the noncompliant resources in the sandbox account are found by the detective guardrails.

Console screenshot.

I look at the information provided by the guardrails and update my configuration to fix the issues. In a multi-account setup, I’d contact the account owner and ask for remediation.

Availability and Pricing
You can use data-residency guardrails to control resources in any AWS Region. To create a landing zone, you should start from one of the Regions where AWS Control Tower is offered. For more information, see the AWS Regional Services List. There is no additional cost for this feature. You pay the costs of other services used, such as AWS Config.

This feature provides you with a framework of controls and guidance for setting up a multi-account environment that addresses data residency requirements. Depending on your use case, you may use any subset of the new data residency guardrails.

Set up guardrails based on your data residency requirements with AWS Control Tower.

Danilo

Introducing Amazon Redshift Serverless – Run Analytics At Any Scale Without Having to Manage Data Warehouse Infrastructure

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-redshift-serverless-run-analytics-at-any-scale-without-having-to-manage-infrastructure/

We’re seeing the use of data analytics expanding among new audiences within organizations, for example with users like developers and line of business analysts who don’t have the expertise or the time to manage a traditional data warehouse. Also, some customers have variable workloads with unpredictable spikes, and it can be very difficult for them to constantly manage capacity.

With Amazon Redshift, you use SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes. Today, I am happy to introduce the public preview of Amazon Redshift Serverless, a new capability that makes it super easy to run analytics in the cloud with high performance at any scale. Just load your data and start querying. There is no need to set up and manage clusters. You pay for the duration in seconds when your data warehouse is in use, for example, while you are querying or loading data. There is no charge when your data warehouse is idle.

Amazon Redshift Serverless automatically provisions the right compute resources for you to get started. As your demand evolves with more concurrent users and new workloads, your data warehouse scales seamlessly and automatically to adapt to the changes. You can optionally specify the base data warehouse size to have additional control over cost and application-specific SLAs.

With the new serverless option, you can continue to query data in other AWS data stores, such as Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Aurora and Amazon Relational Database Service (RDS) databases.

Amazon Redshift Serverless is ideal when compute needs are difficult to predict, such as variable workloads, periodic workloads with idle time, and steady-state workloads with spikes. It is also a good fit for ad-hoc analytics that need to get started quickly and for test and development environments.

Let’s see how this works in practice.

Using Amazon Redshift Serverless
I go to the Amazon Redshift console and choose the new serverless option. The first time, I set up the serverless endpoint and configure networking and security.

I confirm the default settings that use all subnets in my default Amazon Virtual Private Cloud (VPC) and its default security group. Data is always encrypted, and I use the default AWS-owned key. Optionally, I can customize all settings. I can associate AWS Identity and Access Management (IAM) roles now or later to give permissions to access other AWS resources, for example, to load data from an S3 bucket. The configuration of the serverless endpoint is shared by all my serverless data warehouses in the same AWS account and Region.

Console screenshot.

To query data, I use Amazon Redshift Query Editor V2, a new free web-based tool that we made available a few months back. The query editor provides quick access to a few sample datasets to make it easy to learn Amazon Redshift’s SQL capabilities: TPC-H, TPC-DS, and tickit, a dataset containing information on ticket sales for events.

For a quick test, I use the tickit sample dataset so I don’t need to load any data. I prepare a query to get the number of tickets sold per date, sorted so that the dates with the most sales come first:

SELECT caldate, sum(qtysold) as sumsold
FROM   tickit.sales, tickit.date
WHERE  sales.dateid = date.dateid 
GROUP BY caldate
ORDER BY sumsold DESC;

By using the web-based query editor, I don’t need to configure a SQL client or set up the network permissions to reach the serverless endpoint. Instead, I just write my SQL query and run it.

Console screenshot.

I am a visual person. I enable the Chart option on the right of the result table and select a bar chart.

Console screenshot.

Satisfied with the clarity of the chart, I export it as an image file. In this way, I can quickly share it or include it in a report.

Bar chart

Amazon Redshift Serverless supports all the rich SQL functionality of Amazon Redshift, such as semi-structured data support. I can use any JDBC/ODBC-compliant tool or the Amazon Redshift Data API to query my data. To migrate data, I can take a snapshot of an Amazon Redshift provisioned cluster and restore it as serverless. Then, I just need to update my SQL applications to use the new serverless endpoint.
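As a rough sketch of the Data API route, the Python function below submits a SQL statement, waits for it to finish, and collects the rows. It assumes boto3’s redshift-data client and its WorkgroupName parameter, which current versions of the Data API use to address a serverless workgroup; the API surface during the preview may have differed, and the database and workgroup names are placeholders. The client is passed in so the polling logic can be exercised without AWS credentials.

```python
import time
from typing import Any, Dict, List


def _field_value(field: Dict[str, Any]) -> Any:
    # Each field is a one-key dict such as {"stringValue": "..."} or {"isNull": True}.
    if field.get("isNull"):
        return None
    return next(iter(field.values()))


def run_serverless_query(client: Any, sql: str, database: str, workgroup: str,
                         poll_seconds: float = 1.0) -> List[List[Any]]:
    """Run `sql` through the Redshift Data API and return the result rows.

    `client` is expected to behave like boto3.client("redshift-data").
    """
    # ExecuteStatement is asynchronous and returns a statement Id right away.
    statement_id = client.execute_statement(
        Database=database, WorkgroupName=workgroup, Sql=sql
    )["Id"]

    # Poll until the statement reaches a terminal state.
    while True:
        status = client.describe_statement(Id=statement_id)["Status"]
        if status == "FINISHED":
            break
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(f"statement {statement_id} ended with status {status}")
        time.sleep(poll_seconds)

    # Results can be paginated; follow NextToken until it is absent.
    rows: List[List[Any]] = []
    page = client.get_statement_result(Id=statement_id)
    while True:
        rows.extend([_field_value(f) for f in record] for record in page["Records"])
        token = page.get("NextToken")
        if not token:
            return rows
        page = client.get_statement_result(Id=statement_id, NextToken=token)
```

With boto3 installed and credentials configured, a call might look like run_serverless_query(boto3.client("redshift-data"), sql, database="dev", workgroup="default"), where "dev" and "default" stand in for your own names.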

Availability and Pricing
Amazon Redshift Serverless is available in public preview in the following AWS Regions: US East (N. Virginia), US West (N. California, Oregon), Europe (Frankfurt, Ireland), Asia Pacific (Tokyo).

With Amazon Redshift Serverless, you pay separately for the compute and storage you use. Compute capacity is measured in Redshift Processing Units (RPUs), and you pay for the workloads in RPU-hours with per-second billing. For storage, you pay for data stored in Amazon Redshift-managed storage and storage used for snapshots, similar to what you’d pay with a provisioned cluster using RA3 instances.
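The per-second metering of RPU-hours can be made concrete with a little arithmetic. The sketch below covers compute only, ignores any minimum charge the service may apply, and takes the price per RPU-hour as an input; the rate in the usage note is hypothetical, so check the pricing page for real numbers.

```python
def rpu_compute_cost(rpus: float, active_seconds: float,
                     price_per_rpu_hour: float) -> float:
    """Compute-only cost estimate for a stretch of activity.

    Capacity is priced per RPU-hour but metered per second, so the billed
    quantity is rpus * active_seconds / 3600 RPU-hours. Any minimum charge
    is ignored, and `price_per_rpu_hour` is whatever rate you plug in.
    """
    rpu_hours = rpus * active_seconds / 3600
    return rpu_hours * price_per_rpu_hour
```

For example, 32 RPUs active for 90 seconds amount to 0.8 RPU-hours, which at a hypothetical 0.50 per RPU-hour comes to 0.40.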

To control your costs, you can specify usage limits and define actions that Amazon Redshift automatically takes if those limits are reached. You can specify usage limits in RPU-hours, each associated with a daily, weekly, or monthly period. Setting higher usage limits can improve the overall throughput of the system, especially for workloads that need to handle high concurrency while maintaining consistently high performance.
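To illustrate how a limit expressed in RPU-hours relates to per-second metered usage, here is a small local model. It is not an AWS API: the UsageLimit type, the action labels, and the usage figures are all hypothetical, and the real service decides on its own when a limit counts as reached.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Period(Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"


@dataclass
class UsageLimit:
    rpu_hours: float  # limit amount, in RPU-hours
    period: Period    # window the limit applies to
    action: str       # hypothetical label for what happens on breach


def triggered_actions(limits: List[UsageLimit],
                      rpu_seconds_by_period: Dict[Period, float]) -> List[str]:
    """Return the actions of every limit whose current window is used up.

    `rpu_seconds_by_period` holds metered usage (RPU-seconds) accumulated
    so far in the current daily, weekly, and monthly windows.
    """
    actions = []
    for limit in limits:
        used_rpu_hours = rpu_seconds_by_period.get(limit.period, 0) / 3600
        if used_rpu_hours >= limit.rpu_hours:
            actions.append(limit.action)
    return actions
```

With a 10 RPU-hour daily limit, 36,000 RPU-seconds of usage in a day is exactly enough to trigger its action, while a separate 200 RPU-hour monthly limit stays untouched at 100 RPU-hours.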

Compute resources automatically shut down behind the scenes when there is no activity and resume when you are loading data or queries are coming in. When accessing your S3 data lake via the new serverless endpoint, you do not pay for Amazon Redshift Spectrum separately. You have a unified serverless experience, and data lake queries are also billed in RPU-seconds. For more information, see the Amazon Redshift pricing page.

The serverless endpoint is configured at the AWS account level. If you have multiple teams or projects and want to manage costs separately, you can use separate AWS accounts. You can share data between your provisioned clusters and serverless endpoint and between serverless endpoints across accounts.

To help you get started, we provide $500 in AWS credits upfront to try the Amazon Redshift Serverless public preview. You get the credits when you first create a database with Amazon Redshift Serverless. These credits cover your costs for compute, storage, and snapshot usage of Amazon Redshift Serverless only.

Start using Amazon Redshift Serverless today to run and scale analytics without having to provision and manage data warehouse clusters.

Danilo

AWS Lake Formation – General Availability of Cell-Level Security and Governed Tables with Automatic Compaction

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-lake-formation-general-availability-of-cell-level-security-and-governed-tables-with-automatic-compaction/

A data lake can help you break down data silos and combine different types of analytics into a centralized repository. You can store all of your structured and unstructured data in this repository. However, setting up and managing data lakes involves a lot of manual, complicated, and time-consuming tasks. AWS Lake Formation makes it easy to set up a secure data lake in days instead of weeks or months.

Today, I am excited to share the general availability of some new features that further simplify loading data, optimizing storage, and managing access to a data lake:

  • Governed Tables – A new type of Amazon Simple Storage Service (Amazon S3) tables that makes it simple and reliable to ingest and manage data at any scale. Governed tables support ACID transactions that let multiple users concurrently and reliably insert and delete data across multiple governed tables. ACID transactions also let you run queries that return consistent and up-to-date data. In case of errors in your extract, transform, and load (ETL) processes, or during an update, changes are not committed and will not be visible.
  • Storage Optimization with Automatic Compaction for governed tables – When this option is enabled, Lake Formation automatically compacts small S3 objects in your governed tables into larger objects to optimize access via analytics engines, such as Amazon Athena and Amazon Redshift Spectrum. By using automatic compaction, you don’t have to implement custom ETL jobs that read, merge, and compress data into new files, and then replace the original files.
  • Granular Access Control with Row and Cell-Level Security – You can control access to specific rows and columns in query results and within AWS Glue ETL jobs based on the identity of who is performing the action. In this way, you don’t have to create (and keep updated) subsets of your data for different roles and jurisdictions. This works for both governed and traditional S3 tables.

Using Governed Tables, ACID Transactions, and Automatic Compaction
In the Lake Formation console, I can enable governed data access and management at table creation. Automatic compaction is enabled by default, and it can be disabled using the AWS Command Line Interface (CLI) or AWS SDKs.

Console screenshot.

Governed tables have a manifest that tracks the S3 objects that are part of the table’s data. I can use the UpdateTableObjects API to keep the manifest updated when adding new objects to the table, and I can call it using the AWS CLI and SDKs. This API is implicitly used by the AWS Glue ETL library.

Moreover, I have access to new Lake Formation APIs to start, commit, or cancel a transaction. I can use these APIs to wrap data loading, data transformation, and output consistent and up-to-date data.
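The Python sketch below shows how the transaction APIs could wrap a manifest update so that new objects become visible all at once or not at all. It assumes boto3’s lakeformation client operations start_transaction, update_table_objects, commit_transaction, and cancel_transaction; the database, table, and object values are placeholders. A commit may complete asynchronously, so production code would check the returned status rather than assume it is final.

```python
from typing import Any, Dict, List


def add_objects_in_transaction(client: Any, database: str, table: str,
                               objects: List[Dict[str, Any]]) -> str:
    """Add S3 objects to a governed table's manifest inside one transaction.

    `client` is expected to behave like boto3.client("lakeformation").
    Each item in `objects` holds "Uri", "ETag", and "Size" for one S3 object.
    Returns the TransactionStatus reported by the commit call.
    """
    tx_id = client.start_transaction(TransactionType="READ_AND_WRITE")["TransactionId"]
    try:
        client.update_table_objects(
            DatabaseName=database,
            TableName=table,
            TransactionId=tx_id,
            WriteOperations=[{"AddObject": obj} for obj in objects],
        )
        response = client.commit_transaction(TransactionId=tx_id)
    except Exception:
        # Best effort: cancel so that none of the writes become visible.
        client.cancel_transaction(TransactionId=tx_id)
        raise
    # The status may not be final yet (for example, a commit still in progress).
    return response["TransactionStatus"]
```

Keeping the cancel call in the failure path is what gives the all-or-nothing behavior described above: readers never see a partially registered batch.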

Using Row and Cell-Level Security
There are many use cases where, for a table, you want to restrict access to specific columns, rows, or a combination that depends on the role of the user accessing the data. For example, a company with offices in the US, Germany, and France can create a filter for analysts based in the European Union (EU) to limit access to EU-based customers.

Console screenshot.

The filter can enforce that some columns, such as date of birth (dob) and phone, are not accessible to those analysts. Moreover, access to individual rows can be filtered by using filter expressions. You can configure row filter expressions with a SQL-compatible syntax based on the open-source PartiQL language. In this case, only rows with country equal to Germany or France (country='DE' OR country='FR') are visible.
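Conceptually, the filter above removes rows first and then hides the excluded columns. The Python model below mimics that result on an in-memory list; it is an illustration only, not how Lake Formation evaluates PartiQL expressions, and the customer records are made up.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

Row = Dict[str, Any]


def apply_cell_filter(rows: Iterable[Row], excluded_columns: Set[str],
                      row_predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep rows matching the predicate, then drop the excluded columns."""
    return [
        {col: val for col, val in row.items() if col not in excluded_columns}
        for row in rows
        if row_predicate(row)
    ]


def eu_only(row: Row) -> bool:
    # Mirrors the row filter expression: country='DE' OR country='FR'
    return row["country"] == "DE" or row["country"] == "FR"


# Hypothetical customer records.
customers = [
    {"name": "Anna", "country": "DE", "dob": "1985-04-12", "phone": "555-0100"},
    {"name": "Bob", "country": "US", "dob": "1979-11-30", "phone": "555-0101"},
    {"name": "Claire", "country": "FR", "dob": "1992-07-08", "phone": "555-0102"},
]

visible = apply_cell_filter(customers, {"dob", "phone"}, eu_only)
```

Here visible holds only the German and French customers, each reduced to name and country: the US row is filtered out, and dob and phone are hidden.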

Console screenshot.

Availability and Pricing
These new features are available today in the following AWS Regions: US East (N. Virginia), US West (Oregon), Europe (Ireland), US East (Ohio), and Asia Pacific (Tokyo).

When querying governed tables, or tables secured with row and cell-level security, you pay by the amount of data scanned (with a 10MB minimum). When using governed tables, transaction metadata is charged by the number of S3 objects tracked, and you pay for the number of transaction requests. Automatic compaction is charged based on the data processed. For more information, see the AWS Lake Formation pricing page.
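Assuming the 10 MB minimum applies to each query (the paragraph above does not spell this out), billed data for a batch of small queries adds up as in this sketch. It counts megabytes only and leaves prices out.

```python
from typing import Iterable

MIN_BILLED_MB = 10  # minimum from the post; assumed here to apply per query


def billed_megabytes(scanned_mb_per_query: Iterable[float]) -> float:
    """Total billable MB after rounding each small query up to the minimum."""
    return sum(max(mb, MIN_BILLED_MB) for mb in scanned_mb_per_query)
```

Under that assumption, three queries that scan 2, 10, and 250 MB are billed as 270 MB: the smallest one is rounded up to 10 MB.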

While implementing these features, we introduced a new Lake Formation Storage API that is integrated with tools such as AWS Glue, Amazon Athena, Amazon Redshift Spectrum, and Amazon QuickSight. You can use this storage API directly in your applications to query tables with a SQL-like syntax (joins are not supported) and get the benefits of governed tables and cell-level security.

See the detailed blog series published during the preview to learn more:

Effective data lakes using AWS Lake Formation

Take advantage of these new features to simplify the creation and management of your data lake.

Danilo

New – AWS Control Tower Account Factory for Terraform

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-aws-control-tower-account-factory-for-terraform/

AWS Control Tower makes it easier to set up and manage a secure, multi-account AWS environment. AWS Control Tower uses AWS Organizations to create what is called a landing zone, bringing ongoing account management and governance based on our experience working with thousands of customers.

If you use AWS CloudFormation to manage your infrastructure as code, you can customize your AWS Control Tower landing zone using Customizations for AWS Control Tower, a solution that helps you deploy custom templates and policies to individual accounts and organizational units (OUs) within your organization.

But what if you use Terraform to manage your AWS infrastructure?

Today, I am happy to share the availability of AWS Control Tower Account Factory for Terraform (AFT), a new Terraform module maintained by the AWS Control Tower team that allows you to provision and customize AWS accounts through Terraform using a deployment pipeline. The source code for the deployment pipeline can be stored in AWS CodeCommit, GitHub, GitHub Enterprise, or Bitbucket. With AFT, you can automate the creation of fully functional accounts that have access to all the resources they need to be productive. The module works with Terraform open source, Terraform Enterprise, and Terraform Cloud.

Architectural diagram.

Let’s see how this works in practice.

Using AWS Control Tower Account Factory for Terraform
First, I create a main.tf file that uses the AWS Control Tower Account Factory for Terraform (AFT) module:

module "aft" {
  source = "git@github.com:aws-ia/terraform-aws-control_tower_account_factory.git"

  # Required Parameters
  ct_management_account_id    = "123412341234"
  log_archive_account_id      = "234523452345"
  audit_account_id            = "345634563456"
  aft_management_account_id   = "456745674567"
  ct_home_region              = "us-east-1"
  tf_backend_secondary_region = "us-west-2"

  # Optional Parameters
  terraform_distribution = "oss"
  vcs_provider           = "codecommit"

  # Optional Feature Flags
  aft_feature_delete_default_vpcs_enabled = false
  aft_feature_cloudtrail_data_events      = false
  aft_feature_enterprise_support          = false
}

The first six parameters are required. As a prerequisite, I need to pass the IDs of four AWS accounts in my AWS organization:

  • ct_management_account_id – AWS Control Tower management account
  • log_archive_account_id – Log Archive account
  • audit_account_id – Audit account
  • aft_management_account_id – AFT management account

Then, I have to pass two AWS Regions:

  • ct_home_region – The Region from which this module will be executed. This must be the same Region where AWS Control Tower is deployed.
  • tf_backend_secondary_region – The secondary Region to replicate the Terraform backend to; the primary backend Region is the same as the AFT home Region. AFT creates this backend to track its own state, and it is also used for Terraform state when using the open-source version.
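Before running Terraform, it can help to sanity-check these six inputs. The checks below are assumptions drawn from the descriptions above (12-digit account IDs, four distinct accounts, a secondary Region different from the home Region); check_aft_inputs is a hypothetical helper, and AFT performs its own validation, which may be stricter or different.

```python
import re
from typing import Dict, List

ACCOUNT_ID = re.compile(r"\d{12}")
ACCOUNT_KEYS = [
    "ct_management_account_id",
    "log_archive_account_id",
    "audit_account_id",
    "aft_management_account_id",
]
REGION_KEYS = ["ct_home_region", "tf_backend_secondary_region"]


def check_aft_inputs(params: Dict[str, str]) -> List[str]:
    """Return human-readable problems with the required AFT inputs (empty if none)."""
    problems = []
    for key in ACCOUNT_KEYS:
        if not ACCOUNT_ID.fullmatch(str(params.get(key, ""))):
            problems.append(f"{key} must be a 12-digit AWS account ID")
    # Assumption: each role is played by a different account.
    ids = [params.get(key) for key in ACCOUNT_KEYS]
    if len(set(ids)) != len(ids):
        problems.append("the four account IDs should be distinct")
    for key in REGION_KEYS:
        if not params.get(key):
            problems.append(f"{key} is required")
    # Assumption: replicating the backend to its own Region is a mistake.
    home = params.get("ct_home_region")
    if home and home == params.get("tf_backend_secondary_region"):
        problems.append("tf_backend_secondary_region should differ from ct_home_region")
    return problems
```

The values from the main.tf file above pass every check and produce an empty list.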

The other parameters are optional and are set to their default values in the previous main.tf file:

  • terraform_distribution – To select between Terraform open source (default), Enterprise, or Cloud.
  • vcs_provider – To choose the version control system to use between AWS CodeCommit (default), GitHub, GitHub Enterprise, or Bitbucket.

These feature flags are disabled by default and can be omitted unless you want to enable them:

  • aft_feature_delete_default_vpcs_enabled – To automatically delete the default VPC for new accounts.
  • aft_feature_cloudtrail_data_events – To enable AWS CloudTrail data events for new accounts. Be aware that this option, usually required for compliance in highly regulated environments, can have an impact on your costs.
  • aft_feature_enterprise_support – To automatically enroll new accounts with Enterprise Support (if you have an Enterprise Support Plan).

Next, I initialize the project and download the plugins:

terraform init

Then, I use AWS Single Sign-On to log in to the AWS Control Tower management account and start the deployment:

terraform apply

I confirm with a yes and, after some time, the deployment is complete.

Now, I use AWS SSO again to log in to the AFT management account. In the AWS CodeCommit console, I find four repositories that I can use to customize the accounts created with AFT.

Console screenshot.

These repositories are used by pipelines managed by AWS CodePipeline to automate the account creation:

  • aft-account-request – This is where I place requests for accounts provisioned and managed by AFT.
  • aft-global-customizations – I can use this repository to customize all provisioned accounts with customer-defined resources. The resources can be created through Terraform or through Python.
  • aft-account-customizations – Here, I can customize provisioned accounts depending on the value of the account_customizations_name parameter in the aft-account-request repository. In this way, I can create different sets of customizations depending on the role the account will be used for.
  • aft-account-provisioning-customizations – This repository uses AWS Step Functions to customize the provisioning process for new accounts and simplify the integration with additional environments. State machines can use AWS Lambda functions, Amazon Elastic Container Service (Amazon ECS) or AWS Fargate tasks, custom activities hosted either on AWS or on-premises, or Amazon Simple Notification Service (SNS) and Amazon Simple Queue Service (SQS) to communicate with external applications.

Currently, these four repositories are all empty. To start, I use the code in the sources/aft-customizations-repos folder in the GitHub repo of the AFT Terraform module.

Using the example in the aft-account-request repository, I prepare a template to create a couple of AWS accounts. One of the two accounts is for a software developer.

To help software developers be productive quickly, I create a specific account customization. In the template, I set the parameter account_customizations_name equal to developer-customization.

Then, in the aft-account-customizations repository, I create a developer-customization folder where I put a Terraform template to automatically create an AWS Cloud9 EC2-based development environment for new accounts of that type. Optionally, I can extend that with my Python code, for example, to invoke internal or external APIs. Using this approach, all new accounts for software developers will have their development environment ready as they go through the delivery pipeline.
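The way the pipeline combines the two customization repositories can be pictured with a small conceptual model: global customizations apply to every account, and a folder in aft-account-customizations applies only when its name matches account_customizations_name. This is not AFT’s pipeline code; the AccountRequest type and customization_paths helper are hypothetical, and how AFT handles a missing folder may differ from the error raised here.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class AccountRequest:
    account_name: str
    account_customizations_name: Optional[str] = None


def customization_paths(request: AccountRequest, available: Set[str]) -> List[str]:
    """Conceptual model: which customization sources apply to one new account."""
    # Global customizations apply to every account AFT provisions.
    paths = ["aft-global-customizations"]
    name = request.account_customizations_name
    if name:
        # Account customizations are selected by folder name.
        if name not in available:
            raise ValueError(f"no folder named {name!r} in aft-account-customizations")
        paths.append(f"aft-account-customizations/{name}")
    return paths
```

In this model, the developer account picks up both the global customizations and the developer-customization folder, while the other account only gets the global ones.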

I push the changes to the main branch (first for the aft-account-customizations repository, then for the aft-account-request repository). This triggers the execution of the pipeline. After a few minutes, the two new accounts are ready to be used.

You can customize accounts created by AFT based on your unique requirements. For example, you can provide each account with its own specific security setup (such as IAM roles or security groups) and storage (for example, pre-configured Amazon Simple Storage Service (Amazon S3) buckets).

Availability and Pricing
AWS Control Tower Account Factory for Terraform (AFT) works in any Region where AWS Control Tower is available. There are no additional costs when using AFT. You pay for the services used by the solution. For example, when you set up AWS Control Tower, you will begin to incur costs for AWS services configured to set up your landing zone and mandatory guardrails.

When building this solution, we worked together with HashiCorp. Armon Dadgar, HashiCorp Co-Founder and CTO, told us: “Managing cloud environments with hundreds or thousands of users can be a complex and time-consuming process. Using a software delivery pipeline integrating Terraform and AWS Control Tower makes it easier to achieve consistent governance and compliance requirements across all accounts.”

The pipeline provides an account creation process that monitors when account provisioning is complete and then triggers additional Terraform modules to enhance the account with further customizations. You can configure the pipeline to use your own custom Terraform modules or pick from pre-published Terraform modules for common products and configurations.

Simplify and standardize AWS account creation using AWS Control Tower Account Factory for Terraform.

Danilo

Introducing Amazon Braket Hybrid Jobs – Set Up, Monitor, and Efficiently Run Hybrid Quantum-Classical Workloads

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-braket-hybrid-jobs-set-up-monitor-and-efficiently-run-hybrid-quantum-classical-workloads/

I find quantum computing fascinating! At its simplest level, it extends the concept of bits, which have 0 or 1 values, with quantum bits, or qubits, which can be in a combination of two different (quantum) states.

Two characteristics make qubits really interesting:

  • When you look at the value of a qubit, you get only one of the two possible states with a probability that depends on how its own states are combined.
  • Multiple qubits can be “connected” together (this is called quantum entanglement) so that by changing the state of one, even just by reading its value, you alter the states of the others.
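The first characteristic can be simulated classically for a single qubit. In the sketch below, a state is written as two complex amplitudes, and a measurement returns 0 with probability equal to the squared magnitude of the first amplitude. The amplitudes chosen are arbitrary, and entanglement, which needs a multi-qubit state vector, is not modeled.

```python
import cmath
import math
import random


def measure(alpha: complex, beta: complex, rng: random.Random) -> int:
    """Measure a qubit in state alpha|0> + beta|1> and return 0 or 1.

    The outcome is 0 with probability |alpha|^2 and 1 with probability |beta|^2.
    """
    p0 = abs(alpha) ** 2
    if not math.isclose(p0 + abs(beta) ** 2, 1.0, abs_tol=1e-9):
        raise ValueError("amplitudes must satisfy |alpha|^2 + |beta|^2 = 1")
    return 0 if rng.random() < p0 else 1


# An arbitrary state: 80% weight on |0>, 20% on |1>, with a relative phase on |1>.
ALPHA = math.sqrt(0.8)
BETA = math.sqrt(0.2) * cmath.exp(1j * math.pi / 3)
```

Measuring 10,000 freshly prepared copies of this state gives roughly 8,000 zeros. The relative phase on the second amplitude does not change those odds; it only matters once further gates combine the two states.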

These characteristics come from low-level properties described by quantum mechanics, a fundamental theory in physics that provides a description of the physical properties of nature at atomic and subatomic scales. Luckily, we don’t need a degree in quantum mechanics to use quantum computing in the same way we don’t need to be experts in semiconductors to use an ordinary computer.

Using qubits, researchers are designing new algorithms that have the potential to be much faster than what classical computers can achieve. To help speed up scientific research and software development for quantum computing, we introduced Amazon Braket at re:Invent 2019. A fully managed quantum computing service, Amazon Braket allows you to build, test, and run quantum algorithms on simulators and quantum computers.

Hybrid Algorithms and Quantum Processing Units (QPUs)
Quantum algorithms, which would be transformational in many different areas, require the execution of hundreds of thousands to millions of quantum gates. Unfortunately, the current generation of QPUs suffers from noise, creating errors that limit operations to only a few hundred or a few thousand gates before the errors take over.

To help solve this, we can take inspiration from machine learning: instead of using fixed quantum circuits, the logic that implements the algorithm, we let the algorithm “learn” by adjusting the parameters that tune the circuit to have a better chance of solving a given problem by adapting to the noise in a particular device (think of them as “self-learning quantum algorithms”).

This is similar to computer vision: instead of hand-crafting the features to distinguish a dog from a cat (which is notoriously difficult for a computer), machine learning algorithms “learn” the right features by iteratively adjusting parameters of a neural network.

A rapidly emerging area of research in quantum computing uses QPUs, the processors used by quantum computers, in the same way as GPUs are used in machine learning: Quantum circuits are parameterized, initialized with some values, and then run on the QPU. Like the weights in a neural network, these parameters are then iteratively adjusted based on the results of the computation. These so-called hybrid algorithms rely on rapid, iterative computations between classical computers and QPUs.
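A minimal sketch of that loop, assuming a one-qubit circuit with a single rotation RY(theta) and an exact classical calculation standing in for the QPU: the circuit yields the expectation value <Z> = cos(theta), the classical side computes its gradient with the parameter-shift rule (two extra circuit evaluations per parameter), and gradient descent drives <Z> toward its minimum of -1. Real hybrid algorithms have many parameters and estimate expectation values from a finite number of shots.

```python
import math
from typing import Callable


def expectation_z(theta: float) -> float:
    """<Z> after applying RY(theta) to |0>; exactly cos(theta).

    Stands in for the QPU: on hardware this value would be estimated from shots.
    """
    return math.cos(theta)


def parameter_shift_grad(f: Callable[[float], float], theta: float) -> float:
    """Gradient of f at theta via the parameter-shift rule for rotation gates."""
    return (f(theta + math.pi / 2) - f(theta - math.pi / 2)) / 2


def optimize(theta: float = 0.1, lr: float = 0.4, steps: int = 100) -> float:
    """Classical outer loop: evaluate the circuit, update the parameter, repeat."""
    for _ in range(steps):
        theta -= lr * parameter_shift_grad(expectation_z, theta)
    return theta
```

Starting from theta = 0.1, the loop converges to theta close to pi, where <Z> = -1. The gradient comes only from circuit evaluations, which is what lets the same loop run against a noisy device.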

Architectural diagram.

To run hybrid algorithms, you need to manually set up a classical infrastructure, install the required software, and manage the interaction between your quantum and classical compute processes for the duration of your hybrid algorithm. You then need to build custom monitoring solutions to visualize the progress of your algorithm to make sure it converges to the solution as expected or intervene if necessary to adjust the parameters of the algorithm.

Another big challenge is that QPUs are shared, inelastic resources, and you compete with others for access. This can slow down the execution of your algorithm. A single large workload from another customer can bring the algorithm to a halt, potentially extending your total runtime by hours. This is not only inconvenient but also impacts the quality of the results, because today’s QPUs need periodic re-calibration, which can invalidate the progress of a hybrid algorithm. In the worst case, the algorithm fails, wasting budget and time.

Introducing Amazon Braket Hybrid Jobs
Today, I am happy to introduce Amazon Braket Hybrid Jobs, a new capability of Amazon Braket that simplifies the process of setting up, monitoring, and efficiently executing hybrid quantum-classical algorithms. Jobs are fully managed so you can avoid extensive infrastructure and software management and confidently execute your algorithms quickly and predictably, with on-demand priority access to QPUs.

When you create a job, Amazon Braket spins up the job instance (providing a CPU environment based on an Amazon Elastic Compute Cloud (Amazon EC2) instance), executes the algorithm (using quantum hardware or simulators), and releases the resources once the job is completed so that you only pay for what you use. You can also define custom metrics for algorithms, which are automatically logged by Amazon CloudWatch and displayed in near real-time in the Amazon Braket console as the algorithm runs. This provides you with live insights into how your algorithm is progressing, creating the opportunity to adjust your algorithm as necessary and innovate more quickly.

Architectural diagram.

To run hybrid algorithms as jobs, you can define your algorithm using the Amazon Braket SDK or with PennyLane, an open-source library for hybrid quantum computing. Let’s see how that works in practice with a couple of examples.

Using Amazon Braket Hybrid Jobs
Before building a trainable quantum algorithm, let’s get started by running a series of fixed quantum operations, what we’ll refer to as quantum tasks. I use Python and the Amazon Braket SDK to define a circuit that constructs what is called a Bell state, a state that has a fifty-fifty chance of resolving to each of two states. It’s the quantum computing equivalent of tossing a coin.

Here’s the content of the algorithm_script.py file:

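import os

from braket.aws import AwsDevice
from braket.circuits import Circuit
from braket.jobs import save_job_result


def start_here():
    # Entry point called inside the job container.
    print("Test job started!")

    # The device selected when creating the job is passed in this environment variable.
    device = AwsDevice(os.environ["AMZN_BRAKET_DEVICE_ARN"])

    results = []

    # Hadamard on qubit 0, then CNOT from qubit 0 to qubit 1: a Bell state.
    bell = Circuit().h(0).cnot(0, 1)
    for _ in range(5):
        # Run one task of 100 shots and fetch its result once.
        counts = device.run(bell, shots=100).result().measurement_counts
        print(counts)
        results.append(counts)

    # Persist the counts so they can be downloaded after the job ends.
    save_job_result({"measurement_counts": results})

    print("Test job completed!")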

This script uses the environment variable AMZN_BRAKET_DEVICE_ARN to instantiate the device that I select when creating the job.

Quantum computing is probabilistic. For this reason, circuits need to be evaluated multiple times to get accurate results. A single run is called a shot. The higher the number of shots, the better the accuracy of the result. In this case, the circuit is run for 100 shots.
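How much better? For an outcome with probability p, the frequency estimated from n shots has a standard error of sqrt(p(1 - p)/n), so quadrupling the shots halves the error. The sketch below samples an ideal, noiseless Bell state with Python’s random generator (not a simulator from the Braket SDK) and computes that error.

```python
import math
import random
from collections import Counter


def sample_bell(shots: int, rng: random.Random) -> Counter:
    """Ideal, noiseless Bell-state sampling: '00' or '11', each with p = 0.5."""
    return Counter(rng.choice(("00", "11")) for _ in range(shots))


def standard_error(p: float, shots: int) -> float:
    """Standard error of an outcome frequency estimated from `shots` runs."""
    return math.sqrt(p * (1 - p) / shots)
```

With p = 0.5 and 100 shots, the standard error is 0.05, that is, about 5 counts, so seeing anywhere from the mid-40s to the mid-50s for either outcome in a single task is expected.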

I use the save_job_result function to store the results of my job so that I can analyze them at the end.

In the Amazon Braket console, I choose Jobs on the left panel and then Create job. To start, I give the job a name.

Console screenshot.

Then, I pass the file with the algorithm. The CPU component of the hybrid algorithm runs in a container, and I can choose which container image to use. For example, I can use a pre-built container image that includes software my algorithm depends on, such as PennyLane, TensorFlow, or PyTorch, or bring my own custom image. I select the Base container image because I don’t have external dependencies.

I leave all other settings at their default values. In this way, I use the SV1 simulator, rather than quantum hardware, to run the quantum tasks.

After some time, the job has completed, and I follow the link to the Amazon Simple Storage Service (Amazon S3) console to download the result. As expected, for each of the five tasks, the results show that the proportion of the 00 and 11 states is roughly 50:50. The proportions vary slightly because of the probabilistic nature of quantum computing.

{
    "braketSchemaHeader": {
        "name": "braket.jobs_data.persisted_job_data",
        "version": "1"
    },
    "dataDictionary": {
        "measurement_counts": [
            {
                "00": 51,
                "11": 49
            },
            {
                "00": 44,
                "11": 56
            },
            {
                "11": 51,
                "00": 49
            },
            {
                "00": 56,
                "11": 44
            },
            {
                "00": 49,
                "11": 51
            }
        ]
    },
    "dataFormat": "plaintext"
}
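To check the 50:50 claim numerically, the downloaded file can be parsed with a few lines of Python. The sketch assumes the file has exactly the structure shown above; the file name in the usage note is a placeholder.

```python
import json
from typing import List


def fractions_00(results_json: str) -> List[float]:
    """Fraction of '00' outcomes in each task of a persisted job result."""
    counts_per_task = json.loads(results_json)["dataDictionary"]["measurement_counts"]
    return [c.get("00", 0) / sum(c.values()) for c in counts_per_task]


def overall_fraction_00(results_json: str) -> float:
    """Fraction of '00' outcomes across all tasks combined."""
    counts_per_task = json.loads(results_json)["dataDictionary"]["measurement_counts"]
    zeros = sum(c.get("00", 0) for c in counts_per_task)
    total = sum(sum(c.values()) for c in counts_per_task)
    return zeros / total
```

After reading the file (for example, with open("results.json") as f: text = f.read()), the results above give per-task fractions of 0.51, 0.44, 0.49, 0.56, and 0.49, and 249 "00" outcomes out of 500 shots (0.498) overall.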

This example is quite basic because I am not running any classical logic other than initiating tasks. To see the real value, let’s look at a hybrid algorithm where the parameters of the quantum circuit are tweaked iteratively from task to task.

Using Amazon Braket Hybrid Jobs with Hybrid Algorithms
For a more advanced example, I use a well-known example of an actual hybrid algorithm, called the quantum approximate optimization algorithm (QAOA), included in the examples provided by Amazon Braket when creating a notebook from the Braket console. QAOA is a quantum algorithm that produces approximate solutions for combinatorial optimization problems. You can also find the example in this GitHub repo.

In this case, I am using QAOA to solve the Max-Cut problem: when partitioning nodes of a graph in two, what is the maximum number of edges connecting nodes between the two parts? For example, in the figure below, there are six nodes connected by eight edges. The thick yellow line partitions the nodes into two sets by crossing six edges.
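For graphs this small, Max-Cut can also be solved exactly by trying every way to split the nodes, which gives a classical reference to compare approximate answers against. Max-Cut is NP-hard in general, and the brute-force sketch below checks 2^(n-1) assignments, so it only scales to a handful of nodes. The example graphs in the usage note are hypothetical, not the one in the figure.

```python
from itertools import product
from typing import Iterable, Tuple


def max_cut(num_nodes: int, edges: Iterable[Tuple[int, int]]) -> Tuple[int, Tuple[int, ...]]:
    """Exhaustively try every 2-way split; return (best cut size, side of each node)."""
    edges = list(edges)
    best_size, best_assignment = -1, ()
    # Fixing node 0 on side 0 halves the search: swapping sides gives the same cut.
    for rest in product((0, 1), repeat=num_nodes - 1):
        assignment = (0,) + rest
        # An edge is cut when its endpoints land on different sides.
        size = sum(1 for u, v in edges if assignment[u] != assignment[v])
        if size > best_size:
            best_size, best_assignment = size, assignment
    return best_size, best_assignment
```

A triangle has a maximum cut of 2, a four-node cycle can have all 4 edges cut by alternating sides, and the complete graph on four nodes reaches 4 of its 6 edges with a 2-2 split.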

In the QAOA example, the tuning of parameters that are used to run the successive rounds of quantum tasks is optimized in a classical computing environment (such as an EC2 instance) using tools like TensorFlow or PyTorch. In one of the notebook cells, I can choose which interface to use to tune the parameters as well as the other hyperparameters in a similar way to what I’d do for machine learning training.

Braket jobs then coordinates running the classical and quantum computing parts of the algorithm and the exchange of parameters and results between them. I can just sit back and relax as I watch my algorithm converge, ready to retrieve my results from S3, as before, for deeper analysis.

Running Hybrid Algorithms in Local Mode
To test and debug hybrid algorithms quickly, the Amazon Braket SDK can run jobs in local mode. With local mode, Braket jobs are run locally on your machine (for example, your laptop). In this way, you can get fast feedback and iterate quickly during the development of your algorithms.

To run a job in local mode, you just need to replace AwsQuantumJob with LocalQuantumJob. Note that AwsQuantumJob is imported from braket.aws, while LocalQuantumJob is imported from braket.jobs.local.
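A small sketch of that swap, assuming the create class method shared by AwsQuantumJob and LocalQuantumJob and its device, source_module, and entry_point arguments. The file and function names reuse the earlier algorithm_script.py example, the device ARN is the SV1 simulator’s, and launch is a hypothetical helper. Passing the class in keeps the rest of the launcher identical between modes.

```python
from typing import Any

SV1_ARN = "arn:aws:braket:::device/quantum-simulator/amazon/sv1"


def launch(job_cls: Any, device_arn: str = SV1_ARN,
           source_module: str = "algorithm_script.py",
           entry_point: str = "algorithm_script:start_here") -> Any:
    """Create a hybrid job with either AwsQuantumJob or LocalQuantumJob.

    Only the class changes between local and managed runs; the arguments
    (device, script, and entry point in module:function form) stay the same.
    """
    return job_cls.create(
        device=device_arn,
        source_module=source_module,
        entry_point=entry_point,
    )
```

With the Amazon Braket SDK installed, launch(LocalQuantumJob) would run the job container on your own machine (which, as far as we know, requires Docker), and launch(AwsQuantumJob) would submit the same script as a managed job.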

Availability and Pricing
Amazon Braket Hybrid Jobs are available today in all AWS Regions where Amazon Braket is available. For more information, see the AWS Regional Services List.

With Amazon Braket Hybrid Jobs, you only pay for the resources you use. There is no need to deploy, configure, and manage classical infrastructure, making it easy to experiment and improve algorithms iteratively. For more information, see the Amazon Braket pricing page.

Instead of relying on theoretical studies, you can start to use quantum computers as the primary tool to understand and improve hybrid algorithms and test their applicability for industry and research use cases. In this way, you can focus on your research and not deal with setting up and coordinating these different compute resources for your experiments.

During the development of this new capability, we talked with customers and partners to understand their needs. “As application developers, Braket Hybrid Jobs gives us the opportunity to explore the potential of hybrid variational algorithms with our customers,” says Vic Putz, head of engineering at QC Ware. “We are excited to extend our integration with Amazon Braket and the ability to run our own proprietary algorithms libraries in custom containers means we can innovate quickly in a secure environment. The operational maturity of Amazon Braket and the convenience of priority access to different types of quantum hardware means we can build this new capability into our stack with confidence.”

Simplify running hybrid quantum-classical workloads with Amazon Braket Hybrid Jobs.

Danilo