
Integrating Amazon EventBridge and Amazon ECS

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/integrating-amazon-eventbridge-and-amazon-ecs/

This post is courtesy of Jakub Narloch, Senior Software Development Engineer.

Today, AWS announced support for Amazon API Gateway as an event target in Amazon EventBridge. This feature enables new integration scenarios for web applications and services, allowing customers to seamlessly connect their infrastructure, SaaS services, and APIs hosted in AWS.

API Gateway as a target for EventBridge creates new integration capabilities for new or existing web applications. Developers can now deliver events directly to applications hosted on Amazon ECS, Amazon Elastic Kubernetes Service (EKS), or Amazon EC2 using EventBridge and API Gateway. In this post, I show how to build an event-driven application running on ECS with AWS Fargate that processes events from EventBridge through the API Gateway integration.

EventBridge is a serverless event bus that makes it easy to connect applications together. It uses data from your own applications, integrated software as a service (SaaS) applications, and AWS services. This simplifies the process of building event-driven architectures by decoupling event producers from event consumers. This allows producers and consumers to be scaled, updated, and deployed independently. Loose coupling improves developer agility in addition to application resiliency.

API Gateway helps developers to create, publish, and maintain secure APIs at any scale. When used with EventBridge, API Gateway authenticates and authorizes API calls. It also acts as an HTTP proxy layer for integrating other AWS services or third-party web applications.

Previously, EventBridge customers could consume events in ECS via Amazon SNS or Amazon SQS, or by triggering an ECS task directly. API Gateway as a target replaces this approach and brings additional API Gateway features such as authentication and rate limiting, which can help you build more resilient and feature-rich integrations. API Gateway throttling limits the maximum number of events delivered at the same time, while EventBridge retries event delivery for up to 24 hours.
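To make the wiring concrete, here is a hedged sketch of creating such a rule and target with the AWS SDK for Java. The rule name, event pattern, execute-api ARN, and IAM role ARN are illustrative placeholders, and the sample application in this post provisions the equivalent resources with the CDK instead:

import com.amazonaws.services.eventbridge.AmazonEventBridge;
import com.amazonaws.services.eventbridge.AmazonEventBridgeClientBuilder;
import com.amazonaws.services.eventbridge.model.PutRuleRequest;
import com.amazonaws.services.eventbridge.model.PutTargetsRequest;
import com.amazonaws.services.eventbridge.model.Target;

AmazonEventBridge events = AmazonEventBridgeClientBuilder.defaultClient();

// Rule matching the ecommerce events used later in this post
events.putRule(new PutRuleRequest()
        .withName("create-order-rule")
        .withEventPattern("{\"source\":[\"ecommerce\"],\"detail-type\":[\"CreateOrder\"]}"));

// Attach the API Gateway endpoint as the rule target; EventBridge assumes
// the given role to invoke the execute-api ARN (both values are placeholders)
events.putTargets(new PutTargetsRequest()
        .withRule("create-order-rule")
        .withTargets(new Target()
                .withId("orders-api")
                .withArn("arn:aws:execute-api:us-west-2:123456789012:a1b2c3/prod/PUT/orders")
                .withRoleArn("arn:aws:iam::123456789012:role/eventbridge-invoke-api")));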

This blog post uses an ecommerce application as an example of a custom integration. The application is responsible for processing customer orders. The following diagram illustrates how the components of the system interact. The application itself is hosted as an ECS service on top of AWS Fargate.

Architecture diagram

To achieve high availability, the application cluster is distributed across subnets in different Availability Zones. The Application Load Balancer ensures that the incoming traffic is distributed across the nodes in the cluster. API Gateway is responsible for authenticating requests and routing to the backend. The application logic is responsible for receiving the event and persisting it in Amazon DocumentDB.

The order event is modeled as follows:

{
  "version": "0",
  "region": "us-east-1",
  "account": "123456789012",
  "id": "4236e18e-f858-4e2b-a8e8-9b772525e0b2",
  "source": "ecommerce",
  "detail-type": "CreateOrder",
  "resources": [],
  "detail": {
    "order_id": "ce4fe8b7-9911-4377-a795-29ecca9d7a3d",
    "create_date": "2020-06-02T13:01:00Z",
    "items": [
      {
        "product_id": "b8575571-5e91-4521-8a29-4af4a8aaa6f6",
        "quantity": 1,
        "price": "9.99",
        "currency": "CAD"
      }
    ],
    "customer": {
      "customer_id": "5d22899e-3ff5-4ce0-a2a3-480cfce39a56"
    },
    "payment": {
      "payment_id": "fb563473-bef4-4965-ad78-e37c6c9c0c2a",
    },
    "delivery_address": {
      "street": "510 W Georgia St",
      "city": "Vancouver",
      "state": "BC",
      "country": "Canada",
      "zip_code": "V6B0M7"
    }
  }
}

Application layer
The application that processes the orders is implemented using a reactive stack with Spring Boot. A reactive application design can help build a scalable application capable of handling thousands of transactions per second from a single instance. This is important for applications with high throughput and can help in achieving economies of scale.

The resource handler
The application defines an EventResource, which acts as the entry handler for receiving events from EventBridge and processing them. The handler logic is responsible for unmarshalling the event and retrieving the order details from the event detail. The order is then persisted in DocumentDB using a dedicated DAO instance.

import java.util.Objects;

import javax.validation.Valid;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.util.UriComponentsBuilder;

import lombok.extern.slf4j.Slf4j;
import reactor.core.publisher.Mono;

@Slf4j
@RequestMapping("/orders")
@RestController
public class EventResource {
 
    private final OrderRepository orderRepository;
 
    public EventResource(OrderRepository orderRepository) {
        this.orderRepository = Objects.requireNonNull(orderRepository);
    }
 
    @RequestMapping(method = RequestMethod.PUT)
    public Mono<ResponseEntity<Object>> onEvent(@Valid @RequestBody Event<Order> event) {
 
        log.info("Processing order {}", event.getDetail().getOrderId());
 
        return orderRepository.save(event.getDetail())
                .map(order -> ResponseEntity.created(UriComponentsBuilder.fromPath("/events/{id}")
                        .build(order.getOrderId())).build())
                .onErrorResume(error -> {
                    log.error("Unhandled error occurred", error);
                    return Mono.just(ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).build());
                });
    }
}

The handler is mapped to process requests on the /orders path. The implementation unmarshals the event payload and stores it in DocumentDB. Upon successful execution, the service responds with a 201 Created HTTP status code.
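The Event wrapper used in the handler's signature is a plain POJO mirroring the EventBridge envelope shown earlier, with the payload type carried in detail. The exact class lives in the sample repository; a minimal sketch, assuming Jackson handles deserialization:

import java.util.List;
import com.fasterxml.jackson.annotation.JsonProperty;

public class Event<T> {

    private String version;
    private String region;
    private String account;
    private String id;
    private String source;

    @JsonProperty("detail-type")
    private String detailType;

    private List<String> resources;
    private T detail;

    public T getDetail() {
        return detail;
    }

    // Remaining getters and setters omitted for brevity
}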

You can store EventBridge events in a document database like Amazon DocumentDB, a non-relational database that stores JSON content natively. This makes it easy to write the event payload directly and also supports general querying of event content.
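Because DocumentDB is MongoDB-compatible, the DAO behind the handler can be as small as a Spring Data reactive repository interface. A hedged sketch (the repository in the sample code may differ):

import org.springframework.data.mongodb.repository.ReactiveMongoRepository;

// Spring Data generates the implementation at runtime; save() returns a
// Mono<Order>, which is what the resource handler above chains on
public interface OrderRepository extends ReactiveMongoRepository<Order, String> {
}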

Prerequisites
To build and deploy the application, you must have AWS CDK and JDK 11 installed. Start by cloning the GitHub repository. The repository contains the example code and supporting infrastructure for deploying to AWS.

Step 1: Create an Amazon ECR repository
Start by creating a dedicated Amazon ECR repository, where the Docker images are uploaded. There is an AWS CDK template in the application code repo for this purpose.

First, install Node.js dependencies needed to execute the CDK command:

cd ../eventbridge-integration-solution-aws-api-cdk
npm install

Next, compile the CDK TypeScript template.

npm run build

Then, synthesize the CloudFormation stack.

cdk synth "*"

Now bootstrap CloudFormation resources needed to deploy the remaining templates.

cdk bootstrap

Finally, deploy the stack that creates the Amazon ECR registry.

cdk deploy EventsRegistry

Step 2: Build the application

Before the application is deployed, it must be built and uploaded to Amazon ECR.
To get started, compile the source code and build the application distribution.

cd ../eventbridge-integration-solution-aws-api
./gradlew clean build

Step 3: Containerizing the application
The build system is configured to include the task for containerizing the artifacts and creating the Docker image. To create a new Docker image from the build artifact, run the following command:

./gradlew dockerBuildImage

The build task generates the Dockerfile using the provided settings. It then executes the docker build command to create a new Docker image named eventbridge-integration-solution-aws-api.

Step 4: Upload the image to Amazon ECR
You can now upload the image directly to Amazon ECR. First, log in to the Amazon ECR registry with Docker. Replace AWS_ACCOUNT_ID with your AWS account ID.

aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin "${AWS_ACCOUNT_ID}.dkr.ecr.us-west-2.amazonaws.com"

Before uploading the image to ECR, tag it with the expected remote repository name. To do that, first list all of the Docker images.

docker images

Copy the image ID of the eventbridge-integration-solution-aws-api image and use it in place of $DOCKER_IMAGE in the tag command, again replacing AWS_ACCOUNT_ID.

docker tag $DOCKER_IMAGE "${AWS_ACCOUNT_ID}.dkr.ecr.us-west-2.amazonaws.com/eventbridge-integration-solution-aws-api"

Finally, push the Docker image to ECR, replacing AWS_ACCOUNT_ID with your AWS account ID.

docker push "${AWS_ACCOUNT_ID}.dkr.ecr.us-west-2.amazonaws.com/eventbridge-integration-solution-aws-api"

Step 5: Deploying the application stack
Once the application image is uploaded to Amazon ECR, you can deploy the entire application stack using CDK. The stack creates multiple resources including a VPC, DocumentDB cluster, ECS TaskDefinition and Service, Application Load Balancer, API Gateway and EventBridge rule. You can inspect the resources created in the CDK definition by opening the TypeScript files in the eventbridge-integration-solution-aws-api-cdk/lib directory.

At this point, you can proceed with deploying the CloudFormation stack.

cd ../eventbridge-integration-solution-aws-api-cdk
cdk deploy "*"

Step 6: Test the running application
Now, test end-to-end event delivery by publishing a sample event to the EventBridge PutEvents API. Create a file named event.json and paste the following code:

[
  {
    "Source": "ecommerce",
    "DetailType": "CreateOrder",
    "Detail": "{\"order_id\": \"ce4fe8b7-9911-4377-a795-29ecca9d7a3d\",\"create_date\": \"2020-06-02T13:01:00Z\",\"items\": [{\"product_id\": \"b8575571-5e91-4521-8a29-4af4a8aaa6f6\",\"quantity\": 1,\"price\": \"9.99\",\"currency\": \"CAD\"}],\"customer\": {\"customer_id\": \"5d22899e-3ff5-4ce0-a2a3-480cfce39a56\"},\"payment\": {\"payment_id\": \"fb563473-bef4-4965-ad78-e37c6c9c0c2a\"},\"delivery_address\": {\"street\": \"510 W Georgia St\",\"city\": \"Vancouver\",\"state\": \"BC\",\"country\": \"Canada\",\"zip_code\": \"V6B0M7\"}}"
  }
]

Publish this event with the following AWS CLI command.

aws events put-events --entries file://event.json

EventBridge delivers the event to API Gateway and the application persists it in DocumentDB.
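If you publish events from application code rather than the CLI, the equivalent call through the AWS SDK for Java is a single PutEvents request. A minimal sketch, where orderJson is assumed to hold the same JSON document used in event.json:

import com.amazonaws.services.eventbridge.AmazonEventBridge;
import com.amazonaws.services.eventbridge.AmazonEventBridgeClientBuilder;
import com.amazonaws.services.eventbridge.model.PutEventsRequest;
import com.amazonaws.services.eventbridge.model.PutEventsRequestEntry;

AmazonEventBridge events = AmazonEventBridgeClientBuilder.defaultClient();

// Same source/detail-type pair the EventBridge rule matches on
events.putEvents(new PutEventsRequest().withEntries(
        new PutEventsRequestEntry()
                .withSource("ecommerce")
                .withDetailType("CreateOrder")
                .withDetail(orderJson)));   // orderJson: the order payload as a JSON string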

Step 7: Cleanup
Delete all the resources created in this tutorial by running this CDK command:

cdk destroy "*"

Additional considerations
The demo application is simplified for the purpose of showcasing the EventBridge integration with API Gateway. In production, it’s recommended that you isolate the DocumentDB cluster in a private subnet. Additionally, the Application Load Balancer can be hidden from public access and connected to API Gateway through VPC Link.

Conclusion

This post demonstrates how to set up a sample application that consumes events directly from EventBridge into a custom application hosted in ECS. The integration uses EventBridge's native support for API Gateway as a target, which lets you integrate any HTTP-based web application.

Learn more from the EventBridge documentation.

Create Snapshots From Any Block Storage Using EBS Direct APIs

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-create-snapshots-from-any-block-storage/

I am excited to announce that you can now create Amazon Elastic Block Store (EBS) snapshots from any block storage data, such as on-premises volumes, volumes from another cloud provider, existing block data stored on Amazon Simple Storage Service (S3), or even your own laptop 🙂

AWS customers using the cloud for disaster recovery of on-premises infrastructure all have the same question: how can I transfer my on-premises volume data to the cloud efficiently and at low cost? You usually create temporary Amazon Elastic Compute Cloud (EC2) instances, attach Amazon Elastic Block Store (EBS) volumes, transfer the data at the block level from on-premises to these new EBS volumes, take a snapshot of every EBS volume created, and tear down the temporary infrastructure. Some of you choose CloudEndure to simplify this process. Or maybe you just gave up and did not copy your on-premises volumes to the cloud because of the complexity.

To simplify this, today we are announcing three new APIs that extend EBS direct APIs, a set of APIs we announced at re:Invent 2019. We initially launched read and diff APIs; today we extend them with write capabilities. These three new APIs allow you to create Amazon Elastic Block Store (EBS) snapshots from your on-premises volumes, or from any block storage data that you want to be able to store and recover in AWS.

With the addition of write capability in EBS direct APIs, you can now create new snapshots from your on-premises volumes, create incremental snapshots, and delete them. Once a snapshot is created, it has all the benefits of snapshots created from Amazon Elastic Block Store (EBS) volumes. You can copy them, share them between AWS accounts, keep them available for Fast Snapshot Restore, or create Amazon Elastic Block Store (EBS) volumes from them.

Having Amazon Elastic Block Store (EBS) snapshots created from any volume, without the need to spin up Amazon Elastic Compute Cloud (EC2) instances and Amazon Elastic Block Store (EBS) volumes, allows you to simplify and lower the cost of creating and managing your disaster recovery copy in the cloud.

Let’s have a closer look at the API
You first call StartSnapshot to create a new snapshot. When the snapshot is incremental, you pass the ID of the parent snapshot. You can also pass additional tags to apply to the snapshot, or encrypt these snapshots and manage the key, just like usual. If you choose to encrypt snapshots, be sure to check our technical documentation to understand the nuances and options.

Then, for each block of data, you call PutSnapshotBlock. This API has six mandatory parameters: snapshot-id, block-index, block-data, block-length, checksum, and checksum-algorithm. The API supports block lengths of 512 KB. You can send your blocks in any order, and in parallel; block-index keeps the order correct.

After you send all the blocks, you call CompleteSnapshot with the changed-blocks-count parameter set to the number of blocks you sent.

Let’s put all these together
Here is the pseudo code you write to create a snapshot.

AmazonEBS amazonEBS = AmazonEBSClientBuilder.standard()
   .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endpointName, awsRegion))
   .withCredentials(credentialsProvider)
   .build();

// Start the snapshot and keep its ID for the subsequent block writes
response = amazonEBS.startSnapshot(startSnapshotRequest);
snapshotId = response.getSnapshotId();

// Write each changed block; blocks can be sent in any order, and in parallel
for each (block in changeset) {
    putResponse = amazonEBS.putSnapshotBlock(putSnapshotBlockRequest);
}

// Seal the snapshot once all blocks have been sent
amazonEBS.completeSnapshot(completeSnapshotRequest);
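Filling in one of those pseudo calls, here is a hedged sketch of a single PutSnapshotBlock request, assuming the SDK's request fields mirror the API parameters listed above. The checksum is the Base64-encoded SHA-256 digest of the block data, and readNextBlock() is a placeholder for however you read one block of volume data:

import java.io.ByteArrayInputStream;
import java.security.MessageDigest;
import java.util.Base64;
import com.amazonaws.services.ebs.model.PutSnapshotBlockRequest;

byte[] block = readNextBlock();   // placeholder: one 512 KB block of volume data

// checksum-algorithm is SHA256; the checksum covers the raw block bytes
MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
String checksum = Base64.getEncoder().encodeToString(sha256.digest(block));

amazonEBS.putSnapshotBlock(new PutSnapshotBlockRequest()
        .withSnapshotId(snapshotId)
        .withBlockIndex(blockIndex)               // position of this block on the volume
        .withBlockData(new ByteArrayInputStream(block))
        .withDataLength(block.length)
        .withChecksum(checksum)
        .withChecksumAlgorithm("SHA256"));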

As usual, when using this code, you must have appropriate IAM policies that allow calling the new APIs. For example:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "ebs:StartSnapshot",
      "ebs:PutSnapshotBlock",
      "ebs:CompleteSnapshot"
    ],
    "Resource": "arn:aws:ec2:<Region>::snapshot/*"
  }]
}

Also include KMS-related permissions when creating encrypted snapshots.

In addition to the storage cost for snapshots, there is a charge per API call when you call PutSnapshotBlock.

These new snapshot APIs are available in the following AWS Regions: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), China (Beijing), China (Ningxia), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), Europe (Stockholm), Middle East (Bahrain), and South America (São Paulo).

You can start to use them today.

— seb

Announcing the New Version of the Well-Architected Framework

Post Syndicated from Rodney Lester original https://aws.amazon.com/blogs/architecture/announcing-the-new-version-of-the-well-architected-framework/

“Nothing is constant except for change.” – Unknown

We are announcing the availability of a new version of the AWS Well-Architected Framework. We've made changes in response to your feedback, as well as to industry trends that are becoming widely adopted. We focused on three themes: removing perceived repetition, adding content areas to explicitly call out previously implied best practices, and revising best practices to provide clarity. These changes are intended to make it easier for you to understand and implement the best practices in your workloads. In addition to updating all the Well-Architected Pillar whitepapers to align more closely with these best practices, we have also improved the usability of the AWS Well-Architected Tool (AWS WA Tool).

History

This is our first Framework update since we released the AWS WA Tool in the AWS Management Console at re:Invent 2018. However, this is actually the eighth version of the Framework since it was introduced in 2012. The Framework consists of design principles, questions, and best practices that can be achieved in many different ways and are written in a manner that is much broader than a “checklist.” The best practices can be implemented using technology (AWS or third party) or by processes, which can also be automated. The Framework is written to allow you to explore if you are actually accomplishing each best practice, rather than simply giving you “yes” or “no” questions to indicate if you have performed a particular action. Evaluating your design, architecture, and implementation should not be a checklist but an evaluation of your progress toward the desired outcome.

We revise the Framework every year through a Kaizen process, where we collect the data about what worked for customers, what could be improved to provide clarity, and what should be added or removed. The first three versions were purely internal releases—used by our Solutions Architects, Professional Services Consultants, and Enterprise Support teams to have conversations with customers about building and operating the best cloud architectures.

With the fourth version in 2015, we published our guidance as a whitepaper. This increased its visibility and pointed out a blind spot: we had not documented the operational best practices required to achieve the other pillars’ operational aspects. The fifth version, released the following year, included a new pillar to address our omission, Operational Excellence. It was revised two more times, once in 2017, when we released the pillar-specific whitepapers and AWS Well-Architected Lenses, and again in 2018 when we launched the AWS WA Tool.

What’s new

We have added more topics to the Operational Excellence pillar. These specifically relate to the structure of your organization and how your culture supports those organizational choices. There are also additional best practices that we have identified and clarified in the existing areas of Prepare, Operate, and Evolve.

Operational Excellence

The Security pillar has one less question, which we accomplished by refining the best practices and removing duplication. Most of the changes are in identity and access management and in how to operate your workloads securely.

Security pillar

The Reliability pillar has three more questions, based around your workload architecture. These have been covered as part of failure management in the Reliability Pillar whitepaper since 2017, but they have elicited enough discussion and action with customers that we now explicitly call them out.

Reliability pillar

The number of questions in the Performance Efficiency pillar did not change, but we clarified the best practices.

Performance efficiency

The Cost Optimization pillar introduces a new section on Cloud Financial Management (CFM), which adds a new question along with its associated best practices.

Cost optimization

These changes make the best practices clearer, remove duplication, and explicitly call out the new best practices that we have identified as an important part of having great cloud workloads. Expect to see additional blog posts explaining the changes to each pillar in more detail.

With this release, the Framework is now available in ten languages: English, Spanish, French, German, Italian, Japanese, Korean, Brazilian Portuguese, Simplified Chinese, and Traditional Chinese.

Learn, measure, improve, & iterate

It’s a best practice to regularly review your workloads—even those that have not had major changes. Our internal standard is to perform reviews at least annually. We encourage you to review your existing workloads at least once this year, and to create milestones for your workloads as they evolve. Use the Framework to guide your design and architecture of new workloads, or of workloads that you are planning on moving to the cloud. The greatest successes have come from customers that take these best practices into consideration as early as possible in the process. In effective organizations, every best practice is considered and prioritized.

Continue to learn, measure, and improve your cloud workloads with the AWS Well-Architected Framework and use the AWS Well-Architected Tool to help document your progress to having Well-Architected workloads.

Learn more about the new version of Well-Architected and its pillars

What’s New in the Well-Architected Reliability Pillar?

Post Syndicated from Seth Eliot original https://aws.amazon.com/blogs/architecture/whats-new-in-the-well-architected-reliability-pillar/

The new version of the Reliability pillar for AWS Well-Architected includes expanded content across all areas of reliability. Guidance on distributed system architecture has been reorganized and expanded, and new best practices have been added as part of the Well-Architected Review. There is a sharper focus on chaos engineering with more explanation and examples. We’ve added more details on using fault isolation to protect your workloads using Availability Zones, and beyond.

In the AWS Well-Architected Tool, new reliability best practices have been added, and existing ones updated. We have completely updated the Reliability Pillar whitepaper to align to the questions and best practices found in the tool. Additionally, we added the latest guidance on implementing the best practices using the newest AWS resources and partner technologies, such as AWS Transit Gateway, AWS Service Quotas, and CloudEndure Disaster Recovery.

The whitepaper provides clearer definitions to help you better understand the relationships among reliability, resiliency, and availability. The focus remains on resiliency, and how to design this into your workloads so that they are able to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions, such as misconfigurations or transient network issues.

Launched at re:Invent 2019, Amazon Builders’ Library shares in-depth articles on how Amazon builds and runs resilient workloads. Our updated Reliability pillar draws extensively on this information, incorporating it across multiple best practices, and linking back to specific Amazon Builders’ Library articles. The AWS Well-Architected hands-on reliability labs now include Implementing Health Checks and Managing Dependencies to improve Reliability, which lets you exercise the practices demonstrated in the library’s Implementing health checks article firsthand. We expanded the suite of Well-Architected Reliability labs with new labs on data backup, data replication, and automated infrastructure deployment.

Implementing Health Checks and Managing Dependencies to Improve Reliability

The new Implementing Health Checks and Managing Dependencies to Improve Reliability lab shows you how to implement practices to detect dependency failures and remain resilient despite them.

Prior to this version of the Reliability pillar, we had identified three best practice areas: Foundations, Change Management, and Failure Management. In this new version, we added a fourth area:

  • Workload Architecture: Specific patterns to follow as you design and implement software architecture for your distributed systems.

This new area covers best practices related to service-oriented architecture, microservices architectures, and distributed systems. We also added these to the AWS Well-Architected Tool, so that you can review your workloads and understand whether they are using these best architectural practices. The whitepaper content for these areas has also been expanded, drawing on Amazon Builders' Library articles, including Challenges with distributed systems and Timeouts, retries, and backoff with jitter.
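To give a flavor of the pattern those articles describe, here is a minimal, generic sketch of retries with exponential backoff and full jitter (not code from the whitepaper or the Builders' Library):

import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

static <T> T callWithBackoff(Callable<T> call, int maxAttempts) throws Exception {
    final long capMillis = 20_000;   // upper bound on any single wait
    for (int attempt = 0; ; attempt++) {
        try {
            return call.call();
        } catch (Exception e) {
            if (attempt + 1 >= maxAttempts) {
                throw e;   // retries exhausted; surface the failure
            }
            // Full jitter: wait a random time up to an exponentially growing ceiling
            long ceiling = Math.min(capMillis, 100L * (1L << attempt));
            Thread.sleep(ThreadLocalRandom.current().nextLong(ceiling + 1));
        }
    }
}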

The previous version helped you to understand the important role of Availability Zones in a reliable architecture. In the new version, we expanded on this by adding more detail on using bulkhead architectures, such as cell-based architecture (used across AWS), where each cell is a complete, independent instance of the service.

Best practices on how you implement change have always been an important part of the Reliability pillar. We now have more practical guidance on reliable deployment, including runbooks and pipeline tests. The new best practice on immutable infrastructure expands on our previous guidance on deployment automation using canary deployment or blue/green deployment.

We’ve also expanded coverage of Chaos Engineering. You can’t consider your workload to be resilient until you hypothesize how your workload will react to failures, inject those failures to test your design, and then compare your hypothesis to the testing results. While Chaos Monkey popularized the constructive use of chaos in 2010, Amazon has been purposely injecting failures since the early 2000s to increase resiliency and ensure readiness under the most adverse of circumstance. This history and experience are all the more applicable today in the cloud, where you can both design for recovery and test those designs. This is an often-overlooked best practice, but our most successful resiliency customers recognize it as a necessary and powerful tool.

This update to the Reliability pillar of the AWS Well-Architected Framework gives you and your teams the tools and information you need to understand your workload reliability. Together with the AWS Well-Architected Tool, start creating a plan today and continue to learn, measure, and improve your cloud workloads.

A huge thank you to everyone who gives us feedback on the tool and whitepapers, and a special thank you to Stephen Beck, Adrian Hornsby, Mahanth Jayadeva, Krupakar Pasupuleti, Jon Steele, and Jon Wright for their help with this update.

Learn more about the new version of Well-Architected and its pillars

New – Label Videos with Amazon SageMaker Ground Truth

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/new-label-videos-with-amazon-sagemaker-ground-truth/

Launched at AWS re:Invent 2018, Amazon SageMaker Ground Truth is a capability of Amazon SageMaker that makes it easy to annotate machine learning datasets. Customers can efficiently and accurately label image, text, and 3D point cloud data with built-in workflows, or any other type of data with custom workflows. Data samples are automatically distributed to a workforce (private, third-party, or MTurk), and annotations are stored in Amazon Simple Storage Service (S3). Optionally, automated data labeling may also be enabled, reducing both the amount of time required to label the dataset and the associated costs.

As models become more sophisticated, AWS customers are increasingly applying machine learning prediction to video content. Autonomous driving is perhaps the most well-known use case, as safety demands that road conditions and moving objects be correctly detected and tracked in real time. Video prediction is also a popular application in sports, tracking players or racing vehicles to compute all kinds of statistics that fans are so fond of. Healthcare organizations use video prediction to identify and track anatomical objects in medical videos. Manufacturing companies do the same to track objects on the assembly line, parcels for logistics, and more. The list goes on, and amazing applications keep popping up in many different industries.

Of course, this requires building and labeling video datasets, where objects of interest need to be labeled manually. At 30 frames per second, one minute of video translates to 1,800 individual images, so the amount of work can quickly become overwhelming. In addition, specific tools have to be built to label images, manage workflows, and so on. All this work takes valuable time and resources away from an organization’s core business.

AWS customers have asked us for a better solution, and today I'm very happy to announce that Amazon SageMaker Ground Truth now supports video labeling.

Customer use case: the National Football League
The National Football League (NFL) has already put this new feature to work. Says Jennifer Langton, SVP of Player Health and Innovation, NFL: “At the National Football League (NFL), we continue to look for new ways to use machine learning (ML) to help our fans, broadcasters, coaches, and teams benefit from deeper insights. Building these capabilities requires large amounts of accurately labeled training data. Amazon SageMaker Ground Truth was truly a force multiplier in accelerating our project timelines. We leveraged the new video object tracking workflow in addition to other existing computer vision (CV) labeling workflows to develop labels for training a computer vision system that tracks all 22 players as they move on the field during plays. Amazon SageMaker Ground Truth reduced the timeline for developing a high quality labeling dataset by more than 80%”.

Courtesy of the NFL, here are a couple of predicted frames, showing helmet detection in a Seattle Seahawks video. This particular video has 353 frames. This first picture is frame #100.

Object tracking

This second picture is frame #110.

Object tracking

Introducing Video Labeling
With the addition of video task types, customers can now use Amazon SageMaker Ground Truth for:

  • Video clip classification
  • Video multi-frame object detection
  • Video multi-frame object tracking

The multi-frame task types support multiple labels, so that you may label different object classes present in the video frames. You can create labeling jobs to annotate frames from scratch, as well as adjustment jobs to review and fine tune frames that have already been labeled. These jobs may be distributed either to a private workforce, or to a vendor workforce you picked on AWS Marketplace.

Using the built-in GUI, workers can then easily label and track objects across frames. Once they’ve annotated a frame, they can use an assistive labeling feature to predict the location of bounding boxes in the next frame, as you will see in the demo below. This significantly simplifies labeling work, saves time, and improves the quality of annotations. Last but not least, work is saved automatically.

Preparing Input Data for Video Object Detection and Tracking
As you would expect, input data must be located in S3. You may bring either video files, or sequences of video frames.

The first option is the simplest, as Amazon SageMaker Ground Truth includes a tool that automatically extracts frames from your video files. Optionally, you can sample frames (1 in 'n') to reduce the amount of labeling work. The extraction tool also builds a manifest file describing sequences and frames. You can learn more about it in the documentation.

The second option requires two steps: extracting frames, and building the manifest file. Extracting frames can easily be performed with the popular ffmpeg open source tool. Here’s how you could convert the first 60 seconds of a video to a frame sequence.

$ ffmpeg -ss 00:00:00.00 -t 00:01:00.00 -i basketball.mp4 frame%04d.jpg

Each frame sequence should be uploaded to S3 under a different prefix, for example s3://my-bucket/my-videos/sequence1, s3://my-bucket/my-videos/sequence2, and so on, as explained in the documentation.

Once you have uploaded your frame sequences, you may then either bring your own JSON files to describe them, or let Ground Truth crawl your sequences and build the JSON files and the manifest file for you automatically. Please note that a video sequence cannot be longer than 2,000 frames, which corresponds to about a minute of video at 30 frames per second.

Each sequence should be described by a simple sequence file:

  • A sequence number, an S3 prefix, and a number of frames.
  • A list of frames: number, file name, and creation timestamp.

Here’s an example of a sequence file.

{"version": "2020-06-01",
"seq-no": 1, "prefix": "s3://jsimon-smgt/videos/basketball", "number-of-frames": 1800, 
	"frames": [
		{"frame-no": 1, "frame": "frame0001.jpg", "unix-timestamp": 1594111541.71155},
		{"frame-no": 2, "frame": "frame0002.jpg", "unix-timestamp": 1594111541.711552},
		{"frame-no": 3, "frame": "frame0003.jpg", "unix-timestamp": 1594111541.711553},
		{"frame-no": 4, "frame": "frame0004.jpg", "unix-timestamp": 1594111541.711555},
. . .

Finally, the manifest file should point at the sequence files you’d like to include in the labeling job. Here’s an example.

{"source-ref": "s3://jsimon-smgt/videos/seq1.json"}
{"source-ref": "s3://jsimon-smgt/videos/seq2.json"}
. . .
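If you generate these descriptor files programmatically rather than by hand, a short Jackson sketch can emit a sequence file in the shape shown above. The S3 prefix, output file name, and listFrames() helper below are placeholders:

import java.io.File;
import java.util.List;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;
import com.fasterxml.jackson.databind.node.ObjectNode;

List<File> frameFiles = listFrames();   // placeholder: your extracted frame files, in order

ObjectMapper mapper = new ObjectMapper();
ObjectNode seq = mapper.createObjectNode();
seq.put("version", "2020-06-01");
seq.put("seq-no", 1);
seq.put("prefix", "s3://my-bucket/my-videos/sequence1");   // placeholder S3 prefix
seq.put("number-of-frames", frameFiles.size());

ArrayNode frames = seq.putArray("frames");
for (int i = 0; i < frameFiles.size(); i++) {
    ObjectNode frame = frames.addObject();
    frame.put("frame-no", i + 1);
    frame.put("frame", frameFiles.get(i).getName());
    frame.put("unix-timestamp", System.currentTimeMillis() / 1000.0);
}
mapper.writeValue(new File("seq1.json"), seq);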

Just like for other task types, the augmented manifest is available in S3 once labeling is complete. It contains annotations and labels, which you can then feed to your machine learning training job.

Labeling Videos with Amazon SageMaker Ground Truth
Here’s a sample video where I label the first ten frames of a sequence. You can see a screenshot below.

I first use the Ground Truth GUI to carefully label the first frame, drawing bounding boxes for basketballs and basketball players. Then, I use the “Predict next” assistive labeling tool to predict the location of the boxes in the next nine frames, applying only minor adjustments to some boxes. Although this was my first try, I found the process easy and intuitive. With a little practice, I could certainly go much faster!

Getting Started
Now, it’s your turn. You can start labeling videos with Amazon SageMaker Ground Truth today in the following regions:

  • US East (N. Virginia), US East (Ohio), US West (Oregon),
  • Canada (Central),
  • Europe (Ireland), Europe (London), Europe (Frankfurt),
  • Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Seoul), Asia Pacific (Sydney), Asia Pacific (Tokyo).

We’re looking forward to reading your feedback. You can send it through your usual support contacts, or in the AWS Forum for Amazon SageMaker.

– Julien

New PCI DSS on AWS Compliance Guide provides essential information for implementing compliant applications

Post Syndicated from Tim Winston original https://aws.amazon.com/blogs/security/new-pci-dss-on-aws-compliance-guide-provides-essential-information-for-implementing-compliant-applications/

Our mission in AWS Security Assurance Services is to ease Payment Card Industry Data Security Standard (PCI DSS) compliance for all Amazon Web Services (AWS) customers. We work closely with the AWS audit team to answer customer questions about understanding their compliance, finding and implementing solutions, and optimizing their controls and assessments. The most frequent and foundational questions have been compiled to create the Payment Card Industry Data Security Standard (PCI DSS) 3.2.1 on AWS Compliance Guide. The guide is an overview of concepts and principles for building PCI DSS compliant applications. Each section is thoroughly referenced to source AWS documentation to meet PCI DSS reporting requirements.

The guide helps customers who are developing payment applications, compliance teams that are preparing to manage assessments of cloud applications, internal assessment teams, and PCI Qualified Security Assessors (QSA) supporting customers who use AWS.

What’s in the guide?

The objective of the guide is to provide customers with the information they need to plan for and document the PCI DSS compliance of their AWS workloads.

The guide includes:

  1. What AWS PCI DSS Level 1 Service Provider status means for customers
  2. Assessment scoping of AWS applications
  3. Required diagrams for assessments
  4. Requirement-by-requirement guidance

The guide is most useful for people who are developing solutions on AWS, but it also will help Qualified Security Assessors (QSAs), internal security assessors (ISAs), and internal audit teams better understand the assessment of cloud applications. It provides examples of the diagrams required for assessments and includes links to AWS source documentation to support assessment evidence requirements.

Compliance at cloud scale

More customers than ever are running PCI DSS compliant workloads on AWS, with thousands of compliant applications. New security and governance tools available from AWS and the AWS Partner Network (APN) enable building business-as-usual compliance and automated security tasks so you can shift your focus to scaling and innovating your business.

If you have questions or want to learn more, contact your account executive, or submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Tim Winston

Tim is a Senior Security Consultant for AWS Security Assurance Services. He focuses on helping customers build in and optimize PCI compliance.

Author

Ted Tanner

Ted is a Senior Security Consultant for AWS Security Assurance Services. He focuses on helping customers build in and optimize PCI compliance.

Amazon RDS Proxy – Now Generally Available

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/amazon-rds-proxy-now-generally-available/

At AWS re:Invent 2019, we launched the preview of Amazon RDS Proxy, a fully managed, highly available database proxy for Amazon Relational Database Service (RDS) that makes applications more scalable, more resilient to database failures, and more secure. Following the preview for the MySQL engine, we extended it with PostgreSQL compatibility. Today, I am pleased to announce that RDS Proxy is now generally available for both engines.

Many applications, including those built on modern serverless architectures using AWS Lambda, Fargate, Amazon ECS, or EKS, can have a large number of open connections to the database server, and may open and close database connections at a high rate, exhausting database memory and compute resources.

Amazon RDS Proxy allows applications to pool and share connections established with the database, improving database efficiency, application scalability, and security. With RDS Proxy, failover times for Amazon Aurora and RDS databases are reduced by up to 66%, and database credentials, authentication, and access can be managed through integration with AWS Secrets Manager and AWS Identity and Access Management (IAM).

Amazon RDS Proxy can be enabled for most applications with no code changes. You don’t need to provision or manage any additional infrastructure, and you pay only per vCPU of the database instance for which the proxy is enabled.

Amazon RDS Proxy – Getting started
You can get started with Amazon RDS Proxy in just a few clicks by going to the AWS management console and creating an RDS Proxy endpoint for your RDS databases. In the navigation pane, choose Proxies and Create proxy. You can also see the proxy panel below.

To create your proxy, specify the Proxy identifier, a unique name of your choosing, and choose the database engine, either MySQL or PostgreSQL. Choose the encryption setting if you want the proxy to enforce TLS/SSL for all connections between the application and the proxy, and specify the time period that a client connection can be idle before the proxy closes it.

A client connection is considered idle when the application doesn’t submit a new request within the specified time after the previous request completed. The underlying connection between the proxy and database stays open and is returned to the connection pool. Thus, it’s available to be reused for new client connections.

Next, choose one RDS DB instance or Aurora DB cluster in Database to access through this proxy. The list only includes DB instances and clusters with compatible database engines, engine versions, and other settings.

Specify Connection pool maximum connections, a value between 1 and 100. This setting represents the percentage of the max_connections value that RDS Proxy can use for its connections. For example, if your database’s max_connections is 1,000 and you set this value to 50, the proxy can hold up to 500 connections to the database. If you only intend to use one proxy with this DB instance or cluster, you can set it to 100. For details about how RDS Proxy uses this setting, see Connection Limits and Timeouts.

Choose at least one Secrets Manager secret associated with the RDS DB instance or Aurora DB cluster that you intend to access with this proxy, and select an IAM role that has permission to access the Secrets Manager secrets you chose. If you don’t have an existing secret, please click Create a new secret before setting up the RDS proxy.

After setting VPC Subnets and a security group, click Create proxy. For more detail on these settings, refer to the documentation.

You can see the new RDS proxy after waiting a few minutes. Then point your application to the RDS Proxy endpoint. That’s it!

You can also create an RDS proxy via the AWS CLI.

aws rds create-db-proxy \
    --db-proxy-name channy-proxy \
    --role-arn iam_role \
    --engine-family { MYSQL|POSTGRESQL } \
    --vpc-subnet-ids space_separated_list \
    [--vpc-security-group-ids space_separated_list] \
    [--auth ProxyAuthenticationConfig_JSON_string] \
    [--require-tls | --no-require-tls] \
    [--idle-client-timeout value] \
    [--debug-logging | --no-debug-logging] \
    [--tags comma_separated_list]

How RDS Proxy works
Let’s see an example that demonstrates how open connections continue working during a failover when you reboot a database or it becomes unavailable due to a problem. This example uses a proxy named channy-proxy and an Aurora DB cluster with DB instances instance-8898 and instance-9814. When the failover-db-cluster command is run from the Linux command line, the writer instance that the proxy is connected to changes to a different DB instance. You can see that the DB instance associated with the proxy changes while the connection remains open.

$ mysql -h channy-proxy.proxy-abcdef123.us-east-1.rds.amazonaws.com -u admin_user -p
Enter password:
...
mysql> select @@aurora_server_id;
+--------------------+
| @@aurora_server_id |
+--------------------+
| instance-9814      |
+--------------------+
1 row in set (0.01 sec)

mysql>
[1]+ Stopped mysql -h channy-proxy.proxy-abcdef123.us-east-1.rds.amazonaws.com -u admin_user -p
$ # Initially, instance-9814 is the writer.
$ aws rds failover-db-cluster --db-cluster-id cluster-56-2019-11-14-1399
JSON output
$ # After a short time, the console shows that the failover operation is complete.
$ # Now instance-8898 is the writer.
$ fg
mysql -h channy-proxy.proxy-abcdef123.us-east-1.rds.amazonaws.com -u admin_user -p

mysql> select @@aurora_server_id;
+--------------------+
| @@aurora_server_id |
+--------------------+
| instance-8898      |
+--------------------+
1 row in set (0.01 sec)

mysql>
[1]+ Stopped mysql -h channy-proxy.proxy-abcdef123.us-east-1.rds.amazonaws.com -u admin_user -p
$ aws rds failover-db-cluster --db-cluster-id cluster-56-2019-11-14-1399
JSON output
$ # After a short time, the console shows that the failover operation is complete.
$ # Now instance-9814 is the writer again.
$ fg
mysql -h channy-proxy.proxy-abcdef123.us-east-1.rds.amazonaws.com -u admin_user -p

mysql> select @@aurora_server_id;
+--------------------+
| @@aurora_server_id |
+--------------------+
| instance-9814      |
+--------------------+
1 row in set (0.01 sec)

With RDS Proxy, you can build applications that can transparently tolerate database failures without needing to write complex failure handling code. RDS Proxy automatically routes traffic to a new database instance while preserving application connections.
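From the application’s point of view, the proxy is simply another database endpoint: you swap the database host name for the proxy endpoint and connect as usual. A hedged JDBC sketch, with the endpoint, database name, and credentials as placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

String url = "jdbc:mysql://channy-proxy.proxy-abcdef123.us-east-1.rds.amazonaws.com:3306/mydb";
// password: retrieved securely at startup, e.g. from AWS Secrets Manager
try (Connection conn = DriverManager.getConnection(url, "admin_user", password);
     Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT @@aurora_server_id")) {
    while (rs.next()) {
        System.out.println(rs.getString(1));   // which instance served the query
    }
}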

You can review the demo for an overview of RDS Proxy and the steps you need to take to access RDS Proxy from a Lambda function.

If you want to know how your serverless applications maintain excellent performance even at peak loads, please read this blog post. For a deeper dive into using RDS Proxy for MySQL with serverless, visit this post.

The following are a few things that you should be aware of:

  • Currently, RDS Proxy is available for the MySQL and PostgreSQL engine families. These include RDS for MySQL 5.6 and 5.7, and PostgreSQL 10.11 and 11.5.
  • In an Aurora cluster, all of the connections in the connection pool are handled by the Aurora primary instance. To perform load balancing for read-intensive workloads, you still use the reader endpoint directly for the Aurora cluster.
  • Your RDS Proxy must be in the same VPC as the database. Although the database can be publicly accessible, the proxy can’t be.
  • Proxies don’t support compressed mode. For example, they don’t support the compression used by the --compress or -C options of the mysql command.

Now Available!
Amazon RDS Proxy is generally available in US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Europe (Frankfurt), Europe (Ireland), Europe (London), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Tokyo) regions for Aurora MySQL, RDS for MySQL, Aurora PostgreSQL, and RDS for PostgreSQL, and it includes support for Aurora Serverless and Aurora Multi-Master.

Take a look at the product page, pricing, and the documentation to learn more. Please send us feedback either in the AWS forum for Amazon RDS or through your usual AWS support contacts.

Channy;

Find Your Most Expensive Lines of Code – Amazon CodeGuru Is Now Generally Available

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/find-your-most-expensive-lines-of-code-amazon-codeguru-is-now-generally-available/

Bringing new applications into production, maintaining their code base as they grow and evolve, and responding to operational issues at the same time, is a challenging task. For this reason, you can find many ideas on how to structure your teams, which methodologies to apply, and how to safely automate your software delivery pipeline.

At re:Invent last year, we introduced Amazon CodeGuru in preview, a developer tool powered by machine learning that helps you improve your applications and troubleshoot issues with automated code reviews and performance recommendations based on runtime data. During the last few months, many improvements have been launched, including a more cost-effective pricing model, support for Bitbucket repositories, and the ability to start the profiling agent using a command line switch, so that you no longer need to modify the code of your application, or add dependencies, to run the agent.

You can use CodeGuru in two ways:

  • CodeGuru Reviewer uses program analysis and machine learning to detect potential defects that are difficult for developers to find, and recommends fixes in your Java code. The code can be stored in GitHub (now also in GitHub Enterprise), AWS CodeCommit, or Bitbucket repositories. When you submit a pull request on a repository that is associated with CodeGuru Reviewer, it provides recommendations for how to improve your code. Each pull request corresponds to a code review, and each code review can include multiple recommendations that appear as comments on the pull request.
  • CodeGuru Profiler provides interactive visualizations and recommendations that help you fine-tune your application performance and troubleshoot operational issues using runtime data from your live applications. It currently supports applications written in Java virtual machine (JVM) languages such as Java, Scala, Kotlin, Groovy, Jython, JRuby, and Clojure. CodeGuru Profiler can help you find the most expensive lines of code, in terms of CPU usage or introduced latency, and suggest ways you can improve efficiency and remove bottlenecks. You can use CodeGuru Profiler in production, and when you test your application with a meaningful workload, for example in a pre-production environment.

Today, Amazon CodeGuru is generally available with the addition of many new features.

In CodeGuru Reviewer, we included the following:

  • Support for GitHub Enterprise – You can now scan your pull requests and get recommendations against your source code on GitHub Enterprise on-premises repositories, together with a description of what’s causing the issue and how to remediate it.
  • New types of recommendations to solve defects and improve your code – For example, checking input validation, to avoid issues that can compromise security and performance, and looking for multiple copies of code that do the same thing.

In CodeGuru Profiler, you can find these new capabilities:

  • Anomaly detection – We automatically detect anomalies in the application profile for those methods that represent the highest proportion of CPU time or latency.
  • Lambda function support – You can now profile AWS Lambda functions just like applications hosted on Amazon Elastic Compute Cloud (EC2) and containerized applications running on Amazon ECS and Amazon Elastic Kubernetes Service, including those using AWS Fargate.
  • Cost of issues in the recommendation report – Recommendations contain actionable resolution steps which explain what the problem is, the CPU impact, and how to fix the issue. To help you better prioritize your activities, you now have an estimation of the savings introduced by applying the recommendation.
  • Color-my-code – In the visualizations, to help you easily find your own code, we are coloring your methods differently from frameworks and other libraries you may use.
  • CloudWatch metrics and alerts – To keep track and monitor efficiency issues that have been discovered.

Let’s see some of these new features at work!

Using CodeGuru Reviewer with a Lambda Function
I create a new repo in my GitHub account, and leave it empty for now. Locally, where I am developing a Lambda function using the Java 11 runtime, I initialize my Git repo and add only the README.md file to the master branch. In this way, I can add all the code as a pull request later and have it go through a code review by CodeGuru.

git init
git add README.md
git commit -m "First commit"

Now, I add the GitHub repo as origin, and push my changes to the new repo:

git remote add origin https://github.com/<my-user-id>/amazon-codeguru-sample-lambda-function.git
git push -u origin master

I associate the repository in the CodeGuru console:

When the repository is associated, I create a new dev branch, add all my local files to it, and push it remotely:

git checkout -b dev
git add .
git commit -m "Code added to the dev branch"
git push --set-upstream origin dev

In the GitHub console, I open a new pull request by comparing changes across the two branches, master and dev. I verify that the pull request is able to merge, then I create it.

Since the repository is associated with CodeGuru, a code review is listed as Pending in the Code reviews section of the CodeGuru console.

After a few minutes, the code review status is Completed, and CodeGuru Reviewer issues a recommendation on the same GitHub page where the pull request was created.

Oops! I am creating the Amazon DynamoDB service object inside the function invocation method. In this way, it cannot be reused across invocations. This is not efficient.

To improve the performance of my Lambda function, I follow the CodeGuru recommendation, and move the declaration of the DynamoDB service object to a static final attribute of the Java application object, so that it is instantiated only once, during function initialization. Then, I follow the link in the recommendation to learn more best practices for working with Lambda functions.
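Applied to code, the recommendation amounts to moving the client out of the handler. A hedged sketch using the AWS SDK for Java v1 client builder, with the class name and handler body trimmed to the relevant lines:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;

public class OrderHandler {

    // Built once during function initialization and reused across invocations,
    // per the CodeGuru Reviewer recommendation
    private static final AmazonDynamoDB DYNAMODB = AmazonDynamoDBClientBuilder.defaultClient();

    public String handleRequest(String input) {
        // Before the fix, the client was created here, on every invocation:
        // AmazonDynamoDB dynamoDb = AmazonDynamoDBClientBuilder.defaultClient();
        return DYNAMODB.listTables().toString();   // use the shared client instead
    }
}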

Using CodeGuru Profiler with a Lambda Function
In the CodeGuru console, I create a MyServerlessApp-Development profiling group and select the Lambda compute platform.

Next, I give the AWS Identity and Access Management (IAM) role used by my Lambda function permissions to submit data to this profiling group.

Now, the console is giving me all the info I need to profile my Lambda function. To configure the profiling agent, I use a couple of environment variables:

  • AWS_CODEGURU_PROFILER_GROUP_ARN to specify the ARN of the profiling group to use.
  • AWS_CODEGURU_PROFILER_ENABLED to enable (TRUE) or disable (FALSE) profiling.

I follow the instructions (for Maven and Gradle) to add a dependency, and include the profiling agent in the build. Then, I update the code of the Lambda function to wrap the handler function inside the LambdaProfiler provided by the agent.

To generate some load, I start a few scripts invoking my function using the Amazon API Gateway as trigger. After a few minutes, the profiling group starts to show visualizations describing the runtime behavior of my Lambda function.

For example, I can see how much CPU time is spent in the different methods of my function. At the bottom, there are the entry point methods. As I scroll up, I find methods that are called deeper in the stack trace. I right-click and hide the LambdaRuntimeClient methods to focus on my code. Note that my methods are colored differently than those in the packages I am using, such as the AWS SDK for Java.

I am mostly interested in what happens in the handler method invoked by the Lambda platform. I select the handler method, and now it becomes the new “base” of the visualization.

As I move my pointer on each of my methods, I get more information, including an estimation of the yearly cost of running that specific part of the code in production, based on the load experienced by the profiling agent during the selected time window. In my case, the handler function cost is estimated to be $6. If I select the two main functions above, I have an estimation of $3 each. The cost estimation works for code running on Lambda functions, EC2 instances, and containerized applications.

Similarly, I can visualize Latency, to understand how much time is spent inside the methods in my code. I keep the Lambda function handler method selected to drill down into what is under my control, and see where time is being spent the most.

The CodeGuru Profiler is also providing a recommendation based on the data collected. I am spending too much time (more than 4%) in managing encryption. I can use a more efficient crypto provider, such as the open source Amazon Corretto Crypto Provider, described in this blog post. This should lower the time spent to what is expected, about 1% of my profile.

Finally, I edit the profiling group to enable notifications. In this way, if CodeGuru detects an anomaly in the profile of my application, I am notified in one or more Amazon Simple Notification Service (SNS) topics.

Available Now
Amazon CodeGuru is available today in 10 regions, and we are working to add more regions in the coming months. For regional availability, please see the AWS Region Table.

CodeGuru helps you improve your application code and reduce compute and infrastructure costs with an automated code reviewer and application profiler that provide intelligent recommendations. Using visualizations based on runtime data, you can quickly find the most expensive lines of code of your applications. With CodeGuru, you pay only for what you use. Pricing is based on the lines of code analyzed by CodeGuru Reviewer, and on sampling hours for CodeGuru Profiler.

To learn more, please see the documentation.

Danilo

Spring 2020 PCI DSS report now available with 124 services in scope

Post Syndicated from Nivetha Chandran original https://aws.amazon.com/blogs/security/spring-2020-pci-dss-report-available-124-services-in-scope/

Amazon Web Services (AWS) continues to expand the scope of our PCI compliance program to support our customers’ most important workloads. We are pleased to announce that six services have been added to the scope of our Payment Card Industry Data Security Standard (PCI DSS) compliance program. These services were validated by Coalfire, our independent Qualified Security Assessor (QSA).

The Spring 2020 PCI DSS attestation of compliance covers 124 services that you can use to securely architect your Cardholder Data Environment (CDE) in AWS. You can see the full list of services on the AWS Services in Scope by Compliance Program page. The six newly added services are:

The compliance reports, including the Spring 2020 PCI DSS report, are available on demand through AWS Artifact. The PCI DSS package available in AWS Artifact includes the DSS v. 3.2.1 Attestation of Compliance (AOC) and Shared Responsibility Guide.

You can learn more about our PCI program and other compliance and security programs on the AWS Compliance Programs page.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Nivetha Chandran

Nivetha is a Security Assurance Manager at Amazon Web Services on the Global Audits team, managing the PCI compliance program. Nivetha holds a Master’s degree in Information Management from the University of Washington.

AWS Solutions Constructs – A Library of Architecture Patterns for the AWS CDK

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-solutions-constructs-a-library-of-architecture-patterns-for-the-aws-cdk/

Cloud applications are built using multiple components, such as virtual servers, containers, serverless functions, storage buckets, and databases. Being able to provision and configure these resources in a safe, repeatable way is incredibly important to automate your processes and let you focus on the unique parts of your implementation.

With the AWS Cloud Development Kit, you can leverage the expressive power of your favorite programming languages to model your applications. You can use high-level components called constructs, preconfigured with “sensible defaults” that you can customize, to quickly build a new application. The CDK provisions your resources using AWS CloudFormation to get all the benefits of managing your infrastructure as code. One of the reasons I like the CDK is that you can compose and share your own custom components as higher-level constructs.

As you can imagine, there are recurring patterns that can be useful to more than one customer. For this reason, today we are launching the AWS Solutions Constructs, an open source extension library for the CDK that provides well-architected patterns to help you build your unique solutions. CDK constructs mostly cover single services. AWS Solutions Constructs provide multi-service patterns that combine two or more CDK resources, and implement best practices such as logging and encryption.

Using AWS Solutions Constructs
To see the power of a pattern-based approach, let's take a look at how that works when building a new application. As an example, I want to build an HTTP API to store data in an Amazon DynamoDB table. To keep the content of the table small, I can use DynamoDB Time to Live (TTL) to expire items after a few days. After the TTL expires, data is deleted from the table and sent, via DynamoDB Streams, to an AWS Lambda function to archive the expired data on Amazon Simple Storage Service (S3).

To build this application, I can use a few components:

  • An Amazon API Gateway endpoint for the API.
  • A DynamoDB table to store data.
  • A Lambda function to process the API requests, and store data in the DynamoDB table.
  • DynamoDB Streams to capture data changes.
  • A Lambda function processing data changes to archive the expired data.

Can I make it simpler? Looking at the available patterns in the AWS Solutions Constructs, I find two that can help me build my app:

  • aws-apigateway-lambda, a Construct that implements an API Gateway REST API connected to a Lambda function. As an example of the “sensible defaults” used by AWS Solutions Constructs, this pattern enables CloudWatch logging for the API Gateway.
  • aws-dynamodb-stream-lambda, a Construct implementing a DynamoDB table streaming data changes to a Lambda function with the least privileged permissions.

To build the final architecture, I simply connect those two Constructs together:

I am using TypeScript to define the CDK stack, and Node.js for the Lambda functions. Let’s start with the CDK stack:


import * as cdk from '@aws-cdk/core';
import * as lambda from '@aws-cdk/aws-lambda';
import * as apigw from '@aws-cdk/aws-apigateway';
import * as dynamodb from '@aws-cdk/aws-dynamodb';
import { ApiGatewayToLambda } from '@aws-solutions-constructs/aws-apigateway-lambda';
import { DynamoDBStreamToLambda } from '@aws-solutions-constructs/aws-dynamodb-stream-lambda';

export class DemoConstructsStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const apiGatewayToLambda = new ApiGatewayToLambda(this, 'ApiGatewayToLambda', {
      deployLambda: true,
      lambdaFunctionProps: {
        code: lambda.Code.fromAsset('lambda'),
        runtime: lambda.Runtime.NODEJS_12_X,
        handler: 'restApi.handler'
      },
      apiGatewayProps: {
        defaultMethodOptions: {
          authorizationType: apigw.AuthorizationType.NONE
        }
      }
    });

    const dynamoDBStreamToLambda = new DynamoDBStreamToLambda(this, 'DynamoDBStreamToLambda', {
      deployLambda: true,
      lambdaFunctionProps: {
        code: lambda.Code.fromAsset('lambda'),
        runtime: lambda.Runtime.NODEJS_12_X,
        handler: 'processStream.handler'
      },
      dynamoTableProps: {
        tableName: 'my-table',
        partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
        timeToLiveAttribute: 'ttl'
      }
    });

    const apiFunction = apiGatewayToLambda.lambdaFunction;
    const dynamoTable = dynamoDBStreamToLambda.dynamoTable;

    dynamoTable.grantReadWriteData(apiFunction);
    apiFunction.addEnvironment('TABLE_NAME', dynamoTable.tableName);
  }
}

At the beginning of the stack, I import the standard CDK constructs for the Lambda function, the API Gateway endpoint, and the DynamoDB table. Then, I add the two patterns from the AWS Solutions Constructs, ApiGatewayToLambda and DynamoDBStreamToLambda.

After declaring the two ApiGatewayToLambda and DynamoDBStreamToLambda constructs, I store the Lambda function, created by the ApiGatewayToLambda construct, and the DynamoDB table, created by the DynamoDBStreamToLambda construct, in two variables.

At the end of the stack, I “connect” the two patterns together by granting permissions to the Lambda function to read/write in the DynamoDB table, and add the name of the DynamoDB table to the environment of the Lambda function, so that it can be used in the function code to store data in the table.

The code of the two Lambda functions is in the lambda folder of the CDK application. I am using the Node.js 12 runtime.

The restApi.js function implements the API and writes data to the DynamoDB table. The URL path is used as the partition key, and all the query string parameters in the URL are stored as attributes. The TTL for the item is computed by adding a time window of 7 days to the current time.

const { DynamoDB } = require("aws-sdk");

const docClient = new DynamoDB.DocumentClient();

const TABLE_NAME = process.env.TABLE_NAME;
const TTL_WINDOW = 7 * 24 * 60 * 60; // 7 days expressed in seconds

exports.handler = async function (event) {

  const item = event.queryStringParameters;
  item.id = event.pathParameters.proxy;

  const now = new Date();
  item.ttl = Math.round(now.getTime() / 1000) + TTL_WINDOW;

  let statusCode = 204;

  try {
    // put() returns a promise that rejects on error,
    // so failures must be caught rather than read from the response
    await docClient.put({
      TableName: TABLE_NAME,
      Item: item
    }).promise();
  } catch (err) {
    console.error('request: ', JSON.stringify(event, undefined, 2));
    console.error('error: ', err);
    statusCode = 500;
  }

  return {
    statusCode: statusCode
  };
};

The processStream.js function processes the data change records coming from the DynamoDB stream, looking for the items deleted by TTL. The archive functionality is not implemented in this sample code.

exports.handler = async function (event) {
  event.Records.forEach((record) => {
    console.log('Stream record: ', JSON.stringify(record, null, 2));
    // Items deleted by TTL are flagged by a service principal in userIdentity
    if (record.userIdentity &&
      record.userIdentity.type == "Service" &&
      record.userIdentity.principalId == "dynamodb.amazonaws.com") {

      // Record deleted by DynamoDB Time to Live (TTL)

      // I can archive the record to S3, for example using Kinesis Data Firehose.
    }
  });
};

Let's see if this works! First, I need to install all dependencies. To simplify dependency management, each release of AWS Solutions Constructs is linked to the corresponding version of the CDK. In this case, I am using version 1.46.0 for both the CDK and the AWS Solutions Constructs patterns. The first three commands are installing plain CDK constructs. The last two commands are installing the AWS Solutions Constructs patterns I am using for this application.

npm install @aws-cdk/aws-lambda@1.46.0
npm install @aws-cdk/aws-apigateway@1.46.0
npm install @aws-cdk/aws-dynamodb@1.46.0
npm install @aws-solutions-constructs/aws-apigateway-lambda@1.46.0
npm install @aws-solutions-constructs/aws-dynamodb-stream-lambda@1.46.0

Now, I build the application and use the CDK to deploy the application.

npm run build
cdk deploy

Towards the end of the output of the cdk deploy command, a green check tells me that the deployment of the stack is complete. Just below, in the Outputs, I find the endpoint of the API Gateway.

 ✅  DemoConstructsStack

Outputs:
DemoConstructsStack.ApiGatewayToLambdaLambdaRestApiEndpoint9800D4B5 = https://1a2c3c4d.execute-api.eu-west-1.amazonaws.com/prod/

I can now use curl to test the API:

curl "https://1a2c3c4d.execute-api.eu-west-1.amazonaws.com/prod/danilop?name=Danilo&amp;company=AWS"

Let’s have a look at the DynamoDB table:

The item is stored, and the TTL is set. After a week, the item will be deleted and sent via DynamoDB Streams to the processStream.js function.

After I complete my testing, I use the CDK again to quickly delete all resources created for this application:

cdk destroy

Available Now
The AWS Solutions Constructs are available now for TypeScript and Python. The AWS Solutions Builders team is working to make these constructs also available when using Java and C# with the CDK, so stay tuned. There is no cost for using the AWS Solutions Constructs, or the CDK; you only pay for the resources created when deploying the stack.

In this first release, 25 patterns are included, covering lots of different use cases. Which new patterns and features should we focus on next? Give us your feedback in the open source project repository!

Danilo

Introducing Instance Refresh for EC2 Auto Scaling

Post Syndicated from Ben Peven original https://aws.amazon.com/blogs/compute/introducing-instance-refresh-for-ec2-auto-scaling/

This post is contributed by: Ran Sheinberg – Principal EC2 Spot SA, and Isaac Vallhonrat – Sr. EC2 Spot Specialist SA

Today, we are launching Instance Refresh. This is a new feature in EC2 Auto Scaling that enables automatic deployments of instances in Auto Scaling Groups (ASGs), in order to release new application versions or make infrastructure updates.

Amazon EC2 Auto Scaling is used for a wide variety of workload types and applications. EC2 Auto Scaling helps you maintain application availability through a rich feature set. This feature set includes integration into Elastic Load Balancing, automatically replacing unhealthy instances, balancing instances across Availability Zones, provisioning instances across multiple pricing options and instance types, dynamically adding and removing instances, and more.

Many customers use an immutable infrastructure approach. This approach encourages replacing EC2 instances to update the application or configuration, instead of deploying into EC2 instances that are already running. This can be done by baking code and software in golden Amazon Machine Images (AMIs), and rolling out new EC2 Instances that use the new AMI version. Another common pattern for rolling out application updates is changing the package version that the instance pulls when it boots (via updates to instance user data). Or, keeping that pointer static, and pushing a new version to the code repository or another type of artifact (container, package on Amazon S3) to be fetched by an instance when it boots and gets provisioned.

Until today, EC2 Auto Scaling customers used different methods for replacing EC2 instances inside EC2 Auto Scaling groups when a deployment or operating system update was needed. For example, UpdatePolicy within AWS CloudFormation, create_before_destroy lifecycle in Hashicorp Terraform, using AWS CodeDeploy, or even custom scripts that call the EC2 Auto Scaling API.

Customers told us that they want native deployment functionality built into EC2 Auto Scaling to take away the heavy lifting of custom solutions, or deployments that are initiated from outside of Auto Scaling groups.

Introducing Instance Refresh in EC2 Auto Scaling

You can trigger an Instance Refresh using the EC2 Auto Scaling groups Management Console, or use the new StartInstanceRefresh API in the AWS CLI or any AWS SDK. All you need to do is specify the percentage of healthy instances to keep in the group while the ASG terminates and launches instances, and the warm-up time, which is the period that the ASG waits between the groups of instances it refreshes. If your ASG is using Health Checks, the ASG waits for the instances in the group to be healthy before it continues to the next group of instances.

Instance Refresh in action

To get started with Instance Refresh in the AWS Management Console, click on an existing ASG in the EC2 Auto Scaling Management Console. Then click the Instance refresh tab.

When clicking the Start instance refresh button, I am presented with the following options:

start instance refresh

With the default configuration, the ASG keeps 90% of the instances running and does not proceed to the next group of instances if that percentage is not maintained. After each group, the ASG waits for the newly launched instances to transition into the healthy state, and for the 300-second warm-up time to pass, before proceeding to the next group of instances.

I can also initiate the same action from the AWS CLI by using the following code:

aws autoscaling start-instance-refresh --auto-scaling-group-name ASG-Instance-Refresh --preferences MinHealthyPercentage=90,InstanceWarmup=300

After initializing the instance refresh process, I can see ongoing instance refreshes in the console:

initialize instance refresh

The following image demonstrates how an active Instance refresh looks in the EC2 Instances console. Moreover, ASG strives to keep the capacity balanced between Availability Zones by terminating and launching instances in different Availability Zones in each group.

active instance refresh

Automate your workflow with Instance Refresh

You can now use this new functionality to create automations that work for your use-case.

To get started quickly, we created an example solution based on AWS Lambda. Visit the solution page on GitHub and see the deployment instructions.

Here’s an overview of what the solution contains and how it works:

  • An EC2 Auto Scaling group with two instances running
  • An EC2 Image Builder pipeline, set up to build and test an AMI
  • An SNS topic that would get notified when the image build completes
  • A Lambda function that is subscribed to the SNS topic, which gets triggered when the image build completes
  • The Lambda function gets the new AMI ID from the SNS notification, creates a new Launch Template version, and then triggers an Instance Refresh in the ASG, which starts the rolling update of instances (a minimal sketch of this function follows the list).
  • Because you can configure the ASG with LaunchTemplateVersion = $Latest, every new instance that is launched by the Instance Refresh process uses the new AMI from the latest version of the Launch Template.
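Here is a minimal boto3 sketch of such a function, assuming the new AMI ID arrives in the outputResources section of the EC2 Image Builder SNS message; the launch template ID and the ASG name are placeholders supplied as environment variables:

import json
import os

import boto3

ec2 = boto3.client('ec2')
autoscaling = boto3.client('autoscaling')

# Placeholders, configured as environment variables on the function
LAUNCH_TEMPLATE_ID = os.environ['LAUNCH_TEMPLATE_ID']
ASG_NAME = os.environ['ASG_NAME']


def handler(event, context):
    # Assumption: the Image Builder notification carries the new AMI ID here
    message = json.loads(event['Records'][0]['Sns']['Message'])
    ami_id = message['outputResources']['amis'][0]['image']

    # Create a new Launch Template version pointing at the new AMI; with the
    # ASG configured to use $Latest, new instances pick it up automatically
    ec2.create_launch_template_version(
        LaunchTemplateId=LAUNCH_TEMPLATE_ID,
        SourceVersion='$Latest',
        LaunchTemplateData={'ImageId': ami_id},
    )

    # Start the rolling replacement of the instances in the ASG
    autoscaling.start_instance_refresh(
        AutoScalingGroupName=ASG_NAME,
        Preferences={'MinHealthyPercentage': 90, 'InstanceWarmup': 300},
    )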

See the automation flow in the following diagram.

instance refresh automation flow

Conclusion

We hope that the new Instance Refresh functionality in your ASGs allows for a more streamlined approach to launching and updating your application deployments running on EC2. You can now create automations that fit your use case. This allows you to more easily refresh the EC2 instances running in your Auto Scaling groups when deploying a new version of your application, or when you must replace the AMI being used. Visit the user guide to learn more and get started.

New – A Shared File System for Your Lambda Functions

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-a-shared-file-system-for-your-lambda-functions/

I am very happy to announce that AWS Lambda functions can now mount an Amazon Elastic File System (EFS), a scalable and elastic NFS file system storing data within and across multiple availability zones (AZ) for high availability and durability. In this way, you can use a familiar file system interface to store and share data across all concurrent execution environments of one, or more, Lambda functions. EFS supports full file system access semantics, such as strong consistency and file locking.

To connect an EFS file system with a Lambda function, you use an EFS access point, an application-specific entry point into an EFS file system. The access point includes the operating system user and group to use when accessing the file system and the file system permissions, and it can limit access to a specific path in the file system. This helps keep file system configuration decoupled from the application code.

You can access the same EFS file system from multiple functions, using the same or different access points. For example, using different EFS access points, each Lambda function can access different paths in a file system, or use different file system permissions.

You can share the same EFS file system with Amazon Elastic Compute Cloud (EC2) instances, containerized applications using Amazon ECS and AWS Fargate, and on-premises servers. Following this approach, you can use different computing architectures (functions, containers, virtual servers) to process the same files. For example, a Lambda function reacting to an event can update a configuration file that is read by an application running on containers. Or you can use a Lambda function to process files uploaded by a web application running on EC2.

In this way, some use cases are much easier to implement with Lambda functions. For example:

  • Processing or loading data larger than the space available in /tmp (512MB).
  • Loading the most updated version of files that change frequently.
  • Using data science packages that require storage space to load models and other dependencies.
  • Saving function state across invocations (using unique file names, or file system locks).
  • Building applications requiring access to large amounts of reference data.
  • Migrating legacy applications to serverless architectures.
  • Interacting with data intensive workloads designed for file system access.
  • Partially updating files (using file system locks for concurrent access).
  • Moving a directory and all its content within a file system with an atomic operation.

Creating an EFS File System
To mount an EFS file system, your Lambda functions must be connected to an Amazon Virtual Private Cloud that can reach the EFS mount targets. For simplicity, I am using here the default VPC that is automatically created in each AWS Region.

Note that, when connecting Lambda functions to a VPC, networking works differently. If your Lambda functions are using Amazon Simple Storage Service (S3) or Amazon DynamoDB, you should create a gateway VPC endpoint for those services. If your Lambda functions need to access the public internet, for example to call an external API, you need to configure a NAT Gateway. I usually don’t change the configuration of my default VPCs. If I have specific requirements, I create a new VPC with private and public subnets using the AWS Cloud Development Kit, or use one of these AWS CloudFormation sample templates. In this way, I can manage networking as code.

In the EFS console, I select Create file system and make sure that the default VPC and its subnets are selected. For all subnets, I use the default security group that gives network access to other resources in the VPC using the same security group.

In the next step, I give the file system a Name tag and leave all other options to their default values.

Then, I select Add access point. I use 1001 for the user and group IDs and limit access to the /message path. In the Owner section, used to create the folder automatically when first connecting to the access point, I use the same user and group IDs as before, and 750 for permissions. With these permissions, the owner can read, write, and execute files. Users in the same group can only read. Other users have no access.

I go on, and complete the creation of the file system.
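The same access point can also be created programmatically. Here is a minimal boto3 sketch of the configuration described above; the file system ID is a placeholder:

import boto3

efs = boto3.client('efs')

response = efs.create_access_point(
    FileSystemId='fs-12345678',  # placeholder file system ID
    PosixUser={'Uid': 1001, 'Gid': 1001},
    RootDirectory={
        'Path': '/message',
        # CreationInfo lets EFS create the folder on first connection
        'CreationInfo': {
            'OwnerUid': 1001,
            'OwnerGid': 1001,
            'Permissions': '750',
        },
    },
)
print(response['AccessPointId'])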

Using EFS with Lambda Functions
To start with a simple use case, let’s build a Lambda function implementing a MessageWall API to add, read, or delete text messages. Messages are stored in a file on EFS so that all concurrent execution environments of that Lambda function see the same content.

In the Lambda console, I create a new MessageWall function and select the Python 3.8 runtime. In the Permissions section, I leave the default. This will create a new AWS Identity and Access Management (IAM) role with basic permissions.

When the function is created, in the Permissions tab I click on the IAM role name to open the role in the IAM console. Here, I select Attach policies to add the AWSLambdaVPCAccessExecutionRole and AmazonElasticFileSystemClientReadWriteAccess AWS managed policies. In a production environment, you can restrict access to a specific VPC and EFS access point.

Back in the Lambda console, I edit the VPC configuration to connect the MessageWall function to all subnets in the default VPC, using the same default security group I used for the EFS mount targets.

Now, I select Add file system in the new File system section of the function configuration. Here, I choose the EFS file system and access point I created before. For the local mount point, I use /mnt/msg and Save. This is the path where the access point will be mounted, and corresponds to the /message folder in my EFS file system.

In the Function code editor of the Lambda console, I paste the following code and Save.

import os
import fcntl

MSG_FILE_PATH = '/mnt/msg/content'


def get_messages():
    try:
        with open(MSG_FILE_PATH, 'r') as msg_file:
            fcntl.flock(msg_file, fcntl.LOCK_SH)
            messages = msg_file.read()
            fcntl.flock(msg_file, fcntl.LOCK_UN)
    except OSError:
        # The file does not exist until the first message is added
        messages = 'No message yet.'
    return messages


def add_message(new_message):
    with open(MSG_FILE_PATH, 'a') as msg_file:
        fcntl.flock(msg_file, fcntl.LOCK_EX)
        msg_file.write(new_message + "\n")
        fcntl.flock(msg_file, fcntl.LOCK_UN)


def delete_messages():
    try:
        os.remove(MSG_FILE_PATH)
    except OSError:
        # Nothing to delete if the file does not exist
        pass


def lambda_handler(event, context):
    method = event['requestContext']['http']['method']
    if method == 'GET':
        messages = get_messages()
    elif method == 'POST':
        new_message = event['body']
        add_message(new_message)
        messages = get_messages()
    elif method == 'DELETE':
        delete_messages()
        messages = 'Messages deleted.'
    else:
        messages = 'Method unsupported.'
    return messages

I select Add trigger and in the configuration I select the Amazon API Gateway. I create a new HTTP API. For simplicity, I leave my API endpoint open.

With the API Gateway trigger selected, I copy the endpoint of the new API I just created.

I can now use curl to test the API:

$ curl https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MessageWall
No message yet.
$ curl -X POST -H "Content-Type: text/plain" -d 'Hello from EFS!' https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MessageWall
Hello from EFS!

$ curl -X POST -H "Content-Type: text/plain" -d 'Hello again :)' https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MessageWall
Hello from EFS!
Hello again :)

$ curl https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MessageWall
Hello from EFS!
Hello again :)

$ curl -X DELETE https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MessageWall
Messages deleted.

$ curl https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MessageWall
No message yet.

It would be relatively easy to add unique file names (or specific subdirectories) for different users and extend this simple example into a more complete messaging application. As a developer, I appreciate the simplicity of using a familiar file system interface in my code. However, depending on your requirements, EFS throughput configuration must be taken into account. See the section Understanding EFS performance later in the post for more information.

Now, let’s use the new EFS file system support in AWS Lambda to build something more interesting. For example, let’s use the additional space available with EFS to build a machine learning inference API processing images.

Building a Serverless Machine Learning Inference API
To create a Lambda function implementing machine learning inference, I need to be able, in my code, to import the necessary libraries and load the machine learning model. Often, when doing so, the overall size of those dependencies goes beyond the current AWS Lambda limits in the deployment package size. One way of solving this is to accurately minimize the libraries to ship with the function code, and then download the model from an S3 bucket straight to memory (up to 3 GB, including the memory required for processing the model) or to /tmp (up to 512 MB). This custom minimization and download of the model has never been easy to implement. Now, I can use an EFS file system.

The Lambda function I am building this time needs access to the public internet to download a pre-trained model and the images to run inference on. So I create a new VPC with public and private subnets, and configure a NAT Gateway and the route table used by the private subnets to give access to the public internet. Using the AWS Cloud Development Kit, it’s just a few lines of code.

I create a new EFS file system and an access point in the new VPC using similar configurations as before. This time, I use /ml for the access point path.

Then, I create a new MLInference Lambda function with the same setup as before for permissions, and connect the function to the private subnets of the new VPC. Machine learning inference is quite a heavy workload, so I select 3 GB for memory and 5 minutes for timeout. In the File system configuration, I add the new access point and mount it under /mnt/inference.

The machine learning framework I am using for this function is PyTorch, and I need to put the libraries required to run inference in the EFS file system. I launch an Amazon Linux EC2 instance in a public subnet of the new VPC. In the instance details, I select one of the availability zones where I have an EFS mount point, and then Add file system to automatically mount the same EFS file system I am using for the function. For the security groups of the EC2 instance, I select the default security group (to be able to mount the EFS file system) and one that gives inbound access to SSH (to be able to connect to the instance).

I connect to the instance using SSH and create a requirements.txt file containing the dependencies I need:

torch
torchvision
numpy

The EFS file system is automatically mounted by EC2 under /mnt/efs/fs1. There, I create the /ml directory and change the owner of the path to the user and group I am using now that I am connected (ec2-user).

$ sudo mkdir /mnt/efs/fs1/ml
$ sudo chown ec2-user:ec2-user /mnt/efs/fs1/ml

I install Python 3 and use pip to install the dependencies in the /mnt/efs/fs1/ml/lib path:

$ sudo yum install python3
$ pip3 install -t /mnt/efs/fs1/ml/lib -r requirements.txt

Finally, I give ownership of the whole /ml path to the user and group I used for the EFS access point:

$ sudo chown -R 1001:1001 /mnt/efs/fs1/ml

Overall, the dependencies in my EFS file system are using about 1.5 GB of storage.

I go back to the MLInference Lambda function configuration. Depending on the runtime you use, you need to find a way to tell the runtime where to look for dependencies if they are not included with the deployment package or in a layer. In the case of Python, I set the PYTHONPATH environment variable to /mnt/inference/lib.

I am going to use PyTorch Hub to download this pre-trained machine learning model to recognize the kind of bird in a picture. The model I am using for this example is relatively small, about 200 MB. To cache the model on the EFS file system, I set the TORCH_HOME environment variable to /mnt/inference/model.

All dependencies are now in the file system mounted by the function, and I can type my code straight in the Function code editor. I paste the following code to have a machine learning inference API:

import urllib
import json
import os

import torch
from PIL import Image
from torchvision import transforms

transform_test = transforms.Compose([
    transforms.Resize((600, 600), Image.BILINEAR),
    transforms.CenterCrop((448, 448)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])

model = torch.hub.load('nicolalandro/ntsnet-cub200', 'ntsnet', pretrained=True,
                       **{'topN': 6, 'device': 'cpu', 'num_classes': 200})
model.eval()


def lambda_handler(event, context):
    url = event['queryStringParameters']['url']

    img = Image.open(urllib.request.urlopen(url))
    scaled_img = transform_test(img)
    torch_images = scaled_img.unsqueeze(0)

    with torch.no_grad():
        top_n_coordinates, concat_out, raw_logits, concat_logits, part_logits, top_n_index, top_n_prob = model(torch_images)

        _, predict = torch.max(concat_logits, 1)
        pred_id = predict.item()
        bird_class = model.bird_classes[pred_id]
        print('bird_class:', bird_class)

    return json.dumps({
        "bird_class": bird_class,
    })

I add the API Gateway as trigger, similarly to what I did before for the MessageWall function. Now, I can use the serverless API I just created to analyze pictures of birds. I am not really an expert in the field, so I looked for a couple of interesting images on Wikipedia:

I call the API to get a prediction for these two pictures:

$ curl https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MLInference?url=https://path/to/image/atlantic-puffin.jpg

{"bird_class": "106.Horned_Puffin"}

$ curl https://1a2b3c4d5e.execute-api.us-east-1.amazonaws.com/default/MLInference?url=https://path/to/image/western-grebe.jpg

{"bird_class": "053.Western_Grebe"}

It works! Looking at Amazon CloudWatch Logs for the Lambda function, I see that the first invocation, when the function loads and prepares the pre-trained model for inference on CPUs, takes about 30 seconds. To avoid a slow response, or a timeout from the API Gateway, I use Provisioned Concurrency to keep the function ready. The next invocations take about 1.8 seconds.

Understanding EFS Performance
When using EFS with your Lambda function, it is very important to understand how EFS performance works. For throughput, each file system can be configured to use bursting or provisioned mode.

When using bursting mode, all EFS file systems, regardless of size, can burst to at least 100 MiB/s of throughput. Those over 1 TiB in the standard storage class can burst to 100 MiB/s per TiB of data stored in the file system. EFS uses a credit system to determine when file systems can burst. Each file system earns credits over time at a baseline rate that is determined by the size of the file system that is stored in the standard storage class. A file system uses credits whenever it reads or writes data. The baseline rate is 50 KiB/s per GiB of storage.

You can monitor the use of credits in CloudWatch, each EFS file system has a BurstCreditBalance metric. If you see that you are consuming all credits, and the BurstCreditBalance metric is going to zero, you should enable provisioned throughput mode for the file system, from 1 to 1024 MiB/s. There is an additional cost when using provisioned throughput, based on how much throughput you are adding on top of the baseline rate.

To avoid running out of credits, you should think of the throughput as the average you need during the day. For example, if you have a 10 GiB file system, you have 500 KiB/s of baseline rate, and every day you can read/write 500 KiB/s * 3600 seconds * 24 hours = 43,200,000 KiB, or about 41 GiB.

If the libraries and everything your function needs to load during initialization are about 2 GiB, and you only access the EFS file system during function initialization, like in the MLInference Lambda function above, that means you can initialize your function (for example because of updates or scaling up activities) about 20 times per day. That’s not a lot, and you would probably need to configure provisioned throughput for the EFS file system.

If you have 10 MiB/s of provisioned throughput, then every day you have 10 MiB/s * 3600 seconds * 24 hours = 864,000 MiB, or about 840 GiB, to read or write. If you only use the EFS file system at function initialization to read about 2 GiB of dependencies, it means that you can have about 400 initializations per day. That may be enough for your use case.
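As a quick sanity check, here is the same arithmetic in a few lines of Python, using the sizes from the examples above:

KIB, MIB, GIB = 1024, 1024 ** 2, 1024 ** 3
SECONDS_PER_DAY = 3600 * 24

# Bursting mode: 50 KiB/s baseline per GiB stored, for a 10 GiB file system
baseline_bytes_per_s = 50 * KIB * 10
print(baseline_bytes_per_s * SECONDS_PER_DAY / GIB)   # ~41.2 GiB/day

# Provisioned mode: 10 MiB/s of provisioned throughput
provisioned_bytes_per_s = 10 * MIB
daily_gib = provisioned_bytes_per_s * SECONDS_PER_DAY / GIB
print(daily_gib)                                      # ~843.8 GiB/day

# Initializations per day if each one reads about 2 GiB of dependencies
print(daily_gib / 2)                                  # ~420 initializations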

In the Lambda function configuration, you can also use the reserved concurrency control to limit the maximum number of execution environments used by a function.

If, by mistake, the BurstCreditBalance goes down to zero, and the file system is relatively small (for example, a few GiBs), there is the possibility that your function gets stuck and can’t execute fast enough before reaching the timeout. In that case, you should enable (or increase) provisioned throughput for the EFS file system, or throttle your function by setting the reserved concurrency to zero to avoid all invocations until the EFS file system has enough credits.

Understanding Security Controls
When using EFS file systems with AWS Lambda, you have multiple levels of security controls. I’m doing a quick recap here because they should all be considered during the design and implementation of your serverless applications. You can find more info on using IAM authorization and access points with EFS in this post.

To connect a Lambda function to an EFS file system, you need:

  • Network visibility in terms of VPC routing/peering and security group.
  • IAM permissions for the Lambda function to access the VPC and mount (read only or read/write) the EFS file system.
  • You can specify in the IAM policy conditions which EFS access point the Lambda function can use.
  • The EFS access point can limit access to a specific path in the file system.
  • File system security (user ID, group ID, permissions) can limit read, write, or executable access for each file or directory mounted by a Lambda function.

The connection between the Lambda function execution environment and the EFS mount point uses industry standard Transport Layer Security (TLS) 1.2 to encrypt data in transit. You can provision Amazon EFS to encrypt data at rest. Data encrypted at rest is transparently encrypted while being written, and transparently decrypted while being read, so you don’t have to modify your applications. Encryption keys are managed by the AWS Key Management Service (KMS), eliminating the need to build and maintain a secure key management infrastructure.

Available Now
This new feature is offered in all regions where AWS Lambda and Amazon EFS are available, with the exception of the regions in China, where we are working to make this integration available as soon as possible. For more information on availability, please see the AWS Region table. To learn more, please see the documentation.

EFS for Lambda can be configured using the console, the AWS Command Line Interface (CLI), the AWS SDKs, and the Serverless Application Model. This feature allows you to build data intensive applications that need to process large files. For example, you can now unzip a 1.5 GB file in a few lines of code, or process a 10 GB JSON document. You can also load libraries or packages that are larger than the 250 MB package deployment size limit of AWS Lambda, enabling new machine learning, data modelling, financial analysis, and ETL jobs scenarios.

Amazon EFS for Lambda is supported at launch in AWS Partner Network solutions, including Epsagon, Lumigo, Datadog, HashiCorp Terraform, and Pulumi.

There is no additional charge for using EFS from Lambda functions. You pay the standard price for AWS Lambda and Amazon EFS. Lambda execution environments always connect to the right mount target in an AZ and not across AZs. You can connect to EFS in the same AZ via a cross-account VPC, but there can be data transfer costs for that. We do not support cross-region or cross-AZ connectivity between EFS and Lambda.

Danilo

OpenFOAM on Amazon EC2 C6g Arm-based Graviton2 Instances – up to 37% better price/performance

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/c6g-openfoam-better-price-performance/

This post is contributed by: Neil Ashton (AWS) – Principal CFD Specialist SA, Karthik Raman (AWS) – Senior HPC Specialist SA, Oliver Perks (Arm) – Principal HPC Engineer

Over the past 30 years, Computational Fluid Dynamics (CFD) has become a key part of many engineering design processes. From aircraft design to modelling the blood flow inside our bodies, the ability to understand the behavior of fluids has enabled countless innovations and improved the time to market for many products. Typically, the modelling of the underlying fluid flow equations remains the major limit to accuracy; however, the scale and availability of HPC resources is arguably the main bottleneck for the typical CFD user. The need for more HPC resources has accelerated over recent years due to the move to higher-fidelity approaches, in addition to the growing use of optimization techniques and machine learning driven workflows. These workflows require the simulation of thousands or even millions of small jobs to explore a design space or train an improved turbulence model. AWS enables engineers to overcome this HPC bottleneck by allowing the quick deployment of a supercomputer. This can scale to practically any size and with the right hardware to match your needs, for example, CPUs, GPUs.

CFD users have a broad choice of instance types on AWS. Typically, the majority of users select the compute-optimized Amazon EC2 C5 family of instances. For the majority of CFD cases, these offer better price/performance than the Amazon EC2 M5 family of instances, because CFD solvers need memory bandwidth more than total memory. This broad set of available instances allows users to easily incorporate Amazon EC2 M5 or Amazon EC2 R5 instances when higher memory is needed, for example, for pre- or post-processing.

There are typically two competing needs for the typical CFD user: turn-around time and cost. Depending on the underlying numerical methods, the speed of a simulation does not increase linearly with core count. At some point, the cost of network communication causes scaling to fall off, and the cost of adding more cores is no longer matched by a similar decrease in runtime. Therefore, the user typically faces a choice between the fastest possible runtime and the most efficient cost for the simulation.

The recently announced Amazon EC2 C6g instances powered by AWS-designed Arm-based Graviton2 processors bring in a new instance option. So, the logical question for any AWS customer is whether this is suitable for their CFD workload. In this blog, we provide benchmarking results for the open-source CFD code OpenFOAM that can be easily compiled and run with the Amazon EC2 C6g instance. We demonstrate that you can achieve 37% better price/performance for both single-node and multi-node cases on more than 200M cells on thousands of cores.


AWS Graviton2-based Instances

AWS Graviton2 processors are custom built by AWS using the 64-bit Arm Neoverse cores to deliver great price performance for your cloud workloads running in Amazon EC2. The Graviton2-powered EC2 instances were announced at re:Invent 2019. The new Amazon EC2 C6g compute-optimized instances are available as part of the sixth generation EC2 offering. These instances are powered by AWS Graviton2 processors that use 64-bit Arm Neoverse N1 cores and custom silicon designed by AWS, built using advanced 7-nanometer manufacturing technology.

Graviton2 processor cores feature 64 KB L1 cache and 1 MB L2 cache, and include dual SIMD units that double the floating-point performance versus first-generation Graviton processors. This targets high performance computing workloads; the cores also support int8/fp16 number formats to accelerate machine learning inference workloads. Every vCPU is a physical core (that is, no simultaneous multithreading, or SMT). The instances are single socket, so there are no NUMA concerns since every core sees the same path to memory and other cores. There are 8x DDR4 memory channels running at 3200 MT/s, delivering over 200 GB/s of peak memory bandwidth.

The compute-optimized Amazon EC2 C6g instances are available in 9 sizes, with 1, 2, 4, 8, 16, 32, 48, or 64 vCPUs, or as bare metal instances. They support configurations with up to 128 GB of DDR4 memory, or 2 GB per vCPU. The instances support up to 25 Gbps of network bandwidth and 19 Gbps of Amazon EBS bandwidth. These instances are powered by the AWS Nitro System, a combination of dedicated hardware and a lightweight hypervisor.

Benchmarking Results

Single-Node Performance

In order to demonstrate the performance of the Amazon EC2 C6g instance, we have taken two test cases that reflect the range of cases that CFD users run. For the first case, we simulate a 4 million cell motorbike case that is part of the standard OpenFOAM tutorial suite. These cases may not represent the complexity of some production CFD workflows, but they were chosen to allow readers to easily replicate the results. This case uses the OpenFOAM simpleFoam solver, which uses a steady-state incompressible segregated SIMPLE approach combined with a multigrid method for the linear solver. The k-ω SST model is used with second order upwinding for both the momentum and turbulence equations. The scotch domain decomposition approach is used to ensure the best parallel efficiency. However, for this first case the focus is on single-node performance, given the low cell count.

Figures 1 and 2 show the time taken to run 5000 iterations, as well as the cost for that simulation based upon on-demand pricing. The C6g.16xlarge offers a clear cost-per-simulation benefit over both the C5n.18xlarge (37%) and C5.24xlarge (29%) instances. There is, however, an increase in total run time compared to the C5.24xlarge (33%) and C5n.18xlarge (13%), which may mean that the C5.24xlarge or C5n.18xlarge will still be preferred by those prioritizing turn-around time.


Figure 1 – Single-node OpenFOAM cost per simulation

Figure 2 – Single-node OpenFOAM run-time performance

Single-Node Performance Profile

Profiling performed on the simpleFoam solver (used in the motorbike example) showed that it is limited by the sustained memory bandwidth on the instance. We therefore measured the peak sustainable memory bandwidth performance (Figure 3) on the three instance types (C5n.18xlarge, C6g.16xlarge, C5.24xlarge) using STREAM Triad, and then calculated the corresponding bandwidth efficiencies, which are shown in Figure 4.
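For readers who want a feel for the Triad kernel, here is a rough single-threaded sketch in Python/NumPy. The official STREAM benchmark is multi-threaded C, so this sketch only illustrates the access pattern, not the peak platform bandwidth reported in Figure 3:

import time

import numpy as np

# STREAM Triad: a = b + scalar * c over large arrays
n = 50_000_000
b = np.ones(n)
c = np.ones(n)
scalar = 3.0

start = time.perf_counter()
a = b + scalar * c
elapsed = time.perf_counter() - start

# STREAM counts 24 bytes per element (read b, read c, write a, float64);
# NumPy's temporaries add extra traffic, so treat this as a rough lower bound
bytes_moved = 3 * n * 8
print(f'Triad bandwidth: {bytes_moved / elapsed / 1e9:.1f} GB/s')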


Figure 3 – Peak memory bandwidth using STREAM

Figure 4 – Memory bandwidth efficiency using STREAM Triad

The C5n.18xlarge and C5.24xlarge instances have higher peak memory bandwidth due to more memory channels (dual socket) compared to the C6g.16xlarge instance. However, the Graviton2-based processor can achieve a higher percentage of peak (~86% bandwidth efficiency) compared to the x86 processors (~79% on C5.24xlarge). This, in addition to the lower instance cost (47% vs. C5.24xlarge, 44% vs. C5n.18xlarge), is the reason for the improved cost per simulation with the C6g.16xlarge instance, as shown in Figure 1.

Multi-Node Performance

A further, larger mesh of 222 million cells was studied on the same motorbike geometry with the same underlying numerical methods. We did this to assess whether the same performance trends continue for much larger multi-node simulations. Figure 5 shows the number of iterations possible per minute for different numbers of cores for the C5.24xlarge, C5n.18xlarge, and C6g.16xlarge instances. You can see in Figure 5 that the scaling of the C5n.18xlarge instances is much better than that of the C5.24xlarge and C6g.16xlarge instances due to the use of the Elastic Fabric Adapter (EFA), which enables optimum scaling thanks to low-latency, high-bandwidth communication. However, while the scaling is much improved for C5n.18xlarge instances with EFA, the cost per simulation (Figure 6) shows up to 37% better price/performance for the C6g.16xlarge instances over the C5n.18xlarge and C5.24xlarge instances.

Figure 5 – Multi-node OpenFOAM Scaling Performance

Figure 6 – Multi-node OpenFOAM cost per simulation

Conclusions

This blog has demonstrated the ease with which both single-node and multi-node CFD simulations can be ported to the new Amazon EC2 C6g instances powered by Arm-based AWS Graviton2 processors. With no code modifications, we have been able to port an x86 workload to the new C6g instances and achieve competitive single node and multi-node performance with up to 37% better price/performance over other C family instances. We encourage you to test out your applications for yourself and reach out to us if you have any questions!

OpenFOAM compilation on AWS Graviton2 Instances

OpenFOAM v1912 builds on modern Arm-based systems, with support in place for both the GCC compiler and the Arm Compiler for Linux (ACfL). No source code modifications are required to obtain a working OpenFOAM binary. The process for building OpenFOAM on Arm-based systems is equivalent to that on x86 systems: you specify the desired compiler in the OpenFOAM bashrc file.

The only external dependency is a working MPI library. For this experiment, we used Open MPI version 4.0.3, built with UCX 1.8 and GCC 9.2. We note that we have not applied any additional hardware-specific performance optimizations. Although deploying OpenFOAM on the Amazon EC2 C6g instances was trivial, documentation is available on the Arm community GitLab pages, covering the installation of different OpenFOAM versions with different compilers.

For the C5n.18xlarge and C5.24xlarge simulations, OpenFOAM v1912 was compiled using GCC 8.2 and IntelMPI 2019.7. For all simulations Amazon Linux 2 was the operating system. The HPC environment for all tests was created using AWS ParallelCluster, which will soon include official support for AWS’ latest Graviton2 instances, including C6g. You can stay up to date on the latest information and releases of AWS ParallelCluster on GitHub.

Instance Details:

Model          vCPU   Memory (GiB)   Instance Storage (GiB)   Network Bandwidth (Gbps)   EBS Bandwidth (Mbps)
C5n.18xlarge   72     192            EBS-only                 100                        19,000
C6g.16xlarge   64     128            EBS-only                 25                         19,000
C5.24xlarge    96     192            EBS-only                 25                         19,000


Software Package Management with AWS CodeArtifact

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/software-package-management-with-aws-codeartifact/

Software artifact repositories and their associated package managers are an essential component of development. Downloading and referencing pre-built libraries of software with a package manager, at the point in time the libraries are needed, simplifies both development and build processes. A variety of package repositories can be used, for example Maven Central, npm public registry, and PyPi (Python Package Index), among others. Working with a multitude of artifact repositories can present some challenges to organizations that want to carefully control both versions of, and access to, the software dependencies of their applications. Any changes to dependencies need to be controlled, to try and prevent undetected and exploitable vulnerabilities creeping into the organization’s applications. By using a centralized repository, it becomes easier for organizations to manage access control and version changes, and gives teams confidence that when updating package versions, the new versions have been approved for use by their IT leaders. Larger organizations may turn to traditional artifact repository software to solve these challenges, but these products can introduce additional challenges around installation, configuration, maintenance, and scaling. For smaller organizations, the price and maintenance effort of traditional artifact repository software may be prohibitive.

Generally available today, AWS CodeArtifact is a fully managed artifact repository service for developers and organizations to help securely store and share the software packages used in their development, build, and deployment processes. Today, CodeArtifact can be used with popular build tools and package managers such as Maven and Gradle (for Java), npm and yarn (for JavaScript), and pip and twine (for Python), with more to come. As new packages are ingested, or published to your repositories, CodeArtifact automatically scales, and as a fully managed service, CodeArtifact requires no infrastructure installation or maintenance on your part. Additionally, CodeArtifact is a polyglot artifact repository, meaning it can store artifact packages of any supported type. For example, a single CodeArtifact repository could be configured to store packages from Maven, npm and Python repositories side by side in one location.

CodeArtifact repositories are organized into a domain. We recommend that you use a single domain for your organization, and then add repositories to it. For example you might choose to use different repositories for different teams. To publish packages into your repositories, or ingest packages from external repositories, you simply use the package manager tools your developers are used to. Let’s take a look at the process of getting started.

Getting started with CodeArtifact
To get started with CodeArtifact, I first need to create a domain for my organization, which will aggregate my repositories. Domains are used to perform the actual storage of packages and metadata, even though I consume them from a repository. This has the advantage that a single package asset, for example a given npm package, would be stored only once per domain no matter how many repositories it may appear to be in. From the CodeArtifact console, I can select Domains from the left-hand navigation panel, or instead create a domain as part of creating my first repository, which I’ll do here by clicking Create repository.

First, I give my repository a name and optional description, and I then have the option to connect my repository to several upstream repositories. When requests are made for packages not present in my repository, CodeArtifact will pull the respective packages from these upstream repositories for me, and cache them into my CodeArtifact repository. Note that a CodeArtifact repository can also act as an upstream for other CodeArtifact repositories. For the example here, I’m going to pull packages from the npm public registry and PyPi. CodeArtifact will refer to the repositories it creates on my behalf to manage these external connections as npm-store and pypi-store.

Clicking Next, I then select, or create, a domain which I do by choosing the account that will own the domain and then giving the domain a name. Note that CodeArtifact encrypts all assets and metadata in a domain using a single AWS Key Management Service (KMS) key. Here, I’m going to use a key that will be created for me by the service, but I can elect to use my own.

Clicking Next takes me to the final step to review my settings, and I can confirm the package flow from my selected upstream repositories is as I expect. Clicking Create repository completes the process, and in this case creates the domain, my repository, and two additional repositories representing the upstreams.

After using this simple setup process, my domain and its initial repository, configured to pull upstream from npm and PyPi, are now ready to hold software artifact packages, and I could also add additional repositories if needed. However my next step for this example is to configure the package managers for my upstream repositories, npm and pip, with access to the CodeArtifact repository, as follows.

Configuring package managers
The steps to configure various package managers can be found in the documentation, but conveniently the console also gives me the instructions I need when I select my repository. I’m going to start with npm, and I can access the instructions by first selecting my npm-pypi-example-repository and clicking View connection instructions.

In the resulting dialog I select the package manager I want to configure and I am shown the relevant instructions. I have the choice of using the AWS Command Line Interface (CLI) to manage the whole process (for npm, pip, and twine), or I can use a CLI command to get the token and then run npm commands to attach the token to the repository reference.

Regardless of the package manager, or the set of instructions I follow, the commands simply attach an authorization token, which is valid for 12 hours, to the package manager configuration for the repository. So that I don’t forget to refresh the token, I have taken the approach of adding the relevant command to my startup profile so that my token is automatically refreshed at the start of each day.
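The authorization token can also be fetched directly, for example from a custom build script. Here is a minimal boto3 sketch; the domain owner account ID is a placeholder:

import boto3

codeartifact = boto3.client('codeartifact')

response = codeartifact.get_authorization_token(
    domain='my-example-domain',
    domainOwner='123456789012',  # placeholder account ID
    durationSeconds=43200,       # 12 hours, the maximum token validity
)
token = response['authorizationToken']
# Wire the token into your package manager configuration as needed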

Following the same guidance, I similarly configure pip, again using the AWS CLI approach:

C:\> aws codeartifact login --tool pip --repository npm-pypi-example-repository --domain my-example-domain --domain-owner ACCOUNT_ID
Writing to C:\Users\steve\AppData\Roaming\pip\pip.ini
Successfully logged in to codeartifact for pypi

That’s it! I’m now ready to start using the single repository for dependencies in my Node.js and Python applications. Any dependency I add which is not already in the repository will be fetched from the designated upstream repositories and added to my CodeArtifact repository.

Let’s try some simple tests to close out the post. First, after changing to an empty directory, I execute a simple npm install command, in this case to install the AWS Cloud Development Kit.

npm install -g aws-cdk

Selecting the repository in the CodeArtifact console, I can see that the packages for the AWS Cloud Development Kit, and its dependencies, have now been downloaded from the upstream npm public registry repository, and added to my repository.

I mentioned earlier that CodeArtifact repositories are polyglot, and able to store packages of any supported type. Let’s now add a Python package, in this case Pillow, a popular image manipulation library.

> pip3 install Pillow
Looking in indexes: https://aws:****@my-example-domain-123456789012.d.codeartifact.us-west-2.amazonaws.com/pypi/npm-pypi-example-repository/simple/
Collecting Pillow
  Downloading https://my-example-domain-123456789012.d.codeartifact.us-west-2.amazonaws.com/pypi/npm-pypi-example-repository/simple/pillow/7.1.2/Pillow-7.1.2-cp38-cp38-win_amd64.whl (2.0 MB)
     |████████████████████████████████| 2.0 MB 819 kB/s
Installing collected packages: Pillow
Successfully installed Pillow-7.1.2

In the console, I can see the Python package sitting alongside the npm packages I added earlier.

Although I’ve used the console to verify my actions, I could equally well use CLI commands. For example, to list the repository packages I could have run the following command:

aws codeartifact list-packages --domain my-example-domain --repository npm-pypi-example-repository

As you might expect, additional commands are available to help you work with domains, repositories, and the packages they contain.

Availability
AWS CodeArtifact is now generally available in the Frankfurt, Ireland, Mumbai, N. Virginia, Ohio, Oregon, Singapore, Sweden, Sydney, and Tokyo Regions. Tune in on June 12th at noon (PST) to Twitch.tv/aws or LinkedIn Live, where we will be showing how you can get started with CodeArtifact.

— Steve

New – Label 3D Point Clouds with Amazon SageMaker Ground Truth

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/new-label-3d-point-clouds-with-amazon-sagemaker-ground-truth/

Launched at AWS re:Invent 2018, Amazon SageMaker Ground Truth is a capability of Amazon SageMaker that makes it easy to annotate machine learning datasets. Customers can efficiently and accurately label image and text data with built-in workflows, or any other type of data with custom workflows. Data samples are automatically distributed to a workforce (private, third-party, or MTurk), and annotations are stored in Amazon Simple Storage Service (S3). Optionally, automated data labeling may also be enabled, reducing both the amount of time required to label the dataset, and the associated costs.

About a year ago, I met with automotive customers who expressed interest in labeling 3-dimensional (3D) datasets for autonomous driving. Captured by LIDAR sensors, these datasets are particularly large and complex. Data is stored in frames that typically contain 50,000 to 5 million points, and can weigh up to hundreds of megabytes each. Frames are either stored individually, or in sequences that make it easier to track moving objects.

As you can imagine, labeling these datasets is extremely time-consuming, as workers need to navigate complex 3D scenes and annotate many different object classes. This often requires building and managing very complex tools. Always looking to help customers build simpler and more efficient workflows, the Ground Truth team gathered more feedback, and got to work.

Today, I’m extremely happy to announce that you can use Amazon SageMaker Ground Truth to label 3D point clouds using a built-in editor, and state-of-the-art assistive labeling features.

Introducing 3D Point Cloud Labeling
Just like for other Ground Truth task types, input data for 3D point clouds has to be stored in an S3 bucket. It also needs to be described by a manifest file, a JSON file containing both the location of the frames in S3 and their attributes. A dataset may contain either single-frame data, or multi-frame sequences.
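
As a rough sketch, each line of a manifest is a standalone JSON object pointing at one frame (or sequence) in S3. The snippet below writes one such entry with Python; the source-ref key follows the usual Ground Truth convention, the bucket path is a placeholder, and real 3D point cloud manifests also carry a metadata object (frame format, timestamp, ego-vehicle pose, camera data) described in the documentation:

import json

# A rough sketch: append one single-frame entry to a manifest file.
# The S3 path is a placeholder; real entries also include a metadata object
# (frame format, timestamp, ego-vehicle pose, camera data) per the documentation.
entry = {"source-ref": "s3://my-bucket/frames/frame-0001.txt"}

with open("manifest.jsonl", "a") as f:
    f.write(json.dumps(entry) + "\n")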

Optionally, the dataset may also include image data captured by on-board cameras. Using a feature called “sensor fusion”, Ground Truth can synchronize a 3D point cloud with up to 8 cameras. Thanks to this, workers get a real-life view of the scene, and they can also interchangeably apply labels to 2D images and 3D point clouds.

Once the manifest file is ready, Ground Truth lets you create the following task types:

  • Object Detection: identify objects of interest within a 3D point cloud frame.
  • Object Tracking: track objects of interest across a sequence of 3D point cloud frames.
  • Semantic Segmentation: segment the points of a 3D point cloud frame into predefined categories.

These can either be labeling jobs where workers annotate new frames, or adjustment jobs where they review and fine-tune existing annotations. Jobs may be distributed either to a private workforce or to a vendor workforce picked from AWS Marketplace.

Using the built-in graphical user interface (GUI) and its shortcuts for navigation and labeling, workers can quickly and accurately apply labels, boxes and categories to 3D objects (“car”, “pedestrian”, and so on). They can also add user-defined attributes, such as the color of a car, or whether an object is fully or partially visible.

The GUI includes many assistive labeling features that significantly simplify labeling work, save time, and improve the quality of annotations. Here are a few examples:

  • Snapping: Ground Truth infers a tight-fitting box around the object.
  • Interpolation: the labeler annotates an object in the first and last frames of a sequence. Ground Truth automatically annotates it in the middle frames.
  • Ground detection and removal: Ground Truth can automatically detect and remove 3D points belonging to the ground from object boxes.

Even with assistive labeling, it may take a while to annotate complex frames and sequences, so work is saved periodically to avoid any data loss.

Preparing 3D Point Cloud Datasets
As previously mentioned, you have to provide a manifest file describing your 3D dataset. The format of this file is defined in the Ground Truth documentation. Of course, the steps required to build it will vary from one dataset to the next. For example, the Audi A2D2 dataset contains almost 400,000 frames, with 360-degree 3D LIDAR data and 2D images. KITTI, another popular choice for autonomous driving research, includes a 3D dataset with 15,000 images and their corresponding point clouds, for a total of 80,256 labeled objects. This notebook shows you how to convert KITTI data to the Ground Truth format.

When datasets contain both 3D LIDAR data and 2D camera images, one challenge is to synchronize them. This allows us to project 3D points to 2D coordinates, map them on the pictures captured by on-board cameras, and vice versa. Another challenge is that data captured by a given device uses coordinates local to this device. Fortunately, we know where the device is located on the car, and where it’s pointed to. All of this can be solved by building a global coordinate system, also known as a World Coordinate System (WCS). Using matrix operations (which I’ll spare you), we can compute the coordinates of all data points inside the WCS.
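
Those matrix operations reduce to applying each device's pose (a rotation plus a translation) to its points. A minimal numpy sketch, with placeholder pose values:

import numpy as np

# A minimal sketch: move LIDAR points from the sensor's local frame into the
# world coordinate system, given the sensor pose (rotation R, translation t).
def to_world(points_local, R, t):
    # points_local: (N, 3) array; R: (3, 3) rotation matrix; t: (3,) translation.
    return points_local @ R.T + t

points = np.array([[1.0, 2.0, 0.5]])   # one point in the sensor frame
R = np.eye(3)                          # placeholder: no rotation
t = np.array([10.0, 0.0, 0.0])         # placeholder: sensor 10 m from the origin
print(to_world(points, R, t))          # [[11.   2.   0.5]]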

Once frames have been processed, their information is saved in the manifest file: the position of the vehicle, the location of LIDAR data in S3, the location of associated pictures in S3, and so on. For large datasets, the whole process is a significant workload, and you could run it on a managed service such as Amazon SageMaker Processing, Amazon EMR or AWS Glue.

Labeling 3D Point Clouds with Amazon SageMaker Ground Truth
Let’s do a quick demo, based on this notebook. Starting from pre-processed sample frames, it streamlines the process of creating a 3D point cloud labeling job for each of the six task types (Object Detection, Object Tracking, Semantic Segmentation, and the associated adjustment task types). You can easily make yourself a private worker, and start labeling frames with the worker GUI and its labeling tools.

A picture is worth a thousand words, and a video even more! In this first video, I annotate a couple of cars using two assistive labeling features. First, I fit the box to the ground, which helps me capture object points that are close to the ground without actually capturing the ground itself. Second, I fit the box to the object, which ensures a tight fit without any blank space.

Amazon SageMaker Ground Truth

In this second video, I annotate a third car using the same technique. It’s much harder to “see” than the previous ones, but I still manage to fit a tight box around it. Playing the next nine frames, I see that this car is actually moving. Jumping directly to the tenth frame, I adjust the bounding box to the new location of the car. Ground Truth automatically labels the eight middle frames, thanks to another assistive labeling feature called interpolation.

Amazon SageMaker Ground Truth

I’ve barely scratched the surface, and there’s plenty more to learn. Now it’s your turn!

Getting Started
You can start labeling 3D point clouds with Amazon SageMaker Ground Truth today in the following Regions:

  • US East (N. Virginia), US East (Ohio), US West (Oregon),
  • Canada (Central),
  • Europe (Ireland), Europe (London), Europe (Frankfurt),
  • Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Seoul), Asia Pacific (Sydney), Asia Pacific (Tokyo).

We’re looking forward to reading your feedback. You can send it through your usual support contacts, or in the AWS Forum for Amazon SageMaker.

– Julien

New – Amazon EC2 C5a Instances Powered By 2nd Gen AMD EPYC™ Processors

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ec2-c5a-instances-powered-by-2nd-gen-amd-epyc-processors/

Over the last 18 months, we have launched AMD-powered M5a and R5a, M5ad and R5ad, and T3a instances to provide customers additional choice for running their general purpose and memory intensive workloads. Built on the AWS Nitro System, these instances are powered by custom 1st generation AMD EPYC™ processors. These instances are priced 10% lower than comparable EC2 M5, R5, and T3 instances, and provide you with options to balance your instance mix based on cost and performance.

Today, I am excited to announce the general availability of compute-optimized C5a instances featuring 2nd Gen AMD EPYC™ processors, running at frequencies up to 3.3 GHz. C5a instances are variants of Amazon EC2’s compute-optimized (C5) instance family and provide high performance processing at 10% lower cost than comparable instances. C5a instances are ideal for a broad set of compute-intensive workloads including batch processing, distributed analytics, data transformations, log analysis, and web applications.

You can launch C5a instances today in eight sizes in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Sydney), and Asia Pacific (Singapore) Regions as On-Demand, Spot, or Reserved Instances, or as part of a Savings Plan. Here are the specs:

Instance Name    vCPUs    RAM        EBS-Optimized Bandwidth    Network Bandwidth
c5a.large        2        4 GiB      Up to 3.170 Gbps           Up to 10 Gbps
c5a.xlarge       4        8 GiB      Up to 3.170 Gbps           Up to 10 Gbps
c5a.2xlarge      8        16 GiB     Up to 3.170 Gbps           Up to 10 Gbps
c5a.4xlarge      16       32 GiB     Up to 3.170 Gbps           Up to 10 Gbps
c5a.8xlarge      32       64 GiB     3.170 Gbps                 10 Gbps
c5a.12xlarge     48       96 GiB     4.750 Gbps                 12 Gbps
c5a.16xlarge     64       128 GiB    6.3 Gbps                   20 Gbps
c5a.24xlarge     96       192 GiB    9.5 Gbps                   20 Gbps

But wait, there’s more! Disk variants, C5ad, that come with fast, local NVMe instance storage, and bare metal variants, C5an.metal and C5adn.metal, are coming soon.

The C5a instances will be supported across a broad set of AWS services that support C5 today, including AWS Batch, Amazon EMR, Elastic Container Service (ECS), and Elastic Kubernetes Service (EKS). C5a instances are fully compatible 64-bit x86 instances and are managed by the same Nitro platform used across Amazon EC2. Again, these instances are available in similar sizes to the C5 instances, and the AMIs work on either, so go ahead and try both!
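
If you'd rather script that experiment than click through the console, here is a quick boto3 sketch (the AMI ID is a placeholder; any 64-bit x86 AMI you already run on C5 should work on C5a):

import boto3

# A quick sketch: launch a c5a.large. The AMI ID is a placeholder; use any
# 64-bit x86 AMI you already run on comparable C5 instances.
ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="c5a.large",
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])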

To learn more, visit our AMD Instances page and please send feedback to [email protected], AWS forum for EC2 or through your usual AWS Support contacts.

Channy;

Proactively Monitoring System Performance on Amazon Lightsail Instances

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/proactively-monitoring-system-performance-on-amazon-lightsail-instances/

This post is contributed by Mike Coleman, AWS Senior Developer Advocate – Lightsail

I commonly hear from customers that they want to proactively identify issues that could affect system performance before they become a problem. For instance, they want to be alerted before an instance becomes unresponsive to a burst in traffic because it has exhausted its CPU burst capacity. Burst capacity can be consumed either by a workload that needs to operate in the burstable zone for long periods of time, or by unexpected CPU consumption from system processes. In either case, you’d want to be notified so you could take corrective action, such as moving to a larger instance or stopping errant processes. To that end, today Amazon Lightsail launched a new feature allowing you to set up custom alarms to be notified when your burst capacity is running low.

Amazon Lightsail instances use burstable CPUs. These CPUs operate in two different zones: the sustainable zone and the burstable zone. The sustainable zone is based on the CPU’s baseline performance. As long as CPU utilization stays below this baseline, the system performs with no impact to responsiveness. The burstable zone is entered whenever CPU utilization climbs above the baseline. The instance can only operate in this zone for a finite amount of time (more on that below) before system performance is negatively impacted.

Earlier this year, we announced the release of resource monitoring, alarms, and notifications for Amazon Lightsail instances. This feature introduced a graph that shows when an instance operates in the CPUs burstable zone. However, this was only a partial solution since there was no easy way for you to know how long your system could effectively operate in the burstable zone.

With today’s new feature, Lightsail has augmented that functionality by allowing you to see how much burstable capacity is available to your system at any given time. Additionally, you can create alarms and be notified when that burstable capacity has dropped to a critical level. This allows users to be proactively notified that there is a potential performance issue developing.

The remainder of this blog runs through how to configure a burstable capacity alert so you can prevent system performance issues before they impact your users.

Burstable capacity overview

Before I begin, it’s important to understand how burst capacity minutes are calculated. One minute of the CPU running at 100% is one minute of CPU burst capacity. By the same token, one minute of the CPU running at 50% is 30 seconds of CPU burst capacity. So, if a system has 72 minutes of available burst capacity, and it’s running at 50% CPU it can run in that state for 144 minutes before system performance is impacted.
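
To make that arithmetic concrete, here is a tiny Python sketch of the simplified model described above:

# A tiny sketch of the simplified model above: burst minutes are consumed in
# proportion to CPU utilization, so remaining runtime = minutes / utilization.
def remaining_runtime_minutes(burst_minutes, cpu_utilization):
    return burst_minutes / cpu_utilization

print(remaining_runtime_minutes(72, 1.00))  # 72.0 minutes at 100% CPU
print(remaining_runtime_minutes(72, 0.50))  # 144.0 minutes at 50% CPU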

CPU burst capacity can be displayed in two ways: percentage available and minutes available, with the default being percentage. CPU burst capacity percentage is a simple calculation of dividing the available burst capacity (minutes) by the maximum available burst capacity (minutes). For example, an instance with 36 minutes of burst capacity left out of a maximum available limit of 72 minutes would be at 50%.

Configuring a Burstable Capacity Alert

Let’s take a look at how to configure an alert for CPU burst capacity (percentage).

Note: If you’re not familiar with the general concepts around how to configure alerts and notifications in Lightsail, you can read this blog post.

  1. From the Lightsail home page, click on the instance you wish to create the alert for.
  2. From the horizontal menu, choose Metrics.
    metrics tab of Lightsail console
  3. You should now see the graphs for both CPU utilization and burst capacity.

Notice how the CPU utilization graph shows the sustainable and the burstable zones.

The burst capacity graph shows the percentage of burst capacity (or the number of minutes) you have remaining before system performance is impacted.

In the following graphs, you can see that the CPU is operating at about 15% and has just over 90% of remaining burst capacity.
cpu graphs

  4. Scroll down and click CPU burst capacity (percentage).
    cpu alarm option
  5. Click +Add alarm.
  6. For my alarm, I choose to be notified whenever the percentage of available CPU burst capacity drops below 25% for 10 consecutive minutes.
    cpu percentage alarm option
  7. At this point, you can enable notifications via email or SMS message (instructions on how to do this can be found in the blog post I linked earlier). Regardless of whether you choose to enable notifications, you’ll receive a banner in the Lightsail console whenever your alarm threshold is breached.
  8. Click Create to create your alarm.
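
The console isn't the only way to do this; the same alarm can also be created programmatically. Here is a sketch using boto3's Lightsail client, where the instance name is a placeholder and the metric name and evaluation-period semantics are assumptions to verify against the API reference:

import boto3

# A sketch of the alarm configured above, created via the Lightsail API.
# The instance name is a placeholder; the metric name and evaluation period
# semantics are assumptions to verify against the API reference.
lightsail = boto3.client("lightsail")

lightsail.put_alarm(
    alarmName="burst-capacity-warning",
    metricName="BurstCapacityPercentage",
    monitoredResourceName="my-instance",
    comparisonOperator="LessThanOrEqualToThreshold",
    threshold=25.0,
    evaluationPeriods=2,  # assumption: two 5-minute periods, roughly 10 minutes
)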

Lightsail is now configured with a single alarm. I consider it a best practice to configure two alarms: the first at a warning level, and the second at a critical level. This ensures you have additional time to respond to developing problems. If you’d like to create a second alarm, just repeat the previous steps.

Conclusion

In this blog, I covered the concepts behind Lightsail’s burstable CPUs, and how you can create alarms to respond before your system runs out of burst capacity. If you see that your system frequently runs low on burst capacity, you should investigate the processes running on your instance to find any that are consuming extra CPU, or consider upgrading your instance to a larger plan. Read more about this latest feature here, and check out our Getting Started page for more tutorials and resources.

Amazon FSx for Windows File Server – Storage Size and Throughput Capacity Scaling

Post Syndicated from Harunobu Kameda original https://aws.amazon.com/blogs/aws/amazon-fsx-for-windows-file-server-storage-size-and-throughput-capacity-scaling/

Amazon FSx for Windows File Server provides fully managed, highly reliable file storage that is accessible over the Server Message Block (SMB) protocol. It is built on Windows Server, delivering a wide range of administrative features such as user quotas, end-user file restore, and Microsoft Active Directory integration, consistent with operating an on-premises Microsoft Windows file server. Today, we are happy to announce two new features: storage capacity scaling and throughput capacity scaling. Storage capacity scaling allows you to increase your file system size as your data set grows, and throughput capacity scaling is bidirectional, letting you adjust throughput up or down dynamically to help fine-tune performance and reduce costs. With the capability to grow storage capacity, you can adjust your storage size as your data sets grow, so you don’t need to worry about growing data sets when creating the file system. With the capability to change throughput capacity, you can dynamically adjust throughput for cyclical workloads or for one-time bursts to achieve a time-sensitive goal such as data migration.

When we create a file system, we specify Storage Capacity and Throughput Capacity.

The storage capacity of SSD can be specified between 32 GiB and 65,536 GiB, and the capacity of HDD can be specified between 2,000 GiB and 65,536 GiB. With throughput capacity, every Amazon FSx file system has a throughput capacity that you configure when the file system is created. The throughput capacity determines the speed at which the file server hosting your file system can serve file data to clients accessing it. Higher levels of throughput capacity also come with more memory for caching data on the file server and support higher levels of IOPS.

With this release, you can scale up storage capacity, and scale throughput capacity up or down, on your file system with the click of a button in the AWS Management Console, or by using the AWS Software Development Kit (SDK) or Command Line Interface (CLI) tools. The file system remains available online while scaling is in progress, and you keep full access to it during storage scaling. While scaling throughput, Amazon FSx for Windows switches out the file servers on your file system, so you’ll see an automatic failover and failback on multi-AZ file systems.

So, let’s take a quick tour of the new features. We’ll look at the AWS Management Console first.

Operation by AWS Management Console

Before we begin, we assume AWS Managed Microsoft AD by AWS Directory Service and Amazon FSx for Windows File Server are already set up. You can obtain a walkthrough guide here. From the Actions drop-down, we can select Update storage capacity or Update throughput capacity.

We can assign new storage capacity by Percentage or Absolute value.

With throughput scaling, we can select the desired capacity from the drop down list.

Then, the Status changes to In Progress, and you still have access to the file system.

Scaling Storage Capacity and Throughput Capacity via CLI

First, we need a CLI environment. I prefer to work on AWS Cloud9, but you can use whatever you want. We need to know the file system ID to scale it. Type in the command below:

aws fsx --endpoint-url <endpoint> describe-file-systems

The endpoint differs among AWS Regions, and you can get a full list here. We’ll get a return, which is long and detailed. The file system ID is at the top of the return.

Let’s change Storage Capacity. The command below is the one to change it:

aws fsx --endpoint-url <endpoint> update-file-system --file-system-id=<FileSystemId> --storage-capacity <new capacity>

The <new capacity> should be a number up to 65536, and the new assigned capacity should be at least 10% larger than the current capacity. Once we type in the command, the new capacity is available for use within minutes. Once the new storage capacity is available on our file system, Amazon FSx begins storage optimization, which is the process of migrating the file system’s data to the new, larger disks. If needed, we can accelerate the storage optimization process at any time by temporarily increasing the file system’s throughput capacity. There is minimal performance impact while Amazon FSx performs these operations in the background, and we always have full access to our file system.

If you enter the following command, you’ll see that the file system update is “IN_PROGRESS” and storage optimization is “PENDING” at the bottom of the returned output.

aws fsx --endpoint-url <endpoint> describe-file-systems

After the storage optimization process begins:

We can also go further and run throughput scaling at the same time. Type the command below:

aws fsx --endpoint-url <endpoint> update-file-system --file-system-id=<FileSystemId> --windows-configuration ThroughputCapacity=<new capacity>

The <new capacity> should be one of 8, 16, 32, 64, 128, 256, 512, 1024, or 2048 (MB/s). Unlike storage capacity, throughput capacity can be adjusted either up or down.
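
The same two operations via boto3, as a sketch (the file system ID is a placeholder):

import boto3

# A sketch of both scaling operations via boto3; the file system ID is a placeholder.
fsx = boto3.client("fsx")

# Increase storage capacity; the new value must be at least 10% larger than current.
fsx.update_file_system(
    FileSystemId="fs-0123456789abcdef0",
    StorageCapacity=1200,
)

# Adjust throughput capacity to one of the supported values.
fsx.update_file_system(
    FileSystemId="fs-0123456789abcdef0",
    WindowsConfiguration={"ThroughputCapacity": 512},
)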

Now, we can see that throughput scaling and storage optimization are both in progress. Again, we still have full access to the file system.

When we need capacity larger than 65,536 GiB, we can use Microsoft’s Distributed File System (DFS) Namespaces to group multiple file systems under a single namespace.

Available Today

Storage capacity scaling and throughput capacity scaling are available today for all AWS Regions where Amazon FSx for Windows File Server is available. This support is available for new file systems starting today, and will be expanded to all file systems in the coming weeks. Check our documentation for more details.

– Kame;

Fine-grained Continuous Delivery With CodePipeline and AWS Step Functions

Post Syndicated from Richard H Boyd original https://aws.amazon.com/blogs/devops/new-fine-grained-continuous-delivery-with-codepipeline-and-aws-stepfunctions/

Automating your software release process is an important step in adopting DevOps best practices. AWS CodePipeline is a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. CodePipeline was modeled after the way that the retail website Amazon.com automated software releases, and many early decisions for CodePipeline were based on the lessons learned from operating a web application at that scale.

However, while most cross-cutting best practices apply to most releases, there are also business specific requirements that are driven by domain or regulatory requirements. CodePipeline attempts to strike a balance between enforcing best practices out-of-the-box and offering enough flexibility to cover as many use-cases as possible.

To support use cases requiring fine-grained customization, today we are launching a new AWS CodePipeline action type for starting an AWS Step Functions state machine execution. Previously, accomplishing such a workflow required you to create custom integrations that marshaled data between CodePipeline and Step Functions. Now, you can start either a Standard or Express Step Functions state machine during the execution of a pipeline.

With this integration, you can do the following:

  • Conditionally run an Amazon SageMaker hyper-parameter tuning job
  • Write and read values from Amazon DynamoDB, as an atomic transaction, to use in later stages of the pipeline
  • Run an Amazon Elastic Container Service (Amazon ECS) task until some arbitrary condition is satisfied, such as performing integration or load testing

Example Application Overview

In the following use case, you’re working on a machine learning application. This application contains both a machine learning model that your research team maintains and an inference engine that your engineering team maintains. When a new version of either the model or the engine is released, you want to release it as quickly as possible if the latency is reduced and the accuracy improves. If the latency becomes too high, you want the engineering team to review the results and decide on the approval status. If the accuracy drops below some threshold, you want the research team to review the results and decide on the approval status.

This example will assume that a CodePipeline already exists and is configured to use a CodeCommit repository as the source and builds an AWS CodeBuild project in the build stage.

The following diagram illustrates the components built in this post and how they connect to existing infrastructure.

Architecture Diagram for CodePipeline Step Functions integration

First, create a Lambda function that uses Amazon Simple Email Service (Amazon SES) to email either the research or engineering team with the results and the opportunity for them to review it. See the following code:

import json
import os
import boto3
import base64

def lambda_handler(event, context):
    # Render a minimal HTML email whose PASS/FAIL links call back into the
    # API Gateway endpoints that relay the reviewer's decision to Step Functions.
    email_contents = """
    <html>
    <body>
    <p><a href="{url_base}/{token}/success">PASS</a></p>
    <p><a href="{url_base}/{token}/fail">FAIL</a></p>
    </body>
    </html>
"""
    callback_base = os.environ['URL']
    # Base64-encode the task token so it can be embedded safely in a URL path.
    token = base64.b64encode(bytes(event["token"], "utf-8")).decode("utf-8")

    formatted_email = email_contents.format(url_base=callback_base, token=token)
    ses_client = boto3.client('ses')
    ses_client.send_email(
        Source='[email protected]',
        Destination={
            'ToAddresses': [event["team_alias"]]
        },
        Message={
            'Subject': {
                'Data': 'PLEASE REVIEW',
                'Charset': 'UTF-8'
            },
            'Body': {
                'Text': {
                    'Data': formatted_email,
                    'Charset': 'UTF-8'
                },
                'Html': {
                    'Data': formatted_email,
                    'Charset': 'UTF-8'
                }
            }
        },
        ReplyToAddresses=[
            '[email protected]',
        ]
    )
    return {}

To set up the Step Functions state machine to orchestrate the approval, use AWS CloudFormation with the following template. The Lambda function you just created is stored in the email_sender/app directory. See the following code:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  NotifierFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: email_sender/
      Handler: app.lambda_handler
      Runtime: python3.7
      Timeout: 30
      Environment:
        Variables:
          URL: !Sub "https://${TaskTokenApi}.execute-api.${AWS::Region}.amazonaws.com/Prod"
      Policies:
      - Statement:
        - Sid: SendEmail
          Effect: Allow
          Action:
          - ses:SendEmail
          Resource: '*'

  MyStepFunctionsStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      RoleArn: !GetAtt SFnRole.Arn
      DefinitionString: !Sub |
        {
          "Comment": "A Hello World example of the Amazon States Language using Pass states",
          "StartAt": "ChoiceState",
          "States": {
            "ChoiceState": {
              "Type": "Choice",
              "Choices": [
                {
                  "Variable": "$.accuracypct",
                  "NumericLessThan": 96,
                  "Next": "ResearchApproval"
                },
                {
                  "Variable": "$.latencyMs",
                  "NumericGreaterThan": 80,
                  "Next": "EngineeringApproval"
                }
              ],
              "Default": "SuccessState"
            },
            "EngineeringApproval": {
                 "Type":"Task",
                 "Resource":"arn:aws:states:::lambda:invoke.waitForTaskToken",
                 "Parameters":{  
                    "FunctionName":"${NotifierFunction.Arn}",
                    "Payload":{
                      "latency.$":"$.latencyMs",
                      "team_alias":"[email protected]",
                      "token.$":"$$.Task.Token"
                    }
                 },
                 "Catch": [ {
                    "ErrorEquals": ["HandledError"],
                    "Next": "FailState"
                 } ],
              "Next": "SuccessState"
            },
            "ResearchApproval": {
                 "Type":"Task",
                 "Resource":"arn:aws:states:::lambda:invoke.waitForTaskToken",
                 "Parameters":{  
                    "FunctionName":"${NotifierFunction.Arn}",
                    "Payload":{  
                       "accuracy.$":"$.accuracypct",
                       "team_alias":"[email protected]",
                       "token.$":"$$.Task.Token"
                    }
                 },
                 "Catch": [ {
                    "ErrorEquals": ["HandledError"],
                    "Next": "FailState"
                 } ],
              "Next": "SuccessState"
            },
            "FailState": {
              "Type": "Fail",
              "Cause": "Invalid response.",
              "Error": "Failed Approval"
            },
            "SuccessState": {
              "Type": "Succeed"
            }
          }
        }

  TaskTokenApi:
    Type: AWS::ApiGateway::RestApi
    Properties: 
      Description: String
      Name: TokenHandler
  SuccessResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      ParentId: !Ref TokenResource
      PathPart: "success"
      RestApiId: !Ref TaskTokenApi
  FailResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      ParentId: !Ref TokenResource
      PathPart: "fail"
      RestApiId: !Ref TaskTokenApi
  TokenResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      ParentId: !GetAtt TaskTokenApi.RootResourceId
      PathPart: "{token}"
      RestApiId: !Ref TaskTokenApi
  SuccessMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      HttpMethod: GET
      ResourceId: !Ref SuccessResource
      RestApiId: !Ref TaskTokenApi
      AuthorizationType: NONE
      MethodResponses:
        - ResponseParameters:
            method.response.header.Access-Control-Allow-Origin: true
          StatusCode: 200
      Integration:
        IntegrationHttpMethod: POST
        Type: AWS
        Credentials: !GetAtt APIGWRole.Arn
        Uri: !Sub "arn:aws:apigateway:${AWS::Region}:states:action/SendTaskSuccess"
        IntegrationResponses:
          - StatusCode: 200
            ResponseTemplates:
              application/json: |
                {}
          - StatusCode: 400
            ResponseTemplates:
              application/json: |
                {"uhoh": "Spaghetti O's"}
        RequestTemplates:
          application/json: |
              #set($token=$input.params('token'))
              {
                "taskToken": "$util.base64Decode($token)",
                "output": "{}"
              }
        PassthroughBehavior: NEVER
      OperationName: "TokenResponseSuccess"
  FailMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      HttpMethod: GET
      ResourceId: !Ref FailResource
      RestApiId: !Ref TaskTokenApi
      AuthorizationType: NONE
      MethodResponses:
        - ResponseParameters:
            method.response.header.Access-Control-Allow-Origin: true
          StatusCode: 200
      Integration:
        IntegrationHttpMethod: POST
        Type: AWS
        Credentials: !GetAtt APIGWRole.Arn
        Uri: !Sub "arn:aws:apigateway:${AWS::Region}:states:action/SendTaskFailure"
        IntegrationResponses:
          - StatusCode: 200
            ResponseTemplates:
              application/json: |
                {}
          - StatusCode: 400
            ResponseTemplates:
              application/json: |
                {"uhoh": "Spaghetti O's"}
        RequestTemplates:
          application/json: |
              #set($token=$input.params('token'))
              {
                 "cause": "Failed Manual Approval",
                 "error": "HandledError",
                 "output": "{}",
                 "taskToken": "$util.base64Decode($token)"
              }
        PassthroughBehavior: NEVER
      OperationName: "TokenResponseFail"

  APIDeployment:
    Type: AWS::ApiGateway::Deployment
    DependsOn:
      - FailMethod
      - SuccessMethod
    Properties:
      Description: "Prod Stage"
      RestApiId:
        Ref: TaskTokenApi
      StageName: Prod

  APIGWRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - "apigateway.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action: 
                 - 'states:SendTaskSuccess'
                 - 'states:SendTaskFailure'
                Resource: '*'
  SFnRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - "states.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action: 
                 - 'lambda:InvokeFunction'
                Resource: !GetAtt NotifierFunction.Arn

After you create the CloudFormation stack, you have a state machine, an Amazon API Gateway REST API, a Lambda function, and the roles each resource needs.

Your pipeline invokes the state machine with the load test results, which contain the accuracy and latency statistics, and the state machine decides which team, if either, to notify of the results. If the results are positive, it returns a success status without notifying either team. If a team needs to be notified, Step Functions asynchronously invokes the Lambda function, passing in the relevant metric and the team’s email address. The Lambda function renders an email with links to the pass/fail responses so the team can choose the Pass or Fail link to respond to the review. You use the REST API to capture the response and send it to Step Functions to continue the state machine execution.

The following diagram illustrates the visual workflow of the approval process within the Step Functions state machine.

StepFunctions StateMachine for approving code changes

After you create your state machine, Lambda function, and REST API, return to the CodePipeline console and add the Step Functions integration to your existing release pipeline. Complete the following steps:

  1. On the CodePipeline console, choose Pipelines.
  2. Choose your release pipeline.
    CodePipeline before adding StepFunction integration
  3. Choose Edit.
    CodePipeline Edit View
  4. Under the Edit:Build section, choose Add stage.
  5. Name your stage Release-Approval.
  6. Choose Save.
    You return to the edit view and can see the new stage at the end of your pipeline.
    CodePipeline Edit View with new stage
  7. In the Edit:Release-Approval section, choose Add action group.
  8. Add the Step Functions StateMachine invocation Action to the action group. Use the following settings:
    1. For Action name, enter CheckForRequiredApprovals.
    2. For Action provider, choose AWS Step Functions.
    3. For Region, choose the Region where your state machine is located (this post uses US West (Oregon)).
    4. For Input artifacts, enter BuildOutput (the name you gave the output artifacts in the build stage).
    5. For State machine ARN, choose the state machine you just created.
    6. For Input type, choose File path. (This parameter tells CodePipeline to take the contents of a file and use it as the input for the state machine execution.)
    7. For Input, enter results.json (where you store the results of your load test in the build stage of the pipeline).
    8. For Variable namespace, enter StepFunctions. (This parameter tells CodePipeline to store the state machine ARN and execution ARN for this event in a variable namespace named StepFunctions.)
    9. For Output artifacts, enter ApprovalArtifacts. (This parameter tells CodePipeline to store the results of this execution in an artifact called ApprovalArtifacts.)
    Edit Action Configuration
  9. Choose Done.
    You return to the edit view of the pipeline.
    CodePipeline Edit Configuration
  10. Choose Save.
  11. Choose Release change.

When the pipeline execution reaches the approval stage, it invokes the Step Functions state machine with the results emitted from your build stage. This post hard-codes the load-test results to force an engineering approval by increasing the latency (latencyMs) above the threshold defined in the CloudFormation template (80ms). See the following code:

{
  "accuracypct": 100,
  "latencyMs": 225
}

When the state machine checks the latency and sees that it’s above 80 milliseconds, it invokes the Lambda function with the engineering email address. The engineering team receives a review request email similar to the following screenshot.

review email

If you choose PASS, you send a request to the API Gateway REST API with the Step Functions task token for the current execution, which passes the token to Step Functions with the SendTaskSuccess command. When you return to your pipeline, you can see that the approval was processed and your change is ready for production.
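
Behind that link, the REST API simply decodes the token and relays it to Step Functions. A sketch of the equivalent call via boto3, with a placeholder standing in for the {token} path parameter:

import base64
import boto3

# A sketch of what the API Gateway integration does when PASS is clicked:
# decode the task token from the URL and resume the waiting state machine.
sfn = boto3.client("stepfunctions")

encoded_token = "..."  # placeholder: the {token} path parameter from the email link
sfn.send_task_success(
    taskToken=base64.b64decode(encoded_token).decode("utf-8"),
    output="{}",
)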

Approved code change with stepfunction integration

Cleaning Up

When the engineering and research teams devise a solution that no longer mixes performance information from both teams into a single application, you can remove this integration by deleting the CloudFormation stack that you created and deleting the new CodePipeline stage that you added.

Conclusion

For more information about CodePipeline Actions and the Step Functions integration, see Working with Actions in CodePipeline.

Single Sign-On between Okta Universal Directory and AWS

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/single-sign-on-between-okta-universal-directory-and-aws/

Enterprises adopting the AWS Cloud want to effectively manage identities. Having one central place to manage identities makes it easier to enforce policies, to manage access permissions, and to reduce overhead by removing the need to duplicate users and user permissions across multiple identity silos. Having a unique identity also simplifies access for all of us, the users. We all have access to multiple systems, and we all have trouble remembering multiple distinct passwords. Being able to connect to multiple systems using one single combination of user name and password is a daily security and productivity gain. Being able to link an identity from one system with an identity managed on another trusted system is known as “Identity Federation”, of which single sign-on is a subset. Identity Federation is made possible thanks to industry standards such as Security Assertion Markup Language (SAML), OAuth, OpenID, and others.

Recently, we announced a new evolution of AWS Single Sign-On, allowing you to link AWS identities with Azure Active Directory identities. We did not stop there. Today, we are announcing the integration of AWS Single Sign-On with Okta Universal Directory.

Let me show you the experience for System Administrators, then I will demonstrate the single sign-on experience for the users.

First, let’s imagine that I am an administrator for an enterprise that already uses Okta Universal Directory to manage my workforce identities. Now I want to enable simple, easy-to-use access to our AWS environments for my users, using their existing identities. Like most enterprises, I manage multiple AWS accounts. I want more than just a single sign-on solution; I want to manage access to my AWS accounts centrally. I do not want to duplicate my Okta groups and user memberships by hand, nor maintain multiple identity systems (Okta Universal Directory and one for each AWS account I manage). I want to enable automatic user synchronization between Okta and AWS. My users will sign in to the AWS environments using the experience they are already familiar with in Okta.

Connecting Okta as an identity source for AWS Single Sign-On
The first step is to add AWS Single Sign-On as an “application” Okta users can connect to. I navigate to the Okta administration console and log in with my Okta administrator credentials, then I navigate to the Applications tab.

Okta admin console

I click the green Add Application button and search for the AWS SSO application. I click Add.

Okta add application

I enter a name for the app (you can choose whatever name you like) and click Done.

On the next screen, I configure the mutual agreement between AWS Single Sign-On and Okta. I first download the SAML metadata file generated by Okta by clicking the blue link Identity Provider Metadata. I keep this file; I need it later to configure the AWS side of the single sign-on.

Okta Identity Provider metadata

Now that I have the metadata file, I open the AWS Management Console in a new tab. I keep the Okta tab open, as the procedure is not finished there yet. I navigate to AWS Single Sign-On and click Enable AWS SSO.

I click Settings in the navigation panel. I first set the Identity source by clicking the Change link and selecting External identity provider from the list of options. Secondly, I browse to and select the XML file I downloaded from Okta in the Identity provider metadata section.

SSO configure metadata

I click Next: Review, enter CONFIRM in the provided field, and finally click Change identity source to complete the AWS Single Sign-On side of the process. I take note of the two values AWS SSO ACS URL and AWS SSO Issuer URL as I must enter these in the Okta console.

AWS SSO Save URLs

I return to the tab I left open with my Okta console, and paste in the values for AWS SSO ACS URL and AWS SSO Issuer URL.

OKTA ACS URLs

I click Save to complete the configuration.

Configuring Automatic Provisioning
Now that Okta is configured for single sign-on for my users to connect using AWS Single Sign-On, I’m going to enable automatic provisioning of user accounts. As new accounts are added to Okta and assigned to the AWS SSO application, a corresponding AWS Single Sign-On user is created automatically. As an administrator, I do not need to do any work to configure a corresponding account in AWS to map to the Okta user.

From the AWS Single Sign-On Console, I navigate to Settings and then click the Enable identity synchronization link. This opens a dialog containing the values for the SCIM endpoint and an OAuth bearer access token (hidden by default). I need both of these values to use in the Okta application settings.

AWS SSO SCIM

I switch back to the tab open on the Okta console, and click the Provisioning tab under the AWS SSO application. I select Enable API Integration. Then I copy and paste the values for Base URL (the SCIM endpoint value from the AWS Single Sign-On console) and API Token (the access token value from the AWS Single Sign-On console).

Okta API Integration

I click Test API Credentials to verify everything works as expected. Then I click To App to enable user creation, update, and deactivation.

Okta Provisioning To App

With provisioning enabled, my final task is to assign the users and groups that I want to synchronize from Okta to AWS Single Sign-On. I click the Assignments tab and add Okta users and groups. I click Assign, and I select the Okta users and groups I want to have access to AWS.

OKTA Assignments

These users are synchronized to AWS Single Sign-On, and the users now see the AWS Single Sign-On application appear in their Okta portal.

Okta Portal User View

To verify user synchronization is working, I switch back to the AWS Single Sign-On console and select the Users tab. The users I assigned in the Okta console are present.

AWS SSO User View

I Configured Single Sign-On, Now What?
Okta is now my single source of truth for my user identities and their group assignments, and periodic synchronization automatically creates corresponding identities in AWS Single Sign-On. My users sign in to their AWS accounts and applications with their familiar Okta credentials and experience, and don’t have to remember an additional user name or password. However, as things stand, my users only have access to sign in. To manage what they can access once signed in to AWS, I must set up permissions in AWS Single Sign-On.

Back in the AWS SSO Console, I click AWS Accounts in the left tab bar and select the account from my AWS Organizations that I am giving access to. For enterprises with multiple accounts for different applications or environments, this gives you the granularity to grant access to a subset of your AWS accounts.

AWS SSO Select AWS Account

I click Assign users to assign SSO users or groups to a set of IAM permissions. For this example, I assign just one user, the one with the @example.com email address.

Assign SSO Users

I click Next: Permission sets and Create new permission set to create a set of IAM policies describing the permissions I am granting to these Okta users. For this example, I grant read-only permission on all AWS services.

SSO Permission set

And voila, I am ready to test this setup.

SSO User Experience for the console
Now that I showed you the steps System Administrators take to configure the integration, let me show you what is the user experience.

As an AWS account user, I can sign in to Okta and get access to my AWS Management Console. I can start either from the AWS Single Sign-On user portal (the URL is on the AWS Single Sign-On settings page) or from the Okta user portal page and select the AWS SSO app.

I choose to start from the AWS SSO User Portal. I am redirected to the Okta login page. I enter my Okta credentials and I land on the AWS Account and Role selection page. I click on AWS Account, select the account I want to log into, and click Management console. After a few additional redirections, I land on the AWS Console page.

SSO User experience

SSO User Experience for the CLI
System administrators, DevOps engineers, developers, and automation scripts don’t use the AWS console; they use the AWS Command Line Interface (CLI) instead. To configure SSO for the command line, I open a terminal and type aws configure sso. I enter the AWS SSO User Portal URL and the Region.

$ aws configure sso
SSO start URL [None]: https://d-0123456789.awsapps.com/start
SSO Region [None]: eu-west-1
Attempting to automatically open the SSO authorization page in your default browser.
If the browser does not open or you wish to use a different device to authorize this request, open the following URL:

https://device.sso.eu-west-1.amazonaws.com/

Then enter the code:

AAAA-BBBB

At this stage, my default browser pops up and I enter my Okta credentials on the Okta login page. I confirm I want to enable SSO for the CLI.

SSO for the CLI

I close the browser when I receive this message:

AWS SSO CLI Close Browser Message

The CLI automatically resumes the configuration, I enter the default Region, the default output format and the name of the CLI profile I want to use.

The only AWS account available to you is: 012345678901
Using the account ID 012345678901
The only role available to you is: ViewOnlyAccess
Using the role name "ViewOnlyAccess"
CLI default client Region [eu-west-1]:
CLI default output format [None]:
CLI profile name [okta]:

To use this profile, specify the profile name using --profile, as shown:

aws s3 ls --profile okta

I am now ready to use the CLI with SSO. In my terminal, I type:

aws --profile okta s3 ls
2020-05-04 23:14:49 do-not-delete-gatedgarden-audit-012345678901
2015-09-24 16:46:30 elasticbeanstalk-eu-west-1-012345678901
2015-06-11 08:23:17 elasticbeanstalk-us-west-2-012345678901

If the machine on which you want to configure CLI SSO has no graphical user interface, you can configure SSO in headless mode, using the URL and the code provided by the CLI (https://device.sso.eu-west-1.amazonaws.com/ and AAAA-BBBB in the example above).
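
The SSO-backed profile also works from the AWS SDKs. For example, a sketch with boto3 (assuming a boto3 version with SSO credential support and the profile name okta from this walkthrough):

import boto3

# A sketch: reuse the SSO-backed CLI profile from the SDK. boto3 resolves the
# profile and refreshes credentials through the cached SSO session.
session = boto3.Session(profile_name="okta")
s3 = session.client("s3")

for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])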

In this post, I showed how you can take advantage of the new AWS Single Sign-On capabilities to link Okta identities to AWS accounts for user single sign-on. I also showed how to use the automatic provisioning support to reduce complexity when managing and using identities. Administrators can now use a single source of truth for managing their users, and users no longer need to manage an additional identity and password to sign in to their AWS accounts and applications.

AWS Single Sign-On with Okta is free to use, and is available in all Regions where AWS Single Sign-On is available. The full list is here.

To see all this in motion, you can check out the following demo video for more details on getting started.

— seb