
Announcing CloudTrail Insights: Identify and Respond to Unusual API Activity

Post Syndicated from Brandon West original https://aws.amazon.com/blogs/aws/announcing-cloudtrail-insights-identify-and-respond-to-unusual-api-activity/

Building software in the cloud makes it easy to instrument systems for logging from the very beginning. With tools like AWS CloudTrail, tracking every action taken on AWS accounts and services is straightforward, providing a way to find the event that caused a given change. But not all log entries are useful. When things are running smoothly, those log entries are like the steady, reassuring hum of machinery on a factory floor. When things start going wrong, that hum can make it harder to hear which piece of equipment has gone a bit wobbly. The same is true with large scale software systems: the volume of log data can be overwhelming. Sifting through those records to find actionable information is tedious. It usually requires a lot of custom software or custom integrations, and can result in false positives and alert fatigue when new services are added.

That’s where software automation and machine learning can help. Today, we’re launching AWS CloudTrail Insights in all commercial AWS regions. CloudTrail Insights automatically analyzes write management events from CloudTrail trails and alerts you to unusual activity. For example, if there is an increase in TerminateInstances events that differs from established baselines, you’ll see it as an Insight event. These events make finding and responding to unusual API activity easier than ever.

Enabling AWS CloudTrail Insights

CloudTrail tracks user activity and API usage. It provides an event history of AWS account activity, including actions taken through the AWS Management Console, AWS SDKs, command line tools, and other AWS services. With the launch of AWS CloudTrail Insights, you can enable machine learning models that detect unusual activity in these logs with just a few clicks. AWS CloudTrail Insights will analyze historical API calls, identifying usage patterns and generating Insight Events for unusual activity.

Screenshot showing how to enable CloudTrail Insights

You can also enable Insights on a trail from the AWS Command Line Interface (CLI) by using the put-insight-selectors command:

$ aws cloudtrail put-insight-selectors --trail-name trail_name --insight-selectors '[{"InsightType": "ApiCallRateInsight"}]'

Once enabled, CloudTrail Insights sends events to the S3 bucket specified on the trail details page. Events are also sent to CloudWatch Events, and optionally to a CloudWatch Logs log group, just like other CloudTrail events. This gives you options when it comes to alerting, from sophisticated rules that respond to CloudWatch Events to custom AWS Lambda functions. After enabling Insights, historical events for the trail will be analyzed, and any anomalous usage patterns found will appear in the CloudTrail console within 30 minutes.
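For example, if your trail delivers events to a CloudWatch Logs log group, one simple alerting path is a metric filter that matches the Insights event type (the "eventType": "AwsCloudTrailInsight" field shown in the sample event later in this post) plus an alarm on the resulting metric. Here's a minimal sketch; the log group name, metric namespace, alarm name, and SNS topic ARN are placeholders:

# Count CloudTrail Insights events arriving in the trail's CloudWatch Logs log group
aws logs put-metric-filter \
    --log-group-name CloudTrail/my-trail-log-group \
    --filter-name InsightEvents \
    --filter-pattern '{ $.eventType = "AwsCloudTrailInsight" }' \
    --metric-transformations metricName=InsightEventCount,metricNamespace=CloudTrailInsights,metricValue=1

# Alarm whenever at least one Insights event is logged in a 5-minute window
aws cloudwatch put-metric-alarm \
    --alarm-name cloudtrail-insight-events \
    --namespace CloudTrailInsights \
    --metric-name InsightEventCount \
    --statistic Sum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --treat-missing-data notBreaching \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:insight-alerts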

Using CloudTrail Insights

In this post we’ll take a look at some AWS CloudTrail Insights Events from the AWS Console. If you’d like to view Insight events from the AWS CLI, you use the CloudTrail LookupEvents call with the event-category parameter.

$ aws cloudtrail lookup-events --event-category insight [--max-items] [--lookup-attributes]

Quickly scanning the list of CloudTrail Insights events, the RunInstances event jumps out at me. Spinning up more EC2 instances can be expensive, and I’ve definitely misconfigured things in the past and created more instances than I needed, so I want to take a closer look. Let’s filter the list down to just these events and see what we can learn from AWS CloudTrail Insights.
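From the CLI, that filter looks roughly like this (a sketch; I believe EventName is a supported lookup attribute for the insight category, but check the lookup-events reference for the attributes your version supports):

# List recent Insights events for RunInstances
aws cloudtrail lookup-events \
    --event-category insight \
    --lookup-attributes AttributeKey=EventName,AttributeValue=RunInstances \
    --max-items 10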

Let’s dig in to the latest event.

Here we see that over the course of one minute, there was a spike in RunInstances API call volume. From the Insights graph, we can see the raw event as JSON.

{
    "Records": [
        {
            "eventVersion": "1.07",
            "eventTime": "2019-11-07T13:25:00Z",
            "awsRegion": "us-east-1",
            "eventID": "a9edc959-9488-4790-be0f-05d60e56b547",
            "eventType": "AwsCloudTrailInsight",
            "recipientAccountId": "-REDACTED-",
            "sharedEventID": "c2806063-d85d-42c3-9027-d2c56a477314",
            "insightDetails": {
                "state": "Start",
                "eventSource": "ec2.amazonaws.com",
                "eventName": "RunInstances",
                "insightType": "ApiCallRateInsight",
                "insightContext": {
                    "statistics": {
                        "baseline": {
                            "average": 0.0020833333},
                        "insight": {
                            "average": 6}
                    }
                }
            },
            "eventCategory": "Insight"},
        {
            "eventVersion": "1.07",
            "eventTime": "2019-11-07T13:26:00Z",
            "awsRegion": "us-east-1",
            "eventID": "33a52182-6ff8-49c8-baaa-9caac16a96ce",
            "eventType": "AwsCloudTrailInsight",
            "recipientAccountId": "-REDACTED-",
            "sharedEventID": "c2806063-d85d-42c3-9027-d2c56a477314",
            "insightDetails": {
                "state": "End",
                "eventSource": "ec2.amazonaws.com",
                "eventName": "RunInstances",
                "insightType": "ApiCallRateInsight",
                "insightContext": {
                    "statistics": {
                        "baseline": {
                            "average": 0.0020833333},
                        "insight": {
                            "average": 6},
                        "insightDuration": 1}
                }
            },
            "eventCategory": "Insight"}
    ]}

Here we can see that the baseline API call volume is about 0.002 calls per minute. That means there’s usually one call to RunInstances roughly every 480 minutes (about once every eight hours), so the activity we see in the graph is definitely not normal. By clicking over to the CloudTrail Events tab we can see the individual events that are grouped into this Insight event. It looks like this was probably normal EC2 Auto Scaling activity, but I still want to dig in and confirm.

By expanding an event in this tab and clicking “View Event,” I can head directly to the event in CloudTrail for more information. After reviewing the event metadata and associated EC2 and IAM resources, I’ve confirmed that while this behavior was unusual, it’s not a cause for concern. It looks like autoscaling did what it was supposed to and that the correct type of instance was created.

Things to Know

Before you get started, here are some important things to know:

  • CloudTrail Insights costs $0.35 for every 100,000 write management events analyzed for each Insight type. At launch, API call volume insights are the only type available.
  • Activity baselines are scoped to the region and account in which the CloudTrail trail is operating.
  • After an account enables Insights events for the first time, you can expect to receive the first Insights events within 36 hours, and only if unusual activity is detected.
  • New unusual activity is logged as it is discovered, sending Insight Events to your destination S3 buckets and the AWS console within 30 minutes in most cases.

Let me know if you have any questions or feature requests, and happy building!

— Brandon

 

Welcome to AWS Storage Day

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/welcome-to-aws-storage-day/

Everyone on the AWS team has been working non-stop to make sure that re:Invent 2019 is the biggest and best one yet. Way back in September, the entire team of AWS News Bloggers gathered in Seattle for a set of top-secret briefings. We listened to the teams, read their PRFAQs (Press Release + FAQ), and chose the launches that we wanted to work on. We’ve all been heads-down ever since, reading the docs, putting the services to work, writing drafts, and responding to feedback.

Heads-Up
Today, a week ahead of opening day, we are making the first round of announcements, all of them related to storage. We are doing this in order to spread out the launches a bit, and to give you some time to sign up for the appropriate re:Invent sessions. We’ve written individual posts for some of the announcements, and are covering the rest in summary form in this post.

Regardless of the AWS storage service that you use, I think you’ll find something interesting and useful here. We are launching significant new features for Amazon Elastic Block Store (EBS), Amazon FSx for Windows File Server, Amazon Elastic File System (EFS), AWS DataSync, AWS Storage Gateway, and Amazon Simple Storage Service (S3). As I never tire of saying, all of these features are available now and you can start using them today!

Let’s get into it…

Elastic Block Store (EBS)
The cool new Fast Snapshot Restore (FSR) feature enables the creation of fully-initialized, full-performance EBS volumes.

Amazon FSx for Windows File Server
This file system now includes a long list of enterprise-ready features, including remote management, native multi-AZ file systems, user quotas, and more! Several new features make this file system even more cost-effective, including data deduplication and support for smaller SSD file systems.

Elastic File System (EFS)
Amazon EFS is now available in all commercial AWS regions. Check out the EFS Pricing page for pricing in your region, or read my original post, Amazon Elastic File System – Production-Ready in Three Regions, to learn more.

AWS DataSync
AWS DataSync was launched at re:Invent 2018. It supports Automated and Accelerated Data Transfer, and can be used for migration, upload & process operations, and backup/DR.

Effective November 1, 2019, we are reducing the per-GB price for AWS DataSync from $0.04/GB to $0.0125/GB. For more information, check out the AWS DataSync Pricing page.

The new task scheduling feature allows you to periodically execute a task that detects changes and copies them from the source storage system to the destination, with options to run tasks on an hourly, daily, weekly, or custom basis:
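As a rough sketch, you can also set or change the schedule from the AWS CLI. The task ARN below is a placeholder, and I'm assuming the --schedule parameter with a cron-style ScheduleExpression, so verify the syntax against the DataSync CLI reference:

# Run an existing DataSync task every day at 03:00 UTC (task ARN is a placeholder)
aws datasync update-task \
    --task-arn arn:aws:datasync:us-east-1:123456789012:task/task-0123456789abcdef0 \
    --schedule 'ScheduleExpression=cron(0 3 * * ? *)'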

We recently added support in the Europe (London), Europe (Paris), and Canada (Central) Regions. Today we are adding support in the Europe (Stockholm), Asia Pacific (Mumbai), South America (São Paulo), Asia Pacific (Hong Kong), and AWS GovCloud (US-East) Regions. As a result, AWS DataSync is now available in all commercial and GovCloud regions.

For more information, check out the AWS DataSync Storage Day post!

AWS Storage Gateway
AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage. With this launch, you now have access to a new set of enterprise features:

High Availability – Storage Gateway now includes a range of health checks when running within a VMware vSphere High Availability (VMware HA) environment, and can now recover from most service interruptions in under 60 seconds. In the unlikely event that a recovery is necessary, sessions will be maintained and applications should continue to operate unaffected after a pause. The new gateway health checks integrate automatically with VMware through the VM heartbeat. You have the ability to adjust the sensitivity of the heartbeat from within VMware:

To learn more, read Deploy a Highly Available AWS Storage Gateway on a VMware vSphere Cluster.

Enhanced Metrics – If you enable Amazon CloudWatch Integration, AWS Storage Gateway now publishes cache utilization, access pattern, throughput, and I/O metrics to Amazon CloudWatch and makes them visible in the Monitoring tab for each gateway. To learn more, read Monitoring Gateways.

More Maintenance Options – You now have additional control over the software updates that are applied to each Storage Gateway. Mandatory security updates are always applied promptly, and you can control the schedule for feature updates. You have multiple options including day of the week and day of the month, with more coming soon. To learn more, read Managing Storage Gateway Updates.

Increased Performance – AWS Storage Gateway now delivers higher read performance when used as a Virtual Tape Library, and for reading data and listing directories when used as a File Gateway, providing you faster access to data managed through these gateways.

Amazon S3
We launched Same-Region Replication (SRR) in mid-September, giving you the ability to configure in-region replication based on bucket, prefix, or object tag. When an object is replicated using SRR, the metadata, Access Control Lists (ACLs), and object tags associated with the object are also replicated. Once SRR has been configured on a source bucket, any changes to these elements will trigger a replication to the destination bucket. To learn more, read about S3 Replication.

Today we are launching a Service Level Agreement (SLA) for S3 Replication, along with the ability to monitor the status of each of your replication configurations. To learn more, read S3 Replication Update: Replication SLA, Metrics, and Events.

AWS Snowball Edge
This is, as I have already shared, a large-scale data migration and edge computing device with on-board compute and storage capabilities. We recently launched three free training courses that will help you to learn more about this unique and powerful device:

You may also enjoy reading about Data Migration Best Practices with AWS Snowball Edge.

Jeff;

 

New – Amazon EBS Fast Snapshot Restore (FSR)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-ebs-fast-snapshot-restore-fsr/

Amazon Elastic Block Store (EBS) has been around for more than a decade and is a fundamental AWS building block. You can use it to create persistent storage volumes that can store up to 16 TiB and supply up to 64,000 IOPS (Input/Output Operations per Second). You can choose between four types of volumes, making the choice that best addresses your data transfer throughput, IOPS, and pricing requirements. If your requirements change, you can modify the type of a volume, expand it, or change the performance while the volume remains online and active. EBS snapshots allow you to capture the state of a volume for backup, disaster recovery, and other purposes. Once created, a snapshot can be used to create a fresh EBS volume. Snapshots are stored in Amazon Simple Storage Service (S3) for high durability.

Our ever-creative customers are using EBS snapshots in many interesting ways. In addition to the backup and disaster recovery use cases that I just mentioned, they are using snapshots to quickly create analytical or test environments using data drawn from production, and to support Virtual Desktop Infrastructure (VDI) environments. As you probably know, the AMIs (Amazon Machine Images) that you use to launch EC2 instances are also stored as one or more snapshots.

Fast Snapshot Restore
Today we are launching Fast Snapshot Restore (FSR) for EBS. You can enable it for new and existing snapshots on a per-AZ (Availability Zone) basis, and then create new EBS volumes that deliver their maximum performance and do not need to be initialized.

This performance enhancement will allow you to build AWS-based systems that are even faster and more responsive than before. Faster boot times will speed up your VDI environments and allow your Auto Scaling Groups to come online and start processing traffic more quickly, even if you use large and/or custom AMIs. I am sure that you will soon dream up new applications that can take advantage of this new level of speed and predictability.

Fast Snapshot Restore can be enabled on a snapshot even while the snapshot is being created. If you create nightly backup snapshots, enabling them for FSR will allow you to do fast restores the following day regardless of the size of the volume or the snapshot.

Enabling & Using Fast Snapshot Restore
I can get started in minutes! I open the EC2 Console and find the first snapshot that I want to set up for fast restore:

I select the snapshot and choose Manage Fast Snapshot Restore from the Actions menu:

Then I select the Availability Zones where I plan to create EBS volumes, and click Save:

After the settings are saved, I receive a confirmation:

The console shows me that my snapshot is being enabled for Fast Snapshot Restore:

The status progresses from enabling to optimizing, and then to enabled. Behind the scenes and with no extra effort on my part, the optimization process provisions extra resources to deliver the fast restores, proceeding at a rate of one TiB per hour. By contrast, non-optimized volumes retrieve data directly from the S3-stored snapshot on an incremental, on-demand basis.

Once the optimization is complete, I can create volumes from the snapshot in the usual way, confident that they will be ready in seconds and pre-initialized for full performance! Each FSR-enabled snapshot supports creation of up to 10 initialized volumes per hour per Availability Zone; additional volume creations will be non-initialized. As my needs change, I can enable Fast Snapshot Restore in additional Availability Zones and I can disable it in Zones where I had previously enabled it.

When Fast Snapshot Restore is enabled for a snapshot in a particular Availability Zone, a bucket-based credit system governs the acceleration process. Creating a volume consumes a credit; the credits refill over time, and the maximum number of credits is a function of the FSR-enabled snapshot size. Here are some guidelines:

  • A 100 GiB FSR-enabled snapshot will have a maximum credit balance of 10, and a fill rate of 10 credits per hour.
  • A 4 TiB FSR-enabled snapshot will have a maximum credit balance of 1, and a fill rate of 1 credit every 4 hours.

In other words, you can do 1 TiB of restores per hour for a given FSR-enabled snapshot within an AZ.

Things to Know
Here are some things to know about Fast Snapshot Restore:

Regions & AZs – Fast Snapshot Restore is available in all Availability Zones of the US East (N. Virginia), US West (Oregon), US West (N. California), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Sydney), and Asia Pacific (Tokyo) Regions.

Pricing – You pay $0.75 for each hour that Fast Snapshot Restore is enabled for a snapshot in a particular Availability Zone, pro-rated and with a minimum of one hour.

Monitoring – You can use the following per-minute CloudWatch metrics to track the state of the credit bucket for each FSR-enabled snapshot:

  • FastSnapshotRestoreCreditsBalance – The number of volume creation credits that are available.
  • FastSnapshotRestoreCreditsBucketSize – The maximum number of volume creation credits that can be accumulated.

CLI & Programmatic Access – You can use the enable-fast-snapshot-restores, describe-fast-snapshot-restores, and disable-fast-snapshot-restores commands to create and manage your accelerated snapshots from the command line. You can also use the EnableFastSnapshotRestores, DescribeFastSnapshotRestores, and DisableFastSnapshotRestores API functions from your application code.
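For example, here's a minimal sketch of enabling and checking Fast Snapshot Restore from the CLI (the snapshot ID and Availability Zones are placeholders):

# Enable Fast Snapshot Restore for a snapshot in two Availability Zones
aws ec2 enable-fast-snapshot-restores \
    --source-snapshot-ids snap-0123456789abcdef0 \
    --availability-zones us-east-1a us-east-1b

# Check the per-AZ state (enabling, optimizing, enabled, ...)
aws ec2 describe-fast-snapshot-restores \
    --filters Name=snapshot-id,Values=snap-0123456789abcdef0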

CloudWatch Events – You can use the EBS Fast Snapshot Restore State-change Notification event type to invoke Lambda functions or other targets when the state of a snapshot/AZ pair changes. Events are emitted on successful and unsuccessful transitions to the enabling, optimizing, enabled, disabling, and disabled states.

Data Lifecycle Manager – You can enable FSR on snapshots created by your DLM lifecycle policies, specify AZs, and specify the number of snapshots to be FSR-enabled. You can use an existing CloudFormation template to integrate FSR into your DLM policies (read about the AWS::DLM::LifecyclePolicy to learn more).

In the Works
We are launching with support for snapshots that you own. Over time, we intend to expand coverage and allow you to enable Fast Snapshot Restore for snapshots that you have been granted access to.

Available Now
Fast Snapshot Restore is available now and you can start using it today!

Jeff;

 

S3 Replication Update: Replication SLA, Metrics, and Events

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/s3-replication-update-replication-sla-metrics-and-events/

S3 Cross-Region Replication has been around since early 2015 (new Cross-Region Replication for Amazon S3), and Same-Region Replication has been around for a couple of months.

Replication is very easy to set up, and lets you use rules to specify that you want to copy objects from one S3 bucket to another one. The rules can specify replication of the entire bucket, or of a subset based on prefix or tag:

You can use replication to copy critical data within or between AWS regions in order to meet regulatory requirements for geographic redundancy as part of a disaster recovery plan, or for other operational reasons. You can copy within a region to aggregate logs, to set up test & development environments, and to address compliance requirements.

S3’s replication features have been put to great use: Since the launch in 2015, our customers have replicated trillions of objects and exabytes of data! Today I am happy to be able to tell you that we are making it even more powerful, with the addition of Replication Time Control. This feature builds on the existing rule-driven replication and gives you fine-grained control based on tag or prefix so that you can use Replication Time Control with the data set you specify. Here’s what you get:

Replication SLA – You can now take advantage of a replication SLA to increase the predictability of replication time.

Replication Metrics – You can now monitor the maximum replication time for each rule using new CloudWatch metrics.

Replication Events – You can now use events to track any object replications that deviate from the SLA.

Let’s take a closer look!

New Replication SLA
S3 replicates your objects to the destination bucket, with timing influenced by object size & count, available bandwidth, other traffic to the buckets, and so forth. In situations where you need additional control over replication time, you can use our new Replication Time Control feature, which is designed to perform as follows:

  • Most of the objects will be replicated within seconds.
  • 99% of the objects will be replicated within 5 minutes.
  • 99.99% of the objects will be replicated within 15 minutes.

When you enable this feature, you benefit from the associated Service Level Agreement. The SLA is expressed in terms of a percentage of objects that are expected to be replicated within 15 minutes, and provides for billing credits if the SLA is not met:

  • 99.9% to 98.0% – 10% credit
  • 98.0% to 95.0% – 25% credit
  • 95% to 0% – 100% credit

The billing credit applies to a percentage of the Replication Time Control fee, replication data transfer, S3 requests, and S3 storage charges in the destination for the billing period.

I can enable Replication Time Control when I create a new replication rule, and I can also add it to an existing rule:

Replication begins as soon as I create or update the rule. I can use the Replication Metrics and the Replication Events to monitor compliance.

In addition to the existing charges for S3 requests and data transfer between regions, you will pay an extra per-GB charge to use Replication Time Control; see the S3 Pricing page for more information.
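If you manage replication from the CLI rather than the console, here's a minimal sketch of a rule with Replication Time Control and replication metrics enabled. The bucket names and IAM role are placeholders, and the JSON reflects the replication configuration schema as I understand it, so double-check it against the S3 documentation (versioning must be enabled on both buckets):

# replication.json - a single rule replicating the whole bucket, with RTC and metrics enabled
cat > replication.json <<'EOF'
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "ID": "rtc-rule",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": { "Prefix": "" },
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": {
        "Bucket": "arn:aws:s3:::my-destination-bucket",
        "ReplicationTime": { "Status": "Enabled", "Time": { "Minutes": 15 } },
        "Metrics": { "Status": "Enabled", "EventThreshold": { "Minutes": 15 } }
      }
    }
  ]
}
EOF

aws s3api put-bucket-replication \
    --bucket my-source-bucket \
    --replication-configuration file://replication.json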

Replication Metrics
Each time I enable Replication Time Control for a rule, S3 starts to publish three new metrics to CloudWatch. They are available in the S3 and CloudWatch Consoles:

I created some large tar files, and uploaded them to my source bucket. I took a quick break, and inspected the metrics. Note that I did my testing before the launch, so don’t get overly concerned with the actual numbers. Also, keep in mind that these metrics are aggregated across the replication for display, and are not a precise indication of per-object SLA compliance.

BytesPendingReplication jumps up right after the upload, and then drops down as the replication takes place:

ReplicationLatency peaks and then quickly drops down to zero after S3 Replication transfers over 37 GB from the United States to Australia with a maximum latency of 8.3 minutes:

And OperationsPendingCount tracks the number of objects to be replicated:

I can also set CloudWatch Alarms on the metrics. For example, I might want to know if I have a replication backlog larger than 75 GB (for this to work as expected, I must set the Missing data treatment to Treat missing data as ignore (maintain the alarm state)):
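Here's a hedged CLI sketch of that alarm. I'm assuming the replication metrics land in the AWS/S3 namespace with source bucket, destination bucket, and rule ID dimensions; copy the exact dimension names and values from the CloudWatch console:

# Alarm if the replication backlog stays above 75 GB (80,530,636,800 bytes) for 5 minutes
aws cloudwatch put-metric-alarm \
    --alarm-name s3-replication-backlog \
    --namespace AWS/S3 \
    --metric-name BytesPendingReplication \
    --dimensions Name=SourceBucket,Value=my-source-bucket Name=DestinationBucket,Value=my-destination-bucket Name=RuleId,Value=rtc-rule \
    --statistic Maximum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 80530636800 \
    --comparison-operator GreaterThanThreshold \
    --treat-missing-data ignore \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:replication-alerts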

These metrics are billed as CloudWatch Custom Metrics.

Replication Events
Finally, you can track replication issues by setting up events on an SQS queue, SNS topic, or Lambda function. Start at the console’s Events section:

You can use these events to monitor adherence to the SLA. For example, you could store Replication time missed threshold and Replication time completed after threshold events in a database to track occasions where replication took longer than expected. The first event will tell you that the replication is running late, and the second will tell you that it has completed, and how late it was.

To learn more, read about Replication.

Available Now
You can start using these features today in all commercial AWS Regions, excluding the AWS China (Beijing) and AWS China (Ningxia) Regions.

Jeff;

PS – If you want to learn more about how S3 works, be sure to attend the re:Invent session: Beyond Eleven Nines: Lessons from the Amazon S3 Culture of Durability.

 

Amazon FSx for Windows File Server Update – Multi-AZ and New Enterprise-Ready Features

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-fsx-for-windows-file-server-update-new-enterprise-ready-features/

Last year I told you about Amazon FSx for Windows File Server — Fast, Fully Managed, and Secure. That launch was well-received, and our customers (Neiman Marcus, Ancestry, Logicworks, and Qube Research & Technologies to name a few) are making great use of the service. They love the fact that they can access their shares from a wide variety of sources, and that they can use their existing Active Directory environment to authenticate users. They benefit from a native implementation with fast, SSD-powered performance, and no longer spend time attaching and formatting storage devices, updating Windows Server, or recovering from hardware failures.

Since the launch, we have continued to enhance Amazon FSx for Windows File Server, largely in response to customer requests. Some of the more significant enhancements include:

Self-Managed Directories – This launch gave you the ability to join your Amazon FSx file systems to on-premises or in-cloud self-managed Microsoft Active Directories. To learn how to get started with this feature, read Using Amazon FSx with Your Self-Managed Microsoft Active Directory.

Fine-Grained File Restoration – This launch (powered by Windows shadow copies) gave your users the ability to easily view and restore previous versions of their files. To learn how to configure and use this feature, read Working with Shadow Copies.

On-Premises Access – This launch gave you the power to access your file systems from on-premises using AWS Direct Connect or an AWS VPN connection. You can host user shares in the cloud for on-premises access, and you can also use it to support your backup and disaster recovery model. To learn more, read Accessing Amazon FSx for Windows File Server File Systems from On-Premises.

Remote Management CLI – This launch focused on a set of CLI commands (PowerShell Cmdlets) to manage your Amazon FSx for Windows File Server file systems. The commands support remote management and give you the ability to fully automate many types of setup, configuration, and backup workflows from a central location.

Enterprise-Ready Features
Today we are launching an extensive list of new features that are designed to address the top-priority requests from our enterprise customers.

Native Multi-AZ File Systems – You can now create file systems that span AWS Availability Zones (AZs). You no longer need to set up or manage replication across AZs; instead, you select the multi-AZ deployment option when you create your file system:

Then you select two subnets where your file system will reside:

This will create an active file server and a hot standby, each with its own storage, and synchronous replication across AZs to the standby. If the active file server fails, Amazon FSx will automatically fail over to the standby so that you can maintain operations without losing any data. Failover typically takes less than 30 seconds. The DNS name remains unchanged, making replication and failover transparent, even during planned maintenance windows. This feature is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (London), Asia Pacific (Tokyo), Asia Pacific (Singapore), and Europe (Stockholm) Regions.
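If you prefer the CLI to the console wizard, creating a Multi-AZ file system looks roughly like this (a sketch; the subnet, security group, and AWS Managed Microsoft AD IDs are placeholders, and MULTI_AZ_1 requires two subnets plus a preferred subnet):

# Create a Multi-AZ Amazon FSx for Windows File Server file system
aws fsx create-file-system \
    --file-system-type WINDOWS \
    --storage-capacity 300 \
    --subnet-ids subnet-0aaaaaaaaaaaaaaaa subnet-0bbbbbbbbbbbbbbbb \
    --security-group-ids sg-0ccccccccccccccc1 \
    --windows-configuration ActiveDirectoryId=d-1234567890,DeploymentType=MULTI_AZ_1,PreferredSubnetId=subnet-0aaaaaaaaaaaaaaaa,ThroughputCapacity=32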

Support for SQL Server – Amazon FSx now supports the creation of Continuously Available (CA) file shares, which are optimized for use by Microsoft SQL Server. This allows you to store your active SQL Server data on a fully managed Windows file system in AWS.

Smaller Minimum Size – Single-AZ file systems can now be as small as 32 GiB (the previous minimum was 300 GiB).

Data Deduplication – You can optimize your storage by seeking out and eliminating low-level duplication of data, with the potential to reduce your storage costs. The actual space savings will depend on your use case, but you can expect it to be around 50% for typical workloads (read Microsoft’s Data Deduplication Overview and Understanding Data Deduplication to learn more).

Once enabled for a file system with Enable-FSxDedup, deduplication jobs are run on a default schedule that you can customize if desired. You can use the Get-FSxDedupStatus command to see some interesting stats about your file system:

To learn more, read Using Data Deduplication.

Programmatic File Share Configuration – You can now programmatically configure your file shares using PowerShell commands (this is part of the Remote Management CLI that I mentioned earlier). You can use these commands to automate your setup, migration, and synchronization workflows. The commands include:

  • New-FSxSmbShare – Create a new shared folder.
  • Grant-FSxSmbShareAccess – Add an access control entry (ACE) to an ACL.
  • Get-FSxSmbSession – Get information about active SMB sessions.
  • Get-FSxSmbOpenFile – Get information about files opened on SMB sessions.

To learn more, read Managing File Shares.

Enforcement of In-Transit Encryption – You can insist that connections to your file shares make use of in-transit SMB encryption:

PS> Set-FSxSmbServerConfiguration -RejectUnencryptedAccess $True

To learn more, read about Encryption of Data in Transit.

Quotas – You can now use quotas to monitor and control the amount of storage space consumed by each user. You can set up per-user quotas, monitor usage, track violations, and choose to deny further consumption to users who exceed their quotas:

PS> Enable-FSxUserQuotas Mode=Enforce
PS> Set-FSXUserQuota jbarr ...

To learn more, read about Managing User Quotas.

Available Now
Putting it all together, this laundry list of new enterprise-ready features and the power to create Multi-AZ file systems makes Amazon FSx for Windows File Server a great choice when you are moving your existing NAS (Network Attached Storage) to the AWS Cloud.

All of these features are available now and you can start using them today in all commercial AWS Regions where Amazon FSx for Windows File Server is available, unless otherwise noted above.

Jeff;

 

New – Using Step Functions to Orchestrate Amazon EMR workloads

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-using-step-functions-to-orchestrate-amazon-emr-workloads/

AWS Step Functions allows you to add serverless workflow automation to your applications. The steps of your workflow can run anywhere, including in AWS Lambda functions, on Amazon Elastic Compute Cloud (EC2), or on-premises. To simplify building workflows, Step Functions is directly integrated with multiple AWS Services: Amazon ECS, AWS Fargate, Amazon DynamoDB, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), AWS Batch, AWS Glue, Amazon SageMaker, and (to run nested workflows) with Step Functions itself.

Starting today, Step Functions connects to Amazon EMR, enabling you to create data processing and analysis workflows with minimal code, saving time, and optimizing cluster utilization. For example, building data processing pipelines for machine learning is time consuming and hard. With this new integration, you have a simple way to orchestrate workflow capabilities, including parallel executions and dependencies from the result of a previous step, and handle failures and exceptions when running data processing jobs.

Specifically, a Step Functions state machine can now:

  • Create or terminate an EMR cluster, including the possibility to change the cluster termination protection. In this way, you can reuse an existing EMR cluster for your workflow, or create one on-demand during execution of a workflow.
  • Add or cancel an EMR step for your cluster. Each EMR step is a unit of work that contains instructions to manipulate data for processing by software installed on the cluster, including tools such as Apache Spark, Hive, or Presto.
  • Modify the size of an EMR cluster instance fleet or group, allowing you to manage scaling programmatically depending on the requirements of each step of your workflow. For example, you may increase the size of an instance group before adding a compute-intensive step, and reduce the size just after it has completed.

When you create or terminate a cluster or add an EMR step to a cluster, you can use synchronous integrations to move to the next step of your workflow only when the corresponding activity has completed on the EMR cluster.

Reading the configuration or the state of your EMR clusters is not part of the Step Functions service integration. In case you need that, the EMR List* and Describe* APIs can be accessed using Lambda functions as tasks.

Building a Workflow with EMR and Step Functions
On the Step Functions console, I create a new state machine. The console renders it visually, which makes it much easier to understand:

To create the state machine, I use the following definition using the Amazon States Language (ASL):

{
  "StartAt": "Should_Create_Cluster",
  "States": {
    "Should_Create_Cluster": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.CreateCluster",
          "BooleanEquals": true,
          "Next": "Create_A_Cluster"
        },
        {
          "Variable": "$.CreateCluster",
          "BooleanEquals": false,
          "Next": "Enable_Termination_Protection"
        }
      ],
      "Default": "Create_A_Cluster"
    },
    "Create_A_Cluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
      "Parameters": {
        "Name": "WorkflowCluster",
        "VisibleToAllUsers": true,
        "ReleaseLabel": "emr-5.28.0",
        "Applications": [{ "Name": "Hive" }],
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "LogUri": "s3://aws-logs-123412341234-eu-west-1/elasticmapreduce/",
        "Instances": {
          "KeepJobFlowAliveWhenNoSteps": true,
          "InstanceFleets": [
            {
              "InstanceFleetType": "MASTER",
              "TargetOnDemandCapacity": 1,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "m4.xlarge"
                }
              ]
            },
            {
              "InstanceFleetType": "CORE",
              "TargetOnDemandCapacity": 1,
              "InstanceTypeConfigs": [
                {
                  "InstanceType": "m4.xlarge"
                }
              ]
            }
          ]
        }
      },
      "ResultPath": "$.CreateClusterResult",
      "Next": "Merge_Results"
    },
    "Merge_Results": {
      "Type": "Pass",
      "Parameters": {
        "CreateCluster.$": "$.CreateCluster",
        "TerminateCluster.$": "$.TerminateCluster",
        "ClusterId.$": "$.CreateClusterResult.ClusterId"
      },
      "Next": "Enable_Termination_Protection"
    },
    "Enable_Termination_Protection": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:setClusterTerminationProtection",
      "Parameters": {
        "ClusterId.$": "$.ClusterId",
        "TerminationProtected": true
      },
      "ResultPath": null,
      "Next": "Add_Steps_Parallel"
    },
    "Add_Steps_Parallel": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "Step_One",
          "States": {
            "Step_One": {
              "Type": "Task",
              "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
              "Parameters": {
                "ClusterId.$": "$.ClusterId",
                "Step": {
                  "Name": "The first step",
                  "ActionOnFailure": "CONTINUE",
                  "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                      "hive-script",
                      "--run-hive-script",
                      "--args",
                      "-f",
                      "s3://eu-west-1.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
                      "-d",
                      "INPUT=s3://eu-west-1.elasticmapreduce.samples",
                      "-d",
                      "OUTPUT=s3://MY-BUCKET/MyHiveQueryResults/"
                    ]
                  }
                }
              },
              "End": true
            }
          }
        },
        {
          "StartAt": "Wait_10_Seconds",
          "States": {
            "Wait_10_Seconds": {
              "Type": "Wait",
              "Seconds": 10,
              "Next": "Step_Two (async)"
            },
            "Step_Two (async)": {
              "Type": "Task",
              "Resource": "arn:aws:states:::elasticmapreduce:addStep",
              "Parameters": {
                "ClusterId.$": "$.ClusterId",
                "Step": {
                  "Name": "The second step",
                  "ActionOnFailure": "CONTINUE",
                  "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                      "hive-script",
                      "--run-hive-script",
                      "--args",
                      "-f",
                      "s3://eu-west-1.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
                      "-d",
                      "INPUT=s3://eu-west-1.elasticmapreduce.samples",
                      "-d",
                      "OUTPUT=s3://MY-BUCKET/MyHiveQueryResults/"
                    ]
                  }
                }
              },
              "ResultPath": "$.AddStepsResult",
              "Next": "Wait_Another_10_Seconds"
            },
            "Wait_Another_10_Seconds": {
              "Type": "Wait",
              "Seconds": 10,
              "Next": "Cancel_Step_Two"
            },
            "Cancel_Step_Two": {
              "Type": "Task",
              "Resource": "arn:aws:states:::elasticmapreduce:cancelStep",
              "Parameters": {
                "ClusterId.$": "$.ClusterId",
                "StepId.$": "$.AddStepsResult.StepId"
              },
              "End": true
            }
          }
        }
      ],
      "ResultPath": null,
      "Next": "Step_Three"
    },
    "Step_Three": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.ClusterId",
        "Step": {
          "Name": "The third step",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
              "hive-script",
              "--run-hive-script",
              "--args",
              "-f",
              "s3://eu-west-1.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
              "-d",
              "INPUT=s3://eu-west-1.elasticmapreduce.samples",
              "-d",
              "OUTPUT=s3://MY-BUCKET/MyHiveQueryResults/"
            ]
          }
        }
      },
      "ResultPath": null,
      "Next": "Disable_Termination_Protection"
    },
    "Disable_Termination_Protection": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:setClusterTerminationProtection",
      "Parameters": {
        "ClusterId.$": "$.ClusterId",
        "TerminationProtected": false
      },
      "ResultPath": null,
      "Next": "Should_Terminate_Cluster"
    },
    "Should_Terminate_Cluster": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.TerminateCluster",
          "BooleanEquals": true,
          "Next": "Terminate_Cluster"
        },
        {
          "Variable": "$.TerminateCluster",
          "BooleanEquals": false,
          "Next": "Wrapping_Up"
        }
      ],
      "Default": "Wrapping_Up"
    },
    "Terminate_Cluster": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
      "Parameters": {
        "ClusterId.$": "$.ClusterId"
      },
      "Next": "Wrapping_Up"
    },
    "Wrapping_Up": {
      "Type": "Pass",
      "End": true
    }
  }
}

I let the Step Functions console create a new AWS Identity and Access Management (IAM) role for the executions of this state machine. The role automatically includes all permissions required to access EMR.

This state machine can either use an existing EMR cluster, or create a new one. I can use the following input to create a new cluster that is terminated at the end of the workflow:

{
  "CreateCluster": true,
  "TerminateCluster": true
}

To use an existing cluster, I need to provide the cluster ID in the input, using this syntax:

{
  "CreateCluster": false,
  "TerminateCluster": false,
  "ClusterId": "j-..."
}
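You can also start an execution with this input from the CLI (the state machine name in the ARN below is a hypothetical placeholder):

aws stepfunctions start-execution \
    --state-machine-arn arn:aws:states:eu-west-1:123412341234:stateMachine:EMRWorkflow \
    --input '{"CreateCluster": true, "TerminateCluster": true}'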

Let’s see how that works. As the workflow starts, the Should_Create_Cluster Choice state looks into the input to decide if it should enter the Create_A_Cluster state or not. There, I use a synchronous call (elasticmapreduce:createCluster.sync) to wait for the new EMR cluster to reach the WAITING state before progressing to the next workflow state. The AWS Step Functions console shows the resource that is being created with a link to the EMR console:

After that, the Merge_Results Pass state merges the input state with the cluster ID of the newly created cluster to pass it to the next step in the workflow.

Before starting to process any data, I use the Enable_Termination_Protection state (elasticmapreduce:setClusterTerminationProtection) to help ensure that the EC2 instances in my EMR cluster are not shut down by accident or error.

Now I am ready to do something with the EMR cluster. I have three EMR steps in the workflow. For the sake of simplicity, these steps are all based on this Hive tutorial. For each step, I use Hive’s SQL-like interface to run a query on some sample CloudFront logs and write the results to Amazon Simple Storage Service (S3). In a production use case, you’d probably have a combination of EMR tools processing and analyzing your data in parallel (two or more steps running at the same time) or with some dependencies (the output of one step is required by another step). Let’s try to do something similar.

First I execute Step_One and Step_Two inside a Parallel state:

  • Step_One is running the EMR step synchronously as a job (elasticmapreduce:addStep.sync). That means that the execution waits for the EMR step to be completed (or cancelled) before moving on to the next step in the workflow. You can optionally add a timeout to monitor that the execution of the EMR step happens within an expected time frame.
  • Step_Two is adding an EMR step asynchronously (elasticmapreduce:addStep). In this case, the workflow moves to the next step as soon as EMR replies that the request has been received. After a few seconds, to try another integration, I cancel Step_Two (elasticmapreduce:cancelStep). This integration can be really useful in production use cases. For example, you can cancel an EMR step if you get an error from another step running in parallel that would make it useless to continue with the execution of this step.

After those two steps have both completed and produced their results, I execute Step_Three as a job, similarly to what I did for Step_One. When Step_Three has completed, I enter the Disable_Termination_Protection step, because I am done using the cluster for this workflow.

Depending on the input state, the Should_Terminate_Cluster Choice state is going to enter the Terminate_Cluster state (elasticmapreduce:terminateCluster.sync) and wait for the EMR cluster to terminate, or go straight to the Wrapping_Up state and leave the cluster running.

Finally I have a state for Wrapping_Up. I am not doing much in this final state actually, but you can’t end a workflow from a Choice state.

In the EMR console I see the status of my cluster and of the EMR steps:

Using the AWS Command Line Interface (CLI), I find the results of my query in the S3 bucket configured as output for the EMR steps:

aws s3 ls s3://MY-BUCKET/MyHiveQueryResults/
...

Based on my input, the EMR cluster is still running at the end of this workflow execution. I follow the resource link in the Create_A_Cluster step to go to the EMR console and terminate it. In case you are following along with this demo, be careful to not leave your EMR cluster running if you don’t need it.

Available Now
Step Functions integration with EMR is available in all regions. There is no additional cost for using this feature on top of the usual Step Functions and EMR pricing.

You can now use Step Functions to quickly build complex workflows for executing EMR jobs. A workflow can include parallel executions, dependencies, and exception handling. Step Functions makes it easy to retry failed jobs and terminate workflows after critical errors, because you can specify what happens when something goes wrong. Let me know what you are going to use this feature for!

Danilo

AWS Systems Manager Explorer – A Multi-Account, Multi-Region Operations Dashboard

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/aws-systems-manager-explorer-a-multi-account-multi-region-operations-dashboard/

Since 2006, Amazon Web Services has been striving to simplify IT infrastructure. Thanks to services like Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), Amazon Relational Database Service (RDS), AWS CloudFormation and many more, millions of customers can build reliable, scalable, and secure platforms in any AWS region in minutes. Having spent 10 years procuring, deploying and managing more hardware than I care to remember, I’m still amazed every day by the pace of innovation that builders achieve with our services.

With great power comes great responsibility. The second you create AWS resources, you’re responsible for them: security of course, but also cost and scaling. This makes monitoring and alerting all the more important, which is why we built services like Amazon CloudWatch, AWS Config and AWS Systems Manager.

Still, customers told us that their operations work would be much simpler if they could just look at a single dashboard listing potential issues with their AWS resources, no matter which of their accounts or regions those resources were created in.

We got to work, and today we’re very happy to announce the availability of AWS Systems Manager Explorer, a unified operations dashboard built as part of Systems Manager.

Introducing AWS Systems Manager Explorer
Collecting monitoring information and alerts from EC2, Config, CloudWatch and Systems Manager, Explorer presents you with an intuitive graphical dashboard that lets you quickly view and browse problems affecting your AWS resources. By default, this data comes from the account and region you’re running in, and you can easily include other regions as well as other accounts managed with AWS Organizations.

Specifically, Explorer can provide operational information about:

  • EC2 issues, such as unhealthy instances,
  • EC2 instances that have a non-compliant patching status,
  • AWS resources that don’t comply with Config rules (predefined or your own),
  • AWS resources that have triggered a CloudWatch Events rule (predefined or your own).

Each issue is stored as an OpsItem in AWS Systems Manager OpsCenter, and is assigned a status (open, in progress, resolved), a severity and a category (security, performance, cost, etc.). Widgets let you quickly browse OpsItems, and a timeline of all OpsItems is also available.

In addition to OpsItems, the Explorer dashboard also includes widgets that show consolidated information on EC2 instances:

  • Instance count, with a tag filter,
  • Instances managed by Systems Manager, as well as unmanaged instances,
  • Instances sorted by AMI id.

As you would expect, all information can be exported to S3 for archival or further processing, and you can also set up Amazon Simple Notification Service (SNS) notifications. Last but not least, all data visible on the dashboard can be accessed from the AWS CLI or any AWS SDK with the GetOpsSummary API.
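As a quick example, here's a hedged sketch of calling that API from the CLI. The aggregator below (counting OpsItems by status) is my assumption about the request shape, so verify it against the GetOpsSummary reference:

# Summarize OpsItems across the configured data sources, grouped by status
aws ssm get-ops-summary \
    --aggregators '[{"AggregatorType":"count","TypeName":"AWS:OpsItem","AttributeName":"Status"}]'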

Let’s take a quick tour.

A Look at AWS Systems Manager Explorer
Before using Explorer, we recommend that you first set up Config and Systems Manager. This will help populate your Explorer dashboard immediately. No setup is required for CloudWatch Events.

Setting up Config is a best practice, and the procedure is extremely simple: don’t forget to enable EC2 rules too. Setting up Systems Manager is equally simple, thanks to the quick setup procedure: add managed instances and check for patch compliance in just a few clicks! Don’t forget to do this in all regions and accounts you want Explorer to manage.

If you set these services up later, you’ll have to wait a little while for data to be retrieved and displayed.

Now, let’s head out to the AWS console for Explorer.

Once I’ve completed the one-click setup page creating a service role and enabling data sources, a quick look at the CloudWatch Events console confirms that rules have been created automatically.

Explorer recommends that I add regions and accounts in order to get a unified view. Of course, you can skip this step if you just want a quick taste of the service.

If you’re keen on synchronizing data, you can easily create a resource data sync, which will fetch operations data coming from other regions and other accounts. I’m going all in here, but please make sure you tick the boxes that work for you.

Once data has been retrieved and processed, my dashboard lights up. Good thing it’s only a test account!

I can also see information on all EC2 instances.

From here on, I can group OpsItems and instances according to different dimensions (accounts, regions, tags). I can also drill down on OpsItems, view their details in OpsCenter, and apply runbooks to fix them. If you want to know more about OpsCenter, here’s the launch post.

Now Available!
We believe AWS Systems Manager Explorer will help operations teams find and solve problems more easily and quickly, no matter the scale of their AWS infrastructure.

This feature is available today in all regions where AWS Systems Manager OpsCenter is available. Give it a try, and please share your feedback in the AWS forum for AWS Systems Manager, or with your usual AWS support contacts.

Julien

New – Application Load Balancer Simplifies Deployment with Weighted Target Groups

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-application-load-balancer-simplifies-deployment-with-weighted-target-groups/

One of the benefits of cloud computing is the ability to create infrastructure programmatically and to tear it down when it is no longer needed. This radically changes the way developers deploy their applications. On premises, developers had to reuse existing infrastructure for new versions of their applications. In the cloud, they create new infrastructure for each new version, keep the previous version running in parallel for a while, and then tear it down. This technique is called blue/green deployment. It lets you progressively shift traffic between two versions of your application, monitor business and operational metrics on the new version, and switch traffic back to the previous version in case anything goes wrong.

To adopt blue/green deployments, AWS customers typically use one of two strategies. The first consists of creating a second application stack, including a second load balancer, and using some kind of weighted routing technique, such as DNS, to direct part of the traffic to each stack. The second consists of replacing the infrastructure behind an existing load balancer. Both strategies can cause delays in moving traffic between versions, depending on the DNS TTL and caching on client machines, and the first also adds the cost of running the extra load balancer and the time needed to warm it up.

A target group tells a load balancer where to direct traffic: EC2 instances, fixed IP addresses, or AWS Lambda functions, among others. When creating a load balancer, you create one or more listeners and configure listener rules to direct the traffic to one target group.

Today, we are announcing weighted target groups for Application Load Balancers. They allow developers to control how traffic is distributed across multiple versions of their application.

Multiple, Weighted Target Groups
You can now add more than one target group to the forward action of a listener rule, and specify a weight for each group. For example, when you define a rule having two target groups with weights of 8 and 2, the load balancer will route 80% of the traffic to the first target group and 20% to the other.

To experiment with weighted target groups today, you can use this CDK code. It creates two auto scaling groups with EC2 instances and an Elastic Load Balancer in front of them. It also deploys a sample web app on the instances. The blue version of the web app is deployed to the blue instance and the green version of the web app is deployed to the green instance. The infrastructure looks like this:

You can git clone the CDK project and type npm run build && cdk bootstrap && cdk deploy to deploy the above infrastructure. To show you how to configure the load balancer, the CDK code creates the auto scaling groups, the load balancer, and a generic target group. Let’s manually finish the configuration and create two weighted target groups, one for each version of the application.

First, I navigate to the EC2 console, select Target Groups and click the Create Target Group button. I create a target group called green. Be sure to select the correct Amazon Virtual Private Cloud (the one created by the CDK script has a name starting with “AlbWtgStack...”), then click Create.

I repeat the operation to create a blue target group. My Target Groups console looks like this:

Next, I change the two auto scaling groups to point them to the blue and green target groups. In the AWS Management Console, I click Auto Scaling Groups and select one of the two auto scaling groups, paying attention to the name (it contains either ‘green’ or ‘blue’). Then I click Actions, then Edit.

In the Edit details screen, I remove the target group that has been created by the CDK script and add the target group matching the name of the auto scaling group (green or blue). I click Save at the bottom of the screen and I repeat the operation for the other auto scaling group.

Next, I change the listener rule to add these two target groups, each having their own weight. In the EC2 console, I select Load Balancers on the left side, then I search for the load balancer created by the CDK code (the name starts with “alb”). I click Listeners, then View / edit rules:

There is one rule created by the CDK script. I modify it by clicking the edit icon at the top, then the edit icon on the left of the rule. I delete the Forward to action by clicking the trash can icon.

Then I click “+ Add Action” to add two Forward to actions, one for each target group (blue and green), weighted at 50 and 50.

Finally, I click Update on the right side. I am now ready to test the weighted load balancing.

I point my browser to the DNS name of the load balancer. I see either the green or the blue version of the web app. I force my browser to reload the page and I observe the load balancer in action, sending 50% of the requests to the green application and 50% to the blue application. Some browsers might cache the page and not reflect the weight I defined. Safari and Chrome are less aggressive than Firefox at this exercise.

Now, in the AWS Management Console, I change the weights to 80 and 20 and continue to refresh my browser. I observe that the blue version is displayed 8 times out of 10, on average.

I can also adjust the weight from the ALB ModifyListener API, the AWS Command Line Interface (CLI) or with AWS CloudFormation.

For example, I use the AWS Command Line Interface (CLI) like this:

aws elbv2 modify-listener    \
     --listener-arn "<listener arn>" \
     --default-actions        \
        '[{
          "Type": "forward",
          "Order": 1,
          "ForwardConfig": {
             "TargetGroups": [
               { "TargetGroupArn": "<target group 1 arn>",
                 "Weight": 80 },
               { "TargetGroupArn": "<target group 2 arn>",
                 "Weight": 20 },
             ]
          }
         }]'

Or I use AWS CloudFormation with this JSON extract:

"ListenerRule1": {
      "Type": "AWS::ElasticLoadBalancingV2::ListenerRule",
      "Properties": {
        "Actions": [{
          "Type": "forward",
          "ForwardConfig": {
            "TargetGroups": [{
              "TargetGroupArn": { "Ref": "TargetGroup1" },
              "Weight": 1
            }, {
              "TargetGroupArn": { "Ref": "TargetGroup2" },
              "Weight": 1
            }]
          }
        }],
        "Conditions": [{
          "Field": "path-pattern",
          "Values": ["foo"]
        }],
        "ListenerArn": { "Ref": "Listener" },
        "Priority": 1
      }
    }
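
The same ModifyListener call can be made from an AWS SDK. Here is a minimal boto3 sketch of the 80/20 split shown above; the listener and target group ARNs are placeholders to replace with your own:

import boto3

elbv2 = boto3.client("elbv2")

# Send 80% of the traffic to the first target group and 20% to the second
elbv2.modify_listener(
    ListenerArn="<listener arn>",
    DefaultActions=[{
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": "<target group 1 arn>", "Weight": 80},
                {"TargetGroupArn": "<target group 2 arn>", "Weight": 20},
            ]
        },
    }],
)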

If you are using an external service or tool to manage your load balancer, you may need to wait until the provider updates their APIs to support weighted routing configuration on Application Load Balancers.

Other uses
In addition to blue/green deployments, AWS customers can use weighted target groups for two other use cases: cloud migration and migration between different AWS compute resources.

When you migrate an on-premises application to the cloud, you may want to do it progressively, with a period where the application is running both on the on-premises data center and in the cloud. Eventually, when you have verified that the cloud version performs satisfactorily, you may completely deprecate the on-premises application.

Similarly, when you migrate a workload from EC2 instances to Docker containers running on AWS Fargate for example, you can easily bring up your new application stack on a new target group and gradually move the traffic by changing the target group weights, with no downtime for end users. With Application Load Balancer supporting a variety of AWS resources like EC2, Containers (Amazon ECS, Amazon Elastic Kubernetes Service, AWS Fargate), AWS Lambda functions and IP addresses as targets, you can choose to move traffic between any of these.

Target Group Stickiness
There are situations when you want clients to experience the same version of the application for a specified duration, or when you want clients currently using the app not to switch to the newly deployed (green) version during their session. For these use cases, we also introduce target group stickiness. When target group stickiness is enabled, all requests from a client are sent to the same target group for the specified duration. When that duration expires, requests are again distributed across target groups according to their weights. ALB issues a cookie to maintain target group stickiness.

Note that target group stickiness is different from the already existing target stickiness (also known as Sticky Sessions). Sticky Sessions makes sure that the requests from a client are always sticking to a particular target within a target group. Target group stickiness only ensures the requests are sent to a particular target group. Sticky sessions can be used in conjunction with the target group level stickiness.

To add or configure target group stickiness from the AWS Command Line Interface (CLI), you use the TargetGroupStickinessConfig parameter, like below:

aws elbv2 modify-listener \
    --listener-arn "<listener arn" \
    --default-actions \
    '[{
       "Type": "forward",
       "Order": 1,
       "ForwardConfig": {
          "TargetGroups": [
             {"TargetGroupArn": "<target group 1 arn>", "Weight": 20}, \
             {"TargetGroupArn": "<target group 2 arn>", "Weight": 80}, \
          ],
          "TargetGroupStickinessConfig": {
             "Enabled": true,
             "DurationSeconds": 2000
          }
       }
   }]'

Availability
Application Load Balancer supports up to 5 weighted target groups per listener rule. You can adjust the weights as many times as you need, up to the API threshold limit. There might be a slight delay before the new traffic weights take effect.

Weighted target groups are available in all AWS Regions today. There is no additional cost to use weighted target groups on Application Load Balancer.

— seb

PS: do not forget to delete the example infrastructure created for this blog post to stop accruing AWS charges. As we manually modified the infrastructure created by the CDK, a simple cdk destroy will immediately return. Connect to the AWS CloudFormation console instead and delete the AlbWtgStack. You also need to manually delete the blue and green target groups in the EC2 console.

CloudFormation Update – CLI + Third-Party Resource Support + Registry

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/cloudformation-update-cli-third-party-resource-support-registry/

CloudFormation was launched in 2011 (AWS CloudFormation – Create Your AWS Stack From a Recipe) and has become an indispensable tool for many AWS customers. They love the fact that they can define a template once and then use it to reliably provision their AWS resources. They also make frequent use of Change Sets, and count on them to provide insights into the actions (additions, changes, and deletions) that will take place when the change set is executed.

As I have written about in the past, CloudFormation takes special care to implement a model that is consistent, stable, and uniform. This is an important, yet often overlooked, element of CloudFormation, but one that our customers tell us that they highly value!

Let’s take a look at a couple of the most frequent requests from our customers:

Performance – Over the past year, the number of operations performed on CloudFormation stacks has grown by 30% per quarter! The development team has worked non-stop to make CloudFormation faster and more efficient, even as usage grows like a weed, using a combination of architectural improvements and low-level optimizations. Over the past couple of months, this work has allowed us to raise a number of soft and hard limits associated with CloudFormation, and drove a significant reduction in average and maximum latency for Create and Update operations.

Coverage – We release new services and new features very rapidly, and sometimes without CloudFormation support. Our goal is to support new services and new features as quickly as possible, and I believe that we are making progress. We are also using the new CloudFormation Coverage Roadmap as a primary source of input to our development process, and have already addressed 43 of the issues.

Extensibility – Customers who make extensive use of CloudFormation tell us that they want to automate the creation of non-AWS resources. This includes resources created by their own development teams and by third-party suppliers of SaaS applications, monitoring tools, and so forth. They are already making good use of Custom Resources, but as always want even more control and power, and a simple way to manage them.

CloudFormation Registry and CloudFormation CLI
Today we are addressing your requests for more coverage and better extensibility with the launch of the CloudFormation Registry and the CloudFormation CLI, an open source project.

You can use this kit to define and create resource providers that automate the creation of resources in a safe & systematic way. You create a schema, define a handler for five core operations, test it locally, and then publish your provider to a new provider registry that is associated with your AWS account.

We have also been working with a select set of third-party vendors, helping them to create resource providers for their SaaS applications, monitoring tools, and so forth. You will be able to obtain the providers from the vendors of interest and add them to your provider registry.

Finally, we are making a set of AWS resource providers available in open source form. You can use them to learn how to write a robust provider, and you can also extend them (in your own namespace), as desired.

Let’s dive in!

CloudFormation CLI
This set of tools gives you everything you need to build your own resource providers, including detailed documentation and sample code. The cfn (CloudFormation Command Line Interface) command helps you to initialize your project, generate skeleton code, test your provider, and register it with CloudFormation.

Here are the principal steps:

Model – Create and validate a schema that serves as the canonical description of your resource.

Develop – Write a handler (Java and Go now, with other languages to follow) that defines five core operations (Create, Read, Update, Delete, and List) on your resource, and test it locally.

Register – Register the provider with CloudFormation so that it can be used in your CloudFormation templates.

Modeling a Resource
The schema for a resource must conform to the Resource Provider Definition Schema. It defines the resource, its properties, and its attributes. Properties can be defined as read-only, write-only, and create-only; this gives CloudFormation the information it needs to modify existing resources when executing an operation on a stack. Here is a simple definition:

{
  "additionalProperties": false,
  "createOnlyProperties": [
    "/properties/Name"
  ],
  "primaryIdentifier": [
    "/properties/Name"
  ],
  "properties": {
    "Name": {
      "description": "The name of the configuration set.",
      "maxLength": 64,
      "pattern": "^[a-zA-Z0-9_-]{0,64}$",
      "type": "string"
    }
  },
  "typeName": "AWS::SES::ConfigurationSet",
  "description": "A sample resource"
}

Develop
The handlers make use of a framework that takes care of error handling, throttling of calls to downstream APIs, credential management, and so forth. The CloudFormation CLI contains complete sample code; you can also study the Amazon SES Resource Provider (or any of the others) to learn more.

To learn more, read Walkthrough: Develop a Resource Provider in the CloudFormation CLI documentation.

Register
After you have developed and locally tested your resource provider, you need to tell CloudFormation about it. Using the CloudFormation CLI, you submit the package (schema and compiled handlers) to the desired AWS region(s). The acceptance process is asynchronous; once it completes you can use the new resource type in your CloudFormation templates.

CloudFormation Registry
The CloudFormation registry provides per-account, per-region storage for your resource providers. You can access it from the CloudFormation Console:

Select Public to view the native AWS resources (AWS::*::*); select Private to view resources that you create, and those that you obtain from third parties.

You can also access the registry programmatically using the RegisterType, DeregisterType, ListTypes, ListTypeRegistrations, ListTypeVersions, and DescribeType functions.
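
For example, here is a minimal boto3 sketch that lists the private resource types registered in an account and describes one of them (I use the Datadog type mentioned below as an illustrative name):

import boto3

cfn = boto3.client("cloudformation")

# List privately registered resource types in this account and region
for summary in cfn.list_types(Visibility="PRIVATE", Type="RESOURCE")["TypeSummaries"]:
    print(summary["TypeName"])

# Inspect one resource type, including its schema
details = cfn.describe_type(Type="RESOURCE", TypeName="Datadog::Monitors::Monitor")
print(details["Schema"])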

Third-Party Support
As I mentioned earlier, a select set of third-party vendors have been working to create resource providers ahead of today’s launch. Here’s the initial list:

After registering the provider from a vendor, you will be able to reference the corresponding resource types in your CloudFormation templates. For example, you can use Datadog::Monitors::Monitor to create a Datadog monitor.

If you are a third-party vendor and are interested in creating a resource provider for your product, send an email to [email protected].

Available Now
You can use the CloudFormation CLI to build resource providers for use in all public AWS regions.

Jeff;

Announcing Firelens – A New Way to Manage Container Logs

Post Syndicated from Martin Beeby original https://aws.amazon.com/blogs/aws/announcing-firelens-a-new-way-to-manage-container-logs/

Today, the fantastic team that builds our container services at AWS has launched an excellent new tool called AWS FireLens that will make dealing with logs a whole lot easier.

Using FireLens, customers can direct container logs to storage and analytics tools without modifying deployment scripts, manually installing extra software or writing additional code. With a few configuration updates on Amazon ECS or AWS Fargate, you select the destination and optionally define filters to instruct FireLens to send container logs to where they are needed.

FireLens works with either Fluent Bit or Fluentd, which means that you can send logs to any destination supported by either of those open-source projects. We maintain a web page where you can see a list of AWS Partner Network products that have been reviewed by AWS Solution Architects. You can send log data or events to any of these products using FireLens.

I find the simplest way to understand FireLens is to use it, so in the rest of this blog post, I’m going to demonstrate using FireLens with a container in Amazon ECS, forwarding the container logs on to Amazon CloudWatch.

First, I need to configure a task definition. I got an example definition from the Amazon ECS FireLens Examples on GitHub.

I replaced the AWS Identity and Access Management (IAM) roles with my own taskRoleArn and executionRoleArn IAM roles. I also added port mappings so that I could access the NGINX container from a browser.

{
	"family": "firelens-example-cloudwatch",
	"taskRoleArn": "arn:aws:iam::365489000573:role/ecsInstanceRole",
	"executionRoleArn": "arn:aws:iam::365489300073:role/ecsTaskExecutionRole",
	"containerDefinitions": [
		{
			"essential": true,
			"image": "906394416424.dkr.ecr.us-east-1.amazonaws.com/aws-for-fluent-bit:latest",
			"name": "log_router",
			"firelensConfiguration": {
				"type": "fluentbit"
			},
			"logConfiguration": {
				"logDriver": "awslogs",
				"options": {
					"awslogs-group": "firelens-container",
					"awslogs-region": "us-west-2",
					"awslogs-create-group": "true",
					"awslogs-stream-prefix": "firelens"
				}
			},
			"memoryReservation": 50
		 },
		 {
			 "essential": true,
			 "image": "nginx",
			 "name": "app",
			 "portMappings": [
				{
				  "containerPort": 80,
				  "hostPort": 80
				}
			  ],
			 "logConfiguration": {
				 "logDriver":"awsfirelens",
				 "options": {
					"Name": "cloudwatch",
					"region": "us-west-2",
					"log_group_name": "firelens-fluent-bit",
					"auto_create_group": "true",
					"log_stream_prefix": "from-fluent-bit"
				}
			},
			"memoryReservation": 100
		}
	]
}

I saved the task definition to a local folder and then used the AWS Command Line Interface (CLI) to register the task definition.

aws ecs register-task-definition --cli-input-json file://cloudwatch_task_definition.json

I already have an ECS cluster set up, but if you don’t, you can learn how to do that from the ECS documentation. The command below creates a service on my ECS cluster using my newly registered task definition.

aws ecs create-service --cluster demo-cluster --service-name demo-service --task-definition firelens-example-cloudwatch --desired-count 1 --launch-type "EC2"

After logging into the Amazon ECS console and drilling into my service, and then my tasks, I find the container definition that exposes an External Link. This IP address is exposed since I asked the task definition to map container port 80 to host port 80.

If I go to that IP address in a browser, the NGINX container, which I used as my app, serves its default page. The NGINX container logs any requests that it receives to stdout, so FireLens will now forward these logs on to CloudWatch. I added a little message to the URL so that when I take a look at the logs, I should be able to quickly identify this request from all the others.

I then navigated over to the Amazon CloudWatch console and drilled down into the firelens-fluent-bit log group. If you remember, this is the log group name that I set up in the original task definition. Below you will notice I have several logs in my log stream, and the last one is the request that I just made in the browser. If you look closely at the log, you will find that “IT WORKS” is passed in as part of the GET request.
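
You can also search the log group programmatically instead of using the console. Here is a minimal boto3 sketch that looks for the marker I added to the URL; the log group name comes from the task definition above:

import boto3

logs = boto3.client("logs", region_name="us-west-2")

# Search the FireLens log group for the marker passed in the GET request
response = logs.filter_log_events(
    logGroupName="firelens-fluent-bit",
    filterPattern='"IT WORKS"',
)
for event in response["events"]:
    print(event["logStreamName"], event["message"])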

So there we have it: I successfully set up FireLens and had it forward my container logs on to CloudWatch. I could, of course, have chosen a different destination, for example a third-party provider like Datadog or an AWS destination like Amazon Kinesis Data Firehose.

If you want to try FireLens, it is available today in all regions that support Amazon ECS and AWS Fargate.

Happy Logging!

In The Works – New AMD-Powered, Compute-Optimized EC2 Instances (C5a/C5ad)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/in-the-works-new-amd-powered-compute-optimized-ec2-instances-c5a-c5ad/

We’re getting ready to give you even more power and even more choices when it comes to EC2 instances.

We will soon launch C5a and C5ad instances powered by custom second-generation AMD EPYC “Rome” processors running at frequencies as high as 3.3 GHz. You will be able to use these compute-optimized instances to run your batch processing, distributed analytics, web applications and other compute-intensive workloads. Like the existing AMD-powered instances in the M, R and T families, the C5a and C5ad instances are built on the AWS Nitro System and give you an opportunity to balance your instance mix based on cost and performance.

The instances will be available in eight sizes and also in bare metal form, with up to 192 vCPUs and 384 GiB of memory. The C5ad instances will include up to 7.6 TiB of fast, local NVMe storage, making them perfect for video encoding, image manipulation, and other media processing workloads.

The bare metal instances (c5an.metal and c5adn.metal) will offer twice as much memory and double the vCPU count of comparable instances, making them some of the largest and most powerful compute-optimized instances yet. The bare metal variants will have access to 100 Gbps of network bandwidth and will be compatible with Elastic Fabric Adapter — perfect for your most demanding HPC workloads!

I’ll have more information soon, so stay tuned!

Jeff;

Now available: Batch Recommendations in Amazon Personalize

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/now-available-batch-recommendations-in-amazon-personalize/

Today, we’re very happy to announce that Amazon Personalize now supports batch recommendations.

Launched at AWS re:Invent 2018, Personalize is a fully-managed service that allows you to create private, customized personalization recommendations for your applications, with little to no machine learning experience required.

With Personalize, you provide the unique signals in your activity data (page views, sign-ups, purchases, and so forth) along with optional customer demographic information (age, location, etc.). You then provide the inventory of the items you want to recommend, such as articles, products, videos, or music: as explained in previous blog posts, you can use both historical data stored in Amazon Simple Storage Service (S3) and streaming data sent in real-time from a JavaScript tracker or server-side.

Then, entirely under the covers, Personalize will process and examine the data, identify what is meaningful, select the right algorithms, train and optimize a personalization model that is customized for your data, and is accessible via an API that can be easily invoked by your business application.

However, some customers have told us that batch recommendations would be a better fit for their use cases. For example, some of them need the ability to compute recommendations for very large numbers of users or items in one go, store them, and feed them over time to batch-oriented workflows such as sending email or notifications: although you could certainly do this with a real-time recommendation endpoint, batch processing is simply more convenient and more cost-effective.

Let’s do a quick demo.

Introducing Batch Recommendations
For the sake of brevity, I’ll reuse the movie recommendation solution trained in this post on the MovieLens data set. Here, instead of deploying a real-time campaign based on this solution, we’re going to create a batch recommendation job.

First, let’s define users for whom we’d like to recommend movies. I simply list their user ids in a JSON file that I store in an S3 bucket.

{"userId": "123"}
{"userId": "456"}
{"userId": "789"}
{"userId": "321"}
{"userId": "654"}
{"userId": "987"}

Then, I apply a bucket policy to that bucket, so that Personalize may read and write objects in it. I’m using the AWS console here, and you can do the same thing programmatically with the PutBucketPolicy API.
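
A boto3 sketch of that step could look like the following. The policy statements here are an assumption on my part (read and write access for the Personalize service principal); check the Amazon Personalize documentation for the exact policy your bucket needs:

import boto3
import json

bucket = "jsimon-personalize-euwest-1"

# Assumed policy: allow the Personalize service principal to read input and write output
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "personalize.amazonaws.com"},
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
    }],
}

boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))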

Now let’s head out to the Personalize console, and create a batch inference job.

As you would expect, I need to give the job a name, and select an AWS Identity and Access Management (IAM) role for Personalize in order to allow access to my S3 bucket. The bucket policy was taken care of already.

Then, I select the solution that I want to use to recommend movies.

Finally, I define the location of input and output data, with optional AWS Key Management Service (KMS) keys for decryption and encryption.

After a little while, the job is complete, and I can fetch recommendations from my bucket.

$ aws s3 cp s3://jsimon-personalize-euwest-1/batch/output/batch/users.json.out -
{"input":{"userId":"123"}, "output": {"recommendedItems": ["137", "285", "14", "283", "124", "13", "508", "276", "275", "475", "515", "237", "246", "117", "19", "9", "25", "93", "181", "100", "10", "7", "273", "1", "150"]}}
{"input":{"userId":"456"}, "output": {"recommendedItems": ["272", "333", "286", "271", "268", "313", "340", "751", "332", "750", "347", "316", "300", "294", "690", "331", "307", "288", "304", "302", "245", "326", "315", "346", "305"]}}
{"input":{"userId":"789"}, "output": {"recommendedItems": ["275", "14", "13", "93", "1", "117", "7", "246", "508", "9", "248", "276", "137", "151", "150", "111", "124", "237", "744", "475", "24", "283", "20", "273", "25"]}}
{"input":{"userId":"321"}, "output": {"recommendedItems": ["86", "197", "180", "603", "170", "427", "191", "462", "494", "175", "61", "198", "238", "45", "507", "203", "357", "661", "30", "428", "132", "135", "479", "657", "530"]}}
{"input":{"userId":"654"}, "output": {"recommendedItems": ["272", "270", "268", "340", "210", "313", "216", "302", "182", "318", "168", "174", "751", "234", "750", "183", "271", "79", "603", "204", "12", "98", "333", "202", "902"]}}
{"input":{"userId":"987"}, "output": {"recommendedItems": ["286", "302", "313", "294", "300", "268", "269", "288", "315", "333", "272", "242", "258", "347", "690", "310", "100", "340", "50", "292", "327", "332", "751", "319", "181"]}}

In a real-life scenario, I would then feed these recommendations to downstream applications for further processing. Of course, instead of using the console, I would create and manage jobs programmatically with the CreateBatchInferenceJob, DescribeBatchInferenceJob, and ListBatchInferenceJobs APIs.
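
Here is a minimal boto3 sketch of creating such a job and checking its status; the solution version ARN, IAM role ARN, and S3 paths are placeholders:

import boto3

personalize = boto3.client("personalize")

# Create a batch inference job that reads user IDs from S3 and writes recommendations back to S3
job = personalize.create_batch_inference_job(
    jobName="movie-batch-recommendations",
    solutionVersionArn="<solution version arn>",
    roleArn="<IAM role arn>",
    jobInput={"s3DataSource": {"path": "s3://<bucket>/batch/users.json"}},
    jobOutput={"s3DataDestination": {"path": "s3://<bucket>/batch/output/"}},
)

# Check the job status; it reaches ACTIVE when the recommendations are ready
status = personalize.describe_batch_inference_job(
    batchInferenceJobArn=job["batchInferenceJobArn"]
)["batchInferenceJob"]["status"]
print(status)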

Now Available!
Using batch recommendations with Amazon Personalize is an easy and cost-effective way to add personalization to your applications. You can start using this feature today in all regions where Personalize is available.

Please send us feedback, either on the AWS forum for Amazon Personalize, or through your usual AWS support contacts.

Julien

New – Insert, Update, Delete Data on S3 with Amazon EMR and Apache Hudi

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-insert-update-delete-data-on-s3-with-amazon-emr-and-apache-hudi/

Storing your data in Amazon S3 provides lots of benefits in terms of scale, reliability, and cost effectiveness. On top of that, you can leverage Amazon EMR to process and analyze your data using open source tools like Apache Spark, Hive, and Presto. As powerful as these tools are, it can still be challenging to deal with use cases where you need to do incremental data processing, and record-level insert, update, and delete.

Talking with customers, we found that there are use cases that need to handle incremental changes to individual records, for example:

  • Complying with data privacy regulations, where their users choose to exercise their right to be forgotten, or change their consent as to how their data can be used.
  • Working with streaming data, when you have to handle specific data insertion and update events.
  • Using change data capture (CDC) architectures to track and ingest database change logs from enterprise data warehouses or operational data stores.
  • Reinstating late arriving data, or analyzing data as of a specific point in time.

Starting today, EMR release 5.28.0 includes Apache Hudi (incubating), so that you no longer need to build custom solutions to perform record-level insert, update, and delete operations. Hudi development started at Uber in 2016 to address inefficiencies across ingest and ETL pipelines. In recent months the EMR team has worked closely with the Apache Hudi community, contributing patches that include updating Hudi to Spark 2.4.4 (HUDI-12), supporting Spark Avro (HUDI-91), adding support for AWS Glue Data Catalog (HUDI-306), as well as multiple bug fixes.

Using Hudi, you can perform record-level inserts, updates, and deletes on S3, allowing you to comply with data privacy laws, consume real-time streams and change data captures, reinstate late arriving data, and track history and rollbacks in an open, vendor-neutral format. You create datasets and tables and Hudi manages the underlying data format. Hudi uses Apache Parquet and Apache Avro for data storage, and includes built-in integrations with Spark, Hive, and Presto, enabling you to query Hudi datasets using the same tools that you use today with near real-time access to fresh data.

When launching an EMR cluster, the libraries and tools for Hudi are installed and configured automatically any time at least one of the following components is selected: Hive, Spark, or Presto. You can use Spark to create new Hudi datasets, and insert, update, and delete data. Each Hudi dataset is registered in your cluster’s configured metastore (including the AWS Glue Data Catalog), and appears as a table that can be queried using Spark, Hive, and Presto.

Hudi supports two storage types that define how data is written, indexed, and read from S3:

  • Copy on Write – data is stored in columnar format (Parquet) and updates create a new version of the files during writes. This storage type is best used for read-heavy workloads, because the latest version of the dataset is always available in efficient columnar files.
  • Merge on Read – data is stored with a combination of columnar (Parquet) and row-based (Avro) formats; updates are logged to row-based “delta files” and compacted later creating a new version of the columnar files. This storage type is best used for write-heavy workloads, because new commits are written quickly as delta files, but reading the data set requires merging the compacted columnar files with the delta files.

Let’s do a quick overview of how you can set up and use Hudi datasets in an EMR cluster.

Using Apache Hudi with Amazon EMR
I start creating a cluster from the EMR console. In the advanced options I select EMR release 5.28.0 (the first including Hudi) and the following applications: Spark, Hive, and Tez. In the hardware options, I add 3 task nodes to ensure I have enough capacity to run both Spark and Hive.

When the cluster is ready, I use the key pair I selected in the security options to SSH into the master node and access the Spark Shell. I use the following command to start the Spark Shell to use it with Hudi:

$ spark-shell --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" \
              --conf "spark.sql.hive.convertMetastoreParquet=false" \
              --jars /usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/spark/external/lib/spark-avro.jar

There, I use the following Scala code to import some sample ELB logs in a Hudi dataset using the Copy on Write storage type:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.hudi.hive.MultiPartKeysValueExtractor

//Set up various input values as variables
val inputDataPath = "s3://athena-examples-us-west-2/elb/parquet/year=2015/month=1/day=1/"
val hudiTableName = "elb_logs_hudi_cow"
val hudiTablePath = "s3://MY-BUCKET/PATH/" + hudiTableName

// Set up our Hudi Data Source Options
val hudiOptions = Map[String,String](
    DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "request_ip",
    DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "request_verb", 
    HoodieWriteConfig.TABLE_NAME -> hudiTableName, 
    DataSourceWriteOptions.OPERATION_OPT_KEY ->
        DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL, 
    DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "request_timestamp", 
    DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true", 
    DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> hudiTableName, 
    DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "request_verb", 
    DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY -> "false", 
    DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY ->
        classOf[MultiPartKeysValueExtractor].getName)

// Read data from S3 and create a DataFrame with Partition and Record Key
val inputDF = spark.read.format("parquet").load(inputDataPath)

// Write data into the Hudi dataset
inputDF.write
       .format("org.apache.hudi")
       .options(hudiOptions)
       .mode(SaveMode.Overwrite)
       .save(hudiTablePath)

In the Spark Shell, I can now count the records in the Hudi dataset:

scala> inputDF.count()
res1: Long = 10491958

In the options, I used the integration with the Hive metastore configured for the cluster, so that the table is created in the default database. In this way, I can use Hive to query the data in the Hudi dataset:

hive> use default;
hive> select count(*) from elb_logs_hudi_cow;
...
OK
10491958
...

I can now update or delete a single record in the dataset. In the Spark Shell, I prepare some variables to find the record I want to update, and a SQL statement to select the value of the column I want to change:

val requestIpToUpdate = "243.80.62.181"
val sqlStatement = s"SELECT elb_name FROM elb_logs_hudi_cow WHERE request_ip = '$requestIpToUpdate'"

I execute the SQL statement to see the current value of the column:

scala> spark.sql(sqlStatement).show()
+------------+                                                                  
|    elb_name|
+------------+
|elb_demo_003|
+------------+

Then, I select and update the record:

// Create a DataFrame with a single record and update column value
val updateDF = inputDF.filter(col("request_ip") === requestIpToUpdate)
                      .withColumn("elb_name", lit("elb_demo_001"))

Now I update the Hudi dataset with a syntax similar to the one I used to create it. But this time, the DataFrame I am writing contains only one record:

// Write the DataFrame as an update to existing Hudi dataset
updateDF.write
        .format("org.apache.hudi")
        .options(hudiOptions)
        .option(DataSourceWriteOptions.OPERATION_OPT_KEY,
                DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
        .mode(SaveMode.Append)
        .save(hudiTablePath)

In the Spark Shell, I check the result of the update:

scala> spark.sql(sqlStatement).show()
+------------+                                                                  
|    elb_name|
+------------+
|elb_demo_001|
+------------+

Now I want to delete the same record. To delete it, I pass the EmptyHoodieRecordPayload payload in the write options:

// Write the DataFrame with an EmptyHoodieRecordPayload for deleting a record
updateDF.write
        .format("org.apache.hudi")
        .options(hudiOptions)
        .option(DataSourceWriteOptions.OPERATION_OPT_KEY,
                DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
        .option(DataSourceWriteOptions.PAYLOAD_CLASS_OPT_KEY,
                "org.apache.hudi.EmptyHoodieRecordPayload")
        .mode(SaveMode.Append)
        .save(hudiTablePath)

In the Spark Shell, I see that the record is no longer available:

scala> spark.sql(sqlStatement).show()
+--------+                                                                      
|elb_name|
+--------+
+--------+

How are all those updates and deletes managed by Hudi? Let’s use the Hudi Command Line Interface (CLI) to connect to the dataset and see how those changes are interpreted as commits:

This dataset uses the Copy on Write storage type, which means that each time a record is updated, the file that contains that record is rewritten with the updated values. You can see how many records have been written for each commit. The bottom row of the table describes the initial creation of the dataset, above it is the single record update, and at the top is the single record delete.

With Hudi, you can roll back to each commit. For example, I can roll back the delete operation with:

hudi:elb_logs_hudi_cow->commit rollback --commit 20191104121031

In the Spark Shell, the record is now back to where it was, just after the update:

scala> spark.sql(sqlStatement).show()
+------------+                                                                  
|    elb_name|
+------------+
|elb_demo_001|
+------------+

Copy on Write is the default storage type. I can repeat the steps above to create and update a Merge on Read dataset type by adding this to our hudiOptions:

DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "MERGE_ON_READ"

If you update a Merge on Read dataset and look at the commits with the Hudi CLI, you can see how different Merge on Read is compared to Copy on Write. With Merge on Read, you are only writing the updated rows, not whole files as with Copy on Write. This is why Merge on Read is helpful for write-heavy or update/delete-heavy workloads with fewer reads. Delta commits are written to disk as Avro records (row-based storage), and compacted data is written as Parquet files (columnar storage). To avoid creating too many delta files, Hudi will automatically compact your dataset so that your reads are as performant as possible.

When a Merge on Read dataset is created, two Hive tables are created:

  • The first table matches the name of the dataset.
  • The second table has the characters _rt appended to its name; the _rt postfix stands for real-time.

When queried, the first table returns the data that has been compacted, and does not show the latest delta commits. Using this table provides the best performance, but omits the freshest data. Querying the real-time table merges the compacted data with the delta commits on read, hence this dataset being called “Merge on Read”. This results in the freshest data being available, but incurs a performance overhead, and is not as performant as querying the compacted data. In this way, data engineers and analysts have the flexibility to choose between performance and data freshness.

Available Now
This new feature is available now in all regions with EMR 5.28.0. There is no additional cost in using Hudi with EMR. You can learn more about Hudi in the EMR documentation. This new tool can simplify the way you process, update, and delete data in S3. Let me know which use cases you are going to use it for!

Danilo

Accelerate SQL Server Always On Deployments with AWS Launch Wizard

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/accelerate-sql-server-always-on-deployments-with-aws-launch-wizard/

Customers sometimes tell us that while they are experts in their domain, their unfamiliarity with the cloud can make getting started more challenging and take more time. They want to be able to quickly and easily deploy enterprise applications on AWS without needing prior tribal knowledge of the AWS platform and best practices, so as to accelerate their journey to the cloud.

Announcing AWS Launch Wizard for SQL Server
AWS Launch Wizard for SQL Server is a simple, intuitive, and free-to-use wizard-based experience that enables quick and easy deployment of high availability SQL solutions on AWS. The wizard walks you through an end-to-end deployment experience of Always On Availability Groups using prescriptive guidance. After you answer a few high-level questions about the application, such as required performance characteristics, the wizard takes care of identifying, provisioning, and configuring matching AWS resources such as Amazon Elastic Compute Cloud (EC2) instances, Amazon Elastic Block Store (EBS) volumes, and an Amazon Virtual Private Cloud. Based on your selections, the wizard presents a dynamically generated estimate of the deployment cost; as you modify your resource selections, you can see an updated cost assessment to help you match your budget.

Once you approve, AWS Launch Wizard for SQL Server provisions these resources and configures them to create a fully functioning production-ready SQL Server Always On deployment in just a few hours. The created resources are tagged, making it easy to identify and work with them, and the wizard also creates AWS CloudFormation templates, providing you with a baseline for repeatable and consistent application deployments.

Subsequent SQL Server Always On deployments become faster and easier as AWS Launch Wizard for SQL Server takes care of the required infrastructure on your behalf, determining the resources to match your application’s requirements such as performance, memory, and bandwidth (you can modify the recommended defaults if you wish). If you want to bring your own SQL Server licenses, or have other custom requirements for the instances, you can also select to use your own custom AMIs provided they meet certain requirements (noted in the service documentation).

Using AWS Launch Wizard for SQL Server
To get started with my deployment, in the Launch Wizard Console I click the Create deployment button to start the wizard and select SQL Server Always On.


The wizard requires an AWS Identity and Access Management (IAM) role granting it permissions to deploy and access resources in my account. The wizard will check to see if a role named AmazonEC2RoleForLaunchWizard exists in my account. If so, it will be used; otherwise a new role will be created. The new role will have two AWS managed policies, AmazonSSMManagedInstanceCore and AmazonEC2RolePolicyforLaunchWizard, attached to it. Note that this one-time setup process will typically be performed by an IAM Administrator for your organization. However, the IAM user does not have to be an Administrator, and CreateRole, AttachRolePolicy, and GetRole permissions are sufficient to perform these operations. After the role is created, the IAM Administrator can delegate the application deployment process to another IAM user who, in turn, must have the AWS Launch Wizard for SQL Server IAM managed policy called AmazonLaunchWizardFullaccess attached to it.
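
If your IAM Administrator prefers to script that one-time setup rather than let the wizard create the role, a boto3 sketch could look like the following. The EC2 trust policy is my assumption here; confirm the required trust relationship and policies in the Launch Wizard documentation before using it:

import boto3
import json

iam = boto3.client("iam")

# Assumed trust policy: the role is used by EC2 instances launched by the wizard
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="AmazonEC2RoleForLaunchWizard",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the two AWS managed policies mentioned above
for policy_arn in (
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
    "arn:aws:iam::aws:policy/AmazonEC2RolePolicyforLaunchWizard",
):
    iam.attach_role_policy(RoleName="AmazonEC2RoleForLaunchWizard", PolicyArn=policy_arn)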

With the application type selected I can proceed by clicking Next to start configuring my application settings, beginning with setting a deployment name and optionally an Amazon Simple Notification Service (SNS) topic that AWS Launch Wizard for SQL Server can use for notifications and alerts. In the connectivity options I can choose to use an existing Amazon Virtual Private Cloud or have a new one created. I can also specify the name of an existing key pair (or create one). The key pair will be used if I want to RDP into my instances or obtain the administrator password. For a new Virtual Private Cloud I can also configure the IP address or range to which remote desktop access will be permitted:

Instances launched by AWS Launch Wizard for SQL Server will be domain joined to an Active Directory. I can select either an existing AWS Managed AD, or an on-premises AD, or have the wizard create a new AWS Managed Directory for my deployment:

The final application settings relate to SQL Server. This is also where I can specify a custom AMI to be used if I want to bring my own SQL Server licenses or have other customization requirements. Here I’m just going to create a new SQL Server Service account and use an Amazon-provided image with license included. Note that if I choose to use an existing service account it should be part of the Managed AD in which you are deploying:

Clicking Next takes me to a page to define the infrastructure requirements of my application, in terms of CPU and network performance and memory. I can also select the type of storage (solid state vs magnetic) and required SQL Server throughput. The wizard will recommend the resource types to be launched but I can also override and select specific instance and volume types, and I can also set custom tags to apply to the resources that will be created:

The final section of this page shows me the cost estimate based on my selections. The data in this panel is dynamically generated from my prior selections, and I can go back and forth in the wizard, tuning my selections to match my budget:

When I am happy with my selections, clicking Next takes me to wizard’s final Review page where I can view a summary of my selections and acknowledge that AWS resources and AWS Identity and Access Management (IAM) permissions will be created on my behalf, along with the estimated cost as was shown in the estimator on the previous page. My final step is to click Deploy to start the deployment process. Status updates during deployment can be viewed on the Deployments page with a final notification to inform me on completion.

Post-deployment Management
Once my application has been deployed I can manage its resources easily. Firstly I can navigate to Deployments on the AWS Launch Wizard for SQL Server dashboard and using the Actions dropdown I can jump to the Amazon Elastic Compute Cloud (EC2) console where I can manage the EC2 instances, EBS volumes, Active Directory etc. Or, using the same Actions dropdown, I can access SQL Server via the remote desktop gateway instance. If I want to manage future updates and patches to my application using AWS Systems Manager another Actions option takes me to the Systems Manager dashboard for managing my application. I can also use the AWS Launch Wizard for SQL Server to delete deployments performed using the wizard and it will perform a roll-back of all AWS CloudFormation stacks that the service created.

Now Available
AWS Launch Wizard for SQL Server is generally available and you can use it in the following AWS Regions: US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), South America (Sao Paulo), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Seoul), Asia Pacific (Tokyo), EU (Frankfurt), EU (Ireland), EU (London), and EU (Stockholm). Support for the AWS regions in China, and for the GovCloud Region, is in the works. There is no additional charge for using AWS Launch Wizard for SQL Server, only for the resources it creates.

— Steve

AWS Data Exchange – Find, Subscribe To, and Use Data Products

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-data-exchange-find-subscribe-to-and-use-data-products/

We live in a data-intensive, data-driven world! Organizations of all types collect, store, process, and analyze data, and use it to inform and improve their decision-making processes. The AWS Cloud is well-suited to all of these activities; it offers vast amounts of storage, access to any conceivable amount of compute power, and many different types of analytical tools.

In addition to generating and working with data internally, many organizations generate and then share data sets with the general public or within their industry. We made some initial steps to encourage this back in 2008 with the launch of AWS Public Data Sets (Paging Researchers, Analysts, and Developers). That effort has evolved into the Registry of Open Data on AWS (New – Registry of Open Data on AWS (RODA)), which currently contains 118 interesting datasets, with more added all the time.

New AWS Data Exchange
Today, we are taking the next step forward, and are launching AWS Data Exchange. This addition to AWS Marketplace contains over one thousand licensable data products from over 80 data providers. There’s a diverse catalog of free and paid offerings, in categories such as financial services, health care / life sciences, geospatial, weather, and mapping.

If you are a data subscriber, you can quickly find, procure, and start using these products. If you are a data provider, you can easily package, license, and deliver products of your own. Let’s take a look at Data Exchange from both vantage points, and then review some important details.

Let’s define a few important terms before diving in:

Data Provider – An organization that has one or more data products to share.

Data Subscriber – An AWS customer that wants to make use of data products from Data Providers.

Data Product – A collection of data sets.

Data Set – A container for data assets that belong together, grouped by revision.

Revision – A container for one or more data assets as of a point in time.

Data Asset – The actual data, in any desired format.

AWS Data Exchange for Data Subscribers
As a data subscriber, I click View product catalog and start out in the Discover data section of the AWS Data Exchange Console:

Products are available from a long list of vendors:

I can enter a search term, click Search, and then narrow down my results to show only products that have a Free pricing plan:

I can also search for products from a specific vendor, that match a search term, and that have a Free pricing plan:

The second one looks interesting and relevant, so I click on 5 Digit Zip Code Boundaries US (TRIAL) to learn more:

I think I can use this in my app, and want to give it a try, so I click Continue to subscribe. I review the details, read the Data Subscription Agreement, and click Subscribe:

The subscription is activated within a few minutes, and I can see it in my list of Subscriptions:

Then I can download the set to my S3 bucket, and take a look. I click into the data set, and find the Revisions:

I click into the revision, and I can see the assets (containing the actual data) that I am looking for:

I select the asset(s) that I want, and click Export to Amazon S3. Then I choose a bucket, and click Export to proceed:

This creates a job that will copy the data to my bucket (extra IAM permissions are required here; read the Access Control documentation for more info):

The jobs run asynchronously and copy data from Data Exchange to the bucket. Jobs can be created interactively, as I just showed you, or programmatically. Once the data is in the bucket, I can access and process it in any desired way. I could, for example, use an AWS Lambda function to parse the ZIP file and use the results to update an Amazon DynamoDB table. Or, I could run an AWS Glue crawler to get the data into my Glue catalog, run an Amazon Athena query, and visualize the results in an Amazon QuickSight dashboard.
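
As a rough boto3 sketch of the programmatic route (the data set, revision, and asset IDs below are placeholders you would look up first, for example with ListDataSets, ListDataSetRevisions, and ListRevisionAssets):

import boto3

dx = boto3.client("dataexchange")

# Create an export job that copies one asset from a revision into my S3 bucket
job = dx.create_job(
    Type="EXPORT_ASSETS_TO_S3",
    Details={
        "ExportAssetsToS3": {
            "DataSetId": "<data set id>",
            "RevisionId": "<revision id>",
            "AssetDestinations": [
                {"AssetId": "<asset id>", "Bucket": "<my bucket>", "Key": "zip-code-boundaries.zip"}
            ],
        }
    },
)

# Jobs are created in a WAITING state and must be started explicitly
dx.start_job(JobId=job["Id"])
print(dx.get_job(JobId=job["Id"])["State"])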

Subscriptions can last from 1 to 36 months with an auto-renew option; subscription fees are billed to my AWS account each month.

AWS Data Exchange for Data Providers
Now I am going to put on my “data provider” hat and show you the basics of the publication process (the User Guide contains a more detailed walk-through). In order to be able to license data, I must agree to the terms and conditions, and my application must be approved by AWS.

After I apply and have been approved, I start by creating my first data set. I click Data sets in the navigation, and then Create data set:

I describe my data set, and have the option to tag it, then click Create:

Next, I click Create revision to create the first revision to the data set:

I add a comment, and have the option to tag the revision before clicking Create:

I can copy my data from an existing S3 location, or I can upload it from my desktop:

I choose the second option, select my file, and it appears as an Imported asset after the import job completes. I review everything, and click Finalize for the revision:

My data set is ready right away, and now I can use it to create one or more products:

The console outlines the principal steps:

I can set up public pricing information for my product:

AWS Data Exchange lets me create private pricing plans for individual customers, and it also allows my existing customers to bring their existing (pre-AWS Data Exchange) licenses for my products along with them by creating a Bring Your Own Subscription offer.

I can use the Data Subscription Agreement (DSA) provided by AWS Data Exchange, use it as the basis for my own, or upload an existing one:

I can use the AWS Data Exchange API to create, update, list, and manage data sets and revisions to them. Functions include CreateDataSet, UpdateDataSet, ListDataSets, CreateRevision, UpdateAsset, and CreateJob.
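
A minimal boto3 sketch of the provider-side flow I just walked through in the console might look like this; the data set name, bucket, and key are placeholders:

import boto3

dx = boto3.client("dataexchange")

# Create a data set and a first revision
data_set = dx.create_data_set(
    AssetType="S3_SNAPSHOT",
    Name="<data set name>",
    Description="Sample data set",
)
revision = dx.create_revision(DataSetId=data_set["Id"], Comment="Initial revision")

# Import an asset from S3 into the revision, then finalize the revision
import_job = dx.create_job(
    Type="IMPORT_ASSETS_FROM_S3",
    Details={
        "ImportAssetsFromS3": {
            "DataSetId": data_set["Id"],
            "RevisionId": revision["Id"],
            "AssetSources": [{"Bucket": "<my bucket>", "Key": "<my file>"}],
        }
    },
)
dx.start_job(JobId=import_job["Id"])

# ...once the import job completes:
dx.update_revision(DataSetId=data_set["Id"], RevisionId=revision["Id"], Finalized=True)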

Things to Know
Here are a couple of things that you should know about Data Exchange:

Subscription Verification – The data provider can also require additional information in order to verify my subscription. If that is the case, the console will ask me to supply the info, and the provider will review and approve or decline within 45 days:

Here is what the provider sees:

Revisions & Notifications – The Data Provider can revise their data sets at any time. The Data Consumer receives a CloudWatch Event each time a product that they are subscribed to is updated; this can be used to launch a job to retrieve the latest revision of the assets. If you are implementing a system of this type and need some test events, find and subscribe to the Heartbeat product:

Data Categories & Types – Certain categories of data are not permitted on AWS Data Exchange. For example, your data products may not include information that can be used to identify any person, unless that information is already legally available to the public. See the Publishing Guidelines for detailed guidance on what categories of data are permitted.

Data Provider Location – Data providers must either be a valid legal entity domiciled in the United States or in a member state of the EU.

Available Now
AWS Data Exchange is available now and you can start using it today. If you own some interesting data and would like to publish it, start here. If you are a developer, browse the product catalog and look for data that will add value to your product.

Jeff;

New – Import Existing Resources into a CloudFormation Stack

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-import-existing-resources-into-a-cloudformation-stack/

With AWS CloudFormation, you can model your entire infrastructure with text files. In this way, you can treat your infrastructure as code and apply software development best practices, such as putting it under version control, or reviewing architectural changes with your team before deployment.

Sometimes AWS resources initially created using the console or the AWS Command Line Interface (CLI) need to be managed using CloudFormation. For example, you (or a different team) may create an IAM role, a Virtual Private Cloud, or an RDS database in the early stages of a migration, and then you have to spend time to include them in the same stack as the final application. In such cases, you often end up recreating the resources from scratch using CloudFormation, and then migrating configuration and data from the original resource.

To make these steps easier for our customers, you can now import existing resources into a CloudFormation stack!

It was already possible to remove resources from a stack without deleting them by setting the DeletionPolicy to Retain. This, together with the new import operation, enables a new range of possibilities. For example, you are now able to:

  • Create a new stack importing existing resources.
  • Import existing resources in an already created stack.
  • Migrate resources across stacks.
  • Remediate a detected drift.
  • Refactor nested stacks by deleting child stacks from one parent and then importing them into another parent stack.

To import existing resources into a CloudFormation stack, you need to provide:

  • A template that describes the entire stack, including both the resources to import and (for existing stacks) the resources that are already part of the stack.
  • A DeletionPolicy attribute in the template for each resource to import; this makes it easy and safe to revert the operation.
  • A unique identifier for each target resource, for example the name of the Amazon DynamoDB table or of the Amazon Simple Storage Service (S3) bucket you want to import.

During the resource import operation, CloudFormation checks that:

  • The imported resources do not already belong to another stack in the same region (be careful with global resources such as IAM roles).
  • The target resources exist and you have sufficient permissions to perform the operation.
  • The properties and configuration values are valid against the resource type schema, which defines its required, acceptable properties, and supported values.

The resource import operation does not check that the template configuration and the actual configuration are the same. Since the import operation supports the same resource types as drift detection, I recommend running drift detection after importing resources in a stack.

Importing Existing Resources into a New Stack
In my AWS account, I have an S3 bucket and a DynamoDB table, both with some data inside, and I’d like to manage them using CloudFormation. In the CloudFormation console, I have two new options:

  • I can create a new stack importing existing resources.

  • I can import resources into an existing stack.

In this case, I want to start from scratch, so I create a new stack. The next step is to provide a template with the resources to import.

I upload the following template with two resources to import: a DynamoDB table and an S3 bucket.

AWSTemplateFormatVersion: "2010-09-09"
Description: Import test
Resources:

  ImportedTable:
    Type: AWS::DynamoDB::Table
    DeletionPolicy: Retain
    Properties: 
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions: 
        - AttributeName: id
          AttributeType: S
      KeySchema: 
        - AttributeName: id
          KeyType: HASH

  ImportedBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain

In this template I am setting DeletionPolicy to Retain for both resources. In this way, if I remove them from the stack, they will not be deleted. This is a good option for resources which contain data you don’t want to delete by mistake, or that you may want to move to a different stack in the future. It is mandatory for imported resources to have a deletion policy set, so you can safely and easily revert the operation, and be protected from mistakenly deleting resources that were imported by someone else.

I now have to provide an identifier to map the logical IDs in the template with the existing resources. In this case, I use the DynamoDB table name and the S3 bucket name. For other resource types, there may be multiple ways to identify them and you can select which property to use in the drop-down menus.

In the final recap, I review changes before applying them. Here I check that I’m targeting the right resources to import with the right identifiers. This is actually a CloudFormation Change Set that will be executed when I import the resources.

When importing resources into an existing stack, no changes are allowed to the existing resources of the stack. The import operation only allows the Change Set action of Import. Changes to parameters are allowed as long as they don’t cause changes to resolved values of properties in existing resources. You can change the template for existing resources to replace hard-coded values with a Ref to a resource being imported. For example, you may have a stack with an EC2 instance using an existing IAM role that was created using the console. You can now import the IAM role into the stack and replace the hard-coded value used by the EC2 instance in the template with a Ref to the role.

Moving on, each resource has its corresponding import events in the CloudFormation console.

When the import is complete, in the Resources tab, I see that the S3 bucket and the DynamoDB table are now part of the stack.

To be sure the imported resources are in sync with the stack template, I use drift detection.
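
If you prefer to run that check from the CLI instead of the console, a minimal sketch looks like this (the detection ID is a placeholder for the value returned by the first call):

# Start drift detection on the stack and note the StackDriftDetectionId in the output
$ aws cloudformation detect-stack-drift --stack-name imported-stack

# Check whether the detection has completed and whether any drift was found
$ aws cloudformation describe-stack-drift-detection-status \
    --stack-drift-detection-id <detection-id>

# Inspect per-resource drift details
$ aws cloudformation describe-stack-resource-drifts --stack-name imported-stack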

All stack-level tags, including automatically created tags, are propagated to resources that CloudFormation supports. For example, I can use the AWS CLI to get the tag set associated with the S3 bucket I just imported into my stack. Those tags give me the CloudFormation stack name and ID, and the logical ID of the resource in the stack template:

$ aws s3api get-bucket-tagging --bucket danilop-toimport

{
  "TagSet": [
    {
      "Key": "aws:cloudformation:stack-name",
      "Value": "imported-stack"
    },
    {
      "Key": "aws:cloudformation:stack-id",
      "Value": "arn:aws:cloudformation:eu-west-1:123412341234:stack/imported-stack/..."
    },
    {
      "Key": "aws:cloudformation:logical-id",
      "Value": "ImportedBucket"
    }
  ]
}

Available Now
You can use the new CloudFormation import operation via the console, AWS Command Line Interface (CLI), or AWS SDKs, in the following regions: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), Canada (Central), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), EU (Frankfurt), EU (Ireland), EU (London), EU (Paris), and South America (São Paulo).

It is now simpler to manage your infrastructure as code. You can learn more about bringing existing resources into CloudFormation management in the documentation.

Danilo

15 Years of AWS Blogging!

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/15-years-of-aws-blogging/

I wrote the first post (Welcome) to this blog exactly 15 years ago today. It is safe to say that I never thought that writing those introductory sentences would take my career in such a new and ever-challenging direction. This seems like as good a time as any to document and share the story of how the blog came to be, share some of my favorite posts, and talk about the actual mechanics of writing and blogging.

Before the Beginning
Back in 1999 or so, I was part of the Visual Basic team at Microsoft. XML was brand new, and Dave Winer was just starting to talk about RSS. The intersection of VB6, XML, and RSS intrigued me, and I built a little app called Headline Viewer as a side project. I put it up for download, people liked it, and content owners started to send me their RSS feeds for inclusion. The list of feeds took on a life of its own, and people wanted it just as much as they wanted the app. I also started my third personal blog around this time after losing the earlier incarnations in server meltdowns.

With encouragement from Aaron Swartz and others, I put Headline Viewer aside and started Syndic8 in late 2001 to collect, organize, and share them. I wrote nearly 90,000 lines of PHP in my personal time, all centered around a very complex MySQL database that included over 50 tables. I learned a lot about hosting, scaling, security, and database management. The site also had an XML-RPC web service interface that supported a very wide range of query and update operations. The feed collection grew to nearly 250,000 over the first couple of years.

I did not know it at the time, but my early experience with XML, RSS, blogging, and web services would turn out to be the skills that set me apart when I applied to work at Amazon. Sometimes, as it turns out, your hobbies and personal interests can end up as career-changing assets & differentiators.

E-Commerce Web Services
In parallel to all of this, I left Microsoft in 2000 and was consulting in the then-new field of web services. At that time, most of the web services in use were nothing more than cute demos: stock quotes, weather forecasts, and currency conversions. Technologists could marvel at a function call that crossed the Internet and back, but investors simply shrugged and moved on.

In mid-2002 I became aware of Amazon’s very first web service (now known as the Product Advertising API). This was, in my eyes, the first useful web service. It did something non-trivial that could not have been done locally, and provided value to both the provider and the consumer. I downloaded the SDK (copies were later made available on the mini-CD shown at right), sent the developers some feedback, and before I knew it I was at Amazon HQ, along with 4 or 5 other early fans of the service, for a day-long special event. Several teams shared their plans with us, and asked for our unvarnished feedback.

At some point during the day, one of the presenters said “We launched our first service, developers found it, and were building & sharing apps within 24 hours or so. We are going to look around the company and see if we can put web service interfaces on other parts of our business.”

This was my light-bulb moment — Amazon.com was going to become accessible to developers! I turned to Sarah Bryar (she had extended the invite to the event) and told her that I wanted to be a part of this. She said that they could make that happen, and a few weeks later (summer of 2002), I was a development manager on the Amazon Associates team, reporting to Larry Hughes. In addition to running a team that produced daily reports for each member of the Associates program, Larry gave me the freedom to “help out” with the nascent web services effort. I wrote sample programs, helped out on the forums, and even contributed to the code base. I went through the usual Amazon interview loop, and had to write some string-handling code on the white board.

Web Services Evangelist
A couple of months into the job, Sarah and Rob Frederick approached me and asked me to speak at a conference because no one else wanted to. I was more than happy to do this, and a few months later Sarah offered me the position of Web Services Evangelist. This was a great match for my skills and I took to it right away, booking events with any developer, company, school, or event that wanted to hear from me!

Later in 2003 I was part of a brainstorming session at Jeff Bezos’ house. Jeff, Andy Jassy, Al Vermeulen, me, and a few others (I should have kept better notes) spent a day coming up with a long list of ideas that evolved into EC2, S3, RDS, and so forth. I am fairly sure that this is the session discussed in How AWS Came to Be, but I am not 100% certain.

Using this list as a starting point, Andy started to write a narrative to define the AWS business. I was fortunate enough to have an office just 2 doors up the hall from him, and spent a lot of time reviewing and commenting on his narrative (read How Jeff Bezos Turned Narrative into Amazon’s Competitive Advantage to learn how we use narratives to define businesses and drive decisions). I also wrote some docs of my own that defined our plans for a developer relations team.

We Need a Blog
As I read through early drafts of Andy’s first narrative, I began to get a sense that we were going to build something complex & substantial.

My developer relations plan included a blog, and I spent a ton of time discussing the specifics in meetings with Andy and Drew Herdener. I remember that it was very hard for me to define precisely what this blog would look like, and how it would work from a content-generation and approval perspective. As is the Amazon way, every answer that I supplied basically begat even more questions from Andy and Drew! We ultimately settled on a few ground rules regarding tone and review, and I was raring to go.

I was lucky enough to be asked to accompany Jeff Bezos to the second Foo Camp as his technical advisor. Among many others, I met Ben and Mena Trott of Six Apart, and they gave me a coupon for 1000 free days of access to TypePad, their blogging tool.

We Have a Blog
Armed with that coupon, I returned to Seattle, created the AWS Blog (later renamed the AWS News Blog), and wrote the first two posts (Welcome and Browse Node API) later that year. Little did I know that those first couple of posts would change the course of my career!

I struggled a bit with “voice” in the early days, and could not decide if I was writing as the company, the group, the service, or simply as me. After some experimentation, I found that a personal, first-person style worked best and that’s what I settled on.

In the early days, we did not have much of a process or a blog team. Interesting topics found their way into my inbox, and I simply wrote about them as I saw fit. I had an incredible amount of freedom to pick and choose topics and words, and I did my best to be a strong, accurate communicator while steering clear of controversies that would simply cause more work for my colleagues in Amazon PR.

Launching AWS
Andy started building teams and I began to get ready for the first launches. We could have started with a dramatic flourish, proclaiming that we were about to change the world with the introduction of a broad lineup of cloud services. But we don’t work that way, and are happy to communicate in a factual, step-by-step fashion. It was definitely somewhat disconcerting to see that Business Week characterized our early efforts as Jeff Bezos’ Risky Bet, but we accept that our early efforts can sometimes be underappreciated or even misunderstood.

Here are some of the posts that I wrote for the earliest AWS services and features:

SQS – I somehow neglected to write about the first beta of Amazon Simple Queue Service (SQS), and the first mention is in a post called Queue Scratchpad. This post references AWS Zone, a site built by long-time Amazonian Elena Dykhno before she even joined the company! I did manage to write a post for Simple Queue Service Beta 2. At this point I am sure that many people wondered why their bookstore was trying to sell message queues, but we didn’t see the need to over-explain ourselves or to telegraph our plans.

S3 – I wrote my first Amazon S3 post while running to catch a plane, but I did manage to cover all of the basics: a service overview, definitions of major terms, pricing, and an invitation for developers to create cool applications!

EC2 – EC2 had been “just about to launch” for quite some time, and I knew that the launch would be a big deal. I had already teased the topic of scalable on-demand web services in Sometimes You Need Just a Little…, and I was ever so ready to actually write about EC2. Of course, our long-scheduled family vacation was set to coincide with the launch, and I wrote part of the Amazon EC2 Beta post while sitting poolside in Cabo San Lucas, Mexico! That post was just about perfect, but I probably should have been clear that “AMI” should be pronounced, and not spelled out, as some pundits claim.

EBS – Initially, all of the storage on EC2 instances was ephemeral, and would be lost when the instance was shut down. I think it is safe to say that the launch of EBS (Amazon EBS (Elastic Block Store) – Bring Us Your Data) greatly simplified the use of EC2.

These are just a few of my early posts, but they definitely laid the foundation for what has followed. I still take great delight in reading those posts, thinking back to the early days of the cloud.

AWS Blogging Today
Over the years, the fraction of my time that is allocated to blogging has grown, and now stands at about 80%. This leaves me with time to do a little bit of public speaking, meet with customers, and to do what I can to keep up with this amazing and ever-growing field. I thoroughly enjoy the opportunities that I have to work with the AWS service teams that work so hard to listen to our customers and do their best to respond with services that meet their needs.

We now have a strong team and an equally strong production process for new blog posts. Teams request a post by creating a ticket, attaching their PRFAQ (Press Release + FAQ, another type of Amazon document) and giving the bloggers early internal access to their service. We review the materials, ask hard questions, use the service, and draft our post. We share the drafts internally, read and respond to feedback, and eagerly await the go-ahead to publish.

Planning and Writing a Post
With 3100 posts under my belt (and more on the way), here is what I focus on when planning and writing a post:

Learn & Be Curious – This is an Amazon Leadership Principle. Writing is easy once I understand what I want to say. I study each PRFAQ, ask hard questions, and am never afraid to admit that I don’t grok some seemingly obvious point. Time after time I am seemingly at the absolute limit of what I can understand and absorb, but that never stops me from trying.

Accuracy – I never shade the truth, and I never use weasel words that could be interpreted in more than one way to give myself an out. The Internet is the ultimate fact-checking vehicle, and I don’t want to be wrong. If I am, I am more than happy to admit it, and to fix the issue.

Readability – I have plenty of words in my vocabulary, but I don’t feel the need to use all of them. I would rather use the most appropriate word than the longest and most obscure one. I am also cautious with acronyms and enterprise jargon, and try hard to keep my terabytes and tebibytes (ugh) straight.

Frugality – This is also an Amazon Leadership Principle, and I use it in an interesting way. I know that you are busy, and that you don’t need extra words or flowery language. So I try hard (this post notwithstanding) to keep most of my posts at 700 to 800 words. I’d rather you spend the time using the service and doing something useful.

Some Personal Thoughts
Before I wrap up, I have a couple of reflections on this incredible journey…

Writing – Although I love to write, I was definitely not a natural-born writer. In fact, my high school English teacher gave me the lowest possible passing grade and told me that my future would be better if I could only write better. I stopped trying to grasp formal English, and instead started to observe how genuine writers used words & punctuation. That (and decades of practice) made all the difference.

Career Paths – Blogging and evangelism have turned out to be a great match for my skills and interests, but I did not figure this out until I was on the far side of 40. It is perfectly OK to be 20-something, 30-something, or even 40-something before you finally figure out who you are and what you like to do. Keep that in mind, and stay open and flexible to new avenues and new opportunities throughout your career.

Special Thanks – Over the years I have received tons of good advice and 100% support from many great managers while I slowly grew into a full-time blogger: Andy Jassy, Prashant Sridharan, Steve Rabuchin, and Ariel Kelman. I truly appreciate the freedom that they have given me to develop my authorial voice and my blogging skills over the years! Ana Visneski and Robin Park have done incredible work to build a blogging team that supports me and the other bloggers.

Thanks for Reading
And with that, I would like to thank you, dear reader, for your time, attention, and very kind words over the past 15 years. It has been the privilege of a lifetime to be able to share so much interesting technology with you!

Jeff;

Cross-Account Cross-Region Dashboards with Amazon CloudWatch

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/cross-account-cross-region-dashboards-with-amazon-cloudwatch/

Best practices for AWS cloud deployments include the use of multiple accounts and/or multiple regions. Multiple accounts provide a security and billing boundary that isolates resources and reduces the impact of issues. Multiple regions ensure a high degree of isolation, low latency for end users, and data resiliency of applications. These best practices can come with monitoring and troubleshooting complications.

Centralized operations teams, DevOps engineers, and service owners need to monitor, troubleshoot, and analyze applications running in multiple regions and in many accounts. If an alarm is received, an on-call engineer likely needs to log in to a dashboard to diagnose the issue and might also need to log in to other accounts to view additional dashboards for multiple application components or dependencies. Service owners need visibility of application resources, shared resources, or cross-application dependencies that can impact service availability. Using multiple accounts and/or multiple regions can make it challenging to correlate between components for root cause analysis and can increase the time to resolution.

Announced today, Amazon CloudWatch cross-account cross-region dashboards enable customers to create high-level operational dashboards and utilize one-click drill-downs into more specific dashboards in different accounts, without having to log in and out of different accounts or switch regions. The ability to visualize, aggregate, and summarize performance and operational data across accounts and regions helps reduce friction and thus assists in reducing time to resolution. Cross-account cross-region functionality can also be used purely for navigation, without building dashboards, if, for example, I’m only interested in viewing alarms, resources, or metrics in other accounts or regions.

Amazon CloudWatch Cross-Account Cross-Region Dashboards Account Setup
Getting started with cross-account cross-region dashboards is easy and I also have the choice of integrating with AWS Organizations if I wish. By using Organizations to manage and govern multiple AWS accounts I can use the CloudWatch console to navigate between Amazon CloudWatch dashboards, metrics and alarms, in any account in my organization, without logging in, as I’ll show in this post. I can also of course just set up cross-region dashboards for a single account. In this post I’ll be making use of the integration with Organizations.

To support this blog post, I’ve already created an organization and invited, using the Organizations console, several of my other accounts to join. As noted, using Organizations makes it easy for me to select accounts later when I’m configuring my dashboards. I could also choose not to use Organizations and instead pre-populate a custom account selector, so that I don’t need to remember accounts or enter the account IDs manually when I need them as I build my dashboard. You can read more on how to set up an organization in the AWS Organizations User Guide. With my organization set up I’m ready to start configuring the accounts.

My first task is to identify and configure the account in which I will create a dashboard – this is my monitoring account (and I can have more than one). Secondly, I need to identify the accounts (known as member accounts in Organizations) that I want to monitor – these accounts will be configured to share data with my monitoring account. My monitoring account requires a Service Linked Role (SLR) to permit CloudWatch to assume a role in each member account. The console will automatically create this role when I enable the cross-account cross-region option. To set up each member account I need to enable data sharing, from within the account, with the monitoring account(s).

Starting with my monitoring account, from the CloudWatch console home, I select Settings in the navigation panel to the left. Cross-Account Cross-Region is shown at the top of the page and I click Configure to get started.


This takes me to a settings screen that I’ll also use in my member accounts to enable data sharing. For now, in my monitoring account, I want to click the Edit option to view my cross-account cross-region options:


The final step for my monitoring account is to enable the AWS Organization account selector option. This requires an additional role to be deployed to the master account of the organization, permitting my monitoring account to access the list of accounts in the organization. The console guides me through this process for the master account.


This concludes set up for my monitoring account and I can now switch focus to my member accounts and enable data sharing. To do this, I log out of my monitoring account and for each member account, log in and navigate to the CloudWatch console and again click Settings before clicking Configure under Cross-Account Cross-Region, as shown earlier. This time I click Share data, enter the IDs of the monitoring account(s) I want to share data with and set the scope of the sharing (read-only access to my CloudWatch data or full read-only access to my account), and then launch a CloudFormation stack with a predefined template to complete the process. Note that I can also elect to share my data with all accounts in the organization. How to do this is detailed in the documentation.
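
For teams that prefer to script this step rather than launch the console-provided CloudFormation stack, the effect is essentially an IAM role in each member account that the monitoring account can assume. The commands below are only an illustrative sketch under my own assumptions: the role name, the managed policy, and the account ID are placeholders and may differ from what the console-provided template actually creates.

# Create a role that the monitoring account (placeholder ID) is allowed to assume
$ aws iam create-role \
    --role-name CloudWatch-CrossAccountSharingRole \
    --assume-role-policy-document '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
        "Action": "sts:AssumeRole"
      }]
    }'

# Grant read-only access to CloudWatch data (the full read-only option would attach broader policies)
$ aws iam attach-role-policy \
    --role-name CloudWatch-CrossAccountSharingRole \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchReadOnlyAccess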


That completes configuration of both my monitoring account and the member accounts that my monitoring account will be able to access to obtain CloudWatch data for my resources. I can now proceed to create one or more dashboards in my monitoring account.

Configuring Cross-Account Cross-Region Dashboards
With account configuration complete it’s time to create a dashboard! In my member accounts I am running several EC2 instances, in different regions. One member account has one Windows and one Linux instance running in US West (Oregon). My second member account is running three Windows instances in an AWS Auto Scaling group in US East (Ohio). I’d like to create a dashboard giving me insight into CPU and network utilization for all these instances across both accounts and both regions.

To get started I log into the AWS console with my monitoring account and navigate to the CloudWatch console home, click Dashboards, then Create dashboard. Note the new account ID and region fields at the top of the page – now that cross-account cross-region access has been configured I can also perform ad-hoc inspection across accounts and/or regions without constructing a dashboard.


I first give the dashboard a name – I chose Compute – and then select Add widget to add my first set of metrics for CPU utilization. I choose a Line widget and click Configure. This takes me to an Add metric graph dialog where I can select the account and regions to pull metrics into my dashboard.


With the account and region selected, I can proceed to select the relevant metrics for my instances and can add all my instances for my monitoring account in the two different regions. Switching accounts, and region, I repeat for the instances in my member accounts. I then add another widget, this time a Stacked area, for inbound network traffic, again selecting the instances of interest in each of my accounts and regions. Finally I click Save dashboard. The end result is a dashboard showing CPU utilization and network traffic for my 4 instances and one cluster across the accounts and regions (note the xa indicator in the top right of each widget, denoting this is representing data from multiple accounts and regions).
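
The same dashboard can also be created or updated programmatically with put-dashboard. The following is a simplified, single-widget sketch; the dashboard name, region, instance ID, and account ID are placeholders, and the accountId rendering option in the metric definition is how I understand cross-account widgets to be expressed:

# Create or update a dashboard with one cross-account CPU utilization widget
$ aws cloudwatch put-dashboard --dashboard-name Compute --dashboard-body '{
  "widgets": [
    {
      "type": "metric", "x": 0, "y": 0, "width": 12, "height": 6,
      "properties": {
        "title": "CPU utilization",
        "region": "us-west-2",
        "stat": "Average",
        "period": 300,
        "metrics": [
          ["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0",
           { "accountId": "111122223333" }]
        ]
      }
    }
  ]
}'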


Hovering over a particular instance triggers a fly-out with additional data, and a deep link that will open the CloudWatch homepage in the account and region of the metric:

Availability
Amazon CloudWatch cross-account cross-region dashboards are available for use today in all commercial AWS regions and you can take advantage of the integration with AWS Organizations in those regions where Organizations is available.

— Steve

New – Savings Plans for AWS Compute Services

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-savings-plans-for-aws-compute-services/

I first wrote about EC2 Reserved Instances a decade ago! Since I wrote that post, our customers have saved billions of dollars by using Reserved Instances to commit to usage of a specific instance type and operating system within an AWS region.

Over the years we have enhanced the Reserved Instance model to make it easier for you to take advantage of the RI discount. This includes:

Regional Benefit – This enhancement gave you the ability to apply RIs across all Availability Zones in a region.

Convertible RIs – This enhancement allowed you to change the operating system or instance type at any time.

Instance Size Flexibility – This enhancement allowed your Regional RIs to apply to any instance size within a particular instance family.

The model, as it stands today, gives you discounts of up to 72%, but it does require you to coordinate your RI purchases and exchanges in order to ensure that you have an optimal mix that covers usage that might change over time.

New Savings Plans
Today we are launching Savings Plans, a new and flexible discount model that provides you with the same discounts as Reserved Instances, in exchange for a commitment to use a specific amount (measured in dollars per hour) of compute power over a one- or three-year period.

Every type of compute usage has an On Demand price and a (lower) Savings Plan price. After you commit to a specific amount of compute usage per hour, all usage up to that amount will be covered by the Savings Plan, and anything past it will be billed at the On Demand rate.
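
To make that concrete with illustrative numbers of my own: if you commit to $2.00 per hour and your eligible usage works out to $3.50 per hour at Savings Plans rates, the first $2.00 is covered by the plan and the remaining $1.50 is billed On Demand. If usage drops to $1.50 for an hour, you are still billed the full $2.00 commitment for that hour.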

If you own Reserved Instances, the Savings Plan applies to any On Demand usage that is not covered by the RIs. We will continue to sell RIs, but Savings Plans are more flexible and I think many of you will prefer them!

Savings Plans are available in two flavors:

Compute Savings Plans provide the most flexibility and help to reduce your costs by up to 66% (just like Convertible RIs). The plans automatically apply to any EC2 instance regardless of region, instance family, operating system, or tenancy, including those that are part of EMR, ECS, or EKS clusters, or launched by Fargate. For example, you can shift from C4 to C5 instances, move a workload from Dublin to London, or migrate from EC2 to Fargate, benefiting from Savings Plan prices along the way, without having to do anything.

EC2 Instance Savings Plans apply to a specific instance family within a region and provide the largest discount (up to 72%, just like Standard RIs). Just like with RIs, your Savings Plan covers usage of different sizes of the same instance type (such as a c5.4xlarge or c5.large) throughout a region. You can even switch from Windows to Linux while continuing to benefit, without having to make any changes to your Savings Plan.

Purchasing a Savings Plan
AWS Cost Explorer will help you to choose a Savings Plan, and will guide you through the purchase process. Since my own EC2 usage is fairly low, I used a test account that had more usage. I open AWS Cost Explorer, then click Recommendations within Savings Plans:

I choose my Recommendation options, and review the recommendations:

Cost Explorer recommends that I purchase $2.40 of hourly Savings Plan commitment, and projects that I will save 40% (nearly $1200) per month, in comparison to On-Demand. This recommendation takes into account variable usage and temporary usage spikes in order to recommend the steady-state capacity for which you should consider a Savings Plan. In my case, the variable usage averages out to $0.04 per hour, which Cost Explorer recommends I keep as On-Demand.

I can see the recommended Savings Plans at the bottom of the page, select those that I want to purchase, and Add them to my cart:

When I am ready to proceed, I click View cart, review my purchases, and click Submit order to finalize them:

My Savings Plans become active right away. I can use the Cost Explorer’s Performance & Coverage reports to review my actual savings, and to verify that I own sufficient Savings Plans to deliver the desired amount of coverage.
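
Outside of the console, I can also list my plans and check how much of the commitment is being used from the CLI; here is a minimal sketch (the dates are placeholders):

# List the Savings Plans owned by the account
$ aws savingsplans describe-savings-plans

# Review commitment utilization for a billing period via the Cost Explorer API
$ aws ce get-savings-plans-utilization \
    --time-period Start=2019-11-01,End=2019-12-01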

Available Now
As you can see, Savings Plans are easy to use! You can access compute power at discounts of up to 72%, while gaining the flexibility to change compute services, instance types, operating systems, regions, and so forth.

Savings Plans are available in all AWS regions outside of China, and you can start to purchase (and benefit) from them today!

Jeff;

Meet the newest AWS Heroes, including the first Data Heroes!

Post Syndicated from Ross Barich original https://aws.amazon.com/blogs/aws/meet-the-newest-aws-heroes-including-the-first-data-heroes/

The AWS Heroes program recognizes community leaders from around the world who have extensive AWS knowledge and a passion for sharing their expertise with others. As trends in the technical community shift, the program evolves to better recognize the most influential community leaders across a variety of technical disciplines.

Introducing AWS Data Heroes
Today we are introducing AWS Data Heroes: a vibrant community of developers, IT leaders, and educators with a shared passion for analytics, database, and blockchain technologies. Data Heroes are data experts who actively participate at the forefront of technology trends, leveraging their extensive technical expertise to share knowledge and build a community around a passion for AWS data services.

Developers from all backgrounds and skill sets can learn about database, analytics, and blockchain technology through a variety of educational content created by Data Heroes, including videos, books, guides, blog posts, and open source projects. The first cohort of AWS Data Heroes includes:

Alex DeBrie – Omaha, USA

Data Hero Alex DeBrie is an Engineering Manager at Serverless, Inc., focused on designing, building, and promoting serverless applications. He is passionate about DynamoDB, Lambda, and many other AWS technologies. He is the creator of DynamoDBGuide.com, a guided walkthrough to DynamoDB, and the author of The DynamoDB Book, a comprehensive guide to data modeling with DynamoDB. He has spoken about data design with DynamoDB at AWS Summits in Chicago and New York, as well as other conferences, and has also assisted AWS in writing official tutorials for using DynamoDB. He blogs on tech-related topics, mostly related to AWS.

Álvaro Hernández – Madrid, Spain

Data Hero Álvaro Hernández is a passionate database and software developer. He founded and works as the CEO of OnGres, a PostgreSQL startup set to disrupt the database market. He has been dedicated to PostgreSQL and R&D in databases for two decades. An open source advocate and developer at heart, Álvaro is a well-known member of the PostgreSQL community, to which he has contributed by founding the non-profit Fundación PostgreSQL and the Spanish PostgreSQL User Group. You can find him frequently speaking at PostgreSQL, database, cloud, and Java conferences. Every year, Álvaro travels around the globe approximately three to four times; in 2020, he will hit the milestone of having delivered 100 tech talks.

Goran Opacic – Belgrade, Serbia

Data Hero Goran Opacic is the CEO and owner of Esteh and community leader of AWS User Group Belgrade. He is a Solutions Architect focused on databases and security and runs madabout.cloud, a blog related to AWS, Java, and databases. He works on promoting All Things Cloud, giving lectures and educating a new generation of developers into the AWS community. This includes a series of interviews he runs with prominent technology leaders on his YouTube channel.

Guillermo Fisher – Norfolk, USA

Data Hero Guillermo Fisher is an Engineering Manager at Handshake and founder of 757ColorCoded, an organization that exists to educate and empower local people of color to achieve careers in technology and improve their lives. In 2019, he partnered with We Power Tech to offer free, instructor-led AWS Tech Essentials training to Southeastern Virginia. As an advocate of AWS technologies, Guillermo blogs about services like Lambda, Athena, and DynamoDB on Medium. He also shares his knowledge at events such as the Atlanta AWS Summit, Hampton Roads DevFest, RevolutionConf, and re:Invent.

Helen Anderson – Wellington, New Zealand

Data Hero Helen Anderson is a Business Intelligence Consultant based out of Wellington, New Zealand. She focuses on leading projects that use AWS services to empower users and improve efficiencies. She is a passionate advocate for data analysts and is well known in the Data Community for writing beginner-friendly blog posts, teaching, and mentoring those who are new to the tech industry. In fact, her post, “AWS from A to Z,” is one of the most popular AWS posts ever on Dev.to. Helen was also named one of Jefferson Frank’s “Top 7 AWS Experts You Should be Following in 2019.” As a Woman in Tech and career switcher, Helen is passionate about mentoring and inspiring those who are underrepresented in the industry.

Manrique Lopez – Madrid, Spain

Data Hero Manrique Lopez is the CEO of Bitergia, a software development analytics company. He is passionate about free, libre, open source software development communities. He is a frequent speaker on Open Distro for Elasticsearch. Currently he is active in GrimoireLab, the open source software for software development analytics, and CHAOSS (Community Health Analytics for Open Source Software).

Lynn Langit – Minneapolis, USA

Data Hero Lynn Langit is a consultant in the Minneapolis area focused on big data and cloud architecture. In addition to having designed many production AWS solutions, Lynn has also created and delivered technical content about the practicalities of working with the AWS Cloud at developer conferences worldwide. And she has created a series of technical AWS courses for Lynda.com. She is currently collaborating virtually with a team at CSIRO Bioinformatics in Sydney, Australia. They are working to leverage modern cloud architectures (containers and serverless) to scale genomic research and tools for use world-wide.

Matt Lewis – Swansea, United Kingdom

Data Hero Matt Lewis is Chief Architect at the UK Driver and Vehicle Licensing Agency where he is responsible for setting technology direction and guiding solutions that operate against critical data sets, including a record of all drivers in Great Britain and a record of all vehicles in the UK. He also founded and runs the AWS South Wales user group. He has been actively exploring and presenting the benefits of moving from traditional databases to cloud native services, most recently prototyping use cases for the adoption of Quantum Ledger Database (QLDB). In his spare time, Matt writes about different aspects of public cloud on his personal blog and Twitter, and spends too much time cycling online.

Robert Koch – Denver, USA

Data Hero Robert Koch is the Lead Architect at S&P Global and one of the community leaders of DeafintheCloud.com. He helps drive cloud-based architecture, blogs about migrating to the cloud, and loves to talk data and event-driven systems. In a recent lightning talk, he gave an overview of how Redshift has a symbiotic relationship with PostgreSQL. He currently has AWS certifications as a Cloud Practitioner, Big Data – Specialty, and as a Solution Architect – Associate. He is actively involved in the development community in Denver, often speaking at Denver Dev Day, a bi-annual mini-conference and at the AWS Denver Meetup.

Meet the Other New AWS Heroes
Not to be outdone, this month we are thrilled to introduce to you a variety of other new AWS Heroes:

Ankit Gupta – Kolkata, India

Community Hero Ankit Gupta is a Solutions Architect at PwC India with deep expertise in designing solutions architectures on AWS. He has been an AWS user since 2012 and works with most AWS services. He holds multiple AWS certifications and has worked on various types of AWS projects. He has helped drive the AWS community in India since 2014 and is a co-organizer of the AWS User Group Kolkata. Ankit has given multiple sessions on AWS services at various events, frequently visits engineering colleges to deliver knowledge-sharing sessions on cloud technologies, and also mentors engineering students.

Brian LeRoux – Vancouver, Canada

Serverless Hero Brian LeRoux is the co-founder and CTO of continuous delivery platform Begin.com and core maintainer of OpenJS Architect. Brian helped create the declarative .arc manifest format, which aims to make configuration clear, simple, terse, and precise. This concision unlocks formerly complex serverless primitives with the determinism and interop of standard CloudFormation. Brian believes the future is open source, serverless, and will be written by hackers like you.

Brian Tarbox – Boston, USA

Community Hero Brian Tarbox has over thirty years of experience delivering mission-critical systems on time, on target, and with commercial success. He has ten patents, dozens of technical papers, and “high engagement” Alexa skills; he co-leads the Boston AWS Meetup, manages his company’s all-engineers-get-certified program, and has presented at numerous industry events including AWS Community Days. He was the inaugural speaker at the Portland AWS User Group’s first meeting. In 2010 he won the RockStar and Duke’s Choice awards for the Most Innovative Use of Java for his system for turning log files into music so you could “listen” to your programs. He also won Atlassian’s Charlie award for the Most Innovative Use of Jira.

Calvin Hendryx-Parker – Indianapolis, USA

Community Hero Calvin Hendryx-Parker is the co-founder of Six Feet Up, a women-owned company specializing in Python and AWS consulting. As CTO, he’s an active proponent of cloud deployments and strategies in the Midwest, and has been using AWS technologies since early 2013. In 2017, Calvin founded the Indiana AWS user group (“IndyAWS”), now the fastest growing tech community in the Midwest with 750+ members. To date, Calvin has held 30+ IndyAWS monthly meetups and organized the first annual Indy Cloud Conf event focused on cloud computing and cross-cloud deployments.

Farrah Campbell – Portland, USA

Serverless Hero Farrah Campbell is the Ecosystems Director at Stackery, a serverless workflow company. She is passionate about her work with the Serverless, DevOps, and Women in Technology communities, participating in global industry events, user events, conferences, and user groups, along with a documentary focused on how culture changes stories for women in the technology industry. She is the organizer of Portland Serverless Days and the Portland Serverless Meetup, and speaks around the world about her serverless journey and the serverless mindset.

Gabriel Ramírez – Mexico City, Mexico

Community Hero Gabriel Ramírez is the founder of Bootcamp Institute, a company that specializes in democratizing the usage and knowledge of AWS for Spanish speakers. He has worked as an AWS Authorized Trainer for years and holds 10 AWS certifications ranging from professional to specialty. Gabriel is the organizer of several AWS User Groups in Mexico and a strong contributor to social programs like AWS Educate, empowering students to adopt the AWS Cloud and get certified. He has helped thousands of people pass the AWS Solutions Architect exam by running workshops, webinars, and study groups on different social networks and at local meetups.

Gillian Armstrong – Belfast, United Kingdom

Machine Learning Hero Gillian Armstrong works for Liberty IT where she is helping to bring Machine Learning and Serverless into the enterprise. This involves hands-on architecting and building systems, as well as helping build out strategy and education. She’s excited about how Applied AI, the space where Machine Learning and Serverless meet, is allowing Software Engineers to build intelligence into their systems, and as such is an enthusiastic user and evangelist for the AWS AI Services. She is also exploring how tools like Amazon SageMaker can allow Software Engineers and Data Scientists to work closer together.

Ilya Dmitrichenko – London, United Kingdom

Container Hero Ilya Dmitrichenko is a Software Engineer at Weaveworks, focused on making Kubernetes work for a wide range of users. Having started contributing to Kubernetes projects from the early days in 2014, Ilya has focused his attention on cluster lifecycle matters, networking, and observability, as well as developer tools. Most recently, in 2018 Ilya created eksctl, which is now an official CLI for Amazon EKS.

Juyoung Song – Seoul, Korea

Container Hero Juyoung Song is a DevOps Engineer at beNX, currently in charge of transforming the legacy-cloud systems into modern cloud architecture to bring global stars such as BTS and millions of fans together in the digital sphere. Juyoung speaks regularly at AWS-organized events such as AWS Container Day, AWS Summit, and This is My Architecture. He also organizes and speaks at various Meetups like AWS Korea User Group and DevOps Korea, about topics such as ECS and Fargate, and its DevOps best practices. He is interested in building hyper-scale DevOps environments for containers using AWS CodeBuild, Terraform, and various open-source tools.

Lukasz Dorosz – Warsaw, Poland

Community Hero Lukasz Dorosz is Head of AWS Architecture and a Board Member at CloudState/Chmurowisko, an AWS Consulting Partner with a mission to help businesses leverage cloud services to make an impact in the world. As an active community member, he enjoys sharing knowledge and experience with people and teaching them about the cloud. He is a Co-Leader of AWS User Group POLAND, where he regularly contributes to events and organizes many meetups around Poland. Additionally, he popularizes the cloud through training, workshops, and as a speaker at many events. He is also the author of online courses, webinars, and blog posts.

Martijn van Dongen – Amsterdam, The Netherlands

Community Hero Martijn van Dongen is an experienced AWS Cloud Evangelist for binx.io, part of Xebia. He is a generalist in a broad set of AWS services, with a strong focus on security, containers, serverless and data-intensive platforms. He is the founder and lead of the Dutch AWS User Group. He organizes approximately 20 meetups per year with an average of 100 attendees and has built a powerful network of speakers and sponsors. At the Benelux AWS re:Invent re:Cap early 2019, Martijn organized 14 meetup style sessions in the evening with more than 300 attendees. Martijn regularly writes technical articles on blogs and speaks at meetups, events, and technical conferences such as AWS re:Invent and local AWS events.

Or Hiltch – Tel Aviv, Israel

Machine Learning Hero Or Hiltch is Co-Founder and CTO at Skyline AI, the artificial intelligence investment manager for commercial real estate. In parallel to his business career, Or maintains a strong community presence, regularly hosting and speaking at AI, ML, and AWS related meetups and conferences, including SageMaker related topics at the AWS User Group meetup, Serverless NYC Conference, MIT AI 2018, and more. Or is an open-source hacker and creator/maintainer of a few high-profile (1000+ stars) open-source repos on GitHub and an avid blogger on ML and software engineering topics, posting on Amazon SageMaker, word2vec, novel uses for unsupervised learning in real estate, and more.

Sebastian Müller – Hamburg, Germany

Serverless Hero Sebastian Müller writes about all things Serverless, GraphQL, React, TypeScript, and Go on his personal website sbstjn.com. He has a background as an Engineering Lead, Scrum Master, and Full Stack Engineer. Sebastian is a general Technology Enthusiast working as Senior Cloud Consultant at superluminar, an AWS Advanced Consulting Partner and Serverless Development Partner, in Hamburg, Germany. Most articles on his website and projects on GitHub are the results of established practices from his work with various clients.

Steve Bjorg – San Diego, USA

Community Hero Steve Bjorg is the Founder and Chief Technical Officer at MindTouch, a San Diego-based enterprise software company that specializes in customer self-service software. He is a frequent contributor to open-source projects and is passionate about serverless software. He is the author of LambdaSharp – a tool for optimizing the developer experience when building Serverless .NET applications on AWS. Steve and his team host a monthly serverless hacking challenge in San Diego to learn and master new AWS services and features.

Vlad Ionescu – Bucharest, Romania

Container Hero Vlad Ionescu is a DevOps Consultant helping companies deliver more reliable software faster and safer. He is focused on observability and reliability, with a passion for rapid deployments and simplicity. Vlad’s work is predominantly focused on Kubernetes and Serverless. After starting with kops, he then moved to EKS which he enjoys pushing as far as possible. He can often be found sharing insights in #eks on the Kubernetes Slack. Before rising to the clouds he was a software developer with a background in finance. He has a passion for Haskell and Ruby, but spends most of his time in Python or Go while grumbling about JavaScript features.

You can learn more about AWS Heroes and connect with a Hero near you by visiting the AWS Hero website.

Ross;