The German charity Save Nemo works to protect coral reefs, and they are developing Nemo-Pi, an underwater “weather station” that monitors ocean conditions. Right now, you can vote for Save Nemo in the Google.org Impact Challenge.
Save Nemo
The organisation says there are two major threats to coral reefs: divers and climate change. To make diving safer for reefs, Save Nemo installs buoy anchor points where diving tour boats can anchor without damaging corals in the process.
In addition, they provide dos and don’ts for how to behave on a reef dive.
The Nemo-Pi
To monitor the effects of climate change, and to help divers decide whether conditions are right at a reef while they’re still on shore, Save Nemo is also in the process of perfecting Nemo-Pi.
This Raspberry Pi-powered device is made up of a buoy, a solar panel, a GPS device, a Pi, and an array of sensors. Nemo-Pi measures water conditions such as current, visibility, temperature, carbon dioxide and nitrogen oxide concentrations, and pH. It also uploads its readings live to a public webserver.
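Save Nemo hasn't published the Nemo-Pi firmware in this post, but conceptually the Pi runs a simple loop: read the sensor array, then push the readings to the web server. Here is a rough Python sketch under that assumption; the sensor values and the API endpoint are placeholders.

import time
import requests  # assumes readings are uploaded over HTTP(S)

API_URL = "https://example.org/nemo-pi/readings"  # hypothetical endpoint

def read_sensors():
    # Placeholder values; a real Nemo-Pi would query its sensor array here.
    return {"temperature_c": 29.1, "ph": 8.05, "visibility_m": 12.0, "co2_ppm": 410}

while True:
    payload = read_sensors()
    payload["timestamp"] = time.time()
    try:
        requests.post(API_URL, json=payload, timeout=10)  # upload the reading
    except requests.RequestException:
        pass  # keep sampling even if one upload fails
    time.sleep(60)  # one reading per minute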
The Save Nemo team is currently doing long-term tests of Nemo-Pi off the coast of Thailand and Indonesia. They are also working on improving the device’s power consumption and durability, and testing prototypes with the Raspberry Pi Zero W.
The web dashboard showing live Nemo-Pi data
Long-term goals
Save Nemo aims to install a network of Nemo-Pis at shallow reefs (up to 60 metres deep) in South East Asia. Then diving tour companies can check the live data online and decide day-to-day whether tours are feasible. This will lower the impact of humans on reefs and help the local flora and fauna survive.
A healthy coral reef
Nemo-Pi data may also be useful for groups lobbying for reef conservation, and for scientists and activists who want to shine a spotlight on the awful effects of climate change on sea life, such as coral bleaching caused by rising water temperatures.
A bleached coral reef
Vote now for Save Nemo
If you want to help Save Nemo in their mission today, vote for them to win the Google.org Impact Challenge:
Click “Abstimmen” (vote) in the footer of the page to vote
Click “JA” (yes) in the footer to confirm
Voting is open until 6 June. You can also follow Save Nemo on Facebook or Twitter. We think this organisation is doing valuable work, and that their projects could be expanded to reefs across the globe. It’s fantastic to see the Raspberry Pi being used to help protect ocean life.
This post is courtesy of Alan Protasio, Software Development Engineer, Amazon Web Services
Just like compute and storage, messaging is a fundamental building block of enterprise applications. Message brokers (aka “message-oriented middleware”) enable different software systems, often written in different languages, on different platforms, running in different locations, to communicate and exchange information. Mission-critical applications, such as CRM and ERP, rely on message brokers to work.
A common performance consideration for customers deploying a message broker in a production environment is the throughput of the system, measured as messages per second. This is important to know so that application environments (hosts, threads, memory, etc.) can be configured correctly.
In this post, we demonstrate how to measure the throughput for Amazon MQ, a new managed message broker service for ActiveMQ, using JMS Benchmark. It should take 15–20 minutes to set up the environment and about an hour to run the benchmark. We also provide some tips on how to configure Amazon MQ for optimal throughput.
Benchmarking throughput for Amazon MQ
ActiveMQ can be used for a number of use cases, ranging from simple fire-and-forget tasks (that is, asynchronous processing) and low-latency request-reply patterns to buffering requests before they are persisted to a database.
The throughput of Amazon MQ is largely dependent on the use case. For example, if you have non-critical workloads such as gathering click events for a non-business-critical portal, you can use ActiveMQ in a non-persistent mode and get extremely high throughput with Amazon MQ.
On the flip side, if you have a critical workload where durability is extremely important (meaning that you can’t lose a message), then you are bound by the I/O capacity of your underlying persistence store. We recommend using mq.m4.large for the best results. The mq.t2.micro instance type is intended for product evaluation. Performance is limited, due to the lower memory and burstable CPU performance.
Tip: To improve your throughput with Amazon MQ, make sure that you have consumers processing messages as fast as (or faster than) your producers are pushing messages.
Because it’s impossible to talk about how the broker (ActiveMQ) behaves for each and every use case, we walk through how to set up your own benchmark for Amazon MQ using our favorite open-source benchmarking tool: JMS Benchmark. We are fans of the JMS Benchmark suite because it’s easy to set up and deploy, and comes with a built-in visualizer of the results.
Non-Persistent Scenarios – Queue latency as you scale producer throughput
Getting started
At the time of publication, you can create an mq.m4.large single-instance broker for testing for $0.30 per hour (US pricing).
Step 2 – Create an EC2 instance to run your benchmark
Launch the EC2 instance using Step 1: Launch an Instance. We recommend choosing the m5.large instance type.
Step 3 – Configure the security groups
Make sure that all the security groups are correctly configured to let the traffic flow between the EC2 instance and your broker.
From the broker list, choose the name of your broker (for example, MyBroker)
In the Details section, under Security and network, choose the name of your security group or choose the expand icon.
From the security group list, choose your security group.
At the bottom of the page, choose Inbound, Edit.
In the Edit inbound rules dialog box, add a rule to allow traffic between your instance and the broker:
• Choose Add Rule.
• For Type, choose Custom TCP.
• For Port Range, type the ActiveMQ SSL port (61617).
• For Source, leave Custom selected and then type the security group of your EC2 instance.
• Choose Save.
Your broker can now accept the connection from your EC2 instance.
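If you prefer to script the security group change rather than click through the console, the following boto3 sketch does the equivalent of the steps above; both security group IDs are placeholders.

import boto3

ec2 = boto3.client("ec2")

broker_sg_id = "sg-0123456789abcdef0"     # placeholder: the broker's security group
benchmark_sg_id = "sg-0fedcba9876543210"  # placeholder: the EC2 benchmark instance's security group

# Allow the benchmark instance to reach the broker on the ActiveMQ SSL port (61617).
ec2.authorize_security_group_ingress(
    GroupId=broker_sg_id,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 61617,
        "ToPort": 61617,
        "UserIdGroupPairs": [{"GroupId": benchmark_sg_id}],
    }],
)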
Step 4 – Run the benchmark
Connect to your EC2 instance using SSH and run the following commands:
After the benchmark finishes, you can find the results in the ~/reports directory. As you may notice, the performance of ActiveMQ varies based on the number of consumers, producers, destinations, and message size.
Amazon MQ architecture
The last bit that’s important to know so that you can better understand the results of the benchmark is how Amazon MQ is architected.
Amazon MQ is architected to be highly available (HA) and durable. For HA, we recommend using the multi-AZ option. After a message is sent to Amazon MQ in persistent mode, the message is written to the highly durable message store that replicates the data across multiple nodes in multiple Availability Zones. Because of this replication, for some use cases you may see a reduction in throughput as you migrate to Amazon MQ. Customers have told us they appreciate the benefits of message replication as it helps protect durability even in the face of the loss of an Availability Zone.
Conclusion
We hope this gives you an idea of how Amazon MQ performs. We encourage you to run tests to simulate your own use cases.
To learn more, see the Amazon MQ website. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.
Thanks to Susan Ferrell, Senior Technical Writer, for a great blog post on how to use CodeCommit branch-level permissions.
AWS CodeCommit users have been asking for a way to restrict commits to some repository branches to just a few people. In this blog post, we’re going to show you how to do that by creating and applying a conditional policy, an AWS Identity and Access Management (IAM) policy that contains a context key.
Why would I do this?
When you create a branch in an AWS CodeCommit repository, the branch is available, by default, to all repository users. Here are some scenarios in which refining access might help you:
You maintain a branch in a repository for production-ready code, and you don’t want to allow changes to this branch except from a select group of people.
You want to limit the number of people who can make changes to the default branch in a repository.
You want to ensure that pull requests cannot be merged to a branch except by an approved group of developers.
We’ll show you how to create a policy in IAM that prevents users from pushing commits to and merging pull requests to a branch named master. You’ll attach that policy to one group or role in IAM, and then test how users in that group are affected when that policy is applied. We’ll explain how it works, so you can create custom policies for your repositories.
What you need to get started
You’ll need to sign in to AWS with sufficient permissions to:
Create and apply policies in IAM.
Create groups in IAM.
Add users to those groups.
Apply policies to those groups.
You can use existing IAM groups, but because you’re going to be changing permissions, you might want to first test this out on groups and users you’ve created specifically for this purpose.
You’ll need a repository in AWS CodeCommit with at least two branches: master and test-branch. For information about how to create repositories, see Create a Repository. For information about how to create branches, see Create a Branch. In this blog post, we’ve named the repository MyDemoRepo. You can use an existing repository with branches of another name, if you prefer.
Let’s get started!
Create two groups in IAM
We’re going to set up two groups in IAM: Developers and Senior_Developers. To start, both groups will have the same managed policy, AWSCodeCommitPowerUsers, applied. Users in each group will have exactly the same permissions to perform actions in AWS CodeCommit.
Figure 1: Two example groups in IAM, with distinct users but the same managed policy applied to each group
In the navigation pane, choose Groups, and then choose Create New Group.
In the Group Name box, type Developers, and then choose Next Step.
In the list of policies, select the check box for AWSCodeCommitPowerUsers, then choose Next Step.
Choose Create Group.
Now, follow these steps to create the Senior_Developers group and attach the AWSCodeCommitPowerUsers managed policy. You now have two empty groups with the same policy attached.
Create users in IAM
Next, add at least one unique user to each group. You can use existing IAM users, but because you’ll be affecting their access to AWS CodeCommit, you might want to create two users just for testing purposes. Let’s go ahead and create Arnav and Mary.
In the navigation pane, choose Users, and then choose Add user.
For the new user, type Arnav_Desai.
Choose Add another user, and then type Mary_Major.
Select the type of access (programmatic access, access to the AWS Management Console, or both). In this blog post, we’ll be testing everything from the console, but if you want to test AWS CodeCommit using the AWS CLI, make sure you include programmatic access and console access.
For Console password type, choose Custom password. Each user is assigned the password that you type in the box. Write these down so you don’t forget them. You’ll need to sign in to the console using each of these accounts.
Choose Next: Permissions.
On the Set permissions page, choose Add user to group. Add Arnav to the Developers group. Add Mary to the Senior_Developers group.
Choose Next: Review to see all of the choices you made up to this point. When you are ready to proceed, choose Create user.
Sign in as Arnav, and then follow these steps to go to the master branch and add a file. Then sign in as Mary and follow the same steps.
On the Dashboard page, from the list of repositories, choose MyDemoRepo.
In the Code view, choose the branch named master.
Choose Add file, and then choose Create file. Type some text or code in the editor.
Provide information to other users about who added this file to the repository and why.
In Author name, type the name of the user (Arnav or Mary).
In Email address, type an email address so that other repository users can contact you about this change.
In Commit message, type a brief description to help you remember why you added this file or any other details you might find helpful.
Type a name for the file.
Choose Commit file.
Now follow the same steps to add a file in a different branch. (In our example repository, that’s the branch named test-branch.) You should be able to add a file to both branches regardless of whether you’re signed in as Arnav or Mary.
Let’s change that.
Create a conditional policy in IAM
You’re going to create a policy in IAM that will deny API actions if certain conditions are met. We want to prevent users with this policy applied from updating a branch named master, but we don’t want to prevent them from viewing the branch, cloning the repository, or creating pull requests that will merge to that branch. For this reason, we want to pick and choose our APIs carefully. Looking at the Permissions Reference, the logical permissions for this are:
GitPush
PutFile
MergePullRequestByFastForward
Now’s the time to think about what else you might want this policy to do. For example, because we don’t want users with this policy to make changes to this branch, we probably don’t want them to be able to delete it either, right? So let’s add one more permission:
DeleteBranch
The branch in which we want to deny these actions is master. The repository in which the branch resides is MyDemoRepo. We’re going to need more than just the repository name, though. We need the repository ARN. Fortunately, that’s easy to find. Just go to the AWS CodeCommit console, choose the repository, and choose Settings. The repository ARN is displayed on the General tab.
Now we’re ready to create a policy.
1. Open the IAM console at https://console.aws.amazon.com/iam/. Make sure you’re signed in with the account that has sufficient permissions to create policies, and not as Arnav or Mary.
2. In the navigation pane, choose Policies, and then choose Create policy.
3. Choose JSON, and then paste in the following:
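The policy JSON isn't reproduced here. Based on the description that follows (a Deny on GitPush, PutFile, MergePullRequestByFastForward, and DeleteBranch, scoped to references under refs/heads/master, plus a Null check), a sketch of it might look like the following; the account ID and region in the ARN are placeholders, and codecommit:References is the condition key the CodeCommit documentation uses for branch references.

import json

# A sketch only; replace the region and account ID in the ARN with your own values.
deny_changes_to_master = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": [
            "codecommit:GitPush",
            "codecommit:PutFile",
            "codecommit:MergePullRequestByFastForward",
            "codecommit:DeleteBranch",
        ],
        "Resource": "arn:aws:codecommit:us-east-1:111111111111:MyDemoRepo",
        "Condition": {
            # Deny only when the request refers to the master branch...
            "StringEqualsIfExists": {"codecommit:References": ["refs/heads/master"]},
            # ...and skip the initial git push call, which carries no reference at all.
            "Null": {"codecommit:References": "false"},
        },
    }],
}

print(json.dumps(deny_changes_to_master, indent=2))  # paste this output into the JSON editor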
You’ll notice a few things here. First, change the repository ARN to the ARN for your repository and include the repository name. Second, if you want to restrict access to a branch with a name different from our example, master, change that reference too.
Now let’s talk about this policy and what it does. You might be wondering why we’re using a Git reference (refs/heads) value instead of just the branch name. The answer lies in how Git references things, and how AWS CodeCommit, as a Git-based repository service, implements its APIs. A branch in Git is a simple pointer (reference) to the SHA-1 value of the head commit for that branch.
You might also be wondering about the second part of the condition, the nullification language. This is necessary because of the way git push and git-receive-pack work. Without going into too many technical details, when you attempt to push a change from a local repo to AWS CodeCommit, an initial reference call is made to AWS CodeCommit without any branch information. AWS CodeCommit evaluates that initial call to ensure that:
a) You’re authorized to make calls.
b) A repository exists with the name specified in the initial call.
If you left that null out of the policy, users with that policy would be unable to complete any pushes from their local repos to the AWS CodeCommit remote repository at all, regardless of which branch they were trying to push their commits to.
Could you write a policy in such a way that the null is not required? Of course. IAM policy language is flexible. There’s an example of how to do this in the AWS CodeCommit User Guide, if you’re curious. But for the purposes of this blog post, let’s continue with this policy as written.
So what have we essentially said in this policy? We’ve asked IAM to deny the relevant CodeCommit permissions if the request is made to the resource MyDemoRepo and it meets the following condition: the reference is to refs/heads/master. Otherwise, the deny does not apply.
I’m sure you’re wondering if this policy has to be constrained to a specific repository resource like MyDemoRepo. After all, it would be awfully convenient if a single policy could apply to all branches in any repository in an AWS account, particularly since the default branch in any repository is initially the master branch. Good news! Simply replace the ARN with an *, and your policy will affect ALL branches named master in every AWS CodeCommit repository in your AWS account. Make sure that this is really what you want, though. We suggest you start by limiting the scope to just one repository, and then changing things when you’ve tested it and are happy with how it works.
When you’re sure you’ve modified the policy for your environment, choose Review policy to validate it. Give this policy a name, such as DenyChangesToMaster, provide a description of its purpose, and then choose Create policy.
Now that you have a policy, it’s time to apply and test it.
Apply the policy to a group
In theory, you could apply the policy you just created directly to any IAM user, but that really doesn’t scale well. You should apply this policy to a group, if you use IAM groups to manage users, or to a role, if your users assume a role when interacting with AWS resources.
In the IAM console, choose Groups, and then choose Developers.
On the Permissions tab, choose Attach Policy.
Choose DenyChangesToMaster, and then choose Attach policy.
Your groups now have a critical difference: users in the Developers group have an additional policy applied that restricts their actions in the master branch. In other words, Mary can continue to add files, push commits, and merge pull requests in the master branch, but Arnav cannot.
Figure 2: Two example groups in IAM, one with an additional policy applied that will prevent users in this group from making changes to the master branch
Test it out. Sign in as Arnav, and do the following:
On the Dashboard page, from the list of repositories, choose MyDemoRepo.
In the Code view, choose the branch named master.
Choose Add file, and then choose Create file, just as you did before. Provide some text, and then add the file name and your user information.
Choose Commit file.
This time you’ll see an error after choosing Commit file. It’s not a pretty message, but at the very end, you’ll see a telling phrase: “explicit deny”. That’s the policy in action. You, as Arnav, are explicitly denied PutFile, which prevents you from adding a file to the master branch. You’ll see similar results if you try other actions denied by that policy, such as deleting the master branch.
Stay signed in as Arnav, but this time add a file to test-branch. You should be able to add a file without seeing any errors. You can create a branch based on the master branch, add a file to it, and create a pull request that will merge to the master branch, all just as before. However, you cannot perform denied actions on that master branch.
Sign out as Arnav and sign in as Mary. You’ll see that as that IAM user, you can add and edit files in the master branch, merge pull requests to it, and even, although we don’t recommend this, delete it.
Conclusion
You can use conditional statements in policies in IAM to refine how users interact with your AWS CodeCommit repositories. This blog post showed how to use such a policy to prevent users from making changes to a branch named master. There are many other options. We hope this blog post will encourage you to experiment with AWS CodeCommit, IAM policies, and permissions. If you have any questions or suggestions, we’d love to hear from you.
Contributed by Shea Lutton, AWS Cloud Infrastructure Architect
Amazon Simple Queue Service (Amazon SQS) is a fully managed queuing service that helps decouple applications, distributed systems, and microservices to increase fault tolerance. SQS queues come in two distinct types:
Standard SQS queues are able to scale to enormous throughput with at-least-once delivery.
FIFO queues are designed to guarantee that messages are processed exactly once in the exact order that they are received and have a default rate of 300 transactions per second.
As customers explore SQS FIFO queues, they often have questions about how the behavior works when messages arrive and are consumed. This post walks through some common situations to identify the exact behavior that you can expect. It also covers the behavior of message groups in depth and explains why message groups are key to understanding how FIFO queues work.
The simple case
Suppose that you run a major auction platform where people buy and sell a wide range of products. Your platform requires that transactions from buyers and sellers get processed in exactly the order received. Here’s how a FIFO queue helps you keep all your transactions in one straight flow.
A seller currently is holding an auction for a laptop, and three different bids are received for the same price. Ties are awarded to the first bidder at that price so it is important to track which arrived first. Your auction platform receives the three bids and sends them to a FIFO queue before they are processed.
Now observe how messages leave the queue. When your consumer asks for a batch of up to 10 messages, SQS starts filling the batch with the oldest message (bid A1). It keeps filling until either the batch is full or the queue is empty. In this case, the batch contains the three messages and the queue is now empty. After a batch has left the queue, SQS considers that batch of messages to be “in-flight” until the consumer either deletes them or the batch’s visibility timer expires.
When you have a single consumer, this is easy to envision. The consumer gets a batch of messages (now in-flight), does its processing, and deletes the messages. That consumer is then ready to ask for the next batch of messages.
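A minimal boto3 sketch of that single-consumer loop might look like this; the queue URL and the bid-processing logic are placeholders.

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/auction-bids.fifo"  # placeholder

def process_bid(body):
    print("processing bid:", body)  # placeholder for real processing

while True:
    # Ask for a batch of up to 10 messages; they become in-flight once received.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=10)
    messages = resp.get("Messages", [])
    if not messages:
        continue
    for msg in messages:
        process_bid(msg["Body"])
    # Delete the whole batch so that SQS will release the next batch.
    sqs.delete_message_batch(
        QueueUrl=QUEUE_URL,
        Entries=[{"Id": m["MessageId"], "ReceiptHandle": m["ReceiptHandle"]} for m in messages],
    )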
The critical thing to keep in mind is that SQS won’t release the next batch of messages until the first batch has been deleted. By adding more messages to the queue, you can see more interesting behaviors. Imagine that a burst of 11 bids is sent to your FIFO queue, with two bids for Auction A arriving last.
The FIFO queue now has at least two batches of messages in it. When your single consumer requests the first batch of 10 messages, it receives a batch starting with B1 and ending with A1. Later, after the first batch has been deleted, the consumer can get the second batch of messages containing the final A2 message from the queue.
Adding complexity with multiple message groups
A new challenge arises. Your auction platform is getting busier and your dev team added a number of new features. The combination of increased messages and extra processing time for the new features means that a single consumer is too slow. The solution is to scale to have more consumers and process messages in parallel.
To work in parallel, your team realized that only the messages related to a single auction must be kept in order. All transactions for Auction A need to be kept in order and so do all transactions for Auction B. But the two auctions are independent, and it does not matter which auction's transactions are processed first.
FIFO can handle that case with a feature called message groups. Each transaction related to Auction A is placed by your producer into message group A, and so on. In the diagram below, Auction A and Auction B each received three bid transactions, with bid B1 arriving first. The FIFO queue always keeps transactions within a message group in the order in which they arrived.
How is this any different than earlier examples? The consumer now gets the messages ordered by message groups, all the B group messages followed by all the A group messages. Multiple message groups create the possibility of using multiple consumers, which I explain in a moment. If FIFO can’t fill up a batch of messages with a single message group, FIFO can place more than one message group in a batch of messages. But whenever possible, the queue gives you a full batch of messages from the same group.
The order of messages leaving a FIFO queue is governed by three rules:
Return the oldest message where no other message in the same message group is currently in-flight.
Return as many messages from the same message group as possible.
If a message batch is still not full, go back to rule 1.
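To make those rules concrete, here is a small Python model of how one batch could be filled from an arrival-ordered list of (group, message) pairs. It is a toy illustration of the rules above, not the actual SQS implementation.

def next_batch(queue, in_flight_groups, max_batch=10):
    # queue: list of (group_id, message) tuples in arrival order
    # in_flight_groups: set of group IDs with received-but-not-deleted batches
    batch = []
    blocked = set(in_flight_groups)
    while len(batch) < max_batch:
        # Rule 1: find the oldest message whose group is not in flight.
        candidate = next(((g, m) for g, m in queue if g not in blocked), None)
        if candidate is None:
            break
        group = candidate[0]
        # Rule 2: take as many messages from that group as fit in the batch.
        for item in list(queue):
            if item[0] == group and len(batch) < max_batch:
                batch.append(item)
                queue.remove(item)
        # Rule 3: if the batch still has room, go back to rule 1 for another group.
        blocked.add(group)
    return batch

# Example: a Group A batch is already in flight, so only Group B messages are released.
print(next_batch([("B", "B1"), ("A", "A1"), ("B", "B2")], in_flight_groups={"A"}))
# -> [('B', 'B1'), ('B', 'B2')]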
To see this behavior, add a second consumer and insert many more messages into the queue. For simplicity, the delete message action has been omitted in these diagrams but it is assumed that all messages in a batch are processed successfully by the consumer and the batch is properly deleted immediately after.
In this example, there are 11 Group A and 11 Group B transactions arriving in interleaved order and a second consumer has been added. Consumer 1 asks for a group of 10 messages and receives 10 Group A messages. Consumer 2 then asks for 10 messages but SQS knows that Group A is in flight, so it releases 10 Group B messages. The two consumers are now processing two batches of messages in parallel, speeding up throughput and then deleting their batches. When Consumer 1 requests the next batch of messages, it receives the remaining two messages, one from Group A and one from Group B.
Consider this nuanced detail from the example above. What would happen if Consumer 1 was on a faster server and processed its first batch of messages before Consumer 2 could mark its messages for deletion? See if you can predict the behavior before looking at the answer.
If Consumer 2 has not deleted its Group B messages yet when Consumer 1 asks for the next batch, then the FIFO queue considers Group B to still be in flight. It does not release any more Group B messages. Consumer 1 gets only the remaining Group A message. Later, after Consumer 2 has deleted its first batch, the remaining Group B message is released.
Conclusion
I hope this post answered your questions about how Amazon SQS FIFO queues work and why message groups are helpful. If you’re interested in exploring SQS FIFO queues further, here are a few ideas to get you started:
Create an Amazon SQS FIFO queue with three simple commands in the SQS console
Chris Mason and Josef Bacik led a brief discussion on the block-I/O controller for control groups (cgroups) in the filesystem track at the 2018 Linux Storage, Filesystem, and Memory-Management Summit. Mostly they were just aiming to get feedback on the approach they have taken. They are trying to address the needs of their employer, Facebook, with regard to the latency of I/O operations.
EC2 Spot Fleets are really cool. You can launch a fleet of Spot Instances that spans EC2 instance types and Availability Zones without having to write custom code to discover capacity or monitor prices. You can set the target capacity (the size of the fleet) in units that are meaningful to your application and have Spot Fleet create and then maintain the fleet on your behalf. Our customers are creating Spot Fleets of all sizes. For example, one financial service customer runs Monte Carlo simulations across 10 different EC2 instance types. They routinely make requests for hundreds of thousands of vCPUs and count on Spot Fleet to give them access to massive amounts of capacity at the best possible price.
EC2 Fleet
Today we are extending and generalizing the set-it-and-forget-it model that we pioneered in Spot Fleet with EC2 Fleet, a new building block that gives you the ability to create fleets that are composed of a combination of EC2 On-Demand, Reserved, and Spot Instances with a single API call. You tell us what you need, capacity and instance-wise, and we’ll handle all the heavy lifting. We will launch, manage, monitor and scale instances as needed, without the need for scaffolding code.
You can specify the capacity of your fleet in terms of instances, vCPUs, or application-oriented units, and also indicate how much of the capacity should be fulfilled by Spot Instances. The application-oriented units allow you to specify the relative power of each EC2 instance type in a way that directly maps to the needs of your application. All three capacity specification options (instances, vCPUs, and application-oriented units) are known as weights.
I think you’ll find a number of ways this feature makes managing a fleet of instances easier, and I believe you will also appreciate the team’s near-term feature roadmap (more on that in a bit).
Using EC2 Fleet
There are a number of ways that you can use this feature, whether you’re running a stateless web service, a big data cluster, or a continuous integration pipeline. Today I’m going to describe how you can use EC2 Fleet for genomic processing, but this approach is similar to workloads like risk analysis, log processing, or image rendering. Modern DNA sequencers can produce multiple terabytes of raw data each day; to process that data into meaningful information in a timely fashion, you need lots of processing power. I’ll be showing you how to deploy a “grid” of worker nodes that can quickly crunch through secondary analysis tasks in parallel.
Projects in genomics can use the elasticity EC2 provides to experiment and try out new pipelines on hundreds or even thousands of servers. With EC2 you can access as many cores as you need and only pay for what you use. Prior to today, you would need to use the RunInstances API or an Auto Scaling group for the On-Demand & Reserved Instance portion of your grid. To get the best price performance you’d also create and manage a Spot Fleet or multiple Spot Auto Scaling groups with different instance types if you wanted to add Spot Instances to turbo-boost your secondary analysis. Finally, to automate scaling decisions across multiple APIs and Auto Scaling groups you would need to write Lambda functions that periodically assess your grid’s progress & backlog, as well as current Spot prices – modifying your Auto Scaling Groups and Spot Fleets accordingly.
You can now replace all of this with a single EC2 Fleet, analyzing genomes at scale for as little as $1 per analysis. In my grid, each step in the pipeline requires 1 vCPU and 4 GiB of memory, a perfect match for M4 and M5 instances with 4 GiB of memory per vCPU. I will create a fleet using M4 and M5 instances with weights that correspond to the number of vCPUs on each instance:
m4.16xlarge – 64 vCPUs, weight = 64
m5.24xlarge – 96 vCPUs, weight = 96
This is expressed in a template that looks like this:
By default, EC2 Fleet will select the most cost effective combination of instance types and Availability Zones (both specified in the template) using the current prices for the Spot Instances and public prices for the On-Demand Instances (if you specify instances for which you have matching RIs, your discounts will apply). The default mode takes weights into account to get the instances that have the lowest price per unit. So for my grid, fleet will find the instance that offers the lowest price per vCPU.
Now I can request capacity in terms of vCPUs, knowing EC2 Fleet will select the lowest cost option using only the instance types I’ve defined as acceptable. Also, I can specify how many vCPUs I want to launch using On-Demand or Reserved Instance capacity and how many vCPUs should be launched using Spot Instance capacity:
The above means that I want a total of 2880 vCPUs, with 960 vCPUs fulfilled using On-Demand and 1920 using Spot. The On-Demand price per vCPU is lower for m5.24xlarge than the On-Demand price per vCPU for m4.16xlarge, so EC2 Fleet will launch 10 m5.24xlarge instances to fulfill 960 vCPUs. Based on current Spot pricing (again, on a per-vCPU basis), EC2 Fleet will choose to launch 30 m4.16xlarge instances or 20 m5.24xlarges, delivering 1920 vCPUs either way.
Putting it all together, I have a single file (fl1.json) that describes my fleet:
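The contents of fl1.json aren't reproduced here, but the request described above corresponds roughly to the following boto3 sketch (the equivalent JSON can go in fl1.json and be passed to aws ec2 create-fleet --cli-input-json file://fl1.json). The launch template name and version are assumptions; the template would hold the AMI, key pair, security groups, and so on.

import boto3

ec2 = boto3.client("ec2")

response = ec2.create_fleet(
    LaunchTemplateConfigs=[{
        "LaunchTemplateSpecification": {
            "LaunchTemplateName": "genomics-worker",  # hypothetical launch template
            "Version": "1",
        },
        # Weights correspond to the number of vCPUs on each instance type.
        "Overrides": [
            {"InstanceType": "m4.16xlarge", "WeightedCapacity": 64},
            {"InstanceType": "m5.24xlarge", "WeightedCapacity": 96},
        ],
    }],
    # 2880 vCPUs in total: 960 On-Demand, 1920 Spot.
    TargetCapacitySpecification={
        "TotalTargetCapacity": 2880,
        "OnDemandTargetCapacity": 960,
        "SpotTargetCapacity": 1920,
        "DefaultTargetCapacityType": "spot",
    },
)
print(response["FleetId"])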
My entire fleet is created within seconds and was built using 10 m5.24xlarge On-Demand Instances and 30 m4.16xlarge Spot Instances, since the current Spot price was 1.5¢ per vCPU for m4.16xlarge and 1.6¢ per vCPU for m5.24xlarge.
Now let’s imagine my grid has crunched through its backlog and no longer needs the additional Spot Instances. I can then modify the size of my fleet by changing the target capacity in my fleet specification, like this:
{
  "TotalTargetCapacity": 960
}
Since 960 was equal to the amount of On-Demand vCPUs I had requested, when I describe my fleet I will see all of my capacity being delivered using On-Demand capacity:
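The describe output isn't shown here. If you'd rather make that change programmatically instead of editing the specification file, a boto3 sketch of the same scale-down looks like this; the fleet ID is a placeholder for the one returned when the fleet was created.

import boto3

ec2 = boto3.client("ec2")

# Shrink the fleet back to the On-Demand portion only (placeholder fleet ID).
ec2.modify_fleet(
    FleetId="fleet-12345678-90ab-cdef-1234-567890abcdef",
    TargetCapacitySpecification={"TotalTargetCapacity": 960},
)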
Earlier I described how RI discounts apply when EC2 Fleet launches instances for which you have matching RIs, so you might be wondering how else RI customers benefit from EC2 Fleet. Let’s say that I own regional RIs for M4 instances. In my EC2 Fleet I would remove m5.24xlarge and specify m4.10xlarge and m4.16xlarge. Then when EC2 Fleet creates the grid, it will quickly find M4 capacity across the sizes and AZs I’ve specified, and my RI discounts apply automatically to this usage.
In the Works
We plan to connect EC2 Fleet and EC2 Auto Scaling groups. This will let you create a single fleet that mixes instance types and Spot, Reserved, and On-Demand Instances, while also taking advantage of EC2 Auto Scaling features such as health checks and lifecycle hooks. This integration will also bring EC2 Fleet functionality to services such as Amazon ECS, Amazon EKS, and AWS Batch that build on and make use of EC2 Auto Scaling for fleet management.
Available Now
You can create and make use of EC2 Fleets today in all public AWS Regions!
Memory control groups allow the system administrator to impose memory-use limits on the members of control groups. In many ways, these limits behave like the overall limit on available memory, but there are also some differences. The behavior of the memory controller also changed with the advent of the version-2 control-group API, creating problems for at least one significant user. Three sessions held in the memory-management track of the Linux Storage, Filesystem, and Memory-Management Summit explored some of these problems.
We made it easier for you to comply with regulatory standards by controlling access to AWS Regions using IAM policies. For example, if your company requires users to create resources in a specific AWS region, you can now add a new condition to the IAM policies you attach to your IAM principal (user or role) to enforce this for all AWS services. In this post, I review conditions in policies, introduce the new condition, and review a policy example to demonstrate how you can control access across multiple AWS services to a specific region.
Condition concepts
Before I introduce the new condition, let’s review the condition element of an IAM policy. A condition is an optional IAM policy element that lets you specify special circumstances under which the policy grants or denies permission. A condition includes a condition key, operator, and value for the condition. There are two types of conditions: service-specific conditions and global conditions. Service-specific conditions are specific to certain actions in an AWS service. For example, the condition key ec2:InstanceType supports specific EC2 actions. Global conditions support all actions across all AWS services.
Now that I’ve reviewed the condition element in an IAM policy, let me introduce the new condition.
aws:RequestedRegion condition key
The new global condition key, aws:RequestedRegion, supports all actions across all AWS services. You can use any string operator and specify any AWS Region for its value.
Condition key: aws:RequestedRegion
Description: Allows you to specify the Region to which the IAM principal (user or role) can make API calls
Operator(s): Any string operator
Value: Any AWS Region
I’ll now demonstrate the use of the new global condition key.
Example: Policy with region-level control
Let’s say a group of software developers in my organization is working on a project using Amazon EC2 and Amazon RDS. The project requires a web server running on an EC2 instance using Amazon Linux and a MySQL database instance in RDS. The developers also want to test AWS Lambda, an event-driven compute platform, to retrieve data from the MySQL DB instance in RDS for future use.
My organization requires all the AWS resources to remain in the Frankfurt, eu-central-1, region. To make sure this project follows these guidelines, I create a single IAM policy for all the AWS services that this group is going to use and apply the new global condition key aws:RequestedRegion for all the services. This way I can ensure that any new EC2 instances launched or any database instances created using RDS are in Frankfurt. This policy also ensures that any Lambda functions this group creates for testing are also in the Frankfurt region.
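The full three-statement policy isn't reproduced here. As a rough sketch of just the second, write-access statement it describes, with the action lists abbreviated and the read-only and iam:PassRole statements omitted:

import json

restrict_writes_to_frankfurt = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowWritesOnlyInFrankfurt",
        "Effect": "Allow",
        # Abbreviated for illustration; the real statement lists the write actions
        # the developers need for EC2, RDS, and Lambda.
        "Action": ["ec2:RunInstances", "rds:CreateDBInstance", "lambda:CreateFunction"],
        "Resource": "*",
        # The new global condition key restricts these actions to eu-central-1.
        "Condition": {"StringEquals": {"aws:RequestedRegion": "eu-central-1"}},
    }],
}

print(json.dumps(restrict_writes_to_frankfurt, indent=2))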
The first statement in the above example contains all the read-only actions that let my developers use the console for EC2, RDS, and Lambda. The permissions for IAM-related actions are required to launch EC2 instances with a role, enable enhanced monitoring in RDS, and for AWS Lambda to assume the IAM execution role to execute the Lambda function. I’ve combined all the read-only actions into a single statement for simplicity. The second statement is where I give write access to my developers for the three services and restrict the write access to the Frankfurt region using the aws:RequestedRegion condition key. You can also list multiple AWS regions with the new condition key if your developers are allowed to create resources in multiple regions. The third statement grants permissions for the IAM action iam:PassRole required by AWS Lambda. For more information on allowing users to create a Lambda function, see Using Identity-Based Policies for AWS Lambda.
Summary
You can now use the aws:RequestedRegion global condition key in your IAM policies to specify the region to which the IAM principal (user or role) can invoke an API call. This capability makes it easier for you to restrict the AWS regions your IAM principals can use to comply with regulatory standards and improve account security. For more information about this global condition key and policy examples using aws:RequestedRegion, see the IAM documentation.
If you have comments about this post, submit them in the Comments section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum.
Want more AWS Security news? Follow us on Twitter.
Ever since we introduced our Groups feature, Backblaze for Business has been growing at a rapid rate! We’ve been staffing up in order to support the product and the newest addition to the sales team, Victoria, joins us as a Sales Development Representative! Let’s learn a bit more about Victoria, shall we?
What is your Backblaze Title? Sales Development Representative.
Where are you originally from? Harrisburg, North Carolina.
What attracted you to Backblaze? The leaders and family-style culture.
What do you expect to learn while being at Backblaze? How to sell, sell, sell!
Where else have you worked? The North Carolina Autism Society, an ophthalmologist’s office, home health care, and another tech startup.
Where did you go to school? The University of North Carolina Chapel Hill and Duke University’s Fuqua School of Business.
What’s your dream job? Fighter pilot, professional snowboarder or killer whale trainer.
Favorite place you’ve traveled? Hawaii and Banff.
Favorite hobby? Basketball and cars.
Of what achievement are you most proud? Missionary work and helping patients feel better.
Star Trek or Star Wars? Neither, but probably Star Wars.
Coke or Pepsi? Neither, bubble tea.
Favorite food? Snow crab legs.
Why do you like certain things? Because God made me that way.
Anything else you’d like to tell us? I’m a germophobe, drink a lot of water and, unfortunately, am introverted.
Being on the phones all day is a good way to build up those extroversion skills! Welcome to the team and we hope you enjoy learning how to sell, sell, sell!
Amazon Redshift is a data warehouse service that logs the history of the system in STL log tables. The STL log tables manage disk space by retaining only two to five days of log history, depending on log usage and available disk space.
To retain STL tables’ data for an extended period, you usually have to create a replica table for every system table. Then, for each system table, you load the data from the system table into the replica at regular intervals. By maintaining replica tables for STL tables, you can run diagnostic queries on historical data from the STL tables. You then can derive insights from query execution times, query plans, and disk-spill patterns, and make better cluster-sizing decisions. However, refreshing replica tables with live data from STL tables at regular intervals requires schedulers such as Cron or AWS Data Pipeline. Also, these tables are specific to one cluster and they are not accessible after the cluster is terminated. This is especially true for transient Amazon Redshift clusters that last for only a finite period of ad hoc query execution.
In this blog post, I present a solution that exports system tables from multiple Amazon Redshift clusters into an Amazon S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the Amazon S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.
I also provide another CloudFormation template later in this post. This second template helps to automate the creation of tables in the AWS Glue Data Catalog for the system tables’ data stored in Amazon S3. After the system tables are exported to Amazon S3, you can run cross-cluster diagnostic queries on the system tables’ data and derive insights about query executions in each Amazon Redshift cluster. You can do this using Amazon QuickSight, Amazon Athena, Amazon EMR, or Amazon Redshift Spectrum.
You can find all the code examples in this post, including the CloudFormation templates, AWS Glue extract, transform, and load (ETL) scripts, and the resolution steps for common errors you might encounter in this GitHub repository.
Solution overview
The solution in this post uses AWS Glue to export system tables’ log data from Amazon Redshift clusters into Amazon S3. The AWS Glue ETL jobs are invoked at a scheduled interval by AWS Lambda. AWS Systems Manager, which provides secure, hierarchical storage for configuration data management and secrets management, maintains the details of Amazon Redshift clusters for which the solution is enabled. The last-fetched time stamp values for the respective cluster-table combination are maintained in an Amazon DynamoDB table.
The following diagram covers the key steps involved in this solution.
The solution as illustrated in the preceding diagram flows like this:
The Lambda function, invoke_rs_stl_export_etl, is triggered at regular intervals, as controlled by Amazon CloudWatch. It’s triggered to look up the AWS Systems Manager parameter store to get the details of the Amazon Redshift clusters for which the system table export is enabled.
The same Lambda function, based on the Amazon Redshift cluster details obtained in step 1, invokes the AWS Glue ETL job designated for the Amazon Redshift cluster. If an ETL job for the cluster is not found, the Lambda function creates one.
The ETL job invoked for the Amazon Redshift cluster gets the cluster credentials from the parameter store. It gets from the DynamoDB table the last exported time stamp of when each of the system tables was exported from the respective Amazon Redshift cluster.
The ETL job unloads the system tables’ data from the Amazon Redshift cluster into an Amazon S3 bucket.
The ETL job updates the DynamoDB table with the last exported time stamp value for each system table exported from the Amazon Redshift cluster.
The Amazon Redshift cluster system tables’ data is available in Amazon S3 and is partitioned by cluster name and date for running cross-cluster diagnostic queries.
Understanding the configuration data
This solution uses AWS Systems Manager parameter store to store the Amazon Redshift cluster credentials securely. The parameter store also securely stores other configuration information that the AWS Glue ETL job needs for extracting and storing system tables’ data in Amazon S3. Systems Manager comes with a default AWS Key Management Service (AWS KMS) key that it uses to encrypt the password component of the Amazon Redshift cluster credentials.
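As an illustration of how an ETL job can read this configuration, here is a hedged boto3 sketch; the parameter names follow the convention in the table below, and the cluster name is a placeholder.

import boto3

ssm = boto3.client("ssm")
cluster = "product-warehouse"  # placeholder cluster name

# Plain configuration values are stored as String parameters.
s3_prefix = ssm.get_parameter(Name="redshift_query_logs.global.s3_prefix")["Parameter"]["Value"]

# The password is a SecureString, so ask Systems Manager to decrypt it
# with the AWS KMS key it was encrypted under.
password = ssm.get_parameter(
    Name="redshift_query_logs.{}.password".format(cluster),
    WithDecryption=True,
)["Parameter"]["Value"]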
The following table explains the global parameters and cluster-specific parameters required in this solution. The global parameters are defined once and applicable at the overall solution level. The cluster-specific parameters are specific to an Amazon Redshift cluster and repeat for each cluster for which you enable this post’s solution. The CloudFormation template explained later in this post creates these parameters as part of the deployment process.
Parameter name (Type): Description

Global parameters—defined once and applied to all jobs

redshift_query_logs.global.s3_prefix (String): The Amazon S3 path where the query logs are exported. Under this path, each exported table is partitioned by cluster name and date.

redshift_query_logs.global.tempdir (String): The Amazon S3 path that AWS Glue ETL jobs use for temporarily staging the data.

redshift_query_logs.global.role (String): The name of the role that the AWS Glue ETL jobs assume. Just the role name is sufficient; the complete Amazon Resource Name (ARN) is not required.

redshift_query_logs.global.enabled_cluster_list (StringList): A comma-separated list of cluster names for which system tables’ data export is enabled. This gives a user the flexibility to exclude certain clusters.

Cluster-specific parameters—for each cluster specified in the enabled_cluster_list parameter

redshift_query_logs.<<cluster_name>>.connection (String): The name of the AWS Glue Data Catalog connection to the Amazon Redshift cluster. For example, if the cluster name is product_warehouse, the entry is redshift_query_logs.product_warehouse.connection.

redshift_query_logs.<<cluster_name>>.user (String): The user name that AWS Glue uses to connect to the Amazon Redshift cluster.

redshift_query_logs.<<cluster_name>>.password (SecureString): The password that AWS Glue uses to connect to the Amazon Redshift cluster, encrypted with a key managed in AWS KMS.
For example, suppose that you have two Amazon Redshift clusters, product-warehouse and category-management, for which the solution described in this post is enabled. In this case, the parameters shown in the following screenshot are created by the solution deployment CloudFormation template in the AWS Systems Manager parameter store.
Solution deployment
To make it easier for you to get started, I created a CloudFormation template that automatically configures and deploys the solution—only one step is required after deployment.
Prerequisites
To deploy the solution, you must have one or more Amazon Redshift clusters in a private subnet. This subnet must have a network address translation (NAT) gateway or a NAT instance configured, and also a security group with a self-referencing inbound rule for all TCP ports. For more information about why AWS Glue ETL needs the configuration it does, described previously, see Connecting to a JDBC Data Store in a VPC in the AWS Glue documentation.
To start the deployment, launch the CloudFormation template:
CloudFormation stack parameters
The following table lists and describes the parameters for deploying the solution to export query logs from multiple Amazon Redshift clusters.
Property (Default): Description

S3Bucket (mybucket): The bucket this solution uses to store the exported query logs, stage code artifacts, and perform unloads from Amazon Redshift. For example, the mybucket/extract_rs_logs/data location stores all the exported query logs for each system table, partitioned by cluster. The mybucket/extract_rs_logs/temp/ location is used for temporarily staging the unloaded data from Amazon Redshift. The mybucket/extract_rs_logs/code location stores all the code artifacts required for Lambda and the AWS Glue ETL jobs.

ExportEnabledRedshiftClusters (requires input): A comma-separated list of cluster names from which the system table logs need to be exported.

DataStoreSecurityGroups (requires input): A list of security groups with an inbound rule to the Amazon Redshift clusters provided in the ExportEnabledRedshiftClusters parameter. These security groups should also have a self-referencing inbound rule on all TCP ports, as explained in Connecting to a JDBC Data Store in a VPC.
After you launch the template and create the stack, you see that the following resources have been created:
AWS Glue connections for each Amazon Redshift cluster you provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
All parameters required for this solution created in the parameter store.
The Lambda function that invokes the AWS Glue ETL jobs for each configured Amazon Redshift cluster at a regular interval of five minutes.
The DynamoDB table that captures the last exported time stamps for each exported cluster-table combination.
The AWS Glue ETL jobs to export query logs from each Amazon Redshift cluster provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
The IAM roles and policies required for the Lambda function and AWS Glue ETL jobs.
After the deployment
For each Amazon Redshift cluster for which you enabled the solution through the CloudFormation stack parameter, ExportEnabledRedshiftClusters, the automated deployment includes temporary credentials that you must update after the deployment:
Note the parameters redshift_query_logs.<<cluster_name>>.user and redshift_query_logs.<<cluster_name>>.password that correspond to each Amazon Redshift cluster for which you enabled this solution. Edit these parameters to replace the placeholder values with the right credentials.
For example, if product-warehouse is one of the clusters for which you enabled system table export, you edit these two parameters with the right user name and password and choose Save parameter.
Querying the exported system tables
Within a few minutes after the solution deployment, you should see Amazon Redshift query logs being exported to the Amazon S3 location, <<S3Bucket_you_provided>>/extract_redshift_query_logs/data/. In that bucket, you should see the eight system tables partitioned by cluster name and date: stl_alert_event_log, stl_ddltext, stl_explain, stl_query, stl_querytext, stl_scan, stl_utilitytext, and stl_wlm_query.
To run cross-cluster diagnostic queries on the exported system tables, create external tables in the AWS Glue Data Catalog. To make it easier for you to get started, I provide a CloudFormation template that creates an AWS Glue crawler, which crawls the exported system tables stored in Amazon S3 and builds the external tables in the AWS Glue Data Catalog.
Launch this CloudFormation template to create external tables that correspond to the Amazon Redshift system tables. S3Bucket is the only input parameter required for this stack deployment. Provide the same Amazon S3 bucket name where the system tables’ data is being exported. After you successfully create the stack, you can see the eight tables in the database, redshift_query_logs_db, as shown in the following screenshot.
Now, navigate to the Athena console to run cross-cluster diagnostic queries. The following screenshot shows a diagnostic query executed in Athena that retrieves query alerts logged across multiple Amazon Redshift clusters.
You can build the following example Amazon QuickSight dashboard by running cross-cluster diagnostic queries on Athena to identify the hourly query count and the key query alert events across multiple Amazon Redshift clusters.
How to extend the solution
You can extend this post’s solution in two ways:
Add any new Amazon Redshift clusters that you spin up after you deploy the solution.
Add other system tables or custom query results to the list of exports from an Amazon Redshift cluster.
Extend the solution to other Amazon Redshift clusters
To extend the solution to more Amazon Redshift clusters, add the three cluster-specific parameters in the AWS Systems Manager parameter store following the guidelines earlier in this post. Modify the redshift_query_logs.global.enabled_cluster_list parameter to append the new cluster to the comma-separated string.
Extend the solution to add other tables or custom queries to an Amazon Redshift cluster
The current solution ships with the export functionality for the following Amazon Redshift system tables:
stl_alert_event_log
stl_ddltext
stl_explain
stl_query
stl_querytext
stl_scan
stl_utilitytext
stl_wlm_query
You can easily add another system table or custom query by adding a few lines of code to the AWS Glue ETL job, <<cluster_name>>_extract_rs_query_logs. For example, suppose that from the product-warehouse Amazon Redshift cluster you want to export orders greater than $2,000. To do so, add the following five lines of code to the AWS Glue ETL job product-warehouse_extract_rs_query_logs, where product-warehouse is your cluster name:
Get the last-processed time-stamp value. The function creates a value if it doesn’t already exist.
# Run the custom query and capture the results for export (runQuery is a helper in the solution's ETL script; the sales/order tables are the example's own)
returnDF = functions.runQuery(query="select * from sales s join order o where o.order_amnt > 2000 and sale_timestamp > '{}'".format(salesLastProcessTSValue), tableName="mydb.sales_2000", job_configs=job_configs)
In this post, I demonstrate a serverless solution to retain the system tables’ log data across multiple Amazon Redshift clusters. By using this solution, you can incrementally export the data from system tables into Amazon S3. By performing this export, you can build cross-cluster diagnostic queries, build audit dashboards, and derive insights into capacity planning by using services such as Athena. I also demonstrate how you can extend this solution to other ad hoc query use cases or tables other than system tables by adding a few lines of code.
Karthik Sonti is a senior big data architect at Amazon Web Services. He helps AWS customers build big data and analytical solutions and provides guidance on architecture and best practices.
Thanks to Raja Mani, AWS Solutions Architect, for this great blog.
In this blog post, I’ll walk you through the steps for setting up continuous replication of an AWS CodeCommit repository from one AWS region to another AWS region using a serverless architecture. CodeCommit is a fully-managed, highly scalable source control service that stores anything from source code to binaries. It works seamlessly with your existing Git tools and eliminates the need to operate your own source control system. Replicating an AWS CodeCommit repository from one AWS region to another AWS region enables you to achieve lower latency pulls for global developers. This same approach can also be used to automatically back up repositories currently hosted on other services (for example, GitHub or BitBucket) to AWS CodeCommit.
This solution uses AWS Lambda and AWS Fargate for continuous replication. Benefits of this approach include:
The replication process can be easily setup to trigger based on events, such as commits made to the repository.
Setting up a serverless architecture means you don’t need to provision, maintain, or administer servers.
Note: AWS Fargate has a limitation of 10 GB for storage and is available in the US East (N. Virginia) Region. A similar solution that uses Amazon EC2 instances to replicate the repositories on a schedule was published in a previous blog post and can be used if your repository does not meet these conditions.
Replication using Fargate
As you follow this blog post, you’ll set up an architecture that looks like this:
Any change in the AWS CodeCommit repository will trigger a Lambda function. The Lambda function will call the Fargate task that replicates the repository using a Git command line tool.
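A minimal sketch of what that Lambda function might look like; the ECS cluster, task definition, subnet, and security group identifiers are placeholders.

import boto3

ecs = boto3.client("ecs")

def lambda_handler(event, context):
    # Start the Fargate task that runs the replication container.
    # All identifiers below are placeholders for your own resources.
    ecs.run_task(
        cluster="codecommit-replication",
        taskDefinition="ccrepl-task",
        launchType="FARGATE",
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],
                "securityGroups": ["sg-0123456789abcdef0"],
                "assignPublicIp": "ENABLED",
            }
        },
    )
    return "replication task started"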
Let us assume a user wants to replicate a repository (Source) from US East (N. Virginia/us-east-1) region to a repository (Destination) in US West (Oregon/us-west-2) region. I’ll walk you through the steps for it:
Prerequisites
Create an IAM service role for Amazon EC2 that has permissions for both the source and destination repositories, along with IAM CreateRole, AttachRolePolicy, and Amazon ECR privileges. Here is the EC2 role policy I used:
You need a Docker environment to build this solution. You can launch an EC2 instance and install Docker, or you can use AWS Cloud9, which comes with Docker and Git preinstalled. I used an EC2 instance and installed Docker on it. Use the IAM role created in the previous step when creating the EC2 instance. I refer to this environment as the "Docker environment" in the following steps.
You need to install the AWS CLI on the Docker environment. For AWS CLI installation, refer to this page.
You need to install Git, including the Git command line tools, on the Docker environment.
Step 1: Create the Docker image
To create the Docker image, you first need a Dockerfile. A Dockerfile is a manifest that describes the base image to use for your Docker image and what you want installed and running on it. For more information about Dockerfiles, go to the Dockerfile Reference.
1. Choose a directory in the Docker environment and perform the following steps in that directory. I used /home/ec2-user directory to perform the following steps.
2. Clone the AWS CodeCommit repository in the Docker environment. Open the terminal to the Docker environment and run the following commands to clone your source AWS CodeCommit repository (I ran the commands from /home/ec2-user directory):
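A sketch of those commands, assuming a source repository named Source in us-east-1 (replace the URL with your own). The credential helper lines let Git authenticate to CodeCommit with the instance's IAM role:
$ cd /home/ec2-user
$ git config --global credential.helper '!aws codecommit credential-helper $@'
$ git config --global credential.UseHttpPath true
$ git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/Source LocalRepository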
Note: Replace the example URLs with your source and destination repository URLs.
3. Create a file called Dockerfile (case sensitive) with the following content (I created it in /home/ec2-user directory):
# Pull the Amazon Linux latest base image
FROM amazonlinux:latest
#Install aws-cli and git command line tools
RUN yum -y install unzip aws-cli
RUN yum -y install git
WORKDIR /home/ec2-user
RUN mkdir LocalRepository
WORKDIR /home/ec2-user/LocalRepository
#Copy Cloned CodeCommit repository to Docker container
COPY ./LocalRepository /home/ec2-user/LocalRepository
#Copy shell script that does the replication
COPY ./repl_repository.bash /home/ec2-user/LocalRepository
RUN chmod ugo+rwx /home/ec2-user/LocalRepository/repl_repository.bash
WORKDIR /home/ec2-user/LocalRepository
#Call this script when Docker starts the container
ENTRYPOINT ["/home/ec2-user/LocalRepository/repl_repository.bash"]
4. Copy the following shell script into a file called repl_repository.bash in the Dockerfile directory location in the Docker environment (I created it in the /home/ec2-user directory):
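The script body shown here is a sketch under my own assumptions (repository names Source and Destination, regions us-east-1 and us-west-2); adjust the URLs to match your repositories. It pulls the latest changes from the source repository and pushes everything to the destination:
#!/bin/bash
# Sketch of repl_repository.bash: sync the local clone and push to the destination repository.
# Authenticate to CodeCommit with the IAM role attached to the instance or Fargate task.
git config --global credential.helper '!aws codecommit credential-helper $@'
git config --global credential.UseHttpPath true
cd /home/ec2-user/LocalRepository
# Pull the latest changes from the source repository (origin points at Source)
git pull
# Push all branches to the destination repository in us-west-2
git push https://git-codecommit.us-west-2.amazonaws.com/v1/repos/Destination --all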
6. Verify that the replication is working by running the repl_repository.bash script from the LocalRepository directory. Go to the LocalRepository directory and run this command: . ../repl_repository.bash. If it is successful, the last line of the output will be "Everything up-to-date", like this:
$ . ../repl_repository.bash
Everything up-to-date
Step 2: Build the Docker Image
1. Build the Docker image by running this command from the directory where you created the Dockerfile in the Docker environment in the previous step (I ran it from the /home/ec2-user directory):
$ docker build -t ccrepl .
Output: It installs various packages and sets environment variables as part of steps 1 to 3 from the Dockerfile. Steps 4 to 11 from the Dockerfile should produce an output similar to the following:
2. Run the following command to verify that the image was created successfully. It will display “Everything up-to-date” at the end if it is successful.
$ docker run ccrepl
Everything up-to-date
Step 3: Push the Docker Image to Amazon Elastic Container Registry (ECR)
Perform the following steps in the Docker Environment.
1. Run the AWS CLI configure command and set default region as your source repository region (I used us-east-1).
$ aws configure set default.region <Source Repository Region>
2. Create an Amazon ECR repository using this command to store your ccrepl image (Note the repositoryUri in the output):
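A sketch of that command, plus the usual authenticate/tag/push sequence to get the image into the new repository (the <account-id> placeholder is yours to fill in):
$ aws ecr create-repository --repository-name ccrepl
# Authenticate Docker to ECR (aws ecr get-login prints a docker login command to run)
$ aws ecr get-login --no-include-email --region us-east-1
$ docker tag ccrepl:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/ccrepl:latest
$ docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/ccrepl:latest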
Step 4: Create the Fargate task
2. Create a role called AccessRoleForCCfromFG using the following command in the Docker environment:
$ aws iam create-role --role-name AccessRoleForCCfromFG --assume-role-policy-document file://trustpolicyforecs.json
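The command above references a trust policy file, trustpolicyforecs.json, which you should create first. A minimal sketch, assuming the standard Amazon ECS tasks trust relationship, would be:
cat > trustpolicyforecs.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF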
3. Assign AWS CodeCommit full access to the above role using the following command in the Docker environment:
$ aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AWSCodeCommitFullAccess --role-name AccessRoleForCCfromFG
4. In the Amazon ECS Console, choose Repositories and select the ccrepl repository that was created in the previous step. Copy the Repository URI.
5. In the Amazon ECS Console, choose Task Definitions and click Create New Task Definition.
6. Select launch type compatibility as FARGATE and click Next Step.
7. In the create task definition screen, do the following:
In Task Definition Name, type ccrepl
In Task Role, choose AccessRoleForCCfromFG
In Task Memory, choose 2GB
In Task CPU, choose 1 vCPU
Click Add Container under Container Definitions in the same screen. In the Add Container screen, do the following:
Enter Container name as ccreplcont
Enter Image URL copied from step 4
Enter Memory Limits as 128 and click Add.
Note: Select TaskExecutionRole as “ecsTaskExecutionRole” if it already exists. If not, select create new role and it will create “ecsTaskExecutionRole” for you.
8. Click the Create button in the task definition screen to create the task. This creates the task, the execution role, and the Amazon CloudWatch Logs log groups.
9. In the Amazon ECS Console, click Clusters and create a cluster. Select the "Networking only, Powered by AWS Fargate" template and click Next Step.
10. Enter the cluster name as ccreplcluster and click Create.
Step 5: Create the Lambda Function
In this section, I use the Amazon Elastic Container Service (ECS) RunTask API from Lambda to invoke the Fargate task.
1. In the IAM Console, create a new role called ECSLambdaRole with the permissions to AWS CodeCommit, Amazon ECS as well as pass roles privileges needed to run the ECS task. Your statement should look similar to the following (replace <your account id>):
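As a sketch only (not the exact statement used here), a policy document along these lines grants the RunTask, PassRole, and read-only CodeCommit permissions the function needs; the file name, policy name, and ARNs are illustrative:
cat > ecslambdarole-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": "ecs:RunTask",
      "Resource": "arn:aws:ecs:us-east-1:<your account id>:task-definition/ccrepl:*" },
    { "Effect": "Allow", "Action": "iam:PassRole",
      "Resource": [
        "arn:aws:iam::<your account id>:role/AccessRoleForCCfromFG",
        "arn:aws:iam::<your account id>:role/ecsTaskExecutionRole"
      ] },
    { "Effect": "Allow", "Action": ["codecommit:Get*", "codecommit:List*"], "Resource": "*" }
  ]
}
EOF
$ aws iam put-role-policy --role-name ECSLambdaRole --policy-name ECSLambdaPolicy --policy-document file://ecslambdarole-policy.json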
2. In AWS management console, select VPC service and click subnets in the left navigation screen. Note down the Subnet IDs that you want to run the Fargate task in.
3. Create a new Lambda Node.js function called FargateTaskExecutionFunc and assign the role ECSLambdaRole with the following content:
Note: Replace the subnet values in the function code with the subnet IDs you identified in Step 2 of this section as the subnets you want to run the Fargate task in.
1. In the Lambda Console, click FargateTaskExecutionFunc under functions.
2. Under Add triggers in the Designer, select CodeCommit
3. In the Configure triggers screen, do the following:
Enter Repository name as Source (your source repository name)
Enter trigger name as LambdaTrigger
Leave the Events as “All repository events”
Leave the Branch names as “All branches”
Click Add button
Click Save button to save the changes
Step 6: Verification
To test the application, make a commit and push the changes to the source repository in AWS CodeCommit. That should automatically trigger the Lambda function and replicate the changes in the destination repository. You can verify this by checking CloudWatch Logs for Lambda and ECS, or simply going to the destination repository and verifying the change appears.
Conclusion
Congratulations! You have successfully configured repository replication of an AWS CodeCommit repository using AWS Lambda and AWS Fargate. You can use this technique in a deployment pipeline. You can also tweak the trigger configuration in AWS CodeCommit to call the Lambda function in response to any supported trigger event in AWS CodeCommit.
Last year I wrote about the Digital Security Exchange. The project is live:
The DSX works to strengthen the digital resilience of U.S. civil society groups by improving their understanding and mitigation of online threats.
We do this by pairing civil society and social sector organizations with credible and trustworthy digital security experts and trainers who can help them keep their data and networks safe from exposure, exploitation, and attack. We are committed to working with community-based organizations, legal and journalistic organizations, civil rights advocates, local and national organizers, and public and high-profile figures who are working to advance social, racial, political, and economic justice in our communities and our world.
If you are either an organization that needs help, or an expert who can provide help, visit their website.
What happens when you combine the Internet of Things, Machine Learning, and Edge Computing? Before I tell you, let’s review each one and discuss what AWS has to offer.
Internet of Things (IoT) – Devices that connect the physical world and the digital one. The devices, often equipped with one or more types of sensors, can be found in factories, vehicles, mines, fields, homes, and so forth. Important AWS services include AWS IoT Core, AWS IoT Analytics, AWS IoT Device Management, and Amazon FreeRTOS, along with others that you can find on the AWS IoT page.
Machine Learning (ML) – Systems that can be trained using an at-scale dataset and statistical algorithms, and used to make inferences from fresh data. At Amazon we use machine learning to drive the recommendations that you see when you shop, to optimize the paths in our fulfillment centers, fly drones, and much more. We support leading open source machine learning frameworks such as TensorFlow and MXNet, and make ML accessible and easy to use through Amazon SageMaker. We also provide Amazon Rekognition for images and for video, Amazon Lex for chatbots, and a wide array of language services for text analysis, translation, speech recognition, and text to speech.
Edge Computing – The power to have compute resources and decision-making capabilities in disparate locations, often with intermittent or no connectivity to the cloud. AWS Greengrass builds on AWS IoT, giving you the ability to run Lambda functions and keep device state in sync even when not connected to the Internet.
ML Inference at the Edge
Today I would like to toss all three of these important new technologies into a blender! You can now perform Machine Learning inference at the edge using AWS Greengrass. This allows you to use the power of the AWS cloud (including fast, powerful instances equipped with GPUs) to build, train, and test your ML models before deploying them to small, low-powered, intermittently-connected IoT devices running in those factories, vehicles, mines, fields, and homes that I mentioned.
Here are a few of the many ways that you can put Greengrass ML Inference to use:
Precision Farming – With an ever-growing world population and unpredictable weather that can affect crop yields, the opportunity to use technology to increase yields is immense. Intelligent devices that are literally in the field can process images of soil, plants, pests, and crops, taking local corrective action and sending status reports to the cloud.
Physical Security – Smart devices (including the AWS DeepLens) can process images and scenes locally, looking for objects, watching for changes, and even detecting faces. When something of interest or concern arises, the device can pass the image or the video to the cloud and use Amazon Rekognition to take a closer look.
Industrial Maintenance – Smart, local monitoring can increase operational efficiency and reduce unplanned downtime. The monitors can run inference operations on power consumption, noise levels, and vibration to flag anomalies, predict failures, and detect faulty equipment.
Greengrass ML Inference Overview
There are several different aspects to this new AWS feature. Let’s take a look at each one:
Machine Learning Models – Precompiled TensorFlow and MXNet libraries, optimized for production use on the NVIDIA Jetson TX2 and Intel Atom devices, and development use on 32-bit Raspberry Pi devices. The optimized libraries can take advantage of GPU and FPGA hardware accelerators at the edge in order to provide fast, local inferences.
Model Building and Training – The ability to use Amazon SageMaker and other cloud-based ML tools to build, train, and test your models before deploying them to your IoT devices. To learn more about SageMaker, read Amazon SageMaker – Accelerated Machine Learning.
Model Deployment – SageMaker models can (if you give them the proper IAM permissions) be referenced directly from your Greengrass groups. You can also make use of models stored in S3 buckets. You can add a new machine learning resource to a group with a couple of clicks:
There’s often tension between distributed and centralized control, especially in larger organizations. While a distributed control model allows teams to move fast and to respond to specialized local needs, a central model can provide the right level of oversight for global initiatives and challenges that span all teams.
We’ve seen this challenge arise first-hand when AWS customers grow to the point where their application footprint encompasses a plethora of AWS regions, AWS accounts, development teams, and applications. They love the fact that AWS increases their agility and responsiveness, while letting them deploy resources in the most appropriate location. This diversity and scale brings new challenges when it comes to security and compliance. The freedom to innovate must be balanced by the need to protect important data and to respond quickly when threats emerge.
Over the last couple of years we have provided our customers with an increasingly broad set of options for protection including AWS WAF and AWS Shield. Our customers are making great use of all of these options, and have asked for the ability to manage them from a single, central location.
Meet AWS Firewall Manager
AWS Firewall Manager is designed to help these customers! It gives them the freedom to use multiple AWS accounts and to host applications in any desired region while maintaining centralized control over their organization’s security settings and profile. Developers can develop and innovators can innovate, while the security team gains the ability to respond quickly, uniformly, and globally to potential threats and actual attacks.
With automated policy enforcement across accounts & applications, your security team can be confident that new and existing applications comply with organization-wide security policies when they use Firewall Manager. They can find applications and AWS resources that don’t measure up, and bring them into compliance in minutes.
Firewall Manager is built around named policies that contain WAF rule sets and optional AWS Shield advanced protection. Each policy applies to a specific set of AWS resources, specified by account, resource type, resource identifier, or tag. Policies can be applied automatically to all matching resources, or to a subset that you select. Policies can include WAF rules drawn from within the organization, and also those created by AWS Partners such as Imperva, F5, Trend Micro, and other AWS Marketplace vendors. This gives your security team the power to duplicate their existing on-premises security posture in the cloud.
Take the Tour
Firewall Manager has three prerequisites:
AWS Organizations – Your accounts must be members of an AWS Organization with all features enabled (consolidated billing alone is not sufficient).
Firewall Administrator – You must designate one of the AWS accounts in your organization as the administrator for Firewall Manager. This gives the account permission to deploy AWS WAF rules across the organization.
AWS Config – You must enable AWS Config for all of the accounts in the Organization so that Firewall Manager can detect newly created resources (you can use the Enable AWS Config template on the StackSets Sample Templates page to take care of this). To learn more, read Getting Started with AWS Config.
Since I don’t own an enterprise, my colleagues were kind enough to create some test accounts for me! When I open the Firewall Manager Console in the master account, I can see where I stand with respect to the first two prerequisites:
The Learn more about… button reveals the Account ID of the administrator:
I switch to that account (in a real-world situation it is unlikely that I would have access to both the master account and this one), open the console, and see that I now meet the prerequisites. I click Create policy to move ahead:
The console outlines the process for me. I need to create rules and a rule group, define a policy with the rule group, define the scope of the policy, and then actually create the policy.
At the bottom of the page I choose to create a new policy and rule group, for resources in the US East (N. Virginia) Region, and click Next:
Then I specify the conditions for my rule, choosing from the following options:
Cross-site scripting
Geographic origin
SQL injection
IP address or range
Size constraint
String or regular expression
For example, I can create a condition that blocks malicious IP addresses (this AWS Solution shows you how to use a third-party reputation list with WAF, and may be helpful):
I’ll keep this one simple, but a rule can include multiple conditions. After I have added all of them, I click Next to proceed. Now I am ready to create my rule, and I click Create rule (I can add more conditions to it later if I want):
I give my rule a name (BlockExcludedIPs), enter a CloudWatch metric name, and add my condition (ExcludeIPs), then click Create:
I can create more rules, and include them in the same rule group. Again, I’ll keep this one simple, and click Next to move ahead:
I enter a name for my group, choose the rules that will make up the group, and click Create:
I now have two rule groups (testRuleGroup was already present in the account). I name my policy and click Next to proceed:
Now I define the scope of my policy. I choose the type of resource to be protected, and indicate when the policy should be applied:
I can also use tags to include or exclude resources:
Once I have defined the scope of my policy I click Next and review it, then click Create policy:
Now that the policy is in force, the ALBs within its scope are initially noncompliant:
Within minutes, Firewall Manager applies the policy and provides me with a status report:
Start Using AWS Firewall Manager Today
You can start using AWS Firewall Manager today!
If you are using AWS Shield Advanced, you have access to AWS Firewall Manager and AWS WAF at no extra charge. If not, you are charged a monthly fee for each policy in each region, along with the usual charges for WAF WebACLs, WAF Rules, and AWS Config Rules.
Sinclair Broadcast Group is the largest TV broadcast group in the US, both by number of stations and by coverage. It is known for news content and programming that promotes conservative political positions and supports the Republican Party.
So when Trump says the following –
So funny to watch Fake News Networks, among the most dishonest groups of people I have ever dealt with, criticize Sinclair Broadcasting for being biased. Sinclair is far superior to CNN and even more Fake NBC, which is a total joke.
With the advent of AWS PrivateLink, you can provide services to AWS customers directly in their Amazon Virtual Private Clouds (VPCs) by offering cross-account SaaS solutions on private IP addresses rather than over the Internet.
Traffic that flows to the services you provide does so over private AWS networking rather than over the Internet, offering security and performance enhancements, as well as convenience. PrivateLink can tie in with the AWS Marketplace, facilitating billing and providing a straightforward consumption model to your customers.
The use cases are myriad, but, for this blog post, we’ll demonstrate a fictional order-processing resource. The resource accepts JSON data over a RESTful API, simulating an interface. This could easily be an existing application being considered for a PrivateLink-based consumption model. Consumers of this resource send JSON payloads representing new orders and the system responds with order IDs corresponding to newly-created orders in the system. In a real-world scenario, additional APIs, such as authentication, might also represent critical aspects of the system. This example will not demonstrate these additional APIs because they could be consumed over PrivateLink in a similar fashion to the API constructed in the example.
I’ll demonstrate how to expose the resource on a private IP address in a customer’s VPC. I’ll also explain an architecture leveraging PrivateLink and provide detailed instructions for how to set up such a service. Finally, I’ll provide an example of how a customer might consume such a service. I’ll focus not only on how to architect the solution, but also the considerations that drive architectural choices.
Solution Overview
N.B.: Only two subnets and Availability Zones are shown per VPC for simplicity. Resources must cover all Availability Zones per Region so that the application is available to all consumers in the Region. The instructions in this post, which pertain to resources sitting in us-east-1, detail the deployment of subnets in all six Availability Zones for this Region.
This solution exposes an application’s HTTP-based API over PrivateLink in a provider’s AWS account. The application is a stateless web server running on Amazon Elastic Compute Cloud (EC2) instances. The provider places instances within a virtual private cloud (VPC) consisting of one private subnet per Availability Zone (AZ). Each AZ contains a subnet. Instances populate each subnet inside of Auto Scaling Groups (ASGs), maintaining a desired count per subnet. There is one ASG per subnet to ensure that the service is available in each AZ. An internal Network Load Balancer (NLB) sits in front of the entire fleet of application instances, and an endpoint service is connected with the NLB.
In the customer’s AWS account, they create an endpoint that consumes the endpoint service from the provider’s account. The endpoint exposes an Elastic Network Interface (ENI) in each subnet the customer desires. Each ENI is assigned an IP address within the CIDR block associated with the subnet, for any number of subnets in any number of AZs within the region, for each customer.
PrivateLink facilitates cross-account access to services so the customer can use the provider’s service, feeding it data that exist within the customer’s account while using application logic and systems that run in the provider’s account. The routing between accounts is over private networking rather than over the Internet.
Though this example shows a simple, stateless service running on EC2 and sitting behind an NLB, many kinds of AWS services can be exposed through PrivateLink and can serve as pathways into a provider’s application, such as Amazon Kinesis Streams, Amazon EC2 Container Service, Amazon EC2 Systems Manager, and more.
Using PrivateLink to Establish a Service for Consumption
Building a service to be consumed through PrivateLink involves a few steps:
Build a VPC with private subnets covering all AZs in the region
Create an NLB, listener, and target group for instances
Create a launch configuration and ASGs to manage the deployment of Amazon EC2 instances in each subnet
Launch an endpoint service and connect it with the NLB
Tie endpoint-request approval with billing systems or the AWS Marketplace
Provide the endpoint service in multiple regions
Step 1: Build a VPC and private subnets
Start by determining the network you will need to serve the application. Keep in mind that you will need to serve the application out of each AZ within any region you choose. Customers will expect to consume your service in multiple AZs because AWS recommends they architect their own applications to span AZs for fault-tolerance purposes.
Additionally, anything less than full coverage across all AZs in a single region will not facilitate straightforward consumption of your service because AWS does not guarantee that a single AZ will carry the same name across accounts. In fact, AWS randomizes AZ names across accounts to ensure even distribution of independent workloads. Telling customers, for example, that you provide a service in us-east-1a may not give them sufficient information to connect with your service.
The solution is to serve your application in all AZs within a region, because this guarantees that no matter which AZs a customer chooses for endpoint creation, that customer will find a running instance of your application with which to connect.
You can lay the foundations for doing this by creating a subnet in each AZ within the region of your choice. The subnets can be private because the service, exposed via PrivateLink, will not provide any publicly routable APIs.
This example uses the us-east-1 region. If you use a different region, the number of AZs may vary, which will change the number of subnets required, and thus the size of the IP address range for your VPC may require adjustments.
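The create-vpc call itself is not reproduced here; based on the numbers that follow (128 addresses starting at 10.3.0.0), a sketch with the AWS CLI would be:
$ aws ec2 create-vpc --cidr-block 10.3.0.0/25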
The example above creates a VPC with 128 IP addresses starting at 10.3.0.0. Each subnet will contain 16 IP addresses, using a total of 96 addresses in the space. Allocating a sufficient block of addresses requires some planning (though you can make adjustments later if needed). I’d suggest an equally-sized address space in each subnet because the provided service should embody the same performance, availability, and functionality regardless of which AZ your customers choose. Each subnet will need a sufficient address space to accommodate the number of instances you run within it. Additionally, you will need enough space to allow for one IP address per subnet to assign to that subnet’s NLB node’s Elastic Network Interface (ENI).
In this simple example, 16 IP addresses per subnet are enough because we will configure ASGs to maintain two instances each and the NLB requires one ENI. Each subnet reserves five IP addresses for internal purposes, for a total of eight IP addresses needed in each subnet to support the service.
Next, create the private subnets for each Availability Zone. The following demonstrates the creation of the first subnet, which sits in the us-east-1a AZ:
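A sketch of that command, assuming the VPC ID used later in this post and a /28 block (16 addresses) carved from the VPC range:
$ aws ec2 create-subnet --vpc-id "vpc-2cadab54" --cidr-block 10.3.0.0/28 --availability-zone us-east-1a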
Repeat this step for each remaining AZ. If using the us-east-1 region, you will need to create private subnets in all AZs as follows:
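One way to script the remaining five subnets, assuming consecutive /28 blocks and the same VPC ID (adjust to your own addressing plan):
n=1
for az in us-east-1b us-east-1c us-east-1d us-east-1e us-east-1f; do
  aws ec2 create-subnet --vpc-id "vpc-2cadab54" \
    --cidr-block "10.3.0.$((n * 16))/28" \
    --availability-zone "$az"
  n=$((n + 1))
done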
For the purpose of this example, the subnets can leverage the default route table, as it contains a single rule for routing requests to private IP addresses in the VPC, as follows:
In a real-world case, additional routing may be required. For example, you may need additional routes to support VPC peering to access dependencies in other VPCs, connectivity to on-premises resources over DirectConnect or VPN, Internet-accessible dependencies via NAT, or other scenarios.
Security Group Creation
Instances will need to be placed in a security group that allows traffic from the NLB nodes that sit in each subnet.
All instances running the service should be in a security group accepting TCP traffic on the traffic port from any other IP address in the VPC. This will allow the NLB to forward traffic to those instances because the NLB nodes sit in the VPC and are assigned IP addresses in the subnets. In this example, the order processing server running on each instance exposes a service on port 3000, so the security group rule covers this port.
Create a security group for instances:
aws ec2 create-security-group \
--group-name "service-sg" \
--description "Security group for service instances" \
--vpc-id "vpc-2cadab54"
Step 2: Create a Network Load Balancer, Listener, and Target Group
The service integrates with PrivateLink using an internal NLB which sits in front of instances that run the service.
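A sketch of creating the internal NLB, a TCP target group on the traffic port, and a listener (subnet IDs and ARNs are placeholders):
aws elbv2 create-load-balancer \
--name "service-nlb" \
--type network \
--scheme internal \
--subnets "subnet-aaaaaaaa" "subnet-bbbbbbbb" "subnet-cccccccc" "subnet-dddddddd" "subnet-eeeeeeee" "subnet-ffffffff"

aws elbv2 create-target-group \
--name "service-tg" \
--protocol TCP \
--port 3000 \
--vpc-id "vpc-2cadab54"

aws elbv2 create-listener \
--load-balancer-arn "<nlb-arn>" \
--protocol TCP \
--port 3000 \
--default-actions Type=forward,TargetGroupArn=<target-group-arn>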
Step 3: Create a Launch Configuration and Auto Scaling Groups
Each private subnet in the VPC will require its own ASG in order to ensure that there is always a minimum number of instances in each subnet.
A single ASG spanning all subnets will not guarantee that every subnet contains the appropriate number of instances. For example, while a single ASG could be configured to work across six subnets and maintain twelve instances, there is no guarantee that each of the six subnets will contain two instances. To guarantee the appropriate number of instances on a per-subnet basis, each subnet must be configured with its own ASG.
New instances should be automatically created within each ASG based on a single launch configuration. The launch configuration should be set up to use an existing Amazon Machine Image (AMI).
This post presupposes you have an AMI that can be used to create new instances that serve the application. There are only a few basic assumptions about how this image is configured:
1. The image contains a web server that serves traffic (in this case, on port 3000).
2. The image is configured to automatically launch the web server as a daemon when the instance starts.
Create a launch configuration for the ASGs, providing the AMI ID, the ID of the security group created in previous steps (above), a key for access, and an instance type:
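A sketch of the launch configuration, followed by the first ASG (the AMI ID, key name, subnet ID, and target group ARN are placeholders; the min/max/desired values of 2 match the sizing discussed earlier):
aws autoscaling create-launch-configuration \
--launch-configuration-name "service-lc" \
--image-id "ami-xxxxxxxx" \
--instance-type t2.micro \
--key-name "my-key" \
--security-groups "sg-xxxxxxxx"

aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name "service-asg-us-east-1a" \
--launch-configuration-name "service-lc" \
--min-size 2 --max-size 2 --desired-capacity 2 \
--vpc-zone-identifier "subnet-aaaaaaaa" \
--target-group-arns "<target-group-arn>"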
Repeat this process to create an ASG in each remaining subnet, using the same launch configuration and target group.
In this example, only two instances are created in each subnet. In a real-world scenario, additional instances would likely be recommended for both availability and scale. The ASGs use the provided launch configuration as a template for creating new instances.
When creating the ASGs, the ARN of the target group for the NLB is provided. This way, the ASGs automatically register newly-created instances with the target group so that the NLB can begin sending traffic to them.
Step 4: Launch an endpoint service and connect with NLB
Now, expose the service via PrivateLink with an endpoint service, providing the ARN of the NLB:
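A sketch of that command (the NLB ARN is a placeholder); note the flag that makes acceptance mandatory, which the next paragraph relies on:
$ aws ec2 create-vpc-endpoint-service-configuration --network-load-balancer-arns "<nlb-arn>" --acceptance-required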
This endpoint service is configured to require acceptance. This means that new consumers who attempt to add endpoints that consume it will have to wait for the provider to allow access. This provides an opportunity to control access and integrate with billing systems that monetize the provided service.
Step 5: Tie endpoint request approval with billing system or the AWS Marketplace
If you’re maintaining your service as a private service, any account that is intended to have access must be whitelisted before it can find the endpoint service and create an endpoint to consume it.
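Whitelisting is done per account principal; a sketch of the command, with a placeholder service ID and customer account ID:
$ aws ec2 modify-vpc-endpoint-service-permissions --service-id "vpce-svc-xxxxxxxxxxxxxxxxx" --add-allowed-principals "arn:aws:iam::<customer-account-id>:root"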
For more information on listing a PrivateLink service in the AWS Marketplace, see How to List Your Product in AWS Marketplace (https://aws.amazon.com/blogs/apn/how-to-list-your-product-in-aws-marketplace/).
Most production-ready services offered through PrivateLink will require acceptance of Endpoint requests before customers can consume them. Typically, some level of automation around processing approvals is helpful. PrivateLink can publish on a Simple Notification Service (SNS) topic when customers request approval.
Setting this up requires two steps:
1. Create a new SNS topic.
2. Create an endpoint connection notification that publishes to the SNS topic.
Each is discussed below.
Create an SNS Topic
First, create a new SNS Topic that can receive messages relating to endpoint service access requests:
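For example, with the AWS CLI (the topic name matches the one described below):
$ aws sns create-topic --name service-notification-topic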
This creates a new topic with the name “service-notification-topic”. Endpoint request approval systems can subscribe to this Topic so that acceptance can be automated.
Next, create a new Endpoint Connection Notification, so that messages will be published to the topic when new Endpoints connect and need to have access requests approved:
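A sketch of that command, with placeholder IDs; the Connect and Accept events cover new connection requests and their acceptance:
$ aws ec2 create-vpc-endpoint-connection-notification --service-id "vpce-svc-xxxxxxxxxxxxxxxxx" --connection-notification-arn "arn:aws:sns:us-east-1:<account-id>:service-notification-topic" --connection-events Connect Accept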
A billing system may ultimately tie in with request approval. This can also be done manually, which may be less useful, but is illustrative. As an example, assume that a customer account has already requested an endpoint to consume the service. The customer can be accepted manually, as follows:
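A sketch of the manual acceptance, with placeholder IDs for the endpoint service and the customer's pending endpoint:
$ aws ec2 accept-vpc-endpoint-connections --service-id "vpce-svc-xxxxxxxxxxxxxxxxx" --vpc-endpoint-ids "vpce-xxxxxxxxxxxxxxxxx"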
At this point, the consumer can begin consuming the service.
Step 6: Take the Service Across Regions
In distributing SaaS via PrivateLink, providers may have to think about how to make their services available in different regions, because endpoint services are only available within the region where they are created. Customers who attempt to consume endpoint services will not be able to create endpoints across regions.
Rather than saddling consumers with the responsibility of making the jump across regions, we recommend that providers work to make services available wherever their customers consume them. Providers are in a better position to adapt their architectures to multiple regions than customers, who do not know the internals of how providers have designed their services.
There are several architectural options that can support multi-region adaptation. Selection among them will depend on a number of factors, including read-to-write ratio, latency requirements, budget, amenability to re-architecture, and preference for simplicity.
Generally, the challenge in providing multi-region SaaS is in instantiating stateful components in multiple regions because the data on which such components depend are hard to replicate, synchronize, and access with low latency over large geographical distances.
Of all stateful components, perhaps the most frequently encountered will be databases. Some solutions for overcoming this challenge with respect to databases are as follows:
1. Provide a master in a single region; provide read replicas in every region.
2. Provide a master in every region; assign each tenant to one master only.
3. Create a full multi-master architecture; replicate data efficiently.
4. Rely on a managed service for replicating data cross-regionally (e.g., DynamoDB Global Tables).
Stateless components can be provisioned in multiple regions more easily. In this example, you will have to re-create all of the VPC resources—including subnets, Routing Tables, Security Groups, and Endpoint Services—as well as all EC2 resources—including instances, NLBs, Listeners, Target Groups, ASGs, and Launch Configurations—in each additional region. Because of the complexity in doing so, in addition to the significant need to keep regional configurations in-sync, you may wish to explore an orchestration tool such as CloudFormation, rather than the command line.
Regardless of what orchestration tooling you choose, you will need to copy your AMI to each region in which you wish to deploy it. Once available, you can build out your service in that region much as you did in the first one.
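For example (the AMI ID is a placeholder):
$ aws ec2 copy-image --source-region us-east-1 --source-image-id "ami-xxxxxxxx" --name "service-ami" --region us-east-2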
This copies the AMI used to provide the service in us-east-1 into us-east-2. Repeat the process for each region you wish to enable.
Consuming a Service via PrivateLink
To consume a service over PrivateLink, a customer must create a new Endpoint in their VPC within a Security Group that allows traffic on the traffic port.
Start by creating a Security Group to apply to a new Endpoint:
aws ec2 create-security-group \
--group-name "consumer-sg" \
--description "Security group for service consumers" \
--vpc-id "vpc-2cadab54"
Next, create an endpoint in the VPC, placing it in the Security Group:
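A sketch of the endpoint creation, assuming placeholder IDs for the consumer's VPC, subnets, and security group, and the service name reported by the provider:
aws ec2 create-vpc-endpoint \
--vpc-endpoint-type Interface \
--vpc-id "vpc-xxxxxxxx" \
--service-name "com.amazonaws.vpce.us-east-1.vpce-svc-xxxxxxxxxxxxxxxxx" \
--subnet-ids "subnet-aaaaaaaa" "subnet-bbbbbbbb" \
--security-group-ids "sg-xxxxxxxx"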
The response will include an attribute called VpcEndpoint.DnsEntries. The service can be accessed at any of the DNS names listed under those entries. Before the consumer can access the endpoint service, the provider has to accept the endpoint.
Access Endpoint Via Custom DNS Names
When creating a new Endpoint, the consumer will receive named endpoint addresses in each AZ where the Endpoint is created, plus a named endpoint that is AZ-agnostic.
The consumer can use Route53 to provide a custom DNS name for the service. This not only allows for using cleaner service names, but also enables the consumer to leverage the traffic management features of Route53, such as fail-over routing.
First, the consumer must enable DNS Hostnames and DNS Support on the VPC within which the Endpoint was created. The consumer should start by enabling DNS Hostnames:
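Both attributes can be set with modify-vpc-attribute; the VPC ID is a placeholder:
$ aws ec2 modify-vpc-attribute --vpc-id "vpc-xxxxxxxx" --enable-dns-hostnames "{\"Value\":true}"
$ aws ec2 modify-vpc-attribute --vpc-id "vpc-xxxxxxxx" --enable-dns-support "{\"Value\":true}"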
After the VPC is properly configured to work with Route53, the consumer should either select an existing hosted zone or create a new one. Assuming one has not already been created, the consumer should create one as follows:
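A sketch, using the endpoints.internal zone name implied by the final URL in this walkthrough (the VPC ID and caller reference are placeholders):
aws route53 create-hosted-zone \
--name "endpoints.internal" \
--vpc VPCRegion=us-east-1,VPCId=vpc-xxxxxxxx \
--caller-reference "order-processor-zone-001" \
--hosted-zone-config Comment="Private zone for PrivateLink endpoints",PrivateZone=true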
In the request, the consumer specifies the DNS name, VPC ID, region, and flags the hosted zone as private. Additionally, the consumer must provide a “caller reference” which is a unique ID of the request that can be used to identify it in subsequent actions if the request fails.
Next, the consumer should create a JSON file corresponding to a batch of record change requests. In this file, the consumer can specify the name of the endpoint, as well as a CNAME pointing to the AZ-agnostic DNS name of the Endpoint:
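A sketch of the change batch and the call that applies it; the hosted zone ID and the endpoint DNS name are placeholders:
cat > change-batch.json <<'EOF'
{
  "Changes": [
    {
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "order-processor.endpoints.internal",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [
          { "Value": "vpce-xxxxxxxxxxxxxxxxx-yyyyyyyy.vpce-svc-xxxxxxxxxxxxxxxxx.us-east-1.vpce.amazonaws.com" }
        ]
      }
    }
  ]
}
EOF
aws route53 change-resource-record-sets --hosted-zone-id "<hosted-zone-id>" --change-batch file://change-batch.json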
At this point, the Endpoint can be consumed at http://order-processor.endpoints.internal.
Conclusion
AWS PrivateLink is an exciting way to expose SaaS services to customers. This article demonstrated how to expose an existing application on EC2 via PrivateLink in a customer’s VPC, as well as recommended architecture. Finally, it walked through the steps that a customer would have to go through to consume the service.
This blog was contributed by Rucha Nene, Sr. Product Manager for Amazon EBS
AWS customers use tags to track ownership of resources, implement compliance protocols, control access to resources via IAM policies, and drive their cost accounting processes. Last year, we made tagging for Amazon EC2 instances and Amazon EBS volumes easier by adding the ability to tag these resources upon creation. We are now extending this capability to EBS snapshots.
Earlier, you could tag your EBS snapshots only after the resource had been created and sometimes, ended up with EBS snapshots in an untagged state if tagging failed. You also could not control the actions that users and groups could take over specific snapshots, or enforce tighter security policies.
To address these issues, we are making tagging for EBS snapshots more flexible and giving customers more control over EBS snapshots by introducing two new capabilities:
Tag on creation for EBS snapshots – You can now specify tags for EBS snapshots as part of the API call that creates the resource or via the Amazon EC2 Console when creating an EBS snapshot.
Resource-level permission and enforced tag usage – The CreateSnapshot, DeleteSnapshot, and ModifySnapshotAttribute API actions now support IAM resource-level permissions. You can now write IAM policies that mandate the use of specific tags when taking actions on EBS snapshots.
Tag on creation
You can now specify tags for EBS snapshots as part of the API call that creates the resources. The resource creation and the tagging are performed atomically; both must succeed in order for the operation CreateSnapshot to succeed. You no longer need to build tagging scripts that run after EBS snapshots have been created.
Here’s how you specify tags when you create an EBS snapshot, using the console:
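The console isn't the only way to do this; for illustration, the equivalent with the AWS CLI uses the --tag-specifications parameter (the volume ID and tag values are examples):
$ aws ec2 create-snapshot --volume-id "vol-0123456789abcdef0" --description "nightly backup" --tag-specifications 'ResourceType=snapshot,Tags=[{Key=costcenter,Value=115},{Key=User,Value=alice}]'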
Resource-level permissions and enforced tag usage
CreateSnapshot, DeleteSnapshot, and ModifySnapshotAttribute now support resource-level permissions, which allow you to exercise more control over EBS snapshots. You can write IAM policies that give you precise control over access to resources and let you specify which users are able to create snapshots for a given set of volumes. You can also enforce the use of specific tags to help track resources and achieve more accurate cost allocation reporting.
For example, here’s a statement that requires that the costcenter tag (with a value of “115”) be present on the volume from which snapshots are being created. It requires that this tag be applied to all newly created snapshots. In addition, it requires that the created snapshots are tagged with User:username for the customer.
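The statement itself is only sketched here (condition keys and ARNs simplified, region and account left as placeholders); it combines a tag condition on the source volume with a required request tag on the snapshot:
cat > create-snapshot-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:CreateSnapshot",
      "Resource": "arn:aws:ec2:us-east-1:<account-id>:volume/*",
      "Condition": { "StringEquals": { "ec2:ResourceTag/costcenter": "115" } }
    },
    {
      "Effect": "Allow",
      "Action": "ec2:CreateSnapshot",
      "Resource": "arn:aws:ec2:us-east-1::snapshot/*",
      "Condition": { "StringEquals": { "aws:RequestTag/User": "${aws:username}" } }
    },
    {
      "Effect": "Allow",
      "Action": "ec2:CreateTags",
      "Resource": "arn:aws:ec2:us-east-1::snapshot/*",
      "Condition": { "StringEquals": { "ec2:CreateAction": "CreateSnapshot" } }
    }
  ]
}
EOF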
To implement stronger compliance and security policies, you could also restrict access to DeleteSnapshot, if the resource is not tagged with the user’s name. Here’s a statement that allows the deletion of a snapshot only if the snapshot is tagged with User:username for the customer.
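A sketch of such a statement (same caveats as above):
cat > delete-snapshot-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:DeleteSnapshot",
      "Resource": "arn:aws:ec2:us-east-1::snapshot/*",
      "Condition": { "StringEquals": { "ec2:ResourceTag/User": "${aws:username}" } }
    }
  ]
}
EOF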
Hadoop User Experience (Hue) is an open-source, web-based, graphical user interface for use with Amazon EMR and Apache Hadoop. The Hue database stores things like users, groups, authorization permissions, Apache Hive queries, Apache Oozie workflows, and so on.
There might come a time when you want to migrate your Hue database to a new EMR cluster. For example, you might want to upgrade from an older version of the Amazon EMR AMI (Amazon Machine Image), but your Hue application and its database have had a lot of customization. You can avoid re-creating these user entities and retain query/workflow histories in Hue by migrating the existing Hue database, or the remote database in Amazon RDS, to a new cluster.
By default, Hue user information and query histories are stored in a local MySQL database on the EMR cluster’s master node. However, you can create one or more Hue-enabled clusters using a configuration stored in Amazon S3 and a remote MySQL database in Amazon RDS. This allows you to preserve user information and query history that Hue creates without keeping your Amazon EMR cluster running.
This post describes the step-by-step process for migrating the Hue database from an existing EMR cluster.
Note: Amazon EMR supports different Hue versions across different AMI releases. Keep in mind the compatibility of Hue versions between the old and new clusters in this migration activity. Currently, Hue 3.x.x versions are not compatible with Hue 4.x.x versions, and therefore a migration between these two Hue versions might create issues. In addition, Hue 3.10.0 is not backward compatible with its previous 3.x.x versions.
Before you begin
First, let’s create a new testUser in Hue on an existing EMR cluster, as shown following:
You will use these credentials later to log in to Hue on the new EMR cluster and validate whether you have successfully migrated the Hue database.
Let’s get started!
Migration how-to
Follow these steps to migrate your database to a new EMR cluster and then validate the migration process.
1.) Make a backup of the existing Hue database.
Use SSH to connect to the master node of the old cluster, as shown following (if you are using Linux/Unix/macOS), and dump the Hue database to a JSON file.
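A sketch of those commands, assuming the default Hue install location on EMR (/usr/lib/hue) and a placeholder key pair and master node DNS name:
$ ssh -i ~/mykeypair.pem hadoop@<master-public-dns-of-old-cluster>
$ sudo /usr/lib/hue/build/env/bin/hue dumpdata > ~/hue-mysql.json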
Edit the hue-mysql.json output file by removing all JSON objects that have useradmin.userprofile in the model field, and save the file. For example, remove the objects as shown following:
2.) Store the hue-mysql.json file on persistent storage like Amazon S3.
You can copy the file from the old EMR cluster to Amazon S3 using the AWS CLI or Secure Copy (SCP) client. For example, the following uses the AWS CLI:
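For example (the bucket name is a placeholder):
$ aws s3 cp ~/hue-mysql.json s3://<your-bucket>/hue-mysql.json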
b.) Connect to the Hue database—either the local MySQL database or the remote database in Amazon RDS for your cluster—as shown following, using the mysql client.
$ mysql -h HOST -u USER -pPASSWORD
For a local MySQL database, you can find the hostname, user name, and password for connecting to the database in the /etc/hue/conf/hue.ini file on the master node.
[[database]]
engine = mysql
name = huedb
case_insensitive_collation = utf8_unicode_ci
test_charset = utf8
test_collation = utf8_bin
host = ip-172-31-37-133.us-west-2.compute.internal
user = hue
test_name = test_huedb
password = QdWbL3Ai6GcBqk26
port = 3306
Based on the preceding example configuration, the sample command is as follows. (Replace the host, user, and password details based on your EMR cluster settings.)
$ mysql -h ip-172-31-37-133.us-west-2.compute.internal -u hue -pQdWbL3Ai6GcBqk26
c.) Drop the existing Hue database with the name huedb from the MySQL server.
mysql> DROP DATABASE IF EXISTS huedb;
d.) Create a new empty database with the same name huedb.
mysql> CREATE DATABASE huedb DEFAULT CHARACTER SET utf8 DEFAULT COLLATE=utf8_bin;
i.) In MySQL, add the foreign key content_type_id back to the auth_permission table:
mysql> use huedb;
mysql> ALTER TABLE huedb.auth_permission ADD FOREIGN KEY (`content_type_id`) REFERENCES `django_content_type` (`id`);
j.) Start the Hue service again.
$ sudo start hue
hue start/running, process XXXX
That’s it! Now, verify whether you can successfully access the Hue UI, and sign in using your existing testUser credentials.
After a successful sign in to Hue on the new EMR cluster, you should see a similar Hue homepage as shown following with testUser as the user signed in:
Conclusion
You have now learned how to migrate an existing Hue database to a new Amazon EMR cluster and validate the migration process. If you have any similar Amazon EMR administration topics that you want to see covered in a future post, please let us know in the comments below.
Anvesh Ragi is a Big Data Support Engineer with Amazon Web Services. He works closely with AWS customers to provide them architectural and engineering assistance for their data processing workflows. In his free time, he enjoys traveling and going for hikes.
The kernel development community has consistently resisted adding any formal notion of what a “container” is to the kernel. While the needed building blocks (namespaces, control groups, etc.) are provided, it is up to user space to assemble the pieces into the sort of container implementation it needs. This approach maximizes flexibility and makes it possible to implement a number of different container abstractions, but it also can make it hard to associate events in the kernel with the container that caused them. Audit container IDs are an attempt to fix that problem for one specific use case; they have not been universally well received in the past, but work on this mechanism continues regardless.
I’m Charles Fort, a developer at Woot who specializes in deployments and developer experience. Woot is the original daily deals website. It was founded in 2004 and acquired by Amazon in 2010 – https://www.woot.com
We just moved our web front-end deployments from Troop, a deployment agent we developed ourselves, to AWS CodeDeploy and AWS CodePipeline. This migration involved launching a new customer-facing EC2 web server fleet with the CodeDeploy agent installed and creating and managing our CodeDeploy and CodePipeline resources with AWS CloudFormation. Immediately after we completed the migration, we observed a ~50% reduction in HTTP 500 errors during deployment.
In this blog post, I’m going to show you:
Why we chose AWS deployment tools.
An architectural diagram of Woot’s systems.
An overview of the migration project.
Our migration results.
The old and busted
We wanted to replace our in-house deployment system with something we could build automation on top of, and something that we didn’t have to own or maintain. We already own and maintain a build system, which is bad enough. We didn’t want to run additional infrastructure for our deployment pipeline.
In short, we wanted a cloud service. Because all of our infrastructure is in AWS, CodeDeploy was a natural fit to replace our deployment agent. CodePipeline acts as the automation orchestrator, our wizard in the cloud telling CodeDeploy what to deploy and when.
Architecture overview
Here’s a look at the architecture of a Woot web front end:
Woot architecture overview
Project overview
Our project involved migrating five web front ends, which together handle an average of 12 million requests per day, to CodeDeploy and CodePipeline while keeping the site live for our customers.
Here are the steps we took:
Wrote some new deployment scripts.
Launched a new fleet of EC2 web servers with CodeDeploy support.
Created a deployment pipeline for our CloudFormation-defined CodeDeploy and CodePipeline configuration.
Introduced our new fleet to live traffic. Hello, customers!
Deployment scripts
Our old deployment system didn’t stop or start our web server. Instead, it tried to swap out the build artifacts from under the server while the server was running. That’s definitely a bad idea.
We wrote some deployment scripts in PowerShell that are run by CodeDeploy to stop and start our IIS web servers. These scripts work in conjunction with the Elastic Load Balancing (ELB) support in CodeDeploy because we certainly didn’t want to stop the web server while it’s serving customer traffic.
New fleet
Because our fleet is running on Amazon EC2, we built an Amazon Machine Image (AMI) for our web fleet with the CodeDeploy agent already installed. From a fleet perspective, this was most of what we had to do. With the agent installed, CodeDeploy can use our deployment scripts to deploy our web projects.
AWS CloudFormation deployment pipeline
Because we needed a deployment pipeline and several pieces of CodeDeploy configuration (a CodeDeploy application and at least one CodeDeploy deployment group) for each web project we want to deploy, we decided to use AWS CloudFormation to version this configuration. Our build system (TeamCity) can read from our version control system and write to Amazon S3. We made a simple build in TeamCity to push an AWS CloudFormation template to S3, which triggers a pipeline that deploys to AWS CloudFormation. This creates the CodePipeline and CodeDeploy resources. Now we can do code reviews on our infrastructure changes. More eyes is more safety! We can also trace infrastructure changes over time, just like we can with code changes.
Live traffic time!
Our web fleets run behind Classic Load Balancers. By choosing CodeDeploy, we can use its sweet new ELB features. For example, through an elastic load balancer, CodeDeploy can prevent internet traffic from being routed to an instance while it is being deployed to. After the deployment to that instance is complete, it then makes the instance available for traffic.
We launched new hosts with the CodeDeploy agent and deployed to them without ELB support turned on. Then we slowly, manually introduced them into our fleet while monitoring stats. After we had the new machines in, we slowly removed the old ones from the load balancer until our fleet was fully supported by CodeDeploy. Doing the migration this way resulted in 0 downtime for our sites.
One fun detail: When we had 2/3 of the new fleet in our load balancer, we triggered a CodeDeploy deployment to the fleet, but this time with ELB support turned on. This caused CodeDeploy to place the rest of the machines into the load balancer (coexisting with the old fleet), and there were slightly fewer buttons to press.
AWS CloudFormation example
This is a simplified example of the AWS CloudFormation template we use to manage the AWS configuration for one of our web projects. It is deployed in a deployment pipeline, much like the web projects themselves.
Parameters:
  CodePipelineBucket:
    Type: String
  CodePipelineRole:
    Type: String
  CodeDeployRole:
    Type: String
  CodeDeployBucket:
    Type: String
Resources:
  ### Woot.Example deployment configuration ###
  ExampleDeploymentConfig:
    Type: 'AWS::CodeDeploy::DeploymentConfig'
    Properties:
      MinimumHealthyHosts:
        Type: FLEET_PERCENT
        Value: '66' # Let's keep 2/3 of the fleet healthy at any point
  #Woot.Example CodeDeploy application
  WootExampleApplication:
    Type: "AWS::CodeDeploy::Application"
    Properties:
      ApplicationName: "Woot.Example"
  #Woot.Example CodeDeploy deployment groups
  WootExampleDeploymentGroup:
    DependsOn: "WootExampleApplication"
    Type: "AWS::CodeDeploy::DeploymentGroup"
    Properties:
      ApplicationName: "Woot.Example"
      DeploymentConfigName: !Ref "ExampleDeploymentConfig" # use the deployment configuration we created
      DeploymentGroupName: "Woot.Example.Main"
      AutoRollbackConfiguration:
        Enabled: true
        Events:
          - DEPLOYMENT_FAILURE # this makes the deployment rollback when the deployment hits the failure threshold
          - DEPLOYMENT_STOP_ON_REQUEST # this makes the deployment rollback if you hit the stop button
      LoadBalancerInfo:
        ElbInfoList:
          - Name: "WootExampleInternal" # this is the ELB the hosts live in, they will be added and removed from here
      DeploymentStyle:
        DeploymentOption: "WITH_TRAFFIC_CONTROL" # this tells CodeDeploy to actually add/remove the hosts from the ELB
      Ec2TagFilters:
        -
          Key: "Name"
          Value: "exampleweb*" # deploy to all machines named like exampleweb001.yourdomain.com, etc
          Type: "KEY_AND_VALUE"
      ServiceRoleArn: !Sub "arn:aws:iam::${AWS::AccountId}:role/${CodeDeployRole}" # this is the IAM role CodeDeploy in your account should use
  #Woot.Example CodePipeline
  WootExampleDeploymentPipeline:
    DependsOn: "WootExampleDeploymentGroup"
    Type: "AWS::CodePipeline::Pipeline"
    Properties:
      RoleArn: !Sub "arn:aws:iam::${AWS::AccountId}:role/${CodePipelineRole}" # this is the IAM role CodePipeline in your account should use
      Name: "Woot.Example" # name of the pipeline
      ArtifactStore:
        Type: S3
        Location: !Ref "CodePipelineBucket" # the bucket CodePipeline uses in your account to shuffle artifacts around
      Stages:
        -
          Name: Source # one S3 source stage
          Actions:
            -
              Name: SourceAction
              ActionTypeId:
                Category: Source
                Owner: AWS
                Version: 1
                Provider: S3
              OutputArtifacts:
                -
                  Name: SourceOutput
              Configuration:
                S3Bucket: !Ref "CodeDeployBucket" # the S3 bucket your builds go into (needs to be versioned)
                S3ObjectKey: "Woot.Example/Woot.Example-Release.zip" # build artifact path for this application
              RunOrder: 1
        -
          Name: Deploy # one deploy stage that triggers the CodeDeploy deployment
          Actions:
            -
              Name: DeployAction
              ActionTypeId:
                Category: Deploy
                Owner: AWS
                Version: 1
                Provider: CodeDeploy
              InputArtifacts:
                -
                  Name: SourceOutput
              Configuration:
                ApplicationName: "Woot.Example"
                DeploymentGroupName: "Woot.Example.Main"
              RunOrder: 2
Appspec.yml example
This is the appspec.yml file we use for our main web front end (Woot.Web.Retail). The appspec.yml file tells CodeDeploy where to put files and when to run our deployment scripts.
Because our server-launching infrastructure doesn’t use CodeDeploy for initial placement of the build artifacts, CodeDeploy won’t overwrite the files. (The service has no knowledge of them.) This is both good and bad: good because CodeDeploy won’t overwrite files it didn’t write, and bad because it means we have to have a deployment script like clearWebsiteDeployment.ps1:
# restart script as 64bit powershell if it's 32 bit
if ($PSHOME -like "*SysWOW64*") {
& (Join-Path ($PSHOME -replace "SysWOW64", "SysNative") powershell.exe) -File `
(Join-Path $PSScriptRoot $MyInvocation.MyCommand) @args
Exit $LastExitCode
}
Import-Module WebAdministration
Stop-Website -name $env:APPLICATION_NAME
#sleep for 10 seconds to give IIS a chance to stop
Start-Sleep -s 10
$website = Get-Website -name "*$env:APPLICATION_NAME*"
if ($website.state -ne 'stopped') {
throw "The website cannot be stopped"
}
startWebsite.ps1:
# restart script as 64bit powershell if it's 32 bit
if ($PSHOME -like "*SysWOW64*") {
& (Join-Path ($PSHOME -replace "SysWOW64", "SysNative") powershell.exe) -File `
(Join-Path $PSScriptRoot $MyInvocation.MyCommand) @args
Exit $LastExitCode
}
Import-Module WebAdministration
Start-Website -name $env:APPLICATION_NAME
#sleep for 10 seconds to give IIS a chance to start
Start-Sleep -s 10
$website = Get-Website -name "*$env:APPLICATION_NAME*"
if ($website.state -ne 'started') {
throw "The website cannot be started"
}
You can see we used a CodeDeploy environment variable ($env:APPLICATION_NAME) in our scripts. The name of the CodeDeploy application is also the name of the IIS website. This way, we can use the same deployment scripts for multiple websites.
The new hotness
Now that we’re running CodeDeploy in production we are extremely pleased with the results. Our old deployment agent, Troop, did not give us much control over the way releases went out. Now we can check on a deployment at a per-instance level, and the opportunities for automation are impressive.
After our migration, we saw a 50% reduction in HTTP 500 errors served to customers during deployments. We looked at the one-hour time slices during a deployment and compared the average count before and after our migration to CodeDeploy. These numbers show our old deployment system was hella busted (really broken).
This graph shows summed deployment errors over time and compares CodeDeploy to our legacy in-house deployment agent (Troop).
CodeDeploy vs Troop
We plan to implement a full cross-account release process on AWS deployment tools. We will have a single AWS account with a pipeline that controls CodeDeploy in our various environments, triggering tests and promoting to the next environment as they pass. Building something like that with our own tooling would take a lot of work. Thanks to CodeDeploy, CodePipeline, and AWS CloudFormation for making our lives easier.