Tag Archives: dimension

Colour sensing with a Raspberry Pi

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/colour-sensing-raspberry-pi/

In their latest video and tutorial, Electronics Hub shows you how to detect colour using a Raspberry Pi and a TCS3200 colour sensor.

Raspberry Pi Color Sensor (TCS3200) Interface | Color Detector

A simple Raspberry Pi based project using TCS3200 Color Sensor. The project demonstrates how to interface a Color Sensor (like TCS3200) with Raspberry Pi and implement a simple Color Detector using Raspberry Pi.

What is a TCS3200 colour sensor?

Colour sensors sense reflected light from nearby objects. The bright light of the TCS3200’s on-board white LEDs hits an object’s surface and is reflected back. The sensor has an 8×8 array of photodiodes, which are covered by either a red, blue, green, or clear filter. The type of filter determines what colour a diode can detect. Then the overall colour of an object is determined by how much light of each colour it reflects. (For example, a red object reflects mostly red light.)

Colour sensing with the TCS3200 Color Sensor and a Raspberry Pi

As Electronics Hub explains:

TCS3200 is one of the easily available colour sensors that students and hobbyists can work on. It is basically a light-to-frequency converter, i.e. based on colour and intensity of the light falling on it, the frequency of its output signal varies.

I’ll save you a physics lesson here, but you can find a detailed explanation of colour sensing and the TCS3200 on the Electronics Hub blog.

Raspberry Pi colour sensor

The TCS3200 colour sensor is connected to several of the onboard General Purpose Input Output (GPIO) pins on the Raspberry Pi.

Colour sensing with the TCS3200 Color Sensor and a Raspberry Pi

These connections allow the Raspberry Pi 3 to run one of two Python scripts that Electronics Hub has written for the project. The first displays the RAW RGB values read by the sensor. The second detects the primary colours red, green, and blue, and it can be expanded for more colours with the help of the first script.
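Electronics Hub's own scripts are linked from their tutorial. As a rough idea of how such a sensor can be read in Python (a hedged sketch, not their code), the snippet below selects a colour filter with the S2/S3 pins and estimates the output frequency by timing a burst of pulses. The GPIO pin numbers and pulse count are assumptions to adapt to your own wiring, and the S0/S1 frequency-scaling pins are assumed to be set to a fixed scaling in hardware.

import time
import RPi.GPIO as GPIO

S2, S3, OUT = 23, 24, 25              # hypothetical BCM pin numbers
GPIO.setmode(GPIO.BCM)
GPIO.setup([S2, S3], GPIO.OUT)
GPIO.setup(OUT, GPIO.IN)

def read_frequency(s2_level, s3_level, pulses=50):
    # Select a colour filter via S2/S3, then time a burst of output pulses
    GPIO.output(S2, s2_level)
    GPIO.output(S3, s3_level)
    time.sleep(0.1)                   # let the output settle
    start = time.time()
    for _ in range(pulses):
        GPIO.wait_for_edge(OUT, GPIO.FALLING)
    return pulses / (time.time() - start)   # pulses per second ~ light intensity

try:
    while True:
        red = read_frequency(GPIO.LOW, GPIO.LOW)      # red filter
        blue = read_frequency(GPIO.LOW, GPIO.HIGH)    # blue filter
        green = read_frequency(GPIO.HIGH, GPIO.HIGH)  # green filter
        print("R: %.0f  G: %.0f  B: %.0f" % (red, green, blue))
        time.sleep(1)
finally:
    GPIO.cleanup()

The raw frequencies can then be calibrated against known colour samples, which is broadly how a colour detector like the second script distinguishes red, green, and blue.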

Colour sensing with the TCS3200 Color Sensor and a Raspberry Pi

Electronics Hub’s complete build uses a breadboard for simple prototyping

Use it in your projects

This colour sensing setup is a simple means of adding a new dimension to your builds. Why not build a candy-sorting robot that organises your favourite sweets by colour? Or add colour sensing to your line-following buggy to allow for multiple path options!

If your Raspberry Pi project uses colour sensing, we’d love to see it, so be sure to share it in the comments!


Build a house in Minecraft using Python

Post Syndicated from Rob Zwetsloot original https://www.raspberrypi.org/blog/build-minecraft-house-using-python/

In this tutorial from The MagPi issue 68, Steve Martin takes us through the process of house-building in Minecraft Pi. Get your copy of The MagPi in stores now, or download it as a free PDF here.

Minecraft Pi is provided for free as part of the Raspbian operating system. To start your Minecraft: Pi Edition adventures, try our free tutorial Getting started with Minecraft.

Minecraft Raspberry Pi

Writing programs that create things in Minecraft is not only a great way to learn how to code, but it also means that you have a program that you can run again and again to make as many copies of your Minecraft design as you want. You never need to worry about your creation being destroyed by your brother or sister ever again — simply rerun your program and get it back! Whilst it might take a little longer to write the program than to build one house, once it’s finished you can build as many houses as you want.

Co-ordinates in Minecraft

Let’s start with a review of the coordinate system that Minecraft uses to know where to place blocks. If you are already familiar with this, you can skip to the next section. Otherwise, read on.

Minecraft Raspberry Pi Edition

Plan view of our house design

Minecraft shows us a three-dimensional (3D) view of the world. Imagine that the room you are in is the Minecraft world and you want to describe your location within that room. You can do so with three numbers, as follows:

  • How far across the room are you? As you move from side to side, you change this number. We can consider this value to be our X coordinate.
  • How high off the ground are you? If you are upstairs, or if you jump, this value increases. We can consider this value to be our Y coordinate.
  • How far into the room are you? As you walk forwards or backwards, you change this number. We can consider this value to be our Z coordinate.

You might have done graphs in school with X going across the page and Y going up the page. Coordinates in Minecraft are very similar, except that we have an extra value, Z, for our third dimension. Don’t worry if this still seems a little confusing: once we start to build our house, you will see how these three dimensions work in Minecraft.

Designing our house

It is a good idea to start with a rough design for our house. This will help us to work out the values for the coordinates when we are adding doors and windows to our house. You don’t have to plan every detail of your house right away. It is always fun to enhance it once you have got the basic design written. The image above shows the plan view of the house design that we will be creating in this tutorial. Note that because this is a plan view, it only shows the X and Z co-ordinates; we can’t see how high anything is. Hopefully, you can imagine the house extending up from the screen.

We will build our house close to where the Minecraft player is standing. This a good idea when creating something in Minecraft with Python, as it saves us from having to walk around the Minecraft world to try to find our creation.

Starting our program

Type in the code as you work through this tutorial. You can use any editor you like; we would suggest either Python 3 (IDLE) or Thonny Python IDE, both of which you can find on the Raspberry Pi menu under Programming. Start by selecting the File menu and creating a new file. Save the file with a name of your choice; it must end with .py so that the Raspberry Pi knows that it is a Python program.

It is important to enter the code exactly as it is shown in the listing. Pay particular attention to both the spelling and capitalisation (upper- or lower-case letters) used. You may find that when you run your program the first time, it doesn’t work. This is very common and just means there’s a small error somewhere. The error message will give you a clue about where the error is.

It is good practice to start all of your Python programs with the first line shown in our listing. All other lines that start with a # are comments. These are ignored by Python, but they are a good way to remind us what the program is doing.

The two lines starting with from tell Python about the Minecraft API; this is a code library that our program will be using to talk to Minecraft. The line starting mc = creates a connection between our Python program and the game. Then we get the player’s location broken down into three variables: x, y, and z.
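A minimal sketch of that setup (the full listing is in The MagPi) looks like this:

from mcpi.minecraft import Minecraft
from mcpi import block

# Connect our Python program to the running Minecraft: Pi Edition game
mc = Minecraft.create()

# Break the player's location down into three variables
pos = mc.player.getTilePos()
x, y, z = pos.x, pos.y, pos.z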

Building the shell of our house

To help us build our house, we define three variables that specify its width, height, and depth. Defining these variables makes it easy for us to change the size of our house later; it also makes the code easier to understand when we are setting the co-ordinates of the Minecraft bricks. For now, we suggest that you use the same values that we have; you can go back and change them once the house is complete and you want to alter its design.

It’s now time to start placing some bricks. We create the shell of our house with just two lines of code! These lines of code each use the setBlocks command to create a complete block of bricks. This function takes the following arguments:

setBlocks(x1, y1, z1, x2, y2, z2, block-id, data)

x1, y1, and z1 are the coordinates of one corner of the block of bricks that we want to create; x2, y2, and z2 are the coordinates of the other corner. The block-id is the type of block that we want to use. Some blocks require another value called data; we will see this being used later, but you can ignore it for now.

We have to work out the values that we need to use in place of x1, y1, z1, x2, y2, z2 for our walls. Note that what we want is a larger outer block made of bricks, filled with a slightly smaller block of air. Yes, in Minecraft even air is actually just another type of block.
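As an illustration of those two lines (the dimensions and coordinates here are placeholders, not the listing's exact values):

width, height, depth = 10, 5, 8      # house dimensions, as defined earlier
hx, hy, hz = x + 2, y, z + 2         # build a couple of blocks away from the player

# Outer shell of bricks...
mc.setBlocks(hx, hy, hz, hx + width, hy + height, hz + depth, block.BRICK_BLOCK.id)
# ...hollowed out by filling the inside with a slightly smaller block of air
mc.setBlocks(hx + 1, hy, hz + 1, hx + width - 1, hy + height - 1, hz + depth - 1, block.AIR.id)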

Once you have typed in the two lines that create the shell of your house, you are almost ready to run your program. Before doing so, you must have Minecraft running and displaying the contents of your world. Do not have a world loaded with things that you have created, as they may get destroyed by the house that we are building. Go to a clear area in the Minecraft world before running the program. When you run your program, check for any errors in the ‘console’ window and fix them, running the program again until you’ve corrected all the errors.

You should see a block of bricks now, as shown above. You may have to turn the player around in the Minecraft world before you can see your house.

Adding the floor and door

Now, let’s make our house a bit more interesting! Add the lines for the floor and door. Note that the floor extends beyond the boundary of the wall of the house; can you see how we achieve this?

Hint: look closely at how we calculate the x and z attributes as compared to when we created the house shell above. Also note that we use a value of y-1 to create the floor below our feet.

Minecraft doors are two blocks high, so we have to create them in two parts. This is where we have to use the data argument. A value of 0 is used for the lower half of the door, and a value of 8 is used for the upper half (the part with the windows in it). These values will create an open door. If we add 4 to each of these values, a closed door will be created.
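Continuing the illustrative sketch from above (again, not the listing's exact coordinates):

# Floor one block below our feet, extending one block beyond the walls
mc.setBlocks(hx - 1, hy - 1, hz - 1,
             hx + width + 1, hy - 1, hz + depth + 1, block.STONE.id)

# A two-block-high door in the front wall, placed one half at a time
door_x = hx + width // 2
mc.setBlock(door_x, hy, hz, block.DOOR_WOOD.id, 0)       # lower half, data 0 = open
mc.setBlock(door_x, hy + 1, hz, block.DOOR_WOOD.id, 8)   # upper half, data 8 = open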

Before you run your program again, move to a new location in Minecraft to build the house away from the previous one. Then run it to check that the floor and door are created; you will need to fix any errors again. Even if your program runs without errors, check that the floor and door are positioned correctly. If they aren’t, then you will need to check that the arguments to setBlock and setBlocks are exactly as shown in the listing.

Adding windows

Hopefully you will agree that your house is beginning to take shape! Now let’s add some windows. Looking at the plan for our house, we can see that there is a window on each side; see if you can follow along. Add the four lines of code, one for each window.
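One illustrative way to write those four lines, continuing the sketch (the window positions are assumptions, not the listing's values):

# One glass window in each of the four walls
mc.setBlocks(hx + 2, hy + 2, hz, hx + 3, hy + 3, hz, block.GLASS.id)                    # front
mc.setBlocks(hx + 2, hy + 2, hz + depth, hx + 3, hy + 3, hz + depth, block.GLASS.id)    # back
mc.setBlocks(hx, hy + 2, hz + 2, hx, hy + 3, hz + 3, block.GLASS.id)                    # left
mc.setBlocks(hx + width, hy + 2, hz + 2, hx + width, hy + 3, hz + 3, block.GLASS.id)    # right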

Now you can move to yet another location and run the program again; you should have a window on each side of the house. Our house is starting to look pretty good!

Adding a roof

The final stage is to add a roof to the house. To do this we are going to use wooden stairs. We will do this inside a loop so that if you change the width of your house, more layers are added to the roof. Enter the rest of the code. Be careful with the indentation: I recommend using spaces and avoiding the use of tabs. After the if statement, you need to indent the code even further. Each indentation level needs four spaces, so below the line with if on it, you will need eight spaces.

Since some of these code lines are lengthy and indented a lot, you may well find that the text wraps around as you reach the right-hand side of your editor window — don’t worry about this. You will have to be careful to get those indents right, however.
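A rough sketch of such a roof loop follows; the listing's exact loop bounds, stair orientations, and data values may differ.

for i in range(width // 2 + 1):
    # One layer of wooden stairs on each side of the roof per iteration
    mc.setBlocks(hx + i, hy + height + i, hz,
                 hx + i, hy + height + i, hz + depth, block.STAIRS_WOOD.id, 0)
    mc.setBlocks(hx + width - i, hy + height + i, hz,
                 hx + width - i, hy + height + i, hz + depth, block.STAIRS_WOOD.id, 1)
    if i == width // 2:
        # The two sides meet: cap the ridge with a row of wood planks
        mc.setBlocks(hx + i, hy + height + i, hz,
                     hx + i, hy + height + i, hz + depth, block.WOOD_PLANKS.id)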

Now move somewhere new in your world and run the complete program. Iron out any last bugs, then admire your house! Does it look how you expect? Can you make it better?

Customising your house

Now you can start to customise your house. It is a good idea to use Save As in the menu to save a new version of your program. Then you can keep different designs, or refer back to your previous program if you get to a point where you don’t understand why your new one doesn’t work.

Consider these changes:

  • Change the size of your house. Can you also move the door and windows so that they stay in proportion?
  • Change the materials used for the house. An ice house placed in an area of snow would look really cool!
  • Add a back door to your house. Or make the front door a double-width door!

We hope that you have enjoyed writing this program to build a house. Now you can easily add a house to your Minecraft world whenever you want to by simply running this program.

Get the complete code for this project here.

Continue your Minecraft journey

Minecraft Pi’s programmable interface is an ideal platform for learning Python. If you’d like to try more of our free tutorials, check out:

You may also enjoy Martin O’Hanlon’s and David Whale’s Adventures in Minecraft, and the Hacking and Making in Minecraft MagPi Essentials guide, which you can download for free or buy in print here.


SoFi, the underwater robotic fish

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/robotic-fish/

With the Greenland shark finally caught on video for the very first time, scientists and engineers are discussing the limitations of current marine monitoring technology. One significant advance comes from the CSAIL team at Massachusetts Institute of Technology (MIT): SoFi, the robotic fish.

A Robotic Fish Swims in the Ocean

More info: http://bit.ly/SoFiRobot Paper: http://robert.katzschmann.eu/wp-content/uploads/2018/03/katzschmann2018exploration.pdf

The untethered SoFi robot

Last week, the Computer Science and Artificial Intelligence Laboratory (CSAIL) team at MIT unveiled SoFi, “a soft robotic fish that can independently swim alongside real fish in the ocean.”

MIT CSAIL underwater fish SoFi using Raspberry Pi

Directed by a Super Nintendo controller and acoustic signals, SoFi can dive untethered to a maximum of 18 feet for a total of 40 minutes. A Raspberry Pi receives input from the controller and amplifies the ultrasound signals for SoFi via a HiFiBerry. The controller, Raspberry Pi, and HiFiBerry are sealed within a waterproof, cast-moulded silicone membrane filled with non-conductive mineral oil, allowing for underwater equalisation.

MIT CSAIL underwater fish SoFi using Raspberry Pi

The ultrasound signals, received by a modem within SoFi’s head, control everything from direction, tail oscillation, pitch, and depth to the onboard camera.

As explained on MIT’s news blog, “to make the robot swim, the motor pumps water into two balloon-like chambers in the fish’s tail that operate like a set of pistons in an engine. As one chamber expands, it bends and flexes to one side; when the actuators push water to the other channel, that one bends and flexes in the other direction.”

MIT CSAIL underwater fish SoFi using Raspberry Pi

Ocean exploration

While we’ve seen many autonomous underwater vehicles (AUVs) using onboard Raspberry Pis, SoFi’s ability to roam untethered with a wireless waterproof controller is an exciting achievement.

“To our knowledge, this is the first robotic fish that can swim untethered in three dimensions for extended periods of time. We are excited about the possibility of being able to use a system like this to get closer to marine life than humans can get on their own.” – CSAIL PhD candidate Robert Katzschmann

As the MIT news post notes, SoFi’s simple, lightweight setup of a single camera, a motor, and a smartphone lithium polymer battery sets it apart from existing bulky AUVs that require large motors or support from boats.

For more in-depth information on SoFi and the onboard tech that controls it, find the CSAIL team’s paper here.


Your Hard Drive Crashed — Get Working Again Fast with Backblaze

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/how-to-recover-your-files-with-backblaze/

holding a hard drive and diagnostic tools
The worst thing for a computer user has happened. The hard drive on your computer crashed, or your computer is lost or completely unusable.

Fortunately, you’re a Backblaze customer with a current backup in the cloud. That’s great. The challenge is that you’ve got a presentation to make in just 48 hours and the document and materials you need for the presentation were on the hard drive that crashed.

Relax. Backblaze has your data (and your back). The question is, how do you get what you need to make that presentation deadline?

Here are some strategies you could use.

One — The first approach is to get back the presentation file and materials you need to meet your presentation deadline as quickly as possible. You can use another computer (maybe even your smartphone) to make that presentation.

Two — The second approach is to get your computer (or a new computer, if necessary) working again and restore all the files from your Backblaze backup.

Let’s start with Option One, which gets you back to work with just the files you need now as quickly as possible.

Option One — You’ve Got a Deadline and Just Need Your Files

Getting Back to Work Immediately

You want to get your computer working again as soon as possible, but perhaps your top priority is getting access to the files you need for your presentation. The computer can wait.

Find a Computer to Use

First of all, you’re going to need a computer to use. If you have another computer handy, you’re all set. If you don’t, you’re going to need one. Here are some ideas on where to find one:

  • Family and Friends
  • Work
  • Neighbors
  • Local library
  • Local school
  • Community or religious organization
  • Local computer shop
  • Online store

Laptop computer

If you have a smartphone that you can use to give your presentation or to print materials, that’s great. With the Backblaze app for iOS and Android, you can download files directly from your Backblaze account to your smartphone. You also have the option with your smartphone to email or share files from your Backblaze backup so you can use them elsewhere.

Laptop with smartphone

Download The File(s) You Need

Once you have the computer, you need to connect to your Backblaze backup through a web browser or the Backblaze smartphone app.

Backblaze Web Admin

Sign into your Backblaze account. You can download the files directly or use the share link to share files with yourself or someone else.

If you need step-by-step instructions on retrieving your files, see the Restore the Files to the Drive section below. You can also find help at https://help.backblaze.com/hc/en-us/articles/217665888-How-to-Create-a-Restore-from-Your-Backblaze-Backup.

Smartphone App

If you have an iOS or Android smartphone, you can use the Backblaze app and retrieve the files you need. You then could view the file on your phone, use a smartphone app with the file, or email it to yourself or someone else.

Backblaze Smartphone app (iOS)

Backblaze Smartphone app (iOS)

Using one of the approaches above, you got your files back in time for your presentation. Way to go!

Now, the next step is to get the computer with the bad drive running again and restore all your files, or, if that computer is no longer usable, restore your Backblaze backup to a new computer.

Option Two — You Need a Working Computer Again

Getting the Computer with the Failed Drive Running Again (or a New Computer)

If the computer with the failed drive can’t be saved, then you’re going to need a new computer. A new computer likely will come with the operating system installed and ready to boot. If you’ve got a running computer and are ready to restore your files from Backblaze, you can skip forward to Restore the Files to the Drive.

If you need to replace the hard drive in your computer before you restore your files, you can continue reading.

Buy a New Hard Drive to Replace the Failed Drive

The hard drive is gone, so you’re going to need a new drive. If you have a computer or electronics store nearby, you could get one there. Another choice is to order a drive online and pay for one or two-day delivery. You have a few choices:

  1. Buy a hard drive of the same type and size you had
  2. Upgrade to a drive with more capacity
  3. Upgrade to an SSD. SSDs cost more but they are faster, more reliable, and less susceptible to jolts, magnetic fields, and other hazards that can affect a drive. Otherwise, they work the same as a hard disk drive (HDD) and most likely will work with the same connector.



Hard Disk Drive (HDD)

Solid State Drive (SSD)


Be sure that the drive dimensions are compatible with where you’re going to install the drive in your computer, and that the drive connector is compatible with your computer system (SATA, PCIe, etc.). Here’s some help.

Install the Drive

If you’re handy with computers, you can install the drive yourself. It’s not hard, and there are numerous videos on YouTube and elsewhere on how to do this. Just be sure to note how everything was connected so you can get everything connected and put back together correctly. Also, be sure that you discharge any static electricity from your body by touching something metallic before you handle anything inside the computer. If all this sounds like too much to handle, find a friend or a local computer store to help you.

Note:  If the drive that failed is a boot drive for your operating system (either Macintosh or Windows), you need to make sure that the drive is bootable and has the operating system files on it. You may need to reinstall from an operating system source disk or install files.

Restore the Files to the Drive

To start, you will need to sign in to the Backblaze website with your registered email address and password. Visit https://secure.backblaze.com/user_signin.htm to log in.

Sign In to Your Backblaze Account

Selecting the Backup

Once logged in, you will be brought to the account Overview page. On this page, all of the computers registered for backup under your account are shown with some basic information about each. Select the backup from which you wish to restore data by using the appropriate “Restore” button.

Screenshot of Admin for Selecting the Type of Restore

Selecting the Type of Restore

Backblaze offers three different ways in which you can receive your restore data: downloadable ZIP file, USB flash drive, or USB hard drive. The downloadable ZIP restore option will create a ZIP file of the files you request that is made available for download for 7 days. ZIP restores do not have any additional cost and are a great option for individual files or small sets of data.

Depending on the speed of your internet connection to the Backblaze data center, downloadable restores may not always be the best option for restoring very large amounts of data. ZIP restores are limited to 500 GB per request and a maximum of 5 active requests can be submitted under a single account at any given time.

USB flash and hard drive restores are built with the data you request and then shipped to an address of your choosing via FedEx Overnight or FedEx Priority International. USB flash restores cost $99 and can contain up to 128 GB (110,000 MB) of data, and USB hard drive restores cost $189 and can contain up to 4 TB (3,500,000 MB) of data. Both include the cost of shipping.

You can return the USB restore drive within 30 days for a full refund with our Restore Return Refund Program, effectively making the process of restoring free, even with a shipped USB drive.

Screenshot of Admin for Selecting the Backup

Selecting Files for Restore

Using the left-hand file viewer, navigate to the location of the files you wish to restore. You can use the disclosure triangles to see subfolders. Clicking on a folder name will display the folder’s files in the right-hand file viewer. If you are attempting to restore files that have been deleted or are otherwise missing, or files from a failed or disconnected secondary or external hard drive, you may need to change the time frame parameters.

Put checkmarks next to disks, files or folders you’d like to recover. Once you have selected the files and folders you wish to restore, select the “Continue with Restore” button above or below the file viewer. Backblaze will then build the restore via the option you select (ZIP or USB drive). You’ll receive an automated email notifying you when the ZIP restore has been built and is ready for download or when the USB restore drive ships.

If you are using the downloadable ZIP option, and the restore is over 2 GB, we highly recommend using the Backblaze Downloader for better speed and reliability. We have a guide on using the Backblaze Downloader for Mac OS X or for Windows.

For additional assistance, visit our help files at https://help.backblaze.com/hc/en-us/articles/217665888-How-to-Create-a-Restore-from-Your-Backblaze-Backup

Screenshot of Admin for Selecting Files for Restore

Extracting the ZIP

Recent versions of both macOS and Windows have built-in capability to extract files from a ZIP archive. If the built-in capabilities aren’t working for you, you can find additional utilities for Macintosh and Windows.

Reactivating your Backblaze Account

Now that you’ve got a working computer again, you’re going to need to reinstall Backblaze Backup (if it’s not on the system already) and connect with your existing account. Start by downloading and reinstalling Backblaze.

If you’ve restored the files from your Backblaze Backup to your new computer or drive, you don’t want to have to reupload the same files again to your Backblaze backup. To let Backblaze know that this computer is on the same account and has the same files, you need to use “Inherit Backup State.” See https://help.backblaze.com/hc/en-us/articles/217666358-Inherit-Backup-State

Screenshot of Admin for Inherit Backup State

That’s It

You should be all set, either with the files you needed for your presentation, or with a restored computer that is again ready to do productive work.

We hope your presentation wowed ’em.

If you have any additional questions on restoring from a Backblaze backup, please ask away in the comments. Also, be sure to check out our help resources at https://www.backblaze.com/help.html.


LED cubes and how to map them

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/led-panel-cubes-and-how-to-build-them/

Taking inspiration from a cube he had filmed at the 34th Chaos Communication Congress in Leipzig, Germany, polyfloyd gathered friends Sebastius and Boekenwuurm together to create their own.

The build

As polyfloyd’s blog post for the project notes, Sebastius led the way with the hardware portion of the build. The cube is made from six LED panels driven by a Raspberry Pi, and uses a breakout board to support the panels, which are connected in pairs:

The displays are connected in 3 chains, the maximum number of parallel chains the board supports, of 2 panels each. Having a higher degree of parallelization increases the refresh rate, which in turn improves the overall image quality.

The first two chains make up the 4 sides. The remaining chain makes up the top and bottom of the cube.

Sebastius removed the plastic frames that come as standard on the panels, in order to allow them to fit together snugly as a cube. He designed and laser-cut a custom frame from plywood to support the panels instead.

Raspberry Pi LED Cube

Software

The team used hzeller’s software to drive the panels, and polyfloyd wrote their own program to “shove the pixels around”. polyfloyd used Ledcat, software they had made to drive previous LED projects, and adapted this interface so programs written for Ledcat would also work with hzeller’s library.

The full code for the project can be found on polyfloyd’s GitHub profile. It includes the ability to render animations to gzipped files, and to stream animations in real time via SSH.

Mapping 2D and spherical images with shaders

“One of the programs that could work with my LED-panels through [Unix] pipes was Shady,” observes polyfloyd, explaining the use of shaders with the cube. “The program works by rendering OpenGL fragment shaders to an RGB24 format which could then be piped to wherever needed. These shaders are small programs that can render an image by calculating the color for each pixel on the screen individually.”

The team programmed a shader to map the two-dimensional position of pixels in an image to the three-dimensional space of the cube. This then allowed the team to apply the mapping to spherical images, such as the globe in the video below:

The team has interesting plans for the cube moving forward, including the addition of an accelerometer and batteries. Follow their progress on the polyfloyd blog.
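For readers curious how the 2D-to-3D mapping described above works, here is a rough Python sketch of the same idea (the project itself does this in a GLSL shader): a pixel on one cube face is converted to a latitude and longitude, which can then be used to sample an equirectangular image such as the globe.

import math

def cube_face_to_sphere(u, v, size):
    # Map pixel (u, v) on the cube's front face to (longitude, latitude) in radians
    x = 2.0 * u / (size - 1) - 1.0    # normalise to [-1, 1] across the face
    y = 2.0 * v / (size - 1) - 1.0
    z = 1.0                           # the front face sits at z = +1
    r = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)            # angle around the vertical axis
    lat = math.asin(y / r)            # angle above or below the equator
    return lon, lat

# The centre pixel of a 64x64 face points straight ahead at (0, 0)
print(cube_face_to_sphere(31.5, 31.5, 64))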

Fun with LED panels

The internet is full of amazing Raspberry Pi projects that use LED panels. This recent project available on Instructables shows how to assemble and set up a particle generator, while this one, featured on this blog last year, tracks emojis used on the Chelsea Handler: Gotta Go! app.


Improve the Operational Efficiency of Amazon Elasticsearch Service Domains with Automated Alarms Using Amazon CloudWatch

Post Syndicated from Veronika Megler original https://aws.amazon.com/blogs/big-data/improve-the-operational-efficiency-of-amazon-elasticsearch-service-domains-with-automated-alarms-using-amazon-cloudwatch/

A customer has been successfully creating and running multiple Amazon Elasticsearch Service (Amazon ES) domains to support their business users’ search needs across products, orders, support documentation, and a growing suite of similar needs. The service has become heavily used across the organization.  This led to some domains running at 100% capacity during peak times, while others began to run low on storage space. Because of this increased usage, the technical teams were in danger of missing their service level agreements.  They contacted me for help.

This post shows how you can set up automated alarms to warn when domains need attention.

Solution overview

Amazon ES is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time analytics capabilities along with the availability, scalability, and security that production workloads require.  The service offers built-in integrations with a number of other components and AWS services, enabling customers to go from raw data to actionable insights quickly and securely.

One of these other integrated services is Amazon CloudWatch. CloudWatch is a monitoring service for AWS Cloud resources and the applications that you run on AWS. You can use CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources.

CloudWatch collects metrics for Amazon ES. You can use these metrics to monitor the state of your Amazon ES domains, and set alarms to notify you about high utilization of system resources.  For more information, see Amazon Elasticsearch Service Metrics and Dimensions.

While the metrics are automatically collected, the missing piece is how to set alarms on these metrics at appropriate levels for each of your domains. This post includes sample Python code to evaluate the current state of your Amazon ES environment, and to set up alarms according to AWS recommendations and best practices.

There are two components to the sample solution:

  • es-check-cwalarms.py: This Python script checks the CloudWatch alarms that have been set, for all Amazon ES domains in a given account and region.
  • es-create-cwalarms.py: This Python script sets up a set of CloudWatch alarms for a single given domain.

The sample code can also be found in the amazon-es-check-cw-alarms GitHub repo. The scripts are easy to extend or combine, as described in the section “Extensions and Adaptations”.

Assessing the current state

The first script, es-check-cwalarms.py, is used to give an overview of the configurations and alarm settings for all the Amazon ES domains in the given region. The script takes the following parameters:

python es-checkcwalarms.py -h
usage: es-checkcwalarms.py [-h] [-e ESPREFIX] [-n NOTIFY] [-f FREE][-p PROFILE] [-r REGION]
Checks a set of recommended CloudWatch alarms for Amazon Elasticsearch Service domains (optionally, those beginning with a given prefix).
optional arguments:
  -h, --help   		show this help message and exit
  -e ESPREFIX, --esprefix ESPREFIX	Only check Amazon Elasticsearch Service domains that begin with this prefix.
  -n NOTIFY, --notify NOTIFY    List of CloudWatch alarm actions; e.g. ['arn:aws:sns:xxxx']
  -f FREE, --free FREE  Minimum free storage (MB) on which to alarm
  -p PROFILE, --profile PROFILE     IAM profile name to use
  -r REGION, --region REGION       AWS region for the domain. Default: us-east-1

The script first identifies all the domains in the given region (or, optionally, limits them to the subset that begins with a given prefix). It then starts running a set of checks against each one.

The script can be run from the command line or set up as a scheduled Lambda function. For example, for one customer, it was deemed appropriate to regularly run the script to check that alarms were correctly set for all domains. In addition, because configuration changes—cluster size increases to accommodate larger workloads being a common change—might require updates to alarms, this approach allowed the automatic identification of alarms no longer appropriately set as the domain configurations changed.

The output shown below is the output for one domain in my account.

Starting checks for Elasticsearch domain iotfleet , version is 53
Iotfleet Automated snapshot hour (UTC): 0
Iotfleet Instance configuration: 1 instances; type:m3.medium.elasticsearch
Iotfleet Instance storage definition is: 4 GB; free storage calced to: 819.2 MB
iotfleet Desired free storage set to (in MB): 819.2
iotfleet WARNING: Not using VPC Endpoint
iotfleet WARNING: Does not have Zone Awareness enabled
iotfleet WARNING: Instance count is ODD. Best practice is for an even number of data nodes and zone awareness.
iotfleet WARNING: Does not have Dedicated Masters.
iotfleet WARNING: Neither index nor search slow logs are enabled.
iotfleet WARNING: EBS not in use. Using instance storage only.
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-ClusterStatus.yellow-Alarm ClusterStatus.yellow
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-ClusterStatus.red-Alarm ClusterStatus.red
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-CPUUtilization-Alarm CPUUtilization
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-JVMMemoryPressure-Alarm JVMMemoryPressure
iotfleet WARNING: Missing alarm!! ('ClusterIndexWritesBlocked', 'Maximum', 60, 5, 'GreaterThanOrEqualToThreshold', 1.0)
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-AutomatedSnapshotFailure-Alarm AutomatedSnapshotFailure
iotfleet Alarm: Threshold does not match: Test-Elasticsearch-iotfleet-FreeStorageSpace-Alarm Should be:  819.2 ; is 3000.0

The output messages fall into the following categories:

  • System overview, Informational: The Amazon ES version and configuration, including instance type and number, storage, automated snapshot hour, etc.
  • Free storage: A calculation for the appropriate amount of free storage, based on the recommended 20% of total storage.
  • Warnings: best practices that are not being followed for this domain. (For more about this, read on.)
  • Alarms: An assessment of the CloudWatch alarms currently set for this domain, against a recommended set.

The script contains an array of recommended CloudWatch alarms, based on best practices for these metrics and statistics. Using the array allows alarm parameters (such as free space) to be updated within the code based on current domain statistics and configurations.

For a given domain, the script checks if each alarm has been set. If the alarm is set, it checks whether the values match those in the array esAlarms. In the output above, you can see three different situations being reported:

  • Alarm ok; definition matches. The alarm set for the domain matches the settings in the array.
  • Alarm: Threshold does not match. An alarm exists, but the threshold value at which the alarm is triggered does not match.
  • WARNING: Missing alarm!! The recommended alarm is missing.

All in all, the list above shows that this domain does not have a configuration that adheres to best practices, nor does it have all the recommended alarms.
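As a rough illustration of the kind of comparison the script performs (this is a hedged boto3 sketch, not the repository's code; the alarm name and threshold are placeholders), an existing alarm can be fetched and checked against a recommended definition:

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

recommended_threshold = 819.2   # MB of free storage, from the 20% rule above
alarm_name = "Test-Elasticsearch-iotfleet-FreeStorageSpace-Alarm"

response = cloudwatch.describe_alarms(AlarmNames=[alarm_name])
if not response["MetricAlarms"]:
    print("WARNING: Missing alarm!!", alarm_name)
else:
    alarm = response["MetricAlarms"][0]
    if alarm["Threshold"] != recommended_threshold:
        print("Alarm: Threshold does not match:", alarm_name,
              "Should be:", recommended_threshold, "; is", alarm["Threshold"])
    else:
        print("Alarm ok; definition matches.", alarm_name)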

Setting up alarms

Now that you know that the domains in their current state are missing critical alarms, you can correct the situation.

To demonstrate the script, set up a new domain named “ver”, in us-west-2. Specify 1 node, and a 10-GB EBS disk. Also, create an SNS topic in us-west-2 with a name of “sendnotification”, which sends you an email.

Run the second script, es-create-cwalarms.py, from the command line. This script creates (or updates) the desired CloudWatch alarms for the specified Amazon ES domain, “ver”.

python es-create-cwalarms.py -r us-west-2 -e test -c ver -n "['arn:aws:sns:us-west-2:xxxxxxxxxx:sendnotification']"
EBS enabled: True type: gp2 size (GB): 10 No Iops 10240  total storage (MB)
Desired free storage set to (in MB): 2048.0
Creating  Test-Elasticsearch-ver-ClusterStatus.yellow-Alarm
Creating  Test-Elasticsearch-ver-ClusterStatus.red-Alarm
Creating  Test-Elasticsearch-ver-CPUUtilization-Alarm
Creating  Test-Elasticsearch-ver-JVMMemoryPressure-Alarm
Creating  Test-Elasticsearch-ver-FreeStorageSpace-Alarm
Creating  Test-Elasticsearch-ver-ClusterIndexWritesBlocked-Alarm
Creating  Test-Elasticsearch-ver-AutomatedSnapshotFailure-Alarm
Successfully finished creating alarms!

As with the first script, this script contains an array of recommended CloudWatch alarms, based on best practices for these metrics and statistics. This approach allows you to add or modify alarms based on your use case (more on that below).
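As a hedged sketch of what such a script does under the hood (not the actual es-create-cwalarms.py code; the domain name, account ID, and SNS ARN are placeholders), one of the recommended alarms could be created with boto3 like this:

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")

cloudwatch.put_metric_alarm(
    AlarmName="Test-Elasticsearch-ver-ClusterStatus.red-Alarm",
    AlarmDescription="Alarm when the cluster status is red",
    Namespace="AWS/ES",
    MetricName="ClusterStatus.red",
    Dimensions=[
        {"Name": "DomainName", "Value": "ver"},
        {"Name": "ClientId", "Value": "123456789012"},   # your AWS account ID
    ],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=1.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-west-2:123456789012:sendnotification"],
)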

After running the script, navigate to Alarms on the CloudWatch console. You can see the set of alarms set up on your domain.

Because the “ver” domain has only a single node, cluster status is yellow, and that alarm is in an “ALARM” state. It’s already sent a notification that the alarm has been triggered.

What to do when an alarm triggers

After alarms are set up, you need to identify the correct action to take for each alarm, which depends on the alarm triggered. For ideas, guidance, and additional pointers to supporting documentation, see Get Started with Amazon Elasticsearch Service: Set CloudWatch Alarms on Key Metrics. For information about common errors and recovery actions to take, see Handling AWS Service Errors.

In most cases, the alarm triggers due to an increased workload. The likely action is to reconfigure the system to handle the increased workload, rather than reducing the incoming workload. Reconfiguring any backend store—a category of systems that includes Elasticsearch—is best performed when the system is quiescent or lightly loaded. Reconfigurations such as setting zone awareness or modifying the disk type cause Amazon ES to enter a “processing” state, potentially disrupting client access.

Other changes, such as increasing the number of data nodes, may cause Elasticsearch to begin moving shards, potentially impacting search performance on these shards while this is happening. These actions should be considered in the context of your production usage. For the same reason I also do not recommend running a script that resets all domains to match best practices.

Avoid the need to reconfigure during heavy workload by setting alarms at a level that allows a considered approach to making the needed changes. For example, if you identify that each weekly peak is increasing, you can reconfigure during a weekly quiet period.

While Elasticsearch can be reconfigured without being quiesced, it is not a best practice to automatically scale it up and down based on usage patterns. Unlike some other AWS services, I recommend against setting a CloudWatch action that automatically reconfigures the system when alarms are triggered.

There are other situations where the planned reconfiguration approach may not work, such as low or zero free disk space causing the domain to reject writes. If the business is dependent on the domain continuing to accept incoming writes and deleting data is not an option, the team may choose to reconfigure immediately.

Extensions and adaptations

You may wish to modify the best practices encoded in the scripts for your own environment or workloads. It’s always better to avoid situations where alerts are generated but routinely ignored. All alerts should trigger a review and one or more actions, either immediately or at a planned date. The following is a list of common situations where you may wish to set different alarms for different domains:

  • Dev/test vs. production
    You may have a different set of configuration rules and alarms for your dev and test environments than for production. For example, you may require zone awareness and dedicated masters for your production environment, but not for your development domains. Or, you may not have any alarms set in dev. For test environments that mirror your potential peak load, test to ensure that the alarms are appropriately triggered.
  • Differing workloads or SLAs for different domains
    You may have one domain with a requirement for superfast search performance, and another domain with a heavy ingest load that tolerates slower search response. Your reaction to slow response for these two workloads is likely to be different, so perhaps the thresholds for these two domains should be set at a different level. In this case, you might add a “max CPU utilization” alarm at 100% for 1 minute for the fast search domain, while the other domain only triggers an alarm when the average has been higher than 60% for 5 minutes. You might also add a “free space” rule with a higher threshold to reflect the need for more space for the heavy ingest load if there is danger that it could fill the available disk quickly.
  • “Normal” alarms versus “emergency” alarms
    If, for example, free disk space drops to 25% of total capacity, an alarm is triggered that indicates action should be taken as soon as possible, such as cleaning up old indexes or reconfiguring at the next quiet period for this domain. However, if free space drops below a critical level (20% free space), action must be taken immediately in order to prevent Amazon ES from setting the domain to read-only. Similarly, if the “ClusterIndexWritesBlocked” alarm triggers, the domain has already stopped accepting writes, so immediate action is needed. In this case, you may wish to set “laddered” alarms, where one threshold causes an alarm to be triggered to review the current workload for a planned reconfiguration, but a different threshold raises a “DefCon 3” alarm that immediate action is required.

The sample scripts provided here are a starting point, intended for you to adapt to your own environment and needs.

Running the scripts one time can identify how far your current state is from your desired state, and create an initial set of alarms. Regularly re-running these scripts can capture changes in your environment over time and adjust your alarms to match your current configurations. One customer has set them up to run nightly, and to automatically create and update alarms to match their preferred settings.

Removing unwanted alarms

Each CloudWatch alarm costs approximately $0.10 per month. You can remove unwanted alarms in the CloudWatch console, under Alarms. If you set up a “ver” domain above, remember to remove it to avoid continuing charges.

Conclusion

Setting CloudWatch alarms appropriately for your Amazon ES domains can help you avoid suboptimal performance and allow you to respond to workload growth or configuration issues well before they become urgent. This post gives you a starting point for doing so. The additional sleep you’ll get knowing you don’t need to be concerned about Elasticsearch domain performance will allow you to focus on building creative solutions for your business and solving problems for your customers.

Enjoy!


Additional Reading

If you found this post useful, be sure to check out Analyzing Amazon Elasticsearch Service Slow Logs Using Amazon CloudWatch Logs Streaming and Kibana and Get Started with Amazon Elasticsearch Service: How Many Shards Do I Need?

 


About the Author

Dr. Veronika Megler is a senior consultant at Amazon Web Services. She works with our customers to implement innovative big data, AI and ML projects, helping them accelerate their time-to-value when using AWS.

 

 

 

Best Practices for Running Apache Kafka on AWS

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-kafka-on-aws/

This post was written in partnership with Intuit to share learnings, best practices, and recommendations for running an Apache Kafka cluster on AWS. Thanks to Vaishak Suresh and his colleagues at Intuit for their contribution and support.

Intuit, in their own words: Intuit, a leading enterprise customer for AWS, is a creator of business and financial management solutions. For more information on how Intuit partners with AWS, see our previous blog post, Real-time Stream Processing Using Apache Spark Streaming and Apache Kafka on AWS.

Apache Kafka is an open-source, distributed streaming platform that enables you to build real-time streaming applications.

The best practices described in this post are based on our experience in running and operating large-scale Kafka clusters on AWS for more than two years. Our intent for this post is to help AWS customers who are currently running Kafka on AWS, and also customers who are considering migrating on-premises Kafka deployments to AWS.

AWS offers Amazon Kinesis Data Streams, a Kafka alternative that is fully managed.

Running your Kafka deployment on Amazon EC2 provides a high performance, scalable solution for ingesting streaming data. AWS offers many different instance types and storage option combinations for Kafka deployments. However, given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.

In this blog post, we cover the following aspects of running Kafka clusters on AWS:

  • Deployment considerations and patterns
  • Storage options
  • Instance types
  • Networking
  • Upgrades
  • Performance tuning
  • Monitoring
  • Security
  • Backup and restore

Note: While implementing Kafka clusters in a production environment, make sure also to consider factors like your number of messages, message size, monitoring, failure handling, and any operational issues.

Deployment considerations and patterns

In this section, we discuss various deployment options available for Kafka on AWS, along with pros and cons of each option. A successful deployment starts with thoughtful consideration of these options. Considering availability, consistency, and operational overhead of the deployment helps when choosing the right option.

Single AWS Region, Three Availability Zones, All Active

One typical deployment pattern (all active) is in a single AWS Region with three Availability Zones (AZs). One Kafka cluster is deployed in each AZ along with Apache ZooKeeper and Kafka producer and consumer instances as shown in the illustration following.

In this pattern, this is the Kafka cluster deployment:

  • Kafka producers and Kafka cluster are deployed on each AZ.
  • Data is distributed evenly across three Kafka clusters by using Elastic Load Balancer.
  • Kafka consumers aggregate data from all three Kafka clusters.

Kafka cluster failover occurs this way:

  • Mark down all Kafka producers
  • Stop consumers
  • Debug and restack Kafka
  • Restart consumers
  • Restart Kafka producers

Following are the pros and cons of this pattern.

Pros:

  • Highly available
  • Can sustain the failure of two AZs
  • No message loss during failover
  • Simple deployment

Cons:

  • Very high operational overhead:
    • All changes need to be deployed three times, one for each Kafka cluster
    • Maintaining and monitoring three Kafka clusters
    • Maintaining and monitoring three consumer clusters

A restart is required for patching and upgrading brokers in a Kafka cluster. In this approach, a rolling upgrade is done separately for each cluster.

Single Region, Three Availability Zones, Active-Standby

Another typical deployment pattern (active-standby) is in a single AWS Region with a single Kafka cluster and Kafka brokers and Zookeepers distributed across three AZs. Another similar Kafka cluster acts as a standby as shown in the illustration following. You can use Kafka mirroring with MirrorMaker to replicate messages between any two clusters.

In this pattern, this is the Kafka cluster deployment:

  • Kafka producers are deployed on all three AZs.
  • Only one Kafka cluster is deployed across three AZs (active).
  • ZooKeeper instances are deployed on each AZ.
  • Brokers are spread evenly across all three AZs.
  • Kafka consumers can be deployed across all three AZs.
  • Standby Kafka producers and a Multi-AZ Kafka cluster are part of the deployment.

Kafka cluster failover occurs this way:

  • Switch traffic to standby Kafka producers cluster and Kafka cluster.
  • Restart consumers to consume from standby Kafka cluster.

Following are the pros and cons of this pattern.

Pros:

  • Less operational overhead when compared to the first option
  • Only one Kafka cluster to manage and consume data from
  • Can handle single AZ failures without activating a standby Kafka cluster

Cons:

  • Added latency due to cross-AZ data transfer among Kafka brokers
  • For Kafka versions before 0.10, replicas for topic partitions have to be assigned so they’re distributed to the brokers on different AZs (rack awareness)
  • The cluster can become unavailable in case of a network glitch, where ZooKeeper does not see Kafka brokers
  • Possibility of in-transit message loss during failover

Intuit recommends using a single Kafka cluster in one AWS Region, with brokers distributed across three AZs (single region, three AZs). This approach offers strong fault tolerance, because a failed AZ won’t cause Kafka downtime.

Storage options

There are two storage options for file storage in Amazon EC2: ephemeral storage (instance store) and Amazon Elastic Block Store (Amazon EBS).

Ephemeral storage is local to the Amazon EC2 instance. It can provide high IOPS based on the instance type. On the other hand, Amazon EBS volumes offer higher resiliency and you can configure IOPS based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. Your choice of storage is closely related to the type of workload supported by your Kafka cluster.

Kafka provides built-in fault tolerance by replicating data partitions across a configurable number of instances. If a broker fails, you can recover it by fetching all the data from other brokers in the cluster that host the other replicas. Depending on the size of the data transfer, this can affect the recovery process and network traffic. These, in turn, eventually affect the cluster’s performance.

The following table contrasts the benefits of using an instance store versus using EBS for storage.

Instance store:

  • Instance storage is recommended for large- and medium-sized Kafka clusters. For a large cluster, read/write traffic is distributed across a high number of brokers, so the loss of a broker has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important; a failed broker takes longer to recover and generates more network traffic in a smaller Kafka cluster.
  • Storage-optimized instances like h1, i3, and d2 are an ideal choice for distributed applications like Kafka.

Amazon EBS:

  • The primary advantage of using EBS in a Kafka deployment is that it significantly reduces data-transfer traffic when a broker fails or must be replaced. The replacement broker joins the cluster much faster.
  • Data stored on EBS is persisted in case of an instance failure or termination. The broker’s data stored on an EBS volume remains intact, and you can mount the EBS volume to a new EC2 instance. Most of the replicated data for the replacement broker is already available in the EBS volume and need not be copied over the network from another broker. Only the changes made after the original broker failure need to be transferred across the network. That makes this process much faster.

Intuit chose EBS because of their frequent instance restacking requirements and also other benefits provided by EBS.

Generally, Kafka deployments use a replication factor of three. EBS offers replication within its service, so Intuit chose a replication factor of two instead of three.

Instance types

The choice of instance types is generally driven by the type of storage required for your streaming applications on a Kafka cluster. If your application requires ephemeral storage, h1, i3, and d2 instances are your best option.

Intuit used r3.xlarge instances for their brokers and r3.large for ZooKeeper, with ST1 (throughput optimized HDD) EBS for their Kafka cluster.

Here are sample benchmark numbers from Intuit tests.

Configuration:

  • r3.xlarge
  • ST1 EBS
  • 12 brokers
  • 12 partitions

Broker bytes (MB/s): aggregate 346.9

If you need EBS storage, then AWS has a newer-generation r4 instance. The r4 instance is superior to R3 in many ways:

  • It has a faster processor (Broadwell).
  • EBS is optimized by default.
  • It features networking based on Elastic Network Adapter (ENA), with up to 10 Gbps on smaller sizes.
  • It costs 20 percent less than R3.

Note: It’s always best practice to check for the latest changes in instance types.

Networking

The network plays a very important role in a distributed system like Kafka. A fast and reliable network ensures that nodes can communicate with each other easily. The available network throughput controls the maximum amount of traffic that Kafka can handle. Network throughput, combined with disk storage, is often the governing factor for cluster sizing.

If you expect your cluster to receive high read/write traffic, select an instance type that offers 10-Gb/s performance.

In addition, choose an option that keeps interbroker network traffic on the private subnet, because this approach allows clients to connect to the brokers. Communication between brokers and clients uses the same network interface and port. For more details, see the documentation about IP addressing for EC2 instances.

If you are deploying in more than one AWS Region, you can connect the two VPCs in the two AWS Regions using cross-region VPC peering. However, be aware of the networking costs associated with cross-AZ deployments.

Upgrades

Kafka has a history of not being backward compatible, but its support of backward compatibility is getting better. During a Kafka upgrade, you should keep your producer and consumer clients on a version equal to or lower than the version you are upgrading from. After the upgrade is finished, you can start using a new protocol version and any new features it supports. There are three upgrade approaches available, discussed following.

Rolling or in-place upgrade

In a rolling or in-place upgrade scenario, upgrade one Kafka broker at a time. Take into consideration the recommendations for doing rolling restarts to avoid downtime for end users.

Downtime upgrade

If you can afford the downtime, you can take your entire cluster down, upgrade each Kafka broker, and then restart the cluster.

Blue/green upgrade

Intuit followed the blue/green deployment model for their workloads, as described following.

If you can afford to create a separate Kafka cluster and upgrade it, we highly recommend the blue/green upgrade scenario. In this scenario, we recommend that you keep your clusters up-to-date with the latest Kafka version. For additional details on Kafka version upgrades, see the Kafka upgrade documentation.

The following illustration shows a blue/green upgrade.

In this scenario, the upgrade plan works like this:

  • Create a new Kafka cluster on AWS.
  • Create a new Kafka producers stack to point to the new Kafka cluster.
  • Create topics on the new Kafka cluster.
  • Test the green deployment end to end (sanity check).
  • Using Amazon Route 53, change the new Kafka producers stack on AWS to point to the new green Kafka environment that you have created.

The roll-back plan works like this:

  • Switch Amazon Route 53 to the old Kafka producers stack on AWS to point to the old Kafka environment.

For additional details on blue/green deployment architecture using Kafka, see the re:Invent presentation Leveraging the Cloud with a Blue-Green Deployment Architecture.
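
To make the cut-over step concrete, the following is a minimal boto3 sketch of repointing a producer DNS record from the blue environment to the green one. The hosted zone ID, record name, and endpoints are placeholders, and your actual record type and routing policy (for example, weighted records for a gradual shift) may differ.

import boto3

route53 = boto3.client('route53')

# Placeholder values: hosted zone, the DNS name producers resolve, and the green endpoint.
HOSTED_ZONE_ID = 'Z1EXAMPLE'
RECORD_NAME = 'kafka-producers.example.internal.'
GREEN_ENDPOINT = 'green-brokers.example.internal.'

# Repoint the producer record at the green (upgraded) Kafka environment.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        'Comment': 'Cut Kafka producers over from blue to green',
        'Changes': [{
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': RECORD_NAME,
                'Type': 'CNAME',
                'TTL': 60,
                'ResourceRecords': [{'Value': GREEN_ENDPOINT}],
            },
        }],
    },
)

Rolling back is the same call with the blue endpoint substituted for the green one.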

Performance tuning

You can tune Kafka performance in multiple dimensions. Following are some best practices for performance tuning.

These are some general performance tuning techniques (a producer-side configuration sketch follows this list):

  • If throughput is less than network capacity, try the following:
    • Add more threads
    • Increase batch size
    • Add more producer instances
    • Add more partitions
  • To improve latency when acks = -1, increase your num.replica.fetchers value.
  • For cross-AZ data transfer, tune your buffer settings for sockets and for OS TCP.
  • Make sure that num.io.threads is greater than the number of disks dedicated for Kafka.
  • Adjust num.network.threads based on the number of producers plus the number of consumers plus the replication factor.
  • Your message size affects your network bandwidth. To get higher performance from a Kafka cluster, select an instance type that offers 10 Gb/s performance.
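
As an illustration of the producer-side settings above, here is a short sketch using the kafka-python client. The broker addresses and numbers are placeholders; treat them as starting points to benchmark against your own workload rather than recommended values.

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers=['broker1:9092', 'broker2:9092'],  # placeholder brokers
    acks='all',               # same as acks = -1; pair with more num.replica.fetchers on the brokers
    batch_size=64 * 1024,     # larger batches raise throughput at some cost in latency
    linger_ms=10,             # wait briefly so batches can fill
    compression_type='gzip',  # smaller payloads ease network bandwidth pressure
)

for i in range(1000):
    producer.send('example-topic', value=('message %d' % i).encode('utf-8'))
producer.flush()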

For Java and JVM tuning, try the following:

  • Minimize GC pauses by using the Oracle JDK, which uses the new G1 garbage-first collector.
  • Try to keep the Kafka heap size below 4 GB.

Monitoring

Knowing whether a Kafka cluster is working correctly in a production environment is critical. Sometimes, just knowing that the cluster is up is enough, but Kafka applications have many moving parts to monitor. In fact, it can easily become confusing to understand what’s important to watch and what you can set aside. Items to monitor range from simple metrics about the overall rate of traffic, to producers, consumers, brokers, controller, ZooKeeper, topics, partitions, messages, and so on.

For monitoring, Intuit used several tools, including New Relic, Wavefront, Amazon CloudWatch, and AWS CloudTrail. Our recommended monitoring approach follows.

For system metrics, we recommend that you monitor:

  • CPU load
  • Network metrics
  • File handle usage
  • Disk space
  • Disk I/O performance
  • Garbage collection
  • ZooKeeper

For producers, we recommend that you monitor:

  • Batch-size-avg
  • Compression-rate-avg
  • Waiting-threads
  • Buffer-available-bytes
  • Record-queue-time-max
  • Record-send-rate
  • Records-per-request-avg

For consumers, we recommend that you monitor:

  • Batch-size-avg
  • Compression-rate-avg
  • Waiting-threads
  • Buffer-available-bytes
  • Record-queue-time-max
  • Record-send-rate
  • Records-per-request-avg

Security

Like most distributed systems, Kafka provides the mechanisms to transfer data with relatively high security across the components involved. Depending on your setup, security might involve different services such as encryption, Kerberos, Transport Layer Security (TLS) certificates, and advanced access control list (ACL) setup in brokers and ZooKeeper. The following tells you more about the Intuit approach. For details on Kafka security not covered in this section, see the Kafka documentation.

Encryption at rest

For EBS-backed EC2 instances, you can enable encryption at rest by using Amazon EBS volumes with encryption enabled. Amazon EBS uses AWS Key Management Service (AWS KMS) for encryption. For more details, see Amazon EBS Encryption in the EBS documentation. For instance store–backed EC2 instances, you can enable encryption at rest by using Amazon EC2 instance store encryption.
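
As a sketch of what this looks like in practice, the following boto3 call creates an encrypted ST1 volume for a broker. The Availability Zone, size, and KMS key alias are placeholders; omitting KmsKeyId uses the default aws/ebs key.

import boto3

ec2 = boto3.client('ec2')

volume = ec2.create_volume(
    AvailabilityZone='us-east-1a',      # placeholder AZ
    Size=500,                           # GiB, sized for the broker's log directories
    VolumeType='st1',                   # throughput-optimized HDD, as in the Intuit setup
    Encrypted=True,                     # encrypt data at rest with AWS KMS
    KmsKeyId='alias/kafka-ebs-key',     # placeholder key alias
)
print(volume['VolumeId'])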

Encryption in transit

Kafka uses TLS for client and internode communications.

Authentication

Connections to brokers from clients (producers and consumers), from other brokers, and from tools can be authenticated using either Secure Sockets Layer (SSL) or Simple Authentication and Security Layer (SASL).

Kafka supports Kerberos authentication. If you already have a Kerberos server, you can add Kafka to your current configuration.

Authorization

In Kafka, authorization is pluggable and integration with external authorization services is supported.

Backup and restore

The type of storage used in your deployment dictates your backup and restore strategy.

The best way to back up a Kafka cluster based on instance storage is to set up a second cluster and replicate messages using MirrorMaker. Kafka’s mirroring feature makes it possible to maintain a replica of an existing Kafka cluster. Depending on your setup and requirements, your backup cluster might be in the same AWS Region as your main cluster or in a different one.

For EBS-based deployments, you can enable automatic snapshots of EBS volumes to back up volumes. You can easily create new EBS volumes from these snapshots to restore. We recommend storing backup files in Amazon S3.
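
For example, a one-off snapshot of a broker's data volume is a single boto3 call; in practice you would run something like this on a schedule. The volume ID is a placeholder.

import boto3

ec2 = boto3.client('ec2')

snapshot = ec2.create_snapshot(
    VolumeId='vol-0123456789abcdef0',   # placeholder broker data volume
    Description='Backup of Kafka broker data volume',
)
print(snapshot['SnapshotId'])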

For more information on how to back up in Kafka, see the Kafka documentation.

Conclusion

In this post, we discussed several patterns for running Kafka in the AWS Cloud. AWS also provides an alternative managed solution, Amazon Kinesis Data Streams: there are no servers to manage or scaling cliffs to worry about, you can scale the size of your streaming pipeline in seconds without downtime, data replication across Availability Zones is automatic, and you benefit from security out of the box. Kinesis Data Streams is tightly integrated with a wide variety of AWS services, such as Lambda, Amazon Redshift, and Amazon Elasticsearch Service, and it supports open-source frameworks like Storm, Spark, Flink, and more. If you need to connect Kafka with Kinesis, you can refer to the Kafka-Kinesis Connector.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Implement Serverless Log Analytics Using Amazon Kinesis Analytics and Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics.


About the Author

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable big data, machine learning, artificial intelligence, and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies, such as advanced edge computing and machine learning at the edge. In his spare time, he enjoys spending time with his family.

Dotcom: Obama Admitted “Mistakes Were Made” in Megaupload Case

Post Syndicated from Andy original https://torrentfreak.com/dotcom-obama-admitted-mistakes-were-made-in-megaupload-case-180301/

When Megaupload was forcefully shut down in 2012, it initially appeared like ‘just’ another wave of copyright enforcement action by US authorities.

When additional details began to filter through, the reality of what had happened was nothing short of extraordinary.

Not only were large numbers of Megaupload servers and millions of dollars seized, but Kim Dotcom’s home in New Zealand was subjected to a military-style raid comprised of helicopters and dozens of heavily armed special tactics police. The whole thing was monitored live by the FBI.

Few people who watched the events of that now-infamous January day unfold came to the conclusion this was a routine copyright-infringement case. According to Kim Dotcom, whose life had just been turned upside down, something of this scale must’ve filtered down from the very top of the US government. It was hard to disagree.

At the time, Dotcom told TorrentFreak that then-Vice President Joe Biden directed attorney Neil MacBride to target the cloud storage site and ever since the Megaupload founder has leveled increasingly serious allegations at officials of the former government of Barack Obama.

For example, Dotcom says that since the US would have difficulty gaining access to him in his former home of Hong Kong, the government of New Zealand was persuaded to welcome him in, knowing they would eventually turn him over to the United States. More recently he’s been turning up the pressure again, such as a tweet on February 20th which cast more light on that process.

“Joe Biden had a White House meeting with an ‘extradition expert’ who worked for Hong Kong police and a handful of Hollywood executives to discuss my case. A week prior to this meeting Neil MacBride hand-delivered his action plan to Biden’s chief of staff, also at the White House,” Dotcom wrote.

But this claim is just the tip of an extremely large iceberg that’s involved illegal spying on Dotcom in New Zealand and a dizzying array of legal battles that are set to go on for years to come. But perhaps of most interest now is that rather than wilting away under the pressure, Dotcom appears to be just warming up.

A few hours ago Dotcom commented on an article published in The Hill which revealed that Barack Obama will visit New Zealand in March, possibly to celebrate the opening of Air New Zealand’s new route to the U.S.

Rather than expressing disappointment, the Megaupload founder seemed pleased that the former president would be touching down next month.

“Great. I’ll have a Court subpoena waiting for him in New Zealand,” Dotcom wrote.

But that was just a mere hors d’oeuvre; the main course was yet to come. And come it did.

“A wealthy Asian Megaupload shareholder hired a friend of the Obamas to enquire about our case. This person was recommended by a member of the Chinese politburo ‘if you want to get to Obama directly’. We did,” Dotcom revealed.

Dotcom says he’ll release a transcript detailing what Obama told his friend on March 21 when Obama arrives in town but in the meantime, he offered another little taster.

“Mistakes were made. It hasn’t gone well,” Obama reportedly told the person reporting back to Megaupload. “It’s a problem. I’ll see to it after the election.”

Of course, Obama’s position after the election was much different to what had gone before, but that didn’t stop Dotcom’s associates infiltrating the process aimed at keeping the Democrats in power.

“Our friendly Obama contact smuggled an @EFF lawyer into a re-election fundraiser hosted by former Vice President Joe Biden,” he revealed.

“When Biden was asked about the Megaupload case he bragged that it was his case and that he ‘took care of it’,” which is what Dotcom has been claiming all along.

On March 21, when Obama lands in New Zealand, Dotcom says he’ll be waiting.

“I’m looking forward to @BarackObama providing some insight into the political dimension of the Megaupload case when he arrives in the New Zealand jurisdiction,” he teased.

Better get the popcorn ready….

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Troubleshooting event publishing issues in Amazon SES

Post Syndicated from Dustin Taylor original https://aws.amazon.com/blogs/ses/troubleshooting-event-publishing-issues-in-amazon-ses/

Over the past year, we’ve released several features that make it easier to track the metrics that are associated with your Amazon SES account. The first of these features, launched in November of last year, was event publishing.

Initially, event publishing let you capture basic metrics related to your email sending and publish them to other AWS services, such as Amazon CloudWatch and Amazon Kinesis Data Firehose. Some examples of these basic metrics include the number of emails that were sent and delivered, as well as the number that bounced or received complaints. A few months ago, we expanded this feature by adding engagement metrics—specifically, information about the number of emails that your customers opened or engaged with by clicking links.

As a former Cloud Support Engineer, I’ve seen Amazon SES customers do some amazing things with event publishing, but I’ve also seen some common issues. In this article, we look at some of these issues, and discuss the steps you can take to resolve them.

Before we begin

This post assumes that your Amazon SES account is already out of the sandbox, that you’ve verified an identity (such as an email address or domain), and that you have the necessary permissions to use Amazon SES and the service that you’ll publish event data to (such as Amazon SNS, CloudWatch, or Kinesis Data Firehose).

We also assume that you’re familiar with the process of creating configuration sets and specifying event destinations for those configuration sets. For more information, see Using Amazon SES Configuration Sets in the Amazon SES Developer Guide.

Amazon SNS event destinations

If you want to receive notifications when events occur—such as when recipients click a link in an email, or when they report an email as spam—you can use Amazon SNS as an event destination.

Occasionally, customers ask us why they’re not receiving notifications when they use an Amazon SNS topic as an event destination. One of the most common reasons for this issue is that they haven’t configured subscriptions for their Amazon SNS topic yet.

A single topic in Amazon SNS can have one or more subscriptions. When you subscribe to a topic, you tell that topic which endpoints (such as email addresses or mobile phone numbers) to contact when it receives a notification. If you haven’t set up any subscriptions, nothing will happen when an email event occurs.
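
For example, the following boto3 sketch adds an email subscription to a topic. The topic ARN and address are placeholders, and the recipient still has to confirm the subscription before notifications are delivered.

import boto3

sns = boto3.client('sns')

sns.subscribe(
    TopicArn='arn:aws:sns:us-east-1:123456789012:ses-email-events',  # placeholder topic
    Protocol='email',
    Endpoint='alerts@example.com',                                   # placeholder address
)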

For more information about setting up topics and subscriptions, see Getting Started in the Amazon SNS Developer Guide. For information about publishing Amazon SES events to Amazon SNS topics, see Set Up an Amazon SNS Event Destination for Amazon SES Event Publishing in the Amazon SES Developer Guide.

Kinesis Data Firehose event destinations

If you want to store your Amazon SES event data for the long term, choose Amazon Kinesis Data Firehose as a destination for Amazon SES events. With Kinesis Data Firehose, you can stream data to Amazon S3 or Amazon Redshift for storage and analysis.

The process of setting up Kinesis Data Firehose as an event destination is similar to the process for setting up Amazon SNS: you choose the types of events (such as deliveries, opens, clicks, or bounces) that you want to export, and the name of the Kinesis Data Firehose stream that you want to export to. However, there’s one important difference. When you set up a Kinesis Data Firehose event destination, you must also choose the IAM role that Amazon SES uses to send event data to Kinesis Data Firehose.

When you set up the Kinesis Data Firehose event destination, you can choose to have Amazon SES create the IAM role for you automatically. For many users, this is the best solution—it ensures that the IAM role has the appropriate permissions to move event data from Amazon SES to Kinesis Data Firehose.

Customers occasionally run into issues with the Kinesis Data Firehose event destination when they use an existing IAM role. If you use an existing IAM role, or create a new role for this purpose, make sure that the role includes the firehose:PutRecord and firehose:PutRecordBatch permissions. If the role doesn’t include these permissions, then the Amazon SES event data isn’t published to Kinesis Data Firehose. For more information, see Controlling Access with Amazon Kinesis Data Firehose in the Amazon Kinesis Data Firehose Developer Guide.
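
If you manage the role yourself, the inline policy only needs those two actions on your delivery stream. The following boto3 sketch attaches such a policy; the role name, policy name, and stream ARN are placeholders.

import boto3
import json

iam = boto3.client('iam')

# Minimal policy allowing the role to put records into the delivery stream.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
        "Resource": "arn:aws:firehose:us-east-1:123456789012:deliverystream/ses-events"
    }]
}

iam.put_role_policy(
    RoleName='ses-event-publishing-role',       # placeholder role used by the event destination
    PolicyName='AllowSesEventsToFirehose',
    PolicyDocument=json.dumps(policy),
)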

CloudWatch event destinations

By publishing your Amazon SES event data to Amazon CloudWatch, you can create dashboards that track your sending statistics in real time, as well as alarms that notify you when your event metrics reach certain thresholds.

The amount that you’re charged for using CloudWatch is based on several factors, including the number of metrics you use. In order to give you more control over the specific metrics you send to CloudWatch—and to help you avoid unexpected charges—you can limit the email sending events that are sent to CloudWatch.

When you choose CloudWatch as an event destination, you must choose a value source. The value source can be one of three options: a message tag, a link tag, or an email header. After you choose a value source, you then specify a name and a value. When you send an email using a configuration set that refers to a CloudWatch event destination, it only sends the metrics for that email to CloudWatch if the email contains the name and value that you specified as the value source. This requirement is commonly overlooked.

For example, assume that you chose Message Tag as the value source, and specified “CategoryId” as the dimension name and “31415” as the dimension value. When you want to send events for an email to CloudWatch, you must specify the name of the configuration set that uses the CloudWatch destination. You must also include a tag in your message. The name of the tag must be “CategoryId” and the value must be “31415”.
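
Continuing that example, a send that actually produces CloudWatch metrics might look like the following boto3 sketch. The addresses and configuration set name are placeholders; the important parts are the ConfigurationSetName and the message tag that matches the value source.

import boto3

ses = boto3.client('ses')

ses.send_email(
    Source='sender@example.com',                             # placeholder addresses
    Destination={'ToAddresses': ['recipient@example.com']},
    Message={
        'Subject': {'Data': 'Order update'},
        'Body': {'Text': {'Data': 'Thanks for your order!'}},
    },
    ConfigurationSetName='my-cloudwatch-config-set',         # placeholder configuration set
    Tags=[{'Name': 'CategoryId', 'Value': '31415'}],         # must match the CloudWatch value source
)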

For more information about adding tags and email headers to your messages, see Send Email Using Amazon SES Event Publishing in the Amazon SES Developer Guide. For more information about adding tags to links, see Amazon SES Email Sending Metrics FAQs in the Amazon SES Developer Guide.

Troubleshooting event publishing for open and click data

Occasionally, customers ask why they’re not seeing open and click data for their emails. This issue most often occurs when the customer only sends text versions of their emails. Because of the way Amazon SES tracks open and click events, you can only see open and click data for emails that are sent as HTML. For more information about how Amazon SES modifies your emails when you enable open and click tracking, see Amazon SES Email Sending Metrics FAQs in the Amazon SES Developer Guide.

The process that you use to send HTML emails varies based on the email sending method you use. The Code Examples section of the Amazon SES Developer Guide contains examples of several methods of sending email by using the Amazon SES SMTP interface or an AWS SDK. All of the examples in this section include methods for sending HTML (as well as text-only) emails.

If you encounter any issues that weren’t covered in this post, please open a case in the Support Center and we’d be more than happy to assist.

Create SLUG! It’s just like Snake, but with a slug

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/slug-snake/

Recreate Snake, the favourite mobile phone game from the late nineties, using a slug*, a Raspberry Pi, a Sense HAT, and our free resource!

Raspberry Pi Sense HAT Slug free resource

*A virtual slug. Not a real slug. Please leave the real slugs out in nature.

Snake SLUG!

Move aside, Angry Birds! On your bike, Pokémon Go! When it comes to the cream of the crop of mobile phone games, Snake holds the top spot.

Snake Nokia Game

I could while away the hours…

You may still have an old Nokia 3310 lost in the depths of a drawer somewhere — the drawer that won’t open all the way because something inside is jammed at an odd angle. So it will be far easier to grab your Pi and Sense HAT, or use the free Sense HAT emulator (online or on Raspbian), and code Snake SLUG yourself. In doing so, you can introduce the smaller residents of your household to the best reptile-focused game ever made…now with added mollusc.

The resource

To try out the game for yourself, head to our resource page, where you’ll find the online Sense HAT emulator embedded and ready to roll.

Raspberry Pi Sense HAT Slug free resource

It’ll look just like this, and you can use your computer’s arrow keys to direct your slug toward her tasty treats.

From there, you’ll be taken on a step-by-step journey from zero to SLUG glory while coding your own version of the game in Python. On the way, you’ll learn to work with two-dimensional lists and to use the Sense HAT’s pixel display and joystick input. And by completing the resource, you’ll expand your understanding of applying abstraction and decomposition to solve more complex problems, in line with our Digital Making Curriculum.

The Sense HAT

The Raspberry Pi Sense HAT was originally designed and made as part of the Astro Pi mission in December 2015. With an 8×8 RGB LED matrix, a joystick, and a plethora of on-board sensors including an accelerometer, gyroscope, and magnetometer, it’s a great add-on for your digital making toolkit, and excellent for projects involving data collection and evaluation.

You can find more of our free Sense HAT tutorials here, including for making Flappy Bird Astronaut, a marble maze, and Pong.

The post Create SLUG! It’s just like Snake, but with a slug appeared first on Raspberry Pi.

Combine Transactional and Analytical Data Using Amazon Aurora and Amazon Redshift

Post Syndicated from Re Alvarez-Parmar original https://aws.amazon.com/blogs/big-data/combine-transactional-and-analytical-data-using-amazon-aurora-and-amazon-redshift/

A few months ago, we published a blog post about capturing data changes in an Amazon Aurora database and sending it to Amazon Athena and Amazon QuickSight for fast analysis and visualization. In this post, I want to demonstrate how easy it can be to take the data in Aurora and combine it with data in Amazon Redshift using Amazon Redshift Spectrum.

With Amazon Redshift, you can build petabyte-scale data warehouses that unify data from a variety of internal and external sources. Because Amazon Redshift is optimized for complex queries (often involving multiple joins) across large tables, it can handle large volumes of retail, inventory, and financial data without breaking a sweat.

In this post, we describe how to combine data in Aurora with data in Amazon Redshift. Here’s an overview of the solution:

  • Use AWS Lambda functions with Amazon Aurora to capture data changes in a table.
  • Save the data in an Amazon S3 bucket.
  • Query data using Amazon Redshift Spectrum.

We use the following services:

  • Amazon Aurora
  • AWS Lambda
  • Amazon Kinesis Data Firehose
  • Amazon S3
  • Amazon Redshift (with Redshift Spectrum)
  • Amazon QuickSight

Serverless architecture for capturing and analyzing Aurora data changes

Consider a scenario in which an e-commerce web application uses Amazon Aurora for a transactional database layer. The company has a sales table that captures every single sale, along with a few corresponding data items. This information is stored as immutable data in a table. Business users want to monitor the sales data and then analyze and visualize it.

In this example, you take the changes in data in an Aurora database table and save it in Amazon S3. After the data is captured in Amazon S3, you combine it with data in your existing Amazon Redshift cluster for analysis.

By the end of this post, you will understand how to capture data events in an Aurora table and push them out to other AWS services using AWS Lambda.

The following diagram shows the flow of data as it occurs in this tutorial:

The starting point in this architecture is a database insert operation in Amazon Aurora. When the insert statement is executed, a custom trigger calls a Lambda function and forwards the inserted data. Lambda writes the data that it received from Amazon Aurora to a Kinesis data delivery stream. Kinesis Data Firehose writes the data to an Amazon S3 bucket. Once the data is in an Amazon S3 bucket, it is queried in place using Amazon Redshift Spectrum.

Creating an Aurora database

First, create a database by following these steps in the Amazon RDS console:

  1. Sign in to the AWS Management Console, and open the Amazon RDS console.
  2. Choose Launch a DB instance, and choose Next.
  3. For Engine, choose Amazon Aurora.
  4. Choose a DB instance class. This example uses a small instance class, since this is not a production database.
  5. In Multi-AZ deployment, choose No.
  6. Configure DB instance identifier, Master username, and Master password.
  7. Launch the DB instance.

After you create the database, use MySQL Workbench to connect to the database using the CNAME from the console. For information about connecting to an Aurora database, see Connecting to an Amazon Aurora DB Cluster.

The following screenshot shows the MySQL Workbench configuration:

Next, create a table in the database by running the following SQL statement:

Create Table
CREATE TABLE Sales (
InvoiceID int NOT NULL AUTO_INCREMENT,
ItemID int NOT NULL,
Category varchar(255),
Price double(10,2), 
Quantity int not NULL,
OrderDate timestamp,
DestinationState varchar(2),
ShippingType varchar(255),
Referral varchar(255),
PRIMARY KEY (InvoiceID)
)

You can now populate the table with some sample data. To generate sample data in your table, copy and run the following script. Ensure that the placeholder variables (AURORA_CNAME, DBUSER, DBPASSWORD, and DB) are replaced with appropriate values.

#!/usr/bin/python
import MySQLdb
import random
import datetime

# Connect to the Aurora cluster; replace the placeholders with your own values.
db = MySQLdb.connect(host="AURORA_CNAME",
                     user="DBUSER",
                     passwd="DBPASSWORD",
                     db="DB")

states = ("AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN",
"IA","KS","KY","LA","ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ",
"NM","NY","NC","ND","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VT","VA",
"WA","WV","WI","WY")

shipping_types = ("Free", "3-Day", "2-Day")

product_categories = ("Garden", "Kitchen", "Office", "Household")
referrals = ("Other", "Friend/Colleague", "Repeat Customer", "Online Ad")

cursor = db.cursor()

# Insert ten random sales records.
for i in range(0,10):
    item_id = random.randint(1,100)
    state = states[random.randint(0,len(states)-1)]
    shipping_type = shipping_types[random.randint(0,len(shipping_types)-1)]
    product_category = product_categories[random.randint(0,len(product_categories)-1)]
    quantity = random.randint(1,4)
    referral = referrals[random.randint(0,len(referrals)-1)]
    price = random.randint(1,100)
    # Cap the day at 28 so the generated date is valid for every month.
    order_date = datetime.date(2016,random.randint(1,12),random.randint(1,28)).isoformat()

    data_order = (item_id, product_category, price, quantity, order_date, state,
    shipping_type, referral)

    add_order = ("INSERT INTO Sales "
                   "(ItemID, Category, Price, Quantity, OrderDate, DestinationState, \
                   ShippingType, Referral) "
                   "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")

    cursor.execute(add_order, data_order)
    db.commit()

cursor.close()
db.close()

The following screenshot shows how the table appears with the sample data:

Sending data from Amazon Aurora to Amazon S3

There are two methods available to send data from Amazon Aurora to Amazon S3:

  • Using a Lambda function
  • Using SELECT INTO OUTFILE S3

To demonstrate the ease of setting up integration between multiple AWS services, we use a Lambda function to send data to Amazon S3 using Amazon Kinesis Data Firehose.

Alternatively, you can use a SELECT INTO OUTFILE S3 statement to query data from an Amazon Aurora DB cluster and save it directly in text files that are stored in an Amazon S3 bucket. However, with this method, there is a delay between the time that the database transaction occurs and the time that the data is exported to Amazon S3 because the default file size threshold is 6 GB.

Creating a Kinesis data delivery stream

The next step is to create a Kinesis data delivery stream, since it’s a dependency of the Lambda function.

To create a delivery stream:

  1. Open the Kinesis Data Firehose console.
  2. Choose Create delivery stream.
  3. For Delivery stream name, type AuroraChangesToS3.
  4. For Source, choose Direct PUT.
  5. For Record transformation, choose Disabled.
  6. For Destination, choose Amazon S3.
  7. In the S3 bucket drop-down list, choose an existing bucket, or create a new one.
  8. Enter a prefix if needed, and choose Next.
  9. For Data compression, choose GZIP.
  10. In IAM role, choose either an existing role that has access to write to Amazon S3, or choose to generate one automatically. Choose Next.
  11. Review all the details on the screen, and choose Create delivery stream when you’re finished.


Creating a Lambda function

Now you can create a Lambda function that is called every time there is a change that needs to be tracked in the database table. This Lambda function passes the data to the Kinesis data delivery stream that you created earlier.

To create the Lambda function:

  1. Open the AWS Lambda console.
  2. Ensure that you are in the AWS Region where your Amazon Aurora database is located.
  3. If you have no Lambda functions yet, choose Get started now. Otherwise, choose Create function.
  4. Choose Author from scratch.
  5. Give your function a name, and select Python 3.6 for Runtime.
  6. Choose an existing role or create a new one. The role needs permission to call firehose:PutRecord.
  7. Choose Next on the trigger selection screen.
  8. Paste the following code in the code window. Change the stream_name variable to the Kinesis data delivery stream that you created in the previous step.
  9. Choose File -> Save in the code editor and then choose Save.
import boto3

firehose = boto3.client('firehose')
stream_name = 'AuroraChangesToS3'


def Kinesis_publish_message(event, context):
    # Build a single CSV line from the fields forwarded by the Aurora trigger.
    firehose_data = (("%s,%s,%s,%s,%s,%s,%s,%s\n") % (event['ItemID'],
    event['Category'], event['Price'], event['Quantity'],
    event['OrderDate'], event['DestinationState'], event['ShippingType'],
    event['Referral']))

    firehose_data = {'Data': str(firehose_data)}
    print(firehose_data)

    # Push the record to the Kinesis Data Firehose delivery stream.
    firehose.put_record(DeliveryStreamName=stream_name,
    Record=firehose_data)

Note the Amazon Resource Name (ARN) of this Lambda function.

Giving Aurora permissions to invoke a Lambda function

To give Amazon Aurora permissions to invoke a Lambda function, you must attach an IAM role with appropriate permissions to the cluster. For more information, see Invoking a Lambda Function from an Amazon Aurora DB Cluster.

Once you are finished, the Amazon Aurora database has access to invoke a Lambda function.

Creating a stored procedure and a trigger in Amazon Aurora

Now, go back to MySQL Workbench, and run the following command to create a new stored procedure. When this stored procedure is called, it invokes the Lambda function you created. Change the ARN in the following code to your Lambda function’s ARN.

DROP PROCEDURE IF EXISTS CDC_TO_FIREHOSE;
DELIMITER ;;
CREATE PROCEDURE CDC_TO_FIREHOSE (IN ItemID VARCHAR(255), 
									IN Category varchar(255), 
									IN Price double(10,2),
                                    IN Quantity int(11),
                                    IN OrderDate timestamp,
                                    IN DestinationState varchar(2),
                                    IN ShippingType varchar(255),
                                    IN Referral  varchar(255)) LANGUAGE SQL 
BEGIN
  CALL mysql.lambda_async('arn:aws:lambda:us-east-1:XXXXXXXXXXXXX:function:CDCFromAuroraToKinesis', 
     CONCAT('{ "ItemID" : "', ItemID, 
            '", "Category" : "', Category,
            '", "Price" : "', Price,
            '", "Quantity" : "', Quantity, 
            '", "OrderDate" : "', OrderDate, 
            '", "DestinationState" : "', DestinationState, 
            '", "ShippingType" : "', ShippingType, 
            '", "Referral" : "', Referral, '"}')
     );
END
;;
DELIMITER ;

Create a trigger TR_Sales_CDC on the Sales table. When a new record is inserted, this trigger calls the CDC_TO_FIREHOSE stored procedure.

DROP TRIGGER IF EXISTS TR_Sales_CDC;
 
DELIMITER ;;
CREATE TRIGGER TR_Sales_CDC
  AFTER INSERT ON Sales
  FOR EACH ROW
BEGIN
  SELECT  NEW.ItemID , NEW.Category, New.Price, New.Quantity, New.OrderDate
  , New.DestinationState, New.ShippingType, New.Referral
  INTO @ItemID , @Category, @Price, @Quantity, @OrderDate
  , @DestinationState, @ShippingType, @Referral;
  CALL  CDC_TO_FIREHOSE(@ItemID , @Category, @Price, @Quantity, @OrderDate
  , @DestinationState, @ShippingType, @Referral);
END
;;
DELIMITER ;

If a new row is inserted in the Sales table, the Lambda function that is mentioned in the stored procedure is invoked.

Verify that data is being sent from the Lambda function to Kinesis Data Firehose to Amazon S3 successfully. You might have to insert a few records, depending on the size of your data, before new records appear in Amazon S3. This is due to Kinesis Data Firehose buffering. To learn more about Kinesis Data Firehose buffering, see the “Amazon S3” section in Amazon Kinesis Data Firehose Data Delivery.

Every time a new record is inserted in the sales table, a stored procedure is called, and it updates data in Amazon S3.

Querying data in Amazon Redshift

In this section, you use the data you produced from Amazon Aurora and consume it as-is in Amazon Redshift. In order to allow you to process your data as-is, where it is, while taking advantage of the power and flexibility of Amazon Redshift, you use Amazon Redshift Spectrum. You can use Redshift Spectrum to run complex queries on data stored in Amazon S3, with no need for loading or other data prep.

Just create a data source and issue your queries to your Amazon Redshift cluster as usual. Behind the scenes, Redshift Spectrum scales to thousands of instances on a per-query basis, ensuring that you get fast, consistent performance even as your dataset grows to beyond an exabyte! Being able to query data that is stored in Amazon S3 means that you can scale your compute and your storage independently. You have the full power of the Amazon Redshift query model and all the reporting and business intelligence tools at your disposal. Your queries can reference any combination of data stored in Amazon Redshift tables and in Amazon S3.

Redshift Spectrum supports open, common data types, including CSV/TSV, Apache Parquet, SequenceFile, and RCFile. Files can be compressed using gzip or Snappy, with other data types and compression methods in the works.

First, create an Amazon Redshift cluster. Follow the steps in Launch a Sample Amazon Redshift Cluster.

Next, create an IAM role that has access to Amazon S3 and Athena. By default, Amazon Redshift Spectrum uses the Amazon Athena data catalog. Your cluster needs authorization to access your external data catalog in AWS Glue or Athena and your data files in Amazon S3.

In the demo setup, I attached AmazonS3FullAccess and AmazonAthenaFullAccess. In a production environment, the IAM roles should follow the standard security of granting least privilege. For more information, see IAM Policies for Amazon Redshift Spectrum.

Attach the newly created role to the Amazon Redshift cluster. For more information, see Associate the IAM Role with Your Cluster.

Next, connect to the Amazon Redshift cluster, and create an external schema and database:

create external schema if not exists spectrum_schema
from data catalog 
database 'spectrum_db' 
region 'us-east-1'
IAM_ROLE 'arn:aws:iam::XXXXXXXXXXXX:role/RedshiftSpectrumRole'
create external database if not exists;

Don’t forget to replace the IAM role in the statement.

Then create an external table within the database:

 CREATE EXTERNAL TABLE IF NOT EXISTS spectrum_schema.ecommerce_sales(
  ItemID int,
  Category varchar,
  Price DOUBLE PRECISION,
  Quantity int,
  OrderDate TIMESTAMP,
  DestinationState varchar,
  ShippingType varchar,
  Referral varchar)
ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 's3://{BUCKET_NAME}/CDC/'

Query the table, and it should contain data. This is a fact table.

select top 10 * from spectrum_schema.ecommerce_sales


Next, create a dimension table. For this example, we create a date/time dimension table. Create the table:

CREATE TABLE date_dimension (
  d_datekey           integer       not null sortkey,
  d_dayofmonth        integer       not null,
  d_monthnum          integer       not null,
  d_dayofweek                varchar(10)   not null,
  d_prettydate        date       not null,
  d_quarter           integer       not null,
  d_half              integer       not null,
  d_year              integer       not null,
  d_season            varchar(10)   not null,
  d_fiscalyear        integer       not null)
diststyle all;

Populate the table with data:

copy date_dimension from 's3://reparmar-lab/2016dates' 
iam_role 'arn:aws:iam::XXXXXXXXXXXX:role/redshiftspectrum'
DELIMITER ','
dateformat 'auto';

The date dimension table should look like the following:

Querying data in local and external tables using Amazon Redshift

Now that you have the fact and dimension tables populated with data, you can combine the two and run analysis. For example, if you want to query the total sales amount by season, you can run the following:

select sum(quantity*price) as total_sales, date_dimension.d_season
from spectrum_schema.ecommerce_sales 
join date_dimension on spectrum_schema.ecommerce_sales.orderdate = date_dimension.d_prettydate 
group by date_dimension.d_season

You get the following results:

Similarly, you can replace d_season with d_dayofweek to get sales figures by weekday:

With Amazon Redshift Spectrum, you pay only for the queries you run against the data that you actually scan. We encourage you to use file partitioning, columnar data formats, and data compression to significantly minimize the amount of data scanned in Amazon S3. This is important for data warehousing because it dramatically improves query performance and reduces cost.

Partitioning your data in Amazon S3 by date, time, or any other custom keys enables Amazon Redshift Spectrum to dynamically prune nonrelevant partitions to minimize the amount of data processed. If you store data in a columnar format, such as Parquet, Amazon Redshift Spectrum scans only the columns needed by your query, rather than processing entire rows. Similarly, if you compress your data using one of the supported compression algorithms in Amazon Redshift Spectrum, less data is scanned.
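
As a rough sketch of that optimization, the following converts the delivered CSV change-data files into Parquet partitioned by month, assuming pandas with pyarrow and s3fs installed. The bucket paths are placeholders, and you would point a separate partitioned external table at the output.

import pandas as pd

columns = ['ItemID', 'Category', 'Price', 'Quantity',
           'OrderDate', 'DestinationState', 'ShippingType', 'Referral']

# Read one of the CSV files that Kinesis Data Firehose delivered (placeholder path).
df = pd.read_csv('s3://my-bucket/CDC/part-0000.csv', names=columns)

# Derive a partition key so Redshift Spectrum can prune by month.
df['order_month'] = pd.to_datetime(df['OrderDate']).dt.strftime('%Y-%m')

# Write a compressed, columnar copy partitioned by month (placeholder path).
df.to_parquet('s3://my-bucket/CDC-parquet/',
              partition_cols=['order_month'],
              compression='snappy')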

Analyzing and visualizing Amazon Redshift data in Amazon QuickSight

Modify the Amazon Redshift security group to allow an Amazon QuickSight connection. For more information, see Authorizing Connections from Amazon QuickSight to Amazon Redshift Clusters.

After modifying the Amazon Redshift security group, go to Amazon QuickSight. Create a new analysis, and choose Amazon Redshift as the data source.

Enter the database connection details, validate the connection, and create the data source.

Choose the schema to be analyzed. In this case, choose spectrum_schema, and then choose the ecommerce_sales table.

Next, we add a custom field for Total Sales = Price*Quantity. In the drop-down list for the ecommerce_sales table, choose Edit analysis data sets.

On the next screen, choose Edit.

In the data prep screen, choose New Field. Add a new calculated field Total Sales $, which is the product of the Price*Quantity fields. Then choose Create. Save and visualize it.

Next, to visualize total sales figures by month, create a graph with Total Sales on the x-axis and Order Date formatted as month on the y-axis.

After you’ve finished, you can use Amazon QuickSight to add different columns from your Amazon Redshift tables and perform different types of visualizations. You can build operational dashboards that continuously monitor your transactional and analytical data. You can publish these dashboards and share them with others.

Final notes

Amazon QuickSight can also read data in Amazon S3 directly. However, with the method demonstrated in this post, you have the option to manipulate, filter, and combine data from multiple sources or Amazon Redshift tables before visualizing it in Amazon QuickSight.

In this example, we dealt with data being inserted, but triggers can be activated in response to an INSERT, UPDATE, or DELETE statement.

Keep the following in mind:

  • Be careful when invoking a Lambda function from triggers on tables that experience high write traffic. This would result in a large number of calls to your Lambda function. Although calls to the lambda_async procedure are asynchronous, triggers are synchronous.
  • A statement that results in a large number of trigger activations does not wait for the call to the AWS Lambda function to complete. But it does wait for the triggers to complete before returning control to the client.
  • Similarly, you must account for Amazon Kinesis Data Firehose limits. By default, Kinesis Data Firehose is limited to a maximum of 5,000 records/second. For more information, see Monitoring Amazon Kinesis Data Firehose.

In certain cases, it may be optimal to use AWS Database Migration Service (AWS DMS) to capture data changes in Aurora and use Amazon S3 as a target. For example, AWS DMS might be a good option if you don’t need to transform data from Amazon Aurora. The method used in this post gives you the flexibility to transform data from Aurora using Lambda before sending it to Amazon S3. Additionally, the architecture has the benefits of being serverless, whereas AWS DMS requires an Amazon EC2 instance for replication.

For design considerations while using Redshift Spectrum, see Using Amazon Redshift Spectrum to Query External Data.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Capturing Data Changes in Amazon Aurora Using AWS Lambda and 10 Best Practices for Amazon Redshift Spectrum


About the Authors

Re Alvarez-Parmar is a solutions architect for Amazon Web Services. He helps enterprises achieve success through technical guidance and thought leadership. In his spare time, he enjoys spending time with his two kids and exploring outdoors.


Random with care

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/02/random-with-care/

Hi! Here are a few loose thoughts about picking random numbers.

A word about crypto

DON’T ROLL YOUR OWN CRYPTO

This is all aimed at frivolous pursuits like video games. Hell, even video games where money is at stake should be deferring to someone who knows way more than I do. Otherwise you might find out that your deck shuffles in your poker game are woefully inadequate and some smartass is cheating you out of millions. (If your random number generator has fewer than 226 bits of state, it can’t even generate every possible shuffling of a deck of cards!)

Use the right distribution

Most languages have a random number primitive that spits out a number uniformly in the range [0, 1), and you can go pretty far with just that. But beware a few traps!

Random pitches

Say you want to pitch up a sound by a random amount, perhaps up to an octave. Your audio API probably has a way to do this that takes a pitch multiplier, where I say “probably” because that’s how the only audio API I’ve used works.

Easy peasy. If 1 is unchanged and 2 is pitched up by an octave, then all you need is rand() + 1. Right?

No! Pitch is exponential — within the same octave, the “gap” between C and C♯ is about half as big as the gap between B and the following C. If you pick a pitch multiplier uniformly, you’ll have a noticeable bias towards the higher pitches.

One octave corresponds to a doubling of pitch, so if you want to pick a random note, you want 2 ** rand().
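
In code, that’s a one-liner. Here’s a tiny sketch (the octave range is just an example):

import random

def random_pitch_multiplier(octaves=1):
    # Uniform in pitch space, not in multiplier space.
    return 2 ** (random.random() * octaves)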

Random directions

For two dimensions, you can just pick a random angle with rand() * TAU.

If you want a vector rather than an angle, or if you want a random direction in three dimensions, it’s a little trickier. You might be tempted to just pick a random point where each component is rand() * 2 - 1 (ranging from −1 to 1), but that’s not quite right. A direction is a point on the surface (or, equivalently, within the volume) of a sphere, and picking each component independently produces a point within the volume of a cube; the result will be a bias towards the corners of the cube, where there’s much more extra volume beyond the sphere.

No? Well, just trust me. I don’t know how to make a diagram for this.

Anyway, you could use the Pythagorean theorem a few times and make a huge mess of things, or it turns out there’s a really easy way that even works for two or four or any number of dimensions. You pick each coordinate from a Gaussian (normal) distribution, then normalize the resulting vector. In other words, using Python’s random module:

import math
import random

def random_direction():
    x = random.gauss(0, 1)
    y = random.gauss(0, 1)
    z = random.gauss(0, 1)
    r = math.sqrt(x*x + y*y + z*z)
    return x/r, y/r, z/r

Why does this work? I have no idea!

Note that it is possible to get zero (or close to it) for every component, in which case the result is nonsense. You can re-roll all the components if necessary; just check that the magnitude (or its square) is less than some epsilon, which is equivalent to throwing away a tiny sphere at the center and shouldn’t affect the distribution.
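
Here’s one way that re-roll might look, folded into the function above (the epsilon is arbitrary):

import math
import random

def random_direction():
    while True:
        x, y, z = (random.gauss(0, 1) for _ in range(3))
        r = math.sqrt(x*x + y*y + z*z)
        # Throw away the (astronomically unlikely) near-zero vectors and roll again.
        if r > 1e-6:
            return x/r, y/r, z/r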

Beware Gauss

Since I brought it up: the Gaussian distribution is a pretty nice one for choosing things in some range, where the middle is the common case and should appear more frequently.

That said, I never use it, because it has one annoying drawback: the Gaussian distribution has no minimum or maximum value, so you can’t really scale it down to the range you want. In theory, you might get any value out of it, with no limit on scale.

In practice, it’s astronomically rare to actually get such a value out. I did a hundred million trials just to see what would happen, and the largest value produced was 5.8.

But, still, I’d rather not knowingly put extremely rare corner cases in my code if I can at all avoid it. I could clamp the ends, but that would cause unnatural bunching at the endpoints. I could reroll if I got a value outside some desired range, but I prefer to avoid rerolling when I can, too; after all, it’s still (astronomically) possible to have to reroll for an indefinite amount of time. (Okay, it’s really not, since you’ll eventually hit the period of your PRNG. Still, though.) I don’t bend over backwards here — I did just say to reroll when picking a random direction, after all — but when there’s a nicer alternative I’ll gladly use it.

And lo, there is a nicer alternative! Enter the beta distribution. It always spits out a number in [0, 1], so you can easily swap it in for the standard normal function, but it takes two “shape” parameters α and β that alter its behavior fairly dramatically.

With α = β = 1, the beta distribution is uniform, i.e. no different from rand(). As α increases, the distribution skews towards the right, and as β increases, the distribution skews towards the left. If α = β, the whole thing is symmetric with a hump in the middle. The higher either one gets, the more extreme the hump (meaning that value is far more common than any other). With a little fiddling, you can get a number of interesting curves.

Screenshots don’t really do it justice, so here’s a little Wolfram widget that lets you play with α and β live:

Note that if α = 1, then 1 is a possible value; if β = 1, then 0 is a possible value. You probably want them both greater than 1, which clamps the endpoints to zero.

Also, it’s possible to have either α or β or both be less than 1, but this creates very different behavior: the corresponding endpoints become poles.

Anyway, something like α = β = 3 is probably close enough to normal for most purposes but already clamped for you. And you could easily replicate something like, say, NetHack’s incredibly bizarre rnz function.
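
Conveniently, Python ships this in the standard library as random.betavariate, so a clamped, bell-ish roll is a one-liner (the α = β = 3 and the damage range here are just examples):

import random

def clamped_bell(alpha=3, beta=3):
    # Always in [0, 1]; higher alpha/beta means a tighter hump in the middle.
    return random.betavariate(alpha, beta)

# e.g. a damage roll between 5 and 15, clustered around 10
damage = 5 + clamped_bell() * 10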

Random frequency

Say you want some event to have an 80% chance to happen every second. You (who am I kidding, I) might be tempted to do something like this:

if random() < 0.8 * dt:
    do_thing()

In an ideal world, dt is always the same and is equal to 1 / f, where f is the framerate. Replace that 80% with a variable, say P, and every tic you have a P / f chance to do the… whatever it is.

Each second, f tics pass, so you’ll make this check f times. The chance that any check succeeds is the inverse of the chance that every check fails, which is \(1 - \left(1 - \frac{P}{f}\right)^f\).

For P of 80% and a framerate of 60, that’s a total probability of 55.3%. Wait, what?

Consider what happens if the framerate is 2. On the first tic, you roll 0.4 twice — but probabilities are combined by multiplying, and splitting work up by dt only works for additive quantities. You lose some accuracy along the way. If you’re dealing with something that multiplies, you need an exponent somewhere.

But in this case, maybe you don’t want that at all. Each separate roll you make might independently succeed, so it’s possible (but very unlikely) that the event will happen 60 times within a single second! Or 200 times, if that’s someone’s framerate.

If you explicitly want something to have a chance to happen on a specific interval, you have to check on that interval. If you don’t have a gizmo handy to run code on an interval, it’s easy to do yourself with a time buffer:

timer += dt
# here, 1 is the "every 1 seconds"
while timer > 1:
    timer -= 1
    if random() < 0.8:
        do_thing()

Using while means rolls still happen even if you somehow skipped over an entire second.

(For the curious, and the nerds who already noticed: the expression \(1 - \left(1 - \frac{P}{f}\right)^f\) converges to a specific value! As the framerate increases, it becomes a better and better approximation for \(1 - e^{-P}\), which for the example above is 0.551. Hey, 60 fps is pretty accurate — it’s just accurately representing something nowhere near what I wanted. Er, you wanted.)

Rolling your own

Of course, you can fuss with the classic [0, 1] uniform value however you want. If I want a bias towards zero, I’ll often just square it, or multiply two of them together. If I want a bias towards one, I’ll take a square root. If I want something like a Gaussian/normal distribution, but with clearly-defined endpoints, I might add together n rolls and divide by n. (The normal distribution is just what you get if you roll infinite dice and divide by infinity!)
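
For reference, here’s what those little tricks look like in Python (the n = 4 is an arbitrary choice):

import random

def biased_low():            # bias towards 0
    return random.random() ** 2

def biased_high():           # bias towards 1
    return random.random() ** 0.5

def pseudo_normal(n=4):      # bell-ish, but with hard endpoints at 0 and 1
    return sum(random.random() for _ in range(n)) / n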

It’d be nice to be able to understand exactly what this will do to the distribution. Unfortunately, that requires some calculus, which this post is too small to contain, and which I didn’t even know much about myself until I went down a deep rabbit hole while writing, and which in many cases is straight up impossible to express directly.

Here’s the non-calculus bit. A source of randomness is often graphed as a PDF — a probability density function. You’ve almost certainly seen a bell curve graphed, and that’s a PDF. They’re pretty nice, since they do exactly what they look like: they show the relative chance that any given value will pop out. On a bog standard bell curve, there’s a peak at zero, and of course zero is the most common result from a normal distribution.

(Okay, actually, since the results are continuous, it’s vanishingly unlikely that you’ll get exactly zero — but you’re much more likely to get a value near zero than near any other number.)

For the uniform distribution, which is what a classic rand() gives you, the PDF is just a straight horizontal line — every result is equally likely.


If there were a calculus bit, it would go here! Instead, we can cheat. Sometimes. Mathematica knows how to work with probability distributions in the abstract, and there’s a free web version you can use. For the example of squaring a uniform variable, try this out:

PDF[TransformedDistribution[u^2, u \[Distributed] UniformDistribution[{0, 1}]], u]

(The \[Distributed] is a funny tilde that doesn’t exist in Unicode, but which Mathematica uses as a first-class operator. Also, press Shift+Enter to evaluate the line.)

This will tell you that the distribution is… \(\frac{1}{2\sqrt{u}}\). Weird! You can plot it:

Plot[%, {u, 0, 1}]

(The % refers to the result of the last thing you did, so if you want to try several of these, you can just do Plot[PDF[…], u] directly.)

The resulting graph shows that numbers around zero are, in fact, vastly — infinitely — more likely than anything else.

What about multiplying two together? I can’t figure out how to get Mathematica to understand this, but a great amount of digging revealed that the answer is -ln x, and from there you can plot them both on Wolfram Alpha. They’re similar, though squaring has a much better chance of giving you high numbers than multiplying two separate rolls — which makes some sense, since if either of two rolls is a low number, the product will be even lower.

What if you know the graph you want, and you want to figure out how to play with a uniform roll to get it? Good news! That’s a whole thing called inverse transform sampling. All you have to do is take an integral. Good luck!


This is all extremely ridiculous. New tactic: Just Simulate The Damn Thing. You already have the code; run it a million times, make a histogram, and tada, there’s your PDF. That’s one of the great things about computers! Brute-force numerical answers are easy to come by, so there’s no excuse for producing something like rnz. (Though, be sure your histogram has sufficiently narrow buckets — I tried plotting one for rnz once and the weird stuff on the left side didn’t show up at all!)
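
A quick sketch of the brute-force approach, using the squaring example (the bucket and trial counts are arbitrary):

import random

BUCKETS = 50
counts = [0] * BUCKETS

for _ in range(1_000_000):
    value = random.random() ** 2               # the transform under scrutiny
    counts[min(int(value * BUCKETS), BUCKETS - 1)] += 1

# Crude text histogram; the tall bars near 0 match the 1/(2*sqrt(x)) PDF.
for i, c in enumerate(counts):
    print('%.2f %s' % (i / BUCKETS, '#' * (c // 2000)))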

By the way, I learned something from futzing with Mathematica here! Taking the square root (to bias towards 1) gives a PDF that’s a straight diagonal line, nothing like the hyperbola you get from squaring (to bias towards 0). How do you get a straight line the other way? Surprise: \(1 - \sqrt{1 - u}\).

Okay, okay, here’s the actual math

I don’t claim to have a very firm grasp on this, but I had a hell of a time finding it written out clearly, so I might as well write it down as best I can. This was a great excuse to finally set up MathJax, too.

Say \(u(x)\) is the PDF of the original distribution and \(u\) is a representative number you plucked from that distribution. For the uniform distribution, \(u(x) = 1\). Or, more accurately,

$$
u(x) = \begin{cases}
1 & \text{ if } 0 \le x \lt 1 \\
0 & \text{ otherwise }
\end{cases}
$$

Remember that \(x\) here is a possible outcome you want to know about, and the PDF tells you the relative probability that a roll will be near it. This PDF spits out 1 for every \(x\), meaning every number between 0 and 1 is equally likely to appear.

We want to do something to that PDF, which creates a new distribution, whose PDF we want to know. I’ll use my original example of \(f(u) = u^2\), which creates a new PDF \(v(x)\).

The trick is that we need to work in terms of the cumulative distribution function for \(u\). Where the PDF gives the relative chance that a roll will be (“near”) a specific value, the CDF gives the relative chance that a roll will be less than a specific value.

The conventions for this seem to be a bit fuzzy, and nobody bothers to explain which ones they’re using, which makes this all the more confusing to read about… but let’s write the CDF with a capital letter, so we have \(U(x)\). In this case, \(U(x) = x\), a straight 45° line (at least between 0 and 1). With the definition I gave, this should make sense. At some arbitrary point like 0.4, the value of the PDF is 1 (0.4 is just as likely as anything else), and the value of the CDF is 0.4 (you have a 40% chance of getting a number from 0 to 0.4).

Calculus ahoy: the PDF is the derivative of the CDF, which means it measures the slope of the CDF at any point. For \(U(x) = x\), the slope is always 1, and indeed \(u(x) = 1\). See, calculus is easy.

Okay, so, now we’re getting somewhere. What we want is the CDF of our new distribution, \(V(x)\). The CDF is defined as the probability that a roll \(v\) will be less than \(x\), so we can literally write:

$$V(x) = P(v \le x)$$

(This is why we have to work with CDFs, rather than PDFs — a PDF gives the chance that a roll will be “nearby,” whatever that means. A CDF is much more concrete.)

What is \(v\), exactly? We defined it ourselves; it’s the do something applied to a roll from the original distribution, or \(f(u)\).

$$V(x) = P\!\left(f(u) \le x\right)$$

Now the first tricky part: we have to solve that inequality for \(u\), which means we have to do something, backwards to \(x\).

$$V(x) = P\!\left(u \le f^{-1}(x)\right)$$

Almost there! We now have a probability that \(u\) is less than some value, and that’s the definition of a CDF!

$$V(x) = U\!\left(f^{-1}(x)\right)$$

Hooray! Now to turn these CDFs back into PDFs, all we need to do is differentiate both sides and use the chain rule. If you never took calculus, don’t worry too much about what that means!

$$v(x) = u\!\left(f^{-1}(x)\right)\left|\frac{d}{dx}f^{-1}(x)\right|$$

Wait! Where did that absolute value come from? It takes care of whether \(f(x)\) increases or decreases. It’s the least interesting part here by far, so, whatever.

There’s one more magical part here when using the uniform distribution — \(u(\dots)\) is always equal to 1, so that entire term disappears! (Note that this only works for a uniform distribution with a width of 1; PDFs are scaled so the entire area under them sums to 1, so if you had a rand() that could spit out a number between 0 and 2, the PDF would be \(u(x) = \frac{1}{2}\).)

$$v(x) = \left|\frac{d}{dx}f^{-1}(x)\right|$$

So for the specific case of modifying the output of rand(), all we have to do is invert, then differentiate. The inverse of \(f(u) = u^2\) is \(f^{-1}(x) = \sqrt{x}\) (no need for a ± since we’re only dealing with positive numbers), and differentiating that gives \(v(x) = \frac{1}{2\sqrt{x}}\). Done! This is also why square root comes out nicer; inverting it gives \(x^2\), and differentiating that gives \(2x\), a straight line.

Incidentally, that method for turning a uniform distribution into any distribution — inverse transform sampling — is pretty much the same thing in reverse: integrate, then invert. For example, when I saw that taking the square root gave \(v(x) = 2x\), I naturally wondered how to get a straight line going the other way, \(v(x) = 2 - 2x\). Integrating that gives \(2x - x^2\), and then you can use the quadratic formula (or just ask Wolfram Alpha) to solve \(2x - x^2 = u\) for \(x\) and get \(f(u) = 1 - \sqrt{1 - u}\).
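
In the spirit of the rest of this post, you can sanity-check all of this by just simulating the damn thing. Here's a little sketch (the bucket count and sample size are arbitrary) that applies each transformation to a pile of uniform rolls and prints an empirical PDF to compare against the formulas above:

import random

BUCKETS = 10
SAMPLES = 1_000_000

def empirical_pdf(f):
    """Apply f to a lot of uniform rolls and estimate the resulting PDF."""
    counts = [0] * BUCKETS
    for _ in range(SAMPLES):
        x = f(random.random())
        counts[min(int(x * BUCKETS), BUCKETS - 1)] += 1
    # Scale so the area under the histogram is 1, like a real PDF.
    return [round(c * BUCKETS / SAMPLES, 3) for c in counts]

print(empirical_pdf(lambda u: u ** 2))              # compare against v(x) = 1 / (2 * sqrt(x))
print(empirical_pdf(lambda u: u ** 0.5))            # compare against v(x) = 2x
print(empirical_pdf(lambda u: 1 - (1 - u) ** 0.5))  # compare against v(x) = 2 - 2x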

Multiply two rolls is a bit more complicated; you have to write out the CDF as an integral and you end up doing a double integral and wow it’s a mess. The only thing I’ve retained is that you do a division somewhere, which then gets integrated, and that’s why it ends up as \(-\ln x\).

And that’s quite enough of that! (Okay but having math in my blog is pretty cool and I will definitely be doing more of this, sorry, not sorry.)

Random vs varied

Sometimes, random isn’t actually what you want. We tend to use the word “random” casually to mean something more like chaotic, i.e., with no discernible pattern. But that’s not really random. In fact, given how good humans can be at finding incidental patterns, they aren’t all that unlikely! Consider that when you roll two dice, they’ll come up either the same or only one apart almost half the time. Coincidence? Well, yes.
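
That dice claim is small enough to brute-force, if you don't believe it:

pairs = [(a, b) for a in range(1, 7) for b in range(1, 7)]
close = sum(1 for a, b in pairs if abs(a - b) <= 1)
print(close, "of", len(pairs), "=", round(close / len(pairs), 3))  # 16 of 36 = 0.444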

If you ask for randomness, you’re saying that any outcome — or series of outcomes — is acceptable, including five heads in a row or five tails in a row. Most of the time, that’s fine. Some of the time, it’s less fine, and what you really want is variety. Here are a couple examples and some fairly easy workarounds.

NPC quips

The nature of games is such that NPCs will eventually run out of things to say, at which point further conversation will give the player a short brush-off quip — a slight nod from the designer to the player that, hey, you hit the end of the script.

Some NPCs have multiple possible quips and will give one at random. The trouble with this is that it’s very possible for an NPC to repeat the same quip several times in a row before abruptly switching to another one. With only a few options to choose from, getting the same option twice or thrice (especially across an entire game, which may have numerous NPCs) isn’t all that unlikely. The notion of an NPC quip isn’t very realistic to start with, but having someone repeat themselves and then abruptly switch to something else is especially jarring.

The easy fix is to show the quips in order! Paradoxically, this is more consistently varied than choosing at random — the original “order” is likely to be meaningless anyway, and it already has the property that the same quip can never appear twice in a row.

If you like, you can shuffle the list of quips every time you reach the end, but take care here — it’s possible that the last quip in the old order will be the same as the first quip in the new order, so you may still get a repeat. (Of course, you can just check for this case and swap the first quip somewhere else if it bothers you.)

That last behavior is, in fact, the canonical way that Tetris chooses pieces — the game simply shuffles a list of all 7 pieces, gives those to you in shuffled order, then shuffles them again to make a new list once it’s exhausted. There’s no avoidance of duplicates, though, so you can still get two S blocks in a row, or even two S and two Z all clumped together, but no more than that. Some Tetris variants take other approaches, such as actively avoiding repeats even several pieces apart or deliberately giving you the worst piece possible.
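
In code, the whole idea fits in a few lines. Here's a rough sketch of a shuffle bag, with an optional guard against repeating across the reshuffle boundary (which, again, Tetris doesn't bother with):

import random

class ShuffleBag:
    """Deal items in shuffled order, reshuffling the bag whenever it empties."""
    def __init__(self, items):
        self.items = list(items)
        self.queue = []
        self.last = None

    def next(self):
        if not self.queue:
            self.queue = self.items[:]
            random.shuffle(self.queue)
            # Optional: avoid repeating the last item dealt across the reshuffle.
            if len(self.queue) > 1 and self.queue[-1] == self.last:
                self.queue[0], self.queue[-1] = self.queue[-1], self.queue[0]
        self.last = self.queue.pop()
        return self.last

quips = ShuffleBag(["Hello.", "Nice weather.", "Go away.", "I'm busy."])
print([quips.next() for _ in range(10)])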

Random drops

Random drops are often implemented as a flat chance each time. Maybe enemies have a 5% chance to drop health when they die. Statistically speaking, over the long term, a player will see health drops for about 5% of enemy kills.

Over the short term, they may be desperate for health and not survive to see the long term. So you may want to put a thumb on the scale sometimes. Games in the Metroid series, for example, have a somewhat infamous bias towards whatever kind of drop they think you need — health if your health is low, missiles if your missiles are low.

I can’t give you an exact approach to use, since it depends on the game and the feeling you’re going for and the variables at your disposal. In extreme cases, you might want to guarantee a health drop from a tough enemy when the player is critically low on health. (Or if you’re feeling particularly evil, you could go the other way and deny the player health when they most need it…)
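
Purely as illustration, one shape a thumb-on-the-scale could take is scaling the drop chance up as health goes down; the numbers and the health_fraction input here are made up:

import random

def health_drop_chance(health_fraction, base=0.05, desperate=0.50):
    # At full health this is the plain 5% chance; near zero health it
    # approaches `desperate`.  Both numbers are arbitrary knobs to tune.
    return base + (desperate - base) * (1 - health_fraction)

def enemy_drops_health(health_fraction):
    return random.random() < health_drop_chance(health_fraction)

print(health_drop_chance(1.0))   # 0.05
print(health_drop_chance(0.25))  # 0.3875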

The problem becomes a little different, and worse, when the event that triggers the drop is relatively rare. The pathological case here would be something like a raid boss in World of Warcraft, which requires hours of effort from a coordinated group of people to defeat, and which has some tiny chance of dropping a good item that will go to only one of those people. This is why I stopped playing World of Warcraft at 60.

Dialing it back a little bit gives us Enter the Gungeon, a roguelike where each room is a set of encounters and each floor only has a dozen or so rooms. Initially, you have a 1% chance of getting a reward after completing a room — but every time you complete a room and don’t get a reward, the chance increases by 9%, up to a cap of 80%. Once you get a reward, the chance resets to 1%.

The natural question is: how frequently, exactly, can a player expect to get a reward? We could do math, or we could Just Simulate The Damn Thing.

from collections import Counter
import random

histogram = Counter()

TRIALS = 1000000
chance = 1
rooms_cleared = 0
rewards_found = 0
while rewards_found < TRIALS:
    rooms_cleared += 1
    if random.random() * 100 < chance:
        # Reward!
        rewards_found += 1
        histogram[rooms_cleared] += 1
        rooms_cleared = 0
        chance = 1
    else:
        chance = min(80, chance + 9)

for gaps, count in sorted(histogram.items()):
    print(f"{gaps:3d} | {count / TRIALS * 100:6.2f}%", '#' * (count // (TRIALS // 100)))
  1 |   0.98%
  2 |   9.91% #########
  3 |  17.00% ################
  4 |  20.23% ####################
  5 |  19.21% ###################
  6 |  15.05% ###############
  7 |   9.69% #########
  8 |   5.07% #####
  9 |   2.09% ##
 10 |   0.63%
 11 |   0.12%
 12 |   0.03%
 13 |   0.00%
 14 |   0.00%
 15 |   0.00%

We’ve got kind of a hilly distribution, skewed to the left, which is up in this histogram. Most of the time, a player should see a reward every three to six rooms, which is maybe twice per floor. It’s vanishingly unlikely to go through a dozen rooms without ever seeing a reward, so a player should see at least one per floor.

Of course, this simulated a single continuous playthrough; when starting the game from scratch, your chance at a reward always starts fresh at 1%, the worst it can be. If you want to know about how many rewards a player will get on the first floor, hey, Just Simulate The Damn Thing.

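Here's a sketch of that variation, assuming a flat 12 rooms per floor (which, as noted below, is itself a simplification):

from collections import Counter
import random

histogram = Counter()

TRIALS = 1000000
ROOMS_PER_FLOOR = 12  # assumption; a real floor layout varies

for _ in range(TRIALS):
    chance = 1
    rewards_found = 0
    for _ in range(ROOMS_PER_FLOOR):
        if random.random() * 100 < chance:
            rewards_found += 1
            chance = 1
        else:
            chance = min(80, chance + 9)
    histogram[rewards_found] += 1

for rewards, count in sorted(histogram.items()):
    print(f"{rewards:3d} | {count / TRIALS * 100:6.2f}%", '#' * (count // (TRIALS // 100)))
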
  0 |   0.01%
  1 |  13.01% #############
  2 |  56.28% ########################################################
  3 |  27.49% ###########################
  4 |   3.10% ###
  5 |   0.11%
  6 |   0.00%

Cool. Though, that’s assuming exactly 12 rooms; it might be worth changing that to pick at random in a way that matches the level generator.

(Enter the Gungeon does some other things to skew probability, which is very nice in a roguelike where blind luck can make or break you. For example, if you kill a boss without having gotten a new gun anywhere else on the floor, the boss is guaranteed to drop a gun.)

Critical hits

I suppose this is the same problem as random drops, but backwards.

Say you have a battle sim where every attack has a 6% chance to land a devastating critical hit. Presumably the same rules apply to both the player and the AI opponents.

Consider, then, that the AI opponents have exactly the same 6% chance to ruin the player’s day. Consider also that this gives them roughly a 0.4% chance (6% of 6%) to critical hit twice in a row. 0.4% doesn’t sound like much, but across an entire playthrough, it’s not unlikely that a player might see it happen and find it incredibly annoying.

Perhaps it would be worthwhile to explicitly forbid AI opponents from getting consecutive critical hits.
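
A sketch of what that might look like, tracking only the last roll and flatly denying a back-to-back crit for the AI:

import random

CRIT_CHANCE = 0.06

class Attacker:
    def __init__(self, can_chain_crits=True):
        self.can_chain_crits = can_chain_crits
        self.last_was_crit = False

    def roll_crit(self):
        crit = random.random() < CRIT_CHANCE
        if crit and self.last_was_crit and not self.can_chain_crits:
            crit = False  # quietly deny the second crit in a row
        self.last_was_crit = crit
        return crit

player = Attacker()                        # keeps the pure 6%
enemy = Attacker(can_chain_crits=False)    # can never crit twice in a row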

In conclusion

An emerging theme here has been to Just Simulate The Damn Thing. So consider Just Simulating The Damn Thing. Even a simple change to a random value can do surprising things to the resulting distribution, so unless you feel like differentiating the inverse function of your code, maybe test out any non-trivial behavior and make sure it’s what you wanted. Probability is hard to reason about.

New – Amazon CloudWatch Agent with AWS Systems Manager Integration – Unified Metrics & Log Collection for Linux & Windows

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-cloudwatch-agent-with-aws-systems-manager-integration-unified-metrics-log-collection-for-linux-windows/

In the past I’ve talked about several agents, daemons, and scripts that you could use to collect system metrics and log files for your Windows and Linux instances and on-premises servers and publish them to Amazon CloudWatch. The data collected by this somewhat disparate collection of tools gave you visibility into the status and behavior of your compute resources, along with the power to take action when a value goes out of range and indicates a potential issue. You can graph any desired metrics on CloudWatch Dashboards, initiate actions via CloudWatch Alarms, and search CloudWatch Logs to find error messages, while taking advantage of our support for custom high-resolution metrics.

New Unified Agent
Today we are taking a nice step forward and launching a new, unified CloudWatch Agent. It runs in the cloud and on-premises, on Linux and Windows instances and servers, and handles metrics and log files. You can deploy it using AWS Systems Manager (SSM) Run Command, SSM State Manager, or from the CLI. Here are some of the most important features:

Single Agent – A single agent now collects both metrics and logs. This simplifies the setup process and reduces complexity.

Cross-Platform / Cross-Environment – The new agent runs in the cloud and on-premises, on 64-bit Linux and 64-bit Windows, and includes HTTP proxy server support.

Configurable – The new agent captures the most useful system metrics automatically. It can be configured to collect hundreds of others, including fine-grained metrics on sub-resources such as CPU threads, mounted filesystems, and network interfaces.

CloudWatch-Friendly – The new agent supports standard 1-minute metrics and the newer 1-second high-resolution metrics. It automatically includes EC2 dimensions such as Instance Id, Image Id, and Auto Scaling Group Name, and also supports the use of custom dimensions. All of the dimensions can be used for custom aggregation across Auto Scaling Groups, applications, and so forth.

Migration – You can easily migrate existing AWS SSM and EC2Config configurations for use with the new agent.

Installing the Agent
The CloudWatch Agent uses an IAM role when running on an EC2 instance, and an IAM user when running on an on-premises server. The role or the user must include the AmazonSSMFullAccess and AmazonEC2ReadOnlyAccess policies. Here’s my role:

I can easily add it to a running instance (this is a relatively new and very handy EC2 feature):

The SSM Agent is already running on my instance. If it wasn’t, I would follow the steps in Installing and Configuring SSM Agent to set it up.

Next, I install the CloudWatch Agent using the AWS Systems Manager:

This takes just a few seconds. Now I can use a simple wizard to set up the configuration file for the agent:

The wizard also lets me set up the log files to be monitored:

The wizard generates a JSON-format config file and stores it on the instance. It also offers me the option to upload the file to my Parameter Store so that I can deploy it to my other instances (I can also do fine-grained customization of the metrics and log collection configuration by editing the file):

Now I can start the CloudWatch Agent using Run Command, supplying the name of my configuration in the Parameter Store:
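
As a rough sketch, the same step can also be scripted against the SSM API; the document name and parameter names shown below are assumptions, so verify them against the CloudWatch Agent documentation before relying on this:

import boto3

ssm = boto3.client('ssm', region_name='us-west-2')

# Assumed SSM document and parameter names; verify against the agent docs.
response = ssm.send_command(
    InstanceIds=['i-0123456789abcdef0'],      # placeholder instance ID
    DocumentName='AmazonCloudWatch-ManageAgent',
    Parameters={
        'action': ['configure'],
        'mode': ['ec2'],
        'optionalConfigurationSource': ['ssm'],
        'optionalConfigurationLocation': ['AmazonCloudWatch-Config'],  # placeholder Parameter Store name
        'optionalRestart': ['yes'],
    },
)
print(response['Command']['CommandId'])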

This runs in a few seconds and the agent begins to publish metrics right away. As I mentioned earlier, the agent can publish fine-grained metrics on the resources inside of or attached to an instance. For example, here are the metrics for each filesystem:

There’s a separate log stream for each monitored log file on each instance:

I can view and search it, just like I can do for any other log stream:

Now Available
The new CloudWatch Agent is available now and you can start using it today in all public AWS Regions, with AWS GovCloud (US) and the Regions in China to follow.

There’s no charge for the agent; you pay the usual CloudWatch prices for logs and custom metrics.

Jeff;

New – Interactive AWS Cost Explorer API

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-interactive-aws-cost-explorer-api/

We launched the AWS Cost Explorer a couple of years ago in order to allow you to track, allocate, and manage your AWS costs. The response to that launch, and to additions that we have made since then, has been very positive. However, our customers are, as Jeff Bezos has said, “beautifully, wonderfully, dissatisfied.”

I see this first-hand every day. We launch something and that launch inspires our customers to ask for even more. For example, with many customers going all-in and moving large parts of their IT infrastructure to the AWS Cloud, we’ve had many requests for the raw data that feeds into the Cost Explorer. These customers want to programmatically explore their AWS costs, update ledgers and accounting systems with per-application and per-department costs, and to build high-level dashboards that summarize spending. Some of these customers have been going to the trouble of extracting the data from the charts and reports provided by Cost Explorer!

New Cost Explorer API
Today we are making the underlying data that feeds into Cost Explorer available programmatically. The new Cost Explorer API gives you a set of functions that allow you do everything that I described above. You can retrieve cost and usage data that is filtered and grouped across multiple dimensions (Service, Linked Account, tag, Availability Zone, and so forth), aggregated by day or by month. This gives you the power to start simple (total monthly costs) and to refine your requests to any desired level of detail (writes to DynamoDB tables that have been tagged as production) while getting responses in seconds.

Here are the operations:

GetCostAndUsage – Retrieve cost and usage metrics for a single account or all accounts (master accounts in an organization have access to all member accounts) with filtering and grouping.

GetDimensionValues – Retrieve available filter values for a specified filter over a specified period of time.

GetTags – Retrieve available tag keys and tag values over a specified period of time.

GetReservationUtilization – Retrieve EC2 Reserved Instance utilization over a specified period of time, with daily or monthly granularity plus filtering and grouping.

I believe that these functions, and the data that they return, will let you do some really interesting things that will give you better insights into your business. For example, you could tag the resources used to support individual marketing campaigns or development projects and then deep-dive into the costs to measure business value. You now have the potential to know, down to the penny, how much you spend on infrastructure for important events like Cyber Monday or Black Friday.
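
As a quick sketch of what this looks like from code, here is a GetCostAndUsage call made with the AWS SDK for Python; the time range and grouping are just examples:

import boto3

# The Cost Explorer API endpoint lives in US East (Northern Virginia).
ce = boto3.client('ce', region_name='us-east-1')

response = ce.get_cost_and_usage(
    TimePeriod={'Start': '2017-11-01', 'End': '2017-12-01'},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    GroupBy=[{'Type': 'DIMENSION', 'Key': 'SERVICE'}],
)

for result in response['ResultsByTime']:
    for group in result['Groups']:
        service = group['Keys'][0]
        amount = float(group['Metrics']['UnblendedCost']['Amount'])
        print(f"{service}: ${amount:.2f}")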

Things to Know
Here are a couple of things to keep in mind as you start to think about ways to make use of the API:

Grouping – The Cost Explorer web application provides you with one level of grouping; the APIs give you two. For example, you could group costs or RI utilization by Service and then by Region.

Pagination – The functions can return very large amounts of data and follow the AWS-wide model for pagination by including a nextPageToken if additional data is available. You simply call the same function again, supplying the token, to move forward.

Regions – The service endpoint is in the US East (Northern Virginia) Region and returns usage data for all public AWS Regions.

Pricing – Each API call costs $0.01. To put this into perspective, let’s say you use this API to build a dashboard and it gets 1000 hits per month from your users. Your operating cost for the dashboard should be $10 or so; this is far less expensive than setting up your own systems to extract & ingest the data and respond to interactive queries.

The Cost Explorer API is available now and you can start using it today. To learn more, read about the Cost Explorer API.

Jeff;

Capturing Custom, High-Resolution Metrics from Containers Using AWS Step Functions and AWS Lambda

Post Syndicated from Nathan Taber original https://aws.amazon.com/blogs/compute/capturing-custom-high-resolution-metrics-from-containers-using-aws-step-functions-and-aws-lambda/

Contributed by Trevor Sullivan, AWS Solutions Architect

When you deploy containers with Amazon ECS, are you gathering all of the key metrics so that you can correctly monitor the overall health of your ECS cluster?

By default, ECS writes metrics to Amazon CloudWatch in 5-minute increments. For complex or large services, this may not be sufficient to make scaling decisions quickly. You may want to respond immediately to changes in workload or to identify application performance problems. Last July, CloudWatch announced support for high-resolution metrics, up to a per-second basis.

These high-resolution metrics can be used to give you a clearer picture of the load and performance for your applications, containers, clusters, and hosts. In this post, I discuss how you can use AWS Step Functions, along with AWS Lambda, to cost effectively record high-resolution metrics into CloudWatch. You implement this solution using a serverless architecture, which keeps your costs low and makes it easier to troubleshoot the solution.

To show how this works, you retrieve some useful metric data from an ECS cluster running in the same AWS account and region (Oregon, us-west-2) as the Step Functions state machine and Lambda function. However, you can use this architecture to retrieve any custom application metrics from any resource in any AWS account and region.

Why Step Functions?

Step Functions enables you to orchestrate multi-step tasks in the AWS Cloud that run for any period of time, up to a year. Effectively, you’re building a blueprint for an end-to-end process. After it’s built, you can execute the process as many times as you want.

For this architecture, you gather metrics from an ECS cluster every five seconds and then write the metric data to CloudWatch. After your ECS cluster metrics are stored in CloudWatch, you can create CloudWatch alarms to notify you. An alarm can also trigger an automated remediation activity, such as scaling ECS services, when a metric exceeds a threshold that you define.

When you build a Step Functions state machine, you define the different states inside it as JSON objects. The bulk of the work in Step Functions is handled by the common task state, which invokes Lambda functions or Step Functions activities. There is also a built-in library of other useful states that allow you to control the execution flow of your program.

One of the most useful state types in Step Functions is the parallel state. Each parallel state in your state machine can have one or more branches, each of which is executed in parallel. Another useful state type is the wait state, which waits for a period of time before moving to the next state.

In this walkthrough, you combine these three states (parallel, wait, and task) to create a state machine that triggers a Lambda function, which then gathers metrics from your ECS cluster.

Step Functions pricing

This state machine is executed every minute, resulting in 60 executions per hour and 1,440 executions per day. Step Functions is billed per state transition, including the Start and End state transitions, giving you approximately 37,440 state transitions per day. To reach this number, I’m using this estimated math:

26 state transitions per execution x 60 minutes x 24 hours

Based on current pricing, at $0.000025 per state transition, the daily cost of this metric gathering state machine would be $0.936.

Step Functions includes a permanent free tier of 4,000 state transitions every month. This benefit is available to all customers, not just customers who are still under the 12-month AWS Free Tier. For more information and cost example scenarios, see Step Functions pricing.

Why Lambda?

The goal is to capture metrics from an ECS cluster, and write the metric data to CloudWatch. This is a straightforward, short-running process that makes Lambda the perfect place to run your code. Lambda is one of the key services that makes up “Serverless” application architectures. It enables you to consume compute capacity only when your code is actually executing.

The process of gathering metric data from ECS and writing it to CloudWatch takes a short period of time. In fact, while developing this post, my Lambda function execution time averaged only about 250 milliseconds. For every five-second interval that occurs, I’m only using 1/20th of the compute time that I’d otherwise be paying for.

Lambda pricing

For billing purposes, Lambda execution time is rounded up to the nearest 100-ms interval. In general, based on the metrics that I observed during development, a 250-ms runtime would be billed at 300 ms. Here, I calculate the cost of this Lambda function executing over a full month.

Assuming 31 days in each month, there would be 535,680 five-second intervals (31 days x 24 hours x 60 minutes x 12 five-second intervals = 535,680). The Lambda function is invoked every five-second interval, by the Step Functions state machine, and runs for a 300-ms period. At current Lambda pricing, for a 128-MB function, you would be paying approximately the following:

Total compute

Total executions = 535,680
Total compute = total executions x (3 x $0.000000208 per 100 ms) = $0.334 per month

Total requests

Total requests = (535,680 / 1,000,000) x $0.20 per million requests = $0.11 per month

Total Lambda Cost

$0.11 requests + $0.334 compute time = $0.444 per month (roughly $0.014 per day)
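
A quick sanity check of that arithmetic, using the rates quoted above:

invocations_per_month = 31 * 24 * 60 * 12      # 535,680 five-second intervals
price_per_100ms = 0.000000208                  # 128 MB function
billed_units = 3                               # 300 ms, rounded up from ~250 ms

compute = invocations_per_month * billed_units * price_per_100ms
requests = invocations_per_month / 1_000_000 * 0.20

print(f"compute:  ${compute:.3f}/month")
print(f"requests: ${requests:.3f}/month")
print(f"total:    ${compute + requests:.3f}/month (${(compute + requests) / 31:.3f}/day)")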

Similar to Step Functions, Lambda also offers a permanent free tier. For more information, see Lambda Pricing.

Walkthrough

In the following sections, I step through the process of configuring the solution just discussed. If you follow along, at a high level, you will:

  • Configure an IAM role and policy
  • Create a Step Functions state machine to control metric gathering execution
  • Create a metric-gathering Lambda function
  • Configure a CloudWatch Events rule to trigger the state machine
  • Validate the solution

Prerequisites

You should already have an AWS account with a running ECS cluster. If you don’t have one running, you can easily deploy a Docker container on an ECS cluster using the AWS Management Console. In the example produced for this post, I use an ECS cluster running Windows Server (currently in beta), but either a Linux or Windows Server cluster works.

Create an IAM role and policy

First, create an IAM role and policy that enables Step Functions, Lambda, and CloudWatch to communicate with each other.

  • The CloudWatch Events rule needs permissions to trigger the Step Functions state machine.
  • The Step Functions state machine needs permissions to trigger the Lambda function.
  • The Lambda function needs permissions to query ECS and then write to CloudWatch Logs and metrics.

When you create the state machine, Lambda function, and CloudWatch Events rule, you assign this role to each of those resources. Upon execution, each of these resources assumes the specified role and executes using the role’s permissions.

  1. Open the IAM console.
  2. Choose Roles, Create New Role.
  3. For Role Name, enter WriteMetricFromStepFunction.
  4. Choose Save.

Create the IAM role trust relationship
The trust relationship (also known as the assume role policy document) for your IAM role looks like the following JSON document. As you can see from the document, your IAM role needs to trust the Lambda, CloudWatch Events, and Step Functions services. By configuring your role to trust these services, they can assume this role and inherit the role permissions.

  1. Open the IAM console.
  2. Choose Roles and select the IAM role previously created.
  3. Choose Trust Relationships, Edit Trust Relationships.
  4. Enter the following trust policy text and choose Save.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "events.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "states.us-west-2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Create an IAM policy

After you’ve finished configuring your role’s trust relationship, grant the role access to the other AWS resources that make up the solution.

The IAM policy is what gives your IAM role permissions to access various resources. You must explicitly whitelist the specific resources to which your role has access, because the default IAM behavior is to deny access to any AWS resources.

I’ve tried to keep this policy document as generic as possible, without allowing permissions to be too open. If the name of your ECS cluster is different than the one in the example policy below, make sure that you update the policy document before attaching it to your IAM role. You can attach this policy as an inline policy, instead of creating the policy separately first. However, either approach is valid.

  1. Open the IAM console.
  2. Select the IAM role, and choose Permissions.
  3. Choose Add in-line policy.
  4. Choose Custom Policy and then enter the following policy. The inline policy name does not matter.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [ "logs:*" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "cloudwatch:PutMetricData" ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [ "states:StartExecution" ],
            "Resource": [
                "arn:aws:states:*:*:stateMachine:WriteMetricFromStepFunction"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [ "lambda:InvokeFunction" ],
            "Resource": "arn:aws:lambda:*:*:function:WriteMetricFromStepFunction"
        },
        {
            "Effect": "Allow",
            "Action": [ "ecs:Describe*" ],
            "Resource": "arn:aws:ecs:*:*:cluster/ECSEsgaroth"
        }
    ]
}

Create a Step Functions state machine

In this section, you create a Step Functions state machine that invokes the metric-gathering Lambda function every five (5) seconds, for a one-minute period. If you divide a minute (60 seconds) into five-second intervals, you get 12. Based on this math, you create 12 branches, in a single parallel state, in the state machine. Each branch triggers the metric-gathering Lambda function at a different five-second marker throughout the one-minute period. After all of the parallel branches finish executing, the Step Functions execution completes and another begins.

Follow these steps to create your Step Functions state machine:

  1. Open the Step Functions console.
  2. Choose Dashboard, Create State Machine.
  3. For State Machine Name, enter WriteMetricFromStepFunction.
  4. Enter the state machine code below into the editor. Make sure that you insert your own AWS account ID for every instance of “676655494xxx”
  5. Choose Create State Machine.
  6. Select the WriteMetricFromStepFunction IAM role that you previously created.
{
    "Comment": "Writes ECS metrics to CloudWatch every five seconds, for a one-minute period.",
    "StartAt": "ParallelMetric",
    "States": {
      "ParallelMetric": {
        "Type": "Parallel",
        "Branches": [
          {
            "StartAt": "WriteMetricLambda",
            "States": {
             	"WriteMetricLambda": {
                  "Type": "Task",
				  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitFive",
            "States": {
            	"WaitFive": {
            		"Type": "Wait",
            		"Seconds": 5,
            		"Next": "WriteMetricLambdaFive"
          		},
             	"WriteMetricLambdaFive": {
                  "Type": "Task",
				  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitTen",
            "States": {
            	"WaitTen": {
            		"Type": "Wait",
            		"Seconds": 10,
            		"Next": "WriteMetricLambda10"
          		},
             	"WriteMetricLambda10": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
    	  {
            "StartAt": "WaitFifteen",
            "States": {
            	"WaitFifteen": {
            		"Type": "Wait",
            		"Seconds": 15,
            		"Next": "WriteMetricLambda15"
          		},
             	"WriteMetricLambda15": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait20",
            "States": {
            	"Wait20": {
            		"Type": "Wait",
            		"Seconds": 20,
            		"Next": "WriteMetricLambda20"
          		},
             	"WriteMetricLambda20": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait25",
            "States": {
            	"Wait25": {
            		"Type": "Wait",
            		"Seconds": 25,
            		"Next": "WriteMetricLambda25"
          		},
             	"WriteMetricLambda25": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait30",
            "States": {
            	"Wait30": {
            		"Type": "Wait",
            		"Seconds": 30,
            		"Next": "WriteMetricLambda30"
          		},
             	"WriteMetricLambda30": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait35",
            "States": {
            	"Wait35": {
            		"Type": "Wait",
            		"Seconds": 35,
            		"Next": "WriteMetricLambda35"
          		},
             	"WriteMetricLambda35": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait40",
            "States": {
            	"Wait40": {
            		"Type": "Wait",
            		"Seconds": 40,
            		"Next": "WriteMetricLambda40"
          		},
             	"WriteMetricLambda40": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait45",
            "States": {
            	"Wait45": {
            		"Type": "Wait",
            		"Seconds": 45,
            		"Next": "WriteMetricLambda45"
          		},
             	"WriteMetricLambda45": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait50",
            "States": {
            	"Wait50": {
            		"Type": "Wait",
            		"Seconds": 50,
            		"Next": "WriteMetricLambda50"
          		},
             	"WriteMetricLambda50": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          },
          {
            "StartAt": "Wait55",
            "States": {
            	"Wait55": {
            		"Type": "Wait",
            		"Seconds": 55,
            		"Next": "WriteMetricLambda55"
          		},
             	"WriteMetricLambda55": {
                  "Type": "Task",
                  "Resource": "arn:aws:lambda:us-west-2:676655494xxx:function:WriteMetricFromStepFunction",
                  "End": true
                } 
            }
          }
        ],
        "End": true
      }
  }
}

Now you’ve got a shiny new Step Functions state machine! However, you might ask yourself, “After the state machine has been created, how does it get executed?” Before I answer that question, create the Lambda function that writes the custom metric, and then you can get the end-to-end process moving.

Create a Lambda function

The meaty part of the solution is a Lambda function, written for the Python 3.6 runtime, that retrieves metric values from ECS and then writes them to CloudWatch. This Lambda function is what the Step Functions state machine triggers every five seconds, via the Task states. Key points to remember:

The Lambda function needs permission to:

  • Write CloudWatch metrics (PutMetricData API).
  • Retrieve metrics from ECS clusters (DescribeClusters API).
  • Write StdOut to CloudWatch Logs.

Boto3, the AWS SDK for Python, is included in the Lambda execution environment for Python 2.x and 3.x.

Because Lambda includes the AWS SDK, you don’t have to worry about packaging it up and uploading it to Lambda. You can focus on writing code and automatically take a dependency on boto3.

As for permissions, you’ve already created the IAM role and attached a policy to it that enables your Lambda function to access the necessary API actions. When you create your Lambda function, make sure that you select the correct IAM role, to ensure it is invoked with the correct permissions.

The following Lambda function code is generic. So how does the Lambda function know which ECS cluster to gather metrics for? Your Step Functions state machine automatically passes in its state to the Lambda function. When you create your CloudWatch Events rule, you specify a simple JSON object that passes the desired ECS cluster name into your Step Functions state machine, which then passes it to the Lambda function.

Use the following property values as you create your Lambda function:

Function Name: WriteMetricFromStepFunction
Description: This Lambda function retrieves metric values from an ECS cluster and writes them to Amazon CloudWatch.
Runtime: Python3.6
Memory: 128 MB
IAM Role: WriteMetricFromStepFunction

import boto3

def handler(event, context):
    cw = boto3.client('cloudwatch')
    ecs = boto3.client('ecs')
    print('Got boto3 client objects')
    
    Dimension = {
        'Name': 'ClusterName',
        'Value': event['ECSClusterName']
    }

    cluster = get_ecs_cluster(ecs, Dimension['Value'])
    
    cw_args = {
       'Namespace': 'ECS',
       'MetricData': [
           {
               'MetricName': 'RunningTask',
               'Dimensions': [ Dimension ],
               'Value': cluster['runningTasksCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'PendingTask',
               'Dimensions': [ Dimension ],
               'Value': cluster['pendingTasksCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'ActiveServices',
               'Dimensions': [ Dimension ],
               'Value': cluster['activeServicesCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           },
           {
               'MetricName': 'RegisteredContainerInstances',
               'Dimensions': [ Dimension ],
               'Value': cluster['registeredContainerInstancesCount'],
               'Unit': 'Count',
               'StorageResolution': 1
           }
        ]
    }
    cw.put_metric_data(**cw_args)
    print('Finished writing metric data')
    
def get_ecs_cluster(client, cluster_name):
    cluster = client.describe_clusters(clusters = [ cluster_name ])
    print('Retrieved cluster details from ECS')
    return cluster['clusters'][0]

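If you want to exercise the function before wiring up the rest of the pipeline, you can call the handler directly with the same event shape that the state machine will eventually pass in. This is just a local smoke test, and it assumes that your local credentials can call ECS and CloudWatch:

# Local smoke test -- run with credentials that can reach ECS and CloudWatch.
if __name__ == '__main__':
    handler({'ECSClusterName': 'ECSEsgaroth'}, None)
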
Create the CloudWatch Events rule

Now you’ve created an IAM role and policy, Step Functions state machine, and Lambda function. How do these components actually start communicating with each other? The final step in this process is to set up a CloudWatch Events rule that triggers your metric-gathering Step Functions state machine every minute. You have two choices for your CloudWatch Events rule expression: rate or cron. In this example, use the cron expression.

A couple of key learning points from creating the CloudWatch Events rule:

  • You can specify one or more targets, of different types (for example, Lambda function, Step Functions state machine, SNS topic, and so on).
  • You’re required to specify an IAM role with permissions to trigger your target.
    NOTE: This applies only to certain types of targets, including Step Functions state machines.
  • Each target that supports IAM roles can be triggered using a different IAM role, in the same CloudWatch Events rule.
  • Optional: You can provide custom JSON that is passed to your target Step Functions state machine as input.

Follow these steps to create the CloudWatch Events rule:

  1. Open the CloudWatch console.
  2. Choose Events, Rules, Create Rule.
  3. Select Schedule, Cron Expression, and then enter the following rule:
    0/1 * * * ? *
  4. Choose Add Target, Step Functions State Machine, WriteMetricFromStepFunction.
  5. For Configure Input, select Constant (JSON Text).
  6. Enter the following JSON input, which is passed to Step Functions, while changing the cluster name accordingly:
    { "ECSClusterName": "ECSEsgaroth" }
  7. Choose Use Existing Role, WriteMetricFromStepFunction (the IAM role that you previously created).

After you’ve completed these steps, your screen should look similar to this:
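
If you prefer to create the same rule programmatically, here is a rough equivalent using the SDK for Python; the rule name and ARNs are placeholders:

import boto3
import json

events = boto3.client('events', region_name='us-west-2')

# Create (or update) the once-per-minute rule.
rule = events.put_rule(
    Name='GatherEcsMetricsEveryMinute',
    ScheduleExpression='cron(0/1 * * * ? *)',
    State='ENABLED',
)
print(rule['RuleArn'])

# Point the rule at the state machine, passing the cluster name as input.
events.put_targets(
    Rule='GatherEcsMetricsEveryMinute',
    Targets=[{
        'Id': 'StepFunctionsStateMachine',
        'Arn': 'arn:aws:states:us-west-2:123456789012:stateMachine:WriteMetricFromStepFunction',
        'RoleArn': 'arn:aws:iam::123456789012:role/WriteMetricFromStepFunction',
        'Input': json.dumps({'ECSClusterName': 'ECSEsgaroth'}),
    }],
)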

Validate the solution

Now that you have finished implementing the solution to gather high-resolution metrics from ECS, validate that it’s working properly.

  1. Open the CloudWatch console.
  2. Choose Metrics.
  3. Choose custom and select the ECS namespace.
  4. Choose the ClusterName metric dimension.

You should see your metrics listed below.

Troubleshoot configuration issues

If you aren’t receiving the expected ECS cluster metrics in CloudWatch, check for the following common configuration issues. Review the earlier procedures to make sure that the resources were properly configured.

  • The IAM role’s trust relationship is incorrectly configured.
    Make sure that the IAM role trusts Lambda, CloudWatch Events, and Step Functions in the correct region.
  • The IAM role does not have the correct policies attached to it.
    Make sure that you have copied the IAM policy correctly as an inline policy on the IAM role.
  • The CloudWatch Events rule is not triggering new Step Functions executions.
    Make sure that the target configuration on the rule has the correct Step Functions state machine and IAM role selected.
  • The Step Functions state machine is being executed, but failing part way through.
    Examine the detailed error message on the failed state within the failed Step Functions execution. It’s possible that the IAM role does not have permissions to trigger the target Lambda function, that the target Lambda function does not exist, or that the Lambda function failed to complete successfully due to invalid permissions.

Although the above list covers several common configuration issues, it is not comprehensive. Make sure that you understand how the services are connected to each other, how permissions are granted through IAM policies, and how IAM trust relationships work.

Conclusion

In this post, you implemented a Serverless solution to gather and record high-resolution application metrics from containers running on Amazon ECS into CloudWatch. The solution consists of a Step Functions state machine, Lambda function, CloudWatch Events rule, and an IAM role and policy. The data that you gather from this solution helps you rapidly identify issues with an ECS cluster.

To gather high-resolution metrics from any service, modify your Lambda function to gather the correct metrics from your target. If you prefer not to use Python, you can implement a Lambda function using one of the other supported runtimes, including Node.js, Java, or .NET Core. However, this post should give you the fundamentals of capturing high-resolution metrics in CloudWatch.

If you found this post useful, or have questions, please comment below.