Tag Archives: design

EC2 Instance Update – C5 Instances with Local NVMe Storage (C5d)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-instance-update-c5-instances-with-local-nvme-storage-c5d/

As you can see from my EC2 Instance History post, we add new instance types on a regular and frequent basis. Driven by increasingly powerful processors and designed to address an ever-widening set of use cases, the size and diversity of this list reflects the equally diverse group of EC2 customers!

Near the bottom of that list you will find the new compute-intensive C5 instances. With a 25% to 50% improvement in price-performance over the C4 instances, the C5 instances are designed for applications like batch and log processing, distributed and or real-time analytics, high-performance computing (HPC), ad serving, highly scalable multiplayer gaming, and video encoding. Some of these applications can benefit from access to high-speed, ultra-low latency local storage. For example, video encoding, image manipulation, and other forms of media processing often necessitates large amounts of I/O to temporary storage. While the input and output files are valuable assets and are typically stored as Amazon Simple Storage Service (S3) objects, the intermediate files are expendable. Similarly, batch and log processing runs in a race-to-idle model, flushing volatile data to disk as fast as possible in order to make full use of compute resources.

New C5d Instances with Local Storage
In order to meet this need, we are introducing C5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for the applications that I described above, as well as others that you will undoubtedly dream up! Here are the specs:

Instance Name vCPUs RAM Local Storage EBS Bandwidth Network Bandwidth
c5d.large 2 4 GiB 1 x 50 GB NVMe SSD Up to 2.25 Gbps Up to 10 Gbps
c5d.xlarge 4 8 GiB 1 x 100 GB NVMe SSD Up to 2.25 Gbps Up to 10 Gbps
c5d.2xlarge 8 16 GiB 1 x 225 GB NVMe SSD Up to 2.25 Gbps Up to 10 Gbps
c5d.4xlarge 16 32 GiB 1 x 450 GB NVMe SSD 2.25 Gbps Up to 10 Gbps
c5d.9xlarge 36 72 GiB 1 x 900 GB NVMe SSD 4.5 Gbps 10 Gbps
c5d.18xlarge 72 144 GiB 2 x 900 GB NVMe SSD 9 Gbps 25 Gbps

Other than the addition of local storage, the C5 and C5d share the same specs. Both are powered by 3.0 GHz Intel Xeon Platinum 8000-series processors, optimized for EC2 and with full control over C-states on the two largest sizes, giving you the ability to run two cores at up to 3.5 GHz using Intel Turbo Boost Technology.

You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.

Here are a couple of things to keep in mind about the local NVMe storage:

Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.

Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.

Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.

Available Now
C5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent C5 instances.

Jeff;

PS – We will be adding local NVMe storage to other EC2 instance types in the months to come, so stay tuned!

UK soldiers design Raspberry Pi bomb disposal robot

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/uk-soldiers-design-raspberry-pi-bomb-disposal-robot/

Three soldiers in the British Army have used a Raspberry Pi to build an autonomous robot, as part of their Foreman of Signals course.

Meet The Soldiers Revolutionising Bomb Disposal

Three soldiers from Blandford Camp have successfully designed and built an autonomous robot as part of their Foreman of Signals Course at the Dorset Garrison.

Autonomous robots

Forces Radio BFBS carried a story last week about Staff Sergeant Jolley, Sergeant Rana, and Sergeant Paddon, also known as the “Project ROVER” team. As part of their Foreman of Signals training, their task was to design an autonomous robot that can move between two specified points, take a temperature reading, and transmit the information to a remote computer. The team comments that, while semi-autonomous robots have been used as far back as 9/11 for tasks like finding people trapped under rubble, nothing like their robot and on a similar scale currently exists within the British Army.

The ROVER buggy

Their build is named ROVER, which stands for Remote Obstacle aVoiding Environment Robot. It’s a buggy that moves on caterpillar tracks, and it’s tethered; we wonder whether that might be because it doesn’t currently have an on-board power supply. A demo shows the robot moving forward, then changing its path when it encounters an obstacle. The team is using RealVNC‘s remote access software to allow ROVER to send data back to another computer.

Applications for ROVER

Dave Ball, Senior Lecturer in charge of the Foreman of Signals course, comments that the project is “a fantastic opportunity for [the team] to, even only halfway through the course, showcase some of the stuff they’ve learnt and produce something that’s really quite exciting.” The Project ROVER team explains that the possibilities for autonomous robots like this one are extensive: they include mine clearance, bomb disposal, and search-and-rescue campaigns. They point out that existing semi-autonomous hardware is not as easy to program as their build. In contrast, they say, “with the invention of the Raspberry Pi, this has allowed three very inexperienced individuals to program a robot very capable of doing these things.”

We make Raspberry Pi computers because we want building things with technology to be as accessible as possible. So it’s great to see a project like this, made by people who aren’t techy and don’t have a lot of computing experience, but who want to solve a problem and see that the Pi is an affordable and powerful tool that can help.

The post UK soldiers design Raspberry Pi bomb disposal robot appeared first on Raspberry Pi.

AWS IoT 1-Click – Use Simple Devices to Trigger Lambda Functions

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-iot-1-click-use-simple-devices-to-trigger-lambda-functions/

We announced a preview of AWS IoT 1-Click at AWS re:Invent 2017 and have been refining it ever since, focusing on simplicity and a clean out-of-box experience. Designed to make IoT available and accessible to a broad audience, AWS IoT 1-Click is now generally available, along with new IoT buttons from AWS and AT&T.

I sat down with the dev team a month or two ago to learn about the service so that I could start thinking about my blog post. During the meeting they gave me a pair of IoT buttons and I started to think about some creative ways to put them to use. Here are a few that I came up with:

Help Request – Earlier this month I spent a very pleasant weekend at the HackTillDawn hackathon in Los Angeles. As the participants were hacking away, they occasionally had questions about AWS, machine learning, Amazon SageMaker, and AWS DeepLens. While we had plenty of AWS Solution Architects on hand (decked out in fashionable & distinctive AWS shirts for easy identification), I imagined an IoT button for each team. Pressing the button would alert the SA crew via SMS and direct them to the proper table.

Camera ControlTim Bray and I were in the AWS video studio, prepping for the first episode of Tim’s series on AWS Messaging. Minutes before we opened the Twitch stream I realized that we did not have a clean, unobtrusive way to ask the camera operator to switch to a closeup view. Again, I imagined that a couple of IoT buttons would allow us to make the request.

Remote Dog Treat Dispenser – My dog barks every time a stranger opens the gate in front of our house. While it is great to have confirmation that my Ring doorbell is working, I would like to be able to press a button and dispense a treat so that Luna stops barking!

Homes, offices, factories, schools, vehicles, and health care facilities can all benefit from IoT buttons and other simple IoT devices, all managed using AWS IoT 1-Click.

All About AWS IoT 1-Click
As I said earlier, we have been focusing on simplicity and a clean out-of-box experience. Here’s what that means:

Architects can dream up applications for inexpensive, low-powered devices.

Developers don’t need to write any device-level code. They can make use of pre-built actions, which send email or SMS messages, or write their own custom actions using AWS Lambda functions.

Installers don’t have to install certificates or configure cloud endpoints on newly acquired devices, and don’t have to worry about firmware updates.

Administrators can monitor the overall status and health of each device, and can arrange to receive alerts when a device nears the end of its useful life and needs to be replaced, using a single interface that spans device types and manufacturers.

I’ll show you how easy this is in just a moment. But first, let’s talk about the current set of devices that are supported by AWS IoT 1-Click.

Who’s Got the Button?
We’re launching with support for two types of buttons (both pictured above). Both types of buttons are pre-configured with X.509 certificates, communicate to the cloud over secure connections, and are ready to use.

The AWS IoT Enterprise Button communicates via Wi-Fi. It has a 2000-click lifetime, encrypts outbound data using TLS, and can be configured using BLE and our mobile app. It retails for $19.99 (shipping and handling not included) and can be used in the United States, Europe, and Japan.

The AT&T LTE-M Button communicates via the LTE-M cellular network. It has a 1500-click lifetime, and also encrypts outbound data using TLS. The device and the bundled data plan is available an an introductory price of $29.99 (shipping and handling not included), and can be used in the United States.

We are very interested in working with device manufacturers in order to make even more shapes, sizes, and types of devices (badge readers, asset trackers, motion detectors, and industrial sensors, to name a few) available to our customers. Our team will be happy to tell you about our provisioning tools and our facility for pushing OTA (over the air) updates to large fleets of devices; you can contact them at [email protected].

AWS IoT 1-Click Concepts
I’m eager to show you how to use AWS IoT 1-Click and the buttons, but need to introduce a few concepts first.

Device – A button or other item that can send messages. Each device is uniquely identified by a serial number.

Placement Template – Describes a like-minded collection of devices to be deployed. Specifies the action to be performed and lists the names of custom attributes for each device.

Placement – A device that has been deployed. Referring to placements instead of devices gives you the freedom to replace and upgrade devices with minimal disruption. Each placement can include values for custom attributes such as a location (“Building 8, 3rd Floor, Room 1337”) or a purpose (“Coffee Request Button”).

Action – The AWS Lambda function to invoke when the button is pressed. You can write a function from scratch, or you can make use of a pair of predefined functions that send an email or an SMS message. The actions have access to the attributes; you can, for example, send an SMS message with the text “Urgent need for coffee in Building 8, 3rd Floor, Room 1337.”

Getting Started with AWS IoT 1-Click
Let’s set up an IoT button using the AWS IoT 1-Click Console:

If I didn’t have any buttons I could click Buy devices to get some. But, I do have some, so I click Claim devices to move ahead. I enter the device ID or claim code for my AT&T button and click Claim (I can enter multiple claim codes or device IDs if I want):

The AWS buttons can be claimed using the console or the mobile app; the first step is to use the mobile app to configure the button to use my Wi-Fi:

Then I scan the barcode on the box and click the button to complete the process of claiming the device. Both of my buttons are now visible in the console:

I am now ready to put them to use. I click on Projects, and then Create a project:

I name and describe my project, and click Next to proceed:

Now I define a device template, along with names and default values for the placement attributes. Here’s how I set up a device template (projects can contain several, but I just need one):

The action has two mandatory parameters (phone number and SMS message) built in; I add three more (Building, Room, and Floor) and click Create project:

I’m almost ready to ask for some coffee! The next step is to associate my buttons with this project by creating a placement for each one. I click Create placements to proceed. I name each placement, select the device to associate with it, and then enter values for the attributes that I established for the project. I can also add additional attributes that are peculiar to this placement:

I can inspect my project and see that everything looks good:

I click on the buttons and the SMS messages appear:

I can monitor device activity in the AWS IoT 1-Click Console:

And also in the Lambda Console:

The Lambda function itself is also accessible, and can be used as-is or customized:

As you can see, this is the code that lets me use {{*}}include all of the placement attributes in the message and {{Building}} (for example) to include a specific placement attribute.

Now Available
I’ve barely scratched the surface of this cool new service and I encourage you to give it a try (or a click) yourself. Buy a button or two, build something cool, and let me know all about it!

Pricing is based on the number of enabled devices in your account, measured monthly and pro-rated for partial months. Devices can be enabled or disabled at any time. See the AWS IoT 1-Click Pricing page for more info.

To learn more, visit the AWS IoT 1-Click home page or read the AWS IoT 1-Click documentation.

Jeff;

 

[$] Supporting multi-actuator drives

Post Syndicated from jake original https://lwn.net/Articles/753652/rss

In a combined filesystem and storage session at the 2018 Linux Storage,
Filesystem, and Memory-Management Summit (LSFMM), Tim Walker asked for help
in designing the interface to some new storage hardware. He wanted some
feedback on how a multi-actuator
drive
should present itself to the system.
These drives have two (or, eventually, more) sets of read/write heads and
other hardware that can all operate in parallel.

Amazon Sumerian – Now Generally Available

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-sumerian-now-generally-available/

We announced Amazon Sumerian at AWS re:Invent 2017. As you can see from Tara‘s blog post (Presenting Amazon Sumerian: An Easy Way to Create VR, AR, and 3D Experiences), Sumerian does not require any specialized programming or 3D graphics expertise. You can build VR, AR, and 3D experiences for a wide variety of popular hardware platforms including mobile devices, head-mounted displays, digital signs, and web browsers.

I’m happy to announce that Sumerian is now generally available. You can create realistic virtual environments and scenes without having to acquire or master specialized tools for 3D modeling, animation, lighting, audio editing, or programming. Once built, you can deploy your finished creation across multiple platforms without having to write custom code or deal with specialized deployment systems and processes.

Sumerian gives you a web-based editor that you can use to quickly and easily create realistic, professional-quality scenes. There’s a visual scripting tool that lets you build logic to control how objects and characters (Sumerian Hosts) respond to user actions. Sumerian also lets you create rich, natural interactions powered by AWS services such as Amazon Lex, Polly, AWS Lambda, AWS IoT, and Amazon DynamoDB.

Sumerian was designed to work on multiple platforms. The VR and AR apps that you create in Sumerian will run in browsers that supports WebGL or WebVR and on popular devices such as the Oculus Rift, HTC Vive, and those powered by iOS or Android.

During the preview period, we have been working with a broad spectrum of customers to put Sumerian to the test and to create proof of concept (PoC) projects designed to highlight an equally broad spectrum of use cases, including employee education, training simulations, field service productivity, virtual concierge, design and creative, and brand engagement. Fidelity Labs (the internal R&D unit of Fidelity Investments), was the first to use a Sumerian host to create an engaging VR experience. Cora (the host) lives within a virtual chart room. She can display stock quotes, pull up company charts, and answer questions about a company’s performance. This PoC uses Amazon Polly to implement text to speech and Amazon Lex for conversational chatbot functionality. Read their blog post and watch the video inside to see Cora in action:

Now that Sumerian is generally available, you have the power to create engaging AR, VR, and 3D experiences of your own. To learn more, visit the Amazon Sumerian home page and then spend some quality time with our extensive collection of Sumerian Tutorials.

Jeff;

 

Solving Complex Ordering Challenges with Amazon SQS FIFO Queues

Post Syndicated from Christie Gifrin original https://aws.amazon.com/blogs/compute/solving-complex-ordering-challenges-with-amazon-sqs-fifo-queues/

Contributed by Shea Lutton, AWS Cloud Infrastructure Architect

Amazon Simple Queue Service (Amazon SQS) is a fully managed queuing service that helps decouple applications, distributed systems, and microservices to increase fault tolerance. SQS queues come in two distinct types:

  • Standard SQS queues are able to scale to enormous throughput with at-least-once delivery.
  • FIFO queues are designed to guarantee that messages are processed exactly once in the exact order that they are received and have a default rate of 300 transactions per second.

As customers explore SQS FIFO queues, they often have questions about how the behavior works when messages arrive and are consumed. This post walks through some common situations to identify the exact behavior that you can expect. It also covers the behavior of message groups in depth and explains why message groups are key to understanding how FIFO queues work.

The simple case

Suppose that you run a major auction platform where people buy and sell a wide range of products. Your platform requires that transactions from buyers and sellers get processed in exactly the order received. Here’s how a FIFO queue helps you keep all your transactions in one straight flow.

A seller currently is holding an auction for a laptop, and three different bids are received for the same price. Ties are awarded to the first bidder at that price so it is important to track which arrived first. Your auction platform receives the three bids and sends them to a FIFO queue before they are processed.

Now observe how messages leave the queue. When your consumer asks for a batch of up to 10 messages, SQS starts filling the batch with the oldest message (bid A1). It keeps filling until either the batch is full or the queue is empty. In this case, the batch contains the three messages and the queue is now empty. After a batch has left the queue, SQS considers that batch of messages to be “in-flight” until the consumer either deletes them or the batch’s visibility timer expires.

 

When you have a single consumer, this is easy to envision. The consumer gets a batch of messages (now in-flight), does its processing, and deletes the messages. That consumer is then ready to ask for the next batch of messages.

The critical thing to keep in mind is that SQS won’t release the next batch of messages until the first batch has been deleted. By adding more messages to the queue, you can see more interesting behaviors. Imagine that a burst of 11 bids is sent to your FIFO queue, with two bids for Auction A arriving last.

The FIFO queue now has at least two batches of messages in it. When your single consumer requests the first batch of 10 messages, it receives a batch starting with B1 and ending with A1. Later, after the first batch has been deleted, the consumer can get the second batch of messages containing the final A2 message from the queue.

Adding complexity with multiple message groups

A new challenge arises. Your auction platform is getting busier and your dev team added a number of new features. The combination of increased messages and extra processing time for the new features means that a single consumer is too slow. The solution is to scale to have more consumers and process messages in parallel.

To work in parallel, your team realized that only the messages related to a single auction must be kept in order. All transactions for Auction A need to be kept in order and so do all transactions for Auction B. But the two auctions are independent and it does not matter which auctions transactions are processed first.

FIFO can handle that case with a feature called message groups. Each transaction related to Auction A is placed by your producer into message group A, and so on. In the diagram below, Auction A and Auction B each received three bid transactions, with bid B1 arriving first. The FIFO queue always keeps transactions within a message group in the order in which they arrived.

How is this any different than earlier examples? The consumer now gets the messages ordered by message groups, all the B group messages followed by all the A group messages. Multiple message groups create the possibility of using multiple consumers, which I explain in a moment. If FIFO can’t fill up a batch of messages with a single message group, FIFO can place more than one message group in a batch of messages. But whenever possible, the queue gives you a full batch of messages from the same group.

The order of messages leaving a FIFO queue is governed by three rules:

  1. Return the oldest message where no other message in the same message group is currently in-flight.
  2. Return as many messages from the same message group as possible.
  3. If a message batch is still not full, go back to rule 1.

To see this behavior, add a second consumer and insert many more messages into the queue. For simplicity, the delete message action has been omitted in these diagrams but it is assumed that all messages in a batch are processed successfully by the consumer and the batch is properly deleted immediately after.

In this example, there are 11 Group A and 11 Group B transactions arriving in interleaved order and a second consumer has been added. Consumer 1 asks for a group of 10 messages and receives 10 Group A messages. Consumer 2 then asks for 10 messages but SQS knows that Group A is in flight, so it releases 10 Group B messages. The two consumers are now processing two batches of messages in parallel, speeding up throughput and then deleting their batches. When Consumer 1 requests the next batch of messages, it receives the remaining two messages, one from Group A and one from Group B.

Consider this nuanced detail from the example above. What would happen if Consumer 1 was on a faster server and processed its first batch of messages before Consumer 2 could mark its messages for deletion? See if you can predict the behavior before looking at the answer.

If Consumer 2 has not deleted its Group B messages yet when Consumer 1 asks for the next batch, then the FIFO queue considers Group B to still be in flight. It does not release any more Group B messages. Consumer 1 gets only the remaining Group A message. Later, after Consumer 2 has deleted its first batch, the remaining Group B message is released.

Conclusion

I hope this post answered your questions about how Amazon SQS FIFO queues work and why message groups are helpful. If you’re interested in exploring SQS FIFO queues further, here are a few ideas to get you started:

Analyze Apache Parquet optimized data using Amazon Kinesis Data Firehose, Amazon Athena, and Amazon Redshift

Post Syndicated from Roy Hasson original https://aws.amazon.com/blogs/big-data/analyzing-apache-parquet-optimized-data-using-amazon-kinesis-data-firehose-amazon-athena-and-amazon-redshift/

Amazon Kinesis Data Firehose is the easiest way to capture and stream data into a data lake built on Amazon S3. This data can be anything—from AWS service logs like AWS CloudTrail log files, Amazon VPC Flow Logs, Application Load Balancer logs, and others. It can also be IoT events, game events, and much more. To efficiently query this data, a time-consuming ETL (extract, transform, and load) process is required to massage and convert the data to an optimal file format, which increases the time to insight. This situation is less than ideal, especially for real-time data that loses its value over time.

To solve this common challenge, Kinesis Data Firehose can now save data to Amazon S3 in Apache Parquet or Apache ORC format. These are optimized columnar formats that are highly recommended for best performance and cost-savings when querying data in S3. This feature directly benefits you if you use Amazon Athena, Amazon Redshift, AWS Glue, Amazon EMR, or any other big data tools that are available from the AWS Partner Network and through the open-source community.

Amazon Connect is a simple-to-use, cloud-based contact center service that makes it easy for any business to provide a great customer experience at a lower cost than common alternatives. Its open platform design enables easy integration with other systems. One of those systems is Amazon Kinesis—in particular, Kinesis Data Streams and Kinesis Data Firehose.

What’s really exciting is that you can now save events from Amazon Connect to S3 in Apache Parquet format. You can then perform analytics using Amazon Athena and Amazon Redshift Spectrum in real time, taking advantage of this key performance and cost optimization. Of course, Amazon Connect is only one example. This new capability opens the door for a great deal of opportunity, especially as organizations continue to build their data lakes.

Amazon Connect includes an array of analytics views in the Administrator dashboard. But you might want to run other types of analysis. In this post, I describe how to set up a data stream from Amazon Connect through Kinesis Data Streams and Kinesis Data Firehose and out to S3, and then perform analytics using Athena and Amazon Redshift Spectrum. I focus primarily on the Kinesis Data Firehose support for Parquet and its integration with the AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift.

Solution overview

Here is how the solution is laid out:

 

 

The following sections walk you through each of these steps to set up the pipeline.

1. Define the schema

When Kinesis Data Firehose processes incoming events and converts the data to Parquet, it needs to know which schema to apply. The reason is that many times, incoming events contain all or some of the expected fields based on which values the producers are advertising. A typical process is to normalize the schema during a batch ETL job so that you end up with a consistent schema that can easily be understood and queried. Doing this introduces latency due to the nature of the batch process. To overcome this issue, Kinesis Data Firehose requires the schema to be defined in advance.

To see the available columns and structures, see Amazon Connect Agent Event Streams. For the purpose of simplicity, I opted to make all the columns of type String rather than create the nested structures. But you can definitely do that if you want.

The simplest way to define the schema is to create a table in the Amazon Athena console. Open the Athena console, and paste the following create table statement, substituting your own S3 bucket and prefix for where your event data will be stored. A Data Catalog database is a logical container that holds the different tables that you can create. The default database name shown here should already exist. If it doesn’t, you can create it or use another database that you’ve already created.

CREATE EXTERNAL TABLE default.kfhconnectblog (
  awsaccountid string,
  agentarn string,
  currentagentsnapshot string,
  eventid string,
  eventtimestamp string,
  eventtype string,
  instancearn string,
  previousagentsnapshot string,
  version string
)
STORED AS parquet
LOCATION 's3://your_bucket/kfhconnectblog/'
TBLPROPERTIES ("parquet.compression"="SNAPPY")

That’s all you have to do to prepare the schema for Kinesis Data Firehose.

2. Define the data streams

Next, you need to define the Kinesis data streams that will be used to stream the Amazon Connect events.  Open the Kinesis Data Streams console and create two streams.  You can configure them with only one shard each because you don’t have a lot of data right now.

3. Define the Kinesis Data Firehose delivery stream for Parquet

Let’s configure the Data Firehose delivery stream using the data stream as the source and Amazon S3 as the output. Start by opening the Kinesis Data Firehose console and creating a new data delivery stream. Give it a name, and associate it with the Kinesis data stream that you created in Step 2.

As shown in the following screenshot, enable Record format conversion (1) and choose Apache Parquet (2). As you can see, Apache ORC is also supported. Scroll down and provide the AWS Glue Data Catalog database name (3) and table names (4) that you created in Step 1. Choose Next.

To make things easier, the output S3 bucket and prefix fields are automatically populated using the values that you defined in the LOCATION parameter of the create table statement from Step 1. Pretty cool. Additionally, you have the option to save the raw events into another location as defined in the Source record S3 backup section. Don’t forget to add a trailing forward slash “ / “ so that Data Firehose creates the date partitions inside that prefix.

On the next page, in the S3 buffer conditions section, there is a note about configuring a large buffer size. The Parquet file format is highly efficient in how it stores and compresses data. Increasing the buffer size allows you to pack more rows into each output file, which is preferred and gives you the most benefit from Parquet.

Compression using Snappy is automatically enabled for both Parquet and ORC. You can modify the compression algorithm by using the Kinesis Data Firehose API and update the OutputFormatConfiguration.

Be sure to also enable Amazon CloudWatch Logs so that you can debug any issues that you might run into.

Lastly, finalize the creation of the Firehose delivery stream, and continue on to the next section.

4. Set up the Amazon Connect contact center

After setting up the Kinesis pipeline, you now need to set up a simple contact center in Amazon Connect. The Getting Started page provides clear instructions on how to set up your environment, acquire a phone number, and create an agent to accept calls.

After setting up the contact center, in the Amazon Connect console, choose your Instance Alias, and then choose Data Streaming. Under Agent Event, choose the Kinesis data stream that you created in Step 2, and then choose Save.

At this point, your pipeline is complete.  Agent events from Amazon Connect are generated as agents go about their day. Events are sent via Kinesis Data Streams to Kinesis Data Firehose, which converts the event data from JSON to Parquet and stores it in S3. Athena and Amazon Redshift Spectrum can simply query the data without any additional work.

So let’s generate some data. Go back into the Administrator console for your Amazon Connect contact center, and create an agent to handle incoming calls. In this example, I creatively named mine Agent One. After it is created, Agent One can get to work and log into their console and set their availability to Available so that they are ready to receive calls.

To make the data a bit more interesting, I also created a second agent, Agent Two. I then made some incoming and outgoing calls and caused some failures to occur, so I now have enough data available to analyze.

5. Analyze the data with Athena

Let’s open the Athena console and run some queries. One thing you’ll notice is that when we created the schema for the dataset, we defined some of the fields as Strings even though in the documentation they were complex structures.  The reason for doing that was simply to show some of the flexibility of Athena to be able to parse JSON data. However, you can define nested structures in your table schema so that Kinesis Data Firehose applies the appropriate schema to the Parquet file.

Let’s run the first query to see which agents have logged into the system.

The query might look complex, but it’s fairly straightforward:

WITH dataset AS (
  SELECT 
    from_iso8601_timestamp(eventtimestamp) AS event_ts,
    eventtype,
    -- CURRENT STATE
    json_extract_scalar(
      currentagentsnapshot,
      '$.agentstatus.name') AS current_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        currentagentsnapshot,
        '$.agentstatus.starttimestamp')) AS current_starttimestamp,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.firstname') AS current_firstname,
    json_extract_scalar(
      currentagentsnapshot,
      '$.configuration.lastname') AS current_lastname,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.username') AS current_username,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') AS               current_outboundqueue,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as current_inboundqueue,
    -- PREVIOUS STATE
    json_extract_scalar(
      previousagentsnapshot, 
      '$.agentstatus.name') as prev_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        previousagentsnapshot, 
       '$.agentstatus.starttimestamp')) as prev_starttimestamp,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.firstname') as prev_firstname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.lastname') as prev_lastname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.username') as prev_username,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') as current_outboundqueue,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as prev_inboundqueue
  from kfhconnectblog
  where eventtype <> 'HEART_BEAT'
)
SELECT
  current_status as status,
  current_username as username,
  event_ts
FROM dataset
WHERE eventtype = 'LOGIN' AND current_username <> ''
ORDER BY event_ts DESC

The query output looks something like this:

Here is another query that shows the sessions each of the agents engaged with. It tells us where they were incoming or outgoing, if they were completed, and where there were missed or failed calls.

WITH src AS (
  SELECT
     eventid,
     json_extract_scalar(currentagentsnapshot, '$.configuration.username') as username,
     cast(json_extract(currentagentsnapshot, '$.contacts') AS ARRAY(JSON)) as c,
     cast(json_extract(previousagentsnapshot, '$.contacts') AS ARRAY(JSON)) as p
  from kfhconnectblog
),
src2 AS (
  SELECT *
  FROM src CROSS JOIN UNNEST (c, p) AS contacts(c_item, p_item)
),
dataset AS (
SELECT 
  eventid,
  username,
  json_extract_scalar(c_item, '$.contactid') as c_contactid,
  json_extract_scalar(c_item, '$.channel') as c_channel,
  json_extract_scalar(c_item, '$.initiationmethod') as c_direction,
  json_extract_scalar(c_item, '$.queue.name') as c_queue,
  json_extract_scalar(c_item, '$.state') as c_state,
  from_iso8601_timestamp(json_extract_scalar(c_item, '$.statestarttimestamp')) as c_ts,
  
  json_extract_scalar(p_item, '$.contactid') as p_contactid,
  json_extract_scalar(p_item, '$.channel') as p_channel,
  json_extract_scalar(p_item, '$.initiationmethod') as p_direction,
  json_extract_scalar(p_item, '$.queue.name') as p_queue,
  json_extract_scalar(p_item, '$.state') as p_state,
  from_iso8601_timestamp(json_extract_scalar(p_item, '$.statestarttimestamp')) as p_ts
FROM src2
)
SELECT 
  username,
  c_channel as channel,
  c_direction as direction,
  p_state as prev_state,
  c_state as current_state,
  c_ts as current_ts,
  c_contactid as id
FROM dataset
WHERE c_contactid = p_contactid
ORDER BY id DESC, current_ts ASC

The query output looks similar to the following:

6. Analyze the data with Amazon Redshift Spectrum

With Amazon Redshift Spectrum, you can query data directly in S3 using your existing Amazon Redshift data warehouse cluster. Because the data is already in Parquet format, Redshift Spectrum gets the same great benefits that Athena does.

Here is a simple query to show querying the same data from Amazon Redshift. Note that to do this, you need to first create an external schema in Amazon Redshift that points to the AWS Glue Data Catalog.

SELECT 
  eventtype,
  json_extract_path_text(currentagentsnapshot,'agentstatus','name') AS current_status,
  json_extract_path_text(currentagentsnapshot, 'configuration','firstname') AS current_firstname,
  json_extract_path_text(currentagentsnapshot, 'configuration','lastname') AS current_lastname,
  json_extract_path_text(
    currentagentsnapshot,
    'configuration','routingprofile','defaultoutboundqueue','name') AS current_outboundqueue,
FROM default_schema.kfhconnectblog

The following shows the query output:

Summary

In this post, I showed you how to use Kinesis Data Firehose to ingest and convert data to columnar file format, enabling real-time analysis using Athena and Amazon Redshift. This great feature enables a level of optimization in both cost and performance that you need when storing and analyzing large amounts of data. This feature is equally important if you are investing in building data lakes on AWS.

 


Additional Reading

If you found this post useful, be sure to check out Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight and Work with partitioned data in AWS Glue.


About the Author

Roy Hasson is a Global Business Development Manager for AWS Analytics. He works with customers around the globe to design solutions to meet their data processing, analytics and business intelligence needs. Roy is big Manchester United fan cheering his team on and hanging out with his family.

 

 

 

Mayank Sinha’s home security project

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/home-security/

Yesterday, I received an email from someone called Mayank Sinha, showing us the Raspberry Pi home security project he’s been working on. He got in touch particularly because, he writes, the Raspberry Pi community has given him “immense support” with his build, and he wanted to dedicate it to the commmunity as thanks.

Mayank’s project is named Asfaleia, a Greek word that means safety, certainty, or security against threats. It’s part of an honourable tradition dating all the way back to 2012: it’s a prototype housed in a polystyrene box, using breadboards and jumper leads and sticky tape. And it’s working! Take a look.

Asfaleia DIY Home Security System

An IOT based home security system. The link to the code: https://github.com/mayanksinha11/Asfaleia

Home security with Asfaleida

Asfaleia has a PIR (passive infrared) motion sensor, an IR break beam sensor, and a gas sensor. All are connected to a Raspberry Pi 3 Model B, the latter two via a NodeMCU board. Mayank currently has them set up in a box that’s divided into compartments to model different rooms in a house.

A shallow box divided into four labelled "rooms", all containing electronic components

All the best prototypes have sticky tape or rubber bands

If the IR sensors detect motion or a broken beam, the webcam takes a photo and emails it to the build’s owner, and the build also calls their phone (I like your ringtone, Mayank). If the gas sensor detects a leak, the system activates an exhaust fan via a small relay board, and again the owner receives a phone call. The build can also authenticate users via face and fingerprint recognition. The software that runs it all is written in Python, and you can see Mayank’s code on GitHub.

Of prototypes and works-in-progess

Reading Mayank’s email made me very happy yesterday. We know that thousands of people in our community give a great deal of time and effort to help others learn and make things, and it is always wonderful to see an example of how that support is helping someone turn their ideas into reality. It’s great, too, to see people sharing works-in-progress, as well as polished projects! After all, the average build is more likely to feature rubber bands and Tupperware boxes than meticulously designed laser-cut parts or expert joinery. Mayank’s YouTube channel shows earlier work on this and another Pi project, and I hope he’ll continue to document his builds.

So here’s to Raspberry Pi projects big, small, beginner, professional, endlessly prototyped, unashamedly bodged, unfinished or fully working, shonky or shiny. Please keep sharing them all!

The post Mayank Sinha’s home security project appeared first on Raspberry Pi.

Augmented-reality projection lamp with Raspberry Pi and Android Things

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/augmented-reality-projector/

If your day has been a little fraught so far, watch this video. It opens with a tableau of methodically laid-out components and then shows them soldered, screwed, and slotted neatly into place. Everything fits perfectly; nothing needs percussive adjustment. Then it shows us glimpses of an AR future just like the one promised in the less dystopian comics and TV programmes of my 1980s childhood. It is all very soothing, and exactly what I needed.

Android Things – Lantern

Transform any surface into mixed-reality using Raspberry Pi, a laser projector, and Android Things. Android Experiments – http://experiments.withgoogle.com/android/lantern Lantern project site – http://nordprojects.co/lantern check below to make your own ↓↓↓ Get the code – https://github.com/nordprojects/lantern Build the lamp – https://www.hackster.io/nord-projects/lantern-9f0c28

Creating augmented reality with projection

We’ve seen plenty of Raspberry Pi IoT builds that are smart devices for the home; they add computing power to things like lights, door locks, or toasters to make these objects interact with humans and with their environment in new ways. Nord ProjectsLantern takes a different approach. In their words, it:

imagines a future where projections are used to present ambient information, and relevant UI within everyday objects. Point it at a clock to show your appointments, or point to speaker to display the currently playing song. Unlike a screen, when Lantern’s projections are no longer needed, they simply fade away.

Lantern is set up so that you can connect your wireless device to it using Google Nearby. This means there’s no need to create an account before you can dive into augmented reality.

Lantern Raspberry Pi powered projector lamp

Your own open-source AR lamp

Nord Projects collaborated on Lantern with Google’s Android Things team. They’ve made it fully open-source, so you can find the code on GitHub and also download their parts list, which includes a Pi, an IKEA lamp, an accelerometer, and a laser projector. Build instructions are at hackster.io and on GitHub.

This is a particularly clear tutorial, very well illustrated with photos and GIFs, and once you’ve sourced and 3D-printed all of the components, you shouldn’t need a whole lot of experience to put everything together successfully. Since everything is open-source, though, if you want to adapt it — for example, if you’d like to source a less costly projector than the snazzy one used here — you can do that too.

components of Lantern Raspberry Pi powered augmented reality projector lamp

The instructions walk you through the mechanical build and the wiring, as well as installing Android Things and Nord Projects’ custom software on the Raspberry Pi. Once you’ve set everything up, an accelerometer connected to the Pi’s GPIO pins lets the lamp know which surface it is pointing at. A companion app on your mobile device lets you choose from the mini apps that work on that surface to select the projection you want.

The designers are making several mini apps available for Lantern, including the charmingly named Space Porthole: this uses Processing and your local longitude and latitude to project onto your ceiling the stars you’d see if you punched a hole through to the sky, if it were night time, and clear weather. Wouldn’t you rather look at that than deal with the ant problem in your kitchen or tackle your GitHub notifications?

What would you like to project onto your living environment? Let us know in the comments!

The post Augmented-reality projection lamp with Raspberry Pi and Android Things appeared first on Raspberry Pi.

Supply-Chain Security

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/supply-chain_se.html

Earlier this month, the Pentagon stopped selling phones made by the Chinese companies ZTE and Huawei on military bases because they might be used to spy on their users.

It’s a legitimate fear, and perhaps a prudent action. But it’s just one instance of the much larger issue of securing our supply chains.

All of our computerized systems are deeply international, and we have no choice but to trust the companies and governments that touch those systems. And while we can ban a few specific products, services or companies, no country can isolate itself from potential foreign interference.

In this specific case, the Pentagon is concerned that the Chinese government demanded that ZTE and Huawei add “backdoors” to their phones that could be surreptitiously turned on by government spies or cause them to fail during some future political conflict. This tampering is possible because the software in these phones is incredibly complex. It’s relatively easy for programmers to hide these capabilities, and correspondingly difficult to detect them.

This isn’t the first time the United States has taken action against foreign software suspected to contain hidden features that can be used against us. Last December, President Trump signed into law a bill banning software from the Russian company Kaspersky from being used within the US government. In 2012, the focus was on Chinese-made Internet routers. Then, the House Intelligence Committee concluded: “Based on available classified and unclassified information, Huawei and ZTE cannot be trusted to be free of foreign state influence and thus pose a security threat to the United States and to our systems.”

Nor is the United States the only country worried about these threats. In 2014, China reportedly banned antivirus products from both Kaspersky and the US company Symantec, based on similar fears. In 2017, the Indian government identified 42 smartphone apps that China subverted. Back in 1997, the Israeli company Check Point was dogged by rumors that its government added backdoors into its products; other of that country’s tech companies have been suspected of the same thing. Even al-Qaeda was concerned; ten years ago, a sympathizer released the encryption software Mujahedeen Secrets, claimed to be free of Western influence and backdoors. If a country doesn’t trust another country, then it can’t trust that country’s computer products.

But this trust isn’t limited to the country where the company is based. We have to trust the country where the software is written — and the countries where all the components are manufactured. In 2016, researchers discovered that many different models of cheap Android phones were sending information back to China. The phones might be American-made, but the software was from China. In 2016, researchers demonstrated an even more devious technique, where a backdoor could be added at the computer chip level in the factory that made the chips ­ without the knowledge of, and undetectable by, the engineers who designed the chips in the first place. Pretty much every US technology company manufactures its hardware in countries such as Malaysia, Indonesia, China and Taiwan.

We also have to trust the programmers. Today’s large software programs are written by teams of hundreds of programmers scattered around the globe. Backdoors, put there by we-have-no-idea-who, have been discovered in Juniper firewalls and D-Link routers, both of which are US companies. In 2003, someone almost slipped a very clever backdoor into Linux. Think of how many countries’ citizens are writing software for Apple or Microsoft or Google.

We can go even farther down the rabbit hole. We have to trust the distribution systems for our hardware and software. Documents disclosed by Edward Snowden showed the National Security Agency installing backdoors into Cisco routers being shipped to the Syrian telephone company. There are fake apps in the Google Play store that eavesdrop on you. Russian hackers subverted the update mechanism of a popular brand of Ukrainian accounting software to spread the NotPetya malware.

In 2017, researchers demonstrated that a smartphone can be subverted by installing a malicious replacement screen.

I could go on. Supply-chain security is an incredibly complex problem. US-only design and manufacturing isn’t an option; the tech world is far too internationally interdependent for that. We can’t trust anyone, yet we have no choice but to trust everyone. Our phones, computers, software and cloud systems are touched by citizens of dozens of different countries, any one of whom could subvert them at the demand of their government. And just as Russia is penetrating the US power grid so they have that capability in the event of hostilities, many countries are almost certainly doing the same thing at the consumer level.

We don’t know whether the risk of Huawei and ZTE equipment is great enough to warrant the ban. We don’t know what classified intelligence the United States has, and what it implies. But we do know that this is just a minor fix for a much larger problem. It’s doubtful that this ban will have any real effect. Members of the military, and everyone else, can still buy the phones. They just can’t buy them on US military bases. And while the US might block the occasional merger or acquisition, or ban the occasional hardware or software product, we’re largely ignoring that larger issue. Solving it borders on somewhere between incredibly expensive and realistically impossible.

Perhaps someday, global norms and international treaties will render this sort of device-level tampering off-limits. But until then, all we can do is hope that this particular arms race doesn’t get too far out of control.

This essay previously appeared in the Washington Post.

Amazon Aurora Backtrack – Turn Back Time

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-aurora-backtrack-turn-back-time/

We’ve all been there! You need to make a quick, seemingly simple fix to an important production database. You compose the query, give it a once-over, and let it run. Seconds later you realize that you forgot the WHERE clause, dropped the wrong table, or made another serious mistake, and interrupt the query, but the damage has been done. You take a deep breath, whistle through your teeth, wish that reality came with an Undo option. Now what?

New Amazon Aurora Backtrack
Today I would like to tell you about the new backtrack feature for Amazon Aurora. This is as close as we can come, given present-day technology, to an Undo option for reality.

This feature can be enabled at launch time for all newly-launched Aurora database clusters. To enable it, you simply specify how far back in time you might want to rewind, and use the database as usual (this is on the Configure advanced settings page):

Aurora uses a distributed, log-structured storage system (read Design Considerations for High Throughput Cloud-Native Relational Databases to learn a lot more); each change to your database generates a new log record, identified by a Log Sequence Number (LSN). Enabling the backtrack feature provisions a FIFO buffer in the cluster for storage of LSNs. This allows for quick access and recovery times measured in seconds.

After that regrettable moment when all seems lost, you simply pause your application, open up the Aurora Console, select the cluster, and click Backtrack DB cluster:

Then you select Backtrack and choose the point in time just before your epic fail, and click Backtrack DB cluster:

Then you wait for the rewind to take place, unpause your application and proceed as if nothing had happened. When you initiate a backtrack, Aurora will pause the database, close any open connections, drop uncommitted writes, and wait for the backtrack to complete. Then it will resume normal operation and being to accept requests. The instance state will be backtracking while the rewind is underway:

The console will let you know when the backtrack is complete:

If it turns out that you went back a bit too far, you can backtrack to a later time. Other Aurora features such as cloning, backups, and restores continue to work on an instance that has been configured for backtrack.

I’m sure you can think of some creative and non-obvious use cases for this cool new feature. For example, you could use it to restore a test database after running a test that makes changes to the database. You can initiate the restoration from the API or the CLI, making it easy to integrate into your existing test framework.

Things to Know
This option applies to newly created MySQL-compatible Aurora database clusters and to MySQL-compatible clusters that have been restored from a backup. You must opt-in when you create or restore a cluster; you cannot enable it for a running cluster.

This feature is available now in all AWS Regions where Amazon Aurora runs, and you can start using it today.

Jeff;

AWS Online Tech Talks – May and Early June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-may-and-early-june-2018/

AWS Online Tech Talks – May and Early June 2018  

Join us this month to learn about some of the exciting new services and solution best practices at AWS. We also have our first re:Invent 2018 webinar series, “How to re:Invent”. Sign up now to learn more, we look forward to seeing you.

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

Analytics & Big Data

May 21, 2018 | 11:00 AM – 11:45 AM PT Integrating Amazon Elasticsearch with your DevOps Tooling – Learn how you can easily integrate Amazon Elasticsearch Service into your DevOps tooling and gain valuable insight from your log data.

May 23, 2018 | 11:00 AM – 11:45 AM PTData Warehousing and Data Lake Analytics, Together – Learn how to query data across your data warehouse and data lake without moving data.

May 24, 2018 | 11:00 AM – 11:45 AM PTData Transformation Patterns in AWS – Discover how to perform common data transformations on the AWS Data Lake.

Compute

May 29, 2018 | 01:00 PM – 01:45 PM PT – Creating and Managing a WordPress Website with Amazon Lightsail – Learn about Amazon Lightsail and how you can create, run and manage your WordPress websites with Amazon’s simple compute platform.

May 30, 2018 | 01:00 PM – 01:45 PM PTAccelerating Life Sciences with HPC on AWS – Learn how you can accelerate your Life Sciences research workloads by harnessing the power of high performance computing on AWS.

Containers

May 24, 2018 | 01:00 PM – 01:45 PM PT – Building Microservices with the 12 Factor App Pattern on AWS – Learn best practices for building containerized microservices on AWS, and how traditional software design patterns evolve in the context of containers.

Databases

May 21, 2018 | 01:00 PM – 01:45 PM PTHow to Migrate from Cassandra to Amazon DynamoDB – Get the benefits, best practices and guides on how to migrate your Cassandra databases to Amazon DynamoDB.

May 23, 2018 | 01:00 PM – 01:45 PM PT5 Hacks for Optimizing MySQL in the Cloud – Learn how to optimize your MySQL databases for high availability, performance, and disaster resilience using RDS.

DevOps

May 23, 2018 | 09:00 AM – 09:45 AM PT.NET Serverless Development on AWS – Learn how to build a modern serverless application in .NET Core 2.0.

Enterprise & Hybrid

May 22, 2018 | 11:00 AM – 11:45 AM PTHybrid Cloud Customer Use Cases on AWS – Learn how customers are leveraging AWS hybrid cloud capabilities to easily extend their datacenter capacity, deliver new services and applications, and ensure business continuity and disaster recovery.

IoT

May 31, 2018 | 11:00 AM – 11:45 AM PTUsing AWS IoT for Industrial Applications – Discover how you can quickly onboard your fleet of connected devices, keep them secure, and build predictive analytics with AWS IoT.

Machine Learning

May 22, 2018 | 09:00 AM – 09:45 AM PTUsing Apache Spark with Amazon SageMaker – Discover how to use Apache Spark with Amazon SageMaker for training jobs and application integration.

May 24, 2018 | 09:00 AM – 09:45 AM PTIntroducing AWS DeepLens – Learn how AWS DeepLens provides a new way for developers to learn machine learning by pairing the physical device with a broad set of tutorials, examples, source code, and integration with familiar AWS services.

Management Tools

May 21, 2018 | 09:00 AM – 09:45 AM PTGaining Better Observability of Your VMs with Amazon CloudWatch – Learn how CloudWatch Agent makes it easy for customers like Rackspace to monitor their VMs.

Mobile

May 29, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive on Amazon Pinpoint Segmentation and Endpoint Management – See how segmentation and endpoint management with Amazon Pinpoint can help you target the right audience.

Networking

May 31, 2018 | 09:00 AM – 09:45 AM PTMaking Private Connectivity the New Norm via AWS PrivateLink – See how PrivateLink enables service owners to offer private endpoints to customers outside their company.

Security, Identity, & Compliance

May 30, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS Certificate Manager Private Certificate Authority (CA) – Learn how AWS Certificate Manager (ACM) Private Certificate Authority (CA), a managed private CA service, helps you easily and securely manage the lifecycle of your private certificates.

June 1, 2018 | 09:00 AM – 09:45 AM PTIntroducing AWS Firewall Manager – Centrally configure and manage AWS WAF rules across your accounts and applications.

Serverless

May 22, 2018 | 01:00 PM – 01:45 PM PTBuilding API-Driven Microservices with Amazon API Gateway – Learn how to build a secure, scalable API for your application in our tech talk about API-driven microservices.

Storage

May 30, 2018 | 11:00 AM – 11:45 AM PTAccelerate Productivity by Computing at the Edge – Learn how AWS Snowball Edge support for compute instances helps accelerate data transfers, execute custom applications, and reduce overall storage costs.

June 1, 2018 | 11:00 AM – 11:45 AM PTLearn to Build a Cloud-Scale Website Powered by Amazon EFS – Technical deep dive where you’ll learn tips and tricks for integrating WordPress, Drupal and Magento with Amazon EFS.

 

 

 

 

Creating a 1.3 Million vCPU Grid on AWS using EC2 Spot Instances and TIBCO GridServer

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/creating-a-1-3-million-vcpu-grid-on-aws-using-ec2-spot-instances-and-tibco-gridserver/

Many of my colleagues are fortunate to be able to spend a good part of their day sitting down with and listening to our customers, doing their best to understand ways that we can better meet their business and technology needs. This information is treated with extreme care and is used to drive the roadmap for new services and new features.

AWS customers in the financial services industry (often abbreviated as FSI) are looking ahead to the Fundamental Review of Trading Book (FRTB) regulations that will come in to effect between 2019 and 2021. Among other things, these regulations mandate a new approach to the “value at risk” calculations that each financial institution must perform in the four hour time window after trading ends in New York and begins in Tokyo. Today, our customers report this mission-critical calculation consumes on the order of 200,000 vCPUs, growing to between 400K and 800K vCPUs in order to meet the FRTB regulations. While there’s still some debate about the magnitude and frequency with which they’ll need to run this expanded calculation, the overall direction is clear.

Building a Big Grid
In order to make sure that we are ready to help our FSI customers meet these new regulations, we worked with TIBCO to set up and run a proof of concept grid in the AWS Cloud. The periodic nature of the calculation, along with the amount of processing power and storage needed to run it to completion within four hours, make it a great fit for an environment where a vast amount of cost-effective compute power is available on an on-demand basis.

Our customers are already using the TIBCO GridServer on-premises and want to use it in the cloud. This product is designed to run grids at enterprise scale. It runs apps in a virtualized fashion, and accepts requests for resources, dynamically provisioning them on an as-needed basis. The cloud version supports Amazon Linux as well as the PostgreSQL-compatible edition of Amazon Aurora.

Working together with TIBCO, we set out to create a grid that was substantially larger than the current high-end prediction of 800K vCPUs, adding a 50% safety factor and then rounding up to reach 1.3 million vCPUs (5x the size of the largest on-premises grid). With that target in mind, the account limits were raised as follows:

  • Spot Instance Limit – 120,000
  • EBS Volume Limit – 120,000
  • EBS Capacity Limit – 2 PB

If you plan to create a grid of this size, you should also bring your friendly local AWS Solutions Architect into the loop as early as possible. They will review your plans, provide you with architecture guidance, and help you to schedule your run.

Running the Grid
We hit the Go button and launched the grid, watching as it bid for and obtained Spot Instances, each of which booted, initialized, and joined the grid within two minutes. The test workload used the Strata open source analytics & market risk library from OpenGamma and was set up with their assistance.

The grid grew to 61,299 Spot Instances (1.3 million vCPUs drawn from 34 instance types spanning 3 generations of EC2 hardware) as planned, with just 1,937 instances reclaimed and automatically replaced during the run, and cost $30,000 per hour to run, at an average hourly cost of $0.078 per vCPU. If the same instances had been used in On-Demand form, the hourly cost to run the grid would have been approximately $93,000.

Despite the scale of the grid, prices for the EC2 instances did not move during the bidding process. This is due to the overall size of the AWS Cloud and the smooth price change model that we launched late last year.

To give you a sense of the compute power, we computed that this grid would have taken the #1 position on the TOP 500 supercomputer list in November 2007 by a considerable margin, and the #2 position in June 2008. Today, it would occupy position #360 on the list.

I hope that you enjoyed this AWS success story, and that it gives you an idea of the scale that you can achieve in the cloud!

Jeff;

Cryptocurrency Security Challenges

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/cryptocurrency-security-challenges/

Physical coins representing cyrptocurrencies

Most likely you’ve read the tantalizing stories of big gains from investing in cryptocurrencies. Someone who invested $1,000 into bitcoins five years ago would have over $85,000 in value now. Alternatively, someone who invested in bitcoins three months ago would have seen their investment lose 20% in value. Beyond the big price fluctuations, currency holders are possibly exposed to fraud, bad business practices, and even risk losing their holdings altogether if they are careless in keeping track of the all-important currency keys.

It’s certain that beyond the rewards and risks, cryptocurrencies are here to stay. We can’t ignore how they are changing the game for how money is handled between people and businesses.

Some Advantages of Cryptocurrency

  • Cryptocurrency is accessible to anyone.
  • Decentralization means the network operates on a user-to-user (or peer-to-peer) basis.
  • Transactions can completed for a fraction of the expense and time required to complete traditional asset transfers.
  • Transactions are digital and cannot be counterfeited or reversed arbitrarily by the sender, as with credit card charge-backs.
  • There aren’t usually transaction fees for cryptocurrency exchanges.
  • Cryptocurrency allows the cryptocurrency holder to send exactly what information is needed and no more to the merchant or recipient, even permitting anonymous transactions (for good or bad).
  • Cryptocurrency operates at the universal level and hence makes transactions easier internationally.
  • There is no other electronic cash system in which your account isn’t owned by someone else.

On top of all that, blockchain, the underlying technology behind cryptocurrencies, is already being applied to a variety of business needs and itself becoming a hot sector of the tech economy. Blockchain is bringing traceability and cost-effectiveness to supply-chain management — which also improves quality assurance in areas such as food, reducing errors and improving accounting accuracy, smart contracts that can be automatically validated, signed and enforced through a blockchain construct, the possibility of secure, online voting, and many others.

Like any new, booming marketing there are risks involved in these new currencies. Anyone venturing into this domain needs to have their eyes wide open. While the opportunities for making money are real, there are even more ways to lose money.

We’re going to cover two primary approaches to staying safe and avoiding fraud and loss when dealing with cryptocurrencies. The first is to thoroughly vet any person or company you’re dealing with to judge whether they are ethical and likely to succeed in their business segment. The second is keeping your critical cryptocurrency keys safe, which we’ll deal with in this and a subsequent post.

Caveat Emptor — Buyer Beware

The short history of cryptocurrency has already seen the demise of a number of companies that claimed to manage, mine, trade, or otherwise help their customers profit from cryptocurrency. Mt. Gox, GAW Miners, and OneCoin are just three of the many companies that disappeared with their users’ money. This is the traditional equivalent of your bank going out of business and zeroing out your checking account in the process.

That doesn’t happen with banks because of regulatory oversight. But with cryptocurrency, you need to take the time to investigate any company you use to manage or trade your currencies. How long have they been around? Who are their investors? Are they affiliated with any reputable financial institutions? What is the record of their founders and executive management? These are all important questions to consider when evaluating a company in this new space.

Would you give the keys to your house to a service or person you didn’t thoroughly know and trust? Some companies that enable you to buy and sell currencies online will routinely hold your currency keys, which gives them the ability to do anything they want with your holdings, including selling them and pocketing the proceeds if they wish.

That doesn’t mean you shouldn’t ever allow a company to keep your currency keys in escrow. It simply means that you better know with whom you’re doing business and if they’re trustworthy enough to be given that responsibility.

Keys To the Cryptocurrency Kingdom — Public and Private

If you’re an owner of cryptocurrency, you know how this all works. If you’re not, bear with me for a minute while I bring everyone up to speed.

Cryptocurrency has no physical manifestation, such as bills or coins. It exists purely as a computer record. And unlike currencies maintained by governments, such as the U.S. dollar, there is no central authority regulating its distribution and value. Cryptocurrencies use a technology called blockchain, which is a decentralized way of keeping track of transactions. There are many copies of a given blockchain, so no single central authority is needed to validate its authenticity or accuracy.

The validity of each cryptocurrency is determined by a blockchain. A blockchain is a continuously growing list of records, called “blocks”, which are linked and secured using cryptography. Blockchains by design are inherently resistant to modification of the data. They perform as an open, distributed ledger that can record transactions between two parties efficiently and in a verifiable, permanent way. A blockchain is typically managed by a peer-to-peer network collectively adhering to a protocol for validating new blocks. Once recorded, the data in any given block cannot be altered retroactively without the alteration of all subsequent blocks, which requires collusion of the network majority. On a scaled network, this level of collusion is impossible — making blockchain networks effectively immutable and trustworthy.

Blockchain process

The other element common to all cryptocurrencies is their use of public and private keys, which are stored in the currency’s wallet. A cryptocurrency wallet stores the public and private “keys” or “addresses” that can be used to receive or spend the cryptocurrency. With the private key, it is possible to write in the public ledger (blockchain), effectively spending the associated cryptocurrency. With the public key, it is possible for others to send currency to the wallet.

What is a cryptocurrency address?

Cryptocurrency “coins” can be lost if the owner loses the private keys needed to spend the currency they own. It’s as if the owner had lost a bank account number and had no way to verify their identity to the bank, or if they lost the U.S. dollars they had in their wallet. The assets are gone and unusable.

The Cryptocurrency Wallet

Given the importance of these keys, and lack of recourse if they are lost, it’s obviously very important to keep track of your keys.

If you’re being careful in choosing reputable exchanges, app developers, and other services with whom to trust your cryptocurrency, you’ve made a good start in keeping your investment secure. But if you’re careless in managing the keys to your bitcoins, ether, Litecoin, or other cryptocurrency, you might as well leave your money on a cafe tabletop and walk away.

What Are the Differences Between Hot and Cold Wallets?

Just like other numbers you might wish to keep track of — credit cards, account numbers, phone numbers, passphrases — cryptocurrency keys can be stored in a variety of ways. Those who use their currencies for day-to-day purchases most likely will want them handy in a smartphone app, hardware key, or debit card that can be used for purchases. These are called “hot” wallets. Some experts advise keeping the balances in these devices and apps to a minimal amount to avoid hacking or data loss. We typically don’t walk around with thousands of dollars in U.S. currency in our old-style wallets, so this is really a continuation of the same approach to managing spending money.

Bread mobile app screenshot

A “hot” wallet, the Bread mobile app

Some investors with large balances keep their keys in “cold” wallets, or “cold storage,” i.e. a device or location that is not connected online. If funds are needed for purchases, they can be transferred to a more easily used payment medium. Cold wallets can be hardware devices, USB drives, or even paper copies of your keys.

Trezor hardware wallet

A “cold” wallet, the Trezor hardware wallet

Ledger Nano S hardware wallet

A “cold” wallet, the Ledger Nano S

Bitcoin paper wallet

A “cold” Bitcoin paper wallet

Wallets are suited to holding one or more specific cryptocurrencies, and some people have multiple wallets for different currencies and different purposes.

A paper wallet is nothing other than a printed record of your public and private keys. Some prefer their records to be completely disconnected from the internet, and a piece of paper serves that need. Just like writing down an account password on paper, however, it’s essential to keep the paper secure to avoid giving someone the ability to freely access your funds.

How to Keep your Keys, and Cryptocurrency Secure

In a post this coming Thursday, Securing Your Cryptocurrency, we’ll discuss the best strategies for backing up your cryptocurrency so that your currencies don’t become part of the millions that have been lost. We’ll cover the common (and uncommon) approaches to backing up hot wallets, cold wallets, and using paper and metal solutions to keeping your keys safe.

In the meantime, please tell us of your experiences with cryptocurrencies — good and bad — and how you’ve dealt with the issue of cryptocurrency security.

The post Cryptocurrency Security Challenges appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Ray Ozzie’s Encryption Backdoor

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/ray_ozzies_encr.html

Last month, Wired published a long article about Ray Ozzie and his supposed new scheme for adding a backdoor in encrypted devices. It’s a weird article. It paints Ozzie’s proposal as something that “attains the impossible” and “satisfies both law enforcement and privacy purists,” when (1) it’s barely a proposal, and (2) it’s essentially the same key escrow scheme we’ve been hearing about for decades.

Basically, each device has a unique public/private key pair and a secure processor. The public key goes into the processor and the device, and is used to encrypt whatever user key encrypts the data. The private key is stored in a secure database, available to law enforcement on demand. The only other trick is that for law enforcement to use that key, they have to put the device in some sort of irreversible recovery mode, which means it can never be used again. That’s basically it.

I have no idea why anyone is talking as if this were anything new. Several cryptographers have already explained why this key escrow scheme is no better than any other key escrow scheme. The short answer is (1) we won’t be able to secure that database of backdoor keys, (2) we don’t know how to build the secure coprocessor the scheme requires, and (3) it solves none of the policy problems around the whole system. This is the typical mistake non-cryptographers make when they approach this problem: they think that the hard part is the cryptography to create the backdoor. That’s actually the easy part. The hard part is ensuring that it’s only used by the good guys, and there’s nothing in Ozzie’s proposal that addresses any of that.

I worry that this kind of thing is damaging in the long run. There should be some rule that any backdoor or key escrow proposal be a fully specified proposal, not just some cryptography and hand-waving notions about how it will be used in practice. And before it is analyzed and debated, it should have to satisfy some sort of basic security analysis. Otherwise, we’ll be swatting pseudo-proposals like this one, while those on the other side of this debate become increasingly convinced that it’s possible to design one of these things securely.

Already people are using the National Academies report on backdoors for law enforcement as evidence that engineers are developing workable and secure backdoors. Writing in Lawfare, Alan Z. Rozenshtein claims that the report — and a related New York Times story — “undermine the argument that secure third-party access systems are so implausible that it’s not even worth trying to develop them.” Susan Landau effectively corrects this misconception, but the damage is done.

Here’s the thing: it’s not hard to design and build a backdoor. What’s hard is building the systems — both technical and procedural — around them. Here’s Rob Graham:

He’s only solving the part we already know how to solve. He’s deliberately ignoring the stuff we don’t know how to solve. We know how to make backdoors, we just don’t know how to secure them.

A bunch of us cryptographers have already explained why we don’t think this sort of thing will work in the foreseeable future. We write:

Exceptional access would force Internet system developers to reverse “forward secrecy” design practices that seek to minimize the impact on user privacy when systems are breached. The complexity of today’s Internet environment, with millions of apps and globally connected services, means that new law enforcement requirements are likely to introduce unanticipated, hard to detect security flaws. Beyond these and other technical vulnerabilities, the prospect of globally deployed exceptional access systems raises difficult problems about how such an environment would be governed and how to ensure that such systems would respect human rights and the rule of law.

Finally, Matthew Green:

The reason so few of us are willing to bet on massive-scale key escrow systems is that we’ve thought about it and we don’t think it will work. We’ve looked at the threat model, the usage model, and the quality of hardware and software that exists today. Our informed opinion is that there’s no detection system for key theft, there’s no renewability system, HSMs are terrifically vulnerable (and the companies largely staffed with ex-intelligence employees), and insiders can be suborned. We’re not going to put the data of a few billion people on the line an environment where we believe with high probability that the system will fail.

EDITED TO ADD (5/14): An analysis of the proposal.

Infamous ‘Kodi Box’ Case Sees Man Pay Back Just £1 to the State

Post Syndicated from Andy original https://torrentfreak.com/infamous-kodi-box-case-sees-man-pay-back-just-1-to-the-state-180507/

In 2015, Middlesbrough-based shopkeeper Brian ‘Tomo’ Thompson shot into the headlines after being raided by police and Trading Standards in the UK.

Thompson had been selling “fully-loaded” piracy-configured Kodi boxes from his shop but didn’t think he’d done anything wrong.

“All I want to know is whether I am doing anything illegal. I know it’s a gray area but I want it in black and white,” he said.

Thompson started out with a particularly brave tone. He insisted he’d take the case to Crown Court and even to the European Court. His mission was show what was legal and what wasn’t, he said.

Very quickly, Thompson’s case took on great importance, with observers everywhere reporting on a potential David versus Goliath copyright battle for the ages. But Thompson’s case wasn’t straightforward.

The shopkeeper wasn’t charged with basic “making available” under the Copyrights, Designs and Patents Acts that would have found him guilty under the earlier BREIN v Filmspeler case. Instead, he stood accused of two offenses under section 296ZB of the Copyright, Designs and Patents Act, which deals with devices and services designed to “circumvent technological measures”.

In the end it was all moot. After entering his official ‘not guilty’ plea, last year Thompson suddenly changed his tune. He accepted the prosecution’s version of events, throwing himself at the mercy of the court with a guilty plea.

In October 2017, Teeside Crown Court heard that Thompson cost Sky around £200,000 in lost subscriptions while the shopkeeper made around £38,500 from selling the devices. But despite the fairly big numbers, Judge Peter Armstrong decided to go reasonably light on the 55-year-old, handing him an 18-month prison term, suspended for two years.

“I’ve come to the conclusion that in all the circumstances an immediate custodial sentence is not called for. But as a warning to others in future, they may not be so lucky,” the Judge said.

But things wouldn’t end there for Thompson.

In the UK, people who make money or obtain assets from criminal activity can be forced to pay back their profits, which are then confiscated by the state under the Proceeds of Crime Act (pdf). Almost anything can be taken, from straight cash to cars, jewellery and houses.

However, it appears that whatever cash Thompson earned from Kodi Box activities has long since gone.

During a Proceeds of Crime hearing reported on by Gazette Live, the Court heard that Thompson has no assets whatsoever so any confiscation order would have to be a small one.

In the end, Judge Simon Hickey decided that Thompson should forfeit a single pound, an amount that could increase if the businessman got lucky moving forward.

“If anything changes in the future, for instance if you win the lottery, it might come back,” the Judge said.

With that seeming particularly unlikely, perhaps this will be the end for Thompson. Considering the gravity and importance placed on his case, zero jail time and just a £1 to pay back will probably be acceptable to the 55-year-old and also a lesson to the authorities, who have gotten very little out of this expensive case.

Who knows, perhaps they might sum up the outcome using the same eight-letter word that Thompson can be seen half-covering in this photograph.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

YouTube Won’t Put Up With Blatant Piracy Tutorials Forever

Post Syndicated from Andy original https://torrentfreak.com/youtube-wont-put-up-with-blatant-piracy-tutorials-forever-180506/

Once upon a time, Internet users’ voices would be heard in limited circles, on platforms such as Usenet or other niche platforms.

Then, with the rise of forum platforms such as phpBB in 2000 and Invision Power Board in 2002, thriving communities could gather in public to discuss endless specialist topics, including file-sharing of course.

When dedicated piracy forums began to gain traction, it was pretty much a free-for-all. People discussed obtaining free content absolutely openly. Nothing was taboo and no one considered that there would be any repercussions. As such, moderation was limited to keeping troublemakers in check.

As the years progressed and lawsuits against both sites and services became more commonplace, most sites that weren’t actually serving illegal content began to consider their positions. Run by hobbyists, most didn’t want the hassle of a multi-million dollar lawsuit, so links to pirate content began to diminish and the more overt piracy tutorials began to disappear underground.

Those that remained in plain sight became much more considered. Tutorials on how to pirate specific Hollywood blockbusters were no longer needed, a plain general tutorial would suffice. And, as communities matured and took time to understand the implications of their actions, those without political motivations realized that drawing attention to potential criminality was neither required nor necessary.

Then YouTube and social media happened and almost overnight, no one was in charge and anyone could say whatever they liked.

In this new reality, there were no irritating moderator-type figures removing links to this and that, and nobody warning people against breaking rules that suddenly didn’t exist anymore. In essence, previously tight-knit and street-wise file-sharing and piracy communities not only became fragmented, but also chaotic.

This meant that anyone could become a leader and in some cases, this was the utopia that many had hoped for. Not only couldn’t the record labels or Hollywood tell people what to do anymore, discussion site operators couldn’t either. For those who didn’t abuse the power and for those who knew no better, this was a much-needed breath of fresh air. But, like all good things, it was unlikely to last forever.

Where most file-sharing of yesterday was carried out by hobbyist enthusiasts, many of today’s pirates are far more casual. They’re just as thirsty for content, but they don’t want to spend hours hunting for it. They want it all on a plate, at the flick of a switch, delivered to their TV with a minimum of hassle.

With online discussions increasingly seen as laborious and old-fashioned, many mainstream pirates have turned to easy-to-consume videos. In support of their Kodi media player habits, YouTube has become the educational platform of choice for millions.

As a result, there is now a long line of self-declared Kodi piracy specialists scooping up millions of views on YouTube. Their videos – which in many cases are thinly veiled advertisements for third party addons, Kodi ‘builds’, illegal IPTV services, and obscure Android APKs – are now the main way for a new generation to obtain direct advice on pirating.

Many of the videos are incredibly blatant, like the past 15 years of litigation never happened. All the lessons learned by the phpBB board operators of yesteryear, of how to achieve their goals of sharing information without getting shut down, have been long forgotten. In their place, a barrage of daily videos designed to generate clicks and affiliate revenue, no matter what the cost, no matter what the risk.

It’s pretty clear that these videos are at least partly responsible for the phenomenal uptick in Kodi and Android-based piracy over the past few years. In that respect, many lovers of free content will be eternally grateful for the service they’ve provided. But like many piracy movements over the years, people shouldn’t get too attached to them, at least in their current form.

Thanks to the devil-may-care approach of many influential YouTubers, it won’t be long before a whole new set of moderators begin flexing their muscles. While your average phpBB moderator could be reasoned with in order to get a second chance, a determined and largely faceless YouTube will eject offenders without so much as a clear explanation.

When this happens (and it’s only a question of time given the growing blatancy of many tutorials) YouTubers will not only lose their voices but their revenue streams too. While YouTube’s partner programs bring in some welcome cash, the profitable affiliate schemes touted on these channels for external products will also be under threat.

Perhaps the most surprising thing in this drama-waiting-to-happen is that many of the most popular YouTubers can hardly be considered young and naive. While some are of more tender years, most – with their undoubted skill, knowledge and work ethic – should know better for their 30 or 40 years on this planet. Yet not only do they make their names public, they feature their faces heavily in their videos too.

Still, it’s likely that it will take some big YouTube accounts to fall before YouTubers respond by shaving the sharp edges off their blatant promotion of illegal activity. And there’s little doubt that those advertising products (which is most of them) will have to do so sooner rather than later.

Just this week, YouTube made it clear that it won’t tolerate people making money from the promotion of illegal activities.

“YouTube creators may include paid endorsements as part of their content only if the product or service they are endorsing complies with our advertising policies,” YouTube told the BBC.

“We will be working with creators going forward so they better understand that in video promotions [they] must not promote dishonest activity.”

That being said, like many other players in the piracy and file-sharing space over the past 18 years, YouTubers will eventually begin to learn that not only can the smart survive, they can flourish too.

Sure, there will be people out there who’ll protest that free speech allows citizens to express themselves in a manner of their choosing. But try PM’ing that to YouTube in response to a strike, and see how that fares.

When they say you’re done, the road back is a long one.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Analyze data in Amazon DynamoDB using Amazon SageMaker for real-time prediction

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/

Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.

Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.

DynamoDB is a good fit for low-latency reads and writes, but it’s not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, it’s not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups are accessible from other services.

To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from Bank Marketing Data Set of UCI.

The solution that I describe provides the following benefits:

  • Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
  • Automatically updates your model to get real-time predictions
  • Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
  • Makes it easier for developers of all skill levels to use Amazon SageMaker

All code and data set in this post are available in this .zip file.

Solution architecture

The following diagram shows the overall architecture of the solution.

The steps that data follows through the architecture are as follows:

  1. Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3
  2. Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
  3. Amazon SageMaker renews the model artifact and update the endpoint.
  4. The converted CSV is available for ad hoc queries with Amazon Athena.
  5. Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.

Building the auto-updating model

This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.

Download sample scripts and data

Before you begin, take the following steps:

  1. Download sample scripts in this .zip file.
  2. Unzip the src.zip file.
  3. Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon ML. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own path.
  4. Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.

For this solution, the banking.csv  should be imported into a DynamoDB table.

Export a DynamoDB table

To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.

One core node(m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity to let Data Pipeline scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.

For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.

Add the script to an existing pipeline

After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:

  1. Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
  2. For Actions, choose Edit.
  3. In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.

Paste the following command into the new step after the data ­­upload step:

s3://#{myDDBRegion}.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket name>/automation_script.sh,#{output.directoryPath},#{myDDBRegion}

The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.

The bash script has two goals, converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.

Automation script: Convert JSON data to CSV with Hive

We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.

When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.

Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.

The full automation script for converting follows. Add your own bucket name and data source path in the highlighted areas.

#!/bin/bash
hive -e "
ADD jar s3://<your bucket name>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar ; 
DROP TABLE IF EXISTS blog_backup_data ;
CREATE EXTERNAL TABLE blog_backup_data (
 customer_id map<string,string>,
 age map<string,string>, job map<string,string>, 
 marital map<string,string>,education map<string,string>, 
 default map<string,string>, housing map<string,string>,
 loan map<string,string>, contact map<string,string>, 
 month map<string,string>, day_of_week map<string,string>, 
 duration map<string,string>, campaign map<string,string>,
 pdays map<string,string>, previous map<string,string>, 
 poutcome map<string,string>, emp_var_rate map<string,string>, 
 cons_price_idx map<string,string>, cons_conf_idx map<string,string>,
 euribor3m map<string,string>, nr_employed map<string,string>, 
 y map<string,string> ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '$1/';

INSERT OVERWRITE DIRECTORY 's3://<your bucket name>/<datasource path>/' 
SELECT concat( customer_id['s'],',', 
 age['n'],',', job['s'],',', 
 marital['s'],',', education['s'],',', default['s'],',', 
 housing['s'],',', loan['s'],',', contact['s'],',', 
 month['s'],',', day_of_week['s'],',', duration['n'],',', 
 campaign['n'],',',pdays['n'],',',previous['n'],',', 
 poutcome['s'],',', emp_var_rate['n'],',', cons_price_idx['n'],',',
 cons_conf_idx['n'],',', euribor3m['n'],',', nr_employed['n'],',', y['n'] ) 
FROM blog_backup_data
WHERE customer_id['s'] > 0 ; 

After creating an external table, you need to read data. You then use the INSERT OVERWRITE DIRECTORY ~ SELECT command to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.

Depending on your requirements, you can eliminate or process the columns in the SELECT clause in this step to optimize data analysis. For example, you might remove some columns that have unpredictable correlations with the target value because keeping the wrong columns might expose your model to “overfitting” during the training. In this post, customer_id  columns is removed. Overfitting can make your prediction weak. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.

Automation script: Renew the Amazon SageMaker model

After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3.  For renewing model artifact, you must create a new training job.  Training jobs can be run using the AWS SDK ( for example, Amazon SageMaker boto3 ) or the Amazon SageMaker Python SDK that can be installed with “pip install sagemaker” command as well as the AWS CLI for Amazon SageMaker described in this post.

In addition, consider how to smoothly renew your existing model without service impact, because your model is called by applications in real time. To do this, you need to create a new endpoint configuration first and update a current endpoint with the endpoint configuration that is just created.

#!/bin/bash
## Define variable 
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>" 


# Select containers image based on region.  
case "$REGION" in
"us-west-2" )
    IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
    ;;
"us-east-1" )
    IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest" 
    ;;
"us-east-2" )
    IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest" 
    ;;
"eu-west-1" )
    IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest" 
    ;;
 *)
    echo "Invalid Region Name"
    exit 1 ;  
esac

# Start training job and creating model artifact 
TRAINING_JOB_NAME=TRAIN-${DTTIME} 
S3OUTPUT="s3://<your bucket name>/model/" 
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5 
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE}  --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None"  }]'  --output-data-config S3OutputPath=${S3OUTPUT} --resource-config  InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier  

# Wait until job completed 
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME}  --region ${REGION}

# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME}  --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT}  --execution-role-arn ${ROLE}

# create a new endpoint configuration 
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker  create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME}  --production-variants  VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge

# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name  ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ $STATUS -ne "InService" ]] ;
then
    aws sagemaker  create-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}    
else
    aws sagemaker  update-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi

Grant permission

Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "sagemaker:CreateTrainingJob",
 "sagemaker:DescribeTrainingJob",
 "sagemaker:CreateModel",
 "sagemaker:CreateEndpointConfig",
 "sagemaker:DescribeEndpoint",
 "sagemaker:CreateEndpoint",
 "sagemaker:UpdateEndpoint",
 "iam:PassRole"
 ],
 "Resource": "*"
 }
 ]
}

Use real-time prediction

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.

Following, I provide a simple Python code example that queries against Amazon SageMaker endpoint URL with its name (“ServiceEndpoint”) and then uses them for real-time prediction.

=== Python sample for real-time prediction ===

#!/usr/bin/env python
import boto3
import json 

client = boto3.client('sagemaker-runtime', region_name ='<your region>' )
new_customer_info = '34,10,2,4,1,2,1,1,6,3,190,1,3,4,3,-1.7,94.055,-39.8,0.715,4991.6'
response = client.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    Body=new_customer_info, 
    ContentType='text/csv'
)
result = json.loads(response['Body'].read().decode())
print(result)
--- output(response) ---
{u'predictions': [{u'score': 0.7528127431869507, u'predicted_label': 1.0}]}

Solution summary

The solution takes the following steps:

  1. Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data.Note: You should select only meaningful attributes when you convert CSV. For example, if you judge that the “campaign” attribute is not correlated, you can eliminate this attribute from the CSV.
  2. Train the Amazon SageMaker model with the new data source.
  3. When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on “predictedScores” provided by Amazon SageMaker.
  4. If the new user subscribes your new product, your application must update the attribute “y” to the value 1 (for yes). This updated data is provided for the next model renewal as a new data source. It serves to improve the accuracy of your prediction. With each new entry, your application can become smarter and deliver better predictions.

Running ad hoc queries using Amazon Athena

Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.

With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog. 

Creating an Amazon Athena table and running it

Simply, you can create an EXTERNAL table for the CSV data on S3 in Amazon Athena Management Console.

=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
 age int, 
 job string, 
 marital string , 
 education string, 
 default string, 
 housing string, 
 loan string, 
 contact string, 
 month string, 
 day_of_week string, 
 duration int, 
 campaign int, 
 pdays int , 
 previous int , 
 poutcome string, 
 emp_var_rate double, 
 cons_price_idx double,
 cons_conf_idx double, 
 euribor3m double, 
 nr_employed double, 
 y int 
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
LOCATION 's3://<your bucket name>/<datasource path>/';

The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.

=== Sample Query ===

SELECT corr(age,y) AS correlation_age_and_target, 
 corr(duration,y) AS correlation_duration_and_target, 
 corr(campaign,y) AS correlation_campaign_and_target,
 corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y , 
 CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact 
 FROM datasource 
 ) datasource ;

Conclusion

In this post, I introduce an example of how to analyze data in DynamoDB by using table data in Amazon S3 to optimize DynamoDB table read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also present how to automate these procedures by using Data Pipeline.

You can adapt this example to your specific use case at hand, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.

 


Additional Reading

If you found this post useful, be sure to check out Serving Real-Time Machine Learning Predictions on Amazon EMR and Analyzing Data in S3 using Amazon Athena.

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”