Tag Archives: raw

Kernel 4.17 released

Post Syndicated from corbet original https://lwn.net/Articles/756373/rss

Linus has released the 4.17 kernel, which
will indeed be called “4.17”.
No, I didn’t call it 5.0, even though all the git object count
numerology was in place for that. It will happen in the not _too_
distant future, and I’m told all the release scripts on kernel.org are
ready for it, but I didn’t feel there was any real reason for it.

Headline features in this release include
improved load estimation in the CPU
BPF tracepoints
lazytime support in the XFS filesystem,
full in-kernel TLS protocol support,
histogram triggers for tracing,
mitigations for the latest Spectre variants,
and, of course, the removal of support for eight unloved processor

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed with automatic minor version upgrades, backups, encryption, and fail-over. I wrote about Neptune in detail for AWS re:Invent last year and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.

Now that Amazon Neptune is generally available there are a few changes from the preview:

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and clicking create cluster. Of course you can also launch with CloudFormation, the CLI, or the SDKs.

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.

Additional Resources

We’ve created two repos with some additional tools and examples here. You can expect continuous development on these repos as we add additional tools and examples.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune compatible CSVs for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.

Purpose Built Databases

There’s an industry trend where we’re moving more and more onto purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more we get to choose the best database for the job or even enable entirely new use-cases. Amazon Neptune is perfect for workloads where the data is highly connected across data rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on twitter to provide any feedback!


Randomly generated, thermal-printed comics

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/random-comic-strip-generation-vomit-comic-robot/

Python code creates curious, wordless comic strips at random, spewing them from the thermal printer mouth of a laser-cut body reminiscent of Disney Pixar’s WALL-E: meet the Vomit Comic Robot!

The age of the thermal printer!

Thermal printers allow you to instantly print photos, data, and text using a few lines of code, with no need for ink. More and more makers are using this handy, low-maintenance bit of kit for truly creative projects, from Pierre Muth’s tiny PolaPi-Zero camera to the sound-printing Waves project by Eunice Lee, Matthew Zhang, and Bomani McClendon (and our own Secret Santa Babbage).

Vomiting robots

Interaction designer and developer Cadin Batrack, whose background is in game design and interactivity, has built the Vomit Comic Robot, which creates “one-of-a-kind comics on demand by processing hand-drawn images through a custom software algorithm.”

The robot is made up of a Raspberry Pi 3, a USB thermal printer, and a handful of LEDs.

Comic Vomit Robot Cadin Batrack's Raspberry Pi comic-generating thermal printer machine

At the press of a button, Processing code selects one of a set of Cadin’s hand-drawn empty comic grids and then randomly picks images from a library to fill in the gaps.

Vomit Comic Robot Cadin Batrack's Raspberry Pi comic-generating thermal printer machine

Each image is associated with data that allows the code to fit it correctly into the available panels. Cadin says about the concept behing his build:

Although images are selected and placed randomly, the comic panel format suggests relationships between elements. Our minds create a story where there is none in an attempt to explain visuals created by a non-intelligent machine.

The Raspberry Pi saves the final image as a high-resolution PNG file (so that Cadin can sell prints on thick paper via Etsy), and a Python script sends it to be vomited up by the thermal printer.

Comic Vomit Robot Cadin Batrack's Raspberry Pi comic-generating thermal printer machine

For more about the Vomit Comic Robot, check out Cadin’s blog. If you want to recreate it, you can find the info you need in the Imgur album he has put together.

We ❤ cute robots

We have a soft spot for cute robots here at Pi Towers, and of course we make no exception for the Vomit Comic Robot. If, like us, you’re a fan of adorable bots, check out Mira, the tiny interactive robot by Alonso Martinez, and Peeqo, the GIF bot by Abhishek Singh.

Mira Alfonso Martinez Raspberry Pi

The post Randomly generated, thermal-printed comics appeared first on Raspberry Pi.

[$] Stratis: Easy local storage management for Linux

Post Syndicated from jake original https://lwn.net/Articles/755454/rss

Stratis is a new local
storage-management solution for Linux. It can be compared to
ZFS, Btrfs, or LVM. Its focus is on simplicity of concepts and ease of use,
while giving users access to advanced storage features. Internally,
Stratis’s implementation favors tight integration of existing
components instead of the fully-integrated, in-kernel approach that ZFS and
Btrfs use. This has benefits and drawbacks for Stratis, but also greatly
decreases the overall time needed to develop a useful and stable initial
version, which can then be a base for further improvement in later
versions. Subscribers can read on for an introduction to Stratis, by guest
author (and Stratis team lead at Red Hat) Andy Grover.

Enchanting images with Inky Lines, a Pi‑powered polargraph

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/enchanting-images-inky-lines-pi-powered-polargraph/

A hanging plotter, also known as a polar plotter or polargraph, is a machine for drawing images on a vertical surface. It does so by using motors to control the length of two cords that form a V shape, supporting a pen where they meet. We’ve featured one on this blog before: Norbert “HomoFaciens” Heinz’s video is a wonderfully clear introduction to how a polargraph works and what you have to consider when you’re putting one together.

Today, we look at Inky Lines, by John Proudlock. With it, John is creating a series of captivating and beautiful pieces, and with his most recent work, each rendering of an image is unique.

The Inky Lines plotter draws a flock of seagulls in blue ink on white paper. The print head is suspended near the bottom left corner of the image, as the pen inks the wing of a gull

An evolving project

The project isn’t new – John has been working on it for at least a couple of years – but it is constantly evolving. When we first spotted it, John had just implemented code to allow the plotter to produce mesmeric, spiralling patterns.

A blue spiral pattern featuring overlapping "bubbles"
A dense pink spiral pattern, featuring concentric circles and reminiscent of a mandala
A blue spirograph-type pattern formed of large overlapping squares, each offset from its neighbour by a few degrees, producing a four-spiral-armed "galaxy" shape where lines overlap. The plotter's print head is visible in a corner of the image

But we’re skipping ahead. Let’s go back to the beginning.

From pixels to motor movements

John starts by providing an image, usually no more than 100 pixels wide, to a Raspberry Pi. Custom software that he wrote evaluates the darkness of each pixel and selects a pattern of a suitable density to represent it.

The two cords supporting the plotter’s pen are wound around the shafts of two stepper motors, such that the movement of the motors controls the length of the cords: the program next calculates how much each motor must move in order to produce the pattern. The Raspberry Pi passes corresponding instructions to two motor circuits, which transform the signals to a higher voltage and pass them to the stepper motors. These turn by very precise amounts, winding or unwinding the cords and, very slowly, dragging the pen across the paper.

A Raspberry Pi in a case, with a wide flex connected to a GPIO header
The Inky Lines plotter's print head, featuring cardboard and tape, draws an apparently random squiggle
A large area of apparently random pattern drawn by the plotter

John explains,

Suspended in-between the two motors is a print head, made out of a new 3-d modelling material I’ve been prototyping called cardboard. An old coat hanger and some velcro were also used.

(He’s our kind of maker.)

Unique images

The earlier drawings that John made used a repeatable method to render image files as lines on paper. That is, if the machine drew the same image a number of times, each copy would be identical. More recently, though, he has been using a method that yields random movements of the pen:

The pen point is guided around the image, but moves to each new point entirely at random. Up close this looks like a chaotic squiggle, but from a distance of a couple of meters, the human eye (and brain) make order from the chaos and view an infinite number of shades and a smoother, less mechanical image.

An apparently chaotic squiggle

This method means that no matter how many times the polargraph repeats the same image, each copy will be unique.

A gallery of work

Inky Lines’ website and its Instagram feed offer a collection of wonderful pieces John has drawn with his polargraph, and he discusses the different techniques and types of image that he is exploring.

A 3 x 3 grid of varied and colourful images from inkylinespolargraph's Instagram feed

They range from holiday photographs, processed to extract particular features and rendered in silhouette, to portraits, made with a single continuous line that can be several hundred metres long, to generative images spirograph images like those pictured above, created by an algorithm rather than rendered from a source image.

The post Enchanting images with Inky Lines, a Pi‑powered polargraph appeared first on Raspberry Pi.

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/755076/rss

Security updates have been issued by Arch Linux (lib32-curl, lib32-libcurl-compat, lib32-libcurl-gnutls, libcurl-compat, and libcurl-gnutls), CentOS (firefox), Debian (imagemagick), Fedora (exiv2, LibRaw, and love), Gentoo (chromium), Mageia (kernel, librelp, and miniupnpc), openSUSE (curl, enigmail, ghostscript, libvorbis, lilypond, and thunderbird), Red Hat (Red Hat OpenStack Platform director), and Ubuntu (firefox).

Security updates for Wednesday

Post Syndicated from ris original https://lwn.net/Articles/754653/rss

Security updates have been issued by CentOS (dhcp), Debian (xen), Fedora (dhcp, flac, kubernetes, leptonica, libgxps, LibRaw, matrix-synapse, mingw-LibRaw, mysql-mmm, patch, seamonkey, webkitgtk4, and xen), Mageia (389-ds-base, exempi, golang, graphite2, libpam4j, libraw, libsndfile, libtiff, perl, quassel, spring-ldap, util-linux, and wget), Oracle (dhcp and kernel), Red Hat (389-ds-base, chromium-browser, dhcp, docker-latest, firefox, kernel-alt, libvirt, qemu-kvm, redhat-vertualization-host, rh-haproxy18-haproxy, and rhvm-appliance), Scientific Linux (389-ds-base, dhcp, firefox, libvirt, and qemu-kvm), and Ubuntu (poppler).

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/754430/rss

Security updates have been issued by Debian (tiff and tiff3), Fedora (glusterfs, kernel, libgxps, LibRaw, postgresql, seamonkey, webkit2gtk3, wget, and xen), Mageia (afflib, flash-player-plugin, imagemagick, qpdf, and transmission), openSUSE (Chromium, opencv, and xen), SUSE (kernel), and Ubuntu (firefox).

Analyze Apache Parquet optimized data using Amazon Kinesis Data Firehose, Amazon Athena, and Amazon Redshift

Post Syndicated from Roy Hasson original https://aws.amazon.com/blogs/big-data/analyzing-apache-parquet-optimized-data-using-amazon-kinesis-data-firehose-amazon-athena-and-amazon-redshift/

Amazon Kinesis Data Firehose is the easiest way to capture and stream data into a data lake built on Amazon S3. This data can be anything—from AWS service logs like AWS CloudTrail log files, Amazon VPC Flow Logs, Application Load Balancer logs, and others. It can also be IoT events, game events, and much more. To efficiently query this data, a time-consuming ETL (extract, transform, and load) process is required to massage and convert the data to an optimal file format, which increases the time to insight. This situation is less than ideal, especially for real-time data that loses its value over time.

To solve this common challenge, Kinesis Data Firehose can now save data to Amazon S3 in Apache Parquet or Apache ORC format. These are optimized columnar formats that are highly recommended for best performance and cost-savings when querying data in S3. This feature directly benefits you if you use Amazon Athena, Amazon Redshift, AWS Glue, Amazon EMR, or any other big data tools that are available from the AWS Partner Network and through the open-source community.

Amazon Connect is a simple-to-use, cloud-based contact center service that makes it easy for any business to provide a great customer experience at a lower cost than common alternatives. Its open platform design enables easy integration with other systems. One of those systems is Amazon Kinesis—in particular, Kinesis Data Streams and Kinesis Data Firehose.

What’s really exciting is that you can now save events from Amazon Connect to S3 in Apache Parquet format. You can then perform analytics using Amazon Athena and Amazon Redshift Spectrum in real time, taking advantage of this key performance and cost optimization. Of course, Amazon Connect is only one example. This new capability opens the door for a great deal of opportunity, especially as organizations continue to build their data lakes.

Amazon Connect includes an array of analytics views in the Administrator dashboard. But you might want to run other types of analysis. In this post, I describe how to set up a data stream from Amazon Connect through Kinesis Data Streams and Kinesis Data Firehose and out to S3, and then perform analytics using Athena and Amazon Redshift Spectrum. I focus primarily on the Kinesis Data Firehose support for Parquet and its integration with the AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift.

Solution overview

Here is how the solution is laid out:



The following sections walk you through each of these steps to set up the pipeline.

1. Define the schema

When Kinesis Data Firehose processes incoming events and converts the data to Parquet, it needs to know which schema to apply. The reason is that many times, incoming events contain all or some of the expected fields based on which values the producers are advertising. A typical process is to normalize the schema during a batch ETL job so that you end up with a consistent schema that can easily be understood and queried. Doing this introduces latency due to the nature of the batch process. To overcome this issue, Kinesis Data Firehose requires the schema to be defined in advance.

To see the available columns and structures, see Amazon Connect Agent Event Streams. For the purpose of simplicity, I opted to make all the columns of type String rather than create the nested structures. But you can definitely do that if you want.

The simplest way to define the schema is to create a table in the Amazon Athena console. Open the Athena console, and paste the following create table statement, substituting your own S3 bucket and prefix for where your event data will be stored. A Data Catalog database is a logical container that holds the different tables that you can create. The default database name shown here should already exist. If it doesn’t, you can create it or use another database that you’ve already created.

CREATE EXTERNAL TABLE default.kfhconnectblog (
  awsaccountid string,
  agentarn string,
  currentagentsnapshot string,
  eventid string,
  eventtimestamp string,
  eventtype string,
  instancearn string,
  previousagentsnapshot string,
  version string
STORED AS parquet
LOCATION 's3://your_bucket/kfhconnectblog/'
TBLPROPERTIES ("parquet.compression"="SNAPPY")

That’s all you have to do to prepare the schema for Kinesis Data Firehose.

2. Define the data streams

Next, you need to define the Kinesis data streams that will be used to stream the Amazon Connect events.  Open the Kinesis Data Streams console and create two streams.  You can configure them with only one shard each because you don’t have a lot of data right now.

3. Define the Kinesis Data Firehose delivery stream for Parquet

Let’s configure the Data Firehose delivery stream using the data stream as the source and Amazon S3 as the output. Start by opening the Kinesis Data Firehose console and creating a new data delivery stream. Give it a name, and associate it with the Kinesis data stream that you created in Step 2.

As shown in the following screenshot, enable Record format conversion (1) and choose Apache Parquet (2). As you can see, Apache ORC is also supported. Scroll down and provide the AWS Glue Data Catalog database name (3) and table names (4) that you created in Step 1. Choose Next.

To make things easier, the output S3 bucket and prefix fields are automatically populated using the values that you defined in the LOCATION parameter of the create table statement from Step 1. Pretty cool. Additionally, you have the option to save the raw events into another location as defined in the Source record S3 backup section. Don’t forget to add a trailing forward slash “ / “ so that Data Firehose creates the date partitions inside that prefix.

On the next page, in the S3 buffer conditions section, there is a note about configuring a large buffer size. The Parquet file format is highly efficient in how it stores and compresses data. Increasing the buffer size allows you to pack more rows into each output file, which is preferred and gives you the most benefit from Parquet.

Compression using Snappy is automatically enabled for both Parquet and ORC. You can modify the compression algorithm by using the Kinesis Data Firehose API and update the OutputFormatConfiguration.

Be sure to also enable Amazon CloudWatch Logs so that you can debug any issues that you might run into.

Lastly, finalize the creation of the Firehose delivery stream, and continue on to the next section.

4. Set up the Amazon Connect contact center

After setting up the Kinesis pipeline, you now need to set up a simple contact center in Amazon Connect. The Getting Started page provides clear instructions on how to set up your environment, acquire a phone number, and create an agent to accept calls.

After setting up the contact center, in the Amazon Connect console, choose your Instance Alias, and then choose Data Streaming. Under Agent Event, choose the Kinesis data stream that you created in Step 2, and then choose Save.

At this point, your pipeline is complete.  Agent events from Amazon Connect are generated as agents go about their day. Events are sent via Kinesis Data Streams to Kinesis Data Firehose, which converts the event data from JSON to Parquet and stores it in S3. Athena and Amazon Redshift Spectrum can simply query the data without any additional work.

So let’s generate some data. Go back into the Administrator console for your Amazon Connect contact center, and create an agent to handle incoming calls. In this example, I creatively named mine Agent One. After it is created, Agent One can get to work and log into their console and set their availability to Available so that they are ready to receive calls.

To make the data a bit more interesting, I also created a second agent, Agent Two. I then made some incoming and outgoing calls and caused some failures to occur, so I now have enough data available to analyze.

5. Analyze the data with Athena

Let’s open the Athena console and run some queries. One thing you’ll notice is that when we created the schema for the dataset, we defined some of the fields as Strings even though in the documentation they were complex structures.  The reason for doing that was simply to show some of the flexibility of Athena to be able to parse JSON data. However, you can define nested structures in your table schema so that Kinesis Data Firehose applies the appropriate schema to the Parquet file.

Let’s run the first query to see which agents have logged into the system.

The query might look complex, but it’s fairly straightforward:

WITH dataset AS (
    from_iso8601_timestamp(eventtimestamp) AS event_ts,
      '$.agentstatus.name') AS current_status,
        '$.agentstatus.starttimestamp')) AS current_starttimestamp,
      '$.configuration.firstname') AS current_firstname,
      '$.configuration.lastname') AS current_lastname,
      '$.configuration.username') AS current_username,
      '$.configuration.routingprofile.defaultoutboundqueue.name') AS               current_outboundqueue,
      '$.configuration.routingprofile.inboundqueues[0].name') as current_inboundqueue,
      '$.agentstatus.name') as prev_status,
       '$.agentstatus.starttimestamp')) as prev_starttimestamp,
      '$.configuration.firstname') as prev_firstname,
      '$.configuration.lastname') as prev_lastname,
      '$.configuration.username') as prev_username,
      '$.configuration.routingprofile.defaultoutboundqueue.name') as current_outboundqueue,
      '$.configuration.routingprofile.inboundqueues[0].name') as prev_inboundqueue
  from kfhconnectblog
  where eventtype <> 'HEART_BEAT'
  current_status as status,
  current_username as username,
FROM dataset
WHERE eventtype = 'LOGIN' AND current_username <> ''
ORDER BY event_ts DESC

The query output looks something like this:

Here is another query that shows the sessions each of the agents engaged with. It tells us where they were incoming or outgoing, if they were completed, and where there were missed or failed calls.

WITH src AS (
     json_extract_scalar(currentagentsnapshot, '$.configuration.username') as username,
     cast(json_extract(currentagentsnapshot, '$.contacts') AS ARRAY(JSON)) as c,
     cast(json_extract(previousagentsnapshot, '$.contacts') AS ARRAY(JSON)) as p
  from kfhconnectblog
src2 AS (
  FROM src CROSS JOIN UNNEST (c, p) AS contacts(c_item, p_item)
dataset AS (
  json_extract_scalar(c_item, '$.contactid') as c_contactid,
  json_extract_scalar(c_item, '$.channel') as c_channel,
  json_extract_scalar(c_item, '$.initiationmethod') as c_direction,
  json_extract_scalar(c_item, '$.queue.name') as c_queue,
  json_extract_scalar(c_item, '$.state') as c_state,
  from_iso8601_timestamp(json_extract_scalar(c_item, '$.statestarttimestamp')) as c_ts,
  json_extract_scalar(p_item, '$.contactid') as p_contactid,
  json_extract_scalar(p_item, '$.channel') as p_channel,
  json_extract_scalar(p_item, '$.initiationmethod') as p_direction,
  json_extract_scalar(p_item, '$.queue.name') as p_queue,
  json_extract_scalar(p_item, '$.state') as p_state,
  from_iso8601_timestamp(json_extract_scalar(p_item, '$.statestarttimestamp')) as p_ts
FROM src2
  c_channel as channel,
  c_direction as direction,
  p_state as prev_state,
  c_state as current_state,
  c_ts as current_ts,
  c_contactid as id
FROM dataset
WHERE c_contactid = p_contactid
ORDER BY id DESC, current_ts ASC

The query output looks similar to the following:

6. Analyze the data with Amazon Redshift Spectrum

With Amazon Redshift Spectrum, you can query data directly in S3 using your existing Amazon Redshift data warehouse cluster. Because the data is already in Parquet format, Redshift Spectrum gets the same great benefits that Athena does.

Here is a simple query to show querying the same data from Amazon Redshift. Note that to do this, you need to first create an external schema in Amazon Redshift that points to the AWS Glue Data Catalog.

  json_extract_path_text(currentagentsnapshot,'agentstatus','name') AS current_status,
  json_extract_path_text(currentagentsnapshot, 'configuration','firstname') AS current_firstname,
  json_extract_path_text(currentagentsnapshot, 'configuration','lastname') AS current_lastname,
    'configuration','routingprofile','defaultoutboundqueue','name') AS current_outboundqueue,
FROM default_schema.kfhconnectblog

The following shows the query output:


In this post, I showed you how to use Kinesis Data Firehose to ingest and convert data to columnar file format, enabling real-time analysis using Athena and Amazon Redshift. This great feature enables a level of optimization in both cost and performance that you need when storing and analyzing large amounts of data. This feature is equally important if you are investing in building data lakes on AWS.


Additional Reading

If you found this post useful, be sure to check out Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight and Work with partitioned data in AWS Glue.

About the Author

Roy Hasson is a Global Business Development Manager for AWS Analytics. He works with customers around the globe to design solutions to meet their data processing, analytics and business intelligence needs. Roy is big Manchester United fan cheering his team on and hanging out with his family.




Airline Ticket Fraud

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/airline_ticket_.html

New research: “Leaving on a jet plane: the trade in fraudulently obtained airline tickets:”

Abstract: Every day, hundreds of people fly on airline tickets that have been obtained fraudulently. This crime script analysis provides an overview of the trade in these tickets, drawing on interviews with industry and law enforcement, and an analysis of an online blackmarket. Tickets are purchased by complicit travellers or resellers from the online blackmarket. Victim travellers obtain tickets from fake travel agencies or malicious insiders. Compromised credit cards used to be the main method to purchase tickets illegitimately. However, as fraud detection systems improved, offenders displaced to other methods, including compromised loyalty point accounts, phishing, and compromised business accounts. In addition to complicit and victim travellers, fraudulently obtained tickets are used for transporting mules, and for trafficking and smuggling. This research details current prevention approaches, and identifies additional interventions, aimed at the act, the actor, and the marketplace.

Blog post.

Security updates for Thursday

Post Syndicated from ris original https://lwn.net/Articles/754145/rss

Security updates have been issued by Arch Linux (freetype2, libraw, and powerdns), CentOS (389-ds-base and kernel), Debian (php5, prosody, and wavpack), Fedora (ckeditor, fftw, flac, knot-resolver, patch, perl, and perl-Dancer2), Mageia (cups, flac, graphicsmagick, libcdio, libid3tag, and nextcloud), openSUSE (apache2), Oracle (389-ds-base and kernel), Red Hat (389-ds-base and flash-plugin), Scientific Linux (389-ds-base), Slackware (firefox and wget), SUSE (xen), and Ubuntu (wget).

Security updates for Wednesday

Post Syndicated from ris original https://lwn.net/Articles/754021/rss

Security updates have been issued by Debian (kernel), Gentoo (rsync), openSUSE (Chromium), Oracle (kernel), Red Hat (kernel and kernel-rt), Scientific Linux (kernel), SUSE (kernel and php7), and Ubuntu (dpdk, libraw, linux, linux-lts-trusty, linux-snapdragon, and webkit2gtk).

Creating a 1.3 Million vCPU Grid on AWS using EC2 Spot Instances and TIBCO GridServer

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/creating-a-1-3-million-vcpu-grid-on-aws-using-ec2-spot-instances-and-tibco-gridserver/

Many of my colleagues are fortunate to be able to spend a good part of their day sitting down with and listening to our customers, doing their best to understand ways that we can better meet their business and technology needs. This information is treated with extreme care and is used to drive the roadmap for new services and new features.

AWS customers in the financial services industry (often abbreviated as FSI) are looking ahead to the Fundamental Review of Trading Book (FRTB) regulations that will come in to effect between 2019 and 2021. Among other things, these regulations mandate a new approach to the “value at risk” calculations that each financial institution must perform in the four hour time window after trading ends in New York and begins in Tokyo. Today, our customers report this mission-critical calculation consumes on the order of 200,000 vCPUs, growing to between 400K and 800K vCPUs in order to meet the FRTB regulations. While there’s still some debate about the magnitude and frequency with which they’ll need to run this expanded calculation, the overall direction is clear.

Building a Big Grid
In order to make sure that we are ready to help our FSI customers meet these new regulations, we worked with TIBCO to set up and run a proof of concept grid in the AWS Cloud. The periodic nature of the calculation, along with the amount of processing power and storage needed to run it to completion within four hours, make it a great fit for an environment where a vast amount of cost-effective compute power is available on an on-demand basis.

Our customers are already using the TIBCO GridServer on-premises and want to use it in the cloud. This product is designed to run grids at enterprise scale. It runs apps in a virtualized fashion, and accepts requests for resources, dynamically provisioning them on an as-needed basis. The cloud version supports Amazon Linux as well as the PostgreSQL-compatible edition of Amazon Aurora.

Working together with TIBCO, we set out to create a grid that was substantially larger than the current high-end prediction of 800K vCPUs, adding a 50% safety factor and then rounding up to reach 1.3 million vCPUs (5x the size of the largest on-premises grid). With that target in mind, the account limits were raised as follows:

  • Spot Instance Limit – 120,000
  • EBS Volume Limit – 120,000
  • EBS Capacity Limit – 2 PB

If you plan to create a grid of this size, you should also bring your friendly local AWS Solutions Architect into the loop as early as possible. They will review your plans, provide you with architecture guidance, and help you to schedule your run.

Running the Grid
We hit the Go button and launched the grid, watching as it bid for and obtained Spot Instances, each of which booted, initialized, and joined the grid within two minutes. The test workload used the Strata open source analytics & market risk library from OpenGamma and was set up with their assistance.

The grid grew to 61,299 Spot Instances (1.3 million vCPUs drawn from 34 instance types spanning 3 generations of EC2 hardware) as planned, with just 1,937 instances reclaimed and automatically replaced during the run, and cost $30,000 per hour to run, at an average hourly cost of $0.078 per vCPU. If the same instances had been used in On-Demand form, the hourly cost to run the grid would have been approximately $93,000.

Despite the scale of the grid, prices for the EC2 instances did not move during the bidding process. This is due to the overall size of the AWS Cloud and the smooth price change model that we launched late last year.

To give you a sense of the compute power, we computed that this grid would have taken the #1 position on the TOP 500 supercomputer list in November 2007 by a considerable margin, and the #2 position in June 2008. Today, it would occupy position #360 on the list.

I hope that you enjoyed this AWS success story, and that it gives you an idea of the scale that you can achieve in the cloud!


Whimsical builds and messing things up

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/whimsical-builds-and-messing-things-up/

Today is the early May bank holiday in England and Wales, a public holiday, and while this blog rarely rests, the Pi Towers team does. So, while we take a day with our families, our friends, and/or our favourite pastimes, I thought I’d point you at a couple of features from HackSpace magazine, our monthly magazine for makers.

To my mind, they go quite well with a deckchair in the garden, the buzz of a lawnmower a few houses down, and a view of the weeds I ought to have dealt with by now, but I’m sure you’ll find your own ambience.

Make anything with pencils – HackSpace magazine

If you want a unique piece of jewellery to show your love for pencils, follow Peter Brown’s lead. Peter glued twelve pencils together in two rows of six. He then measured the size of his finger and drilled a hole between the glued pencils using a drill bit.

First off, pencils. It hadn’t occurred to me that you could make super useful stuff like a miniature crossbow and a catapult out of pencils. Not only can you do this, you can probably go ahead and do it right now: all you need is a handful of pencils, some rubber bands, some drawing pins, and a bulldog clip (or, as you might prefer, some push pins and a binder clip). The sentence that really leaps out at me here is “To keep a handful of boys aged three to eleven occupied during a family trip, Marie decided to build mini crossbows to help their target practice.” The internet hasn’t helped me find out much about Marie, but I am in awe of her.

If you haven’t wandered off to make a stationery arsenal by now, read Lucy Rogers‘ reflections on making a right mess of things. I hope you do, because I think it’d be great if more people coped better with the fact that we all, unavoidably, fail. You probably won’t really get anywhere without a few goes where you just completely muck it all up.

A ceramic mug, broken into several pieces on the floor

Never mind. We can always line a plant pot with them.
“In Pieces” by dusk-photography / CC BY

This true of everything. Wet lab work and gardening and coding and parenting. And everything. You can share your heroic failures in the comments, if you like, as well as any historic weaponry you have fashioned from the contents of your desk tidy.

The post Whimsical builds and messing things up appeared first on Raspberry Pi.

The Helium Factor and Hard Drive Failure Rates

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/helium-filled-hard-drive-failure-rates/

Seagate Enterprise Capacity 3.5 Helium HDD

In November 2013, the first commercially available helium-filled hard drive was introduced by HGST, a Western Digital subsidiary. The 6 TB drive was not only unique in being helium-filled, it was for the moment, the highest capacity hard drive available. Fast forward a little over 4 years later and 12 TB helium-filled drives are readily available, 14 TB drives can be found, and 16 TB helium-filled drives are arriving soon.

Backblaze has been purchasing and deploying helium-filled hard drives over the past year and we thought it was time to start looking at their failure rates compared to traditional air-filled drives. This post will provide an overview, then we’ll continue the comparison on a regular basis over the coming months.

The Promise and Challenge of Helium Filled Drives

We all know that helium is lighter than air — that’s why helium-filled balloons float. Inside of an air-filled hard drive there are rapidly spinning disk platters that rotate at a given speed, 7200 rpm for example. The air inside adds an appreciable amount of drag on the platters that in turn requires an appreciable amount of additional energy to spin the platters. Replacing the air inside of a hard drive with helium reduces the amount of drag, thereby reducing the amount of energy needed to spin the platters, typically by 20%.

We also know that after a few days, a helium-filled balloon sinks to the ground. This was one of the key challenges in using helium inside of a hard drive: helium escapes from most containers, even if they are well sealed. It took years for hard drive manufacturers to create containers that could contain helium while still functioning as a hard drive. This container innovation allows helium-filled drives to function at spec over the course of their lifetime.

Checking for Leaks

Three years ago, we identified SMART 22 as the attribute assigned to recording the status of helium inside of a hard drive. We have both HGST and Seagate helium-filled hard drives, but only the HGST drives currently report the SMART 22 attribute. It appears the normalized and raw values for SMART 22 currently report the same value, which starts at 100 and goes down.

To date only one HGST drive has reported a value of less than 100, with multiple readings between 94 and 99. That drive continues to perform fine, with no other errors or any correlating changes in temperature, so we are not sure whether the change in value is trying to tell us something or if it is just a wonky sensor.

Helium versus Air-Filled Hard Drives

There are several different ways to compare these two types of drives. Below we decided to use just our 8, 10, and 12 TB drives in the comparison. We did this since we have helium-filled drives in those sizes. We left out of the comparison all of the drives that are 6 TB and smaller as none of the drive models we use are helium-filled. We are open to trying different comparisons. This just seemed to be the best place to start.

Lifetime Hard Drive Failure Rates: Helium vs. Air-Filled Hard Drives table

The most obvious observation is that there seems to be little difference in the Annualized Failure Rate (AFR) based on whether they contain helium or air. One conclusion, given this evidence, is that helium doesn’t affect the AFR of hard drives versus air-filled drives. My prediction is that the helium drives will eventually prove to have a lower AFR. Why? Drive Days.

Let’s go back in time to Q1 2017 when the air-filled drives listed in the table above had a similar number of Drive Days to the current number of Drive Days for the helium drives. We find that the failure rate for the air-filled drives at the time (Q1 2017) was 1.61%. In other words, when the drives were in use a similar number of hours, the helium drives had a failure rate of 1.06% while the failure rate of the air-filled drives was 1.61%.

Helium or Air?

My hypothesis is that after normalizing the data so that the helium and air-filled drives have the same (or similar) usage (Drive Days), the helium-filled drives we use will continue to have a lower Annualized Failure Rate versus the air-filled drives we use. I expect this trend to continue for the next year at least. What side do you come down on? Will the Annualized Failure Rate for helium-filled drives be better than air-filled drives or vice-versa? Or do you think the two technologies will be eventually produce the same AFR over time? Pick a side and we’ll document the results over the next year and see where the data takes us.

The post The Helium Factor and Hard Drive Failure Rates appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Security updates for Thursday

Post Syndicated from jake original https://lwn.net/Articles/753457/rss

Security updates have been issued by CentOS (firefox, java-1.7.0-openjdk, java-1.8.0-openjdk, librelp, patch, and python-paramiko), Debian (kernel and quassel), Gentoo (chromium, hesiod, and python), openSUSE (corosync, dovecot22, libraw, patch, and squid), Oracle (java-1.7.0-openjdk), Red Hat (go-toolset-7 and go-toolset-7-golang, java-1.7.0-openjdk, and rh-php70-php), and SUSE (corosync and patch).

EC2 Fleet – Manage Thousands of On-Demand and Spot Instances with One Request

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-fleet-manage-thousands-of-on-demand-and-spot-instances-with-one-request/

EC2 Spot Fleets are really cool. You can launch a fleet of Spot Instances that spans EC2 instance types and Availability Zones without having to write custom code to discover capacity or monitor prices. You can set the target capacity (the size of the fleet) in units that are meaningful to your application and have Spot Fleet create and then maintain the fleet on your behalf. Our customers are creating Spot Fleets of all sizes. For example, one financial service customer runs Monte Carlo simulations across 10 different EC2 instance types. They routinely make requests for hundreds of thousands of vCPUs and count on Spot Fleet to give them access to massive amounts of capacity at the best possible price.

EC2 Fleet
Today we are extending and generalizing the set-it-and-forget-it model that we pioneered in Spot Fleet with EC2 Fleet, a new building block that gives you the ability to create fleets that are composed of a combination of EC2 On-Demand, Reserved, and Spot Instances with a single API call. You tell us what you need, capacity and instance-wise, and we’ll handle all the heavy lifting. We will launch, manage, monitor and scale instances as needed, without the need for scaffolding code.

You can specify the capacity of your fleet in terms of instances, vCPUs, or application-oriented units, and also indicate how much of the capacity should be fulfilled by Spot Instances. The application-oriented units allow you to specify the relative power of each EC2 instance type in a way that directly maps to the needs of your application. All three capacity specification options (instances, vCPUs, and application-oriented units) are known as weights.

I think you’ll find a number ways this feature makes managing a fleet of instances easier, and believe that you will also appreciate the team’s near-term feature roadmap of interest (more on that in a bit).

Using EC2 Fleet
There are a number of ways that you can use this feature, whether you’re running a stateless web service, a big data cluster or a continuous integration pipeline. Today I’m going to describe how you can use EC2 Fleet for genomic processing, but this is similar to workloads like risk analysis, log processing or image rendering. Modern DNA sequencers can produce multiple terabytes of raw data each day, to process that data into meaningful information in a timely fashion you need lots of processing power. I’ll be showing you how to deploy a “grid” of worker nodes that can quickly crunch through secondary analysis tasks in parallel.

Projects in genomics can use the elasticity EC2 provides to experiment and try out new pipelines on hundreds or even thousands of servers. With EC2 you can access as many cores as you need and only pay for what you use. Prior to today, you would need to use the RunInstances API or an Auto Scaling group for the On-Demand & Reserved Instance portion of your grid. To get the best price performance you’d also create and manage a Spot Fleet or multiple Spot Auto Scaling groups with different instance types if you wanted to add Spot Instances to turbo-boost your secondary analysis. Finally, to automate scaling decisions across multiple APIs and Auto Scaling groups you would need to write Lambda functions that periodically assess your grid’s progress & backlog, as well as current Spot prices – modifying your Auto Scaling Groups and Spot Fleets accordingly.

You can now replace all of this with a single EC2 Fleet, analyzing genomes at scale for as little as $1 per analysis. In my grid, each step in in the pipeline requires 1 vCPU and 4 GiB of memory, a perfect match for M4 and M5 instances with 4 GiB of memory per vCPU. I will create a fleet using M4 and M5 instances with weights that correspond to the number of vCPUs on each instance:

  • m4.16xlarge – 64 vCPUs, weight = 64
  • m5.24xlarge – 96 vCPUs, weight = 96

This is expressed in a template that looks like this:

"Overrides": [
  "InstanceType": "m4.16xlarge",
  "WeightedCapacity": 64,
  "InstanceType": "m5.24xlarge",
  "WeightedCapacity": 96,

By default, EC2 Fleet will select the most cost effective combination of instance types and Availability Zones (both specified in the template) using the current prices for the Spot Instances and public prices for the On-Demand Instances (if you specify instances for which you have matching RIs, your discounts will apply). The default mode takes weights into account to get the instances that have the lowest price per unit. So for my grid, fleet will find the instance that offers the lowest price per vCPU.

Now I can request capacity in terms of vCPUs, knowing EC2 Fleet will select the lowest cost option using only the instance types I’ve defined as acceptable. Also, I can specify how many vCPUs I want to launch using On-Demand or Reserved Instance capacity and how many vCPUs should be launched using Spot Instance capacity:

"TargetCapacitySpecification": {
	"TotalTargetCapacity": 2880,
	"OnDemandTargetCapacity": 960,
	"SpotTargetCapacity": 1920,
	"DefaultTargetCapacityType": "Spot"

The above means that I want a total of 2880 vCPUs, with 960 vCPUs fulfilled using On-Demand and 1920 using Spot. The On-Demand price per vCPU is lower for m5.24xlarge than the On-Demand price per vCPU for m4.16xlarge, so EC2 Fleet will launch 10 m5.24xlarge instances to fulfill 960 vCPUs. Based on current Spot pricing (again, on a per-vCPU basis), EC2 Fleet will choose to launch 30 m4.16xlarge instances or 20 m5.24xlarges, delivering 1920 vCPUs either way.

Putting it all together, I have a single file (fl1.json) that describes my fleet:

    "LaunchTemplateConfigs": [
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": "lt-0e8c754449b27161c",
                "Version": "1"
        "Overrides": [
          "InstanceType": "m4.16xlarge",
          "WeightedCapacity": 64,
          "InstanceType": "m5.24xlarge",
          "WeightedCapacity": 96,
    "TargetCapacitySpecification": {
        "TotalTargetCapacity": 2880,
        "OnDemandTargetCapacity": 960,
        "SpotTargetCapacity": 1920,
        "DefaultTargetCapacityType": "Spot"

I can launch my fleet with a single command:

$ aws ec2 create-fleet --cli-input-json file://home/ec2-user/fl1.json

My entire fleet is created within seconds and was built using 10 m5.24xlarge On-Demand Instances and 30 m4.16xlarge Spot Instances, since the current Spot price was 1.5¢ per vCPU for m4.16xlarge and 1.6¢ per vCPU for m5.24xlarge.

Now lets imagine my grid has crunched through its backlog and no longer needs the additional Spot Instances. I can then modify the size of my fleet by changing the target capacity in my fleet specification, like this:

    "TotalTargetCapacity": 960,

Since 960 was equal to the amount of On-Demand vCPUs I had requested, when I describe my fleet I will see all of my capacity being delivered using On-Demand capacity:

"TargetCapacitySpecification": {
	"TotalTargetCapacity": 960,
	"OnDemandTargetCapacity": 960,
	"SpotTargetCapacity": 0,
	"DefaultTargetCapacityType": "Spot"

When I no longer need my fleet I can delete it and terminate the instances in it like this:

$ aws ec2 delete-fleets --fleet-id fleet-838cf4e5-fded-4f68-acb5-8c47ee1b248a \
    "UnsuccessfulFleetDletetions": [],
    "SuccessfulFleetDeletions": [
            "CurrentFleetState": "deleted_terminating",
            "PreviousFleetState": "active",
            "FleetId": "fleet-838cf4e5-fded-4f68-acb5-8c47ee1b248a"

Earlier I described how RI discounts apply when EC2 Fleet launches instances for which you have matching RIs, so you might be wondering how else RI customers benefit from EC2 Fleet. Let’s say that I own regional RIs for M4 instances. In my EC2 Fleet I would remove m5.24xlarge and specify m4.10xlarge and m4.16xlarge. Then when EC2 Fleet creates the grid, it will quickly find M4 capacity across the sizes and AZs I’ve specified, and my RI discounts apply automatically to this usage.

In the Works
We plan to connect EC2 Fleet and EC2 Auto Scaling groups. This will let you create a single fleet that mixed instance types and Spot, Reserved and On-Demand, while also taking advantage of EC2 Auto Scaling features such as health checks and lifecycle hooks. This integration will also bring EC2 Fleet functionality to services such as Amazon ECS, Amazon EKS, and AWS Batch that build on and make use of EC2 Auto Scaling for fleet management.

Available Now
You can create and make use of EC2 Fleets today in all public AWS Regions!