Have you ever witnessed something marvellous but, by the time you get your camera out to record it, the moment has passed? Johan Link‘s Film in the Past hat-mounted camera is here to save the day!
Record the past
As 18-year-old student Johan explains, “Imagine you are walking in the street and you see a meteorite in the sky – obviously you don’t have time to take your phone to film it.” While I haven’t seen many meteorites in the sky, I have found myself wishing I’d had a camera to hand more than once in my life – usually when a friend trips over or says something ridiculous. “Fortunately after the passage of the meteorite, you just have to press a button on the hat and the camera will record the last 7 seconds”, Johan continues. “Then you can download the video from an application on your phone.”
Johan’s project, Film in the Past, consists of a Raspberry Pi 3 with USB camera attached, mounted to the peak of a baseball cap.
The camera is always on, and, at the press of a button, will save the last seven seconds of footage to the Raspberry Pi. You can then access the saved footage from an application on your smartphone. It’s a bit like the video capture function on the Xbox One or, as I like to call it, the option to record hilarious glitches during gameplay. But, unlike the Xbox One, it’s a lot easier to get the footage off the Raspberry Pi and onto your phone.
Fancy building your own? The full Python code for the project can be downloaded via GitHub, and more information can be found on Instructables and Johan’s website.
Warning: a GIF used in today’s blog contains flashing images.
Students at the University of Bremen, Germany, have built a wearable camera that records the seconds of vision lost when you blink. Augenblick uses a Raspberry Pi Zero and Camera Module alongside muscle sensors to record footage whenever you close your eyes, producing a rather disjointed film of the sights you miss out on.
Blink and you’ll miss it
The average person blinks up to five times a minute, with each blink lasting 0.5 to 0.8 seconds. These half-seconds add up to about 30 minutes a day. What sights are we losing during these minutes? That is the question asked by students Manasse Pinsuwan and René Henrich when they set out to design Augenblick.
Blinking is a highly invasive mechanism for our eyesight. Every day we close our eyes thousands of times without noticing it. Our mind manages to never let us wonder what exactly happens in the moments that we miss.
Capturing lost moments
For Augenblick, the wearer sticks MyoWare Muscle Sensor pads to their face, and these detect the electrical impulses that trigger blinking.
Two pads are applied over the orbicularis oculi muscle that forms a ring around the eye socket, while the third pad is attached to the cheek as a neutral point.
Biology fact: there are two muscles responsible for blinking. The orbicularis oculi muscle closes the eye, while the levator palpebrae superioris muscle opens it — and yes, they both sound like the names of Harry Potter spells.
The sensor is read 25 times a second. Whenever it detects that the orbicularis oculi is active, the Camera Module records video footage.
Pressing a button on the side of the Augenblick glasses set the code running. An LED lights up whenever the camera is recording and also serves to confirm the correct placement of the sensor pads.
The Pi Zero saves the footage so that it can be stitched together later to form a continuous, if disjointed, film.
Learn more about the Augenblick blink camera
You can find more information on the conception, design, and build process of Augenblickhere in German, with a shorter explanation including lots of photos here in English.
And if you’re keen to recreate this project, our free project resource for a wearable Pi Zero time-lapse camera will come in handy as a starting point.
Students taking Design of Mechatronics at the Technical University of Denmark have created some seriously elegant and striking Raspberry Pi speakers. Their builds are part of a project asking them to “explore, design and build a 3D printed speaker, around readily available electronics and components”.
The students have been uploading their designs, incorporating Raspberry Pis and HiFiBerry HATs, to Thingiverse throughout April. The task is a collaboration with luxury brand Bang & Olufsen’s Create initiative, and the results wouldn’t look out of place in a high-end showroom; I’d happily take any of these home.
Søren Qvist’s wall-mounted kitchen sphere uses 3D-printed and laser-cut parts, along with the HiFiBerry HAT and B&O speakers to create a sleek-looking design.
Otto Ømann’s group have designed the Hex One – a work-in-progress wireless 360° speaker. A particular objective for their project is to create a speaker using as many 3D-printed parts as possible.
“The design is supposed to resemble that of a B&O speaker, and from a handful of categories we chose to create a portable and wearable speaker,” explain Gustav Larsen and his team.
Oliver Repholtz Behrens and team have housed a Raspberry Pi and HiFiBerry HAT inside this this stylish airplay speaker. You can follow their design progress on their team blog.
Tue Thomsen’s six-person team Mechatastic have produced the B&O TILE. “The speaker consists of four 3D-printed cabinet and top parts, where the top should be covered by fabric,” they explain. “The speaker insides consists of laser-cut wood to hold the tweeter and driver and encase the Raspberry Pi.”
The team aimed to design a speaker that would be at home in a kitchen. With a removable upper casing allowing for a choice of colour, the TILE can be customised to fit particular tastes and colour schemes.
Build your own speakers with Raspberry Pis
Raspberry Pi’s onboard audio jack, along with third-party HATs such as the HiFiBerry and Pimoroni Speaker pHAT, make speaker design and fabrication with the Pi an interesting alternative to pre-made tech. These builds don’t tend to be technically complex, and they provide some lovely examples of tech-based projects that reflect makers’ own particular aesthetic style.
If you have access to a 3D printer or a laser cutter, perhaps at a nearby maker space, then those can be excellent resources, but fancy kit isn’t a requirement. Basic joinery and crafting with card or paper are just a couple of ways you can build things that are all your own, using familiar tools and materials. We think more people would enjoy getting hands-on with this sort of thing if they gave it a whirl, and we publish a free magazine to help.
Looking for a new project to build around the Raspberry Pi Zero, I came across the pHAT DAC from Pimoroni. This little add-on board adds audio playback capabilities to the Pi Zero. Because the pHAT uses the GPIO pins, the USB OTG port remains available for a wifi dongle.
This video by Frederick Vandenbosch is a great example of building AirPlay speakers using a Pi and HAT, and a quick search will find you lots more relevant tutorials and ideas.
Have you built your own? Share your speaker-based Pi builds with us in the comments.
This blog post was co-authored by Ujjwal Ratan, a senior AI/ML solutions architect on the global life sciences team.
Healthcare data is generated at an ever-increasing rate and is predicted to reach 35 zettabytes by 2020. Being able to cost-effectively and securely manage this data whether for patient care, research or legal reasons is increasingly important for healthcare providers.
Healthcare providers must have the ability to ingest, store and protect large volumes of data including clinical, genomic, device, financial, supply chain, and claims. AWS is well-suited to this data deluge with a wide variety of ingestion, storage and security services (e.g. AWS Direct Connect, Amazon Kinesis Streams, Amazon S3, Amazon Macie) for customers to handle their healthcare data. In a recent Healthcare IT News article, healthcare thought-leader, John Halamka, noted, “I predict that five years from now none of us will have datacenters. We’re going to go out to the cloud to find EHRs, clinical decision support, analytics.”
I realize simply storing this data is challenging enough. Magnifying the problem is the fact that healthcare data is increasingly attractive to cyber attackers, making security a top priority. According to Mariya Yao in her Forbes column, it is estimated that individual medical records can be worth hundreds or even thousands of dollars on the black market.
In this first of a 2-part post, I will address the value that AWS can bring to customers for ingesting, storing and protecting provider’s healthcare data. I will describe key components of any cloud-based healthcare workload and the services AWS provides to meet these requirements. In part 2 of this post we will dive deep into the AWS services used for advanced analytics, artificial intelligence and machine learning.
The data tsunami is upon us
So where is this data coming from? In addition to the ubiquitous electronic health record (EHR), the sources of this data include:
genomic sequencers
devices such as MRIs, x-rays and ultrasounds
sensors and wearables for patients
medical equipment telemetry
mobile applications
Additional sources of data come from non-clinical, operational systems such as:
human resources
finance
supply chain
claims and billing
Data from these sources can be structured (e.g., claims data) as well as unstructured (e.g., clinician notes). Some data comes across in streams such as that taken from patient monitors, while some comes in batch form. Still other data comes in near-real time such as HL7 messages. All of this data has retention policies dictating how long it must be stored. Much of this data is stored in perpetuity as many systems in use today have no purge mechanism. AWS has services to manage all these data types as well as their retention, security and access policies.
Imaging is a significant contributor to this data tsunami. Increasing demand for early-stage diagnoses along with aging populations drive increasing demand for images from CT, PET, MRI, ultrasound, digital pathology, X-ray and fluoroscopy. For example, a thin-slice CT image can be hundreds of megabytes. Increasing demand and strict retention policies make storage costly.
Due to the plummeting cost of gene sequencing, molecular diagnostics (including liquid biopsy) is a large contributor to this data deluge. Many predict that as the value of molecular testing becomes more identifiable, the reimbursement models will change and it will increasingly become the standard of care. According to the Washington Post article “Sequencing the Genome Creates so Much Data We Don’t Know What to do with It,”
“Some researchers predict that up to one billion people will have their genome sequenced by 2025 generating up to 40 exabytes of data per year.”
Although genomics is primarily used for oncology diagnostics today, it’s also used for other purposes, pharmacogenomics — used to understand how an individual will metabolize a medication.
Reference Architecture
It is increasingly challenging for the typical hospital, clinic or physician practice to securely store, process and manage this data without cloud adoption.
Amazon has a variety of ingestion techniques depending on the nature of the data including size, frequency and structure. AWS Snowball and AWS Snowmachine are appropriate for extremely-large, secure data transfers whether one time or episodic. AWS Glue is a fully-managed ETL service for securely moving data from on-premise to AWS and Amazon Kinesis can be used for ingesting streaming data.
Amazon S3, Amazon S3 IA, and Amazon Glacier are economical, data-storage services with a pay-as-you-go pricing model that expand (or shrink) with the customer’s requirements.
The above architecture has four distinct components – ingestion, storage, security, and analytics. In this post I will dive deeper into the first three components, namely ingestion, storage and security. In part 2, I will look at how to use AWS’ analytics services to draw value on, and optimize, your healthcare data.
Ingestion
A typical provider data center will consist of many systems with varied datasets. AWS provides multiple tools and services to effectively and securely connect to these data sources and ingest data in various formats. The customers can choose from a range of services and use them in accordance with the use case.
For use cases involving one-time (or periodic), very large data migrations into AWS, customers can take advantage of AWS Snowball devices. These devices come in two sizes, 50 TB and 80 TB and can be combined together to create a petabyte scale data transfer solution.
The devices are easy to connect and load and they are shipped to AWS avoiding the network bottlenecks associated with such large-scale data migrations. The devices are extremely secure supporting 256-bit encryption and come in a tamper-resistant enclosure. AWS Snowball imports data in Amazon S3 which can then interface with other AWS compute services to process that data in a scalable manner.
For use cases involving a need to store a portion of datasets on premises for active use and offload the rest on AWS, the Amazon storage gateway service can be used. The service allows you to seamlessly integrate on premises applications via standard storage protocols like iSCSI or NFS mounted on a gateway appliance. It supports a file interface, a volume interface and a tape interface which can be utilized for a range of use cases like disaster recovery, backup and archiving, cloud bursting, storage tiering and migration.
The AWS Storage Gateway appliance can use the AWS Direct Connect service to establish a dedicated network connection from the on premises data center to AWS.
Specific Industry Use Cases
By using the AWS proposed reference architecture for disaster recovery, healthcare providers can ensure their data assets are securely stored on the cloud and are easily accessible in the event of a disaster. The “AWS Disaster Recovery” whitepaper includes details on options available to customers based on their desired recovery time objective (RTO) and recovery point objective (RPO).
AWS is an ideal destination for offloading large volumes of less-frequently-accessed data. These datasets are rarely used in active compute operations but are exceedingly important to retain for reasons like compliance. By storing these datasets on AWS, customers can take advantage of the highly-durable platform to securely store their data and also retrieve them easily when they need to. For more details on how AWS enables customers to run back and archival use cases on AWS, please refer to the following set of whitepapers.
A healthcare provider may have a variety of databases spread throughout the hospital system supporting critical applications such as EHR, PACS, finance and many more. These datasets often need to be aggregated to derive information and calculate metrics to optimize business processes. AWS Glue is a fully-managed Extract, Transform and Load (ETL) service that can read data from a JDBC-enabled, on-premise database and transfer the datasets into AWS services like Amazon S3, Amazon Redshift and Amazon RDS. This allows customers to create transformation workflows that integrate smaller datasets from multiple sources and aggregates them on AWS.
Healthcare providers deal with a variety of streaming datasets which often have to be analyzed in near real time. These datasets come from a variety of sources such as sensors, messaging buses and social media, and often do not adhere to an industry standard. The Amazon Kinesis suite of services, that includes Amazon Kinesis Streams, Amazon Kinesis Firehose, and Amazon Kinesis Analytics, are the ideal set of services to accomplish the task of deriving value from streaming data.
Example: Using AWS Glue to de-identify and ingest healthcare data into S3 Let’s consider a scenario in which a provider maintains patient records in a database they want to ingest into S3. The provider also wants to de-identify the data by stripping personally- identifiable attributes and store the non-identifiable information in an S3 bucket. This bucket is different from the one that contains identifiable information. Doing this allows the healthcare provider to separate sensitive information with more restrictions set up via S3 bucket policies.
To ingest records into S3, we create a Glue job that reads from the source database using a Glue connection. The connection is also used by a Glue crawler to populate the Glue data catalog with the schema of the source database. We will use the Glue development endpoint and a zeppelin notebook server on EC2 to develop and execute the job.
Step 1: Import the necessary libraries and also set a glue context which is a wrapper on the spark context:
Step 2: Create a dataframe from the source data. I call the dataframe “readmissionsdata”. Here is what the schema would look like:
Step 3: Now select the columns that contains indentifiable information and store it in a new dataframe. Call the new dataframe “phi”.
Step 4: Non-PHI columns are stored in a separate dataframe. Call this dataframe “nonphi”.
Step 5: Write the two dataframes into two separate S3 buckets
Once successfully executed, the PHI and non-PHI attributes are stored in two separate files in two separate buckets that can be individually maintained.
Storage
In 2016, 327 healthcare providers reported a protected health information (PHI) breach, affecting 16.4m patient records[1]. There have been 342 data breaches reported in 2017 — involving 3.2 million patient records.[2]
To date, AWS has released 51 HIPAA-eligible services to help customers address security challenges and is in the process of making many more services HIPAA-eligible. These HIPAA-eligible services (along with all other AWS services) help customers build solutions that comply with HIPAA security and auditing requirements. A catalogue of HIPAA-enabled services can be found at AWS HIPAA-eligible services. It is important to note that AWS manages physical and logical access controls for the AWS boundary. However, the overall security of your workloads is a shared responsibility, where you are responsible for controlling user access to content on your AWS accounts.
AWS storage services allow you to store data efficiently while maintaining high durability and scalability. By using Amazon S3 as the central storage layer, you can take advantage of the Amazon S3 storage management features to get operational metrics on your data sets and transition them between various storage classes to save costs. By tagging objects on Amazon S3, you can build a governance layer on Amazon S3 to grant role based access to objects using Amazon IAM and Amazon S3 bucket policies.
To learn more about the Amazon S3 storage management features, see the following link.
Security
In the example above, we are storing the PHI information in a bucket named “phi.” Now, we want to protect this information to make sure its encrypted, does not have unauthorized access, and all access requests to the data are logged.
Encryption: S3 provides settings to enable default encryption on a bucket. This ensures any object in the bucket is encrypted by default.
Logging: S3 provides object level logging that can be used to capture all API calls to the object. The API calls are logged in cloudtrail for easy access and consolidation. Moreover, it also supports events to proactively alert customers of read and write operations.
Access control: Customers can use S3 bucket policies and IAM policies to restrict access to the phi bucket. It can also put a restriction to enforce multi-factor authentication on the bucket. For example, the following policy enforces multi-factor authentication on the phi bucket:
In Part 1 of this blog, we detailed the ingestion, storage, security and management of healthcare data on AWS. Stay tuned for part two where we are going to dive deep into optimizing the data for analytics and machine learning.
This summer, the Raspberry Pi Foundation is bringing you an all-new community event taking place in Cambridge, UK!
Raspberry Fields
On the weekend of Saturday 30 June and Sunday 1 July 2018, the Pi Towers team, with lots of help from our community of young people, educators, hobbyists, and tech enthusiasts, will be running Raspberry Fields, our brand-new annual festival of digital making!
It will be a chance for people of all ages and skill levels to have a go at getting creative with tech, and it will be a celebration of all that our digital makers have already learnt and achieved, whether through taking part in Code Clubs, CoderDojos, or Raspberry Jams, or through trying our resources at home.
Dive into digital making
At Raspberry Fields, you will have the chance to inspire your inner inventor! Learn about amazing projects others in the community are working on, such as cool robots and wearable technology; have a go at a variety of hands-on activities, from home automation projects to remote-controlled vehicles and more; see fascinating science- and technology-related talks and musical performances. After your visit, you’ll be excited to go home and get making!
If you’re wondering about bringing along young children or less technologically minded family members or friends, there’ll be plenty for them to enjoy — with lots of festival-themed activities such as face painting, fun performances, free giveaways, and delicious food, Raspberry Fields will have something for everyone!
Get your tickets
This two-day ticketed event will be taking place at Cambridge Junction, the city’s leading arts centre. Tickets are £5 if you are aged 16 or older, and free for everyone under 16. Get your tickets by clicking the button on the Raspberry Fields web page!
Where: Cambridge Junction, Clifton Way, Cambridge, CB1 7GX, UK When: Saturday 30 June 2018, 10:30 – 18:00 and Sunday 1 July 2018, 10:00 – 17:30
Get involved
We are currently looking for people who’d like to contribute activities, talks, or performances with digital themes to the festival. This could be something like live music, dance, or other show acts; talks; or drop-in making activities. In addition, we’re looking for artists who’d like to showcase interactive digital installations, for proud makers who are keen to exhibit their projects, and for vendors who’d like to join in. We particularly encourage young people to showcase projects they’ve created or deliver talks on their digital making journey!
Your contribution to Raspberry Fields should focus on digital making and be fun and engaging for an audience of various ages. However, it doesn’t need to be specific to Raspberry Pi. You might be keen to demonstrate a project you’ve built, do a short Q&A session on what you’ve learnt, or present something more in-depth in the auditorium; maybe you’re one of our approved resellers wanting to showcase in our market area. We’re also looking for digital makers to run drop-in activity sessions, as well as for people who’d like to be marshals with smiling faces who will ensure that everyone has a wonderful time!
If you’d like to take part in Raspberry Fields, let us know via this form, and we’ll be in touch with you soon.
A couple of weekends ago, we celebrated our sixth birthday by coordinating more than 100 simultaneous Raspberry Jam events around the world. The Big Birthday Weekend was a huge success: our fantastic community organised Jams in 40 countries, covering six continents!
We sent the Jams special birthday kits to help them celebrate in style, and a video message featuring a thank you from Philip and Eben:
To celebrate the Raspberry Pi’s sixth birthday, we coordinated Raspberry Jams all over the world to take place over the Raspberry Jam Big Birthday Weekend, 3-4 March 2018. A massive thank you to everyone who ran an event and attended.
The Raspberry Jam photo booth
I put together code for a Pi-powered photo booth which overlaid the Big Birthday Weekend logo onto photos and (optionally) tweeted them. We included an arcade button in the Jam kits so they could build one — and it seemed to be quite popular. Some Jams put great effort into housing their photo booth:
If you want to try out the photo booth software yourself, find the code on GitHub.
The great Raspberry Jam bake-off
Traditionally, in the UK, people have a cake on their birthday. And we had a few! We saw (and tasted) a great selection of Pi-themed cakes and other baked goods throughout the weekend:
Raspberry Jams everywhere
We always say that every Jam is different, but there’s a common and recognisable theme amongst them. It was great to see so many different venues around the world filling up with like-minded Pi enthusiasts, Raspberry Jam–branded banners, and Raspberry Pi balloons!
Thank you so much to all the attendees of the Ikana Jam in Krakow past Saturday! We shared fun experiences, some of them… also painful 😉 A big thank you to @Raspberry_Pi for these global celebrations! And a big thank you to @hubraum for their hospitality! #PiParty #rjam
We also had a super successful set of wearables workshops using @adafruit Circuit Playground Express boards and conductive thread at today’s @Raspberry_Pi Jam! Very popular! #PiParty
Learning how to scare the zombies in case of an apocalypse- it worked on our young learners #PiParty @worksopcollege @Raspberry_Pi https://t.co/pntEm57TJl
Being one of the two places in Kenya where the #PiParty took place, it was an amazing time spending the day with this team and getting to learn and have fun. @TaitaTavetaUni and @Raspberry_Pi thank you for your support. @TTUTechlady @mictecttu ch
The Philly & Pi #PiParty event with @Bresslergroup and @TechGirlzorg was awesome! The Scratch and Pi workshop was amazing! It was overall a great day of fun and tech!!! Thank you everyone who came out!
Thanks everyone who came out to the @Raspberry_Pi Big Birthday Jam! Special thanks to @PBFerrell @estefanniegg @pcsforme @pandafulmanda @colnels @bquentin3 couldn’t’ve put on this amazing community event without you guys!
Así terminamos el #Raspberry Jam Big Birthday Weekend #Bogota 2018 #PiParty de #RaspberryJamBogota 2018 @Raspberry_Pi Nos vemos el 7 de marzo en #ArduinoDayBogota 2018 y #RaspberryJamBogota 2018
Happy 6th birthday, @Raspberry_Pi! Greetings all the way from CEBU,PH! #PiParty #IoTCebu Thanks @CebuXGeeks X Ramos for these awesome pics. #Fablab #UPCebu
ラズパイ、6才のお誕生日会スタート in Tokyo PCNブースで、いろいろ展示とhttps://t.co/L6E7KgyNHFとIchigoJamつないだ、こどもIoTハッカソンmini体験やってます at 東京蒲田駅近 https://t.co/yHEuqXHvqe #piparty #pipartytokyo #rjam #opendataday
Personally, I managed to get to three Jams over the weekend: two run by the same people who put on the first two Jams to ever take place, and also one brand-new one! The Preston Raspberry Jam team, who usually run their event on a Monday evening, wanted to do something extra special for the birthday, so they came up with the idea of putting on a Raspberry Jam Sandwich — on the Friday and Monday around the weekend! This meant I was able to visit them on Friday, then attend the Manchester Raspberry Jam on Saturday, and finally drop by the new Jam at Worksop College on my way home on Sunday.
I’m at my first Raspberry Jam #PiParty event of the big birthday weekend! @PrestonRJam has been running for nearly 6 years and is a great place to start the celebrations!
Thanks to everyone who came to our Jam and everyone who helped out. @phoenixtogether thanks for amazing cake & hosting. Ademir you’re so cool. It was awesome to meet Craig Morley from @Raspberry_Pi too. #PiParty
Great #PiParty today at the @cotswoldjam with bloody delicious cake and lots of raspberry goodness. Great to see @ClareSutcliffe @martinohanlon playing on my new pi powered arcade build:-)
It’s @Raspberry_Pi 6th birthday and we’re celebrating by taking part in @amsterjam__! Happy Birthday Raspberry Pi, we’re so happy to be a part of the family! #PiParty
For more Jammy birthday goodness, check out the PiParty hashtag on Twitter!
The Jam makers!
A lot of preparation went into each Jam, and we really appreciate all the hard work the Jam makers put in to making these events happen, on the Big Birthday Weekend and all year round. Thanks also to all the teams that sent us a group photo:
Lots of the Jams that took place were brand-new events, so we hope to see them continue throughout 2018 and beyond, growing the Raspberry Pi community around the world and giving more people, particularly youths, the opportunity to learn digital making skills.
So many wonderful people in the @Raspberry_Pi community. Thanks to everyone at #PottonPiAndPints for a great afternoon and for everything you do to help young people learn digital making. #PiParty
Special thanks to ModMyPi for shipping the special Raspberry Jam kits all over the world!
Don’t forget to check out our Jam page to find an event near you! This is also where you can find free resources to help you get a new Jam started, and download free starter projects made especially for Jam activities. These projects are available in English, Français, Français Canadien, Nederlands, Deutsch, Italiano, and 日本語. If you’d like to help us translate more content into these and other languages, please get in touch!
PS Some of the UK Jams were postponed due to heavy snowfall, so you may find there’s a belated sixth-birthday Jam coming up where you live!
With OTON GLASS, users are able to capture text with a blink and have it read back to them in their chosen language. It’s wonderful tool for people with dyslexia or poor vision, or for travellers abroad.
I was determined to develop OTON GLASS because of my father’s dyslexia experience. In 2012, my father had a brain tumor, and developed dyslexia after his operation — the catalyst for OTON GLASS. Fortunately, he recovered fully after rehabilitation. However, many people have congenital dyslexia regardless of their health.
Assembling a team of engineers and designers, Keisuke got to work.
The OTON GLASS device includes a Raspberry Pi 3, two cameras, and an earphone. One camera on the inside of the frame tracks the user’s eyes, and when it detects the blinked trigger, the outward-facing camera captures an image of what the user is looking at. This image is then processed by the Raspberry Pi via a program that performs optical character recognition. If the Pi detects written words, it converts them to speech, which the earphone plays back for the user.
The initial prototype of OTON GLASS had a 15-second delay between capturing text and replaying audio. This was cut down to three seconds in the team’s second prototype, designed in CAD software and housed within a 3D-printed case. The makers were then able to do real-world testing of the prototype to collect feedback from dyslexic users, and continued to upgrade the device based on user opinions.
Awards buzz
OTON GLASS is on its way to public distribution this year, and is currently doing the rounds at various trade and tech shows throughout Japan. Models are also available for trial at the Japan Blind Party Association, Kobe Eye Centre, and Nippon Keihan Library. In 2016, the device was runner-up for the James Dyson Award, and it has also garnered attention at various other awards shows and in the media. We’re looking forward to getting out hands on OTON GLASS, and we can’t wait to find out where team will take this device in the future.
Big things are afoot in the world of HackSpace magazine! This month we’re running our first special issue, with wearables projects throughout the magazine. Moreover, we’re giving away our first subscription gift free to all 12-month print subscribers. Lastly, and most importantly, we’ve made the cover EXTRA SHINY!
Prepare your eyeballs — it’s HackSpace magazine issue 4!
Wearables
In this issue, we’re taking an in-depth look at wearable tech. Not Fitbits or Apple Watches — we’re talking stuff you can make yourself, from projects that take a couple of hours to put together, to the huge, inspiring builds that are bringing technology to the runway. If you like wearing clothes and you like using your brain to make things better, then you’ll love this feature.
We’re continuing our obsession with Nixie tubes, with the brilliant Time-To-Go-Clock – Trump edition. This ingenious bit of kit uses obsolete Russian electronics to count down the time until the end of the 45th president’s term in office. However, you can also program it to tell the time left to any predictable event, such as the deadline for your tax return or essay submission, or the date England gets knocked out of the World Cup.
We’re also talking to Dr Lucy Rogers — NASA alumna, Robot Wars judge, and fellow of the Institution of Mechanical Engineers — about the difference between making as a hobby and as a job, and about why we need the Guild of Makers. Plus, issue 4 has a teeny boat, the most beautiful Raspberry Pi cases you’ve ever seen, and it explores the results of what happens when you put a bunch of hardware hackers together in a French chateau — sacré bleu!
Tutorials
As always, we’ve got more how-tos than you can shake a soldering iron at. Fittingly for the current climate here in the UK, there’s a hot water monitor, which shows you how long you have before your morning shower turns cold, and an Internet of Tea project to summon a cuppa from your kettle via the web. Perhaps not so fittingly, there’s also an ESP8266 project for monitoring a solar power station online. Readers in the southern hemisphere, we’ll leave that one for you — we haven’t seen the sun here for months!
And there’s more!
We’re super happy to say that all our 12-month print subscribers have been sent an Adafruit Circuit Playground Express with this new issue:
This gadget was developed primarily with wearables in mind and comes with all sorts of in-built functionality, so subscribers can get cracking with their latest wearable project today! If you’re not a 12-month print subscriber, you’ll miss out, so subscribe here to get your magazine and your device, and let us know what you’ll make.
This column is from The MagPi issue 58. You can download a PDF of the full issue for free, or subscribe to receive the print edition through your letterbox or the digital edition on your tablet. All proceeds from the print and digital editions help the Raspberry Pi Foundation achieve our charitable goals.
Dr Lucy Rogers calls herself a Transformer. “I transform simple electronics into cool gadgets, I transform science into plain English, I transform problems into opportunities. I am also a catalyst. I am interested in everything around me, and can often see ways of putting two ideas from very different fields together into one package. If I cannot do this myself, I connect the people who can.”
Among many other projects, Dr Lucy Rogers currently focuses much of her attention on reducing the damage from space debris
It’s a pretty wide range of interests and skills for sure. But it only takes a brief look at Lucy’s résumé to realise that she means it. When she says she’s interested in everything around her, this interest reaches from electronics to engineering, wearable tech, space, robotics, and robotic dinosaurs. And she can be seen talking about all of these things across various companies’ social media, such as IBM, websites including the Women’s Engineering Society, and books, including her own.
With her bright LED boots, Lucy was one of the wonderful Pi community members invited to join us and HRH The Duke of York at St James’s Palace just over a year ago
When not attending conferences as guest speaker, tinkering with electronics, or creating engaging IoT tutorials, she can be found retrofitting Raspberry Pis into the aforementioned robotic dinosaurs at Blackgang Chine Land of Imagination, writing, and judging battling bots for the BBC’s Robot Wars.
First broadcast in the UK between 1998 and 2004, Robot Wars was revived in 2016 with a new look and new judges, including Dr Lucy Rogers. Competitors battle their home-brew robots, and Lucy, together with the other two judges, awards victories among the carnage of robotic remains
Lucy graduated from Lancaster University with a degree in Mechanical Engineering. After that, she spent seven years at Rolls-Royce Industrial Power Group as a graduate trainee before becoming a chartered engineer and earning her PhD in bubbles.
Bubbles?
“Foam formation in low‑expansion fire-fighting equipment. I investigated the equipment to determine how the bubbles were formed,” she explains. Obviously. Bubbles!
Lucy graduated from the Singularity University Graduate Studies Program in 2011, focusing on how robotics, nanotech, medicine, and various technologies can tackle the challenges facing the world
She then went on to become a fellow of the Royal Astronomical Society (RAS) in 2005 and, later, a fellow of both the Institution of Mechanical Engineers (IMechE) and British Interplanetary Society. As a member of the Association of British Science Writers, Lucy wrote It’s ONLY Rocket Science: an Introduction in Plain English.
In It’s Only Rocket Science: An Introduction in Plain English Lucy explains that ‘hard to understand’ isn’t the same as ‘impossible to understand’, and takes her readers through the journey of building a rocket, leaving Earth, and travelling the cosmos
As a standout member of the industry, and all-round fun person to be around, Lucy has quickly established herself as a valued member of the Pi community.
In 2014, with the help of Neil Ford and Andy Stanford-Clark, Lucy worked with the UK’s oldest amusement park, Blackgang Chine Land of Imagination, on the Isle of Wight, with the aim of updating its animatronic dinosaurs. The original Blackgang Chine dinosaurs had a limited range of behaviour: able to roar, move their heads, and stomp a foot in a somewhat repetitive action.
When she contacted Raspberry Pi back in the November of that same year, the team were working on more creative, varied behaviours, giving each dinosaur a new Raspberry Pi-sized brain. This later evolved into a very successful dino-hacking Raspberry Jam.
Lucy, Neil Ford, and Andy Stanford-Clark used several Raspberry Pis and Node-RED to visualise flows of events when updating the robotic dinosaurs at Blackgang Chine. They went on to create the successful WightPi Raspberry Jam event, where visitors could join in with the unique hacking opportunity.
Given her love for tinkering with tech, and a love for stand-up comedy that can be uncovered via a quick YouTube search, it’s no wonder that Lucy was asked to help judge the first round of the ‘Make us laugh’ Pioneers challenge for Raspberry Pi. Alongside comedian Bec Hill, Code Club UK director Maria Quevedo, and the face of the first challenge, Owen Daughtery, Lucy lent her expertise to help name winners in the various categories of the teens event, and offered her support to future Pioneers.
Here at Backblaze we have a lot of folks who are all about technology. With the holiday season fast approaching, you might have all of your gift buying already finished — but if not, we put together a list of things that the employees here at Backblaze are pretty excited about giving (and/or receiving) this year.
Smart Homes:
It’s no secret that having a smart home is the new hotness, and many of the items below can be used to turbocharge your home’s ascent into the future:
Raspberry Pi The holidays are all about eating pie — well why not get a pie of a different type for the DIY fan in your life! Wyze Cam An inexpensive way to keep a close eye on all your favorite people…and intruders! Snooz Have trouble falling asleep? Try this portable white noise machine. Also great for the office! Amazon Echo Dot Need a cheap way to keep track of your schedule or play music? The Echo Dot is a great entry into the smart home of your dreams! Google Wifi These little fellows make it easy to Wifi-ify your entire home, even if it’s larger than the average shoe box here in Silicon Valley. Google Wifi acts as a mesh router and seamlessly covers your whole dwelling. Have a mansion? Buy more! Google Home Like the Amazon Echo Dot, this is the Google variant. It’s more expensive (similar to the Amazon Echo) but has better sound quality and is tied into the Google ecosystem. Nest Thermostat This is a smart thermostat. What better way to score points with the in-laws than installing one of these bad boys in their home — and then making it freezing cold randomly in the middle of winter from the comfort of your couch!
Wearables:
Homes aren’t the only things that should be smart. Your body should also get the chance to be all that it can be:
Apple AirPods You’ve seen these all over the place, and the truth is they do a pretty good job of making sounds appear in your ears. Bose SoundLink Wireless Headphones If you like over-the-ear headphones, these noise canceling ones work great, are wireless and lovely. There’s no better way to ignore people this holiday season! Garmin Fenix 5 Watch This watch is all about fitness. If you enjoy fitness. This watch is the fitness watch for your fitness needs. Apple Watch The Apple Watch is a wonderful gadget that will light up any movie theater this holiday season. Nokia Steel Health Watch If you’re into mixing analogue and digital, this is a pretty neat little gadget. Fossil Smart Watch This stylish watch is a pretty neat way to dip your toe into smartwatches and activity trackers. Pebble Time Steel Smart Watch Some people call this the greatest smartwatch of all time. Those people might be named Yev. This watch is great at sending you notifications from your phone, and not needing to be charged every day. Bellissimo!
Random Goods:
A few of the holiday gift suggestions that we got were a bit off-kilter, but we do have a lot of interesting folks in the office. Hopefully, you might find some of these as interesting as they do:
Wireless Qi Charger Wireless chargers are pretty great in that you don’t have to deal with dongles. There are even kits to make your electronics “wirelessly chargeable” which is pretty great! Self-Heating Coffee Mug Love coffee? Hate lukewarm coffee? What if your coffee cup heated itself? Brilliant! Yeast Stirrer Yeast. It makes beer. And bread! Sometimes you need to stir it. What cooler way to stir your yeast than with this industrial stirrer? Toto Washlet This one is self explanatory. You know the old rhyme: happy butts, everyone’s happy!
The FDA hasapproved a pill with an embedded sensor that can report when it is swallowed. The pill transmits information to a wearable patch, which in turn transmits information to a smartphone.
Looking for the perfect Christmas gift for a beloved maker in your life? Maybe you’d like to give a relative or friend a taste of the world of coding and Raspberry Pi? Whatever you’re looking for, the Raspberry Pi Christmas shopping list will point you in the right direction.
For those getting started
Thinking about introducing someone special to the wonders of Raspberry Pi during the holidays? Although you can set up your Pi with peripherals from around your home, such as a mobile phone charger, your PC’s keyboard, and the old mouse dwelling in an office drawer, a starter kit is a nice all-in-one package for the budding coder.
You can also buy the Raspberry Pi Press’s brand-new Raspberry Pi Beginners Book, which includes a Raspberry Pi Zero W, a case, a ready-made SD card, and adapter cables.
Once you’ve presented a lucky person with their first Raspberry Pi, it’s time for them to spread their maker wings and learn some new skills.
If you’re looking for something for a confident digital maker, you can’t go wrong with adding to their arsenal of electric and electronic bits and bobs that are no doubt cluttering drawers and boxes throughout their house.
Components such as servomotors, displays, and sensors are staples of the maker world. And when it comes to jumper wires, buttons, and LEDs, one can never have enough.
You could also consider getting your person a soldering iron, some helpings hands, or small tools such as a Dremel or screwdriver set.
And to make their life a little less messy, pop it all inside a Really Useful Box…because they’re really useful.
For kit makers
While some people like to dive into making head-first and to build whatever comes to mind, others enjoy working with kits.
The Naturebytes kit allows you to record the animal visitors of your garden with the help of a camera and a motion sensor. Footage of your local badgers, birds, deer, and more will be saved to an SD card, or tweeted or emailed to you if it’s in range of WiFi.
Coretec’s Tiny 4WD is a kit for assembling a Pi Zero–powered remote-controlled robot at home. Not only is the robot adorable, building it also a great introduction to motors and wireless control.
Finally, why not help your favourite maker create their own gaming arcade using the Arcade Building Kit from The Pi Hut?
For the reader
For those who like to curl up with a good read, or spend too much of their day on public transport, a book or magazine subscription is the perfect treat.
For makers, hackers, and those interested in new technologies, our brand-new HackSpace magazine and the ever popular community magazine The MagPi are ideal. Both are available via a physical or digital subscription, and new subscribers to The MagPi also receive a free Raspberry Pi Zero W plus case.
Looking for something small to keep your loved ones occupied on Christmas morning? Or do you have to buy a Secret Santa gift for the office tech? Here are some wonderful stocking fillers to fill your boots with this season.
The Pi Hut 3D Xmas Tree: available as both a pre-soldered and a DIY version, this gadget will work with any 40-pin Raspberry Pi and allows you to create your own mini light show.
Google AIY Voice kit: build your own home assistant using a Raspberry Pi, the MagPi Essentials guide, and this brand-new kit. “Google, play Mariah Carey again…”
Pimoroni’s Raspberry Pi Zero W Project Kits offer everything you need, including the Pi, to make your own time-lapse cameras, music players, and more.
The official Raspberry Pi Sense HAT, Camera Module, and cases for the Pi 3 and Pi Zero will complete the collection of any Raspberry Pi owner, while also opening up exciting project opportunities.
LEGO Idea’s bought out this amazing ‘Women of NASA’ set, and I thought it would be fun to build, play and learn from these inspiring women! First up, let’s discover a little more about Sally Ride and Mae Jemison, two AWESOME ASTRONAUTS!
Treat the kids, and big kids, in your life to the newest LEGO Ideas set, the Women of NASA — starring Nancy Grace Roman, Margaret Hamilton, Sally Ride, and Mae Jemison!
Explore the world of wearables with Pimoroni’s sewable, hackable, wearable, adorable Bearables kits.
Add lights and motors to paper creations with the Activating Origami Kit, available from The Pi Hut.
With so many amazing kits, HATs, and books available from members of the Raspberry Pi community, it’s hard to only pick a few. Have you found something splendid for the maker in your life? Maybe you’ve created your own kit that uses the Raspberry Pi? Share your favourites with us in the comments below or via our social media accounts.
HackSpace magazine is finally here! Grab your copy of the new magazine for makers today, and try your hand at some new, exciting skills.
What is HackSpace magazine?
HackSpace magazine is the newest publication from the team behind The MagPi. Chock-full of amazing projects, tutorials, features, and maker interviews, HackSpace magazine brings together the makers of the world every month, with you — the community — providing the content.
The new magazine for the modern maker is out now! Learn more at https://hsmag.cc HackSpace magazine is the new monthly magazine for people who love to make things and those who want to learn. Grab some duct tape, fire up a microcontroller, ready a 3D printer and hack the world around you!
Inside issue 1
Fancy smoking bacon with your very own cold smoker? How about protecting your home with a mini trebuchet for your front lawn? Or maybe you’d like to learn from awesome creator Becky Stern how to get paid for making the things you love? No matter whether it’s handheld consoles, robot prosthetics, Christmas projects, or, er, duct tape — whatever your maker passion, issue 1 is guaranteed to tick your boxes!
HackSpace magazine is packed with content from every corner of the maker world: from welding to digital making, and from woodwork to wearables. And whatever you enjoy making, we want to see it! So as you read through this first issue, imagine your favourite homemade projects on our pages, then make that a reality by emailing us the details via [email protected].
Get your copy
You can grab issue 1 of HackSpace magazine right now from WHSmith, Tesco, Sainsbury’s, and independent newsagents. If you live in the US, check out your local Barnes & Noble, Fry’s, or Micro Center next week. We’re also shipping to stores in Australia, Hong Kong, Canada, Singapore, Belgium and Brazil — ask your local newsagent whether they’ll be getting HackSpace magazine. Alternatively, you can get the new issue online from our store, or digitally via our Android or iOS apps. And don’t forget, as with all our publications, a free PDF of HackSpace magazine is available from release day.
We’re also offering money-saving subscriptions — find details on the the magazine website. And if you’re a subscriber of The MagPi, your free copy of HackSpace magazine is on its way, with details of a super 50% discount on subscriptions! Could this be the Christmas gift you didn’t know you wanted?
Share your makes and thoughts
Make sure to follow HackSpace magazine on Facebook and Twitter, or email the team at [email protected] to tell us about your projects and share your thoughts about issue 1. We’ve loved creating this new magazine for the maker community, and we hope you enjoy it as much as we do.
This post courtesy of Aaron Friedman, Healthcare and Life Sciences Partner Solutions Architect, AWS and Angel Pizarro, Genomics and Life Sciences Senior Solutions Architect, AWS
Precision medicine is tailored to individuals based on quantitative signatures, including genomics, lifestyle, and environment. It is often considered to be the driving force behind the next wave of human health. Through new initiatives and technologies such as population-scale genomics sequencing and IoT-backed wearables, researchers and clinicians in both commercial and public sectors are gaining new, previously inaccessible insights.
Many of these precision medicine initiatives are already happening on AWS. A few of these include:
PrecisionFDA – This initiative is led by the US Food and Drug Administration. The goal is to define the next-generation standard of care for genomics in precision medicine.
Deloitte ConvergeHEALTH – Gives healthcare and life sciences organizations the ability to analyze their disparate datasets on a singular real world evidence platform.
Central to many of these initiatives is genomics, which gives healthcare organizations the ability to establish a baseline for longitudinal studies. Due to its wide applicability in precision medicine initiatives—from rare disease diagnosis to improving outcomes of clinical trials—genomics data is growing at a larger rate than Moore’s law across the globe. Many expect these datasets to grow to be in the range of tens of exabytes by 2025.
Genomics data is also regularly re-analyzed by the community as researchers develop new computational methods or compare older data with newer genome references. These trends are driving innovations in data analysis methods and algorithms to address the massive increase of computational requirements.
Edico Genome, an AWS Partner Network (APN) Partner, has developed a novel solution that accelerates genomics analysis using field-programmable gate arrays, or FPGAs. Historically, Edico Genome deployed their FPGA appliances on-premises. When AWS announced the Amazon EC2 F1 PGA-based instance family in December 2016, Edico Genome adopted a cloud-first strategy, became a F1 launch partner, and was one of the first partners to deploy FPGA-enabled applications on AWS.
On October 19, 2017, Edico Genome partnered with the Children’s Hospital of Philadelphia (CHOP) to demonstrate their FPGA-accelerated genomic pipeline software, called DRAGEN. It can significantly reduce time-to-insight for patient genomes, and analyzed 1,000 genomes from the Center for Applied Genomics Biobank in the shortest time possible. This set a Guinness World Record for the fastest analysis of 1000 whole human genomes, and they did this using 1000 EC2 f1.2xlarge instances in a single AWS region. Not only were they able to analyze genomes at high throughput, they did so averaging approximately $3 per whole human genome of AWS compute for the analysis.
The version of DRAGEN that Edico Genome used for this analysis was also the same one used in the precisionFDA Hidden Treasures – Warm Up challenge, where they were one of the top performers in every assessment.
In the remainder of this post, we walk through the architecture used by Edico Genome, combining EC2 F1 instances and AWS Batch to achieve this milestone.
EC2 F1 instances and Edico’s DRAGEN
EC2 F1 instances provide access to programmable hardware-acceleration using FPGAs at a cloud scale. AWS customers use F1 instances for a wide variety of applications, including big data, financial analytics and risk analysis, image and video processing, engineering simulations, AR/VR, and accelerated genomics. Edico Genome’s FPGA-backed DRAGEN Bio-IT Platform is now integrated with EC2 F1 instances. You can access the accuracy, speed, flexibility, and low compute cost of DRAGEN through a number of third-party platforms, AWS Marketplace, and Edico Genome’s own platform. The DRAGEN platform offers a scalable, accelerated, and cost-efficient secondary analysis solution for a wide variety of genomics applications. Edico Genome also provides a highly optimized mechanism for the efficient storage of genomic data.
Scaling DRAGEN on AWS
Edico Genome used 1,000 EC2 F1 instances to help their customer, the Children’s Hospital of Philadelphia (CHOP), to process and analyze all 1,000 whole human genomes in parallel. They used AWS Batch to provision compute resources and orchestrate DRAGEN compute jobs across the 1,000 EC2 F1 instances. This solution successfully addressed the challenge of creating a scalable genomic processing pipeline that can easily scale to thousands of engines running in parallel.
Architecture
A simplified view of the architecture used for the analysis is shown in the following diagram:
DRAGEN’s portal uses Elastic Load Balancing and Auto Scaling groups to scale out EC2 instances that submitted jobs to AWS Batch.
Job metadata is stored in their Workflow Management (WFM) database, built on top of Amazon Aurora.
The DRAGEN Workflow Manager API submits jobs to AWS Batch.
These jobs are executed on the AWS Batch managed compute environment that was responsible for launching the EC2 F1 instances.
These jobs run as Docker containers that have the requisite DRAGEN binaries for whole genome analysis.
As each job runs, it retrieves and stores genomics data that is staged in Amazon S3.
The steps listed previously can also be bucketed into the following higher-level layers:
Workflow: Edico Genome used their Workflow Management API to orchestrate the submission of AWS Batch jobs. Metadata for the jobs (such as the S3 locations of the genomes, etc.) resides in the Workflow Management Database backed by Amazon Aurora.
Batch execution: AWS Batch launches EC2 F1 instances and coordinates the execution of DRAGEN jobs on these compute resources. AWS Batch enabled Edico to quickly and easily scale up to the full number of instances they needed as jobs were submitted. They also scaled back down as each job was completed, to optimize for both cost and performance.
Compute/job: Edico Genome stored their binaries in a Docker container that AWS Batch deployed onto each of the F1 instances, giving each instance the ability to run DRAGEN without the need to pre-install the core executables. The AWS based DRAGEN solution streams all genomics data from S3 for local computation and then writes the results to a destination bucket. They used an AWS Batch job role that specified the IAM permissions. The role ensured that DRAGEN only had access to the buckets or S3 key space it needed for the analysis. Jobs didn’t need to embed AWS credentials.
In the following sections, we dive deeper into several tasks that enabled Edico Genome’s scalable FPGA genome analysis on AWS:
Prepare your Amazon FPGA Image for AWS Batch
Create a Dockerfile and build your Docker image
Set up your AWS Batch FPGA compute environment
Prerequisites
In brief, you need a modern Linux distribution (3.10+), Amazon ECS Container Agent, awslogs driver, and Docker configured on your image. There are additional recommendations in the Compute Resource AMI specification.
Preparing your Amazon FPGA Image for AWS Batch
You can use any Amazon Machine Image (AMI) or Amazon FPGA Image (AFI) with AWS Batch, provided that it meets the Compute Resource AMI specification. This gives you the ability to customize any workload by increasing the size of root or data volumes, adding instance stores, and connecting with the FPGA (F) and GPU (G and P) instance families.
Next, install the AWS CLI:
pip install awscli
Add any additional software required to interact with the FPGAs on the F1 instances.
As a starting point, AWS publishes an FPGA Developer AMI in the AWS Marketplace. It is based on a CentOS Linux image and includes pre-integrated FPGA development tools. It also includes the runtime tools required to develop and use custom FPGAs for hardware acceleration applications.
For more information about how to set up custom AMIs for your AWS Batch managed compute environments, see Creating a Compute Resource AMI.
Building your Dockerfile
There are two common methods for connecting to AWS Batch to run FPGA-enabled algorithms. The first method, which is the route Edico Genome took, involves storing your binaries in the Docker container itself and running that on top of an F1 instance with Docker installed. The following code example is what a Dockerfile to build your container might look like for this scenario.
# DRAGEN_EXEC Docker image generator --
# Run this Dockerfile from a local directory that contains the latest release of
# - Dragen RPM and Linux DMA Driver available from Edico
# - Edico's Dragen WFMS Wrapper files
FROM centos:centos7
RUN rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# Install Basic packages needed for Dragen
RUN yum -y install \
perl \
sos \
coreutils \
gdb \
time \
systemd-libs \
bzip2-libs \
R \
ca-certificates \
ipmitool \
smartmontools \
rsync
# Install the Dragen RPM
RUN mkdir -m777 -p /var/log/dragen /var/run/dragen
ADD . /root
RUN rpm -Uvh /root/edico_driver*.rpm || true
RUN rpm -Uvh /root/dragen-aws*.rpm || true
# Auto generate the Dragen license
RUN /opt/edico/bin/dragen_lic -i auto
#########################################################
# Now install the Edico WFMS "Wrapper" functions
# Add development tools needed for some util
RUN yum groupinstall -y "Development Tools"
# Install necessary standard packages
RUN yum -y install \
dstat \
git \
python-devel \
python-pip \
time \
tree && \
pip install --upgrade pip && \
easy_install requests && \
pip install psutil && \
pip install python-dateutil && \
pip install constants && \
easy_install boto3
# Setup Python path used by the wrapper
RUN mkdir -p /opt/workflow/python/bin
RUN ln -s /usr/bin/python /opt/workflow/python/bin/python2.7
RUN ln -s /usr/bin/python /opt/workflow/python/bin/python
# Install d_haul and dragen_job_execute wrapper functions and associated packages
RUN mkdir -p /root/wfms/trunk/scheduler/scheduler
COPY scheduler/d_haul /root/wfms/trunk/scheduler/
COPY scheduler/dragen_job_execute /root/wfms/trunk/scheduler/
COPY scheduler/scheduler/aws_utils.py /root/wfms/trunk/scheduler/scheduler/
COPY scheduler/scheduler/constants.py /root/wfms/trunk/scheduler/scheduler/
COPY scheduler/scheduler/job_utils.py /root/wfms/trunk/scheduler/scheduler/
COPY scheduler/scheduler/logger.py /root/wfms/trunk/scheduler/scheduler/
COPY scheduler/scheduler/scheduler_utils.py /root/wfms/trunk/scheduler/scheduler/
COPY scheduler/scheduler/webapi.py /root/wfms/trunk/scheduler/scheduler/
COPY scheduler/scheduler/wfms_exception.py /root/wfms/trunk/scheduler/scheduler/
RUN touch /root/wfms/trunk/scheduler/scheduler/__init__.py
# Landing directory should be where DJX is located
WORKDIR "/root/wfms/trunk/scheduler/"
# Debug print of container's directories
RUN tree /root/wfms/trunk/scheduler
# Default behaviour. Over-ride with --entrypoint on docker run cmd line
ENTRYPOINT ["/root/wfms/trunk/scheduler/dragen_job_execute"]
CMD []
Note: Edico Genome’s custom Python wrapper functions for its Workflow Management System (WFMS) in the latter part of this Dockerfile should be replaced with functions that are specific to your workflow.
The second method is to install binaries and then use Docker as a lightweight connector between AWS Batch and the AFI. For example, this might be a route you would choose to use if you were provisioning DRAGEN from the AWS Marketplace.
In this case, the Dockerfile would not contain the installation of the binaries to run DRAGEN, but would contain any other packages necessary for job completion. When you run your Docker container, you enable Docker to access the underlying file system.
Connecting to AWS Batch
AWS Batch provisions compute resources and runs your jobs, choosing the right instance types based on your job requirements and scaling down resources as work is completed. AWS Batch users submit a job, based on a template or “job definition” to an AWS Batch job queue.
Job queues are mapped to one or more compute environments that describe the quantity and types of resources that AWS Batch can provision. In this case, Edico created a managed compute environment that was able to launch 1,000 EC2 F1 instances across multiple Availability Zones in us-east-1. As jobs are submitted to a job queue, the service launches the required quantity and types of instances that are needed. As instances become available, AWS Batch then runs each job within appropriately sized Docker containers.
The Edico Genome workflow manager API submits jobs to an AWS Batch job queue. This job queue maps to an AWS Batch managed compute environment containing On-Demand F1 instances. In this section, you can set this up yourself.
To create the compute environment that DRAGEN can use:
An f1.2xlarge EC2 instance contains one FPGA, eight vCPUs, and 122-GiB RAM. As DRAGEN requires an entire FPGA to run, Edico Genome needed to ensure that only one analysis per time executed on an instance. By using the f1.2xlarge vCPUs and memory as a proxy in their AWS Batch job definition, Edico Genome could ensure that only one job runs on an instance at a time. Here’s what that looks like in the AWS CLI:
You can query the status of your DRAGEN job with the following command:
aws batch describe-jobs --jobs <the job ID from the above command>
The logs for your job are written to the /aws/batch/job CloudWatch log group.
Conclusion
In this post, we demonstrated how to set up an environment with AWS Batch that can run DRAGEN on EC2 F1 instances at scale. If you followed the walkthrough, you’ve replicated much of the architecture Edico Genome used to set the Guinness World Record.
There are several ways in which you can harness the computational power of DRAGEN to analyze genomes at scale. First, DRAGEN is available through several different genomics platforms, such as the DNAnexus Platform. DRAGEN is also available on the AWS Marketplace. You can apply the architecture presented in this post to build a scalable solution that is both performant and cost-optimized.
For more information about how AWS Batch can facilitate genomics processing at scale, be sure to check out our aws-batch-genomics GitHub repo on high-throughput genomics on AWS.
Come with us on a journey to discover the 2017 Raspberry Pi Halloween projects that caught our eye, raised our hair, or sent us screaming into the night.
Happy Halloween
Whether you’re easily scared or practically unshakeable, you can celebrate Halloween with Pi projects of any level of creepiness.
Even makers of a delicate constitution will enjoy making this Code Club Ghostbusters game, or building an interactive board game using Halloween lights with this MagPi tutorial by Mike Cook. And how about a wearable, cheerily LED-enhanced pumpkin created with the help of this CoderDojo resource? Cute, no?
Speaking of wearables, Derek Woodroffe’s be-tentacled hat may writhe disconcertingly, but at least it won’t reach out for you. Although, you could make it do that, if you were a terrible person.
Slightly queasy Halloween
Your decorations don’t have to be terrifying: this carved Pumpkin Pi and the Poplawskis’ Halloween decorations are controlled remotely via the web, but they’re more likely to give you happy goosebumps than cold sweats.
The Snake Eyes Bonnet pumpkin and the monster-face projection controlled by Pis that we showed you in our Halloween Twitter round-up look fairly friendly. Even the 3D-printed jack-o’-lantern by wermy, creator of mintyPi, is kind of adorable, if you ignore the teeth. And who knows, that AlexaPi-powered talking skull that’s staring at you could be an affable fellow who just fancies a chat, right? Right?
Horror-struck Halloween
OK, fine. You’re after something properly frightening. How about the haunted magic mirror by Kapitein Haak, or this one, with added Philips Hue effects, by Ben Eagan. As if your face first thing in the morning wasn’t shocking enough.
If you find those rigid-faced, bow-lipped, plastic dolls more sinister than sweet – and you’re right to do so: they’re horrible – you won’t like this evil toy. Possessed by an unquiet shade, it’s straight out of my nightmares.
Earlier this month we covered Adafruit’s haunted portrait how-to. This build by Dominick Marino takes that concept to new, terrifying, heights.
Why not add some motion-triggered ghost projections to your Halloween setup? They’ll go nicely with the face-tracking, self-winding, hair-raising jack-in-the-box you can make thanks to Sean Hodgins’ YouTube tutorial.
And then, last of all, there’s this.
NO.
This recreation of Billy the Puppet from the Saw franchise is Pi-powered, it’s mobile, and it talks. You can remotely control it, and I am not even remotely OK with it. That being said, if you’re keen to have one of your own, be my guest. Just follow the guide on Instructables. It’s your funeral.
Make your Halloween
It’s been a great year for scary Raspberry Pi makes, and we hope you have a blast using your Pi to get into the Halloween spirit.
And speaking of spirits, Matt Reed of RedPepper has created a Pi-based ghost detector! It uses Google’s Speech Neural Network AI to listen for voices in the ether, and it’s live-streaming tonight. Perfect for watching while you’re waiting for the trick-or-treaters to show up.
Halloween: that glorious time of year when you’re officially allowed to make your friends jump out of their skin with your pranks. For those among us who enjoy dressing up, Halloween is also the occasion to go all out with costumes. And so, dear reader, we present to you: a steampunk tentacle hat, created by Derek Woodroffe.
Derek is an engineer who loves all things electronics. He’s part of Extreme Kits, and he runs the website Extreme Electronics. Raspberry Pi Zero-controlled Tesla coils are Derek’s speciality — he’s even been on one of the Royal Institution’s Christmas Lectures with them! Skip ahead to 15:06 in this video to see Derek in action:
The first Lecture from Professor Saiful Islam’s 2016 series of CHRISTMAS LECTURES, ‘Supercharged: Fuelling the future’. Watch all three Lectures here: http://richannel.org/christmas-lectures 2016 marked the 80th anniversary since the BBC first broadcast the Christmas Lectures on TV. To celebrate, chemist Professor Saiful Islam explores a subject that the lectures’ founder – Michael Faraday – addressed in the very first Christmas Lectures – energy.
Wearables
Wearables are electronically augmented items you can wear. They might take the form of spy eyeglasses, clothes with integrated sensors, or, in this case, headgear adorned with mechanised tentacles.
Why did Derek make this? We’re not entirely sure, but we suspect he’s a fan of the Cthulu mythos. In any case, we were a little astounded by his project. This is how we reacted when Derek tweeted us about it:
@ExtElec @extkits This is beyond incredible and completely unexpected.
In fact, we had to recover from a fit of laughter before we actually managed to type this answer.
Making a steampunk tentacle hat
Derek made the ‘skeleton’ of each tentacle out of a net curtain spring, acrylic rings, and four lengths of fishing line. Two servomotors connect to two ends of fishing line each, and pull them to move the tentacle.
Then he covered the tentacles with nylon stockings and liquid latex, glued suckers cut out of MDF onto them, and mounted them on an acrylic base. The eight motors connect to a Raspberry Pi via an I2C 8-port PWM controller board.
The Pi makes the servos pull the tentacles so that they move in sine waves in both the x and y directions, seemingly of their own accord. Derek cut open the top of a hat to insert the mounted tentacles, and he used more liquid latex to give the whole thing a slimy-looking finish.
Iä! Iä! Cthulhu fhtagn!
You can read more about Derek’s steampunk tentacle hat here. He will be at the Beeston Raspberry Jam in November to show off his build, so if you’re in the Nottingham area, why not drop by?
Wearables for Halloween
This build is already pretty creepy, but just imagine it with a sensor- or camera-powered upgrade that makes the tentacles reach for people nearby. You’d have nightmare fodder for weeks.
With the help of the Raspberry Pi, any Halloween costume can be taken to the next level. How could Pi technology help you to win that coveted ‘Scariest costume’ prize this year? Tell us your ideas in the comments, and be sure to share pictures of you in your get-up with us on Twitter, Facebook, or Instagram.
In March, the CoderDojo Foundation launched their Girls Initiative, which aims to increase the average proportion of girls attending CoderDojo clubs from 29% to at least 40% over the next three years.
Six months on, we wanted to highlight what we’ve done so far and what’s next for our initiative.
What we’ve done so far
To date, we have focussed our efforts on four key areas:
Developing and improving content
Conducting and learning from research
Highlighting role models
Developing a guide of tried and tested best practices for encouraging and sustaining girls in a Dojo setting (Empowering the Future)
Content
We’ve taken measures to ensure our resources are as friendly to girls as well as boys, and we are improving them based on feedback from girls. For example, we have developed beginner-level content (Sushi Cards) for working with wearables and for building apps using App Inventor. In response to girls’ feedback, we are exploring more creative goal-orientated content.
Moreover, as part of our Empowering the Future guide, we have developed three short ‘Mini-Sushi’ projects which provide a taster of different programming languages, such as Scratch, HTML, and App Inventor.
What’s next?
We are currently finalising our intermediate-level wearables Sushi Cards. These are resources for learners to further explore wearables and integrate them with other coding skills they are developing. The Cards will enable young people to program LEDs which can be sewn into clothing with conductive thread. We are also planning another series of Sushi Cards focused on using coding skills to solve problems Ninjas have reported as important to them.
Research
In June 2017 we conducted the first Ninja survey. It was sent to all young people registered on the CoderDojo community platform, Zen. Hundreds of young people involved in Dojos around the world responded and shared their experiences.
We are currently examining these results to identify areas in which girls feel most or least confident, as well as the motivations and influencing factors that cause them to continue with coding.
What’s next?
Over the coming months we will delve deeper into the findings of this research, and decide how we can improve our content and Dojo support to adapt accordingly. Additionally, as part of sending out our Empowering the Future guide, we’re asking Dojos to provide insights into their current proportions of girls and female Mentors.
We will follow up with recipients of the guide to document the impact of the recommended approaches they try at their Dojo. Thus, we will find out which approaches are most effective in different regional contexts, which will help us improve our support for Dojos wanting to increase their proportion of attending girls.
Role models
Many Dojos, Champions, and Mentors are doing amazing work to support and encourage girls at their Dojos. Female Mentors not only help by supporting attending girls, but they also act as vital role models in an environment which is often male-dominated. Blogs by female Mentors and Ninjas which have already featured on our website include:
We recognise the importance of female role models, and over the coming months we will continue to encourage community members to share their stories so that we bring them to the wider CoderDojo community. Do you know a female Mentor or Ninja you would like to shine a spotline on? Get in touch with us at [email protected] You can also use #CoderDojoGirls on social media.
Empowering the Future guide
Ahead of Ada Lovelace Day and International Day of the Girl Child, the CoderDojo Foundation has released Empowering the Future, a comprehensive guide of practical approaches which Dojos have tested to engage and sustain girls.
Some topics covered in the guide are:
Approaches to improve the Dojo environment and layout
Language and images used to describe and promote Dojos
Content considerations, and suggested resources
The importance of female Mentors, and ways to increase access to role models
For the next month, Dojos that want to improve their proportion of girls can still sign up to have the guide book sent to them for free! From today, Dojos and anyone else can also download a PDF file of the guide.
We would like to say a massive thank you to all community members who have shared their insights with us to make our Empowering the Future guide as comprehensive and beneficial as possible for other Dojos.
Tell us what you think
Have you found an approach, or used content, which girls find particularly engaging? Do you have questions about our Girls Initiative? We would love to hear your ideas, insights, and experiences in relation to supporting CoderDojo girls! Feel free to use our forums to share with the global CoderDojo community, and email us at [email protected]
Achieving a 360o-view of your customer has become increasingly challenging as companies embrace omni-channel strategies, engaging customers across websites, mobile, call centers, social media, physical sites, and beyond. The promise of a web where online and physical worlds blend makes understanding your customers more challenging, but also more important. Businesses that are successful in this medium have a significant competitive advantage.
The big data challenge requires the management of data at high velocity and volume. Many customers have identified Amazon S3 as a great data lake solution that removes the complexities of managing a highly durable, fault tolerant data lake infrastructure at scale and economically.
AWS data services substantially lessen the heavy lifting of adopting technologies, allowing you to spend more time on what matters most—gaining a better understanding of customers to elevate your business. In this post, I show how a recent Amazon Redshift innovation, Redshift Spectrum, can enhance a customer 360 initiative.
Customer 360 solution
A successful customer 360 view benefits from using a variety of technologies to deliver different forms of insights. These could range from real-time analysis of streaming data from wearable devices and mobile interactions to historical analysis that requires interactive, on demand queries on billions of transactions. In some cases, insights can only be inferred through AI via deep learning. Finally, the value of your customer data and insights can’t be fully realized until it is operationalized at scale—readily accessible by fleets of applications. Companies are leveraging AWS for the breadth of services that cover these domains, to drive their data strategy.
A number of AWS customers stream data from various sources into a S3 data lake through Amazon Kinesis. They use Kinesis and technologies in the Hadoop ecosystem like Spark running on Amazon EMR to enrich this data. High-value data is loaded into an Amazon Redshift data warehouse, which allows users to analyze and interact with data through a choice of client tools. Redshift Spectrum expands on this analytics platform by enabling Amazon Redshift to blend and analyze data beyond the data warehouse and across a data lake.
The following diagram illustrates the workflow for such a solution.
This solution delivers value by:
Reducing complexity and time to value to deeper insights. For instance, an existing data model in Amazon Redshift may provide insights across dimensions such as customer, geography, time, and product on metrics from sales and financial systems. Down the road, you may gain access to streaming data sources like customer-care call logs and website activity that you want to blend in with the sales data on the same dimensions to understand how web and call center experiences maybe correlated with sales performance. Redshift Spectrum can join these dimensions in Amazon Redshift with data in S3 to allow you to quickly gain new insights, and avoid the slow and more expensive alternative of fully integrating these sources with your data warehouse.
Providing an additional avenue for optimizing costs and performance. In cases like call logs and clickstream data where volumes could be many TBs to PBs, storing the data exclusively in S3 yields significant cost savings. Interactive analysis on massive datasets may now be economically viable in cases where data was previously analyzed periodically through static reports generated by inexpensive batch processes. In some cases, you can improve the user experience while simultaneously lowering costs. Spectrum is powered by a large-scale infrastructure external to your Amazon Redshift cluster, and excels at scanning and aggregating large volumes of data. For instance, your analysts maybe performing data discovery on customer interactions across millions of consumers over years of data across various channels. On this large dataset, certain queries could be slow if you didn’t have a large Amazon Redshift cluster. Alternatively, you could use Redshift Spectrum to achieve a better user experience with a smaller cluster.
Proof of concept walkthrough
To make evaluation easier for you, I’ve conducted a Redshift Spectrum proof-of-concept (PoC) for the customer 360 use case. For those who want to replicate the PoC, the instructions, AWS CloudFormation templates, and public data sets are available in the GitHub repository.
The remainder of this post is a journey through the project, observing best practices in action, and learning how you can achieve business value. The walkthrough involves:
An analysis of performance data from the PoC environment involving queries that demonstrate blending and analysis of data across Amazon Redshift and S3. Observe that great results are achievable at scale.
Guidance by example on query tuning, design, and data preparation to illustrate the optimization process. This includes tuning a query that combines clickstream data in S3 with customer and time dimensions in Amazon Redshift, and aggregates ~1.9 B out of 3.7 B+ records in under 10 seconds with a small cluster!
Guidance and measurements to help assess deciding between two options: accessing and analyzing data exclusively in Amazon Redshift, or using Redshift Spectrum to access data left in S3.
Stream ingestion and enrichment
The focus of this post isn’t stream ingestion and enrichment on Kinesis and EMR, but be mindful of performance best practices on S3 to ensure good streaming and query performance:
Use random object keys: The data files provided for this project are prefixed with SHA-256 hashes to prevent hot partitions. This is important to ensure that optimal request rates to support PUT requests from the incoming stream in addition to certain queries from large Amazon Redshift clusters that could send a large number of parallel GET requests.
Micro-batch your data stream: S3 isn’t optimized for small random write workloads. Your datasets should be micro-batched into large files. For instance, the “parquet-1” dataset provided batches >7 million records per file. The optimal file size for Redshift Spectrum is usually in the 100 MB to 1 GB range.
If you have an edge case that may pose scalability challenges, AWS would love to hear about it. For further guidance, talk to your solutions architect.
Environment
The project consists of the following environment:
Time and customer dimension tables are stored on all Amazon Redshift nodes (ALL distribution style):
The data originates from the DWDATE and CUSTOMER tables in the Star Schema Benchmark
The customer table contains attributes for 3 million customers.
The time data is at the day-level granularity, and spans 7 years, from the start of 1992 to the end of 1998.
The clickstream data is stored in an S3 bucket, and serves as a fact table.
Various copies of this dataset in CSV and Parquet format have been provided, for reasons to be discussed later.
The data is a modified version of the uservisits dataset from AMPLab’s Big Data Benchmark, which was generated by Intel’s Hadoop benchmark tools.
Changes were minimal, so that existing test harnesses for this test can be adapted:
Increased the 751,754,869-row dataset 5X to 3,758,774,345 rows.
Added surrogate keys to support joins with customer and time dimensions. These keys were distributed evenly across the entire dataset to represents user visits from six customers over seven years.
Values for the visitDate column were replaced to align with the 7-year timeframe, and the added time surrogate key.
Queries across the data lake and data warehouse
Imagine a scenario where a business analyst plans to analyze clickstream metrics like ad revenue over time and by customer, market segment and more. The example below is a query that achieves this effect:
The query part highlighted in red retrieves clickstream data in S3, and joins the data with the time and customer dimension tables in Amazon Redshift through the part highlighted in blue. The query returns the total ad revenue for three customers over the last three months, along with info on their respective market segment.
Unfortunately, this query takes around three minutes to run, and doesn’t enable the interactive experience that you want. However, there’s a number of performance optimizations that you can implement to achieve the desired performance.
Performance analysis
Two key utilities provide visibility into Redshift Spectrum:
EXPLAIN Provides the query execution plan, which includes info around what processing is pushed down to Redshift Spectrum. Steps in the plan that include the prefix S3 are executed on Redshift Spectrum. For instance, the plan for the previous query has the step “S3 Seq Scan clickstream.uservisits_csv10”, indicating that Redshift Spectrum performs a scan on S3 as part of the query execution.
SVL_S3QUERY_SUMMARY Statistics for Redshift Spectrum queries are stored in this table. While the execution plan presents cost estimates, this table stores actual statistics for past query runs.
You can get the statistics of your last query by inspecting the SVL_S3QUERY_SUMMARY table with the condition (query = pg_last_query_id()). Inspecting the previous query reveals that the entire dataset of nearly 3.8 billion rows was scanned to retrieve less than 66.3 million rows. Improving scan selectivity in your query could yield substantial performance improvements.
Partitioning
Partitioning is a key means to improving scan efficiency. In your environment, the data and tables have already been organized, and configured to support partitions. For more information, see the PoC project setup instructions. The clickstream table was defined as:
The entire 3.8 billion-row dataset is organized as a collection of large files where each file contains data exclusive to a particular customer and month in a year. This allows you to partition your data into logical subsets by customer and year/month. With partitions, the query engine can target a subset of files:
Only for specific customers
Only data for specific months
A combination of specific customers and year/months
You can use partitions in your queries. Instead of joining your customer data on the surrogate customer key (that is, c.c_custkey = uv.custKey), the partition key “customer” should be used instead:
SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ORDER BY c.c_name, c.c_mktsegment, uv.yearMonthKey ASC
This query should run approximately twice as fast as the previous query. If you look at the statistics for this query in SVL_S3QUERY_SUMMARY, you see that only half the dataset was scanned. This is expected because your query is on three out of six customers on an evenly distributed dataset. However, the scan is still inefficient, and you can benefit from using your year/month partition key as well:
SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ON uv.visitYearMonth = t.d_yearmonthnum
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC
All joins between the tables are now using partitions. Upon reviewing the statistics for this query, you should observe that Redshift Spectrum scans and returns the exact number of rows, 66,270,117. If you run this query a few times, you should see execution time in the range of 8 seconds, which is a 22.5X improvement on your original query!
Predicate pushdown and storage optimizations
Previously, I mentioned that Redshift Spectrum performs processing through large-scale infrastructure external to your Amazon Redshift cluster. It is optimized for performing large scans and aggregations on S3. In fact, Redshift Spectrum may even out-perform a medium size Amazon Redshift cluster on these types of workloads with the proper optimizations. There are two important variables to consider for optimizing large scans and aggregations:
File size and count. As a general rule, use files 100 MB-1 GB in size, as Redshift Spectrum and S3 are optimized for reading this object size. However, the number of files operating on a query is directly correlated with the parallelism achievable by a query. There is an inverse relationship between file size and count: the bigger the files, the fewer files there are for the same dataset. Consequently, there is a trade-off between optimizing for object read performance, and the amount of parallelism achievable on a particular query. Large files are best for large scans as the query likely operates on sufficiently large number of files. For queries that are more selective and for which fewer files are operating, you may find that smaller files allow for more parallelism.
Data format. Redshift Spectrum supports various data formats. Columnar formats like Parquet can sometimes lead to substantial performance benefits by providing compression and more efficient I/O for certain workloads. Generally, format types like Parquet should be used for query workloads involving large scans, and high attribute selectivity. Again, there are trade-offs as formats like Parquet require more compute power to process than plaintext. For queries on smaller subsets of data, the I/O efficiency benefit of Parquet is diminished. At some point, Parquet may perform the same or slower than plaintext. Latency, compression rates, and the trade-off between user experience and cost should drive your decision.
To help illustrate how Redshift Spectrum performs on these large aggregation workloads, run a basic query that aggregates the entire ~3.7 billion record dataset on Redshift Spectrum, and compared that with running the query exclusively on Amazon Redshift:
SELECT uv.custKey, COUNT(uv.custKey)
FROM <your clickstream table> as uv
GROUP BY uv.custKey
ORDER BY uv.custKey ASC
For the Amazon Redshift test case, the clickstream data is loaded, and distributed evenly across all nodes (even distribution style) with optimal column compression encodings prescribed by the Amazon Redshift’s ANALYZE command.
The Redshift Spectrum test case uses a Parquet data format with each file containing all the data for a particular customer in a month. This results in files mostly in the range of 220-280 MB, and in effect, is the largest file size for this partitioning scheme. If you run tests with the other datasets provided, you see that this data format and size is optimal and out-performs others by ~60X.
Performance differences will vary depending on the scenario. The important takeaway is to understand the testing strategy and the workload characteristics where Redshift Spectrum is likely to yield performance benefits.
The following chart compares the query execution time for the two scenarios. The results indicate that you would have to pay for 12 X DC1.Large nodes to get performance comparable to using a small Amazon Redshift cluster that leverages Redshift Spectrum.
Chart showing simple aggregation on ~3.7 billion records
So you’ve validated that Spectrum excels at performing large aggregations. Could you benefit by pushing more work down to Redshift Spectrum in your original query? It turns out that you can, by making the following modification:
The clickstream data is stored at a day-level granularity for each customer while your query rolls up the data to the month level per customer. In the earlier query that uses the day/month partition key, you optimized the query so that it only scans and retrieves the data required, but the day level data is still sent back to your Amazon Redshift cluster for joining and aggregation. The query shown here pushes aggregation work down to Redshift Spectrum as indicated by the query plan:
In this query, Redshift Spectrum aggregates the clickstream data to the month level before it is returned to the Amazon Redshift cluster and joined with the dimension tables. This query should complete in about 4 seconds, which is roughly twice as fast as only using the partition key. The speed increase is evident upon reviewing the SVL_S3QUERY_SUMMARY table:
Bytes scanned is 21.6X less because of the Parquet data format.
Only 90 records are returned back to the Amazon Redshift cluster as a result of the push-down, instead of ~66.2 million, leading to substantially less join overhead, and about 530 MB less data sent back to your cluster.
No adverse change in average parallelism.
Assessing the value of Amazon Redshift vs. Redshift Spectrum
At this point, you might be asking yourself, why would I ever not use Redshift Spectrum? Well, you still get additional value for your money by loading data into Amazon Redshift, and querying in Amazon Redshift vs. querying S3.
In fact, it turns out that the last version of our query runs even faster when executed exclusively in native Amazon Redshift, as shown in the following chart:
Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 3 months of data
As a general rule, queries that aren’t dominated by I/O and which involve multiple joins are better optimized in native Amazon Redshift. For instance, the performance difference between running the partition key query entirely in Amazon Redshift versus with Redshift Spectrum is twice as large as that that of the pushdown aggregation query, partly because the former case benefits more from better join performance.
Furthermore, the variability in latency in native Amazon Redshift is lower. For use cases where you have tight performance SLAs on queries, you may want to consider using Amazon Redshift exclusively to support those queries.
On the other hand, when you perform large scans, you could benefit from the best of both worlds: higher performance at lower cost. For instance, imagine that you wanted to enable your business analysts to interactively discover insights across a vast amount of historical data. In the example below, the pushdown aggregation query is modified to analyze seven years of data instead of three months:
SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.totalRevenue
…
WHERE customer <= 3 and visitYearMonth >= 199201
…
FROM dwdate WHERE d_yearmonthnum >= 199201) as t
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC
This query requires scanning and aggregating nearly 1.9 billion records. As shown in the chart below, Redshift Spectrum substantially speeds up this query. A large Amazon Redshift cluster would have to be provisioned to support this use case. With the aid of Redshift Spectrum, you could use an existing small cluster, keep a single copy of your data in S3, and benefit from economical, durable storage while only paying for what you use via the pay per query pricing model.
Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 7 years of data
Summary
Redshift Spectrum lowers the time to value for deeper insights on customer data queries spanning the data lake and data warehouse. It can enable interactive analysis on datasets in cases that weren’t economically practical or technically feasible before.
There are cases where you can get the best of both worlds from Redshift Spectrum: higher performance at lower cost. However, there are still latency-sensitive use cases where you may want native Amazon Redshift performance. For more best practice tips, see the 10 Best Practices for Amazon Redshift post.
Dylan Tong is an Enterprise Solutions Architect at AWS. He works with customers to help drive their success on the AWS platform through thought leadership and guidance on designing well architected solutions. He has spent most of his career building on his expertise in data management and analytics by working for leaders and innovators in the space.
Commenting on the convenient size of the Raspberry Pi Zero W, Amanda explains on her blog “I decided that I wanted to make something that would fully take advantage of the compact size of the Pi Zero, that was somewhat useful, and that I could take with me and share with my maker friends during my summer tech travels.”
Awesome grandmothers and wearable tech are an instant recipe for success!
With access to her grandmother’s “high-tech embroidery machine”, Amanda was able to incorporate various maker skills into her project.
The Tech
Amanda used five clear white LEDs and the Raspberry Pi Zero for the project. Taking inspiration from the LED-adorned Babbage Bear her team created at Picademy, she decided to connect the LEDs using female-to-female jumper wires
Poor Babbage really does suffer at Picademy events
It’s worth noting that she could also have used conductive thread, though we wonder how this slightly less flexible thread would work in a sewing machine, so don’t try this at home. Or do, but don’t blame me if it goes wonky.
Having set the LEDs in place, Amanda worked on the code. Unsure about how she wanted the LEDs to blink, she finally settled on a random pulsing of the lights, and used the GPIO Zero library to achieve the effect.
Check out the GPIO Zero library for some great LED effects
The GPIO Zero pulse effect allows users to easily fade an LED in and out without the need for long strings of code. Very handy.
The Bag
Inspiration for the bag’s final design came thanks to a YouTube video, and Amanda and her grandmother were able to recreate the make using their fabric of choice.
Learn how to make this cute tote bag. A great project for beginning seamstresses!
A small pocket was added on the outside of the bag to allow for the Raspberry Pi Zero to be snugly secured, and the pattern was stitched into the front, allowing spaces for the LEDs to pop through.
Amanda shows off her bag to Philip at ISTE 2017
You can find more information on the project, including Amanda’s initial experimentation with the Sense HAT, on her blog. If you’re a maker, an educator or, (and here’s a word I’m pretty sure I’ve made up) an edumaker, be sure to keep her blog bookmarked!
Make your own wearable tech
Whether you use jumper leads, or conductive thread or paint, we’d love to see your wearable tech projects.
Getting started with wearables
To help you get started, we’ve created this Getting started with wearables free resource that allows you to get making with the Adafruit FLORA and and NeoPixel. Check it out!
By continuing to use the site, you agree to the use of cookies. more information
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.