Tag Archives: ATS

Build your own weather station with our new guide!

Post Syndicated from Richard Hayler original https://www.raspberrypi.org/blog/build-your-own-weather-station/

One of the most common enquiries I receive at Pi Towers is “How can I get my hands on a Raspberry Pi Oracle Weather Station?” Now the answer is: “Why not build your own version using our guide?”

Build Your Own weather station kit assembled

Tadaaaa! The BYO weather station fully assembled.

Our Oracle Weather Station

In 2016 we sent out nearly 1000 Raspberry Pi Oracle Weather Station kits to schools from around the world who had applied to be part of our weather station programme. In the original kit was a special HAT that allows the Pi to collect weather data with a set of sensors.

The original Raspberry Pi Oracle Weather Station HAT – Build Your Own Raspberry Pi weather station

The original Raspberry Pi Oracle Weather Station HAT

We designed the HAT to enable students to create their own weather stations and mount them at their schools. As part of the programme, we also provide an ever-growing range of supporting resources. We’ve seen Oracle Weather Stations in great locations with a huge differences in climate, and they’ve even recorded the effects of a solar eclipse.

Our new BYO weather station guide

We only had a single batch of HATs made, and unfortunately we’ve given nearly* all the Weather Station kits away. Not only are the kits really popular, we also receive lots of questions about how to add extra sensors or how to take more precise measurements of a particular weather phenomenon. So today, to satisfy your demand for a hackable weather station, we’re launching our Build your own weather station guide!

Build Your Own Raspberry Pi weather station

Fun with meteorological experiments!

Our guide suggests the use of many of the sensors from the Oracle Weather Station kit, so can build a station that’s as close as possible to the original. As you know, the Raspberry Pi is incredibly versatile, and we’ve made it easy to hack the design in case you want to use different sensors.

Many other tutorials for Pi-powered weather stations don’t explain how the various sensors work or how to store your data. Ours goes into more detail. It shows you how to put together a breadboard prototype, it describes how to write Python code to take readings in different ways, and it guides you through recording these readings in a database.

Build Your Own Raspberry Pi weather station on a breadboard

There’s also a section on how to make your station weatherproof. And in case you want to move past the breadboard stage, we also help you with that. The guide shows you how to solder together all the components, similar to the original Oracle Weather Station HAT.

Who should try this build

We think this is a great project to tackle at home, at a STEM club, Scout group, or CoderDojo, and we’re sure that many of you will be chomping at the bit to get started. Before you do, please note that we’ve designed the build to be as straight-forward as possible, but it’s still fairly advanced both in terms of electronics and programming. You should read through the whole guide before purchasing any components.

Build Your Own Raspberry Pi weather station – components

The sensors and components we’re suggesting balance cost, accuracy, and easy of use. Depending on what you want to use your station for, you may wish to use different components. Similarly, the final soldered design in the guide may not be the most elegant, but we think it is achievable for someone with modest soldering experience and basic equipment.

You can build a functioning weather station without soldering with our guide, but the build will be more durable if you do solder it. If you’ve never tried soldering before, that’s OK: we have a Getting started with soldering resource plus video tutorial that will walk you through how it works step by step.

Prototyping HAT for Raspberry Pi weather station sensors

For those of you who are more experienced makers, there are plenty of different ways to put the final build together. We always like to hear about alternative builds, so please post your designs in the Weather Station forum.

Our plans for the guide

Our next step is publishing supplementary guides for adding extra functionality to your weather station. We’d love to hear which enhancements you would most like to see! Our current ideas under development include adding a webcam, making a tweeting weather station, adding a light/UV meter, and incorporating a lightning sensor. Let us know which of these is your favourite, or suggest your own amazing ideas in the comments!

*We do have a very small number of kits reserved for interesting projects or locations: a particularly cool experiment, a novel idea for how the Oracle Weather Station could be used, or places with specific weather phenomena. If have such a project in mind, please send a brief outline to [email protected], and we’ll consider how we might be able to help you.

The post Build your own weather station with our new guide! appeared first on Raspberry Pi.

Protecting coral reefs with Nemo-Pi, the underwater monitor

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/coral-reefs-nemo-pi/

The German charity Save Nemo works to protect coral reefs, and they are developing Nemo-Pi, an underwater “weather station” that monitors ocean conditions. Right now, you can vote for Save Nemo in the Google.org Impact Challenge.

Nemo-Pi — Save Nemo

Save Nemo

The organisation says there are two major threats to coral reefs: divers, and climate change. To make diving saver for reefs, Save Nemo installs buoy anchor points where diving tour boats can anchor without damaging corals in the process.

reef damaged by anchor
boat anchored at buoy

In addition, they provide dos and don’ts for how to behave on a reef dive.

The Nemo-Pi

To monitor the effects of climate change, and to help divers decide whether conditions are right at a reef while they’re still on shore, Save Nemo is also in the process of perfecting Nemo-Pi.

Nemo-Pi schematic — Nemo-Pi — Save Nemo

This Raspberry Pi-powered device is made up of a buoy, a solar panel, a GPS device, a Pi, and an array of sensors. Nemo-Pi measures water conditions such as current, visibility, temperature, carbon dioxide and nitrogen oxide concentrations, and pH. It also uploads its readings live to a public webserver.

Inside the Nemo-Pi device — Save Nemo
Inside the Nemo-Pi device — Save Nemo
Inside the Nemo-Pi device — Save Nemo

The Save Nemo team is currently doing long-term tests of Nemo-Pi off the coast of Thailand and Indonesia. They are also working on improving the device’s power consumption and durability, and testing prototypes with the Raspberry Pi Zero W.

web dashboard — Nemo-Pi — Save Nemo

The web dashboard showing live Nemo-Pi data

Long-term goals

Save Nemo aims to install a network of Nemo-Pis at shallow reefs (up to 60 metres deep) in South East Asia. Then diving tour companies can check the live data online and decide day-to-day whether tours are feasible. This will lower the impact of humans on reefs and help the local flora and fauna survive.

Coral reefs with fishes

A healthy coral reef

Nemo-Pi data may also be useful for groups lobbying for reef conservation, and for scientists and activists who want to shine a spotlight on the awful effects of climate change on sea life, such as coral bleaching caused by rising water temperatures.

Bleached coral

A bleached coral reef

Vote now for Save Nemo

If you want to help Save Nemo in their mission today, vote for them to win the Google.org Impact Challenge:

  1. Head to the voting web page
  2. Click “Abstimmen” in the footer of the page to vote
  3. Click “JA” in the footer to confirm

Voting is open until 6 June. You can also follow Save Nemo on Facebook or Twitter. We think this organisation is doing valuable work, and that their projects could be expanded to reefs across the globe. It’s fantastic to see the Raspberry Pi being used to help protect ocean life.

The post Protecting coral reefs with Nemo-Pi, the underwater monitor appeared first on Raspberry Pi.

Project Floofball and more: Pi pet stuff

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/project-floofball-pi-pet-stuff/

It’s a public holiday here today (yes, again). So, while we indulge in the traditional pastime of barbecuing stuff (ourselves, mainly), here’s a little trove of Pi projects that cater for our various furry friends.

Project Floofball

Nicole Horward created Project Floofball for her hamster, Harold. It’s an IoT hamster wheel that uses a Raspberry Pi and a magnetic door sensor to log how far Harold runs.

Project Floofball: an IoT hamster wheel

An IoT Hamsterwheel using a Raspberry Pi and a magnetic door sensor, to see how far my hamster runs.

You can follow Harold’s runs in real time on his ThingSpeak channel, and you’ll find photos of the build on imgur. Nicole’s Python code, as well as her template for the laser-cut enclosure that houses the wiring and LCD display, are available on the hamster wheel’s GitHub repo.

A live-streaming pet feeder

JaganK3 used to work long hours that meant he couldn’t be there to feed his dog on time. He found that he couldn’t buy an automated feeder in India without paying a lot to import one, so he made one himself. It uses a Raspberry Pi to control a motor that turns a dispensing valve in a hopper full of dry food, giving his dog a portion of food at set times.

A transparent cylindrical hopper of dry dog food, with a motor that can turn a dispensing valve at the lower end. The motor is connected to a Raspberry Pi in a plastic case. Hopper, motor, Pi, and wiring are all mounted on a board on the wall.

He also added a web cam for live video streaming, because he could. Find out more in JaganK3’s Instructable for his pet feeder.

Shark laser cat toy

Sam Storino, meanwhile, is using a Raspberry Pi to control a laser-pointer cat toy with a goshdarned SHARK (which is kind of what I’d expect from the guy who made the steampunk-looking cat feeder a few weeks ago). The idea is to keep his cats interested and active within the confines of a compact city apartment.

Raspberry Pi Automatic Cat Laser Pointer Toy

Post with 52 votes and 7004 views. Tagged with cat, shark, lasers, austin powers, raspberry pi; Shared by JeorgeLeatherly. Raspberry Pi Automatic Cat Laser Pointer Toy

If I were a cat, I would definitely be entirely happy with this. Find out more on Sam’s website.

And there’s more

Michel Parreno has written a series of articles to help you monitor and feed your pet with Raspberry Pi.

All of these makers are generous in acknowledging the tutorials and build logs that helped them with their projects. It’s lovely to see the Raspberry Pi and maker community working like this, and I bet their projects will inspire others too.

Now, if you’ll excuse me. I’m late for a barbecue.

The post Project Floofball and more: Pi pet stuff appeared first on Raspberry Pi.

Шремс подава оплаквания срещу Google, Facebook, Instagram и WhatsApp

Post Syndicated from nellyo original https://nellyo.wordpress.com/2018/05/27/gdpr-schrems/

Макс Шремс подава оплаквания срещу Google, Facebook, Instagram и WhatsApp. Причината е, че според него е незаконен изборът, пред който са изправени потребителите им – да приемат условията на компаниите или да загубят достъп до услугите им.

Подходът “съгласи се или напусни”, казва Шремс пред Reuters Television, нарушава правото на хората съгласно Общия регламент за защита на данните (GDPR) да избират свободно дали да позволят на компаниите да използват данните им. Трябва да има  избор,  смята Шремс.

Шремс е австриецът, който все не е доволен от защитата на личните данни в социалните мрежи и не ги оставя на мира, като превръща борбата си за защита на данните и в професия, юрист е. Този път действа чрез създадена от него неправителствена организация noyb

Independent.ie 

The Guardian

Enchanting images with Inky Lines, a Pi‑powered polargraph

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/enchanting-images-inky-lines-pi-powered-polargraph/

A hanging plotter, also known as a polar plotter or polargraph, is a machine for drawing images on a vertical surface. It does so by using motors to control the length of two cords that form a V shape, supporting a pen where they meet. We’ve featured one on this blog before: Norbert “HomoFaciens” Heinz’s video is a wonderfully clear introduction to how a polargraph works and what you have to consider when you’re putting one together.

Today, we look at Inky Lines, by John Proudlock. With it, John is creating a series of captivating and beautiful pieces, and with his most recent work, each rendering of an image is unique.

The Inky Lines plotter draws a flock of seagulls in blue ink on white paper. The print head is suspended near the bottom left corner of the image, as the pen inks the wing of a gull

An evolving project

The project isn’t new – John has been working on it for at least a couple of years – but it is constantly evolving. When we first spotted it, John had just implemented code to allow the plotter to produce mesmeric, spiralling patterns.

A blue spiral pattern featuring overlapping "bubbles"
A dense pink spiral pattern, featuring concentric circles and reminiscent of a mandala
A blue spirograph-type pattern formed of large overlapping squares, each offset from its neighbour by a few degrees, producing a four-spiral-armed "galaxy" shape where lines overlap. The plotter's print head is visible in a corner of the image

But we’re skipping ahead. Let’s go back to the beginning.

From pixels to motor movements

John starts by providing an image, usually no more than 100 pixels wide, to a Raspberry Pi. Custom software that he wrote evaluates the darkness of each pixel and selects a pattern of a suitable density to represent it.

The two cords supporting the plotter’s pen are wound around the shafts of two stepper motors, such that the movement of the motors controls the length of the cords: the program next calculates how much each motor must move in order to produce the pattern. The Raspberry Pi passes corresponding instructions to two motor circuits, which transform the signals to a higher voltage and pass them to the stepper motors. These turn by very precise amounts, winding or unwinding the cords and, very slowly, dragging the pen across the paper.

A Raspberry Pi in a case, with a wide flex connected to a GPIO header
The Inky Lines plotter's print head, featuring cardboard and tape, draws an apparently random squiggle
A large area of apparently random pattern drawn by the plotter

John explains,

Suspended in-between the two motors is a print head, made out of a new 3-d modelling material I’ve been prototyping called cardboard. An old coat hanger and some velcro were also used.

(He’s our kind of maker.)

Unique images

The earlier drawings that John made used a repeatable method to render image files as lines on paper. That is, if the machine drew the same image a number of times, each copy would be identical. More recently, though, he has been using a method that yields random movements of the pen:

The pen point is guided around the image, but moves to each new point entirely at random. Up close this looks like a chaotic squiggle, but from a distance of a couple of meters, the human eye (and brain) make order from the chaos and view an infinite number of shades and a smoother, less mechanical image.

An apparently chaotic squiggle

This method means that no matter how many times the polargraph repeats the same image, each copy will be unique.

A gallery of work

Inky Lines’ website and its Instagram feed offer a collection of wonderful pieces John has drawn with his polargraph, and he discusses the different techniques and types of image that he is exploring.

A 3 x 3 grid of varied and colourful images from inkylinespolargraph's Instagram feed

They range from holiday photographs, processed to extract particular features and rendered in silhouette, to portraits, made with a single continuous line that can be several hundred metres long, to generative images spirograph images like those pictured above, created by an algorithm rather than rendered from a source image.

The post Enchanting images with Inky Lines, a Pi‑powered polargraph appeared first on Raspberry Pi.

Съд на ЕС: предоставяне на потребителски данни на полицията – кога?

Post Syndicated from nellyo original https://nellyo.wordpress.com/2018/05/16/ecj_privacy/

Известно е Заключението на Генералния адвокат по дело  C‑207/16 въз основа на преюдициално запитване, отправено от Audiencia Provincial de Tarragona (съд на провинция Тарагона, Испания).

Запитването се отнася до тълкуването на понятието „тежки престъпления“ по смисъла на практиката на Съда, установена с решение Digital Rights Ireland и решение Tele2 Sverige и Watson  – в които това понятие се използва като критерий за преценка на законосъобразността и пропорционалността на намесата в правата по членове 7 и 8 от Хартата на основните права на Европейския съюз  –  именно съответно правото на зачитане на личния и семейния живот, както и правото на защита на личните данни.

Запитване е направено в контекста на производство по жалба против съдебно решение, с което на полицейски органи е отказана възможността да им бъдат предадени   данни, притежавани от мобилни телефонни оператори, с цел идентифициране на лица за нуждите на наказателно разследване. Обжалваното решение е мотивирано по-специално със съображението, че деянията, предмет на това разследване, не съставляват тежко престъпление, в разрез с изискванията на приложимата испанска правна уредба.

Въпросите:

„1)      Може ли достатъчната тежест на престъплението като критерий, обосноваващ засягането на признатите в членове 7 и 8 от [Хартата] основни права, да се определи единствено с оглед на наказанието, което може да се наложи за разследваното престъпление, или е необходимо освен това да се установи, че с престъпното деяние се увреждат в особена степен индивидуални и/или колективни правни интереси?

2)      Евентуално, ако определянето на тежестта на престъплението с оглед единствено на наказанието, което може да се наложи, отговаря на конституционните принципи на Съюза, приложени от Съда на ЕС в решението му [Digital Rights] като критерии за строг контрол на Директивата[, обявена за невалидна с това решение], то какъв следва да е минималният праг за наказанието? Допустимо ли е по общ начин да се предвиди праг от три години лишаване от свобода?“.

Тоиз разговор е добре известен на българите от времето на прилагането на Директива 24/2006/ЕС за задържане на трафичните данни, обявена от Съда за невалидна. Тогава имаше разногласия по въпроса кое е тежко и кое е сериозно престъпление и как целта за защита на обществения интерес се съотнася с правото на защита на личния живот и личната кореспонденция. Генералният адвокат също прави препратка към Директива 24/2006/ЕС.

ГА по първия въпрос:

90.      Според мен следва да се внимава, за да не се възприеме твърде широко разбиране относно изискванията, поставени от Съда с тези две решения, за да не се препятства, или поне не прекомерно, възможността на държавите членки да дерогират установения от Директива 2002/58 режим, която им е предоставена с член 15, параграф 1 от същата, в случаите, в които разглежданите намеси в личния живот едновременно преследват законна цел и са с ограничен обхват, каквито е възможно да настъпят в случая в резултат от искането на разследващата полицейска служба. По-конкретно считам, че правото на Съюза допуска възможността за компетентните органи да имат достъп до държаните от доставчици на електронни съобщителни услуги данни за идентификация, позволяващи да се издирят предполагаемите извършители на престъпление, което не е тежко.

91.      С оглед на това препоръчвам на Съда да отговори на преформулирания преюдициален въпрос, че член 15, параграф 1 от Директива 2002/58 във връзка с членове 7 и 8 и член 52, параграф 1 от Хартата трябва да се тълкува в смисъл, че мярка, която за целите на борбата с престъпленията дава на компетентните национални органи достъп до идентификационните данни на ползвателите на телефонни номера, активирани с определен мобилен телефон през ограничен период, при обстоятелства като разглежданите в главното производство води до намеса в гарантираните от споменатата директива и Хартата основни права, която не е толкова сериозна, че да налага такъв достъп да се предоставя само в случаите, в които съответното престъпление е тежко.

ГА по втория въпрос:

96.      Според мен право да определят какво представлява „тежко престъпление“ имат по принцип компетентните органи на държавите членки. Независимо от това, благодарение на преюдициалните запитвания, с които юрисдикциите на държавите членки могат да сезират Съда, същият е натоварен да следи за спазването на всички изисквания, произтичащи от правото на Съюза, и по-специално да осигури последователно прилагане на закрилата, предоставена от разпоредбите на Хартата.

107.  Ако понятието „тежко престъпление“ по смисъла на съдебната практика, установена с решения Digital Rights и Tele2, бъде прието от Съда за самостоятелно понятие на правото на Съюза, то би трябвало да се тълкува в смисъл, че тежестта на дадено престъпление, която може да оправдае достъпа на компетентните национални органи до лични данни съгласно член 15, параграф 1 от Директива 2002/58, трябва да се измерва, като се вземат предвид не само наказанията, които е възможно да бъдат наложени, но и съвкупност от други обективни критерии за преценка като упоменатите по-горе.

121. В заключение считам, че ако Съдът постанови — в разрез с това, което препоръчвам — че за да се квалифицира престъплението като „тежко“ по смисъла на неговата практика, установена с решение Digital Rights, следва да се отчита единствено предвиденото наказание, на втория преюдициален въпрос би следвало да се отговори, че държавите членки са свободни да определят минималния размер на съответното наказание за целта, стига да спазват изискванията, произтичащи от правото на Съюза, и по-специално онези изисквания, съгласно които намесата в основните права, гарантирани с членове 7 и 8 от Хартата, трябва да остане изключение и да бъде съобразена с принципа на пропорционалност.

Да напиша и името на този Генерален адвокат – Henrik Saugmandsgaard Øe от Дания. Успял да застане едновременно на най-разнообразни позиции, като един електрон.

Brutus 2: the gaming PC case of your dreams

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/brutus-2-gaming-pc-case/

Attention, case modders: take a look at the Brutus 2, an extremely snazzy computer case with a partly transparent, animated side panel that’s powered by a Pi. Daniel Otto and Carsten Lehman have a current crowdfunder for the case; their video is in German, but the looks of the build speak for themselves. There are some truly gorgeous effects here.

der BRUTUS 2 by 3nb Gaming

Vorbestellungen ab sofort auf https://www.startnext.com/brutus2 Weitere Infos zu uns auf: https://3nb.de https://www.facebook.com/3nb.de https://www.instagram.com/3nb.de Über 3nb: – GbR aus Leipzig, gegründet 2017 – wir kommen aus den Bereichen Elektronik und Informatik – erstes Produkt: der Brutus One ein Gaming PC mit transparentem Display in der Seite Kurzinfo Brutus 2: – Markencomputergehäuse für Gaming- /Casemoddingszene – Besonderheit: animiertes Seitenfenster angesteuert mit einem Raspberry Pi – Vorteile von unserem Case: o Case ist einzeln lieferbar und nicht nur als komplett-PC o kein Leistungsverbrauch der Grafikkarte dank integriertem Raspberry Pi o bessere Darstellung von Texten und Grafiken durch unscharfen Hintergrund

What’s case modding?

Case modding just means modifying your computer or gaming console’s case, and it’s very popular in the gaming community. Some mods are functional, while others improve the way the case looks. Lots of dedicated gamers don’t only want a powerful computer, they also want it to look amazing — at home, or at LAN parties and games tournaments.

The Brutus 2 case

The Brutus 2 case is made by Daniel and Carsten’s startup, 3nb electronics, and it’s a product that is officially Powered by Raspberry Pi. Its standout feature is the semi-transparent TFT screen, which lets you play any video clip you choose while keeping your gaming hardware on display. It looks incredibly cool. All the graphics for the case’s screen are handled by a Raspberry Pi, so it doesn’t use any of your main PC’s GPU power and your gaming won’t suffer.

Brutus 2 PC case powered by Raspberry Pi

The software

To use Brutus 2, you just need to run a small desktop application on your PC to choose what you want to display on the case. A number of neat animations are included, and you can upload your own if you want.

So far, the app only runs on Windows, but 3nb electronics are planning to make the code open-source, so you can modify it for other operating systems, or to display other file types. This is true to the spirit of the case modding and Raspberry Pi communities, who love adapting, retrofitting, and overhauling projects and code to fit their needs.

Brutus 2 PC case powered by Raspberry Pi

Daniel and Carsten say that one of their campaign’s stretch goals is to implement more functionality in the Brutus 2 app. So in the future, the case could also show things like CPU temperature, gaming stats, and in-game messages. Of course, there’s nothing stopping you from integrating features like that yourself.

If you have any questions about the case, you can post them directly to Daniel and Carsten here.

The crowdfunding campaign

The Brutus 2 campaign on Startnext is currently halfway to its first funding goal of €10000, with over three weeks to go until it closes. If you’re quick, you still be may be able to snatch one of the early-bird offers. And if your whole guild NEEDS this, that’s OK — there are discounts for bulk orders.

The post Brutus 2: the gaming PC case of your dreams appeared first on Raspberry Pi.

Details on a New PGP Vulnerability

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/details_on_a_ne.html

A new PGP vulnerability was announced today. Basically, the vulnerability makes use of the fact that modern e-mail programs allow for embedded HTML objects. Essentially, if an attacker can intercept and modify a message in transit, he can insert code that sends the plaintext in a URL to a remote website. Very clever.

The EFAIL attacks exploit vulnerabilities in the OpenPGP and S/MIME standards to reveal the plaintext of encrypted emails. In a nutshell, EFAIL abuses active content of HTML emails, for example externally loaded images or styles, to exfiltrate plaintext through requested URLs. To create these exfiltration channels, the attacker first needs access to the encrypted emails, for example, by eavesdropping on network traffic, compromising email accounts, email servers, backup systems or client computers. The emails could even have been collected years ago.

The attacker changes an encrypted email in a particular way and sends this changed encrypted email to the victim. The victim’s email client decrypts the email and loads any external content, thus exfiltrating the plaintext to the attacker.

A few initial comments:

1. Being able to intercept and modify e-mails in transit is the sort of thing the NSA can do, but is hard for the average hacker. That being said, there are circumstances where someone can modify e-mails. I don’t mean to minimize the seriousness of this attack, but that is a consideration.

2. The vulnerability isn’t with PGP or S/MIME itself, but in the way they interact with modern e-mail programs. You can see this in the two suggested short-term mitigations: “No decryption in the e-mail client,” and “disable HTML rendering.”

3. I’ve been getting some weird press calls from reporters wanting to know if this demonstrates that e-mail encryption is impossible. No, this just demonstrates that programmers are human and vulnerabilities are inevitable. PGP almost certainly has fewer bugs than your average piece of software, but it’s not bug free.

3. Why is anyone using encrypted e-mail anymore, anyway? Reliably and easily encrypting e-mail is an insurmountably hard problem for reasons having nothing to do with today’s announcement. If you need to communicate securely, use Signal. If having Signal on your phone will arouse suspicion, use WhatsApp.

I’ll post other commentaries and analyses as I find them.

EDITED TO ADD (5/14): News articles.

Slashdot thread.

Съвместима ли е таксата за радио и телевизия с правото на ЕС

Post Syndicated from nellyo original https://nellyo.wordpress.com/2018/05/13/fee_psm/

През март  2018 г. Oberverwaltungsgericht Rheinland-Pfalz (Върховен административен съд на Рейнланд-Пфалц – OVG Rheinland-Pfalz) решава, че таксата за радио и телевизия в Германия е съвместима с правото на ЕС (дело № 7 A 11938/17) . Съдът отхвърля тезата, че таксата е несъвместима с правото на ЕС, тъй като предоставя на обществените радио- и телевизионни доставчици на медийни услуги несправедливо предимство пред техните частни конкуренти.

Съдът посочва, че през 2016 г. Bundesverwaltungsgericht (Федерален административен съд – BVerwG) вече е установил съответствието на таксата  – в новата й форма, въведена през 2013 г. – с правото на ЕС (решение от 18 март 2016 г., BVerwG 6 С 6.15). Съгласно това решение въвеждането на таксата  не изисква съгласието на Европейската комисия и е с съвместимо с Директивата за аудиовизуалните медийни услуги.  Обществените и частните радио- и телевизионни оператори  неизбежно ще бъдат финансирани по различни начини. Това обаче не означава непременно, че обществените радио- и телевизионни оператори са получили несправедливо предимство, тъй като за разлика от частните радио- и телевизионни оператори те са подложени на много по-ограничителни правила за рекламиране и следователно са финансово зависими от таксата.

Междувременно Landgericht Tübingen (Районен съд в Тюбинген, решение от 3 август 2017 г., дело № 5 T 246/17 и др.) е постановил, че таксата  нарушава правото на ЕС  – и в резултат има подадено преюдициално запитване до Съда на ЕС –  дело  С-492/17.

 

Преюдициални въпроси:

1)

Несъвместим ли е с правото на Съюза националният Gesetz vom 18.10.2011 zur Geltung des Rundfunkbeitragsstaatsvertrags (RdFunkBeitrStVtrBW) vom 17 Dezember 2010 (Закон от 18 октомври 2011 г. за прилагане на Държавния договор за вноската за радио- и телевизионно разпространение от 17 декември 2010 г., наричан по-нататък „RdFunkBeitrStVtrBW“) на провинция Баден-Вюртемберг, последно изменен с член 4 от Neunzehnter Rundfunkänderungsstaatsvertrag (Деветнадесети държавен договор за изменение на Държавните договори за радио- и телевизионно разпространение) от 3 декември 2015 г. (Закон от 23 февруари 2016 г., GBl. стр. 126, 129), поради това че вноската, събирана от 1 януари 2013 г. съгласно този закон безусловно по принцип от всяко живеещо в германската федерална провинция Баден-Вюртемберг пълнолетно лице в полза на радио- и телевизионните оператори SWR и ZDF, представлява помощ, която противоречи на правото на Съюза и предоставя по-благоприятно третиране само в полза на тези обществени радио- и телевизионни оператори спрямо частни радио- и телевизионни оператори? Трябва ли членове 107 и 108 ДФЕС да се тълкуват в смисъл, че за Закона за вноската за радио- и телевизионно разпространение е трябвало да се получи разрешението на Комисията и поради липсата на разрешение той е невалиден?

2)

Трябва ли член 107 ДФЕС, съответно член 108 ДФЕС да се тълкува в смисъл, че в обхвата му попада правна уредба, установена в националния закон „RdFunkBeitrStVtrBW“, която предвижда, че по принцип от всяко живеещо в Баден-Вюртемберг пълнолетно лице безусловно се събира вноска в полза само на държавни/обществени радио- и телевизионни оператори, поради това че тази вноска съдържа противоречаща на правото на Съюза и предоставяща по-благоприятно третиране помощ с цел изключването по технически причини на оператори от държави от Европейския съюз, доколкото вноските са предназначени да се използват за създаването на конкурентен начин на пренос (монопол върху DVB-T2), без да е предвидено той да се използва от чуждестранни оператори? Трябва ли член 107 ДФЕС, съответно член 108 ДФЕС да се тълкува в смисъл, че в обхвата му попадат не само преки субсидии, но и други релевантни от икономическа гледна точка привилегии (право на издаване на изпълнителен лист, правомощия за предприемане на действия както в качеството на стопанско предприятие, така и в качеството на орган, поставяне в по-благоприятно положение при изчисляването на дълговете)?

3)

Съвместимо ли е с принципа на равно третиране и със забраната за предоставящи привилегии помощи положение, при което на основание национален закон на провинция Баден-Вюртемберг германски телевизионен оператор, който се урежда от нормите на публичното право и има предоставени правомощия на орган, но същевременно се конкурира с частни радио- и телевизионни оператори на рекламния пазар, е привилегирован в сравнение с тези оператори поради това че не трябва като частните конкуренти да иска по общия съдебен ред да му бъде издаден изпълнителен лист за вземанията му срещу зрителите, преди да може да пристъпи към принудително изпълнение, а самият той има право, без участието на съд, да издаде титул, който същевременно му дава право на принудително изпълнение?

4)

Съвместимо ли е с член 10 от ЕКПЧ /член [11] от Хартата на основните права (свобода на информация) положение, при което държава членка предвижда в национален закон на провинция Баден-Вюртемберг, че телевизионен оператор, на който са предоставени правомощия на орган, има право да изисква плащането на вноска от всяко живеещо в зоната на радио- и телевизионното излъчване пълнолетно лице за целите на финансирането на точно този оператор, при неплащането на която е предвидена глоба, независимо дали това лице въобще разполага с приемник или само използва услугите на други, а именно чуждестранни или други, частни оператори?

5)

Съвместим ли е националният закон „RdFunkBeitrStVtrBW“, и по-специално членове 2 и 3, с установените в правото на Съюза принципи на равно третиране и на недопускане на дискриминация в положение, при което вноската, която следва да се плаща безусловно от всеки жител за целите на финансирането на обществен телевизионен оператор, налага на всяко лице, което само отглежда детето си, тежест в размер, многократно по-висок от сумата, дължима от лице, което живее в общо жилище с други хора? Следва ли Директива 2004/113/ЕО (1) да се тълкува в смисъл, че спорната вноска също попада в обхвата ѝ и че e достатъчно да е налице косвено поставяне в по-неблагоприятно положение, след като с оглед на реалните дадености 90 % от жените понасят по-голяма тежест?

6)

Съвместим ли националният закон „RdFunkBeitrStVtrBW“, и по-специално членове 2 и 3, с установените в правото на Съюза принципи на равно третиране и на недопускане на дискриминация в положение, при което вноската, която следва да се плаща безусловно от всеки жител за целите на финансирането на обществен телевизионен оператор, за нуждаещите се от второ жилище лица по свързана с работата причина е двойно по-голяма, отколкото за други работници?

7)

Съвместим ли е националният закон „RdFunkBeitrStVtrBW“, и по-специално членове 2 и 3, с установените в правото на Съюза принципи на равно третиране и на недопускане на дискриминация и със свободата на установяване, ако вноската, която следва да се плаща безусловно от всеки жител за целите на финансирането на обществен телевизионен оператор, е уредена по такъв начин, че при еднаква възможност за приемане на радио- и телевизионно разпространение непосредствено преди границата със съседна държава от ЕС германски гражданин дължи вноската само поради мястото си на пребиваване, докато германският гражданин, живущ непосредствено от другата страна на границата, не дължи вноската, също както гражданинът на друга държава — членка на ЕС, който по свързани с работата причини трябва да се установи непосредствено от другата страна на вътрешна граница на ЕС, понася тежестта на вноската, но не и гражданинът на ЕС, живущ непосредствено преди границата, дори и никой от двамата да не се интересува от приемането на излъчванията на германския оператор?

Коментар по въпрос №4:  допуснат е въпрос за съвместимост с чл.10 от Конвенцията за правата на човека. Съдът за правата на човека вече се е произнасял, има съображения за недопустимост по сходно дело отпреди десетина години –  ето тук съм писала – вж Faccio v Italy – но нека да се произнесе и Съдът на ЕС.

И – отново за характера на таксата: ако  плащат и хората без приемник, това очевидно не е такса в смисъл цена за услуга, а данъчно вземане, по мое мнение това е тенденцията.

Чакаме решението на Съда на ЕС. Нека да се развива и множи практиката.

A serverless solution for invoking AWS Lambda at a sub-minute frequency

Post Syndicated from Emanuele Menga original https://aws.amazon.com/blogs/architecture/a-serverless-solution-for-invoking-aws-lambda-at-a-sub-minute-frequency/

If you’ve used Amazon CloudWatch Events to schedule the invocation of a Lambda function at regular intervals, you may have noticed that the highest frequency possible is one invocation per minute. However, in some cases, you may need to invoke Lambda more often than that. In this blog post, I’ll cover invoking a Lambda function every 10 seconds, but with some simple math you can change to whatever interval you like.

To achieve this, I’ll show you how to leverage Step Functions and Amazon Kinesis Data Streams.

The Solution

For this example, I’ve created a Step Functions State Machine that invokes our Lambda function 6 times, 10 seconds apart. Such State Machine is then executed once per minute by a CloudWatch Events Rule. This state machine is then executed once per minute by an Amazon CloudWatch Events rule. Finally, the Kinesis Data Stream triggers our Lambda function for each record inserted. The result is our Lambda function being invoked every 10 seconds, indefinitely.

Below is a diagram illustrating how the various services work together.

Step 1: My sampleLambda function doesn’t actually do anything, it just simulates an execution for a few seconds. This is the (Python) code of my dummy function:

import time

import random


def lambda_handler(event, context):

rand = random.randint(1, 3)

print('Running for {} seconds'.format(rand))

time.sleep(rand)

return True

Step 2:

The next step is to create a second Lambda function, that I called Iterator, which has two duties:

  • It keeps track of the current number of iterations, since Step Function doesn’t natively have a state we can use for this purpose.
  • It asynchronously invokes our Lambda function at every loops.

This is the code of the Iterator, adapted from here.

 

import boto3

client = boto3.client('kinesis')

def lambda_handler(event, context):

index = event['iterator']['index'] + 1

response = client.put_record(

StreamName='LambdaSubMinute',

PartitionKey='1',

Data='',

)

return {

'index': index,

'continue': index < event['iterator']['count'],

'count': event['iterator']['count']

}

This function does three things:

  • Increments the counter.
  • Verifies if we reached a count of (in this example) 6.
  • Sends an empty record to the Kinesis Stream.

Now we can create the Step Functions State Machine; the definition is, again, adapted from here.

 

{

"Comment": "Invoke Lambda every 10 seconds",

"StartAt": "ConfigureCount",

"States": {

"ConfigureCount": {

"Type": "Pass",

"Result": {

"index": 0,

"count": 6

},

"ResultPath": "$.iterator",

"Next": "Iterator"

},

"Iterator": {

"Type": "Task",

"Resource": “arn:aws:lambda:REGION:ACCOUNT_ID:function:Iterator",

"ResultPath": "$.iterator",

"Next": "IsCountReached"

},

"IsCountReached": {

"Type": "Choice",

"Choices": [

{

"Variable": "$.iterator.continue",

"BooleanEquals": true,

"Next": "Wait"

}

],

"Default": "Done"

},

"Wait": {

"Type": "Wait",

"Seconds": 10,

"Next": "Iterator"

},

"Done": {

"Type": "Pass",

"End": true

}

}

}

This is how it works:

  1. The state machine starts and sets the index at 0 and the count at 6.
  2. Iterator function is invoked.
  3. If the iterator function reached the end of the loop, the IsCountReached state terminates the execution, otherwise the machine waits for 10 seconds.
  4. The machine loops back to the iterator.

Step 3: Create an Amazon CloudWatch Events rule scheduled to trigger every minute and add the state machine as its target. I’ve actually prepared an Amazon CloudFormation template that creates the whole stack and starts the Lambda invocations, you can find it here.

Performance

Let’s have a look at a sample series of invocations and analyse how precise the timing is. In the following chart I reported the delay (in excess of the expected 10-second-wait) of 30 consecutive invocations of my dummy function, when the Iterator is configured with a memory size of 1024MB.

Invocations Delay

Notice the delay increases by a few hundred milliseconds at every invocation. The good news is it accrues only within the same loop, 6 times; after that, a new CloudWatch Events kicks in and it resets.

This delay  is due to the work that AWS Step Function does outside of the Wait state, the main component of which is the Iterator function itself, that runs synchronously in the state machine and therefore adds up its duration to the 10-second-wait.

As we can easily imagine, the memory size of the Iterator Lambda function does make a difference. Here are the Average and Maximum duration of the function with 256MB, 512MB, 1GB and 2GB of memory.

Average Duration

Maximum Duration


Given those results, I’d say that a memory of 1024MB is a good compromise between costs and performance.

Caveats

As mentioned, in our Amazon CloudWatch Events documentation, in rare cases a rule can be triggered twice, causing two parallel executions of the state machine. If that is a concern, we can add a task state at the beginning of the state machine that checks if any other executions are currently running. If the outcome is positive, then a choice state can immediately terminate the flow. Since the state machine is invoked every 60 seconds and runs for about 50, it is safe to assume that executions should all be sequential and any parallel executions should be treated as duplicates. The task state that checks for current running executions can be a Lambda function similar to the following:

 

import boto3

client = boto3.client('stepfunctions')

def lambda_handler(event, context):

response = client.list_executions(

stateMachineArn='arn:aws:states:REGION:ACCOUNTID:stateMachine:LambdaSubMinute',

statusFilter='RUNNING'

)

return {

'alreadyRunning': len(response['executions']) > 0

}

About the Author

Emanuele Menga, Cloud Support Engineer

 

Analyze Apache Parquet optimized data using Amazon Kinesis Data Firehose, Amazon Athena, and Amazon Redshift

Post Syndicated from Roy Hasson original https://aws.amazon.com/blogs/big-data/analyzing-apache-parquet-optimized-data-using-amazon-kinesis-data-firehose-amazon-athena-and-amazon-redshift/

Amazon Kinesis Data Firehose is the easiest way to capture and stream data into a data lake built on Amazon S3. This data can be anything—from AWS service logs like AWS CloudTrail log files, Amazon VPC Flow Logs, Application Load Balancer logs, and others. It can also be IoT events, game events, and much more. To efficiently query this data, a time-consuming ETL (extract, transform, and load) process is required to massage and convert the data to an optimal file format, which increases the time to insight. This situation is less than ideal, especially for real-time data that loses its value over time.

To solve this common challenge, Kinesis Data Firehose can now save data to Amazon S3 in Apache Parquet or Apache ORC format. These are optimized columnar formats that are highly recommended for best performance and cost-savings when querying data in S3. This feature directly benefits you if you use Amazon Athena, Amazon Redshift, AWS Glue, Amazon EMR, or any other big data tools that are available from the AWS Partner Network and through the open-source community.

Amazon Connect is a simple-to-use, cloud-based contact center service that makes it easy for any business to provide a great customer experience at a lower cost than common alternatives. Its open platform design enables easy integration with other systems. One of those systems is Amazon Kinesis—in particular, Kinesis Data Streams and Kinesis Data Firehose.

What’s really exciting is that you can now save events from Amazon Connect to S3 in Apache Parquet format. You can then perform analytics using Amazon Athena and Amazon Redshift Spectrum in real time, taking advantage of this key performance and cost optimization. Of course, Amazon Connect is only one example. This new capability opens the door for a great deal of opportunity, especially as organizations continue to build their data lakes.

Amazon Connect includes an array of analytics views in the Administrator dashboard. But you might want to run other types of analysis. In this post, I describe how to set up a data stream from Amazon Connect through Kinesis Data Streams and Kinesis Data Firehose and out to S3, and then perform analytics using Athena and Amazon Redshift Spectrum. I focus primarily on the Kinesis Data Firehose support for Parquet and its integration with the AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift.

Solution overview

Here is how the solution is laid out:

 

 

The following sections walk you through each of these steps to set up the pipeline.

1. Define the schema

When Kinesis Data Firehose processes incoming events and converts the data to Parquet, it needs to know which schema to apply. The reason is that many times, incoming events contain all or some of the expected fields based on which values the producers are advertising. A typical process is to normalize the schema during a batch ETL job so that you end up with a consistent schema that can easily be understood and queried. Doing this introduces latency due to the nature of the batch process. To overcome this issue, Kinesis Data Firehose requires the schema to be defined in advance.

To see the available columns and structures, see Amazon Connect Agent Event Streams. For the purpose of simplicity, I opted to make all the columns of type String rather than create the nested structures. But you can definitely do that if you want.

The simplest way to define the schema is to create a table in the Amazon Athena console. Open the Athena console, and paste the following create table statement, substituting your own S3 bucket and prefix for where your event data will be stored. A Data Catalog database is a logical container that holds the different tables that you can create. The default database name shown here should already exist. If it doesn’t, you can create it or use another database that you’ve already created.

CREATE EXTERNAL TABLE default.kfhconnectblog (
  awsaccountid string,
  agentarn string,
  currentagentsnapshot string,
  eventid string,
  eventtimestamp string,
  eventtype string,
  instancearn string,
  previousagentsnapshot string,
  version string
)
STORED AS parquet
LOCATION 's3://your_bucket/kfhconnectblog/'
TBLPROPERTIES ("parquet.compression"="SNAPPY")

That’s all you have to do to prepare the schema for Kinesis Data Firehose.

2. Define the data streams

Next, you need to define the Kinesis data streams that will be used to stream the Amazon Connect events.  Open the Kinesis Data Streams console and create two streams.  You can configure them with only one shard each because you don’t have a lot of data right now.

3. Define the Kinesis Data Firehose delivery stream for Parquet

Let’s configure the Data Firehose delivery stream using the data stream as the source and Amazon S3 as the output. Start by opening the Kinesis Data Firehose console and creating a new data delivery stream. Give it a name, and associate it with the Kinesis data stream that you created in Step 2.

As shown in the following screenshot, enable Record format conversion (1) and choose Apache Parquet (2). As you can see, Apache ORC is also supported. Scroll down and provide the AWS Glue Data Catalog database name (3) and table names (4) that you created in Step 1. Choose Next.

To make things easier, the output S3 bucket and prefix fields are automatically populated using the values that you defined in the LOCATION parameter of the create table statement from Step 1. Pretty cool. Additionally, you have the option to save the raw events into another location as defined in the Source record S3 backup section. Don’t forget to add a trailing forward slash “ / “ so that Data Firehose creates the date partitions inside that prefix.

On the next page, in the S3 buffer conditions section, there is a note about configuring a large buffer size. The Parquet file format is highly efficient in how it stores and compresses data. Increasing the buffer size allows you to pack more rows into each output file, which is preferred and gives you the most benefit from Parquet.

Compression using Snappy is automatically enabled for both Parquet and ORC. You can modify the compression algorithm by using the Kinesis Data Firehose API and update the OutputFormatConfiguration.

Be sure to also enable Amazon CloudWatch Logs so that you can debug any issues that you might run into.

Lastly, finalize the creation of the Firehose delivery stream, and continue on to the next section.

4. Set up the Amazon Connect contact center

After setting up the Kinesis pipeline, you now need to set up a simple contact center in Amazon Connect. The Getting Started page provides clear instructions on how to set up your environment, acquire a phone number, and create an agent to accept calls.

After setting up the contact center, in the Amazon Connect console, choose your Instance Alias, and then choose Data Streaming. Under Agent Event, choose the Kinesis data stream that you created in Step 2, and then choose Save.

At this point, your pipeline is complete.  Agent events from Amazon Connect are generated as agents go about their day. Events are sent via Kinesis Data Streams to Kinesis Data Firehose, which converts the event data from JSON to Parquet and stores it in S3. Athena and Amazon Redshift Spectrum can simply query the data without any additional work.

So let’s generate some data. Go back into the Administrator console for your Amazon Connect contact center, and create an agent to handle incoming calls. In this example, I creatively named mine Agent One. After it is created, Agent One can get to work and log into their console and set their availability to Available so that they are ready to receive calls.

To make the data a bit more interesting, I also created a second agent, Agent Two. I then made some incoming and outgoing calls and caused some failures to occur, so I now have enough data available to analyze.

5. Analyze the data with Athena

Let’s open the Athena console and run some queries. One thing you’ll notice is that when we created the schema for the dataset, we defined some of the fields as Strings even though in the documentation they were complex structures.  The reason for doing that was simply to show some of the flexibility of Athena to be able to parse JSON data. However, you can define nested structures in your table schema so that Kinesis Data Firehose applies the appropriate schema to the Parquet file.

Let’s run the first query to see which agents have logged into the system.

The query might look complex, but it’s fairly straightforward:

WITH dataset AS (
  SELECT 
    from_iso8601_timestamp(eventtimestamp) AS event_ts,
    eventtype,
    -- CURRENT STATE
    json_extract_scalar(
      currentagentsnapshot,
      '$.agentstatus.name') AS current_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        currentagentsnapshot,
        '$.agentstatus.starttimestamp')) AS current_starttimestamp,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.firstname') AS current_firstname,
    json_extract_scalar(
      currentagentsnapshot,
      '$.configuration.lastname') AS current_lastname,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.username') AS current_username,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') AS               current_outboundqueue,
    json_extract_scalar(
      currentagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as current_inboundqueue,
    -- PREVIOUS STATE
    json_extract_scalar(
      previousagentsnapshot, 
      '$.agentstatus.name') as prev_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        previousagentsnapshot, 
       '$.agentstatus.starttimestamp')) as prev_starttimestamp,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.firstname') as prev_firstname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.lastname') as prev_lastname,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.username') as prev_username,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.defaultoutboundqueue.name') as current_outboundqueue,
    json_extract_scalar(
      previousagentsnapshot, 
      '$.configuration.routingprofile.inboundqueues[0].name') as prev_inboundqueue
  from kfhconnectblog
  where eventtype <> 'HEART_BEAT'
)
SELECT
  current_status as status,
  current_username as username,
  event_ts
FROM dataset
WHERE eventtype = 'LOGIN' AND current_username <> ''
ORDER BY event_ts DESC

The query output looks something like this:

Here is another query that shows the sessions each of the agents engaged with. It tells us where they were incoming or outgoing, if they were completed, and where there were missed or failed calls.

WITH src AS (
  SELECT
     eventid,
     json_extract_scalar(currentagentsnapshot, '$.configuration.username') as username,
     cast(json_extract(currentagentsnapshot, '$.contacts') AS ARRAY(JSON)) as c,
     cast(json_extract(previousagentsnapshot, '$.contacts') AS ARRAY(JSON)) as p
  from kfhconnectblog
),
src2 AS (
  SELECT *
  FROM src CROSS JOIN UNNEST (c, p) AS contacts(c_item, p_item)
),
dataset AS (
SELECT 
  eventid,
  username,
  json_extract_scalar(c_item, '$.contactid') as c_contactid,
  json_extract_scalar(c_item, '$.channel') as c_channel,
  json_extract_scalar(c_item, '$.initiationmethod') as c_direction,
  json_extract_scalar(c_item, '$.queue.name') as c_queue,
  json_extract_scalar(c_item, '$.state') as c_state,
  from_iso8601_timestamp(json_extract_scalar(c_item, '$.statestarttimestamp')) as c_ts,
  
  json_extract_scalar(p_item, '$.contactid') as p_contactid,
  json_extract_scalar(p_item, '$.channel') as p_channel,
  json_extract_scalar(p_item, '$.initiationmethod') as p_direction,
  json_extract_scalar(p_item, '$.queue.name') as p_queue,
  json_extract_scalar(p_item, '$.state') as p_state,
  from_iso8601_timestamp(json_extract_scalar(p_item, '$.statestarttimestamp')) as p_ts
FROM src2
)
SELECT 
  username,
  c_channel as channel,
  c_direction as direction,
  p_state as prev_state,
  c_state as current_state,
  c_ts as current_ts,
  c_contactid as id
FROM dataset
WHERE c_contactid = p_contactid
ORDER BY id DESC, current_ts ASC

The query output looks similar to the following:

6. Analyze the data with Amazon Redshift Spectrum

With Amazon Redshift Spectrum, you can query data directly in S3 using your existing Amazon Redshift data warehouse cluster. Because the data is already in Parquet format, Redshift Spectrum gets the same great benefits that Athena does.

Here is a simple query to show querying the same data from Amazon Redshift. Note that to do this, you need to first create an external schema in Amazon Redshift that points to the AWS Glue Data Catalog.

SELECT 
  eventtype,
  json_extract_path_text(currentagentsnapshot,'agentstatus','name') AS current_status,
  json_extract_path_text(currentagentsnapshot, 'configuration','firstname') AS current_firstname,
  json_extract_path_text(currentagentsnapshot, 'configuration','lastname') AS current_lastname,
  json_extract_path_text(
    currentagentsnapshot,
    'configuration','routingprofile','defaultoutboundqueue','name') AS current_outboundqueue,
FROM default_schema.kfhconnectblog

The following shows the query output:

Summary

In this post, I showed you how to use Kinesis Data Firehose to ingest and convert data to columnar file format, enabling real-time analysis using Athena and Amazon Redshift. This great feature enables a level of optimization in both cost and performance that you need when storing and analyzing large amounts of data. This feature is equally important if you are investing in building data lakes on AWS.

 


Additional Reading

If you found this post useful, be sure to check out Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight and Work with partitioned data in AWS Glue.


About the Author

Roy Hasson is a Global Business Development Manager for AWS Analytics. He works with customers around the globe to design solutions to meet their data processing, analytics and business intelligence needs. Roy is big Manchester United fan cheering his team on and hanging out with his family.

 

 

 

Mayank Sinha’s home security project

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/home-security/

Yesterday, I received an email from someone called Mayank Sinha, showing us the Raspberry Pi home security project he’s been working on. He got in touch particularly because, he writes, the Raspberry Pi community has given him “immense support” with his build, and he wanted to dedicate it to the commmunity as thanks.

Mayank’s project is named Asfaleia, a Greek word that means safety, certainty, or security against threats. It’s part of an honourable tradition dating all the way back to 2012: it’s a prototype housed in a polystyrene box, using breadboards and jumper leads and sticky tape. And it’s working! Take a look.

Asfaleia DIY Home Security System

An IOT based home security system. The link to the code: https://github.com/mayanksinha11/Asfaleia

Home security with Asfaleida

Asfaleia has a PIR (passive infrared) motion sensor, an IR break beam sensor, and a gas sensor. All are connected to a Raspberry Pi 3 Model B, the latter two via a NodeMCU board. Mayank currently has them set up in a box that’s divided into compartments to model different rooms in a house.

A shallow box divided into four labelled "rooms", all containing electronic components

All the best prototypes have sticky tape or rubber bands

If the IR sensors detect motion or a broken beam, the webcam takes a photo and emails it to the build’s owner, and the build also calls their phone (I like your ringtone, Mayank). If the gas sensor detects a leak, the system activates an exhaust fan via a small relay board, and again the owner receives a phone call. The build can also authenticate users via face and fingerprint recognition. The software that runs it all is written in Python, and you can see Mayank’s code on GitHub.

Of prototypes and works-in-progess

Reading Mayank’s email made me very happy yesterday. We know that thousands of people in our community give a great deal of time and effort to help others learn and make things, and it is always wonderful to see an example of how that support is helping someone turn their ideas into reality. It’s great, too, to see people sharing works-in-progress, as well as polished projects! After all, the average build is more likely to feature rubber bands and Tupperware boxes than meticulously designed laser-cut parts or expert joinery. Mayank’s YouTube channel shows earlier work on this and another Pi project, and I hope he’ll continue to document his builds.

So here’s to Raspberry Pi projects big, small, beginner, professional, endlessly prototyped, unashamedly bodged, unfinished or fully working, shonky or shiny. Please keep sharing them all!

The post Mayank Sinha’s home security project appeared first on Raspberry Pi.

Supply-Chain Security

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/05/supply-chain_se.html

Earlier this month, the Pentagon stopped selling phones made by the Chinese companies ZTE and Huawei on military bases because they might be used to spy on their users.

It’s a legitimate fear, and perhaps a prudent action. But it’s just one instance of the much larger issue of securing our supply chains.

All of our computerized systems are deeply international, and we have no choice but to trust the companies and governments that touch those systems. And while we can ban a few specific products, services or companies, no country can isolate itself from potential foreign interference.

In this specific case, the Pentagon is concerned that the Chinese government demanded that ZTE and Huawei add “backdoors” to their phones that could be surreptitiously turned on by government spies or cause them to fail during some future political conflict. This tampering is possible because the software in these phones is incredibly complex. It’s relatively easy for programmers to hide these capabilities, and correspondingly difficult to detect them.

This isn’t the first time the United States has taken action against foreign software suspected to contain hidden features that can be used against us. Last December, President Trump signed into law a bill banning software from the Russian company Kaspersky from being used within the US government. In 2012, the focus was on Chinese-made Internet routers. Then, the House Intelligence Committee concluded: “Based on available classified and unclassified information, Huawei and ZTE cannot be trusted to be free of foreign state influence and thus pose a security threat to the United States and to our systems.”

Nor is the United States the only country worried about these threats. In 2014, China reportedly banned antivirus products from both Kaspersky and the US company Symantec, based on similar fears. In 2017, the Indian government identified 42 smartphone apps that China subverted. Back in 1997, the Israeli company Check Point was dogged by rumors that its government added backdoors into its products; other of that country’s tech companies have been suspected of the same thing. Even al-Qaeda was concerned; ten years ago, a sympathizer released the encryption software Mujahedeen Secrets, claimed to be free of Western influence and backdoors. If a country doesn’t trust another country, then it can’t trust that country’s computer products.

But this trust isn’t limited to the country where the company is based. We have to trust the country where the software is written — and the countries where all the components are manufactured. In 2016, researchers discovered that many different models of cheap Android phones were sending information back to China. The phones might be American-made, but the software was from China. In 2016, researchers demonstrated an even more devious technique, where a backdoor could be added at the computer chip level in the factory that made the chips ­ without the knowledge of, and undetectable by, the engineers who designed the chips in the first place. Pretty much every US technology company manufactures its hardware in countries such as Malaysia, Indonesia, China and Taiwan.

We also have to trust the programmers. Today’s large software programs are written by teams of hundreds of programmers scattered around the globe. Backdoors, put there by we-have-no-idea-who, have been discovered in Juniper firewalls and D-Link routers, both of which are US companies. In 2003, someone almost slipped a very clever backdoor into Linux. Think of how many countries’ citizens are writing software for Apple or Microsoft or Google.

We can go even farther down the rabbit hole. We have to trust the distribution systems for our hardware and software. Documents disclosed by Edward Snowden showed the National Security Agency installing backdoors into Cisco routers being shipped to the Syrian telephone company. There are fake apps in the Google Play store that eavesdrop on you. Russian hackers subverted the update mechanism of a popular brand of Ukrainian accounting software to spread the NotPetya malware.

In 2017, researchers demonstrated that a smartphone can be subverted by installing a malicious replacement screen.

I could go on. Supply-chain security is an incredibly complex problem. US-only design and manufacturing isn’t an option; the tech world is far too internationally interdependent for that. We can’t trust anyone, yet we have no choice but to trust everyone. Our phones, computers, software and cloud systems are touched by citizens of dozens of different countries, any one of whom could subvert them at the demand of their government. And just as Russia is penetrating the US power grid so they have that capability in the event of hostilities, many countries are almost certainly doing the same thing at the consumer level.

We don’t know whether the risk of Huawei and ZTE equipment is great enough to warrant the ban. We don’t know what classified intelligence the United States has, and what it implies. But we do know that this is just a minor fix for a much larger problem. It’s doubtful that this ban will have any real effect. Members of the military, and everyone else, can still buy the phones. They just can’t buy them on US military bases. And while the US might block the occasional merger or acquisition, or ban the occasional hardware or software product, we’re largely ignoring that larger issue. Solving it borders on somewhere between incredibly expensive and realistically impossible.

Perhaps someday, global norms and international treaties will render this sort of device-level tampering off-limits. But until then, all we can do is hope that this particular arms race doesn’t get too far out of control.

This essay previously appeared in the Washington Post.

Analyze data in Amazon DynamoDB using Amazon SageMaker for real-time prediction

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/

Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.

Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.

DynamoDB is a good fit for low-latency reads and writes, but it’s not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, it’s not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups are accessible from other services.

To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from Bank Marketing Data Set of UCI.

The solution that I describe provides the following benefits:

  • Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
  • Automatically updates your model to get real-time predictions
  • Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
  • Makes it easier for developers of all skill levels to use Amazon SageMaker

All code and data set in this post are available in this .zip file.

Solution architecture

The following diagram shows the overall architecture of the solution.

The steps that data follows through the architecture are as follows:

  1. Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3
  2. Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
  3. Amazon SageMaker renews the model artifact and update the endpoint.
  4. The converted CSV is available for ad hoc queries with Amazon Athena.
  5. Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.

Building the auto-updating model

This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.

Download sample scripts and data

Before you begin, take the following steps:

  1. Download sample scripts in this .zip file.
  2. Unzip the src.zip file.
  3. Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon ML. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own path.
  4. Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.

For this solution, the banking.csv  should be imported into a DynamoDB table.

Export a DynamoDB table

To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.

One core node(m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity to let Data Pipeline scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.

For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.

Add the script to an existing pipeline

After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:

  1. Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
  2. For Actions, choose Edit.
  3. In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.

Paste the following command into the new step after the data ­­upload step:

s3://#{myDDBRegion}.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket name>/automation_script.sh,#{output.directoryPath},#{myDDBRegion}

The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.

The bash script has two goals, converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.

Automation script: Convert JSON data to CSV with Hive

We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.

When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.

Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.

The full automation script for converting follows. Add your own bucket name and data source path in the highlighted areas.

#!/bin/bash
hive -e "
ADD jar s3://<your bucket name>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar ; 
DROP TABLE IF EXISTS blog_backup_data ;
CREATE EXTERNAL TABLE blog_backup_data (
 customer_id map<string,string>,
 age map<string,string>, job map<string,string>, 
 marital map<string,string>,education map<string,string>, 
 default map<string,string>, housing map<string,string>,
 loan map<string,string>, contact map<string,string>, 
 month map<string,string>, day_of_week map<string,string>, 
 duration map<string,string>, campaign map<string,string>,
 pdays map<string,string>, previous map<string,string>, 
 poutcome map<string,string>, emp_var_rate map<string,string>, 
 cons_price_idx map<string,string>, cons_conf_idx map<string,string>,
 euribor3m map<string,string>, nr_employed map<string,string>, 
 y map<string,string> ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '$1/';

INSERT OVERWRITE DIRECTORY 's3://<your bucket name>/<datasource path>/' 
SELECT concat( customer_id['s'],',', 
 age['n'],',', job['s'],',', 
 marital['s'],',', education['s'],',', default['s'],',', 
 housing['s'],',', loan['s'],',', contact['s'],',', 
 month['s'],',', day_of_week['s'],',', duration['n'],',', 
 campaign['n'],',',pdays['n'],',',previous['n'],',', 
 poutcome['s'],',', emp_var_rate['n'],',', cons_price_idx['n'],',',
 cons_conf_idx['n'],',', euribor3m['n'],',', nr_employed['n'],',', y['n'] ) 
FROM blog_backup_data
WHERE customer_id['s'] > 0 ; 

After creating an external table, you need to read data. You then use the INSERT OVERWRITE DIRECTORY ~ SELECT command to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.

Depending on your requirements, you can eliminate or process the columns in the SELECT clause in this step to optimize data analysis. For example, you might remove some columns that have unpredictable correlations with the target value because keeping the wrong columns might expose your model to “overfitting” during the training. In this post, customer_id  columns is removed. Overfitting can make your prediction weak. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.

Automation script: Renew the Amazon SageMaker model

After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3.  For renewing model artifact, you must create a new training job.  Training jobs can be run using the AWS SDK ( for example, Amazon SageMaker boto3 ) or the Amazon SageMaker Python SDK that can be installed with “pip install sagemaker” command as well as the AWS CLI for Amazon SageMaker described in this post.

In addition, consider how to smoothly renew your existing model without service impact, because your model is called by applications in real time. To do this, you need to create a new endpoint configuration first and update a current endpoint with the endpoint configuration that is just created.

#!/bin/bash
## Define variable 
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>" 


# Select containers image based on region.  
case "$REGION" in
"us-west-2" )
    IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
    ;;
"us-east-1" )
    IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest" 
    ;;
"us-east-2" )
    IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest" 
    ;;
"eu-west-1" )
    IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest" 
    ;;
 *)
    echo "Invalid Region Name"
    exit 1 ;  
esac

# Start training job and creating model artifact 
TRAINING_JOB_NAME=TRAIN-${DTTIME} 
S3OUTPUT="s3://<your bucket name>/model/" 
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5 
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE}  --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None"  }]'  --output-data-config S3OutputPath=${S3OUTPUT} --resource-config  InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier  

# Wait until job completed 
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME}  --region ${REGION}

# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME}  --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT}  --execution-role-arn ${ROLE}

# create a new endpoint configuration 
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker  create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME}  --production-variants  VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge

# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name  ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ $STATUS -ne "InService" ]] ;
then
    aws sagemaker  create-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}    
else
    aws sagemaker  update-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi

Grant permission

Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "sagemaker:CreateTrainingJob",
 "sagemaker:DescribeTrainingJob",
 "sagemaker:CreateModel",
 "sagemaker:CreateEndpointConfig",
 "sagemaker:DescribeEndpoint",
 "sagemaker:CreateEndpoint",
 "sagemaker:UpdateEndpoint",
 "iam:PassRole"
 ],
 "Resource": "*"
 }
 ]
}

Use real-time prediction

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.

Following, I provide a simple Python code example that queries against Amazon SageMaker endpoint URL with its name (“ServiceEndpoint”) and then uses them for real-time prediction.

=== Python sample for real-time prediction ===

#!/usr/bin/env python
import boto3
import json 

client = boto3.client('sagemaker-runtime', region_name ='<your region>' )
new_customer_info = '34,10,2,4,1,2,1,1,6,3,190,1,3,4,3,-1.7,94.055,-39.8,0.715,4991.6'
response = client.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    Body=new_customer_info, 
    ContentType='text/csv'
)
result = json.loads(response['Body'].read().decode())
print(result)
--- output(response) ---
{u'predictions': [{u'score': 0.7528127431869507, u'predicted_label': 1.0}]}

Solution summary

The solution takes the following steps:

  1. Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data.Note: You should select only meaningful attributes when you convert CSV. For example, if you judge that the “campaign” attribute is not correlated, you can eliminate this attribute from the CSV.
  2. Train the Amazon SageMaker model with the new data source.
  3. When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on “predictedScores” provided by Amazon SageMaker.
  4. If the new user subscribes your new product, your application must update the attribute “y” to the value 1 (for yes). This updated data is provided for the next model renewal as a new data source. It serves to improve the accuracy of your prediction. With each new entry, your application can become smarter and deliver better predictions.

Running ad hoc queries using Amazon Athena

Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.

With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog. 

Creating an Amazon Athena table and running it

Simply, you can create an EXTERNAL table for the CSV data on S3 in Amazon Athena Management Console.

=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
 age int, 
 job string, 
 marital string , 
 education string, 
 default string, 
 housing string, 
 loan string, 
 contact string, 
 month string, 
 day_of_week string, 
 duration int, 
 campaign int, 
 pdays int , 
 previous int , 
 poutcome string, 
 emp_var_rate double, 
 cons_price_idx double,
 cons_conf_idx double, 
 euribor3m double, 
 nr_employed double, 
 y int 
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
LOCATION 's3://<your bucket name>/<datasource path>/';

The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.

=== Sample Query ===

SELECT corr(age,y) AS correlation_age_and_target, 
 corr(duration,y) AS correlation_duration_and_target, 
 corr(campaign,y) AS correlation_campaign_and_target,
 corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y , 
 CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact 
 FROM datasource 
 ) datasource ;

Conclusion

In this post, I introduce an example of how to analyze data in DynamoDB by using table data in Amazon S3 to optimize DynamoDB table read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also present how to automate these procedures by using Data Pipeline.

You can adapt this example to your specific use case at hand, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.

 


Additional Reading

If you found this post useful, be sure to check out Serving Real-Time Machine Learning Predictions on Amazon EMR and Analyzing Data in S3 using Amazon Athena.

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”

 

 

[$] Reworking page-table traversal

Post Syndicated from corbet original https://lwn.net/Articles/753267/rss

A system’s page tables are organized into a tree that is as many as five
levels deep. In many ways those levels are all similar, but the kernel
treats them all as being different, with the result that page-table
manipulations include a fair amount of repetitive code. During the
memory-management track of the 2018 Linux Storage, Filesystem, and
Memory-Management Summit, Kirill Shutemov proposed reworking how page
tables are maintained. The idea was popular, but the implementation is
likely to be tricky.

3D-printed speakers from the Technical University of Denmark

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/technical-university-denmark-speakers/

Students taking Design of Mechatronics at the Technical University of Denmark have created some seriously elegant and striking Raspberry Pi speakers. Their builds are part of a project asking them to “explore, design and build a 3D printed speaker, around readily available electronics and components”.

The students have been uploading their designs, incorporating Raspberry Pis and HiFiBerry HATs, to Thingiverse throughout April. The task is a collaboration with luxury brand Bang & Olufsen’s Create initiative, and the results wouldn’t look out of place in a high-end showroom; I’d happily take any of these home.

The Sphere

Søren Qvist Sphere 3D-printed laser-cut Raspberry Pi Speaker
Søren Qvist Sphere 3D-printed laser-cut Raspberry Pi Speaker
Søren Qvist Sphere 3D-printed laser-cut Raspberry Pi Speaker

Søren Qvist’s wall-mounted kitchen sphere uses 3D-printed and laser-cut parts, along with the HiFiBerry HAT and B&O speakers to create a sleek-looking design.

Hex One

Otto Ømann Hex One 3D-printed laser-cut Raspberry Pi Speaker

Otto Ømann Hex One 3D-printed laser-cut Raspberry Pi Speaker

Otto Ømann’s group have designed the Hex One – a work-in-progress wireless 360° speaker. A particular objective for their project is to create a speaker using as many 3D-printed parts as possible.

Portable B&O-Create Speaker



“The design is supposed to resemble that of a B&O speaker, and from a handful of categories we chose to create a portable and wearable speaker,” explain Gustav Larsen and his team.

Desktop Loudspeaker

Oliver Repholtz Behrens loudspeaker

Oliver Repholtz Behrens loudspeaker

Oliver Repholtz Behrens and team have housed a Raspberry Pi and HiFiBerry HAT inside this this stylish airplay speaker. You can follow their design progress on their team blog.

B&O TILE



Tue Thomsen’s six-person team Mechatastic have produced the B&O TILE. “The speaker consists of four 3D-printed cabinet and top parts, where the top should be covered by fabric,” they explain. “The speaker insides consists of laser-cut wood to hold the tweeter and driver and encase the Raspberry Pi.”

The team aimed to design a speaker that would be at home in a kitchen. With a removable upper casing allowing for a choice of colour, the TILE can be customised to fit particular tastes and colour schemes.

Build your own speakers with Raspberry Pis

Raspberry Pi’s onboard audio jack, along with third-party HATs such as the HiFiBerry and Pimoroni Speaker pHAT, make speaker design and fabrication with the Pi an interesting alternative to pre-made tech. These builds don’t tend to be technically complex, and they provide some lovely examples of tech-based projects that reflect makers’ own particular aesthetic style.

If you have access to a 3D printer or a laser cutter, perhaps at a nearby maker space, then those can be excellent resources, but fancy kit isn’t a requirement. Basic joinery and crafting with card or paper are just a couple of ways you can build things that are all your own, using familiar tools and materials. We think more people would enjoy getting hands-on with this sort of thing if they gave it a whirl, and we publish a free magazine to help.

Raspberry Pi Zero AirPlay Speaker

Looking for a new project to build around the Raspberry Pi Zero, I came across the pHAT DAC from Pimoroni. This little add-on board adds audio playback capabilities to the Pi Zero. Because the pHAT uses the GPIO pins, the USB OTG port remains available for a wifi dongle.

This video by Frederick Vandenbosch is a great example of building AirPlay speakers using a Pi and HAT, and a quick search will find you lots more relevant tutorials and ideas.

Have you built your own? Share your speaker-based Pi builds with us in the comments.

The post 3D-printed speakers from the Technical University of Denmark appeared first on Raspberry Pi.

Hard Drive Stats for Q1 2018

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/hard-drive-stats-for-q1-2018/

Backblaze Drive Stats Q1 2018

As of March 31, 2018 we had 100,110 spinning hard drives. Of that number, there were 1,922 boot drives and 98,188 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We’ll also take a look at why we are collecting and reporting 10 new SMART attributes and take a sneak peak at some 8 TB Toshiba drives. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.

Background

Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are about 97 million entries totaling 26 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.

Hard Drive Reliability Statistics for Q1 2018

At the end of Q1 2018 Backblaze was monitoring 98,188 hard drives used to store data. For our evaluation below we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,046 hard drives. The table below covers just Q1 2018.

Q1 2018 Hard Drive Failure Rates

Notes and Observations

If a drive model has a failure rate of 0%, it only means there were no drive failures of that model during Q1 2018.

The overall Annualized Failure Rate (AFR) for Q1 is just 1.2%, well below the Q4 2017 AFR of 1.65%. Remember that quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.

There were 142 drives (98,188 minus 98,046) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics.

Welcome Toshiba 8TB drives, almost…

We mentioned Toshiba 8 TB drives in the first paragraph, but they don’t show up in the Q1 Stats chart. What gives? We only had 20 of the Toshiba 8 TB drives in operation in Q1, so they were excluded from the chart. Why do we have only 20 drives? When we test out a new drive model we start with the “tome test” and it takes 20 drives to fill one tome. A tome is the same drive model in the same logical position in each of the 20 Storage Pods that make up a Backblaze Vault. There are 60 tomes in each vault.

In this test, we created a Backblaze Vault of 8 TB drives, with 59 of the tomes being Seagate 8 TB drives and 1 tome being the Toshiba drives. Then we monitored the performance of the vault and its member tomes to see if, in this case, the Toshiba drives performed as expected.

Q1 2018 Hard Drive Failure Rate — Toshiba 8TB

So far the Toshiba drive is performing fine, but they have been in place for only 20 days. Next up is the “pod test” where we fill a Storage Pod with Toshiba drives and integrate it into a Backblaze Vault comprised of like-sized drives. We hope to have a better look at the Toshiba 8 TB drives in our Q2 report — stay tuned.

Lifetime Hard Drive Reliability Statistics

While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models which have 45 or more drives in operation as of March 31st, 2018. For each model, we compute their reliability starting from when they were first installed.

Lifetime Hard Drive Failure Rates

Notes and Observations

The failure rates of all of the larger drives (8-, 10- and 12 TB) are very good, 1.2% AFR (Annualized Failure Rate) or less. Many of these drives were deployed in the last year, so there is some volatility in the data, but you can use the Confidence Interval to get a sense of the failure percentage range.

The overall failure rate of 1.84% is the lowest we have ever achieved, besting the previous low of 2.00% from the end of 2017.

Our regular readers and drive stats wonks may have noticed a sizable jump in the number of HGST 8 TB drives (model: HUH728080ALE600), from 45 last quarter to 1,045 this quarter. As the 10 TB and 12 TB drives become more available, the price per terabyte of the 8 TB drives has gone down. This presented an opportunity to purchase the HGST drives at a price in line with our budget.

We purchased and placed into service the 45 original HGST 8 TB drives in Q2 of 2015. They were our first Helium-filled drives and our only ones until the 10 TB and 12 TB Seagate drives arrived in Q3 2017. We’ll take a first look into whether or not Helium makes a difference in drive failure rates in an upcoming blog post.

New SMART Attributes

If you have previously worked with the hard drive stats data or plan to, you’ll notice that we added 10 more columns of data starting in 2018. There are 5 new SMART attributes we are tracking each with a raw and normalized value:

  • 177 – Wear Range Delta
  • 179 – Used Reserved Block Count Total
  • 181- Program Fail Count Total or Non-4K Aligned Access Count
  • 182 – Erase Fail Count
  • 235 – Good Block Count AND System(Free) Block Count

The 5 values are all related to SSD drives.

Yes, SSD drives, but before you jump to any conclusions, we used 10 Samsung 850 EVO SSDs as boot drives for a period of time in Q1. This was an experiment to see if we could reduce boot up time for the Storage Pods. In our case, the improved boot up speed wasn’t worth the SSD cost, but it did add 10 new columns to the hard drive stats data.

Speaking of hard drive stats data, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose, all we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.

If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.

Good luck and let us know if you find anything interesting.

[Ed: 5/1/2018 – Updated Lifetime chart to fix error in confidence interval for HGST 4TB drive, model: HDS5C4040ALE630]

The post Hard Drive Stats for Q1 2018 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

10 visualizations to try in Amazon QuickSight with sample data

Post Syndicated from Karthik Kumar Odapally original https://aws.amazon.com/blogs/big-data/10-visualizations-to-try-in-amazon-quicksight-with-sample-data/

If you’re not already familiar with building visualizations for quick access to business insights using Amazon QuickSight, consider this your introduction. In this post, we’ll walk through some common scenarios with sample datasets to provide an overview of how you can connect yuor data, perform advanced analysis and access the results from any web browser or mobile device.

The following visualizations are built from the public datasets available in the links below. Before we jump into that, let’s take a look at the supported data sources, file formats and a typical QuickSight workflow to build any visualization.

Which data sources does Amazon QuickSight support?

At the time of publication, you can use the following data methods:

  • Connect to AWS data sources, including:
    • Amazon RDS
    • Amazon Aurora
    • Amazon Redshift
    • Amazon Athena
    • Amazon S3
  • Upload Excel spreadsheets or flat files (CSV, TSV, CLF, and ELF)
  • Connect to on-premises databases like Teradata, SQL Server, MySQL, and PostgreSQL
  • Import data from SaaS applications like Salesforce and Snowflake
  • Use big data processing engines like Spark and Presto

This list is constantly growing. For more information, see Supported Data Sources.

Answers in instants

SPICE is the Amazon QuickSight super-fast, parallel, in-memory calculation engine, designed specifically for ad hoc data visualization. SPICE stores your data in a system architected for high availability, where it is saved until you choose to delete it. Improve the performance of database datasets by importing the data into SPICE instead of using a direct database query. To calculate how much SPICE capacity your dataset needs, see Managing SPICE Capacity.

Typical Amazon QuickSight workflow

When you create an analysis, the typical workflow is as follows:

  1. Connect to a data source, and then create a new dataset or choose an existing dataset.
  2. (Optional) If you created a new dataset, prepare the data (for example, by changing field names or data types).
  3. Create a new analysis.
  4. Add a visual to the analysis by choosing the fields to visualize. Choose a specific visual type, or use AutoGraph and let Amazon QuickSight choose the most appropriate visual type, based on the number and data types of the fields that you select.
  5. (Optional) Modify the visual to meet your requirements (for example, by adding a filter or changing the visual type).
  6. (Optional) Add more visuals to the analysis.
  7. (Optional) Add scenes to the default story to provide a narrative about some aspect of the analysis data.
  8. (Optional) Publish the analysis as a dashboard to share insights with other users.

The following graphic illustrates a typical Amazon QuickSight workflow.

Visualizations created in Amazon QuickSight with sample datasets

Visualizations for a data analyst

Source:  https://data.worldbank.org/

Download and Resources:  https://datacatalog.worldbank.org/dataset/world-development-indicators

Data catalog:  The World Bank invests into multiple development projects at the national, regional, and global levels. It’s a great source of information for data analysts.

The following graph shows the percentage of the population that has access to electricity (rural and urban) during 2000 in Asia, Africa, the Middle East, and Latin America.

The following graph shows the share of healthcare costs that are paid out-of-pocket (private vs. public). Also, you can maneuver over the graph to get detailed statistics at a glance.

Visualizations for a trading analyst

Source:  Deutsche Börse Public Dataset (DBG PDS)

Download and resources:  https://aws.amazon.com/public-datasets/deutsche-boerse-pds/

Data catalog:  The DBG PDS project makes real-time data derived from Deutsche Börse’s trading market systems available to the public for free. This is the first time that such detailed financial market data has been shared freely and continually from the source provider.

The following graph shows the market trend of max trade volume for different EU banks. It builds on the data available on XETRA engines, which is made up of a variety of equities, funds, and derivative securities. This graph can be scrolled to visualize trade for a period of an hour or more.

The following graph shows the common stock beating the rest of the maximum trade volume over a period of time, grouped by security type.

Visualizations for a data scientist

Source:  https://catalog.data.gov/

Download and resources:  https://catalog.data.gov/dataset/road-weather-information-stations-788f8

Data catalog:  Data derived from different sensor stations placed on the city bridges and surface streets are a core information source. The road weather information station has a temperature sensor that measures the temperature of the street surface. It also has a sensor that measures the ambient air temperature at the station each second.

The following graph shows the present max air temperature in Seattle from different RWI station sensors.

The following graph shows the minimum temperature of the road surface at different times, which helps predicts road conditions at a particular time of the year.

Visualizations for a data engineer

Source:  https://www.kaggle.com/

Download and resources:  https://www.kaggle.com/datasnaek/youtube-new/data

Data catalog:  Kaggle has come up with a platform where people can donate open datasets. Data engineers and other community members can have open access to these datasets and can contribute to the open data movement. They have more than 350 datasets in total, with more than 200 as featured datasets. It has a few interesting datasets on the platform that are not present at other places, and it’s a platform to connect with other data enthusiasts.

The following graph shows the trending YouTube videos and presents the max likes for the top 20 channels. This is one of the most popular datasets for data engineers.

The following graph shows the YouTube daily statistics for the max views of video titles published during a specific time period.

Visualizations for a business user

Source:  New York Taxi Data

Download and resources:  https://data.cityofnewyork.us/Transportation/2016-Green-Taxi-Trip-Data/hvrh-b6nb

Data catalog: NYC Open data hosts some very popular open data sets for all New Yorkers. This platform allows you to get involved in dive deep into the data set to pull some useful visualizations. 2016 Green taxi trip dataset includes trip records from all trips completed in green taxis in NYC in 2016. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.

The following graph presents maximum fare amount grouped by the passenger count during a period of time during a day. This can be further expanded to follow through different day of the month based on the business need.

The following graph shows the NewYork taxi data from January 2016, showing the dip in the number of taxis ridden on January 23, 2016 across all types of taxis.

A quick search for that date and location shows you the following news report:

Summary

Using Amazon QuickSight, you can see patterns across a time-series data by building visualizations, performing ad hoc analysis, and quickly generating insights. We hope you’ll give it a try today!

 


Additional Reading

If you found this post useful, be sure to check out Amazon QuickSight Adds Support for Combo Charts and Row-Level Security and Visualize AWS Cloudtrail Logs Using AWS Glue and Amazon QuickSight.


Karthik Odapally is a Sr. Solutions Architect in AWS. His passion is to build cost effective and highly scalable solutions on the cloud. In his spare time, he bakes cookies and cupcakes for family and friends here in the PNW. He loves vintage racing cars.

 

 

 

Pranabesh Mandal is a Solutions Architect in AWS. He has over a decade of IT experience. He is passionate about cloud technology and focuses on Analytics. In his spare time, he likes to hike and explore the beautiful nature and wild life of most divine national parks around the United States alongside his wife.