PyCon UK 2018 will take place on Saturday 15 September to Wednesday 19 September in the splendid Cardiff City Hall, just a few miles from the Sony Technology Centre where the vast majority of Raspberry Pis is made. We’re pleased to announce that we’re curating this year’s Education Summit at the conference, where we’ll offer opportunities for young people to learn programming skills, and for educators to undertake professional development!
PyCon UK 2018 is your chance to be welcomed into the wonderful Python community. At the Education Summit, we’ll put on a young coders’ day on the Saturday, and an educators’ day on the Sunday.
Saturday — young coders’ day
On Saturday we’ll be running a CoderDojo full of workshops on Raspberry Pi and micro:bits for young people aged 7 to 17. If they wish, participants will get to make a project and present it to the conference on the main stage, and everyone will be given a free micro:bit to take home!
PyCon UK has been bringing developers and educators together ever since it first started its education track in 2011. This year’s Sunday will be a day of professional development: we’ll give teachers, educators, parents, and coding club leaders the chance to learn from us and from each other to build their programming, computing, and digital making skills.
We invite you to send in your proposal for a talk and workshop at the Education Summit! We’re looking for:
25-minute talks for the educators’ day
50-minute workshops for either the young coders’ or the educators’ day
If you have something you’d like to share, such as a professional development session for educators, advice on best practice for teaching programming, a workshop for up-skilling in Python, or a fun physical computing activity for the CoderDojo, then we’d love to hear about it! Please submit your proposalby 15 June.
After the Education Summit, the conference will continue for two days of talks and a final day of development sprints. Feel free to submit your education-related talk to the main conference too if you want to share it with a wider audience! Check out the PyCon UK 2018 website for more information.
GDPR, или новият Общ регламент относно защитата на данните, е гореща тема, тъй като влиза в сила на 25-ти май. И разбира се, публичното пространство е пълно с мнения и заключения по въпроса. За съжаление повечето от тях са грешни. На база на наблюденията ми от последните месеци реших да извадя 7 мита за Регламента.
От края на миналата година активно консултирам малки и големи компании относно регламента, водя обучения и семинари и пиша технически разяснения. И не, не съм юрист, но Регламентът изисква познаване както на правните, така и на технологичните аспекти на защитата на данните.
1. „GDPR ми е ясен, разбрал съм го“
Най-опасното е човек да мисли, че разбира нещо след като само е чувал за него или е прочел две статии в новинарски сайт (както за GDPR така и в по-общ смисъл). Аз самият все още не твърдя, че познавам всички ъгълчета на Регламента. Но по конференции, кръгли маси, обучения, срещи, форуми и фейсбук групи съм чул и прочел твърде много глупости относно GDPR. И то такива, които могат да се оборят с „Не е вярно, виж чл. Х“. В тази категория за съжаление влизат и юристи, и IT специалисти, и хора на ръководни позиции.
От мита, че познаваме GDPR, произлизат и всички останали митове. Част от вината за това е и на самия Регламент. Дълъг е, чете се трудно, има лоши законодателни практики (3 различни хипотези в едно изречение??) и нито Европейската Комисия, нито някоя друга европейска институция си е направила труда да го разясни за хората, за които се отнася – а именно, за почти всички. Т.нар. „работна група по чл. 29 (от предишната Директива)“ има разяснения по някои въпроси, но те са също толкова дълги и трудно четими ако човек няма контекст. При толкова широкообхватно законодателство е голяма грешка то да се остави нерязяснено. Да, в него има много нюанси и много условности (което е друг негов минус), но е редно поне общите положения да бъдат разказани ясно и то от практическа гледна точка.
Така че не – да не си мислим, че сме разбрали GDPR.
2. „Личните данни са тайна“
Определението за лични данни в Регламента може би характеризира целия Регламент – трудно четима и „увъртяно“:
„лични данни“ означава всяка информация, свързана с идентифицирано физическо лице или физическо лице, което може да бъде идентифицирано („субект на данни“); физическо лице, което може да бъде идентифицирано, е лице, което може да бъде идентифицирано, пряко или непряко, по-специално чрез идентификатор като име, идентификационен номер, данни за местонахождение, онлайн идентификатор или по един или повече признаци, специфични за физическата, физиологичната, генетичната, психическата, умствената, икономическата, културната или социална идентичност на това физическо лице;
Всъщност лични данни са всичко, което се отнася за нас. Включително съвсем очевидни неща като цвят на очи и коса, ръст и т.н. И не, личните данни не са тайна. Имената ни не са тайна, ръстът ни не е тайна. ЕГН-то ни не е тайна (да, не е). Има специални категории лични данни, които могат да бъдат тайна (напр. медицински данни), но за тях има специален ред.
Разграничаването, което GDPR не прави ясно, за разлика от едно разяснение на NIST – има лични данни, на база на които хората могат да бъдат идентифицирани, и такива, с които не могат, но се отнасят за тях. По цвят на косата не можем да бъдем идентифицирани. Но цветът на косата представлява лични данни. По професия не можем да бъдем идентифицирани. (По три имена и професия обаче – евентуално може и да можем). И тук едно много важно нещо, посочено в последните изречения на съображение 26 – данни, които са лични, но не могат да бъдат отнесени към конкретно лице, и на база на които не може да бъде идентифицирано такова, не попадат в обхвата на регламента. И съвсем не са тайна – „имаме 120 клиента на възраст 32 години, които са си купили телефон Sony между Април и Юли“ е напълно окей.
Та, личните данни не са та тайни – някои даже са съвсем явни и видни. Целта на GDPR е да уреди тяхната обработка с автоматизирани средства (или полуавтоматизирани в структуриран вид, т.е. тетрадки). С други думи – кой има право да ги съхранява, за какво има право да ги използва и как трябва да ги съхранява и използва.
3. „GDPR не се отнася за мен“
Няма почти никакви изключения в Регламента. Компании под 250 души не са длъжни да водят едни регистри, а компании, които нямат мащабна обработка и наблюдение на субекти на данни нямат задължение за длъжностно лице по защита на данните (Data protection officer; тази точка е дискусионна с оглед на предложенията за изменения на българския закон за защита на личните данни, които разширяват прекалено много изискванията за DPO). Всичко останало важи за всички, които обработват лични данни. И всички граждани на ЕС имат всички права, посочени в Регламента.
4. „Ще ни глобят 20 милиона евро“
Тези глоби са единствената причина GDPR да е популярен. Ако не бяха те, на никого нямаше да му дреме за поредното европейско законодателство. Обаче заради плашещите глоби всякакви консултанти ходят и обясняват как „ами те глобите, знаете, са до 20 милиона“.
Но колкото и да се повтарят тези 20 милиона (или както някои пресоляват манджата „глоби над 20 милиона евро“), това не ги прави реалистични. Първо, има процес, който всички регулатори ще следват, и който включва няколко стъпки на „препоръки“ преди налагане на глоба. Идва комисията, установява несъответствие, прави препоръки, идва пак, установява взети ли са мерки. И ако сте съвсем недобросъвестни и не направите нищо, тогава идват глобите. И тези глоби са пропорционални на риска и на количеството данни. Не е „добър ден, 20 милиона“. Според мен 20-те милиона ще са само за огромни международни компании, като Google и Facebook, които обработват данни на милиони хора. За тетрадката с вересиите глоба няма да има (правото да бъдеш забравен се реализира със задраскване, но само ако магазинерът няма легитимен интерес да ги съхранява, а именно – да му върнете парите :)).
Тук една скоба за българското законодателство – то предвижда доста високи минимуми на глобите (10 хил. лева). Това се оспорва в рамките на общественото обсъждане и е несъразмерно на минимумите в други европейски държави и се надявам да спадне значително.
5. „Трябва да спрем да обработваме лични данни“
В никакъв случай. GDPR не забранява обработката на лични данни, просто урежда как и кога те да се обработват. Имате право да обработвате всички данни, които са ви нужни, за да си свършите работата.
Някои интернет компании напоследък обявиха, че спират работа заради GDPR, защото не им позволявал да обработват данни. И това в общия случай са глупости. Или те така или иначе са били на загуба и сега си търсят оправдание, или са били такъв разграден двор и са продавали данните ви наляво и надясно без ваше знание и съгласие, че GDPR представлява риск. Но то това му е идеята – да няма такива практики. Защото (както твърди Регламентът) това представлява риск за правата и свободите на субектите на данни (субект на данните – това звучи гордо).
6. „Трябва да искаме съгласие за всичко“
Съгласието на потребителите е само едно от основанията за обработка на данните. Има доста други и те дори са по-често срещани в реалния бизнес. Както отбелязах по-горе, ако можете да докажете легитимен интерес да обработвате данните, за да си свършите работата, може да го правите без съгласие. Имате ли право да събирате адреса и телефона на клиента, ако доставяте храна? Разбира се, иначе не може да му я доставите. Няма нужда от съгласие в този случай (би имало нужда от съгласие ако освен за доставката, ползвате данните му и за други цели). Нужно ли е съгласие за обработка на лични данни в рамките на трудово правоотношение? Не, защото Кодекса на труда изисква работодателят да води трудово досие. Има ли нужда банката да поиска съгласие, за да ви обработва личните данни за кредита? Не, защото те са нужни за изпълнението на договора за кредит (и не, не можете да кажете на банката да ви „забрави“ кредита; правото да бъдеш забравен важи само в някои случаи).
Усещането ми обаче е, че ще плъзнат едни декларации и чекбоксове за съгласие, които ще са напълно излишни…но вж. т.1. А дори когато трябва да ги има, ще бъдат прекалено общи, а не за определени цели (съгласявам се да ми обработвате данните, ама за какво точно?).
7. „Съответсвието с GDPR е трудно и скъпо“
…и съответно Регламентът е голяма административна тежест, излишно натоварване на бизнеса и т.н. Ами не, не е. Съответствието с GDPR изисква осъзната обработка на личните данни. Да, изисква и няколко хартии – политики и процедури, с които да докажете, че знаете какви лични данни обработвате и че ги обработвате съвестно, както и че знаете, че гражданите имат някакви права във връзка с данните си (и че всъщност не вие, а те са собственици на тези данни), но извън това съответствието не е тежко. Е, ако хал хабер си нямате какви данни и бизнес процеси имате, може и да отнеме време да ги вкарате в ред, но това е нещо, което по принцип e добре да се случи, със или без GDPR.
Ако например досега в една болница данните за пациентите са били на незащитен по никакъв начин сървър и всеки е имал достъп до него, без това да оставя следа, и също така е имало още 3-4 сървъра, на които никой не е знаел, че има данни (щото „IT-то“ е напуснало преди 2 години), то да, ще трябват малко усилия.
Но почти всичко в GDPR са „добри практики“ така или иначе. Неща, които са полезни и за самия бизнес, не само за гражданите.
Разбира се, синдромът „по-светец и от Папата“ започва да се наблюдава. Освен компаниите, които са изсипали милиони на юристи, консултанти, доставчици (и което накрая е имало плачевен резултат и се е оказало, че за един месец няколко човека могат да я свършат цялата тая работа) има и такива, които четат Регламента като „по-добре да не даваме никакви данни никъде, за всеки случай“. Презастраховането на големи компании, като Twitter и Facebook например, има риск да „удари“ компании, които зависят от техните данни. Но отново – вж. т.1.
В заключение, GDPR не е нещо страшно, не е нещо лошо и не е „измислица на бюрократите в Брюксел“. Има много какво да се желае откъм яснотата му и предполагам ще има какво да се желае откъм приложението му, но „по принцип“ е окей.
И както става винаги със законодателства, обхващащи много хора и бизнеси – в началото ще има не само 7, а 77 мита, които с времето и с практиката ще се изяснят. Ще има грешки на растежа, има риск (особено в по-малки и корумпирани държави) някой „да го отнесе“, но гледайки голямата картинка, смятам, че с този Регламент след 5 години ще сме по-добре откъм защита на данните и откъм последици от липсата на на такава защита.
It’s been just over three weeks since we launched the new Raspberry Pi 3 Model B+. Although the product is branded Raspberry Pi 3B+ and not Raspberry Pi 4, a serious amount of engineering was involved in creating it. The wireless networking, USB/Ethernet hub, on-board power supplies, and BCM2837 chip were all upgraded: together these represent almost all the circuitry on the board! Today, I’d like to tell you about the work that has gone into creating a custom power supply chip for our newest computer.
The new Raspberry Pi 3B+, sporting a new, custom power supply chip (bottom left-hand corner)
The Raspberry Pi 3B+ has been well received, and we’ve enjoyed hearing feedback from the community as well as reading the various reviews and articles highlighting the solid improvements in wireless networking, Ethernet, CPU, and thermal performance of the new board. Gareth Halfacree’s post here has some particularly nice graphs showing the increased performance as well as how the Pi 3B+ keeps cool under load due to the new CPU package that incorporates a metal heat spreader. The Raspberry Pi production lines at the Sony UK Technology Centre are running at full speed, and it seems most people who want to get hold of the new board are able to find one in stock.
Powering your Pi
One of the most critical but often under-appreciated elements of any electronic product, particularly one such as Raspberry Pi with lots of complex on-board silicon (processor, networking, high-speed memory), is the power supply. In fact, the Raspberry Pi 3B+ has no fewer than six different voltage rails: two at 3.3V — one special ‘quiet’ one for audio, and one for everything else; 1.8V; 1.2V for the LPDDR2 memory; and 1.2V nominal for the CPU core. Note that the CPU voltage is actually raised and lowered on the fly as the speed of the CPU is increased and decreased depending on how hard the it is working. The sixth rail is 5V, which is the master supply that all the others are created from, and the output voltage for the four downstream USB ports; this is what the mains power adaptor is supplying through the micro USB power connector.
Power supply primer
There are two common classes of power supply circuits: linear regulators and switching regulators. Linear regulators work by creating a lower, regulated voltage from a higher one. In simple terms, they monitor the output voltage against an internally generated reference and continually change their own resistance to keep the output voltage constant. Switching regulators work in a different way: they ‘pump’ energy by first storing the energy coming from the source supply in a reactive component (usually an inductor, sometimes a capacitor) and then releasing it to the regulated output supply. The switches in switching regulators effect this energy transfer by first connecting the inductor (or capacitor) to store the source energy, and then switching the circuit so the energy is released to its destination.
Linear regulators produce smoother, less noisy output voltages, but they can only convert to a lower voltage, and have to dissipate energy to do so. The higher the output current and the voltage difference across them is, the more energy is lost as heat. On the other hand, switching supplies can, depending on their design, convert any voltage to any other voltage and can be much more efficient (efficiencies of 90% and above are not uncommon). However, they are more complex and generate noisier output voltages.
Designers use both types of regulators depending on the needs of the downstream circuit: for low-voltage drops, low current, or low noise, linear regulators are usually the right choice, while switching regulators are used for higher power or when efficiency of conversion is required. One of the simplest switching-mode power supply circuits is the buck converter, used to create a lower voltage from a higher one, and this is what we use on the Pi.
A history lesson
The BCM2835 processor chip (found on the original Raspberry Pi Model B and B+, as well as on the Zero products) has on-chip power supplies: one switch-mode regulator for the core voltage, as well as a linear one for the LPDDR2 memory supply. This meant that in addition to 5V, we only had to provide 3.3V and 1.8V on the board, which was relatively simple to do using cheap, off-the-shelf parts.
Pi Zero sporting a BCM2835 processor which only needs 2 external switchers (the components clustered behind the camera port)
When we moved to the BCM2836 for Raspberry Pi Model 2 (and subsequently to the BCM2837A1 and B0 for Raspberry Pi 3B and 3B+), the core supply and the on-chip LPDDR2 memory supply were not up to the job of supplying the extra processor cores and larger memory, so we removed them. (We also used the recovered chip area to help fit in the new quad-core ARM processors.) The upshot of this was that we had to supply these power rails externally for the Raspberry Pi 2 and models thereafter. Moreover, we also had to provide circuitry to sequence them correctly in order to control exactly when they power up compared to the other supplies on the board.
Power supply design is tricky (but critical)
Raspberry Pi boards take in 5V from the micro USB socket and have to generate the other required supplies from this. When 5V is first connected, each of these other supplies must ‘start up’, meaning go from ‘off’, or 0V, to their correct voltage in some short period of time. The order of the supplies starting up is often important: commonly, there are structures inside a chip that form diodes between supply rails, and bringing supplies up in the wrong order can sometimes ‘turn on’ these diodes, causing them to conduct, with undesirable consequences. Silicon chips come with a data sheet specifying what supplies (voltages and currents) are needed and whether they need to be low-noise, in what order they must power up (and in some cases down), and sometimes even the rate at which the voltages must power up and down.
A Pi3. Power supply components are clustered bottom left next to the micro USB, middle (above LPDDR2 chip which is on the bottom of the PCB) and above the A/V jack.
In designing the power chain for the Pi 2 and 3, the sequencing was fairly straightforward: power rails power up in order of voltage (5V, 3.3V, 1.8V, 1.2V). However, the supplies were all generated with individual, discrete devices. Therefore, I spent quite a lot of time designing circuitry to control the sequencing — even with some design tricks to reduce component count, quite a few sequencing components are required. More complex systems generally use a Power Management Integrated Circuit (PMIC) with multiple supplies on a single chip, and many different PMIC variants are made by various manufacturers. Since Raspberry Pi 2 days, I was looking for a suitable PMIC to simplify the Pi design, but invariably (and somewhat counter-intuitively) these were always too expensive compared to my discrete solution, usually because they came with more features than needed.
One device to rule them all
It was way back in May 2015 when I first chatted to Peter Coyle of Exar (Exar were bought by MaxLinear in 2017) about power supply products for Raspberry Pi. We didn’t find a product match then, but in June 2016 Peter, along with Tuomas Hollman and Trevor Latham, visited to pitch the possibility of building a custom power management solution for us.
I was initially sceptical that it could be made cheap enough. However, our discussion indicated that if we could tailor the solution to just what we needed, it could be cost-effective. Over the coming weeks and months, we honed a specification we agreed on from the initial sketches we’d made, and Exar thought they could build it for us at the target price.
The chip we designed would contain all the key supplies required for the Pi on one small device in a cheap QFN package, and it would also perform the required sequencing and voltage monitoring. Moreover, the chip would be flexible to allow adjustment of supply voltages from their default values via I2C; the largest supply would be capable of being adjusted quickly to perform the dynamic core voltage changes needed in order to reduce voltage to the processor when it is idling (to save power), and to boost voltage to the processor when running at maximum speed (1.4 GHz). The supplies on the chip would all be generously specified and could deliver significantly more power than those used on the Raspberry Pi 3. All in all, the chip would contain four switching-mode converters and one low-current linear regulator, this last one being low-noise for the audio circuitry.
The MXL7704 chip
The project was a great success: MaxLinear delivered working samples of first silicon at the end of May 2017 (almost exactly a year after we had kicked off the project), and followed through with production quantities in December 2017 in time for the Raspberry Pi 3B+ production ramp.
Front row: Roger with the very first Pi 3B+ prototypes and James with a MXL7704 development board hacked to power a Pi 3. Back row left to right: Will Torgerson, Trevor Latham, Peter Coyle, Tuomas Hollman.
The MXL7704 device has been key to reducing Pi board complexity and therefore overall bill of materials cost. Furthermore, by being able to deliver more power when needed, it has also been essential to increasing the speed of the (newly packaged) BCM2837B0 processor on the 3B+ to 1.4GHz. The result is improvements to both the continuous output current to the CPU (from 3A to 4A) and to the transient performance (i.e. the chip has helped to reduce the ‘transient response’, which is the change in supply voltage due to a sudden current spike that occurs when the processor suddenly demands a large current in a few nanoseconds, as modern CPUs tend to do).
With the MXL7704, the power supply circuitry on the 3B+ is now a lot simpler than the Pi 3B design. This new supply also provides the LPDDR2 memory voltage directly from a switching regulator rather than using linear regulators like the Pi 3, thereby improving energy efficiency. This helps to somewhat offset the extra power that the faster Ethernet, wireless networking, and processor consume. A pleasing side effect of using the new chip is the symmetric board layout of the regulators — it’s easy to see the four switching-mode supplies, given away by four similar-looking blobs (three grey and one brownish), which are the inductors.
The Pi 3B+ PMIC MXL7704 — pleasingly symmetric
It takes a lot of effort to design a new chip from scratch and get it all the way through to production — we are very grateful to the team at MaxLinear for their hard work, dedication, and enthusiasm. We’re also proud to have created something that will not only power Raspberry Pis, but will also be useful for other product designs: it turns out when you have a low-cost and flexible device, it can be used for many things — something we’re fairly familiar with here at Raspberry Pi! For the curious, the product page (including the data sheet) for the MXL7704 chip is here. Particular thanks go to Peter Coyle, Tuomas Hollman, and Trevor Latham, and also to Jon Cronk, who has been our contact in the US and has had to get up early to attend all our conference calls!
The MXL7704 design team celebrating on Pi Day — it takes a lot of people to design a chip!
I hope you liked reading about some of the effort that has gone into creating the new Pi. It’s nice to finally have a chance to tell people about some of the (increasingly complex) technical work that makes building a $35 computer possible — we’re very pleased with the Raspberry Pi 3B+, and we hope you enjoy using it as much as we’ve enjoyed creating it!
Here’s a long post. We think you’ll find it interesting. If you don’t have time to read it all, we recommend you watch this video, which will fill you in with everything you need, and then head straight to the product page to fill yer boots. (We recommend the video anyway, even if you do have time for a long read. ‘Cos it’s fab.)
Raspberry Pi 3 Model B+ is now on sale now for $35, featuring: – A 1.4GHz 64-bit quad-core ARM Cortex-A53 CPU – Dual-band 802.11ac wireless LAN and Bluetooth 4.2 – Faster Ethernet (Gigabit Ethernet over USB 2.0) – Power-over-Ethernet support (with separate PoE HAT) – Improved PXE network and USB mass-storage booting – Improved thermal management Alongside a 200MHz increase in peak CPU clock frequency, we have roughly three times the wired and wireless network throughput, and the ability to sustain high performance for much longer periods.
If you’ve been a Raspberry Pi watcher for a while now, you’ll have a bit of a feel for how we update our products. Just over two years ago, we releasedRaspberry Pi 3 Model B. This was our first 64-bit product, and our first product to feature integrated wireless connectivity. Since then, we’ve sold over nine million Raspberry Pi 3 units (we’ve sold 19 million Raspberry Pis in total), which have been put to work in schools, homes, offices and factories all over the globe.
Those Raspberry Pi watchers will know that we have a history of releasing improved versions of our products a couple of years into their lives. The first example was Raspberry Pi 1 Model B+, which added two additional USB ports, introduced our current form factor, and rolled up a variety of other feedback from the community. Raspberry Pi 2 didn’t get this treatment, of course, as it was superseded after only one year; but it feels like it’s high time that Raspberry Pi 3 received the “plus” treatment.
So, without further ado, Raspberry Pi 3 Model B+ is now on sale for $35 (the same price as the existing Raspberry Pi 3 Model B), featuring:
A 1.4GHz 64-bit quad-core ARM Cortex-A53 CPU
Dual-band 802.11ac wireless LAN and Bluetooth 4.2
Faster Ethernet (Gigabit Ethernet over USB 2.0)
Power-over-Ethernet support (with separate PoE HAT)
Improved PXE network and USB mass-storage booting
Improved thermal management
Alongside a 200MHz increase in peak CPU clock frequency, we have roughly three times the wired and wireless network throughput, and the ability to sustain high performance for much longer periods.
Behold the shiny
Raspberry Pi 3B+ is available to buy today from our network of Approved Resellers.
New features, new chips
Roger Thornton did the design work on this revision of the Raspberry Pi. Here, he and I have a chat about what’s new.
Raspberry Pi 3 Model B+ is now on sale now for $35, featuring: – A 1.4GHz 64-bit quad-core ARM Cortex-A53 CPU – Dual-band 802.11ac wireless LAN and Bluetooth 4.2 – Faster Ethernet (Gigabit Ethernet over USB 2.0) – Power-over-Ethernet support (with separate PoE HAT) – Improved PXE network and USB mass-storage booting – Improved thermal management Alongside a 200MHz increase in peak CPU clock frequency, we have roughly three times the wired and wireless network throughput, and the ability to sustain high performance for much longer periods.
The new product is built around BCM2837B0, an updated version of the 64-bit Broadcom application processor used in Raspberry Pi 3B, which incorporates power integrity optimisations, and a heat spreader (that’s the shiny metal bit you can see in the photos). Together these allow us to reach higher clock frequencies (or to run at lower voltages to reduce power consumption), and to more accurately monitor and control the temperature of the chip.
Dual-band wireless LAN and Bluetooth are provided by the Cypress CYW43455 “combo” chip, connected to a Proant PCB antenna similar to the one used on Raspberry Pi Zero W. Compared to its predecessor, Raspberry Pi 3B+ delivers somewhat better performance in the 2.4GHz band, and far better performance in the 5GHz band, as demonstrated by these iperf results from LibreELEC developer Milhouse.
Tx bandwidth (Mb/s)
Rx bandwidth (Mb/s)
Raspberry Pi 3B
Raspberry Pi 3B+ (2.4GHz)
Raspberry Pi 3B+ (5GHz)
The wireless circuitry is encapsulated under a metal shield, rather fetchingly embossed with our logo. This has allowed us to certify the entire board as a radio module under FCC rules, which in turn will significantly reduce the cost of conformance testing Raspberry Pi-based products.
We’ll be teaching metalwork next.
Previous Raspberry Pi devices have used the LAN951x family of chips, which combine a USB hub and 10/100 Ethernet controller. For Raspberry Pi 3B+, Microchip have supported us with an upgraded version, LAN7515, which supports Gigabit Ethernet. While the USB 2.0 connection to the application processor limits the available bandwidth, we still see roughly a threefold increase in throughput compared to Raspberry Pi 3B. Again, here are some typical iperf results.
Tx bandwidth (Mb/s)
Rx bandwidth (Mb/s)
Raspberry Pi 3B
Raspberry Pi 3B+
We use a magjack that supports Power over Ethernet (PoE), and bring the relevant signals to a new 4-pin header. We will shortly launch a PoE HAT which can generate the 5V necessary to power the Raspberry Pi from the 48V PoE supply.
There… are… four… pins!
Coming soon to a Raspberry Pi 3B+ near you
Raspberry Pi 3B was our first product to support PXE Ethernet boot. Testing it in the wild shook out a number of compatibility issues with particular switches and traffic environments. Gordon has rolled up fixes for all known issues into the BCM2837B0 boot ROM, and PXE boot is now enabled by default.
Clocking, voltages and thermals
The improved power integrity of the BCM2837B0 package, and the improved regulation accuracy of our new MaxLinear MxL7704 power management IC, have allowed us to tune our clocking and voltage rules for both better peak performance and longer-duration sustained performance.
Below 70°C, we use the improvements to increase the core frequency to 1.4GHz. Above 70°C, we drop to 1.2GHz, and use the improvements to decrease the core voltage, increasing the period of time before we reach our 80°C thermal throttle; the reduction in power consumption is such that many use cases will never reach the throttle. Like a modern smartphone, we treat the thermal mass of the device as a resource, to be spent carefully with the goal of optimising user experience.
This graph, courtesy of Gareth Halfacree, demonstrates that Raspberry Pi 3B+ runs faster and at a lower temperature for the duration of an eight‑minute quad‑core Sysbench CPU test.
Note that Raspberry Pi 3B+ does consume substantially more power than its predecessor. We strongly encourage you to use a high-quality 2.5A power supply, such as the official Raspberry Pi Universal Power Supply.
We’ll keep updating this list over the next couple of days, but here are a few to get you started.
Are you discontinuing earlier Raspberry Pi models?
No. We have a lot of industrial customers who will want to stick with the existing products for the time being. We’ll keep building these models for as long as there’s demand. Raspberry Pi 1B+, Raspberry Pi 2B, and Raspberry Pi 3B will continue to sell for $25, $35, and $35 respectively.
What about Model A+?
Raspberry Pi 1A+ continues to be the $20 entry-level “big” Raspberry Pi for the time being. We are considering the possibility of producing a Raspberry Pi 3A+ in due course.
What about the Compute Module?
CM1, CM3 and CM3L will continue to be available. We may offer versions of CM3 and CM3L with BCM2837B0 in due course, depending on customer demand.
Are you still using VideoCore?
Yes. VideoCore IV 3D is the only publicly-documented 3D graphics core for ARM‑based SoCs, and we want to make Raspberry Pi more open over time, not less.
A project like this requires a vast amount of focused work from a large team over an extended period. Particular credit is due to Roger Thornton, who designed the board and ran the exhaustive (and exhausting) RF compliance campaign, and to the team at the Sony UK Technology Centre in Pencoed, South Wales. A partial list of others who made major direct contributions to the BCM2837B0 chip program, CYW43455 integration, LAN7515 and MxL7704 developments, and Raspberry Pi 3B+ itself follows:
James Adams, David Armour, Jonathan Bell, Maria Blazquez, Jamie Brogan-Shaw, Mike Buffham, Rob Campling, Cindy Cao, Victor Carmon, KK Chan, Nick Chase, Nigel Cheetham, Scott Clark, Nigel Clift, Dominic Cobley, Peter Coyle, John Cronk, Di Dai, Kurt Dennis, David Doyle, Andrew Edwards, Phil Elwell, John Ferdinand, Doug Freegard, Ian Furlong, Shawn Guo, Philip Harrison, Jason Hicks, Stefan Ho, Andrew Hoare, Gordon Hollingworth, Tuomas Hollman, EikPei Hu, James Hughes, Andy Hulbert, Anand Jain, David John, Prasanna Kerekoppa, Shaik Labeeb, Trevor Latham, Steve Le, David Lee, David Lewsey, Sherman Li, Xizhe Li, Simon Long, Fu Luo Larson, Juan Martinez, Sandhya Menon, Ben Mercer, James Mills, Max Passell, Mark Perry, Eric Phiri, Ashwin Rao, Justin Rees, James Reilly, Matt Rowley, Akshaye Sama, Ian Saturley, Serge Schneider, Manuel Sedlmair, Shawn Shadburn, Veeresh Shivashimper, Graham Smith, Ben Stephens, Mike Stimson, Yuree Tchong, Stuart Thomson, John Wadsworth, Ian Watch, Sarah Williams, Jason Zhu.
If you’re not on this list and think you should be, please let me know, and accept my apologies.
The eagle-eyed among you may have noticed that today is 28 February, which is as close as you’re going to get to our sixth birthday, given that we launched on a leap day. For the last three years, we’ve launched products on or around our birthday: Raspberry Pi 2 in 2015; Raspberry Pi 3 in 2016; and Raspberry Pi Zero W in 2017. But today is a snow day here at Pi Towers, so rather than launching something, we’re taking a photo tour of the last six years of Raspberry Pi products before we don our party hats for the Raspberry Jam Big Birthday Weekend this Saturday and Sunday.
Before there was Raspberry Pi, there was the Broadcom BCM2763 ‘micro DB’, designed, as it happens, by our very own Roger Thornton. This was the first thing we demoed as a Raspberry Pi in May 2011, shown here running an ARMv6 build of Ubuntu 9.04.
BCM2763 micro DB
Ubuntu on Raspberry Pi, 2011-style
A few months later, along came the first batch of 50 “alpha boards”, designed for us by Broadcom. I used to have a spreadsheet that told me where in the world each one of these lived. These are the first “real” Raspberry Pis, built around the BCM2835 application processor and LAN9512 USB hub and Ethernet adapter; remarkably, a software image taken from the download page today will still run on them.
Raspberry Pi alpha board
We shot some great demos with this board, including this video of Quake III:
A little something for the weekend: here’s Eben showing the Raspberry Pi running Quake 3, and chatting a bit about the performance of the board. Thanks to Rob Bishop and Dave Emett for getting the demo running.
Pete spent the second half of 2011 turning the alpha board into a shippable product, and just before Christmas we produced the first 20 “beta boards”, 10 of which were sold at auction, raising over £10000 for the Foundation.
Beta boards on parade
Here’s Dom, demoing both the board and his excellent taste in movie trailers:
See http://www.raspberrypi.org/ for more details, FAQ and forum.
Rather to Pete’s surprise, I took his beta board design (with a manually-added polygon in the Gerbers taking the place of Paul Grant’s infamous red wire), and ordered 2000 units from Egoman in China. After a few hiccups, units started to arrive in Cambridge, and on 29 February 2012, Raspberry Pi went on sale for the first time via our partners element14 and RS Components.
The first 2000 Raspberry Pis
The first Raspberry Pi from the first box from the first pallet
We took over 100000 orders on the first day: something of a shock for an organisation that had imagined in its wildest dreams that it might see lifetime sales of 10000 units. Some people who ordered that day had to wait until the summer to finally receive their units.
Even as we struggled to catch up with demand, we were working on ways to improve the design. We quickly replaced the USB polyfuses in the top right-hand corner of the board with zero-ohm links to reduce IR drop. If you have a board with polyfuses, it’s a real limited edition; even more so if it also has Hynix memory. Pete’s “rev 2” design made this change permanent, tweaked the GPIO pin-out, and added one much-requested feature: mounting holes.
Revision 1 versus revision 2
If you look carefully, you’ll notice something else about the revision 2 board: it’s made in the UK. 2012 marked the start of our relationship with the Sony UK Technology Centre in Pencoed, South Wales. In the five years since, they’ve built every product we offer, including more than 12 million “big” Raspberry Pis and more than one million Zeros.
Celebrating 500,000 Welsh units, back when that seemed like a lot
Economies of scale, and the decline in the price of SDRAM, allowed us to double the memory capacity of the Model B to 512MB in the autumn of 2012. And as supply of Model B finally caught up with demand, we were able to launch the Model A, delivering on our original promise of a $25 computer.
A UK-built Raspberry Pi Model A
In 2014, James took all the lessons we’d learned from two-and-a-bit years in the market, and designed the Model B+, and its baby brother the Model A+. The Model B+ established the form factor for all our future products, with a 40-pin extended GPIO connector, four USB ports, and four mounting holes.
The Raspberry Pi 1 Model B+ — entering the era of proper product photography with a bang.
While James was working on the Model B+, Broadcom was busy behind the scenes developing a follow-on to the BCM2835 application processor. BCM2836 samples arrived in Cambridge at 18:00 one evening in April 2014 (chips never arrive at 09:00 — it’s always early evening, usually just before a public holiday), and within a few hours Dom had Raspbian, and the usual set of VideoCore multimedia demos, up and running.
We launched Raspberry Pi 2 at the start of 2015, pairing BCM2836 with 1GB of memory. With a quad-core Arm Cortex-A7 clocked at 900MHz, we’d increased performance sixfold, and memory fourfold, in just three years.
Nobody mention the xenon death flash.
And of course, while James was working on Raspberry Pi 2, Broadcom was developing BCM2837, with a quad-core 64-bit Arm Cortex-A53 clocked at 1.2GHz. Raspberry Pi 3 launched barely a year after Raspberry Pi 2, providing a further doubling of performance and, for the first time, wireless LAN and Bluetooth.
All our recent products are just the same board shot from different angles
Zero to hero
Where the PC industry has historically used Moore’s Law to “fill up” a given price point with more performance each year, the original Raspberry Pi used Moore’s law to deliver early-2000s PC performance at a lower price. But with Raspberry Pi 2 and 3, we’d gone back to filling up our original $35 price point. After the launch of Raspberry Pi 2, we started to wonder whether we could pull the same trick again, taking the original Raspberry Pi platform to a radically lower price point.
The result was Raspberry Pi Zero. Priced at just $5, with a 1GHz BCM2835 and 512MB of RAM, it was cheap enough to bundle on the front of The MagPi, making us the first computer magazine to give away a computer as a cover gift.
MagPi issue 40 in all its glory
We followed up with the $10 Raspberry Pi Zero W, launched exactly a year ago. This adds the wireless LAN and Bluetooth functionality from Raspberry Pi 3, using a rather improbable-looking PCB antenna designed by our buddies at Proant in Sweden.
RS Components limited-edition blue Raspberry Pi 1 Model B
Brazilian-market Raspberry Pi 3 Model B
Visible-light Camera Module v2
Learning about injection moulding the hard way
250 pages of content each month, every month
Forward the Foundation
Why does all this matter? Because we’re providing everyone, everywhere, with the chance to own a general-purpose programmable computer for the price of a cup of coffee; because we’re giving people access to tools to let them learn new skills, build businesses, and bring their ideas to life; and because when you buy a Raspberry Pi product, every penny of profit goes to support the Raspberry Pi Foundation in its mission to change the face of computing education.
We’ve had an amazing six years, and they’ve been amazing in large part because of the community that’s grown up alongside us. This weekend, more than 150 Raspberry Jams will take place around the world, comprising the Raspberry Jam Big Birthday Weekend.
If you want to know more about the Raspberry Pi community, go ahead and find your nearest Jam on our interactive map — maybe we’ll see you there.
Everything online is hackable. This is true for Equifax’s data and the federal Office of Personal Management’s data, which was hacked in 2015. If information is on a computer connected to the Internet, it is vulnerable.
But just because everything is hackable doesn’t mean everything will be hacked. The difference between the two is complex, and filled with defensive technologies, security best practices, consumer awareness, the motivation and skill of the hacker and the desirability of the data. The risks will be different if an attacker is a criminal who just wants credit card details and doesn’t care where he gets them from or the Chinese military looking for specific data from a specific place.
The proper question isn’t whether it’s possible to protect consumer data, but whether a particular site protects our data well enough for the benefits provided by that site. And here, again, there are complications.
In most cases, it’s impossible for consumers to make informed decisions about whether their data is protected. We have no idea what sorts of security measures Google uses to protect our highly intimate Web search data or our personal e-mails. We have no idea what sorts of security measures Facebook uses to protect our posts and conversations.
We have a feeling that these big companies do better than smaller ones. But we’re also surprised when a lone individual publishes personal data hacked from the infidelity site AshleyMadison.com, or when the North Korean government does the same with personal information in Sony’s network.
Think about all the companies collecting personal data about you the websites you visit, your smartphone and its apps, your Internet-connected car — and how little you know about their security practices. Even worse, credit bureaus and data brokers like Equifax collect your personal information without your knowledge or consent.
So while it might be possible for companies to do a better job of protecting our data, you as a consumer are in no position to demand such protection.
Government policy is the missing ingredient. We need standards and a method for enforcement. We need liabilities and the ability to sue companies that poorly secure our data. The biggest reason companies don’t protect our data online is that it’s cheaper not to. Government policy is how we change that.
This essay appeared as half of a point/counterpoint with Priscilla Regan, in a CQ Researcher report titled “Privacy and the Internet.”
The wait is over. September and October’s Partner Webinars have officially arrived! In case you missed the intro last month, the AWS Partner Webinar Series is a selection of live and recorded presentations covering a broad range of topics at varying technical levels and scale. A little different from our AWS Online TechTalks, each AWS Partner Webinar is hosted by an AWS solutions architect and an AWS Competency Partner who has successfully helped customers evaluate and implement the tools, techniques, and technologies of AWS.
tldr: I went to DragonCon, a conference of 85,000 people, so sniff WiFi packets and test how many phones now uses MAC address randomization. Almost all iPhones nowadays do, but it seems only a third of Android phones do.
Ten years ago at BlackHat, we presented the “data seepage” problem, how the broadcasts from your devices allow you to be tracked. Among the things we highlighted was how WiFi probes looking to connect to access-points expose the unique hardware address burned into the phone, the MAC address. This hardware address is unique to your phone, shared by no other device in the world. Evildoers, such as the NSA or GRU, could install passive listening devices in airports and train-stations around the world in order to track your movements. This could be done with $25 devices sprinkled around a few thousand places — within the budget of not only a police state, but also the average hacker.
In 2014, with the release of iOS 8, Apple addressed this problem by randomizing the MAC address. Every time you restart your phone, it picks a new, random, hardware address for connecting to WiFi. This causes a few problems: every time you restart your iOS devices, your home network sees a completely new device, which can fill up your router’s connection table. Since that table usually has at least 100 entries, this shouldn’t be a problem for your home, but corporations and other owners of big networks saw their connection tables suddenly get big with iOS 8.
In 2015, Google added the feature to Android as well. However, even though most Android phones today support this feature in theory, it’s usually not enabled.
Recently, I went to DragonCon in order to test out how well this works. DragonCon is a huge sci-fi/fantasy conference in Atlanta in August, second to San Diego’s ComicCon in popularity. It’s spread across several neighboring hotels in the downtown area. A lot of the traffic funnels through the Marriot Marquis hotel, which has a large open area where, from above, you can see thousands of people at a time.
And, with a laptop, see their broadcast packets.
So I went up on a higher floor and setup my laptop in order to capture “probe” broadcasts coming from phones, in order to record the hardware MAC addresses. I’ve done this in years past, before address randomization, in order to record the popularity of iPhones. The first three bytes of an old-style, non-randomized address, identifies the manufacturer. This time, I should see a lot fewer manufacturer IDs, and mostly just random addresses instead.
I recorded 9,095 unique probes over a couple hours. I’m not sure exactly how long — my laptop would go to sleep occasionally because of lack of activity on the keyboard. I should probably setup a Raspberry Pi somewhere next year to get a more consistent result.
A quick summary of the results are:
The 9,000 devices were split almost evenly between Apple and Android. Almost all of the Apple devices randomized their addresses. About a third of the Android devices randomized. (This assumes Android only randomizes the final 3 bytes of the address, and that Apple randomizes all 6 bytes — my assumption may be wrong).
A table of the major results are below. A little explanation:
The first item in the table is the number of phones that randomized the full 6 bytes of the MAC address. I’m guessing these are either mostly or all Apple iOS devices. They are nearly half of the total, or 4498 out of 9095 unique probes.
The second number is those that randomized the final 3 bytes of the MAC address, but left the first three bytes identifying themselves as Android devices. I’m guessing this represents all the Android devices that randomize. My guesses may be wrong, maybe some Androids randomize the full 6 bytes, which would get them counted in the first number.
The following numbers are phones from major Android manufacturers like Motorola, LG, HTC, Huawei, OnePlus, ZTE. Remember: the first 3 bytes of an un-randomized address identifies who made it. There are roughly 2500 of these devices.
There is a count for 309 Apple devices. These are either older iOS devices pre iOS 8, or which have turned off the feature (some corporations demand this), or which are actually MacBooks instead of phones.
The vendor of the access-points that Marriot uses is “Ruckus”. There have a lot of access-points in the hotel.
The “TCT mobile” entry is actually BlackBerry. Apparently, BlackBerry stopped making phones and instead just licenses the software/brand to other hardware makers. If you buy a BlackBerry from the phone store, it’s likely going to be a TCT phone instead.
I’m assuming the “Amazon” devices are Kindle ebooks.
Lastly, I’d like to point out the two records for “Ford”. I was capturing while walking out of the building, I think I got a few cars driving by.
Think about all of the websites you visit every day. Now imagine if the likes of Time Warner, AT&T, and Verizon collected all of your browsing history and sold it on to the highest bidder. That’s what will probably happen if Congress has its way.
This week, lawmakers voted to allow Internet service providers to violate your privacy for their own profit. Not only have they voted to repeal a rule that protects your privacy, they are also trying to make it illegal for the Federal Communications Commission to enact other rules to protect your privacy online.
That this is not provoking greater outcry illustrates how much we’ve ceded any willingness to shape our technological future to for-profit companies and are allowing them to do it for us.
There are a lot of reasons to be worried about this. Because your Internet service provider controls your connection to the Internet, it is in a position to see everything you do on the Internet. Unlike a search engine or social networking platform or news site, you can’t easily switch to a competitor. And there’s not a lot of competition in the market, either. If you have a choice between two high-speed providers in the US, consider yourself lucky.
What can telecom companies do with this newly granted power to spy on everything you’re doing? Of course they can sell your data to marketers — and the inevitable criminals and foreign governments who also line up to buy it. But they can do more creepy things as well.
They can snoop through your traffic and insert their own ads. They can deploy systems that remove encryption so they can better eavesdrop. They can redirect your searches to other sites. They can install surveillance software on your computers and phones. None of these are hypothetical.
They’re all things Internet service providers have done before, and they are some of the reasons the FCC tried to protect your privacy in the first place. And now they’ll be able to do all of these things in secret, without your knowledge or consent. And, of course, governments worldwide will have access to these powers. And all of that data will be at risk of hacking, either by criminals and other governments.
Telecom companies have argued that other Internet players already have these creepy powers — although they didn’t use the word “creepy” — so why should they not have them as well? It’s a valid point.
Surveillance is already the business model of the Internet, and literally hundreds of companies spy on your Internet activity against your interests and for their own profit.
Your e-mail provider already knows everything you write to your family, friends, and colleagues. Google already knows our hopes, fears, and interests, because that’s what we search for.
Your cellular provider already tracks your physical location at all times: it knows where you live, where you work, when you go to sleep at night, when you wake up in the morning, and — because everyone has a smartphone — who you spend time with and who you sleep with.
And some of the things these companies do with that power is no less creepy. Facebook has run experiments in manipulating your mood by changing what you see on your news feed. Uber used its ride data to identify one-night stands. Even Sony once installed spyware on customers’ computers to try and detect if they copied music files.
Aside from spying for profit, companies can spy for other purposes. Uber has already considered using data it collects to intimidate a journalist. Imagine what an Internet service provider can do with the data it collects: against politicians, against the media, against rivals.
Of course the telecom companies want a piece of the surveillance capitalism pie. Despite dwindlingrevenues, increasinguse of ad blockers, and increases in clickfraud, violating our privacy is still a profitable business — especially if it’s done in secret.
The bigger question is: why do we allow for-profit corporations to create our technological future in ways that are optimized for their profits and anathema to our own interests?
When markets work well, different companies compete on price and features, and society collectively rewards better products by purchasing them. This mechanism fails if there is no competition, or if rival companies choose not to compete on a particular feature. It fails when customers are unable to switch to competitors. And it fails when what companies do remains secret.
Unlike service providers like Google and Facebook, telecom companies are infrastructure that requires government involvement and regulation. The practical impossibility of consumers learning the extent of surveillance by their Internet service providers, combined with the difficulty of switching them, means that the decision about whether to be spied on should be with the consumer and not a telecom giant. That this new bill reverses that is both wrong and harmful.
Today, technology is changing the fabric of our society faster than at any other time in history. We have big questions that we need to tackle: not just privacy, but questions of freedom, fairness, and liberty. Algorithms are making decisions about policing, healthcare.
Driverless vehicles are making decisions about traffic and safety. Warfare is increasingly being fought remotely and autonomously. Censorship is on the rise globally. Propaganda is being promulgated more efficiently than ever. These problems won’t go away. If anything, the Internet of things and the computerization of every aspect of our lives will make it worse.
In today’s political climate, it seems impossible that Congress would legislate these things to our benefit. Right now, regulatory agencies such as the FTC and FCC are our best hope to protect our privacy and security against rampant corporate power. That Congress has decided to reduce that power leaves us at enormous risk.
It’s too late to do anything about this bill — Trump will certainly sign it — but we need to be alert to future bills that reduce our privacy and security.
A decade ago, I wroteabout the death of ephemeral conversation. As computers were becoming ubiquitous, some unintended changes happened, too. Before computers, what we said disappeared once we’d said it. Neither face-to-face conversations nor telephone conversations were routinely recorded. A permanent communication was something different and special; we called it correspondence.
The Internet changed this. We now chat by text message and e-mail, on Facebook and on Instagram. These conversations — with friends, lovers, colleagues, fellow employees — all leave electronic trails. And while we know this intellectually, we haven’t truly internalized it. We still think of conversation as ephemeral, forgetting that we’re being recorded and what we say has the permanence of correspondence.
That our data is used by large companies for psychological manipulation – we call this advertising – is well known. So is its use by governments for law enforcement and, depending on the country, social control. What made the news over the past year were demonstrations of how vulnerable all of this data is to hackers and the effects of having it hacked, copied, and then published online. We call this doxing.
Doxing isn’t new, but it has become more common. It’s been perpetrated against corporations, law firms, individuals, the NSA and — just this week — the CIA. It’s largely harassment and notwhistleblowing, and it’s not going to change anytime soon. The data in your computer and in the cloud are, and will continue to be, vulnerable to hacking and publishing online. Depending on your prominence and the details of this data, you may need some new strategies to secure your private life.
There are two basic ways hackers can get at your e-mail and private documents. One way is to guess your password. That’s how hackers got their hands on personal photos of celebrities from iCloud in 2014.
How to protect yourself from this attack is pretty obvious. First, don’t choose a guessable password. This is more than not using “password1” or “qwerty”; most easily memorizable passwords are guessable. My advice is to generate passwords you have to remember by using either the XKCD scheme or the Schneier scheme, and to use large random passwords stored in a password manager for everything else.
Second, turn on two-factor authentication where you can, like Google’s 2-Step Verification. This adds another step besides just entering a password, such as having to type in a one-time code that’s sent to your mobile phone. And third, don’t reuse the same password on any sites you actually care about.
You’re not done, though. Hackers have accessed accounts by exploiting the “secret question” feature and resetting the password. That was how Sarah Palin’s e-mail account was hacked in 2008. The problem with secret questions is that they’re not very secret and not very random. My advice is to refuse to use those features. Type randomness into your keyboard, or choose a really random answer and store it in your password manager.
Finally, you also have to stay alert to phishing attacks, where a hacker sends you an enticing e-mail with a link that sends you to a web page that looks almost like the expected page, but which actually isn’t. This sort of thing can bypass two-factor authentication, and is almost certainly what tricked John Podesta and Colin Powell.
The other way hackers can get at your personal stuff is by breaking in to the computers the information is stored on. This is how the Russians got into the Democratic National Committee’s network and how a lone hacker got into the Panamanian law firm Mossack Fonseca. Sometimes individuals are targeted, as when China hacked Google in 2010 to access the e-mail accounts of human rights activists. Sometimes the whole network is the target, and individuals are inadvertent victims, as when thousands of Sony employees had their e-mails published by North Korea in 2014.
Protecting yourself is difficult, because it often doesn’t matter what you do. If your e-mail is stored with a service provider in the cloud, what matters is the security of that network and that provider. Most users have no control over that part of the system. The only way to truly protect yourself is to not keep your data in the cloud where someone could get to it. This is hard. We like the fact that all of our e-mail is stored on a server somewhere and that we can instantly search it. But that convenience comes with risk. Consider deleting old e-mail, or at least downloading it and storing it offline on a portable hard drive. In fact, storing data offline is one of the best things you can do to protect it from being hacked and exposed. If it’s on your computer, what matters is the security of your operating system and network, not the security of your service provider.
Consider this for files on your own computer. The more things you can move offline, the safer you’ll be.
E-mail, no matter how you store it, is vulnerable. If you’re worried about your conversations becoming public, think about an encrypted chat program instead, such as Signal, WhatsApp or Off-the-Record Messaging. Consider using communications systems that don’t save everything by default.
None of this is perfect, of course. Portable hard drives are vulnerable when you connect them to your computer. There are ways to jump air gaps and access data on computers not connected to the Internet. Communications and data files you delete might still exist in backup systems somewhere — either yours or those of the various cloud providers you’re using. And always remember that there’s always another copy of any of your conversations stored with the person you’re conversing with. Even with these caveats, though, these measures will make a big difference.
When secrecy is truly paramount, go back to communications systems that are still ephemeral. Pick up the telephone and talk. Meet face to face. We don’t yet live in a world where everything is recorded and everything is saved, although that era is coming. Enjoy the last vestiges of ephemeral conversation while you still can.
There’s nothing in the CIA #Vault7 leaks that calls into question strong attribution, like Russia being responsible for the DNC hacks. On the other hand, it does call into question weak attribution, like North Korea being responsible for the Sony hacks.
There are really two types of attribution. Strong attribution is a preponderance of evidence that would convince an unbiased, skeptical expert. Weak attribution is flimsy evidence that confirms what people are predisposed to believe.
The DNC hacks have strong evidence pointing to Russia. Not only does all the malware check out, but also other, harder to “false flag” bits, like active command-and-control servers. A serious operator could still false-flag this in theory, if only by bribing people in Russia, but nothing in the CIA dump hints at this.
The Sony hacks have weak evidence pointing to North Korea. One of the items was the use of the RawDisk driver, used both in malware attributed to North Korea and the Sony attacks. This was described as “flimsy” at the time [*]. The CIA dump [*] demonstrates that indeed it’s flimsy — as apparently CIA malware also uses the RawDisk code.
In the coming days, biased partisans are going to seize on the CIA leaks as proof of “false flag” operations, calling into question Russian hacks. No, this isn’t valid. We experts in the industry criticized “malware techniques” as flimsy attribution, long before the Sony attack, and long before the DNC hacks. All the CIA leaks do is prove we were right. On the other hand, the DNC hack attribution is based on more than just this, so nothing in the CIA leaks calls into question that attribution.
This year’s Consumer Electronics Show (CES) just wrapped up in Las Vegas. The usual parade of cool tech toys created a lot of headlines this year, but there were some genuine trends to keep an eye on too. If you’re like us, you’re probably one of the first people around to adopt promising new technologies when they emerge. As early adopters we can sometimes lose the forest through the trees when it comes to understanding what this means for everyone else, so we’re going to look at it through that prism.
2017 promises to be a big year for voice-activated “smart home” devices. The final landscape for this is still to be determined – all the expected players have their foot in it right now. Amazon, Apple, Google, Microsoft, even some smaller players.
Amazon deserves props after a holiday season that saw its Echo and Echo Dot devices in high demand. The company’s published an API that is Alexa is picking up plenty of support from third party manufacturers. Alexa’s testing for far beyond Echo, it seems.
Electronics giant LG is building Alexa into a line of robots designed for domestic duties and a refrigerator that also sports interior fridge cams, for example. Ford is integrating Alexa support into its Sync 3 automotive interface. Televisions, lighting devices, and home security products are among the many devices to feature Alexa integration.
Alexa is the new hotness, but the real trend here is in voice-assisted connectivity around the home. Even if Alexa runs out of steam, this tech is here to stay. The Internet of Things and voice activated interfaces are converging quickly, though that day isn’t today. It’s tantalizingly close. It’s still a niche, though, where it will stay for as long as consumers have to piece different things together to get it to work. That means there’s still room for disruption.
There’s especially ripe opportunity in underserved verticals. Take the home health market, for example: Natural language interfaces have huge implications for elderly and disabled care and assistance. Finding and developing solutions for those sorts of vertical markets is an awesome opportunity for the right players.
Of course, with great power comes great responsibility. A family of a six-year-old recently got stuck with a $160 bill after she told Alexa to order her cookies and a dollhouse. The family ended up donating the accidental order to charity. For what it’s worth, that problem can be avoided by activating a confirmation code feature in the Alexa software.
The Electric Vehicle (EV) Market Heats Up
One of the trickiest things to unpack from CES is hype from substance. Nowhere was that more apparent last week than the unveiling of Faraday Future’s FF91, a new Electric Vehicle (EV) positioned to go toe-to-toe with Tesla’s EV fleet.
The FF91 EV can purportedly go 378 miles on a single charge and also possesses autonomous driving capabilities (although its vaunted self-parking abilities didn’t demo as well as planned). When or if it’ll make it into production is still a head-scratcher, however. Faraday Future says it’ll be out next year, assuming that the company is beyond the production and manufacturing woes that have plagued it up until now.
While new vehicles and vehicle concepts are still largely the domain of auto shows, some auto manufacturers used CES to float new concepts ahead of the Detroit Auto Show, which happens this week. Toyota, for example, showed off its Concept-i, a car with artificial intelligence and natural language processing (like Siri or Alexa) designed to learn from you and adapt.
As we mentioned, Alexa is integrated into Ford’s Sync 3 platform, too. Already you can buy new cars with CarPlay and Android Auto, which makes it a lot easier to just talk with your mobile device to stay connected, get directions and entertain yourself on the morning commute simply by talking to your car instead of touching buttons. That’s a smart user interface change, but it’s still a potentially dangerous distraction for the driver. For this technology to succeed, it’s imperative that natural language interface designers make the experience as frictionless as possible.
Chrysler is making a play for future millennial families. We’re not making this up – they used “millennial” to describe the target market for this several times. The Portal concept is an electric minivan of sorts that’s chock-full of buzzwords: Facial recognition, Wi-Fi, media sharing, ten charging ports, semi-autonomous driving abilities and more).
2017 marks a pivot for car makers in this respect. For years the conventional wisdom that millennials were a lost cause for auto makers – Uber and Zipcar was all they needed. It turns out that was totally wrong. Economic pressures and diverse lifestyles may have delayed millennials’ trek toward auto ownership, but they’re turning out now in big numbers to buy wheels. Millennial families will need transportation just like generations before them back to the station wagon, which is why Chrysler says this “fifth-generation” family car will go into production sometime after 2018.
Volkswagen showed off its new I.D. concept car, a Golf-looking EV that also has all the requisite buzzwords. Speaking of buzzwords, what really excited us was the I.D. Buzz. This new EV resurrects the styling of the Hippy-era Microbus, with mood lighting, autonomous driving capabilities and a retractable steering wheel.
Rumors have persisted for years that VW was on the cusp of introducing a refreshed Microbus, but those rumors have never come to pass. And unfortunately, VW has no concrete plans to actually produce this – it seems to be a marketing effort to draw on nostalgic Boomer appeal, more than anything..
Both Buzz and Chrysler’s Portal do give us some insight about where auto makers are going when it comes to future generations of minivans: Electric, autonomous, customizable and more social than ever. If we are headed towards a future where vehicles drive themselves, family transportation will look very different than it is today.
Laptops At Both Extremes
CES saw the rollout of several new PC laptop models and concepts that will be hitting store shelves over the next several months.
Gamers looking for more real estate – a lot more real estate – were interested in Razer’s latest concept, Project Valerie. The laptop sports not one but three 4K displays which fold out on hinges. That’s 12K pixels of horizontal image space, mated to an Nvidia GeForce GTX 1080 graphics processor. A unibody aluminum chassis keeps it relatively thin (1.5 inches) when closed, but the entire rig weighs more than 12 pounds. Razer doesn’t have any immediate production plans, which may explain why their prototype was stolen before the end of the show.
Unlike Razer, Acer has production plans – immediate plans – for its gargantuan 21-inch Predator 21X laptop, priced at $8,999 and headed to store shelves next month. It was announced last year, but Acer finally offered launch details last week. A 17-inch model is also coming soon.
Big gaming laptops make for pretty pictures and certainly have their place in the PC ecosystem, but they’re niche devices. After a ramp up on 2-in-1s and low-powered laptops, Intel’s Kaby Lake processors are finally ready for the premium and mid-range laptop market. Kaby Lake efficiency improvements are helping PC makers build thinner and lighter laptops with better battery life, 4K video processing, faster solid state storage and more.
HP, Asus, MSI, Dell (and its gaming arm Alienware) were among the many companies with sleek new Kaby Lake-equipped models.
Gaming in the cloud with Nvidia
Nvidia, makers of premium graphics processors, offers GeForce Now cloud gaming to users of its Shield, an Android-based gaming handheld. That service is expanding to Windows and Mac in March.
Gaming as a Service, if you will, isn’t a new idea. OnLive pioneered the concept more than a decade ago. Gaikai followed, then was acquired by Sony in 2012. Nvidia’s had limited success with GeForce Now, but it’s been a single-platform offering up until now.
Nvidia has robust data centers to handle the processing and traffic, so best of luck to them as they scale up to meet demand. Gaming is very sensitive to network disruption – no gamer appreciates lag – so it’ll be interesting to see how GeForce Now scales to accommodate the new devices.
Mesh networking delivers more consistent, stronger network reception and performance than a conventional Wi-Fi router. Some of us have set up routers and extenders to fix dead spots – mesh networking works differently through smart traffic and better radio management between multiple network bases.
Eero, Ubiquiti, and even Google (with Google Wifi) are already offering mesh networking products, and this market segment looks to expand big in 2017. Netgear, Linksys, Asus, TP-Link and others are among those with new mesh networking setups. Mesh networking gear is still hampered by a higher price than plain old routers. That means the value isn’t there for some of us who have networking gear that gets the job done, even with shortcomings like dead zones or slow zones. But prices are coming down fast as more companies get into the market. If you have an 802.11ac router you’re happy with, stick with it for now, and move to a mesh networking setup for your next Wi-Fi upgrade.
Getting Your Feet Into VR
Our award for wackiest CES product has to go to Cerevo Taclim. Tactile feedback shoes and wireless hand controllers that help you “feel” the surface you’re walking on. Crunching snow underfoot, splashing through water. At an expected $1,000-$1,500 a pop, these probably won’t be next year’s Hatchimals, but it’s fun to imagine what game devs can do with the technology. Strap these to your feet then break out your best Hadouken in Street Fighter VR!
CES isn’t the real world. Only a fraction of what’s shown off ever sees the light of day, but it’s always interesting to see the trend-focused consumer electronics market shift and change from year to year. At the end of the year we hope to look back and see how much of this stuff ended up resonating with the actual consumer the show is named for.
President Barack Obama’s public accusation of Russia as the source of the hacks in the US presidential election and the leaking of sensitive e-mails through WikiLeaks and other sources has opened up a debate on what constitutes sufficient evidence to attribute an attack in cyberspace. The answer is both complicated and inherently tied up in political considerations.
The administration is balancing political considerations and the inherent secrecy of electronic espionage with the need to justify its actions to the public. These issues will continue to plague us as more international conflict plays out in cyberspace.
It’s true that it’s easy for an attacker to hide who he is in cyberspace. We are unable to identify particular pieces of hardware and software around the world positively. We can’t verify the identity of someone sitting in front of a keyboard through computer data alone. Internet data packets don’t come with return addresses, and it’s easy for attackers to disguise their origins. For decades, hackers have used techniques such as jump hosts, VPNs, Tor and open relays to obscure their origin, and in many cases they work. I’m sure that many national intelligence agencies route their attacks through China, simply because everyone knows lots of attacks come from China.
On the other hand, there are techniques that can identify attackers with varying degrees of precision. It’s rarely just one thing, and you’ll often hear the term “constellation of evidence” to describe how a particular attacker is identified. It’s analogous to traditional detective work. Investigators collect clues and piece them together with known mode of operations. They look for elements that resemble other attacks and elements that are anomalies. The clues might involve ones and zeros, but the techniques go back to Sir Arthur Conan Doyle.
The University of Toronto-based organization Citizen Lab routinely attributes attacks against the computers of activists and dissidents to particular Third World governments. It took months to identify China as the source of the 2012 attacks against the New York Times. While it was uncontroversial to say that Russia was the source of a cyberattack against Estonia in 2007, no one knew if those attacks were authorized by the Russian government — until the attackers explained themselves. And it was the Internet security company CrowdStrike, which first attributed the attacks against the Democratic National Committee to Russian intelligence agencies in June, based on multiple pieces of evidence gathered from its forensic investigation.
Attribution is easier if you are monitoring broad swaths of the Internet. This gives the National Security Agency a singular advantage in the attribution game. The problem, of course, is that the NSA doesn’t want to publish what it knows.
Regardless of what the government knows and how it knows it, the decision of whether to make attribution evidence public is another matter. When Sony was attacked, many security experts — myselfincluded — were skeptical of both the government’s attribution claims and the flimsy evidence associated with it. I only became convinced when the New York Timesran a story about the government’s attribution, which talked about both secret evidence inside the NSA and human intelligence assets inside North Korea. In contrast, when the Office of Personnel Management was breached in 2015, the US government decided not to accuse China publicly, either because it didn’t want to escalate the political situation or because it didn’t want to reveal any secret evidence.
It’s one thing for the government to know who attacked it. It’s quite another for it to convince the public who attacked it. As attribution increasingly relies on secret evidence — as it did with North Korea’s attack of Sony in 2014 and almost certainly does regarding Russia and the previous election — the government is going to have to face the choice of making previously secret evidence public and burning sources and methods, or keeping it secret and facing perfectly reasonable skepticism.
If the government is going to take public action against a cyberattack, it needs to make its evidence public. But releasing secret evidence might get people killed, and it would make any future confidentiality assurances we make to human sources completely non-credible. This problem isn’t going away; secrecy helps the intelligence community, but it wounds our democracy.
The constellation of evidence attributing the attacks against the DNC, and subsequent release of information, is comprehensive. It’s possible that there was more than one attack. It’s possible that someone not associated with Russia leaked the information to WikiLeaks, although we have no idea where that someone else would have obtained the information. We know that the Russian actors who hacked the DNC — both the FSB, Russia’s principal security agency, and the GRU, Russia’s military intelligence unit — are also attacking other political networks around the world.
In the end, though, attribution comes down to whom you believe. When Citizen Lab writes a report outlining how a United Arab Emirates human rights defender was targeted with a cyberattack, we have no trouble believing that it was the UAE government. When Google identifies China as the source of attacks against Gmail users, we believe it just as easily.
Obama decided not to make the accusation public before the election so as not to be seen as influencing the election. Now, afterward, there are political implications in accepting that Russia hacked the DNC in an attempt to influence the US presidential election. But no amount of evidence can convince the unconvinceable.
The most important thing we can do right now is deter any country from trying this sort of thing in the future, and the political nature of the issue makes that harder. Right now, we’ve told the world that others can get away with manipulating our election process as long as they can keep their efforts secret until after one side wins. Obama has promised both secret retaliations and public ones. We need to hope they’re enough.
Roy Ben-Alta is Sr. Business Development Manager at AWS – Big Data & Machine Learning
We can’t believe that there are just a couple of weeks left before re:Invent 2016. If you are attending this year, you will want to check out our Big Data sessions! Unlike in previous years, these sessions are covered in multiple tracks, such as Big Data & Analytics, Architecture, Databases, and IoT. We will also have—for the first time—two mini-conferences: Big Data and Machine Learning. These resource mini-conferences include full-day technical deep dives on a broad variety of topics, including big data, IoT, machine learning, and more.
This year, we have over 40 sessions!
We have great sessions from Netflix, Chick-fil-A, Under Armour, FINRA, King.com, Beeswax, GE, Toyota Racing Development, Quantcast, Groupon, Amazon.com, Scholastic,Thomson Reuters, DataXu, Sony, EA, and many more. All sessions are recorded and made available on YouTube. Also, all slide decks from the sessions are made available on SlideShare.net after the conference.
Today, I highlight the sessions to be presented as part of the Big Data & Machine Learning mini-conferences, Big Data analytics, and relevant sessions from other tracks. The following sessions are in this year’s session catalog. Choose any link to learn more or to add a session to your schedule.
We are looking forward to meeting you at re:invent.
BDM205 – Big Data Mini-Con State of the Union – Tuesday Join us for this general session where AWS big data experts present an in-depth look at the current state of big data. Learn about the latest big data trends and industry use cases. Hear how other organizations are using the AWS big data platform to innovate and remain competitive. Take a look at some of the most recent AWS big data announcements, as we kick off the Big Data Mini-Con.
MAC206 – Amazon Machine Learning State of the Union Mini-Con – Wednesday With the growing number of business cases for artificial intelligence (AI), machine learning and deep learning continue to drive the development of state-of-the-art technology. We see this manifested in computer vision, predictive modeling, natural language understanding, and recommendation engines. During this full day of sessions and workshops, learn how we use some of these technologies within Amazon, and how you can develop your applications to leverage the benefits of these AI services.
Deep dive customer use case sessions
ARC306 – Event Handling at Scale: Designing an Auditable Ingestion and Persistence Architecture for 10K+ events/second How does McGraw-Hill Education use the AWS platform to scale and reliably receive 10,000 learning events per second? How do we provide near-real-time reporting and event-driven analytics for hundreds of thousands of concurrent learners in a reliable, secure, and auditable manner that is cost effective? MHE designed and implemented a robust solution that integrates AWS API Gateway, AWS Lambda, Amazon Kinesis, Amazon S3, Amazon Elasticsearch Service, Amazon DynamoDB, HDFS, Amazon EMR, Amazon EC2, and other technologies to deliver this cloud-native platform across the US and soon the world. This session describes the challenges we faced, architecture considerations, how we gained confidence for a successful production roll-out, and the behind-the-scenes lessons we learned.
ARC308 – Metering Big Data at AWS: From 0 to 100 Million Records in 1 Second Learn how AWS processes millions of records per second to support accurate metering across AWS and our customers. This session shows how we migrated from traditional frameworks to AWS managed services to support a broad processing pipeline. You gain insights on how we used AWS services to build a reliable, scalable, and fast processing system using Amazon Kinesis, Amazon S3, and Amazon EMR. Along the way, we dive deep into use cases that deal with scaling and accuracy constraints. Attend this session to see AWS’s end-to-end solution that supports metering at AWS.
BDA203 – Billions of Rows Transformed in Record Time Using Matillion ETL for Amazon Redshift Billions of Rows Transformed in Record Time Using Matillion ETL for Amazon Redshift GE Power & Water develops advanced technologies to help solve some of the world’s most complex challenges related to water availability and quality. They had amassed billions of rows of data on on-premises databases but decided to migrate some of their core big data projects to the AWS Cloud. When they decided to transform and store it all in Amazon Redshift, they knew they needed an ETL/ELT tool that could handle this enormous amount of data and safely deliver it to its destination.
In this session, Ryan Oates, Enterprise Architect at GE Water, shares his use case, requirements, outcomes and lessons learned. He also shares the details of his solution stack, including Amazon Redshift and Matillion ETL for Amazon Redshift in AWS Marketplace. You learn best practices on Amazon Redshift ETL supporting enterprise analytics and big data requirements, simply and at scale. You learn how to simplify data loading, transformation and orchestration on to Amazon Redshift and how to build out a real data pipeline.
BDA204 – Leverage the Power of the Crowd To Work with Amazon Mechanical Turk With Amazon Mechanical Turk (MTurk), you can leverage the power of the crowd for a host of tasks ranging from image moderation and video transcription to data collection and user testing. You simply build a process that submits tasks to the Mechanical Turk marketplace and get results quickly, accurately, and at scale. In this session, Russ, from Rainforest QA, shares best practices and lessons learned from his experience using MTurk. The session covers the key concepts of MTurk, getting started as a Requester, and using MTurk via the API. You learn how to set and manage Worker incentives, achieve great Worker quality, and how to integrate and scale your crowdsourced application. By the end of this session, you have a comprehensive understanding of MTurk and know how to get started harnessing the power of the crowd.
BDA205 – Delighting Customers Through Device Data with Salesforce IoT Cloud and AWS IoT The Internet of Things (IoT) produces vast quantities of data that promise a deep, always connected view into customer experiences through their devices. In this connected age, the question is no longer how do you gather customer data, but what do you do with all that data. How do you ingest at massive scale and develop meaningful experiences for your customers? In this session, you’ll learn how Salesforce IoT Cloud works in concert with the AWS IoT engine to ingest and transform all of the data generated by every one of your customers, partners, devices, and sensors into meaningful action. You’ll also see how customers are using Salesforce and AWS together to process massive quantities of data, build business rules with simple, intuitive tools, and engage proactively with customers in real time. Session sponsored by Salesforce.
BDM203 – FINRA: Building a Secure Data Science Platform on AWS Data science is a key discipline in a data-driven organization. Through analytics, data scientists can uncover previously unknown relationships in data to help an organization make better decisions. However, data science is often performed from local machines with limited resources and multiple datasets on a variety of databases. Moving to the cloud can help organizations provide scalable compute and storage resources to data scientists, while freeing them from the burden of setting up and managing infrastructure. In this session, FINRA, the Financial Industry Regulatory Authority, shares best practices and lessons learned when building a self-service, curated data science platform on AWS. A project that allowed us to remove the technology middleman and empower users to choose the best compute environment for their workloads. Understand the architecture and underlying data infrastructure services to provide a secure, self-service portal to data scientists, learn how we built consensus for tooling from of our data science community, hear about the benefits of increased collaboration among the scientists due to the standardized tools, and learn how you can retain the freedom to experiment with the latest technologies while retaining information security boundaries within a virtual private cloud (VPC).
BDM204 – Visualizing Big Data Insights with Amazon QuickSight Amazon QuickSight is a fast BI service that makes it easy for you to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. QuickSight is built to harness the power and scalability of the cloud, so you can easily run analysis on large datasets, and support hundreds of thousands of users. In this session, we’ll demonstrate how you can easily get started with Amazon QuickSight, uploading files, connecting to Amazon S3 and Amazon Redshift and creating analyses from visualizations that are optimized based on the underlying data. After we’ve built our analysis and dashboard, we’ll show you easy it is to share it with colleagues and stakeholders in just a few seconds.
BDM303 – JustGiving: Serverless Data Pipelines, Event-Driven ETL, and Stream Processing Organizations need to gain insight and knowledge from a growing number of Internet of Things (IoT), application programming interfaces (API), clickstreams, unstructured and log data sources. However, organizations are also often limited by legacy data warehouses and ETL processes that were designed for transactional data. Building scalable big data pipelines with automated extract-transform-load (ETL) and machine learning processes can address these limitations. JustGiving is the world’s social platform for giving. In this session, we describe how we created several scalable and loosely coupled event-driven ETL and ML pipelines as part of our in-house data science platform called RAVEN. You learn how to leverage AWS Lambda, Amazon S3, Amazon EMR, Amazon Kinesis, and other services to build serverless, event-driven, data and stream processing pipelines in your organization. We review common design patterns, lessons learned, and best practices, with a focus on serverless big data architectures with AWS Lambda.
BDM306 – Netflix: Using Amazon S3 as the fabric of our big data ecosystem Amazon S3 is the central data hub for Netflix’s big data ecosystem. We currently have over 1.5 billion objects and 60+ PB of data stored in S3. As we ingest, transform, transport, and visualize data, we find this data naturally weaving in and out of S3. Amazon S3 provides us the flexibility to use an interoperable set of big data processing tools like Spark, Presto, Hive, and Pig. It serves as the hub for transporting data to additional data stores / engines like Teradata, Amazon Redshift, and Druid, as well as exporting data to reporting tools like Microstrategy and Tableau. Over time, we have built an ecosystem of services and tools to manage our data on S3. We have a federated metadata catalog service that keeps track of all our data. We have a set of data lifecycle management tools that expire data based on business rules and compliance. We also have a portal that allows users to see the cost and size of their data footprint. In this talk, we’ll dive into these major uses of S3, as well as many smaller cases, where S3 smoothly addresses an important data infrastructure need. We also provide solutions and methodologies on how you can build your S3 big data hub.
BDM402 – Best Practices for Data Warehousing with Amazon Redshift In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift’s columnar technology and parallel processing capabilities to deliver high throughput and query performance and you learn from king.com how to design optimal schemas, load data efficiently, and use workload management.
DAT202 – Migrating Your Data Warehouse to Amazon Redshift Amazon Redshift is a fast, simple, cost-effective data warehousing solution, and in this session, we look at the tools and techniques you can use to migrate your existing data warehouse to Amazon Redshift. We then present a case study on Scholastic’s migration to Amazon Redshift. Scholastic, a large 100-year-old publishing company, was running their business with older, on-premise, data warehousing and analytics solutions, which could not keep up with business needs and were expensive. Scholastic also needed to include new capabilities like streaming data and real-time analytics. Scholastic migrated to Amazon Redshift, and achieved agility and faster time to insight while dramatically reducing costs. In this session, Scholastic discusses how they achieved this, including options considered, technical architecture implemented, results, and lessons learned.
DAT204 – How Thermo Fisher Is Reducing Mass Spectrometry Experiment Times from Days to Minutes with MongoDB & AWS Mass spectrometry is the gold standard for determining chemical compositions, with spectrometers often measuring the mass of a compound down to a single electron. This level of granularity produces an enormous amount of hierarchical data that doesn’t fit well into rows and columns. In this talk, learn how Thermo Fisher is using MongoDB Atlas on AWS to allow their users to get near real-time insights from mass spectrometry experiments—a process that used to take days. We also share how the underlying database service used by Thermo Fisher was built on AWS.
DAT205 – Fanatics Migrates Data to Hadoop on the AWS Cloud Using Attunity CloudBeam in AWS Marketplace Keeping a data warehouse current and relevant can be challenging because of the time and effort required to insert new data. The world’s most licensed sports merchandiser, Fanatics, used Attunity CloudBeam in AWS Marketplace to transform their data from Microsoft SQL, Oracle, and other sources to Amazon S3, where they consume the data in Hadoop and Amazon Redshift. Fanatics can now analyze the huge volumes of data from their transactional, e-commerce, and back office systems, and make this data available immediately. In this session, Fanatics shares their use case, requirements, outcomes and lessons learned. You’ll learn best practices on implementing a data lake, using Apache Kafka and how to consistently replicate data to Amazon Redshift and Amazon S3.
DAT308 – Fireside chat with Groupon, Intuit, and LifeLock on solving Big Data database challenges with Redis Redis Labs’ CMO is hosting a fireside chat with leaders from multiple industries including Groupon (e-commerce), Intuit (Finance), and LifeLock (Identity Protection). This conversation-style session covers the Big Data related challenges faced by these leading companies as they scale their applications, ensure high availability, serve the best user experience at lowest latencies, and optimize between cloud and on-premises operations. The introductory level session can appeal to both developer and DevOps functions. Attendees hear about diverse use cases such as recommendations engine, hybrid transactions and analytics operations, and time-series data analysis. The audience learns how the Redis in-memory database platform addresses the above use cases with its multi-model capability and in a cost effective manner to meet the needs of the next generation applications. Session sponsored by Redis Labs.
DAT309 – How Fulfillment by Amazon (FBA) and Scopely Improved Results and Reduced Costs with a Serverless Architecture In this session, we share an overview of leveraging serverless architectures to support high-performance data intensive applications. Fulfillment by Amazon (FBA) built the Seller Inventory Authority Platform (IAP) using Amazon DynamoDB Streams, AWS Lambda functions, Amazon Elasticsearch Service, and Amazon Redshift to improve results and reduce costs. Scopely shares how they used a flexible logging system built on Amazon Kinesis, Lambda, and Amazon ES to provide high-fidelity reporting on hotkeys in Memcached and DynamoDB, and drastically reduce the incidence of hotkeys. Both of these customers are using managed services and serverless architecture to build scalable systems that can meet the projected business growth without a corresponding increase in operational costs
DAT310 – Building Real-Time Campaign Analytics Using AWS Services Quantcast provides its advertising clients the ability to run targeted ad campaigns reaching millions of online users. The real-time bidding for campaigns runs on thousands of machines across the world. When Quantcast wanted to collect and analyze campaign metrics in real time, they turned to AWS to rapidly build a scalable, resilient, and extensible framework. Quantcast used Amazon Kinesis streams to stage data, Amazon EC2 instances to shuffle and aggregate the data, and Amazon DynamoDB and Amazon ElastiCache for building scalable time-series databases. With Elastic Load Balancing and Auto Scaling groups, they can set up distributed microservices with minimal operation overhead. This session discusses their use case, how they architected the application with AWS technologies integrated with their existing home-grown stack, and the lessons they learned.
DAT311 – How Toyota Racing Development Makes Racing Decisions in Real Time with AWS In this session, you learn how Toyota Racing Development (TRD) developed a robust and highly performant real-time data analysis tool for professional racing. In this talk, learn how we structured a reliable, maintainable, decoupled architecture built around Amazon DynamoDB as both a streaming mechanism and a long-term persistent data store. In racing, milliseconds matter and even moments of downtime can cost a race. You’ll see how we used DynamoDB together with Amazon Kinesis Streams and Amazon Kinesis Firehose to build a real-time streaming data analysis tool for competitive racing.
DAT312 – How DataXu scaled its Attribution System to handle billions of events per day with Amazon DynamoDB “Attribution” is the marketing term of art for allocating full or partial credit to advertisements that eventually lead to purchase, sign up, download, or other desired consumer interaction. DataXu shares how we use DynamoDB at the core of our attribution system to store terabytes of advertising history data. The system is cost effective and dynamically scales from 0 to 300K requests per second on demand with predictable performance and low operational overhead.
DAT313 – 6 Million New Registrations in 30 Days: How the Chick-fil-A One App Scaled with AWS Chris leads the team providing back-end services for the massively popular Chick-fil-A One mobile app that launched in June 2016. Chick-fil-A follows AWS best practices for web services and leverages numerous AWS services, including Elastic Beanstalk, DynamoDB, Lambda, and Amazon S3. This was the largest technology-dependent promotion in Chick-fil-A history. To ensure their architecture would perform at unknown and massive scale, Chris worked with AWS Support through an AWS Infrastructure Event Management (IEM) engagement and leaned on automated operations to enable load testing before launch.
DAT316 – How Telltale Games migrated its story analytics from Apache CouchDB to Amazon DynamoDB Every choice made in Telltale Games titles influences how your character develops and how the world responds to you. With millions of users making thousands of choices in a single episode, Telltale Games tracks this data and leverages it to build more relevant stories in real time as the season is developed. In this session, you’ll learn about Telltale Games’ migration from Apache CouchDB to Amazon DynamoDB, the challenges of adjusting capacity to handling spikes in database activity, and how it streamlined its analytics storage to provide new perspectives on player interaction to improve its games.
DAT318 – Migrating from RDBMS to NoSQL: How Sony Moved from MySQL to Amazon DynamoDB In this session, you learn the key differences between a relational database management service (RDBMS) and non-relational (NoSQL) databases like Amazon DynamoDB. You learn about suitable and unsuitable use cases for NoSQL databases. You’ll learn strategies for migrating from an RDBMS to DynamoDB through a 5-phase, iterative approach. See how Sony migrated an on-premises MySQL database to the cloud with Amazon DynamoDB, and see the results of this migration.
GAM301 – How EA Leveraged Amazon Redshift and AWS Partner 47Lining to Gather Meaningful Player Insights In November 2015, Capital Games launched a mobile game accompanying a major feature film release. The back end of the game is hosted in AWS and uses big data services like Amazon Kinesis, Amazon EC2, Amazon S3, Amazon Redshift, and AWS Data Pipeline. Capital Games describe some of their challenges on their initial setup and usage of Amazon Redshift and Amazon EMR. They then go over their engagement with AWS Partner 47lining and talk about specific best practices regarding solution architecture, data transformation pipelines, and system maintenance using AWS big data services. Attendees of this session should expect a candid view of the process to implementing a big data solution. From problem statement identification to visualizing data, with an in-depth look at the technical challenges and hurdles along the way.
LFS303 – How to Build a Big Data Analytics Data Lake For discovery phase research, life sciences companies have to support infrastructure that processes millions to billions of transactions. The advent of a data lake to accomplish such a task is showing itself to be a stable and productive data platform pattern to meet the goal. We discuss how to build a data lake on AWS, using services and techniques such as AWS CloudFormation, Amazon EC2, Amazon S3, IAM, and AWS Lambda. We also review a reference architecture from Amgen that uses a data lake to aid in their Life Science Research.
SVR301 – Real-time Data Processing Using AWS Lambda, Amazon Kinesis In this session, you learn from Thomson Reuters how they leverage AWS for its Product Insight service. The service provides insights to collect usage analytics for Thomson Reuters products. They walk through its architecture and demonstrate how they leverage Amazon Kinesis Streams, Amazon Kinesis Firehose, AWS Lambda, Amazon S3, Amazon Route 53, and AWS KMS for near real-time access to data being collected around the globe. They also outline how applying AWS methodologies benefited its business, such as time-to-market and cross-region ingestion, auto-scaling capabilities, low-latency, security features, and extensibility.
SVR305 – ↑↑↓↓←→←→ BA Lambda Start Ever wished you had a list of cheat codes to unleash the full power of AWS Lambda for your production workload? Come learn how to build a robust, scalable, and highly available serverless application using AWS Lambda. In this session, we discuss hacks and tricks for maximizing your AWS Lambda performance, such as leveraging customer reuse, using the 500 MB scratch space and local cache, creating custom metrics for managing operations, aligning upstream and downstream services to scale along with Lambda, and many other workarounds and optimizations across your entire function lifecycle. You also learn how Hearst converted its real-time clickstream analytics data pipeline from a server-based model to a serverless one. The infrastructure of the data pipeline relied on Amazon EC2 instances and cron jobs to shepherd data through the process. In 2016, Hearst converted its data pipeline architecture to a serverless process that is based on event triggers and the power of AWS Lambda. By moving from a time-based process to a trigger-based process, Hearst improved its pipeline latency times by 50%.
SVR308 – Content and Data Platforms at Vevo: Rebuilding and Scaling from Zero in One Year Vevo has undergone a complete strategic and technical reboot, driven not only by product but also by engineering. Since November 2015, Vevo has been replacing monolithic, legacy content services with a modern, modular, microservices architecture, all while developing new features and functionality. In parallel, Vevo has built its data platform from scratch to power the internal analytics as well as a unique music video consumption experience through a new personalized feed of recommendations — all in less than one year. This has been a monumental effort that was made possible in this short time span largely because of AWS technologies. The content team has been heavily using serverless architectures and AWS Lambda in the form of microservices, taking a similar approach to functional programming, which has helped us speed up the development process and time to market. The data team has been building the data platform by heavily leveraging Amazon Kinesis for data exchange across services, Amazon Aurora for consumer-facing services, Apache Spark on Amazon EMR for ETL + Machine Learning, as well as Amazon Redshift as the core analytics data store..
Machine learning sessions
MAC201 – Getting to Ground Truth with Amazon Mechanical Turk Jump-start your machine learning project by using the crowd to build your training set. Before you can train your machine learning algorithm, you need to take your raw inputs and label, annotate, or tag them to build your ground truth. Learn how to use the Amazon Mechanical Turk marketplace to perform these tasks. We share Amazon’s best practices, developed while training our own machine learning algorithms and walk you through quickly getting affordable and high-quality training data.
MAC202 – Deep Learning in Alexa Neural networks have a long and rich history in automatic speech recognition. In this talk, we present a brief primer on the origin of deep learning in spoken language, and then explore today’s world of Alexa. Alexa is the AWS service that understands spoken language and powers Amazon Echo. Alexa relies heavily on machine learning and deep neural networks for speech recognition, text-to-speech, language understanding, and more. We also discuss the Alexa Skills Kit, which lets any developer teach Alexa new skills.
MAC205 – Deep Learning at Cloud Scale: Improving Video Discoverability by Scaling Up Caffe on AWS Deep learning continues to push state of the art in domains such as video analytics, computer vision, and speech recognition. Deep networks are powered by amazing levels of representational power, feature learning, and abstraction. This approach comes at the cost of a significant increase in required compute power, which makes the AWS cloud an excellent environment for training. Innovators in this space are applying deep learning to a variety of applications. One such innovator, Vilynx, a startup based in Palo Alto, realized that the current pre-roll advertising-based models for mobile video weren’t returning publishers’ desired levels of engagement. In this session, we explain the algorithmic challenges of scaling across multiple nodes, and what Intel is doing on AWS to overcome them. We describe the benefits of using AWS CloudFormation to set up a distributed training environment for deep networks. We also showcase Vilynx’s contributions to video discoverability and explain how Vilynx uses AWS tools to understand video content.
MAC301 – Transforming Industrial Processes with Deep Learning Deep learning has revolutionized computer vision by significantly increasing the accuracy of recognition systems. This session discusses how the Amazon Fulfillment Technologies Computer Vision Research team has harnessed deep learning to identify inventory defects in Amazon’s warehouses. Beginning with a brief overview of how orders on Amazon.com are fulfilled, the session describes a combination of hardware and software that uses computer vision and deep learning that visually examine bins of Amazon inventory to locate possible mismatches between the physical inventory and inventory records. With the growth of deep learning, the emphasis of new system design shifts from clever algorithms to innovative ways to harness available data.
MAC302 – Leveraging Amazon Machine Learning, Amazon Redshift, and an Amazon Simple Storage Service Data Lake for Strategic Advantage in Real Estate The Howard Hughes Corporation partnered with 47Lining to develop a managed enterprise data lake based on Amazon S3. The purpose of the managed EDL is to fuse relevant on-premises and third-party data to enable Howard Hughes to answer its most valuable business questions. Their first analysis was a lead-scoring model that uses Amazon Machine Learning (Amazon ML) to predict propensity to purchase high-end real estate. The model is based on a combined set of public and private data sources, including all publicly recorded real estate transactions in the US for the past 35 years. By changing their business process for identifying and qualifying leads to use the results of data-driven analytics from their managed data lake in AWS, Howard Hughes increased the number of identified qualified leads in their pipeline by over 400% and reduced the acquisition cost per lead by more than 10 times. In this session, you see a practical example of how to use Amazon ML to improve business results, how to architect a data lake with Amazon S3 that fuses on-premises, third-party, and public datasets, and how to train and run an Amazon ML model to attain predictions
MAC303 – Developing Classification and Recommendation Engines with Amazon EMR and Apache Spark Customers are adopting Apache Spark‒a set of open-source distributed machine learning algorithms‒on Amazon EMR for large-scale machine learning workloads, especially for applications that power customer segmentation and content recommendation. By leveraging Spark ML, customers can quickly build and execute massively parallel machine learning jobs. Additionally, Spark applications can train models in streaming or batch contexts and can access data from Amazon S3, Amazon Kinesis, Apache Kafka, Amazon Elasticsearch Service, Amazon Redshift, and other services. This session explains how to quickly and easily create scalable Spark clusters with Amazon EMR, build and share models using Apache Zeppelin notebooks, and create a sample application using Spark Streaming, which updates models with real-time data.
MAC306 – Using MXNet for Recommendation Modeling at Scale For many companies, recommendation systems solve important machine learning problems. But as recommendation systems grow to millions of users and millions of items, they pose significant challenges when deployed at scale. The user-item matrix can have trillions of entries (or more), most of which are zero. To make common ML techniques practical, sparse data requires special techniques. Learn how to use MXNet to build neural network models for recommendation systems that can scale efficiently to large sparse datasets.
MAC307 – Predicting Customer Churn with Amazon Machine Learning In this session, we take a specific business problem—predicting Telco customer churn—and explore the practical aspects of building and evaluating an Amazon Machine Learning model. We explore considerations ranging from assigning a dollar value to applying the model using the relative cost of false positive and false negative errors. We discuss all aspects of putting Amazon ML to practical use, including how to build multiple models to choose from, put models into production, and update them. We also discuss using Amazon Redshift and Amazon S3 with Amazon ML.
Services sessions: Architecture and best practices
BDM201 – Big Data Architectural Patterns and Best Practices on AWS The world is producing an ever-increasing volume, velocity, and variety of big data. Consumers and businesses are demanding up-to-the-second (or even millisecond) analytics on their fast-moving data, in addition to classic batch processing. AWS delivers many technologies for solving big data problems. But what services should you use, why, when, and how? In this session, we simplify big data processing as a data bus comprising various stages: ingest, store, process, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architecture, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost
BDM301 – Best Practices for Apache Spark on Amazon EMR Organizations need to perform increasingly complex analysis on data — streaming analytics, ad-hoc querying, and predictive analytics — in order to get better customer insights and actionable business intelligence. Apache Spark has recently emerged as the framework of choice to address many of these challenges. In this session, we show you how to use Apache Spark on AWS to implement and scale common big data use cases such as real-time data processing, interactive data science, predictive analytics, and more. We talk about common architectures, best practices to quickly create Spark clusters using Amazon EMR, and ways to integrate Spark with other big data services in AWS.
BDM302 – Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service and Kibana Elasticsearch is a fully featured search engine used for real-time analytics, and Amazon Elasticsearch Service makes it easy to deploy Elasticsearch clusters on AWS. With Amazon ES, you can ingest and process billions of events per day, and explore the data using Kibana to discover patterns. In this session, we use Apache web logs as example and show you how to build an end-to-end analytics solution. First, we cover how to configure an Amazon ES cluster and ingest data into it using Amazon Kinesis Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data. Then we demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we dive deep into the Elasticsearch query DSL and review approaches for generating custom, ad-hoc reports.
BDM304 – Analyzing Streaming Data in Real-time with Amazon Kinesis Analytics As more and more organizations strive to gain real-time insights into their business, streaming data has become ubiquitous. Typical streaming data analytics solutions require specific skills and complex infrastructure. However, with Amazon Kinesis Analytics, you can analyze streaming data in real time with standard SQL—there is no need to learn new programming languages or processing frameworks. In this session, we dive deep into the capabilities of Amazon Kinesis Analytics using real-world examples. We’ll present an end-to-end streaming data solution using Amazon Kinesis Streams for data ingestion, Amazon Kinesis Analytics for real-time processing, and Amazon Kinesis Firehose for persistence. We review in detail how to write SQL queries using streaming data and discuss best practices to optimize and monitor your Amazon Kinesis Analytics applications. Lastly, we discuss how to estimate the cost of the entire system.
BDM401 – Deep Dive: Amazon EMR Best Practices & Design Patterns Amazon EMR is one of the largest Hadoop operators in the world. In this session, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters, and other Amazon EMR architectural best practices. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We also share best practices to keep your Amazon EMR cluster cost-efficient. Finally, we dive into some of our recent launches to keep you current on our latest features.
DAT304 – Deep Dive on Amazon DynamoDB Explore Amazon DynamoDB capabilities and benefits in detail and learn how to get the most out of your DynamoDB database. We go over best practices for schema design with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, DynamoDB Streams, and more. We also provide lessons learned from operating DynamoDB at scale, including provisioning DynamoDB for IoT.
BDM202 – Workshop: Building Your First Big Data Application with AWS Want to get ramped up on how to use Amazon’s big data web services and launch your first big data application on AWS? Join us in this workshop as we build a big data application in real-time using Amazon EMR, Amazon Redshift, Amazon Kinesis, Amazon DynamoDB, and Amazon S3. We review architecture design patterns for big data solutions on AWS and give you access to a take-home lab so that you can rebuild and customize the application yourself.
IOT306 – IoT Visualizations and Analytics In this workshop, we focus on visualizations of IoT data using ELK, Amazon Elasticsearch Service, Logstash, and Kibana or Amazon Kinesis. We dive into how these visualizations can give you new capabilities and understanding when interacting with your device data from the context they provide on the world around them.
MAC401 – Scalable Deep Learning Using MXNet Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding, and recommendation engines. One of the key reasons for this progress is the availability of highly flexible and developer friendly deep learning frameworks. During this workshop, members of the Amazon Machine Learning team provide a short background on Deep Learning focusing on relevant application domains and an introduction to using the powerful and scalable Deep Learning framework, MXNet. At the end of this tutorial, you’ll gain hands-on experience targeting a variety of applications including computer vision and recommendation engines as well as exposure to how to use preconfigured Deep Learning AMIs and CloudFormation Templates to help speed your development.
STG312 – Workshop: Working with AWS Snowball – Accelerating Data Ingest into the Cloud This workshop provides customers with the opportunity to work hands-on with the AWS Snowball service, with attendees broken out into small teams to perform various on-premises to cloud data transfer scenarios using actual Snowball devices. These scenarios include migrating backup & archive data to S3-IA and Amazon Glacier, HDFS cluster migration to S3 for use with Amazon EMR and Amazon Redshift, and leveraging the Snowball API & SDK to build AWS Snowball service integration into a custom application. The session opens with an overview of the service, objectives, and guidance on where to find resources. Attendees should bring their own laptops and should have a basic familiarity with AWS storage services (S3 and Amazon Glacier). Prerequisites: Participants should have an AWS account established and available for use during the workshop. Please bring your own laptop.
A week ago Friday, someone took down numerous popular websites in a massive distributed denial-of-service (DDoS) attack against the domain name provider Dyn. DDoS attacks are neither new nor sophisticated. The attacker sends a massive amount of traffic, causing the victim’s system to slow to a crawl and eventually crash. There are more or less clever variants, but basically, it’s a datapipe-size battle between attacker and victim. If the defender has a larger capacity to receive and process data, he or she will win. If the attacker can throw more data than the victim can process, he or she will win.
The attacker can build a giant data cannon, but that’s expensive. It is much smarter to recruit millions of innocent computers on the internet. This is the “distributed” part of the DDoS attack, and pretty much how it’s worked for decades. Cybercriminals infect innocent computers around the internet and recruit them into a botnet. They then target that botnet against a single victim.
You can imagine how it might work in the real world. If I can trick tens of thousands of others to order pizzas to be delivered to your house at the same time, I can clog up your street and prevent any legitimate traffic from getting through. If I can trick many millions, I might be able to crush your house from the weight. That’s a DDoS attack it’s simple brute force.
As you’d expect, DDoSers have various motives. The attacks started out as a way to show off, then quickly transitioned to a method of intimidation or a way of just getting back at someone you didn’t like. More recently, they’ve become vehicles of protest. In 2013, the hacker group Anonymous petitioned the White House to recognize DDoS attacks as a legitimate form of protest. Criminals have used these attacks as a means of extortion, although one group found that just the fear of attack was enough. Military agencies are also thinking about DDoS as a tool in their cyberwar arsenals. A 2007 DDoS attack against Estonia was blamed on Russia and widely called an act of cyberwar.
The DDoS attack against Dyn two weeks ago was nothing new, but it illustrated several important trends in computer security.
These attack techniques are broadly available. Fully capable DDoS attack tools are available for free download. Criminal groups offer DDoS services for hire. The particular attack technique used against Dyn was first used a month earlier. It’s called Mirai, and since the source code was released four weeks ago, over a dozen botnets have incorporated the code.
The Dyn attacks were probably not originated by a government. The perpetrators were most likely hackers mad at Dyn for helping Brian Krebs identify and the FBI arrest two Israeli hackers who were running a DDoS-for-hire ring. Recently I have written about probing DDoS attacks against internet infrastructure companies that appear to be perpetrated by a nation-state. But, honestly, we don’t know for sure.
This is important. Software spreads capabilities. The smartest attacker needs to figure out the attack and write the software. After that, anyone can use it. There’s not even much of a difference between government and criminal attacks. In December 2014, there was a legitimate debate in the security community as to whether the massive attack against Sony had been perpetrated by a nation-state with a $20 billion military budget or a couple of guys in a basement somewhere. The internet is the only place where we can’t tell the difference. Everyone uses the same tools, the same techniques and the same tactics.
These attacks are getting larger. The Dyn DDoS attack set a record at 1.2 Tbps. The previous record holder was the attack against cybersecurity journalist Brian Krebs a month prior at 620 Gbps. This is much larger than required to knock the typical website offline. A year ago, it was unheard of. Now it occurs regularly.
The botnets attacking Dyn and Brian Krebs consisted largely of unsecure Internet of Things (IoT) devices webcams, digital video recorders, routers and so on. This isn’t new, either. We’ve already seen internet-enabled refrigerators and TVs used in DDoS botnets. But again, the scale is bigger now. In 2014, the news was hundreds of thousands of IoT devices the Dyn attack used millions. Analysts expect the IoT to increase the number of things on the internet by a factor of 10 or more. Expect these attacks to similarly increase.
The problem is that these IoT devices are unsecure and likely to remain that way. The economics of internet security don’t trickle down to the IoT. Commenting on the Krebs attack last month, I wrote:
The market can’t fix this because neither the buyer nor the seller cares. Think of all the CCTV cameras and DVRs used in the attack against Brian Krebs. The owners of those devices don’t care. Their devices were cheap to buy, they still work, and they don’t even know Brian. The sellers of those devices don’t care: They’re now selling newer and better models, and the original buyers only cared about price and features. There is no market solution because the insecurity is what economists call an externality: It’s an effect of the purchasing decision that affects other people. Think of it kind of like invisible pollution.
To be fair, one company that made some of the unsecure things used in these attacks recalled its unsecure webcams. But this is more of a publicity stunt than anything else. I would be surprised if the company got many devices back. We already know that the reputational damage from having your unsecure software made public isn’t large and doesn’t last. At this point, the market still largely rewards sacrificing security in favor of price and time-to-market.
DDoS prevention works best deep in the network, where the pipes are the largest and the capability to identify and block the attacks is the most evident. But the backbone providers have no incentive to do this. They don’t feel the pain when the attacks occur and they have no way of billing for the service when they provide it. So they let the attacks through and force the victims to defend themselves. In many ways, this is similar to the spam problem. It, too, is best dealt with in the backbone, but similar economics dump the problem onto the endpoints.
We’re unlikely to get any regulation forcing backbone companies to clean up either DDoS attacks or spam, just as we are unlikely to get any regulations forcing IoT manufacturers to make their systems secure. This is me again:
What this all means is that the IoT will remain insecure unless government steps in and fixes the problem. When we have market failures, government is the only solution. The government could impose security regulations on IoT manufacturers, forcing them to make their devices secure even though their customers don’t care. They could impose liabilities on manufacturers, allowing people like Brian Krebs to sue them. Any of these would raise the cost of insecurity and give companies incentives to spend money making their devices secure.
That leaves the victims to pay. This is where we are in much of computer security. Because the hardware, software and networks we use are so unsecure, we have to pay an entire industry to provide after-the-fact security.
There are solutions you can buy. Many companies offer DDoS protection, although they’re generally calibrated to the older, smaller attacks. We can safely assume that they’ll up their offerings, although the cost might be prohibitive for many users. Understand your risks. Buy mitigation if you need it, but understand its limitations. Know the attacks are possible and will succeed if large enough. And the attacks are getting larger all the time. Prepare for that.
На 15 септември стана известно решение на Съда на ЕС по дело C‑484/14 с предмет преюдициално запитване, отправено на основание член 267 ДФЕС от Landgericht München I (Областен съд Мюнхен I, Германия) в рамките на производство по дело Tobias Mc Fadden срещу Sony Music Entertainment Germany GmbH относно евентуалната отговорност за използването от трето лице на безжичната локална мрежа (Wireless local area network (WLAN), която г‑н Mc Fadden поддържа и чрез която е предоставен на публично разположение звукозапис, продуциран от Sony Music.
Г‑н Mc Fadden е управител на предприятие, което продава или отдава под наем осветителна и звукова техника. Той поддържа безжична локална мрежа, предоставяща в близост до предприятието му безплатен и анонимен достъп до интернет. За да предоставя достъпа до интернет, г‑н Mc Fadden използва услугите на предприятие за далекосъобщителни услуги. Достъпът до мрежата умишлено не е защитен с цел да се привлече вниманието върху дружеството на клиенти от съседни магазини, минувачи и съседи. Чрез поддържаната от г‑н Mc Fadden мрежа на публично разположение в интернет безвъзмездно е предоставено музикално произведение без съгласието на притежателите на правата върху него. Г‑н Mc Fadden твърди, че не е извършил твърдяното нарушение, но не може да изключи, че то е извършено от някой от потребителите на неговата мрежа.
Sony Music е продуцент на звукозаписа на това произведение. Запитващата юрисдикция възнамерява да ангажира косвената отговорност (Störerhaftung) на г‑н Mc Fadden, тъй като същият не е защитил мрежата и с това е позволил анонимното извършване на нарушението.
Изяснява се прилагането на чл.12.1 от Директивата за електронната търговия
1. Когато се предоставя услуга на информационното общество, която се състои в пренасяне по комуникационната мрежа на информация за получателя на услугата, или предоставяне на достъп до комуникационна мрежа, държавите членки гарантират, че доставчикът на услуги не носи отговорност за пренесената информация, при условие че доставчикът:
а) не започва пренасянето на информация;
б) не подбира получателя на пренесената информация; и
в) не подбира или променя информацията, която се съдържа в пренасянето.
Зададени са 10 въпроса с подвъпроси, целящи изясняване на отговорността при използване на Wi Fi (Терминът „Wi-Fi“ е общоизползван термин за означаване на безжична мрежа и е марка, която се отнася за най-често срещания стандарт на безжична мрежа. Общият термин за означаване на всякакъв вид безжична мрежа е „WLAN“ – Wireless local area network).
Включително се задава въпрос следва ли чл.12.1 да се тълкува в смисъл, че допуска съдът да постанови решение, с което да забрани на доставчика занапред да дава възможност на трети лица чрез конкретна интернет връзка да предоставят електронен достъп до защитено с авторско право произведение или части от него чрез интернет платформите за обмен на файлове (peer-to-peer).
Какво е тълкуването, когато доставчикът на достъп в действителност може да изпълни тази съдебна забрана само като изключи интернет връзката или я защити с парола или като проследява цялата протичаща през тази връзка комуникация с цел да установи дали отново е налице незаконно пренасяне на защитено с авторско право произведение, при положение че това е установено […] още в самото начало, а […] не едва в производството по принудително изпълнение или в административно-наказателното производство.
По преюдициалните въпроси
1 Услуги на информационното общество по член 12, параграф 1 от Директива 2000/31 могат да са само услуги, които нормално се предоставят срещу възнаграждение. Въз основа на това обаче не може да се направи изводът, че икономическа по естеството си услуга, която е предоставена безвъзмездно, не може никога да се счита за „услуга на информационното общество“ – защото възнаграждението за услуга, която доставчик предоставя в рамките на икономическата си дейност, не трябва непременно да е платено от получателите ѝ – например когато доставчик безвъзмездно предоставя услуга с цел реклама на стоките, които продава, и на услугите, които предоставя, като разходите за тази дейност са включени в продажната цена на тези стоки или услуги [т.34 – 43].
Следователно услуга като разглежданата в главното производство, предоставяна от лице, което поддържа комуникационна мрежа, и състояща се в безплатното предоставяне на публично разположение на тази мрежа, представлява „услуга на информационното общество“, ако съответният доставчик я извършва с цел реклама на стоките, които продава, или на услугите, които предоставя.
2. Визираната в чл.12.1 услуга, състояща се в предоставяне на достъп до комуникационна мрежа, се счита за предоставена, ако е налице достъп – технически, автоматичен и пасивен способ за осигуряване на пренасянето на необходимата информация – без да се изисква да е изпълнено каквото и да било допълнително изискване.
3. Разликата между отговорността за кеширане и отговорността за предоставяне на достъп до интернет- чл.14 не се прилага по аналогия
Лице, което съхранява уебсайт, предоставя услуга по съхраняване на информация за определен период от време. Следователно то може да узнае за незаконния характер на дадена информация, която съхранява, на по-късен етап, след като вече я е съхранило, като все още може да предприеме действия с оглед отстраняването или блокирането на достъпа до нея.
Лице, което предоставя достъп до комуникационна мрежа, предоставя услуга по пренос на информация, която обикновено не се проточва във времето, така че след като е пренесло информацията, то не упражнява какъвто и да било контрол върху нея. Предвид това лицето, което предоставя достъп до комуникационна мрежа, за разлика от лицето, което съхранява уебсайт, често няма възможност да предприеме на по-късен етап действия с оглед отстраняването или блокирането на достъпа до съответната информация.
4. Когато трето лице извърши нарушение посредством интернет връзка, която доставчик на достъп до комуникационна мрежа му е предоставил на разположение, член 12, параграф 1 от споменатата директива допуска увреденото от това нарушение лице да поиска от национален орган или съд да забрани на доставчика да позволява продължаване на това нарушение ( вж и член 12, параграф 3). Ето защо Съдът приема, че чл.12
не допуска лице, увредено от нарушение на правата му върху произведение, да може да предяви искане за обезщетение за вреди от доставчик на достъп до комуникационна мрежа с мотива, че такъв достъп е използван от трети лица за нарушение на правата му –
но допуска това лице да предяви искане за забрана на продължаването на нарушението
5. И накрая, Съдът обсъжда може ли да се изисква от доставчик на достъп до комуникационна мрежа, позволяваща публично достъпна интернет връзка, да възпрепятства трети лица да предоставят на публично разположение посредством тази интернет връзка защитено с авторско право произведение или части от него чрез интернет платформите за обмен на файлове (peer-to-peer), когато доставчикът действително е свободен да избира какви технически мерки да вземе за съобразяване с тази забрана, но на практика е установено, че единствените мерки, които би могъл да вземе, са или да спре интернет връзката, или да я защити с парола, или да проследява цялата пренасяна посредством тази връзка информация [т.80 – 101].
Съдът констатира конкуренция на права – право на интелектуална собственост и право на свободна стопанска дейност – и съответно търси справедливо равновесие. Като разглежда трите опции, Съдът приема, че мярката, състояща се в защита на интернет връзката с парола, може да доведе до ограничаване както на правото на стопанска инициатива на доставчика на услуга за достъп до комуникационна мрежа, така и правото на свобода на информацията на получателите на тази услуга – но при все това следва да се констатира:
На първо място, че такава мярка не засяга същественото съдържание на правото на стопанска инициатива на доставчик на достъп до комуникационна мрежа, тъй като само незначително променя един от техническите способи за извършване на дейността му;
На второ място, мярка, състояща се в защита на интернет връзка, не изглежда да може да засегне същественото съдържание на правото на свободна информация на получателите на услуга за достъп до интернет мрежа, доколкото само изисква от последните да поискат да получат парола, като при това тази връзка е само един от начините да имат достъп до интернет
На трето място, видно от съдебната практика, взетата мярка трябва да е с точно определена цел, в смисъл че трябва да служи за преустановяване на извършвано от трето лице нарушение на авторско право или на сродно на него право, без при това да засяга възможността на потребителите на интернет да ползват услугите на този доставчик за правомерен достъп до информация. В противен случай намесата на доставчика в свободата на информация на потребителите би била необоснована с оглед на преследваната цел (решение от 27 март 2014 г., UPC Telekabel Wien, C‑314/12, EU:C:2014:192, т. 56).
При все това обаче мярка, взета от доставчик на достъп до комуникационна мрежа и състояща се в защита на връзката на тази мрежа с интернет, не изглежда да може да засегне възможността, с която разполагат ползващите услугите на този доставчик потребители на интернет, да имат правомерен достъп до информация, тъй като не води до каквото и да било блокиране на уебсайт.
На четвърто място, Съдът приема, че мерките, взети от адресат на забрана като разглежданата в главното производство за изпълнението ѝ, трябва да бъдат достатъчно ефикасни, за да осигурят ефективна защита на разглежданото основно право, тоест трябва да имат за резултат да предотвратят или поне да направят трудно осъществими неразрешените посещения на закриляни обекти и в значителна степен да разубеждават потребителите на интернет, които ползват услугите на адресата на това разпореждане, да посещават тези обекти, предоставени на тяхно разположение в нарушение на посоченото основно право (решение от 27 март 2014 г., UPC Telekabel Wien, C‑314/12, EU:C:2014:192, т. 62).
В това отношение трябва да се констатира, че мярка, състояща се в защита на интернет връзка с парола, може да разубеди потребителите на тази връзка да извършват нарушения на авторско право или на сродни на него права, доколкото тези потребители биха били задължени да се идентифицират, за да получат необходимата парола, и не биха могли следователно да действат анонимно, като запитващата юрисдикция следва да провери дали това е така.
На пето място, следва да се напомни, че според запитващата юрисдикция, освен трите посочени от нея мерки, не съществуват други мерки, които доставчик на достъп до комуникационна мрежа като разглежданата в главното производство на практика би могъл да приложи, за да изпълни забрана като разглежданата в главното производство.
След като Съдът отхвърли останалите две мерки, евентуална констатация, че доставчик на достъп до комуникационна мрежа не трябва и да защити интернет връзката си, би довела до лишаване на основното право на интелектулна собственост от всякаква защита, като това би противоречало на идеята за справедливото равновесие.
Предвид това мярка, целяща да се защити интернет връзка с парола, трябва да се приеме като необходима за гарантиране на ефективната защита на основното право на защита на интелектуалната собственост.
Защитата на връзката с парола, смята Съдът, трябва да се приеме за подходяща за установяване на справедливо равновесие между, от една страна, основното право на защита на интелектуалната собственост, и от друга страна, правото на стопанска инициатива на доставчика на услуга за достъп до комуникационна мрежа, както и правото на свобода на информацията на получателите на тази услуга. Ето и конкретният отговор:
Член 12, параграф 1 от Директива 2000/31, във връзка с член 12, параграф 3 от същата директива, трябва да се тълкува, предвид изискванията, следващи от защитата на основните права, както и от предвидените в Директиви 2001/29 и 2004/48 правила, в смисъл, че не допуска прилагането на скрепена със санкция забрана като разглежданата в главното производство, която изисква от доставчик на достъп до комуникационна мрежа, позволяваща публично достъпна интернет връзка, да възпрепятства трети лица да предоставят на публично разположение посредством тази интернет връзка защитено с авторско право произведение или части от него чрез интернет платформите за обмен на файлове (peer-to-peer), когато доставчикът е свободен да избира какви технически мерки да вземе за съобразяване с тази забрана, дори и ако изборът му се свежда само до една мярка, състояща се в защита на интернет връзката с парола, при условие че потребителите на мрежата бъдат задължени да се идентифицират, за да получат необходимата парола, и не могат следователно да действат анонимно, като запитващата юрисдикция следва да провери дали това е така.
Опасението, че Съдът ще намери точка на равновесие на правата в използването на пароли се появи въпреки позитивните оценки за заключението на Генералния адвокат Spuznar – и още тогава имаше акции в защита на свободния WiFi. В заключението се казваше, че принуждаването на доставчиците да въвеждат защита с парола може да обезкуражи или възпрепятства използването на услугата WiFi и по този начин да подкопае бизнес модела на доставчика. Szpunar коментира, че “предоставяне на активна превантивна роля на междинните доставчици на услуги би било в противоречие с техния специален статут, който е защитен по силата на Директива 2000/31” и че да се принуждават доставчиците на защита с парола не е пропорционална стратегия за защита на авторското право – вж оценката на EDRI за заключението на Szpunar.
Днес Комисията предложи нова инициатива, за да се даде възможност на всички заинтересовани местни власти да предлагат безплатен безжичен интернет (Wi-Fi) на всички граждани, например във и около обществени сгради, здравни центрове, паркове или площади. С първоначален бюджет от 120 млн. евро тази нова публична ваучерна схема има потенциал да осигури интернет на хиляди обществени места, където ще се осъществяват между 40 и 50 милиона Wi-Fi връзки на ден. Финансирането за инсталиране на локални точки за безжичен достъп до интернет трябва бързо да стане налично, след приемането на схемата от Европейския парламент и държавите от ЕС. До края на 2020 г. поне между 6000 и 8000 местни общности ще могат да се възползват от този нов проект. Както е предвидено в Директивата за електронната търговия, местните органи, предлагащи такива услуги на своите граждани, няма да носят отговорност за предаваното съдържание.
Само не е добавено, че се на цената на края на отворения (password-free) WiFi.
Широко се коментира разминаването между решението Mc Fadden от 15 септември 2016 и изявлението на Юнкер за свързаност и прочее от 14 септември 2016.
Реакцията на Юлия Реда: След това решение обявените от Юнкер цели изглеждат по-нереалистични от всякога. Съжалявам, че Съдът на Европейския съюз не следва заключението на Генералния адвокат, който установи, че задължението за осигуряване на достъп до Wi-Fi с парола би довело до непропорционално големи вреди за обществото като цяло. Ползата за обществото от безплатен безжичен Wi-Fi далеч надвишава потенциалните рискове за притежателите на авторски права.
In the past few years, the devastating effects of hackers breaking into an organization’s network, stealing confidential data, and publishing everything have been made clear. It happened to the Democratic National Committee, to Sony, to the National Security Agency, to the cyber-arms weapons manufacturer Hacking Team, to the online adultery site Ashley Madison, and to the Panamanian tax-evasion law firm Mossack Fonseca.
This style of attack is known as organizational doxing. The hackers, in some cases individuals and in others nation-states, are out to make political points by revealing proprietary, secret, and sometimes incriminating information. And the documents they leak do that, airing the organizations’ embarrassments for everyone to see.
In all of these instances, the documents were real: the email conversations, still-secret product details, strategy documents, salary information, and everything else. But what if hackers were to alter documents before releasing them? This is the next step in organizational doxing — and the effects can be much worse.
It’s one thing to have all of your dirty laundry aired in public for everyone to see. It’s another thing entirely for someone to throw in a few choice items that aren’t real.
Recently, Russia has started using forged documents as part of broader disinformation campaigns, particularly in relation to Sweden’s entering of a military partnership with NATO, and Russia’s invasion of Ukraine.
Forging thousands — or more — documents is difficult to pull off, but slipping a single forgery in an actual cache is much easier. The attack could be something subtle. Maybe a country that anonymously publishes another country’s diplomatic cables wants to influence yet a third country, so adds some particularly egregious conversations about that third country. Or the next hacker who steals and publishes email from climate change researchers invents a bunch of over-the-top messages to make his political point even stronger. Or it could be personal: someone dumping email from thousands of users making changes in those by a friend, relative, or lover.
Imagine trying to explain to the press, eager to publish the worst of the details in the documents, that everything is accurate except this particular email. Or that particular memo. That the salary document is correct except that one entry. Or that the secret customer list posted up on WikiLeaks is correct except that there’s one inaccurate addition. It would be impossible. Who would believe you? No one. And you couldn’t prove it.
It has long been easy to forge documents on the Internet. It’s easy to create new ones, and modify old ones. It’s easy to change things like a document’s creation date, or a photograph’s location information. With a little more work, pdf files and images can be altered. These changes will be undetectable. In many ways, it’s surprising that this kind of manipulation hasn’t been seen before. My guess is that hackers who leak documents don’t have the secondary motives to make the data dumps worse than they already are, and nation-states have just gotten into the document leaking business.
Major newspapers do their best to verify the authenticity of leaked documents they receive from sources. They only publish the ones they know are authentic. The newspapers consult experts, and pay attention to forensics. They have tense conversations with governments, trying to get them to verify secret documents they’re not actually allowed to admit even exist. This is only possible because the news outlets have ongoing relationships with the governments, and they care that they get it right. There are lots of instances where neither of these two things are true, and lots of ways to leak documents without any independent verification at all.
No one is talking about this, but everyone needs to be alert to the possibility. Sooner or later, the hackers who steal an organization’s data are going to make changes in them before they release them. If these forgeries aren’t questioned, the situations of those being hacked could be made worse, or erroneous conclusions could be drawn from the documents. When someone says that a document they have been accused of writing is forged, their arguments at least should be heard.
This is the second community-driven edition of the AWS Week in Review. Special thanks are due to the 13 external contributors who helped to make this happen. If you would like to contribute, please take a look at the AWS Week in Review on GitHub. Adding relevant content is fast and easy and can be done from the comfort of your web browser! Just to be clear, it is perfectly fine for you to add content written by someone else. The goal is to catch it all, as they say!
Bustle uses AWS Lambda to process high volumes of data generated by the website in real-time, allowing the team to make faster, data-driven decisions. Bustle.com is a news, entertainment, lifestyle, and fashion website catering to women.
Graze continually improves its customers’ experience by staying agile—including in its infrastructure. The company sells healthy snacks through its website and via U.K. retailers. It runs all its infrastructure on AWS, including its customer-facing websites and all its internal systems from the factory floor to business intelligence.
Made.com migrated to AWS to support a record-breaking sales period with no downtime. The company provides a website that links home-furnishings designers directly to consumers. It now runs its e-commerce platform, website, and customer-facing applications on AWS, using services such as Amazon EC2, Amazon RDS, and Auto Scaling groups.
Sony DADC New Media Solutions (NMS) distributes hundreds of thousands of hours of video content monthly, spins up data analytics, renders solutions in days instead of months, and saves millions of dollars in hardware refresh costs by going all in on AWS. The organization distributes and delivers content to film studios, television broadcasters, and other providers across the globe. NMS runs its content distribution platform, broadcast playout services, and video archive on the AWS Cloud.
Upserve quickly develops and trains more than 100 learning models, streams restaurant sales and menu item data in real time, and gives restaurateurs the ability to predict their nightly business using Amazon Machine Learning. The company provides online payment and analytical software to thousands of restaurant owners throughout the U.S. Upserve uses Amazon Machine Learning to provide predictive analysis through its Shift Prep application.
Disaster stories involving the Internet of Things are all the rage. They feature cars (both driven and driverless), the power grid, dams, and tunnel ventilation systems. A particularly vivid and realistic one, near-future fiction published last month in New York Magazine, described a cyberattack on New York that involved hacking of cars, the water system, hospitals, elevators, and the power grid. In these stories, thousands of people die. Chaos ensues. While some of these scenarios overhype the mass destruction, the individual risks are all real. And traditional computer and network security isn’t prepared to deal with them.
Classic information security is a triad: confidentiality, integrity, and availability. You’ll see it called “CIA,” which admittedly is confusing in the context of national security. But basically, the three things I can do with your data are steal it (confidentiality), modify it (integrity), or prevent you from getting it (availability).
So far, Internet threats have largely been about confidentiality. These can be expensive; one survey estimated that data breaches cost an average of $3.8 million each. They can be embarrassing, as in the theft of celebrity photos from Apple’s iCloud in 2014 or the Ashley Madison breach in 2015. They can be damaging, as when the government of North Korea stole tens of thousands of internal documents from Sony or when hackers stole data about 83 million customer accounts from JPMorgan Chase, both in 2014. They can even affect national security, as in the case of the Office of Personnel Management data breach by — presumptively — China in 2015.
On the Internet of Things, integrity and availability threats are much worse than confidentiality threats. It’s one thing if your smart door lock can be eavesdropped upon to know who is home. It’s another thing entirely if it can be hacked to allow a burglar to open the door — or prevent you from opening your door. A hacker who can deny you control of your car, or take over control, is much more dangerous than one who can eavesdrop on your conversations or track your car’s location.
With the advent of the Internet of Things and cyber-physical systems in general, we’ve given the Internet hands and feet: the ability to directly affect the physical world. What used to be attacks against data and information have become attacks against flesh, steel, and concrete.
Today’s threats include hackers crashingairplanes by hacking into computer networks, and remotely disabling cars, either when they’re turned off and parked or while they’re speeding down the highway. We’re worried about manipulated counts from electronic voting machines, frozen water pipes through hacked thermostats, and remote murder through hacked medical devices. The possibilities are pretty literally endless. The Internet of Things will allow for attacks we can’t even imagine.
The increased risks come from three things: software control of systems, interconnections between systems, and automatic or autonomous systems. Let’s look at them in turn:
Software Control. The Internet of Things is a result of everything turning into a computer. This gives us enormous power and flexibility, but it brings insecurities with it as well. As more things come under software control, they become vulnerable to all the attacks we’ve seen against computers. But because many of these things are both inexpensive and long-lasting, many of the patch and update systems that work with computers and smartphones won’t work. Right now, the only way to patch most home routers is to throw them away and buy new ones. And the security that comes from replacing your computer and phone every few years won’t work with your refrigerator and thermostat: on the average, you replace the former every 15 years, and the latter approximately never. A recent Princeton survey found 500,000 insecure devices on the Internet. That number is about to explode.
Interconnections. As these systems become interconnected, vulnerabilities in one lead to attacks against others. Already we’ve seen Gmail accounts compromised through vulnerabilities in Samsung smart refrigerators, hospital IT networks compromised through vulnerabilities in medical devices, and Target Corporation hacked through a vulnerability in its HVAC system. Systems are filled with externalities that affect other systems in unforeseen and potentially harmful ways. What might seem benign to the designers of a particular system becomes harmful when it’s combined with some other system. Vulnerabilities on one system cascade into other systems, and the result is a vulnerability that no one saw coming and no one bears responsibility for fixing. The Internet of Things will make exploitable vulnerabilities much more common. It’s simple mathematics. If 100 systems are all interacting with each other, that’s about 5,000 interactions and 5,000 potential vulnerabilities resulting from those interactions. If 300 systems are all interacting with each other, that’s 45,000 interactions. 1,000 systems: 12.5 million interactions. Most of them will be benign or uninteresting, but some of them will be very damaging.
Autonomy. Increasingly, our computer systems are autonomous. They buy and sell stocks, turn the furnace on and off, regulate electricity flow through the grid, and — in the case of driverless cars — automatically pilot multi-ton vehicles to their destinations. Autonomy is great for all sorts of reasons, but from a security perspective it means that the effects of attacks can take effect immediately, automatically, and ubiquitously. The more we remove humans from the loop, faster attacks can do their damage and the more we lose our ability to rely on actual smarts to notice something is wrong before it’s too late.
We’re building systems that are increasingly powerful, and increasingly useful. The necessary side effect is that they are increasingly dangerous. A single vulnerability forced Chrysler to recall 1.4 million vehicles in 2015. We’re used to computers being attacked at scale — think of the large-scale virus infections from the last decade — but we’re not prepared for this happening to everything else in our world.
Governments are taking notice. Last year, both Director of National Intelligence James Clapper and NSA Director Mike Rogers testified before Congress, warning of these threats. They both believe we’re vulnerable.
This is how it was phrased in the DNI’s 2015 Worldwide Threat Assessment: “Most of the public discussion regarding cyber threats has focused on the confidentiality and availability of information; cyber espionage undermines confidentiality, whereas denial-of-service operations and data-deletion attacks undermine availability. In the future, however, we might also see more cyber operations that will change or manipulate electronic information in order to compromise its integrity (i.e. accuracy and reliability) instead of deleting it or disrupting access to it. Decision-making by senior government officials (civilian and military), corporate executives, investors, or others will be impaired if they cannot trust the information they are receiving.”
The DNI 2016 threat assessment included something similar: “Future cyber operations will almost certainly include an increased emphasis on changing or manipulating data to compromise its integrity (i.e., accuracy and reliability) to affect decision making, reduce trust in systems, or cause adverse physical effects. Broader adoption of IoT devices and AI — in settings such as public utilities and healthcare — will only exacerbate these potential effects.”
Security engineers are working on technologies that can mitigate much of this risk, but many solutions won’t be deployed without government involvement. This is not something that the market can solve. Like data privacy, the risks and solutions are too technical for most people and organizations to understand; companies are motivated to hide the insecurity of their own systems from their customers, their users, and the public; the interconnections can make it impossible to connect data breaches with resultant harms; and the interests of the companies often don’t match the interests of the people.
Governments need to play a larger role: setting standards, policing compliance, and implementing solutions across companies and networks. And while the White House Cybersecurity National Action Plan says some of the right things, it doesn’t nearly go far enough, because so many of us are phobic of any government-led solution to anything.
The next president will probably be forced to deal with a large-scale Internet disaster that kills multiple people. I hope he or she responds with both the recognition of what government can do that industry can’t, and the political will to make it happen.
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.