The EU’s General Data Protection Regulation (GDPR) describes data processor and data controller roles, and some customers and AWS Partner Network (APN) partners are asking how this affects the long-established AWS Shared Responsibility Model. I wanted to take some time to help folks understand shared responsibilities for us and for our customers in context of the GDPR.
How does the AWS Shared Responsibility Model change under GDPR? The short answer – it doesn’t. AWS is responsible for securing the underlying infrastructure that supports the cloud and the services provided; while customers and APN partners, acting either as data controllers or data processors, are responsible for any personal data they put in the cloud. The shared responsibility model illustrates the various responsibilities of AWS and our customers and APN partners, and the same separation of responsibility applies under the GDPR.
AWS responsibilities as a data processor
The GDPR does introduce specific regulation and responsibilities regarding data controllers and processors. When any AWS customer uses our services to process personal data, the controller is usually the AWS customer (and sometimes it is the AWS customer’s customer). However, in all of these cases, AWS is always the data processor in relation to this activity. This is because the customer is directing the processing of data through its interaction with the AWS service controls, and AWS is only executing customer directions. As a data processor, AWS is responsible for protecting the global infrastructure that runs all of our services. Controllers using AWS maintain control over data hosted on this infrastructure, including the security configuration controls for handling end-user content and personal data. Protecting this infrastructure, is our number one priority, and we invest heavily in third-party auditors to test our security controls and make any issues they find available to our customer base through AWS Artifact. Our ISO 27018 report is a good example, as it tests security controls that focus on protection of personal data in particular.
AWS has an increased responsibility for our managed services. Examples of managed services include Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon Elastic MapReduce, and Amazon WorkSpaces. These services provide the scalability and flexibility of cloud-based resources with less operational overhead because we handle basic security tasks like guest operating system (OS) and database patching, firewall configuration, and disaster recovery. For most managed services, you only configure logical access controls and protect account credentials, while maintaining control and responsibility of any personal data.
Customer and APN partner responsibilities as data controllers — and how AWS Services can help
Our customers can act as data controllers or data processors within their AWS environment. As a data controller, the services you use may determine how you configure those services to help meet your GDPR compliance needs. For example, AWS Services that are classified as Infrastructure as a Service (IaaS), such as Amazon EC2, Amazon VPC, and Amazon S3, are under your control and require you to perform all routine security configuration and management that would be necessary no matter where the servers were located. With Amazon EC2 instances, you are responsible for managing: guest OS (including updates and security patches), application software or utilities installed on the instances, and the configuration of the AWS-provided firewall (called a security group).
To help you realize data protection by design principles under the GDPR when using our infrastructure, we recommend you protect AWS account credentials and set up individual user accounts with Amazon Identity and Access Management (IAM) so that each user is only given the permissions necessary to fulfill their job duties. We also recommend using multi-factor authentication (MFA) with each account, requiring the use of SSL/TLS to communicate with AWS resources, setting up API/user activity logging with AWS CloudTrail, and using AWS encryption solutions, along with all default security controls within AWS Services. You can also use advanced managed security services, such as Amazon Macie, which assists in discovering and securing personal data stored in Amazon S3.
For more information, you can download the AWS Security Best Practices whitepaper or visit the AWS Security Resources or GDPR Center webpages. In addition to our solutions and services, AWS APN partners can provide hundreds of tools and features to help you meet your security objectives, ranging from network security and configuration management to access control and data encryption.
Earlier this spring, an excited group of STEM educators came together to participate in the first ever Raspberry Pi and Arduino workshop in Puerto Rico.
Their three-day digital making adventure was led by MakerTechPR’s José Rullán and Raspberry Pi Certified Educator Alex Martínez. They ran the event as part of the Robot Makers challenge organized by Yees! and sponsored by Puerto Rico’s Department of Economic Development and Trade to promote entrepreneurial skills within Puerto Rico’s education system.
Over 30 educators attended the workshop, which covered the use of the Raspberry Pi 3 as a computer and digital making resource. The educators received a kit consisting of a Raspberry Pi 3 with an Explorer HAT Pro and an Arduino Uno. At the end of the workshop, the educators were able to keep the kit as a demonstration unit for their classrooms. They were enthusiastic to learn new concepts and immerse themselves in the world of physical computing.
In their first session, the educators were introduced to the Raspberry Pi as an affordable technology for robotic clubs. In their second session, they explored physical computing and the coding languages needed to control the Explorer HAT Pro. They started off coding with Scratch, with which some educators had experience, and ended with controlling the GPIO pins with Python. In the final session, they learned how to develop applications using the powerful combination of Arduino and Raspberry Pi for robotics projects. This gave them a better understanding of how they could engage their students in physical computing.
“The Raspberry Pi ecosystem is the perfect solution in the classroom because to us it is very resourceful and accessible.” – Alex Martínez
Computer science and robotics courses are important for many schools and teachers in Puerto Rico. The simple idea of programming a microcontroller from a $35 computer increases the chances of more students having access to more technology to create things.
Puerto Rico’s education system has faced enormous challenges after Hurricane Maria, including economic collapse and the government’s closure of many schools due to the exodus of families from the island. By attending training like this workshop, educators in Puerto Rico are becoming more experienced in fields like robotics in particular, which are key for 21st-century skills and learning. This, in turn, can lead to more educational opportunities, and hopefully the reopening of more schools on the island.
“We find it imperative that our children be taught STEM disciplines and skills. Our goal is to continue this work of spreading digital making and computer science using the Raspberry Pi around Puerto Rico. We want our children to have the best education possible.” – Alex Martínez
After attending Picademy in 2016, Alex has integrated the Raspberry Pi Foundation’s online resources into his classroom. He has also taught small workshops around the island and in the local Puerto Rican makerspace community. José is an electrical engineer, entrepreneur, educator and hobbyist who enjoys learning to use technology and sharing his knowledge through projects and challenges.
Chris Mason and Josef Bacik led a brief discussion on the block-I/O controller for control groups (cgroups) in the filesystem track at the 2018 Linux Storage, Filesystem, and Memory-Management Summit. Mostly they were just aiming to get feedback on the approach they have taken. They are trying to address the needs of their employer, Facebook, with regard to the latency of I/O operations.
Memory control groups allow the system administrator to impose memory-use limits on the members of control groups. In many ways, these limits behave like the overall limit on available memory, but there are also some differences. The behavior of the memory controller also changed with the advent of the version-2 control-group API, creating problems for at least one significant user. Three sessions held in the memory-management track of the Linux Storage, Filesystem, and Memory-Management Summit explored some of these problems.
Security is our top priority at AWS, and from the beginning we have built security into the fabric of our services. With the introduction of GDPR (which becomes enforceable on May 25 of 2018), privacy and data protection have become even more ingrained into our security-centered culture. Three weeks ago, well ahead of the deadline, we announced that all AWS services are compliant with GDPR, meaning you can use AWS as a data processor as a way to help solve your GDPR challenges (be sure to visit our GDPR Center for additional information).
When it comes to GDPR compliance, many customers are progressing nicely and much of the initial trepidation is gone. In my interactions with customers on this topic, a few themes have emerged as universal:
GDPR is important. You need to have a plan in place if you process personal data of EU data subjects, not only because it’s good governance, but because GDPR does carry significant penalties for non-compliance.
Solving this can be complex, potentially involving a lot of personnel and multiple tools. Your GDPR process will also likely span across disciplines – impacting people, processes, and technology.
Each customer is unique, and there are many methodologies around assessing your compliance with GDPR. It’s important to be aware of your own individual business attributes.
I thought it might be helpful to share some of our own lessons learned. In our experience in solving the GDPR challenge, the following were keys to our success:
Get your senior leadership involved. We have a regular cadence of detailed status conversations about GDPR with our CEO, Andy Jassy. GDPR is high stakes, and the AWS leadership team knows it. If GDPR doesn’t have the attention it needs with the visibility of top management today, it’s time to escalate.
Centralize the GDPR efforts. Driving all work streams centrally is key. This may sound obvious, but managing this in a distributed manner may result in duplicative effort and/or team members moving in a different direction.
The most important single partner in solving GDPR is your legal team. Having non-legal people make assumptions about how to interpret GDPR for your unique environment is both risky and a potential waste of time and resources. You want to avoid analysis paralysis by getting proper legal advice, collaborating on a direction, and then moving forward with the proper urgency.
Collaborate closely with tech leadership. The “process” people in your organization, the ones who already know how to approach governance problems, are typically comfortable jumping right in to GDPR. But technical teams, including data owners, have set up their software for business application. They may not even know what kind of data they are storing, processing, or transferring to other parts of the business. In the GDPR exercise they need to be aware of (or at least help facilitate) the tracking of data and data elements between systems. This isn’t a typical ask for technical teams, so be prepared to educate and to fully understand data flow.
Don’t live by the established checklists. There are multiple methodologies to solving the compliance challenges of GDPR. At AWS, we ended up establishing core requirements, mapped out by data controller and data processor functions and then, in partnership with legal, decided upon a group of projects based on our known current state. Be careful about using a set methodology, tool or questionnaire to govern your efforts. These generic assessments can help educate, but letting them drive or limit your work could lead to missing something that is key to your own compliance. In this sense, a generic, “one size fits all” solution might not be helpful.
Don’t be afraid to challenge prior orthodoxy. Many times we changed course based on new information. You shouldn’t be afraid to scrap an effort if you determine it’s not working. You should also not be afraid to escalate issues to senior leadership when needed. This is an executive issue.
Look for ways to leverage your work beyond this compliance activity. GDPR requires serious effort, but are the results limited to GDPR compliance? Certainly not. You can use GDPR workflows as a way to ensure better governance moving forward. Privacy and security will require work for the foreseeable future, so make your governance program scalable and usable for other purposes.
One last tip that has made all the difference: think about protecting data subjects and work backwards from there. Customer focus drives us to ask, “what would customers and data subjects want and expect us to do?” Taking GDPR from a pure legal or compliance standpoint may be technically sufficient, but we believe the objectives of security and personal data protection require a more comprehensive view, and you can most effectively shape that view by starting with the individuals GDPR was meant to protect.
If you would like to find out more about our experiences, as well as how we can help you in your efforts, please reach out to us today.
-Chad Woolf
Vice President, AWS Security Assurance
Interested in additional AWS Security news? Follow the AWS Security Blog on Twitter.
With the Greenland shark finally caught on video for the very first time, scientists and engineers are discussing the limitations of current marine monitoring technology. One significant advance comes from the CSAIL team at Massachusetts Institute of Technology (MIT): SoFi, the robotic fish.
More info: http://bit.ly/SoFiRobot Paper: http://robert.katzschmann.eu/wp-content/uploads/2018/03/katzschmann2018exploration.pdf
The untethered SoFi robot
Last week, the Computer Science and Artificial Intelligence Laboratory (CSAIL) team at MIT unveiled SoFi, “a soft robotic fish that can independently swim alongside real fish in the ocean.”
Directed by a Super Nintendo controller and acoustic signals, SoFi can dive untethered to a maximum of 18 feet for a total of 40 minutes. A Raspberry Pi receives input from the controller and amplifies the ultrasound signals for SoFi via a HiFiBerry. The controller, Raspberry Pi, and HiFiBerry are sealed within a waterproof, cast-moulded silicone membrane filled with non-conductive mineral oil, allowing for underwater equalisation.
The ultrasound signals, received by a modem within SoFi’s head, control everything from direction, tail oscillation, pitch, and depth to the onboard camera.
As explained on MIT’s news blog, “to make the robot swim, the motor pumps water into two balloon-like chambers in the fish’s tail that operate like a set of pistons in an engine. As one chamber expands, it bends and flexes to one side; when the actuators push water to the other channel, that one bends and flexes in the other direction.”
Ocean exploration
While we’ve seen many autonomous underwater vehicles (AUVs) using onboard Raspberry Pis, SoFi’s ability to roam untethered with a wireless waterproof controller is an exciting achievement.
“To our knowledge, this is the first robotic fish that can swim untethered in three dimensions for extended periods of time. We are excited about the possibility of being able to use a system like this to get closer to marine life than humans can get on their own.” – CSAIL PhD candidate Robert Katzschmann
As the MIT news post notes, SoFi’s simple, lightweight setup of a single camera, a motor, and a smartphone lithium polymer battery set it apart it from existing bulky AUVs that require large motors or support from boats.
For more in-depth information on SoFi and the onboard tech that controls it, find the CSAIL team’s paper here.
Here’s the second part of Daniel Stone’s series on recent improvements in low-level graphics support. “The end result of all this work is that we have been able to eliminate the magic side channels which used to proliferate, and lay the groundwork for properly communicating this information across multiple devices as well. Devices supporting ARM’s AFBC compression format are just beginning to hit the market, which share a single compression format between video decoder, GPU, and display controller. We are also beginning to see GPUs from different vendors share tiling formats, in order to squeeze the most performance possible from hybrid GPU systems.”
GDPR is the new data protection regulation, as you probably already know. I’ve given a detailed practical advice for what it means for developers (and product owners). However, there’s one thing missing there – cookies. The elephant in the room.
Previously I’ve stated that cookies are subject to another piece of legislation – the ePrivacy directive, which is getting updated and its new version will be in force a few years from now. And while that’s technically correct, cookies seem to be affected by GDPR as well. In a way I’ve underestimated that effect.
When you do a Google search on “GDPR cookies”, you’ll pretty quickly realize that a) there’s not too much information and b) there’s not much technical understanding of the issue.
What appears to be the consensus is that GDPR does change the way cookies are handled. More specifically – tracking cookies. Here’s recital 30:
(30) Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.
How tracking cookies work – a 3rd party (usually an ad network) gives you a code snippet that you place on your website, for example to display ads. That code snippet, however, calls “home” (makes a request to the 3rd party domain). If the 3rd party has previously been used on your computer, it has created a cookie. In the example of Facebook, they have the cookie with your Facebook identifier because you’ve logged in to Facebook. So this cookie (with your identifier) is sent with the request. The request also contains all the details from the page. In effect, you are uniquely identified by an identifier (in the case of Facebook and Google – fully identified, rather than some random anonymous identifier as with other ad networks).
Your behaviour on the website is personal data. It gets associated with your identifier, which in turn is associated with your profile. And all of that is personal data. Who is responsible for collecting the website behaviour data, i.e. who is the “controller”? Is it Facebook (or any other 3rd party) that technically does the collection? No, it’s the website owner, as the behaviour data is obtained on their website, and they have put the tracking piece of code there. So they bear responsibility.
What’s the responsibility? So far it boiled down to displaying the useless “we use cookies” warning that nobody cares about. And the current (old) ePrivacy directive and its interpretations says that this is enough – if the users actions can unambiguously mean that they are fine with cookies – i.e. if they continue to use the website after seeing the warning – then you’re fine. This is no longer true from a GDPR perspective – you are collecting user data and you have to have a lawful ground for processing.
For the data collected by tracking cookies you have two options – “consent” and “legitimate interest”. Legitimate interest will be hard to prove – it is not something that a user reasonably expects, it is not necessary for you to provide the service. If your lawyers can get that option to fly, good for them, but I’m not convinced regulators will be happy with that.
The other option is “consent”. You have to ask your users explicitly – that means “with a checkbox” – to let you use tracking cookies. That has two serious implications – from technical and usability point of view.
The technical issue is that the data is sent via 3rd party code as soon as the page loads and before the user can give their consent. And that’s already a violation. You can, of course, have the 3rd party code be dynamically inserted only after the user gives consent, but that will require some fiddling with javascript and might not always work depending on the provider. And you’d have to support opt-out at any time (which would in turn disable the 3rd party snippet). It would require actual coding, rather than just copy-pasting a snippet.
The usability aspect is the bigger issue – while you could neatly tuck a cookie warning at the bottom, you’d now have to have a serious, “stop the world” popup that asks for consent if you want anyone to click it. You can, of course, just add a checkbox to the existing cookie warning, but don’t expect anyone to click it.
These aspects pose a significant questions: is it worth it to have tracking cookies? Is developing new functionality worth it, is interrupting the user worth it, and is implementing new functionality just so that users never clicks a hidden checkbox worth it? Especially given that Firefox now blocks all tracking cookies and possibly other browsers will follow?
That by itself is an interesting topic – Firefox has basically implemented the most strict form of requirements of the upcoming ePrivacy directive update (that would turn it into an ePrivacy regulation). Other browsers will have to follow, even though Google may not be happy to block their own tracking cookies. I hope other browsers follow Firefox in tracking protection and the issue will be gone automatically.
To me it seems that it will be increasingly not worthy to have tracking cookies on your website. They add regulatory obligations for you and give you very little benefit (yes, you could track engagement from ads, but you can do that in other ways, arguably by less additional code than supporting the cookie consents). And yes, the cookie consent will be “outsourced” to browsers after the ePrivacy regulation is passed, but we can’t be sure at the moment whether there won’t be technical whack-a-mole between browsers and advertisers and whether you wouldn’t still need additional effort to have dynamic consent for tracking cookies. (For example there are reported issues that Firefox used to make Facebook login fail if tracking protection is enabled. Which could be a simple bug, or could become a strategy by big vendors in the future to force browsers into a less strict tracking protection).
Okay, we’ve decided it’s not worth it managing tracking cookies. But do you have a choice as a website owner? Can you stop your ad network from using them? (Remember – you are liable if users’ data is collected by visiting your website). And currently the answer is no – you can’t disable that. You can’t have “just the ads”. This is part of the “deal” – you get money for the ads you place, but you participate in a big “surveillance” network. Users have a way to opt out (e.g. Google AdWords gives them that option). You, as a website owner, don’t.
Facebook has a recommendations page that says “you take care of getting the consent”. But for example the “like button” plugin doesn’t have an option to not send any data to Facebook.
And sometimes you don’t want to serve ads, just track user behaviour and measure conversion. But even if you ask for consent for that and conditionally insert the plugin/snippet, do you actually know what data it sends? And what it’s used for? Because you have to know in order to inform your users. “Do you agree to use tracking cookies that Facebook has inserted in order to collect data about your behaviour on our website” doesn’t sound compelling.
So, what to do? The easiest thing is just not to use any 3rd party ad-related plugins. But that’s obviously not an option, as ad revenue is important, especially in the publishing industry. I don’t have a good answer, apart from “Regulators should pressure ad networks to provide opt-outs and clearly document their data usage”. They have to do that under GDPR, and while website owners are responsible for their users’ data, the ad networks that are in the role of processors in this case (as you delegate the data collection for your visitors to them) also have obligation to assist you in fulfilling your obligations. So ask Facebook – what should I do with your tracking cookies? And when the regulator comes after a privacy-aware customer files a complaint, you could prove that you’ve tried.
The ethical debate whether it’s wrong to collect data about peoples’ behaviour without their informed consent is an easy one. And that’s why I don’t put blame on the regulators – they are putting the ethical consensus in law. It gets more complicated if not allowing tracking means some internet services are no longer profitable and therefore can’t exist. Can we have the cake and eat it too?
Here’s a long post. We think you’ll find it interesting. If you don’t have time to read it all, we recommend you watch this video, which will fill you in with everything you need, and then head straight to the product page to fill yer boots. (We recommend the video anyway, even if you do have time for a long read. ‘Cos it’s fab.)
Raspberry Pi 3 Model B+ is now on sale now for $35, featuring: – A 1.4GHz 64-bit quad-core ARM Cortex-A53 CPU – Dual-band 802.11ac wireless LAN and Bluetooth 4.2 – Faster Ethernet (Gigabit Ethernet over USB 2.0) – Power-over-Ethernet support (with separate PoE HAT) – Improved PXE network and USB mass-storage booting – Improved thermal management Alongside a 200MHz increase in peak CPU clock frequency, we have roughly three times the wired and wireless network throughput, and the ability to sustain high performance for much longer periods.
If you’ve been a Raspberry Pi watcher for a while now, you’ll have a bit of a feel for how we update our products. Just over two years ago, we releasedRaspberry Pi 3 Model B. This was our first 64-bit product, and our first product to feature integrated wireless connectivity. Since then, we’ve sold over nine million Raspberry Pi 3 units (we’ve sold 19 million Raspberry Pis in total), which have been put to work in schools, homes, offices and factories all over the globe.
Those Raspberry Pi watchers will know that we have a history of releasing improved versions of our products a couple of years into their lives. The first example was Raspberry Pi 1 Model B+, which added two additional USB ports, introduced our current form factor, and rolled up a variety of other feedback from the community. Raspberry Pi 2 didn’t get this treatment, of course, as it was superseded after only one year; but it feels like it’s high time that Raspberry Pi 3 received the “plus” treatment.
So, without further ado, Raspberry Pi 3 Model B+ is now on sale for $35 (the same price as the existing Raspberry Pi 3 Model B), featuring:
A 1.4GHz 64-bit quad-core ARM Cortex-A53 CPU
Dual-band 802.11ac wireless LAN and Bluetooth 4.2
Faster Ethernet (Gigabit Ethernet over USB 2.0)
Power-over-Ethernet support (with separate PoE HAT)
Improved PXE network and USB mass-storage booting
Improved thermal management
Alongside a 200MHz increase in peak CPU clock frequency, we have roughly three times the wired and wireless network throughput, and the ability to sustain high performance for much longer periods.
Behold the shiny
Raspberry Pi 3B+ is available to buy today from our network of Approved Resellers.
New features, new chips
Roger Thornton did the design work on this revision of the Raspberry Pi. Here, he and I have a chat about what’s new.
Raspberry Pi 3 Model B+ is now on sale now for $35, featuring: – A 1.4GHz 64-bit quad-core ARM Cortex-A53 CPU – Dual-band 802.11ac wireless LAN and Bluetooth 4.2 – Faster Ethernet (Gigabit Ethernet over USB 2.0) – Power-over-Ethernet support (with separate PoE HAT) – Improved PXE network and USB mass-storage booting – Improved thermal management Alongside a 200MHz increase in peak CPU clock frequency, we have roughly three times the wired and wireless network throughput, and the ability to sustain high performance for much longer periods.
The new product is built around BCM2837B0, an updated version of the 64-bit Broadcom application processor used in Raspberry Pi 3B, which incorporates power integrity optimisations, and a heat spreader (that’s the shiny metal bit you can see in the photos). Together these allow us to reach higher clock frequencies (or to run at lower voltages to reduce power consumption), and to more accurately monitor and control the temperature of the chip.
Dual-band wireless LAN and Bluetooth are provided by the Cypress CYW43455 “combo” chip, connected to a Proant PCB antenna similar to the one used on Raspberry Pi Zero W. Compared to its predecessor, Raspberry Pi 3B+ delivers somewhat better performance in the 2.4GHz band, and far better performance in the 5GHz band, as demonstrated by these iperf results from LibreELEC developer Milhouse.
Tx bandwidth (Mb/s)
Rx bandwidth (Mb/s)
Raspberry Pi 3B
35.7
35.6
Raspberry Pi 3B+ (2.4GHz)
46.7
46.3
Raspberry Pi 3B+ (5GHz)
102
102
The wireless circuitry is encapsulated under a metal shield, rather fetchingly embossed with our logo. This has allowed us to certify the entire board as a radio module under FCC rules, which in turn will significantly reduce the cost of conformance testing Raspberry Pi-based products.
We’ll be teaching metalwork next.
Previous Raspberry Pi devices have used the LAN951x family of chips, which combine a USB hub and 10/100 Ethernet controller. For Raspberry Pi 3B+, Microchip have supported us with an upgraded version, LAN7515, which supports Gigabit Ethernet. While the USB 2.0 connection to the application processor limits the available bandwidth, we still see roughly a threefold increase in throughput compared to Raspberry Pi 3B. Again, here are some typical iperf results.
Tx bandwidth (Mb/s)
Rx bandwidth (Mb/s)
Raspberry Pi 3B
94.1
95.5
Raspberry Pi 3B+
315
315
We use a magjack that supports Power over Ethernet (PoE), and bring the relevant signals to a new 4-pin header. We will shortly launch a PoE HAT which can generate the 5V necessary to power the Raspberry Pi from the 48V PoE supply.
There… are… four… pins!
Coming soon to a Raspberry Pi 3B+ near you
Raspberry Pi 3B was our first product to support PXE Ethernet boot. Testing it in the wild shook out a number of compatibility issues with particular switches and traffic environments. Gordon has rolled up fixes for all known issues into the BCM2837B0 boot ROM, and PXE boot is now enabled by default.
Clocking, voltages and thermals
The improved power integrity of the BCM2837B0 package, and the improved regulation accuracy of our new MaxLinear MxL7704 power management IC, have allowed us to tune our clocking and voltage rules for both better peak performance and longer-duration sustained performance.
Below 70°C, we use the improvements to increase the core frequency to 1.4GHz. Above 70°C, we drop to 1.2GHz, and use the improvements to decrease the core voltage, increasing the period of time before we reach our 80°C thermal throttle; the reduction in power consumption is such that many use cases will never reach the throttle. Like a modern smartphone, we treat the thermal mass of the device as a resource, to be spent carefully with the goal of optimising user experience.
This graph, courtesy of Gareth Halfacree, demonstrates that Raspberry Pi 3B+ runs faster and at a lower temperature for the duration of an eight‑minute quad‑core Sysbench CPU test.
Note that Raspberry Pi 3B+ does consume substantially more power than its predecessor. We strongly encourage you to use a high-quality 2.5A power supply, such as the official Raspberry Pi Universal Power Supply.
FAQs
We’ll keep updating this list over the next couple of days, but here are a few to get you started.
Are you discontinuing earlier Raspberry Pi models?
No. We have a lot of industrial customers who will want to stick with the existing products for the time being. We’ll keep building these models for as long as there’s demand. Raspberry Pi 1B+, Raspberry Pi 2B, and Raspberry Pi 3B will continue to sell for $25, $35, and $35 respectively.
What about Model A+?
Raspberry Pi 1A+ continues to be the $20 entry-level “big” Raspberry Pi for the time being. We are considering the possibility of producing a Raspberry Pi 3A+ in due course.
What about the Compute Module?
CM1, CM3 and CM3L will continue to be available. We may offer versions of CM3 and CM3L with BCM2837B0 in due course, depending on customer demand.
Are you still using VideoCore?
Yes. VideoCore IV 3D is the only publicly-documented 3D graphics core for ARM‑based SoCs, and we want to make Raspberry Pi more open over time, not less.
Credits
A project like this requires a vast amount of focused work from a large team over an extended period. Particular credit is due to Roger Thornton, who designed the board and ran the exhaustive (and exhausting) RF compliance campaign, and to the team at the Sony UK Technology Centre in Pencoed, South Wales. A partial list of others who made major direct contributions to the BCM2837B0 chip program, CYW43455 integration, LAN7515 and MxL7704 developments, and Raspberry Pi 3B+ itself follows:
James Adams, David Armour, Jonathan Bell, Maria Blazquez, Jamie Brogan-Shaw, Mike Buffham, Rob Campling, Cindy Cao, Victor Carmon, KK Chan, Nick Chase, Nigel Cheetham, Scott Clark, Nigel Clift, Dominic Cobley, Peter Coyle, John Cronk, Di Dai, Kurt Dennis, David Doyle, Andrew Edwards, Phil Elwell, John Ferdinand, Doug Freegard, Ian Furlong, Shawn Guo, Philip Harrison, Jason Hicks, Stefan Ho, Andrew Hoare, Gordon Hollingworth, Tuomas Hollman, EikPei Hu, James Hughes, Andy Hulbert, Anand Jain, David John, Prasanna Kerekoppa, Shaik Labeeb, Trevor Latham, Steve Le, David Lee, David Lewsey, Sherman Li, Xizhe Li, Simon Long, Fu Luo Larson, Juan Martinez, Sandhya Menon, Ben Mercer, James Mills, Max Passell, Mark Perry, Eric Phiri, Ashwin Rao, Justin Rees, James Reilly, Matt Rowley, Akshaye Sama, Ian Saturley, Serge Schneider, Manuel Sedlmair, Shawn Shadburn, Veeresh Shivashimper, Graham Smith, Ben Stephens, Mike Stimson, Yuree Tchong, Stuart Thomson, John Wadsworth, Ian Watch, Sarah Williams, Jason Zhu.
If you’re not on this list and think you should be, please let me know, and accept my apologies.
Germany-based Andreas Rottach’s multi-purpose LED table is an impressive build within a gorgeous-looking body. Play games, view (heavily pixelated) images, and become hypnotised by flashy lights, once you’ve built your own using his newly released tutorial.
This is a short presentation of my LED-Matrix Table. The table is controlled by a raspberry pi computer that executes a control engine, written in c++. It supports input from keyboards or custom made game controllers. A full list of all features as well as the source code is available on GitHub (https://github.com/rottaca/LEDTableEngine).
Much excitement
Andreas uploaded a video of his LED Matrix Table to YouTube back in February, with the promise of publishing a complete write-up within the coming weeks. And so the members of Pi Towers sat, eagerly waiting and watching. Now the write-up has arrived, to our cheers of acclaim for this beautful, shiny, flashy, LED-based wonderment.
Andreas created the table’s impressive light matrix using a strip of 300 LEDs, chained together and connected to the Raspberry Pi via an LED controller.
The LEDs are set out in zigzags
For the code, he used several open-source tools, such as SDL for image and audio support, and CMake for building the project software.
Anyone planning to recreate Andreas’ table can compile its engine by downloading the project repository from GitHub. Again, find full instructions for this on his GitHub.
Features
The table boasts multiple cool features, including games and visualisation tools. Using the controllers, you can play simplified versions of Flappy Bird and Minesweeper, or go on a nostalgia trip with Tetris, Pong, and Snake.
There’s also a version of Conway’s Game of Life. Andreas explains: “The lifespan of each cell is color-coded. If the game field gets static, the animation is automatically reset to a new random cell population.”
The table can also display downsampled Bitmap images, or show clear static images such as a chess board, atop of which you can place physical game pieces.
Find all the 3D-printable aspects of the LED table on Thingiverse here and here, and the full GitHub tutorial and repository here. If you build your own, or have already dabbled in LED tables and displays, be sure to share your project with us, either in the comments below or via our social media accounts. What other functions would you integrate into this awesome build?
This post was written in partnership with Intuit to share learnings, best practices, and recommendations for running an Apache Kafka cluster on AWS. Thanks to Vaishak Suresh and his colleagues at Intuit for their contribution and support.
Intuit, in their own words: Intuit, a leading enterprise customer for AWS, is a creator of business and financial management solutions. For more information on how Intuit partners with AWS, see our previous blog post, Real-time Stream Processing Using Apache Spark Streaming and Apache Kafka on AWS. Apache Kafka is an open-source, distributed streaming platform that enables you to build real-time streaming applications.
The best practices described in this post are based on our experience in running and operating large-scale Kafka clusters on AWS for more than two years. Our intent for this post is to help AWS customers who are currently running Kafka on AWS, and also customers who are considering migrating on-premises Kafka deployments to AWS.
Running your Kafka deployment on Amazon EC2 provides a high performance, scalable solution for ingesting streaming data. AWS offers many different instance types and storage option combinations for Kafka deployments. However, given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.
In this blog post, we cover the following aspects of running Kafka clusters on AWS:
Deployment considerations and patterns
Storage options
Instance types
Networking
Upgrades
Performance tuning
Monitoring
Security
Backup and restore
Note: While implementing Kafka clusters in a production environment, make sure also to consider factors like your number of messages, message size, monitoring, failure handling, and any operational issues.
Deployment considerations and patterns
In this section, we discuss various deployment options available for Kafka on AWS, along with pros and cons of each option. A successful deployment starts with thoughtful consideration of these options. Considering availability, consistency, and operational overhead of the deployment helps when choosing the right option.
Single AWS Region, Three Availability Zones, All Active
One typical deployment pattern (all active) is in a single AWS Region with three Availability Zones (AZs). One Kafka cluster is deployed in each AZ along with Apache ZooKeeper and Kafka producer and consumer instances as shown in the illustration following.
In this pattern, this is the Kafka cluster deployment:
Kafka producers and Kafka cluster are deployed on each AZ.
Data is distributed evenly across three Kafka clusters by using Elastic Load Balancer.
Kafka consumers aggregate data from all three Kafka clusters.
Kafka cluster failover occurs this way:
Mark down all Kafka producers
Stop consumers
Debug and restack Kafka
Restart consumers
Restart Kafka producers
Following are the pros and cons of this pattern.
Pros
Cons
Highly available
Can sustain the failure of two AZs
No message loss during failover
Simple deployment
Very high operational overhead:
All changes need to be deployed three times, one for each Kafka cluster
Maintaining and monitoring three Kafka clusters
Maintaining and monitoring three consumer clusters
A restart is required for patching and upgrading brokers in a Kafka cluster. In this approach, a rolling upgrade is done separately for each cluster.
Single Region, Three Availability Zones, Active-Standby
Another typical deployment pattern (active-standby) is in a single AWS Region with a single Kafka cluster and Kafka brokers and Zookeepers distributed across three AZs. Another similar Kafka cluster acts as a standby as shown in the illustration following. You can use Kafka mirroring with MirrorMaker to replicate messages between any two clusters.
In this pattern, this is the Kafka cluster deployment:
Kafka producers are deployed on all three AZs.
Only one Kafka cluster is deployed across three AZs (active).
ZooKeeper instances are deployed on each AZ.
Brokers are spread evenly across all three AZs.
Kafka consumers can be deployed across all three AZs.
Standby Kafka producers and a Multi-AZ Kafka cluster are part of the deployment.
Kafka cluster failover occurs this way:
Switch traffic to standby Kafka producers cluster and Kafka cluster.
Restart consumers to consume from standby Kafka cluster.
Following are the pros and cons of this pattern.
Pros
Cons
Less operational overhead when compared to the first option
Only one Kafka cluster to manage and consume data from
Can handle single AZ failures without activating a standby Kafka cluster
Added latency due to cross-AZ data transfer among Kafka brokers
For Kafka versions before 0.10, replicas for topic partitions have to be assigned so they’re distributed to the brokers on different AZs (rack-awareness)
The cluster can become unavailable in case of a network glitch, where ZooKeeper does not see Kafka brokers
Possibility of in-transit message loss during failover
Intuit recommends using a single Kafka cluster in one AWS Region, with brokers distributing across three AZs (single region, three AZs). This approach offers stronger fault tolerance than otherwise, because a failed AZ won’t cause Kafka downtime.
Storage options
There are two storage options for file storage in Amazon EC2:
Ephemeral storage is local to the Amazon EC2 instance. It can provide high IOPS based on the instance type. On the other hand, Amazon EBS volumes offer higher resiliency and you can configure IOPS based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. Your choice of storage is closely related to the type of workload supported by your Kafka cluster.
Kafka provides built-in fault tolerance by replicating data partitions across a configurable number of instances. If a broker fails, you can recover it by fetching all the data from other brokers in the cluster that host the other replicas. Depending on the size of the data transfer, it can affect recovery process and network traffic. These in turn eventually affect the cluster’s performance.
The following table contrasts the benefits of using an instance store versus using EBS for storage.
Instance store
EBS
Instance storage is recommended for large- and medium-sized Kafka clusters. For a large cluster, read/write traffic is distributed across a high number of brokers, so the loss of a broker has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important, but a failed broker takes longer and requires more network traffic for a smaller Kafka cluster.
Storage-optimized instances like h1, i3, and d2 are an ideal choice for distributed applications like Kafka.
The primary advantage of using EBS in a Kafka deployment is that it significantly reduces data-transfer traffic when a broker fails or must be replaced. The replacement broker joins the cluster much faster.
Data stored on EBS is persisted in case of an instance failure or termination. The broker’s data stored on an EBS volume remains intact, and you can mount the EBS volume to a new EC2 instance. Most of the replicated data for the replacement broker is already available in the EBS volume and need not be copied over the network from another broker. Only the changes made after the original broker failure need to be transferred across the network. That makes this process much faster.
Intuit chose EBS because of their frequent instance restacking requirements and also other benefits provided by EBS.
Generally, Kafka deployments use a replication factor of three. EBS offers replication within their service, so Intuit chose a replication factor of two instead of three.
Instance types
The choice of instance types is generally driven by the type of storage required for your streaming applications on a Kafka cluster. If your application requires ephemeral storage, h1, i3, and d2 instances are your best option.
Intuit used r3.xlarge instances for their brokers and r3.large for ZooKeeper, with ST1 (throughput optimized HDD) EBS for their Kafka cluster.
Here are sample benchmark numbers from Intuit tests.
Configuration
Broker bytes (MB/s)
r3.xlarge
ST1 EBS
12 brokers
12 partitions
Aggregate 346.9
If you need EBS storage, then AWS has a newer-generation r4 instance. The r4 instance is superior to R3 in many ways:
It has a faster processor (Broadwell).
EBS is optimized by default.
It features networking based on Elastic Network Adapter (ENA), with up to 10 Gbps on smaller sizes.
The network plays a very important role in a distributed system like Kafka. A fast and reliable network ensures that nodes can communicate with each other easily. The available network throughput controls the maximum amount of traffic that Kafka can handle. Network throughput, combined with disk storage, is often the governing factor for cluster sizing.
If you expect your cluster to receive high read/write traffic, select an instance type that offers 10-Gb/s performance.
In addition, choose an option that keeps interbroker network traffic on the private subnet, because this approach allows clients to connect to the brokers. Communication between brokers and clients uses the same network interface and port. For more details, see the documentation about IP addressing for EC2 instances.
If you are deploying in more than one AWS Region, you can connect the two VPCs in the two AWS Regions using cross-region VPC peering. However, be aware of the networking costs associated with cross-AZ deployments.
Upgrades
Kafka has a history of not being backward compatible, but its support of backward compatibility is getting better. During a Kafka upgrade, you should keep your producer and consumer clients on a version equal to or lower than the version you are upgrading from. After the upgrade is finished, you can start using a new protocol version and any new features it supports. There are three upgrade approaches available, discussed following.
Rolling or in-place upgrade
In a rolling or in-place upgrade scenario, upgrade one Kafka broker at a time. Take into consideration the recommendations for doing rolling restarts to avoid downtime for end users.
Downtime upgrade
If you can afford the downtime, you can take your entire cluster down, upgrade each Kafka broker, and then restart the cluster.
Blue/green upgrade
Intuit followed the blue/green deployment model for their workloads, as described following.
If you can afford to create a separate Kafka cluster and upgrade it, we highly recommend the blue/green upgrade scenario. In this scenario, we recommend that you keep your clusters up-to-date with the latest Kafka version. For additional details on Kafka version upgrades or more details, see the Kafka upgrade documentation.
The following illustration shows a blue/green upgrade.
In this scenario, the upgrade plan works like this:
Create a new Kafka cluster on AWS.
Create a new Kafka producers stack to point to the new Kafka cluster.
Create topics on the new Kafka cluster.
Test the green deployment end to end (sanity check).
Using Amazon Route 53, change the new Kafka producers stack on AWS to point to the new green Kafka environment that you have created.
The roll-back plan works like this:
Switch Amazon Route 53 to the old Kafka producers stack on AWS to point to the old Kafka environment.
You can tune Kafka performance in multiple dimensions. Following are some best practices for performance tuning.
These are some general performance tuning techniques:
If throughput is less than network capacity, try the following:
Add more threads
Increase batch size
Add more producer instances
Add more partitions
To improve latency when acks =-1, increase your num.replica.fetches value.
For cross-AZ data transfer, tune your buffer settings for sockets and for OS TCP.
Make sure that num.io.threads is greater than the number of disks dedicated for Kafka.
Adjust num.network.threads based on the number of producers plus the number of consumers plus the replication factor.
Your message size affects your network bandwidth. To get higher performance from a Kafka cluster, select an instance type that offers 10 Gb/s performance.
For Java and JVM tuning, try the following:
Minimize GC pauses by using the Oracle JDK, which uses the new G1 garbage-first collector.
Try to keep the Kafka heap size below 4 GB.
Monitoring
Knowing whether a Kafka cluster is working correctly in a production environment is critical. Sometimes, just knowing that the cluster is up is enough, but Kafka applications have many moving parts to monitor. In fact, it can easily become confusing to understand what’s important to watch and what you can set aside. Items to monitor range from simple metrics about the overall rate of traffic, to producers, consumers, brokers, controller, ZooKeeper, topics, partitions, messages, and so on.
For monitoring, Intuit used several tools, including Newrelec, Wavefront, Amazon CloudWatch, and AWS CloudTrail. Our recommended monitoring approach follows.
For system metrics, we recommend that you monitor:
CPU load
Network metrics
File handle usage
Disk space
Disk I/O performance
Garbage collection
ZooKeeper
For producers, we recommend that you monitor:
Batch-size-avg
Compression-rate-avg
Waiting-threads
Buffer-available-bytes
Record-queue-time-max
Record-send-rate
Records-per-request-avg
For consumers, we recommend that you monitor:
Batch-size-avg
Compression-rate-avg
Waiting-threads
Buffer-available-bytes
Record-queue-time-max
Record-send-rate
Records-per-request-avg
Security
Like most distributed systems, Kafka provides the mechanisms to transfer data with relatively high security across the components involved. Depending on your setup, security might involve different services such as encryption, Kerberos, Transport Layer Security (TLS) certificates, and advanced access control list (ACL) setup in brokers and ZooKeeper. The following tells you more about the Intuit approach. For details on Kafka security not covered in this section, see the Kafka documentation.
Encryption at rest
For EBS-backed EC2 instances, you can enable encryption at rest by using Amazon EBS volumes with encryption enabled. Amazon EBS uses AWS Key Management Service (AWS KMS) for encryption. For more details, see Amazon EBS Encryption in the EBS documentation. For instance store–backed EC2 instances, you can enable encryption at rest by using Amazon EC2 instance store encryption.
Encryption in transit
Kafka uses TLS for client and internode communications.
Authentication
Authentication of connections to brokers from clients (producers and consumers) to other brokers and tools uses either Secure Sockets Layer (SSL) or Simple Authentication and Security Layer (SASL).
Kafka supports Kerberos authentication. If you already have a Kerberos server, you can add Kafka to your current configuration.
Authorization
In Kafka, authorization is pluggable and integration with external authorization services is supported.
Backup and restore
The type of storage used in your deployment dictates your backup and restore strategy.
The best way to back up a Kafka cluster based on instance storage is to set up a second cluster and replicate messages using MirrorMaker. Kafka’s mirroring feature makes it possible to maintain a replica of an existing Kafka cluster. Depending on your setup and requirements, your backup cluster might be in the same AWS Region as your main cluster or in a different one.
For EBS-based deployments, you can enable automatic snapshots of EBS volumes to back up volumes. You can easily create new EBS volumes from these snapshots to restore. We recommend storing backup files in Amazon S3.
For more information on how to back up in Kafka, see the Kafka documentation.
Conclusion
In this post, we discussed several patterns for running Kafka in the AWS Cloud. AWS also provides an alternative managed solution with Amazon Kinesis Data Streams, there are no servers to manage or scaling cliffs to worry about, you can scale the size of your streaming pipeline in seconds without downtime, data replication across availability zones is automatic, you benefit from security out of the box, Kinesis Data Streams is tightly integrated with a wide variety of AWS services like Lambda, Redshift, Elasticsearch and it supports open source frameworks like Storm, Spark, Flink, and more. You may refer to kafka-kinesis connector.
If you have questions or suggestions, please comment below.
Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.
This is the fourth article of a series discussing various methods of reducing the size of the Linux kernel to make it suitable for small environments. Reducing the kernel binary has its limits and we have pushed them as far as possible at this point. Still, our goal, which is to be able to run Linux entirely from the on-chip resources of a microcontroller, has not been reached yet. This article will conclude this series by looking at the problem from the perspective of making the kernel and user space fit into a resource-limited system.
Volker Lendecke is one of the first contributors to Samba, having submitted his first patches in 1994. In addition to developing other important file-sharing tools, he’s heavily involved in development of the winbind service, which is implemented in winbindd. Although the core Active Directory (AD) domain controller (DC) code was written by his colleague Stefan Metzmacher, winbind is a crucial component of Samba’s AD functionality. In his information-packed talk at FOSDEM 2018, Lendecke said he aimed to give a high-level overview of what AD and Samba authentication is, and in particular the communication pathways and trust relationships between the parts of Samba that authenticate a Samba user in an AD environment.
Piano keys are so limiting! Why not swap them out for LEDs and the wealth of instruments in Pygame to build air keys, as demonstrated by Instructables maker 2fishy?
Raspberry Pi LED Light Schroeder Piano – Twinkle Little Star
Keys? Where we’re going you don’t need keys!
This project, created by either Yolanda or Ken Fisher (or both!), uses an array of LEDs and photoresistors to form a MIDI sequencer. Twelve LEDs replace piano keys, and another three change octaves and access the menu.
Each LED is paired with a photoresistor, which detects the emitted light to form a closed circuit. Interrupting the light beam — in this case with a finger — breaks the circuit, telling the Python program to perform an action.
We’re all hoping this is just the scaled-down prototype of a full-sized LED grand piano
Using Pygame, the 2fishy team can access 75 different instruments and 128 notes per instrument, making their wooden piano more than just a one-hit wonder.
Piano building
The duo made the piano’s body out of plywood, hardboard, and dowels, and equipped it with a Raspberry Pi 2, a speaker, and the aforementioned LEDs and photoresistors.
A Raspberry Pi 2 and speaker sit within the wooden body, with LEDs and photoresistors in place of the keys.
A complete how-to for the build, including some rather fancy and informative schematics, is available at Instructables, where 2fishy received a bronze medal for their project. Congratulations!
Learn more
If you’d like to learn more about using Pygame, check out The MagPi’s Make Games with Python Essentials Guide, available both in print and as a free PDF download.
And for more music-based projects using a variety of tech, be sure to browse our free resources.
Lastly, if you’d like to see more piano-themed Raspberry Pi projects, take a look at our Big Minecraft Piano, these brilliant piano stairs, this laser-guided piano teacher, and our video below about the splendid Street Fighter duelling pianos we witnessed at Maker Faire.
Two pianos wired up as Playstation 2 controllers allow users to battle…musically! We caught up with makers Eric Redon and Cyril Chapellier of foobarflies a…
Linus has released the 4.15 kernel. “After a release cycle that was unusual in so many (bad) ways, this last week was really pleasant. Quiet and small, and no last-minute panics, just small fixes for various issues. I never got a feeling that I’d need to extend things by yet another week, and 4.15 looks fine to me.”
Some of the more significant features in this release include: the long-awaited CPU controller for the version-2 control-group interface, significant live-patching improvements, initial support for the RISC-V architecture, support for AMD’s secure encrypted virtualization feature, and the MAP_SYNC mechanism for working with nonvolatile memory. This release also, of course, includes mitigations for the Meltdown and Spectre variant-2 vulnerabilities though, as Linus points out in the announcement, the work of dealing with these issues is not yet done.
Alison Chaiken looks in detail at how the kernel boots on opensource.com. “Besides starting buggy spyware, what function does early boot firmware serve? The job of a bootloader is to make available to a newly powered processor the resources it needs to run a general-purpose operating system like Linux. At power-on, there not only is no virtual memory, but no DRAM until its controller is brought up.”
Many enterprises use Microsoft Active Directory to manage users, groups, and computers in a network. And a question is asked frequently: How can Active Directory users access big data workloads running on Amazon EMR with the same single sign-on (SSO) experience they have when accessing resources in the Active Directory network?
This post walks you through the process of using AWS CloudFormation to set up a cross-realm trust and extend authentication from an Active Directory network into an Amazon EMR cluster with Kerberos enabled. By establishing a cross-realm trust, Active Directory users can use their Active Directory credentials to access an Amazon EMR cluster and run jobs as themselves.
Walkthrough overview
In this example, you build a solution that allows Active Directory users to seamlessly access Amazon EMR clusters and run big data jobs. Here’s what you need before setting up this solution:
A possible limit increase for your account (Note: Usually a limit increase will not be necessary. See the AWS Service Limits documentation if you encounter a limit error while building the solution.)
To make it easier for you to get started, I created AWS CloudFormation templates that automatically configure and deploy the solution for you. The following steps and resources are involved in setting up the solution:
Note: If you want to manually create and configure the components for this solution without using AWS CloudFormation, refer to the Amazon EMR cross-realm documentation. IMPORTANT: The AWS CloudFormation templates used in this post are designed to work only in the us-east-1 (N. Virginia) Region. They are not intended for production use without modification.
Single-step solution deployment
If you don’t want to set up each component individually, you can use the single-step AWS CloudFormation template. The single-step template is a master template that uses nested stacks (additional templates) to launch and configure all the resources for the solution in one go.
To deploy the single-step template into your account, choose Launch Stack:
This takes you to the Create stack wizard in the AWS CloudFormation console. The template is launched in the US East (N. Virginia) Region by default. Do not change to a different Region because the template is designed to work only in us-east-1 (N. Virginia).
On the Select Template page, keep the default URL for the AWS CloudFormation template, and then choose Next.
On the Specify Details page, review the parameters for the template. Provide values for the parameters that require input (for more information, see the parameters table that follows).
The following parameters are available in this template.
Parameter
Default
Description
Domain Controller name
DC1
NetBIOS (hostname) name of the Active Directory server. This name can be up to 15 characters long.
Active Directory domain
example.com
Fully qualified domain name (FQDN) of the forest root domain (for example, example.com).
Domain NetBIOS name
EXAMPLE
NetBIOS name of the domain for users of earlier versions of Windows. This name can be up to 15 characters long.
Domain admin user
CrossRealmAdmin
User name for the account that is added as domain administrator. This account is separate from the default administrator account.
Domain admin password
Requires input
Password for the domain admin user. Must be at least eight characters including letters, numbers, and symbols.
Key pair name
Requires input
Name of an existing key pair, which enables you to connect securely to your instance after it launches.
Instance type
m4.xlarge
Instance type for the domain controller and the Amazon EMR cluster.
Allowed IP address
10.0.0.0/16
The client IP address can that can reach your cluster. Specify an IP address range in CIDR notation (for example, 203.0.113.5/32). By default, only the VPC CIDR (10.0.0.0/16) can reach the cluster. Be sure to add your client IP range so that you can connect to the cluster using SSH.
EMR Kerberos realm
EC2.INTERNAL
Cluster’s Kerberos realm name. By default, the realm name is derived from the cluster’s VPC domain name in uppercase letters (for example, EC2.INTERNAL is the default VPC domain name in the us-east-1 Region).
Trusted AD domain
EXAMPLE.COM
The Active Directory (AD) domain that you want to trust. This is the same as the “Active Directory domain.” However, it must use all uppercase letters (for example, EXAMPLE.COM).
Cross-realm trust password
Requires input
Password that you want to use for your cross-realm trust.
Instance count
2
The number of instances (core nodes) for the cluster.
EMR applications
Hadoop, Spark, Ganglia, Hive
Comma separated list of applications to install on the cluster.
After you specify the template details, choose Next. On the Options page, choose Next again. On the Review page, select the I acknowledge that AWS CloudFormation might create IAM resources with custom names check box, and then choose Create.
It takes approximately 45 minutes for the deployment to complete. When the stack launch is complete, it will return outputs with information about the resources that were created. Note the outputs and skip to the Managing and testing the solution section. You can view the stack outputs on the AWS Management Console or by using the following AWS CLI command:
This section describes how to use AWS CloudFormation templates to perform each step separately in the solution.
Create and configure an Amazon VPC
In order for you to establish a cross-realm trust between an Amazon EMR Kerberos realm and an Active Directory domain, your Amazon VPC must meet the following requirements:
The subnet used for the Amazon EMR cluster must have a CIDR block of fewer than nine digits (for example, 10.0.1.0/24).
Both DNS resolution and DNS hostnames must be enabled (set to “yes”).
The Active Directory domain controller must be the DNS server for instances in the Amazon VPC (this is configured in the next step).
To use the AWS CloudFormation template to create and configure an Amazon VPC with the prerequisites listed previously, choose Launch Stack:
Note: If you want to create the VPC manually (without using AWS CloudFormation), see Set Up the VPC and Subnet in the Amazon EMR documentation.
Launching this stack creates the following AWS resources:
Amazon VPC with CIDR block 10.0.0.0/16 (Name: CrossRealmVPC)
Internet Gateway (Name: CrossRealmGateway)
Public subnet with CIDR block 10.0.1.0/24 (Name: CrossRealmSubnet)
Security group allowing inbound access from the VPC’s subnets (Name tag: CrossRealmSecurityGroup)
When the stack launch is complete, it should return outputs similar to the following.
Key
Value example
Description
SubnetID
subnet-xxxxxxxx
The subnet for the Active Directory domain controller and the EMR cluster.
SecurityGroup
sg-xxxxxxxx
The security group for the Active Directory domain controller.
VPCID
vpc-xxxxxxxx
The Active Directory domain controller and EMR cluster will be launched on this VPC.
Note the outputs because they are used in the next step. You can view the stack outputs on the AWS Management Console or by using the following AWS CLI command:
Launch and configure an Active Directory domain controller
In this step, you use an AWS CloudFormation template to automatically launch and configure a new Active Directory domain controller and cross-realm trust.
Note: There are various ways to install and configure an Active Directory domain controller. For details on manually launching and installing a domain controller without AWS CloudFormation, see Step 2: Launch and Install the AD Domain Controller in the Amazon EMR documentation.
In addition to launching and configuring an Active Directory domain controller and cross-realm trust, this AWS CloudFormation template also sets the domain controller as the DNS server (name server) for your Amazon VPC. In other words, the template creates a new DHCP option-set for the VPC where it’s being deployed to, and it sets the private IP of the domain controller as the name server for that new DHCP option set.
IMPORTANT: You should not use this template on a production VPC with existing resources like Amazon EC2 instances. When you launch this stack, make sure that you use the new environment and resources (Amazon VPC, subnet, and security group) that were created in the Create and configure an Amazon VPC step.
To launch this stack, choose Launch Stack:
The following table contains information about the parameters available in this template. Review the parameters for the template and provide values for those that require input.
Parameter
Default
Description
VPC ID
Requires input
Launch the domain controller on this VPC (for example, use the VPC created in the Create and configure an Amazon VPC step).
Subnet ID
Requires input
Subnet used for the domain controller (for example, use the subnet created in the Create and configure an Amazon VPC step).
Security group ID
Requires input
Security group (SG) for the domain controller (for example, use the SG created in the Create and configure an Amazon VPC step).
Domain Controller name
DC1
NetBIOS name of the Active Directory server (up to 15 characters).
Active Directory domain
example.com
Fully qualified domain name (FQDN) of the forest root domain (for example, example.com).
Domain NetBIOS name
EXAMPLE
NetBIOS name of the domain for users of earlier versions of Windows. This name can be up to 15 characters long.
Domain admin user
CrossRealmAdmin
User name for the account that is added as domain administrator. This account is separate from the default administrator account.
Domain admin password
Requires input
Password for the domain admin user. Must be at least eight characters including letters, numbers, and symbols.
Key pair name
Requires input
Name of an existing EC2 key pair to enable access to the domain controller instance.
Instance type
m4.xlarge
Instance type for the domain controller.
EMR Kerberos realm
EC2.INTERNAL
Cluster’s Kerberos realm name. By default, the realm name is derived from the cluster’s VPC domain name in uppercase letters (for example, EC2.INTERNAL is the default VPC domain name in the us-east-1 Region).
Cross-realm trust password
Requires input
Password that you want to use for your cross-realm trust.
It takes 25–30 minutes for this stack to be created. When it’s complete, note the stack’s outputs, and then move to the next step: Launch an EMR cluster with Kerberos enabled.
Create a security configuration and launch an Amazon EMR cluster with Kerberos enabled
To launch a kerberized Amazon EMR cluster, you first need to create a security configuration containing the cross-realm trust configuration. You then specify cluster-specific Kerberos attributes when launching the cluster.
In this step, you use AWS CloudFormation to launch and configure a kerberized Amazon EMR cluster with a cross-realm trust. If you want to manually launch and configure a cluster with Kerberos enabled, see Step 6: Launch a Kerberized EMR Cluster in the Amazon EMR documentation.
Note: At the time of this writing, AWS CloudFormation does not yet support launching Amazon EMR clusters with Kerberos authentication enabled. To overcome this limitation, I created a template that uses an AWS Lambda-backed custom resource to launch and configure the Amazon EMR cluster with Kerberos enabled. If you use this template, there’s nothing else that you need to do. Just keep in mind that the template creates and invokes an AWS Lambda function (custom resource) to launch the cluster.
To create a cross-realm trust security configuration and launch a kerberized Amazon EMR cluster using AWS CloudFormation, choose Launch Stack:
The following table lists and describes the template parameters for deploying a kerberized Amazon EMR cluster and configuring a cross-realm trust.
Parameter
Default
Description
Active Directory domain
example.com
The Active Directory domain that you want to establish the cross-realm trust with.
Domain admin user (joiner user)
CrossRealmAdmin
The user name of an Active Directory domain user with privileges to join domains/computers to the Active Directory domain (joiner user).
Domain admin password
Requires input
Password of the joiner user.
Cross-realm trust password
Requires input
Password of your cross-realm trust.
EC2 key pair name
Requires input
Name of an existing key pair, which enables you to connect securely to your cluster after it launches.
Subnet ID
Requires input
Subnet that you want to use for your Amazon EMR cluster (for example, choose the subnet created in the Create and configure an Amazon VPC step).
Security group ID
Requires input
Security group that you want to use for your Amazon EMR cluster (for example, choose the security group created in the Create and configure an Amazon VPC step).
Instance type
m4.xlarge
The instance type that you want to use for the cluster nodes.
Instance count
2
The number of instances (core nodes) for the cluster.
Allowed IP address
10.0.0.0/16
The client IP address can that can reach your cluster. Specify an IP address range in CIDR notation (for example, 203.0.113.5/32). By default, only the VPC CIDR (10.0.0.0/16) can reach the cluster. Be sure to add your client IP range so that you can connect to the cluster using SSH.
EMR applications
Hadoop, Spark, Ganglia, Hive
Comma separated list of the applications that you want installed on the cluster.
EMR Kerberos realm
EC2.INTERNAL
Cluster’s Kerberos realm name. By default, the realm name is derived from the cluster’s VPC domain name in uppercase letters (for example, EC2.INTERNAL is the default VPC domain name in the us-east-1 Region).
Trusted AD domain
EXAMPLE.COM
The Active Directory domain that you want to trust. This name is the same as the “AD domain name.” However, it must use all uppercase letters (for example, EXAMPLE.COM).
It takes 10–15 minutes for this stack to be created. When it’s complete, note the stack’s outputs, and then move to the next section: Managing and testing the solution.
Managing and testing the solution
Now that you’ve configured and built the solution, it’s time to test it by connecting to a cluster using Active Directory credentials.
SSH to a cluster using Active Directory credentials (single sign-on)
After you launch a kerberized Amazon EMR cluster, if you used the AWS CloudFormation templates and added your client IP range to the Allowed IP address parameter, you should be able to connect to the cluster using an SSH client and your Active Directory user credentials. If you have trouble connecting to the cluster using SSH, check the cluster’s security group to make sure that it allows inbound SSH connection (TCP port 22) from your client’s IP address (source).
The following steps assume that you’re using a client such as OpenSSH. If you’re using a different SSH application (for example, PuTTY), consult the application-specific documentation.
Note: Because the cluster was launched with a cross-realm trust configuration, you don’t need to use a private key (.pem file) when you connect to it as a domain user using SSH.
To connect to your Amazon EMR cluster as an Active Directory user using SSH, run the following command. Replace ad_user with the domain admin user that you created while setting up the domain controller and replace master_node_URL with the cluster’s URL (see the stack’s outputs to find this information):
$ ssh -l <ad_user> <master_node_URL>
If your SSH client is configured to use a key as the preferred authentication method, the login might fail. If that’s the case, you can add the following options to your SSH command to force the SSH connection to use password authentication:
After a domain user connects to the cluster using SSH, if this is the first that the user is connecting, a local home directory is created for that user. In addition to creating a local home directory, if you used the create-hdfs-home-ba.sh bootstrap action when launching the cluster (done by default if you used the AWS CloudFormation template to launch a kerberized cluster), an HDFS user home directory is also automatically created.
Note: If you manually launched the cluster and did not use the create-hdfs-home-ba.sh bootstrap action, then you’ll need to manually create HDFS user home directories for your users.
When you connect to the cluster using SSH for the first time (as a domain user), you should see the following messages if the HDFS home directory for your domain user was successfully created:
Running jobs on a kerberized Amazon EMR cluster
To run a job on a kerberized cluster, the user submitting the job must first be authenticated. If you followed the previous section to connect to your cluster as an Active Directory user using SSH, the user should be authenticated automatically.
If running the klist command returns a “No credentials cache found” message, it means that the user is not authenticated (the user doesn’t have a Kerberos ticket). You can re-authenticate a user at any time by running the following command (be sure to use all uppercase letters for the Active Directory domain):
$ kinit <username>@<AD_DOMAIN>
When the user is authenticated, they can submit jobs just like they would on a non-kerberized cluster.
Auditing jobs
Another advantage that Kerberos can provide is that you can easily tell which user ran a particular job. For example, connect (using SSH) to a kerberized cluster with an Active Directory user, and submit the SparkPi sample application:
$ spark-example SparkPi
After running the SparkPi application, go to the Amazon EMR console and choose your cluster. Then choose the Application history tab. There you can see information about the application, including the user that submitted the job:
Common issues
Although it would be hard to cover every possible Kerberos issue, this section covers some of the more common issues that might occur and ways to fix them.
Issue 1: You can successfully connect and get authenticated on a cluster. However, whenever you try running job, it fails with an error similar to the following:
Solution: Make sure that an HDFS home directory for the user was created and that it has the right permissions.
Issue 2: You can successfully connect to the cluster, but you can’t run any Hadoop or HDFS commands.
Solution: Use the klist command to confirm whether the user is authenticated and has a valid Kerberos ticket. Use the kinit command to re-authenticate a user.
Issue 3: You can’t connect (using SSH) to the cluster using Active Directory user credentials, but you can manually authenticate the user with kinit.
Solution: Make sure that the Active Directory domain controller is the DNS server (name server) for the cluster nodes.
Cleaning up
After completing and testing this solution, remember to clean up the resources. If you used the AWS CloudFormation templates to create the resources, then use the AWS CloudFormation console or AWS CLI/SDK to delete the stacks. Deleting a stack also deletes the resources created by that stack.
If one of your stacks does not delete, make sure that there are no dependencies on the resources created by that stack. For example, if you deployed an Amazon VPC using AWS CloudFormation and then deployed a domain controller into that VPC using a different AWS CloudFormation stack, you must first delete the domain controller stack before the VPC stack can be deleted.
Summary
The ability to authenticate users and services with Kerberos not only allows you to secure your big data applications, but it also enables you to easily integrate Amazon EMR clusters with an Active Directory environment. This post showed how you can use Kerberos on Amazon EMR to create a single sign-on solution where Active Directory domain users can seamlessly access Amazon EMR clusters and run big data applications. We also showed how you can use AWS CloudFormation to automate the deployment of this solution.
Honolulu-based software developer bbtinkerer was tired of never being able to find the TV remote. So he made his own using a Raspberry Pi Zero, and connected it to a web app accessible on his smartphone.
Finding a remote alternative
“I needed one because the remote in my house tends to go missing a lot,” explains Bernard aka bbtinkerer on the Instructables page for his Raspberry Pi Zero Universal Remote.”If I want the controller, I have to hunt down three people and hope one of them remembers that they took it.”
For the build, Bernard used a Raspberry Pi Zero, an IR LED and corresponding receiver, Raspbian Lite, and a neat little 3D-printed housing.
First, he soldered a circuit for the LED and resistors on a small piece of perf board. Then he assembled the hardware components. Finally, all he needed to do was to write the code to control his devices (including a tower fan), and to set up the app.
Bernard employed the Linux Infrared Remote Control (LIRC) package to control the television with the Raspberry Pi Zero, accessing the Zero via SSH. He gives a complete rundown of the installation process on Instructables.
Setting up a remote’s buttons with LIRC is a simple case of pressing them and naming their functions one by one. You’ll need the remote to set up the system, but after that, feel free to lock it in a drawer and use your smartphone instead.
Finally, Bernard created the web interface using Node.js, and again, because he’s lovely, he published the code for anyone wanting to build their own. Thanks, Bernard!
Life hacks
If you’ve used a Raspberry Pi to build a time-saving life hack like Bernard’s, be sure to share it with us. Other favourites of ours include fridge cameras, phone app doorbell notifications, and Alan’s ocarina home automation system. I’m not sure if this last one can truly be considered a time-saving life hack. It’s still cool though!
At inception, Backblaze was a consumer company. Thousands upon thousands of individuals came to our website and gave us $5/mo to keep their data safe. But, we didn’t sell business solutions. It took us years before we had a sales team. In the last couple of years, we’ve released products that businesses of all sizes love: Backblaze B2 Cloud Storage and Backblaze for Business Computer Backup. Those businesses want to integrate Backblaze deeply into their infrastructure, so it’s time to hire our first Sales Engineer!
Company Description: Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable low cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/gb/mo. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.
We’ve managed to nurture a team oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.
We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.
Some Backblaze Perks:
Competitive healthcare plans
Competitive compensation and 401k
All employees receive Option grants
Unlimited vacation days
Strong coffee
Fully stocked Micro kitchen
Catered breakfast and lunches
Awesome people who work on awesome projects
Childcare bonus
Normal work hours
Get to bring your pets into the office
San Mateo Office – located near Caltrain and Highways 101 & 280.
Backblaze B2 cloud storage is a building block for almost any computing service that requires storage. Customers need our help integrating B2 into iOS apps to Docker containers. Some customers integrate directly to the API using the programming language of their choice, others want to solve a specific problem using ready made software, already integrated with B2.
At the same time, our computer backup product is deepening it’s integration into enterprise IT systems. We are commonly asked for how to set Windows policies, integrate with Active Directory, and install the client via remote management tools.
We are looking for a sales engineer who can help our customers navigate the integration of Backblaze into their technical environments.
Are you 1/2” deep into many different technologies, and unafraid to dive deeper?
Can you confidently talk with customers about their technology, even if you have to look up all the acronyms right after the call?
Are you excited to setup complicated software in a lab and write knowledge base articles about your work?
Then Backblaze is the place for you!
Enough about Backblaze already, what’s in it for me? In this role, you will be given the opportunity to learn about the technologies that drive innovation today; diverse technologies that customers are using day in and out. And more importantly, you’ll learn how to learn new technologies.
Just as an example, in the past 12 months, we’ve had the opportunity to learn and become experts in these diverse technologies:
How to setup VM servers for lab environments, both on-prem and using cloud services.
Create an automatically “resetting” demo environment for the sales team.
Setup Microsoft Domain Controllers with Active Directory and AD Federation Services.
Learn the basics of OAUTH and web single sign on (SSO).
Archive video workflows from camera to media asset management systems.
How upload/download files from Javascript by enabling CORS.
How to install and monitor online backup installations using RMM tools, like JAMF.
Tape (LTO) systems. (Yes – people still use tape for storage!)
How can I know if I’ll succeed in this role?
You have:
Confidence. Be able to ask customers questions about their environments and convey to them your technical acumen.
Curiosity. Always want to learn about customers’ situations, how they got there and what problems they are trying to solve.
Organization. You’ll work with customers, integration partners, and Backblaze team members on projects of various lengths. You can context switch and either have a great memory or keep copious notes. Your checklists have their own checklists.
You are versed in:
The fundamentals of Windows, Linux and Mac OS X operating systems. You shouldn’t be afraid to use a command line.
Building, installing, integrating and configuring applications on any operating system.
Debugging failures – reading logs, monitoring usage, effective google searching to fix problems excites you.
The basics of TCP/IP networking and the HTTP protocol.
Novice development skills in any programming/scripting language. Have basic understanding of data structures and program flow.
Your background contains:
Bachelor’s degree in computer science or the equivalent.
2+ years of experience as a pre or post-sales engineer.
The right extra credit: There are literally hundreds of previous experiences you can have had that would make you perfect for this job. Some experiences that we know would be helpful for us are below, but make sure you tell us your stories!
Experience using or programming against Amazon S3.
Experience with large on-prem storage – NAS, SAN, Object. And backing up data on such storage with tools like Veeam, Veritas and others.
Experience with photo or video media. Media archiving is a key market for Backblaze B2.
Experience with Windows Servers, Active Directory, Group policies and the like.
What’s it like working with the Sales team? The Backblaze sales team collaborates. We help each other out by sharing ideas, templates, and our customer’s experiences. When we talk about our accomplishments, there is no “I did this,” only “we”. We are truly a team.
We are honest to each other and our customers and communicate openly. We aim to have fun by embracing crazy ideas and creative solutions. We try to think not outside the box, but with no boxes at all. Customers are the driving force behind the success of the company and we care deeply about their success.
If this all sounds like you:
Send an email to [email protected] with the position in the subject line.
Tell us a bit about your Sales Engineering experience.
Noted below are the upcoming scheduled live, online technical sessions being held during the month of January. Make sure to register ahead of time so you won’t miss out on these free talks conducted by AWS subject matter experts.
By continuing to use the site, you agree to the use of cookies. more information
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.