One of the most common enquiries I receive at Pi Towers is “How can I get my hands on a Raspberry Pi Oracle Weather Station?” Now the answer is: “Why not build your own version using our guide?”
Tadaaaa! The BYO weather station fully assembled.
Our Oracle Weather Station
In 2016 we sent out nearly 1000 Raspberry Pi Oracle Weather Station kits to schools from around the world who had applied to be part of our weather station programme. In the original kit was a special HAT that allows the Pi to collect weather data with a set of sensors.
The original Raspberry Pi Oracle Weather Station HAT
We only had a single batch of HATs made, and unfortunately we’ve given nearly* all the Weather Station kits away. Not only are the kits really popular, we also receive lots of questions about how to add extra sensors or how to take more precise measurements of a particular weather phenomenon. So today, to satisfy your demand for a hackable weather station, we’re launching our Build your own weather station guide!
Fun with meteorological experiments!
Our guide suggests the use of many of the sensors from the Oracle Weather Station kit, so can build a station that’s as close as possible to the original. As you know, the Raspberry Pi is incredibly versatile, and we’ve made it easy to hack the design in case you want to use different sensors.
Many other tutorials for Pi-powered weather stations don’t explain how the various sensors work or how to store your data. Ours goes into more detail. It shows you how to put together a breadboard prototype, it describes how to write Python code to take readings in different ways, and it guides you through recording these readings in a database.
There’s also a section on how to make your station weatherproof. And in case you want to move past the breadboard stage, we also help you with that. The guide shows you how to solder together all the components, similar to the original Oracle Weather Station HAT.
Who should try this build
We think this is a great project to tackle at home, at a STEM club, Scout group, or CoderDojo, and we’re sure that many of you will be chomping at the bit to get started. Before you do, please note that we’ve designed the build to be as straight-forward as possible, but it’s still fairly advanced both in terms of electronics and programming. You should read through the whole guide before purchasing any components.
The sensors and components we’re suggesting balance cost, accuracy, and easy of use. Depending on what you want to use your station for, you may wish to use different components. Similarly, the final soldered design in the guide may not be the most elegant, but we think it is achievable for someone with modest soldering experience and basic equipment.
You can build a functioning weather station without soldering with our guide, but the build will be more durable if you do solder it. If you’ve never tried soldering before, that’s OK: we have a Getting started with soldering resource plus video tutorial that will walk you through how it works step by step.
For those of you who are more experienced makers, there are plenty of different ways to put the final build together. We always like to hear about alternative builds, so please post your designs in the Weather Station forum.
Our plans for the guide
Our next step is publishing supplementary guides for adding extra functionality to your weather station. We’d love to hear which enhancements you would most like to see! Our current ideas under development include adding a webcam, making a tweeting weather station, adding a light/UV meter, and incorporating a lightning sensor. Let us know which of these is your favourite, or suggest your own amazing ideas in the comments!
*We do have a very small number of kits reserved for interesting projects or locations: a particularly cool experiment, a novel idea for how the Oracle Weather Station could be used, or places with specific weather phenomena. If have such a project in mind, please send a brief outline to [email protected], and we’ll consider how we might be able to help you.
Side projects are the things you do at home, after work, for your own “entertainment”, or to satisfy your desire to learn new stuff, in case your workplace doesn’t give you that opportunity (or at least not enough of it). Side projects are also a way to build stuff that you think is valuable but not necessarily “commercialisable”. Many side projects are open-sourced sooner or later and some of them contribute to the pool of tools at other people’s disposal.
But there are more benefits than that – serendipitous benefits, for example. And I’d like to tell some personal stories about that. I’ll focus on a few examples from my list of side projects to show how, through a sort-of butterfly effect, they helped shape my career.
The computoser project, no matter how cool algorithmic music composition, didn’t manage to have much of a long term impact. But it did teach me something apart from niche musical theory – how to read a bulk of scientific papers (mostly computer science) and understand them without being formally trained in the particular field. We’ll see how that was useful later.
Then there was the “State alerts” project – a website that scraped content from public institutions in my country (legislation, legislation proposals, decisions by regulators, new tenders, etc.), made them searchable, and “subscribable” – so that you get notified when a keyword of interest is mentioned in newly proposed legislation, for example. (I obviously subscribed for “information technologies” and “electronic”).
And that project turned out to have a significant impact on the following years. First, I chose a new technology to write it with – Scala. Which turned out to be of great use when I started working at TomTom, and on the 3rd day I was transferred to a Scala project, which was way cooler and much more complex than the original one I was hired for. It was a bit ironic, as my colleagues had just read that “I don’t like Scala” a few weeks earlier, but nevertheless, that was one of the most interesting projects I’ve worked on, and it went on for two years. Had I not known Scala, I’d probably be gone from TomTom much earlier (as the other project was restructured a few times), and I would not have learned many of the scalability, architecture and AWS lessons that I did learn there.
But the very same project had an even more important follow-up. Because if its “civic hacking” flavour, I was invited to join an informal group of developers (later officiated as an NGO) who create tools that are useful for society (something like MySociety.org). That group gathered regularly, discussed both tools and policies, and at some point we put up a list of policy priorities that we wanted to lobby policy makers. One of them was open source for the government, the other one was open data. As a result of our interaction with an interim government, we donated the official open data portal of my country, functioning to this day.
As a result of that, a few months later we got a proposal from the deputy prime minister’s office to “elect” one of the group for an advisor to the cabinet. And we decided that could be me. So I went for it and became advisor to the deputy prime minister. The job has nothing to do with anything one could imagine, and it was challenging and fascinating. We managed to pass legislation, including one that requires open source for custom projects, eID and open data. And all of that would not have been possible without my little side project.
As for my latest side project, LogSentinel – it became my current startup company. And not without help from the previous two mentioned above – the computer science paper reading was of great use when I was navigating the crypto papers landscape, and from the government job I not only gained invaluable legal knowledge, but I also “got” a co-founder.
Some other side projects died without much fanfare, and that’s fine. But the ones above shaped my “story” in a way that would not have been possible otherwise.
And I agree that such serendipitous chain of events could have happened without side projects – I could’ve gotten these opportunities by meeting someone at a bar (unlikely, but who knows). But we, as software engineers, are capable of tilting chance towards us by utilizing our skills. Side projects are our “extracurricular activities”, and they often lead to unpredictable, but rather positive chains of events. They would rarely be the only factor, but they are certainly great at unlocking potential.
The Intercept has a long article on Japan’s equivalent of the NSA: the Directorate for Signals Intelligence. Interesting, but nothing really surprising.
The directorate has a history that dates back to the 1950s; its role is to eavesdrop on communications. But its operations remain so highly classified that the Japanese government has disclosed little about its work even the location of its headquarters. Most Japanese officials, except for a select few of the prime minister’s inner circle, are kept in the dark about the directorate’s activities, which are regulated by a limited legal framework and not subject to any independent oversight.
Now, a new investigation by the Japanese broadcaster NHK — produced in collaboration with The Intercept — reveals for the first time details about the inner workings of Japan’s opaque spy community. Based on classified documents and interviews with current and former officials familiar with the agency’s intelligence work, the investigation shines light on a previously undisclosed internet surveillance program and a spy hub in the south of Japan that is used to monitor phone calls and emails passing across communications satellites.
The article includes some new documents from the Snowden archive.
Thanks to Susan Ferrell, Senior Technical Writer, for a great blog post on how to use CodeCommit branch-level permissions. —-
AWS CodeCommit users have been asking for a way to restrict commits to some repository branches to just a few people. In this blog post, we’re going to show you how to do that by creating and applying a conditional policy, an AWS Identity and Access Management (IAM) policy that contains a context key.
Why would I do this?
When you create a branch in an AWS CodeCommit repository, the branch is available, by default, to all repository users. Here are some scenarios in which refining access might help you:
You maintain a branch in a repository for production-ready code, and you don’t want to allow changes to this branch except from a select group of people.
You want to limit the number of people who can make changes to the default branch in a repository.
You want to ensure that pull requests cannot be merged to a branch except by an approved group of developers.
We’ll show you how to create a policy in IAM that prevents users from pushing commits to and merging pull requests to a branch named master. You’ll attach that policy to one group or role in IAM, and then test how users in that group are affected when that policy is applied. We’ll explain how it works, so you can create custom policies for your repositories.
What you need to get started
You’ll need to sign in to AWS with sufficient permissions to:
Create and apply policies in IAM.
Create groups in IAM.
Add users to those groups.
Apply policies to those groups.
You can use existing IAM groups, but because you’re going to be changing permissions, you might want to first test this out on groups and users you’ve created specifically for this purpose.
You’ll need a repository in AWS CodeCommit with at least two branches: master and test-branch. For information about how to create repositories, see Create a Repository. For information about how to create branches, see Create a Branch. In this blog post, we’ve named the repository MyDemoRepo. You can use an existing repository with branches of another name, if you prefer.
Let’s get started!
Create two groups in IAM
We’re going to set up two groups in IAM: Developers and Senior_Developers. To start, both groups will have the same managed policy, AWSCodeCommitPowerUsers, applied. Users in each group will have exactly the same permissions to perform actions in IAM.
Figure 1: Two example groups in IAM, with distinct users but the same managed policy applied to each group
In the navigation pane, choose Groups, and then choose Create New Group.
In the Group Name box, type Developers, and then choose Next Step.
In the list of policies, select the check box for AWSCodeCommitPowerUsers, then choose Next Step.
Choose Create Group.
Now, follow these steps to create the Senior_Developers group and attach the AWSCodeCommitPowerUsers managed policy. You now have two empty groups with the same policy attached.
Create users in IAM
Next, add at least one unique user to each group. You can use existing IAM users, but because you’ll be affecting their access to AWS CodeCommit, you might want to create two users just for testing purposes. Let’s go ahead and create Arnav and Mary.
In the navigation pane, choose Users, and then choose Add user.
For the new user, type Arnav_Desai.
Choose Add another user, and then type Mary_Major.
Select the type of access (programmatic access, access to the AWS Management Console, or both). In this blog post, we’ll be testing everything from the console, but if you want to test AWS CodeCommit using the AWS CLI, make sure you include programmatic access and console access.
For Console password type, choose Custom password. Each user is assigned the password that you type in the box. Write these down so you don’t forget them. You’ll need to sign in to the console using each of these accounts.
Choose Next: Permissions.
On the Set permissions page, choose Add user to group. Add Arnav to the Developers group. Add Mary to the Senior_Developers group.
Choose Next: Review to see all of the choices you made up to this point. When you are ready to proceed, choose Create user.
Sign in as Arnav, and then follow these steps to go to the master branch and add a file. Then sign in as Mary and follow the same steps.
On the Dashboard page, from the list of repositories, choose MyDemoRepo.
In the Code view, choose the branch named master.
Choose Add file, and then choose Create file. Type some text or code in the editor.
Provide information to other users about who added this file to the repository and why.
In Author name, type the name of the user (Arnav or Mary).
In Email address, type an email address so that other repository users can contact you about this change.
In Commit message, type a brief description to help you remember why you added this file or any other details you might find helpful.
Type a name for the file.
Choose Commit file.
Now follow the same steps to add a file in a different branch. (In our example repository, that’s the branch named test-branch.) You should be able to add a file to both branches regardless of whether you’re signed in as Arnav or Mary.
Let’s change that.
Create a conditional policy in IAM
You’re going to create a policy in IAM that will deny API actions if certain conditions are met. We want to prevent users with this policy applied from updating a branch named master, but we don’t want to prevent them from viewing the branch, cloning the repository, or creating pull requests that will merge to that branch. For this reason, we want to pick and choose our APIs carefully. Looking at the Permissions Reference, the logical permissions for this are:
Now’s the time to think about what else you might want this policy to do. For example, because we don’t want users with this policy to make changes to this branch, we probably don’t want them to be able to delete it either, right? So let’s add one more permission:
The branch in which we want to deny these actions is master. The repository in which the branch resides is MyDemoRepo. We’re going to need more than just the repository name, though. We need the repository ARN. Fortunately, that’s easy to find. Just go to the AWS CodeCommit console, choose the repository, and choose Settings. The repository ARN is displayed on the General tab.
Now we’re ready to create a policy. 1. Open the IAM console at https://console.aws.amazon.com/iam/. Make sure you’re signed in with the account that has sufficient permissions to create policies, and not as Arnav or Mary. 2. In the navigation pane, choose Policies, and then choose Create policy. 3. Choose JSON, and then paste in the following:
You’ll notice a few things here. First, change the repository ARN to the ARN for your repository and include the repository name. Second, if you want to restrict access to a branch with a name different from our example, master, change that reference too.
Now let’s talk about this policy and what it does. You might be wondering why we’re using a Git reference (refs/heads) value instead of just the branch name. The answer lies in how Git references things, and how AWS CodeCommit, as a Git-based repository service, implements its APIs. A branch in Git is a simple pointer (reference) to the SHA-1 value of the head commit for that branch.
You might also be wondering about the second part of the condition, the nullification language. This is necessary because of the way git push and git-receive-pack work. Without going into too many technical details, when you attempt to push a change from a local repo to AWS CodeCommit, an initial reference call is made to AWS CodeCommit without any branch information. AWS CodeCommit evaluates that initial call to ensure that:
a) You’re authorized to make calls.
b) A repository exists with the name specified in the initial call. If you left that null out of the policy, users with that policy would be unable to complete any pushes from their local repos to the AWS CodeCommit remote repository at all, regardless of which branch they were trying to push their commits to.
Could you write a policy in such a way that the null is not required? Of course. IAM policy language is flexible. There’s an example of how to do this in the AWS CodeCommit User Guide, if you’re curious. But for the purposes of this blog post, let’s continue with this policy as written.
So what have we essentially said in this policy? We’ve asked IAM to deny the relevant CodeCommit permissions if the request is made to the resource MyDemoRepo and it meets the following condition: the reference is to refs/heads/master. Otherwise, the deny does not apply.
I’m sure you’re wondering if this policy has to be constrained to a specific repository resource like MyDemoRepo. After all, it would be awfully convenient if a single policy could apply to all branches in any repository in an AWS account, particularly since the default branch in any repository is initially the master branch. Good news! Simply replace the ARN with an *, and your policy will affect ALL branches named master in every AWS CodeCommit repository in your AWS account. Make sure that this is really what you want, though. We suggest you start by limiting the scope to just one repository, and then changing things when you’ve tested it and are happy with how it works.
When you’re sure you’ve modified the policy for your environment, choose Review policy to validate it. Give this policy a name, such as DenyChangesToMaster, provide a description of its purpose, and then choose Create policy.
Now that you have a policy, it’s time to apply and test it.
Apply the policy to a group
In theory, you could apply the policy you just created directly to any IAM user, but that really doesn’t scale well. You should apply this policy to a group, if you use IAM groups to manage users, or to a role, if your users assume a role when interacting with AWS resources.
In the IAM console, choose Groups, and then choose Developers.
On the Permissions tab, choose Attach Policy.
Choose DenyChangesToMaster, and then choose Attach policy.
Your groups now have a critical difference: users in the Developers group have an additional policy applied that restricts their actions in the master branch. In other words, Mary can continue to add files, push commits, and merge pull requests in the master branch, but Arnav cannot.
Figure 2: Two example groups in IAM, one with an additional policy applied that will prevent users in this group from making changes to the master branch
Test it out. Sign in as Arnav, and do the following:
On the Dashboard page, from the list of repositories, choose MyDemoRepo.
In the Code view, choose the branch named master.
Choose Add file, and then choose Create file, just as you did before. Provide some text, and then add the file name and your user information.
Choose Commit file.
This time you’ll see an error after choosing Commit file. It’s not a pretty message, but at the very end, you’ll see a telling phrase: “explicit deny”. That’s the policy in action. You, as Arnav, are explicitly denied PutFile, which prevents you from adding a file to the master branch. You’ll see similar results if you try other actions denied by that policy, such as deleting the master branch.
Stay signed in as Arnav, but this time add a file to test-branch. You should be able to add a file without seeing any errors. You can create a branch based on the master branch, add a file to it, and create a pull request that will merge to the master branch, all just as before. However, you cannot perform denied actions on that master branch.
Sign out as Arnav and sign in as Mary. You’ll see that as that IAM user, you can add and edit files in the master branch, merge pull requests to it, and even, although we don’t recommend this, delete it.
You can use conditional statements in policies in IAM to refine how users interact with your AWS CodeCommit repositories. This blog post showed how to use such a policy to prevent users from making changes to a branch named master. There are many other options. We hope this blog post will encourage you to experiment with AWS CodeCommit, IAM policies, and permissions. If you have any questions or suggestions, we’d love to hear from you.
At the 2018 Python Language Summit, Carl Shapiro described some of the experiments that he and others at Instagram did to look at ways to improve the performance of the CPython interpreter. The talk was somewhat academic in tone and built on what has been learned in other dynamic languages over the years. By modifying the Python object model fairly substantially, they were able to roughly double the performance of the “classic” Richards benchmark.
Abstract: Every day, hundreds of people fly on airline tickets that have been obtained fraudulently. This crime script analysis provides an overview of the trade in these tickets, drawing on interviews with industry and law enforcement, and an analysis of an online blackmarket. Tickets are purchased by complicit travellers or resellers from the online blackmarket. Victim travellers obtain tickets from fake travel agencies or malicious insiders. Compromised credit cards used to be the main method to purchase tickets illegitimately. However, as fraud detection systems improved, offenders displaced to other methods, including compromised loyalty point accounts, phishing, and compromised business accounts. In addition to complicit and victim travellers, fraudulently obtained tickets are used for transporting mules, and for trafficking and smuggling. This research details current prevention approaches, and identifies additional interventions, aimed at the act, the actor, and the marketplace.
If your day has been a little fraught so far, watch this video. It opens with a tableau of methodically laid-out components and then shows them soldered, screwed, and slotted neatly into place. Everything fits perfectly; nothing needs percussive adjustment. Then it shows us glimpses of an AR future just like the one promised in the less dystopian comics and TV programmes of my 1980s childhood. It is all very soothing, and exactly what I needed.
Transform any surface into mixed-reality using Raspberry Pi, a laser projector, and Android Things. Android Experiments – http://experiments.withgoogle.com/android/lantern Lantern project site – http://nordprojects.co/lantern check below to make your own ↓↓↓ Get the code – https://github.com/nordprojects/lantern Build the lamp – https://www.hackster.io/nord-projects/lantern-9f0c28
Creating augmented reality with projection
We’ve seen plenty of Raspberry Pi IoT builds that are smart devices for the home; they add computing power to things like lights, door locks, or toasters to make these objects interact with humans and with their environment in new ways. Nord Projects‘ Lantern takes a different approach. In their words, it:
imagines a future where projections are used to present ambient information, and relevant UI within everyday objects. Point it at a clock to show your appointments, or point to speaker to display the currently playing song. Unlike a screen, when Lantern’s projections are no longer needed, they simply fade away.
Lantern is set up so that you can connect your wireless device to it using Google Nearby. This means there’s no need to create an account before you can dive into augmented reality.
Your own open-source AR lamp
Nord Projects collaborated on Lantern with Google’s Android Things team. They’ve made it fully open-source, so you can find the code on GitHub and also download their parts list, which includes a Pi, an IKEA lamp, an accelerometer, and a laser projector. Build instructions are at hackster.io and on GitHub.
This is a particularly clear tutorial, very well illustrated with photos and GIFs, and once you’ve sourced and 3D-printed all of the components, you shouldn’t need a whole lot of experience to put everything together successfully. Since everything is open-source, though, if you want to adapt it — for example, if you’d like to source a less costly projector than the snazzy one used here — you can do that too.
The instructions walk you through the mechanical build and the wiring, as well as installing Android Things and Nord Projects’ custom software on the Raspberry Pi. Once you’ve set everything up, an accelerometer connected to the Pi’s GPIO pins lets the lamp know which surface it is pointing at. A companion app on your mobile device lets you choose from the mini apps that work on that surface to select the projection you want.
The designers are making several mini apps available for Lantern, including the charmingly named Space Porthole: this uses Processing and your local longitude and latitude to project onto your ceiling the stars you’d see if you punched a hole through to the sky, if it were night time, and clear weather. Wouldn’t you rather look at that than deal with the ant problem in your kitchen or tackle your GitHub notifications?
What would you like to project onto your living environment? Let us know in the comments!
On the Messaging and Targeting team, we’re constantly inspired by the new and novel ways that customers use our services. For example, last year we took an in-depth look at a customer who built a fully featured email marketing platform based on Amazon SES and other AWS Services.
This week, our friends on the AWS Machine Learning team published a blog post that brings together the worlds of data science and customer engagement. Their solution uses Amazon SageMaker (a platform for building and deploying machine learning models) to create a system that makes purchasing predictions based on customers’ past behaviors. It then uses Amazon Pinpoint to send campaigns to customers based on these predictions.
The blog post is an interesting read that includes a primer on the process of creating a useful Machine Learning solution. It then goes in-depth, discussing the real-world considerations that are involved in implementing the solution.
Micah Lee ran a two-year experiment designed to detect whether or not his laptop was ever tampered with. The results are inconclusive, but demonstrate how difficult it can be to detect laptop tampering.
EC2 Spot Fleets are really cool. You can launch a fleet of Spot Instances that spans EC2 instance types and Availability Zones without having to write custom code to discover capacity or monitor prices. You can set the target capacity (the size of the fleet) in units that are meaningful to your application and have Spot Fleet create and then maintain the fleet on your behalf. Our customers are creating Spot Fleets of all sizes. For example, one financial service customer runs Monte Carlo simulations across 10 different EC2 instance types. They routinely make requests for hundreds of thousands of vCPUs and count on Spot Fleet to give them access to massive amounts of capacity at the best possible price.
EC2 Fleet Today we are extending and generalizing the set-it-and-forget-it model that we pioneered in Spot Fleet with EC2 Fleet, a new building block that gives you the ability to create fleets that are composed of a combination of EC2 On-Demand, Reserved, and Spot Instances with a single API call. You tell us what you need, capacity and instance-wise, and we’ll handle all the heavy lifting. We will launch, manage, monitor and scale instances as needed, without the need for scaffolding code.
You can specify the capacity of your fleet in terms of instances, vCPUs, or application-oriented units, and also indicate how much of the capacity should be fulfilled by Spot Instances. The application-oriented units allow you to specify the relative power of each EC2 instance type in a way that directly maps to the needs of your application. All three capacity specification options (instances, vCPUs, and application-oriented units) are known as weights.
I think you’ll find a number ways this feature makes managing a fleet of instances easier, and believe that you will also appreciate the team’s near-term feature roadmap of interest (more on that in a bit).
Using EC2 Fleet There are a number of ways that you can use this feature, whether you’re running a stateless web service, a big data cluster or a continuous integration pipeline. Today I’m going to describe how you can use EC2 Fleet for genomic processing, but this is similar to workloads like risk analysis, log processing or image rendering. Modern DNA sequencers can produce multiple terabytes of raw data each day, to process that data into meaningful information in a timely fashion you need lots of processing power. I’ll be showing you how to deploy a “grid” of worker nodes that can quickly crunch through secondary analysis tasks in parallel.
Projects in genomics can use the elasticity EC2 provides to experiment and try out new pipelines on hundreds or even thousands of servers. With EC2 you can access as many cores as you need and only pay for what you use. Prior to today, you would need to use the RunInstances API or an Auto Scaling group for the On-Demand & Reserved Instance portion of your grid. To get the best price performance you’d also create and manage a Spot Fleet or multiple Spot Auto Scaling groups with different instance types if you wanted to add Spot Instances to turbo-boost your secondary analysis. Finally, to automate scaling decisions across multiple APIs and Auto Scaling groups you would need to write Lambda functions that periodically assess your grid’s progress & backlog, as well as current Spot prices – modifying your Auto Scaling Groups and Spot Fleets accordingly.
You can now replace all of this with a single EC2 Fleet, analyzing genomes at scale for as little as $1 per analysis. In my grid, each step in in the pipeline requires 1 vCPU and 4 GiB of memory, a perfect match for M4 and M5 instances with 4 GiB of memory per vCPU. I will create a fleet using M4 and M5 instances with weights that correspond to the number of vCPUs on each instance:
m4.16xlarge – 64 vCPUs, weight = 64
m5.24xlarge – 96 vCPUs, weight = 96
This is expressed in a template that looks like this:
By default, EC2 Fleet will select the most cost effective combination of instance types and Availability Zones (both specified in the template) using the current prices for the Spot Instances and public prices for the On-Demand Instances (if you specify instances for which you have matching RIs, your discounts will apply). The default mode takes weights into account to get the instances that have the lowest price per unit. So for my grid, fleet will find the instance that offers the lowest price per vCPU.
Now I can request capacity in terms of vCPUs, knowing EC2 Fleet will select the lowest cost option using only the instance types I’ve defined as acceptable. Also, I can specify how many vCPUs I want to launch using On-Demand or Reserved Instance capacity and how many vCPUs should be launched using Spot Instance capacity:
The above means that I want a total of 2880 vCPUs, with 960 vCPUs fulfilled using On-Demand and 1920 using Spot. The On-Demand price per vCPU is lower for m5.24xlarge than the On-Demand price per vCPU for m4.16xlarge, so EC2 Fleet will launch 10 m5.24xlarge instances to fulfill 960 vCPUs. Based on current Spot pricing (again, on a per-vCPU basis), EC2 Fleet will choose to launch 30 m4.16xlarge instances or 20 m5.24xlarges, delivering 1920 vCPUs either way.
Putting it all together, I have a single file (fl1.json) that describes my fleet:
My entire fleet is created within seconds and was built using 10 m5.24xlarge On-Demand Instances and 30 m4.16xlarge Spot Instances, since the current Spot price was 1.5¢ per vCPU for m4.16xlarge and 1.6¢ per vCPU for m5.24xlarge.
Now lets imagine my grid has crunched through its backlog and no longer needs the additional Spot Instances. I can then modify the size of my fleet by changing the target capacity in my fleet specification, like this:
Since 960 was equal to the amount of On-Demand vCPUs I had requested, when I describe my fleet I will see all of my capacity being delivered using On-Demand capacity:
Earlier I described how RI discounts apply when EC2 Fleet launches instances for which you have matching RIs, so you might be wondering how else RI customers benefit from EC2 Fleet. Let’s say that I own regional RIs for M4 instances. In my EC2 Fleet I would remove m5.24xlarge and specify m4.10xlarge and m4.16xlarge. Then when EC2 Fleet creates the grid, it will quickly find M4 capacity across the sizes and AZs I’ve specified, and my RI discounts apply automatically to this usage.
In the Works We plan to connect EC2 Fleet and EC2 Auto Scaling groups. This will let you create a single fleet that mixed instance types and Spot, Reserved and On-Demand, while also taking advantage of EC2 Auto Scaling features such as health checks and lifecycle hooks. This integration will also bring EC2 Fleet functionality to services such as Amazon ECS, Amazon EKS, and AWS Batch that build on and make use of EC2 Auto Scaling for fleet management.
Available Now You can create and make use of EC2 Fleets today in all public AWS Regions!
As of March 31, 2018 we had 100,110 spinning hard drives. Of that number, there were 1,922 boot drives and 98,188 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We’ll also take a look at why we are collecting and reporting 10 new SMART attributes and take a sneak peak at some 8 TB Toshiba drives. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.
Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are about 97 million entries totaling 26 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.
Hard Drive Reliability Statistics for Q1 2018
At the end of Q1 2018 Backblaze was monitoring 98,188 hard drives used to store data. For our evaluation below we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,046 hard drives. The table below covers just Q1 2018.
Notes and Observations
If a drive model has a failure rate of 0%, it only means there were no drive failures of that model during Q1 2018.
The overall Annualized Failure Rate (AFR) for Q1 is just 1.2%, well below the Q4 2017 AFR of 1.65%. Remember that quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.
There were 142 drives (98,188 minus 98,046) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics.
Welcome Toshiba 8TB drives, almost…
We mentioned Toshiba 8 TB drives in the first paragraph, but they don’t show up in the Q1 Stats chart. What gives? We only had 20 of the Toshiba 8 TB drives in operation in Q1, so they were excluded from the chart. Why do we have only 20 drives? When we test out a new drive model we start with the “tome test” and it takes 20 drives to fill one tome. A tome is the same drive model in the same logical position in each of the 20 Storage Pods that make up a Backblaze Vault. There are 60 tomes in each vault.
In this test, we created a Backblaze Vault of 8 TB drives, with 59 of the tomes being Seagate 8 TB drives and 1 tome being the Toshiba drives. Then we monitored the performance of the vault and its member tomes to see if, in this case, the Toshiba drives performed as expected.
So far the Toshiba drive is performing fine, but they have been in place for only 20 days. Next up is the “pod test” where we fill a Storage Pod with Toshiba drives and integrate it into a Backblaze Vault comprised of like-sized drives. We hope to have a better look at the Toshiba 8 TB drives in our Q2 report — stay tuned.
Lifetime Hard Drive Reliability Statistics
While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models which have 45 or more drives in operation as of March 31st, 2018. For each model, we compute their reliability starting from when they were first installed.
Notes and Observations
The failure rates of all of the larger drives (8-, 10- and 12 TB) are very good, 1.2% AFR (Annualized Failure Rate) or less. Many of these drives were deployed in the last year, so there is some volatility in the data, but you can use the Confidence Interval to get a sense of the failure percentage range.
The overall failure rate of 1.84% is the lowest we have ever achieved, besting the previous low of 2.00% from the end of 2017.
Our regular readers and drive stats wonks may have noticed a sizable jump in the number of HGST 8 TB drives (model: HUH728080ALE600), from 45 last quarter to 1,045 this quarter. As the 10 TB and 12 TB drives become more available, the price per terabyte of the 8 TB drives has gone down. This presented an opportunity to purchase the HGST drives at a price in line with our budget.
We purchased and placed into service the 45 original HGST 8 TB drives in Q2 of 2015. They were our first Helium-filled drives and our only ones until the 10 TB and 12 TB Seagate drives arrived in Q3 2017. We’ll take a first look into whether or not Helium makes a difference in drive failure rates in an upcoming blog post.
New SMART Attributes
If you have previously worked with the hard drive stats data or plan to, you’ll notice that we added 10 more columns of data starting in 2018. There are 5 new SMART attributes we are tracking each with a raw and normalized value:
177 – Wear Range Delta
179 – Used Reserved Block Count Total
181- Program Fail Count Total or Non-4K Aligned Access Count
182 – Erase Fail Count
235 – Good Block Count AND System(Free) Block Count
The 5 values are all related to SSD drives.
Yes, SSD drives, but before you jump to any conclusions, we used 10 Samsung 850 EVO SSDs as boot drives for a period of time in Q1. This was an experiment to see if we could reduce boot up time for the Storage Pods. In our case, the improved boot up speed wasn’t worth the SSD cost, but it did add 10 new columns to the hard drive stats data.
Speaking of hard drive stats data, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose, all we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.
If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.
Good luck and let us know if you find anything interesting.
[Ed: 5/1/2018 – Updated Lifetime chart to fix error in confidence interval for HGST 4TB drive, model: HDS5C4040ALE630]
According to this Wired article, Ray Ozzie may have a solution to the crypto backdoor problem. No, he hasn’t. He’s only solving the part we already know how to solve. He’s deliberately ignoring the stuff we don’t know how to solve. We know how to make backdoors, we just don’t know how to secure them.
The vault doesn’t scale
Yes, Apple has a vault where they’ve successfully protected important keys. No, it doesn’t mean this vault scales. The more people and the more often you have to touch the vault, the less secure it becomes. We are talking thousands of requests per day from 100,000 different law enforcement agencies around the world. We are unlikely to protect this against incompetence and mistakes. We are definitely unable to secure this against deliberate attack.
A good analogy to Ozzie’s solution is LetsEncrypt for getting SSL certificates for your website, which is fairly scalable, using a private key locked in a vault for signing hundreds of thousands of certificates. That this scales seems to validate Ozzie’s proposal.
But at the same time, LetsEncrypt is easily subverted. LetsEncrypt uses DNS to verify your identity. But spoofing DNS is easy, as was recently shown in the recent BGP attack against a cryptocurrency. Attackers can create fraudulent SSL certificates with enough effort. We’ve got other protections against this, such as discovering and revoking the SSL bad certificate, so while damaging, it’s not catastrophic.
But with Ozzie’s scheme, equivalent attacks would be catastrophic, as it would lead to unlocking the phone and stealing all of somebody’s secrets.
In particular, consider what would happen if LetsEncrypt’s certificate was stolen (as Matthew Green points out). The consequence is that this would be detected and mass revocations would occur. If Ozzie’s master key were stolen, nothing would happen. Nobody would know, and evildoers would be able to freely decrypt phones. Ozzie claims his scheme can work because SSL works — but then his scheme includes none of the many protections necessary to make SSL work.
What I’m trying to show here is that in a lab, it all looks nice and pretty, but when attacked at scale, things break down — quickly. We have so much experience with failure at scale that we can judge Ozzie’s scheme as woefully incomplete. It’s not even up to the standard of SSL, and we have a long list of SSL problems.
Cryptography is about people more than math We have a mathematically pure encryption algorithm called the “One Time Pad”. It can’t ever be broken, provably so with mathematics.
It’s also perfectly useless, as it’s not something humans can use. That’s why we use AES, which is vastly less secure (anything you encrypt today can probably be decrypted in 100 years). AES can be used by humans whereas One Time Pads cannot be. (I learned the fallacy of One Time Pad’s on my grandfather’s knee — he was a WW II codebreaker who broke German messages trying to futz with One Time Pads).
The same is true with Ozzie’s scheme. It focuses on the mathematical model but ignores the human element. We already know how to solve the mathematical problem in a hundred different ways. The part we don’t know how to secure is the human element.
How do we know the law enforcement person is who they say they are? How do we know the “trusted Apple employee” can’t be bribed? How can the law enforcement agent communicate securely with the Apple employee?
You think these things are theoretical, but they aren’t. Consider financial transactions. It used to be common that you could just email your bank/broker to wire funds into an account for such things as buying a house. Hackers have subverted that, intercepting messages, changing account numbers, and stealing millions. Most banks/brokers require additional verification before doing such transfers.
Let me repeat: Ozzie has only solved the part we already know how to solve. He hasn’t addressed these issues that confound us.
We still can’t secure security, much less secure backdoors
We already know how to decrypt iPhones: just wait a year or two for somebody to discover a vulnerability. FBI claims it’s “going dark”, but that’s only for timely decryption of phones. If they are willing to wait a year or two a vulnerability will eventually be found that allows decryption.
That’s what’s happened with the “GrayKey” device that’s been all over the news lately. Apple is fixing it so that it won’t work on new phones, but it works on old phones.
Ozzie’s solution is based on the assumption that iPhones are already secure against things like GrayKey. Like his assumption “if Apple already has a vault for private keys, then we have such vaults for backdoor keys”, Ozzie is saying “if Apple already had secure hardware/software to secure the phone, then we can use the same stuff to secure the backdoors”. But we don’t really have secure vaults and we don’t really have secure hardware/software to secure the phone.
Again, to stress this point, Ozzie is solving the part we already know how to solve, but ignoring the stuff we don’t know how to solve. His solution is insecure for the same reason phones are already insecure.
Locked phones aren’t the problem Phones are general purpose computers. That means anybody can install an encryption app on the phone regardless of whatever other security the phone might provide. The police are powerless to stop this. Even if they make such encryption crime, then criminals will still use encryption.
That leads to a strange situation that the only data the FBI will be able to decrypt is that of people who believe they are innocent. Those who know they are guilty will install encryption apps like Signal that have no backdoors.
In the past this was rare, as people found learning new apps a barrier. These days, apps like Signal are so easy even drug dealers can figure out how to use them.
We know how to get Apple to give us a backdoor, just pass a law forcing them to. It may look like Ozzie’s scheme, it may be something more secure designed by Apple’s engineers. Sure, it will weaken security on the phone for everyone, but those who truly care will just install Signal. But again we are back to the problem that Ozzie’s solving the problem we know how to solve while ignoring the much larger problem, that of preventing people from installing their own encryption.
The FBI isn’t necessarily the problem Ozzie phrases his solution in terms of U.S. law enforcement. Well, what about Europe? What about Russia? What about China? What about North Korea?
Technology is borderless. A solution in the United States that allows “legitimate” law enforcement requests will inevitably be used by repressive states for what we believe would be “illegitimate” law enforcement requests.
Ozzie sees himself as the hero helping law enforcement protect 300 million American citizens. He doesn’t see himself what he really is, the villain helping oppress 1.4 billion Chinese, 144 million Russians, and another couple billion living in oppressive governments around the world.
Conclusion Ozzie pretends the problem is political, that he’s created a solution that appeases both sides. He hasn’t. He’s solved the problem we already know how to solve. He’s ignored all the problems we struggle with, the problems we claim make secure backdoors essentially impossible. I’ve listed some in this post, but there are many more. Any famous person can create a solution that convinces fawning editors at Wired Magazine, but if Ozzie wants to move forward he’s going to have to work harder to appease doubting cryptographers.
For some organizations, the idea of “going serverless” can be daunting. But with an understanding of best practices – and the right tools — many serverless applications can be fully functional with only a few lines of code and little else.
Examples of fully-serverless-application use cases include:
Web or mobile backends – Create fully-serverless, mobile applications or websites by creating user-facing content in a native mobile application or static web content in an S3 bucket. Then have your front-end content integrate with Amazon API Gateway as a backend service API. Lambda functions will then execute the business logic you’ve written for each of the API Gateway methods in your backend API.
Chatbots and virtual assistants – Build new serverless ways to interact with your customers, like customer support assistants and bots ready to engage customers on your company-run social media pages. The Amazon Alexa Skills Kit (ASK) and Amazon Lex have the ability to apply natural-language understanding to user-voice and freeform-text input so that a Lambda function you write can intelligently respond and engage with them.
Internet of Things (IoT) backends – AWS IoT has direct-integration for device messages to be routed to and processed by Lambda functions. That means you can implement serverless backends for highly secure, scalable IoT applications for uses like connected consumer appliances and intelligent manufacturing facilities.
Using AWS Lambda as the logic layer of a serverless application can enable faster development speed and greater experimentation – and innovation — than in a traditional, server-based environment.
Andrew Baird is a Sr. Solutions Architect for AWS. Prior to becoming a Solutions Architect, Andrew was a developer, including time as an SDE with Amazon.com. He has worked on large-scale distributed systems, public-facing APIs, and operations automation.
This article, pointed out by @TheGrugq, is stupid enough that it’s worth rebutting.
“The views and opinions expressed are those of the author and not necessarily the positions of the U.S. Army, Department of Defense, or the U.S. Government.” <- I sincerely hope so… “the cyber guns of August” https://t.co/xdybbr5B0E
The article starts with the question “Why did the lessons of Stuxnet, Wannacry, Heartbleed and Shamoon go unheeded?“. It then proceeds to ignore the lessons of those things.
Some of the actual lessons should be things like how Stuxnet crossed air gaps, how Wannacry spread through flat Windows networking, how Heartbleed comes from technical debt, and how Shamoon furthers state aims by causing damage.
But this article doesn’t cover the technical lessons. Instead, it thinks the lesson should be the moral lesson, that we should take these things more seriously. But that’s stupid. It’s the sort of lesson people teach you that know nothing about the topic. When you have nothing of value to contribute to a topic you can always take the moral high road and criticize everyone for being morally weak for not taking it more seriously. Obviously, since doctors haven’t cured cancer yet, it’s because they don’t take the problem seriously.
The article continues to ignore the lesson of these cyber attacks and instead regales us with a list of military lessons from WW I and WW II. This makes the same flaw that many in the military make, trying to understand cyber through analogies with the real world. It’s not that such lessons could have no value, it’s that this article contains a poor list of them. It seems to consist of a random list of events that appeal to the author rather than events that have bearing on cybersecurity.
Then, in case we don’t get the point, the article bullies us with hyperbole, cliches, buzzwords, bombastic language, famous quotes, and citations. It’s hard to see how most of them actually apply to the text. Rather, it seems like they are included simply because he really really likes them.
The article invests much effort in discussing the buzzword “OODA loop”. Most attacks in cyberspace don’t have one. Instead, attackers flail around, trying lots of random things, overcoming defense with brute-force rather than an understanding of what’s going on. That’s obviously the case with Wannacry: it was an accident, with the perpetrator experimenting with what would happen if they added the ETERNALBLUE exploit to their existing ransomware code. The consequence was beyond anybody’s ability to predict.
You might claim that this is just the first stage, that they’ll loop around, observe Wannacry’s effects, orient themselves, decide, then act upon what they learned. Nope. Wannacry burned the exploit. It’s essentially removed any vulnerable systems from the public Internet, thereby making it impossible to use what they learned. It’s still active a year later, with infected systems behind firewalls busily scanning the Internet so that if you put a new system online that’s vulnerable, it’ll be taken offline within a few hours, before any other evildoer can take advantage of it.
See what I’m doing here? Learning the actual lessons of things like Wannacry? The thing the above article fails to do??
The article has a humorous paragraph on “defense in depth”, misunderstanding the term. To be fair, it’s the cybersecurity industry’s fault: they adopted then redefined the term. That’s why there’s two separate articles on Wikipedia: one for the old military term (as used in this article) and one for the new cybersecurity term.
As used in the cybersecurity industry, “defense in depth” means having multiple layers of security. Many organizations put all their defensive efforts on the perimeter, and none inside a network. The idea of “defense in depth” is to put more defenses inside the network. For example, instead of just one firewall at the edge of the network, put firewalls inside the network to segment different subnetworks from each other, so that a ransomware infection in the customer support computers doesn’t spread to sales and marketing computers.
The article talks about exploiting WiFi chips to bypass the defense in depth measures like browser sandboxes. This is conflating different types of attacks. A WiFi attack is usually considered a local attack, from somebody next to you in bar, rather than a remote attack from a server in Russia. Moreover, far from disproving “defense in depth” such WiFi attacks highlight the need for it. Namely, phones need to be designed so that successful exploitation of other microprocessors (namely, the WiFi, Bluetooth, and cellular baseband chips) can’t directly compromise the host system. In other words, once exploited with “Broadpwn”, a hacker would need to extend the exploit chain with another vulnerability in the hosts Broadcom WiFi driver rather than immediately exploiting a DMA attack across PCIe. This suggests that if PCIe is used to interface to peripherals in the phone that an IOMMU be used, for “defense in depth”.
Cybersecurity is a young field. There are lots of useful things that outsider non-techies can teach us. Lessons from military history would be well-received.
But that’s not this story. Instead, this story is by an outsider telling us we don’t know what we are doing, that they do, and then proceeds to prove they don’t know what they are doing. Their argument is based on a moral suasion and bullying us with what appears on the surface to be intellectual rigor, but which is in fact devoid of anything smart.
My fear, here, is that I’m going to be in a meeting where somebody has read this pretentious garbage, explaining to me why “defense in depth” is wrong and how we need to OODA faster. I’d rather nip this in the bud, pointing out if you found anything interesting from that article, you are wrong.
AWS Glue is an increasingly popular way to develop serverless ETL (extract, transform, and load) applications for big data and data lake workloads. Organizations that transform their ETL applications to cloud-based, serverless ETL architectures need a seamless, end-to-end continuous integration and continuous delivery (CI/CD) pipeline: from source code, to build, to deployment, to product delivery. Having a good CI/CD pipeline can help your organization discover bugs before they reach production and deliver updates more frequently. It can also help developers write quality code and automate the ETL job release management process, mitigate risk, and more.
AWS Glue is a fully managed data catalog and ETL service. It simplifies and automates the difficult and time-consuming tasks of data discovery, conversion, and job scheduling. AWS Glue crawls your data sources and constructs a data catalog using pre-built classifiers for popular data formats and data types, including CSV, Apache Parquet, JSON, and more.
When you are developing ETL applications using AWS Glue, you might come across some of the following CI/CD challenges:
Iterative development with unit tests
Continuous integration and build
Pushing the ETL pipeline to a test environment
Pushing the ETL pipeline to a production environment
Testing ETL applications using real data (live test)
The following diagram shows the pipeline workflow:
This solution uses AWS CodePipeline, which lets you orchestrate and automate the test and deploy stages for ETL application source code. The solution consists of a pipeline that contains the following stages:
1.) Source Control: In this stage, the AWS Glue ETL job source code and the AWS CloudFormation template file for deploying the ETL jobs are both committed to version control. I chose to use AWS CodeCommit for version control.
2.) LiveTest: In this stage, all resources—including AWS Glue crawlers, jobs, S3 buckets, roles, and other resources that are required for the solution—are provisioned, deployed, live tested, and cleaned up.
The LiveTest stage includes the following actions:
Deploy: In this action, all the resources that are required for this solution (crawlers, jobs, buckets, roles, and so on) are provisioned and deployed using an AWS CloudFormation template.
AutomatedLiveTest: In this action, all the AWS Glue crawlers and jobs are executed and data exploration and validation tests are performed. These validation tests include, but are not limited to, record counts in both raw tables and transformed tables in the data lake and any other business validations. I used AWS CodeBuild for this action.
LiveTestApproval: This action is included for the cases in which a pipeline administrator approval is required to deploy/promote the ETL applications to the next stage. The pipeline pauses in this action until an administrator manually approves the release.
LiveTestCleanup: In this action, all the LiveTest stage resources, including test crawlers, jobs, roles, and so on, are deleted using the AWS CloudFormation template. This action helps minimize cost by ensuring that the test resources exist only for the duration of the AutomatedLiveTest and LiveTestApproval
3.) DeployToProduction: In this stage, all the resources are deployed using the AWS CloudFormation template to the production environment.
Try it out
This code pipeline takes approximately 20 minutes to complete the LiveTest test stage (up to the LiveTest approval stage, in which manual approval is required).
To get started with this solution, choose Launch Stack:
This creates the CI/CD pipeline with all of its stages, as described earlier. It performs an initial commit of the sample AWS Glue ETL job source code to trigger the first release change.
In the AWS CloudFormation console, choose Create. After the template finishes creating resources, you see the pipeline name on the stack Outputs tab.
After that, open the CodePipeline console and select the newly created pipeline. Initially, your pipeline’s CodeCommit stage shows that the source action failed.
Allow a few minutes for your new pipeline to detect the initial commit applied by the CloudFormation stack creation. As soon as the commit is detected, your pipeline starts. You will see the successful stage completion status as soon as the CodeCommit source stage runs.
In the CodeCommit console, choose Code in the navigation pane to view the solution files.
Next, you can watch how the pipeline goes through the LiveTest stage of the deploy and AutomatedLiveTest actions, until it finally reaches the LiveTestApproval action.
At this point, if you check the AWS CloudFormation console, you can see that a new template has been deployed as part of the LiveTest deploy action.
At this point, make sure that the AWS Glue crawlers and the AWS Glue job ran successfully. Also check whether the corresponding databases and external tables have been created in the AWS Glue Data Catalog. Then verify that the data is validated using Amazon Athena, as shown following.
Open the AWS Glue console, and choose Databases in the navigation pane. You will see the following databases in the Data Catalog:
Open the Amazon Athena console, and run the following queries. Verify that the record counts are matching.
SELECT count(*) FROM "nycitytaxi_gluedemocicdtest"."data";
SELECT count(*) FROM "nytaxiparquet_gluedemocicdtest"."datalake";
The following shows the raw data:
The following shows the transformed data:
The pipeline pauses the action until the release is approved. After validating the data, manually approve the revision on the LiveTestApproval action on the CodePipeline console.
Add comments as needed, and choose Approve.
The LiveTestApproval stage now appears as Approved on the console.
After the revision is approved, the pipeline proceeds to use the AWS CloudFormation template to destroy the resources that were deployed in the LiveTest deploy action. This helps reduce cost and ensures a clean test environment on every deployment.
Production deployment is the final stage. In this stage, all the resources—AWS Glue crawlers, AWS Glue jobs, Amazon S3 buckets, roles, and so on—are provisioned and deployed to the production environment using the AWS CloudFormation template.
After successfully running the whole pipeline, feel free to experiment with it by changing the source code stored on AWS CodeCommit. For example, if you modify the AWS Glue ETL job to generate an error, it should make the AutomatedLiveTest action fail. Or if you change the AWS CloudFormation template to make its creation fail, it should affect the LiveTest deploy action. The objective of the pipeline is to guarantee that all changes that are deployed to production are guaranteed to work as expected.
In this post, you learned how easy it is to implement CI/CD for serverless AWS Glue ETL solutions with AWS developer tools like AWS CodePipeline and AWS CodeBuild at scale. Implementing such solutions can help you accelerate ETL development and testing at your organization.
If you have questions or suggestions, please comment below.
Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.
Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improving the value of their solutions when using AWS.
We launched AWS Support a full decade ago, with Gold and Silver plans focused on Amazon EC2, Amazon S3, and Amazon SQS. Starting from that initial offering, backed by a small team in Seattle, AWS Support now encompasses thousands of people working from more than 60 locations.
A Quick Look Back Over the years, that offering has matured and evolved in order to meet the needs of an increasingly diverse base of AWS customers. We aim to support you at every step of your cloud adoption journey, from your initial experiments to the time you deploy mission-critical workloads and applications.
We have worked hard to make our support model helpful and proactive. We do our best to provide you with the tools, alerts, and knowledge that will help you to build systems that are secure, robust, and dependable. Here are some of our most recent efforts toward that goal:
Trusted Advisor S3 Bucket Policy Check – AWS Trusted Advisor provides you with five categories of checks and makes recommendations that are designed to improve security and performance. Earlier this year we announced that the S3 Bucket Permissions Check is now free, and available to all AWS users. If you are signed up for the Business or Professional level of AWS Support, you can also monitor this check (and many others) using Amazon CloudWatch Events. You can use this to monitor and secure your buckets without human intervention.
Personal Health Dashboard – This tool provides you with alerts and guidance when AWS is experiencing events that may affect you. You get a personalized view into the performance and availability of the AWS services that underlie your AWS resources. It also generates Amazon CloudWatch Events so that you can initiate automated failover and remediation if necessary.
Well Architected / Cloud Ops Review – We’ve learned a lot about how to architect AWS-powered systems over the years and we want to share everything we know with you! The AWS Well-Architected Framework provide proven, detailed guidance in critical areas including operational excellence, security, reliability, performance efficiency, and cost optimization. You can read the materials online and you can also sign up for the online training course. If you are signed up for Enterprise support, you can also benefit from our Cloud Ops review.
Infrastructure Event Management – If you are launching a new app, kicking off a big migration, or hosting a large-scale event similar to Prime Day, we are ready with guidance and real-time support. Our Infrastructure Event Management team will help you to assess the readiness of your environment and work with you to identify and mitigate risks ahead of time.
The Amazon retail site makes heavy use of AWS. You can read my post, Prime Day 2017 – Powered by AWS, to learn more about the process of preparing to sustain a record-setting amount of traffic and to accept a like number of orders.
Come and Join Us The AWS Support Team is in continuous hiring mode and we have openings all over the world! Here are a couple of highlights:
It’s been just over three weeks since we launched the new Raspberry Pi 3 Model B+. Although the product is branded Raspberry Pi 3B+ and not Raspberry Pi 4, a serious amount of engineering was involved in creating it. The wireless networking, USB/Ethernet hub, on-board power supplies, and BCM2837 chip were all upgraded: together these represent almost all the circuitry on the board! Today, I’d like to tell you about the work that has gone into creating a custom power supply chip for our newest computer.
The new Raspberry Pi 3B+, sporting a new, custom power supply chip (bottom left-hand corner)
The Raspberry Pi 3B+ has been well received, and we’ve enjoyed hearing feedback from the community as well as reading the various reviews and articles highlighting the solid improvements in wireless networking, Ethernet, CPU, and thermal performance of the new board. Gareth Halfacree’s post here has some particularly nice graphs showing the increased performance as well as how the Pi 3B+ keeps cool under load due to the new CPU package that incorporates a metal heat spreader. The Raspberry Pi production lines at the Sony UK Technology Centre are running at full speed, and it seems most people who want to get hold of the new board are able to find one in stock.
Powering your Pi
One of the most critical but often under-appreciated elements of any electronic product, particularly one such as Raspberry Pi with lots of complex on-board silicon (processor, networking, high-speed memory), is the power supply. In fact, the Raspberry Pi 3B+ has no fewer than six different voltage rails: two at 3.3V — one special ‘quiet’ one for audio, and one for everything else; 1.8V; 1.2V for the LPDDR2 memory; and 1.2V nominal for the CPU core. Note that the CPU voltage is actually raised and lowered on the fly as the speed of the CPU is increased and decreased depending on how hard the it is working. The sixth rail is 5V, which is the master supply that all the others are created from, and the output voltage for the four downstream USB ports; this is what the mains power adaptor is supplying through the micro USB power connector.
Power supply primer
There are two common classes of power supply circuits: linear regulators and switching regulators. Linear regulators work by creating a lower, regulated voltage from a higher one. In simple terms, they monitor the output voltage against an internally generated reference and continually change their own resistance to keep the output voltage constant. Switching regulators work in a different way: they ‘pump’ energy by first storing the energy coming from the source supply in a reactive component (usually an inductor, sometimes a capacitor) and then releasing it to the regulated output supply. The switches in switching regulators effect this energy transfer by first connecting the inductor (or capacitor) to store the source energy, and then switching the circuit so the energy is released to its destination.
Linear regulators produce smoother, less noisy output voltages, but they can only convert to a lower voltage, and have to dissipate energy to do so. The higher the output current and the voltage difference across them is, the more energy is lost as heat. On the other hand, switching supplies can, depending on their design, convert any voltage to any other voltage and can be much more efficient (efficiencies of 90% and above are not uncommon). However, they are more complex and generate noisier output voltages.
Designers use both types of regulators depending on the needs of the downstream circuit: for low-voltage drops, low current, or low noise, linear regulators are usually the right choice, while switching regulators are used for higher power or when efficiency of conversion is required. One of the simplest switching-mode power supply circuits is the buck converter, used to create a lower voltage from a higher one, and this is what we use on the Pi.
A history lesson
The BCM2835 processor chip (found on the original Raspberry Pi Model B and B+, as well as on the Zero products) has on-chip power supplies: one switch-mode regulator for the core voltage, as well as a linear one for the LPDDR2 memory supply. This meant that in addition to 5V, we only had to provide 3.3V and 1.8V on the board, which was relatively simple to do using cheap, off-the-shelf parts.
Pi Zero sporting a BCM2835 processor which only needs 2 external switchers (the components clustered behind the camera port)
When we moved to the BCM2836 for Raspberry Pi Model 2 (and subsequently to the BCM2837A1 and B0 for Raspberry Pi 3B and 3B+), the core supply and the on-chip LPDDR2 memory supply were not up to the job of supplying the extra processor cores and larger memory, so we removed them. (We also used the recovered chip area to help fit in the new quad-core ARM processors.) The upshot of this was that we had to supply these power rails externally for the Raspberry Pi 2 and models thereafter. Moreover, we also had to provide circuitry to sequence them correctly in order to control exactly when they power up compared to the other supplies on the board.
Power supply design is tricky (but critical)
Raspberry Pi boards take in 5V from the micro USB socket and have to generate the other required supplies from this. When 5V is first connected, each of these other supplies must ‘start up’, meaning go from ‘off’, or 0V, to their correct voltage in some short period of time. The order of the supplies starting up is often important: commonly, there are structures inside a chip that form diodes between supply rails, and bringing supplies up in the wrong order can sometimes ‘turn on’ these diodes, causing them to conduct, with undesirable consequences. Silicon chips come with a data sheet specifying what supplies (voltages and currents) are needed and whether they need to be low-noise, in what order they must power up (and in some cases down), and sometimes even the rate at which the voltages must power up and down.
A Pi3. Power supply components are clustered bottom left next to the micro USB, middle (above LPDDR2 chip which is on the bottom of the PCB) and above the A/V jack.
In designing the power chain for the Pi 2 and 3, the sequencing was fairly straightforward: power rails power up in order of voltage (5V, 3.3V, 1.8V, 1.2V). However, the supplies were all generated with individual, discrete devices. Therefore, I spent quite a lot of time designing circuitry to control the sequencing — even with some design tricks to reduce component count, quite a few sequencing components are required. More complex systems generally use a Power Management Integrated Circuit (PMIC) with multiple supplies on a single chip, and many different PMIC variants are made by various manufacturers. Since Raspberry Pi 2 days, I was looking for a suitable PMIC to simplify the Pi design, but invariably (and somewhat counter-intuitively) these were always too expensive compared to my discrete solution, usually because they came with more features than needed.
One device to rule them all
It was way back in May 2015 when I first chatted to Peter Coyle of Exar (Exar were bought by MaxLinear in 2017) about power supply products for Raspberry Pi. We didn’t find a product match then, but in June 2016 Peter, along with Tuomas Hollman and Trevor Latham, visited to pitch the possibility of building a custom power management solution for us.
I was initially sceptical that it could be made cheap enough. However, our discussion indicated that if we could tailor the solution to just what we needed, it could be cost-effective. Over the coming weeks and months, we honed a specification we agreed on from the initial sketches we’d made, and Exar thought they could build it for us at the target price.
The chip we designed would contain all the key supplies required for the Pi on one small device in a cheap QFN package, and it would also perform the required sequencing and voltage monitoring. Moreover, the chip would be flexible to allow adjustment of supply voltages from their default values via I2C; the largest supply would be capable of being adjusted quickly to perform the dynamic core voltage changes needed in order to reduce voltage to the processor when it is idling (to save power), and to boost voltage to the processor when running at maximum speed (1.4 GHz). The supplies on the chip would all be generously specified and could deliver significantly more power than those used on the Raspberry Pi 3. All in all, the chip would contain four switching-mode converters and one low-current linear regulator, this last one being low-noise for the audio circuitry.
The MXL7704 chip
The project was a great success: MaxLinear delivered working samples of first silicon at the end of May 2017 (almost exactly a year after we had kicked off the project), and followed through with production quantities in December 2017 in time for the Raspberry Pi 3B+ production ramp.
Front row: Roger with the very first Pi 3B+ prototypes and James with a MXL7704 development board hacked to power a Pi 3. Back row left to right: Will Torgerson, Trevor Latham, Peter Coyle, Tuomas Hollman.
The MXL7704 device has been key to reducing Pi board complexity and therefore overall bill of materials cost. Furthermore, by being able to deliver more power when needed, it has also been essential to increasing the speed of the (newly packaged) BCM2837B0 processor on the 3B+ to 1.4GHz. The result is improvements to both the continuous output current to the CPU (from 3A to 4A) and to the transient performance (i.e. the chip has helped to reduce the ‘transient response’, which is the change in supply voltage due to a sudden current spike that occurs when the processor suddenly demands a large current in a few nanoseconds, as modern CPUs tend to do).
With the MXL7704, the power supply circuitry on the 3B+ is now a lot simpler than the Pi 3B design. This new supply also provides the LPDDR2 memory voltage directly from a switching regulator rather than using linear regulators like the Pi 3, thereby improving energy efficiency. This helps to somewhat offset the extra power that the faster Ethernet, wireless networking, and processor consume. A pleasing side effect of using the new chip is the symmetric board layout of the regulators — it’s easy to see the four switching-mode supplies, given away by four similar-looking blobs (three grey and one brownish), which are the inductors.
The Pi 3B+ PMIC MXL7704 — pleasingly symmetric
It takes a lot of effort to design a new chip from scratch and get it all the way through to production — we are very grateful to the team at MaxLinear for their hard work, dedication, and enthusiasm. We’re also proud to have created something that will not only power Raspberry Pis, but will also be useful for other product designs: it turns out when you have a low-cost and flexible device, it can be used for many things — something we’re fairly familiar with here at Raspberry Pi! For the curious, the product page (including the data sheet) for the MXL7704 chip is here. Particular thanks go to Peter Coyle, Tuomas Hollman, and Trevor Latham, and also to Jon Cronk, who has been our contact in the US and has had to get up early to attend all our conference calls!
The MXL7704 design team celebrating on Pi Day — it takes a lot of people to design a chip!
I hope you liked reading about some of the effort that has gone into creating the new Pi. It’s nice to finally have a chance to tell people about some of the (increasingly complex) technical work that makes building a $35 computer possible — we’re very pleased with the Raspberry Pi 3B+, and we hope you enjoy using it as much as we’ve enjoyed creating it!
Amazon SageMaker continues to iterate quickly and release new features on behalf of customers. Starting today, SageMaker adds support for many new instance types, local testing with the SDK, and Apache MXNet 1.1.0 and Tensorflow 1.6.0. Let’s take a quick look at each of these updates.
New Instance Types
Amazon SageMaker customers now have additional options for right-sizing their workloads for notebooks, training, and hosting. Notebook instances now support almost all T2, M4, P2, and P3 instance types with the exception of t2.micro, t2.small, and m4.large instances. Model training now supports nearly all M4, M5, C4, C5, P2, and P3 instances with the exception of m4.large, c4.large, and c5.large instances. Finally, model hosting now supports nearly all T2, M4, M5, C4, C5, P2, and P3 instances with the exception of m4.large instances. Many customers can take advantage of the newest P3, C5, and M5 instances to get the best price/performance for their workloads. Customers also take advantage of the burstable compute model on T2 instances for endpoints or notebooks that are used less frequently.
Open Sourced Containers, Local Mode, and TensorFlow 1.6.0 and MXNet 1.1.0
Today Amazon SageMaker has open sourced the MXNet and Tensorflow deep learning containers that power the MXNet and Tensorflow estimators in the SageMaker SDK. The ability to write Python scripts that conform to simple interface is still one of my favorite SageMaker features and now those containers can be additionally customized to include any additional libraries. You can download these containers locally to iterate and experiment which can accelerate your debugging cycle. When you’re ready go from local testing to production training and hosting you just change one line of code.
These containers launch with support for Tensorflow 1.6.0 and MXNet 1.1.0 as well. Tensorflow has a number of new 1.6.0 features including support for CUDA 9.0, cuDNN 7, and AVX instructions which allows for significant speedups in many training applications. MXNet 1.1.0 adds a number of new features including a Text API mxnet.text with support for text processing, indexing, glossaries, and more. Two of the really cool pre-trained embeddings included are GloVe and fastText. <
Available Now All of the features mentioned above are available today. As always please let us know on Twitter or in the comments below if you have any questions or if you’re building something interesting. Now, if you’ll excuse me I’m going to go experiment with some of those new MXNet APIs!
As they sail aboard their floating game design studio Pino, Rekka Bellum and Devine Lu Linvega are starting to explore the use of Raspberry Pis. As part of an experimental development tool and a weather station, Pis are now aiding them on their nautical adventures!
Pino is on its way to becoming a smart sailboat! Raspberry Pi is the ideal device for sailors, we hope to make many more projects with it. Also the projects continue still, but we have windows now yay!
Using a haul of Pimoroni tech including the Enviro pHat, Scroll pHat HD and Mini Black HAT Hack3r, Rekka and Devine have been experimenting with using a Raspberry Pi Zero as an onboard barometer for their sailboat. On their Hundred Rabbits YouTube channel and website, the pair has documented their experimental setups. They have also built another Raspberry Pi rig for distraction-free work and development.
“The Pi computer is currently used only as an experimental development tool aboard Pino, but could readily be turned into a complete development platform, would our principal computers fail.” they explain, before going into the build process for the Raspberry Pi–powered barometer.
The use of solderless headers make this weather station an ideal build wherever space and tools are limited.
The barometer uses the sensor power of the Pimoroni Enviro HAT to measure atmospheric pressure, and a Raspberry Pi Zero displays this data on the Scroll pHAT HD. It thus advises the two travellers of oncoming storms. By taking advantage of the solderless header provided by the Sheffield-based pirates, the Hundred Rabbits team was able to put the device together with relative ease. They provide all information for the build here.
This is us, this what we do, and these are our intentions! We live, and work from our sailboat Pino. Traveling helps us stay creative, and we feed what we see back into our work. We make games, art, books and music under the studio name ‘Hundred Rabbits.’
With AWS Organizations, you can centrally manage policies across multiple AWS accounts without having to use custom scripts and manual processes. For example, you can apply service control policies (SCPs) across multiple AWS accounts that are members of an organization. SCPs allow you to define which AWS service APIs can and cannot be executed by AWS Identity and Access Management (IAM) entities (such as IAM users and roles) in your organization’s member AWS accounts. SCPs are created and applied from the master account, which is the AWS account that you used when you created your organization.
OUs give you a way to logically group and structure member AWS accounts in your organization. The screenshot shows the tree view of an example organizational structure in my organization with several OUs. Currently, I have selected OrgUnit01, and this is the current view I see in my main window. You can see here that within the OrgUnit01 OU, I have nested two additional OUs (OrgUnit01ChildA and OrgUnit01ChildB) and an AWS account is also contained within OrgUnit01, named “Developer Sandbox Account”.
The parts of the example organizational structure in the screenshot are:
Tree view — The hierarchy of your organization’s root and any OUs you have created
Tree view toggle — Enable and disable tree view
Organizational Units — Any child OUs of the selected root or OU in tree view
Accounts — Any AWS accounts (members or master) in the current OU
In the next section, I explain why at least one SCP must be attached to your root and OUs and introduce SCP evaluation.
How Service Control Policy evaluation logic works
To allow an AWS service API at the member account level, you must allow the API at every level between the member account and the root of your organization. This means you must attach an SCP at every level between your organization’s root and the member account that allows the given AWS service API (such as ec2:RunInstances). For more information, see About Service Control Policies.
Let’s say you want to allow the ec2:RunInstances API in the Developer Sandbox Account in the example structure in the preceding screenshot. To allow this AWS service API, you must allow the API in at least one SCP attached at each of these levels:
The organization’s root
The OU named OrgUnit01
If you don’t allow the AWS service in an SCP attached at each of these two levels, neither IAM entities nor the root user in the Developer Sandbox Account will be able to call ec2:RunInstances, even if an administrator has given them permission to do so (for IAM entities). In terms of policy evaluation, SCPs follow exactly the same policy evaluation logic as IAM does: by default, all requests are denied, an explicit allow overrides this default, and an explicit deny overrides any explicit allows.
What does this look like in practice? In the next section, I share a practical example to demonstrate how this works in Organizations.
An example structure with nested OUs and SCPs
In the previous section, I introduced design aspects of AWS Organizations that help prevent administrators from breaking structures in their Organizations. But because AWS Organizations is flexible enough to address multiple use cases, administrators can make changes that have unintended consequences, such as breaking organizational structures when moving an AWS account from one OU to another. In this section, I show an example with broken OU and SCP structures and explain how you can fix them.
I’ll take a blacklisting approach. That is, I’ll use the FullAWSAccess SCP, which doesn’t filter out any AWS service APIs. Then, I will filter out specific APIs by blacklisting them in subsequent SCPs attached to OUs at various points in my organization’s structure. For further reading on blacklisting and whitelisting with AWS Organizations, review AWS Organizations Terminology and Concepts.
Let’s say I have developed the OU and SCP structure shown in the diagram below. Before taking a close look at that diagram, I’ll briefly outline the goals I’m trying to achieve. Broadly speaking, there’s a small subset of APIs that I want to filter out using SCPs. This means that IAM entities in some AWS accounts in my organization will not have access to particular AWS service APIs, such as those related to Amazon EC2, while other accounts will not have access to APIs associated with Amazon CloudWatch, Amazon S3, and so on. Apart from these special cases, I do want the accounts in my organization to have access to all other APIs. More specifically, my goals are as follows:
Any AWS accounts in the Root should not have any API filtered out.
Any AWS accounts in OU 001 should have APIs for CloudWatch filtered out, but all other APIs will be accessible.
Any AWS accounts in OU 002 should have APIs for both CloudWatch and EC2 filtered out, but all other APIs will be accessible.
Any AWS accounts in OU 003 should have APIs for S3 filtered out, but all other APIs will be accessible.
To that end, let’s now look at my initial SCP and OU configuration in the image below that shows the example OU and SCP structure. The arrow shows the direction of inheritance: the root and the OUs below it (children) inherit SCPs from the OUs above them (parents). This example structure contains the following SCPs:
FullAWSAccess — Allows all AWS service APIs
Deny_CW — Denies all CloudWatch APIs
Deny_EC2 — Denies all Amazon EC2 APIs
Deny_S3 — Denies all Amazon S3 APIs
Now that I’ve outlined my intent, and shown you the OU / SCP structure that I’ve created to meet that set of goals, you can probably already see that the structure provided in the image above will not work correctly for my stated goals. In fact, AWS accounts in the Root container and OU 001 will have the intended access, as per my goals (1) and (2). I will not, however, meet my goals (3) and (4) with the above structure: entities in member accounts directly under OU 002 cannot perform any actions, even if they’re granted permissions by IAM access policies. This is because the FullAWSAccess SCP isn’t attached directly to this OU (it’s only inherited).
Why is this important? For an AWS service API to be available to IAM entities in a member account, the API must be specified in an SCP attached at every level all the way down the hierarchy to the relevant member account. Similarly, even though OU 003 does have the FullAWSAccess SCP attached directly to it, the fact that it’s not attached to the parent OU (OU 002) means that IAM entities in member accounts under OU 003 also aren’t able to access any service APIs. This doesn’t happen by default—I have deliberately taken this action to organize my structure in this way, to show both the flexibility and the kind of problems you can encounter when working with OUs and SCPs.
So I now need to fix the problems that I’ve inadvertently created. To start with, I’m going to make one change to OU 002 by attaching the FullAWSAccess SCP directly to that OU. After I do that, OU 002 has the attached and inherited policies that are shown in the following image.
With the FullAWSAccess policy attached to OU 002, member accounts in both OU 002 and OU 003 can access the other non-restricted AWS service APIs (keeping in mind that the FullAWSAccess policy was already applied to OU 003).
I have one final issue to address in this example: OU 003 has an SCP attached that blocks access to the Amazon S3 APIs. However, in this OU, the intent is to allow IAM entities in member accounts to access the EC2 APIs. EC2 API access is blocked because in the parent OU (OU 002), an SCP is attached that denies access to that API (the Deny_EC2 SCP), which means that any actions listed in Deny_EC2 have already been filtered out. An explicit deny always trumps an allow, so to meet goal (4) and have an OU in which EC2 APIs are allowed but access to CloudWatch APIs and S3 APIs is filtered out, I will move OU 003 up one level, placing it directly under OU 001. This change gives me a working OU and policy structure, as shown in the following image.
I recommend that at each level of your organization’s hierarchy, you directly apply the relevant SCPs. By doing this, you’re less likely to forget to apply an SCP to a particular OU, which can break your permission structure. By directly applying SCPs, you also make your policy structure easier to read.
If you have a group of accounts in your organization that are for testing purposes, I recommend that you experiment with OUs and SCPs. Applying SCPs to OUs and then moving an AWS account around within that structure can show you how SCPs affect IAM entities. For example, if you have an IAM user with the AdministratorAccess policy attached, you should see how SCPs can filter out certain AWS service APIs from specified member accounts.
I showed you how you can effectively apply SCPs to OUs in your organization and avoid some of the common issues that you might experience. I demonstrated an approach to designing a working organizational structure that I hope will help smooth your deployment of your organization and enables you to better centrally secure and manage your AWS accounts.
If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the Organizations forum.
The collective thoughts of the interwebz
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.