One of the most common enquiries I receive at Pi Towers is “How can I get my hands on a Raspberry Pi Oracle Weather Station?” Now the answer is: “Why not build your own version using our guide?”
Tadaaaa! The BYO weather station fully assembled.
Our Oracle Weather Station
In 2016 we sent out nearly 1000 Raspberry Pi Oracle Weather Station kits to schools from around the world who had applied to be part of our weather station programme. In the original kit was a special HAT that allows the Pi to collect weather data with a set of sensors.
The original Raspberry Pi Oracle Weather Station HAT
We only had a single batch of HATs made, and unfortunately we’ve given nearly* all the Weather Station kits away. Not only are the kits really popular, we also receive lots of questions about how to add extra sensors or how to take more precise measurements of a particular weather phenomenon. So today, to satisfy your demand for a hackable weather station, we’re launching our Build your own weather station guide!
Fun with meteorological experiments!
Our guide suggests the use of many of the sensors from the Oracle Weather Station kit, so you can build a station that’s as close as possible to the original. As you know, the Raspberry Pi is incredibly versatile, and we’ve made it easy to hack the design in case you want to use different sensors.
Many other tutorials for Pi-powered weather stations don’t explain how the various sensors work or how to store your data. Ours goes into more detail. It shows you how to put together a breadboard prototype, it describes how to write Python code to take readings in different ways, and it guides you through recording these readings in a database.
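As a rough illustration of that last step, here is a minimal sketch of recording readings in a database with Python. It is not taken from the guide: the `read_sensors` function returns simulated values instead of talking to real hardware, and the choice of SQLite (through Python’s built-in `sqlite3` module) and the table layout are assumptions made to keep the example self-contained.

```python
import random
import sqlite3
import time


def read_sensors():
    """Stand-in for real sensor drivers: returns simulated readings."""
    return {
        "temperature_c": round(random.uniform(-5.0, 30.0), 1),
        "humidity_pct": round(random.uniform(20.0, 100.0), 1),
        "pressure_hpa": round(random.uniform(980.0, 1040.0), 1),
    }


def open_database(path=":memory:"):
    """Create the readings table if it does not exist and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "taken_at REAL, temperature_c REAL, humidity_pct REAL, pressure_hpa REAL)"
    )
    return conn


def record_reading(conn, reading, taken_at=None):
    """Insert one reading, timestamped with the current time by default."""
    if taken_at is None:
        taken_at = time.time()
    conn.execute(
        "INSERT INTO readings VALUES (?, ?, ?, ?)",
        (
            taken_at,
            reading["temperature_c"],
            reading["humidity_pct"],
            reading["pressure_hpa"],
        ),
    )
    conn.commit()


# Take three simulated readings and store them in an in-memory database.
conn = open_database()
for _ in range(3):
    record_reading(conn, read_sensors())
```

On a real station you would pass a file path to `open_database` and replace `read_sensors` with code for whichever sensors you have wired up.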
There’s also a section on how to make your station weatherproof. And in case you want to move past the breadboard stage, we also help you with that. The guide shows you how to solder together all the components, similar to the original Oracle Weather Station HAT.
Who should try this build
We think this is a great project to tackle at home, at a STEM club, Scout group, or CoderDojo, and we’re sure that many of you will be chomping at the bit to get started. Before you do, please note that we’ve designed the build to be as straightforward as possible, but it’s still fairly advanced both in terms of electronics and programming. You should read through the whole guide before purchasing any components.
The sensors and components we’re suggesting balance cost, accuracy, and ease of use. Depending on what you want to use your station for, you may wish to use different components. Similarly, the final soldered design in the guide may not be the most elegant, but we think it is achievable for someone with modest soldering experience and basic equipment.
With our guide, you can build a functioning weather station without soldering, but the build will be more durable if you do solder it. If you’ve never tried soldering before, that’s OK: we have a Getting started with soldering resource plus video tutorial that will walk you through how it works step by step.
For those of you who are more experienced makers, there are plenty of different ways to put the final build together. We always like to hear about alternative builds, so please post your designs in the Weather Station forum.
Our plans for the guide
Our next step is publishing supplementary guides for adding extra functionality to your weather station. We’d love to hear which enhancements you would most like to see! Our current ideas under development include adding a webcam, making a tweeting weather station, adding a light/UV meter, and incorporating a lightning sensor. Let us know which of these is your favourite, or suggest your own amazing ideas in the comments!
*We do have a very small number of kits reserved for interesting projects or locations: a particularly cool experiment, a novel idea for how the Oracle Weather Station could be used, or places with specific weather phenomena. If you have such a project in mind, please send a brief outline to [email protected], and we’ll consider how we might be able to help you.
The German charity Save Nemo works to protect coral reefs, and they are developing Nemo-Pi, an underwater “weather station” that monitors ocean conditions. Right now, you can vote for Save Nemo in the Google.org Impact Challenge.
The organisation says there are two major threats to coral reefs: divers, and climate change. To make diving safer for reefs, Save Nemo installs buoy anchor points where diving tour boats can anchor without damaging corals in the process.
In addition, they provide dos and don’ts for how to behave on a reef dive.
To monitor the effects of climate change, and to help divers decide whether conditions are right at a reef while they’re still on shore, Save Nemo is also in the process of perfecting Nemo-Pi.
This Raspberry Pi-powered device is made up of a buoy, a solar panel, a GPS device, a Pi, and an array of sensors. Nemo-Pi measures water conditions such as current, visibility, temperature, carbon dioxide and nitrogen oxide concentrations, and pH. It also uploads its readings live to a public webserver.
The Save Nemo team is currently doing long-term tests of Nemo-Pi off the coast of Thailand and Indonesia. They are also working on improving the device’s power consumption and durability, and testing prototypes with the Raspberry Pi Zero W.
The web dashboard showing live Nemo-Pi data
Save Nemo aims to install a network of Nemo-Pis at shallow reefs (up to 60 metres deep) in South East Asia. Then diving tour companies can check the live data online and decide day-to-day whether tours are feasible. This will lower the impact of humans on reefs and help the local flora and fauna survive.
A healthy coral reef
Nemo-Pi data may also be useful for groups lobbying for reef conservation, and for scientists and activists who want to shine a spotlight on the awful effects of climate change on sea life, such as coral bleaching caused by rising water temperatures.
A bleached coral reef
Vote now for Save Nemo
If you want to help Save Nemo in their mission today, vote for them to win the Google.org Impact Challenge:
Click “Abstimmen” in the footer of the page to vote
Click “JA” in the footer to confirm
Voting is open until 6 June. You can also follow Save Nemo on Facebook or Twitter. We think this organisation is doing valuable work, and that their projects could be expanded to reefs across the globe. It’s fantastic to see the Raspberry Pi being used to help protect ocean life.
GDPR day, May 25, 2018, is nearly here. On that day, will your inbox explode with update notices, opt-in agreements, and offers from lawyers searching for GDPR violators? Perhaps all the companies on earth that are not GDPR ready will just dissolve into dust. More likely, there will be some changes, but business as usual will continue and we’ll all be more aware of data privacy. Let’s go with the last one.
What’s Different With GDPR at Backblaze
As a reminder, at Backblaze your data falls into two categories. The first type of data is the data you store with us: stored data. These are the files and objects you upload and store, and as needed, restore. We do not share this data. We do not process this data, except as requested by you to store and restore the data. We do not analyze this data looking for keywords, tags, images, etc. No one outside of Backblaze has access to this data unless you explicitly shared the data by providing that person access to one or more files.
The second type of data is your account data. Some of your account data is considered personal data. This is the information we collect from you to provide our Personal Backup, Business Backup and B2 Cloud Storage services. Examples include your email address to provide access to your account, or the name of your computer so we can organize your files like they are arranged on your computer to make restoration easier. We have written a number of Help Articles covering the different ways this information is collected and processed. In addition, these help articles outline the various “rights” granted via GDPR. We will continue to add help articles over the coming weeks to assist in making it easy to work with us to understand and exercise your rights.
What’s New With GDPR at Backblaze
The most obvious addition is the Data Processing Addendum (DPA). This covers how we protect the data you store with us, i.e. stored data. As noted above, we don’t do anything with your data, except store it and keep it safe until you need it. Now we have a separate document saying that.
Every company we have dealt with over the last few months is working hard to comply with GDPR. It has been a tough road whether you tried to do it yourself or, like Backblaze, hired an EU-based law firm for advice. Over the coming weeks and months as you reach out to discover and assert your rights, please have a little patience. We are all going through a steep learning curve as GDPR gets put into practice. Along the way there are certain to be some growing pains. Give us a chance; we all want to get it right.
Regardless, at Backblaze we’ve been diligently protecting our customers’ data for over 11 years and nothing that will happen on May 25th will change that.
Attention, case modders: take a look at the Brutus 2, an extremely snazzy computer case with a partly transparent, animated side panel that’s powered by a Pi. Daniel Otto and Carsten Lehman have a current crowdfunder for the case; their video is in German, but the looks of the build speak for themselves. There are some truly gorgeous effects here.
Pre-orders are open now at https://www.startnext.com/brutus2 More information about us at: https://3nb.de https://www.facebook.com/3nb.de https://www.instagram.com/3nb.de About 3nb: – a GbR (civil-law partnership) from Leipzig, founded in 2017 – we come from the fields of electronics and computer science – first product: the Brutus One, a gaming PC with a transparent display in its side panel. Brutus 2 in brief: – a branded computer case for the gaming and case modding scene – special feature: an animated side window driven by a Raspberry Pi – advantages of our case: o the case is available on its own, not only as a complete PC o it takes no performance away from the graphics card, thanks to the integrated Raspberry Pi o text and graphics are displayed more clearly thanks to the blurred background
What’s case modding?
Case modding just means modifying your computer or gaming console’s case, and it’s very popular in the gaming community. Some mods are functional, while others improve the way the case looks. Lots of dedicated gamers don’t only want a powerful computer, they also want it to look amazing — at home, or at LAN parties and games tournaments.
The Brutus 2 case
The Brutus 2 case is made by Daniel and Carsten’s startup, 3nb electronics, and it’s a product that is officially Powered by Raspberry Pi. Its standout feature is the semi-transparent TFT screen, which lets you play any video clip you choose while keeping your gaming hardware on display. It looks incredibly cool. All the graphics for the case’s screen are handled by a Raspberry Pi, so it doesn’t use any of your main PC’s GPU power and your gaming won’t suffer.
To use Brutus 2, you just need to run a small desktop application on your PC to choose what you want to display on the case. A number of neat animations are included, and you can upload your own if you want.
So far, the app only runs on Windows, but 3nb electronics are planning to make the code open-source, so you can modify it for other operating systems, or to display other file types. This is true to the spirit of the case modding and Raspberry Pi communities, who love adapting, retrofitting, and overhauling projects and code to fit their needs.
Daniel and Carsten say that one of their campaign’s stretch goals is to implement more functionality in the Brutus 2 app. So in the future, the case could also show things like CPU temperature, gaming stats, and in-game messages. Of course, there’s nothing stopping you from integrating features like that yourself.
If you have any questions about the case, you can post them directly to Daniel and Carsten here.
The crowdfunding campaign
The Brutus 2 campaign on Startnext is currently halfway to its first funding goal of €10000, with over three weeks to go until it closes. If you’re quick, you may still be able to snatch one of the early-bird offers. And if your whole guild NEEDS this, that’s OK: there are discounts for bulk orders.
At the moment I’m spending my evenings watching all of Star Trek in order. Yes, I have watched it before (but with some really big gaps). Yes, including the animated series (I’m up to The Terratin Incident). So I’m gratified to find this beautiful The Original Series–style tricorder build.
At this year’s Replica Prop Forum showcase, we meet up once again with Brian Mix, who brought his new Star Trek TOS Tricorder. This beautiful replica captures the weight and finish of the filming hand prop, and Brian has taken it one step further with some modern-day electronics!
A what now?
If you don’t know what a tricorder is, which I guess is faintly possible, the easiest way I can explain is to steal words that Liz wrote when Recantha made one back in 2013. It’s “a made-up thing used by the crew of the Enterprise to measure stuff, store data, and scout ahead remotely when exploring strange new worlds, seeking out new life and new civilisations, and all that jazz.”
A brief history of Picorders
We’ve seen other Raspberry Pi–based realisations of this iconic device. Recantha’s LEGO-cased tricorder delivered some authentic functionality, including temperature sensors, an ultrasonic distance sensor, a photosensor, and a magnetometer. Michael Hahn’s tricorder for element14’s Sci-Fi Your Pi competition in 2015 packed some similar functions, along with Original Series audio effects, into a neat (albeit non-canon) enclosure.
Brian Mix’s Original Series tricorder
Brian Mix’s tricorder, seen in the video above from Tested at this year’s Replica Prop Forum showcase, is based on a high-quality kit into which, he discovered, a Raspberry Pi just fits. He explains that the kit is the work of the late Steve Horch, a special effects professional who provided props for later Star Trek series, including the classic Deep Space Nine episode Trials and Tribble-ations.
Dax, equipped for time travel
This episode’s plot required sets and props — including tricorders — replicating the USS Enterprise of The Original Series, and Steve Horch provided many of these. Thus, a tricorder kit from him is about as close to authentic as you can possibly find unless you can get your hands on a screen-used prop. The Pi allows Brian to drive a real display and a speaker: “Being the geek that I am,” he explains, “I set it up to run every single Original Series Star Trek episode.”
Even more wonderful hypothetical tricorders that I would like someone to make
This tricorder is beautiful, and it makes me think how amazing it would be to squeeze in some of the sensor functionality of the devices depicted in the show. Space in the case is tight, but it looks like there might be a little bit of depth to spare — enough for an IMU, maybe, or a temperature sensor. I’m certain the future will bring more Pi tricorder builds, and I, for one, can’t wait. Please tell us in the comments if you’re planning something along these lines, and, well, I suppose some other sci-fi franchises have decent Pi project potential too, so we could probably stand to hear about those.
If you’re commenting, no spoilers please past The Animated Series S1 E11. Thanks.
This article, pointed out by @TheGrugq, is stupid enough that it’s worth rebutting.
“The views and opinions expressed are those of the author and not necessarily the positions of the U.S. Army, Department of Defense, or the U.S. Government.” <- I sincerely hope so… “the cyber guns of August” https://t.co/xdybbr5B0E
The article starts with the question “Why did the lessons of Stuxnet, Wannacry, Heartbleed and Shamoon go unheeded?“. It then proceeds to ignore the lessons of those things.
Some of the actual lessons should be things like how Stuxnet crossed air gaps, how Wannacry spread through flat Windows networking, how Heartbleed comes from technical debt, and how Shamoon furthers state aims by causing damage.
But this article doesn’t cover the technical lessons. Instead, it thinks the lesson should be the moral lesson, that we should take these things more seriously. But that’s stupid. It’s the sort of lesson taught by people who know nothing about the topic. When you have nothing of value to contribute to a topic you can always take the moral high road and criticize everyone for being morally weak for not taking it more seriously. Obviously, since doctors haven’t cured cancer yet, it’s because they don’t take the problem seriously.
The article continues to ignore the lesson of these cyber attacks and instead regales us with a list of military lessons from WW I and WW II. This makes the same mistake that many in the military make: trying to understand cyber through analogies with the real world. It’s not that such lessons could have no value, it’s that this article contains a poor list of them. It seems to consist of a random list of events that appeal to the author rather than events that have bearing on cybersecurity.
Then, in case we don’t get the point, the article bullies us with hyperbole, cliches, buzzwords, bombastic language, famous quotes, and citations. It’s hard to see how most of them actually apply to the text. Rather, it seems like they are included simply because he really really likes them.
The article invests much effort in discussing the buzzword “OODA loop”. Most attacks in cyberspace don’t have one. Instead, attackers flail around, trying lots of random things, overcoming defense with brute-force rather than an understanding of what’s going on. That’s obviously the case with Wannacry: it was an accident, with the perpetrator experimenting with what would happen if they added the ETERNALBLUE exploit to their existing ransomware code. The consequence was beyond anybody’s ability to predict.
You might claim that this is just the first stage, that they’ll loop around, observe Wannacry’s effects, orient themselves, decide, then act upon what they learned. Nope. Wannacry burned the exploit. It’s essentially removed any vulnerable systems from the public Internet, thereby making it impossible to use what they learned. It’s still active a year later, with infected systems behind firewalls busily scanning the Internet so that if you put a new system online that’s vulnerable, it’ll be taken offline within a few hours, before any other evildoer can take advantage of it.
See what I’m doing here? Learning the actual lessons of things like Wannacry? The thing the above article fails to do??
The article has a humorous paragraph on “defense in depth”, misunderstanding the term. To be fair, it’s the cybersecurity industry’s fault: they adopted then redefined the term. That’s why there’s two separate articles on Wikipedia: one for the old military term (as used in this article) and one for the new cybersecurity term.
As used in the cybersecurity industry, “defense in depth” means having multiple layers of security. Many organizations put all their defensive efforts on the perimeter, and none inside a network. The idea of “defense in depth” is to put more defenses inside the network. For example, instead of just one firewall at the edge of the network, put firewalls inside the network to segment different subnetworks from each other, so that a ransomware infection in the customer support computers doesn’t spread to sales and marketing computers.
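To make the segmentation example concrete, here is a minimal sketch of the kind of rule an internal firewall would enforce. Everything in it is made up for illustration: the subnet ranges, the segment names, and the `ALLOWED` list are assumptions, and a real deployment would express this in firewall rules rather than Python.

```python
import ipaddress

# Hypothetical internal subnets, one per department.
SEGMENTS = {
    "support": ipaddress.ip_network("10.0.1.0/24"),
    "sales": ipaddress.ip_network("10.0.2.0/24"),
    "servers": ipaddress.ip_network("10.0.10.0/24"),
}

# Cross-segment flows that are explicitly permitted (source, destination).
ALLOWED = {("support", "servers"), ("sales", "servers")}


def segment_of(address):
    """Return the name of the segment containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in SEGMENTS.items():
        if ip in network:
            return name
    return None


def is_allowed(src, dst):
    """Allow traffic within a segment, or across segments only if listed."""
    src_seg, dst_seg = segment_of(src), segment_of(dst)
    if src_seg is None or dst_seg is None:
        return False
    return src_seg == dst_seg or (src_seg, dst_seg) in ALLOWED


print(is_allowed("10.0.1.5", "10.0.10.9"))  # support talking to a server
print(is_allowed("10.0.1.5", "10.0.2.7"))   # support talking to sales
```

With these rules an infected customer support machine can still reach the servers it needs, but it has no direct path to the sales subnet.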
The article talks about exploiting WiFi chips to bypass the defense in depth measures like browser sandboxes. This is conflating different types of attacks. A WiFi attack is usually considered a local attack, from somebody next to you in a bar, rather than a remote attack from a server in Russia. Moreover, far from disproving “defense in depth” such WiFi attacks highlight the need for it. Namely, phones need to be designed so that successful exploitation of other microprocessors (namely, the WiFi, Bluetooth, and cellular baseband chips) can’t directly compromise the host system. In other words, once exploited with “Broadpwn”, a hacker would need to extend the exploit chain with another vulnerability in the host’s Broadcom WiFi driver rather than immediately exploiting a DMA attack across PCIe. This suggests that, if PCIe is used to interface to peripherals in the phone, an IOMMU should be used for “defense in depth”.
Cybersecurity is a young field. There are lots of useful things that outsider non-techies can teach us. Lessons from military history would be well-received.
But that’s not this story. Instead, this story is by an outsider who tells us we don’t know what we are doing and that they do, and who then proceeds to prove they don’t know what they are doing. Their argument is based on moral suasion and bullying us with what appears on the surface to be intellectual rigor, but which is in fact devoid of anything smart.
My fear, here, is that I’m going to be in a meeting where somebody has read this pretentious garbage, explaining to me why “defense in depth” is wrong and how we need to OODA faster. I’d rather nip this in the bud, pointing out that if you found anything interesting in that article, you are wrong.
By Mohit Goenka, Gnanavel Shanmugam, and Lance Welsh
At Yahoo Mail, we’re constantly striving to upgrade our product experience. We do this not only by adding new features based on our members’ feedback, but also by providing the best technical solutions to power the most engaging experiences. As such, we’ve recently introduced a number of novel and unique revisions to the way in which we use Redux that have resulted in significant stability and performance improvements. Developers may find our methods useful in achieving similar results in their apps.
Improvements to product metrics
when checking for new emails – 20%
when reading emails – 30%
when sending emails – 20%
10% improvement in page load performance
40% improvement in frame rendering time
We have also reduced API calls by approximately 20%.
How we use Redux in Yahoo Mail
Redux architecture is reliant on one large store that represents the application state. In a Redux cycle, action creators dispatch actions to change the state of the store. React Components then respond to those state changes. We’ve made some modifications on top of this architecture that are atypical in the React-Redux community.
For instance, when fetching data over the network, the traditional methodology is to use Thunk middleware. Yahoo Mail fetches data over the network from our API. Thunks would create an unnecessary and undesirable dependency between the action creators and our API. If and when the API changes, the action creators must then also change. To keep these concerns separate we dispatch the action payload from the action creator and store it in the Redux state for later processing by “action syncers”. Action syncers use the payload information from the store to make requests to the API and process responses. In other words, the action syncers form an API layer by interacting with the store. An additional benefit to keeping the concerns separate is that the API layer can change as the backend changes, thereby preventing such changes from bubbling back up into the action creators and components. This also allowed us to optimize the API calls by batching, deduping, and processing the requests only when the network is available. We applied similar strategies for handling other side effects like route handling and instrumentation. Overall, action syncers helped us to reduce our API calls by ~20% and bring down API errors by 20-30%.
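The pattern described above can be sketched in a few lines. This is a language-agnostic illustration written in Python, not Yahoo Mail’s code and not the Redux API: the tiny `Store` class, the action type names, and the `send_batch` and `is_online` callbacks are all hypothetical stand-ins.

```python
def reducer(state, action):
    """Keep API-bound payloads in the state until a syncer has sent them."""
    if action["type"] == "QUEUE_API_REQUEST":
        return {**state, "pending": state["pending"] + [action["payload"]]}
    if action["type"] == "API_REQUESTS_SENT":
        return {**state, "pending": []}
    return state


class Store:
    """A minimal Redux-like store: one state value, one reducer, listeners."""

    def __init__(self, reducer, state):
        self.reducer = reducer
        self.state = state
        self.listeners = []

    def dispatch(self, action):
        self.state = self.reducer(self.state, action)
        for listener in list(self.listeners):
            listener()

    def subscribe(self, listener):
        self.listeners.append(listener)


def make_action_syncer(store, send_batch, is_online):
    """Return a listener that sends pending payloads as one deduped batch."""

    def sync():
        pending = store.state["pending"]
        if not pending or not is_online():
            return  # nothing to do, or wait until the network is back
        unique = list(dict.fromkeys(pending))  # dedupe, preserving order
        send_batch(unique)
        store.dispatch({"type": "API_REQUESTS_SENT"})

    return sync


def mark_read(message_id):
    """Action creator: describes the change and knows nothing about the API."""
    return {"type": "QUEUE_API_REQUEST", "payload": ("mark_read", message_id)}


sent = []
network = {"online": False}
store = Store(reducer, {"pending": []})
store.subscribe(make_action_syncer(store, sent.append, lambda: network["online"]))

store.dispatch(mark_read(1))
store.dispatch(mark_read(1))  # duplicate request
store.dispatch(mark_read(2))
network["online"] = True
store.dispatch({"type": "NETWORK_STATUS_CHANGED"})
```

After the last dispatch, `sent` holds a single batch with two requests. Only the syncer would need to change if the backend API changed.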
Another change to the normal Redux architecture was made to avoid unnecessary props. The React-Redux community has learned to avoid passing unnecessary props from high-level components through multiple layers down to lower-level components (prop drilling) for rendering. We have introduced action enhancers middleware to avoid passing additional unnecessary props that are purely used when dispatching actions. Action enhancers add data to the action payload so that data does not have to come from the component when dispatching the action. This spares the component from having to receive that data through props and has improved frame rendering by ~40%. The use of action enhancers also avoids writing utility functions to add commonly-used data to each action from action creators.
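One way to picture an action enhancer is as a middleware that looks up extra data in the store and merges it into the payload before the action reaches the reducers. The sketch below is written in Python for illustration only; `make_action_enhancer`, the `enhancers` mapping, and the action and field names are hypothetical and are not Yahoo Mail’s implementation.

```python
def make_action_enhancer(enhancers):
    """Build a middleware that adds store data to matching actions.

    `enhancers` maps an action type to a function that takes the current
    state and returns a dict of extra payload fields.
    """

    def middleware(get_state, next_dispatch):
        def dispatch(action):
            enhance = enhancers.get(action["type"])
            if enhance is not None:
                extra = enhance(get_state())
                # Fields supplied by the component win over enhanced ones.
                payload = {**extra, **action.get("payload", {})}
                action = {**action, "payload": payload}
            return next_dispatch(action)

        return dispatch

    return middleware


state = {"account_id": "acct-1", "folder": "Inbox"}
seen = []  # stands in for the reducers further down the chain

enhancers = {
    "DELETE_MESSAGE": lambda s: {"account_id": s["account_id"], "folder": s["folder"]},
}
dispatch = make_action_enhancer(enhancers)(lambda: state, seen.append)

# The component only supplies what it already knows: the message id.
dispatch({"type": "DELETE_MESSAGE", "payload": {"message_id": 7}})
```

The component dispatching `DELETE_MESSAGE` never needs `account_id` or `folder` as props, because the middleware fills them in from the state.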
In our new architecture, the store reducers accept the dispatched action via action enhancers to update the state. The store then updates the UI, completing the action cycle. Action syncers then initiate the call to the backend APIs to synchronize local changes.
Our novel use of Redux in Yahoo Mail has led to significant user-facing benefits through a more performant application. It has also reduced development cycles for new features due to its simplified architecture. We’re excited to share our work with the community and would love to hear from anyone interested in learning more.
The Center for Democracy and Technology has a good summary of the current state of the DMCA’s chilling effects on security research.
To underline the nature of chilling effects on hacking and security research, CDT has worked to describe how tinkerers, hackers, and security researchers of all types both contribute to a baseline level of security in our digital environment and, in turn, are shaped themselves by this environment, most notably when things they do upset others and result in threats, potential lawsuits, and prosecution. We’ve published two reports (sponsored by the Hewlett Foundation and MacArthur Foundation) about needed reforms to the law and the myriad of ways that security research directly improves people’s lives. To get a more complete picture, we wanted to talk to security researchers themselves and gauge the forces that shape their work; essentially, we wanted to “take the pulse” of the security research community.
Today, we are releasing a third report in service of this effort: “Taking the Pulse of Hacking: A Risk Basis for Security Research.” We report findings after having interviewed a set of 20 security researchers and hackers — half academic and half non-academic — about what considerations they take into account when starting new projects or engaging in new work, as well as to what extent they or their colleagues have faced threats in the past that chilled their work. The results in our report show that a wide variety of constraints shape the work they do, from technical constraints to ethical boundaries to legal concerns, including the DMCA and especially the CFAA.
Note: I am a signatory on the letter supporting unrestricted security research.
The Rust team has announced the release of Rust 1.25.0. “The last few releases have been relatively minor, but Rust 1.25 contains a bunch of stuff! The first one is straightforward: we’ve upgraded to LLVM 6 from LLVM 4. This has a number of effects, a major one being a step closer to AVR support.” See the release notes for details.
Abstract: In recent years, hardware Trojans have drawn the attention of governments and industry as well as the scientific community. One of the main concerns is that integrated circuits, e.g., for military or critical-infrastructure applications, could be maliciously manipulated during the manufacturing process, which often takes place abroad. However, since there have been no reported hardware Trojans in practice yet, little is known about how such a Trojan would look like and how difficult it would be in practice to implement one. In this paper we propose an extremely stealthy approach for implementing hardware Trojans below the gate level, and we evaluate their impact on the security of the target device. Instead of adding additional circuitry to the target design, we insert our hardware Trojans by changing the dopant polarity of existing transistors. Since the modified circuit appears legitimate on all wiring layers (including all metal and polysilicon), our family of Trojans is resistant to most detection techniques, including fine-grain optical inspection and checking against “golden chips”. We demonstrate the effectiveness of our approach by inserting Trojans into two designs — a digital post-processing derived from Intel’s cryptographically secure RNG design used in the Ivy Bridge processors and a side-channel resistant SBox implementation — and by exploring their detectability and their effects on security.
The moral is that this kind of technique is very difficult to detect.
Version 4.0 of the Krita drawing tool has been released; see this article for a summary of the new features in this release. “Krita 4.0 will use SVG on vector layers by default, instead of the prior reliance on ODG. SVG is the most widely used open format for vector graphics out there. Used by ‘pure’ vector design applications, SVG on Krita currently supports gradients and transparencies, with more effects coming soon.”
I want to do pathfinding through a Doom map. The ultimate goal is to be able to automatically determine the path the player needs to take to reach the exit — what switches to hit in what order, what keys to get, etc.
Doom maps are 2D planes cut into arbitrary shapes. Everything outside a shape is ｔｈｅ ｖｏｉｄ, which we don’t care about. Here are some shapes.
The shapes are defined implicitly by their edges. All of the edges touching the red area, for example, say that they’re red on one side.
That’s very nice, because it means I don’t have to do any geometry to detect which areas touch each other. I can tell at a glance that the red and blue areas touch, because the line between them says it’s red on one side and blue on the other.
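As a small sketch of that idea (in Python for brevity, with a made-up line list rather than real map data), finding which areas touch is just a pass over the lines. The tuple layout and the `VOID` marker are assumptions for illustration.

```python
from collections import defaultdict

VOID = None  # hypothetical marker for "nothing on this side of the line"

# Each line: (start point, end point, area on one side, area on the other).
lines = [
    ((0, 0), (64, 0), "red", VOID),
    ((64, 0), (64, 64), "red", "blue"),
    ((64, 64), (0, 64), "red", "yellow"),
    ((64, 0), (128, 0), "blue", VOID),
]


def build_adjacency(lines):
    """Map each area to the set of areas it shares at least one line with."""
    neighbors = defaultdict(set)
    for _start, _end, front, back in lines:
        if front is not VOID and back is not VOID and front != back:
            neighbors[front].add(back)
            neighbors[back].add(front)
    return neighbors


neighbors = build_adjacency(lines)
print(sorted(neighbors["red"]))
```

No coordinates are consulted at all, which is exactly why this cannot notice a gap that is too narrow to walk through.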
Unfortunately, this doesn’t seem to be all that useful. The player can’t necessarily move from the red area to the blue area, because there’s a skinny bottleneck. If the yellow area were a raised platform, the player couldn’t fit through the gap. Worse, if there’s a switch somewhere that lowers that platform, then the gap is conditionally passable.
I thought this would be uncommon enough that I could get started only looking at neighbors and do actual geometry later, but that “conditionally passable” pattern shows up all the time in the form of locked “bars” that let you peek between or around them. So I might as well just do the dang geometry.
The player is a 32×32 square and always axis-aligned (i.e., the hitbox doesn’t actually rotate). That’s very convenient, because it means I can “dilate the world” — expand all the walls by 16 units in both directions, while shrinking the player to a single point. That expansion eliminates narrow gaps and leaves a map of everywhere the player’s center is allowed to be. Allegedly this is how Quake did collision detection — but in 3D! How hard can it be in 2D?
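For a single wall, the dilation is easy to compute: sweeping an axis-aligned square along a line segment gives the convex hull of the square’s corners placed at both endpoints. Here is a minimal sketch of that one step, in Python for brevity rather than Rust. The function names are mine, and it handles one wall at a time; combining the results for a whole map is a separate problem.

```python
HALF = 16  # half of the player's 32-unit width


def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def dilate_wall(start, end, half=HALF):
    """Outline of everywhere the player's center would overlap this wall."""
    corners = [
        (x + dx, y + dy)
        for (x, y) in (start, end)
        for dx in (-half, half)
        for dy in (-half, half)
    ]
    return convex_hull(corners)


print(dilate_wall((0, 0), (100, 0)))    # an axis-aligned wall: a rectangle
print(dilate_wall((0, 0), (100, 100)))  # a diagonal wall: a hexagon
```

Because the hitbox is a square rather than a circle, a diagonal wall dilates into a hexagon with flat corners instead of a rounded capsule.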
The plan, then, is to do this:
This creates a bit of an unholy mess. (I could avoid some of the overlap by being clever at points where exactly two lines touch, but I have to deal with a ton of overlap anyway so I’m not sure if that buys anything.)
The gray outlines are dilations of inner walls, where both sides touch a shape. The black outlines are dilations of outer walls, touching ｔｈｅ ｖｏｉｄ on one side. This map tells me that the player’s center can never go within 16 units of an outer wall, which checks out — their hitbox would get in the way! So I can delete all that stuff completely.
Consider that bottom-left outline, where red and yellow touch horizontally. If the player is in the red area, they can only enter that outlined part if they’re also allowed to be in the yellow area. Once they’re inside it, though, they can move around freely. I’ll color that piece orange, and similarly blend colors for the other outlines. (A small sliver at the top requires access to all three areas, so I colored it gray, because I can’t be bothered to figure out how to do a stripe pattern in Inkscape.)
This is the final map, and it’s easy to traverse because it works like a graph! Each contiguous region is a node, and each border is an edge. Some of the edges are one-way (falling off a ledge) or conditional (walking through a door), but the player can move freely within a region, so I don’t need to care about world geometry any more.
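As a rough illustration of why the graph form is convenient, here is a small Python sketch of traversal over such a map. The region names, the `"red key"` requirement, and the `reachable` helper are all hypothetical; a one-way edge is simply listed in one direction, and a conditional edge carries the thing the player must have.

```python
from collections import deque


def reachable(edges, start, items):
    """Return the set of regions reachable from `start`.

    edges: iterable of (src, dst, needs), where `needs` is None for an
           unconditional border or the name of a required item/switch.
    items: set of requirements the player already satisfies.
    """
    adjacency = {}
    for src, dst, needs in edges:
        adjacency.setdefault(src, []).append((dst, needs))

    seen = {start}
    queue = deque([start])
    while queue:
        here = queue.popleft()
        for dst, needs in adjacency.get(here, []):
            if dst in seen:
                continue
            if needs is not None and needs not in items:
                continue  # conditional edge the player cannot pass yet
            seen.add(dst)
            queue.append(dst)
    return seen


# Hypothetical four-region map.
EDGES = [
    ("start", "hall", None), ("hall", "start", None),            # open border
    ("hall", "pit", None),                                       # ledge: one-way
    ("hall", "vault", "red key"), ("vault", "hall", "red key"),  # locked door
]

print(sorted(reachable(EDGES, "start", set())))
print(sorted(reachable(EDGES, "start", {"red key"})))
```

This is a plain breadth-first search, so it does not model picking up keys or flipping switches along the way; that would need the search state to include what the player is carrying.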
I’m having a hell of a time doing this mass-intersection of a big pile of shapes.
I’m writing this in Rust, and I would very very very strongly prefer not to wrap a C library (or, god forbid, a C++ library), because that will considerably complicate actually releasing this dang software. Unfortunately, that also limits my options rather a lot.
I was referred to a paper (A simple algorithm for Boolean operations on polygons, Martínez et al, 2013) that describes doing a Boolean operation (union, intersection, difference, xor) on two shapes, and works even with self-intersections and holes and whatnot.
I spent an inordinate amount of time porting its reference implementation from very bad C++ to moderately bad Rust, and I extended it to work with an arbitrary number of polygons and to spit out all resulting shapes. It has been a very bumpy ride, and I keep hitting walls — the latest is that it panics when intersecting everything results in two distinct but exactly coincident edges, which obviously happens a lot with this approach.
So the question is: is there some better way to do this that I’m overlooking, or should I just keep fiddling with this algorithm and hope I come out the other side with something that works?
Bear in mind, the input shapes are not necessarily convex, and quite frequently aren’t. Also, they can have holes, and quite frequently do. That rules out most common algorithms. It’s probably possible to triangulate everything, but I’m a little wary of cutting the map into even more microscopic shards; feel free to convince me otherwise.
Also, the map format technically allows absolutely any arbitrary combination of lines, so all of these are possible:
It would be nice to handle these gracefully somehow, or at least not crash on them. But they’re usually total nonsense as far as the game is concerned. But also that middle one does show up in the original stock maps a couple times.
Another common trick is that lines might be part of the same shape on both sides:
The left example suggests that such a line is redundant and can simply be ignored without changing anything. The right example shows why this is a problem.
A common trick in vanilla Doom is the so-called self-referencing sector. Here, the edges of the inner yellow square all claim to be yellow on both sides. The outer edges all claim to be blue only on the inside, as normal. The yellow square therefore doesn’t neighbor the blue square at all, because no edges are yellow on one side and blue on the other. The effect in-game is that the yellow area is invisible, but still solid, so it can be used as an invisible bridge or invisible pit for various effects.
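To make the neighbor problem concrete, here is a hedged Python sketch of the "only look at neighbors" approach. It assumes each line is reduced to a `(front, back)` pair of sector names, with `None` for a missing side; that representation and the `sector_neighbors` helper are simplifications invented for the example, not the real map format.

```python
def sector_neighbors(lines):
    """Build a sector adjacency map from (front, back) line sides.

    A line only links two sectors when it has two sides that belong to
    different sectors. One-sided lines and self-referencing lines
    (same sector on both sides) contribute no adjacency.
    """
    neighbors = {}
    for front, back in lines:
        for sector in (front, back):
            if sector is not None:
                neighbors.setdefault(sector, set())
        if front is None or back is None or front == back:
            continue
        neighbors[front].add(back)
        neighbors[back].add(front)
    return neighbors


# Outer square: one-sided lines facing blue. Inner square: yellow on both sides.
self_referencing = [("blue", None)] * 4 + [("yellow", "yellow")] * 4
# The same layout built normally: inner lines are yellow inside, blue outside.
ordinary = [("blue", None)] * 4 + [("yellow", "blue")] * 4

print(sector_neighbors(self_referencing))
print(sector_neighbors(ordinary))
```

In the self-referencing case both sectors end up with no neighbors at all, even though the yellow square sits physically inside the blue one, which is why adjacency alone cannot describe this map.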
This does raise the question of exactly how Doom itself handles all these edge cases. Vanilla maps are preprocessed by a node builder and split into subsectors, which are all convex polygons. So for any given weird trick or broken geometry, the answer to “how does this behave” is: however the node builder deals with it.
Subsectors are built right into vanilla maps, so I could use those. The drawback is that they’re optional for maps targeting ZDoom (and maybe other ports as well?), because ZDoom has its own internal node builder. Also, relying on built nodes in general would make this code less useful for map editing, or generating, or whatever.
ZDoom’s node builder is open source, so I could bake it in? Or port it to Rust? (It’s only, ah, ten times bigger than the shape algorithm I ported.) It’d be interesting to have a fairly-correct reflection of how the game sees broken geometry, which is something no map editor really tries to do. Is it fast enough? Running it on the largest map I know to exist (MAP14 of Sunder) takes 1.4 seconds, which seems like a long time, but also that’s from scratch, and maybe it could be adapted to work incrementally…? Christ.
I’m not sure I have the time to dedicate to flesh this out beyond a proof of concept anyway, so maybe this is all moot. But all the more reason to avoid spending a lot of time on dead ends.
Data that describe processes in a spatial context are everywhere in our day-to-day lives and they dominate big data problems. Map data, for instance, whether describing networks of roads or remote sensing data from satellites, get us where we need to go. Atmospheric data from simulations and sensors underlie our weather forecasts and climate models. Devices and sensors with GPS can provide a spatial context to nearly all mobile data.
In this post, we introduce the WIND toolkit, a huge (500 TB), open weather model dataset that’s available to the world on Amazon’s cloud services. We walk through how to access this data and some of the open-source software developed to make it easily accessible. Our solution considers a subset of geospatial data that exist on a grid (raster) and explores ways to provide access to large-scale raster data from weather models. The solution uses foundational AWS services and the Hierarchical Data Format (HDF), a well adopted format for scientific data.
The approach developed here can be extended to any data that fit in an HDF5 file, which can describe sparse and dense vectors and matrices of arbitrary dimensions. This format is already popular within the physical sciences for both experimental and simulation data. We discuss solutions to gridded data storage for a massive dataset of public weather model outputs called the Wind Integration National Dataset (WIND) toolkit. We also highlight strategies that are general to other large geospatial data management problems.
Wind Integration National Dataset
As variable renewable power penetration levels increase in power systems worldwide, the importance of renewable integration studies to ensure continued economic and reliable operation of the power grid is also increasing. The WIND toolkit is the largest freely available grid integration dataset to date.
The WIND toolkit was developed by 3TIER by Vaisala. They were under a subcontract to the National Renewable Energy Laboratory (NREL) to support studies on integration of wind energy into the existing US grid. NREL is a part of a network of national laboratories for the US Department of Energy and has a mission to advance the science and engineering of energy efficiency, sustainable transportation, and renewable power technologies.
The toolkit has been used by consultants, research groups, and universities worldwide to support grid integration studies. Less traditional uses also include resource assessments for wind plants (such as those powering Amazon data centers), and studying the effects of weather on California condor migrations in the Baja peninsula.
The diversity of applications highlights the value of accessible, open public data. Yet, there’s a catch: the dataset is huge. The WIND toolkit provides simulated atmospheric (weather) data at a two-km spatial resolution and five-minute temporal resolution at multiple heights for seven years. The entire dataset is half a petabyte (500 TB) in size and is stored in the NREL High Performance Computing data center in Golden, Colorado. Making this dataset publicly available easily and in a cost-effective manner is a major challenge.
As other laboratories and public institutions work to release their data to the world, they may face similar challenges to those that we experienced. Some prior, well-intentioned efforts to release huge datasets as-is have resulted in data resources that are technically available but fundamentally unusable. They may be stored in an unintuitive format or indexed and organized to support only a subset of potential uses. Downloading hundreds of terabytes of data is often impractical. Most users don’t have access to a big data cluster (or super computer) to slice and dice the data as they need after it’s downloaded.
We aim to provide a large amount of data (50 terabytes) to the public in a way that is efficient, scalable, and easy to use. In many cases, researchers can access these huge cloud-located datasets using the same software and algorithms they have developed for smaller datasets stored locally. Only the pieces of data they need for their individual analysis must be downloaded. To make this work in practice, we worked with the HDF Group and have built upon their forthcoming Highly Scalable Data Service (HSDS).
In the rest of this post, we discuss how the HSDS software was developed to use Amazon EC2 and Amazon S3 resources to provide convenient and scalable access to these huge geospatial datasets. We describe how the HSDS service has been put to work for the WIND Toolkit dataset and demonstrate how to access it using the h5pyd Python library and the REST API. We conclude with information about our ongoing work to release more ‘open’ datasets to the public using AWS services, and ways to improve and extend the HSDS with newer Amazon services like Amazon ECS and AWS Lambda.
Developing a scalable service for big geospatial data
The HDF5 file format and API have been used for many years and are an effective means of storing large scientific datasets. For example, NASA’s Earth Observing System (EOS) satellites collect more than 16 TB of data per day using HDF5.
With the rise of the cloud, there are new challenges and opportunities to rethink how HDF5 can be enhanced to work effectively as a component in a cloud-native architecture. For the HDF Group, working with NREL has been a great opportunity to put ideas into practice with a production-size dataset.
An HDF5 file consists of a directed graph of group and dataset objects. Datasets can be thought of as a multidimensional array with support for user-defined metadata tags and compression. Typical operations on datasets would be reading or writing data to a regular subregion (a hyperslab) or reading and writing individual elements (a point selection). Also, group and dataset objects may each contain an arbitrary number of the user-defined metadata elements known as attributes.
Many people have used the HDF library in applications developed or ported to run on EC2 instances, but there are a number of constraints that often prove problematic:
The HDF5 library can’t read directly from HDF5 files stored as S3 objects. The entire file (often many GB in size) would need to be copied to local storage before the first byte can be read. Also, the instance must be configured with an appropriately sized EBS volume.
The HDF library only has access to the computational resources of the instance itself (as opposed to a cluster of instances), so many operations are bottlenecked by the library.
Any modifications to the HDF5 file would somehow have to be synchronized with changes that other instances have made to the same file before writing back to S3.
Using a pattern common to many offerings from AWS, the solution to these constraints is to develop a service framework around the HDF data model. Using this model, the HDF Group has created the Highly Scalable Data Service (HSDS) that provides all the functionality that traditionally was provided by the HDF5 library. By using the service, you don’t need to manage your own file volumes, but can just read and write whatever data that you need.
Because the service manages the actual data persistence to a durable medium (S3, in this case), you don’t need to worry about disk management. Simply stream the data you need from the service as you need it. Secondly, putting the functionality behind a service allows some tricks to increase performance (described in more detail later). And lastly, HSDS allows any number of clients to access the data at the same time, enabling HDF5 to be used as a coordination mechanism for multiple readers and writers.
In designing the HSDS architecture, we gave much thought to how to achieve scalability of the HSDS service. For accessing HDF5 data, there are two different types of scaling to consider:
Multiple clients making many requests to the service
Single requests that require a significant amount of data processing
To deal with the first scaling challenge, as with most services, we considered how the service responds as the request rate increases. AWS provides some great tools that help in this regard:
Auto Scaling groups
Elastic Load Balancing load balancers
The ability of S3 to handle large aggregate throughput rates
By using a cluster of EC2 instances behind a load balancer, you can handle different client loads in a cost-effective manner.
The second scaling challenge concerns single requests that would take significant processing time with just one compute node. One example of this from the WIND toolkit would be extracting all the values in the seven-year time span for a given geographic point and dataset.
In HDF5, large datasets are typically stored as “chunks”; that is, a regular partition of the array. In HSDS, each chunk is stored as a binary object in S3. The sequential approach to retrieving the time series values would be for the service to read each chunk needed from S3, extract the needed elements, and go on to the next chunk. In this case, that would involve processing 2557 chunks, and would be quite slow.
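The relationship between a selection and the chunks it touches can be sketched in a few lines of Python. This is an illustration only: the dataset and chunk shapes below are hypothetical and are not the WIND toolkit's actual layout, and `chunks_for_selection` is a made-up helper rather than part of HSDS or the HDF5 library.

```python
from itertools import product


def chunks_for_selection(selection, chunk_shape):
    """List the chunk indices that a rectangular selection touches.

    selection:   one half-open (start, stop) range per dimension.
    chunk_shape: chunk extent per dimension.
    """
    per_dim = []
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size        # chunk holding the first element
        last = (stop - 1) // size    # chunk holding the last element
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


# Hypothetical dataset of shape (240, 100, 100) in chunks of (24, 50, 50).
CHUNK_SHAPE = (24, 50, 50)

# Full time series at a single grid point: one chunk per block of time.
time_series_chunks = chunks_for_selection(((0, 240), (10, 11), (20, 21)), CHUNK_SHAPE)

# One time step over the whole grid: only the chunks in that time block.
snapshot_chunks = chunks_for_selection(((5, 6), (0, 100), (0, 100)), CHUNK_SHAPE)

print(len(time_series_chunks), len(snapshot_chunks))
```

With this layout, a time series at one point touches every chunk along the time axis even though it needs only a sliver of each, which is the access pattern that makes the sequential approach slow.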
Fortunately, with HSDS, you can speed this up quite a bit by exploiting the compute and I/O capabilities of the cluster. Upon receiving the request, the receiving node can use other nodes in the cluster to read different portions of the selection. With multiple nodes reading from S3 in parallel, performance improves as the cluster size increases.
The diagram below illustrates how this works in a simplified case of four chunks and four nodes.
This architecture has worked well in practice. In testing with the WIND toolkit and time series extraction, we observed a request latency of ~60 seconds using four nodes vs. ~5 seconds with 40 nodes. Performance roughly scales with the size of the cluster.
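The fan-out idea can be mimicked on a single machine. The following Python sketch is a toy under stated assumptions, not the HSDS implementation: a dictionary stands in for S3, threads stand in for cluster nodes, and `FAKE_STORE`, `read_element`, and the two `extract_*` functions are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for S3: chunk id -> chunk contents (four chunks of four values).
FAKE_STORE = {i: [i * 10 + k for k in range(4)] for i in range(4)}


def read_element(chunk_id, offset):
    # A real service would fetch the chunk object from S3 here.
    chunk = FAKE_STORE[chunk_id]
    return chunk[offset]


def extract_sequential(chunk_ids, offset):
    # Read one chunk after another.
    return [read_element(c, offset) for c in chunk_ids]


def extract_parallel(chunk_ids, offset, workers=4):
    # Hand each chunk to a worker; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: read_element(c, offset), chunk_ids))


print(extract_parallel(range(4), 2))  # [2, 12, 22, 32]
```

Both functions return the same values; the parallel version only pays off when each read is dominated by I/O latency, as a read from S3 would be.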
A planned enhancement to this is to use AWS Lambda for the worker processing. This enables 1000-way parallel reads at a reasonable cost, as you only pay for the milliseconds of CPU time used with AWS Lambda.
Public access to atmospheric data using HSDS and AWS
An early challenge in releasing the WIND toolkit data was in deciding how to subset the data for different use cases. In general, few researchers need access to the entire 0.5 PB of data and a great deal of efficiency and cost reduction can be gained by making directed constituent datasets.
NREL grid integration researchers initially extracted a 2-TB subset by selecting 120,000 points where the wind resource seemed appropriate for development, the most interesting locations for those performing grid studies. They also chose only those data important for wind applications (100-m wind speed, converted to power). To support the remaining users who needed more data resolution, we down-sampled the data to a 60-minute temporal resolution, keeping all the other variables and spatial resolution intact. This reduced dataset is 50 TB and describes 30+ atmospheric variables for 7 years at a 60-minute temporal resolution.
Programmatic access is possible using the h5pyd Python library, a distributed analog to the widely used h5py library. Users interact with the datasets (variables) and slice the data from its (time x longitude x latitude) cube form as they see fit.
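A minimal sketch of what that slicing can look like is shown here. It assumes the h5pyd package is installed and configured for an HSDS endpoint; the domain path and dataset name are placeholders that you must supply, since this post does not state them. The helper names (`open_domain`, `time_series`, `snapshot`) are hypothetical, and the index order follows the time-first cube described above.

```python
def open_domain(domain_path, endpoint=None):
    """Open an HSDS domain read-only.

    Needs the h5pyd package and a reachable HSDS endpoint. The import is
    done lazily so the slicing helpers below work without h5pyd installed.
    """
    import h5pyd
    if endpoint is None:
        return h5pyd.File(domain_path, "r")
    return h5pyd.File(domain_path, "r", endpoint=endpoint)


def time_series(dset, i, j):
    # Every time step at one grid cell; only this slice is transferred.
    return dset[:, i, j]


def snapshot(dset, t):
    # The full grid at a single time step.
    return dset[t, :, :]


# Usage (placeholders, not real names):
#   f = open_domain("<domain path>")
#   dset = f["<dataset name>"]
#   series = time_series(dset, 100, 200)
```

Because h5pyd datasets accept the same slice syntax as h5py datasets, the two helpers work unchanged on a local h5py dataset or a NumPy array.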
Examples and use cases are described in a set of Jupyter notebooks and available on GitHub:
Now you have a Jupyter notebook server running on your EC2 server.
From your laptop, create an SSH tunnel:
$ ssh -L 8888:localhost:8888 (IP address of the EC2 server)
Now, you can browse to localhost:8888 using the correct token, and interact with the notebooks as if they were local. Within the directory, there are examples for accessing the HSDS API and plotting wind and weather data using matplotlib.
Controlling access and defraying costs
A final concern is rate limiting and access control. Although the HSDS service is scalable and relatively robust, we had a few practical concerns:
How can we protect from malicious or accidental use that may lead to high egress fees (for example, someone who attempts to repeatedly download the entire dataset from S3)?
How can we keep track of who is using the data both to document the value of the data resource and to justify the costs?
If costs become too high, can we charge for some or all API use to help cover the costs?
To approach these problems, we investigated using Amazon API Gateway and its simplified integration with the AWS Marketplace for SaaS monetization as well as third-party API proxies.
In the end, we chose to use API Umbrella due to its close involvement with http://data.gov. While AWS Marketplace is a compelling option for future datasets, the decision was made to keep this dataset entirely open, at least for now. As community use and associated costs grow, we’ll likely revisit Marketplace. Meanwhile, API Umbrella provides controls for rate limiting and API key registration out of the box and was simple to implement as a front-end proxy to HSDS. Those applications that may want to charge for API use can accomplish a similar strategy using Amazon API Gateway and AWS Marketplace.
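For readers unfamiliar with per-key rate limiting, the concept can be sketched as a fixed-window counter. This is an illustration of the general technique only, not how API Umbrella or Amazon API Gateway implement it; the `FixedWindowLimiter` class and its parameters are hypothetical.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per API key in each time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self._counts = {}       # api_key -> (window_index, request_count)

    def allow(self, api_key):
        window_index = int(self.clock() // self.window)
        last_index, count = self._counts.get(api_key, (window_index, 0))
        if last_index != window_index:
            count = 0           # a new window has started for this key
        if count >= self.limit:
            self._counts[api_key] = (window_index, count)
            return False        # over the limit: reject the request
        self._counts[api_key] = (window_index, count + 1)
        return True
```

A proxy in front of the data service would call `allow()` with the caller's registered key before forwarding each request, which also yields the per-key usage counts needed to document who is using the data.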
Ongoing work and other resources
As NREL and other government research labs, municipalities, and organizations try to share data with the public, we expect many of you will face similar challenges to those we have tried to approach with the architecture described in this post. Providing large datasets is one challenge. Doing so in a way that is affordable and convenient for users is an entirely more difficult goal. Using AWS cloud-native services and the existing foundation of the HDF file format has allowed us to tackle that challenge in a meaningful way.
Dr. Caleb Phillips is a senior scientist with the Data Analysis and Visualization Group within the Computational Sciences Center at the National Renewable Energy Laboratory. Caleb comes from a background in computer science systems, applied statistics, computational modeling, and optimization. His work at NREL spans the breadth of renewable energy technologies and focuses on applying modern data science techniques to data problems at scale.
Dr. Caroline Draxl is a senior scientist at NREL. She supports the research and modeling activities of the US Department of Energy from mesoscale to wind plant scale. Caroline uses mesoscale models to research wind resources in various countries, and participates in on- and offshore boundary layer research and in the coupling of the mesoscale flow features (kilometer scale) to the microscale (tens of meters). She holds a M.S. degree in Meteorology and Geophysics from the University of Innsbruck, Austria, and a PhD in Meteorology from the Technical University of Denmark.
John Readey has been a Senior Architect at The HDF Group since he joined in June 2014. His interests include web services related to HDF, applications that support the use of HDF and data visualization. Before joining The HDF Group, John worked at Amazon.com from 2006–2014 where he developed service-based systems for eCommerce and AWS.
Jordan Perr-Sauer is an RPP intern with the Data Analysis and Visualization Group within the Computational Sciences Center at the National Renewable Energy Laboratory. Jordan hopes to use his professional background in software engineering and his academic training in applied mathematics to solve the challenging problems facing America and the world.
Artificial intelligence technologies have the potential to upend the longstanding advantage that attack has over defense on the Internet. This has to do with the relative strengths and weaknesses of people and computers, how those all interplay in Internet security, and where AI technologies might change things.
You can divide Internet security tasks into two sets: what humans do well and what computers do well. Traditionally, computers excel at speed, scale, and scope. They can launch attacks in milliseconds and infect millions of computers. They can scan computer code to look for particular kinds of vulnerabilities, and data packets to identify particular kinds of attacks.
Humans, conversely, excel at thinking and reasoning. They can look at the data and distinguish a real attack from a false alarm, understand the attack as it’s happening, and respond to it. They can find new sorts of vulnerabilities in systems. Humans are creative and adaptive, and can understand context.
Computers — so far, at least — are bad at what humans do well. They’re not creative or adaptive. They don’t understand context. They can behave irrationally because of those things.
Humans are slow, and get bored at repetitive tasks. They’re terrible at big data analysis. They use cognitive shortcuts, and can only keep a few data points in their head at a time. They can also behave irrationally because of those things.
AI will allow computers to take over Internet security tasks from humans, and then do them faster and at scale. Here are possible AI capabilities:
Discovering new vulnerabilities — and, more importantly, new types of vulnerabilities in systems, both by the offense to exploit and by the defense to patch, and then automatically exploiting or patching them.
Reacting and adapting to an adversary’s actions, again both on the offense and defense sides. This includes reasoning about those actions and what they mean in the context of the attack and the environment.
Abstracting lessons from individual incidents, generalizing them across systems and networks, and applying those lessons to increase attack and defense effectiveness elsewhere.
Identifying strategic and tactical trends from large datasets and using those trends to adapt attack and defense tactics.
That’s an incomplete list. I don’t think anyone can predict what AI technologies will be capable of. But it’s not unreasonable to look at what humans do today and imagine a future where AIs are doing the same things, only at computer speeds, scale, and scope.
Both attack and defense will benefit from AI technologies, but I believe that AI has the capability to tip the scales more toward defense. There will be better offensive and defensive AI techniques. But here’s the thing: defense is currently in a worse position than offense precisely because of the human components. Present-day attacks pit the relative advantages of computers and humans against the relative weaknesses of computers and humans. Computers moving into what are traditionally human areas will rebalance that equation.
Roy Amara famously said that we overestimate the short-term effects of new technologies, but underestimate their long-term effects. AI is notoriously hard to predict, so many of the details I speculate about are likely to be wrong — and AI is likely to introduce new asymmetries that we can’t foresee. But AI is the most promising technology I’ve seen for bringing defense up to par with offense. For Internet security, that will change everything.
One of the effects of GDPR, the new EU General Data Protection Regulation, is that we’re all going to be learning a lot more about who collects our data and what they do with it. Consider PayPal, which just released a list of over 600 companies they share customer data with. Here’s a good visualization of that data.
Is 600 companies unusual? Is it more than average? Less? We’ll soon know.
In Part 2, we take a deeper look at the differences between HDDs and SSDs, how both HDD and SSD technologies are evolving, and how Backblaze takes advantage of SSDs in our operations and data centers.
The first time you booted a computer or opened an app on a computer with a solid-state drive (SSD), you likely were delighted. I know I was. I loved the speed, silence, and just the wow factor of this new technology that seemed better in just about every way compared to hard drives.
I was ready to fully embrace the promise of SSDs. And I have. My desktop uses an SSD for booting, applications, and for working files. My laptop has a single 512GB SSD. I still use hard drives, however. The second, third, and fourth drives in my desktop computer are HDDs. The external USB RAID I use for local backup uses HDDs in four drive bays. When my laptop is at my desk it is attached to a 1.5TB USB backup hard drive. HDDs still have a place in my personal computing environment, as they likely do in yours.
Nothing stays the same for long, however, especially in the fast-changing world of computing, so we are certain to see new storage technologies coming to the fore, perhaps with even more wow factor.
Before we get to what’s coming, let’s review the primary differences between HDDs and SSDs in a little more detail in the following table.
A Comparison of HDDs to SSDs

| Attribute | HDD | SSD |
|---|---|---|
| Power Draw/Battery Life | More power draw, averages 6–7 watts and therefore uses more battery | Less power draw, averages 2–3 watts, resulting in 30+ minute battery boost |
| Cost | Only around $0.03 per gigabyte, very cheap (buying a 4TB model) | Expensive, roughly $0.20–$0.30 per gigabyte (based on buying a 1TB drive) |
| Capacity | Typically around 500GB and 2TB maximum for notebook size drives; 10TB max for desktops | Typically not larger than 1TB for notebook size drives; 4TB for desktops |
| Operating System Boot Time | Around 30–40 seconds average bootup time | Around 8–13 seconds average bootup time |
| Noise | Audible clicks and spinning platters can be heard | There are no moving parts, hence no sound |
| Vibration | The spinning of the platters can sometimes result in vibration | No vibration as there are no moving parts |
| Heat Produced | HDD doesn’t produce much heat, but it will have a measurable amount more heat than an SSD due to moving parts and higher power draw | Lower power draw and no moving parts so little heat is produced |
| Failure Rate | Mean time between failure rate of 1.5 million hours | Mean time between failure rate of 2.0 million hours |
| File Copy / Write Speed | The range can be anywhere from 50–120MB/s | Generally above 200 MB/s and up to 550 MB/s for cutting edge drives |
| Encryption | Full Disk Encryption (FDE) Supported on some models | Full Disk Encryption (FDE) Supported on some models |
The HDD has an amazing history of improvement and innovation. From its inception in 1956 the hard drive has decreased in size 57,000 times, increased storage 1 million times, and decreased cost 2,000 times. In other words, the cost per gigabyte has decreased by 2 billion times in about 60 years.
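The last figure follows from the two before it: cost per gigabyte is drive cost divided by capacity, so the two improvement factors multiply. A quick check in Python, using only the numbers quoted above (the variable names are just labels for this example):

```python
storage_increase = 1_000_000  # "increased storage 1 million times"
cost_decrease = 2_000         # "decreased cost 2,000 times"

# Price fell 2,000x while capacity grew 1,000,000x, so price per gigabyte
# fell by the product of the two factors.
cost_per_gb_decrease = storage_increase * cost_decrease

print(f"{cost_per_gb_decrease:,}")  # 2,000,000,000
```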
Hard drive manufacturers made these dramatic advances by reducing the size, and consequently the seek times, of platters while increasing their density, improving disk reading technologies, adding multiple arms and read/write heads, developing better bus interfaces, and increasing spin speed and reducing friction with techniques such as filling drives with helium.
In 2005, the drive industry introduced perpendicular recording technology to replace the older longitudinal recording technology, which enabled areal density to reach more than 100 gigabits per square inch. Longitudinal recording aligns data bits horizontally in relation to the drive’s spinning platter, parallel to the surface of the disk, while perpendicular recording aligns bits vertically, perpendicular to the disk surface.
Other technologies such as bit patterned media recording (BPMR) are contributing to increased densities, as well. Introduced by Toshiba in 2010, BPMR is a proposed hard disk drive technology that could succeed perpendicular recording. It records data using nanolithography in magnetic islands, with one bit per island. This contrasts with current disk drive technology where each bit is stored in 20 to 30 magnetic grains within a continuous magnetic film.
Shingled magnetic recording (SMR) is a magnetic storage data recording technology used in HDDs to increase storage density and overall per-drive storage capacity. Shingled recording writes new tracks that overlap part of the previously written magnetic track, leaving the previous track narrower and allowing for higher track density. Thus, the tracks partially overlap similar to roof shingles. This approach was selected because physical limitations prevent recording magnetic heads from having the same width as reading heads, leaving recording heads wider.
Track Spacing Enabled by SMR Technology (Seagate)
To increase the amount of data stored on a drive’s platter requires cramming the magnetic regions closer together, which means the grains need to be smaller so they won’t interfere with each other. In 2002, Seagate successfully demoed heat-assisted magnetic recording (HAMR). HAMR records magnetically using laser-thermal assistance that ultimately could lead to a 20 terabyte drive by 2019. (See our post on HAMR by Seagate’s CTO Mark Re, What is HAMR and How Does It Enable the High-Capacity Needs of the Future?)
Western Digital claims that its competing microwave-assisted magnetic recording (MAMR) could enable drive capacity to increase up to 40TB by the year 2025. Some industry watchers and drive manufacturers predict increases in areal density from today’s 0.86 terabits per square inch (Tbpsi) to 10 Tbpsi by 2025, resulting in as much as 100TB drive capacity in the next decade.
The future certainly does look bright for HDDs continuing to be with us for a while.
The Outlook for SSDs
SSDs are also in for some amazing advances.
SATA (Serial Advanced Technology Attachment) is the common hardware interface that allows the transfer of data to and from HDDs and SSDs. SATA SSDs are fine for the majority of home users, as they are generally cheaper, operate at a lower speed, and have a lower write life.
While fine for everyday computing, in a RAID (Redundant Array of Independent Disks), server array or data center environment, often a better alternative has been to use ‘SAS’ drives, which stands for Serial Attached SCSI. This is another type of interface that, again, is usable either with HDDs or SSDs. ‘SCSI’ stands for Small Computer System Interface (which is why SAS drives are sometimes referred to as ‘scuzzy’ drives). SAS has increased IOPS (Input/Output Operations Per Second) over SATA, meaning it has the ability to read and write data faster. This has made SAS an optimal choice for systems that require high performance and availability.
On an enterprise level, SAS prevails over SATA, as SAS supports over-provisioning to prolong write life and has been specifically designed to run in environments that require constant drive usage.
PCIe (Peripheral Component Interconnect Express) is a high speed serial computer expansion bus standard that supports drastically higher data transfer rates over SATA or SAS interfaces due to the fact that there are more channels available for the flow of data.
Many leading drive manufacturers have been adopting PCIe as the standard for new home and enterprise storage and some peripherals. For example, you’ll see that the latest Apple MacBooks ship with PCIe-based flash storage, something that Apple has been adopting over the years with their consumer devices.
PCIe can also be used within data centers for RAID systems and to create high-speed networking capabilities, increasing overall performance and supporting the newer and higher capacity HDDs.
As we covered in Part 1, SSDs are based on a type of non-volatile flash memory called NAND. The latest trend in NAND flash is quad-level-cell (QLC) NAND. NAND is subdivided into types based on how many bits of data are stored in each physical memory cell. SLC (single-level-cell) stores one bit, MLC (multi-level-cell) stores two, TLC (triple-level cell) stores three, and QLC (quad-level-cell) stores four.
Storing more data per cell makes NAND more dense, but it also makes the memory slower — it takes more time to read and write data when so much additional information (and so many more charge states) are stored within the same cell of memory.
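The growth in charge states is easy to quantify: a cell that stores n bits has to distinguish 2^n different charge levels. A short Python sketch of that relationship, using the bits-per-cell figures given above (`CELL_TYPES` and `charge_states` are names chosen for this example):

```python
# Bits stored per cell for each NAND type, as listed above.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}


def charge_states(bits_per_cell):
    # n bits can encode 2**n distinct values, so the cell must hold
    # that many distinguishable charge levels.
    return 2 ** bits_per_cell


for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bit(s) per cell, {charge_states(bits)} charge states")
```

Each added bit doubles the number of levels that have to be told apart, so QLC has to resolve 16 levels where SLC resolves 2.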
QLC NAND memory is built on older process nodes with larger cells that can more easily store multiple bits of data. The new NAND tech has higher overall reliability with higher total number of program / erase cycles (P/E cycles).
QLC NAND wafer from which individual microcircuits are made
QLC NAND promises to produce faster and denser SSDs. The effect on price also could be dramatic. Tom’s Hardware is predicting that the advent of QLC could push 512GB SSDs down to $100.
Beyond HDDs and SSDs
There is significant work being done that is pushing the bounds of data storage beyond what is possible with spinning platters and microcircuits. A team at Harvard University has used genome-editing to encode video into live bacteria.
We’ve already discussed the benefits of SSDs. The benefits of SSDs that apply particularly to the data center are:
Low power consumption — When you are running lots of drives, power usage adds up. Anywhere you can conserve power is a win.
Speed — Data can be accessed faster, which is especially beneficial for caching databases and other data affecting overall application or system performance.
Lack of vibration — Reducing vibration improves reliability, thereby reducing problems and maintenance. Racks housing SSDs don’t need the size and structural rigidity that racks housing HDDs do.
Low noise — Data centers will become quieter as more SSDs are deployed.
Low heat production — The less heat generated the less cooling and power required in the data center.
Faster booting — The faster a storage chassis can get online or a critical server can be rebooted after maintenance or a problem, the better.
Greater areal density — Data centers will be able to store more data in less space, which increases efficiency in all areas (power, cooling, etc.)
The top drive manufacturers say that they expect HDDs and SSDs to coexist for the foreseeable future in all areas — home, business, and data center, with customers choosing which technology and product will best fit their application.
How Backblaze Uses SSDs
In just about all respects, SSDs are superior to HDDs. So why don’t we replace the 100,000+ hard drives we have spinning in our data centers with SSDs?
Our operations team takes advantage of the benefits and savings of SSDs wherever they can, using them in every place that’s appropriate other than primary data storage. They’re particularly useful in our caching and restore layers, where we use them strategically to speed up data transfers. SSDs also speed up access to B2 Cloud Storage metadata. Our operations team is considering moving to SSDs to boot our Storage Pods, where the cost of a small SSD is competitive with hard drives, and their other attributes (small size, lack of vibration, speed, low-power consumption, reliability) are all pluses.
A Future with Both HDDs and SSDs
IDC predicts that total data created will grow from approximately 33 zettabytes in 2018 to about 160 zettabytes in 2025. (See What’s a Byte? if you’d like help understanding the size of a zettabyte.)
Annual Size of the Global Datasphere
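As a rough illustration of what the IDC forecast implies, the sketch below computes the compound annual growth rate between the two figures quoted above (33 zettabytes in 2018, 160 zettabytes in 2025). The assumption of smooth compound growth is ours, not IDC's, and the function name is illustrative.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the text: 33 ZB in 2018, about 160 ZB in 2025.
rate = cagr(33, 160, 2025 - 2018)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption the forecast works out to roughly 25% growth per year, which helps explain why both HDD and SSD shipments can grow at the same time.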
Over 90% of enterprise drive shipments today are HDD, according to IDC. By 2025, SSDs will comprise almost 20% of drive shipments. SSDs will gain share, but total growth in data created will result in massive sales of both HDDs and SSDs.
Enterprise Byte Shipments: HDD and SSD
As both HDD and SSD sales grow, so does the capacity of both technologies. Given the benefits of SSDs in many applications, we’re likely going to see SSDs replacing HDDs in all but the highest capacity uses.
It’s clear that there are merits to both HDDs and SSDs. If you’re not running a data center, and don’t have more than one or two terabytes of data to store on your home or business computer, your first choice likely should be an SSD. They provide a noticeable improvement in performance during boot-up and data transfer, and are smaller, quieter, and more reliable as well. Save the HDDs for secondary drives, NAS, RAID, and local backup devices in your system.
Perhaps some day we’ll look back at the days of spinning platters with the same nostalgia we look back at stereo LPs, and some of us will have an HDD paperweight on our floating anti-gravity desk as a conversation piece. Until the day that SSDs’ performance, capacity, and finally, price, expel the last HDD out of the home and data center, we can expect to live in a world that contains both solid state SSDs and magnetic platter HDDs, and as users we will reap the benefits from both technologies.
Don’t miss future posts on HDDs, SSDs, and other topics, including hard drive stats, cloud storage, and tips and tricks for backing up to the cloud. Use the Join button above to receive notification of future posts on our blog.
AWS has released a new whitepaper that has been requested by many AWS customers: AWS Policy Perspectives: Data Residency. Data residency is the requirement that all customer content processed and stored in an IT system must remain within a specific country’s borders, and it is one of the foremost concerns of governments that want to use commercial cloud services. General cybersecurity concerns and concerns about government requests for data have contributed to a continued focus on keeping data within countries’ borders. In fact, some governments have determined that mandating data residency provides an extra layer of security.
This approach, however, is counterproductive to the data protection objectives and the IT modernization and global economic growth goals that many governments have set as milestones. This new whitepaper addresses the real and perceived security risks expressed by governments when they demand in-country data residency by identifying the most likely and prevalent IT vulnerabilities and security risks, explaining the native security embedded in cloud services, and highlighting the roles and responsibilities of cloud service providers (CSPs), governments, and customers in protecting data.
Large-scale, multinational CSPs, often called hyperscale CSPs, represent a transformational disruption in technology because of how they support their customers with high degrees of efficiency, agility, and innovation as part of world-class security offerings. The whitepaper explains how hyperscale CSPs, such as AWS, that might be located out of country provide their customers the ability to achieve high levels of data protection through safeguards on their own platform and with turnkey tooling for their customers. They do this while at the same time preserving nation-state regulatory sovereignty.
The whitepaper also considers the commercial, public-sector, and economic effects of data residency policies and offers considerations for governments to evaluate before enforcing requirements that can unintentionally limit public-sector digital transformation goals, in turn possibly leading to increased cybersecurity risk.
AWS continues to engage with governments around the world to hear and address their top-of-mind security concerns. We take seriously our commitment to advocate for our customers’ interests and enforce security from “ground zero.” This means that when customers use AWS, they can have the confidence that their data is protected with a level of assurance that meets, if not exceeds, their needs, regardless of where the data resides.
For over a decade, civil libertarians have been fighting government mass surveillance of innocent Americans over the Internet. We’ve just lost an important battle. On January 18, President Trump signed the renewal of Section 702, and domestic mass surveillance became effectively a permanent part of US law.
Section 702 was initially passed in 2008, as an amendment to the Foreign Intelligence Surveillance Act of 1978. As the title of that law says, it was billed as a way for the NSA to spy on non-Americans located outside the United States. It was supposed to be an efficiency and cost-saving measure: the NSA was already permitted to tap communications cables located outside the country, and it was already permitted to tap communications cables from one foreign country to another that passed through the United States. Section 702 allowed it to tap those cables from inside the United States, where it was easier. It also allowed the NSA to request surveillance data directly from Internet companies under a program called PRISM.
The problem is that this authority also gave the NSA the ability to collect foreign communications and data in a way that inherently and intentionally swept up Americans’ communications as well, without a warrant. Other law enforcement agencies are allowed to ask the NSA to search those communications, give their contents to the FBI and other agencies and then lie about their origins in court.
In 1978, after Watergate had revealed the Nixon administration’s abuses of power, we erected a wall between intelligence and law enforcement that prevented precisely this kind of sharing of surveillance data under any authority less restrictive than the Fourth Amendment. Weakening that wall is incredibly dangerous, and the NSA should never have been given this authority in the first place.
Arguably, it never was. The NSA had been doing this type of surveillance illegally for years, something that was first made public in 2006. Section 702 was secretly used as a way to paper over that illegal collection, but nothing in the text of the later amendment gives the NSA this authority. We didn’t know that the NSA was using this law as the statutory basis for this surveillance until Edward Snowden showed us in 2013.
Civil libertarians have been battling this law in both Congress and the courts ever since it was proposed, and the NSA’s domestic surveillance activities even longer. What this most recent vote tells me is that we’ve lost that fight.
Section 702 was passed under George W. Bush in 2008, reauthorized under Barack Obama in 2012, and now reauthorized again under Trump. In all three cases, congressional support was bipartisan. It has survived multiple lawsuits by the Electronic Frontier Foundation, the ACLU, and others. It has survived the revelations by Snowden that it was being used far more extensively than Congress or the public believed, and numerous public reports of violations of the law. It has even survived Trump’s belief that he was being personally spied on by the intelligence community, as well as any congressional fears that Trump could abuse the authority in the coming years. And though this extension lasts only six years, it’s inconceivable to me that it will ever be repealed at this point.
So what do we do? If we can’t fight this particular statutory authority, where’s the new front on surveillance? There are, it turns out, reasonable modifications that target surveillance more generally, and not in terms of any particular statutory authority. We need to look at US surveillance law more generally.
First, we need to strengthen the minimization procedures to limit incidental collection. Since the Internet was developed, all the world’s communications travel around in a single global network. It’s impossible to collect only foreign communications, because they’re invariably mixed in with domestic communications. This is called “incidental” collection, but that’s a misleading name. It’s collected knowingly, and searched regularly. The intelligence community needs much stronger restrictions on which American communications channels it can access without a court order, and rules that require they delete the data if they inadvertently collect it. More importantly, “collection” is defined as the point the NSA takes a copy of the communications, and not later when they search their databases.
Second, we need to limit how other law enforcement agencies can use incidentally collected information. Today, those agencies can query a database of incidental collection on Americans. The NSA can legally pass information to those other agencies. This has to stop. Data collected by the NSA under its foreign surveillance authority should not be used as a vehicle for domestic surveillance.
The most recent reauthorization modified this lightly, forcing the FBI to obtain a court order when querying the 702 data for a criminal investigation. There are still exceptions and loopholes, though.
Third, we need to end what’s called “parallel construction.” Today, when a law enforcement agency uses evidence found in this NSA database to arrest someone, it doesn’t have to disclose that fact in court. It can reconstruct the evidence in some other manner once it knows about it, and then pretend it learned of it that way. This right to lie to the judge and the defense is corrosive to liberty, and it must end.
Pressure to reform the NSA will probably first come from Europe. Already, European Union courts have pointed to warrantless NSA surveillance as a reason to keep Europeans’ data out of US hands. Right now, there is a fragile agreement between the EU and the United States, called “Privacy Shield,” that requires Americans to maintain certain safeguards for international data flows. NSA surveillance goes against that, and it’s only a matter of time before EU courts start ruling this way. That’ll have significant effects on both government and corporate surveillance of Europeans and, by extension, the entire world.
Further pressure will come from the increased surveillance coming from the Internet of Things. When your home, car, and body are awash in sensors, privacy from both governments and corporations will become increasingly important. Sooner or later, society will reach a tipping point where it’s all too much. When that happens, we’re going to see significant pushback against surveillance of all kinds. That’s when we’ll get new laws that revise all government authorities in this area: a clean sweep for a new world, one with new norms and new fears.
It’s possible that a federal court will rule on Section 702. Although there have been many lawsuits challenging the legality of what the NSA is doing and the constitutionality of the 702 program, no court has ever ruled on those questions. The Bush and Obama administrations successfully argued that defendants don’t have legal standing to sue. That is, they have no right to sue because they don’t know they’re being targeted. If any of the lawsuits can get past that, things might change dramatically.
Meanwhile, much of this is the responsibility of the tech sector. This problem exists primarily because Internet companies collect and retain so much personal data and allow it to be sent across the network with minimal security. Since the government has abdicated its responsibility to protect our privacy and security, these companies need to step up: Minimize data collection. Don’t save data longer than absolutely necessary. Encrypt what has to be saved. Well-designed Internet services will safeguard users, regardless of government surveillance authority.
For the rest of us concerned about this, it’s important not to give up hope. Everything we do to keep the issue in the public eye, and not just when the authority comes up for reauthorization again in 2024, hastens the day when we will reaffirm our rights to privacy in the digital age.
The most important fact about Wannacry is that it was an accident. We’ve had 30 years of experience with Internet worms teaching us that worms are always accidents. While launching worms may be intentional, their effects cannot be predicted. While they appear to have targets, like Slammer against South Korea, or Witty against the Pentagon, further analysis shows this was just a random effect that was impossible to predict ahead of time. Only in hindsight are these effects explainable.
We should hold those causing accidents accountable, too, but it’s a different accountability. The U.S. has caused more civilian deaths in its War on Terror than the terrorists caused triggering that war. But we hold these to be morally different: the terrorists targeted the innocent, whereas the U.S. takes great pains to avoid civilian casualties.
Since we are talking about blaming those responsible for accidents, we also must include the NSA in that mix. The NSA created, then allowed the release of, weaponized exploits. That’s like accidentally dropping a load of unexploded bombs near a village. When those bombs are then used, those having lost the weapons are held guilty along with those using them. Yes, while we should blame the hacker who added ETERNAL BLUE to their ransomware, we should also blame the NSA for losing control of ETERNAL BLUE.
A country and its assets are different
Was it North Korea, or hackers affiliated with North Korea? These aren’t the same.
It’s hard for North Korea to have hackers of its own. It doesn’t have citizens who grow up with computers to pick from. Moreover, an internal hacking corps would create tainted citizens exposed to dangerous outside ideas. Update: Some people have pointed out that Kim Il-sung University in the capital does have some contact with the outside world, with academics granted limited Internet access, so I guess some tainting is allowed. Still, what we know of North Korea hacking efforts largely comes from hackers they employ outside North Korea. It was the Lazarus Group, outside North Korea, that did Wannacry.
Instead, North Korea develops external hacking “assets”, supporting several external hacking groups in China, Japan, and South Korea. This is similar to how intelligence agencies develop human “assets” in foreign countries. While these assets do things for their handlers, they also have normal day jobs, and do many things that are wholly independent and even sometimes against their handler’s interests.
For example, this Muckrock FOIA dump shows how “CIA assets” independently worked for Castro and assassinated a Panamanian president. That they also worked for the CIA does not make the CIA responsible for the Panamanian assassination.
That CIA/intelligence assets work this way is well-known and uncontroversial. The fact that countries use hacker assets like this is the controversial part. These hackers do act independently, yet we refuse to consider this when we want to “attribute” attacks.
Attribution is political
We have far better attribution for the nPetya attacks. It was less accidental (they clearly desired to disrupt Ukraine), and the hackers were much closer to the Russian government (Russian citizens). Yet, the Trump administration isn’t fighting Russia, they are fighting North Korea, so they don’t officially attribute nPetya to Russia, but do attribute Wannacry to North Korea.
Trump is in conflict with North Korea. He is looking for ways to escalate the conflict. Attributing Wannacry helps achieve his political objectives.
That it was blatantly politics is demonstrated by the way it was released to the press. It wasn’t released in the normal way, where the administration can stand behind it, and get challenged on the particulars. Instead, it was pre-released through the normal system of “anonymous government officials” to the NYTimes, and then backed up with an op-ed in the Wall Street Journal. The government leaks information like this when it’s weak, not when it’s strong.
The proper way is to release the evidence upon which the decision was made, so that the public can challenge it. Among the questions the public would ask is whether they believe it was North Korea’s intention to cause precisely this effect, such as disabling the British NHS. Or, whether it was merely hackers “affiliated” with North Korea, or hackers carrying out North Korea’s orders. We cannot challenge the government this way because the government intentionally holds itself above such accountability.
We believe hacking groups tied to North Korea are responsible for Wannacry. Yet, even if that’s true, we still have three attribution problems. We still don’t know if that was intentional, in pursuit of some political goal, or an accident. We still don’t know if it was at the direction of North Korea, or whether their hacker assets acted independently. We still don’t know if the government has answers to these questions, or whether it’s exploiting this doubt to achieve political support for actions against North Korea.
On January 3, the world learned about a series of major security vulnerabilities in modern microprocessors. Called Spectre and Meltdown, these vulnerabilities were discovered by several different researchers last summer, disclosed to the microprocessors’ manufacturers, and patched — at least to the extent possible.
This news isn’t really any different from the usual endless stream of security vulnerabilities and patches, but it’s also a harbinger of the sorts of security problems we’re going to be seeing in the coming years. These are vulnerabilities in computer hardware, not software. They affect virtually all high-end microprocessors produced in the last 20 years. Patching them requires large-scale coordination across the industry, and in some cases drastically affects the performance of the computers. And sometimes patching isn’t possible; the vulnerability will remain until the computer is discarded.
Spectre and Meltdown aren’t anomalies. They represent a new area to look for vulnerabilities and a new avenue of attack. They’re the future of security — and it doesn’t look good for the defenders.
Modern computers do lots of things at the same time. Your computer and your phone simultaneously run several applications — or apps. Your browser has several windows open. A cloud computer runs applications for many different computers. All of those applications need to be isolated from each other. For security, one application isn’t supposed to be able to peek at what another one is doing, except in very controlled circumstances. Otherwise, a malicious advertisement on a website you’re visiting could eavesdrop on your banking details, or the cloud service purchased by some foreign intelligence organization could eavesdrop on every other cloud customer, and so on. The companies that write browsers, operating systems, and cloud infrastructure spend a lot of time making sure this isolation works.
Both Spectre and Meltdown break that isolation, deep down at the microprocessor level, by exploiting performance optimizations that have been implemented for the past decade or so. Basically, microprocessors have become so fast that they spend a lot of time waiting for data to move in and out of memory. To increase performance, these processors guess what data they’re going to receive and execute instructions based on that. If the guess turns out to be correct, it’s a performance win. If it’s wrong, the microprocessors throw away what they’ve done without losing any time. This feature is called speculative execution.
Spectre and Meltdown attack speculative execution in different ways. Meltdown is more of a conventional vulnerability; the designers of the speculative-execution process made a mistake, so they just needed to fix it. Spectre is worse; it’s a flaw in the very concept of speculative execution. There’s no way to patch that vulnerability; the chips need to be redesigned in such a way as to eliminate it.
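The idea of acting on a guess and discarding the result can be sketched with a toy model. This is not how any real processor or published exploit works in detail. It is a simplified Python illustration, and it assumes that speculative work leaves a trace in the cache even after the result is discarded, which is how the published attacks are generally described but is not spelled out in this article. The class, its fields, and the values are all hypothetical.

```python
class ToyCPU:
    """Toy model only: real processors are far more complex."""

    def __init__(self, memory, public_len):
        self.memory = memory          # flat list of values
        self.public_len = public_len  # indices below this may be read
        self.cache = set()            # values touched by any load (assumed side effect)

    def read(self, index, guess_in_bounds=True):
        # Speculative step: act on the guess before the bounds check resolves.
        if guess_in_bounds:
            speculative_value = self.memory[index]
            self.cache.add(speculative_value)  # this trace is NOT rolled back
        # The real bounds check now resolves.
        if index < self.public_len:
            return self.memory[index]
        # Wrong guess: the result is thrown away, but the cache trace remains.
        return None


memory = [10, 20, 30, 83]            # index 3 holds a hypothetical "secret"
cpu = ToyCPU(memory, public_len=3)
result = cpu.read(3)                 # out-of-bounds read, guessed as in-bounds
print(result, 83 in cpu.cache)
```

In this toy, the program never receives the out-of-bounds value, yet an observer who can inspect the cache can still infer it. That gap between what is discarded and what lingers is why the flaw is tied to the concept of speculation itself.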
Since the announcement, manufacturers have been rolling out patches to these vulnerabilities to the extent possible. Operating systems have been patched so that attackers can’t make use of the vulnerabilities. Web browsers have been patched. Chips have been patched. From the user’s perspective, these are routine fixes. But several aspects of these vulnerabilities illustrate the sorts of security problems we’re only going to be seeing more of.
First, attacks against hardware, as opposed to software, will become more common. Last fall, vulnerabilities were discovered in Intel’s Management Engine, a remote-administration feature on its microprocessors. Like Spectre and Meltdown, they affected how the chips operate. Looking for vulnerabilities on computer chips is new. Now that researchers know this is a fruitful area to explore, security researchers, foreign intelligence agencies, and criminals will be on the hunt.
Second, because microprocessors are fundamental parts of computers, patching requires coordination between many companies. Even when manufacturers like Intel and AMD can write a patch for a vulnerability, computer makers and application vendors still have to customize and push the patch out to the users. This makes it much harder to keep vulnerabilities secret while patches are being written. Spectre and Meltdown were announced prematurely because details were leaking and rumors were swirling. Situations like this give malicious actors more opportunity to attack systems before they’re guarded.
Third, these vulnerabilities will affect computers’ functionality. In some cases, the patches for Spectre and Meltdown result in significant reductions in speed. The press initially reported 30%, but that only seems true for certain servers running in the cloud. For your personal computer or phone, the performance hit from the patch is minimal. But as more vulnerabilities are discovered in hardware, patches will affect performance in noticeable ways.
And then there are the unpatchable vulnerabilities. For decades, the computer industry has kept things secure by finding vulnerabilities in fielded products and quickly patching them. Now there are cases where that doesn’t work. Sometimes it’s because computers are in cheap products that don’t have a patch mechanism, like many of the DVRs and webcams that are vulnerable to the Mirai (and other) botnets — groups of Internet-connected devices sabotaged for coordinated digital attacks. Sometimes it’s because a computer chip’s functionality is so core to a computer’s design that patching it effectively means turning the computer off. This, too, is becoming more common.
Increasingly, everything is a computer: not just your laptop and phone, but your car, your appliances, your medical devices, and global infrastructure. These computers are and always will be vulnerable, but Spectre and Meltdown represent a new class of vulnerability. Unpatchable vulnerabilities in the deepest recesses of the world’s computer hardware are the new normal. It’s going to leave us all much more vulnerable in the future.