Today we are launching a new, session-based pricing option for QuickSight, along with additional region support and other important new features. Let’s take a look at each one:
Pay-per-Session Pricing Our customers are making great use of QuickSight and take full advantage of the power it gives them to connect to data sources, create reports, and and explore visualizations.
However, not everyone in an organization needs or wants such powerful authoring capabilities. Having access to curated data in dashboards and being able to interact with the data by drilling down, filtering, or slicing-and-dicing is more than adequate for their needs. Subscribing them to a monthly or annual plan can be seen as an unwarranted expense, so a lot of such casual users end up not having access to interactive data or BI.
In order to allow customers to provide all of their users with interactive dashboards and reports, the Enterprise Edition of Amazon QuickSight now allows Reader access to dashboards on a Pay-per-Session basis. QuickSight users are now classified as Admins, Authors, or Readers, with distinct capabilities and prices:
Authors have access to the full power of QuickSight; they can establish database connections, upload new data, create ad hoc visualizations, and publish dashboards, all for $9 per month (Standard Edition) or $18 per month (Enterprise Edition).
Readers can view dashboards, slice and dice data using drill downs, filters and on-screen controls, and download data in CSV format, all within the secure QuickSight environment. Readers pay $0.30 for 30 minutes of access, with a monthly maximum of $5 per reader.
Admins have all authoring capabilities, and can manage users and purchase SPICE capacity in the account. The QuickSight admin now has the ability to set the desired option (Author or Reader) when they invite members of their organization to use QuickSight. They can extend Reader invites to their entire user base without incurring any up-front or monthly costs, paying only for the actual usage.
A New Region QuickSight is now available in the Asia Pacific (Tokyo) Region:
The UI is in English, with a localized version in the works.
Hourly Data Refresh Enterprise Edition SPICE data sets can now be set to refresh as frequently as every hour. In the past, each data set could be refreshed up to 5 times a day. To learn more, read Refreshing Imported Data.
Access to Data in Private VPCs This feature was launched in preview form late last year, and is now available in production form to users of the Enterprise Edition. As I noted at the time, you can use it to implement secure, private communication with data sources that do not have public connectivity, including on-premises data in Teradata or SQL Server, accessed over an AWS Direct Connect link. To learn more, read Working with AWS VPC.
Parameters with On-Screen Controls QuickSight dashboards can now include parameters that are set using on-screen dropdown, text box, numeric slider or date picker controls. The default value for each parameter can be set based on the user name (QuickSight calls this a dynamic default). You could, for example, set an appropriate default based on each user’s office location, department, or sales territory. Here’s an example:
URL Actions for Linked Dashboards You can now connect your QuickSight dashboards to external applications by defining URL actions on visuals. The actions can include parameters, and become available in the Details menu for the visual. URL actions are defined like this:
You can use this feature to link QuickSight dashboards to third party applications (e.g. Salesforce) or to your own internal applications. Read Custom URL Actions to learn how to use this feature.
Dashboard Sharing You can now share QuickSight dashboards across every user in an account.
Larger SPICE Tables The per-data set limit for SPICE tables has been raised from 10 GB to 25 GB.
Upgrade to Enterprise Edition The QuickSight administrator can now upgrade an account from Standard Edition to Enterprise Edition with a click. This enables provisioning of Readers with pay-per-session pricing, private VPC access, row-level security for dashboards and data sets, and hourly refresh of data sets. Enterprise Edition pricing applies after the upgrade.
Available Now Everything I listed above is available now and you can start using it today!
In November 2013, the first commercially available helium-filled hard drive was introduced by HGST, a Western Digital subsidiary. The 6 TB drive was not only unique in being helium-filled, it was for the moment, the highest capacity hard drive available. Fast forward a little over 4 years later and 12 TB helium-filled drives are readily available, 14 TB drives can be found, and 16 TB helium-filled drives are arriving soon.
Backblaze has been purchasing and deploying helium-filled hard drives over the past year and we thought it was time to start looking at their failure rates compared to traditional air-filled drives. This post will provide an overview, then we’ll continue the comparison on a regular basis over the coming months.
The Promise and Challenge of Helium Filled Drives
We all know that helium is lighter than air — that’s why helium-filled balloons float. Inside of an air-filled hard drive there are rapidly spinning disk platters that rotate at a given speed, 7200 rpm for example. The air inside adds an appreciable amount of drag on the platters that in turn requires an appreciable amount of additional energy to spin the platters. Replacing the air inside of a hard drive with helium reduces the amount of drag, thereby reducing the amount of energy needed to spin the platters, typically by 20%.
We also know that after a few days, a helium-filled balloon sinks to the ground. This was one of the key challenges in using helium inside of a hard drive: helium escapes from most containers, even if they are well sealed. It took years for hard drive manufacturers to create containers that could contain helium while still functioning as a hard drive. This container innovation allows helium-filled drives to function at spec over the course of their lifetime.
Checking for Leaks
Three years ago, we identified SMART 22 as the attribute assigned to recording the status of helium inside of a hard drive. We have both HGST and Seagate helium-filled hard drives, but only the HGST drives currently report the SMART 22 attribute. It appears the normalized and raw values for SMART 22 currently report the same value, which starts at 100 and goes down.
To date only one HGST drive has reported a value of less than 100, with multiple readings between 94 and 99. That drive continues to perform fine, with no other errors or any correlating changes in temperature, so we are not sure whether the change in value is trying to tell us something or if it is just a wonky sensor.
Helium versus Air-Filled Hard Drives
There are several different ways to compare these two types of drives. Below we decided to use just our 8, 10, and 12 TB drives in the comparison. We did this since we have helium-filled drives in those sizes. We left out of the comparison all of the drives that are 6 TB and smaller as none of the drive models we use are helium-filled. We are open to trying different comparisons. This just seemed to be the best place to start.
The most obvious observation is that there seems to be little difference in the Annualized Failure Rate (AFR) based on whether they contain helium or air. One conclusion, given this evidence, is that helium doesn’t affect the AFR of hard drives versus air-filled drives. My prediction is that the helium drives will eventually prove to have a lower AFR. Why? Drive Days.
Let’s go back in time to Q1 2017 when the air-filled drives listed in the table above had a similar number of Drive Days to the current number of Drive Days for the helium drives. We find that the failure rate for the air-filled drives at the time (Q1 2017) was 1.61%. In other words, when the drives were in use a similar number of hours, the helium drives had a failure rate of 1.06% while the failure rate of the air-filled drives was 1.61%.
Helium or Air?
My hypothesis is that after normalizing the data so that the helium and air-filled drives have the same (or similar) usage (Drive Days), the helium-filled drives we use will continue to have a lower Annualized Failure Rate versus the air-filled drives we use. I expect this trend to continue for the next year at least. What side do you come down on? Will the Annualized Failure Rate for helium-filled drives be better than air-filled drives or vice-versa? Or do you think the two technologies will be eventually produce the same AFR over time? Pick a side and we’ll document the results over the next year and see where the data takes us.
As of March 31, 2018 we had 100,110 spinning hard drives. Of that number, there were 1,922 boot drives and 98,188 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We’ll also take a look at why we are collecting and reporting 10 new SMART attributes and take a sneak peak at some 8 TB Toshiba drives. Along the way, we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.
Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are about 97 million entries totaling 26 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.
Hard Drive Reliability Statistics for Q1 2018
At the end of Q1 2018 Backblaze was monitoring 98,188 hard drives used to store data. For our evaluation below we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,046 hard drives. The table below covers just Q1 2018.
Notes and Observations
If a drive model has a failure rate of 0%, it only means there were no drive failures of that model during Q1 2018.
The overall Annualized Failure Rate (AFR) for Q1 is just 1.2%, well below the Q4 2017 AFR of 1.65%. Remember that quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.
There were 142 drives (98,188 minus 98,046) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics.
Welcome Toshiba 8TB drives, almost…
We mentioned Toshiba 8 TB drives in the first paragraph, but they don’t show up in the Q1 Stats chart. What gives? We only had 20 of the Toshiba 8 TB drives in operation in Q1, so they were excluded from the chart. Why do we have only 20 drives? When we test out a new drive model we start with the “tome test” and it takes 20 drives to fill one tome. A tome is the same drive model in the same logical position in each of the 20 Storage Pods that make up a Backblaze Vault. There are 60 tomes in each vault.
In this test, we created a Backblaze Vault of 8 TB drives, with 59 of the tomes being Seagate 8 TB drives and 1 tome being the Toshiba drives. Then we monitored the performance of the vault and its member tomes to see if, in this case, the Toshiba drives performed as expected.
So far the Toshiba drive is performing fine, but they have been in place for only 20 days. Next up is the “pod test” where we fill a Storage Pod with Toshiba drives and integrate it into a Backblaze Vault comprised of like-sized drives. We hope to have a better look at the Toshiba 8 TB drives in our Q2 report — stay tuned.
Lifetime Hard Drive Reliability Statistics
While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models which have 45 or more drives in operation as of March 31st, 2018. For each model, we compute their reliability starting from when they were first installed.
Notes and Observations
The failure rates of all of the larger drives (8-, 10- and 12 TB) are very good, 1.2% AFR (Annualized Failure Rate) or less. Many of these drives were deployed in the last year, so there is some volatility in the data, but you can use the Confidence Interval to get a sense of the failure percentage range.
The overall failure rate of 1.84% is the lowest we have ever achieved, besting the previous low of 2.00% from the end of 2017.
Our regular readers and drive stats wonks may have noticed a sizable jump in the number of HGST 8 TB drives (model: HUH728080ALE600), from 45 last quarter to 1,045 this quarter. As the 10 TB and 12 TB drives become more available, the price per terabyte of the 8 TB drives has gone down. This presented an opportunity to purchase the HGST drives at a price in line with our budget.
We purchased and placed into service the 45 original HGST 8 TB drives in Q2 of 2015. They were our first Helium-filled drives and our only ones until the 10 TB and 12 TB Seagate drives arrived in Q3 2017. We’ll take a first look into whether or not Helium makes a difference in drive failure rates in an upcoming blog post.
New SMART Attributes
If you have previously worked with the hard drive stats data or plan to, you’ll notice that we added 10 more columns of data starting in 2018. There are 5 new SMART attributes we are tracking each with a raw and normalized value:
177 – Wear Range Delta
179 – Used Reserved Block Count Total
181- Program Fail Count Total or Non-4K Aligned Access Count
182 – Erase Fail Count
235 – Good Block Count AND System(Free) Block Count
The 5 values are all related to SSD drives.
Yes, SSD drives, but before you jump to any conclusions, we used 10 Samsung 850 EVO SSDs as boot drives for a period of time in Q1. This was an experiment to see if we could reduce boot up time for the Storage Pods. In our case, the improved boot up speed wasn’t worth the SSD cost, but it did add 10 new columns to the hard drive stats data.
Speaking of hard drive stats data, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose, all we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.
If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.
Good luck and let us know if you find anything interesting.
[Ed: 5/1/2018 – Updated Lifetime chart to fix error in confidence interval for HGST 4TB drive, model: HDS5C4040ALE630]
Last week, we shared the first half of our Q&A with Raspberry Pi Trading CEO and Raspberry Pi creator Eben Upton. Today we follow up with all your other questions, including your expectations for a Raspberry Pi 4, Eben’s dream add-ons, and whether we really could go smaller than the Zero.
Get your questions to us now using #AskRaspberryPi on Twitter
With internet security becoming more necessary, will there be automated versions of VPN on an SD card?
There are already third-party tools which turn your Raspberry Pi into a VPN endpoint. Would we do it ourselves? Like the power button, it’s one of those cases where there are a million things we could do and so it’s more efficient to let the community get on with it.
Just to give a counterexample, while we don’t generally invest in optimising for particular use cases, we did invest a bunch of money into optimising Kodi to run well on Raspberry Pi, because we found that very large numbers of people were using it. So, if we find that we get half a million people a year using a Raspberry Pi as a VPN endpoint, then we’ll probably invest money into optimising it and feature it on the website as we’ve done with Kodi. But I don’t think we’re there today.
Have you ever seen any Pis running and doing important jobs in the wild, and if so, how does it feel?
It’s amazing how often you see them driving displays, for example in radio and TV studios. Of course, it feels great. There’s something wonderful about the geographic spread as well. The Raspberry Pi desktop is quite distinctive, both in its previous incarnation with the grey background and logo, and the current one where we have Greg Annandale’s road picture.
And so it’s funny when you see it in places. Somebody sent me a video of them teaching in a classroom in rural Pakistan and in the background was Greg’s picture.
Raspberry Pi 4!?!
There will be a Raspberry Pi 4, obviously. We get asked about it a lot. I’m sticking to the guidance that I gave people that they shouldn’t expect to see a Raspberry Pi 4 this year. To some extent, the opportunity to do the 3B+ was a surprise: we were surprised that we’ve been able to get 200MHz more clock speed, triple the wireless and wired throughput, and better thermals, and still stick to the $35 price point.
We’re up against the wall from a silicon perspective; we’re at the end of what you can do with the 40nm process. It’s not that you couldn’t clock the processor faster, or put a larger processor which can execute more instructions per clock in there, it’s simply about the energy consumption and the fact that you can’t dissipate the heat. So we’ve got to go to a smaller process node and that’s an order of magnitude more challenging from an engineering perspective. There’s more effort, more risk, more cost, and all of those things are challenging.
With 3B+ out of the way, we’re going to start looking at this now. For the first six months or so we’re going to be figuring out exactly what people want from a Raspberry Pi 4. We’re listening to people’s comments about what they’d like to see in a new Raspberry Pi, and I’m hoping by early autumn we should have an idea of what we want to put in it and a strategy for how we might achieve that.
Could you go smaller than the Zero?
The challenge with Zero as that we’re periphery-limited. If you run your hand around the unit, there is no edge of that board that doesn’t have something there. So the question is: “If you want to go smaller than Zero, what feature are you willing to throw out?”
It’s a single-sided board, so you could certainly halve the PCB area if you fold the circuitry and use both sides, though you’d have to lose something. You could give up some GPIO and go back to 26 pins like the first Raspberry Pi. You could give up the camera connector, you could go to micro HDMI from mini HDMI. You could remove the SD card and just do USB boot. I’m inventing a product live on air! But really, you could get down to two thirds and lose a bunch of GPIO – it’s hard to imagine you could get to half the size.
What’s the one feature that you wish you could outfit on the Raspberry Pi that isn’t cost effective at this time? Your dream feature.
Well, more memory. There are obviously technical reasons why we don’t have more memory on there, but there are also market reasons. People ask “why doesn’t the Raspberry Pi have more memory?”, and my response is typically “go and Google ‘DRAM price’”. We’re used to the price of memory going down. And currently, we’re going through a phase where this has turned around and memory is getting more expensive again.
Machine learning would be interesting. There are machine learning accelerators which would be interesting to put on a piece of hardware. But again, they are not going to be used by everyone, so according to our method of pricing what we might add to a board, machine learning gets treated like a $50 chip. But that would be lovely to do.
Which citizen science projects using the Pi have most caught your attention?
I like the wildlife camera projects. We live out in the countryside in a little village, and we’re conscious of being surrounded by nature but we don’t see a lot of it on a day-to-day basis. So I like the nature cam projects, though, to my everlasting shame, I haven’t set one up yet. There’s a range of them, from very professional products to people taking a Raspberry Pi and a camera and putting them in a plastic box. So those are good fun.
How does it feel to go to bed every day knowing you’ve changed the world for the better in such a massive way?
What feels really good is that when we started this in 2006 nobody else was talking about it, but now we’re part of a very broad movement.
We were in a really bad way: we’d seen a collapse in the number of applicants applying to study Computer Science at Cambridge and elsewhere. In our view, this reflected a move away from seeing technology as ‘a thing you do’ to seeing it as a ‘thing that you have done to you’. It is problematic from the point of view of the economy, industry, and academia, but most importantly it damages the life prospects of individual children, particularly those from disadvantaged backgrounds. The great thing about STEM subjects is that you can’t fake being good at them. There are a lot of industries where your Dad can get you a job based on who he knows and then you can kind of muddle along. But if your dad gets you a job building bridges and you suck at it, after the first or second bridge falls down, then you probably aren’t going to be building bridges anymore. So access to STEM education can be a great driver of social mobility.
By the time we were launching the Raspberry Pi in 2012, there was this wonderful movement going on. Code Club, for example, and CoderDojo came along. Lots of different ways of trying to solve the same problem. What feels really, really good is that we’ve been able to do this as part of an enormous community. And some parts of that community became part of the Raspberry Pi Foundation – we merged with Code Club, we merged with CoderDojo, and we continue to work alongside a lot of these other organisations. So in the two seconds it takes me to fall asleep after my face hits the pillow, that’s what I think about.
We’re currently advertising a Programme Manager role in New Delhi, India. Did you ever think that Raspberry Pi would be advertising a role like this when you were bringing together the Foundation?
No, I didn’t.
But if you told me we were going to be hiring somewhere, India probably would have been top of my list because there’s a massive IT industry in India. When we think about our interaction with emerging markets, India, in a lot of ways, is the poster child for how we would like it to work. There have already been some wonderful deployments of Raspberry Pi, for example in Kerala, without our direct involvement. And we think we’ve got something that’s useful for the Indian market. We have a product, we have clubs, we have teacher training. And we have a body of experience in how to teach people, so we have a physical commercial product as well as a charitable offering that we think are a good fit.
It’s going to be massive.
What is your favourite BBC type-in listing?
There was a game called Codename: Druid. There is a famous game called Codename: Droid which was the sequel to Stryker’s Run, which was an awesome, awesome game. And there was a type-in game called Codename: Druid, which was at the bottom end of what you would consider a commercial game.
And I remember typing that in. And what was really cool about it was that the next month, the guy who wrote it did another article that talks about the memory map and which operating system functions used which bits of memory. So if you weren’t going to do disc access, which bits of memory could you trample on and know the operating system would survive.
I still like type-in listings. The Raspberry Pi 2018 Annual has a type-in listing that I wrote for a Babbage versus Bugs game. I will say that’s not the last type-in listing you will see from me in the next twelve months. And if you download the PDF, you could probably copy and paste it into your favourite text editor to save yourself some time.
Each year we take stock at the Raspberry Pi Foundation, looking back at what we’ve achieved over the previous twelve months. We’ve just published our Annual Review for 2017, reflecting on the progress we’ve made as a foundation and a community towards putting the power of digital making in the hands of people all over the world.
In the review, you can find out about all the different education programmes we run. Moreover, you can hear from people who have taken part, learned through making, and discovered they can do things with technology that they never thought they could.
Growing our reach
Our reach grew hugely in 2017, and the numbers tell this story.
By the end of 2017, we’d sold over 17 million Raspberry Pi computers, bringing tools for learning programming and physical computing to people all over the world.
Vibrant learning and making communities
Code Club grew by 2964 clubs in 2017, to over 10000 clubs across the world reaching over 150000 9- to 13-year-olds.
“The best moment is seeing a child discover something for the first time. It is amazing.” – Code Club volunteer
In 2017 CoderDojo became part of the Raspberry Pi family. Over the year, it grew by 41% to 1556 active Dojos, involving nearly 40000 7- to 17-year-olds in creating with code and collaborating to learn about technology.
Raspberry Jams continued to grow, with 18700 people attending events organised by our amazing community members.
Supporting teaching and learning
We reached 208 projects in our online resources in 2017, and 8.5 million people visited these to get making.
“I like coding because it’s like a whole other language that you have to learn, and it creates something very interesting in the end.” – Betty, Year 10 student
2017 was also the year we began offering online training courses. 19000 people joined us to learn about programming, physical computing, and running a Code Club.
Over 6800 young people entered Mission Zero and Mission Space Lab, 2017’s two Astro Pi challenges. They created code that ran on board the International Space Station or will run soon.
More than 600 educators joined our face-to-face Picademy training last year. Our community of Raspberry Pi Certified Educators grew to 1500, all leading digital making across schools, libraries, and other settings where young people learn.
Well over a million people follow us on social media, and in 2017 we’ve seen big increases in our YouTube and Instagram followings. We have been creating much more video content to share what we do with audiences on these and other social networks.
It’s been a big year, as we continue to reach even more people. This wouldn’t be possible without the amazing work of volunteers and community members who do so much to create opportunities for others to get involved. Behind each of these numbers is a person discovering digital making for the first time, learning new skills, or succeeding with a project that makes a difference to something they care about.
You can read our 2017 Annual Review in full over on our About Us page.
This summer, the Raspberry Pi Foundation is bringing you an all-new community event taking place in Cambridge, UK!
On the weekend of Saturday 30 June and Sunday 1 July 2018, the Pi Towers team, with lots of help from our community of young people, educators, hobbyists, and tech enthusiasts, will be running Raspberry Fields, our brand-new annual festival of digital making!
It will be a chance for people of all ages and skill levels to have a go at getting creative with tech, and it will be a celebration of all that our digital makers have already learnt and achieved, whether through taking part in Code Clubs, CoderDojos, or Raspberry Jams, or through trying our resources at home.
Dive into digital making
At Raspberry Fields, you will have the chance to inspire your inner inventor! Learn about amazing projects others in the community are working on, such as cool robots and wearable technology; have a go at a variety of hands-on activities, from home automation projects to remote-controlled vehicles and more; see fascinating science- and technology-related talks and musical performances. After your visit, you’ll be excited to go home and get making!
If you’re wondering about bringing along young children or less technologically minded family members or friends, there’ll be plenty for them to enjoy — with lots of festival-themed activities such as face painting, fun performances, free giveaways, and delicious food, Raspberry Fields will have something for everyone!
Get your tickets
This two-day ticketed event will be taking place at Cambridge Junction, the city’s leading arts centre. Tickets are £5 if you are aged 16 or older, and free for everyone under 16. Get your tickets by clicking the button on the Raspberry Fields web page!
Where: Cambridge Junction, Clifton Way, Cambridge, CB1 7GX, UK When: Saturday 30 June 2018, 10:30 – 18:00 and Sunday 1 July 2018, 10:00 – 17:30
We are currently looking for people who’d like to contribute activities, talks, or performances with digital themes to the festival. This could be something like live music, dance, or other show acts; talks; or drop-in making activities. In addition, we’re looking for artists who’d like to showcase interactive digital installations, for proud makers who are keen to exhibit their projects, and for vendors who’d like to join in. We particularly encourage young people to showcase projects they’ve created or deliver talks on their digital making journey!
Your contribution to Raspberry Fields should focus on digital making and be fun and engaging for an audience of various ages. However, it doesn’t need to be specific to Raspberry Pi. You might be keen to demonstrate a project you’ve built, do a short Q&A session on what you’ve learnt, or present something more in-depth in the auditorium; maybe you’re one of our approved resellers wanting to showcase in our market area. We’re also looking for digital makers to run drop-in activity sessions, as well as for people who’d like to be marshals with smiling faces who will ensure that everyone has a wonderful time!
If you’d like to take part in Raspberry Fields, let us know via this form, and we’ll be in touch with you soon.
In Part 2, we take a deeper look at the differences between HDDs and SSDs, how both HDD and SSD technologies are evolving, and how Backblaze takes advantage of SSDs in our operations and data centers.
The first time you booted a computer or opened an app on a computer with a solid-state-drive (SSD), you likely were delighted. I know I was. I loved the speed, silence, and just the wow factor of this new technology that seemed better in just about every way compared to hard drives.
I was ready to fully embrace the promise of SSDs. And I have. My desktop uses an SSD for booting, applications, and for working files. My laptop has a single 512GB SSD. I still use hard drives, however. The second, third, and fourth drives in my desktop computer are HDDs. The external USB RAID I use for local backup uses HDDs in four drive bays. When my laptop is at my desk it is attached to a 1.5TB USB backup hard drive. HDDs still have a place in my personal computing environment, as they likely do in yours.
Nothing stays the same for long, however, especially in the fast-changing world of computing, so we are certain to see new storage technologies coming to the fore, perhaps with even more wow factor.
Before we get to what’s coming, let’s review the primary differences between HDDs and SSDs in a little more detail in the following table.
A Comparison of HDDs to SSDs
Power Draw/Battery Life
More power draw, averages 6–7 watts and therefore uses more battery
Less power draw, averages 2–3 watts, resulting in 30+ minute battery boost
Only around $0.03 per gigabyte, very cheap (buying a 4TB model)
Expensive, roughly $0.20- $0.30 per gigabyte (based on buying a 1TB drive)
Typically around 500GB and 2TB maximum for notebook size drives; 10TB max for desktops
Typically not larger than 1TB for notebook size drives; 4TB for desktops
Operating System Boot Time
Around 30-40 seconds average bootup time
Around 8-13 seconds average bootup time
Audible clicks and spinning platters can be heard
There are no moving parts, hence no sound
The spinning of the platters can sometimes result in vibration
No vibration as there are no moving parts
HDD doesn’t produce much heat, but it will have a measurable amount more heat than an SSD due to moving parts and higher power draw
Lower power draw and no moving parts so little heat is produced
Mean time between failure rate of 1.5 million hours
Mean time between failure rate of 2.0 million hours
File Copy / Write Speed
The range can be anywhere from 50–120MB/s
Generally above 200 MB/s and up to 550 MB/s for cutting edge drives
Full Disk Encryption (FDE) Supported on some models
Full Disk Encryption (FDE) Supported on some models
The HDD has an amazing history of improvement and innovation. From its inception in 1956 the hard drive has decreased in size 57,000 times, increased storage 1 million times, and decreased cost 2,000 times. In other words, the cost per gigabyte has decreased by 2 billion times in about 60 years.
Hard drive manufacturers made these dramatic advances by reducing the size, and consequently the seek times, of platters while increasing their density, improving disk reading technologies, adding multiple arms and read/write heads, developing better bus interfaces, and increasing spin speed and reducing friction with techniques such as filling drives with helium.
In 2005, the drive industry introduced perpendicular recording technology to replace the older longitudinal recording technology, which enabled areal density to reach more than 100 gigabits per square inch. Longitudinal recording aligns data bits horizontally in relation to the drive’s spinning platter, parallel to the surface of the disk, while perpendicular recording aligns bits vertically, perpendicular to the disk surface.
Other technologies such as bit patterned media recording (BPMR) are contributing to increased densities, as well. Introduced by Toshiba in 2010, BPMR is a proposed hard disk drive technology that could succeed perpendicular recording. It records data using nanolithography in magnetic islands, with one bit per island. This contrasts with current disk drive technology where each bit is stored in 20 to 30 magnetic grains within a continuous magnetic film.
Shingled magnetic recording (SMR) is a magnetic storage data recording technology used in HDDs to increase storage density and overall per-drive storage capacity. Shingled recording writes new tracks that overlap part of the previously written magnetic track, leaving the previous track narrower and allowing for higher track density. Thus, the tracks partially overlap similar to roof shingles. This approach was selected because physical limitations prevent recording magnetic heads from having the same width as reading heads, leaving recording heads wider.
Track Spacing Enabled by SMR Technology (Seagate)
To increase the amount of data stored on a drive’s platter requires cramming the magnetic regions closer together, which means the grains need to be smaller so they won’t interfere with each other. In 2002, Seagate successfully demoed heat-assisted magnetic recording (HAMR). HAMR records magnetically using laser-thermal assistance that ultimately could lead to a 20 terabyte drive by 2019. (See our post on HAMR by Seagate’s CTO Mark Re, What is HAMR and How Does It Enable the High-Capacity Needs of the Future?)
Western Digital claims that its competing microwave-assisted magnetic recording (MAMR) could enable drive capacity to increase up to 40TB by the year 2025. Some industry watchers and drive manufacturers predict increases in areal density from today’s .86 tbpsi terabit-per-square-inch (TBPSI) to 10 tbpsi by 2025 resulting in as much as 100TB drive capacity in the next decade.
The future certainly does look bright for HDDs continuing to be with us for a while.
The Outlook for SSDs
SSDs are also in for some amazing advances.
SATA (Serial Advanced Technology Attachment) is the common hardware interface that allows the transfer of data to and from HDDs and SSDs. SATA SSDs are fine for the majority of home users, as they are generally cheaper, operate at a lower speed, and have a lower write life.
While fine for everyday computing, in a RAID (Redundant Array of Independent Disks), server array or data center environment, often a better alternative has been to use ‘SAS’ drives, which stands for Serial Attached SCSI. This is another type of interface that, again, is usable either with HDDs or SSDs. ‘SCSI’ stands for Small Computer System Interface (which is why SAS drives are sometimes referred to as ‘scuzzy’ drives). SAS has increased IOPS (Inputs Outputs Per Second) over SATA, meaning it has the ability to read and write data faster. This has made SAS an optimal choice for systems that require high performance and availability.
On an enterprise level, SAS prevails over SATA, as SAS supports over-provisioning to prolong write life and has been specifically designed to run in environments that require constant drive usage.
PCIe (Peripheral Component Interconnect Express) is a high speed serial computer expansion bus standard that supports drastically higher data transfer rates over SATA or SAS interfaces due to the fact that there are more channels available for the flow of data.
Many leading drive manufacturers have been adopting PCIe as the standard for new home and enterprise storage and some peripherals. For example, you’ll see that the latest Apple Macbooks ship with PCIe-based flash storage, something that Apple has been adopting over the years with their consumer devices.
PCIe can also be used within data centers for RAID systems and to create high-speed networking capabilities, increasing overall performance and supporting the newer and higher capacity HDDs.
As we covered in Part 1, SSDs are based on a type of non-volatile flash memory called NAND.The latest trend in NAND flash is quad-level-cell (QLC) NAND. NAND is subdivided into types based on how many bits of data are stored in each physical memory cell. SLC (single-level-cell) stores one bit, MLC (multi-level-cell) stores two, TLC (triple-level cell) stores three, and QLC (quad-level-cell) stores four.
Storing more data per cell makes NAND more dense, but it also makes the memory slower — it takes more time to read and write data when so much additional information (and so many more charge states) are stored within the same cell of memory.
QLC NAND memory is built on older process nodes with larger cells that can more easily store multiple bits of data. The new NAND tech has higher overall reliability with higher total number of program / erase cycles (P/E cycles).
QLC NAND wafer from which individual microcircuits are made
QLC NAND promises to produce faster and denser SSDs. The effect on price also could be dramatic. Tom’s Hardware is predicting that the advent of QLC could push 512GB SSDs down to $100.
Beyond HDDs and SSDs
There is significant work being done that is pushing the bounds of data storage beyond what is possible with spinning platters and microcircuits. A team at Harvard University has used genome-editing to encode video into live bacteria.
We’ve already discussed the benefits of SSDs. The benefits of SSDs that apply particularly to the data center are:
Low power consumption — When you are running lots of drives, power usage adds up. Anywhere you can conserve power is a win.
Speed — Data can be accessed faster, which is especially beneficial for caching databases and other data affecting overall application or system performance.
Lack of vibration — Reducing vibration improves reliability thereby reducing problems and maintenance. Racks don’t need the size and structural rigidity housing SSDs that they need housing HDDs.
Low noise — Data centers will become quieter as more SSDs are deployed.
Low heat production — The less heat generated the less cooling and power required in the data center.
Faster booting — The faster a storage chassis can get online or a critical server can be rebooted after maintenance or a problem, the better.
Greater areal density — Data centers will be able to store more data in less space, which increases efficiency in all areas (power, cooling, etc.)
The top drive manufacturers say that they expect HDDs and SSDs to coexist for the foreseeable future in all areas — home, business, and data center, with customers choosing which technology and product will best fit their application.
How Backblaze Uses SSDs
In just about all respects, SSDs are superior to HDDs. So why don’t we replace the 100,000+ hard drives we have spinning in our data centers with SSDs?
Our operations team takes advantage of the benefits and savings of SSDs wherever they can, using them in every place that’s appropriate other than primary data storage. They’re particularly useful in our caching and restore layers, where we use them strategically to speed up data transfers. SSDs also speed up access to B2 Cloud Storage metadata. Our operations teams is considering moving to SSDs to boot our Storage Pods, where the cost of a small SSD is competitive with hard drives, and their other attributes (small size, lack of vibration, speed, low-power consumption, reliability) are all pluses.
A Future with Both HDDs and SSDs
IDC predicts that total data created will grow from approximately 33 zettabytes in 2018 to about 160 zettabytes in 2025. (See What’s a Byte? if you’d like help understanding the size of a zettabyte.)
Annual Size of the Global Datasphere
Over 90% of enterprise drive shipments today are HDD, according to IDC. By 2025, SSDs will comprise almost 20% of drive shipments. SSDs will gain share, but total growth in data created will result in massive sales of both HDDs and SSDs.
Enterprise Byte Shipments: NDD and SSD
As both HDD and SSD sales grow, so does the capacity of both technologies. Given the benefits of SSDs in many applications, we’re likely going to see SSDs replacing HDDs in all but the highest capacity uses.
It’s clear that there are merits to both HDDs and SSDs. If you’re not running a data center, and don’t have more than one or two terabytes of data to store on your home or business computer, your first choice likely should be an SSD. They provide a noticeable improvement in performance during boot-up and data transfer, and are smaller, quieter, and more reliable as well. Save the HDDs for secondary drives, NAS, RAID, and local backup devices in your system.
Perhaps some day we’ll look back at the days of spinning platters with the same nostalgia we look back at stereo LPs, and some of us will have an HDD paperweight on our floating anti-gravity desk as a conversation piece. Until the day that SSD’s performance, capacity, and finally, price, expel the last HDD out of the home and data center, we can expect to live in a world that contains both solid state SSDs and magnetic platter HDDs, and as users we will reap the benefits from both technologies.
Don’t miss future posts on HDDs, SSDs, and other topics, including hard drive stats, cloud storage, and tips and tricks for backing up to the cloud. Use the Join button above to receive notification of future posts on our blog.
Bug bounties end up in the news with some regularity, usually for the wrong reasons. I’ve been itching to write about that for a while – but instead of dwelling on the mistakes of the bygone days, I figured it may be better to talk about some of the ways to get vulnerability rewards right.
What do you get out of bug bounties?
There’s plenty of differing views, but I like to think of such programs simply as a bid on researchers’ time. In the most basic sense, you get three benefits:
Improved ability to detect bugs in production before they become major incidents.
A comparatively unbiased feedback loop to help you prioritize and measure other security work.
A robust talent pipeline for when you need to hire.
What bug bounties don’t offer?
You don’t get anything resembling a comprehensive security program or a systematic assessment of your platforms. Researchers end up looking for bugs that offer favorable effort-to-payoff ratios for their skills and given the very imperfect information they have about your enterprise. In other words, you may end up with a hundred people looking for XSS and just one person looking for RCE.
Your reward structure can steer them toward the targets and bugs you care about, but it’s difficult to fully eliminate this inherent skew. There’s only so far you can jack up your top-tier rewards, and only so far you can go lowering the bottom-tier ones.
Don’t you have to outcompete the black market to get all the “good” bugs?
There is a free market price discovery component to it all: if you’re not getting the engagement you were hoping for, you should probably consider paying more.
That said, there are going to be researchers who’d rather hurt you than work for you, no matter how much you pay; you don’t have to win them over, and you don’t have to outspend every authoritarian government or every crime syndicate. A bug bounty is effective simply if it attracts enough eyeballs to make bugs statistically harder to find, and reduces the useful lifespan of any zero-days in black market trade. Plus, most researchers don’t want their work to be used to crack down on dissidents in Egypt or Vietnam.
Another factor is that you’re paying for different things: a black market buyer probably wants a reliable exploit capable of delivering payloads, and then demands silence for months or years to come; a vendor-run bug bounty program is usually perfectly happy with a reproducible crash and doesn’t mind a researcher blogging about their work.
In fact, while money is important, you will probably find out that it’s not enough to retain your top talent; many folks want bug bounties to be more than a business transaction, and find a lot of value in having a close relationship with your security team, comparing notes, and growing together. Fostering that partnership can be more important than adding another $10,000 to your top reward.
How do I prevent it all from going horribly wrong?
Bug bounties are an unfamiliar beast to most lawyers and PR folks, so it’s a natural to be wary and try to plan for every eventuality with pages and pages of impenetrable rules and fine-print legalese.
This is generally unnecessary: there is a strong self-selection bias, and almost every participant in a vulnerability reward program will be coming to you in good faith. The more friendly, forthcoming, and approachable you seem, and the more you treat them like peers, the more likely it is for your relationship to stay positive. On the flip side, there is no faster way to make enemies than to make a security researcher feel that they are now talking to a lawyer or to the PR dept.
Most people have strong opinions on disclosure policies; instead of imposing your own views, strive to patch reported bugs reasonably quickly, and almost every reporter will play along. Demand researchers to cancel conference appearances, take down blog posts, or sign NDAs, and you will sooner or later end up in the news.
But what if that’s not enough?
As with any business endeavor, mistakes will happen; total risk avoidance is seldom the answer. Learn to sincerely apologize for mishaps; it’s not a sign of weakness to say “sorry, we messed up”. And you will almost certainly not end up in the courtroom for doing so.
It’s good to foster a healthy and productive relationship with the community, so that they come to your defense when something goes wrong. Encouraging people to disclose bugs and talk about their experiences is one way of accomplishing that.
What about extortion?
You should structure your program to naturally discourage bad behavior and make it stand out like a sore thumb. Require bona fide reports with complete technical details before any reward decision is made by a panel of named peers; and make it clear that you never demand non-disclosure as a condition of getting a reward.
To avoid researchers accidentally putting themselves in awkward situations, have clear rules around data exfiltration and lateral movement: assure them that you will always pay based on the worst-case impact of their findings; in exchange, ask them to stop as soon as they get a shell and never access any data that isn’t their own.
So… are there any downsides?
Yep. Other than souring up your relationship with the community if you implement your program wrong, the other consideration is that bug bounties tend to generate a lot of noise from well-meaning but less-skilled researchers.
When this happens, do not get frustrated and do not penalize such participants; instead, help them grow. Consider publishing educational articles, giving advice on how to investigate and structure reports, or offering free workshops every now and then.
The other downside is cost; although bug bounties tend to offer far more bang for your buck than your average penetration test, they are more random. The annual expenses tend to be fairly predictable, but there is always some possibility of having to pay multiple top-tier rewards in rapid succession. This is the kind of uncertainty that many mid-level budget planners react badly to.
Finally, you need to be able to fix the bugs you receive. It would be nuts to prefer to not know about the vulnerabilities in the first place – but once you invite the research, the clock starts ticking and you need to ship fixes reasonably fast.
So… should I try it?
There are folks who enthusiastically advocate for bug bounties in every conceivable situation, and people who dislike them with fierce passion; both sentiments are usually strongly correlated with the line of business they are in.
In reality, bug bounties are not a cure-all, and there are some ways to make them ineffectual or even dangerous. But they are not as risky or expensive as most people suspect, and when done right, they can actually be fun for your team, too. You won’t know for sure until you try.
The Free Software Foundation has announced the availability of its 2016 annual report. “The Annual Report reviews the Foundation’s activities, accomplishments, and financial picture from October 1, 2015 to September 30, 2016. It is the result of a full external financial audit, along with a focused study of program results.” It may lack punctuality, but it makes up for it in glitz.
Apache Cassandra is a commonly used, high performance NoSQL database. AWS customers that currently maintain Cassandra on-premises may want to take advantage of the scalability, reliability, security, and economic benefits of running Cassandra on Amazon EC2.
Amazon EC2 and Amazon Elastic Block Store (Amazon EBS) provide secure, resizable compute capacity and storage in the AWS Cloud. When combined, you can deploy Cassandra, allowing you to scale capacity according to your requirements. Given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.
In this post, we outline three Cassandra deployment options, as well as provide guidance about determining the best practices for your use case in the following areas:
Cassandra resource overview
High availability and resiliency
Before we jump into best practices for running Cassandra on AWS, we should mention that we have many customers who decided to use DynamoDB instead of managing their own Cassandra cluster. DynamoDB is fully managed, serverless, and provides multi-master cross-region replication, encryption at rest, and managed backup and restore. Integration with AWS Identity and Access Management (IAM) enables DynamoDB customers to implement fine-grained access control for their data security needs.
AWS provides options, so you’re covered whether you want to run your own NoSQL Cassandra database, or move to a fully managed, serverless DynamoDB database.
Cassandra resource overview
Here’s a short introduction to standard Cassandra resources and how they are implemented with AWS infrastructure. If you’re already familiar with Cassandra or AWS deployments, this can serve as a refresher.
A single Cassandra deployment.
This typically consists of multiple physical locations, keyspaces, and physical servers.
A logical deployment construct in AWS that maps to an AWS CloudFormation StackSet, which consists of one or many CloudFormation stacks to deploy Cassandra.
A group of nodes configured as a single replication group.
A logical deployment construct in AWS.
A datacenter is deployed with a single CloudFormation stack consisting of Amazon EC2 instances, networking, storage, and security resources.
A collection of servers.
A datacenter consists of at least one rack. Cassandra tries to place the replicas on different racks.
A single Availability Zone.
A physical virtual machine running Cassandra software.
An EC2 instance.
Conceptually, the data managed by a cluster is represented as a ring. The ring is then divided into ranges equal to the number of nodes. Each node being responsible for one or more ranges of the data. Each node gets assigned with a token, which is essentially a random number from the range. The token value determines the node’s position in the ring and its range of data.
Managed within Cassandra.
Virtual node (vnode)
Responsible for storing a range of data. Each vnode receives one token in the ring. A cluster (by default) consists of 256 tokens, which are uniformly distributed across all servers in the Cassandra datacenter.
Managed within Cassandra.
The total number of replicas across the cluster.
Managed within Cassandra.
One of the many benefits of deploying Cassandra on Amazon EC2 is that you can automate many deployment tasks. In addition, AWS includes services, such as CloudFormation, that allow you to describe and provision all your infrastructure resources in your cloud environment.
We recommend orchestrating each Cassandra ring with one CloudFormation template. If you are deploying in multiple AWS Regions, you can use a CloudFormation StackSet to manage those stacks. All the maintenance actions (scaling, upgrading, and backing up) should be scripted with an AWS SDK. These may live as standalone AWS Lambda functions that can be invoked on demand during maintenance.
You can get started by following the Cassandra Quick Start deployment guide. Keep in mind that this guide does not address the requirements to operate a production deployment and should be used only for learning more about Cassandra.
In this section, we discuss various deployment options available for Cassandra in Amazon EC2. A successful deployment starts with thoughtful consideration of these options. Consider the amount of data, network environment, throughput, and availability.
Single AWS Region, 3 Availability Zones
Single region, 3 Availability Zones
In this pattern, you deploy the Cassandra cluster in one AWS Region and three Availability Zones. There is only one ring in the cluster. By using EC2 instances in three zones, you ensure that the replicas are distributed uniformly in all zones.
To ensure the even distribution of data across all Availability Zones, we recommend that you distribute the EC2 instances evenly in all three Availability Zones. The number of EC2 instances in the cluster is a multiple of three (the replication factor).
This pattern is suitable in situations where the application is deployed in one Region or where deployments in different Regions should be constrained to the same Region because of data privacy or other legal requirements.
● Highly available, can sustain failure of one Availability Zone.
● Simple deployment
● Does not protect in a situation when many of the resources in a Region are experiencing intermittent failure.
In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.
We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.
This pattern is most suitable when the applications using the Cassandra cluster are deployed in more than one Region.
● No data loss during failover.
● Highly available, can sustain when many of the resources in a Region are experiencing intermittent failures.
● Read/write traffic can be localized to the closest Region for the user for lower latency and higher performance.
● High operational overhead
● The second Region effectively doubles the cost
In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.
However, the second Region does not receive traffic from the applications. It only functions as a secondary location for disaster recovery reasons. If the primary Region is not available, the second Region receives traffic.
We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.
This pattern is most suitable when the applications using the Cassandra cluster require low recovery point objective (RPO) and recovery time objective (RTO).
● No data loss during failover.
● Highly available, can sustain failure or partitioning of one whole Region.
● High operational overhead.
● High latency for writes for eventual consistency.
● The second Region effectively doubles the cost.
In on-premises deployments, Cassandra deployments use local disks to store data. There are two storage options for EC2 instances:
Your choice of storage is closely related to the type of workload supported by the Cassandra cluster. Instance store works best for most general purpose Cassandra deployments. However, in certain read-heavy clusters, Amazon EBS is a better choice.
The choice of instance type is generally driven by the type of storage:
If ephemeral storage is required for your application, a storage-optimized (I3) instance is the best option.
If your workload requires Amazon EBS, it is best to go with compute-optimized (C5) instances.
Burstable instance types (T2) don’t offer good performance for Cassandra deployments.
Ephemeral storage is local to the EC2 instance. It may provide high input/output operations per second (IOPs) based on the instance type. An SSD-based instance store can support up to 3.3M IOPS in I3 instances. This high performance makes it an ideal choice for transactional or write-intensive applications such as Cassandra.
In general, instance storage is recommended for transactional, large, and medium-size Cassandra clusters. For a large cluster, read/write traffic is distributed across a higher number of nodes, so the loss of one node has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important.
As an example, for a cluster with 100 nodes, the loss of 1 node is 3.33% loss (with a replication factor of 3). Similarly, for a cluster with 10 nodes, the loss of 1 node is 33% less capacity (with a replication factor of 3).
(translates to higher query performance)
Up to 3.3M on I3
This results in a higher query performance on each host. However, Cassandra implicitly scales well in terms of horizontal scale. In general, we recommend scaling horizontally first. Then, scale vertically to mitigate specific issues.
Note: 3.3M IOPS is observed with 100% random read with a 4-KB block size on Amazon Linux.
AWS instance types
Compute optimized, C5
Being able to choose between different instance types is an advantage in terms of CPU, memory, etc., for horizontal and vertical scaling.
Basic building blocks are available from AWS.
Amazon EBS offers distinct advantage here. It is small engineering effort to establish a backup/restore strategy.
a) In case of an instance failure, the EBS volumes from the failing instance are attached to a new instance.
b) In case of an EBS volume failure, the data is restored by creating a new EBS volume from last snapshot.
EBS volumes offer higher resiliency, and IOPs can be configured based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. EBS volumes can support up to 32K IOPS per volume and up to 80K IOPS per instance in RAID configuration. They have an annualized failure rate (AFR) of 0.1–0.2%, which makes EBS volumes 20 times more reliable than typical commodity disk drives.
The primary advantage of using Amazon EBS in a Cassandra deployment is that it reduces data-transfer traffic significantly when a node fails or must be replaced. The replacement node joins the cluster much faster. However, Amazon EBS could be more expensive, depending on your data storage needs.
Cassandra has built-in fault tolerance by replicating data to partitions across a configurable number of nodes. It can not only withstand node failures but if a node fails, it can also recover by copying data from other replicas into a new node. Depending on your application, this could mean copying tens of gigabytes of data. This adds additional delay to the recovery process, increases network traffic, and could possibly impact the performance of the Cassandra cluster during recovery.
Data stored on Amazon EBS is persisted in case of an instance failure or termination. The node’s data stored on an EBS volume remains intact and the EBS volume can be mounted to a new EC2 instance. Most of the replicated data for the replacement node is already available in the EBS volume and won’t need to be copied over the network from another node. Only the changes made after the original node failed need to be transferred across the network. That makes this process much faster.
EBS volumes are snapshotted periodically. So, if a volume fails, a new volume can be created from the last known good snapshot and be attached to a new instance. This is faster than creating a new volume and coping all the data to it.
Most Cassandra deployments use a replication factor of three. However, Amazon EBS does its own replication under the covers for fault tolerance. In practice, EBS volumes are about 20 times more reliable than typical disk drives. So, it is possible to go with a replication factor of two. This not only saves cost, but also enables deployments in a region that has two Availability Zones.
EBS volumes are recommended in case of read-heavy, small clusters (fewer nodes) that require storage of a large amount of data. Keep in mind that the Amazon EBS provisioned IOPS could get expensive. General purpose EBS volumes work best when sized for required performance.
If your cluster is expected to receive high read/write traffic, select an instance type that offers 10–Gb/s performance. As an example, i3.8xlarge and c5.9xlarge both offer 10–Gb/s networking performance. A smaller instance type in the same family leads to a relatively lower networking throughput.
Cassandra generates a universal unique identifier (UUID) for each node based on IP address for the instance. This UUID is used for distributing vnodes on the ring.
In the case of an AWS deployment, IP addresses are assigned automatically to the instance when an EC2 instance is created. With the new IP address, the data distribution changes and the whole ring has to be rebalanced. This is not desirable.
To preserve the assigned IP address, use a secondary elastic network interface with a fixed IP address. Before swapping an EC2 instance with a new one, detach the secondary network interface from the old instance and attach it to the new one. This way, the UUID remains same and there is no change in the way that data is distributed in the cluster.
If you are deploying in more than one region, you can connect the two VPCs in two regions using cross-region VPC peering.
High availability and resiliency
Cassandra is designed to be fault-tolerant and highly available during multiple node failures. In the patterns described earlier in this post, you deploy Cassandra to three Availability Zones with a replication factor of three. Even though it limits the AWS Region choices to the Regions with three or more Availability Zones, it offers protection for the cases of one-zone failure and network partitioning within a single Region. The multi-Region deployments described earlier in this post protect when many of the resources in a Region are experiencing intermittent failure.
Resiliency is ensured through infrastructure automation. The deployment patterns all require a quick replacement of the failing nodes. In the case of a regionwide failure, when you deploy with the multi-Region option, traffic can be directed to the other active Region while the infrastructure is recovering in the failing Region. In the case of unforeseen data corruption, the standby cluster can be restored with point-in-time backups stored in Amazon S3.
In this section, we look at ways to ensure that your Cassandra cluster is healthy:
Backup and restore
Cassandra is horizontally scaled by adding more instances to the ring. We recommend doubling the number of nodes in a cluster to scale up in one scale operation. This leaves the data homogeneously distributed across Availability Zones. Similarly, when scaling down, it’s best to halve the number of instances to keep the data homogeneously distributed.
Cassandra is vertically scaled by increasing the compute power of each node. Larger instance types have proportionally bigger memory. Use deployment automation to swap instances for bigger instances without downtime or data loss.
All three types of upgrades (Cassandra, operating system patching, and instance type changes) follow the same rolling upgrade pattern.
In this process, you start with a new EC2 instance and install software and patches on it. Thereafter, remove one node from the ring. For more information, see Cassandra cluster Rolling upgrade. Then, you detach the secondary network interface from one of the EC2 instances in the ring and attach it to the new EC2 instance. Restart the Cassandra service and wait for it to sync. Repeat this process for all nodes in the cluster.
Backup and restore
Your backup and restore strategy is dependent on the type of storage used in the deployment. Cassandra supports snapshots and incremental backups. When using instance store, a file-based backup tool works best. Customers use rsync or other third-party products to copy data backups from the instance to long-term storage. For more information, see Backing up and restoring data in the DataStax documentation. This process has to be repeated for all instances in the cluster for a complete backup. These backup files are copied back to new instances to restore. We recommend using S3 to durably store backup files for long-term storage.
For Amazon EBS based deployments, you can enable automated snapshots of EBS volumes to back up volumes. New EBS volumes can be easily created from these snapshots for restoration.
We recommend that you think about security in all aspects of deployment. The first step is to ensure that the data is encrypted at rest and in transit. The second step is to restrict access to unauthorized users. For more information about security, see the Cassandra documentation.
Encryption at rest
Encryption at rest can be achieved by using EBS volumes with encryption enabled. Amazon EBS uses AWS KMS for encryption. For more information, see Amazon EBS Encryption.
Instance store–based deployments require using an encrypted file system or an AWS partner solution. If you are using DataStax Enterprise, it supports transparent data encryption.
Encryption in transit
Cassandra uses Transport Layer Security (TLS) for client and internode communications.
The security mechanism is pluggable, which means that you can easily swap out one authentication method for another. You can also provide your own method of authenticating to Cassandra, such as a Kerberos ticket, or if you want to store passwords in a different location, such as an LDAP directory.
The authorizer that’s plugged in by default is org.apache.cassandra.auth.Allow AllAuthorizer. Cassandra also provides a role-based access control (RBAC) capability, which allows you to create roles and assign permissions to these roles.
In this post, we discussed several patterns for running Cassandra in the AWS Cloud. This post describes how you can manage Cassandra databases running on Amazon EC2. AWS also provides managed offerings for a number of databases. To learn more, see Purpose-built databases for all your application needs.
If you have questions or suggestions, please comment below.
Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.
Provanshu Dey is a Senior IoT Consultant with AWS Professional Services. He works on highly scalable and reliable IoT, data and machine learning solutions with our customers. In his spare time, he enjoys spending time with his family and tinkering with electronics & gadgets.
Beginning in April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. As of the end of 2017, there are about 88 million entries totaling 23 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.
At the end of 2017 we had 93,240 spinning hard drives. Of that number, there were 1,935 boot drives and 91,305 data drives. This post looks at the hard drive statistics of the data drives we monitor. We’ll review the stats for Q4 2017, all of 2017, and the lifetime statistics for all of the drives Backblaze has used in our cloud storage data centers since we started keeping track. Along the way we’ll share observations and insights on the data presented and we look forward to you doing the same in the comments.
Hard Drive Reliability Statistics for Q4 2017
At the end of Q4 2017 Backblaze was monitoring 91,305 hard drives used to store data. For our evaluation we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives (read why after the chart). This leaves us with 91,243 hard drives. The table below is for the period of Q4 2017.
A few things to remember when viewing this chart:
The failure rate listed is for just Q4 2017. If a drive model has a failure rate of 0%, it means there were no drive failures of that model during Q4 2017.
There were 62 drives (91,305 minus 91,243) that were not included in the list above because we did not have at least 45 of a given drive model. The most common reason we would have fewer than 45 drives of one model is that we needed to replace a failed drive and we had to purchase a different model as a replacement because the original model was no longer available. We use 45 drives of the same model as the minimum number to qualify for reporting quarterly, yearly, and lifetime drive statistics.
Quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of drive days. For example, the Seagate 4 TB drive, model ST4000DM005, has a annualized failure rate of 29.08%, but that is based on only 1,255 drive days and 1 (one) drive failure.
AFR stands for Annualized Failure Rate, which is the projected failure rate for a year based on the data from this quarter only.
Bulking Up and Adding On Storage
Looking back over 2017, we not only added new drives, we “bulked up” by swapping out functional and smaller 2, 3, and 4TB drives with larger 8, 10, and 12TB drives. The changes in drive quantity by quarter are shown in the chart below:
For 2017 we added 25,746 new drives, and lost 6,442 drives to retirement for a net of 19,304 drives. When you look at storage space, we added 230 petabytes and retired 19 petabytes, netting us an additional 211 petabytes of storage in our data center in 2017.
2017 Hard Drive Failure Stats
Below are the lifetime hard drive failure statistics for the hard drive models that were operational at the end of Q4 2017. As with the quarterly results above, we have removed any non-production drives and any models that had fewer than 45 drives.
The chart above gives us the lifetime view of the various drive models in our data center. The Q4 2017 chart at the beginning of the post gives us a snapshot of the most recent quarter of the same models.
Let’s take a look at the same models over time, in our case over the past 3 years (2015 through 2017), by looking at the annual failure rates for each of those years.
The failure rate for each year is calculated for just that year. In looking at the results the following observations can be made:
The failure rates for both of the 6 TB models, Seagate and WDC, have decreased over the years while the number of drives has stayed fairly consistent from year to year.
While it looks like the failure rates for the 3 TB WDC drives have also decreased, you’ll notice that we migrated out nearly 1,000 of these WDC drives in 2017. While the remaining 180 WDC 3 TB drives are performing very well, decreasing the data set that dramatically makes trend analysis suspect.
The Toshiba 5 TB model and the HGST 8 TB model had zero failures over the last year. That’s impressive, but with only 45 drives in use for each model, not statistically useful.
The HGST/Hitachi 4 TB models delivered sub 1.0% failure rates for each of the three years. Amazing.
A Few More Numbers
To save you countless hours of looking, we’ve culled through the data to uncover the following tidbits regarding our ever changing hard drive farm.
116,833 — The number of hard drives for which we have data from April 2013 through the end of December 2017. Currently there are 91,305 drives (data drives) in operation. This means 25,528 drives have either failed or been removed from service due for some other reason — typically migration.
29,844 — The number of hard drives that were installed in 2017. This includes new drives, migrations, and failure replacements.
81.76 — The number of hard drives that were installed each day in 2017. This includes new drives, migrations, and failure replacements.
95,638 — The number of drives installed since we started keeping records in April 2013 through the end of December 2017.
55.41 — The average number of hard drives installed per day from April 2013 to the end of December 2017. The installations can be new drives, migration replacements, or failure replacements.
1,508 — The number of hard drives that were replaced as failed in 2017.
4.13 — The average number of hard drives that have failed each day in 2017.
6,795 — The number of hard drives that have failed from April 2013 until the end of December 2017.
3.94 — The average number of hard drives that have failed each day from April 2013 until the end of December 2017.
Can’t Get Enough Hard Drive Stats?
We’ll be presenting the webinar “Backblaze Hard Drive Stats for 2017” on Thursday February 9, 2017 at 10:00 Pacific time. The webinar will dig deeper into the quarterly, yearly, and lifetime hard drive stats and include the annual and lifetime stats by drive size and manufacturer. You will need to subscribe to the Backblaze BrightTALK channel to view the webinar. Sign up today.
As a reminder, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone — it is free.
Good luck and let us know if you find anything interesting.
Local residents are opposing adding an elevator to a subway station because terrorists might use it to detonate a bomb. No, really. There’s no actual threat analysis, only fear:
“The idea that people can then ride in on the subway with a bomb or whatever and come straight up in an elevator is awful to me,” said Claudia Ward, who lives in 15 Broad Street and was among a group of neighbors who denounced the plan at a recent meeting of the local community board. “It’s too easy for someone to slip through. And I just don’t want my family and my neighbors to be the collateral on that.”
Local residents plan to continue to fight, said Ms. Gerstman, noting that her building’s board decided against putting decorative planters at the building’s entrance over fears that shards could injure people in the event of a blast.
“Knowing that, and then seeing the proposal for giant glass structures in front of my building - ding ding ding! — what does a giant glass structure become in the event of an explosion?” she said.
In 2005, I coined the term “movie-plot threat” to denote a threat scenario that caused undue fear solely because of its specificity. Longtime readers of this blog will remember my annual Movie-Plot Threat Contests. I ended the contest in 2015 because I thought the meme had played itself out. Clearly there’s more work to be done.
As we head into 2018 and start looking forward to longer days in the Northern hemisphere, I thought I’d take a look back at last year’s weather using data from Raspberry Pi Oracle Weather Stations. One of the great things about the kit is that as well as uploading all its readings to the shared online Oracle database, it stores them locally on the Pi in a MySQL or MariaDB database. This means you can use the power of SQL queries coupled with Python code to do automatic data analysis.
My Weather Station has only been installed since May, so I didn’t have a full 52 weeks of my own data to investigate. Still, my station recorded more than 70000 measurements. Living in England, the first thing I wanted to know was: which was the wettest month? Unsurprisingly, both in terms of average daily rainfall and total rainfall, the start of the summer period — exactly when I went on a staycation — was the soggiest:
What about the global Weather Station community?
Even soggier Bavaria
Here things get slightly trickier. Although we have a shiny Oracle database full of all participating schools’ sensor readings, some of the data needs careful interpretation. Many kits are used as part of the school curriculum and do not always record genuine outdoor conditions. Nevertheless, it appears that Adalbert Stifter Gymnasium in Bavaria, Germany, had an even wetter 2017 than my home did:
The records Robert-Dannemann Schule in Westerstede, Germany, is a good example of data which was most likely collected while testing and investigating the weather station sensors, rather than in genuine external conditions. Unless this school’s Weather Station was transported to a planet which suffers from extreme hurricanes, it wasn’t actually subjected to wind speeds above 1000km/h in November. Dismissing these and all similarly suspect records, I decided to award the ‘Windiest location of the year’ prize to CEIP Noalla-Telleiro, Spain.
This school is right on the coast, and is subject to some strong and squally weather systems.
Weather Station at CEIP Noalla-Telleiro
They’ve mounted their wind vane and anemometer nice and high, so I can see how they were able to record such high wind velocities.
A couple of Weather Stations have recently been commissioned in equally exposed places — it will be interesting to see whether they will record even higher speeds during 2018.
Highs and lows
After careful analysis and a few disqualifications (a couple of Weather Stations in contention for this category were housed indoors), the ‘Hottest location’ award went to High School of Chalastra in Thessaloniki, Greece. There were a couple of Weather Stations (the one at The Marwadi Education Foundation in India, for example) that reported higher average temperatures than Chalastra’s 24.54 ºC. However, they had uploaded far fewer readings and their data coverage of 2017 was only partial.
At the other end of the thermometer, the location with the coldest average temperature is École de la Rose Sauvage in Calgary, Canada, with a very chilly 9.9 ºC.
Weather Station at École de la Rose Sauvage
I suspect this school has a good chance of retaining the title: their lowest 2017 temperature of -24 ºC is likely to be beaten in 2018 due to extreme weather currently bringing a freezing start to the year in that part of the world.
If you have an Oracle Raspberry Pi Weather Station and would like to perform an annual review of your local data, you can use this Python script as a starting point. It will display a monthly summary of the temperature and rainfall for 2017, and you should be able to customise the code to focus on other sensor data or on a particular time of year. We’d love to see your results, so please share your findings with [email protected], and we’ll send you some limited-edition Weather Station stickers.
Coolest Projects is a world-leading annual showcase that empowers and inspires the next generation of digital creators, innovators, changemakers, and entrepreneurs. Young people come to the event to exhibit the cool ideas they have been working on throughout the year. And from 2018, Coolest Projects is open to young people across the Raspberry Pi community.
Coolest Projects is a world leading showcase that empowers and inspires the next generation of digital creators, innovators, changemakers and entrepreneurs! Find out more at: http://coolestprojects.org/
A huge fair for digital making
When Raspberry Pi’s Philip and Ben first visited Coolest Projects, they were blown away by the scope of the event, the number of children and young people who had travelled to Dublin to share their work, and the commitment they demonstrated to work ranging from Scratch projects to home-made hovercraft.
Coolest Projects International 2018 will be held in Dublin, Ireland, on Saturday 26 May. Participants will travel from all over the world to take part in a festival of creativity and tech. We hope you’ll be among them!
“It’s a huge fair especially for coding and digital tech – it’s massive and it’s amazing!“
Coolest Projects International and Coolest Projects UK
As well as the flagship international event in Dublin, Ireland, there are regional events in other countries. All these events are now open to makers and creators across the Raspberry Pi community, from Dojos, Code Clubs, and Raspberry Jams.
This year, for the first time, we are bringing Coolest Projects to the UK for a spectacular regional event! Coolest Projects UK will be held at Here East in London on Saturday 28 April. We’re looking forward to discovering over 100 projects that young people have designed and built, and seeing them share their ideas and their passion for technology, make new friends, and learn from one another.
Fierce focus at Coolest Projects
Who can take part?
If you’re up to 18 years of age and you’re in primary, secondary, or further education, you can join in. You can work as an individual or as part of a team of up to five. All projects are welcome, whether you’re a beginner or a seasoned expert.
You must be able to attend the event that you’re entering, whether Coolest Projects International or a regional event. Getting together with other makers and their fantastic projects is a really important and exciting part of the event, so you can’t take part with an online-only or video-only entry. There are a few rules to make sure everything runs smoothly and fairly, and you can read them here.
Wiktoria Jarymowicz from Poland presents the rocket she built at Coolest Projects
How do I join in?
Your project should fit into one of six broad categories, covering everything from Scratch to hardware projects. If you’ve made something with tech, or you’ve got a project idea, it will probably fit into one of them! Once you’ve picked your project, you need to register it and apply for your space at the event. You can register for Coolest Projects International 2018 right now, and registration for Coolest Projects UK 2018 will open on Wednesday: join our email list to get an update when it does.
How will you choose who gets a place?
There are places available for 750 projects, and our goal is to have enough room for everyone who wants to come. If more makers want to bring their projects than there are places available, we’ll select entries to show a balance of projects from different regions and different parts of our communities, from groups and individuals, and from girls and boys, as well as a good mixture of projects across different categories.
I need help to get started, or help to get there
To help get your ideas flowing and guide you through your project, we’ve prepared a set of How to build a project worksheets. And if you’d like to attend Coolest Projects International, but the cost of travel is a problem, you can apply for a travel bursary by 31 January.
Coolest Projects is about rewarding creativity, and we know the Raspberry Pi community has that in spades. It’s about having an idea and making it a reality using the skills you have, whether this is your first project or your fifteenth. We can’t wait to see you at Coolest Projects UK or Coolest Projects International this year!
We recently launched AWS Architecture Monthly, a new subscription service on Kindle that will push a selection of the best content around cloud architecture from AWS, with a few pointers to other content you might also enjoy.
From building a simple website to crafting an AI-based chat bot, the choices of technologies and the best practices in how to apply them are constantly evolving. Our goal is to supply you each month with a broad selection of the best new tech content from AWS — from deep-dive tutorials to industry-trend articles.
With your free subscription, you can look forward to fresh content delivered directly to your Kindledevice or Kindle app including: – Technical whitepapers – Reference architectures – New solutions and implementation guides – Training and certification opportunities – Industry trends
The January issue is now live. This month includes: – AWS Architecture Blog: Glenn Gore’s Take on re:Invent 2017 (Chief Architect for AWS) – AWS Reference Architectures: Java Microservices Deployed on EC2 Container Service; Node.js Microservices Deployed on EC2 Container Service – AWS Training & Certification: AWS Certified Solutions Architect – Associate – Sample Code: aws-serverless-express – Technical Whitepaper: Serverless Architectures with AWS Lambda – Overview and Best Practices
At this time, Architecture Monthly annual subscriptions are only available in the France (new), US, UK, and Germany. As more countries become available, we’ll update you here on the blog. For Amazon.com countries not listed above, we are offering single-issue downloads — also accessible from our landing page. The content is the same as in the subscription but requires individual-issue downloads.
FAQ I have to submit my credit card information for a free subscription? While you do have to submit your card information at this time (as you would for a free book in the Kindle store), it won’t be charged. This will remain a free, annual subscription and includes all 10 issues for the year.
Why isn’t the subscription available everywhere? As new countries get added to Kindle Newsstand, we’ll ensure we add them for Architecture Monthly. This month we added France but anticipate it will take some time for the new service to move into additional markets.
What countries are included in the Amazon.com list where the issues can be downloaded? Andorra, Australia, Austria, Belgium, Brazil, Canada, Gibraltar, Guernsey, India, Ireland, Isle of Man, Japan, Jersey, Liechtenstein, Luxembourg, Mexico, Monaco, Netherlands, New Zealand, San Marino, Spain, Switzerland, Vatican City
Hey folks, Rob from The MagPi here! We know many people might be getting their very first Raspberry Pi this Christmas, and excitedly wondering “what do I do with it?” While we can’t tell you exactly what to do with your Pi, we can show you how to immerse yourself in the world of Raspberry Pi and be inspired by our incredible community, and that’s the topic of The MagPi 65, out today tomorrow (we’re a day early because we’re simply TOO excited about the special announcement below!).
The one, the only…issue 65!
Raspberry Pi for Newbies
Raspberry Pi for Newbies covers some of the very basics you should know about the world of Raspberry Pi. After a quick set-up tutorial, we introduce you to the Raspberry Pi’s free online resources, including Scratch and Python projects from Code Club, before guiding you through the wider Raspberry Pi and maker community.
Pages and pages of useful advice and starter projects
The online community is an amazing place to learn about all the incredible things you can do with the Raspberry Pi. We’ve included some information on good places to look for tutorials, advice and ideas.
And that’s not all
Want to do more after learning about the world of Pi? The rest of the issue has our usual selection of expert guides to help you build some amazing projects: you can make a Christmas memory game, build a tower of bells to ring in the New Year, and even take your first steps towards making a game using C++.
Midimutant, the synthesizer “that boinks endless strange sounds”
All this along with inspiring projects, definitive reviews, and tales from around the community.
Raspberry Pi Annual
Issue 65 isn’t the only new release to look out for. We’re excited to bring you the first ever Raspberry Pi Annual, and it’s free for MagPi subscribers – in fact, subscribers should be receiving it the same day as their issue 65 delivery!
If you’re not yet a subscriber of The MagPi, don’t panic: you can still bag yourself a copy of the Raspberry Pi Annual by signing up to a 12-month subscription of The MagPi before 24 January. You’ll also receive the usual subscriber gift of a free Raspberry Pi Zero W (with case and cable). Click here to subscribe to The MagPi – The Official Raspberry Pi magazine.
The Raspberry Pi Annual is aimed at young folk wanting to learn to code, with a variety of awesome step-by-step Scratch tutorials, games, puzzles, and comics, including a robotic Babbage.
Get your copy
You can get The MagPi 65 and the Raspberry Pi Annual 2018 from our online store, and the magazine can be found in the wild at WHSmith, Tesco, Sainsbury’s, and Asda. You’ll be able to get it in the US at Barnes & Noble and Micro Center in a few days’ time. The MagPi 65 is also available digitally on our Android and iOS apps. Finally, you can also download a free PDF of The MagPi 65 and The Raspberry Pi Annual 2018.
We hope you have a merry Christmas! We’re off until the New Year. Bye!
<p>Today we kicked off our <a href="https://www.generosity.com/community-fundraising/make-a-more-secure-web-with-let-s-encrypt">first crowdfunding campaign</a> with the goal of raising enough funds to cover about one month of our operations – $200,000. That amount covers the operational and engineering staff, the hardware and the software, and general operating expenses needed to securely and reliably issue and manage many millions of certificates.</p>
<p>We decided to run a crowdfunding campaign for a couple of reasons. First, there is a gap between the funds we’ve raised and what we need for next year. Second, we believe individual supporters from our community can come to represent a significant diversification of our annual revenue sources, in addition to corporate sponsorship and grants.</p>
<p>We will provide updates on our progress throughout the campaign via Twitter (<a href="https://twitter.com/letsencrypt">@letsencrypt</a>).</p>
This is the fifth article in a short series about Poland, Europe, and the United States. To explore the entire series, start here.
Perhaps not surprisingly, my previous blog post sparked several interesting discussions with my Polish friends who took a more decisive view of the social costs of firearm ownership, or who saw the Second Amendment as a barbaric construct with no place in today’s world. Their opinions reminded me of my own attitude some ten years ago; in this brief follow-up, I wanted to share several data points that convinced me to take a more measured stance.
Let’s start with the basics: most estimates place the number of guns in the United States at 300 to 350 million – that’s roughly one firearm per every single resident. In Gallup polls, some 40-50% of all households report having a gun, frequently more than one. The demographics of firearm ownership are more uniform than stereotypes may imply; there is some variance across regions, political affiliations, and genders – but for most part, it tends to fall within fairly narrow bands.
An overwhelming majority of gun owners cite personal safety as the leading motive for purchasing a firearm; hunting and recreation activities come strong second. The defensive aspect of firearm ownership is of special note, because it can potentially provide a very compelling argument for protecting the right to bear arms even if it’s a socially unwelcome practice, or if it comes at an elevated cost to the nation as a whole.
The self-defense argument is sometimes dismissed as pure fantasy, with many eminent pundits citing one questionable statistic to support this view: the fairly low number of justifiable homicides in the country. Despite its strong appeal to ideologues, the metric does not stand up to scrutiny: all available data implies that most encounters where a gun is pulled by a would-be victim will not end with the assailant getting killed; it’s overwhelmingly more likely that the bad guy would hastily retreat, be detained at gunpoint, or suffer non-fatal injuries. In fact, even in the unlikely case that a firearm is actually discharged with the intent to kill or maim, somewhere around 70-80% of victims survive.
In reality, we have no single, elegant, and reliable source of data about the frequency with which firearms are used to deter threats; the results of scientific polls probably offer the most comprehensive view, but are open to interpretation and their results vary significantly depending on sampling methods and questions asked. That said, a recent meta-analysis from Centers for Disease Control and Prevention provided some general bounds:
“Defensive use of guns by crime victims is a common occurrence, although the exact number remains disputed (Cook and Ludwig, 1996; Kleck, 2001a). Almost all national survey estimates indicate that defensive gun uses by victims are at least as common as offensive uses by criminals, with estimates of annual uses ranging from about 500,000 to more than 3 million.”
“A different issue is whether defensive uses of guns, however numerous or rare they may be, are effective in preventing injury to the gun-wielding crime victim. Studies that directly assessed the effect of actual defensive uses of guns (i.e., incidents in which a gun was “used” by the crime victim in the sense of attacking or threatening an offender) have found consistently lower injury rates among gun-using crime victims compared with victims who used other self-protective strategies.”
An argument can be made that the availability of firearms translates to higher rates of violent crime, thus elevating the likelihood of encounters where a defensive firearm would be useful – feeding into an endless cycle of escalating violence. That said, such an effect does not seem to be particularly evident. For example, the United States comes out reasonably well in statistics related to assault, rape, and robbery; on these fronts, America looks less violent than the UK or a bunch of other OECD countries with low firearm ownership rates.
But there is an exception: one area where the United States clearly falls behind other highly developed nations are homicides. The per-capita figures are almost three times as high as in much of the European Union. And indeed, the bulk of intentional homicides – some 11 thousand deaths a year – trace back to firearms.
We tend to instinctively draw a connection to guns, but the origins of this tragic situation may be more elusive than they appear. For one, non-gun-related homicides happen in the US at a higher rate than in many other countries, too; Americans just seem to be generally more keen on killing each other than people in places such as Europe, Australia, or Canada. In addition, no convincing pattern emerges when comparing overall homicide rates across states with permissive and restrictive gun ownership laws. Some of the lowest per-capita homicide figures can be found in extremely gun-friendly states such as Idaho, Utah, or Vermont; whereas highly-regulated Washington D.C., Maryland, Illinois, and California all rank pretty high. There is, however, fairly strong correlation between gun and non-gun homicide rates across the country – suggesting that common factors such as population density, urban poverty, and drug-related gang activities play a far more significant role in violent crime than the ease of legally acquiring a firearm. It’s tragic but worth noting that a strikingly disproportionate percentage of homicides involves both victims and perpetrators that belong to socially disadvantaged and impoverished minorities. Another striking pattern is that up to about a half of all gun murders are related to or committed under the influence of illicit drugs.
Now, international comparisons show general correlation between gun ownership and some types of crime, but it’s difficult to draw solid conclusions from that: there are countless other ways to explain why crime rates may be low in the wealthy European states, and high in Venezuela, Mexico, Honduras, or South Africa; compensating for these factors is theoretically possible, but requires making far-fetched assumptions that are hopelessly vulnerable to researcher bias. Comparing European countries is easier, but yields inconclusive results: gun ownership in Poland is almost twenty times lower than in neighboring Germany and ten times lower than in Czech Republic – but you certainly wouldn’t able to tell that from national crime stats.
When it comes to gun control, one CDC study on the topic concluded with:
“The Task Force found insufficient evidence to determine the effectiveness of any of the firearms laws or combinations of laws reviewed on violent outcomes.”
This does not imply that such approaches are necessarily ineffective; for example, it seems pretty reasonable to assume that well-designed background checks or modest waiting periods do save lives. Similarly, safe storage requirements would likely prevent dozens of child deaths every year, at the cost of rendering firearms less available for home defense. But for the hundreds of sometimes far-fetched gun control proposals introduced every year on federal and state level, emotions often take place of real data, poisoning the debate around gun laws and ultimately bringing little or no public benefit. The heated assault weapon debate is one such red herring: although modern semi-automatic rifles look sinister, they are far more common in movies than on the streets; in reality, all kinds of rifles account only for somewhere around 4% of firearm homicides, and AR-15s are only a tiny fraction of that – likely claiming about as many lives as hammers, ladders, or swimming pools. The efforts to close the “gun show loophole” seem fairly sensible at the surface, too, but are of similarly uncertain merit; instead of gun shows, criminals depend on friends, family, and on more than 200,000 guns that stolen from their rightful owners every year. When breaking into a random home yields a 40-50% chance of scoring a firearm, it’s not hard to see why.
Another oddball example of simplistic legislative zeal are the attempts to mandate costly gun owner liability insurance, based on drawing an impassioned but flawed parallel between firearms and cars; what undermines this argument is that car accidents are commonplace, while gun handling mishaps – especially ones that injure others – are rare. We also have proposals to institute $100 ammunition purchase permits, to prohibit ammo sales over the Internet, or to impose a hefty per-bullet tax. Many critics feel that such laws seem to be geared not toward addressing any specific dangers, but toward making firearms more expensive and burdensome to own – slowly eroding the constitutional rights of the less wealthy folks. They also see hypocrisy in the common practice of making retired police officers and many high-ranking government officials exempt from said laws.
Regardless of individual merits of the regulations, it’s certainly true that with countless pieces of sometimes obtuse and poorly-written federal, state, and municipal statutes introduced every year, it’s increasingly easy for people to unintentionally run afoul of the rules. In California, the law as written today implies that any legal permanent resident in good standing can own a gun, but that only US citizens can transport it by car. Given that Californians are also generally barred from carrying firearms on foot in many populated areas, non-citizen residents are seemingly expected to teleport between the gun store, their home, and the shooting range. With many laws hastily drafted in the days after mass shootings and other tragedies, such gems are commonplace. The federal Gun-Free School Zones Act imposes special restrictions on gun ownership within 1,000 feet of a school and slaps harsh penalties for as little carrying it in an unlocked container from one’s home to a car parked in the driveway. In many urban areas, a lot of people either live within such a school zone or can’t conceivably avoid it when going about their business; GFSZA violations are almost certainly common and are policed only selectively.
Meanwhile, with sharp declines in crime continuing for the past 20 years, the public opinion is increasingly in favor of broad, reasonably policed gun ownership; for example, more than 70% respondents to one Gallup poll are against the restrictive handgun bans of the sort attempted in Chicago, San Francisco, or Washington D.C.; and in a recent Rasmussen poll, only 22% say that they would feel safer in a neighborhood where people are not allowed to keep guns. In fact, responding to the media’s undue obsession with random of acts of violence against law-abiding citizens, and worried about the historically very anti-gun views of the sitting president, Americans are buying a lot more firearms than ever before. Even the National Rifle Association – a staunchly conservative organization vilified by gun control advocates and mainstream pundits – enjoys a pretty reasonable approval rating across many demographics: 58% overall and 78% in households with a gun.
And here’s the kicker: despite its reputation for being a political arm of firearm manufacturers, the NRA is funded largely through individual memberships, small-scale donations, and purchase round-ups; organizational donations add up to about 5% of their budget – and if you throw in advertising income, the total still stays under 15%. That makes it quite unlike most of the other large-scale lobbying groups that Democrats aren’t as keen on naming-and-shaming on the campaign trail. The NRA’s financial muscle is also frequently overstated; it doesn’t even make it onto the list of top 100 lobbyists in Washington – and gun control advocacy groups, backed by activist billionaires such as Michael Bloomberg, now frequently outspend the pro-gun crowd. Of course, it would be better for the association’s socially conservative and unnecessarily polarizing rhetoric – sometimes veering onto the topics of abortion or video games – to be offset by the voice of other, more liberal groups. But ironically, organizations such as American Civil Liberties Union – well-known for fearlessly defending controversial speech – prefer to avoid the Second Amendment; they do so not because the latter concept has lesser constitutional standing, but because supporting it would not sit well with their own, progressive support base.
America’s attitude toward guns is a choice, not a necessity. It is also true that gun violence is a devastating problem; and that the emotional horror and lasting social impact of incidents such as school shootings can’t be possibly captured in any cold, dry statistic alone. But there is also nuance and reason to the gun control debate that can be hard to see for newcomers from more firearm-averse parts of the world.
For the next article in the series, click here. Alternatively, if you prefer to keep reading about firearms, go here for an overview of the gun control debate in the US.
The collective thoughts of the interwebz
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.