Interesting collision of real-world and Internet security:
The ceremony sees several trusted internet engineers (a minimum of three and up to seven) from across the world descend on one of two secure locations — one in El Segundo, California, just south of Los Angeles, and the other in Culpeper, Virginia — both in America, every three months.
Once in place, they run through a lengthy series of steps and checks to cryptographically sign the digital key pairs used to secure the internet’s root zone. (Here’s Cloudflare‘s in-depth explanation, and IANA’s PDF step-by-step guide.)
Only specific named people are allowed to take part in the ceremony, and they have to pass through several layers of security — including doors that can only be opened through fingerprint and retinal scans — before getting in the room where the ceremony takes place.
Staff open up two safes, each roughly one-metre across. One contains a hardware security module that contains the private portion of the KSK. The module is activated, allowing the KSK private key to sign keys, using smart cards assigned to the ceremony participants. These credentials are stored in deposit boxes and tamper-proof bags in the second safe. Each step is checked by everyone else, and the event is livestreamed. Once the ceremony is complete — which takes a few hours — all the pieces are separated, sealed, and put back in the safes inside the secure facility, and everyone leaves.
But during what was apparently a check on the system on Tuesday night — the day before the ceremony planned for 1300 PST (2100 UTC) Wednesday — IANA staff discovered that they couldn’t open one of the two safes. One of the locking mechanisms wouldn’t retract and so the safe stayed stubbornly shut.
As soon as they discovered the problem, everyone involved, including those who had flown in for the occasion, were told that the ceremony was being postponed. Thanks to the complexity of the problem — a jammed safe with critical and sensitive equipment inside — they were told it wasn’t going to be possible to hold the ceremony on the back-up date of Thursday, either.
In our blog post on Tuesday, Cryptocurrency Security Challenges, we wrote about the two primary challenges faced by anyone interested in safely and profitably participating in the cryptocurrency economy: 1) make sure you’re dealing with reputable and ethical companies and services, and, 2) keep your cryptocurrency holdings safe and secure.
In this post, we’re going to focus on how to make sure you don’t lose any of your cryptocurrency holdings through accident, theft, or carelessness. You do that by backing up the keys needed to sell or trade your currencies.
$34 Billion in Lost Value
Of the 16.4 million bitcoins said to be in circulation in the middle of 2017, close to 3.8 million may have been lost because their owners no longer are able to claim their holdings. Based on today’s valuation, that could total as much as $34 billion dollars in lost value. And that’s just bitcoins. There are now over 1,500 different cryptocurrencies, and we don’t know how many of those have been misplaced or lost.
Now that some cryptocurrencies have reached (at least for now) staggering heights in value, it’s likely that owners will be more careful in keeping track of the keys needed to use their cryptocurrencies. For the ones already lost, however, the owners have been separated from their currencies just as surely as if they had thrown Benjamin Franklins and Grover Clevelands over the railing of a ship.
The Basics of Securing Your Cryptocurrencies
In our previous post, we reviewed how cryptocurrency keys work, and the common ways owners can keep track of them. A cryptocurrency owner needs two keys to use their currencies: a public key that can be shared with others is used to receive currency, and a private key that must be kept secure is used to spend or trade currency.
Many wallets and applications allow the user to require extra security to access them, such as a password, or iris, face, or thumb print scan. If one of these options is available in your wallets, take advantage of it. Beyond that, it’s essential to back up your wallet, either using the backup feature built into some applications and wallets, or manually backing up the data used by the wallet. When backing up, it’s a good idea to back up the entire wallet, as some wallets require additional private data to operate that might not be apparent.
No matter which backup method you use, it is important to back up often and have multiple backups, preferable in different locations. As with any valuable data, a 3-2-1 backup strategy is good to follow, which ensures that you’ll have a good backup copy if anything goes wrong with one or more copies of your data.
One more caveat, don’t reuse passwords. This applies to all of your accounts, but is especially important for something as critical as your finances. Don’t ever use the same password for more than one account. If security is breached on one of your accounts, someone could connect your name or ID with other accounts, and will attempt to use the password there, as well. Consider using a password manager such as LastPass or 1Password, which make creating and using complex and unique passwords easy no matter where you’re trying to sign in.
Approaches to Backing Up Your Cryptocurrency Keys
There are numerous ways to be sure your keys are backed up. Let’s take them one by one.
1. Automatic backups using a backup program
If you’re using a wallet program on your computer, for example, Bitcoin Core, it will store your keys, along with other information, in a file. For Bitcoin Core, that file is wallet.dat. Other currencies will use the same or a different file name and some give you the option to select a name for the wallet file.
To back up the wallet.dat or other wallet file, you might need to tell your backup program to explicitly back up that file. Users of Backblaze Backup don’t have to worry about configuring this, since by default, Backblaze Backup will back up all data files. You should determine where your particular cryptocurrency, wallet, or application stores your keys, and make sure the necessary file(s) are backed up if your backup program requires you to select which files are included in the backup.
Backblaze B2 is an option for those interested in low-cost and high security cloud storage of their cryptocurrency keys. Backblaze B2 supports 2-factor verification for account access, works with a number of apps that support automatic backups with encryption, error-recovery, and versioning, and offers an API and command-line interface (CLI), as well. The first 10GB of storage is free, which could be all one needs to store encrypted cryptocurrency keys.
2. Backing up by exporting keys to a file
Apps and wallets will let you export your keys from your app or wallet to a file. Once exported, your keys can be stored on a local drive, USB thumb drive, DAS, NAS, or in the cloud with any cloud storage or sync service you wish. Encrypting the file is strongly encouraged — more on that later. If you use 1Password or LastPass, or other secure notes program, you also could store your keys there.
3. Backing up by saving a mnemonic recovery seed
A mnemonic phrase, mnemonic recovery phrase, or mnemonic seed is a list of words that stores all the information needed to recover a cryptocurrency wallet. Many wallets will have the option to generate a mnemonic backup phrase, which can be written down on paper. If the user’s computer no longer works or their hard drive becomes corrupted, they can download the same wallet software again and use the mnemonic recovery phrase to restore their keys.
The phrase can be used by anyone to recover the keys, so it must be kept safe. Mnemonic phrases are an excellent way of backing up and storing cryptocurrency and so they are used by almost all wallets.
A mnemonic recovery seed is represented by a group of easy to remember words. For example:
The first four letters are enough to unambiguously identify the word.
Similar words are avoided (such as: build and built).
Bitcoin and most other cryptocurrencies such as Litecoin, Ethereum, and others use mnemonic seeds that are 12 to 24 words long. Other currencies might use different length seeds.
4. Physical backups — Paper, Metal
Some cryptocurrency holders believe that their backup, or even all their cryptocurrency account information, should be stored entirely separately from the internet to avoid any risk of their information being compromised through hacks, exploits, or leaks. This type of storage is called “cold storage.” One method of cold storage involves printing out the keys to a piece of paper and then erasing any record of the keys from all computer systems. The keys can be entered into a program from the paper when needed, or scanned from a QR code printed on the paper.
Printed public and private keys
Some who go to extremes suggest separating the mnemonic needed to access an account into individual pieces of paper and storing those pieces in different locations in the home or office, or even different geographical locations. Some say this is a bad idea since it could be possible to reconstruct the mnemonic from one or more pieces. How diligent you wish to be in protecting these codes is up to you.
Mnemonic recovery phrase booklet
There’s another option that could make you the envy of your friends. That’s the CryptoSteel wallet, which is a stainless steel metal case that comes with more than 250 stainless steel letter tiles engraved on each side. Codes and passwords are assembled manually from the supplied part-randomized set of tiles. Users are able to store up to 96 characters worth of confidential information. Cryptosteel claims to be fireproof, waterproof, and shock-proof.
Cryptosteel cold wallet
Of course, if you leave your Cryptosteel wallet in the pocket of a pair of ripped jeans that gets thrown out by the housekeeper, as happened to the character Russ Hanneman on the TV show Silicon Valley in last Sunday’s episode, then you’re out of luck. That fictional billionaire investor lost a USB drive with $300 million in cryptocoins. Let’s hope that doesn’t happen to you.
Encryption & Security
Whether you store your keys on your computer, an external disk, a USB drive, DAS, NAS, or in the cloud, you want to make sure that no one else can use those keys. The best way to handle that is to encrypt the backup.
With Backblaze Backup for Windows and Macintosh, your backups are encrypted in transmission to the cloud and on the backup server. Users have the option to add an additional level of security by adding a Personal Encryption Key (PEK), which secures their private key. Your cryptocurrency backup files are secure in the cloud. Using our web or mobile interface, previous versions of files can be accessed, as well.
Our object storage cloud offering, Backblaze B2, can be used with a variety of applications for Windows, Macintosh, and Linux. With B2, cryptocurrency users can choose whichever method of encryption they wish to use on their local computers and then upload their encrypted currency keys to the cloud. Depending on the client used, versioning and life-cycle rules can be applied to the stored files.
Other backup programs and systems provide some or all of these capabilities, as well. If you are backing up to a local drive, it is a good idea to encrypt the local backup, which is an option in some backup programs.
Some experts recommend using a different address for each cryptocurrency transaction. Since the address is not the same as your wallet, this means that you are not creating a new wallet, but simply using a new identifier for people sending you cryptocurrency. Creating a new address is usually as easy as clicking a button in the wallet.
One of the chief advantages of using a different address for each transaction is anonymity. Each time you use an address, you put more information into the public ledger (blockchain) about where the currency came from or where it went. That means that over time, using the same address repeatedly could mean that someone could map your relationships, transactions, and incoming funds. The more you use that address, the more information someone can learn about you. For more on this topic, refer to Address reuse.
Note that a downside of using a paper wallet with a single key pair (type-0 non-deterministic wallet) is that it has the vulnerabilities listed above. Each transaction using that paper wallet will add to the public record of transactions associated with that address. Newer wallets, i.e. “deterministic” or those using mnemonic code words support multiple addresses and are now recommended.
There are other approaches to keeping your cryptocurrency transaction secure. Here are a couple of them.
Multi-signature refers to requiring more than one key to authorize a transaction, much like requiring more than one key to open a safe. It is generally used to divide up responsibility for possession of cryptocurrency. Standard transactions could be called “single-signature transactions” because transfers require only one signature — from the owner of the private key associated with the currency address (public key). Some wallets and apps can be configured to require more than one signature, which means that a group of people, businesses, or other entities all must agree to trade in the cryptocurrencies.
Deep Cold Storage
Deep cold storage ensures the entire transaction process happens in an offline environment. There are typically three elements to deep cold storage.
First, the wallet and private key are generated offline, and the signing of transactions happens on a system not connected to the internet in any manner. This ensures it’s never exposed to a potentially compromised system or connection.
Second, details are secured with encryption to ensure that even if the wallet file ends up in the wrong hands, the information is protected.
Third, storage of the encrypted wallet file or paper wallet is generally at a location or facility that has restricted access, such as a safety deposit box at a bank.
Deep cold storage is used to safeguard a large individual cryptocurrency portfolio held for the long term, or for trustees holding cryptocurrency on behalf of others, and is possibly the safest method to ensure a crypto investment remains secure.
Keep Your Software Up to Date
You should always make sure that you are using the latest version of your app or wallet software, which includes important stability and security fixes. Installing updates for all other software on your computer or mobile device is also important to keep your wallet environment safer.
One Last Thing: Think About Your Testament
Your cryptocurrency funds can be lost forever if you don’t have a backup plan for your peers and family. If the location of your wallets or your passwords is not known by anyone when you are gone, there is no hope that your funds will ever be recovered. Taking a bit of time on these matters can make a huge difference.
To the Moon*
Are you comfortable with how you’re managing and backing up your cryptocurrency wallets and keys? Do you have a suggestion for keeping your cryptocurrencies safe that we missed above? Please let us know in the comments.
*To the Moon — Crypto slang for a currency that reaches an optimistic price projection.
Robots can crack safes faster than humans — and differently:
So Seidle started looking for shortcuts. First he found that, like many safes, his SentrySafe had some tolerance for error. If the combination includes a 12, for instance, 11 or 13 would work, too. That simple convenience measure meant his bot could try every third number instead of every single number, immediately paring down the total test time to just over four days. Seidle also realized that the bot didn’t actually need to return the dial to its original position before trying every combination. By making attempts in a certain careful order, it could keep two of the three rotors in place, while trying new numbers on just the last, vastly cutting the time to try new combinations to a maximum of four seconds per try. That reduced the maximum bruteforcing time to about one day and 16 hours, or under a day on average.
But Seidle found one more clever trick, this time taking advantage of a design quirk in the safe intended to prevent traditional safecracking. Because the safe has a rod that slips into slots in the three rotors when they’re aligned to the combination’s numbers, a human safecracker can apply light pressure to the safe’s handle, turn its dial, and listen or feel for the moment when that rod slips into those slots. To block that technique, the third rotor of Seidle’s SentrySafe is indented with twelve notches that catch the rod if someone turns the dial while pulling the handle.
Seidle took apart the safe he and his wife had owned for years, and measured those twelve notches. To his surprise, he discovered the one that contained the slot for the correct combination was about a hundredth of an inch narrower than the other eleven. That’s not a difference any human can feel or listen for, but his robot can easily detect it with a few automated measurements that take seconds. That discovery defeated an entire rotor’s worth of combinations, dividing the possible solutions by a factor of 33, and reducing the total cracking time to the robot’s current hour-and-13 minute max.
We’re going to have to start thinking about robot adversaries as we design our security systems.
Deploying new software into production will always carry some amount of risk, and failed deployments (e.g., software bugs, misconfigurations, etc.) will occasionally occur. As a service owner, the goal is to try and reduce the number of these incidents and to limit customer impact when they do occur. One method to reduce potential impact is to shape your deployment strategies around the failure conditions of your service. Thus, when a deployment fails, the service owner has more control over the blast radius as well as the scope of the impact. These strategies require an understanding of how the various components of your system interact, how those components can fail and how those failures impact your customers. This blog post discusses some of the deployment strategies that we’ve made on the Route 53 team and how these choices affect the availability of our service.
To begin, I’ll briefly describe some of the deployment procedures and the Route 53 architecture in order to provide some context for the deployment strategies that we have chosen. Hopefully, these examples will reveal strategies that could benefit your own service’s availability. Like many services, Route 53 consists of multiple environments or stages: one for active development, one for staging changes to production and the production stage itself. The natural tension with trying to reduce the number of failed deployments in production is to add more rigidity and processes that slow down the release of new code. At Route 53, we do not enforce a strict release or deployment schedule; individual developers are responsible for verifying their changes in the staging environment and pushing their changes into production. Typically, our deployments proceed in a pipelined fashion. Each step of the pipeline is referred to as a “wave” and consists of some portion of our fleet. A pipeline is a good abstraction as each wave can be thought of as an independent and separate step. After each wave of the pipeline, the change can be verified — this can include automatic, scheduled and manual testing as well as the verification of service metrics. Furthermore, we typically space out the earlier waves of production deployment at least 24 hours apart, in order to allow the changes to “bake.” Letting our software bake refers to rolling out software changes slowly to allow us to validate those changes and verify service metrics with production traffic before pushing the deployment to the next wave. The clear advantage of deploying new code to only a portion of your fleet is that it reduces the impact of a failed deployment to just the portion of the fleet containing the new code. Another benefit of our deployment infrastructure is that it provides us a mechanism to quickly “roll back” a deployment to a previous software version if any problems are detected which, in many cases, enables us to quickly mitigate a failed deployment.
Based on our experiences, we have further organized our deployments to try and match our failure conditions to further reduce impact. First, our deployment strategies are tailored to the part of the system that is the target of our deployment. We commonly refer to two main components of Route 53: the control plane and the data plane (pictured below). The control plane consists primarily of our API and DNS change propagation system. Essentially, this is the part of our system that accepts a customer request to create or delete a DNS record and then the transmission of that update to all of our DNS servers distributed across the world. The data plane consists of our fleet of DNS servers that are responsible for answering DNS queries on behalf of our customers. These servers currently reside in more than 50 locations around the world. Both of these components have their own set of failure conditions and differ in how a failed deployment will impact customers. Further, a failure of one component may not impact the other. For example, an API outage where customers are unable to create new hosted zones or records has no impact on our data plane continuing to answer queries for all records created prior to the outage. Given their distinct set of failure conditions, the control plane and data plane have their own deployment strategies, which are each discussed in more detail below.
Control Plane Deployments
The bulk of the of the control plane actually consists of two APIs. The first is our external API that is reachable from the Internet and is the entry point for customers to create, delete and view their DNS records. This external API performs authentication and authorization checks on customer requests before forwarding them to our internal API. The second, internal API supports a much larger set of operations than just the ones needed by the external API; it also includes operations required to monitor and propagate DNS changes to our DNS servers as well as other operations needed to operate and monitor the service. Failed deployments to the external API typically impact a customer’s ability to view or modify their DNS records. The availability of this API is critical as our customers may rely on the ability to update their DNS records quickly and reliably during an operational event for their own service or site.
Deployments to the external API are fairly straightforward. For increased availability, we host the external API in multiple availability zones. Each wave of deployment consists of the hosts within a single availability zone, and each host in that availability zone is deployed to individually. If any single host deployment fails, the deployment to the entire availability zone is halted automatically. Some host failures may be quickly caught and mitigated by the load balancer for our hosts in that particular availability zone, which is responsible for health checking the hosts. Hosts that fail these load balancer health checks are automatically removed from service by the load balancer. Thus, a failed deployment to just a single host would result in it being removed from service automatically and the deployment halted without any operator intervention. For other types of failed deployments that may not cause the load balancer health checks to fail, restricting waves to a single availability zone allows us to easily flip away from that availability zone as soon as the failure is detected. A similar approach could be applied to services that utilize Route 53 plus ELB in multiple regions and availability zones for their services. ELBs automatically health check their back-end instances and remove unhealthy instances from service. By creating Route 53 alias records marked to evaluate target health (see ELB documentation for how to set this up), if all instances behind an ELB are unhealthy, Route 53 will fail away from this alias and attempt to find an alternate healthy record to serve. This configuration will enable automatic failover at the DNS-level for an unhealthy region or availability zone. To enable manual failover, simply convert the alias resource record set for your ELB to either a weighted alias or associate it with a health check whose health you control. To initiate a failover, simply set the weight to 0 or fail the health check. A weighted alias also allows you the ability to slowly increase the traffic to that ELB, which can be useful for verifying your own software deployments to the back-end instances.
For our internal API, the deployment strategy is more complicated (pictured below). Here, our fleet is partitioned by the type of traffic it handles. We classify traffic into three types: (1) low-priority, long-running operations used to monitor the service (batch fleet), (2) all other operations used to operate and monitor the service (operations fleet) and (3) all customer operations (customer fleet). Deployments to the production internal API are then organized by how critical their traffic is to the service as a whole. For instance, the batch fleet is deployed to first because their operations are not critical to the running of the service and we can tolerate long outages of this fleet. Similarly, we prioritize the operations fleet below that of customer traffic as we would rather continue accepting and processing customer traffic after a failed deployment to the operations fleet. For the internal API, we have also organized our staging waves differently from our production waves. In the staging waves, all three fleets are split across two waves. This is done intentionally to allow us to verify that the code changes work in a split-world where multiple versions of the software are running simultaneously. We have found this to be useful in catching incompatibilities between software versions. Since we never deploy software in production to 100% of our fleet at the same time, our software updates must be designed to be compatible with the previous version. Finally, as with the external API, all wave deployments proceed with a single host at a time. For this API, we also include a deep application health check as part of the deployment. Similar to the load balancer health checks for the external API, if this health check fails, the entire deployment is immediately halted.
Data Plane Deployments
As mentioned earlier, our data plane consists of Route 53’s DNS servers, which are distributed across the world in more than 50 distinct locations (we refer to each location as an ‘edge location’). An important consideration with our deployment strategy is how we stripe our anycast IP space across locations. In summary, each hosted zone is assigned four delegation name servers, each of which belong to a “stripe” (i.e., one quarter of our anycast range). Generally speaking, each edge location announces only a single stripe, so each stripe is therefore announced by roughly 1/4 of our edge locations worldwide. Thus, when a resolver issues a query against each of the four delegation name servers, those queries are directed via BGP to the closest (in a network sense) edge location from each stripe. While the availability and correctness of our API is important, the availability and correctness of our data plane are even more critical. In this case, an outage directly results in an outage for our customers. Furthermore, the impact of serving even a single wrong answer on behalf of a customer is magnified by that answer being cached by both intermediate resolvers and end clients alike. Thus, deployments to our data plane are organized even more carefully to both prevent failed deployments and to reduce potential impact.
The safest way to deploy and minimize impact would be to deploy to a single edge location at a time. However, with manual deployments that are overseen by a developer, this approach is just not scalable with how frequently we deploy new software to over 50 locations (with more added each year). Thus, most of our production deployment waves consist of multiple locations; the one exception is our first wave that includes just a single location. Furthermore, this location is specifically chosen because it runs our oldest hardware, which provides us a quick notification for any unintended performance degradation. It is important to note that while the caching behavior for resolvers can cause issues if we serve an incorrect answer, they handle other types failures well. When a recursive resolver receives a query for a record that is not cached, it will typically issue queries to at least three of the four delegation name servers in parallel and it will use the first response it receives. Thus, in the event where one of our locations is black holing customer queries (i.e., not replying to DNS queries), the resolver should receive a response from one of the other delegation name servers. In this case, the only impact is to resolvers where the edge location that is not answering would have been the fastest responder. Now, that resolver will effectively be waiting for the response from the second fastest stripe. To take advantage of this resiliency, our other waves are organized such that they include edge locations that are geographically diverse, with the intent that for any single resolver, there will be nearby locations that are not included in the current deployment wave. Furthermore, to guarantee that at most a single nameserver for all customers is affected, waves are actually organized by stripe. Finally, each stripe is spread across multiple waves so that failures impact only a single name server for a portion of our customers. An example of this strategy is depicted below. A few notes: our staging environment consists of a much smaller number of edge locations than production, so single-location waves are possible. Second, each stripe is denoted by color; in this example, we see deployments spread across a blue and orange stripe. You, too, can think about organizing your deployment strategy around your failure conditions. For example, if you have a database schema used by both your production system and a warehousing system, deploy the change to the warehousing system first to ensure you haven’t broken any compatibility. You might catch problems with the warehousing system before it affects customer traffic.
Our team’s experience with operating Route 53 over the last 3+ years have highlighted the importance of reducing the impact from failed deployments. Over the years, we have been able to identify some of the common failure conditions and to organize our software deployments in such a way so that we increase the ease of mitigation while decreasing the potential impact to our customers.
– Nick Trebon
The collective thoughts of the interwebz
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.