Tag Archives: backups

The First Lady’s bad cyber advice

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/05/the-first-ladys-bad-cyber-advice.html

First Lady Melania Trump announced a guide to help children go online safely. It has problems.

Melania’s guide is full of outdated, impractical, inappropriate, and redundant information. But that’s allowed, because it relies upon moral authority: to be moral is to be secure, to be moral is to do what the government tells you. It matters less whether the advice is technically accurate, and more that you are supposed to do what authority tells you.

That’s a problem, not just with her guide, but most cybersecurity advice in general. Our community gives out advice without putting much thought into it, because it doesn’t need thought. You should do what we tell you, because being secure is your moral duty.

This post picks apart Melania’s document. The purpose isn’t to fine-tune her guide and make it better. Instead, the purpose is to demonstrate the idea of resting on moral authority instead of technical authority.
<-- --="" more="">

Strong Passwords

“Strong passwords” is the quintessential cybersecurity cliché that insecurity is due to some “weakness” (laziness, ignorance, greed, etc.) and the remedy is to be “strong”.

The first flaw is that this advice is outdated. Ten years ago, important websites would frequently get hacked and have poor password protection (like MD5 hashing). Back then, strength mattered, to stop hackers from brute force guessing the hacked passwords. These days, important websites get hacked less often and protect the passwords better (like salted bcrypt). Moreover, the advice is now often redundant: websites, at least the important ones, enforce a certain level of password complexity, so that even without advice, you’ll be forced to do the right thing most of the time.

This advice is outdated for a second reason: hackers have gotten a lot better at cracking passwords. Ten years ago, they focused on brute force, trying all possible combinations. Partly because passwords are now protected better, dramatically reducing the effectiveness of the brute force approach, hackers have had to focus on other techniques, such as the mutated dictionary and Markov chain attacks. Consequently, even though “Password123!” seems to meet the above criteria of a strong password, it’ll fall quickly to a mutated dictionary attack. The simple recommendation of “strong passwords” is no longer sufficient.

The last part of the above advice is to avoid password reuse. This is good advice. However, this becomes impractical advice, especially when the user is trying to create “strong” complex passwords as described above. There’s no way users/children can remember that many passwords. So they aren’t going to follow that advice.

To make the advice work, you need to help users with this problem. To begin with, you need to tell them to write down all their passwords. This is something many people avoid, because they’ve been told to be “strong” and writing down passwords seems “weak”. Indeed it is, if you write them down in an office environment and stick them on a note on the monitor or underneath the keyboard. But they are safe and strong if it’s on paper stored in your home safe, or even in a home office drawer. I write my passwords on the margins in a book on my bookshelf — even if you know that, it’ll take you a long time to figure out which book when invading my home.

The other option to help avoid password reuse is to use a password manager. I don’t recommend them to my own parents because that’d be just one more thing I’d have to help them with, but they are fairly easy to use. It means you need only one password for the password manager, which then manages random/complex passwords for all your web accounts.

So what we have here is outdated and redundant advice that overshadows good advice that is nonetheless incomplete and impractical. The advice is based on the moral authority of telling users to be “strong” rather than the practical advice that would help them.

No personal info unless website is secure

The guide teaches kids to recognize the difference between a secure/trustworthy and insecure website. This is laughably wrong.

HTTPS means the connection to the website is secure, not that the website is secure. These are different things. It means hackers are unlikely to be able to eavesdrop on the traffic as it’s transmitted to the website. However, the website itself may be insecure (easily hacked), or worse, it may be a fraudulent website created by hackers to appear similar to a legitimate website.

What HTTPS secures is a common misconception, perpetuated by guides like this. This is the source of criticism for LetsEncrypt, an initiative to give away free website certificates so that everyone can get HTTPS. Hackers now routinely use LetsEncrypt to create their fraudulent websites to host their viruses. Since people have been taught forever that HTTPS means a website is “secure”, people are trusting these hacker websites.

But LetsEncrypt is a good thing, all connections should be secure. What’s bad is not LetsEncrypt itself, but guides like this from the government that have for years been teaching people the wrong thing, that HTTPS means a website is secure.

Backups

Of course, no guide would be complete without telling people to backup their stuff.

This is especially important with the growing ransomware threat. Ransomware is a type of virus/malware that encrypts your files then charges you money to get the key to decrypt the files. Half the time this just destroys the files.

But this again is moral authority, telling people what to do, instead of educating them how to do it. Most will ignore this advice because they don’t know how to effectively backup their stuff.

For most users, it’s easy to go to the store and buy a 256-gigabyte USB drive for $40 (as of May 2018) then use the “Timemachine” feature in macOS, or on Windows the “File History” feature or the “Backup and Restore” feature. These can be configured to automatically do the backup on a regular basis so that you don’t have to worry about it.

But such “local” backups are still problematic. If the drive is left plugged into the machine, ransomeware can attack the backup. If there’s a fire, any backup in your home will be destroyed along with the computer.

I recommend cloud backup instead. There are so many good providers, like DropBox, Backblaze, Microsoft, Apple’s iCloud, and so on. These are especially critical for phones: if your iPhone is destroyed or stolen, you can simply walk into an Apple store and buy a new one, with everything replaced as it was from their iCloud.

But all of this is missing the key problem: your photos. You carry a camera with you all the time now and take a lot of high resolution photos. This quickly exceeds the capacity of most of the free backup solutions. You can configure these, such as you phone’s iCloud backup, to exclude photos, but that means you are prone to losing your photos/memories. For example, Drop Box is great for the free 5 gigabyte service, but if I want to preserve photos on it, I have to pay for their more expensive service.

One of the key messages kids should learn about photos is that they will likely lose most all of the photos they’ve taken within 5 years. The exceptions will be the few photos they’ve posted to social media, which sorta serves as a cloud backup for them. If they want to preserve the rest of these memories, the kids need to take seriously finding backup solutions. I’m not sure of the best solution, but I buy big USB flash drives and send them to my niece asking her to copy all her photos to them, so that at least I can put that in a safe.

One surprisingly good solution is Microsoft Office 365. For $99 a year, you get a copy of their Office software (which I use) but it also comes with a large 1-terabyte of cloud storage, which is likely big enough for your photos. Apple charges around the same amount for 1-terabyte of iCloud, though it doesn’t come with a free license for Microsoft Office :-).

WiFi encryption

Your home WiFi should be encrypted, of course.

I have to point out the language, though. Turning on WPA2 WiFi encryption does not “secure your network”. Instead, it just secures the radio signals from being eavesdropped. Your network may have other vulnerabilities, where encryption won’t help, such as when your router has remote administration turned on with a default or backdoor password enabled.

I’m being a bit pedantic here, but it’s not my argument. It’s the FTC’s argument when they sued vendors like D-Link for making exactly the same sort of recommendation. The FTC claimed it was deceptive business practice because recommending users do things like this still didn’t mean the device was “secure”. Since the FTC is partly responsible for writing Melania’s document, I find this a bit ironic.

In any event, WPA2 personal has problems where it can be hacked, such as if WPS is enabled, or evil twin access-points broadcasting stronger (or more directional) signals. It’s thus insufficient security. To be fully secure against possible WiFi eavesdropping you need to enable enterprise WPA2, which isn’t something most users can do.

Also, WPA2 is largely redundant. If you wardrive your local neighborhood you’ll find that almost everyone has WPA enabled already anyway. Guides like this probably don’t need to advise what everyone’s already doing, especially when it’s still incomplete.

Change your router password

Yes, leaving the default password on your router is a problem, as shown by recent Mirai-style attacks, such as the very recent ones where Russia has infected 500,000 in their cyberwar against Ukraine. But those were only a problem because routers also had remote administration enabled. It’s remote administration you need to make sure is disabled on your router, regardless if you change the default password (as there are other vulnerabilities besides passwords). If remote administration is disabled, then it’s very rare that people will attack your router with the default password.

Thus, they ignore the important thing (remote administration) and instead focus on the less important thing (change default password).

In addition, this advice again the impractical recommendation of choosing a complex (strong) password. Users who do this usually forget it by the time they next need it. Practical advice is to recommend users write down the password they choose, and put it either someplace they won’t forget (like with the rest of their passwords), or on a sticky note under the router.

Update router firmware

Like any device on the network, you should keep it up-to-date with the latest patches. But you aren’t going to, because it’s not practical. While your laptop/desktop and phone nag you about updates, your router won’t. Whereas phones/computers update once a month, your router vendor will update the firmware once a year — and after a few years, stop releasing any more updates at all.

Routers are just one of many IoT devices we are going to have to come to terms with, keeping them patched. I don’t know the right answer. I check my parents stuff every Thanksgiving, so maybe that’s a good strategy: patch your stuff at the end of every year. Maybe some cultural norms will develop, but simply telling people to be strong about their IoT firmware patches isn’t going to be practical in the near term.

Don’t click on stuff

This probably the most common cybersecurity advice given by infosec professionals. It is wrong.

Emails/messages are designed for you to click on things. You regularly get emails/messages from legitimate sources that demand you click on things. It’s so common from legitimate sources that there’s no practical way for users to distinguish between them and bad sources. As that Google Docs bug showed, even experts can’t always tell the difference.

I mean, it’s true that phishing attacks coming through emails/messages try to trick you into clicking on things, and you should be suspicious of such things. However, it doesn’t follow from this that not clicking on things is a practical strategy. It’s like diet advice recommending you stop eating food altogether.

Sex predators, oh my!

Of course, its kids going online, so of course you are going to have warnings about sexual predators:

But online predators are rare. The predator threat to children is overwhelmingly from relatives and acquaintances, a much smaller threat from strangers, and a vanishingly tiny threat from online predators. Recommendations like this stem from our fears of the unknown technology rather than a rational measurement of the threat.

Sexting, oh my!

So here is one piece of advice that I can agree with: don’t sext:

But the reason this is bad is not because it’s immoral or wrong, but because adults have gone crazy and made it illegal for children to take nude photographs of themselves. As this article points out, your child is more likely to get in trouble and get placed on the sex offender registry (for life) than to get molested by a person on that registry.

Thus, we need to warn kids not from some immoral activity, but from adults who’ve gotten freaked out about it. Yes, sending pictures to your friends/love-interest will also often get you in trouble as those images will frequently get passed around school, but such temporary embarrassments will pass. Getting put on a sex offender registry harms you for life.

Texting while driving

Finally, I want to point out this error:

The evidence is to the contrary, that it’s not actually dangerous — it’s just assumed to be dangerous. Texting rarely distracts drivers from what’s going on the road. It instead replaces some other inattention, such as day dreaming, fiddling with the radio, or checking yourself in the mirror. Risk compensation happens, when people are texting while driving, they are also slowing down and letting more space between them and the car in front of them.

Studies have shown this. For example, one study measured accident rates at 6:59pm vs 7:01pm and found no difference. That’s when “free evening texting” came into effect, so we should’ve seen a bump in the number of accidents. They even tried to narrow the effect down, such as people texting while changing cell towers (proving they were in motion).

Yes, texting is illegal, but that’s because people are fed up with the jerk in front of them not noticing the light is green. It’s not illegal because it’s particularly dangerous, that it has a measurable impact on accident rates.

Conclusion

The point of this post is not to refine the advice and make it better. Instead, I attempt to demonstrate how such advice rests on moral authority, because it’s the government telling you so. It’s because cybersecurity and safety are higher moral duties. Much of it is outdated, impractical, inappropriate, and redundant.
We need to move away from this sort of advice. Instead of moral authority, we need technical authority. We need to focus on the threats that people actually face, and instead of commanding them what to do. We need to help them be secure, not command to command them, shaming them for their insecurity. It’s like Strunk and White’s “Elements of Style”: they don’t take the moral authority approach and tell people how to write, but instead try to help people how to write well.

Amazon Neptune Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-neptune-generally-available/

Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed with automatic minor version upgrades, backups, encryption, and fail-over. I wrote about Neptune in detail for AWS re:Invent last year and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.

Now that Amazon Neptune is generally available there are a few changes from the preview:

Launching an Amazon Neptune Cluster

Launching a Neptune cluster is as easy as navigating to the AWS Management Console and clicking create cluster. Of course you can also launch with CloudFormation, the CLI, or the SDKs.

You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.

Additional Resources

We’ve created two repos with some additional tools and examples here. You can expect continuous development on these repos as we add additional tools and examples.

  • Amazon Neptune Tools Repo
    This repo has a useful tool for converting GraphML files into Neptune compatible CSVs for bulk loading from S3.
  • Amazon Neptune Samples Repo
    This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.

Purpose Built Databases

There’s an industry trend where we’re moving more and more onto purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more we get to choose the best database for the job or even enable entirely new use-cases. Amazon Neptune is perfect for workloads where the data is highly connected across data rich edges.

I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.

As always, feel free to reach out in the comments or on twitter to provide any feedback!

Randall

Securing Your Cryptocurrency

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/backing-up-your-cryptocurrency/

Securing Your Cryptocurrency

In our blog post on Tuesday, Cryptocurrency Security Challenges, we wrote about the two primary challenges faced by anyone interested in safely and profitably participating in the cryptocurrency economy: 1) make sure you’re dealing with reputable and ethical companies and services, and, 2) keep your cryptocurrency holdings safe and secure.

In this post, we’re going to focus on how to make sure you don’t lose any of your cryptocurrency holdings through accident, theft, or carelessness. You do that by backing up the keys needed to sell or trade your currencies.

$34 Billion in Lost Value

Of the 16.4 million bitcoins said to be in circulation in the middle of 2017, close to 3.8 million may have been lost because their owners no longer are able to claim their holdings. Based on today’s valuation, that could total as much as $34 billion dollars in lost value. And that’s just bitcoins. There are now over 1,500 different cryptocurrencies, and we don’t know how many of those have been misplaced or lost.



Now that some cryptocurrencies have reached (at least for now) staggering heights in value, it’s likely that owners will be more careful in keeping track of the keys needed to use their cryptocurrencies. For the ones already lost, however, the owners have been separated from their currencies just as surely as if they had thrown Benjamin Franklins and Grover Clevelands over the railing of a ship.

The Basics of Securing Your Cryptocurrencies

In our previous post, we reviewed how cryptocurrency keys work, and the common ways owners can keep track of them. A cryptocurrency owner needs two keys to use their currencies: a public key that can be shared with others is used to receive currency, and a private key that must be kept secure is used to spend or trade currency.

Many wallets and applications allow the user to require extra security to access them, such as a password, or iris, face, or thumb print scan. If one of these options is available in your wallets, take advantage of it. Beyond that, it’s essential to back up your wallet, either using the backup feature built into some applications and wallets, or manually backing up the data used by the wallet. When backing up, it’s a good idea to back up the entire wallet, as some wallets require additional private data to operate that might not be apparent.

No matter which backup method you use, it is important to back up often and have multiple backups, preferable in different locations. As with any valuable data, a 3-2-1 backup strategy is good to follow, which ensures that you’ll have a good backup copy if anything goes wrong with one or more copies of your data.

One more caveat, don’t reuse passwords. This applies to all of your accounts, but is especially important for something as critical as your finances. Don’t ever use the same password for more than one account. If security is breached on one of your accounts, someone could connect your name or ID with other accounts, and will attempt to use the password there, as well. Consider using a password manager such as LastPass or 1Password, which make creating and using complex and unique passwords easy no matter where you’re trying to sign in.

Approaches to Backing Up Your Cryptocurrency Keys

There are numerous ways to be sure your keys are backed up. Let’s take them one by one.

1. Automatic backups using a backup program

If you’re using a wallet program on your computer, for example, Bitcoin Core, it will store your keys, along with other information, in a file. For Bitcoin Core, that file is wallet.dat. Other currencies will use the same or a different file name and some give you the option to select a name for the wallet file.

To back up the wallet.dat or other wallet file, you might need to tell your backup program to explicitly back up that file. Users of Backblaze Backup don’t have to worry about configuring this, since by default, Backblaze Backup will back up all data files. You should determine where your particular cryptocurrency, wallet, or application stores your keys, and make sure the necessary file(s) are backed up if your backup program requires you to select which files are included in the backup.

Backblaze B2 is an option for those interested in low-cost and high security cloud storage of their cryptocurrency keys. Backblaze B2 supports 2-factor verification for account access, works with a number of apps that support automatic backups with encryption, error-recovery, and versioning, and offers an API and command-line interface (CLI), as well. The first 10GB of storage is free, which could be all one needs to store encrypted cryptocurrency keys.

2. Backing up by exporting keys to a file

Apps and wallets will let you export your keys from your app or wallet to a file. Once exported, your keys can be stored on a local drive, USB thumb drive, DAS, NAS, or in the cloud with any cloud storage or sync service you wish. Encrypting the file is strongly encouraged — more on that later. If you use 1Password or LastPass, or other secure notes program, you also could store your keys there.

3. Backing up by saving a mnemonic recovery seed

A mnemonic phrase, mnemonic recovery phrase, or mnemonic seed is a list of words that stores all the information needed to recover a cryptocurrency wallet. Many wallets will have the option to generate a mnemonic backup phrase, which can be written down on paper. If the user’s computer no longer works or their hard drive becomes corrupted, they can download the same wallet software again and use the mnemonic recovery phrase to restore their keys.

The phrase can be used by anyone to recover the keys, so it must be kept safe. Mnemonic phrases are an excellent way of backing up and storing cryptocurrency and so they are used by almost all wallets.

A mnemonic recovery seed is represented by a group of easy to remember words. For example:

eye female unfair moon genius pipe nuclear width dizzy forum cricket know expire purse laptop scale identify cube pause crucial day cigar noise receive

The above words represent the following seed:

0a5b25e1dab6039d22cd57469744499863962daba9d2844243fec 9c0313c1448d1a0b2cd9e230a78775556f9b514a8be45802c2808e fd449a20234e9262dfa69

These words have certain properties:

  • The first four letters are enough to unambiguously identify the word.
  • Similar words are avoided (such as: build and built).

Bitcoin and most other cryptocurrencies such as Litecoin, Ethereum, and others use mnemonic seeds that are 12 to 24 words long. Other currencies might use different length seeds.

4. Physical backups — Paper, Metal

Some cryptocurrency holders believe that their backup, or even all their cryptocurrency account information, should be stored entirely separately from the internet to avoid any risk of their information being compromised through hacks, exploits, or leaks. This type of storage is called “cold storage.” One method of cold storage involves printing out the keys to a piece of paper and then erasing any record of the keys from all computer systems. The keys can be entered into a program from the paper when needed, or scanned from a QR code printed on the paper.

Printed public and private keys

Printed public and private keys

Some who go to extremes suggest separating the mnemonic needed to access an account into individual pieces of paper and storing those pieces in different locations in the home or office, or even different geographical locations. Some say this is a bad idea since it could be possible to reconstruct the mnemonic from one or more pieces. How diligent you wish to be in protecting these codes is up to you.

Mnemonic recovery phrase booklet

Mnemonic recovery phrase booklet

There’s another option that could make you the envy of your friends. That’s the CryptoSteel wallet, which is a stainless steel metal case that comes with more than 250 stainless steel letter tiles engraved on each side. Codes and passwords are assembled manually from the supplied part-randomized set of tiles. Users are able to store up to 96 characters worth of confidential information. Cryptosteel claims to be fireproof, waterproof, and shock-proof.

image of a Cryptosteel cold storage device

Cryptosteel cold wallet

Of course, if you leave your Cryptosteel wallet in the pocket of a pair of ripped jeans that gets thrown out by the housekeeper, as happened to the character Russ Hanneman on the TV show Silicon Valley in last Sunday’s episode, then you’re out of luck. That fictional billionaire investor lost a USB drive with $300 million in cryptocoins. Let’s hope that doesn’t happen to you.

Encryption & Security

Whether you store your keys on your computer, an external disk, a USB drive, DAS, NAS, or in the cloud, you want to make sure that no one else can use those keys. The best way to handle that is to encrypt the backup.

With Backblaze Backup for Windows and Macintosh, your backups are encrypted in transmission to the cloud and on the backup server. Users have the option to add an additional level of security by adding a Personal Encryption Key (PEK), which secures their private key. Your cryptocurrency backup files are secure in the cloud. Using our web or mobile interface, previous versions of files can be accessed, as well.

Our object storage cloud offering, Backblaze B2, can be used with a variety of applications for Windows, Macintosh, and Linux. With B2, cryptocurrency users can choose whichever method of encryption they wish to use on their local computers and then upload their encrypted currency keys to the cloud. Depending on the client used, versioning and life-cycle rules can be applied to the stored files.

Other backup programs and systems provide some or all of these capabilities, as well. If you are backing up to a local drive, it is a good idea to encrypt the local backup, which is an option in some backup programs.

Address Security

Some experts recommend using a different address for each cryptocurrency transaction. Since the address is not the same as your wallet, this means that you are not creating a new wallet, but simply using a new identifier for people sending you cryptocurrency. Creating a new address is usually as easy as clicking a button in the wallet.

One of the chief advantages of using a different address for each transaction is anonymity. Each time you use an address, you put more information into the public ledger (blockchain) about where the currency came from or where it went. That means that over time, using the same address repeatedly could mean that someone could map your relationships, transactions, and incoming funds. The more you use that address, the more information someone can learn about you. For more on this topic, refer to Address reuse.

Note that a downside of using a paper wallet with a single key pair (type-0 non-deterministic wallet) is that it has the vulnerabilities listed above. Each transaction using that paper wallet will add to the public record of transactions associated with that address. Newer wallets, i.e. “deterministic” or those using mnemonic code words support multiple addresses and are now recommended.

There are other approaches to keeping your cryptocurrency transaction secure. Here are a couple of them.

Multi-signature

Multi-signature refers to requiring more than one key to authorize a transaction, much like requiring more than one key to open a safe. It is generally used to divide up responsibility for possession of cryptocurrency. Standard transactions could be called “single-signature transactions” because transfers require only one signature — from the owner of the private key associated with the currency address (public key). Some wallets and apps can be configured to require more than one signature, which means that a group of people, businesses, or other entities all must agree to trade in the cryptocurrencies.

Deep Cold Storage

Deep cold storage ensures the entire transaction process happens in an offline environment. There are typically three elements to deep cold storage.

First, the wallet and private key are generated offline, and the signing of transactions happens on a system not connected to the internet in any manner. This ensures it’s never exposed to a potentially compromised system or connection.

Second, details are secured with encryption to ensure that even if the wallet file ends up in the wrong hands, the information is protected.

Third, storage of the encrypted wallet file or paper wallet is generally at a location or facility that has restricted access, such as a safety deposit box at a bank.

Deep cold storage is used to safeguard a large individual cryptocurrency portfolio held for the long term, or for trustees holding cryptocurrency on behalf of others, and is possibly the safest method to ensure a crypto investment remains secure.

Keep Your Software Up to Date

You should always make sure that you are using the latest version of your app or wallet software, which includes important stability and security fixes. Installing updates for all other software on your computer or mobile device is also important to keep your wallet environment safer.

One Last Thing: Think About Your Testament

Your cryptocurrency funds can be lost forever if you don’t have a backup plan for your peers and family. If the location of your wallets or your passwords is not known by anyone when you are gone, there is no hope that your funds will ever be recovered. Taking a bit of time on these matters can make a huge difference.

To the Moon*

Are you comfortable with how you’re managing and backing up your cryptocurrency wallets and keys? Do you have a suggestion for keeping your cryptocurrencies safe that we missed above? Please let us know in the comments.


*To the Moon — Crypto slang for a currency that reaches an optimistic price projection.

The post Securing Your Cryptocurrency appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Amazon Aurora Backtrack – Turn Back Time

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-aurora-backtrack-turn-back-time/

We’ve all been there! You need to make a quick, seemingly simple fix to an important production database. You compose the query, give it a once-over, and let it run. Seconds later you realize that you forgot the WHERE clause, dropped the wrong table, or made another serious mistake, and interrupt the query, but the damage has been done. You take a deep breath, whistle through your teeth, wish that reality came with an Undo option. Now what?

New Amazon Aurora Backtrack
Today I would like to tell you about the new backtrack feature for Amazon Aurora. This is as close as we can come, given present-day technology, to an Undo option for reality.

This feature can be enabled at launch time for all newly-launched Aurora database clusters. To enable it, you simply specify how far back in time you might want to rewind, and use the database as usual (this is on the Configure advanced settings page):

Aurora uses a distributed, log-structured storage system (read Design Considerations for High Throughput Cloud-Native Relational Databases to learn a lot more); each change to your database generates a new log record, identified by a Log Sequence Number (LSN). Enabling the backtrack feature provisions a FIFO buffer in the cluster for storage of LSNs. This allows for quick access and recovery times measured in seconds.

After that regrettable moment when all seems lost, you simply pause your application, open up the Aurora Console, select the cluster, and click Backtrack DB cluster:

Then you select Backtrack and choose the point in time just before your epic fail, and click Backtrack DB cluster:

Then you wait for the rewind to take place, unpause your application and proceed as if nothing had happened. When you initiate a backtrack, Aurora will pause the database, close any open connections, drop uncommitted writes, and wait for the backtrack to complete. Then it will resume normal operation and being to accept requests. The instance state will be backtracking while the rewind is underway:

The console will let you know when the backtrack is complete:

If it turns out that you went back a bit too far, you can backtrack to a later time. Other Aurora features such as cloning, backups, and restores continue to work on an instance that has been configured for backtrack.

I’m sure you can think of some creative and non-obvious use cases for this cool new feature. For example, you could use it to restore a test database after running a test that makes changes to the database. You can initiate the restoration from the API or the CLI, making it easy to integrate into your existing test framework.

Things to Know
This option applies to newly created MySQL-compatible Aurora database clusters and to MySQL-compatible clusters that have been restored from a backup. You must opt-in when you create or restore a cluster; you cannot enable it for a running cluster.

This feature is available now in all AWS Regions where Amazon Aurora runs, and you can start using it today.

Jeff;

Analyze data in Amazon DynamoDB using Amazon SageMaker for real-time prediction

Post Syndicated from YongSeong Lee original https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/

Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.

Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.

DynamoDB is a good fit for low-latency reads and writes, but it’s not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, it’s not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups are accessible from other services.

To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from Bank Marketing Data Set of UCI.

The solution that I describe provides the following benefits:

  • Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
  • Automatically updates your model to get real-time predictions
  • Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
  • Makes it easier for developers of all skill levels to use Amazon SageMaker

All code and data set in this post are available in this .zip file.

Solution architecture

The following diagram shows the overall architecture of the solution.

The steps that data follows through the architecture are as follows:

  1. Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3
  2. Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
  3. Amazon SageMaker renews the model artifact and update the endpoint.
  4. The converted CSV is available for ad hoc queries with Amazon Athena.
  5. Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.

Building the auto-updating model

This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.

Download sample scripts and data

Before you begin, take the following steps:

  1. Download sample scripts in this .zip file.
  2. Unzip the src.zip file.
  3. Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon ML. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own path.
  4. Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.

For this solution, the banking.csv  should be imported into a DynamoDB table.

Export a DynamoDB table

To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.

One core node(m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity to let Data Pipeline scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.

For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.

Add the script to an existing pipeline

After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:

  1. Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
  2. For Actions, choose Edit.
  3. In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.

Paste the following command into the new step after the data ­­upload step:

s3://#{myDDBRegion}.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket name>/automation_script.sh,#{output.directoryPath},#{myDDBRegion}

The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.

The bash script has two goals, converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.

Automation script: Convert JSON data to CSV with Hive

We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.

When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.

Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.

The full automation script for converting follows. Add your own bucket name and data source path in the highlighted areas.

#!/bin/bash
hive -e "
ADD jar s3://<your bucket name>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar ; 
DROP TABLE IF EXISTS blog_backup_data ;
CREATE EXTERNAL TABLE blog_backup_data (
 customer_id map<string,string>,
 age map<string,string>, job map<string,string>, 
 marital map<string,string>,education map<string,string>, 
 default map<string,string>, housing map<string,string>,
 loan map<string,string>, contact map<string,string>, 
 month map<string,string>, day_of_week map<string,string>, 
 duration map<string,string>, campaign map<string,string>,
 pdays map<string,string>, previous map<string,string>, 
 poutcome map<string,string>, emp_var_rate map<string,string>, 
 cons_price_idx map<string,string>, cons_conf_idx map<string,string>,
 euribor3m map<string,string>, nr_employed map<string,string>, 
 y map<string,string> ) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' 
LOCATION '$1/';

INSERT OVERWRITE DIRECTORY 's3://<your bucket name>/<datasource path>/' 
SELECT concat( customer_id['s'],',', 
 age['n'],',', job['s'],',', 
 marital['s'],',', education['s'],',', default['s'],',', 
 housing['s'],',', loan['s'],',', contact['s'],',', 
 month['s'],',', day_of_week['s'],',', duration['n'],',', 
 campaign['n'],',',pdays['n'],',',previous['n'],',', 
 poutcome['s'],',', emp_var_rate['n'],',', cons_price_idx['n'],',',
 cons_conf_idx['n'],',', euribor3m['n'],',', nr_employed['n'],',', y['n'] ) 
FROM blog_backup_data
WHERE customer_id['s'] > 0 ; 

After creating an external table, you need to read data. You then use the INSERT OVERWRITE DIRECTORY ~ SELECT command to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.

Depending on your requirements, you can eliminate or process the columns in the SELECT clause in this step to optimize data analysis. For example, you might remove some columns that have unpredictable correlations with the target value because keeping the wrong columns might expose your model to “overfitting” during the training. In this post, customer_id  columns is removed. Overfitting can make your prediction weak. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.

Automation script: Renew the Amazon SageMaker model

After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3.  For renewing model artifact, you must create a new training job.  Training jobs can be run using the AWS SDK ( for example, Amazon SageMaker boto3 ) or the Amazon SageMaker Python SDK that can be installed with “pip install sagemaker” command as well as the AWS CLI for Amazon SageMaker described in this post.

In addition, consider how to smoothly renew your existing model without service impact, because your model is called by applications in real time. To do this, you need to create a new endpoint configuration first and update a current endpoint with the endpoint configuration that is just created.

#!/bin/bash
## Define variable 
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>" 


# Select containers image based on region.  
case "$REGION" in
"us-west-2" )
    IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
    ;;
"us-east-1" )
    IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest" 
    ;;
"us-east-2" )
    IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest" 
    ;;
"eu-west-1" )
    IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest" 
    ;;
 *)
    echo "Invalid Region Name"
    exit 1 ;  
esac

# Start training job and creating model artifact 
TRAINING_JOB_NAME=TRAIN-${DTTIME} 
S3OUTPUT="s3://<your bucket name>/model/" 
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5 
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE}  --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None"  }]'  --output-data-config S3OutputPath=${S3OUTPUT} --resource-config  InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier  

# Wait until job completed 
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME}  --region ${REGION}

# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}  --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME}  --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT}  --execution-role-arn ${ROLE}

# create a new endpoint configuration 
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker  create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME}  --production-variants  VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge

# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name  ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ $STATUS -ne "InService" ]] ;
then
    aws sagemaker  create-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}    
else
    aws sagemaker  update-endpoint --endpoint-name  ServiceEndpoint  --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi

Grant permission

Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.

{
 "Version": "2012-10-17",
 "Statement": [
 {
 "Effect": "Allow",
 "Action": [
 "sagemaker:CreateTrainingJob",
 "sagemaker:DescribeTrainingJob",
 "sagemaker:CreateModel",
 "sagemaker:CreateEndpointConfig",
 "sagemaker:DescribeEndpoint",
 "sagemaker:CreateEndpoint",
 "sagemaker:UpdateEndpoint",
 "iam:PassRole"
 ],
 "Resource": "*"
 }
 ]
}

Use real-time prediction

After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.

Following, I provide a simple Python code example that queries against Amazon SageMaker endpoint URL with its name (“ServiceEndpoint”) and then uses them for real-time prediction.

=== Python sample for real-time prediction ===

#!/usr/bin/env python
import boto3
import json 

client = boto3.client('sagemaker-runtime', region_name ='<your region>' )
new_customer_info = '34,10,2,4,1,2,1,1,6,3,190,1,3,4,3,-1.7,94.055,-39.8,0.715,4991.6'
response = client.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    Body=new_customer_info, 
    ContentType='text/csv'
)
result = json.loads(response['Body'].read().decode())
print(result)
--- output(response) ---
{u'predictions': [{u'score': 0.7528127431869507, u'predicted_label': 1.0}]}

Solution summary

The solution takes the following steps:

  1. Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data.Note: You should select only meaningful attributes when you convert CSV. For example, if you judge that the “campaign” attribute is not correlated, you can eliminate this attribute from the CSV.
  2. Train the Amazon SageMaker model with the new data source.
  3. When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on “predictedScores” provided by Amazon SageMaker.
  4. If the new user subscribes your new product, your application must update the attribute “y” to the value 1 (for yes). This updated data is provided for the next model renewal as a new data source. It serves to improve the accuracy of your prediction. With each new entry, your application can become smarter and deliver better predictions.

Running ad hoc queries using Amazon Athena

Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.

With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog. 

Creating an Amazon Athena table and running it

Simply, you can create an EXTERNAL table for the CSV data on S3 in Amazon Athena Management Console.

=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
 age int, 
 job string, 
 marital string , 
 education string, 
 default string, 
 housing string, 
 loan string, 
 contact string, 
 month string, 
 day_of_week string, 
 duration int, 
 campaign int, 
 pdays int , 
 previous int , 
 poutcome string, 
 emp_var_rate double, 
 cons_price_idx double,
 cons_conf_idx double, 
 euribor3m double, 
 nr_employed double, 
 y int 
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n' 
LOCATION 's3://<your bucket name>/<datasource path>/';

The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.

=== Sample Query ===

SELECT corr(age,y) AS correlation_age_and_target, 
 corr(duration,y) AS correlation_duration_and_target, 
 corr(campaign,y) AS correlation_campaign_and_target,
 corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y , 
 CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact 
 FROM datasource 
 ) datasource ;

Conclusion

In this post, I introduce an example of how to analyze data in DynamoDB by using table data in Amazon S3 to optimize DynamoDB table read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also present how to automate these procedures by using Data Pipeline.

You can adapt this example to your specific use case at hand, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.

 


Additional Reading

If you found this post useful, be sure to check out Serving Real-Time Machine Learning Predictions on Amazon EMR and Analyzing Data in S3 using Amazon Athena.

 


About the Author

Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”

 

 

CI/CD with Data: Enabling Data Portability in a Software Delivery Pipeline with AWS Developer Tools, Kubernetes, and Portworx

Post Syndicated from Kausalya Rani Krishna Samy original https://aws.amazon.com/blogs/devops/cicd-with-data-enabling-data-portability-in-a-software-delivery-pipeline-with-aws-developer-tools-kubernetes-and-portworx/

This post is written by Eric Han – Vice President of Product Management Portworx and Asif Khan – Solutions Architect

Data is the soul of an application. As containers make it easier to package and deploy applications faster, testing plays an even more important role in the reliable delivery of software. Given that all applications have data, development teams want a way to reliably control, move, and test using real application data or, at times, obfuscated data.

For many teams, moving application data through a CI/CD pipeline, while honoring compliance and maintaining separation of concerns, has been a manual task that doesn’t scale. At best, it is limited to a few applications, and is not portable across environments. The goal should be to make running and testing stateful containers (think databases and message buses where operations are tracked) as easy as with stateless (such as with web front ends where they are often not).

Why is state important in testing scenarios? One reason is that many bugs manifest only when code is tested against real data. For example, we might simply want to test a database schema upgrade but a small synthetic dataset does not exercise the critical, finer corner cases in complex business logic. If we want true end-to-end testing, we need to be able to easily manage our data or state.

In this blog post, we define a CI/CD pipeline reference architecture that can automate data movement between applications. We also provide the steps to follow to configure the CI/CD pipeline.

 

Stateful Pipelines: Need for Portable Volumes

As part of continuous integration, testing, and deployment, a team may need to reproduce a bug found in production against a staging setup. Here, the hosting environment is comprised of a cluster with Kubernetes as the scheduler and Portworx for persistent volumes. The testing workflow is then automated by AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild.

Portworx offers Kubernetes storage that can be used to make persistent volumes portable between AWS environments and pipelines. The addition of Portworx to the AWS Developer Tools continuous deployment for Kubernetes reference architecture adds persistent storage and storage orchestration to a Kubernetes cluster. The example uses MongoDB as the demonstration of a stateful application. In practice, the workflow applies to any containerized application such as Cassandra, MySQL, Kafka, and Elasticsearch.

Using the reference architecture, a developer calls CodePipeline to trigger a snapshot of the running production MongoDB database. Portworx then creates a block-based, writable snapshot of the MongoDB volume. Meanwhile, the production MongoDB database continues serving end users and is uninterrupted.

Without the Portworx integrations, a manual process would require an application-level backup of the database instance that is outside of the CI/CD process. For larger databases, this could take hours and impact production. The use of block-based snapshots follows best practices for resilient and non-disruptive backups.

As part of the workflow, CodePipeline deploys a new MongoDB instance for staging onto the Kubernetes cluster and mounts the second Portworx volume that has the data from production. CodePipeline triggers the snapshot of a Portworx volume through an AWS Lambda function, as shown here

 

 

 

AWS Developer Tools with Kubernetes: Integrated Workflow with Portworx

In the following workflow, a developer is testing changes to a containerized application that calls on MongoDB. The tests are performed against a staging instance of MongoDB. The same workflow applies if changes were on the server side. The original production deployment is scheduled as a Kubernetes deployment object and uses Portworx as the storage for the persistent volume.

The continuous deployment pipeline runs as follows:

  • Developers integrate bug fix changes into a main development branch that gets merged into a CodeCommit master branch.
  • Amazon CloudWatch triggers the pipeline when code is merged into a master branch of an AWS CodeCommit repository.
  • AWS CodePipeline sends the new revision to AWS CodeBuild, which builds a Docker container image with the build ID.
  • AWS CodeBuild pushes the new Docker container image tagged with the build ID to an Amazon ECR registry.
  • Kubernetes downloads the new container (for the database client) from Amazon ECR and deploys the application (as a pod) and staging MongoDB instance (as a deployment object).
  • AWS CodePipeline, through a Lambda function, calls Portworx to snapshot the production MongoDB and deploy a staging instance of MongoDB• Portworx provides a snapshot of the production instance as the persistent storage of the staging MongoDB
    • The MongoDB instance mounts the snapshot.

At this point, the staging setup mimics a production environment. Teams can run integration and full end-to-end tests, using partner tooling, without impacting production workloads. The full pipeline is shown here.

 

Summary

This reference architecture showcases how development teams can easily move data between production and staging for the purposes of testing. Instead of taking application-specific manual steps, all operations in this CodePipeline architecture are automated and tracked as part of the CI/CD process.

This integrated experience is part of making stateful containers as easy as stateless. With AWS CodePipeline for CI/CD process, developers can easily deploy stateful containers onto a Kubernetes cluster with Portworx storage and automate data movement within their process.

The reference architecture and code are available on GitHub:

● Reference architecture: https://github.com/portworx/aws-kube-codesuite
● Lambda function source code for Portworx additions: https://github.com/portworx/aws-kube-codesuite/blob/master/src/kube-lambda.py

For more information about persistent storage for containers, visit the Portworx website. For more information about Code Pipeline, see the AWS CodePipeline User Guide.

Reddit Repeat Infringer Policy Shuts Down Megalinks Piracy Sub

Post Syndicated from Andy original https://torrentfreak.com/reddit-repeat-infringer-policy-shuts-down-megalinks-piracy-sub-180430/

Without doubt, Reddit is one of the most popular sites on the entire Internet. At the time of writing it’s the fourth most visited site in the US with 330 million users per month generating 14 billion screenviews.

The core of the site’s success is its communities. Known as ‘sub-Reddits’ or just ‘subs’, there are currently 138,000 of them dedicated to every single subject you can think of and tens of thousands you’d never considered.

Even though they’re technically forbidden, a small but significant number are dedicated to piracy, offering links to copyright-infringing content hosted elsewhere. One of the most popular is /r/megalinks, which is dedicated to listing infringing content (mainly movies and TV shows) uploaded to file-hosting site Mega.

Considering its activities, Megalinks has managed to stay online longer than most people imagined but following an intervention from Reddit, the content indexing sub has stopped accepting new submissions, which will effectively shut it down.

In an announcement Sunday, the sub’s moderators explained that following a direct warning from Reddit’s administrators, the decision had been taken to move on.

“As most of you know by now, we’ve had to deal with a lot of DMCA takedowns over the last 6 months. Everyone knew this day would come, eventually, and its finally here,” they wrote.

“We received a formal warning from Reddit’s administration 2 days ago, and have decided to restrict new submissions for the safety of the subreddit.”

The message from Reddit’s operators makes it absolutely clear that Reddit isn’t the platform to host what amounts to a piracy links forum.

“This is an official warning from Reddit that we are receiving too many copyright infringement notices about material posted to your community. We will be required to ban this community if you can’t adequately address the problem,” the warning reads.

Noting that Redditors aren’t allowed to post content that infringes copyrights, the administrators say they are required by law to handle DMCA notices and that in cases where infringement happens on multiple occasions, that needs to be handled in a more aggressive manner.

“The law also requires us to issue bans in cases of repeat infringement. Sometimes a repeat infringement problem is limited to just one user and we ban just that person. Other times the problem pervades a whole community and we ban the community,” the admins continue.

“This is our formal warning about repeat infringement in this community. Over the past three months we’ve had to remove material from the community in response to copyright notices 60 times. That’s an unusually high number taking into account the community’s size.

The warning suggests ways to keep infringing content down but in a sub dedicated to piracy, they’re all completely irrelevant. It also suggests removing old posts to ensure that Reddit doesn’t keep getting notices, but that would mean deleting pretty much everything. Backups exist but a simple file is a poor substitute for a community.

So, with Reddit warning that without change the sub will be banned, the moderators of /r/megalinks have decided to move on to a new home. Reportedly hosted ‘offshore’, their new forum already has more than 9,800 members and is likely to grow quickly as the word spreads.

A month ago, the /r/megaporn sub-Reddit suffered a similar fate following a warning from Reddit’s admins. It successfully launched a new external forum which is why the Megalinks crew decided on the same model.

“[A]fter seeing how /r/megaporn approached the same situation, we had started working on an offshore forum a week ago in anticipation of the ban. This allows us to work however we want, without having to deal with Reddit’s policies and administration,” the team explain.

Ever since the BMG v Cox case went bad ways for the ISP in 2015, repeat infringer policies have become a very hot topic in the US. That Reddit is now drawing a line in the sand over a relatively small number of complaints (at least compared to other similar platforms) is clear notice that Reddit and blatant piracy won’t be allowed to walk hand in hand.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Ransomware Update: Viruses Targeting Business IT Servers

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/ransomware-update-viruses-targeting-business-it-servers/

Ransomware warning message on computer

As ransomware attacks have grown in number in recent months, the tactics and attack vectors also have evolved. While the primary method of attack used to be to target individual computer users within organizations with phishing emails and infected attachments, we’re increasingly seeing attacks that target weaknesses in businesses’ IT infrastructure.

How Ransomware Attacks Typically Work

In our previous posts on ransomware, we described the common vehicles used by hackers to infect organizations with ransomware viruses. Most often, downloaders distribute trojan horses through malicious downloads and spam emails. The emails contain a variety of file attachments, which if opened, will download and run one of the many ransomware variants. Once a user’s computer is infected with a malicious downloader, it will retrieve additional malware, which frequently includes crypto-ransomware. After the files have been encrypted, a ransom payment is demanded of the victim in order to decrypt the files.

What’s Changed With the Latest Ransomware Attacks?

In 2016, a customized ransomware strain called SamSam began attacking the servers in primarily health care institutions. SamSam, unlike more conventional ransomware, is not delivered through downloads or phishing emails. Instead, the attackers behind SamSam use tools to identify unpatched servers running Red Hat’s JBoss enterprise products. Once the attackers have successfully gained entry into one of these servers by exploiting vulnerabilities in JBoss, they use other freely available tools and scripts to collect credentials and gather information on networked computers. Then they deploy their ransomware to encrypt files on these systems before demanding a ransom. Gaining entry to an organization through its IT center rather than its endpoints makes this approach scalable and especially unsettling.

SamSam’s methodology is to scour the Internet searching for accessible and vulnerable JBoss application servers, especially ones used by hospitals. It’s not unlike a burglar rattling doorknobs in a neighborhood to find unlocked homes. When SamSam finds an unlocked home (unpatched server), the software infiltrates the system. It is then free to spread across the company’s network by stealing passwords. As it transverses the network and systems, it encrypts files, preventing access until the victims pay the hackers a ransom, typically between $10,000 and $15,000. The low ransom amount has encouraged some victimized organizations to pay the ransom rather than incur the downtime required to wipe and reinitialize their IT systems.

The success of SamSam is due to its effectiveness rather than its sophistication. SamSam can enter and transverse a network without human intervention. Some organizations are learning too late that securing internet-facing services in their data center from attack is just as important as securing endpoints.

The typical steps in a SamSam ransomware attack are:

1
Attackers gain access to vulnerable server
Attackers exploit vulnerable software or weak/stolen credentials.
2
Attack spreads via remote access tools
Attackers harvest credentials, create SOCKS proxies to tunnel traffic, and abuse RDP to install SamSam on more computers in the network.
3
Ransomware payload deployed
Attackers run batch scripts to execute ransomware on compromised machines.
4
Ransomware demand delivered requiring payment to decrypt files
Demand amounts vary from victim to victim. Relatively low ransom amounts appear to be designed to encourage quick payment decisions.

What all the organizations successfully exploited by SamSam have in common is that they were running unpatched servers that made them vulnerable to SamSam. Some organizations had their endpoints and servers backed up, while others did not. Some of those without backups they could use to recover their systems chose to pay the ransom money.

Timeline of SamSam History and Exploits

Since its appearance in 2016, SamSam has been in the news with many successful incursions into healthcare, business, and government institutions.

March 2016
SamSam appears

SamSam campaign targets vulnerable JBoss servers
Attackers hone in on healthcare organizations specifically, as they’re more likely to have unpatched JBoss machines.

April 2016
SamSam finds new targets

SamSam begins targeting schools and government.
After initial success targeting healthcare, attackers branch out to other sectors.

April 2017
New tactics include RDP

Attackers shift to targeting organizations with exposed RDP connections, and maintain focus on healthcare.
An attack on Erie County Medical Center costs the hospital $10 million over three months of recovery.
Erie County Medical Center attacked by SamSam ransomware virus

January 2018
Municipalities attacked

• Attack on Municipality of Farmington, NM.
• Attack on Hancock Health.
Hancock Regional Hospital notice following SamSam attack
• Attack on Adams Memorial Hospital
• Attack on Allscripts (Electronic Health Records), which includes 180,000 physicians, 2,500 hospitals, and 7.2 million patients’ health records.

February 2018
Attack volume increases

• Attack on Davidson County, NC.
• Attack on Colorado Department of Transportation.
SamSam virus notification

March 2018
SamSam shuts down Atlanta

• Second attack on Colorado Department of Transportation.
• City of Atlanta suffers a devastating attack by SamSam.
The attack has far-reaching impacts — crippling the court system, keeping residents from paying their water bills, limiting vital communications like sewer infrastructure requests, and pushing the Atlanta Police Department to file paper reports.
Atlanta Ransomware outage alert
• SamSam campaign nets $325,000 in 4 weeks.
Infections spike as attackers launch new campaigns. Healthcare and government organizations are once again the primary targets.

How to Defend Against SamSam and Other Ransomware Attacks

The best way to respond to a ransomware attack is to avoid having one in the first place. If you are attacked, making sure your valuable data is backed up and unreachable by ransomware infection will ensure that your downtime and data loss will be minimal or none if you ever suffer an attack.

In our previous post, How to Recover From Ransomware, we listed the ten ways to protect your organization from ransomware.

  1. Use anti-virus and anti-malware software or other security policies to block known payloads from launching.
  2. Make frequent, comprehensive backups of all important files and isolate them from local and open networks. Cybersecurity professionals view data backup and recovery (74% in a recent survey) by far as the most effective solution to respond to a successful ransomware attack.
  3. Keep offline backups of data stored in locations inaccessible from any potentially infected computer, such as disconnected external storage drives or the cloud, which prevents them from being accessed by the ransomware.
  4. Install the latest security updates issued by software vendors of your OS and applications. Remember to patch early and patch often to close known vulnerabilities in operating systems, server software, browsers, and web plugins.
  5. Consider deploying security software to protect endpoints, email servers, and network systems from infection.
  6. Exercise cyber hygiene, such as using caution when opening email attachments and links.
  7. Segment your networks to keep critical computers isolated and to prevent the spread of malware in case of attack. Turn off unneeded network shares.
  8. Turn off admin rights for users who don’t require them. Give users the lowest system permissions they need to do their work.
  9. Restrict write permissions on file servers as much as possible.
  10. Educate yourself, your employees, and your family in best practices to keep malware out of your systems. Update everyone on the latest email phishing scams and human engineering aimed at turning victims into abettors.

Please Tell Us About Your Experiences with Ransomware

Have you endured a ransomware attack or have a strategy to avoid becoming a victim? Please tell us of your experiences in the comments.

The post Ransomware Update: Viruses Targeting Business IT Servers appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Congratulations to Oracle on MySQL 8.0

Post Syndicated from Michael "Monty" Widenius original http://monty-says.blogspot.com/2018/04/congratulations-to-oracle-on-mysql-80.html

Last week, Oracle announced the general availability of MySQL 8.0. This is good news for database users, as it means Oracle is still developing MySQL.

I decide to celebrate the event by doing a quick test of MySQL 8.0. Here follows a step-by-step description of my first experience with MySQL 8.0.
Note that I did the following without reading the release notes, as is what I have done with every MySQL / MariaDB release up to date; In this case it was not the right thing to do.

I pulled MySQL 8.0 from [email protected]:mysql/mysql-server.git
I was pleasantly surprised that ‘cmake . ; make‘ worked without without any compiler warnings! I even checked the used compiler options and noticed that MySQL was compiled with -Wall + several other warning flags. Good job MySQL team!

I did have a little trouble finding the mysqld binary as Oracle had moved it to ‘runtime_output_directory’; Unexpected, but no big thing.

Now it’s was time to install MySQL 8.0.

I did know that MySQL 8.0 has removed mysql_install_db, so I had to use the mysqld binary directly to install the default databases:
(I have specified datadir=/my/data3 in the /tmp/my.cnf file)

> cd runtime_output_directory
> mkdir /my/data3
> ./mysqld –defaults-file=/tmp/my.cnf –install

2018-04-22T12:38:18.332967Z 1 [ERROR] [MY-011011] [Server] Failed to find valid data directory.
2018-04-22T12:38:18.333109Z 0 [ERROR] [MY-010020] [Server] Data Dictionary initialization failed.
2018-04-22T12:38:18.333135Z 0 [ERROR] [MY-010119] [Server] Aborting

A quick look in mysqld –help –verbose output showed that the right command option is –-initialize. My bad, lets try again,

> ./mysqld –defaults-file=/tmp/my.cnf –initialize

2018-04-22T12:39:31.910509Z 0 [ERROR] [MY-010457] [Server] –initialize specified but the data directory has files in it. Aborting.
2018-04-22T12:39:31.910578Z 0 [ERROR] [MY-010119] [Server] Aborting

Now I used the right options, but still didn’t work.
I took a quick look around:

> ls /my/data3/
binlog.index

So even if the mysqld noticed that the data3 directory was wrong, it still wrote things into it.  This even if I didn’t have –log-binlog enabled in the my.cnf file. Strange, but easy to fix:

> rm /my/data3/binlog.index
> ./mysqld –defaults-file=/tmp/my.cnf –initialize

2018-04-22T12:40:45.633637Z 0 [ERROR] [MY-011071] [Server] unknown variable ‘max-tmp-tables=100’
2018-04-22T12:40:45.633657Z 0 [Warning] [MY-010952] [Server] The privilege system failed to initialize correctly. If you have upgraded your server, make sure you’re executing mysql_upgrade to correct the issue.
2018-04-22T12:40:45.633663Z 0 [ERROR] [MY-010119] [Server] Aborting

The warning about the privilege system confused me a bit, but I ignored it for the time being and removed from my configuration files the variables that MySQL 8.0 doesn’t support anymore. I couldn’t find a list of the removed variables anywhere so this was done with the trial and error method.

> ./mysqld –defaults-file=/tmp/my.cnf

2018-04-22T12:42:56.626583Z 0 [ERROR] [MY-010735] [Server] Can’t open the mysql.plugin table. Please run mysql_upgrade to create it.
2018-04-22T12:42:56.827685Z 0 [Warning] [MY-010015] [Repl] Gtid table is not ready to be used. Table ‘mysql.gtid_executed’ cannot be opened.
2018-04-22T12:42:56.838501Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2018-04-22T12:42:56.848375Z 0 [Warning] [MY-010441] [Server] Failed to open optimizer cost constant tables
2018-04-22T12:42:56.848863Z 0 [ERROR] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we’re sending the information to the error-log instead: MY-001146 – Table ‘mysql.component’ doesn’t exist
2018-04-22T12:42:56.848916Z 0 [Warning] [MY-013129] [Server] A message intended for a client cannot be sent there as no client-session is attached. Therefore, we’re sending the information to the error-log instead: MY-003543 – The mysql.component table is missing or has an incorrect definition.
….
2018-04-22T12:42:56.854141Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: ‘8.0.11’ socket: ‘/tmp/mysql.sock’ port: 3306 Source distribution.

I figured out that if there is a single wrong variable in the configuration file, running mysqld –initialize will leave the database in an inconsistent state. NOT GOOD! I am happy I didn’t try this in a production system!

Time to start over from the beginning:

> rm -r /my/data3/*
> ./mysqld –defaults-file=/tmp/my.cnf –initialize

2018-04-22T12:44:45.548960Z 5 [Note] [MY-010454] [Server] A temporary password is generated for [email protected]: px)NaaSp?6um
2018-04-22T12:44:51.221751Z 0 [System] [MY-013170] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld (mysqld 8.0.11) initializing of server has completed

Success!

I wonder why the temporary password is so complex; It could easily have been something that one could easily remember without decreasing security, it’s temporary after all. No big deal, one can always paste it from the logs. (Side note: MariaDB uses socket authentication on many system and thus doesn’t need temporary installation passwords).

Now lets start the MySQL server for real to do some testing:

> ./mysqld –defaults-file=/tmp/my.cnf

2018-04-22T12:45:43.683484Z 0 [System] [MY-010931] [Server] /home/my/mysql-8.0/runtime_output_directory/mysqld: ready for connections. Version: ‘8.0.11’ socket: ‘/tmp/mysql.sock’ port: 3306 Source distribution.

And the lets start the client:

> ./client/mysql –socket=/tmp/mysql.sock –user=root –password=”px)NaaSp?6um”
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Apparently MySQL 8.0 doesn’t work with old MySQL / MariaDB clients by default 🙁

I was testing this in a system with MariaDB installed, like all modern Linux system today, and didn’t want to use the MySQL clients or libraries.

I decided to try to fix this by changing the authentication to the native (original) MySQL authentication method.

> mysqld –skip-grant-tables

> ./client/mysql –socket=/tmp/mysql.sock –user=root
ERROR 1045 (28000): Access denied for user ‘root’@’localhost’ (using password: NO)

Apparently –skip-grant-tables is not good enough anymore. Let’s try again with:

> mysqld –skip-grant-tables –default_authentication_plugin=mysql_native_password

> ./client/mysql –socket=/tmp/mysql.sock –user=root mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.11 Source distribution

Great, we are getting somewhere, now lets fix “root”  to work with the old authenticaion:

MySQL [mysql]> update mysql.user set plugin=”mysql_native_password”,authentication_string=password(“test”) where user=”root”;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘(“test”) where user=”root”‘ at line 1

A quick look in the MySQL 8.0 release notes told me that the PASSWORD() function is removed in 8.0. Why???? I don’t know how one in MySQL 8.0 is supposed to generate passwords compatible with old installations of MySQL. One could of course start an old MySQL or MariaDB version, execute the password() function and copy the result.

I decided to fix this the easy way and use an empty password:

(Update:: I later discovered that the right way would have been to use: FLUSH PRIVILEGES;  ALTER USER’ root’@’localhost’ identified by ‘test’  ; I however dislike this syntax as it has the password in clear text which is easy to grab and the command can’t be used to easily update the mysql.user table. One must also disable the –skip-grant mode to do use this)

MySQL [mysql]> update mysql.user set plugin=”mysql_native_password”,authentication_string=”” where user=”root”;
Query OK, 1 row affected (0.077 sec)
Rows matched: 1 Changed: 1 Warnings: 0
 
I restarted mysqld:
> mysqld –default_authentication_plugin=mysql_native_password

> ./client/mysql –user=root –password=”” mysql
ERROR 1862 (HY000): Your password has expired. To log in you must change it using a client that supports expired passwords.

Ouch, forgot that. Lets try again:

> mysqld –skip-grant-tables –default_authentication_plugin=mysql_native_password

> ./client/mysql –user=root –password=”” mysql
MySQL [mysql]> update mysql.user set password_expired=”N” where user=”root”;

Now restart and test worked:

> ./mysqld –default_authentication_plugin=mysql_native_password

>./client/mysql –user=root –password=”” mysql

Finally I had a working account that I can use to create other users!

When looking at mysqld –help –verbose again. I noticed the option:

–initialize-insecure
Create the default database and exit. Create a super user
with empty password.

I decided to check if this would have made things easier:

> rm -r /my/data3/*
> ./mysqld –defaults-file=/tmp/my.cnf –initialize-insecure

2018-04-22T13:18:06.629548Z 5 [Warning] [MY-010453] [Server] [email protected] is created with an empty password ! Please consider switching off the –initialize-insecure option.

Hm. Don’t understand the warning as–initialize-insecure is not an option that one would use more than one time and thus nothing one would ‘switch off’.

> ./mysqld –defaults-file=/tmp/my.cnf

> ./client/mysql –user=root –password=”” mysql
ERROR 2059 (HY000): Plugin caching_sha2_password could not be loaded: /usr/local/mysql/lib/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory

Back to the beginning 🙁

To get things to work with old clients, one has to initialize the database with:
> ./mysqld –defaults-file=/tmp/my.cnf –initialize-insecure –default_authentication_plugin=mysql_native_password

Now I finally had MySQL 8.0 up and running and thought I would take it up for a spin by running the “standard” MySQL/MariaDB sql-bench test suite. This was removed in MySQL 5.7, but as I happened to have MariaDB 10.3 installed, I decided to run it from there.

sql-bench is a single threaded benchmark that measures the “raw” speed for some common operations. It gives you the ‘maximum’ performance for a single query. Its different from other benchmarks that measures the maximum throughput when you have a lot of users, but sql-bench still tells you a lot about what kind of performance to expect from the database.

I tried first to be clever and create the “test” database, that I needed for sql-bench, with
> mkdir /my/data3/test

but when I tried to run the benchmark, MySQL 8.0 complained that the test database didn’t exist.

MySQL 8.0 has gone away from the original concept of MySQL where the user can easily
create directories and copy databases into the database directory. This may have serious
implication for anyone doing backup of databases and/or trying to restore a backup with normal OS commands.

I created the ‘test’ database with mysqladmin and then tried to run sql-bench:

> ./run-all-tests –user=root

The first run failed in test-ATIS:

Can’t execute command ‘create table class_of_service (class_code char(2) NOT NULL,rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_code))’
Error: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘rank tinyint(2) NOT NULL,class_description char(80) NOT NULL,PRIMARY KEY (class_’ at line 1

This happened because ‘rank‘ is now a reserved word in MySQL 8.0. This is also reserved in ANSI SQL, but I don’t know of any other database that has failed to run test-ATIS before. I have in the past run it against Oracle, PostgreSQL, Mimer, MSSQL etc without any problems.

MariaDB also has ‘rank’ as a keyword in 10.2 and 10.3 but one can still use it as an identifier.

I fixed test-ATIS and then managed to run all tests on MySQL 8.0.

I did run the test both with MySQL 8.0 and MariaDB 10.3 with the InnoDB storage engine and by having identical values for all InnoDB variables, table-definition-cache and table-open-cache. I turned off performance schema for both databases. All test are run with a user with an empty password (to keep things comparable and because it’s was too complex to generate a password in MySQL 8.0)

The result are as follows
Results per test in seconds:

Operation         |MariaDB|MySQL-8|

———————————–
ATIS              | 153.00| 228.00|
alter-table       |  92.00| 792.00|
big-tables        | 990.00|2079.00|
connect           | 186.00| 227.00|
create            | 575.00|4465.00|
insert            |4552.00|8458.00|
select            | 333.00| 412.00|
table-elimination |1900.00|3916.00|
wisconsin         | 272.00| 590.00|
———————————–

This is of course just a first view of the performance of MySQL 8.0 in a single user environment. Some reflections about the results:

  • Alter-table test is slower (as expected) in 8.0 as some of the alter tests benefits of the instant add column in MariaDB 10.3.
  • connect test is also better for MariaDB as we put a lot of efforts to speed this up in MariaDB 10.2
  • table-elimination shows an optimization in MariaDB for the  Anchor table model, which MySQL doesn’t have.
  • CREATE and DROP TABLE is almost 8 times slower in MySQL 8.0 than in MariaDB 10.3. I assume this is the cost of ‘atomic DDL’. This may also cause performance problems for any thread using the data dictionary when another thread is creating/dropping tables.
  • When looking at the individual test results, MySQL 8.0 was slower in almost every test, in many significantly slower.
  • The only test where MySQL was faster was “update_with_key_prefix”. I checked this and noticed that there was a bug in the test and the columns was updated to it’s original value (which should be instant with any storage engine). This is an old bug that MySQL has found and fixed and that we have not been aware of in the test or in MariaDB.
  • While writing this, I noticed that MySQL 8.0 is now using utf8mb4 as the default character set instead of latin1. This may affect some of the benchmarks slightly (not much as most tests works with numbers and Oracle claims that utf8mb4 is only 20% slower than latin1), but needs to be verified.
  • Oracle claims that MySQL 8.0 is much faster on multi user benchmarks. The above test indicates that they may have done this by sacrificing single user performance.
  •  We need to do more and many different benchmarks to better understand exactly what is going on. Stay tuned!

Short summary of my first run with MySQL 8.0:

  • Using the new caching_sha2_password authentication as default for new installation is likely to cause a lot of problems for users. No old application will be able to use MySQL 8.0, installed with default options, without moving to MySQL’s client libraries. While working on this blog I saw MySQL users complain on IRC that not even MySQL Workbench can authenticate with MySQL 8.0. This is the first time in MySQL’s history where such an incompatible change has ever been done!
  • Atomic DDL is a good thing (We plan to have this in MariaDB 10.4), but it should not have such a drastic impact on performance. I am also a bit skeptical of MySQL 8.0 having just one copy of the data dictionary as if this gets corrupted you will lose all your data. (Single point of failure)
  • MySQL 8.0 has several new reserved words and has removed a lot of variables, which makes upgrades hard. Before upgrading to MySQL 8.0 one has to check all one’s databases and applications to ensure that there are no conflicts.
  • As my test above shows, if you have a single deprecated variable in your configuration files, the installation of MySQL will abort and can leave the database in inconsistent state. I did of course my tests by installing into an empty data dictionary, but one can assume that some of the problems may also happen when upgrading an old installation.

Conclusions:
In many ways, MySQL 8.0 has caught up with some earlier versions of MariaDB. For instance, in MariaDB 10.0, we introduced roles (four years ago). In MariaDB 10.1, we introduced encrypted redo/undo logs (three years ago). In MariaDB 10.2, we introduced window functions and CTEs (a year ago). However, some catch-up of MariaDB Server 10.2 features still remains for MySQL (such as check constraints, binlog compression, and log-based rollback).

MySQL 8.0 has a few new interesting features (mostly Atomic DDL and JSON TABLE functions), but at the same time MySQL has strayed away from some of the fundamental corner stone principles of MySQL:

From the start of the first version of MySQL in 1995, all development has been focused around 3 core principles:

  • Ease of use
  • Performance
  • Stability

With MySQL 8.0, Oracle has sacrifices 2 of 3 of these.

In addition (as part of ease of use), while I was working on MySQL, we did our best to ensure that the following should hold:

  • Upgrades should be trivial
  • Things should be kept compatible, if possible (don’t remove features/options/functions that are used)
  • Minimize reserved words, don’t remove server variables
  • One should be able to use normal OS commands to create and drop databases, copy and move tables around within the same system or between different systems. With 8.0 and data dictionary taking backups of specific tables will be hard, even if the server is not running.
  • mysqldump should always be usable backups and to move to new releases
  • Old clients and application should be able to use ‘any’ MySQL server version unchanged. (Some Oracle client libraries, like C++, by default only supports the new X protocol and can thus not be used with older MySQL or any MariaDB version)

We plan to add a data dictionary to MariaDB 10.4 or MariaDB 10.5, but in a way to not sacrifice any of the above principles!

The competition between MySQL and MariaDB is not just about a tactical arms race on features. It’s about design philosophy, or strategic vision, if you will.

This shows in two main ways: our respective view of the Storage Engine structure, and of the top-level direction of the roadmap.

On the Storage Engine side, MySQL is converging on InnoDB, even for clustering and partitioning. In doing so, they are abandoning the advantages of multiple ways of storing data. By contrast, MariaDB sees lots of value in the Storage Engine architecture: MariaDB Server 10.3 will see the general availability of MyRocks (for write-intensive workloads) and Spider (for scalable workloads). On top of that, we have ColumnStore for analytical workloads. One can use the CONNECT engine to join with other databases. The use of different storage engines for different workloads and different hardware is a competitive differentiator, now more than ever.

On the roadmap side, MySQL is carefully steering clear of features that close the gap between MySQL and Oracle. MariaDB has no such constraints. With MariaDB 10.3, we are introducing PL/SQL compatibility (Oracle’s stored procedures) and AS OF (built-in system versioned tables with point-in-time querying). For both of those features, MariaDB is the first Open Source database doing so. I don’t except Oracle to provide any of the above features in MySQL!

Also on the roadmap side, MySQL is not working with the ecosystem in extending the functionality. In 2017, MariaDB accepted more code contributions in one year, than MySQL has done during its entire lifetime, and the rate is increasing!

I am sure that the experience I had with testing MySQL 8.0 would have been significantly better if MySQL would have an open development model where the community could easily participate in developing and testing MySQL continuously. Most of the confusing error messages and strange behavior would have been found and fixed long before the GA release.

Before upgrading to MySQL 8.0 please read https://dev.mysql.com/doc/refman/8.0/en/upgrading-from-previous-series.html to see what problems you can run into! Don’t expect that old installations or applications will work out of the box without testing as a lot of features and options has been removed (query cache, partition of myisam tables etc)! You probably also have to revise your backup methods, especially if you want to ever restore just a few tables. (With 8.0, I don’t know how this can be easily done).

According to the MySQL 8.0 release notes, one can’t use mysqldump to copy a database to MySQL 8.0. One has to first to move to a MySQL 5.7 GA version (with mysqldump, as recommended by Oracle) and then to MySQL 8.0 with in-place update. I assume this means that all old mysqldump backups are useless for MySQL 8.0?

MySQL 8.0 seams to be a one way street to an unknown future. Up to MySQL 5.7 it has been trivial to move to MariaDB and one could always move back to MySQL with mysqldump. All MySQL client libraries has worked with MariaDB and all MariaDB client libraries has worked with MySQL. With MySQL 8.0 this has changed in the wrong direction.

As long as you are using MySQL 5.7 and below you have choices for your future, after MySQL 8.0 you have very little choice. But don’t despair, as MariaDB will always be able to load a mysqldump file and it’s very easy to upgrade your old MySQL installation to MariaDB 🙂

I wish you good luck to try MySQL 8.0 (and also the upcoming MariaDB 10.3)!

Securing Elections

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/04/securing_electi_1.html

Elections serve two purposes. The first, and obvious, purpose is to accurately choose the winner. But the second is equally important: to convince the loser. To the extent that an election system is not transparently and auditably accurate, it fails in that second purpose. Our election systems are failing, and we need to fix them.

Today, we conduct our elections on computers. Our registration lists are in computer databases. We vote on computerized voting machines. And our tabulation and reporting is done on computers. We do this for a lot of good reasons, but a side effect is that elections now have all the insecurities inherent in computers. The only way to reliably protect elections from both malice and accident is to use something that is not hackable or unreliable at scale; the best way to do that is to back up as much of the system as possible with paper.

Recently, there have been two graphic demonstrations of how bad our computerized voting system is. In 2007, the states of California and Ohio conducted audits of their electronic voting machines. Expert review teams found exploitable vulnerabilities in almost every component they examined. The researchers were able to undetectably alter vote tallies, erase audit logs, and load malware on to the systems. Some of their attacks could be implemented by a single individual with no greater access than a normal poll worker; others could be done remotely.

Last year, the Defcon hackers’ conference sponsored a Voting Village. Organizers collected 25 pieces of voting equipment, including voting machines and electronic poll books. By the end of the weekend, conference attendees had found ways to compromise every piece of test equipment: to load malicious software, compromise vote tallies and audit logs, or cause equipment to fail.

It’s important to understand that these were not well-funded nation-state attackers. These were not even academics who had been studying the problem for weeks. These were bored hackers, with no experience with voting machines, playing around between parties one weekend.

It shouldn’t be any surprise that voting equipment, including voting machines, voter registration databases, and vote tabulation systems, are that hackable. They’re computers — often ancient computers running operating systems no longer supported by the manufacturers — and they don’t have any magical security technology that the rest of the industry isn’t privy to. If anything, they’re less secure than the computers we generally use, because their manufacturers hide any flaws behind the proprietary nature of their equipment.

We’re not just worried about altering the vote. Sometimes causing widespread failures, or even just sowing mistrust in the system, is enough. And an election whose results are not trusted or believed is a failed election.

Voting systems have another requirement that makes security even harder to achieve: the requirement for a secret ballot. Because we have to securely separate the election-roll system that determines who can vote from the system that collects and tabulates the votes, we can’t use the security systems available to banking and other high-value applications.

We can securely bank online, but can’t securely vote online. If we could do away with anonymity — if everyone could check that their vote was counted correctly — then it would be easy to secure the vote. But that would lead to other problems. Before the US had the secret ballot, voter coercion and vote-buying were widespread.

We can’t, so we need to accept that our voting systems are insecure. We need an election system that is resilient to the threats. And for many parts of the system, that means paper.

Let’s start with the voter rolls. We know they’ve already been targeted. In 2016, someone changed the party affiliation of hundreds of voters before the Republican primary. That’s just one possibility. A well-executed attack that deletes, for example, one in five voters at random — or changes their addresses — would cause chaos on election day.

Yes, we need to shore up the security of these systems. We need better computer, network, and database security for the various state voter organizations. We also need to better secure the voter registration websites, with better design and better internet security. We need better security for the companies that build and sell all this equipment.

Multiple, unchangeable backups are essential. A record of every addition, deletion, and change needs to be stored on a separate system, on write-only media like a DVD. Copies of that DVD, or — even better — a paper printout of the voter rolls, should be available at every polling place on election day. We need to be ready for anything.

Next, the voting machines themselves. Security researchers agree that the gold standard is a voter-verified paper ballot. The easiest (and cheapest) way to achieve this is through optical-scan voting. Voters mark paper ballots by hand; they are fed into a machine and counted automatically. That paper ballot is saved, and serves as a final true record in a recount in case of problems. Touch-screen machines that print a paper ballot to drop in a ballot box can also work for voters with disabilities, as long as the ballot can be easily read and verified by the voter.

Finally, the tabulation and reporting systems. Here again we need more security in the process, but we must always use those paper ballots as checks on the computers. A manual, post-election, risk-limiting audit varies the number of ballots examined according to the margin of victory. Conducting this audit after every election, before the results are certified, gives us confidence that the election outcome is correct, even if the voting machines and tabulation computers have been tampered with. Additionally, we need better coordination and communications when incidents occur.

It’s vital to agree on these procedures and policies before an election. Before the fact, when anyone can win and no one knows whose votes might be changed, it’s easy to agree on strong security. But after the vote, someone is the presumptive winner — and then everything changes. Half of the country wants the result to stand, and half wants it reversed. At that point, it’s too late to agree on anything.

The politicians running in the election shouldn’t have to argue their challenges in court. Getting elections right is in the interest of all citizens. Many countries have independent election commissions that are charged with conducting elections and ensuring their security. We don’t do that in the US.

Instead, we have representatives from each of our two parties in the room, keeping an eye on each other. That provided acceptable security against 20th-century threats, but is totally inadequate to secure our elections in the 21st century. And the belief that the diversity of voting systems in the US provides a measure of security is a dangerous myth, because few districts can be decisive and there are so few voting-machine vendors.

We can do better. In 2017, the Department of Homeland Security declared elections to be critical infrastructure, allowing the department to focus on securing them. On 23 March, Congress allocated $380m to states to upgrade election security.

These are good starts, but don’t go nearly far enough. The constitution delegates elections to the states but allows Congress to “make or alter such Regulations”. In 1845, Congress set a nationwide election day. Today, we need it to set uniform and strict election standards.

This essay originally appeared in the Guardian.

Amazon S3 Update: New Storage Class and General Availability of S3 Select

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-s3-update-new-storage-class-general-availability-of-s3-select/

I’ve got two big pieces of news for anyone who stores and retrieves data in Amazon Simple Storage Service (S3):

New S3 One Zone-IA Storage Class – This new storage class is 20% less expensive than the existing Standard-IA storage class. It is designed to be used to store data that does not need the extra level of protection provided by geographic redundancy.

General Availability of S3 Select – This unique retrieval option lets you retrieve subsets of data from S3 objects using simple SQL expressions, with the possibility for a 400% performance improvement in the process.

Let’s take a look at both!

S3 One Zone-IA (Infrequent Access) Storage Class
This new storage class stores data in a single AWS Availability Zone and is designed to provide eleven 9’s (99.99999999%) of data durability, just like the other S3 storage classes. Unlike those other classes, it is not designed to be resilient to the physical loss of an AZ due to major event such as an earthquake or a flood, and data could be lost in the unlikely event that an AZ is destroyed. S3 One Zone-IA storage gives you a lower cost option for secondary backups of on-premises data and for data that can be easily re-created. You can also use it as the target of S3 Cross-Region Replication from another AWS region.

You can specify the use of S3 One Zone-IA storage when you upload a new object to S3:

You can also make use of it as part of an S3 lifecycle rule:

You can set up a lifecycle rule that moves previous versions of an object to S3 One Zone-IA after 30 or more days:

And you can modify the storage class of an existing object:

You can also manage storage classes using the S3 API, CLI, and CloudFormation templates.

The S3 One Zone-IA storage class can be used in all public AWS regions. As I noted earlier, pricing is 20% lower than for the S3 Standard-IA storage class (see the S3 Pricing page for more info). There’s a 30 day minimum retention period, and a 128 KB minimum object size.

General Availability of S3 Select
Randall wrote a detailed introduction to S3 Select last year and showed you how you can use it to retrieve selected data from within S3 objects. During the preview we added support for server-side encryption and the ability to run queries from the S3 Console.

I used a CSV file of airport codes to exercise the new console functionality:

This file contains listings for over 9100 airports, so it makes for useful test data but it definitely does not test the limits of S3 Select in any way. I select the file, open the More menu, and choose Select from:

The console sets the file format and compression according to the file name and the encryption status. I set delimiter and click Show file preview to verify that my settings are correct. Then I click Next to proceed:

I type SQL expressions in the SQL editor and click Run SQL to issue the query:

Or:

I can also issue queries from the AWS SDKs. I initiate the select operation:

s3 = boto3.client('s3', region_name='us-west-2')

r = s3.select_object_content(
        Bucket='jbarr-us-west-2',
        Key='sample-data/airportCodes.csv',
        ExpressionType='SQL',
        Expression="select * from s3object s where s.\"Country (Name)\" like '%United States%'",
        InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}},
        OutputSerialization = {'CSV': {}},
)

And then I process the stream of results:

for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
    elif 'Stats' in event:
        statsDetails = event['Stats']['Details']
        print("Stats details bytesScanned: ")
        print(statsDetails['BytesScanned'])
        print("Stats details bytesProcessed: ")
        print(statsDetails['BytesProcessed'])

S3 Select is available in all public regions and you can start using it today. Pricing is based on the amount of data scanned and the amount of data returned.

Jeff;

WannaCry after one year

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/03/wannacry-after-one-year.html

In the news, Boeing (an aircraft maker) has been “targeted by a WannaCry virus attack”. Phrased this way, it’s implausible. There are no new attacks targeting people with WannaCry. There is either no WannaCry, or it’s simply a continuation of the attack from a year ago.


It’s possible what happened is that an anti-virus product called a new virus “WannaCry”. Virus families are often related, and sometimes a distant relative gets called the same thing. I know this watching the way various anti-virus products label my own software, which isn’t a virus, but which virus writers often include with their own stuff. The Lazarus group, which is believed to be responsible for WannaCry, have whole virus families like this. Thus, just because an AV product claims you are infected with WannaCry doesn’t mean it’s the same thing that everyone else is calling WannaCry.

Famously, WannaCry was the first virus/ransomware/worm that used the NSA ETERNALBLUE exploit. Other viruses have since added the exploit, and of course, hackers use it when attacking systems. It may be that a network intrusion detection system detected ETERNALBLUE, which people then assumed was due to WannaCry. It may actually have been an nPetya infection instead (nPetya was the second major virus/worm/ransomware to use the exploit).

Or it could be the real WannaCry, but it’s probably not a new “attack” that “targets” Boeing. Instead, it’s likely a continuation from WannaCry’s first appearance. WannaCry is a worm, which means it spreads automatically after it was launched, for years, without anybody in control. Infected machines still exist, unnoticed by their owners, attacking random machines on the Internet. If you plug in an unpatched computer onto the raw Internet, without the benefit of a firewall, it’ll get infected within an hour.

However, the Boeing manufacturing systems that were infected were not on the Internet, so what happened? The narrative from the news stories imply some nefarious hacker activity that “targeted” Boeing, but that’s unlikely.

We have now have over 15 years of experience with network worms getting into strange places disconnected and even “air gapped” from the Internet. The most common reason is laptops. Somebody takes their laptop to some place like an airport WiFi network, and gets infected. They put their laptop to sleep, then wake it again when they reach their destination, and plug it into the manufacturing network. At this point, the virus spreads and infects everything. This is especially the case with maintenance/support engineers, who often have specialized software they use to control manufacturing machines, for which they have a reason to connect to the local network even if it doesn’t have useful access to the Internet. A single engineer may act as a sort of Typhoid Mary, going from customer to customer, infecting each in turn whenever they open their laptop.

Another cause for infection is virtual machines. A common practice is to take “snapshots” of live machines and save them to backups. Should the virtual machine crash, instead of rebooting it, it’s simply restored from the backed up running image. If that backup image is infected, then bringing it out of sleep will allow the worm to start spreading.

Jake Williams claims he’s seen three other manufacturing networks infected with WannaCry. Why does manufacturing seem more susceptible? The reason appears to be the “killswitch” that stops WannaCry from running elsewhere. The killswitch uses a DNS lookup, stopping itself if it can resolve a certain domain. Manufacturing networks are largely disconnected from the Internet enough that such DNS lookups don’t work, so the domain can’t be found, so the killswitch doesn’t work. Thus, manufacturing systems are no more likely to get infected, but the lack of killswitch means the virus will continue to run, attacking more systems instead of immediately killing itself.

One solution to this would be to setup sinkhole DNS servers on the network that resolve all unknown DNS queries to a single server that logs all requests. This is trivially setup with most DNS servers. The logs will quickly identify problems on the network, as well as any hacker or virus activity. The side effect is that it would make this killswitch kill WannaCry. WannaCry isn’t sufficient reason to setup sinkhole servers, of course, but it’s something I’ve found generally useful in the past.

Conclusion

Something obviously happened to the Boeing plant, but the narrative is all wrong. Words like “targeted attack” imply things that likely didn’t happen. Facts are so loose in cybersecurity that it may not have even been WannaCry.

The real story is that the original WannaCry is still out there, still trying to spread. Simply put a computer on the raw Internet (without a firewall) and you’ll get attacked. That, somehow, isn’t news. Instead, what’s news is whenever that continued infection hits somewhere famous, like Boeing, even though (as Boeing claims) it had no important effect.

Security of Cloud HSMBackups

Post Syndicated from Balaji Iyer original https://aws.amazon.com/blogs/architecture/security-of-cloud-hsmbackups/

Today, our customers use AWS CloudHSM to meet corporate, contractual and regulatory compliance requirements for data security by using dedicated Hardware Security Module (HSM) instances within the AWS cloud. CloudHSM delivers all the benefits of traditional HSMs including secure generation, storage, and management of cryptographic keys used for data encryption that are controlled and accessible only by you.

As a managed service, it automates time-consuming administrative tasks such as hardware provisioning, software patching, high availability, backups and scaling for your sensitive and regulated workloads in a cost-effective manner. Backup and restore functionality is the core building block enabling scalability, reliability and high availability in CloudHSM.

You should consider using AWS CloudHSM if you require:

  • Keys stored in dedicated, third-party validated hardware security modules under your exclusive control
  • FIPS 140-2 compliance
  • Integration with applications using PKCS#11, Java JCE, or Microsoft CNG interfaces
  • High-performance in-VPC cryptographic acceleration (bulk crypto)
  • Financial applications subject to PCI regulations
  • Healthcare applications subject to HIPAA regulations
  • Streaming video solutions subject to contractual DRM requirements

We recently released a whitepaper, “Security of CloudHSM Backups” that provides in-depth information on how backups are protected in all three phases of the CloudHSM backup lifecycle process: Creation, Archive, and Restore.

About the Author

Balaji Iyer is a senior consultant in the Professional Services team at Amazon Web Services. In this role, he has helped several customers successfully navigate their journey to AWS. His specialties include architecting and implementing highly-scalable distributed systems, operational security, large scale migrations, and leading strategic AWS initiatives.

Introducing Backblaze’s Rapid Ingest Service: B2 Fireball

Post Syndicated from Ahin Thomas original https://www.backblaze.com/blog/introducing-backblazes-rapid-ingest-service-fireball/

Introducing Backblaze Fireball

Backblaze’s rapid ingest service, Fireball, graduates out of public beta. Our device holds 70 terabytes of customer data and is perfect for migrating large data sets to B2 Cloud Storage.

At Backblaze, we like to put ourselves in the customer’s shoes. Specifically, we ask questions like “how can we make cloud storage more useful?” There is a long list of things we can do to help — over the last few weeks, we’ve addressed some of them when we lowered the cost of downloading data to $0.01 / GB. Today, we are pleased to publicly release our rapid ingest service, Fireball.

What is the Backblaze B2 Fireball?

The Fireball is a hardware device, specifically a NAS device. Any Backblaze B2 customer can order it from inside their account. The Fireball device can hold up to 70 terabytes of data. Upon ordering, it ships from a Backblaze data center to you. When you receive it, you can transfer your data onto the Fireball using your internal network. Once your data transfer is complete, you send it back to a Backblaze data center. Finally, inside our secure data center, your data is uploaded from the Fireball to your account. Your data remains encrypted throughout the process. Step by step instructions can be found here.

What is Fireball?

Why Use the Fireball?

“We would not have been able to get this project off the ground without the B2 Fireball.” — James Cole, KLRU (Austin City Limits)

For most customers, transferring large quantities of data isn’t always simple. The need can arise as you migrate off of legacy systems (e.g. replacing LTO) or simply on a project basis (e.g. transferring video shot in the field to the cloud). An common approach is to upload your data via the internet to the cloud storage vendor of your choosing. While cloud storage vendors don’t charge for uploads, you have to pay your network provider for bandwidth. That’s assuming you are in a place where the bandwidth can be secured.

Your data is stored in megabytes (“MB”) but your bandwidth is measured in megabits per second (“Mbps”). The difference? An 80 Mbps upload connection will transfer no more than 10 MB per second. That means, in your best case scenario, you might be able to upload 50 terabytes in 50 days, assuming you use nearly all of your upload bandwidth for the upload.

If you’re looking to migrate old backups from LTO or even a large project, a 3 month lag time is not operationally viable. That’s why multiple cloud storage providers have introduced rapid ingest devices.

How It Compares: Backblaze B2 Fireball vs AWS Snowball vs GCS Transfer Appliance

“We found the B2 Fireball much simpler and easier to use than Amazon’s Snowball. WunderVu had been looking for a cloud solution for security and simplicity, and B2 hit every check box.” — Aaron Rhodes, Executive Producer, WunderVu

Every vendor that offers a rapid ingest service only lets you upload to that vendor’s cloud. For example, you can’t use an Amazon Snowball to upload to Google Cloud Storage. This means that when considering a rapid ingest service, you are also making a decision on what cloud storage vendor to use. As such, one should consider not only the cost of the rapid ingest service, but also how much that vendor is going to charge you to store and download your data.

Device Capacity Service Fee Shipping Cloud Storage
$/GB/Month
Download
$/GB
Backblaze B2 70 TB $550
(30 day rental)
$75 $0.005 $0.01
Amazon S3 50 TB $200
(10 day rental)
$? * $0.021
+320%
$0.05+
+500%
Google Cloud 100 TB $300
(10 day rental)
$500 $0.020
+300%
$0.08+
+800%

*AWS does not estimate shipping fees at the time of the Snowball order.

To make the comparison easier, let’s create a hypothetical case and compare the costs incurred in the first year. Assume you have 100 TB as an initial upload. But that’s just the initial upload. Over the course of the year, let’s consider a usage pattern where every month you add 5 TB, delete 2 TBs, and download 10 TBs.

Transfer Cost Cloud Storage Fees Total Transfer +
Cloud Storage Fees
Backblaze B2 $1,250
(2 Fireballs)
$9,570 $10,820
Amazon S3 $400
(2 Snowballs)
$36,114 $36,514
+337%
Google Cloud $800
(1 transit)
$39,684 $40,484
+374%

Just looking at the first year, Amazon is 337% more expensive than Backblaze and Google is 374% more expensive than Backblaze.

Put simply, Backblaze offers the lowest cost, high performance cloud storage on the planet. During our public beta of the Fireball program we’ve had extremely positive feedback around how the Fireball enables customers to get their projects started in a time efficient and cost effective way. We hope you’ll give it a try!

The post Introducing Backblaze’s Rapid Ingest Service: B2 Fireball appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

New – Amazon DynamoDB Continuous Backups and Point-In-Time Recovery (PITR)

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-amazon-dynamodb-continuous-backups-and-point-in-time-recovery-pitr/

The Amazon DynamoDB team is back with another useful feature hot on the heels of encryption at rest. At AWS re:Invent 2017 we launched global tables and on-demand backup and restore of your DynamoDB tables and today we’re launching continuous backups with point-in-time recovery (PITR).

You can enable continuous backups with a single click in the AWS Management Console, a simple API call, or with the AWS Command Line Interface (CLI). DynamoDB can back up your data with per-second granularity and restore to any single second from the time PITR was enabled up to the prior 35 days. We built this feature to protect against accidental writes or deletes. If a developer runs a script against production instead of staging or if someone fat-fingers a DeleteItem call, PITR has you covered. We also built it for the scenarios you can’t normally predict. You can still keep your on-demand backups for as long as needed for archival purposes but PITR works as additional insurance against accidental loss of data. Let’s see how this works.

Continuous Backup

To enable this feature in the console we navigate to our table and select the Backups tab. From there simply click Enable to turn on the feature. I could also turn on continuous backups via the UpdateContinuousBackups API call.

After continuous backup is enabled we should be able to see an Earliest restore date and Latest restore date

Let’s imagine a scenario where I have a lot of old user profiles that I want to delete.

I really only want to send service updates to our active users based on their last_update date. I decided to write a quick Python script to delete all the users that haven’t used my service in a while.

import boto3
table = boto3.resource("dynamodb").Table("VerySuperImportantTable")
items = table.scan(
    FilterExpression="last_update >= :date",
    ExpressionAttributeValues={":date": "2014-01-01T00:00:00"},
    ProjectionExpression="ImportantId"
)['Items']
print("Deleting {} Items! Dangerous.".format(len(items)))
with table.batch_writer() as batch:
    for item in items:
        batch.delete_item(Key=item)

Great! This should delete all those pesky non-users of my service that haven’t logged in since 2013. So,— CTRL+C CTRL+C CTRL+C CTRL+C (interrupt the currently executing command).

Yikes! Do you see where I went wrong? I’ve just deleted my most important users! Oh, no! Where I had a greater-than sign, I meant to put a less-than! Quick, before Jeff Barr can see, I’m going to restore the table. (I probably could have prevented that typo with Boto 3’s handy DynamoDB conditions: Attr("last_update").lt("2014-01-01T00:00:00"))

Restoring

Luckily for me, restoring a table is easy. In the console I’ll navigate to the Backups tab for my table and click Restore to point-in-time.

I’ll specify the time (a few seconds before I started my deleting spree) and a name for the table I’m restoring to.

For a relatively small and evenly distributed table like mine, the restore is quite fast.

The time it takes to restore a table varies based on multiple factors and restore times are not neccesarily coordinated with the size of the table. If your dataset is evenly distributed across your primary keys you’ll be able to take advanatage of parallelization which will speed up your restores.

Learn More & Try It Yourself
There’s plenty more to learn about this new feature in the documentation here.

Pricing for continuous backups varies by region and is based on the current size of the table and all indexes.

A few things to note:

  • PITR works with encrypted tables.
  • If you disable PITR and later reenable it, you reset the start time from which you can recover.
  • Just like on-demand backups, there are no performance or availability impacts to enabling this feature.
  • Stream settings, Time To Live settings, PITR settings, tags, Amazon CloudWatch alarms, and auto scaling policies are not copied to the restored table.
  • Jeff, it turns out, knew I restored the table all along because every PITR API call is recorded in AWS CloudTrail.

Let us know how you’re going to use continuous backups and PITR on Twitter and in the comments.
Randall

Simplicity is a Feature for Cloud Backup

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/distributed-cloud-backup-for-businesses/

cloud on a blue background
For Joel Wagener, Director of IT at AIBS, simplicity is an important feature he looks for in software applications to use in his organization. So maybe it’s not unexpected that Joel chose Backblaze for Business to back up AIBS’s staff computers. According to Joel, “It just works.”American Institute of Biological Sciences

AIBS (The American Institute of Biological Sciences) is a non-profit scientific association dedicated to advancing biological research and education. Founded in 1947 as part of the National Academy of Sciences, AIBS later became independent and now has over 100 member organizations. AIBS works to ensure that the public, legislators, funders, and the community of biologists have access to and use information that will guide them in making informed decisions about matters that require biological knowledge.

AIBS started using Backblaze for Business Cloud Backup several years ago to make sure that the organization’s data was backed up and protected from accidental loss or computer failure. AIBS is based in Washington, D.C., but is a virtual organization, with staff dispersed around the United States. AIBS needed a backup solution that worked anywhere a staff member was located, and was easy to use, as well. Joel has made Backblaze a default part of the configuration management for all the AIBS endpoints, which in their case are exclusively Macintosh.

AIBS biological images

“We started using Backblaze on a single computer in 2014, then not too long after that decided to deploy it to all our endpoints,” explains Joel. “We use Groups to oversee backups and for central billing, but we let each user manage their own computer and restore files on their own if they need to.”

“Backblaze stays out of the way until we need it. It’s fairly lightweight, and I appreciate that it’s simple,” says Joel. “It doesn’t throttle backups and the price point is good. I have family members who use Backblaze, as well.”

Backblaze’s Groups feature permits an organization to oversee and manage the user accounts, including restores, or let users handle that themselves. This flexibility fits a variety of organizations, where various degrees of oversight or independence are desirable. The finance and HR departments could manage their own data, for example, while the rest of the organization could be managed by IT. All groups can be billed centrally no matter how other functionality is set up.

“If we have a computer that needs repair, we can put a loaner computer in that person’s hands and they can immediately get the data they need directly from the Backblaze cloud backup, which is really helpful. When we get the original computer back from repair we can do a complete restore and return it to the user all ready to go again. When we’ve needed restores, Backblaze has been reliable.”

Joel also likes that the memory footprint of Backblaze is light — the clients for both Macintosh and Windows are native, and designed to use minimum system resources and not impact any applications used on the computer. He also likes that updates to the client software are pushed out when necessary.

Backblaze for Business

Backblaze for Business also helps IT maintain archives of users’ computers after they leave the organization.

“We like that we have a ready-made archive of a computer when someone leaves,” said Joel. The Backblaze backup is there if we need to retrieve anything that person was working on.”

There are other capabilities in Backblaze that Joel likes, but hasn’t had a chance to use yet.

“We’ve used Casper (Jamf) to deploy and manage software on endpoints without needing any interaction from the user. We haven’t used it yet for Backblaze, but we know that Backblaze supports it. It’s a handy feature to have.”

”It just works.”
— Joel Wagener, AIBS Director of IT

Perhaps the best thing about Backblaze for Business isn’t a specific feature that can be found on a product data sheet.

“When files have been lost, Backblaze has provided us access to multiple prior versions, and this feature has been important and worked well several times,” says Joel.

“That provides needed peace of mind to our users, and our IT department, as well.”

The post Simplicity is a Feature for Cloud Backup appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Setting Up Cassandra With Priam

Post Syndicated from Bozho original https://techblog.bozho.net/setting-cassandra-priam/

I’ve previously explained how to setup Cassandra in AWS. The described setup works, but in some cases it may not be sufficient. E.g. it doesn’t give you an easy way to make and restore backups, and adding new nodes relies on a custom python script that randomly selects a seed.

So now I’m going to explain how to setup Priam, a Cassandra helper tool by Netflix.

My main reason for setting it up is the backup/restore functionality that it offers. All other ways to do backups are very tedious, and Priam happens to have implemented the important bits – the snapshotting and the incremental backups.

Priam is a bit tricky to get running, though. The setup guide is not too detailed and not easy to find (it’s the last, not immediately visible item in the wiki). First, it has one branch per Cassandra version, so you have to checkout the proper branch and build it. I immediately hit an issue there, as their naming doesn’t allow eclipse to import the gradle project. Within 24 hours I reported 3 issues, which isn’t ideal. Priam doesn’t support dynamic SimpleDB names, and doesn’t let you override bundled properties via the command line. I hope there aren’t bigger issues. The ones that I encountered, I fixed and made a pull request.

What does the setup look like?

  • Append a javaagent to the JVM options
  • Run the Priam web
  • It automatically replaces most of cassandra.yaml, including the seed provider (i.e. how does the node find other nodes in the cluster)
  • Run Cassandra
  • It fetches seed information (which is stored in AWS SimpleDB) and connects to a cluster

I decided to run the war file with a standalone jetty runner, rather than installing tomcat. In terms of shell scripts, the core bits look like that (in addition to the shell script in the original post that is run on initialization of the node):

# Get the Priam war file and jar file
aws s3 cp s3://$BUCKET_NAME/priam-web-3.12.0-SNAPSHOT.war ~/
aws s3 cp s3://$BUCKET_NAME/priam-cass-extensions-3.12.0-SNAPSHOT.jar /usr/share/cassandra/lib/priam-cass-extensions.jar
# Set the Priam agent
echo "-javaagent:/usr/share/cassandra/lib/priam-cass-extensions.jar" >> /etc/cassandra/conf/jvm.options

# Download jetty-runner to be able to run the Priam war file from the command line
wget http://central.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.8.v20171121/jetty-runner-9.4.8.v20171121.jar
nohup java -Dpriam.clustername=LogSentinelCluster -Dpriam.sdb.instanceIdentity.region=$EC2_REGION -Dpriam.s3.bucket=$BACKUP_BUCKET \
-Dpriam.sdb.instanceidentity.domain=$INSTANCE_IDENTITY_DOMAIN -Dpriam.sdb.properties.domain=$PROPERTIES_DOMAIN \
-Dpriam.client.sslEnabled=true -Dpriam.internodeEncryption=all -Dpriam.rpc.server.type=sync \
-Dpriam.partitioner=org.apache.cassandra.dht.Murmur3Partitioner -Dpriam.backup.retention.days=7 \
-Dpriam.backup.hour=$BACKUP_HOUR -Dpriam.vnodes.numTokens=256 -Dpriam.thrift.enabled=false \
-jar jetty-runner-9.4.8.v20171121.jar --path /Priam ~/priam-web-3.12.0-SNAPSHOT.war &

while ! echo exit | nc $BIND_IP 8080; do sleep 10; done

echo "Started Priam web package"

service cassandra start
chkconfig cassandra on

while ! echo exit | nc $BIND_IP 9042; do sleep 10; done

BACKUP_BUCKET, PROPERTIES_DOMAIN and INSTANCE_DOMAIN are supplied via a CloudFormation script (as we can’t know the exact names in advance – especially for SimpleDB). Note that these properties won’t work in the main repo – I added them in my pull request.

In order for that to work, you need to have the two SimpleDB domains created (e.g. by CloudFormation). It is possible that you could replace SimpleDB with some other data storage (and not rely on AWS), but that’s out of scope for now.

The result of running Priam would be that you have your Cassandra nodes in SimpleDB (you can browse it using this chrome extension as AWS doesn’t offer any UI) and, of course, backups will be automatically created in the backup S3 Bucket.

You can then restore a backup by logging to each node and executing:

curl http://localhost:8080/Priam/REST/v1/restore?daterange=201803180000,201803191200&region=eu-west-1&keyspaces=your_keyspace

You specify the time range for the restore. Still not ideal, as one would hope to have a one-click restore, but much better than rolling out your own backup & restore infrastructure.

One very important note here – vnodes are not supported. My original cluster had a default of 256 vnodes per machine and now it has just 1, because Priam doesn’t support anything other than 1. That’s a pity, since vnodes are the recommended way to setup Cassandra. Apparently Netflix don’t use those, however. There’s a work-in-progress branch for that that was abandoned 5 years ago. Fortunately, there’s a fresh pull request with Vnode support that can be used in conjunction with my pull request from this branch.

Priam replaces some Cassandra defaults with other values so you might want to compare your current setup and the newly generated cassandra.yaml. Overall it doesn’t feel super-production ready, but apparently it is, as Netflix is using it in production.

The post Setting Up Cassandra With Priam appeared first on Bozho's tech blog.

Needed: Senior Software Engineer

Post Syndicated from Yev original https://www.backblaze.com/blog/needed-senior-software-engineer/

Want to work at a company that helps customers in 156 countries around the world protect the memories they hold dear? A company that stores over 500 petabytes of customers’ photos, music, documents and work files in a purpose-built cloud storage system?

Well, here’s your chance. Backblaze is looking for a Sr. Software Engineer!

Company Description:

Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable low cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/gb/mo. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

We’ve managed to nurture a team oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

Some Backblaze Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee
  • Fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • New Parent Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280

Want to know what you’ll be doing?

You will work on the server side APIs that authenticate users when they log in, accept the backups, manage the data, and prepare restored data for customers. And you will help build new features as well as support tools to help chase down and diagnose customer issues.

Must be proficient in:

  • Java
  • Apache Tomcat
  • Large scale systems supporting thousands of servers and millions of customers
  • Cross platform (Linux/Macintosh/Windows) — don’t need to be an expert on all three, but cannot be afraid of any

Bonus points for:

  • Cassandra experience
  • JavaScript
  • ReactJS
  • Python
  • Struts
  • JSP’s

Looking for an attitude of:

  • Passionate about building friendly, easy to use Interfaces and APIs.
  • Likes to work closely with other engineers, support, and sales to help customers.
  • Believes the whole world needs backup, not just English speakers in the USA.
  • Customer Focused (!!) — always focus on the customer’s point of view and how to solve their problem!

Required for all Backblaze Employees:

  • Good attitude and willingness to do whatever it takes to get the job done
  • Strong desire to work for a small, fast-paced company
  • Desire to learn and adapt to rapidly changing technologies and work environment
  • Rigorous adherence to best practices
  • Relentless attention to detail
  • Excellent interpersonal skills and good oral/written communication
  • Excellent troubleshooting and problem solving skills

This position is located in San Mateo, California but will also consider remote work as long as you’re no more than three time zones away and can come to San Mateo now and then.

Backblaze is an Equal Opportunity Employer.

If this sounds like you —follow these steps:

  1. Send an email to [email protected] with the position in the subject line.
  2. Include your resume.
  3. Tell us a bit about your programming experience.

The post Needed: Senior Software Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.