As we shoot way past 500 petabytes of data stored, we need a lot of helping hands in the data center to keep those hard drives spinning! We’ve been hiring quite a lot, and our latest addition is Jack. Lets learn a bit more about him, shall we?
What is your Backblaze Title? Data Center Tech
Where are you originally from? Walnut Creek, CA until 7th grade when the family moved to Durango, Colorado.
What attracted you to Backblaze? I had heard about how cool the Backblaze community is and have always been fascinated by technology.
What do you expect to learn while being at Backblaze? I expect to learn a lot about how our data centers run and all of the hardware behind it.
Where else have you worked? Garrhs HVAC as an HVAC Installer and then Durango Electrical as a Low Volt Technician.
Where did you go to school? Durango High School and then Montana State University.
What’s your dream job? I would love to be a driver for the Audi Sport. Race cars are so much fun!
Favorite place you’ve traveled? Iceland has definitely been my favorite so far.
Favorite hobby? Video games.
Of what achievement are you most proud? Getting my Eagle Scout badge was a tough, but rewarding experience that I will always cherish.
Star Trek or Star Wars? Star Wars.
Coke or Pepsi? Coke…I know, it’s bad.
Favorite food? Thai food.
Why do you like certain things? I tend to warm up to things the more time I spend around them, although I never really know until it happens.
Anything else you’d like to tell us? I’m a friendly car guy who will always be in love with my European cars and I really enjoy the Backblaze community!
We’re happy you joined us Out West! Welcome aboard Jack!
I’m in danger of contradicting myself, after previously pointing out that x86 machine code is a high-level language, but this article claiming C is a not a low level language is bunk. C certainly has some problems, but it’s still the closest language to assembly. This is obvious by the fact it’s still the fastest compiled language. What we see is a typical academic out of touch with the real world.
The author makes the (wrong) observation that we’ve been stuck emulating the PDP-11 for the past 40 years. C was written for the PDP-11, and since then CPUs have been designed to make C run faster. The author imagines a different world, such as where CPU designers instead target something like LISP as their preferred language, or Erlang. This misunderstands the state of the market. CPUs do indeed supports lots of different abstractions, and C has evolved to accommodate this.
The author criticizes things like “out-of-order” execution which has lead to the Spectre sidechannel vulnerabilities. Out-of-order execution is necessary to make C run faster. The author claims instead that those resources should be spent on having more slower CPUs, with more threads. This sacrifices single-threaded performance in exchange for a lot more threads executing in parallel. The author cites Sparc Tx CPUs as his ideal processor.
But here’s the thing, the Sparc Tx was a failure. To be fair, it’s mostly a failure because most of the time, people wanted to run old C code instead of new Erlang code. But it was still a failure at running Erlang.
Time after time, engineers keep finding that “out-of-order”, single-threaded performance is still the winner. A good example is ARM processors for both mobile phones and servers. All the theory points to in-order CPUs as being better, but all the products are out-of-order, because this theory is wrong. The custom ARM cores from Apple and Qualcomm used in most high-end phones are so deeply out-of-order they give Intel CPUs competition. The same is true on the server front with the latest Qualcomm Centriq and Cavium ThunderX2 processors, deeply out of order supporting more than 100 instructions in flight.
The Cavium is especially telling. Its ThunderX CPU had 48 simple cores which was replaced with the ThunderX2 having 32 complex, deeply out-of-order cores. The performance increase was massive, even on multithread-friendly workloads. Every competitor to Intel’s dominance in the server space has learned the lesson from Sparc Tx: many wimpy cores is a failure, you need fewer beefy cores. Yes, they don’t need to be as beefy as Intel’s processors, but they need to be close.
Even Intel’s “Xeon Phi” custom chip learned this lesson. This is their GPU-like chip, running 60 cores with 512-bit wide “vector” (sic) instructions, designed for supercomputer applications. Its first version was purely in-order. Its current version is slightly out-of-order. It supports four threads and focuses on basic number crunching, so in-order cores seems to be the right approach, but Intel found in this case that out-of-order processing still provided a benefit. Practice is different than theory.
As an academic, the author of the above article focuses on abstractions. The criticism of C is that it has the wrong abstractions which are hard to optimize, and that if we instead expressed things in the right abstractions, it would be easier to optimize.
This is an intellectually compelling argument, but so far bunk.
The reason is that while the theoretical base language has issues, everyone programs using extensions to the language, like “intrinsics” (C ‘functions’ that map to assembly instructions). Programmers write libraries using these intrinsics, which then the rest of the normal programmers use. In other words, if your criticism is that C is not itself low level enough, it still provides the best access to low level capabilities.
Given that C can access new functionality in CPUs, CPU designers add new paradigms, from SIMD to transaction processing. In other words, while in the 1980s CPUs were designed to optimize C (stacks, scaled pointers), these days CPUs are designed to optimize tasks regardless of language.
The author of that article criticizes the memory/cache hierarchy, claiming it has problems. Yes, it has problems, but only compared to how well it normally works. The author praises the many simple cores/threads idea as hiding memory latency with little caching, but misses the point that caches also dramatically increase memory bandwidth. Intel processors are optimized to read a whopping 256 bits every clock cycle from L1 cache. Main memory bandwidth is orders of magnitude slower.
The author goes onto criticize cache coherency as a problem. C uses it, but other languages like Erlang don’t need it. But that’s largely due to the problems each languages solves. Erlang solves the problem where a large number of threads work on largely independent tasks, needing to send only small messages to each other across threads. The problems C solves is when you need many threads working on a huge, common set of data.
For example, consider the “intrusion prevention system”. Any thread can process any incoming packet that corresponds to any region of memory. There’s no practical way of solving this problem without a huge coherent cache. It doesn’t matter which language or abstractions you use, it’s the fundamental constraint of the problem being solved. RDMA is an important concept that’s moved from supercomputer applications to the data center, such as with memcached. Again, we have the problem of huge quantities (terabytes worth) shared among threads rather than small quantities (kilobytes).
The fundamental issue the author of the the paper is ignoring is decreasing marginal returns. Moore’s Law has gifted us more transistors than we can usefully use. We can’t apply those additional registers to just one thing, because the useful returns we get diminish.
For example, Intel CPUs have two hardware threads per core. That’s because there are good returns by adding a single additional thread. However, the usefulness of adding a third or fourth thread decreases. That’s why many CPUs have only two threads, or sometimes four threads, but no CPU has 16 threads per core.
You can apply the same discussion to any aspect of the CPU, from register count, to SIMD width, to cache size, to out-of-order depth, and so on. Rather than focusing on one of these things and increasing it to the extreme, CPU designers make each a bit larger every process tick that adds more transistors to the chip.
The same applies to cores. It’s why the “more simpler cores” strategy fails, because more cores have their own decreasing marginal returns. Instead of adding cores tied to limited memory bandwidth, it’s better to add more cache. Such cache already increases the size of the cores, so at some point it’s more effective to add a few out-of-order features to each core rather than more cores. And so on.
The question isn’t whether we can change this paradigm and radically redesign CPUs to match some academic’s view of the perfect abstraction. Instead, the goal is to find new uses for those additional transistors. For example, “message passing” is a useful abstraction in languages like Go and Erlang that’s often more useful than sharing memory. It’s implemented with shared memory and atomic instructions, but I can’t help but think it couldn’t better be done with direct hardware support.
Of course, as soon as they do that, it’ll become an intrinsic in C, then added to languages like Go and Erlang.
Amazon Kinesis Data Firehose is the easiest way to capture and stream data into a data lake built on Amazon S3. This data can be anything—from AWS service logs like AWS CloudTrail log files, Amazon VPC Flow Logs, Application Load Balancer logs, and others. It can also be IoT events, game events, and much more. To efficiently query this data, a time-consuming ETL (extract, transform, and load) process is required to massage and convert the data to an optimal file format, which increases the time to insight. This situation is less than ideal, especially for real-time data that loses its value over time.
To solve this common challenge, Kinesis Data Firehose can now save data to Amazon S3 in Apache Parquet or Apache ORC format. These are optimized columnar formats that are highly recommended for best performance and cost-savings when querying data in S3. This feature directly benefits you if you use Amazon Athena, Amazon Redshift, AWS Glue, Amazon EMR, or any other big data tools that are available from the AWS Partner Network and through the open-source community.
Amazon Connect is a simple-to-use, cloud-based contact center service that makes it easy for any business to provide a great customer experience at a lower cost than common alternatives. Its open platform design enables easy integration with other systems. One of those systems is Amazon Kinesis—in particular, Kinesis Data Streams and Kinesis Data Firehose.
What’s really exciting is that you can now save events from Amazon Connect to S3 in Apache Parquet format. You can then perform analytics using Amazon Athena and Amazon Redshift Spectrum in real time, taking advantage of this key performance and cost optimization. Of course, Amazon Connect is only one example. This new capability opens the door for a great deal of opportunity, especially as organizations continue to build their data lakes.
Amazon Connect includes an array of analytics views in the Administrator dashboard. But you might want to run other types of analysis. In this post, I describe how to set up a data stream from Amazon Connect through Kinesis Data Streams and Kinesis Data Firehose and out to S3, and then perform analytics using Athena and Amazon Redshift Spectrum. I focus primarily on the Kinesis Data Firehose support for Parquet and its integration with the AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift.
Here is how the solution is laid out:
The following sections walk you through each of these steps to set up the pipeline.
1. Define the schema
When Kinesis Data Firehose processes incoming events and converts the data to Parquet, it needs to know which schema to apply. The reason is that many times, incoming events contain all or some of the expected fields based on which values the producers are advertising. A typical process is to normalize the schema during a batch ETL job so that you end up with a consistent schema that can easily be understood and queried. Doing this introduces latency due to the nature of the batch process. To overcome this issue, Kinesis Data Firehose requires the schema to be defined in advance.
To see the available columns and structures, see Amazon Connect Agent Event Streams. For the purpose of simplicity, I opted to make all the columns of type String rather than create the nested structures. But you can definitely do that if you want.
The simplest way to define the schema is to create a table in the Amazon Athena console. Open the Athena console, and paste the following create table statement, substituting your own S3 bucket and prefix for where your event data will be stored. A Data Catalog database is a logical container that holds the different tables that you can create. The default database name shown here should already exist. If it doesn’t, you can create it or use another database that you’ve already created.
That’s all you have to do to prepare the schema for Kinesis Data Firehose.
2. Define the data streams
Next, you need to define the Kinesis data streams that will be used to stream the Amazon Connect events. Open the Kinesis Data Streams console and create two streams. You can configure them with only one shard each because you don’t have a lot of data right now.
3. Define the Kinesis Data Firehose delivery stream for Parquet
Let’s configure the Data Firehose delivery stream using the data stream as the source and Amazon S3 as the output. Start by opening the Kinesis Data Firehose console and creating a new data delivery stream. Give it a name, and associate it with the Kinesis data stream that you created in Step 2.
As shown in the following screenshot, enable Record format conversion (1) and choose Apache Parquet (2). As you can see, Apache ORC is also supported. Scroll down and provide the AWS Glue Data Catalog database name (3) and table names (4) that you created in Step 1. Choose Next.
To make things easier, the output S3 bucket and prefix fields are automatically populated using the values that you defined in the LOCATION parameter of the create table statement from Step 1. Pretty cool. Additionally, you have the option to save the raw events into another location as defined in the Source record S3 backup section. Don’t forget to add a trailing forward slash “ / “ so that Data Firehose creates the date partitions inside that prefix.
On the next page, in the S3 buffer conditions section, there is a note about configuring a large buffer size. The Parquet file format is highly efficient in how it stores and compresses data. Increasing the buffer size allows you to pack more rows into each output file, which is preferred and gives you the most benefit from Parquet.
Compression using Snappy is automatically enabled for both Parquet and ORC. You can modify the compression algorithm by using the Kinesis Data Firehose API and update the OutputFormatConfiguration.
Be sure to also enable Amazon CloudWatch Logs so that you can debug any issues that you might run into.
Lastly, finalize the creation of the Firehose delivery stream, and continue on to the next section.
4. Set up the Amazon Connect contact center
After setting up the Kinesis pipeline, you now need to set up a simple contact center in Amazon Connect. The Getting Started page provides clear instructions on how to set up your environment, acquire a phone number, and create an agent to accept calls.
After setting up the contact center, in the Amazon Connect console, choose your Instance Alias, and then choose Data Streaming. Under Agent Event, choose the Kinesis data stream that you created in Step 2, and then choose Save.
At this point, your pipeline is complete. Agent events from Amazon Connect are generated as agents go about their day. Events are sent via Kinesis Data Streams to Kinesis Data Firehose, which converts the event data from JSON to Parquet and stores it in S3. Athena and Amazon Redshift Spectrum can simply query the data without any additional work.
So let’s generate some data. Go back into the Administrator console for your Amazon Connect contact center, and create an agent to handle incoming calls. In this example, I creatively named mine Agent One. After it is created, Agent One can get to work and log into their console and set their availability to Available so that they are ready to receive calls.
To make the data a bit more interesting, I also created a second agent, Agent Two. I then made some incoming and outgoing calls and caused some failures to occur, so I now have enough data available to analyze.
5. Analyze the data with Athena
Let’s open the Athena console and run some queries. One thing you’ll notice is that when we created the schema for the dataset, we defined some of the fields as Strings even though in the documentation they were complex structures. The reason for doing that was simply to show some of the flexibility of Athena to be able to parse JSON data. However, you can define nested structures in your table schema so that Kinesis Data Firehose applies the appropriate schema to the Parquet file.
Let’s run the first query to see which agents have logged into the system.
The query might look complex, but it’s fairly straightforward:
WITH dataset AS (
from_iso8601_timestamp(eventtimestamp) AS event_ts,
-- CURRENT STATE
'$.agentstatus.name') AS current_status,
'$.agentstatus.starttimestamp')) AS current_starttimestamp,
'$.configuration.firstname') AS current_firstname,
'$.configuration.lastname') AS current_lastname,
'$.configuration.username') AS current_username,
'$.configuration.routingprofile.defaultoutboundqueue.name') AS current_outboundqueue,
'$.configuration.routingprofile.inboundqueues.name') as current_inboundqueue,
-- PREVIOUS STATE
'$.agentstatus.name') as prev_status,
'$.agentstatus.starttimestamp')) as prev_starttimestamp,
'$.configuration.firstname') as prev_firstname,
'$.configuration.lastname') as prev_lastname,
'$.configuration.username') as prev_username,
'$.configuration.routingprofile.defaultoutboundqueue.name') as current_outboundqueue,
'$.configuration.routingprofile.inboundqueues.name') as prev_inboundqueue
where eventtype <> 'HEART_BEAT'
current_status as status,
current_username as username,
WHERE eventtype = 'LOGIN' AND current_username <> ''
ORDER BY event_ts DESC
The query output looks something like this:
Here is another query that shows the sessions each of the agents engaged with. It tells us where they were incoming or outgoing, if they were completed, and where there were missed or failed calls.
WITH src AS (
json_extract_scalar(currentagentsnapshot, '$.configuration.username') as username,
cast(json_extract(currentagentsnapshot, '$.contacts') AS ARRAY(JSON)) as c,
cast(json_extract(previousagentsnapshot, '$.contacts') AS ARRAY(JSON)) as p
src2 AS (
FROM src CROSS JOIN UNNEST (c, p) AS contacts(c_item, p_item)
dataset AS (
json_extract_scalar(c_item, '$.contactid') as c_contactid,
json_extract_scalar(c_item, '$.channel') as c_channel,
json_extract_scalar(c_item, '$.initiationmethod') as c_direction,
json_extract_scalar(c_item, '$.queue.name') as c_queue,
json_extract_scalar(c_item, '$.state') as c_state,
from_iso8601_timestamp(json_extract_scalar(c_item, '$.statestarttimestamp')) as c_ts,
json_extract_scalar(p_item, '$.contactid') as p_contactid,
json_extract_scalar(p_item, '$.channel') as p_channel,
json_extract_scalar(p_item, '$.initiationmethod') as p_direction,
json_extract_scalar(p_item, '$.queue.name') as p_queue,
json_extract_scalar(p_item, '$.state') as p_state,
from_iso8601_timestamp(json_extract_scalar(p_item, '$.statestarttimestamp')) as p_ts
c_channel as channel,
c_direction as direction,
p_state as prev_state,
c_state as current_state,
c_ts as current_ts,
c_contactid as id
WHERE c_contactid = p_contactid
ORDER BY id DESC, current_ts ASC
The query output looks similar to the following:
6. Analyze the data with Amazon Redshift Spectrum
With Amazon Redshift Spectrum, you can query data directly in S3 using your existing Amazon Redshift data warehouse cluster. Because the data is already in Parquet format, Redshift Spectrum gets the same great benefits that Athena does.
Here is a simple query to show querying the same data from Amazon Redshift. Note that to do this, you need to first create an external schema in Amazon Redshift that points to the AWS Glue Data Catalog.
json_extract_path_text(currentagentsnapshot,'agentstatus','name') AS current_status,
json_extract_path_text(currentagentsnapshot, 'configuration','firstname') AS current_firstname,
json_extract_path_text(currentagentsnapshot, 'configuration','lastname') AS current_lastname,
'configuration','routingprofile','defaultoutboundqueue','name') AS current_outboundqueue,
The following shows the query output:
In this post, I showed you how to use Kinesis Data Firehose to ingest and convert data to columnar file format, enabling real-time analysis using Athena and Amazon Redshift. This great feature enables a level of optimization in both cost and performance that you need when storing and analyzing large amounts of data. This feature is equally important if you are investing in building data lakes on AWS.
Roy Hasson is a Global Business Development Manager for AWS Analytics. He works with customers around the globe to design solutions to meet their data processing, analytics and business intelligence needs. Roy is big Manchester United fan cheering his team on and hanging out with his family.
In our blog post on Tuesday, Cryptocurrency Security Challenges, we wrote about the two primary challenges faced by anyone interested in safely and profitably participating in the cryptocurrency economy: 1) make sure you’re dealing with reputable and ethical companies and services, and, 2) keep your cryptocurrency holdings safe and secure.
In this post, we’re going to focus on how to make sure you don’t lose any of your cryptocurrency holdings through accident, theft, or carelessness. You do that by backing up the keys needed to sell or trade your currencies.
$34 Billion in Lost Value
Of the 16.4 million bitcoins said to be in circulation in the middle of 2017, close to 3.8 million may have been lost because their owners no longer are able to claim their holdings. Based on today’s valuation, that could total as much as $34 billion dollars in lost value. And that’s just bitcoins. There are now over 1,500 different cryptocurrencies, and we don’t know how many of those have been misplaced or lost.
Now that some cryptocurrencies have reached (at least for now) staggering heights in value, it’s likely that owners will be more careful in keeping track of the keys needed to use their cryptocurrencies. For the ones already lost, however, the owners have been separated from their currencies just as surely as if they had thrown Benjamin Franklins and Grover Clevelands over the railing of a ship.
The Basics of Securing Your Cryptocurrencies
In our previous post, we reviewed how cryptocurrency keys work, and the common ways owners can keep track of them. A cryptocurrency owner needs two keys to use their currencies: a public key that can be shared with others is used to receive currency, and a private key that must be kept secure is used to spend or trade currency.
Many wallets and applications allow the user to require extra security to access them, such as a password, or iris, face, or thumb print scan. If one of these options is available in your wallets, take advantage of it. Beyond that, it’s essential to back up your wallet, either using the backup feature built into some applications and wallets, or manually backing up the data used by the wallet. When backing up, it’s a good idea to back up the entire wallet, as some wallets require additional private data to operate that might not be apparent.
No matter which backup method you use, it is important to back up often and have multiple backups, preferable in different locations. As with any valuable data, a 3-2-1 backup strategy is good to follow, which ensures that you’ll have a good backup copy if anything goes wrong with one or more copies of your data.
One more caveat, don’t reuse passwords. This applies to all of your accounts, but is especially important for something as critical as your finances. Don’t ever use the same password for more than one account. If security is breached on one of your accounts, someone could connect your name or ID with other accounts, and will attempt to use the password there, as well. Consider using a password manager such as LastPass or 1Password, which make creating and using complex and unique passwords easy no matter where you’re trying to sign in.
Approaches to Backing Up Your Cryptocurrency Keys
There are numerous ways to be sure your keys are backed up. Let’s take them one by one.
1. Automatic backups using a backup program
If you’re using a wallet program on your computer, for example, Bitcoin Core, it will store your keys, along with other information, in a file. For Bitcoin Core, that file is wallet.dat. Other currencies will use the same or a different file name and some give you the option to select a name for the wallet file.
To back up the wallet.dat or other wallet file, you might need to tell your backup program to explicitly back up that file. Users of Backblaze Backup don’t have to worry about configuring this, since by default, Backblaze Backup will back up all data files. You should determine where your particular cryptocurrency, wallet, or application stores your keys, and make sure the necessary file(s) are backed up if your backup program requires you to select which files are included in the backup.
Backblaze B2 is an option for those interested in low-cost and high security cloud storage of their cryptocurrency keys. Backblaze B2 supports 2-factor verification for account access, works with a number of apps that support automatic backups with encryption, error-recovery, and versioning, and offers an API and command-line interface (CLI), as well. The first 10GB of storage is free, which could be all one needs to store encrypted cryptocurrency keys.
2. Backing up by exporting keys to a file
Apps and wallets will let you export your keys from your app or wallet to a file. Once exported, your keys can be stored on a local drive, USB thumb drive, DAS, NAS, or in the cloud with any cloud storage or sync service you wish. Encrypting the file is strongly encouraged — more on that later. If you use 1Password or LastPass, or other secure notes program, you also could store your keys there.
3. Backing up by saving a mnemonic recovery seed
A mnemonic phrase, mnemonic recovery phrase, or mnemonic seed is a list of words that stores all the information needed to recover a cryptocurrency wallet. Many wallets will have the option to generate a mnemonic backup phrase, which can be written down on paper. If the user’s computer no longer works or their hard drive becomes corrupted, they can download the same wallet software again and use the mnemonic recovery phrase to restore their keys.
The phrase can be used by anyone to recover the keys, so it must be kept safe. Mnemonic phrases are an excellent way of backing up and storing cryptocurrency and so they are used by almost all wallets.
A mnemonic recovery seed is represented by a group of easy to remember words. For example:
The first four letters are enough to unambiguously identify the word.
Similar words are avoided (such as: build and built).
Bitcoin and most other cryptocurrencies such as Litecoin, Ethereum, and others use mnemonic seeds that are 12 to 24 words long. Other currencies might use different length seeds.
4. Physical backups — Paper, Metal
Some cryptocurrency holders believe that their backup, or even all their cryptocurrency account information, should be stored entirely separately from the internet to avoid any risk of their information being compromised through hacks, exploits, or leaks. This type of storage is called “cold storage.” One method of cold storage involves printing out the keys to a piece of paper and then erasing any record of the keys from all computer systems. The keys can be entered into a program from the paper when needed, or scanned from a QR code printed on the paper.
Printed public and private keys
Some who go to extremes suggest separating the mnemonic needed to access an account into individual pieces of paper and storing those pieces in different locations in the home or office, or even different geographical locations. Some say this is a bad idea since it could be possible to reconstruct the mnemonic from one or more pieces. How diligent you wish to be in protecting these codes is up to you.
Mnemonic recovery phrase booklet
There’s another option that could make you the envy of your friends. That’s the CryptoSteel wallet, which is a stainless steel metal case that comes with more than 250 stainless steel letter tiles engraved on each side. Codes and passwords are assembled manually from the supplied part-randomized set of tiles. Users are able to store up to 96 characters worth of confidential information. Cryptosteel claims to be fireproof, waterproof, and shock-proof.
Cryptosteel cold wallet
Of course, if you leave your Cryptosteel wallet in the pocket of a pair of ripped jeans that gets thrown out by the housekeeper, as happened to the character Russ Hanneman on the TV show Silicon Valley in last Sunday’s episode, then you’re out of luck. That fictional billionaire investor lost a USB drive with $300 million in cryptocoins. Let’s hope that doesn’t happen to you.
Encryption & Security
Whether you store your keys on your computer, an external disk, a USB drive, DAS, NAS, or in the cloud, you want to make sure that no one else can use those keys. The best way to handle that is to encrypt the backup.
With Backblaze Backup for Windows and Macintosh, your backups are encrypted in transmission to the cloud and on the backup server. Users have the option to add an additional level of security by adding a Personal Encryption Key (PEK), which secures their private key. Your cryptocurrency backup files are secure in the cloud. Using our web or mobile interface, previous versions of files can be accessed, as well.
Our object storage cloud offering, Backblaze B2, can be used with a variety of applications for Windows, Macintosh, and Linux. With B2, cryptocurrency users can choose whichever method of encryption they wish to use on their local computers and then upload their encrypted currency keys to the cloud. Depending on the client used, versioning and life-cycle rules can be applied to the stored files.
Other backup programs and systems provide some or all of these capabilities, as well. If you are backing up to a local drive, it is a good idea to encrypt the local backup, which is an option in some backup programs.
Some experts recommend using a different address for each cryptocurrency transaction. Since the address is not the same as your wallet, this means that you are not creating a new wallet, but simply using a new identifier for people sending you cryptocurrency. Creating a new address is usually as easy as clicking a button in the wallet.
One of the chief advantages of using a different address for each transaction is anonymity. Each time you use an address, you put more information into the public ledger (blockchain) about where the currency came from or where it went. That means that over time, using the same address repeatedly could mean that someone could map your relationships, transactions, and incoming funds. The more you use that address, the more information someone can learn about you. For more on this topic, refer to Address reuse.
Note that a downside of using a paper wallet with a single key pair (type-0 non-deterministic wallet) is that it has the vulnerabilities listed above. Each transaction using that paper wallet will add to the public record of transactions associated with that address. Newer wallets, i.e. “deterministic” or those using mnemonic code words support multiple addresses and are now recommended.
There are other approaches to keeping your cryptocurrency transaction secure. Here are a couple of them.
Multi-signature refers to requiring more than one key to authorize a transaction, much like requiring more than one key to open a safe. It is generally used to divide up responsibility for possession of cryptocurrency. Standard transactions could be called “single-signature transactions” because transfers require only one signature — from the owner of the private key associated with the currency address (public key). Some wallets and apps can be configured to require more than one signature, which means that a group of people, businesses, or other entities all must agree to trade in the cryptocurrencies.
Deep Cold Storage
Deep cold storage ensures the entire transaction process happens in an offline environment. There are typically three elements to deep cold storage.
First, the wallet and private key are generated offline, and the signing of transactions happens on a system not connected to the internet in any manner. This ensures it’s never exposed to a potentially compromised system or connection.
Second, details are secured with encryption to ensure that even if the wallet file ends up in the wrong hands, the information is protected.
Third, storage of the encrypted wallet file or paper wallet is generally at a location or facility that has restricted access, such as a safety deposit box at a bank.
Deep cold storage is used to safeguard a large individual cryptocurrency portfolio held for the long term, or for trustees holding cryptocurrency on behalf of others, and is possibly the safest method to ensure a crypto investment remains secure.
Keep Your Software Up to Date
You should always make sure that you are using the latest version of your app or wallet software, which includes important stability and security fixes. Installing updates for all other software on your computer or mobile device is also important to keep your wallet environment safer.
One Last Thing: Think About Your Testament
Your cryptocurrency funds can be lost forever if you don’t have a backup plan for your peers and family. If the location of your wallets or your passwords is not known by anyone when you are gone, there is no hope that your funds will ever be recovered. Taking a bit of time on these matters can make a huge difference.
To the Moon*
Are you comfortable with how you’re managing and backing up your cryptocurrency wallets and keys? Do you have a suggestion for keeping your cryptocurrencies safe that we missed above? Please let us know in the comments.
*To the Moon — Crypto slang for a currency that reaches an optimistic price projection.
Today is the early May bank holiday in England and Wales, a public holiday, and while this blog rarely rests, the Pi Towers team does. So, while we take a day with our families, our friends, and/or our favourite pastimes, I thought I’d point you at a couple of features from HackSpace magazine, our monthly magazine for makers.
To my mind, they go quite well with a deckchair in the garden, the buzz of a lawnmower a few houses down, and a view of the weeds I ought to have dealt with by now, but I’m sure you’ll find your own ambience.
If you want a unique piece of jewellery to show your love for pencils, follow Peter Brown’s lead. Peter glued twelve pencils together in two rows of six. He then measured the size of his finger and drilled a hole between the glued pencils using a drill bit.
First off, pencils. It hadn’t occurred to me that you could make super useful stuff like a miniature crossbow and a catapult out of pencils. Not only can you do this, you can probably go ahead and do it right now: all you need is a handful of pencils, some rubber bands, some drawing pins, and a bulldog clip (or, as you might prefer, some push pins and a binder clip). The sentence that really leaps out at me here is “To keep a handful of boys aged three to eleven occupied during a family trip, Marie decided to build mini crossbows to help their target practice.” The internet hasn’t helped me find out much about Marie, but I am in awe of her.
If you haven’t wandered off to make a stationery arsenal by now, read Lucy Rogers‘ reflections on making a right mess of things. I hope you do, because I think it’d be great if more people coped better with the fact that we all, unavoidably, fail. You probably won’t really get anywhere without a few goes where you just completely muck it all up.
This true of everything. Wet lab work and gardening and coding and parenting. And everything. You can share your heroic failures in the comments, if you like, as well as any historic weaponry you have fashioned from the contents of your desk tidy.
Brian Carrigan found the remains of a $500 supermarket barcode scanner at a Scrap Exchange for $6.25, and decided to put it to use as a shopping list builder for his pantry.
Upcycling from scraps
Brian wasn’t planning to build the Wunderscan. But when he stumbled upon the remains of a $500 Cubit barcode scanner at his local reuse center, his inner maker took hold of the situation.
It had been ripped from its connectors and had unlabeled wires hanging from it; a bit of hardware gore if such a thing exists. It was labeled on sale for $6.25, and a quick search revealed that it originally retailed at over $500… I figured I would try to reverse engineer it, and if all else fails, scrap it for the laser and motor.
Brian decided that the scanner, once refurbished with a Raspberry Pi Zero W and new wiring, would make a great addition to his home pantry as a shopping list builder using Wunderlist. “I thought a great use of this would be to keep near our pantry so that when we are out of a spice or snack, we could just scan the item and it would get posted to our shopping list.”
The datasheet for the Cubit scanner was available online, and Brian was able to discover the missing pieces required to bring the unit back to working order.
However, no wiring diagram was provided with the datasheet, so he was forced to figure out the power connections and signal output for himself using a bit of luck and an oscilloscope.
Now that the part was powered and working, all that was left was finding the RS232 transmit line. I used my oscilloscope to do this part and found it by scanning items and looking for the signal. It was not long before this wire was found and I was able to receive UPC codes.
Scanning codes and building (Wunder)lists
When the scanner reads a barcode, it sends the ASCII representation of a UPC code to the attached Raspberry Pi Zero W. Brian used the free UPC Database to convert each code to the name of the corresponding grocery item. Next, he needed to add it to the Wunderlist shopping list that his wife uses for grocery shopping.
Wunderlist provides an API token so users can incorporate list-making into their projects. With a little extra coding, Brian was able to convert the scanning of a pantry item’s barcode into a new addition to the family shopping list.
Curious as to how it all came together? You can find information on the project, including code and hardware configurations, on Brian’s blog. If you’ve built something similar, we’d love to see it in the comments below.
The Backblaze web team is growing! As we add more features and work on our website we need more hands to get things done. Enter Steven, who joins us as an Associate Front End Developer. Steven is going to be getting his hands dirty and diving in to the fun-filled world of web development. Lets learn a bit more about Steven shall we?
What is your Backblaze Title? Associate Front End Developer.
Where are you originally from? The Bronx, New York born and raised.
What attracted you to Backblaze? The team behind Backblaze made me feel like family from the moment I stepped in the door. The level of respect and dedication they showed me is the same respect and dedication they show their customers. Those qualities made wanting to be a part of Backblaze a no brainer!
What do you expect to learn while being at Backblaze? I expect to grow as a software developer and human being by absorbing as much as I can from the immensely talented people I’ll be surrounded by.
Where else have you worked? I previously worked at The Greenwich Hotel where I was a front desk concierge and bellman. If the team at Backblaze is anything like the team I was a part of there then this is going to be a fun ride.
Where did you go to school? I studied at Baruch College and Bloc.
What’s your dream job? My dream job is one where I’m able to express 100% of my creativity.
Favorite place you’ve traveled? Santiago, Dominican Republic.
Favorite hobby? Watching my Yankees, Knicks or Jets play.
Of what achievement are you most proud? Becoming a Software Developer…
Star Trek or Star Wars? Star Wars! May the force be with you…
Coke or Pepsi? … Water. Black iced tea? One of god’s finer creations.
Favorite food? Mangu con Los Tres Golpes (Mashed Plantains with Fried Salami, Eggs & Cheese).
Why do you like certain things? I like things that give me good vibes.
Anything else you’d like you’d like to tell us? If you break any complex concept down into to its simplest parts you’ll have an easier time trying to fully grasp it.
Those are some serious words of wisdom from Steven. We look forward to him helping us get cool stuff out the door!
If you’re not already familiar with building visualizations for quick access to business insights using Amazon QuickSight, consider this your introduction. In this post, we’ll walk through some common scenarios with sample datasets to provide an overview of how you can connect yuor data, perform advanced analysis and access the results from any web browser or mobile device.
The following visualizations are built from the public datasets available in the links below. Before we jump into that, let’s take a look at the supported data sources, file formats and a typical QuickSight workflow to build any visualization.
Which data sources does Amazon QuickSight support?
At the time of publication, you can use the following data methods:
Connect to AWS data sources, including:
Upload Excel spreadsheets or flat files (CSV, TSV, CLF, and ELF)
Connect to on-premises databases like Teradata, SQL Server, MySQL, and PostgreSQL
Import data from SaaS applications like Salesforce and Snowflake
Use big data processing engines like Spark and Presto
SPICE is the Amazon QuickSight super-fast, parallel, in-memory calculation engine, designed specifically for ad hoc data visualization. SPICE stores your data in a system architected for high availability, where it is saved until you choose to delete it. Improve the performance of database datasets by importing the data into SPICE instead of using a direct database query. To calculate how much SPICE capacity your dataset needs, see Managing SPICE Capacity.
Typical Amazon QuickSight workflow
When you create an analysis, the typical workflow is as follows:
Connect to a data source, and then create a new dataset or choose an existing dataset.
(Optional) If you created a new dataset, prepare the data (for example, by changing field names or data types).
Create a new analysis.
Add a visual to the analysis by choosing the fields to visualize. Choose a specific visual type, or use AutoGraph and let Amazon QuickSight choose the most appropriate visual type, based on the number and data types of the fields that you select.
(Optional) Modify the visual to meet your requirements (for example, by adding a filter or changing the visual type).
(Optional) Add more visuals to the analysis.
(Optional) Add scenes to the default story to provide a narrative about some aspect of the analysis data.
(Optional) Publish the analysis as a dashboard to share insights with other users.
The following graphic illustrates a typical Amazon QuickSight workflow.
Visualizations created in Amazon QuickSight with sample datasets
Data catalog: The DBG PDS project makes real-time data derived from Deutsche Börse’s trading market systems available to the public for free. This is the first time that such detailed financial market data has been shared freely and continually from the source provider.
The following graph shows the market trend of max trade volume for different EU banks. It builds on the data available on XETRA engines, which is made up of a variety of equities, funds, and derivative securities. This graph can be scrolled to visualize trade for a period of an hour or more.
The following graph shows the common stock beating the rest of the maximum trade volume over a period of time, grouped by security type.
Data catalog: Data derived from different sensor stations placed on the city bridges and surface streets are a core information source. The road weather information station has a temperature sensor that measures the temperature of the street surface. It also has a sensor that measures the ambient air temperature at the station each second.
The following graph shows the present max air temperature in Seattle from different RWI station sensors.
The following graph shows the minimum temperature of the road surface at different times, which helps predicts road conditions at a particular time of the year.
Data catalog: Kaggle has come up with a platform where people can donate open datasets. Data engineers and other community members can have open access to these datasets and can contribute to the open data movement. They have more than 350 datasets in total, with more than 200 as featured datasets. It has a few interesting datasets on the platform that are not present at other places, and it’s a platform to connect with other data enthusiasts.
The following graph shows the trending YouTube videos and presents the max likes for the top 20 channels. This is one of the most popular datasets for data engineers.
The following graph shows the YouTube daily statistics for the max views of video titles published during a specific time period.
Data catalog: NYC Open data hosts some very popular open data sets for all New Yorkers. This platform allows you to get involved in dive deep into the data set to pull some useful visualizations. 2016 Green taxi trip dataset includes trip records from all trips completed in green taxis in NYC in 2016. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
The following graph presents maximum fare amount grouped by the passenger count during a period of time during a day. This can be further expanded to follow through different day of the month based on the business need.
The following graph shows the NewYork taxi data from January 2016, showing the dip in the number of taxis ridden on January 23, 2016 across all types of taxis.
A quick search for that date and location shows you the following news report:
Using Amazon QuickSight, you can see patterns across a time-series data by building visualizations, performing ad hoc analysis, and quickly generating insights. We hope you’ll give it a try today!
Karthik Odapally is a Sr. Solutions Architect in AWS. His passion is to build cost effective and highly scalable solutions on the cloud. In his spare time, he bakes cookies and cupcakes for family and friends here in the PNW. He loves vintage racing cars.
Pranabesh Mandal is a Solutions Architect in AWS. He has over a decade of IT experience. He is passionate about cloud technology and focuses on Analytics. In his spare time, he likes to hike and explore the beautiful nature and wild life of most divine national parks around the United States alongside his wife.
Most of the time this blog is dedicated to cloud storage and computer backup topics, but we also want our readers to understand the culture and people at Backblaze who all contribute to keeping our company running and making it an enjoyable place to work. We invited our HR Coordinator, Michele, to talk about how she spends her day searching for great candidates to fill employment positions at Backblaze.
What’s a Typical Day for Michele at Backblaze?
After I’ve had a yummy cup of coffee — maybe with a honey and splash of half and half, I’ll generally start my day reviewing resumes and contacting potential candidates to set up an initial phone screen.
When I start the process of filling a position, I’ll spend a lot of time on the phone speaking with potential candidates. During a phone screen call we’ll chat about their experience, background and what they are ideally looking for in their next position. I also ask about what they like to do outside of work, and most importantly, how they feel about office dogs. A candidate may not always look great on paper, but could turn out to be a great cultural fit after speaking with them about their previous experience and what they’re passionate about.
Next, I push strong candidates to the subsequent steps with the hiring managers, which range from setting up a second phone screen, to setting up a Google hangout for completing coding tasks, to scheduling in-person interviews with the team.
At the end of the day after an in-person interview, I’ll check in with all the interviewers to debrief and decide how to proceed with the candidate. Everyone that interviewed the candidate will get together to give feedback. Is there a good cultural fit? Are they someone we’d like to work with? Keeping in contact with the candidates throughout the process and making sure they are organized and informed is a big part of my job. No one likes to wait around and wonder where they are in the process.
In between all the madness, I’ll put together offer letters, send out onboarding paperwork and links, and get all the necessary signatures to move forward.
On the candidate’s first day, I’ll go over benefits and the handbook and make sure everything is going smoothly in their overall orientation as they transition into their new role here at Backblaze!
What Makes Your Job Exciting?
I get to speak with many different types of people and see what makes them tick and if they’d be a good fit at Backblaze
The fast pace of the job
Being constantly kept busy with different tasks including supporting the FUN committee by researching venues and ideas for family day and the holiday party
I work on enjoyable projects like creating a people wall for new hires so we are able to put a face to the name
Getting to take a mini road trip up to Sacramento each month to check in with the data center employees
Constantly learning more and more about the job, the people, and the company
As ransomware attacks have grown in number in recent months, the tactics and attack vectors also have evolved. While the primary method of attack used to be to target individual computer users within organizations with phishing emails and infected attachments, we’re increasingly seeing attacks that target weaknesses in businesses’ IT infrastructure.
How Ransomware Attacks Typically Work
In our previous posts on ransomware, we described the common vehicles used by hackers to infect organizations with ransomware viruses. Most often, downloaders distribute trojan horses through malicious downloads and spam emails. The emails contain a variety of file attachments, which if opened, will download and run one of the many ransomware variants. Once a user’s computer is infected with a malicious downloader, it will retrieve additional malware, which frequently includes crypto-ransomware. After the files have been encrypted, a ransom payment is demanded of the victim in order to decrypt the files.
What’s Changed With the Latest Ransomware Attacks?
In 2016, a customized ransomware strain called SamSam began attacking the servers in primarily health care institutions. SamSam, unlike more conventional ransomware, is not delivered through downloads or phishing emails. Instead, the attackers behind SamSam use tools to identify unpatched servers running Red Hat’s JBoss enterprise products. Once the attackers have successfully gained entry into one of these servers by exploiting vulnerabilities in JBoss, they use other freely available tools and scripts to collect credentials and gather information on networked computers. Then they deploy their ransomware to encrypt files on these systems before demanding a ransom. Gaining entry to an organization through its IT center rather than its endpoints makes this approach scalable and especially unsettling.
SamSam’s methodology is to scour the Internet searching for accessible and vulnerable JBoss application servers, especially ones used by hospitals. It’s not unlike a burglar rattling doorknobs in a neighborhood to find unlocked homes. When SamSam finds an unlocked home (unpatched server), the software infiltrates the system. It is then free to spread across the company’s network by stealing passwords. As it transverses the network and systems, it encrypts files, preventing access until the victims pay the hackers a ransom, typically between $10,000 and $15,000. The low ransom amount has encouraged some victimized organizations to pay the ransom rather than incur the downtime required to wipe and reinitialize their IT systems.
The success of SamSam is due to its effectiveness rather than its sophistication. SamSam can enter and transverse a network without human intervention. Some organizations are learning too late that securing internet-facing services in their data center from attack is just as important as securing endpoints.
The typical steps in a SamSam ransomware attack are:
1 Attackers gain access to vulnerable server
Attackers exploit vulnerable software or weak/stolen credentials.
2 Attack spreads via remote access tools
Attackers harvest credentials, create SOCKS proxies to tunnel traffic, and abuse RDP to install SamSam on more computers in the network.
3 Ransomware payload deployed
Attackers run batch scripts to execute ransomware on compromised machines.
4 Ransomware demand delivered requiring payment to decrypt files
Demand amounts vary from victim to victim. Relatively low ransom amounts appear to be designed to encourage quick payment decisions.
What all the organizations successfully exploited by SamSam have in common is that they were running unpatched servers that made them vulnerable to SamSam. Some organizations had their endpoints and servers backed up, while others did not. Some of those without backups they could use to recover their systems chose to pay the ransom money.
Timeline of SamSam History and Exploits
Since its appearance in 2016, SamSam has been in the news with many successful incursions into healthcare, business, and government institutions.
March 2016 SamSam appears
SamSam campaign targets vulnerable JBoss servers Attackers hone in on healthcare organizations specifically, as they’re more likely to have unpatched JBoss machines.
April 2016 SamSam finds new targets
SamSam begins targeting schools and government. After initial success targeting healthcare, attackers branch out to other sectors.
April 2017 New tactics include RDP
Attackers shift to targeting organizations with exposed RDP connections, and maintain focus on healthcare. An attack on Erie County Medical Center costs the hospital $10 million over three months of recovery.
January 2018 Municipalities attacked
• Attack on Municipality of Farmington, NM. • Attack on Hancock Health. • Attack on Adams Memorial Hospital • Attack on Allscripts (Electronic Health Records), which includes 180,000 physicians, 2,500 hospitals, and 7.2 million patients’ health records.
February 2018 Attack volume increases
• Attack on Davidson County, NC. • Attack on Colorado Department of Transportation.
March 2018 SamSam shuts down Atlanta
• Second attack on Colorado Department of Transportation. • City of Atlanta suffers a devastating attack by SamSam. The attack has far-reaching impacts — crippling the court system, keeping residents from paying their water bills, limiting vital communications like sewer infrastructure requests, and pushing the Atlanta Police Department to file paper reports. • SamSam campaign nets $325,000 in 4 weeks. Infections spike as attackers launch new campaigns. Healthcare and government organizations are once again the primary targets.
How to Defend Against SamSam and Other Ransomware Attacks
The best way to respond to a ransomware attack is to avoid having one in the first place. If you are attacked, making sure your valuable data is backed up and unreachable by ransomware infection will ensure that your downtime and data loss will be minimal or none if you ever suffer an attack.
Use anti-virus and anti-malware software or other security policies to block known payloads from launching.
Make frequent, comprehensive backups of all important files and isolate them from local and open networks. Cybersecurity professionals view data backup and recovery (74% in a recent survey) by far as the most effective solution to respond to a successful ransomware attack.
Keep offline backups of data stored in locations inaccessible from any potentially infected computer, such as disconnected external storage drives or the cloud, which prevents them from being accessed by the ransomware.
Install the latest security updates issued by software vendors of your OS and applications. Remember to patch early and patch often to close known vulnerabilities in operating systems, server software, browsers, and web plugins.
Consider deploying security software to protect endpoints, email servers, and network systems from infection.
Exercise cyber hygiene, such as using caution when opening email attachments and links.
Segment your networks to keep critical computers isolated and to prevent the spread of malware in case of attack. Turn off unneeded network shares.
Turn off admin rights for users who don’t require them. Give users the lowest system permissions they need to do their work.
Restrict write permissions on file servers as much as possible.
Educate yourself, your employees, and your family in best practices to keep malware out of your systems. Update everyone on the latest email phishing scams and human engineering aimed at turning victims into abettors.
Please Tell Us About Your Experiences with Ransomware
Have you endured a ransomware attack or have a strategy to avoid becoming a victim? Please tell us of your experiences in the comments.
AWS Glue is an increasingly popular way to develop serverless ETL (extract, transform, and load) applications for big data and data lake workloads. Organizations that transform their ETL applications to cloud-based, serverless ETL architectures need a seamless, end-to-end continuous integration and continuous delivery (CI/CD) pipeline: from source code, to build, to deployment, to product delivery. Having a good CI/CD pipeline can help your organization discover bugs before they reach production and deliver updates more frequently. It can also help developers write quality code and automate the ETL job release management process, mitigate risk, and more.
AWS Glue is a fully managed data catalog and ETL service. It simplifies and automates the difficult and time-consuming tasks of data discovery, conversion, and job scheduling. AWS Glue crawls your data sources and constructs a data catalog using pre-built classifiers for popular data formats and data types, including CSV, Apache Parquet, JSON, and more.
When you are developing ETL applications using AWS Glue, you might come across some of the following CI/CD challenges:
Iterative development with unit tests
Continuous integration and build
Pushing the ETL pipeline to a test environment
Pushing the ETL pipeline to a production environment
Testing ETL applications using real data (live test)
The following diagram shows the pipeline workflow:
This solution uses AWS CodePipeline, which lets you orchestrate and automate the test and deploy stages for ETL application source code. The solution consists of a pipeline that contains the following stages:
1.) Source Control: In this stage, the AWS Glue ETL job source code and the AWS CloudFormation template file for deploying the ETL jobs are both committed to version control. I chose to use AWS CodeCommit for version control.
2.) LiveTest: In this stage, all resources—including AWS Glue crawlers, jobs, S3 buckets, roles, and other resources that are required for the solution—are provisioned, deployed, live tested, and cleaned up.
The LiveTest stage includes the following actions:
Deploy: In this action, all the resources that are required for this solution (crawlers, jobs, buckets, roles, and so on) are provisioned and deployed using an AWS CloudFormation template.
AutomatedLiveTest: In this action, all the AWS Glue crawlers and jobs are executed and data exploration and validation tests are performed. These validation tests include, but are not limited to, record counts in both raw tables and transformed tables in the data lake and any other business validations. I used AWS CodeBuild for this action.
LiveTestApproval: This action is included for the cases in which a pipeline administrator approval is required to deploy/promote the ETL applications to the next stage. The pipeline pauses in this action until an administrator manually approves the release.
LiveTestCleanup: In this action, all the LiveTest stage resources, including test crawlers, jobs, roles, and so on, are deleted using the AWS CloudFormation template. This action helps minimize cost by ensuring that the test resources exist only for the duration of the AutomatedLiveTest and LiveTestApproval
3.) DeployToProduction: In this stage, all the resources are deployed using the AWS CloudFormation template to the production environment.
Try it out
This code pipeline takes approximately 20 minutes to complete the LiveTest test stage (up to the LiveTest approval stage, in which manual approval is required).
To get started with this solution, choose Launch Stack:
This creates the CI/CD pipeline with all of its stages, as described earlier. It performs an initial commit of the sample AWS Glue ETL job source code to trigger the first release change.
In the AWS CloudFormation console, choose Create. After the template finishes creating resources, you see the pipeline name on the stack Outputs tab.
After that, open the CodePipeline console and select the newly created pipeline. Initially, your pipeline’s CodeCommit stage shows that the source action failed.
Allow a few minutes for your new pipeline to detect the initial commit applied by the CloudFormation stack creation. As soon as the commit is detected, your pipeline starts. You will see the successful stage completion status as soon as the CodeCommit source stage runs.
In the CodeCommit console, choose Code in the navigation pane to view the solution files.
Next, you can watch how the pipeline goes through the LiveTest stage of the deploy and AutomatedLiveTest actions, until it finally reaches the LiveTestApproval action.
At this point, if you check the AWS CloudFormation console, you can see that a new template has been deployed as part of the LiveTest deploy action.
At this point, make sure that the AWS Glue crawlers and the AWS Glue job ran successfully. Also check whether the corresponding databases and external tables have been created in the AWS Glue Data Catalog. Then verify that the data is validated using Amazon Athena, as shown following.
Open the AWS Glue console, and choose Databases in the navigation pane. You will see the following databases in the Data Catalog:
Open the Amazon Athena console, and run the following queries. Verify that the record counts are matching.
SELECT count(*) FROM "nycitytaxi_gluedemocicdtest"."data";
SELECT count(*) FROM "nytaxiparquet_gluedemocicdtest"."datalake";
The following shows the raw data:
The following shows the transformed data:
The pipeline pauses the action until the release is approved. After validating the data, manually approve the revision on the LiveTestApproval action on the CodePipeline console.
Add comments as needed, and choose Approve.
The LiveTestApproval stage now appears as Approved on the console.
After the revision is approved, the pipeline proceeds to use the AWS CloudFormation template to destroy the resources that were deployed in the LiveTest deploy action. This helps reduce cost and ensures a clean test environment on every deployment.
Production deployment is the final stage. In this stage, all the resources—AWS Glue crawlers, AWS Glue jobs, Amazon S3 buckets, roles, and so on—are provisioned and deployed to the production environment using the AWS CloudFormation template.
After successfully running the whole pipeline, feel free to experiment with it by changing the source code stored on AWS CodeCommit. For example, if you modify the AWS Glue ETL job to generate an error, it should make the AutomatedLiveTest action fail. Or if you change the AWS CloudFormation template to make its creation fail, it should affect the LiveTest deploy action. The objective of the pipeline is to guarantee that all changes that are deployed to production are guaranteed to work as expected.
In this post, you learned how easy it is to implement CI/CD for serverless AWS Glue ETL solutions with AWS developer tools like AWS CodePipeline and AWS CodeBuild at scale. Implementing such solutions can help you accelerate ETL development and testing at your organization.
If you have questions or suggestions, please comment below.
Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.
Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improving the value of their solutions when using AWS.
Ever since we introduced our Groups feature, Backblaze for Business has been growing at a rapid rate! We’ve been staffing up in order to support the product and the newest addition to the sales team, Victoria, joins us as a Sales Development Representative! Let’s learn a bit more about Victoria, shall we?
What is your Backblaze Title? Sales Development Representative.
Where are you originally from? Harrisburg, North Carolina.
What attracted you to Backblaze? The leaders and family-style culture.
What do you expect to learn while being at Backblaze? How to sell, sell, sell!
Where else have you worked? The North Carolina Autism Society, an ophthalmologist’s office, home health care, and another tech startup.
Where did you go to school? The University of North Carolina Chapel Hill and Duke University’s Fuqua School of Business.
What’s your dream job? Fighter pilot, professional snowboarder or killer whale trainer.
Favorite place you’ve traveled? Hawaii and Banff.
Favorite hobby? Basketball and cars.
Of what achievement are you most proud? Missionary work and helping patients feel better.
Star Trek or Star Wars? Neither, but probably Star Wars.
Coke or Pepsi? Neither, bubble tea.
Favorite food? Snow crab legs.
Why do you like certain things? Because God made me that way.
Anything else you’d like you’d like to tell us? I’m a germophobe, drink a lot of water and unfortunately, am introverted.
Being on the phones all day is a good way to build up those extroversion skills! Welcome to the team and we hope you enjoy learning how to sell, sell, sell!
The Raspberry Pi Foundation loves to celebrate people who use technology to solve problems and express themselves creatively, so we’re proud to expand the incredibly successful event Coolest Projects to North America. This free event will be held on Sunday 23 September 2018 at the Discovery Cube Orange County in Santa Ana, California.
What is Coolest Projects?
Coolest Projects is a world-leading showcase that empowers and inspires the next generation of digital creators, innovators, changemakers, and entrepreneurs. The event is both a competition and an exhibition to give young digital makers aged 7 to 17 a platform to celebrate their successes, creativity, and ingenuity.
In 2012, Coolest Projects was conceived as an opportunity for CoderDojo Ninjas to showcase their work and for supporters to acknowledge these achievements. Week after week, Ninjas would meet up to work diligently on their projects, hacks, and code; however, it can be difficult for them to see their long-term progress on a project when they’re concentrating on its details on a weekly basis. Coolest Projects became a dedicated time each year for Ninjas and supporters to reflect, celebrate, and share both the achievements and challenges of the maker’s journey.
Coolest Projects North America
Not only is Coolest Projects expanding to North America, it’s also expanding its participant pool! Members of our team have met so many amazing young people creating in all areas of the world, that it simply made sense to widen our outreach to include Code Clubs, students of Raspberry Pi Certified Educators, and members of the Raspberry Jam community at large as well as CoderDojo attendees.
Exhibit and attend Coolest Projects
Coolest Projects is a free, family- and educator-friendly event. Young people can apply to exhibit their projects, and the general public can register to attend this one-day event. Be sure to register today, because you make Coolest Projects what it is: the coolest.
The datacenter team continues to expand and the latest person to join the team is Daren! He’s very well versed with our infrastructure and is a welcome addition to the caregivers for our ever-growing fleet!
What is your Backblaze Title? Datacenter Technician.
Where are you originally from? Fair Oaks, CA.
What attracted you to Backblaze? The Pods! I’ve always thought Backblaze had a great business concept and I wanted to be a part of the team that helps build it and make it a huge success.
What do you expect to learn while being at Backblaze? Everything about Backblaze and what makes it tick.
Where else have you worked? Sungard Availability Services, ASC Profiles, and Reids Family Martial Arts.
Where did you go to school? American River College and Techskills of California.
What’s your dream job? I always had interest in Architecture. I’m not sure how good I would be at it but building design is something that I would have liked to try.
Favorite place you’ve traveled? My favorite place to travel is the Philippines. I have a lot of family their and I mostly like to visit the smaller villages far from the busy city life. White sandy beaches, family, and Lumpia!
Favorite hobby? Martial Arts – its challenging, great exercise, and a lot of fun!
Star Trek or Star Wars? Whatever my boss likes.
Coke or Pepsi? Coke.
Favorite food? One of my favorite foods is Lumpia. Its the cousin of the Egg Roll but much more amazing. Made of a thin pastry wrapper with a mixture of fillings, consisting of chopped vegetables, ground beef or pork, and potatoes.
Why do you like certain things? I like certain things that take me to places I have never been before.
Anything else you’d like you’d like to tell us? I am excited to be apart of the Backblaze team.
Welcome aboard Daren! We’d love to try some of that lumpia sometime!
Today is Easter Monday and as such, the drawbridge is up at Pi Towers. So while we spend time with family…too much chocolate…family and chocolate, here are some great Pi-themed videos from members of our community. Enjoy!
Allie assembles this Google Home kit, that runs on a Raspberry Pi, then uses the Google Home to test her space knowledge with a little trivia game. Stay tuned at the end to see a few printed cases you can use instead of the cardboard.
Mission date : March 26 2018 My raspberry pi project. I use LTE modem to connect internet. python programming. raspberry pi controls pi cam, 2servo motor, 2dc motor. (This video recoded with gopro to upload youtube. Actually I controll this rover by pi cam.
I built my first security camera with motion-control connected to my raspberry pi with MotionEyeOS. What you need: *Raspberry pi 3 (I prefer pi 3) *Any Webcam or raspberry pi cam *Mirco SD card (min 8gb) Useful links : Download the motioneyeOS software here ➜ https://github.com/ccrisan/motioneyeos/releases How to do it: – Download motioneyeOS to your empty SD card (I mounted it via Etcher ) – I always do a sudo apt-upgrade & sudo apt-update on my projects, in the Pi.
Abstract: In recent years, hardware Trojans have drawn the attention of governments and industry as well as the scientific community. One of the main concerns is that integrated circuits, e.g., for military or critical-infrastructure applications, could be maliciously manipulated during the manufacturing process, which often takes place abroad. However, since there have been no reported hardware Trojans in practice yet, little is known about how such a Trojan would look like and how difficult it would be in practice to implement one. In this paper we propose an extremely stealthy approach for implementing hardware Trojans below the gate level, and we evaluate their impact on the security of the target device. Instead of adding additional circuitry to the target design, we insert our hardware Trojans by changing the dopant polarity of existing transistors. Since the modified circuit appears legitimate on all wiring layers (including all metal and polysilicon), our family of Trojans is resistant to most detection techniques, including fine-grain optical inspection and checking against “golden chips”. We demonstrate the effectiveness of our approach by inserting Trojans into two designs — a digital post-processing derived from Intel’s cryptographically secure RNG design used in the Ivy Bridge processors and a side-channel resistant SBox implementation — and by exploring their detectability and their effects on security.
The moral is that this kind of technique is very difficult to detect.
With over 500 Petabytes of data under management we need more people keeping the drives spinning in our data center. We’re constantly hiring Systems Administrators and Data Center Technicians, and here’s our latest one! Lets learn a bit more about Jacob, shall we?
What is your Backblaze Title? Data center Technician
Where are you originally from? Ojai, CA
What attracted you to Backblaze? It’s a technical job that believes in training it’s employees and treating them well.
What do you expect to learn while being at Backblaze? As much as I can.
Where else have you worked? I was a Team Lead at Target, I did some volunteer work with the Ventura County Medical Center, and I also worked at a motocross track.
Where did you go to school? Ventura Community College, then 1 semester at Sac State
What’s your dream job? Don’t really have one. Whatever can support my family and that I enjoy.
Favorite place you’ve traveled? Yosemite National Park for the touristy stuff, Bend Oregon for a good getaway place.
Favorite hobby? Gaming and music. It’s a tie.
Of what achievement are you most proud? Marring my wife Masha.
Star Trek or Star Wars? Wars. 100%. I’m a major Star Wars geek.
Coke or Pepsi? Monster.
Favorite food? French fries.
Why do you like certain things? Because my brain tells me I like them.
Thank you for helping care for all of our customer’s data. Welcome to the data center team Jacob!
For Joel Wagener, Director of IT at AIBS, simplicity is an important feature he looks for in software applications to use in his organization. So maybe it’s not unexpected that Joel chose Backblaze for Business to back up AIBS’s staff computers. According to Joel, “It just works.”
AIBS (The American Institute of Biological Sciences) is a non-profit scientific association dedicated to advancing biological research and education. Founded in 1947 as part of the National Academy of Sciences, AIBS later became independent and now has over 100 member organizations. AIBS works to ensure that the public, legislators, funders, and the community of biologists have access to and use information that will guide them in making informed decisions about matters that require biological knowledge.
AIBS started using Backblaze for Business Cloud Backup several years ago to make sure that the organization’s data was backed up and protected from accidental loss or computer failure. AIBS is based in Washington, D.C., but is a virtual organization, with staff dispersed around the United States. AIBS needed a backup solution that worked anywhere a staff member was located, and was easy to use, as well. Joel has made Backblaze a default part of the configuration management for all the AIBS endpoints, which in their case are exclusively Macintosh.
“We started using Backblaze on a single computer in 2014, then not too long after that decided to deploy it to all our endpoints,” explains Joel. “We use Groups to oversee backups and for central billing, but we let each user manage their own computer and restore files on their own if they need to.”
“Backblaze stays out of the way until we need it. It’s fairly lightweight, and I appreciate that it’s simple,” says Joel. “It doesn’t throttle backups and the price point is good. I have family members who use Backblaze, as well.”
Backblaze’s Groups feature permits an organization to oversee and manage the user accounts, including restores, or let users handle that themselves. This flexibility fits a variety of organizations, where various degrees of oversight or independence are desirable. The finance and HR departments could manage their own data, for example, while the rest of the organization could be managed by IT. All groups can be billed centrally no matter how other functionality is set up.
“If we have a computer that needs repair, we can put a loaner computer in that person’s hands and they can immediately get the data they need directly from the Backblaze cloud backup, which is really helpful. When we get the original computer back from repair we can do a complete restore and return it to the user all ready to go again. When we’ve needed restores, Backblaze has been reliable.”
Joel also likes that the memory footprint of Backblaze is light — the clients for both Macintosh and Windows are native, and designed to use minimum system resources and not impact any applications used on the computer. He also likes that updates to the client software are pushed out when necessary.
Backblaze for Business also helps IT maintain archives of users’ computers after they leave the organization.
“We like that we have a ready-made archive of a computer when someone leaves,” said Joel. The Backblaze backup is there if we need to retrieve anything that person was working on.”
There are other capabilities in Backblaze that Joel likes, but hasn’t had a chance to use yet.
“We’ve used Casper (Jamf) to deploy and manage software on endpoints without needing any interaction from the user. We haven’t used it yet for Backblaze, but we know that Backblaze supports it. It’s a handy feature to have.”
”It just works.” — Joel Wagener, AIBS Director of IT
Perhaps the best thing about Backblaze for Business isn’t a specific feature that can be found on a product data sheet.
“When files have been lost, Backblaze has provided us access to multiple prior versions, and this feature has been important and worked well several times,” says Joel.
“That provides needed peace of mind to our users, and our IT department, as well.”
Each year we take stock at the Raspberry Pi Foundation, looking back at what we’ve achieved over the previous twelve months. We’ve just published our Annual Review for 2017, reflecting on the progress we’ve made as a foundation and a community towards putting the power of digital making in the hands of people all over the world.
In the review, you can find out about all the different education programmes we run. Moreover, you can hear from people who have taken part, learned through making, and discovered they can do things with technology that they never thought they could.
Growing our reach
Our reach grew hugely in 2017, and the numbers tell this story.
By the end of 2017, we’d sold over 17 million Raspberry Pi computers, bringing tools for learning programming and physical computing to people all over the world.
Vibrant learning and making communities
Code Club grew by 2964 clubs in 2017, to over 10000 clubs across the world reaching over 150000 9- to 13-year-olds.
“The best moment is seeing a child discover something for the first time. It is amazing.” – Code Club volunteer
In 2017 CoderDojo became part of the Raspberry Pi family. Over the year, it grew by 41% to 1556 active Dojos, involving nearly 40000 7- to 17-year-olds in creating with code and collaborating to learn about technology.
Raspberry Jams continued to grow, with 18700 people attending events organised by our amazing community members.
Supporting teaching and learning
We reached 208 projects in our online resources in 2017, and 8.5 million people visited these to get making.
“I like coding because it’s like a whole other language that you have to learn, and it creates something very interesting in the end.” – Betty, Year 10 student
2017 was also the year we began offering online training courses. 19000 people joined us to learn about programming, physical computing, and running a Code Club.
Over 6800 young people entered Mission Zero and Mission Space Lab, 2017’s two Astro Pi challenges. They created code that ran on board the International Space Station or will run soon.
More than 600 educators joined our face-to-face Picademy training last year. Our community of Raspberry Pi Certified Educators grew to 1500, all leading digital making across schools, libraries, and other settings where young people learn.
Well over a million people follow us on social media, and in 2017 we’ve seen big increases in our YouTube and Instagram followings. We have been creating much more video content to share what we do with audiences on these and other social networks.
It’s been a big year, as we continue to reach even more people. This wouldn’t be possible without the amazing work of volunteers and community members who do so much to create opportunities for others to get involved. Behind each of these numbers is a person discovering digital making for the first time, learning new skills, or succeeding with a project that makes a difference to something they care about.
You can read our 2017 Annual Review in full over on our About Us page.
This summer, the Raspberry Pi Foundation is bringing you an all-new community event taking place in Cambridge, UK!
On the weekend of Saturday 30 June and Sunday 1 July 2018, the Pi Towers team, with lots of help from our community of young people, educators, hobbyists, and tech enthusiasts, will be running Raspberry Fields, our brand-new annual festival of digital making!
It will be a chance for people of all ages and skill levels to have a go at getting creative with tech, and it will be a celebration of all that our digital makers have already learnt and achieved, whether through taking part in Code Clubs, CoderDojos, or Raspberry Jams, or through trying our resources at home.
Dive into digital making
At Raspberry Fields, you will have the chance to inspire your inner inventor! Learn about amazing projects others in the community are working on, such as cool robots and wearable technology; have a go at a variety of hands-on activities, from home automation projects to remote-controlled vehicles and more; see fascinating science- and technology-related talks and musical performances. After your visit, you’ll be excited to go home and get making!
If you’re wondering about bringing along young children or less technologically minded family members or friends, there’ll be plenty for them to enjoy — with lots of festival-themed activities such as face painting, fun performances, free giveaways, and delicious food, Raspberry Fields will have something for everyone!
Get your tickets
This two-day ticketed event will be taking place at Cambridge Junction, the city’s leading arts centre. Tickets are £5 if you are aged 16 or older, and free for everyone under 16. Get your tickets by clicking the button on the Raspberry Fields web page!
Where: Cambridge Junction, Clifton Way, Cambridge, CB1 7GX, UK When: Saturday 30 June 2018, 10:30 – 18:00 and Sunday 1 July 2018, 10:00 – 17:30
We are currently looking for people who’d like to contribute activities, talks, or performances with digital themes to the festival. This could be something like live music, dance, or other show acts; talks; or drop-in making activities. In addition, we’re looking for artists who’d like to showcase interactive digital installations, for proud makers who are keen to exhibit their projects, and for vendors who’d like to join in. We particularly encourage young people to showcase projects they’ve created or deliver talks on their digital making journey!
Your contribution to Raspberry Fields should focus on digital making and be fun and engaging for an audience of various ages. However, it doesn’t need to be specific to Raspberry Pi. You might be keen to demonstrate a project you’ve built, do a short Q&A session on what you’ve learnt, or present something more in-depth in the auditorium; maybe you’re one of our approved resellers wanting to showcase in our market area. We’re also looking for digital makers to run drop-in activity sessions, as well as for people who’d like to be marshals with smiling faces who will ensure that everyone has a wonderful time!
If you’d like to take part in Raspberry Fields, let us know via this form, and we’ll be in touch with you soon.
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.