This month sees two wonderful events where you can meet the Raspberry Pi team, both taking place on the weekend of September 22 and 23 in the USA.
And for more impromptu fun, you can also hang out with our Social Media Editor and fellow Pi enthusiasts on the East Coast on September 24–28.
Coolest Projects North America
At the Discovery Cube Orange County in Santa Ana, California, team members from the Raspberry Pi Foundation North America, CoderDojo, and Code Club will be celebrating the next generation of young makers at Coolest Projects North America.
Coolest Projects is a world-leading showcase that empowers and inspires the next generation of digital creators, innovators, changemakers, and entrepreneurs. This year, for the first time, we are bringing Coolest Projects to North America for a spectacular event!
While project submissions for the event are now closed, you can still get the last FREE tickets to attend this showcase on Sunday, September 23.
For those on the east side of the continent at World Maker Faire New York, we’ll have representation in the form of Alex, our Social Media Editor.
The East Coast’s largest celebration of invention, creativity, and curiosity showcases the very best of the global Maker Movement. Get immersed in hundreds of projects and multiple stages focused on making for social good, health, technology, electronics, 3D printing & fabrication, food, robotics, art and more!
Alex will be adorned in Raspberry Pi stickers while exploring the cornucopia of incredible projects on show. She’ll be joined by Raspberry Pi’s videographer Brian, and they’ll gather footage of Raspberry Pis being used across the event for videos like this one from last year’s World Maker Faire:
Labelled ‘the world’s first hackable, customisable, dead simple, robotic coffee maker’, and powered by a Raspberry Pi, Mugsy allows you to take control of every aspect of the coffee-making process: from grind size and water temperature, to brew and bloom time.
So if you’re planning to attend World Maker Faire, either as a registered exhibitor or an attendee showing off your most recent project, we want to know! Share your project in the comments so we can find you at the event.
A week of New York and Boston meetups
Lastly, since she’ll be in New York, Alex will be out and about after MFNY, meeting up with members of the Raspberry Pi community. If you’d be game for a Raspberry Pi-cnic in Central Park, Coffee and Pi in a cafe, or any other semi-impromptu meetup in the city, let us know which days work best for you between Monday, September 24 and Thursday, September 27! Alex will organise some fun gatherings in the Big Apple.
You can also join her in Boston, Massachusetts, on Friday, September 28, where Alex will again be looking to meet up with makers and Pi enthusiasts — let us know if you’re game!
This is weird
Does anyone else think it’s weird that I’ve been referring to myself in the third person throughout this post?
The adoption of Apache Spark has increased significantly over the past few years, and running Spark-based application pipelines is the new normal. Spark jobs that are in an ETL (extract, transform, and load) pipeline have different requirements—you must handle dependencies in the jobs, maintain order during executions, and run multiple jobs in parallel. In most of these cases, you can use workflow scheduler tools like Apache Oozie, Apache Airflow, and even Cron to fulfill these requirements.
Apache Oozie is a widely used workflow scheduler system for Hadoop-based jobs. However, its limited UI capabilities, lack of integration with other services, and heavy XML dependency might not be suitable for some users. On the other hand, Apache Airflow comes with a lot of neat features, along with powerful UI and monitoring capabilities and integration with several AWS and third-party services. However, with Airflow, you do need to provision and manage the Airflow server. The Cron utility is a powerful job scheduler. But it doesn’t give you much visibility into the job details, and creating a workflow using Cron jobs can be challenging.
What if you have a simple use case, in which you want to run a few Spark jobs in a specific order, but you don’t want to spend time orchestrating those jobs or maintaining a separate application? You can do that today in a serverless fashion using AWS Step Functions. You can create the entire workflow in AWS Step Functions and interact with Spark on Amazon EMR through Apache Livy.
In this post, I walk you through a list of steps to orchestrate a serverless Spark-based ETL pipeline using AWS Step Functions and Apache Livy.
Input data
For the source data for this post, I use the New York City Taxi and Limousine Commission (TLC) trip record data. For a description of the data, see this detailed dictionary of the taxi data. In this example, we’ll work mainly with the following three columns for the Spark jobs.
RateCodeID – Represents the rate code in effect at the end of the trip (for example, 1 for standard rate, 2 for JFK airport, 3 for Newark airport, and so on).
FareAmount – Represents the time-and-distance fare calculated by the meter.
TripDistance – Represents the elapsed trip distance in miles reported by the taxi meter.
The trip data is in comma-separated values (CSV) format with the first row as a header. To shorten the Spark execution time, I trimmed the large input data to only 20,000 rows. During the deployment phase, the input file tripdata.csv is stored in Amazon S3 in the <<your-bucket>>/emr-step-functions/input/ folder.
The following image shows a sample of the trip data:
Solution overview
The next few sections describe how Spark jobs are created for this solution, how you can interact with Spark using Apache Livy, and how you can use AWS Step Functions to create orchestrations for these Spark applications.
At a high level, the solution includes the following steps:
Trigger the AWS Step Functions state machine by passing the input file path.
The first stage in the state machine triggers an AWS Lambda function.
The Lambda function interacts with Apache Spark running on Amazon EMR using Apache Livy, and submits a Spark job.
The state machine waits a few seconds before checking the Spark job status.
Based on the job status, the state machine moves to the success or failure state.
Subsequent Spark jobs are submitted using the same approach.
The state machine waits a few seconds for the job to finish.
The job finishes, and the state machine updates with its final status.
Let’s take a look at the Spark application that is used for this solution.
Spark jobs
For this example, I built a Spark jar named spark-taxi.jar. It has two different Spark applications:
MilesPerRateCode – The first job that runs on the Amazon EMR cluster. This job reads the trip data from an input source and computes the total trip distance for each rate code. The output of this job consists of two columns and is stored in Apache Parquet format in the output path.
The following are the expected output columns:
rate_code – Represents the rate code for the trip.
total_distance – Represents the total trip distance for that rate code (for example, sum(trip_distance)).
RateCodeStatus – The second job that runs on the EMR cluster, but only if the first job finishes successfully. This job depends on two different input sets:
csv – The same trip data that is used for the first Spark job.
miles-per-rate – The output of the first job.
This job first reads the tripdata.csv file and aggregates the fare_amount by the rate_code. At this point, you have two different datasets, both aggregated by rate_code. Finally, the job joins the two datasets on the rate_code field and writes the combined rate code status to a single CSV file.
The output columns are as follows:
rate_code_id – Represents the rate code type.
total_distance – Derived from the first Spark job; represents the total trip distance.
total_fare_amount – A new field that is generated during the second Spark application, representing the total fare amount by the rate code type.
Note that in this case, you don’t need to run two different Spark jobs to generate that output. The goal of setting up the jobs in this way is just to create a dependency between the two jobs and use them within AWS Step Functions.
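The two applications ship inside spark-taxi.jar, and their source isn’t reproduced in this post. Purely as an illustration of the logic described above, here is a rough PySpark sketch of both jobs, assuming the input column names from the table earlier (RateCodeID, FareAmount, TripDistance) and the S3 layout used throughout this walkthrough:

import sys
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Illustrative sketch only -- the post packages this logic as a Scala jar (spark-taxi.jar).
root_path = sys.argv[1]  # for example, s3://<<your-bucket>>

spark = SparkSession.builder.appName("spark-taxi-sketch").getOrCreate()
trips = spark.read.csv(root_path + "/emr-step-functions/input/tripdata.csv",
                       header=True, inferSchema=True)

# Job 1 (MilesPerRateCode): total trip distance per rate code, written as Parquet.
miles_per_rate = (trips.groupBy(F.col("RateCodeID").alias("rate_code"))
                       .agg(F.sum("TripDistance").alias("total_distance")))
miles_per_rate.write.parquet(root_path + "/emr-step-functions/miles-per-rate")

# Job 2 (RateCodeStatus): aggregate fares, join with job 1's output, write a single CSV.
fares = (trips.groupBy(F.col("RateCodeID").alias("rate_code_id"))
              .agg(F.sum("FareAmount").alias("total_fare_amount")))
distances = spark.read.parquet(root_path + "/emr-step-functions/miles-per-rate")
status = (fares.join(distances, fares.rate_code_id == distances.rate_code)
               .select("rate_code_id", "total_distance", "total_fare_amount"))
status.coalesce(1).write.csv(root_path + "/emr-step-functions/rate-code-status", header=True)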
Both Spark applications take one input argument called rootPath. It’s the S3 location where the Spark job is stored along with input and output data. Here is a sample of the final output:
The next section discusses how you can use Apache Livy to interact with Spark applications that are running on Amazon EMR.
Using Apache Livy to interact with Apache Spark
Apache Livy provides a REST interface to interact with Spark running on an EMR cluster. Livy is included in Amazon EMR release version 5.9.0 and later. In this post, I use Livy to submit Spark jobs and retrieve job status. When Amazon EMR is launched with Livy installed, the EMR master node becomes the endpoint for Livy, and it starts listening on port 8998 by default. Livy provides APIs to interact with Spark.
Let’s look at a couple of examples of how you can interact with Spark running on Amazon EMR using Livy.
To list active running jobs, you can execute the following from the EMR master node:
curl localhost:8998/sessions
If you want to do the same from a remote instance, just change localhost to the EMR hostname, as in the following (port 8998 must be open to that remote instance through the security group):
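For example (the hostname below is a placeholder; substitute your EMR master node’s public DNS name):

curl http://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:8998/sessions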
Through Spark submit, you can pass multiple arguments for the Spark job and Spark configuration settings. You can also do that using Livy, by passing the S3 path through the args parameter, as shown following:
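A sketch of such a request against Livy’s POST /batches API (the jar path, class name, and hostname below are placeholders rather than the post’s actual values):

curl -X POST -H "Content-Type: application/json" \
     --data '{"file": "s3://<<your-bucket>>/emr-step-functions/spark-taxi.jar", "className": "MilesPerRateCode", "args": ["s3://<<your-bucket>>"]}' \
     http://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:8998/batches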
For a detailed list of Livy APIs, see the Apache Livy REST API page. This post uses GET /batches and POST /batches.
In the next section, you create a state machine and orchestrate Spark applications using AWS Step Functions.
Using AWS Step Functions to create a Spark job workflow
AWS Step Functions automatically triggers and tracks each step and retries when it encounters errors, so your application executes in order and as expected every time. To create a Spark job workflow using AWS Step Functions, you first create a state machine, using different types of states to build the entire workflow.
First, you use the Task state—a simple state in AWS Step Functions that performs a single unit of work. You also use the Wait state to delay the state machine from continuing for a specified time. Later, you use the Choice state to add branching logic to a state machine.
The following is a quick summary of how to use different states in the state machine to create the Spark ETL pipeline:
Task state – Invokes a Lambda function. The first Task state submits the Spark job on Amazon EMR, and the next Task state is used to retrieve the previous Spark job status.
Wait state – Pauses the state machine until a job completes execution.
Choice state – Each Spark job execution can return a failure, an error, or a success state. So, in the state machine, you use the Choice state to create a rule that specifies the next action or step based on the success or failure of the previous step.
Here is one of my Task states, MilesPerRateCode, which simply submits a Spark job:
"MilesPerRate Job": {
"Type": "Task",
"Resource":"arn:aws:lambda:us-east-1:xxxxxx:function:blog-miles-per-rate-job-submit-function",
"ResultPath": "$.jobId",
"Next": "Wait for MilesPerRate job to complete"
}
This Task state configuration specifies the Lambda function to execute. Inside the Lambda function, it submits a Spark job through Livy using Livy’s POST API. Using ResultPath, it tells the state machine where to place the result of the executing task. As discussed in the previous section, Spark submit returns the session ID, which is captured with $.jobId and used in a later state.
The following code section shows the Lambda function, which is used to submit the MilesPerRateCode job. It uses the Python requests library to submit a POST request against the Livy endpoint hosted on Amazon EMR and passes the required parameters in JSON format through the payload. It then parses the response, grabs the id from the response, and returns it. The Next field tells the state machine which state to go to next.
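The function’s code isn’t reproduced in this excerpt. A minimal sketch of what such a handler might look like follows; the endpoint, jar path, and class name are placeholders, and the requests library has to be bundled with the deployment package because it isn’t part of the Lambda base runtime:

import json
import requests  # bundle with the Lambda deployment package

LIVY_ENDPOINT = "http://<emr-master-dns>:8998/batches"  # placeholder endpoint
JAR_PATH = "s3://<<your-bucket>>/emr-step-functions/spark-taxi.jar"  # placeholder path

def lambda_handler(event, context):
    # Build the Livy batch request for the MilesPerRateCode job.
    payload = {
        "file": JAR_PATH,
        "className": "MilesPerRateCode",  # placeholder class name
        "args": [event["rootPath"]],
    }
    headers = {"Content-Type": "application/json"}
    response = requests.post(LIVY_ENDPOINT, data=json.dumps(payload), headers=headers)
    # Livy returns the new batch session as JSON; its id is used later to poll the job status.
    return response.json()["id"]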
Just like in the MilesPerRate job, another state submits the RateCodeStatus job, but it executes only when all previous jobs have completed successfully.
Here is the Task state in the state machine that checks the Spark job status:
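The definition isn’t reproduced in this excerpt, but it follows the same pattern as the submit state, roughly along these lines (the state and function names here are illustrative):

"Query MilesPerRate job status": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:xxxxxx:function:blog-spark-job-status-function",
  "ResultPath": "$.jobStatus",
  "Next": "MilesPerRate job complete?"
}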
Just like other states, the preceding Task executes a Lambda function, captures the result (represented by jobStatus), and passes it to the next state. The following is the Lambda function that checks the Spark job status based on a given session ID:
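Again, a minimal sketch rather than the post’s exact code (the endpoint is a placeholder, and requests must be bundled with the function):

import requests  # bundle with the Lambda deployment package

LIVY_ENDPOINT = "http://<emr-master-dns>:8998/batches"  # placeholder endpoint

def lambda_handler(event, context):
    # event["jobId"] holds the Livy batch id captured by the submit state via ResultPath.
    response = requests.get("{}/{}".format(LIVY_ENDPOINT, event["jobId"]))
    # Livy reports batch states such as "running", "success", or "dead".
    return response.json()["state"]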
In the Choice state, it checks the Spark job status value, compares it with a predefined status, and transitions to the next state based on the result. For example, if the status is success, it moves to the next state (the RateCodeStatus Job state), and if the status is dead, it moves to the MilesPerRate job failed state.
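As a sketch of what that Choice state might look like (the state names reuse those shown earlier; the real definition may differ, and the default branch here assumes the machine loops back to the Wait state while the job is still running):

"MilesPerRate job complete?": {
  "Type": "Choice",
  "Choices": [
    {
      "Variable": "$.jobStatus",
      "StringEquals": "success",
      "Next": "RateCodeStatus Job"
    },
    {
      "Variable": "$.jobStatus",
      "StringEquals": "dead",
      "Next": "MilesPerRate job failed"
    }
  ],
  "Default": "Wait for MilesPerRate job to complete"
}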
To set up this entire solution, you need to create a few AWS resources. To make it easier, I have created an AWS CloudFormation template. This template creates all the required AWS resources and configures all the resources that are needed to create a Spark-based ETL pipeline on AWS Step Functions.
This CloudFormation template requires you to pass the following four parameters during initiation.
ClusterSubnetID – The subnet where the Amazon EMR cluster is deployed; Lambda is configured to talk to this subnet.
KeyName – The name of the existing EC2 key pair used to access the Amazon EMR cluster.
VPCID – The ID of the virtual private cloud (VPC) where the EMR cluster is deployed; Lambda is configured to talk to this VPC.
S3RootPath – The Amazon S3 path where all required files (input file, Spark job, and so on) are stored and the resulting data is written.
IMPORTANT: These templates are designed only to show how you can create a Spark-based ETL pipeline on AWS Step Functions using Apache Livy. They are not intended for production use without modification. And if you try this solution outside of the us-east-1 Region, download the necessary files from s3://aws-data-analytics-blog/emr-step-functions, upload the files to the buckets in your Region, edit the script as appropriate, and then run it.
To launch the CloudFormation stack, choose Launch Stack:
Launching this stack creates the following list of AWS resources.
StepFunctionsStateExecutionRole (IAM role) – IAM role to execute the state machine, with a trust relationship with the states service.
SparkETLStateMachine (AWS Step Functions state machine) – State machine in AWS Step Functions for the Spark ETL workflow.
LambdaSecurityGroup (Amazon EC2 security group) – Security group that is used for the Lambda function to call the Livy API.
RateCodeStatusJobSubmitFunction (AWS Lambda function) – Lambda function to submit the RateCodeStatus job.
MilesPerRateJobSubmitFunction (AWS Lambda function) – Lambda function to submit the MilesPerRate job.
SparkJobStatusFunction (AWS Lambda function) – Lambda function to check the Spark job status.
LambdaStateMachineRole (IAM role) – IAM role for all Lambda functions, using the Lambda trust relationship.
EMRCluster (Amazon EMR cluster) – EMR cluster where Livy is running and where the job is placed.
During the deployment phase, AWS CloudFormation sets up the S3 paths for input and output. Input files are stored in the <<s3-root-path>>/emr-step-functions/input/ path, whereas spark-taxi.jar is copied under <<s3-root-path>>/emr-step-functions/.
The following screenshot shows how the S3 paths are configured after deployment. In this example, I passed a bucket that I created in the AWS account s3://tm-app-demos for the S3 root path.
If the CloudFormation template completed successfully, you will see Spark-ETL-State-Machine in the AWS Step Functions dashboard, as follows:
Choose the Spark-ETL-State-Machine state machine to take a look at this implementation. The AWS CloudFormation template built the entire state machine along with its dependent Lambda functions, which are now ready to be executed.
On the dashboard, choose the newly created state machine, and then choose New execution to initiate the state machine. It asks you to pass input in JSON format. This input goes to the first state MilesPerRate Job, which eventually executes the Lambda function blog-miles-per-rate-job-submit-function.
Pass the S3 root path as input:
{
  "rootPath": "s3://tm-app-demos"
}
Then choose Start Execution:
The rootPath value is the same value that was passed when creating the CloudFormation stack. It can be an S3 bucket location or a bucket with prefixes, but it should be the same value that is used for AWS CloudFormation. This value tells the state machine where it can find the Spark jar and input file, and where it will write output files. After the state machine starts, each state/task is executed based on its definition in the state machine.
At a high level, the following represents the flow of events:
Execute the first Spark job, MilesPerRate.
The Spark job reads the input file from the location <<rootPath>>/emr-step-functions/input/tripdata.csv. If the job finishes successfully, it writes the output data to <<rootPath>>/emr-step-functions/miles-per-rate.
If the Spark job fails, it transitions to the error state MilesPerRate job failed, and the state machine stops. If the Spark job finishes successfully, it transitions to the RateCodeStatus Job state, and the second Spark job is executed.
If the second Spark job fails, it transitions to the error state RateCodeStatus job failed, and the state machine stops with the Failed status.
If this Spark job completes successfully, it writes the final output data to the <<rootPath>>/emr-step-functions/rate-code-status/ folder. It also transitions to the RateCodeStatus job finished state, and the state machine ends its execution with the Success status.
The following screenshot shows a successfully completed Spark ETL state machine:
The right side of the state machine diagram shows the details of individual states with their input and output.
When you execute the state machine for the second time, it fails because the S3 path already exists. The state machine turns red and stops at the MilesPerRate job failed state. The following image represents that failed execution of the state machine:
You can also check your Spark application status and logs by going to the Amazon EMR console and viewing the Application history tab:
I hope this walkthrough paints a picture of how you can create a serverless solution for orchestrating Spark jobs on Amazon EMR using AWS Step Functions and Apache Livy. In the next section, I share some ideas for making this solution even more elegant.
Next steps
The goal of this post is to show a simple example that uses AWS Step Functions to create an orchestration for Spark-based jobs in a serverless fashion. To make this solution robust and production ready, you can explore the following options:
In this example, I manually initiated the state machine by passing the rootPath as input. You can instead trigger the state machine automatically. To run the ETL pipeline as soon as the files arrive in your S3 bucket, you can pass the new file path to the state machine. Because CloudWatch Events supports AWS Step Functions as a target, you can create a CloudWatch rule for an S3 event. You can then set AWS Step Functions as a target and pass the new file path to your state machine. You’re all set!
You can also improve this solution by adding an alerting mechanism in case of failures. To do this, create a Lambda function that sends an alert email and assign that Lambda function to a Fail state. That way, when any part of your state machine fails, it triggers an email and notifies the user.
If you want to submit multiple Spark jobs in parallel, you can use the Parallel state type in AWS Step Functions. The Parallel state is used to create parallel branches of execution in your state machine.
With Lambda and AWS Step Functions, you can create a very robust serverless orchestration for your big data workload.
Cleaning up
When you’ve finished testing this solution, remember to clean up all those AWS resources that you created using AWS CloudFormation. Use the AWS CloudFormation console or AWS CLI to delete the stack named Blog-Spark-ETL-Step-Functions.
Summary
In this post, I showed you how to use AWS Step Functions to orchestrate your Spark jobs that are running on Amazon EMR. You used Apache Livy to submit jobs to Spark from a Lambda function and created a workflow for your Spark jobs, maintaining a specific order for job execution and triggering different AWS events based on your job’s outcome. Go ahead—give this solution a try, and share your experience with us!
Tanzir Musabbir is an EMR Specialist Solutions Architect with AWS. He is an early adopter of open source Big Data technologies. At AWS, he works with our customers to provide them architectural guidance for running analytics solutions on Amazon EMR, Amazon Athena & AWS Glue. Tanzir is a big Real Madrid fan and he loves to travel in his free time.
The New York Times is reporting about a company called Securus Technologies that gives police the ability to track cell phone locations without a warrant:
The service can find the whereabouts of almost any cellphone in the country within seconds. It does this by going through a system typically used by marketers and other companies to get location data from major cellphone carriers, including AT&T, Sprint, T-Mobile and Verizon, documents show.
Many of my colleagues are fortunate to be able to spend a good part of their day sitting down with and listening to our customers, doing their best to understand ways that we can better meet their business and technology needs. This information is treated with extreme care and is used to drive the roadmap for new services and new features.
AWS customers in the financial services industry (often abbreviated as FSI) are looking ahead to the Fundamental Review of the Trading Book (FRTB) regulations that will come into effect between 2019 and 2021. Among other things, these regulations mandate a new approach to the “value at risk” calculations that each financial institution must perform in the four-hour time window after trading ends in New York and begins in Tokyo. Today, our customers report that this mission-critical calculation consumes on the order of 200,000 vCPUs, growing to between 400K and 800K vCPUs in order to meet the FRTB regulations. While there’s still some debate about the magnitude and frequency with which they’ll need to run this expanded calculation, the overall direction is clear.
Building a Big Grid
In order to make sure that we are ready to help our FSI customers meet these new regulations, we worked with TIBCO to set up and run a proof of concept grid in the AWS Cloud. The periodic nature of the calculation, along with the amount of processing power and storage needed to run it to completion within four hours, make it a great fit for an environment where a vast amount of cost-effective compute power is available on an on-demand basis.
Our customers are already using the TIBCO GridServer on-premises and want to use it in the cloud. This product is designed to run grids at enterprise scale. It runs apps in a virtualized fashion, and accepts requests for resources, dynamically provisioning them on an as-needed basis. The cloud version supports Amazon Linux as well as the PostgreSQL-compatible edition of Amazon Aurora.
Working together with TIBCO, we set out to create a grid that was substantially larger than the current high-end prediction of 800K vCPUs, adding a 50% safety factor and then rounding up to reach 1.3 million vCPUs (5x the size of the largest on-premises grid). With that target in mind, the account limits were raised as follows:
Spot Instance Limit – 120,000
EBS Volume Limit – 120,000
EBS Capacity Limit – 2 PB
If you plan to create a grid of this size, you should also bring your friendly local AWS Solutions Architect into the loop as early as possible. They will review your plans, provide you with architecture guidance, and help you to schedule your run.
Running the Grid
We hit the Go button and launched the grid, watching as it bid for and obtained Spot Instances, each of which booted, initialized, and joined the grid within two minutes. The test workload used the Strata open source analytics & market risk library from OpenGamma and was set up with their assistance.
The grid grew to 61,299 Spot Instances (1.3 million vCPUs drawn from 34 instance types spanning 3 generations of EC2 hardware) as planned, with just 1,937 instances reclaimed and automatically replaced during the run, and cost $30,000 per hour to run, at an average hourly cost of $0.078 per vCPU. If the same instances had been used in On-Demand form, the hourly cost to run the grid would have been approximately $93,000.
Despite the scale of the grid, prices for the EC2 instances did not move during the bidding process. This is due to the overall size of the AWS Cloud and the smooth price change model that we launched late last year.
To give you a sense of the compute power, we computed that this grid would have taken the #1 position on the TOP 500 supercomputer list in November 2007 by a considerable margin, and the #2 position in June 2008. Today, it would occupy position #360 on the list.
I hope that you enjoyed this AWS success story, and that it gives you an idea of the scale that you can achieve in the cloud!
Last month, Wired published a long article about Ray Ozzie and his supposed new scheme for adding a backdoor in encrypted devices. It’s a weird article. It paints Ozzie’s proposal as something that “attains the impossible” and “satisfies both law enforcement and privacy purists,” when (1) it’s barely a proposal, and (2) it’s essentially the same key escrow scheme we’ve been hearing about for decades.
Basically, each device has a unique public/private key pair and a secure processor. The public key goes into the processor and the device, and is used to encrypt whatever user key encrypts the data. The private key is stored in a secure database, available to law enforcement on demand. The only other trick is that for law enforcement to use that key, they have to put the device in some sort of irreversible recovery mode, which means it can never be used again. That’s basically it.
I have no idea why anyone is talking as if this were anything new. Several cryptographers have already explained why this key escrow scheme is no better than any other key escrow scheme. The short answer is (1) we won’t be able to secure that database of backdoor keys, (2) we don’t know how to build the secure coprocessor the scheme requires, and (3) it solves none of the policy problems around the whole system. This is the typical mistake non-cryptographers make when they approach this problem: they think that the hard part is the cryptography to create the backdoor. That’s actually the easy part. The hard part is ensuring that it’s only used by the good guys, and there’s nothing in Ozzie’s proposal that addresses any of that.
I worry that this kind of thing is damaging in the long run. There should be some rule that any backdoor or key escrow proposal be a fully specified proposal, not just some cryptography and hand-waving notions about how it will be used in practice. And before it is analyzed and debated, it should have to satisfy some sort of basic security analysis. Otherwise, we’ll be swatting pseudo-proposals like this one, while those on the other side of this debate become increasingly convinced that it’s possible to design one of these things securely.
Already people are using the National Academies report on backdoors for law enforcement as evidence that engineers are developing workable and secure backdoors. Writing in Lawfare, Alan Z. Rozenshtein claims that the report — and a related New York Times story — “undermine the argument that secure third-party access systems are so implausible that it’s not even worth trying to develop them.” Susan Landau effectively corrects this misconception, but the damage is done.
Here’s the thing: it’s not hard to design and build a backdoor. What’s hard is building the systems — both technical and procedural — around them. Here’s Rob Graham:
He’s only solving the part we already know how to solve. He’s deliberately ignoring the stuff we don’t know how to solve. We know how to make backdoors, we just don’t know how to secure them.
A bunch of us cryptographers have already explained why we don’t think this sort of thing will work in the foreseeable future. We write:
Exceptional access would force Internet system developers to reverse “forward secrecy” design practices that seek to minimize the impact on user privacy when systems are breached. The complexity of today’s Internet environment, with millions of apps and globally connected services, means that new law enforcement requirements are likely to introduce unanticipated, hard to detect security flaws. Beyond these and other technical vulnerabilities, the prospect of globally deployed exceptional access systems raises difficult problems about how such an environment would be governed and how to ensure that such systems would respect human rights and the rule of law.
The reason so few of us are willing to bet on massive-scale key escrow systems is that we’ve thought about it and we don’t think it will work. We’ve looked at the threat model, the usage model, and the quality of hardware and software that exists today. Our informed opinion is that there’s no detection system for key theft, there’s no renewability system, HSMs are terrifically vulnerable (and the companies largely staffed with ex-intelligence employees), and insiders can be suborned. We’re not going to put the data of a few billion people on the line in an environment where we believe with high probability that the system will fail.
The Backblaze web team is growing! As we add more features and work on our website we need more hands to get things done. Enter Steven, who joins us as an Associate Front End Developer. Steven is going to be getting his hands dirty and diving into the fun-filled world of web development. Let’s learn a bit more about Steven, shall we?
What is your Backblaze Title? Associate Front End Developer.
Where are you originally from? The Bronx, New York born and raised.
What attracted you to Backblaze? The team behind Backblaze made me feel like family from the moment I stepped in the door. The level of respect and dedication they showed me is the same respect and dedication they show their customers. Those qualities made wanting to be a part of Backblaze a no brainer!
What do you expect to learn while being at Backblaze? I expect to grow as a software developer and human being by absorbing as much as I can from the immensely talented people I’ll be surrounded by.
Where else have you worked? I previously worked at The Greenwich Hotel where I was a front desk concierge and bellman. If the team at Backblaze is anything like the team I was a part of there then this is going to be a fun ride.
Where did you go to school? I studied at Baruch College and Bloc.
What’s your dream job? My dream job is one where I’m able to express 100% of my creativity.
Favorite place you’ve traveled? Santiago, Dominican Republic.
Favorite hobby? Watching my Yankees, Knicks or Jets play.
Of what achievement are you most proud? Becoming a Software Developer…
Star Trek or Star Wars? Star Wars! May the force be with you…
Coke or Pepsi? … Water. Black iced tea? One of god’s finer creations.
Favorite food? Mangu con Los Tres Golpes (Mashed Plantains with Fried Salami, Eggs & Cheese).
Why do you like certain things? I like things that give me good vibes.
Anything else you’d like to tell us? If you break any complex concept down into its simplest parts, you’ll have an easier time trying to fully grasp it.
Those are some serious words of wisdom from Steven. We look forward to him helping us get cool stuff out the door!
If you’re not already familiar with building visualizations for quick access to business insights using Amazon QuickSight, consider this your introduction. In this post, we’ll walk through some common scenarios with sample datasets to provide an overview of how you can connect your data, perform advanced analysis, and access the results from any web browser or mobile device.
The following visualizations are built from the public datasets available in the links below. Before we jump into that, let’s take a look at the supported data sources, file formats and a typical QuickSight workflow to build any visualization.
Which data sources does Amazon QuickSight support?
At the time of publication, you can use the following data methods:
Connect to AWS data sources, including:
Amazon RDS
Amazon Aurora
Amazon Redshift
Amazon Athena
Amazon S3
Upload Excel spreadsheets or flat files (CSV, TSV, CLF, and ELF)
Connect to on-premises databases like Teradata, SQL Server, MySQL, and PostgreSQL
Import data from SaaS applications like Salesforce and Snowflake
Use big data processing engines like Spark and Presto
SPICE is the Amazon QuickSight super-fast, parallel, in-memory calculation engine, designed specifically for ad hoc data visualization. SPICE stores your data in a system architected for high availability, where it is saved until you choose to delete it. Improve the performance of database datasets by importing the data into SPICE instead of using a direct database query. To calculate how much SPICE capacity your dataset needs, see Managing SPICE Capacity.
Typical Amazon QuickSight workflow
When you create an analysis, the typical workflow is as follows:
Connect to a data source, and then create a new dataset or choose an existing dataset.
(Optional) If you created a new dataset, prepare the data (for example, by changing field names or data types).
Create a new analysis.
Add a visual to the analysis by choosing the fields to visualize. Choose a specific visual type, or use AutoGraph and let Amazon QuickSight choose the most appropriate visual type, based on the number and data types of the fields that you select.
(Optional) Modify the visual to meet your requirements (for example, by adding a filter or changing the visual type).
(Optional) Add more visuals to the analysis.
(Optional) Add scenes to the default story to provide a narrative about some aspect of the analysis data.
(Optional) Publish the analysis as a dashboard to share insights with other users.
The following graphic illustrates a typical Amazon QuickSight workflow.
Visualizations created in Amazon QuickSight with sample datasets
Data catalog: The World Bank invests into multiple development projects at the national, regional, and global levels. It’s a great source of information for data analysts.
The following graph shows the percentage of the population that has access to electricity (rural and urban) during 2000 in Asia, Africa, the Middle East, and Latin America.
The following graph shows the share of healthcare costs that are paid out-of-pocket (private vs. public). You can also hover over the graph to get detailed statistics at a glance.
Data catalog: The DBG PDS project makes real-time data derived from Deutsche Börse’s trading market systems available to the public for free. This is the first time that such detailed financial market data has been shared freely and continually from the source provider.
The following graph shows the market trend of max trade volume for different EU banks. It builds on the data available on XETRA engines, which is made up of a variety of equities, funds, and derivative securities. This graph can be scrolled to visualize trade for a period of an hour or more.
The following graph shows the common stock beating the rest of the maximum trade volume over a period of time, grouped by security type.
Data catalog: Data derived from different sensor stations placed on the city bridges and surface streets are a core information source. The road weather information station has a temperature sensor that measures the temperature of the street surface. It also has a sensor that measures the ambient air temperature at the station each second.
The following graph shows the present max air temperature in Seattle from different RWI station sensors.
The following graph shows the minimum temperature of the road surface at different times, which helps predict road conditions at a particular time of the year.
Data catalog: Kaggle has come up with a platform where people can donate open datasets. Data engineers and other community members can have open access to these datasets and can contribute to the open data movement. They have more than 350 datasets in total, with more than 200 as featured datasets. It has a few interesting datasets on the platform that are not present at other places, and it’s a platform to connect with other data enthusiasts.
The following graph shows the trending YouTube videos and presents the max likes for the top 20 channels. This is one of the most popular datasets for data engineers.
The following graph shows the YouTube daily statistics for the max views of video titles published during a specific time period.
Data catalog: NYC Open Data hosts some very popular open datasets for all New Yorkers. This platform allows you to get involved and dive deep into the datasets to pull together some useful visualizations. The 2016 Green Taxi trip dataset includes trip records from all trips completed in green taxis in NYC in 2016. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
The following graph presents the maximum fare amount grouped by passenger count during a period of time within a day. This can be further expanded to follow trends across different days of the month, based on the business need.
The following graph shows the New York taxi data from January 2016, highlighting the dip in the number of taxi rides on January 23, 2016 across all types of taxis.
A quick search for that date and location shows you the following news report:
Summary
Using Amazon QuickSight, you can see patterns across time-series data by building visualizations, performing ad hoc analysis, and quickly generating insights. We hope you’ll give it a try today!
Karthik Odapally is a Sr. Solutions Architect in AWS. His passion is to build cost effective and highly scalable solutions on the cloud. In his spare time, he bakes cookies and cupcakes for family and friends here in the PNW. He loves vintage racing cars.
Pranabesh Mandal is a Solutions Architect in AWS. He has over a decade of IT experience. He is passionate about cloud technology and focuses on analytics. In his spare time, he likes to hike and explore the beautiful nature and wildlife of the most divine national parks around the United States alongside his wife.
We love Mugsy, the Raspberry Pi coffee robot that has smashed its crowdfunding goal within days! Our latest YouTube video shows our catch-up with Mugsy and its creator Matthew Oswald at Maker Faire New York last year.
Labelled ‘the world’s first hackable, customisable, dead simple, robotic coffee maker’, Mugsy allows you to take control of every aspect of the coffee-making process: from grind size and water temperature, to brew and bloom time. Feeling lazy instead? Read in your beans’ barcode via an onboard scanner, and it will automatically use the best settings for your brew.
Looking to start your day with your favourite coffee straight out of bed? Send the robot a text, email, or tweet, and it will notify you when your coffee is ready!
Learning through product development
“Initially, I used [Mugsy] as a way to teach myself hardware design,” explained Matthew at his Editor’s Choice–winning Maker Faire stand. “I really wanted to hold something tangible in my hands. By using the Raspberry Pi and just being curious, anytime I wanted to use a new technology, I would try to pull back [and ask] ‘How can I integrate this into Mugsy?’”
By exploring his passions and using Mugsy as his guinea pig, Matthew created a project that not only solves a problem — how to make amazing coffee at home — but also brings him one step closer to ‘making things’ for a living. “I used to dream about this stuff when I was a kid, and I used to say ‘I’m never going to be able to do something like that.’” he admitted. But now, with open-source devices like the Raspberry Pi so readily available, he “can see the end of the road”: making his passion his livelihood.
Back Mugsy
With only a few days left on the Kickstarter campaign, Mugsy has reached its goal and then some. It’s available for backing from $150 if you provide your own Raspberry Pi 3, or from $175 with a Pi included — check it out today!
For content from publishers that enable this tool, you can click “Subscribe” – you are automatically signed in to the publisher’s site and can pay with any credit card you have used with Google in the past.
Participants include Les Échos, Fairfax Media, Le Figaro, the Financial Times, Gannett, Gatehouse Media, Grupo Globo, The Mainichi, McClatchy, La Nación, The New York Times, NRC Group, Le Parisien, Reforma, la Republica, The Telegraph, and The Washington Post.
Today we’re releasing a new machine learning feature in Amazon Kinesis Data Analytics for detecting “hotspots” in your streaming data. We launched Kinesis Data Analytics in August of 2016 and we’ve continued to add features since. As you may already know, Kinesis Data Analytics is a fully managed real-time processing engine for streaming data that lets you write SQL queries to derive meaning from your data and output the results to Kinesis Data Firehose, Kinesis Data Streams, or even an AWS Lambda function. The new HOTSPOTS function adds to the existing machine learning capabilities in Kinesis that allow customers to leverage unsupervised, streaming-based machine learning algorithms. Customers don’t need to be experts in data science or machine learning to take advantage of these capabilities.
Hotspots
The HOTSPOTS function is a new Kinesis Data Analytics SQL function you can use to identify relatively dense regions in your data without having to explicitly build and train complicated machine learning models. You can identify subsections of your data that need immediate attention and take action programmatically by streaming the hotspots out to a Kinesis data stream, to a Firehose delivery stream, or by invoking an AWS Lambda function.
There are a ton of really cool scenarios where this could make your operations easier. Imagine a ride-share program or autonomous vehicle fleet communicating spatiotemporal data about traffic jams and congestion, or a datacenter where a number of servers start to overheat indicating an HVAC issue. HOTSPOTS is not limited to spatiotemporal data and you could apply it across many problem domains.
The function follows some simple syntax and accepts the DOUBLE, INTEGER, FLOAT, TINYINT, SMALLINT, REAL, and BIGINT data types.
The HOTSPOTS function takes a cursor as input and returns a JSON string describing the hotspot. This will be easier to understand with an example.
Using Kinesis Data Analytics to Detect Hotspots
Let’s take a simple dataset from the NY Taxi and Limousine Commission that tracks yellow cab pickup and dropoff locations. Most of this data is already on S3 and publicly accessible at s3://nyc-tlc/. We will create a small Python script to load our Kinesis data stream with taxi records, which will feed our Kinesis Data Analytics application. Finally, we’ll output all of this to a Kinesis Data Firehose delivery stream connected to an Amazon Elasticsearch Service cluster for visualization with Kibana. I know from living in New York for 5 years that we’ll probably find a hotspot or two in this data.
First, we’ll create an input Kinesis stream and start sending our NYC Taxi Ride data into it. I just wrote a quick python script to read from one of the CSV files and used boto3 to push the records into Kinesis. You can put the record in whatever way works for you.
import csv
import json
import boto3

def chunkit(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

kinesis = boto3.client("kinesis")

with open("taxidata2.csv") as f:
    reader = csv.DictReader(f)
    records = chunkit([{"PartitionKey": "taxis", "Data": json.dumps(row)} for row in reader], 500)
    for chunk in records:
        kinesis.put_records(StreamName="TaxiData", Records=chunk)
Next, we’ll create the Kinesis Data Analytics application and add our input stream with our taxi data as the source.
Next we’ll automatically detect the schema.
Now we’ll create a quick SQL Script to detect our hotspots and add that to the Real Time Analytics section of our application.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "pickup_longitude" DOUBLE,
    "pickup_latitude" DOUBLE,
    HOTSPOTS_RESULT VARCHAR(10000)
);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT "pickup_longitude", "pickup_latitude", "HOTSPOTS_RESULT"
    FROM TABLE(HOTSPOTS(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
        1000,
        0.013,
        20
    ));
Our HOTSPOTS function takes an input stream, a window size, scan radius, and a minimum number of points to count as a hotspot. The values for these are application dependent but you can tinker with them in the console easily until you get the results you want. There are more details about the parameters themselves in the documentation. The HOTSPOTS_RESULT returns some useful JSON that would let us plot bounding boxes around our hotspots:
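The exact output isn’t reproduced here. Based on the function’s documentation, the JSON contains a list of hotspots, each with a density score and the minimum/maximum coordinates of its bounding box, roughly along these lines (the values below are invented purely for illustration — check the documentation for the authoritative schema):

{"hotspots": [{"density": 80.5, "minValues": [-73.99, 40.74], "maxValues": [-73.97, 40.76]}]}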
When we have our desired results, we can save the script and connect our application to our Amazon Elasticsearch Service Firehose delivery stream. We can run an intermediate Lambda function in the Firehose delivery stream to transform our records into a format more suitable for geographic work. Then we can update our mapping in Elasticsearch to index the hotspot objects as geo-shapes.
Finally, we can connect to Kibana and visualize the results.
Looks like Manhattan is pretty busy!
Available Now
This feature is available now in all existing regions with Kinesis Data Analytics. I think this is a really interesting new feature of Kinesis Data Analytics that can bring immediate value to many applications. Let us know what you build with it on Twitter or in the comments!
When I started my column “I, the Cyborg” in Toest, I had two ideas in mind. One was to explain, in human, non-technical language, important things from the world of technology; the other was to gradually describe the possible abuses of the data we so recklessly scatter across the so-called social networks, above all Facebook.
In the meantime, journalists from The Guardian and The Observer, together with The New York Times and Channel 4, spent a whole year on an investigation that confirms every fear: the data Facebook has been amassing with the voluntary complicity of its users has been used unceremoniously for dirty, underhanded manipulation by the company Cambridge Analytica, which turned this into its business.
In fairness, it should be noted that the topic itself is not news. A year ago, an investigation by The Intercept already brought the scale of the problem to the surface – the data of 30 million Facebook users had been used for election manipulation in favor of Donald Trump. Now, however, we have the testimony of a whistleblower (a former key employee of Cambridge Analytica) who describes in detail how it all happened, plus a pile of admissions of details and juicy tidbits straight from the company’s management, recorded with a hidden camera while they were courting a fake prospective client.
By extracting data from a selected sample of Facebook users and the people connected to them, Cambridge Analytica built an enormous network of influence, exploiting people’s fears and weaknesses. In this way it manipulated the public environment and public opinion in favor of its clients. It carefully scored people’s psychological profiles, looking for their weaknesses and taking advantage of their susceptibility to influence. It assessed personal profiles by focusing on the three traits that psychology calls the “dark triad” – Machiavellianism, psychopathy, and narcissism.
They also provided their services through intermediary firms, so as not to be linked directly to the political campaigns they worked for. And about Eastern Europe they let slip an interesting, self-congratulatory remark: that they did their work so covertly that no one even noticed…
Beyond the Cambridge Analytica scandal, however, it is important to realize that they are not the only villain in this story. All of this happens simply because this is how Facebook works. What we call social media today are, in fact, data collection tools. And there should be no debate about whose data this is and to whom it belongs. It is ours! Do not accept any other interpretation! In the times we live in, our data is a projection of ourselves. Our data is us! By letting someone else hold it, we allow scandals like this one to catch up with us, while some form of digital feudalism or digital slavery looms as a very real threat.
Another detail is important too. All of this came to light thanks to the free press. That is why it is so vital to democracy. As for the social networks, they are finished! They should be! At least in their current form.
Cambridge Analytica is the villain that took advantage of what an even scarier villain, namely Facebook, had built. And, not least, of the naivety of all of us who still have not closed our accounts there. Facebook must bear all the consequences and the full responsibility, because they made all of this possible. And they deserve no mercy!
#DeleteFacebook
P.S. In a European context, it is worth noting that it is wiser to delete your Facebook account after May 25, 2018. After that date the GDPR will be in full force and Facebook will be obliged to comply with it; it requires that if a user asks for their profile to be erased, it really is erased – not merely frozen, as has been the case until now.
John Oliver covered bitcoin/cryptocurrencies last night. I thought I’d describe a bunch of things he gets wrong.
How Bitcoin works
Nowhere does the show describe what Bitcoin is or how it works.
Discussions should always start with Satoshi Nakamoto’s original paper. The thing Satoshi points out is that there is an important cost to normal transactions, namely the entire legal system designed to protect you against fraud, such as the way you can reverse the transactions on your credit card if it gets stolen. The point of Bitcoin is that there is no way to reverse a charge. A transaction is done via cryptography: to transfer money to me, you sign the coins over to my public key with your private key, handing ownership to me with no third party involved that can reverse the transaction, and essentially no overhead.
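To make that handover concrete, here is a minimal sketch of a signed transfer using the third-party `ecdsa` package and the secp256k1 curve Bitcoin uses. The “transaction” format is invented for illustration; real Bitcoin transactions have a specific serialized structure, scripts, and inputs/outputs that are not modeled here.

```python
# A minimal sketch of transferring ownership with a signature, not Bitcoin's
# real transaction format. Requires the third-party package: pip install ecdsa
import hashlib
import ecdsa

# Sender and recipient key pairs (secp256k1 is the curve Bitcoin uses).
sender_sk = ecdsa.SigningKey.generate(curve=ecdsa.SECP256k1)
sender_vk = sender_sk.get_verifying_key()
recipient_vk = ecdsa.SigningKey.generate(curve=ecdsa.SECP256k1).get_verifying_key()

# The "transaction": the sender declares a transfer to the recipient's public key.
tx = b"transfer 1 coin to " + recipient_vk.to_string()
tx_hash = hashlib.sha256(tx).digest()

# Signing with the sender's private key is what hands ownership over.
signature = sender_sk.sign(tx_hash)

# Anyone on the network can check the signature with the sender's public key;
# no bank or card processor is involved, and nothing can reverse it.
assert sender_vk.verify(signature, tx_hash)
```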
All the rest of the stuff, like the decentralized blockchain and mining, exists to make that work.
Bitcoin crazies forget about the original genesis of Bitcoin. For example, they talk about adding features to stop fraud, reversing transactions, and having a central authority that manages that. This misses the point, because the existing electronic banking system already does that, and does a better job at it than cryptocurrencies ever can. If you want to mock cryptocurrencies, talk about the “DAO”, which did exactly that — and collapsed in a big fraudulent scheme where insiders made money and outsiders didn’t.
Sticking to Satoshi’s original ideas is a lot better than repeating how the crazy fringe activists define Bitcoin.
How does any money have value?
Oliver’s answer is that currencies have value because people agree they have value, like how they agree a Beanie Baby is worth $15,000.
This is wrong. A better way to frame the question is to ask why the value of money changes. The dollar has been losing roughly 2% of its value each year for decades. This is called “inflation”: as the dollar loses value, it takes more dollars to buy things, so prices (in dollars) go up, and employers have to pay us more dollars so that we can buy the same amount of things.
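As a quick sanity check on that figure, a few lines of Python show how a 2% annual loss of purchasing power compounds over the decades (the 2% rate is simply the round number quoted above, not a statistic pulled from anywhere else):

```python
# Compounding a 2% annual loss of purchasing power over a few decades.
annual_inflation = 0.02

for years in (10, 20, 30, 40):
    purchasing_power = 1.0 / (1 + annual_inflation) ** years
    print(f"After {years} years, $1.00 buys what ${purchasing_power:.2f} used to")
# After 30 years at 2%, a dollar buys a bit more than half of what it once did.
```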
The reason the value of the dollar changes is largely that the Federal Reserve manages the supply of dollars, using the law of Supply and Demand. As with any good, if the supply decreases (think oil), the price goes up; if the supply increases, the price goes down. The Fed manages money the same way: when prices rise (the dollar is worth less), the Fed reduces the supply of dollars, making each one worth more. Conversely, when prices fall (or don’t rise fast enough), the Fed increases the supply, so that the dollar is worth less.
Money follows the law of Supply and Demand because people use money: they consume it like other goods and services, such as gasoline, tax preparation, food, and dance lessons. It’s not like a fine art painting, a stamp collection or a Beanie Baby — money is a product. People just have a hard time thinking of it as a consumer product since, in their experience, money is what they use to buy consumer products. But it’s a symmetric operation: when you buy gasoline with dollars, you are actually selling dollars in exchange for gasoline. Calling one side of the transaction “money” and the other “goods” is purely arbitrary; you could just as easily call gasoline the money and dollars the good being bought and sold for gasoline.
The reason dollars are the dominant money product is that trying to use gasoline as money is a pain in the neck: storing it and exchanging it is difficult. Other goods do become money (prisons famously use cigarettes as a medium of exchange, even among non-smokers), but a good has to be fungible, storable, and easily exchanged. Dollars are the most fungible, the most storable, and the most easily exchanged, so they have the most value as “money”. Sure, the mechanic can fix the farmer’s car for three chickens instead, but most of the time both parties in the transaction would rather exchange the same value in dollars than in chickens.
So the value of a dollar is not like the value of a Beanie Baby, which someone might buy for $15,000 and which changes purely on the whims of collectors. Instead, a dollar is like gasoline: it obeys the law of Supply and Demand.
This brings us back to the question of where Bitcoin gets its value. While Bitcoin is indeed used like dollars to buy things, that’s only a tiny use of the currency, so its value isn’t determined by Supply and Demand in the same way. Instead, the value of Bitcoin behaves a lot like the value of Beanie Babies, following the logic of speculative investments. So in this respect Oliver is right about where the value of Bitcoin comes from, but wrong about where the value of dollars comes from.
Why a Bitcoin conference didn’t take Bitcoin
John Oliver points out the irony of a Bitcoin conference that stopped accepting payments in Bitcoin for tickets.
The biggest reason is that Bitcoin has become so popular that transaction fees have gone up. Far from being proof of failure, this is proof of popularity. What John Oliver is repeating is the old joke that nobody goes to that popular restaurant anymore because it’s too crowded and you can’t get a reservation.
Moreover, the point of Bitcoin is not to replace everyday currencies for everyday transactions. If you read Satoshi Nakamoto’s whitepaper, its only goal is to replace certain types of transactions, namely purely electronic transactions where electronic goods and services are being exchanged. Where real-life goods and services are being exchanged, existing currencies work just fine. It’s only the crazy activists who claim Bitcoin will eventually replace real-world currencies — the saner people see it co-existing with real-world currencies, each with a different value to consumers.
Turning a McNugget back into a chicken
John Oliver uses the metaphor that while you can process a chicken into McNuggets, you can’t reverse the process. It’s a funny metaphor.
But it’s not clear what the heck this metaphor is trying to explain. It’s not a metaphor for the blockchain, but a metaphor for a “cryptographic hash”, where each block is a chicken, and the McNugget is the signature for the block (well, for the block plus the signature of the previous block, forming a chain).
Even then, the metaphor has problems. For it to accurately describe a cryptographic hash, the McNugget produced from each chicken must be unique to that chicken: you can identify the original chicken simply by looking at the McNugget, and a slight change in the original chicken, like losing a feather, results in a completely different McNugget. Thus, nuggets can be used to tell whether the original chicken has changed.
This leads to the key property of the blockchain: it is unalterable. You can’t go back and change any of the blocks of data, because the fingerprints, the nuggets, would also change and break the nugget chain.
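For readers who prefer code to poultry, here is a minimal sketch of that property: a toy hash chain where each block’s fingerprint covers the previous fingerprint, so tampering with any block breaks every hash after it. Mining, consensus, and real transactions are deliberately left out.

```python
# A toy "nugget chain": each block is hashed together with the previous hash,
# so altering any block changes every fingerprint that follows it.
import hashlib

def block_hash(data: str, prev_hash: str) -> str:
    """Hash a block's data together with the previous block's hash."""
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()

# Build a tiny chain of three blocks.
blocks = ["alice pays bob 1", "bob pays carol 2", "carol pays dave 3"]
hashes = []
prev = "0" * 64  # genesis placeholder
for data in blocks:
    prev = block_hash(data, prev)
    hashes.append(prev)

# Tamper with the first block ("the chicken loses a feather")...
blocks[0] = "alice pays bob 9"

# ...and re-deriving the chain shows every later fingerprint changes too.
prev = "0" * 64
for data, original in zip(blocks, hashes):
    prev = block_hash(data, prev)
    print(prev == original)  # False, False, False: the chain is broken
```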
The point is that John Oliver is laughing at what he thinks is a silly metaphor for the blockchain because he totally misses the point of the metaphor.
Oliver rightly says “don’t worry if you don’t understand it — most people don’t”, but that includes the big companies that John Oliver names. Some companies do get it and are producing reasonable things (like JP Morgan, by all accounts), but some don’t. IBM and other big consultancies are charging companies millions of dollars to consult with them on blockchain products where nobody involved, neither the customer nor the consultancy, actually understands any of it. That doesn’t stop them from happily charging customers on one side and happily spending money on the other.
Thus, rather than explaining the problem, Oliver is just being part of it. His explanation of the blockchain leaves you dumber than before.
ICOs
John Oliver mocks the Brave ICO ($35 million in 30 seconds), claiming it’s all driven by YouTube personalities and people who aren’t looking at the fundamentals.
And while it’s true that most ICOs are bunk, the Brave ICO actually had a business model behind it. Brave is a Chrome-like web browser whose distinguishing feature is that it protects your privacy from advertisers. If you don’t use Brave or a browser with an ad-blocking extension, you have no idea how bad things are for you. However, this presents a problem for websites that fund themselves via advertisements, which is most of them, because visitors no longer see the ads. Brave has a fix for this. Most people wouldn’t mind supporting the websites they visit often, like the New York Times. That’s where the Brave ICO “token” comes in: it’s not simply stock in Brave, but a token for micropayments to websites. Users buy tokens, then use them for micropayments to websites like the New York Times. The New York Times then sells the tokens back to the market for dollars. The buying and selling of tokens happens without a centralized middleman.
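To make that loop concrete, here is a toy model of the buy, micropay, cash-out cycle. The class and the numbers are invented for illustration; this is not Brave’s actual Basic Attention Token implementation, just the flow described above in miniature.

```python
# A toy ledger modelling: user buys tokens -> micropays websites -> site cashes out.
class TokenLedger:
    def __init__(self):
        self.balances = {}  # token balances per participant

    def buy_tokens(self, user: str, dollars: float, price: float = 0.25):
        """User converts dollars into tokens at an assumed market price."""
        self.balances[user] = self.balances.get(user, 0) + dollars / price

    def micropay(self, user: str, site: str, tokens: float):
        """User streams a small number of tokens to a website they visit."""
        self.balances[user] -= tokens
        self.balances[site] = self.balances.get(site, 0) + tokens

    def cash_out(self, site: str, price: float = 0.25) -> float:
        """Website sells its accumulated tokens back to the market for dollars."""
        return self.balances.pop(site, 0) * price

ledger = TokenLedger()
ledger.buy_tokens("reader", dollars=5.00)        # reader funds their wallet
ledger.micropay("reader", "nytimes.com", 2.0)    # a visit sends a few tokens
print(ledger.cash_out("nytimes.com"))            # the site converts tokens to dollars
```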
This is all still speculative, of course, and it remains to be seen how successful Brave will be, but it’s a serious effort. It has well-respected VCs behind the company, a well-respected founder (despite the fact that he invented JavaScript), and well-respected employees. It’s not a scam; it’s a legitimate venture.
How do you make money from Bitcoin?
The last part of the show is dedicated to describing all the scams out there, advising people to be careful and to be “responsible”. This is garbage.
It’s like my simple two-step process for making lots of money from Bitcoin: (1) buy when the price is low, and (2) sell when the price is high. My advice is correct, of course, but useless. The same goes for “be careful” and “invest responsibly”.
The truth about investing in cryptocurrencies is “don’t”. The only responsible way to invest is to buy low-overhead market index funds and hold for retirement. No, you won’t get super rich doing this, but anything other than this is irresponsible gambling.
It’s a hard lesson to learn, because everyone is telling you the opposite. The entire CNBC channel is devoted to day traders, who buy and sell stocks at a high rate on the same principle as a Ponzi scheme, basing their judgment not on the fundamentals (like long-term dividends) but on the animal spirits of whatever stock is hot or cold at the moment. This is the same reason people buy or sell Bitcoin: not because they can describe its fundamental value, but because they believe a bigger fool down the road will buy it for even more.
For things like Bitcoin, the trick to making money is to have bought it over 7 years ago, when it was essentially worthless except to nerds who were into that sort of thing. It’s the same trick to making a lot of money in Magic: The Gathering trading cards, which nerds bought decades ago and which are worth a ton of money now. Or to have bought Apple stock back in 2009, when the iPhone was new and nerds could understand the potential of real Internet access and apps in a way that Wall Street could not.
That was my strategy: be a nerd who gets into things. I’ve made a good amount of money on all of these because, as a nerd, I was into Magic: The Gathering, Bitcoin, and the iPhone before anybody else was, and bought in at the point where these things were essentially valueless.
At this point with cryptocurrencies, with the non-nerds now flooding the market, there’s little chance of striking it rich. The lottery is probably a better bet. Instead, if you want to make money, become a nerd, obsess about a thing, understand it when it’s new, and cash out once the rest of the market figures it out. That might be Brave, for example, but buy into it because you’ve spent the last year studying the browser advertisement ecosystem, the market’s willingness to pay for content, and how their Basic Attention Token delivers value to websites — not because you want in on the ICO craze.
Conclusion
John Oliver spends 25 minutes explaining Bitcoin, cryptocurrencies, and the blockchain to you. Sure, it’s funny, but it leaves you worse off than before it started. The show admits it “simplifies” the explanation, but it has been simplified so much that all the useful information has been removed.
Note to readers! Starting next month, we will be publishing our monthly Hot Startups blog post on the AWS Startup Blog. Please come check us out.
As visual communication—whether through social media channels like Instagram or white space-heavy product pages—becomes a central part of everyone’s life, accessible design platforms and tools become more and more important in the world of tech. This trend is why we have chosen to spotlight three design-related startups—namely Canva, Figma, and InVision—as our hot startups for the month of February. Please read on to learn more about these design-savvy companies and be sure to check out our full post here.
Canva (Sydney, Australia)
For a long time, creating designs required expensive software, extensive studying, and time spent waiting for feedback from clients or colleagues. With Canva, a graphic design tool that makes creating designs much simpler and more accessible, users have the opportunity to design anything and publish anywhere. The platform—which integrates professional design elements, including stock photography, graphic elements, and fonts for users to build designs either entirely from scratch or from thousands of free templates—is available on desktop, iOS, and Android, making it possible to spin up an invitation, poster, or graphic on a smartphone at any time.
Figma (San Francisco, California)
Figma is a cloud-based design platform that empowers designers to communicate and collaborate more effectively. Using recent advancements in WebGL, Figma offers a design tool that doesn’t require users to install any software or special operating systems. It also allows multiple people to work in a file at the same time—a crucial feature.
As the need for new design talent increases, the industry will need plenty of junior designers to keep up with the demand. Figma is prepared to help students by offering their platform for free. Through this, they “hope to give young designers the resources necessary to kick-start their education and eventually, their careers.”
InVision (New York, New York)
Founded in 2011 with the goal of helping improve every digital experience in the world, the digital product design platform InVision helps users create a streamlined and scalable product design process, build and iterate on prototypes, and collaborate across organizations. The company, which raised a $100 million Series E last November, bringing its total funding to $235 million, currently powers the digital product design process at more than 80 percent of the Fortune 100 and at brands like Airbnb, HBO, Netflix, and Uber.
This story of leaked Australian government secrets is unlike any other I’ve heard:
It begins at a second-hand shop in Canberra, where ex-government furniture is sold off cheaply.
The deals can be even cheaper when the items in question are two heavy filing cabinets to which no-one can find the keys.
They were purchased for small change and sat unopened for some months until the locks were attacked with a drill.
Inside was the trove of documents now known as The Cabinet Files.
The thousands of pages reveal the inner workings of five separate governments and span nearly a decade.
Nearly all the files are classified, some as “top secret” or “AUSTEO”, which means they are to be seen by Australian eyes only.
Yes, that really happened. The person who bought and opened the file cabinets contacted the Australian Broadcasting Corp, who is now publishing a bunch of it.
There’s lots of interesting (and embarrassing) stuff in the documents, although most of it is local politics. I am more interested in the government’s reaction to the incident: they’re pushing for a law making it illegal for the press to publish government secrets it received through unofficial channels.
“The one thing I would point out about the legislation that does concern me particularly is that classified information is an element of the offence,” he said.
“That is to say, if you’ve got a filing cabinet that is full of classified information … that means all the Crown has to prove if they’re prosecuting you is that it is classified, nothing else.
“They don’t have to prove that you knew it was classified, so knowledge is beside the point.”
[…]
Many groups have raised concerns, including media organisations, who say the proposed laws unfairly target journalists trying to do their job.
But really anyone could be prosecuted just for possessing classified information, regardless of whether they know about it.
That might include, for instance, if you stumbled across a folder of secret files in a regular skip bin while walking home and handed it over to a journalist.
This illustrates a fundamental misunderstanding of the threat. The Australian Broadcasting Corp gets its funding from the government, and was very restrained in what it published. It waited months before publishing as it coordinated with the Australian government. It allowed the government to secure the files, and then returned them. From the government’s perspective, the ABC was the best possible media outlet to receive this information. If the government makes it illegal for the Australian press to publish this sort of material, the next time it will be sent to the BBC, the Guardian, the New York Times, or Wikileaks. And since people no longer read their news in newspapers sold in stores but on the Internet, the result will be just as many people reading the stories with far fewer redactions.
The proposed law is older than this leak, but the leak is giving it new life. The Australian opposition party is being cagey on whether they will support the law. They don’t want to appear weak on national security, so I’m not optimistic.
EDITED TO ADD (2/8): The Australian government backed down on that new security law.
France also tried to legislate its position on the 1915 Armenian genocide, but it did not work out (in 2012 the Constitutional Council declared the law unconstitutional).
And now Poland.
Denying the Holocaust is a crime in Poland. Or it was until now. A new bill also bans the expression “Polish death camps”: according to today’s government statement, anyone who uses the false term “Polish death camp” not only defiles the memory of the victims but poisons the truth with a lie, and that must be prosecuted and punished.
According to Reuters.com, more than three million of Poland’s 3.2 million Jews were killed by the Nazis, roughly half of all the Jews murdered in the Holocaust. Jews from all over Europe were sent to be killed in camps built and operated by the Germans on Polish soil, including Auschwitz, Treblinka, Belzec, and Sobibor.
The media report that Article 55 reads:
“Whoever publicly and contrary to the facts accuses the Polish nation or the Polish state of being responsible for, or complicit in, the Nazi crimes committed by the Third German Reich, or in other crimes against peace and humanity, or in war crimes, or who otherwise grossly diminishes the responsibility of the actual perpetrators of those crimes, is liable to a fine or imprisonment of up to three years.”
As might be expected, the text has met with widespread disapproval. The New York Times has published unflattering opinions, noting that the measure is part of a programme introduced over the past two years that the PiS (Law and Justice) government calls the “good change”. The change includes attempts to legalise government control over the media and to introduce draconian anti-abortion laws. PiS is also reshaping public discourse with language reminiscent of the newspeak of the communist years. The communists spoke of “enemies of the people”; today Kaczyński calls those who criticise the government “the worst sort of Poles”. The others, who cheer the government on, are called supporters of law and justice.
“The worst sort of Poles” took to the streets to protest in greater numbers than Poland has seen since the days of Solidarity, the article concludes.
We came to see Toby and Brian Bahn, physics teacher at Dover High School and leader of the Dover Robotics club, so they could tell us about the inner workings of the Tic-Tac-Toe Robot project, and how the Raspberry Pi fits within it. Check out our video for Toby’s explanation of the build and the software controlling it.
Toby’s original robotic arm prototype used a weight to direct the pen on and off the paper. He later replaced this with a servo motor.
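For anyone wanting to experiment with a similar pen-lift mechanism, here is a minimal sketch of driving a hobby servo from a Raspberry Pi with the gpiozero library. The GPIO pin and the servo positions are assumptions for illustration, not details taken from Toby’s build.

```python
# Lift and lower a pen with a hobby servo on a Raspberry Pi, using gpiozero.
from time import sleep
from gpiozero import Servo

PEN_SERVO_PIN = 17          # hypothetical GPIO pin the servo signal wire uses
servo = Servo(PEN_SERVO_PIN)

def pen_up():
    servo.max()             # swing to one end of travel to lift the pen

def pen_down():
    servo.min()             # swing to the other end to press the pen onto the paper

# Lift and lower the pen a few times as a quick hardware test.
for _ in range(3):
    pen_up()
    sleep(1)
    pen_down()
    sleep(1)
```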
Toby documented the prototyping process for the robot on the Dover Robotics blog. Head over there to hear more about the highs and lows of building a robotic arm from scratch, and about how Toby learned to integrate a Raspberry Pi for both software and hardware control.
The finished build is a tic-tac-toe beast, besting everyone who dares to challenge it to a game.
And in case you’re wondering: no, none of the Raspberry Pi team were able to beat the Tic-Tac-Toe Robot when we played against it.
Your turn
We always love seeing Raspberry Pis being used in schools to teach coding and digital making, whether in the classroom or during after-school activities such as the Dover Robotics club and our own Code Clubs and CoderDojos. If you are part of a coding or robotics club, we’d love to hear your story! So make sure to share your experiences and projects in the comments below, or via our social media accounts.
The Washington Post is reporting that poor morale at the NSA is causing a significant talent shortage. A November New York Times article said much the same thing.
The articles point to many factors: the recent reorganization, low pay, and the various leaks. I have been saying for a while that the Shadow Brokers leaks have been much more damaging to the NSA — both to morale and operating capabilities — than Edward Snowden. I think it’ll take most of a decade for them to recover.