I can’t believe that check washing is still a thing:
“Check washing” is a scam in which thieves break into mailboxes (or otherwise steal mail), find envelopes containing checks, and use special solvents to remove everything written on the check except the signature. They then change the payee and the amount so that the check can be deposited into an account under their control, often at an out-of-state bank and frequently via mobile deposit.
The article suggests a solution: stop using paper checks.
The US Cybersecurity and Infrastructure Security Agency (CISA) published a long and technical alert describing a North Korea hacking scheme against ATMs in a bunch of countries worldwide:
This joint advisory is the result of analytic efforts among the Cybersecurity and Infrastructure Security Agency (CISA), the Department of the Treasury (Treasury), the Federal Bureau of Investigation (FBI) and U.S. Cyber Command (USCYBERCOM). Working with U.S. government partners, CISA, Treasury, FBI, and USCYBERCOM identified malware and indicators of compromise (IOCs) used by the North Korean government in an automated teller machine (ATM) cash-out scheme — referred to by the U.S. Government as “FASTCash 2.0: North Korea’s BeagleBoyz Robbing Banks.”
The level of detail is impressive, as seems to be common in CISA’s alerts and analysis reports.
South Africa’s Postbank experienced a catastrophic security failure. The bank’s master PIN key was stolen, forcing it to cancel and replace 12 million bank cards.
The breach resulted from the printing of the bank’s encrypted master key in plain, unencrypted digital language at the Postbank’s old data centre in the Pretoria city centre.
According to a number of internal Postbank reports, which the Sunday Times obtained, the master key was then stolen by employees.
One of the reports said that the cards would cost about R1bn to replace. The master key, a 36-digit code, allows anyone who has it to gain unfettered access to the bank’s systems, and allows them to read and rewrite account balances, and change information and data on any of the bank’s 12-million cards.
The bank lost $3.2 million in fraudulent transactions before the theft was discovered. Replacing all the cards will cost an estimated $58 million.
Kaspersky is reporting on a series of bank hacks — called DarkVishnya — perpetrated through malicious hardware being surreptitiously installed into the target network:
In 2017-2018, Kaspersky Lab specialists were invited to research a series of cybertheft incidents. Each attack had a common springboard: an unknown device directly connected to the company’s local network. In some cases, it was the central office, in others a regional office, sometimes located in another country. At least eight banks in Eastern Europe were the targets of the attacks (collectively nicknamed DarkVishnya), which caused damage estimated in the tens of millions of dollars.
Each attack can be divided into several identical stages. At the first stage, a cybercriminal entered the organization’s building under the guise of a courier, job seeker, etc., and connected a device to the local network, for example, in one of the meeting rooms. Where possible, the device was hidden or blended into the surroundings, so as not to arouse suspicion.
The devices used in the DarkVishnya attacks varied in accordance with the cybercriminals’ abilities and personal preferences. In the cases we researched, it was one of three tools:
netbook or inexpensive laptop
Raspberry Pi computer
Bash Bunny, a special tool for carrying out USB attacks
Inside the local network, the device appeared as an unknown computer, an external flash drive, or even a keyboard. Combined with the fact that Bash Bunny is comparable in size to a USB flash drive, this seriously complicated the search for the entry point. Remote access to the planted device was via a built-in or USB-connected GPRS/3G/LTE modem.
There are some good lessons in this article on financial fraud:
That’s how we got it so wrong. We were looking for incidental breaches of technical regulations, not systematic crime. And the thing is, that’s normal. The nature of fraud is that it works outside your field of vision, subverting the normal checks and balances so that the world changes while the picture stays the same. People in financial markets have been missing the wood for the trees for as long as there have been markets.
[…]
Trust — particularly between complete strangers, with no interactions beside relatively anonymous market transactions — is the basis of the modern industrial economy. And the story of the development of the modern economy is in large part the story of the invention and improvement of technologies and institutions for managing that trust.
And as industrial society develops, it becomes easier to be a victim. In The Wealth of Nations, Adam Smith described how prosperity derived from the division of labour — the 18 distinct operations that went into the manufacture of a pin, for example. While this was going on, the modern world also saw a growing division of trust. The more a society benefits from the division of labour in checking up on things, the further you can go into a con game before you realise that you’re in one.
[…]
Libor teaches us a valuable lesson about commercial fraud — that unlike other crimes, it has a problem of denial as well as one of detection. There are very few other criminal acts where the victim not only consents to the criminal act, but voluntarily transfers the money or valuable goods to the criminal. And the hierarchies, status distinctions and networks that make up a modern economy also create powerful psychological barriers against seeing fraud when it is happening. White-collar crime is partly defined by the kind of person who commits it: a person of high status in the community, the kind of person who is always given the benefit of the doubt.
[…]
Fraudsters don’t play on moral weaknesses, greed or fear; they play on weaknesses in the system of checks and balances — the audit processes that are meant to supplement an overall environment of trust. One point that comes up again and again when looking at famous and large-scale frauds is that, in many cases, everything could have been brought to a halt at a very early stage if anyone had taken care to confirm all the facts. But nobody does confirm all the facts. There are just too bloody many of them. Even after the financial rubble has settled and the arrests been made, this is a huge problem.
Apple is rolling out an iOS security usability feature called Security code AutoFill. The basic idea is that the OS scans incoming SMS messages for security codes and suggests them in AutoFill, so that people can use them without having to memorize or type them.
Sounds like a really good idea, but Andreas Gutmann points out an application where this could become a vulnerability: when authenticating transactions:
Transaction authentication, as opposed to user authentication, is used to attest the correctness of the intention of an action rather than just the identity of a user. It is most widely known from online banking, where it is an essential tool to defend against sophisticated attacks. For example, an adversary can try to trick a victim into transferring money to a different account than the one intended. To achieve this the adversary might use social engineering techniques such as phishing and vishing and/or tools such as Man-in-the-Browser malware.
Transaction authentication is used to defend against these adversaries. Different methods exist but in the one of relevance here — which is among the most common methods currently used — the bank will summarise the salient information of any transaction request, augment this summary with a TAN tailored to that information, and send this data to the registered phone number via SMS. The user, or bank customer in this case, should verify the summary and, if this summary matches with his or her intentions, copy the TAN from the SMS message into the webpage.
This new iOS feature creates problems for the use of SMS in transaction authentication. Applied to 2FA, the user would no longer need to open and read the SMS from which the code has already been conveniently extracted and presented. Unless this feature can reliably distinguish between OTPs in 2FA and TANs in transaction authentication, we can expect that users will also have their TANs extracted and presented without context of the salient information, e.g. amount and destination of the transaction. Yet, precisely the verification of this salient information is essential for security. Examples of where this scenario could apply include a Man-in-the-Middle attack on the user accessing online banking from their mobile browser, or where a malicious website or app on the user’s phone accesses the bank’s legitimate online banking service.
This is an interesting interaction between two security systems. Security code AutoFill eliminates the need for the user to view the SMS or memorize the one-time code. Transaction authentication assumes the user read and approved the additional information in the SMS message before using the one-time code.
Ross Anderson has a new paper on cryptocurrency exchanges. From his blog:
Bitcoin Redux explains what’s going wrong in the world of cryptocurrencies. The bitcoin exchanges are developing into a shadow banking system, which do not give their customers actual bitcoin but rather display a “balance” and allow them to transact with others. However if Alice sends Bob a bitcoin, and they’re both customers of the same exchange, it just adjusts their balances rather than doing anything on the blockchain. This is an e-money service, according to European law, but is the law enforced? Not where it matters. We’ve been looking at the details.
Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.
Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.
DynamoDB is a good fit for low-latency reads and writes, but it’s not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries against the data using Amazon Athena. DynamoDB recently released on-demand backups, which create full table backups with no performance impact. However, they’re not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups that are accessible from other services.
To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from the UCI Bank Marketing Data Set.
The solution that I describe provides the following benefits:
Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
Automatically updates your model to get real-time predictions
Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
Makes it easier for developers of all skill levels to use Amazon SageMaker
All code and data set in this post are available in this .zip file.
Solution architecture
The following diagram shows the overall architecture of the solution.
The steps that data follows through the architecture are as follows:
Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3 bucket.
Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
Amazon SageMaker renews the model artifact and updates the endpoint.
The converted CSV is available for ad hoc queries with Amazon Athena.
Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.
Building the auto-updating model
This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.
Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon SageMaker. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own values.
Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.
For this solution, the banking.csv file should be imported into a DynamoDB table.
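If you prefer to script the import rather than do it by hand, a rough boto3 sketch might look like the following. The table name, the customer_id partition key, and the use of the row index as the key value are assumptions for illustration, not part of the original post.

import csv

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("BankMarketing")  # assumed table name

# Load the UCI bank marketing rows into DynamoDB. batch_writer() batches and
# retries the individual PutItem calls. Adjust the delimiter if your copy of
# banking.csv is semicolon-separated.
with open("banking.csv", newline="") as f, table.batch_writer() as batch:
    for i, row in enumerate(csv.DictReader(f)):
        item = {"customer_id": str(i)}  # assumed partition key
        item.update(row)                # remaining columns stored as strings
        batch.put_item(Item=item)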
Export a DynamoDB table
To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.
One core node (m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity so that Data Pipeline can scale the cluster to match the table size. The process of converting to CSV format and renewing the model happens in this EMR cluster.
For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.
Add the script to an existing pipeline
After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:
Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
For Actions, choose Edit.
In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.
Paste the following command into the new step after the data upload step:
The element #{output.directoryPath} references the S3 path where Data Pipeline exports the DynamoDB data as JSON. This path should be passed to the script as an argument.
The bash script has two goals: converting the data format and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.
Automation script: Convert JSON data to CSV with Hive
We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.
When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.
Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.
The conversion logic makes up the first part of the automation script. Add your own bucket name and data source path where indicated in the script.
After creating the external table, you read its data and use an INSERT OVERWRITE DIRECTORY ... SELECT statement to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.
Depending on your requirements, you can drop or transform columns in the SELECT clause in this step to optimize the data for analysis. For example, you might remove columns that have no meaningful correlation with the target value, because keeping the wrong columns can expose your model to overfitting during training, which weakens your predictions. In this post, the customer_id column is removed. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.
Automation script: Renew the Amazon SageMaker model
After the CSV data is refreshed and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3. To renew the model artifact, you must run a new training job. Training jobs can be started with the AWS SDK (for example, boto3 for Amazon SageMaker), with the Amazon SageMaker Python SDK (installed with the “pip install sagemaker” command), or with the AWS CLI for Amazon SageMaker, which is what this post uses.
In addition, consider how to smoothly renew your existing model without service impact, because your model is called by applications in real time. To do this, create a new endpoint configuration first, and then update the existing endpoint with the newly created configuration.
#!/bin/bash
## Define variable
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>"
# Select container image based on region.
case "$REGION" in
"us-west-2" )
IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
;;
"us-east-1" )
IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest"
;;
"us-east-2" )
IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest"
;;
"eu-west-1" )
IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest"
;;
*)
echo "Invalid Region Name"
exit 1 ;
esac
# Start training job and creating model artifact
TRAINING_JOB_NAME=TRAIN-${DTTIME}
S3OUTPUT="s3://<your bucket name>/model/"
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION} --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE} --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None" }]' --output-data-config S3OutputPath=${S3OUTPUT} --resource-config InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier
# Wait until job completed
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}
# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION} --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME} --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT} --execution-role-arn ${ROLE}
# create a new endpoint configuration
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME} --production-variants VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge
# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ "$STATUS" != "InService" ]] ;
then
aws sagemaker create-endpoint --endpoint-name ServiceEndpoint --endpoint-config-name ${CONFIGNAME} --region ${REGION}
else
aws sagemaker update-endpoint --endpoint-name ServiceEndpoint --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi
Grant permission
Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.
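The policy itself isn’t reproduced here, but as a rough sketch, an inline policy attached with boto3 could look like the following. The action list only covers what the automation script above actually calls, and the policy name and ARNs are placeholders.

import json

import boto3

iam = boto3.client("iam")

# Minimal actions for the automation script: start and inspect the training
# job, register the model, and create or update the endpoint. iam:PassRole lets
# the script hand the SageMaker execution role to the training job.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateTrainingJob",
                "sagemaker:DescribeTrainingJob",
                "sagemaker:CreateModel",
                "sagemaker:CreateEndpointConfig",
                "sagemaker:DescribeEndpoint",
                "sagemaker:CreateEndpoint",
                "sagemaker:UpdateEndpoint",
            ],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<your account id>:role/<your AmazonSageMaker-ExecutionRole>",
        },
    ],
}

iam.put_role_policy(
    RoleName="DataPipelineDefaultResourceRole",
    PolicyName="SageMakerModelRenewal",  # placeholder policy name
    PolicyDocument=json.dumps(policy_document),
)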
After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.
Following, I provide a simple Python code example that queries the Amazon SageMaker endpoint by its name (“ServiceEndpoint”) and uses the response for real-time prediction.
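A minimal sketch of such a call with boto3 follows; the feature values are placeholders, and you should confirm the exact shape of the response body against your own linear-learner model.

import json

import boto3

runtime = boto3.client("sagemaker-runtime")  # uses your default region

# One CSV record with 20 comma-separated feature values, in the same order
# used for training. These numbers are placeholders, not real applicant data.
payload = "56,1,0,0,1,0,0,1,5,261,1,999,0,0,1.1,93.994,-36.4,4.857,5191.0,0"

response = runtime.invoke_endpoint(
    EndpointName="ServiceEndpoint",
    ContentType="text/csv",
    Body=payload,
)

# The linear-learner container returns JSON; for a binary classifier it
# includes a score and a predicted label for each record.
result = json.loads(response["Body"].read().decode("utf-8"))
print(result["predictions"][0])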
Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept so that the table can be recovered in the rare event that this is needed. Data Pipeline then converts the JSON to CSV so that Amazon SageMaker can read the data. Note: You should select only meaningful attributes when you convert to CSV. For example, if you judge that the “campaign” attribute is not correlated, you can eliminate this attribute from the CSV.
Train the Amazon SageMaker model with the new data source.
When a new customer comes to your site, you can judge how likely this customer is to subscribe to your new product based on the “predictedScores” value provided by Amazon SageMaker.
If the new user subscribes to your new product, your application must update the attribute “y” to the value 1 (for yes). This updated data is included in the next model renewal as part of the new data source, improving the accuracy of your predictions. With each new entry, your application becomes smarter and delivers better predictions.
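For example, the update could be a single boto3 call along these lines; the table name and key attribute are assumptions, since the post doesn’t spell out the table schema.

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("BankMarketing")  # assumed table name

# Mark the customer as having subscribed (y = 1) so that the next scheduled
# export feeds the updated label back into model training.
table.update_item(
    Key={"customer_id": "12345"},  # assumed key attribute and value
    UpdateExpression="SET y = :subscribed",
    ExpressionAttributeValues={":subscribed": 1},
)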
Running ad hoc queries using Amazon Athena
Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.
With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog.
Creating an Amazon Athena table and querying it
You can simply create an EXTERNAL table for the CSV data on S3 in the Amazon Athena Management Console.
=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
age int,
job string,
marital string ,
education string,
default string,
housing string,
loan string,
contact string,
month string,
day_of_week string,
duration int,
campaign int,
pdays int ,
previous int ,
poutcome string,
emp_var_rate double,
cons_price_idx double,
cons_conf_idx double,
euribor3m double,
nr_employed double,
y int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n'
LOCATION 's3://<your bucket name>/<datasource path>/';
The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.
=== Sample Query ===
SELECT corr(age,y) AS correlation_age_and_target,
corr(duration,y) AS correlation_duration_and_target,
corr(campaign,y) AS correlation_campaign_and_target,
corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y ,
CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact
FROM datasource
) datasource ;
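If you would rather run the same kind of query programmatically instead of from the console, here is a hedged boto3 sketch; the database name, output location, and polling approach are assumptions.

import time

import boto3

athena = boto3.client("athena")

QUERY = """
SELECT corr(age, y) AS correlation_age_and_target,
       corr(duration, y) AS correlation_duration_and_target
FROM datasource
"""

# Start the query; Athena writes the results to the S3 location you specify.
execution = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "default"},  # assumed database name
    ResultConfiguration={"OutputLocation": "s3://<your bucket name>/athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
state = "RUNNING"
while state in ("QUEUED", "RUNNING"):
    time.sleep(1)
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    print(rows)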
Conclusion
In this post, I introduced an example of how to analyze DynamoDB data by using a copy of the table stored in Amazon S3, which preserves your DynamoDB table’s read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also presented how to automate these procedures by using Data Pipeline.
You can adapt this example to your specific use case, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.
Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”
The very short version is that a UK bank, TSB, which had been merged into and then many years later was spun out of Lloyds Bank, was bought by the Spanish bank Banco Sabadell in 2015. Lloyds had continued to run the TSB systems and was to transfer them over to Sabadell over the weekend. It’s turned out to be an epic failure, and it’s not clear if and when this can be straightened out.
The more serious issue is the fact that customers still can’t access online accounts and, even more disconcerting, are sometimes being allowed into other people’s accounts, which says there are massive problems with data integrity. That’s a nightmare to sort out.
Even worse, the fact that this situation has persisted strongly suggests that Lloyds went ahead with the migration without allowing for a rollback.
Elections serve two purposes. The first, and obvious, purpose is to accurately choose the winner. But the second is equally important: to convince the loser. To the extent that an election system is not transparently and auditably accurate, it fails in that second purpose. Our election systems are failing, and we need to fix them.
Today, we conduct our elections on computers. Our registration lists are in computer databases. We vote on computerized voting machines. And our tabulation and reporting is done on computers. We do this for a lot of good reasons, but a side effect is that elections now have all the insecurities inherent in computers. The only way to reliably protect elections from both malice and accident is to use something that is not hackable or unreliable at scale; the best way to do that is to back up as much of the system as possible with paper.
Recently, there have been two graphic demonstrations of how bad our computerized voting system is. In 2007, the states of California and Ohio conducted audits of their electronic voting machines. Expert review teams found exploitable vulnerabilities in almost every component they examined. The researchers were able to undetectably alter vote tallies, erase audit logs, and load malware on to the systems. Some of their attacks could be implemented by a single individual with no greater access than a normal poll worker; others could be done remotely.
Last year, the Defcon hackers’ conference sponsored a Voting Village. Organizers collected 25 pieces of voting equipment, including voting machines and electronic poll books. By the end of the weekend, conference attendees had found ways to compromise every piece of test equipment: to load malicious software, compromise vote tallies and audit logs, or cause equipment to fail.
It’s important to understand that these were not well-funded nation-state attackers. These were not even academics who had been studying the problem for weeks. These were bored hackers, with no experience with voting machines, playing around between parties one weekend.
It shouldn’t be any surprise that voting equipment, including voting machines, voter registration databases, and vote tabulation systems, are that hackable. They’re computers — often ancient computers running operating systems no longer supported by the manufacturers — and they don’t have any magical security technology that the rest of the industry isn’t privy to. If anything, they’re less secure than the computers we generally use, because their manufacturers hide any flaws behind the proprietary nature of their equipment.
We’re not just worried about altering the vote. Sometimes causing widespread failures, or even just sowing mistrust in the system, is enough. And an election whose results are not trusted or believed is a failed election.
Voting systems have another requirement that makes security even harder to achieve: the requirement for a secret ballot. Because we have to securely separate the election-roll system that determines who can vote from the system that collects and tabulates the votes, we can’t use the security systems available to banking and other high-value applications.
We can securely bank online, but can’t securely vote online. If we could do away with anonymity — if everyone could check that their vote was counted correctly — then it would be easy to secure the vote. But that would lead to other problems. Before the US had the secret ballot, voter coercion and vote-buying were widespread.
We can’t, so we need to accept that our voting systems are insecure. We need an election system that is resilient to the threats. And for many parts of the system, that means paper.
Let’s start with the voter rolls. We know they’ve already been targeted. In 2016, someone changed the party affiliation of hundreds of voters before the Republican primary. That’s just one possibility. A well-executed attack that deletes, for example, one in five voters at random — or changes their addresses — would cause chaos on election day.
Yes, we need to shore up the security of these systems. We need better computer, network, and database security for the various state voter organizations. We also need to better secure the voter registration websites, with better design and better internet security. We need better security for the companies that build and sell all this equipment.
Multiple, unchangeable backups are essential. A record of every addition, deletion, and change needs to be stored on a separate system, on write-once media like a DVD. Copies of that DVD, or — even better — a paper printout of the voter rolls, should be available at every polling place on election day. We need to be ready for anything.
Next, the voting machines themselves. Security researchers agree that the gold standard is a voter-verified paper ballot. The easiest (and cheapest) way to achieve this is through optical-scan voting. Voters mark paper ballots by hand; they are fed into a machine and counted automatically. That paper ballot is saved, and serves as a final true record in a recount in case of problems. Touch-screen machines that print a paper ballot to drop in a ballot box can also work for voters with disabilities, as long as the ballot can be easily read and verified by the voter.
Finally, the tabulation and reporting systems. Here again we need more security in the process, but we must always use those paper ballots as checks on the computers. A manual, post-election, risk-limiting audit varies the number of ballots examined according to the margin of victory. Conducting this audit after every election, before the results are certified, gives us confidence that the election outcome is correct, even if the voting machines and tabulation computers have been tampered with. Additionally, we need better coordination and communications when incidents occur.
It’s vital to agree on these procedures and policies before an election. Before the fact, when anyone can win and no one knows whose votes might be changed, it’s easy to agree on strong security. But after the vote, someone is the presumptive winner — and then everything changes. Half of the country wants the result to stand, and half wants it reversed. At that point, it’s too late to agree on anything.
The politicians running in the election shouldn’t have to argue their challenges in court. Getting elections right is in the interest of all citizens. Many countries have independent election commissions that are charged with conducting elections and ensuring their security. We don’t do that in the US.
Instead, we have representatives from each of our two parties in the room, keeping an eye on each other. That provided acceptable security against 20th-century threats, but is totally inadequate to secure our elections in the 21st century. And the belief that the diversity of voting systems in the US provides a measure of security is a dangerous myth, because few districts can be decisive and there are so few voting-machine vendors.
We can do better. In 2017, the Department of Homeland Security declared elections to be critical infrastructure, allowing the department to focus on securing them. On 23 March, Congress allocated $380m to states to upgrade election security.
These are good starts, but don’t go nearly far enough. The constitution delegates elections to the states but allows Congress to “make or alter such Regulations”. In 1845, Congress set a nationwide election day. Today, we need it to set uniform and strict election standards.
Amazon Simple Notification Service (SNS) now supports VPC Endpoints (VPCE) via AWS PrivateLink. You can use VPC Endpoints to privately publish messages to SNS topics, from an Amazon Virtual Private Cloud (VPC), without traversing the public internet. When you use AWS PrivateLink, you don’t need to set up an Internet Gateway (IGW), Network Address Translation (NAT) device, or Virtual Private Network (VPN) connection. You don’t need to use public IP addresses, either.
Using VPC Endpoints requires no code changes and can bring additional security to pub/sub messaging use cases that rely on SNS. VPC Endpoints help promote data privacy and are aligned with assurance programs, including the Health Insurance Portability and Accountability Act (HIPAA), FedRAMP, and others discussed below.
VPC Endpoints for SNS in action
Here’s how VPC Endpoints for SNS works. The following example is based on a banking system that processes mortgage applications. This banking system, which has been deployed to a VPC, publishes each mortgage application to an SNS topic. The SNS topic then fans out the mortgage application message to two subscribing AWS Lambda functions:
Save-Mortgage-Application stores the application in an Amazon DynamoDB table. As the mortgage application contains personally identifiable information (PII), the message must not traverse the public internet.
Save-Credit-Report checks the applicant’s credit history against an external Credit Reporting Agency (CRA), then stores the final credit report in an Amazon S3 bucket.
The following diagram depicts the underlying architecture for this banking system:
To protect applicants’ data, the financial institution responsible for developing this banking system needed a mechanism to prevent PII data from traversing the internet when publishing mortgage applications from their VPC to the SNS topic. Therefore, they created a VPC endpoint to enable their publisher Amazon EC2 instance to privately connect to the SNS API. As shown in the diagram, when the VPC endpoint is created, an Elastic Network Interface (ENI) is automatically placed in the same VPC subnet as the publisher EC2 instance. This ENI exposes a private IP address that is used as the entry point for traffic destined to SNS. This ensures that traffic between the VPC and SNS doesn’t leave the Amazon network.
Set up VPC Endpoints for SNS
The process for creating a VPC endpoint to privately connect to SNS doesn’t require code changes: access the VPC Management Console, navigate to the Endpoints section, and create a new Endpoint. Three attributes are required:
The SNS service name.
The VPC and Availability Zones (AZs) from which you’ll publish your messages.
The Security Group (SG) to be associated with the endpoint network interface. The Security Group controls the traffic to the endpoint network interface from resources in your VPC. If you don’t specify a Security Group, the default Security Group for your VPC will be associated.
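Once the endpoint is in place, the publisher’s code doesn’t change at all; a publish call from the banking system’s EC2 instance still looks like an ordinary SNS publish. Here is a hedged boto3 sketch with a placeholder topic ARN and message.

import json

import boto3

sns = boto3.client("sns", region_name="us-east-1")  # adjust to your Region

# Because the VPC endpoint resolves the normal SNS API address to a private IP
# inside the VPC, this call never traverses the public internet, yet the code
# is identical to what you would write without AWS PrivateLink.
sns.publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:mortgage-applications",  # placeholder ARN
    Message=json.dumps({"application_id": "A-1001", "applicant": "Jane Doe"}),  # placeholder payload
)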
The SNS API is served through HTTP Secure (HTTPS), and encrypts all messages in transit with Transport Layer Security (TLS) certificates issued by Amazon Trust Services (ATS). The certificates verify the identity of the SNS API server when encrypted connections are established. The certificates help establish proof that your SNS API client (SDK, CLI) is communicating securely with the SNS API server. A Certificate Authority (CA) issues the certificate to a specific domain. Hence, when a domain presents a certificate that’s issued by a trusted CA, the SNS API client knows it’s safe to make the connection.
Summary
VPC Endpoints can increase the security of your pub/sub messaging use cases by allowing you to publish messages to SNS topics, from instances in your VPC, without traversing the internet. Setting up VPC Endpoints for SNS doesn’t require any code changes because the SNS API address remains the same.
VPC Endpoints for SNS is now available in all AWS Regions where AWS PrivateLink is available. For information on pricing and regional availability, visit the VPC pricing page. For more information and on-boarding, see Publishing to Amazon SNS Topics from Amazon Virtual Private Cloud in the SNS documentation.
If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the Amazon SNS forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
John Oliver covered bitcoin/cryptocurrencies last night. I thought I’d describe a bunch of things he gets wrong.
How Bitcoin works
Nowhere does the show describe what Bitcoin is or how it works.
Discussions should always start with Satoshi Nakamoto’s original paper. The thing Satoshi points out is that there is an important cost to normal transactions, namely, the entire legal system designed to protect you against fraud, such as the way you can reverse the transactions on your credit card if it gets stolen. The point of Bitcoin is that there is no way to reverse a charge. A transaction is done via cryptography: to transfer money to me, you sign the transaction with your secret key and address it to my public key, handing ownership over to me with no third party involved that can reverse the transaction, and essentially no overhead.
All the rest of the stuff, like the decentralized blockchain and mining, is all about making that work.
Bitcoin crazies forget about the original genesis of Bitcoin. For example, they talk about adding features to stop fraud, reversing transactions, and having a central authority that manages that. This misses the point, because the existing electronic banking system already does that, and does a better job at it than cryptocurrencies ever can. If you want to mock cryptocurrencies, talk about the “DAO”, which did exactly that — and collapsed in a big fraudulent scheme where insiders made money and outsiders didn’t.
Sticking to Satoshi’s original ideas is a lot better than repeating how the crazy fringe activists define Bitcoin.
How does any money have value?
Oliver’s answer is that currencies have value because people agree that they have value, just as they agree a Beanie Baby is worth $15,000.
This is wrong. A better way of approaching the question is to ask why the value of money changes. The dollar has been losing roughly 2% of its value each year for decades. This is called “inflation”: as the dollar loses value, it takes more dollars to buy things, which means the price of things (in dollars) goes up, and employers have to pay us more dollars so that we can buy the same amount of things.
The reason the value of the dollar changes is largely because the Federal Reserve manages the supply of dollars, according to the same law of Supply and Demand. As you know, if the supply of something (like oil) decreases, its price goes up; if the supply increases, the price goes down. The Fed manages money the same way: when prices rise (the dollar is worth less), the Fed reduces the supply of dollars, causing each dollar to be worth more. Conversely, if prices fall (or don’t rise fast enough), the Fed increases the supply, so that the dollar is worth less.
Money follows the law of Supply and Demand because people use money: they consume it the way they consume other goods and services, like gasoline, tax preparation, food, dance lessons, and so forth. It’s not like a fine art painting, a stamp collection, or a Beanie Baby; money is a product. It’s just that people have a hard time thinking of it as a consumer product since, in their experience, money is what they use to buy consumer products. But it’s a symmetric operation: when you buy gasoline with dollars, you are actually selling dollars in exchange for gasoline. That you call one side of this transaction “money” and the other “goods” is purely arbitrary; you could just as well call gasoline the money and dollars the good being bought and sold for gasoline.
The reason dollars are the product of choice is that trying to use gasoline as money is a pain in the neck: storing it and exchanging it is difficult. Other goods do become money; famously, prisons often use cigarettes as a medium of exchange, even among non-smokers. But a good has to be fungible, storable, and easily exchanged, and dollars are the most fungible, the most storable, and the most easily exchanged, so they have the most value as “money”. Sure, the mechanic can fix the farmer’s car for three chickens instead, but most of the time, both parties in the transaction would rather exchange the same value using dollars than chickens.
So the value of the dollar is not like the value of a Beanie Baby, which someone might buy for $15,000 and whose price changes purely on the whims of investors. Instead, a dollar is like gasoline: it obeys the law of Supply and Demand.
This brings us back to the question of where Bitcoin gets its value. While Bitcoin is indeed used like dollars to buy things, that’s only a tiny use of the currency, so its value isn’t really determined by Supply and Demand. Instead, the value of Bitcoin is a lot like that of Beanie Babies, obeying the laws of investments. So in this respect, Oliver is right about where the value of Bitcoin comes from, but wrong about where the value of dollars comes from.
Why a Bitcoin conference didn’t take Bitcoin
John Oliver points out the irony of a Bitcoin conference that stopped accepting payments in Bitcoin for tickets.
The biggest reason for this is that Bitcoin has become so popular that transaction fees have gone up. Rather than being proof of failure, it’s proof of popularity. What John Oliver is describing is the old joke that nobody goes to that popular restaurant anymore because it’s too crowded and you can’t get a reservation.
Moreover, the point of Bitcoin is not to replace everyday currencies for everyday transactions. If you read Satoshi Nakamoto’s whitepaper, its only goal is to replace certain types of transactions, like purely electronic transactions where electronic goods and services are being exchanged. Where real-life goods/services are being exchanged, existing currencies work just fine. It’s only the crazy activists who claim Bitcoin will eventually replace real-world currencies — the saner people see it co-existing with real-world currencies, each with a different value to consumers.
Turning a McNugget back into a chicken
John Oliver uses the metaphor that while you can process a chicken into McNuggets, you can’t turn a McNugget back into a chicken. It’s a funny metaphor.
But it’s not clear what the heck this metaphor is trying to explain. It’s not a metaphor for the blockchain, but a metaphor for a “cryptographic hash”, where each block is a chicken, and the McNugget is the fingerprint of the block (well, of the block plus the fingerprint of the previous block, forming a chain).
Even then, the metaphor has problems. For it to accurately describe a cryptographic hash, the McNugget produced from each chicken must be unique to that chicken. You can therefore identify the original chicken simply by looking at the McNugget, and a slight change in the original chicken, like losing a feather, results in a completely different McNugget. Thus, nuggets can be used to tell whether the original chicken has changed.
This then leads to the key property of the blockchain: it is unalterable. You can’t go back and change any of the blocks of data, because the fingerprints, the nuggets, will also change and break the nugget chain.
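What the metaphor is reaching for is easy to see in a few lines of Python. This is a toy sketch, not how Bitcoin actually serializes blocks.

import hashlib

def nugget(data: str) -> str:
    """A 'McNugget': a short fingerprint that is unique to its input."""
    return hashlib.sha256(data.encode()).hexdigest()

# A tiny change to the chicken produces a completely different nugget.
print(nugget("chicken"))
print(nugget("chicken, minus one feather"))

# Chaining: each block's fingerprint covers the previous fingerprint too, so
# altering an old block breaks every nugget that comes after it.
blocks = ["block 1: Alice pays Bob", "block 2: Bob pays Carol", "block 3: Carol pays Dave"]
previous = ""
for block in blocks:
    previous = nugget(previous + block)
    print(previous)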
The point is that John Oliver is laughing at what sounds like a silly metaphor for the blockchain because he totally misses the point of the metaphor.
Oliver rightly says “don’t worry if you don’t understand it — most people don’t”, but that includes the big companies that John Oliver names. Some companies do get it, and are producing reasonable things (like JP Morgan, by all accounts), but some don’t. IBM and other big consultancies are charging companies millions of dollars to consult with them on blockchain products where neither the customer nor the consultancy actually understands any of it. That doesn’t stop them from happily charging customers on one side and happily spending money on the other.
Thus, rather than explaining the problem, Oliver is just being part of it. His explanation of the blockchain leaves you dumber than before.
ICOs
John Oliver mocks the Brave ICO ($35 million in 30 seconds), claiming it’s all driven by YouTube personalities and people who aren’t looking at the fundamentals.
And while it’s true that most ICOs are bunk, the Brave ICO actually had a business model behind it. Brave is a Chrome-like web browser whose distinguishing feature is that it protects your privacy from advertisers. If you don’t use Brave or a browser with an ad-block extension, you have no idea how bad things are for you. However, this presents a problem for websites that fund themselves via advertisements, which is most of them, because visitors no longer see ads. Brave has a fix for this. Most people wouldn’t mind supporting the websites they visit often, like the New York Times. That’s where the Brave ICO “token” comes in: it’s not simply stock in Brave, but a token for micropayments to websites. Users buy tokens, then use them for micropayments to websites like the New York Times. The New York Times then sells the tokens back to the market for dollars. The buying and selling of tokens happens without a centralized middleman.
This is still all speculative, of course, and it remains to be seen how successful Brave will be, but it’s a serious effort. It has well-respected VCs behind the company, a well-respected founder (despite the fact that he invented JavaScript), and well-respected employees. It’s not a scam, it’s a legitimate venture.
How do you make money from Bitcoin?
The last part of the show is dedicated to describing all the scams out there, advising people to be careful and to be “responsible”. This is garbage.
It’s like my simple two-step process for making lots of money via Bitcoin: (1) buy when the price is low, and (2) sell when the price is high. The advice is correct, of course, but useless. The same goes for “be careful” and “invest responsibly”.
The truth about investing in cryptocurrencies is “don’t”. The only responsible way to invest is to buy low-overhead market index funds and hold for retirement. No, you won’t get super rich doing this, but anything other than this is irresponsible gambling.
It’s a hard lesson to learn, because everyone is telling you the opposite. The entire channel CNBC is devoted to day traders, who buy and sell stocks at a high rate based on the same principle as a Ponzi scheme, basing their judgment not on the fundamentals (like long-term dividends) but on the animal spirits of whatever stock is hot or cold at the moment. This is the same reason people buy or sell Bitcoin: not because they can describe its fundamental value, but because they believe a bigger fool down the road will buy it for even more.
For things like Bitcoin, the trick to making money was to have bought it over seven years ago, when it was essentially worthless except to nerds who were into that sort of thing. It’s the same trick for making a lot of money in Magic: The Gathering trading cards, which nerds bought decades ago and which are worth a ton of money now. Or to have bought Apple stock back in 2009, when the iPhone was new and nerds could understand the potential of real Internet access and apps in a way Wall Street could not.
That was my strategy: be a nerd who gets into things. I’ve made a good amount of money on all these things because, as a nerd, I was into Magic: The Gathering, Bitcoin, and the iPhone before anybody else was, and bought in at the point where these things were essentially valueless.
At this point, with the non-nerds now flooding the cryptocurrency market, there’s little chance of getting rich. The lottery is probably a better bet. Instead, if you want to make money, become a nerd: obsess about a thing, understand it while it’s new, and cash out once the rest of the market figures it out. That might be Brave, for example, but buy into it because you’ve spent the last year studying the browser advertisement ecosystem, the market’s willingness to pay for content, and how its Basic Attention Token delivers value to websites — not because you want in on the ICO craze.
Conclusion
John Oliver spends 25 minutes explaining Bitcoin, cryptocurrencies, and the blockchain to you. Sure, it’s funny, but it leaves you worse off than when you started. He admits they “simplify” the explanation, but they simplified it to the point where they removed all the useful information.
On January 3, the world learned about a series of major security vulnerabilities in modern microprocessors. Called Spectre and Meltdown, these vulnerabilities were discovered by several different researchers last summer, disclosed to the microprocessors’ manufacturers, and patched — at least to the extent possible.
This news isn’t really any different from the usual endless stream of security vulnerabilities and patches, but it’s also a harbinger of the sorts of security problems we’re going to be seeing in the coming years. These are vulnerabilities in computer hardware, not software. They affect virtually all high-end microprocessors produced in the last 20 years. Patching them requires large-scale coordination across the industry, and in some cases drastically affects the performance of the computers. And sometimes patching isn’t possible; the vulnerability will remain until the computer is discarded.
Spectre and Meltdown aren’t anomalies. They represent a new area to look for vulnerabilities and a new avenue of attack. They’re the future of security — and it doesn’t look good for the defenders.
Modern computers do lots of things at the same time. Your computer and your phone simultaneously run several applications — or apps. Your browser has several windows open. A cloud computer runs applications for many different computers. All of those applications need to be isolated from each other. For security, one application isn’t supposed to be able to peek at what another one is doing, except in very controlled circumstances. Otherwise, a malicious advertisement on a website you’re visiting could eavesdrop on your banking details, or the cloud service purchased by some foreign intelligence organization could eavesdrop on every other cloud customer, and so on. The companies that write browsers, operating systems, and cloud infrastructure spend a lot of time making sure this isolation works.
Both Spectre and Meltdown break that isolation, deep down at the microprocessor level, by exploiting performance optimizations that have been implemented for the past decade or so. Basically, microprocessors have become so fast that they spend a lot of time waiting for data to move in and out of memory. To increase performance, these processors guess what data they’re going to receive and execute instructions based on that. If the guess turns out to be correct, it’s a performance win. If it’s wrong, the microprocessors throw away what they’ve done without losing any time. This feature is called speculative execution.
Spectre and Meltdown attack speculative execution in different ways. Meltdown is more of a conventional vulnerability; the designers of the speculative-execution process made a mistake, so they just needed to fix it. Spectre is worse; it’s a flaw in the very concept of speculative execution. There’s no way to patch that vulnerability; the chips need to be redesigned in such a way as to eliminate it.
Since the announcement, manufacturers have been rolling out patches to these vulnerabilities to the extent possible. Operating systems have been patched so that attackers can’t make use of the vulnerabilities. Web browsers have been patched. Chips have been patched. From the user’s perspective, these are routine fixes. But several aspects of these vulnerabilities illustrate the sorts of security problems we’re only going to be seeing more of.
First, attacks against hardware, as opposed to software, will become more common. Last fall, vulnerabilities were discovered in Intel’s Management Engine, a remote-administration feature on its microprocessors. Like Spectre and Meltdown, they affected how the chips operate. Looking for vulnerabilities on computer chips is new. Now that researchers know this is a fruitful area to explore, security researchers, foreign intelligence agencies, and criminals will be on the hunt.
Second, because microprocessors are fundamental parts of computers, patching requires coordination between many companies. Even when manufacturers like Intel and AMD can write a patch for a vulnerability, computer makers and application vendors still have to customize and push the patch out to the users. This makes it much harder to keep vulnerabilities secret while patches are being written. Spectre and Meltdown were announced prematurely because details were leaking and rumors were swirling. Situations like this give malicious actors more opportunity to attack systems before they’re guarded.
Third, these vulnerabilities will affect computers’ functionality. In some cases, the patches for Spectre and Meltdown result in significant reductions in speed. The press initially reported 30%, but that only seems true for certain servers running in the cloud. For your personal computer or phone, the performance hit from the patch is minimal. But as more vulnerabilities are discovered in hardware, patches will affect performance in noticeable ways.
And then there are the unpatchable vulnerabilities. For decades, the computer industry has kept things secure by finding vulnerabilities in fielded products and quickly patching them. Now there are cases where that doesn’t work. Sometimes it’s because computers are in cheap products that don’t have a patch mechanism, like many of the DVRs and webcams that are vulnerable to the Mirai (and other) botnets — groups of Internet-connected devices sabotaged for coordinated digital attacks. Sometimes it’s because a computer chip’s functionality is so core to a computer’s design that patching it effectively means turning the computer off. This, too, is becoming more common.
Increasingly, everything is a computer: not just your laptop and phone, but your car, your appliances, your medical devices, and global infrastructure. These computers are and always will be vulnerable, but Spectre and Meltdown represent a new class of vulnerability. Unpatchable vulnerabilities in the deepest recesses of the world’s computer hardware is the new normal. It’s going to leave us all much more vulnerable in the future.
Jason Barnett used the pots feature of the Monzo banking API to create a simple e-paper display so that his kids can keep track of their pocket money.
Monzo
For those outside the UK: Monzo is a smartphone-based bank that allows customers to manage their money and payment cards via an app, removing the bank clerk middleman.
In the Monzo banking app, users can set up pots, which allow them to organise their money into various, you guessed it, pots. You want to put aside holiday funds, budget your food shopping, or, like Jason, manage your kids’ pocket money? Using pots is an easy way to do it.
Jason’s Monzo Pot ePaper tracker
After failed attempts at keeping track of his sons’ pocket money via a scrap of paper stuck to the fridge, Jason decided to try a new approach.
He started his build by installing Stretch Lite to the SD card of his Raspberry Pi Zero W. “The Pi will be running headless (without screen, mouse or keyboard)”, he explains on his blog, “so there is no need for a full-fat Raspbian image.” While Stretch Lite was downloading, he set up the Waveshare ePaper HAT on his Zero W. He notes that Pimoroni’s “Inky pHAT would be easiest,” but his tutorial is specific to the Waveshare device.
Before ejecting the SD card, Jason updated the boot partition to allow him to access the Pi via SSH. He talks makers through that process here.
Among the libraries he installed for the project is pyMonzo, a Python wrapper for the Monzo API created by Paweł Adamczak. Monzo is still in its infancy, and the API is partly under construction. Until it’s completed, Paweł’s wrapper offers a more stable way to use it.
After installing the software, it was time to set up the e-paper screen for the tracker. Jason adjusted the code for the API so that the screen reloads information every 15 minutes, displaying the up-to-date amount of pocket money in both kids’ pots.
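To give a sense of what the polling side of a build like this involves, here is a minimal sketch, assuming a Monzo personal access token and the documented /pots endpoint. It is not Jason’s code and it skips the pyMonzo wrapper in favour of plain requests; the pot names and token are placeholders, and field names such as balance and deleted follow the public Monzo API docs but should be treated as assumptions. In Jason’s build the output goes to the ePaper HAT rather than the console.

```python
# Minimal polling sketch (not Jason's code): fetch pot balances from the
# Monzo API every 15 minutes. Endpoint and field names follow the public
# Monzo API docs, but treat them as assumptions; newer API versions may
# require extra query parameters such as current_account_id.
import time
import requests

ACCESS_TOKEN = "YOUR_MONZO_ACCESS_TOKEN"                  # placeholder
POT_NAMES = {"Pocket money - Sam", "Pocket money - Ben"}  # hypothetical pot names

def fetch_pot_balances():
    resp = requests.get(
        "https://api.monzo.com/pots",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    pots = resp.json()["pots"]
    # Monzo reports balances in minor units (pence), so divide by 100.
    return {
        pot["name"]: pot["balance"] / 100
        for pot in pots
        if not pot["deleted"] and pot["name"] in POT_NAMES
    }

if __name__ == "__main__":
    while True:
        for name, balance in fetch_pot_balances().items():
            print(f"{name}: £{balance:.2f}")  # Jason's version draws this on the ePaper display
        time.sleep(15 * 60)                   # refresh every 15 minutes
```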
Here is how Jason describes going to the supermarket with his sons, now that he has completed the tracker:
“Daddy, I want (insert first thing picked up here), I’ve always wanted one of these my whole life!” […] Even though you have never seen that (insert thing here) before, I can quickly open my Monzo app, flick to Account, and say “You have £3.50 in your money box”. If my boy wants it, a 2-second withdrawal is made whilst queueing, and done — he walks away with a new (again, insert whatever he wanted his whole life here) and is happy!
Jason’s blog offers a full breakdown of his project, including all necessary code and the specs for the physical build. Be sure to head over and check it out.
Have you used an API in your projects? What would you build with one?
I often encounter people frustrated by the cost and complexity of scaling their e-commerce or WordPress site. When I talk to customers about their scaling plans, they often mention phrases such as horizontal scaling and microservices, but most aren’t sure how to dive in and effectively scale their sites.
Now let’s talk about different scaling options. For instance, if your current workload is in a traditional data center, you can leverage the cloud alongside your on-premises solution and scale more efficiently at lower cost; it’s not necessary to set up a whole powerhouse to light a few bulbs. If your workload is already in the cloud, you can use one of the available out-of-the-box options.
Designing your application as microservices and adding horizontal scaling might seem like the best choice, but it isn’t always practical, for example when your web application is already running on premises and you need to scale it quickly because of an unexpected spike in web traffic.
So how do you handle this situation? Take scaling one step at a time, and you may find that horizontal scaling isn’t the right choice after all.
For example, assume you have a tech news website where you published an early-look review of an upcoming, highly anticipated smartphone, and the review went viral. The review, a blog post on your website, includes both video and pictures. Comments are enabled for the post, and readers can also rate it. If your website is hosted on a traditional Linux server with a LAMP stack, you may find yourself with immediate scaling problems.
Let’s dig into the details of the current scenario:
Where are images and videos stored?
How many read/write requests are received per second? Per minute?
What is the level of security required?
Are these synchronous or asynchronous requests?
We’ll also want to consider the following if your website has a transactional load like e-commerce or banking:
How is the website handling sessions?
Do you have any compliance requirements, such as the Payment Card Industry Data Security Standard (PCI DSS), if your website uses its own payment gateway?
How are you recording customer behavior data and fulfilling your analytics needs?
What are your load balancing considerations (scaling, caching, session maintenance, etc.)?
So, if we take this one step at a time:
Step 1: Ease server load. We need to quickly handle spikes in traffic generated by activity on the blog post, so let’s reduce server load by moving images and video to a third-party content delivery network (CDN). AWS provides Amazon CloudFront as a CDN solution, which is highly scalable, with built-in security to verify origin access identity and handle any DDoS attacks. CloudFront can direct traffic to your on-premises or cloud-hosted server with its 113 Points of Presence (102 Edge Locations and 11 Regional Edge Caches) in 56 cities across 24 countries, which provides efficient caching.
Step 2: Reduce read load by adding more read replicas. MySQL provides mirror replication for databases, Oracle has its own replication technology, and Amazon RDS provides up to five read replicas, which can be placed across Availability Zones or even in another Region; Amazon Aurora can have 15 read replicas, with Amazon Aurora Auto Scaling support. If a workload is highly variable, consider Amazon Aurora Serverless to achieve high efficiency and reduced cost. While most mirror technologies do asynchronous replication, Amazon RDS can provide synchronous multi-AZ replication, which is good for disaster recovery but not for scalability. Asynchronous replication to a mirror instance means replicated data can sometimes be stale if network bandwidth is low, so you need to plan and design your application accordingly.
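As a concrete illustration of Step 2, here is a minimal boto3 sketch, assuming hypothetical instance identifiers and an assumed region, that creates an RDS read replica and then looks up its endpoint so reporting and comment queries can be pointed at it. It is a sketch of the idea, not a production migration plan.

```python
# Minimal sketch of Step 2 using boto3 (identifiers are hypothetical):
# create a read replica of an existing RDS MySQL instance, then route
# non-critical GET/reporting queries at the replica endpoint.
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # assumed region

# Kick off replica creation from the master instance.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="blog-db-replica-1",     # hypothetical replica name
    SourceDBInstanceIdentifier="blog-db-master",  # hypothetical master name
    DBInstanceClass="db.r5.large",
)

# Wait until the replica is available, then fetch its endpoint and use it
# for read traffic (comments, reporting), keeping writes on the master.
waiter = rds.get_waiter("db_instance_available")
waiter.wait(DBInstanceIdentifier="blog-db-replica-1")
desc = rds.describe_db_instances(DBInstanceIdentifier="blog-db-replica-1")
read_endpoint = desc["DBInstances"][0]["Endpoint"]["Address"]
print("Route read-only queries to:", read_endpoint)
```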
I recommend that you always use a read replica for any reporting needs, and try to move non-critical GET services to the read replica to reduce the load on the master database. In this case, comments associated with a blog post can be fetched from a read replica, since they can tolerate some delay if asynchronous replication lags.
Step 3: Reduce write requests. This can be achieved by introducing a queue to process messages asynchronously. Amazon Simple Queue Service (Amazon SQS) is a highly scalable queue that can handle any kind of work-message load. You can process data like ratings and reviews, or calculate a Deal Quality Score (DQS), using batch processing via an SQS queue. If your workload is in AWS, I recommend using a job-observer pattern: set up Auto Scaling to automatically increase or decrease the number of batch servers, using the number of SQS messages (monitored with Amazon CloudWatch) as the trigger. For on-premises workloads, you can use the SQS SDK to create an Amazon SQS queue that holds messages until they’re processed by your stack. Or you can use Amazon SNS to fan out message processing in parallel for different purposes, like adding a watermark to an image, generating a thumbnail, and so on.
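Here is a minimal sketch of that pattern, assuming a hypothetical queue name and region: the web tier enqueues ratings and reviews to SQS instead of writing to the database synchronously, and a batch worker drains the queue and does the writes.

```python
# Minimal sketch of Step 3: enqueue reviews to SQS and process them later
# in a batch worker (queue name, region, and message shape are hypothetical).
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")  # assumed region
queue_url = sqs.create_queue(QueueName="blog-reviews-queue")["QueueUrl"]

# Web tier: accept the request quickly and defer the database write.
def submit_review(post_id, user_id, rating, comment):
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=json.dumps(
            {"post_id": post_id, "user_id": user_id, "rating": rating, "comment": comment}
        ),
    )

# Batch worker: could run on Auto Scaling instances triggered by queue depth.
def drain_queue(handle_review):
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
        )
        for msg in resp.get("Messages", []):
            handle_review(json.loads(msg["Body"]))  # e.g. insert into the database
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```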
Step 4: Introduce a more robust caching engine. You can use Amazon ElastiCache for Memcached or Redis to reduce the number of requests hitting your database. Memcached and Redis have different use cases: if you can afford to lose the cache and rebuild it from your database, use Memcached; if you need more robust data persistence and complex data structures, use Redis. In AWS, these are managed services, which means AWS operates them for you; you can also run Memcached or Redis on your on-premises instances or use a hybrid approach.
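A minimal cache-aside sketch of Step 4, assuming a hypothetical ElastiCache for Redis endpoint and a load_post_from_db() helper that you would supply: hot blog posts are served from the cache, and a miss falls back to the database and repopulates the cache.

```python
# Minimal cache-aside sketch for Step 4 (endpoint and helper are hypothetical):
# serve hot blog posts from Redis and fall back to the database on a miss.
import json
import redis

cache = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379)  # hypothetical endpoint
CACHE_TTL_SECONDS = 300  # tolerate slightly stale content for 5 minutes

def get_post(post_id, load_post_from_db):
    key = f"post:{post_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)           # cache hit: no database work
    post = load_post_from_db(post_id)       # cache miss: read from the database
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(post))
    return post
```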
Step 5: Scale your server. If there are still issues, it’s time to scale your server. For the greatest cost-effectiveness and nearly unlimited scalability, I suggest horizontal scaling wherever possible. However, for some use cases, such as databases, vertical scaling may be a better choice until you are ready to shard, or you can use Amazon Aurora Serverless for variable workloads. For horizontal scaling, it is wise to use Auto Scaling to manage your workload effectively, and to achieve that you need to persist sessions outside the web servers; Amazon DynamoDB can handle session persistence across instances.
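For the session-persistence piece of Step 5, here is a minimal sketch, assuming a hypothetical DynamoDB table named web-sessions with a session_id partition key, so that any instance behind the load balancer can read a user’s session and Auto Scaling can add or remove servers freely.

```python
# Minimal sketch of session persistence in DynamoDB for Step 5 (table and
# attribute names are hypothetical).
import time
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")  # assumed region
sessions = dynamodb.Table("web-sessions")  # hypothetical table, partition key "session_id"

def save_session(session_id, data, ttl_seconds=3600):
    # "data" should be a dict of DynamoDB-serializable values (strings, ints, etc.).
    sessions.put_item(
        Item={
            "session_id": session_id,
            "data": data,
            "expires_at": int(time.time()) + ttl_seconds,  # pair with a DynamoDB TTL attribute
        }
    )

def load_session(session_id):
    item = sessions.get_item(Key={"session_id": session_id}).get("Item")
    if item and item["expires_at"] > time.time():
        return item["data"]
    return None  # expired or missing: treat as a new session
```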
If your server is on premises, consider creating a multisite architecture, which will help you achieve quick scalability as required and provide a good disaster recovery solution. You can pick and choose individual services like Amazon Route 53, AWS CloudFormation, Amazon SQS, Amazon SNS, Amazon RDS, etc. depending on your needs.
Your multisite architecture will look like the following diagram:
In this architecture, you can run your regular workload on premises, and use your AWS workload as required for scalability and disaster recovery. Using Route 53, you can direct a precise percentage of users to an AWS workload.
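For the traffic-splitting piece, here is a minimal boto3 sketch of Route 53 weighted records, with a hypothetical hosted zone ID, record name, and targets: it sends a chosen share of users to the AWS environment and the rest to the on-premises origin, and you can shift the weights during a spike.

```python
# Minimal sketch of weighted routing with Route 53 (zone ID, record names,
# and targets are hypothetical): split traffic between on-premises and AWS.
import boto3

route53 = boto3.client("route53")

def set_traffic_split(zone_id, record_name, onprem_target, aws_target, aws_share):
    changes = []
    for set_id, target, weight in [
        ("on-prem", onprem_target, 100 - aws_share),
        ("aws", aws_target, aws_share),
    ]:
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,
                "Weight": weight,
                "TTL": 60,
                "ResourceRecords": [{"Value": target}],
            },
        })
    route53.change_resource_record_sets(
        HostedZoneId=zone_id, ChangeBatch={"Changes": changes}
    )

# Example: shift 20% of traffic to AWS during a spike.
set_traffic_split("Z123EXAMPLE", "www.example.com",
                  "origin.onprem.example.com",
                  "lb-1234.us-east-1.elb.amazonaws.com", aws_share=20)
```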
If you decide to move all of your workloads to AWS, the recommended multi-AZ architecture would look like the following:
In this architecture, you are using a multi-AZ distributed workload for high availability. You can also have a multi-region setup and use Route 53 to distribute your workload between AWS Regions. CloudFront helps you scale and distribute static content via an S3 bucket, while DynamoDB maintains your application state so that Auto Scaling can add and remove instances without loss of session data. At the database layer, RDS with a multi-AZ standby provides high availability, and read replicas help achieve scalability.
This is a high-level strategy to help you think through the scalability of your workload by using AWS, even if your workload is on premises and not in the cloud…yet.
I highly recommend creating a hybrid, multisite model by placing a replica of your on-premises environment in a public cloud like AWS, and using Amazon Route 53 and Elastic Load Balancing to route traffic between the on-premises and cloud environments. AWS now supports load balancing between AWS and on-premises environments, so you can scale your cloud environment quickly whenever required, and scale it back down by applying Auto Scaling and capping on-premises traffic with Route 53.
The AWS User Guide to Banking Regulations and Guidelines in India was published in December 2017 and includes information that can help banks regulated by the Reserve Bank of India (RBI) assess how to implement an appropriate information security, risk management, and governance program in the AWS Cloud.
The guide focuses on the following key considerations:
Outsourcing guidelines – Guidance for banks entering an outsourcing arrangement, including risk-management practices such as conducting due diligence and maintaining effective oversight. Learn how to conduct an assessment of AWS services and align your governance requirements with the AWS Shared Responsibility Model.
Information security – Detailed requirements to help banks identify and manage information security in the cloud.
This guide joins the existing Financial Services guides for other jurisdictions, such as Singapore, Australia, and Hong Kong. AWS will publish additional guides in 2018 to help you understand regulatory requirements in other markets around the world.
♪ Used to have a little now I have a lot I’m still, I’m still Jenny from the block chain ♪
For all that has been written about Bitcoin and its ilk, it is curious that the focus is almost solely on what the cryptocurrencies are supposed to be. Technologists wax lyrical about the potential for blockchains to change almost every aspect of our lives. Libertarians and paleoconservatives ache for the return to “sound money” that can’t be conjured up at the whim of a bureaucrat. Mainstream economists wag their fingers, proclaiming that a proper currency can’t be deflationary, that it must maintain a particular velocity, or that the government must be able to nip crises of confidence in the bud. And so on.
Much of this may be true, but the proponents of cryptocurrencies should recognize that an appeal to consequences is not a guarantee of good results. The critics, on the other hand, would be best served to remember that they are drawing far-reaching conclusions about the effects of modern monetary policies based on a very short and tumultuous period in history.
In this post, my goal is to ditch most of the dogma, talk a bit about the origins of money – and then see how “crypto” fits the bill.
1. The prehistory of currencies
The emergence of money is usually explained in a very straightforward way. You know the story: a farmer raised a pig, a cobbler made a shoe. The cobbler needed to feed his family while the farmer wanted to keep his feet warm – and so they met to exchange the goods on mutually beneficial terms. But as the tale goes, the barter system had a fatal flaw: sometimes, a farmer wanted a cooking pot, a potter wanted a knife, and a blacksmith wanted a pair of pants. To facilitate increasingly complex, multi-step exchanges without requiring dozens of people to meet face to face, we came up with an abstract way to represent value – a shiny coin guaranteed to be accepted by every tradesman.
It is a nice parable, but it probably isn’t very true. It seems far more plausible that early societies relied on the concept of debt long before the advent of currencies: an informal tally or a formal ledger would be used to keep track of who owes what to whom. The concept of debt, closely associated with one’s trustworthiness and standing in the community, would have enabled a wide range of economic activities: debts could be paid back over time, transferred, renegotiated, or forgotten – all without having to engage in spot barter or to mint a single coin. In fact, such non-monetary, trust-based, reciprocal economies are still common in closely-knit communities: among families, neighbors, coworkers, or friends.
In such a setting, primitive currencies probably emerged simply as a consequence of having a system of prices: a cow being worth a particular number of chickens, a chicken being worth a particular number of beaver pelts, and so forth. Formalizing such relationships by settling on a single, widely-known unit of account – say, one chicken – would make it more convenient to transfer, combine, or split debts; or to settle them in alternative goods.
Contrary to popular belief, for communal ledgers, the unit of account probably did not have to be particularly desirable, durable, or easy to carry; it was simply an accounting tool. And indeed, we sometimes run into fairly unusual units of account even in modern times: for example, cigarettes can be the basis of a bustling prison economy even when most inmates don’t smoke and there are not that many packs to go around.
2. The age of commodity money
In the end, the development of coinage might have had relatively little to do with communal trade – and far more with the desire to exchange goods with strangers. When dealing with an unfamiliar or hostile tribe, the concept of a chicken-denominated ledger does not hold up: the other side might be disinclined to honor its obligations – and get away with it, too. To settle such problematic trades, we needed a “spot” medium of exchange that would be easy to carry and authenticate, had a well-defined value, and a near-universal appeal. Throughout much of the recorded history, precious metals – predominantly gold and silver – proved to fit the bill.
In the most basic sense, such commodities could be seen as a tool to reconcile debts across societal boundaries, without necessarily replacing any local units of account. An obligation, denominated in some local currency, would be created on the buyer’s side in order to procure the metal for the trade. The proceeds of the completed transaction would in turn allow the seller to settle their own local obligations that arose from having to source the traded goods. In other words, our wondrous chicken-denominated ledgers could coexist peacefully with gold – and when commodity coinage finally took hold, it’s likely that in everyday trade, precious metals served more as a useful abstraction than a precise store of value. A “silver chicken” of sorts.
Still, the emergence of commodity money had one interesting side effect: it decoupled the unit of debt – a “claim on the society”, in a sense – from any moral judgment about its origin. A piece of silver would buy the same amount of food, whether earned through hard labor or won in a drunken bet. This disconnect remains a central theme in many of the debates about social justice and unfairly earned wealth.
3. The State enters the game
If there is one advantage of chicken ledgers over precious metals, it’s that all chickens look and cluck roughly the same – something that can’t be said of every nugget of silver or gold. To cope with this problem, we needed to shape raw commodities into pieces of a more predictable shape and weight; a trusted party could then stamp them with a mark to indicate the value and the quality of the coin.
At first, the task of standardizing coinage rested with private parties – but the responsibility was soon assumed by the State. The advantages of this transition seemed clear: a single, widely-accepted and easily-recognizable currency could now be used to settle virtually all private and official debts.
Alas, in what deserves the dubious distinction of being one of the earliest examples of monetary tomfoolery, some States succumbed to the temptation of fiddling with the coinage to accomplish anything from feeding the poor to waging wars. In particular, it would be common to stamp coins with the same face value but a progressively lower content of silver and gold. Perhaps surprisingly, the strategy worked remarkably well; at least in the times of peace, most people cared about the value stamped on the coin, not its precise composition or weight.
And so, over time, representative money was born: sooner or later, most States opted to mint coins from nearly-worthless metals, or print banknotes on paper and cloth. This radically new currency was accompanied by a simple pledge: the State offered to redeem it at any time for its nominal value in gold.
Of course, the promise was largely illusory: the State did not have enough gold to honor all the promises it had made. Still, as long as people had faith in their rulers and the redemption requests stayed low, the fundamental mechanics of this new representative currency remained roughly the same as before – and in some ways, were an improvement in that they lessened the insatiable demand for a rare commodity. Just as importantly, the new money still enabled international trade – using the underlying gold exchange rate as a reference point.
4. Fractional reserve banking and fiat money
For much of the recorded history, banking was an exceptionally dull affair, not much different from running a communal chicken ledger of old. But then, something truly marvelous happened in the 17th century: around that time, many European countries witnessed the emergence of fractional-reserve banks.
These private ventures operated according to a simple scheme: they accepted people’s coin for safekeeping, promising to pay a premium on every deposit made. To meet these obligations and to make a profit, the banks then used the pooled deposits to make high-interest loans to other folks. The financiers figured out that under normal circumstances and when operating at a sufficient scale, they needed only a very modest reserve – well under 10% of all deposited money – to be able to service the usual volume and size of withdrawals requested by their customers. The rest could be loaned out.
The very curious consequence of fractional-reserve banking was that it pulled new money out of thin air. The funds were simultaneously accounted for in the statements shown to the depositor, evidently available for withdrawal or transfer at any time; and given to third-party borrowers, who could spend them on just about anything. Heck, the borrowers could deposit the proceeds in another bank, creating even more money along the way! Whatever they did, the sum of all funds in the monetary system now appeared much higher than the value of all coins and banknotes issued by the government – let alone the amount of gold sitting in any vault.
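As a toy illustration of this multiplication, assuming a 10% reserve requirement and an initial deposit of 1,000 coins that keeps being lent out and re-deposited, the sum of everyone’s bank balances converges on ten times the original coinage:

```python
# Toy illustration of the paragraph above: with a 10% reserve requirement,
# an initial deposit of 1,000 coins is lent and re-deposited repeatedly, and
# the total of all bank balances approaches 1,000 / 0.10 = 10,000 coins,
# even though only 1,000 physical coins exist.
def broad_money(initial_deposit, reserve_ratio, rounds=50):
    total_deposits, next_deposit = 0.0, float(initial_deposit)
    for _ in range(rounds):
        total_deposits += next_deposit        # shows up in someone's statement
        next_deposit *= (1 - reserve_ratio)   # the loaned-out part gets re-deposited
    return total_deposits

print(broad_money(1000, 0.10))  # ~9948 after 50 rounds, converging on 10,000
```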
Of course, no new money was being created in any physical sense: all that banks were doing was engaging in a bit of creative accounting – the sort of which would probably land you in jail if you attempted it today in any other comparably vital field of enterprise. If too many depositors were to ask for their money back, or if too many loans were to go bad, the banking system would fold. Fortunes would evaporate in a puff of accounting smoke, and with the disappearance of vast quantities of quasi-fictitious (“broad”) money, the wealth of the entire nation would shrink.
In the early 20th century, the world kept witnessing just that; a series of bank runs and economic contractions forced the governments around the globe to act. At that stage, outlawing fractional-reserve banking was no longer politically or economically tenable; a simpler alternative was to let go of gold and move to fiat money – a currency implemented as an abstract social construct, with no predefined connection to the physical realm. A new breed of economists saw the role of the government not in trying to peg the value of money to an inflexible commodity, but in manipulating its supply to smooth out economic hiccups or to stimulate growth.
(Contrary to popular belief, such manipulation is usually not done by printing new banknotes; more sophisticated methods, such as lowering reserve requirements for bank deposits or enticing banks to invest their deposits into government-issued securities, are the preferred route.)
The obvious peril of fiat money is that in the long haul, its value is determined strictly by people’s willingness to accept a piece of paper in exchange for their trouble; that willingness, in turn, is conditioned solely on their belief that the same piece of paper would buy them something nice a week, a month, or a year from now. It follows that a simple crisis of confidence could make a currency nearly worthless overnight. A prolonged period of hyperinflation and subsequent austerity in Germany and Austria was one of the precipitating factors that led to World War II. In more recent times, dramatic episodes of hyperinflation plagued the fiat currencies of Israel (1984), Mexico (1988), Poland (1990), Yugoslavia (1994), Bulgaria (1996), Turkey (2002), Zimbabwe (2009), Venezuela (2016), and several other nations around the globe.
For the United States, the switch to fiat money came relatively late, in 1971. To stop the dollar from plunging like a rock, the Nixon administration employed a clever trick: they ordered the freeze of wages and prices for the 90 days that immediately followed the move. People went on about their lives and paid the usual for eggs or milk – and by the time the freeze ended, they were accustomed to the idea that the “new”, free-floating dollar is worth about the same as the old, gold-backed one. A robust economy and favorable geopolitics did the rest, and so far, the American adventure with fiat currency has been rather uneventful – perhaps except for the fact that the price of gold itself skyrocketed from $35 per troy ounce in 1971 to $850 in 1980 (or, from $210 to $2,500 in today’s dollars).
Well, one thing did change: now better positioned to freely tamper with the supply of money, the regulators, in accord with the bankers, adopted a policy of creating it at a rate that slightly outstripped the organic growth in economic activity. They did this to induce a small, steady degree of inflation, believing that doing so would discourage people from hoarding cash and force them to reinvest it for the betterment of society. Some critics like to point out that such a policy functions as a “backdoor” tax on savings that happens to align with the regulators’ less noble interests. Either way, in the US and most other developed nations, the purchasing power of any money kept under a mattress will drop at a rate of somewhere between 2 and 10% a year.
5. So what’s up with Bitcoin?
Well… countless tomes have been written about the nature and the optimal characteristics of government-issued fiat currencies. Some heterodox economists, notably including Murray Rothbard, have also explored the topic of privately-issued, decentralized, commodity-backed currencies. But Bitcoin is a wholly different animal.
In essence, BTC is a global, decentralized fiat currency: it has no (recoverable) intrinsic value, no central authority to issue it or define its exchange rate, and it has no anchoring to any historical reference point – a combination that until recently seemed nonsensical and escaped any serious scrutiny. It does the unthinkable by employing three clever tricks:
It allows anyone to create new coins, but only by solving brute-force computational challenges that get more difficult as time goes by (a toy proof-of-work sketch follows this list),
It prevents unauthorized transfer of coins by employing public key cryptography to sign off transactions, with only the authorized holder of a coin knowing the correct key,
It prevents double-spending by using a distributed public ledger (“blockchain”), recording the chain of custody for coins in a tamper-proof way.
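Here is the toy proof-of-work sketch promised above. It illustrates only the “costly to produce, cheap to verify” idea from the first point, not Bitcoin’s actual mining or consensus code: find a nonce whose hash falls below a target, where raising the difficulty makes minting exponentially harder.

```python
# Toy proof-of-work sketch (not Bitcoin's actual mining code): find a nonce
# such that SHA-256(block_data + nonce) has at least `difficulty_bits`
# leading zero bits. Raising difficulty_bits makes coin creation
# exponentially harder, which is the property described above.
import hashlib

def mine(block_data: bytes, difficulty_bits: int) -> int:
    target = 1 << (256 - difficulty_bits)   # hash must be below this value
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce                    # proof found: costly to produce, cheap to verify
        nonce += 1

# ~2^20 hash attempts on average; takes a few seconds in CPython.
nonce = mine(b"previous-hash|transactions", difficulty_bits=20)
print("found nonce:", nonce)
```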
The blockchain is often described as the most important feature of Bitcoin, but in some ways, its importance is overstated. The idea of a currency that does not rely on a centralized transaction clearinghouse is what helped propel the platform into the limelight – mostly because of its novelty and the perception that it is less vulnerable to government meddling (although the government is still free to track down, tax, fine, or arrest any participants). On the flip side, the everyday mechanics of BTC would not be fundamentally different if all the transactions had to go through Bitcoin Bank, LLC.
A more striking feature of the new currency is the incentive structure surrounding the creation of new coins. The underlying design democratized the creation of new coins early on: all you had to do was leave your computer running for a while to acquire a number of tokens. The tokens had no practical value, but obtaining them involved no substantial expense or risk. Just as importantly, because the difficulty of the puzzles would only increase over time, the hope was that if Bitcoin caught on, latecomers would find it easier to purchase BTC on a secondary market than mine their own – paying with a more established currency at a mutually beneficial exchange rate.
The persistent publicity surrounding Bitcoin and other cryptocurrencies did the rest – and today, with the growing scarcity of coins and the rapidly increasing demand, the price of a single token hovers somewhere south of $15,000.
6. So… is it bad money?
Predicting is hard – especially the future. In some sense, a coin that represents a cryptographic proof of wasted CPU cycles is no better or worse than a currency that relies on cotton decorated with pictures of dead presidents. It is true that Bitcoin suffers from many implementation problems – long transaction processing times, high fees, frequent security breaches of major exchanges – but in principle, such problems can be overcome.
That said, currencies live and die by the lasting willingness of others to accept them in exchange for services or goods – and in that sense, the jury is still out. The use of Bitcoin to settle bona fide purchases is negligible, both in absolute terms and as a fraction of the overall volume of transactions. In fact, because of the technical challenges and limited practical utility, some companies that embraced the currency early on are now backing out.
When the value of an asset is derived almost entirely from its appeal as an ever-appreciating investment vehicle, the situation has all the telltale signs of a speculative bubble. But that does not prove that the asset is destined to collapse, or that a collapse would be its end. Still, the built-in deflationary mechanism of Bitcoin – the increasing difficulty of producing new coins – is probably both a blessing and a curse.
It’s going to go one way or the other; and when it’s all said and done, we’re going to celebrate the people who made the right guess. Because the future is actually pretty darn easy to predict — in retrospect.
(Note: this is my personal opinion based on public knowledge around this issue. I have no knowledge of any non-public details of these vulnerabilities, and this should not be interpreted as the position or opinion of my employer)
Intel’s Management Engine (ME) is a small coprocessor built into the majority of Intel CPUs[0]. Older versions were based on the ARC architecture[1] running an embedded realtime operating system, but from version 11 onwards they’ve been small x86 cores running Minix. The precise capabilities of the ME have not been publicly disclosed, but it is at minimum capable of interacting with the network[2], display[3], USB, input devices and system flash. In other words, software running on the ME is capable of doing a lot, without requiring any OS permission in the process.
Back in May, Intel announced a vulnerability in the Active Management Technology (AMT) that runs on the ME. AMT offers functionality like providing a remote console to the system (so IT support can connect to your system and interact with it as if they were physically present), remote disk support (so IT support can reinstall your machine over the network) and various other bits of system management. The vulnerability meant that it was possible to log into systems with AMT enabled using an empty authentication token, making it possible to log in without knowing the configured password.
This vulnerability was less serious than it could have been for a couple of reasons – the first is that “consumer”[4] systems don’t ship with AMT, and the second is that AMT is almost always disabled (Shodan found only a few thousand systems on the public internet with AMT enabled, out of many millions of laptops). I wrote more about it here at the time.
How does this compare to the newly announced vulnerabilities? Good question. Two of the announced vulnerabilities are in AMT. The previous AMT vulnerability allowed you to bypass authentication, but restricted you to doing what AMT was designed to let you do. While AMT gives an authenticated user a great deal of power, it’s also designed with some degree of privacy protection in mind – for instance, when the remote console is enabled, an animated warning border is drawn on the user’s screen to alert them.
This vulnerability is different in that it allows an authenticated attacker to execute arbitrary code within the AMT process. This means that the attacker shouldn’t have any capabilities that AMT doesn’t, but it’s unclear where various aspects of the privacy protection are implemented – for instance, if the warning border is implemented in AMT rather than in hardware, an attacker could duplicate that functionality without drawing the warning. If the USB storage emulation for remote booting is implemented as a generic USB passthrough, the attacker could pretend to be an arbitrary USB device and potentially exploit the operating system through bugs in USB device drivers. Unfortunately we don’t currently know.
Note that this exploit still requires two things – first, AMT has to be enabled, and second, the attacker has to be able to log into AMT. If the attacker has physical access to your system and you don’t have a BIOS password set, they will be able to enable it – however, if AMT isn’t enabled and the attacker isn’t physically present, you’re probably safe. But if AMT is enabled and you haven’t patched the previous vulnerability, the attacker will be able to access AMT over the network without a password and then proceed with the exploit. This is bad, so you should probably (1) ensure that you’ve updated your BIOS and (2) ensure that AMT is disabled unless you have a really good reason to use it.
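As a rough, unofficial aid (my own sketch, not an Intel tool): AMT’s web interface conventionally listens on TCP ports 16992 and 16993, so a quick reachability check can flag hosts where AMT appears to be provisioned. A closed port is not proof that AMT is off, and an open one simply means that host deserves a closer look.

```python
# Unofficial reachability check: AMT's web interface conventionally listens
# on TCP 16992 (HTTP) and 16993 (HTTPS). An open port suggests AMT is
# provisioned on that host; a closed port is not proof that it is disabled.
import socket

AMT_PORTS = (16992, 16993)

def amt_ports_open(host, timeout=2.0):
    open_ports = []
    for port in AMT_PORTS:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass  # connection refused or timed out: port not reachable
    return open_ports

print(amt_ports_open("192.0.2.10"))  # example/placeholder address
```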
The AMT vulnerability applies to a wide range of versions: everything from version 6 (which shipped around 2008) onwards. The other vulnerability that Intel describe is restricted to version 11 of the ME, which only applies to much more recent systems. This vulnerability allows an attacker to execute arbitrary code on the ME, which means they can do literally anything the ME is able to do. This probably also means that they are able to interfere with any other code running on the ME. While AMT has been the most frequently discussed part of this, various other Intel technologies are tied to ME functionality.
Intel’s Platform Trust Technology (PTT) is a software implementation of a Trusted Platform Module (TPM) that runs on the ME. TPMs are intended to protect access to secrets and encryption keys and record the state of the system as it boots, making it possible to determine whether a system has had part of its boot process modified and denying access to the secrets as a result. The most common usage of TPMs is to protect disk encryption keys – Microsoft Bitlocker defaults to storing its encryption key in the TPM, automatically unlocking the drive if the boot process is unmodified. In addition, TPMs support something called Remote Attestation (I wrote about that here), which allows the TPM to provide a signed copy of information about what the system booted to a remote site. This can be used for various purposes, such as not allowing a compute node to join a cloud unless it’s booted the correct version of the OS and is running the latest firmware version. Remote Attestation depends on the TPM having a unique cryptographic identity that is tied to the TPM and inaccessible to the OS.
PTT allows manufacturers to simply license some additional code from Intel and run it on the ME rather than having to pay for an additional chip on the system motherboard. This seems great, but if an attacker is able to run code on the ME then they potentially have the ability to tamper with PTT, which means they can obtain access to disk encryption secrets and circumvent Bitlocker. It also means that they can tamper with Remote Attestation, “attesting” that the system booted a set of software that it didn’t or copying the keys to another system and allowing that to impersonate the first. This is, uh, bad.
Intel also recently announced Intel Online Connect, a mechanism for providing the functionality of security keys directly in the operating system. Components of this are run on the ME in order to avoid scenarios where a compromised OS could be used to steal the identity secrets – if the ME is compromised, this may make it possible for an attacker to obtain those secrets and duplicate the keys.
It’s also not entirely clear how much of Intel’s Software Guard Extensions (SGX) functionality depends on the ME. The ME does appear to be required for SGX Remote Attestation (which allows an application using SGX to prove to a remote site that it’s the SGX app rather than something pretending to be it), and again if those secrets can be extracted from a compromised ME it may be possible to compromise some of the security assumptions around SGX. Again, it’s not clear how serious this is because it’s not publicly documented.
Various other things also run on the ME, including stuff like video DRM (ensuring that high resolution video streams can’t be intercepted by the OS). It may be possible to obtain encryption keys from a compromised ME that allow things like Netflix streams to be decoded and dumped. From a user privacy or security perspective, these things seem less serious.
The big problem at the moment is that we have no idea what the actual process of compromise is. Intel state that it requires local access, but don’t describe what kind. Local access in this case could simply require the ability to send commands to the ME (possible on any system that has the ME drivers installed), could require direct hardware access to the exposed ME (which would require either kernel access or the ability to install a custom driver) or even the ability to modify system flash (possible only if the attacker has physical access and enough time and skill to take the system apart and modify the flash contents with an SPI programmer). The other thing we don’t know is whether it’s possible for an attacker to modify the system such that the ME is persistently compromised or whether it needs to be re-compromised every time the ME reboots. Note that even the latter is more serious than you might think – the ME may only be rebooted if the system loses power completely, so even a “temporary” compromise could affect a system for a long period of time.
It’s also almost impossible to determine if a system is compromised. If the ME is compromised then it’s probably possible for it to roll back any firmware updates but still report that it’s been updated, giving admins a false sense of security. The only way to determine for sure would be to dump the system flash and compare it to a known good image. This is impractical to do at scale.
So, overall, given what we know right now it’s hard to say how serious this is in terms of real world impact. It’s unlikely that this is the kind of vulnerability that would be used to attack individual end users – anyone able to compromise a system like this could just backdoor your browser instead with much less effort, and that already gives them your banking details. The people who have the most to worry about here are potential targets of skilled attackers, which means activists, dissidents and companies with interesting personal or business data. It’s hard to make strong recommendations about what to do here without more insight into what the vulnerability actually is, and we may not know that until this presentation next month.
Summary: Worst case here is terrible, but unlikely to be relevant to the vast majority of users.
[0] Earlier versions of the ME were built into the motherboard chipset, but as portions of that were incorporated onto the CPU package the ME followed
[1] A descendent of the SuperFX chip used in Super Nintendo cartridges such as Starfox, because why not
[2] Without any OS involvement for wired ethernet and for wireless networks in the system firmware, but requires OS support for wireless access once the OS drivers have loaded
[3] Assuming you’re using integrated Intel graphics
[4] “Consumer” is a bit of a misnomer here – “enterprise” laptops like Thinkpads ship with AMT, but are often bought by consumers.
We can’t believe that there are just a few days left before re:Invent 2017. If you are attending this year, you’ll want to check out our Big Data sessions! The Big Data and Machine Learning categories are bigger than ever. As in previous years, you can find these sessions in various tracks, including Analytics & Big Data, Deep Learning Summit, Artificial Intelligence & Machine Learning, Architecture, and Databases.
We have great sessions from organizations and companies like Vanguard, Cox Automotive, Pinterest, Netflix, FINRA, Amtrak, AmazonFresh, Sysco Foods, Twilio, American Heart Association, Expedia, Esri, Nextdoor, and many more. All sessions are recorded and made available on YouTube. In addition, all slide decks from the sessions will be available on SlideShare.net after the conference.
This post highlights the sessions that will be presented as part of the Analytics & Big Data track, as well as relevant sessions from other tracks like Architecture, Artificial Intelligence & Machine Learning, and IoT. If you’re interested in Machine Learning sessions, don’t forget to check out our Guide to Machine Learning at re:Invent 2017.
This year’s session catalog contains the following breakout sessions.
Raju Gulabani, VP of Database, Analytics, and AI at AWS, will discuss the evolution of database and analytics services in AWS, the new database and analytics services and features we launched this year, and our vision for continued innovation in this space. We are witnessing an unprecedented growth in the amount of data collected, in many different forms. Storage, management, and analysis of this data require database services that scale and perform in ways not possible before. AWS offers a collection of database and other data services—including Amazon Aurora, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon ElastiCache, Amazon Kinesis, and Amazon EMR—to process, store, manage, and analyze data. In this session, we provide an overview of AWS database and analytics services and discuss how customers are using these services today.
Deep dive customer use cases
ABD401 – How Netflix Monitors Applications in Near Real-Time with Amazon Kinesis Thousands of services work in concert to deliver millions of hours of video streams to Netflix customers every day. These applications vary in size, function, and technology, but they all make use of the Netflix network to communicate. Understanding the interactions between these services is a daunting challenge both because of the sheer volume of traffic and the dynamic nature of deployments. In this session, we first discuss why Netflix chose Kinesis Streams to address these challenges at scale. We then dive deep into how Netflix uses Kinesis Streams to enrich network traffic logs and identify usage patterns in real time. Lastly, we cover how Netflix uses this system to build comprehensive dependency maps, increase network efficiency, and improve failure resiliency. From this session, you will learn how to build a real-time application monitoring system using network traffic logs and get real-time, actionable insights.
In this session, learn how Nextdoor replaced their home-grown data pipeline based on a topology of Flume nodes with a completely serverless architecture based on Kinesis and Lambda. By making these changes, they improved both the reliability of their data and the delivery times of billions of records of data to their Amazon S3–based data lake and Amazon Redshift cluster. Nextdoor is a private social networking service for neighborhoods.
ABD205 – Taking a Page Out of Ivy Tech’s Book: Using Data for Student Success Data speaks. Discover how Ivy Tech, the nation’s largest singly accredited community college, uses AWS to gather, analyze, and take action on student behavioral data for the betterment of over 3,100 students. This session outlines the process from inception to implementation across the state of Indiana and highlights how Ivy Tech’s model can be applied to your own complex business problems.
ABD207 – Leveraging AWS to Fight Financial Crime and Protect National Security Banks aren’t known to share data and collaborate with one another. But that is exactly what the Mid-Sized Bank Coalition of America (MBCA) is doing to fight digital financial crime—and protect national security. Using the AWS Cloud, the MBCA developed a shared data analytics utility that processes terabytes of non-competitive customer account, transaction, and government risk data. The intelligence produced from the data helps banks increase the efficiency of their operations, cut labor and operating costs, and reduce false positive volumes. The collective intelligence also allows greater enforcement of Anti-Money Laundering (AML) regulations by helping members detect internal risks—and identify the challenges to detecting these risks in the first place. This session demonstrates how the AWS Cloud supports the MBCA to deliver advanced data analytics, provide consistent operating models across financial institutions, reduce costs, and strengthen national security.
ABD208 – Cox Automotive Empowered to Scale with Splunk Cloud & AWS and Explores New Innovation with Amazon Kinesis Firehose In this session, learn how Cox Automotive is using Splunk Cloud for real time visibility into its AWS and hybrid environments to achieve near instantaneous MTTI, reduce auction incidents by 90%, and proactively predict outages. We also introduce a highly anticipated capability that allows you to ingest, transform, and analyze data in real time using Splunk and Amazon Kinesis Firehose to gain valuable insights from your cloud resources. It’s now quicker and easier than ever to gain access to analytics-driven infrastructure monitoring using Splunk Enterprise & Splunk Cloud.
ABD209 – Accelerating the Speed of Innovation with a Data Sciences Data & Analytics Hub at Takeda Historically, silos of data, analytics, and processes across functions, stages of development, and geography created a barrier to R&D efficiency. Gathering the right data necessary for decision-making was challenging due to issues of accessibility, trust, and timeliness. In this session, learn how Takeda is undergoing a transformation in R&D to increase the speed-to-market of high-impact therapies to improve patient lives. The Data and Analytics Hub was built, with Deloitte, to address these issues and support the efficient generation of data insights for functions such as clinical operations, clinical development, medical affairs, portfolio management, and R&D finance. In the AWS hosted data lake, this data is processed, integrated, and made available to business end users through data visualization interfaces, and to data scientists through direct connectivity. Learn how Takeda has achieved significant time reductions—from weeks to minutes—to gather and provision data that has the potential to reduce cycle times in drug development. The hub also enables more efficient operations and alignment to achieve product goals through cross functional team accountability and collaboration due to the ability to access the same cross domain data.
ABD210 – Modernizing Amtrak: Serverless Solution for Real-Time Data Capabilities As the nation’s only high-speed intercity passenger rail provider, Amtrak needs to know critical information to run their business such as: Who’s onboard any train at any time? How are booking and revenue trending? Amtrak was faced with unpredictable and often slow response times from existing databases, ranging from seconds to hours; existing booking and revenue dashboards were spreadsheet-based and manual; multiple copies of data were stored in different repositories, lacking integration and consistency; and operations and maintenance (O&M) costs were relatively high. Join us as we demonstrate how Deloitte and Amtrak successfully went live with a cloud-native operational database and analytical datamart for near-real-time reporting in under six months. We highlight the specific challenges and the modernization of architecture on an AWS native Platform as a Service (PaaS) solution. The solution includes cloud-native components such as AWS Lambda for microservices, Amazon Kinesis and AWS Data Pipeline for moving data, Amazon S3 for storage, Amazon DynamoDB for a managed NoSQL database service, and Amazon Redshift for near-real time reports and dashboards. Deloitte’s solution enabled “at scale” processing of 1 million transactions/day and up to 2K transactions/minute. It provided flexibility and scalability, largely eliminated the need for system management, and dramatically reduced operating costs. Moreover, it laid the groundwork for decommissioning legacy systems, anticipated to save at least $1M over 3 years.
ABD211 – Sysco Foods: A Journey from Too Much Data to Curated Insights In this session, we detail Sysco’s journey from a company focused on hindsight-based reporting to one focused on insights and foresight. For this shift, Sysco moved from multiple data warehouses to an AWS ecosystem, including Amazon Redshift, Amazon EMR, AWS Data Pipeline, and more. As the team at Sysco worked with Tableau, they gained agile insight across their business. Learn how Sysco decided to use AWS, how they scaled, and how they became more strategic with the AWS ecosystem and Tableau.
ABD217 – From Batch to Streaming: How Amazon Flex Uses Real-time Analytics to Deliver Packages on Time Reducing the time to get actionable insights from data is important to all businesses, and customers who employ batch data analytics tools are exploring the benefits of streaming analytics. Learn best practices to extend your architecture from data warehouses and databases to real-time solutions. Learn how to use Amazon Kinesis to get real-time data insights and integrate them with Amazon Aurora, Amazon RDS, Amazon Redshift, and Amazon S3. The Amazon Flex team describes how they used streaming analytics in their Amazon Flex mobile app, used by Amazon delivery drivers to deliver millions of packages each month on time. They discuss the architecture that enabled the move from a batch processing system to a real-time system, overcoming the challenges of migrating existing batch data to streaming data, and how to benefit from real-time analytics.
ABD218 – How EuroLeague Basketball Uses IoT Analytics to Engage Fans IoT and big data have made their way out of industrial applications, general automation, and consumer goods, and are now a valuable tool for improving consumer engagement across a number of industries, including media, entertainment, and sports. The low cost and ease of implementation of AWS analytics services and AWS IoT have allowed AGT, a leader in IoT, to develop their IoTA analytics platform. Using IoTA, AGT brought a tailored solution to EuroLeague Basketball for real-time content production and fan engagement during the 2017-18 season. In this session, we take a deep dive into how this solution is architected for secure, scalable, and highly performant data collection from athletes, coaches, and fans. We also talk about how the data is transformed into insights and integrated into a content generation pipeline. Lastly, we demonstrate how this solution can be easily adapted for other industries and applications.
ABD222 – How to Confidently Unleash Data to Meet the Needs of Your Entire Organization Where are you on the spectrum of IT leaders? Are you confident that you’re providing the technology and solutions that consistently meet or exceed the needs of your internal customers? Do your peers at the executive table see you as an innovative technology leader? Innovative IT leaders understand the value of getting data and analytics directly into the hands of decision makers, and into their own. In this session, Daren Thayne, Domo’s Chief Technology Officer, shares how innovative IT leaders are helping drive a culture change at their organizations. See how transformative it can be to have real-time access to all of the data that is relevant to YOUR job (including a complete view of your entire AWS environment), as well as understand how it can help you lead the way in applying that same pattern throughout your entire company.
ABD303 – Developing an Insights Platform – Sysco’s Journey from Disparate Systems to Data Lake and Beyond Sysco has nearly 200 operating companies across its multiple lines of business throughout the United States, Canada, Central/South America, and Europe. As the global leader in food services, Sysco identified the need to streamline the collection, transformation, and presentation of data produced by the distributed units and systems, into a central data ecosystem. Sysco’s Business Intelligence and Analytics team addressed these requirements by creating a data lake with scalable analytics and query engines leveraging AWS services. In this session, Sysco will outline their journey from a hindsight reporting focused company to an insights driven organization. They will cover solution architecture, challenges, and lessons learned from deploying a self-service insights platform. They will also walk through the design patterns they used and how they designed the solution to provide predictive analytics using Amazon Redshift Spectrum, Amazon S3, Amazon EMR, AWS Glue, Amazon Elasticsearch Service and other AWS services.
ABD309 – How Twilio Scaled Its Data-Driven Culture As a leading cloud communications platform, Twilio has always been strongly data-driven. But as headcount and data volumes grew—and grew quickly—they faced many new challenges. One-off, static reports work when you’re a small startup, but how do you support a growth stage company to a successful IPO and beyond? Today, Twilio’s data team relies on AWS and Looker to provide data access to 700 colleagues. Departments have the data they need to make decisions, and cloud-based scale means they get answers fast. Data delivers real-business value at Twilio, providing a 360-degree view of their customer, product, and business. In this session, you hear firsthand stories directly from the Twilio data team and learn real-world tips for fostering a truly data-driven culture at scale.
ABD310 – How FINRA Secures Its Big Data and Data Science Platform on AWS FINRA uses big data and data science technologies to detect fraud, market manipulation, and insider trading across US capital markets. As a financial regulator, FINRA analyzes highly sensitive data, so information security is critical. Learn how FINRA secures its Amazon S3 Data Lake and its data science platform on Amazon EMR and Amazon Redshift, while empowering data scientists with tools they need to be effective. In addition, FINRA shares AWS security best practices, covering topics such as AMI updates, micro segmentation, encryption, key management, logging, identity and access management, and compliance.
ABD331 – Log Analytics at Expedia Using Amazon Elasticsearch Service Expedia uses Amazon Elasticsearch Service (Amazon ES) for a variety of mission-critical use cases, ranging from log aggregation to application monitoring and pricing optimization. In this session, the Expedia team reviews how they use Amazon ES and Kibana to analyze and visualize Docker startup logs, AWS CloudTrail data, and application metrics. They share best practices for architecting a scalable, secure log analytics solution using Amazon ES, so you can add new data sources almost effortlessly and get insights quickly
ABD316 – American Heart Association: Finding Cures to Heart Disease Through the Power of Technology Combining disparate datasets and making them accessible to data scientists and researchers is a prevalent challenge for many organizations, not just in healthcare research. American Heart Association (AHA) has built a data science platform using Amazon EMR, Amazon Elasticsearch Service, and other AWS services, that corrals multiple datasets and enables advanced research on phenotype and genotype datasets, aimed at curing heart diseases. In this session, we present how AHA built this platform and the key challenges they addressed with the solution. We also provide a demo of the platform, and leave you with suggestions and next steps so you can build similar solutions for your use cases
ABD319 – Tooling Up for Efficiency: DIY Solutions @ Netflix At Netflix, we have traditionally approached cloud efficiency from a human standpoint, whether it be in-person meetings with the largest service teams or manually flipping reservations. Over time, we realized that these manual processes are not scalable as the business continues to grow. Therefore, in the past year, we have focused on building out tools that allow us to make more insightful, data-driven decisions around capacity and efficiency. In this session, we discuss the DIY applications, dashboards, and processes we built to help with capacity and efficiency. We start at the ten thousand foot view to understand the unique business and cloud problems that drove us to create these products, and discuss implementation details, including the challenges encountered along the way. Tools discussed include Picsou, the successor to our AWS billing file cost analyzer; Libra, an easy-to-use reservation conversion application; and cost and efficiency dashboards that relay useful financial context to 50+ engineering teams and managers.
ABD312 – Deep Dive: Migrating Big Data Workloads to AWS Customers are migrating their analytics, data processing (ETL), and data science workloads running on Apache Hadoop, Spark, and data warehouse appliances from on-premise deployments to AWS in order to save costs, increase availability, and improve performance. AWS offers a broad set of analytics services, including solutions for batch processing, stream processing, machine learning, data workflow orchestration, and data warehousing. This session will focus on identifying the components and workflows in your current environment; and providing the best practices to migrate these workloads to the right AWS data analytics product. We will cover services such as Amazon EMR, Amazon Athena, Amazon Redshift, Amazon Kinesis, and more. We will also feature Vanguard, an American investment management company based in Malvern, Pennsylvania with over $4.4 trillion in assets under management. Ritesh Shah, Sr. Program Manager for Cloud Analytics Program at Vanguard, will describe how they orchestrated their migration to AWS analytics services, including Hadoop and Spark workloads to Amazon EMR. Ritesh will highlight the technical challenges they faced and overcame along the way, as well as share common recommendations and tuning tips to accelerate the time to production.
ABD402 – How Esri Optimizes Massive Image Archives for Analytics in the Cloud Petabyte-scale archives of satellite, plane, and drone imagery continue to grow exponentially. They mostly exist as semi-structured data, but they are only valuable when accessed and processed by a wide range of products for both visualization and analysis. This session provides an overview of how ArcGIS indexes and structures data so that any part of it can be quickly accessed, processed, and analyzed by reading only the minimum amount of data needed for the task. In this session, we share best practices for structuring and compressing massive datasets in Amazon S3, so they can be analyzed efficiently. We also review a number of different image formats, including GeoTIFF (used for the Public Datasets on AWS program, Landsat on AWS), cloud-optimized GeoTIFF, MRF, and CRF, as well as different compression approaches to show the effect on processing performance. Finally, we provide examples of how this technology has been used to help image processing and analysis for the response to Hurricane Harvey.
ABD329 – A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Massive Scale Amazon’s consumer business continues to grow, and so does the volume of data and the number and complexity of the analytics done in support of the business. In this session, we talk about how Amazon.com uses AWS technologies to build a scalable environment for data and analytics. We look at how Amazon is evolving the world of data warehousing with a combination of a data lake and parallel, scalable compute engines such as Amazon EMR and Amazon Redshift.
ABD327 – Migrating Your Traditional Data Warehouse to a Modern Data Lake In this session, we discuss the latest features of Amazon Redshift and Redshift Spectrum, and take a deep dive into their architecture and inner workings. We share many of the recent availability, performance, and management enhancements and how they improve your end user experience. You also hear from 21st Century Fox, who presents a case study of their fast migration from an on-premises data warehouse to Amazon Redshift. Learn how they are expanding their data warehouse to a data lake that encompasses multiple data sources and data formats. This architecture helps them tie together siloed business units and get actionable 360-degree insights across their consumer base.
MCL202 – Ally Bank & Cognizant: Transforming Customer Experience Using Amazon Alexa Given the increasing popularity of natural language interfaces such as voice-as-user technology and conversational artificial intelligence (AI), Ally® Bank was looking to interact with customers by enabling direct transactions through conversation or voice. They also needed to develop a capability that allows third parties to connect to the bank securely for information sharing and exchange, using OAuth, an open authorization protocol seen as the future of secure banking technology. Cognizant’s Architecture team partnered with Ally Bank’s Enterprise Architecture group and identified the right product for OAuth integration with Amazon Alexa and third-party technologies. In this session, we discuss how building products with conversational AI helps Ally Bank offer an innovative customer experience; increase retention through improved data-driven personalization; increase the efficiency and convenience of customer service; and gain deep insights into customer needs through data analysis and predictive analytics to offer new products and services.
MCL317 – Orchestrating Machine Learning Training for Netflix Recommendations At Netflix, we use machine learning (ML) algorithms extensively to recommend relevant titles to our 100+ million members based on their tastes. Everything on the member home page is an evidence-driven, A/B-tested experience that we roll out backed by ML models. These models are trained using Meson, our workflow orchestration system. Meson distinguishes itself from other workflow engines by handling more sophisticated execution graphs, such as loops and parameterized fan-outs. Meson can schedule Spark jobs, Docker containers, bash scripts, gists of Scala code, and more. Meson also provides a rich visual interface for monitoring active workflows and inspecting execution logs. It has a powerful Scala DSL for authoring workflows, as well as a REST API. In this session, we focus on how Meson trains recommendation ML models in production, and how we have re-architected it to scale up for a growing need for broad ETL applications within Netflix. As a driver for this change, we have had to evolve the persistence layer for Meson. We talk about how we migrated from Cassandra to Amazon RDS backed by Amazon Aurora.
MCL350 – Humans vs. the Machines: How Pinterest Uses Amazon Mechanical Turk’s Worker Community to Improve Machine Learning Ever since the term “crowdsourcing” was coined in 2006, it’s been a buzzword for technology companies and social institutions. In the technology sector, crowdsourcing is instrumental for verifying machine learning algorithms, which, in turn, improves the user’s experience. In this session, we explore how Pinterest adapted to an increased reliance on human evaluation to improve their product, with a focus on how they’ve integrated with Mechanical Turk’s platform. This presentation is aimed at engineers, analysts, program managers, and product managers who are interested in how companies rely on Mechanical Turk’s human evaluation platform to better understand content and improve machine learning algorithms. The discussion focuses on the analysis and product decisions related to building a high-quality crowdsourcing system that takes advantage of Mechanical Turk’s powerful worker community.
ABD201 – Big Data Architectural Patterns and Best Practices on AWS In this session, we simplify big data processing as a data bus comprising various stages: collect, store, process, analyze, and visualize. Next, we discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on. Finally, we provide reference architectures, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
ABD202 – Best Practices for Building Serverless Big Data Applications Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. In this session, we show you how to incorporate serverless concepts into your big data architectures. We explore the concepts behind and benefits of serverless architectures for big data, looking at design patterns to ingest, store, process, and visualize your data. Along the way, we explain when and how you can use serverless technologies to streamline data processing, minimize infrastructure management, and improve agility and robustness, and we share a reference architecture that uses a combination of cloud and open source technologies to solve your big data problems. Topics include: use cases and best practices for serverless big data applications; leveraging AWS technologies such as Amazon DynamoDB, Amazon S3, Amazon Kinesis, AWS Lambda, Amazon Athena, and Amazon EMR; and serverless ETL, event processing, ad hoc analysis, and real-time analytics.
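To make the serverless pattern concrete, here is a minimal sketch of an AWS Lambda function (Python) that consumes records from an Amazon Kinesis stream and lands them in Amazon S3 for later ad hoc analysis; the bucket name and event shape are placeholders, not something prescribed by the session:

import base64
import json
import boto3

s3 = boto3.client('s3')
BUCKET = 'example-analytics-bucket'  # placeholder bucket name

def handler(event, context):
    # Kinesis delivers records base64-encoded inside the Lambda event
    records = []
    for record in event['Records']:
        payload = base64.b64decode(record['kinesis']['data'])
        records.append(json.loads(payload))
    # Persist the micro-batch to S3, where Athena or EMR can query it in place
    key = 'events/{}.json'.format(context.aws_request_id)
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(records).encode('utf-8'))
    return {'processed': len(records)}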
ABD206 – Building Visualizations and Dashboards with Amazon QuickSight Just as a picture is worth a thousand words, a visual is worth a thousand data points. A key aspect of our ability to gain insights from our data is to look for patterns, and these patterns are often not evident when we simply look at data in tables. The right visualization will help you gain a deeper understanding in a much quicker timeframe. In this session, we will show you how to quickly and easily visualize your data using Amazon QuickSight. We will show you how you can connect to data sources, generate custom metrics and calculations, create comprehensive business dashboards with various chart types, and set up filters and drill-downs to slice and dice the data.
ABD203 – Real-Time Streaming Applications on AWS: Use Cases and Patterns To win in the marketplace and provide differentiated customer experiences, businesses need to be able to use live data in real time to facilitate fast decision making. In this session, you learn common streaming data processing use cases and architectures. First, we give an overview of streaming data and AWS streaming data capabilities. Next, we look at a few customer examples and their real-time streaming applications. Finally, we walk through common architectures and design patterns of top streaming data use cases.
ABD213 – How to Build a Data Lake with AWS Glue Data Catalog As data volumes grow and customers store more data on AWS, they often have valuable data that is not easily discoverable and available for analytics. The AWS Glue Data Catalog provides a central view of your data lake, making data readily available for analytics. We introduce key features of the AWS Glue Data Catalog and its use cases. Learn how crawlers can automatically discover your data, extract relevant metadata, and add it as table definitions to the AWS Glue Data Catalog. We will also explore the integration between AWS Glue Data Catalog and Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
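As a rough sketch of what registering data in the Data Catalog looks like from code, the following boto3 snippet creates and starts a crawler over an S3 prefix; the names, path, and IAM role are placeholders and assume the role already exists:

import boto3

glue = boto3.client('glue')

# Create a crawler that scans an S3 prefix and adds table definitions
# to the Glue Data Catalog. Names, the path, and the role are placeholders.
glue.create_crawler(
    Name='example-clickstream-crawler',
    Role='arn:aws:iam::123456789012:role/ExampleGlueServiceRole',
    DatabaseName='example_datalake',
    Targets={'S3Targets': [{'Path': 's3://example-datalake/clickstream/'}]},
)
glue.start_crawler(Name='example-clickstream-crawler')

# Once the crawler finishes, the discovered tables are queryable from
# Athena, EMR, or Redshift Spectrum through the shared Data Catalog.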
ABD214 – Real-time User Insights for Mobile and Web Applications with Amazon Pinpoint With customers demanding relevant and real-time experiences across a range of devices, digital businesses are looking to gather user data at scale, understand this data, and respond to customer needs instantly. This requires tools that can record large volumes of user data in a structured fashion, and then instantly make this data available to generate insights. In this session, we demonstrate how you can use Amazon Pinpoint to capture user data in a structured yet flexible manner. Further, we demonstrate how this data can be set up for instant consumption using services like Amazon Kinesis Firehose and Amazon Redshift. We walk through example data based on real world scenarios, to illustrate how Amazon Pinpoint lets you easily organize millions of events, record them in real-time, and store them for further analysis.
ABD223 – IT Innovators: New Technology for Leveraging Data to Enable Agility, Innovation, and Business Optimization Companies of all sizes are looking for technology to efficiently leverage data and their existing IT investments to stay competitive and understand where to find new growth. Regardless of where companies are in their data-driven journey, they face greater demands for information from customers, prospects, partners, vendors, and employees. All stakeholders inside and outside the organization want information on demand or in “real time,” available anywhere on any device. They want to use it to optimize business outcomes without having to rely on complex software tools or human gatekeepers for access to relevant information. Learn how IT innovators at companies such as MasterCard, Jefferson Health, and TELUS are using Domo’s Business Cloud to help their organizations more effectively leverage data at scale.
ABD301 – Analyzing Streaming Data in Real Time with Amazon Kinesis Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. In this session, we present an end-to-end streaming data solution using Kinesis Streams for data ingestion, Kinesis Analytics for real-time processing, and Kinesis Firehose for persistence. We review in detail how to write SQL queries using streaming data and discuss best practices to optimize and monitor your Kinesis Analytics applications. Lastly, we discuss how to estimate the cost of the entire system.
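For readers who have not used Kinesis Streams before, a minimal (hypothetical) Python producer for the ingestion stage might look like this; the stream name and event fields are illustrative only:

import json
import time
import boto3

kinesis = boto3.client('kinesis')
STREAM = 'example-clickstream'  # placeholder stream name

def put_event(user_id, action):
    event = {'user_id': user_id, 'action': action, 'ts': int(time.time())}
    # The partition key controls shard assignment; keying on user_id keeps
    # a given user's events ordered within a shard.
    kinesis.put_record(
        StreamName=STREAM,
        Data=json.dumps(event).encode('utf-8'),
        PartitionKey=str(user_id),
    )

put_event(42, 'page_view')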
ABD302 – Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service and Kibana In this session, we use Apache web logs as an example and show you how to build an end-to-end analytics solution. First, we cover how to configure an Amazon ES cluster and ingest data using Amazon Kinesis Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data. Then we demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we review approaches for generating custom, ad hoc reports.
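A small sketch of the ingestion step, assuming a Kinesis Firehose delivery stream that has already been configured to deliver into the Amazon ES domain (the stream name and log fields are placeholders):

import json
import boto3

firehose = boto3.client('firehose')

# Each record is delivered by Firehose to the configured Amazon ES index
# (and can optionally be backed up to S3). The stream name is a placeholder.
log_line = {'status': 200, 'path': '/index.html', 'bytes': 1024}
firehose.put_record(
    DeliveryStreamName='example-apache-logs',
    Record={'Data': (json.dumps(log_line) + '\n').encode('utf-8')},
)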
ABD304 – Best Practices for Data Warehousing with Amazon Redshift & Redshift Spectrum Most companies are overrun with data, yet they lack critical insights to make timely and accurate business decisions. They are missing the opportunity to combine large amounts of new, unstructured big data that resides outside their data warehouse with trusted, structured data inside their data warehouse. In this session, we take an in-depth look at how modern data warehousing blends and analyzes all your data, inside and outside your data warehouse, without moving the data, to give you deeper insights to run your business. We will cover best practices on how to design optimal schemas, load data efficiently, and optimize your queries to deliver high throughput and performance.
ABD305 – Design Patterns and Best Practices for Data Analytics with Amazon EMR Amazon EMR is one of the largest Hadoop operators in the world, enabling customers to run ETL, machine learning, real-time processing, data science, and low-latency SQL at petabyte scale. In this session, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long and short-lived clusters, and other Amazon EMR architectural best practices. We talk about lowering cost with Auto Scaling and Spot Instances, and security best practices for encryption and fine-grained access control. Finally, we dive into some of our recent launches to keep you current on our latest features.
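To illustrate the transient-cluster and Spot Instance patterns mentioned above, a rough boto3 sketch of launching a short-lived EMR cluster with Spot core nodes might look like the following; the cluster name, instance types, counts, and bid price are illustrative, not recommendations:

import boto3

emr = boto3.client('emr')

# Transient cluster: it runs its submitted steps and terminates, reading
# and writing data in S3 rather than relying on long-lived HDFS.
emr.run_job_flow(
    Name='example-nightly-etl',
    ReleaseLabel='emr-5.10.0',
    Applications=[{'Name': 'Spark'}],
    Instances={
        'InstanceGroups': [
            {'Name': 'Master', 'InstanceRole': 'MASTER', 'Market': 'ON_DEMAND',
             'InstanceType': 'm4.xlarge', 'InstanceCount': 1},
            {'Name': 'Core', 'InstanceRole': 'CORE', 'Market': 'SPOT',
             'InstanceType': 'm4.xlarge', 'InstanceCount': 4, 'BidPrice': '0.10'},
        ],
        'KeepJobFlowAliveWhenNoSteps': False,
    },
    JobFlowRole='EMR_EC2_DefaultRole',
    ServiceRole='EMR_DefaultRole',
)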
ABD307 – Deep Analytics for Global AWS Marketing Organization To meet the needs of the global marketing organization, the AWS marketing analytics team built a scalable platform that allows the data science team to deliver custom econometric and machine learning models for end user self-service. To meet data security standards, we use end-to-end data encryption and different AWS services such as Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR with Apache Spark and Auto Scaling. In this session, you see real examples of how we have scaled and automated critical analysis, such as calculating the impact of marketing programs like re:Invent and prioritizing leads for our sales teams.
ABD311 – Deploying Business Analytics at Enterprise Scale with Amazon QuickSight One of the biggest tradeoffs customers usually make when deploying BI solutions at scale is agility versus governance. Large-scale BI implementations with the right governance structure can take months to design and deploy. In this session, learn how you can avoid making this tradeoff using Amazon QuickSight. Learn how to easily deploy Amazon QuickSight to thousands of users using Active Directory and Federated SSO, while securely accessing your data sources in Amazon VPCs or on-premises. We also cover how to control access to your datasets, implement row-level security, create scheduled email reports, and audit access to your data.
ABD315 – Building Serverless ETL Pipelines with AWS Glue Organizations need to gain insight and knowledge from a growing number of data sources, including Internet of Things (IoT) devices, APIs, clickstreams, and unstructured and log data. However, organizations are also often limited by legacy data warehouses and ETL processes that were designed for transactional data. In this session, we introduce key ETL features of AWS Glue and cover common use cases, ranging from scheduled nightly data warehouse loads to near real-time, event-driven ETL flows for your data lake. We discuss how to build scalable, efficient, and serverless ETL pipelines using AWS Glue. Additionally, Merck will share how they built an end-to-end ETL pipeline for their application release management system, and launched it in production in less than a week using AWS Glue.
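For a sense of what a Glue ETL job script looks like, here is a minimal PySpark sketch that reads a table registered in the Data Catalog and writes it back to S3 as Parquet; the database, table, and output path are hypothetical:

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args['JOB_NAME'], args)

# Read a table registered in the Glue Data Catalog (placeholder names)
source = glue_context.create_dynamic_frame.from_catalog(
    database='example_datalake', table_name='raw_clickstream')

# Write the data back out to S3 in a columnar format for analytics
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type='s3',
    connection_options={'path': 's3://example-datalake/curated/clickstream/'},
    format='parquet',
)
job.commit()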
ABD318 – Architecting a Data Lake with Amazon S3, Amazon Kinesis, and Amazon Athena Learn how to architect a data lake where different teams within your organization can publish and consume data in a self-service manner. As organizations aim to become more data-driven, data engineering teams have to build architectures that can cater to the needs of diverse users, from developers to business analysts to data scientists. Each of these user groups employs different tools, has different data needs, and accesses data in different ways. In this talk, we will dive deep into assembling a data lake using Amazon S3, Amazon Kinesis, Amazon Athena, Amazon EMR, and AWS Glue. The session will feature Mohit Rao, Architect and Integration lead at Atlassian, the maker of products such as JIRA, Confluence, and Stride. First, we will look at a couple of common architectures for building a data lake. Then we will show how Atlassian built a self-service data lake, where any team within the company can publish a dataset to be consumed by a broad set of users.
Companies have valuable data that they may not be analyzing due to the complexity, scalability, and performance issues of loading the data into their data warehouse. However, with the right tools, you can extend your analytics to query data in your data lake—with no loading required. Amazon Redshift Spectrum extends the analytic power of Amazon Redshift beyond data stored in your data warehouse to run SQL queries directly against vast amounts of unstructured data in your Amazon S3 data lake. This gives you the freedom to store your data where you want, in the format you want, and have it available for analytics when you need it. Join a discussion with AWS solution architects to ask questions.
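To give a flavor of how Spectrum exposes S3 data to SQL, here is a hedged sketch that registers an external schema and runs a query from Python with psycopg2; the cluster endpoint, credentials, IAM role, database, and table are all placeholders:

import psycopg2

# Connection details are placeholders for an existing Redshift cluster.
conn = psycopg2.connect(host='example-cluster.abc123.us-east-1.redshift.amazonaws.com',
                        port=5439, dbname='analytics', user='admin', password='example-password')
conn.autocommit = True  # external DDL should run outside an explicit transaction
cur = conn.cursor()

# Register a Glue Data Catalog database as an external schema, then query
# S3 data in place alongside local Redshift tables.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
    FROM DATA CATALOG DATABASE 'example_datalake'
    IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleSpectrumRole'
""")
cur.execute("SELECT count(*) FROM spectrum.raw_clickstream")
print(cur.fetchone())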
ABD330 – Combining Batch and Stream Processing to Get the Best of Both Worlds Today, many architects and developers are looking to build solutions that integrate batch and real-time data processing, and deliver the best of both approaches. Lambda architecture (not to be confused with the AWS Lambda service) is a design pattern that leverages both batch and real-time processing within a single solution to meet the latency, accuracy, and throughput requirements of big data use cases. Come join us for a discussion on how to implement Lambda architecture (batch, speed, and serving layers) and best practices for data processing, loading, and performance tuning.
ABD335 – Real-Time Anomaly Detection Using Amazon Kinesis Amazon Kinesis Analytics offers a built-in machine learning algorithm that you can use to easily detect anomalies in your VPC network traffic and improve security monitoring. Join us for an interactive discussion on how to stream your VPC Flow Logs to Amazon Kinesis Streams and identify anomalies using Kinesis Analytics.
ABD339 – Deep Dive and Best Practices for Amazon Athena Amazon Athena is an interactive query service that enables you to process data directly from Amazon S3 without the need for infrastructure. Since its launch at re:Invent 2016, several organizations have adopted Athena as the central tool to process all their data. In this talk, we dive deep into the most common use cases, including working with other AWS services. We review best practices for creating tables and partitions and for performance optimization. We also dive into how Athena handles security, authorization, and authentication. Lastly, we hear from a customer who has reduced costs and improved time to market by deploying Athena across their organization.
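As a quick illustration of how Athena is typically driven from code, here is a minimal boto3 sketch that starts a query, waits for it to finish, and prints the rows; the database, table, and S3 output location are placeholders:

import time
import boto3

athena = boto3.client('athena')

# Start an asynchronous query; results also land in the S3 output location.
query = athena.start_query_execution(
    QueryString='SELECT status, count(*) FROM apache_logs GROUP BY status',
    QueryExecutionContext={'Database': 'example_datalake'},
    ResultConfiguration={'OutputLocation': 's3://example-athena-results/'},
)
query_id = query['QueryExecutionId']

# Poll until the query finishes, then fetch the result set.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']
    if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
        break
    time.sleep(1)

if state == 'SUCCEEDED':
    for row in athena.get_query_results(QueryExecutionId=query_id)['ResultSet']['Rows']:
        print([col.get('VarCharValue') for col in row['Data']])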
We look forward to meeting you at re:Invent 2017!
About the Author
Roy Ben-Alta is a solution architect and principal business development manager at Amazon Web Services in New York. He focuses on Data Analytics and ML Technologies, working with AWS customers to build innovative data-driven products.