Tag Archives: KSI

Combine Transactional and Analytical Data Using Amazon Aurora and Amazon Redshift

Post Syndicated from Re Alvarez-Parmar original https://aws.amazon.com/blogs/big-data/combine-transactional-and-analytical-data-using-amazon-aurora-and-amazon-redshift/

A few months ago, we published a blog post about capturing data changes in an Amazon Aurora database and sending it to Amazon Athena and Amazon QuickSight for fast analysis and visualization. In this post, I want to demonstrate how easy it can be to take the data in Aurora and combine it with data in Amazon Redshift using Amazon Redshift Spectrum.

With Amazon Redshift, you can build petabyte-scale data warehouses that unify data from a variety of internal and external sources. Because Amazon Redshift is optimized for complex queries (often involving multiple joins) across large tables, it can handle large volumes of retail, inventory, and financial data without breaking a sweat.

In this post, we describe how to combine data in Aurora in Amazon Redshift. Here’s an overview of the solution:

  • Use AWS Lambda functions with Amazon Aurora to capture data changes in a table.
  • Save data in an Amazon S3
  • Query data using Amazon Redshift Spectrum.

We use the following services:

Serverless architecture for capturing and analyzing Aurora data changes

Consider a scenario in which an e-commerce web application uses Amazon Aurora for a transactional database layer. The company has a sales table that captures every single sale, along with a few corresponding data items. This information is stored as immutable data in a table. Business users want to monitor the sales data and then analyze and visualize it.

In this example, you take the changes in data in an Aurora database table and save it in Amazon S3. After the data is captured in Amazon S3, you combine it with data in your existing Amazon Redshift cluster for analysis.

By the end of this post, you will understand how to capture data events in an Aurora table and push them out to other AWS services using AWS Lambda.

The following diagram shows the flow of data as it occurs in this tutorial:

The starting point in this architecture is a database insert operation in Amazon Aurora. When the insert statement is executed, a custom trigger calls a Lambda function and forwards the inserted data. Lambda writes the data that it received from Amazon Aurora to a Kinesis data delivery stream. Kinesis Data Firehose writes the data to an Amazon S3 bucket. Once the data is in an Amazon S3 bucket, it is queried in place using Amazon Redshift Spectrum.

Creating an Aurora database

First, create a database by following these steps in the Amazon RDS console:

  1. Sign in to the AWS Management Console, and open the Amazon RDS console.
  2. Choose Launch a DB instance, and choose Next.
  3. For Engine, choose Amazon Aurora.
  4. Choose a DB instance class. This example uses a small, since this is not a production database.
  5. In Multi-AZ deployment, choose No.
  6. Configure DB instance identifier, Master username, and Master password.
  7. Launch the DB instance.

After you create the database, use MySQL Workbench to connect to the database using the CNAME from the console. For information about connecting to an Aurora database, see Connecting to an Amazon Aurora DB Cluster.

The following screenshot shows the MySQL Workbench configuration:

Next, create a table in the database by running the following SQL statement:

Create Table
ItemID int NOT NULL,
Category varchar(255),
Price double(10,2), 
Quantity int not NULL,
OrderDate timestamp,
DestinationState varchar(2),
ShippingType varchar(255),
Referral varchar(255),

You can now populate the table with some sample data. To generate sample data in your table, copy and run the following script. Ensure that the highlighted (bold) variables are replaced with appropriate values.

import MySQLdb
import random
import datetime

db = MySQLdb.connect(host="AURORA_CNAME",

states = ("AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN",

shipping_types = ("Free", "3-Day", "2-Day")

product_categories = ("Garden", "Kitchen", "Office", "Household")
referrals = ("Other", "Friend/Colleague", "Repeat Customer", "Online Ad")

for i in range(0,10):
    item_id = random.randint(1,100)
    state = states[random.randint(0,len(states)-1)]
    shipping_type = shipping_types[random.randint(0,len(shipping_types)-1)]
    product_category = product_categories[random.randint(0,len(product_categories)-1)]
    quantity = random.randint(1,4)
    referral = referrals[random.randint(0,len(referrals)-1)]
    price = random.randint(1,100)
    order_date = datetime.date(2016,random.randint(1,12),random.randint(1,30)).isoformat()

    data_order = (item_id, product_category, price, quantity, order_date, state,
    shipping_type, referral)

    add_order = ("INSERT INTO Sales "
                   "(ItemID, Category, Price, Quantity, OrderDate, DestinationState, \
                   ShippingType, Referral) "
                   "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")

    cursor = db.cursor()
    cursor.execute(add_order, data_order)



The following screenshot shows how the table appears with the sample data:

Sending data from Amazon Aurora to Amazon S3

There are two methods available to send data from Amazon Aurora to Amazon S3:

  • Using a Lambda function

To demonstrate the ease of setting up integration between multiple AWS services, we use a Lambda function to send data to Amazon S3 using Amazon Kinesis Data Firehose.

Alternatively, you can use a SELECT INTO OUTFILE S3 statement to query data from an Amazon Aurora DB cluster and save it directly in text files that are stored in an Amazon S3 bucket. However, with this method, there is a delay between the time that the database transaction occurs and the time that the data is exported to Amazon S3 because the default file size threshold is 6 GB.

Creating a Kinesis data delivery stream

The next step is to create a Kinesis data delivery stream, since it’s a dependency of the Lambda function.

To create a delivery stream:

  1. Open the Kinesis Data Firehose console
  2. Choose Create delivery stream.
  3. For Delivery stream name, type AuroraChangesToS3.
  4. For Source, choose Direct PUT.
  5. For Record transformation, choose Disabled.
  6. For Destination, choose Amazon S3.
  7. In the S3 bucket drop-down list, choose an existing bucket, or create a new one.
  8. Enter a prefix if needed, and choose Next.
  9. For Data compression, choose GZIP.
  10. In IAM role, choose either an existing role that has access to write to Amazon S3, or choose to generate one automatically. Choose Next.
  11. Review all the details on the screen, and choose Create delivery stream when you’re finished.


Creating a Lambda function

Now you can create a Lambda function that is called every time there is a change that needs to be tracked in the database table. This Lambda function passes the data to the Kinesis data delivery stream that you created earlier.

To create the Lambda function:

  1. Open the AWS Lambda console.
  2. Ensure that you are in the AWS Region where your Amazon Aurora database is located.
  3. If you have no Lambda functions yet, choose Get started now. Otherwise, choose Create function.
  4. Choose Author from scratch.
  5. Give your function a name and select Python 3.6 for Runtime
  6. Choose and existing or create a new Role, the role would need to have access to call firehose:PutRecord
  7. Choose Next on the trigger selection screen.
  8. Paste the following code in the code window. Change the stream_name variable to the Kinesis data delivery stream that you created in the previous step.
  9. Choose File -> Save in the code editor and then choose Save.
import boto3
import json

firehose = boto3.client('firehose')
stream_name = ‘AuroraChangesToS3’

def Kinesis_publish_message(event, context):
    firehose_data = (("%s,%s,%s,%s,%s,%s,%s,%s\n") %(event['ItemID'], 
    event['Category'], event['Price'], event['Quantity'],
    event['OrderDate'], event['DestinationState'], event['ShippingType'], 
    firehose_data = {'Data': str(firehose_data)}

Note the Amazon Resource Name (ARN) of this Lambda function.

Giving Aurora permissions to invoke a Lambda function

To give Amazon Aurora permissions to invoke a Lambda function, you must attach an IAM role with appropriate permissions to the cluster. For more information, see Invoking a Lambda Function from an Amazon Aurora DB Cluster.

Once you are finished, the Amazon Aurora database has access to invoke a Lambda function.

Creating a stored procedure and a trigger in Amazon Aurora

Now, go back to MySQL Workbench, and run the following command to create a new stored procedure. When this stored procedure is called, it invokes the Lambda function you created. Change the ARN in the following code to your Lambda function’s ARN.

									IN Category varchar(255), 
									IN Price double(10,2),
                                    IN Quantity int(11),
                                    IN OrderDate timestamp,
                                    IN DestinationState varchar(2),
                                    IN ShippingType varchar(255),
                                    IN Referral  varchar(255)) LANGUAGE SQL 
  CALL mysql.lambda_async('arn:aws:lambda:us-east-1:XXXXXXXXXXXXX:function:CDCFromAuroraToKinesis', 
     CONCAT('{ "ItemID" : "', ItemID, 
            '", "Category" : "', Category,
            '", "Price" : "', Price,
            '", "Quantity" : "', Quantity, 
            '", "OrderDate" : "', OrderDate, 
            '", "DestinationState" : "', DestinationState, 
            '", "ShippingType" : "', ShippingType, 
            '", "Referral" : "', Referral, '"}')

Create a trigger TR_Sales_CDC on the Sales table. When a new record is inserted, this trigger calls the CDC_TO_FIREHOSE stored procedure.

  SELECT  NEW.ItemID , NEW.Category, New.Price, New.Quantity, New.OrderDate
  , New.DestinationState, New.ShippingType, New.Referral
  INTO @ItemID , @Category, @Price, @Quantity, @OrderDate
  , @DestinationState, @ShippingType, @Referral;
  CALL  CDC_TO_FIREHOSE(@ItemID , @Category, @Price, @Quantity, @OrderDate
  , @DestinationState, @ShippingType, @Referral);

If a new row is inserted in the Sales table, the Lambda function that is mentioned in the stored procedure is invoked.

Verify that data is being sent from the Lambda function to Kinesis Data Firehose to Amazon S3 successfully. You might have to insert a few records, depending on the size of your data, before new records appear in Amazon S3. This is due to Kinesis Data Firehose buffering. To learn more about Kinesis Data Firehose buffering, see the “Amazon S3” section in Amazon Kinesis Data Firehose Data Delivery.

Every time a new record is inserted in the sales table, a stored procedure is called, and it updates data in Amazon S3.

Querying data in Amazon Redshift

In this section, you use the data you produced from Amazon Aurora and consume it as-is in Amazon Redshift. In order to allow you to process your data as-is, where it is, while taking advantage of the power and flexibility of Amazon Redshift, you use Amazon Redshift Spectrum. You can use Redshift Spectrum to run complex queries on data stored in Amazon S3, with no need for loading or other data prep.

Just create a data source and issue your queries to your Amazon Redshift cluster as usual. Behind the scenes, Redshift Spectrum scales to thousands of instances on a per-query basis, ensuring that you get fast, consistent performance even as your dataset grows to beyond an exabyte! Being able to query data that is stored in Amazon S3 means that you can scale your compute and your storage independently. You have the full power of the Amazon Redshift query model and all the reporting and business intelligence tools at your disposal. Your queries can reference any combination of data stored in Amazon Redshift tables and in Amazon S3.

Redshift Spectrum supports open, common data types, including CSV/TSV, Apache Parquet, SequenceFile, and RCFile. Files can be compressed using gzip or Snappy, with other data types and compression methods in the works.

First, create an Amazon Redshift cluster. Follow the steps in Launch a Sample Amazon Redshift Cluster.

Next, create an IAM role that has access to Amazon S3 and Athena. By default, Amazon Redshift Spectrum uses the Amazon Athena data catalog. Your cluster needs authorization to access your external data catalog in AWS Glue or Athena and your data files in Amazon S3.

In the demo setup, I attached AmazonS3FullAccess and AmazonAthenaFullAccess. In a production environment, the IAM roles should follow the standard security of granting least privilege. For more information, see IAM Policies for Amazon Redshift Spectrum.

Attach the newly created role to the Amazon Redshift cluster. For more information, see Associate the IAM Role with Your Cluster.

Next, connect to the Amazon Redshift cluster, and create an external schema and database:

create external schema if not exists spectrum_schema
from data catalog 
database 'spectrum_db' 
region 'us-east-1'
IAM_ROLE 'arn:aws:iam::XXXXXXXXXXXX:role/RedshiftSpectrumRole'
create external database if not exists;

Don’t forget to replace the IAM role in the statement.

Then create an external table within the database:

 CREATE EXTERNAL TABLE IF NOT EXISTS spectrum_schema.ecommerce_sales(
  ItemID int,
  Category varchar,
  Quantity int,
  OrderDate TIMESTAMP,
  DestinationState varchar,
  ShippingType varchar,
  Referral varchar)

Query the table, and it should contain data. This is a fact table.

select top 10 * from spectrum_schema.ecommerce_sales


Next, create a dimension table. For this example, we create a date/time dimension table. Create the table:

CREATE TABLE date_dimension (
  d_datekey           integer       not null sortkey,
  d_dayofmonth        integer       not null,
  d_monthnum          integer       not null,
  d_dayofweek                varchar(10)   not null,
  d_prettydate        date       not null,
  d_quarter           integer       not null,
  d_half              integer       not null,
  d_year              integer       not null,
  d_season            varchar(10)   not null,
  d_fiscalyear        integer       not null)
diststyle all;

Populate the table with data:

copy date_dimension from 's3://reparmar-lab/2016dates' 
iam_role 'arn:aws:iam::XXXXXXXXXXXX:role/redshiftspectrum'
dateformat 'auto';

The date dimension table should look like the following:

Querying data in local and external tables using Amazon Redshift

Now that you have the fact and dimension table populated with data, you can combine the two and run analysis. For example, if you want to query the total sales amount by weekday, you can run the following:

select sum(quantity*price) as total_sales, date_dimension.d_season
from spectrum_schema.ecommerce_sales 
join date_dimension on spectrum_schema.ecommerce_sales.orderdate = date_dimension.d_prettydate 
group by date_dimension.d_season

You get the following results:

Similarly, you can replace d_season with d_dayofweek to get sales figures by weekday:

With Amazon Redshift Spectrum, you pay only for the queries you run against the data that you actually scan. We encourage you to use file partitioning, columnar data formats, and data compression to significantly minimize the amount of data scanned in Amazon S3. This is important for data warehousing because it dramatically improves query performance and reduces cost.

Partitioning your data in Amazon S3 by date, time, or any other custom keys enables Amazon Redshift Spectrum to dynamically prune nonrelevant partitions to minimize the amount of data processed. If you store data in a columnar format, such as Parquet, Amazon Redshift Spectrum scans only the columns needed by your query, rather than processing entire rows. Similarly, if you compress your data using one of the supported compression algorithms in Amazon Redshift Spectrum, less data is scanned.

Analyzing and visualizing Amazon Redshift data in Amazon QuickSight

Modify the Amazon Redshift security group to allow an Amazon QuickSight connection. For more information, see Authorizing Connections from Amazon QuickSight to Amazon Redshift Clusters.

After modifying the Amazon Redshift security group, go to Amazon QuickSight. Create a new analysis, and choose Amazon Redshift as the data source.

Enter the database connection details, validate the connection, and create the data source.

Choose the schema to be analyzed. In this case, choose spectrum_schema, and then choose the ecommerce_sales table.

Next, we add a custom field for Total Sales = Price*Quantity. In the drop-down list for the ecommerce_sales table, choose Edit analysis data sets.

On the next screen, choose Edit.

In the data prep screen, choose New Field. Add a new calculated field Total Sales $, which is the product of the Price*Quantity fields. Then choose Create. Save and visualize it.

Next, to visualize total sales figures by month, create a graph with Total Sales on the x-axis and Order Data formatted as month on the y-axis.

After you’ve finished, you can use Amazon QuickSight to add different columns from your Amazon Redshift tables and perform different types of visualizations. You can build operational dashboards that continuously monitor your transactional and analytical data. You can publish these dashboards and share them with others.

Final notes

Amazon QuickSight can also read data in Amazon S3 directly. However, with the method demonstrated in this post, you have the option to manipulate, filter, and combine data from multiple sources or Amazon Redshift tables before visualizing it in Amazon QuickSight.

In this example, we dealt with data being inserted, but triggers can be activated in response to an INSERT, UPDATE, or DELETE trigger.

Keep the following in mind:

  • Be careful when invoking a Lambda function from triggers on tables that experience high write traffic. This would result in a large number of calls to your Lambda function. Although calls to the lambda_async procedure are asynchronous, triggers are synchronous.
  • A statement that results in a large number of trigger activations does not wait for the call to the AWS Lambda function to complete. But it does wait for the triggers to complete before returning control to the client.
  • Similarly, you must account for Amazon Kinesis Data Firehose limits. By default, Kinesis Data Firehose is limited to a maximum of 5,000 records/second. For more information, see Monitoring Amazon Kinesis Data Firehose.

In certain cases, it may be optimal to use AWS Database Migration Service (AWS DMS) to capture data changes in Aurora and use Amazon S3 as a target. For example, AWS DMS might be a good option if you don’t need to transform data from Amazon Aurora. The method used in this post gives you the flexibility to transform data from Aurora using Lambda before sending it to Amazon S3. Additionally, the architecture has the benefits of being serverless, whereas AWS DMS requires an Amazon EC2 instance for replication.

For design considerations while using Redshift Spectrum, see Using Amazon Redshift Spectrum to Query External Data.

If you have questions or suggestions, please comment below.

Additional Reading

If you found this post useful, be sure to check out Capturing Data Changes in Amazon Aurora Using AWS Lambda and 10 Best Practices for Amazon Redshift Spectrum

About the Authors

Re Alvarez-Parmar is a solutions architect for Amazon Web Services. He helps enterprises achieve success through technical guidance and thought leadership. In his spare time, he enjoys spending time with his two kids and exploring outdoors.




Now Available: New Digital Training to Help You Learn About AWS Big Data Services

Post Syndicated from Sara Snedeker original https://aws.amazon.com/blogs/big-data/now-available-new-digital-training-to-help-you-learn-about-aws-big-data-services/

AWS Training and Certification recently released free digital training courses that will make it easier for you to build your cloud skills and learn about using AWS Big Data services. This training includes courses like Introduction to Amazon EMR and Introduction to Amazon Athena.

You can get free and unlimited access to more than 100 new digital training courses built by AWS experts at aws.training. It’s easy to access training related to big data. Just choose the Analytics category on our Find Training page to browse through the list of courses. You can also use the keyword filter to search for training for specific AWS offerings.

Recommended training

Just getting started, or looking to learn about a new service? Check out the following digital training courses:

Introduction to Amazon EMR (15 minutes)
Covers the available tools that can be used with Amazon EMR and the process of creating a cluster. It includes a demonstration of how to create an EMR cluster.

Introduction to Amazon Athena (10 minutes)
Introduces the Amazon Athena service along with an overview of its operating environment. It covers the basic steps in implementing Athena and provides a brief demonstration.

Introduction to Amazon QuickSight (10 minutes)
Discusses the benefits of using Amazon QuickSight and how the service works. It also includes a demonstration so that you can see Amazon QuickSight in action.

Introduction to Amazon Redshift (10 minutes)
Walks you through Amazon Redshift and its core features and capabilities. It also includes a quick overview of relevant use cases and a short demonstration.

Introduction to AWS Lambda (10 minutes)
Discusses the rationale for using AWS Lambda, how the service works, and how you can get started using it.

Introduction to Amazon Kinesis Analytics (10 minutes)
Discusses how Amazon Kinesis Analytics collects, processes, and analyzes streaming data in real time. It discusses how to use and monitor the service and explores some use cases.

Introduction to Amazon Kinesis Streams (15 minutes)
Covers how Amazon Kinesis Streams is used to collect, process, and analyze real-time streaming data to create valuable insights.

Introduction to AWS IoT (10 minutes)
Describes how the AWS Internet of Things (IoT) communication architecture works, and the components that make up AWS IoT. It discusses how AWS IoT works with other AWS services and reviews a case study.

Introduction to AWS Data Pipeline (10 minutes)
Covers components like tasks, task runner, and pipeline. It also discusses what a pipeline definition is, and reviews the AWS services that are compatible with AWS Data Pipeline.

Go deeper with classroom training

Want to learn more? Enroll in classroom training to learn best practices, get live feedback, and hear answers to your questions from an instructor.

Big Data on AWS (3 days)
Introduces you to cloud-based big data solutions such as Amazon EMR, Amazon Redshift, Amazon Kinesis, and the rest of the AWS big data platform.

Data Warehousing on AWS (3 days)
Introduces you to concepts, strategies, and best practices for designing a cloud-based data warehousing solution, and demonstrates how to collect, store, and prepare data for the data warehouse.

Building a Serverless Data Lake (1 day)
Teaches you how to design, build, and operate a serverless data lake solution with AWS services. Includes topics such as ingesting data from any data source at large scale, storing the data securely and durably, using the right tool to process large volumes of data, and understanding the options available for analyzing the data in near-real time.

More training coming in 2018

We’re always evaluating and expanding our training portfolio, so stay tuned for more training options in the new year. You can always visit us at aws.training to explore our latest offerings.

AWS Updated Its ISO Certifications and Now Has 67 Services Under ISO Compliance

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/aws-updated-its-iso-certifications-and-now-has-67-services-under-iso-compliance/

ISO logo

AWS has updated its certifications against ISO 9001, ISO 27001, ISO 27017, and ISO 27018 standards, bringing the total to 67 services now under ISO compliance. We added the following 29 services this cycle:

Amazon Aurora Amazon S3 Transfer Acceleration AWS [email protected]
Amazon Cloud Directory Amazon SageMaker AWS Managed Services
Amazon CloudWatch Logs Amazon Simple Notification Service AWS OpsWorks Stacks
Amazon Cognito Auto Scaling AWS Shield
Amazon Connect AWS Batch AWS Snowball Edge
Amazon Elastic Container Registry AWS CodeBuild AWS Snowmobile
Amazon Inspector AWS CodeCommit AWS Step Functions
Amazon Kinesis Data Streams AWS CodeDeploy AWS Systems Manager (formerly Amazon EC2 Systems Manager)
Amazon Macie AWS CodePipeline AWS X-Ray
Amazon QuickSight AWS IoT Core

For the complete list of services under ISO compliance, see AWS Services in Scope by Compliance Program.

AWS maintains certifications through extensive audits of its controls to ensure that information security risks that affect the confidentiality, integrity, and availability of company and customer information are appropriately managed.

You can download copies of the AWS ISO certificates that contain AWS’s in-scope services and Regions, and use these certificates to jump-start your own certification efforts:

AWS does not increase service costs in any AWS Region as a result of updating its certifications.

To learn more about compliance in the AWS Cloud, see AWS Cloud Compliance.

– Chad

Bitcoin: In Crypto We Trust

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/12/bitcoin-in-crypto-we-trust.html

Tim Wu, who coined “net neutrality”, has written an op-ed on the New York Times called “The Bitcoin Boom: In Code We Trust“. He is wrong about “code”.

The wrong “trust”

Wu builds a big manifesto about how real-world institutions aren’t can’t be trusted. Certainly, this reflects the rhetoric from a vocal wing of Bitcoin fanatics, but it’s not the Bitcoin manifesto.

Instead, the word “trust” in the Bitcoin paper is much narrower, referring to how online merchants can’t trust credit-cards (for example). When I bought school supplies for my niece when she studied in Canada, the online site wouldn’t accept my U.S. credit card. They didn’t trust my credit card. However, they trusted my Bitcoin, so I used that payment method instead, and succeeded in the purchase.

Real-world currencies like dollars are tethered to the real-world, which means no single transaction can be trusted, because “they” (the credit-card company, the courts, etc.) may decide to reverse the transaction. The manifesto behind Bitcoin is that a transaction cannot be reversed — and thus, can always be trusted.

Deliberately confusing the micro-trust in a transaction and macro-trust in banks and governments is a sort of bait-and-switch.

The wrong inspiration

Wu claims:

“It was, after all, a carnival of human errors and misfeasance that inspired the invention of Bitcoin in 2009, namely, the financial crisis.”

Not true. Bitcoin did not appear fully formed out of the void, but was instead based upon a series of innovations that predate the financial crisis by a decade. Moreover, the financial crisis had little to do with “currency”. The value of the dollar and other major currencies were essentially unscathed by the crisis. Certainly, enthusiasts looking backward like to cherry pick the financial crisis as yet one more reason why the offline world sucks, but it had little to do with Bitcoin.

In crypto we trust

It’s not in code that Bitcoin trusts, but in crypto. Satoshi makes that clear in one of his posts on the subject:

A generation ago, multi-user time-sharing computer systems had a similar problem. Before strong encryption, users had to rely on password protection to secure their files, placing trust in the system administrator to keep their information private. Privacy could always be overridden by the admin based on his judgment call weighing the principle of privacy against other concerns, or at the behest of his superiors. Then strong encryption became available to the masses, and trust was no longer required. Data could be secured in a way that was physically impossible for others to access, no matter for what reason, no matter how good the excuse, no matter what.

You don’t possess Bitcoins. Instead, all the coins are on the public blockchain under your “address”. What you possess is the secret, private key that matches the address. Transferring Bitcoin means using your private key to unlock your coins and transfer them to another. If you print out your private key on paper, and delete it from the computer, it can never be hacked.

Trust is in this crypto operation. Trust is in your private crypto key.

We don’t trust the code

The manifesto “in code we trust” has been proven wrong again and again. We don’t trust computer code (software) in the cryptocurrency world.

The most profound example is something known as the “DAO” on top of Ethereum, Bitcoin’s major competitor. Ethereum allows “smart contracts” containing code. The quasi-religious manifesto of the DAO smart-contract is that the “code is the contract”, that all the terms and conditions are specified within the smart-contract code, completely untethered from real-world terms-and-conditions.

Then a hacker found a bug in the DAO smart-contract and stole most of the money.

In principle, this is perfectly legal, because “the code is the contract”, and the hacker just used the code. In practice, the system didn’t live up to this. The Ethereum core developers, acting as central bankers, rewrote the Ethereum code to fix this one contract, returning the money back to its original owners. They did this because those core developers were themselves heavily invested in the DAO and got their money back.

Similar things happen with the original Bitcoin code. A disagreement has arisen about how to expand Bitcoin to handle more transactions. One group wants smaller and “off-chain” transactions. Another group wants a “large blocksize”. This caused a “fork” in Bitcoin with two versions, “Bitcoin” and “Bitcoin Cash”. The fork championed by the core developers (central bankers) is worth around $20,000 right now, while the other fork is worth around $2,000.

So it’s still “in central bankers we trust”, it’s just that now these central bankers are mostly online instead of offline institutions. They have proven to be even more corrupt than real-world central bankers. It’s certainly not the code that is trusted.

The bubble

Wu repeats the well-known reference to Amazon during the dot-com bubble. If you bought Amazon’s stock for $107 right before the dot-com crash, it still would be one of wisest investments you could’ve made. Amazon shares are now worth around $1,200 each.

The implication is that Bitcoin, too, may have such long term value. Even if you buy it today and it crashes tomorrow, it may still be worth ten-times its current value in another decade or two.

This is a poor analogy, for three reasons.

The first reason is that we knew the Internet had fundamentally transformed commerce. We knew there were going to be winners in the long run, it was just a matter of picking who would win (Amazon) and who would lose (Pets.com). We have yet to prove Bitcoin will be similarly transformative.

The second reason is that businesses are real, they generate real income. While the stock price may include some irrational exuberance, it’s ultimately still based on the rational expectations of how much the business will earn. With Bitcoin, it’s almost entirely irrational exuberance — there are no long term returns.

The third flaw in the analogy is that there are an essentially infinite number of cryptocurrencies. We saw this today as Coinbase started trading Bitcoin Cash, a fork of Bitcoin. The two are nearly identical, so there’s little reason one should be so much valuable than another. It’s only a fickle fad that makes one more valuable than another, not business fundamentals. The successful future cryptocurrency is unlikely to exist today, but will be invented in the future.

The lessons of the dot-com bubble is not that Bitcoin will have long term value, but that cryptocurrency companies like Coinbase and BitPay will have long term value. Or, the lesson is that “old” companies like JPMorgan that are early adopters of the technology will grow faster than their competitors.


The point of Wu’s paper is to distinguish trust in traditional real-world institutions and trust in computer software code. This is an inaccurate reading of the situation.

Bitcoin is not about replacing real-world institutions but about untethering online transactions.

The trust in Bitcoin is in crypto — the power crypto gives individuals instead of third-parties.

The trust is not in the code. Bitcoin is a “cryptocurrency” not a “codecurrency”.

How to Manage Amazon GuardDuty Security Findings Across Multiple Accounts

Post Syndicated from Tom Stickle original https://aws.amazon.com/blogs/security/how-to-manage-amazon-guardduty-security-findings-across-multiple-accounts/

Introduced at AWS re:Invent 2017, Amazon GuardDuty is a managed threat detection service that continuously monitors for malicious or unauthorized behavior to help you protect your AWS accounts and workloads. In an AWS Blog post, Jeff Barr shows you how to enable GuardDuty to monitor your AWS resources continuously. That blog post shows how to get started with a single GuardDuty account and provides an overview of the features of the service. Your security team, though, will probably want to use GuardDuty to monitor a group of AWS accounts continuously.

In this post, I demonstrate how to use GuardDuty to monitor a group of AWS accounts and have their findings routed to another AWS account—the master account—that is owned by a security team. The method I demonstrate in this post is especially useful if your security team is responsible for monitoring a group of AWS accounts over which it does not have direct access—known as member accounts. In this solution, I simplify the work needed to enable GuardDuty in member accounts and configure findings by simplifying the process, which I do by enabling GuardDuty in the master account and inviting member accounts.

Enable GuardDuty in a master account and invite member accounts

To get started, you must enable GuardDuty in the master account, which will receive GuardDuty findings. The master account should be managed by your security team, and it will display the findings from all member accounts. The master account can be reverted later by removing any member accounts you add to it. Adding member accounts is a two-way handshake mechanism to ensure that administrators from both the master and member accounts formally agree to establish the relationship.

To enable GuardDuty in the master account and add member accounts:

  1. Navigate to the GuardDuty console.
  2. In the navigation pane, choose Accounts.
    Screenshot of the Accounts choice in the navigation pane
  1. To designate this account as the GuardDuty master account, start adding member accounts:
    • You can add individual accounts by choosing Add Account, or you can add a list of accounts by choosing Upload List (.csv).
  1. Now, add the account ID and email address of the member account, and choose Add. (If you are uploading a list of accounts, choose Browse, choose the .csv file with the member accounts [one email address and account ID per line], and choose Add accounts.)
    Screenshot of adding an account

For security reasons, AWS checks to make sure each account ID is valid and that you’ve entered each member account’s email address that was used to create the account. If a member account’s account ID and email address do not match, GuardDuty does not send an invitation.
Screenshot showing the Status of Invite

  1. After you add all the member accounts you want to add, you will see them listed in the Member accounts table with a Status of Invite. You don’t have to individually invite each account—you can choose a group of accounts and when you choose to invite one account in the group, all accounts are invited.
  2. When you choose Invite for each member account:
    1. AWS checks to make sure the account ID is valid and the email address provided is the email address of the member account.
    2. AWS sends an email to the member account email address with a link to the GuardDuty console, where the member account owner can accept the invitation. You can add a customized message from your security team. Account owners who receive the invitation must sign in to their AWS account to accept the invitation. The service also sends an invitation through the AWS Personal Health Dashboard in case the member email address is not monitored. This invitation appears in the member account under the AWS Personal Health Dashboard alert bell on the AWS Management Console.
    3. A pending-invitation indicator is shown on the GuardDuty console of the member account, as shown in the following screenshot.
      Screenshot showing the pending-invitation indicator

When the invitation is sent by email, it is sent to the account owner of the GuardDuty member account.
Screenshot of the invitation sent by email

The account owner can click the link in the email invitation or the AWS Personal Health Dashboard message, or the account owner can sign in to their account and navigate to the GuardDuty console. In all cases, the member account displays the pending invitation in the member account’s GuardDuty console with instructions for accepting the invitation. The GuardDuty console walks the account owner through accepting the invitation, including enabling GuardDuty if it is not already enabled.

If you prefer to work in the AWS CLI, you can enable GuardDuty and accept the invitation. To do this, call CreateDetector to enable GuardDuty, and then call AcceptInvitation, which serves the same purpose as accepting the invitation in the GuardDuty console.

  1. After the member account owner accepts the invitation, the Status in the master account is changed to Monitored. The status helps you track the status of each AWS account that you invite.
    Screenshot showing the Status change to Monitored

You have enabled GuardDuty on the member account, and all findings will be forwarded to the master account. You can now monitor the findings about GuardDuty member accounts from the GuardDuty console in the master account.

The member account owner can see GuardDuty findings by default and can control all aspects of the experience in the member account with AWS Identity and Access Management (IAM) permissions. Users with the appropriate permissions can end the multi-account relationship at any time by toggling the Accept button on the Accounts page. Note that ending the relationship changes the Status of the account to Resigned and also triggers a security finding on the side of the master account so that the security team knows the member account is no longer linked to the master account.

Working with GuardDuty findings

Most security teams have ticketing systems, chat operations, security information event management (SIEM) systems, or other security automation systems to which they would like to push GuardDuty findings. For this purpose, GuardDuty sends all findings as JSON-based messages through Amazon CloudWatch Events, a scalable service to which you can subscribe and to which AWS services can stream system events. To access these events, navigate to the CloudWatch Events console and create a rule that subscribes to the GuardDuty-related findings. You then can assign a target such as Amazon Kinesis Data Firehose that can place the findings in a number of services such as Amazon S3. The following screenshot is of the CloudWatch Events console, where I have a rule that pulls all events from GuardDuty and pushes them to a preconfigured AWS Lambda function.

Screenshot of a CloudWatch Events rule

The following example is a subset of GuardDuty findings that includes relevant context and information about the nature of a threat that was detected. In this example, the instanceId, i-00bb62b69b7004a4c, is performing Secure Shell (SSH) brute-force attacks against IP address From a Lambda function, you can access any of the following fields such as the title of the finding and its description, and send those directly to your ticketing system.

Example GuardDuty findings

You can use other AWS services to build custom analytics and visualizations of your security findings. For example, you can connect Kinesis Data Firehose to CloudWatch Events and write events to an S3 bucket in a standard format, which can be encrypted with AWS Key Management Service and then compressed. You also can use Amazon QuickSight to build ad hoc dashboards by using AWS Glue and Amazon Athena. Similarly, you can place the data from Kinesis Data Firehose in Amazon Elasticsearch Service, with which you can use tools such as Kibana to build your own visualizations and dashboards.

Like most other AWS services, GuardDuty is a regional service. This means that when you enable GuardDuty in an AWS Region, all findings are generated and delivered in that region. If you are regulated by a compliance regime, this is often an important requirement to ensure that security findings remain in a specific jurisdiction. Because customers have let us know they would prefer to be able to enable GuardDuty globally and have all findings aggregated in one place, we intend to give the choice of regional or global isolation as we evolve this new service.


In this blog post, I have demonstrated how to use GuardDuty to monitor a group of GuardDuty member accounts and aggregate security findings in a central master GuardDuty account. You can use this solution whether or not you have direct control over the member accounts.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about using GuardDuty, start a thread in the GuardDuty forum or contact AWS Support.


Presenting AWS IoT Analytics: Delivering IoT Analytics at Scale and Faster than Ever Before

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/launch-presenting-aws-iot-analytics/

One of the technology areas I thoroughly enjoy is the Internet of Things (IoT). Even as a child I used to infuriate my parents by taking apart the toys they would purchase for me to see how they worked and if I could somehow put them back together. It seems somehow I was destined to end up the tough and ever-changing world of technology. Therefore, it’s no wonder that I am really enjoying learning and tinkering with IoT devices and technologies. It combines my love of development and software engineering with my curiosity around circuits, controllers, and other facets of the electrical engineering discipline; even though an electrical engineer I can not claim to be.

Despite all of the information that is collected by the deployment of IoT devices and solutions, I honestly never really thought about the need to analyze, search, and process this data until I came up against a scenario where it became of the utmost importance to be able to search and query through loads of sensory data for an anomaly occurrence. Of course, I understood the importance of analytics for businesses to make accurate decisions and predictions to drive the organization’s direction. But it didn’t occur to me initially, how important it was to make analytics an integral part of my IoT solutions. Well, I learned my lesson just in time because this re:Invent a service is launching to make it easier for anyone to process and analyze IoT messages and device data.


Hello, AWS IoT Analytics!  AWS IoT Analytics is a fully managed service of AWS IoT that provides advanced data analysis of data collected from your IoT devices.  With the AWS IoT Analytics service, you can process messages, gather and store large amounts of device data, as well as, query your data. Also, the new AWS IoT Analytics service feature integrates with Amazon Quicksight for visualization of your data and brings the power of machine learning through integration with Jupyter Notebooks.

Benefits of AWS IoT Analytics

  • Helps with predictive analysis of data by providing access to pre-built analytical functions
  • Provides ability to visualize analytical output from service
  • Provides tools to clean up data
  • Can help identify patterns in the gathered data

Be In the Know: IoT Analytics Concepts

  • Channel: archives the raw, unprocessed messages and collects data from MQTT topics.
  • Pipeline: consumes messages from channels and allows message processing.
    • Activities: perform transformations on your messages including filtering attributes and invoking lambda functions advanced processing.
  • Data Store: Used as a queryable repository for processed messages. Provide ability to have multiple datastores for messages coming from different devices or locations or filtered by message attributes.
  • Data Set: Data retrieval view from a data store, can be generated by a recurring schedule. 

Getting Started with AWS IoT Analytics

First, I’ll create a channel to receive incoming messages.  This channel can be used to ingest data sent to the channel via MQTT or messages directed from the Rules Engine. To create a channel, I’ll select the Channels menu option and then click the Create a channel button.

I’ll name my channel, TaraIoTAnalyticsID and give the Channel a MQTT topic filter of Temperature. To complete the creation of my channel, I will click the Create Channel button.

Now that I have my Channel created, I need to create a Data Store to receive and store the messages received on the Channel from my IoT device. Remember you can set up multiple Data Stores for more complex solution needs, but I’ll just create one Data Store for my example. I’ll select Data Stores from menu panel and click Create a data store.


I’ll name my Data Store, TaraDataStoreID, and once I click the Create the data store button and I would have successfully set up a Data Store to house messages coming from my Channel.

Now that I have my Channel and my Data Store, I will need to connect the two using a Pipeline. I’ll create a simple pipeline that just connects my Channel and Data Store, but you can create a more robust pipeline to process and filter messages by adding Pipeline activities like a Lambda activity.

To create a pipeline, I’ll select the Pipelines menu option and then click the Create a pipeline button.

I will not add an Attribute for this pipeline. So I will click Next button.

As we discussed there are additional pipeline activities that I can add to my pipeline for the processing and transformation of messages but I will keep my first pipeline simple and hit the Next button.

The final step in creating my pipeline is for me to select my previously created Data Store and click Create Pipeline.

All that is left for me to take advantage of the AWS IoT Analytics service is to create an IoT rule that sends data to an AWS IoT Analytics channel.  Wow, that was a super easy process to set up analytics for IoT devices.

If I wanted to create a Data Set as a result of queries run against my data for visualization with Amazon Quicksight or integrate with Jupyter Notebooks to perform more advanced analytical functions, I can choose the Analyze menu option to bring up the screens to create data sets and access the Juypter Notebook instances.


As you can see, it was a very simple process to set up the advanced data analysis for AWS IoT. With AWS IoT Analytics, you have the ability to collect, visualize, process, query and store large amounts of data generated from your AWS IoT connected device. Additionally, you can access the AWS IoT Analytics service in a myriad of different ways; the AWS Command Line Interface (AWS CLI), the AWS IoT API, language-specific AWS SDKs, and AWS IoT Device SDKs.

AWS IoT Analytics is available today for you to dig into the analysis of your IoT data. To learn more about AWS IoT and AWS IoT Analytics go to the AWS IoT Analytics product page and/or the AWS IoT documentation.


Amazon QuickSight Update – Geospatial Visualization, Private VPC Access, and More

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-quicksight-update-geospatial-visualization-private-vpc-access-and-more/

We don’t often recognize or celebrate anniversaries at AWS. With nearly 100 services on our list, we’d be eating cake and drinking champagne several times a week. While that might sound like fun, we’d rather spend our working hours listening to customers and innovating. With that said, Amazon QuickSight has now been generally available for a little over a year and I would like to give you a quick update!

QuickSight in Action
Today, tens of thousands of customers (from startups to enterprises, in industries as varied as transportation, legal, mining, and healthcare) are using QuickSight to analyze and report on their business data.

Here are a couple of examples:

Gemini provides legal evidence procurement for California attorneys who represent injured workers. They have gone from creating custom reports and running one-off queries to creating and sharing dynamic QuickSight dashboards with drill-downs and filtering. QuickSight is used to track sales pipeline, measure order throughput, and to locate bottlenecks in the order processing pipeline.

Jivochat provides a real-time messaging platform to connect visitors to website owners. QuickSight lets them create and share interactive dashboards while also providing access to the underlying datasets. This has allowed them to move beyond the sharing of static spreadsheets, ensuring that everyone is looking at the same and is empowered to make timely decisions based on current data.

Transfix is a tech-powered freight marketplace that matches loads and increases visibility into logistics for Fortune 500 shippers in retail, food and beverage, manufacturing, and other industries. QuickSight has made analytics accessible to both BI engineers and non-technical business users. They scrutinize key business and operational metrics including shipping routes, carrier efficient, and process automation.

Looking Back / Looking Ahead
The feedback on QuickSight has been incredibly helpful. Customers tell us that their employees are using QuickSight to connect to their data, perform analytics, and make high-velocity, data-driven decisions, all without setting up or running their own BI infrastructure. We love all of the feedback that we get, and use it to drive our roadmap, leading to the introduction of over 40 new features in just a year. Here’s a summary:

Looking forward, we are watching an interesting trend develop within our customer base. As these customers take a close look at how they analyze and report on data, they are realizing that a serverless approach offers some tangible benefits. They use Amazon Simple Storage Service (S3) as a data lake and query it using a combination of QuickSight and Amazon Athena, giving them agility and flexibility without static infrastructure. They also make great use of QuickSight’s dashboards feature, monitoring business results and operational metrics, then sharing their insights with hundreds of users. You can read Building a Serverless Analytics Solution for Cleaner Cities and review Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight if you are interested in this approach.

New Features and Enhancements
We’re still doing our best to listen and to learn, and to make sure that QuickSight continues to meet your needs. I’m happy to announce that we are making seven big additions today:

Geospatial Visualization – You can now create geospatial visuals on geographical data sets.

Private VPC Access – You can now sign up to access a preview of a new feature that allows you to securely connect to data within VPCs or on-premises, without the need for public endpoints.

Flat Table Support – In addition to pivot tables, you can now use flat tables for tabular reporting. To learn more, read about Using Tabular Reports.

Calculated SPICE Fields – You can now perform run-time calculations on SPICE data as part of your analysis. Read Adding a Calculated Field to an Analysis for more information.

Wide Table Support – You can now use tables with up to 1000 columns.

Other Buckets – You can summarize the long tail of high-cardinality data into buckets, as described in Working with Visual Types in Amazon QuickSight.

HIPAA Compliance – You can now run HIPAA-compliant workloads on QuickSight.

Geospatial Visualization
Everyone seems to want this feature! You can now take data that contains a geographic identifier (country, city, state, or zip code) and create beautiful visualizations with just a few clicks. QuickSight will geocode the identifier that you supply, and can also accept lat/long map coordinates. You can use this feature to visualize sales by state, map stores to shipping destinations, and so forth. Here’s a sample visualization:

To learn more about this feature, read Using Geospatial Charts (Maps), and Adding Geospatial Data.

Private VPC Access Preview
If you have data in AWS (perhaps in Amazon Redshift, Amazon Relational Database Service (RDS), or on EC2) or on-premises in Teradata or SQL Server on servers without public connectivity, this feature is for you. Private VPC Access for QuickSight uses an Elastic Network Interface (ENI) for secure, private communication with data sources in a VPC. It also allows you to use AWS Direct Connect to create a secure, private link with your on-premises resources. Here’s what it looks like:

If you are ready to join the preview, you can sign up today.



Visualize AWS Cloudtrail Logs using AWS Glue and Amazon Quicksight

Post Syndicated from Luis Caro Perez original https://aws.amazon.com/blogs/big-data/streamline-aws-cloudtrail-log-visualization-using-aws-glue-and-amazon-quicksight/

Being able to easily visualize AWS CloudTrail logs gives you a better understanding of how your AWS infrastructure is being used. It can also help you audit and review AWS API calls and detect security anomalies inside your AWS account. To do this, you must be able to perform analytics based on your CloudTrail logs.

In this post, I walk through using AWS Glue and AWS Lambda to convert AWS CloudTrail logs from JSON to a query-optimized format dataset in Amazon S3. I then use Amazon Athena and Amazon QuickSight to query and visualize the data.

Solution overview

To process CloudTrail logs, you must implement the following architecture:

CloudTrail delivers log files in an Amazon S3 bucket folder. To correctly crawl these logs, you modify the file contents and folder structure using an Amazon S3-triggered Lambda function that stores the transformed files in an S3 bucket single folder. When the files are in a single folder, AWS Glue scans the data, converts it into Apache Parquet format, and catalogs it to allow for querying and visualization using Amazon Athena and Amazon QuickSight.


Let’s look at the steps that are required to build the solution.

Set up CloudTrail logs

First, you need to set up a trail that delivers log files to an S3 bucket. To create a trail in CloudTrail, follow the instructions in Creating a Trail.

When you finish, the trail settings page should look like the following screenshot:

In this example, I set up log files to be delivered to the cloudtraillfcaro bucket.

Consolidate CloudTrail reports into a single folder using Lambda

AWS CloudTrail delivers log files using the following folder structure inside the configured Amazon S3 bucket:


Additionally, log files have the following structure:

    "Records": [{
        "eventVersion": "1.01",
        "userIdentity": {
            "type": "IAMUser",
            "principalId": "AIDAJDPLRKLG7UEXAMPLE",
            "arn": "arn:aws:iam::123456789012:user/Alice",
            "accountId": "123456789012",
            "accessKeyId": "AKIAIOSFODNN7EXAMPLE",
            "userName": "Alice",
            "sessionContext": {
                "attributes": {
                    "mfaAuthenticated": "false",
                    "creationDate": "2014-03-18T14:29:23Z"
        "eventTime": "2014-03-18T14:30:07Z",
        "eventSource": "cloudtrail.amazonaws.com",
        "eventName": "StartLogging",
        "awsRegion": "us-west-2",
        "sourceIPAddress": "",
        "userAgent": "signin.amazonaws.com",
        "requestParameters": {
            "name": "Default"
        "responseElements": null,
        "requestID": "cdc73f9d-aea9-11e3-9d5a-835b769c0d9c",
        "eventID": "3074414d-c626-42aa-984b-68ff152d6ab7"
    ... additional entries ...

If AWS Glue crawlers are used to catalog these files as they are written, the following obstacles arise:

  1. AWS Glue identifies different tables per different folders because they don’t follow a traditional partition format.
  2. Based on the structure of the file content, AWS Glue identifies the tables as having a single column of type array.
  3. CloudTrail logs have JSON attributes that use uppercase letters. According to the Best Practices When Using Athena with AWS Glue, it is recommended that you convert these to lowercase.

To have AWS Glue catalog all log files in a single table with all the columns describing each event, implement the following Lambda function:

from __future__ import print_function
import json
import urllib
import boto3
import gzip

s3 = boto3.resource('s3')
client = boto3.client('s3')

def convertColumntoLowwerCaps(obj):
    for key in obj.keys():
        new_key = key.lower()
        if new_key != key:
            obj[new_key] = obj[key]
            del obj[key]
    return obj

def lambda_handler(event, context):

    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
        newKey = 'flatfiles/' + key.replace("/", "")
        client.download_file(bucket, key, '/tmp/file.json.gz')
        with gzip.open('/tmp/out.json.gz', 'w') as output, gzip.open('/tmp/file.json.gz', 'rb') as file:
            i = 0
            for line in file: 
                for record in json.loads(line,object_hook=convertColumntoLowwerCaps)['records']:
            		if i != 0:
            		i += 1
        client.upload_file('/tmp/out.json.gz', bucket,newKey)
        return "success"
    except Exception as e:
        print('Error processing object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e

The function goes over each element of the records array, changes uppercase letters to lowercase in column names, and inserts each element of the array as a single line of a new file. The new file is saved inside a flatfiles folder created by the function without any subfolders in the S3 bucket.

The function should have a role containing a policy with at least the following permissions:

    "Version": "2012-10-17",
    "Statement": [
            "Action": [
            "Resource": [
            "Effect": "Allow"

In this example, CloudTrail delivers logs to the cloudtraillfcaro bucket. Make sure that you replace this name with your bucket name in the policy. For more information about how to work with inline policies, see Working with Inline Policies.

After the Lambda function is created, you can set up the following trigger using the Triggers tab on the AWS Lambda console.

Choose Add trigger, and choose S3 as a source of the trigger.

After choosing the source, configure the following settings:

In the trigger, any file that is written to the path for the log files—which in this case is AWSLogs/119582755581/CloudTrail/—is processed. Make sure that the Enable trigger check box is selected and that the bucket and prefix parameters match your use case.

After you set up the function and receive log files, the bucket (in this case cloudtraillfcaro) should contain the processed files inside the flatfiles folder.

Catalog source data

Once the files are processed by the Lambda function, set up a crawler named cloudtrail to catalog them.

The crawler must point to the flatfiles folder.

All the crawlers and AWS Glue jobs created for this solution must have a role with the AWSGlueServiceRole managed policy and an inline policy with permissions to modify the S3 buckets used on the Lambda function. For more information, see Working with Managed Policies.

The role should look like the following:

In this example, the inline policy named s3perms contains the permissions to modify the S3 buckets.

After you choose the role, you can schedule the crawler to run on demand.

A new database is created, and the crawler is set to use it. In this case, the cloudtrail database is used for all the tables.

After the crawler runs, a single table should be created in the catalog with the following structure:

The table should contain the following columns:

Create and run the AWS Glue job

To convert all the CloudTrail logs to a columnar store in Parquet, set up an AWS Glue job by following these steps.

Upload the following script into a bucket in Amazon S3:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import boto3
import time

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "cloudtrail", table_name = "flatfiles", transformation_ctx = "datasource0")
resolvechoice1 = ResolveChoice.apply(frame = datasource0, choice = "make_struct", transformation_ctx = "resolvechoice1")
relationalized1 = resolvechoice1.relationalize("trail", args["TempDir"]).select("trail")
datasink = glueContext.write_dynamic_frame.from_options(frame = relationalized1, connection_type = "s3", connection_options = {"path": "s3://cloudtraillfcaro/parquettrails"}, format = "parquet", transformation_ctx = "datasink4")

In the example, you load the script as a file named cloudtrailtoparquet.py. Make sure that you modify the script and update the “{"path": "s3://cloudtraillfcaro/parquettrails"}” with the destination in which you want to store your results.

After uploading the script, add a new AWS Glue job. Choose a name and role for the job, and choose the option of running the job from An existing script that you provide.

To avoid processing the same data twice, enable the Job bookmark setting in the Advanced properties section of the job properties.

Choose Next twice, and then choose Finish.

If logs are already in the flatfiles folder, you can run the job on demand to generate the first set of results.

Once the job starts running, wait for it to complete.

When the job is finished, its Run status should be Succeeded. After that, you can verify that the Parquet files are written to the Amazon S3 location.

Catalog results

To be able to process results from Athena, you can use an AWS Glue crawler to catalog the results of the AWS Glue job.

In this example, the crawler is set to use the same database as the source named cloudtrail.

You can run the crawler using the console. When the crawler finishes running and has processed the Parquet results, a new table should be created in the AWS Glue Data Catalog. In this example, it’s named parquettrails.

The table should have the classification set to parquet.

It should have the same columns as the flatfiles table, with the exception of the struct type columns, which should be relationalized into several columns:

In this example, notice how the requestparameters column, which was a struct in the original table (flatfiles), was transformed to several columns—one for each key value inside it. This is done using a transformation native to AWS Glue called relationalize.

Query results with Athena

After crawling the results, you can query them using Athena. For example, to query what events took place in the time frame between 2017-10-23t12:00:00 and 2017-10-23t13:00, use the following select statement:

select *
from cloudtrail.parquettrails
where eventtime > '2017-10-23T12:00:00Z' AND eventtime < '2017-10-23T13:00:00Z'
order by eventtime asc;

Be sure to replace cloudtrail.parquettrails with the names of your database and table that references the Parquet results. Replace the datetimes with an hour when your account had activity and was processed by the AWS Glue job.

Visualize results using Amazon QuickSight

Once you can query the data using Athena, you can visualize it using Amazon QuickSight. Before connecting Amazon QuickSight to Athena, be sure to grant QuickSight access to Athena and the associated S3 buckets in your account. For more information, see Managing Amazon QuickSight Permissions to AWS Resources. You can then create a new data set in Amazon QuickSight based on the Athena table that you created.

After setting up permissions, you can create a new analysis in Amazon QuickSight by choosing New analysis.

Then add a new data set.

Choose Athena as the source.

Give the data source a name (in this case, I named it cloudtrail).

Choose the name of the database and the table referencing the Parquet results.

Then choose Visualize.

After that, you should see the following screen:

Now you can create some visualizations. First, search for the sourceipaddress column, and drag it to the AutoGraph section.

You can see a list of the IP addresses that you have used to interact with AWS. To review whether these IP addresses have been used from IAM users, internal AWS services, or roles, use the type value that is inside the useridentity field of the original log files. Thanks to the relationalize transformation, this value is available as the useridentity.type column. After the column is added into the Group/Color box, the visualization should look like the following:

You can now see and distinguish the most used IPs and whether they are used from roles, AWS services, or IAM users.

After following all these steps, you can use Amazon QuickSight to add different columns from CloudTrail and perform different types of visualizations. You can build operational dashboards that continuously monitor AWS infrastructure usage and access. You can share those dashboards with others in your organization who might need to see this data.


In this post, you saw how you can use a simple Lambda function and an AWS Glue script to convert text files into Parquet to improve Athena query performance and data compression. The post also demonstrated how to use AWS Lambda to preprocess files in Amazon S3 and transform them into a format that is recognizable by AWS Glue crawlers.

This example, used AWS CloudTrail logs, but you can apply the proposed solution to any set of files that after preprocessing, can be cataloged by AWS Glue.

Additional Reading

Learn how to Harmonize, Query, and Visualize Data from Various Providers using AWS Glue, Amazon Athena, and Amazon QuickSight.

About the Authors

Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improving the value of their solutions when using AWS.




Assassins Creed Origin DRM Hammers Gamers’ CPUs

Post Syndicated from Andy original https://torrentfreak.com/assassins-creed-origin-drm-hammers-gamers-cpus-171030/

There’s a war taking place on the Internet. On one side: gaming companies, publishers, and anti-piracy outfits. On the other: people who varying reasons want to play and/or test games for free.

While these groups are free to battle it out in a manner of their choosing, innocent victims are getting caught up in the crossfire. People who pay for their games without question should be considered part of the solution, not the problem, but whether they like it or not, they’re becoming collateral damage in an increasingly desperate conflict.

For the past several days, some players of the recently-released Assassin’s Creed Origins have emerged as what appear to be examples of this phenomenon.

“What is the normal CPU usage for this game?” a user asked on Steam forums. “I randomly get between 60% to 90% and I’m wondering if this is too high or not.”

The individual reported running an i7 processor, which is no slouch. However, for those running a CPU with less oomph, matters are even worse. Another gamer, running an i5, reported a 100% load on all four cores of his processor, even when lower graphics settings were selected in an effort to free up resources.

“It really doesn’t seem to matter what kind of GPU you are using,” another complained. “The performance issues most people here are complaining about are tied to CPU getting maxed out 100 percent at all times. This results in FPS [frames per second] drops and stutter. As far as I know there is no workaround.”

So what could be causing these problems? Badly configured machines? Terrible coding on the part of the game maker?

According to Voksi, whose ‘Revolt’ team cracked Wolfenstein II: The New Colossus before its commercial release last week, it’s none of these. The entire problem is directly connected to desperate anti-piracy measures.

As widely reported (1,2), the infamous Denuvo anti-piracy technology has been taking a beating lately. Cracking groups are dismantling it in a matter of days, sometimes just hours, making the protection almost pointless. For Assassin’s Creed Origins, however, Ubisoft decided to double up, Voksi says.

“Basically, Ubisoft have implemented VMProtect on top of Denuvo, tanking the game’s performance by 30-40%, demanding that people have a more expensive CPU to play the game properly, only because of the DRM. It’s anti-consumer and a disgusting move,” he told TorrentFreak.

Voksi says he knows all of this because he got an opportunity to review the code after obtaining the binaries for the game. Here’s how it works.

While Denuvo sits underneath doing its thing, it’s clearly vulnerable to piracy, given recent advances in anti-anti-piracy technology. So, in a belt-and-braces approach, Ubisoft opted to deploy another technology – VMProtect – on top.

VMProtect is software that protects other software against reverse engineering and cracking. Although the technicalities are different, its aims appear to be somewhat similar to Denuvo, in that both seek to protect underlying systems from being subverted.

“VMProtect protects code by executing it on a virtual machine with non-standard architecture that makes it extremely difficult to analyze and crack the software. Besides that, VMProtect generates and verifies serial numbers, limits free upgrades and much more,” the company’s marketing reads.

VMProtect and Denuvo didn’t appear to be getting on all that well earlier this year but they later settled their differences. Now their systems are working together, to try and solve the anti-piracy puzzle.

“It seems that Ubisoft decided that Denuvo is not enough to stop pirates in the crucial first days [after release] anymore, so they have implemented an iteration of VMProtect over it,” Voksi explains.

“This is great if you are looking to save your game from those pirates, because this layer of VMProtect will make Denuvo a lot more harder to trace and keygen than without it. But if you are a legit customer, well, it’s not that great for you since this combo could tank your performance by a lot, especially if you are using a low-mid range CPU. That’s why we are seeing 100% CPU usage on 4 core CPUs right now for example.”

The situation is reportedly so bad that some users are getting the dreaded BSOD (blue screen of death) due to their machines overheating after just an hour or two’s play. It remains unclear whether these crashes are indeed due to the VMProtect/Denuvo combination but the perception is that these anti-piracy measures are at the root of users’ CPU utilization problems.

While gaming companies can’t be blamed for wanting to protect their products, there’s no sense in punishing legitimate consumers with an inferior experience. The great irony, of course, is that when Assassin’s Creed gets cracked (if that indeed happens anytime soon), pirates will be the only ones playing it without the hindrance of two lots of anti-piracy tech battling over resources.

The big question now, however, is whether the anti-piracy wall will stand firm. If it does, it raises the bizarre proposition that future gamers might need to buy better hardware in order to accommodate anti-piracy technology.

And people worry about bitcoin mining……?

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Introducing AWS Directory Service for Microsoft Active Directory (Standard Edition)

Post Syndicated from Peter Pereira original https://aws.amazon.com/blogs/security/introducing-aws-directory-service-for-microsoft-active-directory-standard-edition/

Today, AWS introduced AWS Directory Service for Microsoft Active Directory (Standard Edition), also known as AWS Microsoft AD (Standard Edition), which is managed Microsoft Active Directory (AD) that is performance optimized for small and midsize businesses. AWS Microsoft AD (Standard Edition) offers you a highly available and cost-effective primary directory in the AWS Cloud that you can use to manage users, groups, and computers. It enables you to join Amazon EC2 instances to your domain easily and supports many AWS and third-party applications and services. It also can support most of the common use cases of small and midsize businesses. When you use AWS Microsoft AD (Standard Edition) as your primary directory, you can manage access and provide single sign-on (SSO) to cloud applications such as Microsoft Office 365. If you have an existing Microsoft AD directory, you can also use AWS Microsoft AD (Standard Edition) as a resource forest that contains primarily computers and groups, allowing you to migrate your AD-aware applications to the AWS Cloud while using existing on-premises AD credentials.

In this blog post, I help you get started by answering three main questions about AWS Microsoft AD (Standard Edition):

  1. What do I get?
  2. How can I use it?
  3. What are the key features?

After answering these questions, I show how you can get started with creating and using your own AWS Microsoft AD (Standard Edition) directory.

1. What do I get?

When you create an AWS Microsoft AD (Standard Edition) directory, AWS deploys two Microsoft AD domain controllers powered by Microsoft Windows Server 2012 R2 in your Amazon Virtual Private Cloud (VPC). To help deliver high availability, the domain controllers run in different Availability Zones in the AWS Region of your choice.

As a managed service, AWS Microsoft AD (Standard Edition) configures directory replication, automates daily snapshots, and handles all patching and software updates. In addition, AWS Microsoft AD (Standard Edition) monitors and automatically recovers domain controllers in the event of a failure.

AWS Microsoft AD (Standard Edition) has been optimized as a primary directory for small and midsize businesses with the capacity to support approximately 5,000 employees. With 1 GB of directory object storage, AWS Microsoft AD (Standard Edition) has the capacity to store 30,000 or more total directory objects (users, groups, and computers). AWS Microsoft AD (Standard Edition) also gives you the option to add domain controllers to meet the specific performance demands of your applications. You also can use AWS Microsoft AD (Standard Edition) as a resource forest with a trust relationship to your on-premises directory.

2. How can I use it?

With AWS Microsoft AD (Standard Edition), you can share a single directory for multiple use cases. For example, you can share a directory to authenticate and authorize access for .NET applications, Amazon RDS for SQL Server with Windows Authentication enabled, and Amazon Chime for messaging and video conferencing.

The following diagram shows some of the use cases for your AWS Microsoft AD (Standard Edition) directory, including the ability to grant your users access to external cloud applications and allow your on-premises AD users to manage and have access to resources in the AWS Cloud. Click the diagram to see a larger version.

Diagram showing some ways you can use AWS Microsoft AD (Standard Edition)--click the diagram to see a larger version

Use case 1: Sign in to AWS applications and services with AD credentials

You can enable multiple AWS applications and services such as the AWS Management Console, Amazon WorkSpaces, and Amazon RDS for SQL Server to use your AWS Microsoft AD (Standard Edition) directory. When you enable an AWS application or service in your directory, your users can access the application or service with their AD credentials.

For example, you can enable your users to sign in to the AWS Management Console with their AD credentials. To do this, you enable the AWS Management Console as an application in your directory, and then assign your AD users and groups to IAM roles. When your users sign in to the AWS Management Console, they assume an IAM role to manage AWS resources. This makes it easy for you to grant your users access to the AWS Management Console without needing to configure and manage a separate SAML infrastructure.

Use case 2: Manage Amazon EC2 instances

Using familiar AD administration tools, you can apply AD Group Policy objects (GPOs) to centrally manage your Amazon EC2 for Windows or Linux instances by joining your instances to your AWS Microsoft AD (Standard Edition) domain.

In addition, your users can sign in to your instances with their AD credentials. This eliminates the need to use individual instance credentials or distribute private key (PEM) files. This makes it easier for you to instantly grant or revoke access to users by using AD user administration tools you already use.

Use case 3: Provide directory services to your AD-aware workloads

AWS Microsoft AD (Standard Edition) is an actual Microsoft AD that enables you to run traditional AD-aware workloads such as Remote Desktop Licensing Manager, Microsoft SharePoint, and Microsoft SQL Server Always On in the AWS Cloud. AWS Microsoft AD (Standard Edition) also helps you to simplify and improve the security of AD-integrated .NET applications by using group Managed Service Accounts (gMSAs) and Kerberos constrained delegation (KCD).

Use case 4: SSO to Office 365 and other cloud applications

You can use AWS Microsoft AD (Standard Edition) to provide SSO for cloud applications. You can use Azure AD Connect to synchronize your users into Azure AD, and then use Active Directory Federation Services (AD FS) so that your users can access Microsoft Office 365 and other SAML 2.0 cloud applications by using their AD credentials.

Use case 5: Extend your on-premises AD to the AWS Cloud

If you already have an AD infrastructure and want to use it when migrating AD-aware workloads to the AWS Cloud, AWS Microsoft AD (Standard Edition) can help. You can use AD trusts to connect AWS Microsoft AD (Standard Edition) to your existing AD. This means your users can access AD-aware and AWS applications with their on-premises AD credentials, without needing you to synchronize users, groups, or passwords.

For example, your users can sign in to the AWS Management Console and Amazon WorkSpaces by using their existing AD user names and passwords. Also, when you use AD-aware applications such as SharePoint with AWS Microsoft AD (Standard Edition), your logged-in Windows users can access these applications without needing to enter credentials again.

3. What are the key features?

AWS Microsoft AD (Standard Edition) includes the features detailed in this section.

Extend your AD schema

With AWS Microsoft AD, you can run customized AD-integrated applications that require changes to your directory schema, which defines the structures of your directory. The schema is composed of object classes such as user objects, which contain attributes such as user names. AWS Microsoft AD lets you extend the schema by adding new AD attributes or object classes that are not present in the core AD attributes and classes.

For example, if you have a human resources application that uses employee badge color to assign specific benefits, you can extend the schema to include a badge color attribute in the user object class of your directory. To learn more, see How to Move More Custom Applications to the AWS Cloud with AWS Directory Service.

Create user-specific password policies

With user-specific password policies, you can apply specific restrictions and account lockout policies to different types of users in your AWS Microsoft AD (Standard Edition) domain. For example, you can enforce strong passwords and frequent password change policies for administrators, and use less-restrictive policies with moderate account lockout policies for general users.

Add domain controllers

You can increase the performance and redundancy of your directory by adding domain controllers. This can help improve application performance by enabling directory clients to load-balance their requests across a larger number of domain controllers.

Encrypt directory traffic

You can use AWS Microsoft AD (Standard Edition) to encrypt Lightweight Directory Access Protocol (LDAP) communication between your applications and your directory. By enabling LDAP over Secure Sockets Layer (SSL)/Transport Layer Security (TLS), also called LDAPS, you encrypt your LDAP communications end to end. This helps you to protect sensitive information you keep in your directory when it is accessed over untrusted networks.

Improve the security of signing in to AWS services by using multi-factor authentication (MFA)

You can improve the security of signing in to AWS services, such as Amazon WorkSpaces and Amazon QuickSight, by enabling MFA in your AWS Microsoft AD (Standard Edition) directory. With MFA, your users must enter a one-time passcode (OTP) in addition to their AD user names and passwords to access AWS applications and services you enable in AWS Microsoft AD (Standard Edition).

Get started

To get started, use the Directory Service console to create your first directory with just a few clicks. If you have not used Directory Service before, you may be eligible for a 30-day limited free trial.


In this blog post, I explained what AWS Microsoft AD (Standard Edition) is and how you can use it. With a single directory, you can address many use cases for your business, making it easier to migrate and run your AD-aware workloads in the AWS Cloud, provide access to AWS applications and services, and connect to other cloud applications. To learn more about AWS Microsoft AD, see the Directory Service home page.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about this blog post, start a new thread on the Directory Service forum.

– Peter

Amazon QuickSight Adds Support for Combo Charts and Row-Level Security

Post Syndicated from Jose Kunnackal original https://aws.amazon.com/blogs/big-data/amazon-quicksight-adds-support-for-combo-charts-and-row-level-security/

We are excited to announce support for two new features in Amazon QuickSight: 1) Combo charts, the first visual type in QuickSight to support dual-axis visualization, and 2) Row-Level Security, which allows access control over data at the row level based on the user who is accessing QuickSight. Together, these features enable you to present more engaging and personalized dashboards in Amazon QuickSight, while enforcing stricter controls over data.

Combo charts

Amazon QuickSight now supports charts with bars and lines, which you can use to visualize metrics of different scale or numeric types. For example, you can view sales ($) and margin (%) figures for different product categories of a business on the same visual.

You can also add a field to group the bars by an additional category. Following the example above, a business might want to break up sales across product categories by state to understand the details better. Amazon QuickSight supports this as a clustered bar chart with a line:

Or, as a stacked bar chart with a line:

Row-Level Security

Today’s release also adds support for Row-Level Security (RLS) in Amazon QuickSight Enterprise Edition. RLS allows control over data at a row level based on the permissions that are associated with the user who is accessing the data. With RLS, owners of a dataset can ensure that consumers of dashboards and analyses based on the dataset only view slices of data that they are authorized to. This removes the need for dataset owners to prepare separate data sets and dashboards for users (or groups of users) with different levels of access within the data.

You can use RLS for any dataset (SPICE or direct query) by simply associating a set of user access rules. These user-specific rules can be managed in a dataset (which can also be SPICE or direct query), which is linked to the dataset that is to be restricted. Let’s walk through an example to see how this works.

Using the earlier business data example, let’s consider a situation where Susan and Jane are two users in the company who need access to different views of the same data. Susan manages sales for the state of California and should be granted access to all sales data related to the state. Jane, on the other hand, is a salesperson who covers the Aquatics, Exercise & Fitness, and Outdoors categories for Washington and Oregon.

To apply RLS for this use case, the administrator can create a new rules dataset with a username field and the specific fields that should be used to filter the data. Based on the user personas above, the rules dataset will look as follows

Username Category State
Jane Aquatics, Exercise & Fitness, Outdoors WA, OR
Susan CA


After creating the rules dataset in Amazon QuickSight, the administrator can link the dataset that contains sales data with this rules dataset via the new Permissions option.

After the administrator selects and links the dataset rules, the target dataset is now always filtered by the rules specified. This means that when Jane accesses the system, she sees data related to the states she covers and the categories she handles.

Similarly, Susan now sees all categories, but only for the state of California. 

With RLS in place, a data administrator no longer has to create multiple datasets to serve such use cases and can also use the same dashboards/analyses for multiple users. For more information about RLS and details about dataset rules configuration, see the Amazon QuickSight documentation.

Learn more: To learn more about these capabilities and start using them in your dashboards, see the Amazon QuickSight User Guide. 

Stay engaged: If you have questions or suggestions, you can post them on the Amazon QuickSight discussion forum. 

Not an Amazon QuickSight user?

To get started for FREE, see quicksight.aws.


Amazon QuickSight Now Allows Users to Create Analyses from Dashboards and Import Custom Date Formats

Post Syndicated from Jose Kunnackal original https://aws.amazon.com/blogs/big-data/amazon-quicksight-now-allows-users-to-create-analyses-from-dashboards-and-import-custom-date-formats/

Today, we are excited to announce two new features in QuickSight that will allow increased flexibility in your interactions with visualizations and data.

Create analyses from dashboards

When we launched Amazon QuickSight in November 2016, it enabled users to quickly and easily create analyses and dashboards from their data. Analyses allows business users to slice and dice their data, whether from a direct query source or from SPICE. Dashboards allow these insights to be shared in a read-only manner across a large set of users, without the need to worry about managing authentication, scaling up servers or maintaining infrastructure.

Starting today, QuickSight will allow users to save the contents of a dashboard as an analysis within their account. As the user of a dashboard, this will allow you to create an analysis that contains all visuals from the dashboard. You may then modify the visuals, or add/delete visuals in order to customize the content to your preferences. If you are a new user of QuickSight, this also provides you the ability to start your self-service analytics journey in QuickSight with content that is highly relevant to you.

For data administrators who create and manage datasets and dashboards, this feature will reduce requests from individual users for customization/tweaks to the dashboards. When onboarding users to QuickSight for self-service analytics, this also allows administrators to provide sample dashboards that can form the basis of the user’s first analysis in QuickSight.

To be able to save dashboard content as analyses, users should have the permission to do so, together with access to the datasets that are used for the dashboard. Let’s take a look at how this works. Let’s consider Sarah, who has a business dashboard shared with her in QuickSight.

With the changes in this release, Tom, the dashboard author, has an option to allow Sarah to create analyses from this dashboard.

When enabled, this also shares the dataset with Sarah in read-only mode, so that she can explore the data further. This is done automatically when Tom enables Sarah’s ability to create analyses from the dashboard.

Once this permission is enabled, Sarah has the dataset available in her account, and also sees a new ‘Save as” option in her dashboard.

Clicking on this lets Sarah create a new analysis with all the visuals from the dashboard in her account and explore the data further!

With this release, we are also introducing the capability to view all the analyses and dashboards that access a dataset. A dataset owner can then revoke permissions to specific dashboards or analyses if needed.

Custom date formats

Today’s release also adds support for custom date formats. When importing data into QuickSight, a user can convert a non-standard datetime field into a date field by providing the format. Date formats in QuickSight are case sensitive and more details can be found in the documentation.

Learn more

To learn more about these capabilities and start using them in your dashboards, see the Amazon QuickSight User Guide.

Stay engaged

If you have questions or suggestions, you can post them on the Amazon QuickSight discussion forum.

Not an Amazon QuickSight user?

To get started for FREE, see quicksight.aws.