Tag Archives: KSI

Amazon QuickSight Adds Support for Combo Charts and Row-Level Security

Post Syndicated from Jose Kunnackal original https://aws.amazon.com/blogs/big-data/amazon-quicksight-adds-support-for-combo-charts-and-row-level-security/

We are excited to announce support for two new features in Amazon QuickSight: 1) Combo charts, the first visual type in QuickSight to support dual-axis visualization, and 2) Row-Level Security, which allows access control over data at the row level based on the user who is accessing QuickSight. Together, these features enable you to present more engaging and personalized dashboards in Amazon QuickSight, while enforcing stricter controls over data.

Combo charts

Amazon QuickSight now supports charts with bars and lines, which you can use to visualize metrics of different scale or numeric types. For example, you can view sales ($) and margin (%) figures for different product categories of a business on the same visual.

You can also add a field to group the bars by an additional category. Following the example above, a business might want to break up sales across product categories by state to understand the details better. Amazon QuickSight supports this as a clustered bar chart with a line:

Or, as a stacked bar chart with a line:

Row-Level Security

Today’s release also adds support for Row-Level Security (RLS) in Amazon QuickSight Enterprise Edition. RLS allows control over data at a row level based on the permissions that are associated with the user who is accessing the data. With RLS, owners of a dataset can ensure that consumers of dashboards and analyses based on the dataset only view slices of data that they are authorized to. This removes the need for dataset owners to prepare separate data sets and dashboards for users (or groups of users) with different levels of access within the data.

You can use RLS for any dataset (SPICE or direct query) by simply associating a set of user access rules. These user-specific rules can be managed in a dataset (which can also be SPICE or direct query), which is linked to the dataset that is to be restricted. Let’s walk through an example to see how this works.

Using the earlier business data example, let’s consider a situation where Susan and Jane are two users in the company who need access to different views of the same data. Susan manages sales for the state of California and should be granted access to all sales data related to the state. Jane, on the other hand, is a salesperson who covers the Aquatics, Exercise & Fitness, and Outdoors categories for Washington and Oregon.

To apply RLS for this use case, the administrator can create a new rules dataset with a username field and the specific fields that should be used to filter the data. Based on the user personas above, the rules dataset looks as follows:

Username | Category                               | State
Jane     | Aquatics, Exercise & Fitness, Outdoors | WA, OR
Susan    | (empty)                                | CA

 

After creating the rules dataset in Amazon QuickSight, the administrator can link the dataset that contains sales data with this rules dataset via the new Permissions option.

After the administrator selects and links the dataset rules, the target dataset is now always filtered by the rules specified. This means that when Jane accesses the system, she sees data related to the states she covers and the categories she handles.

Similarly, Susan now sees all categories, but only for the state of California. 
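
The filtering behaviour described above can be sketched in a few lines of Python. This is only an illustration of the rule semantics as the walkthrough describes them, where an empty rule field places no restriction on that column. The `RULES` and `SALES` structures, the sample sales figures, and the `visible_rows` helper are hypothetical and are not part of any QuickSight API.

```python
# Hypothetical in-memory stand-ins for the rules dataset and the sales dataset.
RULES = {
    "Jane": {"Category": {"Aquatics", "Exercise & Fitness", "Outdoors"},
             "State": {"WA", "OR"}},
    "Susan": {"Category": set(),  # empty: no restriction on category
              "State": {"CA"}},
}

SALES = [
    {"Category": "Aquatics", "State": "WA", "Sales": 120},
    {"Category": "Aquatics", "State": "CA", "Sales": 300},
    {"Category": "Golf", "State": "CA", "Sales": 80},
    {"Category": "Outdoors", "State": "OR", "Sales": 45},
    {"Category": "Golf", "State": "WA", "Sales": 60},
]


def visible_rows(username, rows=SALES, rules=RULES):
    """Return only the rows that the user's rules allow."""
    user_rules = rules.get(username)
    if user_rules is None:
        return []  # assumption for this sketch: no rule means no rows
    return [
        row for row in rows
        if all(not allowed or row[field] in allowed
               for field, allowed in user_rules.items())
    ]
```

Calling `visible_rows("Jane")` keeps only the Washington and Oregon rows in her three categories, while `visible_rows("Susan")` keeps every California row regardless of category.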

With RLS in place, a data administrator no longer has to create multiple datasets to serve such use cases and can also use the same dashboards/analyses for multiple users. For more information about RLS and details about dataset rules configuration, see the Amazon QuickSight documentation.

Learn more: To learn more about these capabilities and start using them in your dashboards, see the Amazon QuickSight User Guide. 

Stay engaged: If you have questions or suggestions, you can post them on the Amazon QuickSight discussion forum. 

Not an Amazon QuickSight user?

To get started for FREE, see quicksight.aws.

 

Amazon QuickSight Now Allows Users to Create Analyses from Dashboards and Import Custom Date Formats

Post Syndicated from Jose Kunnackal original https://aws.amazon.com/blogs/big-data/amazon-quicksight-now-allows-users-to-create-analyses-from-dashboards-and-import-custom-date-formats/

Today, we are excited to announce two new features in QuickSight that will allow increased flexibility in your interactions with visualizations and data.

Create analyses from dashboards

When we launched Amazon QuickSight in November 2016, it enabled users to quickly and easily create analyses and dashboards from their data. Analyses allow business users to slice and dice their data, whether from a direct query source or from SPICE. Dashboards allow these insights to be shared in a read-only manner across a large set of users, without the need to worry about managing authentication, scaling up servers, or maintaining infrastructure.

Starting today, QuickSight will allow users to save the contents of a dashboard as an analysis within their account. As the user of a dashboard, this will allow you to create an analysis that contains all visuals from the dashboard. You may then modify the visuals, or add/delete visuals in order to customize the content to your preferences. If you are a new user of QuickSight, this also provides you the ability to start your self-service analytics journey in QuickSight with content that is highly relevant to you.

For data administrators who create and manage datasets and dashboards, this feature will reduce requests from individual users for customization/tweaks to the dashboards. When onboarding users to QuickSight for self-service analytics, this also allows administrators to provide sample dashboards that can form the basis of the user’s first analysis in QuickSight.

To be able to save dashboard content as analyses, users should have the permission to do so, together with access to the datasets that are used for the dashboard. Let’s take a look at how this works. Let’s consider Sarah, who has a business dashboard shared with her in QuickSight.

With the changes in this release, Tom, the dashboard author, has an option to allow Sarah to create analyses from this dashboard.

When enabled, this also shares the dataset with Sarah in read-only mode, so that she can explore the data further. This is done automatically when Tom enables Sarah’s ability to create analyses from the dashboard.

Once this permission is enabled, Sarah has the dataset available in her account, and also sees a new “Save as” option in her dashboard.

Clicking on this lets Sarah create a new analysis with all the visuals from the dashboard in her account and explore the data further!

With this release, we are also introducing the capability to view all the analyses and dashboards that access a dataset. A dataset owner can then revoke permissions to specific dashboards or analyses if needed.

Custom date formats

Today’s release also adds support for custom date formats. When importing data into QuickSight, a user can convert a non-standard datetime field into a date field by providing the format. Date formats in QuickSight are case sensitive and more details can be found in the documentation.
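
To see why case sensitivity matters in a date format, here is a small sketch using Python's `datetime.strptime`. It is a stand-in only: QuickSight has its own format tokens, which are described in its documentation and are not reproduced here. The sample string `raw` is made up for illustration.

```python
from datetime import datetime

raw = "07/03/2017 14:05"

# In Python's syntax, %m is the month and %M is the minute.
parsed = datetime.strptime(raw, "%m/%d/%Y %H:%M")

# Same text, but with the case of two tokens swapped: the 07 is now read as
# the minute and the 05 as the month, and no error is raised.
swapped = datetime.strptime(raw, "%M/%d/%Y %H:%m")
```

Both calls succeed, but they produce different dates, which is why a format string needs to be checked against a few known values after import.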


Amazon QuickSight Now Supports Search, Filter Groups, and Amazon S3 Analytics Connector

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/amazon-quicksight-now-supports-search-filter-groups-and-amazon-s3-analytics-connector/

Today, I’m excited to share information about some new features in Amazon QuickSight. First, you can now search for datasets, analyses, and dashboards in Amazon QuickSight using the unified search box, making it faster and easier to find and access your data. Next, you can now create filter groups with multiple filter conditions that are evaluated together using the OR operation. Finally, you can now use the built-in Amazon S3 analytics connector to visualize your S3 storage access patterns across multiple S3 buckets and configurations within a single Amazon QuickSight dashboard to optimize for cost.

Search

You can now easily and quickly find and access your datasets, analyses, and dashboards using the unified search box in Amazon QuickSight. Type in what you’re looking for and you get a list of all matches in a unified view. From there, you can take actions such as creating an analysis from a dataset, modifying a dataset, or accessing an analysis or dashboard.

Filter groups

Filters are one of the most important features in Amazon QuickSight. Before this release, you could create multiple filters that were evaluated using the AND operation. With this release, you can now create multiple filters that are evaluated using the OR operation. This provides you with the flexibility to apply more complex filters to your data and visualizations. For example, you can create a chart that shows customers who have spent less than $100 OR made three or more purchases.
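
The difference between the two ways of combining filters can be sketched as follows. The customer records and the `low_spend` and `frequent` helpers are hypothetical and exist only to mirror the example in the text.

```python
# Made-up customer records for illustration.
customers = [
    {"name": "Ana", "spent": 40.0, "purchases": 1},
    {"name": "Ben", "spent": 250.0, "purchases": 5},
    {"name": "Cho", "spent": 900.0, "purchases": 2},
    {"name": "Dev", "spent": 75.0, "purchases": 4},
]


def low_spend(customer):
    """Spent less than $100."""
    return customer["spent"] < 100


def frequent(customer):
    """Made three or more purchases."""
    return customer["purchases"] >= 3


# Separate filters combined with AND: a row must pass both conditions.
and_result = [c["name"] for c in customers if low_spend(c) and frequent(c)]

# A filter group combined with OR: a row must pass at least one condition.
or_result = [c["name"] for c in customers if low_spend(c) or frequent(c)]
```

With this sample data the AND combination keeps a single customer, while the OR combination keeps three of the four.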

Amazon S3 analytics connector

In July, AWS introduced the ability to analyze and visualize your Amazon S3 storage access patterns using Amazon QuickSight in one click from the S3 console. Today, AWS released a dedicated S3 analytics connector in Amazon QuickSight. This connector allows you to import S3 analytics data for different buckets and configurations into a single Amazon QuickSight dataset. With this dataset, you can then create analyses and dashboards that track all of your S3 usage patterns in a single view.


 

Analyzing Salesforce Data with Amazon QuickSight

Post Syndicated from David McAmis original https://aws.amazon.com/blogs/big-data/analyzing-salesforce-data-with-amazon-quicksight/

Salesforce Sales Cloud is a powerful platform for managing customer data. One of the key functions that the platform provides is the ability to track customer opportunities. Opportunities in Salesforce are used to track revenue, sales pipelines, and other activities from the very first contact with a potential customer to a closed sale.

Amazon QuickSight is a rich data visualization tool that provides the ability to connect to Salesforce data, use it as a data source for creating analyses, stories, and dashboards, and easily share them with others in the organization. This post focuses on how to connect to Salesforce as a data source and create a useful opportunity dashboard, incorporating Amazon QuickSight features like relative date filters, Key Performance Indicator (KPI) charts, and more.

Walkthrough

In this post, you walk through the following tasks:

  • Creating a new data set based on Salesforce data
  • Creating your analysis and adding visuals
  • Creating an Amazon QuickSight dashboard
  • Working with filters

Note: For this walkthrough, I am using my own Salesforce.com Developer Edition account. You can sign up for your own free developer account at https://developer.salesforce.com/.

Creating a new Amazon QuickSight data set based on Salesforce data

To start, you need to create a new Amazon QuickSight data set. Sign in to Amazon QuickSight at https://quicksight.aws using the link from the home page. Enter your Amazon QuickSight account name and choose Continue. Next, enter your Email address or user name and password, then choose Sign In.

On the Amazon QuickSight start page, choose Manage Data, which takes you to a list of your data sets. Choose New Data Set, and choose Salesforce as your data source. Enter a data source name—in this example, I called mine “SFDC Opportunity.” Choose Create Data Source to open the Salesforce authentication page, where you can enter your Salesforce user name and password.

After you are authenticated to Salesforce, you are presented with a drop-down list that lets you select data from Reports or Objects. For this tutorial, choose Object. Scroll down in the list to choose the Opportunity object, and then choose Select.

To finish creating your data set, choose Visualize to go to where you can create a new Amazon QuickSight analysis from this data.

Creating your analysis and adding visuals

Now that you have acquired your data, it’s time to start working with your analysis. In Amazon QuickSight, an analysis is a container for a set of related visual stories. When you chose Visualize, a new analysis was created for you. This is where you start to create the visuals (charts, graphs, etc.) that will be the building blocks for your dashboard.

In Amazon QuickSight, Salesforce objects look like database tables. In the analysis that you just created, you can see the columns in the Fields list for the Opportunity object.

The Opportunity object in Salesforce has a number of default fields. Salesforce administrators can extend this object by adding other custom fields as required; these custom fields are usually marked with a “__c” (two underscores) at the end.

In the Fields List, you can see that Amazon QuickSight has divided the fields into Dimensions and Measures.  You use these to create your visualizations and dashboard. For this particular dashboard, you create five different visuals to display the data in a few different ways.

Opportunity by Stage

For the first visualization, you create a horizontal bar chart showing “Opportunity by Stage”. In the Fields List, choose the StageName dimension and the ExpectedRevenue measure. By default, this should create a horizontal bar chart for you, as shown in the following image.

Notice that this chart includes the Closed Won category, which we aren’t interested in showing. Choose the bar for Closed Won, and in the pop-up menu, choose Exclude Closed Won. This filters the chart to show only opportunities that are in progress.

It’s important to note that for this dashboard, we only want to show the opportunities that are not Closed Won. So in the menu bar on the left side, choose Filter.

By default, the filter that you just created was only applied to a single visualization. To change this, choose the filter, and then choose All Visuals from the drop-down list. This applies the filter to all visuals in the analysis.

To finish, select the chart title and rename the chart to Opportunity by Stage.
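
For readers who want to see what this visual computes, the following Python sketch reproduces the same aggregation by hand: it drops the Closed Won rows and sums ExpectedRevenue per StageName. The field names come from the Opportunity object used above; the sample records, stage values, and the `revenue_by_stage` helper are hypothetical.

```python
from collections import defaultdict

# Made-up opportunity records for illustration.
opportunities = [
    {"Name": "Acme renewal", "StageName": "Prospecting", "ExpectedRevenue": 1000.0},
    {"Name": "Globex upgrade", "StageName": "Closed Won", "ExpectedRevenue": 5000.0},
    {"Name": "Initech pilot", "StageName": "Negotiation/Review", "ExpectedRevenue": 2500.0},
    {"Name": "Umbrella expansion", "StageName": "Prospecting", "ExpectedRevenue": 750.0},
]


def revenue_by_stage(rows, excluded_stage="Closed Won"):
    """Sum ExpectedRevenue per stage, skipping the excluded stage."""
    totals = defaultdict(float)
    for row in rows:
        if row["StageName"] == excluded_stage:
            continue  # same effect as the Exclude filter in the visual
        totals[row["StageName"]] += row["ExpectedRevenue"]
    return dict(totals)


stage_totals = revenue_by_stage(opportunities)
```

Each key in `stage_totals` corresponds to one bar in the chart, and its value to the bar's length.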

Opportunity by Month

Next, you need to create a new visual to show “Opportunity by Month.” You use a vertical bar chart to display the data. On the Amazon QuickSight toolbar, choose Add, and then choose Add visual. For this visual, choose CloseDate from the dimensions and ExpectedRevenue from the measures.

Using the Visual Types menu, change the chart type to a Vertical Bar Chart. By default, the chart displays the revenue by year, but we want to break it down a bit further. Choose Field Wells, and using the CloseDate drop-down menu, change the Aggregate to Month.

With the change to a monthly aggregate, your chart should look something like the following:

Select the chart title and rename the chart to Opportunity by Month.

Expected Revenue

When working with Salesforce opportunities, there are two measures that are important to most sales managers—the first is the total amount associated with the opportunity, and the second is what the actual expected revenue will be. For the next visual, you use the KPI chart to display these measures.

Choose Add on the Amazon QuickSight toolbar, and then choose Add visual. From the measures, choose ExpectedRevenue, and then Amount. To change your visualization, go to the Visual Types menu and choose the Key Performance Indicator (KPI). Your visualization should change and be similar to the following:

Select the chart title and rename the chart to Expected Revenue.

Opportunity by Lead Source

Next, you need to look at where the opportunity actually came from. This helps your dashboard users understand where the leads are being generated from and their value to the business. For this visual, you use a Horizontal Bar Chart.

On the Amazon QuickSight toolbar, choose Add, and then choose Add visual. From the measures, choose Amount, and for the dimensions, choose LeadSource. To change your visualization, go to the Visual Types menu and choose the Horizontal Bar Chart. Your visualization should change and be similar to the following:

Note: If you can’t read the chart labels for the bars, grab the axis line and drag to resize.

Select the chart title and rename the chart to Opportunity by Lead Source.

Expected Revenue vs. Opportunity Amount

For the last visual, you look at the individual opportunities and how they contribute to the total pipeline. A tree map is a specialized chart type that lets your dashboard users see how each opportunity amount contributes to the whole. Additionally, you can highlight any difference between the Expected Revenue and the Amount by sizing the marks by the Amount and coloring them by the Expected Revenue.

On the Amazon QuickSight toolbar, choose Add, and then choose Add visual. From the measures, choose ExpectedRevenue and Amount. From the dimensions, choose Name. To change your visualization, go to the Visual Types menu and choose the Tree Map. Your visualization should change and be similar to the following:

Select the chart title and rename the chart to Expected Revenue vs Opportunity Amount.

Creating an Amazon QuickSight dashboard

Now that your visuals are created, it’s time to do the fun part—actually putting your Amazon QuickSight dashboard together. To create a dashboard, resize and position your visuals on the page, using the following layout:

To resize a visual, grab the handle in the lower-right corner and drag it to the height and width that you want.

To move your visual, use the grab bar at the top of the visual, as shown here:

When you are done resizing your visuals, your canvas should look something like this:

To create a dashboard, choose Share in the Amazon QuickSight toolbar. Then choose Create Dashboard. For this dashboard, give it a name of SFDC Opportunity Dashboard, and choose Create Dashboard. You are prompted to enter the email address or user name of the users you want to share this dashboard with.

Because we are just concentrating on the design at the moment, you can choose Cancel and share your dashboard later using the Share button on the dashboard toolbar.

Working with filters

There is one more feature that you can use when viewing your dashboard to make it even more useful. Earlier, when you were working with the Analysis, you added a filter to remove any opportunities that were tagged as Closed Won. Now, as you are viewing the dashboard, you add a filter that you can use to filter on a relative date.

This feature in Amazon QuickSight allows you to choose a time period (years, quarters, months, weeks, etc.) and then select from a list of relative time periods. For example, if you choose Year, you could set the filter options to Previous Year, This Year, Year to Date, or Last N Years.

This is especially handy for a Salesforce Opportunity dashboard, as you might want to filter the data using the Close Date field to see when the opportunity is actually set to close.
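
As a rough sketch of what a relative filter such as This Quarter resolves to, the Python below computes the date range for the quarter that contains a given day. It assumes calendar quarters starting in January, April, July, and October; how QuickSight defines the boundaries internally is not stated in this post. The `this_quarter` and `closes_this_quarter` helpers are hypothetical.

```python
from datetime import date, timedelta


def this_quarter(today):
    """Return (first_day, last_day) of the calendar quarter containing today."""
    first_month = 3 * ((today.month - 1) // 3) + 1
    start = date(today.year, first_month, 1)
    # First day of the next quarter, rolling over into the next year if needed.
    if first_month == 10:
        next_start = date(today.year + 1, 1, 1)
    else:
        next_start = date(today.year, first_month + 3, 1)
    return start, next_start - timedelta(days=1)


def closes_this_quarter(close_date, today):
    """True if close_date falls inside the quarter that contains today."""
    start, end = this_quarter(today)
    return start <= close_date <= end
```

Because the range is derived from the current date each time, the same filter keeps showing the current quarter without being edited.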

To create a relative date filter, choose Filter on the toolbar. Choose the filter icon, and then choose CloseDate, as shown in the following image:

At the top of the Edit Filter pane, change the drop-down list to apply the filter to All Visuals. The default filter type is Time Range, so use the drop-down list to change the filter type to Relative Dates.  For the time period, choose Quarters. To view all the current opportunities in your dashboard, choose the option for This Quarter, and choose Apply.

With the date filter in place, you have the final component for your dashboard, which should look something like the following example:

It’s important to note that at this point, you have added the filter when viewing the dashboard. If you think this is something that other users might want to do, you can go back to your Amazon QuickSight Analysis and add the filter there—that way it will be available for all dashboard users.

Summary

In this post, you learned how to connect to Salesforce data and create a basic dashboard. You can apply the same techniques to create analyses and dashboards from all different types of Salesforce data and objects. Whether you want to analyze your Salesforce account demographics or where your leads are coming from, or evaluate any other data stored in Salesforce, Amazon QuickSight helps you quickly connect to and visualize your data with only a few clicks.

 


Additional Reading

Learn how to visualize Amazon S3 analytics data with Amazon QuickSight!


About the Author

David McAmis is a Big Data & Analytics Consultant with Amazon Web Services. He works with customers to develop scalable platforms to gather, process and analyze data on AWS.

 

 

 

 

Top 10 Most Obvious Hacks of All Time (v0.9)

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/07/top-10-most-obvious-hacks-of-all-time.html

For teaching hacking/cybersecurity, I thought I’d create a list of the most obvious hacks of all time. Not the best hacks, the most sophisticated hacks, or the hacks with the biggest impact, but the most obvious hacks: ones that even the least knowledgeable among us should be able to understand. Below I propose some hacks that fit this bill, though in no particular order.

The reason I’m writing this is that my niece wants me to teach her some hacking. I thought I’d start with the obvious stuff first.

Shared Passwords

If you use the same password for every website, and one of those websites gets hacked, then the hacker has your password for all your websites. The reason your Facebook account got hacked wasn’t because of anything Facebook did, but because you used the same email-address and password when creating an account on “beagleforums.com”, which got hacked last year.

I’ve heard people say “I’m secure, because I choose a complex password and use it everywhere”. No, this is the very worst thing you can do. Sure, you can use the same password on all sites you don’t care much about, but for Facebook, your email account, and your bank, you should have a unique password, so that when other sites get hacked, your important sites are secure.

And yes, it’s okay to write down your passwords on paper.

Tools: HaveIBeenPwned.com

PIN encrypted PDFs

My accountant emails PDF statements encrypted with the last 4 digits of my Social Security Number. This is not encryption — a 4 digit number has only 10,000 combinations, and a hacker can guess all of them in seconds.
PIN numbers for ATM cards work because ATM machines are online, and the machine can reject your card after four guesses. PIN numbers don’t work for documents, because they are offline — the hacker has a copy of the document on their own machine, disconnected from the Internet, and can continue making bad guesses with no restrictions.
Passwords protecting documents must be long enough that even trillions upon trillions of guesses are insufficient to find them.
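
To make the 10,000-combination point concrete, here is a minimal Python sketch of an offline guessing loop. The SHA-256 check in `pin_digest` is only a stand-in for whatever test a protected document performs on a password (real PDF encryption derives its key differently), and the PIN `6789` is made up.

```python
import hashlib


def pin_digest(pin):
    # Stand-in for the check a protected document performs on a password.
    return hashlib.sha256(pin.encode("ascii")).digest()


def crack_pin(target):
    """Try every 4-digit PIN in order; return (pin, guesses_used)."""
    for guesses, number in enumerate(range(10_000), start=1):
        candidate = f"{number:04d}"
        if pin_digest(candidate) == target:
            return candidate, guesses
    return None, 10_000


secret = pin_digest("6789")  # hypothetical last four digits
found, attempts = crack_pin(secret)
```

Nothing in the loop can lock the attacker out, which is the difference between a file on the attacker's own machine and an ATM.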

Tools: Hashcat, John the Ripper

SQL and other injection

The lazy way of combining websites with databases is to combine user input with an SQL statement. This combines code with data, so the obvious consequence is that hackers can craft data to mess with the code.
No, this isn’t obvious to the general public, but it should be obvious to programmers. The moment you write code that adds unfiltered user-input to an SQL statement, the consequence should be obvious. Yet, “SQL injection” has remained one of the most effective hacks for the last 15 years because somehow programmers don’t understand the consequence.
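
A minimal sketch of the problem, using Python's built-in `sqlite3` module. The `users` table, its contents, and the two lookup functions are hypothetical; the point is only the contrast between pasting input into the statement and passing it as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])


def lookup_unsafe(name):
    # User input is pasted into the statement, so it can change the code.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(name):
    # The placeholder keeps the input as data; it is never parsed as SQL.
    return [row[0] for row in conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,))]


# Crafted "name" that closes the quote and adds a condition that is always true.
payload = "nobody' OR '1'='1"
```

Passing `payload` to `lookup_unsafe` returns every secret in the table, while `lookup_safe` treats it as an odd-looking name and returns nothing.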
CGI shell injection is a similar issue. Back in early days, when “CGI scripts” were a thing, it was really important, but these days, not so much, so I just included it with SQL. The consequence of executing shell code should’ve been obvious, but weirdly, it wasn’t. The IT guy at the company I worked for back in the late 1990s came to me and asked “this guy says we have a vulnerability, is he full of shit?”, and I had to answer “no, he’s right — obviously so”.

XSS (“Cross Site Scripting”) [*] is another injection issue, but this time at somebody’s web browser rather than a server. It works because websites will echo back what is sent to them. For example, if you search for Cross Site Scripting with the URL https://www.google.com/search?q=cross+site+scripting, then you’ll get a page back from the server that contains that string. If the string is JavaScript code rather than text, then some servers (though not Google) send back the code in the page in a way that it’ll be executed. This is most often used to hack somebody’s account: you send them an email or tweet a link, and when they click on it, the JavaScript gives control of the account to the hacker.
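
The echo problem can be sketched in Python as follows. The two page-building functions are hypothetical and stand in for a server's search results page; `html.escape` is from the standard library.

```python
import html


def results_page_unsafe(query):
    # Echoes the search term straight into the HTML.
    return "<p>Results for: " + query + "</p>"


def results_page_safe(query):
    # html.escape turns <, >, & and quotes into entities, so the browser
    # displays the input as text instead of running it.
    return "<p>Results for: " + html.escape(query) + "</p>"


attack = "<script>alert(1)</script>"
unsafe_page = results_page_unsafe(attack)
safe_page = results_page_safe(attack)
```

The unsafe page contains a live script tag; the safe page contains the same characters as harmless text.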

Cross site injection issues like this should probably be their own category, but I’m including it here for now.

More: Wikipedia on SQL injection, Wikipedia on cross site scripting.
Tools: Burpsuite, SQLmap

Buffer overflows

In the C programming language, programmers first create a buffer, then read input into it. If the input is longer than the buffer, then it overflows. The extra bytes overwrite other parts of the program, letting the hacker run code.
Again, it’s not a thing the general public is expected to know about, but is instead something C programmers should be expected to understand. They should know that it’s up to them to check the length and stop reading input before it overflows the buffer, that there’s no language feature that takes care of this for them.
We are three decades after the first major buffer overflow exploits, so there is no excuse for C programmers not to understand this issue.
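
The mechanism can be imitated in Python with a toy memory layout. This is a simulation, not real C behavior: the `bytearray` stands in for a buffer followed by adjacent program state, and every name here is hypothetical.

```python
BUFFER_SIZE = 8


def make_memory():
    # Toy layout: 8 bytes of buffer followed by 4 bytes standing in for
    # adjacent program state (for example, a saved return address).
    return bytearray(b"\x00" * BUFFER_SIZE + b"SAFE")


def read_unchecked(memory, data):
    # Copies every input byte, like a C read with no length check.
    for i, byte in enumerate(data):
        memory[i] = byte


def read_checked(memory, data):
    # Stops at the end of the buffer: the check the programmer must write.
    for i, byte in enumerate(data[:BUFFER_SIZE]):
        memory[i] = byte


overflowed = make_memory()
read_unchecked(overflowed, b"AAAAAAAAEVIL")

protected = make_memory()
read_checked(protected, b"AAAAAAAAEVIL")
```

After the unchecked read, the four bytes past the buffer hold the attacker's data; after the checked read, they are untouched. Python itself would raise an error for input running past the end of the `bytearray`, which is the kind of language-level protection C does not provide.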

What makes these particularly obvious is the way they are wrapped in exploits, like in Metasploit. While it’s obvious that the bug is a bug, actually exploiting it can take some very non-obvious skill. However, once that exploit is written, any trained monkey can press a button and run the exploit. That’s where we get the insult “script kiddie” from, referring to wannabe-hackers who never learn enough to write their own exploits, but who spend a lot of time running the exploit scripts written by better hackers than they.

More: Wikipedia on buffer overflow, Wikipedia on script kiddie,  “Smashing The Stack For Fun And Profit” — Phrack (1996)
Tools: bash, Metasploit

SendMail DEBUG command (historical)

The first popular email server in the 1980s was called “SendMail”. It had a feature whereby if you sent a “DEBUG” command to it, it would execute any code following the command. The consequence of this was obvious: hackers could (and did) upload code to take control of the server. This was used in the Morris Worm of 1988. Most Internet machines of the day ran SendMail, so the worm spread fast, infecting most machines.
This bug was mostly ignored at the time. It was thought of as a theoretical problem that might only rarely be used to hack a system. Part of the motivation of the Morris Worm was to demonstrate the consequences of such problems, consequences that should’ve been obvious but somehow were rejected by everyone.

More: Wikipedia on Morris Worm

Email Attachments/Links

I’m conflicted whether I should add this or not, because here’s the deal: you are supposed to click on attachments and links within emails. That’s what they are there for. The difference between good and bad attachments/links is not obvious. Indeed, easy-to-use email systems make detecting the difference harder.
On the other hand, the consequences of bad attachments/links are obvious. Worms like ILOVEYOU spread so easily because people trusted attachments coming from their friends, and ran them.
We have no solution to the problem of bad email attachments and links. Viruses and phishing are pervasive problems. Yet, we know why they exist.

Default and backdoor passwords

The Mirai botnet was caused by surveillance-cameras having default and backdoor passwords, and being exposed to the Internet without a firewall. The consequence should be obvious: people will discover the passwords and use them to take control of the bots.
Surveillance-cameras have the problem that they are usually exposed to the public, and can’t be reached without a ladder — often a really tall ladder. Therefore, you don’t want a button consumers can press to reset to factory defaults. You want a remote way to reset them. Therefore, they put backdoor passwords to do the reset. Such passwords are easy for hackers to reverse-engineer, and hence, take control of millions of cameras across the Internet.
The same reasoning applies to “default” passwords. Many users will not change the defaults, leaving a ton of devices hackers can hack.

Masscan and background radiation of the Internet

I’ve written a tool that can easily scan the entire Internet in a short period of time. It surprises people that this is possible, but it’s obvious from the numbers. Internet addresses are only 32 bits long, or roughly 4 billion combinations. A fast Internet link can easily handle 1 million packets per second, so the entire Internet can be scanned in about 4000 seconds, a little more than an hour. It’s basic math.
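
The arithmetic can be checked directly. This sketch assumes one packet per address, no retransmissions, and the full 32-bit address space (real scans skip reserved ranges); the variable names are made up.

```python
ADDRESS_BITS = 32
PACKETS_PER_SECOND = 1_000_000  # the rate quoted in the text

addresses = 2 ** ADDRESS_BITS            # every possible IPv4 address
seconds = addresses / PACKETS_PER_SECOND
minutes = seconds / 60
```

The exact figure is a little under 4,300 seconds, or roughly 72 minutes, which matches the rounded estimate above.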
Because it’s so easy, many people do it. If you monitor your Internet link, you’ll see a steady trickle of packets coming in from all over the Internet, especially Russia and China, from hackers scanning the Internet for things they can hack.
People’s reaction to this scanning is weirdly emotional, taking it personally, such as:
  1. Why are they hacking me? What did I do to them?
  2. Great! They are hacking me! That must mean I’m important!
  3. Grrr! How dare they?! How can I hack them back for some retribution!?

I find this odd, because obviously such scanning isn’t personal, the hackers have no idea who you are.

Tools: masscan, firewalls

Packet-sniffing, sidejacking

If you connect to the Starbucks WiFi, a hacker nearby can easily eavesdrop on your network traffic, because it’s not encrypted. Windows even warns you about this, in case you weren’t sure.

At DefCon, they have a “Wall of Sheep”, where they show passwords from people who logged onto stuff using the insecure “DefCon-Open” network. They call them “sheep” for not grasping the basic fact that unencrypted traffic is unencrypted.

To be fair, it’s actually non-obvious to many people. Even if the WiFi itself is not encrypted, SSL traffic is. They expect their services to be encrypted, without them having to worry about it. And in fact, most are, especially Google, Facebook, Twitter, Apple, and other major services that won’t allow you to log in anymore without encryption.

But many services (especially old ones) may not be encrypted. Unless users check and verify them carefully, they’ll happily expose passwords.

What’s interesting about this is the situation 10 years ago, when most services only used SSL to encrypt the passwords, but then used unencrypted connections after that, using “cookies”. This allowed the cookies to be sniffed and stolen, allowing other people to share the login session. I used this on stage at BlackHat to connect to somebody’s GMail session. Google, and other major websites, fixed this soon after. But it should never have been a problem, because the sidejacking of cookies should have been obvious.

Tools: Wireshark, dsniff

Stuxnet LNK vulnerability

Again, this issue isn’t obvious to the public, but it should’ve been obvious to anybody who knew how Windows works.
When Windows loads a .dll, it first calls the function DllMain(). A Windows link file (.lnk) can load icons/graphics from the resources in a .dll file. It does this by loading the .dll file, thus calling DllMain. Thus, a hacker could put on a USB drive a .lnk file pointing to a .dll file, and thus, cause arbitrary code execution as soon as a user inserted a drive.
I say this is obvious because I did this, created .lnks that pointed to .dlls, but without hostile DllMain code. The consequence should’ve been obvious to me, but I totally missed the connection. We all missed the connection, for decades.

Social Engineering and Tech Support [* * *]

After posting this, many people have pointed out “social engineering”, especially of “tech support”. This probably should be up near #1 in terms of obviousness.

The classic example of social engineering is when you call tech support and tell them you’ve lost your password, and they reset it for you with a minimum of questions proving who you are. For example, you set the volume on your computer really loud and play the sound of a crying baby in the background and appear to be a bit frazzled and incoherent, which explains why you aren’t answering the questions they are asking. They, understanding your predicament as a new parent, will go the extra mile in helping you, resetting “your” password.

One of the interesting consequences is how it affects domain names (DNS). It’s quite easy in many cases to call up the registrar and convince them to transfer a domain name. This has been used in lots of hacks. It’s really hard to defend against. If a registrar charges only $9/year for a domain name, then it really can’t afford to provide very good tech support — or very secure tech support — to prevent this sort of hack.

Social engineering is such a huge and obvious problem that it’s outside the scope of this document. Just google it to find example after example.

A related issue that perhaps deserves its own section is OSINT [*], or “open-source intelligence”, where you gather public information about a target. For example, on the day the bank manager is out on vacation (which you got from their Facebook post) you show up and claim to be a bank auditor, and are shown into their office where you grab their backup tapes. (We’ve actually done this.)

More: Wikipedia on Social Engineering, Wikipedia on OSINT, “How I Won the Defcon Social Engineering CTF” — blogpost (2011), “Questioning 42: Where’s the Engineering in Social Engineering of Namespace Compromises” — BSidesLV talk (2016)

Blue-boxes (historical) [*]

Telephones historically used what we call “in-band signaling”. That’s why when you dial on an old phone, it makes sounds — those sounds are sent no differently than the way your voice is sent. Thus, it was possible to make tone generators to do things other than simply dial calls. Early hackers (in the 1970s) would make tone-generators called “blue-boxes” and “black-boxes” to make free long distance calls, for example.

These days, “signaling” and “voice” are digitized, then sent as separate channels or “bands”. This is called “out-of-band signaling”. You can’t trick the phone system by generating tones. When your iPhone makes sounds when you dial, it’s entirely for your benefit and has nothing to do with how it signals the cell tower to make a call.

Early hackers, like the founders of Apple, are famous for having started their careers making such “boxes” for tricking the phone system. The problem was obvious back in the day, which is why as the phone system moved from analog to digital, the problem was fixed.

More: Wikipedia on blue box, Wikipedia article on Steve Wozniak.

Thumb drives in parking lots [*]

A simple trick is to put a virus on a USB flash drive, and drop it in a parking lot. Somebody is bound to notice it, stick it in their computer, and open the file.

This can be extended with tricks. For example, you can put a file labeled “third-quarter-salaries.xlsx” on the drive that requires macros to be run in order to open. It’s irresistible to other employees who want to know what their peers are being paid, so they’ll bypass any warning prompts in order to see the data.

Another example is to go online and order custom USB sticks printed with the logo of the target company, making them seem more trustworthy.

We also did a trick of taking an Adobe Flash game, “Punch the Monkey”, and replacing the monkey with the logo of a competitor of our target. They not only played the game (infecting themselves with our virus), but gave it to others inside the company to play, infecting others, including the CEO.

Thumb drives like this have been used in many incidents, such as Russians hacking military headquarters in Afghanistan. It’s really hard to defend against.

More: “Computer Virus Hits U.S. Military Base in Afghanistan” — USNews (2008), “The Return of the Worm That Ate The Pentagon” — Wired (2011), DoD Bans Flash Drives — Stripes (2008)

Googling [*]

Search engines like Google will index your website — your entire website. Frequently companies put things on their website without much protection because they are nearly impossible for users to find. But Google finds them, then indexes them, causing them to pop up with innocent searches.
There are books written on “Google hacking” explaining what search terms to look for, like “not for public release”, in order to find such documents.

More: Wikipedia entry on Google Hacking, “Google Hacking” book.

URL editing [*]

At the top of every browser is what’s called the “URL”. You can change it. Thus, if you see a URL that looks like this:

http://www.example.com/documents?id=138493

Then you can edit it to see the next document on the server:

http://www.example.com/documents?id=138494

The owner of the website may think they are secure, because nothing points to this document, so the Google search won’t find it. But that doesn’t stop a user from manually editing the URL.
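
To make the point concrete, here is a small sketch of the same edit done programmatically with Python's standard `urllib.parse`. The function name `next_document_url` is invented for the example, and it assumes the identifier is a plain integer in a query parameter called `id`, as in the URL above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def next_document_url(url, param="id"):
    """Return the same URL with the numeric query parameter incremented by one."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Assumes the parameter is present and holds an integer
    query[param] = [str(int(query[param][0]) + 1)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


edited = next_document_url("http://www.example.com/documents?id=138493")
print(edited)
```

Nothing here is clever, which is the point: the only real defense is for the server to check whether the requester is allowed to see each document.
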
An example of this is a big Fortune 500 company that posts the quarterly results to the website an hour before the official announcement. Simply editing the URL from previous financial announcements allows hackers to find the document, then buy/sell the stock as appropriate in order to make a lot of money.
Another example is the classic case of Andrew “Weev” Auernheimer who did this trick in order to download the account email addresses of early owners of the iPad, including movie stars and members of the Obama administration. It’s an interesting legal case because on one hand, techies consider this so obvious as to not be “hacking”. On the other hand, non-techies, especially judges and prosecutors, believe this to be obviously “hacking”.

DDoS, spoofing, and amplification [*]

For decades now, online gamers have figured out an easy way to win: just flood the opponent with Internet traffic, slowing their network connection. This is called a DoS, which stands for “Denial of Service”. DoSing game competitors is often a teenager’s first foray into hacking.
A variant of this is when you hack a bunch of other machines on the Internet, then command them to flood your target. (The hacked machines are often called a “botnet”, a network of robot computers.) This is called DDoS, or “Distributed DoS”. At this point, it gets quite serious: instead of competitive gamers, hackers can take down entire businesses. Extortion scams, where hackers DDoS a website and then demand payment to stop, are a common way hackers earn money.
Another form of DDoS is “amplification”. Sometimes when you send a packet to a machine on the Internet it’ll respond with a much larger response, either a very large packet or many packets. The hacker can then send a packet to many of these sites, “spoofing” or forging the IP address of the victim. This causes all those sites to then flood the victim with traffic. Thus, with a small amount of outbound traffic, the hacker can flood the inbound traffic of the victim.
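
The arithmetic behind amplification is simple enough to sketch. The packet sizes and bandwidth below are invented purely for illustration and do not describe any specific protocol.

```python
def amplification_factor(request_bytes, response_bytes):
    """How many bytes reach the victim for each byte the attacker sends."""
    return response_bytes / request_bytes


def victim_inbound_mbps(attacker_outbound_mbps, factor):
    """Traffic arriving at the victim, given the attacker's outbound rate."""
    return attacker_outbound_mbps * factor


# Hypothetical numbers: a 60-byte spoofed request triggers a 3,000-byte reply
factor = amplification_factor(request_bytes=60, response_bytes=3000)
flood = victim_inbound_mbps(attacker_outbound_mbps=10, factor=factor)
print(factor, flood)
```

With these made-up numbers, 10 Mbps of outbound traffic turns into 500 Mbps aimed at the victim.
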
This is one of those things that has worked for 20 years, because it’s so obvious teenagers can do it, yet there is no obvious solution. President Trump’s executive order on cybersecurity specifically demanded that his government come up with a report on how to address this, but it’s unlikely that they’ll come up with any useful strategy.

More: Wikipedia on DDoS, Wikipedia on Spoofing

Conclusion

Tweet me (@ErrataRob) your obvious hacks, so I can add them to the list.

Amazon QuickSight Now Supports Amazon Athena in EU (Ireland), Count Distinct, and Week Aggregation

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/amazon-quicksight-now-supports-amazon-athena-in-eu-ireland-count-distinct-and-week-aggregation/

Today, I’m excited to share a couple of new features in Amazon QuickSight. First, with this release, we expanded connectivity options by adding Amazon Athena support in the EU (Ireland) Region. Additionally, you can now use Count Distinct on your dimensions and metrics in the visualizations and aggregate date fields by week for SPICE data sets.

Athena in Ireland

Athena is one of the most popular data sources used by QuickSight customers. It allows you to deploy a serverless BI and analytics architecture for your operational and business data. With this release, the Athena connector is now available in the EU (Ireland) Region. You can connect QuickSight to your Athena databases and tables in the region and start visualizing your data in a matter of seconds.

Count Distinct

You can now perform aggregations using Count Distinct in the visualizations, one of the top requests from users. To use Count Distinct, simply select Count Distinct as the aggregation on the visual axis or in the field well. Count Distinct is supported for both direct queries and SPICE data sets. You can apply it to strings and measures. It is available for all supported visualization types.
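
For readers unfamiliar with the distinction, the sketch below contrasts a plain count with a distinct count per group in plain Python. It only illustrates the aggregation itself, not how QuickSight computes it, and the sample rows and field names are made up.

```python
from collections import defaultdict


def count_and_count_distinct(rows, group_field, value_field):
    """Return {group: (count, count_distinct)} for value_field within each group."""
    counts = defaultdict(int)
    distinct = defaultdict(set)
    for row in rows:
        key = row[group_field]
        counts[key] += 1
        distinct[key].add(row[value_field])
    return {key: (counts[key], len(distinct[key])) for key in counts}


# Hypothetical order rows: alice ordered twice in the EU
orders = [
    {"region": "EU", "customer": "alice"},
    {"region": "EU", "customer": "alice"},
    {"region": "EU", "customer": "bob"},
    {"region": "US", "customer": "carol"},
]
result = count_and_count_distinct(orders, "region", "customer")
print(result)
```

Here the EU has three orders but only two distinct customers, which is the number Count Distinct reports.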

Date aggregation by week

Time series line charts are one of the most common ways for customers to report on business trends. In addition to Year, Month, Day and Hour, you can now aggregate date fields by WEEK and visualize your data at a weekly granularity.
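
As an illustration of what weekly granularity means, the following sketch buckets dates by the first day of their week in plain Python. It assumes weeks start on Monday (the ISO convention), which may differ from the week boundary QuickSight uses. The sample dates are invented.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(d):
    """Return the Monday of the week containing d (Monday start is an assumption)."""
    return d - timedelta(days=d.weekday())


# Hypothetical event dates spanning two weeks
sales_dates = [date(2017, 5, 15), date(2017, 5, 17), date(2017, 5, 21), date(2017, 5, 22)]
weekly = Counter(week_start(d) for d in sales_dates)
for start, count in sorted(weekly.items()):
    print(start.isoformat(), count)
```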

Learn more

To learn more about these capabilities and start using them in your dashboards, see the QuickSight User Guide.

Stay engaged

If you have questions or suggestions, you can post them on the QuickSight Discussion Forum.

Not a QuickSight user?

To get started for FREE, see quicksight.aws.

 

Amazon Redshift Spectrum Extends Data Warehousing Out to Exabytes—No Loading Required

Post Syndicated from Maor Kleider original https://aws.amazon.com/blogs/big-data/amazon-redshift-spectrum-extends-data-warehousing-out-to-exabytes-no-loading-required/

When we first looked into the possibility of building a cloud-based data warehouse many years ago, we were struck by the fact that our customers were storing ever-increasing amounts of data, and yet only a small fraction of that data ever made it into a data warehouse or Hadoop system for analysis. We saw that this wasn’t just a cloud-specific anomaly. It was also true in the broader industry, where the growth rate of the enterprise storage market segment greatly surpassed that of the data warehousing market segment.

We dubbed this the “dark data” problem. Our customers knew that there was untapped value in the data they collected; why else would they spend money to store it? But the systems available to them to analyze this data were simply too slow, complex, and expensive for them to use on all but a select subset of this data. They were storing it with optimistic hope that, someday, someone would find a solution.

Amazon Redshift became one of the fastest-growing AWS services because it helped solve the dark data problem. It was at least an order of magnitude less expensive and faster than most alternatives available. And Amazon Redshift was fully managed from the start—you didn’t have to worry about capacity, provisioning, patching, monitoring, backups, and a host of other DBA headaches. Many customers, including Vevo, Yelp, Redfin, and Edmunds, migrated to Amazon Redshift to improve query performance, reduce DBA overhead, and lower the cost of analytics.

And our customers’ data continues to grow at a very fast rate. Across the board, gigabytes to petabytes, the average Amazon Redshift customer doubles the data analyzed every year. That’s why we implement features that help customers handle their growing data, for example to double the query throughput or improve the compression ratios from 3x to 4x. That gives our customers some time before they have to consider throwing away data or removing it from their analytic environments. However, there is an increasing number of AWS customers who each generate a petabyte of data every day—that’s an exabyte in only three years. There wasn’t a solution for customers like that. If your data is doubling every year, it’s not long before you have to find new, disruptive approaches that transform the cost, performance, and simplicity curves for managing data.
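
The growth figures above are easy to check with a few lines of arithmetic. The sketch assumes decimal units (1 exabyte = 1,000 petabytes), and the helper names are made up for the example.

```python
PB_PER_EB = 1000  # decimal units: 1 exabyte = 1,000 petabytes


def days_to_accumulate(target_pb, pb_per_day):
    """Whole days needed to accumulate target_pb at a constant daily rate."""
    return -(-target_pb // pb_per_day)  # ceiling division


def size_after_doubling(start_pb, years):
    """Data volume after doubling once per year."""
    return start_pb * 2 ** years


days = days_to_accumulate(PB_PER_EB, pb_per_day=1)
years = days / 365
ten_years = size_after_doubling(start_pb=1, years=10)
print(days, round(years, 2), ten_years)
```

At one petabyte per day, an exabyte accumulates in 1,000 days, a little under three years. A data set that doubles every year grows about a thousandfold in ten years.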

Let’s look at the options available today. You can use Hadoop-based technologies like Apache Hive with Amazon EMR. This is actually a pretty great solution because it makes it easy and cost-effective to operate directly on data in Amazon S3 without ingestion or transformation. You can spin up clusters as you wish when you need, and size them right for that specific job you’re running. These systems are great at high scale-out processing like scans, filters, and aggregates. On the other hand, they’re not that good at complex query processing. For example, join processing requires data to be shuffled across nodes—for a large amount of data and large numbers of nodes that gets very slow. And joins are intrinsic to any meaningful analytics problem.

You can also use a columnar MPP data warehouse like Amazon Redshift. These systems make it simple to run complex analytic queries with orders of magnitude faster performance for joins and aggregations performed over large datasets. Amazon Redshift, in particular, leverages high-performance local disks, sophisticated query execution, and join-optimized data formats. Because it is just standard SQL, you can keep using your existing ETL and BI tools. But you do have to load data, and you have to provision clusters against the storage and CPU requirements you need.

Both solutions have powerful attributes, but they force you to choose which attributes you want. We see this as a “tyranny of OR.” You can have the throughput of local disks OR the scale of Amazon S3. You can have sophisticated query optimization OR high-scale data processing. You can have fast join performance with optimized formats OR a range of data processing engines that work against common data formats. But you shouldn’t have to choose. At this scale, you really can’t afford to choose. You need “all of the above.”

Redshift Spectrum

We built Redshift Spectrum to end this “tyranny of OR.” With Redshift Spectrum, Amazon Redshift customers can easily query their data in Amazon S3. Like Amazon EMR, you get the benefits of open data formats and inexpensive storage, and you can scale out to thousands of nodes to pull data, filter, project, aggregate, group, and sort. Like Amazon Athena, Redshift Spectrum is serverless and there’s nothing to provision or manage. You just pay for the resources you consume for the duration of your Redshift Spectrum query. Like Amazon Redshift itself, you get the benefits of a sophisticated query optimizer, fast access to data on local disks, and standard SQL. And like nothing else, Redshift Spectrum can execute highly sophisticated queries against an exabyte of data or more—in just minutes.

Redshift Spectrum is a built-in feature of Amazon Redshift, and your existing queries and BI tools will continue to work seamlessly. Under the covers, we manage a fleet of thousands of Redshift Spectrum nodes spread across multiple Availability Zones. These are transparently scaled and allocated to your queries based on the data that you need to process, with no provisioning or commitments. Redshift Spectrum is also highly concurrent—you can access your Amazon S3 data from any number of Amazon Redshift clusters.

The life of a Redshift Spectrum query

It all starts when Redshift Spectrum queries are submitted to the leader node of your Amazon Redshift cluster. The leader node optimizes, compiles, and pushes the query execution to the compute nodes in your Amazon Redshift cluster. Next, the compute nodes obtain the information describing the external tables from your data catalog, dynamically pruning nonrelevant partitions based on the filters and joins in your queries. The compute nodes also examine the data available locally and push down predicates to efficiently scan only the relevant objects in Amazon S3.

The Amazon Redshift compute nodes then generate multiple requests depending on the number of objects that need to be processed, and submit them concurrently to Redshift Spectrum, which pools thousands of Amazon EC2 instances per AWS Region. The Redshift Spectrum worker nodes scan, filter, and aggregate your data from Amazon S3, streaming required data for processing back to your Amazon Redshift cluster. Then, the final join and merge operations are performed locally in your cluster and the results are returned to your client.

Redshift Spectrum’s architecture offers several advantages. First, it elastically scales compute resources separately from the storage layer in Amazon S3. Second, it offers significantly higher concurrency because you can run multiple Amazon Redshift clusters and query the same data in Amazon S3. Third, Redshift Spectrum leverages the Amazon Redshift query optimizer to generate efficient query plans, even for complex queries with multi-table joins and window functions. Fourth, it operates directly on your source data in its native format (Parquet, RCFile, CSV, TSV, Sequence, Avro, RegexSerDe and more to come soon). This means that no data loading or transformation is needed. This also eliminates data duplication and associated costs. Fifth, operating on open data formats gives you the flexibility to leverage other AWS services and execution engines across your various teams to collaborate on the same data in Amazon S3. You get all of this, and because Redshift Spectrum is a feature of Amazon Redshift, you get the same level of end-to-end security, compliance, and certifications as with Amazon Redshift.

Designed for performance and cost-effectiveness

With Amazon Redshift Spectrum, you pay only for the queries you run against the data that you actually scan. We encourage you to leverage file partitioning, columnar data formats, and data compression to significantly minimize the amount of data scanned in Amazon S3. This is important for data warehousing because it dramatically improves query performance and reduces cost. Partitioning your data in Amazon S3 by date, time, or any other custom keys enables Redshift Spectrum to dynamically prune nonrelevant partitions to minimize the amount of data processed. If you store data in a columnar format, such as Parquet, Redshift Spectrum scans only the columns needed by your query, rather than processing entire rows. Similarly, if you compress your data using one of Redshift Spectrum’s supported compression algorithms, less data is scanned.
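
The effect of partitioning and columnar storage on data scanned can be sketched as a toy model. This is not how Redshift Spectrum is implemented. The partition names, column names, and byte counts are all invented to show why pruning partitions and reading only the needed columns shrinks the scan.

```python
# Hypothetical layout: one entry per date partition, with bytes stored per column
partitions = {
    "2017-05-15": {"order_id": 40, "amount": 80, "notes": 880},
    "2017-05-16": {"order_id": 50, "amount": 100, "notes": 850},
    "2017-05-17": {"order_id": 60, "amount": 120, "notes": 820},
}


def bytes_scanned(partitions, wanted_dates=None, wanted_columns=None):
    """Sum the bytes a query touches after partition pruning and column projection."""
    total = 0
    for day, columns in partitions.items():
        if wanted_dates is not None and day not in wanted_dates:
            continue  # partition pruning: skip the whole partition
        for name, size in columns.items():
            if wanted_columns is not None and name not in wanted_columns:
                continue  # columnar format: skip columns the query does not need
            total += size
    return total


full_scan = bytes_scanned(partitions)
pruned = bytes_scanned(partitions, wanted_dates={"2017-05-17"})
pruned_and_projected = bytes_scanned(partitions, {"2017-05-17"}, {"order_id", "amount"})
print(full_scan, pruned, pruned_and_projected)
```

In this toy data set, filtering on one day cuts the scan to a third, and reading only two columns cuts it again to 180 of the original 3,000 bytes.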

Amazon Redshift and Redshift Spectrum give you the best of both worlds. If you need to run frequent queries on the same data, you can normalize it, store it in Amazon Redshift, and get all of the benefits of a fully featured data warehouse for storing and querying structured data at a flat rate. At the same time, you can keep your additional data in multiple open file formats in Amazon S3, whether it is historical data or the most recent data, and extend your Amazon Redshift queries across your Amazon S3 data lake.

And that is how Amazon Redshift Spectrum scales data warehousing to exabytes—with no loading required. Redshift Spectrum ends the “tyranny of OR,” enabling you to store your data where you want, in the format you want, and have it available for fast processing using standard SQL when you need it, now and in the future.


Additional Reading

10 Best Practices for Amazon Redshift Spectrum
Amazon QuickSight Adds Support for Amazon Redshift Spectrum
Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data

 


 

About the Author

Maor Kleider is a Senior Product Manager for Amazon Redshift, a fast, simple and cost-effective data warehouse. Maor is passionate about collaborating with customers and partners, learning about their unique big data use cases and making their experience even better. In his spare time, Maor enjoys traveling and exploring new restaurants with his family.

 

 

 

Visualize Amazon S3 Analytics Data with Amazon QuickSight

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/visualize-amazon-s3-analytics-data-with-amazon-quicksight/

When Amazon S3 analytics was released in November 2016, it gave you the ability to analyze storage access patterns and transition the right data to the right storage class. You could also manually export the data to an S3 bucket to analyze, using the business intelligence tool of your choice, and gather deeper insights on usage and growth patterns. This helped you reduce storage costs while optimizing performance based on usage patterns.

With today’s update, you can quickly and easily gain those deeper insights and benefits by analyzing and visualizing S3 analytics data in Amazon QuickSight. It takes just a single click from the S3 console, without the need for manual exports or additional data preparation.

If you already have S3 analytics storage class analysis enabled for your buckets, choose Explore in QuickSight on the top right.

Users new to Amazon QuickSight can follow the steps to get set up. Existing QuickSight users are automatically deep linked to an analysis of their S3 analytics data.

From there, you can access the pre-built visualizations to understand the storage access pattern of your bucket. For example, you can visualize the amount of data retrieved vs. the amount of storage consumed for objects of different age groups to identify infrequently accessed data. You can also create new visualizations and perform ad hoc analysis to break down the access patterns by the filters that you have defined in S3 analytics.

 

Finally, you can set up a daily scheduled refresh of the storage class analysis data set in Amazon QuickSight to keep it up to date. Publish and share the analysis as a dashboard to other users in your organization to monitor your S3 storage access pattern.

This feature is now available in all QuickSight regions: US East (N. Virginia and Ohio), US West (Oregon), and EU (Ireland).

Learn more

To learn more about these capabilities and start using them in your dashboards, check out the Amazon QuickSight User Guide.

Stay engaged

If you have questions and suggestions, post them on the Amazon QuickSight Discussion Forum.

Not a QuickSight user?

Go to the Amazon QuickSight website to get started for FREE.

 

Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight

Post Syndicated from Rendy Oka original https://aws.amazon.com/blogs/big-data/analysis-of-top-n-dynamodb-objects-using-amazon-athena-and-amazon-quicksight/

If you run an operation that continuously generates a large amount of data, you may want to know what kind of data is being inserted by your application. The ability to analyze data intake quickly can be very valuable for business units, such as operations and marketing. For many operations, it’s important to see what is driving the business at any particular moment. For retail companies, for example, understanding which products are currently popular can aid in planning for future growth. Similarly, for PR companies, understanding the impact of an advertising campaign can help them market their products more effectively.

This post covers an architecture that helps you analyze your streaming data. You’ll build a solution using Amazon DynamoDB Streams, AWS Lambda, Amazon Kinesis Firehose, and Amazon Athena to analyze data intake at a frequency that you choose. And because this is a serverless architecture, you can use all of the services here without the need to provision or manage servers.

The data source

You’ll collect a random sampling of tweets via Twitter’s API and store a variety of attributes in your DynamoDB table, such as: Twitter handle, tweet ID, hashtags, location, and Time-To-Live (TTL) value.

In DynamoDB, the partition key is used as an input to an internal hash function. The output from this function determines the partition in which the data will be stored. When using a combination of partition key and sort key as a DynamoDB schema, you need to make sure that no single partition key contains many more objects than the other partition keys because this can cause partition-level throttling. For the demonstration in this blog, the Twitter handle will be the partition key and the tweet ID will be the sort key. This allows you to group and sort tweets from each user.
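
To see why a skewed key causes trouble, here is a rough sketch of hash-based placement. DynamoDB's actual hash function is internal, so MD5 stands in for it here. The partition count and sample items are also made up.

```python
import hashlib
from collections import Counter


def partition_for(partition_key, num_partitions):
    """Map a partition key to a partition number (MD5 is a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Hypothetical (handle, tweet_id) items: one very active user
items = [("alice", 1), ("alice", 2), ("alice", 3), ("bob", 4), ("carol", 5)]
load = Counter(partition_for(handle, 4) for handle, _ in items)
print(load)
```

Every item with the same handle lands on the same partition, so one very active handle concentrates load there regardless of how many partitions exist.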

To help you get started, I have written a script that pulls a live Twitter stream that you can use to generate your data. All you need to do is provide your own Twitter Apps credentials, and it should generate the data immediately. Alternatively, I have also provided a script that you can use to generate random Tweets with little effort.

You can find both scripts in the GitHub repository:

https://github.com/awslabs/aws-blog-dynamodb-analysis

There are some modules that you may need to install to run these scripts. You can find them in Python’s module repository:

To get your own Twitter credentials, go to https://www.twitter.com/ and sign up for a free account, if you don’t already have one. After your account is set up, go to https://apps.twitter.com/. On the main landing page, choose the Create New App button. After the application is created, go to Keys and Access Tokens to get your credentials to use the Twitter API. You’ll need to generate a Consumer Key/Secret and an Access Token/Secret. All four values will be used to authenticate your requests.

Architecture overview

Before we begin, let’s take a look at what the overall flow of information will look like, from data ingestion into DynamoDB to visualization of results in Amazon QuickSight.

As illustrated in the architecture diagram above, any changes made to the items in DynamoDB will be captured and processed using DynamoDB Streams. Next, a Lambda function will be invoked by a trigger that is configured to respond to events in DynamoDB Streams. The Lambda function processes the data prior to pushing to Amazon Kinesis Firehose, which will output to Amazon S3. Finally, you use Amazon Athena to analyze the streaming data landing in Amazon S3. The result can be explored and visualized in Amazon QuickSight for your company’s business analytics.

You’ll need to implement your custom Lambda function to help transform the raw <key, value> data stored in DynamoDB to a JSON format for Athena to digest, but I can help you with sample code that you are free to modify.
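
To give a feel for that transformation, the sketch below unwraps DynamoDB's typed attribute values (such as {"S": "alice"}) into plain JSON. It is only an illustration: the helper names are made up, the sample item is invented, and set and binary types are left out for brevity.

```python
import json


def unwrap(attribute):
    """Convert one DynamoDB-typed value such as {"S": "x"} into a plain Python value."""
    (type_tag, value), = attribute.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        # Numbers arrive as strings; keep integers exact, fall back to float
        return int(value) if value.lstrip("-").isdigit() else float(value)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [unwrap(v) for v in value]
    if type_tag == "M":
        return {k: unwrap(v) for k, v in value.items()}
    raise ValueError("unsupported DynamoDB type: " + type_tag)


def item_to_json(item):
    """Serialize a whole item as one JSON object."""
    return json.dumps({name: unwrap(attr) for name, attr in item.items()}, sort_keys=True)


# Hypothetical item in the shape DynamoDB uses on the wire
raw_item = {
    "user_id": {"S": "alice"},
    "tweet_id": {"N": "864512345678901248"},
    "hashtags": {"L": [{"S": "aws"}, {"S": "bigdata"}]},
}
line = item_to_json(raw_item)
print(line)
```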

Implementation

In the following sections, I’ll walk through how you can set up the architecture discussed earlier.

Create your DynamoDB table

First, let’s create a DynamoDB table and enable DynamoDB Streams. This will enable data to be copied out of this table. From the console, use the user_id as the partition key and tweet_id as the sort key:

After the table is ready, you can enable DynamoDB Streams. This process operates asynchronously, so there is no performance impact on the table when you enable this feature. The easiest way to manage DynamoDB Streams is also through the DynamoDB console.

In the Overview tab of your newly created table, click Manage Stream. In the window, choose the information that will be written to the stream whenever data in the table is added or modified. In this example, you can choose either New image or New and old images.

For more details on this process, check out our documentation:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html
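
If you prefer to script these steps rather than use the console, something along the following lines should work with boto3. Treat it as a sketch under assumptions: the table name `twitter_feed` and the throughput values are placeholders, and the `N` type for `tweet_id` is inferred from how the Lambda function later in this post reads the key.

```python
def table_definition(table_name):
    """Build create_table parameters: user_id partition key, tweet_id sort key, stream on."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},
            {"AttributeName": "tweet_id", "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},    # partition key
            {"AttributeName": "tweet_id", "KeyType": "RANGE"},  # sort key
        ],
        # Placeholder capacity; size this for your own workload
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        "StreamSpecification": {"StreamEnabled": True, "StreamViewType": "NEW_IMAGE"},
    }


def create_table(table_name):
    """Create the table. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the definition can be inspected without AWS access
    return boto3.client("dynamodb").create_table(**table_definition(table_name))


definition = table_definition("twitter_feed")  # hypothetical table name
print(definition["KeySchema"])
```

Calling `create_table("twitter_feed")` would then create the table with the stream already enabled.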

Configure Kinesis Firehose

Before creating the Lambda function, you need to configure a Kinesis Firehose delivery stream so that it’s ready to accept data from Lambda. Open the Firehose console and choose Create Firehose Delivery Stream. From here, choose S3 as the destination and use the following information to configure the resource. Note the Delivery stream name because you will use it in the next step.

For more details on this process, check out our documentation:

http://docs.aws.amazon.com/firehose/latest/dev/basic-create.html#console-to-s3
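
The same delivery stream can also be created from code. The following is a sketch only: the stream name, role ARN, and buffering values are placeholders you would replace, and the prefix is chosen to match the S3 location used for the Athena table later in this post.

```python
def delivery_stream_definition(stream_name, role_arn, bucket_arn):
    """Build create_delivery_stream parameters for an S3 destination."""
    return {
        "DeliveryStreamName": stream_name,
        "S3DestinationConfiguration": {
            "RoleARN": role_arn,      # IAM role that Firehose assumes to write to the bucket
            "BucketARN": bucket_arn,
            # Firehose appends a YYYY/MM/DD/HH path (UTC) after this prefix by default
            "Prefix": "dynamodb/streams/transactions/",
            # Placeholder buffering values
            "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        },
    }


def create_delivery_stream(stream_name, role_arn, bucket_arn):
    """Create the stream. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the definition can be inspected without AWS access
    params = delivery_stream_definition(stream_name, role_arn, bucket_arn)
    return boto3.client("firehose").create_delivery_stream(**params)


# All three arguments below are hypothetical placeholders
params = delivery_stream_definition(
    "ddb-tweet-stats",
    "arn:aws:iam::123456789012:role/firehose_delivery_role",
    "arn:aws:s3:::myBucket",
)
print(params["S3DestinationConfiguration"]["Prefix"])
```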

Create your Lambda function

Now that Kinesis Firehose is ready to accept data, you can create your Lambda function.

From the AWS Lambda console, choose the Create a Lambda function button and use the Blank Function. Enter a name and description, and choose Python 2.7 as the Runtime. Note your Lambda function name because you’ll need it in the next step.

In the Lambda function code field, you can paste the script that I have written for this purpose. All this function needs is the name of your Firehose delivery stream, set as an environment variable named firehose_stream_name.

import boto3
import json
import os

# Create the Firehose client once so that warm invocations reuse it
firehose_client = boto3.client('firehose')

def lambda_handler(event, context):
    records = []
    for record in event['Records']:
        # Keep only the table name, the item keys, and the approximate write time
        t_stats = {
            'table_name': record['eventSourceARN'].split('/')[1],
            'user_id': record['dynamodb']['Keys']['user_id']['S'],
            'tweet_id': record['dynamodb']['Keys']['tweet_id']['N'],
            'approx_post_time': str(int(record['dynamodb']['ApproximateCreationDateTime']))
        }
        # One JSON object per line so that Athena can split the records
        records.append({'Data': json.dumps(t_stats) + '\n'})
    if not records:
        return 'No records to process.'
    # Errors are allowed to propagate so that Lambda retries the batch
    # instead of silently dropping it
    res = firehose_client.put_record_batch(
        DeliveryStreamName = os.environ['firehose_stream_name'],
        Records = records
    )
    if res['FailedPutCount']:
        raise RuntimeError('{} records were rejected by Firehose.'.format(res['FailedPutCount']))
    return 'Successfully processed {} records.'.format(len(records))

The handler should be set to lambda_function.lambda_handler and you can use the existing lambda_dynamodb_streams role that’s been created by default.
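
To check the transformation without deploying anything, you can run the per-record logic locally against a hand-written event. The sketch below assumes a simplified DynamoDB Streams record: the ARN, account number, and key values are invented, and `to_firehose_record` is a hypothetical helper that mirrors what the handler extracts from each record.

```python
import json


def to_firehose_record(record):
    """Turn one DynamoDB Streams record into a Firehose record holding one JSON line."""
    stats = {
        "table_name": record["eventSourceARN"].split("/")[1],
        "user_id": record["dynamodb"]["Keys"]["user_id"]["S"],
        "tweet_id": record["dynamodb"]["Keys"]["tweet_id"]["N"],
        "approx_post_time": str(int(record["dynamodb"]["ApproximateCreationDateTime"])),
    }
    return {"Data": json.dumps(stats) + "\n"}


# Hand-written sample event; only the fields the function reads are included
sample_event = {
    "Records": [{
        "eventName": "INSERT",
        "eventSourceARN": "arn:aws:dynamodb:us-east-1:123456789012:table/twitter_feed/stream/2017-05-17T01:00:00.000",
        "dynamodb": {
            "ApproximateCreationDateTime": 1494982800.0,
            "Keys": {"user_id": {"S": "alice"}, "tweet_id": {"N": "42"}},
        },
    }]
}
out = [to_firehose_record(r) for r in sample_event["Records"]]
print(out[0]["Data"])
```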

Enable DynamoDB trigger and start collecting data

Everything is ready to go. Open your table using the DynamoDB console and go to the Triggers tab. Select the Create trigger drop down list and choose Existing Lambda function. In the pop-up window, select the function that you just created, and choose the Create button.

At this point, you can start collecting data with the Python scripts that I’ve provided. The first one pulls public Twitter data and the other generates fake tweets using Lorem Ipsum text.

Configure Amazon Athena to read the data

Next, you will configure Amazon Athena so that it can read the data Kinesis Firehose outputs to Amazon S3 and allow you to analyze the data as needed. You can connect to Athena directly from the Athena console, or you can establish a connection using JDBC or the Athena API. In this example, I’m going to demonstrate what this looks like on the Athena console.

First, create a new database and a new table. You can do this by running the following two queries. The first query creates a new database:

CREATE DATABASE IF NOT EXISTS ddbtablestats

And the second query creates a new table:

CREATE EXTERNAL TABLE IF NOT EXISTS ddbtablestats.twitterfeed (
    `table_name` string,
    `user_id` string,
    `tweet_id` bigint,
    `approx_post_time` timestamp 
) PARTITIONED BY (
    year string,
    month string,
    day string,
    hour string 
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
LOCATION 's3://myBucket/dynamodb/streams/transactions/'

Note that this table is created using partitions. Partitioning separates your data into logical parts based on certain criteria, such as date, location, language, etc. This allows Athena to selectively pull your data without needing to process the entire data set. This effectively minimizes the query execution time, and it also allows you to have greater control over the data that you want to query.
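To make the mapping concrete: Kinesis Firehose writes objects under a UTC-based YYYY/MM/DD/HH/ prefix, and each of those folders corresponds to one partition of the table. The sketch below is illustrative only. The helper name partition_from_key and the sample object key are assumptions, not part of the solution.

```python
def partition_from_key(key, prefix='dynamodb/streams/transactions/'):
    """Return the partition values encoded in a Firehose-style S3 key."""
    if not key.startswith(prefix):
        raise ValueError('key is outside the table location: ' + key)
    parts = key[len(prefix):].split('/')
    # Expect four date folders followed by the object name
    if len(parts) < 5:
        raise ValueError('key has no YYYY/MM/DD/HH/ prefix: ' + key)
    year, month, day, hour = parts[:4]
    return {'year': year, 'month': month, 'day': day, 'hour': hour}


# Hypothetical object key under the table location
key = 'dynamodb/streams/transactions/2017/05/17/01/sample-object'
print(partition_from_key(key))
```

A query that filters on year, month, day, and hour only needs to read the objects under the matching folder.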

After the query has completed, you should be able to see the table in the left side pane of the Athena dashboard.

After the database and table have been created, execute the ALTER TABLE query to populate the partitions in your table. Replace the date in the example with the date on which you ran the script.

ALTER TABLE ddbtablestats.TwitterFeed ADD IF NOT EXISTS
PARTITION (year='2017',month='05',day='17',hour='01') location 's3://myBucket/dynamodb/streams/transactions/2017/05/17/01/'

In the Athena console, you need to manually add each additional partition that you’d like to analyze. However, you can automate this process programmatically by using the JDBC driver or an AWS SDK of your choice.
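As a rough sketch of that automation, the following Python builds one ALTER TABLE statement per hour of a given day and submits each one through the Athena API. The helper names, the athena_client argument (assumed to be a boto3 Athena client), and the query results location are assumptions for illustration. Adjust the bucket and paths to match your own setup.

```python
from datetime import datetime, timedelta

TABLE = 'ddbtablestats.twitterfeed'
LOCATION = 's3://myBucket/dynamodb/streams/transactions/'


def add_partition_sql(hour_start):
    """Build the ALTER TABLE statement for the hour starting at `hour_start`."""
    y, m, d, h = hour_start.strftime('%Y %m %d %H').split()
    return (
        "ALTER TABLE {table} ADD IF NOT EXISTS "
        "PARTITION (year='{y}',month='{m}',day='{d}',hour='{h}') "
        "location '{loc}{y}/{m}/{d}/{h}/'"
    ).format(table=TABLE, loc=LOCATION, y=y, m=m, d=d, h=h)


def statements_for_day(day):
    """Return the 24 hourly statements for the date in `day`."""
    start = datetime(day.year, day.month, day.day)
    return [add_partition_sql(start + timedelta(hours=i)) for i in range(24)]


def submit_all(athena_client, statements, output_location):
    """Submit each statement and return the Athena query execution IDs."""
    ids = []
    for sql in statements:
        res = athena_client.start_query_execution(
            QueryString=sql,
            ResultConfiguration={'OutputLocation': output_location},
        )
        ids.append(res['QueryExecutionId'])
    return ids
```

With boto3 installed, a call might look like submit_all(boto3.client('athena'), statements_for_day(datetime(2017, 5, 17)), 's3://myBucket/athena-results/'), where the results path is a placeholder.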

For more information on partitioning in Athena, check out our documentation:

http://docs.aws.amazon.com/athena/latest/ug/partitions.html

Querying the data in Amazon Athena

That’s it! Let’s run the following query to see the 10 most active Twitter users on a given day (May 17, 2017 in this example). You can do this from the Athena console:

SELECT user_id, COUNT(DISTINCT tweet_id) tweets FROM ddbTableStats.TwitterFeed
WHERE year='2017' AND month='05' AND day='17'
GROUP BY user_id
ORDER BY tweets DESC
LIMIT 10

The result should look similar to the following:

Linking Athena to Amazon QuickSight

Finally, to make this data available to a larger audience, let’s visualize this data in Amazon QuickSight. Amazon QuickSight provides native connectivity to AWS data sources such as Amazon Redshift, Amazon RDS, and Amazon Athena. Amazon QuickSight can also connect to on-premises databases, Excel, or CSV files, and it can connect to cloud data sources such as Salesforce.com. For this solution, we will connect Amazon QuickSight to the Athena table we just created.

Amazon QuickSight has a free tier that provides 1 user and 1 GB of SPICE (Super-fast, Parallel, In-memory Calculation Engine) capacity, so you can sign up and use QuickSight free of charge.

When you are signing up for Amazon QuickSight, ensure that you grant permissions for QuickSight to connect to Athena and the S3 bucket where the data is stored.

After you’ve signed up, choose the new analysis button, choose new data set, and then select the Athena data source option. Enter a name for your data source and proceed to the next prompt. At this point, you should see the Athena table you created earlier.

Choose the option to import the data to SPICE for a quicker analysis. SPICE is an in-memory optimized calculation engine that is designed for quick data visualization through parallel processing. SPICE also enables you to refresh your data sets at a regular interval or on demand.

In the dialog box, confirm this data set creation, and you’ll arrive on the landing page where you can start building your graph. The X-axis will represent the user_id and the Value will be used to represent the SUM total of the tweets from each user.

The Amazon QuickSight report looks like this:

Through this visualization, I can easily see that there are 3 users that tweeted over 20 times that day and that the majority of the users have fewer than 10 tweets that day. I can also set up a scheduled refresh of my SPICE dataset so that I have a dashboard that is regularly updated with the latest data.

Closing thoughts

Here are the benefits that you can gain from using this architecture:

  1. You can optimize the design of your DynamoDB schema so that it follows AWS best practice recommendations.
  2. You can run analysis and data intelligence in order to understand the current customer demands for your business.
  3. You can store incremental backups for future auditing.

The flexibility of our AWS services invites you to create and design the ideal workflow for your production at any scale, and, as always, if you ever need some guidance, don’t hesitate to reach out to us.

I hope this has been helpful to you! Please leave any questions and comments below.

Additional Reading

Learn how to analyze VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight.


About the Author

Rendy Oka is a Big Data Support Engineer for Amazon Web Services. He provides consultations and architectural designs and partners with the TAMs, Solution Architects, and AWS product teams to help develop solutions for our customers. He is also a team lead for the big data support team in Seattle. Rendy has traveled to dozens of countries around the world and takes every opportunity to experience the local culture wherever he goes.

New Power Bundle for Amazon WorkSpaces – More vCPUs, Memory, and Storage

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-power-bundle-for-amazon-workspaces-more-vcpus-memory-and-storage/

Are you tired of hearing me talk about Amazon WorkSpaces yet? I hope not, because we have a lot of customer-driven additions on the roadmap! Our customers in the developer and analyst community have been asking for a workstation-class machine that will allow them to take advantage of the low cost and flexibility of WorkSpaces. Developers want to run Visual Studio, IntelliJ, Eclipse, and other IDEs. Analysts want to run complex simulations and statistical analysis using MATLAB, GNU Octave, R, and Stata.

New Power Bundle
Today we are extending the current set of WorkSpaces bundles with a new Power bundle. With four vCPUs, 16 GiB of memory, and 275 GB of storage (175 GB on the system volume and another 100 GB on the user volume), this bundle is designed to make developers, analysts (and me) smile. You can launch one in all of the usual ways: Console, CLI (create-workspaces), or API (CreateWorkSpaces).

One really interesting benefit to using a cloud-based virtual desktop for simulations and statistical analysis is the ease of access to data that’s already stored in the cloud. Analysts can mine and analyze petabytes of data stored in S3 that is effectively local (with respect to access time) to the WorkSpace. This low-latency access boosts productivity and also simplifies the use of other AWS data analysis tools such as Amazon Redshift, Amazon Redshift Spectrum, Amazon QuickSight, and Amazon Athena.

Like the existing bundles, the new Power bundle can be used in either billing configuration, AlwaysOn or AutoStop (read Amazon WorkSpaces Update – Hourly Usage and Expanded Root Volume to learn more). The bundle is available in all AWS Regions where WorkSpaces is available and you can launch one today! Visit the WorkSpaces Pricing page for pricing in your region.
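For the API route, a request might be assembled along these lines. This is a sketch only: the directory ID, user name, and bundle ID are placeholders that you supply (look up the Power bundle ID for your region), the helper names are hypothetical, and workspaces_client is assumed to be a boto3 WorkSpaces client.

```python
def build_workspace_request(directory_id, user_name, bundle_id, auto_stop=True):
    """Return one entry for the Workspaces list of a create_workspaces call."""
    properties = {'RunningMode': 'AUTO_STOP' if auto_stop else 'ALWAYS_ON'}
    if auto_stop:
        # Stop the WorkSpace after 60 minutes of inactivity
        properties['RunningModeAutoStopTimeoutInMinutes'] = 60
    return {
        'DirectoryId': directory_id,
        'UserName': user_name,
        'BundleId': bundle_id,
        'WorkspaceProperties': properties,
    }


def launch(workspaces_client, request):
    """Submit the request and return any failures reported by the service."""
    res = workspaces_client.create_workspaces(Workspaces=[request])
    return res.get('FailedRequests', [])
```

An empty list from launch means that the service accepted the request. The WorkSpace itself takes some time to become available.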

Jeff;

Mysterious Group Lands Denuvo Anti-Piracy Body Blow

Post Syndicated from Andy original https://torrentfreak.com/mysterious-group-lands-denuvo-anti-piracy-body-blow-170607/

While there’s always excitement in piracy land over the release of a new movie or TV show, video gaming fans really know how to party when a previously uncracked game appears online.

When that game is protected by the infamous Denuvo anti-piracy system, champagne corks explode.

There’s been a lot of activity in this area during recent months but more recently there’s been a noticeable crescendo. As more groups have become involved in trying to defeat the system, Denuvo has looked increasingly vulnerable. Over the past 24 hours, it’s looked in serious danger.

The latest drama surrounds DISHONORED.2-STEAMPUNKS, which is a pirate release of the previously uncracked action adventure game Dishonored 2. The game uses Denuvo protection and at the rate titles have been falling to pirates lately, its appearance wasn’t a surprise. However, the manner in which the release landed online has sent shockwaves through the scene.

The cracking scene is relatively open these days, in that people tend to have a rough idea of who the major players are. Their real-life identities are less obvious, of course, but names like CPY, Voksi, and Baldman regularly appear in discussions.

The same cannot be said about SteamPunks. With their topsite presence, they appear to be a proper ‘Scene’ group but up until yesterday, they were an unknown entity.

It’s fair to say that this dramatic appearance from nowhere raised quite a few eyebrows among the more suspicious crack aficionados. That being said, SteamPunks absolutely delivered – and then some.

Rather than simply pre-cracking Dishonored 2 (removing the protection) and then delivering it to the public, the SteamPunks release appears to contain code which enables the user to generate Denuvo licenses on a machine-by-machine basis.

If that hasn’t sunk in, the theory is that the ‘key generator’ might be able to do the same with all Denuvo-protected releases in future, blowing the system out of the water.

While that enormous feat remains to be seen, there is an unusual amount of excitement surrounding this release and the emergence of the previously unknown SteamPunks. In the words of one Reddit user, the group has delivered the cracking equivalent of The Holy Hand Grenade of Antioch, yet no one appears to have had any knowledge of them before yesterday.

Only adding to the mystery is the lack of knowledge relating to how their tool works. Perhaps ironically, perhaps importantly, SteamPunks have chosen to protect their code with VMProtect, the software system that Denuvo itself previously deployed to stop people reverse-engineering its own code.

This raises two issues. One, people could have difficulty finding out how the license generator works and two, it could potentially contain something nefarious besides the means to play Dishonored 2 for free.

With the latter in mind, a number of people in the cracking community have been testing the release but thus far, no one has found anything untoward. That doesn’t guarantee that it’s entirely clean but it does help to calm nerves. Indeed, cracking something as difficult as Denuvo in order to put out some malware seems a lot of effort when the same could be achieved much more easily.

“There is no need to break into Fort Knox to give out flyers for your pyramid scheme,” one user’s great analogy reads.

That being said, people with experience are still urging caution, which should be the case for anyone running a cracked game, no matter who released it.

Finally, another twist in the Denuvo saga arrived yesterday courtesy of VMProtect. As widely reported, someone from the company previously indicated that Denuvo had been using its VMProtect system without securing an appropriate license.

The source said that legal action was on the horizon but an announcement from VMProtect yesterday suggests that the companies are now seeing eye to eye.

“We were informed that there are open questions and some uncertainty about the use of our software by DENUVO GmbH,” VMProtect said.

“Referring to this circumstance we want to clarify that DENUVO GmbH had the right to use our software in the past and has the right to use it currently as well as in the future. In summary, no open issues exist between DENUVO GmbH and VMProtect Software for which reason you may ignore any other divergent information.”

While the above tends to imply there’s never been an issue, a little more information from VMProtect dev Ivan Permyakov may indicate that an old dispute has since been settled.

“Information about our relationship with Denuvo Software has long been outdated and irrelevant,” he said.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.