Tag Archives: cli

Hunting for life on Mars assisted by high-altitude balloons

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/eclipse-high-altitude-balloons/

Will bacteria-laden high-altitude balloons help us find life on Mars? Today’s eclipse should bring us closer to an answer.

NASA Bacteria Balloons Raspberry Pi HAB Life on Mars

image c/o NASA / Ames Research Center / Tristan Caro

The Eclipse Ballooning Project

Having learned of the Eclipse Ballooning Project set to take place today across the USA, a team at NASA couldn’t miss the opportunity to harness the high-flying project for their own experiments.

NASA Bacteria Balloons Raspberry Pi HAB Life on Mars

The Eclipse Ballooning Project invited students across the USA to aid in the launch of 50+ high-altitude balloons during today’s eclipse. Each balloon is equipped with its own Raspberry Pi and camera for data collection and live video-streaming.

High-altitude ballooning, or HAB as it’s often referred to, has become a popular activity within the Raspberry Pi community. The lightweight nature of the device allows for high ascent, and its Camera Module enables instant visual content collection.

Life on Mars

image c/o Montana State University

The Eclipse Ballooning Project team, headed by Angela Des Jardins of Montana State University, was contacted by Jim Green, Director of Planetary Science at NASA, who hoped to piggyback on the project to run tests on bacteria in the Mars-like conditions the balloons would encounter near space.

Into the stratosphere

At around -35 degrees Fahrenheit, with thinner air and harsher ultraviolet radiation, the conditions in the upper part of the earth’s stratosphere are comparable to those on the surface of Mars. And during the eclipse, the moon will block some UV rays, making the environment in our stratosphere even more similar to the martian oneideal for NASA’s experiment.

So the students taking part in the Eclipse Ballooning Project could help the scientists out, NASA sent them some small metal tags.

NASA Bacteria Balloons Raspberry Pi HAB Life on Mars

These tags contain samples of a kind of bacterium known as Paenibacillus xerothermodurans. Upon their return to ground, the bacteria will be tested to see whether and how the high-altitude conditions affected them.

Life on Mars

Paenibacillus xerothermodurans is one of the most resilient bacterial species we know. The team at NASA wants to discover how the bacteria react to their flight in order to learn more about whether life on Mars could possibly exist. If the low temperature, UV rays, and air conditions cause the bacteria to mutate or indeed die, we can be pretty sure that the existence of living organisms on the surface of Mars is very unlikely.

Life on Mars

What happens to the bacteria on the spacecraft and rovers we send to space? This experiment should provide some answers.

The eclipse

If you’re in the US, you might have a chance to witness the full solar eclipse today. And if you’re planning to watch, please make sure to take all precautionary measures. In a nutshell, don’t look directly at the sun. Not today, not ever.

If you’re in the UK, you can observe a partial eclipse, if the clouds decide to vanish. And again, take note of safety measures so you don’t damage your eyes.

Life on Mars

You can also watch a live-stream of the eclipse via the NASA website.

If you’ve created an eclipse-viewing Raspberry Pi project, make sure to share it with us. And while we’re talking about eclipses and balloons, check here for our coverage of the 2015 balloon launches coinciding with the UK’s partial eclipse.

The post Hunting for life on Mars assisted by high-altitude balloons appeared first on Raspberry Pi.

Kernel prepatch 4.13-rc6

Post Syndicated from corbet original https://lwn.net/Articles/731475/rss

The 4.13-rc6 kernel prepatch is out.
So everything still looks on target for a normal release schedule,
which would imply rc7 next weekend, and then the final 4.13 the week
after that.

Unless something happens, of course. Tomorrow is the solar eclipse,
and maybe it brings doom and gloom even beyond the expected Oregon
trafficalypse. You never know.”

The Windows App Store is Full of Pirate Streaming Apps

Post Syndicated from Ernesto original https://torrentfreak.com/the-windows-app-store-is-full-of-pirate-streaming-apps-170820/

Over the past few years it has become much easier to stream movies and TV-shows over the Internet.

Legal streaming services such as Netflix and Amazon are booming. At the same time, however, there’s also a dark market of thousands of pirate streaming tools.

In recent months, Hollywood has directed many its anti-piracy efforts towards unauthorized Kodi-addons and several popular pirate streaming sites, which offer movies and TV-shows without permission. What seems to be largely ignored, however, is a “store” that hundreds of millions of people have access to; the Windows App Store.

When we were browsing through the “top free” apps in the Windows Store, our attention was drawn to several applications that promoted “free movies” including various Hollywood blockbusters such as “Wonder Woman,” “Spider-Man: Homecoming,” and “The Mummy.”

Initially, we assumed that a pirate app may have slipped passed Microsoft’s screening process. However, the ‘problem’ doesn’t appear to be isolated. There are dozens of similar apps in the official store that promise potential users free movies, most with rave reviews.

Some of the many pirate apps in the “trusted” store

Most of the applications work on multiple platforms including PC, mobile, and the Xbox. They are pretty easy to use and rely on the familiar grid-based streaming interface most sites and services use. Pick a movie or TV-show, click the play button, and off you go.

The sheer number of piracy apps in the Windows Store, using names such as “Free Movies HD,” “Free Movies Online 2020,” and “FreeFlix HQ,” came as a surprise to us. In particular, because the developers make no attempt to hide their activities, quite the opposite.

The app descriptions are littered with colorful language offering the latest Hollywood movies, and thousands of others, without charge. In addition, the apps display their capabilities in various screenshots, including those showing movies that are not yet available on legal streaming platforms.

Screenshot provided by the Windows app store

Making matters worse, the applications show advertising as well, including high-quality pre-roll ads. Some of these appear to be facilitated through Microsoft’s own Ad Monetization platform. Other apps offer paid versions or in-app purchases to monetize their service.

After hours of going through the pirate app offerings, it’s clear that Microsoft’s “trusted” Windows Store is ridden with unauthorized content. Thus far we have only mentioned video, but the issue also applies to pirated music in the form of dedicated streaming and download apps.

Earlier this year, Microsoft signed a landmark anti-piracy agreement with several major copyright holders, to address pirate search results in the Bing search engine. The above makes clear that search results in the Microsoft Store store may require some attention too.

TorrentFreak reached out to Microsoft, asking for a comment on our findings, but at the time of publication we haven’t yet heard back.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Rightscorp Bleeds Another Million, Borrows $200K From Customer BMG

Post Syndicated from Andy original https://torrentfreak.com/rightscorp-bleeds-another-million-borrows-200k-from-customer-bmg-170819/

Anti-piracy outfit Rightscorp is one of the many companies trying to turn Internet piracy into profit. The company has a somewhat novel approach but has difficulty balancing the books.

Essentially, Rightscorp operates like other so-called copyright-trolling operations, in that it monitors alleged offenders on BitTorrent networks, tracks them to their ISPs, then attempts to extract a cash settlement. Rightscorp does this by sending DMCA notices with settlement agreements attached, in the hope that at-this-point-anonymous Internet users break cover in panic. This can lead to a $20 or $30 ‘fine’ or in some cases dozens of multiples of that.

But despite settling hundreds of thousands of these cases, profit has thus far proven elusive, with the company hemorrhaging millions in losses. The company has just filed its results for the first half of 2017 and they contain more bad news.

In the six months ended June 2017, revenues obtained from copyright settlements reached just $138,514, that’s 35% down on the $214,326 generated in the same period last year. However, the company did manage to book $148,332 in “consulting revenue” in the first half of this year, a business area that generated no revenue in 2016.

Overall then, total revenue for the six month period was $286,846 – up from $214,326 last year. While that’s a better picture in its own right, Rightscorp has a lot of costs attached to its business.

After paying out $69,257 to copyright holders and absorbing $1,190,696 in general and administrative costs, among other things, the company’s total operating expenses topped out at $1,296,127 for the first six months of the year.

To make a long story short, the company made a net loss of $1,068,422, which was more than the $995,265 loss it made last year and despite improved revenues. The company ended June with just $1,725 in cash.

“These factors raise substantial doubt about the Company’s ability to continue as a going concern within one year after the date that the financial statements are issued,” the company’s latest statement reads.

This hanging-by-a-thread narrative has followed Rightscorp for the past few years but there’s information in the latest accounts which indicates how bad things were at the start of the year.

In January 2016, Rightscorp and several copyright holders, including Hollywood studio Warner Bros, agreed to settle a class-action lawsuit over intimidating robo-calls that were made to alleged infringers. The defendants agreed to set aside $450,000 to cover the costs, and it appears that Rightscorp was liable for at least $200,000 of that.

Rightscorp hasn’t exactly been flush with cash, so it was interesting to read that its main consumer piracy settlement client, music publisher BMG, actually stepped in to pay off the class-action settlement.

“At December 31, 2016, the Company had accrued $200,000 related to the settlement of a class action complaint. On January 7, 2017, BMG Rights Management (US) LLC (“BMG”) advanced the Company $200,000, which was used to pay off the settlement. The advance from BMG is to be applied to future billings from the Company to BMG for consulting services,” Rightscorp’s filing reads.

With Rightscorp’s future BMG revenue now being gobbled up by what appears to be loan repayments, it becomes difficult to see how the anti-piracy outfit can make enough money to pay off the $200,000 debt. However, its filing notes that on July 21, 2017, the company issued “an aggregate of 10,000,000 shares of common stock to an investor for a purchase price of $200,000.” While that amount matches the BMG debt, the filing doesn’t reveal who the investor is.

The filing also reveals that on July 31, Rightscorp entered into two agreements to provide services “to a holder of multiple copyrights.” The copyright holder isn’t named, but the deal reveals that it’s in Rightscorp’s best interests to get immediate payment from people to whom it sends cash settlement demands.

“[Rightscorp] will receive 50% of all gross proceeds of any settlement revenue received by the Client from pre-lawsuit ‘advisory notices,’ and 37.5% of all gross proceeds received by the Client from ‘final warning’ notices sent immediately prior to a lawsuit,” the filing notes.

Also of interest is that Rightscorp has offered not to work with any of the copyright holders’ direct competitors, providing certain thresholds are met – $10,000 revenue in the first month to $100,000 after 12 months. But there’s more to the deal.

Rightscorp will also provide a number of services to this client including detecting and verifying copyright works on P2P networks, providing information about infringers, plus reporting, litigation support, and copyright protection advisory services.

For this, Rightscorp will earn $10,000 for the first three months, rising to $85,000 per month after 16 months, valuable revenue for a company fighting for its life.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Announcing the Winners of the AWS Chatbot Challenge – Conversational, Intelligent Chatbots using Amazon Lex and AWS Lambda

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/announcing-the-winners-of-the-aws-chatbot-challenge-conversational-intelligent-chatbots-using-amazon-lex-and-aws-lambda/

A couple of months ago on the blog, I announced the AWS Chatbot Challenge in conjunction with Slack. The AWS Chatbot Challenge was an opportunity to build a unique chatbot that helped to solve a problem or that would add value for its prospective users. The mission was to build a conversational, natural language chatbot using Amazon Lex and leverage Lex’s integration with AWS Lambda to execute logic or data processing on the backend.

I know that you all have been anxiously waiting to hear announcements of who were the winners of the AWS Chatbot Challenge as much as I was. Well wait no longer, the winners of the AWS Chatbot Challenge have been decided.

May I have the Envelope Please? (The Trumpets sound)

The winners of the AWS Chatbot Challenge are:

  • First Place: BuildFax Counts by Joe Emison
  • Second Place: Hubsy by Andrew Riess, Andrew Puch, and John Wetzel
  • Third Place: PFMBot by Benny Leong and his team from MoneyLion.
  • Large Organization Winner: ADP Payroll Innovation Bot by Eric Liu, Jiaxing Yan, and Fan Yang

 

Diving into the Winning Chatbot Projects

Let’s take a walkthrough of the details for each of the winning projects to get a view of what made these chatbots distinctive, as well as, learn more about the technologies used to implement the chatbot solution.

 

BuildFax Counts by Joe Emison

The BuildFax Counts bot was created as a real solution for the BuildFax company to decrease the amount the time that sales and marketing teams can get answers on permits or properties with permits meet certain criteria.

BuildFax, a company co-founded by bot developer Joe Emison, has the only national database of building permits, which updates data from approximately half of the United States on a monthly basis. In order to accommodate the many requests that come in from the sales and marketing team regarding permit information, BuildFax has a technical sales support team that fulfills these requests sent to a ticketing system by manually writing SQL queries that run across the shards of the BuildFax databases. Since there are a large number of requests received by the internal sales support team and due to the manual nature of setting up the queries, it may take several days for getting the sales and marketing teams to receive an answer.

The BuildFax Counts chatbot solves this problem by taking the permit inquiry that would normally be sent into a ticket from the sales and marketing team, as input from Slack to the chatbot. Once the inquiry is submitted into Slack, a query executes and the inquiry results are returned immediately.

Joe built this solution by first creating a nightly export of the data in their BuildFax MySQL RDS database to CSV files that are stored in Amazon S3. From the exported CSV files, an Amazon Athena table was created in order to run quick and efficient queries on the data. He then used Amazon Lex to create a bot to handle the common questions and criteria that may be asked by the sales and marketing teams when seeking data from the BuildFax database by modeling the language used from the BuildFax ticketing system. He added several different sample utterances and slot types; both custom and Lex provided, in order to correctly parse every question and criteria combination that could be received from an inquiry.  Using Lambda, Joe created a Javascript Lambda function that receives information from the Lex intent and used it to build a SQL statement that runs against the aforementioned Athena database using the AWS SDK for JavaScript in Node.js library to return inquiry count result and SQL statement used.

The BuildFax Counts bot is used today for the BuildFax sales and marketing team to get back data on inquiries immediately that previously took up to a week to receive results.

Not only is BuildFax Counts bot our 1st place winner and wonderful solution, but its creator, Joe Emison, is a great guy.  Joe has opted to donate his prize; the $5,000 cash, the $2,500 in AWS Credits, and one re:Invent ticket to the Black Girls Code organization. I must say, you rock Joe for helping these kids get access and exposure to technology.

 

Hubsy by Andrew Riess, Andrew Puch, and John Wetzel

Hubsy bot was created to redefine and personalize the way users traditionally manage their HubSpot account. HubSpot is a SaaS system providing marketing, sales, and CRM software. Hubsy allows users of HubSpot to create engagements and log engagements with customers, provide sales teams with deals status, and retrieves client contact information quickly. Hubsy uses Amazon Lex’s conversational interface to execute commands from the HubSpot API so that users can gain insights, store and retrieve data, and manage tasks directly from Facebook, Slack, or Alexa.

In order to implement the Hubsy chatbot, Andrew and the team members used AWS Lambda to create a Lambda function with Node.js to parse the users request and call the HubSpot API, which will fulfill the initial request or return back to the user asking for more information. Terraform was used to automatically setup and update Lambda, CloudWatch logs, as well as, IAM profiles. Amazon Lex was used to build the conversational piece of the bot, which creates the utterances that a person on a sales team would likely say when seeking information from HubSpot. To integrate with Alexa, the Amazon Alexa skill builder was used to create an Alexa skill which was tested on an Echo Dot. Cloudwatch Logs are used to log the Lambda function information to CloudWatch in order to debug different parts of the Lex intents. In order to validate the code before the Terraform deployment, ESLint was additionally used to ensure the code was linted and proper development standards were followed.

 

PFMBot by Benny Leong and his team from MoneyLion

PFMBot, Personal Finance Management Bot,  is a bot to be used with the MoneyLion finance group which offers customers online financial products; loans, credit monitoring, and free credit score service to improve the financial health of their customers. Once a user signs up an account on the MoneyLion app or website, the user has the option to link their bank accounts with the MoneyLion APIs. Once the bank account is linked to the APIs, the user will be able to login to their MoneyLion account and start having a conversation with the PFMBot based on their bank account information.

The PFMBot UI has a web interface built with using Javascript integration. The chatbot was created using Amazon Lex to build utterances based on the possible inquiries about the user’s MoneyLion bank account. PFMBot uses the Lex built-in AMAZON slots and parsed and converted the values from the built-in slots to pass to AWS Lambda. The AWS Lambda functions interacting with Amazon Lex are Java-based Lambda functions which call the MoneyLion Java-based internal APIs running on Spring Boot. These APIs obtain account data and related bank account information from the MoneyLion MySQL Database.

 

ADP Payroll Innovation Bot by Eric Liu, Jiaxing Yan, and Fan Yang

ADP PI (Payroll Innovation) bot is designed to help employees of ADP customers easily review their own payroll details and compare different payroll data by just asking the bot for results. The ADP PI Bot additionally offers issue reporting functionality for employees to report payroll issues and aids HR managers in quickly receiving and organizing any reported payroll issues.

The ADP Payroll Innovation bot is an ecosystem for the ADP payroll consisting of two chatbots, which includes ADP PI Bot for external clients (employees and HR managers), and ADP PI DevOps Bot for internal ADP DevOps team.


The architecture for the ADP PI DevOps bot is different architecture from the ADP PI bot shown above as it is deployed internally to ADP. The ADP PI DevOps bot allows input from both Slack and Alexa. When input comes into Slack, Slack sends the request to Lex for it to process the utterance. Lex then calls the Lambda backend, which obtains ADP data sitting in the ADP VPC running within an Amazon VPC. When input comes in from Alexa, a Lambda function is called that also obtains data from the ADP VPC running on AWS.

The architecture for the ADP PI bot consists of users entering in requests and/or entering issues via Slack. When requests/issues are entered via Slack, the Slack APIs communicate via Amazon API Gateway to AWS Lambda. The Lambda function either writes data into one of the Amazon DynamoDB databases for recording issues and/or sending issues or it sends the request to Lex. When sending issues, DynamoDB integrates with Trello to keep HR Managers abreast of the escalated issues. Once the request data is sent from Lambda to Lex, Lex processes the utterance and calls another Lambda function that integrates with the ADP API and it calls ADP data from within the ADP VPC, which runs on Amazon Virtual Private Cloud (VPC).

Python and Node.js were the chosen languages for the development of the bots.

The ADP PI bot ecosystem has the following functional groupings:

Employee Functionality

  • Summarize Payrolls
  • Compare Payrolls
  • Escalate Issues
  • Evolve PI Bot

HR Manager Functionality

  • Bot Management
  • Audit and Feedback

DevOps Functionality

  • Reduce call volume in service centers (ADP PI Bot).
  • Track issues and generate reports (ADP PI Bot).
  • Monitor jobs for various environment (ADP PI DevOps Bot)
  • View job dashboards (ADP PI DevOps Bot)
  • Query job details (ADP PI DevOps Bot)

 

Summary

Let’s all wish all the winners of the AWS Chatbot Challenge hearty congratulations on their excellent projects.

You can review more details on the winning projects, as well as, all of the submissions to the AWS Chatbot Challenge at: https://awschatbot2017.devpost.com/submissions. If you are curious on the details of Chatbot challenge contest including resources, rules, prizes, and judges, you can review the original challenge website here:  https://awschatbot2017.devpost.com/.

Hopefully, you are just as inspired as I am to build your own chatbot using Lex and Lambda. For more information, take a look at the Amazon Lex developer guide or the AWS AI blog on Building Better Bots Using Amazon Lex (Part 1)

Chat with you soon!

Tara

New – SES Dedicated IP Pools

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-ses-dedicated-ip-pools/

Today we released Dedicated IP Pools for Amazon Simple Email Service (SES). With dedicated IP pools, you can specify which dedicated IP addresses to use for sending different types of email. Dedicated IP pools let you use your SES for different tasks. For instance, you can send transactional emails from one set of IPs and you can send marketing emails from another set of IPs.

If you’re not familiar with Amazon SES these concepts may not make much sense. We haven’t had the chance to cover SES on this blog since 2016, which is a shame, so I want to take a few steps back and talk about the service as a whole and some of the enhancements the team has made over the past year. If you just want the details on this new feature I strongly recommend reading the Amazon Simple Email Service Blog.

What is SES?

So, what is SES? If you’re a customer of Amazon.com you know that we send a lot of emails. Bought something? You get an email. Order shipped? You get an email. Over time, as both email volumes and types increased Amazon.com needed to build an email platform that was flexible, scalable, reliable, and cost-effective. SES is the result of years of Amazon’s own work in dealing with email and maximizing deliverability.

In short: SES gives you the ability to send and receive many types of email with the monitoring and tools to ensure high deliverability.

Sending an email is easy; one simple API call:

import boto3
ses = boto3.client('ses')
ses.send_email(
    [email protected]',
    Destination={'ToAddresses': [[email protected]']},
    Message={
        'Subject': {'Data': 'Hello, World!'},
        'Body': {'Text': {'Data': 'Hello, World!'}}
    }
)

Receiving and reacting to emails is easy too. You can set up rulesets that forward received emails to Amazon Simple Storage Service (S3), Amazon Simple Notification Service (SNS), or AWS Lambda – you could even trigger a Amazon Lex bot through Lambda to communicate with your customers over email. SES is a powerful tool for building applications. The image below shows just a fraction of the capabilities:

Deliverability 101

Deliverability is the percentage of your emails that arrive in your recipients’ inboxes. Maintaining deliverability is a shared responsibility between AWS and the customer. AWS takes the fight against spam very seriously and works hard to make sure services aren’t abused. To learn more about deliverability I recommend the deliverability docs. For now, understand that deliverability is an important aspect of email campaigns and SES has many tools that enable a customer to manage their deliverability.

Dedicated IPs and Dedicated IP pools

When you’re starting out with SES your emails are sent through a shared IP. That IP is responsible for sending mail on behalf of many customers and AWS works to maintain appropriate volume and deliverability on each of those IPs. However, when you reach a sufficient volume shared IPs may not be the right solution.

By creating a dedicated IP you’re able to fully control the reputations of those IPs. This makes it vastly easier to troubleshoot any deliverability or reputation issues. It’s also useful for many email certification programs which require a dedicated IP as a commitment to maintaining your email reputation. Using the shared IPs of the Amazon SES service is still the right move for many customers but if you have sustained daily sending volume greater than hundreds of thousands of emails per day you might want to consider a dedicated IP. One caveat to be aware of: if you’re not sending a sufficient volume of email with a consistent pattern a dedicated IP can actually hurt your reputation. Dedicated IPs are $24.95 per address per month at the time of this writing – but you can find out more at the pricing page.

Before you can use a Dedicated IP you need to “warm” it. You do this by gradually increasing the volume of emails you send through a new address. Each IP needs time to build a positive reputation. In March of this year SES released the ability to automatically warm your IPs over the course of 45 days. This feature is on by default for all new dedicated IPs.

Customers who send high volumes of email will typically have multiple dedicated IPs. Today the SES team released dedicated IP pools to make managing those IPs easier. Now when you send email you can specify a configuration set which will route your email to an IP in a pool based on the pool’s association with that configuration set.

One of the other major benefits of this feature is that it allows customers who previously split their email sending across several AWS accounts (to manage their reputation for different types of email) to consolidate into a single account.

You can read the documentation and blog for more info.

More on My LinkedIn Account

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/08/more_on_my_link.html

I have successfully gotten the fake LinkedIn account in my name deleted. To prevent someone from doing this again, I signed up for LinkedIn. This is my first — and only — post on that account:

My Only LinkedIn Post (Yes, Really)

Welcome to my LinkedIn page. It looks empty because I’m never here. I don’t log in, I never post anything, and I won’t read any notes or comments you leave on this site. Nor will I accept any invitations or click on any “connect” links. I’m sure LinkedIn is a nice place; I just don’t have the time.

If you’re looking for me, visit my webpage at www.schneier.com. There you’ll find my blog, and just about everything I’ve written. My e-mail address is [email protected], if you want to talk to me personally.

I mirror my blog on my Facebook page (https://www.facebook.com/bruce.schneier/) and my Twitter feed (@schneierblog), but I don’t visit those, either.

Now I hear that LinkedIn is e-mailing people on my behalf, suggesting that they friend, follow, connect, or whatever they do there with me. I assure you that I have nothing to do with any of those e-mails, nor do I care what anyone does in response.

Analyzing AWS Cost and Usage Reports with Looker and Amazon Athena

Post Syndicated from Dillon Morrison original https://aws.amazon.com/blogs/big-data/analyzing-aws-cost-and-usage-reports-with-looker-and-amazon-athena/

This is a guest post by Dillon Morrison at Looker. Looker is, in their own words, “a new kind of analytics platform–letting everyone in your business make better decisions by getting reliable answers from a tool they can use.” 

As the breadth of AWS products and services continues to grow, customers are able to more easily move their technology stack and core infrastructure to AWS. One of the attractive benefits of AWS is the cost savings. Rather than paying upfront capital expenses for large on-premises systems, customers can instead pay variables expenses for on-demand services. To further reduce expenses AWS users can reserve resources for specific periods of time, and automatically scale resources as needed.

The AWS Cost Explorer is great for aggregated reporting. However, conducting analysis on the raw data using the flexibility and power of SQL allows for much richer detail and insight, and can be the better choice for the long term. Thankfully, with the introduction of Amazon Athena, monitoring and managing these costs is now easier than ever.

In the post, I walk through setting up the data pipeline for cost and usage reports, Amazon S3, and Athena, and discuss some of the most common levers for cost savings. I surface tables through Looker, which comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive.

Analysis with Athena

With Athena, there’s no need to create hundreds of Excel reports, move data around, or deploy clusters to house and process data. Athena uses Apache Hive’s DDL to create tables, and the Presto querying engine to process queries. Analysis can be performed directly on raw data in S3. Conveniently, AWS exports raw cost and usage data directly into a user-specified S3 bucket, making it simple to start querying with Athena quickly. This makes continuous monitoring of costs virtually seamless, since there is no infrastructure to manage. Instead, users can leverage the power of the Athena SQL engine to easily perform ad-hoc analysis and data discovery without needing to set up a data warehouse.

After the data pipeline is established, cost and usage data (the recommended billing data, per AWS documentation) provides a plethora of comprehensive information around usage of AWS services and the associated costs. Whether you need the report segmented by product type, user identity, or region, this report can be cut-and-sliced any number of ways to properly allocate costs for any of your business needs. You can then drill into any specific line item to see even further detail, such as the selected operating system, tenancy, purchase option (on-demand, spot, or reserved), and so on.

Walkthrough

By default, the Cost and Usage report exports CSV files, which you can compress using gzip (recommended for performance). There are some additional configuration options for tuning performance further, which are discussed below.

Prerequisites

If you want to follow along, you need the following resources:

Enable the cost and usage reports

First, enable the Cost and Usage report. For Time unit, select Hourly. For Include, select Resource IDs. All options are prompted in the report-creation window.

The Cost and Usage report dumps CSV files into the specified S3 bucket. Please note that it can take up to 24 hours for the first file to be delivered after enabling the report.

Configure the S3 bucket and files for Athena querying

In addition to the CSV file, AWS also creates a JSON manifest file for each cost and usage report. Athena requires that all of the files in the S3 bucket are in the same format, so we need to get rid of all these manifest files. If you’re looking to get started with Athena quickly, you can simply go into your S3 bucket and delete the manifest file manually, skip the automation described below, and move on to the next section.

To automate the process of removing the manifest file each time a new report is dumped into S3, which I recommend as you scale, there are a few additional steps. The folks at Concurrency labs wrote a great overview and set of scripts for this, which you can find in their GitHub repo.

These scripts take the data from an input bucket, remove anything unnecessary, and dump it into a new output bucket. We can utilize AWS Lambda to trigger this process whenever new data is dropped into S3, or on a nightly basis, or whatever makes most sense for your use-case, depending on how often you’re querying the data. Please note that enabling the “hourly” report means that data is reported at the hour-level of granularity, not that a new file is generated every hour.

Following these scripts, you’ll notice that we’re adding a date partition field, which isn’t necessary but improves query performance. In addition, converting data from CSV to a columnar format like ORC or Parquet also improves performance. We can automate this process using Lambda whenever new data is dropped in our S3 bucket. Amazon Web Services discusses columnar conversion at length, and provides walkthrough examples, in their documentation.

As a long-term solution, best practice is to use compression, partitioning, and conversion. However, for purposes of this walkthrough, we’re not going to worry about them so we can get up-and-running quicker.

Set up the Athena query engine

In your AWS console, navigate to the Athena service, and click “Get Started”. Follow the tutorial and set up a new database (we’ve called ours “AWS Optimizer” in this example). Don’t worry about configuring your initial table, per the tutorial instructions. We’ll be creating a new table for cost and usage analysis. Once you walked through the tutorial steps, you’ll be able to access the Athena interface, and can begin running Hive DDL statements to create new tables.

One thing that’s important to note, is that the Cost and Usage CSVs also contain the column headers in their first row, meaning that the column headers would be included in the dataset and any queries. For testing and quick set-up, you can remove this line manually from your first few CSV files. Long-term, you’ll want to use a script to programmatically remove this row each time a new file is dropped in S3 (every few hours typically). We’ve drafted up a sample script for ease of reference, which we run on Lambda. We utilize Lambda’s native ability to invoke the script whenever a new object is dropped in S3.

For cost and usage, we recommend using the DDL statement below. Since our data is in CSV format, we don’t need to use a SerDe, we can simply specify the “separatorChar, quoteChar, and escapeChar”, and the structure of the files (“TEXTFILE”). Note that AWS does have an OpenCSV SerDe as well, if you prefer to use that.

 

CREATE EXTERNAL TABLE IF NOT EXISTS cost_and_usage	 (
identity_LineItemId String,
identity_TimeInterval String,
bill_InvoiceId String,
bill_BillingEntity String,
bill_BillType String,
bill_PayerAccountId String,
bill_BillingPeriodStartDate String,
bill_BillingPeriodEndDate String,
lineItem_UsageAccountId String,
lineItem_LineItemType String,
lineItem_UsageStartDate String,
lineItem_UsageEndDate String,
lineItem_ProductCode String,
lineItem_UsageType String,
lineItem_Operation String,
lineItem_AvailabilityZone String,
lineItem_ResourceId String,
lineItem_UsageAmount String,
lineItem_NormalizationFactor String,
lineItem_NormalizedUsageAmount String,
lineItem_CurrencyCode String,
lineItem_UnblendedRate String,
lineItem_UnblendedCost String,
lineItem_BlendedRate String,
lineItem_BlendedCost String,
lineItem_LineItemDescription String,
lineItem_TaxType String,
product_ProductName String,
product_accountAssistance String,
product_architecturalReview String,
product_architectureSupport String,
product_availability String,
product_bestPractices String,
product_cacheEngine String,
product_caseSeverityresponseTimes String,
product_clockSpeed String,
product_currentGeneration String,
product_customerServiceAndCommunities String,
product_databaseEdition String,
product_databaseEngine String,
product_dedicatedEbsThroughput String,
product_deploymentOption String,
product_description String,
product_durability String,
product_ebsOptimized String,
product_ecu String,
product_endpointType String,
product_engineCode String,
product_enhancedNetworkingSupported String,
product_executionFrequency String,
product_executionLocation String,
product_feeCode String,
product_feeDescription String,
product_freeQueryTypes String,
product_freeTrial String,
product_frequencyMode String,
product_fromLocation String,
product_fromLocationType String,
product_group String,
product_groupDescription String,
product_includedServices String,
product_instanceFamily String,
product_instanceType String,
product_io String,
product_launchSupport String,
product_licenseModel String,
product_location String,
product_locationType String,
product_maxIopsBurstPerformance String,
product_maxIopsvolume String,
product_maxThroughputvolume String,
product_maxVolumeSize String,
product_maximumStorageVolume String,
product_memory String,
product_messageDeliveryFrequency String,
product_messageDeliveryOrder String,
product_minVolumeSize String,
product_minimumStorageVolume String,
product_networkPerformance String,
product_operatingSystem String,
product_operation String,
product_operationsSupport String,
product_physicalProcessor String,
product_preInstalledSw String,
product_proactiveGuidance String,
product_processorArchitecture String,
product_processorFeatures String,
product_productFamily String,
product_programmaticCaseManagement String,
product_provisioned String,
product_queueType String,
product_requestDescription String,
product_requestType String,
product_routingTarget String,
product_routingType String,
product_servicecode String,
product_sku String,
product_softwareType String,
product_storage String,
product_storageClass String,
product_storageMedia String,
product_technicalSupport String,
product_tenancy String,
product_thirdpartySoftwareSupport String,
product_toLocation String,
product_toLocationType String,
product_training String,
product_transferType String,
product_usageFamily String,
product_usagetype String,
product_vcpu String,
product_version String,
product_volumeType String,
product_whoCanOpenCases String,
pricing_LeaseContractLength String,
pricing_OfferingClass String,
pricing_PurchaseOption String,
pricing_publicOnDemandCost String,
pricing_publicOnDemandRate String,
pricing_term String,
pricing_unit String,
reservation_AvailabilityZone String,
reservation_NormalizedUnitsPerReservation String,
reservation_NumberOfReservations String,
reservation_ReservationARN String,
reservation_TotalReservedNormalizedUnits String,
reservation_TotalReservedUnits String,
reservation_UnitsPerReservation String,
resourceTags_userName String,
resourceTags_usercostcategory String  


)
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      ESCAPED BY '\\'
      LINES TERMINATED BY '\n'

STORED AS TEXTFILE
    LOCATION 's3://<<your bucket name>>';

Once you’ve successfully executed the command, you should see a new table named “cost_and_usage” with the below properties. Now we’re ready to start executing queries and running analysis!

Start with Looker and connect to Athena

Setting up Looker is a quick process, and you can try it out for free here (or download from Amazon Marketplace). It takes just a few seconds to connect Looker to your Athena database, and Looker comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive. After you’re connected, you can use the Looker UI to run whatever analysis you’d like. Looker translates this UI to optimized SQL, so any user can execute and visualize queries for true self-service analytics.

Major cost saving levers

Now that the data pipeline is configured, you can dive into the most popular use cases for cost savings. In this post, I focus on:

  • Purchasing Reserved Instances vs. On-Demand Instances
  • Data transfer costs
  • Allocating costs over users or other Attributes (denoted with resource tags)

On-Demand, Spot, and Reserved Instances

Purchasing Reserved Instances vs On-Demand Instances is arguably going to be the biggest cost lever for heavy AWS users (Reserved Instances run up to 75% cheaper!). AWS offers three options for purchasing instances:

  • On-Demand—Pay as you use.
  • Spot (variable cost)—Bid on spare Amazon EC2 computing capacity.
  • Reserved Instances—Pay for an instance for a specific, allotted period of time.

When purchasing a Reserved Instance, you can also choose to pay all-upfront, partial-upfront, or monthly. The more you pay upfront, the greater the discount.

If your company has been using AWS for some time now, you should have a good sense of your overall instance usage on a per-month or per-day basis. Rather than paying for these instances On-Demand, you should try to forecast the number of instances you’ll need, and reserve them with upfront payments.

The total amount of usage with Reserved Instances versus overall usage with all instances is called your coverage ratio. It’s important not to confuse your coverage ratio with your Reserved Instance utilization. Utilization represents the amount of reserved hours that were actually used. Don’t worry about exceeding capacity, you can still set up Auto Scaling preferences so that more instances get added whenever your coverage or utilization crosses a certain threshold (we often see a target of 80% for both coverage and utilization among savvy customers).

Calculating the reserved costs and coverage can be a bit tricky with the level of granularity provided by the cost and usage report. The following query shows your total cost over the last 6 months, broken out by Reserved Instance vs other instance usage. You can substitute the cost field for usage if you’d prefer. Please note that you should only have data for the time period after the cost and usage report has been enabled (though you can opt for up to 3 months of historical data by contacting your AWS Account Executive). If you’re just getting started, this query will only show a few days.

 

SELECT 
	DATE_FORMAT(from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate),'%Y-%m') AS "cost_and_usage.usage_start_month",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_reserved_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_ris",
	COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_non_reserved_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_non_ris"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

The resulting table should look something like the image below (I’m surfacing tables through Looker, though the same table would result from querying via command line or any other interface).

With a BI tool, you can create dashboards for easy reference and monitoring. New data is dumped into S3 every few hours, so your dashboards can update several times per day.

It’s an iterative process to understand the appropriate number of Reserved Instances needed to meet your business needs. After you’ve properly integrated Reserved Instances into your purchasing patterns, the savings can be significant. If your coverage is consistently below 70%, you should seriously consider adjusting your purchase types and opting for more Reserved instances.

Data transfer costs

One of the great things about AWS data storage is that it’s incredibly cheap. Most charges often come from moving and processing that data. There are several different prices for transferring data, broken out largely by transfers between regions and availability zones. Transfers between regions are the most costly, followed by transfers between Availability Zones. Transfers within the same region and same availability zone are free unless using elastic or public IP addresses, in which case there is a cost. You can find more detailed information in the AWS Pricing Docs. With this in mind, there are several simple strategies for helping reduce costs.

First, since costs increase when transferring data between regions, it’s wise to ensure that as many services as possible reside within the same region. The more you can localize services to one specific region, the lower your costs will be.

Second, you should maximize the data you’re routing directly within AWS services and IP addresses. Transfers out to the open internet are the most costly and least performant mechanisms of data transfers, so it’s best to keep transfers within AWS services.

Lastly, data transfers between private IP addresses are cheaper than between elastic or public IP addresses, so utilizing private IP addresses as much as possible is the most cost-effective strategy.

The following query provides a table depicting the total costs for each AWS product, broken out transfer cost type. Substitute the “lineitem_productcode” field in the query to segment the costs by any other attribute. If you notice any unusually high spikes in cost, you’ll need to dig deeper to understand what’s driving that spike: location, volume, and so on. Drill down into specific costs by including “product_usagetype” and “product_transfertype” in your query to identify the types of transfer costs that are driving up your bill.

SELECT 
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-In')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_inbound_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-Out')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_outbound_data_transfer_cost"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

When moving between regions or over the open web, many data transfer costs also include the origin and destination location of the data movement. Using a BI tool with mapping capabilities, you can get a nice visual of data flows. The point at the center of the map is used to represent external data flows over the open internet.

Analysis by tags

AWS provides the option to apply custom tags to individual resources, so you can allocate costs over whatever customized segment makes the most sense for your business. For a SaaS company that hosts software for customers on AWS, maybe you’d want to tag the size of each customer. The following query uses custom tags to display the reserved, data transfer, and total cost for each AWS service, broken out by tag categories, over the last 6 months. You’ll want to substitute the cost_and_usage.resourcetags_customersegment and cost_and_usage.customer_segment with the name of your customer field.

 

SELECT * FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY z___min_rank) as z___pivot_row_rank, RANK() OVER (PARTITION BY z__pivot_col_rank ORDER BY z___min_rank) as z__pivot_col_ordering FROM (
SELECT *, MIN(z___rank) OVER (PARTITION BY "cost_and_usage.product_code") as z___min_rank FROM (
SELECT *, RANK() OVER (ORDER BY CASE WHEN z__pivot_col_rank=1 THEN (CASE WHEN "cost_and_usage.total_unblended_cost" IS NOT NULL THEN 0 ELSE 1 END) ELSE 2 END, CASE WHEN z__pivot_col_rank=1 THEN "cost_and_usage.total_unblended_cost" ELSE NULL END DESC, "cost_and_usage.total_unblended_cost" DESC, z__pivot_col_rank, "cost_and_usage.product_code") AS z___rank FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY CASE WHEN "cost_and_usage.customer_segment" IS NULL THEN 1 ELSE 0 END, "cost_and_usage.customer_segment") AS z__pivot_col_rank FROM (
SELECT 
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	cost_and_usage.resourcetags_customersegment  AS "cost_and_usage.customer_segment",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_data_transfers_unblended",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.unblended_percent_spend_on_ris"
FROM aws_optimizer.cost_and_usage_raw  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1,2) ww
) bb WHERE z__pivot_col_rank <= 16384
) aa
) xx
) zz
 WHERE z___pivot_row_rank <= 500 OR z__pivot_col_ordering = 1 ORDER BY z___pivot_row_rank

The resulting table in this example looks like the results below. In this example, you can tell that we’re making poor use of Reserved Instances because they represent such a small portion of our overall costs.

Again, using a BI tool to visualize these costs and trends over time makes the analysis much easier to consume and take action on.

Summary

Saving costs on your AWS spend is always an iterative, ongoing process. Hopefully with these queries alone, you can start to understand your spending patterns and identify opportunities for savings. However, this is just a peek into the many opportunities available through analysis of the Cost and Usage report. Each company is different, with unique needs and usage patterns. To achieve maximum cost savings, we encourage you to set up an analytics environment that enables your team to explore all potential cuts and slices of your usage data, whenever it’s necessary. Exploring different trends and spikes across regions, services, user types, etc. helps you gain comprehensive understanding of your major cost levers and consistently implement new cost reduction strategies.

Note that all of the queries and analysis provided in this post were generated using the Looker data platform. If you’re already a Looker customer, you can get all of this analysis, additional pre-configured dashboards, and much more using Looker Blocks for AWS.


About the Author

Dillon Morrison leads the Platform Ecosystem at Looker. He enjoys exploring new technologies and architecting the most efficient data solutions for the business needs of his company and their customers. In his spare time, you’ll find Dillon rock climbing in the Bay Area or nose deep in the docs of the latest AWS product release at his favorite cafe (“Arlequin in SF is unbeatable!”).

 

 

 

Cloudflare Kicking ‘Daily Stormer’ is Bad News For Pirate Sites

Post Syndicated from Ernesto original https://torrentfreak.com/cloudflare-kicking-daily-stormer-is-bad-news-for-pirate-sites-170817/

“I woke up this morning in a bad mood and decided to kick them off the Internet.”

Those are the words of Cloudflare CEO Matthew Prince, who decided to terminate the account of controversial Neo-Nazi site Daily Stormer.

Bam. Gone. At least for a while.

Although many people are happy to see the site go offline, the decision is not without consequence. It goes directly against what many saw as the core values of the company.

For years on end, Cloudflare has been asked to remove terrorist propaganda, pirate sites, and other possibly unacceptable content. Each time, Cloudflare replied that it doesn’t take action without a court order. No exceptions.

“Even if it were able to, Cloudfare does not monitor, evaluate, judge or store content appearing on a third party website,” the company wrote just a few weeks ago, in its whitepaper on intermediary liability.

“We’re the plumbers of the internet. We make the pipes work but it’s not right for us to inspect what is or isn’t going through the pipes,” Cloudflare CEO Matthew Prince himself said not too long ago.

“If companies like ours or ISPs start censoring there would be an uproar. It would lead us down a path of internet censors and controls akin to a country like China,” he added.

The same arguments were repeated in different contexts, over and over.

This strong position was also one of the reasons why Cloudflare was dragged into various copyright infringement court cases. In these cases, the company repeatedly stressed that removing a site from Cloudflare’s service would not make infringing content disappear.

Pirate sites would just require a simple DNS reconfiguration to continue their operation, after all.

“[T]here are no measures of any kind that CloudFlare could take to prevent this alleged infringement, because the termination of CloudFlare’s CDN services would have no impact on the existence and ability of these allegedly infringing websites to continue to operate,” it said.

That comment looks rather misplaced now that the CEO of the same company has decided to “kick” a website “off the Internet” after an emotional, but deliberate, decision.

Taking a page from Cloudflare’s (old) playbook we’re not going to make any judgments here. Just search Twitter or any social media site and you’ll see plenty of opinions, both for and against the company’s actions.

We do have a prediction though. During the months and years to come, Cloudflare is likely to be dragged into many more copyright lawsuits, and when they are, their counterparts are going to bring up Cloudflare’s voluntary decision to kick a website off the Internet.

Unless Cloudflare suddenly decides to pull all pirate sites from its service tomorrow, of course.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

New – VPC Endpoints for DynamoDB

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-vpc-endpoints-for-dynamodb/

Starting today Amazon Virtual Private Cloud (VPC) Endpoints for Amazon DynamoDB are available in all public AWS regions. You can provision an endpoint right away using the AWS Management Console or the AWS Command Line Interface (CLI). There are no additional costs for a VPC Endpoint for DynamoDB.

Many AWS customers run their applications within a Amazon Virtual Private Cloud (VPC) for security or isolation reasons. Previously, if you wanted your EC2 instances in your VPC to be able to access DynamoDB, you had two options. You could use an Internet Gateway (with a NAT Gateway or assigning your instances public IPs) or you could route all of your traffic to your local infrastructure via VPN or AWS Direct Connect and then back to DynamoDB. Both of these solutions had security and throughput implications and it could be difficult to configure NACLs or security groups to restrict access to just DynamoDB. Here is a picture of the old infrastructure.

Creating an Endpoint

Let’s create a VPC Endpoint for DynamoDB. We can make sure our region supports the endpoint with the DescribeVpcEndpointServices API call.


aws ec2 describe-vpc-endpoint-services --region us-east-1
{
    "ServiceNames": [
        "com.amazonaws.us-east-1.dynamodb",
        "com.amazonaws.us-east-1.s3"
    ]
}

Great, so I know my region supports these endpoints and I know what my regional endpoint is. I can grab one of my VPCs and provision an endpoint with a quick call to the CLI or through the console. Let me show you how to use the console.

First I’ll navigate to the VPC console and select “Endpoints” in the sidebar. From there I’ll click “Create Endpoint” which brings me to this handy console.

You’ll notice the AWS Identity and Access Management (IAM) policy section for the endpoint. This supports all of the fine grained access control that DynamoDB supports in regular IAM policies and you can restrict access based on IAM policy conditions.

For now I’ll give full access to my instances within this VPC and click “Next Step”.

This brings me to a list of route tables in my VPC and asks me which of these route tables I want to assign my endpoint to. I’ll select one of them and click “Create Endpoint”.

Keep in mind the note of warning in the console: if you have source restrictions to DynamoDB based on public IP addresses the source IP of your instances accessing DynamoDB will now be their private IP addresses.

After adding the VPC Endpoint for DynamoDB to our VPC our infrastructure looks like this.

That’s it folks! It’s that easy. It’s provided at no cost. Go ahead and start using it today. If you need more details you can read the docs here.

AWS Summit New York – Summary of Announcements

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-summit-new-york-summary-of-announcements/

Whew – what a week! Tara, Randall, Ana, and I have been working around the clock to create blog posts for the announcements that we made at the AWS Summit in New York. Here’s a summary to help you to get started:

Amazon Macie – This new service helps you to discover, classify, and secure content at scale. Powered by machine learning and making use of Natural Language Processing (NLP), Macie looks for patterns and alerts you to suspicious behavior, and can help you with governance, compliance, and auditing. You can read Tara’s post to see how to put Macie to work; you select the buckets of interest, customize the classification settings, and review the results in the Macie Dashboard.

AWS GlueRandall’s post (with deluxe animated GIFs) introduces you to this new extract, transform, and load (ETL) service. Glue is serverless and fully managed, As you can see from the post, Glue crawls your data, infers schemas, and generates ETL scripts in Python. You define jobs that move data from place to place, with a wide selection of transforms, each expressed as code and stored in human-readable form. Glue uses Development Endpoints and notebooks to provide you with a testing environment for the scripts you build. We also announced that Amazon Athena now integrates with Amazon Glue, as does Apache Spark and Hive on Amazon EMR.

AWS Migration Hub – This new service will help you to migrate your application portfolio to AWS. My post outlines the major steps and shows you how the Migration Hub accelerates, tracks,and simplifies your migration effort. You can begin with a discovery step, or you can jump right in and migrate directly. Migration Hub integrates with tools from our migration partners and builds upon the Server Migration Service and the Database Migration Service.

CloudHSM Update – We made a major upgrade to AWS CloudHSM, making the benefits of hardware-based key management available to a wider audience. The service is offered on a pay-as-you-go basis, and is fully managed. It is open and standards compliant, with support for multiple APIs, programming languages, and cryptography extensions. CloudHSM is an integral part of AWS and can be accessed from the AWS Management Console, AWS Command Line Interface (CLI), and through API calls. Read my post to learn more and to see how to set up a CloudHSM cluster.

Managed Rules to Secure S3 Buckets – We added two new rules to AWS Config that will help you to secure your S3 buckets. The s3-bucket-public-write-prohibited rule identifies buckets that have public write access and the s3-bucket-public-read-prohibited rule identifies buckets that have global read access. As I noted in my post, you can run these rules in response to configuration changes or on a schedule. The rules make use of some leading-edge constraint solving techniques, as part of a larger effort to use automated formal reasoning about AWS.

CloudTrail for All Customers – Tara’s post revealed that AWS CloudTrail is now available and enabled by default for all AWS customers. As a bonus, Tara reviewed the principal benefits of CloudTrail and showed you how to review your event history and to deep-dive on a single event. She also showed you how to create a second trail, for use with CloudWatch CloudWatch Events.

Encryption of Data at Rest for EFS – When you create a new file system, you now have the option to select a key that will be used to encrypt the contents of the files on the file system. The encryption is done using an industry-standard AES-256 algorithm. My post shows you how to select a key and to verify that it is being used.

Watch the Keynote
My colleagues Adrian Cockcroft and Matt Wood talked about these services and others on the stage, and also invited some AWS customers to share their stories. Here’s the video:

Jeff;

 

Launch – AWS Glue Now Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/launch-aws-glue-now-generally-available/

Today we’re excited to announce the general availability of AWS Glue. Glue is a fully managed, serverless, and cloud-optimized extract, transform and load (ETL) service. Glue is different from other ETL services and platforms in a few very important ways.

First, Glue is “serverless” – you don’t need to provision or manage any resources and you only pay for resources when Glue is actively running. Second, Glue provides crawlers that can automatically detect and infer schemas from many data sources, data types, and across various types of partitions. It stores these generated schemas in a centralized Data Catalog for editing, versioning, querying, and analysis. Third, Glue can automatically generate ETL scripts (in Python!) to translate your data from your source formats to your target formats. Finally, Glue allows you to create development endpoints that allow your developers to use their favorite toolchains to construct their ETL scripts. Ok, let’s dive deep with an example.

In my job as a Developer Evangelist I spend a lot of time traveling and I thought it would be cool to play with some flight data. The Bureau of Transportations Statistics is kind enough to share all of this data for anyone to use here. We can easily download this data and put it in an Amazon Simple Storage Service (S3) bucket. This data will be the basis of our work today.

Crawlers

First, we need to create a Crawler for our flights data from S3. We’ll select Crawlers in the Glue console and follow the on screen prompts from there. I’ll specify s3://crawler-public-us-east-1/flight/2016/csv/ as my first datasource (we can add more later if needed). Next, we’ll create a database called flights and give our tables a prefix of flights as well.

The Crawler will go over our dataset, detect partitions through various folders – in this case months of the year, detect the schema, and build a table. We could add additonal data sources and jobs into our crawler or create separate crawlers that push data into the same database but for now let’s look at the autogenerated schema.

I’m going to make a quick schema change to year, moving it from BIGINT to INT. Then I can compare the two versions of the schema if needed.

Now that we know how to correctly parse this data let’s go ahead and do some transforms.

ETL Jobs

Now we’ll navigate to the Jobs subconsole and click Add Job. Will follow the prompts from there giving our job a name, selecting a datasource, and an S3 location for temporary files. Next we add our target by specifying “Create tables in your data target” and we’ll specify an S3 location in Parquet format as our target.

After clicking next, we’re at screen showing our various mappings proposed by Glue. Now we can make manual column adjustments as needed – in this case we’re just going to use the X button to remove a few columns that we don’t need.

This brings us to my favorite part. This is what I absolutely love about Glue.

Glue generated a PySpark script to transform our data based on the information we’ve given it so far. On the left hand side we can see a diagram documenting the flow of the ETL job. On the top right we see a series of buttons that we can use to add annotated data sources and targets, transforms, spigots, and other features. This is the interface I get if I click on transform.

If we add any of these transforms or additional data sources, Glue will update the diagram on the left giving us a useful visualization of the flow of our data. We can also just write our own code into the console and have it run. We can add triggers to this job that fire on completion of another job, a schedule, or on demand. That way if we add more flight data we can reload this same data back into S3 in the format we need.

I could spend all day writing about the power and versatility of the jobs console but Glue still has more features I want to cover. So, while I might love the script editing console, I know many people prefer their own development environments, tools, and IDEs. Let’s figure out how we can use those with Glue.

Development Endpoints and Notebooks

A Development Endpoint is an environment used to develop and test our Glue scripts. If we navigate to “Dev endpoints” in the Glue console we can click “Add endpoint” in the top right to get started. Next we’ll select a VPC, a security group that references itself and then we wait for it to provision.


Once it’s provisioned we can create an Apache Zeppelin notebook server by going to actions and clicking create notebook server. We give our instance an IAM role and make sure it has permissions to talk to our data sources. Then we can either SSH into the server or connect to the notebook to interactively develop our script.

Pricing and Documentation

You can see detailed pricing information here. Glue crawlers, ETL jobs, and development endpoints are all billed in Data Processing Unit Hours (DPU) (billed by minute). Each DPU-Hour costs $0.44 in us-east-1. A single DPU provides 4vCPU and 16GB of memory.

We’ve only covered about half of the features that Glue has so I want to encourage everyone who made it this far into the post to go read the documentation and service FAQs. Glue also has a rich and powerful API that allows you to do anything console can do and more.

We’re also releasing two new projects today. The aws-glue-libs provide a set of utilities for connecting, and talking with Glue. The aws-glue-samples repo contains a set of example jobs.

I hope you find that using Glue reduces the time it takes to start doing things with your data. Look for another post from me on AWS Glue soon because I can’t stop playing with this new service.
Randall

AWS CloudHSM Update – Cost Effective Hardware Key Management at Cloud Scale for Sensitive & Regulated Workloads

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-cloudhsm-update-cost-effective-hardware-key-management/

Our customers run an incredible variety of mission-critical workloads on AWS, many of which process and store sensitive data. As detailed in our Overview of Security Processes document, AWS customers have access to an ever-growing set of options for encrypting and protecting this data. For example, Amazon Relational Database Service (RDS) supports encryption of data at rest and in transit, with options tailored for each supported database engine (MySQL, SQL Server, Oracle, MariaDB, PostgreSQL, and Aurora).

Many customers use AWS Key Management Service (KMS) to centralize their key management, with others taking advantage of the hardware-based key management, encryption, and decryption provided by AWS CloudHSM to meet stringent security and compliance requirements for their most sensitive data and regulated workloads (you can read my post, AWS CloudHSM – Secure Key Storage and Cryptographic Operations, to learn more about Hardware Security Modules, also known as HSMs).

Major CloudHSM Update
Today, building on what we have learned from our first-generation product, we are making a major update to CloudHSM, with a set of improvements designed to make the benefits of hardware-based key management available to a much wider audience while reducing the need for specialized operating expertise. Here’s a summary of the improvements:

Pay As You Go – CloudHSM is now offered under a pay-as-you-go model that is simpler and more cost-effective, with no up-front fees.

Fully Managed – CloudHSM is now a scalable managed service; provisioning, patching, high availability, and backups are all built-in and taken care of for you. Scheduled backups extract an encrypted image of your HSM from the hardware (using keys that only the HSM hardware itself knows) that can be restored only to identical HSM hardware owned by AWS. For durability, those backups are stored in Amazon Simple Storage Service (S3), and for an additional layer of security, encrypted again with server-side S3 encryption using an AWS KMS master key.

Open & Compatible  – CloudHSM is open and standards-compliant, with support for multiple APIs, programming languages, and cryptography extensions such as PKCS #11, Java Cryptography Extension (JCE), and Microsoft CryptoNG (CNG). The open nature of CloudHSM gives you more control and simplifies the process of moving keys (in encrypted form) from one CloudHSM to another, and also allows migration to and from other commercially available HSMs.

More Secure – CloudHSM Classic (the original model) supports the generation and use of keys that comply with FIPS 140-2 Level 2. We’re stepping that up a notch today with support for FIPS 140-2 Level 3, with security mechanisms that are designed to detect and respond to physical attempts to access or modify the HSM. Your keys are protected with exclusive, single-tenant access to tamper-resistant HSMs that appear within your Virtual Private Clouds (VPCs). CloudHSM supports quorum authentication for critical administrative and key management functions. This feature allows you to define a list of N possible identities that can access the functions, and then require at least M of them to authorize the action. It also supports multi-factor authentication using tokens that you provide.

AWS-Native – The updated CloudHSM is an integral part of AWS and plays well with other tools and services. You can create and manage a cluster of HSMs using the AWS Management Console, AWS Command Line Interface (CLI), or API calls.

Diving In
You can create CloudHSM clusters that contain 1 to 32 HSMs, each in a separate Availability Zone in a particular AWS Region. Spreading HSMs across AZs gives you high availability (including built-in load balancing); adding more HSMs gives you additional throughput. The HSMs within a cluster are kept in sync: performing a task or operation on one HSM in a cluster automatically updates the others. Each HSM in a cluster has its own Elastic Network Interface (ENI).

All interaction with an HSM takes place via the AWS CloudHSM client. It runs on an EC2 instance and uses certificate-based mutual authentication to create secure (TLS) connections to the HSMs.

At the hardware level, each HSM includes hardware-enforced isolation of crypto operations and key storage. Each customer HSM runs on dedicated processor cores.

Setting Up a Cluster
Let’s set up a cluster using the CloudHSM Console:

I click on Create cluster to get started, select my desired VPC and the subnets within it (I can also create a new VPC and/or subnets if needed):

Then I review my settings and click on Create:

After a few minutes, my cluster exists, but is uninitialized:

Initialization simply means retrieving a certificate signing request (the Cluster CSR):

And then creating a private key and using it to sign the request (these commands were copied from the Initialize Cluster docs and I have omitted the output. Note that ID identifies the cluster):

$ openssl genrsa -out CustomerRoot.key 2048
$ openssl req -new -x509 -days 365 -key CustomerRoot.key -out CustomerRoot.crt
$ openssl x509 -req -days 365 -in ID_ClusterCsr.csr   \
                              -CA CustomerRoot.crt    \
                              -CAkey CustomerRoot.key \
                              -CAcreateserial         \
                              -out ID_CustomerHsmCertificate.crt

The next step is to apply the signed certificate to the cluster using the console or the CLI. After this has been done, the cluster can be activated by changing the password for the HSM’s administrative user, otherwise known as the Crypto Officer (CO).

Once the cluster has been created, initialized and activated, it can be used to protect data. Applications can use the APIs in AWS CloudHSM SDKs to manage keys, encrypt & decrypt objects, and more. The SDKs provide access to the CloudHSM client (running on the same instance as the application). The client, in turn, connects to the cluster across an encrypted connection.

Available Today
The new HSM is available today in the US East (Northern Virginia), US West (Oregon), US East (Ohio), and EU (Ireland) Regions, with more in the works. Pricing starts at $1.45 per HSM per hour.

Jeff;