A couple of months ago on the blog, I announced the AWS Chatbot Challenge in conjunction with Slack. The AWS Chatbot Challenge was an opportunity to build a unique chatbot that helped to solve a problem or that would add value for its prospective users. The mission was to build a conversational, natural language chatbot using Amazon Lex and leverage Lex’s integration with AWS Lambda to execute logic or data processing on the backend.
I know you have all been waiting as anxiously as I was to hear who won the AWS Chatbot Challenge. Well, wait no longer: the winners of the AWS Chatbot Challenge have been decided.
May I have the Envelope Please? (The Trumpets sound)
The winners of the AWS Chatbot Challenge are:
- First Place: BuildFax Counts by Joe Emison
- Second Place: Hubsy by Andrew Riess, Andrew Puch, and John Wetzel
- Third Place: PFMBot by Benny Leong and his team from MoneyLion
- Large Organization Winner: ADP Payroll Innovation Bot by Eric Liu, Jiaxing Yan, and Fan Yang
Diving into the Winning Chatbot Projects
Let’s walk through the details of each winning project to see what made these chatbots distinctive, and to learn more about the technologies used to implement each solution.
BuildFax Counts by Joe Emison
The BuildFax Counts bot was created as a real solution for BuildFax, reducing the time it takes sales and marketing teams to get answers about permits, or about properties whose permits meet certain criteria.
BuildFax, a company co-founded by bot developer Joe Emison, has the only national database of building permits, which updates data from approximately half of the United States on a monthly basis. To accommodate the many requests that come in from the sales and marketing team regarding permit information, BuildFax has a technical sales support team that fulfills requests sent to a ticketing system by manually writing SQL queries that run across the shards of the BuildFax databases. Because the internal sales support team receives a large number of requests and the queries must be set up manually, it can take several days for the sales and marketing teams to receive an answer.
The BuildFax Counts chatbot solves this problem by taking the permit inquiry that would normally become a ticket and accepting it as input to the chatbot via Slack. Once the inquiry is submitted in Slack, a query executes and the results are returned immediately.
The BuildFax sales and marketing team uses the bot today to get immediate answers to inquiries that previously took up to a week to turn around.
Not only is the BuildFax Counts bot our first-place winner and a wonderful solution, but its creator, Joe Emison, is a great guy. Joe has opted to donate his prize (the $5,000 cash, the $2,500 in AWS credits, and one re:Invent ticket) to the Black Girls Code organization. I must say, you rock, Joe, for helping these kids get access and exposure to technology.
Hubsy by Andrew Riess, Andrew Puch, and John Wetzel
Hubsy was created to redefine and personalize the way users traditionally manage their HubSpot accounts. HubSpot is a SaaS system providing marketing, sales, and CRM software. Hubsy allows HubSpot users to create and log customer engagements, check the status of deals for sales teams, and quickly retrieve client contact information. Hubsy uses Amazon Lex’s conversational interface to execute commands against the HubSpot API so that users can gain insights, store and retrieve data, and manage tasks directly from Facebook, Slack, or Alexa.
To implement Hubsy, Andrew and his teammates used AWS Lambda to create a Node.js function that parses the user’s request and calls the HubSpot API, which either fulfills the request or asks the user for more information. Terraform was used to automatically set up and update the Lambda function, CloudWatch Logs, and IAM profiles. Amazon Lex was used to build the conversational piece of the bot, modeling the utterances a salesperson would likely say when seeking information from HubSpot. To integrate with Alexa, the Alexa Skill Builder was used to create a skill, which was tested on an Echo Dot. CloudWatch Logs capture the Lambda function’s output for debugging the different Lex intents. Finally, ESLint was used to lint the code and enforce development standards before each Terraform deployment.
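The team’s function is written in Node.js; purely to illustrate the request/response contract Lex expects from a fulfillment Lambda function, here is a minimal sketch in Python. The intent name, slot name, and HubSpot lookup are hypothetical.

```python
# Minimal sketch of a Lex fulfillment handler. Hubsy's real function is
# Node.js and calls the HubSpot API; this only illustrates the event and
# response shapes Lex exchanges with Lambda. Intent/slot names are made up.
def lambda_handler(event, context):
    intent = event["currentIntent"]["name"]
    slots = event["currentIntent"]["slots"]

    if intent == "GetDealStatus":                            # hypothetical intent
        message = lookup_deal_status(slots.get("DealName"))  # hypothetical slot
        return {
            "dialogAction": {
                "type": "Close",
                "fulfillmentState": "Fulfilled",
                "message": {"contentType": "PlainText", "content": message},
            }
        }

    # For anything we can't fulfill, ask Lex to re-prompt the user.
    return {
        "dialogAction": {
            "type": "ElicitIntent",
            "message": {"contentType": "PlainText",
                        "content": "Sorry, what would you like to know?"},
        }
    }

def lookup_deal_status(deal_name):
    # Placeholder for a HubSpot API call; endpoint and auth are assumptions.
    return f"Deal '{deal_name}' is currently in the negotiation stage."
```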
PFMBot by Benny Leong and his team from MoneyLion
PFMBot (Personal Finance Management Bot) is a bot for MoneyLion, a finance group that offers customers online financial products such as loans, credit monitoring, and a free credit score service to improve their financial health. Once a user signs up for an account on the MoneyLion app or website, they have the option to link their bank accounts via the MoneyLion APIs. Once a bank account is linked, the user can log in to their MoneyLion account and start a conversation with PFMBot based on their bank account information.
ADP Payroll Innovation Bot by Eric Liu, Jiaxing Yan, and Fan Yang
ADP PI (Payroll Innovation) bot is designed to help employees of ADP customers easily review their own payroll details and compare different payroll data by just asking the bot for results. The ADP PI Bot additionally offers issue reporting functionality for employees to report payroll issues and aids HR managers in quickly receiving and organizing any reported payroll issues.
The ADP Payroll Innovation bot is an ecosystem for ADP payroll consisting of two chatbots: the ADP PI Bot for external clients (employees and HR managers), and the ADP PI DevOps Bot for the internal ADP DevOps team.
The architecture of the ADP PI DevOps bot differs from that of the ADP PI bot in that it is deployed internally to ADP. The ADP PI DevOps bot accepts input from both Slack and Alexa. When input comes in from Slack, Slack sends the request to Lex to process the utterance. Lex then calls the Lambda backend, which obtains ADP data from the ADP VPC running on AWS. When input comes in from Alexa, a Lambda function is called that obtains data from the same ADP VPC.
The architecture of the ADP PI bot starts with users entering requests and reporting issues via Slack. The Slack APIs communicate through Amazon API Gateway to AWS Lambda. The Lambda function either writes data into an Amazon DynamoDB table (for recording and escalating issues) or sends the request on to Lex. For escalated issues, DynamoDB integrates with Trello to keep HR managers abreast of them. Once the request is sent from Lambda to Lex, Lex processes the utterance and calls another Lambda function that integrates with the ADP API, pulling ADP data from the ADP VPC, which runs on Amazon Virtual Private Cloud (VPC).
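As a rough sketch of how such a routing function might look (the table name, bot name, and event shape are assumptions, not ADP’s actual implementation):

```python
# Hedged sketch of the routing Lambda behind API Gateway; all names are
# assumptions made for illustration.
import boto3

dynamodb = boto3.resource("dynamodb")
lex = boto3.client("lex-runtime")

def lambda_handler(event, context):
    user_id = event["user_id"]
    text = event["text"]

    if text.lower().startswith("report issue"):
        # Record the issue; a DynamoDB-driven integration can then push
        # escalated issues to Trello for HR managers.
        table = dynamodb.Table("payroll_issues")   # hypothetical table name
        table.put_item(Item={"user_id": user_id, "description": text})
        return {"message": "Your payroll issue has been recorded."}

    # Otherwise hand the utterance to Lex for intent resolution and fulfillment.
    response = lex.post_text(
        botName="ADPPIBot",        # hypothetical bot name
        botAlias="prod",
        userId=user_id,
        inputText=text,
    )
    return {"message": response["message"]}
```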
The ADP PI bot ecosystem has the following functional groupings:
Employee Functionality
- Summarize Payrolls
- Compare Payrolls
- Escalate Issues
- Evolve PI Bot
HR Manager Functionality
- Bot Management
- Audit and Feedback
Across the ecosystem, the bots also:
- Reduce call volume in service centers (ADP PI Bot)
- Track issues and generate reports (ADP PI Bot)
- Monitor jobs for various environments (ADP PI DevOps Bot)
- View job dashboards (ADP PI DevOps Bot)
- Query job details (ADP PI DevOps Bot)
Let’s wish all the winners of the AWS Chatbot Challenge hearty congratulations on their excellent projects.
You can review more details on the winning projects, as well as all of the submissions to the AWS Chatbot Challenge, at https://awschatbot2017.devpost.com/submissions. If you are curious about the details of the challenge, including resources, rules, prizes, and judges, you can review the original challenge website: https://awschatbot2017.devpost.com/.
Hopefully, you are just as inspired as I am to build your own chatbot using Lex and Lambda. For more information, take a look at the Amazon Lex developer guide or the AWS AI blog post Building Better Bots Using Amazon Lex (Part 1).
Chat with you soon!
Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/08/18/timeshiftgrafanabuzz-1w-issue-9/
Matt from Grafana NYC spent the week visiting Stockholm to focus on v5.0 with Torkel. Despite warnings otherwise, the weather has been beautiful, making a nice backdrop for many UX discussions. Very, very excited to soon show what we’ve been working on.
Grafana <3 Prometheus
Our very own Carl Bergquist spoke at PromCon 2017 yesterday in Munich, highlighting recent Grafana features and enhancements.
We also used the opportunity to debut our upcoming Prometheus query editor with a load of new functionality. It seems the community approves; in fact, this is our most popular tweet ever!
— Grafana (@grafana) August 17, 2017
From the Blogosphere
Wikimedia Metrics: A tweet this week reminded us of the public metrics Wikimedia exposes using Grafana. Exploring the performance stats in real time for the 5th most popular site on the internet is pretty fun.
Creating Grafana Annotations with InfluxDB: Nice short article by Max Chadwick showing how to quickly add InfluxDB as a source for Grafana annotations.
This week’s MVC (Most Valuable Contributor)
This week’s MVC highlights what is great about Open Source software.
ericslaw submitted his first PR to a public project this past week. Speaking from personal experience, submitting a PR can feel daunting, and we were lucky that he chose Grafana. Even the smallest contributions, like Eric’s fix of a bogus link in our templating docs, have a big impact.
Tweet of the Week
We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove
Seems the excitement about Prometheus and Grafana has also caught the attention of a certain superhero.
— Viktor Petersson 🎩 (@vpetersson) August 18, 2017
What do you think?
That wraps up another issue. Hope you’re finding these roundups valuable. Let us know how we’re doing! Submit a comment on this article below, or post something at our community forum. Help us make this better!
Post Syndicated from Dillon Morrison original https://aws.amazon.com/blogs/big-data/analyzing-aws-cost-and-usage-reports-with-looker-and-amazon-athena/
This is a guest post by Dillon Morrison at Looker. Looker is, in their own words, “a new kind of analytics platform–letting everyone in your business make better decisions by getting reliable answers from a tool they can use.”
As the breadth of AWS products and services continues to grow, customers are able to more easily move their technology stack and core infrastructure to AWS. One of the attractive benefits of AWS is the cost savings. Rather than paying upfront capital expenses for large on-premises systems, customers can instead pay variable expenses for on-demand services. To further reduce expenses, AWS users can reserve resources for specific periods of time and automatically scale resources as needed.
The AWS Cost Explorer is great for aggregated reporting. However, conducting analysis on the raw data using the flexibility and power of SQL allows for much richer detail and insight, and can be the better choice for the long term. Thankfully, with the introduction of Amazon Athena, monitoring and managing these costs is now easier than ever.
In this post, I walk through setting up the data pipeline for cost and usage reports, Amazon S3, and Athena, and discuss some of the most common levers for cost savings. I surface tables through Looker, which comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive.
Analysis with Athena
With Athena, there’s no need to create hundreds of Excel reports, move data around, or deploy clusters to house and process data. Athena uses Apache Hive’s DDL to create tables, and the Presto querying engine to process queries. Analysis can be performed directly on raw data in S3. Conveniently, AWS exports raw cost and usage data directly into a user-specified S3 bucket, making it simple to start querying with Athena quickly. This makes continuous monitoring of costs virtually seamless, since there is no infrastructure to manage. Instead, users can leverage the power of the Athena SQL engine to easily perform ad-hoc analysis and data discovery without needing to set up a data warehouse.
After the data pipeline is established, cost and usage data (the recommended billing data, per AWS documentation) provides a plethora of comprehensive information around usage of AWS services and the associated costs. Whether you need the report segmented by product type, user identity, or region, this report can be cut-and-sliced any number of ways to properly allocate costs for any of your business needs. You can then drill into any specific line item to see even further detail, such as the selected operating system, tenancy, purchase option (on-demand, spot, or reserved), and so on.
By default, the Cost and Usage report exports CSV files, which you can compress using gzip (recommended for performance). There are some additional configuration options for tuning performance further, which are discussed below.
If you want to follow along, you need the following resources:
- AWS account
- New or existing S3 bucket
- Appropriate permissions for Athena to the S3 bucket
Enable the cost and usage reports
First, enable the Cost and Usage report. For Time unit, select Hourly. For Include, select Resource IDs. All options are prompted in the report-creation window.
The Cost and Usage report dumps CSV files into the specified S3 bucket. Please note that it can take up to 24 hours for the first file to be delivered after enabling the report.
Configure the S3 bucket and files for Athena querying
In addition to the CSV file, AWS also creates a JSON manifest file for each cost and usage report. Athena requires that all of the files in the S3 bucket are in the same format, so we need to get rid of all these manifest files. If you’re looking to get started with Athena quickly, you can simply go into your S3 bucket and delete the manifest file manually, skip the automation described below, and move on to the next section.
To automate the process of removing the manifest file each time a new report is dumped into S3, which I recommend as you scale, there are a few additional steps. The folks at Concurrency labs wrote a great overview and set of scripts for this, which you can find in their GitHub repo.
These scripts take the data from an input bucket, remove anything unnecessary, and dump it into a new output bucket. We can utilize AWS Lambda to trigger this process whenever new data is dropped into S3, or on a nightly basis, or whatever makes most sense for your use-case, depending on how often you’re querying the data. Please note that enabling the “hourly” report means that data is reported at the hour-level of granularity, not that a new file is generated every hour.
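The Concurrency Labs scripts are the reference implementation; the core idea fits in a few lines. Here is a minimal sketch of a Lambda handler that copies only the report data files into a clean output bucket, skipping the JSON manifests (bucket names are placeholders):

```python
# Minimal sketch of a manifest-cleanup Lambda, triggered by S3 ObjectCreated
# events on the report bucket. Bucket names are assumptions; see the
# Concurrency Labs repo for a production-ready version.
import boto3

s3 = boto3.client("s3")
OUTPUT_BUCKET = "cur-for-athena"   # hypothetical output bucket

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        if key.endswith(".json"):
            continue                      # skip manifest files entirely
        # Copy only the report data files into the bucket Athena queries.
        s3.copy_object(
            Bucket=OUTPUT_BUCKET,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
        )
```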
Following these scripts, you’ll notice that we’re adding a date partition field, which isn’t necessary but improves query performance. In addition, converting data from CSV to a columnar format like ORC or Parquet also improves performance. We can automate this process using Lambda whenever new data is dropped in our S3 bucket. Amazon Web Services discusses columnar conversion at length, and provides walkthrough examples, in their documentation.
As a long-term solution, best practice is to use compression, partitioning, and conversion. However, for purposes of this walkthrough, we’re not going to worry about them so we can get up-and-running quicker.
Set up the Athena query engine
In your AWS console, navigate to the Athena service and click “Get Started”. Follow the tutorial and set up a new database (we’ve called ours “AWS Optimizer” in this example). Don’t worry about configuring your initial table, per the tutorial instructions; we’ll be creating a new table for cost and usage analysis. Once you’ve walked through the tutorial steps, you’ll be able to access the Athena interface and can begin running Hive DDL statements to create new tables.
One thing that’s important to note is that the cost and usage CSVs contain the column headers in their first row, meaning that the headers would be included in the dataset and any queries. For testing and quick setup, you can remove this line manually from your first few CSV files. Longer term, you’ll want to use a script to programmatically remove this row each time a new file is dropped in S3 (typically every few hours). We’ve drafted up a sample script for ease of reference, which we run on Lambda, utilizing Lambda’s native ability to invoke the script whenever a new object is dropped in S3.
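As a rough sketch of the idea, assuming uncompressed CSVs small enough to fit in Lambda memory (gzipped reports would need gzip handling and streaming), a header-stripping handler could look like this:

```python
# Minimal sketch of stripping the header row from a newly delivered CSV.
# Assumes uncompressed CSVs that fit in Lambda memory.
import boto3

s3 = boto3.client("s3")
OUTPUT_BUCKET = "cur-no-headers"   # hypothetical; writing back to the source
                                   # bucket would re-trigger this function

def lambda_handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        # Drop the first line (the column headers) and write out the rest.
        _, _, data = body.partition("\n")
        s3.put_object(Bucket=OUTPUT_BUCKET, Key=key, Body=data.encode("utf-8"))
```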
For cost and usage, we recommend using the DDL statement below. Since our data is in CSV format, we don’t need a custom SerDe; we can simply specify the separatorChar, quoteChar, and escapeChar, and the structure of the files (TEXTFILE). Note that AWS does offer an OpenCSV SerDe as well, if you prefer to use that.
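As a hedged sketch of what that DDL looks like when submitted through boto3: the real report has over a hundred columns, so only a few illustrative ones (drawn from fields referenced later in this post) are shown, the quote/escape settings are omitted for brevity, the database name is an Athena-friendly lowercase form of “AWS Optimizer”, and the S3 locations are placeholders.

```python
# Hedged reconstruction of the kind of DDL the post describes, submitted
# via boto3. Column list is illustrative, not the full report schema.
import boto3

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS aws_optimizer.cost_and_usage (
  identity_lineitemid STRING,
  lineitem_usagestartdate STRING,
  lineitem_productcode STRING,
  product_usagetype STRING,
  product_transfertype STRING,
  resourcetags_customersegment STRING,
  lineitem_blendedcost DOUBLE
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://your-cur-bucket/reports/'
"""

athena = boto3.client("athena")
athena.start_query_execution(
    QueryString=ddl,
    ResultConfiguration={"OutputLocation": "s3://your-query-results-bucket/"},
)
```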
Once you’ve successfully executed the command, you should see a new table named “cost_and_usage” with the properties you defined. Now we’re ready to start executing queries and running analysis!
Start with Looker and connect to Athena
Setting up Looker is a quick process, and you can try it out for free here (or download from Amazon Marketplace). It takes just a few seconds to connect Looker to your Athena database, and Looker comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive. After you’re connected, you can use the Looker UI to run whatever analysis you’d like. Looker translates this UI to optimized SQL, so any user can execute and visualize queries for true self-service analytics.
Major cost saving levers
Now that the data pipeline is configured, you can dive into the most popular use cases for cost savings. In this post, I focus on:
- Purchasing Reserved Instances vs. On-Demand Instances
- Data transfer costs
- Allocating costs over users or other attributes (denoted with resource tags)
On-Demand, Spot, and Reserved Instances
Purchasing Reserved Instances vs On-Demand Instances is arguably going to be the biggest cost lever for heavy AWS users (Reserved Instances run up to 75% cheaper!). AWS offers three options for purchasing instances:
- On-Demand—Pay as you use.
- Spot (variable cost)—Bid on spare Amazon EC2 computing capacity.
- Reserved Instances—Pay for an instance for a specific, allotted period of time.
When purchasing a Reserved Instance, you can also choose to pay all-upfront, partial-upfront, or monthly. The more you pay upfront, the greater the discount.
If your company has been using AWS for some time now, you should have a good sense of your overall instance usage on a per-month or per-day basis. Rather than paying for these instances On-Demand, you should try to forecast the number of instances you’ll need, and reserve them with upfront payments.
The total amount of usage covered by Reserved Instances versus your overall usage across all instances is called your coverage ratio. It’s important not to confuse your coverage ratio with your Reserved Instance utilization: utilization represents the amount of reserved hours that were actually used. Don’t worry about exceeding capacity; you can still set up Auto Scaling preferences so that more instances get added whenever your coverage or utilization crosses a certain threshold (we often see a target of 80% for both among savvy customers).
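A toy calculation makes the distinction between the two ratios concrete (all numbers hypothetical):

```python
# Toy numbers to make coverage vs. utilization concrete (hypothetical values).
total_instance_hours = 10000   # all usage: reserved + on-demand + spot
ri_covered_hours = 7500        # usage hours billed against reservations
ri_purchased_hours = 8000      # hours of reserved capacity you paid for

coverage = ri_covered_hours / total_instance_hours    # 0.75 -> 75% coverage
utilization = ri_covered_hours / ri_purchased_hours   # ~0.94 -> 94% utilization
```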
Calculating the reserved costs and coverage can be a bit tricky with the level of granularity provided by the cost and usage report. The following query shows your total cost over the last 6 months, broken out by Reserved Instance vs other instance usage. You can substitute the cost field for usage if you’d prefer. Please note that you should only have data for the time period after the cost and usage report has been enabled (though you can opt for up to 3 months of historical data by contacting your AWS Account Executive). If you’re just getting started, this query will only show a few days.
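A sketch of what such a query might look like follows; the column names (particularly pricing_term) are assumptions about how your report headers map into the Athena table, and the start date is hard-coded for illustration.

```python
# Hedged sketch: monthly cost split into Reserved Instance vs. other usage.
coverage_query = """
SELECT
  date_trunc('month', date_parse(lineitem_usagestartdate,
                                 '%Y-%m-%dT%H:%i:%sZ')) AS usage_month,
  CASE WHEN pricing_term = 'Reserved'
       THEN 'Reserved Instance' ELSE 'Other' END AS purchase_type,
  SUM(lineitem_blendedcost) AS total_cost
FROM aws_optimizer.cost_and_usage
WHERE lineitem_usagestartdate >= '2017-02-01'  -- roughly the last 6 months
GROUP BY 1, 2
ORDER BY 1, 2
"""
```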
The resulting table should look something like the image below (I’m surfacing tables through Looker, though the same table would result from querying via command line or any other interface).
With a BI tool, you can create dashboards for easy reference and monitoring. New data is dumped into S3 every few hours, so your dashboards can update several times per day.
It’s an iterative process to understand the appropriate number of Reserved Instances needed to meet your business needs. After you’ve properly integrated Reserved Instances into your purchasing patterns, the savings can be significant. If your coverage is consistently below 70%, you should seriously consider adjusting your purchase types and opting for more Reserved instances.
Data transfer costs
One of the great things about AWS data storage is that it’s incredibly cheap. Most charges often come from moving and processing that data. There are several different prices for transferring data, broken out largely by transfers between regions and availability zones. Transfers between regions are the most costly, followed by transfers between Availability Zones. Transfers within the same region and same availability zone are free unless using elastic or public IP addresses, in which case there is a cost. You can find more detailed information in the AWS Pricing Docs. With this in mind, there are several simple strategies for helping reduce costs.
First, since costs increase when transferring data between regions, it’s wise to ensure that as many services as possible reside within the same region. The more you can localize services to one specific region, the lower your costs will be.
Second, you should maximize the data you’re routing directly within AWS services and IP addresses. Transfers out to the open internet are the most costly and least performant mechanisms of data transfers, so it’s best to keep transfers within AWS services.
Lastly, data transfers between private IP addresses are cheaper than between elastic or public IP addresses, so utilizing private IP addresses as much as possible is the most cost-effective strategy.
The following query provides a table depicting the total costs for each AWS product, broken out by transfer cost type. Substitute the “lineitem_productcode” field in the query to segment the costs by any other attribute. If you notice any unusually high spikes in cost, you’ll need to dig deeper to understand what’s driving them: location, volume, and so on. Drill down into specific costs by including “product_usagetype” and “product_transfertype” in your query to identify the types of transfer costs that are driving up your bill.
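A sketch of that query, with the same caveat that column names should be checked against your own table:

```python
# Hedged sketch: total cost per product, broken out by transfer type.
transfer_query = """
SELECT
  lineitem_productcode,
  product_transfertype,
  SUM(lineitem_blendedcost) AS total_cost
FROM aws_optimizer.cost_and_usage
WHERE product_transfertype IS NOT NULL
GROUP BY 1, 2
ORDER BY total_cost DESC
"""
```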
When moving between regions or over the open web, many data transfer costs also include the origin and destination location of the data movement. Using a BI tool with mapping capabilities, you can get a nice visual of data flows. The point at the center of the map is used to represent external data flows over the open internet.
Analysis by tags
AWS provides the option to apply custom tags to individual resources, so you can allocate costs over whatever customized segment makes the most sense for your business. For a SaaS company that hosts software for customers on AWS, you might tag resources by customer segment. The following query uses custom tags to display the reserved, data transfer, and total cost for each AWS service, broken out by tag categories, over the last 6 months. You’ll want to substitute cost_and_usage.resourcetags_customersegment and cost_and_usage.customer_segment with the name of your customer field.
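A sketch of the tag query; again, treat the tag and cost columns as assumptions to verify against your table:

```python
# Hedged sketch: reserved, transfer, and total cost per customer segment.
tag_query = """
SELECT
  resourcetags_customersegment AS customer_segment,
  lineitem_productcode,
  SUM(CASE WHEN pricing_term = 'Reserved'
           THEN lineitem_blendedcost ELSE 0 END) AS reserved_cost,
  SUM(CASE WHEN product_transfertype IS NOT NULL
           THEN lineitem_blendedcost ELSE 0 END) AS transfer_cost,
  SUM(lineitem_blendedcost) AS total_cost
FROM aws_optimizer.cost_and_usage
GROUP BY 1, 2
ORDER BY total_cost DESC
"""
```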
The resulting table in this example looks like the results below. In this example, you can tell that we’re making poor use of Reserved Instances because they represent such a small portion of our overall costs.
Again, using a BI tool to visualize these costs and trends over time makes the analysis much easier to consume and take action on.
Saving costs on your AWS spend is always an iterative, ongoing process. Hopefully with these queries alone, you can start to understand your spending patterns and identify opportunities for savings. However, this is just a peek into the many opportunities available through analysis of the Cost and Usage report. Each company is different, with unique needs and usage patterns. To achieve maximum cost savings, we encourage you to set up an analytics environment that enables your team to explore all potential cuts and slices of your usage data, whenever it’s necessary. Exploring different trends and spikes across regions, services, user types, etc. helps you gain comprehensive understanding of your major cost levers and consistently implement new cost reduction strategies.
Note that all of the queries and analysis provided in this post were generated using the Looker data platform. If you’re already a Looker customer, you can get all of this analysis, additional pre-configured dashboards, and much more using Looker Blocks for AWS.
About the Author
Dillon Morrison leads the Platform Ecosystem at Looker. He enjoys exploring new technologies and architecting the most efficient data solutions for the business needs of his company and their customers. In his spare time, you’ll find Dillon rock climbing in the Bay Area or nose deep in the docs of the latest AWS product release at his favorite cafe (“Arlequin in SF is unbeatable!”).
Post Syndicated from Ernesto original https://torrentfreak.com/game-of-thrones-episode-s07e06-leaks-online-early-170816/
Trouble continues for HBO as another episode of the popular Game of Thrones series has just leaked online, days ahead of the official premiere.
Copies of the sixth episode of the current season, titled ‘Death is the Enemy,’ are currently circulating on various streaming portals, direct download, and torrent sites.
The first copy only just appeared on the Pirate Bay, but others were shared elsewhere earlier. One of the leaked videos is 64 minutes long and of high quality, and there are also versions that consist of two separate parts.
Early on, the two parts were circulating on the video streaming site Dailymotion, but these were swiftly removed.
At the moment it’s still unclear how the leak came about, but some suggest that the episode was leaked by HBO itself in Spain. TorrentFreak has not been able to confirm this, but there are no visible watermarks that point elsewhere.
This isn’t the first time that a Game of Thrones episode has leaked online early. Two years ago the same happened with the first four episodes of season 5. Nonetheless, that season still broke previous viewership records.
Two weeks ago the fourth episode of the current season was also pirated before its official release. This leak, which carried a prominent “Star India Pvt Ltd” watermark, triggered a lot of interest from impatient Game of Thrones fans as well.
Earlier this week, news broke that four men had been arrested in connection with the breach, which is still being investigated. The arrested men all worked for the local media processing company Prime Focus Technologies, where the leak reportedly originated.
The current leak is not in any way related to the hack on HBO’s system, which occurred earlier and revealed several preliminary Game of Thrones scripts.
This hack has also resulted in leaks of various high profile shows, including the upcoming ninth season of ‘Curb Your Enthusiasm.’ Initially, these were hard to find online, but they are now widely available on the usual pirate sites.
Post Syndicated from Ernesto original https://torrentfreak.com/roku-gets-tough-on-pirate-channels-warns-users-170815/
In recent years it has become much easier to stream movies and TV-shows over the Internet.
Legal services such as Netflix and HBO are flourishing, but there’s also a darker side to this streaming epidemic. Millions of people are streaming from unauthorized sources, often paired with perfectly legal streaming platforms and devices.
Hollywood insiders have dubbed this trend “Piracy 3.0” and are actively working with stakeholders to address the threat. One of the companies rightsholders are working with is Roku, known for its easy-to-use media players.
Earlier this year Roku was harshly confronted with this new piracy crackdown when a Mexican court ordered local retailers to take its media player off the shelves. While this legal battle isn’t over yet, it was clear to Roku that misuse of its platform wasn’t without consequences.
While Roku never permitted any infringing content, it appears that the company has recently made some adjustments to better deal with the problem, or at least clarify its stance.
Pirate content generally doesn’t show up in the official Roku Channel Store but is directly loaded onto the device through third-party “private” channels. A few weeks ago, Roku renamed these “private” channels to “non-certified” channels, while making it very clear that copyright infringement is not allowed.
A “WARNING!” message that pops up during the installation of these third-party channels stresses that Roku has no control over the content. In addition, the company notes that a channel may be removed if it links to copyright-infringing content.
“By continuing, you acknowledge you are accessing a non-certified channel that may include content that is offensive or inappropriate for some audiences,” Roku’s warning reads.
“Moreover, if Roku determines that this channel violates copyright, contains illegal content, or otherwise violates Roku’s terms and conditions, then ROKU MAY REMOVE THIS CHANNEL WITHOUT PRIOR NOTICE.”
TorrentFreak reached out to Roku to find out how they plan to enforce this policy, but we have yet to hear back. According to Cord Cutters News, several piracy channels have already been removed recently, with other developers opting to leave the platform.
Roku’s General Counsel Steve Kay previously informed us that the company is taking the piracy problem seriously. Together with various stakeholders, they are working hard to address the problem.
“We actively work to prevent third-parties from using our platform to distribute copyright infringing content. Moreover, we have been actively working with other industry stakeholders on a wide range of anti-piracy initiatives,” Kay said.
Roku is not the only platform dealing with the piracy epidemic; the popular media player software Kodi is in the same boat. Kodi has also taken an active anti-piracy stance, but they’re not banning any add-ons; they believe that would be pointless due to the open source nature of their software.
Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-summit-new-york-summary-of-announcements/
Whew – what a week! Tara, Randall, Ana, and I have been working around the clock to create blog posts for the announcements that we made at the AWS Summit in New York. Here’s a summary to help you to get started:
Amazon Macie – This new service helps you to discover, classify, and secure content at scale. Powered by machine learning and making use of Natural Language Processing (NLP), Macie looks for patterns and alerts you to suspicious behavior, and can help you with governance, compliance, and auditing. You can read Tara’s post to see how to put Macie to work; you select the buckets of interest, customize the classification settings, and review the results in the Macie Dashboard.
AWS Glue – Randall’s post (with deluxe animated GIFs) introduces you to this new extract, transform, and load (ETL) service. Glue is serverless and fully managed. As you can see from the post, Glue crawls your data, infers schemas, and generates ETL scripts in Python. You define jobs that move data from place to place, with a wide selection of transforms, each expressed as code and stored in human-readable form. Glue uses Development Endpoints and notebooks to provide you with a testing environment for the scripts you build. We also announced that Amazon Athena now integrates with AWS Glue, as do Apache Spark and Hive on Amazon EMR.
AWS Migration Hub – This new service helps you migrate your application portfolio to AWS. My post outlines the major steps and shows you how the Migration Hub accelerates, tracks, and simplifies your migration effort. You can begin with a discovery step, or you can jump right in and migrate directly. Migration Hub integrates with tools from our migration partners and builds upon the Server Migration Service and the Database Migration Service.
CloudHSM Update – We made a major upgrade to AWS CloudHSM, making the benefits of hardware-based key management available to a wider audience. The service is offered on a pay-as-you-go basis, and is fully managed. It is open and standards compliant, with support for multiple APIs, programming languages, and cryptography extensions. CloudHSM is an integral part of AWS and can be accessed from the AWS Management Console, AWS Command Line Interface (CLI), and through API calls. Read my post to learn more and to see how to set up a CloudHSM cluster.
Managed Rules to Secure S3 Buckets – We added two new rules to AWS Config that will help you to secure your S3 buckets. The s3-bucket-public-write-prohibited rule identifies buckets that have public write access and the s3-bucket-public-read-prohibited rule identifies buckets that have global read access. As I noted in my post, you can run these rules in response to configuration changes or on a schedule. The rules make use of some leading-edge constraint solving techniques, as part of a larger effort to use automated formal reasoning about AWS.
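For reference, a minimal sketch of enabling both rules with boto3; the SourceIdentifier values are the AWS Config managed-rule identifiers, and everything else is a bare-bones configuration.

```python
# Hedged sketch: enable the two new managed rules via the AWS Config API.
import boto3

config = boto3.client("config")

for name, identifier in [
    ("s3-bucket-public-write-prohibited", "S3_BUCKET_PUBLIC_WRITE_PROHIBITED"),
    ("s3-bucket-public-read-prohibited", "S3_BUCKET_PUBLIC_READ_PROHIBITED"),
]:
    config.put_config_rule(
        ConfigRule={
            "ConfigRuleName": name,
            "Source": {"Owner": "AWS", "SourceIdentifier": identifier},
            # Evaluate S3 buckets whenever their configuration changes.
            "Scope": {"ComplianceResourceTypes": ["AWS::S3::Bucket"]},
        }
    )
```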
CloudTrail for All Customers – Tara’s post revealed that AWS CloudTrail is now available and enabled by default for all AWS customers. As a bonus, Tara reviewed the principal benefits of CloudTrail and showed you how to review your event history and deep-dive on a single event. She also showed you how to create a second trail for use with CloudWatch Events.
Encryption of Data at Rest for EFS – When you create a new file system, you now have the option to select a key that will be used to encrypt the contents of the files on the file system. The encryption is done using an industry-standard AES-256 algorithm. My post shows you how to select a key and to verify that it is being used.
Watch the Keynote
My colleagues Adrian Cockcroft and Matt Wood talked about these services and others on the stage, and also invited some AWS customers to share their stories. Here’s the video:
Post Syndicated from Stephen Schmidt original https://aws.amazon.com/blogs/security/aws-announces-amazon-macie/
I’m pleased to announce that today we’ve launched a new security service, Amazon Macie.
This service leverages machine learning to help customers prevent data loss by automatically discovering, classifying, and protecting sensitive data in AWS. Amazon Macie recognizes sensitive data such as personally identifiable information (PII) or intellectual property, providing customers with dashboards and alerts that give visibility into how data is being accessed or moved. This enables customers to apply machine learning to a wide array of security and compliance workloads; we think this will be a significant enabler for our customers.
To learn more, see the full AWS Blog post.
Post Syndicated from Jeremy Winters original https://aws.amazon.com/blogs/big-data/upsert-into-amazon-redshift-using-aws-glue-and-sneaql/
This is a guest post by Jeremy Winters and Ritu Mishra, Solution Architects at Full 360. In their own words, “Full 360 is a cloud first, cloud native integrator, and true believers in the cloud since inception in 2007, our focus has been on helping customers with their journey into the cloud. Our practice areas – Big Data and Warehousing, Application Modernization, and Cloud Ops/Strategy – represent deep, but focused expertise.”
AWS Glue is a fully managed ETL (extract, transform, and load) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores. As a company who has been building data warehouse solutions in the cloud for 10 years, we at Full 360 were interested to see how we can leverage AWS Glue in customer solutions. This post details our experience and lessons learned from using AWS Glue for an Amazon Redshift data integration use case.
UI-based ETL Tools
We have been anticipating the release of AWS Glue since it was announced at re:Invent 2016. Many of our customers are looking for easy-to-use, UI-based tooling to manage their data transformation pipelines. However, in our experience, the complexity of any production pipeline tends to be difficult to unwind, regardless of the technology used to create it. At Full 360, we build cloud-native, script-based applications deployed in containers to handle data integration. We think script-based transformation provides the balance of robustness and flexibility necessary to handle any data problem that comes our way.
AWS Glue caters both to developers who prefer writing scripts and those who want UI-based interactions. It is possible to initially create jobs using the UI, by selecting data source and target. Under the hood, AWS Glue auto-generates the Python code for you, which can be edited if needed, though this isn’t necessary for the majority of use cases.
Of course, you don’t have to rely on the UI at all. You can simply write your own Python scripts, store them in Amazon S3 with any dependent libraries, and import them into AWS Glue. AWS Glue also supports IDE integration with tools such as PyCharm, and interestingly enough, Zeppelin notebooks! These integrations are targeted toward developers who prefer writing Python themselves and want a cleaner dev/test cycle.
Developers who are already in the business of scripting ETL will be excited by the ability to easily deploy Python scripts with AWS Glue, using existing workflows for source control and CI/CD, and have them deployed and executed in a fully managed manner by AWS. The UI experience of AWS Glue works well, but it is good to know that the tool accommodates those who prefer traditional coding. You can also dig into complex edge cases for data handling where the UI doesn’t cut it.
When AWS Lambda was released, we were excited about its potential to host our ETL processes, but with Lambda you are limited to a five-minute maximum for function execution. We resorted to running Docker containers on Amazon ECS to orchestrate many of our customers’ long-running tasks. With this approach, we are still required to manage the underlying infrastructure.
After a closer look at AWS Glue, we realized that it is a fully serverless PySpark runtime, accompanied by an Apache Hive metastore-compatible catalog-as-a-service. This means that you are not just running a script on a single core; instead, you have the full power of a multi-worker Spark environment available. If you’re not familiar with the Spark framework, it introduces a new paradigm that allows for the processing of distributed, in-memory datasets. Spark has many uses, from data transformation to machine learning.
In AWS Glue, you use PySpark dataframes to transform data before reaching your database. Dataframes manage data in a way similar to relational databases, so the methods are likely to be familiar to most SQL users. Additionally, you can use SQL in your PySpark code to manipulate the data.
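A tiny illustration of that mix in a Glue job, with placeholder S3 paths and column names:

```python
# Hedged sketch of dataframe + SQL transformations inside an AWS Glue job.
# Input/output paths and column names are placeholders.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session

df = spark.read.json("s3://your-bucket/raw/")   # hypothetical input path

# Dataframe-style transformation...
cleaned = df.dropDuplicates().filter(df["amount"] > 0)

# ...or plain SQL over the same data, whichever reads better.
cleaned.createOrReplaceTempView("purchases")
totals = spark.sql(
    "SELECT user_id, SUM(amount) AS total FROM purchases GROUP BY user_id"
)
totals.write.mode("overwrite").parquet("s3://your-bucket/clean/")
```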
AWS Glue also simplifies the management of the runtime environment by providing you with a DPU setting, which lets you dial up or down the amount of compute resources used to run your job. One DPU is equivalent to 4 vCPUs and 16 GB of memory.
Common use case
We can see that most customers would leverage AWS Glue to load one or many files from S3 into Amazon Redshift. To accelerate this process, you can use the crawler, an AWS console-based utility, to discover the schema of your data and store it in the AWS Glue Data Catalog, whether your data sits in a file or a database. We were able to discover the schemas of our source file and target table, then have AWS Glue construct and execute the ETL job. It worked! We successfully loaded the beer drinkers’ dataset, JSON files used in our advanced tuning class, into Amazon Redshift!
Our use case
For our use case, we integrated AWS Glue and Amazon Redshift Spectrum with an open-source project that we initiated called SneaQL. SneaQL is an open source, containerized framework for sneaking interactivity into static SQL. SneaQL provides an extension to ANSI SQL with command tags to provide functionality such as loops, variables, conditional execution, and error handling to your scripts. It allows you to manage your scripts in an AWS CodeCommit Git repo, which then gets deployed and executed in a container.
We use SneaQL for complex data integrations, usually with an ELT model, where the data is loaded into the database, then transformed into fact tables using parameterized SQL. SneaQL enables advanced use cases like partial upsert aggregation of data, where multiple data sources can merge into the same fact table.
We think AWS Glue, Redshift Spectrum, and SneaQL offer a compelling way to build a data lake in S3, with all of your metadata accessible through a variety of tools such as Hive, Presto, Spark, and Redshift Spectrum. Build your aggregation table in Amazon Redshift to drive your dashboards or other high-performance analytics.
In the video below, you see a demonstration of using AWS Glue to convert JSON documents into Parquet for partitioned storage in an S3 data lake. We then access the data from S3 into Amazon Redshift by way of Redshift Spectrum. Nearing the end of the AWS Glue job, we then call AWS boto3 to trigger an Amazon ECS SneaQL task to perform an upsert of the data into our fact table. All the sample artifacts needed for this demonstration are available in the Full360/Sneaql Github repository.
In this scenario, AWS Glue manipulates data outside of a data warehouse and loads it to Amazon Redshift, and SneaQL manipulates data inside Amazon Redshift.
We think that they are a good complement to each other when doing complex integrations.
Our pipeline works as follows:
- New files arrive in S3 in JSON format.
- AWS Glue converts the JSON files to Parquet format, stored in another S3 bucket.
- At the end of the AWS Glue script, the AWS SDK for Python (Boto) is used to trigger the Amazon ECS task that runs SneaQL (sketched after this list).
- SneaQL container pulls the secrets file from S3 then decrypts it with biscuit or AWS KMS.
- SneaQL clones the appropriate branch from an AWS CodeCommit Git repo.
- Data in Amazon Redshift is transformed by SneaQL statements stored in the repo.
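Step 3 is worth a closer look. A hedged sketch of how the tail of the Glue script might trigger the SneaQL container (cluster and task definition names are assumptions):

```python
# Hedged sketch of step 3: the end of the Glue script uses boto3 to kick
# off the SneaQL container on ECS. Names are assumptions for illustration.
import boto3

ecs = boto3.client("ecs")
ecs.run_task(
    cluster="sneaql-cluster",            # hypothetical cluster name
    taskDefinition="sneaql-upsert:1",    # hypothetical task definition
    count=1,
)
```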
AWS Glue also supports conditional triggers that would allow us to split the jobs and make one dependent upon the other.
Although, as a general rule, we believe in managing schema definitions in source control, we definitely think that the crawler feature has some attractive perks. Besides being handy for ad-hoc jobs, it is also partition-aware. This means that you can perform data movements without worrying about which data has been processed already, which is one less thing for you to manage. We’re interested to see the design patterns that emerge around the use of the crawler.
The freedom of PySpark also allows us to implement advanced use cases. While the boto3 library is available to all AWS Glue jobs, it is also possible to include any libraries and additional JAR files along with your Python script.
We think many customers will find AWS Glue valuable. Those who prefer to code their data processing pipeline will be quick to realize how powerful it is, especially for solving complex data cleansing problems. The learning curve for PySpark is real, so we suggest looking into dataframes as a good entry point. In the long run, the fact that AWS Glue is serverless makes the effort more than worth it.
If you have questions or suggestions, please comment below.
Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/launch-amazon-macie-securing-your-s3-buckets/
When Jeff and I heard about this service, we were both curious about the meaning of the name Macie. Jeff, being a great researcher, looked it up and found that the name Macie has two meanings. It has both French and English (UK) origins, is typically a girl’s name, and carries various meanings. The first meaning we found said the name meant “weapon”. The second noted that the name represents a person who is bold, sporty, and sweet. In a way, these definitions are appropriate, as today I am happy to announce that we are launching Amazon Macie, a new security service that uses machine learning to help identify and protect sensitive data stored in AWS from breaches, data leaks, and unauthorized access, with Amazon Simple Storage Service (S3) being the initial data store. Therefore, I can imagine Amazon Macie being described as a bold weapon for AWS customers: a sweet service with a sporty user interface that helps protect against malicious access of your data at rest. Whew, that was a mouthful, but I unbelievably got all the Macie descriptions out in a single sentence! Nevertheless, I am thrilled to share with you the power of the new Amazon Macie service.
Amazon Macie is a service powered by machine learning that can automatically discover and classify your data stored in Amazon S3. But Macie doesn’t stop there: once your data has been classified, Macie assigns each data item a business value and then continuously monitors the data in order to detect any suspicious activity based upon access patterns. Key features of the Macie service include:
- Data Security Automation: analyzes, classifies, and processes data to understand historical patterns, user authentications to data, data access locations, and times of access
- Data Security & Monitoring: actively monitors usage log data for detected anomalies, with automatic resolution of reported issues through CloudWatch Events and Lambda
- Data Visibility for Proactive Loss Prevention: provides management visibility into details of stored data, while providing immediate protection without the need for manual customer input
- Data Research and Reporting: allows administrative configuration for reporting and alert management requirements
How does Amazon Macie accomplish this, you ask?
Using machine learning algorithms for natural language processing (NLP), Macie can automate the classification of data in your S3 buckets. In addition, Amazon Macie takes advantage of predictive analytics algorithms, enabling data access patterns to be dynamically analyzed. Learnings are then used to inform you and alert you about possible suspicious behavior. Macie also runs an engine specifically tuned to detect common sources of personally identifiable information (PII) or sensitive personal information (SP). Macie takes advantage of AWS CloudTrail, continuously checking CloudTrail events for PUT requests in S3 buckets and automatically classifying new objects in almost real time.
While Macie is a powerful tool for security and data protection in the AWS cloud, it can also aid you with governance, compliance requirements, and audit standards. Many of you may already be aware of the EU’s most stringent privacy regulation to date, the General Data Protection Regulation (GDPR), which becomes enforceable on May 25, 2018. Because Amazon Macie recognizes personally identifiable information (PII) and provides customers with dashboards and alerts, it helps customers comply with GDPR requirements around encryption and pseudonymization of data. When combined with Lambda queries, Macie becomes a powerful tool to help remediate GDPR concerns.
Tour of the Amazon Macie Service
Let’s take a tour of the service and look at Amazon Macie up close and personal.
First, I will log in to the Macie console and start setting up Macie so that I can begin classifying and protecting my data, by clicking the Get Started button.
I will create these roles and turn on the AWS CloudTrail service in my account. To make setting up Macie easier, you can take advantage of the sample CloudFormation template provided in the Macie User Guide, which sets up the required IAM roles and policies for you; you would then only need to set up a trail as noted in the CloudTrail documentation.
If you have multiple AWS accounts, note that the account you use to enable the Macie service is designated as the master account; you can integrate other accounts with the Macie service, but they have the member account designation. Users from member accounts need to use an IAM role to federate access to the master account in order to access the Macie console.
Now that my IAM roles are created and CloudTrail is enabled, I will click the Enable Macie button to start Macie’s data monitoring and protection.
Once Macie is finished starting the service in your account, you will be brought to the service main screen and any existing alerts in your account will be presented to you. Since I have just started the service, I currently have no existing alerts at this time.
Considering we are doing a tour of the Macie service, I will now integrate some of my S3 buckets with Macie. However, you do not have to specify any S3 buckets for Macie to start monitoring, since the service already uses the AWS CloudTrail Management API to analyze and process information. For this tour of Macie, I have decided to monitor object-level API events from certain buckets in CloudTrail.
In order to integrate with S3, I will go to the Integrations tab of the Macie console. Once on the Integrations tab, I see two options: Accounts and Services. The Accounts option is used to integrate member accounts with Macie and to set your data retention policy. Since I want to integrate specific S3 buckets with Macie, I’ll click the Services option to go to the Services tab.
When I integrate Macie with the S3 service, a trail and an S3 bucket are created to store logs about S3 data events. To get started, I use the Select an account drop-down to choose an account. Once my account is selected, the services available for integration are presented. I’ll select the Amazon S3 service by clicking the Add button.
Now I can select the buckets that I want Macie to analyze. Selecting the Review and Save button takes me to a screen where I confirm that I want object-level logging, and then click the Save button.
Next, on our Macie tour, let’s look at how we can customize data classification with Macie.
As we discussed, Macie automatically monitors and classifies your data. Once Macie identifies your data, it classifies your data objects by file and content type. Macie also uses a support vector machine (SVM) classifier to classify the content within S3 objects, in addition to the metadata of the file. In machine learning, support vector machines are supervised learning models with learning algorithms used for classification and regression analysis of data. Macie trained the SVM classifier on data of varying content types, optimized to support accurate detection of data content, even including source code you may write.
Macie assigns only one content type per data object or file; however, you have the ability to enable or disable content types and file extensions in order to include or exclude them from classification. Once Macie classifies the data, it assigns the object a risk level between 1 and 10, with 10 being the highest risk and 1 the lowest.
To customize the classification of our data with Macie, I’ll go to the Settings tab. I am now presented with the choices available to enable or disable the Macie classification settings.
As an example during our tour of Macie, I will choose File extension, which presents me with the list of file extensions that Macie tracks and uses for classification.
As a test, I’ll edit the apk file extension, the Android application install file format, and disable monitoring of these files by selecting No – disabled from the dropdown and clicking the Save button. Of course, I will turn this back on later, since I want to keep my entire collection of data files safe, including my Android development binaries.
One last thing I want to note about data classification using Macie is that the service provides visibility into how your data objects are being classified, and highlights which of your stored data assets are critical or important for compliance, for personal data, and for your business.
Now that we have explored the data that Macie classifies and monitors, the last stop on our service tour is the Macie dashboard.
The Macie Dashboard provides us with a complete picture of all of the data and activity that has been gathered as Macie monitors and classifies our data. The dashboard displays Metrics and Views grouped by categories to provide different visual perspectives of your data. From a metric, you can also go directly to the Research tab to build and run queries based on that metric. These queries can be used to set up customized alerts for notification of any possible security issues or problems. We won’t have an opportunity to tour the Research or Alerts tabs, but you can find more information about these features in the Macie user guide.
Turning back to the Dashboard, there are so many great resources in the Macie Dashboard that we will not be able to stop at each view, metric, and feature during our tour, so let me give you an overview of all the features of the dashboard that you can take advantage of using.
Dashboard Metrics – monitored data grouped by the following categories:
- High-risk S3 objects: data objects with risk levels of 8 through 10
- Total event occurrences: total count of all event occurrences since Macie was enabled
- Total user sessions: 5-minute snapshots of CloudTrail data
Dashboard Views – views to display various points of the monitored data and activity:
- S3 objects for a selected time range
- S3 objects
- S3 objects by personally identifiable information (PII)
- S3 objects by ACL
- CloudTrail events and associated users
- CloudTrail errors and associated users
- Activity location
- AWS CloudTrail events
- Activity ISPs
- AWS CloudTrail user identity types
Well, that concludes our tour of the new and exciting Amazon Macie service. Amazon Macie is a sensational new service that uses the power of machine learning and deep learning to aid you in securing, identifying, and protecting your data stored in Amazon S3. Using natural language processing (NLP) to automate data classification, Amazon Macie enables you to easily get started with high accuracy classification and immediate protection of your data by simply enabling the service. The interactive dashboards give visibility to the where, what, who, and when of your information allowing you to proactively analyze massive streams of data, data accesses, and API calls in your environment. Learn more about Amazon Macie by visiting the product page or the documentation in the Amazon Macie user guide.
Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-migration-hub-plan-track-enterprise-application-migration/
About once a week, I speak to current and potential AWS customers in our Seattle Executive Briefing Center. While I generally focus on our innovation process, we sometimes discuss other topics, including application migration. When enterprises decide to migrate their application portfolios, they want to do it in a structured, orderly fashion. These portfolios typically consist of hundreds of complex Windows and Linux applications, relational databases, and more. Customers find themselves eager yet uncertain as to how to proceed. After spending time working with these customers, we have learned that their challenges generally fall into three major categories:
Discovery – They want to make sure that they have a deep and complete understanding of all of the moving parts that power each application.
Server & Database Migration – They need to transfer on-premises workloads and database tables to the cloud.
Tracking / Management – With large application portfolios and multiple migrations happening in parallel, they need to track and manage progress in an application-centric fashion.
Over the last couple of years we have launched a set of tools that address the first two challenges. The AWS Application Discovery Service automates the process of discovering and collecting system information, the AWS Server Migration Service takes care of moving workloads to the cloud, and the AWS Database Migration Service moves relational databases, NoSQL databases, and data warehouses with minimal downtime. Partners like Racemi and CloudEndure also offer migration tools of their own.
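If you want a feel for what the discovery side looks like outside the console, the Application Discovery Service exposes the collected data through the SDK as well. Here is a minimal boto3 sketch; the region and the attribute key are assumptions, and the exact attributes you get back depend on what your agents have reported:

```python
import boto3

# Application Discovery Service client; us-west-2 is an assumption here.
discovery = boto3.client("discovery", region_name="us-west-2")

# List the servers that the discovery agents or connector have found.
resp = discovery.list_configurations(configurationType="SERVER", maxResults=25)
for cfg in resp.get("configurations", []):
    # Each configuration is a flat dict of attribute names to values;
    # 'server.hostName' is one common key, but check the service docs
    # for the attributes your agents actually collect.
    print(cfg.get("server.hostName", cfg))
```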
New AWS Migration Hub
Today we are bringing this collection of AWS and partner migration tools together in the AWS Migration Hub. The hub provides access to the tools that I mentioned above, guides you through the migration process, and tracks the status of each migration, all in accord with the methodology and tenets described in our Migration Acceleration Program (MAP).
Here’s the main screen. It outlines the migration process (discovery, migration, and tracking):
Clicking on Start discovery reveals the flow of the migration process:
It is also possible to skip the Discovery step and begin the migration immediately:
The Servers list is populated using data from an AWS migration service (Server Migration Service or Database Migration Service), partner tools, or using data collected by the AWS Application Discovery Service:
I can click on Group as application to create my first application:
Once I identify some applications to migrate, I can track them in the Migrations section of the Hub:
The migration tools, if authorized, automatically send status updates and results back to Migration Hub, for display on the migration status page for the application. Here you can see that Racemi DynaCenter and CloudEndure Migration have played their parts in the migration:
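Under the hood, these updates flow through the Migration Hub API (the mgh namespace in boto3). As a rough sketch of what a tool-side status report might look like; the stream and task names below are hypothetical, and the partner tools normally make these calls on your behalf:

```python
from datetime import datetime

import boto3

# Migration Hub itself runs in US West (Oregon).
mgh = boto3.client("mgh", region_name="us-west-2")

stream = "my-migration-tool"   # hypothetical progress update stream
task = "migrate-billing-db"    # hypothetical migration task name

# Register the stream and the task it will report on.
mgh.create_progress_update_stream(ProgressUpdateStreamName=stream)
mgh.import_migration_task(ProgressUpdateStream=stream, MigrationTaskName=task)

# Report that the task is 40% complete, promising another update within 5 minutes.
mgh.notify_migration_task_state(
    ProgressUpdateStream=stream,
    MigrationTaskName=task,
    Task={"Status": "IN_PROGRESS", "StatusDetail": "Copying tables", "ProgressPercent": 40},
    UpdateDateTime=datetime.utcnow(),
    NextUpdateSeconds=300,
)
```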
I can track the status of my migrations by checking the Migration Hub Dashboard:
Migration Hub works with migration tools from AWS and our Migration Partners; see the list of integrated partner tools to learn more:
AWS Migration Hub can manage migrations in any AWS Region that has the necessary migration tools available; the hub itself runs in the US West (Oregon) Region. There is no charge for the Hub; you pay only for the AWS services that you consume in the course of the migration.
If you are ready to begin your migration to the cloud and are in need of some assistance, please take advantage of the services offered by our Migration Acceleration Partners. These organizations have earned their migration competency by repeatedly demonstrating their ability to deliver large-scale migrations.
Post Syndicated from Ernesto original https://torrentfreak.com/curb-your-enthusiasm-on-those-hbo-leaks-170814/
Late July, news broke that a hacker, or hackers, had compromised the network of the American cable and television network HBO.
Those responsible contacted reporters, informing them about the prominent breach, and leaked files surfaced on the dedicated website Winter-leak.com.
The website wasn’t around for long, but last week the hackers reached out to the press again with a curated batch of new leaks shared through Mega.nz. Among other things, it contained more Game of Thrones spoilers, marketing plans, and other confidential HBO files.
Fast forward another week and there’s yet another freshly curated batch of leaks. This time it includes episodes of the highly anticipated return of ‘Curb Your Enthusiasm,’ which officially airs in October, as well as episodes from “Barry,” “Insecure” and “The Deuce,” AP reports.
These shows are part of the treasure trove of 1.5 terabytes that was taken from HBO. These and several other titles were already teased last week in a screenshot the hackers released to the press.
There’s no reason to doubt that the leaks are real, but thus far they haven’t been widely distributed. It appears that the various journalists who received the latest batch of Mega.nz links are not very eager to post them in public.
TorrentFreak scoured popular torrent sites and streaming portals for public copies of the new Curb Your Enthusiasm episodes and came up empty-handed. And we’re certainly not the only ones having trouble spotting the leaks in public.
“I searched around a lot a few hours ago and couldn’t find anything,” one Curb Your Enthusiasm watcher commented on Reddit. “Why can’t these hackers be courteous and place links?” another added.
This is quite different from the leaked episode of Game of Thrones that came out before its official release two weeks ago. That leak was not related to the HBO hack, but before the news broke in the mainstream press, thousands of copies were already available on pirate sites.
HBO, meanwhile, appears to have had enough of the continued enthusiasm the hacker is managing to generate in the press.
“We are not in communication with the hacker and we’re not going to comment every time a new piece of information is released,” a company spokesperson said.
“It has been widely reported that there was a cyber incident at HBO. The hacker may continue to drop bits and pieces of stolen information in an attempt to generate media attention. That’s a game we’re not going to participate in.”
As for the Curb Your Enthusiasm fans who were hoping for an early preview of the new season, they may have to, well… you know. For now, at least.
Post Syndicated from Ernesto original https://torrentfreak.com/popcorn-time-devs-help-streaming-aggregator-reelgood-to-fix-piracy-170812/
During the fall of 2015, the MPAA shut down one of the most prominent pirate streaming services, Popcorn Time fork PopcornTime.io.
While the service was found to be clearly infringing, many of the developers didn’t set out to break the law. Most of all, they wanted to provide the public with easy access to their favorite movies and TV-shows.
Fast forward nearly two years and several of these Popcorn Time developers are still on the same quest. The main difference is that they now operate on the safe side of the law.
The startup they’re working with is called Reelgood, which can be best described as a streaming service aggregator. The San Francisco-based company, founded by ex-Facebook employee David Sanderson, recently raised $3.5 million and has opened its doors to the public.
The goal of Reelgood is similar to Popcorn Time in the way that it aims to be the go-to tool for people to access their entertainment. Instead of using pirate sources, however, Reelgood stitches together content from various legal platforms, both paid and free.
TorrentFreak spoke to former Popcorn Time developer Luigi Poole, who’s leading the charge on the development of Reelgood’s web app. He stresses that the increasing fragmentation of streaming services, which drives some people to pirate sites, is one of the problems Reelgood hopes to fix.
“There’s a misconception that torrenting is done by bad people who don’t want to pay for content. I’d say, in the vast majority of cases, torrenting is a symptom of the massive fragmentation that’s been given as the only legal option to the consumer,” Poole says.
While people have many reasons to pirate, some stick to unauthorized services because it’s simply too cumbersome to dig through all the legal options. Pirate sites have a single interface to all popular movies and TV-shows and legal platforms don’t.
“The modern TV/movie ecosystem is made up of an increasing number of different services. This makes finding content like changing channels, only more complicated. Is that movie you’re about to buy or rent on a service you already pay for? Right now there’s no way to do this other than a cumbersome search using each service’s individual search. Time to go digging,” Poole says.
“We believe this is the main reason people torrent — it’s just easier, given that the legal options presented to us are essentially a ‘go fetch’ treasure hunt,” he adds.
Flipping channels on an old-school television often beats the online streaming experience. That is, for those who want more than Netflix alone.
And the problem isn’t going away anytime soon. As we reported earlier this week, there’s a trend towards more fragmentation, instead of less. Disney is pulling some of its most popular content from the US Netflix in 2019, keeping piracy relevant.
“The untold story is that consumers are throwing up their hands with all this fragmentation, and turning to torrenting not because it’s free, but because it’s intuitive and easy,” Poole says.
“Reelgood fixes this problem by acting as a pirate site interface for every legal option, sort of like a TV guide to anything streaming, also giving you notifications anytime something is new, letting you track when certain content becomes available, and not only telling you where it’s available but taking you straight there with one click to play.”
Reelgood can be seen as a defragmentation tool, creating a uniform interface for all the legal platforms people have access to. In addition to paid services such as Netflix and HBO, it also lists free content from Fox, CBS, Crackle, and many other providers.
TorrentFreak took it for a spin and it indeed works as advertised. Simply add your streaming service accounts and all will be bundled into an elegant and uniform interface that allows you to watch and track everything with a single click.
The service is still limited to US libraries but there are already plans to expand it to other countries, which is promising. While it may not eradicate piracy anytime soon, it does a good job of trying to organize the increasingly complex streaming landscape.
Unfortunately, it’s still not cheap to use more than a handful of paid services, but that’s a problem even Reelgood can’t fix. Not even with help from seven former Popcorn Time developers.
One of the benefits of a data warehouse environment using both Amazon Redshift and Amazon RDS for PostgreSQL is that you can leverage the advantages of each service. Amazon Redshift is a high performance, petabyte-scale data warehouse service optimized for the online analytical processing (OLAP) queries typical of analytic reporting and business intelligence applications. On the other hand, a service like RDS excels at transactional OLTP workloads such as inserting, deleting, or updating rows.
In the recent JOIN Amazon Redshift AND Amazon RDS PostgreSQL WITH dblink post, we showed how you can deploy such an environment. Now, you can deploy a similar architecture using the Modern Data Warehouse on AWS Quick Start. The Quick Start is an automated deployment that uses AWS CloudFormation templates to launch, configure, and run the services required to deploy a data warehousing environment on AWS, based on Amazon Redshift and RDS for PostgreSQL.
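To make the dblink piece concrete, here is a minimal sketch of the kind of query the environment enables, run from the RDS PostgreSQL side against Redshift. The endpoints, credentials, and table names are hypothetical; see the dblink post above for the full setup:

```python
import psycopg2

# Connect to the RDS for PostgreSQL instance (hypothetical endpoint/credentials).
conn = psycopg2.connect(
    host="my-postgres.example.us-west-2.rds.amazonaws.com",
    dbname="mydb", user="master", password="...",
)

# dblink runs the inner query on the Redshift cluster and streams the rows back
# to PostgreSQL; the AS clause must declare the shape of the result set.
query = """
    SELECT venue, total_sales
    FROM dblink(
        'host=my-redshift.example.us-west-2.redshift.amazonaws.com port=5439 dbname=analytics user=awsuser password=...',
        'SELECT venuename, SUM(pricepaid) FROM sales GROUP BY venuename'
    ) AS t(venue varchar, total_sales numeric);
"""

with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS dblink;")
    cur.execute(query)
    for venue, total in cur.fetchall():
        print(venue, total)
```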
The Quick Start also includes an instance of Tableau Server, running on Amazon EC2. This gives you the ability to host and serve analytic dashboards, workbooks and visualizations, supported by a trial license. You can play with the sample data source and dashboard, or create your own analyses by uploading your own data sets.
For more information about the Modern Data Warehouse on AWS Quick Start, download the full deployment guide. If you’re ready to get started, use one of the buttons below:
Option 1: Deploy Quick Start into a new VPC on AWS
Option 2: Deploy Quick Start into an existing VPC
If you have questions, please leave a comment below.
- Learn more about the Quick Start.
- Download the deployment guide.
- Read the JOIN Amazon Redshift AND Amazon RDS PostgreSQL WITH dblink blog post.
You can also join us for the webinar Unlock Insights and Reduce Costs by Modernizing Your Data Warehouse on AWS on Tuesday, August 22, 2017. Pearson, the education and publishing company, will present best practices and lessons learned during their journey to Amazon Redshift and Tableau.
Post Syndicated from Eevee original https://eev.ee/blog/2017/08/09/growing-up-alongside-tech/
IndustrialRobot asks… or, uh, asked last month:
industrialrobot: How has your views on tech changed as you’ve got older?
This is so open-ended that it’s actually stumped me for a solid month. I’ve had a surprisingly hard time figuring out where to even start.
It’s not that my views of tech have changed too much — it’s that they’ve changed very gradually. Teasing out and explaining any one particular change is tricky when it happened invisibly over the course of 10+ years.
I think a better framework for this is to consider how my relationship to tech has changed. It’s gone through three pretty distinct phases, each of which has strongly colored how I feel and talk about technology.
In which I start from nothing.
Nothing is an interesting starting point. You only really get to start there once.
Learning something on my own as a kid was something of a magical experience, in a way that I don’t think I could replicate as an adult. I liked computers; I liked toying with computers; so I did that.
I don’t know how universal this is, but when I was a kid, I couldn’t even conceive of how incredible things were made. Buildings? Cars? Paintings? Operating systems? Where does any of that come from? Obviously someone made them, but it’s not the sort of philosophical point I lingered on when I was 10, so in the back of my head they basically just appeared fully-formed from the æther.
That meant that when I started trying out programming, I had no aspirations. I couldn’t imagine how far I would go, because all the examples of how far I would go were completely disconnected from any idea of human achievement. I started out with BASIC on a toy computer; how could I possibly envision a connection between that and something like a mainstream video game? Every new thing felt like a new form of magic, so I couldn’t conceive that I was even in the same ballpark as whatever process produced real software. (Even seeing the source code for GORILLAS.BAS, it didn’t quite click. I didn’t think to try reading any of it until years after I’d first encountered the game.)
This isn’t to say I didn’t have goals. I invented goals constantly, as I’ve always done; as soon as I learned about a new thing, I’d imagine some ways to use it, then try to build them. I produced a lot of little weird goofy toys, some of which entertained my tiny friend group for a couple days, some of which never saw the light of day. But none of it felt like steps along the way to some mountain peak of mastery, because I didn’t realize the mountain peak was even a place that could be gone to. It was pure, unadulterated (!) playing.
I contrast this to my art career, which started only a couple years ago. I was already in my late 20s, so I’d already spent decades seeing a very broad spectrum of art: everything from quick sketches up to painted masterpieces. And I’d seen the people who create that art, sometimes seen them create it in real-time. I’m even in a relationship with one of them! And of course I’d already had the experience of advancing through tech stuff and discovering first-hand that even the most amazing software is still just code someone wrote.
So from the very beginning, from the moment I touched pencil to paper, I knew the possibilities. I knew that the goddamn Sistine Chapel was something I could learn to do, if I were willing to put enough time in — and I knew that I’m not, so I’d have to settle somewhere a ways before that. I knew that I’d have to put an awful lot of work in before I’d be producing anything very impressive.
I did it anyway (though perhaps waited longer than necessary to start), but those aren’t things I can un-know, and so I can never truly explore art from a place of pure ignorance. On the other hand, I’ve probably learned to draw much more quickly and efficiently than if I’d done it as a kid, precisely because I know those things. Now I can decide I want to do something far beyond my current abilities, then go figure out how to do it. When I was just playing, that kind of ambition was impossible.
So, I played.
How did this affect my views on tech? Well, I didn’t… have any. Learning by playing tends to teach you things in an outward sprawl without many abrupt jumps to new areas, so you don’t tend to run up against conflicting information. The whole point of opinions is that they’re your own resolution to a conflict; without conflict, I can’t meaningfully say I had any opinions. I just accepted whatever I encountered at face value, because I didn’t even know enough to suspect there could be alternatives yet.
That started to seriously change around, I suppose, the end of high school and beginning of college. I was becoming aware of this whole “open source” concept. I took classes that used languages I wouldn’t otherwise have given a second thought. (One of them was Python!) I started to contribute to other people’s projects. Eventually I even got a job, where I had to work with other people. It probably also helped that I’d had to maintain my own old code a few times.
Now I was faced with conflicting subjective ideas, and I had to form opinions about them! And so I did. With gusto. Over time, I developed an idea of what was Right based on experience I’d accrued. And then I set out to always do things Right.
That’s served me decently well with some individual problems, but it also led me to inflict a lot of unnecessary pain on myself. Several endeavors languished for no other reason than my dissatisfaction with the architecture, long before the basic functionality was done. I started a number of “pure” projects around this time, generic tools like imaging libraries that I had no direct need for. I built them for the sake of them, I guess because I felt like I was improving some niche… but of course I never finished any. It was always in areas I didn’t know that well in the first place, which is a fine way to learn if you have a specific concrete goal in mind — but it turns out that building a generic library for editing images means you have to know everything about images. Perhaps that ambition went a little haywire.
I’ve said before that this sort of (self-inflicted!) work was unfulfilling, in part because the best outcome would be that a few distant programmers’ lives are slightly easier. I do still think that, but I think there’s a deeper point here too.
In forgetting how to play, I’d stopped putting any of myself in most of the work I was doing. Yes, building an imaging library is kind of a slog that someone has to do, but… I assume the people who work on software like PIL and ImageMagick are actually interested in it. The few domains I tried to enter and revolutionize weren’t passions of mine; I just happened to walk through the neighborhood one day and decided I could obviously do it better.
Not coincidentally, this was the same era of my life that led me to write stuff like that PHP post, which you may notice I am conspicuously not even linking to. I don’t think I would write anything like it nowadays. I could see myself approaching the same subject, but purely from the point of view of language design, with more contrasts and tradeoffs and less going for volume. I certainly wouldn’t lead off with inflammatory puffery like “PHP is a community of amateurs”.
I think I’ve mellowed out a good bit in the last few years.
It turns out that being Right is much less important than being Not Wrong — i.e., rather than trying to make something perfect that can be adapted to any future case, just avoid as many pitfalls as possible. Code that does something useful has much more practical value than unfinished code with some pristine architecture.
Nowhere is this more apparent than in game development, where all code is doomed to be crap and the best you can hope for is to stem the tide. But there’s also a fixed goal that’s completely unrelated to how the code looks: does the game work, and is it fun to play? Yes? Ship the damn thing and forget about it.
Games are also nice because it’s very easy to pour my own feelings into them and evoke feelings in the people who play them. They’re mine, something with my fingerprints on them — even the games I’ve built with glip have plenty of my own hallmarks, little touches I added on a whim or attention to specific details that I care about.
Maybe a better example is the Doom map parser I started writing. It sounds like a “pure” problem again, except that I actually know an awful lot about the subject already! I also cleverly (accidentally) released some useful results of the work I’ve done thus far — like statistics about Doom II maps and a few screenshots of flipped stock maps — even though I don’t think the parser itself is far enough along to release yet. The tool has served a purpose, one with my fingerprints on it, even without being released publicly. That keeps it fresh in my mind as something interesting I’d like to keep working on, eventually. (When I run into an architecture question, I step back for a while, or I do other work in the hopes that the solution will reveal itself.)
I also made two simple Pokémon ROM hacks this year, despite knowing nothing about Game Boy internals or assembly when I started. I just decided I wanted to do an open-ended thing beyond my reach, and I went to do it, not worrying about cleanliness and willing to accept a bumpy ride to get there. I played, but in a more experienced way, invoking the stuff I know (and the people I’ve met!) to help me get a running start in completely unfamiliar territory.
This feels like a really fine distinction that I’m not sure I’m doing justice. I don’t know if I could’ve appreciated it three or four years ago. But I missed making toys, and I’m glad I’m doing it again.
In short, I forgot how to have fun with programming for a little while, and I’ve finally started to figure it out again. And that’s far more important than whether you use PHP or not.
Post Syndicated from Ernesto original https://torrentfreak.com/hackers-leak-more-confidential-game-of-thrones-files-170808/
Last week, news broke that a hacker, or hackers, had compromised the network of the American cable and television network HBO.
Those responsible sent out an email to reporters, announcing the prominent breach, and leaked files surfaced on the dedicated website Winter-leak.com.
While the latter is no longer accessible, the hackers are not done yet. Another curated batch of leaked files has now appeared online, revealing more Game of Thrones spoilers, marketing plans, and other confidential HBO files.
The first leak put a preliminary outline of the fourth episode of the current Game of Thrones season in the spotlight, and the second batch follows up with the same for the upcoming fifth episode.
Although the outline was prepared over a year ago, it likely contains various accurate spoilers, which we won’t repeat here.
The new data dump, which is a subsection of the 1.5 terabytes of data the hackers claimed to have in their possession, also lists a variety of other Game of Thrones related files.
Among other items, there’s a confidential cast list for the current season, a highly confidential “Game of Ideas” brief, an outline of GoT marketing strategies, and a Game of Thrones roadmap. The information all appears to be a few months old.
The hackers took a screenshot of several folders, where the files may have been taken from, as seen below.
In addition, the hackers provided ‘proof’ that they have emails, which, according to AP, point to HBO’s vice president for film programming Leslie Cohen.
Finally, the new batch contains a video letter to HBO CEO Richard Plepler, titled “First letter to HBO,” where a certain Mr. Smith takes credit for the hack. The letter offered to keep the information away from the public, in exchange for a ransom payment.
For spoiler-eager Game of Thrones fans the hack is a true treasure trove. However, like the first batch, no leaked episodes are included. And, based on another screenshot, these are probably not on the way either.
A “Series Screenshot” includes a list of likely compromised titles, such as The Deviant Ones and the previously leaked Barry, Ballers, and Room 104, but no Game of Thrones.
A leak of the fourth GoT episode did appear online late last week, but this wasn’t linked to the breach of HBO’s network. Still, HBO is likely not amused and will do everything in its power to catch those responsible.
Post Syndicated from Ernesto original https://torrentfreak.com/next-game-of-thrones-episode-leaks-online-170804/
It’s been a pretty rough week for HBO thus far.
After hackers got their hands on over a terabyte of confidential information, including Game of Thrones scripts, another major leak has just surfaced.
Starting a few hours ago, a copy of the upcoming Game of Thrones episode “The Spoils of War” began to circulate on various file-sharing and streaming sites, including The Pirate Bay.
While most copies are pulled offline quickly, presumably by HBO itself, the unreleased fourth episode of season 7 is still widely available.
Although the leak comes only a few days after the prominent HBO hack, the two might not be related. The leaked episode appears to be an internal release and is tagged with “For Internal Viewing Only” as well as a prominent “Star India Pvt Ltd” watermark.
Star India is a large media company owned by 21st Century Fox, which broadcasts the popular HBO series locally.
Although it’s a low-quality leak, plenty of eager Game of Thrones fans are likely to jump on the episode early. Whether the pirated copy is the finished version or an unfinished cut is unclear. The official release will still take a few more days.
This is not the first time that Game of Thrones episodes have leaked early. Two years ago the same happened with the first four episodes of season 5. Still, leaks or not, that season still broke previous viewership records.
Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/08/04/timeshiftgrafanabuzz-1w-issue-7/
Hard to believe it’s already August! This week there were a ton of articles to highlight. It’s really exciting to see how many data aficionados are out there coming up with new ways to connect Grafana to their data, wherever it may live. In this issue we cover cryptocurrency visualization and home automation setups, and break down installations in a number of environments. Enjoy!
Grafana 4.4.2 Released
From the Blogosphere
Monitoring CouchDB with Prometheus, Grafana and Docker: Geoff walks us through all of the steps to get Prometheus, Alertmanager and Grafana installed in Docker to monitor and alert on a CouchDB cluster. These six steps will have you up and running in no time.
Try InfluxDB and Grafana by Docker: Continuing with our Docker theme, Xiao breaks down all of the pieces, explores the configuration options, and explains the Docker commands to set up a simple monitoring stack using collectd, InfluxDB, and Grafana.
Installation of Collectd, Graphite and Grafana – Part 2: Last week we covered the first article in a series focused on setting up a complete Graphite stack. This week we tackle installing Graphite, its components, and Grafana on the server.
Grafana and Home Automation: More and more pieces of our homes are becoming “smart”, so why not monitor them? This article walks you through collecting data from home automation software Jeedom, sending metrics to InfluxDB, and visualizing and alerting in Grafana – so you can know how your smart-toaster is performing.
Making an Awesome Dashboard for your Crypto Currencies in 3 Steps: Christian lays out three steps that will help you keep an eye on your Bitcoin and Ethereum investments. His PHP script fetches things like current price, current balances, and earnings, and sends the data to InfluxDB via UDP. He’s also created a dashboard that’s ready to import so you can get back to mining. (For a minimal sketch of the same send-over-UDP pattern, see the code after this list.)
FHEM #6 – Grafana and InfluxDB: We’re seeing more and more articles about using Grafana to monitor home automation. This is the sixth article in a series on getting data from FHEM into Grafana using InfluxDB. It also touches on connecting Grafana to MariaDB, taking advantage of Grafana’s alpha native MySQL support.
Installation Overview of Node Exporter, Prometheus and Grafana: Looking to get started with Prometheus? Frits walks us through installing Node Exporter, Prometheus, and Grafana and importing our first dashboard.
Collect Metrics from Liberty Apps and Display in Grafana: This in-depth article covers adding custom metrics to your Liberty application and how to monitor these metrics using collectd, Graphite and Grafana.
Gatling, Graphite, Grafana: Your Application Under High Surveillance!: David explores load testing with Gatling, which can write its results to Graphite and on to Grafana for visualization and alerting.
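Since a couple of these articles feed InfluxDB over UDP, here is a minimal Python sketch of that pattern. The host, port, and measurement names are assumptions; InfluxDB’s UDP listener must be enabled in influxdb.conf and mapped to a database:

```python
import socket
import time

INFLUX_HOST = "localhost"  # assumption: local InfluxDB with the UDP listener on
INFLUX_UDP_PORT = 8089     # assumption: the port configured in influxdb.conf

def send_point(measurement, tags, value):
    # InfluxDB line protocol: measurement,tag=val value=<field> <timestamp ns>
    tag_str = ",".join("%s=%s" % kv for kv in sorted(tags.items()))
    line = "%s,%s value=%s %d" % (measurement, tag_str, value, int(time.time() * 1e9))
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # UDP is fire-and-forget: fast, but points are dropped if nothing listens.
        sock.sendto(line.encode("utf-8"), (INFLUX_HOST, INFLUX_UDP_PORT))
    finally:
        sock.close()

# Hypothetical prices fetched from an exchange API:
send_point("crypto_price", {"symbol": "BTC"}, 3400.25)
send_point("crypto_price", {"symbol": "ETH"}, 295.10)
```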
Plugins and Dashboards
Last week’s timeShift was packed full of plugin updates, as well as a couple of new ones. This week was a little quieter on the plugin front, but we still have a new data source plugin to announce. It’s easy to install this new plugin via the grafana-cli for an on-prem Grafana instance, or a 1-click install on Hosted Grafana.
PRTG Data Source – This data source visualizes data from the Paessler PRTG monitoring system. The easy-to-use query editor included with this plugin gives access to an array of PRTG metadata properties, including Status, Message, Active, Tags, Priority, and more. It also includes annotation support to show sensor status messages on graphs.
This week’s MVC (Most Valuable Contributor)
This week we highlight a contributor who is going to make everyone waiting for Elasticsearch alerting in Grafana jump for joy!
Tweet of the Week
We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove
Having fun with pflogsumm and mailq, I'm addicted! I turn boring numbers into beautiful dashboards. How to monitor Zimbra with Grafana, soon pic.twitter.com/WFNtg7vHNk
— Jorge de la Cruz (@jorgedlcruz) August 2, 2017
We love when people talk about Grafana at meetups and conferences.
Wednesday, August 16, 2017 – 7:30pm | Apprenda HQ
433 River Street, 4th Floor, Troy, NY
A Kubernetes-focused event, featuring a demo from Apprenda and a look at how Kubernetes is used at GitHub:
Steve Wade is a Principal Kubernetes Consultant from London and will provide some fundamental information about the Kubernetes ecosystem, as well as an overview of its core components. He’ll also talk about some monitoring and alerting best practices learned from working with Kubernetes customers, and demo how Prometheus, Grafana, and Slack can be used to monitor, visualize, and alert on both the Kubernetes platform and application workloads.
Aaron Brown, a Site Reliability Engineer at GitHub, will dive into the ways in which Kubernetes is used within GitHub to make software development and deployment more efficient.
What do you think?
Please tell us how we’re doing. We want to make sure this continues to be a valuable resource for the Grafana community. Submit a comment on this article below, or post something at our community forum. Help us make this better!