Tag Archives: HoC

Healthy Aussie Pirates Set To Face Cash ‘Fines’, Poor & Sick Should Be OK

Post Syndicated from Andy original https://torrentfreak.com/healthy-aussie-pirates-set-to-face-cash-fines-poor-sick-should-be-ok-170821/

One of the oldest methods of trying to get people to stop downloading and sharing pirated material is by hitting them with ‘fines’.

The RIAA began the practice in September 2003, tracking people sharing music on early peer-to-peer networks, finding out their identities via ISPs, and sending them cease-and-desist orders with a request to pay hundreds to thousands of dollars.

Many thousands of people were fined and the campaign raised awareness, but it did nothing to stop millions of file-sharers who continue to this day.

That’s something Village Roadshow co-chief Graham Burke now wants to address. He says his company will effectively mimic the RIAA’s campaign of 14 years ago and begin suing Internet pirates Down Under. He told AFR that his company is already setting things up, ready to begin suing later in the year.

Few details have been made available at this stage but it’s almost certain that Village Roadshow’s targets will be BitTorrent users. It’s possible that users of other peer-to-peer networks could be affected but due to their inefficiency and relative obscurity, it’s very unlikely.

That leaves users of The Pirate Bay and any other torrent site vulnerable to the company, which will jump into torrent swarms masquerading as regular users, track IP addresses, and trace them back to Internet service providers. What happens next will depend on the responses of those ISPs.

If the ISPs refuse to cooperate, they will have to be taken to court to force them to hand over the personal details of their subscribers to Village Roadshow. It’s extremely unlikely they’ll hand them over voluntarily, so it could be some time before any ISP customer hears anything from the film distributor.

The bottom line is that Village Roadshow will want money to make the matter go away, and Burke is already being open about the kind of sums his company will ask for.

“We will be looking for damages commensurate with what they’ve done. We’ll be saying ‘You’ve downloaded our Mad Max: Fury Road, our Red Dog, and we want $40 for the four movies plus $200 in costs’,” he says.

While no one will relish any kind of ‘bill’ dropping through a mailbox, in the scheme of things an AU$240 settlement demand isn’t huge, especially when compared to the sums demanded by companies such as Voltage Pictures, which tried and failed to start piracy litigation in Australia two years ago.

However, there’s even better news for some, who have already been given a heads-up that they won’t have to pay anything.

“We will identify people who are stealing our product, we will ask them do they have ill health or dire circumstances, and if they do and undertake to stop, we’ll drop the case,” Burke says.

While being upfront about such a policy has its pros and cons, Burke is also reducing his range of targets, particularly if he likes to be seen as a man of his word, whenever those words were delivered. In March 2016, when he restated his intention to begin suing pirates, he also excluded some other groups from legal action.

“We don’t want to sue 16-year-olds or mums and dads,” Burke said. “It takes 18 months to go through the courts and all that does is make lawyers rich and clog the court system. It’s not effective.”

It remains to be seen what criteria Village Roadshow ultimately employs, but it’s likely the company will be asked to explain its intentions to the court when it embarks on the process to discover alleged pirates’ identities. Once it’s decided who is eligible, Burke says the gloves will come off, with pirates being “pursued vigorously” and “sued for damages.”

While Village Roadshow’s list of films is considerable, any with a specifically Australian slant seem the most likely to feature in any legal action. Burke tends to push the narrative that he’s looking after local industry so something like Mad Max: Fury Road would be perfect. It would also provide easy pickings for any anti-piracy company seeking to harvest Aussie IP addresses since it’s still very popular.

Finally, it’s worth noting that Australians who use pirate streaming services will be completely immune to the company’s planned lawsuit campaign. However, Burke appears to be tackling that threat using a couple of popular tactics currently being deployed elsewhere by the movie industry.

“Google are not doing enough and could do a lot more,” he told The Australian (subscription).

Burke said that he was “shocked” at how easy it was to find streaming content using Google’s search so decided to carry out some research of his own at home. He said he found Christopher Nolan’s Dunkirk with no difficulty but that came with a sting in the tail.

According to the movie boss, his computer was immediately infected with malware and began asking for his credit card details. He doesn’t say whether he put them in.

As clearly the world’s most unlucky would-be movie pirate, Burke deserves much sympathy. It’s also completely coincidental that Hollywood is now pushing a “danger” narrative to keep people away from pirate sites.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Analyzing AWS Cost and Usage Reports with Looker and Amazon Athena

Post Syndicated from Dillon Morrison original https://aws.amazon.com/blogs/big-data/analyzing-aws-cost-and-usage-reports-with-looker-and-amazon-athena/

This is a guest post by Dillon Morrison at Looker. Looker is, in their own words, “a new kind of analytics platform–letting everyone in your business make better decisions by getting reliable answers from a tool they can use.” 

As the breadth of AWS products and services continues to grow, customers are able to more easily move their technology stack and core infrastructure to AWS. One of the attractive benefits of AWS is the cost savings. Rather than paying upfront capital expenses for large on-premises systems, customers can instead pay variable expenses for on-demand services. To further reduce expenses, AWS users can reserve resources for specific periods of time and automatically scale resources as needed.

The AWS Cost Explorer is great for aggregated reporting. However, conducting analysis on the raw data using the flexibility and power of SQL allows for much richer detail and insight, and can be the better choice for the long term. Thankfully, with the introduction of Amazon Athena, monitoring and managing these costs is now easier than ever.

In this post, I walk through setting up the data pipeline for cost and usage reports, Amazon S3, and Athena, and discuss some of the most common levers for cost savings. I surface tables through Looker, which comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive.

Analysis with Athena

With Athena, there’s no need to create hundreds of Excel reports, move data around, or deploy clusters to house and process data. Athena uses Apache Hive’s DDL to create tables, and the Presto querying engine to process queries. Analysis can be performed directly on raw data in S3. Conveniently, AWS exports raw cost and usage data directly into a user-specified S3 bucket, making it simple to start querying with Athena quickly. This makes continuous monitoring of costs virtually seamless, since there is no infrastructure to manage. Instead, users can leverage the power of the Athena SQL engine to easily perform ad-hoc analysis and data discovery without needing to set up a data warehouse.

After the data pipeline is established, cost and usage data (the recommended billing data, per AWS documentation) provides a plethora of comprehensive information around usage of AWS services and the associated costs. Whether you need the report segmented by product type, user identity, or region, this report can be cut-and-sliced any number of ways to properly allocate costs for any of your business needs. You can then drill into any specific line item to see even further detail, such as the selected operating system, tenancy, purchase option (on-demand, spot, or reserved), and so on.

Walkthrough

By default, the Cost and Usage report exports CSV files, which you can compress using gzip (recommended for performance). There are some additional configuration options for tuning performance further, which are discussed below.

Prerequisites

If you want to follow along, you need the following resources:

Enable the cost and usage reports

First, enable the Cost and Usage report. For Time unit, select Hourly. For Include, select Resource IDs. All options are prompted in the report-creation window.

The Cost and Usage report dumps CSV files into the specified S3 bucket. Please note that it can take up to 24 hours for the first file to be delivered after enabling the report.

Configure the S3 bucket and files for Athena querying

In addition to the CSV file, AWS also creates a JSON manifest file for each cost and usage report. Athena requires that all of the files in the S3 bucket are in the same format, so we need to get rid of all these manifest files. If you’re looking to get started with Athena quickly, you can simply go into your S3 bucket and delete the manifest file manually, skip the automation described below, and move on to the next section.

To automate the process of removing the manifest file each time a new report is dumped into S3, which I recommend as you scale, there are a few additional steps. The folks at Concurrency Labs wrote a great overview and set of scripts for this, which you can find in their GitHub repo.

These scripts take the data from an input bucket, remove anything unnecessary, and dump it into a new output bucket. We can use AWS Lambda to trigger this process whenever new data is dropped into S3, on a nightly basis, or on whatever schedule makes the most sense for your use case, depending on how often you’re querying the data. Please note that enabling the “hourly” report means that data is reported at the hour level of granularity, not that a new file is generated every hour.
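As a rough illustration of the idea (a sketch only, not the Concurrency Labs scripts themselves), a Lambda handler along these lines copies each newly delivered report file into a clean output bucket and skips the JSON manifests. The output bucket name is a placeholder.

import boto3

s3 = boto3.client('s3')
OUTPUT_BUCKET = 'my-cur-clean-bucket'  # placeholder: the bucket Athena will query

def handler(event, context):
    # Triggered by S3 ObjectCreated events on the raw Cost and Usage report bucket.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        # Skip the JSON manifest that accompanies each report delivery; Athena
        # expects every object in the queried location to share one format.
        if key.endswith('.json'):
            continue
        s3.copy_object(
            Bucket=OUTPUT_BUCKET,
            Key=key,
            CopySource={'Bucket': bucket, 'Key': key},
        )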

Following these scripts, you’ll notice that we’re adding a date partition field, which isn’t necessary but improves query performance. In addition, converting data from CSV to a columnar format like ORC or Parquet also improves performance. We can automate this process using Lambda whenever new data is dropped in our S3 bucket. Amazon Web Services discusses columnar conversion at length, and provides walkthrough examples, in their documentation.

As a long-term solution, best practice is to use compression, partitioning, and conversion. However, for purposes of this walkthrough, we’re not going to worry about them so we can get up-and-running quicker.

Set up the Athena query engine

In your AWS console, navigate to the Athena service, and click “Get Started”. Follow the tutorial and set up a new database (we’ve called ours “AWS Optimizer” in this example). Don’t worry about configuring your initial table, per the tutorial instructions. We’ll be creating a new table for cost and usage analysis. Once you’ve walked through the tutorial steps, you’ll be able to access the Athena interface, and can begin running Hive DDL statements to create new tables.

One thing that’s important to note is that the Cost and Usage CSVs also contain the column headers in their first row, meaning that the column headers would be included in the dataset and any queries. For testing and quick setup, you can remove this line manually from your first few CSV files. Long term, you’ll want to use a script to programmatically remove this row each time a new file is dropped in S3 (typically every few hours). We’ve drafted up a sample script for ease of reference, which we run on Lambda. We utilize Lambda’s native ability to invoke the script whenever a new object is dropped in S3.
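A minimal sketch of that kind of handler (an illustration, not the exact sample script referenced above) could read each new CSV, drop its first line, and write the result to a separate Athena-facing bucket. The bucket name is a placeholder, and writing to a second bucket avoids re-triggering the function on its own output.

import boto3

s3 = boto3.client('s3')
OUTPUT_BUCKET = 'my-cur-athena-bucket'  # placeholder: bucket the Athena table points at

def handler(event, context):
    # Triggered by S3 ObjectCreated events on the raw report bucket.
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        if not key.endswith('.csv'):
            continue
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')
        # Drop the first line (the column headers) and keep everything else unchanged.
        _, _, data = body.partition('\n')
        s3.put_object(Bucket=OUTPUT_BUCKET, Key=key, Body=data.encode('utf-8'))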

For cost and usage, we recommend using the DDL statement below. Since our data is in CSV format, we don’t need to use a SerDe; we can simply specify the field delimiter and escape character, along with the file structure (“TEXTFILE”). Note that AWS does have an OpenCSV SerDe as well, if you prefer to use that.

 

CREATE EXTERNAL TABLE IF NOT EXISTS cost_and_usage (
identity_LineItemId String,
identity_TimeInterval String,
bill_InvoiceId String,
bill_BillingEntity String,
bill_BillType String,
bill_PayerAccountId String,
bill_BillingPeriodStartDate String,
bill_BillingPeriodEndDate String,
lineItem_UsageAccountId String,
lineItem_LineItemType String,
lineItem_UsageStartDate String,
lineItem_UsageEndDate String,
lineItem_ProductCode String,
lineItem_UsageType String,
lineItem_Operation String,
lineItem_AvailabilityZone String,
lineItem_ResourceId String,
lineItem_UsageAmount String,
lineItem_NormalizationFactor String,
lineItem_NormalizedUsageAmount String,
lineItem_CurrencyCode String,
lineItem_UnblendedRate String,
lineItem_UnblendedCost String,
lineItem_BlendedRate String,
lineItem_BlendedCost String,
lineItem_LineItemDescription String,
lineItem_TaxType String,
product_ProductName String,
product_accountAssistance String,
product_architecturalReview String,
product_architectureSupport String,
product_availability String,
product_bestPractices String,
product_cacheEngine String,
product_caseSeverityresponseTimes String,
product_clockSpeed String,
product_currentGeneration String,
product_customerServiceAndCommunities String,
product_databaseEdition String,
product_databaseEngine String,
product_dedicatedEbsThroughput String,
product_deploymentOption String,
product_description String,
product_durability String,
product_ebsOptimized String,
product_ecu String,
product_endpointType String,
product_engineCode String,
product_enhancedNetworkingSupported String,
product_executionFrequency String,
product_executionLocation String,
product_feeCode String,
product_feeDescription String,
product_freeQueryTypes String,
product_freeTrial String,
product_frequencyMode String,
product_fromLocation String,
product_fromLocationType String,
product_group String,
product_groupDescription String,
product_includedServices String,
product_instanceFamily String,
product_instanceType String,
product_io String,
product_launchSupport String,
product_licenseModel String,
product_location String,
product_locationType String,
product_maxIopsBurstPerformance String,
product_maxIopsvolume String,
product_maxThroughputvolume String,
product_maxVolumeSize String,
product_maximumStorageVolume String,
product_memory String,
product_messageDeliveryFrequency String,
product_messageDeliveryOrder String,
product_minVolumeSize String,
product_minimumStorageVolume String,
product_networkPerformance String,
product_operatingSystem String,
product_operation String,
product_operationsSupport String,
product_physicalProcessor String,
product_preInstalledSw String,
product_proactiveGuidance String,
product_processorArchitecture String,
product_processorFeatures String,
product_productFamily String,
product_programmaticCaseManagement String,
product_provisioned String,
product_queueType String,
product_requestDescription String,
product_requestType String,
product_routingTarget String,
product_routingType String,
product_servicecode String,
product_sku String,
product_softwareType String,
product_storage String,
product_storageClass String,
product_storageMedia String,
product_technicalSupport String,
product_tenancy String,
product_thirdpartySoftwareSupport String,
product_toLocation String,
product_toLocationType String,
product_training String,
product_transferType String,
product_usageFamily String,
product_usagetype String,
product_vcpu String,
product_version String,
product_volumeType String,
product_whoCanOpenCases String,
pricing_LeaseContractLength String,
pricing_OfferingClass String,
pricing_PurchaseOption String,
pricing_publicOnDemandCost String,
pricing_publicOnDemandRate String,
pricing_term String,
pricing_unit String,
reservation_AvailabilityZone String,
reservation_NormalizedUnitsPerReservation String,
reservation_NumberOfReservations String,
reservation_ReservationARN String,
reservation_TotalReservedNormalizedUnits String,
reservation_TotalReservedUnits String,
reservation_UnitsPerReservation String,
resourceTags_userName String,
resourceTags_usercostcategory String
)
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      ESCAPED BY '\\'
      LINES TERMINATED BY '\n'

STORED AS TEXTFILE
    LOCATION 's3://<<your bucket name>>';

Once you’ve successfully executed the command, you should see a new table named “cost_and_usage” with the below properties. Now we’re ready to start executing queries and running analysis!

Start with Looker and connect to Athena

Setting up Looker is a quick process, and you can try it out for free here (or download from Amazon Marketplace). It takes just a few seconds to connect Looker to your Athena database, and Looker comes with a host of pre-built data models and dashboards to make analysis of your cost and usage data simple and intuitive. After you’re connected, you can use the Looker UI to run whatever analysis you’d like. Looker translates this UI to optimized SQL, so any user can execute and visualize queries for true self-service analytics.

Major cost saving levers

Now that the data pipeline is configured, you can dive into the most popular use cases for cost savings. In this post, I focus on:

  • Purchasing Reserved Instances vs. On-Demand Instances
  • Data transfer costs
  • Allocating costs over users or other attributes (denoted with resource tags)

On-Demand, Spot, and Reserved Instances

Purchasing Reserved Instances vs On-Demand Instances is arguably going to be the biggest cost lever for heavy AWS users (Reserved Instances run up to 75% cheaper!). AWS offers three options for purchasing instances:

  • On-Demand—Pay as you use.
  • Spot (variable cost)—Bid on spare Amazon EC2 computing capacity.
  • Reserved Instances—Pay for an instance for a specific, allotted period of time.

When purchasing a Reserved Instance, you can also choose to pay all-upfront, partial-upfront, or monthly. The more you pay upfront, the greater the discount.

If your company has been using AWS for some time now, you should have a good sense of your overall instance usage on a per-month or per-day basis. Rather than paying for these instances On-Demand, you should try to forecast the number of instances you’ll need, and reserve them with upfront payments.

The share of your overall instance usage that runs on Reserved Instances is called your coverage ratio. It’s important not to confuse your coverage ratio with your Reserved Instance utilization, which represents the share of your reserved hours that were actually used. Don’t worry about exceeding capacity: you can still set up Auto Scaling preferences so that more instances get added whenever your coverage or utilization crosses a certain threshold (we often see a target of 80% for both coverage and utilization among savvy customers).
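As a quick illustration of the difference between the two metrics, using made-up numbers:

# Illustrative numbers only: 10,000 instance-hours used in a month, 7,000 of them
# running on Reserved Instances that were purchased to cover 8,000 hours.
total_usage_hours = 10_000
ri_covered_hours = 7_000
ri_purchased_hours = 8_000

coverage = ri_covered_hours / total_usage_hours      # 0.70: 70% of all usage ran on RIs
utilization = ri_covered_hours / ri_purchased_hours  # 0.875: 87.5% of reserved hours were used
print(f"coverage={coverage:.0%}, utilization={utilization:.1%}")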

Calculating the reserved costs and coverage can be a bit tricky with the level of granularity provided by the cost and usage report. The following query shows your total cost over the last 6 months, broken out by Reserved Instance vs other instance usage. You can substitute the cost field for usage if you’d prefer. Please note that you should only have data for the time period after the cost and usage report has been enabled (though you can opt for up to 3 months of historical data by contacting your AWS Account Executive). If you’re just getting started, this query will only show a few days.

 

SELECT 
	DATE_FORMAT(from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate),'%Y-%m') AS "cost_and_usage.usage_start_month",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_reserved_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_ris",
	COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_non_reserved_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_on_non_ris"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

The resulting table should look something like the image below (I’m surfacing tables through Looker, though the same table would result from querying via command line or any other interface).

With a BI tool, you can create dashboards for easy reference and monitoring. New data is dumped into S3 every few hours, so your dashboards can update several times per day.

It’s an iterative process to understand the appropriate number of Reserved Instances needed to meet your business needs. After you’ve properly integrated Reserved Instances into your purchasing patterns, the savings can be significant. If your coverage is consistently below 70%, you should seriously consider adjusting your purchase types and opting for more Reserved instances.

Data transfer costs

One of the great things about AWS data storage is that it’s incredibly cheap. Most charges often come from moving and processing that data. There are several different prices for transferring data, broken out largely by transfers between regions and availability zones. Transfers between regions are the most costly, followed by transfers between Availability Zones. Transfers within the same region and same availability zone are free unless using elastic or public IP addresses, in which case there is a cost. You can find more detailed information in the AWS Pricing Docs. With this in mind, there are several simple strategies for helping reduce costs.

First, since costs increase when transferring data between regions, it’s wise to ensure that as many services as possible reside within the same region. The more you can localize services to one specific region, the lower your costs will be.

Second, you should maximize the data you’re routing directly within AWS services and IP addresses. Transfers out to the open internet are the most costly and least performant mechanisms of data transfers, so it’s best to keep transfers within AWS services.

Lastly, data transfers between private IP addresses are cheaper than between elastic or public IP addresses, so utilizing private IP addresses as much as possible is the most cost-effective strategy.

The following query provides a table depicting the total costs for each AWS product, broken out by transfer cost type. Substitute the “lineitem_productcode” field in the query to segment the costs by any other attribute. If you notice any unusually high spikes in cost, you’ll need to dig deeper to understand what’s driving that spike: location, volume, and so on. Drill down into specific costs by including “product_usagetype” and “product_transfertype” in your query to identify the types of transfer costs that are driving up your bill.

SELECT 
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost), 0) AS "cost_and_usage.total_unblended_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-In')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_inbound_data_transfer_cost",
	COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer-Out')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0) AS "cost_and_usage.total_outbound_data_transfer_cost"
FROM aws_optimizer.cost_and_usage  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 500

When moving between regions or over the open web, many data transfer costs also include the origin and destination location of the data movement. Using a BI tool with mapping capabilities, you can get a nice visual of data flows. The point at the center of the map is used to represent external data flows over the open internet.

Analysis by tags

AWS provides the option to apply custom tags to individual resources, so you can allocate costs over whatever customized segment makes the most sense for your business. For a SaaS company that hosts software for customers on AWS, maybe you’d want to tag the size of each customer. The following query uses custom tags to display the reserved, data transfer, and total cost for each AWS service, broken out by tag categories, over the last 6 months. You’ll want to substitute the cost_and_usage.resourcetags_customersegment and cost_and_usage.customer_segment with the name of your customer field.

 

SELECT * FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY z___min_rank) as z___pivot_row_rank, RANK() OVER (PARTITION BY z__pivot_col_rank ORDER BY z___min_rank) as z__pivot_col_ordering FROM (
SELECT *, MIN(z___rank) OVER (PARTITION BY "cost_and_usage.product_code") as z___min_rank FROM (
SELECT *, RANK() OVER (ORDER BY CASE WHEN z__pivot_col_rank=1 THEN (CASE WHEN "cost_and_usage.total_unblended_cost" IS NOT NULL THEN 0 ELSE 1 END) ELSE 2 END, CASE WHEN z__pivot_col_rank=1 THEN "cost_and_usage.total_unblended_cost" ELSE NULL END DESC, "cost_and_usage.total_unblended_cost" DESC, z__pivot_col_rank, "cost_and_usage.product_code") AS z___rank FROM (
SELECT *, DENSE_RANK() OVER (ORDER BY CASE WHEN "cost_and_usage.customer_segment" IS NULL THEN 1 ELSE 0 END, "cost_and_usage.customer_segment") AS z__pivot_col_rank FROM (
SELECT 
	cost_and_usage.lineitem_productcode  AS "cost_and_usage.product_code",
	cost_and_usage.resourcetags_customersegment  AS "cost_and_usage.customer_segment",
	COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0) AS "cost_and_usage.total_unblended_cost",
	1.0 * (COALESCE(SUM(CASE WHEN REGEXP_LIKE(cost_and_usage.product_usagetype, 'DataTransfer')    THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.percent_spend_data_transfers_unblended",
	1.0 * (COALESCE(SUM(CASE WHEN (CASE
         WHEN cost_and_usage.lineitem_lineitemtype = 'DiscountedUsage' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'RIFee' THEN 'RI Line Item'
         WHEN cost_and_usage.lineitem_lineitemtype = 'Fee' THEN 'RI Line Item'
         ELSE 'Non RI Line Item'
        END = 'Non RI Line Item') THEN cost_and_usage.lineitem_unblendedcost  ELSE NULL END), 0)) / NULLIF((COALESCE(SUM(cost_and_usage.lineitem_unblendedcost ), 0)),0)  AS "cost_and_usage.unblended_percent_spend_on_ris"
FROM aws_optimizer.cost_and_usage_raw  AS cost_and_usage

WHERE 
	(((from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) >= ((DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))) AND (from_iso8601_timestamp(cost_and_usage.lineitem_usagestartdate)) < ((DATE_ADD('month', 6, DATE_ADD('month', -5, DATE_TRUNC('MONTH', CAST(NOW() AS DATE))))))))
GROUP BY 1,2) ww
) bb WHERE z__pivot_col_rank <= 16384
) aa
) xx
) zz
 WHERE z___pivot_row_rank <= 500 OR z__pivot_col_ordering = 1 ORDER BY z___pivot_row_rank

The resulting table in this example looks like the results below. In this example, you can tell that we’re making poor use of Reserved Instances because they represent such a small portion of our overall costs.

Again, using a BI tool to visualize these costs and trends over time makes the analysis much easier to consume and take action on.

Summary

Saving costs on your AWS spend is always an iterative, ongoing process. Hopefully with these queries alone, you can start to understand your spending patterns and identify opportunities for savings. However, this is just a peek into the many opportunities available through analysis of the Cost and Usage report. Each company is different, with unique needs and usage patterns. To achieve maximum cost savings, we encourage you to set up an analytics environment that enables your team to explore all potential cuts and slices of your usage data, whenever it’s necessary. Exploring different trends and spikes across regions, services, user types, etc. helps you gain a comprehensive understanding of your major cost levers and consistently implement new cost reduction strategies.

Note that all of the queries and analysis provided in this post were generated using the Looker data platform. If you’re already a Looker customer, you can get all of this analysis, additional pre-configured dashboards, and much more using Looker Blocks for AWS.


About the Author

Dillon Morrison leads the Platform Ecosystem at Looker. He enjoys exploring new technologies and architecting the most efficient data solutions for the business needs of his company and their customers. In his spare time, you’ll find Dillon rock climbing in the Bay Area or nose deep in the docs of the latest AWS product release at his favorite cafe (“Arlequin in SF is unbeatable!”).


“Public Figure” Threatened With Exposure Over Gay Piracy ‘Fine’

Post Syndicated from Andy original https://torrentfreak.com/public-figure-threatened-with-exposure-over-gay-piracy-fine-170817/

Flava Works is an Illinois-based company specializing in adult material featuring black and Latino men. It operates an aggressive anti-piracy strategy which has resulted in some large damages claims in the past.

Now, however, the company has found itself targeted by a lawsuit filed by one of its alleged victims. Filed in a California district court by an unnamed individual, it accuses Flava Works of shocking behavior relating to a claim of alleged piracy.

According to the lawsuit, ‘John Doe’ received a letter in early June from Flava Works CEO Phillip Bleicher, accusing him of Internet piracy. Titled “Settlement Demand and Cease and Desist”, the letter got straight to the point.

“Flava Works is aware that you have been ‘pirating’ the content from its website(s) for your own personal financial benefit,” the letter read.

[Update: ‘John Doe’ has now been identified as Marc Juris, President & General Manager of AMC-owned WE tv. All references to John Doe below refer to Juris. See note at footer]

As is often the case with such claims, Flava Works offered to settle with John Doe for a cash fee. However, instead of the few hundred or thousand dollars usually seen in such cases, the initial settlement amount was an astronomical $97,000. But that wasn’t all.

According to John Doe, Bleicher warned that unless the money was paid in ten days, Flava Works “would initiate litigation against [John Doe], publically accusing him of being a consumer and pirate of copyrighted gay adult entertainment.”

Amping up the pressure, Bleicher then warned that after the ten-day deadline had passed, the settlement amount of $97,000 would be withdrawn and replaced with a new amount – $525,000.

The lawsuit alleges that Bleicher followed up with more emails in which he indicated that there was still time to settle the matter “one on one” since the case hadn’t been assigned to an attorney. However, he warned John Doe that time was running out and that public exposure via a lawsuit would be the next step.

While these kinds of tactics are nothing new in copyright infringement cases, the amounts of money involved are huge, indicating something special at play. Indeed, it transpires that John Doe is a public figure in the entertainment industry and the suggestion is that Flava Works’ assessment of his “wealth and profile” means he can pay these large sums.

According to the suit, on July 6, 2017, Bleicher sent another email to John Doe which “alluded to [his] high-profile status and to the potential publicity that a lawsuit would bring.” The email went as far as threatening an imminent Flava Works press release, announcing that a public figure, who would be named, was being sued for pirating gay adult content.

Flava Works alleges that John Doe uploaded its videos to various BitTorrent sites and forums, but John Doe vigorously denies the accusations, noting that the ‘evidence’ presented by Flava Works fails to back up its claims.

“The materials do not reveal or expose infringement of any sort. [Flava Works’] real purpose in sending this ‘proof’ was to demonstrate just how humiliating it would be to defend against Flava Works’ scurrilous charges,” John Doe’s lawsuit notes.

“[Flava Works’] materials consist largely of screen shots of extremely graphic images of pornography, which [Flava Works] implies that [John Doe] has viewed — but which are completely irrelevant given that they are not Flava Works content. Nevertheless, Bleicher assured [John Doe] that these materials would all be included in a publicly filed lawsuit if he refused to accede to [Flava Works’] payment demands.”

From his lawsuit (pdf) it’s clear that John Doe is in no mood to pay Flava Works large sums of cash and he’s aggressively on the attack, describing the company’s demands as “criminal extortion.”

He concludes with a request for a declaration that he has not infringed Flava Works’ copyrights, while demanding attorneys’ fees and further relief to be determined by the court.

The big question now is whether Flava Works will follow through with its threats to expose the entertainer, or whether it will drift back into the shadows to fight another day. Definitely one to watch.

Update: Flava Works has now followed through on its threat to sue Juris. A complaint filed at an Illinois court accuses the TV executive of uploading Flava Works titles to several gay-focused torrent sites in breach of copyright. It demands $1.2m in damages.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

72-Year-Old Man Accused of ‘Pirating’ Over a Thousand Torrents

Post Syndicated from Ernesto original https://torrentfreak.com/72-year-old-man-accused-of-pirating-over-a-thousand-torrents-170810/

In recent years, file-sharers around the world have been pressured to pay significant settlement fees, or face legal repercussions.

These so-called ‘copyright trolling‘ efforts are a common occurrence in the United States too, where hundreds of thousands of people have been targeted in recent years.

While a significant number of defendants are indeed guilty, there are also many who are wrongfully accused. Third parties may have connected to their Wi-Fi, for example, which isn’t a rarity.

In Hawaii, a recent target of a copyright trolling expedition claims to be innocent, and he’s taken his case to the local press. The 72-year-old John J. Harding doesn’t fit the typical profile of a prolific pirate, but that’s exactly what a movie company has accused him of being.

In June, Harding received a letter from local attorney Kerry Culpepper, who works for the rightsholders of movies such as ‘Mechanic: Resurrection’ and ‘Once Upon a Time in Venice.’

The letter accused the 72-year-old of downloading a movie and also listed over 1,000 other downloads that were tied to his IP-address. Harding was understandably shocked by the threat and says he never downloads anything.

“I’ve never illegally downloaded anything … or even legally! I use my computer for email, games, news and that’s about it,” Harding told HawaiiNewsNow.

“I know definitely that I’m not guilty and my wife is not guilty. So what’s going on? Did somebody hack us? Is somebody out there actively hacking us? How they do that and go about doing that, I have no idea,” Harding added.

As is common in these cases, the copyright holder asked the Hawaii Federal Court for a subpoena, which ordered the associated Internet provider to hand over the personal details of the alleged infringers. The attorney then went on to send out settlement requests to the exposed users.

Harding received a letter offering an easy $3,900 settlement, which would increase to $4,900 if he failed to respond before August 7th. However, the elderly man wasn’t keen on taking the deal, describing the pay-up-or-else demand as “absolutely absurd.”

The attorney reiterated to the local newspaper that these are not idle threats. People risk $150,000 per illegal download, he stressed. That said, mistakes happen and people who feel that they are wrongfully accused should contact his office.

Culpepper explained it further with an analogy while adding a new dimension to the ‘you wouldn’t steal a car’ meme in the process.

“This is similar to a car stolen. If your car was stolen and your car hit someone or did some damage, initially the victim would look to see who was the owner of the car. You would probably tell them, someone stole my car. That time, that person would try to find the person who stole your car,” he said.

The attorney says that they are not trying to bankrupt people. Their goal is to deter piracy. There are cases where they’ve accepted lower settlements or even a mere apology, he notes.

How the 72-year-old will respond is unknown, but judging from his tone, he may be looking for an apology himself. Going to the press was probably a smart move, as rightsholders generally don’t like the PR that comes with this kind of story.

These cases are by no means unique though. While browsing through the court dockets of Culpepper’s recent cases we quickly stumbled upon a similar denial. This one comes from a Honolulu woman who’s accused of pirating ‘Mechanic: Resurrection.’

“I have never downloaded the movie they are referencing and when I do download movies I use legal services such as Amazon, and Apple TV,” she wrote to the court, urging it to keep her personal information private.

“I do have frequent guests at our house often using the Internet. In the future I will request that nobody uses any file sharing on our Internet connection,” the letter added.

Unfortunately for her, the letter includes her full name and address, which means that she has effectively exposed herself. This likely means that she will soon receive a settlement request in the mail, just like Harding did, if she hasn’t already.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Darth Beats: Star Wars LEGO gets a musical upgrade

Post Syndicated from Janina Ander original https://www.raspberrypi.org/blog/darth-beats/

Dan Aldred, Raspberry Pi Certified Educator and creator of the website TeCoEd, has built Darth Beats by managing to fit a Pi Zero W and a Pimoroni Speaker pHAT into a LEGO Darth Vader alarm clock! The Pi force is strong with this one.

Darth Beats MP3 Player

Pimoroni Speaker pHAT and Raspberry Pi Zero W embedded into a Lego Darth Vader Alarm clock to create – “Darth Beats MP3 Player”. Video demonstrating all the features and functions of the project. Alarm Clock – https://goo.gl/VSMhG4 Speaker pHAT – https://shop.pimoroni.com/products/speaker-phat

Darth Beats inspiration: I have a very good feeling about this!

As we all know, anything you love gets better when you add something else you love: chocolate ice cream + caramel sauce, apple tart + caramel sauce, pizza + caramel sau— okay, maybe not anything, but you get what I’m saying.

The formula, in the form of “LEGO + Star Wars”, applies to Dan’s LEGO Darth Vader alarm clock. His Darth Vader, however, was sitting around on a shelf, just waiting to be hacked into something even cooler. Then one day, inspiration struck: Dan decided to aim for exponential awesomeness by integrating Raspberry Pi and Pimoroni technology to turn Vader into an MP3 player.

Darth Beats assembly: always tell me the mods!

The space inside the LEGO device measures a puny 6×3×3 cm, so cramming in the Zero W and the pHAT was going to be a struggle. But Dan grabbed his dremel and set to work, telling himself to “do or do not. There is no try.”

Darth Beats dremel

I find your lack of space disturbing.

He removed the battery compartment, and added two additional buttons in its place. Including the head, his Darth Beats has seven buttons, which means it is fully autonomous as a music player.

Darth Beats back buttons

Almost ready to play a silly remix of Yoda quotes

Darth Beats can draw its power from a wall socket, or from a portable battery pack, as shown in Dan’s video. Dan used the GPIO Zero Python library to set up ‘on’ and ‘off’ switches, and buttons for skipping tracks and controlling volume.
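Dan’s exact code isn’t reproduced here, but a minimal sketch of that approach, using placeholder GPIO pins and file paths rather than his actual wiring, might look like this:

from signal import pause
from gpiozero import Button
from pygame import mixer

mixer.init()

# Placeholder track locations; the real player manages a full MP3 library.
playlist = ['/home/pi/music/imperial_march.mp3', '/home/pi/music/track_02.mp3']
current = 0

def play_current():
    mixer.music.load(playlist[current])
    mixer.music.play()

def next_track():
    global current
    current = (current + 1) % len(playlist)
    play_current()

def volume_up():
    mixer.music.set_volume(min(1.0, mixer.music.get_volume() + 0.1))

def volume_down():
    mixer.music.set_volume(max(0.0, mixer.music.get_volume() - 0.1))

# Placeholder BCM pin numbers; the real build wires seven buttons in total.
play_button = Button(17)
skip_button = Button(27)
vol_up_button = Button(22)
vol_down_button = Button(23)

play_button.when_pressed = play_current
skip_button.when_pressed = next_track
vol_up_button.when_pressed = volume_up
vol_down_button.when_pressed = volume_down

pause()  # keep the script running, waiting for button presses

Each button simply hooks a Python callback via gpiozero’s when_pressed, while pygame’s mixer handles the actual MP3 playback.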

For more details on the build process, read his blog, and check out his video log:

Making Darth Beats

Short video showing you how I created the “Darth Beats MP3 Player”.

Accessing Darth Beats: these are the songs you’re looking for

When you press the ‘on’ switch, the Imperial March sounds before Darth Beats asks “What is thy bidding, my master?”. Then the device is ready to play music. Dan accomplished this by using Cron to run his scripts as soon as the Zero W boots up. MP3 files are played with the help of the Pygame library.
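The cron side of that is usually a single @reboot entry in the Pi’s crontab; the script path here is a placeholder rather than Dan’s actual layout:

# added with 'crontab -e' on the Zero W; runs the player script once at every boot
@reboot python3 /home/pi/darth_beats/player.py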

Of course, over time it would become boring to only be able to listen to songs that are stored on the Zero W. However, Dan got around this issue by accessing the Zero W remotely. He set up an online file upload system to add and remove MP3 files from the player. To do this, he used Droopy, a file-sharing server package written by Pierre Duquesne.

IT’S A TRAP!

There’s no reason to use this quote, but since it’s the Star Wars line I use most frequently, I’m adding it here anyway. It’s my post, and I can do what I want!

As you can imagine, there’s little that gets us more excited at Pi Towers than a Pi-powered Star Wars build. Except maybe a Harry Potter-themed project? What are your favourite geeky builds? Are you maybe even working on one yourself? Be sure to send us nerdy joy by sharing your links in the comments!

The post Darth Beats: Star Wars LEGO gets a musical upgrade appeared first on Raspberry Pi.

TVAddons Returns, But in Ugly War With Canadian Telcos Over Kodi Addons

Post Syndicated from Andy original https://torrentfreak.com/tvaddons-returns-ugly-war-canadian-telcos-kodi-addons-170801/

After Dish Network filed a lawsuit against TVAddons in Texas, several high-profile Kodi addons took the decision to shut down. Soon after, TVAddons itself went offline.

In the weeks that followed, several TVAddons-related domains were signed over (1,2) to a Canadian law firm, a mysterious situation that didn’t dovetail well with the US-based legal action.

TorrentFreak can now reveal that the shutdown of TVAddons had nothing to do with the US action and everything to do with a separate lawsuit filed in Canada.

The complaint against TVAddons

Two months ago on June 2, a collection of Canadian telecoms giants including Bell Canada, Bell ExpressVu, Bell Media, Videotron, Groupe TVA, Rogers Communications and Rogers Media, filed a complaint in Federal Court against Montreal resident, Adam Lackman, the man behind TVAddons.

The 18-page complaint details the plaintiffs’ case against Lackman, claiming that he communicated copyrighted TV shows including Game of Thrones, Prison Break, The Big Bang Theory, America’s Got Talent, Keeping Up With The Kardashians and dozens more, to the public in breach of copyright.

The key claim is that Lackman achieved this by developing, hosting, distributing or promoting Kodi add-ons.

Adam Lackman, the man behind TVAddons (@adam.lackman on Instagram)

A total of 18 major add-ons are detailed in the complaint including 1Channel, Exodus, Phoenix, Stream All The Sources, SportsDevil, cCloudTV and Alluc, to name a few. Also under the spotlight is the ‘FreeTelly’ custom Kodi build distributed by TVAddons alongside its Kodi configuration tool, Indigo.

“[The defendant] has made the [TV shows] available to the public by telecommunication in a way that allows members of the public to have access to them from a place and at a time individually chosen by them…consequently infringing the Plaintiffs’ copyright…in contravention of sections 2.4(1.1), 3(1)(f) and 27(1) of the Copyright Act,” the complaint reads.

The complaint alleges that Lackman “induced and/or authorized users” of the FreeTelly and Indigo tools to carry out infringement by his handling and promotion of infringing add-ons, including through TVAddons.ag and Offshoregit.com, in contravention of sections 3(1)(f) and 27(1) of the Copyright Act.

“Approximately 40 million unique users located around the world are actively using Infringing Addons hosted by TVAddons every month, and approximately 900,000 Canadian households use Infringing Add-ons to access television content. The amount of users of Infringing add-ons hosted TVAddons is constantly increasing,” the complaint adds.

To limit the harm allegedly caused by TVAddons, the complaint asked for interim, interlocutory, and permanent injunctions restraining Lackman and associates from developing, promoting or distributing any of the allegedly infringing add-ons or software. On top, the plaintiffs requested punitive and exemplary damages, plus costs.

The interim injunction and Anton Piller Order

Following the filing of the complaint, on June 9 the Federal Court handed down a time-limited interim injunction against Lackman which restrained him from various activities in respect of TVAddons. The process took place ex parte, meaning in secret, without Lackman being able to mount a defense.

The Court also authorized a bailiff and computer forensics experts to take control of Internet domains including TVAddons.ag and Offshoregit.com plus social media and hosting provider accounts for a period of 14 days. These were transferred to Daniel Drapeau at DrapeauLex, an independent court-appointed supervising counsel.

The order also contained an Anton Piller order, a civil search warrant that grants plaintiffs no-notice permission to enter a defendant’s premises in order to secure and copy evidence to support their case, before it can be destroyed or tampered with.

The order covered not only data related to the TVAddons platform, such as operating and financial details, revenues, and banking information, but everything in Lackman’s possession.

The Court ordered the telecoms companies to inform Lackman that the case against him is a civil proceeding and that he could deny entry to his property if he wished. However, that option would put him in breach of the order and would place him at risk of being fined or even imprisoned. Catch 22 springs to mind.

The Court did, however, put limits on the number of people that could be present during the execution of the Anton Piller order (ostensibly to avoid intimidation) and ordered the plaintiffs to deposit CAD$50,000 with the Court, in case the order was improperly executed. That decision would later prove an important one.

The search and interrogation of TVAddons’ operator

On June 12, the order was executed and Lackman’s premises were searched for more than 16 hours. For nine hours he was interrogated and effectively denied his right to remain silent since non-cooperation with an Anton Piller order amounts to contempt of court. The Court’s stated aim of not intimidating Lackman failed.

The TVAddons operator informs TorrentFreak that he heard a disturbance in the hallway outside and spotted several men hiding on the other side of the door. Fearing for his life, Lackman called the police and when they arrived he opened the door. At this point, the police were told by those in attendance to leave, despite Lackman’s protests.

Once inside, Lackman was told he had an hour to find a lawyer, but couldn’t use any electronic device to get one. Throughout the entire day, Lackman says he was reminded by the plaintiffs’ lawyer that he could be held in contempt of court and jailed, even though he was always cooperating.

“I had to sit there and not leave their sight. I was denied access to medication,” Lackman told TorrentFreak. “I had a doctor’s appointment I was forced to miss. I wasn’t even allowed to call and cancel.”

In papers later filed with the court by Lackman’s team, the Anton Piller order was described as a “bombe atomique” since TVAddons had never been served with so much as a copyright takedown notice in advance of this action.

The Anton Piller controversy

Anton Piller orders are only valid when passing a three-step test: when there is a strong prima facie case against the respondent, the damage – potential or actual – is serious for the applicant, and when there is a real possibility that evidence could be destroyed.

For Bell Canada, Bell ExpressVu, Bell Media, Videotron, Groupe TVA, Rogers Communications and Rogers Media, serious problems emerged on at least two of these points after the execution of the order.

For example, TVAddons carried more than 1,500 add-ons yet only 1% of those add-ons were considered to be infringing, a tiny number in the overall picture. Then there was the not insignificant problem with the exchange that took place during the hearing to obtain the order, during which Lackman was not present.

Clearly, the securing of existing evidence wasn’t the number one priority.

Plaintiffs: We want to destroy TVAddons

And the problems continued.

No right to remain silent, no right to consult a lawyer

The Anton Piller search should have been carried out between 8am and 8pm but actually carried on until midnight. As previously mentioned, Adam Lackman was effectively denied his right to remain silent and was forbidden from getting advice from his lawyer.

None of this sat well with the Honourable B. Richard Bell during a subsequent Federal Court hearing to consider the execution of the Anton Piller order.

“It is important to note that the Defendant was not permitted to refuse to answer questions under fear of contempt proceedings, and his counsel was not permitted to clarify the answers to questions. I conclude unhesitatingly that the Defendant was subjected to an examination for discovery without any of the protections normally afforded to litigants in such circumstances,” the Judge said.

“Here, I would add that the ‘questions’ were not really questions at all. They took the form of orders or directions. For example, the Defendant was told to ‘provide to the bailiff’ or ‘disclose to the Plaintiffs’ solicitors’.”

Evidence preservation? More like a fishing trip

But shockingly, the interrogation of Lackman went much, much further. TorrentFreak understands that the TVAddons operator was given a list of 30 names of people that might be operating sites or services similar to TVAddons. He was then ordered to provide all of the information he had on those individuals.

Of course, people tend to guard their online identities so it’s possible that the information provided by Lackman will be of limited use, but Judge Bell was not happy that the Anton Piller order was abused by the plaintiffs in this way.

“I conclude that those questions, posed by Plaintiffs’ counsel, were solely made in furtherance of their investigation and constituted a hunt for further evidence, as opposed to the preservation of then existing evidence,” he wrote in a June 29 order.

But he was only just getting started.

Plaintiffs unlawfully tried to destroy TVAddons before trial

The Judge went on to note that from their own mouths, the Anton Piller order was purposely designed by the plaintiffs to completely shut down TVAddons, despite the fact that only a tiny proportion of the add-ons available on the site were allegedly used to infringe copyright.

“I am of the view that [the order’s] true purpose was to destroy the livelihood of the Defendant, deny him the financial resources to finance a defense to the claim made against him, and to provide an opportunity for discovery of the Defendant in circumstances where none of the procedural safeguards of our civil justice system could be engaged,” Judge Bell wrote.

As noted, plaintiffs must also have a “strong prima facie case” to obtain an Anton Piller order, but Judge Bell says he’s not convinced that one exists. Instead, he praised the “forthright manner” of Lackman, who successfully compared the ability of Kodi addons to find content with that of Google search.

So why the big turnaround?

Judge Bell said that while the prima facie case may have appeared strong before the judge who heard the matter ex parte (without Lackman being present to defend himself), the subsequent adversarial hearing undermined it, to the point that it no longer met the threshold.

As a result of these failings, Judge Bell declared the Anton Piller order unlawful. Things didn’t improve for the plaintiffs on the injunction front either.

The Judge said that he believes that Lackman has “an arguable case” that he is not violating the Copyright Act by merely providing addons and that TVAddons is his only source of income. So, if an injunction to close the site was granted, the litigation would effectively be over, since the plaintiffs already admitted that their aim was to neutralize the platform.

If the platform was neutralized, Lackman could no longer earn money from the site, which would harm his ability to mount a defense.

“In considering the balance of convenience, I also repeat that the plaintiffs admit that the vast majority of add-ons are non-infringing. Whether the remaining approximately 1% are infringing is very much up for debate. For these reasons, I find the balance of convenience favors the defendant, and no interlocutory injunction will be issued,” the Judge declared.

With the Anton Piller order declared unlawful and no interlocutory injunction (one effective until the final determination of the case) handed down, things were about to get worse for the telecoms companies.

They had paid CAD$50,000 to the court in security in case things went wrong with the Anton Piller order, so TVAddons was entitled to compensation from that amount. That would be helpful, since at this point TVAddons had already run up CAD$75,000 in legal expenses.

On top, the Judge told independent counsel to give everything seized during the Anton Piller search back to Lackman.

The order to return items previously seized

But things were far from over. Within days, the telecoms companies took the decision to the Court of Appeal, asking for a stay of execution (a delay in carrying out a court order) to retain possession of items seized, including physical property, domains, and social media accounts.

Mid-July the appeal was granted and certain confidentiality clauses affecting independent counsel (including Daniel Drapeau, who holds the TVAddons’ domains) were ordered to be continued. However, considering the problems with the execution of the Anton Piller order, Bell Canada, TVA, Videotron and Rogers et al, were ordered to submit an additional security bond of CAD$140,000, on top of the CAD$50,000 already deposited.

So the battle continues, and continue it will

Speaking with TorrentFreak, Adam Lackman says that he has no choice but to fight the telecoms companies since not doing so would result in a loss by default judgment. Interestingly, both he and one of the judges involved in the case thus far believe he has an arguable case.

Lackman says that his activities are protected under the Canadian Copyright Act, specifically subparagraph 2.4(1)(b) which states as follows:

A person whose only act in respect of the communication of a work or other subject-matter to the public consists of providing the means of telecommunication necessary for another person to so communicate the work or other subject-matter does not communicate that work or other subject-matter to the public;

Of course, finding out whether that’s indeed the case will be a costly endeavor.

“It all comes down to whether we will have the financial resources necessary to mount our defense and go to trial. We won’t have ad revenue coming in, since losing our domain names means that we’ll lose the majority of our traffic for quite some time into the future,” Lackman told TF in a statement.

“We’re hoping that others will be as concerned as us about big companies manipulating the law in order to shut down what they see as competition. We desperately need help in financially supporting our legal defense, we cannot do it alone.

“We’ve run up a legal bill of over $100,000 to date. We’re David, and they are four Goliaths with practically unlimited resources. If we lose, it will mean that new case law is made, case law that could mean increased censorship of the internet.”

In the hope of getting support, TVAddons has launched a fundraiser campaign and in the meantime, a new version of the site is back on a new domain, TVAddons.co.

Given TVAddons’ line of defense, the nature of both the platform and Kodi addons, and the fact that there has already been a serious abuse of process during evidence preservation, this is now one of the most interesting and potentially influential copyright cases underway anywhere today.

TVAddons is being represented by Éva Richard, Hilal Ayoubi and Karim Renno in Canada, plus Erin Russell and Jason Sweet in the United States.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

The KickassTorrents Shutdown, One Year Later

Post Syndicated from Ernesto original https://torrentfreak.com/the-kickasstorrents-shutdown-one-year-later-170720/

Exactly one year ago, on July 20th 2016, the torrent community was in dire straits.

Polish law enforcement officers had just apprehended Artem Vaulin, the alleged founder of KickassTorrents (KAT) at a local airport.

The arrest was part of a U.S. criminal case which also listed two other men as key players. At the time, KAT was the most-used torrent site around, so the authorities couldn’t have hit a more prominent target.

The criminal case was the end of the torrent site, but also the start of a lengthy court battle for the defendants.

To this day, Artem remains in Poland. He’s currently out on bail awaiting the final decision on the extradition request from the United States, while the other two defendants remain at large. If he is extradited, it’s expected that an extensive court battle will follow.

Although the original KickassTorrents website is no longer around, the ‘brand’ is still very much alive. Soon after the site went down, several KAT copies and mirrors appeared. For many, however, the original site is still dearly missed.

The most prominent effort to create a replacement is the product of a group of well-known staffers from the original site. They began to rebuild the community by launching a forum for estranged KAT users last summer. A few months later they expanded their KATcr project to a full blown torrent site, mimicking the looks of the original.

Today, one year after it all started, we reach out to the new KATcr team to hear about their memories and future plans.

“Looking back it was shocking and disheartening for everyone, we know it happens but didn’t expect our ship to sink like that. We’ve written history there though, made many friends, learned a hell of a lot, and achieved so much,” Mr.Gooner recalls.

“It’s thanks to the original site and the loyal, supporting users that we were able to rebuild our ship and set sail again,” he adds.

While KATcr was able to put up a forum within days, getting fully organized was a more complex operation. Several former admins came on board, but without access to the original code or database, it took a few months to build a KAT replacement from scratch.

KATcr today

The site eventually relaunched as a full-blown torrent site last December. Although it doesn’t get as much traffic as the original KAT, many former users have found their way ‘back.’

“Minus a few hiccups and various other minor issues most new sites experience, traffic is increasing at a good rate. We are continuously improving and our name is well and truly out there now. The door is open and everyone is welcomed with open arms, we know all too well what it’s like to lose our home,” Mr.Gooner notes.

A lot of people would think twice before attempting to fill the shoes of a site that was hunted down by the US Department of Justice. However, the KATcr team believes that they are acting within the boundaries of the law.

“As far as we are concerned we operate to every letter of the law,” Mr.Gooner states in full confidence.

In the future, the site hopes to expand its userbase even further. Although it’s now been a year since the original KAT was pulled offline, the KATcr team prefers to look ahead, instead of dwelling in the past. There are some people who are still missed, but other than that, the focus is forward.

“I mostly miss those that are no longer with us. But rather than living in the past, the present day and the future is what matters, so we don’t tend to look back to miss anything else,” Mr.Gooner says.

Looking ahead is what alleged KickassTorrents operator Artem Vaulin will do as well. His concerns are different though.

The most pressing question that has to be answered in the near future is whether Poland will extradite him to the United States. Through his lawyers, he previously floated the idea of surrendering to the US voluntarily to “resolve” the pending charges, but only under the right conditions.

Meanwhile, he remains in Poland on bail.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

How To Get Your First 1,000 Customers

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/how-to-get-your-first-1000-customers/

PR for getting your first 1000 customers

If you launch your startup and no one knows, did you actually launch? As mentioned in my last post, our initial launch target was to get 1,000 people to use our service. But how do you get even 1,000 people to sign up for your service when no one knows who you are?

There are a variety of methods to attract your first 1,000 customers, but launching with the press is my favorite. I’ll explain why and how to do it below.

Paths to Attract Your First 1,000 Customers

Social following: If you have a massive social following, those people are a reasonable target for what you’re offering. In particular if your relationship with them is one where they would buy something you recommend, this can be one of the easiest ways to get your initial customers. However, building this type of following is non-trivial and often is done over several years.

Paid advertising: The advantage of paid ads is you have control over when they are presented and what they say. The primary disadvantage is they tend to be expensive, especially before you have your positioning, messaging, and funnel nailed.

Viral: There are certainly examples of companies that launched with a hugely viral video, blog post, or promotion. While fantastic if it happens, even if you do everything right, the likelihood of massive virality is miniscule and the conversion rate is often low.

Press: As I said, this is my favorite. You don’t need to pay a PR agency and can go from nothing to launched in a couple weeks. Press not only provides awareness and customers, but credibility and SEO benefits as well.

How to Pitch the Press

It’s easy: Have a compelling story, find the right journalists, make their life easy, pitch and follow-up. Of course, each one of those has some nuance, so let’s dig in.

Have a compelling story

How to Get Attention

When you’ve been working for months on your startup, it’s easy to get lost in the minutiae when talking to others. Stories that a journalist will write about need to be something their readers will care about. Knowing what story to tell and how to tell it is part science and part art. Here’s how you can get there:

The basics of your story

Ask yourself the following questions, and write down the answers:

  • What are we doing? What product or service are we offering?
  • Why? What problem are we solving?
  • What is interesting or unique? Either about what we’re doing, how we’re doing it, or for whom we’re doing it.

“But my story isn’t that exciting”

Neither was announcing a data backup company, believe me. Look for angles that make it compelling. Here are some:

  • Did someone on your team do something major before? (build a successful company/product, create some innovation, market something we all know, etc.)
  • Do you have an interesting investor or board member?
  • Is there a personal story that drove you to start this company?
  • Are you starting it in a unique place?
  • Did you come upon the idea in a unique way?
  • Can you share something people want to know that’s not usually shared?
  • Are you partnered with a well-known company?
  • …is there something interesting/entertaining/odd/shocking/touching/etc.?

It doesn’t get much less exciting than, “We’re launching a company that will back up your data.” But there were still a lot of compelling stories:

  • Founded by serial entrepreneurs, bootstrapped a capital-intensive company, committed to each other for a year without salary.
  • Challenging the way that every backup company before was set up by not asking customers to pick and choose files to back up.
  • Designing our own storage system.
  • Etc. etc.

For the initial launch, we focused on “unlimited for $5/month” and statistics from a survey we ran with Harris Interactive that said that 94% of people did not regularly backup their data.

It’s an old adage that “Everyone has a story.” Regardless of what you’re doing, there is always something interesting to share. Dig for that.

The headline

Once you’ve captured what you think the interesting story is, you’ve got to boil it down. Yes, you need the elevator pitch, but this is shorter…it’s the headline pitch. Write the headline that you would love to see a journalist write.

Now comes the part where you have to be really honest with yourself: if you weren’t involved, would you care?

The “Techmeme Test”

One way I try to ground myself is what I call the “Techmeme Test”. Techmeme lists the top tech articles. Read the headlines. Imagine the headline you wrote in the middle of the page. If you weren’t involved, would you click on it? Is it more or less compelling than the others? Much of tech news is dominated by the largest companies. If you want to get written about, your story should be more compelling. If not, go back above and explore your story some more.

Embargoes, exclusives and calls-to-action

Journalists write about news. Thus, if you’ve already announced something and are then pitching a journalist to cover it, unless you’re giving her something significant that hasn’t been said, it’s no longer news. As a result, there are ‘embargoes’ and ‘exclusives’.

Embargoes: An embargo simply means that you are sharing news with a journalist that they need to keep private until a certain date and time.

If you’re Apple, this may be a formal and legal document. In our case, it’s as simple as saying, “Please keep embargoed until 4/13/17 at 8am California time.” in the pitch. Some sites explicitly will not keep embargoes; for example The Information will only break news. If you want to launch something later, do not share information with journalists at these sites. If you are only working with a single journalist for a story, and your announcement time is flexible, you can jointly work out a date and time to announce. However, if you have a fixed launch time or are working with a few journalists, embargoes are key.

Exclusives: An exclusive means you’re giving something specifically to that journalist. Most journalists love an exclusive as it means readers have to come to them for the story. One option is to give a journalist an exclusive on the entire story. If it is your dream journalist, this may make sense. Another option, however, is to give exclusivity on certain pieces. For example, for your launch you could give an exclusive on funding detail & a VC interview to a more finance-focused journalist and insight into the tech & a CTO interview to a more tech-focused journalist.

Call-to-Action: With our launch we gave TechCrunch, Ars Technica, and SimplyHelp URLs that gave the first few hundred of their readers access to the private beta. Once those first few hundred users from each site downloaded, the beta would be turned off.

Thus, we used a combination of embargoes, exclusives, and a call-to-action during our initial launch to be able to brief journalists on the news before it went live, give them something they could announce as exclusive, and provide a time-sensitive call-to-action to the readers so that they would actually sign up and not just read and go away.

How to Find the Most Authoritative Sites / Authors

“If a press release is published and no one sees it, was it published?” Perhaps the time existed when sending a press release out over the wire meant journalists would read it and write about it. That time has long been forgotten. Over 1,000 unread press releases are published every day. If you want your compelling story to be covered, you need to find the handful of journalists that will care.

Determine the publications

Find the publications that cover the type of story you want to share. If you’re in tech, Techmeme has a leaderboard of publications ranked by leadership and presence. This list will tell you which publications are likely to have influence. Visit the sites and see if your type of story appears on their site. But once you’ve determined the publication, do NOT send a pitch to their generic press or tips email addresses. In all the times I’ve done that, I have never had a single response. Those email addresses are likely on every PR, press release, and spam list and unlikely to get read. Instead…

Determine the journalists

Once you’ve determined which publications cover your area, check which journalists are doing the writing. Skim the articles and search for keywords and competitor names.

Identify one primary journalist at the publication that you would love to have cover you, and secondary ones if there are a few good options. If you’re not sure which one should be the primary, consider a few tests:

  • Do they truly seem to care about the space?
  • Do they write interesting/compelling stories that ‘get it’?
  • Do they appear on the Techmeme leaderboard?
  • Do their articles get liked/tweeted/shared and commented on?
  • Do they have a significant social presence?

Leveraging Google

Google author search by date

In addition to Techmeme (or if you aren’t in the tech space), Google will become a must-have tool for finding the right journalists to pitch. Below the search box you will find a number of tabs. Click on Tools and change the Any time setting to Custom range. I like to use the past six months to ensure I find authors that are actively writing about my market. I start with the All results tab. This will return a combination of product sites and articles, depending upon your search term.

Scan for articles and click on the link to see if the article is on topic. If it is, find the author’s name. Often, clicking on the author name will take you to a bio page that includes their Twitter, LinkedIn, and/or Facebook profile. Many times you will find their email address in the bio. You should collect all the information and add it to your outreach spreadsheet. It’s always a good idea to comment on the article to start building awareness of your name. Another good idea is to tweet or like the article.

Next click on the News tab and set the same search parameters. You will get a different set of results. Repeat the same steps. Between the two searches you will have a list of authors that actively write for the websites that Google considers the most authoritative on your market.

How to find the most socially shared authors

Buzzsumo search for most shared by date

Your next step is to find the writers whose articles get shared the most socially. Go to Buzzsumo and click on the Most Shared tab. Enter search terms for your market as well as competitor names. Again I like to use the past 6 months as the time range. You will get a list of articles that have been shared the most across Facebook, LinkedIn, Twitter, Pinterest, and Google+. In addition to finding the most shared articles and their authors you can also see some of the Twitter users that shared the article. Many of those Twitter users are big influencers in your market so it’s smart to start following and interacting with them as well as the authors.

How to Find Author Email Addresses

Some journalists publish their contact info right on the stories. For those that don’t, a bit of googling will often get you the email. For example, TechCrunch wrote a story a few years ago where they published all of their email addresses, in response to a new service that charges a small fee to provide journalist email addresses. Sometimes visiting their Twitter pages will reveal a link to a personal site, where they share an email address.

Of course, all is not lost if you don’t find an email in the bio. There are two good services for finding emails: https://app.voilanorbert.com/ and https://hunter.io/. For Voila Norbert, enter the author name and the website you found their article on. The majority of the time you search for an author at a major publication, Norbert will return an accurate email address. If it doesn’t, try Hunter.io.

On Hunter.io, enter the domain name and click on Personal Only. Then scroll through the results to find the author’s email. I’ve found Norbert to be more accurate overall, but between the two you will find most major authors’ email addresses.

Email, by the way, is not necessarily the best way to engage a journalist. Many are avid Twitter users. Follow them and engage – that means read/retweet/favorite their tweets; reply to their questions, and generally be helpful BEFORE you pitch them. Later when you email them, you won’t be just a random email address.

Don’t spam

Now that you have all these email addresses (possibly thousands if you purchased a list) – do NOT spam. It is incredibly tempting to think “I could try to figure out which of these folks would be interested, but if I just email all of them, I’ll save myself time and be more likely to get some of them to respond.” Don’t do it.

First, you’ll want to tailor your pitch to the individual. Second, it’s a small world and you’ll be known as someone who spams – reputation is golden. Also, don’t call journalists. Unless you know them or they’ve said they’re open to calls, you’re most likely to just annoy them.

Build a relationship

Build Trust with reporters

Play the long game. You may be focusing just on the launch and hoping to get this one story covered, but if you don’t quickly flame-out, you will have many more opportunities to tell interesting stories that you’ll want the press to cover. Be honest and don’t exaggerate.
When you have 500 users it’s tempting to say, “We’ve got thousands!” Don’t. The good journalists will see through it and it’ll likely come back to bite you later. If you don’t know something, say “I don’t know but let me find out for you.” Most journalists want to write interesting stories that their readers will appreciate. Help them do that. Build deeper relationships with 5 – 10 journalists, rather than spamming thousands.

Stay organized

It doesn’t need to be complicated, but keep a spreadsheet that includes the name, publication, and contact info of the journalists you care about. Then, use it to keep track of who you’ve pitched, who’s responded, whether you’ve sent them the materials they need, and whether they intend to write/have written.

Make their life easy

Journalists have a million PR people emailing them, are actively engaging with readers on Twitter and in the comments, are tracking their metrics, are working their sources…and all the while needing to publish new articles. They’re busy. Make their life easy and they’re more likely to engage with yours.

Get to know them

Before sending them a pitch, know what they’ve written in the space. If you tell them how your story relates to ones they’ve written, it’ll help them put the story in context, and enable them to possibly link back to a story they wrote before.

Prepare your materials

Journalists will need somewhere to get more info (prepare a fact sheet), a URL to link to, and at least one image (ideally a few to choose from.) A fact sheet gives bite-sized snippets of information they may need about your startup or product: what it is, how big the market is, what’s the pricing, who’s on the team, etc. The URL is where their reader will get the product or more information from you. It doesn’t have to be live when you’re pitching, but you should be able to tell what the URL will be. The images are ones that they could embed in the article: a product screenshot, a CEO or team photo, an infographic. Scan the types of images included in their articles. Don’t send any of these in your pitch, but have them ready. Studies, stats, customer/partner/investor quotes are also good to have.

Pitch

A pitch has to be short and compelling.

Subject Line

Think back to the headline you want. Is it really compelling? Can you shorten it to a subject line? Include what’s happening and when. For Mike Arrington at Techcrunch, our first subject line was “Startup doing an ‘online time machine’”. Later I would include, “launching June 6th”.

For John Timmer at ArsTechnica, it was “Demographics data re: your 4/17 article”. Why? Because he wrote an article titled “WiFi popular with the young people; backups, not so much”. Since we had run a demographics survey on backups, I figured as a science editor he’d be interested in this additional data.

Body

A few key things about the body of the email. It should be short and to the point, no more than a few sentences. Here was my actual, original pitch email to John:

Hey John,

We’re launching Backblaze next week which provides a Time Machine-online type of service. As part of doing some research I read your article about backups not being popular with young people and that you had wished Accenture would have given you demographics. In prep for our invite-only launch I sponsored Harris Interactive to get demographic data on who’s doing backups and if all goes well, I should have that data on Friday.

Next week starts Backup Awareness Month (and yes, probably Clean Your House Month and Brush Your Teeth Month)…but nonetheless…good time to remind readers to backup with a bit of data?

Would you be interested in seeing/talking about the data when I get it?

Would you be interested in getting a sneak peek at Backblaze? (I could give you some invite codes for your readers as well.)

Gleb Budman        

CEO and Co-Founder

Backblaze, Inc.

Automatic, Secure, High-Performance Online Backup

Cell: XXX-XXX-XXXX

The Good: It said what we’re doing, why this relates to him and his readers, provides him information he had asked for in an article, ties to something timely, is clearly tailored for him, is pitched by the CEO and Co-Founder, and provides my cell.

The Bad: It’s too long.

I got better later. Here’s an example:

Subject: Does temperature affect hard drive life?

Hi Peter, there has been much debate about whether temperature affects how long a hard drive lasts. Following up on the Backblaze analyses of how long do drives last & which drives last the longest (that you wrote about) we’ve now analyzed the impact of heat on the nearly 40,000 hard drives we have and found that…

We’re going to publish the results this Monday, 5/12 at 5am California-time. Want a sneak peek of the analysis?

Timing

A common question is “When should I launch?” What day, what time? I prefer to launch on Tuesday at 8am California-time. Launching earlier in the week gives breathing room for the news to live longer. While your launch may be a single article posted and that’s that, if it ends up a larger success, earlier in the week allows other journalists (including ones who are in other countries) to build on the story. Monday announcements can be tough because the journalists generally need to have their stories finished by Friday, and while ideally everything is buttoned up beforehand, startups sometimes use the weekend as overflow before a launch.

The 8am California time works because it allows articles to be published at the beginning of the day on the West Coast and around lunchtime on the East Coast. Any later and you risk it being past publishing time for the day. We used to launch at 5am in order to be morning for the East Coast, but it did not seem to have a significant benefit in coverage or impact, and it meant that the entire internal team needed to be up at 3am or 4am. Sometimes that’s critical, but I prefer not to burn the team out when it’s not.

Finally, try to stay clear of holidays, major announcements and large conferences. If Apple is coming out with their next iPhone, many of the tech journalists will be busy at least a couple days prior and possibly a week after. Not always obvious, but if you can, find times that are otherwise going to be slow for news.

Follow-up

There is a fine line between persistence and annoyance. I once had a journalist write me after we had an announcement that was covered by the press, “Why didn’t you let me know?! I would have written about that!” I had sent him three emails about the upcoming announcement to which he never responded.

Ugh. However, my takeaway from this isn’t that I should send 10 emails to every journalist. It’s that sometimes these things happen.

My general rule is 3 emails. If I’ve identified a specific journalist that I think would be interested and have a pitch crafted for her, I’ll send her the email ideally 2 weeks prior to the announcement. I’ll follow-up a week later, and one more time 2 days prior. If she ever says, “I’m not interested in this topic,” I note it and don’t email her on that topic again.

If a journalist writes a story, I read the article and engage in the comments (or someone on our team, such as our social guy, @YevP, does). We’ll often promote the story through our social channels and email our employees, who may choose to share the story as well. This helps us, but also helps the journalist get their story broader reach. Again, the goal is to build a relationship with the journalists in your space. If there’s something relevant to your customers that the journalist wrote, you’re providing a service to your customers AND helping the journalist get the word out about the article.

At times the stories also end up shared on sites such as Hacker News, Reddit, Slashdot, or become active conversations on Twitter. Again, we try to engage there and respond to questions (when we do, we are always clear that we’re from Backblaze.)

And finally, I’ll often send a short thank you to the journalist.

Getting Your First 1,000 Customers With Press

As I mentioned at the beginning, there is more than one way to get your first 1,000 customers. My favorite is working with the press to share your story. If you figure out your compelling story, find the right journalists, make their life easy, pitch and follow-up, you stand a high likelihood of getting coverage and customers. Better yet, that coverage will provide credibility for your company, and if done right, will establish you as a resource for the press for the future.

Like any muscle, this process takes working out. The first time may feel a bit daunting, but just take the steps one at a time. As you do this a few times, the process will get easier, you’ll know who to reach out to, and you’ll quickly determine which stories will be compelling.

The post How To Get Your First 1,000 Customers appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Analyze OpenFDA Data in R with Amazon S3 and Amazon Athena

Post Syndicated from Ryan Hood original https://aws.amazon.com/blogs/big-data/analyze-openfda-data-in-r-with-amazon-s3-and-amazon-athena/

One of the great benefits of Amazon S3 is the ability to host, share, or consume public data sets. This provides transparency into data to which an external data scientist or developer might not normally have access. By exposing the data to the public, you can glean many insights that would have been difficult with a data silo.

The openFDA project creates easy access to the high value, high priority, and public access data of the Food and Drug Administration (FDA). The data has been formatted and documented in consumer-friendly standards. Critical data related to drugs, devices, and food has been harmonized and can easily be called by application developers and researchers via API calls. OpenFDA has published two whitepapers that drill into the technical underpinnings of the API infrastructure as well as how to properly analyze the data in R. In addition, FDA makes openFDA data available on S3 in raw format.

In this post, I show how to use S3, Amazon EMR, and Amazon Athena to analyze the drug adverse events dataset. A drug adverse event is an undesirable experience associated with the use of a drug, including serious drug side effects, product use errors, product quality problems, and therapeutic failures.

Data considerations

Keep in mind that this data does have limitations. In the United States, these adverse events are submitted to the FDA voluntarily by consumers, so there may not be reports for all events that occurred. There is no certainty that the reported event was actually due to the product. The FDA does not require that a causal relationship between a product and event be proven, and reports do not always contain the detail necessary to evaluate an event. Because of this, there is no way to identify the true number of events. The important takeaway is that the information contained in this data has not been verified to produce cause and effect relationships. Despite this disclaimer, many interesting insights and value can be derived from the data to accelerate drug safety research.

Data analysis using SQL

For application developers who want to perform targeted searching and lookups, the API endpoints provided by the openFDA project are “ready to go” for software integration using a standard API powered by Elasticsearch, NodeJS, and Docker. However, for data analysis purposes, it is often easier to work with the data using SQL and statistical packages that expect a SQL table structure. For large-scale analysis, APIs often have query limits, such as 5000 records per query. This can cause extra work for data scientists who want to analyze the full dataset instead of small subsets of data.

To address the concern of requiring all the data in a single dataset, the openFDA project released the full 100 GB of harmonized data files that back the openFDA project onto S3. Athena is an interactive query service that makes it easy to analyze data in S3 using standard SQL. It’s a quick and easy way to answer your questions about adverse events and aspirin that does not require you to spin up databases or servers.

While you could point tools directly at the openFDA S3 files, you can find greatly improved performance and use of the data by following some of the preparation steps later in this post.

Architecture

This post explains how to use the following architecture to take the raw data provided by openFDA, leverage several AWS services, and derive meaning from the underlying data.

Steps:

  1. Load the openFDA /drug/event dataset into Spark and convert it to gzip to allow for streaming.
  2. Transform the data in Spark and save the results as a Parquet file in S3.
  3. Query the S3 Parquet file with Athena.
  4. Perform visualization and analysis of the data in R and Python on Amazon EC2.

Optimizing public data sets: A primer on data preparation

Those who want to jump right into preparing the files for Athena may want to skip ahead to the next section.

Transforming, or pre-processing, files is a common task for using many public data sets. Before you jump into the specific steps for transforming the openFDA data files into a format optimized for Athena, I thought it would be worthwhile to provide a quick exploration on the problem.

Making a dataset in S3 efficiently accessible with minimal transformation for the end user has two key elements:

  1. Partitioning the data into objects that contain a complete part of the data (such as data created within a specific month).
  2. Using file formats that make it easy for applications to locate subsets of data (for example, gzip, Parquet, ORC, etc.).

With these two key elements in mind, you can now apply transformations to the openFDA adverse event data to prepare it for Athena. You might find the data techniques employed in this post to be applicable to many of the questions you might want to ask of the public data sets stored in Amazon S3.

Before you get started, I encourage those who are interested in doing deeper healthcare analysis on AWS to make sure that you first read the AWS HIPAA Compliance whitepaper. This covers the information necessary for processing and storing patient health information (PHI).

Also, the adverse event analysis shown for aspirin is strictly for demonstration purposes and should not be used for any real decision or taken as anything other than a demonstration of AWS capabilities. However, there have been robust case studies published that have explored a causal relationship between aspirin and adverse reactions using OpenFDA data. If you are seeking research on aspirin or its risks, visit organizations such as the Centers for Disease Control and Prevention (CDC) or the Institute of Medicine (IOM).

Preparing data for Athena

For this walkthrough, you will start with the FDA adverse events dataset, which is stored as JSON files within zip archives on S3. You then convert it to Parquet for analysis. Why do you need to convert it? The original data download is stored in objects that are partitioned by quarter.

Here is a small sample of what you find in the adverse events (/drugs/event) section of the openFDA website.

If you were looking for events that happened in a specific quarter, this is not a bad solution. For most other scenarios, such as looking across the full history of aspirin events, it requires you to access a lot of data that you won’t need. The zip file format is not ideal for using data in place because zip readers must have random access to the file, which means the data can’t be streamed. Additionally, the zip files contain large JSON objects.

To read the data in these JSON files, a streaming JSON decoder must be used or a computer with a significant amount of RAM must decode the JSON. Opening up these files for public consumption is a great start. However, you still need to prepare the data with a few lines of Spark code so that the JSON can be streamed.

Step 1:  Convert the file types

Using Apache Spark on EMR, you can extract all of the zip files and pull out the events from the JSON files. To do this, use the Scala code below to decompress the zip files and create text files. In addition, compress the JSON files with gzip to improve Spark’s performance and reduce your overall storage footprint. The Scala code can be run in either the Spark Shell or in an Apache Zeppelin notebook on your EMR cluster.

If you are unfamiliar with either Apache Zeppelin or the Spark Shell, the following posts serve as great references:

 

import scala.io.Source
import java.util.zip.ZipInputStream
import org.apache.spark.input.PortableDataStream
import org.apache.hadoop.io.compress.GzipCodec

// Input Directory
val inputFile = "s3://download.open.fda.gov/drug/event/2015q4/*.json.zip";

// Output Directory
val outputDir = "s3://{YOUR OUTPUT BUCKET HERE}/output/2015q4/";

// Extract the zip files from the input directory as binary files
val zipFiles = sc.binaryFiles(inputFile);

// Process each zip file, extract the JSON as text and save it
// in the output directory, compressed with gzip
val rdd = zipFiles.flatMap((file: (String, PortableDataStream)) => {
    val zipStream = new ZipInputStream(file._2.open)
    val entry = zipStream.getNextEntry
    val iter = Source.fromInputStream(zipStream).getLines
    iter
}).map(_.replaceAll("\\s+", "")).saveAsTextFile(outputDir, classOf[GzipCodec])

Step 2:  Transform JSON into Parquet

With just a few more lines of Scala code, you can use Spark’s abstractions to convert the JSON into a Spark DataFrame and then export the data back to S3 in Parquet format.

Spark requires the JSON to be in JSON Lines format to be parsed correctly into a DataFrame.
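
For reference, JSON Lines simply means one complete JSON object per line. A couple of hypothetical records (the field names are borrowed from the table definition later in this post) would look like this:

{"safetyreportid": "10003300", "serious": "1", "receiptdate": "20151204"}
{"safetyreportid": "10003301", "serious": "2", "receiptdate": "20151211"}

The openFDA downloads instead wrap all of the results in one large JSON object, which is why the code below reads whole files with wholeTextFiles rather than line by line.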

// Output Parquet directory
val outputDir = "s3://{YOUR OUTPUT BUCKET NAME}/output/drugevents"
// Input json file
val inputJson = "s3://{YOUR OUTPUT BUCKET NAME}/output/2015q4/*"
// Load dataframe from json file multiline 
val df = spark.read.json(sc.wholeTextFiles(inputJson).values)
// Extract results from dataframe
val results = df.select("results")
// Save it to Parquet
results.write.parquet(outputDir)

Step 3:  Create an Athena table

With the data cleanly prepared and stored in S3 using the Parquet format, you can now place an Athena table on top of it to get a better understanding of the underlying data.

Because the openFDA data structure incorporates several layers of nesting, it can be a complex process to try to manually derive the underlying schema in a Hive-compatible format. To shorten this process, you can load the top row of the DataFrame from the previous step into a Hive table within Zeppelin and then extract the “create table” statement from SparkSQL.

results.createOrReplaceTempView("data")

val top1 = spark.sql("select * from data tablesample(1 rows)")

top1.write.format("parquet").mode("overwrite").saveAsTable("drugevents")

val show_cmd = spark.sql("show create table drugevents").show(1, false)

This returns a “create table” statement that you can almost paste directly into the Athena console. Make some small modifications (adding the word “external” and replacing “using” with “stored as”), and then execute the code in the Athena query editor. The table is created.

For the openFDA data, the DDL returns all string fields, as the date format used in your dataset does not conform to the yyyy-mm-dd hh:mm:ss[.f…] format required by Hive. For your analysis, the string format works appropriately, but it would be possible to extend this code to use a Presto function to convert the strings into time stamps.

CREATE EXTERNAL TABLE  drugevents (
   companynumb  string, 
   safetyreportid  string, 
   safetyreportversion  string, 
   receiptdate  string, 
   patientagegroup  string, 
   patientdeathdate  string, 
   patientsex  string, 
   patientweight  string, 
   serious  string, 
   seriousnesscongenitalanomali  string, 
   seriousnessdeath  string, 
   seriousnessdisabling  string, 
   seriousnesshospitalization  string, 
   seriousnesslifethreatening  string, 
   seriousnessother  string, 
   actiondrug  string, 
   activesubstancename  string, 
   drugadditional  string, 
   drugadministrationroute  string, 
   drugcharacterization  string, 
   drugindication  string, 
   drugauthorizationnumb  string, 
   medicinalproduct  string, 
   drugdosageform  string, 
   drugdosagetext  string, 
   reactionoutcome  string, 
   reactionmeddrapt  string, 
   reactionmeddraversionpt  string)
STORED AS parquet
LOCATION
  's3://{YOUR TARGET BUCKET}/output/drugevents'
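
As a side note, should you later want real timestamps, a minimal sketch of that Presto-based conversion in Athena might look like the following. This is not part of the original walkthrough; it assumes the receiptdate strings use the yyyymmdd layout seen in this dataset (e.g. '20151231') and that the table lives in the fda database, as in the queries later in this post.

-- Hypothetical example: parse the receiptdate string into a proper timestamp
SELECT safetyreportid,
       date_parse(receiptdate, '%Y%m%d') AS receipt_ts
FROM fda.drugevents
LIMIT 10;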

With the Athena table in place, you can start to explore the data by running ad hoc queries within Athena or doing more advanced statistical analysis in R.

Using SQL and R to analyze adverse events

Using the openFDA data with Athena makes it very easy to translate your questions into SQL code and perform quick analysis on the data. After you have prepared the data for Athena, you can begin to explore the relationship between aspirin and adverse drug events, as an example. One of the most common metrics to measure adverse drug events is the Proportional Reporting Ratio (PRR). It is defined as:

PRR = (m/n)/( (M-m)/(N-n) )
Where
m = #reports with drug and event
n = #reports with drug
M = #reports with event in database
N = #reports in database
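
To make the formula concrete, here is a worked example with made-up numbers: if a given year had 2,000 reports mentioning both aspirin and gastrointestinal haemorrhage (m), 40,000 reports mentioning aspirin (n), 50,000 reports of that event across all drugs (M), and 4,000,000 reports in total (N), then PRR = (2,000/40,000) / ((50,000-2,000)/(4,000,000-40,000)) ≈ 0.050 / 0.012 ≈ 4.1, meaning the event is reported roughly four times as often with aspirin as with all other drugs.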

Gastrointestinal haemorrhage has the highest PRR of any reaction to aspirin when viewed in aggregate. One question you may want to ask is how the PRR has trended on a yearly basis for gastrointestinal haemorrhage since 2005.

Using the following query in Athena, you can see the PRR trend of “GASTROINTESTINAL HAEMORRHAGE” reactions with “ASPIRIN” since 2005:

with drug_and_event as 
(select rpad(receiptdate, 4, 'NA') as receipt_year
    , reactionmeddrapt
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_drug_and_event 
from fda.drugevents
where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and medicinalproduct = 'ASPIRIN'
     and reactionmeddrapt= 'GASTROINTESTINAL HAEMORRHAGE'
group by reactionmeddrapt, rpad(receiptdate, 4, 'NA') 
), reports_with_drug as 
(
select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_drug 
 from fda.drugevents 
 where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and medicinalproduct = 'ASPIRIN'
group by rpad(receiptdate, 4, 'NA') 
), reports_with_event as 
(
   select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_event 
   from fda.drugevents
   where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and reactionmeddrapt= 'GASTROINTESTINAL HAEMORRHAGE'
   group by rpad(receiptdate, 4, 'NA')
), total_reports as 
(
   select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as total_reports 
   from fda.drugevents
   where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
   group by rpad(receiptdate, 4, 'NA')
)
select  drug_and_event.receipt_year, 
(1.0 * drug_and_event.reports_with_drug_and_event/reports_with_drug.reports_with_drug)/ (1.0 * (reports_with_event.reports_with_event- drug_and_event.reports_with_drug_and_event)/(total_reports.total_reports-reports_with_drug.reports_with_drug)) as prr
, drug_and_event.reports_with_drug_and_event
, reports_with_drug.reports_with_drug
, reports_with_event.reports_with_event
, total_reports.total_reports
from drug_and_event
    inner join reports_with_drug on  drug_and_event.receipt_year = reports_with_drug.receipt_year   
    inner join reports_with_event on  drug_and_event.receipt_year = reports_with_event.receipt_year
    inner join total_reports on  drug_and_event.receipt_year = total_reports.receipt_year
order by  drug_and_event.receipt_year


One nice feature of Athena is that you can quickly connect to it via R or any other tool that can use a JDBC driver to visualize the data and understand it more clearly.

With this quick R script that can be run in R Studio either locally or on an EC2 instance, you can create a visualization of the PRR and Reporting Odds Ratio (RoR) for “GASTROINTESTINAL HAEMORRHAGE” reactions from “ASPIRIN” since 2005 to better understand these trends.

# Connect to Athena over JDBC
# (assumes `drv` is an Athena JDBC driver object, e.g. created with RJDBC::JDBC() and the Athena driver jar)
library(RJDBC)
conn <- dbConnect(drv, '<Your JDBC URL>', s3_staging_dir="<Your S3 Location>",
                  user=Sys.getenv("USER_NAME"), password=Sys.getenv("USER_PASSWORD"))

# Declare Adverse Event
adverseEvent <- "'GASTROINTESTINAL HAEMORRHAGE'"

# Build SQL Blocks
sqlFirst <- "SELECT rpad(receiptdate, 4, 'NA') as receipt_year, count(DISTINCT safetyreportid) as event_count FROM fda.drugevents WHERE rpad(receiptdate,4,'NA') between '2005' and '2015'"
sqlEnd <- "GROUP BY rpad(receiptdate, 4, 'NA') ORDER BY receipt_year"

# Extract Aspirin with adverse event counts
sql <- paste(sqlFirst,"AND medicinalproduct ='ASPIRIN' AND reactionmeddrapt=",adverseEvent, sqlEnd,sep=" ")
aspirinAdverseCount = dbGetQuery(conn,sql)

# Extract Aspirin counts
sql <- paste(sqlFirst,"AND medicinalproduct ='ASPIRIN'", sqlEnd,sep=" ")
aspirinCount = dbGetQuery(conn,sql)

# Extract adverse event counts
sql <- paste(sqlFirst,"AND reactionmeddrapt=",adverseEvent, sqlEnd,sep=" ")
adverseCount = dbGetQuery(conn,sql)

# All Drug Adverse event Counts
sql <- paste(sqlFirst, sqlEnd,sep=" ")
allDrugCount = dbGetQuery(conn,sql)

# Select correct rows
selAll =  allDrugCount$receipt_year == aspirinAdverseCount$receipt_year
selAspirin = aspirinCount$receipt_year == aspirinAdverseCount$receipt_year
selAdverse = adverseCount$receipt_year == aspirinAdverseCount$receipt_year

# Calculate Numbers
m <- c(aspirinAdverseCount$event_count)
n <- c(aspirinCount[selAspirin,2])
M <- c(adverseCount[selAdverse,2])
N <- c(allDrugCount[selAll,2])

# Calculate proptional reporting ratio
PRR = (m/n)/((M-m)/(N-n))

# Calculate reporting Odds Ratio
d = n-m
D = N-M
ROR = (m/d)/(M/D)

# Plot the PRR and ROR
g_range <- range(0, PRR,ROR)
g_range[2] <- g_range[2] + 3
yearLen = length(aspirinAdverseCount$receipt_year)
ax = aspirinAdverseCount$receipt_year   # x-axis labels: the report years
plot(PRR, type="o", col="blue", ylim=g_range, axes=FALSE, ann=FALSE)
axis(1, 1:yearLen, lab=ax)
axis(2, las=1, at=1*0:g_range[2])
box()
lines(ROR, type="o", pch=22, lty=2, col="red")

As you can see, the PRR and RoR have both remained fairly steady over this time range. With the R Script above, all you need to do is change the adverseEvent variable from GASTROINTESTINAL HAEMORRHAGE to another type of reaction to analyze and compare those trends.

Summary

In this walkthrough:

  • You used a Scala script on EMR to convert the openFDA zip files to gzip.
  • You then transformed the JSON blobs into flattened Parquet files using Spark on EMR.
  • You created an Athena DDL so that you could query these Parquet files residing in S3.
  • Finally, you pointed the R package at the Athena table to analyze the data without pulling it into a database or creating your own servers.

If you have questions or suggestions, please comment below.


Next Steps

Take your skills to the next level. Learn how to optimize Amazon S3 for an architecture commonly used to enable genomic data analysis. Also, be sure to read more about running R on Amazon Athena.

About the Authors

Ryan Hood is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys watching the Cubs win the World Series and attempting to Sous-vide anything he can find in his refrigerator.

 

 

Vikram Anand is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys playing soccer and watching the NFL & European Soccer leagues.

 

 

Dave Rocamora is a Solutions Architect at Amazon Web Services on the Open Data team. Dave is based in Seattle and when he is not opening data, he enjoys biking and drinking coffee outside.

 

 

 

 

Visualize Amazon S3 Analytics Data with Amazon QuickSight

Post Syndicated from Luis Wang original https://aws.amazon.com/blogs/big-data/visualize-amazon-s3-analytics-data-with-amazon-quicksight/

When Amazon S3 analytics was released in November 2016, it gave you the ability to analyze storage access patterns and transition the right data to the right storage class. You could also manually export the data to an S3 bucket to analyze, using the business intelligence tool of your choice, and gather deeper insights on usage and growth patterns. This helped you reduce storage costs while optimizing performance based on usage patterns.

With today’s update, you can quickly and easily gain those deeper insights and benefits by analyzing and visualizing S3 analytics data in Amazon QuickSight. It takes just a single click from the S3 console, without the need for manual exports or additional data preparation.

If you already have S3 analytics storage class analysis enabled for your buckets, choose Explore in QuickSight on the top right.

Users new to Amazon QuickSight can follow the steps to get set up. Existing QuickSight users are automatically deep linked to an analysis of their S3 analytics data.

From there, you can access the pre-built visualizations to understand the storage access pattern of your bucket. For example, you can visualize the amount of data retrieved vs. the amount of storage consumed for objects of different age groups to identify infrequently accessed data. You can also create new visualizations and perform ad hoc analysis to break down the access patterns by the filters that you have defined in S3 analytics.

 

Finally, you can set up a daily scheduled refresh of the storage class analysis data set in Amazon QuickSight to keep it up to date. Publish and share the analysis as a dashboard to other users in your organization to monitor your S3 storage access pattern.

This feature is now available in all QuickSight regions: US East (N. Virginia and Ohio), US West (Oregon), and EU (Ireland).

Learn more

To learn more about these capabilities and start using them in your dashboards, check out the Amazon QuickSight User Guide.

Stay engaged

If you have questions and suggestions, post them on the Amazon QuickSight Discussion Forum.

Not a QuickSight user?

Go to the Amazon QuickSight website to get started for FREE.

 

TV Giant Canal Plus Decides to Stop Paying Artists & Creators, Gets Sued

Post Syndicated from Andy original https://torrentfreak.com/tv-giant-canal-plus-decides-to-stop-paying-artists-creators-gets-sued-170705/

It’s common for anti-piracy groups to accuse torrent, streaming, and other download sites of not paying licensing fees. As a result, dozens have been sued over the years, often with catastrophic results for the platforms involved.

It is extremely rare, however, for a bona fide broadcasting company to simply declare that it won’t be paying artists, authors, and creators for the content they provide. Amazingly, that’s the situation playing out in France with pay TV company Canal Plus.

Owned by giant Vivendi, Canal Plus has decided that its current deals with content providers are unfavorable to the company and wants to renegotiate them. In the meantime, tens of millions of euros in royalties owed to SACEM (Society of Authors, Composers and Publishers of Music), SACD (Society of Dramatic Authors and Composers), SCAM (The Civil Society of Multimedia Authors) and ADAGP (Society of Authors in the Graphic and Plastic Arts) are going unpaid.

The decision has caused outrage among the collecting societies, with SACEM (the group that caused the closure of What.CD and French torrent giant T411) deriding Canal Plus for denying artists what is rightfully theirs.

“We are receiving many calls from panic-stricken authors who are finding themselves without the wages due to them,” a spokesman from SACEM said. “Some of them will find themselves facing serious difficulties and how will they continue to create if they have not been paid?”

Hervé Rony, general manager of The Civil Society of Multimedia Authors (SCAM) directly attacked the TV provider.

“I’ve never seen such brutality. Never has another player in the audiovisual industry deployed such methods,” Rony said.

Even filmmakers are affected by the decision to withhold royalties, with the Association of Authors, Directors, Producers (L’ARP) noting that it was “deeply shocked” at what it describes as an act of “violence.”

Although a broad range of creators is affected, local media reports say that Canal Plus’ decision not to pay copyright fees will hit the music sector first, with today being the day that payments should have been made. As a result, SCAM is warning that it may not be able to meet its obligations for the fiscal year.

Telerama reports that Canal Plus is trying to negotiate an 80% discount worth tens of millions of euros to support its cost-cutting agenda, but those demands are meeting a wall of defiance among the collecting societies.

“We only discuss between people in good faith, when they have already settled what they owe and do not renege on contracts already signed,” Rony said. “Nobody wants the death of Canal Plus. But the prerequisite for any discussion is the resumption of payments.”

In comments made by SACEM yesterday, the copyright group indicated that beyond paying what it owes now, Canal Plus only has two options available, both involving the inside of a courtroom. The first would involve a lawsuit over breach of contract and the second would see it being sued for using copyright works without a license – piracy, effectively.

“In this case, we will seek penalties for infringement. They do not have much latitude,” SACEM said.

Several of the groups owed money by Canal Plus have published statements, with SACD, SACEM, SCAM, and ADAGP indicating they will be joining forces to tackle the broadcaster, who they accuse of undermining the right of creators to get paid.

“SACEM, along with the other authors’ societies, would have liked the constructive dialogue it had conducted over the last few weeks to have resulted in Canal Plus fulfilling its contractual obligations, but failing that, was obliged to take appropriate measures, including Judicial rights, so that the rights of its members are preserved,” SACEM wrote.

With a meeting between those affected scheduled for Friday, the suggestion that legal action is already underway has now been confirmed by Variety. Citing an industry source, the publication says that Canal Plus is being sued in the Paris High Court for around 50 million euros.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Half of All Football Fans Have Watched Illegal Streams

Post Syndicated from Andy original https://torrentfreak.com/half-of-all-football-fans-have-watched-illegal-streams-170704/

Being a fan of top-flight football in the UK is an expensive proposition. In 2016, the average price of a season ticket was just shy of £500 a season while watching on TV can cost more than £60 per month.

Of course, there are good reasons for these high prices. Premier League footballers are notoriously highly-paid and with TV rights recently changing hands for more than £5.3bn, money has to be recouped in the most basic of ways – from the fans’ pocket.

While this is a success up to a point, there’s a growing factor upsetting the money men. The rise of online streaming is a thorn in the side of English Premier League, who are having to deal with large numbers of fans obtaining live matches for free via the Internet. But just how many fans are going down this route?

The results of a new survey carried out by the BBC reveal some shocking but perhaps not entirely unexpected results. Carried out online by ComRes between 7 and 15 March among 1,000 fans, it shows that large numbers of fans prefer the free option.

The headline figure is that 36% of football supporters stream Premier League matches online illegally at least once every month, a figure that reduces to just under a quarter (22%) when the frequency is once a week.

However, when fans were asked whether they had ever watched a match through an unofficial online provider, close to half (47%) said they had done so. That’s certainly a worryingly high number for the Premier League.

And if one removes older fans from the equation, things only get worse.

Almost two-thirds (65%) of younger fans aged 18 to 34 say they illegally stream live football matches online at least once a month. Among older fans aged 34 to 54 the figure improves to 33%, dropping to just 13% for the over 55s.

The top reason fans gave for streaming content illegally, cited by 29%, was that “a friend/family member does it and they just watch.” Whether this is fans simply being coy is unclear, but it does suggest that watching football illegally has become a communal pastime, something which can likely be attributed to the rise of set-top boxes running software like Kodi.

Almost a quarter (24%) believe that TV sports packages do not represent good value for money but the only shock here is that the number isn’t higher. It’s certainly possible that many ‘streaming’ fans would never have paid in the first place, so pricing might be less of a factor for them.

Interestingly, 25% of respondents say they stream matches illegally because the quality is good. That stands out because, while illicit streams are cheap and convenient, quality and reliability aren’t usually their strong points. That being said, the BBC research doesn’t differentiate between free streams and cheap IPTV streams, and the latter can indeed rival an official service.

There are also a few interesting revelations when it comes to fans’ opinions on the legality of illicit streaming.

A small 12% of fans think the practice is legal, almost three times fewer than the number who say it is illegal (34%). Almost a third (32%) don’t know the legal status of streaming from an illicit source.

Following a recent ruling from the European Court of Justice, it is now clear that streaming from an unlicensed source amounts to copyright infringement.

However, enforcing that legislation against people in their own homes would provide similar challenges to prosecuting people who ‘tape’ a friend’s record collection or watch pirate DVDs. It’s just not realistic.

Interestingly, 10% believe it is legal to watch but illegal to upload a stream. That was believed to be the case before the ECJ ruling, but the former has now been clarified.

Uploading streams is very, very much illegal (as is supplying ‘pirate’ boxes) and in the right circumstances could lead to a custodial sentence. However, no regular consumer does this through conventional streaming (through a Kodi-powered device, for example), so it’s a moot point.

A tiny 4% of people believe that unauthorized streaming is not breaking the law but that Sky or BT could still fine them if they found out, which is technically wrong on both counts.

That being said, proving someone watched a stream is extremely difficult and since copyright law in the UK requires that infringers compensate for the losses they’ve caused, any ‘fine’ imposed might only amount to the cost of a match, for example.

Again, the chances of this happening in any way are very unlikely and have certainly never happened to date, even though millions are watching streams via their computers and set-top boxes loaded with Kodi. This is something the Premier League wants to change.

“Fans should know that these pre-loaded boxes enable pirate broadcasts of Premier League football, and other popular content, and are illegal. People who supply them have been jailed or ordered to pay significant financial penalties,” a spokesman told the BBC.

“We are increasingly seeing prominent apps and add-ons being closed down as the law catches up with them, leading to consumers being out of pocket.

“The Premier League will continue to protect its copyright, and the legitimate investment made by its broadcasting partners. Their contribution allows our clubs to develop and acquire players, invest in facilities and support the wider football pyramid and communities – all things that fans enjoy and society benefits from.”

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

The Terrible Horrors of ‘Kodi Boxes’ Shock The UK

Post Syndicated from Andy original https://torrentfreak.com/the-terrible-horrors-of-kodi-boxes-shock-the-uk-170702/

In the beginning, we were told that Kodi Boxes are probably going to destroy Hollywood, not to mention companies like Sky and The Premier League. But who cares about the big people in suits drinking champagne from gold swimming pools?

No, what the unwashed masses need to hear are stories that make us realize that these little plastic wonder boxes are going to ruin our miserable lives. Luckily, they’ve been appearing thick and fast this past couple of weeks.

It turns out that Kodi Boxes are not only likely to burn your house down, but they’re also part of a master plan to pick away at the delicate threads holding family life together.

Forget about the piracy, that doesn’t matter. The powers that be need you to understand that Kodi Boxes are Trojan horses of misery that people are willingly bringing into their own homes. Can you believe people are being so stupid?

According to an article in this week’s The Mirror, for example, kids’ movies spewed out by these evil devices are now being interrupted by adverts for alcohol. Well, it makes a change from seeing Phil Mitchell smashed out of his mind at 8pm on BBC1, doesn’t it?

At the same time, Kodi Boxes are straining relationships between father and son, not to mention subjecting unsuspecting parents to malware threats. They include scams purporting to be from the ‘FBI’ which demand money for using Popcorn Time inside Kodi. The world truly has gone mad.

Of course, if only one person sees this nonsense it’s too much, and The Mirror piece is quite rightly filled with quotes from real people who gave up piracy as a result of their bad experiences. It also has plenty of useful advice from the UK’s leading anti-piracy outfit, as you’d expect.

Intrigued, we decided to carry out our own research among a handful of the millions of maniacs who are still prepared to plug one of these death devices into their UK mains supply. And we were shocked – not by a dodgy power adaptor from China – but by the huge numbers of other problems these Kodi Boxes can foist upon the honest working man.

A user called Neil told us that he’d bought a Kodi Box off eBay after hearing all the hype in the media. His plan was to watch Premier League football without paying a penny. However, instead of scooping up that forbidden 3pm kick-off excitement, all it did was ruin his enjoyment of the beautiful game.

“I’d been out drinking all day with the lads. I was proper, proper smashed. I got home and shoved the thing into the nearest telly to watch Liverpool versus Manchester United and although I felt really sick, couldn’t focus on the screen, and soon fell unconscious, I think the picture wasn’t too bad,” he said.

“I don’t think I saw that wheel thing spinning in the middle of the screen and everything stopping either, which is a big plus for me on a free box. And to top it all, Liverpool beat United 2:1, which was a real bonus.

“However, when discussing the game the next day with my dad who watched the game on Sky with a proper subscription, I was horrified to learn that Manchester United actually won the game 3:0 – against Arsenal! It just goes to show, you get what you pay for. My box is now where it should have been all along – in the bin.”

A man called Rich told us that he’d also heard good things about Kodi Boxes but was really upset after being completely misled by the person who sold him one.

“I used to be a subscriber to Sky’s top package, including those fifty channels nobody watches but they force you to have. I also forked out for all their boxing PPVs that come on at stupid o’clock in the morning, and bought several blu-ray discs each time I got paid. All in all I must’ve spent £140 a month.

“So, when a bloke down the pub who I’ve never met before told me that I could legally get the same stuff for free using a Kodi Box, I immediately believed him. I mean, what reasonable bloke wouldn’t? He had just one left as well, how lucky was that?”

But it didn’t take long for Rich’s enthusiasm to wane. The thought of owning a potential incendiary device filled with content provided by a Russian crime syndicate and funded by Colombian drug barons was too much.

“I watched a couple of films on it without my house burning down, but then I started reading horror stories in the paper about these boxes shoving drinks adverts in our kids’ faces,” he told us.

“Enough was enough. After being lied to by the seller the thought of my kids demanding toys and beer for Christmas was just too much, it just wasn’t worth the risk. So I went straight back to giving Sky over a grand a year and life’s never been better.”

Kodi Box user Peter told us that he could really relate to warnings published in the papers this week that set-top box users had been hit with popups demanding their bank details.

“I was hoping to watch the big fight last weekend but it only came on for a few minutes and then suddenly went off,” he explained. “Then a notice appeared telling me to ring a number with my credit card details. Well, I’d heard about these ransomware attacks and I wasn’t going to fall for that old trick.

“However, imagine my surprise when I realized that I’d accidentally put on my official satellite box instead of Kodi, and the message was actually from my pay-per-view provider. Just goes to show, everybody wants your money these days, and these crooks can rope you in for years, and make it really hard to cancel.”

Another chap called James told us that he never considered getting a Kodi Box until he saw an article in a UK tabloid explaining how Kodi Boxes pose a risk for families with children.

“The article quoted some anti-piracy company. They said that parents don’t realize that Kodi Boxes allow easy access to hardcore pornography. And it’s true, I had no idea,” James said.

“But I live alone, so I wasted no time buying one off eBay. I’m watching it in the shed with a fire extinguisher in the other hand, just to be safe.”

But while James clearly has his hands full, our last user is much less satisfied.

Sue told us that she was assured her Kodi box was a miracle device with endless uses. However, after its addons recently stopped working she decided to test the claim by sliding the failing unit under the leg of a wobbly table. It soon became clear the hardware had been massively oversold.

“They say these boxes can do anything but mine clearly wasn’t fit for purpose. It was way too thick so when I put it under the leg, the table sat at a really steep angle. If anything, it was more unstable than it was before.

“I dread to think what could’ve happened if I’d put a pot of boiling oil on it next to the baby. No wonder health and safety are up in arms.”

Tune in next week when we reveal how Kodi Boxes can cause unsightly hair growth and unwanted pregnancies.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

NonPetya: no evidence it was a "smokescreen"

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/06/nonpetya-no-evidence-it-was-smokescreen.html

Many well-regarded experts claim that the not-Petya ransomware wasn’t “ransomware” at all, but a “wiper” whose goal was to destroy files, without any intent at letting victims recover their files. I want to point out that there is no real evidence of this.

Certainly, things look suspicious. For one thing, it certainly targeted the Ukraine. For another, it made several mistakes that prevent the attackers from ever decrypting victims’ drives. Their email account was shut down, and the malware corrupts the boot sector.

But these things aren’t evidence, they are problems. They are things needing explanation, not things that support our preferred conspiracy theory.

The simplest, Occam’s Razor explanation is that these were simple mistakes. Such mistakes are common among ransomware. We think of virus writers as professional software developers who thoroughly test their code. Decades of evidence show the opposite: such software is of poor quality, with shockingly bad bugs.

It’s true that, effectively, nPetya is a wiper. Matthieu Suiche does a great job describing one flaw that prevents it from working. @hasherezade does a great job explaining another flaw. But the best explanation isn’t that this is intentional. Even if these bugs didn’t exist, it’d still be a wiper if the perpetrators simply ignored the decryption requests. They need not intentionally make the decryption fail.

Thus, the simpler explanation is that it’s simply a bug. Ransomware authors test the bits they care about, and test less well the bits they don’t. It’s quite plausible to believe that just before shipping the code, they’d add a few extra features, and forget to regression test the entire suite. I mean, I do that all the time with my code.

Some have pointed to the sophistication of the code as proof that such simple errors are unlikely. This isn’t true. While it’s more sophisticated than WannaCry, it’s about average for the current state-of-the-art for ransomware in general. What people think of as sophisticated, such as the Petya base or using PsExec to spread throughout a Windows domain, is already at least a year old.

Indeed, the use of PsExec itself is a bit clumsy, when the code for doing the same thing is already public. It’s just a few calls to basic Windows networking APIs. A sophisticated virus would do this itself, rather than clumsily use PsExec.

Infamy doesn’t mean skill. People keep making the mistake that the more widespread something is in the news, the more skill, and the more of a “conspiracy”, there must be behind it. This is not true. Virus/worm writers often do newsworthy things by accident. Indeed, the history of worms, starting with the Morris Worm, has been one of things running further out of control than their authors expected.

What makes nPetya newsworthy isn’t the EternalBlue exploit or the wiper feature. Instead, the creators got lucky with MeDoc. The software is used by every major organization in the Ukraine, and at the same time, their website was horribly insecure — laughably insecure. Furthermore, its auto-update feature didn’t check cryptographic signatures. No hacker can plan for this level of widespread incompetence — it’s just extreme luck.

Thus, the effect of bumbling around is something that hit the Ukraine pretty hard, but it’s not necessarily the intent of the creators. It’s like how the Slammer worm hit South Korea pretty hard, or how the Witty worm hit the DoD pretty hard. These things look “targeted”, especially to the victims, but it was by pure chance (provably so, in the case of Witty).

Certainly, MeDoc was targeted. But then, targeting a single organization is the norm for ransomware. They have to do it that way, giving each target a different Bitcoin address for payment. That it then spread to the entire Ukraine, and further, is the sort of thing that typically surprises worm writers.

Finally, there’s little reason to believe that there needs to be a “smokescreen”. Russian hackers are targeting the Ukraine all the time. Whether Russian hackers are to blame for “ransomware” vs. “wiper” makes little difference.

Conclusion

We know that Russian hackers are constantly targeting the Ukraine. Therefore, the theory that this was nPetya’s goal all along, to destroy Ukraine’s computers, is a good one.

Yet, there’s no actual “evidence” of this. nPetya’s issues are just as easily explained by normal software bugs. The smokescreen isn’t needed. The boot record bug isn’t needed. The single email address that was shut down isn’t significant, since half of all ransomware uses the same technique.

The experts who disagree with me are really smart/experienced people who you should generally trust. It’s just that I can’t see their evidence.

Update: I wrote another blogpost about “survivorship bias”, refuting the claim by many experts talking about the sophistication of the spreading feature.


Update: a comment asks “why is there no Internet spreading code?”. The answer is “I don’t know”, but unanswerable questions aren’t evidence of a conspiracy. “Why aren’t there any stars in the background?” isn’t proof the moon landings are fake, just because you can’t answer the question. One guess is that you never want ransomware to spread that far, until you’ve figured out how to get payment from so many people.

Traveling “Kodi Repair Men” Are Apparently a Thing Now

Post Syndicated from Andy original https://torrentfreak.com/traveling-kodi-repair-men-are-apparently-a-thing-now-170625/

Earlier this month, third-party Kodi add-on ZemTV and the TVAddons library were sued in a federal court in Texas.

The complaint, filed by American satellite and broadcast provider Dish Network, accused the pair of copyright infringement and demanded $150,000 for each offense.

With that case continuing, there has been significant fallout. Not only has the TVAddons repository disappeared but addon developers have been falling like dominos.

Of course, there are large numbers of people out there who are able to acquire and install new addons to restore performance to their faltering setups. These enthusiasts can weather the storms, with most understanding that such setbacks are all part of the piracy experience.

However, unlike most other types of Internet piracy, the world of augmented Kodi setups has a somewhat unusual characteristic.

Although numbers are impossible to come by, it’s likely that the majority of users have no idea how the software in their ‘pirate’ box actually works. This is because, through convenience or lack of knowledge, they bought their device already set up. So what can these people do?

Well, for some it’s a case of trawling the Internet for help and advice to learn how to reprogram the hardware themselves. It may take time, but those with the patience will be glad they did since it will help them deal with similar problems in the future.

For others, it’s taking the misguided route of trying to get the entirely legal (and probably sick-to-the-teeth) official Kodi team to solve their problems on Twitter. Pro tip: Don’t bother, they’re not interested.

Kodi.tv are not interested in piracy problems

It’s likely that the remainder will take their device back to where they bought it, complain like crazy, and then get things fixed for a small fee. But for those running out of options, never fear – there’s another innovative solution available.

In a local pub this week I overheard a discussion about “everybody’s Kodi going off” which wasn’t a big shock given recent developments. However, what did surprise me was the revelation that a local guy is now touring pubs in the area doing on-site “Kodi repairs.”

To put things back in working order using a laptop, he’s charging $25/£20/€23 or, for those with an Amazon Firestick, a $50/£40 trade-in for a new, fully-loaded stick. Apparently, the whole thing takes about 15 to 20 minutes and is conveniently carried out while having a drink. While obviously illegal, it’s amazing how quickly opportunists step in to make a few bucks.

That being said, the notion of ‘Kodi repair men’ appearing in the flesh is perhaps not such a surprise after all. Countless millions of these devices have been sold, and they invariably go wrong when pirate sources have issues. In reality, it would be more of a surprise if repairers didn’t exist because there’s clearly a lot of demand.

But exist they do and some are even doing home visits. One, who offers to assist people “for a small call out charge” via his Facebook page, has been receiving glowing reviews, like the one shown below.

Thanks for the help KodiMan

In many cases, these “repair men” are actually the same people selling the pre-configured boxes in the first place. Like pirate DVD sellers, PlayStation modders, and similar characters before them, they’re heroes to many people, particularly those in cash-deprived areas. They’re seen as Robin Hoods who can cut subscription TV prices by 95% and ensure sporting events keep flowing for next to nothing.

What remains to be seen though is how busy these people will be in the future. When people’s devices stop working there’s obviously a lot of bad feeling, so paying each time for “repairs” could eventually become tiresome. That’s certainly what copyright holders are hoping for, so expect further action against more addon providers in the future.

But in the meantime and despite the trouble, ‘pirate’ Kodi devices are still selling like hot cakes. Despite suggestions to the contrary, they’re easily purchased from sites like eBay, and plenty of local publications are carrying ads. But for those prepared to do the work themselves, everything is a lot cheaper and easier to fix when it goes wrong.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

How to Create an AMI Builder with AWS CodeBuild and HashiCorp Packer – Part 2

Post Syndicated from Heitor Lessa original https://aws.amazon.com/blogs/devops/how-to-create-an-ami-builder-with-aws-codebuild-and-hashicorp-packer-part-2/

Written by AWS Solutions Architects Jason Barto and Heitor Lessa

 
In Part 1 of this post, we described how AWS CodeBuild, AWS CodeCommit, and HashiCorp Packer can be used to build an Amazon Machine Image (AMI) from the latest version of Amazon Linux. In this post, we show how to use AWS CodePipeline, AWS CloudFormation, and Amazon CloudWatch Events to continuously ship new AMIs. We use Ansible by Red Hat to harden the OS on the AMIs through a well-known set of security controls outlined by the Center for Internet Security in its CIS Amazon Linux Benchmark.

You’ll find the source code for this post in our GitHub repo.

At the end of this post, we will have the following architecture:

Requirements

 
To follow along, you will need Git and a text editor. Make sure Git is configured to work with AWS CodeCommit, as described in Part 1.

Technologies

 
In addition to the services and products used in Part 1 of this post, we also use these AWS services and third-party software:

AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion.

Amazon CloudWatch Events enables you to react selectively to events in the cloud and in your applications. Specifically, you can create CloudWatch Events rules that match event patterns, and take actions in response to those patterns.

AWS CodePipeline is a continuous integration and continuous delivery service for fast and reliable application and infrastructure updates. AWS CodePipeline builds, tests, and deploys your code every time there is a code change, based on release process models you define.

Amazon SNS is a fast, flexible, fully managed push notification service that lets you send individual messages or fan out messages to large numbers of recipients. Amazon SNS makes it simple and cost-effective to send push notifications to mobile device users or email recipients. The service can even send messages to other distributed services.

Ansible is a simple IT automation system that handles configuration management, application deployment, cloud provisioning, ad-hoc task-execution, and multinode orchestration.

Getting Started

 
We use CloudFormation to bootstrap the following infrastructure:

  • AWS CodeCommit repository – Git repository where the AMI builder code is stored.
  • S3 bucket – Build artifact repository used by AWS CodePipeline and AWS CodeBuild.
  • AWS CodeBuild project – Executes the AWS CodeBuild instructions contained in the build specification file.
  • AWS CodePipeline pipeline – Orchestrates the AMI build process, triggered by new changes in the AWS CodeCommit repository.
  • SNS topic – Notifies subscribed email addresses when an AMI build is complete.
  • CloudWatch Events rule – Defines how the AMI builder should send a custom event to notify an SNS topic.

AMI Builder launch templates are provided for the following regions:

  • N. Virginia (us-east-1)
  • Ireland (eu-west-1)

After launching the CloudFormation template linked here, we will have a pipeline in the AWS CodePipeline console. (A Failed status at this stage simply means we don’t have any data in our newly created AWS CodeCommit Git repository yet.)

Next, we will clone the newly created AWS CodeCommit repository.

If this is your first time connecting to an AWS CodeCommit repository, please see the instructions in our documentation on Setup steps for HTTPS Connections to AWS CodeCommit Repositories.

To clone the AWS CodeCommit repository (console)

  1. From the AWS Management Console, open the AWS CloudFormation console.
  2. Choose the AMI-Builder-Blogpost stack, and then choose Outputs.
  3. Make a note of the Git repository URL.
  4. Use git to clone the repository.

For example: git clone https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/AMI-Builder_repo

To clone the AWS CodeCommit repository (CLI)

# Retrieve CodeCommit repo URL
git_repo=$(aws cloudformation describe-stacks --query 'Stacks[0].Outputs[?OutputKey==`GitRepository`].OutputValue' --output text --stack-name "AMI-Builder-Blogpost")

# Clone repository locally
git clone ${git_repo}

Bootstrap the Repo with the AMI Builder Structure

 
Now that our infrastructure is ready, download all the files and templates required to build the AMI.

Your local Git repo should have the following structure:

.
├── ami_builder_event.json
├── ansible
├── buildspec.yml
├── cloudformation
├── packer_cis.json

Next, push these changes to AWS CodeCommit, and then let AWS CodePipeline orchestrate the creation of the AMI:

git add .
git commit -m "My first AMI"
git push origin master

AWS CodeBuild Implementation Details

 
While we wait for the AMI to be created, let’s see what’s changed in our AWS CodeBuild buildspec.yml file:

...
phases:
  ...
  build:
    commands:
      ...
      - ./packer build -color=false packer_cis.json | tee build.log
  post_build:
    commands:
      - egrep "${AWS_REGION}\:\sami\-" build.log | cut -d' ' -f2 > ami_id.txt
      # Packer doesn't return non-zero status; we must do that if Packer build failed
      - test -s ami_id.txt || exit 1
      - sed -i.bak "s/<<AMI-ID>>/$(cat ami_id.txt)/g" ami_builder_event.json
      - aws events put-events --entries file://ami_builder_event.json
      ...
artifacts:
  files:
    - ami_builder_event.json
    - build.log
  discard-paths: yes

In the build phase, we capture Packer output into a file named build.log. In the post_build phase, we take the following actions:

  1. Look up the AMI ID created by Packer and save its findings to a temporary file (ami_id.txt).
  2. Force AWS CodeBuild to fail if the AMI ID (ami_id.txt) is not found. This is required because Packer doesn’t fail if something goes wrong during the AMI creation process, so we have to tell AWS CodeBuild to stop by signaling that an error occurred.
  3. If an AMI ID is found, we update the ami_builder_event.json file and then notify CloudWatch Events that the AMI creation process is complete.
  4. CloudWatch Events publishes a message to an SNS topic. Anyone subscribed to the topic will be notified in email that an AMI has been created.

Lastly, the new artifacts phase instructs AWS CodeBuild to upload files built during the build process (ami_builder_event.json and build.log) to the S3 bucket specified in the Outputs section of the CloudFormation template. These artifacts can then be used as an input artifact in any later stage in AWS CodePipeline.

For information about customizing the artifacts sequence of the buildspec.yml, see the Build Specification Reference for AWS CodeBuild.
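
For reference, the ami_builder_event.json file checked into the repository would look something like the following before the sed substitution in post_build fills in the real AMI ID. This is a sketch inferred from the custom event shown in the next section, not the literal file from the repo:

[
  {
    "Source": "com.ami.builder",
    "DetailType": "AmiBuilder",
    "Detail": "{ \"AmiStatus\": \"Created\"}",
    "Resources": [ "<<AMI-ID>>" ]
  }
]

The file follows the entry format expected by aws events put-events, so once the placeholder is replaced, the command in the post_build phase can submit it directly.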

CloudWatch Events Implementation Details

 
CloudWatch Events allows you to extend the AMI builder so that it not only sends email after the AMI has been created, but can also hook up any of the supported targets to react to the AMI builder event. Publishing an event in this way means the actions you take after AMI completion are decoupled from Packer, and you can plug in other actions as you see fit.

For more information about targets in CloudWatch Events, see the CloudWatch Events API Reference.

In this case, CloudWatch Events should receive the following event, match it with a rule we created through CloudFormation, and publish a message to SNS so that you can receive an email.

Example CloudWatch custom event

[
        {
            "Source": "com.ami.builder",
            "DetailType": "AmiBuilder",
            "Detail": "{ \"AmiStatus\": \"Created\"}",
            "Resources": [ "ami-12cd5guf" ]
        }
]

Cloudwatch Events rule

{
  "detail-type": [
    "AmiBuilder"
  ],
  "source": [
    "com.ami.builder"
  ],
  "detail": {
    "AmiStatus": [
      "Created"
    ]
  }
}
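
For readers who want to see how such a rule could be wired up, here is a minimal CloudFormation sketch, assuming an SNS topic named AmiBuilderNotificationTopic is defined elsewhere in the same template; the actual template in the GitHub repo may name and structure things differently:

  # Sketch only: matches the custom event above and forwards it to SNS.
  AmiBuilderEventRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Notify an SNS topic when the AMI builder reports a completed AMI
      EventPattern:
        source:
          - com.ami.builder
        detail-type:
          - AmiBuilder
        detail:
          AmiStatus:
            - Created
      State: ENABLED
      Targets:
        - Id: NotifySNSTopic
          Arn: !Ref AmiBuilderNotificationTopic   # assumed topic name; !Ref returns the topic ARN

Note that the SNS topic also needs a topic policy that allows events.amazonaws.com to publish to it, which CloudFormation can express with an AWS::SNS::TopicPolicy resource.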

Example SNS message sent in email

{
    "version": "0",
    "id": "f8bdede0-b9d7...",
    "detail-type": "AmiBuilder",
    "source": "com.ami.builder",
    "account": "<<aws_account_number>>",
    "time": "2017-04-28T17:56:40Z",
    "region": "eu-west-1",
    "resources": ["ami-112cd5guf "],
    "detail": {
        "AmiStatus": "Created"
    }
}

Packer Implementation Details

 
In addition to the build specification file, there are differences between the current version of the HashiCorp Packer template (packer_cis.json) and the one used in Part 1.

Variables

  "variables": {
    "vpc": "{{env `BUILD_VPC_ID`}}",
    "subnet": "{{env `BUILD_SUBNET_ID`}}",
         “ami_name”: “Prod-CIS-Latest-AMZN-{{isotime \”02-Jan-06 03_04_05\”}}”
  },
  • ami_name: Prefixes a name used by Packer to tag resources during the Builders sequence.
  • vpc and subnet: Environment variables defined by the CloudFormation stack parameters.

We no longer assume a default VPC is present and instead use the VPC and subnet specified in the CloudFormation parameters. CloudFormation configures the AWS CodeBuild project to use these values as environment variables. They are made available throughout the build process.

That allows for more flexibility should you need to change which VPC and subnet will be used by Packer to launch temporary resources.
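
As a rough illustration, the relevant part of the AWS CodeBuild project in the CloudFormation template could look like the snippet below; the image name and parameter names are assumptions, not the exact values used in the repo:

  # Sketch only: passes the stack's VPC/subnet parameters to the build as env vars.
  Environment:
    Type: LINUX_CONTAINER
    ComputeType: BUILD_GENERAL1_SMALL
    Image: aws/codebuild/ubuntu-base:14.04        # placeholder image
    EnvironmentVariables:
      - Name: BUILD_VPC_ID
        Value: !Ref BuildVPC                      # assumed parameter name
      - Name: BUILD_SUBNET_ID
        Value: !Ref BuildSubnet                   # assumed parameter name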

Builders

  "builders": [{
    ...
    "ami_name": “{{user `ami_name`| clean_ami_name}}”,
    "tags": {
      "Name": “{{user `ami_name`}}”,
    },
    "run_tags": {
      "Name": “{{user `ami_name`}}",
    },
    "run_volume_tags": {
      "Name": “{{user `ami_name`}}",
    },
    "snapshot_tags": {
      "Name": “{{user `ami_name`}}",
    },
    ...
    "vpc_id": "{{user `vpc` }}",
    "subnet_id": "{{user `subnet` }}"
  }],

We now have new tag properties (*_tags) and a new function (clean_ami_name), and we launch temporary resources in the VPC and subnet specified in the environment variables. AMI names can only contain a certain set of ASCII characters. If the input to the project deviates from the expected characters (for example, includes whitespace or slashes), Packer’s clean_ami_name function will fix it.

For more information, see functions on the HashiCorp Packer website.

Provisioners

  "provisioners": [
    {
        "type": "shell",
        "inline": [
            "sudo pip install ansible"
        ]
    }, 
    {
        "type": "ansible-local",
        "playbook_file": "ansible/playbook.yaml",
        "role_paths": [
            "ansible/roles/common"
        ],
        "playbook_dir": "ansible",
        "galaxy_file": "ansible/requirements.yaml"
    },
    {
      "type": "shell",
      "inline": [
        "rm .ssh/authorized_keys ; sudo rm /root/.ssh/authorized_keys"
      ]
    }
  ]

We used the shell provisioner to apply OS patches in Part 1. Now, we use shell to install Ansible on the target machine and ansible-local to import, install, and execute Ansible roles that make our target machine conform to our standards.

Packer then uses a final shell provisioner to remove the temporary SSH keys before it creates an AMI from the temporary EC2 instance.

Ansible Implementation Details

 
Ansible provides OS patching through a custom Common role that can be easily customized for other tasks.

The CIS Benchmark and CloudWatch Logs are implemented through two third-party Ansible roles that are defined in ansible/requirements.yaml, as seen in the Packer template.

The Ansible provisioner uses Ansible Galaxy to download these roles onto the target machine and execute them as instructed by ansible/playbook.yaml.

For information about how these components are organized, see the Playbook Roles and Include Statements in the Ansible documentation.
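
Although the exact contents live in the GitHub repo, a minimal ansible/requirements.yaml for the two roles referenced in the playbook below would look roughly like this (versions omitted; pinning them is an option):

# Sketch of ansible/requirements.yaml -- role names taken from the playbook's roles list.
- src: anthcourtney.cis-amazon-linux
- src: dharrisio.aws-cloudwatch-logs-agent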

The following Ansible playbook (ansible/playbook.yaml) controls the execution order and custom properties:

---
- hosts: localhost
  connection: local
  gather_facts: true    # gather OS info that is made available for tasks/roles
  become: yes           # majority of CIS tasks require root
  vars:
    # CIS Controls whitepaper:  http://bit.ly/2mGAmUc
    # AWS CIS Whitepaper:       http://bit.ly/2m2Ovrh
    cis_level_1_exclusions:
    # 3.4.2 and 3.4.3 effectively blocks access to all ports to the machine
    ## This can break automation; ignoring it as there are stronger mechanisms than that
      - 3.4.2 
      - 3.4.3
    # CloudWatch Logs will be used instead of Rsyslog/Syslog-ng
    ## Same would be true if any other software doesn't support Rsyslog/Syslog-ng mechanisms
      - 4.2.1.4
      - 4.2.2.4
      - 4.2.2.5
    # Autofs is not installed in newer versions, let's ignore
      - 1.1.19
    # Cloudwatch Logs role configuration
    logs:
      - file: /var/log/messages
        group_name: "system_logs"
  roles:
    - common
    - anthcourtney.cis-amazon-linux
    - dharrisio.aws-cloudwatch-logs-agent

Both third-party Ansible roles can be easily configured through variables (vars). We use Ansible playbook variables to exclude CIS controls that don’t apply to our case and to instruct the CloudWatch Logs agent to stream the /var/log/messages log file to CloudWatch Logs.

If you need to add more OS or application logs, you can easily duplicate the playbook and make changes. The CloudWatch Logs agent will ship configured log messages to CloudWatch Logs.
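
For example, shipping an additional log file is just another entry under the playbook’s logs variable. The extra file and group name below are illustrative choices, not part of the original playbook:

    # Sketch: add a second entry under "logs" in ansible/playbook.yaml
    logs:
      - file: /var/log/messages
        group_name: "system_logs"
      - file: /var/log/secure          # hypothetical extra log
        group_name: "auth_logs"        # hypothetical log group name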

For more information about parameters you can use to further customize the third-party roles, download the Ansible roles for the CloudWatch Logs Agent and CIS Amazon Linux from the Galaxy website.

Committing Changes

 
Now that Ansible and CloudWatch Events are configured as part of the build process, committing any changes to the AWS CodeCommit Git repository will trigger a new AMI build process that can be followed through the AWS CodePipeline console.
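
For example, a typical iteration might look like this; the pipeline name is an assumption based on the stack name, so check the AWS CodePipeline console or the CloudFormation outputs for the real value:

# Commit a change (for example, a playbook tweak) to kick off a new AMI build
git add ansible/playbook.yaml
git commit -m "Adjust CIS exclusions"
git push origin master

# Optionally follow the run from the CLI instead of the console
aws codepipeline get-pipeline-state --name AMI-Builder-Blogpost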

When the build is complete, an email will be sent to the email address you provided as a part of the CloudFormation stack deployment. The email serves as notification that an AMI has been built and is ready for use.

Summary

 
We used AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, Packer, and Ansible to build a pipeline that continuously builds new, hardened CIS AMIs. We used Amazon SNS so that email addresses subscribed to an SNS topic are notified upon completion of the AMI build.

By treating our AMI creation process as code, we can iterate and track changes over time. In this way, it’s no different from a software development workflow. With that in mind, software patches, OS configuration, and logs that need to be shipped to a central location are only a git commit away.

Next Steps

 
Here are some ideas to extend this AMI builder:

  • Hook up a Lambda function in CloudWatch Events to update EC2 Auto Scaling configuration upon completion of the AMI build.
  • Use AWS CodePipeline parallel steps to build multiple Packer images.
  • Add a commit ID as a tag for the AMI you created.
  • Create a scheduled Lambda function through CloudWatch Events to clean up old AMIs based on timestamp (name or additional tag).
  • Implement Windows support for the AMI builder.
  • Create a cross-account or cross-region AMI build.

CloudWatch Events allows the AMI builder to decouple AMI configuration from AMI creation, so that you can easily add your own logic using targets (AWS Lambda, Amazon SQS, Amazon SNS) to add events or recycle EC2 instances with the new AMI.
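
As a quick sketch, an extra target can be attached to the existing rule from the CLI; the rule name and Lambda ARN below are placeholders, not values from this post’s template:

# Sketch: add a Lambda function as a second target of the AMI builder rule
aws events put-targets \
  --rule AmiBuilderEventRule \
  --targets "Id"="UpdateAutoScaling","Arn"="arn:aws:lambda:eu-west-1:111122223333:function:update-asg"

The function would also need a resource-based permission (added with aws lambda add-permission) that allows events.amazonaws.com to invoke it.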

If you have questions or other feedback, feel free to leave it in the comments or contribute to the AMI Builder repo on GitHub.