Tag Archives: request

Treasure Trove of AACS 2.0 UHD Blu-Ray Keys Leak Online

Post Syndicated from Ernesto original https://torrentfreak.com/treasure-trove-of-aacs-2-0-uhd-blu-ray-keys-leak-online-171211/

Nowadays, movie buffs and videophiles find it hard to imagine a good viewing experience without UHD content, but disc rippers and pirates have remained on the sidelines for a long time.

Protected with strong AACS 2.0 encryption, UHD Blu-ray discs have long been one of the last bastions movie pirates had yet to breach.

This year there have been some major developments on this front, as full copies of UHD discs started to leak online. While it remained unclear how these were ripped, it was a definite milestone.

Just a few months ago another breakthrough came when a Russian company released a Windows tool called DeUHD that could rip UHD Blu-ray discs. Again, the method for obtaining the keys was not revealed.

Now there’s another setback for AACS LA, the licensing outfit founded by Warner Bros, Disney, Microsoft, Intel, and others. On various platforms around the Internet, copies of 72 AACS 2.0 keys are being shared.

The first mention we can find was posted a few days ago in a ten-year-old forum thread in the Doom9 forums. Since then it has been replicated a few times, without much fanfare.

The keys

The keys in question are confirmed to work and allow people to rip UHD Blu-ray discs of movies with freely available software such as MakeMKV. They are also different from the DeUHD list, so there are more people who know how to get them.

The full list of leaked keys includes movies such as Deadpool, Hancock, Passengers, Star Trek: Into Darkness, and The Martian. Some movies have multiple keys, likely as a result of different disc releases.

The leaked keys are also relevant for another reason. Ten years ago, a hacker leaked the AACS cryptographic key “09 F9” online which prompted the MPAA and AACS LA to issue DMCA takedown requests to sites where it surfaced.

This escalated into a censorship debate when Digg started removing articles that referenced the leak, triggering a massive backlash.

Thus fas the response to the AACS 2.0 leaks has been pretty tame, but it’s still early days. A user who posted the leaked keys on MyCe has already removed them due to possible copyright problems, so it’s definitely still a touchy subject.

The question that remains now is how the hacker managed to secure the keys, and if AACS 2.0 has been permanently breached.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Managing AWS Lambda Function Concurrency

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/managing-aws-lambda-function-concurrency/

One of the key benefits of serverless applications is the ease in which they can scale to meet traffic demands or requests, with little to no need for capacity planning. In AWS Lambda, which is the core of the serverless platform at AWS, the unit of scale is a concurrent execution. This refers to the number of executions of your function code that are happening at any given time.

Thinking about concurrent executions as a unit of scale is a fairly unique concept. In this post, I dive deeper into this and talk about how you can make use of per function concurrency limits in Lambda.

Understanding concurrency in Lambda

Instead of diving right into the guts of how Lambda works, here’s an appetizing analogy: a magical pizza.
Yes, a magical pizza!

This magical pizza has some unique properties:

  • It has a fixed maximum number of slices, such as 8.
  • Slices automatically re-appear after they are consumed.
  • When you take a slice from the pizza, it does not re-appear until it has been completely consumed.
  • One person can take multiple slices at a time.
  • You can easily ask to have the number of slices increased, but they remain fixed at any point in time otherwise.

Now that the magical pizza’s properties are defined, here’s a hypothetical situation of some friends sharing this pizza.

Shawn, Kate, Daniela, Chuck, Ian and Avleen get together every Friday to share a pizza and catch up on their week. As there is just six of them, they can easily all enjoy a slice of pizza at a time. As they finish each slice, it re-appears in the pizza pan and they can take another slice again. Given the magical properties of their pizza, they can continue to eat all they want, but with two very important constraints:

  • If any of them take too many slices at once, the others may not get as much as they want.
  • If they take too many slices, they might also eat too much and get sick.

One particular week, some of the friends are hungrier than the rest, taking two slices at a time instead of just one. If more than two of them try to take two pieces at a time, this can cause contention for pizza slices. Some of them would wait hungry for the slices to re-appear. They could ask for a pizza with more slices, but then run the same risk again later if more hungry friends join than planned for.

What can they do?

If the friends agreed to accept a limit for the maximum number of slices they each eat concurrently, both of these issues are avoided. Some could have a maximum of 2 of the 8 slices, or other concurrency limits that were more or less. Just so long as they kept it at or under eight total slices to be eaten at one time. This would keep any from going hungry or eating too much. The six friends can happily enjoy their magical pizza without worry!

Concurrency in Lambda

Concurrency in Lambda actually works similarly to the magical pizza model. Each AWS Account has an overall AccountLimit value that is fixed at any point in time, but can be easily increased as needed, just like the count of slices in the pizza. As of May 2017, the default limit is 1000 “slices” of concurrency per AWS Region.

Also like the magical pizza, each concurrency “slice” can only be consumed individually one at a time. After consumption, it becomes available to be consumed again. Services invoking Lambda functions can consume multiple slices of concurrency at the same time, just like the group of friends can take multiple slices of the pizza.

Let’s take our example of the six friends and bring it back to AWS services that commonly invoke Lambda:

  • Amazon S3
  • Amazon Kinesis
  • Amazon DynamoDB
  • Amazon Cognito

In a single account with the default concurrency limit of 1000 concurrent executions, any of these four services could invoke enough functions to consume the entire limit or some part of it. Just like with the pizza example, there is the possibility for two issues to pop up:

  • One or more of these services could invoke enough functions to consume a majority of the available concurrency capacity. This could cause others to be starved for it, causing failed invocations.
  • A service could consume too much concurrent capacity and cause a downstream service or database to be overwhelmed, which could cause failed executions.

For Lambda functions that are launched in a VPC, you have the potential to consume the available IP addresses in a subnet or the maximum number of elastic network interfaces to which your account has access. For more information, see Configuring a Lambda Function to Access Resources in an Amazon VPC. For information about elastic network interface limits, see Network Interfaces section in the Amazon VPC Limits topic.

One way to solve both of these problems is applying a concurrency limit to the Lambda functions in an account.

Configuring per function concurrency limits

You can now set a concurrency limit on individual Lambda functions in an account. The concurrency limit that you set reserves a portion of your account level concurrency for a given function. All of your functions’ concurrent executions count against this account-level limit by default.

If you set a concurrency limit for a specific function, then that function’s concurrency limit allocation is deducted from the shared pool and assigned to that specific function. AWS also reserves 100 units of concurrency for all functions that don’t have a specified concurrency limit set. This helps to make sure that future functions have capacity to be consumed.

Going back to the example of the consuming services, you could set throttles for the functions as follows:

Amazon S3 function = 350
Amazon Kinesis function = 200
Amazon DynamoDB function = 200
Amazon Cognito function = 150
Total = 900

With the 100 reserved for all non-concurrency reserved functions, this totals the account limit of 1000.

Here’s how this works. To start, create a basic Lambda function that is invoked via Amazon API Gateway. This Lambda function returns a single “Hello World” statement with an added sleep time between 2 and 5 seconds. The sleep time simulates an API providing some sort of capability that can take a varied amount of time. The goal here is to show how an API that is underloaded can reach its concurrency limit, and what happens when it does.
To create the example function

  1. Open the Lambda console.
  2. Choose Create Function.
  3. For Author from scratch, enter the following values:
    1. For Name, enter a value (such as concurrencyBlog01).
    2. For Runtime, choose Python 3.6.
    3. For Role, choose Create new role from template and enter a name aligned with this function, such as concurrencyBlogRole.
  4. Choose Create function.
  5. The function is created with some basic example code. Replace that code with the following:

import time
from random import randint
seconds = randint(2, 5)

def lambda_handler(event, context):
time.sleep(seconds)
return {"statusCode": 200,
"body": ("Hello world, slept " + str(seconds) + " seconds"),
"headers":
{
"Access-Control-Allow-Headers": "Content-Type,X-Amz-Date,Authorization,X-Api-Key,X-Amz-Security-Token",
"Access-Control-Allow-Methods": "GET,OPTIONS",
}}

  1. Under Basic settings, set Timeout to 10 seconds. While this function should only ever take up to 5-6 seconds (with the 5-second max sleep), this gives you a little bit of room if it takes longer.

  1. Choose Save at the top right.

At this point, your function is configured for this example. Test it and confirm this in the console:

  1. Choose Test.
  2. Enter a name (it doesn’t matter for this example).
  3. Choose Create.
  4. In the console, choose Test again.
  5. You should see output similar to the following:

Now configure API Gateway so that you have an HTTPS endpoint to test against.

  1. In the Lambda console, choose Configuration.
  2. Under Triggers, choose API Gateway.
  3. Open the API Gateway icon now shown as attached to your Lambda function:

  1. Under Configure triggers, leave the default values for API Name and Deployment stage. For Security, choose Open.
  2. Choose Add, Save.

API Gateway is now configured to invoke Lambda at the Invoke URL shown under its configuration. You can take this URL and test it in any browser or command line, using tools such as “curl”:


$ curl https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
Hello world, slept 2 seconds

Throwing load at the function

Now start throwing some load against your API Gateway + Lambda function combo. Right now, your function is only limited by the total amount of concurrency available in an account. For this example account, you might have 850 unreserved concurrency out of a full account limit of 1000 due to having configured a few concurrency limits already (also the 100 concurrency saved for all functions without configured limits). You can find all of this information on the main Dashboard page of the Lambda console:

For generating load in this example, use an open source tool called “hey” (https://github.com/rakyll/hey), which works similarly to ApacheBench (ab). You test from an Amazon EC2 instance running the default Amazon Linux AMI from the EC2 console. For more help with configuring an EC2 instance, follow the steps in the Launch Instance Wizard.

After the EC2 instance is running, SSH into the host and run the following:


sudo yum install go
go get -u github.com/rakyll/hey

“hey” is easy to use. For these tests, specify a total number of tests (5,000) and a concurrency of 50 against the API Gateway URL as follows(replace the URL here with your own):


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

The output from “hey” tells you interesting bits of information:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

Summary:
Total: 381.9978 secs
Slowest: 9.4765 secs
Fastest: 0.0438 secs
Average: 3.2153 secs
Requests/sec: 13.0891
Total data: 140024 bytes
Size/request: 28 bytes

Response time histogram:
0.044 [1] |
0.987 [2] |
1.930 [0] |
2.874 [1803] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
3.817 [1518] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
4.760 [719] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
5.703 [917] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
6.647 [13] |
7.590 [14] |
8.533 [9] |
9.477 [4] |

Latency distribution:
10% in 2.0224 secs
25% in 2.0267 secs
50% in 3.0251 secs
75% in 4.0269 secs
90% in 5.0279 secs
95% in 5.0414 secs
99% in 5.1871 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0003 secs, 0.0000 secs, 0.0332 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0046 secs
req write: 0.0000 secs, 0.0000 secs, 0.0005 secs
resp wait: 3.2149 secs, 0.0438 secs, 9.4472 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0004 secs

Status code distribution:
[200] 4997 responses
[502] 3 responses

You can see a helpful histogram and latency distribution. Remember that this Lambda function has a random sleep period in it and so isn’t entirely representational of a real-life workload. Those three 502s warrant digging deeper, but could be due to Lambda cold-start timing and the “second” variable being the maximum of 5, causing the Lambda functions to time out. AWS X-Ray and the Amazon CloudWatch logs generated by both API Gateway and Lambda could help you troubleshoot this.

Configuring a concurrency reservation

Now that you’ve established that you can generate this load against the function, I show you how to limit it and protect a backend resource from being overloaded by all of these requests.

  1. In the console, choose Configure.
  2. Under Concurrency, for Reserve concurrency, enter 25.

  1. Click on Save in the top right corner.

You could also set this with the AWS CLI using the Lambda put-function-concurrency command or see your current concurrency configuration via Lambda get-function. Here’s an example command:


$ aws lambda get-function --function-name concurrencyBlog01 --output json --query Concurrency
{
"ReservedConcurrentExecutions": 25
}

Either way, you’ve set the Concurrency Reservation to 25 for this function. This acts as both a limit and a reservation in terms of making sure that you can execute 25 concurrent functions at all times. Going above this results in the throttling of the Lambda function. Depending on the invoking service, throttling can result in a number of different outcomes, as shown in the documentation on Throttling Behavior. This change has also reduced your unreserved account concurrency for other functions by 25.

Rerun the same load generation as before and see what happens. Previously, you tested at 50 concurrency, which worked just fine. By limiting the Lambda functions to 25 concurrency, you should see rate limiting kick in. Run the same test again:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01

While this test runs, refresh the Monitoring tab on your function detail page. You see the following warning message:

This is great! It means that your throttle is working as configured and you are now protecting your downstream resources from too much load from your Lambda function.

Here is the output from a new “hey” command:


$ ./go/bin/hey -n 5000 -c 50 https://ofixul557l.execute-api.us-east-1.amazonaws.com/prod/concurrencyBlog01
Summary:
Total: 379.9922 secs
Slowest: 7.1486 secs
Fastest: 0.0102 secs
Average: 1.1897 secs
Requests/sec: 13.1582
Total data: 164608 bytes
Size/request: 32 bytes

Response time histogram:
0.010 [1] |
0.724 [3075] |∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
1.438 [0] |
2.152 [811] |∎∎∎∎∎∎∎∎∎∎∎
2.866 [11] |
3.579 [566] |∎∎∎∎∎∎∎
4.293 [214] |∎∎∎
5.007 [1] |
5.721 [315] |∎∎∎∎
6.435 [4] |
7.149 [2] |

Latency distribution:
10% in 0.0130 secs
25% in 0.0147 secs
50% in 0.0205 secs
75% in 2.0344 secs
90% in 4.0229 secs
95% in 5.0248 secs
99% in 5.0629 secs

Details (average, fastest, slowest):
DNS+dialup: 0.0004 secs, 0.0000 secs, 0.0537 secs
DNS-lookup: 0.0002 secs, 0.0000 secs, 0.0184 secs
req write: 0.0000 secs, 0.0000 secs, 0.0016 secs
resp wait: 1.1892 secs, 0.0101 secs, 7.1038 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0005 secs

Status code distribution:
[502] 3076 responses
[200] 1924 responses

This looks fairly different from the last load test run. A large percentage of these requests failed fast due to the concurrency throttle failing them (those with the 0.724 seconds line). The timing shown here in the histogram represents the entire time it took to get a response between the EC2 instance and API Gateway calling Lambda and being rejected. It’s also important to note that this example was configured with an edge-optimized endpoint in API Gateway. You see under Status code distribution that 3076 of the 5000 requests failed with a 502, showing that the backend service from API Gateway and Lambda failed the request.

Other uses

Managing function concurrency can be useful in a few other ways beyond just limiting the impact on downstream services and providing a reservation of concurrency capacity. Here are two other uses:

  • Emergency kill switch
  • Cost controls

Emergency kill switch

On occasion, due to issues with applications I’ve managed in the past, I’ve had a need to disable a certain function or capability of an application. By setting the concurrency reservation and limit of a Lambda function to zero, you can do just that.

With the reservation set to zero every invocation of a Lambda function results in being throttled. You could then work on the related parts of the infrastructure or application that aren’t working, and then reconfigure the concurrency limit to allow invocations again.

Cost controls

While I mentioned how you might want to use concurrency limits to control the downstream impact to services or databases that your Lambda function might call, another resource that you might be cautious about is money. Setting the concurrency throttle is another way to help control costs during development and testing of your application.

You might want to prevent against a function performing a recursive action too quickly or a development workload generating too high of a concurrency. You might also want to protect development resources connected to this function from generating too much cost, such as APIs that your Lambda function calls.

Conclusion

Concurrent executions as a unit of scale are a fairly unique characteristic about Lambda functions. Placing limits on how many concurrency “slices” that your function can consume can prevent a single function from consuming all of the available concurrency in an account. Limits can also prevent a function from overwhelming a backend resource that isn’t as scalable.

Unlike monolithic applications or even microservices where there are mixed capabilities in a single service, Lambda functions encourage a sort of “nano-service” of small business logic directly related to the integration model connected to the function. I hope you’ve enjoyed this post and configure your concurrency limits today!

MQTT 5: Introduction to MQTT 5

Post Syndicated from The HiveMQ Team original https://www.hivemq.com/blog/mqtt-5-introduction-to-mqtt-5/

MQTT 5 Introduction

Introduction to MQTT 5

Welcome to our brand new blog post series MQTT 5 – Features and Hidden Gems. Without doubt, the MQTT protocol is the most popular and best received Internet of Things protocol as of today (see the Google Trends Chart below), supporting large scale use cases ranging from Connected Cars, Manufacturing Systems, Logistics, Military Use Cases to Enterprise Chat Applications, Mobile Apps and connecting constrained IoT devices. Of course, with huge amounts of production deployments, the wish list for future versions of the MQTT protocol grew bigger and bigger.

MQTT 5 is by far the most extensive and most feature-rich update to the MQTT protocol specification ever. We are going to explore all hidden gems and protocol features with use case discussion and useful background information – one blog post at a time.

Be sure to read the MQTT Essentials Blog Post series first before diving into our new MQTT 5 series. To get the most out of the new blog posts, it’s important to have a basic understanding of the MQTT 3.1.1 protocol as we are going to highlight key changes as well as all improvements.

Hollywood and Netflix Ask Court to Seize Tickbox Streaming Devices

Post Syndicated from Ernesto original https://torrentfreak.com/hollywood-and-netflix-ask-court-to-seize-tickbox-streaming-devices-171209/

More and more people are starting to use Kodi-powered set-top boxes to stream video content to their TVs.

While Kodi itself is a neutral platform, sellers who ship devices with unauthorized add-ons give it a bad reputation.

According to the Alliance for Creativity and Entertainment (ACE), an anti-piracy partnership between Hollywood studios, Netflix, Amazon, and more than two dozen other companies, Tickbox TV is one of these bad actors.

Earlier this year, ACE filed a lawsuit against the Georgia-based company, which sells set-top boxes that allow users to stream a variety of popular media. The Tickbox devices use the Kodi media player and come with instructions on how to add various add-ons.

According to ACE, these devices are nothing more than pirate tools, allowing buyers to stream copyright infringing content. “TickBox promotes and distributes TickBox TV for infringing use, and that is exactly the result of its use,” they told court this week.

After the complaint was filed in October, Tickbox made some cosmetic changes to the site, removing some allegedly inducing language. The streaming devices are still for sale, however, but not for long if it’s up to the media giants.

This week ACE submitted a request for a preliminary injunction to the court, hoping to stop Tickbox’s sales activities.

“TickBox is intentionally inducing infringement, pure and simple. Plaintiffs respectfully request that the Court enter a preliminary injunction that requires TickBox to halt its flagrantly illegal conduct immediately,” they write in their application.

The companies explain that that since Tickbox is causing irreparable harm, all existing devices should be impounded.

“[A]ll TickBox TV devices in the possession of TickBox and all of its officers, directors, agents, servants, and employees, and all persons in active concert or participation or in privity with any of them are to be impounded and shall be retained by Defendant until further order of the Court,” the proposed order reads.

In addition, Tickbox should push out a software update which remove all infringing add-ons from the devices that were previously sold.

“TickBox shall, via software update, remove from all distributed TickBox TV devices all Kodi ‘Themes,’ ‘Builds,’ ‘Addons,’ or any other software that facilitates the infringing public performances of Plaintiffs’ Copyrighted Works.”

Among others, the list of allegedly infringing add-ons and themes includes Spinz, Lodi Black, Stream on Fire, Wookie, Aqua, CMM, Spanish Quasar, Paradox, Covenant, Elysium, UK Turk, Gurzil, Maverick, and Poseidon.

The filing shows that ACE is serious about its efforts to stop the sale of these type of streaming devices. Tickbox has yet to reply to the original complaint or the injunction request.

While this is the first US lawsuit of its kind, the anti-piracy conglomerate has been rather active in recent weeks. The group has successfully pressured several addon developers to quit and has been involved in enforcement actions around the globe.

A copy of the proposed preliminary injunction is available here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Movie Company Has No Right to Sue, Accused Pirate Argues

Post Syndicated from Ernesto original https://torrentfreak.com/movie-company-has-no-right-to-sue-accused-pirate-argues-171208/

In recent years, a group of select companies have pressured hundreds of thousands of alleged pirates to pay significant settlement fees, or face legal repercussions.

These so-called “copyright trolling” efforts have also been a common occurrence in the United States for more than half a decade, and still are today.

While copyright holders should be able to take legitimate piracy claims to court, not all cases are as strong as they first appear. Many defendants have brought up flaws, often in relation to the IP-address evidence, but an accused pirate in Oregon takes things up a notch.

Lingfu Zhang, represented by attorney David Madden, has turned the tables on the makers of the film Fathers & Daughters. The man denies having downloaded the movie but also points out that the filmmakers have signed away their online distribution rights.

The issue was brought up in previous months, but the relevant findings were only unsealed this week. They show that the movie company (F&D), through a sales agent, sold the online distribution rights to a third party.

While this is not uncommon in the movie business, it means that they no longer have the right to distribute the movie online, a right Zhang was accused of violating. This is also what his attorney pointed out to the court, asking for a judgment in favor of his client.

“ZHANG denies downloading the movie but Defendant’s current motion for summary judgment challenges a different portion of F&D’s case: Defendant argues that F&D has alienated all of the relevant rights necessary to sue for infringement under the Copyright Act,” Madden writes.

The filmmakers opposed the request and pointed out that they still had some rights. However, this is irrelevant according to the defense, since the distribution rights are not owned by them, but by a company that’s not part of the lawsuit.

“Plaintiff claims, for example, that it still owns the right to exploit the movie on airlines and oceangoing vessels. That may or may not be true – Plaintiff has not submitted any evidence on the question – but ZHANG is not accused of showing the movie on an airplane or a cruise ship.

“He is accused of downloading it over the Internet, which is an infringement that affects only an exclusive right owned by non-party DISTRIBUTOR 2,” Madden adds.

Interestingly, an undated addendum to the licensing agreement, allegedly created after the lawsuit was started, states that the filmmakers would keep their “anti-piracy” rights, as can be seen below.

Anti-Piracy rights?

This doesn’t save the filmmaker, according to the defense. The “licensor” who keeps these anti-piracy and enforcement rights refers to the sales agent, not the filmmaker, Madden writes. In addition, the case is about copyright infringement, and despite the addendum, the filmmakers don’t have the exclusive rights that apply here.

“Plaintiff represented to this Court that it was the ‘proprietor of all copyrights and interests need to bring suit’ […] notwithstanding that it had – years earlier – transferred away all its exclusive rights under Section 106 of the Copyright Act,” the defense lawyer concludes.

“Even viewing all Plaintiff’s agreements in the light most favorable to it, Plaintiff holds nothing more than a bare right to sue, which is not a cognizable right that may be exercised in the courts of this Circuit.”

While the court has yet to decide on the motion, this case could turn into a disaster for the makers of Fathers & Daughters.

If the court agrees that they don’t have the proper rights, defendants in other cases may argue the same. It’s easy to see how their entire trolling scheme would then collapse.

The original memorandum in support of the motion for summary judgment is available here (pdf) and a copy of the reply brief can be found here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Movie & TV Companies Tackle Pirate IPTV in Australia Federal Court

Post Syndicated from Andy original https://torrentfreak.com/movie-tv-companies-tackle-pirate-iptv-in-australia-federal-court-171207/

As movie and TV show piracy has migrated from the desktop towards mobile and living room-based devices, copyright holders have found the need to adapt to a new enemy.

Dealing with streaming services is now high on the agenda, with third-party Kodi addons and various Android apps posing the biggest challenge. Alongside is the much less prevalent but rapidly growing pay IPTV market, in which thousands of premium channels are delivered to homes for a relatively small fee.

In Australia, copyright holders are treating these services in much the same way as torrent sites. They feel that if they can force ISPs to block them, the problem can be mitigated. Most recently, movie and TV show giants Village Roadshow, Disney, Universal, Warner Bros, Twentieth Century Fox, and Paramount filed an application targeting HDSubs+, a pirate IPTV operation servicing thousands of Australians.

Filed in October, the application for the injunction targets Australia’s largest ISPs including Telstra, Optus, TPG, and Vocus, plus their subsidiaries. The movie and TV show companies want them to quickly block HDSubs+, to prevent it from reaching its audience.

HDSubs+ IPTV package
However, blocking isn’t particularly straightforward. Due to the way IPTV services are setup a number of domains need to be blocked, including their sales platforms, EPG (electronic program guide), software (such as an Android app), updates, and sundry other services. In HDSubs+ case around ten domains need to be restricted but in court today, Village Roadshow revealed that probably won’t deal with the problem.

HDSubs+ appears to be undergoing some kind of transformation, possibly to mitigate efforts to block it in Australia. ComputerWorld reports that it is now directing subscribers to update to a new version that works in a more evasive manner.

If they agree, HDSubs+ customers are being migrated over to a service called PressPlayPlus. It works in the same way as the old system but no longer uses the domain names cited in Village Roadshow’s injunction application. This means that DNS blocks, the usual weapon of choice for local ISPs, will prove futile.

Village Roadshow says that with this in mind it may be forced to seek enhanced IP address blocking, unless it is granted a speedy hearing for its application. This, in turn, may result in the normally cooperative ISPs returning to court to argue their case.

“If that’s what you want to do, then you’ll have to amend the orders and let the parties know,” Judge John Nicholas said.

“It’s only the former [DNS blocking] that carriage service providers have agreed to in the past.”

As things stand, Village Roadshow will return to court on December 15 for a case management hearing but in the meantime, the Federal Court must deal with another IPTV-related blocking request.

In common with its Australian and US-based counterparts, Hong Kong-based broadcaster Television Broadcasts Limited (TVB) has launched a similar case asking local ISPs to block another IPTV service.

“Television Broadcasts Limited can confirm that we have commenced legal action in Australia to protect our copyright,” a TVB spokesperson told Computerworld.

TVB wants ISPs including Telstra, Optus, Vocus, and TPG plus their subsidiaries to block access to seven Android-based services named as A1, BlueTV, EVPAD, FunTV, MoonBox, Unblock, and hTV5.

Court documents list 21 URLs maintaining the services. They will all need to be blocked by DNS or other means, if the former proves futile. Online reports suggest that there are similarities among the IPTV products listed above. A demo for the FunTV IPTV service is shown below.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

How to Easily Apply Amazon Cloud Directory Schema Changes with In-Place Schema Upgrades

Post Syndicated from Mahendra Chheda original https://aws.amazon.com/blogs/security/how-to-easily-apply-amazon-cloud-directory-schema-changes-with-in-place-schema-upgrades/

Now, Amazon Cloud Directory makes it easier for you to apply schema changes across your directories with in-place schema upgrades. Your directory now remains available while Cloud Directory applies backward-compatible schema changes such as the addition of new fields. Without migrating data between directories or applying code changes to your applications, you can upgrade your schemas. You also can view the history of your schema changes in Cloud Directory by using version identifiers, which help you track and audit schema versions across directories. If you have multiple instances of a directory with the same schema, you can view the version history of schema changes to manage your directory fleet and ensure that all directories are running with the same schema version.

In this blog post, I demonstrate how to perform an in-place schema upgrade and use schema versions in Cloud Directory. I add additional attributes to an existing facet and add a new facet to a schema. I then publish the new schema and apply it to running directories, upgrading the schema in place. I also show how to view the version history of a directory schema, which helps me to ensure my directory fleet is running the same version of the schema and has the correct history of schema changes applied to it.

Note: I share Java code examples in this post. I assume that you are familiar with the AWS SDK and can use Java-based code to build a Cloud Directory code example. You can apply the concepts I cover in this post to other programming languages such as Python and Ruby.

Cloud Directory fundamentals

I will start by covering a few Cloud Directory fundamentals. If you are already familiar with the concepts behind Cloud Directory facets, schemas, and schema lifecycles, you can skip to the next section.

Facets: Groups of attributes. You use facets to define object types. For example, you can define a device schema by adding facets such as computers, phones, and tablets. A computer facet can track attributes such as serial number, make, and model. You can then use the facets to create computer objects, phone objects, and tablet objects in the directory to which the schema applies.

Schemas: Collections of facets. Schemas define which types of objects can be created in a directory (such as users, devices, and organizations) and enforce validation of data for each object class. All data within a directory must conform to the applied schema. As a result, the schema definition is essentially a blueprint to construct a directory with an applied schema.

Schema lifecycle: The four distinct states of a schema: Development, Published, Applied, and Deleted. Schemas in the Published and Applied states have version identifiers and cannot be changed. Schemas in the Applied state are used by directories for validation as applications insert or update data. You can change schemas in the Development state as many times as you need them to. In-place schema upgrades allow you to apply schema changes to an existing Applied schema in a production directory without the need to export and import the data populated in the directory.

How to add attributes to a computer inventory application schema and perform an in-place schema upgrade

To demonstrate how to set up schema versioning and perform an in-place schema upgrade, I will use an example of a computer inventory application that uses Cloud Directory to store relationship data. Let’s say that at my company, AnyCompany, we use this computer inventory application to track all computers we give to our employees for work use. I previously created a ComputerSchema and assigned its version identifier as 1. This schema contains one facet called ComputerInfo that includes attributes for SerialNumber, Make, and Model, as shown in the following schema details.

Schema: ComputerSchema
Version: 1

Facet: ComputerInfo
Attribute: SerialNumber, type: Integer
Attribute: Make, type: String
Attribute: Model, type: String

AnyCompany has offices in Seattle, Portland, and San Francisco. I have deployed the computer inventory application for each of these three locations. As shown in the lower left part of the following diagram, ComputerSchema is in the Published state with a version of 1. The Published schema is applied to SeattleDirectory, PortlandDirectory, and SanFranciscoDirectory for AnyCompany’s three locations. Implementing separate directories for different geographic locations when you don’t have any queries that cross location boundaries is a good data partitioning strategy and gives your application better response times with lower latency.

Diagram of ComputerSchema in Published state and applied to three directories

Legend for the diagrams in this post

The following code example creates the schema in the Development state by using a JSON file, publishes the schema, and then creates directories for the Seattle, Portland, and San Francisco locations. For this example, I assume the schema has been defined in the JSON file. The createSchema API creates a schema Amazon Resource Name (ARN) with the name defined in the variable, SCHEMA_NAME. I can use the putSchemaFromJson API to add specific schema definitions from the JSON file.

// The utility method to get valid Cloud Directory schema JSON
String validJson = getJsonFile("ComputerSchema_version_1.json")

String SCHEMA_NAME = "ComputerSchema";

String developmentSchemaArn = client.createSchema(new CreateSchemaRequest()
        .withName(SCHEMA_NAME))
        .getSchemaArn();

// Put the schema document in the Development schema
PutSchemaFromJsonResult result = client.putSchemaFromJson(new PutSchemaFromJsonRequest()
        .withSchemaArn(developmentSchemaArn)
        .withDocument(validJson));

The following code example takes the schema that is currently in the Development state and publishes the schema, changing its state to Published.

String SCHEMA_VERSION = "1";
String publishedSchemaArn = client.publishSchema(
        new PublishSchemaRequest()
        .withDevelopmentSchemaArn(developmentSchemaArn)
        .withVersion(SCHEMA_VERSION))
        .getPublishedSchemaArn();

// Our Published schema ARN is as follows
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1

The following code example creates a directory named SeattleDirectory and applies the published schema. The createDirectory API call creates a directory by using the published schema provided in the API parameters. Note that Cloud Directory stores a version of the schema in the directory in the Applied state. I will use similar code to create directories for PortlandDirectory and SanFranciscoDirectory.

String DIRECTORY_NAME = "SeattleDirectory"; 

CreateDirectoryResult directory = client.createDirectory(
        new CreateDirectoryRequest()
        .withName(DIRECTORY_NAME)
        .withSchemaArn(publishedSchemaArn));

String directoryArn = directory.getDirectoryArn();
String appliedSchemaArn = directory.getAppliedSchemaArn();

// This code section can be reused to create directories for Portland and San Francisco locations with the appropriate directory names

// Our directory ARN is as follows 
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX

// Our applied schema ARN is as follows 
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1

Revising a schema

Now let’s say my company, AnyCompany, wants to add more information for computers and to track which employees have been assigned a computer for work use. I modify the schema to add two attributes to the ComputerInfo facet: Description and OSVersion (operating system version). I make Description optional because it is not important for me to track this attribute for the computer objects I create. I make OSVersion mandatory because it is critical for me to track it for all computer objects so that I can make changes such as applying security patches or making upgrades. Because I make OSVersion mandatory, I must provide a default value that Cloud Directory will apply to objects that were created before the schema revision, in order to handle backward compatibility. Note that you can replace the value in any object with a different value.

I also add a new facet to track computer assignment information, shown in the following updated schema as the ComputerAssignment facet. This facet tracks these additional attributes: Name (the name of the person to whom the computer is assigned), EMail (the email address of the assignee), Department, and department CostCenter. Note that Cloud Directory refers to the previously available version identifier as the Major Version. Because I can now add a minor version to a schema, I also denote the changed schema as Minor Version A.

Schema: ComputerSchema
Major Version: 1
Minor Version: A 

Facet: ComputerInfo
Attribute: SerialNumber, type: Integer 
Attribute: Make, type: String
Attribute: Model, type: Integer
Attribute: Description, type: String, required: NOT_REQUIRED
Attribute: OSVersion, type: String, required: REQUIRED_ALWAYS, default: "Windows 7"

Facet: ComputerAssignment
Attribute: Name, type: String
Attribute: EMail, type: String
Attribute: Department, type: String
Attribute: CostCenter, type: Integer

The following diagram shows the changes that were made when I added another facet to the schema and attributes to the existing facet. The highlighted area of the diagram (bottom left) shows that the schema changes were published.

Diagram showing that schema changes were published

The following code example revises the existing Development schema by adding the new attributes to the ComputerInfo facet and by adding the ComputerAssignment facet. I use a new JSON file for the schema revision, and for the purposes of this example, I am assuming the JSON file has the full schema including planned revisions.

// The utility method to get a valid CloudDirectory schema JSON
String schemaJson = getJsonFile("ComputerSchema_version_1_A.json")

// Put the schema document in the Development schema
PutSchemaFromJsonResult result = client.putSchemaFromJson(
        new PutSchemaFromJsonRequest()
        .withSchemaArn(developmentSchemaArn)
        .withDocument(schemaJson));

Upgrading the Published schema

The following code example performs an in-place schema upgrade of the Published schema with schema revisions (it adds new attributes to the existing facet and another facet to the schema). The upgradePublishedSchema API upgrades the Published schema with backward-compatible changes from the Development schema.

// From an earlier code example, I know the publishedSchemaArn has this value: "arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1"

// Upgrade publishedSchemaArn to minorVersion A. The Development schema must be backward compatible with 
// the existing publishedSchemaArn. 

String minorVersion = "A"

UpgradePublishedSchemaResult upgradePublishedSchemaResult = client.upgradePublishedSchema(new UpgradePublishedSchemaRequest()
        .withDevelopmentSchemaArn(developmentSchemaArn)
        .withPublishedSchemaArn(publishedSchemaArn)
        .withMinorVersion(minorVersion));

String upgradedPublishedSchemaArn = upgradePublishedSchemaResult.getUpgradedSchemaArn();

// The Published schema ARN after the upgrade shows a minor version as follows 
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1/A

Upgrading the Applied schema

The following diagram shows the in-place schema upgrade for the SeattleDirectory directory. I am performing the schema upgrade so that I can reflect the new schemas in all three directories. As a reminder, I added new attributes to the ComputerInfo facet and also added the ComputerAssignment facet. After the schema and directory upgrade, I can create objects for the ComputerInfo and ComputerAssignment facets in the SeattleDirectory. Any objects that were created with the old facet definition for ComputerInfo will now use the default values for any additional attributes defined in the new schema.

Diagram of the in-place schema upgrade for the SeattleDirectory directory

I use the following code example to perform an in-place upgrade of the SeattleDirectory to a Major Version of 1 and a Minor Version of A. Note that you should change a Major Version identifier in a schema to make backward-incompatible changes such as changing the data type of an existing attribute or dropping a mandatory attribute from your schema. Backward-incompatible changes require directory data migration from a previous version to the new version. You should change a Minor Version identifier in a schema to make backward-compatible upgrades such as adding additional attributes or adding facets, which in turn may contain one or more attributes. The upgradeAppliedSchema API lets me upgrade an existing directory with a different version of a schema.

// This upgrades ComputerSchema version 1 of the Applied schema in SeattleDirectory to Major Version 1 and Minor Version A
// The schema must be backward compatible or the API will fail with IncompatibleSchemaException

UpgradeAppliedSchemaResult upgradeAppliedSchemaResult = client.upgradeAppliedSchema(new UpgradeAppliedSchemaRequest()
        .withDirectoryArn(directoryArn)
        .withPublishedSchemaArn(upgradedPublishedSchemaArn));

String upgradedAppliedSchemaArn = upgradeAppliedSchemaResult.getUpgradedSchemaArn();

// The Applied schema ARN after the in-place schema upgrade will appear as follows
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1

// This code section can be reused to upgrade directories for the Portland and San Francisco locations with the appropriate directory ARN

Note: Cloud Directory has excluded returning the Minor Version identifier in the Applied schema ARN for backward compatibility and to enable the application to work across older and newer versions of the directory.

The following diagram shows the changes that are made when I perform an in-place schema upgrade in the two remaining directories, PortlandDirectory and SanFranciscoDirectory. I make these calls sequentially, upgrading PortlandDirectory first and then upgrading SanFranciscoDirectory. I use the same code example that I used earlier to upgrade SeattleDirectory. Now, all my directories are running the most current version of the schema. Also, I made these schema changes without having to migrate data and while maintaining my application’s high availability.

Diagram showing the changes that are made with an in-place schema upgrade in the two remaining directories

Schema revision history

I can now view the schema revision history for any of AnyCompany’s directories by using the listAppliedSchemaArns API. Cloud Directory maintains the five most recent versions of applied schema changes. Similarly, to inspect the current Minor Version that was applied to my schema, I use the getAppliedSchemaVersion API. The listAppliedSchemaArns API returns the schema ARNs based on my schema filter as defined in withSchemaArn.

I use the following code example to query an Applied schema for its version history.

// This returns the five most recent Minor Versions associated with a Major Version
ListAppliedSchemaArnsResult listAppliedSchemaArnsResult = client.listAppliedSchemaArns(new ListAppliedSchemaArnsRequest()
        .withDirectoryArn(directoryArn)
        .withSchemaArn(upgradedAppliedSchemaArn));

// Note: The listAppliedSchemaArns API without the SchemaArn filter returns all the Major Versions in a directory

The listAppliedSchemaArns API returns the two ARNs as shown in the following output.

arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1
arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1/A

The following code example queries an Applied schema for current Minor Version by using the getAppliedSchemaVersion API.

// This returns the current Applied schema's Minor Version ARN 

GetAppliedSchemaVersion getAppliedSchemaVersionResult = client.getAppliedSchemaVersion(new GetAppliedSchemaVersionRequest()
	.withSchemaArn(upgradedAppliedSchemaArn));

The getAppliedSchemaVersion API returns the current Applied schema ARN with a Minor Version, as shown in the following output.

arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1/A

If you have a lot of directories, schema revision API calls can help you audit your directory fleet and ensure that all directories are running the same version of a schema. Such auditing can help you ensure high integrity of directories across your fleet.

Summary

You can use in-place schema upgrades to make changes to your directory schema as you evolve your data set to match the needs of your application. An in-place schema upgrade allows you to maintain high availability for your directory and applications while the upgrade takes place. For more information about in-place schema upgrades, see the in-place schema upgrade documentation.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about implementing the solution in this post, start a new thread in the Directory Service forum or contact AWS Support.

– Mahendra

 

Newly Updated Whitepaper: FERPA Compliance on AWS

Post Syndicated from Chris Gile original https://aws.amazon.com/blogs/security/newly-updated-whitepaper-ferpa-compliance-on-aws/

One of the main tenets of the Family Educational Rights and Privacy Act (FERPA) is the protection of student education records, including personally identifiable information (PII) and directory information. We recently updated our FERPA Compliance on AWS whitepaper to include AWS service-specific guidance for 24 AWS services. The whitepaper describes how these services can be used to help secure protected data. In conjunction with more detailed service-specific documentation, this updated information helps make it easier for you to plan, deploy, and operate secure environments to meet your compliance requirements in the AWS Cloud.

The updated whitepaper is especially useful for educational institutions and their vendors who need to understand:

  • AWS’s Shared Responsibility Model.
  • How AWS services can be used to help deploy educational and PII workloads securely in the AWS Cloud.
  • Key security disciplines in a security program to help you run a FERPA-compliant program (such as auditing, data destruction, and backup and disaster recovery).

In a related effort to help you secure PII, we also added to the whitepaper a mapping of NIST SP 800-122, which provides guidance for protecting PII, as well as a link to our NIST SP 800-53 Quick Start, a CloudFormation template that automatically configures AWS resources and deploys a multi-tier, Linux-based web application. To learn how this Quick Start works, see the Automate NIST Compliance in AWS GovCloud (US) with AWS Quick Start Tools video. The template helps you streamline and automate secure baselines in AWS—from initial design to operational security readiness—by incorporating the expertise of AWS security and compliance subject matter experts.

For more information about AWS Compliance and FERPA or to request support for your organization, contact your AWS account manager.

– Chris Gile, Senior Manager, AWS Security Assurance

New Police Anti-Piracy Task Force May Get Involved in Site Blocking

Post Syndicated from Ernesto original https://torrentfreak.com/new-police-anti-piracy-task-force-may-get-involved-in-site-blocking-171206/

On a regular basis, major media companies and their associates seek assistance from the authorities in order to curb copyright infringement.

In some cases, this has resulted in special police units that have piracy among their main objectives, such as The City of London Police Intellectual Property Crime Unit (PIPCU) in the UK.

Over in Denmark, the Government greenlighted a similar initiative last week. Justice Minister Søren Pape Poulsen approved a new task force that will operate under police wings, with an exclusive focus on intellectual property crimes.

“This is the culmination of a joint effort among Danish trade organizations’ calls for public engagement in the enforcement of IP crime in Denmark,” Maria Fredenslund, CEO of the local anti-piracy group RettighedsAlliancen (Rights Alliance) tells TorrentFreak.

“Similar to the PIPCU unit in the UK the task force will be specialized in IP crime and will handle existing cases and develop digital enforcement,” she adds.

The new unit will consist of five or six investigators, who will be assisted by prosecutors. The main goal will be to tackle organized crime on as many levels as possible.

The new police task force will first operate on a trial basis. After the first half year, the Government will evaluate its progress and decide if the project will continue. If that happens, the unit may also get involved in website blocking efforts.

Pirate site blockades are not new in Denmark, but thus far these have been the result of civil procedures initiated by copyright holders. According to new plans, which still have to be approved, legislation that’s currently used to block terrorist content may be used against pirate sites as well.

“The Government will look into the possibility to give the police authority to carry out blockades of infringing websites,” Fredenslund says.

This would be possible under a provision in the Administration of Justice Act, which the Danish Parliament recently adopted. While the blocking requests would be submitted by the police unit, instead of copyright holders, a court still has to approve them.

“The decision to block a website is made with a court order by request of the police. The court order shall list the specific circumstances that prove the conditions for the blocking of the website have been met. The court order may be revoked at any time,” the relevant provision reads.

For the time being, the new anti-piracy task force will focus on handling other copyright infringement cases, which these are plenty of.

Rights Alliance is happy with the help they are getting. The anti-piracy group has been working on their own “piracy disruption machine” in recent months and with assistance from law enforcement, they hope to achieve some good results soon.

For now, however, the private blocking requests are continuing as well.

Just yesterday the District Court in Frederiksberg issued an order (pdf) in favor of the Rights Alliance, requiring a local ISP to block dozens of Popcorn Time related domain names. As part of a voluntary agreement, this block will be implemented by other Internet providers as well.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Implementing Dynamic ETL Pipelines Using AWS Step Functions

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/implementing-dynamic-etl-pipelines-using-aws-step-functions/

This post contributed by:
Wangechi Dole, AWS Solutions Architect
Milan Krasnansky, ING, Digital Solutions Developer, SGK
Rian Mookencherry, Director – Product Innovation, SGK

Data processing and transformation is a common use case you see in our customer case studies and success stories. Often, customers deal with complex data from a variety of sources that needs to be transformed and customized through a series of steps to make it useful to different systems and stakeholders. This can be difficult due to the ever-increasing volume, velocity, and variety of data. Today, data management challenges cannot be solved with traditional databases.

Workflow automation helps you build solutions that are repeatable, scalable, and reliable. You can use AWS Step Functions for this. A great example is how SGK used Step Functions to automate the ETL processes for their client. With Step Functions, SGK has been able to automate changes within the data management system, substantially reducing the time required for data processing.

In this post, SGK shares the details of how they used Step Functions to build a robust data processing system based on highly configurable business transformation rules for ETL processes.

SGK: Building dynamic ETL pipelines

SGK is a subsidiary of Matthews International Corporation, a diversified organization focusing on brand solutions and industrial technologies. SGK’s Global Content Creation Studio network creates compelling content and solutions that connect brands and products to consumers through multiple assets including photography, video, and copywriting.

We were recently contracted to build a sophisticated and scalable data management system for one of our clients. We chose to build the solution on AWS to leverage advanced, managed services that help to improve the speed and agility of development.

The data management system served two main functions:

  1. Ingesting a large amount of complex data to facilitate both reporting and product funding decisions for the client’s global marketing and supply chain organizations.
  2. Processing the data through normalization and applying complex algorithms and data transformations. The system goal was to provide information in the relevant context—such as strategic marketing, supply chain, product planning, etc. —to the end consumer through automated data feeds or updates to existing ETL systems.

We were faced with several challenges:

  • Output data that needed to be refreshed at least twice a day to provide fresh datasets to both local and global markets. That constant data refresh posed several challenges, especially around data management and replication across multiple databases.
  • The complexity of reporting business rules that needed to be updated on a constant basis.
  • Data that could not be processed as contiguous blocks of typical time-series data. The measurement of the data was done across seasons (that is, combination of dates), which often resulted with up to three overlapping seasons at any given time.
  • Input data that came from 10+ different data sources. Each data source ranged from 1–20K rows with as many as 85 columns per input source.

These challenges meant that our small Dev team heavily invested time in frequent configuration changes to the system and data integrity verification to make sure that everything was operating properly. Maintaining this system proved to be a daunting task and that’s when we turned to Step Functions—along with other AWS services—to automate our ETL processes.

Solution overview

Our solution included the following AWS services:

  • AWS Step Functions: Before Step Functions was available, we were using multiple Lambda functions for this use case and running into memory limit issues. With Step Functions, we can execute steps in parallel simultaneously, in a cost-efficient manner, without running into memory limitations.
  • AWS Lambda: The Step Functions state machine uses Lambda functions to implement the Task states. Our Lambda functions are implemented in Java 8.
  • Amazon DynamoDB provides us with an easy and flexible way to manage business rules. We specify our rules as Keys. These are key-value pairs stored in a DynamoDB table.
  • Amazon RDS: Our ETL pipelines consume source data from our RDS MySQL database.
  • Amazon Redshift: We use Amazon Redshift for reporting purposes because it integrates with our BI tools. Currently we are using Tableau for reporting which integrates well with Amazon Redshift.
  • Amazon S3: We store our raw input files and intermediate results in S3 buckets.
  • Amazon CloudWatch Events: Our users expect results at a specific time. We use CloudWatch Events to trigger Step Functions on an automated schedule.

Solution architecture

This solution uses a declarative approach to defining business transformation rules that are applied by the underlying Step Functions state machine as data moves from RDS to Amazon Redshift. An S3 bucket is used to store intermediate results. A CloudWatch Event rule triggers the Step Functions state machine on a schedule. The following diagram illustrates our architecture:

Here are more details for the above diagram:

  1. A rule in CloudWatch Events triggers the state machine execution on an automated schedule.
  2. The state machine invokes the first Lambda function.
  3. The Lambda function deletes all existing records in Amazon Redshift. Depending on the dataset, the Lambda function can create a new table in Amazon Redshift to hold the data.
  4. The same Lambda function then retrieves Keys from a DynamoDB table. Keys represent specific marketing campaigns or seasons and map to specific records in RDS.
  5. The state machine executes the second Lambda function using the Keys from DynamoDB.
  6. The second Lambda function retrieves the referenced dataset from RDS. The records retrieved represent the entire dataset needed for a specific marketing campaign.
  7. The second Lambda function executes in parallel for each Key retrieved from DynamoDB and stores the output in CSV format temporarily in S3.
  8. Finally, the Lambda function uploads the data into Amazon Redshift.

To understand the above data processing workflow, take a closer look at the Step Functions state machine for this example.

We walk you through the state machine in more detail in the following sections.

Walkthrough

To get started, you need to:

  • Create a schedule in CloudWatch Events
  • Specify conditions for RDS data extracts
  • Create Amazon Redshift input files
  • Load data into Amazon Redshift

Step 1: Create a schedule in CloudWatch Events
Create rules in CloudWatch Events to trigger the Step Functions state machine on an automated schedule. The following is an example cron expression to automate your schedule:

In this example, the cron expression invokes the Step Functions state machine at 3:00am and 2:00pm (UTC) every day.

Step 2: Specify conditions for RDS data extracts
We use DynamoDB to store Keys that determine which rows of data to extract from our RDS MySQL database. An example Key is MCS2017, which stands for, Marketing Campaign Spring 2017. Each campaign has a specific start and end date and the corresponding dataset is stored in RDS MySQL. A record in RDS contains about 600 columns, and each Key can represent up to 20K records.

A given day can have multiple campaigns with different start and end dates running simultaneously. In the following example DynamoDB item, three campaigns are specified for the given date.

The state machine example shown above uses Keys 31, 32, and 33 in the first ChoiceState and Keys 21 and 22 in the second ChoiceState. These keys represent marketing campaigns for a given day. For example, on Monday, there are only two campaigns requested. The ChoiceState with Keys 21 and 22 is executed. If three campaigns are requested on Tuesday, for example, then ChoiceState with Keys 31, 32, and 33 is executed. MCS2017 can be represented by Key 21 and Key 33 on Monday and Tuesday, respectively. This approach gives us the flexibility to add or remove campaigns dynamically.

Step 3: Create Amazon Redshift input files
When the state machine begins execution, the first Lambda function is invoked as the resource for FirstState, represented in the Step Functions state machine as follows:

"Comment": ” AWS Amazon States Language.", 
  "StartAt": "FirstState",
 
"States": { 
  "FirstState": {
   
"Type": "Task",
   
"Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Start",
    "Next": "ChoiceState" 
  } 

As described in the solution architecture, the purpose of this Lambda function is to delete existing data in Amazon Redshift and retrieve keys from DynamoDB. In our use case, we found that deleting existing records was more efficient and less time-consuming than finding the delta and updating existing records. On average, an Amazon Redshift table can contain about 36 million cells, which translates to roughly 65K records. The following is the code snippet for the first Lambda function in Java 8:

public class LambdaFunctionHandler implements RequestHandler<Map<String,Object>,Map<String,String>> {
    Map<String,String> keys= new HashMap<>();
    public Map<String, String> handleRequest(Map<String, Object> input, Context context){
       Properties config = getConfig(); 
       // 1. Cleaning Redshift Database
       new RedshiftDataService(config).cleaningTable(); 
       // 2. Reading data from Dynamodb
       List<String> keyList = new DynamoDBDataService(config).getCurrentKeys();
       for(int i = 0; i < keyList.size(); i++) {
           keys.put(”key" + (i+1), keyList.get(i)); 
       }
       keys.put(”key" + T,String.valueOf(keyList.size()));
       // 3. Returning the key values and the key count from the “for” loop
       return (keys);
}

The following JSON represents ChoiceState.

"ChoiceState": {
   "Type" : "Choice",
   "Choices": [ 
   {

      "Variable": "$.keyT",
     "StringEquals": "3",
     "Next": "CurrentThreeKeys" 
   }, 
   {

     "Variable": "$.keyT",
    "StringEquals": "2",
    "Next": "CurrentTwooKeys" 
   } 
 ], 
 "Default": "DefaultState"
}

The variable $.keyT represents the number of keys retrieved from DynamoDB. This variable determines which of the parallel branches should be executed. At the time of publication, Step Functions does not support dynamic parallel state. Therefore, choices under ChoiceState are manually created and assigned hardcoded StringEquals values. These values represent the number of parallel executions for the second Lambda function.

For example, if $.keyT equals 3, the second Lambda function is executed three times in parallel with keys, $key1, $key2 and $key3 retrieved from DynamoDB. Similarly, if $.keyT equals two, the second Lambda function is executed twice in parallel.  The following JSON represents this parallel execution:

"CurrentThreeKeys": { 
  "Type": "Parallel",
  "Next": "NextState",
  "Branches": [ 
  {

     "StartAt": “key31",
    "States": { 
       “key31": {

          "Type": "Task",
        "InputPath": "$.key1",
        "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
        "End": true 
       } 
    } 
  }, 
  {

     "StartAt": “key32",
    "States": { 
     “key32": {

        "Type": "Task",
       "InputPath": "$.key2",
         "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
       "End": true 
      } 
     } 
   }, 
   {

      "StartAt": “key33",
       "States": { 
          “key33": {

                "Type": "Task",
             "InputPath": "$.key3",
             "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
           "End": true 
       } 
     } 
    } 
  ] 
} 

Step 4: Load data into Amazon Redshift
The second Lambda function in the state machine extracts records from RDS associated with keys retrieved for DynamoDB. It processes the data then loads into an Amazon Redshift table. The following is code snippet for the second Lambda function in Java 8.

public class LambdaFunctionHandler implements RequestHandler<String, String> {
 public static String key = null;

public String handleRequest(String input, Context context) { 
   key=input; 
   //1. Getting basic configurations for the next classes + s3 client Properties
   config = getConfig();

   AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient(); 
   // 2. Export query results from RDS into S3 bucket 
   new RdsDataService(config).exportDataToS3(s3,key); 
   // 3. Import query results from S3 bucket into Redshift 
    new RedshiftDataService(config).importDataFromS3(s3,key); 
   System.out.println(input); 
   return "SUCCESS"; 
 } 
}

After the data is loaded into Amazon Redshift, end users can visualize it using their preferred business intelligence tools.

Lessons learned

  • At the time of publication, the 1.5–GB memory hard limit for Lambda functions was inadequate for processing our complex workload. Step Functions gave us the flexibility to chunk our large datasets and process them in parallel, saving on costs and time.
  • In our previous implementation, we assigned each key a dedicated Lambda function along with CloudWatch rules for schedule automation. This approach proved to be inefficient and quickly became an operational burden. Previously, we processed each key sequentially, with each key adding about five minutes to the overall processing time. For example, processing three keys meant that the total processing time was three times longer. With Step Functions, the entire state machine executes in about five minutes.
  • Using DynamoDB with Step Functions gave us the flexibility to manage keys efficiently. In our previous implementations, keys were hardcoded in Lambda functions, which became difficult to manage due to frequent updates. DynamoDB is a great way to store dynamic data that changes frequently, and it works perfectly with our serverless architectures.

Conclusion

With Step Functions, we were able to fully automate the frequent configuration updates to our dataset resulting in significant cost savings, reduced risk to data errors due to system downtime, and more time for us to focus on new product development rather than support related issues. We hope that you have found the information useful and that it can serve as a jump-start to building your own ETL processes on AWS with managed AWS services.

For more information about how Step Functions makes it easy to coordinate the components of distributed applications and microservices in any workflow, see the use case examples and then build your first state machine in under five minutes in the Step Functions console.

If you have questions or suggestions, please comment below.

The Pi Towers Secret Santa Babbage

Post Syndicated from Mark Calleja original https://www.raspberrypi.org/blog/secret-santa-babbage/

Tired of pulling names out of a hat for office Secret Santa? Upgrade your festive tradition with a Raspberry Pi, thermal printer, and everybody’s favourite microcomputer mascot, Babbage Bear.

Raspberry Pi Babbage Bear Secret Santa

The name’s Santa. Secret Santa.

It’s that time of year again, when the cosiness gets turned up to 11 and everyone starts thinking about jolly fat men, reindeer, toys, and benevolent home invasion. At Raspberry Pi, we’re running a Secret Santa pool: everyone buys a gift for someone else in the office. Obviously, the person you buy for has to be picked in secret and at random, or the whole thing wouldn’t work. With that in mind, I created Secret Santa Babbage to do the somewhat mundane task of choosing gift recipients. This could’ve just been done with some names in a hat, but we’re Raspberry Pi! If we don’t make a Python-based Babbage robot wearing a jaunty hat and programmed to spread Christmas cheer, who will?

Secret Santa Babbage

Ho ho ho!

Mecha-Babbage Xmas shenanigans

The script the robot runs is pretty basic: a list of names entered as comma-separated strings is shuffled at the press of a GPIO button, then a name is popped off the end and stored as a variable. The name is matched to a photo of the person stored on the Raspberry Pi, and a thermal printer pinched from Alex’s super awesome PastyCam (blog post forthcoming, maybe) prints out the picture and name of the person you will need to shower with gifts at the Christmas party. (Well, OK — with one gift. No more than five quid’s worth. Nothing untoward.) There’s also a redo function, just in case you pick yourself: press another button and the last picked name — still stored as a variable — is appended to the list again, which is shuffled once more, and a new name is popped off the end.

Secret Santa Babbage prototyping

Prototyping!

As the build was a bit of a rush job undertaken at the request of our ‘Director of Vibe’ Emily, there are a few things I’d like to improve about this functionality that I didn’t get around to — more on that later. To add some extra holiday spirit to the project at the last minute, I used Pygame to play a WAV file of Santa’s jolly laugh while Babbage chooses a name for you. The file is included in the GitHub repo along with everything else, because ‘tis the season, etc., etc.

Secret Santa Babbage prototyping

Editor’s note: Considering these desk adornments, Mark’s Secret Santa gift-giver has a lot to go on.

Writing the code for Xmas Mecha-Babbage was fairly straightforward, though it uses some tricky bits for managing the thermal printer. You’ll need to install the drivers to make it go, as well as the CUPS package for managing the print hosting. You can find instructions for these things here, thanks to the wonderful Adafruit crew. Also, for reasons I couldn’t fathom, this will all only work on a Pi 2 and not a Pi 3, as there are some compatibility issues with the thermal printer otherwise. (I also tested the script on a Pi Zero W…no dice.)

Building a Christmassy throne

The hardest (well, fiddliest) parts of making the whole build were constructing the throne and wiring the bear. Using MakerCase, Inkscape, a bit of ingenuity, and a laser cutter, I was able to rig up a Christmassy plywood throne which has a hole through the seat so I could run the wires down from Babbage and to the Pi inside. I finished the throne by rubbing a couple of fingers of beeswax into it; as well as making the wood shine just a little bit and protecting it against getting wet, this had the added bonus of making it smell awesome.

Secret Santa Babbage inside

Next year’s iteration will be mulled wine–scented.

I next soldered two LEDs to some lengths of wire, and then ran the wires through holes at the top of the throne and down the back along a small channel I had carved with a narrow chisel to connect them to the Pi’s GPIO pins. The green LED will remain on as long as Babbage is running his program, and the red one will light up while he is processing your request. Once the red LED goes off again, the next person can have a go. I also laser-cut a final piece of wood to overlay the back of Babbage’s Xmas throne and cover the wiring a bit.

Creating a Xmas cyborg bear

Taking two 6 mm tactile buttons, I clipped the spiky metal legs off one side of each (the buttons were going into a stuffed christmas toy, after all) and soldered a length of wire to each of the remaining legs. Next, I made a small incision into Babbage with my trusty Swiss army knife (in a place that actually made me cringe a little) and fed the buttons up into his paws. At some point in this process I was standing in the office wrestling with the bear and muttering to myself, which elicited some very strange looks from my colleagues.

Secret Santa Babbage throne

Poor Babbage…

One thing to note here is to make sure the wires remain attached at the solder points while you push them up into Babbage’s paws. The first time I tried it, I snapped one of my connections and had to start again. It helped to remove some stuffing like a tunnel and then replace it afterward. Moreover, you can use your fingertip to support the joints as you poke the wire in. Finally, a couple of squirts of hot glue to keep Babbage’s furry cheeks firmly on the seat, and done!

Secret Santa Babbage

Next year: Game of Thrones–inspired candy cane throne

The Secret Santa Babbage masterpiece

The whole build process was the perfect holiday mix of cheerful and macabre, and while getting the thermal printer to work was a little time-consuming, the finished product definitely raised some smiles around the office and added a bit of interesting digital flavour to a staid office tradition. And it also helped people who are new to the office or from other branches of the Foundation to know for whom they will be buying a gift.

Secret Santa Babbage

Ready to dispense Christmas cheer!

There are a few ways in which I’ll polish this project before next year, such as having the script write the names to external text files to create a record that will persist in case of a reboot, and maybe having Secret Santa Babbage play you a random Christmas carol when you squeeze his paw instead of just laughing merrily every time. (I also thought about adding electric shocks for those people who are on the naughty list, but HR said no. Bah, humbug!)

Make your own

The code and laser cut plans for the whole build are available here. If you plan to make your own, let us know which stuffed toy you will be turning into a Secret Santa cyborg! And if you’ve been working on any other Christmas-themed Raspberry Pi projects, we’d like to see those too, so tag us on social media to share the festive maker cheer.

The post The Pi Towers Secret Santa Babbage appeared first on Raspberry Pi.

Glenn’s Take on re:Invent 2017 – Part 3

Post Syndicated from Glenn Gore original https://aws.amazon.com/blogs/architecture/glenns-take-on-reinvent-2017-part-3/

Glenn Gore here, Chief Architect for AWS. I was in Las Vegas last week — with 43K others — for re:Invent 2017. I checked in to the Architecture blog here and here with my take on what was interesting about some of the bigger announcements from a cloud-architecture perspective.

In the excitement of so many new services being launched, we sometimes overlook feature updates that, while perhaps not as exciting as Amazon DeepLens, have significant impact on how you architect and develop solutions on AWS.

Amazon DynamoDB is used by more than 100,000 customers around the world, handling over a trillion requests every day. From the start, DynamoDB has offered high availability by natively spanning multiple Availability Zones within an AWS Region. As more customers started building and deploying truly-global applications, there was a need to replicate a DynamoDB table to multiple AWS Regions, allowing for read/write operations to occur in any region where the table was replicated. This update is important for providing a globally-consistent view of information — as users may transition from one region to another — or for providing additional levels of availability, allowing for failover between AWS Regions without loss of information.

There are some interesting concurrency-design aspects you need to be aware of and ensure you can handle correctly. For example, we support the “last writer wins” reconciliation where eventual consistency is being used and an application updates the same item in different AWS Regions at the same time. If you require strongly-consistent read/writes then you must perform all of your read/writes in the same AWS Region. The details behind this can be found in the DynamoDB documentation. Providing a globally-distributed, replicated DynamoDB table simplifies many different use cases and allows for the logic of replication, which may have been pushed up into the application layers to be simplified back down into the data layer.

The other big update for DynamoDB is that you can now back up your DynamoDB table on demand with no impact to performance. One of the features I really like is that when you trigger a backup, it is available instantly, regardless of the size of the table. Behind the scenes, we use snapshots and change logs to ensure a consistent backup. While backup is instant, restoring the table could take some time depending on its size and ranges — from minutes to hours for very large tables.

This feature is super important for those of you who work in regulated industries that often have strict requirements around data retention and backups of data, which sometimes limited the use of DynamoDB or required complex workarounds to implement some sort of backup feature in the past. This often incurred significant, additional costs due to increased read transactions on their DynamoDB tables.

Amazon Simple Storage Service (Amazon S3) was our first-released AWS service over 11 years ago, and it proved the simplicity and scalability of true API-driven architectures in the cloud. Today, Amazon S3 stores trillions of objects, with transactional requests per second reaching into the millions! Dealing with data as objects opened up an incredibly diverse array of use cases ranging from libraries of static images, game binary downloads, and application log data, to massive data lakes used for big data analytics and business intelligence. With Amazon S3, when you accessed your data in an object, you effectively had to write/read the object as a whole or use the range feature to retrieve a part of the object — if possible — in your individual use case.

Now, with Amazon S3 Select, an SQL-like query language is used that can work with delimited text and JSON files, as well as work with GZIP compressed files. We don’t support encryption during the preview of Amazon S3 Select.

Amazon S3 Select provides two major benefits:

  • Faster access
  • Lower running costs

Serverless Lambda functions, where every millisecond matters when you are being charged, will benefit greatly from Amazon S3 Select as data retrieval and processing of your Lambda function will experience significant speedups and cost reductions. For example, we have seen 2x speed improvement and 80% cost reduction with the Serverless MapReduce code.

Other AWS services such as Amazon Athena, Amazon Redshift, and Amazon EMR will support Amazon S3 Select as well as partner offerings including Cloudera and Hortonworks. If you are using Amazon Glacier for longer-term data archival, you will be able to use Amazon Glacier Select to retrieve a subset of your content from within Amazon Glacier.

As the volume of data that can be stored within Amazon S3 and Amazon Glacier continues to scale on a daily basis, we will continue to innovate and develop improved and optimized services that will allow you to work with these magnificently-large data sets while reducing your costs (retrieval and processing). I believe this will also allow you to simplify the transformation and storage of incoming data into Amazon S3 in basic, semi-structured formats as a single copy vs. some of the duplication and reformatting of data sometimes required to do upfront optimizations for downstream processing. Amazon S3 Select largely removes the need for this upfront optimization and instead allows you to store data once and process it based on your individual Amazon S3 Select query per application or transaction need.

Thanks for reading!

Glenn contemplating why CSV format is still relevant in 2017 (Italy).

GoMovies/123Movies Launches Anime Streaming Site

Post Syndicated from Ernesto original https://torrentfreak.com/gomovies123movies-launches-anime-streaming-site-171204/

Pirate video streaming sites are booming. Their relative ease of use through on-demand viewing makes them a viable alternative to P2P file-sharing, which traditionally dominated the piracy arena.

The popular movie streaming site GoMovies, formerly known as 123movies, is one of the most-used streaming sites. Despite the rebranding and several domain changes, it has built a steady base of millions of users over the past year and a half.

And it’s not done yet, we learn today.

The site, currently operating from the Gostream.is domain name, recently launched a new spinoff targeting anime fans. Animehub.to is currently promoted on GoMovies and the site’s operators aim to turn it into the leading streaming site for anime content.

Animehub.to

Someone connected to GoMovies told us that they’ve received a lot of requests from users to add anime content. Anime has traditionally been a large niche on file-sharing sites and the same is true on streaming platforms.

Technically speaking, GoMovies could have easily filled up the original site with anime content, but the owners prefer a different outlet.

With a separate anime site, they hope to draw in more visitors, TorrentFreak was told by an insider. For one, this makes it possible to rank better in search engines. It also allows the operators to cater specifically to the anime audience, with anime specific categories and release schedules.

Anime copyright holders will not be pleased with the new initiative, that’s for sure, but GoMovies is not new to legal pressure.

Earlier this year the US Ambassador to Vietnam called on the local Government to criminally prosecute people behind 123movies, the previous iteration of the site. In addition, the MPAA reported the site to the US Government in its recent overview of notorious pirate sites.

Pressure or not, it appears that GoMovies has no intention of slowing down or changing its course, although we’ve heard that yet another rebranding is on the horizon.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Could a Single Copyright Complaint Kill Your Domain?

Post Syndicated from Andy original https://torrentfreak.com/could-a-single-copyright-complaint-kill-your-domain-171203/

It goes without saying that domain names are a crucial part of any site’s infrastructure. Without domains, sites aren’t easily findable and when things go wrong, the majority of web users could be forgiven for thinking that they no longer exist.

That was the case last week when Canada-based mashup site Sowndhaus suddenly found that its domain had been rendered completely useless. As previously reported, the site’s domain was suspended by UK-based registrar DomainBox after it received a copyright complaint from the IFPI.

There are a number of elements to this story, not least that the site’s operators believe that their project is entirely legal.

“We are a few like-minded folks from the mashup community that were tired of doing the host dance – new sites welcome us with open arms until record industry pressure becomes too much and they mass delete and ban us,” a member of the Sowndhaus team informs TF.

“After every mass deletion there are a wave of producers that just retire and their music is lost forever. We decided to make a more permanent home for ourselves and Canada’s Copyright Modernization Act gave us the opportunity to do it legally.
We just want a small quiet corner of the internet where we can make music without being criminalized. It seems insane that I even have to say that.”

But while these are all valid concerns for the Sowndhaus community, there is a bigger picture here. There is absolutely no question that sites like YouTube and Soundcloud host huge libraries of mashups, yet somehow they hang on to their domains. Why would DomainBox take such drastic action? Is the site a real menace?

“The IFPI have sent a few standard DMCA takedown notices [to Sowndhaus, indirectly], each about a specific track or tracks on our server, asking us to remove them and any infringing activity. Every track complained about has been transformative, either a mashup or a remix and in a couple of cases cover versions,” the team explains.

But in all cases, it appears that IFPI and its agents didn’t take the time to complain to the site first. They instead went for the site’s infrastructure.

“[IFPI] have never contacted us directly, even though we have a ‘report copyright abuse’ feature on our site and a dedicated copyright email address. We’ve only received forwarded emails from our host and domain registrar,” the site says.

Sowndhaus believes that the event that led to the domain suspension was caused by a support ticket raised by the “RiskIQ Incident Response Team”, who appear to have been working on behalf of IFPI.

“We were told by DomainBox…’Please remove the unlawful content from your website, or the domain will be suspended. Please reply within the next 5 working days to ensure the request was actioned’,” Sowndhaus says.

But they weren’t given five days, or even one. DomainBox chose to suspend the Sowndhaus.com domain name immediately, rendering the site inaccessible and without even giving the site a chance to respond.

“They didn’t give us an option to appeal the decision. They just took the IFPI’s word that the files were unlawful and must be removed,” the site informs us.

Intrigued at why DomainBox took the nuclear option, TorrentFreak sent several emails to the company but each time they went unanswered. We also sent emails to Mesh Digital Ltd, DomainBox’s operator, but they were given the same treatment.

We wanted to know on what grounds the registrar suspended the domain but perhaps more importantly, we wanted to know if the company is as aggressive as this with its other customers.

To that end we posed a question: If DomainBox had been entrusted with the domains of YouTube or Soundcloud, would they have acted in the same manner? We can’t put words in their mouth but it seems likely that someone in the company would step in to avoid a PR disaster on that scale.

Of course, both YouTube and Soundcloud comply with the law by taking down content when it infringes someone’s rights. It’s a position held by Sowndhaus too, even though they do not operate in the United States.

“We comply fully with the Copyright Act (Canada) and have our own policy of removing any genuinely infringing content,” the site says, adding that users who infringe are banned from the platform.

While there has never been any suggestion that IFPI or its agents asked for Sowndhaus’ domain to be suspended, it’s clear that DomainBox made a decision to do just that. In some cases that might have been warranted, but registrars should definitely aim for a clear, transparent and fair process, so that the facts can be reviewed and appropriate action taken.

It’s something for people to keep in mind when they register a domain in future.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Google Says It Can’t Filter Pirated Content Proactively

Post Syndicated from Ernesto original https://torrentfreak.com/google-says-it-cant-filter-pirated-content-proactively-171202/

Over the past few years the entertainment industries have repeatedly asked Google to step up its game when it comes to its anti-piracy efforts.

These calls haven’t fallen on deaf ears and Google has steadily implemented various anti-piracy measures in response.

Still, that is not enough. At least, according to several prominent music industry groups who are advocating a ‘Take Down, Stay Down’ approach.

Currently, Google mostly responds to takedown requests that are sent in by copyright holders. The search engine deletes the infringing results and demotes the domains of frequent infringers. However, the same content often reappears on other sites, or in another location on the same site.

Earlier this year a group of prominent music groups stated that the present situation forces rightsholders to participate in a never-ending game of whack-a-mole which doesn’t fix the underlying problem. Instead, it results in a “frustrating, burdensome and ultimately ineffective takedown process.”

While Google understands the rationale behind the complaints, the company doesn’t believe in a more proactive solution. This was reiterated by Matt Brittin, President of EMEA Business & Operations at Google, during the Royal Television Society Event in London this week.

“The music industry has been quite tough with us on this. They’d like us proactively to know this stuff. It’s just not possible in this industry,” Brittin said.

That doesn’t mean that Google is sitting still. Brittin stresses that the company has invested millions in anti-piracy tools. That said, there can always be room for improvement.

“What we’ve tried to do is build tools that allow them to do that at scale easily and that work all together … I’m sure there are places where we could do better. There are teams and millions of dollars invested in this.

“Combatting bad acts and piracy is obviously very important to us,” Brittin added.

While Google sees no room for proactive filtering in search results, music industry insiders believe it’s possible.

Ideally, they want some type of automated algorithm or technology that removes infringing results without a targeted DMCA notice. This could be similar to YouTube’s Content-ID system, or the hash filtering mechanisms Google Drive employs, for example.

For now, however, there’s no sign that Google will go beyond the current takedown notice approach, at least for search. A ‘Take Down, Stay Down’ mechanism wouldn’t “understand” when content is authorized or not, the company previously noted.

And so, the status quo is likely to remain, at least for now.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Collect Data Statistics Up to 5x Faster by Analyzing Only Predicate Columns with Amazon Redshift

Post Syndicated from George Caragea original https://aws.amazon.com/blogs/big-data/collect-data-statistics-up-to-5x-faster-by-analyzing-only-predicate-columns-with-amazon-redshift/

Amazon Redshift is a fast, fully managed, petabyte-scale data warehousing service that makes it simple and cost-effective to analyze all of your data. Many of our customers—including Boingo Wireless, Scholastic, Finra, Pinterest, and Foursquare—migrated to Amazon Redshift and achieved agility and faster time to insight, while dramatically reducing costs.

Query optimization and the need for accurate estimates

When a SQL query is submitted to Amazon Redshift, the query optimizer is in charge of generating all the possible ways to execute that query, and picking the fastest one. This can mean evaluating the cost of thousands, if not millions, of different execution plans.

The plan cost is calculated based on estimates of the data characteristics. For example, the characteristics could include the number of rows in each base table, the average width of a variable-length column, the number of distinct values in a column, and the most common values in a column. These estimates (or “statistics”) are computed in advance by running an ANALYZE command, and stored in the system catalog.

How do the query optimizer and ANALYZE work together?

An ideal scenario is to run ANALYZE after every ETL/ingestion job. This way, when running your workload, the query optimizer can use up-to-date data statistics, and choose the most optimal execution plan, given the updates.

However, running the ANALYZE command can add significant overhead to the data ingestion scripts. This can lead to customers not running ANALYZE on their data, and using default or stale estimates. The end result is usually the optimizer choosing a suboptimal execution plan that runs for longer than needed.

Analyzing predicate columns only

When you run a SQL query, the query optimizer requests statistics only on columns used in predicates in the SQL query (join predicates, filters in the WHERE clause and GROUP BY clauses). Consider the following query:

SELECT Avg(salary), 
       Min(hiredate), 
       deptname 
FROM   emp 
WHERE  state = 'CA' 
GROUP  BY deptname; 

In the query above, the optimizer requests statistics only on columns ‘state’ and ‘deptname’, but not on ‘salary’ and ‘hiredate’. If present, statistics on columns ‘salary’ and ‘hiredate’ are ignored, as they do not impact the cost of the execution plans considered.

Based on the optimizer functionality described earlier, the Amazon Redshift ANALYZE command has been updated to optionally collect information only about columns used in previous queries as part of a filter, join condition or a GROUP BY clause, and columns that are part of distribution or sort keys (predicate columns). There’s a recently introduced option for the ANALYZE command that only analyzes predicate columns:

ANALYZE <table name> PREDICATE COLUMNS;

By having Amazon Redshift collect information about predicate columns automatically, and analyzing those columns only, you’re able to reduce the time to run ANALYZE. For example, during the execution of the 99 queries in the TPC-DS workload, only 203 out of the 424 total columns are predicate columns (approximately 48%). By analyzing only the predicate columns for such a workload, the execution time for running ANALYZE can be significantly reduced.

From my experience in the data warehousing space, I have observed that about 20% of columns in a typical use case are marked predicate. In such a case, running ANALYZE PREDICATE COLUMNS can lead to a speedup of up to 5x relative to a full ANALYZE run.

If no information on predicate columns exists in the system (for example, a new table that has not been queried yet), ANALYZE PREDICATE COLUMNS collects statistics on all the columns. When queries on the table are run, Amazon Redshift collects information about predicate column usage, and subsequent runs of ANALYZE PREDICATE COLUMNS only operates on the predicate columns.

If the workload is relatively stable, and the set of predicate columns does not expand continuously over time, I recommend replacing all occurrences of the ANALYZE command with ANALYZE PREDICATE COLUMNS commands in your application and data ingestion code.

Using the Analyze/Vacuum utility

Several AWS customers are using the Analyze/Vacuum utility from the Redshift-Utils package to manage and automate their maintenance operations. By passing the –predicate-cols option to the Analyze/Vacuum utility, you can enable it to use the ANALYZE PREDICATE COLUMNS feature, providing you with the significant changes in overhead in a completely seamless manner.

Enhancements to logging for ANALYZE operations

When running ANALYZE with the PREDICATE COLUMNS option, the type of analyze run (Full vs Predicate Column), as well as information about the predicate columns encountered, is logged in the stl_analyze view:

SELECT status, 
       starttime, 
       prevtime, 
       num_predicate_cols, 
       num_new_predicate_cols 
FROM   stl_analyze;
     status   |    starttime        |   prevtime          | pred_cols | new_pred_cols
--------------+---------------------+---------------------+-----------+---------------
 Full         | 2017-11-09 01:15:47 |                     |         0 |             0
 PredicateCol | 2017-11-09 01:16:20 | 2017-11-09 01:15:47 |         2 |             2

AWS also enhanced the pg_statistic catalog table with two new pieces of information: the time stamp at which a column was marked as “predicate”, and the time stamp at which the column was last analyzed.

The Amazon Redshift documentation provides a view that allows a user to easily see which columns are marked as predicate, when they were marked as predicate, and when a column was last analyzed. For example, for the emp table used above, the output of the view could be as follows:

 SELECT col_name, 
       is_predicate, 
       first_predicate_use, 
       last_analyze 
FROM   predicate_columns 
WHERE  table_name = 'emp';

 col_name | is_predicate | first_predicate_use  |        last_analyze
----------+--------------+----------------------+----------------------------
 id       | f            |                      | 2017-11-09 01:15:47
 name     | f            |                      | 2017-11-09 01:15:47
 deptname | t            | 2017-11-09 01:16:03  | 2017-11-09 01:16:20
 age      | f            |                      | 2017-11-09 01:15:47
 salary   | f            |                      | 2017-11-09 01:15:47
 hiredate | f            |                      | 2017-11-09 01:15:47
 state    | t            | 2017-11-09 01:16:03  | 2017-11-09 01:16:20

Conclusion

After loading new data into an Amazon Redshift cluster, statistics need to be re-computed to guarantee performant query plans. By learning which column statistics are actually being used by the customer’s workload and collecting statistics only on those columns, Amazon Redshift is able to significantly reduce the amount of time needed for table maintenance during data loading workflows.


Additional Reading

Be sure to check out the Top 10 Tuning Techniques for Amazon Redshift, and the Advanced Table Design Playbook: Distribution Styles and Distribution Keys.


About the Author

George Caragea is a Senior Software Engineer with Amazon Redshift. He has been working on MPP Databases for over 6 years and is mainly interested in designing systems at scale. In his spare time, he enjoys being outdoors and on the water in the beautiful Bay Area and finishing the day exploring the rich local restaurant scene.

 

 

Implementing Canary Deployments of AWS Lambda Functions with Alias Traffic Shifting

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/implementing-canary-deployments-of-aws-lambda-functions-with-alias-traffic-shifting/

This post courtesy of Ryan Green, Software Development Engineer, AWS Serverless

The concepts of blue/green and canary deployments have been around for a while now and have been well-established as best-practices for reducing the risk of software deployments.

In a traditional, horizontally scaled application, copies of the application code are deployed to multiple nodes (instances, containers, on-premises servers, etc.), typically behind a load balancer. In these applications, deploying new versions of software to too many nodes at the same time can impact application availability as there may not be enough healthy nodes to service requests during the deployment. This aggressive approach to deployments also drastically increases the blast radius of software bugs introduced in the new version and does not typically give adequate time to safely assess the quality of the new version against production traffic.

In such applications, one commonly accepted solution to these problems is to slowly and incrementally roll out application software across the nodes in the fleet while simultaneously verifying application health (canary deployments). Another solution is to stand up an entirely different fleet and weight (or flip) traffic over to the new fleet after verification, ideally with some production traffic (blue/green). Some teams deploy to a single host (“one box environment”), where the new release can bake for some time before promotion to the rest of the fleet. Techniques like this enable the maintainers of complex systems to safely test in production while minimizing customer impact.

Enter Serverless

There is somewhat of an impedance mismatch when mapping these concepts to a serverless world. You can’t incrementally deploy your software across a fleet of servers when there are no servers!* In fact, even the term “deployment” takes on a different meaning with functions as a service (FaaS). In AWS Lambda, a “deployment” can be roughly modeled as a call to CreateFunction, UpdateFunctionCode, or UpdateAlias (I won’t get into the semantics of whether updating configuration counts as a deployment), all of which may affect the version of code that is invoked by clients.

The abstractions provided by Lambda remove the need for developers to be concerned about servers and Availability Zones, and this provides a powerful opportunity to greatly simplify the process of deploying software.
*Of course there are servers, but they are abstracted away from the developer.

Traffic shifting with Lambda aliases

Before the release of traffic shifting for Lambda aliases, deployments of a Lambda function could only be performed in a single “flip” by updating function code for version $LATEST, or by updating an alias to target a different function version. After the update propagates, typically within a few seconds, 100% of function invocations execute the new version. Implementing canary deployments with this model required the development of an additional routing layer, further adding development time, complexity, and invocation latency.
While rolling back a bad deployment of a Lambda function is a trivial operation and takes effect near instantaneously, deployments of new versions for critical functions can still be a potentially nerve-racking experience.

With the introduction of alias traffic shifting, it is now possible to trivially implement canary deployments of Lambda functions. By updating additional version weights on an alias, invocation traffic is routed to the new function versions based on the weight specified. Detailed CloudWatch metrics for the alias and version can be analyzed during the deployment, or other health checks performed, to ensure that the new version is healthy before proceeding.

Note: Sometimes the term “canary deployments” refers to the release of software to a subset of users. In the case of alias traffic shifting, the new version is released to some percentage of all users. It’s not possible to shard based on identity without adding an additional routing layer.

Examples

The simplest possible use of a canary deployment looks like the following:

# Update $LATEST version of function
aws lambda update-function-code --function-name myfunction ….

# Publish new version of function
aws lambda publish-version --function-name myfunction

# Point alias to new version, weighted at 5% (original version at 95% of traffic)
aws lambda update-alias --function-name myfunction --name myalias --routing-config '{"AdditionalVersionWeights" : {"2" : 0.05} }'

# Verify that the new version is healthy
…
# Set the primary version on the alias to the new version and reset the additional versions (100% weighted)
aws lambda update-alias --function-name myfunction --name myalias --function-version 2 --routing-config '{}'

This is begging to be automated! Here are a few options.

Simple deployment automation

This simple Python script runs as a Lambda function and deploys another function (how meta!) by incrementally increasing the weight of the new function version over a prescribed number of steps, while checking the health of the new version. If the health check fails, the alias is rolled back to its initial version. The health check is implemented as a simple check against the existence of Errors metrics in CloudWatch for the alias and new version.

GitHub aws-lambda-deploy repo

Install:

git clone https://github.com/awslabs/aws-lambda-deploy
cd aws-lambda-deploy
export BUCKET_NAME=[YOUR_S3_BUCKET_NAME_FOR_BUILD_ARTIFACTS]
./install.sh

Run:

# Rollout version 2 incrementally over 10 steps, with 120s between each step
aws lambda invoke --function-name SimpleDeployFunction --log-type Tail --payload \
  '{"function-name": "MyFunction",
  "alias-name": "MyAlias",
  "new-version": "2",
  "steps": 10,
  "interval" : 120,
  "type": "linear"
  }' output

Description of input parameters

  • function-name: The name of the Lambda function to deploy
  • alias-name: The name of the alias used to invoke the Lambda function
  • new-version: The version identifier for the new version to deploy
  • steps: The number of times the new version weight is increased
  • interval: The amount of time (in seconds) to wait between weight updates
  • type: The function to use to generate the weights. Supported values: “linear”

Because this runs as a Lambda function, it is subject to the maximum timeout of 5 minutes. This may be acceptable for many use cases, but to achieve a slower rollout of the new version, a different solution is required.

Step Functions workflow

This state machine performs essentially the same task as the simple deployment function, but it runs as an asynchronous workflow in AWS Step Functions. A nice property of Step Functions is that the maximum deployment timeout has now increased from 5 minutes to 1 year!

The step function incrementally updates the new version weight based on the steps parameter, waiting for some time based on the interval parameter, and performing health checks between updates. If the health check fails, the alias is rolled back to the original version and the workflow fails.

For example, to execute the workflow:

export STATE_MACHINE_ARN=`aws cloudformation describe-stack-resources --stack-name aws-lambda-deploy-stack --logical-resource-id DeployStateMachine --output text | cut  -d$'\t' -f3`

aws stepfunctions start-execution --state-machine-arn $STATE_MACHINE_ARN --input '{
  "function-name": "MyFunction",
  "alias-name": "MyAlias",
  "new-version": "2",
  "steps": 10,
  "interval": 120,
  "type": "linear"}'

Getting feedback on the deployment

Because the state machine runs asynchronously, retrieving feedback on the deployment requires polling for the execution status using DescribeExecution or implementing an asynchronous notification (using SNS or email, for example) from the Rollback or Finalize functions. A CloudWatch alarm could also be created to alarm based on the “ExecutionsFailed” metric for the state machine.

A note on health checks and observability

Weighted rollouts like this are considerably more successful if the code is being exercised and monitored continuously. In this example, it would help to have some automation continuously invoking the alias and reporting metrics on these invocations, such as client-side success rates and latencies.

The absence of Lambda Errors metrics used in these examples can be misleading if the function is not getting invoked. It’s also recommended to instrument your Lambda functions with custom metrics, in addition to Lambda’s built-in metrics, that can be used to monitor health during deployments.

Extensibility

These examples could be easily extended in various ways to support different use cases. For example:

  • Health check implementations: CloudWatch alarms, automatic invocations with payload assertions, querying external systems, etc.
  • Weight increase functions: Exponential, geometric progression, single canary step, etc.
  • Custom success/failure notifications: SNS, email, CI/CD systems, service discovery systems, etc.

Traffic shifting with SAM and CodeDeploy

Using the Lambda UpdateAlias operation with additional version weights provides a powerful primitive for you to implement custom traffic shifting solutions for Lambda functions.

For those not interested in building custom deployment solutions, AWS CodeDeploy provides an intuitive turn-key implementation of this functionality integrated directly into the Serverless Application Model. Traffic-shifted deployments can be declared in a SAM template, and CodeDeploy manages the function rollout as part of the CloudFormation stack update. CloudWatch alarms can also be configured to trigger a stack rollback if something goes wrong.

i.e.

MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    FunctionName: MyFunction
    AutoPublishAlias: MyFunctionInvokeAlias
    DeploymentPreference:
      Type: Linear10PercentEvery1Minute
      Role:
        Fn::GetAtt: [ DeploymentRole, Arn ]
      Alarms:
       - { Ref: MyFunctionErrorsAlarm }
...

For more information about using CodeDeploy with SAM, see Automating Updates to Serverless Apps.

Conclusion

It is often the simple features that provide the most value. As I demonstrated in this post, serverless architectures allow the complex deployment orchestration used in traditional applications to be replaced with a simple Lambda function or Step Functions workflow. By allowing invocation traffic to be easily weighted to multiple function versions, Lambda alias traffic shifting provides a simple but powerful feature that I hope empowers you to easily implement safe deployment workflows for your Lambda functions.