Tag Archives: million

CoderDojo: 2000 Dojos ever

Post Syndicated from Giustina Mizzoni original https://www.raspberrypi.org/blog/2000-dojos-ever/

Every day of the week, we verify new Dojos all around the world, and each Dojo is championed by passionate volunteers. Last week, a huge milestone for the CoderDojo community went by relatively unnoticed: in the history of the movement, more than 2000 Dojos have now been verified!

CoderDojo banner — 2000 Dojos

2000 Dojos

This is a phenomenal achievement for a movement that’s just six years old and powered by volunteers. Presently, there are more than 1650 active Dojos running weekly, fortnightly, or monthly, and all of them are free for participants — for example, the Dojos run by Joel Bayubasire in Kampala, Uganda:

Joel Bayubasire with Ninjas at his Ugandan Dojo — 2000 Dojos

Empowering refugee children

This week, Joel set up his second Dojo and verified it on our global map. Joel is a Congolese refugee living in Kampala, Uganda, where he is currently completing his PhD in Economics at Madison International Institute and Business School.

Joel understands first-hand the challenges faced by refugees who were forced to leave their country due to war or conflict. Uganda is currently hosting more than 1.2 million refugees, 60% of whom are children (World Bank, 2017). As refugees, children are only allowed to attend local schools until the age of 12. This results in lower educational attainment, which will likely affect their future employment prospects.

Two girls at a laptop. Joel Bayubasire CoderDojo — 2000 Dojos

Joel has the motivation to overcome these challenges, because he understands the power of education. Therefore, he initiated a number of community-based activities to provide educational opportunities for refugee children. As part of this, he founded his first Dojo earlier in the year, with the aim of giving these children a chance to compete in today’s global knowledge-based economy.

Two boys at a laptop. Joel Bayubasire CoderDojo — 2000 Dojos

Aware that securing volunteer mentors would be a challenge, Joel trained eight young people from the community to become youth mentors to their peers. He explains:

I believe that the mastery of computer coding allows talented young people to thrive professionally and enables them to not only be consumers but creators of the interconnected world of today!

Based on the success of Joel’s first Dojo, he has now expanded the CoderDojo initiative in his community; his plan is to provide computer science training for more than 300 refugee youths in Kampala by 2019. If you’d like to learn more about Joel’s efforts, head to this website.

Join the movement

If you are interested in creating opportunities for the young people in your community, then join the growing CoderDojo movement — you can volunteer to start a Dojo or to support an existing one today!

The post CoderDojo: 2000 Dojos ever appeared first on Raspberry Pi.

ETTV: How an Upload Bot Became a Pirate Hero

Post Syndicated from Ernesto original https://torrentfreak.com/ettv-how-an-upload-bot-became-a-pirate-hero-171210/

Earlier this year, the torrent community was hit hard when another major torrent site suddenly shut its doors.

Just a few months after celebrating its tenth anniversary, ExtraTorrent’s operator threw in the towel. While an official explanation was never provided, it’s likely that he was pressured into making this decision.

The ExtraTorrent site was a safe harbor for millions of regular users, who became homeless overnight. But it was more than that. It was also the birthplace of several popular releasers and distribution groups.

ETTV and ETHD turned into well-known brands themselves. While the “ET” in their names is derived from ExtraTorrent, the groups have shared TV and movie torrents on several other large torrent sites, and they still do. They even have their own site now.

With millions of people sharing their uploads every week, they’ve become icons and heroes to many. But how did this all come to be? We sat down with the team, virtually, to find out more.

“The idea for ettv/ethd was brought up by ExtraTorrent users,” the ETTV team says.

There was demand for a new group that would upload scene releases faster than the original EZTV, which was the dominant TV-torrent distribution group around 2011, when it all started.

“At the time the real EZTV was still active. They released stuff hours after it was released from the scene, leaving sites to wait very long for shows to arrive in public. In no way was ettv intended for competitive purposes. We had a lot of respect for Nova and the original EZTV operators.”

While ETTV is regularly referred to as a “group,” it was a one-person operation initially. Just a guy with a seedbox, grabbing scene releases and posting them on torrent sites.

It didn’t take long before people got wind of the new distribution ‘group,’ and interest in the torrents quickly exploded. This meant that a single seedbox was no longer sufficient, but help was not far away.

“It started off with one operator and a seedbox, but it became popular too fast. That’s when former ExtraTorrent owners stepped in to give ETTV the support and funding it needed to keep the story going.”

One of the earliest ETTV uploads on ExtraTorrent

In addition to extra disk space and bandwidth, the team itself expanded as well. At its height, a handful of people were working with the group. However, as things became more and more automated, this number shrank again.

What many people don’t realize is that ETTV and ETHD are mostly run by lines of code. The entire distribution process is automated and requires minimal intervention from the people behind it.

“Ettv/Ethd is a bot, it doesn’t require human attention. It grabs what you tell the script to,” the team tells us.

The bot is set up to grab the latest copies of predefined shows from private servers where the latest scene releases are posted. These are transferred to the seedbox and the torrents are then pushed out to the public – on ETTV.tv, but also on The Pirate Bay and elsewhere. Everything is automated.

Even most of the maintenance is taken care of by the ‘bot’ itself. When disk space is running out, older content is purged, allowing fresh releases to come through.

“The only persons involve with the bots are the bill payers of our new home ettv.tv. All they do check bot logs to see if it has any errors and correct them,” the team explains.

One problem that couldn’t be easily solved with some code was the shutdown of ExtraTorrent. While the bills for the seedboxes were paid in advance until the end of 2017, the groups had to find a new home.

“The shutdown of ExtraTorrent didn’t affect the bots from running, it just left ettv/ethd homeless and caused fans to lose their way trying to find us. Not many knew where else we uploaded or didn’t like the other sites we uploaded to.”

After a few months had passed it became clear that they were not going anywhere. Quite the contrary, they started their very own site, ETTV.tv, where all the latest releases are published.

ETTV.tv

In the near future, the team will focus on turning the site into a new home for its followers. Just a few weeks ago it launched a new release “tag,” ETMovies, which specializes in lower resolution films with a smaller file size, for example.

“We recently introduced ETMovies which is basically for SD Movies, other than that the only plan ettv/ethd has is to give a home to the members that suffered from the sudden shut down of ExtraTorrent.”

Just this week, the site also expanded its reach by adding new categories such as music, games, software, and books, where approved uploaders will publish content.

While they are doing their best to keep the site up and running, it’s not a given that ETTV will be around forever. As long as there are sufficient funds and no concrete legal pressure, they might be. But if recent history has shown us anything, it’s that there are no guarantees.

“No one is here seeking to be a millionaire, if the traffic pays the bills we keep going, if not then all we can say is (sorry we tried) we will not be the heroes that saved the day.

“Again and again, the troublesome history of torrent sites is clear. It’s a war no site owner can win. If we are ever in danger, we will choose freedom. It’s not like followers can bail you out if the worst were to happen,” the ETTV team concludes.

For now, however, the bot keeps on running.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

CrimeStoppers Campaign Targets Pirate Set-Top Boxes & Their Users

Post Syndicated from Andy original https://torrentfreak.com/crimestoppers-campaign-targets-pirate-set-top-boxes-their-users-171209/

While many people might believe CrimeStoppers to be an official extension of the police in the UK, the truth is a little more subtle.

CrimeStoppers is a charity that operates a service through which members of the public can report crime anonymously, either using a dedicated phone line or via a website. Callers are not required to give their name, meaning that for those concerned about reprisals or becoming involved in a case for other sensitive reasons, it’s the perfect buffer between them and the authorities.

The people at CrimeStoppers deal with all kinds of crime but perhaps a little surprisingly, they’ve just got involved in the set-top box controversy in the UK.

“Advances in technology have allowed us to enjoy on-screen entertainment in more ways than ever before, with ever increasing amounts of exciting and original content,” the CrimeStoppers campaign begins.

“However, some people are avoiding paying for this content by using modified streaming hardware devices, like a set-top box or stick, in conjunction with software such as illegal apps or add-ons, or illegal mobile apps which allow them to watch new movie releases, TV that hasn’t yet aired, and subscription sports channels for free.”

The campaign has been launched in partnership with the Intellectual Property Office and unnamed “industry partners”. Who these companies are isn’t revealed but given the standard messages being portrayed by the likes of ACE, Premier League and Federation Against Copyright Theft lately, it wouldn’t be a surprise if some or all of them were involved.

Those messages are revealed in a series of four video ads, each taking a different approach towards discouraging the public from using devices loaded with pirate software.

The first video clearly targets the consumer, dispelling the myth that watching pirate video isn’t against the law. It is, that’s not in any doubt, but from the constant tone of the video, one could be forgiven for thinking that it’s an extremely serious crime rather than something which is likely to be a civil matter, if anything at all.

It also warns people who are configuring and selling pirate devices that they are breaking the law. Again, this is absolutely true, but this activity is clearly several orders of magnitude more serious than simply viewing. The video blurs the boundaries for what appears to be dramatic effect, however.

Selling and watching is illegal

The second video is all about demonizing the people and groups who may offer set-top boxes to the public.

Instead of portraying the hundreds of “cottage industry” suppliers behind many set-top box sales in the UK, the CrimeStoppers video paints a picture of dark organized crime being the main driver. By buying from these people, the charity warns, criminals are being welcomed in.

“It is illegal. You could also be helping to fund organized crime and bringing it into your community,” the video warns.

Are you funding organized crime?

The third video takes another approach, warning that set-top boxes have few if any parental controls. This could lead to children being exposed to inappropriate content, the charity warns.

“What are your children watching? Does it worry you?” the video asks.

Of course, the same can be said about the Internet, period. Web browsers don’t filter what content children have access to unless parents take pro-active steps to configure special services or software for the purpose.

There’s always the option to supervise children, of course, but Netflix is probably a safer option for those who prefer a hands-off approach. It’s also considerably more expensive, a fact that won’t have escaped users of these devices.

Got kids? Take care….

Finally, video four picks up a theme that’s becoming increasingly common in anti-piracy campaigns – malware and identity theft.

“Why risk having your identity stolen or your bank account or home network hacked. If you access entertainment or sports using dodgy streaming devices or apps, or illegal addons for Kodi, you are increasing the risks,” the ad warns.

Danger….Danger….

Perhaps of most interest is that this entire campaign, which almost certainly has Big Media behind the scenes in advisory and financial capacities, barely mentions the entertainment industries at all.

Indeed, the success of the whole campaign hinges on people worrying about the supposed ill effects of illicit streaming on them personally and then feeling persuaded to inform on suppliers and others involved in the chain.

“Know of someone supplying or promoting these dodgy devices or software? It is illegal. Call us now and help stop crime in your community,” the videos warn.

That CrimeStoppers has taken on this campaign at all is a bit of a head-scratcher, given the bigger crime picture. Struggling with severe budget cuts, police in the UK are already de-prioritizing a number of crimes, leading to something called “screening out”, a process through which victims are given a crime number but no investigation is carried out.

This means that in 2016, 45% of all reported crimes in Greater Manchester weren’t investigated and a staggering 57% of all recorded domestic burglaries weren’t followed up by the police. But it gets worse.

“More than 62pc of criminal damage and arson offenses were not investigated, along with one in three reported shoplifting incidents,” MEN reports.

Given this backdrop, how will police suddenly find the resources to follow up lots of leads from the public and then subsequently prosecute people who sell pirate boxes? Even if they do, will that be at the expense of yet more “screening out” of other public-focused offenses?

No one is saying that selling pirate devices isn’t a crime or at least worthy of being followed up, but is this niche likely to be important to the public when they’re being told that nothing will be done when their homes are emptied by intruders? “NO” says a comment on one of the CrimeStoppers videos on YouTube.

“This crime affects multi-million dollar corporations, I’d rather see tax payers money invested on videos raising awareness of crimes committed against the people rather than the 0.001%,” it concludes.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Security Vulnerabilities in Certificate Pinning

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/12/security_vulner_10.html

New research found that many banks offer certificate pinning as a security feature, but fail to authenticate the hostname. This leaves the systems open to man-in-the-middle attacks.

From the paper:

Abstract: Certificate verification is a crucial stage in the establishment of a TLS connection. A common security flaw in TLS implementations is the lack of certificate hostname verification but, in general, this is easy to detect. In security-sensitive applications, the usage of certificate pinning is on the rise. This paper shows that certificate pinning can (and often does) hide the lack of proper hostname verification, enabling MITM attacks. Dynamic (black-box) detection of this vulnerability would typically require the tester to own a high security certificate from the same issuer (and often same intermediate CA) as the one used by the app. We present Spinner, a new tool for black-box testing for this vulnerability at scale that does not require purchasing any certificates. By redirecting traffic to websites which use the relevant certificates and then analysing the (encrypted) network traffic we are able to determine whether the hostname check is correctly done, even in the presence of certificate pinning. We use Spinner to analyse 400 security-sensitive Android and iPhone apps. We found that 9 apps had this flaw, including two of the largest banks in the world: Bank of America and HSBC. We also found that TunnelBear, one of the most popular VPN apps was also vulnerable. These apps have a joint user base of tens of millions of users.
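For context, pinning is meant to be an extra check layered on top of normal certificate validation, not a replacement for it. A minimal Java sketch of public-key pinning over HttpsURLConnection, which leaves the platform’s default hostname verification in place, might look like this (the pin value is a placeholder):

import java.net.URL;
import java.security.MessageDigest;
import java.security.cert.Certificate;
import java.util.Base64;
import javax.net.ssl.HttpsURLConnection;

public class PinnedConnectionExample {
    // Placeholder: Base64-encoded SHA-256 hash of the server's SubjectPublicKeyInfo.
    private static final String EXPECTED_PIN = "REPLACE_WITH_EXPECTED_PIN";

    public static void main(String[] args) throws Exception {
        // HttpsURLConnection already performs chain validation and hostname verification
        // by default; the pin check below is an additional control, not a replacement.
        HttpsURLConnection conn =
                (HttpsURLConnection) new URL("https://example.com/").openConnection();
        conn.connect();
        Certificate leaf = conn.getServerCertificates()[0];
        byte[] spki = leaf.getPublicKey().getEncoded();
        String pin = Base64.getEncoder().encodeToString(
                MessageDigest.getInstance("SHA-256").digest(spki));
        if (!EXPECTED_PIN.equals(pin)) {
            conn.disconnect();
            throw new SecurityException("Public key pin mismatch");
        }
        System.out.println("Pin matched, HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}

The vulnerable apps described in the paper effectively enforced the pin while skipping the equivalent of that default hostname check, which is why the pin alone gave a false sense of security.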

News article.

Resilient TVAddons Plans to Ditch Proactive ‘Piracy’ Screening

Post Syndicated from Ernesto original https://torrentfreak.com/resilient-tvaddons-plans-to-ditch-proactive-piracy-screening-171207/

After years of smooth sailing, this year TVAddons became a poster child for the entertainment industry’s war on illicit streaming devices.

The leading repository for unofficial Kodi addons was sued for copyright infringement in the US by satellite and broadcast provider Dish Network. Around the same time, a similar case was filed by Bell, TVA, Videotron, and Rogers in Canada.

The latter case has done the most damage thus far, as it caused the addon repository to lose its domain names and social media accounts. As a result, the site went dead, and while many believed it would never return, it made a blazing comeback after a few weeks.

After the original TVAddons.ag domain was seized, the site returned on TVaddons.co. And that was not the only difference. Many of the old add-ons, for which it was unclear whether they linked to licensed content, were no longer listed in the repository either.

TVAddons previously relied on the DMCA to shield it from liability, but apparently that wasn’t enough. As a result, they took the drastic decision to check all submitted add-ons carefully.

“Since complying with the law is clearly not enough to prevent frivolous legal action from being taken against you, we have been forced to implement a more drastic code vetting process,” a TVAddons representative told us previously.

Despite the absence of several of the most used add-ons, the repository has managed to regain many of its former users. Over the past month, TVAddons had over 12 million unique users. All of these users manually installed the new repository on their devices.

“We’re not like one of those pirate sites that are shut down and opens on a new domain the next day, getting users to actually manually install a new repo isn’t an easy feat,” a TVAddons representative informs TorrentFreak.

While that’s still far from the 40 million unique users it had earlier this year, before the trouble began, the site remains a force to be reckoned with.

Interestingly, the vast majority of all TVAddons traffic comes from the United States. The UK is second at a respectable distance, followed by Canada, Germany, and the Netherlands.

While many former users have returned, the submission policy changes didn’t go unnoticed. The relatively small selection of add-ons is a major drawback for some, but that’s about to change as well, we are informed.

TVAddons plans to return to the old submission model where developers can upload their code more freely. Instead of proactive screening, TVAddons will rely on a standard DMCA takedown policy, relying on copyright holders to flag potentially infringing content.

“We intend on returning to a standard DMCA compliant add-on submission policy shortly, there’s no reason why we should be held to a higher standard than Facebook, Twitter, YouTube or Reddit given the fact that we don’t even host any form of streaming content in the first place.

“Our interim policy isn’t pragmatic, it’s nearly impossible for us to verify the global licensing of all forms of protected content. When you visit a website, there’s no way of verifying licensing beyond trusting them based on reputation.”

The upcoming change doesn’t mean that TVAddons will ignore its legal requirements. If they receive a legitimate takedown notice, proper action will be taken, as always. As such, they would operate in the same fashion as other user-generated sites.

“Right now our interim addon submission policy is akin to North Korea. We always followed the law and will always continue to do so. Anytime we’ve received a legitimate complaint we’ve acted upon it in an expedited manner.

“Facebook, Twitter, Reddit and other online communities would have never existed if they were required to approve the contents of each user’s submissions prior to public posting.”

The change takes place while the two court cases are still pending. TVAddons is determined to keep up this fight. Meanwhile, they are also asking the public to support the project financially.

While some copyright holders, including those who are fighting the service in court, might not like the change, TVAddons believes that this is well within their rights. And with support from groups such as the Electronic Frontier Foundation, they don’t stand alone in this.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Looking Forward to 2018

Post Syndicated from Let's Encrypt - Free SSL/TLS Certificates original https://letsencrypt.org//2017/12/07/looking-forward-to-2018.html

Let’s Encrypt had a great year in 2017. We more than doubled the number of active (unexpired) certificates we service to 46 million, we just about tripled the number of unique domains we service to 61 million, and we did it all while maintaining a stellar security and compliance track record. Most importantly though, the Web went from 46% encrypted page loads to 67%, according to statistics from Mozilla – a gain of 21 percentage points in a single year – incredible. We’re proud to have contributed to that, and we’d like to thank all of the other people and organizations who also worked hard to create a more secure and privacy-respecting Web.

While we’re proud of what we accomplished in 2017, we are spending most of the final quarter of the year looking forward rather than back. As we wrap up our own planning process for 2018, I’d like to share some of our plans with you, including both the things we’re excited about and the challenges we’ll face. We’ll cover service growth, new features, infrastructure, and finances.

Service Growth

We are planning to double the number of active certificates and unique domains we service in 2018, to 90 million and 120 million, respectively. This anticipated growth is due to continuing high expectations for HTTPS growth in general in 2018.

Let’s Encrypt helps to drive HTTPS adoption by offering a free, easy to use, and globally available option for obtaining the certificates required to enable HTTPS. HTTPS adoption on the Web took off at an unprecedented rate from the day Let’s Encrypt launched to the public.

One of the reasons Let’s Encrypt is so easy to use is that our community has done great work making client software that works well for a wide variety of platforms. We’d like to thank everyone involved in the development of over 60 client software options for Let’s Encrypt. We’re particularly excited that support for the ACME protocol and Let’s Encrypt is being added to the Apache httpd server.

Other organizations and communities are also doing great work to promote HTTPS adoption, and thus stimulate demand for our services. For example, browsers are starting to make their users more aware of the risks associated with unencrypted HTTP (e.g. Firefox, Chrome). Many hosting providers and CDNs are making it easier than ever for all of their customers to use HTTPS. Government agencies are waking up to the need for stronger security to protect constituents. The media community is working to Secure the News.

New Features

We’ve got some exciting features planned for 2018.

First, we’re planning to introduce an ACME v2 protocol API endpoint and support for wildcard certificates along with it. Wildcard certificates will be free and available globally just like our other certificates. We are planning to have a public test API endpoint up by January 4, and we’ve set a date for the full launch: Tuesday, February 27.

Later in 2018 we plan to introduce ECDSA root and intermediate certificates. ECDSA is generally considered to be the future of digital signature algorithms on the Web due to the fact that it is more efficient than RSA. Let’s Encrypt will currently sign ECDSA keys from subscribers, but we sign with the RSA key from one of our intermediate certificates. Once we have an ECDSA root and intermediates, our subscribers will be able to deploy certificate chains which are entirely ECDSA.
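Subscribers don’t have to wait for the new chain to start using ECDSA for their own keys. As a rough illustration (independent of any particular ACME client), generating a P-256 ECDSA key pair in Java for use in a certificate request might look like this:

import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.spec.ECGenParameterSpec;

public class EcdsaKeyExample {
    public static void main(String[] args) throws Exception {
        // P-256 (secp256r1) is a curve commonly used for ECDSA TLS certificates.
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("EC");
        kpg.initialize(new ECGenParameterSpec("secp256r1"));
        KeyPair keyPair = kpg.generateKeyPair();
        System.out.println("Generated " + keyPair.getPrivate().getAlgorithm() + " key pair");
        // The public key would go into a CSR submitted to the CA; Let's Encrypt
        // already signs such keys, currently with an RSA intermediate.
    }
}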

Infrastructure

Our CA infrastructure is capable of issuing millions of certificates per day with multiple redundancy for stability and a wide variety of security safeguards, both physical and logical. Our infrastructure also generates and signs nearly 20 million OCSP responses daily, and serves those responses nearly 2 billion times per day. We expect issuance and OCSP numbers to double in 2018.

Our physical CA infrastructure currently occupies approximately 70 units of rack space, split between two datacenters, consisting primarily of compute servers, storage, HSMs, switches, and firewalls.

Issuing more certificates puts the most stress on storage for our databases. We regularly invest in more and faster storage for our database servers, and that will continue in 2018.

We’ll need to add a few additional compute servers in 2018, and we’ll also start aging out hardware in 2018 for the first time since we launched. We’ll age out about ten 2u compute servers and replace them with new 1u servers, which will save space and be more energy efficient while providing better reliability and performance.

We’ll also add another infrastructure operations staff member, bringing that team to a total of six people. This is necessary in order to make sure we can keep up with demand while maintaining a high standard for security and compliance. Infrastructure operations staff are systems administrators responsible for building and maintaining all physical and logical CA infrastructure. The team also manages a 24/7/365 on-call schedule and they are primary participants in both security and compliance audits.

Finances

We pride ourselves on being an efficient organization. In 2018 Let’s Encrypt will secure a large portion of the Web with a budget of only $3.0M. For an overall increase in our budget of only 13%, we will be able to issue and service twice as many certificates as we did in 2017. We believe this represents an incredible value and that contributing to Let’s Encrypt is one of the most effective ways to help create a more secure and privacy-respecting Web.

Our 2018 fundraising efforts are off to a strong start with Platinum sponsorships from Mozilla, Akamai, OVH, Cisco, Google Chrome and the Electronic Frontier Foundation. The Ford Foundation has renewed their grant to Let’s Encrypt as well. We are seeking additional sponsorship and grant assistance to meet our full needs for 2018.

We had originally budgeted $2.91M for 2017 but we’ll likely come in under budget for the year at around $2.65M. The difference between our 2017 expenses of $2.65M and the 2018 budget of $3.0M consists primarily of the additional infrastructure operations costs previously mentioned.

Support Let’s Encrypt

We depend on contributions from our community of users and supporters in order to provide our services. If your company or organization would like to sponsor Let’s Encrypt please email us at [email protected]. We ask that you make an individual contribution if it is within your means.

We’re grateful for the industry and community support that we receive, and we look forward to continuing to create a more secure and privacy-respecting Web!

Implementing Dynamic ETL Pipelines Using AWS Step Functions

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/implementing-dynamic-etl-pipelines-using-aws-step-functions/

This post contributed by:
Wangechi Dole, AWS Solutions Architect
Milan Krasnansky, ING, Digital Solutions Developer, SGK
Rian Mookencherry, Director – Product Innovation, SGK

Data processing and transformation is a common use case you see in our customer case studies and success stories. Often, customers deal with complex data from a variety of sources that needs to be transformed and customized through a series of steps to make it useful to different systems and stakeholders. This can be difficult due to the ever-increasing volume, velocity, and variety of data. Today, data management challenges cannot be solved with traditional databases.

Workflow automation helps you build solutions that are repeatable, scalable, and reliable. You can use AWS Step Functions for this. A great example is how SGK used Step Functions to automate the ETL processes for their client. With Step Functions, SGK has been able to automate changes within the data management system, substantially reducing the time required for data processing.

In this post, SGK shares the details of how they used Step Functions to build a robust data processing system based on highly configurable business transformation rules for ETL processes.

SGK: Building dynamic ETL pipelines

SGK is a subsidiary of Matthews International Corporation, a diversified organization focusing on brand solutions and industrial technologies. SGK’s Global Content Creation Studio network creates compelling content and solutions that connect brands and products to consumers through multiple assets including photography, video, and copywriting.

We were recently contracted to build a sophisticated and scalable data management system for one of our clients. We chose to build the solution on AWS to leverage advanced, managed services that help to improve the speed and agility of development.

The data management system served two main functions:

  1. Ingesting a large amount of complex data to facilitate both reporting and product funding decisions for the client’s global marketing and supply chain organizations.
  2. Processing the data through normalization and applying complex algorithms and data transformations. The system’s goal was to provide information in the relevant context (such as strategic marketing, supply chain, and product planning) to the end consumer through automated data feeds or updates to existing ETL systems.

We were faced with several challenges:

  • Output data that needed to be refreshed at least twice a day to provide fresh datasets to both local and global markets. That constant data refresh posed several challenges, especially around data management and replication across multiple databases.
  • The complexity of reporting business rules that needed to be updated on a constant basis.
  • Data that could not be processed as contiguous blocks of typical time-series data. The measurement of the data was done across seasons (that is, combinations of dates), which often resulted in up to three overlapping seasons at any given time.
  • Input data that came from 10+ different data sources. Each data source ranged from 1–20K rows with as many as 85 columns per input source.

These challenges meant that our small Dev team heavily invested time in frequent configuration changes to the system and data integrity verification to make sure that everything was operating properly. Maintaining this system proved to be a daunting task and that’s when we turned to Step Functions—along with other AWS services—to automate our ETL processes.

Solution overview

Our solution included the following AWS services:

  • AWS Step Functions: Before Step Functions was available, we were using multiple Lambda functions for this use case and running into memory limit issues. With Step Functions, we can execute steps in parallel, in a cost-efficient manner, without running into memory limitations.
  • AWS Lambda: The Step Functions state machine uses Lambda functions to implement the Task states. Our Lambda functions are implemented in Java 8.
  • Amazon DynamoDB provides us with an easy and flexible way to manage business rules. We specify our rules as Keys. These are key-value pairs stored in a DynamoDB table.
  • Amazon RDS: Our ETL pipelines consume source data from our RDS MySQL database.
  • Amazon Redshift: We use Amazon Redshift for reporting purposes because it integrates with our BI tools. Currently we are using Tableau for reporting which integrates well with Amazon Redshift.
  • Amazon S3: We store our raw input files and intermediate results in S3 buckets.
  • Amazon CloudWatch Events: Our users expect results at a specific time. We use CloudWatch Events to trigger Step Functions on an automated schedule.

Solution architecture

This solution uses a declarative approach to defining business transformation rules that are applied by the underlying Step Functions state machine as data moves from RDS to Amazon Redshift. An S3 bucket is used to store intermediate results. A CloudWatch Event rule triggers the Step Functions state machine on a schedule. The following diagram illustrates our architecture:

Here are more details for the above diagram:

  1. A rule in CloudWatch Events triggers the state machine execution on an automated schedule.
  2. The state machine invokes the first Lambda function.
  3. The Lambda function deletes all existing records in Amazon Redshift. Depending on the dataset, the Lambda function can create a new table in Amazon Redshift to hold the data.
  4. The same Lambda function then retrieves Keys from a DynamoDB table. Keys represent specific marketing campaigns or seasons and map to specific records in RDS.
  5. The state machine executes the second Lambda function using the Keys from DynamoDB.
  6. The second Lambda function retrieves the referenced dataset from RDS. The records retrieved represent the entire dataset needed for a specific marketing campaign.
  7. The second Lambda function executes in parallel for each Key retrieved from DynamoDB and stores the output in CSV format temporarily in S3.
  8. Finally, the Lambda function uploads the data into Amazon Redshift.

To understand the above data processing workflow, take a closer look at the Step Functions state machine for this example.

We walk you through the state machine in more detail in the following sections.

Walkthrough

To get started, you need to:

  • Create a schedule in CloudWatch Events
  • Specify conditions for RDS data extracts
  • Create Amazon Redshift input files
  • Load data into Amazon Redshift

Step 1: Create a schedule in CloudWatch Events
Create rules in CloudWatch Events to trigger the Step Functions state machine on an automated schedule. The following is an example cron expression to automate your schedule:
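One way to write it, using the CloudWatch Events cron fields (minutes, hours, day of month, month, day of week, year):

cron(0 3,14 * * ? *)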

In this example, the cron expression invokes the Step Functions state machine at 3:00am and 2:00pm (UTC) every day.

Step 2: Specify conditions for RDS data extracts
We use DynamoDB to store Keys that determine which rows of data to extract from our RDS MySQL database. An example Key is MCS2017, which stands for Marketing Campaign Spring 2017. Each campaign has a specific start and end date and the corresponding dataset is stored in RDS MySQL. A record in RDS contains about 600 columns, and each Key can represent up to 20K records.

A given day can have multiple campaigns with different start and end dates running simultaneously. In the following example DynamoDB item, three campaigns are specified for the given date.
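The attribute names below are hypothetical (only the MCS2017 campaign code appears elsewhere in this post), but a simplified item of this shape illustrates the idea:

{
  "RunDate": "2017-05-15",
  "Campaign1": "MCS2017",
  "Campaign2": "MCF2017",
  "Campaign3": "PP2017"
}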

The state machine example shown above uses Keys 31, 32, and 33 in the first ChoiceState and Keys 21 and 22 in the second ChoiceState. These keys represent marketing campaigns for a given day. For example, on Monday, there are only two campaigns requested. The ChoiceState with Keys 21 and 22 is executed. If three campaigns are requested on Tuesday, for example, then ChoiceState with Keys 31, 32, and 33 is executed. MCS2017 can be represented by Key 21 and Key 33 on Monday and Tuesday, respectively. This approach gives us the flexibility to add or remove campaigns dynamically.

Step 3: Create Amazon Redshift input files
When the state machine begins execution, the first Lambda function is invoked as the resource for FirstState, represented in the Step Functions state machine as follows:

"Comment": ” AWS Amazon States Language.", 
  "StartAt": "FirstState",
 
"States": { 
  "FirstState": {
   
"Type": "Task",
   
"Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Start",
    "Next": "ChoiceState" 
  } 

As described in the solution architecture, the purpose of this Lambda function is to delete existing data in Amazon Redshift and retrieve keys from DynamoDB. In our use case, we found that deleting existing records was more efficient and less time-consuming than finding the delta and updating existing records. On average, an Amazon Redshift table can contain about 36 million cells, which translates to roughly 65K records. The following is the code snippet for the first Lambda function in Java 8:

public class LambdaFunctionHandler implements RequestHandler<Map<String,Object>, Map<String,String>> {
    Map<String,String> keys = new HashMap<>();

    public Map<String, String> handleRequest(Map<String, Object> input, Context context) {
        Properties config = getConfig();
        // 1. Clean the Amazon Redshift table
        new RedshiftDataService(config).cleaningTable();
        // 2. Read the current campaign keys from DynamoDB
        List<String> keyList = new DynamoDBDataService(config).getCurrentKeys();
        for (int i = 0; i < keyList.size(); i++) {
            keys.put("key" + (i + 1), keyList.get(i));
        }
        // 3. Return the key values plus, under "keyT", the key count used by ChoiceState
        keys.put("keyT", String.valueOf(keyList.size()));
        return keys;
    }
}

The following JSON represents ChoiceState.

"ChoiceState": {
   "Type" : "Choice",
   "Choices": [ 
   {

      "Variable": "$.keyT",
     "StringEquals": "3",
     "Next": "CurrentThreeKeys" 
   }, 
   {

     "Variable": "$.keyT",
    "StringEquals": "2",
    "Next": "CurrentTwooKeys" 
   } 
 ], 
 "Default": "DefaultState"
}

The variable $.keyT represents the number of keys retrieved from DynamoDB. This variable determines which of the parallel branches should be executed. At the time of publication, Step Functions does not support dynamic parallel state. Therefore, choices under ChoiceState are manually created and assigned hardcoded StringEquals values. These values represent the number of parallel executions for the second Lambda function.

For example, if $.keyT equals 3, the second Lambda function is executed three times in parallel with the keys $.key1, $.key2, and $.key3 retrieved from DynamoDB. Similarly, if $.keyT equals 2, the second Lambda function is executed twice in parallel. The following JSON represents this parallel execution:

"CurrentThreeKeys": { 
  "Type": "Parallel",
  "Next": "NextState",
  "Branches": [ 
  {

     "StartAt": “key31",
    "States": { 
       “key31": {

          "Type": "Task",
        "InputPath": "$.key1",
        "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
        "End": true 
       } 
    } 
  }, 
  {

     "StartAt": “key32",
    "States": { 
     “key32": {

        "Type": "Task",
       "InputPath": "$.key2",
         "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
       "End": true 
      } 
     } 
   }, 
   {

      "StartAt": “key33",
       "States": { 
          “key33": {

                "Type": "Task",
             "InputPath": "$.key3",
             "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
           "End": true 
       } 
     } 
    } 
  ] 
} 

Step 4: Load data into Amazon Redshift
The second Lambda function in the state machine extracts the records from RDS associated with the keys retrieved from DynamoDB. It processes the data and then loads it into an Amazon Redshift table. The following is a code snippet for the second Lambda function in Java 8.

public class LambdaFunctionHandler implements RequestHandler<String, String> {
    public static String key = null;

    public String handleRequest(String input, Context context) {
        key = input;
        // 1. Get the basic configuration for the data services and create the S3 client
        Properties config = getConfig();
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // 2. Export query results from RDS into the S3 bucket
        new RdsDataService(config).exportDataToS3(s3, key);
        // 3. Import query results from the S3 bucket into Amazon Redshift
        new RedshiftDataService(config).importDataFromS3(s3, key);
        System.out.println(input);
        return "SUCCESS";
    }
}

After the data is loaded into Amazon Redshift, end users can visualize it using their preferred business intelligence tools.

Lessons learned

  • At the time of publication, the 1.5 GB memory hard limit for Lambda functions was inadequate for processing our complex workload. Step Functions gave us the flexibility to chunk our large datasets and process them in parallel, saving on costs and time.
  • In our previous implementation, we assigned each key a dedicated Lambda function along with CloudWatch rules for schedule automation. This approach proved to be inefficient and quickly became an operational burden. Previously, we processed each key sequentially, with each key adding about five minutes to the overall processing time. For example, processing three keys meant that the total processing time was three times longer. With Step Functions, the entire state machine executes in about five minutes.
  • Using DynamoDB with Step Functions gave us the flexibility to manage keys efficiently. In our previous implementations, keys were hardcoded in Lambda functions, which became difficult to manage due to frequent updates. DynamoDB is a great way to store dynamic data that changes frequently, and it works perfectly with our serverless architectures.

Conclusion

With Step Functions, we were able to fully automate the frequent configuration updates to our dataset, resulting in significant cost savings, reduced risk of data errors due to system downtime, and more time for us to focus on new product development rather than support-related issues. We hope that you have found the information useful and that it can serve as a jump-start to building your own ETL processes on AWS with managed AWS services.

For more information about how Step Functions makes it easy to coordinate the components of distributed applications and microservices in any workflow, see the use case examples and then build your first state machine in under five minutes in the Step Functions console.

If you have questions or suggestions, please comment below.

Glenn’s Take on re:Invent 2017 – Part 3

Post Syndicated from Glenn Gore original https://aws.amazon.com/blogs/architecture/glenns-take-on-reinvent-2017-part-3/

Glenn Gore here, Chief Architect for AWS. I was in Las Vegas last week — with 43K others — for re:Invent 2017. I checked in to the Architecture blog here and here with my take on what was interesting about some of the bigger announcements from a cloud-architecture perspective.

In the excitement of so many new services being launched, we sometimes overlook feature updates that, while perhaps not as exciting as Amazon DeepLens, have significant impact on how you architect and develop solutions on AWS.

Amazon DynamoDB is used by more than 100,000 customers around the world, handling over a trillion requests every day. From the start, DynamoDB has offered high availability by natively spanning multiple Availability Zones within an AWS Region. As more customers started building and deploying truly-global applications, there was a need to replicate a DynamoDB table to multiple AWS Regions, allowing for read/write operations to occur in any region where the table was replicated. This update is important for providing a globally-consistent view of information — as users may transition from one region to another — or for providing additional levels of availability, allowing for failover between AWS Regions without loss of information.

There are some interesting concurrency-design aspects you need to be aware of and ensure you can handle correctly. For example, we support the “last writer wins” reconciliation where eventual consistency is being used and an application updates the same item in different AWS Regions at the same time. If you require strongly-consistent read/writes then you must perform all of your read/writes in the same AWS Region. The details behind this can be found in the DynamoDB documentation. Providing a globally-distributed, replicated DynamoDB table simplifies many different use cases and allows for the logic of replication, which may have been pushed up into the application layers to be simplified back down into the data layer.

The other big update for DynamoDB is that you can now back up your DynamoDB table on demand with no impact to performance. One of the features I really like is that when you trigger a backup, it is available instantly, regardless of the size of the table. Behind the scenes, we use snapshots and change logs to ensure a consistent backup. While backup is instant, restoring the table could take some time depending on its size, ranging from minutes to hours for very large tables.

This feature is super important for those of you who work in regulated industries that often have strict requirements around data retention and backups, which sometimes limited the use of DynamoDB or required complex workarounds to implement some sort of backup feature in the past. This often incurred significant additional costs due to increased read transactions on their DynamoDB tables.

Amazon Simple Storage Service (Amazon S3) was our first AWS service, released over 11 years ago, and it proved the simplicity and scalability of true API-driven architectures in the cloud. Today, Amazon S3 stores trillions of objects, with transactional requests per second reaching into the millions! Dealing with data as objects opened up an incredibly diverse array of use cases ranging from libraries of static images, game binary downloads, and application log data, to massive data lakes used for big data analytics and business intelligence. With Amazon S3, when you accessed your data in an object, you effectively had to read or write the object as a whole, or use the range feature to retrieve a part of the object where that was possible for your individual use case.

Now, with Amazon S3 Select, you can use an SQL-like query language that works with delimited text and JSON files, as well as with GZIP-compressed files. Encryption is not supported during the Amazon S3 Select preview.
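For example, a query along the following lines (the column names here are illustrative) returns only the matching rows and columns from a CSV object rather than the whole file:

SELECT s.campaign, s.revenue FROM S3Object s WHERE s.region = 'EMEA'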

Amazon S3 Select provides two major benefits:

  • Faster access
  • Lower running costs

Serverless Lambda functions, where every millisecond of execution time is billed, will benefit greatly from Amazon S3 Select, as data retrieval and processing within your Lambda function will see significant speedups and cost reductions. For example, we have seen a 2x speed improvement and an 80% cost reduction with the Serverless MapReduce code.

Other AWS services such as Amazon Athena, Amazon Redshift, and Amazon EMR will support Amazon S3 Select as well as partner offerings including Cloudera and Hortonworks. If you are using Amazon Glacier for longer-term data archival, you will be able to use Amazon Glacier Select to retrieve a subset of your content from within Amazon Glacier.

As the volume of data that can be stored within Amazon S3 and Amazon Glacier continues to scale on a daily basis, we will continue to innovate and develop improved and optimized services that will allow you to work with these magnificently-large data sets while reducing your costs (retrieval and processing). I believe this will also allow you to simplify the transformation and storage of incoming data into Amazon S3 in basic, semi-structured formats as a single copy vs. some of the duplication and reformatting of data sometimes required to do upfront optimizations for downstream processing. Amazon S3 Select largely removes the need for this upfront optimization and instead allows you to store data once and process it based on your individual Amazon S3 Select query per application or transaction need.

Thanks for reading!

Glenn contemplating why CSV format is still relevant in 2017 (Italy).

GoMovies/123Movies Launches Anime Streaming Site

Post Syndicated from Ernesto original https://torrentfreak.com/gomovies123movies-launches-anime-streaming-site-171204/

Pirate video streaming sites are booming. Their relative ease of use through on-demand viewing makes them a viable alternative to P2P file-sharing, which traditionally dominated the piracy arena.

The popular movie streaming site GoMovies, formerly known as 123movies, is one of the most-used streaming sites. Despite the rebranding and several domain changes, it has built a steady base of millions of users over the past year and a half.

And it’s not done yet, we learn today.

The site, currently operating from the Gostream.is domain name, recently launched a new spinoff targeting anime fans. Animehub.to is currently promoted on GoMovies and the site’s operators aim to turn it into the leading streaming site for anime content.

Animehub.to

Someone connected to GoMovies told us that they’ve received a lot of requests from users to add anime content. Anime has traditionally been a large niche on file-sharing sites and the same is true on streaming platforms.

Technically speaking, GoMovies could have easily filled up the original site with anime content, but the owners prefer a different outlet.

With a separate anime site, they hope to draw in more visitors, TorrentFreak was told by an insider. For one, this makes it possible to rank better in search engines. It also allows the operators to cater specifically to the anime audience, with anime specific categories and release schedules.

Anime copyright holders will not be pleased with the new initiative, that’s for sure, but GoMovies is not new to legal pressure.

Earlier this year the US Ambassador to Vietnam called on the local Government to criminally prosecute people behind 123movies, the previous iteration of the site. In addition, the MPAA reported the site to the US Government in its recent overview of notorious pirate sites.

Pressure or not, it appears that GoMovies has no intention of slowing down or changing its course, although we’ve heard that yet another rebranding is on the horizon.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Google Says It Can’t Filter Pirated Content Proactively

Post Syndicated from Ernesto original https://torrentfreak.com/google-says-it-cant-filter-pirated-content-proactively-171202/

Over the past few years the entertainment industries have repeatedly asked Google to step up its game when it comes to its anti-piracy efforts.

These calls haven’t fallen on deaf ears and Google has steadily implemented various anti-piracy measures in response.

Still, that is not enough. At least, according to several prominent music industry groups who are advocating a ‘Take Down, Stay Down’ approach.

Currently, Google mostly responds to takedown requests that are sent in by copyright holders. The search engine deletes the infringing results and demotes the domains of frequent infringers. However, the same content often reappears on other sites, or in another location on the same site.

Earlier this year a group of prominent music groups stated that the present situation forces rightsholders to participate in a never-ending game of whack-a-mole which doesn’t fix the underlying problem. Instead, it results in a “frustrating, burdensome and ultimately ineffective takedown process.”

While Google understands the rationale behind the complaints, the company doesn’t believe in a more proactive solution. This was reiterated by Matt Brittin, President of EMEA Business & Operations at Google, during the Royal Television Society Event in London this week.

“The music industry has been quite tough with us on this. They’d like us proactively to know this stuff. It’s just not possible in this industry,” Brittin said.

That doesn’t mean that Google is sitting still. Brittin stresses that the company has invested millions in anti-piracy tools. That said, there can always be room for improvement.

“What we’ve tried to do is build tools that allow them to do that at scale easily and that work all together … I’m sure there are places where we could do better. There are teams and millions of dollars invested in this.

“Combatting bad acts and piracy is obviously very important to us,” Brittin added.

While Google sees no room for proactive filtering in search results, music industry insiders believe it’s possible.

Ideally, they want some type of automated algorithm or technology that removes infringing results without a targeted DMCA notice. This could be similar to YouTube’s Content-ID system, or the hash filtering mechanisms Google Drive employs, for example.

For now, however, there’s no sign that Google will go beyond the current takedown notice approach, at least for search. A ‘Take Down, Stay Down’ mechanism wouldn’t “understand” when content is authorized or not, the company previously noted.

And so, the status quo is likely to remain, at least for now.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

New Piracy Scaremongering Video Depicts ‘Dangerous’ Raspberry Pi

Post Syndicated from Andy original https://torrentfreak.com/new-piracy-scaremongering-video-depicts-dangerous-raspberry-pi-171202/

Unless you’ve been living under a rock for the past few years, you’ll be aware that online streaming of video is a massive deal right now.

In addition to the successes of Netflix and Amazon Prime, for example, unauthorized sources are also getting a piece of the digital action.

Of course, entertainment industry groups hate this and are quite understandably trying to do something about it. Few people have a really good argument as to why they shouldn’t, but recent tactics by some video-affiliated groups are really starting to wear thin.

From the mouth of Hollywood itself, the trending worldwide anti-piracy message is that piracy is dangerous. Torrent sites carry viruses that will kill your computer, streaming sites carry malware that will steal your identity, and ISDs (that’s ‘Illegal Streaming Devices’, apparently) can burn down your home, kill you, and corrupt your children.

If anyone is still taking notice of these overblown doomsday messages, here’s another one. Brought to you by the Hollywood-funded Digital Citizens Alliance, the new video rams home the message – the exact same message in fact – that set-top boxes providing the latest content for free are a threat to, well, just about everything.

While the message is probably getting a little old now, it’s worth noting the big reveal at ten seconds into the video, where the evil pirate box is introduced to the viewer.

As reproduced in the left-hand image below, it is a blatantly obvious recreation of the totally content-neutral Raspberry Pi, the affordable small computer from the UK. Granted, people sometimes use it for Kodi (the image on the right shows a Kodi-themed Raspberry Pi case, created by official Kodi team partner FLIRC) but its overwhelming uses have nothing to do with the media center, or indeed piracy.

Disreputable and dangerous device? Of course not

So alongside all the scary messages, the video succeeds in demonizing a perfectly innocent and safe device of which more than 15 million have been sold, many of them directly to schools. Since the device is so globally recognizable, it’s a not inconsiderable error.

It’s a topic that the Kodi team itself vented over earlier this week, noting how the British tabloid media presented the recent wave of “Kodi Boxes Can Kill You” click-bait articles alongside pictures of the Raspberry Pi.

“Instead of showing one of the many thousands of generic black boxes sold without the legally required CE/UL marks, the media mainly chose to depict a legitimate Raspberry Pi clothed in a very familiar Kodi case. The Pis originate from Cambridge, UK, and have been rigorously certified,” the team complain.

“We’re also super-huge fans of the Raspberry Pi Foundation, and the proceeds of Pi board sales fund the awesome work they do to promote STEM (Science, Technology, Engineering and Mathematics) education in schools. The Kodi FLIRC case has also been a hit with our Raspberry Pi users and sales contribute towards the cost of events like Kodi DevCon.”

“It’s insulting, and potentially harmful, to see two successful (and safe) products being wrongly presented for the sake of a headline,” they conclude.

Indeed, it seems that both press and the entertainment industry groups that feed them have been playing fast and loose recently, with the Raspberry Pi getting a particularly raw deal.

Still, if it scares away some pirates, that’s the main thing….

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Glenn’s Take on re:Invent Part 2

Post Syndicated from Glenn Gore original https://aws.amazon.com/blogs/architecture/glenns-take-on-reinvent-part-2/

Glenn Gore here, Chief Architect for AWS. I’m in Las Vegas this week — with 43K others — for re:Invent 2017. We’ve got a lot of exciting announcements this week. I’m going to check in to the Architecture blog with my take on what’s interesting about some of the announcements from a cloud architectural perspective. My first post can be found here.

The Media and Entertainment industry has been a rapid adopter of AWS due to the scale, reliability, and low costs of our services. This has enabled customers to create new, online, digital experiences for their viewers ranging from broadcast to streaming to Over-the-Top (OTT) services that can be a combination of live, scheduled, or ad-hoc viewing, while supporting devices ranging from high-def TVs to mobile devices. Creating an end-to-end video service requires many different components, often sourced from different vendors with different licensing models, which results in a complex architecture and a complex environment to support operationally.

AWS Media Services
Based on customer feedback, we have developed AWS Media Services to help simplify distribution of video content. AWS Media Services is comprised of five individual services that can either be used together to provide an end-to-end service or individually to work within existing deployments: AWS Elemental MediaConvert, AWS Elemental MediaLive, AWS Elemental MediaPackage, AWS Elemental MediaStore and AWS Elemental MediaTailor. These services can help you with everything from storing content safely and durably to setting up a live-streaming event in minutes without having to be concerned about the underlying infrastructure and scalability of the stream itself.

In my role, I participate in many AWS and industry events and often work with the production and event teams that put these shows together. With all the logistical tasks they have to deal with, the biggest question is often: “Will the live stream work?” Compounding this fear is the reality that, as users, we are also quick to jump on social media and make noise when a live stream drops while we are following along remotely. Worse is when I see event organizers actively choosing not to live stream content because of the risk of failure and exposure — leading them to take the safe option and not stream at all.

With AWS Media Services addressing many of the issues around putting together a high-quality media service, live streaming, and providing access to a library of content through a variety of mechanisms, I can’t wait to see more event teams use live streaming without the concern and worry I’ve seen in the past. I am excited for what this also means for non-media companies, as video becomes an increasingly common way of sharing information and adding a more personalized touch to internally- and externally-facing content.

AWS Media Services will allow you to focus more on the content and not worry about the platform. Awesome!
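
To give a flavor of what this can look like outside the console, here is a minimal sketch, assuming the AWS SDK for Python (boto3) with a version recent enough to expose the new services and with credentials and a default region already configured; the container name is purely illustrative and not part of the announcement:

import boto3

# Assumes AWS credentials and a default region are already configured.
mediastore = boto3.client('mediastore')
medialive = boto3.client('medialive')

# Create a MediaStore container to act as an origin store for video assets
# (the container name is purely illustrative).
response = mediastore.create_container(ContainerName='live-event-origin')
print('Container ARN:', response['Container']['ARN'])

# List any MediaLive channels already set up in this account and region.
for channel in medialive.list_channels()['Channels']:
    print(channel['Id'], channel['Name'], channel['State'])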

Amazon Neptune
As a civilization, we have been developing new ways to record and store information, and to model the relationships between sets of information, for more than a thousand years. Government census data, tax records, births, deaths, and marriages were all recorded on media ranging from knotted cords in the Inca civilization, to clay tablets in ancient Babylon, to written texts in Western Europe during the late Middle Ages.

One of the first challenges of computing was figuring out how to store and work with vast amounts of information in a programmatic way, especially as the volume of information was increasing at a faster rate than ever before. We have seen different generations of how to organize this information in some form of database, ranging from flat files to the Information Management System (IMS) used in the 1960s for the Apollo space program, to the rise of the relational database management system (RDBMS) in the 1970s. These innovations drove a lot of subsequent innovations in information management and application development as we were able to move from thousands of records to millions and billions.

Today, as architects and developers, we have a vast variety of database technologies to select from, which have different characteristics that are optimized for different use cases:

  • Relational databases are well understood after decades of use in the majority of companies that required a database to store information. Amazon Relational Database Service (Amazon RDS) supports many popular relational database engines such as MySQL, Microsoft SQL Server, PostgreSQL, MariaDB, and Oracle. We have even brought the traditional RDBMS into the cloud world through Amazon Aurora, which provides MySQL and PostgreSQL support with the performance and reliability of commercial-grade databases at 1/10th the cost.
  • Non-relational databases (NoSQL) provided a simpler method of storing and retrieving information that was often faster and more scalable than traditional RDBMS technology. The concept of non-relational databases has existed since the 1960s but really took off in the early 2000s with the rise of web-based applications that required performance and scalability that relational databases struggled with at the time. AWS published this Dynamo whitepaper in 2007, with DynamoDB launching as a service in 2012. DynamoDB has quickly become one of the critical design elements for many of our customers who are building highly-scalable applications on AWS. We continue to innovate with DynamoDB, and this week launched global tables and on-demand backup at re:Invent 2017. DynamoDB excels in a variety of use cases, such as tracking session information for popular websites, shopping cart information on e-commerce sites, and gamers’ high scores in mobile gaming applications (a brief sketch of that last pattern follows this list).
  • Graph databases focus on the relationship between data items in the store. With a graph database, we work with nodes, edges, and properties to represent data, relationships, and information. Graph databases are designed to make it easy and fast to traverse and retrieve complex hierarchical data models. Graph databases share some concepts from the NoSQL family of databases such as key-value pairs (properties) and the use of a non-SQL query language such as Gremlin. Graph databases are commonly used for social networking, recommendation engines, fraud detection, and knowledge graphs. We released Amazon Neptune to help simplify the provisioning and management of graph databases as we believe that graph databases are going to enable the next generation of smart applications.
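
As promised above, here is a minimal sketch of the high-score pattern with DynamoDB, assuming the AWS SDK for Python (boto3) and a pre-existing table; the table name and key schema are illustrative:

import boto3

# Assumes a DynamoDB table named 'HighScores' already exists, with
# partition key 'GamerId' and sort key 'GameTitle' (illustrative names).
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('HighScores')

# Record a new high score for a player.
table.put_item(Item={
    'GamerId': 'player-42',
    'GameTitle': 'AstroBlaster',
    'TopScore': 98210,
})

# Read it back with a strongly consistent point lookup.
item = table.get_item(
    Key={'GamerId': 'player-42', 'GameTitle': 'AstroBlaster'},
    ConsistentRead=True,
)['Item']
print(item['TopScore'])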

A common use case I am hearing every week as I talk to customers is how to incorporate chatbots within their organizations. Amazon Lex and Amazon Polly have made it easy for customers to experiment and build chatbots for a wide range of scenarios, but one of the missing pieces of the puzzle was how to model decision trees and knowledge graphs so the chatbot could guide the conversation in an intelligent manner.

Graph databases are ideal for this particular use case, and having Amazon Neptune simplifies the deployment of a graph database while providing high performance, scalability, availability, and durability as a managed service. Security of your graph database is critical. To help ensure this, you can run Amazon Neptune within your Amazon Virtual Private Cloud (Amazon VPC) and encrypt your data at rest using the integration with AWS Key Management Service (AWS KMS). Neptune also supports AWS Identity and Access Management (AWS IAM) to help further protect and restrict access.
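
As a rough illustration, here is a minimal sketch of the kind of traversal a chatbot backend might run against Neptune, assuming the open-source gremlinpython client; the cluster endpoint, vertex labels, and edge labels are all hypothetical:

from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# The cluster endpoint below is a placeholder; Neptune accepts Gremlin
# WebSocket connections on port 8182 from inside your VPC.
conn = DriverRemoteConnection(
    'wss://my-neptune-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/gremlin',
    'g',
)
g = traversal().withRemote(conn)

# Walk a hypothetical knowledge graph: from a 'topic' vertex, follow
# 'answered_by' edges to collect candidate answers for the conversation.
answers = (g.V()
            .has('topic', 'name', 'billing')
            .out('answered_by')
            .values('text')
            .limit(5)
            .toList())
print(answers)

conn.close()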

Our customers now have the choice of many different database technologies to ensure that they can optimize each application and service for their specific needs. Just as DynamoDB has unlocked and enabled many new workloads that weren’t possible in relational databases, I can’t wait to see what new innovations and capabilities are enabled from graph databases as they become easier to use through Amazon Neptune.

Look for more on DynamoDB and Amazon S3 from me on Monday.

Glenn at Tour de Mont Blanc

Seven Years of Hadopi: Nine Million Piracy Warnings, 189 Convictions

Post Syndicated from Andy original https://torrentfreak.com/seven-years-of-hadopi-nine-million-piracy-warnings-189-convictions-171201/

More than seven years ago, it was predicted that the next big thing in anti-piracy enforcement would be the graduated response scheme.

Commonly known as “three strikes” or variants thereof, these schemes were promoted as educational in nature, with alleged pirates receiving escalating warnings designed to discourage further infringing behavior.

In the fall of 2010, France became one of the pioneers of the warning system and now, more than seven years later, a new report from the country’s ‘Hadopi’ anti-piracy agency has revealed the extent of its operations.

Between July 2016 and June 2017, Hadopi sent a total of 889 cases to court, a 30% uplift on the 684 cases handed over during the same period 2015/2016. This boost is notable, not least since the use of peer-to-peer protocols (such as BitTorrent, which Hadopi closely monitors) is declining in favor of streaming methods.

When all seven years of the scheme, up to August 31, 2017, are added together, the numbers are even more significant.

“Since the launch of the graduated response scheme, more than 2,000 cases have been sent to prosecutors for possible prosecution,” Hadopi’s report reads.

“The number of cases sent to the prosecutor’s office has increased every year, with a significant increase in the last two years. Three-quarters of all the cases sent to prosecutors have been sent since July 2015.”

In all, the Hadopi agency has sent more than nine million first warning notices to alleged pirates since 2012, with more than 800,000 follow-up warnings on top, 200,000 of them during 2016-2017. But perhaps of most interest is the number of French citizens who, despite all the warnings, carried on with their pirating behavior and ended up prosecuted as a result.

Since the program’s inception, 583 court decisions have been handed down against pirates. While 394 of them resulted in a small fine, a caution, or other community-based punishment, 189 citizens walked away with a criminal conviction.

These can include fines of up to 1,500 euros or in more extreme cases, up to three years in prison and/or a 300,000 euro fine.

While this approach looks set to continue into 2018, Hadopi’s report highlights the need to adapt to a changing piracy landscape, one which requires a multi-faceted approach. In addition to tracking pirates, Hadopi also has a mission to promote legal offerings while educating the public. However, it is fully aware that these strategies alone won’t be enough.

To that end, the agency is calling for broader action, such as faster blocking of sites, expanding to the blocking of mirror sites, tackling unauthorized streaming platforms and, of course, dealing with the “fully-loaded” set-top box phenomenon that’s been sweeping the world for the past two years.

The full report can be downloaded here (pdf, French)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Collect Data Statistics Up to 5x Faster by Analyzing Only Predicate Columns with Amazon Redshift

Post Syndicated from George Caragea original https://aws.amazon.com/blogs/big-data/collect-data-statistics-up-to-5x-faster-by-analyzing-only-predicate-columns-with-amazon-redshift/

Amazon Redshift is a fast, fully managed, petabyte-scale data warehousing service that makes it simple and cost-effective to analyze all of your data. Many of our customers—including Boingo Wireless, Scholastic, Finra, Pinterest, and Foursquare—migrated to Amazon Redshift and achieved agility and faster time to insight, while dramatically reducing costs.

Query optimization and the need for accurate estimates

When a SQL query is submitted to Amazon Redshift, the query optimizer is in charge of generating all the possible ways to execute that query, and picking the fastest one. This can mean evaluating the cost of thousands, if not millions, of different execution plans.

The plan cost is calculated based on estimates of the data characteristics. For example, the characteristics could include the number of rows in each base table, the average width of a variable-length column, the number of distinct values in a column, and the most common values in a column. These estimates (or “statistics”) are computed in advance by running an ANALYZE command, and stored in the system catalog.

How do the query optimizer and ANALYZE work together?

An ideal scenario is to run ANALYZE after every ETL/ingestion job. This way, when running your workload, the query optimizer can use up-to-date data statistics, and choose the most optimal execution plan, given the updates.

However, running the ANALYZE command can add significant overhead to the data ingestion scripts. This can lead to customers not running ANALYZE on their data, and using default or stale estimates. The end result is usually the optimizer choosing a suboptimal execution plan that runs for longer than needed.

Analyzing predicate columns only

When you run a SQL query, the query optimizer requests statistics only on columns used in predicates in the SQL query (join predicates, filters in the WHERE clause and GROUP BY clauses). Consider the following query:

SELECT Avg(salary), 
       Min(hiredate), 
       deptname 
FROM   emp 
WHERE  state = 'CA' 
GROUP  BY deptname; 

In the query above, the optimizer requests statistics only on columns ‘state’ and ‘deptname’, but not on ‘salary’ and ‘hiredate’. If present, statistics on columns ‘salary’ and ‘hiredate’ are ignored, as they do not impact the cost of the execution plans considered.

Based on the optimizer functionality described earlier, the Amazon Redshift ANALYZE command has been updated to optionally collect information only about columns used in previous queries as part of a filter, join condition or a GROUP BY clause, and columns that are part of distribution or sort keys (predicate columns). There’s a recently introduced option for the ANALYZE command that only analyzes predicate columns:

ANALYZE <table name> PREDICATE COLUMNS;

By having Amazon Redshift collect information about predicate columns automatically, and analyzing those columns only, you’re able to reduce the time to run ANALYZE. For example, during the execution of the 99 queries in the TPC-DS workload, only 203 out of the 424 total columns are predicate columns (approximately 48%). By analyzing only the predicate columns for such a workload, the execution time for running ANALYZE can be significantly reduced.

From my experience in the data warehousing space, I have observed that about 20% of columns in a typical use case are marked predicate. In such a case, running ANALYZE PREDICATE COLUMNS can lead to a speedup of up to 5x relative to a full ANALYZE run.

If no information on predicate columns exists in the system (for example, a new table that has not been queried yet), ANALYZE PREDICATE COLUMNS collects statistics on all the columns. When queries on the table are run, Amazon Redshift collects information about predicate column usage, and subsequent runs of ANALYZE PREDICATE COLUMNS operate only on the predicate columns.

If the workload is relatively stable, and the set of predicate columns does not expand continuously over time, I recommend replacing all occurrences of the ANALYZE command with ANALYZE PREDICATE COLUMNS commands in your application and data ingestion code.
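
For instance, an ingestion script might refresh statistics right after each load. Here is a minimal sketch using psycopg2, with the connection details, table name, bucket, and IAM role all being placeholders:

import psycopg2

# Connection details are placeholders; any PostgreSQL-compatible driver
# can connect to Amazon Redshift.
conn = psycopg2.connect(
    host='my-cluster.abc123.us-east-1.redshift.amazonaws.com',
    port=5439, dbname='analytics', user='etl_user', password='...',
)
conn.autocommit = True

with conn.cursor() as cur:
    # Load the latest batch (COPY options trimmed for brevity).
    cur.execute("""
        COPY emp FROM 's3://my-bucket/emp/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' CSV;
    """)
    # Refresh statistics on predicate columns only, keeping overhead low.
    cur.execute("ANALYZE emp PREDICATE COLUMNS;")

conn.close()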

Using the Analyze/Vacuum utility

Several AWS customers are using the Analyze/Vacuum utility from the Redshift-Utils package to manage and automate their maintenance operations. By passing the --predicate-cols option to the Analyze/Vacuum utility, you can enable it to use the ANALYZE PREDICATE COLUMNS feature, giving you the reduced overhead in a completely seamless manner.

Enhancements to logging for ANALYZE operations

When running ANALYZE with the PREDICATE COLUMNS option, the type of analyze run (Full vs Predicate Column), as well as information about the predicate columns encountered, is logged in the stl_analyze view:

SELECT status, 
       starttime, 
       prevtime, 
       num_predicate_cols, 
       num_new_predicate_cols 
FROM   stl_analyze;
     status   |    starttime        |   prevtime          | pred_cols | new_pred_cols
--------------+---------------------+---------------------+-----------+---------------
 Full         | 2017-11-09 01:15:47 |                     |         0 |             0
 PredicateCol | 2017-11-09 01:16:20 | 2017-11-09 01:15:47 |         2 |             2

AWS also enhanced the pg_statistic catalog table with two new pieces of information: the time stamp at which a column was marked as “predicate”, and the time stamp at which the column was last analyzed.

The Amazon Redshift documentation provides a view that allows a user to easily see which columns are marked as predicate, when they were marked as predicate, and when a column was last analyzed. For example, for the emp table used above, the output of the view could be as follows:

 SELECT col_name, 
       is_predicate, 
       first_predicate_use, 
       last_analyze 
FROM   predicate_columns 
WHERE  table_name = 'emp';

 col_name | is_predicate | first_predicate_use  |        last_analyze
----------+--------------+----------------------+----------------------------
 id       | f            |                      | 2017-11-09 01:15:47
 name     | f            |                      | 2017-11-09 01:15:47
 deptname | t            | 2017-11-09 01:16:03  | 2017-11-09 01:16:20
 age      | f            |                      | 2017-11-09 01:15:47
 salary   | f            |                      | 2017-11-09 01:15:47
 hiredate | f            |                      | 2017-11-09 01:15:47
 state    | t            | 2017-11-09 01:16:03  | 2017-11-09 01:16:20

Conclusion

After loading new data into an Amazon Redshift cluster, statistics need to be re-computed to guarantee performant query plans. By learning which column statistics are actually being used by the customer’s workload and collecting statistics only on those columns, Amazon Redshift is able to significantly reduce the amount of time needed for table maintenance during data loading workflows.


Additional Reading

Be sure to check out the Top 10 Tuning Techniques for Amazon Redshift, and the Advanced Table Design Playbook: Distribution Styles and Distribution Keys.


About the Author

George Caragea is a Senior Software Engineer with Amazon Redshift. He has been working on MPP Databases for over 6 years and is mainly interested in designing systems at scale. In his spare time, he enjoys being outdoors and on the water in the beautiful Bay Area and finishing the day exploring the rich local restaurant scene.

YouTube Begins Blocking Music in Finland Due to Licensing Failure

Post Syndicated from Andy original https://torrentfreak.com/youtube-begins-blocking-music-in-finland-due-to-licensing-failure-171130/

YouTube is used by millions of people worldwide to access a broad range of content, but it is music that is increasingly one of the platform’s big draws.

With an almost unrivaled library, YouTube is the go-to service for music fans globally but over in Finland this morning, things aren’t playing out well.

As shown in the image below, users who try to access music are now getting the following graphic. When translated, the text reads “Video content owned by Teosto. The video can not be used in your country.”

No license…..No access…

This is a pretty big deal. Teosto is a Finnish performance rights organization that collects royalties on behalf of local artists and composers. It represents around 30,000 local songwriters and publishers, small fry when compared to the three million foreign music entities it represents in Finland.

This means that YouTube must have pulled huge volumes of content from its platform locally, rendering the service far less attractive to users. However, according to a TorrentFreak source, things go much further than standard modern licensed music.

As shown in the image below, even music published in 1899 has found itself pulled from the platform.

Jean Sibelius’ masterpiece Finlandia? Gone..

The music licensing dispute, which appears to have led to millions of tracks being rendered inaccessible in Finland, was confirmed by YouTube this morning.

“We were unable to reach a new licensing agreement with TEOSTO. Because of this, some videos containing music will be blocked in Finland,” the team said.

While the removal of content will come as a disappointment to the quarter of Finnish citizens who use YouTube regularly, it doesn’t come as a complete surprise.

In September, Teosto issued an opinion on copyrights to Parliament’s Education Committee. The licensing group complained that rightsholders aren’t adequately compensated for content played on platforms like YouTube. Like other groups in the same position, Teosto is looking to obtain more revenue for its members. That seems to be the basis for the dispute with YouTube.

For YouTube to have pulled so much content, negotiations must have really broken down, but Teosto sounded a note of optimism this morning. The group noted that while Google had indeed pulled music content from YouTube in Finland, it may reinstate it during the next couple of days.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Netflix Is Not Going to Kill Piracy, Research Suggests

Post Syndicated from Ernesto original https://torrentfreak.com/netflix-not-going-kill-piracy-research-suggests-171129/

There is little doubt that, in many countries, Netflix has become the standard for watching movies on the Internet.

Generally speaking, on-demand streaming services are convenient alternatives to piracy. However, millions of people stick to their old pirate habits, Netflix subscription or not.

Intrigued by this interplay of legal and unauthorized viewing, researchers from Carnegie Mellon University and Universidade Católica Portuguesa carried out an extensive study. They partnered with a major telco, which is not named, to analyze if BitTorrent downloading habits can be changed by offering legal alternatives.

The researchers used a piracy-tracking firm to get a sample of thousands of BitTorrent pirates at the associated ISP. Half of them were offered a free 45-day subscription to a premium TV and movies package, allowing them to watch popular content on demand.

To measure the effects of video-on-demand access on piracy, the researchers then monitored the legal viewing activity and BitTorrent transfers of the people who received the free offer, comparing it to a control group. The results show that piracy is harder to beat than some would expect.

Subscribers who received the free subscription watched more TV, but overall their torrenting habits didn’t change significantly.

“We find that, on average, households that received the gift increased overall TV consumption by 4.6% and reduced Internet downloads and uploads by 4.2% and 4.5%, respectively. However, and also on average, treated households did not change their likelihood of using BitTorrent during the experiment,” the researchers write.

One of the main problems was that these ‘pirates’ couldn’t get all their favorite shows and movies on the legal service, which is a common problem. For the small portion of subscribers who had access to their preferred content, the researchers did find an effect on torrent traffic.

“Households with preferences aligned with the gifted content reduced their probability of using BitTorrent during the experiment by 18% and decreased their amount of upload traffic by 45%,” the paper reads.

The video-on-demand service in the study had an average “fit” of just 12% with people’s viewing preferences, which means that they were missing a lot of content. But even Netflix, which has a library of thousands of titles, only has a fit of roughly 50%.

The researchers show that the lack of availability is partly caused by licensing windows, which makes it hard for legal video streaming services to compete with piracy.

“We show that licensing windows impose significant restrictions on the content that can be included in SVoD catalogs, which hampers the ability of content distributors to offer catalogs that cater to the preferences of pirates,” they write.

However, even if more content became available, piracy wouldn’t magically disappear. In the experiment, subscribers were offered free access to a video on demand service. In the real world, they would have to pay, which presents another barrier.

In this study, the pirate households were willing to pay at most $3.25 USD per month to access a service with a library as large as Netflix’s in the United States. That’s not enough.

This leads the researchers to the grim conclusion that video on demand services such as Netflix can’t significantly lower piracy rates. They could make a dent if they increase their content libraries while lowering the price at the same time, but that’s not going to happen.

“Together, our results show that, as a stand-alone strategy, using legal SVoD to curtail piracy will require, at the minimum, offering content much earlier and at much lower prices than those currently offered in the marketplace, changes that are likely to reduce industry revenue and that may damage overall incentives to produce new content while, at the same time, curbing only a small share of piracy,” the researchers conclude.

While Hollywood maintains that people can get pretty much anything they want legally, the current research shows that it’s not as simple as that. Most people are not going to pay for 22 separate subscriptions. Instead of more streaming services, it would be better to make more content available at the ones that are already out there.

The research was partially funded by Carnegie Mellon University’s IDEA, which receives an unrestricted gift from the MPAA, so Hollywood will likely be clued in on the results.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

In the Works – AWS IoT Device Defender – Secure Your IoT Fleet

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/in-the-works-aws-sepio-secure-your-iot-fleet/

Scale takes on a whole new meaning when it comes to IoT. Last year I was lucky enough to tour a gigantic factory that had, on average, one environment sensor per square meter. The sensors measured temperature, humidity, and air purity several times per second, and served as an early warning system for contaminants. I’ve heard customers express interest in deploying IoT-enabled consumer devices in the millions or tens of millions.

With powerful, long-lived devices deployed in a geographically distributed fashion, managing security challenges is crucial. However, the limited amount of local compute power and memory can sometimes limit the ability to use encryption and other forms of data protection.

To address these challenges and to allow our customers to confidently deploy IoT devices at scale, we are working on IoT Device Defender. While the details might change before release, AWS IoT Device Defender is designed to offer these benefits:

Continuous Auditing – AWS IoT Device Defender monitors the policies related to your devices to ensure that the desired security settings are in place. It looks for drifts away from best practices and supports custom audit rules so that you can check for conditions that are specific to your deployment. For example, you could check to see if a compromised device has subscribed to sensor data from another device. You can run audits on a schedule or on an as-needed basis.

Real-Time Detection and Alerting – AWS IoT Device Defender looks for and quickly alerts you to unusual behavior that could be coming from a compromised device. It does this by monitoring the behavior of similar devices over time, looking for unauthorized access attempts, changes in connection patterns, and changes in traffic patterns (either inbound or outbound).

Fast Investigation and Mitigation – In the event that you get an alert that something unusual is happening, AWS IoT Device Defender gives you the tools, including contextual information, to help you to investigate and mitigate the problem. Device information, device statistics, diagnostic logs, and previous alerts are all at your fingertips. You have the option to reboot the device, revoke its permissions, reset it to factory defaults, or push a security fix.

Stay Tuned
I’ll have more info (and a hands-on post) as soon as possible, so stay tuned!

Jeff;

New – AWS IoT Device Management

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-iot-device-management/

AWS IoT and AWS Greengrass give you a solid foundation and programming environment for your IoT devices and applications.

The nature of IoT means that an at-scale device deployment often encompasses millions or even tens of millions of devices deployed at hundreds or thousands of locations. At that scale, treating each device individually is impossible. You need to be able to set up, monitor, update, and eventually retire devices in bulk, collective fashion while also retaining the flexibility to accommodate varying deployment configurations, device models, and so forth.

New AWS IoT Device Management
Today we are launching AWS IoT Device Management to help address this challenge. It will help you through each phase of the device lifecycle, from manufacturing to retirement. Here’s what you get:

Onboarding – Starting with devices in their as-manufactured state, you can control the provisioning workflow. You can use IoT Device Management templates to quickly onboard entire fleets of devices with a few clicks. The templates can include information about device certificates and access policies.

Organization – In order to deal with massive numbers of devices, AWS IoT Device Management extends the existing IoT Device Registry and allows you to create a hierarchical model of your fleet and to set policies on a hierarchical basis. You can drill down through the hierarchy in order to locate individual devices. You can also query your fleet on attributes such as device type or firmware version.

Monitoring – Telemetry from the devices is used to gather real-time connection, authentication, and status metrics, which are published to Amazon CloudWatch. You can examine the metrics and locate outliers for further investigation. IoT Device Management lets you configure the log level for each device group, and you can also publish change events for the Registry and Jobs for monitoring purposes.

Remote Management – AWS IoT Device Management lets you remotely manage your devices. You can push new software and firmware to them, reset to factory defaults, reboot, and set up bulk updates at the desired velocity.
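
To give a flavor of what this looks like outside the console, here is a minimal sketch using the AWS SDK for Python (boto3); the group name, thing name, job ID, account ARN, and job document URL are all illustrative placeholders:

import boto3

iot = boto3.client('iot')

# Create a device group and add an existing device to it
# (group and thing names are illustrative).
iot.create_thing_group(thingGroupName='pressure-gauges-colorado')
iot.add_thing_to_thing_group(
    thingGroupName='pressure-gauges-colorado',
    thingName='gauge-0042',
)

# Push a firmware update to every device in the group. A CONTINUOUS job
# also picks up devices that are added to the group later on.
iot.create_job(
    jobId='firmware-update-1-2-3',
    targets=['arn:aws:iot:us-east-1:123456789012:thinggroup/pressure-gauges-colorado'],
    documentSource='https://example-bucket.s3.amazonaws.com/firmware-1-2-3.json',
    targetSelection='CONTINUOUS',
)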

Exploring AWS IoT Device Management
The AWS IoT Device Management Console took me on a tour and pointed out how to access each of the features of the service:

I already have a large set of devices (pressure gauges):

These gauges were created using the new template-driven bulk registration feature. Here’s how I create a template:

The gauges are organized into groups (by US state in this case):

Here are the gauges in Colorado:

AWS IoT group policies allow you to control access to specific IoT resources and actions for all members of a group. The policies are structured very much like IAM policies, and can be created in the console:

Jobs are used to selectively update devices. Here’s how I create one:

As indicated by the Job type above, jobs can run either once or continuously. Here’s how I choose the devices to be updated:

I can create custom authorizers that make use of a Lambda function:

I’ve shown you a medium-sized subset of AWS IoT Device Management in this post. Check it out for yourself to learn more!

Jeff;