Tag Archives: Position

Kim Dotcom Opposes US’s “Fugitive” Claims at Supreme Court

Post Syndicated from Ernesto original https://torrentfreak.com/kim-dotcom-opposes-uss-fugitive-claims-supreme-court-170622/

megaupload-logoWhen Megaupload and Kim Dotcom were raided five years ago, the authorities seized millions of dollars in cash and other property.

The US government claimed the assets were obtained through copyright crimes so went after the bank accounts, cars, and other seized possessions of the Megaupload defendants.

Kim Dotcom and his colleagues were branded as “fugitives” and the Government won its case. Dotcom’s legal team quickly appealed this verdict, but lost once more at the Fourth Circuit appeals court.

A few weeks ago Dotcom and his former colleagues petitioned the Supreme Court to take on the case.

They don’t see themselves as “fugitives” and want the assets returned. The US Government opposed the request, but according to a new reply filed by Megaupload’s legal team, the US Government ignores critical questions.

The Government has a “vested financial stake” in maintaining the current situation, they write, which allows the authorities to use their “fugitive” claims as an offensive weapon.

“Far from being directed towards persons who have fled or avoided our country while claiming assets in it, fugitive disentitlement is being used offensively to strip foreigners of their assets abroad,” the reply brief (pdf) reads.

According to Dotcom’s lawyers there are several conflicting opinions from lower courts, which should be clarified by the Supreme Court. That Dotcom and his colleagues have decided to fight their extradition in New Zealand, doesn’t warrant the seizure of their assets.

“Absent review, forfeiture of tens of millions of dollars will be a fait accompli without the merits being reached,” they write, adding that this is all the more concerning because the US Government’s criminal case may not be as strong as claimed.

“This is especially disconcerting because the Government’s criminal case is so dubious. When the Government characterizes Petitioners as ‘designing and profiting from a system that facilitated wide-scale copyright infringement,’ it continues to paint a portrait of secondary copyright infringement, which is not a crime.”

The defense team cites several issues that warrant review and urges the Supreme Court to hear the case. If not, the Government will effectively be able to use assets seizures as a pressure tool to urge foreign defendants to come to the US.

“If this stands, the Government can weaponize fugitive disentitlement in order to claim assets abroad,” the reply brief reads.

“It is time for the Court to speak to the Questions Presented. Over the past two decades it has never had a better vehicle to do so, nor is any such vehicle elsewhere in sight,” Dotcom’s lawyers add.

Whether the Supreme Court accepts or denies the case will likely be decided in the weeks to come.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

MPAA & RIAA Demand Tough Copyright Standards in NAFTA Negotiations

Post Syndicated from Andy original https://torrentfreak.com/mpaa-riaa-demand-tough-copyright-standards-in-nafta-negotiations-170621/

The North American Free Trade Agreement (NAFTA) between the United States, Canada, and Mexico was negotiated more than 25 years ago. With a quarter of a decade of developments to contend with, the United States wants to modernize.

“While our economy and U.S. businesses have changed considerably over that period, NAFTA has not,” the government says.

With this in mind, the US requested comments from interested parties seeking direction for negotiation points. With those comments now in, groups like the MPAA and RIAA have been making their positions known. It’s no surprise that intellectual property enforcement is high on the agenda.

“Copyright is the lifeblood of the U.S. motion picture and television industry. As such, MPAA places high priority on securing strong protection and enforcement disciplines in the intellectual property chapters of trade agreements,” the MPAA writes in its submission.

“Strong IPR protection and enforcement are critical trade priorities for the music industry. With IPR, we can create good jobs, make significant contributions to U.S. economic growth and security, invest in artists and their creativity, and drive technological innovation,” the RIAA notes.

While both groups have numerous demands, it’s clear that each seeks an environment where not only infringers can be held liable, but also Internet platforms and services.

For the RIAA, there is a big focus on the so-called ‘Value Gap’, a phenomenon found on user-uploaded content sites like YouTube that are able to offer infringing content while avoiding liability due to Section 512 of the DMCA.

“Today, user-uploaded content services, which have developed sophisticated on-demand music platforms, use this as a shield to avoid licensing music on fair terms like other digital services, claiming they are not legally responsible for the music they distribute on their site,” the RIAA writes.

“Services such as Apple Music, TIDAL, Amazon, and Spotify are forced to compete with services that claim they are not liable for the music they distribute.”

But if sites like YouTube are exercising their rights while acting legally under current US law, how can partners Canada and Mexico do any better? For the RIAA, that can be achieved by holding them to standards envisioned by the group when the DMCA was passed, not how things have panned out since.

Demanding that negotiators “protect the original intent” of safe harbor, the RIAA asks that a “high-level and high-standard service provider liability provision” is pursued. This, the music group says, should only be available to “passive intermediaries without requisite knowledge of the infringement on their platforms, and inapplicable to services actively engaged in communicating to the public.”

In other words, make sure that YouTube and similar sites won’t enjoy the same level of safe harbor protection as they do today.

The RIAA also requires any negotiated safe harbor provisions in NAFTA to be flexible in the event that the DMCA is tightened up in response to the ongoing safe harbor rules study.

In any event, NAFTA should not “support interpretations that no longer reflect today’s digital economy and threaten the future of legitimate and sustainable digital trade,” the RIAA states.

For the MPAA, Section 512 is also perceived as a problem. While noting that the original intent was to foster a system of shared responsibility between copyright owners and service providers, the MPAA says courts have subsequently let copyright holders down. Like the RIAA, the MPAA also suggests that Canada and Mexico can be held to higher standards.

“We recommend a new approach to this important trade policy provision by moving to high-level language that establishes intermediary liability and appropriate limitations on liability. This would be fully consistent with U.S. law and avoid the same misinterpretations by policymakers and courts overseas,” the MPAA writes.

“In so doing, a modernized NAFTA would be consistent with Trade Promotion Authority’s negotiating objective of ‘ensuring that standards of protection and enforcement keep pace with technological developments’.”

The MPAA also has some specific problems with Mexico, including unauthorized camcording. The Hollywood group says that 85 illicit audio and video recordings of films were linked to Mexican theaters in 2016. However, recording is not currently a criminal offense in Mexico.

Another issue for the MPAA is that criminal sanctions for commercial scale infringement are only available if the infringement is for profit.

“This has hampered enforcement against the above-discussed camcording problem but also against online infringement, such as peer-to-peer piracy, that may be on a scale that is immensely harmful to U.S. rightsholders but nonetheless occur without profit by the infringer,” the MPAA writes.

“The modernized NAFTA like other U.S. bilateral free trade agreements must provide for criminal sanctions against commercial scale infringements without proof of profit motive.”

Also of interest are the MPAA’s complaints against Mexico’s telecoms laws. Unlike in the US and many countries in Europe, Mexico’s ISPs are forbidden to hand out their customers’ personal details to rights holders looking to sue. This, the MPAA says, needs to change.

The submissions from the RIAA and MPAA can be found here and here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

In the Works – AWS Region in Hong Kong

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/in-the-works-aws-region-in-hong-kong/

Last year we launched new AWS Regions in Canada, India, Korea, the UK (London), and the United States (Ohio), and announced that new regions are coming to France (Paris), China (Ningxia), and Sweden (Stockholm).

Coming to Hong Kong in 2018
Today, I am happy to be able to tell you that we are planning to open up an AWS Region in Hong Kong, in 2018. Hong Kong is a leading international financial center, well known for its service oriented economy. It is rated highly on innovation and for ease of doing business. As an evangelist, I get to visit many great cities in the world, and was lucky to have spent some time in Hong Kong back in 2014 and met a number of awesome customers there. Many of these customers have given us feedback that they wanted a local AWS Region.

This will be the eighth AWS Region in Asia Pacific joining six other Regions there — Singapore, Tokyo, Sydney, Beijing, Seoul, and Mumbai, and an additional Region in China (Ningxia) expected to launch in the coming months. Together, these Regions will provide our customers with a total of 19 Availability Zones (AZs) and allow them to architect highly fault tolerant applications.

Today, our infrastructure comprises 43 Availability Zones across 16 geographic regions worldwide, with another three AWS Regions (and eight Availability Zones) in France, China, and Sweden coming online throughout 2017 and 2018, (see the AWS Global Infrastructure page for more info).

We are looking forward to serving new and existing customers in Hong Kong and working with partners across Asia-Pacific. Of course, the new region will also be open to existing AWS customers who would like to process and store data in Hong Kong. Public sector organizations such as government agencies, educational institutions, and nonprofits in Hong Kong will be able to use this region to store sensitive data locally (the AWS in the Public Sector page has plenty of success stories drawn from our worldwide customer base).

If you are a customer or a partner and have specific questions about this Region, you can contact our Hong Kong team.

Help Wanted
If you are interested in learning more about AWS positions in Hong Kong, please visit the Amazon Jobs site and set the location to Hong Kong.

Jeff;

 

Building Loosely Coupled, Scalable, C# Applications with Amazon SQS and Amazon SNS

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/building-loosely-coupled-scalable-c-applications-with-amazon-sqs-and-amazon-sns/

 
Stephen Liedig, Solutions Architect

 

One of the many challenges professional software architects and developers face is how to make cloud-native applications scalable, fault-tolerant, and highly available.

Fundamental to your project success is understanding the importance of making systems highly cohesive and loosely coupled. That means considering the multi-dimensional facets of system coupling to support the distributed nature of the applications that you are building for the cloud.

By that, I mean addressing not only the application-level coupling (managing incoming and outgoing dependencies), but also considering the impacts of of platform, spatial, and temporal coupling of your systems. Platform coupling relates to the interoperability, or lack thereof, of heterogeneous systems components. Spatial coupling deals with managing components at a network topology level or protocol level. Temporal, or runtime coupling, refers to the ability of a component within your system to do any kind of meaningful work while it is performing a synchronous, blocking operation.

The AWS messaging services, Amazon SQS and Amazon SNS, help you deal with these forms of coupling by providing mechanisms for:

  • Reliable, durable, and fault-tolerant delivery of messages between application components
  • Logical decomposition of systems and increased autonomy of components
  • Creating unidirectional, non-blocking operations, temporarily decoupling system components at runtime
  • Decreasing the dependencies that components have on each other through standard communication and network channels

Following on the recent topic, Building Scalable Applications and Microservices: Adding Messaging to Your Toolbox, in this post, I look at some of the ways you can introduce SQS and SNS into your architectures to decouple your components, and show how you can implement them using C#.

Walkthrough

To illustrate some of these concepts, consider a web application that processes customer orders. As good architects and developers, you have followed best practices and made your application scalable and highly available. Your solution included implementing load balancing, dynamic scaling across multiple Availability Zones, and persisting orders in a Multi-AZ Amazon RDS database instance, as in the following diagram.


In this example, the application is responsible for handling and persisting the order data, as well as dealing with increases in traffic for popular items.

One potential point of vulnerability in the order processing workflow is in saving the order in the database. The business expects that every order has been persisted into the database. However, any potential deadlock, race condition, or network issue could cause the persistence of the order to fail. Then, the order is lost with no recourse to restore the order.

With good logging capability, you may be able to identify when an error occurred and which customer’s order failed. This wouldn’t allow you to “restore” the transaction, and by that stage, your customer is no longer your customer.

As illustrated in the following diagram, introducing an SQS queue helps improve your ordering application. Using the queue isolates the processing logic into its own component and runs it in a separate process from the web application. This, in turn, allows the system to be more resilient to spikes in traffic, while allowing work to be performed only as fast as necessary in order to manage costs.


In addition, you now have a mechanism for persisting orders as messages (with the queue acting as a temporary database), and have moved the scope of your transaction with your database further down the stack. In the event of an application exception or transaction failure, this ensures that the order processing can be retired or redirected to the Amazon SQS Dead Letter Queue (DLQ), for re-processing at a later stage. (See the recent post, Using Amazon SQS Dead-Letter Queues to Control Message Failure, for more information on dead-letter queues.)

Scaling the order processing nodes

This change allows you now to scale the web application frontend independently from the processing nodes. The frontend application can continue to scale based on metrics such as CPU usage, or the number of requests hitting the load balancer. Processing nodes can scale based on the number of orders in the queue. Here is an example of scale-in and scale-out alarms that you would associate with the scaling policy.

Scale-out Alarm

aws cloudwatch put-metric-alarm --alarm-name AddCapacityToCustomerOrderQueue --metric-name ApproximateNumberOfMessagesVisible --namespace "AWS/SQS" 
--statistic Average --period 300 --threshold 3 --comparison-operator GreaterThanOrEqualToThreshold --dimensions Name=QueueName,Value=customer-orders
--evaluation-periods 2 --alarm-actions <arn of the scale-out autoscaling policy>

Scale-in Alarm

aws cloudwatch put-metric-alarm --alarm-name RemoveCapacityFromCustomerOrderQueue --metric-name ApproximateNumberOfMessagesVisible --namespace "AWS/SQS" 
 --statistic Average --period 300 --threshold 1 --comparison-operator LessThanOrEqualToThreshold --dimensions Name=QueueName,Value=customer-orders
 --evaluation-periods 2 --alarm-actions <arn of the scale-in autoscaling policy>

In the above example, use the ApproximateNumberOfMessagesVisible metric to discover the queue length and drive the scaling policy of the Auto Scaling group. Another useful metric is ApproximateAgeOfOldestMessage, when applications have time-sensitive messages and developers need to ensure that messages are processed within a specific time period.

Scaling the order processing implementation

On top of scaling at an infrastructure level using Auto Scaling, make sure to take advantage of the processing power of your Amazon EC2 instances by using as many of the available threads as possible. There are several ways to implement this. In this post, we build a Windows service that uses the BackgroundWorker class to process the messages from the queue.

Here’s a closer look at the implementation. In the first section of the consuming application, use a loop to continually poll the queue for new messages, and construct a ReceiveMessageRequest variable.

public static void PollQueue()
{
    while (_running)
    {
        Task<ReceiveMessageResponse> receiveMessageResponse;

        // Pull messages off the queue
        using (var sqs = new AmazonSQSClient())
        {
            const int maxMessages = 10;  // 1-10

            //Receiving a message
            var receiveMessageRequest = new ReceiveMessageRequest
            {
                // Get URL from Configuration
                QueueUrl = _queueUrl, 
                // The maximum number of messages to return. 
                // Fewer messages might be returned. 
                MaxNumberOfMessages = maxMessages, 
                // A list of attributes that need to be returned with message.
                AttributeNames = new List<string> { "All" },
                // Enable long polling. 
                // Time to wait for message to arrive on queue.
                WaitTimeSeconds = 5 
            };

            receiveMessageResponse = sqs.ReceiveMessageAsync(receiveMessageRequest);
        }

The WaitTimeSeconds property of the ReceiveMessageRequest specifies the duration (in seconds) that the call waits for a message to arrive in the queue before returning a response to the calling application. There are a few benefits to using long polling:

  • It reduces the number of empty responses by allowing SQS to wait until a message is available in the queue before sending a response.
  • It eliminates false empty responses by querying all (rather than a limited number) of the servers.
  • It returns messages as soon any message becomes available.

For more information, see Amazon SQS Long Polling.

After you have returned messages from the queue, you can start to process them by looping through each message in the response and invoking a new BackgroundWorker thread.

// Process messages
if (receiveMessageResponse.Result.Messages != null)
{
    foreach (var message in receiveMessageResponse.Result.Messages)
    {
        Console.WriteLine("Received SQS message, starting worker thread");

        // Create background worker to process message
        BackgroundWorker worker = new BackgroundWorker();
        worker.DoWork += (obj, e) => ProcessMessage(message);
        worker.RunWorkerAsync();
    }
}
else
{
    Console.WriteLine("No messages on queue");
}

The event handler, ProcessMessage, is where you implement business logic for processing orders. It is important to have a good understanding of how long a typical transaction takes so you can set a message VisibilityTimeout that is long enough to complete your operation. If order processing takes longer than the specified timeout period, the message becomes visible on the queue. Other nodes may pick it and process the same order twice, leading to unintended consequences.

Handling Duplicate Messages

In order to manage duplicate messages, seek to make your processing application idempotent. In mathematics, idempotent describes a function that produces the same result if it is applied to itself:

f(x) = f(f(x))

No matter how many times you process the same message, the end result is the same (definition from Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions, Hohpe and Wolf, 2004).

There are several strategies you could apply to achieve this:

  • Create messages that have inherent idempotent characteristics. That is, they are non-transactional in nature and are unique at a specified point in time. Rather than saying “place new order for Customer A,” which adds a duplicate order to the customer, use “place order <orderid> on <timestamp> for Customer A,” which creates a single order no matter how often it is persisted.
  • Deliver your messages via an Amazon SQS FIFO queue, which provides the benefits of message sequencing, but also mechanisms for content-based deduplication. You can deduplicate using the MessageDeduplicationId property on the SendMessage request or by enabling content-based deduplication on the queue, which generates a hash for MessageDeduplicationId, based on the content of the message, not the attributes.
var sendMessageRequest = new SendMessageRequest
{
    QueueUrl = _queueUrl,
    MessageBody = JsonConvert.SerializeObject(order),
    MessageGroupId = Guid.NewGuid().ToString("N"),
    MessageDeduplicationId = Guid.NewGuid().ToString("N")
};
  • If using SQS FIFO queues is not an option, keep a message log of all messages attributes processed for a specified period of time, as an alternative to message deduplication on the receiving end. Verifying the existence of the message in the log before processing the message adds additional computational overhead to your processing. This can be minimized through low latency persistence solutions such as Amazon DynamoDB. Bear in mind that this solution is dependent on the successful, distributed transaction of the message and the message log.

Handling exceptions

Because of the distributed nature of SQS queues, it does not automatically delete the message. Therefore, you must explicitly delete the message from the queue after processing it, using the message ReceiptHandle property (see the following code example).

However, if at any stage you have an exception, avoid handling it as you normally would. The intention is to make sure that the message ends back on the queue, so that you can gracefully deal with intermittent failures. Instead, log the exception to capture diagnostic information, and swallow it.

By not explicitly deleting the message from the queue, you can take advantage of the VisibilityTimeout behavior described earlier. Gracefully handle the message processing failure and make the unprocessed message available to other nodes to process.

In the event that subsequent retries fail, SQS automatically moves the message to the configured DLQ after the configured number of receives has been reached. You can further investigate why the order process failed. Most importantly, the order has not been lost, and your customer is still your customer.

private static void ProcessMessage(Message message)
{
    using (var sqs = new AmazonSQSClient())
    {
        try
        {
            Console.WriteLine("Processing message id: {0}", message.MessageId);

            // Implement messaging processing here
            // Ensure no downstream resource contention (parallel processing)
            // <your order processing logic in here…>
            Console.WriteLine("{0} Thread {1}: {2}", DateTime.Now.ToString("s"), Thread.CurrentThread.ManagedThreadId, message.MessageId);
            
            // Delete the message off the queue. 
            // Receipt handle is the identifier you must provide 
            // when deleting the message.
            var deleteRequest = new DeleteMessageRequest(_queueName, message.ReceiptHandle);
            sqs.DeleteMessageAsync(deleteRequest);
            Console.WriteLine("Processed message id: {0}", message.MessageId);

        }
        catch (Exception ex)
        {
            // Do nothing.
            // Swallow exception, message will return to the queue when 
            // visibility timeout has been exceeded.
            Console.WriteLine("Could not process message due to error. Exception: {0}", ex.Message);
        }
    }
}

Using SQS to adapt to changing business requirements

One of the benefits of introducing a message queue is that you can accommodate new business requirements without dramatically affecting your application.

If, for example, the business decided that all orders placed over $5000 are to be handled as a priority, you could introduce a new “priority order” queue. The way the orders are processed does not change. The only significant change to the processing application is to ensure that messages from the “priority order” queue are processed before the “standard order” queue.

The following diagram shows how this logic could be isolated in an “order dispatcher,” whose only purpose is to route order messages to the appropriate queue based on whether the order exceeds $5000. Nothing on the web application or the processing nodes changes other than the target queue to which the order is sent. The rates at which orders are processed can be achieved by modifying the poll rates and scalability settings that I have already discussed.

Extending the design pattern with Amazon SNS

Amazon SNS supports reliable publish-subscribe (pub-sub) scenarios and push notifications to known endpoints across a wide variety of protocols. It eliminates the need to periodically check or poll for new information and updates. SNS supports:

  • Reliable storage of messages for immediate or delayed processing
  • Publish / subscribe – direct, broadcast, targeted “push” messaging
  • Multiple subscriber protocols
  • Amazon SQS, HTTP, HTTPS, email, SMS, mobile push, AWS Lambda

With these capabilities, you can provide parallel asynchronous processing of orders in the system and extend it to support any number of different business use cases without affecting the production environment. This is commonly referred to as a “fanout” scenario.

Rather than your web application pushing orders to a queue for processing, send a notification via SNS. The SNS messages are sent to a topic and then replicated and pushed to multiple SQS queues and Lambda functions for processing.

As the diagram above shows, you have the development team consuming “live” data as they work on the next version of the processing application, or potentially using the messages to troubleshoot issues in production.

Marketing is consuming all order information, via a Lambda function that has subscribed to the SNS topic, inserting the records into an Amazon Redshift warehouse for analysis.

All of this, of course, is happening without affecting your order processing application.

Summary

While I haven’t dived deep into the specifics of each service, I have discussed how these services can be applied at an architectural level to build loosely coupled systems that facilitate multiple business use cases. I’ve also shown you how to use infrastructure and application-level scaling techniques, so you can get the most out of your EC2 instances.

One of the many benefits of using these managed services is how quickly and easily you can implement powerful messaging capabilities in your systems, and lower the capital and operational costs of managing your own messaging middleware.

Using Amazon SQS and Amazon SNS together can provide you with a powerful mechanism for decoupling application components. This should be part of design considerations as you architect for the cloud.

For more information, see the Amazon SQS Developer Guide and Amazon SNS Developer Guide. You’ll find tutorials on all the concepts covered in this post, and more. To can get started using the AWS console or SDK of your choice visit:

Happy messaging!

BPI Breaks Record After Sending 310 Million Google Takedowns

Post Syndicated from Andy original https://torrentfreak.com/bpi-breaks-record-after-sending-310-million-google-takedowns-170619/

A little over a year ago during March 2016, music industry group BPI reached an important milestone. After years of sending takedown notices to Google, the group burst through the 200 million URL barrier.

The fact that it took BPI several years to reach its 200 million milestone made the surpassing of the quarter billion milestone a few months later even more remarkable. In October 2016, the group sent its 250 millionth takedown to Google, a figure that nearly doubled when accounting for notices sent to Microsoft’s Bing.

But despite the volumes, the battle hadn’t been won, let alone the war. The BPI’s takedown machine continued to run at a remarkable rate, churning out millions more notices per week.

As a result, yet another new milestone was reached this month when the BPI smashed through the 300 million URL barrier. Then, days later, a further 10 million were added, with the latter couple of million added during the time it took to put this piece together.

BPI takedown notices, as reported by Google

While demanding that Google places greater emphasis on its de-ranking of ‘pirate’ sites, the BPI has called again and again for a “notice and stay down” regime, to ensure that content taken down by the search engine doesn’t simply reappear under a new URL. It’s a position BPI maintains today.

“The battle would be a whole lot easier if intermediaries played fair,” a BPI spokesperson informs TF.

“They need to take more proactive responsibility to reduce infringing content that appears on their platform, and, where we expressly notify infringing content to them, to ensure that they do not only take it down, but also keep it down.”

The long-standing suggestion is that the volume of takedown notices sent would reduce if a “take down, stay down” regime was implemented. The BPI says it’s difficult to present a precise figure but infringing content has a tendency to reappear, both in search engines and on hosting sites.

“Google rejects repeat notices for the same URL. But illegal content reappears as it is re-indexed by Google. As to the sites that actually host the content, the vast majority of notices sent to them could be avoided if they implemented take-down & stay-down,” BPI says.

The fact that the BPI has added 60 million more takedowns since the quarter billion milestone a few months ago is quite remarkable, particularly since there appears to be little slowdown from month to month. However, the numbers have grown so huge that 310 billion now feels a lot like 250 million, with just a few added on top for good measure.

That an extra 60 million takedowns can almost be dismissed as a handful is an indication of just how massive the issue is online. While pirates always welcome an abundance of links to juicy content, it’s no surprise that groups like the BPI are seeking more comprehensive and sustainable solutions.

Previously, it was hoped that the Digital Economy Bill would provide some relief, hopefully via government intervention and the imposition of a search engine Code of Practice. In the event, however, all pressure on search engines was removed from the legislation after a separate voluntary agreement was reached.

All parties agreed that the voluntary code should come into effect two weeks ago on June 1 so it seems likely that some effects should be noticeable in the near future. But the BPI says it’s still early days and there’s more work to be done.

“BPI has been working productively with search engines since the voluntary code was agreed to understand how search engines approach the problem, but also what changes can and have been made and how results can be improved,” the group explains.

“The first stage is to benchmark where we are and to assess the impact of the changes search engines have made so far. This will hopefully be completed soon, then we will have better information of the current picture and from that we hope to work together to continue to improve search for rights owners and consumers.”

With more takedown notices in the pipeline not yet publicly reported by Google, the BPI informs TF that it has now notified the search giant of 315 million links to illegal content.

“That’s an astonishing number. More than 1 in 10 of the entire world’s notices to Google come from BPI. This year alone, one in every three notices sent to Google from BPI is for independent record label repertoire,” BPI concludes.

While it’s clear that groups like BPI have developed systems to cope with the huge numbers of takedown notices required in today’s environment, it’s clear that few rightsholders are happy with the status quo. With that in mind, the fight will continue, until search engines are forced into compromise. Considering the implications, that could only appear on a very distant horizon.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

[$] The Brave web browser

Post Syndicated from jake original https://lwn.net/Articles/725261/rss

The Brave web browser is a project from
a new company called Brave Software. It was founded by Brendan Eich, who is the
inventor of JavaScript and former developer and CTO at Mozilla; he
hopes to dramatically re-invent the advertising model of the web while
strengthening user anonymity and security. Brave’s value proposition is
that instead of being served advertisements from web sites that use the
revenue to pay their bills, users can opt to directly pay the content
providers of their choosing with cryptocurrency. Also, there is a
recognition of the
utility of targeted advertising, so users have an option of saving a local,
protected profile that can be used anonymously to obtain targeted
advertisements instead of having their online behavior tracked and sold by
a third party.

US Opposes Kim Dotcom’s Supreme Court Petition Over Seized Millions

Post Syndicated from Ernesto original https://torrentfreak.com/us-opposes-kim-dotcoms-supreme-court-petition-over-seized-millions-170613/

megaupload-logoFollowing the 2012 raid on Megaupload and Kim Dotcom, U.S. and New Zealand authorities seized millions of dollars in cash and other property.

Claiming the assets were obtained through copyright and money laundering crimes, the U.S. government launched a separate civil action in which it asked the court to forfeit the bank accounts, cars, and other seized possessions of the Megaupload defendants.

The U.S. branded Dotcom and his colleagues as “fugitives” and won their case. Dotcom’s legal team quickly appealed this verdict, but lost once more at the Fourth Circuit appeals court.

However, Dotcom didn’t give up and petitioned the US Supreme Court to hear the case. Together with the other defendants, he wants the Supreme Court to overturn the “fugitive disentitlement” ruling and the forfeiture of his assets.

The crux of the case is whether or not the District Court’s order to forfeit an estimated $67 million in assets was right. The defense argues that Dotcom and the other Megaupload defendants were wrongfully labeled as fugitives by the Department of Justice.

“If left undisturbed, the Fourth Circuit’s decision enables the Government to obtain civil forfeiture of every penny of a foreign citizen’s foreign assets based on unproven allegations of the most novel, dubious United States crimes,” Dotcom’s legal team wrote.

The United States Government disagrees with this assessment. In their opposition brief (pdf), submitted late last week and picked up by ARS, the Department of Justice asks the Supreme Court not to take on the case.

According to the US, the decision to label Dotcom and his colleagues as fugitives is how Congress intended the relevant section of the law to work. In addition, the current rulings are not incompatible with previous court decisions in similar cases.

“Petitioners also seek review of the court of appeals’ holding that they qualify as ‘fugitives’ under the federal fugitive-disentitlement statute […] because they declined to enter the United States with the specific intent to avoid prosecution,” DoJ writes in its brief.

“That contention does not warrant review. The court of appeals correctly construed Section 2466 in light of its text and purpose. Its holding applying the statute to the facts here does not conflict with any decision of another circuit,” the brief adds.

The full opposition brief responds in detail to the petition of Dotcom and his colleagues, with the US ultimately concluding that the Supreme Court should deny the request.

Dotcom and his legal team have previously stated that they need more resources to mount a proper defense against the criminal complaint. The case has been ongoing for more than half a decade and is being fought in several courts, which has proven to be rather expensive.

Whether the Supreme Court accepts or denies the case will likely be decided in the weeks to come. Until then, the waiting continues.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Usenet Provider is Obliged to Identify Pirates, Court Rules

Post Syndicated from Ernesto original https://torrentfreak.com/usenet-provider-has-to-identify-pirates-court-rules-170609/

Dutch anti-piracy group BREIN has targeted pirates of all shapes and sizes over the past several years.

It’s also one of the few groups that actively tracks down copyright infringers on Usenet, which still has millions of frequent users.

BREIN sets its aim on prolific uploaders and other large-scale copyright infringers. After identifying its targets, it asks providers to reveal the personal details connected to the account.

Last December, BREIN asked Usenet provider Eweka to hand over the personal details of one of its former customers but the provider refused to cooperate voluntarily.

In its defense, the Usenet provider argued that it’s a neutral intermediary that would rather not perform the role of piracy police. Instead, it preferred to rely on the court to make a decision.

The provider had already taken a similar position earlier last year, but the Court of Haarlem ruled that it must hand over the information.

In a new ruling this week, the Court issued a similar order.

The Court stressed that in these type of situations the Usenet provider is required to hand over the requested details, without intervention from the court. This is in line with case law.

Under Dutch law, ISPs can be obliged to hand over the personal details of their customers if the infringing activity is plausible and the aggrieved party has a legitimate interest.

The former Eweka customer was known under the alias ‘Badfan69’ and previously uploaded 9,538 allegedly infringing works to Usenet, Tweakers reports. He was tracked down through information from the headers of the binaries he posted.

BREIN is pleased with the verdict, which once again strengthens its position in cases where third-party providers hold information on infringing customers.

“Most of the intermediaries adhere to the law and voluntarily provide the relevant data when BREIN makes a motivated request,” BREIN director Tim Kuik responds.

“They have to decide quickly because rightsholders have an interest in stopping uploaders and holding them liable as soon as possible. This sentence emphasizes this once again.”

The court ordered Eweka to pay legal fees of roughly 1,500 euros. In addition, the provider faces a penalty of 1,000 euros per day, to a maximum of 100,000 euros, if it fails to hand over the requested information in its possession.

Eweka hasn’t commented publicly on the verdict yet. But, with two rulings in favor of BREIN, it is unlikely that the provider will continue to fight similar cases in the future.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

[$] Range reader/writer locks for the kernel

Post Syndicated from corbet original https://lwn.net/Articles/724502/rss

The kernel uses a variety of lock types internally, but they all share one
feature in common: they are a simple either/or proposition. When a lock is
obtained for a resource, the entire resource is locked, even if
exclusive access is only needed to a part of that resource. Many resources
managed by the kernel are complex entities for which it may make sense to
only lock a smaller part; files (consisting of a range of bytes) or a
process’s address space are examples of this type of resource. For years,
kernel developers have talked about adding “range locks” — locks that would
only apply to a portion of a given resource — as a way of increasing
concurrency. Work has progressed in that
area, and range locks may soon be added to the kernel’s locking toolkit.

“Only a year? It’s felt like forever”: a twelve-month retrospective

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/12-months-raspberry-pi/

This weekend saw my first anniversary at Raspberry Pi, and this blog marks my 100th post written for the company. It would have been easy to let one milestone or the other slide had they not come along hand in hand, begging for some sort of acknowledgement.

Alex, Matt, and Courtney in a punt on the Cam

The day Liz decided to keep me

So here it is!

Joining the crew

Prior to my position in the Comms team as Social Media Editor, my employment history was largely made up of retail sales roles and, before that, bit parts in theatrical backstage crews. I never thought I would work for the Raspberry Pi Foundation, despite its firm position on my Top Five Awesome Places I’d Love to Work list. How could I work for a tech company when my knowledge of tech stretched as far as dismantling my Game Boy when I was a kid to see how the insides worked, or being the one friend everyone went to when their phone didn’t do what it was meant to do? I never thought about the other side of the Foundation coin, or how I could find my place within the hidden workings that turned the cogs that brought everything together.

… when suddenly, as if out of nowhere, a new job with a dream company. #raspberrypi #positive #change #dosomething

12 Likes, 1 Comments – Alex J’rassic (@thealexjrassic) on Instagram: “… when suddenly, as if out of nowhere, a new job with a dream company. #raspberrypi #positive…”

A little luck, a well-written though humorous resumé, and a meeting with Liz and Helen later, I found myself the newest member of the growing team at Pi Towers.

Ticking items off the Bucket List

I thought it would be fun to point out some of the chances I’ve had over the last twelve months and explain how they fit within the world of Raspberry Pi. After all, we’re about more than just a $35 credit card-sized computer. We’re a charitable Foundation made up of some wonderful and exciting projects, people, and goals.

High altitude ballooning (HAB)

Skycademy offers educators in the UK the chance to come to Pi Towers Cambridge to learn how to plan a balloon launch, build a payload with onboard Raspberry Pi and Camera Module, and provide teachers with the skills needed to take their students on an adventure to near space, with photographic evidence to prove it.

All the screens you need to hunt balloons. . We have our landing point and are now rushing to Therford to find the payload in a field. . #HAB #RasppberryPi

332 Likes, 5 Comments – Raspberry Pi (@raspberrypifoundation) on Instagram: “All the screens you need to hunt balloons. . We have our landing point and are now rushing to…”

I was fortunate enough to join Sky Captain James, along with Dan Fisher, Dave Akerman, and Steve Randell on a test launch back in August last year. Testing out new kit that James had still been tinkering with that morning, we headed to a field in Elsworth, near Cambridge, and provided Facebook Live footage of the process from payload build to launch…to the moment when our balloon landed in an RAF shooting range some hours later.

RAF firing range sign

“Can we have our balloon back, please, mister?”

Having enjoyed watching Blue Peter presenters send up a HAB when I was a child, I marked off the event on my bucket list with a bold tick, and I continue to show off the photographs from our Raspberry Pi as it reached near space.

Spend the day launching/chasing a high-altitude balloon. Look how high it went!!! #HAB #ballooning #space #wellspacekinda #ish #photography #uk #highaltitude

13 Likes, 2 Comments – Alex J’rassic (@thealexjrassic) on Instagram: “Spend the day launching/chasing a high-altitude balloon. Look how high it went!!! #HAB #ballooning…”

You can find more information on Skycademy here, plus more detail about our test launch day in Dan’s blog post here.

Dear Raspberry Pi Friends…

My desk is slowly filling with stuff: notes, mementoes, and trinkets that find their way to me from members of the community, both established and new to the life of Pi. There are thank you notes, updates, and more from people I’ve chatted to online as they explore their way around the world of Pi.

Letter of thanks to Raspberry Pi from a young fan

*heart melts*

By plugging myself into social media on a daily basis, I often find hidden treasures that go unnoticed due to the high volume of tags we receive on Facebook, Twitter, Instagram, and so on. Kids jumping off chairs in delight as they complete their first Scratch project, newcomers to the Raspberry Pi shedding a tear as they make an LED blink on their kitchen table, and seasoned makers turning their hobby into something positive to aid others.

It’s wonderful to join in the excitement of people discovering a new skill and exploring the community of Raspberry Pi makers: I’ve been known to shed a tear as a result.

Meeting educators at Bett, chatting to teen makers at makerspaces, and sharing a cupcake or three at the birthday party have been incredible opportunities to get to know you all.

You’re all brilliant.

The Queens of Robots, both shoddy and otherwise

Last year we welcomed the Queen of Shoddy Robots, Simone Giertz to Pi Towers, where we chatted about making, charity, and space while wandering the colleges of Cambridge and hanging out with flat Tim Peake.

Queen of Robots @simonegiertz came to visit #PiTowers today. We hung out with cardboard @astro_timpeake and ate chelsea buns at @fitzbillies #Cambridge. . We also had a great talk about the educational projects of the #RaspberryPi team, #AstroPi and how not enough people realise we’re a #charity. . If you’d like to learn more about the Raspberry Pi Foundation and the work we do with #teachers and #education, check out our website – www.raspberrypi.org. . How was your day? Get up to anything fun?

597 Likes, 3 Comments – Raspberry Pi (@raspberrypifoundation) on Instagram: “Queen of Robots @simonegiertz came to visit #PiTowers today. We hung out with cardboard…”

And last month, the wonderful Estefannie ‘Explains it All’ de La Garza came to hang out, make things, and discuss our educational projects.

Estefannie on Twitter

Ahhhh!!! I still can’t believe I got to hang out and make stuff at the @Raspberry_Pi towers!! Thank you thank you!!

Meeting such wonderful, exciting, and innovative YouTubers was a fantastic inspiration to work on my own projects and to try to do more to help others discover ways to connect with tech through their own interests.

Those ‘wow’ moments

Every Raspberry Pi project I see on a daily basis is awesome. The moment someone takes an idea and does something with it is, in my book, always worthy of awe and appreciation. Whether it be the aforementioned flashing LED, or sending Raspberry Pis to the International Space Station, if you have turned your idea into reality, I applaud you.

Some of my favourite projects over the last twelve months have not only made me say “Wow!”, they’ve also inspired me to want to do more with myself, my time, and my growing maker skill.

Museum in a Box on Twitter

Great to meet @alexjrassic today and nerd out about @Raspberry_Pi and weather balloons and @Space_Station and all things #edtech 🎈⛅🛰📚🤖

Projects such as Museum in a Box, a wonderful hands-on learning aid that brings the world to the hands of children across the globe, honestly made me tear up as I placed a miniaturised 3D-printed Virginia Woolf onto a wooden box and gasped as she started to speak to me.

Jill Ogle’s Let’s Robot project had me in awe as Twitch-controlled Pi robots tackled mazes, attempted to cut birthday cake, or swung to slap Jill in the face over webcam.

Jillian Ogle on Twitter

@SryAbtYourCats @tekn0rebel @Beam Lol speaking of faces… https://t.co/1tqFlMNS31

Every day I discover new, wonderful builds that both make me wish I’d thought of them first, and leave me wondering how they manage to make them work in the first place.

Space

We have Raspberry Pis in space. SPACE. Actually space.

Raspberry Pi on Twitter

New post: Mission accomplished for the European @astro_pi challenge and @esa @Thom_astro is on his way home 🚀 https://t.co/ycTSDR1h1Q

Twelve months later, this still blows my mind.

And let’s not forget…

  • The chance to visit both the Houses of Parliment and St James’s Palace

Raspberry Pi team at the Houses of Parliament

  • Going to a Doctor Who pre-screening and meeting Peter Capaldi, thanks to Clare Sutcliffe

There’s no need to smile when you’re #DoctorWho.

13 Likes, 2 Comments – Alex J’rassic (@thealexjrassic) on Instagram: “There’s no need to smile when you’re #DoctorWho.”

We’re here. Where are you? . . . . . #raspberrypi #vidconeu #vidcon #pizero #zerow #travel #explore #adventure #youtube

1,944 Likes, 30 Comments – Raspberry Pi (@raspberrypifoundation) on Instagram: “We’re here. Where are you? . . . . . #raspberrypi #vidconeu #vidcon #pizero #zerow #travel #explore…”

  • Making a GIF Cam and other builds, and sharing them with you all via the blog

Made a Gif Cam using a Raspberry Pi, Pi camera, button and a couple LEDs. . When you press the button, it takes 8 images and stitches them into a gif file. The files then appear on my MacBook. . Check out our Twitter feed (Raspberry_Pi) for examples! . Next step is to fit it inside a better camera body. . #DigitalMaking #Photography #Making #Camera #Gif #MakersGonnaMake #LED #Creating #PhotosofInstagram #RaspberryPi

19 Likes, 1 Comments – Alex J’rassic (@thealexjrassic) on Instagram: “Made a Gif Cam using a Raspberry Pi, Pi camera, button and a couple LEDs. . When you press the…”

The next twelve months

Despite Eben jokingly firing me near-weekly across Twitter, or Philip giving me the ‘Dad glare’ when I pull wires and buttons out of a box under my desk to start yet another project, I don’t plan on going anywhere. Over the next twelve months, I hope to continue discovering awesome Pi builds, expanding on my own skills, and curating some wonderful projects for you via the Raspberry Pi blog, the Raspberry Pi Weekly newsletter, my submissions to The MagPi Magazine, and the occasional video interview or two.

It’s been a pleasure. Thank you for joining me on the ride!

The post “Only a year? It’s felt like forever”: a twelve-month retrospective appeared first on Raspberry Pi.

Building High-Throughput Genomics Batch Workflows on AWS: Workflow Layer (Part 4 of 4)

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/building-high-throughput-genomics-batch-workflows-on-aws-workflow-layer-part-4-of-4/

Aaron Friedman is a Healthcare and Life Sciences Partner Solutions Architect at AWS

Angel Pizarro is a Scientific Computing Technical Business Development Manager at AWS

This post is the fourth in a series on how to build a genomics workflow on AWS. In Part 1, we introduced a general architecture, shown below, and highlighted the three common layers in a batch workflow:

  • Job
  • Batch
  • Workflow

In Part 2, you built a Docker container for each job that needed to run as part of your workflow, and stored them in Amazon ECR.

In Part 3, you tackled the batch layer and built a scalable, elastic, and easily maintainable batch engine using AWS Batch. This solution took care of dynamically scaling your compute resources in response to the number of runnable jobs in your job queue length as well as managed job placement.

In part 4, you build out the workflow layer of your solution using AWS Step Functions and AWS Lambda. You then run an end-to-end genomic analysis―specifically known as exome secondary analysis―for many times at a cost of less than $1 per exome.

Step Functions makes it easy to coordinate the components of your applications using visual workflows. Building applications from individual components that each perform a single function lets you scale and change your workflow quickly. You can use the graphical console to arrange and visualize the components of your application as a series of steps, which simplify building and running multi-step applications. You can change and add steps without writing code, so you can easily evolve your application and innovate faster.

An added benefit of using Step Functions to define your workflows is that the state machines you create are immutable. While you can delete a state machine, you cannot alter it after it is created. For regulated workloads where auditing is important, you can be assured that state machines you used in production cannot be altered.

In this blog post, you will create a Lambda state machine to orchestrate your batch workflow. For more information on how to create a basic state machine, please see this Step Functions tutorial.

All code related to this blog series can be found in the associated GitHub repository here.

Build a state machine building block

To skip the following steps, we have provided an AWS CloudFormation template that can deploy your Step Functions state machine. You can use this in combination with the setup you did in part 3 to quickly set up the environment in which to run your analysis.

The state machine is composed of smaller state machines that submit a job to AWS Batch, and then poll and check its execution.

The steps in this building block state machine are as follows:

  1. A job is submitted.
    Each analytical module/job has its own Lambda function for submission and calls the batchSubmitJob Lambda function that you built in the previous blog post. You will build these specialized Lambda functions in the following section.
  2. The state machine queries the AWS Batch API for the job status.
    This is also a Lambda function.
  3. The job status is checked to see if the job has completed.
    If the job status equals SUCCESS, proceed to log the final job status. If the job status equals FAILED, end the execution of the state machine. In all other cases, wait 30 seconds and go back to Step 2.

Here is the JSON representing this state machine.

{
  "Comment": "A simple example that submits a Job to AWS Batch",
  "StartAt": "SubmitJob",
  "States": {
    "SubmitJob": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<account-id>::function:batchSubmitJob",
      "Next": "GetJobStatus"
    },
    "GetJobStatus": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<account-id>:function:batchGetJobStatus",
      "Next": "CheckJobStatus",
      "InputPath": "$",
      "ResultPath": "$.status"
    },
    "CheckJobStatus": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.status",
          "StringEquals": "FAILED",
          "End": true
        },
        {
          "Variable": "$.status",
          "StringEquals": "SUCCEEDED",
          "Next": "GetFinalJobStatus"
        }
      ],
      "Default": "Wait30Seconds"
    },
    "Wait30Seconds": {
      "Type": "Wait",
      "Seconds": 30,
      "Next": "GetJobStatus"
    },
    "GetFinalJobStatus": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<account-id>:function:batchGetJobStatus",
      "End": true
    }
  }
}

Building the Lambda functions for the state machine

You need two basic Lambda functions for this state machine. The first one submits a job to AWS Batch and the second checks the status of the AWS Batch job that was submitted.

In AWS Step Functions, you specify an input as JSON that is read into your state machine. Each state receives the aggregate of the steps immediately preceding it, and you can specify which components a state passes on to its children. Because you are using Lambda functions to execute tasks, one of the easiest routes to take is to modify the input JSON, represented as a Python dictionary, within the Lambda function and return the entire dictionary back for the next state to consume.

Building the batchSubmitIsaacJob Lambda function

For Step 1 above, you need a Lambda function for each of the steps in your analysis workflow. As you created a generic Lambda function in the previous post to submit a batch job (batchSubmitJob), you can use that function as the basis for the specialized functions you’ll include in this state machine. Here is such a Lambda function for the Isaac aligner.

from __future__ import print_function

import boto3
import json
import traceback

lambda_client = boto3.client('lambda')



def lambda_handler(event, context):
    try:
        # Generate output put
        bam_s3_path = '/'.join([event['resultsS3Path'], event['sampleId'], 'bam/'])

        depends_on = event['dependsOn'] if 'dependsOn' in event else []

        # Generate run command
        command = [
            '--bam_s3_folder_path', bam_s3_path,
            '--fastq1_s3_path', event['fastq1S3Path'],
            '--fastq2_s3_path', event['fastq2S3Path'],
            '--reference_s3_path', event['isaac']['referenceS3Path'],
            '--working_dir', event['workingDir']
        ]

        if 'cmdArgs' in event['isaac']:
            command.extend(['--cmd_args', event['isaac']['cmdArgs']])
        if 'memory' in event['isaac']:
            command.extend(['--memory', event['isaac']['memory']])

        # Submit Payload
        response = lambda_client.invoke(
            FunctionName='batchSubmitJob',
            InvocationType='RequestResponse',
            LogType='Tail',
            Payload=json.dumps(dict(
                dependsOn=depends_on,
                containerOverrides={
                    'command': command,
                },
                jobDefinition=event['isaac']['jobDefinition'],
                jobName='-'.join(['isaac', event['sampleId']]),
                jobQueue=event['isaac']['jobQueue']
            )))

        response_payload = response['Payload'].read()

        # Update event
        event['bamS3Path'] = bam_s3_path
        event['jobId'] = json.loads(response_payload)['jobId']
        
        return event
    except Exception as e:
        traceback.print_exc()
        raise e

In the Lambda console, create a Python 2.7 Lambda function named batchSubmitIsaacJob and paste in the above code. Use the LambdaBatchExecutionRole that you created in the previous post. For more information, see Step 2.1: Create a Hello World Lambda Function.

This Lambda function reads in the inputs passed to the state machine it is part of, formats the data for the batchSubmitJob Lambda function, invokes that Lambda function, and then modifies the event dictionary to pass onto the subsequent states. You can repeat these for each of the other tools, which can be found in the tools//lambda/lambda_function.py script in the GitHub repo.

Building the batchGetJobStatus Lambda function

For Step 2 above, the process queries the AWS Batch DescribeJobs API action with jobId to identify the state that the job is in. You can put this into a Lambda function to integrate it with Step Functions.

In the Lambda console, create a new Python 2.7 function with the LambdaBatchExecutionRole IAM role. Name your function batchGetJobStatus and paste in the following code. This is similar to the batch-get-job-python27 Lambda blueprint.

from __future__ import print_function

import boto3
import json

print('Loading function')

batch_client = boto3.client('batch')

def lambda_handler(event, context):
    # Log the received event
    print("Received event: " + json.dumps(event, indent=2))
    # Get jobId from the event
    job_id = event['jobId']

    try:
        response = batch_client.describe_jobs(
            jobs=[job_id]
        )
        job_status = response['jobs'][0]['status']
        return job_status
    except Exception as e:
        print(e)
        message = 'Error getting Batch Job status'
        print(message)
        raise Exception(message)

Structuring state machine input

You have structured the state machine input so that general file references are included at the top-level of the JSON object, and any job-specific items are contained within a nested JSON object. At a high level, this is what the input structure looks like:

{
        "general_field_1": "value1",
        "general_field_2": "value2",
        "general_field_3": "value3",
        "job1": {},
        "job2": {},
        "job3": {}
}

Building the full state machine

By chaining these state machine components together, you can quickly build flexible workflows that can process genomes in multiple ways. The development of the larger state machine that defines the entire workflow uses four of the above building blocks. You use the Lambda functions that you built in the previous section. Rename each building block submission to match the tool name.

We have provided a CloudFormation template to deploy your state machine and the associated IAM roles. In the CloudFormation console, select Create Stack, choose your template (deploy_state_machine.yaml), and enter in the ARNs for the Lambda functions you created.

Continue through the rest of the steps and deploy your stack. Be sure to check the box next to "I acknowledge that AWS CloudFormation might create IAM resources."

Once the CloudFormation stack is finished deploying, you should see the following image of your state machine.

In short, you first submit a job for Isaac, which is the aligner you are using for the analysis. Next, you use parallel state to split your output from "GetFinalIsaacJobStatus" and send it to both your variant calling step, Strelka, and your QC step, Samtools Stats. These then are run in parallel and you annotate the results from your Strelka step with snpEff.

Putting it all together

Now that you have built all of the components for a genomics secondary analysis workflow, test the entire process.

We have provided sequences from an Illumina sequencer that cover a region of the genome known as the exome. Most of the positions in the genome that we have currently associated with disease or human traits reside in this region, which is 1–2% of the entire genome. The workflow that you have built works for both analyzing an exome, as well as an entire genome.

Additionally, we have provided prebuilt reference genomes for Isaac, located at:

s3://aws-batch-genomics-resources/reference/

If you are interested, we have provided a script that sets up all of that data. To execute that script, run the following command on a large EC2 instance:

make reference REGISTRY=<your-ecr-registry>

Indexing and preparing this reference takes many hours on a large-memory EC2 instance. Be careful about the costs involved and note that the data is available through the prebuilt reference genomes.

Starting the execution

In a previous section, you established a provenance for the JSON that is fed into your state machine. For ease, we have auto-populated the input JSON for you to the state machine. You can also find this in the GitHub repo under workflow/test.input.json:

{
  "fastq1S3Path": "s3://aws-batch-genomics-resources/fastq/SRR1919605_1.fastq.gz",
  "fastq2S3Path": "s3://aws-batch-genomics-resources/fastq/SRR1919605_2.fastq.gz",
  "referenceS3Path": "s3://aws-batch-genomics-resources/reference/hg38.fa",
  "resultsS3Path": "s3://<bucket>/genomic-workflow/results",
  "sampleId": "NA12878_states_1",
  "workingDir": "/scratch",
  "isaac": {
    "jobDefinition": "isaac-myenv:1",
    "jobQueue": "arn:aws:batch:us-east-1:<account-id>:job-queue/highPriority-myenv",
    "referenceS3Path": "s3://aws-batch-genomics-resources/reference/isaac/"
  },
  "samtoolsStats": {
    "jobDefinition": "samtools_stats-myenv:1",
    "jobQueue": "arn:aws:batch:us-east-1:<account-id>:job-queue/lowPriority-myenv"
  },
  "strelka": {
    "jobDefinition": "strelka-myenv:1",
    "jobQueue": "arn:aws:batch:us-east-1:<account-id>:job-queue/highPriority-myenv",
    "cmdArgs": " --exome "
  },
  "snpEff": {
    "jobDefinition": "snpeff-myenv:1",
    "jobQueue": "arn:aws:batch:us-east-1:<account-id>:job-queue/lowPriority-myenv",
    "cmdArgs": " -t hg38 "
  }
}

You are now at the stage to run your full genomic analysis. Copy the above to a new text file, change paths and ARNs to the ones that you created previously, and save your JSON input as input.states.json.

In the CLI, execute the following command. You need the ARN of the state machine that you created in the previous post:

aws stepfunctions start-execution --state-machine-arn <your-state-machine-arn> --input file://input.states.json

Your analysis has now started. By using Spot Instances with AWS Batch, you can quickly scale out your workflows while concurrently optimizing for cost. While this is not guaranteed, most executions of the workflows presented here should cost under $1 for a full analysis.

Monitoring the execution

The output from the above CLI command gives you the ARN that describes the specific execution. Copy that and navigate to the Step Functions console. Select the state machine that you created previously and paste the ARN into the search bar.

The screen shows information about your specific execution. On the left, you see where your execution currently is in the workflow.

In the following screenshot, you can see that your workflow has successfully completed the alignment job and moved onto the subsequent steps, which are variant calling and generating quality information about your sample.

You can also navigate to the AWS Batch console and see that progress of all of your jobs reflected there as well.

Finally, after your workflow has completed successfully, check out the S3 path to which you wrote all of your files. If you run a ls –recursive command on the S3 results path, specified in the input to your state machine execution, you should see something similar to the following:

2017-05-02 13:46:32 6475144340 genomic-workflow/results/NA12878_run1/bam/sorted.bam
2017-05-02 13:46:34    7552576 genomic-workflow/results/NA12878_run1/bam/sorted.bam.bai
2017-05-02 13:46:32         45 genomic-workflow/results/NA12878_run1/bam/sorted.bam.md5
2017-05-02 13:53:20      68769 genomic-workflow/results/NA12878_run1/stats/bam_stats.dat
2017-05-02 14:05:12        100 genomic-workflow/results/NA12878_run1/vcf/stats/runStats.tsv
2017-05-02 14:05:12        359 genomic-workflow/results/NA12878_run1/vcf/stats/runStats.xml
2017-05-02 14:05:12  507577928 genomic-workflow/results/NA12878_run1/vcf/variants/genome.S1.vcf.gz
2017-05-02 14:05:12     723144 genomic-workflow/results/NA12878_run1/vcf/variants/genome.S1.vcf.gz.tbi
2017-05-02 14:05:12  507577928 genomic-workflow/results/NA12878_run1/vcf/variants/genome.vcf.gz
2017-05-02 14:05:12     723144 genomic-workflow/results/NA12878_run1/vcf/variants/genome.vcf.gz.tbi
2017-05-02 14:05:12   30783484 genomic-workflow/results/NA12878_run1/vcf/variants/variants.vcf.gz
2017-05-02 14:05:12    1566596 genomic-workflow/results/NA12878_run1/vcf/variants/variants.vcf.gz.tbi

Modifications to the workflow

You have now built and run your genomics workflow. While diving deep into modifications to this architecture are beyond the scope of these posts, we wanted to leave you with several suggestions of how you might modify this workflow to satisfy additional business requirements.

  • Job tracking with Amazon DynamoDB
    In many cases, such as if you are offering Genomics-as-a-Service, you might want to track the state of your jobs with DynamoDB to get fine-grained records of how your jobs are running. This way, you can easily identify the cost of individual jobs and workflows that you run.
  • Resuming from failure
    Both AWS Batch and Step Functions natively support job retries and can cover many of the standard cases where a job might be interrupted. There may be cases, however, where your workflow might fail in a way that is unpredictable. In this case, you can use custom error handling with AWS Step Functions to build out a workflow that is even more resilient. Also, you can build in fail states into your state machine to fail at any point, such as if a batch job fails after a certain number of retries.
  • Invoking Step Functions from Amazon API Gateway
    You can use API Gateway to build an API that acts as a "front door" to Step Functions. You can create a POST method that contains the input JSON to feed into the state machine you built. For more information, see the Implementing Serverless Manual Approval Steps in AWS Step Functions and Amazon API Gateway blog post.

Conclusion

While the approach we have demonstrated in this series has been focused on genomics, it is important to note that this can be generalized to nearly any high-throughput batch workload. We hope that you have found the information useful and that it can serve as a jump-start to building your own batch workloads on AWS with native AWS services.

For more information about how AWS can enable your genomics workloads, be sure to check out the AWS Genomics page.

Other posts in this four-part series:

Please leave any questions and comments below.

Pornhub Piracy Stopped Me Producing Porn, Jenna Haze Says

Post Syndicated from Andy original https://torrentfreak.com/pornhub-piracy-stopped-me-producing-porn-jenna-haze-says-170531/

Last week, adult ‘tube’ site Pornhub celebrated its 10th anniversary, and what a decade it was.

Six months after its May 2007 launch, the site was getting a million visitors every day. Six months after that, traffic had exploded five-fold. Such was the site’s success, by November 2008 Pornhub entered the ranks of the top 100 most-visited sites on the Internet.

As a YouTube-like platform, Pornhub traditionally relied on users to upload content to the site. Uploaders have to declare that they have the rights to do so but it’s clear that amid large quantities of fully licensed material, content exists on Pornhub that is infringing copyright.

Like YouTube, however, the site says it takes its legal responsibilities seriously by removing content whenever a valid DMCA notice is received. Furthermore, it also has a Content Partner Program which allows content owners to monetize their material on the platform.

But despite these overtures, Pornhub has remained a divisive operation. While some partners happily generate revenue from the platform and use it to drive valuable traffic to their own sites, others view it as a parasite living off their hard work. Today those critics were joined by one of the biggest stars the adult industry has ever known.

After ten years as an adult performer, starring in more than 600 movies (including one that marked her as the first adult performer to appear on Blu-ray format), in 2012 Jenna Haze decided on a change of pace. No longer interested in performing, she headed to the other side of the camera as a producer and director.

“Directing is where my heart is now. It’s allowed me to explore a creative side that is different from what performing has offered me,” she said in a statement.

“I am very satisfied with what I was able to accomplish in 10 years of performing, and now I’m enjoying the challenges of being on the other side of the camera and running my studio.”

But while Haze enjoyed success with 15 movies, it wasn’t to last. The former performer eventually backed away from both directing and producing adult content. This morning she laid the blame for that on Pornhub and similar sites.

It all began with a tweet from Conan O’Brien, who belatedly wished Pornhub a happy 10th anniversary.

In response to O’Brien apparently coming to the party late, a Twitter user informed him how he’d been missing out on Jenna Haze. That drew a response from Haze herself, who accused Pornhub of pirating her content.

“Please don’t support sites like porn hub,” she wrote. “They are a tube site that pirates content that other adult companies produce. It’s like Napster!”

In a follow-up, Haze went on to accuse Pornhub of theft and blamed the site for her exit from the business.

“Well they steal my content from my company, as do many other tube sites. It’s why I don’t produce or direct anymore,” Haze wrote.

“Maybe not all of their content is stolen, but I have definitely seen my content up there, as well as other people’s content.”

Of course, just like record companies can do with YouTube, there’s always the option for Haze to file a DMCA notice with Pornhub to have offending content taken down. However, it’s a route she claims to have taken already, but without much success.

“They take the videos down and put [them] back up. I’m not saying they don’t do legitimate business as well,” she said.

While Pornhub has its critics, the site does indeed do masses of legitimate business. The platform is owned by Mindgeek, whose websites receive a combined 115 million visitors per day, fueled in part by content supplied by Brazzers and Digital Playground, which Mindgeek owns. That being said, Mindgeek’s position in the market has always been controversial.

Three years ago, it became evident that Mindgeek had become so powerful in the adult industry that performers (some of whom felt their content was being exploited by the company) indicated they were scared to criticize it.

Adult actress and outspoken piracy critic Tasha Reign, who also had her videos uploaded to Pornhub without her permission, revealed she was in a particularly tight spot.

“It’s like we’re stuck between a rock and a hard place in a way, because if I want to shoot content then I kinda have to shoot for [Mindgeek] because that’s the company that books me because they own…almost…everything,” Reign said.

In 2017, Mindgeek’s dominance is clearly less of a problem for Haze, who is now concentrating on other things. But for those who remain in the industry, Mindgeek is a force to be reckoned with, so criticism will probably remain somewhat muted.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

New Features for IAM Policy Summaries – An Easier Way to Detect Potential Typos in Your IAM Policies

Post Syndicated from Joy Chatterjee original https://aws.amazon.com/blogs/security/new-features-for-iam-policy-summaries-an-easier-way-to-detect-potential-typos-in-your-iam-policies/

Last month, we introduced policy summaries to make it easier for you to understand the permissions in your AWS Identity and Access Management (IAM) policies. On Thursday, May 25, I announced three new features that have been added to policy summaries and reviewed resource summaries. Yesterday, I reviewed the benefits of being able to view services and actions that are implicitly denied by a policy.

Today, I demonstrate how policy summaries make it easier for you to detect potential typos in your policies by showing you unrecognized services and actions. In this post, I show how this new feature can help you detect and fix potential typos in your policies.

Unrecognized services and actions

You can now use policy summaries to see unrecognized services and actions. One key benefit of this feature is that it helps you find possible typos in a policy. Let’s say your developer, Bob, creates a policy granting full List and Read permissions to some Amazon S3 buckets and full access to Amazon DynamoDB. Unfortunately, when testing the policy, Bob sees “Access denied” messages when he tries to use those services. To troubleshoot, Bob returns to the IAM console to review the policy summary. Bob sees that he inadvertently misspelled “DynamoDB” as “DynamoBD” (reversing the position of the last two letters) in the policy and notices that he does not have all of the list permissions for S3.

Screenshot showing the "dynamobd" typo

When Bob chooses S3, he sees that ListBuckets is an unrecognized action, as shown in the following screenshot.

Screenshot showing that ListBuckets is not recognized by IAM

Bob chooses Show remaining 26 and realizes the correct action is s3:ListBucket and not s3:ListBuckets. He also confirms this by looking at the list of actions for S3.

Screenshot showing the true action name

Bob fixes the mistakes by choosing the Edit policy button, making the necessary updates, and saving the changes. He returns to the policy summary and sees that the policy no longer has unrecognized services and actions.

Exceptions

If you have a service or action that appears in the Unrecognized services or Unrecognized actions section of the policy summary, it may be because the service is in preview mode. If you think a service or action should be recognized, please submit feedback by choosing the Feedback link located in the bottom left corner of the IAM console.

Summary

Policy summaries make it easier to troubleshoot possible errors in policies. The newest updates I have explored this week on the AWS Security Blog make it easy to understand the resources defined in a policy, show the services and actions that are implicitly denied by a policy, and help you troubleshoot possible typos in a policy. To see policy summaries in your AWS account, sign in to the IAM console and navigate to any policy on the Policies page of the IAM console or the Permissions tab on a user’s page.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum.

– Joy

I want to talk for a moment about tolerance

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/05/i-want-to-talk-for-moment-about.html

This post is in response to this Twitter thread. I was going to do a series of tweets in response, but as the number grew, I thought it’d better be done in a blog.

She thinks we are fighting for the rights of Nazis. We aren’t — indeed, the fact that she thinks we are is exactly the problem. They aren’t Nazis.

The issue is not about a slippery slope that first Nazi’s lose free speech, then other groups start losing their speech as well. The issue is that it’s a slippery slope that more and more people get labeled a Nazi. And we are already far down that slope.

The “alt-right” is a diverse group. Like any group. Vilifying the entire alt-right by calling them Nazi’s is like lumping all Muslims in with ISIS or Al Qaeda. We really don’t have Nazi’s in America. Even White Nationalists don’t fit the bill. Nazism was about totalitarianism, real desire to exterminate Jews, lebensraum, and Aryan superiority. Sure, some of these people exist, but they are a fringe, even among the alt-right.

It’s at this point we need to discuss words like “tolerance”. I don’t think it means what you think it means.

The idea of tolerance is that reasonable people can disagree. You still believe you are right, and the other person is wrong, but you accept that they are nonetheless a reasonable person with good intentions, and that they don’t need to be punished for holding the wrong opinion.

Gay rights is a good example. I agree with you that there is only one right answer to this. Having spent nights holding my crying gay college roommate, because his father hated gays, has filled me with enormous hatred and contempt for people like his father. I’ve done my fair share shouting at people for anti-gay slurs.

Yet on the other hand, progressive icons like Barack Obama and Hillary Clinton have had evolving positions on gay rights issues, such as having opposed gay marriage at one time.

Tolerance means accepting that a person is reasonable, intelligent, and well-meaning — even if they oppose gay marriage. It means accepting that Hillary and Obama were reasonable people, even when they were vocally opposing gay marriage.

I’m libertarian. Like most libertarians, I support wide open borders, letting any immigrant across the border for any reason. To me, Hillary’s and Obama’s immigration policies are almost as racist as Trump’s. I have to either believe all you people supporting Hillary/Obama are irredeemably racist — or that well-meaning, good people can disagree about immigration.

I could go through a long list of issues that separate the progressive left and alt-right, and my point would always be the same. While people disagree on issues, and I have my own opinions about which side is right, there are reasonable people on both sides. If there are issues that divide our country down the middle, then by definition, both sides are equally reasonable. The problem with the progressive left is that they do not tolerate this. They see the world as being between one half who hold the correct opinions, and the other half who are unreasonable.

What defines the “alt-right” is not Nazism or White Nationalism, but the reaction of many on the right to intolerance of many on the left. Every time somebody is punished and vilified for uttering what is in fact a reasonable difference of opinion, they join the “alt-right”.

The issue at stake here, the issue that the ACLU is defending, is after that violent attack on the Portland train by an extremist, the city is denying all “alt-right” protesters the right to march. It’s blaming all those of the “alt-right” for the actions of one of their member. It’s similar to cities blocking Muslims from building a mosque because of extremists like ISIS and Al Qaeda, or disturbed individuals who carry out violent attacks in the name of Islam.

This is not just a violation of the First Amendment rights, it’s an obvious one. As the Volokh Conspiracy documents, the courts have ruled many times on this issue. There is no doubt that the “alt-right” has the right to march, and that the city’s efforts to deny them this right is a blatant violation of the constitution.

What we are defending here is not the rights of actual Nazi’s to march (as the courts famous ruled was still legitimate speech in Skokie, Illinois), but the rights of non-Nazi’s to march, most who have legitimate, reasonable (albeit often wrong) grievances to express. This speech is clearly being suppressed by gun wielding thugs in Portland, Oregon.

Those like Jillian see this as dealing with unreasonable speech, we see this as a problem of tolerably wrong speech. Those like Jillian York aren’t defending the right to free speech because, in their minds, they’ve vilified the people they disagree with. But that’s that’s exactly when, and only when, free speech needs our protection, when those speaking out have been vilified, and their repression seems just. Look at how Russia suppresses supporters of gay rights, with exactly this sort of vilification, whereby the majority of the populace sees the violence and policing as a legitimate response to speech that should not be free.

We aren’t fighting a slippery slope here, by defending Nazis. We’ve already slid down that slope, where reasonable people’s rights are being violated. We are fighting to get back up top.

–> –>

Huge Coalition Protests EU Mandatory Piracy Filter Proposals

Post Syndicated from Andy original https://torrentfreak.com/huge-coalition-protests-eu-mandatory-piracy-filter-proposals-170530/

Last September, EU Commission President Jean-Claude Juncker announced plans to modernize copyright law in Europe.

The proposals (pdf) are part of the Digital Single Market reforms, which have been under development for the past several years.

The proposals cover a broad range of copyright-related issues, but one stands out as being particularly controversial. Article 13 requires certain online service providers to become deeply involved in the detection and policing of allegedly infringing copyright works, uploaded to their platforms by users.

Although its effects will likely be more broad, the proposal is targeted at the so-called “value gap” (1,2,3), i.e the notion that platforms like YouTube are able to avoid paying expensive licensing fees (for music in particular) by exploiting the safe harbor protections of the DMCA and similar legislation.

To close this loophole using Article 13, services that provide access to “large amounts” of user-uploaded content would be required to cooperate with rightsholders to prevent infringing works being communicated to the public.

This means that platforms like YouTube would be forced to take measures to ensure that their deals with content providers to distribute official content are protected by aggressive anti-piracy mechanisms.

The legislation would see platforms forced to deploy content-recognition, filtering and blocking mechanisms, to ensure that only non-infringing content is uploaded in the first place, thus limiting the chances that unauthorized copyrighted content will be made available to end users.

Supporters argue that the resulting decrease in availability of infringing content will effectively close the “value gap” but critics see the measures as disproportionate, likely to result in censorship (no provision for fair use), and a restriction of fundamental freedoms. Indeed, there are already warnings that such a system would severely “restrict the way Europeans create, share, and communicate online.”

The proposals have predictably received widespread support from entertainment industry companies across the EU and the United States, but there are now clear signs that the battle lines are being drawn.

On one side are the major recording labels, movie studios, and other producers. On the other, companies and platforms that will suddenly become more liable for infringing content, accompanied by citizens and scholars who feel that freedoms will be restricted.

The latest sign of the scale of opposition to Article 13 manifests itself in an open letter to the European Parliament. Under the Copyright for Creativity (C4C) banner and signed by the EFF, Creative Commons, Wikimedia, Mozilla, EDRi, Open Rights Group plus sixty other organizations, the letter warns that the proposals will cause more problems than they solve.

“The European Commission’s proposal on copyright in the Digital Single Market failed to meet the expectations of European citizens and businesses. Instead of supporting Europeans in the digital economy, it is backward looking,” the groups say.

“We need European lawmakers to oppose the most damaging aspects of the proposal, but also to embrace a more ambitious agenda for positive reform.”

In addition to opposing Article 11 (the proposed Press Publishers’ Right), the groups ask the EU Parliament not to impose private censorship on EU citizens via Article 13.

“The provision on the so-called ‘value gap’ is designed to provoke such legal uncertainty that online services will have no other option than to monitor, filter and block EU citizens’ communications if they want to have any chance of staying in business,” the groups write.

“The Commission’s proposal misrepresents some European Court rulings and seeks to impose contradictory obligations on Member States. This is simply bad regulation.”

Calling for the wholesale removal of Article 13 from the copyright negotiations, the groups argue that the reforms should be handled in the appropriate contexts.

“We strenuously oppose such ill thought through experimentation with intermediary liability, which will hinder innovation and competition and will reduce the opportunities available to all European businesses and citizens,” they add.

C4C concludes by calling on lawmakers to oppose Article 13 while seeking avenues for positive reform.

The full letter can be found here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

So You Want To Be An Internet Piracy Investigator?

Post Syndicated from Andy original https://torrentfreak.com/so-you-want-to-be-an-internet-piracy-investigator-170528/

While the authorities would like to paint a picture of Internet pirates as thoughtless thieves only interested in the theft of intellectual property, the truth is more nuanced.

Like every other online and indeed offline location, pirate sites are filled with people from all corners of society, from rich to poor, and from the basically educated to the borderline genius.

What is especially interesting is the extremely thin line between poacher and gamekeeper, between those who want to exploit intellectual property and those who want to protect it. Indeed, it is far from uncommon to find former pirates and renegade coders “going straight” by working for their former enemies.

While a repellent thought to some, it makes perfect sense. Anyone who knows the piracy scene back to front could be a valuable asset to the other side, under the right circumstances. But what does it really take to be an anti-piracy investigator?

As it happens, the UK’s Federation Against Copyright Theft is currently trying to fill exactly such a position. The job of “Internet Investigator” is based in the UK and the successful applicant will report to a manager. While that tends to suggest a lower pay grade, FACT are insistent that applicants meet stringent criteria.

“Working as a proactive member of the investigatory team to support the strategic objectives of FACT. Responsible for the detection, investigation, and protection of clients Intellectual Property whether physical or digital as directed by the Investigations Manager,” the listing reads.

More specifically, FACT is looking for someone with a “strong aptitude for investigation” who is capable of working under minimal supervision. The candidate is also required to have a proven record of liaising with “industry and enforcement organizations”, presumably including entertainment companies and the police.

At this point, things get pretty interesting. FACT says that the job involves assessing and investigating “individuals and entities” responsible for “illegal or infringing activity related to Intellectual Property.” Think torrent, streaming and IPTV site operators and staff, release group members, ‘Kodi Box’ sellers, infringing addon developers, even people flogging dodgy DVDs down the market.

When these investigations are being carried out, FACT expects evidence and intelligence to be gathered “ethically and in accordance with criminal procedure rules”, presumably so that cases don’t collapse when they end up in court. Which they often do.

Also of interest is how closely FACT appears to align its practices with those of the police. While the candidate is expected to liaise with law enforcement, they will also be expected to take part in briefings, seizure of evidence and prosecution support, all while “managing risks” and acting in accordance with UK legislation.

Another aspect of the job is a little cryptic, in that it requires the candidate to “locate offenders” and then undertake action “with an alternative approach to a proportionate solution.” That’s open to interpretation but it sounds very much like the home visits FACT has been known to make to site operators, who are asked to cease and desist while handing over their domains.

Unsurprisingly, FACT are looking for someone with a computer science degree or similar, and good organizational skills. Above that, it’s fairly obvious they’re seeking someone with a legal background, perhaps a law graduate or even a former police officer.

In addition to familiarity with the rules laid down in the Management of Police Information (MOPI) 2010, the candidate will be required to attend court hearings to give evidence. They’ll also need to conduct “intrusive surveillance” in accordance with the Regulation of Investigatory Powers Act 2000 (RIPA) and have knowledge of:

– European Convention on Human Rights Act 2000
– Police and Criminal Evidence Act 1984
– Regulation of Investigatory Powers Act 2000
– Data Protection Act 1998
– Proceeds of Crime Act 2002
– Fraud Act 2006
– Serious Crime Act 2007
– Copyright Designs & Patents Act 1988 and Trade Marks Act 1994
– Computer Misuse Act 1990
– Other applicable legislation

The window to apply has almost run out but given the laundry list of qualities above, it seems unlikely that FACT will be swamped with perfectly suitable candidates right off the bat.

Finally, it’s probably worth mentioning that former torrent site operators and release group members keen to branch out are not specifically mentioned as primary candidates, so the poacher-turned-gamekeeper applicant might want to keep that part under their hat, at least until later.

Otherwise, FACT might just slap the cuffs on there and then, in line with UK legislation and procedure, of course.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Build a Serverless Architecture to Analyze Amazon CloudFront Access Logs Using AWS Lambda, Amazon Athena, and Amazon Kinesis Analytics

Post Syndicated from Rajeev Srinivasan original https://aws.amazon.com/blogs/big-data/build-a-serverless-architecture-to-analyze-amazon-cloudfront-access-logs-using-aws-lambda-amazon-athena-and-amazon-kinesis-analytics/

Nowadays, it’s common for a web server to be fronted by a global content delivery service, like Amazon CloudFront. This type of front end accelerates delivery of websites, APIs, media content, and other web assets to provide a better experience to users across the globe.

The insights gained by analysis of Amazon CloudFront access logs helps improve website availability through bot detection and mitigation, optimizing web content based on the devices and browser used to view your webpages, reducing perceived latency by caching of popular object closer to its viewer, and so on. This results in a significant improvement in the overall perceived experience for the user.

This blog post provides a way to build a serverless architecture to generate some of these insights. To do so, we analyze Amazon CloudFront access logs both at rest and in transit through the stream. This serverless architecture uses Amazon Athena to analyze large volumes of CloudFront access logs (on the scale of terabytes per day), and Amazon Kinesis Analytics for streaming analysis.

The analytic queries in this blog post focus on three common use cases:

  1. Detection of common bots using the user agent string
  2. Calculation of current bandwidth usage per Amazon CloudFront distribution per edge location
  3. Determination of the current top 50 viewers

However, you can easily extend the architecture described to power dashboards for monitoring, reporting, and trigger alarms based on deeper insights gained by processing and analyzing the logs. Some examples are dashboards for cache performance, usage and viewer patterns, and so on.

Following we show a diagram of this architecture.

Prerequisites

Before you set up this architecture, install the AWS Command Line Interface (AWS CLI) tool on your local machine, if you don’t have it already.

Setup summary

The following steps are involved in setting up the serverless architecture on the AWS platform:

  1. Create an Amazon S3 bucket for your Amazon CloudFront access logs to be delivered to and stored in.
  2. Create a second Amazon S3 bucket to receive processed logs and store the partitioned data for interactive analysis.
  3. Create an Amazon Kinesis Firehose delivery stream to batch, compress, and deliver the preprocessed logs for analysis.
  4. Create an AWS Lambda function to preprocess the logs for analysis.
  5. Configure Amazon S3 event notification on the CloudFront access logs bucket, which contains the raw logs, to trigger the Lambda preprocessing function.
  6. Create an Amazon DynamoDB table to look up partition details, such as partition specification and partition location.
  7. Create an Amazon Athena table for interactive analysis.
  8. Create a second AWS Lambda function to add new partitions to the Athena table based on the log delivered to the processed logs bucket.
  9. Configure Amazon S3 event notification on the processed logs bucket to trigger the Lambda partitioning function.
  10. Configure Amazon Kinesis Analytics application for analysis of the logs directly from the stream.

ETL and preprocessing

In this section, we parse the CloudFront access logs as they are delivered, which occurs multiple times in an hour. We filter out commented records and use the user agent string to decipher the browser name, the name of the operating system, and whether the request has been made by a bot. For more details on how to decipher the preceding information based on the user agent string, see user-agents 1.1.0 in the Python documentation.

We use the Lambda preprocessing function to perform these tasks on individual rows of the access log. On successful completion, the rows are pushed to an Amazon Kinesis Firehose delivery stream to be persistently stored in an Amazon S3 bucket, the processed logs bucket.

To create a Firehose delivery stream with a new or existing S3 bucket as the destination, follow the steps described in Create a Firehose Delivery Stream to Amazon S3 in the S3 documentation. Keep most of the default settings, but select an AWS Identity and Access Management (IAM) role that has write access to your S3 bucket and specify GZIP compression. Name the delivery stream CloudFrontLogsToS3.

Another pre-requisite for this setup is to create an IAM role that provides the necessary permissions our AWS Lambda function to get the data from S3, process it, and deliver it to the CloudFrontLogsToS3 delivery stream.

Let’s use the AWS CLI to create the IAM role using the following the steps:

  1. Create the IAM policy (lambda-exec-policy) for the Lambda execution role to use.
  2. Create the Lambda execution role (lambda-cflogs-exec-role) and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

To download the policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/preprocessiong-lambda/lambda-exec-policy.json  <path_on_your_local_machine>

To download the assume policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/preprocessiong-lambda/assume-lambda-policy.json  <path_on_your_local_machine>

Following is the lambda-exec-policy.json file, which is the IAM policy used by the Lambda execution role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CloudWatchAccess",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        },
        {
            "Sid": "S3Access",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::*"
            ]
        },
        {
            "Sid": "FirehoseAccess",
            "Effect": "Allow",
            "Action": [
                "firehose:ListDeliveryStreams",
                "firehose:PutRecord",
                "firehose:PutRecordBatch"
            ],
            "Resource": [
                "arn:aws:firehose:*:*:deliverystream/CloudFrontLogsToS3"
            ]
        }
    ]
}

To create the IAM policy used by Lambda execution role, type the following command.

aws iam create-policy --policy-name lambda-exec-policy --policy-document file://<path>/lambda-exec-policy.json

To create the AWS Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-cflogs-exec-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

Following is the assume-lambda-policy.json file, to grant Lambda permission to assume a role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To attach the policy (lambda-exec-policy) created to the AWS Lambda execution role (lambda-cflogs-exec-role), type the following command.

aws iam attach-role-policy --role-name lambda-cflogs-exec-role --policy-arn arn:aws:iam::<your-account-id>:policy/lambda-exec-policy

Now that we have created the CloudFrontLogsToS3 Firehose delivery stream and the lambda-cflogs-exec-role IAM role for Lambda, the next step is to create a Lambda preprocessing function.

This Lambda preprocessing function parses the CloudFront access logs delivered into the S3 bucket and performs a few transformation and mapping operations on the data. The Lambda function adds descriptive information, such as the browser and the operating system that were used to make this request based on the user agent string found in the logs. The Lambda function also adds information about the web distribution to support scenarios where CloudFront access logs are delivered to a centralized S3 bucket from multiple distributions. With the solution in this blog post, you can get insights across distributions and their edge locations.

Use the Lambda Management Console to create a new Lambda function with a Python 2.7 runtime and the s3-get-object-python blueprint. Open the console, and on the Configure triggers page, choose the name of the S3 bucket where the CloudFront access logs are delivered. Choose Put for Event type. For Prefix, type the name of the prefix, if any, for the folder where CloudFront access logs are delivered, for example cloudfront-logs/. To invoke Lambda to retrieve the logs from the S3 bucket as they are delivered, select Enable trigger.

Choose Next and provide a function name to identify this Lambda preprocessing function.

For Code entry type, choose Upload a file from Amazon S3. For S3 link URL, type https.amazonaws.com//preprocessing-lambda/pre-data.zip. In the section, also create an environment variable with the key KINESIS_FIREHOSE_STREAM and a value with the name of the Firehose delivery stream as CloudFrontLogsToS3.

Choose lambda-cflogs-exec-role as the IAM role for the Lambda function, and type prep-data.lambda_handler for the value for Handler.

Choose Next, and then choose Create Lambda.

Table creation in Amazon Athena

In this step, we will build the Athena table. Use the Athena console in the same region and create the table using the query editor.

CREATE EXTERNAL TABLE IF NOT EXISTS cf_logs (
  logdate date,
  logtime string,
  location string,
  bytes bigint,
  requestip string,
  method string,
  host string,
  uri string,
  status bigint,
  referrer string,
  useragent string,
  uriquery string,
  cookie string,
  resulttype string,
  requestid string,
  header string,
  csprotocol string,
  csbytes string,
  timetaken bigint,
  forwardedfor string,
  sslprotocol string,
  sslcipher string,
  responseresulttype string,
  protocolversion string,
  browserfamily string,
  osfamily string,
  isbot string,
  filename string,
  distribution string
)
PARTITIONED BY(year string, month string, day string, hour string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://<pre-processing-log-bucket>/prefix/';

Creation of the Athena partition

A popular website with millions of requests each day routed using Amazon CloudFront can generate a large volume of logs, on the order of a few terabytes a day. We strongly recommend that you partition your data to effectively restrict the amount of data scanned by each query. Partitioning significantly improves query performance and substantially reduces cost. The Lambda partitioning function adds the partition information to the Athena table for the data delivered to the preprocessed logs bucket.

Before delivering the preprocessed Amazon CloudFront logs file into the preprocessed logs bucket, Amazon Kinesis Firehose adds a UTC time prefix in the format YYYY/MM/DD/HH. This approach supports multilevel partitioning of the data by year, month, date, and hour. You can invoke the Lambda partitioning function every time a new processed Amazon CloudFront log is delivered to the preprocessed logs bucket. To do so, configure the Lambda partitioning function to be triggered by an S3 Put event.

For a website with millions of requests, a large number of preprocessed logs can be delivered multiple times in an hour—for example, at the interval of one each second. To avoid querying the Athena table for partition information every time a preprocessed log file is delivered, you can create an Amazon DynamoDB table for fast lookup.

Based on the year, month, data and hour in the prefix of the delivered log, the Lambda partitioning function checks if the partition specification exists in the Amazon DynamoDB table. If it doesn’t, it’s added to the table using an atomic operation, and then the Athena table is updated.

Type the following command to create the Amazon DynamoDB table.

aws dynamodb create-table --table-name athenapartitiondetails \
--attribute-definitions AttributeName=PartitionSpec,AttributeType=S \
--key-schema AttributeName=PartitionSpec,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100

Here the following is true:

  • PartitionSpec is the hash key and is a representation of the partition signature—for example, year=”2017”; month=”05”; day=”15”; hour=”10”.
  • Depending on the rate at which the processed log files are delivered to the processed log bucket, you might have to increase the ReadCapacityUnits and WriteCapacityUnits values, if these are throttled.

The other attributes besides PartitionSpec are the following:

  • PartitionPath – The S3 path associated with the partition.
  • PartitionType – The type of partition used (Hour, Month, Date, Year, or ALL). In this case, ALL is used.

Next step is to create the IAM role to provide permissions for the Lambda partitioning function. You require permissions to do the following:

  1. Look up and write partition information to DynamoDB.
  2. Alter the Athena table with new partition information.
  3. Perform Amazon CloudWatch logs operations.
  4. Perform Amazon S3 operations.

To download the policy document to your local machine, type following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/lambda-partition-function-execution-policy.json  <path_on_your_local_machine>

To download the assume policy document to your local machine, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/assume-lambda-policy.json <path_on_your_local_machine>

To create the Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-cflogs-exec-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

Let’s use the AWS CLI to create the IAM role using the following three steps:

  1. Create the IAM policy(lambda-partition-exec-policy) used by the Lambda execution role.
  2. Create the Lambda execution role (lambda-partition-execution-role)and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

To create the IAM policy used by Lambda execution role, type the following command.

aws iam create-policy --policy-name lambda-partition-exec-policy --policy-document file://<path>/lambda-partition-function-execution-policy.json

To create the Lambda execution role and assign the service to use this role, type the following command.

aws iam create-role --role-name lambda-partition-execution-role --assume-role-policy-document file://<path>/assume-lambda-policy.json

To attach the policy (lambda-partition-exec-policy) created to the AWS Lambda execution role (lambda-partition-execution-role), type the following command.

aws iam attach-role-policy --role-name lambda-partition-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/lambda-partition-exec-policy

Following is the lambda-partition-function-execution-policy.json file, which is the IAM policy used by the Lambda execution role.

{
    "Version": "2012-10-17",
    "Statement": [
      	{
            	"Sid": "DDBTableAccess",
            	"Effect": "Allow",
            	"Action": "dynamodb:PutItem"
            	"Resource": "arn:aws:dynamodb*:*:table/athenapartitiondetails"
        	},
        	{
            	"Sid": "S3Access",
            	"Effect": "Allow",
            	"Action": [
                		"s3:GetBucketLocation",
                		"s3:GetObject",
                		"s3:ListBucket",
                		"s3:ListBucketMultipartUploads",
                		"s3:ListMultipartUploadParts",
                		"s3:AbortMultipartUpload",
                		"s3:PutObject"
            	],
          		"Resource":"arn:aws:s3:::*"
		},
	              {
		      "Sid": "AthenaAccess",
      		"Effect": "Allow",
      		"Action": [ "athena:*" ],
      		"Resource": [ "*" ]
	      },
        	{
            	"Sid": "CloudWatchLogsAccess",
            	"Effect": "Allow",
            	"Action": [
                		"logs:CreateLogGroup",
                		"logs:CreateLogStream",
             	   	"logs:PutLogEvents"
            	],
            	"Resource": "arn:aws:logs:*:*:*"
        	}
    ]
}

Download the .jar file containing the Java deployment package to your local machine.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/partitioning-lambda/aws-lambda-athena-1.0.0.jar <path_on_your_local_machine>

From the AWS Management Console, create a new Lambda function with Java8 as the runtime. Select the Blank Function blueprint.

On the Configure triggers page, choose the name of the S3 bucket where the preprocessed logs are delivered. Choose Put for the Event Type. For Prefix, type the name of the prefix folder, if any, where preprocessed logs are delivered by Firehose—for example, out/. For Suffix, type the name of the compression format that the Firehose stream (CloudFrontLogToS3) delivers the preprocessed logs —for example, gz. To invoke Lambda to retrieve the logs from the S3 bucket as they are delivered, select Enable Trigger.

Choose Next and provide a function name to identify this Lambda partitioning function.

Choose Java8 for Runtime for the AWS Lambda function. Choose Upload a .ZIP or .JAR file for the Code entry type, and choose Upload to upload the downloaded aws-lambda-athena-1.0.0.jar file.

Next, create the following environment variables for the Lambda function:

  • TABLE_NAME – The name of the Athena table (for example, cf_logs).
  • PARTITION_TYPE – The partition to be created based on the Athena table for the logs delivered to the sub folders in S3 bucket based on Year, Month, Date, Hour, or Set this to ALL to use Year, Month, Date, and Hour.
  • DDB_TABLE_NAME – The name of the DynamoDB table holding partition information (for example, athenapartitiondetails).
  • ATHENA_REGION – The current AWS Region for the Athena table to construct the JDBC connection string.
  • S3_STAGING_DIR – The Amazon S3 location where your query output is written. The JDBC driver asks Athena to read the results and provide rows of data back to the user (for example, s3://<bucketname>/<folder>/).

To configure the function handler and IAM, for Handler copy and paste the name of the handler: com.amazonaws.services.lambda.CreateAthenaPartitionsBasedOnS3EventWithDDB::handleRequest. Choose the existing IAM role, lambda-partition-execution-role.

Choose Next and then Create Lambda.

Interactive analysis using Amazon Athena

In this section, we analyze the historical data that’s been collected since we added the partitions to the Amazon Athena table for data delivered to the preprocessing logs bucket.

Scenario 1 is robot traffic by edge location.

SELECT COUNT(*) AS ct, requestip, location FROM cf_logs
WHERE isbot='True'
GROUP BY requestip, location
ORDER BY ct DESC;

Scenario 2 is total bytes transferred per distribution for each edge location for your website.

SELECT distribution, location, SUM(bytes) as totalBytes
FROM cf_logs
GROUP BY location, distribution;

Scenario 3 is the top 50 viewers of your website.

SELECT requestip, COUNT(*) AS ct  FROM cf_logs
GROUP BY requestip
ORDER BY ct DESC;

Streaming analysis using Amazon Kinesis Analytics

In this section, you deploy a stream processing application using Amazon Kinesis Analytics to analyze the preprocessed Amazon CloudFront log streams. This application analyzes directly from the Amazon Kinesis Stream as it is delivered to the preprocessing logs bucket. The stream queries in section are focused on gaining the following insights:

  • The IP address of the bot, identified by its Amazon CloudFront edge location, that is currently sending requests to your website. The query also includes the total bytes transferred as part of the response.
  • The total bytes served per distribution per population for your website.
  • The top 10 viewers of your website.

To download the firehose-access-policy.json file, type the following.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/kinesisanalytics/firehose-access-policy.json  <path_on_your_local_machine>

To download the kinesisanalytics-policy.json file, type the following.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis/kinesisanalytics/assume-kinesisanalytics-policy.json <path_on_your_local_machine>

Before we create the Amazon Kinesis Analytics application, we need to create the IAM role to provide permission for the analytics application to access Amazon Kinesis Firehose stream.

Let’s use the AWS CLI to create the IAM role using the following three steps:

  1. Create the IAM policy(firehose-access-policy) for the Lambda execution role to use.
  2. Create the Lambda execution role (ka-execution-role) and assign the service to use this role.
  3. Attach the policy created in step 1 to the Lambda execution role.

Following is the firehose-access-policy.json file, which is the IAM policy used by Kinesis Analytics to read Firehose delivery stream.

{
    "Version": "2012-10-17",
    "Statement": [
      	{
    	"Sid": "AmazonFirehoseAccess",
    	"Effect": "Allow",
    	"Action": [
       	"firehose:DescribeDeliveryStream",
        	"firehose:Get*"
    	],
    	"Resource": [
              "arn:aws:firehose:*:*:deliverystream/CloudFrontLogsToS3”
       ]
     }
}

Following is the assume-kinesisanalytics-policy.json file, to grant Amazon Kinesis Analytics permissions to assume a role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "kinesisanalytics.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

To create the IAM policy used by Analytics access role, type the following command.

aws iam create-policy --policy-name firehose-access-policy --policy-document file://<path>/firehose-access-policy.json

To create the Analytics execution role and assign the service to use this role, type the following command.

aws iam attach-role-policy --role-name ka-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/firehose-access-policy

To attach the policy (irehose-access-policy) created to the Analytics execution role (ka-execution-role), type the following command.

aws iam attach-role-policy --role-name ka-execution-role --policy-arn arn:aws:iam::<your-account-id>:policy/firehose-access-policy

To deploy the Analytics application, first download the configuration file and then modify ResourceARN and RoleARN for the Amazon Kinesis Firehose input configuration.

"KinesisFirehoseInput": { 
    "ResourceARN": "arn:aws:firehose:<region>:<account-id>:deliverystream/CloudFrontLogsToS3", 
    "RoleARN": "arn:aws:iam:<account-id>:role/ka-execution-role"
}

To download the Analytics application configuration file, type the following command.

aws s3 cp s3://aws-bigdata-blog/artifacts/Serverless-CF-Analysis//kinesisanalytics/kinesis-analytics-app-configuration.json <path_on_your_local_machine>

To deploy the application, type the following command.

aws kinesisanalytics create-application --application-name "cf-log-analysis" --cli-input-json file://<path>/kinesis-analytics-app-configuration.json

To start the application, type the following command.

aws kinesisanalytics start-application --application-name "cf-log-analysis" --input-configuration Id="1.1",InputStartingPositionConfiguration={InputStartingPosition="NOW"}

SQL queries using Amazon Kinesis Analytics

Scenario 1 is a query for detecting bots for sending request to your website detection for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "BOT_DETECTION" (requesttime TIME, destribution VARCHAR(16), requestip VARCHAR(64), edgelocation VARCHAR(64), totalBytes BIGINT);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "BOT_DETECTION_PUMP" AS INSERT INTO "BOT_DETECTION"
--
SELECT STREAM 
    STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND) as requesttime,
    "distribution_name" as distribution,
    "request_ip" as requestip, 
    "edge_location" as edgelocation, 
    SUM("bytes") as totalBytes
FROM "CF_LOG_STREAM_001"
WHERE "is_bot" = true
GROUP BY "request_ip", "edge_location", "distribution_name",
STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND),
STEP("CF_LOG_STREAM_001".ROWTIME BY INTERVAL '1' SECOND);

Scenario 2 is a query for total bytes transferred per distribution for each edge location for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "BYTES_TRANSFFERED" (requesttime TIME, destribution VARCHAR(16), edgelocation VARCHAR(64), totalBytes BIGINT);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "BYTES_TRANSFFERED_PUMP" AS INSERT INTO "BYTES_TRANSFFERED"
-- Bytes Transffered per second per web destribution by edge location
SELECT STREAM 
    STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND) as requesttime,
    "distribution_name" as distribution,
    "edge_location" as edgelocation, 
    SUM("bytes") as totalBytes
FROM "CF_LOG_STREAM_001"
GROUP BY "distribution_name", "edge_location", "request_date",
STEP("CF_LOG_STREAM_001"."request_time" BY INTERVAL '1' SECOND),
STEP("CF_LOG_STREAM_001".ROWTIME BY INTERVAL '1' SECOND);

Scenario 3 is a query for the top 50 viewers for your website.

-- Create output stream, which can be used to send to a destination
CREATE OR REPLACE STREAM "TOP_TALKERS" (requestip VARCHAR(64), requestcount DOUBLE);
-- Create pump to insert into output 
CREATE OR REPLACE PUMP "TOP_TALKERS_PUMP" AS INSERT INTO "TOP_TALKERS"
-- Top Ten Talker
SELECT STREAM ITEM as requestip, ITEM_COUNT as requestcount FROM TABLE(TOP_K_ITEMS_TUMBLING(
  CURSOR(SELECT STREAM * FROM "CF_LOG_STREAM_001"),
  'request_ip', -- name of column in single quotes
  50, -- number of top items
  60 -- tumbling window size in seconds
  )
);

Conclusion

Following the steps in this blog post, you just built an end-to-end serverless architecture to analyze Amazon CloudFront access logs. You analyzed these both in interactive and streaming mode, using Amazon Athena and Amazon Kinesis Analytics respectively.

By creating a partition in Athena for the logs delivered to a centralized bucket, this architecture is optimized for performance and cost when analyzing large volumes of logs for popular websites that receive millions of requests. Here, we have focused on just three common use cases for analysis, sharing the analytic queries as part of the post. However, you can extend this architecture to gain deeper insights and generate usage reports to reduce latency and increase availability. This way, you can provide a better experience on your websites fronted with Amazon CloudFront.

In this blog post, we focused on building serverless architecture to analyze Amazon CloudFront access logs. Our plan is to extend the solution to provide rich visualization as part of our next blog post.


About the Authors

Rajeev Srinivasan is a Senior Solution Architect for AWS. He works very close with our customers to provide big data and NoSQL solution leveraging the AWS platform and enjoys coding . In his spare time he enjoys riding his motorcycle and reading books.

 

Sai Sriparasa is a consultant with AWS Professional Services. He works with our customers to provide strategic and tactical big data solutions with an emphasis on automation, operations & security on AWS. In his spare time, he follows sports and current affairs.

 

 


Related

Analyzing VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight

EC2 In-Memory Processing Update: Instances with 4 to 16 TB of Memory + Scale-Out SAP HANA to 34 TB

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-in-memory-processing-update-instances-with-4-to-16-tb-of-memory-scale-out-sap-hana-to-34-tb/

Several times each month, I speak to AWS customers at our Executive Briefing Center in Seattle. I describe our innovation process and talk about how the roadmap for each AWS offering is driven by customer requests and feedback.

A good example of this is our work to make AWS a great home for SAP’s portfolio of business solutions. Over the years our customers have told us that they run large-scale SAP applications in production on AWS and we’ve worked hard to provide them with EC2 instances that are designed to accommodate their workloads. Because SAP installations are unfailingly mission-critical, SAP certifies their products for use on certain EC2 instance types and sizes. We work directly with SAP in order to achieve certification and to make AWS a robust & reliable host for their products.

Here’s a quick recap of some of our most important announcements in this area:

June 2012 – We expanded the range of SAP-certified solutions that are available on AWS.

October 2012 – We announced that the SAP HANA in-memory database is now available for production use on AWS.

March 2014 – We announced that SAP HANA can now run in production form on cr1.8xlarge instances with up to 244 GB of memory, with the ability to create test clusters that are even larger.

June 2014 – We published a SAP HANA Deployment Guide and a set of AWS CloudFormation templates in conjunction with SAP certification on r3.8xlarge instances.

October 2015 – We announced the x1.32xlarge instances with 2 TB of memory, designed to run SAP HANA, Microsoft SQL Server, Apache Spark, and Presto.

August 2016 – We announced that clusters of X1 instances can now be used to create production SAP HANA clusters with up to 7 nodes, or 14 TB of memory.

October 2016 – We announced the x1.16xlarge instance with 1 TB of memory.

January 2017 – SAP HANA was certified for use on r4.16xlarge instances.

Today, customers from a broad collection of industries run their SAP applications in production form on AWS (the SAP and Amazon Web Services page has a long list of customer success stories).

My colleague Bas Kamphuis recently wrote about Navigating the Digital Journey with SAP and the Cloud (registration required). He discusses the role of SAP in digital transformation and examines the key characteristics of the cloud infrastructure that support it, while pointing out many of the advantages that the cloud offers in comparison to other hosting options. Here’s how he illustrates these advantages in his article:

We continue to work to make AWS an even better place to run SAP applications in production form. Here are some of the things that we are working on:

  • Bigger SAP HANA Clusters – You can now build scale-out SAP HANA clusters with up to 17 nodes (34 TB of memory).
  • 4 TB Instances – The upcoming x1e.32xlarge instances will offer 4 TB of memory.
  • 8 – 16 TB Instances – Instances with up to 16 TB of memory are in the works.

Let’s dive in!

Building Bigger SAP HANA Clusters
I’m happy to announce that we have been working with SAP to certify the x1.32large instances for use in scale-out clusters with up to 17 nodes (34 TB of memory). This is the largest scale-out deployment available from any cloud provider today, and allows our customers to deploy very large SAP workloads on AWS (visit the SAP HANA Hardware directory certification for the x1.32xlarge instance to learn more). To learn how to architect and deploy your own scale-out cluster, consult the SAP HANA on AWS Quick Start.

Extending the Memory-Intensive X1 Family
We will continue to invest in this and other instance families in order to address your needs and to give you a solid growth path.

Later this year we plan to make the x1e.32xlarge instances available in several AWS regions, in both On-Demand and Reserved Instance form. These instances will offer 4 TB of DDR4 memory (twice as much as the x1.32xlarge), 128 vCPUs (four 2.3 GHz Intel® Xeon® E7 8880 v3 processors), high memory bandwidth, and large L3 caches. The instances will be VPC-only, and will deliver up to 20 Gbps of network banwidth using the Elastic Network Adapter while minimizing latency and jitter. They’ll be EBS-optimized by default, with up to 14 Gbps of dedicated EBS throughput.

Here are some screen shots from the shell. First, dmesg shows the boot-time kernel message:

Second, lscpu shows the vCPU & socket count, along with many other interesting facts:

And top shows nearly 900 processes:

Here’s the view from within HANA Studio:

This new instance, along with the certification for larger clusters, broadens the set of scale-out and scale-up options that you have for running SAP on EC2, as you can see from this diagram:

The Long-Term Memory-Intensive Roadmap
Because we know that planning large-scale SAP installations can take a considerable amount of time, I would also like to share part of our roadmap with you.

Today, customers are able to run larger SAP HANA certified servers in third party colo data centers and connect them to their AWS infrastructure via AWS Direct Connect, but customers have told us that they really want a cloud native solution like they currently get with X1 instances.

In order to meet this need, we are working on instances with even more memory! Throughout 2017 and 2018, we plan to launch EC2 instances with between 8 TB and 16 TB of memory. These upcoming instances, along with the x1e.32xlarge, will allow you to create larger single-node SAP installations and multi-node SAP HANA clusters, and to run other memory-intensive applications and services. It will also provide you with some scale-up headroom that will become helpful when you start to reach the limits of the smaller instances.

I’ll share more information on our plans as soon as possible.

Say Hello at SAPPHIRE
The AWS team will be in booth 539 at SAPPHIRE with a rolling set of sessions from our team, our customers, and our partners in the in-booth theater. We’ll also be participating in many sessions throughout the event. Here’s a sampling (see SAP SAPPHIRE NOW 2017 for a full list):

SAP Solutions on AWS for Big Businesses and Big Workloads – Wednesday, May 17th at Noon. Bas Kamphuis (General Manager, SAP, AWS) & Ed Alford (VP of Business Application Services, BP).

Break Through the Speed Barrier When You Move to SAP HANA on AWS – Wednesday, May 17th at 12:30 PM – Paul Young (VP, SAP) and Saul Dave (Senior Director, Enterprise Systems, Zappos).

AWS Fireside Chat with Zappos (Rapid SAP HANA Migration: Real Results) – Thursday, May 18th at 11:00 AM – Saul Dave (Senior Director, Enterprise Systems, Zappos) and Steve Jones (Senior Manager, SAP Solutions Architecture, AWS).

Jeff;

PS – If you have some SAP experience and would like to bring it to the cloud, take a look at the Principal Product Manager (AWS Quick Starts) and SAP Architect positions.

Product or Project?

Post Syndicated from Matt Richardson original https://www.raspberrypi.org/blog/product-or-project/

This column is from The MagPi issue 57. You can download a PDF of the full issue for free, or subscribe to receive the print edition in your mailbox or the digital edition on your tablet. All proceeds from the print and digital editions help the Raspberry Pi Foundation achieve its charitable goals.

Image of MagPi magazine and AIY Project Kit

Taking inspiration from a widely known inspirational phrase, I like to tell people, “make the thing you wish to see in the world.” In other words, you don’t have to wait for a company to create the exact product you want. You can be a maker as well as a consumer! Prototyping with hardware has become easier and more affordable, empowering people to make products that suit their needs perfectly. And the people making these things aren’t necessarily electrical engineers, computer scientists, or product designers. They’re not even necessarily adults. They’re often self-taught hobbyists who are empowered by maker-friendly technology.

It’s a subject I’ve been very interested in, and I have written about it before. Here’s what I’ve noticed: the flow between maker project and consumer product moves in both directions. In other words, consumer products can start off as maker projects. Just take a look at the story behind many of the crowdfunded products on sites such as Kickstarter. Conversely, consumer products can evolve into maker products as well. The cover story for the latest issue of The MagPi is a perfect example of that. Google has given you the resources you need to build your own dedicated Google Assistant device. How cool is that?

David Pride on Twitter

@Raspberry_Pi @TheMagP1 Oh this is going to be a ridiculous amount of fun. 😊 #AIYProjects #woodchuck https://t.co/2sWYmpi6T1

But consumer products becoming hackable hardware isn’t always an intentional move by the product’s maker. In the 2000s, TiVo set-top DVRs were a hot product and their most enthusiastic fans figured out how to hack the product to customise it to meet their needs without any kind of support from TiVo.

Embracing change

But since then, things have changed. For example, when Microsoft’s Kinect for the Xbox 360 was released in 2010, makers were immediately enticed by its capabilities. It not only acted as a camera, but it could also sense depth, a feature that would be useful for identifying the position of objects in a space. At first, there was no hacker support from Microsoft, so Adafruit Industries announced a $3,000 bounty to create open-source drivers so that anyone could access the features of Kinect for their own projects. Since then, Microsoft has embraced the use of Kinect for these purposes.

The Create 2 from iRobot

iRobot’s Create 2, a hackable version of the Roomba

Consumer product companies even make versions of their products that are specifically meant for hacking, making, and learning. Belkin’s WeMo home automation product line includes the WeMo Maker, a device that can act as a remote relay or sensor and hook into your home automation system. And iRobot offers Create 2, a hackable version of its Roomba floor-cleaning robot. While iRobot aimed the robot at STEM educators, you could use it for personal projects too. Electronic instrument maker Korg takes its maker-friendly approach to the next level by releasing the schematics for some of its analogue synthesiser products.

Why would a company want to do this? There are a few possible reasons. For one, it’s a way of encouraging consumers to create a community around a product. It could be a way for innovation with the product to continue, unchecked by the firm’s own limits on resources. For certain, it’s an awesome feel-good way for a company to empower their own users. Whatever the reason these products exist, it’s the digital maker who comes out ahead. They have more affordable tools, materials, and resources to create their own customised products and possibly learn a thing or two along the way.

With maker-friendly, hackable products, being a creator and a consumer aren’t mutually exclusive. In fact, you’re probably getting the best of both worlds: great products and great opportunities to make the thing you wish to see in the world.

The post Product or Project? appeared first on Raspberry Pi.

Hiring a Content Director

Post Syndicated from Ahin Thomas original https://www.backblaze.com/blog/hiring-content-director/


Backblaze is looking to hire a full time Content Director. This role is an essential piece of our team, reporting directly to our VP of Marketing. As the hiring manager, I’d like to tell you a little bit more about the role, how I’m thinking about the collaboration, and why I believe this to be a great opportunity.

A Little About Backblaze and the Role

Since 2007, Backblaze has earned a strong reputation as a leader in data storage. Our products are astonishingly easy to use and affordable to purchase. We have engaged customers and an involved community that helps drive our brand. Our audience numbers in the millions and our primary interaction point is the Backblaze blog. We publish content for engineers (data infrastructure, topics in the data storage world), consumers (how to’s, merits of backing up), and entrepreneurs (business insights). In all categories, our Content Director drives our earned positioned as leaders.

Backblaze has a culture focused on being fair and good (to each other and our customers). We have created a sustainable business that is profitable and growing. Our team places a premium on open communication, being cleverly unconventional, and helping each other out. The Content Director, specifically, balances our needs as a commercial enterprise (at the end of the day, we want to sell our products) with the custodianship of our blog (and the trust of our audience).

There’s a lot of ground to be covered at Backblaze. We have three discreet business lines:

  • Computer Backup -> a 10 year old business focusing on backing up consumer computers.
  • B2 Cloud Storage -> Competing with Amazon, Google, and Microsoft… just at ¼ of the price (but with the same performance characteristics).
  • Business Backup -> Both Computer Backup and B2 Cloud Storage, but focused on SMBs and enterprise.

The Best Candidate Is…

An excellent writer – possessing a solid academic understanding of writing, the creative process, and delivering against deadlines. You know how to write with multiple voices for multiple audiences. We do not expect our Content Director to be a storage infrastructure expert; we do expect a facility with researching topics, accessing our engineering and infrastructure team for guidance, and generally translating the technical into something easy to understand. The best Content Director must be an active participant in the business/ strategy / and editorial debates and then must execute with ruthless precision.

Our Content Director’s “day job” is making sure the blog is running smoothly and the sales team has compelling collateral (emails, case studies, white papers).

Specifically, the Perfect Content Director Excels at:

  • Creating well researched, elegantly constructed content on deadline. For example, each week, 2 articles should be published on our blog. Blog posts should rotate to address the constituencies for our 3 business lines – not all blog posts will appeal to everyone, but over the course of a month, we want multiple compelling pieces for each segment of our audience. Similarly, case studies (and outbound emails) should be tailored to our sales team’s proposed campaigns / audiences. The Content Director creates ~75% of all content but is responsible for editing 100%.
  • Understanding organic methods for weaving business needs into compelling content. The majority of our content (but not EVERY piece) must tie to some business strategy. We hate fluff and hold our promotional content to a standard of being worth someone’s time to read. To be effective, the Content Director must understand the target customer segments and use cases for our products.
  • Straddling both Consumer & SaaS mechanics. A key part of the job will be working to augment the collateral used by our sales team for both B2 Cloud Storage and Business Backup. This content should be compelling and optimized for converting leads. And our foundational business line, Computer Backup, deserves to be nurtured and grown.
  • Product marketing. The Content Director “owns” the blog. But also assists in writing cases studies / white papers, creating collateral (email, trade show). Each of these things has a variety of call to action(s) and audiences. Direct experience is a plus, experience that will plausibly translate to these areas is a requirement.
  • Articulating views on storage, backup, and cloud infrastructure. Not everyone has experience with this. That’s fine, but if you do, it’s strongly beneficial.

A Thursday In The Life:

  • Coordinate Collaborators – We are deliverables driven culture, not a meeting driven one. We expect you to collaborate with internal blog authors and the occasional guest poster.
  • Collaborate with Design – Ensure imagery for upcoming posts / collateral are on track.
  • Augment Sales team – Lock content for next week’s outbound campaign.
  • Self directed blog agenda – Feedback for next Tuesday’s post is addressed, next Thursday’s post is circulated to marketing team for feedback & SEO polish.
  • Review Editorial calendar, make any changes.

Oh! And We Have Great Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee & fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280.

Interested in Joining Our Team?

Send us an email to jobscontact@backblaze.com with the subject “Content Director”. Please include your resume and 3 brief abstracts for content pieces.
Some hints for each of your three abstracts:

  • Create a compelling headline
  • Write clearly and concisely
  • Be brief, each abstract should be 100 words or less – no longer
  • Target each abstract to a different specific audience that is relevant to our business lines

Thank you for taking the time to read and consider all this. I hope it sounds like a great opportunity for you or someone you know. Principles only need apply.

The post Hiring a Content Director appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.