Tag Archives: sts

Migrating .NET Classic Applications to Amazon ECS Using Windows Containers

Post Syndicated from Sundar Narasiman original https://aws.amazon.com/blogs/compute/migrating-net-classic-applications-to-amazon-ecs-using-windows-containers/

This post contributed by Sundar Narasiman, Arun Kannan, and Thomas Fuller.

AWS recently announced the general availability of Windows container management for Amazon Elastic Container Service (Amazon ECS). Docker containers and Amazon ECS make it easy to run and scale applications on a virtual machine by abstracting the complex cluster management and setup needed.

Classic .NET applications are developed with .NET Framework 4.7.1 or older and can run only on a Windows platform. These include Windows Communication Foundation (WCF), ASP.NET Web Forms, and an ASP.NET MVC web app or web API.

Why classic ASP.NET?

ASP.NET MVC 4.6 and older versions of ASP.NET occupy a significant footprint in the enterprise web application space. As enterprises move towards microservices for new or existing applications, containers are one of the stepping stones for migrating from monolithic to microservices architectures. Additionally, the support for Windows containers in Windows 10, Windows Server 2016, and Visual Studio Tooling support for Docker simplifies the containerization of ASP.NET MVC apps.

Getting started

In this post, you pick an ASP.NET 4.6.2 MVC application and get step-by-step instructions for migrating to ECS using Windows containers. The detailed steps, AWS CloudFormation template, Microsoft Visual Studio solution, ECS service definition, and ECS task definition are available in the aws-ecs-windows-aspnet GitHub repository.

To help you getting started running Windows containers, here is the reference architecture for Windows containers on GitHub: ecs-refarch-cloudformation-windows. This reference architecture is the layered CloudFormation stack, in that it calls the other stacks to create the environment. The CloudFormation YAML template in this reference architecture is referenced to create a single JSON CloudFormation stack, which is used in the steps for the migration.

Steps for Migration

The code and templates to implement this migration can be found on GitHub: https://github.com/aws-samples/aws-ecs-windows-aspnet.

  1. Your development environment needs to have the latest version and updates for Visual Studio 2017, Windows 10, and Docker for Windows Stable.
  2. Next, containerize the ASP.NET application and test it locally. The size of Windows container application images is generally larger compared to Linux containers. This is because the base image of the Windows container itself is large in size, typically greater than 9 GB.
  3. After the application is containerized, the container image needs to be pushed to Amazon Elastic Container Registry (Amazon ECR). Images stored in ECR are compressed to improve pull times and reduce storage costs. In this case, you can see that ECR compresses the image to around 1 GB, for an optimization factor of 90%.
  4. Create a CloudFormation stack using the template in the ‘CloudFormation template’ folder. This creates an ECS service, task definition (referring the containerized ASP.NET application), and other related components mentioned in the ECS reference architecture for Windows containers.
  5. After the stack is created, verify the successful creation of the ECS service, ECS instances, running tasks (with the threshold mentioned in the task definition), and the Application Load Balancer’s successful health check against running containers.
  6. Navigate to the Application Load Balancer URL and see the successful rendering of the containerized ASP.NET MVC app in the browser.

Key Notes

  • Generally, Windows container images occupy large amount of space (in the order of few GBs).
  • All the task definition parameters for Linux containers are not available for Windows containers. For more information, see Windows Task Definitions.
  • An Application Load Balancer can be configured to route requests to one or more ports on each container instance in a cluster. The dynamic port mapping allows you to have multiple tasks from a single service on the same container instance.
  • IAM roles for Windows tasks require extra configuration. For more information, see Windows IAM Roles for Tasks. For this post, configuration was handled by the CloudFormation template.
  • The ECS container agent log file can be accessed for troubleshooting Windows containers: C:\ProgramData\Amazon\ECS\log\ecs-agent.log

Summary

In this post, you migrated an ASP.NET MVC application to ECS using Windows containers.

The logical next step is to automate the activities for migration to ECS and build a fully automated continuous integration/continuous deployment (CI/CD) pipeline for Windows containers. This can be orchestrated by leveraging services such as AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, Amazon ECR, and Amazon ECS. You can learn more about how this is done in the Set Up a Continuous Delivery Pipeline for Containers Using AWS CodePipeline and Amazon ECS post.

If you have questions or suggestions, please comment below.

Optimize Delivery of Trending, Personalized News Using Amazon Kinesis and Related Services

Post Syndicated from Yukinori Koide original https://aws.amazon.com/blogs/big-data/optimize-delivery-of-trending-personalized-news-using-amazon-kinesis-and-related-services/

This is a guest post by Yukinori Koide, an the head of development for the Newspass department at Gunosy.

Gunosy is a news curation application that covers a wide range of topics, such as entertainment, sports, politics, and gourmet news. The application has been installed more than 20 million times.

Gunosy aims to provide people with the content they want without the stress of dealing with a large influx of information. We analyze user attributes, such as gender and age, and past activity logs like click-through rate (CTR). We combine this information with article attributes to provide trending, personalized news articles to users.

In this post, I show you how to process user activity logs in real time using Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, and related AWS services.

Why does Gunosy need real-time processing?

Users need fresh and personalized news. There are two constraints to consider when delivering appropriate articles:

  • Time: Articles have freshness—that is, they lose value over time. New articles need to reach users as soon as possible.
  • Frequency (volume): Only a limited number of articles can be shown. It’s unreasonable to display all articles in the application, and users can’t read all of them anyway.

To deliver fresh articles with a high probability that the user is interested in them, it’s necessary to include not only past user activity logs and some feature values of articles, but also the most recent (real-time) user activity logs.

We optimize the delivery of articles with these two steps.

  1. Personalization: Deliver articles based on each user’s attributes, past activity logs, and feature values of each article—to account for each user’s interests.
  2. Trends analysis/identification: Optimize delivering articles using recent (real-time) user activity logs—to incorporate the latest trends from all users.

Optimizing the delivery of articles is always a cold start. Initially, we deliver articles based on past logs. We then use real-time data to optimize as quickly as possible. In addition, news has a short freshness time. Specifically, day-old news is past news, and even the news that is three hours old is past news. Therefore, shortening the time between step 1 and step 2 is important.

To tackle this issue, we chose AWS for processing streaming data because of its fully managed services, cost-effectiveness, and so on.

Solution

The following diagrams depict the architecture for optimizing article delivery by processing real-time user activity logs

There are three processing flows:

  1. Process real-time user activity logs.
  2. Store and process all user-based and article-based logs.
  3. Execute ad hoc or heavy queries.

In this post, I focus on the first processing flow and explain how it works.

Process real-time user activity logs

The following are the steps for processing user activity logs in real time using Kinesis Data Streams and Kinesis Data Analytics.

  1. The Fluentd server sends the following user activity logs to Kinesis Data Streams:
{"article_id": 12345, "user_id": 12345, "action": "click"}
{"article_id": 12345, "user_id": 12345, "action": "impression"}
...
  1. Map rows of logs to columns in Kinesis Data Analytics.

  1. Set the reference data to Kinesis Data Analytics from Amazon S3.

a. Gunosy has user attributes such as gender, age, and segment. Prepare the following CSV file (user_id, gender, segment_id) and put it in Amazon S3:

101,female,1
102,male,2
103,female,3
...

b. Add the application reference data source to Kinesis Data Analytics using the AWS CLI:

$ aws kinesisanalytics add-application-reference-data-source \
  --application-name <my-application-name> \
  --current-application-version-id <version-id> \
  --reference-data-source '{
  "TableName": "REFERENCE_DATA_SOURCE",
  "S3ReferenceDataSource": {
    "BucketARN": "arn:aws:s3:::<my-bucket-name>",
    "FileKey": "mydata.csv",
    "ReferenceRoleARN": "arn:aws:iam::<account-id>:role/..."
  },
  "ReferenceSchema": {
    "RecordFormat": {
      "RecordFormatType": "CSV",
      "MappingParameters": {
        "CSVMappingParameters": {"RecordRowDelimiter": "\n", "RecordColumnDelimiter": ","}
      }
    },
    "RecordEncoding": "UTF-8",
    "RecordColumns": [
      {"Name": "USER_ID", "Mapping": "0", "SqlType": "INTEGER"},
      {"Name": "GENDER",  "Mapping": "1", "SqlType": "VARCHAR(32)"},
      {"Name": "SEGMENT_ID", "Mapping": "2", "SqlType": "INTEGER"}
    ]
  }
}'

This application reference data source can be referred on Kinesis Data Analytics.

  1. Run a query against the source data stream on Kinesis Data Analytics with the application reference data source.

a. Define the temporary stream named TMP_SQL_STREAM.

CREATE OR REPLACE STREAM "TMP_SQL_STREAM" (
  GENDER VARCHAR(32), SEGMENT_ID INTEGER, ARTICLE_ID INTEGER
);

b. Insert the joined source stream and application reference data source into the temporary stream.

CREATE OR REPLACE PUMP "TMP_PUMP" AS
INSERT INTO "TMP_SQL_STREAM"
SELECT STREAM
  R.GENDER, R.SEGMENT_ID, S.ARTICLE_ID, S.ACTION
FROM      "SOURCE_SQL_STREAM_001" S
LEFT JOIN "REFERENCE_DATA_SOURCE" R
  ON S.USER_ID = R.USER_ID;

c. Define the destination stream named DESTINATION_SQL_STREAM.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
  TIME TIMESTAMP, GENDER VARCHAR(32), SEGMENT_ID INTEGER, ARTICLE_ID INTEGER, 
  IMPRESSION INTEGER, CLICK INTEGER
);

d. Insert the processed temporary stream, using a tumbling window, into the destination stream per minute.

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM
  ROW_TIME AS TIME,
  GENDER, SEGMENT_ID, ARTICLE_ID,
  SUM(CASE ACTION WHEN 'impression' THEN 1 ELSE 0 END) AS IMPRESSION,
  SUM(CASE ACTION WHEN 'click' THEN 1 ELSE 0 END) AS CLICK
FROM "TMP_SQL_STREAM"
GROUP BY
  GENDER, SEGMENT_ID, ARTICLE_ID,
  FLOOR("TMP_SQL_STREAM".ROWTIME TO MINUTE);

The results look like the following:

  1. Insert the results into Amazon Elasticsearch Service (Amazon ES).
  2. Batch servers get results from Amazon ES every minute. They then optimize delivering articles with other data sources using a proprietary optimization algorithm.

How to connect a stream to another stream in another AWS Region

When we built the solution, Kinesis Data Analytics was not available in the Asia Pacific (Tokyo) Region, so we used the US West (Oregon) Region. The following shows how we connected a data stream to another data stream in the other Region.

There is no need to continue containing all components in a single AWS Region, unless you have a situation where a response difference at the millisecond level is critical to the service.

Benefits

The solution provides benefits for both our company and for our users. Benefits for the company are cost savings—including development costs, operational costs, and infrastructure costs—and reducing delivery time. Users can now find articles of interest more quickly. The solution can process more than 500,000 records per minute, and it enables fast and personalized news curating for our users.

Conclusion

In this post, I showed you how we optimize trending user activities to personalize news using Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, and related AWS services in Gunosy.

AWS gives us a quick and economical solution and a good experience.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Implement Serverless Log Analytics Using Amazon Kinesis Analytics and Joining and Enriching Streaming Data on Amazon Kinesis.


About the Authors

Yukinori Koide is the head of development for the Newspass department at Gunosy. He is working on standardization of provisioning and deployment flow, promoting the utilization of serverless and containers for machine learning and AI services. His favorite AWS services are DynamoDB, Lambda, Kinesis, and ECS.

 

 

 

Akihiro Tsukada is a start-up solutions architect with AWS. He supports start-up companies in Japan technically at many levels, ranging from seed to later-stage.

 

 

 

 

Yuta Ishii is a solutions architect with AWS. He works with our customers to provide architectural guidance for building media & entertainment services, helping them improve the value of their services when using AWS.

 

 

 

 

 

UK Government Teaches 7-Year-Olds That Piracy is Stealing

Post Syndicated from Ernesto original https://torrentfreak.com/uk-government-teaches-7-year-olds-that-piracy-is-stealing-180118/

In 2014, Mike Weatherley, the UK Government’s top IP advisor at the time, offered a recommendation that copyright education should be added to the school curriculum, starting with the youngest kids in primary school.

New generations should learn copyright moral and ethics, the idea was, and a few months later the first version of the new “Cracking Ideas” curriculum was made public.

In the years that followed new course material was added, published by the UK’s Intellectual Property Office (IPO) with support from the local copyright industry. The teaching material is aimed at a variety of ages, including those who have just started primary school.

Part of the education features a fictitious cartoon band called Nancy and the Meerkats. With help from their manager, they learn key copyright insights and this week several new videos were published, BBC points out.

The videos try to explain concepts including copyright, trademarks, and how people can protect the things they’ve created. Interestingly, the videos themselves use names of existing musicians, with puns such as Ed Shealing, Justin Beaver, and the evil Kitty Perry. Even Nancy and the Meerkats appears to be a play on the classic 1970s cartoon series Josie and the Pussycats, featuring a pop band of the same name.

The play on Ed Sheeran’s name is interesting, to say the least. While he’s one of the most popular artists today, he also mentioned in the past that file-sharing made his career.

“…illegal fire sharing was what made me. It was students in England going to university, sharing my songs with each other,” Sheeran said in an interview with CBS last year.

But that didn’t stop the IPO from using his likeness for their anti-file-sharing campaign. According to Catherine Davies of IPO’s education outreach department, knowledge about key intellectual property issues is a “life skill” nowadays.

“In today’s digital environment, even very young people are IP consumers, accessing online digital content independently and regularly,” she tells the BBC. “A basic understanding of IP and a respect for others’ IP rights is therefore a key life skill.”

While we doubt that these concepts will appeal to the average five-year-old, the course material does it best to simplify complex copyright issues. Perhaps that’s also where the danger lies.

The program is in part backed by copyright-reliant industries, who have a different view on the matter than many others. For example, a previously published video of Nancy and the Meerkats deals with the topic of file-sharing.

After the Meerkats found out that people were downloading their tracks from pirate sites and became outraged, their manager Big Joe explained that file-sharing is just the same as stealing a CD from a physical store.

“In a way, all those people who downloaded free copies are doing the same thing as walking out of the shop with a CD and forgetting to go the till,” he says.

“What these sites are doing is sometimes called piracy. It not only affects music but also videos, books, and movies.If someone owns the copyright to something, well, it is stealing. Simple as that,” Big Joe adds.

The Pirates of the Internet!

While we won’t go into the copying vs. stealing debate, it’s interesting that there is no mention of more liberal copyright licenses. There are thousands of artists who freely share their work after all, by adopting Creative Commons licenses for example. Downloading these tracks is certainly not stealing.

Jim Killock, director of the Open Rights Group, notes that the campaign is a bit extreme at points.

“Infringing copyright is a bad thing, but it is not the same as physical theft. Many children will guess that making a copy is not the same as making off with the local store’s chocolate bars,” he says.

“Children aren’t born bureaucrats, and they are surrounded by stupid rules made by stupid adults. Presumably, the IPO doesn’t want children to conclude that copyright is just another one, so they should be a bit more careful with how they explain things.”

Killock also stresses that children copy a lot of things in school, which would normally violate copyright. However, thanks to the educational exceptions they’re not getting in trouble. The IPO could pay more attention to these going forward.

Perhaps Nancy and the Meerkats could decide to release a free to share track in a future episode, for example, and encourage kids to use it for their own remixes, or other creative projects. Creativity and copyright are not all about restrictions, after all.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Privacy expectations and the connected home

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/50229.html

Traditionally, devices that were tied to logins tended to indicate that in some way – turn on someone’s xbox and it’ll show you their account name, run Netflix and it’ll ask which profile you want to use. The increasing prevalence of smart devices in the home changes that, in ways that may not be immediately obvious to the majority of people. You can configure a Philips Hue with wall-mounted dimmers, meaning that someone unfamiliar with the system may not recognise that it’s a smart lighting system at all. Without any actively malicious intent, you end up with a situation where the account holder is able to infer whether someone is home without that person necessarily having any idea that that’s possible. A visitor who uses an Amazon Echo is not necessarily going to know that it’s tied to somebody’s Amazon account, and even if they do they may not know that the log (and recorded audio!) of all interactions is available to the account holder. And someone grabbing an egg out of your fridge is almost certainly not going to think that your smart egg tray will trigger an immediate notification on the account owner’s phone that they need to buy new eggs.

Things get even more complicated when there’s multiple account support. Google Home supports multiple users on a single device, using voice recognition to determine which queries should be associated with which account. But the account that was used to initially configure the device remains as the fallback, with unrecognised voices ended up being logged to it. If a voice is misidentified, the query may end up being logged to an unexpected account.

There’s some interesting questions about consent and expectations of privacy here. If someone sets up a smart device in their home then at some point they’ll agree to the manufacturer’s privacy policy. But if someone else makes use of the system (by pressing a lightswitch, making a spoken query or, uh, picking up an egg), have they consented? Who has the social obligation to explain to them that the information they’re producing may be stored elsewhere and visible to someone else? If I use an Echo in a hotel room, who has access to the Amazon account it’s associated with? How do you explain to a teenager that there’s a chance that when they asked their Home for contact details for an abortion clinic, it ended up in their parent’s activity log? Who’s going to be the first person divorced for claiming that they were vegan but having been the only person home when an egg was taken out of the fridge?

To be clear, I’m not arguing against the design choices involved in the implementation of these devices. In many cases it’s hard to see how the desired functionality could be implemented without this sort of issue arising. But we’re gradually shifting to a place where the data we generate is not only available to corporations who probably don’t care about us as individuals, it’s also becoming available to people who own the more private spaces we inhabit. We have social norms against bugging our houseguests, but we have no social norms that require us to explain to them that there’ll be a record of every light that they turn on or off. This feels like it’s going to end badly.

(Thanks to Nikki Everett for conversations that inspired this post)

(Disclaimer: while I work for Google, I am not involved in any of the products or teams described in this post and my opinions are my own rather than those of my employer’s)

comment count unavailable comments

Article from a Former Chinese PLA General on Cyber Sovereignty

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/article_from_a_.html

Interesting article by Major General Hao Yeli, Chinese People’s Liberation Army (ret.), a senior advisor at the China International Institute for Strategic Society, Vice President of China Institute for Innovation and Development Strategy, and the Chair of the Guanchao Cyber Forum.

Against the background of globalization and the internet era, the emerging cyber sovereignty concept calls for breaking through the limitations of physical space and avoiding misunderstandings based on perceptions of binary opposition. Reinforcing a cyberspace community with a common destiny, it reconciles the tension between exclusivity and transferability, leading to a comprehensive perspective. China insists on its cyber sovereignty, meanwhile, it transfers segments of its cyber sovereignty reasonably. China rightly attaches importance to its national security, meanwhile, it promotes international cooperation and open development.

China has never been opposed to multi-party governance when appropriate, but rejects the denial of government’s proper role and responsibilities with respect to major issues. The multilateral and multiparty models are complementary rather than exclusive. Governments and multi-stakeholders can play different leading roles at the different levels of cyberspace.

In the internet era, the law of the jungle should give way to solidarity and shared responsibilities. Restricted connections should give way to openness and sharing. Intolerance should be replaced by understanding. And unilateral values should yield to respect for differences while recognizing the importance of diversity.

New AWS Auto Scaling – Unified Scaling For Your Cloud Applications

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-auto-scaling-unified-scaling-for-your-cloud-applications/

I’ve been talking about scalability for servers and other cloud resources for a very long time! Back in 2006, I wrote “This is the new world of scalable, on-demand web services. Pay for what you need and use, and not a byte more.” Shortly after we launched Amazon Elastic Compute Cloud (EC2), we made it easy for you to do this with the simultaneous launch of Elastic Load Balancing, EC2 Auto Scaling, and Amazon CloudWatch. Since then we have added Auto Scaling to other AWS services including ECS, Spot Fleets, DynamoDB, Aurora, AppStream 2.0, and EMR. We have also added features such as target tracking to make it easier for you to scale based on the metric that is most appropriate for your application.

Introducing AWS Auto Scaling
Today we are making it easier for you to use the Auto Scaling features of multiple AWS services from a single user interface with the introduction of AWS Auto Scaling. This new service unifies and builds on our existing, service-specific, scaling features. It operates on any desired EC2 Auto Scaling groups, EC2 Spot Fleets, ECS tasks, DynamoDB tables, DynamoDB Global Secondary Indexes, and Aurora Replicas that are part of your application, as described by an AWS CloudFormation stack or in AWS Elastic Beanstalk (we’re also exploring some other ways to flag a set of resources as an application for use with AWS Auto Scaling).

You no longer need to set up alarms and scaling actions for each resource and each service. Instead, you simply point AWS Auto Scaling at your application and select the services and resources of interest. Then you select the desired scaling option for each one, and AWS Auto Scaling will do the rest, helping you to discover the scalable resources and then creating a scaling plan that addresses the resources of interest.

If you have tried to use any of our Auto Scaling options in the past, you undoubtedly understand the trade-offs involved in choosing scaling thresholds. AWS Auto Scaling gives you a variety of scaling options: You can optimize for availability, keeping plenty of resources in reserve in order to meet sudden spikes in demand. You can optimize for costs, running close to the line and accepting the possibility that you will tax your resources if that spike arrives. Alternatively, you can aim for the middle, with a generous but not excessive level of spare capacity. In addition to optimizing for availability, cost, or a blend of both, you can also set a custom scaling threshold. In each case, AWS Auto Scaling will create scaling policies on your behalf, including appropriate upper and lower bounds for each resource.

AWS Auto Scaling in Action
I will use AWS Auto Scaling on a simple CloudFormation stack consisting of an Auto Scaling group of EC2 instances and a pair of DynamoDB tables. I start by removing the existing Scaling Policies from my Auto Scaling group:

Then I open up the new Auto Scaling Console and selecting the stack:

Behind the scenes, Elastic Beanstalk applications are always launched via a CloudFormation stack. In the screen shot above, awseb-e-sdwttqizbp-stack is an Elastic Beanstalk application that I launched.

I can click on any stack to learn more about it before proceeding:

I select the desired stack and click on Next to proceed. Then I enter a name for my scaling plan and choose the resources that I’d like it to include:

I choose the scaling strategy for each type of resource:

After I have selected the desired strategies, I click Next to proceed. Then I review the proposed scaling plan, and click Create scaling plan to move ahead:

The scaling plan is created and in effect within a few minutes:

I can click on the plan to learn more:

I can also inspect each scaling policy:

I tested my new policy by applying a load to the initial EC2 instance, and watched the scale out activity take place:

I also took a look at the CloudWatch metrics for the EC2 Auto Scaling group:

Available Now
We are launching AWS Auto Scaling today in the US East (Northern Virginia), US East (Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Singapore) Regions today, with more to follow. There’s no charge for AWS Auto Scaling; you pay only for the CloudWatch Alarms that it creates and any AWS resources that you consume.

As is often the case with our new services, this is just the first step on what we hope to be a long and interesting journey! We have a long roadmap, and we’ll be adding new features and options throughout 2018 in response to your feedback.

Jeff;

Scale Your Web Application — One Step at a Time

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/architecture/scale-your-web-application-one-step-at-a-time/

I often encounter people experiencing frustration as they attempt to scale their e-commerce or WordPress site—particularly around the cost and complexity related to scaling. When I talk to customers about their scaling plans, they often mention phrases such as horizontal scaling and microservices, but usually people aren’t sure about how to dive in and effectively scale their sites.

Now let’s talk about different scaling options. For instance if your current workload is in a traditional data center, you can leverage the cloud for your on-premises solution. This way you can scale to achieve greater efficiency with less cost. It’s not necessary to set up a whole powerhouse to light a few bulbs. If your workload is already in the cloud, you can use one of the available out-of-the-box options.

Designing your API in microservices and adding horizontal scaling might seem like the best choice, unless your web application is already running in an on-premises environment and you’ll need to quickly scale it because of unexpected large spikes in web traffic.

So how to handle this situation? Take things one step at a time when scaling and you may find horizontal scaling isn’t the right choice, after all.

For example, assume you have a tech news website where you did an early-look review of an upcoming—and highly-anticipated—smartphone launch, which went viral. The review, a blog post on your website, includes both video and pictures. Comments are enabled for the post and readers can also rate it. For example, if your website is hosted on a traditional Linux with a LAMP stack, you may find yourself with immediate scaling problems.

Let’s get more details on the current scenario and dig out more:

  • Where are images and videos stored?
  • How many read/write requests are received per second? Per minute?
  • What is the level of security required?
  • Are these synchronous or asynchronous requests?

We’ll also want to consider the following if your website has a transactional load like e-commerce or banking:

How is the website handling sessions?

  • Do you have any compliance requests—like the Payment Card Industry Data Security Standard (PCI DSS compliance) —if your website is using its own payment gateway?
  • How are you recording customer behavior data and fulfilling your analytics needs?
  • What are your loading balancing considerations (scaling, caching, session maintenance, etc.)?

So, if we take this one step at a time:

Step 1: Ease server load. We need to quickly handle spikes in traffic, generated by activity on the blog post, so let’s reduce server load by moving image and video to some third -party content delivery network (CDN). AWS provides Amazon CloudFront as a CDN solution, which is highly scalable with built-in security to verify origin access identity and handle any DDoS attacks. CloudFront can direct traffic to your on-premises or cloud-hosted server with its 113 Points of Presence (102 Edge Locations and 11 Regional Edge Caches) in 56 cities across 24 countries, which provides efficient caching.
Step 2: Reduce read load by adding more read replicas. MySQL provides a nice mirror replication for databases. Oracle has its own Oracle plug for replication and AWS RDS provide up to five read replicas, which can span across the region and even the Amazon database Amazon Aurora can have 15 read replicas with Amazon Aurora autoscaling support. If a workload is highly variable, you should consider Amazon Aurora Serverless database  to achieve high efficiency and reduced cost. While most mirror technologies do asynchronous replication, AWS RDS can provide synchronous multi-AZ replication, which is good for disaster recovery but not for scalability. Asynchronous replication to mirror instance means replication data can sometimes be stale if network bandwidth is low, so you need to plan and design your application accordingly.

I recommend that you always use a read replica for any reporting needs and try to move non-critical GET services to read replica and reduce the load on the master database. In this case, loading comments associated with a blog can be fetched from a read replica—as it can handle some delay—in case there is any issue with asynchronous reflection.

Step 3: Reduce write requests. This can be achieved by introducing queue to process the asynchronous message. Amazon Simple Queue Service (Amazon SQS) is a highly-scalable queue, which can handle any kind of work-message load. You can process data, like rating and review; or calculate Deal Quality Score (DQS) using batch processing via an SQS queue. If your workload is in AWS, I recommend using a job-observer pattern by setting up Auto Scaling to automatically increase or decrease the number of batch servers, using the number of SQS messages, with Amazon CloudWatch, as the trigger.  For on-premises workloads, you can use SQS SDK to create an Amazon SQS queue that holds messages until they’re processed by your stack. Or you can use Amazon SNS  to fan out your message processing in parallel for different purposes like adding a watermark in an image, generating a thumbnail, etc.

Step 4: Introduce a more robust caching engine. You can use Amazon Elastic Cache for Memcached or Redis to reduce write requests. Memcached and Redis have different use cases so if you can afford to lose and recover your cache from your database, use Memcached. If you are looking for more robust data persistence and complex data structure, use Redis. In AWS, these are managed services, which means AWS takes care of the workload for you and you can also deploy them in your on-premises instances or use a hybrid approach.

Step 5: Scale your server. If there are still issues, it’s time to scale your server.  For the greatest cost-effectiveness and unlimited scalability, I suggest always using horizontal scaling. However, use cases like database vertical scaling may be a better choice until you are good with sharding; or use Amazon Aurora Serverless for variable workloads. It will be wise to use Auto Scaling to manage your workload effectively for horizontal scaling. Also, to achieve that, you need to persist the session. Amazon DynamoDB can handle session persistence across instances.

If your server is on premises, consider creating a multisite architecture, which will help you achieve quick scalability as required and provide a good disaster recovery solution.  You can pick and choose individual services like Amazon Route 53, AWS CloudFormation, Amazon SQS, Amazon SNS, Amazon RDS, etc. depending on your needs.

Your multisite architecture will look like the following diagram:

In this architecture, you can run your regular workload on premises, and use your AWS workload as required for scalability and disaster recovery. Using Route 53, you can direct a precise percentage of users to an AWS workload.

If you decide to move all of your workloads to AWS, the recommended multi-AZ architecture would look like the following:

In this architecture, you are using a multi-AZ distributed workload for high availability. You can have a multi-region setup and use Route53 to distribute your workload between AWS Regions. CloudFront helps you to scale and distribute static content via an S3 bucket and DynamoDB, maintaining your application state so that Auto Scaling can apply horizontal scaling without loss of session data. At the database layer, RDS with multi-AZ standby provides high availability and read replica helps achieve scalability.

This is a high-level strategy to help you think through the scalability of your workload by using AWS even if your workload in on premises and not in the cloud…yet.

I highly recommend creating a hybrid, multisite model by placing your on-premises environment replica in the public cloud like AWS Cloud, and using Amazon Route53 DNS Service and Elastic Load Balancing to route traffic between on-premises and cloud environments. AWS now supports load balancing between AWS and on-premises environments to help you scale your cloud environment quickly, whenever required, and reduce it further by applying Amazon auto-scaling and placing a threshold on your on-premises traffic using Route 53.

The Man from Earth Sequel ‘Pirated’ on The Pirate Bay – By Its Creators

Post Syndicated from Andy original https://torrentfreak.com/the-man-from-earth-sequel-pirated-on-the-pirate-bay-by-its-creators-180116/

More than a decade ago, Hollywood was struggling to get to grips with the file-sharing phenomenon. Sharing via BitTorrent was painted as a disease that could kill the movie industry, if it was allowed to take hold. Tough action was the only way to defeat it, the suits concluded.

In 2007, however, a most unusual turn of events showed that piracy could have a magical effect on the success of a movie.

After being produced on a tiny budget, a then little-known independent sci-fi film called “The Man from Earth” turned up on pirate sites, to the surprise of its creators.

“Originally, somebody got hold of a promotional screener DVD of ‘Jerome Bixby’s The Man from Earth’, ripped the file and posted the movie online before we knew what was even happening,” Man from Earth director Richard Schenkman informs TorrentFreak.

“A week or two before the DVD’s ‘street date’, we jumped 11,000% on the IMDb ‘Moviemeter’ and we were shocked.”

With pirates fueling interest in the movie, a member of the team took an unusual step. Producer Eric Wilkinson wrote to RLSlog, a popular piracy links site – not to berate pirates – but to thank them for catapulting the movie to fame.

“Our independent movie had next to no advertising budget and very little going for it until somebody ripped one of the DVD screeners and put the movie online for all to download. Most of the feedback from everyone who has downloaded ‘The Man From Earth’ has been overwhelmingly positive. People like our movie and are talking about it, all thanks to piracy on the net!” he wrote.

Richard Schenkman told TF this morning that availability on file-sharing networks was important for the movie, since it wasn’t available through legitimate means in most countries. So, the team called out to fans for help, if they’d pirated the movie and had liked what they’d seen.

“Once we realized what was going on, we asked people to make donations to our PayPal page if they saw the movie for free and liked it, because we had all worked for nothing for two years to bring it to the screen, and the only chance we had of surviving financially was to ask people to support us and the project,” Schenkman explains.

“And, happily, many people around the world did donate, although of course only a tiny fraction of the millions and millions of people who downloaded pirated copies.”

Following this early boost The Man from Earth went on to win multiple awards. And, a decade on, it boasts a hugely commendable 8/10 score on IMDb from more than 147,000 voters, with Netflix users leaving over 650,000 ratings, which reportedly translates to well over a million views.

It’s a performance director Richard Schenkman would like to repeat with his sequel: The Man from Earth: Holocene. This time, however, he won’t be leaving the piracy aspect to chance.

Yesterday the team behind the movie took matters into their own hands, uploading the movie to The Pirate Bay and other sites so that fans can help themselves.

“It was going to get uploaded regardless of what we did or didn’t do, and we figured that as long as this was inevitable, we would do the uploading ourselves and explain why we were doing it,” Schenkman informs TF.

“And, we would once again reach out to the filesharing community and remind them that while movies may be free to watch, they are not free to make, and we need their support.”

The release, listed here on The Pirate Bay, comes with detailed notes and a few friendly pointers on how the release can be further shared. It also informs people how they can show their appreciation if they like it.

The Man from Earth: Holocene on The Pirate Bay

“It’s a revolutionary global experiment in the honor system. We’re asking people: ‘If you watch our movie, and you like it, will you pay something directly to the people who made it?’,” Schenkman says.

“That’s why we’re so grateful to all of you who visit ManFromEarth.com and make a donation – of any size – if you’ve watched the movie without paying for it up front.”

In addition to using The Pirate Bay – which is often and incorrectly berated as a purely ‘pirate’ platform with no legitimate uses – the team has also teamed up with OpenSubtitles, so translations for the movie are available right from the beginning.

Other partners include MovieSaints.com, where fans can pay to see the movie from January 19 but get a full refund if they don’t enjoy it. It’s also available on Vimeo (see below) but the version seen by pirates is slightly different, and for good reason, Schenkman says.

“This version of the movie includes a greeting from me at the beginning, pointing out that we did indeed upload the movie ourselves, and asking people to visit manfromearth.com and make a donation if they can afford to, and if they enjoyed the film.

“The version we posted is very high-resolution, although we are also sharing some smaller files for those folks who have a slow Internet connection where they live,” he explains.

“We’re asking people to share ONLY this version of the movie — NOT to edit off the appeal message. And of course we’re asking people not to post the movie at YouTube or any other platform where someone (other than us) could profit financially from it. That would not be fair, nor in keeping with the spirit of what we’re trying to do.”

It’s not often we’re able to do this so it’s a pleasure to say that The Man from Earth: Holocene can be downloaded from The Pirate Bay, in various qualities and entirely legally, here. For those who want to show their appreciation, the tip jar is here.

"The Man from Earth: Holocene" Teaser Trailer from Richard Schenkman on Vimeo.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Early Challenges: Managing Cash Flow

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/managing-cash-flow/

Cash flow projection charts

This post by Backblaze’s CEO and co-founder Gleb Budman is the eighth in a series about entrepreneurship. You can choose posts in the series from the list below:

  1. How Backblaze got Started: The Problem, The Solution, and the Stuff In-Between
  2. Building a Competitive Moat: Turning Challenges Into Advantages
  3. From Idea to Launch: Getting Your First Customers
  4. How to Get Your First 1,000 Customers
  5. Surviving Your First Year
  6. How to Compete with Giants
  7. The Decision on Transparency
  8. Early Challenges: Managing Cash Flow

Use the Join button above to receive notification of new posts in this series.

Running out of cash is one of the quickest ways for a startup to go out of business. When you are starting a company the question of where to get cash is usually the top priority, but managing cash flow is critical for every stage in the lifecycle of a company. As a primarily bootstrapped but capital-intensive business, managing cash flow at Backblaze was and still is a key element of our success and requires continued focus. Let’s look at what we learned over the years.

Raising Your Initial Funding

When starting a tech business in Silicon Valley, the default assumption is that you will immediately try to raise venture funding. There are certainly many advantages to raising funding — not the least of which is that you don’t need to be cash-flow positive since you have cash in the bank and the expectation is that you will have a “burn rate,” i.e. you’ll be spending more than you make.

Note: While you’re not expected to be cash-flow positive, that doesn’t mean you don’t have to worry about cash. Cash-flow management will determine your burn rate. Whether you can get to cash-flow breakeven or need to raise another round of funding is a direct byproduct of your cash flow management.

Also, raising funding takes time (most successful fundraising cycles take 3-6 months start-to-finish), and time at a startup is in short supply. Constantly trying to raise funding can take away from product development and pursuing growth opportunities. If you’re not successful in raising funding, you then have to either shut down or find an alternate method of funding the business.

Sources of Funding

Depending on the stage of the company, type of company, and other factors, you may have access to different sources of funding. Let’s list a number of them:

Customers

Sales — the best kind of funding. It is non-dilutive, doesn’t have to be paid back, and is a direct metric of the success of your company.

Pre-Sales — some customers may be willing to pay you for a product in beta, a test, or pre-pay for a product they’ll receive when finished. Pre-Sales income also is great because it shares the characteristics of cash from sales, but you get the cash early. It also can be a good sign that the product you’re building fills a market need. We started charging for Backblaze computer backup while it was still in private beta, which allowed us to not only collect cash from customers, but also test the billing experience and users’ real desire for the service.

Services — if you’re a service company and customers are paying you for that, great. You can effectively scale for the number of hours available in a day. As demand grows, you can add more employees to increase the total number of billable hours.

Note: If you’re a product company and customers are paying you to consult, that can provide much needed cash, and could provide feedback toward the right product. However, it can also distract from your core business, send you down a path where you’re building a product for a single customer, and addict you to a path that prevents you from building a scalable business.

Investors

Yourself — you likely are putting your time into the business, and deferring salary in the process. You may also put your own cash into the business either as an investment or a loan.

Angels — angels are ideal as early investors since they are used to investing in businesses with little to no traction. AngelList is a good place to find them, though finding people you’re connected with through someone that knows you well is best.

Crowdfunding — a component of the JOBS Act permitted entrepreneurs to raise money from nearly anyone since May 2016. The SEC imposes limits on both investors and the companies. This article goes into some depth on the options and sites available.

VCs — VCs are ideal for companies that need to raise at least a few million dollars and intend to build a business that will be worth over $1 billion.

Debt

Friends & Family — F&F are often the first people to give you money because they are investing in you. It’s great to have some early supporters, but it also can be risky to take money from people who aren’t used to the risks. The key advice here is to only take money from people who won’t mind losing it. If someone is talking about using their children’s college funds or borrowing from their 401k, say ‘no thank you’ — even if they’re sure they want to loan you money.

Bank Loans — a variety of loan types exist, but most either require the company to have been operational for a couple years, be able to borrow against money the company has or is making, or be able to get a personal guarantee from the founders whereby their own credit is on the line. Fundera provides a good overview of loan options and can help secure some, but most will not be an option for a brand new startup.

Grants

Government — in some areas there is the potential for government grants to facilitate research. The SBIR program facilitates some such grants.

At Backblaze, we used a number of these options:

• Investors/Yourself
We loaned a cumulative total of a couple hundred thousand dollars to the company and invested our time by going without a salary for a year and a half.
• Customers/Pre-Sales
We started selling the Backblaze service while it was still in beta.
• Customers/Sales
We launched v1.0 and kept selling.
• Investors/Angels
After a year and a half, we raised $370k from 11 angels. All of them were either people whom we knew personally or were a strong recommendation from a mutual friend.
• Debt/Loans
After a couple years we were able to get equipment leases whereby the Storage Pods and hard drives were used as collateral to secure the lease on them.
• Investors/VCs
Ater five years we raised $5m from TMT Investments to add to the balance sheet and invest in growth.

The variety and quantity of sources we used is by no means uncommon.

GAAP vs. Cash

Most companies start tracking financials based on cash, and as they scale they switch to GAAP (Generally Accepted Accounting Principles). Cash is easier to track — we got paid $XXXX and spent $YYY — and as often mentioned, is required for the business to stay alive. GAAP has more subtlety and complexity, but provides a clearer picture of how the business is really doing. Backblaze was on a ‘cash’ system for the first few years, then switched to GAAP. For this post, I’m going to focus on things that help cash flow, not GAAP profitability.

Stages of Cash Flow Management

All-spend

In a pure service business (e.g. solo proprietor law firm), you may have no expenses other than your time, so this stage doesn’t exist. However, in a product business there is a period of time where you are building the product and have nothing to sell. You have zero cash coming in, but have cash going out. Your cash-flow is completely negative and you need funds to cover that.

Sales-generating

Starting to see cash come in from customers is thrilling. I initially had our system set up to email me with every $5 payment we received. You’re making sales, but not covering expenses.

Ramen-profitable

But it takes a lot of $5 payments to pay for servers and salaries, so for a while expenses are likely to outstrip sales. Getting to ramen-profitable is a critical stage where sales cover the business expenses and are “paying enough for the founders to eat ramen.” This extends the runway for a business, but is not completely sustainable, since presumably the founders can’t (or won’t) live forever on a subsistence salary.

Business-profitable

This is the ultimate stage whereby the business is truly profitable, including paying everyone market-rate salaries. A business at this stage is self-sustaining. (Of course, market shifts and plenty of other challenges can kill the business, but cash-flow issues alone will not.)

Note, I’m using the word ‘profitable’ here to mean this is still on a cash-basis.

Backblaze was in the all-spend stage for just over a year, during which time we built the service and hadn’t yet made the service available to customers. Backblaze was in the sales-generating stage for nearly another year before the company was barely ramen-profitable where sales were covering the company expenses and paying the founders minimum wage. (I say ‘barely’ since minimum wage in the SF Bay Area is arguably never subsistence.) It took almost three more years before the company was business-profitable, paying everyone including the founders market-rate.

Cash Flow Forecasting

When raising funding it’s helpful to think of milestones reached. You don’t necessarily need enough cash on day one to last for the next 100 years of the company. Some good milestones to consider are how much cash you need to prove there is a market need, prove you can build a product to meet that need, or get to ramen-profitable.

Two things to consider:

1) Unit Economics (COGS)

If your product is 100% software, this may not be relevant. Once software is built it costs effectively nothing to deliver the product to one customer or one million customers. However, in most businesses there is some incremental cost to provide the product. If you’re selling a hardware device, perhaps you sell it for $100 but it costs you $50 to make it. This is called “COGS” (Cost of Goods Sold).

Many products rely on cloud services where the costs scale with growth. That model works great, but it’s still important to understand what the costs are for the cloud service you use per unit of product you sell.

Support is often done by the founders early-on in a business, but that is another real cost to factor in and estimate on a per-user basis. Taking all of the per unit costs combined, you may charge $10/month/user for your service, but if it costs you $7/month/user in cloud services, you’re only netting $3/month/user.

2) Operating Expenses (OpEx)

These are expenses that don’t scale with the number of product units you sell. Typically this includes research & development, sales & marketing, and general & administrative expenses. Presumably there is a certain level of these functions required to build the product, market it, sell it, and run the organization. You can choose to invest or cut back on these, but you’ll still make the same amount per product unit.

Incremental Net Profit Per Unit

If you’ve calculated your COGS and your unit economics are “upside down,” where the amount you charge is less than that it costs you to provide your service, it’s worth thinking hard about how that’s going to change over time. If it will not change, there is no scale that will make the business work. Presuming you do make money on each unit of product you sell — what is sometimes referred to as “Contribution Margin” — consider how many of those product units you need to sell to cover your operating expenses as described above.

Calculating Your Profit

The math on getting to ramen-profitable is simple:

(Number of Product Units Sold x Contribution Margin) - Operating Expenses = Profit

If your operating expenses include subsistence salaries for the founders and profit > $0, you’re ramen-profitable.

Improving Cash Flow

Having access to sources of cash, whether from selling to customers or other methods, is excellent. But needing less cash gives you more choices and allows you to either dilute less, owe less, or invest more.

There are two ways to improve cash flow:

1) Collect More Cash

The best way to collect more cash is to provide more value to your customers and as a result have them pay you more. Additional features/products/services can allow this. However, you can also collect more cash by changing how you charge for your product. If you have a subscription, changing from charging monthly to yearly dramatically improves your cash flow. If you have a product that customers use up, selling a year’s supply instead of selling them one-by-one can help.

2) Spend Less Cash

Reducing COGS is a fantastic way to spend less cash in a scalable way. If you can do this without harming the product or customer experience, you win. There are a myriad of ways to also reduce operating expenses, including taking sub-market salaries, using your home instead of renting office space, staying focused on your core product, etc.

Ultimately, collecting more and spending less cash dramatically simplifies the process of getting to ramen-profitable and later to business-profitable.

Be Careful (Why GAAP Matters)

A word of caution: while running out of cash will put you out of business immediately, overextending yourself will likely put you out of business not much later. GAAP shows how a business is really doing; cash doesn’t. If you only focus on cash, it is possible to commit yourself to both delivering products and repaying loans in the future in an unsustainable fashion. If you’re taking out loans, watch the total balance and monthly payments you’re committing to. If you’re asking customers for pre-payment, make sure you believe you can deliver on what they’ve paid for.

Summary

There are numerous challenges to building a business, and ensuring you have enough cash is amongst the most important. Having the cash to keep going lets you keep working on all of the other challenges. The frameworks above were critical for maintaining Backblaze’s cash flow and cash balance. Hopefully you can take some of the lessons we learned and apply them to your business. Let us know what works for you in the comments below.

The post Early Challenges: Managing Cash Flow appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Eevee gained 2791 experience points

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/15/eevee-gained-2791-experience-points/

Eevee grew to level 31!

A year strongly defined by mixed success! Also, a lot of video games.

I ran three game jams, resulting in a total of 157 games existing that may not have otherwise, which is totally mindblowing?!

For GAMES MADE QUICK???, glip and I made NEON PHASE, a short little exploratory platformer. Honestly, I should give myself more credit for this and the rest of the LÖVE games I’ve based on the same codebase — I wove a physics engine (and everything else!) from scratch and it has held up remarkably well for a variety of different uses.

I successfully finished an HD version of Isaac’s Descent using my LÖVE engine, though it doesn’t have anything new over the original and I’ve only released it as a tech demo on Patreon.

For Strawberry Jam (NSFW!) we made fox flux (slightly NSFW!), which felt like a huge milestone: the first game where I made all the art! I mean, not counting Isaac’s Descent, which was for a very limited platform. It’s a pretty arbitrary milestone, yes, but it feels significant. I’ve been working on expanding the game into a longer and slightly less buggy experience, but the art is taking the longest by far. I must’ve spent weeks on player sprites alone.

We then set about working on Bolthaven, a sequel of sorts to NEON PHASE, and got decently far, and then abandond it. Oops.

We then started a cute little PICO-8 game, and forgot about it. Oops.

I was recruited to help with Chaos Composer, a more ambitious game glip started with someone else in Unity. I had to get used to Unity, and we squabbled a bit, but the game is finally about at the point where it’s “playable” and “maps” can be designed? It’s slightly on hold at the moment while we all finish up some other stuff, though.

We made a birthday game for two of our friends whose birthdays were very close together! Only they got to see it.

For Ludum Dare 38, we made Lunar Depot 38, a little “wave shooter” or whatever you call those? The AI is pretty rough, seeing as this was the first time I’d really made enemies and I had 72 hours to figure out how to do it, but I still think it’s pretty fun to play and I love the circular world.

I made Roguelike Simulator as an experiment with making something small and quick with a simple tool, and I had a lot of fun! I definitely want to do more stuff like this in the future.

And now we’re working on a game about Star Anise, my cat’s self-insert, which is looking to have more polish and depth than anything we’ve done so far! We’ve definitely come a long way in a year.

Somewhere along the line, I put out a call for a “potluck” project, where everyone would give me sprites of a given size without knowing what anyone else had contributed, and I would then make a game using only those sprites. Unfortunately, that stalled a few times: I tried using the Phaser JS library, but we didn’t get along; I tried LÖVE, but didn’t know where to go with the game; and then I decided to use this as an experiment with procedural generation, and didn’t get around to it. I still feel bad that everyone did work for me and I didn’t follow through, but I don’t know whether this will ever become a game.

veekun, alas, consumed months of my life. I finally got Sun and Moon loaded, but it took weeks of work since I was basically reinventing all the tooling we’d ever had from scratch, without even having most of that tooling available as a reference. It was worth it in the end, at least: Ultra Sun and Ultra Moon only took a few days to get loaded. But veekun itself is still missing some obvious Sun/Moon features, and the whole site needs an overhaul, and I just don’t know if I want to dedicate that much time to it when I have so much other stuff going on that’s much more interesting to me right now.

I finally turned my blog into more of a website, giving it a neat front page that lists a bunch of stuff I’ve done. I made a release category at last, though I’m still not quite in the habit of using it.

I wrote some blog posts, of course! I think the most interesting were JavaScript got better while I wasn’t looking and Object models. I was also asked to write a couple pieces for money for a column that then promptly shut down.

On a whim, I made a set of Eevee mugshots for Doom, which I think is a decent indication of my (pixel) art progress over the year?

I started idchoppers, a Doom parsing and manipulation library written in Rust, though it didn’t get very far and I’ve spent most of the time fighting with Rust because it won’t let me implement all my extremely bad ideas. It can do a couple things, at least, like flip maps very quickly and render maps to SVG.

I did toy around with music a little, but not a lot.

I wrote two short twines for Flora. They’re okay. I’m working on another; I think it’ll be better.

I didn’t do a lot of art overall, at least compared to the two previous years; most of my art effort over the year has gone into fox flux, which requires me to learn a whole lot of things. I did dip my toes into 3D modelling, most notably producing my current Twitter banner as well as this cool Star Anise animation. I wouldn’t mind doing more of that; maybe I’ll even try to make a low-poly pixel-textured 3D game sometime.

I restarted my book with a much better concept, though so far I’ve only written about half a chapter. Argh. I see that the vast majority of the work was done within the span of a single week, which is bad since that means I only worked on it for a week, but good since that means I can actually do a pretty good amount of work in only a week. I also did a lot of squabbling with tooling, which is hopefully mostly out of the way now.

My computer broke? That was an exciting week.


A lot of stuff, but the year as a whole still feels hit or miss. All the time I spent on veekun feels like a black void in the middle of the year, which seems like a good sign that I maybe don’t want to pour even more weeks into it in the near future.

Mostly, I want to do: more games, more art, more writing, more music.

I want to try out some tiny game making tools and make some tiny games with them — partly to get exposure to different things, partly to get more little ideas out into the world regularly, and partly to get more practice at letting myself have ideas. I have a couple tools in mind and I guess I’ll aim at a microgame every two months or so? I’d also like to finish the expanded fox flux by the end of the year, of course, though at the moment I can’t even gauge how long it might take.

I seriously lapsed on drawing last year, largely because fox flux pixel art took me so much time. So I want to draw more, and I want to get much faster at pixel art. It would probably help if I had a more concrete goal for drawing, so I might try to draw some short comics and write a little visual novel or something, which would also force me to aim for consistency.

I want to work on my book more, of course, but I also want to try my hand at a bit more fiction. I’ve had a blast writing dialogue for our games! I just shy away from longer-form writing for some reason — which seems ridiculous when a large part of my audience found me through my blog. I do think I’ve had some sort of breakthrough in the last month or two; I suddenly feel a good bit more confident about writing in general and figuring out what I want to say? One recent post I know I wrote in a single afternoon, which virtually never happens because I keep rewriting and rearranging stuff. Again, a visual novel would be a good excuse to practice writing fiction without getting too bogged down in details.

And, ah, music. I shy heavily away from music, since I have no idea what I’m doing, and also I seem to spend a lot of time fighting with tools. (Surprise.) I tried out SunVox for the first time just a few days ago and have been enjoying it quite a bit for making sound effects, so I might try it for music as well. And once again, visual novel background music is a pretty low-pressure thing to compose for. Hell, visual novels are small games, too, so that checks all the boxes. I guess I’ll go make a visual novel.

Here’s to twenty gayteen!

Now Open – Third AWS Availability Zone in London

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-third-aws-availability-zone-in-london/

We expand AWS by picking a geographic area (which we call a Region) and then building multiple, isolated Availability Zones in that area. Each Availability Zone (AZ) has multiple Internet connections and power connections to multiple grids.

Today I am happy to announce that we are opening our 50th AWS Availability Zone, with the addition of a third AZ to the EU (London) Region. This will give you additional flexibility to architect highly scalable, fault-tolerant applications that run across multiple AZs in the UK.

Since launching the EU (London) Region, we have seen an ever-growing set of customers, particularly in the public sector and in regulated industries, use AWS for new and innovative applications. Here are a couple of examples, courtesy of my AWS colleagues in the UK:

Enterprise – Some of the UK’s most respected enterprises are using AWS to transform their businesses, including BBC, BT, Deloitte, and Travis Perkins. Travis Perkins is one of the largest suppliers of building materials in the UK and is implementing the biggest systems and business change in its history, including an all-in migration of its data centers to AWS.

Startups – Cross-border payments company Currencycloud has migrated its entire payments production, and demo platform to AWS resulting in a 30% saving on their infrastructure costs. Clearscore, with plans to disrupting the credit score industry, has also chosen to host their entire platform on AWS. UnderwriteMe is using the EU (London) Region to offer an underwriting platform to their customers as a managed service.

Public Sector -The Met Office chose AWS to support the Met Office Weather App, available for iPhone and Android phones. Since the Met Office Weather App went live in January 2016, it has attracted more than half a million users. Using AWS, the Met Office has been able to increase agility, speed, and scalability while reducing costs. The Driver and Vehicle Licensing Agency (DVLA) is using the EU (London) Region for services such as the Strategic Card Payments platform, which helps the agency achieve PCI DSS compliance.

The AWS EU (London) Region has achieved Public Services Network (PSN) assurance, which provides UK Public Sector customers with an assured infrastructure on which to build UK Public Sector services. In conjunction with AWS’s Standardized Architecture for UK-OFFICIAL, PSN assurance enables UK Public Sector organizations to move their UK-OFFICIAL classified data to the EU (London) Region in a controlled and risk-managed manner.

For a complete list of AWS Regions and Services, visit the AWS Global Infrastructure page. As always, pricing for services in the Region can be found on the detail pages; visit our Cloud Products page to get started.

Jeff;

US Govt Brands Torrent, Streaming & Cyberlocker Sites As Notorious Markets

Post Syndicated from Andy original https://torrentfreak.com/us-govt-brands-torrent-streaming-cyberlocker-sites-as-notorious-markets-180115/

In its annual “Out-of-Cycle Review of Notorious Markets” the office of the United States Trade Representative (USTR) has listed a long list of websites said to be involved in online piracy.

The list is compiled with high-level input from various trade groups, including the MPAA and RIAA who both submitted their recommendations (1,2) during early October last year.

With the word “allegedly” used more than two dozen times in the report, the US government notes that its report does not constitute cast-iron proof of illegal activity. However, it urges the countries from where the so-called “notorious markets” operate to take action where they can, while putting owners and facilitators on notice that their activities are under the spotlight.

“A goal of the List is to motivate appropriate action by owners, operators, and service providers in the private sector of these and similar markets, as well as governments, to reduce piracy and counterfeiting,” the report reads.

“USTR highlights the following marketplaces because they exemplify global counterfeiting and piracy concerns and because the scale of infringing activity in these marketplaces can cause significant harm to U.S. intellectual property (IP) owners, consumers, legitimate online platforms, and the economy.”

The report begins with a page titled “Issue Focus: Illicit Streaming Devices”. Unsurprisingly, particularly given their place in dozens of headlines last year, the segment focus on the set-top box phenomenon. The piece doesn’t list any apps or software tools as such but highlights the general position, claiming a cost to the US entertainment industry of $4-5 billion a year.

Torrent Sites

In common with previous years, the USTR goes on to list several of the world’s top torrent sites but due to changes in circumstances, others have been delisted. ExtraTorrent, which shut down May 2017, is one such example.

As the world’s most famous torrent site, The Pirate Bay gets a prominent mention, with the USTR noting that the site is of “symbolic importance as one of the longest-running and most vocal torrent sites. The USTR underlines the site’s resilience by noting its hydra-like form while revealing an apparent secret concerning its hosting arrangements.

“The Pirate Bay has allegedly had more than a dozen domains hosted in various countries around the world, applies a reverse proxy service, and uses a hosting provider in Vietnam to evade further enforcement action,” the USTR notes.

Other torrent sites singled out for criticism include RARBG, which was nominated for the listing by the movie industry. According to the USTR, the site is hosted in Bosnia and Herzegovina and has changed hosting services to prevent shutdowns in recent years.

1337x.to and the meta-search engine Torrentz2 are also given a prime mention, with the USTR noting that they are “two of the most popular torrent sites that allegedly infringe U.S. content industry’s copyrights.” Russia’s RuTracker is also targeted for criticism, with the government noting that it’s now one of the most popular torrent sites in the world.

Streaming & Cyberlockers

While torrent sites are still important, the USTR reserves considerable space in its report for streaming portals and cyberlocker-type services.

4Shared.com, a file-hosting site that has been targeted by dozens of millions of copyright notices, is reportedly no longer able to use major US payment providers. Nevertheless, the British Virgin Islands company still collects significant sums from premium accounts, advertising, and offshore payment processors, USTR notes.

Cyberlocker Rapidgator gets another prominent mention in 2017, with the USTR noting that the Russian-hosted platform generates millions of dollars every year through premium memberships while employing rewards and affiliate schemes.

Due to its increasing popularity as a hosting and streaming operation, Openload.co (Romania) is now a big target for the USTR. “The site is used frequently in combination with add-ons in illicit streaming devices. In November 2017, users visited Openload.co a staggering 270 million times,” the USTR writes.

Owned by a Swiss company and hosted in the Netherlands, the popular site Uploaded is also criticized by the US alongside France’s 1Fichier.com, which allegedly hosts pirate games while being largely unresponsive to takedown notices. Dopefile.pk, a Pakistan-based storage outfit, is also highlighted.

On the video streaming front, it’s perhaps no surprise that the USTR focuses on sites like FMovies (Sweden), GoStream (Vietnam), Movie4K.tv (Russia) and PrimeWire. An organization collectively known as the MovShare group which encompasses Nowvideo.sx, WholeCloud.net, NowDownload.cd, MeWatchSeries.to and WatchSeries.ac, among others, is also listed.

Unauthorized music / research papers

While most of the above are either focused on video or feature it as part of their repertoire, other sites are listed for their attention to music. Convert2MP3.net is named as one of the most popular stream-ripping sites in the world and is highlighted due to the prevalence of YouTube-downloader sites and the 2017 demise of YouTube-MP3.

“Convert2MP3.net does not appear to have permission from YouTube or other sites and does not have permission from right holders for a wide variety of music represented by major U.S. labels,” the USTR notes.

Given the amount of attention the site has received in 2017 as ‘The Pirate Bay of Research’, Libgen.io and Sci-Hub.io (not to mention the endless proxy and mirror sites that facilitate access) are given a detailed mention in this year’s report.

“Together these sites make it possible to download — all without permission and without remunerating authors, publishers or researchers — millions of copyrighted books by commercial publishers and university presses; scientific, technical and medical journal articles; and publications of technological standards,” the USTR writes.

Service providers

But it’s not only sites that are being put under pressure. Following a growing list of nominations in previous years, Swiss service provider Private Layer is again singled out as a rogue player in the market for hosting 1337x.to and Torrentz2.eu, among others.

“While the exact configuration of websites changes from year to year, this is the fourth consecutive year that the List has stressed the significant international trade impact of Private Layer’s hosting services and the allegedly infringing sites it hosts,” the USTR notes.

“Other listed and nominated sites may also be hosted by Private Layer but are using
reverse proxy services to obfuscate the true host from the public and from law enforcement.”

The USTR notes Switzerland’s efforts to close a legal loophole that restricts enforcement and looks forward to a positive outcome when the draft amendment is considered by parliament.

Perhaps a little surprisingly given its recent anti-piracy efforts and overtures to the US, Russia’s leading social network VK.com again gets a place on the new list. The USTR recognizes VK’s efforts but insists that more needs to be done.

Social networking and e-commerce

“In 2016, VK reached licensing agreements with major record companies, took steps to limit third-party applications dedicated to downloading infringing content from the site, and experimented with content recognition technologies,” the USTR writes.

“Despite these positive signals, VK reportedly continues to be a hub of infringing activity and the U.S. motion picture industry reports that they find thousands of infringing files on the site each month.”

Finally, in addition to traditional pirate sites, the US also lists online marketplaces that allegedly fail to meet appropriate standards. Re-added to the list in 2016 after a brief hiatus in 2015, China’s Alibaba is listed again in 2017. The development provoked an angry response from the company.

Describing his company as a “scapegoat”, Alibaba Group President Michael Evans said that his platform had achieved a 25% drop in takedown requests and has even been removing infringing listings before they make it online.

“In light of all this, it’s clear that no matter how much action we take and progress we make, the USTR is not actually interested in seeing tangible results,” Evans said in a statement.

The full list of sites in the Notorious Markets Report 2017 (pdf) can be found below.

– 1fichier.com – (cyberlocker)
– 4shared.com – (cyberlocker)
– convert2mp3.net – (stream-ripper)
– Dhgate.com (e-commerce)
– Dopefile.pl – (cyberlocker)
– Firestorm-servers.com (pirate gaming service)
– Fmovies.is, Fmovies.se, Fmovies.to – (streaming)
– Gostream.is, Gomovies.to, 123movieshd.to (streaming)
– Indiamart.com (e-commerce)
– Kinogo.club, kinogo.co (streaming host, platform)
– Libgen.io, sci-hub.io, libgen.pw, sci-hub.cc, sci-hub.bz, libgen.info, lib.rus.ec, bookfi.org, bookzz.org, booker.org, booksc.org, book4you.org, bookos-z1.org, booksee.org, b-ok.org (research downloads)
– Movshare Group – Nowvideo.sx, wholecloud.net, auroravid.to, bitvid.sx, nowdownload.ch, cloudtime.to, mewatchseries.to, watchseries.ac (streaming)
– Movie4k.tv (streaming)
– MP3VA.com (music)
– Openload.co (cyberlocker / streaming)
– 1337x.to (torrent site)
– Primewire.ag (streaming)
– Torrentz2, Torrentz2.me, Torrentz2.is (torrent site)
– Rarbg.to (torrent site)
– Rebel (domain company)
– Repelis.tv (movie and TV linking)
– RuTracker.org (torrent site)
– Rapidgator.net (cyberlocker)
– Taobao.com (e-commerce)
– The Pirate Bay (torrent site)
– TVPlus, TVBrowser, Kuaikan (streaming apps and addons, China)
– Uploaded.net (cyberlocker)
– VK.com (social networking)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Pirate Streaming on Facebook is a Seriously Risky Business

Post Syndicated from Andy original https://torrentfreak.com/pirate-streaming-on-facebook-is-a-seriously-risky-business-180114/

For more than a year the British public has been warned about the supposed dangers of Kodi piracy.

Dozens of headlines have claimed consequences ranging from system-destroying malware to prison sentences. Fortunately, most of them can be filed under “tabloid nonsense.”

That being said, there is an extremely important issue that deserves much closer attention, particularly given a shift in the UK legal climate during 2017. We’re talking about live streaming copyrighted content on Facebook, which is both incredibly easy and frighteningly risky.

This week it was revealed that 34-year-old Craig Foster from the UK had been given an ultimatum from Sky to pay a £5,000 settlement fee. The media giant discovered that he’d live-streamed the Anthony Joshua v Wladimir Klitschko fight on Facebook and wanted compensation to make a potential court case disappear.

While it may seem initially odd to use the word, Foster was lucky.

Under last year’s Digital Economy Act, he could’ve been jailed for up to ten years for distributing copyright-infringing content to the public, if he had “reason to believe that communicating the work to the public [would] cause loss to the owner of the copyright, or [would] expose the owner of the copyright to a risk of loss.”

Clearly, as a purchaser of the £19.95 pay-per-view himself, he would’ve appreciated that the event costs money. With that in mind, a court would likely find that he would have been aware that Sky would have been exposed to a “risk of loss”. Sky claim that 4,250 people watched the stream but the way the law is written, no specific level of loss is required for a breach of the law.

But it’s not just the threat of a jail sentence that’s the problem. People streaming live sports on Facebook are sitting ducks.

In Foster’s case, the fight he streamed was watermarked, which means that Sky put a tracking code into it which identified him personally as the buyer of the event. When he (or his friend, as Foster claims) streamed it on Facebook, it was trivial for Sky to capture the watermark and track it back to his Sky account.

Equally, it would be simplicity itself to see that the name on the Sky account had exactly the same name and details as Foster’s Facebook account. So, to most observers, it would appear that not only had Foster purchased the event, but he was also streaming it to Facebook illegally.

It’s important to keep something else in mind. No cooperation between Sky and Facebook would’ve been necessary to obtain Foster’s details. Take the amount of information most people share on Facebook, combine that with the information Sky already had, and the company’s anti-piracy team would have had a very easy job.

Now compare this situation with an upload of the same stream to a torrent site.

While the video capture would still contain Foster’s watermark, which would indicate the source, to prove he also distributed the video Sky would’ve needed to get inside a torrent swarm. From there they would need to capture the IP address of the initial seeder and take the case to court, to force an ISP to hand over that person’s details.

Presuming they were the same person, Sky would have a case, with a broadly similar level of evidence to that presented in the current matter. However, it would’ve taken them months to get their man and cost large sums of money to get there. It’s very unlikely that £5,000 would cover the costs, meaning a much, much bigger bill for the culprit.

Or, confident that Foster was behind the leak based on the watermark alone, Sky could’ve gone straight to the police. That never ends well.

The bottom line is that while live-streaming on Facebook is simplicity itself, people who do it casually from their own account (especially with watermarked content) are asking for trouble.

Nailing Foster was the piracy equivalent of shooting fish in a barrel but the worrying part is that he probably never gave his (or his friend’s…) alleged infringement a second thought. With a click or two, the fight was live and he was staring down the barrel of a potential jail sentence, had Sky not gone the civil route.

It’s scary stuff and not enough is being done to warn people of the consequences. Forget the scare stories attempting to deter people from watching fights or movies on Kodi, thoughtlessly streaming them to the public on social media is the real danger.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Are Torrent Sites Using DMCA Notices to Quash Their Competition?

Post Syndicated from Ernesto original https://torrentfreak.com/are-torrent-sites-using-dmca-notices-to-quash-their-competition-180114/

Every day, copyright holders send out millions of takedown notices to various services, hoping to protect their works.

While most of these requests are legitimate, the process is also being abused. Google prominently features examples of such dubious DMCA requests in its transparency report.

This week we were contacted by the owner of YTS.me after he noticed some unusual activity. In recent weeks his domain name has been targeted with a series of takedown notices from rather unusual people.

Senders with names such as Niklas Glockner, Michelle Williams, Maria Baader, Stefan Kuefer, Anja Herzog, and Markus Ostermann asked Google to remove thousands of YTS.me URLs.

Every notice lists just one movie title, but hundreds of links, most of which have nothing to do with the movie in question.

A few URLs from a single notice

These submitters are all relatively new and there is no sign that they are authorized by the applicable copyright holder. This, and the long list of irrelevant URLs suggest that these DMCA notices are abusive.

The owner of YTS.me believes that the senders have a clear motive. The purpose of the notices is to remove well-ranked pages and push the targeted sites down in Google’s search results.

“These all are fake people names submitting fake DMCA complaints and are not authorized to submit complaints,” the YTS.me operator notes.

“Even if they are real people they would have submitted, or are authorized to submit, complaints for only a few titles. Instead, they submit fake complaints and submit all the URLs possible on our website to degrade its ranking.”

The question that remains is, who is responsible for these notices? Looking at the list of sites that are targeted by these abusive senders we see a pattern emerge. They all target copycats of defunct sites such as YTS and ExtraTorrent.

Markus Osterman’s activity

This leads the YTS.me operator to the conclusion that one of its main competitors is sending these notices. While there is no hard evidence, it seems plausible that another YTS copycat is attempting to take the competition out of Google’s search results to gain more exposure itself.

YTS.me has a good idea of who the perpetrator(s) are – a person or group that also operates several other copycat sites. Thus far there’s no bulletproof evidence though, but it’s a likely explanation.

In any case, the DMCA takedown requests are definitely out of order and warrant further investigation by Google.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

AWS Glue Now Supports Scala Scripts

Post Syndicated from Mehul Shah original https://aws.amazon.com/blogs/big-data/aws-glue-now-supports-scala-scripts/

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

This post shows how to build an example Scala script that identifies highly negative issues in the timeline. It pulls out issue events in the timeline sample, analyzes their titles using the sentiment prediction functions from the Stanford CoreNLP libraries, and surfaces the most negative issues.

Getting started

Before we start writing scripts, we use AWS Glue crawlers to get a sense of the data—its structure and characteristics. We also set up a development endpoint and attach an Apache Zeppelin notebook, so we can interactively explore the data and author the script.

Crawl the data

The dataset used in this example was downloaded from the GitHub archive website into our sample dataset bucket in Amazon S3, and copied to the following locations:

s3://aws-glue-datasets-<region>/examples/scala-blog/githubarchive/data/

Choose the best folder by replacing <region> with the region that you’re working in, for example, us-east-1. Crawl this folder, and put the results into a database named githubarchive in the AWS Glue Data Catalog, as described in the AWS Glue Developer Guide. This folder contains 12 hours of the timeline from January 22, 2017, and is organized hierarchically (that is, partitioned) by year, month, and day.

When finished, use the AWS Glue console to navigate to the table named data in the githubarchive database. Notice that this data has eight top-level columns, which are common to each event type, and three partition columns that correspond to year, month, and day.

Choose the payload column, and you will notice that it has a complex schema—one that reflects the union of the payloads of event types that appear in the crawled data. Also note that the schema that crawlers generate is a subset of the true schema because they sample only a subset of the data.

Set up the library, development endpoint, and notebook

Next, you need to download and set up the libraries that estimate the sentiment in a snippet of text. The Stanford CoreNLP libraries contain a number of human language processing tools, including sentiment prediction.

Download the Stanford CoreNLP libraries. Unzip the .zip file, and you’ll see a directory full of jar files. For this example, the following jars are required:

  • stanford-corenlp-3.8.0.jar
  • stanford-corenlp-3.8.0-models.jar
  • ejml-0.23.jar

Upload these files to an Amazon S3 path that is accessible to AWS Glue so that it can load these libraries when needed. For this example, they are in s3://glue-sample-other/corenlp/.

Development endpoints are static Spark-based environments that can serve as the backend for data exploration. You can attach notebooks to these endpoints to interactively send commands and explore and analyze your data. These endpoints have the same configuration as that of AWS Glue’s job execution system. So, commands and scripts that work there also work the same when registered and run as jobs in AWS Glue.

To set up an endpoint and a Zeppelin notebook to work with that endpoint, follow the instructions in the AWS Glue Developer Guide. When you are creating an endpoint, be sure to specify the locations of the previously mentioned jars in the Dependent jars path as a comma-separated list. Otherwise, the libraries will not be loaded.

After you set up the notebook server, go to the Zeppelin notebook by choosing Dev Endpoints in the left navigation pane on the AWS Glue console. Choose the endpoint that you created. Next, choose the Notebook Server URL, which takes you to the Zeppelin server. Log in using the notebook user name and password that you specified when creating the notebook. Finally, create a new note to try out this example.

Each notebook is a collection of paragraphs, and each paragraph contains a sequence of commands and the output for that command. Moreover, each notebook includes a number of interpreters. If you set up the Zeppelin server using the console, the (Python-based) pyspark and (Scala-based) spark interpreters are already connected to your new development endpoint, with pyspark as the default. Therefore, throughout this example, you need to prepend %spark at the top of your paragraphs. In this example, we omit these for brevity.

Working with the data

In this section, we use AWS Glue extensions to Spark to work with the dataset. We look at the actual schema of the data and filter out the interesting event types for our analysis.

Start with some boilerplate code to import libraries that you need:

%spark

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext

Then, create the Spark and AWS Glue contexts needed for working with the data:

@transient val spark: SparkContext = SparkContext.getOrCreate()
val glueContext: GlueContext = new GlueContext(spark)

You need the transient decorator on the SparkContext when working in Zeppelin; otherwise, you will run into a serialization error when executing commands.

Dynamic frames

This section shows how to create a dynamic frame that contains the GitHub records in the table that you crawled earlier. A dynamic frame is the basic data structure in AWS Glue scripts. It is like an Apache Spark data frame, except that it is designed and optimized for data cleaning and transformation workloads. A dynamic frame is well-suited for representing semi-structured datasets like the GitHub timeline.

A dynamic frame is a collection of dynamic records. In Spark lingo, it is an RDD (resilient distributed dataset) of DynamicRecords. A dynamic record is a self-describing record. Each record encodes its columns and types, so every record can have a schema that is unique from all others in the dynamic frame. This is convenient and often more efficient for datasets like the GitHub timeline, where payloads can vary drastically from one event type to another.

The following creates a dynamic frame, github_events, from your table:

val github_events = glueContext
                    .getCatalogSource(database = "githubarchive", tableName = "data")
                    .getDynamicFrame()

The getCatalogSource() method returns a DataSource, which represents a particular table in the Data Catalog. The getDynamicFrame() method returns a dynamic frame from the source.

Recall that the crawler created a schema from only a sample of the data. You can scan the entire dataset, count the rows, and print the complete schema as follows:

github_events.count
github_events.printSchema()

The result looks like the following:

The data has 414,826 records. As before, notice that there are eight top-level columns, and three partition columns. If you scroll down, you’ll also notice that the payload is the most complex column.

Run functions and filter records

This section describes how you can create your own functions and invoke them seamlessly to filter records. Unlike filtering with Python lambdas, Scala scripts do not need to convert records from one language representation to another, thereby reducing overhead and running much faster.

Let’s create a function that picks only the IssuesEvents from the GitHub timeline. These events are generated whenever someone posts an issue for a particular repository. Each GitHub event record has a field, “type”, that indicates the kind of event it is. The issueFilter() function returns true for records that are IssuesEvents.

def issueFilter(rec: DynamicRecord): Boolean = { 
    rec.getField("type").exists(_ == "IssuesEvent") 
}

Note that the getField() method returns an Option[Any] type, so you first need to check that it exists before checking the type.

You pass this function to the filter transformation, which applies the function on each record and returns a dynamic frame of those records that pass.

val issue_events =  github_events.filter(issueFilter)

Now, let’s look at the size and schema of issue_events.

issue_events.count
issue_events.printSchema()

It’s much smaller (14,063 records), and the payload schema is less complex, reflecting only the schema for issues. Keep a few essential columns for your analysis, and drop the rest using the ApplyMapping() transform:

val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                 ("actor.login", "string", "actor", "string"), 
                                                 ("repo.name", "string", "repo", "string"),
                                                 ("payload.action", "string", "action", "string"),
                                                 ("payload.issue.title", "string", "title", "string")))
issue_titles.show()

The ApplyMapping() transform is quite handy for renaming columns, casting types, and restructuring records. The preceding code snippet tells the transform to select the fields (or columns) that are enumerated in the left half of the tuples and map them to the fields and types in the right half.

Estimating sentiment using Stanford CoreNLP

To focus on the most pressing issues, you might want to isolate the records with the most negative sentiments. The Stanford CoreNLP libraries are Java-based and offer sentiment-prediction functions. Accessing these functions through Python is possible, but quite cumbersome. It requires creating Python surrogate classes and objects for those found on the Java side. Instead, with Scala support, you can use those classes and objects directly and invoke their methods. Let’s see how.

First, import the libraries needed for the analysis:

import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

The Stanford CoreNLP libraries have a main driver that orchestrates all of their analysis. The driver setup is heavyweight, setting up threads and data structures that are shared across analyses. Apache Spark runs on a cluster with a main driver process and a collection of backend executor processes that do most of the heavy sifting of the data.

The Stanford CoreNLP shared objects are not serializable, so they cannot be distributed easily across a cluster. Instead, you need to initialize them once for every backend executor process that might need them. Here is how to accomplish that:

val props = new Properties()
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
props.setProperty("parse.maxlen", "70")

object myNLP {
    lazy val coreNLP = new StanfordCoreNLP(props)
}

The properties tell the libraries which annotators to execute and how many words to process. The preceding code creates an object, myNLP, with a field coreNLP that is lazily evaluated. This field is initialized only when it is needed, and only once. So, when the backend executors start processing the records, each executor initializes the driver for the Stanford CoreNLP libraries only one time.

Next is a function that estimates the sentiment of a text string. It first calls Stanford CoreNLP to annotate the text. Then, it pulls out the sentences and takes the average sentiment across all the sentences. The sentiment is a double, from 0.0 as the most negative to 4.0 as the most positive.

def estimatedSentiment(text: String): Double = {
    if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
    val annotations = myNLP.coreNLP.process(text)
    val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
    sentences.foldLeft(0.0)( (csum, x) => { 
        csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
    }) / sentences.length
}

Now, let’s estimate the sentiment of the issue titles and add that computed field as part of the records. You can accomplish this with the map() method on dynamic frames:

val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
    val mbody = rec.getField("title")
    mbody match {
        case Some(mval: String) => { 
            rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
            rec }
        case _ => rec
    }
})

The map() method applies the user-provided function on every record. The function takes a DynamicRecord as an argument and returns a DynamicRecord. The code above computes the sentiment, adds it in a top-level field, sentiment, to the record, and returns the record.

Count the records with sentiment and show the schema. This takes a few minutes because Spark must initialize the library and run the sentiment analysis, which can be involved.

issue_sentiments.count
issue_sentiments.printSchema()

Notice that all records were processed (14,063), and the sentiment value was added to the schema.

Finally, let’s pick out the titles that have the lowest sentiment (less than 1.5). Count them and print out a sample to see what some of the titles look like.

val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))
pressing_issues.count
pressing_issues.show(10)

Next, write them all to a file so that you can handle them later. (You’ll need to replace the output path with your own.)

glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions("""{"path": "s3://<bucket>/out/path/"}"""), 
                              format = "json")
            .writeDynamicFrame(pressing_issues)

Take a look in the output path, and you can see the output files.

Putting it all together

Now, let’s create a job from the preceding interactive session. The following script combines all the commands from earlier. It processes the GitHub archive files and writes out the highly negative issues:

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext
import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

object GlueApp {

    object myNLP {
        val props = new Properties()
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
        props.setProperty("parse.maxlen", "70")

        lazy val coreNLP = new StanfordCoreNLP(props)
    }

    def estimatedSentiment(text: String): Double = {
        if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
        val annotations = myNLP.coreNLP.process(text)
        val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
        sentences.foldLeft(0.0)( (csum, x) => { 
            csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
        }) / sentences.length
    }

    def main(sysArgs: Array[String]) {
        val spark: SparkContext = SparkContext.getOrCreate()
        val glueContext: GlueContext = new GlueContext(spark)

        val dbname = "githubarchive"
        val tblname = "data"
        val outpath = "s3://<bucket>/out/path/"

        val github_events = glueContext
                            .getCatalogSource(database = dbname, tableName = tblname)
                            .getDynamicFrame()

        val issue_events =  github_events.filter((rec: DynamicRecord) => {
            rec.getField("type").exists(_ == "IssuesEvent")
        })

        val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                         ("actor.login", "string", "actor", "string"), 
                                                         ("repo.name", "string", "repo", "string"),
                                                         ("payload.action", "string", "action", "string"),
                                                         ("payload.issue.title", "string", "title", "string")))

        val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
            val mbody = rec.getField("title")
            mbody match {
                case Some(mval: String) => { 
                    rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
                    rec }
                case _ => rec
            }
        })

        val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))

        glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions(s"""{"path": "$outpath"}"""), 
                              format = "json")
                    .writeDynamicFrame(pressing_issues)
    }
}

Notice that the script is enclosed in a top-level object called GlueApp, which serves as the script’s entry point for the job. (You’ll need to replace the output path with your own.) Upload the script to an Amazon S3 location so that AWS Glue can load it when needed.

To create the job, open the AWS Glue console. Choose Jobs in the left navigation pane, and then choose Add job. Create a name for the job, and specify a role with permissions to access the data. Choose An existing script that you provide, and choose Scala as the language.

For the Scala class name, type GlueApp to indicate the script’s entry point. Specify the Amazon S3 location of the script.

Choose Script libraries and job parameters. In the Dependent jars path field, enter the Amazon S3 locations of the Stanford CoreNLP libraries from earlier as a comma-separated list (without spaces). Then choose Next.

No connections are needed for this job, so choose Next again. Review the job properties, and choose Finish. Finally, choose Run job to execute the job.

You can simply edit the script’s input table and output path to run this job on whatever GitHub timeline datasets that you might have.

Conclusion

In this post, we showed how to write AWS Glue ETL scripts in Scala via notebooks and how to run them as jobs. Scala has the advantage that it is the native language for the Spark runtime. With Scala, it is easier to call Scala or Java functions and third-party libraries for analyses. Moreover, data processing is faster in Scala because there’s no need to convert records from one language runtime to another.

You can find more example of Scala scripts in our GitHub examples repository: https://github.com/awslabs/aws-glue-samples. We encourage you to experiment with Scala scripts and let us know about any interesting ETL flows that you want to share.

Happy Glue-ing!

 


Additional Reading

If you found this post useful, be sure to check out Simplify Querying Nested JSON with the AWS Glue Relationalize Transform and Genomic Analysis with Hail on Amazon EMR and Amazon Athena.

 


About the Authors

Mehul Shah is a senior software manager for AWS Glue. His passion is leveraging the cloud to build smarter, more efficient, and easier to use data systems. He has three girls, and, therefore, he has no spare time.

 

 

 

Ben Sowell is a software development engineer at AWS Glue.

 

 

 

 
Vinay Vivili is a software development engineer for AWS Glue.

 

 

 

Fix Your Crawler

Post Syndicated from Bozho original https://techblog.bozho.net/fix-your-crawler/

Every now and then I open the admin panel of my blog hosting and ban a few IPs (after I’ve tried messaging their abuse email, if I find one). It is always IPs that are generating tons of requests (and traffic) – most likely running some home-made crawler. In some cases the IPs belong to an actual service that captures and provides content, in other cases it’s just a scraper for unknown reasons.

I don’t want to ban IPs, especially because that same IP may be reassigned to a legitimate user (or network) in the future. But they are increasing my hosting usage, which in turn leads to the hosting provider suggesting an upgrade in the plan. And this is not about me, I’m just an example – tons of requests to millions of sites are … useless.

My advice (and plea) is this – please fix your crawlers. Or scrapers. Or whatever you prefer to call that thing that programmatically goes on websites and gets their content.

How? First, reuse an existing crawler. No need to make something new (unless there’s a very specific use-case). A good intro and comparison can be seen here.

Second, make your crawler “polite” (the “politeness” property in the article above). Here’s a good overview on how to be polite, including respect for robots.txt. Existing implementations most likely have politeness options, but you may have to configure them.

Here I’d suggest another option – set a dynamic crawl rate per website that depends on how often the content is updated. My blog updates 3 times a month – no need to crawl it more than once or twice a day. TechCrunch updates many times a day; it’s probably a good idea to crawl it way more often. I don’t have a formula, but you can come up with one that ends up crawling different sites with periods between 2 minutes and 1 day.

Third, don’t “scrape” the content if a better protocol is supported. Many content websites have RSS – use that instead of the HTML of the page. If not, make use of sitemaps. If the WebSub protocol gains traction, you can avoid the crawling/scraping entirely and get notified on new content.

Finally, make sure your crawler/scraper is identifiable by the UserAgent. You can supply your service name or web address in it to make it easier for website owners to find you and complain in case you’ve misconfigured something.

I guess it makes sense to see if using a service like import.io, ScrapingHub, WrapAPI or GetData makes sense for your usecase, instead of reinventing the wheel.

No matter what your use case or approach is, please make sure you don’t put unnecessary pressure on others’ websites.

The post Fix Your Crawler appeared first on Bozho's tech blog.

Coalition Against Piracy Launches Landmark Case Against ‘Pirate’ Android Box Sellers

Post Syndicated from Andy original https://torrentfreak.com/coalition-against-piracy-launches-landmark-case-against-pirate-android-box-sellers-180112/

In 2017, anti-piracy enforcement went global when companies including Disney, HBO, Netflix, Amazon and NBCUniversal formed the Alliance for Creativity and Entertainment (ACE).

Soon after the Coalition Against Piracy (CAP) was announced. With a focus on Asia and backed by CASBAA, CAP counts many of the same companies among its members in addition to local TV providers such as StarHub.

From the outset, CAP has shown a keen interest in tackling unlicensed streaming, particularly that taking place via illicit set-top boxes stuffed with copyright-infringing apps and add-ons. One country under CAP’s spotlight is Singapore, where relevant law is said to be fuzzy at best, insufficient at worst. Now, however, a line in the sand might not be far away.

According to a court listing discovered by Singapore’s TodayOnline, today will see the Coalition Against Piracy’s general manager Neil Kevin Gane attempt to launch a pioneering private prosecution against set-top box distributor Synnex Trading and its client and wholesale goods retailer, An-Nahl.

Gane and CAP are said to be acting on behalf of four parties, one which is TV giant StarHub, a company with a huge interest in bringing media piracy under control in the region. It’s reported that they have also named Synnex Trading director Jia Xiaofen and An-Nahl director Abdul Nagib as defendants in their private criminal case after the parties failed to reach a settlement in an earlier process.

Contacted by TodayOnline, an employee of An-Nahl said the company no longer sells the boxes. However, Synnex is reportedly still selling them for S$219 each ($164) plus additional fees for maintenance and access to VOD. The company’s Facebook page is still active with the relevant offer presented prominently.

The importance of the case cannot be understated. While StarHub and other broadcasters have successfully prosecuted cases where people unlawfully decrypted broadcast signals, the provision of unlicensed streams isn’t specifically tackled by Singapore’s legislation. It’s now a major source of piracy in the region, as it is elsewhere around the globe.

Only time will tell how the process will play out but it’s clear that CAP and its members are prepared to invest significant sums into a prosecution for a favorable outcome. CAP believes that the supply of the boxes falls under Section 136 (3A) of the Copyright Act but only time will tell.

Last December, CAP separately called on the Singapore government to not only block ‘pirate’ streaming software but also unlicensed streams from entering the country.

“Within the Asia-Pacific region, Singapore is the worst in terms of availability of illicit streaming devices,” said CAP General Manager Neil Gane. “They have access to hundreds of illicit broadcasts of channels and video-on-demand content.”

CAP’s 21 members want the authorities to block the software inside devices that enables piracy but it’s far from clear how that can be achieved.

Update: The four companies taking the action are confirmed as Singtel, Starhub, Fox Network, and the English Premier League

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons