Tag Archives: limit

Optimize Delivery of Trending, Personalized News Using Amazon Kinesis and Related Services

Post Syndicated from Yukinori Koide original https://aws.amazon.com/blogs/big-data/optimize-delivery-of-trending-personalized-news-using-amazon-kinesis-and-related-services/

This is a guest post by Yukinori Koide, an the head of development for the Newspass department at Gunosy.

Gunosy is a news curation application that covers a wide range of topics, such as entertainment, sports, politics, and gourmet news. The application has been installed more than 20 million times.

Gunosy aims to provide people with the content they want without the stress of dealing with a large influx of information. We analyze user attributes, such as gender and age, and past activity logs like click-through rate (CTR). We combine this information with article attributes to provide trending, personalized news articles to users.

In this post, I show you how to process user activity logs in real time using Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, and related AWS services.

Why does Gunosy need real-time processing?

Users need fresh and personalized news. There are two constraints to consider when delivering appropriate articles:

  • Time: Articles have freshness—that is, they lose value over time. New articles need to reach users as soon as possible.
  • Frequency (volume): Only a limited number of articles can be shown. It’s unreasonable to display all articles in the application, and users can’t read all of them anyway.

To deliver fresh articles with a high probability that the user is interested in them, it’s necessary to include not only past user activity logs and some feature values of articles, but also the most recent (real-time) user activity logs.

We optimize the delivery of articles with these two steps.

  1. Personalization: Deliver articles based on each user’s attributes, past activity logs, and feature values of each article—to account for each user’s interests.
  2. Trends analysis/identification: Optimize delivering articles using recent (real-time) user activity logs—to incorporate the latest trends from all users.

Optimizing the delivery of articles is always a cold start. Initially, we deliver articles based on past logs. We then use real-time data to optimize as quickly as possible. In addition, news has a short freshness time. Specifically, day-old news is past news, and even the news that is three hours old is past news. Therefore, shortening the time between step 1 and step 2 is important.

To tackle this issue, we chose AWS for processing streaming data because of its fully managed services, cost-effectiveness, and so on.

Solution

The following diagrams depict the architecture for optimizing article delivery by processing real-time user activity logs

There are three processing flows:

  1. Process real-time user activity logs.
  2. Store and process all user-based and article-based logs.
  3. Execute ad hoc or heavy queries.

In this post, I focus on the first processing flow and explain how it works.

Process real-time user activity logs

The following are the steps for processing user activity logs in real time using Kinesis Data Streams and Kinesis Data Analytics.

  1. The Fluentd server sends the following user activity logs to Kinesis Data Streams:
{"article_id": 12345, "user_id": 12345, "action": "click"}
{"article_id": 12345, "user_id": 12345, "action": "impression"}
...
  1. Map rows of logs to columns in Kinesis Data Analytics.

  1. Set the reference data to Kinesis Data Analytics from Amazon S3.

a. Gunosy has user attributes such as gender, age, and segment. Prepare the following CSV file (user_id, gender, segment_id) and put it in Amazon S3:

101,female,1
102,male,2
103,female,3
...

b. Add the application reference data source to Kinesis Data Analytics using the AWS CLI:

$ aws kinesisanalytics add-application-reference-data-source \
  --application-name <my-application-name> \
  --current-application-version-id <version-id> \
  --reference-data-source '{
  "TableName": "REFERENCE_DATA_SOURCE",
  "S3ReferenceDataSource": {
    "BucketARN": "arn:aws:s3:::<my-bucket-name>",
    "FileKey": "mydata.csv",
    "ReferenceRoleARN": "arn:aws:iam::<account-id>:role/..."
  },
  "ReferenceSchema": {
    "RecordFormat": {
      "RecordFormatType": "CSV",
      "MappingParameters": {
        "CSVMappingParameters": {"RecordRowDelimiter": "\n", "RecordColumnDelimiter": ","}
      }
    },
    "RecordEncoding": "UTF-8",
    "RecordColumns": [
      {"Name": "USER_ID", "Mapping": "0", "SqlType": "INTEGER"},
      {"Name": "GENDER",  "Mapping": "1", "SqlType": "VARCHAR(32)"},
      {"Name": "SEGMENT_ID", "Mapping": "2", "SqlType": "INTEGER"}
    ]
  }
}'

This application reference data source can be referred on Kinesis Data Analytics.

  1. Run a query against the source data stream on Kinesis Data Analytics with the application reference data source.

a. Define the temporary stream named TMP_SQL_STREAM.

CREATE OR REPLACE STREAM "TMP_SQL_STREAM" (
  GENDER VARCHAR(32), SEGMENT_ID INTEGER, ARTICLE_ID INTEGER
);

b. Insert the joined source stream and application reference data source into the temporary stream.

CREATE OR REPLACE PUMP "TMP_PUMP" AS
INSERT INTO "TMP_SQL_STREAM"
SELECT STREAM
  R.GENDER, R.SEGMENT_ID, S.ARTICLE_ID, S.ACTION
FROM      "SOURCE_SQL_STREAM_001" S
LEFT JOIN "REFERENCE_DATA_SOURCE" R
  ON S.USER_ID = R.USER_ID;

c. Define the destination stream named DESTINATION_SQL_STREAM.

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
  TIME TIMESTAMP, GENDER VARCHAR(32), SEGMENT_ID INTEGER, ARTICLE_ID INTEGER, 
  IMPRESSION INTEGER, CLICK INTEGER
);

d. Insert the processed temporary stream, using a tumbling window, into the destination stream per minute.

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM
  ROW_TIME AS TIME,
  GENDER, SEGMENT_ID, ARTICLE_ID,
  SUM(CASE ACTION WHEN 'impression' THEN 1 ELSE 0 END) AS IMPRESSION,
  SUM(CASE ACTION WHEN 'click' THEN 1 ELSE 0 END) AS CLICK
FROM "TMP_SQL_STREAM"
GROUP BY
  GENDER, SEGMENT_ID, ARTICLE_ID,
  FLOOR("TMP_SQL_STREAM".ROWTIME TO MINUTE);

The results look like the following:

  1. Insert the results into Amazon Elasticsearch Service (Amazon ES).
  2. Batch servers get results from Amazon ES every minute. They then optimize delivering articles with other data sources using a proprietary optimization algorithm.

How to connect a stream to another stream in another AWS Region

When we built the solution, Kinesis Data Analytics was not available in the Asia Pacific (Tokyo) Region, so we used the US West (Oregon) Region. The following shows how we connected a data stream to another data stream in the other Region.

There is no need to continue containing all components in a single AWS Region, unless you have a situation where a response difference at the millisecond level is critical to the service.

Benefits

The solution provides benefits for both our company and for our users. Benefits for the company are cost savings—including development costs, operational costs, and infrastructure costs—and reducing delivery time. Users can now find articles of interest more quickly. The solution can process more than 500,000 records per minute, and it enables fast and personalized news curating for our users.

Conclusion

In this post, I showed you how we optimize trending user activities to personalize news using Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, and related AWS services in Gunosy.

AWS gives us a quick and economical solution and a good experience.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Implement Serverless Log Analytics Using Amazon Kinesis Analytics and Joining and Enriching Streaming Data on Amazon Kinesis.


About the Authors

Yukinori Koide is the head of development for the Newspass department at Gunosy. He is working on standardization of provisioning and deployment flow, promoting the utilization of serverless and containers for machine learning and AI services. His favorite AWS services are DynamoDB, Lambda, Kinesis, and ECS.

 

 

 

Akihiro Tsukada is a start-up solutions architect with AWS. He supports start-up companies in Japan technically at many levels, ranging from seed to later-stage.

 

 

 

 

Yuta Ishii is a solutions architect with AWS. He works with our customers to provide architectural guidance for building media & entertainment services, helping them improve the value of their services when using AWS.

 

 

 

 

 

Cloud Babble: The Jargon of Cloud Storage

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/what-is-cloud-computing/

Cloud Babble

One of the things we in the technology business are good at is coming up with names, phrases, euphemisms, and acronyms for the stuff that we create. The Cloud Storage market is no different, and we’d like to help by illuminating some of the cloud storage related terms that you might come across. We know this is just a start, so please feel free to add in your favorites in the comments section below and we’ll update this post accordingly.

Clouds

The cloud is really just a collection of purpose built servers. In a public cloud the servers are shared between multiple unrelated tenants. In a private cloud, the servers are dedicated to a single tenant or sometimes a group of related tenants. A public cloud is off-site, while a private cloud can be on-site or off-site – or on-prem or off-prem, if you prefer.

Both Sides Now: Hybrid Clouds

Speaking of on-prem and off-prem, there are Hybrid Clouds or Hybrid Data Clouds depending on what you need. Both are based on the idea that you extend your local resources (typically on-prem) to the cloud (typically off-prem) as needed. This extension is controlled by software that decides, based on rules you define, what needs to be done where.

A Hybrid Data Cloud is specific to data. For example, you can set up a rule that says all accounting files that have not been touched in the last year are automatically moved off-prem to cloud storage. The files are still available; they are just no longer stored on your local systems. The rules can be defined to fit an organization’s workflow and data retention policies.

A Hybrid Cloud is similar to a Hybrid Data Cloud except it also extends compute. For example, at the end of the quarter, you can spin up order processing application instances off-prem as needed to add to your on-prem capacity. Of course, determining where the transactional data used and created by these applications resides can be an interesting systems design challenge.

Clouds in my Coffee: Fog

Typically, public and private clouds live in large buildings called data centers. Full of servers, networking equipment, and clean air, data centers need lots of power, lots of networking bandwidth, and lots of space. This often limits where data centers are located. The further away you are from a data center, the longer it generally takes to get your data to and from there. This is known as latency. That’s where “Fog” comes in.

Fog is often referred to as clouds close to the ground. Fog, in our cloud world, is basically having a “little” data center near you. This can make data storage and even cloud based processing faster for everyone nearby. Data, and less so processing, can be transferred to/from the Fog to the Cloud when time is less a factor. Data could also be aggregated in the Fog and sent to the Cloud. For example, your electric meter could report its minute-by-minute status to the Fog for diagnostic purposes. Then once a day the aggregated data could be send to the power company’s Cloud for billing purposes.

Another term used in place of Fog is Edge, as in computing at the Edge. In either case, a given cloud (data center) usually has multiple Edges (little data centers) connected to it. The connection between the Edge and the Cloud is sometimes known as the middle-mile. The network in the middle-mile can be less robust than that required to support a stand-alone data center. For example, the middle-mile can use 1 Gbps lines, versus a data center, which would require multiple 10 Gbps lines.

Heavy Clouds No Rain: Data

We’re all aware that we are creating, processing, and storing data faster than ever before. All of this data is stored in either a structured or more likely an unstructured way. Databases and data warehouses are structured ways to store data, but a vast amount of data is unstructured – meaning the schema and data access requirements are not known until the data is queried. A large pool of unstructured data in a flat architecture can be referred to as a Data Lake.

A Data Lake is often created so we can perform some type of “big data” analysis. In an over simplified example, let’s extend the lake metaphor a bit and ask the question; “how many fish are in our lake?” To get an answer, we take a sufficient sample of our lake’s water (data), count the number of fish we find, and extrapolate based on the size of the lake to get an answer within a given confidence interval.

A Data Lake is usually found in the cloud, an excellent place to store large amounts of non-transactional data. Watch out as this can lead to our data having too much Data Gravity or being locked in the Hotel California. This could also create a Data Silo, thereby making a potential data Lift-and-Shift impossible. Let me explain:

  • Data Gravity — Generally, the more data you collect in one spot, the harder it is to move. When you store data in a public cloud, you have to pay egress and/or network charges to download the data to another public cloud or even to your own on-premise systems. Some public cloud vendors charge a lot more than others, meaning that depending on your public cloud provider, your data could financially have a lot more gravity than you expected.
  • Hotel California — This is like Data Gravity but to a lesser scale. Your data is in the Hotel California if, to paraphrase, “your data can check out any time you want, but it can never leave.” If the cost of downloading your data is limiting the things you want to do with that data, then your data is in the Hotel California. Data is generally most valuable when used, and with cloud storage that can include archived data. This assumes of course that the archived data is readily available, and affordable, to download. When considering a cloud storage project always figure in the cost of using your own data.
  • Data Silo — Over the years, businesses have suffered from organizational silos as information is not shared between different groups, but instead needs to travel up to the top of the silo before it can be transferred to another silo. If your data is “trapped” in a given cloud by the cost it takes to share such data, then you may have a Data Silo, and that’s exactly opposite of what the cloud should do.
  • Lift-and-Shift — This term is used to define the movement of data or applications from one data center to another or from on-prem to off-prem systems. The move generally occurs all at once and once everything is moved, systems are operational and data is available at the new location with few, if any, changes. If your data has too much gravity or is locked in a hotel, a data lift-and-shift may break the bank.

I Can See Clearly Now

Hopefully, the cloudy terms we’ve covered are well, less cloudy. As we mentioned in the beginning, our compilation is just a start, so please feel free to add in your favorite cloud term in the comments section below and we’ll update this post with your contributions. Keep your entries “clean,” and please no words or phrases that are really adverts for your company. Thanks.

The post Cloud Babble: The Jargon of Cloud Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Article from a Former Chinese PLA General on Cyber Sovereignty

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/article_from_a_.html

Interesting article by Major General Hao Yeli, Chinese People’s Liberation Army (ret.), a senior advisor at the China International Institute for Strategic Society, Vice President of China Institute for Innovation and Development Strategy, and the Chair of the Guanchao Cyber Forum.

Against the background of globalization and the internet era, the emerging cyber sovereignty concept calls for breaking through the limitations of physical space and avoiding misunderstandings based on perceptions of binary opposition. Reinforcing a cyberspace community with a common destiny, it reconciles the tension between exclusivity and transferability, leading to a comprehensive perspective. China insists on its cyber sovereignty, meanwhile, it transfers segments of its cyber sovereignty reasonably. China rightly attaches importance to its national security, meanwhile, it promotes international cooperation and open development.

China has never been opposed to multi-party governance when appropriate, but rejects the denial of government’s proper role and responsibilities with respect to major issues. The multilateral and multiparty models are complementary rather than exclusive. Governments and multi-stakeholders can play different leading roles at the different levels of cyberspace.

In the internet era, the law of the jungle should give way to solidarity and shared responsibilities. Restricted connections should give way to openness and sharing. Intolerance should be replaced by understanding. And unilateral values should yield to respect for differences while recognizing the importance of diversity.

Scale Your Web Application — One Step at a Time

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/architecture/scale-your-web-application-one-step-at-a-time/

I often encounter people experiencing frustration as they attempt to scale their e-commerce or WordPress site—particularly around the cost and complexity related to scaling. When I talk to customers about their scaling plans, they often mention phrases such as horizontal scaling and microservices, but usually people aren’t sure about how to dive in and effectively scale their sites.

Now let’s talk about different scaling options. For instance if your current workload is in a traditional data center, you can leverage the cloud for your on-premises solution. This way you can scale to achieve greater efficiency with less cost. It’s not necessary to set up a whole powerhouse to light a few bulbs. If your workload is already in the cloud, you can use one of the available out-of-the-box options.

Designing your API in microservices and adding horizontal scaling might seem like the best choice, unless your web application is already running in an on-premises environment and you’ll need to quickly scale it because of unexpected large spikes in web traffic.

So how to handle this situation? Take things one step at a time when scaling and you may find horizontal scaling isn’t the right choice, after all.

For example, assume you have a tech news website where you did an early-look review of an upcoming—and highly-anticipated—smartphone launch, which went viral. The review, a blog post on your website, includes both video and pictures. Comments are enabled for the post and readers can also rate it. For example, if your website is hosted on a traditional Linux with a LAMP stack, you may find yourself with immediate scaling problems.

Let’s get more details on the current scenario and dig out more:

  • Where are images and videos stored?
  • How many read/write requests are received per second? Per minute?
  • What is the level of security required?
  • Are these synchronous or asynchronous requests?

We’ll also want to consider the following if your website has a transactional load like e-commerce or banking:

How is the website handling sessions?

  • Do you have any compliance requests—like the Payment Card Industry Data Security Standard (PCI DSS compliance) —if your website is using its own payment gateway?
  • How are you recording customer behavior data and fulfilling your analytics needs?
  • What are your loading balancing considerations (scaling, caching, session maintenance, etc.)?

So, if we take this one step at a time:

Step 1: Ease server load. We need to quickly handle spikes in traffic, generated by activity on the blog post, so let’s reduce server load by moving image and video to some third -party content delivery network (CDN). AWS provides Amazon CloudFront as a CDN solution, which is highly scalable with built-in security to verify origin access identity and handle any DDoS attacks. CloudFront can direct traffic to your on-premises or cloud-hosted server with its 113 Points of Presence (102 Edge Locations and 11 Regional Edge Caches) in 56 cities across 24 countries, which provides efficient caching.
Step 2: Reduce read load by adding more read replicas. MySQL provides a nice mirror replication for databases. Oracle has its own Oracle plug for replication and AWS RDS provide up to five read replicas, which can span across the region and even the Amazon database Amazon Aurora can have 15 read replicas with Amazon Aurora autoscaling support. If a workload is highly variable, you should consider Amazon Aurora Serverless database  to achieve high efficiency and reduced cost. While most mirror technologies do asynchronous replication, AWS RDS can provide synchronous multi-AZ replication, which is good for disaster recovery but not for scalability. Asynchronous replication to mirror instance means replication data can sometimes be stale if network bandwidth is low, so you need to plan and design your application accordingly.

I recommend that you always use a read replica for any reporting needs and try to move non-critical GET services to read replica and reduce the load on the master database. In this case, loading comments associated with a blog can be fetched from a read replica—as it can handle some delay—in case there is any issue with asynchronous reflection.

Step 3: Reduce write requests. This can be achieved by introducing queue to process the asynchronous message. Amazon Simple Queue Service (Amazon SQS) is a highly-scalable queue, which can handle any kind of work-message load. You can process data, like rating and review; or calculate Deal Quality Score (DQS) using batch processing via an SQS queue. If your workload is in AWS, I recommend using a job-observer pattern by setting up Auto Scaling to automatically increase or decrease the number of batch servers, using the number of SQS messages, with Amazon CloudWatch, as the trigger.  For on-premises workloads, you can use SQS SDK to create an Amazon SQS queue that holds messages until they’re processed by your stack. Or you can use Amazon SNS  to fan out your message processing in parallel for different purposes like adding a watermark in an image, generating a thumbnail, etc.

Step 4: Introduce a more robust caching engine. You can use Amazon Elastic Cache for Memcached or Redis to reduce write requests. Memcached and Redis have different use cases so if you can afford to lose and recover your cache from your database, use Memcached. If you are looking for more robust data persistence and complex data structure, use Redis. In AWS, these are managed services, which means AWS takes care of the workload for you and you can also deploy them in your on-premises instances or use a hybrid approach.

Step 5: Scale your server. If there are still issues, it’s time to scale your server.  For the greatest cost-effectiveness and unlimited scalability, I suggest always using horizontal scaling. However, use cases like database vertical scaling may be a better choice until you are good with sharding; or use Amazon Aurora Serverless for variable workloads. It will be wise to use Auto Scaling to manage your workload effectively for horizontal scaling. Also, to achieve that, you need to persist the session. Amazon DynamoDB can handle session persistence across instances.

If your server is on premises, consider creating a multisite architecture, which will help you achieve quick scalability as required and provide a good disaster recovery solution.  You can pick and choose individual services like Amazon Route 53, AWS CloudFormation, Amazon SQS, Amazon SNS, Amazon RDS, etc. depending on your needs.

Your multisite architecture will look like the following diagram:

In this architecture, you can run your regular workload on premises, and use your AWS workload as required for scalability and disaster recovery. Using Route 53, you can direct a precise percentage of users to an AWS workload.

If you decide to move all of your workloads to AWS, the recommended multi-AZ architecture would look like the following:

In this architecture, you are using a multi-AZ distributed workload for high availability. You can have a multi-region setup and use Route53 to distribute your workload between AWS Regions. CloudFront helps you to scale and distribute static content via an S3 bucket and DynamoDB, maintaining your application state so that Auto Scaling can apply horizontal scaling without loss of session data. At the database layer, RDS with multi-AZ standby provides high availability and read replica helps achieve scalability.

This is a high-level strategy to help you think through the scalability of your workload by using AWS even if your workload in on premises and not in the cloud…yet.

I highly recommend creating a hybrid, multisite model by placing your on-premises environment replica in the public cloud like AWS Cloud, and using Amazon Route53 DNS Service and Elastic Load Balancing to route traffic between on-premises and cloud environments. AWS now supports load balancing between AWS and on-premises environments to help you scale your cloud environment quickly, whenever required, and reduce it further by applying Amazon auto-scaling and placing a threshold on your on-premises traffic using Route 53.

BitTorrent Client Transmission Suffers Remote Takeover Vulnerability

Post Syndicated from Ernesto original https://torrentfreak.com/bittorrent-client-transmission-suffers-remote-takeover-vulnerability-180116/

With millions of active users, Transmission is one of the most used BitTorrent clients around, particularly for Mac users.

The application has been around for more than a decade and has a great reputation. However, as with any other type of software, it is not immune to vulnerabilities.

One rather concerning flaw was made public by Google vulnerability researcher Tavis Ormandy a few days ago. The flaw allows outsiders to gain access to Transmission via DNS rebinding. This ultimately allows attackers to control the BitTorrent client and execute custom code.

Ormandy has published a patch, which was also shared with the private Transmission security list at the end of November. Transmission, however, has yet to address the issue in an update.

The relatively slow response was the reason why Ormandy decided to make it public before Project Zero’s usual 90-day window expired, Ars highlights. This allows other projects to address the vulnerability right away.

“I’m finding it frustrating that the transmission developers are not responding on their private security list,” Google’s vulnerability researcher writes. “I’ve never had an opensource project take this long to fix a vulnerability before, so I usually don’t even mention the 90 day limit if the vulnerability is in an open source project.”

A member of the Transmission developer team informed Ars that they will address this ASAP, noting that the issue only affects users who have remote control enabled with the default password. This means that people who disable it or change their password can easily ‘patch’ it until the official update comes out.

Interestingly, this isn’t the last BitTorrent related vulnerability Ormandy plans to expose. According to one of his tweets on the matter, this is just the “first of a few remote code execution flaws in various popular torrent clients.”

Judging from a message the researcher sent late November, uTorrent is on the list as well. Apparently, the company’s security email address wasn’t set up correctly at the time, so BitTorrent inventor Bram Cohen has been acting as a forwarding service.

uTorrent?

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Early Challenges: Managing Cash Flow

Post Syndicated from Gleb Budman original https://www.backblaze.com/blog/managing-cash-flow/

Cash flow projection charts

This post by Backblaze’s CEO and co-founder Gleb Budman is the eighth in a series about entrepreneurship. You can choose posts in the series from the list below:

  1. How Backblaze got Started: The Problem, The Solution, and the Stuff In-Between
  2. Building a Competitive Moat: Turning Challenges Into Advantages
  3. From Idea to Launch: Getting Your First Customers
  4. How to Get Your First 1,000 Customers
  5. Surviving Your First Year
  6. How to Compete with Giants
  7. The Decision on Transparency
  8. Early Challenges: Managing Cash Flow

Use the Join button above to receive notification of new posts in this series.

Running out of cash is one of the quickest ways for a startup to go out of business. When you are starting a company the question of where to get cash is usually the top priority, but managing cash flow is critical for every stage in the lifecycle of a company. As a primarily bootstrapped but capital-intensive business, managing cash flow at Backblaze was and still is a key element of our success and requires continued focus. Let’s look at what we learned over the years.

Raising Your Initial Funding

When starting a tech business in Silicon Valley, the default assumption is that you will immediately try to raise venture funding. There are certainly many advantages to raising funding — not the least of which is that you don’t need to be cash-flow positive since you have cash in the bank and the expectation is that you will have a “burn rate,” i.e. you’ll be spending more than you make.

Note: While you’re not expected to be cash-flow positive, that doesn’t mean you don’t have to worry about cash. Cash-flow management will determine your burn rate. Whether you can get to cash-flow breakeven or need to raise another round of funding is a direct byproduct of your cash flow management.

Also, raising funding takes time (most successful fundraising cycles take 3-6 months start-to-finish), and time at a startup is in short supply. Constantly trying to raise funding can take away from product development and pursuing growth opportunities. If you’re not successful in raising funding, you then have to either shut down or find an alternate method of funding the business.

Sources of Funding

Depending on the stage of the company, type of company, and other factors, you may have access to different sources of funding. Let’s list a number of them:

Customers

Sales — the best kind of funding. It is non-dilutive, doesn’t have to be paid back, and is a direct metric of the success of your company.

Pre-Sales — some customers may be willing to pay you for a product in beta, a test, or pre-pay for a product they’ll receive when finished. Pre-Sales income also is great because it shares the characteristics of cash from sales, but you get the cash early. It also can be a good sign that the product you’re building fills a market need. We started charging for Backblaze computer backup while it was still in private beta, which allowed us to not only collect cash from customers, but also test the billing experience and users’ real desire for the service.

Services — if you’re a service company and customers are paying you for that, great. You can effectively scale for the number of hours available in a day. As demand grows, you can add more employees to increase the total number of billable hours.

Note: If you’re a product company and customers are paying you to consult, that can provide much needed cash, and could provide feedback toward the right product. However, it can also distract from your core business, send you down a path where you’re building a product for a single customer, and addict you to a path that prevents you from building a scalable business.

Investors

Yourself — you likely are putting your time into the business, and deferring salary in the process. You may also put your own cash into the business either as an investment or a loan.

Angels — angels are ideal as early investors since they are used to investing in businesses with little to no traction. AngelList is a good place to find them, though finding people you’re connected with through someone that knows you well is best.

Crowdfunding — a component of the JOBS Act permitted entrepreneurs to raise money from nearly anyone since May 2016. The SEC imposes limits on both investors and the companies. This article goes into some depth on the options and sites available.

VCs — VCs are ideal for companies that need to raise at least a few million dollars and intend to build a business that will be worth over $1 billion.

Debt

Friends & Family — F&F are often the first people to give you money because they are investing in you. It’s great to have some early supporters, but it also can be risky to take money from people who aren’t used to the risks. The key advice here is to only take money from people who won’t mind losing it. If someone is talking about using their children’s college funds or borrowing from their 401k, say ‘no thank you’ — even if they’re sure they want to loan you money.

Bank Loans — a variety of loan types exist, but most either require the company to have been operational for a couple years, be able to borrow against money the company has or is making, or be able to get a personal guarantee from the founders whereby their own credit is on the line. Fundera provides a good overview of loan options and can help secure some, but most will not be an option for a brand new startup.

Grants

Government — in some areas there is the potential for government grants to facilitate research. The SBIR program facilitates some such grants.

At Backblaze, we used a number of these options:

• Investors/Yourself
We loaned a cumulative total of a couple hundred thousand dollars to the company and invested our time by going without a salary for a year and a half.
• Customers/Pre-Sales
We started selling the Backblaze service while it was still in beta.
• Customers/Sales
We launched v1.0 and kept selling.
• Investors/Angels
After a year and a half, we raised $370k from 11 angels. All of them were either people whom we knew personally or were a strong recommendation from a mutual friend.
• Debt/Loans
After a couple years we were able to get equipment leases whereby the Storage Pods and hard drives were used as collateral to secure the lease on them.
• Investors/VCs
Ater five years we raised $5m from TMT Investments to add to the balance sheet and invest in growth.

The variety and quantity of sources we used is by no means uncommon.

GAAP vs. Cash

Most companies start tracking financials based on cash, and as they scale they switch to GAAP (Generally Accepted Accounting Principles). Cash is easier to track — we got paid $XXXX and spent $YYY — and as often mentioned, is required for the business to stay alive. GAAP has more subtlety and complexity, but provides a clearer picture of how the business is really doing. Backblaze was on a ‘cash’ system for the first few years, then switched to GAAP. For this post, I’m going to focus on things that help cash flow, not GAAP profitability.

Stages of Cash Flow Management

All-spend

In a pure service business (e.g. solo proprietor law firm), you may have no expenses other than your time, so this stage doesn’t exist. However, in a product business there is a period of time where you are building the product and have nothing to sell. You have zero cash coming in, but have cash going out. Your cash-flow is completely negative and you need funds to cover that.

Sales-generating

Starting to see cash come in from customers is thrilling. I initially had our system set up to email me with every $5 payment we received. You’re making sales, but not covering expenses.

Ramen-profitable

But it takes a lot of $5 payments to pay for servers and salaries, so for a while expenses are likely to outstrip sales. Getting to ramen-profitable is a critical stage where sales cover the business expenses and are “paying enough for the founders to eat ramen.” This extends the runway for a business, but is not completely sustainable, since presumably the founders can’t (or won’t) live forever on a subsistence salary.

Business-profitable

This is the ultimate stage whereby the business is truly profitable, including paying everyone market-rate salaries. A business at this stage is self-sustaining. (Of course, market shifts and plenty of other challenges can kill the business, but cash-flow issues alone will not.)

Note, I’m using the word ‘profitable’ here to mean this is still on a cash-basis.

Backblaze was in the all-spend stage for just over a year, during which time we built the service and hadn’t yet made the service available to customers. Backblaze was in the sales-generating stage for nearly another year before the company was barely ramen-profitable where sales were covering the company expenses and paying the founders minimum wage. (I say ‘barely’ since minimum wage in the SF Bay Area is arguably never subsistence.) It took almost three more years before the company was business-profitable, paying everyone including the founders market-rate.

Cash Flow Forecasting

When raising funding it’s helpful to think of milestones reached. You don’t necessarily need enough cash on day one to last for the next 100 years of the company. Some good milestones to consider are how much cash you need to prove there is a market need, prove you can build a product to meet that need, or get to ramen-profitable.

Two things to consider:

1) Unit Economics (COGS)

If your product is 100% software, this may not be relevant. Once software is built it costs effectively nothing to deliver the product to one customer or one million customers. However, in most businesses there is some incremental cost to provide the product. If you’re selling a hardware device, perhaps you sell it for $100 but it costs you $50 to make it. This is called “COGS” (Cost of Goods Sold).

Many products rely on cloud services where the costs scale with growth. That model works great, but it’s still important to understand what the costs are for the cloud service you use per unit of product you sell.

Support is often done by the founders early-on in a business, but that is another real cost to factor in and estimate on a per-user basis. Taking all of the per unit costs combined, you may charge $10/month/user for your service, but if it costs you $7/month/user in cloud services, you’re only netting $3/month/user.

2) Operating Expenses (OpEx)

These are expenses that don’t scale with the number of product units you sell. Typically this includes research & development, sales & marketing, and general & administrative expenses. Presumably there is a certain level of these functions required to build the product, market it, sell it, and run the organization. You can choose to invest or cut back on these, but you’ll still make the same amount per product unit.

Incremental Net Profit Per Unit

If you’ve calculated your COGS and your unit economics are “upside down,” where the amount you charge is less than that it costs you to provide your service, it’s worth thinking hard about how that’s going to change over time. If it will not change, there is no scale that will make the business work. Presuming you do make money on each unit of product you sell — what is sometimes referred to as “Contribution Margin” — consider how many of those product units you need to sell to cover your operating expenses as described above.

Calculating Your Profit

The math on getting to ramen-profitable is simple:

(Number of Product Units Sold x Contribution Margin) - Operating Expenses = Profit

If your operating expenses include subsistence salaries for the founders and profit > $0, you’re ramen-profitable.

Improving Cash Flow

Having access to sources of cash, whether from selling to customers or other methods, is excellent. But needing less cash gives you more choices and allows you to either dilute less, owe less, or invest more.

There are two ways to improve cash flow:

1) Collect More Cash

The best way to collect more cash is to provide more value to your customers and as a result have them pay you more. Additional features/products/services can allow this. However, you can also collect more cash by changing how you charge for your product. If you have a subscription, changing from charging monthly to yearly dramatically improves your cash flow. If you have a product that customers use up, selling a year’s supply instead of selling them one-by-one can help.

2) Spend Less Cash

Reducing COGS is a fantastic way to spend less cash in a scalable way. If you can do this without harming the product or customer experience, you win. There are a myriad of ways to also reduce operating expenses, including taking sub-market salaries, using your home instead of renting office space, staying focused on your core product, etc.

Ultimately, collecting more and spending less cash dramatically simplifies the process of getting to ramen-profitable and later to business-profitable.

Be Careful (Why GAAP Matters)

A word of caution: while running out of cash will put you out of business immediately, overextending yourself will likely put you out of business not much later. GAAP shows how a business is really doing; cash doesn’t. If you only focus on cash, it is possible to commit yourself to both delivering products and repaying loans in the future in an unsustainable fashion. If you’re taking out loans, watch the total balance and monthly payments you’re committing to. If you’re asking customers for pre-payment, make sure you believe you can deliver on what they’ve paid for.

Summary

There are numerous challenges to building a business, and ensuring you have enough cash is amongst the most important. Having the cash to keep going lets you keep working on all of the other challenges. The frameworks above were critical for maintaining Backblaze’s cash flow and cash balance. Hopefully you can take some of the lessons we learned and apply them to your business. Let us know what works for you in the comments below.

The post Early Challenges: Managing Cash Flow appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Take a Digital Tour of an AWS Data Center to See How AWS Secures Data Centers Around The World

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/take-a-digital-tour-of-an-aws-data-center-to-see-how-aws-secures-data-centers-around-the-world/

Data center tour banner image

AWS has launched a digital tour of an AWS data center, providing you with a first-ever look at how AWS secures data centers around the world. The videos, pictures, and information in this tour show you how security is intrinsic to the design of our data centers, our global controls, and the AWS culture.

As you will learn when you take this digital tour, the AWS data center security strategy is assembled with scalable security controls and multiple layers of defense that help to protect your information. For example, AWS carefully manages potential flood and seismic activity risks. We use physical barriers, security guards, threat detection technology, and an in-depth screening process to limit access to data centers. We back up our systems, regularly test equipment and processes, and continuously train AWS employees to be ready for the unexpected.

To validate the security of our data centers, external auditors perform testing on more than 2,600 standards and requirements throughout the year. Such independent examination helps ensure that security standards are consistently being met or exceeded. As a result, the most highly regulated organizations in the world trust AWS to protect their data.

Take the tour today to learn more about how we secure our data centers.

– Chad

Eevee gained 2791 experience points

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/15/eevee-gained-2791-experience-points/

Eevee grew to level 31!

A year strongly defined by mixed success! Also, a lot of video games.

I ran three game jams, resulting in a total of 157 games existing that may not have otherwise, which is totally mindblowing?!

For GAMES MADE QUICK???, glip and I made NEON PHASE, a short little exploratory platformer. Honestly, I should give myself more credit for this and the rest of the LÖVE games I’ve based on the same codebase — I wove a physics engine (and everything else!) from scratch and it has held up remarkably well for a variety of different uses.

I successfully finished an HD version of Isaac’s Descent using my LÖVE engine, though it doesn’t have anything new over the original and I’ve only released it as a tech demo on Patreon.

For Strawberry Jam (NSFW!) we made fox flux (slightly NSFW!), which felt like a huge milestone: the first game where I made all the art! I mean, not counting Isaac’s Descent, which was for a very limited platform. It’s a pretty arbitrary milestone, yes, but it feels significant. I’ve been working on expanding the game into a longer and slightly less buggy experience, but the art is taking the longest by far. I must’ve spent weeks on player sprites alone.

We then set about working on Bolthaven, a sequel of sorts to NEON PHASE, and got decently far, and then abandond it. Oops.

We then started a cute little PICO-8 game, and forgot about it. Oops.

I was recruited to help with Chaos Composer, a more ambitious game glip started with someone else in Unity. I had to get used to Unity, and we squabbled a bit, but the game is finally about at the point where it’s “playable” and “maps” can be designed? It’s slightly on hold at the moment while we all finish up some other stuff, though.

We made a birthday game for two of our friends whose birthdays were very close together! Only they got to see it.

For Ludum Dare 38, we made Lunar Depot 38, a little “wave shooter” or whatever you call those? The AI is pretty rough, seeing as this was the first time I’d really made enemies and I had 72 hours to figure out how to do it, but I still think it’s pretty fun to play and I love the circular world.

I made Roguelike Simulator as an experiment with making something small and quick with a simple tool, and I had a lot of fun! I definitely want to do more stuff like this in the future.

And now we’re working on a game about Star Anise, my cat’s self-insert, which is looking to have more polish and depth than anything we’ve done so far! We’ve definitely come a long way in a year.

Somewhere along the line, I put out a call for a “potluck” project, where everyone would give me sprites of a given size without knowing what anyone else had contributed, and I would then make a game using only those sprites. Unfortunately, that stalled a few times: I tried using the Phaser JS library, but we didn’t get along; I tried LÖVE, but didn’t know where to go with the game; and then I decided to use this as an experiment with procedural generation, and didn’t get around to it. I still feel bad that everyone did work for me and I didn’t follow through, but I don’t know whether this will ever become a game.

veekun, alas, consumed months of my life. I finally got Sun and Moon loaded, but it took weeks of work since I was basically reinventing all the tooling we’d ever had from scratch, without even having most of that tooling available as a reference. It was worth it in the end, at least: Ultra Sun and Ultra Moon only took a few days to get loaded. But veekun itself is still missing some obvious Sun/Moon features, and the whole site needs an overhaul, and I just don’t know if I want to dedicate that much time to it when I have so much other stuff going on that’s much more interesting to me right now.

I finally turned my blog into more of a website, giving it a neat front page that lists a bunch of stuff I’ve done. I made a release category at last, though I’m still not quite in the habit of using it.

I wrote some blog posts, of course! I think the most interesting were JavaScript got better while I wasn’t looking and Object models. I was also asked to write a couple pieces for money for a column that then promptly shut down.

On a whim, I made a set of Eevee mugshots for Doom, which I think is a decent indication of my (pixel) art progress over the year?

I started idchoppers, a Doom parsing and manipulation library written in Rust, though it didn’t get very far and I’ve spent most of the time fighting with Rust because it won’t let me implement all my extremely bad ideas. It can do a couple things, at least, like flip maps very quickly and render maps to SVG.

I did toy around with music a little, but not a lot.

I wrote two short twines for Flora. They’re okay. I’m working on another; I think it’ll be better.

I didn’t do a lot of art overall, at least compared to the two previous years; most of my art effort over the year has gone into fox flux, which requires me to learn a whole lot of things. I did dip my toes into 3D modelling, most notably producing my current Twitter banner as well as this cool Star Anise animation. I wouldn’t mind doing more of that; maybe I’ll even try to make a low-poly pixel-textured 3D game sometime.

I restarted my book with a much better concept, though so far I’ve only written about half a chapter. Argh. I see that the vast majority of the work was done within the span of a single week, which is bad since that means I only worked on it for a week, but good since that means I can actually do a pretty good amount of work in only a week. I also did a lot of squabbling with tooling, which is hopefully mostly out of the way now.

My computer broke? That was an exciting week.


A lot of stuff, but the year as a whole still feels hit or miss. All the time I spent on veekun feels like a black void in the middle of the year, which seems like a good sign that I maybe don’t want to pour even more weeks into it in the near future.

Mostly, I want to do: more games, more art, more writing, more music.

I want to try out some tiny game making tools and make some tiny games with them — partly to get exposure to different things, partly to get more little ideas out into the world regularly, and partly to get more practice at letting myself have ideas. I have a couple tools in mind and I guess I’ll aim at a microgame every two months or so? I’d also like to finish the expanded fox flux by the end of the year, of course, though at the moment I can’t even gauge how long it might take.

I seriously lapsed on drawing last year, largely because fox flux pixel art took me so much time. So I want to draw more, and I want to get much faster at pixel art. It would probably help if I had a more concrete goal for drawing, so I might try to draw some short comics and write a little visual novel or something, which would also force me to aim for consistency.

I want to work on my book more, of course, but I also want to try my hand at a bit more fiction. I’ve had a blast writing dialogue for our games! I just shy away from longer-form writing for some reason — which seems ridiculous when a large part of my audience found me through my blog. I do think I’ve had some sort of breakthrough in the last month or two; I suddenly feel a good bit more confident about writing in general and figuring out what I want to say? One recent post I know I wrote in a single afternoon, which virtually never happens because I keep rewriting and rearranging stuff. Again, a visual novel would be a good excuse to practice writing fiction without getting too bogged down in details.

And, ah, music. I shy heavily away from music, since I have no idea what I’m doing, and also I seem to spend a lot of time fighting with tools. (Surprise.) I tried out SunVox for the first time just a few days ago and have been enjoying it quite a bit for making sound effects, so I might try it for music as well. And once again, visual novel background music is a pretty low-pressure thing to compose for. Hell, visual novels are small games, too, so that checks all the boxes. I guess I’ll go make a visual novel.

Here’s to twenty gayteen!

US Govt Brands Torrent, Streaming & Cyberlocker Sites As Notorious Markets

Post Syndicated from Andy original https://torrentfreak.com/us-govt-brands-torrent-streaming-cyberlocker-sites-as-notorious-markets-180115/

In its annual “Out-of-Cycle Review of Notorious Markets” the office of the United States Trade Representative (USTR) has listed a long list of websites said to be involved in online piracy.

The list is compiled with high-level input from various trade groups, including the MPAA and RIAA who both submitted their recommendations (1,2) during early October last year.

With the word “allegedly” used more than two dozen times in the report, the US government notes that its report does not constitute cast-iron proof of illegal activity. However, it urges the countries from where the so-called “notorious markets” operate to take action where they can, while putting owners and facilitators on notice that their activities are under the spotlight.

“A goal of the List is to motivate appropriate action by owners, operators, and service providers in the private sector of these and similar markets, as well as governments, to reduce piracy and counterfeiting,” the report reads.

“USTR highlights the following marketplaces because they exemplify global counterfeiting and piracy concerns and because the scale of infringing activity in these marketplaces can cause significant harm to U.S. intellectual property (IP) owners, consumers, legitimate online platforms, and the economy.”

The report begins with a page titled “Issue Focus: Illicit Streaming Devices”. Unsurprisingly, particularly given their place in dozens of headlines last year, the segment focus on the set-top box phenomenon. The piece doesn’t list any apps or software tools as such but highlights the general position, claiming a cost to the US entertainment industry of $4-5 billion a year.

Torrent Sites

In common with previous years, the USTR goes on to list several of the world’s top torrent sites but due to changes in circumstances, others have been delisted. ExtraTorrent, which shut down May 2017, is one such example.

As the world’s most famous torrent site, The Pirate Bay gets a prominent mention, with the USTR noting that the site is of “symbolic importance as one of the longest-running and most vocal torrent sites. The USTR underlines the site’s resilience by noting its hydra-like form while revealing an apparent secret concerning its hosting arrangements.

“The Pirate Bay has allegedly had more than a dozen domains hosted in various countries around the world, applies a reverse proxy service, and uses a hosting provider in Vietnam to evade further enforcement action,” the USTR notes.

Other torrent sites singled out for criticism include RARBG, which was nominated for the listing by the movie industry. According to the USTR, the site is hosted in Bosnia and Herzegovina and has changed hosting services to prevent shutdowns in recent years.

1337x.to and the meta-search engine Torrentz2 are also given a prime mention, with the USTR noting that they are “two of the most popular torrent sites that allegedly infringe U.S. content industry’s copyrights.” Russia’s RuTracker is also targeted for criticism, with the government noting that it’s now one of the most popular torrent sites in the world.

Streaming & Cyberlockers

While torrent sites are still important, the USTR reserves considerable space in its report for streaming portals and cyberlocker-type services.

4Shared.com, a file-hosting site that has been targeted by dozens of millions of copyright notices, is reportedly no longer able to use major US payment providers. Nevertheless, the British Virgin Islands company still collects significant sums from premium accounts, advertising, and offshore payment processors, USTR notes.

Cyberlocker Rapidgator gets another prominent mention in 2017, with the USTR noting that the Russian-hosted platform generates millions of dollars every year through premium memberships while employing rewards and affiliate schemes.

Due to its increasing popularity as a hosting and streaming operation, Openload.co (Romania) is now a big target for the USTR. “The site is used frequently in combination with add-ons in illicit streaming devices. In November 2017, users visited Openload.co a staggering 270 million times,” the USTR writes.

Owned by a Swiss company and hosted in the Netherlands, the popular site Uploaded is also criticized by the US alongside France’s 1Fichier.com, which allegedly hosts pirate games while being largely unresponsive to takedown notices. Dopefile.pk, a Pakistan-based storage outfit, is also highlighted.

On the video streaming front, it’s perhaps no surprise that the USTR focuses on sites like FMovies (Sweden), GoStream (Vietnam), Movie4K.tv (Russia) and PrimeWire. An organization collectively known as the MovShare group which encompasses Nowvideo.sx, WholeCloud.net, NowDownload.cd, MeWatchSeries.to and WatchSeries.ac, among others, is also listed.

Unauthorized music / research papers

While most of the above are either focused on video or feature it as part of their repertoire, other sites are listed for their attention to music. Convert2MP3.net is named as one of the most popular stream-ripping sites in the world and is highlighted due to the prevalence of YouTube-downloader sites and the 2017 demise of YouTube-MP3.

“Convert2MP3.net does not appear to have permission from YouTube or other sites and does not have permission from right holders for a wide variety of music represented by major U.S. labels,” the USTR notes.

Given the amount of attention the site has received in 2017 as ‘The Pirate Bay of Research’, Libgen.io and Sci-Hub.io (not to mention the endless proxy and mirror sites that facilitate access) are given a detailed mention in this year’s report.

“Together these sites make it possible to download — all without permission and without remunerating authors, publishers or researchers — millions of copyrighted books by commercial publishers and university presses; scientific, technical and medical journal articles; and publications of technological standards,” the USTR writes.

Service providers

But it’s not only sites that are being put under pressure. Following a growing list of nominations in previous years, Swiss service provider Private Layer is again singled out as a rogue player in the market for hosting 1337x.to and Torrentz2.eu, among others.

“While the exact configuration of websites changes from year to year, this is the fourth consecutive year that the List has stressed the significant international trade impact of Private Layer’s hosting services and the allegedly infringing sites it hosts,” the USTR notes.

“Other listed and nominated sites may also be hosted by Private Layer but are using
reverse proxy services to obfuscate the true host from the public and from law enforcement.”

The USTR notes Switzerland’s efforts to close a legal loophole that restricts enforcement and looks forward to a positive outcome when the draft amendment is considered by parliament.

Perhaps a little surprisingly given its recent anti-piracy efforts and overtures to the US, Russia’s leading social network VK.com again gets a place on the new list. The USTR recognizes VK’s efforts but insists that more needs to be done.

Social networking and e-commerce

“In 2016, VK reached licensing agreements with major record companies, took steps to limit third-party applications dedicated to downloading infringing content from the site, and experimented with content recognition technologies,” the USTR writes.

“Despite these positive signals, VK reportedly continues to be a hub of infringing activity and the U.S. motion picture industry reports that they find thousands of infringing files on the site each month.”

Finally, in addition to traditional pirate sites, the US also lists online marketplaces that allegedly fail to meet appropriate standards. Re-added to the list in 2016 after a brief hiatus in 2015, China’s Alibaba is listed again in 2017. The development provoked an angry response from the company.

Describing his company as a “scapegoat”, Alibaba Group President Michael Evans said that his platform had achieved a 25% drop in takedown requests and has even been removing infringing listings before they make it online.

“In light of all this, it’s clear that no matter how much action we take and progress we make, the USTR is not actually interested in seeing tangible results,” Evans said in a statement.

The full list of sites in the Notorious Markets Report 2017 (pdf) can be found below.

– 1fichier.com – (cyberlocker)
– 4shared.com – (cyberlocker)
– convert2mp3.net – (stream-ripper)
– Dhgate.com (e-commerce)
– Dopefile.pl – (cyberlocker)
– Firestorm-servers.com (pirate gaming service)
– Fmovies.is, Fmovies.se, Fmovies.to – (streaming)
– Gostream.is, Gomovies.to, 123movieshd.to (streaming)
– Indiamart.com (e-commerce)
– Kinogo.club, kinogo.co (streaming host, platform)
– Libgen.io, sci-hub.io, libgen.pw, sci-hub.cc, sci-hub.bz, libgen.info, lib.rus.ec, bookfi.org, bookzz.org, booker.org, booksc.org, book4you.org, bookos-z1.org, booksee.org, b-ok.org (research downloads)
– Movshare Group – Nowvideo.sx, wholecloud.net, auroravid.to, bitvid.sx, nowdownload.ch, cloudtime.to, mewatchseries.to, watchseries.ac (streaming)
– Movie4k.tv (streaming)
– MP3VA.com (music)
– Openload.co (cyberlocker / streaming)
– 1337x.to (torrent site)
– Primewire.ag (streaming)
– Torrentz2, Torrentz2.me, Torrentz2.is (torrent site)
– Rarbg.to (torrent site)
– Rebel (domain company)
– Repelis.tv (movie and TV linking)
– RuTracker.org (torrent site)
– Rapidgator.net (cyberlocker)
– Taobao.com (e-commerce)
– The Pirate Bay (torrent site)
– TVPlus, TVBrowser, Kuaikan (streaming apps and addons, China)
– Uploaded.net (cyberlocker)
– VK.com (social networking)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

“Where to Invade Next” Popular Among North Korean Pirates

Post Syndicated from Ernesto original https://torrentfreak.com/where-to-invade-next-popular-among-north-korean-pirates-180114/

Due to the public nature of BitTorrent transfers, it’s easy to see what a person behind a certain IP-address is downloading.

There are even entire sites dedicated to making this information public. This includes the ‘I Know What You Download‘ service we’ve covered in the past.

While the data are not complete or perfect, looking at the larger numbers provides some interesting insights. The site recently released its overview of the most downloaded titles in various categories per country, for example.

What stands out is that there’s a lot of overlap between countries that seem vastly different.

Game of Thrones is the most downloaded TV show in America, but also in Iran, Mongolia, Uruguay, and Zambia. Other popular TV-shows in 2017, such as The Flash, The Big Bang Theory, and The Walking Dead also appear in the top ten in all these countries.

On the movie side, a similar picture emerges. Titles such as Wonder Woman, The Fate of the Furious, and Logan appear in many of the top tens. In fact, browsing through the result for various countries there are surprisingly little outliers.

The movie Prityazhenie does well in Russia and in India, Dangal is among the most pirated titles, but most titles appear globally. Even in North Korea, where Internet access is extremely limited, Game of Thrones is listed as the most downloaded TV-show.

However, North Korea also shows some odd results, perhaps because there are only a few downloads per day on average.

Browsing through the most downloaded movies we see that there are a lot of kids’ movies in the top ten, with ‘Despicable Me’ as the top result, followed by ‘Moana’ and ‘Minions’. The Hobbit trilogy also made it into the top ten.

12 most pirated movies in North Korea (2017)

The most eye-catching result, however, is the Michael Moore documentary ‘Where to Invade Next.’ While the title may suggest something more malicious, in this travelogue Moore ‘invades’ countries around the world to see in what areas the US can improve itself.

It’s unclear why North Koreans are so interested in this progressive film. Perhaps they are trying to pick up a few tips as well. This could also explain why good old MacGyver is listed among the most downloaded TV-series.

The annual overview of ‘I Know What You Download’ is available here, for those who are interested in more country statistics.

Finally, we have to note that North Korean IP-ranges have been vulnerable to hijacks in the past so you’re never 100% sure who might be using them. It might be the Russians…

Image credit: KNCA

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Wanted: Sales Engineer

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-sales-engineer/

At inception, Backblaze was a consumer company. Thousands upon thousands of individuals came to our website and gave us $5/mo to keep their data safe. But, we didn’t sell business solutions. It took us years before we had a sales team. In the last couple of years, we’ve released products that businesses of all sizes love: Backblaze B2 Cloud Storage and Backblaze for Business Computer Backup. Those businesses want to integrate Backblaze deeply into their infrastructure, so it’s time to hire our first Sales Engineer!

Company Description:
Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable low cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/gb/mo. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

We’ve managed to nurture a team oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

Some Backblaze Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee
  • Fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280.

Backblaze B2 cloud storage is a building block for almost any computing service that requires storage. Customers need our help integrating B2 into iOS apps to Docker containers. Some customers integrate directly to the API using the programming language of their choice, others want to solve a specific problem using ready made software, already integrated with B2.

At the same time, our computer backup product is deepening it’s integration into enterprise IT systems. We are commonly asked for how to set Windows policies, integrate with Active Directory, and install the client via remote management tools.

We are looking for a sales engineer who can help our customers navigate the integration of Backblaze into their technical environments.

Are you 1/2” deep into many different technologies, and unafraid to dive deeper?

Can you confidently talk with customers about their technology, even if you have to look up all the acronyms right after the call?

Are you excited to setup complicated software in a lab and write knowledge base articles about your work?

Then Backblaze is the place for you!

Enough about Backblaze already, what’s in it for me?
In this role, you will be given the opportunity to learn about the technologies that drive innovation today; diverse technologies that customers are using day in and out. And more importantly, you’ll learn how to learn new technologies.

Just as an example, in the past 12 months, we’ve had the opportunity to learn and become experts in these diverse technologies:

  • How to setup VM servers for lab environments, both on-prem and using cloud services.
  • Create an automatically “resetting” demo environment for the sales team.
  • Setup Microsoft Domain Controllers with Active Directory and AD Federation Services.
  • Learn the basics of OAUTH and web single sign on (SSO).
  • Archive video workflows from camera to media asset management systems.
  • How upload/download files from Javascript by enabling CORS.
  • How to install and monitor online backup installations using RMM tools, like JAMF.
  • Tape (LTO) systems. (Yes – people still use tape for storage!)

How can I know if I’ll succeed in this role?

You have:

  • Confidence. Be able to ask customers questions about their environments and convey to them your technical acumen.
  • Curiosity. Always want to learn about customers’ situations, how they got there and what problems they are trying to solve.
  • Organization. You’ll work with customers, integration partners, and Backblaze team members on projects of various lengths. You can context switch and either have a great memory or keep copious notes. Your checklists have their own checklists.

You are versed in:

  • The fundamentals of Windows, Linux and Mac OS X operating systems. You shouldn’t be afraid to use a command line.
  • Building, installing, integrating and configuring applications on any operating system.
  • Debugging failures – reading logs, monitoring usage, effective google searching to fix problems excites you.
  • The basics of TCP/IP networking and the HTTP protocol.
  • Novice development skills in any programming/scripting language. Have basic understanding of data structures and program flow.
  • Your background contains:

  • Bachelor’s degree in computer science or the equivalent.
  • 2+ years of experience as a pre or post-sales engineer.
  • The right extra credit:
    There are literally hundreds of previous experiences you can have had that would make you perfect for this job. Some experiences that we know would be helpful for us are below, but make sure you tell us your stories!

  • Experience using or programming against Amazon S3.
  • Experience with large on-prem storage – NAS, SAN, Object. And backing up data on such storage with tools like Veeam, Veritas and others.
  • Experience with photo or video media. Media archiving is a key market for Backblaze B2.
  • Program arduinos to automatically feed your dog.
  • Experience programming against web or REST APIs. (Point us towards your projects, if they are open source and available to link to.)
  • Experience with sales tools like Salesforce.
  • 3D print door stops.
  • Experience with Windows Servers, Active Directory, Group policies and the like.
  • What’s it like working with the Sales team?
    The Backblaze sales team collaborates. We help each other out by sharing ideas, templates, and our customer’s experiences. When we talk about our accomplishments, there is no “I did this,” only “we”. We are truly a team.

    We are honest to each other and our customers and communicate openly. We aim to have fun by embracing crazy ideas and creative solutions. We try to think not outside the box, but with no boxes at all. Customers are the driving force behind the success of the company and we care deeply about their success.

    If this all sounds like you:

    1. Send an email to [email protected] with the position in the subject line.
    2. Tell us a bit about your Sales Engineering experience.
    3. Include your resume.

    The post Wanted: Sales Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    12 B2 Power Tips for Experts and Developers

    Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/advanced-cloud-storage-tips/

    B2 Tips for Pros
    If you’ve been using B2 Cloud Storage for a while, you probably think you know all that you can do with it. But do you?

    We’ve put together a list of blazing power tips for experts and developers that will take you to the next level. Take a look below.

    If you’re new to B2, we have a list of power tips for you, too.
    Visit 12 Power Tips for New B2 Users.
    Backblaze logo

    1    Manage File Versions

    Use Lifecycle Rules on a Bucket to set how many days to keep files that are no longer the current version. This is a great way to manage the amount of space your B2 account is using.

    Backblaze logo

    2    Easily Stay on Top of Your B2 Account Limits

    Set usage caps and get text/email alerts for your B2 account when you approach limits that you define.

    Backblaze logo

    3    Bring on Your Big Files

    You can upload files as large as 10TB to B2.

    Backblaze logo

    4    You Can Use FedEx to Get Your Data into B2

    If you have over 20TB of data, you can use Backblaze’s Fireball hard disk array to load large volumes of data directly into your B2 account. We ship a Fireball to you and you ship it back.

    Backblaze logo

    5    You Have Command-Line Control of All B2 Functions

    You have complete control over B2 using our command line tool that is available for Macintosh, Windows, and Linux.

    Backblaze logo

    6    You Can Use Your Own Domain Name To Front a Public B2 Bucket

    You can create a vanity URL for your B2 account.

    Backblaze logo

    7    See What’s Happening in Your Account with Graphical Reports

    You can view graphical reports summarizing your B2 usage — transactions, downloads, averages, data stored — in your B2 account dashboard.

    Backblaze logo

    8    Create a B2 SDK

    You can build your own B2 SDK for JVM-based or JVM-compatible languages using our B2 Java SDK on Github.

    Backblaze logo

    9    B2’s API is Easy to Use

    B2’s API is similar to, but simpler than Amazon’s S3 API, making it super easy for developers to integrate with B2 Cloud Storage.

    Backblaze logo

    10    View Code Examples To Get Your B2 Project Started

    The B2 API is well documented and has code examples for cURL, Java, Python, Swift, Ruby, C#, and PHP. For example, here’s how to create a B2 Bucket.

    Backblaze logo

    11    Developers can set the B2 part size as low as 5 MB

    When working with large files, the minimum file part size can be set as low as 5MB or as high as 5GB. This gives developers the ability to maximize the throughput of B2 data uploads and downloads. See Large Files and Downloading for more developer tips.

    Backblaze logo

    12    Your App or Device Can Work with B2, as well

    Your B2 integration can be listed on Backblaze’s website. Visit Submit an Integration to get started.

    Want to Learn More About B2?

    You can find more information on B2 on our website and in our help pages.

    The post 12 B2 Power Tips for Experts and Developers appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    TVAddons and ZemTV Ask Court to Dismiss U.S. Piracy Lawsuit

    Post Syndicated from Ernesto original https://torrentfreak.com/tvaddons-and-zemtv-ask-court-to-dismiss-u-s-piracy-lawsuit-180108/

    Last year, American satellite and broadcast provider Dish Network targeted two well-known players in the third-party Kodi add-on ecosystem.

    In a complaint filed in a federal court in Texas, add-on ZemTV and the TVAddons library were accused of copyright infringement. As a result, both are facing up to $150,000 in damages for each offense.

    While the case was filed in Texas, neither of the defendants live there, or even in the United States. The owner and operator of TVAddons is Adam Lackman, who resides in Montreal, Canada. ZemTV’s developer Shahjahan Durrani is even further away in London, UK.

    Their limited connection to Texas is reason for the case to be dismissed, according to the legal team of the two defendants. They are represented by attorneys Erin Russel and Jason Sweet, who asked the Court to drop the case late last week.

    According to their motion, the Texas District Court does not have jurisdiction over the two defendants.

    “Lackman and Durrani have never been residents or citizens of Texas; they have never owned property in Texas; they have never voted in Texas; they have never personally visited Texas; they have never directed any business activity of any kind to anyone in Texas […] and they have never earned income in Texas,” the motion reads.

    Technically, defendants can be sued in a district they have never been, as long as they “directed actions” at the state or its citizens.

    According to Dish, this is the case here since both defendants made their services available to local residents, among other things. However, the defense team argues that’s not enough to establish jurisdiction in this case.

    “Plaintiff’s conclusory allegation that Lackman and Durrani marketed, made available, and distributed ZemTV service and the ZemTV add-on to consumers in the State of Texas and the Southern District of Texas is misleading at best,” the attorneys write.

    If the case proceeds this would go against the US constitution, violating the defendants’ due process rights. Whether the infringement claims hold ground or not, Dish has no right to sue, according to the defense.

    “Defendants are citizens of Canada and Great Britain and have not had sufficient contacts in the State of Texas for this Court to exercise personal jurisdiction over them. To do so would violate the Due Process Clause of the United States Constitution.”

    The Court must now decide whether the case can proceed or not. TorrentFreak reached out to TVAddons but the service wishes to refrain from commenting on the proceeding at the moment.

    Previously, TVAddons made it clear that it sees the Dish lawsuit as an attempt to destroy the Kodi addon community. One of the methods of attack it mentioned, was to sue people in foreign jurisdictions.

    “Most people don’t have money lying around to hire lawyers in places they’ve never even visited. This means that if a company sues you in a foreign country and you can’t afford a lawyer, you’re screwed even if you did nothing wrong,” TVAddons wrote at the time.

    A copy of the motion to dismiss is available here (pdf).

    Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

    Top 10 Most Popular Torrent Sites of 2018

    Post Syndicated from Ernesto original https://torrentfreak.com/top-10-most-popular-torrent-sites-of-2018-180107/

    Torrent sites have come and gone over past year. Now, at the start of 2018, we take a look to see what the most-used sites are in the current landscape.

    The Pirate Bay remains the undisputed number one. The site has weathered a few storms over the years, but it looks like it will be able to celebrate its 15th anniversary, which is coming up in a few months.

    The list also includes various newcomers including Idope and Zooqle. While many people are happy to see new torrent sites emerge, this often means that others have called it quits.

    Last year’s runner-up Extratorrent, for example, has shut down and left a gaping hole behind. And it wasn’t the only site that went away. TorrentProject also disappeared without a trace and the same was true for isohunt.to.

    The unofficial Torrentz reincarnation Torrentz2.eu, the highest newcomer last year, is somewhat of an unusual entry. A few weeks ago all links to externally hosted torrents were removed, as was the list of indexed pages.

    We decided to include the site nonetheless, given its history and because it’s still possible to find hashes through the site. As Torrentz2’s future is uncertain, we added an extra site (10.1) as compensation.

    Finally, RuTracker also deserves a mention. The torrent site generates enough traffic to warrant a listing, but we traditionally limit the list to sites that are targeted primarily at an English or international audience.

    Below is the full list of the ten most-visited torrent sites at the start of the new year. The list is based on various traffic reports and we display the Alexa rank for each. In addition, we include last year’s ranking.

    Most Popular Torrent Sites

    1. The Pirate Bay

    The Pirate Bay is the “king of torrents” once again and also the oldest site in this list. The past year has been relatively quiet for the notorious torrent site, which is currently operating from its original .org domain name.

    Alexa Rank: 104/ Last year #1

    2. RARBG

    RARBG, which started out as a Bulgarian tracker, has captured the hearts and minds of many video pirates. The site was founded in 2008 and specializes in high quality video releases.

    Alexa Rank: 298 / Last year #3

    3. 1337x

    1337x continues where it left off last year. The site gained a lot of traffic and, unlike some other sites in the list, has a dedicated group of uploaders that provide fresh content.

    Alexa Rank: 321 / Last year #6

    4. Torrentz2

    Torrentz2 launched as a stand-in for the original Torrentz.eu site, which voluntarily closed its doors in 2016. At the time of writing, the site only lists torrent hashes and no longer any links to external torrent sites. While browser add-ons and plugins still make the site functional, its future is uncertain.

    Alexa Rank: 349 / Last year #5

    5. YTS.ag

    YTS.ag is the unofficial successors of the defunct YTS or YIFY group. Not all other torrent sites were happy that the site hijacked the popuar brand and several are actively banning its releases.

    Alexa Rank: 563 / Last year #4

    6. EZTV.ag

    The original TV-torrent distribution group EZTV shut down after a hostile takeover in 2015, with new owners claiming ownership of the brand. The new group currently operates from EZTV.ag and releases its own torrents. These releases are banned on some other torrent sites due to this controversial history.

    Alexa Rank: 981 / Last year #7

    7. LimeTorrents

    Limetorrents has been an established torrent site for more than half a decade. The site’s operator also runs the torrent cache iTorrents, which is used by several other torrent search engines.

    Alexa Rank: 2,433 / Last year #10

    8. NYAA.si

    NYAA.si is a popular resurrection of the anime torrent site NYAA, which shut down last year. Previously we left anime-oriented sites out of the list, but since we also include dedicated TV and movie sites, we decided that a mention is more than warranted.

    Alexa Rank: 1,575 / Last year #NA

    9. Torrents.me

    Torrents.me is one of the torrent sites that enjoyed a meteoric rise in traffic this year. It’s a meta-search engine that links to torrent files and magnet links from other torrent sites.

    Alexa Rank: 2,045 / Last year #NA

    10. Zooqle

    Zooqle, which boasts nearly three million verified torrents, has stayed under the radar for years but has still kept growing. The site made it into the top 10 for the first time this year.

    Alexa Rank: 2,347 / Last year #NA

    10.1 iDope

    The special 10.1 mention goes to iDope. Launched in 2016, the site is a relative newcomer to the torrent scene. The torrent indexer has steadily increased its audience over the past year. With similar traffic numbers to Zooqle, a listing is therefore warranted.

    Alexa Rank: 2,358 / Last year #NA

    Disclaimer: Yes, we know that Alexa isn’t perfect, but it helps to compare sites that operate in a similar niche. We also used other traffic metrics to compile the top ten. Please keep in mind that many sites have mirrors or alternative domains, which are not taken into account here.

    Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

    Physics cheats

    Post Syndicated from Eevee original https://eev.ee/blog/2018/01/06/physics-cheats/

    Anonymous asks:

    something about how we tweak physics to “work” better in games?

    Ho ho! Work. Get it? Like in physics…?

    Hitboxes

    Hitbox” is perhaps not the most accurate term, since the shape used for colliding with the environment and the shape used for detecting damage might be totally different. They’re usually the same in simple platformers, though, and that’s what most of my games have been.

    The hitbox is the biggest physics fudge by far, and it exists because of a single massive approximation that (most) games make: you’re controlling a single entity in the abstract, not a physical body in great detail.

    That is: when you walk with your real-world meat shell, you perform a complex dance of putting one foot in front of the other, a motion you spent years perfecting. When you walk in a video game, you press a single “walk” button. Your avatar may play an animation that moves its legs back and forth, but since you’re not actually controlling the legs independently (and since simulating them is way harder), the game just treats you like a simple shape. Fairly often, this is a box, or something very box-like.

    An Eevee sprite standing on faux ground; the size of the underlying image and the hitbox are outlined

    Since the player has no direct control over the exact placement of their limbs, it would be slightly frustrating to have them collide with the world. This is especially true in cases like the above, where the tail and left ear protrude significantly out from the main body. If that Eevee wanted to stand against a real-world wall, she would simply tilt her ear or tail out of the way, so there’s no reason for the ear to block her from standing against a game wall. To compensate for this, the ear and tail are left out of the collision box entirely and will simply jut into a wall if necessary — a goofy affordance that’s so common it doesn’t even register as unusual. As a bonus (assuming this same box is used for combat), she won’t take damage from projectiles that merely graze past an ear.

    (One extra consideration for sprite games in particular: the hitbox ought to be horizontally symmetric around the sprite’s pivot — i.e. the point where the entity is truly considered to be standing — so that the hitbox doesn’t abruptly move when the entity turns around!)

    Corners

    Treating the player (and indeed most objects) as a box has one annoying side effect: boxes have corners. Corners can catch on other corners, even by a single pixel. Real-world bodies tend to be a bit rounder and squishier and this can tolerate grazing a corner; even real-world boxes will simply rotate a bit.

    Ah, but in our faux physics world, we generally don’t want conscious actors (such as the player) to rotate, even with a realistic physics simulator! Real-world bodies are made of parts that will generally try to keep you upright, after all; you don’t tilt back and forth much.

    One way to handle corners is to simply remove them from conscious actors. A hitbox doesn’t have to be a literal box, after all. A popular alternative — especially in Unity where it’s a standard asset — is the pill-shaped capsule, which has semicircles/hemispheres on the top and bottom and a cylindrical body in 3D. No corners, no problem.

    Of course, that introduces a new problem: now the player can’t balance precariously on edges without their rounded bottom sliding them off. Alas.

    If you’re stuck with corners, then, you may want to use a corner bump, a term I just made up. If the player would collide with a corner, but the collision is only by a few pixels, just nudge them to the side a bit and carry on.

    An Eevee sprite trying to move sideways into a shallow ledge; the game bumps her upwards slightly, so she steps onto it instead

    When the corner is horizontal, this creates stairs! This is, more or less kinda, how steps work in Doom: when the player tries to cross from one sector into another, if the height difference is 24 units or less, the game simply bumps them upwards to the height of the new floor and lets them continue on.

    Implementing this in a game without Doom’s notion of sectors is a little trickier. In fact, I still haven’t done it. Collision detection based on rejection gets it for free, kinda, but it’s not very deterministic and it breaks other things. But that’s a whole other post.

    Gravity

    Gravity is pretty easy. Everything accelerates downwards all the time. What’s interesting are the exceptions.

    Jumping

    Jumping is a giant hack.

    Think about how actual jumping works: you tense your legs, which generally involves bending your knees first, and then spring upwards. In a platformer, you can just leap whenever you feel like it, which is nonsense. Also you go like twenty feet into the air?

    Worse, most platformers allow variable-height jumping, where your jump is lower if you let go of the jump button while you’re in the air. Normally, one would expect to have to decide how much force to put into the jump beforehand.

    But of course this is about convenience of controls: when jumping is your primary action, you want to be able to do it immediately, without any windup for how high you want to jump.

    (And then there’s double jumping? Come on.)

    Air control is a similar phenomenon: usually you’d jump in a particular direction by controlling how you push off the ground with your feet, but in a video game, you don’t have feet! You only have the box. The compromise is to let you control your horizontal movement to a limit degree in midair, even though that doesn’t make any sense. (It’s way more fun, though, and overall gives you more movement options, which are good to have in an interactive medium.)

    Air control also exposes an obvious place that game physics collide with the realistic model of serious physics engines. I’ve mentioned this before, but: if you use Real Physics™ and air control yourself into a wall, you might find that you’ll simply stick to the wall until you let go of the movement buttons. Why? Remember, player movement acts as though an external force were pushing you around (and from the perspective of a Real™ physics engine, this is exactly how you’d implement it) — so air-controlling into a wall is equivalent to pushing a book against a wall with your hand, and the friction with the wall holds you in place. Oops.

    Ground sticking

    Another place game physics conflict with physics engines is with running to the top of a slope. On a real hill, of course, you land on top of the slope and are probably glad of it; slopes are hard to climb!

    An Eevee moves to the top of a slope, and rather than step onto the flat top, she goes flying off into the air

    In a video game, you go flying. Because you’re a box. With momentum. So you hit the peak and keep going in the same direction. Which is diagonally upwards.

    Projectiles

    To make them more predictable, projectiles generally aren’t subject to gravity, at least as far as I’ve seen. The real world does not have such an exemption. The real world imposes gravity even on sniper rifles, which in a video game are often implemented as an instant trace unaffected by anything in the world because the bullet never actually exists in the world.

    Resistance

    Ah. Welcome to hell.

    Water

    Water is an interesting case, and offhand I don’t know the gritty details of how games implement it. In the real world, water applies a resistant drag force to movement — and that force is proportional to the square of velocity, which I’d completely forgotten until right now. I am almost positive that no game handles that correctly. But then, in real-world water, you can push against the water itself for movement, and games don’t simulate that either. What’s the rough equivalent?

    The Sonic Physics Guide suggests that Sonic handles it by basically halving everything: acceleration, max speed, friction, etc. When Sonic enters water, his speed is cut; when Sonic exits water, his speed is increased.

    That last bit feels validating — I could swear Metroid Prime did the same thing, and built my own solution around it, but couldn’t remember for sure. It makes no sense, of course, for a jump to become faster just because you happened to break the surface of the water, but it feels fantastic.

    The thing I did was similar, except that I didn’t want to add a multiplier in a dozen places when you happen to be underwater (and remember which ones need it to be squared, etc.). So instead, I calculate everything completely as normal, so velocity is exactly the same as it would be on dry land — but the distance you would move gets halved. The effect seems to be pretty similar to most platformers with water, at least as far as I can tell. It hasn’t shown up in a published game and I only added this fairly recently, so I might be overlooking some reason this is a bad idea.

    (One reason that comes to mind is that velocity is now a little white lie while underwater, so anything relying on velocity for interesting effects might be thrown off. Or maybe that’s correct, because velocity thresholds should be halved underwater too? Hm!)

    Notably, air is also a fluid, so it should behave the same way (just with different constants). I definitely don’t think any games apply air drag that’s proportional to the square of velocity.

    Friction

    Friction is, in my experience, a little handwaved. Probably because real-world friction is so darn complicated.

    Consider that in the real world, we want very high friction on the surfaces we walk on — shoes and tires are explicitly designed to increase it, even. We move by bracing a back foot against the ground and using that to push ourselves forward, so we want the ground to resist our push as much as possible.

    In a game world, we are a box. We move by being pushed by some invisible outside force, so if the friction between ourselves and the ground is too high, we won’t be able to move at all! That’s complete nonsense physically, but it turns out to be handy in some cases — for example, highish friction can simulate walking through deep mud, which should be difficult due to fluid drag and low friction.

    But the best-known example of the fakeness of game friction is video game ice. Walking on real-world ice is difficult because the low friction means low grip; your feet are likely to slip out from under you, and you’ll simply fall down and have trouble moving at all. In a video game, you can’t fall down, so you have the opposite experience: you spend most of your time sliding around uncontrollably. Yet ice is so common in video games (and perhaps so uncommon in places I’ve lived) that I, at least, had never really thought about this disparity until an hour or so ago.

    Game friction vs real-world friction

    Real-world friction is a force. It’s the normal force (which is the force exerted by the object on the surface) times some constant that depends on how the two materials interact.

    Force is mass times acceleration, and platformers often ignore mass, so friction ought to be an acceleration — applied against the object’s movement, but never enough to push it backwards.

    I haven’t made any games where variable friction plays a significant role, but my gut instinct is that low friction should mean the player accelerates more slowly but has a higher max speed, and high friction should mean the opposite. I see from my own source code that I didn’t even do what I just said, so let’s defer to some better-made and well-documented games: Sonic and Doom.

    In Sonic, friction is a fixed value subtracted from the player’s velocity (regardless of direction) each tic. Sonic has a fixed framerate, so the units are really pixels per tic squared (i.e. acceleration), multiplied by an implicit 1 tic per tic. So far, so good.

    But Sonic’s friction only applies if the player isn’t pressing or . Hang on, that isn’t friction at all; that’s just deceleration! That’s equivalent to jogging to a stop. If friction were lower, Sonic would take longer to stop, but otherwise this is only tangentially related to friction.

    (In fairness, this approach would decently emulate friction for non-conscious sliding objects, which are never going to be pressing movement buttons. Also, we don’t have the Sonic source code, and the name “friction” is a fan invention; the Sonic Physics Guide already uses “deceleration” to describe the player’s acceleration when turning around.)

    Okay, let’s try Doom. In Doom, the default friction is 90.625%.

    Hang on, what?

    Yes, in Doom, friction is a multiplier applied every tic. Doom runs at 35 tics per second, so this is a multiplier of 0.032 per second. Yikes!

    This isn’t anything remotely like real friction, but it’s much easier to implement. With friction as acceleration, the game has to know both the direction of movement (so it can apply friction in the opposite direction) and the magnitude (so it doesn’t overshoot and launch the object in the other direction). That means taking a semi-costly square root and also writing extra code to cap the amount of friction. With a multiplier, neither is necessary; just multiply the whole velocity vector and you’re done.

    There are some downsides. One is that objects will never actually stop, since multiplying by 3% repeatedly will never produce a result of zero — though eventually the speed will become small enough to either slip below a “minimum speed” threshold or simply no longer fit in a float representation. Another is that the units are fairly meaningless: with Doom’s default friction of 90.625%, about how long does it take for the player to stop? I have no idea, partly because “stop” is ambiguous here! If friction were an acceleration, I could divide it into the player’s max speed to get a time.

    All that aside, what are the actual effects of changing Doom’s friction? What an excellent question that’s surprisingly tricky to answer. (Note that friction can’t be changed in original Doom, only in the Boom port and its derivatives.) Here’s what I’ve pieced together.

    Doom’s “friction” is really two values. “Friction” itself is a multiplier applied to moving objects on every tic, but there’s also a move factor which defaults to \(\frac{1}{32} = 0.03125\) and is derived from friction for custom values.

    Every tic, the player’s velocity is multiplied by friction, and then increased by their speed times the move factor.

    $$
    v(n) = v(n – 1) \times friction + speed \times move factor
    $$

    Eventually, the reduction from friction will balance out the speed boost. That happens when \(v(n) = v(n – 1)\), so we can rearrange it to find the player’s effective max speed:

    $$
    v = v \times friction + speed \times move factor \\
    v – v \times friction = speed \times move factor \\
    v = speed \times \frac{move factor}{1 – friction}
    $$

    For vanilla Doom’s move factor of 0.03125 and friction of 0.90625, that becomes:

    $$
    v = speed \times \frac{\frac{1}{32}}{1 – \frac{29}{32}} = speed \times \frac{\frac{1}{32}}{\frac{3}{32}} = \frac{1}{3} \times speed
    $$

    Curiously, “speed” is three times the maximum speed an actor can actually move. Doomguy’s run speed is 50, so in practice he moves a third of that, or 16⅔ units per tic. (Of course, this isn’t counting SR40, a bug that lets Doomguy run ~40% faster than intended diagonally.)

    So now, what if you change friction? Even more curiously, the move factor is calculated completely differently depending on whether friction is higher or lower than the default Doom amount:

    $$
    move factor = \begin{cases}
    \frac{133 – 128 \times friction}{544} &≈ 0.244 – 0.235 \times friction & \text{ if } friction \ge \frac{29}{32} \\
    \frac{81920 \times friction – 70145}{1048576} &≈ 0.078 \times friction – 0.067 & \text{ otherwise }
    \end{cases}
    $$

    That’s pretty weird? Complicating things further is that low friction (which means muddy terrain, remember) has an extra multiplier on its move factor, depending on how fast you’re already going — the idea is apparently that you have a hard time getting going, but it gets easier as you find your footing. The extra multiplier maxes out at 8, which makes the two halves of that function meet at the vanilla Doom value.

    A graph of the relationship between friction and move factor

    That very top point corresponds to the move factor from the original game. So no matter what you do to friction, the move factor becomes lower. At 0.85 and change, you can no longer move at all; below that, you move backwards.

    From the formula above, it’s easy to see what changes to friction and move factor will do to Doomguy’s stable velocity. Move factor is in the numerator, so increasing it will increase stable velocity — but it can’t increase, so stable velocity can only ever decrease. Friction is in the denominator, but it’s subtracted from 1, so increasing friction will make the denominator a smaller value less than 1, i.e. increase stable velocity. Combined, we get this relationship between friction and stable velocity.

    A graph showing stable velocity shooting up dramatically as friction increases

    As friction approaches 1, stable velocity grows without bound. This makes sense, given the definition of \(v(n)\) — if friction is 1, the velocity from the previous tic isn’t reduced at all, so we just keep accelerating freely.

    All of this is why I’m wary of using multipliers.

    Anyway, this leaves me with one last question about the effects of Doom’s friction: how long does it take to reach stable velocity? Barring precision errors, we’ll never truly reach stable velocity, but let’s say within 5%. First we need a closed formula for the velocity after some number of tics. This is a simple recurrence relation, and you can write a few terms out yourself if you want to be sure this is right.

    $$
    v(n) = v_0 \times friction^n + speed \times move factor \times \frac{friction^n – 1}{friction – 1}
    $$

    Our initial velocity is zero, so the first term disappears. Set this equal to the stable formula and solve for n:

    $$
    speed \times move factor \times \frac{friction^n – 1}{friction – 1} = (1 – 5\%) \times speed \times \frac{move factor}{1 – friction} \\
    friction^n – 1 = -(1 – 5\%) \\
    n = \frac{\ln 5\%}{\ln friction}
    $$

    Speed” and move factor disappear entirely, which makes sense, and this is purely a function of friction (and how close we want to get). For vanilla Doom, that comes out to 30.4, which is a little less than a second. For other values of friction:

    A graph of time to stability which leaps upwards dramatically towards the right

    As friction increases (which in Doom terms means the surface is more slippery), it takes longer and longer to reach stable speed, which is in turn greater and greater. For lesser friction (i.e. mud), stable speed is lower, but reached fairly quickly. (Of course, the extra “getting going” multiplier while in mud adds some extra time here, but including that in the graph is a bit more complicated.)

    I think this matches with my instincts above. How fascinating!

    What’s that? This is way too much math and you hate it? Then don’t use multipliers in game physics.

    Uh

    That was a hell of a diversion!

    I guess the goofiest stuff in basic game physics is really just about mapping player controls to in-game actions like jumping and deceleration; the rest consists of hacks to compensate for representing everything as a box.

    Combine Transactional and Analytical Data Using Amazon Aurora and Amazon Redshift

    Post Syndicated from Re Alvarez-Parmar original https://aws.amazon.com/blogs/big-data/combine-transactional-and-analytical-data-using-amazon-aurora-and-amazon-redshift/

    A few months ago, we published a blog post about capturing data changes in an Amazon Aurora database and sending it to Amazon Athena and Amazon QuickSight for fast analysis and visualization. In this post, I want to demonstrate how easy it can be to take the data in Aurora and combine it with data in Amazon Redshift using Amazon Redshift Spectrum.

    With Amazon Redshift, you can build petabyte-scale data warehouses that unify data from a variety of internal and external sources. Because Amazon Redshift is optimized for complex queries (often involving multiple joins) across large tables, it can handle large volumes of retail, inventory, and financial data without breaking a sweat.

    In this post, we describe how to combine data in Aurora in Amazon Redshift. Here’s an overview of the solution:

    • Use AWS Lambda functions with Amazon Aurora to capture data changes in a table.
    • Save data in an Amazon S3
    • Query data using Amazon Redshift Spectrum.

    We use the following services:

    Serverless architecture for capturing and analyzing Aurora data changes

    Consider a scenario in which an e-commerce web application uses Amazon Aurora for a transactional database layer. The company has a sales table that captures every single sale, along with a few corresponding data items. This information is stored as immutable data in a table. Business users want to monitor the sales data and then analyze and visualize it.

    In this example, you take the changes in data in an Aurora database table and save it in Amazon S3. After the data is captured in Amazon S3, you combine it with data in your existing Amazon Redshift cluster for analysis.

    By the end of this post, you will understand how to capture data events in an Aurora table and push them out to other AWS services using AWS Lambda.

    The following diagram shows the flow of data as it occurs in this tutorial:

    The starting point in this architecture is a database insert operation in Amazon Aurora. When the insert statement is executed, a custom trigger calls a Lambda function and forwards the inserted data. Lambda writes the data that it received from Amazon Aurora to a Kinesis data delivery stream. Kinesis Data Firehose writes the data to an Amazon S3 bucket. Once the data is in an Amazon S3 bucket, it is queried in place using Amazon Redshift Spectrum.

    Creating an Aurora database

    First, create a database by following these steps in the Amazon RDS console:

    1. Sign in to the AWS Management Console, and open the Amazon RDS console.
    2. Choose Launch a DB instance, and choose Next.
    3. For Engine, choose Amazon Aurora.
    4. Choose a DB instance class. This example uses a small, since this is not a production database.
    5. In Multi-AZ deployment, choose No.
    6. Configure DB instance identifier, Master username, and Master password.
    7. Launch the DB instance.

    After you create the database, use MySQL Workbench to connect to the database using the CNAME from the console. For information about connecting to an Aurora database, see Connecting to an Amazon Aurora DB Cluster.

    The following screenshot shows the MySQL Workbench configuration:

    Next, create a table in the database by running the following SQL statement:

    Create Table
    CREATE TABLE Sales (
    InvoiceID int NOT NULL AUTO_INCREMENT,
    ItemID int NOT NULL,
    Category varchar(255),
    Price double(10,2), 
    Quantity int not NULL,
    OrderDate timestamp,
    DestinationState varchar(2),
    ShippingType varchar(255),
    Referral varchar(255),
    PRIMARY KEY (InvoiceID)
    )

    You can now populate the table with some sample data. To generate sample data in your table, copy and run the following script. Ensure that the highlighted (bold) variables are replaced with appropriate values.

    #!/usr/bin/python
    import MySQLdb
    import random
    import datetime
    
    db = MySQLdb.connect(host="AURORA_CNAME",
                         user="DBUSER",
                         passwd="DBPASSWORD",
                         db="DB")
    
    states = ("AL","AK","AZ","AR","CA","CO","CT","DE","FL","GA","HI","ID","IL","IN",
    "IA","KS","KY","LA","ME","MD","MA","MI","MN","MS","MO","MT","NE","NV","NH","NJ",
    "NM","NY","NC","ND","OH","OK","OR","PA","RI","SC","SD","TN","TX","UT","VT","VA",
    "WA","WV","WI","WY")
    
    shipping_types = ("Free", "3-Day", "2-Day")
    
    product_categories = ("Garden", "Kitchen", "Office", "Household")
    referrals = ("Other", "Friend/Colleague", "Repeat Customer", "Online Ad")
    
    for i in range(0,10):
        item_id = random.randint(1,100)
        state = states[random.randint(0,len(states)-1)]
        shipping_type = shipping_types[random.randint(0,len(shipping_types)-1)]
        product_category = product_categories[random.randint(0,len(product_categories)-1)]
        quantity = random.randint(1,4)
        referral = referrals[random.randint(0,len(referrals)-1)]
        price = random.randint(1,100)
        order_date = datetime.date(2016,random.randint(1,12),random.randint(1,30)).isoformat()
    
        data_order = (item_id, product_category, price, quantity, order_date, state,
        shipping_type, referral)
    
        add_order = ("INSERT INTO Sales "
                       "(ItemID, Category, Price, Quantity, OrderDate, DestinationState, \
                       ShippingType, Referral) "
                       "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)")
    
        cursor = db.cursor()
        cursor.execute(add_order, data_order)
    
        db.commit()
    
    cursor.close()
    db.close() 

    The following screenshot shows how the table appears with the sample data:

    Sending data from Amazon Aurora to Amazon S3

    There are two methods available to send data from Amazon Aurora to Amazon S3:

    • Using a Lambda function
    • Using SELECT INTO OUTFILE S3

    To demonstrate the ease of setting up integration between multiple AWS services, we use a Lambda function to send data to Amazon S3 using Amazon Kinesis Data Firehose.

    Alternatively, you can use a SELECT INTO OUTFILE S3 statement to query data from an Amazon Aurora DB cluster and save it directly in text files that are stored in an Amazon S3 bucket. However, with this method, there is a delay between the time that the database transaction occurs and the time that the data is exported to Amazon S3 because the default file size threshold is 6 GB.

    Creating a Kinesis data delivery stream

    The next step is to create a Kinesis data delivery stream, since it’s a dependency of the Lambda function.

    To create a delivery stream:

    1. Open the Kinesis Data Firehose console
    2. Choose Create delivery stream.
    3. For Delivery stream name, type AuroraChangesToS3.
    4. For Source, choose Direct PUT.
    5. For Record transformation, choose Disabled.
    6. For Destination, choose Amazon S3.
    7. In the S3 bucket drop-down list, choose an existing bucket, or create a new one.
    8. Enter a prefix if needed, and choose Next.
    9. For Data compression, choose GZIP.
    10. In IAM role, choose either an existing role that has access to write to Amazon S3, or choose to generate one automatically. Choose Next.
    11. Review all the details on the screen, and choose Create delivery stream when you’re finished.

     

    Creating a Lambda function

    Now you can create a Lambda function that is called every time there is a change that needs to be tracked in the database table. This Lambda function passes the data to the Kinesis data delivery stream that you created earlier.

    To create the Lambda function:

    1. Open the AWS Lambda console.
    2. Ensure that you are in the AWS Region where your Amazon Aurora database is located.
    3. If you have no Lambda functions yet, choose Get started now. Otherwise, choose Create function.
    4. Choose Author from scratch.
    5. Give your function a name and select Python 3.6 for Runtime
    6. Choose and existing or create a new Role, the role would need to have access to call firehose:PutRecord
    7. Choose Next on the trigger selection screen.
    8. Paste the following code in the code window. Change the stream_name variable to the Kinesis data delivery stream that you created in the previous step.
    9. Choose File -> Save in the code editor and then choose Save.
    import boto3
    import json
    
    firehose = boto3.client('firehose')
    stream_name = ‘AuroraChangesToS3’
    
    
    def Kinesis_publish_message(event, context):
        
        firehose_data = (("%s,%s,%s,%s,%s,%s,%s,%s\n") %(event['ItemID'], 
        event['Category'], event['Price'], event['Quantity'],
        event['OrderDate'], event['DestinationState'], event['ShippingType'], 
        event['Referral']))
        
        firehose_data = {'Data': str(firehose_data)}
        print(firehose_data)
        
        firehose.put_record(DeliveryStreamName=stream_name,
        Record=firehose_data)

    Note the Amazon Resource Name (ARN) of this Lambda function.

    Giving Aurora permissions to invoke a Lambda function

    To give Amazon Aurora permissions to invoke a Lambda function, you must attach an IAM role with appropriate permissions to the cluster. For more information, see Invoking a Lambda Function from an Amazon Aurora DB Cluster.

    Once you are finished, the Amazon Aurora database has access to invoke a Lambda function.

    Creating a stored procedure and a trigger in Amazon Aurora

    Now, go back to MySQL Workbench, and run the following command to create a new stored procedure. When this stored procedure is called, it invokes the Lambda function you created. Change the ARN in the following code to your Lambda function’s ARN.

    DROP PROCEDURE IF EXISTS CDC_TO_FIREHOSE;
    DELIMITER ;;
    CREATE PROCEDURE CDC_TO_FIREHOSE (IN ItemID VARCHAR(255), 
    									IN Category varchar(255), 
    									IN Price double(10,2),
                                        IN Quantity int(11),
                                        IN OrderDate timestamp,
                                        IN DestinationState varchar(2),
                                        IN ShippingType varchar(255),
                                        IN Referral  varchar(255)) LANGUAGE SQL 
    BEGIN
      CALL mysql.lambda_async('arn:aws:lambda:us-east-1:XXXXXXXXXXXXX:function:CDCFromAuroraToKinesis', 
         CONCAT('{ "ItemID" : "', ItemID, 
                '", "Category" : "', Category,
                '", "Price" : "', Price,
                '", "Quantity" : "', Quantity, 
                '", "OrderDate" : "', OrderDate, 
                '", "DestinationState" : "', DestinationState, 
                '", "ShippingType" : "', ShippingType, 
                '", "Referral" : "', Referral, '"}')
         );
    END
    ;;
    DELIMITER ;

    Create a trigger TR_Sales_CDC on the Sales table. When a new record is inserted, this trigger calls the CDC_TO_FIREHOSE stored procedure.

    DROP TRIGGER IF EXISTS TR_Sales_CDC;
     
    DELIMITER ;;
    CREATE TRIGGER TR_Sales_CDC
      AFTER INSERT ON Sales
      FOR EACH ROW
    BEGIN
      SELECT  NEW.ItemID , NEW.Category, New.Price, New.Quantity, New.OrderDate
      , New.DestinationState, New.ShippingType, New.Referral
      INTO @ItemID , @Category, @Price, @Quantity, @OrderDate
      , @DestinationState, @ShippingType, @Referral;
      CALL  CDC_TO_FIREHOSE(@ItemID , @Category, @Price, @Quantity, @OrderDate
      , @DestinationState, @ShippingType, @Referral);
    END
    ;;
    DELIMITER ;

    If a new row is inserted in the Sales table, the Lambda function that is mentioned in the stored procedure is invoked.

    Verify that data is being sent from the Lambda function to Kinesis Data Firehose to Amazon S3 successfully. You might have to insert a few records, depending on the size of your data, before new records appear in Amazon S3. This is due to Kinesis Data Firehose buffering. To learn more about Kinesis Data Firehose buffering, see the “Amazon S3” section in Amazon Kinesis Data Firehose Data Delivery.

    Every time a new record is inserted in the sales table, a stored procedure is called, and it updates data in Amazon S3.

    Querying data in Amazon Redshift

    In this section, you use the data you produced from Amazon Aurora and consume it as-is in Amazon Redshift. In order to allow you to process your data as-is, where it is, while taking advantage of the power and flexibility of Amazon Redshift, you use Amazon Redshift Spectrum. You can use Redshift Spectrum to run complex queries on data stored in Amazon S3, with no need for loading or other data prep.

    Just create a data source and issue your queries to your Amazon Redshift cluster as usual. Behind the scenes, Redshift Spectrum scales to thousands of instances on a per-query basis, ensuring that you get fast, consistent performance even as your dataset grows to beyond an exabyte! Being able to query data that is stored in Amazon S3 means that you can scale your compute and your storage independently. You have the full power of the Amazon Redshift query model and all the reporting and business intelligence tools at your disposal. Your queries can reference any combination of data stored in Amazon Redshift tables and in Amazon S3.

    Redshift Spectrum supports open, common data types, including CSV/TSV, Apache Parquet, SequenceFile, and RCFile. Files can be compressed using gzip or Snappy, with other data types and compression methods in the works.

    First, create an Amazon Redshift cluster. Follow the steps in Launch a Sample Amazon Redshift Cluster.

    Next, create an IAM role that has access to Amazon S3 and Athena. By default, Amazon Redshift Spectrum uses the Amazon Athena data catalog. Your cluster needs authorization to access your external data catalog in AWS Glue or Athena and your data files in Amazon S3.

    In the demo setup, I attached AmazonS3FullAccess and AmazonAthenaFullAccess. In a production environment, the IAM roles should follow the standard security of granting least privilege. For more information, see IAM Policies for Amazon Redshift Spectrum.

    Attach the newly created role to the Amazon Redshift cluster. For more information, see Associate the IAM Role with Your Cluster.

    Next, connect to the Amazon Redshift cluster, and create an external schema and database:

    create external schema if not exists spectrum_schema
    from data catalog 
    database 'spectrum_db' 
    region 'us-east-1'
    IAM_ROLE 'arn:aws:iam::XXXXXXXXXXXX:role/RedshiftSpectrumRole'
    create external database if not exists;

    Don’t forget to replace the IAM role in the statement.

    Then create an external table within the database:

     CREATE EXTERNAL TABLE IF NOT EXISTS spectrum_schema.ecommerce_sales(
      ItemID int,
      Category varchar,
      Price DOUBLE PRECISION,
      Quantity int,
      OrderDate TIMESTAMP,
      DestinationState varchar,
      ShippingType varchar,
      Referral varchar)
    ROW FORMAT DELIMITED
          FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    LOCATION 's3://{BUCKET_NAME}/CDC/'

    Query the table, and it should contain data. This is a fact table.

    select top 10 * from spectrum_schema.ecommerce_sales

     

    Next, create a dimension table. For this example, we create a date/time dimension table. Create the table:

    CREATE TABLE date_dimension (
      d_datekey           integer       not null sortkey,
      d_dayofmonth        integer       not null,
      d_monthnum          integer       not null,
      d_dayofweek                varchar(10)   not null,
      d_prettydate        date       not null,
      d_quarter           integer       not null,
      d_half              integer       not null,
      d_year              integer       not null,
      d_season            varchar(10)   not null,
      d_fiscalyear        integer       not null)
    diststyle all;

    Populate the table with data:

    copy date_dimension from 's3://reparmar-lab/2016dates' 
    iam_role 'arn:aws:iam::XXXXXXXXXXXX:role/redshiftspectrum'
    DELIMITER ','
    dateformat 'auto';

    The date dimension table should look like the following:

    Querying data in local and external tables using Amazon Redshift

    Now that you have the fact and dimension table populated with data, you can combine the two and run analysis. For example, if you want to query the total sales amount by weekday, you can run the following:

    select sum(quantity*price) as total_sales, date_dimension.d_season
    from spectrum_schema.ecommerce_sales 
    join date_dimension on spectrum_schema.ecommerce_sales.orderdate = date_dimension.d_prettydate 
    group by date_dimension.d_season

    You get the following results:

    Similarly, you can replace d_season with d_dayofweek to get sales figures by weekday:

    With Amazon Redshift Spectrum, you pay only for the queries you run against the data that you actually scan. We encourage you to use file partitioning, columnar data formats, and data compression to significantly minimize the amount of data scanned in Amazon S3. This is important for data warehousing because it dramatically improves query performance and reduces cost.

    Partitioning your data in Amazon S3 by date, time, or any other custom keys enables Amazon Redshift Spectrum to dynamically prune nonrelevant partitions to minimize the amount of data processed. If you store data in a columnar format, such as Parquet, Amazon Redshift Spectrum scans only the columns needed by your query, rather than processing entire rows. Similarly, if you compress your data using one of the supported compression algorithms in Amazon Redshift Spectrum, less data is scanned.

    Analyzing and visualizing Amazon Redshift data in Amazon QuickSight

    Modify the Amazon Redshift security group to allow an Amazon QuickSight connection. For more information, see Authorizing Connections from Amazon QuickSight to Amazon Redshift Clusters.

    After modifying the Amazon Redshift security group, go to Amazon QuickSight. Create a new analysis, and choose Amazon Redshift as the data source.

    Enter the database connection details, validate the connection, and create the data source.

    Choose the schema to be analyzed. In this case, choose spectrum_schema, and then choose the ecommerce_sales table.

    Next, we add a custom field for Total Sales = Price*Quantity. In the drop-down list for the ecommerce_sales table, choose Edit analysis data sets.

    On the next screen, choose Edit.

    In the data prep screen, choose New Field. Add a new calculated field Total Sales $, which is the product of the Price*Quantity fields. Then choose Create. Save and visualize it.

    Next, to visualize total sales figures by month, create a graph with Total Sales on the x-axis and Order Data formatted as month on the y-axis.

    After you’ve finished, you can use Amazon QuickSight to add different columns from your Amazon Redshift tables and perform different types of visualizations. You can build operational dashboards that continuously monitor your transactional and analytical data. You can publish these dashboards and share them with others.

    Final notes

    Amazon QuickSight can also read data in Amazon S3 directly. However, with the method demonstrated in this post, you have the option to manipulate, filter, and combine data from multiple sources or Amazon Redshift tables before visualizing it in Amazon QuickSight.

    In this example, we dealt with data being inserted, but triggers can be activated in response to an INSERT, UPDATE, or DELETE trigger.

    Keep the following in mind:

    • Be careful when invoking a Lambda function from triggers on tables that experience high write traffic. This would result in a large number of calls to your Lambda function. Although calls to the lambda_async procedure are asynchronous, triggers are synchronous.
    • A statement that results in a large number of trigger activations does not wait for the call to the AWS Lambda function to complete. But it does wait for the triggers to complete before returning control to the client.
    • Similarly, you must account for Amazon Kinesis Data Firehose limits. By default, Kinesis Data Firehose is limited to a maximum of 5,000 records/second. For more information, see Monitoring Amazon Kinesis Data Firehose.

    In certain cases, it may be optimal to use AWS Database Migration Service (AWS DMS) to capture data changes in Aurora and use Amazon S3 as a target. For example, AWS DMS might be a good option if you don’t need to transform data from Amazon Aurora. The method used in this post gives you the flexibility to transform data from Aurora using Lambda before sending it to Amazon S3. Additionally, the architecture has the benefits of being serverless, whereas AWS DMS requires an Amazon EC2 instance for replication.

    For design considerations while using Redshift Spectrum, see Using Amazon Redshift Spectrum to Query External Data.

    If you have questions or suggestions, please comment below.


    Additional Reading

    If you found this post useful, be sure to check out Capturing Data Changes in Amazon Aurora Using AWS Lambda and 10 Best Practices for Amazon Redshift Spectrum


    About the Authors

    Re Alvarez-Parmar is a solutions architect for Amazon Web Services. He helps enterprises achieve success through technical guidance and thought leadership. In his spare time, he enjoys spending time with his two kids and exploring outdoors.

     

     

     

    Why Raspberry Pi isn’t vulnerable to Spectre or Meltdown

    Post Syndicated from Eben Upton original https://www.raspberrypi.org/blog/why-raspberry-pi-isnt-vulnerable-to-spectre-or-meltdown/

    Over the last couple of days, there has been a lot of discussion about a pair of security vulnerabilities nicknamed Spectre and Meltdown. These affect all modern Intel processors, and (in the case of Spectre) many AMD processors and ARM cores. Spectre allows an attacker to bypass software checks to read data from arbitrary locations in the current address space; Meltdown allows an attacker to read arbitrary data from the operating system kernel’s address space (which should normally be inaccessible to user programs).

    Both vulnerabilities exploit performance features (caching and speculative execution) common to many modern processors to leak data via a so-called side-channel attack. Happily, the Raspberry Pi isn’t susceptible to these vulnerabilities, because of the particular ARM cores that we use.

    To help us understand why, here’s a little primer on some concepts in modern processor design. We’ll illustrate these concepts using simple programs in Python syntax like this one:

    t = a+b
    u = c+d
    v = e+f
    w = v+g
    x = h+i
    y = j+k
    

    While the processor in your computer doesn’t execute Python directly, the statements here are simple enough that they roughly correspond to a single machine instruction. We’re going to gloss over some details (notably pipelining and register renaming) which are very important to processor designers, but which aren’t necessary to understand how Spectre and Meltdown work.

    For a comprehensive description of processor design, and other aspects of modern computer architecture, you can’t do better than Hennessy and Patterson’s classic Computer Architecture: A Quantitative Approach.

    What is a scalar processor?

    The simplest sort of modern processor executes one instruction per cycle; we call this a scalar processor. Our example above will execute in six cycles on a scalar processor.

    Examples of scalar processors include the Intel 486 and the ARM1176 core used in Raspberry Pi 1 and Raspberry Pi Zero.

    What is a superscalar processor?

    The obvious way to make a scalar processor (or indeed any processor) run faster is to increase its clock speed. However, we soon reach limits of how fast the logic gates inside the processor can be made to run; processor designers therefore quickly began to look for ways to do several things at once.

    An in-order superscalar processor examines the incoming stream of instructions and tries execute more than one at once, in one of several “pipes”, subject to dependencies between the instructions. Dependencies are important: you might think that a two-way superscalar processor could just pair up (or dual-issue) the six instructions in our example like this:

    t, u = a+b, c+d
    v, w = e+f, v+g
    x, y = h+i, j+k
    

    But this doesn’t make sense: we have to compute v before we can compute w, so the third and fourth instructions can’t be executed at the same time. Our two-way superscalar processor won’t be able to find anything to pair with the third instruction, so our example will execute in four cycles:

    t, u = a+b, c+d
    v    = e+f                   # second pipe does nothing here
    w, x = v+g, h+i
    y    = j+k
    

    Examples of superscalar processors include the Intel Pentium, and the ARM Cortex-A7 and Cortex-A53 cores used in Raspberry Pi 2 and Raspberry Pi 3 respectively. Raspberry Pi 3 has only a 33% higher clock speed than Raspberry Pi 2, but has roughly double the performance: the extra performance is partly a result of Cortex-A53’s ability to dual-issue a broader range of instructions than Cortex-A7.

    What is an out-of-order processor?

    Going back to our example, we can see that, although we have a dependency between v and w, we have other independent instructions later in the program that we could potentially have used to fill the empty pipe during the second cycle. An out-of-order superscalar processor has the ability to shuffle the order of incoming instructions (again subject to dependencies) in order to keep its pipelines busy.

    An out-of-order processor might effectively swap the definitions of w and x in our example like this:

    t = a+b
    u = c+d
    v = e+f
    x = h+i
    w = v+g
    y = j+k
    

    allowing it to execute in three cycles:

    t, u = a+b, c+d
    v, x = e+f, h+i
    w, y = v+g, j+k
    

    Examples of out-of-order processors include the Intel Pentium 2 (and most subsequent Intel and AMD x86 processors), and many recent ARM cores, including Cortex-A9, -A15, -A17, and -A57.

    What is speculation?

    Reordering sequential instructions is a powerful way to recover more instruction-level parallelism, but as processors become wider (able to triple- or quadruple-issue instructions) it becomes harder to keep all those pipes busy. Modern processors have therefore grown the ability to speculate. Speculative execution lets us issue instructions which might turn out not to be required (because they are branched over): this keeps a pipe busy, and if it turns out that the instruction isn’t executed, we can just throw the result away.

    To demonstrate the benefits of speculation, let’s look at another example:

    t = a+b
    u = t+c
    v = u+d
    if v:
       w = e+f
       x = w+g
       y = x+h
    

    Now we have dependencies from t to u to v, and from w to x to y, so a two-way out-of-order processor without speculation won’t ever be able to fill its second pipe. It spends three cycles computing t, u, and v, after which it knows whether the body of the if statement will execute, in which case it then spends three cycles computing w, x, and y. Assuming the if (a branch instruction) takes one cycle, our example takes either four cycles (if v turns out to be zero) or seven cycles (if v is non-zero).

    Speculation effectively shuffles the program like this:

    t = a+b
    u = t+c
    v = u+d
    w_ = e+f
    x_ = w_+g
    y_ = x_+h
    if v:
       w, x, y = w_, x_, y_
    

    so we now have additional instruction level parallelism to keep our pipes busy:

    t, w_ = a+b, e+f
    u, x_ = t+c, w_+g
    v, y_ = u+d, x_+h
    if v:
       w, x, y = w_, x_, y_
    

    Cycle counting becomes less well defined in speculative out-of-order processors, but the branch and conditional update of w, x, and y are (approximately) free, so our example executes in (approximately) three cycles.

    What is a cache?

    In the good old days*, the speed of processors was well matched with the speed of memory access. My BBC Micro, with its 2MHz 6502, could execute an instruction roughly every 2µs (microseconds), and had a memory cycle time of 0.25µs. Over the ensuing 35 years, processors have become very much faster, but memory only modestly so: a single Cortex-A53 in a Raspberry Pi 3 can execute an instruction roughly every 0.5ns (nanoseconds), but can take up to 100ns to access main memory.

    At first glance, this sounds like a disaster: every time we access memory, we’ll end up waiting for 100ns to get the result back. In this case, this example:

    a = mem[0]
    b = mem[1]
    

    would take 200ns.

    In practice, programs tend to access memory in relatively predictable ways, exhibiting both temporal locality (if I access a location, I’m likely to access it again soon) and spatial locality (if I access a location, I’m likely to access a nearby location soon). Caching takes advantage of these properties to reduce the average cost of access to memory.

    A cache is a small on-chip memory, close to the processor, which stores copies of the contents of recently used locations (and their neighbours), so that they are quickly available on subsequent accesses. With caching, the example above will execute in a little over 100ns:

    a = mem[0]    # 100ns delay, copies mem[0:15] into cache
    b = mem[1]    # mem[1] is in the cache
    

    From the point of view of Spectre and Meltdown, the important point is that if you can time how long a memory access takes, you can determine whether the address you accessed was in the cache (short time) or not (long time).

    What is a side channel?

    From Wikipedia:

    “… a side-channel attack is any attack based on information gained from the physical implementation of a cryptosystem, rather than brute force or theoretical weaknesses in the algorithms (compare cryptanalysis). For example, timing information, power consumption, electromagnetic leaks or even sound can provide an extra source of information, which can be exploited to break the system.”

    Spectre and Meltdown are side-channel attacks which deduce the contents of a memory location which should not normally be accessible by using timing to observe whether another location is present in the cache.

    Putting it all together

    Now let’s look at how speculation and caching combine to permit the Meltdown attack. Consider the following example, which is a user program that sometimes reads from an illegal (kernel) address:

    t = a+b
    u = t+c
    v = u+d
    if v:
       w = kern_mem[address]   # if we get here crash
       x = w&0x100
       y = user_mem[x]
    

    Now our out-of-order two-way superscalar processor shuffles the program like this:

    t, w_ = a+b, kern_mem[address]
    u, x_ = t+c, w_&0x100
    v, y_ = u+d, user_mem[x_]
    
    if v:
       # crash
       w, x, y = w_, x_, y_      # we never get here
    

    Even though the processor always speculatively reads from the kernel address, it must defer the resulting fault until it knows that v was non-zero. On the face of it, this feels safe because either:

    • v is zero, so the result of the illegal read isn’t committed to w
    • v is non-zero, so the program crashes before the read is committed to w

    However, suppose we flush our cache before executing the code, and arrange a, b, c, and d so that v is zero. Now, the speculative load in the third cycle:

    v, y_ = u+d, user_mem[x_]
    

    will read from either address 0x000 or address 0x100 depending on the eighth bit of the result of the illegal read. Because v is zero, the results of the speculative instructions will be discarded, and execution will continue. If we time a subsequent access to one of those addresses, we can determine which address is in the cache. Congratulations: you’ve just read a single bit from the kernel’s address space!

    The real Meltdown exploit is more complex than this, but the principle is the same. Spectre uses a similar approach to subvert software array bounds checks.

    Conclusion

    Modern processors go to great lengths to preserve the abstraction that they are in-order scalar machines that access memory directly, while in fact using a host of techniques including caching, instruction reordering, and speculation to deliver much higher performance than a simple processor could hope to achieve. Meltdown and Spectre are examples of what happens when we reason about security in the context of that abstraction, and then encounter minor discrepancies between the abstraction and reality.

    The lack of speculation in the ARM1176, Cortex-A7, and Cortex-A53 cores used in Raspberry Pi render us immune to attacks of the sort.

    * days may not be that old, or that good

    The post Why Raspberry Pi isn’t vulnerable to Spectre or Meltdown appeared first on Raspberry Pi.

    Detecting Adblocker Blockers

    Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/detecting_adblo.html

    Interesting research on the prevalence of adblock blockers: “Measuring and Disrupting Anti-Adblockers Using Differential Execution Analysis“:

    Abstract: Millions of people use adblockers to remove intrusive and malicious ads as well as protect themselves against tracking and pervasive surveillance. Online publishers consider adblockers a major threat to the ad-powered “free” Web. They have started to retaliate against adblockers by employing anti-adblockers which can detect and stop adblock users. To counter this retaliation, adblockers in turn try to detect and filter anti-adblocking scripts. This back and forth has prompted an escalating arms race between adblockers and anti-adblockers.

    We want to develop a comprehensive understanding of anti-adblockers, with the ultimate aim of enabling adblockers to bypass state-of-the-art anti-adblockers. In this paper, we present a differential execution analysis to automatically detect and analyze anti-adblockers. At a high level, we collect execution traces by visiting a website with and without adblockers. Through differential execution analysis, we are able to pinpoint the conditions that lead to the differences caused by anti-adblocking code. Using our system, we detect anti-adblockers on 30.5% of the Alexa top-10K websites which is 5-52 times more than reported in prior literature. Unlike prior work which is limited to detecting visible reactions (e.g., warning messages) by anti-adblockers, our system can discover attempts to detect adblockers even when there is no visible reaction. From manually checking one third of the detected websites, we find that the websites that have no visible reactions constitute over 90% of the cases, completely dominating the ones that have visible warning messages. Finally, based on our findings, we further develop JavaScript rewriting and API hooking based solutions (the latter implemented as a Chrome extension) to help adblockers bypass state-of-the-art anti-adblockers.

    News article.