Tag Archives: Proxima

Creating a 1.3 Million vCPU Grid on AWS using EC2 Spot Instances and TIBCO GridServer

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/creating-a-1-3-million-vcpu-grid-on-aws-using-ec2-spot-instances-and-tibco-gridserver/

Many of my colleagues are fortunate to be able to spend a good part of their day sitting down with and listening to our customers, doing their best to understand ways that we can better meet their business and technology needs. This information is treated with extreme care and is used to drive the roadmap for new services and new features.

AWS customers in the financial services industry (often abbreviated as FSI) are looking ahead to the Fundamental Review of the Trading Book (FRTB) regulations that will come into effect between 2019 and 2021. Among other things, these regulations mandate a new approach to the “value at risk” calculations that each financial institution must perform in the four-hour time window after trading ends in New York and begins in Tokyo. Today, our customers report that this mission-critical calculation consumes on the order of 200,000 vCPUs, growing to between 400K and 800K vCPUs in order to meet the FRTB regulations. While there’s still some debate about the magnitude and frequency with which they’ll need to run this expanded calculation, the overall direction is clear.

Building a Big Grid
In order to make sure that we are ready to help our FSI customers meet these new regulations, we worked with TIBCO to set up and run a proof of concept grid in the AWS Cloud. The periodic nature of the calculation, along with the amount of processing power and storage needed to run it to completion within four hours, makes it a great fit for an environment where a vast amount of cost-effective compute power is available on an on-demand basis.

Our customers are already using the TIBCO GridServer on-premises and want to use it in the cloud. This product is designed to run grids at enterprise scale. It runs apps in a virtualized fashion, and accepts requests for resources, dynamically provisioning them on an as-needed basis. The cloud version supports Amazon Linux as well as the PostgreSQL-compatible edition of Amazon Aurora.

Working together with TIBCO, we set out to create a grid that was substantially larger than the current high-end prediction of 800K vCPUs, adding a 50% safety factor and then rounding up to reach 1.3 million vCPUs (5x the size of the largest on-premises grid). With that target in mind, the account limits were raised as follows:

  • Spot Instance Limit – 120,000
  • EBS Volume Limit – 120,000
  • EBS Capacity Limit – 2 PB

If you plan to create a grid of this size, you should also bring your friendly local AWS Solutions Architect into the loop as early as possible. They will review your plans, provide you with architecture guidance, and help you to schedule your run.

Running the Grid
We hit the Go button and launched the grid, watching as it bid for and obtained Spot Instances, each of which booted, initialized, and joined the grid within two minutes. The test workload used the Strata open source analytics & market risk library from OpenGamma and was set up with their assistance.
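
In this proof of concept the Spot capacity was provisioned through TIBCO GridServer, so the exact mechanics aren’t shown here. As a rough, hedged illustration of how a diversified pool of Spot Instances can be requested programmatically, the Python sketch below uses the boto3 request_spot_fleet API; the AMI ID, subnet, IAM fleet role, and instance weights are placeholders rather than values from this run.

# Hedged illustration only: request a diversified Spot Fleet with boto3.
# The AMI, subnet, IAM fleet role, and instance weights below are placeholders,
# not details from the TIBCO GridServer run described in this post.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

launch_specs = [
    {
        "ImageId": "ami-0123456789abcdef0",        # grid worker AMI (placeholder)
        "InstanceType": instance_type,
        "SubnetId": "subnet-0123456789abcdef0",    # placeholder subnet
        "WeightedCapacity": vcpus,                 # count each instance as its vCPUs
    }
    for instance_type, vcpus in [("c5.4xlarge", 16), ("m5.4xlarge", 16), ("r4.8xlarge", 32)]
]

response = ec2.request_spot_fleet(
    SpotFleetRequestConfig={
        "IamFleetRole": "arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-tagging-role",
        "AllocationStrategy": "diversified",       # spread across pools to limit reclaims
        "TargetCapacity": 1000,                    # measured in vCPUs, thanks to WeightedCapacity
        "TerminateInstancesWithExpiration": True,
        "LaunchSpecifications": launch_specs,
    }
)
print("Spot Fleet request submitted:", response["SpotFleetRequestId"])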

The grid grew to 61,299 Spot Instances (1.3 million vCPUs drawn from 34 instance types spanning 3 generations of EC2 hardware) as planned, with just 1,937 instances reclaimed and automatically replaced during the run. It cost $30,000 per hour to run, at an average hourly cost of $0.078 per vCPU. If the same instances had been used in On-Demand form, the hourly cost to run the grid would have been approximately $93,000.

Despite the scale of the grid, prices for the EC2 instances did not move during the bidding process. This is due to the overall size of the AWS Cloud and the smooth price change model that we launched late last year.

To give you a sense of the compute power, we computed that this grid would have taken the #1 position on the TOP 500 supercomputer list in November 2007 by a considerable margin, and the #2 position in June 2008. Today, it would occupy position #360 on the list.

I hope that you enjoyed this AWS success story, and that it gives you an idea of the scale that you can achieve in the cloud!

Jeff;

ISPs Win Landmark Case to Protect Privacy of Alleged Pirates

Post Syndicated from Andy original https://torrentfreak.com/isps-win-landmark-case-protect-privacy-alleged-pirates-180508/

With waves of piracy settlement letters being sent out across the world, the last line of defense for many accused Internet users has been their ISPs.

In a number of regions, notably the United States, Europe, and the UK, most ISPs have given up the fight, handing subscriber details over to copyright trolls with a minimum of resistance. However, there are companies out there prepared to stand up for their customers’ rights, even if it takes them a while.

Over in Denmark, Telenor grew tired of tens of thousands of requests for subscriber details filed by a local law firm on behalf of international copyright troll groups. It previously complied with demands to hand over the details of individuals behind 22,000 IP addresses, around 11% of the 200,000 total handled by ISPs in Denmark. But with no end in sight, the ISP dug in its heels.

“We think there is a fundamental legal problem because the courts do not really decide what is most important: the legal security of the public or the law firms’ commercial interests,” Telenor’s Legal Director Mette Eistrøm Krüger said last year.

Assisted by rival ISP Telia, Telenor subsequently began preparing a case to protect the interests of their customers, refusing in the meantime to comply with disclosure requests in copyright cases. But last October, the District Court ruled against the telecoms companies, ordering them to provide identities to the copyright trolls.

Undeterred, the companies took their case to the Østre Landsret, one of Denmark’s two High Courts. Yesterday their determination paid off with a resounding victory for the ISPs and security for the individuals behind approximately 4,000 IP addresses targeted by Copyright Collection Ltd via law firm Njord Law.

“In its order based on telecommunications legislation, the Court has weighed subscribers’ rights to confidentiality of information regarding their use of the Internet against the interests of rightsholders to obtain information for the purpose of prosecuting claims against the subscribers,” the Court said in a statement.

Noting that the case raised important questions of European Union law and the European Convention on Human Rights, the High Court said that after due consideration it would overrule the decision of the District Court. The rights of the copyright holders do not trump the individuals’ right to privacy, it said.

“The telecommunications companies are therefore not required to disclose the names and addresses of their subscribers,” the Court ruled.

Telenor welcomed the decision, noting that it had received countless requests from law firms to disclose the identities of thousands of subscribers but had declined to hand them over, a decision that has now been endorsed by the High Court.

“This is an important victory for our right to protect our customers’ data,” said Telenor Denmark’s Legal Director, Mette Eistrøm Krüger.

“At Telenor we protect our customers’ data and trust – therefore it has been our conviction that we cannot be forced into almost automatically submitting personal data on our customers simply to support some private actors who are driven by commercial interests.”

Noting that it’s been putting up a fight since 2016 against handing over customers’ data for purposes other than investigating serious crime, Telenor said that the clarity provided by the decision is most welcome.

“We and other Danish telecom companies are required to log customer data for the police to fight serious crime and terrorism – but the legislation has just been insufficient in relation to the use of logged data,” Krüger said.

“Therefore I am pleased that with this judgment the High Court has stated that customers’ legal certainty is most important in these cases.”

The decision was also welcomed by Telia Denmark, with Legal Director Lasse Andersen describing the company as being “really really happy” with “a big win.”

“It is a victory for our customers and for all telecom companies’ customers,” Andersen said.

“They can now feel confident that the data that we collect about them cannot be disclosed for purposes other than the terms under which they are collected as determined by the jurisdiction.

“Therefore, anyone and everybody cannot claim our data. We are pleased that throughout the process we have determined that we will not hand over our data to anyone other than the police with a court order,” Andersen added.

But as the ISPs celebrate, the opposite is true for Njord Law and its copyright troll partners.

“It is a sad message to the Danish film and television industry that the possibilities for self-investigating illegal file sharing are complicated and that the work must be left to the police’s scarce resources,” said Jeppe Brogaard Clausen of Njord Law.

While the ISPs finally stood up for users in these cases, Telenor in particular wishes to emphasize that supporting the activities of pirates is not its aim. The company says it does not support illegal file-sharing “in any way” and is actively working with anti-piracy outfit Rights Alliance to prevent unauthorized downloading of movies and other content.

The full decision of the Østre Landsret can be found here (Danish, pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Converting a Kodak Box Brownie into a digital camera

Post Syndicated from Rob Zwetsloot original https://www.raspberrypi.org/blog/kodak-brownie-camera/

In this article from The MagPi issue 69, David Crookes explains how Daniel Berrangé took an old Kodak Brownie from the 1950s and turned it into a quirky digital camera. Get your copy of The MagPi magazine in stores now, or download it as a free PDF here.

Daniel Berrangé Kodak Brownie Raspberry Pi Camera

The Kodak Box Brownie

When Kodak unveiled its Box Brownie in 1900, it did so with the slogan ‘You press the button, we do the rest.’ The words referred to the ease-of-use of what was the world’s first mass-produced camera. But it could equally apply to Daniel Berrangé’s philosophy when modifying it for the 21st century. “I wanted to use the Box Brownie’s shutter button to trigger image capture, and make it simple to use,” he tells us.

Daniel Berrangé Kodak Brownie Raspberry Pi Camera

Daniel’s project grew from a previous effort in which he placed a pinhole webcam inside a ladies’ powder compact case. “The Box Brownie project is essentially a repeat of that design but with a normal lens instead of a pinhole, a real camera case, and improved software to enable a shutter button. Ideally, it would look unchanged from when it was shooting film.”

Webcam woes

At first, Daniel looked for a cheap webcam, intending to spend no more than the price of a Pi Zero. This didn’t work out too well. “The low-light performance of the webcam was not sufficient to make a pinhole camera so I just decided to make a ‘normal’ digital camera instead,” he reveals.

To that end, he began removing some internal components from the Box Brownie. “With the original lens removed, the task was to position the webcam’s electronic light sensor (the CCD) and lens as close to the front of the camera as possible,” Daniel explains. “In the end, the CCD was about 15 mm away from the front aperture of the camera, giving a field of view that was approximately the same as the unmodified camera would achieve.”

Daniel Berrangé Kodak Brownie Raspberry Pi Camera

It was then time for him to insert the Raspberry Pi, upon which was a custom ‘init’ binary that loads a couple of kernel modules to run the webcam, mount the microSD file system, and launch the application binary. Here, Daniel found he was in luck. “I’d noticed that the size of a 620 film spool (63 mm) was effectively the same as the width of a Raspberry Pi Zero (65 mm), so it could be held in place between the film spool grips,” he recalls. “It was almost as if it was designed with this in mind.”

Shutter success

In order to operate the camera, Daniel had to work on the shutter button. “The Box Brownie’s shutter button is entirely mechanical, driven by a handful of levers and springs,” Daniel explains. “First, the Pi Zero needs to know when the shutter button is pressed and second, the physical shutter has to be open while the webcam is capturing the image. Rather than try to synchronise image capture with the fraction of a second that the physical shutter is open, a bit of electrical tape was used on the shutter mechanism to keep it permanently open.”

Daniel Berrangé Kodak Brownie Raspberry Pi Camera

Daniel made use of the Pi Zero’s GPIO pins to detect the pressing of the shutter button, sensing whether each pin is pulled high or low. “My thought was that I could set a GPIO pin high to 5 V, and then use the action of the shutter button to short it to ground, and detect this change in level from software.”

This initially involved using a pair of bare wires and some conductive paint, although the paint was later replaced by a piece of tinfoil. With the button pressed, the GPIO pin level drops to zero and the device captures still images continuously until the button is released. All that’s left to do is smile and take the perfect snap.
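
Daniel’s build uses a custom init binary, so the Python sketch below is only a rough approximation of the same shutter loop for anyone who wants to prototype it quickly. It assumes the gpiozero library and the fswebcam command-line tool; the GPIO pin number, webcam device, and output directory are guesses, not details from his camera.

# Rough approximation of the shutter logic (not Daniel's actual init binary):
# watch a GPIO pin wired to the Brownie's shutter button and keep grabbing
# stills from the webcam while the button is held down. Assumes gpiozero and
# fswebcam are installed; pin 17, /dev/video0, and the output path are guesses.
import subprocess
import time
from gpiozero import Button

shutter = Button(17, pull_up=True)   # button shorts the pin to ground when pressed
frame = 0

while True:
    shutter.wait_for_press()
    while shutter.is_pressed:
        filename = f"/home/pi/photos/img{frame:05d}.jpg"
        subprocess.run(["fswebcam", "-d", "/dev/video0", "--no-banner", filename])
        frame += 1
        time.sleep(0.2)              # brief pause between captures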

The post Converting a Kodak Box Brownie into a digital camera appeared first on Raspberry Pi.

Implement continuous integration and delivery of serverless AWS Glue ETL applications using AWS Developer Tools

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-serverless-aws-glue-etl-applications-using-aws-developer-tools/

AWS Glue is an increasingly popular way to develop serverless ETL (extract, transform, and load) applications for big data and data lake workloads. Organizations that transform their ETL applications to cloud-based, serverless ETL architectures need a seamless, end-to-end continuous integration and continuous delivery (CI/CD) pipeline: from source code, to build, to deployment, to product delivery. Having a good CI/CD pipeline can help your organization discover bugs before they reach production and deliver updates more frequently. It can also help developers write quality code and automate the ETL job release management process, mitigate risk, and more.

AWS Glue is a fully managed data catalog and ETL service. It simplifies and automates the difficult and time-consuming tasks of data discovery, conversion, and job scheduling. AWS Glue crawls your data sources and constructs a data catalog using pre-built classifiers for popular data formats and data types, including CSV, Apache Parquet, JSON, and more.

When you are developing ETL applications using AWS Glue, you might come across some of the following CI/CD challenges:

  • Iterative development with unit tests
  • Continuous integration and build
  • Pushing the ETL pipeline to a test environment
  • Pushing the ETL pipeline to a production environment
  • Testing ETL applications using real data (live test)
  • Exploring and validating data

In this post, I walk you through a solution that implements a CI/CD pipeline for serverless AWS Glue ETL applications supported by AWS Developer Tools (including AWS CodePipeline, AWS CodeCommit, and AWS CodeBuild) and AWS CloudFormation.

Solution overview

The following diagram shows the pipeline workflow:

This solution uses AWS CodePipeline, which lets you orchestrate and automate the test and deploy stages for ETL application source code. The solution consists of a pipeline that contains the following stages:

1.) Source Control: In this stage, the AWS Glue ETL job source code and the AWS CloudFormation template file for deploying the ETL jobs are both committed to version control. I chose to use AWS CodeCommit for version control.

To get the ETL job source code and AWS CloudFormation template, download the gluedemoetl.zip file. This solution is developed based on a previous post, Build a Data Lake Foundation with AWS Glue and Amazon S3.

2.) LiveTest: In this stage, all resources—including AWS Glue crawlers, jobs, S3 buckets, roles, and other resources that are required for the solution—are provisioned, deployed, live tested, and cleaned up.

The LiveTest stage includes the following actions:

  • Deploy: In this action, all the resources that are required for this solution (crawlers, jobs, buckets, roles, and so on) are provisioned and deployed using an AWS CloudFormation template.
  • AutomatedLiveTest: In this action, all the AWS Glue crawlers and jobs are executed and data exploration and validation tests are performed. These validation tests include, but are not limited to, record counts in both raw tables and transformed tables in the data lake and any other business validations. I used AWS CodeBuild for this action.
  • LiveTestApproval: This action is included for the cases in which a pipeline administrator approval is required to deploy/promote the ETL applications to the next stage. The pipeline pauses in this action until an administrator manually approves the release.
  • LiveTestCleanup: In this action, all the LiveTest stage resources, including test crawlers, jobs, roles, and so on, are deleted using the AWS CloudFormation template. This action helps minimize cost by ensuring that the test resources exist only for the duration of the AutomatedLiveTest and LiveTestApproval actions.

3.) DeployToProduction: In this stage, all the resources are deployed using the AWS CloudFormation template to the production environment.

Try it out

This code pipeline takes approximately 20 minutes to complete the LiveTest stage (up to the LiveTestApproval action, in which manual approval is required).

To get started with this solution, choose Launch Stack:

This creates the CI/CD pipeline with all of its stages, as described earlier. It performs an initial commit of the sample AWS Glue ETL job source code to trigger the first release change.

In the AWS CloudFormation console, choose Create. After the template finishes creating resources, you see the pipeline name on the stack Outputs tab.

After that, open the CodePipeline console and select the newly created pipeline. Initially, your pipeline’s CodeCommit stage shows that the source action failed.

Allow a few minutes for your new pipeline to detect the initial commit applied by the CloudFormation stack creation. As soon as the commit is detected, your pipeline starts. You will see the successful stage completion status as soon as the CodeCommit source stage runs.

In the CodeCommit console, choose Code in the navigation pane to view the solution files.

Next, you can watch how the pipeline goes through the Deploy and AutomatedLiveTest actions of the LiveTest stage, until it finally reaches the LiveTestApproval action.

At this point, if you check the AWS CloudFormation console, you can see that a new template has been deployed as part of the LiveTest deploy action.

At this point, make sure that the AWS Glue crawlers and the AWS Glue job ran successfully. Also check whether the corresponding databases and external tables have been created in the AWS Glue Data Catalog. Then verify that the data is validated using Amazon Athena, as shown following.

Open the AWS Glue console, and choose Databases in the navigation pane. You will see the following databases in the Data Catalog:

Open the Amazon Athena console, and run the following queries. Verify that the record counts are matching.

SELECT count(*) FROM "nycitytaxi_gluedemocicdtest"."data";
SELECT count(*) FROM "nytaxiparquet_gluedemocicdtest"."datalake";
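
If you would rather have the AutomatedLiveTest action perform this record-count comparison instead of checking it by hand, a check along the lines of the following sketch could run inside the CodeBuild step. This is a hedged illustration rather than code from the gluedemoetl package: the database and table names are taken from the queries above, and the Athena results location is a placeholder.

# Hedged sketch of automating the record-count validation with boto3 and Athena.
# The database/table names come from the queries above; the S3 output location
# is a placeholder you would replace with your own bucket.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

def run_count(query):
    # Start the query, poll until it finishes, and return the single COUNT(*) value.
    qid = athena.start_query_execution(
        QueryString=query,
        ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/gluedemo/"},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {qid} ended in state {state}")
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    return int(rows[1]["Data"][0]["VarCharValue"])   # rows[0] is the header row

raw = run_count('SELECT count(*) FROM "nycitytaxi_gluedemocicdtest"."data"')
transformed = run_count('SELECT count(*) FROM "nytaxiparquet_gluedemocicdtest"."datalake"')
assert raw == transformed, f"Record counts differ: raw={raw}, transformed={transformed}"
print(f"Validation passed: {raw} records in both tables")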

The following shows the raw data:

The following shows the transformed data:

The pipeline pauses the action until the release is approved. After validating the data, manually approve the revision on the LiveTestApproval action on the CodePipeline console.

Add comments as needed, and choose Approve.

The LiveTestApproval stage now appears as Approved on the console.

After the revision is approved, the pipeline proceeds to use the AWS CloudFormation template to destroy the resources that were deployed in the LiveTest deploy action. This helps reduce cost and ensures a clean test environment on every deployment.

Production deployment is the final stage. In this stage, all the resources—AWS Glue crawlers, AWS Glue jobs, Amazon S3 buckets, roles, and so on—are provisioned and deployed to the production environment using the AWS CloudFormation template.

After successfully running the whole pipeline, feel free to experiment with it by changing the source code stored on AWS CodeCommit. For example, if you modify the AWS Glue ETL job to generate an error, it should make the AutomatedLiveTest action fail. Or if you change the AWS CloudFormation template to make its creation fail, it should affect the LiveTest deploy action. The objective of the pipeline is to ensure that all changes deployed to production work as expected.

Conclusion

In this post, you learned how easy it is to implement CI/CD for serverless AWS Glue ETL solutions with AWS developer tools like AWS CodePipeline and AWS CodeBuild at scale. Implementing such solutions can help you accelerate ETL development and testing at your organization.

If you have questions or suggestions, please comment below.

Additional Reading

If you found this post useful, be sure to check out Implement Continuous Integration and Delivery of Apache Spark Applications using AWS and Build a Data Lake Foundation with AWS Glue and Amazon S3.

About the Authors

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable big data, machine learning, artificial intelligence, and IoT solutions for AWS enterprise and strategic customers. His interests extend to technologies such as advanced edge computing and machine learning at the edge. In his spare time, he enjoys spending time with his family.

 
Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improve the value of their solutions when using AWS.

Achieving Major Stability and Performance Improvements in Yahoo Mail with a Novel Redux Architecture

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/173062946866

yahoodevelopers:

By Mohit Goenka, Gnanavel Shanmugam, and Lance Welsh

At Yahoo Mail, we’re constantly striving to upgrade our product experience. We do this not only by adding new features based on our members’ feedback, but also by providing the best technical solutions to power the most engaging experiences. As such, we’ve recently introduced a number of novel and unique revisions to the way in which we use Redux that have resulted in significant stability and performance improvements. Developers may find our methods useful in achieving similar results in their apps.

Improvements to product metrics

Last year Yahoo Mail implemented a brand new architecture using Redux. Since then, we have transformed the overall architecture to reduce latencies in various operations, reduce JavaScript exceptions, and better synchronize state. As a result, the product is much faster and more stable.

Stability improvements:

  • when checking for new emails – 20%
  • when reading emails – 30%
  • when sending emails – 20%

Performance improvements:

  • 10% improvement in page load performance
  • 40% improvement in frame rendering time

We have also reduced API calls by approximately 20%.

How we use Redux in Yahoo Mail

Redux architecture is reliant on one large store that represents the application state. In a Redux cycle, action creators dispatch actions to change the state of the store. React Components then respond to those state changes. We’ve made some modifications on top of this architecture that are atypical in the React-Redux community.

For instance, when fetching data over the network, the traditional methodology is to use Thunk middleware. Yahoo Mail fetches data over the network from our API. Thunks would create an unnecessary and undesirable dependency between the action creators and our API. If and when the API changes, the action creators must then also change. To keep these concerns separate we dispatch the action payload from the action creator to store them in the Redux state for later processing by “action syncers”. Action syncers use the payload information from the store to make requests to the API and process responses. In other words, the action syncers form an API layer by interacting with the store. An additional benefit to keeping the concerns separate is that the API layer can change as the backend changes, thereby preventing such changes from bubbling back up into the action creators and components. This also allowed us to optimize the API calls by batching, deduping, and processing the requests only when the network is available. We applied similar strategies for handling other side effects like route handling and instrumentation. Overall, action syncers helped us to reduce our API calls by ~20% and bring down API errors by 20-30%.

Another change to the normal Redux architecture was made to avoid unnecessary props. The React-Redux community has learned to avoid passing unnecessary props from high-level components through multiple layers down to lower-level components (prop drilling) for rendering. We have introduced action enhancers middleware to avoid passing additional unnecessary props that are purely used when dispatching actions. Action enhancers add data to the action payload so that data does not have to come from the component when dispatching the action. This spares the component from having to receive that data through props and has improved frame rendering by ~40%. The use of action enhancers also avoids writing utility functions to add commonly-used data to each action from action creators.


In our new architecture, the store reducers accept the dispatched action via action enhancers to update the state. The store then updates the UI, completing the action cycle. Action syncers then initiate the call to the backend APIs to synchronize local changes.

Conclusion

Our novel use of Redux in Yahoo Mail has led to significant user-facing benefits through a more performant application. It has also reduced development cycles for new features due to its simplified architecture. We’re excited to share our work with the community and would love to hear from anyone interested in learning more.

Artefacts in the classroom with Museum in a Box

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/museum-in-a-box/

Museum in a Box bridges the gap between museums and schools by creating a more hands-on approach to conservation education through 3D printing and digital making.

Artefacts in the classroom with Museum in a Box || Raspberry Pi Stories


Fantastic collections and where to find them

Large, impressive statues are truly a sight to be seen. Take for example the 2.4m Hoa Hakananai’a at the British Museum. Its tall stature looms over you as you read its plaque to learn of the statue’s journey from Easter Island to the UK under the care of Captain Cook in 1774, and you can’t help but wonder at how it made it here in one piece.

Hoa Hakananai’a Captain Cook British Museum

But unless you live near a big city where museums are plentiful, you’re unlikely to see the likes of Hoa Hakananai’a in person. Instead, you have to content yourself with online photos or videos of world-famous artefacts.

And that only accounts for the objects that are on display: conservators estimate that only approximately 5 to 10% of museums’ overall collections are actually on show across the globe. The rest is boxed up in storage, inaccessible to the public due to risk of damage, or simply due to lack of space.

Museum in a Box

Museum in a Box aims to “put museum collections and expert knowledge into your hand, wherever you are in the world,” through modern maker practices such as 3D printing and digital making. With the help of the ‘Scan the World’ movement, an “ambitious initiative whose mission is to archive objects of cultural significance using 3D scanning technologies”, the Museum in a Box team has been able to print small, handheld replicas of some of the world’s most recognisable statues and sculptures.

Museum in a Box Raspberry Pi

Each 3D print gets an NFC tag so it can initiate audio playback from a Raspberry Pi that sits snugly within the laser-cut housing of a ‘brain box’. Thus the print can talk directly to us through the magic of wireless technology, replacing the dense, dry text of a museum plaque with engaging speech.
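
Museum in a Box’s own software isn’t published in this article, but the basic idea of an NFC tag triggering audio on a Pi can be sketched in a few lines of Python. The example below assumes the nfcpy library, a USB NFC reader, and the mpg123 command-line player; the tag-to-audio mapping is invented purely for illustration.

# Illustrative sketch only (not Museum in a Box's actual code): read the UID of
# an NFC tag with nfcpy and play the matching audio clip with mpg123.
# Assumes a USB NFC reader, nfcpy, and mpg123; the UID-to-file mapping is made up.
import subprocess
import nfc

AUDIO_FOR_TAG = {
    "04a224d2bc5e80": "/home/pi/audio/hoa_hakananaia.mp3",
    "04b1c3d4e5f601": "/home/pi/audio/parthenon_frieze.mp3",
}

def on_connect(tag):
    uid = tag.identifier.hex()
    clip = AUDIO_FOR_TAG.get(uid)
    if clip:
        subprocess.run(["mpg123", "-q", clip])   # block until playback finishes
    else:
        print(f"No audio registered for tag {uid}")
    return True   # wait for the tag to be removed before reading the next one

clf = nfc.ContactlessFrontend("usb")
try:
    while True:
        clf.connect(rdwr={"on-connect": on_connect})
finally:
    clf.close()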

Museum in a Box Raspberry Pi

The Museum in a Box team headed by CEO George Oates (featured in the video above) makes use of these 3D-printed figures alongside original artefacts, postcards, and more to bridge the gap between large, crowded, distant museums and local schools. Modeled after the museum handling collections that used to be sent to schools, Museum in a Box is a cheaper, more accessible alternative. Moreover, it not only allows for hands-on learning, but also encourages children to get directly involved by hacking its technology! With NFC technology readily available to the public, students can curate their own collections about their local area, record their own messages, and send their own box-sized museums on to schools in other towns or countries. In this way, Museum in a Box enables students to explore, and expand the reach of, their own histories.

Moving forward

With the technology perfected and interest in the project ever-growing, Museum in a Box has a busy year ahead. Supporting the new ‘Unstacked’ learning initiative, the team will soon be delivering ten boxes to the Smithsonian Libraries. The team has curated two collections specifically for this: an exploration into Asia-Pacific America experiences of migration to the USA throughout the 20th century, and a look into the history of science.

Smithsonian Library Museum in a Box Raspberry Pi

The team will also be making a box for the British Museum to support their Iraq Scheme initiative, and another box will be heading to the V&A to support their See Red programme. While primarily installed in the Lansbury Micro Museum, the box will also take to the road to visit the local Spotlight high school.

Museum in a Box at Raspberry Fields

Lastly, by far the most exciting thing the Museum in a Box team will be doing this year — in our opinion at least — is showcasing at Raspberry Fields! This is our brand-new festival of digital making that’s taking place on 30 June and 1 July 2018 here in Cambridge, UK. Find more information about it and get your ticket here.

The post Artefacts in the classroom with Museum in a Box appeared first on Raspberry Pi.

Police Assisted By MPAA Shut Down Pirate TV Box Sellers

Post Syndicated from Andy original https://torrentfreak.com/police-assisted-by-mpaa-shut-down-pirate-tv-box-sellers-180404/

Piracy-configured set-top boxes are the next big thing. Millions have been sold around the world and anti-piracy groups are scrambling to rein them in.

Many strategies are being tested, from pressurizing developers of allegedly infringing addons to filing aggressive lawsuits against sites such as TVAddons, a Kodi addon repository now facing civil action in both the United States and Canada.

Also under fire are companies that sell set-top boxes that come ready configured for piracy. Both Tickbox TV and Dragon Media Inc are being sued by the Alliance for Creativity and Entertainment (ACE) in the US. At this stage, neither case looks promising for the defendants.

However, civil action isn’t the only way to deal with defendants in the United States, as a man and woman team from Tampa, Florida, have just discovered after being arrested by local police.

Mickael Cantrell and Nancy Major were allegedly the brains behind NBEETV, a company promising to supply set-top boxes that deliver “every movie, every tv show that’s ever been made, plus live sports with no blackouts” with “no monthly fees ever.”

As similar cases have shown, this kind of marketing spiel rarely ends well for defendants but the people behind NBEE TV (also known as FreeTVForLife Inc.) were either oblivious or simply didn’t care about the consequences.

A company press release dated April 2017 advertising the company’s NBPro 3+ box and tracked down by TF this week reveals the extent of the boasts.

“NBPRO 3+ is a TV box that offers instant access to watch every episode of any TV show without paying any monthly bill. One just must attach the loaded box to his TV and stream whatever they want, with no commercials,” the company wrote.

But while “Free TV for Life” was the slogan, that wasn’t the reality at the outset.

NBEETV’s Kodi-powered Android boxes were hellishly expensive, with the NBPRO 1, NBPRO 3, and NBPRO 5 costing $199.00, $279.00, and $359.00 respectively. This, however, was presented as a bargain alongside a claim that the “average [monthly] cable bill across the country is approximately $198.00” per month.

On top of the base product, NBEETV offered an 800 number for customer support and from their physical premises, they ran “training classes every Tuesday and Thursdays at 11:00” for people to better understand their products.

The location of that building isn’t mentioned in local media but a WHOIS on the company’s FreeTVForLife domain yields a confirmed address. It’s one that’s also been complained about in the past by an unhappy customer.

“Free TV for LIFE [redacted]..(next to K-Mart) Hudson, Fl.. 34667. We bought the Little black box costing $277.00. The pictures were not clear,” Rita S. wrote.

“The screen froze up on us all the time, even after hooking straight into the router. When we took the unit back they kept $80 of our money….were very rude, using the ************* word and we will not get the remainder of our money for 14-28 days according to the employee at the store. Buyers beware and I am telling everyone!!!”

While this customer was clearly unhappy, NBEETV claimed to be a “movement which is spreading across the country.” Unfortunately, that movement reached the eyes of the police, who didn’t think that the content being offered on the devices should have been presented for free.

“We saw [the boxes] had Black Panther, The Shape Of Water, Jumanji was on there as well,” said Detective Darren Hill.

“This is someone blatantly on the side of the road just selling them, with signage, a store front; advertising on the internet with a website.”

Detective Hill worked on the case with the MPAA but even from TorrentFreak’s limited investigations this week, the couple were incredibly easy to identify.

Aside from providing accurate and non-hidden address data in WHOIS records, Mickael Cantrell (also known as Michael Cantrell) put in his real name too. The listed email address is also easily traced back to a company called Nanny Bees Corporation which was operated by Cantrell and partner Nancy Major, who was also arrested in the NBEETV case.

Unfortunately for the couple, the blundering didn’t stop there. Their company YouTube channel, which is packed with tutorials, is also in Cantrell’s real name. Indeed, the photograph supplied to YouTube even matches the mugshot published by ABC Action News.

The publication reports that the Sheriff’s Office found the couple with around 50 ‘pirate’ boxes. The store operated by the couple has also been shut down.

Finally, another curious aspect of NBEETV’s self-promotion comes via a blog post/press release dated August 2017 in which Cantrell suddenly ups the ante by becoming Michael W. Cantrell, Ph. D alongside some bold and unusual claims.

“Dr. Cantrell unleashes his latest innovation, a Smart TV Box that literally updates every ten minutes. Not only does the content (what you can view) but the whole platform updates automatically. If the Company changes an icon you receive the change in real time,” the release reads.

“Thanks to the Overlay Processor that Dr. Cantrell created, this processor named B-D.A.D (Binary Data Acceleration Dump) which enhances an Android unit’s operating power 5 times than the original bench test, has set a new industry standard around the world.”

Sounds epic….perhaps it powered the following video clip.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Spotify’s Two Million Unauthorized Users Hammered Google For Alternatives

Post Syndicated from Andy original https://torrentfreak.com/spotifys-two-million-unauthorized-users-hammered-google-for-alternatives-180326/

It is now common knowledge that Spotify launched its service more than a decade ago with the aim of attracting pirates.

With the disruption of The Pirate Bay ringing in the music industry’s ears, Spotify set out to capture the hearts and minds of music fans, particularly those with an aversion to paying.

Although it is yet to turn a profit, there can be little doubt that Spotify is a rampant success, at least as far as user numbers go. With premium and ad-supported free tiers available, the service is superbly accessible, no matter the depth of one’s pockets.

Naturally, those who pay get a better and smoother service so it’s no surprise that many free tier users aspire to that level of access. But while some pay the extra, others prefer to hack their way to music utopia.

Just how many people were accessing Spotify’s service via hacked APK files, mainly on Android, remained a mystery, but late last week, as part of the company’s IPO filing, Spotify dropped the bombshell.

“On March 21, 2018, we detected instances of approximately two million users as of December 31, 2017, who have been suppressing advertisements without payment,” Spotify wrote.

“We previously included such users in calculations for certain of our key performance indicators, including MAUs [Monthly Active Users], Ad-Supported Users, Content Hours, and Content Hours per MAU.”

Two million users is hardly an insignificant number and it appears Spotify felt the need to disclose them since up to January 1, 2017, the company had been including these users in its accounting. A couple of million users on the free tier is great, but not if they’re riding ad-free and therefore less likely to upgrade to premium, the suggestion goes.

Earlier this month, with its IPO process underway, Spotify clearly had these freeloading users on its mind. As previously reported, the company started to send out emails to people using hacked installation files, largely on Android, putting them on notice that their activities were not going unnoticed.

“We detected abnormal activity on the app you are using so we have disabled it. Don’t worry – your Spotify account is safe,” the email from Spotify said.

“To access your Spotify account, simply uninstall any unauthorized or modified version of Spotify and download and install the Spotify app from the official Google Play Store. If you need more help, please see our support article on Reinstalling Spotify.”

At the time it became apparent that this email had gone out to a large number of people, with significant volumes of users reporting problems with their accounts. It also seemed to target users fairly methodically, in that some countries’ users retained access while others suffered, only to be hit later on as more and more waves were sent out.

As the chart below from Google Trends shows, it appears that Spotify began taking action on March 1, which drove people to start searching for Spotify APK files that were still working.

By March 3, search volumes had doubled on the index and on March 7, Google searches for ‘Spotify APK’ reached a dramatic peak never before witnessed in the history of the search term. That’s quite an achievement given how many people use these pieces of software.

No prizes for guessing when Spotify got tough….

But after a flurry of activity, on March 22 search volumes were back down to March 3 levels, which is quite interesting in itself.

Although various modified APKs are still managing to evade Spotify’s ban, there doesn’t seem to be a dominant modified client proving popular enough to stop hundreds of thousands of people from continuing to search for an APK solution. So, presuming these ‘banned’ people still want the music offered by Spotify, where have they gone?

Aside from those using the APKs that have slipped through the net, reports suggest others have migrated to Deezer downloading solutions, which are also being targeted by Deezer. Others are using tools to convert their Spotify playlists to use with other pirate services or even YouTube.

The big question then is whether hitting the ban button to potentially eject up to two million users has resulted in a net positive for Spotify?

There’s no doubt it lowered the bandwidth bill for the growing company but how many former freeloaders traded the pirate high seas for an ad-supported account or even the premium service? Only Spotify has the numbers, and it won’t be sharing those yet – if ever.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

[$] Time-based packet transmission

Post Syndicated from corbet original https://lwn.net/Articles/748879/rss

Normally, when an application sends data over the network, it wants that data to be transmitted as quickly as possible; the kernel’s network stack tries to oblige. But there are applications that need their packets to be transmitted within specific time windows. This behavior can be approximated in user space now, but a better solution is in the works in the form of the time-based packet transmission patch set.

Improve the Operational Efficiency of Amazon Elasticsearch Service Domains with Automated Alarms Using Amazon CloudWatch

Post Syndicated from Veronika Megler original https://aws.amazon.com/blogs/big-data/improve-the-operational-efficiency-of-amazon-elasticsearch-service-domains-with-automated-alarms-using-amazon-cloudwatch/

A customer has been successfully creating and running multiple Amazon Elasticsearch Service (Amazon ES) domains to support their business users’ search needs across products, orders, support documentation, and a growing suite of similar needs. The service has become heavily used across the organization.  This led to some domains running at 100% capacity during peak times, while others began to run low on storage space. Because of this increased usage, the technical teams were in danger of missing their service level agreements.  They contacted me for help.

This post shows how you can set up automated alarms to warn when domains need attention.

Solution overview

Amazon ES is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time analytics capabilities along with the availability, scalability, and security that production workloads require.  The service offers built-in integrations with a number of other components and AWS services, enabling customers to go from raw data to actionable insights quickly and securely.

One of these other integrated services is Amazon CloudWatch. CloudWatch is a monitoring service for AWS Cloud resources and the applications that you run on AWS. You can use CloudWatch to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources.

CloudWatch collects metrics for Amazon ES. You can use these metrics to monitor the state of your Amazon ES domains, and set alarms to notify you about high utilization of system resources.  For more information, see Amazon Elasticsearch Service Metrics and Dimensions.

While the metrics are automatically collected, the missing piece is how to set alarms on these metrics at appropriate levels for each of your domains. This post includes sample Python code to evaluate the current state of your Amazon ES environment, and to set up alarms according to AWS recommendations and best practices.

There are two components to the sample solution:

  • es-check-cwalarms.py: This Python script checks the CloudWatch alarms that have been set, for all Amazon ES domains in a given account and region.
  • es-create-cwalarms.py: This Python script sets up a set of CloudWatch alarms for a single given domain.

The sample code can also be found in the amazon-es-check-cw-alarms GitHub repo. The scripts are easy to extend or combine, as described in the section “Extensions and Adaptations”.

Assessing the current state

The first script, es-check-cwalarms.py, is used to give an overview of the configurations and alarm settings for all the Amazon ES domains in the given region. The script takes the following parameters:

python es-checkcwalarms.py -h
usage: es-checkcwalarms.py [-h] [-e ESPREFIX] [-n NOTIFY] [-f FREE][-p PROFILE] [-r REGION]
Checks a set of recommended CloudWatch alarms for Amazon Elasticsearch Service domains (optionally, those beginning with a given prefix).
optional arguments:
  -h, --help   		show this help message and exit
  -e ESPREFIX, --esprefix ESPREFIX	Only check Amazon Elasticsearch Service domains that begin with this prefix.
  -n NOTIFY, --notify NOTIFY    List of CloudWatch alarm actions; e.g. ['arn:aws:sns:xxxx']
  -f FREE, --free FREE  Minimum free storage (MB) on which to alarm
  -p PROFILE, --profile PROFILE     IAM profile name to use
  -r REGION, --region REGION       AWS region for the domain. Default: us-east-1

The script first identifies all the domains in the given region (or, optionally, limits them to the subset that begins with a given prefix). It then starts running a set of checks against each one.
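
The full check logic lives in es-check-cwalarms.py in the linked repo. As a simplified, hedged illustration of just this discovery step, the fragment below lists the domains in a region with boto3, optionally filters them by prefix, and pulls each domain’s configuration for the subsequent checks.

# Simplified illustration of the discovery step (the complete logic is in
# es-check-cwalarms.py): list the Amazon ES domains in a region, optionally
# filter by prefix, and fetch each domain's configuration.
import boto3

def list_es_domains(region="us-east-1", prefix=""):
    es = boto3.client("es", region_name=region)
    for entry in es.list_domain_names()["DomainNames"]:
        name = entry["DomainName"]
        if not name.startswith(prefix):
            continue
        status = es.describe_elasticsearch_domain(DomainName=name)["DomainStatus"]
        yield name, status

for name, status in list_es_domains():
    cluster = status["ElasticsearchClusterConfig"]
    print(name, status["ElasticsearchVersion"],
          cluster["InstanceCount"], cluster["InstanceType"])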

The script can be run from the command line or set up as a scheduled Lambda function. For example, for one customer, it was deemed appropriate to regularly run the script to check that alarms were correctly set for all domains. In addition, because configuration changes—cluster size increases to accommodate larger workloads being a common change—might require updates to alarms, this approach allowed the automatic identification of alarms no longer appropriately set as the domain configurations changed.

The output shown below is the output for one domain in my account.

Starting checks for Elasticsearch domain iotfleet , version is 53
Iotfleet Automated snapshot hour (UTC): 0
Iotfleet Instance configuration: 1 instances; type:m3.medium.elasticsearch
Iotfleet Instance storage definition is: 4 GB; free storage calced to: 819.2 MB
iotfleet Desired free storage set to (in MB): 819.2
iotfleet WARNING: Not using VPC Endpoint
iotfleet WARNING: Does not have Zone Awareness enabled
iotfleet WARNING: Instance count is ODD. Best practice is for an even number of data nodes and zone awareness.
iotfleet WARNING: Does not have Dedicated Masters.
iotfleet WARNING: Neither index nor search slow logs are enabled.
iotfleet WARNING: EBS not in use. Using instance storage only.
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-ClusterStatus.yellow-Alarm ClusterStatus.yellow
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-ClusterStatus.red-Alarm ClusterStatus.red
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-CPUUtilization-Alarm CPUUtilization
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-JVMMemoryPressure-Alarm JVMMemoryPressure
iotfleet WARNING: Missing alarm!! ('ClusterIndexWritesBlocked', 'Maximum', 60, 5, 'GreaterThanOrEqualToThreshold', 1.0)
iotfleet Alarm ok; definition matches. Test-Elasticsearch-iotfleet-AutomatedSnapshotFailure-Alarm AutomatedSnapshotFailure
iotfleet Alarm: Threshold does not match: Test-Elasticsearch-iotfleet-FreeStorageSpace-Alarm Should be:  819.2 ; is 3000.0

The output messages fall into the following categories:

  • System overview, Informational: The Amazon ES version and configuration, including instance type and number, storage, automated snapshot hour, etc.
  • Free storage: A calculation for the appropriate amount of free storage, based on the recommended 20% of total storage.
  • Warnings: best practices that are not being followed for this domain. (For more about this, read on.)
  • Alarms: An assessment of the CloudWatch alarms currently set for this domain, against a recommended set.

The script contains an array of recommended CloudWatch alarms, based on best practices for these metrics and statistics. Using the array allows alarm parameters (such as free space) to be updated within the code based on current domain statistics and configurations.

For a given domain, the script checks if each alarm has been set. If the alarm is set, it checks whether the values match those in the array esAlarms. In the output above, you can see three different situations being reported:

  • Alarm ok; definition matches. The alarm set for the domain matches the settings in the array.
  • Alarm: Threshold does not match. An alarm exists, but the threshold value at which the alarm is triggered does not match.
  • WARNING: Missing alarm!! The recommended alarm is missing.

All in all, the list above shows that this domain does not have a configuration that adheres to best practices, nor does it have all the recommended alarms.

Setting up alarms

Now that you know that the domains in their current state are missing critical alarms, you can correct the situation.

To demonstrate the script, set up a new domain named “ver”, in us-west-2. Specify 1 node, and a 10-GB EBS disk. Also, create an SNS topic in us-west-2 with a name of “sendnotification”, which sends you an email.

Run the second script, es-create-cwalarms.py, from the command line. This script creates (or updates) the desired CloudWatch alarms for the specified Amazon ES domain, “ver”.

python es-create-cwalarms.py -r us-west-2 -e test -c ver -n "['arn:aws:sns:us-west-2:xxxxxxxxxx:sendnotification']"
EBS enabled: True type: gp2 size (GB): 10 No Iops 10240  total storage (MB)
Desired free storage set to (in MB): 2048.0
Creating  Test-Elasticsearch-ver-ClusterStatus.yellow-Alarm
Creating  Test-Elasticsearch-ver-ClusterStatus.red-Alarm
Creating  Test-Elasticsearch-ver-CPUUtilization-Alarm
Creating  Test-Elasticsearch-ver-JVMMemoryPressure-Alarm
Creating  Test-Elasticsearch-ver-FreeStorageSpace-Alarm
Creating  Test-Elasticsearch-ver-ClusterIndexWritesBlocked-Alarm
Creating  Test-Elasticsearch-ver-AutomatedSnapshotFailure-Alarm
Successfully finished creating alarms!

As with the first script, this script contains an array of recommended CloudWatch alarms, based on best practices for these metrics and statistics. This approach allows you to add or modify alarms based on your use case (more on that below).
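
To make the shape of those alarm definitions concrete, the hedged fragment below shows how a single recommended alarm, using the ClusterIndexWritesBlocked settings reported by the check script earlier, might be created with boto3. It is an illustration rather than the repo’s exact code; the alarm naming convention follows the output shown above, and the account ID and SNS topic ARN are placeholders from this walkthrough.

# Hedged sketch of creating one recommended alarm with boto3 (the complete logic
# is in es-create-cwalarms.py). The metric settings mirror the
# ClusterIndexWritesBlocked entry reported by the check script; the account ID
# and SNS topic are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")

domain = "ver"
account_id = "123456789012"      # the ClientId dimension is your AWS account ID
sns_topic = "arn:aws:sns:us-west-2:123456789012:sendnotification"

metric, stat, period, evals, operator, threshold = (
    "ClusterIndexWritesBlocked", "Maximum", 60, 5, "GreaterThanOrEqualToThreshold", 1.0)

cloudwatch.put_metric_alarm(
    AlarmName=f"Test-Elasticsearch-{domain}-{metric}-Alarm",
    Namespace="AWS/ES",
    MetricName=metric,
    Dimensions=[{"Name": "DomainName", "Value": domain},
                {"Name": "ClientId", "Value": account_id}],
    Statistic=stat,
    Period=period,
    EvaluationPeriods=evals,
    ComparisonOperator=operator,
    Threshold=threshold,
    AlarmActions=[sns_topic],
)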

After running the script, navigate to Alarms on the CloudWatch console. You can see the set of alarms set up on your domain.

Because the “ver” domain has only a single node, cluster status is yellow, and that alarm is in an “ALARM” state. It’s already sent a notification that the alarm has been triggered.

What to do when an alarm triggers

After alarms are set up, you need to identify the correct action to take for each alarm, which depends on the alarm triggered. For ideas, guidance, and additional pointers to supporting documentation, see Get Started with Amazon Elasticsearch Service: Set CloudWatch Alarms on Key Metrics. For information about common errors and recovery actions to take, see Handling AWS Service Errors.

In most cases, the alarm triggers due to an increased workload. The likely action is to reconfigure the system to handle the increased workload, rather than reducing the incoming workload. Reconfiguring any backend store—a category of systems that includes Elasticsearch—is best performed when the system is quiescent or lightly loaded. Reconfigurations such as setting zone awareness or modifying the disk type cause Amazon ES to enter a “processing” state, potentially disrupting client access.

Other changes, such as increasing the number of data nodes, may cause Elasticsearch to begin moving shards, potentially impacting search performance on these shards while this is happening. These actions should be considered in the context of your production usage. For the same reason I also do not recommend running a script that resets all domains to match best practices.

Avoid the need to reconfigure during heavy workload by setting alarms at a level that allows a considered approach to making the needed changes. For example, if you identify that each weekly peak is increasing, you can reconfigure during a weekly quiet period.

While Elasticsearch can be reconfigured without being quiesced, it is not a best practice to automatically scale it up and down based on usage patterns. Unlike some other AWS services, I recommend against setting a CloudWatch action that automatically reconfigures the system when alarms are triggered.

There are other situations where the planned reconfiguration approach may not work, such as low or zero free disk space causing the domain to reject writes. If the business is dependent on the domain continuing to accept incoming writes and deleting data is not an option, the team may choose to reconfigure immediately.

Extensions and adaptations

You may wish to modify the best practices encoded in the scripts for your own environment or workloads. It’s always better to avoid situations where alerts are generated but routinely ignored. All alerts should trigger a review and one or more actions, either immediately or at a planned date. The following is a list of common situations where you may wish to set different alarms for different domains:

  • Dev/test vs. production
    You may have a different set of configuration rules and alarms for your dev environment than for test. For example, you may require zone awareness and dedicated masters for your production environment, but not for your development domains. Or, you may not have any alarms set in dev. For test environments that mirror your potential peak load, test to ensure that the alarms are appropriately triggered.
  • Differing workloads or SLAs for different domains
    You may have one domain with a requirement for superfast search performance, and another domain with a heavy ingest load that tolerates slower search response. Your reaction to slow response for these two workloads is likely to be different, so perhaps the thresholds for these two domains should be set at a different level. In this case, you might add a “max CPU utilization” alarm at 100% for 1 minute for the fast search domain, while the other domain only triggers an alarm when the average has been higher than 60% for 5 minutes. You might also add a “free space” rule with a higher threshold to reflect the need for more space for the heavy ingest load if there is danger that it could fill the available disk quickly.
  • “Normal” alarms versus “emergency” alarms
    If, for example, free disk space drops to 25% of total capacity, an alarm is triggered that indicates action should be taken as soon as possible, such as cleaning up old indexes or reconfiguring at the next quiet period for this domain. However, if free space drops below a critical level (20% free space), action must be taken immediately in order to prevent Amazon ES from setting the domain to read-only. Similarly, if the “ClusterIndexWritesBlocked” alarm triggers, the domain has already stopped accepting writes, so immediate action is needed. In this case, you may wish to set “laddered” alarms, where one threshold causes an alarm to be triggered to review the current workload for a planned reconfiguration, but a different threshold raises a “DefCon 3” alarm that immediate action is required.

The sample scripts provided here are a starting point, intended for you to adapt to your own environment and needs.

Running the scripts once identifies how far your current state is from your desired state and creates an initial set of alarms. Regularly re-running them captures changes in your environment over time and adjusts your alarms to match your current configuration. One customer has set them up to run nightly, automatically creating and updating alarms to match their preferred settings.

Removing unwanted alarms

Each CloudWatch alarm costs approximately $0.10 per month. You can remove unwanted alarms in the CloudWatch console, under Alarms. If you set up a “ver” domain above, remember to remove it to avoid continuing charges.

Conclusion

Setting CloudWatch alarms appropriately for your Amazon ES domains can help you avoid suboptimal performance and allow you to respond to workload growth or configuration issues well before they become urgent. This post gives you a starting point for doing so. The additional sleep you’ll get knowing you don’t need to be concerned about Elasticsearch domain performance will allow you to focus on building creative solutions for your business and solving problems for your customers.

Enjoy!


Additional Reading

If you found this post useful, be sure to check out Analyzing Amazon Elasticsearch Service Slow Logs Using Amazon CloudWatch Logs Streaming and Kibana and Get Started with Amazon Elasticsearch Service: How Many Shards Do I Need?

 


About the Author

Dr. Veronika Megler is a senior consultant at Amazon Web Services. She works with our customers to implement innovative big data, AI and ML projects, helping them accelerate their time-to-value when using AWS.

 

 

 

Top 8 Best Practices for High-Performance ETL Processing Using Amazon Redshift

Post Syndicated from Thiyagarajan Arumugam original https://aws.amazon.com/blogs/big-data/top-8-best-practices-for-high-performance-etl-processing-using-amazon-redshift/

An ETL (Extract, Transform, Load) process enables you to load data from source systems into your data warehouse. This is typically executed as a batch or near-real-time ingest process to keep the data warehouse current and provide up-to-date analytical data to end users.

Amazon Redshift is a fast, petabyte-scale data warehouse that enables you to easily make data-driven decisions. With Amazon Redshift, you can get insights into your big data in a cost-effective fashion using standard SQL. You can set up any type of data model, from star and snowflake schemas to simple denormalized tables, for running any analytical queries.

To operate a robust ETL platform and deliver data to Amazon Redshift in a timely manner, design your ETL processes to take account of Amazon Redshift’s architecture. When migrating from a legacy data warehouse to Amazon Redshift, it is tempting to adopt a lift-and-shift approach, but this can result in performance and scale issues long term. This post guides you through the following best practices for ensuring optimal, consistent runtimes for your ETL processes:

  • COPY data from multiple, evenly sized files.
  • Use workload management to improve ETL runtimes.
  • Perform table maintenance regularly.
  • Perform multiple steps in a single transaction.
  • Load data in bulk.
  • Use UNLOAD to extract large result sets.
  • Use Amazon Redshift Spectrum for ad hoc ETL processing.
  • Monitor daily ETL health using diagnostic queries.

1. COPY data from multiple, evenly sized files

Amazon Redshift is an MPP (massively parallel processing) database, where all the compute nodes divide and parallelize the work of ingesting data. Each node is further subdivided into slices, with each slice having one or more dedicated cores, equally dividing the processing capacity. The number of slices per node depends on the node type of the cluster. For example, each DS2.XLARGE compute node has two slices, whereas each DS2.8XLARGE compute node has 16 slices.

When you load data into Amazon Redshift, you should aim to have each slice do an equal amount of work. When you load the data from a single large file or from files split into uneven sizes, some slices do more work than others. As a result, the process runs only as fast as the slowest, or most heavily loaded, slice. In the example shown below, a single large file is loaded into a two-node cluster, resulting in only one of the nodes, “Compute-0”, performing all the data ingestion:

When splitting your data files, ensure that they are of approximately equal size – between 1 MB and 1 GB after compression. The number of files should be a multiple of the number of slices in your cluster. Also, I strongly recommend that you individually compress the load files using gzip, lzop, or bzip2 to efficiently load large datasets.

When loading multiple files into a single table, use a single COPY command for the table, rather than multiple COPY commands. Amazon Redshift automatically parallelizes the data ingestion. Using a single COPY command to bulk load data into a table ensures optimal use of cluster resources, and quickest possible throughput.
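
To make the split-and-compress step concrete, here is a minimal sketch; the slice count, file name, and S3 prefix are assumptions. It breaks a single extract into one gzipped part per slice, and the COPY in the trailing comment is the single command that loads them all.

import gzip
import math

SLICES = 16                        # slices in the target cluster (assumption)
SRC = "orders_20170702.csv"        # hypothetical single large extract

# Reads the whole extract into memory; fine for a sketch, stream for very large files.
with open(SRC, "rb") as f:
    rows = f.readlines()

parts = SLICES                     # number of files: a multiple of the slice count
per_part = math.ceil(len(rows) / parts)

for i in range(parts):
    chunk = rows[i * per_part:(i + 1) * per_part]
    if chunk:
        # Individually compress each part, as recommended above.
        with gzip.open(f"{SRC}.part{i:02d}.gz", "wb") as out:
            out.writelines(chunk)

# Upload the parts under one S3 prefix, then load them with a single COPY:
#   COPY orders FROM 's3://my-etl-bucket/batch/2017/07/02/' IAM_ROLE '<role-arn>' GZIP;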

2. Use workload management to improve ETL runtimes

Use Amazon Redshift’s workload management (WLM) to define multiple queues dedicated to different workloads (for example, ETL versus reporting) and to manage the runtimes of queries. As you migrate more workloads into Amazon Redshift, your ETL runtimes can become inconsistent if WLM is not appropriately set up.

I recommend limiting the overall concurrency of WLM across all queues to around 15 or less. This WLM guide helps you organize and monitor the different queues for your Amazon Redshift cluster.

When managing different workloads on your Amazon Redshift cluster, consider the following for the queue setup:

  • Create a queue dedicated to your ETL processes. Configure this queue with a small number of slots (5 or fewer). Amazon Redshift is designed for analytics queries, rather than transaction processing. The cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to the commit queue. Because ETL is a commit-intensive process, having a separate queue with a small number of slots helps mitigate this issue.
  • Claim extra memory available in a queue. When executing an ETL query, you can take advantage of the wlm_query_slot_count to claim the extra memory available in a particular queue. For example, a typical ETL process might involve COPYing raw data into a staging table so that downstream ETL jobs can run transformations that calculate daily, weekly, and monthly aggregates. To speed up the COPY process (so that the downstream tasks can start in parallel sooner), the wlm_query_slot_count can be increased for this step.
  • Create a separate queue for reporting queries. Configure query monitoring rules on this queue to further manage long-running and expensive queries.
  • Take advantage of the dynamic memory parameters, which shift memory from your ETL queue to your reporting queue after the ETL job has completed.

3. Perform table maintenance regularly

Amazon Redshift is a columnar database, which enables fast transformations for aggregating data. Performing regular table maintenance ensures that transformation ETLs are predictable and performant. To get the best performance from your Amazon Redshift database, you must ensure that database tables are regularly VACUUMed and ANALYZEd. The Analyze & Vacuum schema utility helps you automate the table maintenance task and run VACUUM and ANALYZE on a regular schedule.

  • Use VACUUM to sort tables and remove deleted blocks

During a typical ETL refresh process, tables receive new incoming records using COPY, and unneeded data (cold data) is removed using DELETE. New rows are added to the unsorted region in a table. Deleted rows are simply marked for deletion.

DELETE does not automatically reclaim the space occupied by the deleted rows. Adding and removing large numbers of rows can therefore cause the unsorted region and the number of deleted blocks to grow. This can degrade the performance of queries executed against these tables.

After an ETL process completes, perform VACUUM to ensure that user queries execute in a consistent manner. The complete list of tables that need VACUUMing can be found using the Amazon Redshift Utils table_info script.

Use the following approaches to ensure that VACUUM is completed in a timely manner:

  • Use wlm_query_slot_count to claim all the memory allocated in the ETL WLM queue during the VACUUM process.
  • DROP or TRUNCATE intermediate or staging tables, thereby eliminating the need to VACUUM them.
  • If your table has a compound sort key with only one sort column, try to load your data in sort key order. This helps reduce or eliminate the need to VACUUM the table.
  • Consider using time series tables. This helps reduce the amount of data you need to VACUUM.
  • Use ANALYZE to update database statistics

Amazon Redshift uses a cost-based query planner and optimizer that relies on statistics about tables to make good decisions about the query plan for the SQL statements. Regular statistics collection after the ETL completion ensures that user queries run fast, and that daily ETL processes are performant. The Amazon Redshift utility table_info script provides insights into the freshness of the statistics. Keeping the percentage of stale statistics (pct_stats_off) below 20% ensures effective query plans for the SQL queries.

4. Perform multiple steps in a single transaction

ETL transformation logic often spans multiple steps. Because commits in Amazon Redshift are expensive, if each ETL step performs a commit, multiple concurrent ETL processes can take a long time to execute.

To minimize the number of commits in a process, the steps in an ETL script should be surrounded by a BEGIN…END statement so that a single commit is performed only after all the transformation logic has been executed. Here is an example of a multi-step ETL script that performs one commit at the end:

BEGIN;
CREATE TEMPORARY TABLE staging_table (..);
INSERT INTO staging_table SELECT .. FROM source;        -- transformation logic
DELETE FROM daily_table WHERE dataset_date = ?;
INSERT INTO daily_table SELECT .. FROM staging_table;   -- daily aggregate
DELETE FROM weekly_table WHERE weekending_date = ?;
INSERT INTO weekly_table SELECT .. FROM staging_table;  -- weekly aggregate
COMMIT;

5. Load data in bulk

Amazon Redshift is designed to store and query petabyte-scale datasets. Using Amazon S3 you can stage and accumulate data from multiple source systems before executing a bulk COPY operation. The following methods allow efficient and fast transfer of these bulk datasets into Amazon Redshift:

  • Use a manifest file to ingest large datasets that span multiple files. The manifest file is a JSON file that lists all the files to be loaded into Amazon Redshift. Using a manifest file ensures that Amazon Redshift has a consistent view of the data to be loaded from S3, while also ensuring that duplicate files do not result in the same data being loaded more than one time.
  • Use temporary staging tables to hold the data for transformation. These tables are automatically dropped after the ETL session is complete. Temporary tables can be created using the CREATE TEMPORARY TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE query. Explicitly specifying the CREATE TEMPORARY TABLE statement allows you to control the DISTRIBUTION KEY, SORT KEY, and compression settings to further improve performance.
  • Use ALTER TABLE APPEND to swap data from the staging tables to the target table. Data in the source table is moved to matching columns in the target table. Column order doesn’t matter. After data is successfully appended to the target table, the source table is empty. ALTER TABLE APPEND is much faster than a similar CREATE TABLE AS or INSERT INTO operation because it doesn’t involve copying or moving data.
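
A minimal sketch of that swap, with hypothetical cluster details and table names and psycopg2 assumed as the driver. Note that ALTER TABLE APPEND generally can’t run inside an explicit transaction block, hence the autocommit setting.

import psycopg2

# Hypothetical connection details; swap in your own cluster endpoint and credentials.
conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="etl_user", password="...")
conn.autocommit = True   # ALTER TABLE APPEND can't run inside BEGIN ... END
with conn.cursor() as cur:
    # Move all rows from the staging table into the target; staging is left empty.
    cur.execute("ALTER TABLE daily_table APPEND FROM staging_table;")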

6. Use UNLOAD to extract large result sets

Fetching a large number of rows using SELECT is expensive and takes a long time. When a large amount of data is fetched from the Amazon Redshift cluster, the leader node has to hold the data temporarily until the fetches are complete. Further, data is streamed out sequentially, which results in longer elapsed time. As a result, the leader node can become hot, which not only affects the SELECT that is being executed, but also throttles resources for creating execution plans and managing the overall cluster resources. Here is an example of a large SELECT statement. Notice that the leader node is doing most of the work to stream out the rows:

Use UNLOAD to extract large results sets directly to S3. After it’s in S3, the data can be shared with multiple downstream systems. By default, UNLOAD writes data in parallel to multiple files according to the number of slices in the cluster. All the compute nodes participate to quickly offload the data into S3.

If you are extracting data for use with Amazon Redshift Spectrum, you should use the MAXFILESIZE parameter to keep files to around 150 MB. Similar to item 1 above, having many evenly sized files ensures that Redshift Spectrum can do the maximum amount of work in parallel.
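
A sketch of such an UNLOAD follows; the cluster endpoint, credentials, bucket, and the 150 MB cap are illustrative, and psycopg2 is assumed as the driver.

import psycopg2

# Hypothetical connection details; swap in your own cluster endpoint and credentials.
conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="dev", user="etl_user", password="...")
with conn, conn.cursor() as cur:
    # Parallel UNLOAD to S3, capping each output file at roughly 150 MB.
    cur.execute("""
        UNLOAD ('SELECT * FROM weekly_tbl WHERE dataset_week = 25')
        TO 's3://my-etl-bucket/spectrum/weekly/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        GZIP
        MAXFILESIZE 150 MB;
    """)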

7. Use Redshift Spectrum for ad hoc ETL processing

Events such as data backfill, promotional activity, and special calendar days can trigger additional data volumes that affect the data refresh times in your Amazon Redshift cluster. To help address these spikes in data volumes and throughput, I recommend staging data in S3. After data is organized in S3, Redshift Spectrum enables you to query it directly using standard SQL. In this way, you gain the benefits of additional capacity without having to resize your cluster.

For tips on getting started with and optimizing the use of Redshift Spectrum, see the previous post, 10 Best Practices for Amazon Redshift Spectrum.

8. Monitor daily ETL health using diagnostic queries

Monitoring the health of your ETL processes on a regular basis helps identify the early onset of performance issues before they have a significant impact on your cluster. The following monitoring scripts can be used to provide insights into the health of your ETL processes:

For each script, the list below shows what it reports, when to use it, and the suggested remediation.

  • commit_stats.sql – Commit queue statistics from past days, showing largest queue length and queue time first.
    Use when: DML statements such as INSERT/UPDATE/COPY/DELETE operations take several times longer to execute when multiple of these operations are in progress.
    Solution: Set up separate WLM queues for the ETL process and limit the concurrency to < 5.
  • copy_performance.sql – COPY command statistics for the past days.
    Use when: Daily COPY operations take longer to execute.
    Solution: Follow the best practices for the COPY command. Analyze data growth with the incoming datasets and consider a cluster resize to meet the expected SLA.
  • table_info.sql – Table skew and unsorted statistics along with storage and key information.
    Use when: Transformation steps take longer to execute.
    Solution: Set up regular VACUUM jobs to address unsorted rows and reclaim the deleted blocks so that transformation SQL executes optimally. Consider a table redesign to avoid data skew.
  • v_check_transaction_locks.sql – Monitor transaction locks.
    Use when: INSERT/UPDATE/COPY/DELETE operations on particular tables do not respond in a timely manner, compared to when run after the ETL.
    Solution: Multiple DML statements are operating on the same target table at the same moment from different transactions. Set up ETL job dependencies so that they execute serially for the same target table.
  • v_get_schema_priv_by_user.sql – Get the schemas that a user has access to.
    Use when: Reporting users can view intermediate tables.
    Solution: Set up separate database groups for reporting and ETL users, and grant access to objects using GRANT.
  • v_generate_tbl_ddl.sql – Get the table DDL.
    Use when: You need to create an empty table with the same structure as the target table for data backfill.
    Solution: Generate the DDL using this script for the data backfill.
  • v_space_used_per_tbl.sql – Monitor space used by individual tables.
    Use when: Amazon Redshift data warehouse space growth is trending upwards more than normal.
    Solution: Analyze the individual tables that are growing at a higher rate than normal. Consider data archival using UNLOAD to S3 and Redshift Spectrum for later analysis. Use unscanned_table_summary.sql to find unused tables and archive or drop them.
  • top_queries.sql – Return the top 50 most time-consuming statements aggregated by their text.
    Use when: ETL transformations are taking longer to execute.
    Solution: Analyze the top transformation SQL and use EXPLAIN to find opportunities for tuning the query plan.

There are several other useful scripts available in the amazon-redshift-utils repository. The AWS Lambda Utility Runner runs a subset of these scripts on a scheduled basis, allowing you to automate much of the monitoring of your ETL processes.

Example ETL process

The following ETL process reinforces some of the best practices discussed in this post. Consider the following four-step daily ETL workflow where data from an RDBMS source system is staged in S3 and then loaded into Amazon Redshift. Amazon Redshift is used to calculate daily, weekly, and monthly aggregations, which are then unloaded to S3, where they can be further processed and made available for end-user reporting using a number of different tools, including Redshift Spectrum and Amazon Athena.

Step 1:  Extract from the RDBMS source to a S3 bucket

In this ETL process, the data extract job fetches change data every hour and stages it into multiple hourly files. For example, the staged S3 folder looks like the following:

$ aws s3 ls s3://<<S3 Bucket>>/batch/2017/07/02/
2017-07-02 01:59:58   81900220 20170702T01.export.gz
2017-07-02 02:59:56   84926844 20170702T02.export.gz
2017-07-02 03:59:54   78990356 20170702T03.export.gz
…
2017-07-02 22:00:03   75966745 20170702T21.export.gz
2017-07-02 23:00:02   89199874 20170702T22.export.gz
2017-07-02 00:59:59   71161715 20170702T23.export.gz

Organizing the data into multiple, evenly sized files enables the COPY command to ingest this data using all available resources in the Amazon Redshift cluster. Further, the files are compressed (gzipped) to further reduce COPY times.

Step 2: Stage data to the Amazon Redshift table for cleansing

Ingesting the data can be accomplished using a JSON-based manifest file. Using the manifest file ensures that S3 eventual consistency issues can be eliminated and also provides an opportunity to dedupe any files if needed. A sample manifest20170702.json file looks like the following:

{
  "entries": [
    {"url":" s3://<<S3 Bucket>>/batch/2017/07/02/20170702T01.export.gz", "mandatory":true},
    {"url":" s3://<<S3 Bucket>>/batch/2017/07/02/20170702T02.export.gz", "mandatory":true},
    …
    {"url":" s3://<<S3 Bucket>>/batch/2017/07/02/20170702T23.export.gz", "mandatory":true}
  ]
}

The data can be ingested using the following command:

SET wlm_query_slot_count TO <<max available concurrency in the ETL queue>>;
COPY stage_tbl FROM 's3://<<S3 Bucket>>/batch/manifest20170702.json' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' manifest;

Because the downstream ETL processes depend on this COPY command to complete, the wlm_query_slot_count is used to claim all the memory available to the queue. This helps the COPY command complete as quickly as possible.
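
If you want to generate that manifest from whatever is actually staged under the prefix, a small boto3 sketch like the following can build and upload it; the bucket, prefix, and key names are assumptions mirroring the listing above.

import boto3
import json

s3 = boto3.client("s3")
BUCKET = "my-etl-bucket"             # hypothetical bucket
PREFIX = "batch/2017/07/02/"         # the staged hourly files shown above

# List the staged objects and pin exactly these files in the manifest.
# (list_objects_v2 returns up to 1,000 keys per call, plenty for 24 hourly files.)
objects = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
manifest = {
    "entries": [
        {"url": f"s3://{BUCKET}/{obj['Key']}", "mandatory": True}
        for obj in objects.get("Contents", [])
        if obj["Key"].endswith(".export.gz")
    ]
}

s3.put_object(
    Bucket=BUCKET,
    Key="batch/manifest20170702.json",
    Body=json.dumps(manifest, indent=2).encode("utf-8"),
)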

Step 3: Transform data to create daily, weekly, and monthly datasets and load into target tables

Data is staged in the “stage_tbl” from where it can be transformed into the daily, weekly, and monthly aggregates and loaded into target tables. The following job illustrates a typical weekly process:

BEGIN;
INSERT INTO ETL_LOG (..) VALUES (..);
DELETE FROM weekly_tbl WHERE dataset_week = <<current week>>;
INSERT INTO weekly_tbl (..)
  SELECT date_trunc('week', dataset_day) AS week_begin_dataset_date, SUM(C1) AS C1, SUM(C2) AS C2
  FROM   stage_tbl
  GROUP BY date_trunc('week', dataset_day);
INSERT INTO AUDIT_LOG VALUES (..);
COMMIT;

As shown above, multiple steps are combined into one transaction to perform a single commit, reducing contention on the commit queue.

Step 4: Unload the daily dataset to populate the S3 data lake bucket

The transformed results are now unloaded into another S3 bucket, where they can be further processed and made available for end-user reporting using a number of different tools, including Redshift Spectrum and Amazon Athena.

unload ('SELECT * FROM weekly_tbl WHERE dataset_week = <<current week>>') TO 's3://<<S3 Bucket>>/datalake/weekly/20170526/' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';

Summary

Amazon Redshift lets you easily operate petabyte-scale data warehouses in the cloud. This post summarized the best practices for operating scalable ETL natively within Amazon Redshift. I demonstrated efficient ways to ingest and transform data, along with close monitoring, and walked through a typical sample ETL workload that applies these best practices.

If you have questions or suggestions, please comment below.

 


About the Author

Thiyagarajan Arumugam is a Big Data Solutions Architect at Amazon Web Services and designs customer architectures to process data at scale. Prior to AWS, he built data warehouse solutions at Amazon.com. In his free time, he enjoys all outdoor sports and practices the Indian classical drum mridangam.

 

Physics cheats

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/06/physics-cheats/

Anonymous asks:

something about how we tweak physics to “work” better in games?

Ho ho! Work. Get it? Like in physics…?

Hitboxes

“Hitbox” is perhaps not the most accurate term, since the shape used for colliding with the environment and the shape used for detecting damage might be totally different. They’re usually the same in simple platformers, though, and that’s what most of my games have been.

The hitbox is the biggest physics fudge by far, and it exists because of a single massive approximation that (most) games make: you’re controlling a single entity in the abstract, not a physical body in great detail.

That is: when you walk with your real-world meat shell, you perform a complex dance of putting one foot in front of the other, a motion you spent years perfecting. When you walk in a video game, you press a single “walk” button. Your avatar may play an animation that moves its legs back and forth, but since you’re not actually controlling the legs independently (and since simulating them is way harder), the game just treats you like a simple shape. Fairly often, this is a box, or something very box-like.

An Eevee sprite standing on faux ground; the size of the underlying image and the hitbox are outlined

Since the player has no direct control over the exact placement of their limbs, it would be slightly frustrating to have them collide with the world. This is especially true in cases like the above, where the tail and left ear protrude significantly out from the main body. If that Eevee wanted to stand against a real-world wall, she would simply tilt her ear or tail out of the way, so there’s no reason for the ear to block her from standing against a game wall. To compensate for this, the ear and tail are left out of the collision box entirely and will simply jut into a wall if necessary — a goofy affordance that’s so common it doesn’t even register as unusual. As a bonus (assuming this same box is used for combat), she won’t take damage from projectiles that merely graze past an ear.

(One extra consideration for sprite games in particular: the hitbox ought to be horizontally symmetric around the sprite’s pivot — i.e. the point where the entity is truly considered to be standing — so that the hitbox doesn’t abruptly move when the entity turns around!)
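
Here’s a minimal sketch of that idea in Python; the class, field names, and pixel sizes are made up for illustration.

class Hitbox:
    """Axis-aligned collision box, defined relative to the entity's pivot (its feet)."""
    def __init__(self, half_width, height):
        self.half_width = half_width   # symmetric around the pivot's x position
        self.height = height

    def world_bounds(self, pivot_x, pivot_y):
        # (left, bottom, right, top) in world coordinates; identical whether the
        # sprite faces left or right, because the box is symmetric around the pivot.
        return (pivot_x - self.half_width, pivot_y,
                pivot_x + self.half_width, pivot_y + self.height)

# The drawn sprite can be much wider than this (ears, tail); only the box collides.
eevee_hitbox = Hitbox(half_width=8, height=20)   # made-up pixel sizes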

Corners

Treating the player (and indeed most objects) as a box has one annoying side effect: boxes have corners. Corners can catch on other corners, even by a single pixel. Real-world bodies tend to be a bit rounder and squishier and can tolerate grazing a corner; even real-world boxes will simply rotate a bit.

Ah, but in our faux physics world, we generally don’t want conscious actors (such as the player) to rotate, even with a realistic physics simulator! Real-world bodies are made of parts that will generally try to keep you upright, after all; you don’t tilt back and forth much.

One way to handle corners is to simply remove them from conscious actors. A hitbox doesn’t have to be a literal box, after all. A popular alternative — especially in Unity where it’s a standard asset — is the pill-shaped capsule, which has semicircles/hemispheres on the top and bottom and a cylindrical body in 3D. No corners, no problem.

Of course, that introduces a new problem: now the player can’t balance precariously on edges without their rounded bottom sliding them off. Alas.

If you’re stuck with corners, then, you may want to use a corner bump, a term I just made up. If the player would collide with a corner, but the collision is only by a few pixels, just nudge them to the side a bit and carry on.

An Eevee sprite trying to move sideways into a shallow ledge; the game bumps her upwards slightly, so she steps onto it instead

When the corner is horizontal, this creates stairs! This is, more or less kinda, how steps work in Doom: when the player tries to cross from one sector into another, if the height difference is 24 units or less, the game simply bumps them upwards to the height of the new floor and lets them continue on.
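
A toy sketch of that step-up check, assuming you can ask for the floor height at a point (which is roughly what Doom’s sectors give you); the actor object and the 24-unit threshold are illustrative, not Doom’s actual code.

STEP_HEIGHT = 24   # maximum ledge height we silently step onto (Doom uses 24 units)

def try_move_horizontally(actor, dx, floor_height_at):
    """Move dx along x, stepping up small ledges instead of stopping at them."""
    new_x = actor.x + dx
    floor = floor_height_at(new_x)          # ground height at the destination
    if floor <= actor.y:
        actor.x = new_x                     # flat or downhill: just move
    elif floor - actor.y <= STEP_HEIGHT:
        actor.x = new_x                     # small ledge: bump up onto it
        actor.y = floor
    # otherwise it's too tall; treat it as a wall and don't move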

Implementing this in a game without Doom’s notion of sectors is a little trickier. In fact, I still haven’t done it. Collision detection based on rejection gets it for free, kinda, but it’s not very deterministic and it breaks other things. But that’s a whole other post.

Gravity

Gravity is pretty easy. Everything accelerates downwards all the time. What’s interesting are the exceptions.

Jumping

Jumping is a giant hack.

Think about how actual jumping works: you tense your legs, which generally involves bending your knees first, and then spring upwards. In a platformer, you can just leap whenever you feel like it, which is nonsense. Also you go like twenty feet into the air?

Worse, most platformers allow variable-height jumping, where your jump is lower if you let go of the jump button while you’re in the air. Normally, one would expect to have to decide how much force to put into the jump beforehand.

But of course this is about convenience of controls: when jumping is your primary action, you want to be able to do it immediately, without any windup for how high you want to jump.

(And then there’s double jumping? Come on.)
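
One common way to get variable-height jumping, sketched below with made-up constants and a y-up convention: launch at full speed immediately, then cut the upward velocity when the button is released early.

JUMP_SPEED = 300.0    # initial upward speed, made-up units per second (y-up)
JUMP_CUTOFF = 0.5     # fraction of upward speed kept on an early release

def on_jump_pressed(player):
    if player.on_ground:
        player.vy = JUMP_SPEED        # instant launch, no windup

def on_jump_released(player):
    if player.vy > 0:                 # still on the way up
        player.vy *= JUMP_CUTOFF      # cut the jump short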

Air control is a similar phenomenon: usually you’d jump in a particular direction by controlling how you push off the ground with your feet, but in a video game, you don’t have feet! You only have the box. The compromise is to let you control your horizontal movement to a limited degree in midair, even though that doesn’t make any sense. (It’s way more fun, though, and overall gives you more movement options, which are good to have in an interactive medium.)

Air control also exposes an obvious place that game physics collide with the realistic model of serious physics engines. I’ve mentioned this before, but: if you use Real Physics™ and air control yourself into a wall, you might find that you’ll simply stick to the wall until you let go of the movement buttons. Why? Remember, player movement acts as though an external force were pushing you around (and from the perspective of a Real™ physics engine, this is exactly how you’d implement it) — so air-controlling into a wall is equivalent to pushing a book against a wall with your hand, and the friction with the wall holds you in place. Oops.

Ground sticking

Another place game physics conflict with physics engines is with running to the top of a slope. On a real hill, of course, you land on top of the slope and are probably glad of it; slopes are hard to climb!

An Eevee moves to the top of a slope, and rather than step onto the flat top, she goes flying off into the air

In a video game, you go flying. Because you’re a box. With momentum. So you hit the peak and keep going in the same direction. Which is diagonally upwards.

Projectiles

To make them more predictable, projectiles generally aren’t subject to gravity, at least as far as I’ve seen. The real world does not have such an exemption. The real world imposes gravity even on sniper rifles, which in a video game are often implemented as an instant trace unaffected by anything in the world because the bullet never actually exists in the world.

Resistance

Ah. Welcome to hell.

Water

Water is an interesting case, and offhand I don’t know the gritty details of how games implement it. In the real world, water applies a resistant drag force to movement — and that force is proportional to the square of velocity, which I’d completely forgotten until right now. I am almost positive that no game handles that correctly. But then, in real-world water, you can push against the water itself for movement, and games don’t simulate that either. What’s the rough equivalent?

The Sonic Physics Guide suggests that Sonic handles it by basically halving everything: acceleration, max speed, friction, etc. When Sonic enters water, his speed is cut; when Sonic exits water, his speed is increased.

That last bit feels validating — I could swear Metroid Prime did the same thing, and built my own solution around it, but couldn’t remember for sure. It makes no sense, of course, for a jump to become faster just because you happened to break the surface of the water, but it feels fantastic.

The thing I did was similar, except that I didn’t want to add a multiplier in a dozen places when you happen to be underwater (and remember which ones need it to be squared, etc.). So instead, I calculate everything completely as normal, so velocity is exactly the same as it would be on dry land — but the distance you would move gets halved. The effect seems to be pretty similar to most platformers with water, at least as far as I can tell. It hasn’t shown up in a published game and I only added this fairly recently, so I might be overlooking some reason this is a bad idea.

(One reason that comes to mind is that velocity is now a little white lie while underwater, so anything relying on velocity for interesting effects might be thrown off. Or maybe that’s correct, because velocity thresholds should be halved underwater too? Hm!)
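
A minimal sketch of that halved-displacement approach; the field names and the 0.5 factor are assumptions.

UNDERWATER_FACTOR = 0.5   # assumed halving factor

def integrate(entity, dt):
    # Velocity is updated exactly as on dry land...
    entity.vx += entity.ax * dt
    entity.vy += entity.ay * dt
    # ...but the distance actually travelled is scaled down while underwater.
    scale = UNDERWATER_FACTOR if entity.underwater else 1.0
    entity.x += entity.vx * dt * scale
    entity.y += entity.vy * dt * scale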

Notably, air is also a fluid, so it should behave the same way (just with different constants). I definitely don’t think any games apply air drag that’s proportional to the square of velocity.

Friction

Friction is, in my experience, a little handwaved. Probably because real-world friction is so darn complicated.

Consider that in the real world, we want very high friction on the surfaces we walk on — shoes and tires are explicitly designed to increase it, even. We move by bracing a back foot against the ground and using that to push ourselves forward, so we want the ground to resist our push as much as possible.

In a game world, we are a box. We move by being pushed by some invisible outside force, so if the friction between ourselves and the ground is too high, we won’t be able to move at all! That’s complete nonsense physically, but it turns out to be handy in some cases — for example, highish friction can simulate walking through deep mud, which should be difficult due to fluid drag and low friction.

But the best-known example of the fakeness of game friction is video game ice. Walking on real-world ice is difficult because the low friction means low grip; your feet are likely to slip out from under you, and you’ll simply fall down and have trouble moving at all. In a video game, you can’t fall down, so you have the opposite experience: you spend most of your time sliding around uncontrollably. Yet ice is so common in video games (and perhaps so uncommon in places I’ve lived) that I, at least, had never really thought about this disparity until an hour or so ago.

Game friction vs real-world friction

Real-world friction is a force. It’s the normal force (which is the force exerted by the object on the surface) times some constant that depends on how the two materials interact.

Force is mass times acceleration, and platformers often ignore mass, so friction ought to be an acceleration — applied against the object’s movement, but never enough to push it backwards.

I haven’t made any games where variable friction plays a significant role, but my gut instinct is that low friction should mean the player accelerates more slowly but has a higher max speed, and high friction should mean the opposite. I see from my own source code that I didn’t even do what I just said, so let’s defer to some better-made and well-documented games: Sonic and Doom.

In Sonic, friction is a fixed value subtracted from the player’s velocity (regardless of direction) each tic. Sonic has a fixed framerate, so the units are really pixels per tic squared (i.e. acceleration), multiplied by an implicit 1 tic per tic. So far, so good.

But Sonic’s friction only applies if the player isn’t pressing left or right. Hang on, that isn’t friction at all; that’s just deceleration! That’s equivalent to jogging to a stop. If friction were lower, Sonic would take longer to stop, but otherwise this is only tangentially related to friction.

(In fairness, this approach would decently emulate friction for non-conscious sliding objects, which are never going to be pressing movement buttons. Also, we don’t have the Sonic source code, and the name “friction” is a fan invention; the Sonic Physics Guide already uses “deceleration” to describe the player’s acceleration when turning around.)

Okay, let’s try Doom. In Doom, the default friction is 90.625%.

Hang on, what?

Yes, in Doom, friction is a multiplier applied every tic. Doom runs at 35 tics per second, so this is a multiplier of 0.032 per second. Yikes!

This isn’t anything remotely like real friction, but it’s much easier to implement. With friction as acceleration, the game has to know both the direction of movement (so it can apply friction in the opposite direction) and the magnitude (so it doesn’t overshoot and launch the object in the other direction). That means taking a semi-costly square root and also writing extra code to cap the amount of friction. With a multiplier, neither is necessary; just multiply the whole velocity vector and you’re done.
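
To make the trade-off concrete, here’s a sketch of both styles for a 2D velocity, with made-up constants: the acceleration version needs the square root and the cap, while the multiplier version is just two multiplies.

import math

FRICTION_ACCEL = 600.0    # made-up units/second², acceleration-style friction
FRICTION_MULT = 0.90625   # per-tic multiplier, Doom-style

def apply_friction_as_acceleration(vx, vy, dt):
    speed = math.sqrt(vx * vx + vy * vy)     # the semi-costly square root
    if speed == 0:
        return 0.0, 0.0
    drop = min(FRICTION_ACCEL * dt, speed)   # cap it so friction never reverses you
    scale = (speed - drop) / speed
    return vx * scale, vy * scale

def apply_friction_as_multiplier(vx, vy):
    # Cheap, but the units are hard to reason about (see below).
    return vx * FRICTION_MULT, vy * FRICTION_MULT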

There are some downsides. One is that objects will never actually stop, since multiplying by 3% repeatedly will never produce a result of zero — though eventually the speed will become small enough to either slip below a “minimum speed” threshold or simply no longer fit in a float representation. Another is that the units are fairly meaningless: with Doom’s default friction of 90.625%, about how long does it take for the player to stop? I have no idea, partly because “stop” is ambiguous here! If friction were an acceleration, I could divide it into the player’s max speed to get a time.

All that aside, what are the actual effects of changing Doom’s friction? What an excellent question that’s surprisingly tricky to answer. (Note that friction can’t be changed in original Doom, only in the Boom port and its derivatives.) Here’s what I’ve pieced together.

Doom’s “friction” is really two values. “Friction” itself is a multiplier applied to moving objects on every tic, but there’s also a move factor which defaults to \(\frac{1}{32} = 0.03125\) and is derived from friction for custom values.

Every tic, the player’s velocity is multiplied by friction, and then increased by their speed times the move factor.

$$
v(n) = v(n - 1) \times friction + speed \times move factor
$$

Eventually, the reduction from friction will balance out the speed boost. That happens when \(v(n) = v(n - 1)\), so we can rearrange it to find the player’s effective max speed:

$$
v = v \times friction + speed \times move factor \\
v - v \times friction = speed \times move factor \\
v = speed \times \frac{move factor}{1 - friction}
$$

For vanilla Doom’s move factor of 0.03125 and friction of 0.90625, that becomes:

$$
v = speed \times \frac{\frac{1}{32}}{1 - \frac{29}{32}} = speed \times \frac{\frac{1}{32}}{\frac{3}{32}} = \frac{1}{3} \times speed
$$

Curiously, “speed” is three times the maximum speed an actor can actually move. Doomguy’s run speed is 50, so in practice he moves a third of that, or 16⅔ units per tic. (Of course, this isn’t counting SR40, a bug that lets Doomguy run ~40% faster than intended diagonally.)

So now, what if you change friction? Even more curiously, the move factor is calculated completely differently depending on whether friction is higher or lower than the default Doom amount:

$$
move factor = \begin{cases}
\frac{133 - 128 \times friction}{544} &≈ 0.244 - 0.235 \times friction & \text{ if } friction \ge \frac{29}{32} \\
\frac{81920 \times friction - 70145}{1048576} &≈ 0.078 \times friction - 0.067 & \text{ otherwise }
\end{cases}
$$

That’s pretty weird? Complicating things further is that low friction (which means muddy terrain, remember) has an extra multiplier on its move factor, depending on how fast you’re already going — the idea is apparently that you have a hard time getting going, but it gets easier as you find your footing. The extra multiplier maxes out at 8, which makes the two halves of that function meet at the vanilla Doom value.

A graph of the relationship between friction and move factor

That very top point corresponds to the move factor from the original game. So no matter what you do to friction, the move factor becomes lower. At 0.85 and change, you can no longer move at all; below that, you move backwards.

From the formula above, it’s easy to see what changes to friction and move factor will do to Doomguy’s stable velocity. Move factor is in the numerator, so increasing it will increase stable velocity — but it can’t increase, so stable velocity can only ever decrease. Friction is in the denominator, but it’s subtracted from 1, so increasing friction will make the denominator a smaller value less than 1, i.e. increase stable velocity. Combined, we get this relationship between friction and stable velocity.

A graph showing stable velocity shooting up dramatically as friction increases

As friction approaches 1, stable velocity grows without bound. This makes sense, given the definition of \(v(n)\) — if friction is 1, the velocity from the previous tic isn’t reduced at all, so we just keep accelerating freely.

All of this is why I’m wary of using multipliers.

Anyway, this leaves me with one last question about the effects of Doom’s friction: how long does it take to reach stable velocity? Barring precision errors, we’ll never truly reach stable velocity, but let’s say within 5%. First we need a closed formula for the velocity after some number of tics. This is a simple recurrence relation, and you can write a few terms out yourself if you want to be sure this is right.

$$
v(n) = v_0 \times friction^n + speed \times move factor \times \frac{friction^n - 1}{friction - 1}
$$

Our initial velocity is zero, so the first term disappears. Set this equal to the stable formula and solve for n:

$$
speed \times move factor \times \frac{friction^n - 1}{friction - 1} = (1 - 5\%) \times speed \times \frac{move factor}{1 - friction} \\
friction^n - 1 = -(1 - 5\%) \\
n = \frac{\ln 5\%}{\ln friction}
$$

“Speed” and move factor disappear entirely, which makes sense, and this is purely a function of friction (and how close we want to get). For vanilla Doom, that comes out to 30.4, which is a little less than a second. For other values of friction:

A graph of time to stability which leaps upwards dramatically towards the right

As friction increases (which in Doom terms means the surface is more slippery), it takes longer and longer to reach stable speed, which is in turn greater and greater. For lesser friction (i.e. mud), stable speed is lower, but reached fairly quickly. (Of course, the extra “getting going” multiplier while in mud adds some extra time here, but including that in the graph is a bit more complicated.)
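
As a quick sanity check on the vanilla numbers, here’s a tiny simulation of the recurrence: it converges to speed ÷ 3 and crosses the 95% mark after 31 tics, which is the formula’s 30.4 rounded up.

FRICTION = 0.90625
MOVE_FACTOR = 1 / 32
SPEED = 50                    # Doomguy's run "speed"

stable = SPEED * MOVE_FACTOR / (1 - FRICTION)   # = SPEED / 3 ≈ 16.67 units per tic

v = 0.0
tics = 0
while v < 0.95 * stable:
    v = v * FRICTION + SPEED * MOVE_FACTOR
    tics += 1

print(stable, tics)   # ≈ 16.67, 31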

I think this matches with my instincts above. How fascinating!

What’s that? This is way too much math and you hate it? Then don’t use multipliers in game physics.

Uh

That was a hell of a diversion!

I guess the goofiest stuff in basic game physics is really just about mapping player controls to in-game actions like jumping and deceleration; the rest consists of hacks to compensate for representing everything as a box.

Why Raspberry Pi isn’t vulnerable to Spectre or Meltdown

Post Syndicated from Eben Upton original https://www.raspberrypi.org/blog/why-raspberry-pi-isnt-vulnerable-to-spectre-or-meltdown/

Over the last couple of days, there has been a lot of discussion about a pair of security vulnerabilities nicknamed Spectre and Meltdown. These affect all modern Intel processors, and (in the case of Spectre) many AMD processors and ARM cores. Spectre allows an attacker to bypass software checks to read data from arbitrary locations in the current address space; Meltdown allows an attacker to read arbitrary data from the operating system kernel’s address space (which should normally be inaccessible to user programs).

Both vulnerabilities exploit performance features (caching and speculative execution) common to many modern processors to leak data via a so-called side-channel attack. Happily, the Raspberry Pi isn’t susceptible to these vulnerabilities, because of the particular ARM cores that we use.

To help us understand why, here’s a little primer on some concepts in modern processor design. We’ll illustrate these concepts using simple programs in Python syntax like this one:

t = a+b
u = c+d
v = e+f
w = v+g
x = h+i
y = j+k

While the processor in your computer doesn’t execute Python directly, the statements here are simple enough that they roughly correspond to a single machine instruction. We’re going to gloss over some details (notably pipelining and register renaming) which are very important to processor designers, but which aren’t necessary to understand how Spectre and Meltdown work.

For a comprehensive description of processor design, and other aspects of modern computer architecture, you can’t do better than Hennessy and Patterson’s classic Computer Architecture: A Quantitative Approach.

What is a scalar processor?

The simplest sort of modern processor executes one instruction per cycle; we call this a scalar processor. Our example above will execute in six cycles on a scalar processor.

Examples of scalar processors include the Intel 486 and the ARM1176 core used in Raspberry Pi 1 and Raspberry Pi Zero.

What is a superscalar processor?

The obvious way to make a scalar processor (or indeed any processor) run faster is to increase its clock speed. However, we soon reach limits of how fast the logic gates inside the processor can be made to run; processor designers therefore quickly began to look for ways to do several things at once.

An in-order superscalar processor examines the incoming stream of instructions and tries to execute more than one at once, in one of several “pipes”, subject to dependencies between the instructions. Dependencies are important: you might think that a two-way superscalar processor could just pair up (or dual-issue) the six instructions in our example like this:

t, u = a+b, c+d
v, w = e+f, v+g
x, y = h+i, j+k

But this doesn’t make sense: we have to compute v before we can compute w, so the third and fourth instructions can’t be executed at the same time. Our two-way superscalar processor won’t be able to find anything to pair with the third instruction, so our example will execute in four cycles:

t, u = a+b, c+d
v    = e+f                   # second pipe does nothing here
w, x = v+g, h+i
y    = j+k

Examples of superscalar processors include the Intel Pentium, and the ARM Cortex-A7 and Cortex-A53 cores used in Raspberry Pi 2 and Raspberry Pi 3 respectively. Raspberry Pi 3 has only a 33% higher clock speed than Raspberry Pi 2, but has roughly double the performance: the extra performance is partly a result of Cortex-A53’s ability to dual-issue a broader range of instructions than Cortex-A7.

What is an out-of-order processor?

Going back to our example, we can see that, although we have a dependency between v and w, we have other independent instructions later in the program that we could potentially have used to fill the empty pipe during the second cycle. An out-of-order superscalar processor has the ability to shuffle the order of incoming instructions (again subject to dependencies) in order to keep its pipelines busy.

An out-of-order processor might effectively swap the definitions of w and x in our example like this:

t = a+b
u = c+d
v = e+f
x = h+i
w = v+g
y = j+k

allowing it to execute in three cycles:

t, u = a+b, c+d
v, x = e+f, h+i
w, y = v+g, j+k

Examples of out-of-order processors include the Intel Pentium 2 (and most subsequent Intel and AMD x86 processors), and many recent ARM cores, including Cortex-A9, -A15, -A17, and -A57.

What is speculation?

Reordering sequential instructions is a powerful way to recover more instruction-level parallelism, but as processors become wider (able to triple- or quadruple-issue instructions) it becomes harder to keep all those pipes busy. Modern processors have therefore grown the ability to speculate. Speculative execution lets us issue instructions which might turn out not to be required (because they are branched over): this keeps a pipe busy, and if it turns out that the instruction isn’t executed, we can just throw the result away.

To demonstrate the benefits of speculation, let’s look at another example:

t = a+b
u = t+c
v = u+d
if v:
   w = e+f
   x = w+g
   y = x+h

Now we have dependencies from t to u to v, and from w to x to y, so a two-way out-of-order processor without speculation won’t ever be able to fill its second pipe. It spends three cycles computing t, u, and v, after which it knows whether the body of the if statement will execute, in which case it then spends three cycles computing w, x, and y. Assuming the if (a branch instruction) takes one cycle, our example takes either four cycles (if v turns out to be zero) or seven cycles (if v is non-zero).

Speculation effectively shuffles the program like this:

t = a+b
u = t+c
v = u+d
w_ = e+f
x_ = w_+g
y_ = x_+h
if v:
   w, x, y = w_, x_, y_

so we now have additional instruction level parallelism to keep our pipes busy:

t, w_ = a+b, e+f
u, x_ = t+c, w_+g
v, y_ = u+d, x_+h
if v:
   w, x, y = w_, x_, y_

Cycle counting becomes less well defined in speculative out-of-order processors, but the branch and conditional update of w, x, and y are (approximately) free, so our example executes in (approximately) three cycles.

What is a cache?

In the good old days*, the speed of processors was well matched with the speed of memory access. My BBC Micro, with its 2MHz 6502, could execute an instruction roughly every 2µs (microseconds), and had a memory cycle time of 0.25µs. Over the ensuing 35 years, processors have become very much faster, but memory only modestly so: a single Cortex-A53 in a Raspberry Pi 3 can execute an instruction roughly every 0.5ns (nanoseconds), but can take up to 100ns to access main memory.

At first glance, this sounds like a disaster: every time we access memory, we’ll end up waiting for 100ns to get the result back. In this case, this example:

a = mem[0]
b = mem[1]

would take 200ns.

In practice, programs tend to access memory in relatively predictable ways, exhibiting both temporal locality (if I access a location, I’m likely to access it again soon) and spatial locality (if I access a location, I’m likely to access a nearby location soon). Caching takes advantage of these properties to reduce the average cost of access to memory.

A cache is a small on-chip memory, close to the processor, which stores copies of the contents of recently used locations (and their neighbours), so that they are quickly available on subsequent accesses. With caching, the example above will execute in a little over 100ns:

a = mem[0]    # 100ns delay, copies mem[0:15] into cache
b = mem[1]    # mem[1] is in the cache

From the point of view of Spectre and Meltdown, the important point is that if you can time how long a memory access takes, you can determine whether the address you accessed was in the cache (short time) or not (long time).

What is a side channel?

From Wikipedia:

“… a side-channel attack is any attack based on information gained from the physical implementation of a cryptosystem, rather than brute force or theoretical weaknesses in the algorithms (compare cryptanalysis). For example, timing information, power consumption, electromagnetic leaks or even sound can provide an extra source of information, which can be exploited to break the system.”

Spectre and Meltdown are side-channel attacks which deduce the contents of a memory location which should not normally be accessible by using timing to observe whether another location is present in the cache.

Putting it all together

Now let’s look at how speculation and caching combine to permit the Meltdown attack. Consider the following example, which is a user program that sometimes reads from an illegal (kernel) address:

t = a+b
u = t+c
v = u+d
if v:
   w = kern_mem[address]   # if we get here crash
   x = w&0x100
   y = user_mem[x]

Now our out-of-order two-way superscalar processor shuffles the program like this:

t, w_ = a+b, kern_mem[address]
u, x_ = t+c, w_&0x100
v, y_ = u+d, user_mem[x_]

if v:
   # crash
   w, x, y = w_, x_, y_      # we never get here

Even though the processor always speculatively reads from the kernel address, it must defer the resulting fault until it knows that v was non-zero. On the face of it, this feels safe because either:

  • v is zero, so the result of the illegal read isn’t committed to w
  • v is non-zero, so the program crashes before the read is committed to w

However, suppose we flush our cache before executing the code, and arrange a, b, c, and d so that v is zero. Now, the speculative load in the third cycle:

v, y_ = u+d, user_mem[x_]

will read from either address 0x000 or address 0x100 depending on the eighth bit of the result of the illegal read. Because v is zero, the results of the speculative instructions will be discarded, and execution will continue. If we time a subsequent access to one of those addresses, we can determine which address is in the cache. Congratulations: you’ve just read a single bit from the kernel’s address space!

The real Meltdown exploit is more complex than this, but the principle is the same. Spectre uses a similar approach to subvert software array bounds checks.

Conclusion

Modern processors go to great lengths to preserve the abstraction that they are in-order scalar machines that access memory directly, while in fact using a host of techniques including caching, instruction reordering, and speculation to deliver much higher performance than a simple processor could hope to achieve. Meltdown and Spectre are examples of what happens when we reason about security in the context of that abstraction, and then encounter minor discrepancies between the abstraction and reality.

The lack of speculation in the ARM1176, Cortex-A7, and Cortex-A53 cores used in Raspberry Pi renders us immune to attacks of this sort.

* days may not be that old, or that good

The post Why Raspberry Pi isn’t vulnerable to Spectre or Meltdown appeared first on Raspberry Pi.

Random with care

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/02/random-with-care/

Hi! Here are a few loose thoughts about picking random numbers.

A word about crypto

DON’T ROLL YOUR OWN CRYPTO

This is all aimed at frivolous pursuits like video games. Hell, even video games where money is at stake should be deferring to someone who knows way more than I do. Otherwise you might find out that your deck shuffles in your poker game are woefully inadequate and some smartass is cheating you out of millions. (If your random number generator has fewer than 226 bits of state, it can’t even generate every possible shuffling of a deck of cards!)

Use the right distribution

Most languages have a random number primitive that spits out a number uniformly in the range [0, 1), and you can go pretty far with just that. But beware a few traps!

Random pitches

Say you want to pitch up a sound by a random amount, perhaps up to an octave. Your audio API probably has a way to do this that takes a pitch multiplier, where I say “probably” because that’s how the only audio API I’ve used works.

Easy peasy. If 1 is unchanged and 2 is pitched up by an octave, then all you need is rand() + 1. Right?

No! Pitch is exponential — within the same octave, the “gap” between C and C♯ is about half as big as the gap between B and the following C. If you pick a pitch multiplier uniformly, you’ll have a noticeable bias towards the higher pitches.

One octave corresponds to a doubling of pitch, so if you want to pick a random note, you want 2 ** rand().
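
In Python that’s just the following; the semitone variant is my own addition, in case you want discrete notes rather than a continuous sweep.

import random

def random_pitch_multiplier():
    # Uniform in pitch (not in multiplier), spanning up to one octave above
    return 2 ** random.random()

def random_semitone_multiplier():
    # Same idea, snapped to one of the 12 equal-tempered semitones
    return 2 ** (random.randrange(12) / 12)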

Random directions

For two dimensions, you can just pick a random angle with rand() * TAU.

If you want a vector rather than an angle, or if you want a random direction in three dimensions, it’s a little trickier. You might be tempted to just pick a random point where each component is rand() * 2 - 1 (ranging from −1 to 1), but that’s not quite right. A direction is a point on the surface (or, equivalently, within the volume) of a sphere, and picking each component independently produces a point within the volume of a cube; the result will be a bias towards the corners of the cube, where there’s much more extra volume beyond the sphere.

No? Well, just trust me. I don’t know how to make a diagram for this.

Anyway, you could use the Pythagorean theorem a few times and make a huge mess of things, or it turns out there’s a really easy way that even works for two or four or any number of dimensions. You pick each coordinate from a Gaussian (normal) distribution, then normalize the resulting vector. In other words, using Python’s random module:

import math
import random

def random_direction():
    # Pick each coordinate from a Gaussian, then normalize the vector.
    x = random.gauss(0, 1)
    y = random.gauss(0, 1)
    z = random.gauss(0, 1)
    r = math.sqrt(x*x + y*y + z*z)
    return x/r, y/r, z/r

Why does this work? I have no idea!

Note that it is possible to get zero (or close to it) for every component, in which case the result is nonsense. You can re-roll all the components if necessary; just check that the magnitude (or its square) is less than some epsilon, which is equivalent to throwing away a tiny sphere at the center and shouldn’t affect the distribution.

Beware Gauss

Since I brought it up: the Gaussian distribution is a pretty nice one for choosing things in some range, where the middle is the common case and should appear more frequently.

That said, I never use it, because it has one annoying drawback: the Gaussian distribution has no minimum or maximum value, so you can’t really scale it down to the range you want. In theory, you might get any value out of it, with no limit on scale.

In practice, it’s astronomically rare to actually get such a value out. I did a hundred million trials just to see what would happen, and the largest value produced was 5.8.

But, still, I’d rather not knowingly put extremely rare corner cases in my code if I can at all avoid it. I could clamp the ends, but that would cause unnatural bunching at the endpoints. I could reroll if I got a value outside some desired range, but I prefer to avoid rerolling when I can, too; after all, it’s still (astronomically) possible to have to reroll for an indefinite amount of time. (Okay, it’s really not, since you’ll eventually hit the period of your PRNG. Still, though.) I don’t bend over backwards here — I did just say to reroll when picking a random direction, after all — but when there’s a nicer alternative I’ll gladly use it.

And lo, there is a nicer alternative! Enter the beta distribution. It always spits out a number in [0, 1], so you can easily swap it in for the standard normal function, but it takes two “shape” parameters α and β that alter its behavior fairly dramatically.

With α = β = 1, the beta distribution is uniform, i.e. no different from rand(). As α increases, the distribution skews towards the right, and as β increases, the distribution skews towards the left. If α = β, the whole thing is symmetric with a hump in the middle. The higher either one gets, the more extreme the hump (meaning that value is far more common than any other). With a little fiddling, you can get a number of interesting curves.

Screenshots don’t really do it justice, so here’s a little Wolfram widget that lets you play with α and β live:

Note that if α = 1, then 0 is a possible value; if β = 1, then 1 is a possible value. You probably want them both greater than 1, which clamps the endpoints to zero.

Also, it’s possible to have either α or β or both be less than 1, but this creates very different behavior: the corresponding endpoints become poles.

Anyway, something like α = β = 3 is probably close enough to normal for most purposes but already clamped for you. And you could easily replicate something like, say, NetHack’s incredibly bizarre rnz function.
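
Conveniently, Python’s random module already ships this as random.betavariate, so a “mostly middling” roll is a one-liner (α = β = 3 here is just the ballpark value mentioned above, nothing canonical):

import random

def middling_roll():
    # Beta(3, 3): a symmetric hump around 0.5, always within [0, 1].
    return random.betavariate(3, 3)

# Scale to whatever range you actually want, e.g. damage between 5 and 15:
damage = 5 + middling_roll() * 10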

Random frequency

Say you want some event to have an 80% chance to happen every second. You (who am I kidding, I) might be tempted to do something like this:

if random() < 0.8 * dt:
    do_thing()

In an ideal world, dt is always the same and is equal to 1 / f, where f is the framerate. Replace that 80% with a variable, say P, and every tic you have a P / f chance to do the… whatever it is.

Each second, f tics pass, so you’ll make this check f times. The chance that any check succeeds is the complement of the chance that every check fails, which is \(1 - \left(1 - \frac{P}{f}\right)^f\).

For P of 80% and a framerate of 60, that’s a total probability of 55.3%. Wait, what?

Consider what happens if the framerate is 2. Each second you make the check twice, each time with a 0.4 chance — but probabilities are combined by multiplying, and splitting work up by dt only works for additive quantities. You lose some accuracy along the way. If you’re dealing with something that multiplies, you need an exponent somewhere.

But in this case, maybe you don’t want that at all. Each separate roll you make might independently succeed, so it’s possible (but very unlikely) that the event will happen 60 times within a single second! Or 200 times, if that’s someone’s framerate.

If you explicitly want something to have a chance to happen on a specific interval, you have to check on that interval. If you don’t have a gizmo handy to run code on an interval, it’s easy to do yourself with a time buffer:

timer += dt
# here, 1 is the "every 1 seconds"
while timer > 1:
    timer -= 1
    if random() < 0.8:
        do_thing()

Using while means rolls still happen even if you somehow skipped over an entire second.

(For the curious, and the nerds who already noticed: the expression \(1 - \left(1 - \frac{P}{f}\right)^f\) converges to a specific value! As the framerate increases, it becomes a better and better approximation for \(1 - e^{-P}\), which for the example above is 0.551. Hey, 60 fps is pretty accurate — it’s just accurately representing something nowhere near what I wanted. Er, you wanted.)
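
If you do want a per-tic check that stays honest regardless of framerate, the “exponent somewhere” fix is to raise the per-second miss chance to the dt power. A sketch (the function name is mine):

import random

def should_do_thing(dt, p_per_second=0.8):
    # The chance of the event NOT firing over dt seconds is (1 - p) ** dt,
    # so the chance of it firing during this particular tic is the complement.
    return random.random() < 1 - (1 - p_per_second) ** dt

Over any one second this multiplies out to exactly the per-second probability you asked for, whatever the framerate, though it still allows the event to fire several times in a second; if that matters, stick with the timer above.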

Rolling your own

Of course, you can fuss with the classic [0, 1] uniform value however you want. If I want a bias towards zero, I’ll often just square it, or multiply two of them together. If I want a bias towards one, I’ll take a square root. If I want something like a Gaussian/normal distribution, but with clearly-defined endpoints, I might add together n rolls and divide by n. (The normal distribution is just what you get if you roll infinite dice and divide by infinity!)
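
In code, those tweaks are all one-liners (a sketch; n = 3 is an arbitrary choice):

import random

u = random.random()

biased_toward_zero = u * u        # or multiply two separate rolls
biased_toward_one = u ** 0.5
n = 3
bell_ish = sum(random.random() for _ in range(n)) / n   # always within [0, 1]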

It’d be nice to be able to understand exactly what this will do to the distribution. Unfortunately, that requires some calculus, which this post is too small to contain, and which I didn’t even know much about myself until I went down a deep rabbit hole while writing, and which in many cases is straight up impossible to express directly.

Here’s the non-calculus bit. A source of randomness is often graphed as a PDF — a probability density function. You’ve almost certainly seen a bell curve graphed, and that’s a PDF. They’re pretty nice, since they do exactly what they look like: they show the relative chance that any given value will pop out. On a bog standard bell curve, there’s a peak at zero, and of course zero is the most common result from a normal distribution.

(Okay, actually, since the results are continuous, it’s vanishingly unlikely that you’ll get exactly zero — but you’re much more likely to get a value near zero than near any other number.)

For the uniform distribution, which is what a classic rand() gives you, the PDF is just a straight horizontal line — every result is equally likely.


If there were a calculus bit, it would go here! Instead, we can cheat. Sometimes. Mathematica knows how to work with probability distributions in the abstract, and there’s a free web version you can use. For the example of squaring a uniform variable, try this out:

PDF[TransformedDistribution[u^2, u \[Distributed] UniformDistribution[{0, 1}]], u]

(The \[Distributed] is a funny tilde that doesn’t exist in Unicode, but which Mathematica uses as a first-class operator. Also, press Shift-Enter to evaluate the line.)

This will tell you that the distribution is… \(\frac{1}{2\sqrt{u}}\). Weird! You can plot it:

Plot[%, {u, 0, 1}]

(The % refers to the result of the last thing you did, so if you want to try several of these, you can just do Plot[PDF[…], u] directly.)

The resulting graph shows that numbers around zero are, in fact, vastly — infinitely — more likely than anything else.

What about multiplying two together? I can’t figure out how to get Mathematica to understand this, but a great amount of digging revealed that the answer is \(-\ln x\), and from there you can plot them both on Wolfram Alpha. They’re similar, though squaring has a much better chance of giving you high numbers than multiplying two separate rolls — which makes some sense, since if either of two rolls is a low number, the product will be even lower.

What if you know the graph you want, and you want to figure out how to play with a uniform roll to get it? Good news! That’s a whole thing called inverse transform sampling. All you have to do is take an integral. Good luck!


This is all extremely ridiculous. New tactic: Just Simulate The Damn Thing. You already have the code; run it a million times, make a histogram, and tada, there’s your PDF. That’s one of the great things about computers! Brute-force numerical answers are easy to come by, so there’s no excuse for producing something like rnz. (Though, be sure your histogram has sufficiently narrow buckets — I tried plotting one for rnz once and the weird stuff on the left side didn’t show up at all!)
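
For the squaring example, that brute-force histogram might look something like this (a sketch; the bucket count and bar scaling are arbitrary):

import random

BUCKETS = 20
TRIALS = 1000000
counts = [0] * BUCKETS

for _ in range(TRIALS):
    value = random.random() ** 2
    counts[int(value * BUCKETS)] += 1

for i, count in enumerate(counts):
    # Each row is one bucket of the empirical PDF, smallest values at the top.
    print(f"{i / BUCKETS:.2f} | {'#' * (count * 100 // TRIALS)}")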

By the way, I learned something from futzing with Mathematica here! Taking the square root (to bias towards 1) gives a PDF that’s a straight diagonal line, nothing like the hyperbola you get from squaring (to bias towards 0). How do you get a straight line the other way? Surprise: \(1 - \sqrt{1 - u}\).

Okay, okay, here’s the actual math

I don’t claim to have a very firm grasp on this, but I had a hell of a time finding it written out clearly, so I might as well write it down as best I can. This was a great excuse to finally set up MathJax, too.

Say \(u(x)\) is the PDF of the original distribution and \(u\) is a representative number you plucked from that distribution. For the uniform distribution, \(u(x) = 1\). Or, more accurately,

$$
u(x) = \begin{cases}
1 & \text{ if } 0 \le x \lt 1 \\
0 & \text{ otherwise }
\end{cases}
$$

Remember that \(x\) here is a possible outcome you want to know about, and the PDF tells you the relative probability that a roll will be near it. This PDF spits out 1 for every \(x\), meaning every number between 0 and 1 is equally likely to appear.

We want to do something to that PDF, which creates a new distribution, whose PDF we want to know. I’ll use my original example of \(f(u) = u^2\), which creates a new PDF \(v(x)\).

The trick is that we need to work in terms of the cumulative distribution function for \(u\). Where the PDF gives the relative chance that a roll will be (“near”) a specific value, the CDF gives the relative chance that a roll will be less than a specific value.

The conventions for this seem to be a bit fuzzy, and nobody bothers to explain which ones they’re using, which makes this all the more confusing to read about… but let’s write the CDF with a capital letter, so we have \(U(x)\). In this case, \(U(x) = x\), a straight 45° line (at least between 0 and 1). With the definition I gave, this should make sense. At some arbitrary point like 0.4, the value of the PDF is 1 (0.4 is just as likely as anything else), and the value of the CDF is 0.4 (you have a 40% chance of getting a number from 0 to 0.4).

Calculus ahoy: the PDF is the derivative of the CDF, which means it measures the slope of the CDF at any point. For \(U(x) = x\), the slope is always 1, and indeed \(u(x) = 1\). See, calculus is easy.

Okay, so, now we’re getting somewhere. What we want is the CDF of our new distribution, \(V(x)\). The CDF is defined as the probability that a roll \(v\) will be less than \(x\), so we can literally write:

$$V(x) = P(v \le x)$$

(This is why we have to work with CDFs, rather than PDFs — a PDF gives the chance that a roll will be “nearby,” whatever that means. A CDF is much more concrete.)

What is \(v\), exactly? We defined it ourselves; it’s the do something applied to a roll from the original distribution, or \(f(u)\).

$$V(x) = P\!\left(f(u) \le x\right)$$

Now the first tricky part: we have to solve that inequality for \(u\), which means we have to do something, backwards to \(x\).

$$V(x) = P\!\left(u \le f^{-1}(x)\right)$$

Almost there! We now have a probability that \(u\) is less than some value, and that’s the definition of a CDF!

$$V(x) = U\!\left(f^{-1}(x)\right)$$

Hooray! Now to turn these CDFs back into PDFs, all we need to do is differentiate both sides and use the chain rule. If you never took calculus, don’t worry too much about what that means!

$$v(x) = u\!\left(f^{-1}(x)\right)\left|\frac{d}{dx}f^{-1}(x)\right|$$

Wait! Where did that absolute value come from? It takes care of whether \(f(x)\) increases or decreases. It’s the least interesting part here by far, so, whatever.

There’s one more magical part here when using the uniform distribution — \(u(\dots)\) is always equal to 1, so that entire term disappears! (Note that this only works for a uniform distribution with a width of 1; PDFs are scaled so the entire area under them sums to 1, so if you had a rand() that could spit out a number between 0 and 2, the PDF would be \(u(x) = \frac{1}{2}\).)

$$v(x) = \left|\frac{d}{dx}f^{-1}(x)\right|$$

So for the specific case of modifying the output of rand(), all we have to do is invert, then differentiate. The inverse of \(f(u) = u^2\) is \(f^{-1}(x) = \sqrt{x}\) (no need for a ± since we’re only dealing with positive numbers), and differentiating that gives \(v(x) = \frac{1}{2\sqrt{x}}\). Done! This is also why square root comes out nicer; inverting it gives \(x^2\), and differentiating that gives \(2x\), a straight line.

Incidentally, that method for turning a uniform distribution into any distribution — inverse transform sampling — is pretty much the same thing in reverse: integrate, then invert. For example, when I saw that taking the square root gave \(v(x) = 2x\), I naturally wondered how to get a straight line going the other way, \(v(x) = 2 - 2x\). Integrating that gives \(2x - x^2\), and then you can use the quadratic formula (or just ask Wolfram Alpha) to solve \(2x - x^2 = u\) for \(x\) and get \(f(u) = 1 - \sqrt{1 - u}\).
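
Here’s that result as code, in case you want the “straight line the other way” without re-deriving it (a sketch):

import random

def biased_toward_zero_linear():
    # Inverse transform sampling for the target PDF v(x) = 2 - 2x:
    # the CDF is 2x - x**2, and inverting that gives 1 - sqrt(1 - u).
    return 1 - (1 - random.random()) ** 0.5

Values near 0 come out roughly twice as often as values near ½, and values near 1 hardly ever.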

Multiplying two rolls is a bit more complicated; you have to write out the CDF as an integral and you end up doing a double integral and wow it’s a mess. The only thing I’ve retained is that you do a division somewhere, which then gets integrated, and that’s why it ends up as \(-\ln x\).

And that’s quite enough of that! (Okay but having math in my blog is pretty cool and I will definitely be doing more of this, sorry, not sorry.)

Random vs varied

Sometimes, random isn’t actually what you want. We tend to use the word “random” casually to mean something more like chaotic, i.e., with no discernible pattern. But that’s not really random. In fact, given how good humans can be at finding incidental patterns, they aren’t all that unlikely! Consider that when you roll two dice, they’ll come up either the same or only one apart almost half the time. Coincidence? Well, yes.

If you ask for randomness, you’re saying that any outcome — or series of outcomes — is acceptable, including five heads in a row or five tails in a row. Most of the time, that’s fine. Some of the time, it’s less fine, and what you really want is variety. Here are a couple examples and some fairly easy workarounds.

NPC quips

The nature of games is such that NPCs will eventually run out of things to say, at which point further conversation will give the player a short brush-off quip — a slight nod from the designer to the player that, hey, you hit the end of the script.

Some NPCs have multiple possible quips and will give one at random. The trouble with this is that it’s very possible for an NPC to repeat the same quip several times in a row before abruptly switching to another one. With only a few options to choose from, getting the same option twice or thrice (especially across an entire game, which may have numerous NPCs) isn’t all that unlikely. The notion of an NPC quip isn’t very realistic to start with, but having someone repeat themselves and then abruptly switch to something else is especially jarring.

The easy fix is to show the quips in order! Paradoxically, this is more consistently varied than choosing at random — the original “order” is likely to be meaningless anyway, and it already has the property that the same quip can never appear twice in a row.

If you like, you can shuffle the list of quips every time you reach the end, but take care here — it’s possible that the last quip in the old order will be the same as the first quip in the new order, so you may still get a repeat. (Of course, you can just check for this case and swap the first quip somewhere else if it bothers you.)

That last behavior is, in fact, the canonical way that Tetris chooses pieces — the game simply shuffles a list of all 7 pieces, gives those to you in shuffled order, then shuffles them again to make a new list once it’s exhausted. There’s no avoidance of duplicates, though, so you can still get two S blocks in a row, or even two S and two Z all clumped together, but no more than that. Some Tetris variants take other approaches, such as actively avoiding repeats even several pieces apart or deliberately giving you the worst piece possible.
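
Here’s what a minimal shuffle bag might look like, with the optional swap to avoid a repeat across the reshuffle boundary (a sketch, not taken from any particular game):

import random

class ShuffleBag:
    def __init__(self, items):
        self.items = list(items)
        self.queue = []
        self.last = None

    def draw(self):
        if not self.queue:
            self.queue = self.items[:]
            random.shuffle(self.queue)
            # Optional: don't repeat the previous item across the reshuffle.
            if len(self.queue) > 1 and self.queue[-1] == self.last:
                self.queue[0], self.queue[-1] = self.queue[-1], self.queue[0]
        self.last = self.queue.pop()
        return self.last

quips = ShuffleBag(["Hello again.", "Busy day, huh?", "Hm?"])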

Random drops

Random drops are often implemented as a flat chance each time. Maybe enemies have a 5% chance to drop health when they die. Mathematically speaking, over the long term, a player will see health drops for about 5% of enemy kills.

Over the short term, they may be desperate for health and not survive to see the long term. So you may want to put a thumb on the scale sometimes. Games in the Metroid series, for example, have a somewhat infamous bias towards whatever kind of drop they think you need — health if your health is low, missiles if your missiles are low.

I can’t give you an exact approach to use, since it depends on the game and the feeling you’re going for and the variables at your disposal. In extreme cases, you might want to guarantee a health drop from a tough enemy when the player is critically low on health. (Or if you’re feeling particularly evil, you could go the other way and deny the player health when they most need it…)
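
As one illustration only (every number and field name here is made up), a drop routine that leans toward health when the player is hurting might look like:

import random

def pick_drop(player):
    # Hypothetical fields: player.hp and player.max_hp.
    health_fraction = player.hp / player.max_hp
    # Base 5% chance, nudged up to 20% as health approaches zero.
    health_chance = 0.05 + 0.15 * (1 - health_fraction)
    if random.random() < health_chance:
        return "health"
    return None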

The problem becomes a little different, and worse, when the event that triggers the drop is relatively rare. The pathological case here would be something like a raid boss in World of Warcraft, which requires hours of effort from a coordinated group of people to defeat, and which has some tiny chance of dropping a good item that will go to only one of those people. This is why I stopped playing World of Warcraft at 60.

Dialing it back a little bit gives us Enter the Gungeon, a roguelike where each room is a set of encounters and each floor only has a dozen or so rooms. Initially, you have a 1% chance of getting a reward after completing a room — but every time you complete a room and don’t get a reward, the chance increases by 9%, up to a cap of 80%. Once you get a reward, the chance resets to 1%.

The natural question is: how frequently, exactly, can a player expect to get a reward? We could do math, or we could Just Simulate The Damn Thing.

from collections import Counter
import random

histogram = Counter()

TRIALS = 1000000
chance = 1
rooms_cleared = 0
rewards_found = 0
while rewards_found < TRIALS:
    rooms_cleared += 1
    if random.random() * 100 < chance:
        # Reward!
        rewards_found += 1
        histogram[rooms_cleared] += 1
        rooms_cleared = 0
        chance = 1
    else:
        chance = min(80, chance + 9)

for gaps, count in sorted(histogram.items()):
    print(f"{gaps:3d} | {count / TRIALS * 100:6.2f}%", '#' * (count // (TRIALS // 100)))
  1 |   0.98%
  2 |   9.91% #########
  3 |  17.00% ################
  4 |  20.23% ####################
  5 |  19.21% ###################
  6 |  15.05% ###############
  7 |   9.69% #########
  8 |   5.07% #####
  9 |   2.09% ##
 10 |   0.63%
 11 |   0.12%
 12 |   0.03%
 13 |   0.00%
 14 |   0.00%
 15 |   0.00%

We’ve got kind of a hilly distribution, skewed to the left, which is up in this histogram. Most of the time, a player should see a reward every three to six rooms, which is maybe twice per floor. It’s vanishingly unlikely to go through a dozen rooms without ever seeing a reward, so a player should see at least one per floor.

Of course, this simulated a single continuous playthrough; when starting the game from scratch, your chance at a reward always starts fresh at 1%, the worst it can be. If you want to know about how many rewards a player will get on the first floor, hey, Just Simulate The Damn Thing.
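
That simulation might look something like this (a sketch reconstructed from the rules above, assuming a fixed 12 rooms per floor):

from collections import Counter
import random

histogram = Counter()
TRIALS = 1000000
ROOMS_PER_FLOOR = 12

for _ in range(TRIALS):
    chance = 1
    rewards = 0
    for _ in range(ROOMS_PER_FLOOR):
        if random.random() * 100 < chance:
            rewards += 1
            chance = 1
        else:
            chance = min(80, chance + 9)
    histogram[rewards] += 1

for rewards, count in sorted(histogram.items()):
    print(f"{rewards:3d} | {count / TRIALS * 100:6.2f}%", '#' * (count // (TRIALS // 100)))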

  0 |   0.01%
  1 |  13.01% #############
  2 |  56.28% ########################################################
  3 |  27.49% ###########################
  4 |   3.10% ###
  5 |   0.11%
  6 |   0.00%

Cool. Though, that’s assuming exactly 12 rooms; it might be worth changing that to pick at random in a way that matches the level generator.

(Enter the Gungeon does some other things to skew probability, which is very nice in a roguelike where blind luck can make or break you. For example, if you kill a boss without having gotten a new gun anywhere else on the floor, the boss is guaranteed to drop a gun.)

Critical hits

I suppose this is the same problem as random drops, but backwards.

Say you have a battle sim where every attack has a 6% chance to land a devastating critical hit. Presumably the same rules apply to both the player and the AI opponents.

Consider, then, that the AI opponents have exactly the same 6% chance to ruin the player’s day. Consider also that this gives them roughly a 0.4% chance (0.06 × 0.06 = 0.36%) to critical hit twice in a row. 0.4% doesn’t sound like much, but across an entire playthrough, it’s not unlikely that a player might see it happen and find it incredibly annoying.

Perhaps it would be worthwhile to explicitly forbid AI opponents from getting consecutive critical hits.
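
One way to do that might look like this (a sketch; is_ai and last_hit_was_crit are made-up fields on whatever your attacker object is):

import random

CRIT_CHANCE = 0.06

def roll_crit(attacker):
    # Flatly deny back-to-back crits for AI attackers.
    if attacker.is_ai and attacker.last_hit_was_crit:
        attacker.last_hit_was_crit = False
        return False
    crit = random.random() < CRIT_CHANCE
    attacker.last_hit_was_crit = crit
    return crit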

In conclusion

An emerging theme here has been to Just Simulate The Damn Thing. So consider Just Simulating The Damn Thing. Even a simple change to a random value can do surprising things to the resulting distribution, so unless you feel like differentiating the inverse function of your code, maybe test out any non-trivial behavior and make sure it’s what you wanted. Probability is hard to reason about.