Tag Archives: LAPS

Community profile: Dave Akerman

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/community-profile-dave-akerman/

This column is from The MagPi issue 61. You can download a PDF of the full issue for free, or subscribe to receive the print edition through your letterbox or the digital edition on your tablet. All proceeds from the print and digital editions help the Raspberry Pi Foundation achieve our charitable goals.

The pinned tweet on Dave Akerman’s Twitter account shows a table displaying the various components needed for a high-altitude balloon (HAB) flight. Batteries, leads, a camera and Raspberry Pi, plus an unusually themed payload. The caption reads ‘The Queen, The Duke of York, and my TARDIS”, and sums up Dave’s maker career in a heartbeat.

David Akerman on Twitter

The Queen, The Duke of York, and my TARDIS 🙂 #UKHAS #RaspberryPi

Though writing software for industrial automation pays the bills, the majority of Dave’s time is spent in the world of high-altitude ballooning and the ever-growing community that encompasses it. And, while he makes some money sending business-themed balloons to near space for the likes of Aardman Animations, Confused.com, and the BBC, Dave is best known in the Raspberry Pi community for his use of the small computer in every payload, and his work as a tutor alongside the Foundation’s staff at Skycademy events.

Dave Akerman The MagPi Raspberry Pi Community Profile

Dave continues to help others while breaking records and having a good time exploring the atmosphere.

Dave has dedicated many hours and many, many more miles to assist with the Foundation’s Skycademy programme, helping to explore high-altitude ballooning with educators from across the UK. Using a Raspberry Pi and various other pieces of lightweight tech, Dave and Foundation staff member James Robinson explored the incorporation of high-altitude ballooning into education. Through Skycademy, educators were able to learn new skills and take them to the classroom, setting off their own balloons with their students, and recording the results on Raspberry Pis.

Dave Akerman The MagPi Raspberry Pi Community Profile

Dave’s most recent flight broke a new record. On 13 August 2017, his HAB payload was able to send back the highest images taken by any amateur flight.

But education isn’t the only reason for Dave’s involvement in the HAB community. As with anyone passionate about a specific hobby, Dave strives to break records. The most recent record-breaking flight took place on 13 August 2017, when Dave’s Raspberry Pi Zero HAB sent home the highest images taken by any amateur high-altitude balloon launch: at 43014 metres. No other HAB balloon has provided images from such an altitude, and the lightweight nature of the Pi Zero definitely helped, as Dave went on to mention on Twitter a few days later.

Dave Akerman The MagPi Raspberry Pi Community Profile

Dave is recognised as being the first person to incorporate a Raspberry Pi into a HAB payload, and continues to break records with the help of the little green board. More recently, he’s been able to lighten the load by using the Raspberry Pi Zero.

When the first Pi made its way to near space, Dave tore the computer apart in order to meet the weight restriction. The Pi in the Sky board was created to add the extra features needed for the flight. Since then, the HAT has experienced a few changes.

Dave Akerman The MagPi Raspberry Pi Community Profile

The Pi in the Sky board, created specifically for HAB flights.

Dave first fell in love with high-altitude ballooning after coming across the hobby in a video shared on a photographic forum. With a lifelong interest in space thanks to watching the Moon landings as a boy, plus a talent for electronics and photography, it seems a natural progression for him. Throw in his coding skills from learning to program on a Teletype and it’s no wonder he was ready and eager to take to the skies, so to speak, and capture the curvature of the Earth. What was so great about using the Raspberry Pi was the instant gratification he got from receiving images in real time as they were taken during the flight. While other devices could control a camera and store captured images for later retrieval, thanks to the Pi Dave was able to transmit the files back down to Earth and check the progress of his balloon while attempting to break records with a flight.

Dave Akerman The MagPi Raspberry Pi Community Profile Morph

One of the many commercial flights Dave has organised featured the classic children’s TV character Morph, a creation of the Aardman Animations studio known for Wallace and Gromit. Morph took to the sky twice in his mission to reach near space, and finally succeeded in 2016.

High-altitude ballooning isn’t the only part of Dave’s life that incorporates a Raspberry Pi. Having “lost count” of how many Pis he has running tasks, Dave has also created radio receivers for APRS (ham radio data), ADS-B (aircraft tracking), and OGN (gliders), along with a time-lapse camera in his garden, and he has a few more Pi for tinkering purposes.

The post Community profile: Dave Akerman appeared first on Raspberry Pi.

What John Oliver gets wrong about Bitcoin

Post Syndicated from Robert Graham original http://blog.erratasec.com/2018/03/what-john-oliver-gets-wrong-about.html

John Oliver covered bitcoin/cryptocurrencies last night. I thought I’d describe a bunch of things he gets wrong.

How Bitcoin works

Nowhere in the show does it describe what Bitcoin is and how it works.
Discussions should always start with Satoshi Nakamoto’s original paper. The thing Satoshi points out is that there is an important cost to normal transactions, namely, the entire legal system designed to protect you against fraud, such as the way you can reverse the transactions on your credit card if it gets stolen. The point of Bitcoin is that there is no way to reverse a charge. A transaction is done via cryptography: to transfer money to me, you decrypt it with your secret key and encrypt it with mine, handing ownership over to me with no third party involved that can reverse the transaction, and essentially no overhead.
All the rest of the stuff, like the decentralized blockchain and mining, is all about making that work.
Bitcoin crazies forget about the original genesis of Bitcoin. For example, they talk about adding features to stop fraud, reversing transactions, and having a central authority that manages that. This misses the point, because the existing electronic banking system already does that, and does a better job at it than cryptocurrencies ever can. If you want to mock cryptocurrencies, talk about the “DAO”, which did exactly that — and collapsed in a big fraudulent scheme where insiders made money and outsiders didn’t.
Sticking to Satoshi’s original ideas are a lot better than trying to repeat how the crazy fringe activists define Bitcoin.

How does any money have value?

Oliver’s answer is currencies have value because people agree that they have value, like how they agree a Beanie Baby is worth $15,000.
This is wrong. A better way of asking the question why the value of money changes. The dollar has been losing roughly 2% of its value each year for decades. This is called “inflation”, as the dollar loses value, it takes more dollars to buy things, which means the price of things (in dollars) goes up, and employers have to pay us more dollars so that we can buy the same amount of things.
The reason the value of the dollar changes is largely because the Federal Reserve manages the supply of dollars, using the same law of Supply and Demand. As you know, if a supply decreases (like oil), then the price goes up, or if the supply of something increases, the price goes down. The Fed manages money the same way: when prices rise (the dollar is worth less), the Fed reduces the supply of dollars, causing it to be worth more. Conversely, if prices fall (or don’t rise fast enough), the Fed increases supply, so that the dollar is worth less.
The reason money follows the law of Supply and Demand is because people use money, they consume it like they do other goods and services, like gasoline, tax preparation, food, dance lessons, and so forth. It’s not like a fine art painting, a stamp collection or a Beanie Baby — money is a product. It’s just that people have a hard time thinking of it as a consumer product since, in their experience, money is what they use to buy consumer products. But it’s a symmetric operation: when you buy gasoline with dollars, you are actually selling dollars in exchange for gasoline. That you call one side in this transaction “money” and the other “goods” is purely arbitrary, you call gasoline money and dollars the good that is being bought and sold for gasoline.
The reason dollars is a product is because trying to use gasoline as money is a pain in the neck. Storing it and exchanging it is difficult. Goods like this do become money, such as famously how prisons often use cigarettes as a medium of exchange, even for non-smokers, but it has to be a good that is fungible, storable, and easily exchanged. Dollars are the most fungible, the most storable, and the easiest exchanged, so has the most value as “money”. Sure, the mechanic can fix the farmers car for three chickens instead, but most of the time, both parties in the transaction would rather exchange the same value using dollars than chickens.
So the value of dollars is not like the value of Beanie Babies, which people might buy for $15,000, which changes purely on the whims of investors. Instead, a dollar is like gasoline, which obey the law of Supply and Demand.
This brings us back to the question of where Bitcoin gets its value. While Bitcoin is indeed used like dollars to buy things, that’s only a tiny use of the currency, so therefore it’s value isn’t determined by Supply and Demand. Instead, the value of Bitcoin is a lot like Beanie Babies, obeying the laws of investments. So in this respect, Oliver is right about where the value of Bitcoin comes, but wrong about where the value of dollars comes from.

Why Bitcoin conference didn’t take Bitcoin

John Oliver points out the irony of a Bitcoin conference that stopped accepting payments in Bitcoin for tickets.
The biggest reason for this is because Bitcoin has become so popular that transaction fees have gone up. Instead of being proof of failure, it’s proof of popularity. What John Oliver is saying is the old joke that nobody goes to that popular restaurant anymore because it’s too crowded and you can’t get a reservation.
Moreover, the point of Bitcoin is not to replace everyday currencies for everyday transactions. If you read Satoshi Nakamoto’s whitepaper, it’s only goal is to replace certain types of transactions, like purely electronic transactions where electronic goods and services are being exchanged. Where real-life goods/services are being exchanged, existing currencies work just fine. It’s only the crazy activists who claim Bitcoin will eventually replace real world currencies — the saner people see it co-existing with real-world currencies, each with a different value to consumers.

Turning a McNugget back into a chicken

John Oliver uses the metaphor of turning a that while you can process a chicken into McNuggets, you can’t reverse the process. It’s a funny metaphor.
But it’s not clear what the heck this metaphor is trying explain. That’s not a metaphor for the blockchain, but a metaphor for a “cryptographic hash”, where each block is a chicken, and the McNugget is the signature for the block (well, the block plus the signature of the last block, forming a chain).
Even then that metaphor as problems. The McNugget produced from each chicken must be unique to that chicken, for the metaphor to accurately describe a cryptographic hash. You can therefore identify the original chicken simply by looking at the McNugget. A slight change in the original chicken, like losing a feather, results in a completely different McNugget. Thus, nuggets can be used to tell if the original chicken has changed.
This then leads to the key property of the blockchain, it is unalterable. You can’t go back and change any of the blocks of data, because the fingerprints, the nuggets, will also change, and break the nugget chain.
The point is that while John Oliver is laughing at a silly metaphor to explain the blockchain becuase he totally misses the point of the metaphor.
Oliver rightly says “don’t worry if you don’t understand it — most people don’t”, but that includes the big companies that John Oliver name. Some companies do get it, and are producing reasonable things (like JP Morgan, by all accounts), but some don’t. IBM and other big consultancies are charging companies millions of dollars to consult with them on block chain products where nobody involved, the customer or the consultancy, actually understand any of it. That doesn’t stop them from happily charging customers on one side and happily spending money on the other.
Thus, rather than Oliver explaining the problem, he’s just being part of the problem. His explanation of blockchain left you dumber than before.

ICO’s

John Oliver mocks the Brave ICO ($35 million in 30 seconds), claiming it’s all driven by YouTube personalities and people who aren’t looking at the fundamentals.
And while this is true, most ICOs are bunk, the  Brave ICO actually had a business model behind it. Brave is a Chrome-like web-browser whose distinguishing feature is that it protects your privacy from advertisers. If you don’t use Brave or a browser with an ad block extension, you have no idea how bad things are for you. However, this presents a problem for websites that fund themselves via advertisements, which is most of them, because visitors no longer see ads. Brave has a fix for this. Most people wouldn’t mind supporting the websites they visit often, like the New York Times. That’s where the Brave ICO “token” comes in: it’s not simply stock in Brave, but a token for micropayments to websites. Users buy tokens, then use them for micropayments to websites like New York Times. The New York Times then sells the tokens back to the market for dollars. The buying and selling of tokens happens without a centralized middleman.
This is still all speculative, of course, and it remains to be seen how successful Brave will be, but it’s a serious effort. It has well respected VC behind the company, a well-respected founder (despite the fact he invented JavaScript), and well-respected employees. It’s not a scam, it’s a legitimate venture.

How to you make money from Bitcoin?

The last part of the show is dedicated to describing all the scam out there, advising people to be careful, and to be “responsible”. This is garbage.
It’s like my simple two step process to making lots of money via Bitcoin: (1) buy when the price is low, and (2) sell when the price is high. My advice is correct, of course, but useless. Same as “be careful” and “invest responsibly”.
The truth about investing in cryptocurrencies is “don’t”. The only responsible way to invest is to buy low-overhead market index funds and hold for retirement. No, you won’t get super rich doing this, but anything other than this is irresponsible gambling.
It’s a hard lesson to learn, because everyone is telling you the opposite. The entire channel CNBC is devoted to day traders, who buy and sell stocks at a high rate based on the same principle as a ponzi scheme, basing their judgment not on the fundamentals (like long term dividends) but animal spirits of whatever stock is hot or cold at the moment. This is the same reason people buy or sell Bitcoin, not because they can describe the fundamental value, but because they believe in a bigger fool down the road who will buy it for even more.
For things like Bitcoin, the trick to making money is to have bought it over 7 years ago when it was essentially worthless, except to nerds who were into that sort of thing. It’s the same tick to making a lot of money in Magic: The Gathering trading cards, which nerds bought decades ago which are worth a ton of money now. Or, to have bought Apple stock back in 2009 when the iPhone was new, when nerds could understand the potential of real Internet access and apps that Wall Street could not.
That was my strategy: be a nerd, who gets into things. I’ve made a good amount of money on all these things because as a nerd, I was into Magic: The Gathering, Bitcoin, and the iPhone before anybody else was, and bought in at the point where these things were essentially valueless.
At this point with cryptocurrencies, with the non-nerds now flooding the market, there little chance of making it rich. The lottery is probably a better bet. Instead, if you want to make money, become a nerd, obsess about a thing, understand a thing when its new, and cash out once the rest of the market figures it out. That might be Brave, for example, but buy into it because you’ve spent the last year studying the browser advertisement ecosystem, the market’s willingness to pay for content, and how their Basic Attention Token delivers value to websites — not because you want in on the ICO craze.

Conclusion

John Oliver spends 25 minutes explaining Bitcoin, Cryptocurrencies, and the Blockchain to you. Sure, it’s funny, but it leaves you worse off than when it started. It admits they “simplify” the explanation, but they simplified it so much to the point where they removed all useful information.

PipeCam: the low-cost underwater camera

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/pipecam-low-cost-underwater-camera/

Fred Fourie is building a low-cost underwater camera for shallow deployment, and his prototypes are already returning fascinating results. You can build your own PipeCam, and explore the undiscovered depths with a Raspberry Pi and off-the-shelf materials.

PipeCam underwater Raspberry Pi Camera

Materials and build

In its latest iteration, PipeCam consists of a 110mm PVC waste pipe with fittings and a 10mm perspex window at one end. Previous prototypes have also used plumbing materials for the body, but this latest version employs heavy-duty parts that deliver the good seal this project needs.

PipeCam underwater Raspberry Pi Camera

In testing, Fred and a friend determined that the rig could withstand 4 bar of pressure. This is enough to protect the tech inside at the depths Fred plans for, and a significant performance improvement on previous prototypes.

PipeCam underwater Raspberry Pi Camera
PipeCam underwater Raspberry Pi Camera

Inside the pipe are a Raspberry Pi 3, a camera module, and a real-time clock add-on board. A 2.4Ah rechargeable lead acid battery powers the set-up via a voltage regulator.

Using foam and fibreboard, Fred made a mount that holds everything in place and fits snugly inside the pipe.

PipeCam underwater Raspberry Pi Camera
PipeCam underwater Raspberry Pi Camera
PipeCam underwater Raspberry Pi Camera

PipeCam will be subject to ocean currents, not to mention the attentions of sea creatures, so it’s essential to make sure that everything is held securely inside the pipe – something Fred has learned from previous versions of the project.

Software

It’s straightforward to write time-lapse code for a Raspberry Pi using Python and one of our free online resources, but Fred has more ambitious plans for PipeCam. As well as a Python script to control the camera, Fred made a web page to display the health of the device. It shows battery level and storage availability, along with the latest photo taken by the camera. He also made adjustments to the camera’s exposure settings using raspistill. You can see the effect in this side-by-side comparison of the default python-picam image and the edited raspistill one.

PipeCam underwater Raspberry Pi Camera
PipeCam underwater Raspberry Pi Camera

Underwater testing

Fred has completed the initial first test of PipeCam, running the device under water for an hour in two-metre deep water off the coast near his home. And the results? Well, see for yourself:

PipeCam underwater Raspberry Pi Camera
PipeCam underwater Raspberry Pi Camera
PipeCam underwater Raspberry Pi Camera

PipeCam is a work in progress, and you can read Fred’s build log at the project’s Hackaday.io page, so be sure to follow along.

The post PipeCam: the low-cost underwater camera appeared first on Raspberry Pi.

Playboy Brands Boing Boing a “Clickbait” Site With No Fair Use Defense

Post Syndicated from Andy original https://torrentfreak.com/playboy-brands-boing-boing-a-clickbait-site-with-no-fair-use-defense-180126/

Late 2017, Boing Boing co-editor Xena Jardin posted an article in which he linked to an archive containing every Playboy centerfold image to date.

“Kind of amazing to see how our standards of hotness, and the art of commercial erotic photography, have changed over time,” Jardin noted.

While Boing Boing had nothing to do with the compilation, uploading, or storing of the Imgur-based archive, Playboy took exception to the popular blog linking to the album.

Noting that Jardin had referred to the archive uploader as a “wonderful person”, the adult publication responded with a lawsuit (pdf), claiming that Boing Boing had commercially exploited its copyrighted images.

Last week, with assistance from the Electronic Frontier Foundation, Boing Boing parent company Happy Mutants filed a motion to dismiss in which it defended its right to comment on and link to copyrighted content without that constituting infringement.

“This lawsuit is frankly mystifying. Playboy’s theory of liability seems to be that it is illegal to link to material posted by others on the web — an act performed daily by hundreds of millions of users of Facebook and Twitter, and by journalists like the ones in Playboy’s crosshairs here,” the company wrote.

EFF Senior Staff Attorney Daniel Nazer weighed in too, arguing that since Boing Boing’s reporting and commenting is protected by copyright’s fair use doctrine, the “deeply flawed” lawsuit should be dismissed.

Now, just a week later, Playboy has fired back. Opposing Happy Mutants’ request for the Court to dismiss the case, the company cites the now-famous Perfect 10 v. Amazon/Google case from 2007, which tried to prevent Google from facilitating access to infringing images.

Playboy highlights the court’s finding that Google could have been held contributorily liable – if it had knowledge that Perfect 10 images were available using its search engine, could have taken simple measures to prevent further damage, but failed to do so.

Turning to Boing Boing’s conduct, Playboy says that the company knew it was linking to infringing content, could have taken steps to prevent that, but failed to do so. It then launches an attack on the site itself, offering disparaging comments concerning its activities and business model.

“This is an important case. At issue is whether clickbait sites like Happy Mutants’ Boing Boing weblog — a site designed to attract viewers and encourage them to click on links in order to generate advertising revenue — can knowingly find, promote, and profit from infringing content with impunity,” Playboy writes.

“Clickbait sites like Boing Boing are not known for creating original content. Rather, their business model is based on ‘collecting’ interesting content created by others. As such, they effectively profit off the work of others without actually creating anything original themselves.”

Playboy notes that while sites like Boing Boing are within their rights to leverage works created by others, courts in the US and overseas have ruled that knowingly linking to infringing content is unacceptable.

Even given these conditions, Playboy argues, Happy Mutants and the EFF now want the Court to dismiss the case so that sites are free to “not only encourage, facilitate, and induce infringement, but to profit from those harmful activities.”

Claiming that Boing Boing’s only reason for linking to the infringing album was to “monetize the web traffic that over fifty years of Playboy photographs would generate”, Playboy insists that the site and parent company Happy Mutants was properly charged with copyright infringement.

Playboy also dismisses Boing Boing’s argument that a link to infringing content cannot result in liability due to the link having both infringing and substantial non-infringing uses.

First citing the Betamax case, which found that maker Sony could not be held liable for infringement because its video recorders had substantial non-infringing uses, Playboy counters with the Grokster decision, which held that a distributor of a product could be liable for infringement, if there was an intent to encourage or support infringement.

“In this case, Happy Mutants’ offending link — which does nothing more than support infringing content — is good for nothing but promoting infringement and there is no legitimate public interest in its unlicensed availability,” Playboy notes.

In its motion to dismiss, Happy Mutants also argued that unless Playboy could identify users who “in fact downloaded — rather than simply viewing — the material in question,” the case should be dismissed. However, Playboy rejects the argument, claiming it is based on an erroneous interpretation of the law.

Citing the Grokster decision once more, the adult publisher notes that the Supreme Court found that someone infringes contributorily when they intentionally induce or encourage direct infringement.

“The argument that contributory infringement only lies where the defendant’s actions result in further infringement ignores the ‘or’ and collapses ‘inducing’ and ‘encouraging’ into one thing when they are two distinct things,” Playboy writes.

As for Boing Boing’s four classic fair use arguments, the publisher describes these as “extremely weak” and proceeds to hit them one by one.

In respect of the purpose and character of the use, Playboy discounts Boing Boing’s position that the aim of its post was to show “how our standards of hotness, and the art of commercial erotic photography, have changed over time.” The publisher argues that is the exact same purpose of Playboy magazine, while highliting its publication Playboy: The Compete Centerfolds, 1953-2016.

Moving on to the second factor of fair use – the nature of the copyrighted work – Playboy notes that an entire album of artwork is involved, rather than just a single image.

On the third factor, concerning the amount and substantiality of the original work used, Playboy argues that in order to publish an opinion on how “standards of hotness” had developed over time, there was no need to link to all of the pictures in the archive.

“Had only representative images from each decade, or perhaps even each year, been taken, this would be a very different case — but Happy Mutants cannot dispute that it knew it was linking to an illegal library of ‘Every Playboy Playmate Centerfold Ever’ since that is what it titled its blog post,” Playboy notes.

Finally, when considering the effect of the use upon the potential market for or value of the copyrighted work, Playbody says its archive of images continues to be monetized and Boing Boing’s use of infringing images jeopardizes that.

“Given that people are generally not going to pay for what is freely available, it is disingenuous of Happy Mutants to claim that promoting the free availability of infringing archives of Playboy’s work for viewing and downloading is not going to have an adverse effect on the value or market of that work,” the publisher adds.

While it appears the parties agree on very little, there is agreement on one key aspect of the case – its wider importance.

On the one hand, Playboy insists that a finding in its favor will ensure that people can’t commercially exploit infringing content with impunity. On the other, Boing Boing believes that the health of the entire Internet is at stake.

“The world can’t afford a judgment against us in this case — it would end the web as we know it, threatening everyone who publishes online, from us five weirdos in our basements to multimillion-dollar, globe-spanning publishing empires like Playboy,” the company concludes.

Playboy’s opposition to Happy Mutants’ motion to dismiss can be found here (pdf)

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Top 8 Best Practices for High-Performance ETL Processing Using Amazon Redshift

Post Syndicated from Thiyagarajan Arumugam original https://aws.amazon.com/blogs/big-data/top-8-best-practices-for-high-performance-etl-processing-using-amazon-redshift/

An ETL (Extract, Transform, Load) process enables you to load data from source systems into your data warehouse. This is typically executed as a batch or near-real-time ingest process to keep the data warehouse current and provide up-to-date analytical data to end users.

Amazon Redshift is a fast, petabyte-scale data warehouse that enables you easily to make data-driven decisions. With Amazon Redshift, you can get insights into your big data in a cost-effective fashion using standard SQL. You can set up any type of data model, from star and snowflake schemas, to simple de-normalized tables for running any analytical queries.

To operate a robust ETL platform and deliver data to Amazon Redshift in a timely manner, design your ETL processes to take account of Amazon Redshift’s architecture. When migrating from a legacy data warehouse to Amazon Redshift, it is tempting to adopt a lift-and-shift approach, but this can result in performance and scale issues long term. This post guides you through the following best practices for ensuring optimal, consistent runtimes for your ETL processes:

  • COPY data from multiple, evenly sized files.
  • Use workload management to improve ETL runtimes.
  • Perform table maintenance regularly.
  • Perform multiple steps in a single transaction.
  • Loading data in bulk.
  • Use UNLOAD to extract large result sets.
  • Use Amazon Redshift Spectrum for ad hoc ETL processing.
  • Monitor daily ETL health using diagnostic queries.

1. COPY data from multiple, evenly sized files

Amazon Redshift is an MPP (massively parallel processing) database, where all the compute nodes divide and parallelize the work of ingesting data. Each node is further subdivided into slices, with each slice having one or more dedicated cores, equally dividing the processing capacity. The number of slices per node depends on the node type of the cluster. For example, each DS2.XLARGE compute node has two slices, whereas each DS2.8XLARGE compute node has 16 slices.

When you load data into Amazon Redshift, you should aim to have each slice do an equal amount of work. When you load the data from a single large file or from files split into uneven sizes, some slices do more work than others. As a result, the process runs only as fast as the slowest, or most heavily loaded, slice. In the example shown below, a single large file is loaded into a two-node cluster, resulting in only one of the nodes, “Compute-0”, performing all the data ingestion:

When splitting your data files, ensure that they are of approximately equal size – between 1 MB and 1 GB after compression. The number of files should be a multiple of the number of slices in your cluster. Also, I strongly recommend that you individually compress the load files using gzip, lzop, or bzip2 to efficiently load large datasets.

When loading multiple files into a single table, use a single COPY command for the table, rather than multiple COPY commands. Amazon Redshift automatically parallelizes the data ingestion. Using a single COPY command to bulk load data into a table ensures optimal use of cluster resources, and quickest possible throughput.

2. Use workload management to improve ETL runtimes

Use Amazon Redshift’s workload management (WLM) to define multiple queues dedicated to different workloads (for example, ETL versus reporting) and to manage the runtimes of queries. As you migrate more workloads into Amazon Redshift, your ETL runtimes can become inconsistent if WLM is not appropriately set up.

I recommend limiting the overall concurrency of WLM across all queues to around 15 or less. This WLM guide helps you organize and monitor the different queues for your Amazon Redshift cluster.

When managing different workloads on your Amazon Redshift cluster, consider the following for the queue setup:

  • Create a queue dedicated to your ETL processes. Configure this queue with a small number of slots (5 or fewer). Amazon Redshift is designed for analytics queries, rather than transaction processing. The cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to the commit queue. Because ETL is a commit-intensive process, having a separate queue with a small number of slots helps mitigate this issue.
  • Claim extra memory available in a queue. When executing an ETL query, you can take advantage of the wlm_query_slot_count to claim the extra memory available in a particular queue. For example, a typical ETL process might involve COPYing raw data into a staging table so that downstream ETL jobs can run transformations that calculate daily, weekly, and monthly aggregates. To speed up the COPY process (so that the downstream tasks can start in parallel sooner), the wlm_query_slot_count can be increased for this step.
  • Create a separate queue for reporting queries. Configure query monitoring rules on this queue to further manage long-running and expensive queries.
  • Take advantage of the dynamic memory parameters. They swap the memory from your ETL to your reporting queue after the ETL job has completed.

3. Perform table maintenance regularly

Amazon Redshift is a columnar database, which enables fast transformations for aggregating data. Performing regular table maintenance ensures that transformation ETLs are predictable and performant. To get the best performance from your Amazon Redshift database, you must ensure that database tables regularly are VACUUMed and ANALYZEd. The Analyze & Vacuum schema utility helps you automate the table maintenance task and have VACUUM & ANALYZE executed in a regular fashion.

  • Use VACUUM to sort tables and remove deleted blocks

During a typical ETL refresh process, tables receive new incoming records using COPY, and unneeded data (cold data) is removed using DELETE. New rows are added to the unsorted region in a table. Deleted rows are simply marked for deletion.

DELETE does not automatically reclaim the space occupied by the deleted rows. Adding and removing large numbers of rows can therefore cause the unsorted region and the number of deleted blocks to grow. This can degrade the performance of queries executed against these tables.

After an ETL process completes, perform VACUUM to ensure that user queries execute in a consistent manner. The complete list of tables that need VACUUMing can be found using the Amazon Redshift Util’s table_info script.

Use the following approaches to ensure that VACCUM is completed in a timely manner:

  • Use wlm_query_slot_count to claim all the memory allocated in the ETL WLM queue during the VACUUM process.
  • DROP or TRUNCATE intermediate or staging tables, thereby eliminating the need to VACUUM them.
  • If your table has a compound sort key with only one sort column, try to load your data in sort key order. This helps reduce or eliminate the need to VACUUM the table.
  • Consider using time series This helps reduce the amount of data you need to VACUUM.
  • Use ANALYZE to update database statistics

Amazon Redshift uses a cost-based query planner and optimizer using statistics about tables to make good decisions about the query plan for the SQL statements. Regular statistics collection after the ETL completion ensures that user queries run fast, and that daily ETL processes are performant. The Amazon Redshift utility table_info script provides insights into the freshness of the statistics. Keeping the statistics off (pct_stats_off) less than 20% ensures effective query plans for the SQL queries.

4. Perform multiple steps in a single transaction

ETL transformation logic often spans multiple steps. Because commits in Amazon Redshift are expensive, if each ETL step performs a commit, multiple concurrent ETL processes can take a long time to execute.

To minimize the number of commits in a process, the steps in an ETL script should be surrounded by a BEGIN…END statement so that a single commit is performed only after all the transformation logic has been executed. For example, here is an example multi-step ETL script that performs one commit at the end:

Begin
CREATE temporary staging_table;
INSERT INTO staging_table SELECT .. FROM source (transformation logic);
DELETE FROM daily_table WHERE dataset_date =?;
INSERT INTO daily_table SELECT .. FROM staging_table (daily aggregate);
DELETE FROM weekly_table WHERE weekending_date=?;
INSERT INTO weekly_table SELECT .. FROM staging_table(weekly aggregate);
Commit

5. Loading data in bulk

Amazon Redshift is designed to store and query petabyte-scale datasets. Using Amazon S3 you can stage and accumulate data from multiple source systems before executing a bulk COPY operation. The following methods allow efficient and fast transfer of these bulk datasets into Amazon Redshift:

  • Use a manifest file to ingest large datasets that span multiple files. The manifest file is a JSON file that lists all the files to be loaded into Amazon Redshift. Using a manifest file ensures that Amazon Redshift has a consistent view of the data to be loaded from S3, while also ensuring that duplicate files do not result in the same data being loaded more than one time.
  • Use temporary staging tables to hold the data for transformation. These tables are automatically dropped after the ETL session is complete. Temporary tables can be created using the CREATE TEMPORARY TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE query. Explicitly specifying the CREATE TEMPORARY TABLE statement allows you to control the DISTRIBUTION KEY, SORT KEY, and compression settings to further improve performance.
  • User ALTER table APPEND to swap data from the staging tables to the target table. Data in the source table is moved to matching columns in the target table. Column order doesn’t matter. After data is successfully appended to the target table, the source table is empty. ALTER TABLE APPEND is much faster than a similar CREATE TABLE AS or INSERT INTO operation because it doesn’t involve copying or moving data.

6. Use UNLOAD to extract large result sets

Fetching a large number of rows using SELECT is expensive and takes a long time. When a large amount of data is fetched from the Amazon Redshift cluster, the leader node has to hold the data temporarily until the fetches are complete. Further, data is streamed out sequentially, which results in longer elapsed time. As a result, the leader node can become hot, which not only affects the SELECT that is being executed, but also throttles resources for creating execution plans and managing the overall cluster resources. Here is an example of a large SELECT statement. Notice that the leader node is doing most of the work to stream out the rows:

Use UNLOAD to extract large results sets directly to S3. After it’s in S3, the data can be shared with multiple downstream systems. By default, UNLOAD writes data in parallel to multiple files according to the number of slices in the cluster. All the compute nodes participate to quickly offload the data into S3.

If you are extracting data for use with Amazon Redshift Spectrum, you should make use of the MAXFILESIZE parameter to and keep files are 150 MB. Similar to item 1 above, having many evenly sized files ensures that Redshift Spectrum can do the maximum amount of work in parallel.

7. Use Redshift Spectrum for ad hoc ETL processing

Events such as data backfill, promotional activity, and special calendar days can trigger additional data volumes that affect the data refresh times in your Amazon Redshift cluster. To help address these spikes in data volumes and throughput, I recommend staging data in S3. After data is organized in S3, Redshift Spectrum enables you to query it directly using standard SQL. In this way, you gain the benefits of additional capacity without having to resize your cluster.

For tips on getting started with and optimizing the use of Redshift Spectrum, see the previous post, 10 Best Practices for Amazon Redshift Spectrum.

8. Monitor daily ETL health using diagnostic queries

Monitoring the health of your ETL processes on a regular basis helps identify the early onset of performance issues before they have a significant impact on your cluster. The following monitoring scripts can be used to provide insights into the health of your ETL processes:

Script Use when… Solution
commit_stats.sql – Commit queue statistics from past days, showing largest queue length and queue time first DML statements such as INSERT/UPDATE/COPY/DELETE operations take several times longer to execute when multiple of these operations are in progress Set up separate WLM queues for the ETL process and limit the concurrency to < 5.
copy_performance.sql –  Copy command statistics for the past days Daily COPY operations take longer to execute • Follow the best practices for the COPY command.
• Analyze data growth with the incoming datasets and consider cluster resize to meet the expected SLA.
table_info.sql – Table skew and unsorted statistics along with storage and key information Transformation steps take longer to execute • Set up regular VACCUM jobs to address unsorted rows and claim the deleted blocks so that transformation SQL execute optimally.
• Consider a table redesign to avoid data skewness.
v_check_transaction_locks.sql – Monitor transaction locks INSERT/UPDATE/COPY/DELETE operations on particular tables do not respond back in timely manner, compared to when run after the ETL Multiple DML statements are operating on the same target table at the same moment from different transactions. Set up ETL job dependency so that they execute serially for the same target table.
v_get_schema_priv_by_user.sql – Get the schema that the user has access to Reporting users can view intermediate tables Set up separate database groups for reporting and ETL users, and grants access to objects using GRANT.
v_generate_tbl_ddl.sql – Get the table DDL You need to create an empty table with same structure as target table for data backfill Generate DDL using this script for data backfill.
v_space_used_per_tbl.sql – monitor space used by individual tables Amazon Redshift data warehouse space growth is trending upwards more than normal

Analyze the individual tables that are growing at higher rate than normal. Consider data archival using UNLOAD to S3 and Redshift Spectrum for later analysis.

Use unscanned_table_summary.sql to find unused table and archive or drop them.

top_queries.sql – Return the top 50 time consuming statements aggregated by its text ETL transformations are taking longer to execute Analyze the top transformation SQL and use EXPLAIN to find opportunities for tuning the query plan.

There are several other useful scripts available in the amazon-redshift-utils repository. The AWS Lambda Utility Runner runs a subset of these scripts on a scheduled basis, allowing you to automate much of monitoring of your ETL processes.

Example ETL process

The following ETL process reinforces some of the best practices discussed in this post. Consider the following four-step daily ETL workflow where data from an RDBMS source system is staged in S3 and then loaded into Amazon Redshift. Amazon Redshift is used to calculate daily, weekly, and monthly aggregations, which are then unloaded to S3, where they can be further processed and made available for end-user reporting using a number of different tools, including Redshift Spectrum and Amazon Athena.

Step 1:  Extract from the RDBMS source to a S3 bucket

In this ETL process, the data extract job fetches change data every 1 hour and it is staged into multiple hourly files. For example, the staged S3 folder looks like the following:

 [[email protected] ~]$ aws s3 ls s3://<<S3 Bucket>>/batch/2017/07/02/
2017-07-02 01:59:58   81900220 20170702T01.export.gz
2017-07-02 02:59:56   84926844 20170702T02.export.gz
2017-07-02 03:59:54   78990356 20170702T03.export.gz
…
2017-07-02 22:00:03   75966745 20170702T21.export.gz
2017-07-02 23:00:02   89199874 20170702T22.export.gz
2017-07-02 00:59:59   71161715 20170702T23.export.gz

Organizing the data into multiple, evenly sized files enables the COPY command to ingest this data using all available resources in the Amazon Redshift cluster. Further, the files are compressed (gzipped) to further reduce COPY times.

Step 2: Stage data to the Amazon Redshift table for cleansing

Ingesting the data can be accomplished using a JSON-based manifest file. Using the manifest file ensures that S3 eventual consistency issues can be eliminated and also provides an opportunity to dedupe any files if needed. A sample manifest20170702.json file looks like the following:

{
  "entries": [
    {"url":" s3://<<S3 Bucket>>/batch/2017/07/02/20170702T01.export.gz", "mandatory":true},
    {"url":" s3://<<S3 Bucket>>/batch/2017/07/02/20170702T02.export.gz", "mandatory":true},
    …
    {"url":" s3://<<S3 Bucket>>/batch/2017/07/02/20170702T23.export.gz", "mandatory":true}
  ]
}

The data can be ingested using the following command:

SET wlm_query_slot_count TO <<max available concurrency in the ETL queue>>;
COPY stage_tbl FROM 's3:// <<S3 Bucket>>/batch/manifest20170702.json' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' manifest;

Because the downstream ETL processes depend on this COPY command to complete, the wlm_query_slot_count is used to claim all the memory available to the queue. This helps the COPY command complete as quickly as possible.

Step 3: Transform data to create daily, weekly, and monthly datasets and load into target tables

Data is staged in the “stage_tbl” from where it can be transformed into the daily, weekly, and monthly aggregates and loaded into target tables. The following job illustrates a typical weekly process:

Begin
INSERT into ETL_LOG (..) values (..);
DELETE from weekly_tbl where dataset_week = <<current week>>;
INSERT into weekly_tbl (..)
  SELECT date_trunc('week', dataset_day) AS week_begin_dataset_date, SUM(C1) AS C1, SUM(C2) AS C2
	FROM   stage_tbl
GROUP BY date_trunc('week', dataset_day);
INSERT into AUDIT_LOG values (..);
COMMIT;
End;

As shown above, multiple steps are combined into one transaction to perform a single commit, reducing contention on the commit queue.

Step 4: Unload the daily dataset to populate the S3 data lake bucket

The transformed results are now unloaded into another S3 bucket, where they can be further processed and made available for end-user reporting using a number of different tools, including Redshift Spectrum and Amazon Athena.

unload ('SELECT * FROM weekly_tbl WHERE dataset_week = <<current week>>’) TO 's3:// <<S3 Bucket>>/datalake/weekly/20170526/' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';

Summary

Amazon Redshift lets you easily operate petabyte-scale data warehouses on the cloud. This post summarized the best practices for operating scalable ETL natively within Amazon Redshift. I demonstrated efficient ways to ingest and transform data, along with close monitoring. I also demonstrated the best practices being used in a typical sample ETL workload to transform the data into Amazon Redshift.

If you have questions or suggestions, please comment below.

 


About the Author

Thiyagarajan Arumugam is a Big Data Solutions Architect at Amazon Web Services and designs customer architectures to process data at scale. Prior to AWS, he built data warehouse solutions at Amazon.com. In his free time, he enjoys all outdoor sports and practices the Indian classical drum mridangam.

 

Eevee gained 2791 experience points

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/15/eevee-gained-2791-experience-points/

Eevee grew to level 31!

A year strongly defined by mixed success! Also, a lot of video games.

I ran three game jams, resulting in a total of 157 games existing that may not have otherwise, which is totally mindblowing?!

For GAMES MADE QUICK???, glip and I made NEON PHASE, a short little exploratory platformer. Honestly, I should give myself more credit for this and the rest of the LÖVE games I’ve based on the same codebase — I wove a physics engine (and everything else!) from scratch and it has held up remarkably well for a variety of different uses.

I successfully finished an HD version of Isaac’s Descent using my LÖVE engine, though it doesn’t have anything new over the original and I’ve only released it as a tech demo on Patreon.

For Strawberry Jam (NSFW!) we made fox flux (slightly NSFW!), which felt like a huge milestone: the first game where I made all the art! I mean, not counting Isaac’s Descent, which was for a very limited platform. It’s a pretty arbitrary milestone, yes, but it feels significant. I’ve been working on expanding the game into a longer and slightly less buggy experience, but the art is taking the longest by far. I must’ve spent weeks on player sprites alone.

We then set about working on Bolthaven, a sequel of sorts to NEON PHASE, and got decently far, and then abandond it. Oops.

We then started a cute little PICO-8 game, and forgot about it. Oops.

I was recruited to help with Chaos Composer, a more ambitious game glip started with someone else in Unity. I had to get used to Unity, and we squabbled a bit, but the game is finally about at the point where it’s “playable” and “maps” can be designed? It’s slightly on hold at the moment while we all finish up some other stuff, though.

We made a birthday game for two of our friends whose birthdays were very close together! Only they got to see it.

For Ludum Dare 38, we made Lunar Depot 38, a little “wave shooter” or whatever you call those? The AI is pretty rough, seeing as this was the first time I’d really made enemies and I had 72 hours to figure out how to do it, but I still think it’s pretty fun to play and I love the circular world.

I made Roguelike Simulator as an experiment with making something small and quick with a simple tool, and I had a lot of fun! I definitely want to do more stuff like this in the future.

And now we’re working on a game about Star Anise, my cat’s self-insert, which is looking to have more polish and depth than anything we’ve done so far! We’ve definitely come a long way in a year.

Somewhere along the line, I put out a call for a “potluck” project, where everyone would give me sprites of a given size without knowing what anyone else had contributed, and I would then make a game using only those sprites. Unfortunately, that stalled a few times: I tried using the Phaser JS library, but we didn’t get along; I tried LÖVE, but didn’t know where to go with the game; and then I decided to use this as an experiment with procedural generation, and didn’t get around to it. I still feel bad that everyone did work for me and I didn’t follow through, but I don’t know whether this will ever become a game.

veekun, alas, consumed months of my life. I finally got Sun and Moon loaded, but it took weeks of work since I was basically reinventing all the tooling we’d ever had from scratch, without even having most of that tooling available as a reference. It was worth it in the end, at least: Ultra Sun and Ultra Moon only took a few days to get loaded. But veekun itself is still missing some obvious Sun/Moon features, and the whole site needs an overhaul, and I just don’t know if I want to dedicate that much time to it when I have so much other stuff going on that’s much more interesting to me right now.

I finally turned my blog into more of a website, giving it a neat front page that lists a bunch of stuff I’ve done. I made a release category at last, though I’m still not quite in the habit of using it.

I wrote some blog posts, of course! I think the most interesting were JavaScript got better while I wasn’t looking and Object models. I was also asked to write a couple pieces for money for a column that then promptly shut down.

On a whim, I made a set of Eevee mugshots for Doom, which I think is a decent indication of my (pixel) art progress over the year?

I started idchoppers, a Doom parsing and manipulation library written in Rust, though it didn’t get very far and I’ve spent most of the time fighting with Rust because it won’t let me implement all my extremely bad ideas. It can do a couple things, at least, like flip maps very quickly and render maps to SVG.

I did toy around with music a little, but not a lot.

I wrote two short twines for Flora. They’re okay. I’m working on another; I think it’ll be better.

I didn’t do a lot of art overall, at least compared to the two previous years; most of my art effort over the year has gone into fox flux, which requires me to learn a whole lot of things. I did dip my toes into 3D modelling, most notably producing my current Twitter banner as well as this cool Star Anise animation. I wouldn’t mind doing more of that; maybe I’ll even try to make a low-poly pixel-textured 3D game sometime.

I restarted my book with a much better concept, though so far I’ve only written about half a chapter. Argh. I see that the vast majority of the work was done within the span of a single week, which is bad since that means I only worked on it for a week, but good since that means I can actually do a pretty good amount of work in only a week. I also did a lot of squabbling with tooling, which is hopefully mostly out of the way now.

My computer broke? That was an exciting week.


A lot of stuff, but the year as a whole still feels hit or miss. All the time I spent on veekun feels like a black void in the middle of the year, which seems like a good sign that I maybe don’t want to pour even more weeks into it in the near future.

Mostly, I want to do: more games, more art, more writing, more music.

I want to try out some tiny game making tools and make some tiny games with them — partly to get exposure to different things, partly to get more little ideas out into the world regularly, and partly to get more practice at letting myself have ideas. I have a couple tools in mind and I guess I’ll aim at a microgame every two months or so? I’d also like to finish the expanded fox flux by the end of the year, of course, though at the moment I can’t even gauge how long it might take.

I seriously lapsed on drawing last year, largely because fox flux pixel art took me so much time. So I want to draw more, and I want to get much faster at pixel art. It would probably help if I had a more concrete goal for drawing, so I might try to draw some short comics and write a little visual novel or something, which would also force me to aim for consistency.

I want to work on my book more, of course, but I also want to try my hand at a bit more fiction. I’ve had a blast writing dialogue for our games! I just shy away from longer-form writing for some reason — which seems ridiculous when a large part of my audience found me through my blog. I do think I’ve had some sort of breakthrough in the last month or two; I suddenly feel a good bit more confident about writing in general and figuring out what I want to say? One recent post I know I wrote in a single afternoon, which virtually never happens because I keep rewriting and rearranging stuff. Again, a visual novel would be a good excuse to practice writing fiction without getting too bogged down in details.

And, ah, music. I shy heavily away from music, since I have no idea what I’m doing, and also I seem to spend a lot of time fighting with tools. (Surprise.) I tried out SunVox for the first time just a few days ago and have been enjoying it quite a bit for making sound effects, so I might try it for music as well. And once again, visual novel background music is a pretty low-pressure thing to compose for. Hell, visual novels are small games, too, so that checks all the boxes. I guess I’ll go make a visual novel.

Here’s to twenty gayteen!

A hedgehog cam or two

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/a-hedgehog-cam-or-two/

Here we are, hauling ourselves out of the Christmas and New Year holidays and into January proper. It’s dawning on me that I have to go back to work, even though it’s still very cold and gloomy in northern Europe, and even though my duvet is lovely and warm. I found myself envying beings that hibernate, and thinking about beings that hibernate, and searching for things to do with hedgehogs. And, well, the long and the short of it is, today’s blog post is a short meditation on the hedgehog cam.

A hedgehog in a garden, photographed in infrared light by a hedgehog cam

Success! It’s a hedgehog!
Photo by Andrew Wedgbury

Hedgehog watching

Someone called Barker has installed a Raspberry Pi–based hedgehog cam in a location with a distant view of a famous Alp, and as well as providing live views by visible and infrared light for the dedicated and the insomniac, they also make a sped-up version of the previous night’s activity available. With hedgehogs usually being in hibernation during January, you mightn’t see them in any current feed — but don’t worry! You’re guaranteed a few hedgehogs on Barker’s website, because they have also thrown in some lovely GIFs of hoggy (and foxy) divas that their camera captured in the past.

A Hedgehog eating from a bowl on a patio, captured by a hedgehog cam

Nom nom nom!
GIF by Barker’s Site

Build your own hedgehog cam

For pointers on how to replicate this kind of setup, you could do worse than turn to Andrew Wedgbury’s hedgehog cam write-up. Andrew’s Twitter feed reveals that he’s a Cambridge local, and there are hints that he was behind RealVNC’s hoggy mascot for Pi Wars 2017.

RealVNC on Twitter

Another day at the office: testing our #PiWars mascot using a @Raspberry_Pi 3, #VNC Connect and @4tronix_uk Picon Zero. Name suggestions? https://t.co/iYY3xAX9Bk

Our infrared bird box and time-lapse camera resources will also set you well on the way towards your own custom wildlife camera. For a kit that wraps everything up in a weatherproof enclosure made with love, time, and serious amounts of design and testing, take a look at Naturebytes’ wildlife cam kit.

Or, if you’re thinking that a robot mascot is more dependable than real animals for the fluffiness you need in order to start your January with something like productivity and with your soul intact, you might like to put your own spin on our robot buggy.

Happy 2018

While we’re on the subject of getting to grips with the new year, do take a look at yesterday’s blog post, in which we suggest a New Year’s project that’s different from the usual resolutions. However you tackle 2018, we wish you an excellent year of creative computing.

The post A hedgehog cam or two appeared first on Raspberry Pi.

The deal with Bitcoin

Post Syndicated from Michal Zalewski original http://lcamtuf.blogspot.com/2017/12/the-deal-with-bitcoin.html

♪ Used to have a little now I have a lot
I’m still, I’m still Jenny from the block
          chain ♪

For all that has been written about Bitcoin and its ilk, it is curious that the focus is almost solely what the cryptocurrencies are supposed to be. Technologists wax lyrical about the potential for blockchains to change almost every aspect of our lives. Libertarians and paleoconservatives ache for the return to “sound money” that can’t be conjured up at the whim of a bureaucrat. Mainstream economists wag their fingers, proclaiming that a proper currency can’t be deflationary, that it must maintain a particular velocity, or that the government must be able to nip crises of confidence in the bud. And so on.

Much of this may be true, but the proponents of cryptocurrencies should recognize that an appeal to consequences is not a guarantee of good results. The critics, on the other hand, would be best served to remember that they are drawing far-reaching conclusions about the effects of modern monetary policies based on a very short and tumultuous period in history.

In this post, my goal is to ditch most of the dogma, talk a bit about the origins of money – and then see how “crypto” fits the bill.

1. The prehistory of currencies

The emergence of money is usually explained in a very straightforward way. You know the story: a farmer raised a pig, a cobbler made a shoe. The cobbler needed to feed his family while the farmer wanted to keep his feet warm – and so they met to exchange the goods on mutually beneficial terms. But as the tale goes, the barter system had a fatal flaw: sometimes, a farmer wanted a cooking pot, a potter wanted a knife, and a blacksmith wanted a pair of pants. To facilitate increasingly complex, multi-step exchanges without requiring dozens of people to meet face to face, we came up with an abstract way to represent value – a shiny coin guaranteed to be accepted by every tradesman.

It is a nice parable, but it probably isn’t very true. It seems far more plausible that early societies relied on the concept of debt long before the advent of currencies: an informal tally or a formal ledger would be used to keep track of who owes what to whom. The concept of debt, closely associated with one’s trustworthiness and standing in the community, would have enabled a wide range of economic activities: debts could be paid back over time, transferred, renegotiated, or forgotten – all without having to engage in spot barter or to mint a single coin. In fact, such non-monetary, trust-based, reciprocal economies are still common in closely-knit communities: among families, neighbors, coworkers, or friends.

In such a setting, primitive currencies probably emerged simply as a consequence of having a system of prices: a cow being worth a particular number of chickens, a chicken being worth a particular number of beaver pelts, and so forth. Formalizing such relationships by settling on a single, widely-known unit of account – say, one chicken – would make it more convenient to transfer, combine, or split debts; or to settle them in alternative goods.

Contrary to popular belief, for communal ledgers, the unit of account probably did not have to be particularly desirable, durable, or easy to carry; it was simply an accounting tool. And indeed, we sometimes run into fairly unusual units of account even in modern times: for example, cigarettes can be the basis of a bustling prison economy even when most inmates don’t smoke and there are not that many packs to go around.

2. The age of commodity money

In the end, the development of coinage might have had relatively little to do with communal trade – and far more with the desire to exchange goods with strangers. When dealing with a unfamiliar or hostile tribe, the concept of a chicken-denominated ledger does not hold up: the other side might be disinclined to honor its obligations – and get away with it, too. To settle such problematic trades, we needed a “spot” medium of exchange that would be easy to carry and authenticate, had a well-defined value, and a near-universal appeal. Throughout much of the recorded history, precious metals – predominantly gold and silver – proved to fit the bill.

In the most basic sense, such commodities could be seen as a tool to reconcile debts across societal boundaries, without necessarily replacing any local units of account. An obligation, denominated in some local currency, would be created on buyer’s side in order to procure the metal for the trade. The proceeds of the completed transaction would in turn allow the seller to settle their own local obligations that arose from having to source the traded goods. In other words, our wondrous chicken-denominated ledgers could coexist peacefully with gold – and when commodity coinage finally took hold, it’s likely that in everyday trade, precious metals served more as a useful abstraction than a precise store of value. A “silver chicken” of sorts.

Still, the emergence of commodity money had one interesting side effect: it decoupled the unit of debt – a “claim on the society”, in a sense – from any moral judgment about its origin. A piece of silver would buy the same amount of food, whether earned through hard labor or won in a drunken bet. This disconnect remains a central theme in many of the debates about social justice and unfairly earned wealth.

3. The State enters the game

If there is one advantage of chicken ledgers over precious metals, it’s that all chickens look and cluck roughly the same – something that can’t be said of every nugget of silver or gold. To cope with this problem, we needed to shape raw commodities into pieces of a more predictable shape and weight; a trusted party could then stamp them with a mark to indicate the value and the quality of the coin.

At first, the task of standardizing coinage rested with private parties – but the responsibility was soon assumed by the State. The advantages of this transition seemed clear: a single, widely-accepted and easily-recognizable currency could be now used to settle virtually all private and official debts.

Alas, in what deserves the dubious distinction of being one of the earliest examples of monetary tomfoolery, some States succumbed to the temptation of fiddling with the coinage to accomplish anything from feeding the poor to waging wars. In particular, it would be common to stamp coins with the same face value but a progressively lower content of silver and gold. Perhaps surprisingly, the strategy worked remarkably well; at least in the times of peace, most people cared about the value stamped on the coin, not its precise composition or weight.

And so, over time, representative money was born: sooner or later, most States opted to mint coins from nearly-worthless metals, or print banknotes on paper and cloth. This radically new currency was accompanied with a simple pledge: the State offered to redeem it at any time for its nominal value in gold.

Of course, the promise was largely illusory: the State did not have enough gold to honor all the promises it had made. Still, as long as people had faith in their rulers and the redemption requests stayed low, the fundamental mechanics of this new representative currency remained roughly the same as before – and in some ways, were an improvement in that they lessened the insatiable demand for a rare commodity. Just as importantly, the new money still enabled international trade – using the underlying gold exchange rate as a reference point.

4. Fractional reserve banking and fiat money

For much of the recorded history, banking was an exceptionally dull affair, not much different from running a communal chicken
ledger of the old. But then, something truly marvelous happened in the 17th century: around that time, many European countries have witnessed
the emergence of fractional-reserve banks.

These private ventures operated according to a simple scheme: they accepted people’s coin
for safekeeping, promising to pay a premium on every deposit made. To meet these obligations and to make a profit, the banks then
used the pooled deposits to make high-interest loans to other folks. The financiers figured out that under normal circumstances
and when operating at a sufficient scale, they needed only a very modest reserve – well under 10% of all deposited money – to be
able to service the usual volume and size of withdrawals requested by their customers. The rest could be loaned out.

The very curious consequence of fractional-reserve banking was that it pulled new money out of thin air.
The funds were simultaneously accounted for in the statements shown to the depositor, evidently available for withdrawal or
transfer at any time; and given to third-party borrowers, who could spend them on just about anything. Heck, the borrowers could
deposit the proceeds in another bank, creating even more money along the way! Whatever they did, the sum of all funds in the monetary
system now appeared much higher than the value of all coins and banknotes issued by the government – let alone the amount of gold
sitting in any vault.

Of course, no new money was being created in any physical sense: all that banks were doing was engaging in a bit of creative accounting – the sort of which would probably land you in jail if you attempted it today in any other comparably vital field of enterprise. If too many depositors were to ask for their money back, or if too many loans were to go bad, the banking system would fold. Fortunes would evaporate in a puff of accounting smoke, and with the disappearance of vast quantities of quasi-fictitious (“broad”) money, the wealth of the entire nation would shrink.

In the early 20th century, the world kept witnessing just that; a series of bank runs and economic contractions forced the governments around the globe to act. At that stage, outlawing fractional-reserve banking was no longer politically or economically tenable; a simpler alternative was to let go of gold and move to fiat money – a currency implemented as an abstract social construct, with no predefined connection to the physical realm. A new breed of economists saw the role of the government not in trying to peg the value of money to an inflexible commodity, but in manipulating its supply to smooth out economic hiccups or to stimulate growth.

(Contrary to popular beliefs, such manipulation is usually not done by printing new banknotes; more sophisticated methods, such as lowering reserve requirements for bank deposits or enticing banks to invest its deposits into government-issued securities, are the preferred route.)

The obvious peril of fiat money is that in the long haul, its value is determined strictly by people’s willingness to accept a piece of paper in exchange for their trouble; that willingness, in turn, is conditioned solely on their belief that the same piece of paper would buy them something nice a week, a month, or a year from now. It follows that a simple crisis of confidence could make a currency nearly worthless overnight. A prolonged period of hyperinflation and subsequent austerity in Germany and Austria was one of the precipitating factors that led to World War II. In more recent times, dramatic episodes of hyperinflation plagued the fiat currencies of Israel (1984), Mexico (1988), Poland (1990), Yugoslavia (1994), Bulgaria (1996), Turkey (2002), Zimbabwe (2009), Venezuela (2016), and several other nations around the globe.

For the United States, the switch to fiat money came relatively late, in 1971. To stop the dollar from plunging like a rock, the Nixon administration employed a clever trick: they ordered the freeze of wages and prices for the 90 days that immediately followed the move. People went on about their lives and paid the usual for eggs or milk – and by the time the freeze ended, they were accustomed to the idea that the “new”, free-floating dollar is worth about the same as the old, gold-backed one. A robust economy and favorable geopolitics did the rest, and so far, the American adventure with fiat currency has been rather uneventful – perhaps except for the fact that the price of gold itself skyrocketed from $35 per troy ounce in 1971 to $850 in 1980 (or, from $210 to $2,500 in today’s dollars).

Well, one thing did change: now better positioned to freely tamper with the supply of money, the regulators in accord with the bankers adopted a policy of creating it at a rate that slightly outstripped the organic growth in economic activity. They did this to induce a small, steady degree of inflation, believing that doing so would discourage people from hoarding cash and force them to reinvest it for the betterment of the society. Some critics like to point out that such a policy functions as a “backdoor” tax on savings that happens to align with the regulators’ less noble interests; still, either way: in the US and most other developed nations, the purchasing power of any money kept under a mattress will drop at a rate of somewhere between 2 to 10% a year.

5. So what’s up with Bitcoin?

Well… countless tomes have been written about the nature and the optimal characteristics of government-issued fiat currencies. Some heterodox economists, notably including Murray Rothbard, have also explored the topic of privately-issued, decentralized, commodity-backed currencies. But Bitcoin is a wholly different animal.

In essence, BTC is a global, decentralized fiat currency: it has no (recoverable) intrinsic value, no central authority to issue it or define its exchange rate, and it has no anchoring to any historical reference point – a combination that until recently seemed nonsensical and escaped any serious scrutiny. It does the unthinkable by employing three clever tricks:

  1. It allows anyone to create new coins, but only by solving brute-force computational challenges that get more difficult as the time goes by,

  2. It prevents unauthorized transfer of coins by employing public key cryptography to sign off transactions, with only the authorized holder of a coin knowing the correct key,

  3. It prevents double-spending by using a distributed public ledger (“blockchain”), recording the chain of custody for coins in a tamper-proof way.

The blockchain is often described as the most important feature of Bitcoin, but in some ways, its importance is overstated. The idea of a currency that does not rely on a centralized transaction clearinghouse is what helped propel the platform into the limelight – mostly because of its novelty and the perception that it is less vulnerable to government meddling (although the government is still free to track down, tax, fine, or arrest any participants). On the flip side, the everyday mechanics of BTC would not be fundamentally different if all the transactions had to go through Bitcoin Bank, LLC.

A more striking feature of the new currency is the incentive structure surrounding the creation of new coins. The underlying design democratized the creation of new coins early on: all you had to do is leave your computer running for a while to acquire a number of tokens. The tokens had no practical value, but obtaining them involved no substantial expense or risk. Just as importantly, because the difficulty of the puzzles would only increase over time, the hope was that if Bitcoin caught on, latecomers would find it easier to purchase BTC on a secondary market than mine their own – paying with a more established currency at a mutually beneficial exchange rate.

The persistent publicity surrounding Bitcoin and other cryptocurrencies did the rest – and today, with the growing scarcity of coins and the rapidly increasing demand, the price of a single token hovers somewhere south of $15,000.

6. So… is it bad money?

Predicting is hard – especially the future. In some sense, a coin that represents a cryptographic proof of wasted CPU cycles is no better or worse than a currency that relies on cotton decorated with pictures of dead presidents. It is true that Bitcoin suffers from many implementation problems – long transaction processing times, high fees, frequent security breaches of major exchanges – but in principle, such problems can be overcome.

That said, currencies live and die by the lasting willingness of others to accept them in exchange for services or goods – and in that sense, the jury is still out. The use of Bitcoin to settle bona fide purchases is negligible, both in absolute terms and in function of the overall volume of transactions. In fact, because of the technical challenges and limited practical utility, some companies that embraced the currency early on are now backing out.

When the value of an asset is derived almost entirely from its appeal as an ever-appreciating investment vehicle, the situation has all the telltale signs of a speculative bubble. But that does not prove that the asset is destined to collapse, or that a collapse would be its end. Still, the built-in deflationary mechanism of Bitcoin – the increasing difficulty of producing new coins – is probably both a blessing and a curse.

It’s going to go one way or the other; and when it’s all said and done, we’re going to celebrate the people who made the right guess. Because future is actually pretty darn easy to predict — in retrospect.

Movie Company Has No Right to Sue, Accused Pirate Argues

Post Syndicated from Ernesto original https://torrentfreak.com/movie-company-has-no-right-to-sue-accused-pirate-argues-171208/

In recent years, a group of select companies have pressured hundreds of thousands of alleged pirates to pay significant settlement fees, or face legal repercussions.

These so-called “copyright trolling” efforts have also been a common occurrence in the United States for more than half a decade, and still are today.

While copyright holders should be able to take legitimate piracy claims to court, not all cases are as strong as they first appear. Many defendants have brought up flaws, often in relation to the IP-address evidence, but an accused pirate in Oregon takes things up a notch.

Lingfu Zhang, represented by attorney David Madden, has turned the tables on the makers of the film Fathers & Daughters. The man denies having downloaded the movie but also points out that the filmmakers have signed away their online distribution rights.

The issue was brought up in previous months, but the relevant findings were only unsealed this week. They show that the movie company (F&D), through a sales agent, sold the online distribution rights to a third party.

While this is not uncommon in the movie business, it means that they no longer have the right to distribute the movie online, a right Zhang was accused of violating. This is also what his attorney pointed out to the court, asking for a judgment in favor of his client.

“ZHANG denies downloading the movie but Defendant’s current motion for summary judgment challenges a different portion of F&D’s case: Defendant argues that F&D has alienated all of the relevant rights necessary to sue for infringement under the Copyright Act,” Madden writes.

The filmmakers opposed the request and pointed out that they still had some rights. However, this is irrelevant according to the defense, since the distribution rights are not owned by them, but by a company that’s not part of the lawsuit.

“Plaintiff claims, for example, that it still owns the right to exploit the movie on airlines and oceangoing vessels. That may or may not be true – Plaintiff has not submitted any evidence on the question – but ZHANG is not accused of showing the movie on an airplane or a cruise ship.

“He is accused of downloading it over the Internet, which is an infringement that affects only an exclusive right owned by non-party DISTRIBUTOR 2,” Madden adds.

Interestingly, an undated addendum to the licensing agreement, allegedly created after the lawsuit was started, states that the filmmakers would keep their “anti-piracy” rights, as can be seen below.

Anti-Piracy rights?

This doesn’t save the filmmaker, according to the defense. The “licensor” who keeps these anti-piracy and enforcement rights refers to the sales agent, not the filmmaker, Madden writes. In addition, the case is about copyright infringement, and despite the addendum, the filmmakers don’t have the exclusive rights that apply here.

“Plaintiff represented to this Court that it was the ‘proprietor of all copyrights and interests need to bring suit’ […] notwithstanding that it had – years earlier – transferred away all its exclusive rights under Section 106 of the Copyright Act,” the defense lawyer concludes.

“Even viewing all Plaintiff’s agreements in the light most favorable to it, Plaintiff holds nothing more than a bare right to sue, which is not a cognizable right that may be exercised in the courts of this Circuit.”

While the court has yet to decide on the motion, this case could turn into a disaster for the makers of Fathers & Daughters.

If the court agrees that they don’t have the proper rights, defendants in other cases may argue the same. It’s easy to see how their entire trolling scheme would then collapse.

The original memorandum in support of the motion for summary judgment is available here (pdf) and a copy of the reply brief can be found here (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

The Raspberry Pi Christmas shopping list 2017

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/christmas-shopping-list-2017/

Looking for the perfect Christmas gift for a beloved maker in your life? Maybe you’d like to give a relative or friend a taste of the world of coding and Raspberry Pi? Whatever you’re looking for, the Raspberry Pi Christmas shopping list will point you in the right direction.

An ice-skating Raspberry Pi - The Raspberry Pi Christmas Shopping List 2017

For those getting started

Thinking about introducing someone special to the wonders of Raspberry Pi during the holidays? Although you can set up your Pi with peripherals from around your home, such as a mobile phone charger, your PC’s keyboard, and the old mouse dwelling in an office drawer, a starter kit is a nice all-in-one package for the budding coder.



Check out the starter kits from Raspberry Pi Approved Resellers such as Pimoroni, The Pi Hut, ModMyPi, Adafruit, CanaKit…the list is pretty long. Our products page will direct you to your closest reseller, or you can head to element14 to pick up the official Raspberry Pi Starter Kit.



You can also buy the Raspberry Pi Press’s brand-new Raspberry Pi Beginners Book, which includes a Raspberry Pi Zero W, a case, a ready-made SD card, and adapter cables.

Once you’ve presented a lucky person with their first Raspberry Pi, it’s time for them to spread their maker wings and learn some new skills.

MagPi Essentials books - The Raspberry Pi Christmas Shopping List 2017

To help them along, you could pick your favourite from among the Official Projects Book volume 3, The MagPi Essentials guides, and the brand-new third edition of Carrie Anne Philbin’s Adventures in Raspberry Pi. (She is super excited about this new edition!)

And you can always add a link to our free resources on the gift tag.

For the maker in your life

If you’re looking for something for a confident digital maker, you can’t go wrong with adding to their arsenal of electric and electronic bits and bobs that are no doubt cluttering drawers and boxes throughout their house.



Components such as servomotors, displays, and sensors are staples of the maker world. And when it comes to jumper wires, buttons, and LEDs, one can never have enough.



You could also consider getting your person a soldering iron, some helpings hands, or small tools such as a Dremel or screwdriver set.

And to make their life a little less messy, pop it all inside a Really Useful Box…because they’re really useful.



For kit makers

While some people like to dive into making head-first and to build whatever comes to mind, others enjoy working with kits.



The Naturebytes kit allows you to record the animal visitors of your garden with the help of a camera and a motion sensor. Footage of your local badgers, birds, deer, and more will be saved to an SD card, or tweeted or emailed to you if it’s in range of WiFi.

Cortec Tiny 4WD - The Raspberry Pi Christmas Shopping List 2017

Coretec’s Tiny 4WD is a kit for assembling a Pi Zero–powered remote-controlled robot at home. Not only is the robot adorable, building it also a great introduction to motors and wireless control.



Bare Conductive’s Touch Board Pro Kit offers everything you need to create interactive electronics projects using conductive paint.

Pi Hut Arcade Kit - The Raspberry Pi Christmas Shopping List 2017

Finally, why not help your favourite maker create their own gaming arcade using the Arcade Building Kit from The Pi Hut?

For the reader

For those who like to curl up with a good read, or spend too much of their day on public transport, a book or magazine subscription is the perfect treat.

For makers, hackers, and those interested in new technologies, our brand-new HackSpace magazine and the ever popular community magazine The MagPi are ideal. Both are available via a physical or digital subscription, and new subscribers to The MagPi also receive a free Raspberry Pi Zero W plus case.

Cover of CoderDojo Nano Make your own game

Marc Scott Beginner's Guide to Coding Book

You can also check out other publications from the Raspberry Pi family, including CoderDojo’s new CoderDojo Nano: Make Your Own Game, Eben Upton and Gareth Halfacree’s Raspberry Pi User Guide, and Marc Scott’s A Beginner’s Guide to Coding. And have I mentioned Carrie Anne’s Adventures in Raspberry Pi yet?

Stocking fillers for everyone

Looking for something small to keep your loved ones occupied on Christmas morning? Or do you have to buy a Secret Santa gift for the office tech? Here are some wonderful stocking fillers to fill your boots with this season.

Pi Hut 3D Christmas Tree - The Raspberry Pi Christmas Shopping List 2017

The Pi Hut 3D Xmas Tree: available as both a pre-soldered and a DIY version, this gadget will work with any 40-pin Raspberry Pi and allows you to create your own mini light show.



Google AIY Voice kit: build your own home assistant using a Raspberry Pi, the MagPi Essentials guide, and this brand-new kit. “Google, play Mariah Carey again…”



Pimoroni’s Raspberry Pi Zero W Project Kits offer everything you need, including the Pi, to make your own time-lapse cameras, music players, and more.



The official Raspberry Pi Sense HAT, Camera Module, and cases for the Pi 3 and Pi Zero will complete the collection of any Raspberry Pi owner, while also opening up exciting project opportunities.

STEAM gifts that everyone will love

Awesome Astronauts | Building LEGO’s Women of NASA!

LEGO Idea’s bought out this amazing ‘Women of NASA’ set, and I thought it would be fun to build, play and learn from these inspiring women! First up, let’s discover a little more about Sally Ride and Mae Jemison, two AWESOME ASTRONAUTS!

Treat the kids, and big kids, in your life to the newest LEGO Ideas set, the Women of NASA — starring Nancy Grace Roman, Margaret Hamilton, Sally Ride, and Mae Jemison!



Explore the world of wearables with Pimoroni’s sewable, hackable, wearable, adorable Bearables kits.



Add lights and motors to paper creations with the Activating Origami Kit, available from The Pi Hut.




We all loved Hidden Figures, and the STEAM enthusiast you know will do too. The film’s available on DVD, and you can also buy the original book, along with other fascinating non-fiction such as Rebecca Skloot’s The Immortal Life of Henrietta Lacks, Rachel Ignotofsky’s Women in Science, and Sydney Padua’s (mostly true) The Thrilling Adventures of Lovelace and Babbage.

Have we missed anything?

With so many amazing kits, HATs, and books available from members of the Raspberry Pi community, it’s hard to only pick a few. Have you found something splendid for the maker in your life? Maybe you’ve created your own kit that uses the Raspberry Pi? Share your favourites with us in the comments below or via our social media accounts.

The post The Raspberry Pi Christmas shopping list 2017 appeared first on Raspberry Pi.

Don Jr.: I’ll bite

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/11/don-jr-ill-bite.html

So Don Jr. tweets the following, which is an excellent troll. So I thought I’d bite. The reason is I just got through debunk Democrat claims about NetNeutrality, so it seems like a good time to balance things out and debunk Trump nonsense.

The issue here is not which side is right. The issue here is whether you stand for truth, or whether you’ll seize any factoid that appears to support your side, regardless of the truthfulness of it. The ACLU obviously chose falsehoods, as I documented. In the following tweet, Don Jr. does the same.

It’s a preview of the hyperpartisan debates are you are likely to have across the dinner table tomorrow, which each side trying to outdo the other in the false-hoods they’ll claim.

What we see in this number is a steady trend of these statistics since the Great Recession, with no evidence in the graphs showing how Trump has influenced these numbers, one way or the other.

Stock markets at all time highs

This is true, but it’s obviously not due to Trump. The stock markers have been steadily rising since the Great Recession. Trump has done nothing substantive to change the market trajectory. Also, he hasn’t inspired the market to change it’s direction.
To be fair to Don Jr., we’ve all been crediting (or blaming) presidents for changes in the stock market despite the fact they have almost no influence over it. Presidents don’t run the economy, it’s an inappropriate conceit. The most influence they’ve had is in harming it.

Lowest jobless claims since 73

Again, let’s graph this:

As we can see, jobless claims have been on a smooth downward trajectory since the Great Recession. It’s difficult to see here how President Trump has influenced these numbers.

6 Trillion added to the economy

What he’s referring to is that assets have risen in value, like the stock market, homes, gold, and even Bitcoin.
But this is a well known fallacy known as Mercantilism, believing the “economy” is measured by the value of its assets. This was debunked by Adam Smith in his book “The Wealth of Nations“, where he showed instead the the “economy” is measured by how much it produces (GDP – Gross Domestic Product) and not assets.
GDP has grown at 3.0%, which is pretty good compared to the long term trend, and is better than Europe or Japan (though not as good as China). But Trump doesn’t deserve any credit for this — today’s rise in GDP is the result of stuff that happened years ago.
Assets have risen by $6 trillion, but that’s not a good thing. After all, when you sell your home for more money, the buyer has to pay more. So one person is better off and one is worse off, so the net effect is zero.
Actually, such asset price increase is a worrisome indicator — we are entering into bubble territory. It’s the result of a loose monetary policy, low interest rates and “quantitative easing” that was designed under the Obama administration to stimulate the economy. That’s why all assets are rising in value. Normally, a rise in one asset means a fall in another, like selling gold to pay for houses. But because of loose monetary policy, all assets are increasing in price. The amazing rise in Bitcoin over the last year is as much a result of this bubble growing in all assets as it is to an exuberant belief in Bitcoin.
When this bubble collapses, which may happen during Trump’s term, it’ll really be the Obama administration who is to blame. I mean, if Trump is willing to take credit for the asset price bubble now, I’m willing to give it to him, as long as he accepts the blame when it crashes.

1.5 million fewer people on food stamps

As you’d expect, I’m going to debunk this with a graph: the numbers have been falling since the great recession. Indeed, in the previous period under Obama, 1.9 fewer people got off food stamps, so Trump’s performance is slight ahead rather than behind Obama. Of course, neither president is really responsible.

Consumer confidence through the roof

Again we are going to graph this number:

Again we find nothing in the graph that suggests President Trump is responsible for any change — it’s been improving steadily since the Great Recession.

One thing to note is that, technically, it’s not “through the roof” — it still quite a bit below the roof set during the dot-com era.

Lowest Unemployment rate in 17 years

Again, let’s simply graph it over time and look for Trump’s contribution. as we can see, there doesn’t appear to be anything special Trump has done — unemployment has steadily been improving since the Great Recession.
But here’s the thing, the “unemployment rate” only measures those looking for work, not those who have given up. The number that concerns people more is the “labor force participation rate”. The Great Recession kicked a lot of workers out of the economy.
Mostly this is because Baby Boomer are now retiring an leaving the workforce, and some have chosen to retire early rather than look for another job. But there are still some other problems in our economy that cause this. President Trump has nothing particular in order to solve these problems.

Conclusion

As we see, Don Jr’s tweet is a troll. When we look at the graphs of these indicators going back to the Great Recession, we don’t see how President Trump has influenced anything. The improvements this year are in line with the improvements last year, which are in turn inline with the improvements in the previous year.
To be fair, all parties credit their President with improvements during their term. President Obama’s supporters did the same thing. But at least right now, with these numbers, we can see that there’s no merit to anything in Don Jr’s tweet.
The hyperpartisan rancor in this country is because neither side cares about the facts. We should care. We should care that these numbers suck, even if we are Republicans. Conversely, we should care that those NetNeutrality claims by Democrats suck, even if we are Democrats.

Ultimate 3D printer control with OctoPrint

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/octoprint/

Control and monitor your 3D printer remotely with a Raspberry Pi and OctoPrint.

Timelapse of OctoPrint Ornament

Printed on a bq Witbox STL file can be found here: http://www.thingiverse.com/thing:191635 OctoPrint is located here: http://www.octoprint.org

3D printing

Whether you have a 3D printer at home or use one at your school or local makerspace, it’s fair to assume you’ve had a failed print or two in your time. Filament knotting or running out, your print peeling away from the print bed — these are common issues for all 3D printing enthusiasts, and they can be costly if they’re discovered too late.

OctoPrint

OctoPrint is a free open-source software, created and maintained by Gina Häußge, that performs a multitude of useful 3D printing–related tasks, including remote control of your printer, live video, and data collection.

The OctoPrint logo

Control and monitoring

To control the print process, use OctoPrint on a Raspberry Pi connected to your 3D printer. First, ensure a safe uninterrupted run by using the software to restrict who can access the printer. Then, before starting your print, use the web app to work on your STL file. The app also allows you to reposition the print head at any time, as well as pause or stop printing if needed.

Live video streaming

Since OctoPrint can stream video of your print as it happens, you can watch out for any faults that may require you to abort and restart. Proud of your print? Record the entire process from start to finish and upload the time-lapse video to your favourite social media platform.

OctoPrint software graphic user interface screenshot

Data capture

Octoprint records real-time data, such as the temperature, giving you another way to monitor your print to ensure a smooth, uninterrupted process. Moreover, the records will help with troubleshooting if there is a problem.

OctoPrint software graphic user interface screenshot

Print the Millenium Falcon

OK, you can print anything you like. However, this design definitely caught our eye this week.

3D-Printed Fillenium Malcon (Timelapse)

This is a Timelapse of my biggest print project so far on my own designed/built printer. It’s 500x170x700(mm) and weights 3 Kilograms of Filament.

You can support the work of Gina and OctoPrint by visiting her Patreon account and following OctoPrint on Twitter, Facebook, or G+. And if you’ve set up a Raspberry Pi to run OctoPrint, or you’ve created some cool Pi-inspired 3D prints, make sure to share them with us on our own social media channels.

The post Ultimate 3D printer control with OctoPrint appeared first on Raspberry Pi.

Using AWS CodeCommit Pull Requests to request code reviews and discuss code

Post Syndicated from Chris Barclay original https://aws.amazon.com/blogs/devops/using-aws-codecommit-pull-requests-to-request-code-reviews-and-discuss-code/

Thank you to Michael Edge, Senior Cloud Architect, for a great blog on CodeCommit pull requests.

~~~~~~~

AWS CodeCommit is a fully managed service for securely hosting private Git repositories. CodeCommit now supports pull requests, which allows repository users to review, comment upon, and interactively iterate on code changes. Used as a collaboration tool between team members, pull requests help you to review potential changes to a CodeCommit repository before merging those changes into the repository. Each pull request goes through a simple lifecycle, as follows:

  • The new features to be merged are added as one or more commits to a feature branch. The commits are not merged into the destination branch.
  • The pull request is created, usually from the difference between two branches.
  • Team members review and comment on the pull request. The pull request might be updated with additional commits that contain changes made in response to comments, or include changes made to the destination branch.
  • Once team members are happy with the pull request, it is merged into the destination branch. The commits are applied to the destination branch in the same order they were added to the pull request.

Commenting is an integral part of the pull request process, and is used to collaborate between the developers and the reviewer. Reviewers add comments and questions to a pull request during the review process, and developers respond to these with explanations. Pull request comments can be added to the overall pull request, a file within the pull request, or a line within a file.

To make the comments more useful, sign in to the AWS Management Console as an AWS Identity and Access Management (IAM) user. The username will then be associated with the comment, indicating the owner of the comment. Pull request comments are a great quality improvement tool as they allow the entire development team visibility into what reviewers are looking for in the code. They also serve as a record of the discussion between team members at a point in time, and shouldn’t be deleted.

AWS CodeCommit is also introducing the ability to add comments to a commit, another useful collaboration feature that allows team members to discuss code changed as part of a commit. This helps you discuss changes made in a repository, including why the changes were made, whether further changes are necessary, or whether changes should be merged. As is the case with pull request comments, you can comment on an overall commit, on a file within a commit, or on a specific line or change within a file, and other repository users can respond to your comments. Comments are not restricted to commits, they can also be used to comment on the differences between two branches, or between two tags. Commit comments are separate from pull request comments, i.e. you will not see commit comments when reviewing a pull request – you will only see pull request comments.

A pull request example

Let’s get started by running through an example. We’ll take a typical pull request scenario and look at how we’d use CodeCommit and the AWS Management Console for each of the steps.

To try out this scenario, you’ll need:

  • An AWS CodeCommit repository with some sample code in the master branch. We’ve provided sample code below.
  • Two AWS Identity and Access Management (IAM) users, both with the AWSCodeCommitPowerUser managed policy applied to them.
  • Git installed on your local computer, and access configured for AWS CodeCommit.
  • A clone of the AWS CodeCommit repository on your local computer.

In the course of this example, you’ll sign in to the AWS CodeCommit console as one IAM user to create the pull request, and as the other IAM user to review the pull request. To learn more about how to set up your IAM users and how to connect to AWS CodeCommit with Git, see the following topics:

  • Information on creating an IAM user with AWS Management Console access.
  • Instructions on how to access CodeCommit using Git.
  • If you’d like to use the same ‘hello world’ application as used in this article, here is the source code:
package com.amazon.helloworld;

public class Main {
	public static void main(String[] args) {

		System.out.println("Hello, world");
	}
}

The scenario below uses the us-east-2 region.

Creating the branches

Before we jump in and create a pull request, we’ll need at least two branches. In this example, we’ll follow a branching strategy similar to the one described in GitFlow. We’ll create a new branch for our feature from the main development branch (the default branch). We’ll develop the feature in the feature branch. Once we’ve written and tested the code for the new feature in that branch, we’ll create a pull request that contains the differences between the feature branch and the main development branch. Our team lead (the second IAM user) will review the changes in the pull request. Once the changes have been reviewed, the feature branch will be merged into the development branch.

Figure 1: Pull request link

Sign in to the AWS CodeCommit console with the IAM user you want to use as the developer. You can use an existing repository or you can go ahead and create a new one. We won’t be merging any changes to the master branch of your repository, so it’s safe to use an existing repository for this example. You’ll find the Pull requests link has been added just above the Commits link (see Figure 1), and below Commits you’ll find the Branches link. Click Branches and create a new branch called ‘develop’, branched from the ‘master’ branch. Then create a new branch called ‘feature1’, branched from the ‘develop’ branch. You’ll end up with three branches, as you can see in Figure 2. (Your repository might contain other branches in addition to the three shown in the figure).

Figure 2: Create a feature branch

If you haven’t cloned your repo yet, go to the Code link in the CodeCommit console and click the Connect button. Follow the instructions to clone your repo (detailed instructions are here). Open a terminal or command line and paste the git clone command supplied in the Connect instructions for your repository. The example below shows cloning a repository named codecommit-demo:

git clone https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo

If you’ve previously cloned the repo you’ll need to update your local repo with the branches you created. Open a terminal or command line and make sure you’re in the root directory of your repo, then run the following command:

git remote update origin

You’ll see your new branches pulled down to your local repository.

$ git remote update origin
Fetching origin
From https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo
 * [new branch]      develop    -> origin/develop
 * [new branch]      feature1   -> origin/feature1

You can also see your new branches by typing:

git branch --all

$ git branch --all
* master
  remotes/origin/develop
  remotes/origin/feature1
  remotes/origin/master

Now we’ll make a change to the ‘feature1’ branch. Open a terminal or command line and check out the feature1 branch by running the following command:

git checkout feature1

$ git checkout feature1
Branch feature1 set up to track remote branch feature1 from origin.
Switched to a new branch 'feature1'

Make code changes

Edit a file in the repo using your favorite editor and save the changes. Commit your changes to the local repository, and push your changes to CodeCommit. For example:

git commit -am 'added new feature'
git push origin feature1

$ git commit -am 'added new feature'
[feature1 8f6cb28] added new feature
1 file changed, 1 insertion(+), 1 deletion(-)

$ git push origin feature1
Counting objects: 9, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (9/9), 617 bytes | 617.00 KiB/s, done.
Total 9 (delta 2), reused 0 (delta 0)
To https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo
   2774a53..8f6cb28  feature1 -> feature1

Creating the pull request

Now we have a ‘feature1’ branch that differs from the ‘develop’ branch. At this point we want to merge our changes into the ‘develop’ branch. We’ll create a pull request to notify our team members to review our changes and check whether they are ready for a merge.

In the AWS CodeCommit console, click Pull requests. Click Create pull request. On the next page select ‘develop’ as the destination branch and ‘feature1’ as the source branch. Click Compare. CodeCommit will check for merge conflicts and highlight whether the branches can be automatically merged using the fast-forward option, or whether a manual merge is necessary. A pull request can be created in both situations.

Figure 3: Create a pull request

After comparing the two branches, the CodeCommit console displays the information you’ll need in order to create the pull request. In the ‘Details’ section, the ‘Title’ for the pull request is mandatory, and you may optionally provide comments to your reviewers to explain the code change you have made and what you’d like them to review. In the ‘Notifications’ section, there is an option to set up notifications to notify subscribers of changes to your pull request. Notifications will be sent on creation of the pull request as well as for any pull request updates or comments. And finally, you can review the changes that make up this pull request. This includes both the individual commits (a pull request can contain one or more commits, available in the Commits tab) as well as the changes made to each file, i.e. the diff between the two branches referenced by the pull request, available in the Changes tab. After you have reviewed this information and added a title for your pull request, click the Create button. You will see a confirmation screen, as shown in Figure 4, indicating that your pull request has been successfully created, and can be merged without conflicts into the ‘develop’ branch.

Figure 4: Pull request confirmation page

Reviewing the pull request

Now let’s view the pull request from the perspective of the team lead. If you set up notifications for this CodeCommit repository, creating the pull request would have sent an email notification to the team lead, and he/she can use the links in the email to navigate directly to the pull request. In this example, sign in to the AWS CodeCommit console as the IAM user you’re using as the team lead, and click Pull requests. You will see the same information you did during creation of the pull request, plus a record of activity related to the pull request, as you can see in Figure 5.

Figure 5: Team lead reviewing the pull request

Commenting on the pull request

You now perform a thorough review of the changes and make a number of comments using the new pull request comment feature. To gain an overall perspective on the pull request, you might first go to the Commits tab and review how many commits are included in this pull request. Next, you might visit the Changes tab to review the changes, which displays the differences between the feature branch code and the develop branch code. At this point, you can add comments to the pull request as you work through each of the changes. Let’s go ahead and review the pull request. During the review, you can add review comments at three levels:

  • The overall pull request
  • A file within the pull request
  • An individual line within a file

The overall pull request
In the Changes tab near the bottom of the page you’ll see a ‘Comments on changes’ box. We’ll add comments here related to the overall pull request. Add your comments as shown in Figure 6 and click the Save button.

Figure 6: Pull request comment

A specific file in the pull request
Hovering your mouse over a filename in the Changes tab will cause a blue ‘comments’ icon to appear to the left of the filename. Clicking the icon will allow you to enter comments specific to this file, as in the example in Figure 7. Go ahead and add comments for one of the files changed by the developer. Click the Save button to save your comment.

Figure 7: File comment

A specific line in a file in the pull request
A blue ‘comments’ icon will appear as you hover over individual lines within each file in the pull request, allowing you to create comments against lines that have been added, removed or are unchanged. In Figure 8, you add comments against a line that has been added to the source code, encouraging the developer to review the naming standards. Go ahead and add line comments for one of the files changed by the developer. Click the Save button to save your comment.

Figure 8: Line comment

A pull request that has been commented at all three levels will look similar to Figure 9. The pull request comment is shown expanded in the ‘Comments on changes’ section, while the comments at file and line level are shown collapsed. A ‘comment’ icon indicates that comments exist at file and line level. Clicking the icon will expand and show the comment. Since you are expecting the developer to make further changes based on your comments, you won’t merge the pull request at this stage, but will leave it open awaiting feedback. Each comment you made results in a notification being sent to the developer, who can respond to the comments. This is great for remote working, where developers and team lead may be in different time zones.

Figure 9: Fully commented pull request

Adding a little complexity

A typical development team is going to be creating pull requests on a regular basis. It’s highly likely that the team lead will merge other pull requests into the ‘develop’ branch while pull requests on feature branches are in the review stage. This may result in a change to the ‘Mergable’ status of a pull request. Let’s add this scenario into the mix and check out how a developer will handle this.

To test this scenario, we could create a new pull request and ask the team lead to merge this to the ‘develop’ branch. But for the sake of simplicity we’ll take a shortcut. Clone your CodeCommit repo to a new folder, switch to the ‘develop’ branch, and make a change to one of the same files that were changed in your pull request. Make sure you change a line of code that was also changed in the pull request. Commit and push this back to CodeCommit. Since you’ve just changed a line of code in the ‘develop’ branch that has also been changed in the ‘feature1’ branch, the ‘feature1’ branch cannot be cleanly merged into the ‘develop’ branch. Your developer will need to resolve this merge conflict.

A developer reviewing the pull request would see the pull request now looks similar to Figure 10, with a ‘Resolve conflicts’ status rather than the ‘Mergable’ status it had previously (see Figure 5).

Figure 10: Pull request with merge conflicts

Reviewing the review comments

Once the team lead has completed his review, the developer will review the comments and make the suggested changes. As a developer, you’ll see the list of review comments made by the team lead in the pull request Activity tab, as shown in Figure 11. The Activity tab shows the history of the pull request, including commits and comments. You can reply to the review comments directly from the Activity tab, by clicking the Reply button, or you can do this from the Changes tab. The Changes tab shows the comments for the latest commit, as comments on previous commits may be associated with lines that have changed or been removed in the current commit. Comments for previous commits are available to view and reply to in the Activity tab.

In the Activity tab, use the shortcut link (which looks like this </>) to move quickly to the source code associated with the comment. In this example, you will make further changes to the source code to address the pull request review comments, so let’s go ahead and do this now. But first, you will need to resolve the ‘Resolve conflicts’ status.

Figure 11: Pull request activity

Resolving the ‘Resolve conflicts’ status

The ‘Resolve conflicts’ status indicates there is a merge conflict between the ‘develop’ branch and the ‘feature1’ branch. This will require manual intervention to restore the pull request back to the ‘Mergable’ state. We will resolve this conflict next.

Open a terminal or command line and check out the develop branch by running the following command:

git checkout develop

$ git checkout develop
Switched to branch 'develop'
Your branch is up-to-date with 'origin/develop'.

To incorporate the changes the team lead made to the ‘develop’ branch, merge the remote ‘develop’ branch with your local copy:

git pull

$ git pull
remote: Counting objects: 9, done.
Unpacking objects: 100% (9/9), done.
From https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo
   af13c82..7b36f52  develop    -> origin/develop
Updating af13c82..7b36f52
Fast-forward
 src/main/java/com/amazon/helloworld/Main.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Then checkout the ‘feature1’ branch:

git checkout feature1

$ git checkout feature1
Switched to branch 'feature1'
Your branch is up-to-date with 'origin/feature1'.

Now merge the changes from the ‘develop’ branch into your ‘feature1’ branch:

git merge develop

$ git merge develop
Auto-merging src/main/java/com/amazon/helloworld/Main.java
CONFLICT (content): Merge conflict in src/main/java/com/amazon/helloworld/Main.java
Automatic merge failed; fix conflicts and then commit the result.

Yes, this fails. The file Main.java has been changed in both branches, resulting in a merge conflict that can’t be resolved automatically. However, Main.java will now contain markers that indicate where the conflicting code is, and you can use these to resolve the issues manually. Edit Main.java using your favorite IDE, and you’ll see it looks something like this:

package com.amazon.helloworld;

import java.util.*;

/**
 * This class prints a hello world message
 */

public class Main {
   public static void main(String[] args) {

<<<<<<< HEAD
        Date todaysdate = Calendar.getInstance().getTime();

        System.out.println("Hello, earthling. Today's date is: " + todaysdate);
=======
      System.out.println("Hello, earth");
>>>>>>> develop
   }
}

The code between HEAD and ‘===’ is the code the developer added in the ‘feature1’ branch (HEAD represents ‘feature1’ because this is the current checked out branch). The code between ‘===’ and ‘>>> develop’ is the code added to the ‘develop’ branch by the team lead. We’ll resolve the conflict by manually merging both changes, resulting in an updated Main.java:

package com.amazon.helloworld;

import java.util.*;

/**
 * This class prints a hello world message
 */

public class Main {
   public static void main(String[] args) {

        Date todaysdate = Calendar.getInstance().getTime();

        System.out.println("Hello, earth. Today's date is: " + todaysdate);
   }
}

After saving the change you can add and commit it to your local repo:

git add src/
git commit -m 'fixed merge conflict by merging changes'

Fixing issues raised by the reviewer

Now you are ready to address the comments made by the team lead. If you are no longer pointing to the ‘feature1’ branch, check out the ‘feature1’ branch by running the following command:

git checkout feature1

$ git checkout feature1
Branch feature1 set up to track remote branch feature1 from origin.
Switched to a new branch 'feature1'

Edit the source code in your favorite IDE and make the changes to address the comments. In this example, the developer has updated the source code as follows:

package com.amazon.helloworld;

import java.util.*;

/**
 *  This class prints a hello world message
 *
 * @author Michael Edge
 * @see HelloEarth
 * @version 1.0
 */

public class Main {
   public static void main(String[] args) {

        Date todaysDate = Calendar.getInstance().getTime();

        System.out.println("Hello, earth. Today's date is: " + todaysDate);
   }
}

After saving the changes, commit and push to the CodeCommit ‘feature1’ branch as you did previously:

git commit -am 'updated based on review comments'
git push origin feature1

Responding to the reviewer

Now that you’ve fixed the code issues you will want to respond to the review comments. In the AWS CodeCommit console, check that your latest commit appears in the pull request Commits tab. You now have a pull request consisting of more than one commit. The pull request in Figure 12 has four commits, which originated from the following activities:

  • 8th Nov: the original commit used to initiate this pull request
  • 10th Nov, 3 hours ago: the commit by the team lead to the ‘develop’ branch, merged into our ‘feature1’ branch
  • 10th Nov, 24 minutes ago: the commit by the developer that resolved the merge conflict
  • 10th Nov, 4 minutes ago: the final commit by the developer addressing the review comments

Figure 12: Pull request with multiple commits

Let’s reply to the review comments provided by the team lead. In the Activity tab, reply to the pull request comment and save it, as shown in Figure 13.

Figure 13: Replying to a pull request comment

At this stage, your code has been committed and you’ve updated your pull request comments, so you are ready for a final review by the team lead.

Final review

The team lead reviews the code changes and comments made by the developer. As team lead, you own the ‘develop’ branch and it’s your decision on whether to merge the changes in the pull request into the ‘develop’ branch. You can close the pull request with or without merging using the Merge and Close buttons at the bottom of the pull request page (see Figure 13). Clicking Close will allow you to add comments on why you are closing the pull request without merging. Merging will perform a fast-forward merge, incorporating the commits referenced by the pull request. Let’s go ahead and click the Merge button to merge the pull request into the ‘develop’ branch.

Figure 14: Merging the pull request

After merging a pull request, development of that feature is complete and the feature branch is no longer needed. It’s common practice to delete the feature branch after merging. CodeCommit provides a check box during merge to automatically delete the associated feature branch, as seen in Figure 14. Clicking the Merge button will merge the pull request into the ‘develop’ branch, as shown in Figure 15. This will update the status of the pull request to ‘Merged’, and will close the pull request.

Conclusion

This blog has demonstrated how pull requests can be used to request a code review, and enable reviewers to get a comprehensive summary of what is changing, provide feedback to the author, and merge the code into production. For more information on pull requests, see the documentation.

New – Amazon EC2 Instances with Up to 8 NVIDIA Tesla V100 GPUs (P3)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-ec2-instances-with-up-to-8-nvidia-tesla-v100-gpus-p3/

Driven by customer demand and made possible by on-going advances in the state-of-the-art, we’ve come a long way since the original m1.small instance that we launched in 2006, with instances that are emphasize compute power, burstable performance, memory size, local storage, and accelerated computing.

The New P3
Today we are making the next generation of GPU-powered EC2 instances available in four AWS regions. Powered by up to eight NVIDIA Tesla V100 GPUs, the P3 instances are designed to handle compute-intensive machine learning, deep learning, computational fluid dynamics, computational finance, seismic analysis, molecular modeling, and genomics workloads.

P3 instances use customized Intel Xeon E5-2686v4 processors running at up to 2.7 GHz. They are available in three sizes (all VPC-only and EBS-only):

Model NVIDIA Tesla V100 GPUs GPU Memory NVIDIA NVLink vCPUs Main Memory Network Bandwidth EBS Bandwidth
p3.2xlarge 1 16 GiB n/a 8 61 GiB Up to 10 Gbps 1.5 Gbps
p3.8xlarge 4 64 GiB 200 GBps 32 244 GiB 10 Gbps 7 Gbps
p3.16xlarge 8 128 GiB 300 GBps 64 488 GiB 25 Gbps 14 Gbps

Each of the NVIDIA GPUs is packed with 5,120 CUDA cores and another 640 Tensor cores and can deliver up to 125 TFLOPS of mixed-precision floating point, 15.7 TFLOPS of single-precision floating point, and 7.8 TFLOPS of double-precision floating point. On the two larger sizes, the GPUs are connected together via NVIDIA NVLink 2.0 running at a total data rate of up to 300 GBps. This allows the GPUs to exchange intermediate results and other data at high speed, without having to move it through the CPU or the PCI-Express fabric.

What’s a Tensor Core?
I had not heard the term Tensor core before starting to write this post. According to this very helpful post on the NVIDIA Blog, Tensor cores are designed to speed up the training and inference of large, deep neural networks. Each core is able to quickly and efficiently multiply a pair of 4×4 half-precision (also known as FP16) matrices together, add the resulting 4×4 matrix to another half or single-precision (FP32) matrix, and store the resulting 4×4 matrix in either half or single-precision form. Here’s a diagram from NVIDIA’s blog post:

This operation is in the innermost loop of the training process for a deep neural network, and is an excellent example of how today’s NVIDIA GPU hardware is purpose-built to address a very specific market need. By the way, the mixed-precision qualifier on the Tensor core performance means that it is flexible enough to work with with a combination of 16-bit and 32-bit floating point values.

Performance in Perspective
I always like to put raw performance numbers into a real-world perspective so that they are easier to relate to and (hopefully) more meaningful. This turned out to be surprisingly difficult, given that the eight NVIDIA Tesla V100 GPUs on a single p3.16xlarge can do 125 trillion single-precision floating point multiplications per second.

Let’s go back to the dawn of the microprocessor era, and consider the Intel 8080A chip that powered the MITS Altair that I bought in the summer of 1977. With a 2 MHz clock, it was able to do about 832 multiplications per second (I used this data and corrected it for the faster clock speed). The p3.16xlarge is roughly 150 billion times faster. However, just 1.2 billion seconds have gone by since that summer. In other words, I can do 100x more calculations today in one second than my Altair could have done in the last 40 years!

What about the innovative 8087 math coprocessor that was an optional accessory for the IBM PC that was announced in the summer of 1981? With a 5 MHz clock and purpose-built hardware, it was able to do about 52,632 multiplications per second. 1.14 billion seconds have elapsed since then, p3.16xlarge is 2.37 billion times faster, so the poor little PC would be barely halfway through a calculation that would run for 1 second today.

Ok, how about a Cray-1? First delivered in 1976, this supercomputer was able to perform vector operations at 160 MFLOPS, making the p3.x16xlarge 781,000 times faster. It could have iterated on some interesting problem 1500 times over the years since it was introduced.

Comparisons between the P3 and today’s scale-out supercomputers are harder to make, given that you can think of the P3 as a step-and-repeat component of a supercomputer that you can launch on as as-needed basis.

Run One Today
In order to take full advantage of the NVIDIA Tesla V100 GPUs and the Tensor cores, you will need to use CUDA 9 and cuDNN7. These drivers and libraries have already been added to the newest versions of the Windows AMIs and will be included in an updated Amazon Linux AMI that is scheduled for release on November 7th. New packages are already available in our repos if you want to to install them on your existing Amazon Linux AMI.

The newest AWS Deep Learning AMIs come preinstalled with the latest releases of Apache MxNet, Caffe2, and Tensorflow (each with support for the NVIDIA Tesla V100 GPUs), and will be updated to support P3 instances with other machine learning frameworks such as Microsoft Cognitive Toolkit and PyTorch as soon as these frameworks release support for the NVIDIA Tesla V100 GPUs. You can also use the NVIDIA Volta Deep Learning AMI for NGC.

P3 instances are available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) Regions in On-Demand, Spot, Reserved Instance, and Dedicated Host form.

Jeff;

 

Coaxing 2D platforming out of Unity

Post Syndicated from Eevee original https://eev.ee/blog/2017/10/13/coaxing-2d-platforming-out-of-unity/

An anonymous donor asked a question that I can’t even begin to figure out how to answer, but they also said anything else is fine, so here’s anything else.

I’ve been avoiding writing about game physics, since I want to save it for ✨ the book I’m writing ✨, but that book will almost certainly not touch on Unity. Here, then, is a brief run through some of the brick walls I ran into while trying to convince Unity to do 2D platforming.

This is fairly high-level — there are no blocks of code or helpful diagrams. I’m just getting this out of my head because it’s interesting. If you want more gritty details, I guess you’ll have to wait for ✨ the book ✨.

The setup

I hadn’t used Unity before. I hadn’t even used a “real” physics engine before. My games so far have mostly used LÖVE, a Lua-based engine. LÖVE includes box2d bindings, but for various reasons (not all of them good), I opted to avoid them and instead write my own physics completely from scratch. (How, you ask? ✨ Book ✨!)

I was invited to work on a Unity project, Chaos Composer, that someone else had already started. It had basic movement already implemented; I taught myself Unity’s physics system by hacking on it. It’s entirely possible that none of this is actually the best way to do anything, since I was really trying to reproduce my own homegrown stuff in Unity, but it’s the best I’ve managed to come up with.

Two recurring snags were that you can’t ask Unity to do multiple physics updates in a row, and sometimes getting the information I wanted was difficult. Working with my own code spoiled me a little, since I could invoke it at any time and ask it anything I wanted; Unity, on the other hand, is someone else’s black box with a rigid interface on top.

Also, wow, Googling for a lot of this was not quite as helpful as expected. A lot of what’s out there is just the first thing that works, and often that’s pretty hacky and imposes severe limits on the game design (e.g., “this won’t work with slopes”). Basic movement and collision are the first thing you do, which seems to me like the worst time to be locking yourself out of a lot of design options. I tried very (very, very, very) hard to minimize those kinds of constraints.

Problem 1: Movement

When I showed up, movement was already working. Problem solved!

Like any good programmer, I immediately set out to un-solve it. Given a “real” physics engine like Unity prominently features, you have two options: ⓐ treat the player as a physics object, or ⓑ don’t. The existing code went with option ⓑ, like I’d done myself with LÖVE, and like I’d seen countless people advise. Using a physics sim makes for bad platforming.

But… why? I believed it, but I couldn’t concretely defend it. I had to know for myself. So I started a blank project, drew some physics boxes, and wrote a dozen-line player controller.

Ah! Immediate enlightenment.

If the player was sliding down a wall, and I tried to move them into the wall, they would simply freeze in midair until I let go of the movement key. The trouble is that the physics sim works in terms of forces — moving the player involves giving them a nudge in some direction, like a giant invisible hand pushing them around the level. Surprise! If you press a real object against a real wall with your real hand, you’ll see the same effect — friction will cancel out gravity, and the object will stay in midair..

Platformer movement, as it turns out, doesn’t make any goddamn physical sense. What is air control? What are you pushing against? Nothing, really; we just have it because it’s nice to play with, because not having it is a nightmare.

I looked to see if there were any common solutions to this, and I only really found one: make all your walls frictionless.

Game development is full of hacks like this, and I… don’t like them. I can accept that minor hacks are necessary sometimes, but this one makes an early and widespread change to a fundamental system to “fix” something that was wrong in the first place. It also imposes an “invisible” requirement, something I try to avoid at all costs — if you forget to make a particular wall frictionless, you’ll never know unless you happen to try sliding down it.

And so, I swiftly returned to the existing code. It wasn’t too different from what I’d come up with for LÖVE: it applied gravity by hand, tracked the player’s velocity, computed the intended movement each frame, and moved by that amount. The interesting thing was that it used MovePosition, which schedules a movement for the next physics update and stops the movement if the player hits something solid.

It’s kind of a nice hybrid approach, actually; all the “physics” for conscious actors is done by hand, but the physics engine is still used for collision detection. It’s also used for collision rejection — if the player manages to wedge themselves several pixels into a solid object, for example, the physics engine will try to gently nudge them back out of it with no extra effort required on my part. I still haven’t figured out how to get that to work with my homegrown stuff, which is built to prevent overlap rather than to jiggle things out of it.

But wait, what about…

Our player is a dynamic body with rotation lock and no gravity. Why not just use a kinematic body?

I must be missing something, because I do not understand the point of kinematic bodies. I ran into this with Godot, too, which documented them the same way: as intended for use as players and other manually-moved objects. But by default, they don’t even collide with other kinematic bodies or static geometry. What? There’s a checkbox to turn this on, which I enabled, but then I found out that MovePosition doesn’t stop kinematic bodies when they hit something, so I would’ve had to cast along the intended path of movement to figure out when to stop, thus duplicating the same work the physics engine was about to do.

But that’s impossible anyway! Static geometry generally wants to be made of edge colliders, right? They don’t care about concave/convex. Imagine the player is standing on the ground near a wall and tries to move towards the wall. Both the ground and the wall are different edges from the same edge collider.

If you try to cast the player’s hitbox horizontally, parallel to the ground, you’ll only get one collision: the existing collision with the ground. Casting doesn’t distinguish between touching and hitting. And because Unity only reports one collision per collider, and because the ground will always show up first, you will never find out about the impending wall collision.

So you’re forced to either use raycasts for collision detection or decomposed polygons for world geometry, both of which are slightly worse tools for no real gain.

I ended up sticking with a dynamic body.


Oh, one other thing that doesn’t really fit anywhere else: keep track of units! If you’re adding something called “velocity” directly to something called “position”, something has gone very wrong. Acceleration is distance per time squared; velocity is distance per time; position is distance. You must multiply or divide by time to convert between them.

I never even, say, add a constant directly to position every frame; I always phrase it as velocity and multiply by Δt. It keeps the units consistent: time is always in seconds, not in tics.

Problem 2: Slopes

Ah, now we start to get off in the weeds.

A sort of pre-problem here was detecting whether we’re on a slope, which means detecting the ground. The codebase originally used a manual physics query of the area around the player’s feet to check for the ground, which seems to be somewhat common, but that can’t tell me the angle of the detected ground. (It’s also kind of error-prone, since “around the player’s feet” has to be specified by hand and may not stay correct through animations or changes in the hitbox.)

I replaced that with what I’d eventually settled on in LÖVE: detect the ground by detecting collisions, and looking at the normal of the collision. A normal is a vector that points straight out from a surface, so if you’re standing on the ground, the normal points straight up; if you’re on a 10° incline, the normal points 10° away from straight up.

Not all collisions are with the ground, of course, so I assumed something is ground if the normal pointed away from gravity. (I like this definition more than “points upwards”, because it avoids assuming anything about the direction of gravity, which leaves some interesting doors open for later on.) That’s easily detected by taking the dot product — if it’s negative, the collision was with the ground, and I now have the normal of the ground.

Actually doing this in practice was slightly tricky. With my LÖVE engine, I could cram this right into the middle of collision resolution. With Unity, not quite so much. I went through a couple iterations before I really grasped Unity’s execution order, which I guess I will have to briefly recap for this to make sense.

Unity essentially has two update cycles. It performs physics updates at fixed intervals for consistency, and updates everything else just before rendering. Within a single frame, Unity does as many fixed physics updates as it has spare time for (which might be zero, one, or more), then does a regular update, then renders. User code can implement either or both of Update, which runs during a regular update, and FixedUpdate, which runs just before Unity does a physics pass.

So my solution was:

  • At the very end of FixedUpdate, clear the actor’s “on ground” flag and ground normal.

  • During OnCollisionEnter2D and OnCollisionStay2D (which are called from within a physics pass), if there’s a collision that looks like it’s with the ground, set the “on ground” flag and ground normal. (If there are multiple ground collisions, well, good luck figuring out the best way to resolve that! At the moment I’m just taking the first and hoping for the best.)

That means there’s a brief window between the end of FixedUpdate and Unity’s physics pass during which a grounded actor might mistakenly believe it’s not on the ground, which is a bit of a shame, but there are very few good reasons for anything to be happening in that window.

Okay! Now we can do slopes.

Just kidding! First we have to do sliding.

When I first looked at this code, it didn’t apply gravity while the player was on the ground. I think I may have had some problems with detecting the ground as result, since the player was no longer pushing down against it? Either way, it seemed like a silly special case, so I made gravity always apply.

Lo! I was a fool. The player could no longer move.

Why? Because MovePosition does exactly what it promises. If the player collides with something, they’ll stop moving. Applying gravity means that the player is trying to move diagonally downwards into the ground, and so MovePosition stops them immediately.

Hence, sliding. I don’t want the player to actually try to move into the ground. I want them to move the unblocked part of that movement. For flat ground, that means the horizontal part, which is pretty much the same as discarding gravity. For sloped ground, it’s a bit more complicated!

Okay but actually it’s less complicated than you’d think. It can be done with some cross products fairly easily, but Unity makes it even easier with a couple casts. There’s a Vector3.ProjectOnPlane function that projects an arbitrary vector on a plane given by its normal — exactly the thing I want! So I apply that to the attempted movement before passing it along to MovePosition. I do the same thing with the current velocity, to prevent the player from accelerating infinitely downwards while standing on flat ground.

One other thing: I don’t actually use the detected ground normal for this. The player might be touching two ground surfaces at the same time, and I’d want to project on both of them. Instead, I use the player body’s GetContacts method, which returns contact points (and normals!) for everything the player is currently touching. I believe those contact points are tracked by the physics engine anyway, so asking for them doesn’t require any actual physics work.

(Looking at the code I have, I notice that I still only perform the slide for surfaces facing upwards — but I’d want to slide against sloped ceilings, too. Why did I do this? Maybe I should remove that.)

(Also, I’m pretty sure projecting a vector on a plane is non-commutative, which raises the question of which order the projections should happen in and what difference it makes. I don’t have a good answer.)

(I note that my LÖVE setup does something slightly different: it just tries whatever the movement ought to be, and if there’s a collision, then it projects — and tries again with the remaining movement. But I can’t ask Unity to do multiple moves in one physics update, alas.)

Okay! Now, slopes. But actually, with the above work done, slopes are most of the way there already.

One obvious problem is that the player tries to move horizontally even when on a slope, and the easy fix is to change their movement from speed * Vector2.right to speed * new Vector2(ground.y, -ground.x) while on the ground. That’s the ground normal rotated a quarter-turn clockwise, so for flat ground it still points to the right, and in general it points rightwards along the ground. (Note that it assumes the ground normal is a unit vector, but as far as I’m aware, that’s true for all the normals Unity gives you.)

Another issue is that if the player stands motionless on a slope, gravity will cause them to slowly slide down it — because the movement from gravity will be projected onto the slope, and unlike flat ground, the result is no longer zero. For conscious actors only, I counter this by adding the opposite factor to the player’s velocity as part of adding in their walking speed. This matches how the real world works, to some extent: when you’re standing on a hill, you’re exerting some small amount of effort just to stay in place.

(Note that slope resistance is not the same as friction. Okay, yes, in the real world, virtually all resistance to movement happens as a result of friction, but bracing yourself against the ground isn’t the same as being passively resisted.)

From here there are a lot of things you can do, depending on how you think slopes should be handled. You could make the player unable to walk up slopes that are too steep. You could make walking down a slope faster than walking up it. You could make jumping go along the ground normal, rather than straight up. You could raise the player’s max allowed speed while running downhill. Whatever you want, really. Armed with a normal and awareness of dot products, you can do whatever you want.

But first you might want to fix a few aggravating side effects.

Problem 3: Ground adherence

I don’t know if there’s a better name for this. I rarely even see anyone talk about it, which surprises me; it seems like it should be a very common problem.

The problem is: if the player runs up a slope which then abruptly changes to flat ground, their momentum will carry them into the air. For very fast players going off the top of very steep slopes, this makes sense, but it becomes visible even for relatively gentle slopes. It was a mild nightmare in the original release of our game Lunar Depot 38, which has very “rough” ground made up of lots of shallow slopes — so the player is very frequently slightly off the ground, which meant they couldn’t jump, for seemingly no reason. (I even had code to fix this, but I disabled it because of a silly visual side effect that I never got around to fixing.)

Anyway! The reason this is a problem is that game protagonists are generally not boxes sliding around — they have legs. We don’t go flying off the top of real-world hilltops because we put our foot down until it touches the ground.

Simulating this footfall is surprisingly fiddly to get right, especially with someone else’s physics engine. It’s made somewhat easier by Cast, which casts the entire hitbox — no matter what shape it is — in a particular direction, as if it had moved, and tells you all the hypothetical collisions in order.

So I cast the player in the direction of gravity by some distance. If the cast hits something solid with a ground-like collision normal, then the player must be close to the ground, and I move them down to touch it (and set that ground as the new ground normal).

There are some wrinkles.

Wrinkle 1: I only want to do this if the player is off the ground now, but was on the ground last frame, and is not deliberately moving upwards. That latter condition means I want to skip this logic if the player jumps, for example, but also if the player is thrust upwards by a spring or abducted by a UFO or whatever. As long as external code goes through some interface and doesn’t mess with the player’s velocity directly, that shouldn’t be too hard to track.

Wrinkle 2: When does this logic run? It needs to happen after the player moves, which means after a Unity physics pass… but there’s no callback for that point in time. I ended up running it at the beginning of FixedUpdate and the beginning of Update — since I definitely want to do it before rendering happens! That means it’ll sometimes happen twice between physics updates. (I could carefully juggle a flag to skip the second run, but I… didn’t do that. Yet?)

Wrinkle 3: I can’t move the player with MovePosition! Remember, MovePosition schedules a movement, it doesn’t actually perform one; that means if it’s called twice before the physics pass, the first call is effectively ignored. I can’t easily combine the drop with the player’s regular movement, for various fiddly reasons. I ended up doing it “by hand” using transform.Translate, which I think was the “old way” to do manual movement before MovePosition existed. I’m not totally sure if it activates triggers? For that matter, I’m not sure it even notices collisions — but since I did a full-body Cast, there shouldn’t be any anyway.

Wrinkle 4: What, exactly, is “some distance”? I’ve yet to find a satisfying answer for this. It seems like it ought to be based on the player’s current speed and the slope of the ground they’re moving along, but every time I’ve done that math, I’ve gotten totally ludicrous answers that sometimes exceed the size of a tile. But maybe that’s not wrong? Play around, I guess, and think about when the effect should “break” and the player should go flying off the top of a hill.

Wrinkle 5: It’s possible that the player will launch off a slope, hit something, and then be adhered to the ground where they wouldn’t have hit it. I don’t much like this edge case, but I don’t see a way around it either.

This problem is surprisingly awkward for how simple it sounds, and the solution isn’t entirely satisfying. Oh, well; the results are much nicer than the solution. As an added bonus, this also fixes occasional problems with running down a hill and becoming detached from the ground due to precision issues or whathaveyou.

Problem 4: One-way platforms

Ah, what a nightmare.

It took me ages just to figure out how to define one-way platforms. Only block when the player is moving downwards? Nope. Only block when the player is above the platform? Nuh-uh.

Well, okay, yes, those approaches might work for convex players and flat platforms. But what about… sloped, one-way platforms? There’s no reason you shouldn’t be able to have those. If Super Mario World can do it, surely Unity can do it almost 30 years later.

The trick is, again, to look at the collision normal. If it faces away from gravity, the player is hitting a ground-like surface, so the platform should block them. Otherwise (or if the player overlaps the platform), it shouldn’t.

Here’s the catch: Unity doesn’t have conditional collision. I can’t decide, on the fly, whether a collision should block or not. In fact, I think that by the time I get a callback like OnCollisionEnter2D, the physics pass is already over.

I could go the other way and use triggers (which are non-blocking), but then I have the opposite problem: I can’t stop the player on the fly. I could move them back to where they hit the trigger, but I envision all kinds of problems as a result. What if they were moving fast enough to activate something on the other side of the platform? What if something else moved to where I’m trying to shove them back to in the meantime? How does this interact with ground detection and listing contacts, which would rightly ignore a trigger as non-blocking?

I beat my head against this for a while, but the inability to respond to collision conditionally was a huge roadblock. It’s all the more infuriating a problem, because Unity ships with a one-way platform modifier thing. Unfortunately, it seems to have been implemented by someone who has never played a platformer. It’s literally one-way — the player is only allowed to move straight upwards through it, not in from the sides. It also tries to block the player if they’re moving downwards while inside the platform, which invokes clumsy rejection behavior. And this all seems to be built into the physics engine itself somehow, so I can’t simply copy whatever they did.

Eventually, I settled on the following. After calculating attempted movement (including sliding), just at the end of FixedUpdate, I do a Cast along the movement vector. I’m not thrilled about having to duplicate the physics engine’s own work, but I do filter to only things on a “one-way platform” physics layer, which should at least help. For each object the cast hits, I use Physics2D.IgnoreCollision to either ignore or un-ignore the collision between the player and the platform, depending on whether the collision was ground-like or not.

(A lot of people suggested turning off collision between layers, but that can’t possibly work — the player might be standing on one platform while inside another, and anyway, this should work for all actors!)

Again, wrinkles! But fewer this time. Actually, maybe just one: handling the case where the player already overlaps the platform. I can’t just check for that with e.g. OverlapCollider, because that doesn’t distinguish between overlapping and merely touching.

I came up with a fairly simple fix: if I was going to un-ignore the collision (i.e. make the platform block), and the cast distance is reported as zero (either already touching or overlapping), I simply do nothing instead. If I’m standing on the platform, I must have already set it blocking when I was approaching it from the top anyway; if I’m overlapping it, I must have already set it non-blocking to get here in the first place.

I can imagine a few cases where this might go wrong. Moving platforms, especially, are going to cause some interesting issues. But this is the best I can do with what I know, and it seems to work well enough so far.

Oh, and our player can deliberately drop down through platforms, which was easy enough to implement; I just decide the platform is always passable while some button is held down.

Problem 5: Pushers and carriers

I haven’t gotten to this yet! Oh boy, can’t wait. I implemented it in LÖVE, but my way was hilariously invasive; I’m hoping that having a physics engine that supports a handwaved “this pushes that” will help. Of course, you also have to worry about sticking to platforms, for which the recommended solution is apparently to parent the cargo to the platform, which sounds goofy to me? I guess I’ll find out when I throw myself at it later.

Overall result

I ended up with a fairly pleasant-feeling system that supports slopes and one-way platforms and whatnot, with all the same pieces as I came up with for LÖVE. The code somehow ended up as less of a mess, too, but it probably helps that I’ve been down this rabbit hole once before and kinda knew what I was aiming for this time.

Animation of a character running smoothly along the top of an irregular dinosaur skeleton

Sorry that I don’t have a big block of code for you to copy-paste into your project. I don’t think there are nearly enough narrative discussions of these fundamentals, though, so hopefully this is useful to someone. If not, well, look forward to ✨ my book, that I am writing ✨!

Predict Billboard Top 10 Hits Using RStudio, H2O and Amazon Athena

Post Syndicated from Gopal Wunnava original https://aws.amazon.com/blogs/big-data/predict-billboard-top-10-hits-using-rstudio-h2o-and-amazon-athena/

Success in the popular music industry is typically measured in terms of the number of Top 10 hits artists have to their credit. The music industry is a highly competitive multi-billion dollar business, and record labels incur various costs in exchange for a percentage of the profits from sales and concert tickets.

Predicting the success of an artist’s release in the popular music industry can be difficult. One release may be extremely popular, resulting in widespread play on TV, radio and social media, while another single may turn out quite unpopular, and therefore unprofitable. Record labels need to be selective in their decision making, and predictive analytics can help them with decision making around the type of songs and artists they need to promote.

In this walkthrough, you leverage H2O.ai, Amazon Athena, and RStudio to make predictions on whether a song might make it to the Top 10 Billboard charts. You explore the GLM, GBM, and deep learning modeling techniques using H2O’s rapid, distributed and easy-to-use open source parallel processing engine. RStudio is a popular IDE, licensed either commercially or under AGPLv3, for working with R. This is ideal if you don’t want to connect to a server via SSH and use code editors such as vi to do analytics. RStudio is available in a desktop version, or a server version that allows you to access R via a web browser. RStudio’s Notebooks feature is used to demonstrate the execution of code and output. In addition, this post showcases how you can leverage Athena for query and interactive analysis during the modeling phase. A working knowledge of statistics and machine learning would be helpful to interpret the analysis being performed in this post.

Walkthrough

Your goal is to predict whether a song will make it to the Top 10 Billboard charts. For this purpose, you will be using multiple modeling techniques―namely GLM, GBM and deep learning―and choose the model that is the best fit.

This solution involves the following steps:

  • Install and configure RStudio with Athena
  • Log in to RStudio
  • Install R packages
  • Connect to Athena
  • Create a dataset
  • Create models

Install and configure RStudio with Athena

Use the following AWS CloudFormation stack to install, configure, and connect RStudio on an Amazon EC2 instance with Athena.

Launching this stack creates all required resources and prerequisites:

  • Amazon EC2 instance with Amazon Linux (minimum size of t2.large is recommended)
  • Provisioning of the EC2 instance in an existing VPC and public subnet
  • Installation of Java 8
  • Assignment of an IAM role to the EC2 instance with the required permissions for accessing Athena and Amazon S3
  • Security group allowing access to the RStudio and SSH ports from the internet (I recommend restricting access to these ports)
  • S3 staging bucket required for Athena (referenced within RStudio as ATHENABUCKET)
  • RStudio username and password
  • Setup logs in Amazon CloudWatch Logs (if needed for additional troubleshooting)
  • Amazon EC2 Systems Manager agent, which makes it easy to manage and patch

All AWS resources are created in the US-East-1 Region. To avoid cross-region data transfer fees, launch the CloudFormation stack in the same region. To check the availability of Athena in other regions, see Region Table.

Log in to RStudio

The instance security group has been automatically configured to allow incoming connections on the RStudio port 8787 from any source internet address. You can edit the security group to restrict source IP access. If you have trouble connecting, ensure that port 8787 isn’t blocked by subnet network ACLS or by your outgoing proxy/firewall.

  1. In the CloudFormation stack, choose Outputs, Value, and then open the RStudio URL. You might need to wait for a few minutes until the instance has been launched.
  2. Log in to RStudio with the and password you provided during setup.

Install R packages

Next, install the required R packages from the RStudio console. You can download the R notebook file containing just the code.

#install pacman – a handy package manager for managing installs
if("pacman" %in% rownames(installed.packages()) == FALSE)
{install.packages("pacman")}  
library(pacman)
p_load(h2o,rJava,RJDBC,awsjavasdk)
h2o.init(nthreads = -1)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         2 hours 42 minutes 
##     H2O cluster version:        3.10.4.6 
##     H2O cluster version age:    4 months and 4 days !!! 
##     H2O cluster name:           H2O_started_from_R_rstudio_hjx881 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.30 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 3.3.3 (2017-03-06)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is too old (4 months and 4 days)!
## Please download and install the latest version from http://h2o.ai/download/
#install aws sdk if not present (pre-requisite for using Athena with an IAM role)
if (!aws_sdk_present()) {
  install_aws_sdk()
}

load_sdk()
## NULL

Connect to Athena

Next, establish a connection to Athena from RStudio, using an IAM role associated with your EC2 instance. Use ATHENABUCKET to specify the S3 staging directory.

URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.1.jar'
fil <- basename(URL)
#download the file into current working directory
if (!file.exists(fil)) download.file(URL, fil)
#verify that the file has been downloaded successfully
list.files()
## [1] "AthenaJDBC41-1.0.1.jar"
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir=Sys.getenv("ATHENABUCKET"),
                                   aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain")

Verify the connection. The results returned depend on your specific Athena setup.

con
## <JDBCConnection>
dbListTables(con)
##  [1] "gdelt"               "wikistats"           "elb_logs_raw_native"
##  [4] "twitter"             "twitter2"            "usermovieratings"   
##  [7] "eventcodes"          "events"              "billboard"          
## [10] "billboardtop10"      "elb_logs"            "gdelthist"          
## [13] "gdeltmaster"         "twitter"             "twitter3"

Create a dataset

For this analysis, you use a sample dataset combining information from Billboard and Wikipedia with Echo Nest data in the Million Songs Dataset. Upload this dataset into your own S3 bucket. The table below provides a description of the fields used in this dataset.

Field Description
year Year that song was released
songtitle Title of the song
artistname Name of the song artist
songid Unique identifier for the song
artistid Unique identifier for the song artist
timesignature Variable estimating the time signature of the song
timesignature_confidence Confidence in the estimate for the timesignature
loudness Continuous variable indicating the average amplitude of the audio in decibels
tempo Variable indicating the estimated beats per minute of the song
tempo_confidence Confidence in the estimate for tempo
key Variable with twelve levels indicating the estimated key of the song (C, C#, B)
key_confidence Confidence in the estimate for key
energy Variable that represents the overall acoustic energy of the song, using a mix of features such as loudness
pitch Continuous variable that indicates the pitch of the song
timbre_0_min thru timbre_11_min Variables that indicate the minimum values over all segments for each of the twelve values in the timbre vector
timbre_0_max thru timbre_11_max Variables that indicate the maximum values over all segments for each of the twelve values in the timbre vector
top10 Indicator for whether or not the song made it to the Top 10 of the Billboard charts (1 if it was in the top 10, and 0 if not)

Create an Athena table based on the dataset

In the Athena console, select the default database, sampled, or create a new database.

Run the following create table statement.

create external table if not exists billboard
(
year int,
songtitle string,
artistname string,
songID string,
artistID string,
timesignature int,
timesignature_confidence double,
loudness double,
tempo double,
tempo_confidence double,
key int,
key_confidence double,
energy double,
pitch double,
timbre_0_min double,
timbre_0_max double,
timbre_1_min double,
timbre_1_max double,
timbre_2_min double,
timbre_2_max double,
timbre_3_min double,
timbre_3_max double,
timbre_4_min double,
timbre_4_max double,
timbre_5_min double,
timbre_5_max double,
timbre_6_min double,
timbre_6_max double,
timbre_7_min double,
timbre_7_max double,
timbre_8_min double,
timbre_8_max double,
timbre_9_min double,
timbre_9_max double,
timbre_10_min double,
timbre_10_max double,
timbre_11_min double,
timbre_11_max double,
Top10 int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://aws-bigdata-blog/artifacts/predict-billboard/data'
;

Inspect the table definition for the ‘billboard’ table that you have created. If you chose a database other than sampledb, replace that value with your choice.

dbGetQuery(con, "show create table sampledb.billboard")
##                                      createtab_stmt
## 1       CREATE EXTERNAL TABLE `sampledb.billboard`(
## 2                                       `year` int,
## 3                               `songtitle` string,
## 4                              `artistname` string,
## 5                                  `songid` string,
## 6                                `artistid` string,
## 7                              `timesignature` int,
## 8                `timesignature_confidence` double,
## 9                                `loudness` double,
## 10                                  `tempo` double,
## 11                       `tempo_confidence` double,
## 12                                       `key` int,
## 13                         `key_confidence` double,
## 14                                 `energy` double,
## 15                                  `pitch` double,
## 16                           `timbre_0_min` double,
## 17                           `timbre_0_max` double,
## 18                           `timbre_1_min` double,
## 19                           `timbre_1_max` double,
## 20                           `timbre_2_min` double,
## 21                           `timbre_2_max` double,
## 22                           `timbre_3_min` double,
## 23                           `timbre_3_max` double,
## 24                           `timbre_4_min` double,
## 25                           `timbre_4_max` double,
## 26                           `timbre_5_min` double,
## 27                           `timbre_5_max` double,
## 28                           `timbre_6_min` double,
## 29                           `timbre_6_max` double,
## 30                           `timbre_7_min` double,
## 31                           `timbre_7_max` double,
## 32                           `timbre_8_min` double,
## 33                           `timbre_8_max` double,
## 34                           `timbre_9_min` double,
## 35                           `timbre_9_max` double,
## 36                          `timbre_10_min` double,
## 37                          `timbre_10_max` double,
## 38                          `timbre_11_min` double,
## 39                          `timbre_11_max` double,
## 40                                     `top10` int)
## 41                             ROW FORMAT DELIMITED 
## 42                         FIELDS TERMINATED BY ',' 
## 43                            STORED AS INPUTFORMAT 
## 44       'org.apache.hadoop.mapred.TextInputFormat' 
## 45                                     OUTPUTFORMAT 
## 46  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
## 47                                        LOCATION
## 48    's3://aws-bigdata-blog/artifacts/predict-billboard/data'
## 49                                  TBLPROPERTIES (
## 50            'transient_lastDdlTime'='1505484133')

Run a sample query

Next, run a sample query to obtain a list of all songs from Janet Jackson that made it to the Billboard Top 10 charts.

dbGetQuery(con, " SELECT songtitle,artistname,top10   FROM sampledb.billboard WHERE lower(artistname) =     'janet jackson' AND top10 = 1")
##                       songtitle    artistname top10
## 1                       Runaway Janet Jackson     1
## 2               Because Of Love Janet Jackson     1
## 3                         Again Janet Jackson     1
## 4                            If Janet Jackson     1
## 5  Love Will Never Do (Without You) Janet Jackson 1
## 6                     Black Cat Janet Jackson     1
## 7               Come Back To Me Janet Jackson     1
## 8                       Alright Janet Jackson     1
## 9                      Escapade Janet Jackson     1
## 10                Rhythm Nation Janet Jackson     1

Determine how many songs in this dataset are specifically from the year 2010.

dbGetQuery(con, " SELECT count(*)   FROM sampledb.billboard WHERE year = 2010")
##   _col0
## 1   373

The sample dataset provides certain song properties of interest that can be analyzed to gauge the impact to the song’s overall popularity. Look at one such property, timesignature, and determine the value that is the most frequent among songs in the database. Timesignature is a measure of the number of beats and the type of note involved.

Running the query directly may result in an error, as shown in the commented lines below. This error is a result of trying to retrieve a large result set over a JDBC connection, which can cause out-of-memory issues at the client level. To address this, reduce the fetch size and run again.

#t<-dbGetQuery(con, " SELECT timesignature FROM sampledb.billboard")
#Note:  Running the preceding query results in the following error: 
#Error in .jcall(rp, "I", "fetch", stride, block): java.sql.SQLException: The requested #fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try #again. Refer to the Athena documentation for valid fetchSize values.
# Use the dbSendQuery function, reduce the fetch size, and run again
r <- dbSendQuery(con, " SELECT timesignature     FROM sampledb.billboard")
dftimesignature<- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
table(dftimesignature)
## dftimesignature
##    0    1    3    4    5    7 
##   10  143  503 6787  112   19
nrow(dftimesignature)
## [1] 7574

From the results, observe that 6787 songs have a timesignature of 4.

Next, determine the song with the highest tempo.

dbGetQuery(con, " SELECT songtitle,artistname,tempo   FROM sampledb.billboard WHERE tempo = (SELECT max(tempo) FROM sampledb.billboard) ")
##                   songtitle      artistname   tempo
## 1 Wanna Be Startin' Somethin' Michael Jackson 244.307

Create the training dataset

Your model needs to be trained such that it can learn and make accurate predictions. Split the data into training and test datasets, and create the training dataset first.  This dataset contains all observations from the year 2009 and earlier. You may face the same JDBC connection issue pointed out earlier, so this query uses a fetch size.

#BillboardTrain <- dbGetQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
#Running the preceding query results in the following error:-
#Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve #JDBC result set for SELECT * FROM sampledb.billboard WHERE year <= 2009 (Internal error)
#Follow the same approach as before to address this issue.

r <- dbSendQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
BillboardTrain <- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
BillboardTrain[1:2,c(1:3,6:10)]
##   year           songtitle artistname timesignature
## 1 2009 The Awkward Goodbye    Athlete             3
## 2 2009        Rubik's Cube    Athlete             3
##   timesignature_confidence loudness   tempo tempo_confidence
## 1                    0.732   -6.320  89.614   0.652
## 2                    0.906   -9.541 117.742   0.542
nrow(BillboardTrain)
## [1] 7201

Create the test dataset

BillboardTest <- dbGetQuery(con, "SELECT * FROM sampledb.billboard where year = 2010")
BillboardTest[1:2,c(1:3,11:15)]
##   year              songtitle        artistname key
## 1 2010 This Is the House That Doubt Built A Day to Remember  11
## 2 2010        Sticks & Bricks A Day to Remember  10
##   key_confidence    energy pitch timbre_0_min
## 1          0.453 0.9666556 0.024        0.002
## 2          0.469 0.9847095 0.025        0.000
nrow(BillboardTest)
## [1] 373

Convert the training and test datasets into H2O dataframes

train.h2o <- as.h2o(BillboardTrain)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
test.h2o <- as.h2o(BillboardTest)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%

Inspect the column names in your H2O dataframes.

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"

Create models

You need to designate the independent and dependent variables prior to applying your modeling algorithms. Because you’re trying to predict the ‘top10’ field, this would be your dependent variable and everything else would be independent.

Create your first model using GLM. Because GLM works best with numeric data, you create your model by dropping non-numeric variables. You only use the variables in the dataset that describe the numerical attributes of the song in the logistic regression model. You won’t use these variables:  “year”, “songtitle”, “artistname”, “songid”, or “artistid”.

y.dep <- 39
x.indep <- c(6:38)
x.indep
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [24] 29 30 31 32 33 34 35 36 37 38

Create Model 1: All numeric variables

Create Model 1 with the training dataset, using GLM as the modeling algorithm and H2O’s built-in h2o.glm function.

modelh1 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 1, using H2O’s built-in performance function.

h2o.performance(model=modelh1,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09924684
## RMSE:  0.3150347
## LogLoss:  0.3220267
## Mean Per-Class Error:  0.2380168
## AUC:  0.8431394
## Gini:  0.6862787
## R^2:  0.254663
## Null Deviance:  326.0801
## Residual Deviance:  240.2319
## AIC:  308.2319
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0   1    Error     Rate
## 0      255  59 0.187898  =59/314
## 1       17  42 0.288136   =17/59
## Totals 272 101 0.203753  =76/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.192772 0.525000 100
## 2                       max f2  0.124912 0.650510 155
## 3                 max f0point5  0.416258 0.612903  23
## 4                 max accuracy  0.416258 0.879357  23
## 5                max precision  0.813396 1.000000   0
## 6                   max recall  0.037579 1.000000 282
## 7              max specificity  0.813396 1.000000   0
## 8             max absolute_mcc  0.416258 0.455251  23
## 9   max min_per_class_accuracy  0.161402 0.738854 125
## 10 max mean_per_class_accuracy  0.124912 0.765006 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or ` 
h2o.auc(h2o.performance(modelh1,test.h2o)) 
## [1] 0.8431394

The AUC metric provides insight into how well the classifier is able to separate the two classes. In this case, the value of 0.8431394 indicates that the classification is good. (A value of 0.5 indicates a worthless test, while a value of 1.0 indicates a perfect test.)

Next, inspect the coefficients of the variables in the dataset.

dfmodelh1 <- as.data.frame(h2o.varimp(modelh1))
dfmodelh1
##                       names coefficients sign
## 1              timbre_0_max  1.290938663  NEG
## 2                  loudness  1.262941934  POS
## 3                     pitch  0.616995941  NEG
## 4              timbre_1_min  0.422323735  POS
## 5              timbre_6_min  0.349016024  NEG
## 6                    energy  0.348092062  NEG
## 7             timbre_11_min  0.307331997  NEG
## 8              timbre_3_max  0.302225619  NEG
## 9             timbre_11_max  0.243632060  POS
## 10             timbre_4_min  0.224233951  POS
## 11             timbre_4_max  0.204134342  POS
## 12             timbre_5_min  0.199149324  NEG
## 13             timbre_0_min  0.195147119  POS
## 14 timesignature_confidence  0.179973904  POS
## 15         tempo_confidence  0.144242598  POS
## 16            timbre_10_max  0.137644568  POS
## 17             timbre_7_min  0.126995955  NEG
## 18            timbre_10_min  0.123851179  POS
## 19             timbre_7_max  0.100031481  NEG
## 20             timbre_2_min  0.096127636  NEG
## 21           key_confidence  0.083115820  POS
## 22             timbre_6_max  0.073712419  POS
## 23            timesignature  0.067241917  POS
## 24             timbre_8_min  0.061301881  POS
## 25             timbre_8_max  0.060041698  POS
## 26                      key  0.056158445  POS
## 27             timbre_3_min  0.050825116  POS
## 28             timbre_9_max  0.033733561  POS
## 29             timbre_2_max  0.030939072  POS
## 30             timbre_9_min  0.020708113  POS
## 31             timbre_1_max  0.014228818  NEG
## 32                    tempo  0.008199861  POS
## 33             timbre_5_max  0.004837870  POS
## 34                                    NA <NA>

Typically, songs with heavier instrumentation tend to be louder (have higher values in the variable “loudness”) and more energetic (have higher values in the variable “energy”). This knowledge is helpful for interpreting the modeling results.

You can make the following observations from the results:

  • The coefficient estimates for the confidence values associated with the time signature, key, and tempo variables are positive. This suggests that higher confidence leads to a higher predicted probability of a Top 10 hit.
  • The coefficient estimate for loudness is positive, meaning that mainstream listeners prefer louder songs with heavier instrumentation.
  • The coefficient estimate for energy is negative, meaning that mainstream listeners prefer songs that are less energetic, which are those songs with light instrumentation.

These coefficients lead to contradictory conclusions for Model 1. This could be due to multicollinearity issues. Inspect the correlation between the variables “loudness” and “energy” in the training set.

cor(train.h2o$loudness,train.h2o$energy)
## [1] 0.7399067

This number indicates that these two variables are highly correlated, and Model 1 does indeed suffer from multicollinearity. Typically, you associate a value of -1.0 to -0.5 or 1.0 to 0.5 to indicate strong correlation, and a value of 0.1 to 0.1 to indicate weak correlation. To avoid this correlation issue, omit one of these two variables and re-create the models.

You build two variations of the original model:

  • Model 2, in which you keep “energy” and omit “loudness”
  • Model 3, in which you keep “loudness” and omit “energy”

You compare these two models and choose the model with a better fit for this use case.

Create Model 2: Keep energy and omit loudness

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:7,9:38)
x.indep
##  [1]  6  7  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh2 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 2.

h2o.performance(model=modelh2,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09922606
## RMSE:  0.3150017
## LogLoss:  0.3228213
## Mean Per-Class Error:  0.2490554
## AUC:  0.8431933
## Gini:  0.6863867
## R^2:  0.2548191
## Null Deviance:  326.0801
## Residual Deviance:  240.8247
## AIC:  306.8247
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      280 34 0.108280  =34/314
## 1       23 36 0.389831   =23/59
## Totals 303 70 0.152815  =57/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.254391 0.558140  69
## 2                       max f2  0.113031 0.647208 157
## 3                 max f0point5  0.413999 0.596026  22
## 4                 max accuracy  0.446250 0.876676  18
## 5                max precision  0.811739 1.000000   0
## 6                   max recall  0.037682 1.000000 283
## 7              max specificity  0.811739 1.000000   0
## 8             max absolute_mcc  0.254391 0.469060  69
## 9   max min_per_class_accuracy  0.141051 0.716561 131
## 10 max mean_per_class_accuracy  0.113031 0.761821 157
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh2 <- as.data.frame(h2o.varimp(modelh2))
dfmodelh2
##                       names coefficients sign
## 1                     pitch  0.700331511  NEG
## 2              timbre_1_min  0.510270513  POS
## 3              timbre_0_max  0.402059546  NEG
## 4              timbre_6_min  0.333316236  NEG
## 5             timbre_11_min  0.331647383  NEG
## 6              timbre_3_max  0.252425901  NEG
## 7             timbre_11_max  0.227500308  POS
## 8              timbre_4_max  0.210663865  POS
## 9              timbre_0_min  0.208516163  POS
## 10             timbre_5_min  0.202748055  NEG
## 11             timbre_4_min  0.197246582  POS
## 12            timbre_10_max  0.172729619  POS
## 13         tempo_confidence  0.167523934  POS
## 14 timesignature_confidence  0.167398830  POS
## 15             timbre_7_min  0.142450727  NEG
## 16             timbre_8_max  0.093377516  POS
## 17            timbre_10_min  0.090333426  POS
## 18            timesignature  0.085851625  POS
## 19             timbre_7_max  0.083948442  NEG
## 20           key_confidence  0.079657073  POS
## 21             timbre_6_max  0.076426046  POS
## 22             timbre_2_min  0.071957831  NEG
## 23             timbre_9_max  0.071393189  POS
## 24             timbre_8_min  0.070225578  POS
## 25                      key  0.061394702  POS
## 26             timbre_3_min  0.048384697  POS
## 27             timbre_1_max  0.044721121  NEG
## 28                   energy  0.039698433  POS
## 29             timbre_5_max  0.039469064  POS
## 30             timbre_2_max  0.018461133  POS
## 31                    tempo  0.013279926  POS
## 32             timbre_9_min  0.005282143  NEG
## 33                                    NA <NA>

h2o.auc(h2o.performance(modelh2,test.h2o)) 
## [1] 0.8431933

You can make the following observations:

  • The AUC metric is 0.8431933.
  • Inspecting the coefficient of the variable energy, Model 2 suggests that songs with high energy levels tend to be more popular. This is as per expectation.
  • As H2O orders variables by significance, the variable energy is not significant in this model.

You can conclude that Model 2 is not ideal for this use , as energy is not significant.

CreateModel 3: Keep loudness but omit energy

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:12,14:38)
x.indep
##  [1]  6  7  8  9 10 11 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh3 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |=================================================================| 100%
perfh3<-h2o.performance(model=modelh3,newdata=test.h2o)
perfh3
## H2OBinomialMetrics: glm
## 
## MSE:  0.0978859
## RMSE:  0.3128672
## LogLoss:  0.3178367
## Mean Per-Class Error:  0.264925
## AUC:  0.8492389
## Gini:  0.6984778
## R^2:  0.2648836
## Null Deviance:  326.0801
## Residual Deviance:  237.1062
## AIC:  303.1062
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      286 28 0.089172  =28/314
## 1       26 33 0.440678   =26/59
## Totals 312 61 0.144772  =54/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.273799 0.550000  60
## 2                       max f2  0.125503 0.663265 155
## 3                 max f0point5  0.435479 0.628931  24
## 4                 max accuracy  0.435479 0.882038  24
## 5                max precision  0.821606 1.000000   0
## 6                   max recall  0.038328 1.000000 280
## 7              max specificity  0.821606 1.000000   0
## 8             max absolute_mcc  0.435479 0.471426  24
## 9   max min_per_class_accuracy  0.173693 0.745763 120
## 10 max mean_per_class_accuracy  0.125503 0.775073 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh3 <- as.data.frame(h2o.varimp(modelh3))
dfmodelh3
##                       names coefficients sign
## 1              timbre_0_max 1.216621e+00  NEG
## 2                  loudness 9.780973e-01  POS
## 3                     pitch 7.249788e-01  NEG
## 4              timbre_1_min 3.891197e-01  POS
## 5              timbre_6_min 3.689193e-01  NEG
## 6             timbre_11_min 3.086673e-01  NEG
## 7              timbre_3_max 3.025593e-01  NEG
## 8             timbre_11_max 2.459081e-01  POS
## 9              timbre_4_min 2.379749e-01  POS
## 10             timbre_4_max 2.157627e-01  POS
## 11             timbre_0_min 1.859531e-01  POS
## 12             timbre_5_min 1.846128e-01  NEG
## 13 timesignature_confidence 1.729658e-01  POS
## 14             timbre_7_min 1.431871e-01  NEG
## 15            timbre_10_max 1.366703e-01  POS
## 16            timbre_10_min 1.215954e-01  POS
## 17         tempo_confidence 1.183698e-01  POS
## 18             timbre_2_min 1.019149e-01  NEG
## 19           key_confidence 9.109701e-02  POS
## 20             timbre_7_max 8.987908e-02  NEG
## 21             timbre_6_max 6.935132e-02  POS
## 22             timbre_8_max 6.878241e-02  POS
## 23            timesignature 6.120105e-02  POS
## 24                      key 5.814805e-02  POS
## 25             timbre_8_min 5.759228e-02  POS
## 26             timbre_1_max 2.930285e-02  NEG
## 27             timbre_9_max 2.843755e-02  POS
## 28             timbre_3_min 2.380245e-02  POS
## 29             timbre_2_max 1.917035e-02  POS
## 30             timbre_5_max 1.715813e-02  POS
## 31                    tempo 1.364418e-02  NEG
## 32             timbre_9_min 8.463143e-05  NEG
## 33                                    NA <NA>
h2o.sensitivity(perfh3,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501855569251422. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.2033898
h2o.auc(perfh3)
## [1] 0.8492389

You can make the following observations:

  • The AUC metric is 0.8492389.
  • From the confusion matrix, the model correctly predicts that 33 songs will be top 10 hits (true positives). However, it has 26 false positives (songs that the model predicted would be Top 10 hits, but ended up not being Top 10 hits).
  • Loudness has a positive coefficient estimate, meaning that this model predicts that songs with heavier instrumentation tend to be more popular. This is the same conclusion from Model 2.
  • Loudness is significant in this model.

Overall, Model 3 predicts a higher number of top 10 hits with an accuracy rate that is acceptable. To choose the best fit for production runs, record labels should consider the following factors:

  • Desired model accuracy at a given threshold
  • Number of correct predictions for top10 hits
  • Tolerable number of false positives or false negatives

Next, make predictions using Model 3 on the test dataset.

predict.regh <- h2o.predict(modelh3, test.h2o)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
print(predict.regh)
##   predict        p0          p1
## 1       0 0.9654739 0.034526052
## 2       0 0.9654748 0.034525236
## 3       0 0.9635547 0.036445318
## 4       0 0.9343579 0.065642149
## 5       0 0.9978334 0.002166601
## 6       0 0.9779949 0.022005078
## 
## [373 rows x 3 columns]
predict.regh$predict
##   predict
## 1       0
## 2       0
## 3       0
## 4       0
## 5       0
## 6       0
## 
## [373 rows x 1 column]
dpr<-as.data.frame(predict.regh)
#Rename the predicted column 
colnames(dpr)[colnames(dpr) == 'predict'] <- 'predict_top10'
table(dpr$predict_top10)
## 
##   0   1 
## 312  61

The first set of output results specifies the probabilities associated with each predicted observation.  For example, observation 1 is 96.54739% likely to not be a Top 10 hit, and 3.4526052% likely to be a Top 10 hit (predict=1 indicates Top 10 hit and predict=0 indicates not a Top 10 hit).  The second set of results list the actual predictions made.  From the third set of results, this model predicts that 61 songs will be top 10 hits.

Compute the baseline accuracy, by assuming that the baseline predicts the most frequent outcome, which is that most songs are not Top 10 hits.

table(BillboardTest$top10)
## 
##   0   1 
## 314  59

Now observe that the baseline model would get 314 observations correct, and 59 wrong, for an accuracy of 314/(314+59) = 0.8418231.

It seems that Model 3, with an accuracy of 0.8552, provides you with a small improvement over the baseline model. But is this model useful for record labels?

View the two models from an investment perspective:

  • A production company is interested in investing in songs that are more likely to make it to the Top 10. The company’s objective is to minimize the risk of financial losses attributed to investing in songs that end up unpopular.
  • How many songs does Model 3 correctly predict as a Top 10 hit in 2010? Looking at the confusion matrix, you see that it predicts 33 top 10 hits correctly at an optimal threshold, which is more than half the number
  • It will be more useful to the record label if you can provide the production company with a list of songs that are highly likely to end up in the Top 10.
  • The baseline model is not useful, as it simply does not label any song as a hit.

Considering the three models built so far, you can conclude that Model 3 proves to be the best investment choice for the record label.

GBM model

H2O provides you with the ability to explore other learning models, such as GBM and deep learning. Explore building a model using the GBM technique, using the built-in h2o.gbm function.

Before you do this, you need to convert the target variable to a factor for multinomial classification techniques.

train.h2o$top10=as.factor(train.h2o$top10)
gbm.modelh <- h2o.gbm(y=y.dep, x=x.indep, training_frame = train.h2o, ntrees = 500, max_depth = 4, learn_rate = 0.01, seed = 1122,distribution="multinomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |=================================================================| 100%
perf.gbmh<-h2o.performance(gbm.modelh,test.h2o)
perf.gbmh
## H2OBinomialMetrics: gbm
## 
## MSE:  0.09860778
## RMSE:  0.3140188
## LogLoss:  0.3206876
## Mean Per-Class Error:  0.2120263
## AUC:  0.8630573
## Gini:  0.7261146
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      266 48 0.152866  =48/314
## 1       16 43 0.271186   =16/59
## Totals 282 91 0.171582  =64/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.189757 0.573333  90
## 2                     max f2  0.130895 0.693717 145
## 3               max f0point5  0.327346 0.598802  26
## 4               max accuracy  0.442757 0.876676  14
## 5              max precision  0.802184 1.000000   0
## 6                 max recall  0.049990 1.000000 284
## 7            max specificity  0.802184 1.000000   0
## 8           max absolute_mcc  0.169135 0.496486 104
## 9 max min_per_class_accuracy  0.169135 0.796610 104
## 10 max mean_per_class_accuracy  0.169135 0.805948 104
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `
h2o.sensitivity(perf.gbmh,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501205344484314. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.1355932
h2o.auc(perf.gbmh)
## [1] 0.8630573

This model correctly predicts 43 top 10 hits, which is 10 more than the number predicted by Model 3. Moreover, the AUC metric is higher than the one obtained from Model 3.

As seen above, H2O’s API provides the ability to obtain key statistical measures required to analyze the models easily, using several built-in functions. The record label can experiment with different parameters to arrive at the model that predicts the maximum number of Top 10 hits at the desired level of accuracy and threshold.

H2O also allows you to experiment with deep learning models. Deep learning models have the ability to learn features implicitly, but can be more expensive computationally.

Now, create a deep learning model with the h2o.deeplearning function, using the same training and test datasets created before. The time taken to run this model depends on the type of EC2 instance chosen for this purpose.  For models that require more computation, consider using accelerated computing instances such as the P2 instance type.

system.time(
  dlearning.modelh <- h2o.deeplearning(y = y.dep,
                                      x = x.indep,
                                      training_frame = train.h2o,
                                      epoch = 250,
                                      hidden = c(250,250),
                                      activation = "Rectifier",
                                      seed = 1122,
                                      distribution="multinomial"
  )
)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |=================================================================| 100%
##    user  system elapsed 
##   1.216   0.020 166.508
perf.dl<-h2o.performance(model=dlearning.modelh,newdata=test.h2o)
perf.dl
## H2OBinomialMetrics: deeplearning
## 
## MSE:  0.1678359
## RMSE:  0.4096778
## LogLoss:  1.86509
## Mean Per-Class Error:  0.3433013
## AUC:  0.7568822
## Gini:  0.5137644
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      290 24 0.076433  =24/314
## 1       36 23 0.610169   =36/59
## Totals 326 47 0.160858  =60/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.826267 0.433962  46
## 2                     max f2  0.000000 0.588235 239
## 3               max f0point5  0.999929 0.511811  16
## 4               max accuracy  0.999999 0.865952  10
## 5              max precision  1.000000 1.000000   0
## 6                 max recall  0.000000 1.000000 326
## 7            max specificity  1.000000 1.000000   0
## 8           max absolute_mcc  0.999929 0.363219  16
## 9 max min_per_class_accuracy  0.000004 0.662420 145
## 10 max mean_per_class_accuracy  0.000000 0.685334 224
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
h2o.sensitivity(perf.dl,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.496293348880151. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.3898305
h2o.auc(perf.dl)
## [1] 0.7568822

The AUC metric for this model is 0.7568822, which is less than what you got from the earlier models. I recommend further experimentation using different hyper parameters, such as the learning rate, epoch or the number of hidden layers.

H2O’s built-in functions provide many key statistical measures that can help measure model performance. Here are some of these key terms.

Metric Description
Sensitivity Measures the proportion of positives that have been correctly identified. It is also called the true positive rate, or recall.
Specificity Measures the proportion of negatives that have been correctly identified. It is also called the true negative rate.
Threshold Cutoff point that maximizes specificity and sensitivity. While the model may not provide the highest prediction at this point, it would not be biased towards positives or negatives.
Precision The fraction of the documents retrieved that are relevant to the information needed, for example, how many of the positively classified are relevant
AUC

Provides insight into how well the classifier is able to separate the two classes. The implicit goal is to deal with situations where the sample distribution is highly skewed, with a tendency to overfit to a single class.

0.90 – 1 = excellent (A)

0.8 – 0.9 = good (B)

0.7 – 0.8 = fair (C)

.6 – 0.7 = poor (D)

0.5 – 0.5 = fail (F)

Here’s a summary of the metrics generated from H2O’s built-in functions for the three models that produced useful results.

Metric Model 3 GBM Model Deep Learning Model

Accuracy

(max)

0.882038

(t=0.435479)

0.876676

(t=0.442757)

0.865952

(t=0.999999)

Precision

(max)

1.0

(t=0.821606)

1.0

(t=0802184)

1.0

(t=1.0)

Recall

(max)

1.0 1.0

1.0

(t=0)

Specificity

(max)

1.0 1.0

1.0

(t=1)

Sensitivity

 

0.2033898 0.1355932

0.3898305

(t=0.5)

AUC 0.8492389 0.8630573 0.756882

Note: ‘t’ denotes threshold.

Your options at this point could be narrowed down to Model 3 and the GBM model, based on the AUC and accuracy metrics observed earlier.  If the slightly lower accuracy of the GBM model is deemed acceptable, the record label can choose to go to production with the GBM model, as it can predict a higher number of Top 10 hits.  The AUC metric for the GBM model is also higher than that of Model 3.

Record labels can experiment with different learning techniques and parameters before arriving at a model that proves to be the best fit for their business. Because deep learning models can be computationally expensive, record labels can choose more powerful EC2 instances on AWS to run their experiments faster.

Conclusion

In this post, I showed how the popular music industry can use analytics to predict the type of songs that make the Top 10 Billboard charts. By running H2O’s scalable machine learning platform on AWS, data scientists can easily experiment with multiple modeling techniques and interactively query the data using Amazon Athena, without having to manage the underlying infrastructure. This helps record labels make critical decisions on the type of artists and songs to promote in a timely fashion, thereby increasing sales and revenue.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to build and explore a simple geospita simple GEOINT application using SparkR.


About the Authors

gopalGopal Wunnava is a Partner Solution Architect with the AWS GSI Team. He works with partners and customers on big data engagements, and is passionate about building analytical solutions that drive business capabilities and decision making. In his spare time, he loves all things sports and movies related and is fond of old classics like Asterix, Obelix comics and Hitchcock movies.

 

 

Bob Strahan, a Senior Consultant with AWS Professional Services, contributed to this post.

 

 

Denuvo Crisis After Total Warhammer 2 Gets Pirated in Hours

Post Syndicated from Andy original https://torrentfreak.com/denuvo-crisis-after-total-warhammer-2-gets-pirated-in-hours-170929/

Needing little introduction, the anti-piracy system sold by Denuvo Software Solutions of Austria is probably the most well-known product of its type of the planet.

For years, Denuvo was considered pretty much impenetrable, with its presence a virtual stamp of assurance that a game being protected by it would not fall victim to piracy, potentially for years. In recent times, however, things have begun to crumble.

Strangely, it started in early 2016 with bad news. Chinese cracking group 3DM declared that Denuvo was probably uncrackable and no protected games would appear online during the next two years.

By June, however, hope appeared on the horizon, with hints that progress was being made. By August 2016, all doubts were removed when a group called CONSPIR4CY (a reported collaboration between CPY and CODEX) released Rise of the Tomb Raider.

After that, Denuvo-protected titles began dropping like flies, with some getting cracked weeks after their launch. Then things got serious.

Early this year, Resident Evil 7 fell in less than a week. In the summer, RiME fell in a few days, four days exactly for Tekken 7.

Now, however, Denuvo has suffered its biggest failure yet, with strategy game Total War: Warhammer 2 falling to pirates in less than a day, arguably just a few hours. It was cracked by STEAMPUNKS, a group that’s been dumping cracked games on the Internet at quite a rate for the past few months.

TOTAL.WAR.WARHAMMER.2-STEAMPUNKS

“Take this advice, DO NOT CODE a new installer when you have very hot Babes dancing in their bikini just in front of you. Never again,” the group said in a statement. “This time we locked ourselves inside and produced a new installer.”

The fall of this game in such a short space of time will be of major concern to Denuvo Software Solutions. After Resident Evil 7 was cracked in days earlier this year, Denuvo Marketing Director Thomas Goebl told Eurogamer that some protection was better than nothing.

“Given the fact that every unprotected title is cracked on the day of release — as well as every update of games — our solution made a difference for this title,” he said.

With yesterday’s 0-day crack of Total War: Warhammer 2, it can be argued that Denuvo made absolutely no difference whatsoever to the availability of the title. It didn’t even protect the initial launch window.

Goebl’s additional comment in the summer was that “so far only one piracy group has been able to bypass [Denuvo].” Now, just a handful of months later, there are several groups with the ability. That’s not a good look for the company.

Back in 2016, Denuvo co-founder Robert Hernandez told Kotaku that the company does not give refunds. It would be interesting to know if anything has changed there too.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Google Signs Agreement to Tackle YouTube Piracy

Post Syndicated from Andy original https://torrentfreak.com/google-signs-unprecedented-agreement-to-tackle-youtube-piracy-170921/

Once upon a time, people complaining about piracy would point to the hundreds of piracy sites around the Internet. These days, criticism is just as likely to be leveled at Google-owned services.

YouTube, in particular, has come in for intense criticism, with the music industry complaining of exploitation of the DMCA in order to obtain unfair streaming rates from record labels. Along with streaming-ripping, this so-called Value Gap is one of the industry’s hottest topics.

With rightsholders seemingly at war with Google to varying degrees, news from France suggests that progress can be made if people sit down and negotiate.

According to local reports, Google and local anti-piracy outfit ALPA (l’Association de Lutte Contre la Piraterie Audiovisuelle) under the auspices of the CNC have signed an agreement to grant rightsholders direct access to content takedown mechanisms on YouTube.

YouTube has granted access to its Content ID systems to companies elsewhere for years but the new deal will see the system utilized by French content owners for the first time. It’s hoped that the access will result in infringing content being taken down or monetized more quickly than before.

“We do not want fraudsters to use our platforms to the detriment of creators,” said Carlo D’Asaro Biondo, Google’s President of Strategic Relationships in Europe, the Middle East and Africa.

The agreement, overseen by the Ministry of Culture, will see Google provide ALPA with financial support and rightsholders with essential training.

ALPA president Nicolas Seydoux welcomed the deal, noting that it symbolizes the “collapse of the wall of incomprehension” that previously existed between France’s rightsholders and the Internet search giant.

The deal forms part of the French government’s “Plan of Action Against Piracy”, in which it hopes to crack down on infringement in various ways, including tackling the threat of pirate sites, better promotion of services offering legitimate content, and educating children “from an early age” on the need to respect copyright.

“The fight against piracy is the great challenge of the new century in the cultural sphere,” said France’s Minister of Culture, Françoise Nyssen.

“I hope this is just the beginning of a process. It will require other agreements with rights holders and other platforms, as well as at the European level.”

According to NextInpact, the Google agreement will eventually encompass the downgrading of infringing content in search results as part of the Trusted Copyright Removal Program. A similar system is already in place in the UK.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.