Tag Archives: LAPS

Recording lost seconds with the Augenblick blink camera

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/augenblick-camera/

Warning: a GIF used in today’s blog contains flashing images.

Students at the University of Bremen, Germany, have built a wearable camera that records the seconds of vision lost when you blink. Augenblick uses a Raspberry Pi Zero and Camera Module alongside muscle sensors to record footage whenever you close your eyes, producing a rather disjointed film of the sights you miss out on.

Augenblick blink camera recording using a Raspberry Pi Zero

Blink and you’ll miss it

The average person blinks up to five times a minute, with each blink lasting 0.5 to 0.8 seconds. These half-seconds add up to about 30 minutes a day. What sights are we losing during these minutes? That is the question asked by students Manasse Pinsuwan and René Henrich when they set out to design Augenblick.

Blinking is a highly invasive mechanism for our eyesight. Every day we close our eyes thousands of times without noticing it. Our mind manages to never let us wonder what exactly happens in the moments that we miss.

Capturing lost moments

For Augenblick, the wearer sticks MyoWare Muscle Sensor pads to their face, and these detect the electrical impulses that trigger blinking.

Augenblick blink camera recording using a Raspberry Pi Zero

Two pads are applied over the orbicularis oculi muscle that forms a ring around the eye socket, while the third pad is attached to the cheek as a neutral point.

Biology fact: there are two muscles responsible for blinking. The orbicularis oculi muscle closes the eye, while the levator palpebrae superioris muscle opens it — and yes, they both sound like the names of Harry Potter spells.

The sensor is read 25 times a second. Whenever it detects that the orbicularis oculi is active, the Camera Module records video footage.

Augenblick blink recording using a Raspberry Pi Zero

Pressing a button on the side of the Augenblick glasses set the code running. An LED lights up whenever the camera is recording and also serves to confirm the correct placement of the sensor pads.

Augenblick blink camera recording using a Raspberry Pi Zero

The Pi Zero saves the footage so that it can be stitched together later to form a continuous, if disjointed, film.

Learn more about the Augenblick blink camera

You can find more information on the conception, design, and build process of Augenblick here in German, with a shorter explanation including lots of photos here in English.

And if you’re keen to recreate this project, our free project resource for a wearable Pi Zero time-lapse camera will come in handy as a starting point.

The post Recording lost seconds with the Augenblick blink camera appeared first on Raspberry Pi.

Naturebytes’ weatherproof Pi and camera case

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/naturebytes-weatherproof-pi-and-camera-case/

Naturebytes are making their weatherproof Wildlife Cam Case available as a standalone product for the first time, a welcome addition to the Raspberry Pi ecosystem that should take some of the hassle out of your outdoor builds.

A robin on a bird feeder in a garden with a Naturebytes Wildlife Cam mounted beside it

Weatherproofing digital making projects

People often use Raspberry Pis and Camera Modules for outdoor projects, but weatherproofing your set-up can be tricky. You need to keep water — and tiny creatures — out, but you might well need access for wires and cables, whether for power or sensors; if you’re using a camera, it’ll need something clear and cleanable in front of the lens. You can use sealant, but if you need to adjust anything that you’ve applied it to, you’ll have to remove it and redo it. While we’ve seen a few reasonable options available to buy, the choice has never been what you’d call extensive.

The Naturebytes case

For all these reasons, I was pleased to learn that Naturebytes, the wildlife camera people, are releasing their Wildlife Cam Case as a standalone product for the first time.

Naturebytes case open

The Wildlife Cam Case is ideal for nature camera projects, of course, but it’ll also be useful for anyone who wants to take their Pi outdoors. It has weatherproof lenses that are transparent to visible and IR light, for all your nature observation projects. Its opening is hinged to allow easy access to your hardware, and the case has waterproof access for cables. Inside, there’s a mount for fixing any model of Raspberry Pi and camera, as well as many other components. On top of all that, the case comes with a sturdy nylon strap to make it easy to attach it to a post or a tree.

Naturebytes case additional components

Order yours now!

At the moment, Naturebytes are producing a limited run of the cases. The first batch of 50 are due to be dispatched next week to arrive just in time for the Bank Holiday weekend in the UK, so get them while they’re hot. It’s the perfect thing for recording a timelapse of exactly how quickly the slugs obliterate your vegetable seedlings, and of lots more heartening things that must surely happen in gardens other than mine.

The post Naturebytes’ weatherproof Pi and camera case appeared first on Raspberry Pi.

Puerto Rico’s First Raspberry Pi Educator Workshop

Post Syndicated from Dana Augustin original https://www.raspberrypi.org/blog/puerto-rico-raspberry-pi-workshop/

Earlier this spring, an excited group of STEM educators came together to participate in the first ever Raspberry Pi and Arduino workshop in Puerto Rico.

Their three-day digital making adventure was led by MakerTechPR’s José Rullán and Raspberry Pi Certified Educator Alex Martínez. They ran the event as part of the Robot Makers challenge organized by Yees! and sponsored by Puerto Rico’s Department of Economic Development and Trade to promote entrepreneurial skills within Puerto Rico’s education system.

Over 30 educators attended the workshop, which covered the use of the Raspberry Pi 3 as a computer and digital making resource. The educators received a kit consisting of a Raspberry Pi 3 with an Explorer HAT Pro and an Arduino Uno. At the end of the workshop, the educators were able to keep the kit as a demonstration unit for their classrooms. They were enthusiastic to learn new concepts and immerse themselves in the world of physical computing.

In their first session, the educators were introduced to the Raspberry Pi as an affordable technology for robotic clubs. In their second session, they explored physical computing and the coding languages needed to control the Explorer HAT Pro. They started off coding with Scratch, with which some educators had experience, and ended with controlling the GPIO pins with Python. In the final session, they learned how to develop applications using the powerful combination of Arduino and Raspberry Pi for robotics projects. This gave them a better understanding of how they could engage their students in physical computing.

“The Raspberry Pi ecosystem is the perfect solution in the classroom because to us it is very resourceful and accessible.” – Alex Martínez

Computer science and robotics courses are important for many schools and teachers in Puerto Rico. The simple idea of programming a microcontroller from a $35 computer increases the chances of more students having access to more technology to create things.

Puerto Rico’s education system has faced enormous challenges after Hurricane Maria, including economic collapse and the government’s closure of many schools due to the exodus of families from the island. By attending training like this workshop, educators in Puerto Rico are becoming more experienced in fields like robotics in particular, which are key for 21st-century skills and learning. This, in turn, can lead to more educational opportunities, and hopefully the reopening of more schools on the island.

“We find it imperative that our children be taught STEM disciplines and skills. Our goal is to continue this work of spreading digital making and computer science using the Raspberry Pi around Puerto Rico. We want our children to have the best education possible.” – Alex Martínez

After attending Picademy in 2016, Alex has integrated the Raspberry Pi Foundation’s online resources into his classroom. He has also taught small workshops around the island and in the local Puerto Rican makerspace community. José is an electrical engineer, entrepreneur, educator and hobbyist who enjoys learning to use technology and sharing his knowledge through projects and challenges.

The post Puerto Rico’s First Raspberry Pi Educator Workshop appeared first on Raspberry Pi.

Continued: the answers to your questions for Eben Upton

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/eben-q-a-2/

Last week, we shared the first half of our Q&A with Raspberry Pi Trading CEO and Raspberry Pi creator Eben Upton. Today we follow up with all your other questions, including your expectations for a Raspberry Pi 4, Eben’s dream add-ons, and whether we really could go smaller than the Zero.

Live Q&A with Eben Upton, creator of the Raspberry Pi

Get your questions to us now using #AskRaspberryPi on Twitter

With internet security becoming more necessary, will there be automated versions of VPN on an SD card?

There are already third-party tools which turn your Raspberry Pi into a VPN endpoint. Would we do it ourselves? Like the power button, it’s one of those cases where there are a million things we could do and so it’s more efficient to let the community get on with it.

Just to give a counterexample, while we don’t generally invest in optimising for particular use cases, we did invest a bunch of money into optimising Kodi to run well on Raspberry Pi, because we found that very large numbers of people were using it. So, if we find that we get half a million people a year using a Raspberry Pi as a VPN endpoint, then we’ll probably invest money into optimising it and feature it on the website as we’ve done with Kodi. But I don’t think we’re there today.

Have you ever seen any Pis running and doing important jobs in the wild, and if so, how does it feel?

It’s amazing how often you see them driving displays, for example in radio and TV studios. Of course, it feels great. There’s something wonderful about the geographic spread as well. The Raspberry Pi desktop is quite distinctive, both in its previous incarnation with the grey background and logo, and the current one where we have Greg Annandale’s road picture.

The PIXEL desktop on Raspberry Pi

And so it’s funny when you see it in places. Somebody sent me a video of them teaching in a classroom in rural Pakistan and in the background was Greg’s picture.

Raspberry Pi 4!?!

There will be a Raspberry Pi 4, obviously. We get asked about it a lot. I’m sticking to the guidance that I gave people that they shouldn’t expect to see a Raspberry Pi 4 this year. To some extent, the opportunity to do the 3B+ was a surprise: we were surprised that we’ve been able to get 200MHz more clock speed, triple the wireless and wired throughput, and better thermals, and still stick to the $35 price point.

We’re up against the wall from a silicon perspective; we’re at the end of what you can do with the 40nm process. It’s not that you couldn’t clock the processor faster, or put a larger processor which can execute more instructions per clock in there, it’s simply about the energy consumption and the fact that you can’t dissipate the heat. So we’ve got to go to a smaller process node and that’s an order of magnitude more challenging from an engineering perspective. There’s more effort, more risk, more cost, and all of those things are challenging.

With 3B+ out of the way, we’re going to start looking at this now. For the first six months or so we’re going to be figuring out exactly what people want from a Raspberry Pi 4. We’re listening to people’s comments about what they’d like to see in a new Raspberry Pi, and I’m hoping by early autumn we should have an idea of what we want to put in it and a strategy for how we might achieve that.

Could you go smaller than the Zero?

The challenge with Zero as that we’re periphery-limited. If you run your hand around the unit, there is no edge of that board that doesn’t have something there. So the question is: “If you want to go smaller than Zero, what feature are you willing to throw out?”

It’s a single-sided board, so you could certainly halve the PCB area if you fold the circuitry and use both sides, though you’d have to lose something. You could give up some GPIO and go back to 26 pins like the first Raspberry Pi. You could give up the camera connector, you could go to micro HDMI from mini HDMI. You could remove the SD card and just do USB boot. I’m inventing a product live on air! But really, you could get down to two thirds and lose a bunch of GPIO – it’s hard to imagine you could get to half the size.

What’s the one feature that you wish you could outfit on the Raspberry Pi that isn’t cost effective at this time? Your dream feature.

Well, more memory. There are obviously technical reasons why we don’t have more memory on there, but there are also market reasons. People ask “why doesn’t the Raspberry Pi have more memory?”, and my response is typically “go and Google ‘DRAM price’”. We’re used to the price of memory going down. And currently, we’re going through a phase where this has turned around and memory is getting more expensive again.

Machine learning would be interesting. There are machine learning accelerators which would be interesting to put on a piece of hardware. But again, they are not going to be used by everyone, so according to our method of pricing what we might add to a board, machine learning gets treated like a $50 chip. But that would be lovely to do.

Which citizen science projects using the Pi have most caught your attention?

I like the wildlife camera projects. We live out in the countryside in a little village, and we’re conscious of being surrounded by nature but we don’t see a lot of it on a day-to-day basis. So I like the nature cam projects, though, to my everlasting shame, I haven’t set one up yet. There’s a range of them, from very professional products to people taking a Raspberry Pi and a camera and putting them in a plastic box. So those are good fun.

Raspberry Shake seismometer

The Raspberry Shake seismometer

And there’s Meteor Pi from the Cambridge Science Centre, that’s a lot of fun. And the seismometer Raspberry Shake – that sort of thing is really nice. We missed the recent South Wales earthquake; perhaps we should set one up at our Californian office.

How does it feel to go to bed every day knowing you’ve changed the world for the better in such a massive way?

What feels really good is that when we started this in 2006 nobody else was talking about it, but now we’re part of a very broad movement.

We were in a really bad way: we’d seen a collapse in the number of applicants applying to study Computer Science at Cambridge and elsewhere. In our view, this reflected a move away from seeing technology as ‘a thing you do’ to seeing it as a ‘thing that you have done to you’. It is problematic from the point of view of the economy, industry, and academia, but most importantly it damages the life prospects of individual children, particularly those from disadvantaged backgrounds. The great thing about STEM subjects is that you can’t fake being good at them. There are a lot of industries where your Dad can get you a job based on who he knows and then you can kind of muddle along. But if your dad gets you a job building bridges and you suck at it, after the first or second bridge falls down, then you probably aren’t going to be building bridges anymore. So access to STEM education can be a great driver of social mobility.

By the time we were launching the Raspberry Pi in 2012, there was this wonderful movement going on. Code Club, for example, and CoderDojo came along. Lots of different ways of trying to solve the same problem. What feels really, really good is that we’ve been able to do this as part of an enormous community. And some parts of that community became part of the Raspberry Pi Foundation – we merged with Code Club, we merged with CoderDojo, and we continue to work alongside a lot of these other organisations. So in the two seconds it takes me to fall asleep after my face hits the pillow, that’s what I think about.

We’re currently advertising a Programme Manager role in New Delhi, India. Did you ever think that Raspberry Pi would be advertising a role like this when you were bringing together the Foundation?

No, I didn’t.

But if you told me we were going to be hiring somewhere, India probably would have been top of my list because there’s a massive IT industry in India. When we think about our interaction with emerging markets, India, in a lot of ways, is the poster child for how we would like it to work. There have already been some wonderful deployments of Raspberry Pi, for example in Kerala, without our direct involvement. And we think we’ve got something that’s useful for the Indian market. We have a product, we have clubs, we have teacher training. And we have a body of experience in how to teach people, so we have a physical commercial product as well as a charitable offering that we think are a good fit.

It’s going to be massive.

What is your favourite BBC type-in listing?

There was a game called Codename: Druid. There is a famous game called Codename: Droid which was the sequel to Stryker’s Run, which was an awesome, awesome game. And there was a type-in game called Codename: Druid, which was at the bottom end of what you would consider a commercial game.

codename druid

And I remember typing that in. And what was really cool about it was that the next month, the guy who wrote it did another article that talks about the memory map and which operating system functions used which bits of memory. So if you weren’t going to do disc access, which bits of memory could you trample on and know the operating system would survive.

babbage versus bugs Raspberry Pi annual

See the full listing for Babbage versus Bugs in the Raspberry Pi 2018 Annual

I still like type-in listings. The Raspberry Pi 2018 Annual has a type-in listing that I wrote for a Babbage versus Bugs game. I will say that’s not the last type-in listing you will see from me in the next twelve months. And if you download the PDF, you could probably copy and paste it into your favourite text editor to save yourself some time.

The post Continued: the answers to your questions for Eben Upton appeared first on Raspberry Pi.

Community profile: Dave Akerman

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/community-profile-dave-akerman/

This column is from The MagPi issue 61. You can download a PDF of the full issue for free, or subscribe to receive the print edition through your letterbox or the digital edition on your tablet. All proceeds from the print and digital editions help the Raspberry Pi Foundation achieve our charitable goals.

The pinned tweet on Dave Akerman’s Twitter account shows a table displaying the various components needed for a high-altitude balloon (HAB) flight. Batteries, leads, a camera and Raspberry Pi, plus an unusually themed payload. The caption reads ‘The Queen, The Duke of York, and my TARDIS”, and sums up Dave’s maker career in a heartbeat.

David Akerman on Twitter

The Queen, The Duke of York, and my TARDIS 🙂 #UKHAS #RaspberryPi

Though writing software for industrial automation pays the bills, the majority of Dave’s time is spent in the world of high-altitude ballooning and the ever-growing community that encompasses it. And, while he makes some money sending business-themed balloons to near space for the likes of Aardman Animations, Confused.com, and the BBC, Dave is best known in the Raspberry Pi community for his use of the small computer in every payload, and his work as a tutor alongside the Foundation’s staff at Skycademy events.

Dave Akerman The MagPi Raspberry Pi Community Profile

Dave continues to help others while breaking records and having a good time exploring the atmosphere.

Dave has dedicated many hours and many, many more miles to assist with the Foundation’s Skycademy programme, helping to explore high-altitude ballooning with educators from across the UK. Using a Raspberry Pi and various other pieces of lightweight tech, Dave and Foundation staff member James Robinson explored the incorporation of high-altitude ballooning into education. Through Skycademy, educators were able to learn new skills and take them to the classroom, setting off their own balloons with their students, and recording the results on Raspberry Pis.

Dave Akerman The MagPi Raspberry Pi Community Profile

Dave’s most recent flight broke a new record. On 13 August 2017, his HAB payload was able to send back the highest images taken by any amateur flight.

But education isn’t the only reason for Dave’s involvement in the HAB community. As with anyone passionate about a specific hobby, Dave strives to break records. The most recent record-breaking flight took place on 13 August 2017, when Dave’s Raspberry Pi Zero HAB sent home the highest images taken by any amateur high-altitude balloon launch: at 43014 metres. No other HAB balloon has provided images from such an altitude, and the lightweight nature of the Pi Zero definitely helped, as Dave went on to mention on Twitter a few days later.

Dave Akerman The MagPi Raspberry Pi Community Profile

Dave is recognised as being the first person to incorporate a Raspberry Pi into a HAB payload, and continues to break records with the help of the little green board. More recently, he’s been able to lighten the load by using the Raspberry Pi Zero.

When the first Pi made its way to near space, Dave tore the computer apart in order to meet the weight restriction. The Pi in the Sky board was created to add the extra features needed for the flight. Since then, the HAT has experienced a few changes.

Dave Akerman The MagPi Raspberry Pi Community Profile

The Pi in the Sky board, created specifically for HAB flights.

Dave first fell in love with high-altitude ballooning after coming across the hobby in a video shared on a photographic forum. With a lifelong interest in space thanks to watching the Moon landings as a boy, plus a talent for electronics and photography, it seems a natural progression for him. Throw in his coding skills from learning to program on a Teletype and it’s no wonder he was ready and eager to take to the skies, so to speak, and capture the curvature of the Earth. What was so great about using the Raspberry Pi was the instant gratification he got from receiving images in real time as they were taken during the flight. While other devices could control a camera and store captured images for later retrieval, thanks to the Pi Dave was able to transmit the files back down to Earth and check the progress of his balloon while attempting to break records with a flight.

Dave Akerman The MagPi Raspberry Pi Community Profile Morph

One of the many commercial flights Dave has organised featured the classic children’s TV character Morph, a creation of the Aardman Animations studio known for Wallace and Gromit. Morph took to the sky twice in his mission to reach near space, and finally succeeded in 2016.

High-altitude ballooning isn’t the only part of Dave’s life that incorporates a Raspberry Pi. Having “lost count” of how many Pis he has running tasks, Dave has also created radio receivers for APRS (ham radio data), ADS-B (aircraft tracking), and OGN (gliders), along with a time-lapse camera in his garden, and he has a few more Pi for tinkering purposes.

The post Community profile: Dave Akerman appeared first on Raspberry Pi.

What John Oliver gets wrong about Bitcoin

Post Syndicated from Robert Graham original http://blog.erratasec.com/2018/03/what-john-oliver-gets-wrong-about.html

John Oliver covered bitcoin/cryptocurrencies last night. I thought I’d describe a bunch of things he gets wrong.

How Bitcoin works

Nowhere in the show does it describe what Bitcoin is and how it works.
Discussions should always start with Satoshi Nakamoto’s original paper. The thing Satoshi points out is that there is an important cost to normal transactions, namely, the entire legal system designed to protect you against fraud, such as the way you can reverse the transactions on your credit card if it gets stolen. The point of Bitcoin is that there is no way to reverse a charge. A transaction is done via cryptography: to transfer money to me, you decrypt it with your secret key and encrypt it with mine, handing ownership over to me with no third party involved that can reverse the transaction, and essentially no overhead.
All the rest of the stuff, like the decentralized blockchain and mining, is all about making that work.
Bitcoin crazies forget about the original genesis of Bitcoin. For example, they talk about adding features to stop fraud, reversing transactions, and having a central authority that manages that. This misses the point, because the existing electronic banking system already does that, and does a better job at it than cryptocurrencies ever can. If you want to mock cryptocurrencies, talk about the “DAO”, which did exactly that — and collapsed in a big fraudulent scheme where insiders made money and outsiders didn’t.
Sticking to Satoshi’s original ideas are a lot better than trying to repeat how the crazy fringe activists define Bitcoin.

How does any money have value?

Oliver’s answer is currencies have value because people agree that they have value, like how they agree a Beanie Baby is worth $15,000.
This is wrong. A better way of asking the question why the value of money changes. The dollar has been losing roughly 2% of its value each year for decades. This is called “inflation”, as the dollar loses value, it takes more dollars to buy things, which means the price of things (in dollars) goes up, and employers have to pay us more dollars so that we can buy the same amount of things.
The reason the value of the dollar changes is largely because the Federal Reserve manages the supply of dollars, using the same law of Supply and Demand. As you know, if a supply decreases (like oil), then the price goes up, or if the supply of something increases, the price goes down. The Fed manages money the same way: when prices rise (the dollar is worth less), the Fed reduces the supply of dollars, causing it to be worth more. Conversely, if prices fall (or don’t rise fast enough), the Fed increases supply, so that the dollar is worth less.
The reason money follows the law of Supply and Demand is because people use money, they consume it like they do other goods and services, like gasoline, tax preparation, food, dance lessons, and so forth. It’s not like a fine art painting, a stamp collection or a Beanie Baby — money is a product. It’s just that people have a hard time thinking of it as a consumer product since, in their experience, money is what they use to buy consumer products. But it’s a symmetric operation: when you buy gasoline with dollars, you are actually selling dollars in exchange for gasoline. That you call one side in this transaction “money” and the other “goods” is purely arbitrary, you call gasoline money and dollars the good that is being bought and sold for gasoline.
The reason dollars is a product is because trying to use gasoline as money is a pain in the neck. Storing it and exchanging it is difficult. Goods like this do become money, such as famously how prisons often use cigarettes as a medium of exchange, even for non-smokers, but it has to be a good that is fungible, storable, and easily exchanged. Dollars are the most fungible, the most storable, and the easiest exchanged, so has the most value as “money”. Sure, the mechanic can fix the farmers car for three chickens instead, but most of the time, both parties in the transaction would rather exchange the same value using dollars than chickens.
So the value of dollars is not like the value of Beanie Babies, which people might buy for $15,000, which changes purely on the whims of investors. Instead, a dollar is like gasoline, which obey the law of Supply and Demand.
This brings us back to the question of where Bitcoin gets its value. While Bitcoin is indeed used like dollars to buy things, that’s only a tiny use of the currency, so therefore it’s value isn’t determined by Supply and Demand. Instead, the value of Bitcoin is a lot like Beanie Babies, obeying the laws of investments. So in this respect, Oliver is right about where the value of Bitcoin comes, but wrong about where the value of dollars comes from.

Why Bitcoin conference didn’t take Bitcoin

John Oliver points out the irony of a Bitcoin conference that stopped accepting payments in Bitcoin for tickets.
The biggest reason for this is because Bitcoin has become so popular that transaction fees have gone up. Instead of being proof of failure, it’s proof of popularity. What John Oliver is saying is the old joke that nobody goes to that popular restaurant anymore because it’s too crowded and you can’t get a reservation.
Moreover, the point of Bitcoin is not to replace everyday currencies for everyday transactions. If you read Satoshi Nakamoto’s whitepaper, it’s only goal is to replace certain types of transactions, like purely electronic transactions where electronic goods and services are being exchanged. Where real-life goods/services are being exchanged, existing currencies work just fine. It’s only the crazy activists who claim Bitcoin will eventually replace real world currencies — the saner people see it co-existing with real-world currencies, each with a different value to consumers.

Turning a McNugget back into a chicken

John Oliver uses the metaphor of turning a that while you can process a chicken into McNuggets, you can’t reverse the process. It’s a funny metaphor.
But it’s not clear what the heck this metaphor is trying explain. That’s not a metaphor for the blockchain, but a metaphor for a “cryptographic hash”, where each block is a chicken, and the McNugget is the signature for the block (well, the block plus the signature of the last block, forming a chain).
Even then that metaphor as problems. The McNugget produced from each chicken must be unique to that chicken, for the metaphor to accurately describe a cryptographic hash. You can therefore identify the original chicken simply by looking at the McNugget. A slight change in the original chicken, like losing a feather, results in a completely different McNugget. Thus, nuggets can be used to tell if the original chicken has changed.
This then leads to the key property of the blockchain, it is unalterable. You can’t go back and change any of the blocks of data, because the fingerprints, the nuggets, will also change, and break the nugget chain.
The point is that while John Oliver is laughing at a silly metaphor to explain the blockchain becuase he totally misses the point of the metaphor.
Oliver rightly says “don’t worry if you don’t understand it — most people don’t”, but that includes the big companies that John Oliver name. Some companies do get it, and are producing reasonable things (like JP Morgan, by all accounts), but some don’t. IBM and other big consultancies are charging companies millions of dollars to consult with them on block chain products where nobody involved, the customer or the consultancy, actually understand any of it. That doesn’t stop them from happily charging customers on one side and happily spending money on the other.
Thus, rather than Oliver explaining the problem, he’s just being part of the problem. His explanation of blockchain left you dumber than before.

ICO’s

John Oliver mocks the Brave ICO ($35 million in 30 seconds), claiming it’s all driven by YouTube personalities and people who aren’t looking at the fundamentals.
And while this is true, most ICOs are bunk, the  Brave ICO actually had a business model behind it. Brave is a Chrome-like web-browser whose distinguishing feature is that it protects your privacy from advertisers. If you don’t use Brave or a browser with an ad block extension, you have no idea how bad things are for you. However, this presents a problem for websites that fund themselves via advertisements, which is most of them, because visitors no longer see ads. Brave has a fix for this. Most people wouldn’t mind supporting the websites they visit often, like the New York Times. That’s where the Brave ICO “token” comes in: it’s not simply stock in Brave, but a token for micropayments to websites. Users buy tokens, then use them for micropayments to websites like New York Times. The New York Times then sells the tokens back to the market for dollars. The buying and selling of tokens happens without a centralized middleman.
This is still all speculative, of course, and it remains to be seen how successful Brave will be, but it’s a serious effort. It has well respected VC behind the company, a well-respected founder (despite the fact he invented JavaScript), and well-respected employees. It’s not a scam, it’s a legitimate venture.

How to you make money from Bitcoin?

The last part of the show is dedicated to describing all the scam out there, advising people to be careful, and to be “responsible”. This is garbage.
It’s like my simple two step process to making lots of money via Bitcoin: (1) buy when the price is low, and (2) sell when the price is high. My advice is correct, of course, but useless. Same as “be careful” and “invest responsibly”.
The truth about investing in cryptocurrencies is “don’t”. The only responsible way to invest is to buy low-overhead market index funds and hold for retirement. No, you won’t get super rich doing this, but anything other than this is irresponsible gambling.
It’s a hard lesson to learn, because everyone is telling you the opposite. The entire channel CNBC is devoted to day traders, who buy and sell stocks at a high rate based on the same principle as a ponzi scheme, basing their judgment not on the fundamentals (like long term dividends) but animal spirits of whatever stock is hot or cold at the moment. This is the same reason people buy or sell Bitcoin, not because they can describe the fundamental value, but because they believe in a bigger fool down the road who will buy it for even more.
For things like Bitcoin, the trick to making money is to have bought it over 7 years ago when it was essentially worthless, except to nerds who were into that sort of thing. It’s the same tick to making a lot of money in Magic: The Gathering trading cards, which nerds bought decades ago which are worth a ton of money now. Or, to have bought Apple stock back in 2009 when the iPhone was new, when nerds could understand the potential of real Internet access and apps that Wall Street could not.
That was my strategy: be a nerd, who gets into things. I’ve made a good amount of money on all these things because as a nerd, I was into Magic: The Gathering, Bitcoin, and the iPhone before anybody else was, and bought in at the point where these things were essentially valueless.
At this point with cryptocurrencies, with the non-nerds now flooding the market, there little chance of making it rich. The lottery is probably a better bet. Instead, if you want to make money, become a nerd, obsess about a thing, understand a thing when its new, and cash out once the rest of the market figures it out. That might be Brave, for example, but buy into it because you’ve spent the last year studying the browser advertisement ecosystem, the market’s willingness to pay for content, and how their Basic Attention Token delivers value to websites — not because you want in on the ICO craze.

Conclusion

John Oliver spends 25 minutes explaining Bitcoin, Cryptocurrencies, and the Blockchain to you. Sure, it’s funny, but it leaves you worse off than when it started. It admits they “simplify” the explanation, but they simplified it so much to the point where they removed all useful information.

PipeCam: the low-cost underwater camera

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/pipecam-low-cost-underwater-camera/

Fred Fourie is building a low-cost underwater camera for shallow deployment, and his prototypes are already returning fascinating results. You can build your own PipeCam, and explore the undiscovered depths with a Raspberry Pi and off-the-shelf materials.

PipeCam underwater Raspberry Pi Camera

Materials and build

In its latest iteration, PipeCam consists of a 110mm PVC waste pipe with fittings and a 10mm perspex window at one end. Previous prototypes have also used plumbing materials for the body, but this latest version employs heavy-duty parts that deliver the good seal this project needs.

PipeCam underwater Raspberry Pi Camera

In testing, Fred and a friend determined that the rig could withstand 4 bar of pressure. This is enough to protect the tech inside at the depths Fred plans for, and a significant performance improvement on previous prototypes.

PipeCam underwater Raspberry Pi Camera
PipeCam underwater Raspberry Pi Camera

Inside the pipe are a Raspberry Pi 3, a camera module, and a real-time clock add-on board. A 2.4Ah rechargeable lead acid battery powers the set-up via a voltage regulator.

Using foam and fibreboard, Fred made a mount that holds everything in place and fits snugly inside the pipe.

PipeCam underwater Raspberry Pi Camera
PipeCam underwater Raspberry Pi Camera
PipeCam underwater Raspberry Pi Camera

PipeCam will be subject to ocean currents, not to mention the attentions of sea creatures, so it’s essential to make sure that everything is held securely inside the pipe – something Fred has learned from previous versions of the project.

Software

It’s straightforward to write time-lapse code for a Raspberry Pi using Python and one of our free online resources, but Fred has more ambitious plans for PipeCam. As well as a Python script to control the camera, Fred made a web page to display the health of the device. It shows battery level and storage availability, along with the latest photo taken by the camera. He also made adjustments to the camera’s exposure settings using raspistill. You can see the effect in this side-by-side comparison of the default python-picam image and the edited raspistill one.

PipeCam underwater Raspberry Pi Camera
PipeCam underwater Raspberry Pi Camera

Underwater testing

Fred has completed the initial first test of PipeCam, running the device under water for an hour in two-metre deep water off the coast near his home. And the results? Well, see for yourself:

PipeCam underwater Raspberry Pi Camera
PipeCam underwater Raspberry Pi Camera
PipeCam underwater Raspberry Pi Camera

PipeCam is a work in progress, and you can read Fred’s build log at the project’s Hackaday.io page, so be sure to follow along.

The post PipeCam: the low-cost underwater camera appeared first on Raspberry Pi.

Top 8 Best Practices for High-Performance ETL Processing Using Amazon Redshift

Post Syndicated from Thiyagarajan Arumugam original https://aws.amazon.com/blogs/big-data/top-8-best-practices-for-high-performance-etl-processing-using-amazon-redshift/

An ETL (Extract, Transform, Load) process enables you to load data from source systems into your data warehouse. This is typically executed as a batch or near-real-time ingest process to keep the data warehouse current and provide up-to-date analytical data to end users.

Amazon Redshift is a fast, petabyte-scale data warehouse that enables you easily to make data-driven decisions. With Amazon Redshift, you can get insights into your big data in a cost-effective fashion using standard SQL. You can set up any type of data model, from star and snowflake schemas, to simple de-normalized tables for running any analytical queries.

To operate a robust ETL platform and deliver data to Amazon Redshift in a timely manner, design your ETL processes to take account of Amazon Redshift’s architecture. When migrating from a legacy data warehouse to Amazon Redshift, it is tempting to adopt a lift-and-shift approach, but this can result in performance and scale issues long term. This post guides you through the following best practices for ensuring optimal, consistent runtimes for your ETL processes:

  • COPY data from multiple, evenly sized files.
  • Use workload management to improve ETL runtimes.
  • Perform table maintenance regularly.
  • Perform multiple steps in a single transaction.
  • Loading data in bulk.
  • Use UNLOAD to extract large result sets.
  • Use Amazon Redshift Spectrum for ad hoc ETL processing.
  • Monitor daily ETL health using diagnostic queries.

1. COPY data from multiple, evenly sized files

Amazon Redshift is an MPP (massively parallel processing) database, where all the compute nodes divide and parallelize the work of ingesting data. Each node is further subdivided into slices, with each slice having one or more dedicated cores, equally dividing the processing capacity. The number of slices per node depends on the node type of the cluster. For example, each DS2.XLARGE compute node has two slices, whereas each DS2.8XLARGE compute node has 16 slices.

When you load data into Amazon Redshift, you should aim to have each slice do an equal amount of work. When you load the data from a single large file or from files split into uneven sizes, some slices do more work than others. As a result, the process runs only as fast as the slowest, or most heavily loaded, slice. In the example shown below, a single large file is loaded into a two-node cluster, resulting in only one of the nodes, “Compute-0”, performing all the data ingestion:

When splitting your data files, ensure that they are of approximately equal size – between 1 MB and 1 GB after compression. The number of files should be a multiple of the number of slices in your cluster. Also, I strongly recommend that you individually compress the load files using gzip, lzop, or bzip2 to efficiently load large datasets.

When loading multiple files into a single table, use a single COPY command for the table, rather than multiple COPY commands. Amazon Redshift automatically parallelizes the data ingestion. Using a single COPY command to bulk load data into a table ensures optimal use of cluster resources, and quickest possible throughput.

2. Use workload management to improve ETL runtimes

Use Amazon Redshift’s workload management (WLM) to define multiple queues dedicated to different workloads (for example, ETL versus reporting) and to manage the runtimes of queries. As you migrate more workloads into Amazon Redshift, your ETL runtimes can become inconsistent if WLM is not appropriately set up.

I recommend limiting the overall concurrency of WLM across all queues to around 15 or less. This WLM guide helps you organize and monitor the different queues for your Amazon Redshift cluster.

When managing different workloads on your Amazon Redshift cluster, consider the following for the queue setup:

  • Create a queue dedicated to your ETL processes. Configure this queue with a small number of slots (5 or fewer). Amazon Redshift is designed for analytics queries, rather than transaction processing. The cost of COMMIT is relatively high, and excessive use of COMMIT can result in queries waiting for access to the commit queue. Because ETL is a commit-intensive process, having a separate queue with a small number of slots helps mitigate this issue.
  • Claim extra memory available in a queue. When executing an ETL query, you can take advantage of the wlm_query_slot_count to claim the extra memory available in a particular queue. For example, a typical ETL process might involve COPYing raw data into a staging table so that downstream ETL jobs can run transformations that calculate daily, weekly, and monthly aggregates. To speed up the COPY process (so that the downstream tasks can start in parallel sooner), the wlm_query_slot_count can be increased for this step.
  • Create a separate queue for reporting queries. Configure query monitoring rules on this queue to further manage long-running and expensive queries.
  • Take advantage of the dynamic memory parameters. They swap the memory from your ETL to your reporting queue after the ETL job has completed.

3. Perform table maintenance regularly

Amazon Redshift is a columnar database, which enables fast transformations for aggregating data. Performing regular table maintenance ensures that transformation ETLs are predictable and performant. To get the best performance from your Amazon Redshift database, you must ensure that database tables regularly are VACUUMed and ANALYZEd. The Analyze & Vacuum schema utility helps you automate the table maintenance task and have VACUUM & ANALYZE executed in a regular fashion.

  • Use VACUUM to sort tables and remove deleted blocks

During a typical ETL refresh process, tables receive new incoming records using COPY, and unneeded data (cold data) is removed using DELETE. New rows are added to the unsorted region in a table. Deleted rows are simply marked for deletion.

DELETE does not automatically reclaim the space occupied by the deleted rows. Adding and removing large numbers of rows can therefore cause the unsorted region and the number of deleted blocks to grow. This can degrade the performance of queries executed against these tables.

After an ETL process completes, perform VACUUM to ensure that user queries execute in a consistent manner. The complete list of tables that need VACUUMing can be found using the Amazon Redshift Util’s table_info script.

Use the following approaches to ensure that VACCUM is completed in a timely manner:

  • Use wlm_query_slot_count to claim all the memory allocated in the ETL WLM queue during the VACUUM process.
  • DROP or TRUNCATE intermediate or staging tables, thereby eliminating the need to VACUUM them.
  • If your table has a compound sort key with only one sort column, try to load your data in sort key order. This helps reduce or eliminate the need to VACUUM the table.
  • Consider using time series This helps reduce the amount of data you need to VACUUM.
  • Use ANALYZE to update database statistics

Amazon Redshift uses a cost-based query planner and optimizer using statistics about tables to make good decisions about the query plan for the SQL statements. Regular statistics collection after the ETL completion ensures that user queries run fast, and that daily ETL processes are performant. The Amazon Redshift utility table_info script provides insights into the freshness of the statistics. Keeping the statistics off (pct_stats_off) less than 20% ensures effective query plans for the SQL queries.

4. Perform multiple steps in a single transaction

ETL transformation logic often spans multiple steps. Because commits in Amazon Redshift are expensive, if each ETL step performs a commit, multiple concurrent ETL processes can take a long time to execute.

To minimize the number of commits in a process, the steps in an ETL script should be surrounded by a BEGIN…END statement so that a single commit is performed only after all the transformation logic has been executed. For example, here is an example multi-step ETL script that performs one commit at the end:

Begin
CREATE temporary staging_table;
INSERT INTO staging_table SELECT .. FROM source (transformation logic);
DELETE FROM daily_table WHERE dataset_date =?;
INSERT INTO daily_table SELECT .. FROM staging_table (daily aggregate);
DELETE FROM weekly_table WHERE weekending_date=?;
INSERT INTO weekly_table SELECT .. FROM staging_table(weekly aggregate);
Commit

5. Loading data in bulk

Amazon Redshift is designed to store and query petabyte-scale datasets. Using Amazon S3 you can stage and accumulate data from multiple source systems before executing a bulk COPY operation. The following methods allow efficient and fast transfer of these bulk datasets into Amazon Redshift:

  • Use a manifest file to ingest large datasets that span multiple files. The manifest file is a JSON file that lists all the files to be loaded into Amazon Redshift. Using a manifest file ensures that Amazon Redshift has a consistent view of the data to be loaded from S3, while also ensuring that duplicate files do not result in the same data being loaded more than one time.
  • Use temporary staging tables to hold the data for transformation. These tables are automatically dropped after the ETL session is complete. Temporary tables can be created using the CREATE TEMPORARY TABLE syntax, or by issuing a SELECT … INTO #TEMP_TABLE query. Explicitly specifying the CREATE TEMPORARY TABLE statement allows you to control the DISTRIBUTION KEY, SORT KEY, and compression settings to further improve performance.
  • User ALTER table APPEND to swap data from the staging tables to the target table. Data in the source table is moved to matching columns in the target table. Column order doesn’t matter. After data is successfully appended to the target table, the source table is empty. ALTER TABLE APPEND is much faster than a similar CREATE TABLE AS or INSERT INTO operation because it doesn’t involve copying or moving data.

6. Use UNLOAD to extract large result sets

Fetching a large number of rows using SELECT is expensive and takes a long time. When a large amount of data is fetched from the Amazon Redshift cluster, the leader node has to hold the data temporarily until the fetches are complete. Further, data is streamed out sequentially, which results in longer elapsed time. As a result, the leader node can become hot, which not only affects the SELECT that is being executed, but also throttles resources for creating execution plans and managing the overall cluster resources. Here is an example of a large SELECT statement. Notice that the leader node is doing most of the work to stream out the rows:

Use UNLOAD to extract large results sets directly to S3. After it’s in S3, the data can be shared with multiple downstream systems. By default, UNLOAD writes data in parallel to multiple files according to the number of slices in the cluster. All the compute nodes participate to quickly offload the data into S3.

If you are extracting data for use with Amazon Redshift Spectrum, you should make use of the MAXFILESIZE parameter to and keep files are 150 MB. Similar to item 1 above, having many evenly sized files ensures that Redshift Spectrum can do the maximum amount of work in parallel.

7. Use Redshift Spectrum for ad hoc ETL processing

Events such as data backfill, promotional activity, and special calendar days can trigger additional data volumes that affect the data refresh times in your Amazon Redshift cluster. To help address these spikes in data volumes and throughput, I recommend staging data in S3. After data is organized in S3, Redshift Spectrum enables you to query it directly using standard SQL. In this way, you gain the benefits of additional capacity without having to resize your cluster.

For tips on getting started with and optimizing the use of Redshift Spectrum, see the previous post, 10 Best Practices for Amazon Redshift Spectrum.

8. Monitor daily ETL health using diagnostic queries

Monitoring the health of your ETL processes on a regular basis helps identify the early onset of performance issues before they have a significant impact on your cluster. The following monitoring scripts can be used to provide insights into the health of your ETL processes:

Script Use when… Solution
commit_stats.sql – Commit queue statistics from past days, showing largest queue length and queue time first DML statements such as INSERT/UPDATE/COPY/DELETE operations take several times longer to execute when multiple of these operations are in progress Set up separate WLM queues for the ETL process and limit the concurrency to < 5.
copy_performance.sql –  Copy command statistics for the past days Daily COPY operations take longer to execute • Follow the best practices for the COPY command.
• Analyze data growth with the incoming datasets and consider cluster resize to meet the expected SLA.
table_info.sql – Table skew and unsorted statistics along with storage and key information Transformation steps take longer to execute • Set up regular VACCUM jobs to address unsorted rows and claim the deleted blocks so that transformation SQL execute optimally.
• Consider a table redesign to avoid data skewness.
v_check_transaction_locks.sql – Monitor transaction locks INSERT/UPDATE/COPY/DELETE operations on particular tables do not respond back in timely manner, compared to when run after the ETL Multiple DML statements are operating on the same target table at the same moment from different transactions. Set up ETL job dependency so that they execute serially for the same target table.
v_get_schema_priv_by_user.sql – Get the schema that the user has access to Reporting users can view intermediate tables Set up separate database groups for reporting and ETL users, and grants access to objects using GRANT.
v_generate_tbl_ddl.sql – Get the table DDL You need to create an empty table with same structure as target table for data backfill Generate DDL using this script for data backfill.
v_space_used_per_tbl.sql – monitor space used by individual tables Amazon Redshift data warehouse space growth is trending upwards more than normal

Analyze the individual tables that are growing at higher rate than normal. Consider data archival using UNLOAD to S3 and Redshift Spectrum for later analysis.

Use unscanned_table_summary.sql to find unused table and archive or drop them.

top_queries.sql – Return the top 50 time consuming statements aggregated by its text ETL transformations are taking longer to execute Analyze the top transformation SQL and use EXPLAIN to find opportunities for tuning the query plan.

There are several other useful scripts available in the amazon-redshift-utils repository. The AWS Lambda Utility Runner runs a subset of these scripts on a scheduled basis, allowing you to automate much of monitoring of your ETL processes.

Example ETL process

The following ETL process reinforces some of the best practices discussed in this post. Consider the following four-step daily ETL workflow where data from an RDBMS source system is staged in S3 and then loaded into Amazon Redshift. Amazon Redshift is used to calculate daily, weekly, and monthly aggregations, which are then unloaded to S3, where they can be further processed and made available for end-user reporting using a number of different tools, including Redshift Spectrum and Amazon Athena.

Step 1:  Extract from the RDBMS source to a S3 bucket

In this ETL process, the data extract job fetches change data every 1 hour and it is staged into multiple hourly files. For example, the staged S3 folder looks like the following:

 [[email protected] ~]$ aws s3 ls s3://<<S3 Bucket>>/batch/2017/07/02/
2017-07-02 01:59:58   81900220 20170702T01.export.gz
2017-07-02 02:59:56   84926844 20170702T02.export.gz
2017-07-02 03:59:54   78990356 20170702T03.export.gz
…
2017-07-02 22:00:03   75966745 20170702T21.export.gz
2017-07-02 23:00:02   89199874 20170702T22.export.gz
2017-07-02 00:59:59   71161715 20170702T23.export.gz

Organizing the data into multiple, evenly sized files enables the COPY command to ingest this data using all available resources in the Amazon Redshift cluster. Further, the files are compressed (gzipped) to further reduce COPY times.

Step 2: Stage data to the Amazon Redshift table for cleansing

Ingesting the data can be accomplished using a JSON-based manifest file. Using the manifest file ensures that S3 eventual consistency issues can be eliminated and also provides an opportunity to dedupe any files if needed. A sample manifest20170702.json file looks like the following:

{
  "entries": [
    {"url":" s3://<<S3 Bucket>>/batch/2017/07/02/20170702T01.export.gz", "mandatory":true},
    {"url":" s3://<<S3 Bucket>>/batch/2017/07/02/20170702T02.export.gz", "mandatory":true},
    …
    {"url":" s3://<<S3 Bucket>>/batch/2017/07/02/20170702T23.export.gz", "mandatory":true}
  ]
}

The data can be ingested using the following command:

SET wlm_query_slot_count TO <<max available concurrency in the ETL queue>>;
COPY stage_tbl FROM 's3:// <<S3 Bucket>>/batch/manifest20170702.json' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' manifest;

Because the downstream ETL processes depend on this COPY command to complete, the wlm_query_slot_count is used to claim all the memory available to the queue. This helps the COPY command complete as quickly as possible.

Step 3: Transform data to create daily, weekly, and monthly datasets and load into target tables

Data is staged in the “stage_tbl” from where it can be transformed into the daily, weekly, and monthly aggregates and loaded into target tables. The following job illustrates a typical weekly process:

Begin
INSERT into ETL_LOG (..) values (..);
DELETE from weekly_tbl where dataset_week = <<current week>>;
INSERT into weekly_tbl (..)
  SELECT date_trunc('week', dataset_day) AS week_begin_dataset_date, SUM(C1) AS C1, SUM(C2) AS C2
	FROM   stage_tbl
GROUP BY date_trunc('week', dataset_day);
INSERT into AUDIT_LOG values (..);
COMMIT;
End;

As shown above, multiple steps are combined into one transaction to perform a single commit, reducing contention on the commit queue.

Step 4: Unload the daily dataset to populate the S3 data lake bucket

The transformed results are now unloaded into another S3 bucket, where they can be further processed and made available for end-user reporting using a number of different tools, including Redshift Spectrum and Amazon Athena.

unload ('SELECT * FROM weekly_tbl WHERE dataset_week = <<current week>>’) TO 's3:// <<S3 Bucket>>/datalake/weekly/20170526/' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';

Summary

Amazon Redshift lets you easily operate petabyte-scale data warehouses on the cloud. This post summarized the best practices for operating scalable ETL natively within Amazon Redshift. I demonstrated efficient ways to ingest and transform data, along with close monitoring. I also demonstrated the best practices being used in a typical sample ETL workload to transform the data into Amazon Redshift.

If you have questions or suggestions, please comment below.

 


About the Author

Thiyagarajan Arumugam is a Big Data Solutions Architect at Amazon Web Services and designs customer architectures to process data at scale. Prior to AWS, he built data warehouse solutions at Amazon.com. In his free time, he enjoys all outdoor sports and practices the Indian classical drum mridangam.

 

Eevee gained 2791 experience points

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/15/eevee-gained-2791-experience-points/

Eevee grew to level 31!

A year strongly defined by mixed success! Also, a lot of video games.

I ran three game jams, resulting in a total of 157 games existing that may not have otherwise, which is totally mindblowing?!

For GAMES MADE QUICK???, glip and I made NEON PHASE, a short little exploratory platformer. Honestly, I should give myself more credit for this and the rest of the LÖVE games I’ve based on the same codebase — I wove a physics engine (and everything else!) from scratch and it has held up remarkably well for a variety of different uses.

I successfully finished an HD version of Isaac’s Descent using my LÖVE engine, though it doesn’t have anything new over the original and I’ve only released it as a tech demo on Patreon.

For Strawberry Jam (NSFW!) we made fox flux (slightly NSFW!), which felt like a huge milestone: the first game where I made all the art! I mean, not counting Isaac’s Descent, which was for a very limited platform. It’s a pretty arbitrary milestone, yes, but it feels significant. I’ve been working on expanding the game into a longer and slightly less buggy experience, but the art is taking the longest by far. I must’ve spent weeks on player sprites alone.

We then set about working on Bolthaven, a sequel of sorts to NEON PHASE, and got decently far, and then abandond it. Oops.

We then started a cute little PICO-8 game, and forgot about it. Oops.

I was recruited to help with Chaos Composer, a more ambitious game glip started with someone else in Unity. I had to get used to Unity, and we squabbled a bit, but the game is finally about at the point where it’s “playable” and “maps” can be designed? It’s slightly on hold at the moment while we all finish up some other stuff, though.

We made a birthday game for two of our friends whose birthdays were very close together! Only they got to see it.

For Ludum Dare 38, we made Lunar Depot 38, a little “wave shooter” or whatever you call those? The AI is pretty rough, seeing as this was the first time I’d really made enemies and I had 72 hours to figure out how to do it, but I still think it’s pretty fun to play and I love the circular world.

I made Roguelike Simulator as an experiment with making something small and quick with a simple tool, and I had a lot of fun! I definitely want to do more stuff like this in the future.

And now we’re working on a game about Star Anise, my cat’s self-insert, which is looking to have more polish and depth than anything we’ve done so far! We’ve definitely come a long way in a year.

Somewhere along the line, I put out a call for a “potluck” project, where everyone would give me sprites of a given size without knowing what anyone else had contributed, and I would then make a game using only those sprites. Unfortunately, that stalled a few times: I tried using the Phaser JS library, but we didn’t get along; I tried LÖVE, but didn’t know where to go with the game; and then I decided to use this as an experiment with procedural generation, and didn’t get around to it. I still feel bad that everyone did work for me and I didn’t follow through, but I don’t know whether this will ever become a game.

veekun, alas, consumed months of my life. I finally got Sun and Moon loaded, but it took weeks of work since I was basically reinventing all the tooling we’d ever had from scratch, without even having most of that tooling available as a reference. It was worth it in the end, at least: Ultra Sun and Ultra Moon only took a few days to get loaded. But veekun itself is still missing some obvious Sun/Moon features, and the whole site needs an overhaul, and I just don’t know if I want to dedicate that much time to it when I have so much other stuff going on that’s much more interesting to me right now.

I finally turned my blog into more of a website, giving it a neat front page that lists a bunch of stuff I’ve done. I made a release category at last, though I’m still not quite in the habit of using it.

I wrote some blog posts, of course! I think the most interesting were JavaScript got better while I wasn’t looking and Object models. I was also asked to write a couple pieces for money for a column that then promptly shut down.

On a whim, I made a set of Eevee mugshots for Doom, which I think is a decent indication of my (pixel) art progress over the year?

I started idchoppers, a Doom parsing and manipulation library written in Rust, though it didn’t get very far and I’ve spent most of the time fighting with Rust because it won’t let me implement all my extremely bad ideas. It can do a couple things, at least, like flip maps very quickly and render maps to SVG.

I did toy around with music a little, but not a lot.

I wrote two short twines for Flora. They’re okay. I’m working on another; I think it’ll be better.

I didn’t do a lot of art overall, at least compared to the two previous years; most of my art effort over the year has gone into fox flux, which requires me to learn a whole lot of things. I did dip my toes into 3D modelling, most notably producing my current Twitter banner as well as this cool Star Anise animation. I wouldn’t mind doing more of that; maybe I’ll even try to make a low-poly pixel-textured 3D game sometime.

I restarted my book with a much better concept, though so far I’ve only written about half a chapter. Argh. I see that the vast majority of the work was done within the span of a single week, which is bad since that means I only worked on it for a week, but good since that means I can actually do a pretty good amount of work in only a week. I also did a lot of squabbling with tooling, which is hopefully mostly out of the way now.

My computer broke? That was an exciting week.


A lot of stuff, but the year as a whole still feels hit or miss. All the time I spent on veekun feels like a black void in the middle of the year, which seems like a good sign that I maybe don’t want to pour even more weeks into it in the near future.

Mostly, I want to do: more games, more art, more writing, more music.

I want to try out some tiny game making tools and make some tiny games with them — partly to get exposure to different things, partly to get more little ideas out into the world regularly, and partly to get more practice at letting myself have ideas. I have a couple tools in mind and I guess I’ll aim at a microgame every two months or so? I’d also like to finish the expanded fox flux by the end of the year, of course, though at the moment I can’t even gauge how long it might take.

I seriously lapsed on drawing last year, largely because fox flux pixel art took me so much time. So I want to draw more, and I want to get much faster at pixel art. It would probably help if I had a more concrete goal for drawing, so I might try to draw some short comics and write a little visual novel or something, which would also force me to aim for consistency.

I want to work on my book more, of course, but I also want to try my hand at a bit more fiction. I’ve had a blast writing dialogue for our games! I just shy away from longer-form writing for some reason — which seems ridiculous when a large part of my audience found me through my blog. I do think I’ve had some sort of breakthrough in the last month or two; I suddenly feel a good bit more confident about writing in general and figuring out what I want to say? One recent post I know I wrote in a single afternoon, which virtually never happens because I keep rewriting and rearranging stuff. Again, a visual novel would be a good excuse to practice writing fiction without getting too bogged down in details.

And, ah, music. I shy heavily away from music, since I have no idea what I’m doing, and also I seem to spend a lot of time fighting with tools. (Surprise.) I tried out SunVox for the first time just a few days ago and have been enjoying it quite a bit for making sound effects, so I might try it for music as well. And once again, visual novel background music is a pretty low-pressure thing to compose for. Hell, visual novels are small games, too, so that checks all the boxes. I guess I’ll go make a visual novel.

Here’s to twenty gayteen!

A hedgehog cam or two

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/a-hedgehog-cam-or-two/

Here we are, hauling ourselves out of the Christmas and New Year holidays and into January proper. It’s dawning on me that I have to go back to work, even though it’s still very cold and gloomy in northern Europe, and even though my duvet is lovely and warm. I found myself envying beings that hibernate, and thinking about beings that hibernate, and searching for things to do with hedgehogs. And, well, the long and the short of it is, today’s blog post is a short meditation on the hedgehog cam.

A hedgehog in a garden, photographed in infrared light by a hedgehog cam

Success! It’s a hedgehog!
Photo by Andrew Wedgbury

Hedgehog watching

Someone called Barker has installed a Raspberry Pi–based hedgehog cam in a location with a distant view of a famous Alp, and as well as providing live views by visible and infrared light for the dedicated and the insomniac, they also make a sped-up version of the previous night’s activity available. With hedgehogs usually being in hibernation during January, you mightn’t see them in any current feed — but don’t worry! You’re guaranteed a few hedgehogs on Barker’s website, because they have also thrown in some lovely GIFs of hoggy (and foxy) divas that their camera captured in the past.

A Hedgehog eating from a bowl on a patio, captured by a hedgehog cam

Nom nom nom!
GIF by Barker’s Site

Build your own hedgehog cam

For pointers on how to replicate this kind of setup, you could do worse than turn to Andrew Wedgbury’s hedgehog cam write-up. Andrew’s Twitter feed reveals that he’s a Cambridge local, and there are hints that he was behind RealVNC’s hoggy mascot for Pi Wars 2017.

RealVNC on Twitter

Another day at the office: testing our #PiWars mascot using a @Raspberry_Pi 3, #VNC Connect and @4tronix_uk Picon Zero. Name suggestions? https://t.co/iYY3xAX9Bk

Our infrared bird box and time-lapse camera resources will also set you well on the way towards your own custom wildlife camera. For a kit that wraps everything up in a weatherproof enclosure made with love, time, and serious amounts of design and testing, take a look at Naturebytes’ wildlife cam kit.

Or, if you’re thinking that a robot mascot is more dependable than real animals for the fluffiness you need in order to start your January with something like productivity and with your soul intact, you might like to put your own spin on our robot buggy.

Happy 2018

While we’re on the subject of getting to grips with the new year, do take a look at yesterday’s blog post, in which we suggest a New Year’s project that’s different from the usual resolutions. However you tackle 2018, we wish you an excellent year of creative computing.

The post A hedgehog cam or two appeared first on Raspberry Pi.

The Raspberry Pi Christmas shopping list 2017

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/christmas-shopping-list-2017/

Looking for the perfect Christmas gift for a beloved maker in your life? Maybe you’d like to give a relative or friend a taste of the world of coding and Raspberry Pi? Whatever you’re looking for, the Raspberry Pi Christmas shopping list will point you in the right direction.

An ice-skating Raspberry Pi - The Raspberry Pi Christmas Shopping List 2017

For those getting started

Thinking about introducing someone special to the wonders of Raspberry Pi during the holidays? Although you can set up your Pi with peripherals from around your home, such as a mobile phone charger, your PC’s keyboard, and the old mouse dwelling in an office drawer, a starter kit is a nice all-in-one package for the budding coder.



Check out the starter kits from Raspberry Pi Approved Resellers such as Pimoroni, The Pi Hut, ModMyPi, Adafruit, CanaKit…the list is pretty long. Our products page will direct you to your closest reseller, or you can head to element14 to pick up the official Raspberry Pi Starter Kit.



You can also buy the Raspberry Pi Press’s brand-new Raspberry Pi Beginners Book, which includes a Raspberry Pi Zero W, a case, a ready-made SD card, and adapter cables.

Once you’ve presented a lucky person with their first Raspberry Pi, it’s time for them to spread their maker wings and learn some new skills.

MagPi Essentials books - The Raspberry Pi Christmas Shopping List 2017

To help them along, you could pick your favourite from among the Official Projects Book volume 3, The MagPi Essentials guides, and the brand-new third edition of Carrie Anne Philbin’s Adventures in Raspberry Pi. (She is super excited about this new edition!)

And you can always add a link to our free resources on the gift tag.

For the maker in your life

If you’re looking for something for a confident digital maker, you can’t go wrong with adding to their arsenal of electric and electronic bits and bobs that are no doubt cluttering drawers and boxes throughout their house.



Components such as servomotors, displays, and sensors are staples of the maker world. And when it comes to jumper wires, buttons, and LEDs, one can never have enough.



You could also consider getting your person a soldering iron, some helpings hands, or small tools such as a Dremel or screwdriver set.

And to make their life a little less messy, pop it all inside a Really Useful Box…because they’re really useful.



For kit makers

While some people like to dive into making head-first and to build whatever comes to mind, others enjoy working with kits.



The Naturebytes kit allows you to record the animal visitors of your garden with the help of a camera and a motion sensor. Footage of your local badgers, birds, deer, and more will be saved to an SD card, or tweeted or emailed to you if it’s in range of WiFi.

Cortec Tiny 4WD - The Raspberry Pi Christmas Shopping List 2017

Coretec’s Tiny 4WD is a kit for assembling a Pi Zero–powered remote-controlled robot at home. Not only is the robot adorable, building it also a great introduction to motors and wireless control.



Bare Conductive’s Touch Board Pro Kit offers everything you need to create interactive electronics projects using conductive paint.

Pi Hut Arcade Kit - The Raspberry Pi Christmas Shopping List 2017

Finally, why not help your favourite maker create their own gaming arcade using the Arcade Building Kit from The Pi Hut?

For the reader

For those who like to curl up with a good read, or spend too much of their day on public transport, a book or magazine subscription is the perfect treat.

For makers, hackers, and those interested in new technologies, our brand-new HackSpace magazine and the ever popular community magazine The MagPi are ideal. Both are available via a physical or digital subscription, and new subscribers to The MagPi also receive a free Raspberry Pi Zero W plus case.

Cover of CoderDojo Nano Make your own game

Marc Scott Beginner's Guide to Coding Book

You can also check out other publications from the Raspberry Pi family, including CoderDojo’s new CoderDojo Nano: Make Your Own Game, Eben Upton and Gareth Halfacree’s Raspberry Pi User Guide, and Marc Scott’s A Beginner’s Guide to Coding. And have I mentioned Carrie Anne’s Adventures in Raspberry Pi yet?

Stocking fillers for everyone

Looking for something small to keep your loved ones occupied on Christmas morning? Or do you have to buy a Secret Santa gift for the office tech? Here are some wonderful stocking fillers to fill your boots with this season.

Pi Hut 3D Christmas Tree - The Raspberry Pi Christmas Shopping List 2017

The Pi Hut 3D Xmas Tree: available as both a pre-soldered and a DIY version, this gadget will work with any 40-pin Raspberry Pi and allows you to create your own mini light show.



Google AIY Voice kit: build your own home assistant using a Raspberry Pi, the MagPi Essentials guide, and this brand-new kit. “Google, play Mariah Carey again…”



Pimoroni’s Raspberry Pi Zero W Project Kits offer everything you need, including the Pi, to make your own time-lapse cameras, music players, and more.



The official Raspberry Pi Sense HAT, Camera Module, and cases for the Pi 3 and Pi Zero will complete the collection of any Raspberry Pi owner, while also opening up exciting project opportunities.

STEAM gifts that everyone will love

Awesome Astronauts | Building LEGO’s Women of NASA!

LEGO Idea’s bought out this amazing ‘Women of NASA’ set, and I thought it would be fun to build, play and learn from these inspiring women! First up, let’s discover a little more about Sally Ride and Mae Jemison, two AWESOME ASTRONAUTS!

Treat the kids, and big kids, in your life to the newest LEGO Ideas set, the Women of NASA — starring Nancy Grace Roman, Margaret Hamilton, Sally Ride, and Mae Jemison!



Explore the world of wearables with Pimoroni’s sewable, hackable, wearable, adorable Bearables kits.



Add lights and motors to paper creations with the Activating Origami Kit, available from The Pi Hut.




We all loved Hidden Figures, and the STEAM enthusiast you know will do too. The film’s available on DVD, and you can also buy the original book, along with other fascinating non-fiction such as Rebecca Skloot’s The Immortal Life of Henrietta Lacks, Rachel Ignotofsky’s Women in Science, and Sydney Padua’s (mostly true) The Thrilling Adventures of Lovelace and Babbage.

Have we missed anything?

With so many amazing kits, HATs, and books available from members of the Raspberry Pi community, it’s hard to only pick a few. Have you found something splendid for the maker in your life? Maybe you’ve created your own kit that uses the Raspberry Pi? Share your favourites with us in the comments below or via our social media accounts.

The post The Raspberry Pi Christmas shopping list 2017 appeared first on Raspberry Pi.

Don Jr.: I’ll bite

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/11/don-jr-ill-bite.html

So Don Jr. tweets the following, which is an excellent troll. So I thought I’d bite. The reason is I just got through debunk Democrat claims about NetNeutrality, so it seems like a good time to balance things out and debunk Trump nonsense.

The issue here is not which side is right. The issue here is whether you stand for truth, or whether you’ll seize any factoid that appears to support your side, regardless of the truthfulness of it. The ACLU obviously chose falsehoods, as I documented. In the following tweet, Don Jr. does the same.

It’s a preview of the hyperpartisan debates are you are likely to have across the dinner table tomorrow, which each side trying to outdo the other in the false-hoods they’ll claim.

What we see in this number is a steady trend of these statistics since the Great Recession, with no evidence in the graphs showing how Trump has influenced these numbers, one way or the other.

Stock markets at all time highs

This is true, but it’s obviously not due to Trump. The stock markers have been steadily rising since the Great Recession. Trump has done nothing substantive to change the market trajectory. Also, he hasn’t inspired the market to change it’s direction.
To be fair to Don Jr., we’ve all been crediting (or blaming) presidents for changes in the stock market despite the fact they have almost no influence over it. Presidents don’t run the economy, it’s an inappropriate conceit. The most influence they’ve had is in harming it.

Lowest jobless claims since 73

Again, let’s graph this:

As we can see, jobless claims have been on a smooth downward trajectory since the Great Recession. It’s difficult to see here how President Trump has influenced these numbers.

6 Trillion added to the economy

What he’s referring to is that assets have risen in value, like the stock market, homes, gold, and even Bitcoin.
But this is a well known fallacy known as Mercantilism, believing the “economy” is measured by the value of its assets. This was debunked by Adam Smith in his book “The Wealth of Nations“, where he showed instead the the “economy” is measured by how much it produces (GDP – Gross Domestic Product) and not assets.
GDP has grown at 3.0%, which is pretty good compared to the long term trend, and is better than Europe or Japan (though not as good as China). But Trump doesn’t deserve any credit for this — today’s rise in GDP is the result of stuff that happened years ago.
Assets have risen by $6 trillion, but that’s not a good thing. After all, when you sell your home for more money, the buyer has to pay more. So one person is better off and one is worse off, so the net effect is zero.
Actually, such asset price increase is a worrisome indicator — we are entering into bubble territory. It’s the result of a loose monetary policy, low interest rates and “quantitative easing” that was designed under the Obama administration to stimulate the economy. That’s why all assets are rising in value. Normally, a rise in one asset means a fall in another, like selling gold to pay for houses. But because of loose monetary policy, all assets are increasing in price. The amazing rise in Bitcoin over the last year is as much a result of this bubble growing in all assets as it is to an exuberant belief in Bitcoin.
When this bubble collapses, which may happen during Trump’s term, it’ll really be the Obama administration who is to blame. I mean, if Trump is willing to take credit for the asset price bubble now, I’m willing to give it to him, as long as he accepts the blame when it crashes.

1.5 million fewer people on food stamps

As you’d expect, I’m going to debunk this with a graph: the numbers have been falling since the great recession. Indeed, in the previous period under Obama, 1.9 fewer people got off food stamps, so Trump’s performance is slight ahead rather than behind Obama. Of course, neither president is really responsible.

Consumer confidence through the roof

Again we are going to graph this number:

Again we find nothing in the graph that suggests President Trump is responsible for any change — it’s been improving steadily since the Great Recession.

One thing to note is that, technically, it’s not “through the roof” — it still quite a bit below the roof set during the dot-com era.

Lowest Unemployment rate in 17 years

Again, let’s simply graph it over time and look for Trump’s contribution. as we can see, there doesn’t appear to be anything special Trump has done — unemployment has steadily been improving since the Great Recession.
But here’s the thing, the “unemployment rate” only measures those looking for work, not those who have given up. The number that concerns people more is the “labor force participation rate”. The Great Recession kicked a lot of workers out of the economy.
Mostly this is because Baby Boomer are now retiring an leaving the workforce, and some have chosen to retire early rather than look for another job. But there are still some other problems in our economy that cause this. President Trump has nothing particular in order to solve these problems.

Conclusion

As we see, Don Jr’s tweet is a troll. When we look at the graphs of these indicators going back to the Great Recession, we don’t see how President Trump has influenced anything. The improvements this year are in line with the improvements last year, which are in turn inline with the improvements in the previous year.
To be fair, all parties credit their President with improvements during their term. President Obama’s supporters did the same thing. But at least right now, with these numbers, we can see that there’s no merit to anything in Don Jr’s tweet.
The hyperpartisan rancor in this country is because neither side cares about the facts. We should care. We should care that these numbers suck, even if we are Republicans. Conversely, we should care that those NetNeutrality claims by Democrats suck, even if we are Democrats.

Ultimate 3D printer control with OctoPrint

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/octoprint/

Control and monitor your 3D printer remotely with a Raspberry Pi and OctoPrint.

Timelapse of OctoPrint Ornament

Printed on a bq Witbox STL file can be found here: http://www.thingiverse.com/thing:191635 OctoPrint is located here: http://www.octoprint.org

3D printing

Whether you have a 3D printer at home or use one at your school or local makerspace, it’s fair to assume you’ve had a failed print or two in your time. Filament knotting or running out, your print peeling away from the print bed — these are common issues for all 3D printing enthusiasts, and they can be costly if they’re discovered too late.

OctoPrint

OctoPrint is a free open-source software, created and maintained by Gina Häußge, that performs a multitude of useful 3D printing–related tasks, including remote control of your printer, live video, and data collection.

The OctoPrint logo

Control and monitoring

To control the print process, use OctoPrint on a Raspberry Pi connected to your 3D printer. First, ensure a safe uninterrupted run by using the software to restrict who can access the printer. Then, before starting your print, use the web app to work on your STL file. The app also allows you to reposition the print head at any time, as well as pause or stop printing if needed.

Live video streaming

Since OctoPrint can stream video of your print as it happens, you can watch out for any faults that may require you to abort and restart. Proud of your print? Record the entire process from start to finish and upload the time-lapse video to your favourite social media platform.

OctoPrint software graphic user interface screenshot

Data capture

Octoprint records real-time data, such as the temperature, giving you another way to monitor your print to ensure a smooth, uninterrupted process. Moreover, the records will help with troubleshooting if there is a problem.

OctoPrint software graphic user interface screenshot

Print the Millenium Falcon

OK, you can print anything you like. However, this design definitely caught our eye this week.

3D-Printed Fillenium Malcon (Timelapse)

This is a Timelapse of my biggest print project so far on my own designed/built printer. It’s 500x170x700(mm) and weights 3 Kilograms of Filament.

You can support the work of Gina and OctoPrint by visiting her Patreon account and following OctoPrint on Twitter, Facebook, or G+. And if you’ve set up a Raspberry Pi to run OctoPrint, or you’ve created some cool Pi-inspired 3D prints, make sure to share them with us on our own social media channels.

The post Ultimate 3D printer control with OctoPrint appeared first on Raspberry Pi.

Using AWS CodeCommit Pull Requests to request code reviews and discuss code

Post Syndicated from Chris Barclay original https://aws.amazon.com/blogs/devops/using-aws-codecommit-pull-requests-to-request-code-reviews-and-discuss-code/

Thank you to Michael Edge, Senior Cloud Architect, for a great blog on CodeCommit pull requests.

~~~~~~~

AWS CodeCommit is a fully managed service for securely hosting private Git repositories. CodeCommit now supports pull requests, which allows repository users to review, comment upon, and interactively iterate on code changes. Used as a collaboration tool between team members, pull requests help you to review potential changes to a CodeCommit repository before merging those changes into the repository. Each pull request goes through a simple lifecycle, as follows:

  • The new features to be merged are added as one or more commits to a feature branch. The commits are not merged into the destination branch.
  • The pull request is created, usually from the difference between two branches.
  • Team members review and comment on the pull request. The pull request might be updated with additional commits that contain changes made in response to comments, or include changes made to the destination branch.
  • Once team members are happy with the pull request, it is merged into the destination branch. The commits are applied to the destination branch in the same order they were added to the pull request.

Commenting is an integral part of the pull request process, and is used to collaborate between the developers and the reviewer. Reviewers add comments and questions to a pull request during the review process, and developers respond to these with explanations. Pull request comments can be added to the overall pull request, a file within the pull request, or a line within a file.

To make the comments more useful, sign in to the AWS Management Console as an AWS Identity and Access Management (IAM) user. The username will then be associated with the comment, indicating the owner of the comment. Pull request comments are a great quality improvement tool as they allow the entire development team visibility into what reviewers are looking for in the code. They also serve as a record of the discussion between team members at a point in time, and shouldn’t be deleted.

AWS CodeCommit is also introducing the ability to add comments to a commit, another useful collaboration feature that allows team members to discuss code changed as part of a commit. This helps you discuss changes made in a repository, including why the changes were made, whether further changes are necessary, or whether changes should be merged. As is the case with pull request comments, you can comment on an overall commit, on a file within a commit, or on a specific line or change within a file, and other repository users can respond to your comments. Comments are not restricted to commits, they can also be used to comment on the differences between two branches, or between two tags. Commit comments are separate from pull request comments, i.e. you will not see commit comments when reviewing a pull request – you will only see pull request comments.

A pull request example

Let’s get started by running through an example. We’ll take a typical pull request scenario and look at how we’d use CodeCommit and the AWS Management Console for each of the steps.

To try out this scenario, you’ll need:

  • An AWS CodeCommit repository with some sample code in the master branch. We’ve provided sample code below.
  • Two AWS Identity and Access Management (IAM) users, both with the AWSCodeCommitPowerUser managed policy applied to them.
  • Git installed on your local computer, and access configured for AWS CodeCommit.
  • A clone of the AWS CodeCommit repository on your local computer.

In the course of this example, you’ll sign in to the AWS CodeCommit console as one IAM user to create the pull request, and as the other IAM user to review the pull request. To learn more about how to set up your IAM users and how to connect to AWS CodeCommit with Git, see the following topics:

  • Information on creating an IAM user with AWS Management Console access.
  • Instructions on how to access CodeCommit using Git.
  • If you’d like to use the same ‘hello world’ application as used in this article, here is the source code:
package com.amazon.helloworld;

public class Main {
	public static void main(String[] args) {

		System.out.println("Hello, world");
	}
}

The scenario below uses the us-east-2 region.

Creating the branches

Before we jump in and create a pull request, we’ll need at least two branches. In this example, we’ll follow a branching strategy similar to the one described in GitFlow. We’ll create a new branch for our feature from the main development branch (the default branch). We’ll develop the feature in the feature branch. Once we’ve written and tested the code for the new feature in that branch, we’ll create a pull request that contains the differences between the feature branch and the main development branch. Our team lead (the second IAM user) will review the changes in the pull request. Once the changes have been reviewed, the feature branch will be merged into the development branch.

Figure 1: Pull request link

Sign in to the AWS CodeCommit console with the IAM user you want to use as the developer. You can use an existing repository or you can go ahead and create a new one. We won’t be merging any changes to the master branch of your repository, so it’s safe to use an existing repository for this example. You’ll find the Pull requests link has been added just above the Commits link (see Figure 1), and below Commits you’ll find the Branches link. Click Branches and create a new branch called ‘develop’, branched from the ‘master’ branch. Then create a new branch called ‘feature1’, branched from the ‘develop’ branch. You’ll end up with three branches, as you can see in Figure 2. (Your repository might contain other branches in addition to the three shown in the figure).

Figure 2: Create a feature branch

If you haven’t cloned your repo yet, go to the Code link in the CodeCommit console and click the Connect button. Follow the instructions to clone your repo (detailed instructions are here). Open a terminal or command line and paste the git clone command supplied in the Connect instructions for your repository. The example below shows cloning a repository named codecommit-demo:

git clone https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo

If you’ve previously cloned the repo you’ll need to update your local repo with the branches you created. Open a terminal or command line and make sure you’re in the root directory of your repo, then run the following command:

git remote update origin

You’ll see your new branches pulled down to your local repository.

$ git remote update origin
Fetching origin
From https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo
 * [new branch]      develop    -> origin/develop
 * [new branch]      feature1   -> origin/feature1

You can also see your new branches by typing:

git branch --all

$ git branch --all
* master
  remotes/origin/develop
  remotes/origin/feature1
  remotes/origin/master

Now we’ll make a change to the ‘feature1’ branch. Open a terminal or command line and check out the feature1 branch by running the following command:

git checkout feature1

$ git checkout feature1
Branch feature1 set up to track remote branch feature1 from origin.
Switched to a new branch 'feature1'

Make code changes

Edit a file in the repo using your favorite editor and save the changes. Commit your changes to the local repository, and push your changes to CodeCommit. For example:

git commit -am 'added new feature'
git push origin feature1

$ git commit -am 'added new feature'
[feature1 8f6cb28] added new feature
1 file changed, 1 insertion(+), 1 deletion(-)

$ git push origin feature1
Counting objects: 9, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (9/9), 617 bytes | 617.00 KiB/s, done.
Total 9 (delta 2), reused 0 (delta 0)
To https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo
   2774a53..8f6cb28  feature1 -> feature1

Creating the pull request

Now we have a ‘feature1’ branch that differs from the ‘develop’ branch. At this point we want to merge our changes into the ‘develop’ branch. We’ll create a pull request to notify our team members to review our changes and check whether they are ready for a merge.

In the AWS CodeCommit console, click Pull requests. Click Create pull request. On the next page select ‘develop’ as the destination branch and ‘feature1’ as the source branch. Click Compare. CodeCommit will check for merge conflicts and highlight whether the branches can be automatically merged using the fast-forward option, or whether a manual merge is necessary. A pull request can be created in both situations.

Figure 3: Create a pull request

After comparing the two branches, the CodeCommit console displays the information you’ll need in order to create the pull request. In the ‘Details’ section, the ‘Title’ for the pull request is mandatory, and you may optionally provide comments to your reviewers to explain the code change you have made and what you’d like them to review. In the ‘Notifications’ section, there is an option to set up notifications to notify subscribers of changes to your pull request. Notifications will be sent on creation of the pull request as well as for any pull request updates or comments. And finally, you can review the changes that make up this pull request. This includes both the individual commits (a pull request can contain one or more commits, available in the Commits tab) as well as the changes made to each file, i.e. the diff between the two branches referenced by the pull request, available in the Changes tab. After you have reviewed this information and added a title for your pull request, click the Create button. You will see a confirmation screen, as shown in Figure 4, indicating that your pull request has been successfully created, and can be merged without conflicts into the ‘develop’ branch.

Figure 4: Pull request confirmation page

Reviewing the pull request

Now let’s view the pull request from the perspective of the team lead. If you set up notifications for this CodeCommit repository, creating the pull request would have sent an email notification to the team lead, and he/she can use the links in the email to navigate directly to the pull request. In this example, sign in to the AWS CodeCommit console as the IAM user you’re using as the team lead, and click Pull requests. You will see the same information you did during creation of the pull request, plus a record of activity related to the pull request, as you can see in Figure 5.

Figure 5: Team lead reviewing the pull request

Commenting on the pull request

You now perform a thorough review of the changes and make a number of comments using the new pull request comment feature. To gain an overall perspective on the pull request, you might first go to the Commits tab and review how many commits are included in this pull request. Next, you might visit the Changes tab to review the changes, which displays the differences between the feature branch code and the develop branch code. At this point, you can add comments to the pull request as you work through each of the changes. Let’s go ahead and review the pull request. During the review, you can add review comments at three levels:

  • The overall pull request
  • A file within the pull request
  • An individual line within a file

The overall pull request
In the Changes tab near the bottom of the page you’ll see a ‘Comments on changes’ box. We’ll add comments here related to the overall pull request. Add your comments as shown in Figure 6 and click the Save button.

Figure 6: Pull request comment

A specific file in the pull request
Hovering your mouse over a filename in the Changes tab will cause a blue ‘comments’ icon to appear to the left of the filename. Clicking the icon will allow you to enter comments specific to this file, as in the example in Figure 7. Go ahead and add comments for one of the files changed by the developer. Click the Save button to save your comment.

Figure 7: File comment

A specific line in a file in the pull request
A blue ‘comments’ icon will appear as you hover over individual lines within each file in the pull request, allowing you to create comments against lines that have been added, removed or are unchanged. In Figure 8, you add comments against a line that has been added to the source code, encouraging the developer to review the naming standards. Go ahead and add line comments for one of the files changed by the developer. Click the Save button to save your comment.

Figure 8: Line comment

A pull request that has been commented at all three levels will look similar to Figure 9. The pull request comment is shown expanded in the ‘Comments on changes’ section, while the comments at file and line level are shown collapsed. A ‘comment’ icon indicates that comments exist at file and line level. Clicking the icon will expand and show the comment. Since you are expecting the developer to make further changes based on your comments, you won’t merge the pull request at this stage, but will leave it open awaiting feedback. Each comment you made results in a notification being sent to the developer, who can respond to the comments. This is great for remote working, where developers and team lead may be in different time zones.

Figure 9: Fully commented pull request

Adding a little complexity

A typical development team is going to be creating pull requests on a regular basis. It’s highly likely that the team lead will merge other pull requests into the ‘develop’ branch while pull requests on feature branches are in the review stage. This may result in a change to the ‘Mergable’ status of a pull request. Let’s add this scenario into the mix and check out how a developer will handle this.

To test this scenario, we could create a new pull request and ask the team lead to merge this to the ‘develop’ branch. But for the sake of simplicity we’ll take a shortcut. Clone your CodeCommit repo to a new folder, switch to the ‘develop’ branch, and make a change to one of the same files that were changed in your pull request. Make sure you change a line of code that was also changed in the pull request. Commit and push this back to CodeCommit. Since you’ve just changed a line of code in the ‘develop’ branch that has also been changed in the ‘feature1’ branch, the ‘feature1’ branch cannot be cleanly merged into the ‘develop’ branch. Your developer will need to resolve this merge conflict.

A developer reviewing the pull request would see the pull request now looks similar to Figure 10, with a ‘Resolve conflicts’ status rather than the ‘Mergable’ status it had previously (see Figure 5).

Figure 10: Pull request with merge conflicts

Reviewing the review comments

Once the team lead has completed his review, the developer will review the comments and make the suggested changes. As a developer, you’ll see the list of review comments made by the team lead in the pull request Activity tab, as shown in Figure 11. The Activity tab shows the history of the pull request, including commits and comments. You can reply to the review comments directly from the Activity tab, by clicking the Reply button, or you can do this from the Changes tab. The Changes tab shows the comments for the latest commit, as comments on previous commits may be associated with lines that have changed or been removed in the current commit. Comments for previous commits are available to view and reply to in the Activity tab.

In the Activity tab, use the shortcut link (which looks like this </>) to move quickly to the source code associated with the comment. In this example, you will make further changes to the source code to address the pull request review comments, so let’s go ahead and do this now. But first, you will need to resolve the ‘Resolve conflicts’ status.

Figure 11: Pull request activity

Resolving the ‘Resolve conflicts’ status

The ‘Resolve conflicts’ status indicates there is a merge conflict between the ‘develop’ branch and the ‘feature1’ branch. This will require manual intervention to restore the pull request back to the ‘Mergable’ state. We will resolve this conflict next.

Open a terminal or command line and check out the develop branch by running the following command:

git checkout develop

$ git checkout develop
Switched to branch 'develop'
Your branch is up-to-date with 'origin/develop'.

To incorporate the changes the team lead made to the ‘develop’ branch, merge the remote ‘develop’ branch with your local copy:

git pull

$ git pull
remote: Counting objects: 9, done.
Unpacking objects: 100% (9/9), done.
From https://git-codecommit.us-east-2.amazonaws.com/v1/repos/codecommit-demo
   af13c82..7b36f52  develop    -> origin/develop
Updating af13c82..7b36f52
Fast-forward
 src/main/java/com/amazon/helloworld/Main.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Then checkout the ‘feature1’ branch:

git checkout feature1

$ git checkout feature1
Switched to branch 'feature1'
Your branch is up-to-date with 'origin/feature1'.

Now merge the changes from the ‘develop’ branch into your ‘feature1’ branch:

git merge develop

$ git merge develop
Auto-merging src/main/java/com/amazon/helloworld/Main.java
CONFLICT (content): Merge conflict in src/main/java/com/amazon/helloworld/Main.java
Automatic merge failed; fix conflicts and then commit the result.

Yes, this fails. The file Main.java has been changed in both branches, resulting in a merge conflict that can’t be resolved automatically. However, Main.java will now contain markers that indicate where the conflicting code is, and you can use these to resolve the issues manually. Edit Main.java using your favorite IDE, and you’ll see it looks something like this:

package com.amazon.helloworld;

import java.util.*;

/**
 * This class prints a hello world message
 */

public class Main {
   public static void main(String[] args) {

<<<<<<< HEAD
        Date todaysdate = Calendar.getInstance().getTime();

        System.out.println("Hello, earthling. Today's date is: " + todaysdate);
=======
      System.out.println("Hello, earth");
>>>>>>> develop
   }
}

The code between HEAD and ‘===’ is the code the developer added in the ‘feature1’ branch (HEAD represents ‘feature1’ because this is the current checked out branch). The code between ‘===’ and ‘>>> develop’ is the code added to the ‘develop’ branch by the team lead. We’ll resolve the conflict by manually merging both changes, resulting in an updated Main.java:

package com.amazon.helloworld;

import java.util.*;

/**
 * This class prints a hello world message
 */

public class Main {
   public static void main(String[] args) {

        Date todaysdate = Calendar.getInstance().getTime();

        System.out.println("Hello, earth. Today's date is: " + todaysdate);
   }
}

After saving the change you can add and commit it to your local repo:

git add src/
git commit -m 'fixed merge conflict by merging changes'

Fixing issues raised by the reviewer

Now you are ready to address the comments made by the team lead. If you are no longer pointing to the ‘feature1’ branch, check out the ‘feature1’ branch by running the following command:

git checkout feature1

$ git checkout feature1
Branch feature1 set up to track remote branch feature1 from origin.
Switched to a new branch 'feature1'

Edit the source code in your favorite IDE and make the changes to address the comments. In this example, the developer has updated the source code as follows:

package com.amazon.helloworld;

import java.util.*;

/**
 *  This class prints a hello world message
 *
 * @author Michael Edge
 * @see HelloEarth
 * @version 1.0
 */

public class Main {
   public static void main(String[] args) {

        Date todaysDate = Calendar.getInstance().getTime();

        System.out.println("Hello, earth. Today's date is: " + todaysDate);
   }
}

After saving the changes, commit and push to the CodeCommit ‘feature1’ branch as you did previously:

git commit -am 'updated based on review comments'
git push origin feature1

Responding to the reviewer

Now that you’ve fixed the code issues you will want to respond to the review comments. In the AWS CodeCommit console, check that your latest commit appears in the pull request Commits tab. You now have a pull request consisting of more than one commit. The pull request in Figure 12 has four commits, which originated from the following activities:

  • 8th Nov: the original commit used to initiate this pull request
  • 10th Nov, 3 hours ago: the commit by the team lead to the ‘develop’ branch, merged into our ‘feature1’ branch
  • 10th Nov, 24 minutes ago: the commit by the developer that resolved the merge conflict
  • 10th Nov, 4 minutes ago: the final commit by the developer addressing the review comments

Figure 12: Pull request with multiple commits

Let’s reply to the review comments provided by the team lead. In the Activity tab, reply to the pull request comment and save it, as shown in Figure 13.

Figure 13: Replying to a pull request comment

At this stage, your code has been committed and you’ve updated your pull request comments, so you are ready for a final review by the team lead.

Final review

The team lead reviews the code changes and comments made by the developer. As team lead, you own the ‘develop’ branch and it’s your decision on whether to merge the changes in the pull request into the ‘develop’ branch. You can close the pull request with or without merging using the Merge and Close buttons at the bottom of the pull request page (see Figure 13). Clicking Close will allow you to add comments on why you are closing the pull request without merging. Merging will perform a fast-forward merge, incorporating the commits referenced by the pull request. Let’s go ahead and click the Merge button to merge the pull request into the ‘develop’ branch.

Figure 14: Merging the pull request

After merging a pull request, development of that feature is complete and the feature branch is no longer needed. It’s common practice to delete the feature branch after merging. CodeCommit provides a check box during merge to automatically delete the associated feature branch, as seen in Figure 14. Clicking the Merge button will merge the pull request into the ‘develop’ branch, as shown in Figure 15. This will update the status of the pull request to ‘Merged’, and will close the pull request.

Conclusion

This blog has demonstrated how pull requests can be used to request a code review, and enable reviewers to get a comprehensive summary of what is changing, provide feedback to the author, and merge the code into production. For more information on pull requests, see the documentation.

New – Amazon EC2 Instances with Up to 8 NVIDIA Tesla V100 GPUs (P3)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-ec2-instances-with-up-to-8-nvidia-tesla-v100-gpus-p3/

Driven by customer demand and made possible by on-going advances in the state-of-the-art, we’ve come a long way since the original m1.small instance that we launched in 2006, with instances that are emphasize compute power, burstable performance, memory size, local storage, and accelerated computing.

The New P3
Today we are making the next generation of GPU-powered EC2 instances available in four AWS regions. Powered by up to eight NVIDIA Tesla V100 GPUs, the P3 instances are designed to handle compute-intensive machine learning, deep learning, computational fluid dynamics, computational finance, seismic analysis, molecular modeling, and genomics workloads.

P3 instances use customized Intel Xeon E5-2686v4 processors running at up to 2.7 GHz. They are available in three sizes (all VPC-only and EBS-only):

Model NVIDIA Tesla V100 GPUs GPU Memory NVIDIA NVLink vCPUs Main Memory Network Bandwidth EBS Bandwidth
p3.2xlarge 1 16 GiB n/a 8 61 GiB Up to 10 Gbps 1.5 Gbps
p3.8xlarge 4 64 GiB 200 GBps 32 244 GiB 10 Gbps 7 Gbps
p3.16xlarge 8 128 GiB 300 GBps 64 488 GiB 25 Gbps 14 Gbps

Each of the NVIDIA GPUs is packed with 5,120 CUDA cores and another 640 Tensor cores and can deliver up to 125 TFLOPS of mixed-precision floating point, 15.7 TFLOPS of single-precision floating point, and 7.8 TFLOPS of double-precision floating point. On the two larger sizes, the GPUs are connected together via NVIDIA NVLink 2.0 running at a total data rate of up to 300 GBps. This allows the GPUs to exchange intermediate results and other data at high speed, without having to move it through the CPU or the PCI-Express fabric.

What’s a Tensor Core?
I had not heard the term Tensor core before starting to write this post. According to this very helpful post on the NVIDIA Blog, Tensor cores are designed to speed up the training and inference of large, deep neural networks. Each core is able to quickly and efficiently multiply a pair of 4×4 half-precision (also known as FP16) matrices together, add the resulting 4×4 matrix to another half or single-precision (FP32) matrix, and store the resulting 4×4 matrix in either half or single-precision form. Here’s a diagram from NVIDIA’s blog post:

This operation is in the innermost loop of the training process for a deep neural network, and is an excellent example of how today’s NVIDIA GPU hardware is purpose-built to address a very specific market need. By the way, the mixed-precision qualifier on the Tensor core performance means that it is flexible enough to work with with a combination of 16-bit and 32-bit floating point values.

Performance in Perspective
I always like to put raw performance numbers into a real-world perspective so that they are easier to relate to and (hopefully) more meaningful. This turned out to be surprisingly difficult, given that the eight NVIDIA Tesla V100 GPUs on a single p3.16xlarge can do 125 trillion single-precision floating point multiplications per second.

Let’s go back to the dawn of the microprocessor era, and consider the Intel 8080A chip that powered the MITS Altair that I bought in the summer of 1977. With a 2 MHz clock, it was able to do about 832 multiplications per second (I used this data and corrected it for the faster clock speed). The p3.16xlarge is roughly 150 billion times faster. However, just 1.2 billion seconds have gone by since that summer. In other words, I can do 100x more calculations today in one second than my Altair could have done in the last 40 years!

What about the innovative 8087 math coprocessor that was an optional accessory for the IBM PC that was announced in the summer of 1981? With a 5 MHz clock and purpose-built hardware, it was able to do about 52,632 multiplications per second. 1.14 billion seconds have elapsed since then, p3.16xlarge is 2.37 billion times faster, so the poor little PC would be barely halfway through a calculation that would run for 1 second today.

Ok, how about a Cray-1? First delivered in 1976, this supercomputer was able to perform vector operations at 160 MFLOPS, making the p3.x16xlarge 781,000 times faster. It could have iterated on some interesting problem 1500 times over the years since it was introduced.

Comparisons between the P3 and today’s scale-out supercomputers are harder to make, given that you can think of the P3 as a step-and-repeat component of a supercomputer that you can launch on as as-needed basis.

Run One Today
In order to take full advantage of the NVIDIA Tesla V100 GPUs and the Tensor cores, you will need to use CUDA 9 and cuDNN7. These drivers and libraries have already been added to the newest versions of the Windows AMIs and will be included in an updated Amazon Linux AMI that is scheduled for release on November 7th. New packages are already available in our repos if you want to to install them on your existing Amazon Linux AMI.

The newest AWS Deep Learning AMIs come preinstalled with the latest releases of Apache MxNet, Caffe2, and Tensorflow (each with support for the NVIDIA Tesla V100 GPUs), and will be updated to support P3 instances with other machine learning frameworks such as Microsoft Cognitive Toolkit and PyTorch as soon as these frameworks release support for the NVIDIA Tesla V100 GPUs. You can also use the NVIDIA Volta Deep Learning AMI for NGC.

P3 instances are available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) Regions in On-Demand, Spot, Reserved Instance, and Dedicated Host form.

Jeff;

 

Coaxing 2D platforming out of Unity

Post Syndicated from Eevee original https://eev.ee/blog/2017/10/13/coaxing-2d-platforming-out-of-unity/

An anonymous donor asked a question that I can’t even begin to figure out how to answer, but they also said anything else is fine, so here’s anything else.

I’ve been avoiding writing about game physics, since I want to save it for ✨ the book I’m writing ✨, but that book will almost certainly not touch on Unity. Here, then, is a brief run through some of the brick walls I ran into while trying to convince Unity to do 2D platforming.

This is fairly high-level — there are no blocks of code or helpful diagrams. I’m just getting this out of my head because it’s interesting. If you want more gritty details, I guess you’ll have to wait for ✨ the book ✨.

The setup

I hadn’t used Unity before. I hadn’t even used a “real” physics engine before. My games so far have mostly used LÖVE, a Lua-based engine. LÖVE includes box2d bindings, but for various reasons (not all of them good), I opted to avoid them and instead write my own physics completely from scratch. (How, you ask? ✨ Book ✨!)

I was invited to work on a Unity project, Chaos Composer, that someone else had already started. It had basic movement already implemented; I taught myself Unity’s physics system by hacking on it. It’s entirely possible that none of this is actually the best way to do anything, since I was really trying to reproduce my own homegrown stuff in Unity, but it’s the best I’ve managed to come up with.

Two recurring snags were that you can’t ask Unity to do multiple physics updates in a row, and sometimes getting the information I wanted was difficult. Working with my own code spoiled me a little, since I could invoke it at any time and ask it anything I wanted; Unity, on the other hand, is someone else’s black box with a rigid interface on top.

Also, wow, Googling for a lot of this was not quite as helpful as expected. A lot of what’s out there is just the first thing that works, and often that’s pretty hacky and imposes severe limits on the game design (e.g., “this won’t work with slopes”). Basic movement and collision are the first thing you do, which seems to me like the worst time to be locking yourself out of a lot of design options. I tried very (very, very, very) hard to minimize those kinds of constraints.

Problem 1: Movement

When I showed up, movement was already working. Problem solved!

Like any good programmer, I immediately set out to un-solve it. Given a “real” physics engine like Unity prominently features, you have two options: ⓐ treat the player as a physics object, or ⓑ don’t. The existing code went with option ⓑ, like I’d done myself with LÖVE, and like I’d seen countless people advise. Using a physics sim makes for bad platforming.

But… why? I believed it, but I couldn’t concretely defend it. I had to know for myself. So I started a blank project, drew some physics boxes, and wrote a dozen-line player controller.

Ah! Immediate enlightenment.

If the player was sliding down a wall, and I tried to move them into the wall, they would simply freeze in midair until I let go of the movement key. The trouble is that the physics sim works in terms of forces — moving the player involves giving them a nudge in some direction, like a giant invisible hand pushing them around the level. Surprise! If you press a real object against a real wall with your real hand, you’ll see the same effect — friction will cancel out gravity, and the object will stay in midair..

Platformer movement, as it turns out, doesn’t make any goddamn physical sense. What is air control? What are you pushing against? Nothing, really; we just have it because it’s nice to play with, because not having it is a nightmare.

I looked to see if there were any common solutions to this, and I only really found one: make all your walls frictionless.

Game development is full of hacks like this, and I… don’t like them. I can accept that minor hacks are necessary sometimes, but this one makes an early and widespread change to a fundamental system to “fix” something that was wrong in the first place. It also imposes an “invisible” requirement, something I try to avoid at all costs — if you forget to make a particular wall frictionless, you’ll never know unless you happen to try sliding down it.

And so, I swiftly returned to the existing code. It wasn’t too different from what I’d come up with for LÖVE: it applied gravity by hand, tracked the player’s velocity, computed the intended movement each frame, and moved by that amount. The interesting thing was that it used MovePosition, which schedules a movement for the next physics update and stops the movement if the player hits something solid.

It’s kind of a nice hybrid approach, actually; all the “physics” for conscious actors is done by hand, but the physics engine is still used for collision detection. It’s also used for collision rejection — if the player manages to wedge themselves several pixels into a solid object, for example, the physics engine will try to gently nudge them back out of it with no extra effort required on my part. I still haven’t figured out how to get that to work with my homegrown stuff, which is built to prevent overlap rather than to jiggle things out of it.

But wait, what about…

Our player is a dynamic body with rotation lock and no gravity. Why not just use a kinematic body?

I must be missing something, because I do not understand the point of kinematic bodies. I ran into this with Godot, too, which documented them the same way: as intended for use as players and other manually-moved objects. But by default, they don’t even collide with other kinematic bodies or static geometry. What? There’s a checkbox to turn this on, which I enabled, but then I found out that MovePosition doesn’t stop kinematic bodies when they hit something, so I would’ve had to cast along the intended path of movement to figure out when to stop, thus duplicating the same work the physics engine was about to do.

But that’s impossible anyway! Static geometry generally wants to be made of edge colliders, right? They don’t care about concave/convex. Imagine the player is standing on the ground near a wall and tries to move towards the wall. Both the ground and the wall are different edges from the same edge collider.

If you try to cast the player’s hitbox horizontally, parallel to the ground, you’ll only get one collision: the existing collision with the ground. Casting doesn’t distinguish between touching and hitting. And because Unity only reports one collision per collider, and because the ground will always show up first, you will never find out about the impending wall collision.

So you’re forced to either use raycasts for collision detection or decomposed polygons for world geometry, both of which are slightly worse tools for no real gain.

I ended up sticking with a dynamic body.


Oh, one other thing that doesn’t really fit anywhere else: keep track of units! If you’re adding something called “velocity” directly to something called “position”, something has gone very wrong. Acceleration is distance per time squared; velocity is distance per time; position is distance. You must multiply or divide by time to convert between them.

I never even, say, add a constant directly to position every frame; I always phrase it as velocity and multiply by Δt. It keeps the units consistent: time is always in seconds, not in tics.

Problem 2: Slopes

Ah, now we start to get off in the weeds.

A sort of pre-problem here was detecting whether we’re on a slope, which means detecting the ground. The codebase originally used a manual physics query of the area around the player’s feet to check for the ground, which seems to be somewhat common, but that can’t tell me the angle of the detected ground. (It’s also kind of error-prone, since “around the player’s feet” has to be specified by hand and may not stay correct through animations or changes in the hitbox.)

I replaced that with what I’d eventually settled on in LÖVE: detect the ground by detecting collisions, and looking at the normal of the collision. A normal is a vector that points straight out from a surface, so if you’re standing on the ground, the normal points straight up; if you’re on a 10° incline, the normal points 10° away from straight up.

Not all collisions are with the ground, of course, so I assumed something is ground if the normal pointed away from gravity. (I like this definition more than “points upwards”, because it avoids assuming anything about the direction of gravity, which leaves some interesting doors open for later on.) That’s easily detected by taking the dot product — if it’s negative, the collision was with the ground, and I now have the normal of the ground.

Actually doing this in practice was slightly tricky. With my LÖVE engine, I could cram this right into the middle of collision resolution. With Unity, not quite so much. I went through a couple iterations before I really grasped Unity’s execution order, which I guess I will have to briefly recap for this to make sense.

Unity essentially has two update cycles. It performs physics updates at fixed intervals for consistency, and updates everything else just before rendering. Within a single frame, Unity does as many fixed physics updates as it has spare time for (which might be zero, one, or more), then does a regular update, then renders. User code can implement either or both of Update, which runs during a regular update, and FixedUpdate, which runs just before Unity does a physics pass.

So my solution was:

  • At the very end of FixedUpdate, clear the actor’s “on ground” flag and ground normal.

  • During OnCollisionEnter2D and OnCollisionStay2D (which are called from within a physics pass), if there’s a collision that looks like it’s with the ground, set the “on ground” flag and ground normal. (If there are multiple ground collisions, well, good luck figuring out the best way to resolve that! At the moment I’m just taking the first and hoping for the best.)

That means there’s a brief window between the end of FixedUpdate and Unity’s physics pass during which a grounded actor might mistakenly believe it’s not on the ground, which is a bit of a shame, but there are very few good reasons for anything to be happening in that window.

Okay! Now we can do slopes.

Just kidding! First we have to do sliding.

When I first looked at this code, it didn’t apply gravity while the player was on the ground. I think I may have had some problems with detecting the ground as result, since the player was no longer pushing down against it? Either way, it seemed like a silly special case, so I made gravity always apply.

Lo! I was a fool. The player could no longer move.

Why? Because MovePosition does exactly what it promises. If the player collides with something, they’ll stop moving. Applying gravity means that the player is trying to move diagonally downwards into the ground, and so MovePosition stops them immediately.

Hence, sliding. I don’t want the player to actually try to move into the ground. I want them to move the unblocked part of that movement. For flat ground, that means the horizontal part, which is pretty much the same as discarding gravity. For sloped ground, it’s a bit more complicated!

Okay but actually it’s less complicated than you’d think. It can be done with some cross products fairly easily, but Unity makes it even easier with a couple casts. There’s a Vector3.ProjectOnPlane function that projects an arbitrary vector on a plane given by its normal — exactly the thing I want! So I apply that to the attempted movement before passing it along to MovePosition. I do the same thing with the current velocity, to prevent the player from accelerating infinitely downwards while standing on flat ground.

One other thing: I don’t actually use the detected ground normal for this. The player might be touching two ground surfaces at the same time, and I’d want to project on both of them. Instead, I use the player body’s GetContacts method, which returns contact points (and normals!) for everything the player is currently touching. I believe those contact points are tracked by the physics engine anyway, so asking for them doesn’t require any actual physics work.

(Looking at the code I have, I notice that I still only perform the slide for surfaces facing upwards — but I’d want to slide against sloped ceilings, too. Why did I do this? Maybe I should remove that.)

(Also, I’m pretty sure projecting a vector on a plane is non-commutative, which raises the question of which order the projections should happen in and what difference it makes. I don’t have a good answer.)

(I note that my LÖVE setup does something slightly different: it just tries whatever the movement ought to be, and if there’s a collision, then it projects — and tries again with the remaining movement. But I can’t ask Unity to do multiple moves in one physics update, alas.)

Okay! Now, slopes. But actually, with the above work done, slopes are most of the way there already.

One obvious problem is that the player tries to move horizontally even when on a slope, and the easy fix is to change their movement from speed * Vector2.right to speed * new Vector2(ground.y, -ground.x) while on the ground. That’s the ground normal rotated a quarter-turn clockwise, so for flat ground it still points to the right, and in general it points rightwards along the ground. (Note that it assumes the ground normal is a unit vector, but as far as I’m aware, that’s true for all the normals Unity gives you.)

Another issue is that if the player stands motionless on a slope, gravity will cause them to slowly slide down it — because the movement from gravity will be projected onto the slope, and unlike flat ground, the result is no longer zero. For conscious actors only, I counter this by adding the opposite factor to the player’s velocity as part of adding in their walking speed. This matches how the real world works, to some extent: when you’re standing on a hill, you’re exerting some small amount of effort just to stay in place.

(Note that slope resistance is not the same as friction. Okay, yes, in the real world, virtually all resistance to movement happens as a result of friction, but bracing yourself against the ground isn’t the same as being passively resisted.)

From here there are a lot of things you can do, depending on how you think slopes should be handled. You could make the player unable to walk up slopes that are too steep. You could make walking down a slope faster than walking up it. You could make jumping go along the ground normal, rather than straight up. You could raise the player’s max allowed speed while running downhill. Whatever you want, really. Armed with a normal and awareness of dot products, you can do whatever you want.

But first you might want to fix a few aggravating side effects.

Problem 3: Ground adherence

I don’t know if there’s a better name for this. I rarely even see anyone talk about it, which surprises me; it seems like it should be a very common problem.

The problem is: if the player runs up a slope which then abruptly changes to flat ground, their momentum will carry them into the air. For very fast players going off the top of very steep slopes, this makes sense, but it becomes visible even for relatively gentle slopes. It was a mild nightmare in the original release of our game Lunar Depot 38, which has very “rough” ground made up of lots of shallow slopes — so the player is very frequently slightly off the ground, which meant they couldn’t jump, for seemingly no reason. (I even had code to fix this, but I disabled it because of a silly visual side effect that I never got around to fixing.)

Anyway! The reason this is a problem is that game protagonists are generally not boxes sliding around — they have legs. We don’t go flying off the top of real-world hilltops because we put our foot down until it touches the ground.

Simulating this footfall is surprisingly fiddly to get right, especially with someone else’s physics engine. It’s made somewhat easier by Cast, which casts the entire hitbox — no matter what shape it is — in a particular direction, as if it had moved, and tells you all the hypothetical collisions in order.

So I cast the player in the direction of gravity by some distance. If the cast hits something solid with a ground-like collision normal, then the player must be close to the ground, and I move them down to touch it (and set that ground as the new ground normal).

There are some wrinkles.

Wrinkle 1: I only want to do this if the player is off the ground now, but was on the ground last frame, and is not deliberately moving upwards. That latter condition means I want to skip this logic if the player jumps, for example, but also if the player is thrust upwards by a spring or abducted by a UFO or whatever. As long as external code goes through some interface and doesn’t mess with the player’s velocity directly, that shouldn’t be too hard to track.

Wrinkle 2: When does this logic run? It needs to happen after the player moves, which means after a Unity physics pass… but there’s no callback for that point in time. I ended up running it at the beginning of FixedUpdate and the beginning of Update — since I definitely want to do it before rendering happens! That means it’ll sometimes happen twice between physics updates. (I could carefully juggle a flag to skip the second run, but I… didn’t do that. Yet?)

Wrinkle 3: I can’t move the player with MovePosition! Remember, MovePosition schedules a movement, it doesn’t actually perform one; that means if it’s called twice before the physics pass, the first call is effectively ignored. I can’t easily combine the drop with the player’s regular movement, for various fiddly reasons. I ended up doing it “by hand” using transform.Translate, which I think was the “old way” to do manual movement before MovePosition existed. I’m not totally sure if it activates triggers? For that matter, I’m not sure it even notices collisions — but since I did a full-body Cast, there shouldn’t be any anyway.

Wrinkle 4: What, exactly, is “some distance”? I’ve yet to find a satisfying answer for this. It seems like it ought to be based on the player’s current speed and the slope of the ground they’re moving along, but every time I’ve done that math, I’ve gotten totally ludicrous answers that sometimes exceed the size of a tile. But maybe that’s not wrong? Play around, I guess, and think about when the effect should “break” and the player should go flying off the top of a hill.

Wrinkle 5: It’s possible that the player will launch off a slope, hit something, and then be adhered to the ground where they wouldn’t have hit it. I don’t much like this edge case, but I don’t see a way around it either.

This problem is surprisingly awkward for how simple it sounds, and the solution isn’t entirely satisfying. Oh, well; the results are much nicer than the solution. As an added bonus, this also fixes occasional problems with running down a hill and becoming detached from the ground due to precision issues or whathaveyou.

Problem 4: One-way platforms

Ah, what a nightmare.

It took me ages just to figure out how to define one-way platforms. Only block when the player is moving downwards? Nope. Only block when the player is above the platform? Nuh-uh.

Well, okay, yes, those approaches might work for convex players and flat platforms. But what about… sloped, one-way platforms? There’s no reason you shouldn’t be able to have those. If Super Mario World can do it, surely Unity can do it almost 30 years later.

The trick is, again, to look at the collision normal. If it faces away from gravity, the player is hitting a ground-like surface, so the platform should block them. Otherwise (or if the player overlaps the platform), it shouldn’t.

Here’s the catch: Unity doesn’t have conditional collision. I can’t decide, on the fly, whether a collision should block or not. In fact, I think that by the time I get a callback like OnCollisionEnter2D, the physics pass is already over.

I could go the other way and use triggers (which are non-blocking), but then I have the opposite problem: I can’t stop the player on the fly. I could move them back to where they hit the trigger, but I envision all kinds of problems as a result. What if they were moving fast enough to activate something on the other side of the platform? What if something else moved to where I’m trying to shove them back to in the meantime? How does this interact with ground detection and listing contacts, which would rightly ignore a trigger as non-blocking?

I beat my head against this for a while, but the inability to respond to collision conditionally was a huge roadblock. It’s all the more infuriating a problem, because Unity ships with a one-way platform modifier thing. Unfortunately, it seems to have been implemented by someone who has never played a platformer. It’s literally one-way — the player is only allowed to move straight upwards through it, not in from the sides. It also tries to block the player if they’re moving downwards while inside the platform, which invokes clumsy rejection behavior. And this all seems to be built into the physics engine itself somehow, so I can’t simply copy whatever they did.

Eventually, I settled on the following. After calculating attempted movement (including sliding), just at the end of FixedUpdate, I do a Cast along the movement vector. I’m not thrilled about having to duplicate the physics engine’s own work, but I do filter to only things on a “one-way platform” physics layer, which should at least help. For each object the cast hits, I use Physics2D.IgnoreCollision to either ignore or un-ignore the collision between the player and the platform, depending on whether the collision was ground-like or not.

(A lot of people suggested turning off collision between layers, but that can’t possibly work — the player might be standing on one platform while inside another, and anyway, this should work for all actors!)

Again, wrinkles! But fewer this time. Actually, maybe just one: handling the case where the player already overlaps the platform. I can’t just check for that with e.g. OverlapCollider, because that doesn’t distinguish between overlapping and merely touching.

I came up with a fairly simple fix: if I was going to un-ignore the collision (i.e. make the platform block), and the cast distance is reported as zero (either already touching or overlapping), I simply do nothing instead. If I’m standing on the platform, I must have already set it blocking when I was approaching it from the top anyway; if I’m overlapping it, I must have already set it non-blocking to get here in the first place.

I can imagine a few cases where this might go wrong. Moving platforms, especially, are going to cause some interesting issues. But this is the best I can do with what I know, and it seems to work well enough so far.

Oh, and our player can deliberately drop down through platforms, which was easy enough to implement; I just decide the platform is always passable while some button is held down.

Problem 5: Pushers and carriers

I haven’t gotten to this yet! Oh boy, can’t wait. I implemented it in LÖVE, but my way was hilariously invasive; I’m hoping that having a physics engine that supports a handwaved “this pushes that” will help. Of course, you also have to worry about sticking to platforms, for which the recommended solution is apparently to parent the cargo to the platform, which sounds goofy to me? I guess I’ll find out when I throw myself at it later.

Overall result

I ended up with a fairly pleasant-feeling system that supports slopes and one-way platforms and whatnot, with all the same pieces as I came up with for LÖVE. The code somehow ended up as less of a mess, too, but it probably helps that I’ve been down this rabbit hole once before and kinda knew what I was aiming for this time.

Animation of a character running smoothly along the top of an irregular dinosaur skeleton

Sorry that I don’t have a big block of code for you to copy-paste into your project. I don’t think there are nearly enough narrative discussions of these fundamentals, though, so hopefully this is useful to someone. If not, well, look forward to ✨ my book, that I am writing ✨!

Predict Billboard Top 10 Hits Using RStudio, H2O and Amazon Athena

Post Syndicated from Gopal Wunnava original https://aws.amazon.com/blogs/big-data/predict-billboard-top-10-hits-using-rstudio-h2o-and-amazon-athena/

Success in the popular music industry is typically measured in terms of the number of Top 10 hits artists have to their credit. The music industry is a highly competitive multi-billion dollar business, and record labels incur various costs in exchange for a percentage of the profits from sales and concert tickets.

Predicting the success of an artist’s release in the popular music industry can be difficult. One release may be extremely popular, resulting in widespread play on TV, radio and social media, while another single may turn out quite unpopular, and therefore unprofitable. Record labels need to be selective in their decision making, and predictive analytics can help them with decision making around the type of songs and artists they need to promote.

In this walkthrough, you leverage H2O.ai, Amazon Athena, and RStudio to make predictions on whether a song might make it to the Top 10 Billboard charts. You explore the GLM, GBM, and deep learning modeling techniques using H2O’s rapid, distributed and easy-to-use open source parallel processing engine. RStudio is a popular IDE, licensed either commercially or under AGPLv3, for working with R. This is ideal if you don’t want to connect to a server via SSH and use code editors such as vi to do analytics. RStudio is available in a desktop version, or a server version that allows you to access R via a web browser. RStudio’s Notebooks feature is used to demonstrate the execution of code and output. In addition, this post showcases how you can leverage Athena for query and interactive analysis during the modeling phase. A working knowledge of statistics and machine learning would be helpful to interpret the analysis being performed in this post.

Walkthrough

Your goal is to predict whether a song will make it to the Top 10 Billboard charts. For this purpose, you will be using multiple modeling techniques―namely GLM, GBM and deep learning―and choose the model that is the best fit.

This solution involves the following steps:

  • Install and configure RStudio with Athena
  • Log in to RStudio
  • Install R packages
  • Connect to Athena
  • Create a dataset
  • Create models

Install and configure RStudio with Athena

Use the following AWS CloudFormation stack to install, configure, and connect RStudio on an Amazon EC2 instance with Athena.

Launching this stack creates all required resources and prerequisites:

  • Amazon EC2 instance with Amazon Linux (minimum size of t2.large is recommended)
  • Provisioning of the EC2 instance in an existing VPC and public subnet
  • Installation of Java 8
  • Assignment of an IAM role to the EC2 instance with the required permissions for accessing Athena and Amazon S3
  • Security group allowing access to the RStudio and SSH ports from the internet (I recommend restricting access to these ports)
  • S3 staging bucket required for Athena (referenced within RStudio as ATHENABUCKET)
  • RStudio username and password
  • Setup logs in Amazon CloudWatch Logs (if needed for additional troubleshooting)
  • Amazon EC2 Systems Manager agent, which makes it easy to manage and patch

All AWS resources are created in the US-East-1 Region. To avoid cross-region data transfer fees, launch the CloudFormation stack in the same region. To check the availability of Athena in other regions, see Region Table.

Log in to RStudio

The instance security group has been automatically configured to allow incoming connections on the RStudio port 8787 from any source internet address. You can edit the security group to restrict source IP access. If you have trouble connecting, ensure that port 8787 isn’t blocked by subnet network ACLS or by your outgoing proxy/firewall.

  1. In the CloudFormation stack, choose Outputs, Value, and then open the RStudio URL. You might need to wait for a few minutes until the instance has been launched.
  2. Log in to RStudio with the and password you provided during setup.

Install R packages

Next, install the required R packages from the RStudio console. You can download the R notebook file containing just the code.

#install pacman – a handy package manager for managing installs
if("pacman" %in% rownames(installed.packages()) == FALSE)
{install.packages("pacman")}  
library(pacman)
p_load(h2o,rJava,RJDBC,awsjavasdk)
h2o.init(nthreads = -1)
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         2 hours 42 minutes 
##     H2O cluster version:        3.10.4.6 
##     H2O cluster version age:    4 months and 4 days !!! 
##     H2O cluster name:           H2O_started_from_R_rstudio_hjx881 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   3.30 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 3.3.3 (2017-03-06)
## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is too old (4 months and 4 days)!
## Please download and install the latest version from http://h2o.ai/download/
#install aws sdk if not present (pre-requisite for using Athena with an IAM role)
if (!aws_sdk_present()) {
  install_aws_sdk()
}

load_sdk()
## NULL

Connect to Athena

Next, establish a connection to Athena from RStudio, using an IAM role associated with your EC2 instance. Use ATHENABUCKET to specify the S3 staging directory.

URL <- 'https://s3.amazonaws.com/athena-downloads/drivers/AthenaJDBC41-1.0.1.jar'
fil <- basename(URL)
#download the file into current working directory
if (!file.exists(fil)) download.file(URL, fil)
#verify that the file has been downloaded successfully
list.files()
## [1] "AthenaJDBC41-1.0.1.jar"
drv <- JDBC(driverClass="com.amazonaws.athena.jdbc.AthenaDriver", fil, identifier.quote="'")

con <- jdbcConnection <- dbConnect(drv, 'jdbc:awsathena://athena.us-east-1.amazonaws.com:443/',
                                   s3_staging_dir=Sys.getenv("ATHENABUCKET"),
                                   aws_credentials_provider_class="com.amazonaws.auth.DefaultAWSCredentialsProviderChain")

Verify the connection. The results returned depend on your specific Athena setup.

con
## <JDBCConnection>
dbListTables(con)
##  [1] "gdelt"               "wikistats"           "elb_logs_raw_native"
##  [4] "twitter"             "twitter2"            "usermovieratings"   
##  [7] "eventcodes"          "events"              "billboard"          
## [10] "billboardtop10"      "elb_logs"            "gdelthist"          
## [13] "gdeltmaster"         "twitter"             "twitter3"

Create a dataset

For this analysis, you use a sample dataset combining information from Billboard and Wikipedia with Echo Nest data in the Million Songs Dataset. Upload this dataset into your own S3 bucket. The table below provides a description of the fields used in this dataset.

Field Description
year Year that song was released
songtitle Title of the song
artistname Name of the song artist
songid Unique identifier for the song
artistid Unique identifier for the song artist
timesignature Variable estimating the time signature of the song
timesignature_confidence Confidence in the estimate for the timesignature
loudness Continuous variable indicating the average amplitude of the audio in decibels
tempo Variable indicating the estimated beats per minute of the song
tempo_confidence Confidence in the estimate for tempo
key Variable with twelve levels indicating the estimated key of the song (C, C#, B)
key_confidence Confidence in the estimate for key
energy Variable that represents the overall acoustic energy of the song, using a mix of features such as loudness
pitch Continuous variable that indicates the pitch of the song
timbre_0_min thru timbre_11_min Variables that indicate the minimum values over all segments for each of the twelve values in the timbre vector
timbre_0_max thru timbre_11_max Variables that indicate the maximum values over all segments for each of the twelve values in the timbre vector
top10 Indicator for whether or not the song made it to the Top 10 of the Billboard charts (1 if it was in the top 10, and 0 if not)

Create an Athena table based on the dataset

In the Athena console, select the default database, sampled, or create a new database.

Run the following create table statement.

create external table if not exists billboard
(
year int,
songtitle string,
artistname string,
songID string,
artistID string,
timesignature int,
timesignature_confidence double,
loudness double,
tempo double,
tempo_confidence double,
key int,
key_confidence double,
energy double,
pitch double,
timbre_0_min double,
timbre_0_max double,
timbre_1_min double,
timbre_1_max double,
timbre_2_min double,
timbre_2_max double,
timbre_3_min double,
timbre_3_max double,
timbre_4_min double,
timbre_4_max double,
timbre_5_min double,
timbre_5_max double,
timbre_6_min double,
timbre_6_max double,
timbre_7_min double,
timbre_7_max double,
timbre_8_min double,
timbre_8_max double,
timbre_9_min double,
timbre_9_max double,
timbre_10_min double,
timbre_10_max double,
timbre_11_min double,
timbre_11_max double,
Top10 int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://aws-bigdata-blog/artifacts/predict-billboard/data'
;

Inspect the table definition for the ‘billboard’ table that you have created. If you chose a database other than sampledb, replace that value with your choice.

dbGetQuery(con, "show create table sampledb.billboard")
##                                      createtab_stmt
## 1       CREATE EXTERNAL TABLE `sampledb.billboard`(
## 2                                       `year` int,
## 3                               `songtitle` string,
## 4                              `artistname` string,
## 5                                  `songid` string,
## 6                                `artistid` string,
## 7                              `timesignature` int,
## 8                `timesignature_confidence` double,
## 9                                `loudness` double,
## 10                                  `tempo` double,
## 11                       `tempo_confidence` double,
## 12                                       `key` int,
## 13                         `key_confidence` double,
## 14                                 `energy` double,
## 15                                  `pitch` double,
## 16                           `timbre_0_min` double,
## 17                           `timbre_0_max` double,
## 18                           `timbre_1_min` double,
## 19                           `timbre_1_max` double,
## 20                           `timbre_2_min` double,
## 21                           `timbre_2_max` double,
## 22                           `timbre_3_min` double,
## 23                           `timbre_3_max` double,
## 24                           `timbre_4_min` double,
## 25                           `timbre_4_max` double,
## 26                           `timbre_5_min` double,
## 27                           `timbre_5_max` double,
## 28                           `timbre_6_min` double,
## 29                           `timbre_6_max` double,
## 30                           `timbre_7_min` double,
## 31                           `timbre_7_max` double,
## 32                           `timbre_8_min` double,
## 33                           `timbre_8_max` double,
## 34                           `timbre_9_min` double,
## 35                           `timbre_9_max` double,
## 36                          `timbre_10_min` double,
## 37                          `timbre_10_max` double,
## 38                          `timbre_11_min` double,
## 39                          `timbre_11_max` double,
## 40                                     `top10` int)
## 41                             ROW FORMAT DELIMITED 
## 42                         FIELDS TERMINATED BY ',' 
## 43                            STORED AS INPUTFORMAT 
## 44       'org.apache.hadoop.mapred.TextInputFormat' 
## 45                                     OUTPUTFORMAT 
## 46  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
## 47                                        LOCATION
## 48    's3://aws-bigdata-blog/artifacts/predict-billboard/data'
## 49                                  TBLPROPERTIES (
## 50            'transient_lastDdlTime'='1505484133')

Run a sample query

Next, run a sample query to obtain a list of all songs from Janet Jackson that made it to the Billboard Top 10 charts.

dbGetQuery(con, " SELECT songtitle,artistname,top10   FROM sampledb.billboard WHERE lower(artistname) =     'janet jackson' AND top10 = 1")
##                       songtitle    artistname top10
## 1                       Runaway Janet Jackson     1
## 2               Because Of Love Janet Jackson     1
## 3                         Again Janet Jackson     1
## 4                            If Janet Jackson     1
## 5  Love Will Never Do (Without You) Janet Jackson 1
## 6                     Black Cat Janet Jackson     1
## 7               Come Back To Me Janet Jackson     1
## 8                       Alright Janet Jackson     1
## 9                      Escapade Janet Jackson     1
## 10                Rhythm Nation Janet Jackson     1

Determine how many songs in this dataset are specifically from the year 2010.

dbGetQuery(con, " SELECT count(*)   FROM sampledb.billboard WHERE year = 2010")
##   _col0
## 1   373

The sample dataset provides certain song properties of interest that can be analyzed to gauge the impact to the song’s overall popularity. Look at one such property, timesignature, and determine the value that is the most frequent among songs in the database. Timesignature is a measure of the number of beats and the type of note involved.

Running the query directly may result in an error, as shown in the commented lines below. This error is a result of trying to retrieve a large result set over a JDBC connection, which can cause out-of-memory issues at the client level. To address this, reduce the fetch size and run again.

#t<-dbGetQuery(con, " SELECT timesignature FROM sampledb.billboard")
#Note:  Running the preceding query results in the following error: 
#Error in .jcall(rp, "I", "fetch", stride, block): java.sql.SQLException: The requested #fetchSize is more than the allowed value in Athena. Please reduce the fetchSize and try #again. Refer to the Athena documentation for valid fetchSize values.
# Use the dbSendQuery function, reduce the fetch size, and run again
r <- dbSendQuery(con, " SELECT timesignature     FROM sampledb.billboard")
dftimesignature<- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
table(dftimesignature)
## dftimesignature
##    0    1    3    4    5    7 
##   10  143  503 6787  112   19
nrow(dftimesignature)
## [1] 7574

From the results, observe that 6787 songs have a timesignature of 4.

Next, determine the song with the highest tempo.

dbGetQuery(con, " SELECT songtitle,artistname,tempo   FROM sampledb.billboard WHERE tempo = (SELECT max(tempo) FROM sampledb.billboard) ")
##                   songtitle      artistname   tempo
## 1 Wanna Be Startin' Somethin' Michael Jackson 244.307

Create the training dataset

Your model needs to be trained such that it can learn and make accurate predictions. Split the data into training and test datasets, and create the training dataset first.  This dataset contains all observations from the year 2009 and earlier. You may face the same JDBC connection issue pointed out earlier, so this query uses a fetch size.

#BillboardTrain <- dbGetQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
#Running the preceding query results in the following error:-
#Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve #JDBC result set for SELECT * FROM sampledb.billboard WHERE year <= 2009 (Internal error)
#Follow the same approach as before to address this issue.

r <- dbSendQuery(con, "SELECT * FROM sampledb.billboard WHERE year <= 2009")
BillboardTrain <- fetch(r, n=-1, block=100)
dbClearResult(r)
## [1] TRUE
BillboardTrain[1:2,c(1:3,6:10)]
##   year           songtitle artistname timesignature
## 1 2009 The Awkward Goodbye    Athlete             3
## 2 2009        Rubik's Cube    Athlete             3
##   timesignature_confidence loudness   tempo tempo_confidence
## 1                    0.732   -6.320  89.614   0.652
## 2                    0.906   -9.541 117.742   0.542
nrow(BillboardTrain)
## [1] 7201

Create the test dataset

BillboardTest <- dbGetQuery(con, "SELECT * FROM sampledb.billboard where year = 2010")
BillboardTest[1:2,c(1:3,11:15)]
##   year              songtitle        artistname key
## 1 2010 This Is the House That Doubt Built A Day to Remember  11
## 2 2010        Sticks & Bricks A Day to Remember  10
##   key_confidence    energy pitch timbre_0_min
## 1          0.453 0.9666556 0.024        0.002
## 2          0.469 0.9847095 0.025        0.000
nrow(BillboardTest)
## [1] 373

Convert the training and test datasets into H2O dataframes

train.h2o <- as.h2o(BillboardTrain)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
test.h2o <- as.h2o(BillboardTest)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%

Inspect the column names in your H2O dataframes.

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"

Create models

You need to designate the independent and dependent variables prior to applying your modeling algorithms. Because you’re trying to predict the ‘top10’ field, this would be your dependent variable and everything else would be independent.

Create your first model using GLM. Because GLM works best with numeric data, you create your model by dropping non-numeric variables. You only use the variables in the dataset that describe the numerical attributes of the song in the logistic regression model. You won’t use these variables:  “year”, “songtitle”, “artistname”, “songid”, or “artistid”.

y.dep <- 39
x.indep <- c(6:38)
x.indep
##  [1]  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [24] 29 30 31 32 33 34 35 36 37 38

Create Model 1: All numeric variables

Create Model 1 with the training dataset, using GLM as the modeling algorithm and H2O’s built-in h2o.glm function.

modelh1 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 1, using H2O’s built-in performance function.

h2o.performance(model=modelh1,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09924684
## RMSE:  0.3150347
## LogLoss:  0.3220267
## Mean Per-Class Error:  0.2380168
## AUC:  0.8431394
## Gini:  0.6862787
## R^2:  0.254663
## Null Deviance:  326.0801
## Residual Deviance:  240.2319
## AIC:  308.2319
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0   1    Error     Rate
## 0      255  59 0.187898  =59/314
## 1       17  42 0.288136   =17/59
## Totals 272 101 0.203753  =76/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.192772 0.525000 100
## 2                       max f2  0.124912 0.650510 155
## 3                 max f0point5  0.416258 0.612903  23
## 4                 max accuracy  0.416258 0.879357  23
## 5                max precision  0.813396 1.000000   0
## 6                   max recall  0.037579 1.000000 282
## 7              max specificity  0.813396 1.000000   0
## 8             max absolute_mcc  0.416258 0.455251  23
## 9   max min_per_class_accuracy  0.161402 0.738854 125
## 10 max mean_per_class_accuracy  0.124912 0.765006 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or ` 
h2o.auc(h2o.performance(modelh1,test.h2o)) 
## [1] 0.8431394

The AUC metric provides insight into how well the classifier is able to separate the two classes. In this case, the value of 0.8431394 indicates that the classification is good. (A value of 0.5 indicates a worthless test, while a value of 1.0 indicates a perfect test.)

Next, inspect the coefficients of the variables in the dataset.

dfmodelh1 <- as.data.frame(h2o.varimp(modelh1))
dfmodelh1
##                       names coefficients sign
## 1              timbre_0_max  1.290938663  NEG
## 2                  loudness  1.262941934  POS
## 3                     pitch  0.616995941  NEG
## 4              timbre_1_min  0.422323735  POS
## 5              timbre_6_min  0.349016024  NEG
## 6                    energy  0.348092062  NEG
## 7             timbre_11_min  0.307331997  NEG
## 8              timbre_3_max  0.302225619  NEG
## 9             timbre_11_max  0.243632060  POS
## 10             timbre_4_min  0.224233951  POS
## 11             timbre_4_max  0.204134342  POS
## 12             timbre_5_min  0.199149324  NEG
## 13             timbre_0_min  0.195147119  POS
## 14 timesignature_confidence  0.179973904  POS
## 15         tempo_confidence  0.144242598  POS
## 16            timbre_10_max  0.137644568  POS
## 17             timbre_7_min  0.126995955  NEG
## 18            timbre_10_min  0.123851179  POS
## 19             timbre_7_max  0.100031481  NEG
## 20             timbre_2_min  0.096127636  NEG
## 21           key_confidence  0.083115820  POS
## 22             timbre_6_max  0.073712419  POS
## 23            timesignature  0.067241917  POS
## 24             timbre_8_min  0.061301881  POS
## 25             timbre_8_max  0.060041698  POS
## 26                      key  0.056158445  POS
## 27             timbre_3_min  0.050825116  POS
## 28             timbre_9_max  0.033733561  POS
## 29             timbre_2_max  0.030939072  POS
## 30             timbre_9_min  0.020708113  POS
## 31             timbre_1_max  0.014228818  NEG
## 32                    tempo  0.008199861  POS
## 33             timbre_5_max  0.004837870  POS
## 34                                    NA <NA>

Typically, songs with heavier instrumentation tend to be louder (have higher values in the variable “loudness”) and more energetic (have higher values in the variable “energy”). This knowledge is helpful for interpreting the modeling results.

You can make the following observations from the results:

  • The coefficient estimates for the confidence values associated with the time signature, key, and tempo variables are positive. This suggests that higher confidence leads to a higher predicted probability of a Top 10 hit.
  • The coefficient estimate for loudness is positive, meaning that mainstream listeners prefer louder songs with heavier instrumentation.
  • The coefficient estimate for energy is negative, meaning that mainstream listeners prefer songs that are less energetic, which are those songs with light instrumentation.

These coefficients lead to contradictory conclusions for Model 1. This could be due to multicollinearity issues. Inspect the correlation between the variables “loudness” and “energy” in the training set.

cor(train.h2o$loudness,train.h2o$energy)
## [1] 0.7399067

This number indicates that these two variables are highly correlated, and Model 1 does indeed suffer from multicollinearity. Typically, you associate a value of -1.0 to -0.5 or 1.0 to 0.5 to indicate strong correlation, and a value of 0.1 to 0.1 to indicate weak correlation. To avoid this correlation issue, omit one of these two variables and re-create the models.

You build two variations of the original model:

  • Model 2, in which you keep “energy” and omit “loudness”
  • Model 3, in which you keep “loudness” and omit “energy”

You compare these two models and choose the model with a better fit for this use case.

Create Model 2: Keep energy and omit loudness

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:7,9:38)
x.indep
##  [1]  6  7  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh2 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=================================================================| 100%

Measure the performance of Model 2.

h2o.performance(model=modelh2,newdata=test.h2o)
## H2OBinomialMetrics: glm
## 
## MSE:  0.09922606
## RMSE:  0.3150017
## LogLoss:  0.3228213
## Mean Per-Class Error:  0.2490554
## AUC:  0.8431933
## Gini:  0.6863867
## R^2:  0.2548191
## Null Deviance:  326.0801
## Residual Deviance:  240.8247
## AIC:  306.8247
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      280 34 0.108280  =34/314
## 1       23 36 0.389831   =23/59
## Totals 303 70 0.152815  =57/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.254391 0.558140  69
## 2                       max f2  0.113031 0.647208 157
## 3                 max f0point5  0.413999 0.596026  22
## 4                 max accuracy  0.446250 0.876676  18
## 5                max precision  0.811739 1.000000   0
## 6                   max recall  0.037682 1.000000 283
## 7              max specificity  0.811739 1.000000   0
## 8             max absolute_mcc  0.254391 0.469060  69
## 9   max min_per_class_accuracy  0.141051 0.716561 131
## 10 max mean_per_class_accuracy  0.113031 0.761821 157
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh2 <- as.data.frame(h2o.varimp(modelh2))
dfmodelh2
##                       names coefficients sign
## 1                     pitch  0.700331511  NEG
## 2              timbre_1_min  0.510270513  POS
## 3              timbre_0_max  0.402059546  NEG
## 4              timbre_6_min  0.333316236  NEG
## 5             timbre_11_min  0.331647383  NEG
## 6              timbre_3_max  0.252425901  NEG
## 7             timbre_11_max  0.227500308  POS
## 8              timbre_4_max  0.210663865  POS
## 9              timbre_0_min  0.208516163  POS
## 10             timbre_5_min  0.202748055  NEG
## 11             timbre_4_min  0.197246582  POS
## 12            timbre_10_max  0.172729619  POS
## 13         tempo_confidence  0.167523934  POS
## 14 timesignature_confidence  0.167398830  POS
## 15             timbre_7_min  0.142450727  NEG
## 16             timbre_8_max  0.093377516  POS
## 17            timbre_10_min  0.090333426  POS
## 18            timesignature  0.085851625  POS
## 19             timbre_7_max  0.083948442  NEG
## 20           key_confidence  0.079657073  POS
## 21             timbre_6_max  0.076426046  POS
## 22             timbre_2_min  0.071957831  NEG
## 23             timbre_9_max  0.071393189  POS
## 24             timbre_8_min  0.070225578  POS
## 25                      key  0.061394702  POS
## 26             timbre_3_min  0.048384697  POS
## 27             timbre_1_max  0.044721121  NEG
## 28                   energy  0.039698433  POS
## 29             timbre_5_max  0.039469064  POS
## 30             timbre_2_max  0.018461133  POS
## 31                    tempo  0.013279926  POS
## 32             timbre_9_min  0.005282143  NEG
## 33                                    NA <NA>

h2o.auc(h2o.performance(modelh2,test.h2o)) 
## [1] 0.8431933

You can make the following observations:

  • The AUC metric is 0.8431933.
  • Inspecting the coefficient of the variable energy, Model 2 suggests that songs with high energy levels tend to be more popular. This is as per expectation.
  • As H2O orders variables by significance, the variable energy is not significant in this model.

You can conclude that Model 2 is not ideal for this use , as energy is not significant.

CreateModel 3: Keep loudness but omit energy

colnames(train.h2o)
##  [1] "year"                     "songtitle"               
##  [3] "artistname"               "songid"                  
##  [5] "artistid"                 "timesignature"           
##  [7] "timesignature_confidence" "loudness"                
##  [9] "tempo"                    "tempo_confidence"        
## [11] "key"                      "key_confidence"          
## [13] "energy"                   "pitch"                   
## [15] "timbre_0_min"             "timbre_0_max"            
## [17] "timbre_1_min"             "timbre_1_max"            
## [19] "timbre_2_min"             "timbre_2_max"            
## [21] "timbre_3_min"             "timbre_3_max"            
## [23] "timbre_4_min"             "timbre_4_max"            
## [25] "timbre_5_min"             "timbre_5_max"            
## [27] "timbre_6_min"             "timbre_6_max"            
## [29] "timbre_7_min"             "timbre_7_max"            
## [31] "timbre_8_min"             "timbre_8_max"            
## [33] "timbre_9_min"             "timbre_9_max"            
## [35] "timbre_10_min"            "timbre_10_max"           
## [37] "timbre_11_min"            "timbre_11_max"           
## [39] "top10"
y.dep <- 39
x.indep <- c(6:12,14:38)
x.indep
##  [1]  6  7  8  9 10 11 12 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [24] 30 31 32 33 34 35 36 37 38
modelh3 <- h2o.glm( y = y.dep, x = x.indep, training_frame = train.h2o, family = "binomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |=================================================================| 100%
perfh3<-h2o.performance(model=modelh3,newdata=test.h2o)
perfh3
## H2OBinomialMetrics: glm
## 
## MSE:  0.0978859
## RMSE:  0.3128672
## LogLoss:  0.3178367
## Mean Per-Class Error:  0.264925
## AUC:  0.8492389
## Gini:  0.6984778
## R^2:  0.2648836
## Null Deviance:  326.0801
## Residual Deviance:  237.1062
## AIC:  303.1062
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      286 28 0.089172  =28/314
## 1       26 33 0.440678   =26/59
## Totals 312 61 0.144772  =54/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                         metric threshold    value idx
## 1                       max f1  0.273799 0.550000  60
## 2                       max f2  0.125503 0.663265 155
## 3                 max f0point5  0.435479 0.628931  24
## 4                 max accuracy  0.435479 0.882038  24
## 5                max precision  0.821606 1.000000   0
## 6                   max recall  0.038328 1.000000 280
## 7              max specificity  0.821606 1.000000   0
## 8             max absolute_mcc  0.435479 0.471426  24
## 9   max min_per_class_accuracy  0.173693 0.745763 120
## 10 max mean_per_class_accuracy  0.125503 0.775073 155
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
dfmodelh3 <- as.data.frame(h2o.varimp(modelh3))
dfmodelh3
##                       names coefficients sign
## 1              timbre_0_max 1.216621e+00  NEG
## 2                  loudness 9.780973e-01  POS
## 3                     pitch 7.249788e-01  NEG
## 4              timbre_1_min 3.891197e-01  POS
## 5              timbre_6_min 3.689193e-01  NEG
## 6             timbre_11_min 3.086673e-01  NEG
## 7              timbre_3_max 3.025593e-01  NEG
## 8             timbre_11_max 2.459081e-01  POS
## 9              timbre_4_min 2.379749e-01  POS
## 10             timbre_4_max 2.157627e-01  POS
## 11             timbre_0_min 1.859531e-01  POS
## 12             timbre_5_min 1.846128e-01  NEG
## 13 timesignature_confidence 1.729658e-01  POS
## 14             timbre_7_min 1.431871e-01  NEG
## 15            timbre_10_max 1.366703e-01  POS
## 16            timbre_10_min 1.215954e-01  POS
## 17         tempo_confidence 1.183698e-01  POS
## 18             timbre_2_min 1.019149e-01  NEG
## 19           key_confidence 9.109701e-02  POS
## 20             timbre_7_max 8.987908e-02  NEG
## 21             timbre_6_max 6.935132e-02  POS
## 22             timbre_8_max 6.878241e-02  POS
## 23            timesignature 6.120105e-02  POS
## 24                      key 5.814805e-02  POS
## 25             timbre_8_min 5.759228e-02  POS
## 26             timbre_1_max 2.930285e-02  NEG
## 27             timbre_9_max 2.843755e-02  POS
## 28             timbre_3_min 2.380245e-02  POS
## 29             timbre_2_max 1.917035e-02  POS
## 30             timbre_5_max 1.715813e-02  POS
## 31                    tempo 1.364418e-02  NEG
## 32             timbre_9_min 8.463143e-05  NEG
## 33                                    NA <NA>
h2o.sensitivity(perfh3,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501855569251422. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.2033898
h2o.auc(perfh3)
## [1] 0.8492389

You can make the following observations:

  • The AUC metric is 0.8492389.
  • From the confusion matrix, the model correctly predicts that 33 songs will be top 10 hits (true positives). However, it has 26 false positives (songs that the model predicted would be Top 10 hits, but ended up not being Top 10 hits).
  • Loudness has a positive coefficient estimate, meaning that this model predicts that songs with heavier instrumentation tend to be more popular. This is the same conclusion from Model 2.
  • Loudness is significant in this model.

Overall, Model 3 predicts a higher number of top 10 hits with an accuracy rate that is acceptable. To choose the best fit for production runs, record labels should consider the following factors:

  • Desired model accuracy at a given threshold
  • Number of correct predictions for top10 hits
  • Tolerable number of false positives or false negatives

Next, make predictions using Model 3 on the test dataset.

predict.regh <- h2o.predict(modelh3, test.h2o)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |=================================================================| 100%
print(predict.regh)
##   predict        p0          p1
## 1       0 0.9654739 0.034526052
## 2       0 0.9654748 0.034525236
## 3       0 0.9635547 0.036445318
## 4       0 0.9343579 0.065642149
## 5       0 0.9978334 0.002166601
## 6       0 0.9779949 0.022005078
## 
## [373 rows x 3 columns]
predict.regh$predict
##   predict
## 1       0
## 2       0
## 3       0
## 4       0
## 5       0
## 6       0
## 
## [373 rows x 1 column]
dpr<-as.data.frame(predict.regh)
#Rename the predicted column 
colnames(dpr)[colnames(dpr) == 'predict'] <- 'predict_top10'
table(dpr$predict_top10)
## 
##   0   1 
## 312  61

The first set of output results specifies the probabilities associated with each predicted observation.  For example, observation 1 is 96.54739% likely to not be a Top 10 hit, and 3.4526052% likely to be a Top 10 hit (predict=1 indicates Top 10 hit and predict=0 indicates not a Top 10 hit).  The second set of results list the actual predictions made.  From the third set of results, this model predicts that 61 songs will be top 10 hits.

Compute the baseline accuracy, by assuming that the baseline predicts the most frequent outcome, which is that most songs are not Top 10 hits.

table(BillboardTest$top10)
## 
##   0   1 
## 314  59

Now observe that the baseline model would get 314 observations correct, and 59 wrong, for an accuracy of 314/(314+59) = 0.8418231.

It seems that Model 3, with an accuracy of 0.8552, provides you with a small improvement over the baseline model. But is this model useful for record labels?

View the two models from an investment perspective:

  • A production company is interested in investing in songs that are more likely to make it to the Top 10. The company’s objective is to minimize the risk of financial losses attributed to investing in songs that end up unpopular.
  • How many songs does Model 3 correctly predict as a Top 10 hit in 2010? Looking at the confusion matrix, you see that it predicts 33 top 10 hits correctly at an optimal threshold, which is more than half the number
  • It will be more useful to the record label if you can provide the production company with a list of songs that are highly likely to end up in the Top 10.
  • The baseline model is not useful, as it simply does not label any song as a hit.

Considering the three models built so far, you can conclude that Model 3 proves to be the best investment choice for the record label.

GBM model

H2O provides you with the ability to explore other learning models, such as GBM and deep learning. Explore building a model using the GBM technique, using the built-in h2o.gbm function.

Before you do this, you need to convert the target variable to a factor for multinomial classification techniques.

train.h2o$top10=as.factor(train.h2o$top10)
gbm.modelh <- h2o.gbm(y=y.dep, x=x.indep, training_frame = train.h2o, ntrees = 500, max_depth = 4, learn_rate = 0.01, seed = 1122,distribution="multinomial")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |=================================================================| 100%
perf.gbmh<-h2o.performance(gbm.modelh,test.h2o)
perf.gbmh
## H2OBinomialMetrics: gbm
## 
## MSE:  0.09860778
## RMSE:  0.3140188
## LogLoss:  0.3206876
## Mean Per-Class Error:  0.2120263
## AUC:  0.8630573
## Gini:  0.7261146
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      266 48 0.152866  =48/314
## 1       16 43 0.271186   =16/59
## Totals 282 91 0.171582  =64/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.189757 0.573333  90
## 2                     max f2  0.130895 0.693717 145
## 3               max f0point5  0.327346 0.598802  26
## 4               max accuracy  0.442757 0.876676  14
## 5              max precision  0.802184 1.000000   0
## 6                 max recall  0.049990 1.000000 284
## 7            max specificity  0.802184 1.000000   0
## 8           max absolute_mcc  0.169135 0.496486 104
## 9 max min_per_class_accuracy  0.169135 0.796610 104
## 10 max mean_per_class_accuracy  0.169135 0.805948 104
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `
h2o.sensitivity(perf.gbmh,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.501205344484314. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.1355932
h2o.auc(perf.gbmh)
## [1] 0.8630573

This model correctly predicts 43 top 10 hits, which is 10 more than the number predicted by Model 3. Moreover, the AUC metric is higher than the one obtained from Model 3.

As seen above, H2O’s API provides the ability to obtain key statistical measures required to analyze the models easily, using several built-in functions. The record label can experiment with different parameters to arrive at the model that predicts the maximum number of Top 10 hits at the desired level of accuracy and threshold.

H2O also allows you to experiment with deep learning models. Deep learning models have the ability to learn features implicitly, but can be more expensive computationally.

Now, create a deep learning model with the h2o.deeplearning function, using the same training and test datasets created before. The time taken to run this model depends on the type of EC2 instance chosen for this purpose.  For models that require more computation, consider using accelerated computing instances such as the P2 instance type.

system.time(
  dlearning.modelh <- h2o.deeplearning(y = y.dep,
                                      x = x.indep,
                                      training_frame = train.h2o,
                                      epoch = 250,
                                      hidden = c(250,250),
                                      activation = "Rectifier",
                                      seed = 1122,
                                      distribution="multinomial"
  )
)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |=================================================================| 100%
##    user  system elapsed 
##   1.216   0.020 166.508
perf.dl<-h2o.performance(model=dlearning.modelh,newdata=test.h2o)
perf.dl
## H2OBinomialMetrics: deeplearning
## 
## MSE:  0.1678359
## RMSE:  0.4096778
## LogLoss:  1.86509
## Mean Per-Class Error:  0.3433013
## AUC:  0.7568822
## Gini:  0.5137644
## 
## Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
##          0  1    Error     Rate
## 0      290 24 0.076433  =24/314
## 1       36 23 0.610169   =36/59
## Totals 326 47 0.160858  =60/373
## 
## Maximum Metrics: Maximum metrics at their respective thresholds
##                       metric threshold    value idx
## 1                     max f1  0.826267 0.433962  46
## 2                     max f2  0.000000 0.588235 239
## 3               max f0point5  0.999929 0.511811  16
## 4               max accuracy  0.999999 0.865952  10
## 5              max precision  1.000000 1.000000   0
## 6                 max recall  0.000000 1.000000 326
## 7            max specificity  1.000000 1.000000   0
## 8           max absolute_mcc  0.999929 0.363219  16
## 9 max min_per_class_accuracy  0.000004 0.662420 145
## 10 max mean_per_class_accuracy  0.000000 0.685334 224
## 
## Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
h2o.sensitivity(perf.dl,0.5)
## Warning in h2o.find_row_by_threshold(object, t): Could not find exact
## threshold: 0.5 for this set of metrics; using closest threshold found:
## 0.496293348880151. Run `h2o.predict` and apply your desired threshold on a
## probability column.
## [[1]]
## [1] 0.3898305
h2o.auc(perf.dl)
## [1] 0.7568822

The AUC metric for this model is 0.7568822, which is less than what you got from the earlier models. I recommend further experimentation using different hyper parameters, such as the learning rate, epoch or the number of hidden layers.

H2O’s built-in functions provide many key statistical measures that can help measure model performance. Here are some of these key terms.

Metric Description
Sensitivity Measures the proportion of positives that have been correctly identified. It is also called the true positive rate, or recall.
Specificity Measures the proportion of negatives that have been correctly identified. It is also called the true negative rate.
Threshold Cutoff point that maximizes specificity and sensitivity. While the model may not provide the highest prediction at this point, it would not be biased towards positives or negatives.
Precision The fraction of the documents retrieved that are relevant to the information needed, for example, how many of the positively classified are relevant
AUC

Provides insight into how well the classifier is able to separate the two classes. The implicit goal is to deal with situations where the sample distribution is highly skewed, with a tendency to overfit to a single class.

0.90 – 1 = excellent (A)

0.8 – 0.9 = good (B)

0.7 – 0.8 = fair (C)

.6 – 0.7 = poor (D)

0.5 – 0.5 = fail (F)

Here’s a summary of the metrics generated from H2O’s built-in functions for the three models that produced useful results.

Metric Model 3 GBM Model Deep Learning Model

Accuracy

(max)

0.882038

(t=0.435479)

0.876676

(t=0.442757)

0.865952

(t=0.999999)

Precision

(max)

1.0

(t=0.821606)

1.0

(t=0802184)

1.0

(t=1.0)

Recall

(max)

1.0 1.0

1.0

(t=0)

Specificity

(max)

1.0 1.0

1.0

(t=1)

Sensitivity

 

0.2033898 0.1355932

0.3898305

(t=0.5)

AUC 0.8492389 0.8630573 0.756882

Note: ‘t’ denotes threshold.

Your options at this point could be narrowed down to Model 3 and the GBM model, based on the AUC and accuracy metrics observed earlier.  If the slightly lower accuracy of the GBM model is deemed acceptable, the record label can choose to go to production with the GBM model, as it can predict a higher number of Top 10 hits.  The AUC metric for the GBM model is also higher than that of Model 3.

Record labels can experiment with different learning techniques and parameters before arriving at a model that proves to be the best fit for their business. Because deep learning models can be computationally expensive, record labels can choose more powerful EC2 instances on AWS to run their experiments faster.

Conclusion

In this post, I showed how the popular music industry can use analytics to predict the type of songs that make the Top 10 Billboard charts. By running H2O’s scalable machine learning platform on AWS, data scientists can easily experiment with multiple modeling techniques and interactively query the data using Amazon Athena, without having to manage the underlying infrastructure. This helps record labels make critical decisions on the type of artists and songs to promote in a timely fashion, thereby increasing sales and revenue.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to build and explore a simple geospita simple GEOINT application using SparkR.


About the Authors

gopalGopal Wunnava is a Partner Solution Architect with the AWS GSI Team. He works with partners and customers on big data engagements, and is passionate about building analytical solutions that drive business capabilities and decision making. In his spare time, he loves all things sports and movies related and is fond of old classics like Asterix, Obelix comics and Hitchcock movies.

 

 

Bob Strahan, a Senior Consultant with AWS Professional Services, contributed to this post.