Tag Archives: Advertising

Advances in Ad Blocking

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/04/advances_in_ad_.html

Ad blockers represent the largest consumer boycott in human history. They’re also an arms race between the blockers and the blocker blockers. This article discusses a new ad-blocking technology that represents another advance in this arms race. I don’t think it will “put an end to the ad-blocking arms race,” as the title proclaims, but it will definitely give the blockers the upper hand.

The software, devised by Arvind Narayanan, Dillon Reisman, Jonathan Mayer, and Grant Storey, is novel in two major ways: First, it looks at the struggle between advertising and ad blockers as fundamentally a security problem that can be fought in much the same way antivirus programs attempt to block malware, using techniques borrowed from rootkits and built-in web browser customizability to stealthily block ads without being detected. Second, the team notes that there are regulations and laws on the books that give a fundamental advantage to consumers that cannot be easily changed, opening the door to a long-term ad-blocking solution.

Now if we could only block the data collection as well.

Commenting Policy for This Blog

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/03/commenting_poli.html

Over the past few months, I have been watching my blog comments decline in civility. I blame it in part on the contentious US election and its aftermath. It’s also a consequence of not requiring visitors to register in order to post comments, and of our tolerance for impassioned conversation. Whatever the causes, I’m tired of it. Partisan nastiness is driving away visitors who might otherwise have valuable insights to offer.

I have been engaging in more active comment moderation. What that means is that I have been quicker to delete posts that are rude, insulting, or off-topic. This is my blog. I consider the comments section as analogous to a gathering at my home. It’s not a town square. Everyone is expected to be polite and respectful, and if you’re an unpleasant guest, I’m going to ask you to leave. Your freedom of speech does not compel me to publish your words.

I like people who disagree with me. I like debate. I even like arguments. But I expect everyone to behave as if they’ve been invited into my home.

I realize that I sometimes express opinions on political matters; I find they are relevant to security at all levels. On those posts, I welcome on-topic comments regarding those opinions. I don’t welcome people pissing and moaning about the fact that I’ve expressed my opinion on something other than security technology. As I said, it’s my blog.

So, please… Assume good faith. Be polite. Minimize profanity. Argue facts, not personalities. Stay on topic. If you want a model to emulate, look at Clive Robinson’s posts.

Schneier on Security is not a professional operation. There’s no advertising, so no revenue to hire staff. My part-time moderator — paid out of my own pocket — and I do what we can when we can. If you see a comment that’s spam, or off-topic, or an ad hominem attack, flag it and be patient. Don’t reply or engage; we’ll get to it. And we won’t always post an explanation when we delete something.

My own stance on privacy and anonymity means that I’m not going to require commenters to register a name or e-mail address, so that isn’t an option. And I really don’t want to disable comments.

I dislike having to deal with this problem. I’ve been proud and happy to see how interesting and useful the comments section has been all these years. I’ve watched many blogs and discussion groups descend into toxicity as a result of trolls and drive-by ideologues derailing the conversations of regular posters. I’m not going to let that happen here.

Three challenges for the web, according to its inventor

Post Syndicated from ris original https://lwn.net/Articles/716981/rss

The world wide web has been around for 28 years now. Web inventor Sir Tim Berners-Lee writes about the challenges facing the modern web, including the loss of control of our personal data, the spread of misinformation, and the lack of transparency in political advertising. “Political advertising online has rapidly become a sophisticated industry. The fact that most people get their information from just a few platforms and the increasing sophistication of algorithms drawing upon rich pools of personal data, means that political campaigns are now building individual adverts targeted directly at users. One source suggests that in the 2016 US election, as many as 50,000 variations of adverts were being served every single day on Facebook, a near-impossible situation to monitor. And there are suggestions that some political adverts – in the US and around the world – are being used in unethical ways – to point voters to fake news sites, for instance, or to keep others away from the polls. Targeted advertising allows a campaign to say completely different, possibly conflicting things to different groups. Is that democratic?”

Defense against Doxing

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/03/defense_against.html

A decade ago, I wrote about the death of ephemeral conversation. As computers were becoming ubiquitous, some unintended changes happened, too. Before computers, what we said disappeared once we’d said it. Neither face-to-face conversations nor telephone conversations were routinely recorded. A permanent communication was something different and special; we called it correspondence.

The Internet changed this. We now chat by text message and e-mail, on Facebook and on Instagram. These conversations — with friends, lovers, colleagues, fellow employees — all leave electronic trails. And while we know this intellectually, we haven’t truly internalized it. We still think of conversation as ephemeral, forgetting that we’re being recorded and what we say has the permanence of correspondence.

That our data is used by large companies for psychological manipulation – we call this advertising – is well known. So is its use by governments for law enforcement and, depending on the country, social control. What made the news over the past year were demonstrations of how vulnerable all of this data is to hackers and the effects of having it hacked, copied, and then published online. We call this doxing.

Doxing isn’t new, but it has become more common. It’s been perpetrated against corporations, law firms, individuals, the NSA and — just this week — the CIA. It’s largely harassment and not whistleblowing, and it’s not going to change anytime soon. The data in your computer and in the cloud are, and will continue to be, vulnerable to hacking and publishing online. Depending on your prominence and the details of this data, you may need some new strategies to secure your private life.

There are two basic ways hackers can get at your e-mail and private documents. One way is to guess your password. That’s how hackers got their hands on personal photos of celebrities from iCloud in 2014.

How to protect yourself from this attack is pretty obvious. First, don’t choose a guessable password. This is more than not using “password1” or “qwerty”; most easily memorizable passwords are guessable. My advice is to generate passwords you have to remember by using either the XKCD scheme or the Schneier scheme, and to use large random passwords stored in a password manager for everything else.
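
For readers who want to see what that advice looks like in practice, here is a minimal Python sketch of both ideas: a memorable multi-word passphrase for the passwords you must remember, and a long random string for everything kept in a password manager. The word-list path is a placeholder, and this is only an illustration of the general approach, not the exact XKCD or Schneier schemes.

```python
# Minimal sketch of the two approaches above, assuming a newline-separated
# word list at "wordlist.txt" (a hypothetical path). Illustrative only.
import secrets
import string

def xkcd_passphrase(path="wordlist.txt", words=5, sep="-"):
    """Memorable passphrase: several randomly chosen words joined together."""
    with open(path, encoding="utf-8") as f:
        candidates = [line.strip() for line in f if line.strip()]
    # secrets uses the OS CSPRNG, unlike the random module.
    return sep.join(secrets.choice(candidates) for _ in range(words))

def manager_password(length=20):
    """Long random password for accounts kept in a password manager."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(xkcd_passphrase())
print(manager_password())
```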

Second, turn on two-factor authentication where you can, like Google’s 2-Step Verification. This adds another step besides just entering a password, such as having to type in a one-time code that’s sent to your mobile phone. And third, don’t reuse the same password on any sites you actually care about.
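
For context, the codes generated by authenticator apps (as opposed to codes sent by SMS) are typically time-based one-time passwords (TOTP, RFC 6238). The following is a rough sketch of how such a code is derived from a shared secret; the secret shown is made up.

```python
# Rough sketch of a TOTP one-time code (RFC 6238) derived from a shared
# secret. Illustrative only; real authenticator apps do essentially this.
import base64, hashlib, hmac, struct, time

def totp(secret_base32: str, interval: int = 30, digits: int = 6) -> str:
    key = base64.b32decode(secret_base32, casefold=True)
    counter = int(time.time()) // interval          # 30-second time step
    msg = struct.pack(">Q", counter)
    digest = hmac.new(key, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                      # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# Example with a made-up secret:
print(totp("JBSWY3DPEHPK3PXP"))
```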

You’re not done, though. Hackers have accessed accounts by exploiting the “secret question” feature and resetting the password. That was how Sarah Palin’s e-mail account was hacked in 2008. The problem with secret questions is that they’re not very secret and not very random. My advice is to refuse to use those features. Type randomness into your keyboard, or choose a really random answer and store it in your password manager.

Finally, you also have to stay alert to phishing attacks, where a hacker sends you an enticing e-mail with a link that sends you to a web page that looks almost like the expected page, but which actually isn’t. This sort of thing can bypass two-factor authentication, and is almost certainly what tricked John Podesta and Colin Powell.

The other way hackers can get at your personal stuff is by breaking in to the computers the information is stored on. This is how the Russians got into the Democratic National Committee’s network and how a lone hacker got into the Panamanian law firm Mossack Fonseca. Sometimes individuals are targeted, as when China hacked Google in 2010 to access the e-mail accounts of human rights activists. Sometimes the whole network is the target, and individuals are inadvertent victims, as when thousands of Sony employees had their e-mails published by North Korea in 2014.

Protecting yourself is difficult, because it often doesn’t matter what you do. If your e-mail is stored with a service provider in the cloud, what matters is the security of that network and that provider. Most users have no control over that part of the system. The only way to truly protect yourself is to not keep your data in the cloud where someone could get to it. This is hard. We like the fact that all of our e-mail is stored on a server somewhere and that we can instantly search it. But that convenience comes with risk. Consider deleting old e-mail, or at least downloading it and storing it offline on a portable hard drive. In fact, storing data offline is one of the best things you can do to protect it from being hacked and exposed. If it’s on your computer, what matters is the security of your operating system and network, not the security of your service provider.

Consider this for files on your own computer. The more things you can move offline, the safer you’ll be.

E-mail, no matter how you store it, is vulnerable. If you’re worried about your conversations becoming public, think about an encrypted chat program instead, such as Signal, WhatsApp or Off-the-Record Messaging. Consider using communications systems that don’t save everything by default.

None of this is perfect, of course. Portable hard drives are vulnerable when you connect them to your computer. There are ways to jump air gaps and access data on computers not connected to the Internet. Communications and data files you delete might still exist in backup systems somewhere — either yours or those of the various cloud providers you’re using. And always remember that there’s always another copy of any of your conversations stored with the person you’re conversing with. Even with these caveats, though, these measures will make a big difference.

When secrecy is truly paramount, go back to communications systems that are still ephemeral. Pick up the telephone and talk. Meet face to face. We don’t yet live in a world where everything is recorded and everything is saved, although that era is coming. Enjoy the last vestiges of ephemeral conversation while you still can.

This essay originally appeared in the Washington Post.

Botnets

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/03/botnets.html

Botnets have existed for at least a decade. As early as 2000, hackers were breaking into computers over the Internet and controlling them en masse from centralized systems. Among other things, the hackers used the combined computing power of these botnets to launch distributed denial-of-service attacks, which flood websites with traffic to take them down.

But now the problem is getting worse, thanks to a flood of cheap webcams, digital video recorders, and other gadgets in the “Internet of things.” Because these devices typically have little or no security, hackers can take them over with little effort. And that makes it easier than ever to build huge botnets that take down much more than one site at a time.

In October, a botnet made up of 100,000 compromised gadgets knocked an Internet infrastructure provider partially offline. Taking down that provider, Dyn, resulted in a cascade of effects that ultimately caused a long list of high-profile websites, including Twitter and Netflix, to temporarily disappear from the Internet. More attacks are sure to follow: the botnet that attacked Dyn was created with publicly available malware called Mirai that largely automates the process of co-opting computers.

The best defense would be for everything online to run only secure software, so botnets couldn’t be created in the first place. This isn’t going to happen anytime soon. Internet of things devices are not designed with security in mind and often have no way of being patched. The things that have become part of Mirai botnets, for example, will be vulnerable until their owners throw them away. Botnets will get larger and more powerful simply because the number of vulnerable devices will go up by orders of magnitude over the next few years.

What do hackers do with them? Many things.

Botnets are used to commit click fraud. Click fraud is a scheme to fool advertisers into thinking that people are clicking on, or viewing, their ads. There are lots of ways to commit click fraud, but the easiest is probably for the attacker to embed a Google ad in a Web page he owns. Google ads pay a site owner according to the number of people who click on them. The attacker instructs all the computers on his botnet to repeatedly visit the Web page and click on the ad. Dot, dot, dot, PROFIT! If the botnet makers figure out more effective ways to siphon revenue from big companies online, we could see the whole advertising model of the Internet crumble.

Similarly, botnets can be used to evade spam filters, which work partly by knowing which computers are sending millions of e-mails. They can speed up password guessing to break into online accounts, mine bitcoins, and do anything else that requires a large network of computers. This is why botnets are big businesses. Criminal organizations rent time on them.

But the botnet activities that most often make headlines are denial-of-service attacks. Dyn seems to have been the victim of some angry hackers, but more financially motivated groups use these attacks as a form of extortion. Political groups use them to silence websites they don’t like. Such attacks will certainly be a tactic in any future cyberwar.

Once you know a botnet exists, you can attack its command-and-control system. When botnets were rare, this tactic was effective. As they get more common, this piecemeal defense will become less so. You can also secure yourself against the effects of botnets. For example, several companies sell defenses against denial-of-service attacks. Their effectiveness varies, depending on the severity of the attack and the type of service.

But overall, the trends favor the attacker. Expect more attacks like the one against Dyn in the coming year.

This essay previously appeared in the MIT Technology Review.

AWS Hot Startups – February 2017

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-hot-startups-february-2017-2/

As we finish up the month of February, Tina Barr is back with some awesome startups.

-Ana


This month we are bringing you five innovative hot startups:

  • GumGum – Creating and popularizing the field of in-image advertising.
  • Jiobit – Smart tags to help parents keep track of kids.
  • Parsec – Offers flexibility in hardware and location for PC gamers.
  • Peloton – Revolutionizing indoor cycling and fitness classes at home.
  • Tendril – Reducing energy consumption for homeowners.

If you missed any of our January startups, make sure to check them out here.

GumGum (Santa Monica, CA)
GumGum is best known for inventing and popularizing the field of in-image advertising. Founded in 2008 by Ophir Tanz, the company is on a mission to unlock the value held within the vast content produced daily via social media, editorials, and broadcasts in a variety of industries. GumGum powers campaigns across more than 2,000 premium publishers, which are seen by over 400 million users.

In-image advertising was pioneered by GumGum and has given companies a platform to deliver highly visible ads to a place where the consumer’s attention is already focused. Using image recognition technology, GumGum delivers targeted placements as contextual overlays on related pictures, as banners that fit on all screen sizes, or as In-Feed placements that blend seamlessly into the surrounding content. Using Visual Intelligence, GumGum can scour social media and broadcast TV for all images and videos related to a brand, allowing companies to gain a stronger understanding of their audience and how they are relating to that brand on social media.

GumGum relies on AWS for its Image Processing and Ad Serving operations. Using AWS infrastructure, GumGum currently processes 13 million requests per minute across the globe and generates 30 TB of new data every day. The company uses a suite of services including but not limited to Amazon EC2, Amazon S3, Amazon Kinesis, Amazon EMR, AWS Data Pipeline, and Amazon SNS. AWS edge locations allow GumGum to serve its customers in the US, Europe, Australia, and Japan, and the company plans to expand its infrastructure to additional APAC regions in the future.
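
The post doesn't describe GumGum's pipeline in any detail, but as a hedged illustration of what feeding ad-serving events into Amazon Kinesis with boto3 can look like, here is a short sketch; the stream name and event fields are hypothetical.

```python
# Hypothetical illustration of pushing an ad-serving event onto an Amazon
# Kinesis stream with boto3. Stream name and event fields are made up; this
# is not GumGum's actual pipeline.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

def record_impression(campaign_id: str, image_url: str) -> None:
    event = {"campaign_id": campaign_id, "image_url": image_url, "type": "impression"}
    kinesis.put_record(
        StreamName="ad-events",              # hypothetical stream name
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=campaign_id,            # keeps one campaign's events on the same shard
    )

record_impression("campaign-123", "https://example.com/banner.png")
```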

For a look inside GumGum’s startup culture, check out their first Hackathon!

Jiobit (Chicago, IL)
Jiobit was inspired by a real event that took place in a crowded Chicago park. A couple of summers ago, John Renaldi experienced every parent’s worst nightmare – he lost track of his then 6-year-old son in a public park for almost 30 minutes. John knew he wasn’t the only parent with this problem. After months of research, he determined that over 50% of parents have had a similar experience and an even greater percentage are actively looking for a way to prevent it.

Jiobit is the world’s smallest and longest lasting smart tag that helps parents keep track of their kids in every location – indoors and outdoors. The small device is kid-proof: lightweight, durable, and waterproof. It acts as a virtual “safety harness” as it uses a combination of Bluetooth, Wi-Fi, Multiple Cellular Networks, GPS, and sensors to provide accurate locations in real-time. Jiobit can automatically learn routes and locations, and will send parents an alert if their child does not arrive at their destination on time. The talented team of experienced engineers, designers, marketers, and parents has over 150 patents and has shipped dozens of hardware and software products worldwide.

The Jiobit team is utilizing a number of AWS services in the development of their product. Security is critical to the overall product experience, and they are over-engineering security on both the hardware and software side with the help of AWS. Jiobit is also working towards being the first child monitoring device that will have implemented an Alexa Skill via the Amazon Echo device (see here for a demo!). The devices use AWS IoT to send and receive data from the Jio Cloud over the MQTT protocol. Once data is received, they use AWS Lambda to parse the received data and take appropriate actions, including storing relevant data using Amazon DynamoDB, and sending location data to Amazon Machine Learning processing jobs.
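
As a rough sketch of what the Lambda side of an AWS IoT to Lambda to DynamoDB flow can look like (this is not Jiobit's actual code; the table name and payload fields are invented):

```python
# Hedged sketch of the Lambda side of an AWS IoT -> Lambda -> DynamoDB flow.
# The table name and payload shape are hypothetical, not Jiobit's.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("device-locations")   # hypothetical table name

def handler(event, context):
    # An AWS IoT rule can invoke this Lambda with the MQTT message payload as `event`.
    item = {
        "device_id": event["device_id"],
        "timestamp": event["timestamp"],
        "lat": str(event["lat"]),            # store coordinates as strings to avoid float issues
        "lon": str(event["lon"]),
        "battery": event.get("battery"),
    }
    table.put_item(Item=item)
    return {"status": "stored"}
```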

Visit the Jiobit blog for more information.

Parsec (New York, NY)
Parsec operates under the notion that everyone should have access to the best computing in the world because access to technology creates endless opportunities. Founded in 2016 by Benjy Boxer and Chris Dickson, Parsec aims to eliminate the burden of hardware upgrades that users frequently experience by building the technology to make a computer in the cloud available anywhere, at any time. Today, they are using their technology to enable greater flexibility in the hardware and location that PC gamers choose to play their favorite games on. Check out this interview with Benjy and our Startups team for a look at how Parsec works.

Parsec built their first product to improve the gaming experience; gamers no longer have to purchase consoles or expensive PCs to access the entertainment they love. Their low latency video streaming and networking technologies allow gamers to remotely access their gaming rig and play on any Windows, Mac, Android, or Raspberry Pi device. With the global reach of AWS, Parsec is able to deliver cloud gaming to the median user in the US and Europe with less than 30 milliseconds of network latency.

Parsec users currently have two options available to start gaming with cloud resources. They can either set up their own machines with the Parsec AMI in their region or rely on Parsec to manage everything for a seamless experience. In either case, Parsec uses the g2.2xlarge EC2 instance type. Parsec is using Amazon Elastic Block Storage to store games, Amazon DynamoDB for scalability, and Amazon EC2 for its web servers and various APIs. They also deal with a high volume of logs and take advantage of the Amazon Elasticsearch Service to analyze the data.

Be sure to check out Parsec’s blog to keep up with the latest news.

Peloton (New York, NY)
The idea for Peloton was born in 2012 when John Foley, Founder and CEO, and his wife Jill started realizing the challenge of balancing work, raising young children, and keeping up with personal fitness. This is a common challenge people face – they want to work out, but there are a lot of obstacles that stand in their way. Peloton offers a solution that enables people to join indoor cycling and fitness classes anywhere, anytime.

Peloton has created a cutting-edge indoor bike that streams up to 14 hours of live classes daily and has over 4,000 on-demand classes. Users can access live classes from world-class instructors from the convenience of their home or gym. The bike tracks progress with in-depth ride metrics and allows people to compete in real-time with other users who have taken a specific ride. The live classes even feature top DJs that play current playlists to keep users motivated.

With an aggressive marketing campaign, which has included high-visibility TV advertising, Peloton made the decision to run its entire platform in the cloud. Most recently, they ran an ad during an NFL playoff game and their rate of requests per minute to their site increased from ~2k/min to ~32.2k/min within 60 seconds. As they continue to grow and diversify, they are utilizing services such as Amazon S3 for thousands of hours of archived on-demand video content, Amazon Redshift for data warehousing, and Application Load Balancer for intelligent request routing.

Learn more about Peloton’s engineering team here.

Tendril (Denver, CO)
Tendril was founded in 2004 with the goal of helping homeowners better manage and reduce their energy consumption. Today, electric and gas utilities use Tendril’s data analytics platform on more than 140 million homes to deliver a personalized energy experience for consumers around the world. Using the latest technology in decision science and analytics, Tendril can gain access to real-time, ever evolving data about energy consumers and their homes so they can improve customer acquisition, increase engagement, and orchestrate home energy experiences. In turn, Tendril helps its customers unlock the true value of energy interactions.

AWS helps Tendril run its services globally, while scaling capacity up and down as needed, and in real-time. This has been especially important in support of Tendril’s newest solution, Orchestrated Energy, a continuous demand management platform that calculates a home’s thermal mass, predicts consumer behavior, and integrates with smart thermostats and other connected home devices. This solution allows millions of consumers to create a personalized energy plan for their home based on their individual needs.

Tendril builds and maintains most of its infrastructure services with open source tools running on Amazon EC2 instances, while also making use of AWS services such as Elastic Load Balancing, Amazon API Gateway, Amazon CloudFront, Amazon Route 53, Amazon Simple Queue Service, and Amazon RDS for PostgreSQL.

Visit the Tendril Blog for more information!

— Tina Barr

Tips on Winning the ecommerce Game

Post Syndicated from Sarah Wilson original http://www.anchor.com.au/blog/2017/02/tips-ecommerce-hosting-game/

The ecommerce world is constantly changing and evolving, which is exactly why you must keep on top of the game. Arguably, choosing a reliable host is the most important decision an eCommerce business makes, which is why we have noted five major reasons a quality hosting provider is vital.

High Availability

The most important thing to think about when choosing a host and your infrastructure is, “How much is it going to cost me when my site goes down?” If your site is down, especially for a long period of time, you could be losing customers and profits. One way to minimise this is to create a highly available environment on the cloud. This means that there is a ‘redundancy’ plan in place to minimise the chances of your site being offline for even a minute.

SEO Ranking

Having a good SEO ranking isn’t purely based on your content. If your site is extremely slow to load, or doesn’t load at all, the ‘secret Google bots’ will push your site further and further down the results page. We recommend using a CDN (Content Delivery Network) such as Cloudflare to help improve performance.

Security

This may seem like a fairly obvious concern, but making sure you have regular security updates and patches is vital, especially if credit cards or money transfers are involved on your site. Obviously there is no one way to combat every security concern on the internet; however, having regular backups and 24/7 support will help in any situation.

Scalability 

What happens when you have a sale or run an advertising campaign and suddenly have a flurry of traffic to your site? In order for your site to be able to cope with the new influx, it needs to be scalable. A good hosting provider can make your site scalable so that there is no downtime when your site is hit with a heavy traffic load. Generally, the best direction to follow when scalability is a priority is the cloud or Amazon Web Services. The best part is that you only pay for what you use, and hosting on the Amazon infrastructure also gives you an SLA (Service Level Agreement) guaranteeing 99.95% uptime.

Stress-Free Support

Finally, a good hosting provider will take away any stress that is related to hosting. If your site goes down at 3am, you don’t want to be the person having to deal with it. At Anchor, we have a team of expert Sysadmins available 24/7 to take the stress out of keeping your site up and online.

With these 5 points in mind, you can now make 2017 your year, and beat the game that is eCommerce.

If you have security concerns, are experiencing slow page loads, or are even seeing downtime, we can perform a free ecommerce site assessment to help define a hosting roadmap that will allow you to speed ahead of the competition. If you would simply like to learn more about eCommerce hosting on Anchor’s award-winning hosting network, contact our friendly staff and they will get back to you ASAP.

The post Tips on Winning the ecommerce Game appeared first on AWS Managed Services by Anchor.

Android apps, IMEIs and privacy

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/46266.html

There’s been a sudden wave of people concerned about the Meitu selfie app’s use of unique phone IDs. Here’s what we know: the app will transmit your phone’s IMEI (a unique per-phone identifier that can’t be altered under normal circumstances) to servers in China. It’s able to obtain this value because it asks for a permission called READ_PHONE_STATE, which (if granted) means that the app can obtain various bits of information about your phone including those unique IDs and whether you’re currently on a call.

Why would anybody want these IDs? The simple answer is that app authors mostly make money by selling advertising, and advertisers like to know who’s seeing their advertisements. The more app views they can tie to a single individual, the more they can track that user’s response to different kinds of adverts and the more targeted (and, they hope, more profitable) the advertising towards that user. Using the same ID between multiple apps makes this easier, and so using a device-level ID rather than an app-level one is preferred. The IMEI is the most stable ID on Android devices, persisting even across factory resets.

The downside of using a device-level ID is, well, whoever has that data knows a lot about what you’re running. That lets them tailor adverts to your tastes, but there are certainly circumstances where that could be embarrassing or even compromising. Using the IMEI for this is even worse, since it’s also used for fundamental telephony functions – for instance, when a phone is reported stolen, its IMEI is added to a blacklist and networks will refuse to allow it to join. A sufficiently malicious person could potentially report your phone stolen and get it blocked by providing your IMEI. And phone networks are obviously able to track devices using them, so someone with enough access could figure out who you are from your app usage and then track you via your IMEI. But realistically, anyone with that level of access to the phone network could just identify you via other means. There’s no reason to believe that this is part of a nefarious Chinese plot.

Is there anything you can do about this? On Android 6 and later, yes. Go to settings, hit apps, hit the gear menu in the top right, choose “App permissions” and scroll down to phone. Under there you’ll see all apps that have permission to obtain this information, and you can turn them off. Doing so may cause some apps to crash or otherwise misbehave, whereas newer apps may simply ask you to grant the permission again and refuse to work if you don’t.

Meitu isn’t especially rare in this respect. Over 50% of the Android apps I have handy request your IMEI, although I haven’t tracked what they all do with it. It’s certainly something to be concerned about, but Meitu isn’t especially rare here – there are big-name apps that do exactly the same thing. There’s a legitimate question over whether Android should be making it so easy for apps to obtain this level of identifying information without more explicit informed consent from the user, but until Google do anything to make it more difficult, apps will continue making use of this information. Let’s turn this into a conversation about user privacy online rather than blaming one specific example.

Notes about the FTC action against D-Link

Post Syndicated from Robert Graham original http://blog.erratasec.com/2017/01/notes-about-ftc-action-against-d-link.html

Today, the FTC filed a lawsuit[*] against D-Link for security problems, such as backdoor passwords. I thought I’d write up some notes.

The suit is not about “product liability”, but about “unfair and deceptive” business practices for promising “security”. In addition, the FTC interprets “security” differently from the way the cybersecurity community does.

This needs to be stressed because right now in our industry, there is a big discussion of product liability, insisting that everything attached to the Internet needs to be secured. People will therefore assume the FTC action is based on “liability”.

Instead, all six counts are based upon the fact that D-Link offers its products for securing networks, and claims they are secure. Because they have backdoor passwords, clear-text passwords, command-injection bugs, and public private-keys, the FTC feels the claims of security to be untrue.

The key point I’m trying to make is that D-Link can resolve the suit (in theory) by simply removing all claims of “security”. Sure, it can claim it supports stateful-inspection firewalls and WPA2, but not things like “WPA2 security”. (Sure, the FTC may come back with a new lawsuit — but it would solve the points raised in this one).

On the other hand, while “deception” is the law the FTC uses, their obvious real intent is to improve security. They intend for D-Link to remove its security weaknesses, not to change its claims. The lawsuit is also intended to scare all IoT makers into securing their products, not to remove claims of security.

We see this intent in other posts on the FTC website. They’ve long been talking about IoT security. Recently, they announced a contest giving out $25,000 to the best solution for patching out-of-date IoT devices [*]. It’s a silly contest, but shows what their real intent is.

Thus, the language of the lawsuit is very much about improving security, while the actual counts are about unfair/deceptive practices.

This is nonsense for a number of reasons. Among their claims is that D-Link lied to their customers for saying “you need to change the default password to secure the device”, because the device still had a command-injection bug. That’s a shocking departure from common sense. We in the cybersecurity community repeatedly advise people to change passwords to make devices more secure, ignoring any other insecurity that might exist. It means I’m just as deceptive as D-Link is.

The FTC’s action is a clear violation of “due process”. They didn’t establish, ahead of time, a standard for which bugs would make a product “insecure”, but instead arbitrarily punished D-Link for not meeting an unknown standard of “secure”. They never published a document saying “you can’t advertise your product as being ‘secure’ if it contains this list of problems”.

More to the point, their idea of “secure” is at odds with the cybersecurity community. We would indeed describe WPA2 as secure, regardless of some other feature of the device that makes it insecure. Most IoT devices are intended to be used behind a firewall anyway, so the only attack surface is the WiFi network. In such cases, the device can have backdoor passwords up the ying-yang, and we in the cybersecurity community will still call this “secure”.

This is important because no product will ever be perfectly secure. Ten years from now, hackers will still discover some bug in some IoT product that nobody considered before, and the FTC will come down on them and punish them for deceptive practice. This is also counterproductive to the FTC’s goals: if they are going to be so unfair about it, they are going to create incentives for companies to produce the wrong solution, to stop advertising their products as “secure”.

The consequence of this action against D-Link is that the FTC is going to create an enormous chilling effect on innovation. As apps and IoT devices proliferate, the FTC is going to punish those on the forefront creating new and innovative products. At the same time, it’s going to have little impact on actual security. They’ll raise the price of brand-name products, while still being unable to target the white-box/no-name products that contain most of the vulnerabilities.


D-Link makes a standard claim that we always make in the security industry:

…and then the FTC sues them for it.


AWS Hot Startups – A Year in Review

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-hot-startups-a-year-in-review/

It is the end of 2016! Tina Barr has a great roundup of all the startups we featured this year. Check it out and see if you managed to read about them all, then come back in January when we start up all over again.

Also, I wanted to thank Elsa Mayer for her hard work in helping out with the startup posts.

-Ana


 

What a year it has been for startups! We began the Hot Startups series in March as a way to feature exciting AWS-powered startups and the motivation behind the products and services they offer. Since then we’ve featured 27 startups across a wide range of industries including healthcare, commerce, social, finance, and many more. Each startup offers a unique product to its users. Whether it’s an entertaining app or website, an educational platform, or a product to help businesses grow, these startups have the ability to reach customers far and wide.

The startups we showcased this year are headquartered all over the world. Check them out on the map below!

startup map

In case you missed any of the posts, here is a list of all the hot startups we featured in 2016:

March

  • Intercom – One place for every team in an Internet business to see and talk
    to customers, personally, at scale.
  • Tile – A popular key locator product that works with an app to help people find their stuff.
  • Bugsnag – A tool to capture and analyze runtime errors in production web and mobile applications.
  • DroneDeploy  – Making the sky productive and accessible for everyone.

April

  • Robinhood – Free stock trading to democratize access to financial markets.
  • Dubsmash – Bringing joy to communication through video.
  • Sharethrough – An all-in-one native advertising platform.

June

  • Shaadi.com – Helping South Asians to find a companion for life.
  • Capillary – Boosting customer engagement for e-commerce.
  • Monzo – A mobile-first bank.

July

  • Depop – A social mobile marketplace for artists and friends to buy and sell products.
  • Nextdoor – Building stronger and safer neighborhoods through technology.
  • Branch – Provides free deep linking technology for mobile app developers to gain and retain users.

August

  • Craftsvilla – Offering a platform to purchase ethnic goods.
  • SendBird – Helping developers build 1-on-1 messaging and group chat quickly.
  • Teletext.io – A solution for content management, without the system.
  • Wavefront – A cloud-based analytics platform.

September

  • Funding Circle – The leading online marketplace for business loans.
  • Karhoo – A ride comparison app.
  • nearbuy – Connecting customers and local merchants across India.

October

  • Optimizely – Providing web and mobile A/B testing for the world’s leading brands.
  • Touch Surgery – Building technologies for the global surgical community.
  • WittyFeed – Creating viral content.

November

  • AwareLabs – Helping small businesses build smart websites.
  • Doctor On Demand – Delivering fast, easy, and cost-effective access to top healthcare providers.
  • Starling Bank – Mobile banking for the next generation.
  • VigLink – Powering content-driven commerce.

Thank you for keeping up with us as we shared these startups’ amazing stories throughout the year. Be sure to check back here in January for our first hot startups of 2017!

-Tina Barr

WWW Malware Hides in Images

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2016/12/www_malware_hid.html

There’s a new malware toolkit that uses steganography to hide in images:

For the past two months, a new exploit kit has been serving malicious code hidden in the pixels of banner ads via a malvertising campaign that has been active on several high profile websites.

Discovered by security researchers from ESET, this new exploit kit is named Stegano, from the word steganography, which is a technique of hiding content inside other files.

In this particular scenario, malvertising campaign operators hid malicious code inside PNG images used for banner ads.

The crooks took a PNG image and altered the transparency value of several pixels. They then packed the modified image as an ad, for which they bought ad displays on several high-profile websites.

Since a large number of advertising networks allow advertisers to deliver JavaScript code with their ads, the crooks also included JS code that would parse the image, extract the pixel transparency values, and using a mathematical formula, convert those values into a character.
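
The article doesn’t reproduce the attackers’ formula, but the general idea of hiding data in pixel transparency can be sketched in a few lines of Python with Pillow. The toy scheme below encodes one character per pixel as a small offset from full opacity; it is an invented illustration, not Stegano’s actual encoding.

```python
# Simplified illustration of alpha-channel steganography, in the spirit of the
# Stegano campaign described above. The scheme here is invented for
# demonstration and is NOT the exploit kit's actual formula.
from PIL import Image  # third-party: pip install Pillow

def hide(message: str, src_png: str, dst_png: str) -> None:
    """Encode one ASCII character per pixel as a small alpha offset (PNG only)."""
    img = Image.open(src_png).convert("RGBA")
    pixels = list(img.getdata())
    stego = []
    for i, (r, g, b, a) in enumerate(pixels):
        if i < len(message):
            a = 255 - ord(message[i])       # nudge the transparency slightly
        stego.append((r, g, b, a))
    img.putdata(stego)
    img.save(dst_png)                        # PNG keeps the alpha channel lossless

def extract(stego_png: str, length: int) -> str:
    """Read the alpha values back and convert them into characters."""
    img = Image.open(stego_png).convert("RGBA")
    alphas = [a for (_, _, _, a) in list(img.getdata())[:length]]
    return "".join(chr(255 - a) for a in alphas)
```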

Slashdot thread.

Succeeding MegaZeux

Post Syndicated from Eevee original https://eev.ee/blog/2016/10/06/succeeding-megazeux/

In the beginning, there was ZZT. ZZT was a set of little shareware games for DOS that used VGA text mode for all the graphics, leading to such whimsical Rogue-like choices as ä for ammo pickups, Ω for lions, and ♀ for keys. It also came with an editor, including a small programming language for creating totally custom objects, which gave it the status of “game creation system” and a legacy that survives even today.

A little later on, there was MegaZeux. MegaZeux was something of a spiritual successor to ZZT, created by (as I understand it) someone well-known for her creative abuse of ZZT’s limitations. It added quite a few bells and whistles, most significantly a built-in font editor, which let aspiring developers draw simple sprites rather than rely on whatever they could scrounge from the DOS font.

And then…

And then, nothing. MegaZeux was updated for quite a while, and (unlike ZZT) has even been ported to SDL so it can actually run on modern operating systems. But there was never a third entry in this series, another engine worthy of calling these its predecessors.

I think that’s a shame.

The legacy

Plenty of people have never heard of ZZT, and far more have never heard of MegaZeux, so here’s a brief primer.

Both were released as “first-episode” shareware: they came with one game free, and you could pony up some cash to get the sequels. Those first games — Town of ZZT and Caverns of Zeux — have these moderately iconic opening scenes.

Town of ZZT
Caverns of Zeux

In the intervening decades, all of the sequels have been released online for free. If you want to try them yourself, ZZT 3.2 includes Town of ZZT and its sequels (but must be run in DOSBox), and you can get MegaZeux 2.84c, Caverns of Zeux, and the rest of the Zeux series separately.

Town of ZZT has you, the anonymous player, wandering around a loosely-themed “town” in search of five purple keys. It’s very much a game of its time: the setting is very vague but manages to stay distinct and memorable with very light touches; the puzzles range from trivial to downright cruel; the interface itself fights against you, as you can’t carry more than one purple key at a time; and the game can be softlocked in numerous ways, only some of which have advance warning in the form of “SAVE!!!” carved directly into the environment.

The armory, and a gruff guardian
Darkness, which all players love
A few subtle hints

Caverns of Zeux is a little more cohesive, with a (thin) plot that unfolds as you progress through the game. Your objectives are slightly vaguer; you start out only knowing you’re trapped in a cave, and further information must be gleaned from NPCs. The gameplay is shaken up a couple times throughout — you discover spellbooks that give you new abilities, but later lose your primary weapon. The meat of the game is more about exploring and less about wacky Sokoban puzzles, though with many of the areas looking very similar and at least eight different-colored doors scattered throughout the game, the backtracking can get frustrating.

A charming little town
A chasm with several gem holders
The ice caves, or maybe caverns

Those are obviously a bit retro-looking now, but they’re not bad for VGA text made by individual hobbyists in 1991 and 1994. ZZT only even uses CGA’s eight bright colors. MegaZeux takes a bit more advantage of VGA capabilities to let you edit the palette as well as the font, but games are still restricted to only using 16 colors at one time.

The font ZZT was stuck with
MegaZeux's default character set

That’s great, but who cares?

A fair question!

ZZT and MegaZeux both occupy a unique game development niche. It’s the same niche as (Z)Doom, I think, and a niche that very few other tools fill.

I’ve mumbled about this on Twitter a couple times, and several people have suggested that the PICO-8 or Mario Maker might be in the same vein. I disagree wholeheartedly! ZZT, MegaZeux, and ZDoom all have two critical — and rare — things in common.

  1. You can crack open the editor, draw a box, and have a game. On the PICO-8, you are a lonely god in an empty void; you must invent physics from scratch before you can do anything else. ZZT, MegaZeux, and Doom all have enough built-in gameplay to make a variety of interesting levels right out of the gate. You can treat them as nothing more than level editors, and you’ll be hitting the ground running — no code required. And unlike most “no programming” GCSes, I mean that literally!

  2. If and when you get tired of only using the built-in objects, you can extend the engine. ZZT and MegaZeux have programmable actors built right in. Even vanilla Doom was popular enough to gain a third-party tool, DEHACKED, which could edit the compiled doom.exe to customize actor behavior. Mario Maker might be a nice and accessible environment for making games, but at the end of the day, the only thing you can make with it is Mario.

Both of these properties together make for a very smooth learning curve. You can open the editor and immediately make something, rather than needing to absorb a massive pile of upfront stuff before you can even get a sprite on the screen. Once you need to make small tweaks, you can dip your toes into robots — a custom pickup that gives you two keys at once is four lines of fairly self-explanatory code. Want an NPC with a dialogue tree? That’s a little more complex, but not much. And then suddenly you discover you’re doing programming. At the same time, you get rendering, movement, combat, collision, health, death, pickups, map transitions, menus, dialogs, saving/loading… all for free.

MegaZeux has one more nice property, the art learning curve. The built-in font is perfectly usable, but a world built from monochrome 8×14 tiles is a very comfortable place to dabble in sprite editing. You can add eyebrows to the built-in player character or slightly reshape keys to fit your own tastes, and the result will still fit the “art style” of the built-in assets. Want to try making your own sprites from scratch? Go ahead! It’s much easier to make something that looks nice when you don’t have to worry about color or line weight or proportions or any of that stuff.

It’s true that we’re in an “indie” “boom” right now, and more game-making tools are available than ever before. A determined game developer can already choose from among dozens (hundreds?) of editors and engines and frameworks and toolkits and whatnot. But the operative word there is “determined”. Not everyone has their heart set on this. The vast majority of people aren’t interested in devoting themselves to making games, so the most they’d want to do (at first) is dabble.

But programming is a strange and complex art, where dabbling can be surprisingly difficult. If you want to try out art or writing or music or cooking or dance or whatever, you can usually get started with some very simple tools and a one-word Google search. If you want to try out game development, it usually requires programming, which in turn requires a mountain of upfront context and tool choices and explanations and mysterious incantations and forty-minute YouTube videos of some guy droning on in monotone.

To me, the magic of MegaZeux is that anyone with five minutes to spare can sit down, plop some objects around, and have made a thing.

Deep dive

MegaZeux has a lot of hidden features. It also has a lot of glass walls. Is that a phrase? It should be a phrase. I mean that it’s easy to find yourself wanting to do something that seems common and obvious, yet find out quite abruptly that it’s structurally impossible.

I’m not leading towards a conclusion here, only thinking out loud. I want to explain what makes MegaZeux interesting, but also explain what makes MegaZeux limiting, but also speculate on what might improve on it. So, you know, something for everyone.

Big picture

MegaZeux is a top-down adventure-ish game engine. You can make platformers, if you fake your own gravity; you can make RPGs, if you want to build all the UI that implies.

MegaZeux games can only be played in, well, MegaZeux. Games that need instructions and multiple downloads to be played are fighting an uphill battle. It’s a simple engine that seems reasonable to deploy to the web, and I’ve heard of a couple attempts at either reimplementing the engine in JavaScript or throwing the whole shebang at emscripten, but none are yet viable.

People have somewhat higher expectations from both games and tools nowadays. But approachability is often at odds with flexibility. The more things you explicitly support, the more complicated and intimidating the interface — or the more hidden features you have to scour the manual to even find out about.

I’ve looked through the advertising screenshots of Game Maker and RPG Maker, and I’m amazed how many things are all over the place at any given time. It’s like trying to configure the old Mozilla Suite. Every new feature means a new checkbox somewhere, and eventually half of what new authors need to remember is the set of things they can safely ignore.

SLADE’s Doom map editor manages to be much simpler, but I’m not particularly happy with that, either — it’s not clever enough to save you from your mistakes (or necessarily detect them), and a lot of the jargon makes no sense unless you’ve already learned what it means somewhere else. Plus, making the most of ZDoom’s extra features tends to involve navigating ten different text files that all have different syntax and different rules.

MegaZeux has your world, some menus with objects in them, and spacebar to place something. The UI is still very DOS-era, but once you get past that, it’s pretty easy to build something.

How do you preserve that in something “modern”? I’m not sure. The only remotely-similar thing I can think of is Mario Maker, which cleverly hides a lot of customization options right in the world editor UI: placing wings on existing objects, dropping objects into blocks, feeding mushrooms to enemies to make them bigger. The downside is that Mario Maker has quite a lot of apocryphal knowledge that isn’t written down anywhere. (That’s not entirely a downside… but I could write a whole other post just exploring that one sentence.)

Graphics

Oh, no.

Graphics don’t make the game, but they’re a significant limiting factor for MegaZeux. Fixing everything to a grid means that even a projectile can only move one tile at a time. Only one character can be drawn per grid space, so objects can’t usefully be drawn on top of each other. Animations are difficult, since they eat into your 255-character budget, which limits real-time visual feedback. Most individual objects are a single tile — creating anything larger requires either a lot of manual work to keep all the parts together, or the use of multi-tile sprites which don’t quite exist on the board.

And yet! The same factors are what make MegaZeux very accessible. The tiles are small and simple enough that different art styles don’t really clash. Using a grid means simple games don’t have to think about collision detection at all. A monochromatic font can be palette-shifted, giving you colorful variants of the same objects for free.

How could you scale up the graphics but preserve the charm and approachability? Hmm.

I think the palette restrictions might be important here, but merely bumping from 2 to 8 colors isn’t quite right. The palette-shifting in MegaZeux always makes me think of keys first, and multi-colored keys make me think of Chip’s Challenge, where the key sprites were simple but lightly shaded.

All four Chip's Challenge 2 keys

The game has to contain all four sprites separately. If you wanted to have a single sprite and get all of those keys by drawing it in different colors, you’d have to specify three colors per key: the base color, a lighter color, and a darker color. In other words, a ramp — a short gradient, chosen from a palette, that can represent the same color under different lighting. Here are some PICO-8 ramps, for example. What about a sprite system that drew sprites in terms of ramps rather than individual colors?

A pixel-art door in eight different color schemes

I whipped up this crappy example to illustrate. All of the doors are fundamentally the same image, and all of them use only eight colors: black, transparent, and two ramps of three colors each. The top-left door could be expressed as just “light gray” and “blue” — those colors would be expanded into ramps automatically, and black would remain black.

I don’t know how well this would work, but I’d love to see someone try it. It may not even be necessary to require all sprites be expressed this way — maybe you could import your own truecolor art if you wanted. ZDoom works kind of this way, though it’s more of a historical accident: it does support arbitrary PNGs, but vanilla Doom sprites use a custom format that’s in terms of a single global palette, and only that custom format can be subjected to palette manipulation.
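
To make the ramp idea a little more concrete, here is a tiny sketch of what ramp-relative sprites could look like. Everything in it (the ramp values, the sprite data, the function names) is invented for illustration; a real system would also want several ramps per sprite, as in the doors above.

# A sketch of ramp-based sprite coloring: each pixel names a slot in a ramp
# (0 = dark, 1 = base, 2 = light) instead of a concrete color, so the same
# sprite can be redrawn with any ramp. Ramp values and the sprite are invented.

RAMPS = {
    "blue":  [(0, 0, 96),   (32, 64, 192),   (128, 160, 255)],
    "gray":  [(64, 64, 64), (128, 128, 128), (208, 208, 208)],
    "green": [(0, 64, 0),   (32, 160, 32),   (144, 224, 144)],
}
BLACK = (0, 0, 0)

# "." = transparent, "k" = black outline, 0/1/2 = dark/base/light ramp slot.
DOOR = [
    ["k", "k", "k", "k"],
    ["k",  2,   1,  "k"],
    ["k",  1,   0,  "k"],
    [".", "k", "k", "."],
]

def render(sprite, ramp_name):
    """Expand a ramp-relative sprite into concrete RGB pixels (None = transparent)."""
    ramp = RAMPS[ramp_name]
    def pixel(cell):
        if cell == ".":
            return None
        if cell == "k":
            return BLACK
        return ramp[cell]
    return [[pixel(cell) for cell in row] for row in sprite]

blue_door = render(DOOR, "blue")    # same art,
green_door = render(DOOR, "green")  # different color scheme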


Now, MegaZeux has the problem that small sprites make it difficult to draw bigger things like UI (or a non-microscopic player). The above sprites are 32×32 (scaled up 2× for ease of viewing here), which creates the opposite problem: you can’t possibly draw text or other smaller details with them.

I wonder what could be done here. I know that the original Pokémon games have a concept of “metatiles”: every map is defined in terms of 4×4 blocks of smaller tiles. You can see it pretty clearly on this map of Pallet Town. Each larger square is a metatile, and many of them repeat, even in areas that otherwise seem different.

Pallet Town from Pokémon Red, carved into blocks

I left the NPCs in because they highlight one of the things I found most surprising about this scheme. All the objects you interact with — NPCs, signs, doors, items, cuttable trees, even the player yourself — are 16×16 sprites. The map appears to be made out of 16×16 sprites, as well — but it’s really built from 8×8 tiles arranged into bigger 32×32 tiles.

This isn’t a particularly nice thing to expose directly to authors nowadays, but it demonstrates that there are other ways to compose tiles besides the obvious. Perhaps simple terrain like grass and dirt could be single large tiles, but you could also make a large tile by packing together several smaller tiles?
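
For what it's worth, the metatile scheme itself is simple to sketch. Here's a rough model of a map stored as metatile indices, where each metatile is a 2×2 block of base tiles for brevity (the Pokémon scheme uses 4×4 blocks); all the tile numbers are made up.

# A sketch of metatile expansion: the map is stored as indices into a small
# table of metatiles, and each metatile is a 2x2 block of base tile indices.
# The tile numbers here are invented; real games use much larger tables.

METATILES = {
    0: [[ 1,  1],    # plain grass
        [ 1,  1]],
    1: [[ 4,  5],    # top of a house
        [ 6,  7]],
    2: [[20, 21],    # flowers and grass
        [ 1, 20]],
}

# A tiny 3x2 map, one entry per metatile.
MAP = [
    [0, 2, 0],
    [1, 1, 0],
]

def expand(metatile_map):
    """Flatten a metatile map into a plain grid of base tile indices."""
    rows = []
    for map_row in metatile_map:
        top, bottom = [], []
        for index in map_row:
            block = METATILES[index]
            top.extend(block[0])
            bottom.extend(block[1])
        rows.append(top)
        rows.append(bottom)
    return rows

# expand(MAP) yields a 4x6 grid of base tiles ready to draw.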

Text? Oh, text can just be a font.

Player status

MegaZeux has no HUD. To know how much health you have, you need to press Enter to bring up the pause menu, where your health is listed in a stack of other numbers like “gems” and “coins”. I say “menu”, but the pause menu is really a list of keyboard shortcuts, not something you can scroll through and choose items from.

MegaZeux's in-game menu, showing a list of keyboard shortcuts on the left and some stats on the right

To be fair, ZZT does reserve the right side of the screen for your stats, and it puts health at the top. I find myself scanning the MegaZeux pause menu for health every time, which seems a somewhat poor choice for the number that makes the game end when you run out of it.

Unlike most adventure games, your health is an integer starting at 100, not a small number of hearts or whatever. The only feedback when you take damage is a sound effect and an “Ouch!” at the bottom of the screen; you don’t flinch, recoil, or blink. Health pickups might give you any amount of health, you can pick up health beyond 100, and nothing on the screen tells you how much you got when you pick one up. Keeping track of your health in your head is, ah, difficult.

MegaZeux also has a system of multiple lives, but those are also just a number, and the default behavior on “death” is that your health resets to 100 and absolutely nothing else happens. Walking into lava (which hurts for 100 at a time) will thus kill you and strip you of all your lives quite rapidly.

It is possible to manually create a HUD in MegaZeux using the “overlay” layer, a layer that gets drawn on top of everything else in the world. The downside is that you then can’t use the overlay for anything in-world, like roofs or buildings that can be walked behind. The overlay can be in multiple modes, one that’s attached to the viewport (like a HUD) and one that’s attached to the world (like a ceiling layer), so an obvious first step would be offering these as separate features.

An alternative is to use sprites, blocks of tiles created and drawn as a single unit by Robotic code. Sprites can be attached to the viewport and can even be drawn above the overlay, though they aren’t exposed in the editor and must be created entirely manually. Promising, if clumsy and a bit non-obvious — I only just now found out about this possibility by glancing at an obscure section of the manual.

Another looming problem is that text is the same size as everything else — but you generally want a HUD to be prominent enough to glance at very quickly.

This makes me wonder how more advanced drawing could work in general. Instead of writing code by hand to populate and redraw your UI, could you just drag and drop some obvious components (like “value of this number”) onto a layer? Reuse the same concept for custom dialogs and menus, perhaps?

Inventory

MegaZeux has no inventory. Or, okay, it has sort of an inventory, but it’s all over the place.

The stuff in the pause menu is kind of like an inventory. It counts ammo, gems, coins, two kinds of bombs, and a variety of keys for you. The game also has multiple built-in objects that can give you specific numbers of gems and coins, which is neat, except that gems and coins don’t actually do anything. I think they increase your score, but until now I’d forgotten that MegaZeux has a score.

A developer can also define six named “counters” (i.e., integers) that will show up on the pause menu when nonzero. Caverns of Zeux uses this to show you how many rainbow gems you’ve discovered… but it’s just a number labeled RainbowGems, and there’s no way to see which ones you have.

Other than that, you’re on your own. All of the original Zeux games made use of an inventory, so this is a really weird oversight. Caverns of Zeux also had spellbooks, but you could only see which ones you’d found by trying to use them and seeing if it failed. Chronos Stasis has maybe a dozen items you can collect and no way to see which ones you have — though, to be fair, you use most of them in the same place. Forest of Ruin has a fairly standard inventory, but no way to view it. All three games have at least one usable item that they just bind to a key, which you’d better remember, because it’s game-specific and thus not listed in the general help file.

To be fair, this is preposterously flexible in a way that a general inventory might not be. But it’s also tedious for game authors and potentially confusing for players.

I don’t think an inventory would be particularly difficult to support, and MegaZeux is already halfway there. Most likely, the support is missing because it would need to be based on some concept of a custom object, and MegaZeux doesn’t have that either. I’ll get to that in a bit.

Creating new objects

MegaZeux allows you to create “robots”, objects that are controlled entirely through code you write in a simple programming language. You can copy and paste robots around as easily as any other object on the map. Cool.

What’s less cool is that robots can’t share code — when you place one, you make a separate copy of all of its code. If you create a small horde of custom monsters, then later want to make a change, you’ll have to copy/paste all the existing ones. Hope you don’t have them on other boards!

Some workarounds exist: you could make use of robots’ ability to copy themselves at runtime, and it’s possible to save or load code to/from an external file at runtime. More cumbersome than defining a template object and dropping it wherever you want, and definitely much less accessible.

This is really, really bad, because the only way to extend any of the builtin objects is to replace them with robots!

I’m a little spoiled by ZDoom, where you can create as many kinds of actor as you want. Actors can even inherit from one another, though the mechanism is a little limited and… idiosyncratic, so I wouldn’t call it beginner-friendly. It’s pretty nice to be able to define a type of monster or decoration and drop it all over a map, and I’m surprised such a thing doesn’t exist in MegaZeux, where boards and the viewport both tend to be fairly large.

This is the core of how ZDoom’s inventory works, too. I believe that inventories contain only kinds, not individual actors — that is, you can have 5 red keys, but the game only knows “5 of RedCard” rather than having five distinct RedCard objects. I’m sure part of the reason MegaZeux has no general-purpose inventory is that every custom object is completely distinct, with nothing fundamentally linking even identical copies of the same robot together.
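
For what it's worth, the "count kinds, not instances" approach is easy to model. Here's a minimal sketch with invented item names; it isn't how ZDoom actually stores things, just the shape of the idea.

# A sketch of a kind-based inventory: items are counted by type, so "5 red
# keys" is one entry with a count of 5 rather than five separate objects.
from collections import Counter

class Inventory:
    def __init__(self):
        self.counts = Counter()

    def add(self, kind, amount=1):
        self.counts[kind] += amount

    def use(self, kind, amount=1):
        """Consume items of a kind; returns False if there aren't enough."""
        if self.counts[kind] < amount:
            return False
        self.counts[kind] -= amount
        return True

inv = Inventory()
inv.add("RedCard", 5)
inv.use("RedCard")        # True; four left
inv.use("BlueCard")       # False; never picked one up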

Combat

By default, the player can shoot bullets by holding Space and pressing a direction. (Moving and shooting at the same time is… difficult.) Like everything else, bullets are fixed to the character grid, so they move an entire tile at a time.

Bullets can also destroy other projectiles, sometimes. A bullet hitting another bullet will annihilate both. A bullet hitting a fireball might either turn the fireball into a regular fire tile or simply be destroyed, depending on which animation frame the fireball is in when the bullet hits it. I didn’t know this until someone told me only a couple weeks ago; I’d always just thought it was random and arbitrary and frustrating. Seekers can’t be destroyed at all.

Most enemies charge directly at you; most are killed in one hit; most attack you by colliding with you; most are also destroyed by the collision.

The (built-in) combat is fairly primitive. It gives you something to do, but it’s not particularly satisfying, which is unfortunate for an adventure game engine.

Several factors conspire here. Graphical limitations make it difficult to give much visual feedback when something (including the player) takes damage or is destroyed. The motion of small, fast-moving objects on a fixed grid can be hard to keep track of. No inventory means weapons aren’t objects, either, so custom weapons need to be implemented separately in the global robot. No custom objects means new enemies and projectiles are difficult to create. No visual feedback means hitscan weapons are implausible.

I imagine some new and interesting directions would make themselves obvious in an engine with a higher resolution and custom objects.

Robotic

Robotic is MegaZeux’s programming language for defining the behavior of robots, and it’s one of the most interesting parts of the engine. A robot that acts like an item giving you two keys might look like this:

end
: "touch"
* "You found two keys!"
givekey c04
givekey c05
die as an item
MegaZeux's Robotic editor

Robotic has no blocks, loops, locals, or functions — though recent versions can fake functions by using special jumps. All you get is a fixed list of a few hundred commands. It’s effectively a form of bytecode assembly, with no manual assembling required.

And yet! For simple tasks, it works surprisingly well. Creating a state machine, as in the code above, is straightforward. end stops execution, since all robots start executing from their first line on start. : "touch" is a label (:"touch" is invalid syntax) — all external stimuli are received as jumps, and touch is a special label that a robot jumps to when the player pushes against it. * displays a message in the colorful status line at the bottom of the screen. givekey gives a key of a specific color — colors are a first-class argument type, complete with their own UI in the editor and an automatic preview of the particular colors. die as an item destroys the robot and simultaneously moves the player on top of it, as though the player had picked it up.

A couple other interesting quirks:

  • Most prepositions, articles, and other English glue words are semi-optional and shown in grey. The line die as an item above has as an greyed out, indicating that you could just type die item and MegaZeux would fill in the rest. You could also type die as item, die an item, or even die through item, because all of as, an, and through act like whitespace. Most commands sprinkle a few of these in to make themselves read a little more like English and clarify the order of arguments.

  • The same label may appear more than once. However, labels may be zapped, and a jump will always go to the first non-zapped occurrence of a label. This lets an author encode a robot’s state within the state of its own labels, obviating the need for state-tracking variables in many cases. (Zapping labels predates per-robot variables — “local counters” — which are unhelpfully named local through local32.)

    Of course, this can rapidly spiral out of control when state changes are more complicated or several labels start out zapped or different labels are zapped out of step with each other. Robotic offers no way to query how many of a label have been zapped and MegaZeux has no debugger for label states, so it’s not hard to lose track of what’s going on. Still, it’s an interesting extension atop a simple label-based state machine. (There’s a small sketch of these jump semantics just after this list.)

  • The built-in types often have some very handy shortcuts. For example, GO [dir] # tells a robot to move in some direction, some number of spaces. The directions you’d expect all work: NORTH, SOUTH, EAST, WEST, and synonyms like N and UP. But there are some extras like RANDNB to choose a random direction that doesn’t block the robot, or SEEK to move towards the player, or FLOW to continue moving in its current direction. Some of the extras only make sense in particular contexts, which complicates them a little, but the ability to tell an NPC to wander aimlessly with only RANDNB is incredible.

  • Robotic is more powerful than you might expect; it can change anything you can change in the editor, emulate the behavior of most other builtins, and make use of several features not exposed in the editor at all.
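
As mentioned in the second item above, here's a rough model of how zapped labels resolve jumps. It's purely an illustration of the semantics described there, not MegaZeux's actual implementation, and the restore behavior in particular is a guess.

# A sketch of zapped-label semantics: a robot's program is a list of lines,
# labels can occur more than once, and a jump goes to the first occurrence of
# that label that hasn't been zapped.

class LabelState:
    def __init__(self, program):
        # program: list of (kind, text) lines, e.g. ("label", "touch")
        self.program = program
        self.zapped = [False] * len(program)

    def _occurrences(self, name):
        return [i for i, (kind, text) in enumerate(self.program)
                if kind == "label" and text == name]

    def zap(self, name, count=1):
        """Disable the first `count` non-zapped occurrences of a label."""
        for i in self._occurrences(name):
            if count == 0:
                break
            if not self.zapped[i]:
                self.zapped[i] = True
                count -= 1

    def restore(self, name, count=1):
        """Re-enable the most recently zapped occurrences of a label."""
        for i in reversed(self._occurrences(name)):
            if count == 0:
                break
            if self.zapped[i]:
                self.zapped[i] = False
                count -= 1

    def jump(self, name):
        """Return the line a jump to `name` lands on, or None if all are zapped."""
        for i in self._occurrences(name):
            if not self.zapped[i]:
                return i
        return None

robot = LabelState([
    ("label", "touch"), ("cmd", '* "Locked."'),
    ("label", "touch"), ("cmd", '* "It opens!"'),
])
assert robot.jump("touch") == 0
robot.zap("touch")                 # the first "touch" is now skipped...
assert robot.jump("touch") == 2    # ...so the next jump reaches the second one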

Nowadays, the obvious choice for an embedded language is Lua. It’d be much more flexible, to be sure, but it’d lose a little of the charm. One of the advantages of creating a totally custom language for a game is that you can add syntax for very common engine-specific features, like colors; in a general-purpose language, those are a little clumsier.

As a rough sketch, here’s what the same two-key robot might look like against a hypothetical Lua API (none of these names are real):
function myrobot:ontouch(toucher)
    if not toucher.is_player then
        return false
    end
    world:showstatus("You found two keys!")
    toucher.inventory:add(Key{color=world.colors.RED})
    toucher.inventory:add(Key{color=world.colors.PURPLE})
    self:die()
    return true
end

Changing the rules

MegaZeux has a couple kinds of built-in objects that are difficult to replicate — and thus difficult to customize.

One is projectiles, mentioned earlier. Several variants exist, and a handful of specific behaviors can be toggled with board or world settings, but otherwise that’s all you get. It should be feasible to replicate them all with robots, but I suspect it’d involve a lot of subtleties.

Another is terrain. MegaZeux has a concept of a floor layer (though this is not explicitly exposed in the editor) and some floor tiles have different behavior. Ice is slippery; forest blocks almost everything but can be trampled by the player; lava hurts the player a lot; fire hurts the player and can spread, but burns out after a while. The trick with replicating these is that robots cannot be walked on. An alternative is to use sensors, which can be walked on and which can be controlled by a robot, but anything other than the player will push a sensor rather than stepping onto it. The only other approach I can think of is to keep track of all tiles that have a custom terrain, draw or animate them manually with custom floor tiles, and constantly check whether something’s standing there.

Last are powerups, which are really effects that rings or potions can give you. Some of them are special cases of effects that Robotic can do more generally, such as giving 10 health or changing all of one object into another. Some are completely custom engine stuff, like “Slow Time”, which makes everything on the board (even robots!) run at half speed. The latter are the ones you can’t easily emulate. What if you want to run everything at a quarter speed, for whatever reason? Well, you can’t, short of replacing everything with robots and doing a multiplication every time they wait.

ZDoom has a similar problem: it offers fixed sets of behaviors and powerups (which mostly derive from the commercial games it supports) and that’s it. You can manually script other stuff and go quite far, but some surprisingly simple ideas are very difficult to implement, just because the engine doesn’t offer the right kind of hook.

The tricky part of a generic engine is that a game creator will eventually want to change the rules, and they can only do that if the engine has rules for changing those rules. If the engine devs never thought of it, you’re out of luck.

Someone else please carry on this legacy

MegaZeux still sees development activity, but it’s very sporadic — the last release was in 2012. New features tend to be about making the impossible possible, rather than making the difficult easier. I think it’s safe to call MegaZeux finished, in the sense that a novel is finished.

I would really like to see something pick up its torch. It’s a very tricky problem, especially with the sprawling complexity of games, but surely it’s worth giving non-developers a way to try out the field.

I suppose if ZZT and MegaZeux and ZDoom have taught us anything, it’s that the best way to get started is to just write a game and give it very flexible editing tools. Maybe we should do that more. Maybe I’ll try to do it with Isaac’s Descent HD, and we’ll see how it turns out.

Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics

Post Syndicated from Chris Marshall original https://blogs.aws.amazon.com/bigdata/post/Tx1XNQPQ2ARGT81/Real-time-Clickstream-Anomaly-Detection-with-Amazon-Kinesis-Analytics

Chris Marshall is a Solutions Architect for Amazon Web Services

Analyzing web log traffic to gain insights that drive business decisions has historically been performed using batch processing.  While effective, this approach results in delayed responses to emerging trends and user activities.  There are solutions to deal with processing data in real time using streaming and micro-batching technologies, but they can be complex to set up and maintain.  Amazon Kinesis Analytics is a managed service that makes it very easy to identify and respond to changes in behavior in real time.

One use case where it’s valuable to have immediate insights is analyzing clickstream data.   In the world of digital advertising, an impression is when an ad is displayed in a browser and a clickthrough represents a user clicking on that ad.  A clickthrough rate (CTR) is one way to monitor the ad’s effectiveness.  CTR is calculated in the form of: CTR = Clicks / Impressions * 100.  Digital marketers are interested in monitoring CTR to know when particular ads perform better than normal, giving them a chance to optimize placements within the ad campaign.  They may also be interested in anomalous low-end CTR that could be a result of a bad image or bidding model.

In this post, I show an analytics pipeline which detects anomalies in real time for a web traffic stream, using the RANDOM_CUT_FOREST function available in Amazon Kinesis Analytics. 

RANDOM_CUT_FOREST 

Amazon Kinesis Analytics includes a powerful set of analytics functions to analyze streams of data.  One such function is RANDOM_CUT_FOREST.  This function detects anomalies by scoring data flowing through a dynamic data stream. This novel approach identifies a normal pattern in streaming data, then compares new data points in reference to it. For more information, see Robust Random Cut Forest Based Anomaly Detection On Streams.

Analytics pipeline components

To demonstrate how the RANDOM_CUT_FOREST function can be used to detect anomalies in real-time click through rates, I will walk you through how to build an analytics pipeline and generate web traffic using a simple Python script.  When your injected anomaly is detected, you get an email or SMS message to alert you to the event. 

This post first walks through the components of the analytics pipeline, then discusses how to build and test it in your own AWS account.  The accompanying AWS CloudFormation script builds the Amazon API Gateway API, Amazon Kinesis streams, AWS Lambda function, Amazon SNS components, and IAM roles required for this walkthrough. Then you manually create the Amazon Kinesis Analytics application that uses the RANDOM_CUT_FOREST function in SQL. 

Web Client

Because Python is a popular scripting language available on many clients, we’ll use it to generate web impressions and click data by making HTTP GET requests with the requests library.  This script mimics beacon traffic generated by web browsers.  The GET request contains a query string value of click or impression to indicate the type of beacon being captured.  The script has a while loop that iterates 2500 times.  For each pass through the loop, an Impression beacon is sent.  Then, the random function is used to potentially send an additional click beacon.  For most of the passes through the loop, the click request is generated roughly 10% of the time.  However, for some web requests, the CTR jumps to a rate of 50%, representing an anomaly.  These are artificially high CTR rates used for illustration in this post, but the functionality would be the same using small fractional values.

import requests
import random
import sys
import argparse

def getClicked(rate):
	if random.random() <= rate:
		return True
	else:
		return False

def httpGetImpression():
	url = args.target + '?browseraction=Impression' 
	r = requests.get(url)

def httpGetClick():
	url = args.target + '?browseraction=Click' 
	r = requests.get(url)
	sys.stdout.write('+')

parser = argparse.ArgumentParser()
parser.add_argument("target",help=" the http(s) location to send the GET request")
args = parser.parse_args()
i = 0
while (i < 2500):
	httpGetImpression()
	if(i<1950 or i>=2000):
		clicked = getClicked(.1)
		sys.stdout.write('_')
	else:
		clicked = getClicked(.5)
		sys.stdout.write('-')
	if(clicked):
		httpGetClick()
	i = i + 1
	sys.stdout.flush()

Web “front door”

Client web requests are handled by Amazon API Gateway, a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. It handles the web request traffic that gets data into your analytics pipeline by acting as a service proxy to an Amazon Kinesis stream.  API Gateway takes click and impression web traffic, converts the HTTP header and query string data into a JSON message, and places it into your stream.  The CloudFormation script creates an API endpoint with a body mapping template similar to the following:

#set($inputRoot = $input.path('$'))
{
  "Data": "$util.base64Encode("{ ""browseraction"" : ""$input.params('browseraction')"", ""site"" : ""$input.params('Host')"" }")",
  "PartitionKey" : "$input.params('Host')",
  "StreamName" : "CSEKinesisBeaconInputStream"
}

(API Gateway service proxy body mapping template)

This tells API Gateway to take the host header and query string inputs and convert them into a payload to be put into a stream using a service proxy for Amazon Kinesis Streams.
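
To see what that mapping produces, here is a small sketch of the record API Gateway would hand to Kinesis Streams for a single click beacon. The host name is invented; the field names come from the template above.

# Sketch of the record the body mapping template above produces for one
# click beacon. The host value is hypothetical; the Data field is the JSON
# document base64-encoded, exactly as the template specifies.
import base64
import json

browseraction = "Click"
host = "abc123.execute-api.us-east-1.amazonaws.com"   # hypothetical API host

payload = '{ "browseraction" : "%s", "site" : "%s" }' % (browseraction, host)
record = {
    "Data": base64.b64encode(payload.encode()).decode(),
    "PartitionKey": host,
    "StreamName": "CSEKinesisBeaconInputStream",
}
print(json.dumps(record, indent=2))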

Input stream

Amazon Kinesis Streams can capture terabytes of data per hour from thousands of sources.  In this case, you use it to deliver your JSON messages to Amazon Kinesis Analytics. 

Analytics application

Amazon Kinesis Analytics processes the incoming messages and allows you to perform analytics on the dynamic stream of data.  You use SQL to calculate the CTR and pass that into your RANDOM_CUT_FOREST function to calculate an anomaly score. 

First, calculate the counts of impressions and clicks using SQL with a tumbling window in Amazon Kinesis Analytics.  Analytics uses streams and pumps in SQL to process data.  See our earlier post to get an overview of streaming data with Kinesis Analytics. Inside the Analytics SQL, a stream is analogous to a table and a pump is the flow of data into those tables.  Here’s the description of streams for the impression and click data.

CREATE OR REPLACE STREAM "CLICKSTREAM" ( 
   "CLICKCOUNT" DOUBLE
);


CREATE OR REPLACE STREAM "IMPRESSIONSTREAM" ( 
   "IMPRESSIONCOUNT" DOUBLE
);

Later, you calculate the clicks divided by impressions; you are using a DOUBLE data type to contain the count values.  If you left them as integers, the division below would yield only a 0 or 1.  Next, you define the pumps that populate the stream using a tumbling window, a window of time during which all records received are considered as part of the statement. 

CREATE OR REPLACE PUMP "CLICKPUMP" STARTED AS 
INSERT INTO "CLICKSTREAM" ("CLICKCOUNT") 
SELECT STREAM COUNT(*) 
FROM "SOURCE_SQL_STREAM_001"
WHERE "browseraction" = 'Click'
GROUP BY FLOOR(
  ("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00')
    SECOND / 10 TO SECOND
);
CREATE OR REPLACE PUMP "IMPRESSIONPUMP" STARTED AS 
INSERT INTO "IMPRESSIONSTREAM" ("IMPRESSIONCOUNT") 
SELECT STREAM COUNT(*) 
FROM "SOURCE_SQL_STREAM_001"
WHERE "browseraction" = 'Impression'
GROUP BY FLOOR(
  ("SOURCE_SQL_STREAM_001".ROWTIME - TIMESTAMP '1970-01-01 00:00:00')
    SECOND / 10 TO SECOND
);

The GROUP BY statement uses FLOOR and ROWTIME to create a tumbling window of 10 seconds.  This tumbling window will output one record every ten seconds assuming we are receiving data from the corresponding pump.  In this case, you output the total number of records that were clicks or impressions. 
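
If it helps to see the windowing outside SQL, here is a rough offline equivalent of that bucketing in Python, assuming a simple list of (timestamp, browseraction) records; this is only a sketch of the logic, not what Kinesis Analytics does internally.

# Rough offline analogue of the 10-second tumbling window: bucket each record
# by flooring its timestamp to a 10-second boundary, then count clicks and
# impressions per bucket. The (timestamp, browseraction) record format is an
# assumption for illustration.
from collections import defaultdict

def tumbling_counts(records, window_seconds=10):
    buckets = defaultdict(lambda: {"Click": 0, "Impression": 0})
    for timestamp, browseraction in records:
        bucket = int(timestamp // window_seconds) * window_seconds
        buckets[bucket][browseraction] += 1
    return dict(buckets)

def ctr_per_window(records, window_seconds=10):
    """CTR = Clicks / Impressions * 100, one value per window."""
    out = {}
    for bucket, counts in tumbling_counts(records, window_seconds).items():
        if counts["Impression"]:
            out[bucket] = counts["Click"] / counts["Impression"] * 100
    return out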

Next, use the output of these pumps to calculate the CTR:

CREATE OR REPLACE STREAM "CTRSTREAM" (
  "CTR" DOUBLE
);

CREATE OR REPLACE PUMP "CTRPUMP" STARTED AS 
INSERT INTO "CTRSTREAM" ("CTR")
SELECT STREAM "CLICKCOUNT" / "IMPRESSIONCOUNT" * 100 as "CTR"
FROM "IMPRESSIONSTREAM",
  "CLICKSTREAM"
WHERE "IMPRESSIONSTREAM".ROWTIME = "CLICKSTREAM".ROWTIME;

Finally, these CTR values are used in RANDOM_CUT_FOREST to detect anomalous values.  The DESTINATION_SQL_STREAM is the output stream of data from the Amazon Kinesis Analytics application. 

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "CTRPERCENT" DOUBLE,
    "ANOMALY_SCORE" DOUBLE
);

CREATE OR REPLACE PUMP "OUTPUT_PUMP" STARTED AS 
INSERT INTO "DESTINATION_SQL_STREAM" 
SELECT STREAM * FROM
TABLE (RANDOM_CUT_FOREST( 
             CURSOR(SELECT STREAM "CTR" FROM "CTRSTREAM"), --inputStream
             100, --numberOfTrees (default)
             12, --subSampleSize 
             100000, --timeDecay (default)
             1) --shingleSize (default)
)
WHERE ANOMALY_SCORE > 2; 

The RANDOM_CUT_FOREST function greatly simplifies the programming required for anomaly detection.  However, understanding your data domain is paramount when performing data analytics.  The RANDOM_CUT_FOREST function is a tool for data scientists, not a replacement for them.  Knowing whether your data is logarithmic, circadian rhythmic, linear, etc. will provide the insights necessary to select the right parameters for RANDOM_CUT_FOREST.  For more information about parameters, see the RANDOM_CUT_FOREST Function.

Fortunately, the default values work in a wide variety of cases. In this case, use the default values for all but the subSampleSize parameter.  Typically, you would use a larger sample size to increase the pool of random samples used to calculate the anomaly score; for this post, use 12 samples so as to start evaluating the anomaly scores sooner.  

Your SQL query outputs one record every ten seconds from the tumbling window so you’ll have enough evaluation values after two minutes to start calculating the anomaly score.  You are also using a cutoff value where records are only output to “DESTINATION_SQL_STREAM” if the anomaly score from the function is greater than 2 using the WHERE clause. To help visualize the cutoff point, here are the data points from a few runs through the pipeline using the sample Python script:

As you can see, the majority of the CTR data points are grouped around a 10% rate.  As the values move away from the 10% rate, their anomaly scores go up because they are more infrequent.  In the graph, an anomaly score above the cutoff is shaded orange. A cutoff value of 2 works well for this data set.  

Output stream

The Amazon Kinesis Analytics application outputs the values from “DESTINATION_SQL_STREAM” to an Amazon Kinesis stream when the record has an anomaly score greater than 2.

Anomaly execution function

The output stream is an input event for an AWS Lambda function with a batch size of 1, so the function gets called one time for each message put in the output stream.  This function represents a place where you could react to anomalous events.  In a real world scenario, you may want to instead update a real-time bidding system to react to current events.  In this example, you send a message to SNS to see a tangible response to the changing CTR.  The CloudFormation template creates a Lambda function similar to the following:

var AWS = require('aws-sdk');
var sns = new AWS.SNS( { region: "<region>" });
	exports.handler = function(event, context) {
	//our batch size is 1 record so loop expected to process only once
	event.Records.forEach(function(record) {
		// Kinesis data is base64 encoded so decode here
		var payload = new Buffer(record.kinesis.data, 'base64').toString('ascii');
		var rec = payload.split(',');
		var ctr = rec[0];
		var anomaly_score = rec[1];
		var detail = 'Anomaly detected with a click through rate of ' + ctr + '% and an anomaly score of ' + anomaly_score;
		var subject = 'Anomaly Detected';
		var params = {
			Message: detail,
			MessageStructure: 'String',
			Subject: subject,
			TopicArn: 'arn:aws:sns:us-east-1::ClickStreamEvent'
		};
		sns.publish(params, function(err, data) {
			if (err) context.fail(err.stack);
			else     context.succeed('Published Notification');
		});
	});
}; 

Notification system

Amazon SNS is a fully managed, highly scalable messaging platform.  For this walkthrough, you send messages to SNS with subscribers that send email and text messages to a mobile phone.

Analytics pipeline walkthrough

Sometimes it’s best to build out a solution so you can see all the parts working and get a good sense of how it works.   Here are the steps to build out the entire pipeline as described above in your own account and perform real-time clickstream analysis yourself.  Following this walkthrough creates resources in your account that you can use to test the example pipeline.

Prerequisites

To complete the walkthrough, you’ll need an AWS account and a place to execute Python scripts.

Configure the CloudFormation template

Download the CloudFormation template named CSE-cfn.json from the AWS Big Data blog Github repository. In the CloudFormation console, set your region to N. Virginia, Oregon, or Ireland and then browse to that file.

When your script executes and an anomaly is detected, SNS sends a notification.  To see these in an email or text message, enter your email address and mobile phone number in the Parameters section and choose Next.

This CloudFormation script creates policies and roles necessary to process the incoming messages.  Acknowledge this by selecting the checkbox, or you will get a warning about CAPABILITY_IAM.

If you entered a valid email or SMS number, you will receive a validation message when the SNS subscriptions are created by CloudFormation.  If you don’t wish to receive email or texts from your pipeline, you don’t need to complete the verification process.  

Run the Python script

Copy ClickImpressionGenerator.py to a local folder.

This script uses the requests library to make HTTP requests.  If you don’t have that Python package globally installed, you can install it locally by running the following command in the same folder as the Python script:

pip install requests -t .

When the CloudFormation stack is complete, choose Outputs, ExecutionCommand, and run the command from the folder with the Python script. 

This will start generating web traffic and send it to the API Gateway in the front of your analytics pipeline.  It is important to have data flowing into your Amazon Kinesis Analytics application for the next section so that a schema can be generated based on the incoming data.  If the script completes before you finish with the next section, simply restart the script with the same command.

Create the Analytics pipeline application

Open the Amazon Kinesis console and choose Go to Analytics.

Choose Create new application to create the Kinesis Analytics application, enter an application name and description, and then choose Save and continue

Choose Connect to a source to see a list of streams.  Select the stream that starts with the name of your CloudFormation stack and contains CSEKinesisBeaconInputStream, in the form of <stack name>-CSEKinesisBeaconInputStream-<random string>.  This is the stream that your click and impression requests will be sent to from API Gateway. 

Scroll down and choose Choose an IAM role.  Select the role with the name in the form of <stack name>-CSEKinesisAnalyticsRole-<random string>.  If your ClickImpressionGenerator.py script is still running and generating data, you should see a stream sample.  Scroll to the bottom and choose Save and continue

Next, add the SQL for your real-time analytics.  Choose Go to SQL editor, choose Yes, start application, and paste the SQL contents from CSE-SQL.sql into the editor. Choose Save and run SQL.  After a minute or so, you should see data showing up every ten seconds in the CLICKSTREAM.  Choose Exit when you are done editing. 

Choose Connect to a destination and select the <stack name>-CSEKinesisBeaconOutputStream-<random string> from the list of streams.  Change the output format to CSV. 

Choose Choose an IAM role and select <stack name>-CSEKinesisAnalyticsRole-<random string> again.  Choose Save and continue.  Now your analytics pipeline is complete.  When your script gets to the injected anomaly section, you should get a text message or email to notify you of the anomaly.  To clean up the resources created here, delete the stack in CloudFormation, stop the Kinesis Analytics application then delete it. 

Summary

A pipeline like this can be used for many use cases where anomaly detection is valuable. What solutions have you enabled with this architecture? If you have questions or suggestions, please comment below.

—————————–

Related

Process Amazon Kinesis Aggregated Data with AWS Lambda

Bluetooth LED bulbs

Post Syndicated from Matthew Garrett original http://mjg59.dreamwidth.org/43722.html

The best known smart bulb setups (such as the Philips Hue and the Belkin Wemo) are based on Zigbee, a low-energy, low-bandwidth protocol that operates on various unlicensed radio bands. The problem with Zigbee is that basically no home routers or mobile devices have a Zigbee radio, so to communicate with them you need an additional device (usually called a hub or bridge) that can speak Zigbee and also hook up to your existing home network. Requests are sent to the hub (either directly if you’re on the same network, or via some external control server if you’re on a different network) and it sends appropriate Zigbee commands to the bulbs.

But requiring an additional device adds some expense. People have attempted to solve this in a couple of ways. The first is building direct network connectivity into the bulbs, in the form of adding an 802.11 controller. Go through some sort of setup process[1], the bulb joins your network and you can communicate with it happily. Unfortunately adding wifi costs more than adding Zigbee, both in terms of money and power – wifi bulbs consume noticeably more power when “off” than Zigbee ones.

There’s a middle ground. There’s a large number of bulbs available from Amazon advertising themselves as Bluetooth, which is true but slightly misleading. They’re actually implementing Bluetooth Low Energy, which is part of the Bluetooth 4.0 spec. Implementing this requires both OS and hardware support, so older systems are unable to communicate. Android 4.3 devices tend to have all the necessary features, and modern desktop Linux is also fine as long as you have a Bluetooth 4.0 controller.

Bluetooth is intended as a low power communications protocol. Bluetooth Low Energy (or BLE) is even lower than that, running in a similar power range to Zigbee. Most semi-modern phones can speak it, so it seems like a pretty good choice. Obviously you lose the ability to access the device remotely, but given the track record on this sort of thing that’s arguably a benefit. There’s a couple of other downsides – the range is worse than Zigbee (but probably still acceptable for any reasonably sized house or apartment), and only one device can be connected to a given BLE server at any one time. That means that if you have the control app open while you’re near a bulb, nobody else can control that bulb until you disconnect.

The quality of the bulbs varies a great deal. Some of them are pure RGB bulbs and incapable of producing a convincing white at a reasonable intensity[2]. Some have additional white LEDs but don’t support running them at the same time as the colour LEDs, so you have the choice between colour or a fixed (and usually more intense) white. Some allow running the white LEDs at the same time as the RGB ones, which means you can vary the colour temperature of the “white” output.

But while the quality of the bulbs varies, the quality of the apps doesn’t really. They’re typically all dreadful, competing on features like changing bulb colour in time to music rather than on providing a pleasant user experience. And the whole “Only one person can control the lights at a time” thing doesn’t really work so well if you actually live with anyone else. I was dissatisfied.

I’d met Mike Ryan at Kiwicon a couple of years back after watching him demonstrate hacking a BLE skateboard. He offered a couple of good hints for reverse engineering these devices, the first being that Android already does almost everything you need. Hidden in the developer settings is an option marked “Enable Bluetooth HCI snoop log”. Turn that on and all Bluetooth traffic (including BLE) is dumped into /sdcard/btsnoop_hci.log. Enable it, start the app, make some changes, retrieve the file and check it out using Wireshark. Easy.

Conveniently, BLE is very straightforward when it comes to network protocol. The only thing you have is GATT, the Generic Attribute Profile. Using this you can read and write multiple characteristics. Each packet is limited to a maximum of 20 bytes. Most implementations use a single characteristic for light control, so it’s then just a matter of staring at the dumped packets until something jumps out at you. A pretty typical implementation is something like:

0x56,r,g,b,0x00,0xf0,0x00,0xaa

where r, g and b are each just a single byte representing the corresponding red, green or blue intensity. 0x56 presumably indicates a “Set the light to these values” command, 0xaa indicates end of command and 0xf0 indicates that it’s a request to set the colour LEDs. Sending 0x0f instead results in the previous byte (0x00 in this example) being interpreted as the intensity of the white LEDs. Unfortunately the bulb I tested that speaks this protocol didn’t allow you to drive the white LEDs at the same time as anything else – setting the selection byte to 0xff didn’t result in both sets of intensities being interpreted at once. Boo.
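To make that byte layout concrete, here is a minimal Python sketch that builds such packets. It assumes exactly the layout described above (selection byte 0xf0 drives the colour LEDs, 0x0f the white ones), and the function names are purely illustrative:

def rgb_packet(r, g, b):
    # "Set colour LEDs" packet: 0x56, r, g, b, 0x00, 0xf0, 0x00, 0xaa
    return bytes([0x56, r & 0xff, g & 0xff, b & 0xff, 0x00, 0xf0, 0x00, 0xaa])

def white_packet(intensity):
    # With selection byte 0x0f, the preceding byte is read as white intensity
    return bytes([0x56, 0x00, 0x00, 0x00, intensity & 0xff, 0x0f, 0x00, 0xaa])

# rgb_packet(0xff, 0x00, 0x00) -> full red; white_packet(0x80) -> half-intensity white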

You can test this out fairly easily using the gatttool app. Run hcitool lescan to look for the device (remember that it won’t show up if anything else is connected to it at the time), then do gatttool -b deviceid -I to get an interactive shell. Type connect to initiate a connection, and once connected send commands by doing char-write-cmd handle value using the handle obtained from your hci dump.
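If you would rather script this than type into gatttool, the following is a rough Python sketch using the bluepy library (bluepy is an assumption on my part, not something used above; gatttool works just as well). The MAC address and characteristic handle are placeholders: take the real handle from your HCI dump as described earlier.

from bluepy import btle

BULB_MAC = "AA:BB:CC:DD:EE:FF"   # placeholder: the bulb's address from hcitool lescan
CHAR_HANDLE = 0x0021             # placeholder: the handle seen in the HCI dump

bulb = btle.Peripheral(BULB_MAC)
try:
    # Set the colour LEDs to full red using the packet layout above
    bulb.writeCharacteristic(CHAR_HANDLE, bytes([0x56, 0xff, 0x00, 0x00, 0x00, 0xf0, 0x00, 0xaa]))
finally:
    bulb.disconnect()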

I did this successfully for various bulbs, but annoyingly hit a problem with one from Tikteck. The leading byte of each packet was clearly a counter, but the rest of the packet appeared to be garbage. For reasons best known to themselves, they’ve implemented application-level encryption on top of BLE. This was a shame, because they were easily the best of the bulbs I’d used – the white LEDs work in conjunction with the colour ones once you’re sufficiently close to white, giving you good intensity and letting you modify the colour temperature. That gave me incentive, but figuring out the protocol took quite some time. Earlier this week, I finally cracked it. I’ve put a Python implementation on GitHub. The idea is to tie it into Ulfire running on a central machine with a Bluetooth controller, making it possible for me to control the lights from multiple different apps simultaneously and also integrating with my Echo.

I’d write something about the encryption, but I honestly don’t know. Large parts of this make no sense to me whatsoever. I haven’t even had any gin in the past two weeks. If anybody can explain how anything that’s being done there makes any sense at all[3] that would be appreciated.

[1] typically via the bulb pretending to be an access point, but also these days through a terrifying hack involving spewing UDP multicast packets of varying lengths in order to broadcast the password to associated but unauthenticated devices and good god the future is terrifying

[2] For a given power input, blue LEDs produce more light than other colours. To get white with RGB LEDs you either need to have more red and green LEDs than blue ones (which costs more), or you need to reduce the intensity of the blue ones (which means your headline intensity is lower). Neither is appealing, so most of these bulbs will just give you a blue “white” if you ask for full red, green and blue

[3] Especially the bit where we calculate something from the username and password and then encrypt that using some random numbers as the key, then send 50% of the random numbers and 50% of the encrypted output to the device, because I can’t even
