Tag Archives: javascript

Integration With Zapier

Post Syndicated from Bozho original https://techblog.bozho.net/integration-with-zapier/

Integration is boring. And also inevitable. But I won’t be writing about enterprise integration patterns. Instead, I’ll explain how to create an app for integration with Zapier.

What is Zapier? It is a service that allows you tо connect two (or more) otherwise unconnected services via their APIs (or protocols). You can do stuff like “Create a Trello task from an Evernote note”, “publish new RSS items to Facebook”, “append new emails to a spreadsheet”, “post approaching calendar meeting to Slack”, “Save big email attachments to Dropbox”, “tweet all instagrams above a certain likes threshold”, and so on. In fact, it looks to cover mostly the same usecases as another famous service that I really like – IFTTT (if this then that), with my favourite use-case “Get a notification when the international space station passes over your house”. And all of those interactions can be configured via a UI.

Now that’s good for end users but what does it have to do with software development and integration? Zapier (unlike IFTTT, unfortunately), allows custom 3rd party services to be included. So if you have a service of your own, you can create an “app” and allow users to integrate your service with all the other 3rd party services. IFTTT offers a way to invoke web endpoints (including RESTful services), but it doesn’t allow setting headers, so that makes it quite limited for actual APIs.

In this post I’ll briefly explain how to write a custom Zapier app and then will discuss where services like Zapier stand from an architecture perspective.

The thing that I needed it for – to be able to integrate LogSentinel with any of the third parties available through Zapier, i.e. to store audit logs for events that happen in all those 3rd party systems. So how do I do that? There’s a tutorial that makes it look simple. And it is, with a few catches.

First, there are two tutorials – one in GitHub and one on Zapier’s website. And they differ slightly, which becomes tricky in some cases.

I initially followed the GitHub tutorial and had my build fail. It claimed the zapier platform dependency is missing. After I compared it with the example apps, I found out there’s a caret in front of the zapier platform dependency. Removing it just yielded another error – that my node version should be exactly 6.10.2. Why?

The Zapier CLI requires you have exactly version 6.10.2 installed. You’ll see errors and will be unable to proceed otherwise.

It appears that they are using AWS Lambda which is stuck on Node 6.10.2 (actually – it’s 6.10.3 when you check). The current major release is 8, so minus points for choosing … javascript for a command-line tool and for building sandboxed apps. Maybe other decisions had their downsides as well, I won’t be speculating. Maybe it’s just my dislike for dynamic languages.

So, after you make sure you have the correct old version on node, you call zapier init and make sure there are no carets, npm install and then zapier test. So far so good, you have a dummy app. Now how do you make a RESTful call to your service?

Zapier splits the programmable entities in two – “triggers” and “creates”. A trigger is the event that triggers the whole app, an a “create” is what happens as a result. In my case, my app doesn’t publish any triggers, it only accepts input, so I won’t be mentioning triggers (though they seem easy). You configure all of the elements in index.js (e.g. this one):

const log = require('./creates/log');
....
creates: {
    [log.key]: log,
}

The log.js file itself is the interesting bit – there you specify all the parameters that should be passed to your API call, as well as making the API call itself:

const log = (z, bundle) => {
  const responsePromise = z.request({
    method: 'POST',
    url: `https://api.logsentinel.com/api/log/${bundle.inputData.actorId}/${bundle.inputData.action}`,
    body: bundle.inputData.details,
	headers: {
		'Accept': 'application/json'
	}
  });
  return responsePromise
    .then(response => JSON.parse(response.content));
};

module.exports = {
  key: 'log-entry',
  noun: 'Log entry',

  display: {
    label: 'Log',
    description: 'Log an audit trail entry'
  },

  operation: {
    inputFields: [
      {key: 'actorId', label:'ActorID', required: true},
      {key: 'action', label:'Action', required: true},
      {key: 'details', label:'Details', required: false}
    ],
    perform: log
  }
};

You can pass the input parameters to your API call, and it’s as simple as that. The user can then specify which parameters from the source (“trigger”) should be mapped to each of your parameters. In an example zap, I used an email trigger and passed the sender as actorId, the sibject as “action” and the body of the email as details.

There’s one more thing – authentication. Authentication can be done in many ways. Some services offer OAuth, others – HTTP Basic or other custom forms of authentication. There is a section in the documentation about all the options. In my case it was (almost) an HTTP Basic auth. My initial thought was to just supply the credentials as parameters (which you just hardcode rather than map to trigger parameters). That may work, but it’s not the canonical way. You should configure “authentication”, as it triggers a friendly UI for the user.

You include authentication.js (which has the fields your authentication requires) and then pre-process requests by adding a header (in index.js):

const authentication = require('./authentication');

const includeAuthHeaders = (request, z, bundle) => {
  if (bundle.authData.organizationId) {
	request.headers = request.headers || {};
	request.headers['Application-Id'] = bundle.authData.applicationId
	const basicHash = Buffer(`${bundle.authData.organizationId}:${bundle.authData.apiSecret}`).toString('base64');
	request.headers['Authorization'] = `Basic ${basicHash}`;
  }
  return request;
};

const App = {
  // This is just shorthand to reference the installed dependencies you have. Zapier will
  // need to know these before we can upload
  version: require('./package.json').version,
  platformVersion: require('zapier-platform-core').version,
  authentication: authentication,
  
  // beforeRequest & afterResponse are optional hooks into the provided HTTP client
  beforeRequest: [
	includeAuthHeaders
  ]
...
}

And then you zapier push your app and you can test it. It doesn’t automatically go live, as you have to invite people to try it and use it first, but in many cases that’s sufficient (i.e. using Zapier when doing integration with a particular client)

Can Zapier can be used for any integration problem? Unlikely – it’s pretty limited and simple, but that’s also a strength. You can, in half a day, make your service integrate with thousands of others for the most typical use-cases. And not that although it’s meant for integrating public services rather than for enterprise integration (where you make multiple internal systems talk to each other), as an increasing number of systems rely on 3rd party services, it could find home in an enterprise system, replacing some functions of an ESB.

Effectively, such services (Zapier, IFTTT) are “Simple ESB-as-a-service”. You go to a UI, fill a bunch of fields, and you get systems talking to each other without touching the systems themselves. I’m not a big fan of ESBs, mostly because they become harder to support with time. But minimalist, external ones might be applicable in certain situations. And while such services are primarily aimed at end users, they could be a useful bit in an enterprise architecture that relies on 3rd party services.

Whether it could process the required load, whether an organization is willing to let its data flow through a 3rd party provider (which may store the intermediate parameters), is a question that should be answered in a case by cases basis. I wouldn’t recommend it as a general solution, but it’s certainly an option to consider.

The post Integration With Zapier appeared first on Bozho's tech blog.

Reactive Microservices Architecture on AWS

Post Syndicated from Sascha Moellering original https://aws.amazon.com/blogs/architecture/reactive-microservices-architecture-on-aws/

Microservice-application requirements have changed dramatically in recent years. These days, applications operate with petabytes of data, need almost 100% uptime, and end users expect sub-second response times. Typical N-tier applications can’t deliver on these requirements.

Reactive Manifesto, published in 2014, describes the essential characteristics of reactive systems including: responsiveness, resiliency, elasticity, and being message driven.

Being message driven is perhaps the most important characteristic of reactive systems. Asynchronous messaging helps in the design of loosely coupled systems, which is a key factor for scalability. In order to build a highly decoupled system, it is important to isolate services from each other. As already described, isolation is an important aspect of the microservices pattern. Indeed, reactive systems and microservices are a natural fit.

Implemented Use Case
This reference architecture illustrates a typical ad-tracking implementation.

Many ad-tracking companies collect massive amounts of data in near-real-time. In many cases, these workloads are very spiky and heavily depend on the success of the ad-tech companies’ customers. Typically, an ad-tracking-data use case can be separated into a real-time part and a non-real-time part. In the real-time part, it is important to collect data as fast as possible and ask several questions including:,  “Is this a valid combination of parameters?,””Does this program exist?,” “Is this program still valid?”

Because response time has a huge impact on conversion rate in advertising, it is important for advertisers to respond as fast as possible. This information should be kept in memory to reduce communication overhead with the caching infrastructure. The tracking application itself should be as lightweight and scalable as possible. For example, the application shouldn’t have any shared mutable state and it should use reactive paradigms. In our implementation, one main application is responsible for this real-time part. It collects and validates data, responds to the client as fast as possible, and asynchronously sends events to backend systems.

The non-real-time part of the application consumes the generated events and persists them in a NoSQL database. In a typical tracking implementation, clicks, cookie information, and transactions are matched asynchronously and persisted in a data store. The matching part is not implemented in this reference architecture. Many ad-tech architectures use frameworks like Hadoop for the matching implementation.

The system can be logically divided into the data collection partand the core data updatepart. The data collection part is responsible for collecting, validating, and persisting the data. In the core data update part, the data that is used for validation gets updated and all subscribers are notified of new data.

Components and Services

Main Application
The main application is implemented using Java 8 and uses Vert.x as the main framework. Vert.x is an event-driven, reactive, non-blocking, polyglot framework to implement microservices. It runs on the Java virtual machine (JVM) by using the low-level IO library Netty. You can write applications in Java, JavaScript, Groovy, Ruby, Kotlin, Scala, and Ceylon. The framework offers a simple and scalable actor-like concurrency model. Vert.x calls handlers by using a thread known as an event loop. To use this model, you have to write code known as “verticles.” Verticles share certain similarities with actors in the actor model. To use them, you have to implement the verticle interface. Verticles communicate with each other by generating messages in  a single event bus. Those messages are sent on the event bus to a specific address, and verticles can register to this address by using handlers.

With only a few exceptions, none of the APIs in Vert.x block the calling thread. Similar to Node.js, Vert.x uses the reactor pattern. However, in contrast to Node.js, Vert.x uses several event loops. Unfortunately, not all APIs in the Java ecosystem are written asynchronously, for example, the JDBC API. Vert.x offers a possibility to run this, blocking APIs without blocking the event loop. These special verticles are called worker verticles. You don’t execute worker verticles by using the standard Vert.x event loops, but by using a dedicated thread from a worker pool. This way, the worker verticles don’t block the event loop.

Our application consists of five different verticles covering different aspects of the business logic. The main entry point for our application is the HttpVerticle, which exposes an HTTP-endpoint to consume HTTP-requests and for proper health checking. Data from HTTP requests such as parameters and user-agent information are collected and transformed into a JSON message. In order to validate the input data (to ensure that the program exists and is still valid), the message is sent to the CacheVerticle.

This verticle implements an LRU-cache with a TTL of 10 minutes and a capacity of 100,000 entries. Instead of adding additional functionality to a standard JDK map implementation, we use Google Guava, which has all the features we need. If the data is not in the L1 cache, the message is sent to the RedisVerticle. This verticle is responsible for data residing in Amazon ElastiCache and uses the Vert.x-redis-client to read data from Redis. In our example, Redis is the central data store. However, in a typical production implementation, Redis would just be the L2 cache with a central data store like Amazon DynamoDB. One of the most important paradigms of a reactive system is to switch from a pull- to a push-based model. To achieve this and reduce network overhead, we’ll use Redis pub/sub to push core data changes to our main application.

Vert.x also supports direct Redis pub/sub-integration, the following code shows our subscriber-implementation:

vertx.eventBus().<JsonObject>consumer(REDIS_PUBSUB_CHANNEL_VERTX, received -> {

JsonObject value = received.body().getJsonObject("value");

String message = value.getString("message");

JsonObject jsonObject = new JsonObject(message);

eb.send(CACHE_REDIS_EVENTBUS_ADDRESS, jsonObject);

});

redis.subscribe(Constants.REDIS_PUBSUB_CHANNEL, res -> {

if (res.succeeded()) {

LOGGER.info("Subscribed to " + Constants.REDIS_PUBSUB_CHANNEL);

} else {

LOGGER.info(res.cause());

}

});

The verticle subscribes to the appropriate Redis pub/sub-channel. If a message is sent over this channel, the payload is extracted and forwarded to the cache-verticle that stores the data in the L1-cache. After storing and enriching data, a response is sent back to the HttpVerticle, which responds to the HTTP request that initially hit this verticle. In addition, the message is converted to ByteBuffer, wrapped in protocol buffers, and send to an Amazon Kinesis Data Stream.

The following example shows a stripped-down version of the KinesisVerticle:

public class KinesisVerticle extends AbstractVerticle {

private static final Logger LOGGER = LoggerFactory.getLogger(KinesisVerticle.class);

private AmazonKinesisAsync kinesisAsyncClient;

private String eventStream = "EventStream";

@Override

public void start() throws Exception {

EventBus eb = vertx.eventBus();

kinesisAsyncClient = createClient();

eventStream = System.getenv(STREAM_NAME) == null ? "EventStream" : System.getenv(STREAM_NAME);

eb.consumer(Constants.KINESIS_EVENTBUS_ADDRESS, message -> {

try {

TrackingMessage trackingMessage = Json.decodeValue((String)message.body(), TrackingMessage.class);

String partitionKey = trackingMessage.getMessageId();

byte [] byteMessage = createMessage(trackingMessage);

ByteBuffer buf = ByteBuffer.wrap(byteMessage);

sendMessageToKinesis(buf, partitionKey);

message.reply("OK");

}

catch (KinesisException exc) {

LOGGER.error(exc);

}

});

}

Kinesis Consumer
This AWS Lambda function consumes data from an Amazon Kinesis Data Stream and persists the data in an Amazon DynamoDB table. In order to improve testability, the invocation code is separated from the business logic. The invocation code is implemented in the class KinesisConsumerHandler and iterates over the Kinesis events pulled from the Kinesis stream by AWS Lambda. Each Kinesis event is unwrapped and transformed from ByteBuffer to protocol buffers and converted into a Java object. Those Java objects are passed to the business logic, which persists the data in a DynamoDB table. In order to improve duration of successive Lambda calls, the DynamoDB-client is instantiated lazily and reused if possible.

Redis Updater
From time to time, it is necessary to update core data in Redis. A very efficient implementation for this requirement is using AWS Lambda and Amazon Kinesis. New core data is sent over the AWS Kinesis stream using JSON as data format and consumed by a Lambda function. This function iterates over the Kinesis events pulled from the Kinesis stream by AWS Lambda. Each Kinesis event is unwrapped and transformed from ByteBuffer to String and converted into a Java object. The Java object is passed to the business logic and stored in Redis. In addition, the new core data is also sent to the main application using Redis pub/sub in order to reduce network overhead and converting from a pull- to a push-based model.

The following example shows the source code to store data in Redis and notify all subscribers:

public void updateRedisData(final TrackingMessage trackingMessage, final Jedis jedis, final LambdaLogger logger) {

try {

ObjectMapper mapper = new ObjectMapper();

String jsonString = mapper.writeValueAsString(trackingMessage);

Map<String, String> map = marshal(jsonString);

String statusCode = jedis.hmset(trackingMessage.getProgramId(), map);

}

catch (Exception exc) {

if (null == logger)

exc.printStackTrace();

else

logger.log(exc.getMessage());

}

}

public void notifySubscribers(final TrackingMessage trackingMessage, final Jedis jedis, final LambdaLogger logger) {

try {

ObjectMapper mapper = new ObjectMapper();

String jsonString = mapper.writeValueAsString(trackingMessage);

jedis.publish(Constants.REDIS_PUBSUB_CHANNEL, jsonString);

}

catch (final IOException e) {

log(e.getMessage(), logger);

}

}

Similarly to our Kinesis Consumer, the Redis-client is instantiated somewhat lazily.

Infrastructure as Code
As already outlined, latency and response time are a very critical part of any ad-tracking solution because response time has a huge impact on conversion rate. In order to reduce latency for customers world-wide, it is common practice to roll out the infrastructure in different AWS Regions in the world to be as close to the end customer as possible. AWS CloudFormation can help you model and set up your AWS resources so that you can spend less time managing those resources and more time focusing on your applications that run in AWS.

You create a template that describes all the AWS resources that you want (for example, Amazon EC2 instances or Amazon RDS DB instances), and AWS CloudFormation takes care of provisioning and configuring those resources for you. Our reference architecture can be rolled out in different Regions using an AWS CloudFormation template, which sets up the complete infrastructure (for example, Amazon Virtual Private Cloud (Amazon VPC), Amazon Elastic Container Service (Amazon ECS) cluster, Lambda functions, DynamoDB table, Amazon ElastiCache cluster, etc.).

Conclusion
In this blog post we described reactive principles and an example architecture with a common use case. We leveraged the capabilities of different frameworks in combination with several AWS services in order to implement reactive principles—not only at the application-level but also at the system-level. I hope I’ve given you ideas for creating your own reactive applications and systems on AWS.

About the Author

Sascha Moellering is a Senior Solution Architect. Sascha is primarily interested in automation, infrastructure as code, distributed computing, containers and JVM. He can be reached at [email protected]

 

 

SUPER game night 3: GAMES MADE QUICK??? 2.0

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/23/super-game-night-3-games-made-quick-2-0/

Game night continues with a smorgasbord of games from my recent game jam, GAMES MADE QUICK??? 2.0!

The idea was to make a game in only a week while watching AGDQ, as an alternative to doing absolutely nothing for a week while watching AGDQ. (I didn’t submit a game myself; I was chugging along on my Anise game, which isn’t finished yet.)

I can’t very well run a game jam and not play any of the games, so here’s some of them in no particular order! Enjoy!

These are impressions, not reviews. I try to avoid major/ending spoilers, but big plot points do tend to leave impressions.

Weather Quest, by timlmul

short · rpg · jan 2017 · (lin)/mac/win · free on itch · jam entry

Weather Quest is its author’s first shipped game, written completely from scratch (the only vendored code is a micro OO base). It’s very short, but as someone who has also written LÖVE games completely from scratch, I can attest that producing something this game-like in a week is a fucking miracle. Bravo!

For reference, a week into my first foray, I think I was probably still writing my own Tiled importer like an idiot.

Only Mac and Windows builds are on itch, but it’s a LÖVE game, so Linux folks can just grab a zip from GitHub and throw that at love.

FINAL SCORE: ⛅☔☀

Pancake Numbers Simulator, by AnorakThePrimordial

short · sim · jan 2017 · lin/mac/win · free on itch · jam entry

Given a stack of N pancakes (of all different sizes and in no particular order), the Nth pancake number is the most flips you could possibly need to sort the pancakes in order with the smallest on top. A “flip” is sticking a spatula under one of the pancakes and flipping the whole sub-stack over. There’s, ah, a video embedded on the game page with some visuals.

Anyway, this game lets you simulate sorting a stack via pancake flipping, which is surprisingly satisfying! I enjoy cleaning up little simulated messes, such as… incorrectly-sorted pancakes, I guess?

This probably doesn’t work too well as a simulator for solving the general problem — you’d have to find an optimal solution for every permutation of N pancakes to be sure you were right. But it’s a nice interactive illustration of the problem, and if you know the pancake number for your stack size of choice (which I wish the game told you — for seven pancakes, it’s 8), then trying to restore a stack in that many moves makes for a nice quick puzzle.

FINAL SCORE: \(\frac{18}{11}\)

Framed Animals, by chridd

short · metroidvania · jan 2017 · web/win · free on itch · jam entry

The concept here was to kill the frames, save the animals, which is a delightfully literal riff on a long-running AGDQ/SGDQ donation incentive — people vote with their dollars to decide whether Super Metroid speedrunners go out of their way to free the critters who show you how to walljump and shinespark. Super Metroid didn’t have a showing at this year’s AGDQ, and so we have this game instead.

It’s rough, but clever, and I got really into it pretty quickly — each animal you save gives you a new ability (in true Metroid style), and you get to test that ability out by playing as the animal, with only that ability and no others, to get yourself back to the most recent save point.

I did, tragically, manage to get myself stuck near what I think was about to be the end of the game, so some of the animals will remain framed forever. What an unsatisfying conclusion.

Gravity feels a little high given the size of the screen, and like most tile-less platformers, there’s not really any way to gauge how high or long your jump is before you leap. But I’m only even nitpicking because I think this is a great idea and I hope the author really does keep working on it.

FINAL SCORE: $136,596.69

Battle 4 Glory, by Storyteller Games

short · fighter · jan 2017 · win · free on itch · jam entry

This is a Smash Bros-style brawler, complete with the four players, the 2D play area in a 3D world, and the random stage obstacles showing up. I do like the Smash style, despite not otherwise being a fan of fighting games, so it’s nice to see another game chase that aesthetic.

Alas, that’s about as far as it got — which is pretty far for a week of work! I don’t know what more to say, though. The environments are neat, but unless I’m missing something, the only actions at your disposal are jumping and very weak melee attacks. I did have a good few minutes of fun fruitlessly mashing myself against the bumbling bots, as you can see.

FINAL SCORE: 300%

Icnaluferu Guild, Year Sixteen, by CHz

short · adventure · jan 2017 · web · free on itch · jam entry

Here we have the first of several games made with bitsy, a micro game making tool that basically only supports walking around, talking to people, and picking up items.

I tell you this because I think half of my appreciation for this game is in the ways it wriggled against those limits to emulate a Zelda-like dungeon crawler. Everything in here is totally fake, and you can’t really understand just how fake unless you’ve tried to make something complicated with bitsy.

It’s pretty good. The dialogue is entertaining (the rest of your party develops distinct personalities solely through oneliners, somehow), the riffs on standard dungeon fare are charming, and the Link’s Awakening-esque perspective walls around the edges of each room are fucking glorious.

FINAL SCORE: 2 bits

The Lonely Tapes, by JTHomeslice

short · rpg · jan 2017 · web · free on itch · jam entry

Another bitsy entry, this one sees you play as a Wal— sorry, a JogDawg, which has lost its cassette tapes and needs to go recover them!

(A cassette tape is like a VHS, but for music.)

(A VHS is—)

I have the sneaking suspicion that I missed out on some musical in-jokes, due to being uncultured swine. I still enjoyed the game — it’s always clear when someone is passionate about the thing they’re writing about, and I could tell I was awash in that aura even if some of it went over my head. You know you’ve done good if someone from way outside your sphere shows up and still has a good time.

FINAL SCORE: Nine… Inch Nails? They’re a band, right? God I don’t know write your own damn joke

Pirate Kitty-Quest, by TheKoolestKid

short · adventure · jan 2017 · win · free on itch · jam entry

I completely forgot I’d even given “my birthday” and “my cat” as mostly-joking jam themes until I stumbled upon this incredible gem. I don’t think — let me just check here and — yeah no this person doesn’t even follow me on Twitter. I have no idea who they are?

BUT THEY MADE A GAME ABOUT ANISE AS A PIRATE, LOOKING FOR TREASURE

PIRATE. ANISE

PIRATE ANISE!!!

This game wins the jam, hands down. 🏆

FINAL SCORE: Yarr, eight pieces o’ eight

CHIPS Mario, by NovaSquirrel

short · platformer · jan 2017 · (lin/mac)/win · free on itch · jam entry

You see this? This is fucking witchcraft.

This game is made with MegaZeux. MegaZeux games look like THIS. Text-mode, bound to a grid, with two colors per cell. That’s all you get.

Until now, apparently?? The game is a tech demo of “unbound” sprites, which can be drawn on top of the character grid without being aligned to it. And apparently have looser color restrictions.

The collision is a little glitchy, which isn’t surprising for a MegaZeux platformer; I had some fun interactions with platforms a couple times. But hey, goddamn, it’s free-moving Mario, in MegaZeux, what the hell.

(I’m looking at the most recently added games on DigitalMZX now, and I notice that not only is this game in the first slot, but NovaSquirrel’s MegaZeux entry for Strawberry Jam last February is still in the seventh slot. RIP, MegaZeux. I’m surprised a major feature like this was even added if the community has largely evaporated?)

FINAL SCORE: n/a, disqualified for being probably summoned from the depths of Hell

d!¢< pic, by 573 Games

short · story · jan 2017 · web · free on itch · jam entry

This is a short story about not sending dick pics. It’s very short, so I can’t say much without spoiling it, but: you are generally prompted to either text something reasonable, or send a dick pic. You should not send a dick pic.

It’s a fascinating artifact, not because of the work itself, but because it’s so terse that I genuinely can’t tell what the author was even going for. And this is the kind of subject where the author was, surely, going for something. Right? But was it genuinely intended to be educational, or was it tongue-in-cheek about how some dudes still don’t get it? Or is it side-eying the player who clicks the obviously wrong option just for kicks, which is the same reason people do it for real? Or is it commentary on how “send a dick pic” is a literal option for every response in a real conversation, too, and it’s not that hard to just not do it — unless you are one of the kinds of people who just feels a compulsion to try everything, anything, just because you can? Or is it just a quick Twine and I am way too deep in this? God, just play the thing, it’s shorter than this paragraph.

I’m also left wondering when it is appropriate to send a dick pic. Presumably there is a correct time? Hopefully the author will enter Strawberry Jam 2 to expound upon this.

FINAL SCORE: 3½” 😉

Marble maze, by Shtille

short · arcade · jan 2017 · win · free on itch · jam entry

Ah, hm. So this is a maze navigated by rolling a marble around. You use WASD to move the marble, and you can also turn the camera with the arrow keys.

The trouble is… the marble’s movement is always relative to the world, not the camera. That means if you turn the camera 30° and then try to move the marble, it’ll move at a 30° angle from your point of view.

That makes navigating a maze, er, difficult.

Camera-relative movement is the kind of thing I take so much for granted that I wouldn’t even think to do otherwise, and I think it’s valuable to look at surprising choices that violate fundamental conventions, so I’m trying to take this as a nudge out of my comfort zone. What could you design in an interesting way that used world-relative movement? Probably not the player, but maybe something else in the world, as long as you had strong landmarks? Hmm.

FINAL SCORE: ᘔ

Refactor: flight, by fluffy

short · arcade · jan 2017 · lin/mac/win · free on itch · jam entry

Refactor is a game album, which is rather a lot what it sounds like, and Flight is one of the tracks. Which makes this a single, I suppose.

It’s one of those games where you move down an oddly-shaped tunnel trying not to hit the walls, but with some cute twists. Coins and gems hop up from the bottom of the screen in time with the music, and collecting them gives you points. Hitting a wall costs you some points and kills your momentum, but I don’t think outright losing is possible, which is great for me!

Also, the monk cycles through several animal faces. I don’t know why, and it’s very good. One of those odd but memorable details that sits squarely on the intersection of abstract, mysterious, and a bit weird, and refuses to budge from that spot.

The music is great too? Really chill all around.

FINAL SCORE: 🎵🎵🎵🎵

The Adventures of Klyde

short · adventure · jan 2017 · web · free on itch · jam entry

Another bitsy game, this one starring a pig (humorously symbolized by a giant pig nose with ears) who must collect fruit and solve some puzzles.

This is charmingly nostalgic for me — it reminds me of some standard fare in engines like MegaZeux, where the obvious things to do when presented with tiles and pickups were to make mazes. I don’t mean that in a bad way; the maze is the fundamental environmental obstacle.

A couple places in here felt like invisible teleport mazes I had to brute-force, but I might have been missing a hint somewhere. I did make it through with only a little trouble, but alas — I stepped in a bad warp somewhere and got sent to the upper left corner of the starting screen, which is surrounded by walls. So Klyde’s new life is being trapped eternally in a nowhere space.

FINAL SCORE: 19/20 apples

And more

That was only a third of the games, and I don’t think even half of the ones I’ve played. I’ll have to do a second post covering the rest of them? Maybe a third?

Or maybe this is a ludicrous format for commenting on several dozen games and I should try to narrow it down to the ones that resonated the most for Strawberry Jam 2? Maybe??

Firefox 58 is out

Post Syndicated from ris original https://lwn.net/Articles/745180/rss

Firefox 58 has been released. “With this release, we’re building on the great foundation provided by our all-new Firefox Quantum browser. We’re optimizing the performance gains we released in 57 by improving the way we render graphics and cache JavaScript. We also made functional and privacy improvements to Firefox Screenshots. On Firefox for Android, we’ve added support for Progressive Web Apps (PWAs) so you can add websites to your home screen and use them like native apps.

"Skyfall attack" was attention seeking

Post Syndicated from Robert Graham original http://blog.erratasec.com/2018/01/skyfall-attack-was-attention-seeking.html

After the Meltdown/Spectre attacks, somebody created a website promising related “Skyfall/Solace” attacks. They revealed today that it was a “hoax”.

It was a bad hoax. It wasn’t a clever troll, parody, or commentary. It was childish behavior seeking attention.
For all you hate naming of security vulnerabilities, Meltdown/Spectre was important enough to deserve a name. Sure, from an infosec perspective, it was minor, we just patch and move on. But from an operating-system and CPU design perspective, these things where huge.
Page table isolation to fix Meltdown is a fundamental redesign of the operating system. What you learned in college about how Solaris, Windows, Linux, and BSD were designed is now out-of-date. It’s on the same scale of change as address space randomization.
The same is true of Spectre. It changes what capabilities are given to JavaScript (buffers and high resolution timers). It dramatically increases the paranoia we have of running untrusted code from the Internet. We’ve been cleansing JavaScript of things like buffer-overflows and type confusion errors, now we have to cleanse it of branch prediction issues.

Moreover, not only do we need to change software, we need to change the CPU. No, we won’t get rid of branch-prediction and out-of-order execution, but there things that can easily be done to mitigate these attacks. We won’t be recalling the billions of CPUs already shipped, and it will take a year before fixed CPUs appear on the market, but it’s still an important change. That we fix security through such a massive hardware change is by itself worthy of “names”.

Yes, the “naming” of vulnerabilities is annoying. A bunch of vulns named by their creators have disappeared, and we’ve stopped talking about them. On the other hand, we still talk about Heartbleed and Shellshock, because they were damn important. A decade from now, we’ll still be talking about Meltdown/Spectre. Even if they hadn’t been named by their creators, we still would’ve come up with nicknames to talk about them, because CVE numbers are so inconvenient.
Thus, the hoax’s mocking of the naming is invalid. It was largely incoherent rambling from somebody who really doesn’t understand the importance of these vulns, who uses the hoax to promote themselves.

When You Have A Blockchain, Everything Looks Like a Nail

Post Syndicated from Bozho original https://techblog.bozho.net/blockchain-everything-looks-like-nail/

Blockchain, AI, big data, NoSQL, microservices, single page applications, cloud, SOA. What do these have in common? They have been or are hyped. At some point they were “the big thing” du jour. Everyone was investigating the possibility of using them, everyone was talking about them, there were meetups, conferences, articles on Hacker news and reddit. There are more examples, of course (which is the javascript framework this month?) but I’ll focus my examples on those above.

Another thing they have in common is that they are useful. All of them have some pretty good applications that are definitely worth the time and investment.

Yet another thing they have in common is that they are far from universally applicable. I’ve argued that monoliths are often still the better approach and that microservices introduce too much complexity for the average project. Big Data is something very few organizations actually have; AI/machine learning can help a wide variety of problems, but it is just a tool in a toolbox, not the solution to all problems. Single page applications are great for, yeah, applications, but most websites are still websites, not feature-rich frontends – you don’t need an SPA for every type of website. NoSQL has solved niche issues, and issues of scale that few companies have had, but nothing beats a good old relational database for the typical project out there. “The cloud” is not always where you want your software to be; and SOA just means everything (ESBs, direct integrations, even microservices, according to some). And the blockchain – it seems to be having limited success beyond cryptocurrencies.

And finally, another trait many of them share is that the hype has settled down. Only yesterday I read an article about the “death of the microservices madness”. I don’t see nearly as many new NoSQL databases as a few years ago, some of the projects that have been popular have faded. SOA and “the cloud” are already “boring”, and we’ve realized we don’t actually have big data if it fits in an Excel spreadsheet. SPAs and AI are still high in popularity, but we are getting a good understanding as a community why and when they are useful.

But it seems that nuanced reality has never stopped us from hyping a particular technology or approach. And maybe that’s okay in order to get a promising, though niche, technology, the spotlight and let it shine in the particular usecases where it fits.

But countless projects have and will suffer from our collective inability to filter through these hypes. I’d bet millions of developer hours have been wasted in trying to use the above technologies where they just didn’t fit. It’s like that scene from Idiocracy where a guy tries to fit a rectangular figure into a circular hole.

And the new one is not “the blockchain”. I won’t repeat my rant, but in summary – it doesn’t solve many of the problems companies are trying to solve with it right now just because it’s cool. Or at least it doesn’t solve them better than existing solutions. Many pilots will be carried out, many hours will be wasted in figuring out why that thing doesn’t work. A few of those projects will be a good fit and will actually bring value.

Do you need to reach multi-party consensus for the data you store? Can all stakeholder support the infrastructure to run their node(s)? Do they have the staff to administer the node(s)? Do you need to execute distributed application code on the data? Won’t it be easier to just deploy RESTful APIs and integrate the parties through that? Do you need to store all the data, or just parts of it, to guarantee data integrity?

“If you have is a hammer, everything looks like a nail” as the famous saying goes. In the software industry we repeatedly find new and cool hammers and then try to hit as many nails as we can. But only few of them are actual nails. The rest remain ugly, hard to support, “who was the idiot that wrote this” and “I wasn’t here when the decisions were made” types of projects.

I don’t have the illusion that we will calm down and skip the next hypes. Especially if adding the hyped word to your company raises your stock price. But if there’s one thing I’d like people to ask themselves when choosing a technology stack, it is “do we really need that to solve our problems?”.

If the answer is really “yes”, then great, go ahead and deploy the multi-organization permissioned blockchain, or fork Ethereum, or whatever. If not, you can still do a project a home that you can safely abandon. And if you need some pilot project to figure out whether the new piece of technology would be beneficial – go ahead and try it. But have a baseline – the fact that it somehow worked doesn’t mean it’s better than old, tested models of doing the same thing.

The post When You Have A Blockchain, Everything Looks Like a Nail appeared first on Bozho's tech blog.

Eevee gained 2791 experience points

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/15/eevee-gained-2791-experience-points/

Eevee grew to level 31!

A year strongly defined by mixed success! Also, a lot of video games.

I ran three game jams, resulting in a total of 157 games existing that may not have otherwise, which is totally mindblowing?!

For GAMES MADE QUICK???, glip and I made NEON PHASE, a short little exploratory platformer. Honestly, I should give myself more credit for this and the rest of the LÖVE games I’ve based on the same codebase — I wove a physics engine (and everything else!) from scratch and it has held up remarkably well for a variety of different uses.

I successfully finished an HD version of Isaac’s Descent using my LÖVE engine, though it doesn’t have anything new over the original and I’ve only released it as a tech demo on Patreon.

For Strawberry Jam (NSFW!) we made fox flux (slightly NSFW!), which felt like a huge milestone: the first game where I made all the art! I mean, not counting Isaac’s Descent, which was for a very limited platform. It’s a pretty arbitrary milestone, yes, but it feels significant. I’ve been working on expanding the game into a longer and slightly less buggy experience, but the art is taking the longest by far. I must’ve spent weeks on player sprites alone.

We then set about working on Bolthaven, a sequel of sorts to NEON PHASE, and got decently far, and then abandond it. Oops.

We then started a cute little PICO-8 game, and forgot about it. Oops.

I was recruited to help with Chaos Composer, a more ambitious game glip started with someone else in Unity. I had to get used to Unity, and we squabbled a bit, but the game is finally about at the point where it’s “playable” and “maps” can be designed? It’s slightly on hold at the moment while we all finish up some other stuff, though.

We made a birthday game for two of our friends whose birthdays were very close together! Only they got to see it.

For Ludum Dare 38, we made Lunar Depot 38, a little “wave shooter” or whatever you call those? The AI is pretty rough, seeing as this was the first time I’d really made enemies and I had 72 hours to figure out how to do it, but I still think it’s pretty fun to play and I love the circular world.

I made Roguelike Simulator as an experiment with making something small and quick with a simple tool, and I had a lot of fun! I definitely want to do more stuff like this in the future.

And now we’re working on a game about Star Anise, my cat’s self-insert, which is looking to have more polish and depth than anything we’ve done so far! We’ve definitely come a long way in a year.

Somewhere along the line, I put out a call for a “potluck” project, where everyone would give me sprites of a given size without knowing what anyone else had contributed, and I would then make a game using only those sprites. Unfortunately, that stalled a few times: I tried using the Phaser JS library, but we didn’t get along; I tried LÖVE, but didn’t know where to go with the game; and then I decided to use this as an experiment with procedural generation, and didn’t get around to it. I still feel bad that everyone did work for me and I didn’t follow through, but I don’t know whether this will ever become a game.

veekun, alas, consumed months of my life. I finally got Sun and Moon loaded, but it took weeks of work since I was basically reinventing all the tooling we’d ever had from scratch, without even having most of that tooling available as a reference. It was worth it in the end, at least: Ultra Sun and Ultra Moon only took a few days to get loaded. But veekun itself is still missing some obvious Sun/Moon features, and the whole site needs an overhaul, and I just don’t know if I want to dedicate that much time to it when I have so much other stuff going on that’s much more interesting to me right now.

I finally turned my blog into more of a website, giving it a neat front page that lists a bunch of stuff I’ve done. I made a release category at last, though I’m still not quite in the habit of using it.

I wrote some blog posts, of course! I think the most interesting were JavaScript got better while I wasn’t looking and Object models. I was also asked to write a couple pieces for money for a column that then promptly shut down.

On a whim, I made a set of Eevee mugshots for Doom, which I think is a decent indication of my (pixel) art progress over the year?

I started idchoppers, a Doom parsing and manipulation library written in Rust, though it didn’t get very far and I’ve spent most of the time fighting with Rust because it won’t let me implement all my extremely bad ideas. It can do a couple things, at least, like flip maps very quickly and render maps to SVG.

I did toy around with music a little, but not a lot.

I wrote two short twines for Flora. They’re okay. I’m working on another; I think it’ll be better.

I didn’t do a lot of art overall, at least compared to the two previous years; most of my art effort over the year has gone into fox flux, which requires me to learn a whole lot of things. I did dip my toes into 3D modelling, most notably producing my current Twitter banner as well as this cool Star Anise animation. I wouldn’t mind doing more of that; maybe I’ll even try to make a low-poly pixel-textured 3D game sometime.

I restarted my book with a much better concept, though so far I’ve only written about half a chapter. Argh. I see that the vast majority of the work was done within the span of a single week, which is bad since that means I only worked on it for a week, but good since that means I can actually do a pretty good amount of work in only a week. I also did a lot of squabbling with tooling, which is hopefully mostly out of the way now.

My computer broke? That was an exciting week.


A lot of stuff, but the year as a whole still feels hit or miss. All the time I spent on veekun feels like a black void in the middle of the year, which seems like a good sign that I maybe don’t want to pour even more weeks into it in the near future.

Mostly, I want to do: more games, more art, more writing, more music.

I want to try out some tiny game making tools and make some tiny games with them — partly to get exposure to different things, partly to get more little ideas out into the world regularly, and partly to get more practice at letting myself have ideas. I have a couple tools in mind and I guess I’ll aim at a microgame every two months or so? I’d also like to finish the expanded fox flux by the end of the year, of course, though at the moment I can’t even gauge how long it might take.

I seriously lapsed on drawing last year, largely because fox flux pixel art took me so much time. So I want to draw more, and I want to get much faster at pixel art. It would probably help if I had a more concrete goal for drawing, so I might try to draw some short comics and write a little visual novel or something, which would also force me to aim for consistency.

I want to work on my book more, of course, but I also want to try my hand at a bit more fiction. I’ve had a blast writing dialogue for our games! I just shy away from longer-form writing for some reason — which seems ridiculous when a large part of my audience found me through my blog. I do think I’ve had some sort of breakthrough in the last month or two; I suddenly feel a good bit more confident about writing in general and figuring out what I want to say? One recent post I know I wrote in a single afternoon, which virtually never happens because I keep rewriting and rearranging stuff. Again, a visual novel would be a good excuse to practice writing fiction without getting too bogged down in details.

And, ah, music. I shy heavily away from music, since I have no idea what I’m doing, and also I seem to spend a lot of time fighting with tools. (Surprise.) I tried out SunVox for the first time just a few days ago and have been enjoying it quite a bit for making sound effects, so I might try it for music as well. And once again, visual novel background music is a pretty low-pressure thing to compose for. Hell, visual novels are small games, too, so that checks all the boxes. I guess I’ll go make a visual novel.

Here’s to twenty gayteen!

Wanted: Sales Engineer

Post Syndicated from Yev original https://www.backblaze.com/blog/wanted-sales-engineer/

At inception, Backblaze was a consumer company. Thousands upon thousands of individuals came to our website and gave us $5/mo to keep their data safe. But, we didn’t sell business solutions. It took us years before we had a sales team. In the last couple of years, we’ve released products that businesses of all sizes love: Backblaze B2 Cloud Storage and Backblaze for Business Computer Backup. Those businesses want to integrate Backblaze deeply into their infrastructure, so it’s time to hire our first Sales Engineer!

Company Description:
Founded in 2007, Backblaze started with a mission to make backup software elegant and provide complete peace of mind. Over the course of almost a decade, we have become a pioneer in robust, scalable low cost cloud backup. Recently, we launched B2 – robust and reliable object storage at just $0.005/gb/mo. Part of our differentiation is being able to offer the lowest price of any of the big players while still being profitable.

We’ve managed to nurture a team oriented culture with amazingly low turnover. We value our people and their families. Don’t forget to check out our “About Us” page to learn more about the people and some of our perks.

We have built a profitable, high growth business. While we love our investors, we have maintained control over the business. That means our corporate goals are simple – grow sustainably and profitably.

Some Backblaze Perks:

  • Competitive healthcare plans
  • Competitive compensation and 401k
  • All employees receive Option grants
  • Unlimited vacation days
  • Strong coffee
  • Fully stocked Micro kitchen
  • Catered breakfast and lunches
  • Awesome people who work on awesome projects
  • Childcare bonus
  • Normal work hours
  • Get to bring your pets into the office
  • San Mateo Office – located near Caltrain and Highways 101 & 280.

Backblaze B2 cloud storage is a building block for almost any computing service that requires storage. Customers need our help integrating B2 into iOS apps to Docker containers. Some customers integrate directly to the API using the programming language of their choice, others want to solve a specific problem using ready made software, already integrated with B2.

At the same time, our computer backup product is deepening it’s integration into enterprise IT systems. We are commonly asked for how to set Windows policies, integrate with Active Directory, and install the client via remote management tools.

We are looking for a sales engineer who can help our customers navigate the integration of Backblaze into their technical environments.

Are you 1/2” deep into many different technologies, and unafraid to dive deeper?

Can you confidently talk with customers about their technology, even if you have to look up all the acronyms right after the call?

Are you excited to setup complicated software in a lab and write knowledge base articles about your work?

Then Backblaze is the place for you!

Enough about Backblaze already, what’s in it for me?
In this role, you will be given the opportunity to learn about the technologies that drive innovation today; diverse technologies that customers are using day in and out. And more importantly, you’ll learn how to learn new technologies.

Just as an example, in the past 12 months, we’ve had the opportunity to learn and become experts in these diverse technologies:

  • How to setup VM servers for lab environments, both on-prem and using cloud services.
  • Create an automatically “resetting” demo environment for the sales team.
  • Setup Microsoft Domain Controllers with Active Directory and AD Federation Services.
  • Learn the basics of OAUTH and web single sign on (SSO).
  • Archive video workflows from camera to media asset management systems.
  • How upload/download files from Javascript by enabling CORS.
  • How to install and monitor online backup installations using RMM tools, like JAMF.
  • Tape (LTO) systems. (Yes – people still use tape for storage!)

How can I know if I’ll succeed in this role?

You have:

  • Confidence. Be able to ask customers questions about their environments and convey to them your technical acumen.
  • Curiosity. Always want to learn about customers’ situations, how they got there and what problems they are trying to solve.
  • Organization. You’ll work with customers, integration partners, and Backblaze team members on projects of various lengths. You can context switch and either have a great memory or keep copious notes. Your checklists have their own checklists.

You are versed in:

  • The fundamentals of Windows, Linux and Mac OS X operating systems. You shouldn’t be afraid to use a command line.
  • Building, installing, integrating and configuring applications on any operating system.
  • Debugging failures – reading logs, monitoring usage, effective google searching to fix problems excites you.
  • The basics of TCP/IP networking and the HTTP protocol.
  • Novice development skills in any programming/scripting language. Have basic understanding of data structures and program flow.
  • Your background contains:

  • Bachelor’s degree in computer science or the equivalent.
  • 2+ years of experience as a pre or post-sales engineer.
  • The right extra credit:
    There are literally hundreds of previous experiences you can have had that would make you perfect for this job. Some experiences that we know would be helpful for us are below, but make sure you tell us your stories!

  • Experience using or programming against Amazon S3.
  • Experience with large on-prem storage – NAS, SAN, Object. And backing up data on such storage with tools like Veeam, Veritas and others.
  • Experience with photo or video media. Media archiving is a key market for Backblaze B2.
  • Program arduinos to automatically feed your dog.
  • Experience programming against web or REST APIs. (Point us towards your projects, if they are open source and available to link to.)
  • Experience with sales tools like Salesforce.
  • 3D print door stops.
  • Experience with Windows Servers, Active Directory, Group policies and the like.
  • What’s it like working with the Sales team?
    The Backblaze sales team collaborates. We help each other out by sharing ideas, templates, and our customer’s experiences. When we talk about our accomplishments, there is no “I did this,” only “we”. We are truly a team.

    We are honest to each other and our customers and communicate openly. We aim to have fun by embracing crazy ideas and creative solutions. We try to think not outside the box, but with no boxes at all. Customers are the driving force behind the success of the company and we care deeply about their success.

    If this all sounds like you:

    1. Send an email to [email protected] with the position in the subject line.
    2. Tell us a bit about your Sales Engineering experience.
    3. Include your resume.

    The post Wanted: Sales Engineer appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

    Physics cheats

    Post Syndicated from Eevee original https://eev.ee/blog/2018/01/06/physics-cheats/

    Anonymous asks:

    something about how we tweak physics to “work” better in games?

    Ho ho! Work. Get it? Like in physics…?

    Hitboxes

    Hitbox” is perhaps not the most accurate term, since the shape used for colliding with the environment and the shape used for detecting damage might be totally different. They’re usually the same in simple platformers, though, and that’s what most of my games have been.

    The hitbox is the biggest physics fudge by far, and it exists because of a single massive approximation that (most) games make: you’re controlling a single entity in the abstract, not a physical body in great detail.

    That is: when you walk with your real-world meat shell, you perform a complex dance of putting one foot in front of the other, a motion you spent years perfecting. When you walk in a video game, you press a single “walk” button. Your avatar may play an animation that moves its legs back and forth, but since you’re not actually controlling the legs independently (and since simulating them is way harder), the game just treats you like a simple shape. Fairly often, this is a box, or something very box-like.

    An Eevee sprite standing on faux ground; the size of the underlying image and the hitbox are outlined

    Since the player has no direct control over the exact placement of their limbs, it would be slightly frustrating to have them collide with the world. This is especially true in cases like the above, where the tail and left ear protrude significantly out from the main body. If that Eevee wanted to stand against a real-world wall, she would simply tilt her ear or tail out of the way, so there’s no reason for the ear to block her from standing against a game wall. To compensate for this, the ear and tail are left out of the collision box entirely and will simply jut into a wall if necessary — a goofy affordance that’s so common it doesn’t even register as unusual. As a bonus (assuming this same box is used for combat), she won’t take damage from projectiles that merely graze past an ear.

    (One extra consideration for sprite games in particular: the hitbox ought to be horizontally symmetric around the sprite’s pivot — i.e. the point where the entity is truly considered to be standing — so that the hitbox doesn’t abruptly move when the entity turns around!)

    Corners

    Treating the player (and indeed most objects) as a box has one annoying side effect: boxes have corners. Corners can catch on other corners, even by a single pixel. Real-world bodies tend to be a bit rounder and squishier and this can tolerate grazing a corner; even real-world boxes will simply rotate a bit.

    Ah, but in our faux physics world, we generally don’t want conscious actors (such as the player) to rotate, even with a realistic physics simulator! Real-world bodies are made of parts that will generally try to keep you upright, after all; you don’t tilt back and forth much.

    One way to handle corners is to simply remove them from conscious actors. A hitbox doesn’t have to be a literal box, after all. A popular alternative — especially in Unity where it’s a standard asset — is the pill-shaped capsule, which has semicircles/hemispheres on the top and bottom and a cylindrical body in 3D. No corners, no problem.

    Of course, that introduces a new problem: now the player can’t balance precariously on edges without their rounded bottom sliding them off. Alas.

    If you’re stuck with corners, then, you may want to use a corner bump, a term I just made up. If the player would collide with a corner, but the collision is only by a few pixels, just nudge them to the side a bit and carry on.

    An Eevee sprite trying to move sideways into a shallow ledge; the game bumps her upwards slightly, so she steps onto it instead

    When the corner is horizontal, this creates stairs! This is, more or less kinda, how steps work in Doom: when the player tries to cross from one sector into another, if the height difference is 24 units or less, the game simply bumps them upwards to the height of the new floor and lets them continue on.

    Implementing this in a game without Doom’s notion of sectors is a little trickier. In fact, I still haven’t done it. Collision detection based on rejection gets it for free, kinda, but it’s not very deterministic and it breaks other things. But that’s a whole other post.

    Gravity

    Gravity is pretty easy. Everything accelerates downwards all the time. What’s interesting are the exceptions.

    Jumping

    Jumping is a giant hack.

    Think about how actual jumping works: you tense your legs, which generally involves bending your knees first, and then spring upwards. In a platformer, you can just leap whenever you feel like it, which is nonsense. Also you go like twenty feet into the air?

    Worse, most platformers allow variable-height jumping, where your jump is lower if you let go of the jump button while you’re in the air. Normally, one would expect to have to decide how much force to put into the jump beforehand.

    But of course this is about convenience of controls: when jumping is your primary action, you want to be able to do it immediately, without any windup for how high you want to jump.

    (And then there’s double jumping? Come on.)

    Air control is a similar phenomenon: usually you’d jump in a particular direction by controlling how you push off the ground with your feet, but in a video game, you don’t have feet! You only have the box. The compromise is to let you control your horizontal movement to a limit degree in midair, even though that doesn’t make any sense. (It’s way more fun, though, and overall gives you more movement options, which are good to have in an interactive medium.)

    Air control also exposes an obvious place that game physics collide with the realistic model of serious physics engines. I’ve mentioned this before, but: if you use Real Physics™ and air control yourself into a wall, you might find that you’ll simply stick to the wall until you let go of the movement buttons. Why? Remember, player movement acts as though an external force were pushing you around (and from the perspective of a Real™ physics engine, this is exactly how you’d implement it) — so air-controlling into a wall is equivalent to pushing a book against a wall with your hand, and the friction with the wall holds you in place. Oops.

    Ground sticking

    Another place game physics conflict with physics engines is with running to the top of a slope. On a real hill, of course, you land on top of the slope and are probably glad of it; slopes are hard to climb!

    An Eevee moves to the top of a slope, and rather than step onto the flat top, she goes flying off into the air

    In a video game, you go flying. Because you’re a box. With momentum. So you hit the peak and keep going in the same direction. Which is diagonally upwards.

    Projectiles

    To make them more predictable, projectiles generally aren’t subject to gravity, at least as far as I’ve seen. The real world does not have such an exemption. The real world imposes gravity even on sniper rifles, which in a video game are often implemented as an instant trace unaffected by anything in the world because the bullet never actually exists in the world.

    Resistance

    Ah. Welcome to hell.

    Water

    Water is an interesting case, and offhand I don’t know the gritty details of how games implement it. In the real world, water applies a resistant drag force to movement — and that force is proportional to the square of velocity, which I’d completely forgotten until right now. I am almost positive that no game handles that correctly. But then, in real-world water, you can push against the water itself for movement, and games don’t simulate that either. What’s the rough equivalent?

    The Sonic Physics Guide suggests that Sonic handles it by basically halving everything: acceleration, max speed, friction, etc. When Sonic enters water, his speed is cut; when Sonic exits water, his speed is increased.

    That last bit feels validating — I could swear Metroid Prime did the same thing, and built my own solution around it, but couldn’t remember for sure. It makes no sense, of course, for a jump to become faster just because you happened to break the surface of the water, but it feels fantastic.

    The thing I did was similar, except that I didn’t want to add a multiplier in a dozen places when you happen to be underwater (and remember which ones need it to be squared, etc.). So instead, I calculate everything completely as normal, so velocity is exactly the same as it would be on dry land — but the distance you would move gets halved. The effect seems to be pretty similar to most platformers with water, at least as far as I can tell. It hasn’t shown up in a published game and I only added this fairly recently, so I might be overlooking some reason this is a bad idea.

    (One reason that comes to mind is that velocity is now a little white lie while underwater, so anything relying on velocity for interesting effects might be thrown off. Or maybe that’s correct, because velocity thresholds should be halved underwater too? Hm!)

    Notably, air is also a fluid, so it should behave the same way (just with different constants). I definitely don’t think any games apply air drag that’s proportional to the square of velocity.

    Friction

    Friction is, in my experience, a little handwaved. Probably because real-world friction is so darn complicated.

    Consider that in the real world, we want very high friction on the surfaces we walk on — shoes and tires are explicitly designed to increase it, even. We move by bracing a back foot against the ground and using that to push ourselves forward, so we want the ground to resist our push as much as possible.

    In a game world, we are a box. We move by being pushed by some invisible outside force, so if the friction between ourselves and the ground is too high, we won’t be able to move at all! That’s complete nonsense physically, but it turns out to be handy in some cases — for example, highish friction can simulate walking through deep mud, which should be difficult due to fluid drag and low friction.

    But the best-known example of the fakeness of game friction is video game ice. Walking on real-world ice is difficult because the low friction means low grip; your feet are likely to slip out from under you, and you’ll simply fall down and have trouble moving at all. In a video game, you can’t fall down, so you have the opposite experience: you spend most of your time sliding around uncontrollably. Yet ice is so common in video games (and perhaps so uncommon in places I’ve lived) that I, at least, had never really thought about this disparity until an hour or so ago.

    Game friction vs real-world friction

    Real-world friction is a force. It’s the normal force (which is the force exerted by the object on the surface) times some constant that depends on how the two materials interact.

    Force is mass times acceleration, and platformers often ignore mass, so friction ought to be an acceleration — applied against the object’s movement, but never enough to push it backwards.

    I haven’t made any games where variable friction plays a significant role, but my gut instinct is that low friction should mean the player accelerates more slowly but has a higher max speed, and high friction should mean the opposite. I see from my own source code that I didn’t even do what I just said, so let’s defer to some better-made and well-documented games: Sonic and Doom.

    In Sonic, friction is a fixed value subtracted from the player’s velocity (regardless of direction) each tic. Sonic has a fixed framerate, so the units are really pixels per tic squared (i.e. acceleration), multiplied by an implicit 1 tic per tic. So far, so good.

    But Sonic’s friction only applies if the player isn’t pressing or . Hang on, that isn’t friction at all; that’s just deceleration! That’s equivalent to jogging to a stop. If friction were lower, Sonic would take longer to stop, but otherwise this is only tangentially related to friction.

    (In fairness, this approach would decently emulate friction for non-conscious sliding objects, which are never going to be pressing movement buttons. Also, we don’t have the Sonic source code, and the name “friction” is a fan invention; the Sonic Physics Guide already uses “deceleration” to describe the player’s acceleration when turning around.)

    Okay, let’s try Doom. In Doom, the default friction is 90.625%.

    Hang on, what?

    Yes, in Doom, friction is a multiplier applied every tic. Doom runs at 35 tics per second, so this is a multiplier of 0.032 per second. Yikes!

    This isn’t anything remotely like real friction, but it’s much easier to implement. With friction as acceleration, the game has to know both the direction of movement (so it can apply friction in the opposite direction) and the magnitude (so it doesn’t overshoot and launch the object in the other direction). That means taking a semi-costly square root and also writing extra code to cap the amount of friction. With a multiplier, neither is necessary; just multiply the whole velocity vector and you’re done.

    There are some downsides. One is that objects will never actually stop, since multiplying by 3% repeatedly will never produce a result of zero — though eventually the speed will become small enough to either slip below a “minimum speed” threshold or simply no longer fit in a float representation. Another is that the units are fairly meaningless: with Doom’s default friction of 90.625%, about how long does it take for the player to stop? I have no idea, partly because “stop” is ambiguous here! If friction were an acceleration, I could divide it into the player’s max speed to get a time.

    All that aside, what are the actual effects of changing Doom’s friction? What an excellent question that’s surprisingly tricky to answer. (Note that friction can’t be changed in original Doom, only in the Boom port and its derivatives.) Here’s what I’ve pieced together.

    Doom’s “friction” is really two values. “Friction” itself is a multiplier applied to moving objects on every tic, but there’s also a move factor which defaults to \(\frac{1}{32} = 0.03125\) and is derived from friction for custom values.

    Every tic, the player’s velocity is multiplied by friction, and then increased by their speed times the move factor.

    $$
    v(n) = v(n – 1) \times friction + speed \times move factor
    $$

    Eventually, the reduction from friction will balance out the speed boost. That happens when \(v(n) = v(n – 1)\), so we can rearrange it to find the player’s effective max speed:

    $$
    v = v \times friction + speed \times move factor \\
    v – v \times friction = speed \times move factor \\
    v = speed \times \frac{move factor}{1 – friction}
    $$

    For vanilla Doom’s move factor of 0.03125 and friction of 0.90625, that becomes:

    $$
    v = speed \times \frac{\frac{1}{32}}{1 – \frac{29}{32}} = speed \times \frac{\frac{1}{32}}{\frac{3}{32}} = \frac{1}{3} \times speed
    $$

    Curiously, “speed” is three times the maximum speed an actor can actually move. Doomguy’s run speed is 50, so in practice he moves a third of that, or 16⅔ units per tic. (Of course, this isn’t counting SR40, a bug that lets Doomguy run ~40% faster than intended diagonally.)

    So now, what if you change friction? Even more curiously, the move factor is calculated completely differently depending on whether friction is higher or lower than the default Doom amount:

    $$
    move factor = \begin{cases}
    \frac{133 – 128 \times friction}{544} &≈ 0.244 – 0.235 \times friction & \text{ if } friction \ge \frac{29}{32} \\
    \frac{81920 \times friction – 70145}{1048576} &≈ 0.078 \times friction – 0.067 & \text{ otherwise }
    \end{cases}
    $$

    That’s pretty weird? Complicating things further is that low friction (which means muddy terrain, remember) has an extra multiplier on its move factor, depending on how fast you’re already going — the idea is apparently that you have a hard time getting going, but it gets easier as you find your footing. The extra multiplier maxes out at 8, which makes the two halves of that function meet at the vanilla Doom value.

    A graph of the relationship between friction and move factor

    That very top point corresponds to the move factor from the original game. So no matter what you do to friction, the move factor becomes lower. At 0.85 and change, you can no longer move at all; below that, you move backwards.

    From the formula above, it’s easy to see what changes to friction and move factor will do to Doomguy’s stable velocity. Move factor is in the numerator, so increasing it will increase stable velocity — but it can’t increase, so stable velocity can only ever decrease. Friction is in the denominator, but it’s subtracted from 1, so increasing friction will make the denominator a smaller value less than 1, i.e. increase stable velocity. Combined, we get this relationship between friction and stable velocity.

    A graph showing stable velocity shooting up dramatically as friction increases

    As friction approaches 1, stable velocity grows without bound. This makes sense, given the definition of \(v(n)\) — if friction is 1, the velocity from the previous tic isn’t reduced at all, so we just keep accelerating freely.

    All of this is why I’m wary of using multipliers.

    Anyway, this leaves me with one last question about the effects of Doom’s friction: how long does it take to reach stable velocity? Barring precision errors, we’ll never truly reach stable velocity, but let’s say within 5%. First we need a closed formula for the velocity after some number of tics. This is a simple recurrence relation, and you can write a few terms out yourself if you want to be sure this is right.

    $$
    v(n) = v_0 \times friction^n + speed \times move factor \times \frac{friction^n – 1}{friction – 1}
    $$

    Our initial velocity is zero, so the first term disappears. Set this equal to the stable formula and solve for n:

    $$
    speed \times move factor \times \frac{friction^n – 1}{friction – 1} = (1 – 5\%) \times speed \times \frac{move factor}{1 – friction} \\
    friction^n – 1 = -(1 – 5\%) \\
    n = \frac{\ln 5\%}{\ln friction}
    $$

    Speed” and move factor disappear entirely, which makes sense, and this is purely a function of friction (and how close we want to get). For vanilla Doom, that comes out to 30.4, which is a little less than a second. For other values of friction:

    A graph of time to stability which leaps upwards dramatically towards the right

    As friction increases (which in Doom terms means the surface is more slippery), it takes longer and longer to reach stable speed, which is in turn greater and greater. For lesser friction (i.e. mud), stable speed is lower, but reached fairly quickly. (Of course, the extra “getting going” multiplier while in mud adds some extra time here, but including that in the graph is a bit more complicated.)

    I think this matches with my instincts above. How fascinating!

    What’s that? This is way too much math and you hate it? Then don’t use multipliers in game physics.

    Uh

    That was a hell of a diversion!

    I guess the goofiest stuff in basic game physics is really just about mapping player controls to in-game actions like jumping and deceleration; the rest consists of hacks to compensate for representing everything as a box.

    Detecting Adblocker Blockers

    Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/detecting_adblo.html

    Interesting research on the prevalence of adblock blockers: “Measuring and Disrupting Anti-Adblockers Using Differential Execution Analysis“:

    Abstract: Millions of people use adblockers to remove intrusive and malicious ads as well as protect themselves against tracking and pervasive surveillance. Online publishers consider adblockers a major threat to the ad-powered “free” Web. They have started to retaliate against adblockers by employing anti-adblockers which can detect and stop adblock users. To counter this retaliation, adblockers in turn try to detect and filter anti-adblocking scripts. This back and forth has prompted an escalating arms race between adblockers and anti-adblockers.

    We want to develop a comprehensive understanding of anti-adblockers, with the ultimate aim of enabling adblockers to bypass state-of-the-art anti-adblockers. In this paper, we present a differential execution analysis to automatically detect and analyze anti-adblockers. At a high level, we collect execution traces by visiting a website with and without adblockers. Through differential execution analysis, we are able to pinpoint the conditions that lead to the differences caused by anti-adblocking code. Using our system, we detect anti-adblockers on 30.5% of the Alexa top-10K websites which is 5-52 times more than reported in prior literature. Unlike prior work which is limited to detecting visible reactions (e.g., warning messages) by anti-adblockers, our system can discover attempts to detect adblockers even when there is no visible reaction. From manually checking one third of the detected websites, we find that the websites that have no visible reactions constitute over 90% of the cases, completely dominating the ones that have visible warning messages. Finally, based on our findings, we further develop JavaScript rewriting and API hooking based solutions (the latter implemented as a Chrome extension) to help adblockers bypass state-of-the-art anti-adblockers.

    News article.

    Random with care

    Post Syndicated from Eevee original https://eev.ee/blog/2018/01/02/random-with-care/

    Hi! Here are a few loose thoughts about picking random numbers.

    A word about crypto

    DON’T ROLL YOUR OWN CRYPTO

    This is all aimed at frivolous pursuits like video games. Hell, even video games where money is at stake should be deferring to someone who knows way more than I do. Otherwise you might find out that your deck shuffles in your poker game are woefully inadequate and some smartass is cheating you out of millions. (If your random number generator has fewer than 226 bits of state, it can’t even generate every possible shuffling of a deck of cards!)

    Use the right distribution

    Most languages have a random number primitive that spits out a number uniformly in the range [0, 1), and you can go pretty far with just that. But beware a few traps!

    Random pitches

    Say you want to pitch up a sound by a random amount, perhaps up to an octave. Your audio API probably has a way to do this that takes a pitch multiplier, where I say “probably” because that’s how the only audio API I’ve used works.

    Easy peasy. If 1 is unchanged and 2 is pitched up by an octave, then all you need is rand() + 1. Right?

    No! Pitch is exponential — within the same octave, the “gap” between C and C♯ is about half as big as the gap between B and the following C. If you pick a pitch multiplier uniformly, you’ll have a noticeable bias towards the higher pitches.

    One octave corresponds to a doubling of pitch, so if you want to pick a random note, you want 2 ** rand().

    Random directions

    For two dimensions, you can just pick a random angle with rand() * TAU.

    If you want a vector rather than an angle, or if you want a random direction in three dimensions, it’s a little trickier. You might be tempted to just pick a random point where each component is rand() * 2 - 1 (ranging from −1 to 1), but that’s not quite right. A direction is a point on the surface (or, equivalently, within the volume) of a sphere, and picking each component independently produces a point within the volume of a cube; the result will be a bias towards the corners of the cube, where there’s much more extra volume beyond the sphere.

    No? Well, just trust me. I don’t know how to make a diagram for this.

    Anyway, you could use the Pythagorean theorem a few times and make a huge mess of things, or it turns out there’s a really easy way that even works for two or four or any number of dimensions. You pick each coordinate from a Gaussian (normal) distribution, then normalize the resulting vector. In other words, using Python’s random module:

    1
    2
    3
    4
    5
    6
    def random_direction():
        x = random.gauss(0, 1)
        y = random.gauss(0, 1)
        z = random.gauss(0, 1)
        r = math.sqrt(x*x + y*y + z*z)
        return x/r, y/r, z/r
    

    Why does this work? I have no idea!

    Note that it is possible to get zero (or close to it) for every component, in which case the result is nonsense. You can re-roll all the components if necessary; just check that the magnitude (or its square) is less than some epsilon, which is equivalent to throwing away a tiny sphere at the center and shouldn’t affect the distribution.

    Beware Gauss

    Since I brought it up: the Gaussian distribution is a pretty nice one for choosing things in some range, where the middle is the common case and should appear more frequently.

    That said, I never use it, because it has one annoying drawback: the Gaussian distribution has no minimum or maximum value, so you can’t really scale it down to the range you want. In theory, you might get any value out of it, with no limit on scale.

    In practice, it’s astronomically rare to actually get such a value out. I did a hundred million trials just to see what would happen, and the largest value produced was 5.8.

    But, still, I’d rather not knowingly put extremely rare corner cases in my code if I can at all avoid it. I could clamp the ends, but that would cause unnatural bunching at the endpoints. I could reroll if I got a value outside some desired range, but I prefer to avoid rerolling when I can, too; after all, it’s still (astronomically) possible to have to reroll for an indefinite amount of time. (Okay, it’s really not, since you’ll eventually hit the period of your PRNG. Still, though.) I don’t bend over backwards here — I did just say to reroll when picking a random direction, after all — but when there’s a nicer alternative I’ll gladly use it.

    And lo, there is a nicer alternative! Enter the beta distribution. It always spits out a number in [0, 1], so you can easily swap it in for the standard normal function, but it takes two “shape” parameters α and β that alter its behavior fairly dramatically.

    With α = β = 1, the beta distribution is uniform, i.e. no different from rand(). As α increases, the distribution skews towards the right, and as β increases, the distribution skews towards the left. If α = β, the whole thing is symmetric with a hump in the middle. The higher either one gets, the more extreme the hump (meaning that value is far more common than any other). With a little fiddling, you can get a number of interesting curves.

    Screenshots don’t really do it justice, so here’s a little Wolfram widget that lets you play with α and β live:

    Note that if α = 1, then 1 is a possible value; if β = 1, then 0 is a possible value. You probably want them both greater than 1, which clamps the endpoints to zero.

    Also, it’s possible to have either α or β or both be less than 1, but this creates very different behavior: the corresponding endpoints become poles.

    Anyway, something like α = β = 3 is probably close enough to normal for most purposes but already clamped for you. And you could easily replicate something like, say, NetHack’s incredibly bizarre rnz function.

    Random frequency

    Say you want some event to have an 80% chance to happen every second. You (who am I kidding, I) might be tempted to do something like this:

    1
    2
    if random() < 0.8 * dt:
        do_thing()
    

    In an ideal world, dt is always the same and is equal to 1 / f, where f is the framerate. Replace that 80% with a variable, say P, and every tic you have a P / f chance to do the… whatever it is.

    Each second, f tics pass, so you’ll make this check f times. The chance that any check succeeds is the inverse of the chance that every check fails, which is \(1 – \left(1 – \frac{P}{f}\right)^f\).

    For P of 80% and a framerate of 60, that’s a total probability of 55.3%. Wait, what?

    Consider what happens if the framerate is 2. On the first tic, you roll 0.4 twice — but probabilities are combined by multiplying, and splitting work up by dt only works for additive quantities. You lose some accuracy along the way. If you’re dealing with something that multiplies, you need an exponent somewhere.

    But in this case, maybe you don’t want that at all. Each separate roll you make might independently succeed, so it’s possible (but very unlikely) that the event will happen 60 times within a single second! Or 200 times, if that’s someone’s framerate.

    If you explicitly want something to have a chance to happen on a specific interval, you have to check on that interval. If you don’t have a gizmo handy to run code on an interval, it’s easy to do yourself with a time buffer:

    1
    2
    3
    4
    5
    6
    timer += dt
    # here, 1 is the "every 1 seconds"
    while timer > 1:
        timer -= 1
        if random() < 0.8:
            do_thing()
    

    Using while means rolls still happen even if you somehow skipped over an entire second.

    (For the curious, and the nerds who already noticed: the expression \(1 – \left(1 – \frac{P}{f}\right)^f\) converges to a specific value! As the framerate increases, it becomes a better and better approximation for \(1 – e^{-P}\), which for the example above is 0.551. Hey, 60 fps is pretty accurate — it’s just accurately representing something nowhere near what I wanted. Er, you wanted.)

    Rolling your own

    Of course, you can fuss with the classic [0, 1] uniform value however you want. If I want a bias towards zero, I’ll often just square it, or multiply two of them together. If I want a bias towards one, I’ll take a square root. If I want something like a Gaussian/normal distribution, but with clearly-defined endpoints, I might add together n rolls and divide by n. (The normal distribution is just what you get if you roll infinite dice and divide by infinity!)

    It’d be nice to be able to understand exactly what this will do to the distribution. Unfortunately, that requires some calculus, which this post is too small to contain, and which I didn’t even know much about myself until I went down a deep rabbit hole while writing, and which in many cases is straight up impossible to express directly.

    Here’s the non-calculus bit. A source of randomness is often graphed as a PDF — a probability density function. You’ve almost certainly seen a bell curve graphed, and that’s a PDF. They’re pretty nice, since they do exactly what they look like: they show the relative chance that any given value will pop out. On a bog standard bell curve, there’s a peak at zero, and of course zero is the most common result from a normal distribution.

    (Okay, actually, since the results are continuous, it’s vanishingly unlikely that you’ll get exactly zero — but you’re much more likely to get a value near zero than near any other number.)

    For the uniform distribution, which is what a classic rand() gives you, the PDF is just a straight horizontal line — every result is equally likely.


    If there were a calculus bit, it would go here! Instead, we can cheat. Sometimes. Mathematica knows how to work with probability distributions in the abstract, and there’s a free web version you can use. For the example of squaring a uniform variable, try this out:

    1
    PDF[TransformedDistribution[u^2, u \[Distributed] UniformDistribution[{0, 1}]], u]
    

    (The \[Distributed] is a funny tilde that doesn’t exist in Unicode, but which Mathematica uses as a first-class operator. Also, press shiftEnter to evaluate the line.)

    This will tell you that the distribution is… \(\frac{1}{2\sqrt{u}}\). Weird! You can plot it:

    1
    Plot[%, {u, 0, 1}]
    

    (The % refers to the result of the last thing you did, so if you want to try several of these, you can just do Plot[PDF[…], u] directly.)

    The resulting graph shows that numbers around zero are, in fact, vastly — infinitely — more likely than anything else.

    What about multiplying two together? I can’t figure out how to get Mathematica to understand this, but a great amount of digging revealed that the answer is -ln x, and from there you can plot them both on Wolfram Alpha. They’re similar, though squaring has a much better chance of giving you high numbers than multiplying two separate rolls — which makes some sense, since if either of two rolls is a low number, the product will be even lower.

    What if you know the graph you want, and you want to figure out how to play with a uniform roll to get it? Good news! That’s a whole thing called inverse transform sampling. All you have to do is take an integral. Good luck!


    This is all extremely ridiculous. New tactic: Just Simulate The Damn Thing. You already have the code; run it a million times, make a histogram, and tada, there’s your PDF. That’s one of the great things about computers! Brute-force numerical answers are easy to come by, so there’s no excuse for producing something like rnz. (Though, be sure your histogram has sufficiently narrow buckets — I tried plotting one for rnz once and the weird stuff on the left side didn’t show up at all!)

    By the way, I learned something from futzing with Mathematica here! Taking the square root (to bias towards 1) gives a PDF that’s a straight diagonal line, nothing like the hyperbola you get from squaring (to bias towards 0). How do you get a straight line the other way? Surprise: \(1 – \sqrt{1 – u}\).

    Okay, okay, here’s the actual math

    I don’t claim to have a very firm grasp on this, but I had a hell of a time finding it written out clearly, so I might as well write it down as best I can. This was a great excuse to finally set up MathJax, too.

    Say \(u(x)\) is the PDF of the original distribution and \(u\) is a representative number you plucked from that distribution. For the uniform distribution, \(u(x) = 1\). Or, more accurately,

    $$
    u(x) = \begin{cases}
    1 & \text{ if } 0 \le x \lt 1 \\
    0 & \text{ otherwise }
    \end{cases}
    $$

    Remember that \(x\) here is a possible outcome you want to know about, and the PDF tells you the relative probability that a roll will be near it. This PDF spits out 1 for every \(x\), meaning every number between 0 and 1 is equally likely to appear.

    We want to do something to that PDF, which creates a new distribution, whose PDF we want to know. I’ll use my original example of \(f(u) = u^2\), which creates a new PDF \(v(x)\).

    The trick is that we need to work in terms of the cumulative distribution function for \(u\). Where the PDF gives the relative chance that a roll will be (“near”) a specific value, the CDF gives the relative chance that a roll will be less than a specific value.

    The conventions for this seem to be a bit fuzzy, and nobody bothers to explain which ones they’re using, which makes this all the more confusing to read about… but let’s write the CDF with a capital letter, so we have \(U(x)\). In this case, \(U(x) = x\), a straight 45° line (at least between 0 and 1). With the definition I gave, this should make sense. At some arbitrary point like 0.4, the value of the PDF is 1 (0.4 is just as likely as anything else), and the value of the CDF is 0.4 (you have a 40% chance of getting a number from 0 to 0.4).

    Calculus ahoy: the PDF is the derivative of the CDF, which means it measures the slope of the CDF at any point. For \(U(x) = x\), the slope is always 1, and indeed \(u(x) = 1\). See, calculus is easy.

    Okay, so, now we’re getting somewhere. What we want is the CDF of our new distribution, \(V(x)\). The CDF is defined as the probability that a roll \(v\) will be less than \(x\), so we can literally write:

    $$V(x) = P(v \le x)$$

    (This is why we have to work with CDFs, rather than PDFs — a PDF gives the chance that a roll will be “nearby,” whatever that means. A CDF is much more concrete.)

    What is \(v\), exactly? We defined it ourselves; it’s the do something applied to a roll from the original distribution, or \(f(u)\).

    $$V(x) = P\!\left(f(u) \le x\right)$$

    Now the first tricky part: we have to solve that inequality for \(u\), which means we have to do something, backwards to \(x\).

    $$V(x) = P\!\left(u \le f^{-1}(x)\right)$$

    Almost there! We now have a probability that \(u\) is less than some value, and that’s the definition of a CDF!

    $$V(x) = U\!\left(f^{-1}(x)\right)$$

    Hooray! Now to turn these CDFs back into PDFs, all we need to do is differentiate both sides and use the chain rule. If you never took calculus, don’t worry too much about what that means!

    $$v(x) = u\!\left(f^{-1}(x)\right)\left|\frac{d}{dx}f^{-1}(x)\right|$$

    Wait! Where did that absolute value come from? It takes care of whether \(f(x)\) increases or decreases. It’s the least interesting part here by far, so, whatever.

    There’s one more magical part here when using the uniform distribution — \(u(\dots)\) is always equal to 1, so that entire term disappears! (Note that this only works for a uniform distribution with a width of 1; PDFs are scaled so the entire area under them sums to 1, so if you had a rand() that could spit out a number between 0 and 2, the PDF would be \(u(x) = \frac{1}{2}\).)

    $$v(x) = \left|\frac{d}{dx}f^{-1}(x)\right|$$

    So for the specific case of modifying the output of rand(), all we have to do is invert, then differentiate. The inverse of \(f(u) = u^2\) is \(f^{-1}(x) = \sqrt{x}\) (no need for a ± since we’re only dealing with positive numbers), and differentiating that gives \(v(x) = \frac{1}{2\sqrt{x}}\). Done! This is also why square root comes out nicer; inverting it gives \(x^2\), and differentiating that gives \(2x\), a straight line.

    Incidentally, that method for turning a uniform distribution into any distribution — inverse transform sampling — is pretty much the same thing in reverse: integrate, then invert. For example, when I saw that taking the square root gave \(v(x) = 2x\), I naturally wondered how to get a straight line going the other way, \(v(x) = 2 – 2x\). Integrating that gives \(2x – x^2\), and then you can use the quadratic formula (or just ask Wolfram Alpha) to solve \(2x – x^2 = u\) for \(x\) and get \(f(u) = 1 – \sqrt{1 – u}\).

    Multiply two rolls is a bit more complicated; you have to write out the CDF as an integral and you end up doing a double integral and wow it’s a mess. The only thing I’ve retained is that you do a division somewhere, which then gets integrated, and that’s why it ends up as \(-\ln x\).

    And that’s quite enough of that! (Okay but having math in my blog is pretty cool and I will definitely be doing more of this, sorry, not sorry.)

    Random vs varied

    Sometimes, random isn’t actually what you want. We tend to use the word “random” casually to mean something more like chaotic, i.e., with no discernible pattern. But that’s not really random. In fact, given how good humans can be at finding incidental patterns, they aren’t all that unlikely! Consider that when you roll two dice, they’ll come up either the same or only one apart almost half the time. Coincidence? Well, yes.

    If you ask for randomness, you’re saying that any outcome — or series of outcomes — is acceptable, including five heads in a row or five tails in a row. Most of the time, that’s fine. Some of the time, it’s less fine, and what you really want is variety. Here are a couple examples and some fairly easy workarounds.

    NPC quips

    The nature of games is such that NPCs will eventually run out of things to say, at which point further conversation will give the player a short brush-off quip — a slight nod from the designer to the player that, hey, you hit the end of the script.

    Some NPCs have multiple possible quips and will give one at random. The trouble with this is that it’s very possible for an NPC to repeat the same quip several times in a row before abruptly switching to another one. With only a few options to choose from, getting the same option twice or thrice (especially across an entire game, which may have numerous NPCs) isn’t all that unlikely. The notion of an NPC quip isn’t very realistic to start with, but having someone repeat themselves and then abruptly switch to something else is especially jarring.

    The easy fix is to show the quips in order! Paradoxically, this is more consistently varied than choosing at random — the original “order” is likely to be meaningless anyway, and it already has the property that the same quip can never appear twice in a row.

    If you like, you can shuffle the list of quips every time you reach the end, but take care here — it’s possible that the last quip in the old order will be the same as the first quip in the new order, so you may still get a repeat. (Of course, you can just check for this case and swap the first quip somewhere else if it bothers you.)

    That last behavior is, in fact, the canonical way that Tetris chooses pieces — the game simply shuffles a list of all 7 pieces, gives those to you in shuffled order, then shuffles them again to make a new list once it’s exhausted. There’s no avoidance of duplicates, though, so you can still get two S blocks in a row, or even two S and two Z all clumped together, but no more than that. Some Tetris variants take other approaches, such as actively avoiding repeats even several pieces apart or deliberately giving you the worst piece possible.

    Random drops

    Random drops are often implemented as a flat chance each time. Maybe enemies have a 5% chance to drop health when they die. Legally speaking, over the long term, a player will see health drops for about 5% of enemy kills.

    Over the short term, they may be desperate for health and not survive to see the long term. So you may want to put a thumb on the scale sometimes. Games in the Metroid series, for example, have a somewhat infamous bias towards whatever kind of drop they think you need — health if your health is low, missiles if your missiles are low.

    I can’t give you an exact approach to use, since it depends on the game and the feeling you’re going for and the variables at your disposal. In extreme cases, you might want to guarantee a health drop from a tough enemy when the player is critically low on health. (Or if you’re feeling particularly evil, you could go the other way and deny the player health when they most need it…)

    The problem becomes a little different, and worse, when the event that triggers the drop is relatively rare. The pathological case here would be something like a raid boss in World of Warcraft, which requires hours of effort from a coordinated group of people to defeat, and which has some tiny chance of dropping a good item that will go to only one of those people. This is why I stopped playing World of Warcraft at 60.

    Dialing it back a little bit gives us Enter the Gungeon, a roguelike where each room is a set of encounters and each floor only has a dozen or so rooms. Initially, you have a 1% chance of getting a reward after completing a room — but every time you complete a room and don’t get a reward, the chance increases by 9%, up to a cap of 80%. Once you get a reward, the chance resets to 1%.

    The natural question is: how frequently, exactly, can a player expect to get a reward? We could do math, or we could Just Simulate The Damn Thing.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    from collections import Counter
    import random
    
    histogram = Counter()
    
    TRIALS = 1000000
    chance = 1
    rooms_cleared = 0
    rewards_found = 0
    while rewards_found < TRIALS:
        rooms_cleared += 1
        if random.random() * 100 < chance:
            # Reward!
            rewards_found += 1
            histogram[rooms_cleared] += 1
            rooms_cleared = 0
            chance = 1
        else:
            chance = min(80, chance + 9)
    
    for gaps, count in sorted(histogram.items()):
        print(f"{gaps:3d} | {count / TRIALS * 100:6.2f}%", '#' * (count // (TRIALS // 100)))
    
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
      1 |   0.98%
      2 |   9.91% #########
      3 |  17.00% ################
      4 |  20.23% ####################
      5 |  19.21% ###################
      6 |  15.05% ###############
      7 |   9.69% #########
      8 |   5.07% #####
      9 |   2.09% ##
     10 |   0.63%
     11 |   0.12%
     12 |   0.03%
     13 |   0.00%
     14 |   0.00%
     15 |   0.00%
    

    We’ve got kind of a hilly distribution, skewed to the left, which is up in this histogram. Most of the time, a player should see a reward every three to six rooms, which is maybe twice per floor. It’s vanishingly unlikely to go through a dozen rooms without ever seeing a reward, so a player should see at least one per floor.

    Of course, this simulated a single continuous playthrough; when starting the game from scratch, your chance at a reward always starts fresh at 1%, the worst it can be. If you want to know about how many rewards a player will get on the first floor, hey, Just Simulate The Damn Thing.

    1
    2
    3
    4
    5
    6
    7
      0 |   0.01%
      1 |  13.01% #############
      2 |  56.28% ########################################################
      3 |  27.49% ###########################
      4 |   3.10% ###
      5 |   0.11%
      6 |   0.00%
    

    Cool. Though, that’s assuming exactly 12 rooms; it might be worth changing that to pick at random in a way that matches the level generator.

    (Enter the Gungeon does some other things to skew probability, which is very nice in a roguelike where blind luck can make or break you. For example, if you kill a boss without having gotten a new gun anywhere else on the floor, the boss is guaranteed to drop a gun.)

    Critical hits

    I suppose this is the same problem as random drops, but backwards.

    Say you have a battle sim where every attack has a 6% chance to land a devastating critical hit. Presumably the same rules apply to both the player and the AI opponents.

    Consider, then, that the AI opponents have exactly the same 6% chance to ruin the player’s day. Consider also that this gives them an 0.4% chance to critical hit twice in a row. 0.4% doesn’t sound like much, but across an entire playthrough, it’s not unlikely that a player might see it happen and find it incredibly annoying.

    Perhaps it would be worthwhile to explicitly forbid AI opponents from getting consecutive critical hits.

    In conclusion

    An emerging theme here has been to Just Simulate The Damn Thing. So consider Just Simulating The Damn Thing. Even a simple change to a random value can do surprising things to the resulting distribution, so unless you feel like differentiating the inverse function of your code, maybe test out any non-trivial behavior and make sure it’s what you wanted. Probability is hard to reason about.

    Serverless @ re:Invent 2017

    Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/serverless-reinvent-2017/

    At re:Invent 2014, we announced AWS Lambda, what is now the center of the serverless platform at AWS, and helped ignite the trend of companies building serverless applications.

    This year, at re:Invent 2017, the topic of serverless was everywhere. We were incredibly excited to see the energy from everyone attending 7 workshops, 15 chalk talks, 20 skills sessions and 27 breakout sessions. Many of these sessions were repeated due to high demand, so we are happy to summarize and provide links to the recordings and slides of these sessions.

    Over the course of the week leading up to and then the week of re:Invent, we also had over 15 new features and capabilities across a number of serverless services, including AWS Lambda, Amazon API Gateway, AWS [email protected], AWS SAM, and the newly announced AWS Serverless Application Repository!

    AWS Lambda

    Amazon API Gateway

    • Amazon API Gateway Supports Endpoint Integrations with Private VPCs – You can now provide access to HTTP(S) resources within your VPC without exposing them directly to the public internet. This includes resources available over a VPN or Direct Connect connection!
    • Amazon API Gateway Supports Canary Release Deployments – You can now use canary release deployments to gradually roll out new APIs. This helps you more safely roll out API changes and limit the blast radius of new deployments.
    • Amazon API Gateway Supports Access Logging – The access logging feature lets you generate access logs in different formats such as CLF (Common Log Format), JSON, XML, and CSV. The access logs can be fed into your existing analytics or log processing tools so you can perform more in-depth analysis or take action in response to the log data.
    • Amazon API Gateway Customize Integration Timeouts – You can now set a custom timeout for your API calls as low as 50ms and as high as 29 seconds (the default is 30 seconds).
    • Amazon API Gateway Supports Generating SDK in Ruby – This is in addition to support for SDKs in Java, JavaScript, Android and iOS (Swift and Objective-C). The SDKs that Amazon API Gateway generates save you development time and come with a number of prebuilt capabilities, such as working with API keys, exponential back, and exception handling.

    AWS Serverless Application Repository

    Serverless Application Repository is a new service (currently in preview) that aids in the publication, discovery, and deployment of serverless applications. With it you’ll be able to find shared serverless applications that you can launch in your account, while also sharing ones that you’ve created for others to do the same.

    AWS [email protected]

    [email protected] now supports content-based dynamic origin selection, network calls from viewer events, and advanced response generation. This combination of capabilities greatly increases the use cases for [email protected], such as allowing you to send requests to different origins based on request information, showing selective content based on authentication, and dynamically watermarking images for each viewer.

    AWS SAM

    Twitch Launchpad live announcements

    Other service announcements

    Here are some of the other highlights that you might have missed. We think these could help you make great applications:

    AWS re:Invent 2017 sessions

    Coming up with the right mix of talks for an event like this can be quite a challenge. The Product, Marketing, and Developer Advocacy teams for Serverless at AWS spent weeks reading through dozens of talk ideas to boil it down to the final list.

    From feedback at other AWS events and webinars, we knew that customers were looking for talks that focused on concrete examples of solving problems with serverless, how to perform common tasks such as deployment, CI/CD, monitoring, and troubleshooting, and to see customer and partner examples solving real world problems. To that extent we tried to settle on a good mix based on attendee experience and provide a track full of rich content.

    Below are the recordings and slides of breakout sessions from re:Invent 2017. We’ve organized them for those getting started, those who are already beginning to build serverless applications, and the experts out there already running them at scale. Some of the videos and slides haven’t been posted yet, and so we will update this list as they become available.

    Find the entire Serverless Track playlist on YouTube.

    Talks for people new to Serverless

    Advanced topics

    Expert mode

    Talks for specific use cases

    Talks from AWS customers & partners

    Looking to get hands-on with Serverless?

    At re:Invent, we delivered instructor-led skills sessions to help attendees new to serverless applications get started quickly. The content from these sessions is already online and you can do the hands-on labs yourself!
    Build a Serverless web application

    Still looking for more?

    We also recently completely overhauled the main Serverless landing page for AWS. This includes a new Resources page containing case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials. Check it out!

    MQTT 5: Is it time to upgrade to MQTT 5 yet?

    Post Syndicated from The HiveMQ Team original https://www.hivemq.com/blog/mqtt-5-time-to-upgrade-yet/

    MQTT 5 - Is it time to upgrade yet?

    Is it time to upgrade to MQTT 5 yet?

    Welcome to this week’s blog post! After last week’s Introduction to MQTT 5, many readers wondered when the successor to MQTT 3.1.1 is ready for prime time and can be used in future and existing projects.

    Before we try to answer the question in more detail, we’d love to hear your thoughts about upgrading to MQTT 5. We prepared a small survey below. Let us know how your MQTT 5 upgrading plans are!

    The MQTT 5 OASIS Standard

    As of late December 2017, the MQTT 5 specification is not available as an official “Committee Specification” yet. In other words: MQTT 5 is not available yet officially. The foundation for every implementation of the standard is, that the Technical Committee at OASIS officially releases the standard.

    The good news: Although no official version of the standard is available yet, fundamental changes to the current state of the specification are not expected.
    The Public Review phase of the “Committee Specification Draft 2” finished without any major comments or issues. We at HiveMQ expect the MQTT 5 standard to be released in very late December 2017 or January 2018.

    Current state of client libraries

    To start using MQTT 5, you need two participants: An MQTT 5 client library implementation in your programming language(s) of choice and an MQTT 5 broker implementation (like HiveMQ). If both components support the new standard, you are good to go and can use the new version in your projects.

    When it comes to MQTT libraries, Eclipse Paho is the one-stop shop for MQTT clients in most programming languages. A recent Paho mailing list entry stated that Paho plans to release MQTT 5 client libraries end of June 2018 for the following programming languages:

    • C (+ embedded C)
    • Java
    • Go
    • C++

    If you’re feeling adventurous, at least the Java Paho client has preliminary MQTT 5 support available. You can play around with the API and get a feel about the upcoming Paho version. Just build the library from source and test it, but be aware that this is not safe for production use.

    There is also a very basic test broker implementation available at Eclipse Paho which can be used for playing around. This is of course only for very basic tests and does not support all MQTT 5 features yet. If you’re planning to write your own library, this may be a good tool to test your implementation against.

    There are of other individual MQTT library projects which may be worth to check out. As of December 2017 most of these libraries don’t have an MQTT 5 roadmap published yet.

    HiveMQ and MQTT 5

    You can’t use the new version of the MQTT protocol only by having a client that is ready for MQTT 5. The counterpart, the MQTT broker, also needs to fully support the new protocol version. At the time of writing, no broker is MQTT 5 ready yet.

    HiveMQ was the first broker to fully support version 3.1.1 of MQTT and of course here at HiveMQ we are committed to give our customers the advantage of the new features of version 5 of the protocol as soon as possible and viable.

    We are going to provide an Early Access version of the upcoming HiveMQ generation with MQTT 5 support by May/June 2018. If you’re a library developer or want to go live with the new protocol version as soon as possible: The Early Access version is for you. Add yourself to the Early Access Notification List and we’ll notify you when the Early Access version is available.

    We expect to release the upcoming HiveMQ generation in the third quarter of 2018 with full support of ALL MQTT 5 features at scale in an interoperable way with previous MQTT versions.

    When is MQTT 5 ready for prime time?

    MQTT is typically used in mission critical environments where it’s not acceptable that parts of the infrastructure, broker or client, are unreliable or have some rough edges. So it’s typically not advisable to be the very first to try out new things in a critical production environment.

    Here at HiveMQ we expect that the first users will go live to production in late Q3 (September) 2018 and in the subsequent months. After the releases of the Paho library in June and the HiveMQ Early Access version, the adoption of MQTT 5 is expected to increase rapidly.

    So, is MQTT 5 ready for prime time yet (as of December 2017)? No.

    Will the new version of the protocol be suitable for production environments in the second half of 2018: Yes, definitely.

    Upcoming topics in this series

    We will continue this blog post series in January after the European Christmas Holidays. To kick-off the technical part of the series, we will take a look at the foundational changes of the MQTT protocol. And after that, we will release one blog post per week that will thoroughly review and inspect one new feature in detail together with best practices and fun trivia.

    If you want us to send the next and all upcoming articles directly into your inbox, just use the newsletter subscribe form below.

    P.S. Don’t forget to let us know if MQTT 5 is of interest for you by participating in this quick poll.

    Have an awesome week,
    The HiveMQ Team

    How to Make Your Web App More Reliable and Performant Using webpack: a Yahoo Mail Case Study

    Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/168508200981

    yahoodevelopers:

    image

    By Murali Krishna Bachhu, Anurag Damle, and Utkarsh Shrivastava

    As engineers on the Yahoo Mail team at Oath, we pride ourselves on the things that matter most to developers: faster development cycles, more reliability, and better performance. Users don’t necessarily see these elements, but they certainly feel the difference they make when significant improvements are made. Recently, we were able to upgrade all three of these areas at scale by adopting webpack® as Yahoo Mail’s underlying module bundler, and you can do the same for your web application.

    What is webpack?

    webpack is an open source module bundler for modern JavaScript applications. When webpack processes your application, it recursively builds a dependency graph that includes every module your application needs. Then it packages all of those modules into a small number of bundles, often only one, to be loaded by the browser.

    webpack became our choice module bundler not only because it supports on-demand loading, multiple bundle generation, and has a relatively low runtime overhead, but also because it is better suited for web platforms and NodeJS apps and has great community support.

    image

    Comparison of webpack to other open source bundlers


    How did we integrate webpack?

    Like any developer does when integrating a new module bundler, we started integrating webpack into Yahoo Mail by looking at its basic config file. We explored available default webpack plugins as well as third-party webpack plugins and then picked the plugins most suitable for our application. If we didn’t find a plugin that suited a specific need, we wrote the webpack plugin ourselves (e.g., We wrote a plugin to execute Atomic CSS scripts in the latest Yahoo Mail experience in order to decrease our overall CSS payload**).

    During the development process for Yahoo Mail, we needed a way to make sure webpack would continuously run in the background. To make this happen, we decided to use the task runner Grunt. Not only does Grunt keep the connection to webpack alive, but it also gives us the ability to pass different parameters to the webpack config file based on the given environment. Some examples of these parameters are source map options, enabling HMR, and uglification.

    Before deployment to production, we wanted to optimize the javascript bundles for size to make the Yahoo Mail experience faster. webpack provides good default support for this with the UglifyJS plugin. Although the default options are conservative, they give us the ability to configure the options. Once we modified the options to our specifications, we saved approximately 10KB.

    image

    Code snippet showing the configuration options for the UglifyJS plugin


    Faster development cycles for developers

    While developing a new feature, engineers ideally want to see their code changes reflected on their web app instantaneously. This allows them to maintain their train of thought and eventually results in more productivity. Before we implemented webpack, it took us around 30 seconds to 1 minute for changes to reflect on our Yahoo Mail development environment. webpack helped us reduce the wait time to 5 seconds.

    More reliability

    Consumers love a reliable product, where all the features work seamlessly every time. Before we began using webpack, we were generating javascript bundles on demand or during run-time, which meant the product was more prone to exceptions or failures while fetching the javascript bundles. With webpack, we now generate all the bundles during build time, which means that all the bundles are available whenever consumers access Yahoo Mail. This results in significantly fewer exceptions and failures and a better experience overall.

    Better Performance

    We were able to attain a significant reduction of payload after adopting webpack.

    1. Reduction of about 75 KB gzipped Javascript payload
    2. 50% reduction on server-side render time
    3. 10% improvement in Yahoo Mail’s launch performance metrics, as measured by render time above the fold (e.g., Time to load contents of an email).

    Below are some charts that demonstrate the payload size of Yahoo Mail before and after implementing webpack.

    image

    Payload before using webpack (JavaScript Size = 741.41KB)


    image

    Payload after switching to webpack (JavaScript size = 669.08KB)


    image

    Conclusion

    Shifting to webpack has resulted in significant improvements. We saw a common build process go from 30 seconds to 5 seconds, large JavaScript bundle size reductions, and a halving in server-side rendering time. In addition to these benefits, our engineers have found the community support for webpack to have been impressive as well. webpack has made the development of Yahoo Mail more efficient and enhanced the product for users. We believe you can use it to achieve similar results for your web application as well.

    **Optimized CSS generation with Atomizer

    Before we implemented webpack into the development of Yahoo Mail, we looked into how we could decrease our CSS payload. To achieve this, we developed an in-house solution for writing modular and scoped CSS in React. Our solution is similar to the Atomizer library, and our CSS is written in JavaScript like the example below:

    image

    Sample snippet of CSS written with Atomizer


    Every React component creates its own styles.js file with required style definitions. React-Atomic-CSS converts these files into unique class definitions. Our total CSS payload after implementing our solution equaled all the unique style definitions in our code, or only 83KB (21KB gzipped).

    During our migration to webpack, we created a custom plugin and loader to parse these files and extract the unique style definitions from all of our CSS files. Since this process is tied to bundling, only CSS files that are part of the dependency chain are included in the final CSS.

    AWS Cloud9 – Cloud Developer Environments

    Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/aws-cloud9-cloud-developer-environments/

    One of the first things you learn when you start programming is that, just like any craftsperson, your tools matter. Notepad.exe isn’t going to cut it. A powerful editor and testing pipeline supercharge your productivity. I still remember learning to use Vim for the first time and being able to zip around systems and complex programs. Do you remember how hard it was to setup all your compilers and dependencies on a new machine? How many cycles have you wasted matching versions, tinkering with configs, and then writing documentation to onboard a new developer to a project?

    Today we’re launching AWS Cloud9, an Integrated Development Environment (IDE) for writing, running, and debugging code, all from your web browser. Cloud9 comes prepackaged with essential tools for many popular programming languages (Javascript, Python, PHP, etc.) so you don’t have to tinker with installing various compilers and toolchains. Cloud9 also provides a seamless experience for working with serverless applications allowing you to quickly switch between local and remote testing or debugging. Based on the popular open source Ace Editor and c9.io IDE (which we acquired last year), AWS Cloud9 is designed to make collaborative cloud development easy with extremely powerful pair programming features. There are more features than I could ever cover in this post but to give a quick breakdown I’ll break the IDE into 3 components: The editor, the AWS integrations, and the collaboration.

    Editing


    The Ace Editor at the core of Cloud9 is what lets you write code quickly, easily, and beautifully. It follows a UNIX philosophy of doing one thing and doing it well: writing code.

    It has all the typical IDE features you would expect: live syntax checking, auto-indent, auto-completion, code folding, split panes, version control integration, multiple cursors and selections, and it also has a few unique features I want to highlight. First of all, it’s fast, even for large (100000+ line) files. There’s no lag or other issues while typing. It has over two dozen themes built-in (solarized!) and you can bring all of your favorite themes from Sublime Text or TextMate as well. It has built-in support for 40+ language modes and customizable run configurations for your projects. Most importantly though, it has Vim mode (or emacs if your fingers work that way). It also has a keybinding editor that allows you to bend the editor to your will.

    The editor supports powerful keyboard navigation and commands (similar to Sublime Text or vim plugins like ctrlp). On a Mac, with ⌘+P you can open any file in your environment with fuzzy search. With ⌘+. you can open up the command pane which allows you to do invoke any of the editor commands by typing the name. It also helpfully displays the keybindings for a command in the pane, for instance to open to a terminal you can press ⌥+T. Oh, did I mention there’s a terminal? It ships with the AWS CLI preconfigured for access to your resources.

    The environment also comes with pre-installed debugging tools for many popular languages – but you’re not limited to what’s already installed. It’s easy to add in new programs and define new run configurations.

    The editor is just one, admittedly important, component in an IDE though. I want to show you some other compelling features.

    AWS Integrations

    The AWS Cloud9 IDE is the first IDE I’ve used that is truly “cloud native”. The service is provided at no additional charge, and you only charged for the underlying compute and storage resources. When you create an environment you’re prompted for either: an instance type and an auto-hibernate time, or SSH access to a machine of your choice.

    If you’re running in AWS the auto-hibernate feature will stop your instance shortly after you stop using your IDE. This can be a huge cost savings over running a more permanent developer desktop. You can also launch it within a VPC to give it secure access to your development resources. If you want to run Cloud9 outside of AWS, or on an existing instance, you can provide SSH access to the service which it will use to create an environment on the external machine. Your environment is provisioned with automatic and secure access to your AWS account so you don’t have to worry about copying credentials around. Let me say that again: you can run this anywhere.

    Serverless Development with AWS Cloud9

    I spend a lot of time on Twitch developing serverless applications. I have hundreds of lambda functions and APIs deployed. Cloud9 makes working with every single one of these functions delightful. Let me show you how it works.


    If you look in the top right side of the editor you’ll see an AWS Resources tab. Opening this you can see all of the lambda functions in your region (you can see functions in other regions by adjusting your region preferences in the AWS preference pane).

    You can import these remote functions to your local workspace just by double-clicking them. This allows you to edit, test, and debug your serverless applications all locally. You can create new applications and functions easily as well. If you click the Lambda icon in the top right of the pane you’ll be prompted to create a new lambda function and Cloud9 will automatically create a Serverless Application Model template for you as well. The IDE ships with support for the popular SAM local tool pre-installed. This is what I use in most of my local testing and serverless development. Since you have a terminal, it’s easy to install additional tools and use other serverless frameworks.

     

    Launching an Environment from AWS CodeStar

    With AWS CodeStar you can easily provision an end-to-end continuous delivery toolchain for development on AWS. Codestar provides a unified experience for building, testing, deploying, and managing applications using AWS CodeCommit, CodeBuild, CodePipeline, and CodeDeploy suite of services. Now, with a few simple clicks you can provision a Cloud9 environment to develop your application. Your environment will be pre-configured with the code for your CodeStar application already checked out and git credentials already configured.

    You can easily share this environment with your coworkers which leads me to another extremely useful set of features.

    Collaboration

    One of the many things that sets AWS Cloud9 apart from other editors are the rich collaboration tools. You can invite an IAM user to your environment with a few clicks.

    You can see what files they’re working on, where their cursors are, and even share a terminal. The chat features is useful as well.

    Things to Know

    • There are no additional charges for this service beyond the underlying compute and storage.
    • c9.io continues to run for existing users. You can continue to use all the features of c9.io and add new team members if you have a team account. In the future, we will provide tools for easy migration of your c9.io workspaces to AWS Cloud9.
    • AWS Cloud9 is available in the US West (Oregon), US East (Ohio), US East (N.Virginia), EU (Ireland), and Asia Pacific (Singapore) regions.

    I can’t wait to see what you build with AWS Cloud9!

    Randall

    Object models

    Post Syndicated from Eevee original https://eev.ee/blog/2017/11/28/object-models/

    Anonymous asks, with dollars:

    More about programming languages!

    Well then!

    I’ve written before about what I think objects are: state and behavior, which in practice mostly means method calls.

    I suspect that the popular impression of what objects are, and also how they should work, comes from whatever C++ and Java happen to do. From that point of view, the whole post above is probably nonsense. If the baseline notion of “object” is a rigid definition woven tightly into the design of two massively popular languages, then it doesn’t even make sense to talk about what “object” should mean — it does mean the features of those languages, and cannot possibly mean anything else.

    I think that’s a shame! It piles a lot of baggage onto a fairly simple idea. Polymorphism, for example, has nothing to do with objects — it’s an escape hatch for static type systems. Inheritance isn’t the only way to reuse code between objects, but it’s the easiest and fastest one, so it’s what we get. Frankly, it’s much closer to a speed tradeoff than a fundamental part of the concept.

    We could do with more experimentation around how objects work, but that’s impossible in the languages most commonly thought of as object-oriented.

    Here, then, is a (very) brief run through the inner workings of objects in four very dynamic languages. I don’t think I really appreciated objects until I’d spent some time with Python, and I hope this can help someone else whet their own appetite.

    Python 3

    Of the four languages I’m going to touch on, Python will look the most familiar to the Java and C++ crowd. For starters, it actually has a class construct.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    class Vector:
        def __init__(self, x, y):
            self.x = x
            self.y = y
    
        def __neg__(self):
            return Vector(-self.x, -self.y)
    
        def __div__(self, denom):
            return Vector(self.x / denom, self.y / denom)
    
        @property
        def magnitude(self):
            return (self.x ** 2 + self.y ** 2) ** 0.5
    
        def normalized(self):
            return self / self.magnitude
    

    The __init__ method is an initializer, which is like a constructor but named differently (because the object already exists in a usable form by the time the initializer is called). Operator overloading is done by implementing methods with other special __dunder__ names. Properties can be created with @property, where the @ is syntax for applying a wrapper function to a function as it’s defined. You can do inheritance, even multiply:

    1
    2
    3
    4
    class Foo(A, B, C):
        def bar(self, x, y, z):
            # do some stuff
            super().bar(x, y, z)
    

    Cool, a very traditional object model.

    Except… for some details.

    Some details

    For one, Python objects don’t have a fixed layout. Code both inside and outside the class can add or remove whatever attributes they want from whatever object they want. The underlying storage is just a dict, Python’s mapping type. (Or, rather, something like one. Also, it’s possible to change, which will probably be the case for everything I say here.)

    If you create some attributes at the class level, you’ll start to get a peek behind the curtains:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    class Foo:
        values = []
    
        def add_value(self, value):
            self.values.append(value)
    
    a = Foo()
    b = Foo()
    a.add_value('a')
    print(a.values)  # ['a']
    b.add_value('b')
    print(b.values)  # ['a', 'b']
    

    The [] assigned to values isn’t a default assigned to each object. In fact, the individual objects don’t know about it at all! You can use vars(a) to get at the underlying storage dict, and you won’t see a values entry in there anywhere.

    Instead, values lives on the class, which is a value (and thus an object) in its own right. When Python is asked for self.values, it checks to see if self has a values attribute; in this case, it doesn’t, so Python keeps going and asks the class for one.

    Python’s object model is secretly prototypical — a class acts as a prototype, as a shared set of fallback values, for its objects.

    In fact, this is also how method calls work! They aren’t syntactically special at all, which you can see by separating the attribute lookup from the call.

    1
    2
    3
    print("abc".startswith("a"))  # True
    meth = "abc".startswith
    print(meth("a"))  # True
    

    Reading obj.method looks for a method attribute; if there isn’t one on obj, Python checks the class. Here, it finds one: it’s a function from the class body.

    Ah, but wait! In the code I just showed, meth seems to “know” the object it came from, so it can’t just be a plain function. If you inspect the resulting value, it claims to be a “bound method” or “built-in method” rather than a function, too. Something funny is going on here, and that funny something is the descriptor protocol.

    Descriptors

    Python allows attributes to implement their own custom behavior when read from or written to. Such an attribute is called a descriptor. I’ve written about them before, but here’s a quick overview.

    If Python looks up an attribute, finds it in a class, and the value it gets has a __get__ method… then instead of using that value, Python will use the return value of its __get__ method.

    The @property decorator works this way. The magnitude property in my original example was shorthand for doing this:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    class MagnitudeDescriptor:
        def __get__(self, instance, owner):
            if instance is None:
                return self
            return (instance.x ** 2 + instance.y ** 2) ** 0.5
    
    class Vector:
        def __init__(self, x, y):
            self.x = x
            self.y = y
    
        magnitude = MagnitudeDescriptor()
    

    When you ask for somevec.magnitude, Python checks somevec but doesn’t find magnitude, so it consults the class instead. The class does have a magnitude, and it’s a value with a __get__ method, so Python calls that method and somevec.magnitude evaluates to its return value. (The instance is None check is because __get__ is called even if you get the descriptor directly from the class via Vector.magnitude. A descriptor intended to work on instances can’t do anything useful in that case, so the convention is to return the descriptor itself.)

    You can also intercept attempts to write to or delete an attribute, and do absolutely whatever you want instead. But note that, similar to operating overloading in Python, the descriptor must be on a class; you can’t just slap one on an arbitrary object and have it work.

    This brings me right around to how “bound methods” actually work. Functions are descriptors! The function type implements __get__, and when a function is retrieved from a class via an instance, that __get__ bundles the function and the instance together into a tiny bound method object. It’s essentially:

    1
    2
    3
    4
    5
    class FunctionType:
        def __get__(self, instance, owner):
            if instance is None:
                return self
            return functools.partial(self, instance)
    

    The self passed as the first argument to methods is not special or magical in any way. It’s built out of a few simple pieces that are also readily accessible to Python code.

    Note also that because obj.method() is just an attribute lookup and a call, Python doesn’t actually care whether method is a method on the class or just some callable thing on the object. You won’t get the auto-self behavior if it’s on the object, but otherwise there’s no difference.

    More attribute access, and the interesting part

    Descriptors are one of several ways to customize attribute access. Classes can implement __getattr__ to intervene when an attribute isn’t found on an object; __setattr__ and __delattr__ to intervene when any attribute is set or deleted; and __getattribute__ to implement unconditional attribute access. (That last one is a fantastic way to create accidental recursion, since any attribute access you do within __getattribute__ will of course call __getattribute__ again.)

    Here’s what I really love about Python. It might seem like a magical special case that descriptors only work on classes, but it really isn’t. You could implement exactly the same behavior yourself, in pure Python, using only the things I’ve just told you about. Classes are themselves objects, remember, and they are instances of type, so the reason descriptors only work on classes is that type effectively does this:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    class type:
        def __getattribute__(self, name):
            value = super().__getattribute__(name)
            # like all op overloads, __get__ must be on the type, not the instance
            ty = type(value)
            if hasattr(ty, '__get__'):
                # it's a descriptor!  this is a class access so there is no instance
                return ty.__get__(value, None, self)
            else:
                return value
    

    You can even trivially prove to yourself that this is what’s going on by skipping over types behavior:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    class Descriptor:
        def __get__(self, instance, owner):
            print('called!')
    
    class Foo:
        bar = Descriptor()
    
    Foo.bar  # called!
    type.__getattribute__(Foo, 'bar')  # called!
    object.__getattribute__(Foo, 'bar')  # ...
    

    And that’s not all! The mysterious super function, used to exhaustively traverse superclass method calls even in the face of diamond inheritance, can also be expressed in pure Python using these primitives. You could write your own superclass calling convention and use it exactly the same way as super.

    This is one of the things I really like about Python. Very little of it is truly magical; virtually everything about the object model exists in the types rather than the language, which means virtually everything can be customized in pure Python.

    Class creation and metaclasses

    A very brief word on all of this stuff, since I could talk forever about Python and I have three other languages to get to.

    The class block itself is fairly interesting. It looks like this:

    1
    2
    class Name(*bases, **kwargs):
        # code
    

    I’ve said several times that classes are objects, and in fact the class block is one big pile of syntactic sugar for calling type(...) with some arguments to create a new type object.

    The Python documentation has a remarkably detailed description of this process, but the gist is:

    • Python determines the type of the new class — the metaclass — by looking for a metaclass keyword argument. If there isn’t one, Python uses the “lowest” type among the provided base classes. (If you’re not doing anything special, that’ll just be type, since every class inherits from object and object is an instance of type.)

    • Python executes the class body. It gets its own local scope, and any assignments or method definitions go into that scope.

    • Python now calls type(name, bases, attrs, **kwargs). The name is whatever was right after class; the bases are position arguments; and attrs is the class body’s local scope. (This is how methods and other class attributes end up on the class.) The brand new type is then assigned to Name.

    Of course, you can mess with most of this. You can implement __prepare__ on a metaclass, for example, to use a custom mapping as storage for the local scope — including any reads, which allows for some interesting shenanigans. The only part you can’t really implement in pure Python is the scoping bit, which has a couple extra rules that make sense for classes. (In particular, functions defined within a class block don’t close over the class body; that would be nonsense.)

    Object creation

    Finally, there’s what actually happens when you create an object — including a class, which remember is just an invocation of type(...).

    Calling Foo(...) is implemented as, well, a call. Any type can implement calls with the __call__ special method, and you’ll find that type itself does so. It looks something like this:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    # oh, a fun wrinkle that's hard to express in pure python: type is a class, so
    # it's an instance of itself
    class type:
        def __call__(self, *args, **kwargs):
            # remember, here 'self' is a CLASS, an instance of type.
            # __new__ is a true constructor: object.__new__ allocates storage
            # for a new blank object
            instance = self.__new__(self, *args, **kwargs)
            # you can return whatever you want from __new__ (!), and __init__
            # is only called on it if it's of the right type
            if isinstance(instance, self):
                instance.__init__(*args, **kwargs)
            return instance
    

    Again, you can trivially confirm this by asking any type for its __call__ method. Assuming that type doesn’t implement __call__ itself, you’ll get back a bound version of types implementation.

    1
    2
    >>> list.__call__
    <method-wrapper '__call__' of type object at 0x7fafb831a400>
    

    You can thus implement __call__ in your own metaclass to completely change how subclasses are created — including skipping the creation altogether, if you like.

    And… there’s a bunch of stuff I haven’t even touched on.

    The Python philosophy

    Python offers something that, on the surface, looks like a “traditional” class/object model. Under the hood, it acts more like a prototypical system, where failed attribute lookups simply defer to a superclass or metaclass.

    The language also goes to almost superhuman lengths to expose all of its moving parts. Even the prototypical behavior is an implementation of __getattribute__ somewhere, which you are free to completely replace in your own types. Proxying and delegation are easy.

    Also very nice is that these features “bundle” well, by which I mean a library author can do all manner of convoluted hijinks, and a consumer of that library doesn’t have to see any of it or understand how it works. You only need to inherit from a particular class (which has a metaclass), or use some descriptor as a decorator, or even learn any new syntax.

    This meshes well with Python culture, which is pretty big on the principle of least surprise. These super-advanced features tend to be tightly confined to single simple features (like “makes a weak attribute“) or cordoned with DSLs (e.g., defining a form/struct/database table with a class body). In particular, I’ve never seen a metaclass in the wild implement its own __call__.

    I have mixed feelings about that. It’s probably a good thing overall that the Python world shows such restraint, but I wonder if there are some very interesting possibilities we’re missing out on. I implemented a metaclass __call__ myself, just once, in an entity/component system that strove to minimize fuss when communicating between components. It never saw the light of day, but I enjoyed seeing some new things Python could do with the same relatively simple syntax. I wouldn’t mind seeing, say, an object model based on composition (with no inheritance) built atop Python’s primitives.

    Lua

    Lua doesn’t have an object model. Instead, it gives you a handful of very small primitives for building your own object model. This is pretty typical of Lua — it’s a very powerful language, but has been carefully constructed to be very small at the same time. I’ve never encountered anything else quite like it, and “but it starts indexing at 1!” really doesn’t do it justice.

    The best way to demonstrate how objects work in Lua is to build some from scratch. We need two key features. The first is metatables, which bear a passing resemblance to Python’s metaclasses.

    Tables and metatables

    The table is Lua’s mapping type and its primary data structure. Keys can be any value other than nil. Lists are implemented as tables whose keys are consecutive integers starting from 1. Nothing terribly surprising. The dot operator is sugar for indexing with a string key.

    1
    2
    3
    4
    5
    local t = { a = 1, b = 2 }
    print(t['a'])  -- 1
    print(t.b)  -- 2
    t.c = 3
    print(t['c'])  -- 3
    

    A metatable is a table that can be associated with another value (usually another table) to change its behavior. For example, operator overloading is implemented by assigning a function to a special key in a metatable.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    local t = { a = 1, b = 2 }
    --print(t + 0)  -- error: attempt to perform arithmetic on a table value
    
    local mt = {
        __add = function(left, right)
            return 12
        end,
    }
    setmetatable(t, mt)
    print(t + 0)  -- 12
    

    Now, the interesting part: one of the special keys is __index, which is consulted when the base table is indexed by a key it doesn’t contain. Here’s a table that claims every key maps to itself.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    local t = {}
    local mt = {
        __index = function(table, key)
            return key
        end,
    }
    setmetatable(t, mt)
    print(t.foo)  -- foo
    print(t.bar)  -- bar
    print(t[3])  -- 3
    

    __index doesn’t have to be a function, either. It can be yet another table, in which case that table is simply indexed with the key. If the key still doesn’t exist and that table has a metatable with an __index, the process repeats.

    With this, it’s easy to have several unrelated tables that act as a single table. Call the base table an object, fill the __index table with functions and call it a class, and you have half of an object system. You can even get prototypical inheritance by chaining __indexes together.

    At this point things are a little confusing, since we have at least three tables going on, so here’s a diagram. Keep in mind that Lua doesn’t actually have anything called an “object”, “class”, or “method” — those are just convenient nicknames for a particular structure we might build with Lua’s primitives.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
                        ╔═══════════╗        ...
                        ║ metatable ║         ║
                        ╟───────────╢   ┌─────╨───────────────────────┐
                        ║ __index   ╫───┤ lookup table ("superclass") │
                        ╚═══╦═══════╝   ├─────────────────────────────┤
      ╔═══════════╗         ║           │ some other method           ┼─── function() ... end
      ║ metatable ║         ║           └─────────────────────────────┘
      ╟───────────╢   ┌─────╨──────────────────┐
      ║ __index   ╫───┤ lookup table ("class") │
      ╚═══╦═══════╝   ├────────────────────────┤
          ║           │ some method            ┼─── function() ... end
          ║           └────────────────────────┘
    ┌─────╨─────────────────┐
    │ base table ("object") │
    └───────────────────────┘
    

    Note that a metatable is not the same as a class; it defines behavior, not methods. Conversely, if you try to use a class directly as a metatable, it will probably not do much. (This is pretty different from e.g. Python, where operator overloads are just methods with funny names. One nice thing about the Lua approach is that you can keep interface-like functionality separate from methods, and avoid clogging up arbitrary objects’ namespaces. You could even use a dummy table as a key and completely avoid name collisions.)

    Anyway, code!

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    local class = {
        foo = function(a)
            print("foo got", a)
        end,
    }
    local mt = { __index = class }
    -- setmetatable returns its first argument, so this is nice shorthand
    local obj1 = setmetatable({}, mt)
    local obj2 = setmetatable({}, mt)
    obj1.foo(7)  -- foo got 7
    obj2.foo(9)  -- foo got 9
    

    Wait, wait, hang on. Didn’t I call these methods? How do they get at the object? Maybe Lua has a magical this variable?

    Methods, sort of

    Not quite, but this is where the other key feature comes in: method-call syntax. It’s the lightest touch of sugar, just enough to have method invocation.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    -- note the colon!
    a:b(c, d, ...)
    
    -- exactly equivalent to this
    -- (except that `a` is only evaluated once)
    a.b(a, c, d, ...)
    
    -- which of course is really this
    a["b"](a, c, d, ...)
    

    Now we can write methods that actually do something.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    local class = {
        bar = function(self)
            print("our score is", self.score)
        end,
    }
    local mt = { __index = class }
    local obj1 = setmetatable({ score = 13 }, mt)
    local obj2 = setmetatable({ score = 25 }, mt)
    obj1:bar()  -- our score is 13
    obj2:bar()  -- our score is 25
    

    And that’s all you need. Much like Python, methods and data live in the same namespace, and Lua doesn’t care whether obj:method() finds a function on obj or gets one from the metatable’s __index. Unlike Python, the function will be passed self either way, because self comes from the use of : rather than from the lookup behavior.

    (Aside: strictly speaking, any Lua value can have a metatable — and if you try to index a non-table, Lua will always consult the metatable’s __index. Strings all have the string library as a metatable, so you can call methods on them: try ("%s %s"):format(1, 2). I don’t think Lua lets user code set the metatable for non-tables, so this isn’t that interesting, but if you’re writing Lua bindings from C then you can wrap your pointers in metatables to give them methods implemented in C.)

    Bringing it all together

    Of course, writing all this stuff every time is a little tedious and error-prone, so instead you might want to wrap it all up inside a little function. No problem.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    local function make_object(body)
        -- create a metatable
        local mt = { __index = body }
        -- create a base table to serve as the object itself
        local obj = setmetatable({}, mt)
        -- and, done
        return obj
    end
    
    -- you can leave off parens if you're only passing in 
    local Dog = {
        -- this acts as a "default" value; if obj.barks is missing, __index will
        -- kick in and find this value on the class.  but if obj.barks is assigned
        -- to, it'll go in the object and shadow the value here.
        barks = 0,
    
        bark = function(self)
            self.barks = self.barks + 1
            print("woof!")
        end,
    }
    
    local mydog = make_object(Dog)
    mydog:bark()  -- woof!
    mydog:bark()  -- woof!
    mydog:bark()  -- woof!
    print(mydog.barks)  -- 3
    print(Dog.barks)  -- 0
    

    It works, but it’s fairly barebones. The nice thing is that you can extend it pretty much however you want. I won’t reproduce an entire serious object system here — lord knows there are enough of them floating around — but the implementation I have for my LÖVE games lets me do this:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    local Animal = Object:extend{
        cries = 0,
    }
    
    -- called automatically by Object
    function Animal:init()
        print("whoops i couldn't think of anything interesting to put here")
    end
    
    -- this is just nice syntax for adding a first argument called 'self', then
    -- assigning this function to Animal.cry
    function Animal:cry()
        self.cries = self.cries + 1
    end
    
    local Cat = Animal:extend{}
    
    function Cat:cry()
        print("meow!")
        Cat.__super.cry(self)
    end
    
    local cat = Cat()
    cat:cry()  -- meow!
    cat:cry()  -- meow!
    print(cat.cries)  -- 2
    

    When I say you can extend it however you want, I mean that. I could’ve implemented Python (2)-style super(Cat, self):cry() syntax; I just never got around to it. I could even make it work with multiple inheritance if I really wanted to — or I could go the complete opposite direction and only implement composition. I could implement descriptors, customizing the behavior of individual table keys. I could add pretty decent syntax for composition/proxying. I am trying very hard to end this section now.

    The Lua philosophy

    Lua’s philosophy is to… not have a philosophy? It gives you the bare minimum to make objects work, and you can do absolutely whatever you want from there. Lua does have something resembling prototypical inheritance, but it’s not so much a first-class feature as an emergent property of some very simple tools. And since you can make __index be a function, you could avoid the prototypical behavior and do something different entirely.

    The very severe downside, of course, is that you have to find or build your own object system — which can get pretty confusing very quickly, what with the multiple small moving parts. Third-party code may also have its own object system with subtly different behavior. (Though, in my experience, third-party code tries very hard to avoid needing an object system at all.)

    It’s hard to say what the Lua “culture” is like, since Lua is an embedded language that’s often a little different in each environment. I imagine it has a thousand millicultures, instead. I can say that the tedium of building my own object model has led me into something very “traditional”, with prototypical inheritance and whatnot. It’s partly what I’m used to, but it’s also just really dang easy to get working.

    Likewise, while I love properties in Python and use them all the dang time, I’ve yet to use a single one in Lua. They wouldn’t be particularly hard to add to my object model, but having to add them myself (or shop around for an object model with them and also port all my code to use it) adds a huge amount of friction. I’ve thought about designing an interesting ECS with custom object behavior, too, but… is it really worth the effort? For all the power and flexibility Lua offers, the cost is that by the time I have something working at all, I’m too exhausted to actually use any of it.

    JavaScript

    JavaScript is notable for being preposterously heavily used, yet not having a class block.

    Well. Okay. Yes. It has one now. It didn’t for a very long time, and even the one it has now is sugar.

    Here’s a vector class again:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    class Vector {
        constructor(x, y) {
            this.x = x;
            this.y = y;
        }
    
        get magnitude() {
            return Math.sqrt(this.x * this.x + this.y * this.y);
        }
    
        dot(other) {
            return this.x * other.x + this.y * other.y;
        }
    }
    

    In “classic” JavaScript, this would be written as:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    function Vector(x, y) {
        this.x = x;
        this.y = y;
    }
    
    Object.defineProperty(Vector.prototype, 'magnitude', {
        configurable: true,
        enumerable: true,
        get: function() {
            return Math.sqrt(this.x * this.x + this.y * this.y);
        },
    });
    
    
    Vector.prototype.dot = function(other) {
        return this.x * other.x + this.y * other.y;
    };
    

    Hm, yes. I can see why they added class.

    The JavaScript model

    In JavaScript, a new type is defined in terms of a function, which is its constructor.

    Right away we get into trouble here. There is a very big difference between these two invocations, which I actually completely forgot about just now after spending four hours writing about Python and Lua:

    1
    2
    let vec = Vector(3, 4);
    let vec = new Vector(3, 4);
    

    The first calls the function Vector. It assigns some properties to this, which here is going to be window, so now you have a global x and y. It then returns nothing, so vec is undefined.

    The second calls Vector with this set to a new empty object, then evaluates to that object. The result is what you’d actually expect.

    (You can detect this situation with the strange new.target expression, but I have never once remembered to do so.)

    From here, we have true, honest-to-god, first-class prototypical inheritance. The word “prototype” is even right there. When you write this:

    1
    vec.dot(vec2)
    

    JavaScript will look for dot on vec and (presumably) not find it. It then consults vecs prototype, an object you can see for yourself by using Object.getPrototypeOf(). Since vec is a Vector, its prototype is Vector.prototype.

    I stress that Vector.prototype is not the prototype for Vector. It’s the prototype for instances of Vector.

    (I say “instance”, but the true type of vec here is still just object. If you want to find Vector, it’s automatically assigned to the constructor property of its own prototype, so it’s available as vec.constructor.)

    Of course, Vector.prototype can itself have a prototype, in which case the process would continue if dot were not found. A common (and, arguably, very bad) way to simulate single inheritance is to set Class.prototype to an instance of a superclass to get the prototype right, then tack on the methods for Class. Nowadays we can do Object.create(Superclass.prototype).

    Now that I’ve been through Python and Lua, though, this isn’t particularly surprising. I kinda spoiled it.

    I suppose one difference in JavaScript is that you can tack arbitrary attributes directly onto Vector all you like, and they will remain invisible to instances since they aren’t in the prototype chain. This is kind of backwards from Lua, where you can squirrel stuff away in the metatable.

    Another difference is that every single object in JavaScript has a bunch of properties already tacked on — the ones in Object.prototype. Every object (and by “object” I mean any mapping) has a prototype, and that prototype defaults to Object.prototype, and it has a bunch of ancient junk like isPrototypeOf.

    (Nit: it’s possible to explicitly create an object with no prototype via Object.create(null).)

    Like Lua, and unlike Python, JavaScript doesn’t distinguish between keys found on an object and keys found via a prototype. Properties can be defined on prototypes with Object.defineProperty(), but that works just as well directly on an object, too. JavaScript doesn’t have a lot of operator overloading, but some things like Symbol.iterator also work on both objects and prototypes.

    About this

    You may, at this point, be wondering what this is. Unlike Lua and Python (and the last language below), this is a special built-in value — a context value, invisibly passed for every function call.

    It’s determined by where the function came from. If the function was the result of an attribute lookup, then this is set to the object containing that attribute. Otherwise, this is set to the global object, window. (You can also set this to whatever you want via the call method on functions.)

    This decision is made lexically, i.e. from the literal source code as written. There are no Python-style bound methods. In other words:

    1
    2
    3
    4
    5
    // this = obj
    obj.method()
    // this = window
    let meth = obj.method
    meth()
    

    Also, because this is reassigned on every function call, it cannot be meaningfully closed over, which makes using closures within methods incredibly annoying. The old approach was to assign this to some other regular name like self (which got syntax highlighting since it’s also a built-in name in browsers); then we got Function.bind, which produced a callable thing with a fixed context value, which was kind of nice; and now finally we have arrow functions, which explicitly close over the current this when they’re defined and don’t change it when called. Phew.

    Class syntax

    I already showed class syntax, and it’s really just one big macro for doing all the prototype stuff The Right Way. It even prevents you from calling the type without new. The underlying model is exactly the same, and you can inspect all the parts.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    class Vector { ... }
    
    console.log(Vector.prototype);  // { dot: ..., magnitude: ..., ... }
    let vec = new Vector(3, 4);
    console.log(Object.getPrototypeOf(vec));  // same as Vector.prototype
    
    // i don't know why you would subclass vector but let's roll with it
    class Vectest extends Vector { ... }
    
    console.log(Vectest.prototype);  // { ... }
    console.log(Object.getPrototypeOf(Vectest.prototype))  // same as Vector.prototype
    

    Alas, class syntax has a couple shortcomings. You can’t use the class block to assign arbitrary data to either the type object or the prototype — apparently it was deemed too confusing that mutations would be shared among instances. Which… is… how prototypes work. How Python works. How JavaScript itself, one of the most popular languages of all time, has worked for twenty-two years. Argh.

    You can still do whatever assignment you want outside of the class block, of course. It’s just a little ugly, and not something I’d think to look for with a sugary class.

    A more subtle result of this behavior is that a class block isn’t quite the same syntax as an object literal. The check for data isn’t a runtime thing; class Foo { x: 3 } fails to parse. So JavaScript now has two largely but not entirely identical styles of key/value block.

    Attribute access

    Here’s where things start to come apart at the seams, just a little bit.

    JavaScript doesn’t really have an attribute protocol. Instead, it has two… extension points, I suppose.

    One is Object.defineProperty, seen above. For common cases, there’s also the get syntax inside a property literal, which does the same thing. But unlike Python’s @property, these aren’t wrappers around some simple primitives; they are the primitives. JavaScript is the only language of these four to have “property that runs code on access” as a completely separate first-class concept.

    If you want to intercept arbitrary attribute access (and some kinds of operators), there’s a completely different primitive: the Proxy type. It doesn’t let you intercept attribute access or operators; instead, it produces a wrapper object that supports interception and defers to the wrapped object by default.

    It’s cool to see composition used in this way, but also, extremely weird. If you want to make your own type that overloads in or calling, you have to return a Proxy that wraps your own type, rather than actually returning your own type. And (unlike the other three languages in this post) you can’t return a different type from a constructor, so you have to throw that away and produce objects only from a factory. And instanceof would be broken, but you can at least fix that with Symbol.hasInstance — which is really operator overloading, implement yet another completely different way.

    I know the design here is a result of legacy and speed — if any object could intercept all attribute access, then all attribute access would be slowed down everywhere. Fair enough. It still leaves the surface area of the language a bit… bumpy?

    The JavaScript philosophy

    It’s a little hard to tell. The original idea of prototypes was interesting, but it was hidden behind some very awkward syntax. Since then, we’ve gotten a bunch of extra features awkwardly bolted on to reflect the wildly varied things the built-in types and DOM API were already doing. We have class syntax, but it’s been explicitly designed to avoid exposing the prototype parts of the model.

    I admit I don’t do a lot of heavy JavaScript, so I might just be overlooking it, but I’ve seen virtually no code that makes use of any of the recent advances in object capabilities. Forget about custom iterators or overloading call; I can’t remember seeing any JavaScript in the wild that even uses properties yet. I don’t know if everyone’s waiting for sufficient browser support, nobody knows about them, or nobody cares.

    The model has advanced recently, but I suspect JavaScript is still shackled to its legacy of “something about prototypes, I don’t really get it, just copy the other code that’s there” as an object model. Alas! Prototypes are so good. Hopefully class syntax will make it a bit more accessible, as it has in Python.

    Perl 5

    Perl 5 also doesn’t have an object system and expects you to build your own. But where Lua gives you two simple, powerful tools for building one, Perl 5 feels more like a puzzle with half the pieces missing. Clearly they were going for something, but they only gave you half of it.

    In brief, a Perl object is a reference that has been blessed with a package.

    I need to explain a few things. Honestly, one of the biggest problems with the original Perl object setup was how many strange corners and unique jargon you had to understand just to get off the ground.

    (If you want to try running any of this code, you should stick a use v5.26; as the first line. Perl is very big on backwards compatibility, so you need to opt into breaking changes, and even the mundane say builtin is behind a feature gate.)

    References

    A reference in Perl is sort of like a pointer, but its main use is very different. See, Perl has the strange property that its data structures try very hard to spill their contents all over the place. Despite having dedicated syntax for arrays — @foo is an array variable, distinct from the single scalar variable $foo — it’s actually impossible to nest arrays.

    1
    2
    3
    my @foo = (1, 2, 3, 4);
    my @bar = (@foo, @foo);
    # @bar is now a flat list of eight items: 1, 2, 3, 4, 1, 2, 3, 4
    

    The idea, I guess, is that an array is not one thing. It’s not a container, which happens to hold multiple things; it is multiple things. Anywhere that expects a single value, such as an array element, cannot contain an array, because an array fundamentally is not a single value.

    And so we have “references”, which are a form of indirection, but also have the nice property that they’re single values. They add containment around arrays, and in general they make working with most of Perl’s primitive types much more sensible. A reference to a variable can be taken with the \ operator, or you can use [ ... ] and { ... } to directly create references to anonymous arrays or hashes.

    1
    2
    3
    my @foo = (1, 2, 3, 4);
    my @bar = (\@foo, \@foo);
    # @bar is now a nested list of two items: [1, 2, 3, 4], [1, 2, 3, 4]
    

    (Incidentally, this is the sole reason I initially abandoned Perl for Python. Non-trivial software kinda requires nesting a lot of data structures, so you end up with references everywhere, and the syntax for going back and forth between a reference and its contents is tedious and ugly.)

    A Perl object must be a reference. Perl doesn’t care what kind of reference — it’s usually a hash reference, since hashes are a convenient place to store arbitrary properties, but it could just as well be a reference to an array, a scalar, or even a sub (i.e. function) or filehandle.

    I’m getting a little ahead of myself. First, the other half: blessing and packages.

    Packages and blessing

    Perl packages are just namespaces. A package looks like this:

    1
    2
    3
    4
    5
    6
    7
    package Foo::Bar;
    
    sub quux {
        say "hi from quux!";
    }
    
    # now Foo::Bar::quux() can be called from anywhere
    

    Nothing shocking, right? It’s just a named container. A lot of the details are kind of weird, like how a package exists in some liminal quasi-value space, but the basic idea is a Bag Of Stuff.

    The final piece is “blessing,” which is Perl’s funny name for binding a package to a reference. A very basic class might look like this:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    package Vector;
    
    # the name 'new' is convention, not special
    sub new {
        # perl argument passing is weird, don't ask
        my ($class, $x, $y) = @_;
    
        # create the object itself -- here, unusually, an array reference makes sense
        my $self = [ $x, $y ];
    
        # associate the package with that reference
        # note that $class here is just the regular string, 'Vector'
        bless $self, $class;
    
        return $self;
    }
    
    sub x {
        my ($self) = @_;
        return $self->[0];
    }
    
    sub y {
        my ($self) = @_;
        return $self->[1];
    }
    
    sub magnitude {
        my ($self) = @_;
        return sqrt($self->x ** 2 + $self->y ** 2);
    }
    
    # switch back to the "default" package
    package main;
    
    # -> is method call syntax, which passes the invocant as the first argument;
    # for a package, that's just the package name
    my $vec = Vector->new(3, 4);
    say $vec->magnitude;  # 5
    

    A few things of note here. First, $self->[0] has nothing to do with objects; it’s normal syntax for getting the value of a index 0 out of an array reference called $self. (Most classes are based on hashrefs and would use $self->{value} instead.) A blessed reference is still a reference and can be treated like one.

    In general, -> is Perl’s dereferencey operator, but its exact behavior depends on what follows. If it’s followed by brackets, then it’ll apply the brackets to the thing in the reference: ->{} to index a hash reference, ->[] to index an array reference, and ->() to call a function reference.

    But if -> is followed by an identifier, then it’s a method call. For packages, that means calling a function in the package and passing the package name as the first argument. For objects — blessed references — that means calling a function in the associated package and passing the object as the first argument.

    This is a little weird! A blessed reference is a superposition of two things: its normal reference behavior, and some completely orthogonal object behavior. Also, object behavior has no notion of methods vs data; it only knows about methods. Perl lets you omit parentheses in a lot of places, including when calling a method with no arguments, so $vec->magnitude is really $vec->magnitude().

    Perl’s blessing bears some similarities to Lua’s metatables, but ultimately Perl is much closer to Ruby’s “message passing” approach than the above three languages’ approaches of “get me something and maybe it’ll be callable”. (But this is no surprise — Ruby is a spiritual successor to Perl 5.)

    All of this leads to one little wrinkle: how do you actually expose data? Above, I had to write x and y methods. Am I supposed to do that for every single attribute on my type?

    Yes! But don’t worry, there are third-party modules to help with this incredibly fundamental task. Take Class::Accessor::Fast, so named because it’s faster than Class::Accessor:

    1
    2
    3
    package Foo;
    use base qw(Class::Accessor::Fast);
    __PACKAGE__->mk_accessors(qw(fred wilma barney));
    

    (__PACKAGE__ is the lexical name of the current package; qw(...) is a list literal that splits its contents on whitespace.)

    This assumes you’re using a hashref with keys of the same names as the attributes. $obj->fred will return the fred key from your hashref, and $obj->fred(4) will change it to 4.

    You also, somewhat bizarrely, have to inherit from Class::Accessor::Fast. Speaking of which,

    Inheritance

    Inheritance is done by populating the package-global @ISA array with some number of (string) names of parent packages. Most code instead opts to write use base ...;, which does the same thing. Or, more commonly, use parent ...;, which… also… does the same thing.

    Every package implicitly inherits from UNIVERSAL, which can be freely modified by Perl code.

    A method can call its superclass method with the SUPER:: pseudo-package:

    1
    2
    3
    4
    sub foo {
        my ($self) = @_;
        $self->SUPER::foo;
    }
    

    However, this does a depth-first search, which means it almost certainly does the wrong thing when faced with multiple inheritance. For a while the accepted solution involved a third-party module, but Perl eventually grew an alternative you have to opt into: C3, which may be more familiar to you as the order Python uses.

    1
    2
    3
    4
    5
    6
    use mro 'c3';
    
    sub foo {
        my ($self) = @_;
        $self->next::method;
    }
    

    Offhand, I’m not actually sure how next::method works, seeing as it was originally implemented in pure Perl code. I suspect it involves peeking at the caller’s stack frame. If so, then this is a very different style of customizability from e.g. Python — the MRO was never intended to be pluggable, and the use of a special pseudo-package means it isn’t really, but someone was determined enough to make it happen anyway.

    Operator overloading and whatnot

    Operator overloading looks a little weird, though really it’s pretty standard Perl.

    1
    2
    3
    4
    5
    6
    7
    8
    package MyClass;
    
    use overload '+' => \&_add;
    
    sub _add {
        my ($self, $other, $swap) = @_;
        ...
    }
    

    use overload here is a pragma, where “pragma” means “regular-ass module that does some wizardry when imported”.

    \&_add is how you get a reference to the _add sub so you can pass it to the overload module. If you just said &_add or _add, that would call it.

    And that’s it; you just pass a map of operators to functions to this built-in module. No worry about name clashes or pollution, which is pretty nice. You don’t even have to give references to functions that live in the package, if you don’t want them to clog your namespace; you could put them in another package, or even inline them anonymously.

    One especially interesting thing is that Perl lets you overload every operator. Perl has a lot of operators. It considers some math builtins like sqrt and trig functions to be operators, or at least operator-y enough that you can overload them. You can also overload the “file text” operators, such as -e $path to test whether a file exists. You can overload conversions, including implicit conversion to a regex. And most fascinating to me, you can overload dereferencing — that is, the thing Perl does when you say $hashref->{key} to get at the underlying hash. So a single object could pretend to be references of multiple different types, including a subref to implement callability. Neat.

    Somewhat related: you can overload basic operators (indexing, etc.) on basic types (not references!) with the tie function, which is designed completely differently and looks for methods with fixed names. Go figure.

    You can intercept calls to nonexistent methods by implementing a function called AUTOLOAD, within which the $AUTOLOAD global will contain the name of the method being called. Originally this feature was, I think, intended for loading binary components or large libraries on-the-fly only when needed, hence the name. Offhand I’m not sure I ever saw it used the way __getattr__ is used in Python.

    Is there a way to intercept all method calls? I don’t think so, but it is Perl, so I must be forgetting something.

    Actually no one does this any more

    Like a decade ago, a council of elder sages sat down and put together a whole whizbang system that covers all of it: Moose.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    package Vector;
    use Moose;
    
    has x => (is => 'rw', isa => 'Int');
    has y => (is => 'rw', isa => 'Int');
    
    sub magnitude {
        my ($self) = @_;
        return sqrt($self->x ** 2 + $self->y ** 2);
    }
    

    Moose has its own way to do pretty much everything, and it’s all built on the same primitives. Moose also adds metaclasses, somehow, despite that the underlying model doesn’t actually support them? I’m not entirely sure how they managed that, but I do remember doing some class introspection with Moose and it was much nicer than the built-in way.

    (If you’re wondering, the built-in way begins with looking at the hash called %Vector::. No, that’s not a typo.)

    I really cannot stress enough just how much stuff Moose does, but I don’t want to delve into it here since Moose itself is not actually the language model.

    The Perl philosophy

    I hope you can see what I meant with what I first said about Perl, now. It has multiple inheritance with an MRO, but uses the wrong one by default. It has extensive operator overloading, which looks nothing like how inheritance works, and also some of it uses a totally different mechanism with special method names instead. It only understands methods, not data, leaving you to figure out accessors by hand.

    There’s 70% of an object system here with a clear general design it was gunning for, but none of the pieces really look anything like each other. It’s weird, in a distinctly Perl way.

    The result is certainly flexible, at least! It’s especially cool that you can use whatever kind of reference you want for storage, though even as I say that, I acknowledge it’s no different from simply subclassing list or something in Python. It feels different in Perl, but maybe only because it looks so different.

    I haven’t written much Perl in a long time, so I don’t know what the community is like any more. Moose was already ubiquitous when I left, which you’d think would let me say “the community mostly focuses on the stuff Moose can do” — but even a decade ago, Moose could already do far more than I had ever seen done by hand in Perl. It’s always made a big deal out of roles (read: interfaces), for instance, despite that I’d never seen anyone care about them in Perl before Moose came along. Maybe their presence in Moose has made them more popular? Who knows.

    Also, I wrote Perl seriously, but in the intervening years I’ve only encountered people who only ever used Perl for one-offs. Maybe it’ll come as a surprise to a lot of readers that Perl has an object model at all.

    End

    Well, that was fun! I hope any of that made sense.

    Special mention goes to Rust, which doesn’t have an object model you can fiddle with at runtime, but does do things a little differently.

    It’s been really interesting thinking about how tiny differences make a huge impact on what people do in practice. Take the choice of storage in Perl versus Python. Perl’s massively common URI class uses a string as the storage, nothing else; I haven’t seen anything like that in Python aside from markupsafe, which is specifically designed as a string type. I would guess this is partly because Perl makes you choose — using a hashref is an obvious default, but you have to make that choice one way or the other. In Python (especially 3), inheriting from object and getting dict-based storage is the obvious thing to do; the ability to use another type isn’t quite so obvious, and doing it “right” involves a tiny bit of extra work.

    Or, consider that Lua could have descriptors, but the extra bit of work (especially design work) has been enough of an impediment that I’ve never implemented them. I don’t think the object implementations I’ve looked at have included them, either. Super weird!

    In that light, it’s only natural that objects would be so strongly associated with the features Java and C++ attach to them. I think that makes it all the more important to play around! Look at what Moose has done. No, really, you should bear in mind my description of how Perl does stuff and flip through the Moose documentation. It’s amazing what they’ve built.

    Introducing AWS AppSync – Build data-driven apps with real-time and off-line capabilities

    Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/introducing-amazon-appsync/

    In this day and age, it is almost impossible to do without our mobile devices and the applications that help make our lives easier. As our dependency on our mobile phone grows, the mobile application market has exploded with millions of apps vying for our attention. For mobile developers, this means that we must ensure that we build applications that provide the quality, real-time experiences that app users desire.  Therefore, it has become essential that mobile applications are developed to include features such as multi-user data synchronization, offline network support, and data discovery, just to name a few.  According to several articles, I read recently about mobile development trends on publications like InfoQ, DZone, and the mobile development blog AlleviateTech, one of the key elements in of delivering the aforementioned capabilities is with cloud-driven mobile applications.  It seems that this is especially true, as it related to mobile data synchronization and data storage.

    That being the case, it is a perfect time for me to announce a new service for building innovative mobile applications that are driven by data-intensive services in the cloud; AWS AppSync. AWS AppSync is a fully managed serverless GraphQL service for real-time data queries, synchronization, communications and offline programming features. For those not familiar, let me briefly share some information about the open GraphQL specification. GraphQL is a responsive data query language and server-side runtime for querying data sources that allow for real-time data retrieval and dynamic query execution. You can use GraphQL to build a responsive API for use in when building client applications. GraphQL works at the application layer and provides a type system for defining schemas. These schemas serve as specifications to define how operations should be performed on the data and how the data should be structured when retrieved. Additionally, GraphQL has a declarative coding model which is supported by many client libraries and frameworks including React, React Native, iOS, and Android.

    Now the power of the GraphQL open standard query language is being brought to you in a rich managed service with AWS AppSync.  With AppSync developers can simplify the retrieval and manipulation of data across multiple data sources with ease, allowing them to quickly prototype, build and create robust, collaborative, multi-user applications. AppSync keeps data updated when devices are connected, but enables developers to build solutions that work offline by caching data locally and synchronizing local data when connections become available.

    Let’s discuss some key concepts of AWS AppSync and how the service works.

    AppSync Concepts

    • AWS AppSync Client: service client that defines operations, wraps authorization details of requests, and manage offline logic.
    • Data Source: the data storage system or a trigger housing data
    • Identity: a set of credentials with permissions and identification context provided with requests to GraphQL proxy
    • GraphQL Proxy: the GraphQL engine component for processing and mapping requests, handling conflict resolution, and managing Fine Grained Access Control
    • Operation: one of three GraphQL operations supported in AppSync
      • Query: a read-only fetch call to the data
      • Mutation: a write of the data followed by a fetch,
      • Subscription: long-lived connections that receive data in response to events.
    • Action: a notification to connected subscribers from a GraphQL subscription.
    • Resolver: function using request and response mapping templates that converts and executes payload against data source

    How It Works

    A schema is created to define types and capabilities of the desired GraphQL API and tied to a Resolver function.  The schema can be created to mirror existing data sources or AWS AppSync can create tables automatically based the schema definition. Developers can also use GraphQL features for data discovery without having knowledge of the backend data sources. After a schema definition is established, an AWS AppSync client can be configured with an operation request, like a Query operation. The client submits the operation request to GraphQL Proxy along with an identity context and credentials. The GraphQL Proxy passes this request to the Resolver which maps and executes the request payload against pre-configured AWS data services like an Amazon DynamoDB table, an AWS Lambda function, or a search capability using Amazon Elasticsearch. The Resolver executes calls to one or all of these services within a single network call minimizing CPU cycles and bandwidth needs and returns the response to the client. Additionally, the client application can change data requirements in code on demand and the AppSync GraphQL API will dynamically map requests for data accordingly, allowing prototyping and faster development.

    In order to take a quick peek at the service, I’ll go to the AWS AppSync console. I’ll click the Create API button to get started.

     

    When the Create new API screen opens, I’ll give my new API a name, TarasTestApp, and since I am just exploring the new service I will select the Sample schema option.  You may notice from the informational dialog box on the screen that in using the sample schema, AWS AppSync will automatically create the DynamoDB tables and the IAM roles for me.It will also deploy the TarasTestApp API on my behalf.  After review of the sample schema provided by the console, I’ll click the Create button to create my test API.

    After the TaraTestApp API has been created and the associated AWS resources provisioned on my behalf, I can make updates to the schema, data source, or connect my data source(s) to a resolver. I also can integrate my GraphQL API into an iOS, Android, Web, or React Native application by cloning the sample repo from GitHub and downloading the accompanying GraphQL schema.  These application samples are great to help get you started and they are pre-configured to function in offline scenarios.

    If I select the Schema menu option on the console, I can update and view the TarasTestApp GraphQL API schema.


    Additionally, if I select the Data Sources menu option in the console, I can see the existing data sources.  Within this screen, I can update, delete, or add data sources if I so desire.

    Next, I will select the Query menu option which takes me to the console tool for writing and testing queries. Since I chose the sample schema and the AWS AppSync service did most of the heavy lifting for me, I’ll try a query against my new GraphQL API.

    I’ll use a mutation to add data for the event type in my schema. Since this is a mutation and it first writes data and then does a read of the data, I want the query to return values for name and where.

    If I go to the DynamoDB table created for the event type in the schema, I will see that the values from my query have been successfully written into the table. Now that was a pretty simple task to write and retrieve data based on a GraphQL API schema from a data source, don’t you think.


     Summary

    AWS AppSync is currently in AWS AppSync is in Public Preview and you can sign up today. It supports development for iOS, Android, and JavaScript applications. You can take advantage of this managed GraphQL service by going to the AWS AppSync console or learn more by reviewing more details about the service by reading a tutorial in the AWS documentation for the service or checking out our AWS AppSync Developer Guide.

    Tara

     

    Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production

    Post Syndicated from Rafi Ton original https://aws.amazon.com/blogs/big-data/using-amazon-redshift-spectrum-amazon-athena-and-aws-glue-with-node-js-in-production/

    This is a guest post by Rafi Ton, founder and CEO of NUVIAD. NUVIAD is, in their own words, “a mobile marketing platform providing professional marketers, agencies and local businesses state of the art tools to promote their products and services through hyper targeting, big data analytics and advanced machine learning tools.”

    At NUVIAD, we’ve been using Amazon Redshift as our main data warehouse solution for more than 3 years.

    We store massive amounts of ad transaction data that our users and partners analyze to determine ad campaign strategies. When running real-time bidding (RTB) campaigns in large scale, data freshness is critical so that our users can respond rapidly to changes in campaign performance. We chose Amazon Redshift because of its simplicity, scalability, performance, and ability to load new data in near real time.

    Over the past three years, our customer base grew significantly and so did our data. We saw our Amazon Redshift cluster grow from three nodes to 65 nodes. To balance cost and analytics performance, we looked for a way to store large amounts of less-frequently analyzed data at a lower cost. Yet, we still wanted to have the data immediately available for user queries and to meet their expectations for fast performance. We turned to Amazon Redshift Spectrum.

    In this post, I explain the reasons why we extended Amazon Redshift with Redshift Spectrum as our modern data warehouse. I cover how our data growth and the need to balance cost and performance led us to adopt Redshift Spectrum. I also share key performance metrics in our environment, and discuss the additional AWS services that provide a scalable and fast environment, with data available for immediate querying by our growing user base.

    Amazon Redshift as our foundation

    The ability to provide fresh, up-to-the-minute data to our customers and partners was always a main goal with our platform. We saw other solutions provide data that was a few hours old, but this was not good enough for us. We insisted on providing the freshest data possible. For us, that meant loading Amazon Redshift in frequent micro batches and allowing our customers to query Amazon Redshift directly to get results in near real time.

    The benefits were immediately evident. Our customers could see how their campaigns performed faster than with other solutions, and react sooner to the ever-changing media supply pricing and availability. They were very happy.

    However, this approach required Amazon Redshift to store a lot of data for long periods, and our data grew substantially. In our peak, we maintained a cluster running 65 DC1.large nodes. The impact on our Amazon Redshift cluster was evident, and we saw our CPU utilization grow to 90%.

    Why we extended Amazon Redshift to Redshift Spectrum

    Redshift Spectrum gives us the ability to run SQL queries using the powerful Amazon Redshift query engine against data stored in Amazon S3, without needing to load the data. With Redshift Spectrum, we store data where we want, at the cost that we want. We have the data available for analytics when our users need it with the performance they expect.

    Seamless scalability, high performance, and unlimited concurrency

    Scaling Redshift Spectrum is a simple process. First, it allows us to leverage Amazon S3 as the storage engine and get practically unlimited data capacity.

    Second, if we need more compute power, we can leverage Redshift Spectrum’s distributed compute engine over thousands of nodes to provide superior performance – perfect for complex queries running against massive amounts of data.

    Third, all Redshift Spectrum clusters access the same data catalog so that we don’t have to worry about data migration at all, making scaling effortless and seamless.

    Lastly, since Redshift Spectrum distributes queries across potentially thousands of nodes, they are not affected by other queries, providing much more stable performance and unlimited concurrency.

    Keeping it SQL

    Redshift Spectrum uses the same query engine as Amazon Redshift. This means that we did not need to change our BI tools or query syntax, whether we used complex queries across a single table or joins across multiple tables.

    An interesting capability introduced recently is the ability to create a view that spans both Amazon Redshift and Redshift Spectrum external tables. With this feature, you can query frequently accessed data in your Amazon Redshift cluster and less-frequently accessed data in Amazon S3, using a single view.

    Leveraging Parquet for higher performance

    Parquet is a columnar data format that provides superior performance and allows Redshift Spectrum (or Amazon Athena) to scan significantly less data. With less I/O, queries run faster and we pay less per query. You can read all about Parquet at https://parquet.apache.org/ or https://en.wikipedia.org/wiki/Apache_Parquet.

    Lower cost

    From a cost perspective, we pay standard rates for our data in Amazon S3, and only small amounts per query to analyze data with Redshift Spectrum. Using the Parquet format, we can significantly reduce the amount of data scanned. Our costs are now lower, and our users get fast results even for large complex queries.

    What we learned about Amazon Redshift vs. Redshift Spectrum performance

    When we first started looking at Redshift Spectrum, we wanted to put it to the test. We wanted to know how it would compare to Amazon Redshift, so we looked at two key questions:

    1. What is the performance difference between Amazon Redshift and Redshift Spectrum on simple and complex queries?
    2. Does the data format impact performance?

    During the migration phase, we had our dataset stored in Amazon Redshift and S3 as CSV/GZIP and as Parquet file formats. We tested three configurations:

    • Amazon Redshift cluster with 28 DC1.large nodes
    • Redshift Spectrum using CSV/GZIP
    • Redshift Spectrum using Parquet

    We performed benchmarks for simple and complex queries on one month’s worth of data. We tested how much time it took to perform the query, and how consistent the results were when running the same query multiple times. The data we used for the tests was already partitioned by date and hour. Properly partitioning the data improves performance significantly and reduces query times.

    Simple query

    First, we tested a simple query aggregating billing data across a month:

    SELECT 
      user_id, 
      count(*) AS impressions, 
      SUM(billing)::decimal /1000000 AS billing 
    FROM <table_name> 
    WHERE 
      date >= '2017-08-01' AND 
      date <= '2017-08-31'  
    GROUP BY 
      user_id;

    We ran the same query seven times and measured the response times (red marking the longest time and green the shortest time):

    Execution Time (seconds)
      Amazon Redshift Redshift Spectrum
    CSV
    Redshift Spectrum Parquet
    Run #1 39.65 45.11 11.92
    Run #2 15.26 43.13 12.05
    Run #3 15.27 46.47 13.38
    Run #4 21.22 51.02 12.74
    Run #5 17.27 43.35 11.76
    Run #6 16.67 44.23 13.67
    Run #7 25.37 40.39 12.75
    Average 21.53  44.82 12.61

    For simple queries, Amazon Redshift performed better than Redshift Spectrum, as we thought, because the data is local to Amazon Redshift.

    What was surprising was that using Parquet data format in Redshift Spectrum significantly beat ‘traditional’ Amazon Redshift performance. For our queries, using Parquet data format with Redshift Spectrum delivered an average 40% performance gain over traditional Amazon Redshift. Furthermore, Redshift Spectrum showed high consistency in execution time with a smaller difference between the slowest run and the fastest run.

    Comparing the amount of data scanned when using CSV/GZIP and Parquet, the difference was also significant:

    Data Scanned (GB)
    CSV (Gzip) 135.49
    Parquet 2.83

    Because we pay only for the data scanned by Redshift Spectrum, the cost saving of using Parquet is evident and substantial.

    Complex query

    Next, we compared the same three configurations with a complex query.

    Execution Time (seconds)
      Amazon Redshift Redshift Spectrum CSV Redshift Spectrum Parquet
    Run #1 329.80 84.20 42.40
    Run #2 167.60 65.30 35.10
    Run #3 165.20 62.20 23.90
    Run #4 273.90 74.90 55.90
    Run #5 167.70 69.00 58.40
    Average 220.84 71.12 43.14

    This time, Redshift Spectrum using Parquet cut the average query time by 80% compared to traditional Amazon Redshift!

    Bottom line: For complex queries, Redshift Spectrum provided a 67% performance gain over Amazon Redshift. Using the Parquet data format, Redshift Spectrum delivered an 80% performance improvement over Amazon Redshift. For us, this was substantial.

    Optimizing the data structure for different workloads

    Because the cost of S3 is relatively inexpensive and we pay only for the data scanned by each query, we believe that it makes sense to keep our data in different formats for different workloads and different analytics engines. It is important to note that we can have any number of tables pointing to the same data on S3. It all depends on how we partition the data and update the table partitions.

    Data permutations

    For example, we have a process that runs every minute and generates statistics for the last minute of data collected. With Amazon Redshift, this would be done by running the query on the table with something as follows:

    SELECT 
      user, 
      COUNT(*) 
    FROM 
      events_table 
    WHERE 
      ts BETWEEN ‘2017-08-01 14:00:00’ AND ‘2017-08-01 14:00:59’ 
    GROUP BY 
      user;

    (Assuming ‘ts’ is your column storing the time stamp for each event.)

    With Redshift Spectrum, we pay for the data scanned in each query. If the data is partitioned by the minute instead of the hour, a query looking at one minute would be 1/60th the cost. If we use a temporary table that points only to the data of the last minute, we save that unnecessary cost.

    Creating Parquet data efficiently

    On the average, we have 800 instances that process our traffic. Each instance sends events that are eventually loaded into Amazon Redshift. When we started three years ago, we would offload data from each server to S3 and then perform a periodic copy command from S3 to Amazon Redshift.

    Recently, Amazon Kinesis Firehose added the capability to offload data directly to Amazon Redshift. While this is now a viable option, we kept the same collection process that worked flawlessly and efficiently for three years.

    This changed, however, when we incorporated Redshift Spectrum. With Redshift Spectrum, we needed to find a way to:

    • Collect the event data from the instances.
    • Save the data in Parquet format.
    • Partition the data effectively.

    To accomplish this, we save the data as CSV and then transform it to Parquet. The most effective method to generate the Parquet files is to:

    1. Send the data in one-minute intervals from the instances to Kinesis Firehose with an S3 temporary bucket as the destination.
    2. Aggregate hourly data and convert it to Parquet using AWS Lambda and AWS Glue.
    3. Add the Parquet data to S3 by updating the table partitions.

    With this new process, we had to give more attention to validating the data before we sent it to Kinesis Firehose, because a single corrupted record in a partition fails queries on that partition.

    Data validation

    To store our click data in a table, we considered the following SQL create table command:

    create external TABLE spectrum.blog_clicks (
        user_id varchar(50),
        campaign_id varchar(50),
        os varchar(50),
        ua varchar(255),
        ts bigint,
        billing float
    )
    partitioned by (date date, hour smallint)  
    stored as parquet
    location 's3://nuviad-temp/blog/clicks/';

    The above statement defines a new external table (all Redshift Spectrum tables are external tables) with a few attributes. We stored ‘ts’ as a Unix time stamp and not as Timestamp, and billing data is stored as float and not decimal (more on that later). We also said that the data is partitioned by date and hour, and then stored as Parquet on S3.

    First, we need to get the table definitions. This can be achieved by running the following query:

    SELECT 
      * 
    FROM 
      svv_external_columns 
    WHERE 
      tablename = 'blog_clicks';

    This query lists all the columns in the table with their respective definitions:

    schemaname tablename columnname external_type columnnum part_key
    spectrum blog_clicks user_id varchar(50) 1 0
    spectrum blog_clicks campaign_id varchar(50) 2 0
    spectrum blog_clicks os varchar(50) 3 0
    spectrum blog_clicks ua varchar(255) 4 0
    spectrum blog_clicks ts bigint 5 0
    spectrum blog_clicks billing double 6 0
    spectrum blog_clicks date date 7 1
    spectrum blog_clicks hour smallint 8 2

    Now we can use this data to create a validation schema for our data:

    const rtb_request_schema = {
        "name": "clicks",
        "items": {
            "user_id": {
                "type": "string",
                "max_length": 100
            },
            "campaign_id": {
                "type": "string",
                "max_length": 50
            },
            "os": {
                "type": "string",
                "max_length": 50            
            },
            "ua": {
                "type": "string",
                "max_length": 255            
            },
            "ts": {
                "type": "integer",
                "min_value": 0,
                "max_value": 9999999999999
            },
            "billing": {
                "type": "float",
                "min_value": 0,
                "max_value": 9999999999999
            }
        }
    };

    Next, we create a function that uses this schema to validate data:

    function valueIsValid(value, item_schema) {
        if (schema.type == 'string') {
            return (typeof value == 'string' && value.length <= schema.max_length);
        }
        else if (schema.type == 'integer') {
            return (typeof value == 'number' && value >= schema.min_value && value <= schema.max_value);
        }
        else if (schema.type == 'float' || schema.type == 'double') {
            return (typeof value == 'number' && value >= schema.min_value && value <= schema.max_value);
        }
        else if (schema.type == 'boolean') {
            return typeof value == 'boolean';
        }
        else if (schema.type == 'timestamp') {
            return (new Date(value)).getTime() > 0;
        }
        else {
            return true;
        }
    }

    Near real-time data loading with Kinesis Firehose

    On Kinesis Firehose, we created a new delivery stream to handle the events as follows:

    Delivery stream name: events
    Source: Direct PUT
    S3 bucket: nuviad-events
    S3 prefix: rtb/
    IAM role: firehose_delivery_role_1
    Data transformation: Disabled
    Source record backup: Disabled
    S3 buffer size (MB): 100
    S3 buffer interval (sec): 60
    S3 Compression: GZIP
    S3 Encryption: No Encryption
    Status: ACTIVE
    Error logging: Enabled

    This delivery stream aggregates event data every minute, or up to 100 MB, and writes the data to an S3 bucket as a CSV/GZIP compressed file. Next, after we have the data validated, we can safely send it to our Kinesis Firehose API:

    if (validated) {
        let itemString = item.join('|')+'\n'; //Sending csv delimited by pipe and adding new line
    
        let params = {
            DeliveryStreamName: 'events',
            Record: {
                Data: itemString
            }
        };
    
        firehose.putRecord(params, function(err, data) {
            if (err) {
                console.error(err, err.stack);        
            }
            else {
                // Continue to your next step 
            }
        });
    }

    Now, we have a single CSV file representing one minute of event data stored in S3. The files are named automatically by Kinesis Firehose by adding a UTC time prefix in the format YYYY/MM/DD/HH before writing objects to S3. Because we use the date and hour as partitions, we need to change the file naming and location to fit our Redshift Spectrum schema.

    Automating data distribution using AWS Lambda

    We created a simple Lambda function triggered by an S3 put event that copies the file to a different location (or locations), while renaming it to fit our data structure and processing flow. As mentioned before, the files generated by Kinesis Firehose are structured in a pre-defined hierarchy, such as:

    S3://your-bucket/your-prefix/2017/08/01/20/events-4-2017-08-01-20-06-06-536f5c40-6893-4ee4-907d-81e4d3b09455.gz

    All we need to do is parse the object name and restructure it as we see fit. In our case, we did the following (the event is an object received in the Lambda function with all the data about the object written to S3):

    /*
    	object key structure in the event object:
    your-prefix/2017/08/01/20/event-4-2017-08-01-20-06-06-536f5c40-6893-4ee4-907d-81e4d3b09455.gz
    	*/
    
    let key_parts = event.Records[0].s3.object.key.split('/'); 
    
    let event_type = key_parts[0];
    let date = key_parts[1] + '-' + key_parts[2] + '-' + key_parts[3];
    let hour = key_parts[4];
    if (hour.indexOf('0') == 0) {
     		hour = parseInt(hour, 10) + '';
    }
        
    let parts1 = key_parts[5].split('-');
    let minute = parts1[7];
    if (minute.indexOf('0') == 0) {
            minute = parseInt(minute, 10) + '';
    }

    Now, we can redistribute the file to the two destinations we need—one for the minute processing task and the other for hourly aggregation:

        copyObjectToHourlyFolder(event, date, hour, minute)
            .then(copyObjectToMinuteFolder.bind(null, event, date, hour, minute))
            .then(addPartitionToSpectrum.bind(null, event, date, hour, minute))
            .then(deleteOldMinuteObjects.bind(null, event))
            .then(deleteStreamObject.bind(null, event))        
            .then(result => {
                callback(null, { message: 'done' });            
            })
            .catch(err => {
                console.error(err);
                callback(null, { message: err });            
            }); 

    Kinesis Firehose stores the data in a temporary folder. We copy the object to another folder that holds the data for the last processed minute. This folder is connected to a small Redshift Spectrum table where the data is being processed without needing to scan a much larger dataset. We also copy the data to a folder that holds the data for the entire hour, to be later aggregated and converted to Parquet.

    Because we partition the data by date and hour, we created a new partition on the Redshift Spectrum table if the processed minute is the first minute in the hour (that is, minute 0). We ran the following:

    ALTER TABLE 
      spectrum.events 
    ADD partition
      (date='2017-08-01', hour=0) 
      LOCATION 's3://nuviad-temp/events/2017-08-01/0/';

    After the data is processed and added to the table, we delete the processed data from the temporary Kinesis Firehose storage and from the minute storage folder.

    Migrating CSV to Parquet using AWS Glue and Amazon EMR

    The simplest way we found to run an hourly job converting our CSV data to Parquet is using Lambda and AWS Glue (and thanks to the awesome AWS Big Data team for their help with this).

    Creating AWS Glue jobs

    What this simple AWS Glue script does:

    • Gets parameters for the job, date, and hour to be processed
    • Creates a Spark EMR context allowing us to run Spark code
    • Reads CSV data into a DataFrame
    • Writes the data as Parquet to the destination S3 bucket
    • Adds or modifies the Redshift Spectrum / Amazon Athena table partition for the table
    import sys
    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    import boto3
    
    ## @params: [JOB_NAME]
    args = getResolvedOptions(sys.argv, ['JOB_NAME','day_partition_key', 'hour_partition_key', 'day_partition_value', 'hour_partition_value' ])
    
    #day_partition_key = "partition_0"
    #hour_partition_key = "partition_1"
    #day_partition_value = "2017-08-01"
    #hour_partition_value = "0"
    
    day_partition_key = args['day_partition_key']
    hour_partition_key = args['hour_partition_key']
    day_partition_value = args['day_partition_value']
    hour_partition_value = args['hour_partition_value']
    
    print("Running for " + day_partition_value + "/" + hour_partition_value)
    
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)
    
    df = spark.read.option("delimiter","|").csv("s3://nuviad-temp/events/"+day_partition_value+"/"+hour_partition_value)
    df.registerTempTable("data")
    
    df1 = spark.sql("select _c0 as user_id, _c1 as campaign_id, _c2 as os, _c3 as ua, cast(_c4 as bigint) as ts, cast(_c5 as double) as billing from data")
    
    df1.repartition(1).write.mode("overwrite").parquet("s3://nuviad-temp/parquet/"+day_partition_value+"/hour="+hour_partition_value)
    
    client = boto3.client('athena', region_name='us-east-1')
    
    response = client.start_query_execution(
        QueryString='alter table parquet_events add if not exists partition(' + day_partition_key + '=\'' + day_partition_value + '\',' + hour_partition_key + '=' + hour_partition_value + ')  location \'s3://nuviad-temp/parquet/' + day_partition_value + '/hour=' + hour_partition_value + '\'' ,
        QueryExecutionContext={
            'Database': 'spectrumdb'
        },
        ResultConfiguration={
            'OutputLocation': 's3://nuviad-temp/convertresults'
        }
    )
    
    response = client.start_query_execution(
        QueryString='alter table parquet_events partition(' + day_partition_key + '=\'' + day_partition_value + '\',' + hour_partition_key + '=' + hour_partition_value + ') set location \'s3://nuviad-temp/parquet/' + day_partition_value + '/hour=' + hour_partition_value + '\'' ,
        QueryExecutionContext={
            'Database': 'spectrumdb'
        },
        ResultConfiguration={
            'OutputLocation': 's3://nuviad-temp/convertresults'
        }
    )
    
    job.commit()

    Note: Because Redshift Spectrum and Athena both use the AWS Glue Data Catalog, we could use the Athena client to add the partition to the table.

    Here are a few words about float, decimal, and double. Using decimal proved to be more challenging than we expected, as it seems that Redshift Spectrum and Spark use them differently. Whenever we used decimal in Redshift Spectrum and in Spark, we kept getting errors, such as:

    S3 Query Exception (Fetch). Task failed due to an internal error. File 'https://s3-external-1.amazonaws.com/nuviad-temp/events/2017-08-01/hour=2/part-00017-48ae5b6b-906e-4875-8cde-bc36c0c6d0ca.c000.snappy.parquet has an incompatible Parquet schema for column 's3://nuviad-events/events.lat'. Column type: DECIMAL(18, 8), Parquet schema:\noptional float lat [i:4 d:1 r:0]\n (https://s3-external-1.amazonaws.com/nuviad-temp/events/2017-08-01/hour=2/part-00017-48ae5b6b-906e-4875-8cde-bc36c0c6d0ca.c000.snappy.parq

    We had to experiment with a few floating-point formats until we found that the only combination that worked was to define the column as double in the Spark code and float in Spectrum. This is the reason you see billing defined as float in Spectrum and double in the Spark code.

    Creating a Lambda function to trigger conversion

    Next, we created a simple Lambda function to trigger the AWS Glue script hourly using a simple Python code:

    import boto3
    import json
    from datetime import datetime, timedelta
     
    client = boto3.client('glue')
     
    def lambda_handler(event, context):
        last_hour_date_time = datetime.now() - timedelta(hours = 1)
        day_partition_value = last_hour_date_time.strftime("%Y-%m-%d") 
        hour_partition_value = last_hour_date_time.strftime("%-H") 
        response = client.start_job_run(
        JobName='convertEventsParquetHourly',
        Arguments={
             '--day_partition_key': 'date',
             '--hour_partition_key': 'hour',
             '--day_partition_value': day_partition_value,
             '--hour_partition_value': hour_partition_value
             }
        )

    Using Amazon CloudWatch Events, we trigger this function hourly. This function triggers an AWS Glue job named ‘convertEventsParquetHourly’ and runs it for the previous hour, passing job names and values of the partitions to process to AWS Glue.

    Redshift Spectrum and Node.js

    Our development stack is based on Node.js, which is well-suited for high-speed, light servers that need to process a huge number of transactions. However, a few limitations of the Node.js environment required us to create workarounds and use other tools to complete the process.

    Node.js and Parquet

    The lack of Parquet modules for Node.js required us to implement an AWS Glue/Amazon EMR process to effectively migrate data from CSV to Parquet. We would rather save directly to Parquet, but we couldn’t find an effective way to do it.

    One interesting project in the works is the development of a Parquet NPM by Marc Vertes called node-parquet (https://www.npmjs.com/package/node-parquet). It is not in a production state yet, but we think it would be well worth following the progress of this package.

    Timestamp data type

    According to the Parquet documentation, Timestamp data are stored in Parquet as 64-bit integers. However, JavaScript does not support 64-bit integers, because the native number type is a 64-bit double, giving only 53 bits of integer range.

    The result is that you cannot store Timestamp correctly in Parquet using Node.js. The solution is to store Timestamp as string and cast the type to Timestamp in the query. Using this method, we did not witness any performance degradation whatsoever.

    Lessons learned

    You can benefit from our trial-and-error experience.

    Lesson #1: Data validation is critical

    As mentioned earlier, a single corrupt entry in a partition can fail queries running against this partition, especially when using Parquet, which is harder to edit than a simple CSV file. Make sure that you validate your data before scanning it with Redshift Spectrum.

    Lesson #2: Structure and partition data effectively

    One of the biggest benefits of using Redshift Spectrum (or Athena for that matter) is that you don’t need to keep nodes up and running all the time. You pay only for the queries you perform and only for the data scanned per query.

    Keeping different permutations of your data for different queries makes a lot of sense in this case. For example, you can partition your data by date and hour to run time-based queries, and also have another set partitioned by user_id and date to run user-based queries. This results in faster and more efficient performance of your data warehouse.

    Storing data in the right format

    Use Parquet whenever you can. The benefits of Parquet are substantial. Faster performance, less data to scan, and much more efficient columnar format. However, it is not supported out-of-the-box by Kinesis Firehose, so you need to implement your own ETL. AWS Glue is a great option.

    Creating small tables for frequent tasks

    When we started using Redshift Spectrum, we saw our Amazon Redshift costs jump by hundreds of dollars per day. Then we realized that we were unnecessarily scanning a full day’s worth of data every minute. Take advantage of the ability to define multiple tables on the same S3 bucket or folder, and create temporary and small tables for frequent queries.

    Lesson #3: Combine Athena and Redshift Spectrum for optimal performance

    Moving to Redshift Spectrum also allowed us to take advantage of Athena as both use the AWS Glue Data Catalog. Run fast and simple queries using Athena while taking advantage of the advanced Amazon Redshift query engine for complex queries using Redshift Spectrum.

    Redshift Spectrum excels when running complex queries. It can push many compute-intensive tasks, such as predicate filtering and aggregation, down to the Redshift Spectrum layer, so that queries use much less of your cluster’s processing capacity.

    Lesson #4: Sort your Parquet data within the partition

    We achieved another performance improvement by sorting data within the partition using sortWithinPartitions(sort_field). For example:

    df.repartition(1).sortWithinPartitions("campaign_id")…

    Conclusion

    We were extremely pleased with using Amazon Redshift as our core data warehouse for over three years. But as our client base and volume of data grew substantially, we extended Amazon Redshift to take advantage of scalability, performance, and cost with Redshift Spectrum.

    Redshift Spectrum lets us scale to virtually unlimited storage, scale compute transparently, and deliver super-fast results for our users. With Redshift Spectrum, we store data where we want at the cost we want, and have the data available for analytics when our users need it with the performance they expect.


    About the Author

    With 7 years of experience in the AdTech industry and 15 years in leading technology companies, Rafi Ton is the founder and CEO of NUVIAD. He enjoys exploring new technologies and putting them to use in cutting edge products and services, in the real world generating real money. Being an experienced entrepreneur, Rafi believes in practical-programming and fast adaptation of new technologies to achieve a significant market advantage.