Tag Archives: eNom

Build your own weather station with our new guide!

Post Syndicated from Richard Hayler original https://www.raspberrypi.org/blog/build-your-own-weather-station/

One of the most common enquiries I receive at Pi Towers is “How can I get my hands on a Raspberry Pi Oracle Weather Station?” Now the answer is: “Why not build your own version using our guide?”

Build Your Own weather station kit assembled

Tadaaaa! The BYO weather station fully assembled.

Our Oracle Weather Station

In 2016 we sent out nearly 1000 Raspberry Pi Oracle Weather Station kits to schools from around the world who had applied to be part of our weather station programme. In the original kit was a special HAT that allows the Pi to collect weather data with a set of sensors.

The original Raspberry Pi Oracle Weather Station HAT – Build Your Own Raspberry Pi weather station

The original Raspberry Pi Oracle Weather Station HAT

We designed the HAT to enable students to create their own weather stations and mount them at their schools. As part of the programme, we also provide an ever-growing range of supporting resources. We’ve seen Oracle Weather Stations in great locations with a huge differences in climate, and they’ve even recorded the effects of a solar eclipse.

Our new BYO weather station guide

We only had a single batch of HATs made, and unfortunately we’ve given nearly* all the Weather Station kits away. Not only are the kits really popular, we also receive lots of questions about how to add extra sensors or how to take more precise measurements of a particular weather phenomenon. So today, to satisfy your demand for a hackable weather station, we’re launching our Build your own weather station guide!

Build Your Own Raspberry Pi weather station

Fun with meteorological experiments!

Our guide suggests the use of many of the sensors from the Oracle Weather Station kit, so can build a station that’s as close as possible to the original. As you know, the Raspberry Pi is incredibly versatile, and we’ve made it easy to hack the design in case you want to use different sensors.

Many other tutorials for Pi-powered weather stations don’t explain how the various sensors work or how to store your data. Ours goes into more detail. It shows you how to put together a breadboard prototype, it describes how to write Python code to take readings in different ways, and it guides you through recording these readings in a database.

Build Your Own Raspberry Pi weather station on a breadboard

There’s also a section on how to make your station weatherproof. And in case you want to move past the breadboard stage, we also help you with that. The guide shows you how to solder together all the components, similar to the original Oracle Weather Station HAT.

Who should try this build

We think this is a great project to tackle at home, at a STEM club, Scout group, or CoderDojo, and we’re sure that many of you will be chomping at the bit to get started. Before you do, please note that we’ve designed the build to be as straight-forward as possible, but it’s still fairly advanced both in terms of electronics and programming. You should read through the whole guide before purchasing any components.

Build Your Own Raspberry Pi weather station – components

The sensors and components we’re suggesting balance cost, accuracy, and easy of use. Depending on what you want to use your station for, you may wish to use different components. Similarly, the final soldered design in the guide may not be the most elegant, but we think it is achievable for someone with modest soldering experience and basic equipment.

You can build a functioning weather station without soldering with our guide, but the build will be more durable if you do solder it. If you’ve never tried soldering before, that’s OK: we have a Getting started with soldering resource plus video tutorial that will walk you through how it works step by step.

Prototyping HAT for Raspberry Pi weather station sensors

For those of you who are more experienced makers, there are plenty of different ways to put the final build together. We always like to hear about alternative builds, so please post your designs in the Weather Station forum.

Our plans for the guide

Our next step is publishing supplementary guides for adding extra functionality to your weather station. We’d love to hear which enhancements you would most like to see! Our current ideas under development include adding a webcam, making a tweeting weather station, adding a light/UV meter, and incorporating a lightning sensor. Let us know which of these is your favourite, or suggest your own amazing ideas in the comments!

*We do have a very small number of kits reserved for interesting projects or locations: a particularly cool experiment, a novel idea for how the Oracle Weather Station could be used, or places with specific weather phenomena. If have such a project in mind, please send a brief outline to [email protected], and we’ll consider how we might be able to help you.

The post Build your own weather station with our new guide! appeared first on Raspberry Pi.

Friday Squid Blogging: Do Cephalopods Contain Alien DNA?

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/06/friday_squid_bl_627.html

Maybe not DNA, but biological somethings.

Cause of Cambrian explosion — Terrestrial or Cosmic?“:

Abstract: We review the salient evidence consistent with or predicted by the Hoyle-Wickramasinghe (H-W) thesis of Cometary (Cosmic) Biology. Much of this physical and biological evidence is multifactorial. One particular focus are the recent studies which date the emergence of the complex retroviruses of vertebrate lines at or just before the Cambrian Explosion of ~500 Ma. Such viruses are known to be plausibly associated with major evolutionary genomic processes. We believe this coincidence is not fortuitous but is consistent with a key prediction of H-W theory whereby major extinction-diversification evolutionary boundaries coincide with virus-bearing cometary-bolide bombardment events. A second focus is the remarkable evolution of intelligent complexity (Cephalopods) culminating in the emergence of the Octopus. A third focus concerns the micro-organism fossil evidence contained within meteorites as well as the detection in the upper atmosphere of apparent incoming life-bearing particles from space. In our view the totality of the multifactorial data and critical analyses assembled by Fred Hoyle, Chandra Wickramasinghe and their many colleagues since the 1960s leads to a very plausible conclusion — life may have been seeded here on Earth by life-bearing comets as soon as conditions on Earth allowed it to flourish (about or just before 4.1 Billion years ago); and living organisms such as space-resistant and space-hardy bacteria, viruses, more complex eukaryotic cells, fertilised ova and seeds have been continuously delivered ever since to Earth so being one important driver of further terrestrial evolution which has resulted in considerable genetic diversity and which has led to the emergence of mankind.

Two commentaries.

This is almost certainly not true.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

YouTube Won’t Put Up With Blatant Piracy Tutorials Forever

Post Syndicated from Andy original https://torrentfreak.com/youtube-wont-put-up-with-blatant-piracy-tutorials-forever-180506/

Once upon a time, Internet users’ voices would be heard in limited circles, on platforms such as Usenet or other niche platforms.

Then, with the rise of forum platforms such as phpBB in 2000 and Invision Power Board in 2002, thriving communities could gather in public to discuss endless specialist topics, including file-sharing of course.

When dedicated piracy forums began to gain traction, it was pretty much a free-for-all. People discussed obtaining free content absolutely openly. Nothing was taboo and no one considered that there would be any repercussions. As such, moderation was limited to keeping troublemakers in check.

As the years progressed and lawsuits against both sites and services became more commonplace, most sites that weren’t actually serving illegal content began to consider their positions. Run by hobbyists, most didn’t want the hassle of a multi-million dollar lawsuit, so links to pirate content began to diminish and the more overt piracy tutorials began to disappear underground.

Those that remained in plain sight became much more considered. Tutorials on how to pirate specific Hollywood blockbusters were no longer needed, a plain general tutorial would suffice. And, as communities matured and took time to understand the implications of their actions, those without political motivations realized that drawing attention to potential criminality was neither required nor necessary.

Then YouTube and social media happened and almost overnight, no one was in charge and anyone could say whatever they liked.

In this new reality, there were no irritating moderator-type figures removing links to this and that, and nobody warning people against breaking rules that suddenly didn’t exist anymore. In essence, previously tight-knit and street-wise file-sharing and piracy communities not only became fragmented, but also chaotic.

This meant that anyone could become a leader and in some cases, this was the utopia that many had hoped for. Not only couldn’t the record labels or Hollywood tell people what to do anymore, discussion site operators couldn’t either. For those who didn’t abuse the power and for those who knew no better, this was a much-needed breath of fresh air. But, like all good things, it was unlikely to last forever.

Where most file-sharing of yesterday was carried out by hobbyist enthusiasts, many of today’s pirates are far more casual. They’re just as thirsty for content, but they don’t want to spend hours hunting for it. They want it all on a plate, at the flick of a switch, delivered to their TV with a minimum of hassle.

With online discussions increasingly seen as laborious and old-fashioned, many mainstream pirates have turned to easy-to-consume videos. In support of their Kodi media player habits, YouTube has become the educational platform of choice for millions.

As a result, there is now a long line of self-declared Kodi piracy specialists scooping up millions of views on YouTube. Their videos – which in many cases are thinly veiled advertisements for third party addons, Kodi ‘builds’, illegal IPTV services, and obscure Android APKs – are now the main way for a new generation to obtain direct advice on pirating.

Many of the videos are incredibly blatant, like the past 15 years of litigation never happened. All the lessons learned by the phpBB board operators of yesteryear, of how to achieve their goals of sharing information without getting shut down, have been long forgotten. In their place, a barrage of daily videos designed to generate clicks and affiliate revenue, no matter what the cost, no matter what the risk.

It’s pretty clear that these videos are at least partly responsible for the phenomenal uptick in Kodi and Android-based piracy over the past few years. In that respect, many lovers of free content will be eternally grateful for the service they’ve provided. But like many piracy movements over the years, people shouldn’t get too attached to them, at least in their current form.

Thanks to the devil-may-care approach of many influential YouTubers, it won’t be long before a whole new set of moderators begin flexing their muscles. While your average phpBB moderator could be reasoned with in order to get a second chance, a determined and largely faceless YouTube will eject offenders without so much as a clear explanation.

When this happens (and it’s only a question of time given the growing blatancy of many tutorials) YouTubers will not only lose their voices but their revenue streams too. While YouTube’s partner programs bring in some welcome cash, the profitable affiliate schemes touted on these channels for external products will also be under threat.

Perhaps the most surprising thing in this drama-waiting-to-happen is that many of the most popular YouTubers can hardly be considered young and naive. While some are of more tender years, most – with their undoubted skill, knowledge and work ethic – should know better for their 30 or 40 years on this planet. Yet not only do they make their names public, they feature their faces heavily in their videos too.

Still, it’s likely that it will take some big YouTube accounts to fall before YouTubers respond by shaving the sharp edges off their blatant promotion of illegal activity. And there’s little doubt that those advertising products (which is most of them) will have to do so sooner rather than later.

Just this week, YouTube made it clear that it won’t tolerate people making money from the promotion of illegal activities.

“YouTube creators may include paid endorsements as part of their content only if the product or service they are endorsing complies with our advertising policies,” YouTube told the BBC.

“We will be working with creators going forward so they better understand that in video promotions [they] must not promote dishonest activity.”

That being said, like many other players in the piracy and file-sharing space over the past 18 years, YouTubers will eventually begin to learn that not only can the smart survive, they can flourish too.

Sure, there will be people out there who’ll protest that free speech allows citizens to express themselves in a manner of their choosing. But try PM’ing that to YouTube in response to a strike, and see how that fares.

When they say you’re done, the road back is a long one.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

EC2 Fleet – Manage Thousands of On-Demand and Spot Instances with One Request

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/ec2-fleet-manage-thousands-of-on-demand-and-spot-instances-with-one-request/

EC2 Spot Fleets are really cool. You can launch a fleet of Spot Instances that spans EC2 instance types and Availability Zones without having to write custom code to discover capacity or monitor prices. You can set the target capacity (the size of the fleet) in units that are meaningful to your application and have Spot Fleet create and then maintain the fleet on your behalf. Our customers are creating Spot Fleets of all sizes. For example, one financial service customer runs Monte Carlo simulations across 10 different EC2 instance types. They routinely make requests for hundreds of thousands of vCPUs and count on Spot Fleet to give them access to massive amounts of capacity at the best possible price.

EC2 Fleet
Today we are extending and generalizing the set-it-and-forget-it model that we pioneered in Spot Fleet with EC2 Fleet, a new building block that gives you the ability to create fleets that are composed of a combination of EC2 On-Demand, Reserved, and Spot Instances with a single API call. You tell us what you need, capacity and instance-wise, and we’ll handle all the heavy lifting. We will launch, manage, monitor and scale instances as needed, without the need for scaffolding code.

You can specify the capacity of your fleet in terms of instances, vCPUs, or application-oriented units, and also indicate how much of the capacity should be fulfilled by Spot Instances. The application-oriented units allow you to specify the relative power of each EC2 instance type in a way that directly maps to the needs of your application. All three capacity specification options (instances, vCPUs, and application-oriented units) are known as weights.

I think you’ll find a number ways this feature makes managing a fleet of instances easier, and believe that you will also appreciate the team’s near-term feature roadmap of interest (more on that in a bit).

Using EC2 Fleet
There are a number of ways that you can use this feature, whether you’re running a stateless web service, a big data cluster or a continuous integration pipeline. Today I’m going to describe how you can use EC2 Fleet for genomic processing, but this is similar to workloads like risk analysis, log processing or image rendering. Modern DNA sequencers can produce multiple terabytes of raw data each day, to process that data into meaningful information in a timely fashion you need lots of processing power. I’ll be showing you how to deploy a “grid” of worker nodes that can quickly crunch through secondary analysis tasks in parallel.

Projects in genomics can use the elasticity EC2 provides to experiment and try out new pipelines on hundreds or even thousands of servers. With EC2 you can access as many cores as you need and only pay for what you use. Prior to today, you would need to use the RunInstances API or an Auto Scaling group for the On-Demand & Reserved Instance portion of your grid. To get the best price performance you’d also create and manage a Spot Fleet or multiple Spot Auto Scaling groups with different instance types if you wanted to add Spot Instances to turbo-boost your secondary analysis. Finally, to automate scaling decisions across multiple APIs and Auto Scaling groups you would need to write Lambda functions that periodically assess your grid’s progress & backlog, as well as current Spot prices – modifying your Auto Scaling Groups and Spot Fleets accordingly.

You can now replace all of this with a single EC2 Fleet, analyzing genomes at scale for as little as $1 per analysis. In my grid, each step in in the pipeline requires 1 vCPU and 4 GiB of memory, a perfect match for M4 and M5 instances with 4 GiB of memory per vCPU. I will create a fleet using M4 and M5 instances with weights that correspond to the number of vCPUs on each instance:

  • m4.16xlarge – 64 vCPUs, weight = 64
  • m5.24xlarge – 96 vCPUs, weight = 96

This is expressed in a template that looks like this:

"Overrides": [
{
  "InstanceType": "m4.16xlarge",
  "WeightedCapacity": 64,
},
{
  "InstanceType": "m5.24xlarge",
  "WeightedCapacity": 96,
},
]

By default, EC2 Fleet will select the most cost effective combination of instance types and Availability Zones (both specified in the template) using the current prices for the Spot Instances and public prices for the On-Demand Instances (if you specify instances for which you have matching RIs, your discounts will apply). The default mode takes weights into account to get the instances that have the lowest price per unit. So for my grid, fleet will find the instance that offers the lowest price per vCPU.

Now I can request capacity in terms of vCPUs, knowing EC2 Fleet will select the lowest cost option using only the instance types I’ve defined as acceptable. Also, I can specify how many vCPUs I want to launch using On-Demand or Reserved Instance capacity and how many vCPUs should be launched using Spot Instance capacity:

"TargetCapacitySpecification": {
	"TotalTargetCapacity": 2880,
	"OnDemandTargetCapacity": 960,
	"SpotTargetCapacity": 1920,
	"DefaultTargetCapacityType": "Spot"
}

The above means that I want a total of 2880 vCPUs, with 960 vCPUs fulfilled using On-Demand and 1920 using Spot. The On-Demand price per vCPU is lower for m5.24xlarge than the On-Demand price per vCPU for m4.16xlarge, so EC2 Fleet will launch 10 m5.24xlarge instances to fulfill 960 vCPUs. Based on current Spot pricing (again, on a per-vCPU basis), EC2 Fleet will choose to launch 30 m4.16xlarge instances or 20 m5.24xlarges, delivering 1920 vCPUs either way.

Putting it all together, I have a single file (fl1.json) that describes my fleet:

    "LaunchTemplateConfigs": [
        {
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": "lt-0e8c754449b27161c",
                "Version": "1"
            }
        "Overrides": [
        {
          "InstanceType": "m4.16xlarge",
          "WeightedCapacity": 64,
        },
        {
          "InstanceType": "m5.24xlarge",
          "WeightedCapacity": 96,
        },
      ]
        }
    ],
    "TargetCapacitySpecification": {
        "TotalTargetCapacity": 2880,
        "OnDemandTargetCapacity": 960,
        "SpotTargetCapacity": 1920,
        "DefaultTargetCapacityType": "Spot"
    }
}

I can launch my fleet with a single command:

$ aws ec2 create-fleet --cli-input-json file://home/ec2-user/fl1.json
{
    "FleetId":"fleet-838cf4e5-fded-4f68-acb5-8c47ee1b248a"
}

My entire fleet is created within seconds and was built using 10 m5.24xlarge On-Demand Instances and 30 m4.16xlarge Spot Instances, since the current Spot price was 1.5¢ per vCPU for m4.16xlarge and 1.6¢ per vCPU for m5.24xlarge.

Now lets imagine my grid has crunched through its backlog and no longer needs the additional Spot Instances. I can then modify the size of my fleet by changing the target capacity in my fleet specification, like this:

{         
    "TotalTargetCapacity": 960,
}

Since 960 was equal to the amount of On-Demand vCPUs I had requested, when I describe my fleet I will see all of my capacity being delivered using On-Demand capacity:

"TargetCapacitySpecification": {
	"TotalTargetCapacity": 960,
	"OnDemandTargetCapacity": 960,
	"SpotTargetCapacity": 0,
	"DefaultTargetCapacityType": "Spot"
}

When I no longer need my fleet I can delete it and terminate the instances in it like this:

$ aws ec2 delete-fleets --fleet-id fleet-838cf4e5-fded-4f68-acb5-8c47ee1b248a \
  --terminate-instances   
{
    "UnsuccessfulFleetDletetions": [],
    "SuccessfulFleetDeletions": [
        {
            "CurrentFleetState": "deleted_terminating",
            "PreviousFleetState": "active",
            "FleetId": "fleet-838cf4e5-fded-4f68-acb5-8c47ee1b248a"
        }
    ]
}

Earlier I described how RI discounts apply when EC2 Fleet launches instances for which you have matching RIs, so you might be wondering how else RI customers benefit from EC2 Fleet. Let’s say that I own regional RIs for M4 instances. In my EC2 Fleet I would remove m5.24xlarge and specify m4.10xlarge and m4.16xlarge. Then when EC2 Fleet creates the grid, it will quickly find M4 capacity across the sizes and AZs I’ve specified, and my RI discounts apply automatically to this usage.

In the Works
We plan to connect EC2 Fleet and EC2 Auto Scaling groups. This will let you create a single fleet that mixed instance types and Spot, Reserved and On-Demand, while also taking advantage of EC2 Auto Scaling features such as health checks and lifecycle hooks. This integration will also bring EC2 Fleet functionality to services such as Amazon ECS, Amazon EKS, and AWS Batch that build on and make use of EC2 Auto Scaling for fleet management.

Available Now
You can create and make use of EC2 Fleets today in all public AWS Regions!

Jeff;

AWS Glue Now Supports Scala Scripts

Post Syndicated from Mehul Shah original https://aws.amazon.com/blogs/big-data/aws-glue-now-supports-scala-scripts/

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

This post shows how to build an example Scala script that identifies highly negative issues in the timeline. It pulls out issue events in the timeline sample, analyzes their titles using the sentiment prediction functions from the Stanford CoreNLP libraries, and surfaces the most negative issues.

Getting started

Before we start writing scripts, we use AWS Glue crawlers to get a sense of the data—its structure and characteristics. We also set up a development endpoint and attach an Apache Zeppelin notebook, so we can interactively explore the data and author the script.

Crawl the data

The dataset used in this example was downloaded from the GitHub archive website into our sample dataset bucket in Amazon S3, and copied to the following locations:

s3://aws-glue-datasets-<region>/examples/scala-blog/githubarchive/data/

Choose the best folder by replacing <region> with the region that you’re working in, for example, us-east-1. Crawl this folder, and put the results into a database named githubarchive in the AWS Glue Data Catalog, as described in the AWS Glue Developer Guide. This folder contains 12 hours of the timeline from January 22, 2017, and is organized hierarchically (that is, partitioned) by year, month, and day.

When finished, use the AWS Glue console to navigate to the table named data in the githubarchive database. Notice that this data has eight top-level columns, which are common to each event type, and three partition columns that correspond to year, month, and day.

Choose the payload column, and you will notice that it has a complex schema—one that reflects the union of the payloads of event types that appear in the crawled data. Also note that the schema that crawlers generate is a subset of the true schema because they sample only a subset of the data.

Set up the library, development endpoint, and notebook

Next, you need to download and set up the libraries that estimate the sentiment in a snippet of text. The Stanford CoreNLP libraries contain a number of human language processing tools, including sentiment prediction.

Download the Stanford CoreNLP libraries. Unzip the .zip file, and you’ll see a directory full of jar files. For this example, the following jars are required:

  • stanford-corenlp-3.8.0.jar
  • stanford-corenlp-3.8.0-models.jar
  • ejml-0.23.jar

Upload these files to an Amazon S3 path that is accessible to AWS Glue so that it can load these libraries when needed. For this example, they are in s3://glue-sample-other/corenlp/.

Development endpoints are static Spark-based environments that can serve as the backend for data exploration. You can attach notebooks to these endpoints to interactively send commands and explore and analyze your data. These endpoints have the same configuration as that of AWS Glue’s job execution system. So, commands and scripts that work there also work the same when registered and run as jobs in AWS Glue.

To set up an endpoint and a Zeppelin notebook to work with that endpoint, follow the instructions in the AWS Glue Developer Guide. When you are creating an endpoint, be sure to specify the locations of the previously mentioned jars in the Dependent jars path as a comma-separated list. Otherwise, the libraries will not be loaded.

After you set up the notebook server, go to the Zeppelin notebook by choosing Dev Endpoints in the left navigation pane on the AWS Glue console. Choose the endpoint that you created. Next, choose the Notebook Server URL, which takes you to the Zeppelin server. Log in using the notebook user name and password that you specified when creating the notebook. Finally, create a new note to try out this example.

Each notebook is a collection of paragraphs, and each paragraph contains a sequence of commands and the output for that command. Moreover, each notebook includes a number of interpreters. If you set up the Zeppelin server using the console, the (Python-based) pyspark and (Scala-based) spark interpreters are already connected to your new development endpoint, with pyspark as the default. Therefore, throughout this example, you need to prepend %spark at the top of your paragraphs. In this example, we omit these for brevity.

Working with the data

In this section, we use AWS Glue extensions to Spark to work with the dataset. We look at the actual schema of the data and filter out the interesting event types for our analysis.

Start with some boilerplate code to import libraries that you need:

%spark

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext

Then, create the Spark and AWS Glue contexts needed for working with the data:

@transient val spark: SparkContext = SparkContext.getOrCreate()
val glueContext: GlueContext = new GlueContext(spark)

You need the transient decorator on the SparkContext when working in Zeppelin; otherwise, you will run into a serialization error when executing commands.

Dynamic frames

This section shows how to create a dynamic frame that contains the GitHub records in the table that you crawled earlier. A dynamic frame is the basic data structure in AWS Glue scripts. It is like an Apache Spark data frame, except that it is designed and optimized for data cleaning and transformation workloads. A dynamic frame is well-suited for representing semi-structured datasets like the GitHub timeline.

A dynamic frame is a collection of dynamic records. In Spark lingo, it is an RDD (resilient distributed dataset) of DynamicRecords. A dynamic record is a self-describing record. Each record encodes its columns and types, so every record can have a schema that is unique from all others in the dynamic frame. This is convenient and often more efficient for datasets like the GitHub timeline, where payloads can vary drastically from one event type to another.

The following creates a dynamic frame, github_events, from your table:

val github_events = glueContext
                    .getCatalogSource(database = "githubarchive", tableName = "data")
                    .getDynamicFrame()

The getCatalogSource() method returns a DataSource, which represents a particular table in the Data Catalog. The getDynamicFrame() method returns a dynamic frame from the source.

Recall that the crawler created a schema from only a sample of the data. You can scan the entire dataset, count the rows, and print the complete schema as follows:

github_events.count
github_events.printSchema()

The result looks like the following:

The data has 414,826 records. As before, notice that there are eight top-level columns, and three partition columns. If you scroll down, you’ll also notice that the payload is the most complex column.

Run functions and filter records

This section describes how you can create your own functions and invoke them seamlessly to filter records. Unlike filtering with Python lambdas, Scala scripts do not need to convert records from one language representation to another, thereby reducing overhead and running much faster.

Let’s create a function that picks only the IssuesEvents from the GitHub timeline. These events are generated whenever someone posts an issue for a particular repository. Each GitHub event record has a field, “type”, that indicates the kind of event it is. The issueFilter() function returns true for records that are IssuesEvents.

def issueFilter(rec: DynamicRecord): Boolean = { 
    rec.getField("type").exists(_ == "IssuesEvent") 
}

Note that the getField() method returns an Option[Any] type, so you first need to check that it exists before checking the type.

You pass this function to the filter transformation, which applies the function on each record and returns a dynamic frame of those records that pass.

val issue_events =  github_events.filter(issueFilter)

Now, let’s look at the size and schema of issue_events.

issue_events.count
issue_events.printSchema()

It’s much smaller (14,063 records), and the payload schema is less complex, reflecting only the schema for issues. Keep a few essential columns for your analysis, and drop the rest using the ApplyMapping() transform:

val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                 ("actor.login", "string", "actor", "string"), 
                                                 ("repo.name", "string", "repo", "string"),
                                                 ("payload.action", "string", "action", "string"),
                                                 ("payload.issue.title", "string", "title", "string")))
issue_titles.show()

The ApplyMapping() transform is quite handy for renaming columns, casting types, and restructuring records. The preceding code snippet tells the transform to select the fields (or columns) that are enumerated in the left half of the tuples and map them to the fields and types in the right half.

Estimating sentiment using Stanford CoreNLP

To focus on the most pressing issues, you might want to isolate the records with the most negative sentiments. The Stanford CoreNLP libraries are Java-based and offer sentiment-prediction functions. Accessing these functions through Python is possible, but quite cumbersome. It requires creating Python surrogate classes and objects for those found on the Java side. Instead, with Scala support, you can use those classes and objects directly and invoke their methods. Let’s see how.

First, import the libraries needed for the analysis:

import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

The Stanford CoreNLP libraries have a main driver that orchestrates all of their analysis. The driver setup is heavyweight, setting up threads and data structures that are shared across analyses. Apache Spark runs on a cluster with a main driver process and a collection of backend executor processes that do most of the heavy sifting of the data.

The Stanford CoreNLP shared objects are not serializable, so they cannot be distributed easily across a cluster. Instead, you need to initialize them once for every backend executor process that might need them. Here is how to accomplish that:

val props = new Properties()
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
props.setProperty("parse.maxlen", "70")

object myNLP {
    lazy val coreNLP = new StanfordCoreNLP(props)
}

The properties tell the libraries which annotators to execute and how many words to process. The preceding code creates an object, myNLP, with a field coreNLP that is lazily evaluated. This field is initialized only when it is needed, and only once. So, when the backend executors start processing the records, each executor initializes the driver for the Stanford CoreNLP libraries only one time.

Next is a function that estimates the sentiment of a text string. It first calls Stanford CoreNLP to annotate the text. Then, it pulls out the sentences and takes the average sentiment across all the sentences. The sentiment is a double, from 0.0 as the most negative to 4.0 as the most positive.

def estimatedSentiment(text: String): Double = {
    if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
    val annotations = myNLP.coreNLP.process(text)
    val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
    sentences.foldLeft(0.0)( (csum, x) => { 
        csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
    }) / sentences.length
}

Now, let’s estimate the sentiment of the issue titles and add that computed field as part of the records. You can accomplish this with the map() method on dynamic frames:

val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
    val mbody = rec.getField("title")
    mbody match {
        case Some(mval: String) => { 
            rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
            rec }
        case _ => rec
    }
})

The map() method applies the user-provided function on every record. The function takes a DynamicRecord as an argument and returns a DynamicRecord. The code above computes the sentiment, adds it in a top-level field, sentiment, to the record, and returns the record.

Count the records with sentiment and show the schema. This takes a few minutes because Spark must initialize the library and run the sentiment analysis, which can be involved.

issue_sentiments.count
issue_sentiments.printSchema()

Notice that all records were processed (14,063), and the sentiment value was added to the schema.

Finally, let’s pick out the titles that have the lowest sentiment (less than 1.5). Count them and print out a sample to see what some of the titles look like.

val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))
pressing_issues.count
pressing_issues.show(10)

Next, write them all to a file so that you can handle them later. (You’ll need to replace the output path with your own.)

glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions("""{"path": "s3://<bucket>/out/path/"}"""), 
                              format = "json")
            .writeDynamicFrame(pressing_issues)

Take a look in the output path, and you can see the output files.

Putting it all together

Now, let’s create a job from the preceding interactive session. The following script combines all the commands from earlier. It processes the GitHub archive files and writes out the highly negative issues:

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext
import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

object GlueApp {

    object myNLP {
        val props = new Properties()
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
        props.setProperty("parse.maxlen", "70")

        lazy val coreNLP = new StanfordCoreNLP(props)
    }

    def estimatedSentiment(text: String): Double = {
        if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
        val annotations = myNLP.coreNLP.process(text)
        val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
        sentences.foldLeft(0.0)( (csum, x) => { 
            csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
        }) / sentences.length
    }

    def main(sysArgs: Array[String]) {
        val spark: SparkContext = SparkContext.getOrCreate()
        val glueContext: GlueContext = new GlueContext(spark)

        val dbname = "githubarchive"
        val tblname = "data"
        val outpath = "s3://<bucket>/out/path/"

        val github_events = glueContext
                            .getCatalogSource(database = dbname, tableName = tblname)
                            .getDynamicFrame()

        val issue_events =  github_events.filter((rec: DynamicRecord) => {
            rec.getField("type").exists(_ == "IssuesEvent")
        })

        val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                         ("actor.login", "string", "actor", "string"), 
                                                         ("repo.name", "string", "repo", "string"),
                                                         ("payload.action", "string", "action", "string"),
                                                         ("payload.issue.title", "string", "title", "string")))

        val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
            val mbody = rec.getField("title")
            mbody match {
                case Some(mval: String) => { 
                    rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
                    rec }
                case _ => rec
            }
        })

        val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))

        glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions(s"""{"path": "$outpath"}"""), 
                              format = "json")
                    .writeDynamicFrame(pressing_issues)
    }
}

Notice that the script is enclosed in a top-level object called GlueApp, which serves as the script’s entry point for the job. (You’ll need to replace the output path with your own.) Upload the script to an Amazon S3 location so that AWS Glue can load it when needed.

To create the job, open the AWS Glue console. Choose Jobs in the left navigation pane, and then choose Add job. Create a name for the job, and specify a role with permissions to access the data. Choose An existing script that you provide, and choose Scala as the language.

For the Scala class name, type GlueApp to indicate the script’s entry point. Specify the Amazon S3 location of the script.

Choose Script libraries and job parameters. In the Dependent jars path field, enter the Amazon S3 locations of the Stanford CoreNLP libraries from earlier as a comma-separated list (without spaces). Then choose Next.

No connections are needed for this job, so choose Next again. Review the job properties, and choose Finish. Finally, choose Run job to execute the job.

You can simply edit the script’s input table and output path to run this job on whatever GitHub timeline datasets that you might have.

Conclusion

In this post, we showed how to write AWS Glue ETL scripts in Scala via notebooks and how to run them as jobs. Scala has the advantage that it is the native language for the Spark runtime. With Scala, it is easier to call Scala or Java functions and third-party libraries for analyses. Moreover, data processing is faster in Scala because there’s no need to convert records from one language runtime to another.

You can find more example of Scala scripts in our GitHub examples repository: https://github.com/awslabs/aws-glue-samples. We encourage you to experiment with Scala scripts and let us know about any interesting ETL flows that you want to share.

Happy Glue-ing!

 


Additional Reading

If you found this post useful, be sure to check out Simplify Querying Nested JSON with the AWS Glue Relationalize Transform and Genomic Analysis with Hail on Amazon EMR and Amazon Athena.

 


About the Authors

Mehul Shah is a senior software manager for AWS Glue. His passion is leveraging the cloud to build smarter, more efficient, and easier to use data systems. He has three girls, and, therefore, he has no spare time.

 

 

 

Ben Sowell is a software development engineer at AWS Glue.

 

 

 

 
Vinay Vivili is a software development engineer for AWS Glue.

 

 

 

Physics cheats

Post Syndicated from Eevee original https://eev.ee/blog/2018/01/06/physics-cheats/

Anonymous asks:

something about how we tweak physics to “work” better in games?

Ho ho! Work. Get it? Like in physics…?

Hitboxes

Hitbox” is perhaps not the most accurate term, since the shape used for colliding with the environment and the shape used for detecting damage might be totally different. They’re usually the same in simple platformers, though, and that’s what most of my games have been.

The hitbox is the biggest physics fudge by far, and it exists because of a single massive approximation that (most) games make: you’re controlling a single entity in the abstract, not a physical body in great detail.

That is: when you walk with your real-world meat shell, you perform a complex dance of putting one foot in front of the other, a motion you spent years perfecting. When you walk in a video game, you press a single “walk” button. Your avatar may play an animation that moves its legs back and forth, but since you’re not actually controlling the legs independently (and since simulating them is way harder), the game just treats you like a simple shape. Fairly often, this is a box, or something very box-like.

An Eevee sprite standing on faux ground; the size of the underlying image and the hitbox are outlined

Since the player has no direct control over the exact placement of their limbs, it would be slightly frustrating to have them collide with the world. This is especially true in cases like the above, where the tail and left ear protrude significantly out from the main body. If that Eevee wanted to stand against a real-world wall, she would simply tilt her ear or tail out of the way, so there’s no reason for the ear to block her from standing against a game wall. To compensate for this, the ear and tail are left out of the collision box entirely and will simply jut into a wall if necessary — a goofy affordance that’s so common it doesn’t even register as unusual. As a bonus (assuming this same box is used for combat), she won’t take damage from projectiles that merely graze past an ear.

(One extra consideration for sprite games in particular: the hitbox ought to be horizontally symmetric around the sprite’s pivot — i.e. the point where the entity is truly considered to be standing — so that the hitbox doesn’t abruptly move when the entity turns around!)

Corners

Treating the player (and indeed most objects) as a box has one annoying side effect: boxes have corners. Corners can catch on other corners, even by a single pixel. Real-world bodies tend to be a bit rounder and squishier and this can tolerate grazing a corner; even real-world boxes will simply rotate a bit.

Ah, but in our faux physics world, we generally don’t want conscious actors (such as the player) to rotate, even with a realistic physics simulator! Real-world bodies are made of parts that will generally try to keep you upright, after all; you don’t tilt back and forth much.

One way to handle corners is to simply remove them from conscious actors. A hitbox doesn’t have to be a literal box, after all. A popular alternative — especially in Unity where it’s a standard asset — is the pill-shaped capsule, which has semicircles/hemispheres on the top and bottom and a cylindrical body in 3D. No corners, no problem.

Of course, that introduces a new problem: now the player can’t balance precariously on edges without their rounded bottom sliding them off. Alas.

If you’re stuck with corners, then, you may want to use a corner bump, a term I just made up. If the player would collide with a corner, but the collision is only by a few pixels, just nudge them to the side a bit and carry on.

An Eevee sprite trying to move sideways into a shallow ledge; the game bumps her upwards slightly, so she steps onto it instead

When the corner is horizontal, this creates stairs! This is, more or less kinda, how steps work in Doom: when the player tries to cross from one sector into another, if the height difference is 24 units or less, the game simply bumps them upwards to the height of the new floor and lets them continue on.

Implementing this in a game without Doom’s notion of sectors is a little trickier. In fact, I still haven’t done it. Collision detection based on rejection gets it for free, kinda, but it’s not very deterministic and it breaks other things. But that’s a whole other post.

Gravity

Gravity is pretty easy. Everything accelerates downwards all the time. What’s interesting are the exceptions.

Jumping

Jumping is a giant hack.

Think about how actual jumping works: you tense your legs, which generally involves bending your knees first, and then spring upwards. In a platformer, you can just leap whenever you feel like it, which is nonsense. Also you go like twenty feet into the air?

Worse, most platformers allow variable-height jumping, where your jump is lower if you let go of the jump button while you’re in the air. Normally, one would expect to have to decide how much force to put into the jump beforehand.

But of course this is about convenience of controls: when jumping is your primary action, you want to be able to do it immediately, without any windup for how high you want to jump.

(And then there’s double jumping? Come on.)

Air control is a similar phenomenon: usually you’d jump in a particular direction by controlling how you push off the ground with your feet, but in a video game, you don’t have feet! You only have the box. The compromise is to let you control your horizontal movement to a limit degree in midair, even though that doesn’t make any sense. (It’s way more fun, though, and overall gives you more movement options, which are good to have in an interactive medium.)

Air control also exposes an obvious place that game physics collide with the realistic model of serious physics engines. I’ve mentioned this before, but: if you use Real Physics™ and air control yourself into a wall, you might find that you’ll simply stick to the wall until you let go of the movement buttons. Why? Remember, player movement acts as though an external force were pushing you around (and from the perspective of a Real™ physics engine, this is exactly how you’d implement it) — so air-controlling into a wall is equivalent to pushing a book against a wall with your hand, and the friction with the wall holds you in place. Oops.

Ground sticking

Another place game physics conflict with physics engines is with running to the top of a slope. On a real hill, of course, you land on top of the slope and are probably glad of it; slopes are hard to climb!

An Eevee moves to the top of a slope, and rather than step onto the flat top, she goes flying off into the air

In a video game, you go flying. Because you’re a box. With momentum. So you hit the peak and keep going in the same direction. Which is diagonally upwards.

Projectiles

To make them more predictable, projectiles generally aren’t subject to gravity, at least as far as I’ve seen. The real world does not have such an exemption. The real world imposes gravity even on sniper rifles, which in a video game are often implemented as an instant trace unaffected by anything in the world because the bullet never actually exists in the world.

Resistance

Ah. Welcome to hell.

Water

Water is an interesting case, and offhand I don’t know the gritty details of how games implement it. In the real world, water applies a resistant drag force to movement — and that force is proportional to the square of velocity, which I’d completely forgotten until right now. I am almost positive that no game handles that correctly. But then, in real-world water, you can push against the water itself for movement, and games don’t simulate that either. What’s the rough equivalent?

The Sonic Physics Guide suggests that Sonic handles it by basically halving everything: acceleration, max speed, friction, etc. When Sonic enters water, his speed is cut; when Sonic exits water, his speed is increased.

That last bit feels validating — I could swear Metroid Prime did the same thing, and built my own solution around it, but couldn’t remember for sure. It makes no sense, of course, for a jump to become faster just because you happened to break the surface of the water, but it feels fantastic.

The thing I did was similar, except that I didn’t want to add a multiplier in a dozen places when you happen to be underwater (and remember which ones need it to be squared, etc.). So instead, I calculate everything completely as normal, so velocity is exactly the same as it would be on dry land — but the distance you would move gets halved. The effect seems to be pretty similar to most platformers with water, at least as far as I can tell. It hasn’t shown up in a published game and I only added this fairly recently, so I might be overlooking some reason this is a bad idea.

(One reason that comes to mind is that velocity is now a little white lie while underwater, so anything relying on velocity for interesting effects might be thrown off. Or maybe that’s correct, because velocity thresholds should be halved underwater too? Hm!)

Notably, air is also a fluid, so it should behave the same way (just with different constants). I definitely don’t think any games apply air drag that’s proportional to the square of velocity.

Friction

Friction is, in my experience, a little handwaved. Probably because real-world friction is so darn complicated.

Consider that in the real world, we want very high friction on the surfaces we walk on — shoes and tires are explicitly designed to increase it, even. We move by bracing a back foot against the ground and using that to push ourselves forward, so we want the ground to resist our push as much as possible.

In a game world, we are a box. We move by being pushed by some invisible outside force, so if the friction between ourselves and the ground is too high, we won’t be able to move at all! That’s complete nonsense physically, but it turns out to be handy in some cases — for example, highish friction can simulate walking through deep mud, which should be difficult due to fluid drag and low friction.

But the best-known example of the fakeness of game friction is video game ice. Walking on real-world ice is difficult because the low friction means low grip; your feet are likely to slip out from under you, and you’ll simply fall down and have trouble moving at all. In a video game, you can’t fall down, so you have the opposite experience: you spend most of your time sliding around uncontrollably. Yet ice is so common in video games (and perhaps so uncommon in places I’ve lived) that I, at least, had never really thought about this disparity until an hour or so ago.

Game friction vs real-world friction

Real-world friction is a force. It’s the normal force (which is the force exerted by the object on the surface) times some constant that depends on how the two materials interact.

Force is mass times acceleration, and platformers often ignore mass, so friction ought to be an acceleration — applied against the object’s movement, but never enough to push it backwards.

I haven’t made any games where variable friction plays a significant role, but my gut instinct is that low friction should mean the player accelerates more slowly but has a higher max speed, and high friction should mean the opposite. I see from my own source code that I didn’t even do what I just said, so let’s defer to some better-made and well-documented games: Sonic and Doom.

In Sonic, friction is a fixed value subtracted from the player’s velocity (regardless of direction) each tic. Sonic has a fixed framerate, so the units are really pixels per tic squared (i.e. acceleration), multiplied by an implicit 1 tic per tic. So far, so good.

But Sonic’s friction only applies if the player isn’t pressing or . Hang on, that isn’t friction at all; that’s just deceleration! That’s equivalent to jogging to a stop. If friction were lower, Sonic would take longer to stop, but otherwise this is only tangentially related to friction.

(In fairness, this approach would decently emulate friction for non-conscious sliding objects, which are never going to be pressing movement buttons. Also, we don’t have the Sonic source code, and the name “friction” is a fan invention; the Sonic Physics Guide already uses “deceleration” to describe the player’s acceleration when turning around.)

Okay, let’s try Doom. In Doom, the default friction is 90.625%.

Hang on, what?

Yes, in Doom, friction is a multiplier applied every tic. Doom runs at 35 tics per second, so this is a multiplier of 0.032 per second. Yikes!

This isn’t anything remotely like real friction, but it’s much easier to implement. With friction as acceleration, the game has to know both the direction of movement (so it can apply friction in the opposite direction) and the magnitude (so it doesn’t overshoot and launch the object in the other direction). That means taking a semi-costly square root and also writing extra code to cap the amount of friction. With a multiplier, neither is necessary; just multiply the whole velocity vector and you’re done.

There are some downsides. One is that objects will never actually stop, since multiplying by 3% repeatedly will never produce a result of zero — though eventually the speed will become small enough to either slip below a “minimum speed” threshold or simply no longer fit in a float representation. Another is that the units are fairly meaningless: with Doom’s default friction of 90.625%, about how long does it take for the player to stop? I have no idea, partly because “stop” is ambiguous here! If friction were an acceleration, I could divide it into the player’s max speed to get a time.

All that aside, what are the actual effects of changing Doom’s friction? What an excellent question that’s surprisingly tricky to answer. (Note that friction can’t be changed in original Doom, only in the Boom port and its derivatives.) Here’s what I’ve pieced together.

Doom’s “friction” is really two values. “Friction” itself is a multiplier applied to moving objects on every tic, but there’s also a move factor which defaults to \(\frac{1}{32} = 0.03125\) and is derived from friction for custom values.

Every tic, the player’s velocity is multiplied by friction, and then increased by their speed times the move factor.

$$
v(n) = v(n – 1) \times friction + speed \times move factor
$$

Eventually, the reduction from friction will balance out the speed boost. That happens when \(v(n) = v(n – 1)\), so we can rearrange it to find the player’s effective max speed:

$$
v = v \times friction + speed \times move factor \\
v – v \times friction = speed \times move factor \\
v = speed \times \frac{move factor}{1 – friction}
$$

For vanilla Doom’s move factor of 0.03125 and friction of 0.90625, that becomes:

$$
v = speed \times \frac{\frac{1}{32}}{1 – \frac{29}{32}} = speed \times \frac{\frac{1}{32}}{\frac{3}{32}} = \frac{1}{3} \times speed
$$

Curiously, “speed” is three times the maximum speed an actor can actually move. Doomguy’s run speed is 50, so in practice he moves a third of that, or 16⅔ units per tic. (Of course, this isn’t counting SR40, a bug that lets Doomguy run ~40% faster than intended diagonally.)

So now, what if you change friction? Even more curiously, the move factor is calculated completely differently depending on whether friction is higher or lower than the default Doom amount:

$$
move factor = \begin{cases}
\frac{133 – 128 \times friction}{544} &≈ 0.244 – 0.235 \times friction & \text{ if } friction \ge \frac{29}{32} \\
\frac{81920 \times friction – 70145}{1048576} &≈ 0.078 \times friction – 0.067 & \text{ otherwise }
\end{cases}
$$

That’s pretty weird? Complicating things further is that low friction (which means muddy terrain, remember) has an extra multiplier on its move factor, depending on how fast you’re already going — the idea is apparently that you have a hard time getting going, but it gets easier as you find your footing. The extra multiplier maxes out at 8, which makes the two halves of that function meet at the vanilla Doom value.

A graph of the relationship between friction and move factor

That very top point corresponds to the move factor from the original game. So no matter what you do to friction, the move factor becomes lower. At 0.85 and change, you can no longer move at all; below that, you move backwards.

From the formula above, it’s easy to see what changes to friction and move factor will do to Doomguy’s stable velocity. Move factor is in the numerator, so increasing it will increase stable velocity — but it can’t increase, so stable velocity can only ever decrease. Friction is in the denominator, but it’s subtracted from 1, so increasing friction will make the denominator a smaller value less than 1, i.e. increase stable velocity. Combined, we get this relationship between friction and stable velocity.

A graph showing stable velocity shooting up dramatically as friction increases

As friction approaches 1, stable velocity grows without bound. This makes sense, given the definition of \(v(n)\) — if friction is 1, the velocity from the previous tic isn’t reduced at all, so we just keep accelerating freely.

All of this is why I’m wary of using multipliers.

Anyway, this leaves me with one last question about the effects of Doom’s friction: how long does it take to reach stable velocity? Barring precision errors, we’ll never truly reach stable velocity, but let’s say within 5%. First we need a closed formula for the velocity after some number of tics. This is a simple recurrence relation, and you can write a few terms out yourself if you want to be sure this is right.

$$
v(n) = v_0 \times friction^n + speed \times move factor \times \frac{friction^n – 1}{friction – 1}
$$

Our initial velocity is zero, so the first term disappears. Set this equal to the stable formula and solve for n:

$$
speed \times move factor \times \frac{friction^n – 1}{friction – 1} = (1 – 5\%) \times speed \times \frac{move factor}{1 – friction} \\
friction^n – 1 = -(1 – 5\%) \\
n = \frac{\ln 5\%}{\ln friction}
$$

Speed” and move factor disappear entirely, which makes sense, and this is purely a function of friction (and how close we want to get). For vanilla Doom, that comes out to 30.4, which is a little less than a second. For other values of friction:

A graph of time to stability which leaps upwards dramatically towards the right

As friction increases (which in Doom terms means the surface is more slippery), it takes longer and longer to reach stable speed, which is in turn greater and greater. For lesser friction (i.e. mud), stable speed is lower, but reached fairly quickly. (Of course, the extra “getting going” multiplier while in mud adds some extra time here, but including that in the graph is a bit more complicated.)

I think this matches with my instincts above. How fascinating!

What’s that? This is way too much math and you hate it? Then don’t use multipliers in game physics.

Uh

That was a hell of a diversion!

I guess the goofiest stuff in basic game physics is really just about mapping player controls to in-game actions like jumping and deceleration; the rest consists of hacks to compensate for representing everything as a box.

Supporting Conservancy Makes a Difference

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2017/12/31/donate-conservancy.html

Earlier this year, in
February, I wrote a blog post encouraging people to donate
to where I
work, Software Freedom Conservancy. I’ve not otherwise blogged too much
this year. It’s been a rough year for many reasons, and while I
personally and Conservancy in general have accomplished some very
important work this year, I’m reminded as always that more resources do
make things easier.

I understand the urge, given how bad the larger political crises have
gotten, to want to give to charities other than those related to software
freedom. There are important causes out there that have become more urgent
this year. Here’s three issues which have become shockingly more acute
this year:

  • making sure the USA keeps it commitment
    to immigrants to allow them make a new life here just like my own ancestors
    did,
  • assuring that the great national nature reserves are maintained and
    left pristine for generations to come,
  • assuring that we have zero tolerance abusive behavior —
    particularly by those in power against people who come to them for help and
    job opportunities.

These are just three of the many issues this year that I’ve seen get worse,
not better. I am glad that I know and support people who work on these
issues, and I urge everyone to work on these issues, too.

Nevertheless, as I plan my primary donations this year, I’m again, as I
always do, giving to the FSF and my
own employer, Software
Freedom Conservancy
. The reason is simple: software freedom is still
an essential cause and it is frankly one that most people don’t understand
(yet). I wrote almost
two years ago about the phenomenon I dubbed Kuhn’s
Paradox
. Simply put: it keeps getting more and more difficult
to avoid proprietary software in a normal day’s tasks, even while the
number of lines of code licensed freely gets larger every day.

As long as that paradox remains true, I see software freedom as urgent. I
know that we’re losing ground on so many other causes, too. But those of
you who read my blog are some of the few people in the world that
understand that software freedom is under threat and needs the urgent work
that the very few software-freedom-related organizations,
like the FSF
and Software Freedom
Conservancy
are doing. I hope you’ll donate now to both of them. For
my part, I gave $120 myself to FSF as part of the monthly Associate
Membership program, and in a few minutes, I’m going to give $400 to
Conservancy. I’ll be frank: if you work in technology in an industrialized
country, I’m quite sure you can afford that level of money, and I suspect
those amounts are less than most of you spent on technology equipment
and/or network connectivity charges this year. Make a difference for us
and give to the cause of software freedom at least as much a you’re giving
to large technology companies.

Finally, a good reason to give to smaller charities like FSF and
Conservancy is that your donation makes a bigger difference. I do think
bigger organizations, such as (to pick an example of an organization I used
to give to) my local NPR station does important work. However, I was
listening this week to my local NPR station, and they said their goal
for that day was to raise $50,000. For Conservancy, that’s closer
to a goal we have for entire fundraising season, which for this year was
$75,000. The thing is: NPR is an important part of USA society, but it’s
one that nearly everyone understands. So few people understand the threats
looming from proprietary software, and they may not understand at all until
it’s too late — when all their devices are locked down, DRM is
fully ubiquitous, and no one is allowed to tinker with the software on
their devices and learn the wonderful art of computer programming. We are
at real risk of reaching that distopia before 90% of the world’s
population understands the threat!

Thus, giving to organizations in the area of software freedom is just
going to have a bigger and more immediate impact than more general causes
that more easily connect with people. You’re giving to prevent a future
that not everyone understands yet, and making an impact on our
work to help explain the dangers to the larger population.

Acoustical Attacks against Hard Drives

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/12/acoustical_atta.html

Interesting destructive attack: “Acoustic Denial of Service Attacks on HDDs“:

Abstract: Among storage components, hard disk drives (HDDs) have become the most commonly-used type of non-volatile storage due to their recent technological advances, including, enhanced energy efficacy and significantly-improved areal density. Such advances in HDDs have made them an inevitable part of numerous computing systems, including, personal computers, closed-circuit television (CCTV) systems, medical bedside monitors, and automated teller machines (ATMs). Despite the widespread use of HDDs and their critical role in real-world systems, there exist only a few research studies on the security of HDDs. In particular, prior research studies have discussed how HDDs can potentially leak critical private information through acoustic or electromagnetic emanations. Borrowing theoretical principles from acoustics and mechanics, we propose a novel denial-of-service (DoS) attack against HDDs that exploits a physical phenomenon, known as acoustic resonance. We perform a comprehensive examination of physical characteristics of several HDDs and create acoustic signals that cause significant vibrations in HDDs internal components. We demonstrate that such vibrations can negatively influence the performance of HDDs embedded in real-world systems. We show the feasibility of the proposed attack in two real-world case studies, namely, personal computers and CCTVs.

The deal with Bitcoin

Post Syndicated from Michal Zalewski original http://lcamtuf.blogspot.com/2017/12/the-deal-with-bitcoin.html

♪ Used to have a little now I have a lot
I’m still, I’m still Jenny from the block
          chain ♪

For all that has been written about Bitcoin and its ilk, it is curious that the focus is almost solely what the cryptocurrencies are supposed to be. Technologists wax lyrical about the potential for blockchains to change almost every aspect of our lives. Libertarians and paleoconservatives ache for the return to “sound money” that can’t be conjured up at the whim of a bureaucrat. Mainstream economists wag their fingers, proclaiming that a proper currency can’t be deflationary, that it must maintain a particular velocity, or that the government must be able to nip crises of confidence in the bud. And so on.

Much of this may be true, but the proponents of cryptocurrencies should recognize that an appeal to consequences is not a guarantee of good results. The critics, on the other hand, would be best served to remember that they are drawing far-reaching conclusions about the effects of modern monetary policies based on a very short and tumultuous period in history.

In this post, my goal is to ditch most of the dogma, talk a bit about the origins of money – and then see how “crypto” fits the bill.

1. The prehistory of currencies

The emergence of money is usually explained in a very straightforward way. You know the story: a farmer raised a pig, a cobbler made a shoe. The cobbler needed to feed his family while the farmer wanted to keep his feet warm – and so they met to exchange the goods on mutually beneficial terms. But as the tale goes, the barter system had a fatal flaw: sometimes, a farmer wanted a cooking pot, a potter wanted a knife, and a blacksmith wanted a pair of pants. To facilitate increasingly complex, multi-step exchanges without requiring dozens of people to meet face to face, we came up with an abstract way to represent value – a shiny coin guaranteed to be accepted by every tradesman.

It is a nice parable, but it probably isn’t very true. It seems far more plausible that early societies relied on the concept of debt long before the advent of currencies: an informal tally or a formal ledger would be used to keep track of who owes what to whom. The concept of debt, closely associated with one’s trustworthiness and standing in the community, would have enabled a wide range of economic activities: debts could be paid back over time, transferred, renegotiated, or forgotten – all without having to engage in spot barter or to mint a single coin. In fact, such non-monetary, trust-based, reciprocal economies are still common in closely-knit communities: among families, neighbors, coworkers, or friends.

In such a setting, primitive currencies probably emerged simply as a consequence of having a system of prices: a cow being worth a particular number of chickens, a chicken being worth a particular number of beaver pelts, and so forth. Formalizing such relationships by settling on a single, widely-known unit of account – say, one chicken – would make it more convenient to transfer, combine, or split debts; or to settle them in alternative goods.

Contrary to popular belief, for communal ledgers, the unit of account probably did not have to be particularly desirable, durable, or easy to carry; it was simply an accounting tool. And indeed, we sometimes run into fairly unusual units of account even in modern times: for example, cigarettes can be the basis of a bustling prison economy even when most inmates don’t smoke and there are not that many packs to go around.

2. The age of commodity money

In the end, the development of coinage might have had relatively little to do with communal trade – and far more with the desire to exchange goods with strangers. When dealing with a unfamiliar or hostile tribe, the concept of a chicken-denominated ledger does not hold up: the other side might be disinclined to honor its obligations – and get away with it, too. To settle such problematic trades, we needed a “spot” medium of exchange that would be easy to carry and authenticate, had a well-defined value, and a near-universal appeal. Throughout much of the recorded history, precious metals – predominantly gold and silver – proved to fit the bill.

In the most basic sense, such commodities could be seen as a tool to reconcile debts across societal boundaries, without necessarily replacing any local units of account. An obligation, denominated in some local currency, would be created on buyer’s side in order to procure the metal for the trade. The proceeds of the completed transaction would in turn allow the seller to settle their own local obligations that arose from having to source the traded goods. In other words, our wondrous chicken-denominated ledgers could coexist peacefully with gold – and when commodity coinage finally took hold, it’s likely that in everyday trade, precious metals served more as a useful abstraction than a precise store of value. A “silver chicken” of sorts.

Still, the emergence of commodity money had one interesting side effect: it decoupled the unit of debt – a “claim on the society”, in a sense – from any moral judgment about its origin. A piece of silver would buy the same amount of food, whether earned through hard labor or won in a drunken bet. This disconnect remains a central theme in many of the debates about social justice and unfairly earned wealth.

3. The State enters the game

If there is one advantage of chicken ledgers over precious metals, it’s that all chickens look and cluck roughly the same – something that can’t be said of every nugget of silver or gold. To cope with this problem, we needed to shape raw commodities into pieces of a more predictable shape and weight; a trusted party could then stamp them with a mark to indicate the value and the quality of the coin.

At first, the task of standardizing coinage rested with private parties – but the responsibility was soon assumed by the State. The advantages of this transition seemed clear: a single, widely-accepted and easily-recognizable currency could be now used to settle virtually all private and official debts.

Alas, in what deserves the dubious distinction of being one of the earliest examples of monetary tomfoolery, some States succumbed to the temptation of fiddling with the coinage to accomplish anything from feeding the poor to waging wars. In particular, it would be common to stamp coins with the same face value but a progressively lower content of silver and gold. Perhaps surprisingly, the strategy worked remarkably well; at least in the times of peace, most people cared about the value stamped on the coin, not its precise composition or weight.

And so, over time, representative money was born: sooner or later, most States opted to mint coins from nearly-worthless metals, or print banknotes on paper and cloth. This radically new currency was accompanied with a simple pledge: the State offered to redeem it at any time for its nominal value in gold.

Of course, the promise was largely illusory: the State did not have enough gold to honor all the promises it had made. Still, as long as people had faith in their rulers and the redemption requests stayed low, the fundamental mechanics of this new representative currency remained roughly the same as before – and in some ways, were an improvement in that they lessened the insatiable demand for a rare commodity. Just as importantly, the new money still enabled international trade – using the underlying gold exchange rate as a reference point.

4. Fractional reserve banking and fiat money

For much of the recorded history, banking was an exceptionally dull affair, not much different from running a communal chicken
ledger of the old. But then, something truly marvelous happened in the 17th century: around that time, many European countries have witnessed
the emergence of fractional-reserve banks.

These private ventures operated according to a simple scheme: they accepted people’s coin
for safekeeping, promising to pay a premium on every deposit made. To meet these obligations and to make a profit, the banks then
used the pooled deposits to make high-interest loans to other folks. The financiers figured out that under normal circumstances
and when operating at a sufficient scale, they needed only a very modest reserve – well under 10% of all deposited money – to be
able to service the usual volume and size of withdrawals requested by their customers. The rest could be loaned out.

The very curious consequence of fractional-reserve banking was that it pulled new money out of thin air.
The funds were simultaneously accounted for in the statements shown to the depositor, evidently available for withdrawal or
transfer at any time; and given to third-party borrowers, who could spend them on just about anything. Heck, the borrowers could
deposit the proceeds in another bank, creating even more money along the way! Whatever they did, the sum of all funds in the monetary
system now appeared much higher than the value of all coins and banknotes issued by the government – let alone the amount of gold
sitting in any vault.

Of course, no new money was being created in any physical sense: all that banks were doing was engaging in a bit of creative accounting – the sort of which would probably land you in jail if you attempted it today in any other comparably vital field of enterprise. If too many depositors were to ask for their money back, or if too many loans were to go bad, the banking system would fold. Fortunes would evaporate in a puff of accounting smoke, and with the disappearance of vast quantities of quasi-fictitious (“broad”) money, the wealth of the entire nation would shrink.

In the early 20th century, the world kept witnessing just that; a series of bank runs and economic contractions forced the governments around the globe to act. At that stage, outlawing fractional-reserve banking was no longer politically or economically tenable; a simpler alternative was to let go of gold and move to fiat money – a currency implemented as an abstract social construct, with no predefined connection to the physical realm. A new breed of economists saw the role of the government not in trying to peg the value of money to an inflexible commodity, but in manipulating its supply to smooth out economic hiccups or to stimulate growth.

(Contrary to popular beliefs, such manipulation is usually not done by printing new banknotes; more sophisticated methods, such as lowering reserve requirements for bank deposits or enticing banks to invest its deposits into government-issued securities, are the preferred route.)

The obvious peril of fiat money is that in the long haul, its value is determined strictly by people’s willingness to accept a piece of paper in exchange for their trouble; that willingness, in turn, is conditioned solely on their belief that the same piece of paper would buy them something nice a week, a month, or a year from now. It follows that a simple crisis of confidence could make a currency nearly worthless overnight. A prolonged period of hyperinflation and subsequent austerity in Germany and Austria was one of the precipitating factors that led to World War II. In more recent times, dramatic episodes of hyperinflation plagued the fiat currencies of Israel (1984), Mexico (1988), Poland (1990), Yugoslavia (1994), Bulgaria (1996), Turkey (2002), Zimbabwe (2009), Venezuela (2016), and several other nations around the globe.

For the United States, the switch to fiat money came relatively late, in 1971. To stop the dollar from plunging like a rock, the Nixon administration employed a clever trick: they ordered the freeze of wages and prices for the 90 days that immediately followed the move. People went on about their lives and paid the usual for eggs or milk – and by the time the freeze ended, they were accustomed to the idea that the “new”, free-floating dollar is worth about the same as the old, gold-backed one. A robust economy and favorable geopolitics did the rest, and so far, the American adventure with fiat currency has been rather uneventful – perhaps except for the fact that the price of gold itself skyrocketed from $35 per troy ounce in 1971 to $850 in 1980 (or, from $210 to $2,500 in today’s dollars).

Well, one thing did change: now better positioned to freely tamper with the supply of money, the regulators in accord with the bankers adopted a policy of creating it at a rate that slightly outstripped the organic growth in economic activity. They did this to induce a small, steady degree of inflation, believing that doing so would discourage people from hoarding cash and force them to reinvest it for the betterment of the society. Some critics like to point out that such a policy functions as a “backdoor” tax on savings that happens to align with the regulators’ less noble interests; still, either way: in the US and most other developed nations, the purchasing power of any money kept under a mattress will drop at a rate of somewhere between 2 to 10% a year.

5. So what’s up with Bitcoin?

Well… countless tomes have been written about the nature and the optimal characteristics of government-issued fiat currencies. Some heterodox economists, notably including Murray Rothbard, have also explored the topic of privately-issued, decentralized, commodity-backed currencies. But Bitcoin is a wholly different animal.

In essence, BTC is a global, decentralized fiat currency: it has no (recoverable) intrinsic value, no central authority to issue it or define its exchange rate, and it has no anchoring to any historical reference point – a combination that until recently seemed nonsensical and escaped any serious scrutiny. It does the unthinkable by employing three clever tricks:

  1. It allows anyone to create new coins, but only by solving brute-force computational challenges that get more difficult as the time goes by,

  2. It prevents unauthorized transfer of coins by employing public key cryptography to sign off transactions, with only the authorized holder of a coin knowing the correct key,

  3. It prevents double-spending by using a distributed public ledger (“blockchain”), recording the chain of custody for coins in a tamper-proof way.

The blockchain is often described as the most important feature of Bitcoin, but in some ways, its importance is overstated. The idea of a currency that does not rely on a centralized transaction clearinghouse is what helped propel the platform into the limelight – mostly because of its novelty and the perception that it is less vulnerable to government meddling (although the government is still free to track down, tax, fine, or arrest any participants). On the flip side, the everyday mechanics of BTC would not be fundamentally different if all the transactions had to go through Bitcoin Bank, LLC.

A more striking feature of the new currency is the incentive structure surrounding the creation of new coins. The underlying design democratized the creation of new coins early on: all you had to do is leave your computer running for a while to acquire a number of tokens. The tokens had no practical value, but obtaining them involved no substantial expense or risk. Just as importantly, because the difficulty of the puzzles would only increase over time, the hope was that if Bitcoin caught on, latecomers would find it easier to purchase BTC on a secondary market than mine their own – paying with a more established currency at a mutually beneficial exchange rate.

The persistent publicity surrounding Bitcoin and other cryptocurrencies did the rest – and today, with the growing scarcity of coins and the rapidly increasing demand, the price of a single token hovers somewhere south of $15,000.

6. So… is it bad money?

Predicting is hard – especially the future. In some sense, a coin that represents a cryptographic proof of wasted CPU cycles is no better or worse than a currency that relies on cotton decorated with pictures of dead presidents. It is true that Bitcoin suffers from many implementation problems – long transaction processing times, high fees, frequent security breaches of major exchanges – but in principle, such problems can be overcome.

That said, currencies live and die by the lasting willingness of others to accept them in exchange for services or goods – and in that sense, the jury is still out. The use of Bitcoin to settle bona fide purchases is negligible, both in absolute terms and in function of the overall volume of transactions. In fact, because of the technical challenges and limited practical utility, some companies that embraced the currency early on are now backing out.

When the value of an asset is derived almost entirely from its appeal as an ever-appreciating investment vehicle, the situation has all the telltale signs of a speculative bubble. But that does not prove that the asset is destined to collapse, or that a collapse would be its end. Still, the built-in deflationary mechanism of Bitcoin – the increasing difficulty of producing new coins – is probably both a blessing and a curse.

It’s going to go one way or the other; and when it’s all said and done, we’re going to celebrate the people who made the right guess. Because future is actually pretty darn easy to predict — in retrospect.

CoderDojo: 2000 Dojos ever

Post Syndicated from Giustina Mizzoni original https://www.raspberrypi.org/blog/2000-dojos-ever/

Every day of the week, we verify new Dojos all around the world, and each Dojo is championed by passionate volunteers. Last week, a huge milestone for the CoderDojo community went by relatively unnoticed: in the history of the movement, more than 2000 Dojos have now been verified!

CoderDojo banner — 2000 Dojos

2000 Dojos

This is a phenomenal achievement for a movement that’s just six years old and powered by volunteers. Presently, there are more than 1650 active Dojos running weekly, fortnightly, or monthly, and all of them are free for participants — for example, the Dojos run by Joel Bayubasire in Kampala, Uganda:

Joel Bayubasire with Ninjas at his Ugandan Dojo — 2000 Dojos

Empowering refugee children

This week, Joel set up his second Dojo and verified it on our global map. Joel is a Congolese refugee living in Kampala, Uganda, where he is currently completing his PhD in Economics at Madison International Institute and Business School.

Joel understands first-hand the challenges faced by refugees who were forced to leave their country due to war or conflict. Uganda is currently hosting more than 1.2 million refugees, 60% of which are children (World Bank, 2017). As refugees, children are only allowed to attend local schools until the age of 12. This results in lower educational attainment, which will likely affect their future employment prospects.

Two girls at a laptop. Joel Bayubasire CoderDojo — 2000 Dojos

Joel has the motivation to overcome these challenges, because he understands the power of education. Therefore, he initiated a number of community-based activities to provide educational opportunities for refugee children. As part of this, he founded his first Dojo earlier in the year, with the aim of giving these children a chance to compete in today’s global knowledge-based economy.

Two boys at a laptop. Joel Bayubasire CoderDojo — 2000 Dojos

Aware that securing volunteer mentors would be a challenge, Joel trained eight young people from the community to become youth mentors to their peers. He explains:

I believe that the mastery of computer coding allows talented young people to thrive professionally and enables them to not only be consumers but creators of the interconnected world of today!

Based on the success of Joel’s first Dojo, he has now expanded the CoderDojo initiative in his community; his plan is to provide computer science training for more than 300 refugee youths in Kampala by 2019. If you’d like to learn more about Joel’s efforts, head to this website.

Join the movement

If you are interested in creating opportunities for the young people in your community, then join the growing CoderDojo movement — you can volunteer to start a Dojo or to support an existing one today!

The post CoderDojo: 2000 Dojos ever appeared first on Raspberry Pi.

Object models

Post Syndicated from Eevee original https://eev.ee/blog/2017/11/28/object-models/

Anonymous asks, with dollars:

More about programming languages!

Well then!

I’ve written before about what I think objects are: state and behavior, which in practice mostly means method calls.

I suspect that the popular impression of what objects are, and also how they should work, comes from whatever C++ and Java happen to do. From that point of view, the whole post above is probably nonsense. If the baseline notion of “object” is a rigid definition woven tightly into the design of two massively popular languages, then it doesn’t even make sense to talk about what “object” should mean — it does mean the features of those languages, and cannot possibly mean anything else.

I think that’s a shame! It piles a lot of baggage onto a fairly simple idea. Polymorphism, for example, has nothing to do with objects — it’s an escape hatch for static type systems. Inheritance isn’t the only way to reuse code between objects, but it’s the easiest and fastest one, so it’s what we get. Frankly, it’s much closer to a speed tradeoff than a fundamental part of the concept.

We could do with more experimentation around how objects work, but that’s impossible in the languages most commonly thought of as object-oriented.

Here, then, is a (very) brief run through the inner workings of objects in four very dynamic languages. I don’t think I really appreciated objects until I’d spent some time with Python, and I hope this can help someone else whet their own appetite.

Python 3

Of the four languages I’m going to touch on, Python will look the most familiar to the Java and C++ crowd. For starters, it actually has a class construct.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __neg__(self):
        return Vector(-self.x, -self.y)

    def __div__(self, denom):
        return Vector(self.x / denom, self.y / denom)

    @property
    def magnitude(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def normalized(self):
        return self / self.magnitude

The __init__ method is an initializer, which is like a constructor but named differently (because the object already exists in a usable form by the time the initializer is called). Operator overloading is done by implementing methods with other special __dunder__ names. Properties can be created with @property, where the @ is syntax for applying a wrapper function to a function as it’s defined. You can do inheritance, even multiply:

1
2
3
4
class Foo(A, B, C):
    def bar(self, x, y, z):
        # do some stuff
        super().bar(x, y, z)

Cool, a very traditional object model.

Except… for some details.

Some details

For one, Python objects don’t have a fixed layout. Code both inside and outside the class can add or remove whatever attributes they want from whatever object they want. The underlying storage is just a dict, Python’s mapping type. (Or, rather, something like one. Also, it’s possible to change, which will probably be the case for everything I say here.)

If you create some attributes at the class level, you’ll start to get a peek behind the curtains:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class Foo:
    values = []

    def add_value(self, value):
        self.values.append(value)

a = Foo()
b = Foo()
a.add_value('a')
print(a.values)  # ['a']
b.add_value('b')
print(b.values)  # ['a', 'b']

The [] assigned to values isn’t a default assigned to each object. In fact, the individual objects don’t know about it at all! You can use vars(a) to get at the underlying storage dict, and you won’t see a values entry in there anywhere.

Instead, values lives on the class, which is a value (and thus an object) in its own right. When Python is asked for self.values, it checks to see if self has a values attribute; in this case, it doesn’t, so Python keeps going and asks the class for one.

Python’s object model is secretly prototypical — a class acts as a prototype, as a shared set of fallback values, for its objects.

In fact, this is also how method calls work! They aren’t syntactically special at all, which you can see by separating the attribute lookup from the call.

1
2
3
print("abc".startswith("a"))  # True
meth = "abc".startswith
print(meth("a"))  # True

Reading obj.method looks for a method attribute; if there isn’t one on obj, Python checks the class. Here, it finds one: it’s a function from the class body.

Ah, but wait! In the code I just showed, meth seems to “know” the object it came from, so it can’t just be a plain function. If you inspect the resulting value, it claims to be a “bound method” or “built-in method” rather than a function, too. Something funny is going on here, and that funny something is the descriptor protocol.

Descriptors

Python allows attributes to implement their own custom behavior when read from or written to. Such an attribute is called a descriptor. I’ve written about them before, but here’s a quick overview.

If Python looks up an attribute, finds it in a class, and the value it gets has a __get__ method… then instead of using that value, Python will use the return value of its __get__ method.

The @property decorator works this way. The magnitude property in my original example was shorthand for doing this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class MagnitudeDescriptor:
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return (instance.x ** 2 + instance.y ** 2) ** 0.5

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    magnitude = MagnitudeDescriptor()

When you ask for somevec.magnitude, Python checks somevec but doesn’t find magnitude, so it consults the class instead. The class does have a magnitude, and it’s a value with a __get__ method, so Python calls that method and somevec.magnitude evaluates to its return value. (The instance is None check is because __get__ is called even if you get the descriptor directly from the class via Vector.magnitude. A descriptor intended to work on instances can’t do anything useful in that case, so the convention is to return the descriptor itself.)

You can also intercept attempts to write to or delete an attribute, and do absolutely whatever you want instead. But note that, similar to operating overloading in Python, the descriptor must be on a class; you can’t just slap one on an arbitrary object and have it work.

This brings me right around to how “bound methods” actually work. Functions are descriptors! The function type implements __get__, and when a function is retrieved from a class via an instance, that __get__ bundles the function and the instance together into a tiny bound method object. It’s essentially:

1
2
3
4
5
class FunctionType:
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return functools.partial(self, instance)

The self passed as the first argument to methods is not special or magical in any way. It’s built out of a few simple pieces that are also readily accessible to Python code.

Note also that because obj.method() is just an attribute lookup and a call, Python doesn’t actually care whether method is a method on the class or just some callable thing on the object. You won’t get the auto-self behavior if it’s on the object, but otherwise there’s no difference.

More attribute access, and the interesting part

Descriptors are one of several ways to customize attribute access. Classes can implement __getattr__ to intervene when an attribute isn’t found on an object; __setattr__ and __delattr__ to intervene when any attribute is set or deleted; and __getattribute__ to implement unconditional attribute access. (That last one is a fantastic way to create accidental recursion, since any attribute access you do within __getattribute__ will of course call __getattribute__ again.)

Here’s what I really love about Python. It might seem like a magical special case that descriptors only work on classes, but it really isn’t. You could implement exactly the same behavior yourself, in pure Python, using only the things I’ve just told you about. Classes are themselves objects, remember, and they are instances of type, so the reason descriptors only work on classes is that type effectively does this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
class type:
    def __getattribute__(self, name):
        value = super().__getattribute__(name)
        # like all op overloads, __get__ must be on the type, not the instance
        ty = type(value)
        if hasattr(ty, '__get__'):
            # it's a descriptor!  this is a class access so there is no instance
            return ty.__get__(value, None, self)
        else:
            return value

You can even trivially prove to yourself that this is what’s going on by skipping over types behavior:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
class Descriptor:
    def __get__(self, instance, owner):
        print('called!')

class Foo:
    bar = Descriptor()

Foo.bar  # called!
type.__getattribute__(Foo, 'bar')  # called!
object.__getattribute__(Foo, 'bar')  # ...

And that’s not all! The mysterious super function, used to exhaustively traverse superclass method calls even in the face of diamond inheritance, can also be expressed in pure Python using these primitives. You could write your own superclass calling convention and use it exactly the same way as super.

This is one of the things I really like about Python. Very little of it is truly magical; virtually everything about the object model exists in the types rather than the language, which means virtually everything can be customized in pure Python.

Class creation and metaclasses

A very brief word on all of this stuff, since I could talk forever about Python and I have three other languages to get to.

The class block itself is fairly interesting. It looks like this:

1
2
class Name(*bases, **kwargs):
    # code

I’ve said several times that classes are objects, and in fact the class block is one big pile of syntactic sugar for calling type(...) with some arguments to create a new type object.

The Python documentation has a remarkably detailed description of this process, but the gist is:

  • Python determines the type of the new class — the metaclass — by looking for a metaclass keyword argument. If there isn’t one, Python uses the “lowest” type among the provided base classes. (If you’re not doing anything special, that’ll just be type, since every class inherits from object and object is an instance of type.)

  • Python executes the class body. It gets its own local scope, and any assignments or method definitions go into that scope.

  • Python now calls type(name, bases, attrs, **kwargs). The name is whatever was right after class; the bases are position arguments; and attrs is the class body’s local scope. (This is how methods and other class attributes end up on the class.) The brand new type is then assigned to Name.

Of course, you can mess with most of this. You can implement __prepare__ on a metaclass, for example, to use a custom mapping as storage for the local scope — including any reads, which allows for some interesting shenanigans. The only part you can’t really implement in pure Python is the scoping bit, which has a couple extra rules that make sense for classes. (In particular, functions defined within a class block don’t close over the class body; that would be nonsense.)

Object creation

Finally, there’s what actually happens when you create an object — including a class, which remember is just an invocation of type(...).

Calling Foo(...) is implemented as, well, a call. Any type can implement calls with the __call__ special method, and you’ll find that type itself does so. It looks something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# oh, a fun wrinkle that's hard to express in pure python: type is a class, so
# it's an instance of itself
class type:
    def __call__(self, *args, **kwargs):
        # remember, here 'self' is a CLASS, an instance of type.
        # __new__ is a true constructor: object.__new__ allocates storage
        # for a new blank object
        instance = self.__new__(self, *args, **kwargs)
        # you can return whatever you want from __new__ (!), and __init__
        # is only called on it if it's of the right type
        if isinstance(instance, self):
            instance.__init__(*args, **kwargs)
        return instance

Again, you can trivially confirm this by asking any type for its __call__ method. Assuming that type doesn’t implement __call__ itself, you’ll get back a bound version of types implementation.

1
2
>>> list.__call__
<method-wrapper '__call__' of type object at 0x7fafb831a400>

You can thus implement __call__ in your own metaclass to completely change how subclasses are created — including skipping the creation altogether, if you like.

And… there’s a bunch of stuff I haven’t even touched on.

The Python philosophy

Python offers something that, on the surface, looks like a “traditional” class/object model. Under the hood, it acts more like a prototypical system, where failed attribute lookups simply defer to a superclass or metaclass.

The language also goes to almost superhuman lengths to expose all of its moving parts. Even the prototypical behavior is an implementation of __getattribute__ somewhere, which you are free to completely replace in your own types. Proxying and delegation are easy.

Also very nice is that these features “bundle” well, by which I mean a library author can do all manner of convoluted hijinks, and a consumer of that library doesn’t have to see any of it or understand how it works. You only need to inherit from a particular class (which has a metaclass), or use some descriptor as a decorator, or even learn any new syntax.

This meshes well with Python culture, which is pretty big on the principle of least surprise. These super-advanced features tend to be tightly confined to single simple features (like “makes a weak attribute“) or cordoned with DSLs (e.g., defining a form/struct/database table with a class body). In particular, I’ve never seen a metaclass in the wild implement its own __call__.

I have mixed feelings about that. It’s probably a good thing overall that the Python world shows such restraint, but I wonder if there are some very interesting possibilities we’re missing out on. I implemented a metaclass __call__ myself, just once, in an entity/component system that strove to minimize fuss when communicating between components. It never saw the light of day, but I enjoyed seeing some new things Python could do with the same relatively simple syntax. I wouldn’t mind seeing, say, an object model based on composition (with no inheritance) built atop Python’s primitives.

Lua

Lua doesn’t have an object model. Instead, it gives you a handful of very small primitives for building your own object model. This is pretty typical of Lua — it’s a very powerful language, but has been carefully constructed to be very small at the same time. I’ve never encountered anything else quite like it, and “but it starts indexing at 1!” really doesn’t do it justice.

The best way to demonstrate how objects work in Lua is to build some from scratch. We need two key features. The first is metatables, which bear a passing resemblance to Python’s metaclasses.

Tables and metatables

The table is Lua’s mapping type and its primary data structure. Keys can be any value other than nil. Lists are implemented as tables whose keys are consecutive integers starting from 1. Nothing terribly surprising. The dot operator is sugar for indexing with a string key.

1
2
3
4
5
local t = { a = 1, b = 2 }
print(t['a'])  -- 1
print(t.b)  -- 2
t.c = 3
print(t['c'])  -- 3

A metatable is a table that can be associated with another value (usually another table) to change its behavior. For example, operator overloading is implemented by assigning a function to a special key in a metatable.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
local t = { a = 1, b = 2 }
--print(t + 0)  -- error: attempt to perform arithmetic on a table value

local mt = {
    __add = function(left, right)
        return 12
    end,
}
setmetatable(t, mt)
print(t + 0)  -- 12

Now, the interesting part: one of the special keys is __index, which is consulted when the base table is indexed by a key it doesn’t contain. Here’s a table that claims every key maps to itself.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
local t = {}
local mt = {
    __index = function(table, key)
        return key
    end,
}
setmetatable(t, mt)
print(t.foo)  -- foo
print(t.bar)  -- bar
print(t[3])  -- 3

__index doesn’t have to be a function, either. It can be yet another table, in which case that table is simply indexed with the key. If the key still doesn’t exist and that table has a metatable with an __index, the process repeats.

With this, it’s easy to have several unrelated tables that act as a single table. Call the base table an object, fill the __index table with functions and call it a class, and you have half of an object system. You can even get prototypical inheritance by chaining __indexes together.

At this point things are a little confusing, since we have at least three tables going on, so here’s a diagram. Keep in mind that Lua doesn’t actually have anything called an “object”, “class”, or “method” — those are just convenient nicknames for a particular structure we might build with Lua’s primitives.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
                    ╔═══════════╗        ...
                    ║ metatable ║         ║
                    ╟───────────╢   ┌─────╨───────────────────────┐
                    ║ __index   ╫───┤ lookup table ("superclass") │
                    ╚═══╦═══════╝   ├─────────────────────────────┤
  ╔═══════════╗         ║           │ some other method           ┼─── function() ... end
  ║ metatable ║         ║           └─────────────────────────────┘
  ╟───────────╢   ┌─────╨──────────────────┐
  ║ __index   ╫───┤ lookup table ("class") │
  ╚═══╦═══════╝   ├────────────────────────┤
      ║           │ some method            ┼─── function() ... end
      ║           └────────────────────────┘
┌─────╨─────────────────┐
│ base table ("object") │
└───────────────────────┘

Note that a metatable is not the same as a class; it defines behavior, not methods. Conversely, if you try to use a class directly as a metatable, it will probably not do much. (This is pretty different from e.g. Python, where operator overloads are just methods with funny names. One nice thing about the Lua approach is that you can keep interface-like functionality separate from methods, and avoid clogging up arbitrary objects’ namespaces. You could even use a dummy table as a key and completely avoid name collisions.)

Anyway, code!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
local class = {
    foo = function(a)
        print("foo got", a)
    end,
}
local mt = { __index = class }
-- setmetatable returns its first argument, so this is nice shorthand
local obj1 = setmetatable({}, mt)
local obj2 = setmetatable({}, mt)
obj1.foo(7)  -- foo got 7
obj2.foo(9)  -- foo got 9

Wait, wait, hang on. Didn’t I call these methods? How do they get at the object? Maybe Lua has a magical this variable?

Methods, sort of

Not quite, but this is where the other key feature comes in: method-call syntax. It’s the lightest touch of sugar, just enough to have method invocation.

1
2
3
4
5
6
7
8
9
-- note the colon!
a:b(c, d, ...)

-- exactly equivalent to this
-- (except that `a` is only evaluated once)
a.b(a, c, d, ...)

-- which of course is really this
a["b"](a, c, d, ...)

Now we can write methods that actually do something.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
local class = {
    bar = function(self)
        print("our score is", self.score)
    end,
}
local mt = { __index = class }
local obj1 = setmetatable({ score = 13 }, mt)
local obj2 = setmetatable({ score = 25 }, mt)
obj1:bar()  -- our score is 13
obj2:bar()  -- our score is 25

And that’s all you need. Much like Python, methods and data live in the same namespace, and Lua doesn’t care whether obj:method() finds a function on obj or gets one from the metatable’s __index. Unlike Python, the function will be passed self either way, because self comes from the use of : rather than from the lookup behavior.

(Aside: strictly speaking, any Lua value can have a metatable — and if you try to index a non-table, Lua will always consult the metatable’s __index. Strings all have the string library as a metatable, so you can call methods on them: try ("%s %s"):format(1, 2). I don’t think Lua lets user code set the metatable for non-tables, so this isn’t that interesting, but if you’re writing Lua bindings from C then you can wrap your pointers in metatables to give them methods implemented in C.)

Bringing it all together

Of course, writing all this stuff every time is a little tedious and error-prone, so instead you might want to wrap it all up inside a little function. No problem.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
local function make_object(body)
    -- create a metatable
    local mt = { __index = body }
    -- create a base table to serve as the object itself
    local obj = setmetatable({}, mt)
    -- and, done
    return obj
end

-- you can leave off parens if you're only passing in 
local Dog = {
    -- this acts as a "default" value; if obj.barks is missing, __index will
    -- kick in and find this value on the class.  but if obj.barks is assigned
    -- to, it'll go in the object and shadow the value here.
    barks = 0,

    bark = function(self)
        self.barks = self.barks + 1
        print("woof!")
    end,
}

local mydog = make_object(Dog)
mydog:bark()  -- woof!
mydog:bark()  -- woof!
mydog:bark()  -- woof!
print(mydog.barks)  -- 3
print(Dog.barks)  -- 0

It works, but it’s fairly barebones. The nice thing is that you can extend it pretty much however you want. I won’t reproduce an entire serious object system here — lord knows there are enough of them floating around — but the implementation I have for my LÖVE games lets me do this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
local Animal = Object:extend{
    cries = 0,
}

-- called automatically by Object
function Animal:init()
    print("whoops i couldn't think of anything interesting to put here")
end

-- this is just nice syntax for adding a first argument called 'self', then
-- assigning this function to Animal.cry
function Animal:cry()
    self.cries = self.cries + 1
end

local Cat = Animal:extend{}

function Cat:cry()
    print("meow!")
    Cat.__super.cry(self)
end

local cat = Cat()
cat:cry()  -- meow!
cat:cry()  -- meow!
print(cat.cries)  -- 2

When I say you can extend it however you want, I mean that. I could’ve implemented Python (2)-style super(Cat, self):cry() syntax; I just never got around to it. I could even make it work with multiple inheritance if I really wanted to — or I could go the complete opposite direction and only implement composition. I could implement descriptors, customizing the behavior of individual table keys. I could add pretty decent syntax for composition/proxying. I am trying very hard to end this section now.

The Lua philosophy

Lua’s philosophy is to… not have a philosophy? It gives you the bare minimum to make objects work, and you can do absolutely whatever you want from there. Lua does have something resembling prototypical inheritance, but it’s not so much a first-class feature as an emergent property of some very simple tools. And since you can make __index be a function, you could avoid the prototypical behavior and do something different entirely.

The very severe downside, of course, is that you have to find or build your own object system — which can get pretty confusing very quickly, what with the multiple small moving parts. Third-party code may also have its own object system with subtly different behavior. (Though, in my experience, third-party code tries very hard to avoid needing an object system at all.)

It’s hard to say what the Lua “culture” is like, since Lua is an embedded language that’s often a little different in each environment. I imagine it has a thousand millicultures, instead. I can say that the tedium of building my own object model has led me into something very “traditional”, with prototypical inheritance and whatnot. It’s partly what I’m used to, but it’s also just really dang easy to get working.

Likewise, while I love properties in Python and use them all the dang time, I’ve yet to use a single one in Lua. They wouldn’t be particularly hard to add to my object model, but having to add them myself (or shop around for an object model with them and also port all my code to use it) adds a huge amount of friction. I’ve thought about designing an interesting ECS with custom object behavior, too, but… is it really worth the effort? For all the power and flexibility Lua offers, the cost is that by the time I have something working at all, I’m too exhausted to actually use any of it.

JavaScript

JavaScript is notable for being preposterously heavily used, yet not having a class block.

Well. Okay. Yes. It has one now. It didn’t for a very long time, and even the one it has now is sugar.

Here’s a vector class again:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
class Vector {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }

    get magnitude() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    }

    dot(other) {
        return this.x * other.x + this.y * other.y;
    }
}

In “classic” JavaScript, this would be written as:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
function Vector(x, y) {
    this.x = x;
    this.y = y;
}

Object.defineProperty(Vector.prototype, 'magnitude', {
    configurable: true,
    enumerable: true,
    get: function() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    },
});


Vector.prototype.dot = function(other) {
    return this.x * other.x + this.y * other.y;
};

Hm, yes. I can see why they added class.

The JavaScript model

In JavaScript, a new type is defined in terms of a function, which is its constructor.

Right away we get into trouble here. There is a very big difference between these two invocations, which I actually completely forgot about just now after spending four hours writing about Python and Lua:

1
2
let vec = Vector(3, 4);
let vec = new Vector(3, 4);

The first calls the function Vector. It assigns some properties to this, which here is going to be window, so now you have a global x and y. It then returns nothing, so vec is undefined.

The second calls Vector with this set to a new empty object, then evaluates to that object. The result is what you’d actually expect.

(You can detect this situation with the strange new.target expression, but I have never once remembered to do so.)

From here, we have true, honest-to-god, first-class prototypical inheritance. The word “prototype” is even right there. When you write this:

1
vec.dot(vec2)

JavaScript will look for dot on vec and (presumably) not find it. It then consults vecs prototype, an object you can see for yourself by using Object.getPrototypeOf(). Since vec is a Vector, its prototype is Vector.prototype.

I stress that Vector.prototype is not the prototype for Vector. It’s the prototype for instances of Vector.

(I say “instance”, but the true type of vec here is still just object. If you want to find Vector, it’s automatically assigned to the constructor property of its own prototype, so it’s available as vec.constructor.)

Of course, Vector.prototype can itself have a prototype, in which case the process would continue if dot were not found. A common (and, arguably, very bad) way to simulate single inheritance is to set Class.prototype to an instance of a superclass to get the prototype right, then tack on the methods for Class. Nowadays we can do Object.create(Superclass.prototype).

Now that I’ve been through Python and Lua, though, this isn’t particularly surprising. I kinda spoiled it.

I suppose one difference in JavaScript is that you can tack arbitrary attributes directly onto Vector all you like, and they will remain invisible to instances since they aren’t in the prototype chain. This is kind of backwards from Lua, where you can squirrel stuff away in the metatable.

Another difference is that every single object in JavaScript has a bunch of properties already tacked on — the ones in Object.prototype. Every object (and by “object” I mean any mapping) has a prototype, and that prototype defaults to Object.prototype, and it has a bunch of ancient junk like isPrototypeOf.

(Nit: it’s possible to explicitly create an object with no prototype via Object.create(null).)

Like Lua, and unlike Python, JavaScript doesn’t distinguish between keys found on an object and keys found via a prototype. Properties can be defined on prototypes with Object.defineProperty(), but that works just as well directly on an object, too. JavaScript doesn’t have a lot of operator overloading, but some things like Symbol.iterator also work on both objects and prototypes.

About this

You may, at this point, be wondering what this is. Unlike Lua and Python (and the last language below), this is a special built-in value — a context value, invisibly passed for every function call.

It’s determined by where the function came from. If the function was the result of an attribute lookup, then this is set to the object containing that attribute. Otherwise, this is set to the global object, window. (You can also set this to whatever you want via the call method on functions.)

This decision is made lexically, i.e. from the literal source code as written. There are no Python-style bound methods. In other words:

1
2
3
4
5
// this = obj
obj.method()
// this = window
let meth = obj.method
meth()

Also, because this is reassigned on every function call, it cannot be meaningfully closed over, which makes using closures within methods incredibly annoying. The old approach was to assign this to some other regular name like self (which got syntax highlighting since it’s also a built-in name in browsers); then we got Function.bind, which produced a callable thing with a fixed context value, which was kind of nice; and now finally we have arrow functions, which explicitly close over the current this when they’re defined and don’t change it when called. Phew.

Class syntax

I already showed class syntax, and it’s really just one big macro for doing all the prototype stuff The Right Way. It even prevents you from calling the type without new. The underlying model is exactly the same, and you can inspect all the parts.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
class Vector { ... }

console.log(Vector.prototype);  // { dot: ..., magnitude: ..., ... }
let vec = new Vector(3, 4);
console.log(Object.getPrototypeOf(vec));  // same as Vector.prototype

// i don't know why you would subclass vector but let's roll with it
class Vectest extends Vector { ... }

console.log(Vectest.prototype);  // { ... }
console.log(Object.getPrototypeOf(Vectest.prototype))  // same as Vector.prototype

Alas, class syntax has a couple shortcomings. You can’t use the class block to assign arbitrary data to either the type object or the prototype — apparently it was deemed too confusing that mutations would be shared among instances. Which… is… how prototypes work. How Python works. How JavaScript itself, one of the most popular languages of all time, has worked for twenty-two years. Argh.

You can still do whatever assignment you want outside of the class block, of course. It’s just a little ugly, and not something I’d think to look for with a sugary class.

A more subtle result of this behavior is that a class block isn’t quite the same syntax as an object literal. The check for data isn’t a runtime thing; class Foo { x: 3 } fails to parse. So JavaScript now has two largely but not entirely identical styles of key/value block.

Attribute access

Here’s where things start to come apart at the seams, just a little bit.

JavaScript doesn’t really have an attribute protocol. Instead, it has two… extension points, I suppose.

One is Object.defineProperty, seen above. For common cases, there’s also the get syntax inside a property literal, which does the same thing. But unlike Python’s @property, these aren’t wrappers around some simple primitives; they are the primitives. JavaScript is the only language of these four to have “property that runs code on access” as a completely separate first-class concept.

If you want to intercept arbitrary attribute access (and some kinds of operators), there’s a completely different primitive: the Proxy type. It doesn’t let you intercept attribute access or operators; instead, it produces a wrapper object that supports interception and defers to the wrapped object by default.

It’s cool to see composition used in this way, but also, extremely weird. If you want to make your own type that overloads in or calling, you have to return a Proxy that wraps your own type, rather than actually returning your own type. And (unlike the other three languages in this post) you can’t return a different type from a constructor, so you have to throw that away and produce objects only from a factory. And instanceof would be broken, but you can at least fix that with Symbol.hasInstance — which is really operator overloading, implement yet another completely different way.

I know the design here is a result of legacy and speed — if any object could intercept all attribute access, then all attribute access would be slowed down everywhere. Fair enough. It still leaves the surface area of the language a bit… bumpy?

The JavaScript philosophy

It’s a little hard to tell. The original idea of prototypes was interesting, but it was hidden behind some very awkward syntax. Since then, we’ve gotten a bunch of extra features awkwardly bolted on to reflect the wildly varied things the built-in types and DOM API were already doing. We have class syntax, but it’s been explicitly designed to avoid exposing the prototype parts of the model.

I admit I don’t do a lot of heavy JavaScript, so I might just be overlooking it, but I’ve seen virtually no code that makes use of any of the recent advances in object capabilities. Forget about custom iterators or overloading call; I can’t remember seeing any JavaScript in the wild that even uses properties yet. I don’t know if everyone’s waiting for sufficient browser support, nobody knows about them, or nobody cares.

The model has advanced recently, but I suspect JavaScript is still shackled to its legacy of “something about prototypes, I don’t really get it, just copy the other code that’s there” as an object model. Alas! Prototypes are so good. Hopefully class syntax will make it a bit more accessible, as it has in Python.

Perl 5

Perl 5 also doesn’t have an object system and expects you to build your own. But where Lua gives you two simple, powerful tools for building one, Perl 5 feels more like a puzzle with half the pieces missing. Clearly they were going for something, but they only gave you half of it.

In brief, a Perl object is a reference that has been blessed with a package.

I need to explain a few things. Honestly, one of the biggest problems with the original Perl object setup was how many strange corners and unique jargon you had to understand just to get off the ground.

(If you want to try running any of this code, you should stick a use v5.26; as the first line. Perl is very big on backwards compatibility, so you need to opt into breaking changes, and even the mundane say builtin is behind a feature gate.)

References

A reference in Perl is sort of like a pointer, but its main use is very different. See, Perl has the strange property that its data structures try very hard to spill their contents all over the place. Despite having dedicated syntax for arrays — @foo is an array variable, distinct from the single scalar variable $foo — it’s actually impossible to nest arrays.

1
2
3
my @foo = (1, 2, 3, 4);
my @bar = (@foo, @foo);
# @bar is now a flat list of eight items: 1, 2, 3, 4, 1, 2, 3, 4

The idea, I guess, is that an array is not one thing. It’s not a container, which happens to hold multiple things; it is multiple things. Anywhere that expects a single value, such as an array element, cannot contain an array, because an array fundamentally is not a single value.

And so we have “references”, which are a form of indirection, but also have the nice property that they’re single values. They add containment around arrays, and in general they make working with most of Perl’s primitive types much more sensible. A reference to a variable can be taken with the \ operator, or you can use [ ... ] and { ... } to directly create references to anonymous arrays or hashes.

1
2
3
my @foo = (1, 2, 3, 4);
my @bar = (\@foo, \@foo);
# @bar is now a nested list of two items: [1, 2, 3, 4], [1, 2, 3, 4]

(Incidentally, this is the sole reason I initially abandoned Perl for Python. Non-trivial software kinda requires nesting a lot of data structures, so you end up with references everywhere, and the syntax for going back and forth between a reference and its contents is tedious and ugly.)

A Perl object must be a reference. Perl doesn’t care what kind of reference — it’s usually a hash reference, since hashes are a convenient place to store arbitrary properties, but it could just as well be a reference to an array, a scalar, or even a sub (i.e. function) or filehandle.

I’m getting a little ahead of myself. First, the other half: blessing and packages.

Packages and blessing

Perl packages are just namespaces. A package looks like this:

1
2
3
4
5
6
7
package Foo::Bar;

sub quux {
    say "hi from quux!";
}

# now Foo::Bar::quux() can be called from anywhere

Nothing shocking, right? It’s just a named container. A lot of the details are kind of weird, like how a package exists in some liminal quasi-value space, but the basic idea is a Bag Of Stuff.

The final piece is “blessing,” which is Perl’s funny name for binding a package to a reference. A very basic class might look like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
package Vector;

# the name 'new' is convention, not special
sub new {
    # perl argument passing is weird, don't ask
    my ($class, $x, $y) = @_;

    # create the object itself -- here, unusually, an array reference makes sense
    my $self = [ $x, $y ];

    # associate the package with that reference
    # note that $class here is just the regular string, 'Vector'
    bless $self, $class;

    return $self;
}

sub x {
    my ($self) = @_;
    return $self->[0];
}

sub y {
    my ($self) = @_;
    return $self->[1];
}

sub magnitude {
    my ($self) = @_;
    return sqrt($self->x ** 2 + $self->y ** 2);
}

# switch back to the "default" package
package main;

# -> is method call syntax, which passes the invocant as the first argument;
# for a package, that's just the package name
my $vec = Vector->new(3, 4);
say $vec->magnitude;  # 5

A few things of note here. First, $self->[0] has nothing to do with objects; it’s normal syntax for getting the value of a index 0 out of an array reference called $self. (Most classes are based on hashrefs and would use $self->{value} instead.) A blessed reference is still a reference and can be treated like one.

In general, -> is Perl’s dereferencey operator, but its exact behavior depends on what follows. If it’s followed by brackets, then it’ll apply the brackets to the thing in the reference: ->{} to index a hash reference, ->[] to index an array reference, and ->() to call a function reference.

But if -> is followed by an identifier, then it’s a method call. For packages, that means calling a function in the package and passing the package name as the first argument. For objects — blessed references — that means calling a function in the associated package and passing the object as the first argument.

This is a little weird! A blessed reference is a superposition of two things: its normal reference behavior, and some completely orthogonal object behavior. Also, object behavior has no notion of methods vs data; it only knows about methods. Perl lets you omit parentheses in a lot of places, including when calling a method with no arguments, so $vec->magnitude is really $vec->magnitude().

Perl’s blessing bears some similarities to Lua’s metatables, but ultimately Perl is much closer to Ruby’s “message passing” approach than the above three languages’ approaches of “get me something and maybe it’ll be callable”. (But this is no surprise — Ruby is a spiritual successor to Perl 5.)

All of this leads to one little wrinkle: how do you actually expose data? Above, I had to write x and y methods. Am I supposed to do that for every single attribute on my type?

Yes! But don’t worry, there are third-party modules to help with this incredibly fundamental task. Take Class::Accessor::Fast, so named because it’s faster than Class::Accessor:

1
2
3
package Foo;
use base qw(Class::Accessor::Fast);
__PACKAGE__->mk_accessors(qw(fred wilma barney));

(__PACKAGE__ is the lexical name of the current package; qw(...) is a list literal that splits its contents on whitespace.)

This assumes you’re using a hashref with keys of the same names as the attributes. $obj->fred will return the fred key from your hashref, and $obj->fred(4) will change it to 4.

You also, somewhat bizarrely, have to inherit from Class::Accessor::Fast. Speaking of which,

Inheritance

Inheritance is done by populating the package-global @ISA array with some number of (string) names of parent packages. Most code instead opts to write use base ...;, which does the same thing. Or, more commonly, use parent ...;, which… also… does the same thing.

Every package implicitly inherits from UNIVERSAL, which can be freely modified by Perl code.

A method can call its superclass method with the SUPER:: pseudo-package:

1
2
3
4
sub foo {
    my ($self) = @_;
    $self->SUPER::foo;
}

However, this does a depth-first search, which means it almost certainly does the wrong thing when faced with multiple inheritance. For a while the accepted solution involved a third-party module, but Perl eventually grew an alternative you have to opt into: C3, which may be more familiar to you as the order Python uses.

1
2
3
4
5
6
use mro 'c3';

sub foo {
    my ($self) = @_;
    $self->next::method;
}

Offhand, I’m not actually sure how next::method works, seeing as it was originally implemented in pure Perl code. I suspect it involves peeking at the caller’s stack frame. If so, then this is a very different style of customizability from e.g. Python — the MRO was never intended to be pluggable, and the use of a special pseudo-package means it isn’t really, but someone was determined enough to make it happen anyway.

Operator overloading and whatnot

Operator overloading looks a little weird, though really it’s pretty standard Perl.

1
2
3
4
5
6
7
8
package MyClass;

use overload '+' => \&_add;

sub _add {
    my ($self, $other, $swap) = @_;
    ...
}

use overload here is a pragma, where “pragma” means “regular-ass module that does some wizardry when imported”.

\&_add is how you get a reference to the _add sub so you can pass it to the overload module. If you just said &_add or _add, that would call it.

And that’s it; you just pass a map of operators to functions to this built-in module. No worry about name clashes or pollution, which is pretty nice. You don’t even have to give references to functions that live in the package, if you don’t want them to clog your namespace; you could put them in another package, or even inline them anonymously.

One especially interesting thing is that Perl lets you overload every operator. Perl has a lot of operators. It considers some math builtins like sqrt and trig functions to be operators, or at least operator-y enough that you can overload them. You can also overload the “file text” operators, such as -e $path to test whether a file exists. You can overload conversions, including implicit conversion to a regex. And most fascinating to me, you can overload dereferencing — that is, the thing Perl does when you say $hashref->{key} to get at the underlying hash. So a single object could pretend to be references of multiple different types, including a subref to implement callability. Neat.

Somewhat related: you can overload basic operators (indexing, etc.) on basic types (not references!) with the tie function, which is designed completely differently and looks for methods with fixed names. Go figure.

You can intercept calls to nonexistent methods by implementing a function called AUTOLOAD, within which the $AUTOLOAD global will contain the name of the method being called. Originally this feature was, I think, intended for loading binary components or large libraries on-the-fly only when needed, hence the name. Offhand I’m not sure I ever saw it used the way __getattr__ is used in Python.

Is there a way to intercept all method calls? I don’t think so, but it is Perl, so I must be forgetting something.

Actually no one does this any more

Like a decade ago, a council of elder sages sat down and put together a whole whizbang system that covers all of it: Moose.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
package Vector;
use Moose;

has x => (is => 'rw', isa => 'Int');
has y => (is => 'rw', isa => 'Int');

sub magnitude {
    my ($self) = @_;
    return sqrt($self->x ** 2 + $self->y ** 2);
}

Moose has its own way to do pretty much everything, and it’s all built on the same primitives. Moose also adds metaclasses, somehow, despite that the underlying model doesn’t actually support them? I’m not entirely sure how they managed that, but I do remember doing some class introspection with Moose and it was much nicer than the built-in way.

(If you’re wondering, the built-in way begins with looking at the hash called %Vector::. No, that’s not a typo.)

I really cannot stress enough just how much stuff Moose does, but I don’t want to delve into it here since Moose itself is not actually the language model.

The Perl philosophy

I hope you can see what I meant with what I first said about Perl, now. It has multiple inheritance with an MRO, but uses the wrong one by default. It has extensive operator overloading, which looks nothing like how inheritance works, and also some of it uses a totally different mechanism with special method names instead. It only understands methods, not data, leaving you to figure out accessors by hand.

There’s 70% of an object system here with a clear general design it was gunning for, but none of the pieces really look anything like each other. It’s weird, in a distinctly Perl way.

The result is certainly flexible, at least! It’s especially cool that you can use whatever kind of reference you want for storage, though even as I say that, I acknowledge it’s no different from simply subclassing list or something in Python. It feels different in Perl, but maybe only because it looks so different.

I haven’t written much Perl in a long time, so I don’t know what the community is like any more. Moose was already ubiquitous when I left, which you’d think would let me say “the community mostly focuses on the stuff Moose can do” — but even a decade ago, Moose could already do far more than I had ever seen done by hand in Perl. It’s always made a big deal out of roles (read: interfaces), for instance, despite that I’d never seen anyone care about them in Perl before Moose came along. Maybe their presence in Moose has made them more popular? Who knows.

Also, I wrote Perl seriously, but in the intervening years I’ve only encountered people who only ever used Perl for one-offs. Maybe it’ll come as a surprise to a lot of readers that Perl has an object model at all.

End

Well, that was fun! I hope any of that made sense.

Special mention goes to Rust, which doesn’t have an object model you can fiddle with at runtime, but does do things a little differently.

It’s been really interesting thinking about how tiny differences make a huge impact on what people do in practice. Take the choice of storage in Perl versus Python. Perl’s massively common URI class uses a string as the storage, nothing else; I haven’t seen anything like that in Python aside from markupsafe, which is specifically designed as a string type. I would guess this is partly because Perl makes you choose — using a hashref is an obvious default, but you have to make that choice one way or the other. In Python (especially 3), inheriting from object and getting dict-based storage is the obvious thing to do; the ability to use another type isn’t quite so obvious, and doing it “right” involves a tiny bit of extra work.

Or, consider that Lua could have descriptors, but the extra bit of work (especially design work) has been enough of an impediment that I’ve never implemented them. I don’t think the object implementations I’ve looked at have included them, either. Super weird!

In that light, it’s only natural that objects would be so strongly associated with the features Java and C++ attach to them. I think that makes it all the more important to play around! Look at what Moose has done. No, really, you should bear in mind my description of how Perl does stuff and flip through the Moose documentation. It’s amazing what they’ve built.

Event-Driven Computing with Amazon SNS and AWS Compute, Storage, Database, and Networking Services

Post Syndicated from Christie Gifrin original https://aws.amazon.com/blogs/compute/event-driven-computing-with-amazon-sns-compute-storage-database-and-networking-services/

Contributed by Otavio Ferreira, Manager, Software Development, AWS Messaging

Like other developers around the world, you may be tackling increasingly complex business problems. A key success factor, in that case, is the ability to break down a large project scope into smaller, more manageable components. A service-oriented architecture guides you toward designing systems as a collection of loosely coupled, independently scaled, and highly reusable services. Microservices take this even further. To improve performance and scalability, they promote fine-grained interfaces and lightweight protocols.

However, the communication among isolated microservices can be challenging. Services are often deployed onto independent servers and don’t share any compute or storage resources. Also, you should avoid hard dependencies among microservices, to preserve maintainability and reusability.

If you apply the pub/sub design pattern, you can effortlessly decouple and independently scale out your microservices and serverless architectures. A pub/sub messaging service, such as Amazon SNS, promotes event-driven computing that statically decouples event publishers from subscribers, while dynamically allowing for the exchange of messages between them. An event-driven architecture also introduces the responsiveness needed to deal with complex problems, which are often unpredictable and asynchronous.

What is event-driven computing?

Given the context of microservices, event-driven computing is a model in which subscriber services automatically perform work in response to events triggered by publisher services. This paradigm can be applied to automate workflows while decoupling the services that collectively and independently work to fulfil these workflows. Amazon SNS is an event-driven computing hub, in the AWS Cloud, that has native integration with several AWS publisher and subscriber services.

Which AWS services publish events to SNS natively?

Several AWS services have been integrated as SNS publishers and, therefore, can natively trigger event-driven computing for a variety of use cases. In this post, I specifically cover AWS compute, storage, database, and networking services, as depicted below.

Compute services

  • Auto Scaling: Helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You can configure Auto Scaling lifecycle hooks to trigger events, as Auto Scaling resizes your EC2 cluster.As an example, you may want to warm up the local cache store on newly launched EC2 instances, and also download log files from other EC2 instances that are about to be terminated. To make this happen, set an SNS topic as your Auto Scaling group’s notification target, then subscribe two Lambda functions to this SNS topic. The first function is responsible for handling scale-out events (to warm up cache upon provisioning), whereas the second is in charge of handling scale-in events (to download logs upon termination).

  • AWS Elastic Beanstalk: An easy-to-use service for deploying and scaling web applications and web services developed in a number of programming languages. You can configure event notifications for your Elastic Beanstalk environment so that notable events can be automatically published to an SNS topic, then pushed to topic subscribers.As an example, you may use this event-driven architecture to coordinate your continuous integration pipeline (such as Jenkins CI). That way, whenever an environment is created, Elastic Beanstalk publishes this event to an SNS topic, which triggers a subscribing Lambda function, which then kicks off a CI job against your newly created Elastic Beanstalk environment.

  • Elastic Load Balancing: Automatically distributes incoming application traffic across Amazon EC2 instances, containers, or other resources identified by IP addresses.You can configure CloudWatch alarms on Elastic Load Balancing metrics, to automate the handling of events derived from Classic Load Balancers. As an example, you may leverage this event-driven design to automate latency profiling in an Amazon ECS cluster behind a Classic Load Balancer. In this example, whenever your ECS cluster breaches your load balancer latency threshold, an event is posted by CloudWatch to an SNS topic, which then triggers a subscribing Lambda function. This function runs a task on your ECS cluster to trigger a latency profiling tool, hosted on the cluster itself. This can enhance your latency troubleshooting exercise by making it timely.

Storage services

  • Amazon S3: Object storage built to store and retrieve any amount of data.You can enable S3 event notifications, and automatically get them posted to SNS topics, to automate a variety of workflows. For instance, imagine that you have an S3 bucket to store incoming resumes from candidates, and a fleet of EC2 instances to encode these resumes from their original format (such as Word or text) into a portable format (such as PDF).In this example, whenever new files are uploaded to your input bucket, S3 publishes these events to an SNS topic, which in turn pushes these messages into subscribing SQS queues. Then, encoding workers running on EC2 instances poll these messages from the SQS queues; retrieve the original files from the input S3 bucket; encode them into PDF; and finally store them in an output S3 bucket.

  • Amazon EFS: Provides simple and scalable file storage, for use with Amazon EC2 instances, in the AWS Cloud.You can configure CloudWatch alarms on EFS metrics, to automate the management of your EFS systems. For example, consider a highly parallelized genomics analysis application that runs against an EFS system. By default, this file system is instantiated on the “General Purpose” performance mode. Although this performance mode allows for lower latency, it might eventually impose a scaling bottleneck. Therefore, you may leverage an event-driven design to handle it automatically.Basically, as soon as the EFS metric “Percent I/O Limit” breaches 95%, CloudWatch could post this event to an SNS topic, which in turn would push this message into a subscribing Lambda function. This function automatically creates a new file system, this time on the “Max I/O” performance mode, then switches the genomics analysis application to this new file system. As a result, your application starts experiencing higher I/O throughput rates.

  • Amazon Glacier: A secure, durable, and low-cost cloud storage service for data archiving and long-term backup.You can set a notification configuration on an Amazon Glacier vault so that when a job completes, a message is published to an SNS topic. Retrieving an archive from Amazon Glacier is a two-step asynchronous operation, in which you first initiate a job, and then download the output after the job completes. Therefore, SNS helps you eliminate polling your Amazon Glacier vault to check whether your job has been completed, or not. As usual, you may subscribe SQS queues, Lambda functions, and HTTP endpoints to your SNS topic, to be notified when your Amazon Glacier job is done.

  • AWS Snowball: A petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data.You can leverage Snowball notifications to automate workflows related to importing data into and exporting data from AWS. More specifically, whenever your Snowball job status changes, Snowball can publish this event to an SNS topic, which in turn can broadcast the event to all its subscribers.As an example, imagine a Geographic Information System (GIS) that distributes high-resolution satellite images to users via Web browser. In this example, the GIS vendor could capture up to 80 TB of satellite images; create a Snowball job to import these files from an on-premises system to an S3 bucket; and provide an SNS topic ARN to be notified upon job status changes in Snowball. After Snowball changes the job status from “Importing” to “Completed”, Snowball publishes this event to the specified SNS topic, which delivers this message to a subscribing Lambda function, which finally creates a CloudFront web distribution for the target S3 bucket, to serve the images to end users.

Database services

  • Amazon RDS: Makes it easy to set up, operate, and scale a relational database in the cloud.RDS leverages SNS to broadcast notifications when RDS events occur. As usual, these notifications can be delivered via any protocol supported by SNS, including SQS queues, Lambda functions, and HTTP endpoints.As an example, imagine that you own a social network website that has experienced organic growth, and needs to scale its compute and database resources on demand. In this case, you could provide an SNS topic to listen to RDS DB instance events. When the “Low Storage” event is published to the topic, SNS pushes this event to a subscribing Lambda function, which in turn leverages the RDS API to increase the storage capacity allocated to your DB instance. The provisioning itself takes place within the specified DB maintenance window.

  • Amazon ElastiCache: A web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud.ElastiCache can publish messages using Amazon SNS when significant events happen on your cache cluster. This feature can be used to refresh the list of servers on client machines connected to individual cache node endpoints of a cache cluster. For instance, an ecommerce website fetches product details from a cache cluster, with the goal of offloading a relational database and speeding up page load times. Ideally, you want to make sure that each web server always has an updated list of cache servers to which to connect.To automate this node discovery process, you can get your ElastiCache cluster to publish events to an SNS topic. Thus, when ElastiCache event “AddCacheNodeComplete” is published, your topic then pushes this event to all subscribing HTTP endpoints that serve your ecommerce website, so that these HTTP servers can update their list of cache nodes.

  • Amazon Redshift: A fully managed data warehouse that makes it simple to analyze data using standard SQL and BI (Business Intelligence) tools.Amazon Redshift uses SNS to broadcast relevant events so that data warehouse workflows can be automated. As an example, imagine a news website that sends clickstream data to a Kinesis Firehose stream, which then loads the data into Amazon Redshift, so that popular news and reading preferences might be surfaced on a BI tool. At some point though, this Amazon Redshift cluster might need to be resized, and the cluster enters a ready-only mode. Hence, this Amazon Redshift event is published to an SNS topic, which delivers this event to a subscribing Lambda function, which finally deletes the corresponding Kinesis Firehose delivery stream, so that clickstream data uploads can be put on hold.At a later point, after Amazon Redshift publishes the event that the maintenance window has been closed, SNS notifies a subscribing Lambda function accordingly, so that this function can re-create the Kinesis Firehose delivery stream, and resume clickstream data uploads to Amazon Redshift.

  • AWS DMS: Helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.DMS also uses SNS to provide notifications when DMS events occur, which can automate database migration workflows. As an example, you might create data replication tasks to migrate an on-premises MS SQL database, composed of multiple tables, to MySQL. Thus, if replication tasks fail due to incompatible data encoding in the source tables, these events can be published to an SNS topic, which can push these messages into a subscribing SQS queue. Then, encoders running on EC2 can poll these messages from the SQS queue, encode the source tables into a compatible character set, and restart the corresponding replication tasks in DMS. This is an event-driven approach to a self-healing database migration process.

Networking services

  • Amazon Route 53: A highly available and scalable cloud-based DNS (Domain Name System). Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources.You can set CloudWatch alarms and get automated Amazon SNS notifications when the status of your Route 53 health check changes. As an example, imagine an online payment gateway that reports the health of its platform to merchants worldwide, via a status page. This page is hosted on EC2 and fetches platform health data from DynamoDB. In this case, you could configure a CloudWatch alarm for your Route 53 health check, so that when the alarm threshold is breached, and the payment gateway is no longer considered healthy, then CloudWatch publishes this event to an SNS topic, which pushes this message to a subscribing Lambda function, which finally updates the DynamoDB table that populates the status page. This event-driven approach avoids any kind of manual update to the status page visited by merchants.

  • AWS Direct Connect (AWS DX): Makes it easy to establish a dedicated network connection from your premises to AWS, which can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.You can monitor physical DX connections using CloudWatch alarms, and send SNS messages when alarms change their status. As an example, when a DX connection state shifts to 0 (zero), indicating that the connection is down, this event can be published to an SNS topic, which can fan out this message to impacted servers through HTTP endpoints, so that they might reroute their traffic through a different connection instead. This is an event-driven approach to connectivity resilience.

More event-driven computing on AWS

In addition to SNS, event-driven computing is also addressed by Amazon CloudWatch Events, which delivers a near real-time stream of system events that describe changes in AWS resources. With CloudWatch Events, you can route each event type to one or more targets, including:

Many AWS services publish events to CloudWatch. As an example, you can get CloudWatch Events to capture events on your ETL (Extract, Transform, Load) jobs running on AWS Glue and push failed ones to an SQS queue, so that you can retry them later.

Conclusion

Amazon SNS is a pub/sub messaging service that can be used as an event-driven computing hub to AWS customers worldwide. By capturing events natively triggered by AWS services, such as EC2, S3 and RDS, you can automate and optimize all kinds of workflows, namely scaling, testing, encoding, profiling, broadcasting, discovery, failover, and much more. Business use cases presented in this post ranged from recruiting websites, to scientific research, geographic systems, social networks, retail websites, and news portals.

Start now by visiting Amazon SNS in the AWS Management Console, or by trying the AWS 10-Minute Tutorial, Send Fan-out Event Notifications with Amazon SNS and Amazon SQS.

 

Hot Startups on AWS – October 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/hot-startups-on-aws-october-2017/

In 2015, the Centers for Medicare and Medicaid Services (CMS) reported that healthcare spending made up 17.8% of the U.S. GDP – that’s almost $3.2 trillion or $9,990 per person. By 2025, the CMS estimates this number will increase to nearly 20%. As cloud technology evolves in the healthcare and life science industries, we are seeing how companies of all sizes are using AWS to provide powerful and innovative solutions to customers across the globe. This month we are excited to feature the following startups:

  • ClearCare – helping home care agencies operate efficiently and grow their business.
  • DNAnexus – providing a cloud-based global network for sharing and managing genomic data.

ClearCare (San Francisco, CA)

ClearCare envisions a future where home care is the only choice for aging in place. Home care agencies play a critical role in the economy and their communities by significantly lowering the overall cost of care, reducing the number of hospital admissions, and bending the cost curve of aging. Patients receiving home care typically have multiple chronic conditions and functional limitations, driving over $190 billion in healthcare spending in the U.S. each year. To offset these costs, health insurance payers are developing in-home care management programs for patients. ClearCare’s goal is to help home care agencies leverage technology to improve costs, outcomes, and quality of life for the aging population. The company’s powerful software platform is specifically designed for use by non-medical, in-home care agencies to manage their businesses.

Founder and CEO Geoff Nudd created ClearCare because of his own grandmother’s need for care. Keeping family members and caregivers up to date on a loved one’s well being can be difficult, so Geoff created what is now ClearCare’s Family Room, which enables caregivers and agency staff to check schedules and receive real-time updates about what’s happening in the home. Since then, agencies have provided feedback on others areas of their businesses that could be streamlined. ClearCare has now built over 20 modules to help home care agencies optimize operations with services including a telephony service, billing and payroll, and more. ClearCare now serves over 4,000 home care agencies, representing 500,000 caregivers and 400,000 seniors.

Using AWS, ClearCare is able to spin up reliable infrastructure for proofs of concept and iterate on those systems to quickly get value to market. The company runs many AWS services including Amazon Elasticsearch Service, Amazon RDS, and Amazon CloudFront. Amazon EMR and Amazon Athena have enabled ClearCare to build a Hadoop-based ETL and data warehousing system that processes terabytes of data each day. By utilizing these managed services, ClearCare has been able to go from concept to customer delivery in less than three months.

To learn more about ClearCare, check out their website.

DNAnexus (Mountain View, CA)

DNAnexus is accelerating the application of genomic data in precision medicine by providing a cloud-based platform for sharing and managing genomic and biomedical data and analysis tools. The company was founded in 2009 by Stanford graduate student Andreas Sundquist and two Stanford professors Arend Sidow and Serafim Batzoglou, to address the need for scaling secondary analysis of next-generation sequencing (NGS) data in the cloud. The founders quickly learned that users needed a flexible solution to build complex analysis workflows and tools that enable them to share and manage large volumes of data. DNAnexus is optimized to address the challenges of security, scalability, and collaboration for organizations that are pursuing genomic-based approaches to health, both in clinics and research labs. DNAnexus has a global customer base – spanning North America, Europe, Asia-Pacific, South America, and Africa – that runs a million jobs each month and is doubling their storage year-over-year. The company currently stores more than 10 petabytes of biomedical and genomic data. That is equivalent to approximately 100,000 genomes, or in simpler terms, over 50 billion Facebook photos!

DNAnexus is working with its customers to help expand their translational informatics research, which includes expanding into clinical trial genomic services. This will help companies developing different medicines to better stratify clinical trial populations and develop companion tests that enable the right patient to get the right medicine. In collaboration with Janssen Human Microbiome Institute, DNAnexus is also launching Mosaic – a community platform for microbiome research.

AWS provides DNAnexus and its customers the flexibility to grow and scale research programs. Building the technology infrastructure required to manage these projects in-house is expensive and time-consuming. DNAnexus removes that barrier for labs of any size by using AWS scalable cloud resources. The company deploys its customers’ genomic pipelines on Amazon EC2, using Amazon S3 for high-performance, high-durability storage, and Amazon Glacier for low-cost data archiving. DNAnexus is also an AWS Life Sciences Competency Partner.

Learn more about DNAnexus here.

-Tina

New – Amazon EC2 Instances with Up to 8 NVIDIA Tesla V100 GPUs (P3)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-ec2-instances-with-up-to-8-nvidia-tesla-v100-gpus-p3/

Driven by customer demand and made possible by on-going advances in the state-of-the-art, we’ve come a long way since the original m1.small instance that we launched in 2006, with instances that are emphasize compute power, burstable performance, memory size, local storage, and accelerated computing.

The New P3
Today we are making the next generation of GPU-powered EC2 instances available in four AWS regions. Powered by up to eight NVIDIA Tesla V100 GPUs, the P3 instances are designed to handle compute-intensive machine learning, deep learning, computational fluid dynamics, computational finance, seismic analysis, molecular modeling, and genomics workloads.

P3 instances use customized Intel Xeon E5-2686v4 processors running at up to 2.7 GHz. They are available in three sizes (all VPC-only and EBS-only):

ModelNVIDIA Tesla V100 GPUsGPU MemoryNVIDIA NVLinkvCPUsMain MemoryNetwork BandwidthEBS Bandwidth
p3.2xlarge116 GiBn/a861 GiBUp to 10 Gbps1.5 Gbps
p3.8xlarge464 GiB200 GBps32244 GiB10 Gbps7 Gbps
p3.16xlarge8128 GiB300 GBps64488 GiB25 Gbps14 Gbps

Each of the NVIDIA GPUs is packed with 5,120 CUDA cores and another 640 Tensor cores and can deliver up to 125 TFLOPS of mixed-precision floating point, 15.7 TFLOPS of single-precision floating point, and 7.8 TFLOPS of double-precision floating point. On the two larger sizes, the GPUs are connected together via NVIDIA NVLink 2.0 running at a total data rate of up to 300 GBps. This allows the GPUs to exchange intermediate results and other data at high speed, without having to move it through the CPU or the PCI-Express fabric.

What’s a Tensor Core?
I had not heard the term Tensor core before starting to write this post. According to this very helpful post on the NVIDIA Blog, Tensor cores are designed to speed up the training and inference of large, deep neural networks. Each core is able to quickly and efficiently multiply a pair of 4×4 half-precision (also known as FP16) matrices together, add the resulting 4×4 matrix to another half or single-precision (FP32) matrix, and store the resulting 4×4 matrix in either half or single-precision form. Here’s a diagram from NVIDIA’s blog post:

This operation is in the innermost loop of the training process for a deep neural network, and is an excellent example of how today’s NVIDIA GPU hardware is purpose-built to address a very specific market need. By the way, the mixed-precision qualifier on the Tensor core performance means that it is flexible enough to work with with a combination of 16-bit and 32-bit floating point values.

Performance in Perspective
I always like to put raw performance numbers into a real-world perspective so that they are easier to relate to and (hopefully) more meaningful. This turned out to be surprisingly difficult, given that the eight NVIDIA Tesla V100 GPUs on a single p3.16xlarge can do 125 trillion single-precision floating point multiplications per second.

Let’s go back to the dawn of the microprocessor era, and consider the Intel 8080A chip that powered the MITS Altair that I bought in the summer of 1977. With a 2 MHz clock, it was able to do about 832 multiplications per second (I used this data and corrected it for the faster clock speed). The p3.16xlarge is roughly 150 billion times faster. However, just 1.2 billion seconds have gone by since that summer. In other words, I can do 100x more calculations today in one second than my Altair could have done in the last 40 years!

What about the innovative 8087 math coprocessor that was an optional accessory for the IBM PC that was announced in the summer of 1981? With a 5 MHz clock and purpose-built hardware, it was able to do about 52,632 multiplications per second. 1.14 billion seconds have elapsed since then, p3.16xlarge is 2.37 billion times faster, so the poor little PC would be barely halfway through a calculation that would run for 1 second today.

Ok, how about a Cray-1? First delivered in 1976, this supercomputer was able to perform vector operations at 160 MFLOPS, making the p3.x16xlarge 781,000 times faster. It could have iterated on some interesting problem 1500 times over the years since it was introduced.

Comparisons between the P3 and today’s scale-out supercomputers are harder to make, given that you can think of the P3 as a step-and-repeat component of a supercomputer that you can launch on as as-needed basis.

Run One Today
In order to take full advantage of the NVIDIA Tesla V100 GPUs and the Tensor cores, you will need to use CUDA 9 and cuDNN7. These drivers and libraries have already been added to the newest versions of the Windows AMIs and will be included in an updated Amazon Linux AMI that is scheduled for release on November 7th. New packages are already available in our repos if you want to to install them on your existing Amazon Linux AMI.

The newest AWS Deep Learning AMIs come preinstalled with the latest releases of Apache MxNet, Caffe2, and Tensorflow (each with support for the NVIDIA Tesla V100 GPUs), and will be updated to support P3 instances with other machine learning frameworks such as Microsoft Cognitive Toolkit and PyTorch as soon as these frameworks release support for the NVIDIA Tesla V100 GPUs. You can also use the NVIDIA Volta Deep Learning AMI for NGC.

P3 instances are available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) Regions in On-Demand, Spot, Reserved Instance, and Dedicated Host form.

Jeff;