Tag Archives: hash

Amazon Relational Database Service – Looking Back at 2017

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-relational-database-service-looking-back-at-2017/

The Amazon RDS team launched nearly 80 features in 2017. Some of them were covered in this blog, others on the AWS Database Blog, and the rest in What’s New or Forum posts. To wrap up my week, I thought it would be worthwhile to give you an organized recap. So here we go!

Certification & Security

Features

Engine Versions & Features

Regional Support

Instance Support

Price Reductions

And That’s a Wrap
I’m pretty sure that’s everything. As you can see, 2017 was quite the year! I can’t wait to see what the team delivers in 2018.

Jeff;

 

Integration With Zapier

Post Syndicated from Bozho original https://techblog.bozho.net/integration-with-zapier/

Integration is boring. And also inevitable. But I won’t be writing about enterprise integration patterns. Instead, I’ll explain how to create an app for integration with Zapier.

What is Zapier? It is a service that allows you to connect two (or more) otherwise unconnected services via their APIs (or protocols). You can do stuff like “Create a Trello task from an Evernote note”, “publish new RSS items to Facebook”, “append new emails to a spreadsheet”, “post approaching calendar meetings to Slack”, “save big email attachments to Dropbox”, “tweet all Instagram posts above a certain likes threshold”, and so on. In fact, it covers mostly the same use cases as another famous service that I really like – IFTTT (if this then that), with my favourite use case being “Get a notification when the international space station passes over your house”. And all of those interactions can be configured via a UI.

Now that’s good for end users but what does it have to do with software development and integration? Zapier (unlike IFTTT, unfortunately), allows custom 3rd party services to be included. So if you have a service of your own, you can create an “app” and allow users to integrate your service with all the other 3rd party services. IFTTT offers a way to invoke web endpoints (including RESTful services), but it doesn’t allow setting headers, so that makes it quite limited for actual APIs.

In this post I’ll briefly explain how to write a custom Zapier app and then discuss where services like Zapier stand from an architecture perspective.

The thing I needed it for was integrating LogSentinel with any of the third parties available through Zapier, i.e. storing audit logs for events that happen in all those 3rd party systems. So how do I do that? There’s a tutorial that makes it look simple. And it is, with a few catches.

First, there are two tutorials – one in GitHub and one on Zapier’s website. And they differ slightly, which becomes tricky in some cases.

I initially followed the GitHub tutorial and had my build fail. It claimed the zapier platform dependency was missing. After I compared it with the example apps, I found out there was a caret in front of the zapier platform dependency version. Removing it just yielded another error – that my node version should be exactly 6.10.2. Why?

The Zapier CLI requires you have exactly version 6.10.2 installed. You’ll see errors and will be unable to proceed otherwise.

It appears that they are using AWS Lambda which is stuck on Node 6.10.2 (actually – it’s 6.10.3 when you check). The current major release is 8, so minus points for choosing … javascript for a command-line tool and for building sandboxed apps. Maybe other decisions had their downsides as well, I won’t be speculating. Maybe it’s just my dislike for dynamic languages.

So, after you make sure you have the correct old version of node, you call zapier init and make sure there are no carets, then npm install and zapier test. In command form (the directory argument to zapier init is an assumption – point it wherever you want the app scaffolded):
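zapier init .
npm install
zapier test

So far so good – you have a dummy app. Now how do you make a RESTful call to your service?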

Zapier splits the programmable entities in two – “triggers” and “creates”. A trigger is the event that triggers the whole app, and a “create” is what happens as a result. In my case, my app doesn’t publish any triggers, it only accepts input, so I won’t be mentioning triggers (though they seem easy). You configure all of the elements in index.js (e.g. this one):

const log = require('./creates/log');
// ...
creates: {
    [log.key]: log,
}

The log.js file itself is the interesting bit – there you specify all the parameters that should be passed to your API call, as well as making the API call itself:

const log = (z, bundle) => {
  const responsePromise = z.request({
    method: 'POST',
    url: `https://api.logsentinel.com/api/log/${bundle.inputData.actorId}/${bundle.inputData.action}`,
    body: bundle.inputData.details,
    headers: {
      'Accept': 'application/json'
    }
  });
  return responsePromise
    .then(response => JSON.parse(response.content));
};

module.exports = {
  key: 'log-entry',
  noun: 'Log entry',

  display: {
    label: 'Log',
    description: 'Log an audit trail entry'
  },

  operation: {
    inputFields: [
      {key: 'actorId', label:'ActorID', required: true},
      {key: 'action', label:'Action', required: true},
      {key: 'details', label:'Details', required: false}
    ],
    perform: log
  }
};

You can pass the input parameters to your API call, and it’s as simple as that. The user can then specify which parameters from the source (“trigger”) should be mapped to each of your parameters. In an example zap, I used an email trigger and passed the sender as actorId, the subject as “action” and the body of the email as details.

There’s one more thing – authentication. Authentication can be done in many ways. Some services offer OAuth, others – HTTP Basic or other custom forms of authentication. There is a section in the documentation about all the options. In my case it was (almost) an HTTP Basic auth. My initial thought was to just supply the credentials as parameters (which you just hardcode rather than map to trigger parameters). That may work, but it’s not the canonical way. You should configure “authentication”, as it triggers a friendly UI for the user.

You include authentication.js (which has the fields your authentication requires) and then pre-process requests by adding a header (in index.js):

const authentication = require('./authentication');

const includeAuthHeaders = (request, z, bundle) => {
  if (bundle.authData.organizationId) {
    request.headers = request.headers || {};
    request.headers['Application-Id'] = bundle.authData.applicationId;
    const basicHash = Buffer.from(`${bundle.authData.organizationId}:${bundle.authData.apiSecret}`).toString('base64');
    request.headers['Authorization'] = `Basic ${basicHash}`;
  }
  return request;
};

const App = {
  // This is just shorthand to reference the installed dependencies you have. Zapier will
  // need to know these before we can upload
  version: require('./package.json').version,
  platformVersion: require('zapier-platform-core').version,
  authentication: authentication,
  
  // beforeRequest & afterResponse are optional hooks into the provided HTTP client
  beforeRequest: [
    includeAuthHeaders
  ]
  // ...
}
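
For completeness, here is roughly what authentication.js can look like. This is a minimal sketch using Zapier’s “custom” authentication type – the field keys match what includeAuthHeaders reads above, but the labels and the test endpoint are illustrative assumptions rather than LogSentinel’s actual definitions:

module.exports = {
  type: 'custom',
  fields: [
    {key: 'organizationId', label: 'Organization ID', required: true, type: 'string'},
    {key: 'applicationId', label: 'Application ID', required: true, type: 'string'},
    {key: 'apiSecret', label: 'API Secret', required: true, type: 'password'}
  ],
  // Zapier runs this request with the entered credentials to verify them;
  // any endpoint that requires authentication would do (the URL is an assumption)
  test: (z, bundle) => z.request({url: 'https://api.logsentinel.com/api/user/me'})
};

With that in place, Zapier renders a credentials form for the user and validates the input with the test request before activating the integration.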

And then you zapier push your app and you can test it. It doesn’t automatically go live – you have to invite people to try it and use it first – but in many cases that’s sufficient (e.g. when using Zapier to integrate with a particular client).

Can Zapier be used for any integration problem? Unlikely – it’s pretty limited and simple, but that’s also a strength. In half a day, you can make your service integrate with thousands of others for the most typical use cases. And note that although it’s meant for integrating public services rather than for enterprise integration (where you make multiple internal systems talk to each other), as an increasing number of systems rely on 3rd party services, it could find a home in an enterprise system, replacing some functions of an ESB.

Effectively, such services (Zapier, IFTTT) are “Simple ESB-as-a-service”. You go to a UI, fill a bunch of fields, and you get systems talking to each other without touching the systems themselves. I’m not a big fan of ESBs, mostly because they become harder to support with time. But minimalist, external ones might be applicable in certain situations. And while such services are primarily aimed at end users, they could be a useful bit in an enterprise architecture that relies on 3rd party services.

Whether it could process the required load, and whether an organization is willing to let its data flow through a 3rd party provider (which may store the intermediate parameters), are questions that should be answered on a case-by-case basis. I wouldn’t recommend it as a general solution, but it’s certainly an option to consider.

The post Integration With Zapier appeared first on Bozho's tech blog.

Big Birthday Weekend 2018: find a Jam near you!

Post Syndicated from Ben Nuttall original https://www.raspberrypi.org/blog/big-birthday-weekend-2018-find-a-jam-near-you/

We’re just over three weeks away from the Raspberry Jam Big Birthday Weekend 2018, our community celebration of Raspberry Pi’s sixth birthday. Instead of an event in Cambridge, as we’ve held in the past, we’re coordinating Raspberry Jam events to take place around the world on 3–4 March, so that as many people as possible can join in. Well over 100 Jams have been confirmed so far.

Raspberry Pi Big Birthday Weekend Jam

Find a Jam near you

There are Jams planned in Argentina, Australia, Bolivia, Brazil, Bulgaria, Cameroon, Canada, Colombia, Dominican Republic, France, Germany, Greece, Hungary, India, Iran, Ireland, Italy, Japan, Kenya, Malaysia, Malta, Mexico, Netherlands, Norway, Papua New Guinea, Peru, Philippines, Poland, South Africa, Spain, Taiwan, Turkey, United Kingdom, United States, and Zimbabwe.

Take a look at the events map and the full list (including those who haven’t added their event to the map quite yet).

Raspberry Jam Big Birthday Weekend 2018 event map

We will have Raspberry Jams in 35 countries across six continents

Birthday kits

We had some special swag made especially for the birthday, including these T-shirts, which we’ve sent to Jam organisers:

Raspberry Jam Big Birthday Weekend 2018 T-shirt

There is also a poster with a list of participating Jams, which you can download:

Raspberry Jam Big Birthday Weekend 2018 list

Raspberry Jam photo booth

I created a Raspberry Jam photo booth that overlays photos with the Big Birthday Weekend logo and then tweets the picture from your Jam’s account — you’ll be seeing plenty of those if you follow the #PiParty hashtag on 3–4 March.

Check out the project on GitHub, and feel free to set up your own booth, or modify it to your own requirements. We’ve included text annotations in several languages, and more contributions are very welcome.

There’s still time…

If you can’t find a Jam near you, there’s still time to organise one for the Big Birthday Weekend. All you need to do is find a venue — a room in a school or library will do — and think about what you’d like to do at the event. Some Jams have Raspberry Pis set up for workshops and practical activities, some arrange tech talks, some put on show-and-tell — it’s up to you. To help you along, there’s the Raspberry Jam Guidebook full of advice and tips from Jam organisers.

Raspberry Pi on Twitter

They packed. And they packed. And they packed some more. Who’s expecting one of these #rjam kits for the Raspberry Jam Big Birthday Weekend?

Download the Raspberry Jam branding pack, and the special birthday branding pack, where you’ll find logos, graphical assets, flyer templates, worksheets, and more. When you’re ready to announce your event, create a webpage for it — you can use a site like Eventbrite or Meetup — and submit your Jam to us so it will appear on the Jam map!

We are six

We’re really looking forward to celebrating our birthday with thousands of people around the world. Over 48 hours, people of all ages will come together at more than 100 events to learn, share ideas, meet people, and make things during our Big Birthday Weekend.

Raspberry Jam Manchester

Since we released the first Raspberry Pi in 2012, we’ve sold 17 million of them. We’re also reaching almost 200,000 children in 130 countries around the world through Code Club and CoderDojo, we’ve trained over 1,500 Raspberry Pi Certified Educators, and we’ve sent code written by more than 6,800 children into space. Our magazines are read by a quarter of a million people, and millions more use our free online learning resources. There’s plenty to celebrate and even more still to do: we really hope you’ll join us from a Jam near you on 3–4 March.

The post Big Birthday Weekend 2018: find a Jam near you! appeared first on Raspberry Pi.

Top 10 Most Popular Torrent Sites of 2018

Post Syndicated from Ernesto original https://torrentfreak.com/top-10-most-popular-torrent-sites-of-2018-180107/

Torrent sites have come and gone over the past year. Now, at the start of 2018, we take a look to see what the most-used sites are in the current landscape.

The Pirate Bay remains the undisputed number one. The site has weathered a few storms over the years, but it looks like it will be able to celebrate its 15th anniversary, which is coming up in a few months.

The list also includes various newcomers including Idope and Zooqle. While many people are happy to see new torrent sites emerge, this often means that others have called it quits.

Last year’s runner-up ExtraTorrent, for example, has shut down and left a gaping hole behind. And it wasn’t the only site that went away. TorrentProject also disappeared without a trace and the same was true for isohunt.to.

The unofficial Torrentz reincarnation Torrentz2.eu, the highest newcomer last year, is somewhat of an unusual entry. A few weeks ago all links to externally hosted torrents were removed, as was the list of indexed pages.

We decided to include the site nonetheless, given its history and because it’s still possible to find hashes through the site. As Torrentz2’s future is uncertain, we added an extra site (10.1) as compensation.

Finally, RuTracker also deserves a mention. The torrent site generates enough traffic to warrant a listing, but we traditionally limit the list to sites that are targeted primarily at an English or international audience.

Below is the full list of the ten most-visited torrent sites at the start of the new year. The list is based on various traffic reports and we display the Alexa rank for each. In addition, we include last year’s ranking.

Most Popular Torrent Sites

1. The Pirate Bay

The Pirate Bay is the “king of torrents” once again and also the oldest site in this list. The past year has been relatively quiet for the notorious torrent site, which is currently operating from its original .org domain name.

Alexa Rank: 104/ Last year #1

2. RARBG

RARBG, which started out as a Bulgarian tracker, has captured the hearts and minds of many video pirates. The site was founded in 2008 and specializes in high quality video releases.

Alexa Rank: 298 / Last year #3

3. 1337x

1337x continues where it left off last year. The site gained a lot of traffic and, unlike some other sites in the list, has a dedicated group of uploaders that provide fresh content.

Alexa Rank: 321 / Last year #6

4. Torrentz2

Torrentz2 launched as a stand-in for the original Torrentz.eu site, which voluntarily closed its doors in 2016. At the time of writing, the site only lists torrent hashes and no longer any links to external torrent sites. While browser add-ons and plugins still make the site functional, its future is uncertain.

Alexa Rank: 349 / Last year #5

5. YTS.ag

YTS.ag is the unofficial successor of the defunct YTS or YIFY group. Not all other torrent sites were happy that the site hijacked the popular brand, and several are actively banning its releases.

Alexa Rank: 563 / Last year #4

6. EZTV.ag

The original TV-torrent distribution group EZTV shut down after a hostile takeover in 2015, with new owners claiming ownership of the brand. The new group currently operates from EZTV.ag and releases its own torrents. These releases are banned on some other torrent sites due to this controversial history.

Alexa Rank: 981 / Last year #7

7. LimeTorrents

Limetorrents has been an established torrent site for more than half a decade. The site’s operator also runs the torrent cache iTorrents, which is used by several other torrent search engines.

Alexa Rank: 2,433 / Last year #10

8. NYAA.si

NYAA.si is a popular resurrection of the anime torrent site NYAA, which shut down last year. Previously we left anime-oriented sites out of the list, but since we also include dedicated TV and movie sites, we decided that a mention is more than warranted.

Alexa Rank: 1,575 / Last year #NA

9. Torrents.me

Torrents.me is one of the torrent sites that enjoyed a meteoric rise in traffic this year. It’s a meta-search engine that links to torrent files and magnet links from other torrent sites.

Alexa Rank: 2,045 / Last year #NA

10. Zooqle

Zooqle, which boasts nearly three million verified torrents, has stayed under the radar for years but has still kept growing. The site made it into the top 10 for the first time this year.

Alexa Rank: 2,347 / Last year #NA

10.1 iDope

The special 10.1 mention goes to iDope. Launched in 2016, the site is a relative newcomer to the torrent scene. The torrent indexer has steadily increased its audience over the past year. With similar traffic numbers to Zooqle, a listing is therefore warranted.

Alexa Rank: 2,358 / Last year #NA

Disclaimer: Yes, we know that Alexa isn’t perfect, but it helps to compare sites that operate in a similar niche. We also used other traffic metrics to compile the top ten. Please keep in mind that many sites have mirrors or alternative domains, which are not taken into account here.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Torrent Pioneers: isoHunt’s Gary Fung, Ten Years Later

Post Syndicated from Ernesto original https://torrentfreak.com/torrent-pioneers-isohunts-gary-fung-ten-years-later-180106/

Ten years ago, November 2007 to be precise, we published an article featuring the four leading torrent site admins at the time.

Niek van der Maas of Mininova, Justin Bunnell of TorrentSpy, Pirate Bay’s Peter Sunde and isoHunt’s Gary Fung were all kind enough to share their vision of BitTorrent’s future.

This future is the present today, and although the predictions were not all spot-on, there are a few interesting observations to make.

For one, these four men were all known by name, despite the uncertain legal situation they were in. How different that is today, when the operators of most of the world’s largest torrent sites are unknown to the broader public.

Another thing that stands out is that none of these pioneers are still active in the torrent space today. Niek and Justin have their own advertising businesses, Peter is a serial entrepreneur involved in various startups, while Gary works on his own projects.

While they have all moved on, they also remain a part of Internet history, which is why we decided to reach out to them ten years on.

Gary Fung was the first to reply. Those who’ve been following torrent news for a while know that isoHunt was shut down in 2013. The shutdown was the result of a lawsuit and came with a $110 million settlement with the MPAA, on paper.

Today the Canadian entrepreneur has other things on his hands, which includes “leveling up” his now one-year-old daughter. While that can be a day job by itself, he is also finalizing a mobile search app which will be released in the near future.

“The key is speed, and I can measure its speedup of the whole mobile search experience to be 10-100x that of conventional mobile web browsers,” Gary tells us, noting that after years of development, it’s almost ready.

The new search app is not one dedicated to torrents, as isoHunt once was. However, looking back, Gary is proud of what he accomplished with isoHunt, despite the bitter end.

“It was a humbling experience, in more ways than one. I’m proud that I participated and championed the rise of P2P content distribution through isoHunt as a search gateway,” Gary tells us.

“But I was also humbled by the responsibility and power at play, as seen in the lawsuits from the media industry giants, as well as the even larger picture of what P2P technologies were bringing, and still bring today.”

Decentralization has always been a key feature of BitTorrent and Gary sees this coming back in new trends. This includes the massive attention for blockchain related projects such as Bitcoin.

“2017 was the year Bitcoin became mainstream in a big way, and it’s feeling like the Internet before 2000. Decentralization is by nature disruptive, and I can’t wait to see what decentralizing money, governance, organizations and all kinds of applications will bring in the next few years.

“dApps [decentralized apps] made possible by platforms like Ethereum are like generalized BitTorrent for all kinds of applications, with ones we haven’t even thought of yet,” Gary adds.

Not everything is positive in hindsight, of course. Gary tells us that if he had to do it all over again he would take legal issues and lawyers more seriously. Not doing so led to more trouble than he imagined.

As a former torrent site admin, he has thought about the piracy issue quite a bit over the years. And unlike some sites today, he was happy to look for possible solutions to stop piracy.

One solution Gary suggested to Hollywood in the past was a hash recognition system for infringing torrents. A system to automatically filter known infringing files and remove these from cooperating torrent sites could still work today, he thinks.

“ContentID for all files shared on BitTorrent, similar to YouTube. I’ve proposed this to Hollywood studios before, as a better solution to suing their customers and potential P2P technology partners, but it obviously fell on deaf ears.”

In any case, torrent sites and similar services will continue to play an important role in how the media industry evolves. These platforms are showing Hollywood what the public wants, Gary believes.

“It has and will continue to play a role in showing the industry what consumers truly want: frictionless, convenient distribution, without borders of country or bundles. Bundles as in cable channels, but also in any way unwanted content is forced onto consumers without choice.”

While torrents were dominant in the past, the future will be mostly streaming, isoHunt’s founder says. He said this ten years ago as well, and he believes that in another decade streaming will have completely replaced cable TV.

Whether piracy will still be relevant then depends on how content is offered. More fragmentation will lead to more piracy, while easier access will make it less relevant.

“The question then will be, will streaming platforms be fragmented and exclusive content bundled into a hundred pieces besides Netflix, or will consumer choice and convenience win out in a cross-platform way?

“A piracy increase or reduction will depend on how that plays out because nobody wants to worry about ten monthly subscriptions to ten different streaming services, much less a hundred,” Gary concludes.

Perhaps we should revisit this again next decade…


The second post in this series, with Peter Sunde, will be published this weekend. The other two pioneers did not respond or declined to take part.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Instrumenting Web Apps Using AWS X-Ray

Post Syndicated from Bharath Kumar original https://aws.amazon.com/blogs/devops/instrumenting-web-apps-using-aws-x-ray/

This post was written by James Bowman, Software Development Engineer, AWS X-Ray

AWS X-Ray helps developers analyze and debug distributed applications and underlying services in production. You can identify and analyze root-causes of performance issues and errors, understand customer impact, and extract statistical aggregations (such as histograms) for optimization.

In this blog post, I will provide a step-by-step walkthrough for enabling X-Ray tracing in the Go programming language. You can use these steps to add X-Ray tracing to any distributed application.

Revel: A web framework for the Go language

This section will assist you with designing a guestbook application. Skip to the “Integrating with AWS X-Ray” section below if you already have a Go language application.

Revel is a web framework for the Go language. It facilitates the rapid development of web applications by providing a predefined framework for controllers, views, routes, filters, and more.

To get started with Revel, run revel new github.com/jamesdbowman/guestbook. A project base is then copied to $GOPATH/src/github.com/jamesdbowman/guestbook.

$ tree -L 2
.
├── README.md
├── app
│   ├── controllers
│   ├── init.go
│   ├── routes
│   ├── tmp
│   └── views
├── conf
│   ├── app.conf
│   └── routes
├── messages
│   └── sample.en
├── public
│   ├── css
│   ├── fonts
│   ├── img
│   └── js
└── tests
    └── apptest.go

Writing a guestbook application

A basic guestbook application can consist of just two routes: one to sign the guestbook and another to list all entries.
Let’s set up these routes by adding a Book controller, which can be routed to by modifying ./conf/routes.

./app/controllers/book.go:
package controllers

import (
    "math/rand"
    "time"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/endpoints"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/dynamodb"
    "github.com/aws/aws-sdk-go/service/dynamodb/dynamodbattribute"
    "github.com/revel/revel"
)

const TABLE_NAME = "guestbook"
const SUCCESS = "Success.\n"
const DAY = 86400

var letters = []rune("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

func init() {
    rand.Seed(time.Now().UnixNano())
}

// randString returns a random string of len n, used for DynamoDB Hash key.
func randString(n int) string {
    b := make([]rune, n)
    for i := range b {
        b[i] = letters[rand.Intn(len(letters))]
    }
    return string(b)
}

// Book controls interactions with the guestbook.
type Book struct {
    *revel.Controller
    ddbClient *dynamodb.DynamoDB
}

// Signature represents a user's signature.
type Signature struct {
    Message string
    Epoch   int64
    ID      string
}

// ddb returns the controller's DynamoDB client, instantiating a new client if necessary.
func (c Book) ddb() *dynamodb.DynamoDB {
    if c.ddbClient == nil {
        sess := session.Must(session.NewSession(&aws.Config{
            Region: aws.String(endpoints.UsWest2RegionID),
        }))
        c.ddbClient = dynamodb.New(sess)
    }
    return c.ddbClient
}

// Sign allows users to sign the book.
// The message is to be passed as application/json typed content, listed under the "message" top level key.
func (c Book) Sign() revel.Result {
    var s Signature

    err := c.Params.BindJSON(&s)
    if err != nil {
        return c.RenderError(err)
    }
    now := time.Now()
    s.Epoch = now.Unix()
    s.ID = randString(20)

    item, err := dynamodbattribute.MarshalMap(s)
    if err != nil {
        return c.RenderError(err)
    }

    putItemInput := &dynamodb.PutItemInput{
        TableName: aws.String(TABLE_NAME),
        Item:      item,
    }
    _, err = c.ddb().PutItem(putItemInput)
    if err != nil {
        return c.RenderError(err)
    }

    return c.RenderText(SUCCESS)
}

// List allows users to list all signatures in the book.
func (c Book) List() revel.Result {
    scanInput := &dynamodb.ScanInput{
        TableName: aws.String(TABLE_NAME),
        Limit:     aws.Int64(100),
    }
    res, err := c.ddb().Scan(scanInput)
    if err != nil {
        return c.RenderError(err)
    }

    messages := make([]string, 0)
    for _, v := range res.Items {
        messages = append(messages, *(v["Message"].S))
    }
    return c.RenderJSON(messages)
}

./conf/routes:
POST /sign Book.Sign
GET /list Book.List

Creating the resources and testing

For the purposes of this blog post, the application will be run and tested locally. We will store and retrieve messages from an Amazon DynamoDB table. Use the following AWS CLI command to create the guestbook table:

aws dynamodb create-table --region us-west-2 --table-name "guestbook" --attribute-definitions AttributeName=ID,AttributeType=S AttributeName=Epoch,AttributeType=N --key-schema AttributeName=ID,KeyType=HASH AttributeName=Epoch,KeyType=RANGE --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

Now, let’s test our sign and list routes. If everything is working correctly, the following result appears:

$ curl -d '{"message":"Hello from cURL!"}' -H "Content-Type: application/json" http://localhost:9000/book/sign
Success.
$ curl http://localhost:9000/book/list
[
  "Hello from cURL!"
]%

Integrating with AWS X-Ray

Download and run the AWS X-Ray daemon

The AWS SDKs emit trace segments over UDP on port 2000. (This port can be configured.) In order for the trace segments to make it to the X-Ray service, the daemon must listen on this port and batch the segments in calls to the PutTraceSegments API.
For information about downloading and running the X-Ray daemon, see the AWS X-Ray Developer Guide.

Installing the AWS X-Ray SDK for Go

To download the SDK from GitHub, run go get -u github.com/aws/aws-xray-sdk-go/... and the SDK will appear in your $GOPATH.

Enabling the incoming request filter

The first step to instrumenting an application with AWS X-Ray is to enable the generation of trace segments on incoming requests. The SDK conveniently provides an implementation of http.Handler which does exactly that. To ensure incoming web requests travel through this handler, we can modify app/init.go, adding a custom function to be run on application start.

import (
    "github.com/aws/aws-xray-sdk-go/xray"
    "github.com/revel/revel"
)

...

func init() {
  ...
    revel.OnAppStart(installXRayHandler)
}

func installXRayHandler() {
    revel.Server.Handler = xray.Handler(xray.NewFixedSegmentNamer("GuestbookApp"), revel.Server.Handler)
}

The application will now emit a segment for each incoming web request. The service graph appears:

You can customize the name of the segment to make it more descriptive by providing an alternate implementation of SegmentNamer to xray.Handler. For example, you can use xray.NewDynamicSegmentNamer(fallback, pattern) in place of the fixed namer. This namer will use the host name from the incoming web request (if it matches pattern) as the segment name. This is often useful when you are trying to separate different instances of the same application.
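
For example, swapping the dynamic namer into the handler installed earlier (the host pattern here is illustrative):

func installXRayHandler() {
    revel.Server.Handler = xray.Handler(xray.NewDynamicSegmentNamer("GuestbookApp", "*.example.com"), revel.Server.Handler)
}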

In addition, HTTP-centric information such as method and URL is collected in the segment’s http subsection:

"http": {
    "request": {
        "url": "/book/list",
        "method": "GET",
        "user_agent": "curl/7.54.0",
        "client_ip": "::1"
    },
    "response": {
        "status": 200
    }
},

Instrumenting outbound calls

To provide detailed performance metrics for distributed applications, the AWS X-Ray SDK needs to measure the time it takes to make outbound requests. Trace context is passed to downstream services using the X-Amzn-Trace-Id header. To draw a detailed and accurate representation of a distributed application, outbound call instrumentation is required.
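
For reference, the header carries the trace ID, the parent (sub)segment ID, and the sampling decision, and looks like this (the IDs are illustrative):

X-Amzn-Trace-Id: Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1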

AWS SDK calls

The AWS X-Ray SDK for Go provides a one-line AWS client wrapper that enables the collection of detailed per-call metrics for any AWS client. We can modify the DynamoDB client instantiation to include this line:

// ddb returns the controller's DynamoDB client, instantiating a new client if necessary.
func (c Book) ddb() *dynamodb.DynamoDB {
    if c.ddbClient == nil {
        sess := session.Must(session.NewSession(&aws.Config{
            Region: aws.String(endpoints.UsWest2RegionID),
        }))
        c.ddbClient = dynamodb.New(sess)
        xray.AWS(c.ddbClient.Client) // add subsegment-generating X-Ray handlers to this client
    }
    return c.ddbClient
}

We also need to ensure that the segment generated by our xray.Handler is passed to these AWS calls so that the X-Ray SDK knows to which segment these generated subsegments belong. In Go, the context.Context object is passed throughout the call path to achieve this goal. (In most other languages, some variant of ThreadLocal is used.) AWS clients provide a *WithContext method variant for each AWS operation, which we need to switch to:

_, err = c.ddb().PutItemWithContext(c.Request.Context(), putItemInput)
res, err := c.ddb().ScanWithContext(c.Request.Context(), scanInput)

We now see much more detail in the Timeline view of the trace for the sign and list operations:

We can use this detail to help diagnose throttling on our DynamoDB table. In the following screenshot, the purple in the DynamoDB service graph node indicates that our table is underprovisioned. The red in the GuestbookApp node indicates that the application is throwing faults due to this throttling.

HTTP calls

Although the guestbook application does not make any non-AWS outbound HTTP calls in its current state, there is a similar one-liner to wrap HTTP clients that make outbound requests. xray.Client(c *http.Client) wraps an existing http.Client (or nil if you want to use a default HTTP client). For example:

resp, err := ctxhttp.Get(ctx, xray.Client(nil), "https://aws.amazon.com/")

Instrumenting local operations

X-Ray can also assist in measuring the performance of local compute operations. To see this in action, let’s create a custom subsegment inside the randString method:


// randString returns a random string of len n, used for DynamoDB Hash key.
func randString(ctx context.Context, n int) string {
    var s string
    // xray.Capture expects a func(context.Context) error, so return nil on success
    xray.Capture(ctx, "randString", func(innerCtx context.Context) error {
        b := make([]rune, n)
        for i := range b {
            b[i] = letters[rand.Intn(len(letters))]
        }
        s = string(b) // assign to the outer variable so the result survives the closure
        return nil
    })
    return s
}

// we'll also need to change the callsite

s.ID = randString(c.Request.Context(), 20)

Summary

By now, you are an expert on instrumenting your Go applications with X-Ray. Instrumenting your applications with X-Ray is an easy way to analyze and debug performance issues and understand customer impact. Please feel free to give any feedback or comments below.

For more information about advanced configuration of the AWS X-Ray SDK for Go, see the AWS X-Ray SDK for Go in the AWS X-Ray Developer Guide and the aws/aws-xray-sdk-go GitHub repository.

For more information about some of the advanced X-Ray features such as histograms, annotations, and filter expressions, see the Analyzing Performance for Amazon Rekognition Apps Written on AWS Lambda Using AWS X-Ray blog post.

Using Trusted Timestamping With Java

Post Syndicated from Bozho original https://techblog.bozho.net/using-trusted-timestamping-java/

Trusted timestamping is the process of having a trusted third party (“Time stamping authority”, TSA) certify the time of a given event in electronic form. The EU regulation eIDAS gives these timestamps legal strength – i.e. nobody can dispute the time or the content of the event if it was timestamped. It is applicable to multiple scenarios, including timestamping audit logs. (Note: timestamping is not sufficient for a good audit trail as it does not prevent a malicious actor from deleting the event altogether)

There are a number of standards for trusted timestamping, the core one being RFC 3161. As with most RFCs, it is hard to read. Fortunately for Java users, BouncyCastle implements the standard. Unfortunately, as with most security APIs, working with it is hard, even abysmal. I had to implement it, so I’ll share the code needed to timestamp data.

The whole gist can be found here, but I’ll try to explain the main flow. Obviously, there is a lot of code that’s there to simply follow the standard. The BouncyCastle classes are a maze that’s hard to navigate.

The main method is obviously timestamp(hash, tsaURL, username, password):

public TimestampResponseDto timestamp(byte[] hash, String tsaUrl, String tsaUsername, String tsaPassword) throws IOException {
    MessageImprint imprint = new MessageImprint(sha512oid, hash);

    TimeStampReq request = new TimeStampReq(imprint, null, new ASN1Integer(random.nextLong()),
            ASN1Boolean.TRUE, null);

    byte[] body = request.getEncoded();
    try {
        byte[] responseBytes = getTSAResponse(body, tsaUrl, tsaUsername, tsaPassword);

        ASN1StreamParser asn1Sp = new ASN1StreamParser(responseBytes);
        TimeStampResp tspResp = TimeStampResp.getInstance(asn1Sp.readObject());
        TimeStampResponse tsr = new TimeStampResponse(tspResp);

        checkForErrors(tsaUrl, tsr);

        // validate communication level attributes (RFC 3161 PKIStatus)
        tsr.validate(new TimeStampRequest(request));

        TimeStampToken token = tsr.getTimeStampToken();
            
        TimestampResponseDto response = new TimestampResponseDto();
        response.setTime(getSigningTime(token.getSignedAttributes()));
        response.setEncodedToken(Base64.getEncoder().encodeToString(token.getEncoded()));
           
        return response;
    } catch (RestClientException | TSPException | CMSException | OperatorCreationException | GeneralSecurityException e) {
        throw new IOException(e);
    }
}

It prepares the request by creating the message imprint. Note that you are passing not just the hash itself, but also the hashing algorithm used to make the hash. Why the API doesn’t hide that from you, I don’t know. In my case the hash is obtained in a more complicated way, so it’s useful, but still. Then we get the raw form of the request and send it to the TSA (time stamping authority). It is an HTTP request, sort of simple, but you have to take care of some request and response headers that are not necessarily consistent across TSAs. The username and password are optional, as some TSAs offer the service (rate-limited) without authentication.
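
getTSAResponse is not shown above – it’s in the gist, and judging by the RestClientException in the catch block, my implementation uses Spring’s RestTemplate. A minimal, dependency-free sketch of what such a method has to do – POST the raw request with the RFC 3161 media types, optionally with Basic auth – could look like this:

private byte[] getTSAResponse(byte[] requestBytes, String tsaUrl, String username, String password) throws IOException {
    HttpURLConnection connection = (HttpURLConnection) new URL(tsaUrl).openConnection();
    connection.setDoOutput(true);
    connection.setRequestMethod("POST");
    // content types mandated by RFC 3161 for timestamp queries and replies
    connection.setRequestProperty("Content-Type", "application/timestamp-query");
    connection.setRequestProperty("Accept", "application/timestamp-reply");
    if (username != null && password != null) {
        String credentials = Base64.getEncoder()
                .encodeToString((username + ":" + password).getBytes(StandardCharsets.UTF_8));
        connection.setRequestProperty("Authorization", "Basic " + credentials);
    }
    try (OutputStream out = connection.getOutputStream()) {
        out.write(requestBytes);
    }
    // read the whole (binary) response body
    try (InputStream in = connection.getInputStream();
            ByteArrayOutputStream result = new ByteArrayOutputStream()) {
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1) {
            result.write(buffer, 0, read);
        }
        return result.toByteArray();
    }
}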

When you have the raw response back, you parse it to a TimeStampResponse. Again, you have to go through 2 intermediate objects (ASN1StreamParser and TimeStampResp), which may be a proper abstraction, but is not a usable API.

Then you check if the response was successful, and you also have to validate it – the TSA may have returned a bad response. Ideally all of that could’ve been hidden from you. Validation throws an exception, which in this case I just propagate by wrapping in an IOException.

Finally, you get the token and return the response. The most important thing is the content of the token, which in my case was needed as Base64, so I encode it. It could just be the raw bytes as well. If you want to get any additional data from the token (e.g. the signing time), it’s not that simple; you have to parse the low-level attributes (seen in the gist).

Okay, you have the token now, and you can store it in a database. Occasionally you may want to validate whether timestamps have not been tampered with (which is my use case). The code is here, and I won’t even try to explain it – it’s a ton of boilerplate that is also accounting for variations in the way TSAs respond (I’ve tried a few). The fact that a DummyCertificate class is needed either means I got something very wrong, or confirms my critique of the BouncyCastle APIs. The DummyCertificate may not be needed for some TSAs, but it is for others, and you actually can’t instantiate it that easily. You need a real certificate to construct it (which is not included in the gist; using the init() method in the next gist you can create the dummy with dummyCertificate = new DummyCertificate(certificateHolder.toASN1Structure());). In my code these are all one class, but for presenting them I decided to split it, hence this little duplication.

Okay, now we can timestamp and validate timestamps. That should be enough; but for testing purposes (or limited internal use) you may want to do the timestamping locally instead of asking a TSA. The code can be found here. It uses Spring, but you can instead pass the keystore details as arguments to the init method. You need a JKS store with a keypair and a certificate, and I used KeyStore Explorer to create them. If you are running your application in AWS, you may want to encrypt your keystore using KMS (Key Management Service) and then decrypt it on application load, but that’s out of the scope of this article. With the local TSA, validation works as expected, and for timestamping, instead of calling the external service, you just call localTSA.timestamp(req);

How did I get to know which classes to instantiate and which parameters to pass? I don’t remember – looking at tests, examples, answers, sources. It took a while, and so I’m sharing it, to potentially save others some trouble.

A list of TSAs you can test with: SafeCreative, FreeTSA, time.centum.pl.

I realize this does not seem applicable to many scenarios, but I would recommend timestamping some critical pieces of your application data. And it is generally useful to have it in your “toolbox”, ready to use, rather than trying to read the standard and battling with BouncyCastle classes for days in order to achieve this allegedly simple task.

The post Using Trusted Timestamping With Java appeared first on Bozho's tech blog.

All the lights, all of the twinkly lights

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/all-of-the-lights/

Twinkly lights are to Christmas what pumpkins are to Halloween. And when you add a Raspberry Pi to your light show, the result instantly goes from “Meh, yeah.” to “OMG, wow!”

Here are some cool light-based Christmas projects to inspire you this weekend.

Raspberry Pi Christmas Lights

App-based light control

Christmas Tree Lights Demo

Project Code – https://github.com/eidolonFIRE/Christmas-Lights
Raspberry Pi A+
ws2812b LEDs – https://smile.amazon.com/gp/product/B01H04YAIQ/ref=od_aui_detailpages00?ie=UTF8&psc=1
200W 5V supply – https://smile.amazon.com/gp/product/B01LZRIWZD/ref=od_aui_detailpages01?ie=UTF8&psc=1

In his Christmas lights project, Caleb Johnson uses an app as a control panel to switch between predefined displays. The full code is available on his GitHub, and it connects a Raspberry Pi A+ to a strip of programmable LEDs that change their pattern at the touch of a phone screen.

What’s great about this project, aside from the simplicity of its design, is the scope for extending it. Why not share the app with friends and family, allowing them to control your lights remotely? Or link the lights to social media so they are triggered by a specific hashtag, like in Alex Ellis’ #cheerlights project below.

Worldwide holiday #cheerlights

Holiday lights hack – 1$ Snowman + Raspberry Pi

Here we have a smart holiday light which will only run when it detects your presence in the room through a passive infrared PIR sensor. I’ve used hot glue for the fixings and an 8-LED NeoPixel strip connected to port 18.

Cheerlights, an online service created by Hans Scharler, allows makers to incorporate hashtag-controlled lighting into their projects. By tweeting the hashtag #cheerlights, followed by a colour, you can control a network of lights so that they are all displaying the same colour.

For his holiday light hack using Cheerlights, Alex incorporated the Pimoroni Blinkt! and a collection of cheap Christmas decorations to create cute light-up ornaments for the festive season.

To make your own, check out Alex’s blog post, and head to your local £1/$1 store for hackable decor. You could even link your Christmas tree and the trees of your family, syncing them all in one glorious, Santa-pleasing spectacular.

Outdoor decorations

DIY musical Xmas lights for beginners with raspberry pi

With just a few bucks of extra material, I walk you through converting your regular Christmas lights into a whole-house light show. The goal here is to go from scratch. Although this guide is intended for people who don’t know how to use linux at all and those who do alike, the focus is for people for whom linux and the raspberry pi are a complete mystery.

Looking to outdo your neighbours with your Christmas light show this year? YouTuber Makin’Things has created a beginner’s guide to setting up a Raspberry Pi–based musical light show for your facade, complete with information on soldering, wiring, and coding.

Once you’ve wrapped your house in metres and metres of lights and boosted your speakers so they can be heard for miles around, why not incorporate #cheerlights to make your outdoor decor interactive?

Still not enough? How about controlling your lights using a drum kit? Christian Kratky’s MIDI-Based Christmas Lights Animation system (or as I like to call it, House Rock) does exactly that.

Eye Of The Tiger (MIDI based christmas lights animation system prototype)

Project documentation and source code: https://www.hackster.io/cyborg-titanium-14/light-pi-1c88b0 The song is taken from: https://www.youtube.com/watch?v=G6r1dAire0Y

Any more?

We know these projects are just the tip of the iceberg when it comes to the Raspberry Pi–powered Christmas projects out there, and as always, we’d love you to share yours with us. So post a link in the comments below, or tag us on social media when posting your build photos, videos, and/or blog links. ‘Tis the season for sharing after all.

The post All the lights, all of the twinkly lights appeared first on Raspberry Pi.

net-creds – Sniff Passwords From Interface or PCAP File

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/12/net-creds-sniff-passwords-from-interface-or-pcap-file/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed


net-creds is a Python-based tool for sniffing plaintext passwords and hashes from a network interface or PCAP file – it doesn’t rely on port numbers for service identification and can concatenate fragmented packets.

Features of net-creds for Sniffing Passwords

It can sniff the following directly from a network interface or from a PCAP file (sample invocations follow the list):

  • URLs visited
  • POST loads sent
  • HTTP form logins/passwords
  • HTTP basic auth logins/passwords
  • HTTP searches
  • FTP logins/passwords
  • IRC logins/passwords
  • POP logins/passwords
  • IMAP logins/passwords
  • Telnet logins/passwords
  • SMTP logins/passwords
  • SNMP community string
  • NTLMv1/v2 all supported protocols: HTTP, SMB, LDAP, etc.
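
Typical invocations look like this (flag names as per the project README; the interface and file names are illustrative):

sudo python net-creds.py -i eth0        # sniff live traffic on a chosen interface
python net-creds.py -p capture.pcap     # parse an existing capture file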

Read the rest of net-creds – Sniff Passwords From Interface or PCAP File now! Only available at Darknet.

The Intel ME vulnerabilities are a big deal for some people, harmless for most

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/49788.html

(Note: all discussion here is based on publicly disclosed information, and I am not speaking on behalf of my employers)

I wrote about the potential impact of the most recent Intel ME vulnerabilities a couple of weeks ago. The details of the vulnerability were released last week, and it’s not absolutely the worst case scenario but it’s still pretty bad. The short version is that one of the (signed) pieces of early bringup code for the ME reads an unsigned file from flash and parses it. Providing a malformed file could result in a buffer overflow, and a moderately complicated exploit chain could be built that allowed the ME’s exploit mitigation features to be bypassed, resulting in arbitrary code execution on the ME.

Getting this file into flash in the first place is the difficult bit. The ME region shouldn’t be writable at OS runtime, so the most practical way for an attacker to achieve this is to physically disassemble the machine and directly reprogram it. The AMT management interface may provide a vector for a remote attacker to achieve this – for this to be possible, AMT must be enabled and provisioned and the attacker must have valid credentials[1]. Most systems don’t have provisioned AMT, so most users don’t have to worry about this.

Overall, for most end users there’s little to worry about here. But the story changes for corporate users or high value targets who rely on TPM-backed disk encryption. The way the TPM protects access to the disk encryption key is to insist that a series of “measurements” are correct before giving the OS access to the disk encryption key. The first of these measurements is obtained through the ME hashing the first chunk of the system firmware and passing that to the TPM, with the firmware then hashing each component in turn and storing those in the TPM as well. If someone compromises a later point of the chain then the previous step will generate a different measurement, preventing the TPM from releasing the secret.
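
In pseudo-notation, each step in that chain is the TPM’s “extend” operation:

PCR_new = Hash(PCR_old || Hash(component))

Since every value folds in all the ones before it, modifying any later component changes the final measurement.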

However, if the first step in the chain can be compromised, all these guarantees vanish. And since the first step in the chain relies on the ME to be running uncompromised code, this vulnerability allows that to be circumvented. The attacker’s malicious code can be used to pass the “good” hash to the TPM even if the rest of the firmware has been tampered with. This allows a sufficiently skilled attacker to extract the disk encryption key and read the contents of the disk[2].

In addition, TPMs can be used to perform something called “remote attestation”. This allows the TPM to provide a signed copy of the recorded measurements to a remote service, allowing that service to make a policy decision around whether or not to grant access to a resource. Enterprises using remote attestation to verify that systems are appropriately patched (eg) before they allow them access to sensitive material can no longer depend on those results being accurate.

Things are even worse for people relying on Intel’s Platform Trust Technology (PTT), which is an implementation of a TPM that runs on the ME itself. Since this vulnerability allows full access to the ME, an attacker can obtain all the private key material held in the PTT implementation and, effectively, adopt the machine’s cryptographic identity. This allows them to impersonate the system with arbitrary measurements whenever they want to. This basically renders PTT worthless from an enterprise perspective – unless you’ve maintained physical control of a machine for its entire lifetime, you have no way of knowing whether it’s had its private keys extracted and so you have no way of knowing whether the attestation attempt is coming from the machine or from an attacker pretending to be that machine.

Bootguard, the component of the ME that’s responsible for measuring the firmware into the TPM, is also responsible for verifying that the firmware has an appropriate cryptographic signature. Since that can be bypassed, an attacker can reflash modified firmware that can do pretty much anything. Yes, that probably means you can use this vulnerability to install Coreboot on a system locked down using Bootguard.

(An aside: The Titan security chips used in Google Cloud Platform sit between the chipset and the flash and verify the flash before permitting anything to start reading from it. If an attacker tampers with the ME firmware, Titan should detect that and prevent the system from booting. However, I’m not involved in the Titan project and don’t know exactly how this works, so don’t take my word for this)

Intel have published an update that fixes the vulnerability, but it’s pretty pointless – there’s apparently no rollback protection in the affected 11.x MEs, so while the attacker is modifying your flash to insert the payload they can just downgrade your ME firmware to a vulnerable version. Version 12 will reportedly include optional rollback protection, which is little comfort to anyone who has current hardware. Basically, anyone whose threat model depends on the low-level security of their Intel system is probably going to have to buy new hardware.

This is a big deal for enterprises and any individuals who may be targeted by skilled attackers who have physical access to their hardware, and entirely irrelevant for almost anybody else. If you don’t know that you should be worried, you shouldn’t be.

[1] Although admins should bear in mind that any system that hasn’t been patched against CVE-2017-5689 considers an empty authentication cookie to be a valid credential

[2] TPMs are not intended to be strongly tamper resistant, so an attacker could also just remove the TPM, decap it and (with some effort) extract the key that way. This is somewhat more time consuming than just reflashing the firmware, so the ME vulnerability still amounts to a change in attack practicality.


Dutch Film Distributor Wins Right To Chase Pirates, Store Data For 5 Years

Post Syndicated from Andy original https://torrentfreak.com/dutch-film-distributor-wins-right-to-chase-pirates-store-data-for-5-years-171208/

For many years, Dutch Internet users were allowed to download copyrighted content without reprisals, provided it was for their own personal use.

In 2014, however, the European Court of Justice ruled that the country’s “piracy levy” to compensate rightsholders was unlawful. Almost immediately, the government announced a downloading ban.

In March 2016, anti-piracy outfit BREIN followed up by obtaining permission from the Dutch Data Protection Authority to track and store the personal data of alleged BitTorrent pirates. This year, movie distributor Dutch FilmWorks (DFW) made a similar application.

The company said that it would be pursuing alleged pirates to deter future infringement but many suspected that securing cash settlements was its main aim. That was confirmed in August.

“[The letter to alleged pirates] will propose a fee. If someone does not agree [to pay], the organization can start a lawsuit,” said DFW CEO Willem Pruijsserts.

“In Germany, this costs between €800 and €1,000, although we find this a bit excessive. But of course it has to be a deterrent, so it will be more than a tenner or two,” he added.

But despite the grand plans, nothing would be possible without first obtaining the necessary permission from the Data Protection Authority. This Wednesday, however, that arrived.

“DFW has given sufficient guarantees for the proper and careful processing of personal data. This means that DFW has been given a green light from the Data Protection Authority to collect personal data, such as IP addresses, from people downloading from illegal sources,” the Authority announced.

Noting that it received feedback from four entities during the six-week consultation process following the publication of its draft decision during the summer, the Data Protection Authority said that further investigations were duly carried out. All input was considered before handing down the final decision.

The Authority said it was satisfied that personal data would be handled correctly and that the information collected and stored would be encrypted and hashed to ensure integrity. Furthermore, data will not be retained for longer than is necessary.

“DFW has stated…that data from users with Dutch IP addresses who were involved in the exchange of a title owned by DFW, but in respect of which there is no intention to follow up on that within three months after receipt, will be destroyed,” the decision reads.

For any cases that are active and haven’t been discarded in the initial three-month period, DFW will be allowed to hold alleged pirates’ data for a maximum of five years, a period that matches the time a company has to file a claim under the Dutch Civil Code.

“When DFW does follow up on a file, DFW carries out further research into the identity of the users of the IP addresses. For this, it is necessary to contact the Internet service providers of the subscribers who used the IP addresses found in the BitTorrent network,” the Authority notes.

According to the decision, once DFW has a person’s details it can take any of several actions, starting with a simple warning or moving up to an amicable cash settlement. Failing that, it might choose to file a full-on court case in which the distributor seeks an injunction against the alleged pirate plus compensation and costs.

Only time will tell what strategy DFW will deploy against alleged pirates but since these schemes aren’t cheap to run, it’s likely that simple warning letters will be seriously outnumbered by demands for cash settlement.

While it seems unlikely that the Data Protection Authority will change its mind at this late stage, its decision remains open to appeal. Interested parties have just under six weeks to make their voices heard. Failing that, copyright trolling will hit the Netherlands in the weeks and months to come.

The full decision can be found here (Dutch, pdf) via Tweakers

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more.

Implementing Dynamic ETL Pipelines Using AWS Step Functions

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/implementing-dynamic-etl-pipelines-using-aws-step-functions/

This post contributed by:
Wangechi Dole, AWS Solutions Architect
Milan Krasnansky, ING, Digital Solutions Developer, SGK
Rian Mookencherry, Director – Product Innovation, SGK

Data processing and transformation is a common use case you see in our customer case studies and success stories. Often, customers deal with complex data from a variety of sources that needs to be transformed and customized through a series of steps to make it useful to different systems and stakeholders. This can be difficult due to the ever-increasing volume, velocity, and variety of data. Traditional databases alone are often not enough to solve these data management challenges.

Workflow automation helps you build solutions that are repeatable, scalable, and reliable. You can use AWS Step Functions for this. A great example is how SGK used Step Functions to automate the ETL processes for their client. With Step Functions, SGK has been able to automate changes within the data management system, substantially reducing the time required for data processing.

In this post, SGK shares the details of how they used Step Functions to build a robust data processing system based on highly configurable business transformation rules for ETL processes.

SGK: Building dynamic ETL pipelines

SGK is a subsidiary of Matthews International Corporation, a diversified organization focusing on brand solutions and industrial technologies. SGK’s Global Content Creation Studio network creates compelling content and solutions that connect brands and products to consumers through multiple assets including photography, video, and copywriting.

We were recently contracted to build a sophisticated and scalable data management system for one of our clients. We chose to build the solution on AWS to leverage advanced, managed services that help to improve the speed and agility of development.

The data management system served two main functions:

  1. Ingesting a large amount of complex data to facilitate both reporting and product funding decisions for the client’s global marketing and supply chain organizations.
  2. Processing the data through normalization and applying complex algorithms and data transformations. The system goal was to provide information in the relevant context—such as strategic marketing, supply chain, product planning, etc. —to the end consumer through automated data feeds or updates to existing ETL systems.

We were faced with several challenges:

  • Output data that needed to be refreshed at least twice a day to provide fresh datasets to both local and global markets. That constant data refresh posed several challenges, especially around data management and replication across multiple databases.
  • The complexity of reporting business rules that needed to be updated on a constant basis.
  • Data that could not be processed as contiguous blocks of typical time-series data. The measurement of the data was done across seasons (that is, combinations of dates), which often resulted in up to three overlapping seasons at any given time.
  • Input data that came from 10+ different data sources. Each data source ranged from 1–20K rows with as many as 85 columns per input source.

These challenges meant that our small Dev team heavily invested time in frequent configuration changes to the system and data integrity verification to make sure that everything was operating properly. Maintaining this system proved to be a daunting task and that’s when we turned to Step Functions—along with other AWS services—to automate our ETL processes.

Solution overview

Our solution included the following AWS services:

  • AWS Step Functions: Before Step Functions was available, we were using multiple Lambda functions for this use case and running into memory limit issues. With Step Functions, we can execute steps in parallel in a cost-efficient manner, without running into memory limitations.
  • AWS Lambda: The Step Functions state machine uses Lambda functions to implement the Task states. Our Lambda functions are implemented in Java 8.
  • Amazon DynamoDB provides us with an easy and flexible way to manage business rules. We specify our rules as Keys. These are key-value pairs stored in a DynamoDB table.
  • Amazon RDS: Our ETL pipelines consume source data from our RDS MySQL database.
  • Amazon Redshift: We use Amazon Redshift for reporting purposes because it integrates with our BI tools. Currently we are using Tableau for reporting, which integrates well with Amazon Redshift.
  • Amazon S3: We store our raw input files and intermediate results in S3 buckets.
  • Amazon CloudWatch Events: Our users expect results at a specific time. We use CloudWatch Events to trigger Step Functions on an automated schedule.

Solution architecture

This solution uses a declarative approach to defining business transformation rules that are applied by the underlying Step Functions state machine as data moves from RDS to Amazon Redshift. An S3 bucket is used to store intermediate results. A CloudWatch Event rule triggers the Step Functions state machine on a schedule. The following diagram illustrates our architecture:

Here are more details for the above diagram:

  1. A rule in CloudWatch Events triggers the state machine execution on an automated schedule.
  2. The state machine invokes the first Lambda function.
  3. The Lambda function deletes all existing records in Amazon Redshift. Depending on the dataset, the Lambda function can create a new table in Amazon Redshift to hold the data.
  4. The same Lambda function then retrieves Keys from a DynamoDB table. Keys represent specific marketing campaigns or seasons and map to specific records in RDS.
  5. The state machine executes the second Lambda function using the Keys from DynamoDB.
  6. The second Lambda function retrieves the referenced dataset from RDS. The records retrieved represent the entire dataset needed for a specific marketing campaign.
  7. The second Lambda function executes in parallel for each Key retrieved from DynamoDB and stores the output in CSV format temporarily in S3.
  8. Finally, the Lambda function uploads the data into Amazon Redshift.

To understand the above data processing workflow, take a closer look at the Step Functions state machine for this example.

We walk you through the state machine in more detail in the following sections.

Walkthrough

To get started, you need to:

  • Create a schedule in CloudWatch Events
  • Specify conditions for RDS data extracts
  • Create Amazon Redshift input files
  • Load data into Amazon Redshift

Step 1: Create a schedule in CloudWatch Events
Create rules in CloudWatch Events to trigger the Step Functions state machine on an automated schedule. The following is an example cron expression to automate your schedule:
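The original post shows the expression in a screenshot that isn't reproduced here; a CloudWatch Events schedule expression consistent with the description below would be:

cron(0 3,14 * * ? *)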

In this example, the cron expression invokes the Step Functions state machine at 3:00am and 2:00pm (UTC) every day.

Step 2: Specify conditions for RDS data extracts
We use DynamoDB to store Keys that determine which rows of data to extract from our RDS MySQL database. An example Key is MCS2017, which stands for Marketing Campaign Spring 2017. Each campaign has a specific start and end date and the corresponding dataset is stored in RDS MySQL. A record in RDS contains about 600 columns, and each Key can represent up to 20K records.

A given day can have multiple campaigns with different start and end dates running simultaneously. In the following example DynamoDB item, three campaigns are specified for the given date.
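The item itself appears as a screenshot in the original post; a plausible shape for such an item, with illustrative attribute names and invented campaign codes apart from MCS2017, would be:

{
  "date": "2017-04-15",
  "key1": "MCS2017",
  "key2": "MCF2017",
  "key3": "SCS2017"
}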

The state machine example shown above uses Keys 31, 32, and 33 in the first ChoiceState and Keys 21 and 22 in the second ChoiceState. These keys represent marketing campaigns for a given day. For example, on Monday, there are only two campaigns requested. The ChoiceState with Keys 21 and 22 is executed. If three campaigns are requested on Tuesday, for example, then ChoiceState with Keys 31, 32, and 33 is executed. MCS2017 can be represented by Key 21 and Key 33 on Monday and Tuesday, respectively. This approach gives us the flexibility to add or remove campaigns dynamically.

Step 3: Create Amazon Redshift input files
When the state machine begins execution, the first Lambda function is invoked as the resource for FirstState, represented in the Step Functions state machine as follows:

"Comment": ” AWS Amazon States Language.", 
  "StartAt": "FirstState",
 
"States": { 
  "FirstState": {
   
"Type": "Task",
   
"Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Start",
    "Next": "ChoiceState" 
  } 

As described in the solution architecture, the purpose of this Lambda function is to delete existing data in Amazon Redshift and retrieve keys from DynamoDB. In our use case, we found that deleting existing records was more efficient and less time-consuming than finding the delta and updating existing records. On average, an Amazon Redshift table can contain about 36 million cells, which translates to roughly 65K records. The following is the code snippet for the first Lambda function in Java 8:

public class LambdaFunctionHandler implements RequestHandler<Map<String, Object>, Map<String, String>> {
    Map<String, String> keys = new HashMap<>();

    public Map<String, String> handleRequest(Map<String, Object> input, Context context) {
        Properties config = getConfig();
        // 1. Clean the Redshift database
        new RedshiftDataService(config).cleaningTable();
        // 2. Read the current keys from DynamoDB
        List<String> keyList = new DynamoDBDataService(config).getCurrentKeys();
        for (int i = 0; i < keyList.size(); i++) {
            keys.put("key" + (i + 1), keyList.get(i));
        }
        // 3. Return the key values and the key count (read as $.keyT by the state machine)
        keys.put("keyT", String.valueOf(keyList.size()));
        return keys;
    }
}

The following JSON represents ChoiceState.

"ChoiceState": {
   "Type" : "Choice",
   "Choices": [ 
   {

      "Variable": "$.keyT",
     "StringEquals": "3",
     "Next": "CurrentThreeKeys" 
   }, 
   {

     "Variable": "$.keyT",
    "StringEquals": "2",
    "Next": "CurrentTwooKeys" 
   } 
 ], 
 "Default": "DefaultState"
}

The variable $.keyT represents the number of keys retrieved from DynamoDB. This variable determines which of the parallel branches should be executed. At the time of publication, Step Functions does not support dynamic parallel state. Therefore, choices under ChoiceState are manually created and assigned hardcoded StringEquals values. These values represent the number of parallel executions for the second Lambda function.

For example, if $.keyT equals 3, the second Lambda function is executed three times in parallel with keys $key1, $key2, and $key3 retrieved from DynamoDB. Similarly, if $.keyT equals 2, the second Lambda function is executed twice in parallel. The following JSON represents this parallel execution:

"CurrentThreeKeys": { 
  "Type": "Parallel",
  "Next": "NextState",
  "Branches": [ 
  {

     "StartAt": “key31",
    "States": { 
       “key31": {

          "Type": "Task",
        "InputPath": "$.key1",
        "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
        "End": true 
       } 
    } 
  }, 
  {

     "StartAt": “key32",
    "States": { 
     “key32": {

        "Type": "Task",
       "InputPath": "$.key2",
         "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
       "End": true 
      } 
     } 
   }, 
   {

      "StartAt": “key33",
       "States": { 
          “key33": {

                "Type": "Task",
             "InputPath": "$.key3",
             "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
           "End": true 
       } 
     } 
    } 
  ] 
} 

Step 4: Load data into Amazon Redshift
The second Lambda function in the state machine extracts records from RDS associated with the keys retrieved from DynamoDB. It processes the data, then loads it into an Amazon Redshift table. The following is a code snippet for the second Lambda function in Java 8.

public class LambdaFunctionHandler implements RequestHandler<String, String> {
    public static String key = null;

    public String handleRequest(String input, Context context) {
        key = input;
        // 1. Get the basic configuration for the next classes and an S3 client
        Properties config = getConfig();
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // 2. Export query results from RDS into the S3 bucket
        new RdsDataService(config).exportDataToS3(s3, key);
        // 3. Import query results from the S3 bucket into Redshift
        new RedshiftDataService(config).importDataFromS3(s3, key);
        System.out.println(input);
        return "SUCCESS";
    }
}

After the data is loaded into Amazon Redshift, end users can visualize it using their preferred business intelligence tools.

Lessons learned

  • At the time of publication, the 1.5 GB memory hard limit for Lambda functions was inadequate for processing our complex workload. Step Functions gave us the flexibility to chunk our large datasets and process them in parallel, saving on costs and time.
  • In our previous implementation, we assigned each key a dedicated Lambda function along with CloudWatch rules for schedule automation. This approach proved to be inefficient and quickly became an operational burden. Previously, we processed each key sequentially, with each key adding about five minutes to the overall processing time. For example, processing three keys meant that the total processing time was three times longer. With Step Functions, the entire state machine executes in about five minutes.
  • Using DynamoDB with Step Functions gave us the flexibility to manage keys efficiently. In our previous implementations, keys were hardcoded in Lambda functions, which became difficult to manage due to frequent updates. DynamoDB is a great way to store dynamic data that changes frequently, and it works perfectly with our serverless architectures.

Conclusion

With Step Functions, we were able to fully automate the frequent configuration updates to our dataset, resulting in significant cost savings, reduced risk of data errors due to system downtime, and more time for us to focus on new product development rather than support-related issues. We hope that you have found the information useful and that it can serve as a jump-start to building your own ETL processes on AWS with managed AWS services.

For more information about how Step Functions makes it easy to coordinate the components of distributed applications and microservices in any workflow, see the use case examples and then build your first state machine in under five minutes in the Step Functions console.

If you have questions or suggestions, please comment below.

"Crypto" Is Being Redefined as Cryptocurrencies

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/12/crypto_is_being.html

I agree with Lorenzo Franceschi-Bicchierai, “Cryptocurrencies aren’t ‘crypto’”:

Lately on the internet, people in the world of Bitcoin and other digital currencies are starting to use the word “crypto” as a catch-all term for the lightly regulated and burgeoning world of digital currencies in general, or for the word “cryptocurrency” — which probably shouldn’t even be called “currency,” by the way.

[…]

To be clear, I’m not the only one who is mad about this. Bitcoin and other technologies indeed do use cryptography: all cryptocurrency transactions are secured by a “public key” known to all and a “private key” known only to one party — this is the basis for a swath of cryptographic approaches (known as public key, or asymmetric, cryptography) like PGP. But cryptographers say that’s not really their defining trait.

“Most cryptocurrency barely has anything to do with serious cryptography,” Matthew Green, a renowned computer scientist who studies cryptography, told me via email. “Aside from the trivial use of digital signatures and hash functions, it’s a stupid name.”

It is a stupid name.

Kernel prepatch 4.15-rc2

Post Syndicated from corbet original https://lwn.net/Articles/740516/rss

The second 4.15 kernel prepatch is out for testing. “One thing I’ll point out is that I’m trying to get some kernel ASLR leaks plugged, and as part of that we now hash any pointers printed by ‘%p’ by default. That won’t affect a lot of people, but where it is a debugging problem (rather than leaking interesting kernel pointers), we will have to fix things up.”

Google Says It Can’t Filter Pirated Content Proactively

Post Syndicated from Ernesto original https://torrentfreak.com/google-says-it-cant-filter-pirated-content-proactively-171202/

Over the past few years the entertainment industries have repeatedly asked Google to step up its game when it comes to its anti-piracy efforts.

These calls haven’t fallen on deaf ears and Google has steadily implemented various anti-piracy measures in response.

Still, that is not enough. At least, according to several prominent music industry groups who are advocating a ‘Take Down, Stay Down’ approach.

Currently, Google mostly responds to takedown requests that are sent in by copyright holders. The search engine deletes the infringing results and demotes the domains of frequent infringers. However, the same content often reappears on other sites, or in another location on the same site.

Earlier this year a group of prominent music groups stated that the present situation forces rightsholders to participate in a never-ending game of whack-a-mole which doesn’t fix the underlying problem. Instead, it results in a “frustrating, burdensome and ultimately ineffective takedown process.”

While Google understands the rationale behind the complaints, the company doesn’t believe in a more proactive solution. This was reiterated by Matt Brittin, President of EMEA Business & Operations at Google, during the Royal Television Society Event in London this week.

“The music industry has been quite tough with us on this. They’d like us proactively to know this stuff. It’s just not possible in this industry,” Brittin said.

That doesn’t mean that Google is sitting still. Brittin stresses that the company has invested millions in anti-piracy tools. That said, there can always be room for improvement.

“What we’ve tried to do is build tools that allow them to do that at scale easily and that work all together … I’m sure there are places where we could do better. There are teams and millions of dollars invested in this.

“Combatting bad acts and piracy is obviously very important to us,” Brittin added.

While Google sees no room for proactive filtering in search results, music industry insiders believe it’s possible.

Ideally, they want some type of automated algorithm or technology that removes infringing results without a targeted DMCA notice. This could be similar to YouTube’s Content-ID system, or the hash filtering mechanisms Google Drive employs, for example.

For now, however, there’s no sign that Google will go beyond the current takedown notice approach, at least for search. A ‘Take Down, Stay Down’ mechanism wouldn’t “understand” when content is authorized or not, the company previously noted.

And so, the status quo is likely to remain, at least for now.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more.

GDPR – A Practical Guide For Developers

Post Syndicated from Bozho original https://techblog.bozho.net/gdpr-practical-guide-developers/

You’ve probably heard about GDPR, the new European data protection regulation that applies practically to everyone. Especially if you are working in a big company, it’s most likely that there’s already a process for getting your systems in compliance with the regulation.

The regulation is basically a law that must be followed in all European countries, but it also applies to companies that are not registered in Europe yet have European users or customers. So that’s most companies. I will not go into yet another “12 facts about GDPR” or “7 myths about GDPR” post/whitepaper, as those are often aimed at managers or legal people. Instead, I’ll focus on what GDPR means for developers.

Why am I qualified to do that? A few reasons – I was advisor to the deputy prime minister of an EU country, and because of that I have both been exposed to legislation and written some myself. I’m familiar with the “legalese” and how the regulatory framework operates in general. I’m also a privacy advocate and I’ve been writing about GDPR-related stuff in the past, i.e. “before it was cool” (protecting sensitive data, the right to be forgotten). And finally, I’m currently working on a project that (among other things) aims to help with covering some GDPR aspects.

I’ll try to be a bit more comprehensive this time and cover as many aspects of the regulation that concern developers as I can. And while developers will mostly be concerned with how the systems they are working on have to change, it’s not unlikely that a less informed manager storms in one day in late spring, realizing GDPR is going to be in force tomorrow, and asks “what should we do to get our system/website compliant?”

The rights of the user/client (referred to as “data subject” in the regulation) that I think are relevant for developers are: the right to erasure (the right to be forgotten/deleted from the system), the right to restriction of processing (you still keep the data, but mark it as “restricted” and don’t touch it without further consent by the user), the right to data portability (the user should be able to get a machine-readable export of their data), the right to rectification (the ability to get personal data fixed), the right to be informed (getting human-readable information, rather than long terms and conditions), and the right of access (the user should be able to see all the data you have about them).

Additionally, the relevant basic principles are: data minimization (one should not collect more data than necessary), integrity and confidentiality (all security measures to protect data that you can think of + measures to guarantee that the data has not been inappropriately modified).

Even further, the regulation requires certain processes to be in place within an organization (of more than 250 employees or if a significant amount of data is processed), and those include keeping a record of all types of processing activities carried out, including transfers to processors (3rd parties), which includes cloud service providers. None of the other requirements of the regulation have an exception depending on the organization size, so “I’m small, GDPR does not concern me” is a myth.

It is important to know what “personal data” is. Basically, it’s every piece of data that can be used to uniquely identify a person or data that is about an already identified person. It’s data that the user has explicitly provided, but also data that you have collected about them from either 3rd parties or based on their activities on the site (what they’ve been looking at, what they’ve purchased, etc.)

Having said that, I’ll list a number of features that will have to be implemented and some hints on how to do that, followed by some do’s and don’t’s.

  • “Forget me” – you should have a method that takes a userId and deletes all personal data about that user (in case it has been collected on the basis of consent, and not due to contract enforcement or legal obligation). It is actually useful for integration tests to have that feature (to clean up after the test), but it may be hard to implement depending on the data model. In a regular data model, deleting a record may be easy, but some foreign keys may be violated. That means you have two options – either make sure you allow nullable foreign keys (for example an order usually has a reference to the user that made it, but when the user requests their data be deleted, you can set the userId to null – see the sketch after this list), or make sure you delete all related data (e.g. via cascades). This may not be desirable, e.g. if the order is used to track available quantities or for accounting purposes. It’s a bit trickier for event-sourcing data models, or in extreme cases, ones that include some sort of blockchain/hash chain/tamper-evident data structure. With event sourcing you should be able to remove a past event and re-generate intermediate snapshots. For blockchain-like structures – be careful what you put in there and avoid putting personal data of users. There is an option to use a chameleon hash function, but that’s suboptimal. Overall, you must constantly think of how you can delete the personal data. And “our data model doesn’t allow it” isn’t an excuse.
  • Notify 3rd parties for erasure – deleting things from your system may be one thing, but you are also obligated to inform all third parties that you have pushed that data to. So if you have sent personal data to, say, Salesforce, Hubspot, twitter, or any cloud service provider, you should call an API of theirs that allows for the deletion of personal data. If you are such a provider, obviously, your “forget me” endpoint should be exposed. Calling the 3rd party APIs to remove data is not the full story, though. You also have to make sure the information does not appear in search results. Now, that’s tricky, as Google doesn’t have an API for removal, only a manual process. Fortunately, it’s only about public profile pages that are crawlable by Google (and other search engines, okay…), but you still have to take measures. Ideally, you should make the personal data page return a 404 HTTP status, so that it can be removed.
  • Restrict processing – in your admin panel where there’s a list of users, there should be a button “restrict processing”. The user settings page should also have that button. When clicked (after reading the appropriate information), it should mark the profile as restricted. That means it should no longer be visible to the backoffice staff, or publicly. You can implement that with a simple “restricted” flag in the users table and a few if-clauses here and there.
  • Export data – there should be another button – “export data”. When clicked, the user should receive all the data that you hold about them. What exactly that data is depends on the particular use case. Usually it’s at least the data that you delete with the “forget me” functionality, but it may include additional data (e.g. the orders the user has made may not be deleted, but should be included in the dump). The structure of the dump is not strictly defined, but my recommendation would be to reuse schema.org definitions as much as possible, for either JSON or XML. If the data is simple enough, a CSV/XLS export would also be fine. Sometimes data export can take a long time, so the button can trigger a background process, which would then notify the user via email when their data is ready (twitter, for example, does that already – you can request all your tweets and you get them after a while).
  • Allow users to edit their profile – this seems an obvious rule, but it isn’t always followed. Users must be able to fix all data about them, including data that you have collected from other sources (e.g. using a “login with facebook” you may have fetched their name and address). Rule of thumb – all the fields in your “users” table should be editable via the UI. Technically, rectification can be done via a manual support process, but that’s normally more expensive for a business than just having the form to do it. There is one other scenario, however, when you’ve obtained the data from other sources (i.e. the user hasn’t provided their details to you directly). In that case there should still be a page where they can identify somehow (via email and/or sms confirmation) and get access to the data about them.
  • Consent checkboxes – this is in my opinion the biggest change that the regulation brings. “I accept the terms and conditions” would no longer be sufficient to claim that the user has given their consent for processing their data. So, for each particular processing activity there should be a separate checkbox on the registration (or user profile) screen. You should keep these consent checkboxes in separate columns in the database, and let the users withdraw their consent (by unchecking these checkboxes from their profile page – see the previous point). Ideally, these checkboxes should come directly from the register of processing activities (if you keep one). Note that the checkboxes should not be preselected, as this does not count as “consent”.
  • Re-request consent – if the consent users have given was not clear (e.g. if they simply agreed to terms & conditions), you’d have to re-obtain that consent. So prepare a functionality for mass-emailing your users to ask them to go to their profile page and check all the checkboxes for the personal data processing activities that you have.
  • “See all my data” – this is very similar to the “Export” button, except data should be displayed in the regular UI of the application rather than an XML/JSON format. For example, Google Maps shows you your location history – all the places that you’ve been to. It is a good implementation of the right to access. (Though Google is very far from perfect when privacy is concerned)
  • Age checks – you should ask for the user’s age, and if the user is a child (below 16), you should ask for parent permission. There’s no clear way how to do that, but my suggestion is to introduce a flow where the child specifies the email of a parent, who can then confirm. Obviously, children will just cheat with their birthdate, or provide a fake parent email, but you will most likely have done your job according to the regulation (this is one of the “wishful thinking” aspects of the regulation).
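As a minimal sketch of the nullable-foreign-key variant of “forget me”, here is one way it could look with the standard-library sqlite3 module (the users/orders tables and column names are illustrative assumptions, not a real schema):

import sqlite3

def forget_user(conn: sqlite3.Connection, user_id: int) -> None:
    # Keep the order rows (still needed for accounting), but detach them
    # from the person by nulling the foreign key.
    conn.execute("UPDATE orders SET user_id = NULL WHERE user_id = ?", (user_id,))
    # Then delete the personal data itself.
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    conn.commit()

In a real system you would also exclude data kept under a legal obligation, and trigger the 3rd-party notifications described above.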

Now some “do’s”, which are mostly about the technical measures needed to protect personal data. They may be more “ops” than “dev”, but often the application also has to be extended to support them. I’ve listed most of what I could think of in a previous post.

  • Encrypt the data in transit. That means that communication between your application layer and your database (or your message queue, or whatever component you have) should be over TLS. The certificates could be self-signed (and possibly pinned), or you could have an internal CA. Different databases have different configurations, so just google “X encrypted connections”. Some databases need gossiping among the nodes – that should also be configured to use encryption.
  • Encrypt the data at rest – this again depends on the database (some offer table-level encryption), but can also be done on machine-level. E.g. using LUKS. The private key can be stored in your infrastructure, or in some cloud service like AWS KMS.
  • Encrypt your backups – kind of obvious
  • Implement pseudonymisation – the most obvious use-case is when you want to use production data for the test/staging servers. You should change the personal data to some “pseudonym”, so that the people cannot be identified. When you push data for machine learning purposes (to third parties or not), you can also do that. Technically, that could mean that your User object has a “pseudonymize” method which applies hash+salt/bcrypt/PBKDF2 to some of the data that can be used to identify a person (see the sketch after this list).
  • Protect data integrity – this is a very broad thing, and could simply mean “have authentication mechanisms for modifying data”. But you can do something more, even as simple as a checksum, or a more complicated solution (like the one I’m working on). It depends on the stakes, on the way data is accessed, on the particular system, etc. The checksum can be in the form of a hash of all the data in a given database record, which should be updated each time the record is updated through the application. It isn’t a strong guarantee, but it is at least something.
  • Have your GDPR register of processing activities in something other than Excel – Article 30 says that you should keep a record of all the types of activities that you use personal data for. That sounds like bureaucracy, but it may be useful – you will be able to link certain aspects of your application with that register (e.g. the consent checkboxes, or your audit trail records). It wouldn’t take much time to implement a simple register, but the business requirements for that should come from whoever is responsible for the GDPR compliance. But you can advise them that having it in Excel won’t make it easy for you as a developer (imagine having to fetch the excel file internally, so that you can parse it and implement a feature). Such a register could be a microservice/small application deployed separately in your infrastructure.
  • Log access to personal data – every read operation on a personal data record should be logged, so that you know who accessed what and for what purpose
  • Register all API consumers – you shouldn’t allow anonymous API access to personal data. I’d say you should request the organization name and contact person for each API user upon registration, and add those to the data processing register. Note: some have treated article 30 as a requirement to keep an audit log. I don’t think it is saying that – instead it requires 250+ companies to keep a register of the types of processing activities (i.e. what you use the data for). There are other articles in the regulation that imply that keeping an audit log is a best practice (for protecting the integrity of the data as well as to make sure it hasn’t been processed without a valid reason)
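To make the pseudonymisation point concrete, here is a small sketch using PBKDF2 from the standard library (the salt handling is deliberately simplified; in practice you would keep the salt in a secrets store, not in code):

import hashlib
import os

SALT = os.urandom(16)  # illustrative only; manage the salt outside the code

def pseudonymize(value: str) -> str:
    # Derive a pseudonym from an identifying field (stable for a given salt)
    digest = hashlib.pbkdf2_hmac("sha256", value.encode("utf-8"), SALT, 100_000)
    return digest.hex()

Applying this to identifying fields (name, email, phone) before copying production data to staging keeps the records usable without exposing real people.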

Finally, some “don’t’s”.

  • Don’t use data for purposes that the user hasn’t agreed with – that’s supposed to be the spirit of the regulation. If you want to expose a new API to a new type of clients, or you want to use the data for some machine learning, or you decide to add ads to your site based on users’ behaviour, or sell your database to a 3rd party – think twice. I would imagine your register of processing activities could have a button to send notification emails to users to ask them for permission when a new processing activity is added (or if you use a 3rd party register, it should probably give you an API). So upon adding a new processing activity (and adding that to your register), mass email all users from whom you’d like consent.
  • Don’t log personal data – getting rid of the personal data from log files (especially if they are shipped to a 3rd party service) can be tedious or even impossible. So log just identifiers if needed. And make sure old log files are cleaned up, just in case.
  • Don’t put fields on the registration/profile form that you don’t need – it’s always tempting to just throw as many fields as the usability person/designer agrees on, but unless you absolutely need the data for delivering your service, you shouldn’t collect it. Names you should probably always collect, but unless you are delivering something, a home address or phone is unnecessary.
  • Don’t assume 3rd parties are compliant – you are responsible if there’s a data breach in one of the 3rd parties (e.g. “processors”) to which you send personal data. So before you send data via an API to another service, make sure they have at least a basic level of data protection. If they don’t, raise a flag with management.
  • Don’t assume having ISO XXX makes you compliant – information security standards and even personal data standards are a good start and they will probably cover 70% of what the regulation requires, but they are not sufficient – most of the things listed above are not covered in any of those standards.

Overall, the purpose of the regulation is to make you take conscious decisions when processing personal data. It imposes best practices in a legal way. If you follow the above advice and design your data model, storage, data flow, and API calls with data protection in mind, then you shouldn’t worry about the huge fines that the regulation prescribes – they are for extreme cases, like Equifax for example. Regulators (data protection authorities) will most likely have some checklists into which you’d have to somehow fit, but if you follow best practices, that shouldn’t be an issue.

I think all of the above features can be implemented in a few weeks by a small team. Be suspicious when a big vendor offers you a generic plug-and-play “GDPR compliance” solution. GDPR is not just about the technical aspects listed above – it does have organizational/process implications. But also be suspicious if a consultant claims GDPR is complicated. It’s not – it relies on a few basic principles that are in fact best practices anyway. Just don’t ignore them.

The post GDPR – A Practical Guide For Developers appeared first on Bozho's tech blog.

Object models

Post Syndicated from Eevee original https://eev.ee/blog/2017/11/28/object-models/

Anonymous asks, with dollars:

More about programming languages!

Well then!

I’ve written before about what I think objects are: state and behavior, which in practice mostly means method calls.

I suspect that the popular impression of what objects are, and also how they should work, comes from whatever C++ and Java happen to do. From that point of view, the whole post above is probably nonsense. If the baseline notion of “object” is a rigid definition woven tightly into the design of two massively popular languages, then it doesn’t even make sense to talk about what “object” should mean — it does mean the features of those languages, and cannot possibly mean anything else.

I think that’s a shame! It piles a lot of baggage onto a fairly simple idea. Polymorphism, for example, has nothing to do with objects — it’s an escape hatch for static type systems. Inheritance isn’t the only way to reuse code between objects, but it’s the easiest and fastest one, so it’s what we get. Frankly, it’s much closer to a speed tradeoff than a fundamental part of the concept.

We could do with more experimentation around how objects work, but that’s impossible in the languages most commonly thought of as object-oriented.

Here, then, is a (very) brief run through the inner workings of objects in four very dynamic languages. I don’t think I really appreciated objects until I’d spent some time with Python, and I hope this can help someone else whet their own appetite.

Python 3

Of the four languages I’m going to touch on, Python will look the most familiar to the Java and C++ crowd. For starters, it actually has a class construct.

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __neg__(self):
        return Vector(-self.x, -self.y)

    def __div__(self, denom):
        return Vector(self.x / denom, self.y / denom)

    @property
    def magnitude(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def normalized(self):
        return self / self.magnitude

The __init__ method is an initializer, which is like a constructor but named differently (because the object already exists in a usable form by the time the initializer is called). Operator overloading is done by implementing methods with other special __dunder__ names. Properties can be created with @property, where the @ is syntax for applying a wrapper function to a function as it’s defined. You can do inheritance, even multiply:

class Foo(A, B, C):
    def bar(self, x, y, z):
        # do some stuff
        super().bar(x, y, z)

Cool, a very traditional object model.

Except… for some details.

Some details

For one, Python objects don’t have a fixed layout. Code both inside and outside the class can add or remove whatever attributes they want from whatever object they want. The underlying storage is just a dict, Python’s mapping type. (Or, rather, something like one. Also, it’s possible to change this, which will probably be the case for everything I say here.)

If you create some attributes at the class level, you’ll start to get a peek behind the curtains:

class Foo:
    values = []

    def add_value(self, value):
        self.values.append(value)

a = Foo()
b = Foo()
a.add_value('a')
print(a.values)  # ['a']
b.add_value('b')
print(b.values)  # ['a', 'b']

The [] assigned to values isn’t a default assigned to each object. In fact, the individual objects don’t know about it at all! You can use vars(a) to get at the underlying storage dict, and you won’t see a values entry in there anywhere.

Instead, values lives on the class, which is a value (and thus an object) in its own right. When Python is asked for self.values, it checks to see if self has a values attribute; in this case, it doesn’t, so Python keeps going and asks the class for one.

Python’s object model is secretly prototypical — a class acts as a prototype, as a shared set of fallback values, for its objects.
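You can verify this directly, continuing the Foo example above:

print(vars(a))                 # {}  ('values' isn't in the instance storage)
print(Foo.values)              # ['a', 'b']  (it lives on the class)
print(a.values is Foo.values)  # True  (every instance sees the same list)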

In fact, this is also how method calls work! They aren’t syntactically special at all, which you can see by separating the attribute lookup from the call.

print("abc".startswith("a"))  # True
meth = "abc".startswith
print(meth("a"))  # True

Reading obj.method looks for a method attribute; if there isn’t one on obj, Python checks the class. Here, it finds one: it’s a function from the class body.

Ah, but wait! In the code I just showed, meth seems to “know” the object it came from, so it can’t just be a plain function. If you inspect the resulting value, it claims to be a “bound method” or “built-in method” rather than a function, too. Something funny is going on here, and that funny something is the descriptor protocol.

Descriptors

Python allows attributes to implement their own custom behavior when read from or written to. Such an attribute is called a descriptor. I’ve written about them before, but here’s a quick overview.

If Python looks up an attribute, finds it in a class, and the value it gets has a __get__ method… then instead of using that value, Python will use the return value of its __get__ method.

The @property decorator works this way. The magnitude property in my original example was shorthand for doing this:

class MagnitudeDescriptor:
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return (instance.x ** 2 + instance.y ** 2) ** 0.5

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    magnitude = MagnitudeDescriptor()

When you ask for somevec.magnitude, Python checks somevec but doesn’t find magnitude, so it consults the class instead. The class does have a magnitude, and it’s a value with a __get__ method, so Python calls that method and somevec.magnitude evaluates to its return value. (The instance is None check is because __get__ is called even if you get the descriptor directly from the class via Vector.magnitude. A descriptor intended to work on instances can’t do anything useful in that case, so the convention is to return the descriptor itself.)
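To make the lookup concrete, here is the manual descriptor above in action:

v = Vector(3, 4)
print(v.magnitude)       # 5.0  (__get__ was called with v as the instance)
print(Vector.magnitude)  # the MagnitudeDescriptor object itself (instance was None)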

You can also intercept attempts to write to or delete an attribute, and do absolutely whatever you want instead. But note that, similar to operator overloading in Python, the descriptor must be on a class; you can’t just slap one on an arbitrary object and have it work.

This brings me right around to how “bound methods” actually work. Functions are descriptors! The function type implements __get__, and when a function is retrieved from a class via an instance, that __get__ bundles the function and the instance together into a tiny bound method object. It’s essentially:

class FunctionType:
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return functools.partial(self, instance)

The self passed as the first argument to methods is not special or magical in any way. It’s built out of a few simple pieces that are also readily accessible to Python code.

Note also that because obj.method() is just an attribute lookup and a call, Python doesn’t actually care whether method is a method on the class or just some callable thing on the object. You won’t get the auto-self behavior if it’s on the object, but otherwise there’s no difference.
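A quick way to see that for yourself:

class Greeter:
    def method(self):
        return "from the class (self passed automatically)"

g = Greeter()
g.shortcut = lambda: "from the object (no self involved)"
print(g.method())    # from the class (self passed automatically)
print(g.shortcut())  # from the object (no self involved)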

More attribute access, and the interesting part

Descriptors are one of several ways to customize attribute access. Classes can implement __getattr__ to intervene when an attribute isn’t found on an object; __setattr__ and __delattr__ to intervene when any attribute is set or deleted; and __getattribute__ to implement unconditional attribute access. (That last one is a fantastic way to create accidental recursion, since any attribute access you do within __getattribute__ will of course call __getattribute__ again.)

Here’s what I really love about Python. It might seem like a magical special case that descriptors only work on classes, but it really isn’t. You could implement exactly the same behavior yourself, in pure Python, using only the things I’ve just told you about. Classes are themselves objects, remember, and they are instances of type, so the reason descriptors only work on classes is that type effectively does this:

class type:
    def __getattribute__(self, name):
        value = super().__getattribute__(name)
        # like all op overloads, __get__ must be on the type, not the instance
        ty = type(value)
        if hasattr(ty, '__get__'):
            # it's a descriptor!  this is a class access so there is no instance
            return ty.__get__(value, None, self)
        else:
            return value

You can even trivially prove to yourself that this is what’s going on by skipping over type’s behavior:

class Descriptor:
    def __get__(self, instance, owner):
        print('called!')

class Foo:
    bar = Descriptor()

Foo.bar  # called!
type.__getattribute__(Foo, 'bar')  # called!
object.__getattribute__(Foo, 'bar')  # ...

And that’s not all! The mysterious super function, used to exhaustively traverse superclass method calls even in the face of diamond inheritance, can also be expressed in pure Python using these primitives. You could write your own superclass calling convention and use it exactly the same way as super.

This is one of the things I really like about Python. Very little of it is truly magical; virtually everything about the object model exists in the types rather than the language, which means virtually everything can be customized in pure Python.

Class creation and metaclasses

A very brief word on all of this stuff, since I could talk forever about Python and I have three other languages to get to.

The class block itself is fairly interesting. It looks like this:

class Name(*bases, **kwargs):
    # code

I’ve said several times that classes are objects, and in fact the class block is one big pile of syntactic sugar for calling type(...) with some arguments to create a new type object.

The Python documentation has a remarkably detailed description of this process, but the gist is:

  • Python determines the type of the new class — the metaclass — by looking for a metaclass keyword argument. If there isn’t one, Python uses the “lowest” type among the provided base classes. (If you’re not doing anything special, that’ll just be type, since every class inherits from object and object is an instance of type.)

  • Python executes the class body. It gets its own local scope, and any assignments or method definitions go into that scope.

  • Python now calls type(name, bases, attrs, **kwargs). The name is whatever was right after class; the bases are positional arguments; and attrs is the class body’s local scope. (This is how methods and other class attributes end up on the class.) The brand new type is then assigned to Name.

Of course, you can mess with most of this. You can implement __prepare__ on a metaclass, for example, to use a custom mapping as storage for the local scope — including any reads, which allows for some interesting shenanigans. The only part you can’t really implement in pure Python is the scoping bit, which has a couple extra rules that make sense for classes. (In particular, functions defined within a class block don’t close over the class body; that would be nonsense.)
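As a minimal sketch of that first part, here is a metaclass that supplies its own mapping for the class body and records the order in which attributes were defined (the name OrderedMeta is, of course, made up for the example):

import collections

class OrderedMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # whatever mapping this returns becomes the class body's namespace
        return collections.OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        cls._definition_order = [k for k in namespace if not k.startswith('__')]
        return cls

class Demo(metaclass=OrderedMeta):
    a = 1
    b = 2

print(Demo._definition_order)  # ['a', 'b']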

Object creation

Finally, there’s what actually happens when you create an object — including a class, which remember is just an invocation of type(...).

Calling Foo(...) is implemented as, well, a call. Any type can implement calls with the __call__ special method, and you’ll find that type itself does so. It looks something like this:

# oh, a fun wrinkle that's hard to express in pure python: type is a class, so
# it's an instance of itself
class type:
    def __call__(self, *args, **kwargs):
        # remember, here 'self' is a CLASS, an instance of type.
        # __new__ is a true constructor: object.__new__ allocates storage
        # for a new blank object
        instance = self.__new__(self, *args, **kwargs)
        # you can return whatever you want from __new__ (!), and __init__
        # is only called on it if it's of the right type
        if isinstance(instance, self):
            instance.__init__(*args, **kwargs)
        return instance

Again, you can trivially confirm this by asking any type for its __call__ method. Assuming that the type doesn’t implement __call__ itself, you’ll get back a bound version of type’s implementation.

>>> list.__call__
<method-wrapper '__call__' of type object at 0x7fafb831a400>

You can thus implement __call__ in your own metaclass to completely change how instances of your classes are created — including skipping the creation altogether, if you like.

And… there’s a bunch of stuff I haven’t even touched on.

The Python philosophy

Python offers something that, on the surface, looks like a “traditional” class/object model. Under the hood, it acts more like a prototypical system, where failed attribute lookups simply defer to a superclass or metaclass.

The language also goes to almost superhuman lengths to expose all of its moving parts. Even the prototypical behavior is an implementation of __getattribute__ somewhere, which you are free to completely replace in your own types. Proxying and delegation are easy.

Also very nice is that these features “bundle” well, by which I mean a library author can do all manner of convoluted hijinks, and a consumer of that library doesn’t have to see any of it or understand how it works. You only need to inherit from a particular class (which has a metaclass) or use some descriptor as a decorator; you never even have to learn any new syntax.

This meshes well with Python culture, which is pretty big on the principle of least surprise. These super-advanced features tend to be tightly confined to single simple features (like “makes a weak attribute”) or cordoned off with DSLs (e.g., defining a form/struct/database table with a class body). In particular, I’ve never seen a metaclass in the wild implement its own __call__.

I have mixed feelings about that. It’s probably a good thing overall that the Python world shows such restraint, but I wonder if there are some very interesting possibilities we’re missing out on. I implemented a metaclass __call__ myself, just once, in an entity/component system that strove to minimize fuss when communicating between components. It never saw the light of day, but I enjoyed seeing some new things Python could do with the same relatively simple syntax. I wouldn’t mind seeing, say, an object model based on composition (with no inheritance) built atop Python’s primitives.

Lua

Lua doesn’t have an object model. Instead, it gives you a handful of very small primitives for building your own object model. This is pretty typical of Lua — it’s a very powerful language, but has been carefully constructed to be very small at the same time. I’ve never encountered anything else quite like it, and “but it starts indexing at 1!” really doesn’t do it justice.

The best way to demonstrate how objects work in Lua is to build some from scratch. We need two key features. The first is metatables, which bear a passing resemblance to Python’s metaclasses.

Tables and metatables

The table is Lua’s mapping type and its primary data structure. Keys can be any value other than nil. Lists are implemented as tables whose keys are consecutive integers starting from 1. Nothing terribly surprising. The dot operator is sugar for indexing with a string key.

local t = { a = 1, b = 2 }
print(t['a'])  -- 1
print(t.b)  -- 2
t.c = 3
print(t['c'])  -- 3

A metatable is a table that can be associated with another value (usually another table) to change its behavior. For example, operator overloading is implemented by assigning a function to a special key in a metatable.

local t = { a = 1, b = 2 }
--print(t + 0)  -- error: attempt to perform arithmetic on a table value

local mt = {
    __add = function(left, right)
        return 12
    end,
}
setmetatable(t, mt)
print(t + 0)  -- 12

Now, the interesting part: one of the special keys is __index, which is consulted when the base table is indexed by a key it doesn’t contain. Here’s a table that claims every key maps to itself.

local t = {}
local mt = {
    __index = function(table, key)
        return key
    end,
}
setmetatable(t, mt)
print(t.foo)  -- foo
print(t.bar)  -- bar
print(t[3])  -- 3

__index doesn’t have to be a function, either. It can be yet another table, in which case that table is simply indexed with the key. If the key still doesn’t exist and that table has a metatable with an __index, the process repeats.

With this, it’s easy to have several unrelated tables that act as a single table. Call the base table an object, fill the __index table with functions and call it a class, and you have half of an object system. You can even get prototypical inheritance by chaining __indexes together.
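For example, here is a two-level chain (just a sketch of the idea, not a full object system):

local superclass = { greet = function(self) print("hi from the superclass") end }
local class = setmetatable({}, { __index = superclass })
local obj = setmetatable({}, { __index = class })
obj:greet()  -- hi from the superclass (found two hops up the __index chain)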

At this point things are a little confusing, since we have at least three tables going on, so here’s a diagram. Keep in mind that Lua doesn’t actually have anything called an “object”, “class”, or “method” — those are just convenient nicknames for a particular structure we might build with Lua’s primitives.

                    ╔═══════════╗        ...
                    ║ metatable ║         ║
                    ╟───────────╢   ┌─────╨───────────────────────┐
                    ║ __index   ╫───┤ lookup table ("superclass") │
                    ╚═══╦═══════╝   ├─────────────────────────────┤
  ╔═══════════╗         ║           │ some other method           ┼─── function() ... end
  ║ metatable ║         ║           └─────────────────────────────┘
  ╟───────────╢   ┌─────╨──────────────────┐
  ║ __index   ╫───┤ lookup table ("class") │
  ╚═══╦═══════╝   ├────────────────────────┤
      ║           │ some method            ┼─── function() ... end
      ║           └────────────────────────┘
┌─────╨─────────────────┐
│ base table ("object") │
└───────────────────────┘

Note that a metatable is not the same as a class; it defines behavior, not methods. Conversely, if you try to use a class directly as a metatable, it will probably not do much. (This is pretty different from e.g. Python, where operator overloads are just methods with funny names. One nice thing about the Lua approach is that you can keep interface-like functionality separate from methods, and avoid clogging up arbitrary objects’ namespaces. You could even use a dummy table as a key and completely avoid name collisions.)

Anyway, code!

local class = {
    foo = function(a)
        print("foo got", a)
    end,
}
local mt = { __index = class }
-- setmetatable returns its first argument, so this is nice shorthand
local obj1 = setmetatable({}, mt)
local obj2 = setmetatable({}, mt)
obj1.foo(7)  -- foo got 7
obj2.foo(9)  -- foo got 9

Wait, wait, hang on. Didn’t I call these methods? How do they get at the object? Maybe Lua has a magical this variable?

Methods, sort of

Not quite, but this is where the other key feature comes in: method-call syntax. It’s the lightest touch of sugar, just enough to have method invocation.

-- note the colon!
a:b(c, d, ...)

-- exactly equivalent to this
-- (except that `a` is only evaluated once)
a.b(a, c, d, ...)

-- which of course is really this
a["b"](a, c, d, ...)

Now we can write methods that actually do something.

local class = {
    bar = function(self)
        print("our score is", self.score)
    end,
}
local mt = { __index = class }
local obj1 = setmetatable({ score = 13 }, mt)
local obj2 = setmetatable({ score = 25 }, mt)
obj1:bar()  -- our score is 13
obj2:bar()  -- our score is 25

And that’s all you need. Much like Python, methods and data live in the same namespace, and Lua doesn’t care whether obj:method() finds a function on obj or gets one from the metatable’s __index. Unlike Python, the function will be passed self either way, because self comes from the use of : rather than from the lookup behavior.

(Aside: strictly speaking, any Lua value can have a metatable — and if you try to index a non-table, Lua will always consult the metatable’s __index. Strings all have the string library as a metatable, so you can call methods on them: try ("%s %s"):format(1, 2). I don’t think Lua lets user code set the metatable for non-tables, so this isn’t that interesting, but if you’re writing Lua bindings from C then you can wrap your pointers in metatables to give them methods implemented in C.)

Bringing it all together

Of course, writing all this stuff every time is a little tedious and error-prone, so instead you might want to wrap it all up inside a little function. No problem.

local function make_object(body)
    -- create a metatable
    local mt = { __index = body }
    -- create a base table to serve as the object itself
    local obj = setmetatable({}, mt)
    -- and, done
    return obj
end

-- aside: Lua lets you leave off parens if you're only passing in a single
-- table or string literal, so make_object{ ... } would work too
local Dog = {
    -- this acts as a "default" value; if obj.barks is missing, __index will
    -- kick in and find this value on the class.  but if obj.barks is assigned
    -- to, it'll go in the object and shadow the value here.
    barks = 0,

    bark = function(self)
        self.barks = self.barks + 1
        print("woof!")
    end,
}

local mydog = make_object(Dog)
mydog:bark()  -- woof!
mydog:bark()  -- woof!
mydog:bark()  -- woof!
print(mydog.barks)  -- 3
print(Dog.barks)  -- 0

It works, but it’s fairly barebones. The nice thing is that you can extend it pretty much however you want. I won’t reproduce an entire serious object system here — lord knows there are enough of them floating around — but the implementation I have for my LÖVE games lets me do this:

local Animal = Object:extend{
    cries = 0,
}

-- called automatically by Object
function Animal:init()
    print("whoops i couldn't think of anything interesting to put here")
end

-- this is just nice syntax for adding a first argument called 'self', then
-- assigning this function to Animal.cry
function Animal:cry()
    self.cries = self.cries + 1
end

local Cat = Animal:extend{}

function Cat:cry()
    print("meow!")
    Cat.__super.cry(self)
end

local cat = Cat()
cat:cry()  -- meow!
cat:cry()  -- meow!
print(cat.cries)  -- 2

When I say you can extend it however you want, I mean that. I could’ve implemented Python (2)-style super(Cat, self):cry() syntax; I just never got around to it. I could even make it work with multiple inheritance if I really wanted to — or I could go the complete opposite direction and only implement composition. I could implement descriptors, customizing the behavior of individual table keys. I could add pretty decent syntax for composition/proxying. I am trying very hard to end this section now.

The Lua philosophy

Lua’s philosophy is to… not have a philosophy? It gives you the bare minimum to make objects work, and you can do absolutely whatever you want from there. Lua does have something resembling prototypical inheritance, but it’s not so much a first-class feature as an emergent property of some very simple tools. And since you can make __index be a function, you could avoid the prototypical behavior and do something different entirely.

The very severe downside, of course, is that you have to find or build your own object system — which can get pretty confusing very quickly, what with the multiple small moving parts. Third-party code may also have its own object system with subtly different behavior. (Though, in my experience, third-party code tries very hard to avoid needing an object system at all.)

It’s hard to say what the Lua “culture” is like, since Lua is an embedded language that’s often a little different in each environment. I imagine it has a thousand millicultures, instead. I can say that the tedium of building my own object model has led me into something very “traditional”, with prototypical inheritance and whatnot. It’s partly what I’m used to, but it’s also just really dang easy to get working.

Likewise, while I love properties in Python and use them all the dang time, I’ve yet to use a single one in Lua. They wouldn’t be particularly hard to add to my object model, but having to add them myself (or shop around for an object model with them and also port all my code to use it) adds a huge amount of friction. I’ve thought about designing an interesting ECS with custom object behavior, too, but… is it really worth the effort? For all the power and flexibility Lua offers, the cost is that by the time I have something working at all, I’m too exhausted to actually use any of it.

JavaScript

JavaScript is notable for being preposterously heavily used, yet not having a class block.

Well. Okay. Yes. It has one now. It didn’t for a very long time, and even the one it has now is sugar.

Here’s a vector class again:

class Vector {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }

    get magnitude() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    }

    dot(other) {
        return this.x * other.x + this.y * other.y;
    }
}

In “classic” JavaScript, this would be written as:

function Vector(x, y) {
    this.x = x;
    this.y = y;
}

Object.defineProperty(Vector.prototype, 'magnitude', {
    configurable: true,
    enumerable: true,
    get: function() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    },
});


Vector.prototype.dot = function(other) {
    return this.x * other.x + this.y * other.y;
};

Hm, yes. I can see why they added class.

The JavaScript model

In JavaScript, a new type is defined in terms of a function, which is its constructor.

Right away we get into trouble here. There is a very big difference between these two invocations, which I actually completely forgot about just now after spending four hours writing about Python and Lua:

let vec = Vector(3, 4);
let vec = new Vector(3, 4);

The first calls the function Vector. It assigns some properties to this, which here is going to be window (in sloppy mode, anyway; in strict mode this would be undefined and the assignment would throw), so now you have a global x and y. It then returns nothing, so vec is undefined.

The second calls Vector with this set to a new empty object, then evaluates to that object. The result is what you’d actually expect.

(You can detect this situation with the strange new.target expression, but I have never once remembered to do so.)

From here, we have true, honest-to-god, first-class prototypical inheritance. The word “prototype” is even right there. When you write this:

vec.dot(vec2)

JavaScript will look for dot on vec and (presumably) not find it. It then consults vec’s prototype, an object you can see for yourself by using Object.getPrototypeOf(). Since vec is a Vector, its prototype is Vector.prototype.

I stress that Vector.prototype is not the prototype for Vector. It’s the prototype for instances of Vector.

(I say “instance”, but the true type of vec here is still just object. If you want to find Vector, it’s automatically assigned to the constructor property of its own prototype, so it’s available as vec.constructor.)

Of course, Vector.prototype can itself have a prototype, in which case the process would continue if dot were not found. A common (and, arguably, very bad) way to simulate single inheritance is to set Class.prototype to an instance of a superclass to get the prototype right, then tack on the methods for Class. Nowadays we can do Object.create(Superclass.prototype).
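
Here’s roughly what that pattern looks like, with hypothetical Animal/Dog names of my own:

function Animal(name) {
    this.name = name;
}
Animal.prototype.speak = function() {
    console.log(this.name + " makes a noise");
};

function Dog(name) {
    // run the superclass constructor against the new object
    Animal.call(this, name);
}
// the modern way: a fresh object whose prototype is Animal.prototype
Dog.prototype = Object.create(Animal.prototype);
// repair the constructor property, which the line above clobbered
Dog.prototype.constructor = Dog;
Dog.prototype.speak = function() {
    console.log(this.name + " barks");
};

let dog = new Dog("Rex");
dog.speak();                         // Rex barks
console.log(dog instanceof Animal);  // true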

Now that I’ve been through Python and Lua, though, this isn’t particularly surprising. I kinda spoiled it.

I suppose one difference in JavaScript is that you can tack arbitrary attributes directly onto Vector all you like, and they will remain invisible to instances since they aren’t in the prototype chain. This is kind of backwards from Lua, where you can squirrel stuff away in the metatable.
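
For example, sticking with the Vector class from above (ZERO is my own invention):

Vector.ZERO = new Vector(0, 0);      // data on the constructor itself
console.log(Vector.ZERO.magnitude);  // 0
let vec3 = new Vector(3, 4);
console.log(vec3.ZERO);              // undefined: Vector isn't in the prototype chain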

Another difference is that every single object in JavaScript has a bunch of properties already tacked on — the ones in Object.prototype. Every object (and by “object” I mean any mapping) has a prototype, and that prototype defaults to Object.prototype, and it has a bunch of ancient junk like isPrototypeOf.

(Nit: it’s possible to explicitly create an object with no prototype via Object.create(null).)

Like Lua, and unlike Python, JavaScript doesn’t distinguish between keys found on an object and keys found via a prototype. Properties can be defined on prototypes with Object.defineProperty(), but that works just as well directly on an object, too. JavaScript doesn’t have a lot of operator overloading, but some things like Symbol.iterator also work on both objects and prototypes.

About this

You may, at this point, be wondering what this is. Unlike Lua and Python (and the last language below), this is a special built-in value — a context value, invisibly passed for every function call.

It’s determined by where the function came from. If the function was the result of an attribute lookup, then this is set to the object containing that attribute. Otherwise, this is set to the global object, window. (You can also set this to whatever you want via the call method on functions.)

This decision is made lexically, i.e. from the literal source code as written. There are no Python-style bound methods. In other words:

// this = obj
obj.method()
// this = window
let meth = obj.method
meth()

Also, because this is reassigned on every function call, it cannot be meaningfully closed over, which makes using closures within methods incredibly annoying. The old approach was to assign this to some other regular name like self (which got syntax highlighting since it’s also a built-in name in browsers); then we got Function.bind, which produced a callable thing with a fixed context value, which was kind of nice; and now finally we have arrow functions, which explicitly close over the current this when they’re defined and don’t change it when called. Phew.
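
Here are all three approaches side by side, in a contrived sketch of mine:

let counter = {
    count: 0,
    incrementLater: function() {
        // 1. the old workaround: stash `this` under another name
        let self = this;
        setTimeout(function() { self.count++; }, 100);

        // 2. Function.bind: produce a copy with a fixed context value
        setTimeout(function() { this.count++; }.bind(this), 100);

        // 3. arrow function: closes over the enclosing `this`
        setTimeout(() => { this.count++; }, 100);
    },
};
counter.incrementLater();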

Class syntax

I already showed class syntax, and it’s really just one big macro for doing all the prototype stuff The Right Way. It even prevents you from calling the type without new. The underlying model is exactly the same, and you can inspect all the parts.

class Vector { ... }

console.log(Vector.prototype);  // { dot: ..., magnitude: ..., ... }
let vec = new Vector(3, 4);
console.log(Object.getPrototypeOf(vec));  // same as Vector.prototype

// i don't know why you would subclass vector but let's roll with it
class Vectest extends Vector { ... }

console.log(Vectest.prototype);  // { ... }
console.log(Object.getPrototypeOf(Vectest.prototype))  // same as Vector.prototype

Alas, class syntax has a couple shortcomings. You can’t use the class block to assign arbitrary data to either the type object or the prototype — apparently it was deemed too confusing that mutations would be shared among instances. Which… is… how prototypes work. How Python works. How JavaScript itself, one of the most popular languages of all time, has worked for twenty-two years. Argh.

You can still do whatever assignment you want outside of the class block, of course. It’s just a little ugly, and not something I’d think to look for with a sugary class.
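
Something like this, with made-up names:

class Foo {}
Foo.limit = 10;            // data on the type object
Foo.prototype.count = 0;   // shared default for instances

let foo = new Foo();
console.log(foo.count);    // 0, found via the prototype chain
foo.count += 1;            // reads 0 from the prototype, assigns 1 onto foo itself
console.log(Foo.prototype.count);  // still 0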

A more subtle result of this behavior is that a class block isn’t quite the same syntax as an object literal. The check for data isn’t a runtime thing; class Foo { x: 3 } fails to parse. So JavaScript now has two largely but not entirely identical styles of key/value block.

Attribute access

Here’s where things start to come apart at the seams, just a little bit.

JavaScript doesn’t really have an attribute protocol. Instead, it has two… extension points, I suppose.

One is Object.defineProperty, seen above. For common cases, there’s also the get syntax inside an object literal, which does the same thing. But unlike Python’s @property, these aren’t wrappers around some simple primitives; they are the primitives. JavaScript is the only language of these four to have “property that runs code on access” as a completely separate first-class concept.

If you want to intercept arbitrary attribute access (and some kinds of operators), there’s a completely different primitive: the Proxy type. It doesn’t add interception to your own type; instead, it produces a wrapper object that supports interception and defers to the wrapped object by default.
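
A quick sketch (the get and has trap names are real; everything else is invented):

let target = { x: 1 };
let proxy = new Proxy(target, {
    get(obj, key) {
        console.log("getting", key);
        return obj[key];  // defer to the wrapped object
    },
    has(obj, key) {
        return key === "x";  // overloads the `in` operator
    },
});
console.log(proxy.x);       // getting x, then 1
console.log("y" in proxy);  // false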

It’s cool to see composition used in this way, but also, extremely weird. If you want to make your own type that overloads in or calling, you have to hand out a Proxy that wraps your own type, rather than actually returning your own type; in practice that means a factory. (Strictly speaking, a constructor that returns an object does override this, so you could return the Proxy from the constructor itself, but it’s an obscure trick.) And instanceof would be broken, but you can at least fix that with Symbol.hasInstance — which is really operator overloading, implemented in yet another completely different way.

I know the design here is a result of legacy and speed — if any object could intercept all attribute access, then all attribute access would be slowed down everywhere. Fair enough. It still leaves the surface area of the language a bit… bumpy?

The JavaScript philosophy

It’s a little hard to tell. The original idea of prototypes was interesting, but it was hidden behind some very awkward syntax. Since then, we’ve gotten a bunch of extra features awkwardly bolted on to reflect the wildly varied things the built-in types and DOM API were already doing. We have class syntax, but it’s been explicitly designed to avoid exposing the prototype parts of the model.

I admit I don’t do a lot of heavy JavaScript, so I might just be overlooking it, but I’ve seen virtually no code that makes use of any of the recent advances in object capabilities. Forget about custom iterators or overloading call; I can’t remember seeing any JavaScript in the wild that even uses properties yet. I don’t know if everyone’s waiting for sufficient browser support, nobody knows about them, or nobody cares.

The model has advanced recently, but I suspect JavaScript is still shackled to its legacy of “something about prototypes, I don’t really get it, just copy the other code that’s there” as an object model. Alas! Prototypes are so good. Hopefully class syntax will make it a bit more accessible, as it has in Python.

Perl 5

Perl 5 also doesn’t have an object system and expects you to build your own. But where Lua gives you two simple, powerful tools for building one, Perl 5 feels more like a puzzle with half the pieces missing. Clearly they were going for something, but they only gave you half of it.

In brief, a Perl object is a reference that has been blessed with a package.

I need to explain a few things. Honestly, one of the biggest problems with the original Perl object setup was how many strange corners and unique jargon you had to understand just to get off the ground.

(If you want to try running any of this code, you should stick a use v5.26; as the first line. Perl is very big on backwards compatibility, so you need to opt into breaking changes, and even the mundane say builtin is behind a feature gate.)

References

A reference in Perl is sort of like a pointer, but its main use is very different. See, Perl has the strange property that its data structures try very hard to spill their contents all over the place. Despite having dedicated syntax for arrays — @foo is an array variable, distinct from the single scalar variable $foo — it’s actually impossible to nest arrays.

my @foo = (1, 2, 3, 4);
my @bar = (@foo, @foo);
# @bar is now a flat list of eight items: 1, 2, 3, 4, 1, 2, 3, 4

The idea, I guess, is that an array is not one thing. It’s not a container, which happens to hold multiple things; it is multiple things. Anywhere that expects a single value, such as an array element, cannot contain an array, because an array fundamentally is not a single value.

And so we have “references”, which are a form of indirection, but also have the nice property that they’re single values. They add containment around arrays, and in general they make working with most of Perl’s primitive types much more sensible. A reference to a variable can be taken with the \ operator, or you can use [ ... ] and { ... } to directly create references to anonymous arrays or hashes.

my @foo = (1, 2, 3, 4);
my @bar = (\@foo, \@foo);
# @bar is now a nested list of two items: [1, 2, 3, 4], [1, 2, 3, 4]
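
The [ ... ] and { ... } forms create references to anonymous structures directly, and getting the contents back out means dereferencing:

my $aref = [1, 2, 3];            # reference to an anonymous array
my $href = { name => 'fred' };   # reference to an anonymous hash

say scalar @$aref;       # 3 -- @$aref is the underlying array
say ${ $href }{name};    # fred -- one of several ways to index through a ref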

(Incidentally, this is the sole reason I initially abandoned Perl for Python. Non-trivial software kinda requires nesting a lot of data structures, so you end up with references everywhere, and the syntax for going back and forth between a reference and its contents is tedious and ugly.)

A Perl object must be a reference. Perl doesn’t care what kind of reference — it’s usually a hash reference, since hashes are a convenient place to store arbitrary properties, but it could just as well be a reference to an array, a scalar, or even a sub (i.e. function) or filehandle.

I’m getting a little ahead of myself. First, the other half: blessing and packages.

Packages and blessing

Perl packages are just namespaces. A package looks like this:

package Foo::Bar;

sub quux {
    say "hi from quux!";
}

# now Foo::Bar::quux() can be called from anywhere

Nothing shocking, right? It’s just a named container. A lot of the details are kind of weird, like how a package exists in some liminal quasi-value space, but the basic idea is a Bag Of Stuff.

The final piece is “blessing,” which is Perl’s funny name for binding a package to a reference. A very basic class might look like this:

package Vector;

# the name 'new' is convention, not special
sub new {
    # perl argument passing is weird, don't ask
    my ($class, $x, $y) = @_;

    # create the object itself -- here, unusually, an array reference makes sense
    my $self = [ $x, $y ];

    # associate the package with that reference
    # note that $class here is just the regular string, 'Vector'
    bless $self, $class;

    return $self;
}

sub x {
    my ($self) = @_;
    return $self->[0];
}

sub y {
    my ($self) = @_;
    return $self->[1];
}

sub magnitude {
    my ($self) = @_;
    return sqrt($self->x ** 2 + $self->y ** 2);
}

# switch back to the "default" package
package main;

# -> is method call syntax, which passes the invocant as the first argument;
# for a package, that's just the package name
my $vec = Vector->new(3, 4);
say $vec->magnitude;  # 5

A few things of note here. First, $self->[0] has nothing to do with objects; it’s normal syntax for getting the value at index 0 out of an array reference called $self. (Most classes are based on hashrefs and would use $self->{value} instead.) A blessed reference is still a reference and can be treated like one.

In general, -> is Perl’s dereferencey operator, but its exact behavior depends on what follows. If it’s followed by brackets, then it’ll apply the brackets to the thing in the reference: ->{} to index a hash reference, ->[] to index an array reference, and ->() to call a function reference.
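 
All three bracket forms in one place (assuming the use v5.26; boilerplate from earlier so say works):

my $aref = [10, 20, 30];
my $href = { name => 'fred' };
my $cref = sub { return 42 };

say $aref->[1];      # 20   -- index an array reference
say $href->{name};   # fred -- index a hash reference
say $cref->();       # 42   -- call a code reference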

But if -> is followed by an identifier, then it’s a method call. For packages, that means calling a function in the package and passing the package name as the first argument. For objects — blessed references — that means calling a function in the associated package and passing the object as the first argument.

This is a little weird! A blessed reference is a superposition of two things: its normal reference behavior, and some completely orthogonal object behavior. Also, object behavior has no notion of methods vs data; it only knows about methods. Perl lets you omit parentheses in a lot of places, including when calling a method with no arguments, so $vec->magnitude is really $vec->magnitude().

Perl’s blessing bears some similarities to Lua’s metatables, but ultimately Perl is much closer to Ruby’s “message passing” approach than the above three languages’ approaches of “get me something and maybe it’ll be callable”. (But this is no surprise — Ruby is a spiritual successor to Perl 5.)

All of this leads to one little wrinkle: how do you actually expose data? Above, I had to write x and y methods. Am I supposed to do that for every single attribute on my type?

Yes! But don’t worry, there are third-party modules to help with this incredibly fundamental task. Take Class::Accessor::Fast, so named because it’s faster than Class::Accessor:

package Foo;
use base qw(Class::Accessor::Fast);
__PACKAGE__->mk_accessors(qw(fred wilma barney));

(__PACKAGE__ is the lexical name of the current package; qw(...) is a list literal that splits its contents on whitespace.)

This assumes you’re using a hashref with keys of the same names as the attributes. $obj->fred will return the fred key from your hashref, and $obj->fred(4) will change it to 4.
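
If I remember the interface right, Class::Accessor also supplies a basic new that takes a hashref, so usage looks something like:

my $obj = Foo->new({ fred => 1 });
say $obj->fred;    # 1
$obj->fred(4);     # sets the fred key in the hashref
say $obj->fred;    # 4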

You also, somewhat bizarrely, have to inherit from Class::Accessor::Fast. Speaking of which,

Inheritance

Inheritance is done by populating the package-global @ISA array with some number of (string) names of parent packages. Most code instead opts to write use base ...;, which does the same thing. Or, more commonly, use parent ...;, which… also… does the same thing.
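
A minimal sketch (the -norequire flag just stops use parent from trying to load Animal.pm from disk, since both packages live in one file here):

package Animal;
sub new   { my ($class) = @_; return bless {}, $class; }
sub speak { say "some noise"; }

package Dog;
use parent -norequire, 'Animal';

package main;
my $dog = Dog->new;   # 'new' is found on Animal via @ISA
$dog->speak;          # some noise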

Every package implicitly inherits from UNIVERSAL, which can be freely modified by Perl code.

A method can call its superclass method with the SUPER:: pseudo-package:

sub foo {
    my ($self) = @_;
    $self->SUPER::foo;
}

However, this does a depth-first search, which means it almost certainly does the wrong thing when faced with multiple inheritance. For a while the accepted solution involved a third-party module, but Perl eventually grew an alternative you have to opt into: C3, which may be more familiar to you as the order Python uses.

use mro 'c3';

sub foo {
    my ($self) = @_;
    $self->next::method;
}

Offhand, I’m not actually sure how next::method works, seeing as it was originally implemented in pure Perl code. I suspect it involves peeking at the caller’s stack frame. If so, then this is a very different style of customizability from e.g. Python — the MRO was never intended to be pluggable, and the use of a special pseudo-package means it isn’t really, but someone was determined enough to make it happen anyway.

Operator overloading and whatnot

Operator overloading looks a little weird, though really it’s pretty standard Perl.

package MyClass;

use overload '+' => \&_add;

sub _add {
    my ($self, $other, $swap) = @_;
    ...
}

use overload here is a pragma, where “pragma” means “regular-ass module that does some wizardry when imported”.

\&_add is how you get a reference to the _add sub so you can pass it to the overload module. If you just said &_add or _add, that would call it.

And that’s it; you just pass a map of operators to functions to this built-in module. No worry about name clashes or pollution, which is pretty nice. You don’t even have to give references to functions that live in the package, if you don’t want them to clog your namespace; you could put them in another package, or even inline them anonymously.

One especially interesting thing is that Perl lets you overload every operator. Perl has a lot of operators. It considers some math builtins like sqrt and trig functions to be operators, or at least operator-y enough that you can overload them. You can also overload the “file text” operators, such as -e $path to test whether a file exists. You can overload conversions, including implicit conversion to a regex. And most fascinating to me, you can overload dereferencing — that is, the thing Perl does when you say $hashref->{key} to get at the underlying hash. So a single object could pretend to be references of multiple different types, including a subref to implement callability. Neat.

Somewhat related: you can overload basic operators (indexing, etc.) on basic types (not references!) with the tie function, which is designed completely differently and looks for methods with fixed names. Go figure.

You can intercept calls to nonexistent methods by implementing a function called AUTOLOAD, within which the $AUTOLOAD global will contain the name of the method being called. Originally this feature was, I think, intended for loading binary components or large libraries on-the-fly only when needed, hence the name. Offhand I’m not sure I ever saw it used the way __getattr__ is used in Python.
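
A sketch of the usual idiom (the DESTROY guard is there because Perl routes destructor calls through AUTOLOAD too):

package Dynamic;

our $AUTOLOAD;
sub AUTOLOAD {
    my ($self, @args) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;             # strip the package qualifier
    return if $name eq 'DESTROY';
    say "no method '$name'; called with: @args";
}

package main;
my $obj = bless {}, 'Dynamic';
$obj->frobnicate(1, 2);   # no method 'frobnicate'; called with: 1 2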

Is there a way to intercept all method calls? I don’t think so, but it is Perl, so I must be forgetting something.

Actually no one does this any more

Like a decade ago, a council of elder sages sat down and put together a whole whizbang system that covers all of it: Moose.

package Vector;
use Moose;

has x => (is => 'rw', isa => 'Int');
has y => (is => 'rw', isa => 'Int');

sub magnitude {
    my ($self) = @_;
    return sqrt($self->x ** 2 + $self->y ** 2);
}

Moose has its own way to do pretty much everything, and it’s all built on the same primitives. Moose also adds metaclasses, somehow, despite that the underlying model doesn’t actually support them? I’m not entirely sure how they managed that, but I do remember doing some class introspection with Moose and it was much nicer than the built-in way.

(If you’re wondering, the built-in way begins with looking at the hash called %Vector::. No, that’s not a typo.)

I really cannot stress enough just how much stuff Moose does, but I don’t want to delve into it here since Moose itself is not actually the language model.

The Perl philosophy

I hope you can see what I meant with what I first said about Perl, now. It has multiple inheritance with an MRO, but uses the wrong one by default. It has extensive operator overloading, which looks nothing like how inheritance works, and also some of it uses a totally different mechanism with special method names instead. It only understands methods, not data, leaving you to figure out accessors by hand.

There’s 70% of an object system here with a clear general design it was gunning for, but none of the pieces really look anything like each other. It’s weird, in a distinctly Perl way.

The result is certainly flexible, at least! It’s especially cool that you can use whatever kind of reference you want for storage, though even as I say that, I acknowledge it’s no different from simply subclassing list or something in Python. It feels different in Perl, but maybe only because it looks so different.

I haven’t written much Perl in a long time, so I don’t know what the community is like any more. Moose was already ubiquitous when I left, which you’d think would let me say “the community mostly focuses on the stuff Moose can do” — but even a decade ago, Moose could already do far more than I had ever seen done by hand in Perl. It’s always made a big deal out of roles (read: interfaces), for instance, despite that I’d never seen anyone care about them in Perl before Moose came along. Maybe their presence in Moose has made them more popular? Who knows.

Also, I wrote Perl seriously, but in the intervening years I’ve only encountered people who only ever used Perl for one-offs. Maybe it’ll come as a surprise to a lot of readers that Perl has an object model at all.

End

Well, that was fun! I hope any of that made sense.

Special mention goes to Rust, which doesn’t have an object model you can fiddle with at runtime, but does do things a little differently.

It’s been really interesting thinking about how tiny differences make a huge impact on what people do in practice. Take the choice of storage in Perl versus Python. Perl’s massively common URI class uses a string as the storage, nothing else; I haven’t seen anything like that in Python aside from markupsafe, which is specifically designed as a string type. I would guess this is partly because Perl makes you choose — using a hashref is an obvious default, but you have to make that choice one way or the other. In Python (especially 3), inheriting from object and getting dict-based storage is the obvious thing to do; the ability to use another type isn’t quite so obvious, and doing it “right” involves a tiny bit of extra work.

Or, consider that Lua could have descriptors, but the extra bit of work (especially design work) has been enough of an impediment that I’ve never implemented them. I don’t think the object implementations I’ve looked at have included them, either. Super weird!

In that light, it’s only natural that objects would be so strongly associated with the features Java and C++ attach to them. I think that makes it all the more important to play around! Look at what Moose has done. No, really, you should bear in mind my description of how Perl does stuff and flip through the Moose documentation. It’s amazing what they’ve built.

Facebook Fingerprinting Photos to Prevent Revenge Porn

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/11/facebook_finger.html

This is a pilot project in Australia:

Individuals who have shared intimate, nude or sexual images with partners and are worried that the partner (or ex-partner) might distribute them without their consent can use Messenger to send the images to be “hashed.” This means that the company converts the image into a unique digital fingerprint that can be used to identify and block any attempts to re-upload that same image.

I’m not sure I like this. It doesn’t prevent revenge porn in general; it only prevents the same photos being uploaded to Facebook in particular. And it requires the person to send Facebook copies of all their intimate photos.

Facebook will store these images for a short period of time before deleting them to ensure it is enforcing the policy correctly, the company said.

At least there’s that.

More articles.