Tag Archives: libraries

AWS Glue Now Supports Scala Scripts

Post Syndicated from Mehul Shah original https://aws.amazon.com/blogs/big-data/aws-glue-now-supports-scala-scripts/

We are excited to announce AWS Glue support for running ETL (extract, transform, and load) scripts in Scala. Scala lovers can rejoice because they now have one more powerful tool in their arsenal. Scala is the native language for Apache Spark, the underlying engine that AWS Glue offers for performing data transformations.

Beyond its elegant language features, writing Scala scripts for AWS Glue has two main advantages over writing scripts in Python. First, Scala is faster for custom transformations that do a lot of heavy lifting because there is no need to shovel data between Python and Apache Spark’s Scala runtime (that is, the Java virtual machine, or JVM). You can build your own transformations or invoke functions in third-party libraries. Second, it’s simpler to call functions in external Java class libraries from Scala because Scala is designed to be Java-compatible. It compiles to the same bytecode, and its data structures don’t need to be converted.

To illustrate these benefits, we walk through an example that analyzes a recent sample of the GitHub public timeline available from the GitHub archive. This site is an archive of public requests to the GitHub service, recording more than 35 event types ranging from commits and forks to issues and comments.

This post shows how to build an example Scala script that identifies highly negative issues in the timeline. It pulls out issue events in the timeline sample, analyzes their titles using the sentiment prediction functions from the Stanford CoreNLP libraries, and surfaces the most negative issues.

Getting started

Before we start writing scripts, we use AWS Glue crawlers to get a sense of the data—its structure and characteristics. We also set up a development endpoint and attach an Apache Zeppelin notebook, so we can interactively explore the data and author the script.

Crawl the data

The dataset used in this example was downloaded from the GitHub archive website into our sample dataset bucket in Amazon S3, and copied to the following locations:

s3://aws-glue-datasets-<region>/examples/scala-blog/githubarchive/data/

Choose the best folder by replacing <region> with the region that you’re working in, for example, us-east-1. Crawl this folder, and put the results into a database named githubarchive in the AWS Glue Data Catalog, as described in the AWS Glue Developer Guide. This folder contains 12 hours of the timeline from January 22, 2017, and is organized hierarchically (that is, partitioned) by year, month, and day.

When finished, use the AWS Glue console to navigate to the table named data in the githubarchive database. Notice that this data has eight top-level columns, which are common to each event type, and three partition columns that correspond to year, month, and day.

Choose the payload column, and you will notice that it has a complex schema—one that reflects the union of the payloads of event types that appear in the crawled data. Also note that the schema that crawlers generate is a subset of the true schema because they sample only a subset of the data.

Set up the library, development endpoint, and notebook

Next, you need to download and set up the libraries that estimate the sentiment in a snippet of text. The Stanford CoreNLP libraries contain a number of human language processing tools, including sentiment prediction.

Download the Stanford CoreNLP libraries. Unzip the .zip file, and you’ll see a directory full of jar files. For this example, the following jars are required:

  • stanford-corenlp-3.8.0.jar
  • stanford-corenlp-3.8.0-models.jar
  • ejml-0.23.jar

Upload these files to an Amazon S3 path that is accessible to AWS Glue so that it can load these libraries when needed. For this example, they are in s3://glue-sample-other/corenlp/.

Development endpoints are static Spark-based environments that can serve as the backend for data exploration. You can attach notebooks to these endpoints to interactively send commands and explore and analyze your data. These endpoints have the same configuration as that of AWS Glue’s job execution system. So, commands and scripts that work there also work the same when registered and run as jobs in AWS Glue.

To set up an endpoint and a Zeppelin notebook to work with that endpoint, follow the instructions in the AWS Glue Developer Guide. When you are creating an endpoint, be sure to specify the locations of the previously mentioned jars in the Dependent jars path as a comma-separated list. Otherwise, the libraries will not be loaded.

After you set up the notebook server, go to the Zeppelin notebook by choosing Dev Endpoints in the left navigation pane on the AWS Glue console. Choose the endpoint that you created. Next, choose the Notebook Server URL, which takes you to the Zeppelin server. Log in using the notebook user name and password that you specified when creating the notebook. Finally, create a new note to try out this example.

Each notebook is a collection of paragraphs, and each paragraph contains a sequence of commands and the output for that command. Moreover, each notebook includes a number of interpreters. If you set up the Zeppelin server using the console, the (Python-based) pyspark and (Scala-based) spark interpreters are already connected to your new development endpoint, with pyspark as the default. Therefore, throughout this example, you need to prepend %spark at the top of your paragraphs. In this example, we omit these for brevity.

Working with the data

In this section, we use AWS Glue extensions to Spark to work with the dataset. We look at the actual schema of the data and filter out the interesting event types for our analysis.

Start with some boilerplate code to import libraries that you need:

%spark

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext

Then, create the Spark and AWS Glue contexts needed for working with the data:

@transient val spark: SparkContext = SparkContext.getOrCreate()
val glueContext: GlueContext = new GlueContext(spark)

You need the transient decorator on the SparkContext when working in Zeppelin; otherwise, you will run into a serialization error when executing commands.

Dynamic frames

This section shows how to create a dynamic frame that contains the GitHub records in the table that you crawled earlier. A dynamic frame is the basic data structure in AWS Glue scripts. It is like an Apache Spark data frame, except that it is designed and optimized for data cleaning and transformation workloads. A dynamic frame is well-suited for representing semi-structured datasets like the GitHub timeline.

A dynamic frame is a collection of dynamic records. In Spark lingo, it is an RDD (resilient distributed dataset) of DynamicRecords. A dynamic record is a self-describing record. Each record encodes its columns and types, so every record can have a schema that is unique from all others in the dynamic frame. This is convenient and often more efficient for datasets like the GitHub timeline, where payloads can vary drastically from one event type to another.

The following creates a dynamic frame, github_events, from your table:

val github_events = glueContext
                    .getCatalogSource(database = "githubarchive", tableName = "data")
                    .getDynamicFrame()

The getCatalogSource() method returns a DataSource, which represents a particular table in the Data Catalog. The getDynamicFrame() method returns a dynamic frame from the source.

Recall that the crawler created a schema from only a sample of the data. You can scan the entire dataset, count the rows, and print the complete schema as follows:

github_events.count
github_events.printSchema()

The result looks like the following:

The data has 414,826 records. As before, notice that there are eight top-level columns, and three partition columns. If you scroll down, you’ll also notice that the payload is the most complex column.

Run functions and filter records

This section describes how you can create your own functions and invoke them seamlessly to filter records. Unlike filtering with Python lambdas, Scala scripts do not need to convert records from one language representation to another, thereby reducing overhead and running much faster.

Let’s create a function that picks only the IssuesEvents from the GitHub timeline. These events are generated whenever someone posts an issue for a particular repository. Each GitHub event record has a field, “type”, that indicates the kind of event it is. The issueFilter() function returns true for records that are IssuesEvents.

def issueFilter(rec: DynamicRecord): Boolean = { 
    rec.getField("type").exists(_ == "IssuesEvent") 
}

Note that the getField() method returns an Option[Any] type, so you first need to check that it exists before checking the type.

You pass this function to the filter transformation, which applies the function on each record and returns a dynamic frame of those records that pass.

val issue_events =  github_events.filter(issueFilter)

Now, let’s look at the size and schema of issue_events.

issue_events.count
issue_events.printSchema()

It’s much smaller (14,063 records), and the payload schema is less complex, reflecting only the schema for issues. Keep a few essential columns for your analysis, and drop the rest using the ApplyMapping() transform:

val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                 ("actor.login", "string", "actor", "string"), 
                                                 ("repo.name", "string", "repo", "string"),
                                                 ("payload.action", "string", "action", "string"),
                                                 ("payload.issue.title", "string", "title", "string")))
issue_titles.show()

The ApplyMapping() transform is quite handy for renaming columns, casting types, and restructuring records. The preceding code snippet tells the transform to select the fields (or columns) that are enumerated in the left half of the tuples and map them to the fields and types in the right half.

Estimating sentiment using Stanford CoreNLP

To focus on the most pressing issues, you might want to isolate the records with the most negative sentiments. The Stanford CoreNLP libraries are Java-based and offer sentiment-prediction functions. Accessing these functions through Python is possible, but quite cumbersome. It requires creating Python surrogate classes and objects for those found on the Java side. Instead, with Scala support, you can use those classes and objects directly and invoke their methods. Let’s see how.

First, import the libraries needed for the analysis:

import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

The Stanford CoreNLP libraries have a main driver that orchestrates all of their analysis. The driver setup is heavyweight, setting up threads and data structures that are shared across analyses. Apache Spark runs on a cluster with a main driver process and a collection of backend executor processes that do most of the heavy sifting of the data.

The Stanford CoreNLP shared objects are not serializable, so they cannot be distributed easily across a cluster. Instead, you need to initialize them once for every backend executor process that might need them. Here is how to accomplish that:

val props = new Properties()
props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
props.setProperty("parse.maxlen", "70")

object myNLP {
    lazy val coreNLP = new StanfordCoreNLP(props)
}

The properties tell the libraries which annotators to execute and how many words to process. The preceding code creates an object, myNLP, with a field coreNLP that is lazily evaluated. This field is initialized only when it is needed, and only once. So, when the backend executors start processing the records, each executor initializes the driver for the Stanford CoreNLP libraries only one time.

Next is a function that estimates the sentiment of a text string. It first calls Stanford CoreNLP to annotate the text. Then, it pulls out the sentences and takes the average sentiment across all the sentences. The sentiment is a double, from 0.0 as the most negative to 4.0 as the most positive.

def estimatedSentiment(text: String): Double = {
    if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
    val annotations = myNLP.coreNLP.process(text)
    val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
    sentences.foldLeft(0.0)( (csum, x) => { 
        csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
    }) / sentences.length
}

Now, let’s estimate the sentiment of the issue titles and add that computed field as part of the records. You can accomplish this with the map() method on dynamic frames:

val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
    val mbody = rec.getField("title")
    mbody match {
        case Some(mval: String) => { 
            rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
            rec }
        case _ => rec
    }
})

The map() method applies the user-provided function on every record. The function takes a DynamicRecord as an argument and returns a DynamicRecord. The code above computes the sentiment, adds it in a top-level field, sentiment, to the record, and returns the record.

Count the records with sentiment and show the schema. This takes a few minutes because Spark must initialize the library and run the sentiment analysis, which can be involved.

issue_sentiments.count
issue_sentiments.printSchema()

Notice that all records were processed (14,063), and the sentiment value was added to the schema.

Finally, let’s pick out the titles that have the lowest sentiment (less than 1.5). Count them and print out a sample to see what some of the titles look like.

val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))
pressing_issues.count
pressing_issues.show(10)

Next, write them all to a file so that you can handle them later. (You’ll need to replace the output path with your own.)

glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions("""{"path": "s3://<bucket>/out/path/"}"""), 
                              format = "json")
            .writeDynamicFrame(pressing_issues)

Take a look in the output path, and you can see the output files.

Putting it all together

Now, let’s create a job from the preceding interactive session. The following script combines all the commands from earlier. It processes the GitHub archive files and writes out the highly negative issues:

import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import com.amazonaws.services.glue.util.Job
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.types._
import org.apache.spark.SparkContext
import java.util.Properties
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations
import scala.collection.convert.wrapAll._

object GlueApp {

    object myNLP {
        val props = new Properties()
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
        props.setProperty("parse.maxlen", "70")

        lazy val coreNLP = new StanfordCoreNLP(props)
    }

    def estimatedSentiment(text: String): Double = {
        if ((text == null) || (!text.nonEmpty)) { return Double.NaN }
        val annotations = myNLP.coreNLP.process(text)
        val sentences = annotations.get(classOf[CoreAnnotations.SentencesAnnotation])
        sentences.foldLeft(0.0)( (csum, x) => { 
            csum + RNNCoreAnnotations.getPredictedClass(x.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])) 
        }) / sentences.length
    }

    def main(sysArgs: Array[String]) {
        val spark: SparkContext = SparkContext.getOrCreate()
        val glueContext: GlueContext = new GlueContext(spark)

        val dbname = "githubarchive"
        val tblname = "data"
        val outpath = "s3://<bucket>/out/path/"

        val github_events = glueContext
                            .getCatalogSource(database = dbname, tableName = tblname)
                            .getDynamicFrame()

        val issue_events =  github_events.filter((rec: DynamicRecord) => {
            rec.getField("type").exists(_ == "IssuesEvent")
        })

        val issue_titles = issue_events.applyMapping(Seq(("id", "string", "id", "string"),
                                                         ("actor.login", "string", "actor", "string"), 
                                                         ("repo.name", "string", "repo", "string"),
                                                         ("payload.action", "string", "action", "string"),
                                                         ("payload.issue.title", "string", "title", "string")))

        val issue_sentiments = issue_titles.map((rec: DynamicRecord) => { 
            val mbody = rec.getField("title")
            mbody match {
                case Some(mval: String) => { 
                    rec.addField("sentiment", ScalarNode(estimatedSentiment(mval)))
                    rec }
                case _ => rec
            }
        })

        val pressing_issues = issue_sentiments.filter(_.getField("sentiment").exists(_.asInstanceOf[Double] < 1.5))

        glueContext.getSinkWithFormat(connectionType = "s3", 
                              options = JsonOptions(s"""{"path": "$outpath"}"""), 
                              format = "json")
                    .writeDynamicFrame(pressing_issues)
    }
}

Notice that the script is enclosed in a top-level object called GlueApp, which serves as the script’s entry point for the job. (You’ll need to replace the output path with your own.) Upload the script to an Amazon S3 location so that AWS Glue can load it when needed.

To create the job, open the AWS Glue console. Choose Jobs in the left navigation pane, and then choose Add job. Create a name for the job, and specify a role with permissions to access the data. Choose An existing script that you provide, and choose Scala as the language.

For the Scala class name, type GlueApp to indicate the script’s entry point. Specify the Amazon S3 location of the script.

Choose Script libraries and job parameters. In the Dependent jars path field, enter the Amazon S3 locations of the Stanford CoreNLP libraries from earlier as a comma-separated list (without spaces). Then choose Next.

No connections are needed for this job, so choose Next again. Review the job properties, and choose Finish. Finally, choose Run job to execute the job.

You can simply edit the script’s input table and output path to run this job on whatever GitHub timeline datasets that you might have.

Conclusion

In this post, we showed how to write AWS Glue ETL scripts in Scala via notebooks and how to run them as jobs. Scala has the advantage that it is the native language for the Spark runtime. With Scala, it is easier to call Scala or Java functions and third-party libraries for analyses. Moreover, data processing is faster in Scala because there’s no need to convert records from one language runtime to another.

You can find more example of Scala scripts in our GitHub examples repository: https://github.com/awslabs/aws-glue-samples. We encourage you to experiment with Scala scripts and let us know about any interesting ETL flows that you want to share.

Happy Glue-ing!

 


Additional Reading

If you found this post useful, be sure to check out Simplify Querying Nested JSON with the AWS Glue Relationalize Transform and Genomic Analysis with Hail on Amazon EMR and Amazon Athena.

 


About the Authors

Mehul Shah is a senior software manager for AWS Glue. His passion is leveraging the cloud to build smarter, more efficient, and easier to use data systems. He has three girls, and, therefore, he has no spare time.

 

 

 

Ben Sowell is a software development engineer at AWS Glue.

 

 

 

 
Vinay Vivili is a software development engineer for AWS Glue.

 

 

 

Security updates for New Year’s day

Post Syndicated from ris original https://lwn.net/Articles/742498/rss

Security updates have been issued by Debian (asterisk, gimp, thunderbird, and wireshark), Fedora (global, python-mistune, and thunderbird-enigmail), Mageia (apache, bind, emacs, ffmpeg, freerdp, gdk-pixbuf2.0, gstreamer0.10-plugins-bad/gstreamer1.0-plugins-bad, gstreamer0.10-plugins-ugly, gstreamer0.10-plugins-ugly/gstreamer1.0-plugins-ugly, gstreamer1.0-plugins-bad, heimdal, icu, ipsec-tools, jasper, kdebase4-runtime, ldns, libvirt, mupdf, ncurses, openjpeg2, openssh, python/python3, ruby, ruby-RubyGems, shotwell, thunderbird, webkit2, and X11 client libraries), openSUSE (gdk-pixbuf and phpMyAdmin), and SUSE (java-1_7_1-ibm).

How to Encrypt Amazon S3 Objects with the AWS SDK for Ruby

Post Syndicated from Doug Schwartz original https://aws.amazon.com/blogs/security/how-to-encrypt-amazon-s3-objects-with-the-aws-sdk-for-ruby/

AWS KMS image

Recently, Amazon announced some new Amazon S3 encryption and security features. The AWS Blog post showed how to use the Amazon S3 console to take advantage of these new features. However, if you have a large number of Amazon S3 buckets, using the console to implement these features could take hours, if not days. As an alternative, I created documentation topics in the AWS SDK for Ruby Developer Guide that include code examples showing you how to use the new Amazon S3 encryption features using the AWS SDK for Ruby.

What are my encryption options?

You can encrypt Amazon S3 bucket objects on a server or on a client:

  • When you encrypt objects on a server, you request that Amazon S3 encrypt the objects before saving them to disk in data centers and decrypt the objects when you download them. The main advantage of this approach is that Amazon S3 manages the entire encryption process.
  • When you encrypt objects on a client, you encrypt the objects before you upload them to Amazon S3. In this case, you manage the encryption process, the encryption keys, and related tools. Use this option when:
    • Company policy and standards require it.
    • You already have a development process in place that meets your needs.

    Encrypting on the client has always been available, but you should know the following points:

    • You must be diligent about protecting your encryption keys, which is analogous to having a burglar-proof lock on your front door. If you leave a key under the mat, your security is compromised.
    • If you lose your encryption keys, you won’t be able to decrypt your data.

    If you encrypt objects on the client, we strongly recommend that you use an AWS Key Management Service (AWS KMS) managed customer master key (CMK)

How to use encryption on a server

You can specify that Amazon S3 automatically encrypts objects as you upload them to a bucket or require that objects uploaded to an Amazon S3 bucket include encryption on a server before they are uploaded to an Amazon S3 bucket.

The advantage of these settings is that when you specify them, you ensure that objects uploaded to Amazon S3 are encrypted. Alternatively, you can have Amazon S3 encrypt individual objects on the server as you upload them to a bucket or encrypt them on the server with your own key as you upload them to a bucket.

The AWS SDK for Ruby Developer Guide now contains the following topics that explain your encryption options on a server:

How to use encryption on a client

You can encrypt objects on a client before you upload them to a bucket and decrypt them after you download them from a bucket by using the Amazon S3 encryption client.

The AWS SDK for Ruby Developer Guide now contains the following topics that explain your encryption options on the client:

Note: The Amazon S3 encryption client in the AWS SDK for Ruby is compatible with other Amazon S3 encryption clients, but it is not compatible with other AWS client-side encryption libraries, including the AWS Encryption SDK and the Amazon DynamoDB encryption client for Java. Each library returns a different ciphertext (“encrypted message”) format, so you can’t use one library to encrypt objects and a different library to decrypt them. For more information, see Protecting Data Using Client-Side Encryption.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about encrypting objects on servers and clients, start a new thread on the Amazon S3 forum or contact AWS Support.

– Doug

Gamers Want DMCA Exemption for ‘Abandoned’ Online Games

Post Syndicated from Ernesto original https://torrentfreak.com/gamers-want-dmca-exemption-for-abandoned-online-games-171221/

The U.S. Copyright Office is considering whether or not to update the DMCA’s anti-circumvention provisions, which prevent the public from tinkering with DRM-protected content and devices.

These provisions are renewed every three years. To allow individuals and organizations to chime in, the Office traditionally launches a public consultation, before it makes any decisions.

This week a series of new responses were received and many of these focused on abandoned games. As is true for most software, games have a limited lifespan, so after a few years they are no longer supported by manufacturers.

To preserve these games for future generations and nostalgic gamers, the Copyright Office previously included game preservation exemptions. This means that libraries, archives and museums can use emulators and other circumvention tools to make old classics playable.

However, these exemptions are limited and do not apply to games that require a connection to an online server, which includes most recent games. When the online servers are taken down, the game simply disappears forever.

This should be prevented, according to The Museum of Art and Digital Entertainment (the MADE), a nonprofit organization operating in California.

“Although the Current Exemption does not cover it, preservation of online video games is now critical,” MADE writes in its comment to the Copyright Office.

“Online games have become ubiquitous and are only growing in popularity. For example, an estimated fifty-three percent of gamers play multiplayer games at least once a week, and spend, on average, six hours a week playing with others online.”

During the previous review, similar calls for an online exemption were made but, at the time, the Register of Copyrights noted that multiplayer games could still be played on local area networks.

“Today, however, local multiplayer options are increasingly rare, and many games no longer support LAN connected multiplayer capability,” MADE counters, adding that nowadays even some single-player games require an online connection.

“More troubling still to archivists, many video games rely on server connectivity to function in single-player mode and become unplayable when servers shut down.”

MADE asks the Copyright Office to extend the current exemptions and include games with an online connection as well. This would allow libraries, archives, and museums to operate servers for these abandoned games and keep them alive.

The nonprofit museum is not alone in its call, with digital rights group Public Knowledge submitting a similar comment. They also highlight the need to preserve online games. Not just for nostalgic gamers, but also for researchers and scholars.

This issue is more relevant than ever before, as hundreds of online multiplayer games have been abandoned already.

“It is difficult to quantify the number of multiplayer servers that have been shut down in recent years. However, Electronic Arts’ ‘Online Services Shutdown’ list is one illustrative example,” Public Knowledge writes.

“The list — which is littered with popular franchises such as FIFA World Cup, Nascar, and The Sims — currently stands at 319 games and servers discontinued since 2013, or just over one game per week since 2012.”

Finally, several ‘regular’ gaming fans have also made their feelings known. While their arguments are usually not as elaborate, the personal pleasure people still get out of older games can’t be overstated.

“I have been playing video games since the Atari 2600, for 35 years. Nowadays, game ‘museums’ — getting the opportunity to replay games from my youth, and share them with my child — are a source of joy for me,” one individual commenter wrote.

“I would love the opportunity to explore some of the early online / MMO games that I spent so much time on in the past!”

Game on?

Header image via MMOs.com

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

MQTT 5: Is it time to upgrade to MQTT 5 yet?

Post Syndicated from The HiveMQ Team original https://www.hivemq.com/blog/mqtt-5-time-to-upgrade-yet/

MQTT 5 - Is it time to upgrade yet?

Is it time to upgrade to MQTT 5 yet?

Welcome to this week’s blog post! After last week’s Introduction to MQTT 5, many readers wondered when the successor to MQTT 3.1.1 is ready for prime time and can be used in future and existing projects.

Before we try to answer the question in more detail, we’d love to hear your thoughts about upgrading to MQTT 5. We prepared a small survey below. Let us know how your MQTT 5 upgrading plans are!

The MQTT 5 OASIS Standard

As of late December 2017, the MQTT 5 specification is not available as an official “Committee Specification” yet. In other words: MQTT 5 is not available yet officially. The foundation for every implementation of the standard is, that the Technical Committee at OASIS officially releases the standard.

The good news: Although no official version of the standard is available yet, fundamental changes to the current state of the specification are not expected.
The Public Review phase of the “Committee Specification Draft 2” finished without any major comments or issues. We at HiveMQ expect the MQTT 5 standard to be released in very late December 2017 or January 2018.

Current state of client libraries

To start using MQTT 5, you need two participants: An MQTT 5 client library implementation in your programming language(s) of choice and an MQTT 5 broker implementation (like HiveMQ). If both components support the new standard, you are good to go and can use the new version in your projects.

When it comes to MQTT libraries, Eclipse Paho is the one-stop shop for MQTT clients in most programming languages. A recent Paho mailing list entry stated that Paho plans to release MQTT 5 client libraries end of June 2018 for the following programming languages:

  • C (+ embedded C)
  • Java
  • Go
  • C++

If you’re feeling adventurous, at least the Java Paho client has preliminary MQTT 5 support available. You can play around with the API and get a feel about the upcoming Paho version. Just build the library from source and test it, but be aware that this is not safe for production use.

There is also a very basic test broker implementation available at Eclipse Paho which can be used for playing around. This is of course only for very basic tests and does not support all MQTT 5 features yet. If you’re planning to write your own library, this may be a good tool to test your implementation against.

There are of other individual MQTT library projects which may be worth to check out. As of December 2017 most of these libraries don’t have an MQTT 5 roadmap published yet.

HiveMQ and MQTT 5

You can’t use the new version of the MQTT protocol only by having a client that is ready for MQTT 5. The counterpart, the MQTT broker, also needs to fully support the new protocol version. At the time of writing, no broker is MQTT 5 ready yet.

HiveMQ was the first broker to fully support version 3.1.1 of MQTT and of course here at HiveMQ we are committed to give our customers the advantage of the new features of version 5 of the protocol as soon as possible and viable.

We are going to provide an Early Access version of the upcoming HiveMQ generation with MQTT 5 support by May/June 2018. If you’re a library developer or want to go live with the new protocol version as soon as possible: The Early Access version is for you. Add yourself to the Early Access Notification List and we’ll notify you when the Early Access version is available.

We expect to release the upcoming HiveMQ generation in the third quarter of 2018 with full support of ALL MQTT 5 features at scale in an interoperable way with previous MQTT versions.

When is MQTT 5 ready for prime time?

MQTT is typically used in mission critical environments where it’s not acceptable that parts of the infrastructure, broker or client, are unreliable or have some rough edges. So it’s typically not advisable to be the very first to try out new things in a critical production environment.

Here at HiveMQ we expect that the first users will go live to production in late Q3 (September) 2018 and in the subsequent months. After the releases of the Paho library in June and the HiveMQ Early Access version, the adoption of MQTT 5 is expected to increase rapidly.

So, is MQTT 5 ready for prime time yet (as of December 2017)? No.

Will the new version of the protocol be suitable for production environments in the second half of 2018: Yes, definitely.

Upcoming topics in this series

We will continue this blog post series in January after the European Christmas Holidays. To kick-off the technical part of the series, we will take a look at the foundational changes of the MQTT protocol. And after that, we will release one blog post per week that will thoroughly review and inspect one new feature in detail together with best practices and fun trivia.

If you want us to send the next and all upcoming articles directly into your inbox, just use the newsletter subscribe form below.

P.S. Don’t forget to let us know if MQTT 5 is of interest for you by participating in this quick poll.

Have an awesome week,
The HiveMQ Team

[$] HarfBuzz brings professional typography to the desktop

Post Syndicated from jake original https://lwn.net/Articles/741722/rss

By their nature, low-level libraries go mostly unnoticed by users and
even some programmers. Usually, they are only noticed when something goes
wrong. However, HarfBuzz
deserves to be an exception. Not only does the adoption of HarfBuzz mean
that free
software’s ability to convert Unicode
characters to a font’s specific glyphs is as advanced as any proprietary
equivalent, but its increasing use means that professional typography can
now be done from the Linux desktop as easily as at a print shop.

Digital making for new parents

Post Syndicated from Carrie Anne Philbin original https://www.raspberrypi.org/blog/digital-making-for-new-parents/

Solving problems that are meaningful to us is at the core of our approach to teaching and learning about technology here at the Raspberry Pi Foundation. Over the last eight months, I’ve noticed that the types of digital making projects that motivate and engage me have changed (can’t think why). Always looking for ways to save money and automate my life and the lives of my loved ones, I’ve been thinking a lot about how digital making projects could be the new best friend of any new parent.

A baby, oblivious to the amount its parents have spent on stuff they never knew existed last year.
Image: sweet baby by MRef photography / CC BY-ND 2.0

Baby Monitor

I never knew how much equipment one small child needs until very recently. I also had no idea of the range of technology that is on offer to support you as a new parent to ensure the perfect environment outside of the womb. Baby monitors are at the top of this list. There are lots of Raspberry Pi baby monitor projects with a range of sensing functionality already in existence, and we’ve blogged about some of them before. They’re a great example of how an understanding of technology can open up a range of solutions that won’t break the bank. I’m looking forward to using all the capabilities of the Raspberry Pi to keep an eye on baby.

Baby name generator

Another surprising discovery was just how difficult it is to name a human being. Surprising because I can give a name to an inanimate object in less than three seconds, and come up with nicknames for colleagues in less than a day. My own offspring, though, and I draw a blank. The only solution: write a Python program to randomly generate names based on some parameters!

import names
from time import sleep
from guizero import App, ButtonGroup, Text, PushButton, TextBox

def get_name():
    boyname = names.get_first_name(gender='male')
    girlname = names.get_first_name(gender='female')
    othername = names.get_first_name()

    if babygender.get() == "male":
        name.set(str(boyname)+" "+str(babylastname.get()))
    elif babygender.get() == "female":
        name.set(str(girlname)+" "+str(babylastname.get()))
    else:
        name.set(str(othername)+" "+str(babylastname.get()))

app = App("Baby name generator")
surname_label = Text(app, "What is your surname?")
babylastname = TextBox(app, width=50)
babygender = ButtonGroup(app, options=[["boy", "male"], ["girl", "female"], ["all", "all"]], selected="male", horizontal=True)
intro = Text(app, "Your baby name could be")
name = Text(app, "")
button = PushButton(app, get_name, text="Generate me a name")

app.display()

Thanks to the names and GUIZero Python libraries, it is super simple to create, resolving any possible parent-to-be naming disputes in mere minutes.

Food, Poo, or Love?

I love data. Not just in Star Trek, but also more generally. Collecting and analysing data to understand my sleep patterns, my eating habits, how much exercise I do, and how much time I spend watching YouTube videos consumes much of my time. So of course I want to know lots about the little person we’ve made, long before he can use language to tell us himself.

I’m told that most newborns’ needs are quite simple: they want food, they want to be changed, or they just want some cuddles. I’m certain it’s more complicated than this, but it’s a good starting point for a data set, so stick with me here. I also wondered whether there might be a correlation between the amplitude of the cry and the type of need the baby has. A bit of an imprecise indicator, maybe, but fun to start to think about.

This build’s success is mostly thanks to Pimoroni’s Rainbow HAT, which, conveniently, has three capacitive touch buttons to record the newborn’s need, four fourteen-segment displays to display the words “FOOD”, “POO”, and “LOVE” when a button is pressed, and seven multicoloured LEDs to indicate the ferociousness of the baby’s cry in glorious technicolour. With the addition of a microphone, the ‘Food, Poo, Love Machine’ was born. Here it is in action:

Food Poo Love – Raspberry Pi Baby Monitor Project

Food Poo Love – The Raspberry Pi baby monitor project that allows you to track data on your new born baby.

Automatic Baby mobile

Another project that I’ve not had time to hack on, but that I think would be really awesome, is to automate a baby cot mobile. Imagine this one moving to the Star Trek theme music:

Image courtesy of Gisele Blaker Designs (check out her cool shop!)

Pretty awesome.

If you’ve got any more ideas for baby projects, do let me know. I’ll have a few months of nothing to do… right?

The post Digital making for new parents appeared first on Raspberry Pi.

Google & Facebook Excluded From Aussie Safe Harbor Copyright Amendments

Post Syndicated from Andy original https://torrentfreak.com/google-facebook-excluded-from-aussie-safe-harbor-copyright-amendments-171205/

Due to a supposed drafting error in Australia’s implementation of the Australia – US Free Trade Agreement (AUSFTA), copyright safe harbor provisions currently only apply to commercial Internet service providers.

This means that while local ISPs such as Telstra receive protection from copyright infringement complaints, services such as Google, Facebook and YouTube face legal uncertainty.

Proposed amendments to the Copyright Act earlier this year would’ve seen enhanced safe harbor protections for such platforms but they were withdrawn at the eleventh hour so that the government could consider “further feedback” from interested parties.

Shortly after the government embarked on a detailed consultation with entertainment industry groups. They accuse platforms like YouTube of exploiting safe harbor provisions in the US and Europe, which forces copyright holders into an expensive battle to have infringing content taken down. They do not want that in Australia and at least for now, they appear to have achieved their aims.

According to a report from AFR (paywall), the Australian government is set to introduce new legislation Wednesday which will expand safe harbors for some organizations but will exclude companies such as Google, Facebook, and similar platforms.

Communications Minister Mitch Fifield confirmed the exclusions while noting that additional safeguards will be available to institutions, libraries, and organizations in the disability, archive and culture sectors.

“The measures in the bill will ensure these sectors are protected from legal liability where they can demonstrate that they have taken reasonable steps to deal with copyright infringement by users of their online platforms,” Senator Fifield told AFR.

“Extending the safe harbor scheme in this way will provide greater certainty to institutions in these sectors and enhance their ability to provide more innovative and creative services for all Australians.”

According to the Senator, the government will continue its work with stakeholders to further reform safe harbor provisions, before applying them to other service providers.

The news that Google, Facebook, and similar platforms are to be denied access to the new safe harbor rules will be seen as a victory for rightsholders. They’re desperately trying to tighten up legislation in other regions where such safeguards are already in place, arguing that platforms utilizing user-generated content for profit should obtain appropriate licensing first.

This so-called ‘Value Gap’ (1,2,3) and associated proactive filtering proposals are among the hottest copyright topics right now, generating intense debate across Europe and the United States.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Glenn’s Take on re:Invent 2017 – Part 3

Post Syndicated from Glenn Gore original https://aws.amazon.com/blogs/architecture/glenns-take-on-reinvent-2017-part-3/

Glenn Gore here, Chief Architect for AWS. I was in Las Vegas last week — with 43K others — for re:Invent 2017. I checked in to the Architecture blog here and here with my take on what was interesting about some of the bigger announcements from a cloud-architecture perspective.

In the excitement of so many new services being launched, we sometimes overlook feature updates that, while perhaps not as exciting as Amazon DeepLens, have significant impact on how you architect and develop solutions on AWS.

Amazon DynamoDB is used by more than 100,000 customers around the world, handling over a trillion requests every day. From the start, DynamoDB has offered high availability by natively spanning multiple Availability Zones within an AWS Region. As more customers started building and deploying truly-global applications, there was a need to replicate a DynamoDB table to multiple AWS Regions, allowing for read/write operations to occur in any region where the table was replicated. This update is important for providing a globally-consistent view of information — as users may transition from one region to another — or for providing additional levels of availability, allowing for failover between AWS Regions without loss of information.

There are some interesting concurrency-design aspects you need to be aware of and ensure you can handle correctly. For example, we support the “last writer wins” reconciliation where eventual consistency is being used and an application updates the same item in different AWS Regions at the same time. If you require strongly-consistent read/writes then you must perform all of your read/writes in the same AWS Region. The details behind this can be found in the DynamoDB documentation. Providing a globally-distributed, replicated DynamoDB table simplifies many different use cases and allows for the logic of replication, which may have been pushed up into the application layers to be simplified back down into the data layer.

The other big update for DynamoDB is that you can now back up your DynamoDB table on demand with no impact to performance. One of the features I really like is that when you trigger a backup, it is available instantly, regardless of the size of the table. Behind the scenes, we use snapshots and change logs to ensure a consistent backup. While backup is instant, restoring the table could take some time depending on its size and ranges — from minutes to hours for very large tables.

This feature is super important for those of you who work in regulated industries that often have strict requirements around data retention and backups of data, which sometimes limited the use of DynamoDB or required complex workarounds to implement some sort of backup feature in the past. This often incurred significant, additional costs due to increased read transactions on their DynamoDB tables.

Amazon Simple Storage Service (Amazon S3) was our first-released AWS service over 11 years ago, and it proved the simplicity and scalability of true API-driven architectures in the cloud. Today, Amazon S3 stores trillions of objects, with transactional requests per second reaching into the millions! Dealing with data as objects opened up an incredibly diverse array of use cases ranging from libraries of static images, game binary downloads, and application log data, to massive data lakes used for big data analytics and business intelligence. With Amazon S3, when you accessed your data in an object, you effectively had to write/read the object as a whole or use the range feature to retrieve a part of the object — if possible — in your individual use case.

Now, with Amazon S3 Select, an SQL-like query language is used that can work with delimited text and JSON files, as well as work with GZIP compressed files. We don’t support encryption during the preview of Amazon S3 Select.

Amazon S3 Select provides two major benefits:

  • Faster access
  • Lower running costs

Serverless Lambda functions, where every millisecond matters when you are being charged, will benefit greatly from Amazon S3 Select as data retrieval and processing of your Lambda function will experience significant speedups and cost reductions. For example, we have seen 2x speed improvement and 80% cost reduction with the Serverless MapReduce code.

Other AWS services such as Amazon Athena, Amazon Redshift, and Amazon EMR will support Amazon S3 Select as well as partner offerings including Cloudera and Hortonworks. If you are using Amazon Glacier for longer-term data archival, you will be able to use Amazon Glacier Select to retrieve a subset of your content from within Amazon Glacier.

As the volume of data that can be stored within Amazon S3 and Amazon Glacier continues to scale on a daily basis, we will continue to innovate and develop improved and optimized services that will allow you to work with these magnificently-large data sets while reducing your costs (retrieval and processing). I believe this will also allow you to simplify the transformation and storage of incoming data into Amazon S3 in basic, semi-structured formats as a single copy vs. some of the duplication and reformatting of data sometimes required to do upfront optimizations for downstream processing. Amazon S3 Select largely removes the need for this upfront optimization and instead allows you to store data once and process it based on your individual Amazon S3 Select query per application or transaction need.

Thanks for reading!

Glenn contemplating why CSV format is still relevant in 2017 (Italy).

Could a Single Copyright Complaint Kill Your Domain?

Post Syndicated from Andy original https://torrentfreak.com/could-a-single-copyright-complaint-kill-your-domain-171203/

It goes without saying that domain names are a crucial part of any site’s infrastructure. Without domains, sites aren’t easily findable and when things go wrong, the majority of web users could be forgiven for thinking that they no longer exist.

That was the case last week when Canada-based mashup site Sowndhaus suddenly found that its domain had been rendered completely useless. As previously reported, the site’s domain was suspended by UK-based registrar DomainBox after it received a copyright complaint from the IFPI.

There are a number of elements to this story, not least that the site’s operators believe that their project is entirely legal.

“We are a few like-minded folks from the mashup community that were tired of doing the host dance – new sites welcome us with open arms until record industry pressure becomes too much and they mass delete and ban us,” a member of the Sowndhaus team informs TF.

“After every mass deletion there are a wave of producers that just retire and their music is lost forever. We decided to make a more permanent home for ourselves and Canada’s Copyright Modernization Act gave us the opportunity to do it legally.
We just want a small quiet corner of the internet where we can make music without being criminalized. It seems insane that I even have to say that.”

But while these are all valid concerns for the Sowndhaus community, there is a bigger picture here. There is absolutely no question that sites like YouTube and Soundcloud host huge libraries of mashups, yet somehow they hang on to their domains. Why would DomainBox take such drastic action? Is the site a real menace?

“The IFPI have sent a few standard DMCA takedown notices [to Sowndhaus, indirectly], each about a specific track or tracks on our server, asking us to remove them and any infringing activity. Every track complained about has been transformative, either a mashup or a remix and in a couple of cases cover versions,” the team explains.

But in all cases, it appears that IFPI and its agents didn’t take the time to complain to the site first. They instead went for the site’s infrastructure.

“[IFPI] have never contacted us directly, even though we have a ‘report copyright abuse’ feature on our site and a dedicated copyright email address. We’ve only received forwarded emails from our host and domain registrar,” the site says.

Sowndhaus believes that the event that led to the domain suspension was caused by a support ticket raised by the “RiskIQ Incident Response Team”, who appear to have been working on behalf of IFPI.

“We were told by DomainBox…’Please remove the unlawful content from your website, or the domain will be suspended. Please reply within the next 5 working days to ensure the request was actioned’,” Sowndhaus says.

But they weren’t given five days, or even one. DomainBox chose to suspend the Sowndhaus.com domain name immediately, rendering the site inaccessible and without even giving the site a chance to respond.

“They didn’t give us an option to appeal the decision. They just took the IFPI’s word that the files were unlawful and must be removed,” the site informs us.

Intrigued at why DomainBox took the nuclear option, TorrentFreak sent several emails to the company but each time they went unanswered. We also sent emails to Mesh Digital Ltd, DomainBox’s operator, but they were given the same treatment.

We wanted to know on what grounds the registrar suspended the domain but perhaps more importantly, we wanted to know if the company is as aggressive as this with its other customers.

To that end we posed a question: If DomainBox had been entrusted with the domains of YouTube or Soundcloud, would they have acted in the same manner? We can’t put words in their mouth but it seems likely that someone in the company would step in to avoid a PR disaster on that scale.

Of course, both YouTube and Soundcloud comply with the law by taking down content when it infringes someone’s rights. It’s a position held by Sowndhaus too, even though they do not operate in the United States.

“We comply fully with the Copyright Act (Canada) and have our own policy of removing any genuinely infringing content,” the site says, adding that users who infringe are banned from the platform.

While there has never been any suggestion that IFPI or its agents asked for Sowndhaus’ domain to be suspended, it’s clear that DomainBox made a decision to do just that. In some cases that might have been warranted, but registrars should definitely aim for a clear, transparent and fair process, so that the facts can be reviewed and appropriate action taken.

It’s something for people to keep in mind when they register a domain in future.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Netflix Is Not Going to Kill Piracy, Research Suggests

Post Syndicated from Ernesto original https://torrentfreak.com/netflix-not-going-kill-piracy-research-suggests-171129/

There is little doubt that, in many countries, Netflix has become the standard for watching movies on the Internet.

Generally speaking, on-demand streaming services are convenient alternatives to piracy. However, millions of people stick to their old pirate habits, Netflix subscription or not.

Intrigued by this interplay of legal and unauthorized viewing, researchers from Carnegie Mellon University and Universidade Católica Portuguesa carried out an extensive study. They partnered with a major telco, which is not named, to analyze if BitTorrent downloading habits can be changed by offering legal alternatives.

The researchers used a piracy-tracking firm to get a sample of thousands of BitTorrent pirates at the associated ISP. Half of them were offered a free 45-day subscription to a premium TV and movies package, allowing them to watch popular content on demand.

To measure the effects of video-on-demand access on piracy, the researchers then monitored the legal viewing activity and BitTorrent transfers of the people who received the free offer, comparing it to a control group. The results show that piracy is harder to beat than some would expect.

Subscribers who received the free subscription watched more TV, but overall their torrenting habits didn’t change significantly.

“We find that, on average, households that received the gift increased overall TV consumption by 4.6% and reduced Internet downloads and uploads by 4.2% and 4.5%, respectively. However, and also on average, treated households did not change their likelihood of using BitTorrent during the experiment,” the researchers write.

One of the main problems was that these ‘pirates’ couldn’t get all their favorite shows and movies on the legal service, which is a common problem. For the small portion of subscribers who had access to their preferred content, the researchers did find an effect on torrent traffic.

“Households with preferences aligned with the gifted content reduced their probability of using BitTorrent during the experiment by 18% and decreased their amount of upload traffic by 45%,” the paper reads.

The video-on-demand service in the study had an average “fit” of just 12% with people’s viewing preferences, which means that they were missing a lot of content. But even Netflix, which has a library of thousands of titles, only has a fit of roughly 50%.

The researchers show that the lack of availability is partly caused by licensing windows, which makes it hard for legal video streaming services to compete with piracy.

“We show that licensing windows impose significant restrictions on the content that can be included in SVoD catalogs, which hampers the ability of content distributors to offer catalogs that cater to the preferences of pirates,” they write.

However, even if more content became available, piracy wouldn’t magically disappear. In the experiment, subscribers were offered free access to a video on demand service. In the real world, they would have to pay, which presents another barrier.

In this study, the pirate households were willing to pay at most $3.25 USD per month to access a service with a library as large as Netflix’s in the United States. That’s not enough.

This leads the researchers to the grim conclusion that video on demand services such as Netflix can’t significantly lower piracy rates. They could make a dent if they increase their content libraries while lowering the price at the same time, but that’s not going to happen.

“Together, our results show that, as a stand-alone strategy, using legal SVoD to curtail piracy will require, at the minimum, offering content much earlier and at much lower prices than those currently offered in the marketplace, changes that are likely to reduce industry revenue and that may damage overall incentives to produce new content while, at the same time, curbing only a small share of piracy,” the researchers conclude.

While Hollywood maintains that people can get pretty much anything they want legally, the current research shows that it’s not as simple as that. Most people are not going to pay for 22 separate subscriptions. Instead of more streaming services, it would be better to make more content available at the ones that are already out there.

The research was partially funded by the Carnegie Mellon University’s IDEA, which receives an unrestricted gift from the MPAA, so Hollywood will likely be clued in on the results.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN discounts, offers and coupons

Announcing Amazon FreeRTOS – Enabling Billions of Devices to Securely Benefit from the Cloud

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/announcing-amazon-freertos/

I was recently reading an article on ReadWrite.com titled “IoT devices go forth and multiply, to increase 200% by 2021“, and while the article noted the benefit for consumers and the industry of this growth, two things in the article stuck with me. The first was the specific statement that read “researchers warned that the proliferation of IoT technology will create a new bevvy of challenges. Particularly troublesome will be IoT deployments at scale for both end-users and providers.” Not only was that sentence a mouthful, but it really addressed some of the challenges that can come building solutions and deployment of this exciting new technology area. The second sentiment in the article that stayed with me was that Security issues could grow.

So the article got me thinking, how can we create these cool IoT solutions using low-cost efficient microcontrollers with a secure operating system that can easily connect to the cloud. Luckily the answer came to me by way of an exciting new open-source based offering coming from AWS that I am happy to announce to you all today. Let’s all welcome, Amazon FreeRTOS to the technology stage.

Amazon FreeRTOS is an IoT microcontroller operating system that simplifies development, security, deployment, and maintenance of microcontroller-based edge devices. Amazon FreeRTOS extends the FreeRTOS kernel, a popular real-time operating system, with libraries that enable local and cloud connectivity, security, and (coming soon) over-the-air updates.

So what are some of the great benefits of this new exciting offering, you ask. They are as follows:

  • Easily to create solutions for Low Power Connected Devices: provides a common operating system (OS) and libraries that make the development of common IoT capabilities easy for devices. For example; over-the-air (OTA) updates (coming soon) and device configuration.
  • Secure Data and Device Connections: devices only run trusted software using the Code Signing service, Amazon FreeRTOS provides a secure connection to the AWS using TLS, as well as, the ability to securely store keys and sensitive data on the device.
  • Extensive Ecosystem: contains an extensive hardware and technology ecosystem that allows you to choose a variety of qualified chipsets, including Texas Instruments, Microchip, NXP Semiconductors, and STMicroelectronics.
  • Cloud or Local Connections:  Devices can connect directly to the AWS Cloud or via AWS Greengrass.

 

What’s cool is that it is easy to get started. 

The Amazon FreeRTOS console allows you to select and download the software that you need for your solution.

There is a Qualification Program that helps to assure you that the microcontroller you choose will run consistently across several hardware options.

Finally, Amazon FreeRTOS kernel is an open-source FreeRTOS operating system that is freely available on GitHub for download.

But I couldn’t leave you without at least showing you a few snapshots of the Amazon FreeRTOS Console.

Within the Amazon FreeRTOS Console, I can select a predefined software configuration that I would like to use.

If I want to have a more customized software configuration, Amazon FreeRTOS allows you to customize a solution that is targeted for your use by adding or removing libraries.

Summary

Thanks for checking out the new Amazon FreeRTOS offering. To learn more go to the Amazon FreeRTOS product page or review the information provided about this exciting IoT device targeted operating system in the AWS documentation.

Can’t wait to see what great new IoT systems are will be enabled and created with it! Happy Coding.

Tara

 

Object models

Post Syndicated from Eevee original https://eev.ee/blog/2017/11/28/object-models/

Anonymous asks, with dollars:

More about programming languages!

Well then!

I’ve written before about what I think objects are: state and behavior, which in practice mostly means method calls.

I suspect that the popular impression of what objects are, and also how they should work, comes from whatever C++ and Java happen to do. From that point of view, the whole post above is probably nonsense. If the baseline notion of “object” is a rigid definition woven tightly into the design of two massively popular languages, then it doesn’t even make sense to talk about what “object” should mean — it does mean the features of those languages, and cannot possibly mean anything else.

I think that’s a shame! It piles a lot of baggage onto a fairly simple idea. Polymorphism, for example, has nothing to do with objects — it’s an escape hatch for static type systems. Inheritance isn’t the only way to reuse code between objects, but it’s the easiest and fastest one, so it’s what we get. Frankly, it’s much closer to a speed tradeoff than a fundamental part of the concept.

We could do with more experimentation around how objects work, but that’s impossible in the languages most commonly thought of as object-oriented.

Here, then, is a (very) brief run through the inner workings of objects in four very dynamic languages. I don’t think I really appreciated objects until I’d spent some time with Python, and I hope this can help someone else whet their own appetite.

Python 3

Of the four languages I’m going to touch on, Python will look the most familiar to the Java and C++ crowd. For starters, it actually has a class construct.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __neg__(self):
        return Vector(-self.x, -self.y)

    def __div__(self, denom):
        return Vector(self.x / denom, self.y / denom)

    @property
    def magnitude(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def normalized(self):
        return self / self.magnitude

The __init__ method is an initializer, which is like a constructor but named differently (because the object already exists in a usable form by the time the initializer is called). Operator overloading is done by implementing methods with other special __dunder__ names. Properties can be created with @property, where the @ is syntax for applying a wrapper function to a function as it’s defined. You can do inheritance, even multiply:

1
2
3
4
class Foo(A, B, C):
    def bar(self, x, y, z):
        # do some stuff
        super().bar(x, y, z)

Cool, a very traditional object model.

Except… for some details.

Some details

For one, Python objects don’t have a fixed layout. Code both inside and outside the class can add or remove whatever attributes they want from whatever object they want. The underlying storage is just a dict, Python’s mapping type. (Or, rather, something like one. Also, it’s possible to change, which will probably be the case for everything I say here.)

If you create some attributes at the class level, you’ll start to get a peek behind the curtains:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class Foo:
    values = []

    def add_value(self, value):
        self.values.append(value)

a = Foo()
b = Foo()
a.add_value('a')
print(a.values)  # ['a']
b.add_value('b')
print(b.values)  # ['a', 'b']

The [] assigned to values isn’t a default assigned to each object. In fact, the individual objects don’t know about it at all! You can use vars(a) to get at the underlying storage dict, and you won’t see a values entry in there anywhere.

Instead, values lives on the class, which is a value (and thus an object) in its own right. When Python is asked for self.values, it checks to see if self has a values attribute; in this case, it doesn’t, so Python keeps going and asks the class for one.

Python’s object model is secretly prototypical — a class acts as a prototype, as a shared set of fallback values, for its objects.

In fact, this is also how method calls work! They aren’t syntactically special at all, which you can see by separating the attribute lookup from the call.

1
2
3
print("abc".startswith("a"))  # True
meth = "abc".startswith
print(meth("a"))  # True

Reading obj.method looks for a method attribute; if there isn’t one on obj, Python checks the class. Here, it finds one: it’s a function from the class body.

Ah, but wait! In the code I just showed, meth seems to “know” the object it came from, so it can’t just be a plain function. If you inspect the resulting value, it claims to be a “bound method” or “built-in method” rather than a function, too. Something funny is going on here, and that funny something is the descriptor protocol.

Descriptors

Python allows attributes to implement their own custom behavior when read from or written to. Such an attribute is called a descriptor. I’ve written about them before, but here’s a quick overview.

If Python looks up an attribute, finds it in a class, and the value it gets has a __get__ method… then instead of using that value, Python will use the return value of its __get__ method.

The @property decorator works this way. The magnitude property in my original example was shorthand for doing this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class MagnitudeDescriptor:
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return (instance.x ** 2 + instance.y ** 2) ** 0.5

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    magnitude = MagnitudeDescriptor()

When you ask for somevec.magnitude, Python checks somevec but doesn’t find magnitude, so it consults the class instead. The class does have a magnitude, and it’s a value with a __get__ method, so Python calls that method and somevec.magnitude evaluates to its return value. (The instance is None check is because __get__ is called even if you get the descriptor directly from the class via Vector.magnitude. A descriptor intended to work on instances can’t do anything useful in that case, so the convention is to return the descriptor itself.)

You can also intercept attempts to write to or delete an attribute, and do absolutely whatever you want instead. But note that, similar to operating overloading in Python, the descriptor must be on a class; you can’t just slap one on an arbitrary object and have it work.

This brings me right around to how “bound methods” actually work. Functions are descriptors! The function type implements __get__, and when a function is retrieved from a class via an instance, that __get__ bundles the function and the instance together into a tiny bound method object. It’s essentially:

1
2
3
4
5
class FunctionType:
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return functools.partial(self, instance)

The self passed as the first argument to methods is not special or magical in any way. It’s built out of a few simple pieces that are also readily accessible to Python code.

Note also that because obj.method() is just an attribute lookup and a call, Python doesn’t actually care whether method is a method on the class or just some callable thing on the object. You won’t get the auto-self behavior if it’s on the object, but otherwise there’s no difference.

More attribute access, and the interesting part

Descriptors are one of several ways to customize attribute access. Classes can implement __getattr__ to intervene when an attribute isn’t found on an object; __setattr__ and __delattr__ to intervene when any attribute is set or deleted; and __getattribute__ to implement unconditional attribute access. (That last one is a fantastic way to create accidental recursion, since any attribute access you do within __getattribute__ will of course call __getattribute__ again.)

Here’s what I really love about Python. It might seem like a magical special case that descriptors only work on classes, but it really isn’t. You could implement exactly the same behavior yourself, in pure Python, using only the things I’ve just told you about. Classes are themselves objects, remember, and they are instances of type, so the reason descriptors only work on classes is that type effectively does this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
class type:
    def __getattribute__(self, name):
        value = super().__getattribute__(name)
        # like all op overloads, __get__ must be on the type, not the instance
        ty = type(value)
        if hasattr(ty, '__get__'):
            # it's a descriptor!  this is a class access so there is no instance
            return ty.__get__(value, None, self)
        else:
            return value

You can even trivially prove to yourself that this is what’s going on by skipping over types behavior:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
class Descriptor:
    def __get__(self, instance, owner):
        print('called!')

class Foo:
    bar = Descriptor()

Foo.bar  # called!
type.__getattribute__(Foo, 'bar')  # called!
object.__getattribute__(Foo, 'bar')  # ...

And that’s not all! The mysterious super function, used to exhaustively traverse superclass method calls even in the face of diamond inheritance, can also be expressed in pure Python using these primitives. You could write your own superclass calling convention and use it exactly the same way as super.

This is one of the things I really like about Python. Very little of it is truly magical; virtually everything about the object model exists in the types rather than the language, which means virtually everything can be customized in pure Python.

Class creation and metaclasses

A very brief word on all of this stuff, since I could talk forever about Python and I have three other languages to get to.

The class block itself is fairly interesting. It looks like this:

1
2
class Name(*bases, **kwargs):
    # code

I’ve said several times that classes are objects, and in fact the class block is one big pile of syntactic sugar for calling type(...) with some arguments to create a new type object.

The Python documentation has a remarkably detailed description of this process, but the gist is:

  • Python determines the type of the new class — the metaclass — by looking for a metaclass keyword argument. If there isn’t one, Python uses the “lowest” type among the provided base classes. (If you’re not doing anything special, that’ll just be type, since every class inherits from object and object is an instance of type.)

  • Python executes the class body. It gets its own local scope, and any assignments or method definitions go into that scope.

  • Python now calls type(name, bases, attrs, **kwargs). The name is whatever was right after class; the bases are position arguments; and attrs is the class body’s local scope. (This is how methods and other class attributes end up on the class.) The brand new type is then assigned to Name.

Of course, you can mess with most of this. You can implement __prepare__ on a metaclass, for example, to use a custom mapping as storage for the local scope — including any reads, which allows for some interesting shenanigans. The only part you can’t really implement in pure Python is the scoping bit, which has a couple extra rules that make sense for classes. (In particular, functions defined within a class block don’t close over the class body; that would be nonsense.)

Object creation

Finally, there’s what actually happens when you create an object — including a class, which remember is just an invocation of type(...).

Calling Foo(...) is implemented as, well, a call. Any type can implement calls with the __call__ special method, and you’ll find that type itself does so. It looks something like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# oh, a fun wrinkle that's hard to express in pure python: type is a class, so
# it's an instance of itself
class type:
    def __call__(self, *args, **kwargs):
        # remember, here 'self' is a CLASS, an instance of type.
        # __new__ is a true constructor: object.__new__ allocates storage
        # for a new blank object
        instance = self.__new__(self, *args, **kwargs)
        # you can return whatever you want from __new__ (!), and __init__
        # is only called on it if it's of the right type
        if isinstance(instance, self):
            instance.__init__(*args, **kwargs)
        return instance

Again, you can trivially confirm this by asking any type for its __call__ method. Assuming that type doesn’t implement __call__ itself, you’ll get back a bound version of types implementation.

1
2
>>> list.__call__
<method-wrapper '__call__' of type object at 0x7fafb831a400>

You can thus implement __call__ in your own metaclass to completely change how subclasses are created — including skipping the creation altogether, if you like.

And… there’s a bunch of stuff I haven’t even touched on.

The Python philosophy

Python offers something that, on the surface, looks like a “traditional” class/object model. Under the hood, it acts more like a prototypical system, where failed attribute lookups simply defer to a superclass or metaclass.

The language also goes to almost superhuman lengths to expose all of its moving parts. Even the prototypical behavior is an implementation of __getattribute__ somewhere, which you are free to completely replace in your own types. Proxying and delegation are easy.

Also very nice is that these features “bundle” well, by which I mean a library author can do all manner of convoluted hijinks, and a consumer of that library doesn’t have to see any of it or understand how it works. You only need to inherit from a particular class (which has a metaclass), or use some descriptor as a decorator, or even learn any new syntax.

This meshes well with Python culture, which is pretty big on the principle of least surprise. These super-advanced features tend to be tightly confined to single simple features (like “makes a weak attribute“) or cordoned with DSLs (e.g., defining a form/struct/database table with a class body). In particular, I’ve never seen a metaclass in the wild implement its own __call__.

I have mixed feelings about that. It’s probably a good thing overall that the Python world shows such restraint, but I wonder if there are some very interesting possibilities we’re missing out on. I implemented a metaclass __call__ myself, just once, in an entity/component system that strove to minimize fuss when communicating between components. It never saw the light of day, but I enjoyed seeing some new things Python could do with the same relatively simple syntax. I wouldn’t mind seeing, say, an object model based on composition (with no inheritance) built atop Python’s primitives.

Lua

Lua doesn’t have an object model. Instead, it gives you a handful of very small primitives for building your own object model. This is pretty typical of Lua — it’s a very powerful language, but has been carefully constructed to be very small at the same time. I’ve never encountered anything else quite like it, and “but it starts indexing at 1!” really doesn’t do it justice.

The best way to demonstrate how objects work in Lua is to build some from scratch. We need two key features. The first is metatables, which bear a passing resemblance to Python’s metaclasses.

Tables and metatables

The table is Lua’s mapping type and its primary data structure. Keys can be any value other than nil. Lists are implemented as tables whose keys are consecutive integers starting from 1. Nothing terribly surprising. The dot operator is sugar for indexing with a string key.

1
2
3
4
5
local t = { a = 1, b = 2 }
print(t['a'])  -- 1
print(t.b)  -- 2
t.c = 3
print(t['c'])  -- 3

A metatable is a table that can be associated with another value (usually another table) to change its behavior. For example, operator overloading is implemented by assigning a function to a special key in a metatable.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
local t = { a = 1, b = 2 }
--print(t + 0)  -- error: attempt to perform arithmetic on a table value

local mt = {
    __add = function(left, right)
        return 12
    end,
}
setmetatable(t, mt)
print(t + 0)  -- 12

Now, the interesting part: one of the special keys is __index, which is consulted when the base table is indexed by a key it doesn’t contain. Here’s a table that claims every key maps to itself.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
local t = {}
local mt = {
    __index = function(table, key)
        return key
    end,
}
setmetatable(t, mt)
print(t.foo)  -- foo
print(t.bar)  -- bar
print(t[3])  -- 3

__index doesn’t have to be a function, either. It can be yet another table, in which case that table is simply indexed with the key. If the key still doesn’t exist and that table has a metatable with an __index, the process repeats.

With this, it’s easy to have several unrelated tables that act as a single table. Call the base table an object, fill the __index table with functions and call it a class, and you have half of an object system. You can even get prototypical inheritance by chaining __indexes together.

At this point things are a little confusing, since we have at least three tables going on, so here’s a diagram. Keep in mind that Lua doesn’t actually have anything called an “object”, “class”, or “method” — those are just convenient nicknames for a particular structure we might build with Lua’s primitives.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
                    ╔═══════════╗        ...
                    ║ metatable ║         ║
                    ╟───────────╢   ┌─────╨───────────────────────┐
                    ║ __index   ╫───┤ lookup table ("superclass") │
                    ╚═══╦═══════╝   ├─────────────────────────────┤
  ╔═══════════╗         ║           │ some other method           ┼─── function() ... end
  ║ metatable ║         ║           └─────────────────────────────┘
  ╟───────────╢   ┌─────╨──────────────────┐
  ║ __index   ╫───┤ lookup table ("class") │
  ╚═══╦═══════╝   ├────────────────────────┤
      ║           │ some method            ┼─── function() ... end
      ║           └────────────────────────┘
┌─────╨─────────────────┐
│ base table ("object") │
└───────────────────────┘

Note that a metatable is not the same as a class; it defines behavior, not methods. Conversely, if you try to use a class directly as a metatable, it will probably not do much. (This is pretty different from e.g. Python, where operator overloads are just methods with funny names. One nice thing about the Lua approach is that you can keep interface-like functionality separate from methods, and avoid clogging up arbitrary objects’ namespaces. You could even use a dummy table as a key and completely avoid name collisions.)

Anyway, code!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
local class = {
    foo = function(a)
        print("foo got", a)
    end,
}
local mt = { __index = class }
-- setmetatable returns its first argument, so this is nice shorthand
local obj1 = setmetatable({}, mt)
local obj2 = setmetatable({}, mt)
obj1.foo(7)  -- foo got 7
obj2.foo(9)  -- foo got 9

Wait, wait, hang on. Didn’t I call these methods? How do they get at the object? Maybe Lua has a magical this variable?

Methods, sort of

Not quite, but this is where the other key feature comes in: method-call syntax. It’s the lightest touch of sugar, just enough to have method invocation.

1
2
3
4
5
6
7
8
9
-- note the colon!
a:b(c, d, ...)

-- exactly equivalent to this
-- (except that `a` is only evaluated once)
a.b(a, c, d, ...)

-- which of course is really this
a["b"](a, c, d, ...)

Now we can write methods that actually do something.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
local class = {
    bar = function(self)
        print("our score is", self.score)
    end,
}
local mt = { __index = class }
local obj1 = setmetatable({ score = 13 }, mt)
local obj2 = setmetatable({ score = 25 }, mt)
obj1:bar()  -- our score is 13
obj2:bar()  -- our score is 25

And that’s all you need. Much like Python, methods and data live in the same namespace, and Lua doesn’t care whether obj:method() finds a function on obj or gets one from the metatable’s __index. Unlike Python, the function will be passed self either way, because self comes from the use of : rather than from the lookup behavior.

(Aside: strictly speaking, any Lua value can have a metatable — and if you try to index a non-table, Lua will always consult the metatable’s __index. Strings all have the string library as a metatable, so you can call methods on them: try ("%s %s"):format(1, 2). I don’t think Lua lets user code set the metatable for non-tables, so this isn’t that interesting, but if you’re writing Lua bindings from C then you can wrap your pointers in metatables to give them methods implemented in C.)

Bringing it all together

Of course, writing all this stuff every time is a little tedious and error-prone, so instead you might want to wrap it all up inside a little function. No problem.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
local function make_object(body)
    -- create a metatable
    local mt = { __index = body }
    -- create a base table to serve as the object itself
    local obj = setmetatable({}, mt)
    -- and, done
    return obj
end

-- you can leave off parens if you're only passing in 
local Dog = {
    -- this acts as a "default" value; if obj.barks is missing, __index will
    -- kick in and find this value on the class.  but if obj.barks is assigned
    -- to, it'll go in the object and shadow the value here.
    barks = 0,

    bark = function(self)
        self.barks = self.barks + 1
        print("woof!")
    end,
}

local mydog = make_object(Dog)
mydog:bark()  -- woof!
mydog:bark()  -- woof!
mydog:bark()  -- woof!
print(mydog.barks)  -- 3
print(Dog.barks)  -- 0

It works, but it’s fairly barebones. The nice thing is that you can extend it pretty much however you want. I won’t reproduce an entire serious object system here — lord knows there are enough of them floating around — but the implementation I have for my LÖVE games lets me do this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
local Animal = Object:extend{
    cries = 0,
}

-- called automatically by Object
function Animal:init()
    print("whoops i couldn't think of anything interesting to put here")
end

-- this is just nice syntax for adding a first argument called 'self', then
-- assigning this function to Animal.cry
function Animal:cry()
    self.cries = self.cries + 1
end

local Cat = Animal:extend{}

function Cat:cry()
    print("meow!")
    Cat.__super.cry(self)
end

local cat = Cat()
cat:cry()  -- meow!
cat:cry()  -- meow!
print(cat.cries)  -- 2

When I say you can extend it however you want, I mean that. I could’ve implemented Python (2)-style super(Cat, self):cry() syntax; I just never got around to it. I could even make it work with multiple inheritance if I really wanted to — or I could go the complete opposite direction and only implement composition. I could implement descriptors, customizing the behavior of individual table keys. I could add pretty decent syntax for composition/proxying. I am trying very hard to end this section now.

The Lua philosophy

Lua’s philosophy is to… not have a philosophy? It gives you the bare minimum to make objects work, and you can do absolutely whatever you want from there. Lua does have something resembling prototypical inheritance, but it’s not so much a first-class feature as an emergent property of some very simple tools. And since you can make __index be a function, you could avoid the prototypical behavior and do something different entirely.

The very severe downside, of course, is that you have to find or build your own object system — which can get pretty confusing very quickly, what with the multiple small moving parts. Third-party code may also have its own object system with subtly different behavior. (Though, in my experience, third-party code tries very hard to avoid needing an object system at all.)

It’s hard to say what the Lua “culture” is like, since Lua is an embedded language that’s often a little different in each environment. I imagine it has a thousand millicultures, instead. I can say that the tedium of building my own object model has led me into something very “traditional”, with prototypical inheritance and whatnot. It’s partly what I’m used to, but it’s also just really dang easy to get working.

Likewise, while I love properties in Python and use them all the dang time, I’ve yet to use a single one in Lua. They wouldn’t be particularly hard to add to my object model, but having to add them myself (or shop around for an object model with them and also port all my code to use it) adds a huge amount of friction. I’ve thought about designing an interesting ECS with custom object behavior, too, but… is it really worth the effort? For all the power and flexibility Lua offers, the cost is that by the time I have something working at all, I’m too exhausted to actually use any of it.

JavaScript

JavaScript is notable for being preposterously heavily used, yet not having a class block.

Well. Okay. Yes. It has one now. It didn’t for a very long time, and even the one it has now is sugar.

Here’s a vector class again:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
class Vector {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }

    get magnitude() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    }

    dot(other) {
        return this.x * other.x + this.y * other.y;
    }
}

In “classic” JavaScript, this would be written as:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
function Vector(x, y) {
    this.x = x;
    this.y = y;
}

Object.defineProperty(Vector.prototype, 'magnitude', {
    configurable: true,
    enumerable: true,
    get: function() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    },
});


Vector.prototype.dot = function(other) {
    return this.x * other.x + this.y * other.y;
};

Hm, yes. I can see why they added class.

The JavaScript model

In JavaScript, a new type is defined in terms of a function, which is its constructor.

Right away we get into trouble here. There is a very big difference between these two invocations, which I actually completely forgot about just now after spending four hours writing about Python and Lua:

1
2
let vec = Vector(3, 4);
let vec = new Vector(3, 4);

The first calls the function Vector. It assigns some properties to this, which here is going to be window, so now you have a global x and y. It then returns nothing, so vec is undefined.

The second calls Vector with this set to a new empty object, then evaluates to that object. The result is what you’d actually expect.

(You can detect this situation with the strange new.target expression, but I have never once remembered to do so.)

From here, we have true, honest-to-god, first-class prototypical inheritance. The word “prototype” is even right there. When you write this:

1
vec.dot(vec2)

JavaScript will look for dot on vec and (presumably) not find it. It then consults vecs prototype, an object you can see for yourself by using Object.getPrototypeOf(). Since vec is a Vector, its prototype is Vector.prototype.

I stress that Vector.prototype is not the prototype for Vector. It’s the prototype for instances of Vector.

(I say “instance”, but the true type of vec here is still just object. If you want to find Vector, it’s automatically assigned to the constructor property of its own prototype, so it’s available as vec.constructor.)

Of course, Vector.prototype can itself have a prototype, in which case the process would continue if dot were not found. A common (and, arguably, very bad) way to simulate single inheritance is to set Class.prototype to an instance of a superclass to get the prototype right, then tack on the methods for Class. Nowadays we can do Object.create(Superclass.prototype).

Now that I’ve been through Python and Lua, though, this isn’t particularly surprising. I kinda spoiled it.

I suppose one difference in JavaScript is that you can tack arbitrary attributes directly onto Vector all you like, and they will remain invisible to instances since they aren’t in the prototype chain. This is kind of backwards from Lua, where you can squirrel stuff away in the metatable.

Another difference is that every single object in JavaScript has a bunch of properties already tacked on — the ones in Object.prototype. Every object (and by “object” I mean any mapping) has a prototype, and that prototype defaults to Object.prototype, and it has a bunch of ancient junk like isPrototypeOf.

(Nit: it’s possible to explicitly create an object with no prototype via Object.create(null).)

Like Lua, and unlike Python, JavaScript doesn’t distinguish between keys found on an object and keys found via a prototype. Properties can be defined on prototypes with Object.defineProperty(), but that works just as well directly on an object, too. JavaScript doesn’t have a lot of operator overloading, but some things like Symbol.iterator also work on both objects and prototypes.

About this

You may, at this point, be wondering what this is. Unlike Lua and Python (and the last language below), this is a special built-in value — a context value, invisibly passed for every function call.

It’s determined by where the function came from. If the function was the result of an attribute lookup, then this is set to the object containing that attribute. Otherwise, this is set to the global object, window. (You can also set this to whatever you want via the call method on functions.)

This decision is made lexically, i.e. from the literal source code as written. There are no Python-style bound methods. In other words:

1
2
3
4
5
// this = obj
obj.method()
// this = window
let meth = obj.method
meth()

Also, because this is reassigned on every function call, it cannot be meaningfully closed over, which makes using closures within methods incredibly annoying. The old approach was to assign this to some other regular name like self (which got syntax highlighting since it’s also a built-in name in browsers); then we got Function.bind, which produced a callable thing with a fixed context value, which was kind of nice; and now finally we have arrow functions, which explicitly close over the current this when they’re defined and don’t change it when called. Phew.

Class syntax

I already showed class syntax, and it’s really just one big macro for doing all the prototype stuff The Right Way. It even prevents you from calling the type without new. The underlying model is exactly the same, and you can inspect all the parts.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
class Vector { ... }

console.log(Vector.prototype);  // { dot: ..., magnitude: ..., ... }
let vec = new Vector(3, 4);
console.log(Object.getPrototypeOf(vec));  // same as Vector.prototype

// i don't know why you would subclass vector but let's roll with it
class Vectest extends Vector { ... }

console.log(Vectest.prototype);  // { ... }
console.log(Object.getPrototypeOf(Vectest.prototype))  // same as Vector.prototype

Alas, class syntax has a couple shortcomings. You can’t use the class block to assign arbitrary data to either the type object or the prototype — apparently it was deemed too confusing that mutations would be shared among instances. Which… is… how prototypes work. How Python works. How JavaScript itself, one of the most popular languages of all time, has worked for twenty-two years. Argh.

You can still do whatever assignment you want outside of the class block, of course. It’s just a little ugly, and not something I’d think to look for with a sugary class.

A more subtle result of this behavior is that a class block isn’t quite the same syntax as an object literal. The check for data isn’t a runtime thing; class Foo { x: 3 } fails to parse. So JavaScript now has two largely but not entirely identical styles of key/value block.

Attribute access

Here’s where things start to come apart at the seams, just a little bit.

JavaScript doesn’t really have an attribute protocol. Instead, it has two… extension points, I suppose.

One is Object.defineProperty, seen above. For common cases, there’s also the get syntax inside a property literal, which does the same thing. But unlike Python’s @property, these aren’t wrappers around some simple primitives; they are the primitives. JavaScript is the only language of these four to have “property that runs code on access” as a completely separate first-class concept.

If you want to intercept arbitrary attribute access (and some kinds of operators), there’s a completely different primitive: the Proxy type. It doesn’t let you intercept attribute access or operators; instead, it produces a wrapper object that supports interception and defers to the wrapped object by default.

It’s cool to see composition used in this way, but also, extremely weird. If you want to make your own type that overloads in or calling, you have to return a Proxy that wraps your own type, rather than actually returning your own type. And (unlike the other three languages in this post) you can’t return a different type from a constructor, so you have to throw that away and produce objects only from a factory. And instanceof would be broken, but you can at least fix that with Symbol.hasInstance — which is really operator overloading, implement yet another completely different way.

I know the design here is a result of legacy and speed — if any object could intercept all attribute access, then all attribute access would be slowed down everywhere. Fair enough. It still leaves the surface area of the language a bit… bumpy?

The JavaScript philosophy

It’s a little hard to tell. The original idea of prototypes was interesting, but it was hidden behind some very awkward syntax. Since then, we’ve gotten a bunch of extra features awkwardly bolted on to reflect the wildly varied things the built-in types and DOM API were already doing. We have class syntax, but it’s been explicitly designed to avoid exposing the prototype parts of the model.

I admit I don’t do a lot of heavy JavaScript, so I might just be overlooking it, but I’ve seen virtually no code that makes use of any of the recent advances in object capabilities. Forget about custom iterators or overloading call; I can’t remember seeing any JavaScript in the wild that even uses properties yet. I don’t know if everyone’s waiting for sufficient browser support, nobody knows about them, or nobody cares.

The model has advanced recently, but I suspect JavaScript is still shackled to its legacy of “something about prototypes, I don’t really get it, just copy the other code that’s there” as an object model. Alas! Prototypes are so good. Hopefully class syntax will make it a bit more accessible, as it has in Python.

Perl 5

Perl 5 also doesn’t have an object system and expects you to build your own. But where Lua gives you two simple, powerful tools for building one, Perl 5 feels more like a puzzle with half the pieces missing. Clearly they were going for something, but they only gave you half of it.

In brief, a Perl object is a reference that has been blessed with a package.

I need to explain a few things. Honestly, one of the biggest problems with the original Perl object setup was how many strange corners and unique jargon you had to understand just to get off the ground.

(If you want to try running any of this code, you should stick a use v5.26; as the first line. Perl is very big on backwards compatibility, so you need to opt into breaking changes, and even the mundane say builtin is behind a feature gate.)

References

A reference in Perl is sort of like a pointer, but its main use is very different. See, Perl has the strange property that its data structures try very hard to spill their contents all over the place. Despite having dedicated syntax for arrays — @foo is an array variable, distinct from the single scalar variable $foo — it’s actually impossible to nest arrays.

1
2
3
my @foo = (1, 2, 3, 4);
my @bar = (@foo, @foo);
# @bar is now a flat list of eight items: 1, 2, 3, 4, 1, 2, 3, 4

The idea, I guess, is that an array is not one thing. It’s not a container, which happens to hold multiple things; it is multiple things. Anywhere that expects a single value, such as an array element, cannot contain an array, because an array fundamentally is not a single value.

And so we have “references”, which are a form of indirection, but also have the nice property that they’re single values. They add containment around arrays, and in general they make working with most of Perl’s primitive types much more sensible. A reference to a variable can be taken with the \ operator, or you can use [ ... ] and { ... } to directly create references to anonymous arrays or hashes.

1
2
3
my @foo = (1, 2, 3, 4);
my @bar = (\@foo, \@foo);
# @bar is now a nested list of two items: [1, 2, 3, 4], [1, 2, 3, 4]

(Incidentally, this is the sole reason I initially abandoned Perl for Python. Non-trivial software kinda requires nesting a lot of data structures, so you end up with references everywhere, and the syntax for going back and forth between a reference and its contents is tedious and ugly.)

A Perl object must be a reference. Perl doesn’t care what kind of reference — it’s usually a hash reference, since hashes are a convenient place to store arbitrary properties, but it could just as well be a reference to an array, a scalar, or even a sub (i.e. function) or filehandle.

I’m getting a little ahead of myself. First, the other half: blessing and packages.

Packages and blessing

Perl packages are just namespaces. A package looks like this:

1
2
3
4
5
6
7
package Foo::Bar;

sub quux {
    say "hi from quux!";
}

# now Foo::Bar::quux() can be called from anywhere

Nothing shocking, right? It’s just a named container. A lot of the details are kind of weird, like how a package exists in some liminal quasi-value space, but the basic idea is a Bag Of Stuff.

The final piece is “blessing,” which is Perl’s funny name for binding a package to a reference. A very basic class might look like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
package Vector;

# the name 'new' is convention, not special
sub new {
    # perl argument passing is weird, don't ask
    my ($class, $x, $y) = @_;

    # create the object itself -- here, unusually, an array reference makes sense
    my $self = [ $x, $y ];

    # associate the package with that reference
    # note that $class here is just the regular string, 'Vector'
    bless $self, $class;

    return $self;
}

sub x {
    my ($self) = @_;
    return $self->[0];
}

sub y {
    my ($self) = @_;
    return $self->[1];
}

sub magnitude {
    my ($self) = @_;
    return sqrt($self->x ** 2 + $self->y ** 2);
}

# switch back to the "default" package
package main;

# -> is method call syntax, which passes the invocant as the first argument;
# for a package, that's just the package name
my $vec = Vector->new(3, 4);
say $vec->magnitude;  # 5

A few things of note here. First, $self->[0] has nothing to do with objects; it’s normal syntax for getting the value of a index 0 out of an array reference called $self. (Most classes are based on hashrefs and would use $self->{value} instead.) A blessed reference is still a reference and can be treated like one.

In general, -> is Perl’s dereferencey operator, but its exact behavior depends on what follows. If it’s followed by brackets, then it’ll apply the brackets to the thing in the reference: ->{} to index a hash reference, ->[] to index an array reference, and ->() to call a function reference.

But if -> is followed by an identifier, then it’s a method call. For packages, that means calling a function in the package and passing the package name as the first argument. For objects — blessed references — that means calling a function in the associated package and passing the object as the first argument.

This is a little weird! A blessed reference is a superposition of two things: its normal reference behavior, and some completely orthogonal object behavior. Also, object behavior has no notion of methods vs data; it only knows about methods. Perl lets you omit parentheses in a lot of places, including when calling a method with no arguments, so $vec->magnitude is really $vec->magnitude().

Perl’s blessing bears some similarities to Lua’s metatables, but ultimately Perl is much closer to Ruby’s “message passing” approach than the above three languages’ approaches of “get me something and maybe it’ll be callable”. (But this is no surprise — Ruby is a spiritual successor to Perl 5.)

All of this leads to one little wrinkle: how do you actually expose data? Above, I had to write x and y methods. Am I supposed to do that for every single attribute on my type?

Yes! But don’t worry, there are third-party modules to help with this incredibly fundamental task. Take Class::Accessor::Fast, so named because it’s faster than Class::Accessor:

1
2
3
package Foo;
use base qw(Class::Accessor::Fast);
__PACKAGE__->mk_accessors(qw(fred wilma barney));

(__PACKAGE__ is the lexical name of the current package; qw(...) is a list literal that splits its contents on whitespace.)

This assumes you’re using a hashref with keys of the same names as the attributes. $obj->fred will return the fred key from your hashref, and $obj->fred(4) will change it to 4.

You also, somewhat bizarrely, have to inherit from Class::Accessor::Fast. Speaking of which,

Inheritance

Inheritance is done by populating the package-global @ISA array with some number of (string) names of parent packages. Most code instead opts to write use base ...;, which does the same thing. Or, more commonly, use parent ...;, which… also… does the same thing.

Every package implicitly inherits from UNIVERSAL, which can be freely modified by Perl code.

A method can call its superclass method with the SUPER:: pseudo-package:

1
2
3
4
sub foo {
    my ($self) = @_;
    $self->SUPER::foo;
}

However, this does a depth-first search, which means it almost certainly does the wrong thing when faced with multiple inheritance. For a while the accepted solution involved a third-party module, but Perl eventually grew an alternative you have to opt into: C3, which may be more familiar to you as the order Python uses.

1
2
3
4
5
6
use mro 'c3';

sub foo {
    my ($self) = @_;
    $self->next::method;
}

Offhand, I’m not actually sure how next::method works, seeing as it was originally implemented in pure Perl code. I suspect it involves peeking at the caller’s stack frame. If so, then this is a very different style of customizability from e.g. Python — the MRO was never intended to be pluggable, and the use of a special pseudo-package means it isn’t really, but someone was determined enough to make it happen anyway.

Operator overloading and whatnot

Operator overloading looks a little weird, though really it’s pretty standard Perl.

1
2
3
4
5
6
7
8
package MyClass;

use overload '+' => \&_add;

sub _add {
    my ($self, $other, $swap) = @_;
    ...
}

use overload here is a pragma, where “pragma” means “regular-ass module that does some wizardry when imported”.

\&_add is how you get a reference to the _add sub so you can pass it to the overload module. If you just said &_add or _add, that would call it.

And that’s it; you just pass a map of operators to functions to this built-in module. No worry about name clashes or pollution, which is pretty nice. You don’t even have to give references to functions that live in the package, if you don’t want them to clog your namespace; you could put them in another package, or even inline them anonymously.

One especially interesting thing is that Perl lets you overload every operator. Perl has a lot of operators. It considers some math builtins like sqrt and trig functions to be operators, or at least operator-y enough that you can overload them. You can also overload the “file text” operators, such as -e $path to test whether a file exists. You can overload conversions, including implicit conversion to a regex. And most fascinating to me, you can overload dereferencing — that is, the thing Perl does when you say $hashref->{key} to get at the underlying hash. So a single object could pretend to be references of multiple different types, including a subref to implement callability. Neat.

Somewhat related: you can overload basic operators (indexing, etc.) on basic types (not references!) with the tie function, which is designed completely differently and looks for methods with fixed names. Go figure.

You can intercept calls to nonexistent methods by implementing a function called AUTOLOAD, within which the $AUTOLOAD global will contain the name of the method being called. Originally this feature was, I think, intended for loading binary components or large libraries on-the-fly only when needed, hence the name. Offhand I’m not sure I ever saw it used the way __getattr__ is used in Python.

Is there a way to intercept all method calls? I don’t think so, but it is Perl, so I must be forgetting something.

Actually no one does this any more

Like a decade ago, a council of elder sages sat down and put together a whole whizbang system that covers all of it: Moose.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
package Vector;
use Moose;

has x => (is => 'rw', isa => 'Int');
has y => (is => 'rw', isa => 'Int');

sub magnitude {
    my ($self) = @_;
    return sqrt($self->x ** 2 + $self->y ** 2);
}

Moose has its own way to do pretty much everything, and it’s all built on the same primitives. Moose also adds metaclasses, somehow, despite that the underlying model doesn’t actually support them? I’m not entirely sure how they managed that, but I do remember doing some class introspection with Moose and it was much nicer than the built-in way.

(If you’re wondering, the built-in way begins with looking at the hash called %Vector::. No, that’s not a typo.)

I really cannot stress enough just how much stuff Moose does, but I don’t want to delve into it here since Moose itself is not actually the language model.

The Perl philosophy

I hope you can see what I meant with what I first said about Perl, now. It has multiple inheritance with an MRO, but uses the wrong one by default. It has extensive operator overloading, which looks nothing like how inheritance works, and also some of it uses a totally different mechanism with special method names instead. It only understands methods, not data, leaving you to figure out accessors by hand.

There’s 70% of an object system here with a clear general design it was gunning for, but none of the pieces really look anything like each other. It’s weird, in a distinctly Perl way.

The result is certainly flexible, at least! It’s especially cool that you can use whatever kind of reference you want for storage, though even as I say that, I acknowledge it’s no different from simply subclassing list or something in Python. It feels different in Perl, but maybe only because it looks so different.

I haven’t written much Perl in a long time, so I don’t know what the community is like any more. Moose was already ubiquitous when I left, which you’d think would let me say “the community mostly focuses on the stuff Moose can do” — but even a decade ago, Moose could already do far more than I had ever seen done by hand in Perl. It’s always made a big deal out of roles (read: interfaces), for instance, despite that I’d never seen anyone care about them in Perl before Moose came along. Maybe their presence in Moose has made them more popular? Who knows.

Also, I wrote Perl seriously, but in the intervening years I’ve only encountered people who only ever used Perl for one-offs. Maybe it’ll come as a surprise to a lot of readers that Perl has an object model at all.

End

Well, that was fun! I hope any of that made sense.

Special mention goes to Rust, which doesn’t have an object model you can fiddle with at runtime, but does do things a little differently.

It’s been really interesting thinking about how tiny differences make a huge impact on what people do in practice. Take the choice of storage in Perl versus Python. Perl’s massively common URI class uses a string as the storage, nothing else; I haven’t seen anything like that in Python aside from markupsafe, which is specifically designed as a string type. I would guess this is partly because Perl makes you choose — using a hashref is an obvious default, but you have to make that choice one way or the other. In Python (especially 3), inheriting from object and getting dict-based storage is the obvious thing to do; the ability to use another type isn’t quite so obvious, and doing it “right” involves a tiny bit of extra work.

Or, consider that Lua could have descriptors, but the extra bit of work (especially design work) has been enough of an impediment that I’ve never implemented them. I don’t think the object implementations I’ve looked at have included them, either. Super weird!

In that light, it’s only natural that objects would be so strongly associated with the features Java and C++ attach to them. I think that makes it all the more important to play around! Look at what Moose has done. No, really, you should bear in mind my description of how Perl does stuff and flip through the Moose documentation. It’s amazing what they’ve built.

Introducing AWS AppSync – Build data-driven apps with real-time and off-line capabilities

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/introducing-amazon-appsync/

In this day and age, it is almost impossible to do without our mobile devices and the applications that help make our lives easier. As our dependency on our mobile phone grows, the mobile application market has exploded with millions of apps vying for our attention. For mobile developers, this means that we must ensure that we build applications that provide the quality, real-time experiences that app users desire.  Therefore, it has become essential that mobile applications are developed to include features such as multi-user data synchronization, offline network support, and data discovery, just to name a few.  According to several articles, I read recently about mobile development trends on publications like InfoQ, DZone, and the mobile development blog AlleviateTech, one of the key elements in of delivering the aforementioned capabilities is with cloud-driven mobile applications.  It seems that this is especially true, as it related to mobile data synchronization and data storage.

That being the case, it is a perfect time for me to announce a new service for building innovative mobile applications that are driven by data-intensive services in the cloud; AWS AppSync. AWS AppSync is a fully managed serverless GraphQL service for real-time data queries, synchronization, communications and offline programming features. For those not familiar, let me briefly share some information about the open GraphQL specification. GraphQL is a responsive data query language and server-side runtime for querying data sources that allow for real-time data retrieval and dynamic query execution. You can use GraphQL to build a responsive API for use in when building client applications. GraphQL works at the application layer and provides a type system for defining schemas. These schemas serve as specifications to define how operations should be performed on the data and how the data should be structured when retrieved. Additionally, GraphQL has a declarative coding model which is supported by many client libraries and frameworks including React, React Native, iOS, and Android.

Now the power of the GraphQL open standard query language is being brought to you in a rich managed service with AWS AppSync.  With AppSync developers can simplify the retrieval and manipulation of data across multiple data sources with ease, allowing them to quickly prototype, build and create robust, collaborative, multi-user applications. AppSync keeps data updated when devices are connected, but enables developers to build solutions that work offline by caching data locally and synchronizing local data when connections become available.

Let’s discuss some key concepts of AWS AppSync and how the service works.

AppSync Concepts

  • AWS AppSync Client: service client that defines operations, wraps authorization details of requests, and manage offline logic.
  • Data Source: the data storage system or a trigger housing data
  • Identity: a set of credentials with permissions and identification context provided with requests to GraphQL proxy
  • GraphQL Proxy: the GraphQL engine component for processing and mapping requests, handling conflict resolution, and managing Fine Grained Access Control
  • Operation: one of three GraphQL operations supported in AppSync
    • Query: a read-only fetch call to the data
    • Mutation: a write of the data followed by a fetch,
    • Subscription: long-lived connections that receive data in response to events.
  • Action: a notification to connected subscribers from a GraphQL subscription.
  • Resolver: function using request and response mapping templates that converts and executes payload against data source

How It Works

A schema is created to define types and capabilities of the desired GraphQL API and tied to a Resolver function.  The schema can be created to mirror existing data sources or AWS AppSync can create tables automatically based the schema definition. Developers can also use GraphQL features for data discovery without having knowledge of the backend data sources. After a schema definition is established, an AWS AppSync client can be configured with an operation request, like a Query operation. The client submits the operation request to GraphQL Proxy along with an identity context and credentials. The GraphQL Proxy passes this request to the Resolver which maps and executes the request payload against pre-configured AWS data services like an Amazon DynamoDB table, an AWS Lambda function, or a search capability using Amazon Elasticsearch. The Resolver executes calls to one or all of these services within a single network call minimizing CPU cycles and bandwidth needs and returns the response to the client. Additionally, the client application can change data requirements in code on demand and the AppSync GraphQL API will dynamically map requests for data accordingly, allowing prototyping and faster development.

In order to take a quick peek at the service, I’ll go to the AWS AppSync console. I’ll click the Create API button to get started.

 

When the Create new API screen opens, I’ll give my new API a name, TarasTestApp, and since I am just exploring the new service I will select the Sample schema option.  You may notice from the informational dialog box on the screen that in using the sample schema, AWS AppSync will automatically create the DynamoDB tables and the IAM roles for me.It will also deploy the TarasTestApp API on my behalf.  After review of the sample schema provided by the console, I’ll click the Create button to create my test API.

After the TaraTestApp API has been created and the associated AWS resources provisioned on my behalf, I can make updates to the schema, data source, or connect my data source(s) to a resolver. I also can integrate my GraphQL API into an iOS, Android, Web, or React Native application by cloning the sample repo from GitHub and downloading the accompanying GraphQL schema.  These application samples are great to help get you started and they are pre-configured to function in offline scenarios.

If I select the Schema menu option on the console, I can update and view the TarasTestApp GraphQL API schema.


Additionally, if I select the Data Sources menu option in the console, I can see the existing data sources.  Within this screen, I can update, delete, or add data sources if I so desire.

Next, I will select the Query menu option which takes me to the console tool for writing and testing queries. Since I chose the sample schema and the AWS AppSync service did most of the heavy lifting for me, I’ll try a query against my new GraphQL API.

I’ll use a mutation to add data for the event type in my schema. Since this is a mutation and it first writes data and then does a read of the data, I want the query to return values for name and where.

If I go to the DynamoDB table created for the event type in the schema, I will see that the values from my query have been successfully written into the table. Now that was a pretty simple task to write and retrieve data based on a GraphQL API schema from a data source, don’t you think.


 Summary

AWS AppSync is currently in AWS AppSync is in Public Preview and you can sign up today. It supports development for iOS, Android, and JavaScript applications. You can take advantage of this managed GraphQL service by going to the AWS AppSync console or learn more by reviewing more details about the service by reading a tutorial in the AWS documentation for the service or checking out our AWS AppSync Developer Guide.

Tara

 

What’s the Best Solution for Managing Digital Photos and Videos?

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/discovering-best-solution-for-photo-video-backup/

Digital Asset Management (DAM)

If you have spent any time, as we have, talking to photographers and videographers about how they back up and archive their digital photos and videos, then you know that there’s no one answer or solution that users have discovered to meet their needs.

Based on what we’ve heard, visual media artists are still searching for the best combination of software, hardware, and cloud storage to preserve their media, and to be able to search, retrieve, and reuse that media as easily as possible.

Yes, there are a number of solutions out there, and some users have created combinations of hardware, software, and services to meet their needs, but we have met few who claim to be satisfied with their solution for digital asset management (DAM), or expect that they will be using the same solution in just a year or two.

We’d like to open a dialog with professionals and serious amateurs to learn more about what you’re doing, what you’d like to do, and how Backblaze might fit into that solution.

We have a bit of cred in this field, as we currently have hundreds of petabytes of digital media files in our data centers from users of Backblaze Backup and Backblaze B2 Cloud Storage. We want to make our cloud services as useful as possible for photographers and videographers.

Tell Us Both Your Current Solution and Your Dream Solution

To get started, we’d love to hear from you about how you’re managing your photos and videos. Whether you’re an amateur or a professional, your experiences are valuable and will help us understand how to provide the best cloud component of a digital asset management solution.

Here are some questions to consider:

  • Are you using direct-attached drives, NAS (Network-Attached Storage), or offline storage for your media?
  • Do you use the cloud for media you’re actively working on?
  • Do you back up or archive to the cloud?
  • Did you have a catalog or record of the media that you’ve archived that you use to search and retrieve media?
  • What’s different about how you work in the field (or traveling) versus how you work in a studio (or at home)?
  • What software and/or hardware currently works for you?
  • What’s the biggest impediment to working in the way you’d really like to?
  • How could the cloud work better for you?

Please Contribute Your Ideas

To contribute, please answer the following two questions in the comments below or send an email to [email protected]. Please comment or email your response by December 22, 2017.

  1. How are you currently backing up your digital photos, video files, and/or file libraries/catalogs? Do you have a backup system that uses attached drives, a local network, the cloud, or offline storage media? Does it work well for you?
  2. Imagine your ideal digital asset backup setup. What would it look like? Don’t be constrained by current products, technologies, brands, or solutions. Invent a technology or product if you wish. Describe an ideal system that would work the way you want it to.

We know you have opinions about managing photos and videos. Bring them on!

We’re soliciting answers far and wide from amateurs and experts, weekend video makers and well-known professional photographers. We have a few amateur and professional photographers and videographers here at Backblaze, and they are contributing their comments, as well.

Once we have gathered all the responses, we’ll write a post on what we learned about how people are currently working and what they would do if anything were possible. Look for that post after the beginning of the year.

Don’t Miss Future Posts on Media Management

We don’t want you to miss our future posts on photography, videography, and digital asset management. To receive email notices of blog updates (and no spam, we promise), enter your email address above using the Join button at the top of the page.

Come Back on Thursday for our Photography Post (and a Special Giveaway, too)

This coming Thursday we’ll have a blog post about the different ways that photographers and videographers are currently managing their digital media assets.

Plus, you’ll have the chance to win a valuable hardware/software combination for digital media management that I am sure you will appreciate. (You’ll have to wait until Thursday to find out what the prize is, but it has a total value of over $700.)

Past Posts on Photography, Videography, and Digital Asset Management

We’ve written a number of blog posts about photos, videos, and managing digital assets. We’ve posted links to some of them below.

Four Tips To Help Photographers and Videographers Get The Most From B2

Four Tips To Help Photographers and Videographers Get The Most From B2

How to Back Up Your Mac’s Photos Library

How to Back Up Your Mac’s Photos Library

How To Back Up Your Flickr Library

How To Back Up Your Flickr Library

Getting Video Archives Out of Your Closet

Getting Video Archives Out of Your Closet

B2 Cloud Storage Roundup

B2 Cloud Storage Roundup

Backing Up Photos While Traveling

Backing up photos while traveling – feedback

Should I Use an External Drive for Backup?

Should I use an external drive for backup?

How to Connect your Synology NAS to B2

How to Connect your Synology NAS to B2

The post What’s the Best Solution for Managing Digital Photos and Videos? appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

AWS Media Services – Process, Store, and Monetize Cloud-Based Video

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-media-services-process-store-and-monetize-cloud-based-video/

Do you remember what web video was like in the early days? Standalone players, video no larger than a postage stamp, slow & cantankerous connections, overloaded servers, and the ever-present buffering messages were the norm less than two decades ago.

Today, thanks to technological progress and a broad array of standards, things are a lot better. Video consumers are now in control. They use devices of all shapes, sizes, and vintages to enjoy live and recorded content that is broadcast, streamed, or sent over-the-top (OTT, as they say), and expect immediate access to content that captures and then holds their attention. Meeting these expectations presents a challenge for content creators and distributors. Instead of generating video in a one-size-fits-all format, they (or their media servers) must be prepared to produce video that spans a broad range of sizes, formats, and bit rates, taking care to be ready to deal with planned or unplanned surges in demand. In the face of all of this complexity, they must backstop their content with a monetization model that supports the content and the infrastructure to deliver it.

New AWS Media Services
Today we are launching an array of broadcast-quality media services, each designed to address one or more aspects of the challenge that I outlined above. You can use them together to build a complete end-to-end video solution or you can use one or more in building-block style. In true AWS fashion, you can spend more time innovating and less time setting up and running infrastructure, leaving you ready to focus on creating, delivering, and monetizing your content. The services are all elastic, allowing you to ramp up processing power, connections, and storage and giving you the ability to handle million-user (and beyond) spikes with ease.

Here are the services (all accessible from a set of interactive consoles as well as through a comprehensive set of APIs):

AWS Elemental MediaConvert – File-based transcoding for OTT, broadcast, or archiving, with support for a long list of formats and codecs. Features include multi-channel audio, graphic overlays, closed captioning, and several DRM options.

AWS Elemental MediaLive – Live encoding to deliver video streams in real time to both televisions and multiscreen devices. Allows you to deploy highly reliable live channels in minutes, with full control over encoding parameters. It supports ad insertion, multi-channel audio, graphic overlays, and closed captioning.

AWS Elemental MediaPackage – Video origination and just-in-time packaging. Starting from a single input, produces output for multiple devices representing a long list of current and legacy formats. Supports multiple monetization models, time-shifted live streaming, ad insertion, DRM, and blackout management.

AWS Elemental MediaStore – Media-optimized storage that enables high performance and low latency applications such as live streaming, while taking advantage of the scale and durability of Amazon Simple Storage Service (S3).

AWS Elemental MediaTailor – Monetization service that supports ad serving and server-side ad insertion, a broad range of devices, transcoding, and accurate reporting of server-side and client-side ad insertion.

Instead of listing out all of the features in the sections below, I’ve simply included as many screen shots as possible with the expectation that this will give you a better sense of the rich set of features, parameters, and settings that you get with this set of services.

AWS Elemental MediaConvert
MediaConvert allows you to transcode content that is stored in files. You can process individual files or entire media libraries, or anything in-between. You simply create a conversion job that specifies the content and the desired outputs, and submit it to MediaConvert. There’s no software to install or patch and the service scales to meet your needs without affecting turnaround time or performance.

The MediaConvert Console lets you manage Output presets, Job templates, Queues, and Jobs:

You can use a built-in system preset or you can make one of your own. You have full control over the settings when you make your own:

Jobs templates are named, and produce one or more output groups. You can add a new group to a template with a click:

When everything is ready to go, you create a job and make some final selections, then click on Create:

Each account starts with a default queue for jobs, where incoming work is processed in parallel using all processing resources available to the account. Adding queues does not add processing resources, but does cause them to be apportioned across queues. You can temporarily pause one queue in order to devote more resources to the others. You can submit jobs to paused queues and you can also cancel any that have yet to start.

Pricing for this service is based on the amount of video that you process and the features that you use.

AWS Elemental MediaLive
This service is for live encoding, and can be run 24×7. MediaLive channels are deployed on redundant resources distributed in two physically separated Availability Zones in order to provide the reliability expected by our customers in the broadcast industry. You can specify your inputs and define your channels in the MediaLive Console:

After you create an Input, you create a Channel and attach it to the Input:

You have full control over the settings for each channel:

 

AWS Elemental MediaPackage
This service lets you deliver video to many devices from a single source. It focuses on protection and just-in-time packaging, giving you the ability to provide your users with the desired content on the device of their choice. You simply create a channel to get started:

Then you add one or more endpoints. Once again, plenty of options and full control, including a startover window and a time delay:

You find the input URL, user name, and password for your channel and route your live video stream to it for packaging:

AWS Elemental MediaStore
MediaStore offers the performance, consistency, and latency required for live and on-demand media delivery. Objects are written and read into a new “temporal” tier of object storage for a limited amount of time, then move silently into S3 for long-lived durability. You simply create a storage container to group your media content:

The container is available within a minute or so:

Like S3 buckets, MediaStore containers have access policies and no limits on the number of objects or storage capacity.

MediaStore helps you to take full advantage of S3 by managing the object key names so as to maximize storage and retrieval throughput, in accord with the Request Rate and Performance Considerations.

AWS Elemental MediaTailor
This service takes care of server-side ad insertion while providing a broadcast-quality viewer experience by transcoding ad assets on the fly. Your customer’s video player asks MediaTailor for a playlist. MediaTailor, in turn, calls your Ad Decision Server and returns a playlist that references the origin server for your original video and the ads recommended by the Ad Decision Server. The video player makes all of its requests to a single endpoint in order to ensure that client-side ad-blocking is ineffective. You simply create a MediaTailor Configuration:

Context information is passed to the Ad Decision Server in the URL:

Despite the length of this post I have barely scratched the surface of the AWS Media Services. Once AWS re:Invent is in the rear view mirror I hope to do a deep dive and show you how to use each of these services.

Available Now
The entire set of AWS Media Services is available now and you can start using them today! Pricing varies by service, but is built around a pay-as-you-go model.

Jeff;

UI Testing at Scale with AWS Lambda

Post Syndicated from Stas Neyman original https://aws.amazon.com/blogs/devops/ui-testing-at-scale-with-aws-lambda/

This is a guest blog post by Wes Couch and Kurt Waechter from the Blackboard Internal Product Development team about their experience using AWS Lambda.

One year ago, one of our UI test suites took hours to run. Last month, it took 16 minutes. Today, it takes 39 seconds. Here’s how we did it.

The backstory:

Blackboard is a global leader in delivering robust and innovative education software and services to clients in higher education, government, K12, and corporate training. We have a large product development team working across the globe in at least 10 different time zones, with an internal tools team providing support for quality and workflows. We have been using Selenium Webdriver to perform automated cross-browser UI testing since 2007. Because we are now practicing continuous delivery, the automated UI testing challenge has grown due to the faster release schedule. On top of that, every commit made to each branch triggers an execution of our automated UI test suite. If you have ever implemented an automated UI testing infrastructure, you know that it can be very challenging to scale and maintain. Although there are services that are useful for testing different browser/OS combinations, they don’t meet our scale needs.

It used to take three hours to synchronously run our functional UI suite, which revealed the obvious need for parallel execution. Previously, we used Mesos to orchestrate a Selenium Grid Docker container for each test run. This way, we were able to run eight concurrent threads for test execution, which took an average of 16 minutes. Although this setup is fine for a single workflow, the cracks started to show when we reached the scale required for Blackboard’s mature product lines. Going beyond eight concurrent sessions on a single container introduced performance problems that impact the reliability of tests (for example, issues in Webdriver or the browser popping up frequently). We tried Mesos and considered Kubernetes for Selenium Grid orchestration, but the answer to scaling a Selenium Grid was to think smaller, not larger. This led to our breakthrough with AWS Lambda.

The solution:

We started using AWS Lambda for UI testing because it doesn’t require costly infrastructure or countless man hours to maintain. The steps we outline in this blog post took one work day, from inception to implementation. By simply packaging the UI test suite into a Lambda function, we can execute these tests in parallel on a massive scale. We use a custom JUnit test runner that invokes the Lambda function with a request to run each test from the suite. The runner then aggregates the results returned from each Lambda test execution.

Selenium is the industry standard for testing UI at scale. Although there are other options to achieve the same thing in Lambda, we chose this mature suite of tools. Selenium is backed by Google, Firefox, and others to help the industry drive their browsers with code. This makes Lambda and Selenium a compelling stack for achieving UI testing at scale.

Making Chrome Run in Lambda

Currently, Chrome for Linux will not run in Lambda due to an absent mount point. By rebuilding Chrome with a slight modification, as Marco Lüthy originally demonstrated, you can run it inside Lambda anyway! It took about two hours to build the current master branch of Chromium to build on a c4.4xlarge. Unfortunately, the current version of ChromeDriver, 2.33, does not support any version of Chrome above 62, so we’ll be using Marco’s modified version of version 60 for the near future.

Required System Libraries

The Lambda runtime environment comes with a subset of common shared libraries. This means we need to include some extra libraries to get Chrome and ChromeDriver to work. Anything that exists in the java resources folder during compile time is included in the base directory of the compiled jar file. When this jar file is deployed to Lambda, it is placed in the /var/task/ directory. This allows us to simply place the libraries in the java resources folder under a folder named lib/ so they are right where they need to be when the Lambda function is invoked.

To get these libraries, create an EC2 instance and choose the Amazon Linux AMI.

Next, use ssh to connect to the server. After you connect to the new instance, search for the libraries to find their locations.

sudo find / -name libgconf-2.so.4
sudo find / -name libORBit-2.so.0

Now that you have the locations of the libraries, copy these files from the EC2 instance and place them in the java resources folder under lib/.

Packaging the Tests

To deploy the test suite to Lambda, we used a simple Gradle tool called ShadowJar, which is similar to the Maven Shade Plugin. It packages the libraries and dependencies inside the jar that is built. Usually test dependencies and sources aren’t included in a jar, but for this instance we want to include them. To include the test dependencies, add this section to the build.gradle file.

shadowJar {
   from sourceSets.test.output
   configurations = [project.configurations.testRuntime]
}

Deploying the Test Suite

Now that our tests are packaged with the dependencies in a jar, we need to get them into a running Lambda function. We use  simple SAM  templates to upload the packaged jar into S3, and then deploy it to Lambda with our settings.

{
   "AWSTemplateFormatVersion": "2010-09-09",
   "Transform": "AWS::Serverless-2016-10-31",
   "Resources": {
       "LambdaTestHandler": {
           "Type": "AWS::Serverless::Function",
           "Properties": {
               "CodeUri": "./build/libs/your-test-jar-all.jar",
               "Runtime": "java8",
               "Handler": "com.example.LambdaTestHandler::handleRequest",
               "Role": "<YourLambdaRoleArn>",
               "Timeout": 300,
               "MemorySize": 1536
           }
       }
   }
}

We use the maximum timeout available to ensure our tests have plenty of time to run. We also use the maximum memory size because this ensures our Lambda function can support Chrome and other resources required to run a UI test.

Specifying the handler is important because this class executes the desired test. The test handler should be able to receive a test class and method. With this information it will then execute the test and respond with the results.

public LambdaTestResult handleRequest(TestRequest testRequest, Context context) {
   LoggerContainer.LOGGER = new Logger(context.getLogger());
  
   BlockJUnit4ClassRunner runner = getRunnerForSingleTest(testRequest);
  
   Result result = new JUnitCore().run(runner);

   return new LambdaTestResult(result);
}

Creating a Lambda-Compatible ChromeDriver

We provide developers with an easily accessible ChromeDriver for local test writing and debugging. When we are running tests on AWS, we have configured ChromeDriver to run them in Lambda.

To configure ChromeDriver, we first need to tell ChromeDriver where to find the Chrome binary. Because we know that ChromeDriver is going to be unzipped into the root task directory, we should point the ChromeDriver configuration at that location.

The settings for getting ChromeDriver running are mostly related to Chrome, which must have its working directories pointed at the tmp/ folder.

Start with the default DesiredCapabilities for ChromeDriver, and then add the following settings to enable your ChromeDriver to start in Lambda.

public ChromeDriver createLambdaChromeDriver() {
   ChromeOptions options = new ChromeOptions();

   // Set the location of the chrome binary from the resources folder
   options.setBinary("/var/task/chrome");

   // Include these settings to allow Chrome to run in Lambda
   options.addArguments("--disable-gpu");
   options.addArguments("--headless");
   options.addArguments("--window-size=1366,768");
   options.addArguments("--single-process");
   options.addArguments("--no-sandbox");
   options.addArguments("--user-data-dir=/tmp/user-data");
   options.addArguments("--data-path=/tmp/data-path");
   options.addArguments("--homedir=/tmp");
   options.addArguments("--disk-cache-dir=/tmp/cache-dir");
  
   DesiredCapabilities desiredCapabilities = DesiredCapabilities.chrome();
   desiredCapabilities.setCapability(ChromeOptions.CAPABILITY, options);
  
   return new ChromeDriver(desiredCapabilities);
}

Executing Tests in Parallel

You can approach parallel test execution in Lambda in many different ways. Your approach depends on the structure and design of your test suite. For our solution, we implemented a custom test runner that uses reflection and JUnit libraries to create a list of test cases we want run. When we have the list, we create a TestRequest object to pass into the Lambda function that we have deployed. In this TestRequest, we place the class name, test method, and the test run identifier. When the Lambda function receives this TestRequest, our LambdaTestHandler generates and runs the JUnit test. After the test is complete, the test result is sent to the test runner. The test runner compiles a result after all of the tests are complete. By executing the same Lambda function multiple times with different test requests, we can effectively run the entire test suite in parallel.

To get screenshots and other test data, we pipe those files during test execution to an S3 bucket under the test run identifier prefix. When the tests are complete, we link the files to each test execution in the report generated from the test run. This lets us easily investigate test executions.

Pro Tip: Dynamically Loading Binaries

AWS Lambda has a limit of 250 MB of uncompressed space for packaged Lambda functions. Because we have libraries and other dependencies to our test suite, we hit this limit when we tried to upload a function that contained Chrome and ChromeDriver (~140 MB). This test suite was not originally intended to be used with Lambda. Otherwise, we would have scrutinized some of the included libraries. To get around this limit, we used the Lambda functions temporary directory, which allows up to 500 MB of space at runtime. Downloading these binaries at runtime moves some of that space requirement into the temporary directory. This allows more room for libraries and dependencies. You can do this by grabbing Chrome and ChromeDriver from an S3 bucket and marking them as executable using built-in Java libraries. If you take this route, be sure to point to the new location for these executables in order to create a ChromeDriver.

private static void downloadS3ObjectToExecutableFile(String key) throws IOException {
   File file = new File("/tmp/" + key);

   GetObjectRequest request = new GetObjectRequest("s3-bucket-name", key);

   FileUtils.copyInputStreamToFile(s3client.getObject(request).getObjectContent(), file);
   file.setExecutable(true);
}

Lambda-Selenium Project Source

We have compiled an open source example that you can grab from the Blackboard Github repository. Grab the code and try it out!

https://blackboard.github.io/lambda-selenium/

Conclusion

One year ago, one of our UI test suites took hours to run. Last month, it took 16 minutes. Today, it takes 39 seconds. Thanks to AWS Lambda, we can reduce our build times and perform automated UI testing at scale!