Tag Archives: open source

AWS Solutions Constructs – A Library of Architecture Patterns for the AWS CDK

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-solutions-constructs-a-library-of-architecture-patterns-for-the-aws-cdk/

Cloud applications are built using multiple components, such as virtual servers, containers, serverless functions, storage buckets, and databases. Being able to provision and configure these resources in a safe, repeatable way is incredibly important to automate your processes and let you focus on the unique parts of your implementation.

With the AWS Cloud Development Kit, you can leverage the expressive power of your favorite programming languages to model your applications. You can use high-level components called constructs, preconfigured with “sensible defaults” that you can customize, to quickly build a new application. The CDK provisions your resources using AWS CloudFormation to get all the benefits of managing your infrastructure as code. One of the reasons I like the CDK, is that you can compose and share your own custom components as higher-level constructs.

As you can imagine, there are recurring patterns that can be useful to more than one customer. For this reason, today we are launching the AWS Solutions Constructs, an open source extension library for the CDK that provides well-architected patterns to help you build your unique solutions. CDK constructs mostly cover single services. AWS Solutions Constructs provide multi-service patterns that combine two or more CDK resources, and implement best practices such as logging and encryption.

Using AWS Solutions Constructs
To see the power of a pattern-based approach, let’s take a look at how that works when building a new application. As an example, I want to build an HTTP API to store data in a Amazon DynamoDB table. To keep the content of the table small, I can use DynamoDB Time to Live (TTL) to expire items after a few days. After the TTL expires, data is deleted from the table and sent, via DynamoDB Streams, to a AWS Lambda function to archive the expired data on Amazon Simple Storage Service (S3).

To build this application, I can use a few components:

  • An Amazon API Gateway endpoint for the API.
  • A DynamoDB table to store data.
  • A Lambda function to process the API requests, and store data in the DynamoDB table.
  • DynamoDB Streams to capture data changes.
  • A Lambda function processing data changes to archive the expired data.

Can I make it simpler? Looking at the available patterns in the AWS Solutions Constructs, I find two that can help me build my app:

  • aws-apigateway-lambda, a Construct that implements an API Gateway REST API connected to a Lambda function. As an example of the “sensible defaults” used by AWS Solutions Constructs, this pattern enables CloudWatch logging for the API Gateway.
  • aws-dynamodb-stream-lambda, a Construct implementing a DynamoDB table streaming data changes to a Lambda function with the least privileged permissions.

To build the final architecture, I simply connect those two Constructs together:

I am using TypeScript to define the CDK stack, and Node.js for the Lambda functions. Let’s start with the CDK stack:

 

import * as cdk from '@aws-cdk/core';
import * as lambda from '@aws-cdk/aws-lambda';
import * as apigw from '@aws-cdk/aws-apigateway';
import * as dynamodb from '@aws-cdk/aws-dynamodb';
import { ApiGatewayToLambda } from '@aws-solutions-constructs/aws-apigateway-lambda';
import { DynamoDBStreamToLambda } from '@aws-solutions-constructs/aws-dynamodb-stream-lambda';

export class DemoConstructsStack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const apiGatewayToLambda = new ApiGatewayToLambda(this, 'ApiGatewayToLambda', {
      deployLambda: true,
      lambdaFunctionProps: {
        code: lambda.Code.fromAsset('lambda'),
        runtime: lambda.Runtime.NODEJS_12_X,
        handler: 'restApi.handler'
      },
      apiGatewayProps: {
        defaultMethodOptions: {
          authorizationType: apigw.AuthorizationType.NONE
        }
      }
    });

    const dynamoDBStreamToLambda = new DynamoDBStreamToLambda(this, 'DynamoDBStreamToLambda', {
      deployLambda: true,
      lambdaFunctionProps: {
        code: lambda.Code.fromAsset('lambda'),
        runtime: lambda.Runtime.NODEJS_12_X,
        handler: 'processStream.handler'
      },
      dynamoTableProps: {
        tableName: 'my-table',
        partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
        timeToLiveAttribute: 'ttl'
      }
    });

    const apiFunction = apiGatewayToLambda.lambdaFunction;
    const dynamoTable = dynamoDBStreamToLambda.dynamoTable;

    dynamoTable.grantReadWriteData(apiFunction);
    apiFunction.addEnvironment('TABLE_NAME', dynamoTable.tableName);
  }
}

At the beginning of the stack, I import the standard CDK constructs for the Lambda function, the API Gateway endpoint, and the DynamoDB table. Then, I add the two patterns from the AWS Solutions Constructs, ApiGatewayToLambda and DynamoDBStreamToLambda.

After declaring the two ApiGatewayToLambda and DynamoDBStreamToLambda constructs, I store the Lambda function, created by the ApiGatewayToLambda constructs, and the DynamoDB table, created by DynamoDBStreamToLambda, in two variables.

At the end of the stack, I “connect” the two patterns together by granting permissions to the Lambda function to read/write in the DynamoDB table, and add the name of the DynamoDB table to the environment of the Lambda function, so that it can be used in the function code to store data in the table.

The code of the two Lambda functions is in the lambda folder of the CDK application. I am using the Node.js 12 runtime.

The restApi.js function implements the API and writes data to the DynamoDB table. The URL path is used as partition key, all the query string parameters in the URL are stored as attributes. The TTL for the item is computed adding a time window of 7 days to the current time.

const { DynamoDB } = require("aws-sdk");

const docClient = new DynamoDB.DocumentClient();

const TABLE_NAME = process.env.TABLE_NAME;
const TTL_WINDOW = 7 * 24 * 60 * 60; // 7 days expressed in seconds

exports.handler = async function (event) {

  const item = event.queryStringParameters;
  item.id = event.pathParameters.proxy;

  const now = new Date(); 
  item.ttl = Math.round(now.getTime() / 1000) + TTL_WINDOW;

  const response = await docClient.put({
    TableName: TABLE_NAME,
    Item: item
  }).promise();

  let statusCode = 204;
  
  if (response.err != null) {
    console.error('request: ', JSON.stringify(event, undefined, 2));
    console.error('error: ', response.err);
    statusCode = 500
  }

  return {
    statusCode: statusCode
  };
};

The processStream.js function is processing data capture records from the DynamoDB Stream, looking for the items deleted by TTL. The archive functionality is not implemented in this sample code.

exports.handler = async function (event) {
  event.Records.forEach((record) => {
    console.log('Stream record: ', JSON.stringify(record, null, 2));
    if (record.userIdentity.type == "Service" &&
      record.userIdentity.principalId == "dynamodb.amazonaws.com") {

      // Record deleted by DynamoDB Time to Live (TTL)
      
      // I can archive the record to S3, for example using Kinesis Data Firehose.
    }
  }
};

Let’s see if this works! First, I need to install all dependencies. To simplify dependencies, each release of AWS Solutions Constructs is linked to the corresponding version of the CDK. I this case, I am using version 1.46.0 for both the CDK and the AWS Solutions Constructs patterns. The first three commands are installing plain CDK constructs. The last two commands are installing the AWS Solutions Constructs patterns I am using for this application.

npm install @aws-cdk/[email protected]
npm install @aws-cdk/[email protected]
npm install @aws-cdk/[email protected]
npm install @aws-solutions-constructs/[email protected]
npm install @aws-solutions-constructs/[email protected]

Now, I build the application and use the CDK to deploy the application.

npm run build
cdk deploy

Towards the end of the output of the cdk deploy command, a green light is telling me that the deployment of the stack is completed. Just next, in the Outputs, I find the endpoint of the API Gateway.

 ✅  DemoConstructsStack

Outputs:
DemoConstructsStack.ApiGatewayToLambdaLambdaRestApiEndpoint9800D4B5 = https://1a2c3c4d.execute-api.eu-west-1.amazonaws.com/prod/

I can now use curl to test the API:

curl "https://1a2c3c4d.execute-api.eu-west-1.amazonaws.com/prod/danilop?name=Danilo&company=AWS"

Let’s have a look at the DynamoDB table:

The item is stored, and the TTL is set. After a week, the item will be deleted and sent via DynamoDB Streams to the processStream.js function.

After I complete my testing, I use the CDK again to quickly delete all resources created for this application:

cdk destroy

Available Now
The AWS Solutions Constructs are available now for TypeScript and Python. The AWS Solutions Builders team is working to make these constructs also available when using Java and C# with the CDK, stay tuned. There is no cost in using the AWS Solutions Constructs, or the CDK, you only pay for the resources created when deploying the stack.

In this first release, 25 patterns are included, covering lots of different use cases. Which new patterns and features should we focus now? Give use your feedback in the open source project repository!

Danilo

Introducing GitHub Super Linter: one linter to rule them all

Post Syndicated from Lucas Gravley original https://github.blog/2020-06-18-introducing-github-super-linter-one-linter-to-rule-them-all/

Setting up a new repository with all the right linters for the different types of code can be time consuming and tedious. So many tools and configurations to choose from and often more than one linter is needed to cover all the languages used.

The GitHub Super Linter was built out of necessity by the GitHub Services DevOps Engineering team to maintain consistency in our documentation and code while making communication and collaboration across the company a more productive experience. Now we are open sourcing that so everyone can use and improve it!

The Super Linter solves many of these requirements through automation. Some included features:

  • Prevent broken code from being uploaded to master branches
  • Help establish coding best practices across multiple languages
  • Build guidelines for code layout and format
  • Automate the process to help streamline code reviews
  • With these basic criteria, we should be shipping better, cleaner, and more stable code internally and to our customers and partners

What is it?

The Super Linter is a source code repository that is packaged into a Docker container and called by GitHub Actions. This allows for any repository on GitHub.com to call the Super Linter and start utilizing its benefits.

The Super Linter will currently support a lot of languages and more coming in the future. For details on languages, check out the README.md.

How it works

When you’ve set your repository to start running this action, any time you open a pull request, it will start linting the code case and return via the Status API. It will let you know if any of your code changes passed successfully, or if any errors were detected, where they are, and what they are. This then allows the developer to go back to their branch, fix any issues, and create a new push to the open pull request. At that point, the Super Linter will run again and validate the updated code and repeat the process. You can configure your branch protection rules to make sure all code must pass before being able to merge as an additional measure.

There’s a ton of customization with flags and templates that can help you customize the Super Linter to your individual repository. Just follow the detailed directions at the Super Linter repository and the Super Linter wiki.

This tool can also be helpful for any repository where multiple types of code and/or documentation all live together (monorepo).

GitHub Super Linter in action

Default rules

Standardizing a rule set across the Super Linter has been an interesting challenge as each developer is unique in how they code. This is why we allow users to use any rules for the linter as they see fit for their repository. But, if no ruleset is defined, we must default to a certain standard.

The rule set for Ruby and Rails are pulled from the Ruby gem: rubocop-github and follow the same rules and versioning we use on GitHub.com.

For other languages, we choose what is the default when installing the linter such as: coffeelint or yamllint. For others, we try to find a happy middle ground that lays the simple groundwork and helps establish some best practices like: Markdownlint or pylint.

The beauty of this is, out of the box you will start establishing the framework, and your team can decide at any point, if additional customization is needed, you have all the ability to do so.

Just navigate to the Super Linter and copy templates from the TEMPLATES folder to your local repository.

Join in the fun

We encourage you to set up this action and start the process of cleaning up your codebase and building your team’s standards and best practices.

How can I contribute?

We’re always looking to update best practices, add additional languages, and make the tool easier for consumption. If you’d like to help contribute to this action, check out our contributing guide.

Learn more about our Super Linter

An open source camera stack for Raspberry Pi using libcamera

Post Syndicated from David Plowman original https://www.raspberrypi.org/blog/an-open-source-camera-stack-for-raspberry-pi-using-libcamera/

Since we released the first Raspberry Pi camera module back in 2013, users have been clamouring for better access to the internals of the camera system, and even to be able to attach camera sensors of their own to the Raspberry Pi board. Today we’re releasing our first version of a new open source camera stack which makes these wishes a reality.

(Note: in what follows, you may wish to refer to the glossary at the end of this post.)

We’ve had the building blocks for connecting other sensors and providing lower-level access to the image processing for a while, but Linux has been missing a convenient way for applications to take advantage of this. In late 2018 a group of Linux developers started a project called libcamera to address that. We’ve been working with them since then, and we’re pleased now to announce a camera stack that operates within this new framework.

Here’s how our work fits into the libcamera project.

We’ve supplied a Pipeline Handler that glues together our drivers and control algorithms, and presents them to libcamera with the API it expects.

Here’s a little more on what this has entailed.

V4L2 drivers

V4L2 (Video for Linux 2) is the Linux kernel driver framework for devices that manipulate images and video. It provides a standardised mechanism for passing video buffers to, and/or receiving them from, different hardware devices. Whilst it has proved somewhat awkward as a means of driving entire complex camera systems, it can nonetheless provide the basis of the hardware drivers that libcamera needs to use.

Consequently, we’ve upgraded both the version 1 (Omnivision OV5647) and version 2 (Sony IMX219) camera drivers so that they feature a variety of modes and resolutions, operating in the standard V4L2 manner. Support for the new Raspberry Pi High Quality Camera (using the Sony IMX477) will be following shortly. The Broadcom Unicam driver – also V4L2‑based – has been enhanced too, signalling the start of each camera frame to the camera stack.

Finally, dumping raw camera frames (in Bayer format) into memory is of limited value, so the V4L2 Broadcom ISP driver provides all the controls needed to turn raw images into beautiful pictures!

Configuration and control algorithms

Of course, being able to configure Broadcom’s ISP doesn’t help you to know what parameters to supply. For this reason, Raspberry Pi has developed from scratch its own suite of ISP control algorithms (sometimes referred to generically as 3A Algorithms), and these are made available to our users as well. Some of the most well known control algorithms include:

  • AEC/AGC (Auto Exposure Control/Auto Gain Control): this monitors image statistics into order to drive the camera exposure to an appropriate level.
  • AWB (Auto White Balance): this corrects for the ambient light that is illuminating a scene, and makes objects that appear grey to our eyes come out actually grey in the final image.

But there are many others too, such as ALSC (Auto Lens Shading Correction, which corrects vignetting and colour variation across an image), and control for noise, sharpness, contrast, and all other aspects of image processing. Here’s how they work together.

The control algorithms all receive statistics information from the ISP, and cooperate in filling in metadata for each image passing through the pipeline. At the end, the metadata is used to update control parameters in both the image sensor and the ISP.

Previously these functions were proprietary and closed source, and ran on the Broadcom GPU. Now, the GPU just shovels pixels through the ISP hardware block and notifies us when it’s done; practically all the configuration is computed and supplied from open source Raspberry Pi code on the ARM processor. A shim layer still exists on the GPU, and turns Raspberry Pi’s own image processing configuration into the proprietary functions of the Broadcom SoC.

To help you configure Raspberry Pi’s control algorithms correctly for a new camera, we include a Camera Tuning Tool. Or if you’d rather do your own thing, it’s easy to modify the supplied algorithms, or indeed to replace them entirely with your own.

Why libcamera?

Whilst ISP vendors are in some cases contributing open source V4L2 drivers, the reality is that all ISPs are very different. Advertising these differences through kernel APIs is fine – but it creates an almighty headache for anyone trying to write a portable camera application. Fortunately, this is exactly the problem that libcamera solves.

We provide all the pieces for Raspberry Pi-based libcamera systems to work simply “out of the box”. libcamera remains a work in progress, but we look forward to continuing to help this effort, and to contributing an open and accessible development platform that is available to everyone.

Summing it all up

So far as we know, there are no similar camera systems where large parts, including at least the control (3A) algorithms and possibly driver code, are not closed and proprietary. Indeed, for anyone wishing to customise a camera system – perhaps with their own choice of sensor – or to develop their own algorithms, there would seem to be very few options – unless perhaps you happen to be an extremely large corporation.

In this respect, the new Raspberry Pi Open Source Camera System is providing something distinctly novel. For some users and applications, we expect its accessible and non-secretive nature may even prove quite game-changing.

What about existing camera applications?

The new open source camera system does not replace any existing camera functionality, and for the foreseeable future the two will continue to co-exist. In due course we expect to provide additional libcamera-based versions of raspistill, raspivid and PiCamera – so stay tuned!

Where next?

If you want to learn more about the libcamera project, please visit https://libcamera.org.

To try libcamera for yourself with a Raspberry Pi, please follow the instructions in our online documentation, where you’ll also find the full Raspberry Pi Camera Algorithm and Tuning Guide.

If you’d like to know more, and can’t find an answer in our documentation, please go to the Camera Board forum. We’ll be sure to keep our eyes open there to pick up any of your questions.

Acknowledgements

Thanks to Naushir Patuck and Dave Stevenson for doing all the really tricky bits (lots of V4L2-wrangling).

Thanks also to the libcamera team (Laurent Pinchart, Kieran Bingham, Jacopo Mondi and Niklas Söderlund) for all their help in making this project possible.

 

Glossary

3A, 3A Algorithms: refers to AEC/AGC (Auto Exposure Control/Auto Gain Control), AWB (Auto White Balance) and AF (Auto Focus) algorithms, but may implicitly cover other ISP control algorithms. Note that Raspberry Pi does not implement AF (Auto Focus), as none of our supported camera modules requires it
AEC: Auto Exposure Control
AF: Auto Focus
AGC: Auto Gain Control
ALSC: Auto Lens Shading Correction, which corrects vignetting and colour variations across an image. These are normally caused by the type of lens being used and can vary in different lighting conditions
AWB: Auto White Balance
Bayer: an image format where each pixel has only one colour component (one of R, G or B), creating a sort of “colour mosaic”. All the missing colour values must subsequently be interpolated. This is a raw image format meaning that no noise, sharpness, gamma, or any other processing has yet been applied to the image
CSI-2: Camera Serial Interface (version) 2. This is the interface format between a camera sensor and Raspberry Pi
GPU: Graphics Processing Unit. But in this case it refers specifically to the multimedia coprocessor on the Broadcom SoC. This multimedia processor is proprietary and closed source, and cannot directly be programmed by Raspberry Pi users
ISP: Image Signal Processor. A hardware block that turns raw (Bayer) camera images into full colour images (either RGB or YUV)
Raw: see Bayer
SoC: System on Chip. The Broadcom processor at the heart of all Raspberry Pis
Unicam: the CSI-2 receiver on the Broadcom SoC on the Raspberry Pi. Unicam receives pixels being streamed out by the image sensor
V4L2: Video for Linux 2. The Linux kernel driver framework for devices that process video images. This includes image sensors, CSI-2 receivers, and ISPs

The post An open source camera stack for Raspberry Pi using libcamera appeared first on Raspberry Pi.

Announcing TorchServe, An Open Source Model Server for PyTorch

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/announcing-torchserve-an-open-source-model-server-for-pytorch/

PyTorch is one of the most popular open source libraries for deep learning. Developers and researchers particularly enjoy the flexibility it gives them in building and training models. Yet, this is only half the story, and deploying and managing models in production is often the most difficult part of the machine learning process: building bespoke prediction APIs, scaling them, securing them, etc.

One way to simplify the model deployment process is to use a model server, i.e. an off-the-shelf web application specially designed to serve machine learning predictions in production. Model servers make it easy to load one or several models, automatically creating a prediction API backed by a scalable web server. They’re also able to run preprocessing and postprocessing code on prediction requests. Last but not least, model servers also provide production-critical features like logging, monitoring, and security. Popular model servers include TensorFlow Serving and the Multi Model Server.

Today, I’m extremely happy to announce TorchServe, a PyTorch model serving library that makes it easy to deploy trained PyTorch models at scale without having to write custom code.

Introducing TorchServe
TorchServe is a collaboration between AWS and Facebook, and it’s available as part of the PyTorch open source project. If you’re interested in how the project was initiated, you can read the initial RFC on Github.

With TorchServe, PyTorch users can now bring their models to production quicker, without having to write custom code: on top of providing a low latency prediction API, TorchServe embeds default handlers for the most common applications such as object detection and text classification. In addition, TorchServe includes multi-model serving, model versioning for A/B testing, monitoring metrics, and RESTful endpoints for application integration. As you would expect, TorchServe supports any machine learning environment, including Amazon SageMaker, container services, and Amazon Elastic Compute Cloud (EC2).

Several customers are already enjoying the benefits of TorchServe.

Toyota Research Institute Advanced Development, Inc. (TRI-AD) is developing software for automated driving at Toyota Motor Corporation. Says Yusuke Yachide, Lead of ML Tools at TRI-AD: “we continuously optimize and improve our computer vision models, which are critical to TRI-AD’s mission of achieving safe mobility for all with autonomous driving. Our models are trained with PyTorch on AWS, but until now PyTorch lacked a model serving framework. As a result, we spent significant engineering effort in creating and maintaining software for deploying PyTorch models to our fleet of vehicles and cloud servers. With TorchServe, we now have a performant and lightweight model server that is officially supported and maintained by AWS and the PyTorch community”.

Matroid is a maker of computer vision software that detects objects and events in video footage. Says Reza Zadeh, Founder and CEO at Matroid Inc.: “we develop a rapidly growing number of machine learning models using PyTorch on AWS and on-premise environments. The models are deployed using a custom model server that requires converting the models to a different format, which is time-consuming and burdensome. TorchServe allows us to simplify model deployment using a single servable file that also serves as the single source of truth, and is easy to share and manage”.

Now, I’d like to show you how to install TorchServe, and load a pretrained model on Amazon Elastic Compute Cloud (EC2). You can try other environments by following the documentation.

Installing TorchServe
First, I fire up a CPU-based Amazon Elastic Compute Cloud (EC2) instance running the Deep Learning AMI (Ubuntu edition). This AMI comes preinstalled with several dependencies that I’ll need, which will speed up setup. Of course you could use any AMI instead.

TorchServe is implemented in Java, and I need the latest OpenJDK to run it.

sudo apt install openjdk-11-jdk

Next, I create and activate a new Conda environment for TorchServe. This will keep my Python packages nice and tidy (virtualenv works too, of course).

conda create -n torchserve

source activate torchserve

Next, I install dependencies for TorchServe.

pip install sentencepiece       # not available as a Conda package

conda install psutil pytorch torchvision torchtext -c pytorch

If you’re using a GPU instance, you’ll need an extra package.

conda install cudatoolkit=10.1

Now that dependencies are installed, I can clone the TorchServe repository, and install TorchServe.

git clone https://github.com/pytorch/serve.git

cd serve

pip install .

cd model-archiver

pip install .

Setup is complete, let’s deploy a model!

Deploying a Model
For the sake of this demo, I’ll simply download a pretrained model from the PyTorch model zoo. In real life, you would probably use your own model.

wget https://download.pytorch.org/models/densenet161-8d451a50.pth

Next, I need to package the model into a model archive. A model archive is a ZIP file storing all model artefacts, i.e. the model itself (densenet161-8d451a50.pth), a Python script to load the state dictionary (matching tensors to layers), and any extra file you may need. Here, I include a file named index_to_name.json, which maps class identifiers to class names. This will be used by the built-in image_classifier handler, which is in charge of the prediction logic. Other built-in handlers are available (object_detector, text_classifier, image_segmenter), and you can implement your own.

torch-model-archiver --model-name densenet161 --version 1.0 \
--model-file examples/image_classifier/densenet_161/model.py \
--serialized-file densenet161-8d451a50.pth \
--extra-files examples/image_classifier/index_to_name.json \
--handler image_classifier

Next, I create a directory to store model archives, and I move the one I just created there.

mkdir model_store

mv densenet161.mar model_store/

Now, I can start TorchServe, pointing it at the model store and at the model I want to load. Of course, I could load several models if needed.

torchserve --start --model-store model_store --models densenet161=densenet161.mar

Still on the same machine, I grab an image and easily send it to TorchServe for local serving using an HTTP POST request. Note the format of the URL, which includes the name of the model I want to use.

curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg

curl -X POST http://127.0.0.1:8080/predictions/densenet161 -T kitten.jpg

The result appears immediately. Note that class names are visible, thanks to the built-in handler.

[
{"tiger_cat": 0.4693356156349182},
{"tabby": 0.46338796615600586},
{"Egyptian_cat": 0.06456131488084793},
{"lynx": 0.0012828155886381865},
{"plastic_bag": 0.00023323005007114261}
]

I then stop TorchServe with the ‘stop‘ command.

torchserve --stop

As you can see, it’s easy to get started with TorchServe using the default configuration. Now let me show you how to set it up for remote serving.

Configuring TorchServe for Remote Serving
Let’s create a configuration file for TorchServe, named config.properties (the default name). This files defines which model to load, and sets up remote serving. Here, I’m binding the server to all public IP addresses, but you can restrict it to a specific address if you want to. As this is running on an EC2 instance, I need to make sure that ports 8080 and 8081 are open in the Security Group.

model_store=model_store
load_models=densenet161.mar
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081

Now I can start TorchServe in the same directory, without having to pass any command line arguments.

torchserve --start

Moving back to my local machine, I can now invoke TorchServe remotely, and get the same result.

curl -X POST http://ec2-54-85-61-250.compute-1.amazonaws.com:8080/predictions/densenet161 -T kitten.jpg

You probably noticed that I used HTTP. I’m guessing a lot of you will require HTTPS in production, so let me show you how to set it up.

Configuring TorchServe for HTTPS
TorchServe can use either the Java keystore or a certificate. I’ll go with the latter.

First, I create a certificate and a private key with openssl.

openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem

Then, I update the configuration file to define the location of the certificate and key, and I bind TorchServe to its default secure ports (don’t forget to update the Security Group).

model_store=model_store
load_models=densenet161.mar
inference_address=https://0.0.0.0:8443
management_address=https://0.0.0.0:8444
private_key_file=mykey.key
certificate_file=mycert.pem

I restart TorchServe, and I can now invoke it with HTTPS. As I use a self-signed certificate, I need to pass the ‘–insecure’ flag to curl.

curl --insecure -X POST https://ec2-54-85-61-250.compute-1.amazonaws.com:8443/predictions/densenet161 -T kitten.jpg

There’s a lot more to TorchServe configuration, and I encourage you to read its documentation!

Getting Started
TorchServe is available now at https://github.com/pytorch/serve.

Give it a try, and please send us feedback on Github.

– Julien

 

 

 

AWS Step Functions support in Visual Studio Code

Post Syndicated from Rob Sutter original https://aws.amazon.com/blogs/compute/aws-step-functions-support-in-visual-studio-code/

The AWS Toolkit for Visual Studio Code has been installed over 115,000 times since launching in July 2019. We are excited to announce toolkit support for AWS Step Functions, enabling you to define, visualize, and create your Step Functions workflows without leaving VS Code.

Version 1.8 of the toolkit provides two new commands in the Command Palette to help you define and visualize your workflows. The toolkit also provides code snippets for seven different Amazon States Language (ASL) state types and additional service integrations to speed up workflow development. Automatic linting detects errors in your state machine as you type, and provides tooltips to help you correct the errors. Finally, the toolkit allows you to create or update Step Functions workflows in your AWS account without leaving VS Code.

Defining a new state machine

To define a new Step Functions state machine, first open the VS Code Command Palette by choosing Command Palette from the View menu. Enter Step Functions to filter the available options and choose AWS: Create a new Step Functions state machine.

Screen capture of the Command Palette in Visual Studio Code with the text ">AWS Step Functions" entered

Creating a new Step Functions state machine in VS Code

A dialog box appears with several options to help you get started quickly. Select Hello world to create a basic example using a series of Pass states.

A screen capture of the Visual Studio Code Command Palette "Select a starter template" dialog with "Hello world" selected

Selecting the “Hello world” starter template

VS Code creates a new Amazon States Language file containing a workflow with examples of the Pass, Choice, Fail, Wait, and Parallel states.

A screen capture of a Visual Studio Code window with a "Hello World" example state machine

The “Hello World” example state machine

Pass states allow you to define your workflow before building the implementation of your logic with Task states. This lets you work with business process owners to ensure you have the workflow right before you start writing code. For more information on the other state types, see State Types in the ASL documentation.

Save your new workflow by choosing Save from the File menu. VS Code automatically applies the .asl.json extension.

Visualizing state machines

In addition to helping define workflows, the toolkit also enables you to visualize your workflows without leaving VS Code.

To visualize your new workflow, open the Command Palette and enter Preview state machine to filter the available options. Choose AWS: Preview state machine graph.

A screen capture of the Visual Studio Code Command Palette with the text ">Preview state machine" entered and the option "AWS: Preview state machine graph" highlighted

Previewing the state machine graph in VS Code

The toolkit renders a visualization of your workflow in a new tab to the right of your workflow definition. The visualization updates automatically as the workflow definition changes.

A screen capture of a Visual Studio Code window with two side-by-side tabs, one with a state machine definition and one with a preview graph for the same state machine

A state machine preview graph

Modifying your state machine definition

The toolkit provides code snippets for 12 different ASL states and service integrations. To insert a code snippet, place your cursor within the States object in your workflow and press Ctrl+Space to show the list of available states.

A screen capture of a Visual Studio Code window with a code snippet insertion dialog showing twelve Amazon States Langauge states

Code snippets are available for twelve ASL states

In this example, insert a newline after the definition of the Pass state, press Ctrl+Space, and choose Map State to insert a code snippet with the required structure for an ASL Map State.

Debugging state machines

The toolkit also includes features to help you debug your Step Functions state machines. Visualization is one feature, as it allows the builder and the product owner to confirm that they have a shared understanding of the relevant process.

Automatic linting is another feature that helps you debug your workflows. For example, when you insert the Map state into your workflow, a number of errors are detected, underlined in red in the editor window, and highlighted in red in the Minimap. The visualization tab also displays an error to inform you that the workflow definition has errors.

A screen capture of a Visual Studio Code window with a tooltip dialog indicating an "Unreachable state" error

A tooltip indicating an “Unreachable state” error

Hovering over an error opens a tooltip with information about the error. In this case, the toolkit is informing you that MapState is unreachable. Correct this error by changing the value of Next in the Pass state above from Hello World Example to MapState. The red underline automatically disappears, indicating the error has been resolved.

To finish reconciling the errors in your workflow, cut all of the following states from Hello World Example? through Hello World and paste into MapState, replacing the existing values of MapState.Iterator.States. The workflow preview updates automatically, indicating that the errors have been resolved. The MapState is indicated by the three dashed lines surrounding most of the workflow.

A Visual Studio Code window displaying two tabs, an updated state machine definition and the automatically-updated preview of the same state machine

Automatically updating the state machine preview after changes

Creating and updating state machines in your AWS account

The toolkit enables you to publish your state machine directly to your AWS account without leaving VS Code. Before publishing a state machine to your account, ensure that you establish credentials for your AWS account for the toolkit.

Creating a state machine in your AWS account

To publish a new state machine to your AWS account, bring up the VS Code Command Palette as before. Enter Publish to filter the available options and choose AWS: Publish state machine to Step Functions.

Screen capture of the Visual Studio Command Palette with the command "AWS: Publish state machine to Step Functions" highlighted

Publishing a state machine to AWS Step Functions

Choose Quick Create from the dialog box to create a new state machine in your AWS account.

Screen Capture from a Visual Studio Code flow to publish a state machine to AWS Step Functions with "Quick Create" highlighted

Publishing a state machine to AWS Step Functions

Select an existing execution role for your state machine to assume. This role must already exist in your AWS account.

For more information on creating execution roles for state machines, please visit Creating IAM Roles for AWS Step Functions.

Screen capture from Visual Studio Code showing a selection execution role dialog with "HelloWorld_IAM_Role" selected

Selecting an IAM execution role for a state machine

Provide a name for the new state machine in your AWS account, for example, Hello-World. The name must be from one to 80 characters, and can use alphanumeric characters, dashes, or underscores.

Screen capture from a Visual Studio Code flow entering "Hello-World" as a state machine name

Naming your state machine

Press the Enter or Return key to confirm the name of your state machine. The Output console opens, and the toolkit displays the result of creating your state machine. The toolkit provides the full Amazon Resource Name (ARN) of your new state machine on completion.

Screen capture from Visual Studio Code showing the successful creation of a new state machine in the Output window

Output of creating a new state machine

You can check creation for yourself by visiting the Step Functions page in the AWS Management Console. Choose the newly-created state machine and the Definition tab. The console displays the definition of your state machine along with a preview graph.

Screen capture of the AWS Management Console showing the newly-created state machine

Viewing the new state machine in the AWS Management Console

Updating a state machine in your AWS account

It is common to change workflow definitions as you refine your application. To update your state machine in your AWS account, choose Quick Update instead of Quick Create. Select your existing workflow.

A screen capture of a Visual Studio Code dialog box with a single state machine displayed and highlighted

Selecting an existing state machine to update

The toolkit displays “Successfully updated state machine” and the ARN of your state machine in the Output window on completion.

Summary

In this post, you learn how to use the AWS Toolkit for VS Code to create and update Step Functions state machines in your local development environment. You discover how sample templates, code snippets, and automatic linting can accelerate your development workflows. Finally, you see how to create and update Step Functions workflows in your AWS account without leaving VS Code.

Install the latest release of the toolkit and start building your workflows in VS Code today.

 

Improving Transparency of AWS Elastic Beanstalk

Post Syndicated from Rob Sutter original https://aws.amazon.com/blogs/compute/improving-transparency-of-aws-elastic-beanstalk/

This post is courtesy of David LaBissoniere, Software Development Manager, AWS Elastic Beanstalk.

Today I want to discuss two recent announcements from the AWS Elastic Beanstalk team which improve transparency into our planning and development. We launched a new public roadmap, and we shifted to developing the Elastic Beanstalk command line interface (EB CLI) on GitHub as a community-involved open source project.

Public Roadmap

In January, we launched an experimental public roadmap on GitHub, joining other teams like AWS container services, AWS CloudFormation, and AWS App Mesh. The roadmap allows us to be more transparent about our priorities, and enables you to directly influence them. You can propose a feature by opening a GitHub issue, or comment on existing issues. 2020 is shaping up to be a significant year for us, and as we continue to invest in the service, we want customer input to help direct our focus.

The roadmap itself is built as a GitHub project board and contains five columns:

Just Shipped — Launched and available for production use.
Public Beta — Available in a preview form but not yet recommended for production usage.
Coming Soon — Launching soon, generally within the next one to three months.
We’re Working On It — In progress, but further out.
Researching — We’re interested in this feature but are still thinking about the best way to implement it.

Screen capture of the AWS Elastic Beanstalk project board on GitHub

Please feel free to create a GitHub issue for a feature you want us to support, or give a thumbs-up to existing issues. We’d also love to hear from you in the issue comments about how you’d like to use a particular feature or how you think it should work. While the roadmap doesn’t include every single item we are working on, it does include many of the regular incremental launches customers rely on, for example, new platform runtime updates like PHP 7.3 or .NET Core 3.1. We’re starting out with a subset of our planned and in-flight work, and expect to gradually expand our use of the roadmap over the course of the year.

EB CLI on GitHub

A popular way to use Elastic Beanstalk is our command line interface, the EB CLI. As of January 16, it is hosted on GitHub as an Apache 2.0-licensed open source project. We plan to do nearly all of our CLI development openly on GitHub and welcome pull requests from the community. Many customers rely on the EB CLI as part of their development and deployment workflows. We hope to improve transparency into this critical tool by open-sourcing it, and we also hope you join us in improving it.

We’re thrilled to start off the year with these two announcements. Watch the roadmap for more announcements in this space!

SVT-AV1: an open-source AV1 encoder and decoder

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/svt-av1-an-open-source-av1-encoder-and-decoder-ad295d9b5ca2

SVT-AV1: open-source AV1 encoder and decoder

by Andrey Norkin, Joel Sole, Mariana Afonso, Kyle Swanson, Agata Opalach, Anush Moorthy, Anne Aaron

SVT-AV1 is an open-source AV1 codec implementation hosted on GitHub https://github.com/OpenVisualCloud/SVT-AV1/ under a BSD + patent license. As mentioned in our earlier blog post, Intel and Netflix have been collaborating on the SVT-AV1 encoder and decoder framework since August 2018. The teams have been working closely on SVT-AV1 development, discussing architectural decisions, implementing new tools, and improving compression efficiency. Since open-sourcing the project, other partner companies and the open-source community have contributed to SVT-AV1. In this tech blog, we will report the current status of the SVT-AV1 project, as well as the characteristics and performance of the encoder and decoder.

SVT-AV1 codebase status

The SVT-AV1 repository includes both an AV1 encoder and decoder, which share a significant amount of the code. The SVT-AV1 decoder is fully functional and compliant with the AV1 specification for all three profiles (Main, High, and Professional).

The SVT-AV1 encoder supports all AV1 tools which contribute to compression efficiency. Compared to the most recent master version of libaom (AV1 reference software), SVT-AV1 is similar in compression efficiency and at the same time achieves significantly lower encoding latency on multi-core platforms when using its inherent parallelization capabilities.

SVT-AV1 is written in C and can be compiled on major platforms, such as Windows, Linux, and macOS. In addition to the pure C function implementations, which allows for more flexible experimentation, the codec features extensive assembly and intrinsic optimizations for the x86 platform. See the next section for an outline of the main SVT-AV1 features that allow high performance at competitive compression efficiency. SVT-AV1 also includes extensive documentation on the encoder design targeted to facilitate the onboarding process for new developers.

Architectural features

One of Intel’s goals for SVT-AV1 development was to create an AV1 encoder that could offer performance and scalability. SVT-AV1 uses parallelization at several stages of the encoding process, which allows it to adapt to the number of available cores, including the newest servers with significant core count. This makes it possible for SVT-AV1 to decrease encoding time while still maintaining compression efficiency.

The SVT-AV1 encoder uses multi-dimensional (process-, picture/tile-, and segment-based) parallelism, multi-stage partitioning decisions, block-based multi-stage and multi-class mode decisions, and RD-optimized classification to achieve attractive trade-offs between compression and performance. Another feature of the SVT architecture is open-loop hierarchical motion estimation, which makes it possible to decouple the first stage of motion estimation from the rest of the encoding process.

Compression efficiency and performance

Encoder performance

SVT-AV1 reaches similar compression efficiency as libaom at the slowest speed settings. During the codec development, we have been tracking the compression and encoding results at the https://videocodectracker.dev/ site. The plot below shows the improvements in the compression efficiency of SVT-AV1 compared to the libaom encoder over time. Note that the libaom compression has also been improving over time, and the plot below represents SVT-AV1 catching up with the moving target. In the plot, the Y-axis shows the additional bitrate in percent needed to achieve similar quality as libaom encoder according to three metrics. The plot shows the results of the 2-pass encoding mode in both codecs. SVT-AV1 uses 4-thread mode, whereas libaom operates in a single-thread mode. The SVT-AV1 results for the 1-pass fixed-QP encoding mode, commonly used in research, are even more competitive, as detailed below.

Reducing BD-rate between SVT-AV1 and libaom in 2-pass encoding mode

The comparison results of the SVT-AV1 against libaom on objective-1-fast test set are presented in the table below. For estimating encoding times, we used Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz machine with 52 physical cores and 96 GB of RAM, with 60 jobs running in parallel. Both codecs use bi-directional hierarchical prediction structure of 16 pictures. The results are presented for 1-pass mode with fixed frame-level QP offsets. A single-threaded compression mode is used. Below, we compute the BD-rates for the various quality metrics: PSNR on all three color planes, VMAF, and MS-SSIM. A negative BD-Rate indicates that the SVT-AV1 encodes produce the same quality with the indicated relative reduction in bitrate. As seen below, SVT-AV1 demonstrates 16.5% decrease in encoding time compared to libaom while being slightly more efficient in compression ability. Note that the encoding times ratio may vary depending on the instruction sets supported by the platform. The results have been obtained on SVT-AV1 cs2 branch (a development branch that is currently being merged into the master, git hash 3a19f29) against the libaom master branch (git hash fe72512). The QP values used to calculate the BD-rates are: 20, 32, 43, 55, 63.

BD-rates of SVT-AV1 vs libaom in 1-pass encoding mode with fixed QP offsets. Negative numbers indicate reduction in bitrate needed to reach the same quality level. The overall encoding time difference is change in total CPU time for all sequences and QPs of SVT-AV1 compared to that of libaom.

*The overall encoding CPU time difference is calculated as change in total CPU time for all sequences and QPs of the test compared to that of the anchor. It is not equal to the average of per sequence values. Per each sequence, the encoding CPU time difference is calculated as change in total CPU time for all QPs for this sequence.

Since all sequences in the objective-1-fast test set have 60 frames, both codecs use one key frame. The following command line parameters have been used to compare the codecs.

libaom parameters:

--passes=1 --lag-in-frames=25 --auto-alt-ref=1 --min-gf-interval=16 --max-gf-interval=16 --gf-min-pyr-height=4 --gf-max-pyr-height=4 --kf-min-dist=65 --kf-max-dist=65 --end-usage=q --use-fixed-qp-offsets=1 --deltaq-mode=0 --enable-tpl-model=0 --cpu-used=0

SVT-AV1 parameters:

--preset 1 --scm 2 --keyint 63 --lookahead 0 --lp 1

The results above demonstrate the excellent objective performance of SVT-AV1. In addition, SVT-AV1 includes implementations of some subjective quality tools, which can be used if the codec is configured for the subjective quality.

Decoder performance

On the objective-1-fast test set, the SVT-AV1 decoder is slightly faster than the libaom in the 1-thread mode, with larger improvements in the 4-thread mode. We observe even larger speed gains over libaom decoder when decoding bitstreams with multiple tiles using the 4-thread mode. The testing has been performed on Windows, Linux, and macOS platforms. We believe the performance is satisfactory for a research decoder, where the trade-offs favor easier experimentation over further optimizations necessary for a production decoder.

Testing framework

To help ensure codec conformance, especially for new code contributions, the code has been comprehensively covered with unit tests and end-to-end tests. The unit tests are built on the Google Test framework. The unit and end-to-end tests are triggered automatically for each pull request to the repository, which is supported by GitHub actions. The tests support sharding, and they run in parallel to speed-up the turn-around time on pull requests.

Unit and e2e test have passed for this pull request

What’s next?

Over the last several months, SVT-AV1 has matured to become a complete encoder/decoder package providing competitive compression efficiency and performance trade-offs. The project is bolstered with extensive unit test coverage and documentation.

Our hope is that the SVT-AV1 codebase helps further adoption of AV1 and encourages more research and development on top of the current AV1 tools. We believe that the demonstrated advantages of SVT-AV1 make it a good platform for experimentation and research. We invite colleagues from industry and academia to check out the project on Github, reach out to the codebase maintainers for questions and comments or join one of the SVT-AV1 Open Dev meetings. We welcome more contributors to the project.


SVT-AV1: an open-source AV1 encoder and decoder was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Automating MySQL schema migrations with GitHub Actions and more

Post Syndicated from Shlomi Noach original https://github.blog/2020-02-14-automating-mysql-schema-migrations-with-github-actions-and-more/

In the past year, GitHub engineers shipped GitHub Packages, Actions, Sponsors, Mobile, security advisories and updates, notifications, code navigation, and more. Needless to say, the development pace at GitHub is accelerated.

With MySQL serving our backends, updating code requires changes to the underlying database schema. New features may require new tables, columns, changes to existing columns or indexes, dropping unused tables, and so on. On average, we have two schema migrations running daily on our production servers. Some days we have a half dozen migrations to run. We’ll cover how this amounted to a significant toil on the database infrastructure team, and how we searched for a solution to automate the manual parts of the process.

What’s in a migration?

At first glance, migrating appears to be no more difficult than adding a CREATE, ALTER or DROP TABLE statement. At a closer look, the process is far more complex, and involves multiple owners, platforms, environments, and transitions between those pieces. Here’s the flow as we experience it at GitHub:

1. Starting the process

It begins with a developer who identifies the need for a schema change. Maybe they need a new table, or a new column in an existing table. The developer has a local testing environment where they can experiment however they like, until they’re satisfied and wish to apply changes to production.

2. Feedback and review

The developer doesn’t just apply their changes online. First, they seek review and discussion with their peers. Depending on the change, they may ask for a review from a group of schema reviewers (at GitHub, this is a volunteer group experienced with database design). Then, they seek the agreement of the database infrastructure team, who owns the production databases. The database infrastructure team reviews the changes, looking for performance concerns, among other potential issues. Assuming all reviews are favorable, it’s on the database infrastructure engineer to deploy the change to production.

3. Taking the change to production

At this point, we need to determine where the change is taking place since we have multiple clusters. Some of them are sharded, so we have to ask: Where do the affected tables exist in our clusters or schemas? Next, we need to know what to run. The developer presented the schema they want to see in production, but how do we transition the existing production schema into the one requested? What’s the formal CREATE, ALTER or DROP statement? Following what to run, we need to know how we should run the migration. Do we run the query directly? Or is it a blocking operation and we need an online schema change tool? And finally, we need to know when to execute the migration. Perhaps now is not a good time if there’s already a migration running on the cluster.

4. Migration

At long last, we’re ready to run the migration. Some of our larger tables may take hours and even days to migrate, especially since the site needs to be up and running. We want to track status. And we want to see what impact the migration may have on production, or, preferably, to ensure it does not have an impact.

5. Completing the process

Even as the migration completes there are further steps to take. There’s some cleanup process, and we want to unblock the next migration, if any currently exists. The database infrastructure team wishes to advertise to the developer that the changes have taken place, and the developer will have their own followup to address.

Throughout that flow, there’s a lot of potential for friction:

  • Does the database infrastructure team review the developer’s request in a timely fashion?
  • Is the review process productive?
  • Do we need to wait for something before running the migration?
  • Is the database infrastructure engineer actually available to run the migration, or perhaps they’re busy with other tasks?

The database infrastructure engineer needs to either create or review the migration statement, double-check their logic, ensure they can begin the migration, follow up, unblock other migrations as needed, advertise progress to the developer, and so on.

With our volume of daily migrations, this flow sometimes consumed hours of work of a database infrastructure engineer per day, and—in the best-case scenario—at least several hours of work per week. They would frequently multitask between two or three migrations and keep mental notes for next steps. Developers would ping us to ask what the status was, and their work was sometimes blocked until the migration was complete.

A brief history of schema migration automation at GitHub

GitHub was originally created as a Ruby on Rails (RoR) app. Like other frameworks, and in particular, those using Active Record, RoR has a built-in mechanism to generate database schema from code, as well as programmatically express migrations. RoR tooling can analyze code changes and create and run the SQL statements to change the database schema.

We use the GitHub flow to manage our own development: when suggesting a change, we create a branch, commit, push, and open a pull request. We use the declarative approach to schema definition: our RoR GitHub repository contains the full schema definition, such as the CREATE TABLE statements that generate the complete schema. This way, we know exactly what schema is associated with each commit or branch. Counter that with the programmatic approach, where your commits contain migration statements, and where to deduce a schema you need to start at some baseline and run through all statements sequentially.

The database infrastructure and the application teams collaborated to create a set of chatops tooling. We ran a chatops command to list pull requests with schema changes, and then another command to generate the CREATE/ALTER/DROP statement for a given pull request. For this, we used RoR’s rake command. Our wrapper scripts then added meta information, like which cluster is involved, and generated a script used to run the migration.

The generated statements and script were mostly fine, with occasional SQL syntax errors. We’d review the output and fix it manually as needed.

A few years ago we developed gh-ost, an online table migration solution, which added even more visibility and control through our chatops. We’d check progress, change runtime configuration, and cut-over the migration through chat. While simple, these were still manual steps.

The heart of GitHub’s app remains with the same RoR, but we’ve expanded far beyond it. We created more repositories and some also use RoR, while others are in other programming languages such as Go. However, we didn’t use Object Relational Mapping practice with the new repositories.

As GitHub expanded, the more toil the database infrastructure team had. We’d review pull requests, compare schemas, generate migration statements manually, and verify on a local machine. Other than the git log, no formal tracking for schema migrations existed. We’d check in chat, issues, and pull requests to see what was done and what wasn’t. We’d keep track of ongoing migrations in our heads, context switch between the migrations throughout the day, and how often we’d get interrupted by notifications. And we did this while taking each migration through the next step, keeping mental notes, and communicating the progress to our peers.

With these steps in mind, we wanted a solution to automate the process. We came up with various ideas, and in 2019 GitHub Actions was released. This was our solution: multiple loosely coupled components, each owning a specific aspect of the flow, all orchestrated by a controller service. The next section covers the breakdown of our solution.

Code

Our basic premise is that schema design should be treated as code. We want the schema to be versioned, and we want to know what schema is associated and with what version of our code.

To illustrate, GitHub provides not only github.com, but also GitHub Enterprise, an on-premise solution. On github.com we run continuous deployments. With GitHub Enterprise, we make periodic releases, and our customers can upgrade in-house. This means we need to be able to reproduce any schema changes we make to github.com on a customer’s Enterprise server.

Therefore we must keep our schema design coupled with the code in the same git repository. For a developer to design a schema change, they need to follow our normal development flow: create a branch, commit, push, and open a pull request. The pull request is where code is reviewed and discussion takes place for any changes. It’s where continuous integration and testing run. Our solution revolves around the pull request, and this is standardized across all our repositories.

The change

Once a pull request is opened, we need to be able to identify what changes we’d like to make. Typically, when we review code changes, we look at the diff. And it might be tempting to expect that git diff can help us formalize the schema change. Unfortunately, this is not the case, and git diff is poor at identifying these changes. For example, consider this simplified table definition:

CREATE TABLE some_table (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  hostname varchar(128) NOT NULL,
  PRIMARY KEY (id),
  KEY (hostname)
);

Suppose we decide to add a new column and drop the index on hostname. The new schema becomes:

CREATE TABLE some_table (
  id int(10) unsigned NOT NULL AUTO_INCREMENT,
  hostname varchar(128) NOT NULL,
  time_created TIMESTAMP NOT NULL,
  PRIMARY KEY (id)
);

Running git diff on the two schemas yields the following:

@@ -1,6 +1,6 @@
 CREATE TABLE some_table (
   id int(10) unsigned NOT NULL DEFAULT 0,
   hostname varchar(128) NOT NULL,
-  PRIMARY KEY (id),
-  KEY (hostname)
+  time_created TIMESTAMP NOT NULL,
+  PRIMARY KEY (id)
 );

The pull request’s “Files changed” tab shows the same:

This is a sample Pull Request where we change a table's schema. git diff does a poor job of analyzing the schema change.

See how the PRIMARY KEY line goes into the diff because of the trailing comma. This diff does not capture the schema change well, and while RoR provides tooling for that,  we’ve still had to carefully review them. Fortunately, there’s a good MySQL-oriented tool to do the task.

skeema

skeema is an open source schema management utility developed by Evan Elias. It expects the declarative approach, and looks for a schema definition on your file system (hopefully as part of your repository). The file system layout should include a directory per schema/database, a file per table, and then some special configuration files telling skeema the identities of, and the credentials for, MySQL servers in various environments. Skeema is able to run useful tasks, such as:

  • skeema diff: generate SQL statements that convert the existing database schema into the schema defined in the file system. This includes as many CREATE, ALTER and DROP TABLE statements as needed.
  • skeema push: actually apply changes to the database server for the schema to match the one on file system
  • skeema pull: rewrite the filesystem schema based on the existing schema in the MySQL server.

skeema can do much more, including the ability to invoke online schema change tools—but that’s outside this post’s scope.

Git users will feel comfortable with skeema. Indeed, skeema works very well with git-versioned schemas. For us, the most valuable asset is its diff output: a well formed, reliable set of statements to show the SQL transition from one schema to another. For example, skeema diff output for the above schema change is:

USE `test`;
ALTER TABLE `some_table` ADD COLUMN `time_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, DROP KEY `hostname`;

Note that the above is not only correct, but also formal. It reproduces correctly whether our code uses lower/upper case, includes/omits default value, etc.

We wanted to use skeema to tell us what statements we needed to run to get from our existing state into the state defined in the pull request. Assuming the master branch reflects our current production schema, this now becomes a matter of diffing the schemas between master and the pull request’s branch.

Skeema wasn’t without its challenges, and we had to figure out where to place skeema from a design perspective. Do the developers own it? Does every repository own it? Is there a central service to own it? Each presented its own problems, from false ownership to excessive responsibilities and access.

GitHub Actions

Enter GitHub Actions. With Actions, you’re able to run code as a response to events taking place in your repository. A new pull request, review, comment, issue, and quite a few others, are such events. The code (the action) is arbitrary, and GitHub spawns a container on its own infrastructure, where your code will run. What makes this extra interesting is that the container can get access to your repository. GitHub Actions implicitly receives an API token to interact with the repository.

The container comes with popular software packages pre-installed, such as a MySQL server.

Perhaps the most classic use of Actions is CI/CD.  When  a pull_request event occurs (a new pull request and any subsequent commit) run some code to build, test, lint, or validate the change. We took this approach to run skeema as part of a pull_request action flow, called skeema-diff.

Here’s a simplified breakdown of the action:

  1. Fetch skeema binary
  2. Checkout master branch
  3. Run skeema push to populate the container’s MySQL server with the schema as defined by the master branch
  4. Checkout pull request’s branch
  5. Run skeema diff to generate the statements that take the schema from the one in MySQL (remember, this is the master schema) to the one in the pull request’s branch
  6. Add the diff as a comment in the pull request
  7. Add a special label to indicate this pull request has a schema change

The GitHub Action, running skeema, generates schema diff output, which is added as a comment to the Pull Request. The comment presents the correct ALTER statement implied by the code change. This comment is both human and machine readable.

The code is more complex than what we’ve shown. We actually use base and head instead of master and branch, and there’s some logic to formalize, edit and validate the diff, to handle commits that further change the schema, among other processes.

By now, we have a partial flow, which works entirely on GitHub’s platform:

  • Schema change as code
  • Review process, based on GitHub’s pull request flow
  • Automated schema change analysis, based on skeema running in a GitHub Action
  • A visible output, presented as a pull request comment

Up to this point, everything is constrained to the repository. The repository itself doesn’t have information about where the schema gets deployed in production. This information is something that’s outside the repository’s scope, and it’s owned by the database infrastructure team rather than the repository’s developers. Neither the repository nor any action running on that repository has access to production, nor should they, as that would be a breach of domains.

Before we describe how the schema gets to production, let’s jump ahead and discuss the schema migration itself.

Schema migrations and gh-ost

Even the simplest schema migration isn’t simple. We are concerned with three types of table migrations:

  • CREATE TABLE is the simplest and the safest. We created something that didn’t exist before, and its creation time is instantaneous. Note that if the target cluster is sharded, this must be applied on all shards. If the cluster is sharded with vitess, then the vitess vtgate service automatically handles this for us.
  • DROP TABLE is a simple statement that comes with a great risk. What if it’s still in use and some code breaks as a result of the table going away? Note that we don’t actually drop tables as part of schema migrations. Any DROP TABLE statement is converted into a RENAME TABLE. Instead of DROP TABLE repositories (whoops!), our automation runs RENAME TABLE repositories TO _repositories_DROP_20200101123456. If our application fails because of this, we have an instant revert command: RENAME back to the original. Renamed tables are kept around for a few days prior to being garbage collected and dropped by our automation.
  • ALTER TABLE is the most complex case, mainly because it takes time to alter a table. We don’t actually ALTER tables in-place. We use gh-ost to emulate an ALTER TABLE, and the end result is the same even though the process is completely different. It doesn’t lock our apps, throttles as much as needed, and it’s controllable as well as auditable. We’ve run gh-ost in production for over three and a half years. It has little to no impact on production, and we generally don’t care that it’s running. But some of our larger tables may still take hours or even days to migrate. We also only run one ALTER (or, gh-ost) at a time on a cluster. Concurrent migrations are possible but compete over resources, leading to overall longer runtimes than sequential execution. This means that an ALTER migration requires scheduling. We need to be able to tell if a migration is already running on a cluster, as well as prioritize and queue migrations that apply to the same cluster. We also need to be able to tell the status over the duration of hours or days, and this needs to be communicated to the developer, the owner of the change. And, if the cluster is sharded, we need to run the migration per shard.

In order to run a migration, we must first determine the strategy for that migration (Is it direct query, gh-ost, or a manual?). We need to be able to tell where it can run,  how to go about the process if the cluster is sharded, as well as When to schedule it. While migrations can wait in queue while others are running, we want to be able to prioritize migrations, in case the queue is large.

skeefree

We created skeefree as the glue, which means it’s an orchestrating service that’s aware of our repositories, can communicate with our pull requests, knows about production (or, can get information about production) and which invokes the migrations. We run skeefree as a stateless kubernetes service, backed by a MySQL database that holds the state. Note that skeefree’s own schema is managed by skeefree.

skeefree uses GitHub’s API to interact with pull requests, GitHub’s internal inventory and discovery services, to locate clusters in production, and gh-ost to run migrations. Skeefree is best described by following a schema migration flow:

  1. A developer wishes to change the schema, so they open a pull request.
  2. skeema-diff Action springs to life and seeks a schema change. If a schema change isn’t found in the pull request, nothing happens. If there is a schema change, the Action, computes the change via skeema, adds a well-formed comment to the pull request indicating the change, and adds a migration:skeema:diff label to the pull request. This is done via the GitHub API.
  3. A developer looks into the change, and seeks review from a team member. At this time they may communicate to team members without actually going to production. Finally, they add the label migration:for:review.
  4. skeefree is aware of the developer’s repository and uses the GitHub API to periodically look for open pull requests, which are labeled by both migration:skeema:diff and migration:for:review, and have been approved by at least one developer.
  5. Once detected, skeefree investigates the pull request, and reads the schema change comment, generated by the Action. It maps the schema/repository to the schema/production cluster, and uses our inventory and discovery services to know if the cluster is sharded. Then, it finds the location and name of the cluster.
  6. skeefree then adds this to its backend database, and advertises its analysis on the pull request with another comment. This comment generally means “here’s what I will do if you approve”. And it proceeds to get a review from an authority. Once the user labels the Pull Request as "migration:for:review", skeefree analyzes the migration and evaluates where it needs to run. It proceeds to seek review from an authority.
  7. For most repositories, the authority is the database-infrastructure team. On our original RoR repository, we also seek review from a cross-functional team, known as the db-schema-reviewers, who are familiar with the general application and database design throughout the years and who have more context to offer. skeefree automatically knows which teams should be notified on which repositories.
  8. The relevant teams review and hopefully approve, and skeefree detects the approval, before choosing the proper strategy (direct query for CREATE and DROP, or RENAME), and gh-ost for ALTER. It then queues the migration(s).
  9. skeefree’s scheduler periodically checks what next can be executed. Remember we only run a single ALTER migration on a given cluster at a time, but we also have a limited number of runner hosts. If there’s a free runner host and the cluster is not running any migration, skeefree then proceeds to kick off a migration. Skeefree advertises this fact as a pull request comment to notify the developer that the migration started.
  10. Once the migration is complete, skeefree announces it in a pull request comment. The same applies should the migration fail.
  11. The pull request may also have more than one migration. Perhaps the cluster is sharded, or there may be multiple tables changed in the pull request. Once all migrations are successfully completed, skeefree advertises this in a pull request comment. The developer is notified that all migrations are done, and they’re encouraged to proceed with their standard deploy/merge flow.

as skeefree runs the migrations, it adds comments on the Pull Request page to indicate its progress. When all migrations are complete, skeefree comments as much, again on the pull request page.

Analysis of the flow

There are a few nuances here that make a good experience to everyone involved:

  • The database infrastructure team doesn’t know about the pull request until the developer explicitly adds the migration:for:review label. It’s like a draft pull request or a pull request that’s a work in progress, only this flag applies specifically to the schema migration flow. This allows the developer to use their preferred flow, and communicate with their team without interrupting the database infrastructure team or getting premature reviews.
  • The skeema analysis is contained within the repository, which means That no external service is required. The developer can check the diff result, themselves.
  • The Action is the only part of the flow that looks at the code. Neither skeefree nor gh-ost look at the actual code, and they don’t need git access.
  • The database infrastructure team only needs to take a single step, which is review the pull request.
  • The developers own the creation of pull requests, getting peer reviews, and finally, deploying and merging. These are the exact operations that should be under their ownership. Moreover, they get visibility into the state of their migration. By looking at the pull request page or their GitHub notifications, they can tell whether the pull request has been reviewed, queued, started, completed, or failed. They don’t need to ask. Even better, we have chatops that give visibility into the overall state of migration queue, a running migration’s progress, and more. These chatops are available for all to invoke.
  • The database infrastructure team owns the process of mapping the repository schema to production. This is done via chatops, but can also be completed via configuration. The team is able to cancel a pull request, retry a failed migration, and more.
  • gh-ost is generally trusted, and we have control over a running migration. This means that we can force it to throttle, set up a different throttle threshold, make it use less resources, or terminate it, if needed. We also have a throttling mechanism throughout our stack, so that long running processes like migrations yield to higher priority operations, which extends their own runtime so it doesn’t generate too much load on our database servers.
  • We use our own prefered pull request flow, oActions (skeefree was an early adopter for Actions), GitHub API, and our existing datacenter and database infrastructure, all of which are well understood internally.

Public availability

skeefree and the skeema-diff Action were authored internally at GitHub to solve a specific problem. skeefree uses our internal inventory and discovery services, it works with our chatops and uses some internal libraries.

Our experience in releasing open source software is that no one’s use case is exactly the same as ours. Our perception of an automated migrations flow may be very different from another organization’s perception. We still want to share more than just our words, so we’ve open sourced the code.

It’s a bit of a peculiar OSS release:

  • it’s missing some libraries; it will not build.
  • It expects some of our internal services to exist, which more than likely won’t be on your platform.
  • It expects chatops, and you may not be using chatops.
  • The code also needs to be rewritten for adaptation to your environment,

Note that the code is available, but not open for issues and pull requests. We hope the community finds it useful.

Get the code

The post Automating MySQL schema migrations with GitHub Actions and more appeared first on The GitHub Blog.

Building an AWS IoT Core device using AWS Serverless and an ESP32

Post Syndicated from Moheeb Zara original https://aws.amazon.com/blogs/compute/building-an-aws-iot-core-device-using-aws-serverless-and-an-esp32/

Using a simple Arduino sketch, an AWS Serverless Application Repository application, and a microcontroller, you can build a basic serverless workflow for communicating with an AWS IoT Core device.

A microcontroller is a programmable chip and acts as the brain of an electronic device. It has input and output pins for reading and writing on digital or analog components. Those components could be sensors, relays, actuators, or various other devices. It can be used to build remote sensors, home automation products, robots, and much more. The ESP32 is a powerful low-cost microcontroller with Wi-Fi and Bluetooth built in and is used this walkthrough.

The Arduino IDE, a lightweight development environment for hardware, now includes support for the ESP32. There is a large collection of community and officially supported libraries, from addressable LED strips to spectral light analysis.

The following walkthrough demonstrates connecting an ESP32 to AWS IoT Core to allow it to publish and subscribe to topics. This means that the device can send any arbitrary information, such as sensor values, into AWS IoT Core while also being able to receive commands.

Solution overview

This post walks through deploying an application from the AWS Serverless Application Repository. This allows an AWS IoT device to be messaged using a REST endpoint powered by Amazon API Gateway and AWS Lambda. The AWS SAR application also configures an AWS IoT rule that forwards any messages published by the device to a Lambda function that updates an Amazon DynamoDB table, demonstrating basic bidirectional communication.

The last section explores how to build an IoT project with real-world application. By connecting a thermal printer module and modifying a few lines of code in the example firmware, the ESP32 device becomes an AWS IoT–connected printer.

All of this can be accomplished within the AWS Free Tier, which is necessary for the following instructions.

An example of an AWS IoT project using an ESP32, AWS IoT Core, and an Arduino thermal printer

An example of an AWS IoT project using an ESP32, AWS IoT Core, and an Arduino thermal printer.

Required steps

To complete the walkthrough, follow these steps:

  • Create an AWS IoT device.
  • Install and configure the Arduino IDE.
  • Configure and flash an ESP32 IoT device.
  • Deploying the lambda-iot-rule AWS SAR application.
  • Monitor and test.
  • Create an IoT thermal printer.

Creating an AWS IoT device

To communicate with the ESP32 device, it must connect to AWS IoT Core with device credentials. You must also specify the topics it has permissions to publish and subscribe on.

  1. In the AWS IoT console, choose Register a new thing, Create a single thing.
  2. Name the new thing. Use this exact name later when configuring the ESP32 IoT device. Leave the remaining fields set to their defaults. Choose Next.
  3.  Choose Create certificate. Only the thing cert, private key, and Amazon Root CA 1 downloads are necessary for the ESP32 to connect. Download and save them somewhere secure, as they are used when programming the ESP32 device.
  4. Choose Activate, Attach a policy.
  5. Skip adding a policy, and choose Register Thing.
  6. In the AWS IoT console side menu, choose Secure, Policies, Create a policy.
  7. Name the policy Esp32Policy. Choose the Advanced tab.
  8. Paste in the following policy template.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "iot:Connect",
          "Resource": "arn:aws:iot:REGION:ACCOUNT_ID:client/THINGNAME"
        },
        {
          "Effect": "Allow",
          "Action": "iot:Subscribe",
          "Resource": "arn:aws:iot:REGION:ACCOUNT_ID:topicfilter/esp32/sub"
        },
    	{
          "Effect": "Allow",
          "Action": "iot:Receive",
          "Resource": "arn:aws:iot:REGION:ACCOUNT_ID:topic/esp32/sub"
        },
        {
          "Effect": "Allow",
          "Action": "iot:Publish",
          "Resource": "arn:aws:iot:REGION:ACCOUNT_ID:topic/esp32/pub"
        }
      ]
    }
  9. Replace REGION with the matching AWS Region you’re currently operating in. This can be found on the top right corner of the AWS console window.
  10.  Replace ACCOUNT_ID with your own, which can be found in Account Settings.
  11. Replace THINGNAME with the name of your device.
  12. Choose Create.
  13. In the AWS IoT console, choose Secure, Certification. Select the one created for your device and choose Actions, Attach policy.
  14. Choose Esp32Policy, Attach.

Your AWS IoT device is now configured to have permission to connect to AWS IoT Core. It can also publish to the topic esp32/pub and subscribe to the topic esp32/sub. For more information on securing devices, see AWS IoT Policies.

Installing and configuring the Arduino IDE

The Arduino IDE is an open-source development environment for programming microcontrollers. It supports a continuously growing number of platforms including most ESP32-based modules. It must be installed along with the ESP32 board definitions, MQTT library, and ArduinoJson library.

  1. Download the Arduino installer for the desired operating system.
  2. Start Arduino and open the Preferences window.
  3. For Additional Board Manager URLs, add
    https://dl.espressif.com/dl/package_esp32_index.json.
  4. Choose Tools, Board, Boards Manager.
  5. Search esp32 and install the latest version.
  6. Choose Sketch, Include Library, Manage Libraries.
  7. Search MQTT, and install the latest version by Joel Gaehwiler.
  8. Repeat the library installation process for ArduinoJson.

The Arduino IDE is now installed and configured with all the board definitions and libraries needed for this walkthrough.

Configuring and flashing an ESP32 IoT device

A collection of various ESP32 development boards.

A collection of various ESP32 development boards.

For this section, you need an ESP32 device. To check if your board is compatible with the Arduino IDE, see the boards.txt file. The following code connects to AWS IoT Core securely using MQTT, a publish and subscribe messaging protocol.

This project has been tested on the following devices:

  1. Install the required serial drivers for your device. Some boards use different USB/FTDI chips for interfacing. Here are the most commonly used with links to drivers.
  2. Open the Arduino IDE and choose File, New to create a new sketch.
  3. Add a new tab and name it secrets.h.
  4. Paste the following into the secrets file.
    #include <pgmspace.h>
    
    #define SECRET
    #define THINGNAME ""
    
    const char WIFI_SSID[] = "";
    const char WIFI_PASSWORD[] = "";
    const char AWS_IOT_ENDPOINT[] = "xxxxx.amazonaws.com";
    
    // Amazon Root CA 1
    static const char AWS_CERT_CA[] PROGMEM = R"EOF(
    -----BEGIN CERTIFICATE-----
    -----END CERTIFICATE-----
    )EOF";
    
    // Device Certificate
    static const char AWS_CERT_CRT[] PROGMEM = R"KEY(
    -----BEGIN CERTIFICATE-----
    -----END CERTIFICATE-----
    )KEY";
    
    // Device Private Key
    static const char AWS_CERT_PRIVATE[] PROGMEM = R"KEY(
    -----BEGIN RSA PRIVATE KEY-----
    -----END RSA PRIVATE KEY-----
    )KEY";
  5. Enter the name of your AWS IoT thing, as it is in the console, in the field THINGNAME.
  6. To connect to Wi-Fi, add the SSID and PASSWORD of the desired network. Note: The network name should not include spaces or special characters.
  7. The AWS_IOT_ENDPOINT can be found from the Settings page in the AWS IoT console.
  8. Copy the Amazon Root CA 1, Device Certificate, and Device Private Key to their respective locations in the secrets.h file.
  9. Choose the tab for the main sketch file, and paste the following.
    #include "secrets.h"
    #include <WiFiClientSecure.h>
    #include <MQTTClient.h>
    #include <ArduinoJson.h>
    #include "WiFi.h"
    
    // The MQTT topics that this device should publish/subscribe
    #define AWS_IOT_PUBLISH_TOPIC   "esp32/pub"
    #define AWS_IOT_SUBSCRIBE_TOPIC "esp32/sub"
    
    WiFiClientSecure net = WiFiClientSecure();
    MQTTClient client = MQTTClient(256);
    
    void connectAWS()
    {
      WiFi.mode(WIFI_STA);
      WiFi.begin(WIFI_SSID, WIFI_PASSWORD);
    
      Serial.println("Connecting to Wi-Fi");
    
      while (WiFi.status() != WL_CONNECTED){
        delay(500);
        Serial.print(".");
      }
    
      // Configure WiFiClientSecure to use the AWS IoT device credentials
      net.setCACert(AWS_CERT_CA);
      net.setCertificate(AWS_CERT_CRT);
      net.setPrivateKey(AWS_CERT_PRIVATE);
    
      // Connect to the MQTT broker on the AWS endpoint we defined earlier
      client.begin(AWS_IOT_ENDPOINT, 8883, net);
    
      // Create a message handler
      client.onMessage(messageHandler);
    
      Serial.print("Connecting to AWS IOT");
    
      while (!client.connect(THINGNAME)) {
        Serial.print(".");
        delay(100);
      }
    
      if(!client.connected()){
        Serial.println("AWS IoT Timeout!");
        return;
      }
    
      // Subscribe to a topic
      client.subscribe(AWS_IOT_SUBSCRIBE_TOPIC);
    
      Serial.println("AWS IoT Connected!");
    }
    
    void publishMessage()
    {
      StaticJsonDocument<200> doc;
      doc["time"] = millis();
      doc["sensor_a0"] = analogRead(0);
      char jsonBuffer[512];
      serializeJson(doc, jsonBuffer); // print to client
    
      client.publish(AWS_IOT_PUBLISH_TOPIC, jsonBuffer);
    }
    
    void messageHandler(String &topic, String &payload) {
      Serial.println("incoming: " + topic + " - " + payload);
    
    //  StaticJsonDocument<200> doc;
    //  deserializeJson(doc, payload);
    //  const char* message = doc["message"];
    }
    
    void setup() {
      Serial.begin(9600);
      connectAWS();
    }
    
    void loop() {
      publishMessage();
      client.loop();
      delay(1000);
    }
  10. Choose File, Save, and give your project a name.

Flashing the ESP32

  1. Plug the ESP32 board into a USB port on the computer running the Arduino IDE.
  2. Choose Tools, Board, and then select the matching type of ESP32 module. In this case, a Sparkfun ESP32 Thing was used.
  3. Choose Tools, Port, and then select the matching port for your device.
  4. Choose Upload. Arduino reads Done uploading when the upload is successful.
  5. Choose the magnifying lens icon to open the Serial Monitor. Set the baud rate to 9600.

Keep the Serial Monitor open. When connected to Wi-Fi and then AWS IoT Core, any messages received on the topic esp32/sub are logged to this console. The device is also now publishing to the topic esp32/pub.

The topics are set at the top of the sketch. When changing or adding topics, remember to add permissions in the device policy.

// The MQTT topics that this device should publish/subscribe
#define AWS_IOT_PUBLISH_TOPIC   "esp32/pub"
#define AWS_IOT_SUBSCRIBE_TOPIC "esp32/sub"

Within this sketch, the relevant functions are publishMessage() and messageHandler().

The publishMessage() function creates a JSON object with the current time in milliseconds and the analog value of pin A0 on the device. It then publishes this JSON object to the topic esp32/pub.

void publishMessage()
{
  StaticJsonDocument<200> doc;
  doc["time"] = millis();
  doc["sensor_a0"] = analogRead(0);
  char jsonBuffer[512];
  serializeJson(doc, jsonBuffer); // print to client

  client.publish(AWS_IOT_PUBLISH_TOPIC, jsonBuffer);
}

The messageHandler() function prints out the topic and payload of any message from a subscribed topic. To see all the ways to parse JSON messages in Arduino, see the deserializeJson() example.

void messageHandler(String &topic, String &payload) {
  Serial.println("incoming: " + topic + " - " + payload);

//  StaticJsonDocument<200> doc;
//  deserializeJson(doc, payload);
//  const char* message = doc["message"];
}

Additional topic subscriptions can be added within the connectAWS() function by adding another line similar to the following.

// Subscribe to a topic
  client.subscribe(AWS_IOT_SUBSCRIBE_TOPIC);

  Serial.println("AWS IoT Connected!");

Deploying the lambda-iot-rule AWS SAR application

Now that an ESP32 device has been connected to AWS IoT, the following steps walk through deploying an AWS Serverless Application Repository application. This is a base for building serverless integration with a physical device.

  1. On the lambda-iot-rule AWS Serverless Application Repository application page, make sure that the Region is the same as the AWS IoT device.
  2. Choose Deploy.
  3. Under Application settings, for PublishTopic, enter esp32/sub. This is the topic to which the ESP32 device is subscribed. It receives messages published to this topic. Likewise, set SubscribeTopic to esp32/pub, the topic on which the device publishes.
  4. Choose Deploy.
  5. When creation of the application is complete, choose Test app to navigate to the application page. Keep this page open for the next section.

Monitoring and testing

At this stage, two Lambda functions, a DynamoDB table, and an AWS IoT rule have been deployed. The IoT rule forwards messages on topic esp32/pub to TopicSubscriber, a Lambda function, which inserts the messages on to the DynamoDB table.

  1. On the application page, under Resources, choose MyTable. This is the DynamoDB table that the TopicSubscriber Lambda function updates.
  2. Choose Items. If the ESP32 device is still active and connected, messages that it has published appear here.

The TopicPublisher Lambda function is invoked by the API Gateway endpoint and publishes to the AWS IoT topic esp32/sub.

1.     On the application page, find the Application endpoint.

2.     To test that the TopicPublisher function is working, enter the following into a terminal or command-line utility, replacing ENDPOINT with the URL from above.

curl -d '{"text":"Hello world!"}' -H "Content-Type: application/json" -X POST https://ENDPOINT/publish

Upon success, the request returns a copy of the message.

Back in the Serial Monitor, the message published to the topic esp32/sub prints out.

Creating an IoT thermal printer

With the completion of the previous steps, the ESP32 device currently logs incoming messages to the serial console.

The following steps demonstrate how the code can be modified to use incoming messages to interact with a peripheral component. This is done by wiring a thermal printer to the ESP32 in order to physically print messages. The REST endpoint from the previous section can be used as a webhook in third-party applications to interact with this device.

A wiring diagram depicting an ESP32 connected to a thermal printer.

A wiring diagram depicting an ESP32 connected to a thermal printer.

  1. Follow the product instructions for powering, wiring, and installing the correct Arduino library.
  2. Ensure that the thermal printer is working by holding the power button on the printer while connecting the power. A sample receipt prints. On that receipt, the default baud rate is specified as either 9600 or 19200.
  3. In the Arduino code from earlier, include the following lines at the top of the main sketch file. The second line defines what interface the thermal printer is connected to. &Serial2 is used to set the third hardware serial interface on the ESP32. For this example, the pins on the Sparkfun ESP32 Thing, GPIO16/GPIO17, are used for RX/TX respectively.
    #include "Adafruit_Thermal.h"
    
    Adafruit_Thermal printer(&Serial2);
  4. Replace the setup() function with the following to initialize the printer on device bootup. Change the baud rate of Serial2.begin() to match what is specified in the test print. The default is 19200.
    void setup() {
      Serial.begin(9600);
    
      // Start the thermal printer
      Serial2.begin(19200);
      printer.begin();
      printer.setSize('S');
    
      connectAWS();
    }
    
  5. Replace the messageHandler() function with the following. On any incoming message, it parses the JSON and prints the message on the thermal printer.
    void messageHandler(String &topic, String &payload) {
      Serial.println("incoming: " + topic + " - " + payload);
    
      // deserialize json
      StaticJsonDocument<200> doc;
      deserializeJson(doc, payload);
      String message = doc["message"];
    
      // Print the message on the thermal printer
      printer.println(message);
      printer.feed(2);
    }
  6. Choose Upload.
  7. After the firmware has successfully uploaded, open the Serial Monitor to confirm that the board has connected to AWS IoT.
  8. Enter the following into a command-line utility, replacing ENDPOINT, as in the previous section.
    curl -d '{"message": "Hello World!"}' -H "Content-Type: application/json" -X POST https://ENDPOINT/publish

If successful, the device prints out the message “Hello World” from the attached thermal printer. This is a fully serverless IoT printer that can be triggered remotely from a webhook. As an example, this can be used with GitHub Webhooks to print a physical readout of events.

Conclusion

Using a simple Arduino sketch, an AWS Serverless Application Repository application, and a microcontroller, this post demonstrated how to build a basic serverless workflow for communicating with a physical device. It also showed how to expand that into an IoT thermal printer with real-world applications.

With the use of AWS serverless, advanced compute and extensibility can be added to an IoT device, from machine learning to translation services and beyond. By using the Arduino programming environment, the vast collection of open-source libraries, projects, and code examples open up a world of possibilities. The next step is to explore what can be done with an Arduino and the capabilities of AWS serverless. The sample Arduino code for this project and more can be found at this GitHub repository.

An Update on CDNJS

Post Syndicated from Zack Bloom original https://blog.cloudflare.com/an-update-on-cdnjs/

An Update on CDNJS

When you loaded this blog, a file was delivered to your browser called jquery-3.2.1.min.js. jQuery is a library which makes it easier to build websites, and was at one point included on as many as 74.1% of all websites. A full eighteen million sites include jQuery and other libraries using one of the most popular tools on Earth: CDNJS. Beginning about a month ago Cloudflare began to take a more active role in the operation of CDNJS. This post is here to tell you more about CDNJS’ history and explain why we are helping to manage CDNJS.

What CDNJS Does

Virtually every site is composed of not just the code written by its developers, but also dozens or hundreds of libraries. These libraries make it possible for websites to extend what a web browser can do on its own. For example, libraries can allow a site to include powerful data visualizations, respond to user input, or even get more performant.

These libraries created wondrous and magical new capabilities for web browsers, but they can also cause the size of a site to explode. Particularly a decade ago, connections were not always fast enough to permit the use of many libraries while maintaining performance. But if so many websites are all including the same libraries, why was it necessary for each of them to load their own copy?

If we all load jQuery from the same place the browser can do a much better job of not actually needing to download it for every site. When the user visits the first jQuery-powered site it will have to be downloaded, but it will already be cached on the user’s computer for any subsequent jQuery-powered site they might visit.

An Update on CDNJS

The first visit might take time to load:

An Update on CDNJS

But any future visit to any website pointing to this common URL would already be cached:

An Update on CDNJS

<!-- Loaded only on my site, will need to be downloaded by every user -->
<script src="./jquery.js"></script>

<!-- Loaded from a common location across many sites -->
<script src="https://cdnjs.cloudflare.com/jquery.js"></script>

Beyond the performance advantage, including files this way also made it very easy for users to experiment and create. When using a web browser as a creation tool users often didn’t have elaborate build systems (this was also before npm), so being able to include a simple script tag was a boon. It’s worth noting that it’s not clear a massive performance advantage was ever actually provided by this scheme. It is becoming even less of a performance advantage now that browser vendors are beginning to use separate cache’s for each website you visit, but with millions of sites using CDNJS there’s no doubt it is a critical part of the web.

A CDN for all of us

My first Pull Request into the CDNJS project was in 2013. Back then if you created a JavaScript project it wasn’t possible to have it included in the jQuery CDN, or the ones provided by large companies like Google and Microsoft. They were only for big, important, projects. Of course, however, even the biggest project starts small. The community needed a CDN which would agree to host nearly all JavaScript projects, even the ones which weren’t world-changing (yet). In 2011, that project was launched by Ryan Kirkman and Thomas Davis as CDNJS.

The project was quickly wildly successful, far beyond their expectations. Their CDN bill quickly began to skyrocket (it would now be over a million dollars a year on AWS). Under the threat of having to shut down the service, Cloudflare was approached by the CDNJS team to see if we could help. We agreed to support their efforts and created cdnjs.cloudflare.com which serves the CDNJS project free of charge.

CDNJS has been astonishingly successful. The project is currently installed on over eighteen million websites (10% of the Internet!), offers files totaling over 1.5 billion lines of code, and serves over 173 billion requests a month. CDNJS only gets more popular as sites get larger, with 34% of the top 10k websites using the service. Each month we serve almost three petabytes of JavaScript, CSS, and other resources which power the web via cdnjs.cloudflare.com.

An Update on CDNJS
Spikes can happen when a very large or popular site installs CDNJS, or when a misbehaving web crawler discovers a CDNJS link.

The future value of CDNJS is now in doubt, as web browsers are beginning to use a separate cache for every website you visit. It is currently used on such a wide swath of the web, however, it is unlikely it will be disappearing any time soon.

How CDNJS Works

CDNJS starts with a Github repo. That project contains every file served by CDNJS, at every version which it has ever offered. That’s 182 GB without the commit history, over five million files, and over 1.5 billion lines of code.

Given that it stores and delivers versioned code files, in many ways it was the Internet’s first JavaScript package manager. Unlike other package managers and even other CDNs everything CDNJS serves is publicly versioned. All 67,724 commits! This means you as a user can verify that you are being served files which haven’t been tampered with.

To make changes to CDNJS a commit has to be made. For new projects being added to CDNJS, or when projects change significantly, these commits are made by humans, and get reviewed by other humans. When projects just release new versions there is a bot made by Peter and maintained by Sven which sucks up changes from npm and automatically creates commits.

Within Cloudflare’s infrastructure there is a set of machines which are responsible for pulling the latest version of the repo periodically. Those machines then become the origin for cdnjs.cloudflare.com, with Cloudflare’s Global Load Balancer automatically handling failures. Cloudflare’s cache automatically stores copies of many of the projects making it possible for us to deliver them quickly from all 195 of our data centers.

An Update on CDNJS

The Internet on a Shoestring Budget

The CDNJS project has always been administered independently of Cloudflare. In addition to the founders, the project has additionally been maintained by exceptionally hard-working caretakers like Peter and Matt Cowley. Maintaining a single repo of nearly every frontend project on Earth is no small task, and it has required a substantial amount of both manual work and bot development.

Unfortunately approximately thirty days ago one of those bots stopped working, preventing updated projects from appearing in CDNJS. The bot’s open-source maintainer was not able to invest the time necessary to keep the bot running. After several weeks we were asked by the community and the CDNJS founders to take over maintenance of the CDNJS repo itself. This means the Cloudflare engineering team is taking responsibility for keeping the contents of github.com/cdnjs/cdnjs up to date, in addition to ensuring it is correctly served on cdnjs.cloudflare.com.

We agreed to do this because we were, frankly, scared. Like so many open-source projects CDNJS was a critical part of our world, but wasn’t getting the attention it needed to survive. The Internet relies on CDNJS as much as on any other single project, losing it or allowing it to be commandeered would be catastrophic to millions of websites and their visitors. If it began to fail, some sites would adapt and update, others would be broken forever.

CDNJS has always been, and remains, a project for and by the community. We are invested in making all decisions in a transparent and inclusive manner. If you are interested in contributing to CDNJS or in the topics we’re currently discussing please visit the CDNJS Github Issues page.

An Update on CDNJS

A Plan for the Future

One example of an area where we could use your help is in charting a path towards a CDNJS which requires less manual moderation. Nothing can replace the intelligence and creativity of a human (yet), but for a task like managing what resources go into a CDN, it is error prone and time consuming. At present a human has to review every new project to be included, and often has to take additional steps to include new versions of a project.

As a part of our analysis of the project we examined a snapshot of the still-open PRs made against CDNJS for several months:

An Update on CDNJS

The vast majority of these PRs were changes which ultimately passed the automated review but nevertheless couldn’t be merged without manual review.

There is consensus that we should move to a model which does not require human involvement in most cases. We would love your input and collaboration on the best way for that to be solved. If this is something you are passionate about, please contribute here.

Our plan is to support the CDNJS project in whichever ways it requires for as long as the Internet relies upon it. We invite you to use CDNJS in your next project with the full assurance that it is backed by the same network and team who protect and accelerate over twenty million of your favorite websites across the Internet. We are also planning more posts diving further into the CDNJS data, subscribe to this blog if you would like to be notified upon their release.

Behind the scenes: GitHub security alerts

Post Syndicated from Justin Hutchings original https://github.blog/2019-12-11-behind-the-scenes-github-vulnerability-alerts/

If you have code on GitHub, chances are that you’ve had a security vulnerability alert at some point. Since the feature launched, GitHub has sent more than 62 million security alerts for vulnerable dependencies.

How does it work?

Vulnerability alerts rely on two pieces of data: an inventory of all the software that your code depends on, and a curated list of known vulnerabilities in open-source code. 

Any time you push a change to a dependency manifest file, GitHub has a job that parses those manifest files, and stores your dependency on those packages in the dependency graph. If you’re dependent on something that hasn’t been seen before, a background task runs to get more information about the package from the package registries themselves and adds it. We use the information from the package registries to establish the canonical repository that the package came from, and to help populate metadata like readmes, known versions, and the published licenses. 

On GitHub Enterprise Server, this process works identically, except we don’t get any information from the public package registries in order to protect the privacy of the server and its code. 

The dependency graph supports manifests for JavaScript (npm, Yarn), .NET (Nuget), Java (Maven), PHP (Composer), Python (PyPI), and Ruby (Rubygems). This data powers our vulnerability alerts, but also dependency insights, the used by badge, and the community contributors experiences. 

Beyond the dependency graph, we aggregate data from a number of sources and curate those to bring you actionable security alerts. GitHub brings in security vulnerability data from a number of sources, including the National Vulnerability Database (a service of the United States National Institute of Standards and Technology), maintainer security advisories from open-source maintainers, community datasources, and our partner WhiteSource

Once we learn about a vulnerability, it passes through an advanced machine learning model that’s trained to recognize vulnerabilities which impact developers. This model rejects anything that isn’t related to an open-source toolchain. If the model accepts the vulnerability, a bot creates a pull request in a GitHub private repository for our  team of curation experts to manually review.

GitHub curates vulnerabilities because CVEs (Common Vulnerability Entries) are often ambiguous about which open-source projects are impacted. This can be particularly challenging when multiple libraries with similar names exist, or when they’re a part of a larger toolkit. Depending on the kind of vulnerability, our curation team may follow-up with outside security researchers or maintainers about the impact assessment. This follow-up helps to confirm that an alert is warranted and to identify the exact packages that are impacted. 

Once the curation team completes the mappings, we merge the pull request and it starts a background job that notifies users about any affected repositories. Depending on the vulnerability, this can cause a lot of alerts. In a recent incident, more than two million repositories were alerted about a vulnerable version of lodash, a popular JavaScript utility library.

GitHub Enterprise Server customers get a slightly different experience. If an admin has enabled security vulnerability alerts through GitHub Connect, the server will download the latest curated list of vulnerabilities from GitHub.com over the private GitHub Connect channel on its next scheduled sync (about once per hour). If a new vulnerability exists, the server determines the impacted users and repositories before generating alerts directly. 

Security vulnerabilities are a matter of public good. High-profile breaches impact the trustworthiness of the entire tech industry, so we publish a curated set of vulnerabilities on our GraphQL APIs for community projects and enterprise tools to use in custom workflows as necessary. Users can also browse the known vulnerabilities from public sources on the GitHub Advisory Database.

Engineers behind the feature

Despite advanced technology, security alerting is a human process driven by dedicated GitHubbers. Meet Rob (@rschultheis), one of the core members of our security team, and learn about his experiences at GitHub through a friendly Q&A:

Humphrey Dogart (German Shepherd) and Rob Schultheis (Software Engineer on the GitHub Security team)

How long have you been with GitHub? 

Two years

How did you get into software security? 

I’ve worked with open source software for most of my 20 year career in tech, and honestly for much of that time I didn’t pay much attention to security. When I started at GitHub I was given the opportunity to work on the first iteration of security alerts. It quickly became clear that having a high quality, open dataset was going to be a critical factor in the success of the feature. I dove into the task of curating that advisory dataset and found a whole side to the industry that was open for exploration, and I’ve stayed with it ever since!

What are the trickiest parts of vulnerability curation? 

The hardest problem is probably confirming that our advisory data correctly identifies which version(s) of a package are vulnerable to a given advisory, and which version(s) first address it.

What was the most difficult security vulnerability you’ve had to publish? 

One memorable vulnerability was CVE-2015-9284. This one was tough in several ways because it was a part of a popular library, it was also unpatched when it became fully public, and finally, it was published four years after the initial disclosure to maintainers. Even worse, all attempts to fix it had stalled.

We ended up proceeding to publish it and the community quickly responded and finally got the security issue patched.

What’s your favorite feel-good moment working in security? 

Seeing tweets and other feedback thanking us is always wonderful. We do read them! And that goes the same for those critical of the feature or the way certain advisories were disclosed or published. Please keep them coming—they’re really valuable to us as we keep evolving our security offerings.

Since you work at home, can you introduce us to your furry officemate? 

I live with a seven month old shepherd named Humphrey Dogart. His primary responsibilities are making sure I don’t spend all day on the computer, and he does a great job of that. I think we make a great team!


Learn more about GitHub security alerts

The post Behind the scenes: GitHub security alerts appeared first on The GitHub Blog.

Open-Sourcing Metaflow, a Human-Centric Framework for Data Science

Post Syndicated from Netflix Technology Blog original https://medium.com/netflix-techblog/open-sourcing-metaflow-a-human-centric-framework-for-data-science-fa72e04a5d9?source=rss----2615bd06b42e---4

by David Berg, Ravi Kiran Chirravuri, Romain Cledat, Savin Goyal, Ferras Hamad, Ville Tuulos

tl;dr Metaflow is now open-source! Get started at metaflow.org.

Netflix applies data science to hundreds of use cases across the company, including optimizing content delivery and video encoding. Data scientists at Netflix relish our culture that empowers them to work autonomously and use their judgment to solve problems independently. We want our data scientists to be curious and take smart risks that have the potential for high business impact.

About two years ago, we, at our newly formed Machine Learning Infrastructure team started asking our data scientists a question: “What is the hardest thing for you as a data scientist at Netflix?” We were expecting to hear answers related to large-scale data and models, and maybe issues related to modern GPUs. Instead, we heard stories about projects where getting the first version to production took surprisingly long — mainly because of mundane reasons related to software engineering. We heard many stories about difficulties related to data access and basic data processing. We sat in meetings where data scientists discussed with their stakeholders how to best version different versions of their models without impacting production. We saw how excited data scientists were about modern off-the-shelf machine learning libraries, but we also witnessed various issues caused by these libraries when they were casually included as dependencies in production workflows.

We realized that nearly everything that data scientists wanted to do was already doable technically, but nothing was easy enough. Our job as a Machine Learning Infrastructure team would therefore not be mainly about enabling new technical feats. Instead, we should make common operations so easy that data scientists would not even realize that they were difficult before. We would focus our energy solely on improving data scientist productivity by being fanatically human-centric.

How could we improve the quality of life for data scientists? The following picture started emerging:

Our data scientists love the freedom of being able to choose the best modeling approach for their project. They know that feature engineering is critical for many models, so they want to stay in control of model inputs and feature engineering logic. In many cases, data scientists are quite eager to own their own models in production, since it allows them to troubleshoot and iterate the models faster.

On the other hand, very few data scientists feel strongly about the nature of the data warehouse, the compute platform that trains and scores their models, or the workflow scheduler. Preferably, from their point of view, these foundational components should “just work”. If, and when they fail, the error messages should be clear and understandable in the context of their work.

A key observation was that most of our data scientists had nothing against writing Python code. In fact, plain-and-simple Python is quickly becoming the lingua franca of data science, so using Python is preferable to domain specific languages. Data scientists want to retain their freedom to use arbitrary, idiomatic Python code to express their business logic — like they would do in a Jupyter notebook. However, they don’t want to spend too much time thinking about object hierarchies, packaging issues, or dealing with obscure APIs unrelated to their work. The infrastructure should allow them to exercise their freedom as data scientists but it should provide enough guardrails and scaffolding, so they don’t have to worry about software architecture too much.

Introducing Metaflow

These observations motivated Metaflow, our human-centric framework for data science. Over the past two years, Metaflow has been used internally at Netflix to build and manage hundreds of data-science projects from natural language processing to operations research.

By design, Metaflow is a deceptively simple Python library:

Data scientists can structure their workflow as a Directed Acyclic Graph of steps, as depicted above. The steps can be arbitrary Python code. In this hypothetical example, the flow trains two versions of a model in parallel and chooses the one with the highest score.

On the surface, this doesn’t seem like much. There are many existing frameworks, such as Apache Airflow or Luigi, which allow execution of DAGs consisting of arbitrary Python code. The devil is in the many carefully designed details of Metaflow: for instance, note how in the above example data and models are stored as normal Python instance variables. They work even if the code is executed on a distributed compute platform, which Metaflow supports by default, thanks to Metaflow’s built-in content-addressed artifact store. In many other frameworks, loading and storing of artifacts is left as an exercise for the user, which forces them to decide what should and should not be persisted. Metaflow removes this cognitive overhead.

Metaflow is packed with human-centric details like this, all of which aim at boosting data scientist productivity. For a comprehensive overview of all features of Metaflow, take a look at our documentation at docs.metaflow.org.

Metaflow on Amazon Web Services

Netflix’s data warehouse contains hundreds of petabytes of data. While a typical machine learning workflow running on Metaflow touches only a small shard of this warehouse, it can still process terabytes of data.

Metaflow is a cloud-native framework. It leverages elasticity of the cloud by design — both for compute and storage. Netflix has been one of the largest users of Amazon Web Services (AWS) for many years and we have accumulated plenty of operational experience and expertise in dealing with the cloud, AWS in particular. For the open-source release, we partnered with AWS to provide a seamless integration between Metaflow and various AWS services.

Metaflow comes with built-in capability to snapshot all code and data in Amazon S3 automatically, which is a key value proposition of our internal Metaflow setup. This provides us with a comprehensive solution for versioning and experiment tracking without any user intervention, which is core to any production-grade machine learning infrastructure.

In addition, Metaflow comes bundled with a high-performance S3 client, which can load data up to 10Gbps. This client has been massively popular amongst our users, who can now load data into their workflows an order of magnitude faster than before, enabling faster iteration cycles.

For general purpose data processing, Metaflow integrates with AWS Batch, which is a managed, container-based compute platform provided by AWS. The user can benefit from infinitely scalable compute clusters by adding a single line in their code: @batch. For training machine learning models, besides writing their own functions, the user has the choice to use AWS Sagemaker, which provides high-performance implementations of various models, many of which support distributed training.

Metaflow supports all common off-the-shelf machine learning frameworks through our @conda decorator, which allows the user to specify external dependencies for their steps safely. The @conda decorator freezes the execution environment, providing good guarantees of reproducibility, both when executed locally as well as in the cloud.

For more details, read this page about Metaflow’s integration with AWS.

From Prototype To Production

Out of the box, Metaflow provides a first-class local development experience. It allows data scientists to develop and test code quickly on your laptop, similar to any Python script. If your workflow supports parallelism, Metaflow takes advantage of all CPU cores available on your development machine.

We encourage our users to deploy their workflows to production as soon as possible. In our case, “production” means a highly available, centralized DAG scheduler, Meson, where users can export their Metaflow runs for execution with a single command. This allows them to start testing their workflow with regularly updating data quickly, which is a highly effective way to surface bugs and issues in the model. Since Meson is not available in open-source, we are working on providing a similar integration to AWS Step Functions, which is a highly available workflow scheduler.

In a complex business environment like Netflix’s, there are many ways to consume the results of a data science workflow. Often, the final results are written to a table, to be consumed by a dashboard. Sometimes, the resulting model is deployed as a microservice to support real-time inferencing. It is also common to chain workflows so that the results of a workflow are consumed by another. Metaflow supports all these modalities, although some of these features are not yet available in the open-source version.

When it comes to inspecting the results, Metaflow comes with a notebook-friendly client API. Most of our data scientists are heavy users of Jupyter notebooks, so we decided to focus our UI efforts on a seamless integration with notebooks, instead of providing a one-size-fits-all Metaflow UI. Our data scientists can build custom model UIs in notebooks, fetching artifacts from Metaflow, which provide just the right information about each model. A similar experience is available with AWS Sagemaker notebooks with open-source Metaflow.

Get Started With Metaflow

Metaflow has been eagerly adopted inside of Netflix, and today, we are making Metaflow available as an open-source project.

We hope that our vision of data scientist autonomy and productivity resonates outside Netflix as well. We welcome you to try Metaflow, start using it in your organization, and participate in its development.

You can find the project home page at metaflow.org and the code at github.com/Netflix/metaflow. Metaflow is comprehensively documented at docs.metaflow.org. The quickest way to get started is to follow our tutorial. If you want to learn more before getting your hands dirty, you can watch presentations about Metaflow at the high level or dig deeper into the internals of Metaflow.

If you have any questions, thoughts, or comments about Metaflow, you can find us at Metaflow chat room or you can reach us by email at [email protected]. We are eager to hear from you!


Open-Sourcing Metaflow, a Human-Centric Framework for Data Science was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

New – Amazon Managed Apache Cassandra Service (MCS)

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-amazon-managed-apache-cassandra-service-mcs/

Managing databases at scale is never easy. One of the options to store, retrieve, and manage large amounts of structured data, including key-value and tabular formats, is Apache Cassandra. With Cassandra, you can use the expressive Cassandra Query Language (CQL) to build applications quickly.

However, managing large Cassandra clusters can be difficult and takes a lot of time. You need specialized expertise to set up, configure, and maintain the underlying infrastructure, and have a deep understanding of the entire application stack, including the Apache Cassandra open source software. You need to add or remove nodes manually, rebalancing partitions, and doing so while keeping your application available with the required performance. Talking with customers, we found out that they often keep their clusters scaled up for peak load because scaling down is complex. To keep your Cassandra cluster updated, you have to do it node by node. It’s hard to backup and restore a cluster if something goes wrong during an update, and you may end up skipping patches or running an outdated version.

Introducing Amazon Managed Cassandra Service
Today, we are launching in open preview Amazon Managed Apache Cassandra Service (MCS), a scalable, highly available, and managed Apache Cassandra-compatible database service. Amazon MCS is serverless, so you pay for only the resources you use and the service automatically scales tables up and down in response to application traffic. You can build applications that serve thousands of requests per second with virtually unlimited throughput and storage.

With Amazon MCS, you can run your Cassandra workloads on AWS using the same Cassandra application code and developer tools that you use today. Amazon MCS implements the Apache Cassandra version 3.11 CQL API, allowing you to use the code and drivers that you already have in your applications. Updating your application is as easy as changing the endpoint to the one in the Amazon MCS service table.

Amazon MCS provides consistent single-digit-millisecond read and write performance at any scale, so you can build applications with low latency to provide a smooth user experience. You have visibility into how your application is performing using Amazon CloudWatch.

There is no limit on the size of a table or the number of items, and you do not need to provision storage. Data storage is fully managed and highly available. Your table data is replicated automatically three times across multiple AWS Availability Zones for durability.

All customer data is encrypted at rest by default. You can use encryption keys stored in AWS Key Management Service (KMS)Amazon MCS is also integrated with AWS Identity and Access Management (IAM) to help you manage access to your tables and data.

Using Amazon Managed Cassandra Service
You can use Amazon MCS with the console, CQL, or existing Apache 2.0 licensed Cassandra drivers. In the console there is a CQL editor, or you can connect using cqlsh. 

To connect using cqlsh, I need to generate service-specific credentials for an existing IAM user. This is just a command using the AWS Command Line Interface (CLI):

aws iam create-service-specific-credential --user-name USERNAME --service-name mcs.amazonaws.com

{
    "ServiceSpecificCredential": {
        "CreateDate": "2019-11-27T14:36:16Z",
        "ServiceName": "mcs.amazonaws.com",
        "ServiceUserName": "USERNAME-at-123412341234",
        "ServicePassword": "...",
        "ServiceSpecificCredentialId": "...",
        "UserName": "USERNAME",
        "Status": "Active"
    }
}

Amazon MCS only accepts secure connections using TLS.  I download the Amazon root certificate and edit the cqlshrc configuration file to use it. Now, I can connect with:

cqlsh {endpoint} {port} -u {ServiceUserName} -p {ServicePassword} --ssl

First, I create a keyspace. A keyspace contains one or more tables and defines the replication strategy for all the tables it contains. With Amazon MCS the default replication strategy for all keyspaces is the Single-region strategy. It replicates data 3 times across multiple Availability Zones in a single AWS Region.

To create a keyspace I can use the console or CQL. In the Amazon MCS console, I provide the name for the keyspace.

Similarly, I can use CQL to create the bookstore keyspace:

CREATE KEYSPACE IF NOT EXISTS bookstore WITH REPLICATION={'class': 'SingleRegionStrategy'};

Now I create a table. A table is where your data is organized and stored. Again, I can use the console or CQL. From the console, I select the bookstore keyspace and give the table a name.

Below that, I add the columns for my books table. Each row in a table is referenced by a primary key, that can be composed of one or more columns, the values of which determine which partition the data is stored in. In my case the primary key is the ISBN. Optionally, I can add clustering columns, which determine the sort order of records within a partition. I am not using clustering columns for this table.

Alternatively, using CQL, I can create the table with the following commands:

USE bookstore;

CREATE TABLE IF NOT EXISTS books
(isbn text PRIMARY KEY,
title text,
author text,
pages int,
year_of_publication int);

I now use CQL to insert a record in the books table:

INSERT INTO books (isbn, title, author, pages, year_of_publication)
VALUES ('978-0201896831',
'The Art of Computer Programming, Vol. 1: Fundamental Algorithms (3rd Edition)',
'Donald E. Knuth', 672, 1997);

Let’s run a quick query. In the console, I select the books table and then Query table.

In the CQL Editor, I use the default query and select Run command.

By default, I see the result of the query in table view:

If I prefer, I can see the result in JSON format, similar to what an application using the Cassandra API would see:

To insert more records, I use csqlsh again and upload some data from a local CSV file:

COPY books (isbn, title, author, pages, year_of_publication)
FROM './books.csv' WITH delimiter=',' AND header=TRUE;

Now I look again at the content of the books table:

SELECT * FROM books;

I can select a row using a primary key, or use filtering for additional conditions. For example:

SELECT title FROM books WHERE isbn='978-1942788713';

SELECT title FROM books WHERE author='Scott Page' ALLOW FILTERING;

With Amazon MCS you can use existing Apache Cassandra 2.0–licensed drivers and developer tools. Open-source Cassandra drivers are available for Java, Python, Ruby, .NET, Node.js, PHP, C++, Perl, and Go.

You can learn more in the Amazon MCS documentation.

Available in Open Preview
Amazon MCS is available today in open preview in US East (N. Virginia), US East (Ohio), Europe (Stockholm), Asia Pacific (Singapore), Asia Pacific (Tokyo).

As we work with the Cassandra API libraries, we are contributing bug fixes to the open source Apache Cassandra project. We are also contributing back improvements such as built-in support for AWS authentication (SigV4), which simplifies managing credentials for customers running Cassandra on Amazon Elastic Compute Cloud (EC2), since EC2 and IAM can handle distribution and management of credentials using instance roles automatically. We are also announcing the funding of AWS promotional service credits for testing Cassandra-related open-source projects. To learn more about these contributions, visit the Open Source blog.

During the preview, you can use Amazon MCS with on-demand capacity. At general availability, we will also offer the option to use provisioned throughput for more predictable workloads. With on-demand capacity mode, Amazon MCS charges you based on the amount of data your applications read and write from your tables. You do not need to specify how much read and write throughput capacity to provision to your tables because Amazon MCS accommodates your workloads instantly as they scale up or down.

As part of the AWS Free Tier, you can get started with Amazon MCS for free. For the first three months, you are offered a monthly free tier of 30 million write request units, 30 million read request units, and 1 GB of storage. Your free tier starts when you create your first Amazon MCS resource.

Next year we are making it easier to migrate your data Amazon MCS, adding support to use AWS Database Migration Service.

Amazon MCS makes it easy to use Cassandra workloads at any scale, providing a simple programming interface to build new applications, or migrate existing ones. I can’t wait to see what are you going to use it for!

Growth Monitor pi: an open monitoring system for plant science

Post Syndicated from Helen Lynn original https://www.raspberrypi.org/blog/growth-monitor-pi-an-open-monitoring-system-for-plant-science/

Plant scientists and agronomists use growth chambers to provide consistent growing conditions for the plants they study. This reduces confounding variables – inconsistent temperature or light levels, for example – that could render the results of their experiments less meaningful. To make sure that conditions really are consistent both within and between growth chambers, which minimises experimental bias and ensures that experiments are reproducible, it’s helpful to monitor and record environmental variables in the chambers.

A neat grid of small leafy plants on a black plastic tray. Metal housing and tubing is visible to the sides.

Arabidopsis thaliana in a growth chamber on the International Space Station. Many experimental plants are less well monitored than these ones.
(“Arabidopsis thaliana plants […]” by Rawpixel Ltd (original by NASA) / CC BY 2.0)

In a recent paper in Applications in Plant Sciences, Brandin Grindstaff and colleagues at the universities of Missouri and Arizona describe how they developed Growth Monitor pi, or GMpi: an affordable growth chamber monitor that provides wider functionality than other devices. As well as sensing growth conditions, it sends the gathered data to cloud storage, captures images, and generates alerts to inform scientists when conditions drift outside of an acceptable range.

The authors emphasise – and we heartily agree – that you don’t need expertise with software and computing to build, use, and adapt a system like this. They’ve written a detailed protocol and made available all the necessary software for any researcher to build GMpi, and they note that commercial solutions with similar functionality range in price from $10,000 to $1,000,000 – something of an incentive to give the DIY approach a go.

GMpi uses a Raspberry Pi Model 3B+, to which are connected temperature-humidity and light sensors from our friends at Adafruit, as well as a Raspberry Pi Camera Module.

The team used open-source app Rclone to upload sensor data to a cloud service, choosing Google Drive since it’s available for free. To alert users when growing conditions fall outside of a set range, they use the incoming webhooks app to generate notifications in a Slack channel. Sensor operation, data gathering, and remote monitoring are supported by a combination of software that’s available for free from the open-source community and software the authors developed themselves. Their package GMPi_Pack is available on GitHub.

With a bill of materials amounting to something in the region of $200, GMpi is another excellent example of affordable, accessible, customisable open labware that’s available to researchers and students. If you want to find out how to build GMpi for your lab, or just for your greenhouse, Affordable remote monitoring of plant growth in facilities using Raspberry Pi computers by Brandin et al. is available on PubMed Central, and it includes appendices with clear and detailed set-up instructions for the whole system.

The post Growth Monitor pi: an open monitoring system for plant science appeared first on Raspberry Pi.

Getting started with Git and GitHub is easier than ever with GitHub Desktop 2.2

Post Syndicated from Amanda Pinsker original https://github.blog/2019-10-02-get-started-easier-with-github-desktop-2-2/

Anyone who uses Git knows that it has a steep learning curve. We’ve learned from developers that most people tend to learn from a buddy, whether that’s a coworker, a professor, a friend, or even a YouTube video. In GitHub Desktop 2.2, we’re releasing the first version of an interactive Git and GitHub tutorial that can be your buddy and help you get started. If you’re new to Desktop, you can download and try out the tutorial at desktop.github.com.

Get set up

To get set up, we help you through two major pieces: creating a repository and connecting an editor. When you first open Desktop, a welcome page appears with a new option to “Create a Tutorial Repository”. Starting with this option creates a tutorial repository that guides you through the core concepts of working with Git using GitHub Desktop.

There are a lot of tools you need to get started with Git and GitHub. The most important of these is your code editor. In the first step of the tutorial, you’re prompted to install an editor if you don’t have one already.

Learn the GitHub flow

Next, we guide you through how to use GitHub Desktop to make changes to code locally and get your work on GitHub. You’ll create a new branch, make a change to a file, commit it, push it to GitHub, and open your first pull request.

We’ve also heard that new users initially experience confusion between Git, GitHub, and GitHub Desktop. We cover these differences in the tutorial and make sure to reinforce the explanations.

Keep going with your own project

In GitHub Desktop 1.6, we introduced suggested next steps based on the state of your repository. Now when you complete the tutorial, we similarly suggest next steps: exploring projects on GitHub that you might want to contribute to, creating a new project, or adding an existing project to Desktop. We always want GitHub Desktop to be the tool that makes your next steps clear, whether you’re in the flow of your work, or you’re a new developer just getting started.

What’s next?

With GitHub Desktop 2.2, we’re making the product our users love more approachable to newcomers. We’ll be iterating on the tutorial based on your feedback, and we’ll continue to build on the connection between GitHub and your local machine. If you want to start building something but don’t know how, think of GitHub Desktop as your buddy to help you get started.

Learn more about GitHub Desktop

The post Getting started with Git and GitHub is easier than ever with GitHub Desktop 2.2 appeared first on The GitHub Blog.

Accelerating the GitHub Sponsors beta with Stripe Connect

Post Syndicated from Katie Delfin original https://github.blog/2019-09-10-accelerating-the-github-sponsors-beta/

Update: GitHub Sponsors is available in every country where GitHub does business, not just the 30 countries supported by Stripe Connect. We’ll continue to use our existing manual payout system for anyone outside of Connect’s list of currently supported countries.


Back in May, we announced GitHub Sponsors, a new way to support the developers who build and maintain the open source software you use every day. We launched in beta as early as possible so we could work closely with the community to address feedback and understand their needs before adding more developers to the program. Today, we’re taking the next step in accelerating the availability of GitHub Sponsors through a new streamlined onboarding and payment experience with Stripe Connect.

The early days of manual payments

Within hours of announcing GitHub Sponsors, thousands of people signed up for the waitlist from all over the world. It was awesome to see so much excitement for the program, but we knew this meant we had a lot of work ahead of us. 

We started the beta with a small group of sponsored developers, and every few weeks, we invited more into the program. Everything was a manual process—setting up account information, verifying identity, waiting for approval, running reports, and processing payouts across multiple departments. Due to the manual nature of the process, we were limited to a small number of people in the program until we could automate and streamline our operations.

Onboarding more developers, faster

At GitHub, we do business with companies of all sizes, from Fortune 50 companies to early stage startups from all over the world. Collecting revenue globally is something we’ve done for years, but with GitHub Sponsors, it’s not just about receiving money—it’s also about paying it out. This means providing a seamless and secure experience for sponsored developers to verify their identities, enter banking information, receive funds, and manage payouts. We also know that, for many maintainers, this will be the first time they’ve been financially rewarded for their contributions to open source. We want to ensure the process is easy for developers to complete, so they can spend more time doing what they love, like contributing to open source.

Stripe Connect Express offers a simple, low overhead onboarding experience that complies with web accessibility guidelines, coupled with the ability to localize the experience to simplify payouts in countries with unique tax laws and compliance regulations. And by not building our own onboarding solution, we’re able to save months of engineering and maintenance time, and start onboarding more sponsored developers today.

Localization

One of Stripe Connect’s features that’s helped us deliver a great experience to our users is their localization support. Stripe just released international support for 30 countries (with more to come) for Connect Express. With this expanded support, Stripe takes care of adjusting for localized rules and regulations for supported countries—no small feat. We couldn’t be more thrilled to be one of the first to offer support across all 30 countries to serve our global community.

Our maintainer onboarding process can now start to keep up with the growing demand of the program. With Stripe, users can onboard using Connect Express, reducing onboarding time from a week-long process to under five minutes.  

And there’s much more to come. While we’re just under four months into the program, we’re focused on making GitHub Sponsors available to even more developers around the world.

Sign up to join the beta—you’ll hear from us soon.

Learn more about GitHub Sponsors

The post Accelerating the GitHub Sponsors beta with Stripe Connect appeared first on The GitHub Blog.

Running GitHub on Rails 6.0

Post Syndicated from Eileen M. Uchitelle original https://github.blog/2019-09-09-running-github-on-rails-6-0/

On August 26, 2019, the GitHub application was deployed to production with 100 percent of traffic on the newest Rails version: 6.0. This change came just 1.5 weeks after the final release of Rails 6.0. Rails upgrades aren’t always something companies announce, but looking back at GitHub’s history of being on a custom fork of Rails 3.2, this upgrade is a big deal. It represents how far we’ve come in the last few years, and the hard work and dedication from our upgrade team made it smoother, easier, and faster than any of our previous upgrades.

At GitHub, we have a lot to celebrate with the release of Rails 6.0 and the subsequent production deploy. First, we were more involved in this release than we have been in any previous release of Rails. GitHub engineers sent over 100 pull requests to Rails 6.0 to improve documentation, fix bugs, add features, and speed up performance. For many GitHub contributors, this was the first time sending changes to the Rails framework, demonstrating that upgrading Rails not only helps GitHub internally, but also improves our developer community as well.

Second, we deployed Rails 6.0 to production without any negative impact to customers—we had only one Rails 6.0 exception occur during testing, and it was hit by a bot! We were able to achieve this level of stability for the upgrade because we were heavily involved with its development. As soon as we finished the Rails 5.2 upgrade last year, we started upgrading our application to Rails 6.0.

Instead of waiting for the final release, we’d upgrade every week by pulling in the latest changes from Rails master and run all of our tests against that new version. This allowed us to find regressions quickly and early—often finding regressions in Rails master just hours after they were introduced. Upgrading weekly made it easy to find where these regressions were introduced since we were bisecting Rails with only a week’s worth of commits instead of more than a year of commits. Once our build for Rails 6.0 was green, we’d merge the pull request to master, and all new code that went into GitHub would need to pass in Rails 5.2 and the newest master build of Rails. Upgrading every week worked so well that we’ll continue using this process for upgrading from 6.0 to 6.1.

In addition to ensuring that Rails 6.0 was stable, we also contributed to the new features of the framework like parallel testing and multiple databases. The code for these tools is used in our production application every day—it’s well-tested, GitHub-approved code in a public, open source framework. By upstreaming this tooling, we’re able to reduce complexity in our code base and set a standard where so many companies once had to implement this functionality on their own.

There are so many wins to staying upgraded that go beyond more security, faster performance, and new features. By staying current with Rails master, we’re influencing the future of the framework to meet our needs and giving back to the open source community in big ways. This process means that the GitHub code base evolves alongside Rails instead of in response to Rails. Investing in our application by staying up to date with the Rails framework has had a tremendous positive effect on our code base and engineering teams. Staying current allows us to invest in our community, invest in our tools for the long term, and improve the experience of working with the GitHub code base for our engineers.

Keep a look out for all the great stuff we’ll be contributing to Rails 6.1 and beyond—this is just the beginning.

The post Running GitHub on Rails 6.0 appeared first on The GitHub Blog.

Re-Architecting the Video Gatekeeper

Post Syndicated from Netflix Technology Blog original https://medium.com/netflix-techblog/re-architecting-the-video-gatekeeper-f7b0ac2f6b00?source=rss----2615bd06b42e---4

By Drew Koszewnik

This is the story about how the Content Setup Engineering team used Hollow, a Netflix OSS technology, to re-architect and simplify an essential component in our content pipeline — delivering a large amount of business value in the process.

The Context

Each movie and show on the Netflix service is carefully curated to ensure an optimal viewing experience. The team responsible for this curation is Title Operations. Title Operations will confirm, among other things:

  • We are in compliance with the contracts — date ranges and places where we can show a video are set up correctly for each title
  • Video with captions, subtitles, and secondary audio “dub” assets are sourced, translated, and made available to the right populations around the world
  • Title name and synopsis are available and translated
  • The appropriate maturity ratings are available for each country

When a title meets all of the minimum above requirements, then it is allowed to go live on the service. Gatekeeper is the system at Netflix responsible for evaluating the “liveness” of videos and assets on the site. A title doesn’t become visible to members until Gatekeeper approves it — and if it can’t validate the setup, then it will assist Title Operations by pointing out what’s missing from the baseline customer experience.

Gatekeeper accomplishes its prescribed task by aggregating data from multiple upstream systems, applying some business logic, then producing an output detailing the status of each video in each country.

The Tech

Hollow, an OSS technology we released a few years ago, has been best described as a total high-density near cache:

  • Total: The entire dataset is cached on each node — there is no eviction policy, and there are no cache misses.
  • High-Density: encoding, bit-packing, and deduplication techniques are employed to optimize the memory footprint of the dataset.
  • Near: the cache exists in RAM on any instance which requires access to the dataset.

One exciting thing about the total nature of this technology — because we don’t have to worry about swapping records in-and-out of memory, we can make assumptions and do some precomputation of the in-memory representation of the dataset which would not otherwise be possible. The net result is, for many datasets, vastly more efficient use of RAM. Whereas with a traditional partial-cache solution you may wonder whether you can get away with caching only 5% of the dataset, or if you need to reserve enough space for 10% in order to get an acceptable hit/miss ratio — with the same amount of memory Hollow may be able to cache 100% of your dataset and achieve a 100% hit rate.

And obviously, if you get a 100% hit rate, you eliminate all I/O required to access your data — and can achieve orders of magnitude more efficient data access, which opens up many possibilities.

The Status-Quo

Until very recently, Gatekeeper was a completely event-driven system. When a change for a video occurred in any one of its upstream systems, that system would send an event to Gatekeeper. Gatekeeper would react to that event by reaching into each of its upstream services, gathering the necessary input data to evaluate the liveness of the video and its associated assets. It would then produce a single-record output detailing the status of that single video.

Old Gatekeeper Architecture

This model had several problems associated with it:

  • This process was completely I/O bound and put a lot of load on upstream systems.
  • Consequently, these events would queue up throughout the day and cause processing delays, which meant that titles may not actually go live on time.
  • Worse, events would occasionally get missed, meaning titles wouldn’t go live at all until someone from Title Operations realized there was a problem.

The mitigation for these issues was to “sweep” the catalog so Videos matching specific criteria (e.g., scheduled to launch next week) would get events automatically injected into the processing queue. Unfortunately, this mitigation added many more events into the queue, which exacerbated the problem.

Clearly, a change in direction was necessary.

The Idea

We decided to employ a total high-density near cache (i.e., Hollow) to eliminate our I/O bottlenecks. For each of our upstream systems, we would create a Hollow dataset which encompasses all of the data necessary for Gatekeeper to perform its evaluation. Each upstream system would now be responsible for keeping its cache updated.

New Gatekeeper Architecture

With this model, liveness evaluation is conceptually separated from the data retrieval from upstream systems. Instead of reacting to events, Gatekeeper would continuously process liveness for all assets in all videos across all countries in a repeating cycle. The cycle iterates over every video available at Netflix, calculating liveness details for each of them. At the end of each cycle, it produces a complete output (also a Hollow dataset) representing the liveness status details of all videos in all countries.

We expected that this continuous processing model was possible because a complete removal of our I/O bottlenecks would mean that we should be able to operate orders of magnitude more efficiently. We also expected that by moving to this model, we would realize many positive effects for the business.

  • A definitive solution for the excess load on upstream systems generated by Gatekeeper
  • A complete elimination of liveness processing delays and missed go-live dates.
  • A reduction in the time the Content Setup Engineering team spends on performance-related issues.
  • Improved debuggability and visibility into liveness processing.

The Problem

Hollow can also be thought of like a time machine. As a dataset changes over time, it communicates those changes to consumers by breaking the timeline down into a series of discrete data states. Each data state represents a snapshot of the entire dataset at a specific moment in time.

Hollow is like a time machine

Usually, consumers of a Hollow dataset are loading the latest data state and keeping their cache updated as new states are produced. However, they may instead point to a prior state — which will revert their view of the entire dataset to a point in the past.

The traditional method of producing data states is to maintain a single producer which runs a repeating cycle. During that cycle, the producer iterates over all records from the source of truth. As it iterates, it adds each record to the Hollow library. Hollow then calculates the differences between the data added during this cycle and the data added during the last cycle, then publishes the state to a location known to consumers.

Traditional Hollow usage

The problem with this total-source-of-truth iteration model is that it can take a long time. In the case of some of our upstream systems, this could take hours. This data-propagation latency was unacceptable — we can’t wait hours for liveness processing if, for example, Title Operations adds a rating to a movie that needs to go live imminently.

The Improvement

What we needed was a faster time machine — one which could produce states with a more frequent cadence, so that changes could be more quickly realized by consumers.

Incremental Hollow is like a faster time machine

To achieve this, we created an incremental Hollow infrastructure for Netflix, leveraging work which had been done in the Hollow library earlier, and pioneered in production usage by the Streaming Platform Team at Target (and is now a public non-beta API).

With this infrastructure, each time a change is detected in a source application, the updated record is encoded and emitted to a Kafka topic. A new component that is not part of the source application, the Hollow Incremental Producer service, performs a repeating cycle at a predefined cadence. During each cycle, it reads all messages which have been added to the topic since the last cycle and mutates the Hollow state engine to reflect the new state of the updated records.

If a message from the Kafka topic contains the exact same data as already reflected in the Hollow dataset, no action is taken.

Hollow Incremental Producer Service

To mitigate issues arising from missed events, we implement a sweep mechanism that periodically iterates over an entire source dataset. As it iterates, it emits the content of each record to the Kafka topic. In this way, any updates which may have been missed will eventually be reflected in the Hollow dataset. Additionally, because this is not the primary mechanism by which updates are propagated to the Hollow dataset, this does not have to be run as quickly or frequently as a cycle must iterate the source in traditional Hollow usage.

The Hollow Incremental Producer is capable of reading a great many messages from the Kafka topic and mutating its Hollow state internally very quickly — so we can configure its cycle times to be very short (we are currently defaulting this to 30 seconds).

This is how we built a faster time machine. Now, if Title Operations adds a maturity rating to a movie, within 30 seconds, that data is available in the corresponding Hollow dataset.

The Tangible Result

With the data propagation latency issue solved, we were able to re-implement the Gatekeeper system to eliminate all I/O boundaries. With the prior implementation of Gatekeeper, re-evaluating all assets for all videos in all countries would have been unthinkable — it would tie up the entire content pipeline for more than a week (and we would then still be behind by a week since nothing else could be processed in the meantime). Now we re-evaluate everything in about 30 seconds — and we do that every minute.

There is no such thing as a missed or delayed liveness evaluation any longer, and the disablement of the prior Gatekeeper system reduced the load on our upstream systems — in some cases by up to 80%.

Load reduction on one upstream system

In addition to these performance benefits, we also get a resiliency benefit. In the prior Gatekeeper system, if one of the upstream services went down, we were unable to evaluate liveness at all because we were unable to retrieve any data from that system. In the new implementation, if one of the upstream systems goes down then it does stop publishing — but we still gate stale data for its corresponding dataset while all others make progress. So for example, if the translated synopsis system goes down, we can still bring a movie on-site in a region if it was held back for, and then receives, the correct subtitles.

The Intangible Result

Perhaps even more beneficial than the performance gains has been the improvement in our development velocity in this system. We can now develop, validate, and release changes in minutes which might have before taken days or weeks — and we can do so with significantly increased release quality.

The time-machine aspect of Hollow means that every deterministic process which uses Hollow exclusively as input data is 100% reproducible. For Gatekeeper, this means that an exact replay of what happened at time X can be accomplished by reverting all of our input states to time X, then re-evaluating everything again.

We use this fact to iterate quickly on changes to the Gatekeeper business logic. We maintain a PREPROD Gatekeeper instance which “follows” our PROD Gatekeeper instance. PREPROD is also continuously evaluating liveness for the entire catalog, but publishing its output to a different Hollow dataset. At the beginning of each cycle, the PREPROD environment will gather the latest produced state from PROD, and set each of its input datasets to the exact same versions which were used to produce the PROD output.

The PREPROD Gatekeeper instance “follows” the PROD instance

When we want to make a change to the Gatekeeper business logic, we do so and then publish it to our PREPROD cluster. The subsequent output state from PREPROD can be diffed with its corresponding output state from PROD to view the precise effect that the logic change will cause. In this way, at a glance, we can validate that our changes have precisely the intended effect, and zero unintended consequences.

A Hollow diff shows exactly what changes

This, coupled with some iteration on the deployment process, has resulted in the ability for our team to code, validate, and deploy impactful changes to Gatekeeper in literally minutes — at least an order of magnitude faster than in the prior system — and we can do so with a higher level of safety than was possible in the previous architecture.

Conclusion

This new implementation of the Gatekeeper system opens up opportunities to capture additional business value, which we plan to pursue over the coming quarters. Additionally, this is a pattern that can be replicated to other systems within the Content Engineering space and elsewhere at Netflix — already a couple of follow-up projects have been launched to formalize and capitalize on the benefits of this n-hollow-input, one-hollow-output architecture.

Content Setup Engineering is an exciting space right now, especially as we scale up our pipeline to produce more content with each passing quarter. We have many opportunities to solve real problems and provide massive value to the business — and to do so with a deep focus on computer science, using and often pioneering leading-edge technologies. If this kind of work sounds appealing to you, reach out to Ivan to get the ball rolling.


Re-Architecting the Video Gatekeeper was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Spinnaker Sets Sail to the Continuous Delivery Foundation

Post Syndicated from Netflix Technology Blog original https://medium.com/netflix-techblog/spinnaker-sets-sail-to-the-continuous-delivery-foundation-e81cd2cbbfeb?source=rss----2615bd06b42e---4

Author: Andy Glover

Since releasing Spinnaker to the open source community in 2015, the platform has flourished with the addition of new cloud providers, triggers, pipeline stages, and much more. Myriad new features, improvements, and innovations have been added by an ever growing, actively engaged community. Each new innovation has been a step towards an even better Continuous Delivery platform that facilitates rapid, reliable, safe delivery of flexible assets to pluggable deployment targets.

Over the last year, Netflix has improved overall management of Spinnaker by enhancing community engagement and transparency. At the Spinnaker Summit in 2018, we announced that we had adopted a formalized project governance plan with Google. Moreover, we also realized that we’ll need to share the responsibility of Spinnaker’s direction as well as yield a level of long-term strategic influence over the project so as to maintain a healthy, engaged community. This means enabling more parties outside of Netflix and Google to have a say in the direction and implementation of Spinnaker.

A strong, healthy, committed community benefits everyone; however, open source projects rarely reach this critical mass. It’s clear Spinnaker has reached this special stage in its evolution; accordingly, we are thrilled to announce two exciting developments.

First, Netflix and Google are jointly donating Spinnaker to the newly created Continuous Delivery Foundation (or CDF), which is part of the Linux Foundation. The CDF is a neutral organization that will grow and sustain an open continuous delivery ecosystem, much like the Cloud Native Computing Foundation (or CNCF) has done for the cloud native computing ecosystem. The initial set of projects to be donated to the CDF are Jenkins, Jenkins X, Spinnaker, and Tekton. Second, Netflix is joining as a founding member of the CDF. Continuous Delivery powers innovation at Netflix and working with other leading practitioners to promote Continuous Delivery through specifications is an exciting opportunity to join forces and bring the benefits of rapid, reliable, and safe delivery to an even larger community.

Spinnaker’s success is in large part due to the amazing community of companies and people that use it and contribute to it. Donating Spinnaker to the CDF will strengthen this community. This move will encourage contributions and investments from additional companies who are undoubtedly waiting on the sidelines. Opening the doors to new companies increases the innovations we’ll see in Spinnaker, which benefits everyone.

Donating Spinnaker to the CDF doesn’t change Netflix’s commitment to Spinnaker, and what’s more, current users of Spinnaker are unaffected by this change. Spinnaker’s previously defined governance policy remains in place. Overtime, new stakeholders will emerge and play a larger, more formal role in shaping Spinnaker’s future. The prospects of an even healthier and more engaged community focused on Spinnaker and the manifold benefits of Continuous Delivery is tremendously exciting and we’re looking forward to seeing it continue to flourish.


Spinnaker Sets Sail to the Continuous Delivery Foundation was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

New – Open Distro for Elasticsearch

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-open-distro-for-elasticsearch/

Elasticsearch is a distributed, document-oriented search and analytics engine. It supports structured and unstructured queries, and does not require a schema to be defined ahead of time. Elasticsearch can be used as a search engine, and is often used for web-scale log analytics, real-time application monitoring, and clickstream analytics.

Originally launched as a true open source project, some of the more recent additions to Elasticsearch are proprietary. My colleague Adrian explains our motivation to start Open Distro for Elasticsearch in his post, Keeping Open Source Open. As strong believers in, and supporters of, open source software, we believe this project will help continue to accelerate open source Elasticsearch innovation.

Open Distro for Elasticsearch
Today we are launching Open Distro for Elasticsearch. This is a value-added distribution of Elasticsearch that is 100% open source (Apache 2.0 license) and supported by AWS. Open Distro for Elasticsearch leverages the open source code for Elasticsearch and Kibana. This is not a fork; we will continue to send our contributions and patches upstream to advance these projects.

In addition to Elasticsearch and Kibana, the first release includes a set of advanced security, event monitoring & alerting, performance analysis, and SQL query features (more on those in a bit). In addition to the source code repo, Open Distro for Elasticsearch and Kibana are available as RPM and Docker containers, with separate downloads for the SQL JDBC and the PerfTop CLI. You can run this code on your laptop, in your data center, or in the cloud.

Contributions are welcome, as are bug reports and feature requests.

Inside Open Distro for Elasticsearch
Let’s take a quick look at the features that we are including in Open Distro for Elasticsearch. Some of these are currently available in Amazon Elasticsearch Service; others will become available in future updates.

Security – This plugin that supports node-to-node encryption, five types of authentication (basic, Active Directory, LDAP, Kerberos, and SAML), role-based access controls at multiple levels (clusters, indices, documents, and fields), audit logging, and cross-cluster search so that any node in a cluster can run search requests across other nodes in the cluster. Learn More

Event Monitoring & Alerting – This feature notifies you when data from one or more Elasticsearch indices meets certain conditions. You could, for example, notify a Slack channel if an application logs more than five HTTP 503 errors in an hour. Monitoring is based on jobs that run on a defined schedule, checking indices against trigger conditions, and raising alerts when a condition has been triggered. Learn More

Deep Performance Analysis – This is a REST API that allows you to query a long list of performance metrics for your cluster. You can access the metrics programmatically or you can visualize them using perf top and other perf tools. Learn More

SQL Support – This feature allows you to query your cluster using SQL statements. It is an improved version of the elasticsearch-sql plugin, and supports a rich set of statements.

This is just the beginning; we have more in the works, and also look forward to your contributions and suggestions!

Jeff;