Tag Archives: Internet of Things

Learn about New AWS re:Invent Launches – December AWS Online Tech Talks

Post Syndicated from Robin Park original https://aws.amazon.com/blogs/aws/learn-about-new-aws-reinvent-launches-december-aws-online-tech-talks/

AWS Tech Talks

Join us in the next couple weeks to learn about some of the new service and feature launches from re:Invent 2018. Learn about features and benefits, watch live demos and ask questions! We’ll have AWS experts online to answer any questions you may have. Register today!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

Compute

December 19, 2018 | 01:00 PM – 02:00 PM PTDeveloping Deep Learning Models for Computer Vision with Amazon EC2 P3 Instances – Learn about the different steps required to build, train, and deploy a machine learning model for computer vision.

Containers

December 11, 2018 | 01:00 PM – 02:00 PM PTIntroduction to AWS App Mesh – Learn about using AWS App Mesh to monitor and control microservices on AWS.

Data Lakes & Analytics

December 10, 2018 | 11:00 AM – 12:00 PM PTIntroduction to AWS Lake Formation – Build a Secure Data Lake in Days – AWS Lake Formation (coming soon) will make it easy to set up a secure data lake in days. With AWS Lake Formation, you will be able to ingest, catalog, clean, transform, and secure your data, and make it available for analysis and machine learning.

December 12, 2018 | 11:00 AM – 12:00 PM PTIntroduction to Amazon Managed Streaming for Kafka (MSK) – Learn about features and benefits, use cases and how to get started with Amazon MSK.

Databases

December 10, 2018 | 01:00 PM – 02:00 PM PTIntroduction to Amazon RDS on VMware – Learn how Amazon RDS on VMware can be used to automate on-premises database administration, enable hybrid cloud backups and read scaling for on-premises databases, and simplify database migration to AWS.

December 13, 2018 | 09:00 AM – 10:00 AM PTServerless Databases with Amazon Aurora and Amazon DynamoDB – Learn about the new serverless features and benefits in Amazon Aurora and DynamoDB, use cases and how to get started.

Enterprise & Hybrid

December 19, 2018 | 11:00 AM – 12:00 PM PTHow to Use “Minimum Viable Refactoring” to Achieve Post-Migration Operational Excellence – Learn how to improve the security and compliance of your applications in two weeks with “minimum viable refactoring”.

IoT

December 17, 2018 | 11:00 AM – 12:00 PM PTIntroduction to New AWS IoT Services – Dive deep into the AWS IoT service announcements from re:Invent 2018, including AWS IoT Things Graph, AWS IoT Events, and AWS IoT SiteWise.

Machine Learning

December 10, 2018 | 09:00 AM – 10:00 AM PTIntroducing Amazon SageMaker Ground Truth – Learn how to build highly accurate training datasets with machine learning and reduce data labeling costs by up to 70%.

December 11, 2018 | 09:00 AM – 10:00 AM PTIntroduction to AWS DeepRacer – AWS DeepRacer is the fastest way to get rolling with machine learning, literally. Get hands-on with a fully autonomous 1/18th scale race car driven by reinforcement learning, 3D racing simulator, and a global racing league.

December 12, 2018 | 01:00 PM – 02:00 PM PTIntroduction to Amazon Forecast and Amazon Personalize – Learn about Amazon Forecast and Amazon Personalize – what are the key features and benefits of these managed ML services, common use cases and how you can get started.

December 13, 2018 | 01:00 PM – 02:00 PM PTIntroduction to Amazon Textract: Now in Preview – Learn how Amazon Textract, now in preview, enables companies to easily extract text and data from virtually any document.

Networking

December 17, 2018 | 01:00 PM – 02:00 PM PTIntroduction to AWS Transit Gateway – Learn how AWS Transit Gateway significantly simplifies management and reduces operational costs with a hub and spoke architecture.

Robotics

December 18, 2018 | 11:00 AM – 12:00 PM PTIntroduction to AWS RoboMaker, a New Cloud Robotics Service – Learn about AWS RoboMaker, a service that makes it easy to develop, test, and deploy intelligent robotics applications at scale.

Security, Identity & Compliance

December 17, 2018 | 09:00 AM – 10:00 AM PTIntroduction to AWS Security Hub – Learn about AWS Security Hub, and how it gives you a comprehensive view of high-priority security alerts and your compliance status across AWS accounts.

Serverless

December 11, 2018 | 11:00 AM – 12:00 PM PTWhat’s New with Serverless at AWS – In this tech talk, we’ll catch you up on our ever-growing collection of natively supported languages, console updates, and re:Invent launches.

December 13, 2018 | 11:00 AM – 12:00 PM PTBuilding Real Time Applications using WebSocket APIs Supported by Amazon API Gateway – Learn how to build, deploy and manage APIs with API Gateway.

Storage

December 12, 2018 | 09:00 AM – 10:00 AM PTIntroduction to Amazon FSx for Windows File Server – Learn about Amazon FSx for Windows File Server, a new fully managed native Windows file system that makes it easy to move Windows-based applications that require file storage to AWS.

December 14, 2018 | 01:00 PM – 02:00 PM PTWhat’s New with AWS Storage – A Recap of re:Invent 2018 Announcements – Learn about the key AWS storage announcements that occurred prior to and at re:Invent 2018. With 15+ new service, feature, and device launches in object, file, block, and data transfer storage services, you will be able to start designing the foundation of your cloud IT environment for any application and easily migrate data to AWS.

December 18, 2018 | 09:00 AM – 10:00 AM PTIntroduction to Amazon FSx for Lustre – Learn about Amazon FSx for Lustre, a fully managed file system for compute-intensive workloads. Process files from S3 or data stores, with throughput up to hundreds of GBps and sub-millisecond latencies.

December 18, 2018 | 01:00 PM – 02:00 PM PTIntroduction to New AWS Services for Data Transfer – Learn about new AWS data transfer services, and which might best fit your requirements for data migration or ongoing hybrid workloads.

Amazon SageMaker Neo – Train Your Machine Learning Models Once, Run Them Anywhere

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-sagemaker-neo-train-your-machine-learning-models-once-run-them-anywhere/

Machine learning (ML) is split in two distinct phases: training and inference. Training deals with building the model, i.e. running a ML algorithm on a dataset in order to identify meaningful patterns. This often requires large amounts of storage and computing power, making the cloud a natural place to train ML jobs with services such as Amazon SageMaker and the AWS Deep Learning AMIs.

Inference deals with using the model, i.e. predicting results for data samples that the model has never seen. Here, the requirements are different: developers are typically concerned with optimizing latency (how long does a single prediction take?) and throughput (how many predictions can I run in parallel?). Of course, the hardware architecture of your prediction environment has a very significant impact on such metrics, especially if you’re dealing with resource-constrained devices: as a Raspberry Pi enthusiast, I often wish the little fellow packed a little more punch to speed up my inference code.

Tuning a model for a specific hardware architecture is possible, but the lack of tooling makes this an error-prone and time-consuming process. Minor changes to the ML framework or the model itself usually require the user to start all over again. Unfortunately, this forces most ML developers to deploy the same model everywhere regardless of the underlying hardware, thus missing out on significant performance gains.

Well, no more. Today, I’m very happy to announce Amazon SageMaker Neo, a new capability of Amazon SageMaker that enables machine learning models to train once and run anywhere in the cloud and at the edge with optimal performance.

Introducing Amazon SageMaker Neo

Without any manual intervention, Amazon SageMaker Neo optimizes models deployed on Amazon EC2 instances, Amazon SageMaker endpoints and devices managed by AWS Greengrass.

Here are the supported configurations:

  • Frameworks and algorithms: TensorFlow, Apache MXNet, PyTorch, ONNX, and XGBoost.
  • Hardware architectures: ARM, Intel, and NVIDIA starting today, with support for Cadence, Qualcomm, and Xilinx hardware coming soon. In addition, Amazon SageMaker Neo is released as open source code under the Apache Software License, enabling hardware vendors to customize it for their processors and devices.

The Amazon SageMaker Neo compiler converts models into an efficient common format, which is executed on the device by a compact runtime that uses less than one-hundredth of the resources that a generic framework would traditionally consume. The Amazon SageMaker Neo runtime is optimized for the underlying hardware, using specific instruction sets that help speed up ML inference.

This has three main benefits:

  • Converted models perform at up to twice the speed, with no loss of accuracy.
  • Sophisticated models can now run on virtually any resource-limited device, unlocking innovative use cases like autonomous vehicles, automated video security, and anomaly detection in manufacturing.
  • Developers can run models on the target hardware without dependencies on the framework.

Under the hood

Most machine learning frameworks represent a model as a computational graph: a vertex represents an operation on data arrays (tensors) and an edge represents data dependencies between operations. The Amazon SageMaker Neo compiler exploits patterns in the computational graph to apply high-level optimizations including operator fusion, which fuses multiple small operations together; constant-folding, which statically pre-computes portions of the graph to save execution costs; a static memory planning pass, which pre-allocates memory to hold each intermediate tensor; and data layout transformations, which transform internal data layouts into hardware-friendly forms. The compiler then produces efficient code for each operator.

Once a model has been compiled, it can be run by the Amazon SageMaker Neo runtime. This runtime takes about 1MB of disk space, compared to the 500MB-1GB required by popular deep learning libraries. An application invokes a model by first loading the runtime, which then loads the model definition, model parameters, and precompiled operations.

I can’t wait to try this on my Raspberry Pi. Let’s get to work.

Downloading a pre-trained model

Plenty of pre-trained models are available in the Apache MXNet, Gluon CV or TensorFlow model zoos: here, I’m using a 50-layer model based on the ResNet architecture, pre-trained with Apache MXNet on the ImageNet dataset.

First, I’m downloading the 227MB model as well as the JSON file defining its different layers. This file is particularly important: it tells me that the input symbol is called ‘data’ and that its shape is [1, 3, 224, 224], i.e. 1 image, 3 channels (red, green and blue), 224×224 pixels. I’ll need to make sure that images passed to the model have this exact shape. The output shape is [1, 1000], i.e. a vector containing the probability for each one of the 1,000 classes present in the ImageNet dataset.

To define a performance baseline, I use this model and a vanilla unoptimized version of Apache MXNet 1.2 to predict a few images: on average, inference takes about 6.5 seconds and requires about 306 MB of RAM.

That’s pretty slow: let’s compile the model and see how fast it gets.

Compiling the model for the Raspberry Pi

First, let’s store both model files in a compressed TAR archive and upload it to an Amazon S3 bucket.

$ tar cvfz model.tar.gz resnet50_v1-symbol.json resnet50_v1-0000.params
a resnet50_v1-symbol.json
a resnet50_v1-0000.paramsresnet50_v1-0000.params
$ aws s3 cp model.tar.gz s3://jsimon-neo/
upload: ./model.tar.gz to s3://jsimon-neo/model.tar.gz

Then, I just have to write a simple configuration file for my compilation job. If you’re curious about other frameworks and hardware targets, ‘aws sagemaker create-compilation-job help‘ will give you the exact syntax to use.

{
    "CompilationJobName": "resnet50-mxnet-raspberrypi",
    "RoleArn": $SAGEMAKER_ROLE_ARN,
    "InputConfig": {
        "S3Uri": "s3://jsimon-neo/model.tar.gz",
        "DataInputConfig": "{\"data\": [1, 3, 224, 224]}",
        "Framework": "MXNET"
    },
    "OutputConfig": {
        "S3OutputLocation": "s3://jsimon-neo/",
        "TargetDevice": "rasp3b"
    },
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 300
    }
}

Launching the compilation process takes a single command.

$ aws sagemaker create-compilation-job --cli-input-json file://job.json

Compilation is complete in seconds. Let’s figure out the name of the compilation artifact, fetch it from Amazon S3 and extract it locally

$ aws sagemaker describe-compilation-job \
--compilation-job-name resnet50-mxnet-raspberrypi \
--query "ModelArtifacts"
{
"S3ModelArtifacts": "s3://jsimon-neo/model-rasp3b.tar.gz"
}
$ aws s3 cp s3://jsimon-neo/model-rasp3b.tar.gz .
$ tar xvfz model-rasp3b.tar.gz
x compiled.params
x compiled_model.json
x compiled.so

As you can see, the artifact contains:

  • The original model and symbol files.
  • A shared object file storing compiled, hardware-optimized, operators used by the model.

For convenience, let’s rename them to ‘model.params’, ‘model.json’ and ‘model.so’, and then copy them on the Raspberry pi in a ‘resnet50’ directory.

$ mkdir resnet50
$ mv compiled.params resnet50/model.params
$ mv compiled_model.json resnet50/model.json
$ mv compiled.so resnet50/model.so
$ scp -r resnet50 [email protected]:~

Setting up the inference environment on the Raspberry Pi

Before I can predict images with the model, I need to install the appropriate runtime on my Raspberry Pi. Pre-built packages are available [neopackages]: I just have to download the one for ‘armv7l’ architectures and to install it on my Pi with the provided script. Please note that I don’t need to install any additional deep learning framework (Apache MXNet in this case), saving up to 1GB of persistent storage.

$ scp -r dlr-1.0-py2.py3-armv7l [email protected]:~
<ssh to the Pi>
$ cd dlr-1.0-py2.py3-armv7l
$ sh ./install-py3.sh

We’re all set. Time to predict images!

Using the Amazon SageMaker Neo runtime

On the Pi, the runtime is available as a Python package named ‘dlr’ (deep learning runtime). Using it to predict images is what you would expect:

  • Load the model, defining its input and output symbols.
  • Load an image.
  • Predict!

Here’s the corresponding Python code.

import os
import numpy as np
from dlr import DLRModel

# Load the compiled model
input_shape = {'data': [1, 3, 224, 224]} # A single RGB 224x224 image
output_shape = [1, 1000]                 # The probability for each one of the 1,000 classes
device = 'cpu'                           # Go, Raspberry Pi, go!
model = DLRModel('resnet50', input_shape, output_shape, device)

# Load names for ImageNet classes
synset_path = os.path.join(model_path, 'synset.txt')
with open(synset_path, 'r') as f:
    synset = eval(f.read())

# Load an image stored as a numpy array
image = np.load('dog.npy').astype(np.float32)
print(image.shape)
input_data = {'data': image}

# Predict 
out = model.run(input_data)
top1 = np.argmax(out[0])
prob = np.max(out)
print("Class: %s, probability: %f" % (synset[top1], prob))

Let’s give it a try on this image. Aren’t chihuahuas and Raspberry Pis made for one another?



(1, 3, 224, 224)
Class: Chihuahua, probability: 0.901803

The prediction is correct, but what about speed and memory consumption? Well, this prediction takes about 0.85 second and requires about 260MB of RAM: with Amazon SageMaker Neo, it’s now 5 times faster and 15% more RAM-efficient than with a vanilla model.

This impressive performance gain didn’t require any complex and time-consuming work: all we had to do was to compile the model. Of course, your mileage will vary depending on models and hardware architectures, but you should see significant improvements across the board, including on Amazon EC2 instances such as the C5 or P3 families.

Now available

I hope this post was informative. Compiling models with Amazon SageMaker Neo is free of charge, you will only pay for the underlying resource using the model (Amazon EC2 instances, Amazon SageMaker instances and devices managed by AWS Greengrass).

The service is generally available today in US-East (N. Virginia), US-West (Oregon) and Europe (Ireland). Please start exploring and let us know what you think. We can’t wait to see what you will build!

Julien;

Learn about AWS – November AWS Online Tech Talks

Post Syndicated from Robin Park original https://aws.amazon.com/blogs/aws/learn-about-aws-november-aws-online-tech-talks/

AWS Tech Talks

AWS Online Tech Talks are live, online presentations that cover a broad range of topics at varying technical levels. Join us this month to learn about AWS services and solutions. We’ll have experts online to help answer any questions you may have.

Featured this month! Check out the tech talks: Virtual Hands-On Workshop: Amazon Elasticsearch Service – Analyze Your CloudTrail Logs, AWS re:Invent: Know Before You Go and AWS Office Hours: Amazon GuardDuty Tips and Tricks.

Register today!

Note – All sessions are free and in Pacific Time.

Tech talks this month:

AR/VR

November 13, 2018 | 11:00 AM – 12:00 PM PTHow to Create a Chatbot Using Amazon Sumerian and Sumerian Hosts – Learn how to quickly and easily create a chatbot using Amazon Sumerian & Sumerian Hosts.

Compute

November 19, 2018 | 11:00 AM – 12:00 PM PTUsing Amazon Lightsail to Create a Database – Learn how to set up a database on your Amazon Lightsail instance for your applications or stand-alone websites.

November 21, 2018 | 09:00 AM – 10:00 AM PTSave up to 90% on CI/CD Workloads with Amazon EC2 Spot Instances – Learn how to automatically scale a fleet of Spot Instances with Jenkins and EC2 Spot Plug-In.

Containers

November 13, 2018 | 09:00 AM – 10:00 AM PTCustomer Showcase: How Portal Finance Scaled Their Containerized Application Seamlessly with AWS Fargate – Learn how to scale your containerized applications without managing servers and cluster, using AWS Fargate.

November 14, 2018 | 11:00 AM – 12:00 PM PTCustomer Showcase: How 99designs Used AWS Fargate and Datadog to Manage their Containerized Application – Learn how 99designs scales their containerized applications using AWS Fargate.

November 21, 2018 | 11:00 AM – 12:00 PM PTMonitor the World: Meaningful Metrics for Containerized Apps and Clusters – Learn about metrics and tools you need to monitor your Kubernetes applications on AWS.

Data Lakes & Analytics

November 12, 2018 | 01:00 PM – 01:45 PM PTSearch Your DynamoDB Data with Amazon Elasticsearch Service – Learn the joint power of Amazon Elasticsearch Service and DynamoDB and how to set up your DynamoDB tables and streams to replicate your data to Amazon Elasticsearch Service.

November 13, 2018 | 01:00 PM – 01:45 PM PTVirtual Hands-On Workshop: Amazon Elasticsearch Service – Analyze Your CloudTrail Logs – Get hands-on experience and learn how to ingest and analyze CloudTrail logs using Amazon Elasticsearch Service.

November 14, 2018 | 01:00 PM – 01:45 PM PTBest Practices for Migrating Big Data Workloads to AWS – Learn how to migrate analytics, data processing (ETL), and data science workloads running on Apache Hadoop, Spark, and data warehouse appliances from on-premises deployments to AWS.

November 15, 2018 | 11:00 AM – 11:45 AM PTBest Practices for Scaling Amazon Redshift – Learn about the most common scalability pain points with analytics platforms and see how Amazon Redshift can quickly scale to fulfill growing analytical needs and data volume.

Databases

November 12, 2018 | 11:00 AM – 11:45 AM PTModernize your SQL Server 2008/R2 Databases with AWS Database Services – As end of extended Support for SQL Server 2008/ R2 nears, learn how AWS’s portfolio of fully managed, cost effective databases, and easy-to-use migration tools can help.

DevOps

November 16, 2018 | 09:00 AM – 09:45 AM PTBuild and Orchestrate Serverless Applications on AWS with PowerShell – Learn how to build and orchestrate serverless applications on AWS with AWS Lambda and PowerShell.

End-User Computing

November 19, 2018 | 01:00 PM – 02:00 PM PTWork Without Workstations with AppStream 2.0 – Learn how to work without workstations and accelerate your engineering workflows using AppStream 2.0.

Enterprise & Hybrid

November 19, 2018 | 09:00 AM – 10:00 AM PTEnterprise DevOps: New Patterns of Efficiency – Learn how to implement “Enterprise DevOps” in your organization through building a culture of inclusion, common sense, and continuous improvement.

November 20, 2018 | 11:00 AM – 11:45 AM PTAre Your Workloads Well-Architected? – Learn how to measure and improve your workloads with AWS Well-Architected best practices.

IoT

November 16, 2018 | 01:00 PM – 02:00 PM PTPushing Intelligence to the Edge in Industrial Applications – Learn how GE uses AWS IoT for industrial use cases, including 3D printing and aviation.

Machine Learning

November 12, 2018 | 09:00 AM – 09:45 AM PTAutomate for Efficiency with Amazon Transcribe and Amazon Translate – Learn how you can increase efficiency and reach of your operations with Amazon Translate and Amazon Transcribe.

Mobile

November 20, 2018 | 01:00 PM – 02:00 PM PTGraphQL Deep Dive – Designing Schemas and Automating Deployment – Get an overview of the basics of how GraphQL works and dive into different schema designs, best practices, and considerations for providing data to your applications in production.

re:Invent

November 9, 2018 | 08:00 AM – 08:30 AM PTEpisode 7: Getting Around the re:Invent Campus – Learn how to efficiently get around the re:Invent campus using our new mobile app technology. Make sure you arrive on time and never miss a session.

November 14, 2018 | 08:00 AM – 08:30 AM PTEpisode 8: Know Before You Go – Learn about all final details you need to know before you arrive in Las Vegas for AWS re:Invent!

Security, Identity & Compliance

November 16, 2018 | 11:00 AM – 12:00 PM PTAWS Office Hours: Amazon GuardDuty Tips and Tricks – Join us for office hours and get the latest tips and tricks for Amazon GuardDuty from AWS Security experts.

Serverless

November 14, 2018 | 09:00 AM – 10:00 AM PTServerless Workflows for the Enterprise – Learn how to seamlessly build and deploy serverless applications across multiple teams in large organizations.

Storage

November 15, 2018 | 01:00 PM – 01:45 PM PTMove From Tape Backups to AWS in 30 Minutes – Learn how to switch to cloud backups easily with AWS Storage Gateway.

November 20, 2018 | 09:00 AM – 10:00 AM PTDeep Dive on Amazon S3 Security and Management – Amazon S3 provides some of the most enhanced data security features available in the cloud today, including access controls, encryption, security monitoring, remediation, and security standards and compliance certifications.

How to build a front-line concussion monitoring system using AWS IoT and serverless data lakes – Part 2

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/big-data/how-to-build-a-front-line-concussion-monitoring-system-using-aws-iot-and-serverless-data-lakes-part-2/

In part 1 of this series, we demonstrated how to build a data pipeline in support of a data lake. We used key AWS services such as Amazon Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, and AWS Lambda. In part 2, we discuss how to process and visualize the data by creating a serverless data lake that uses key analytics to create actionable data.

Create a serverless data lake and explore data using AWS Glue, Amazon Athena, and Amazon QuickSight

As we discussed in part 1, you can store heart rate data in an Amazon S3 bucket using Kinesis Data Streams. However, storing data in a repository is not enough. You also need to be able to catalog and store the associated metadata related to your repository so that you can extract the meaningful pieces for analytics.

For a serverless data lake, you can use AWS Glue, which is a fully managed data catalog and ETL (extract, transform, and load) service. AWS Glue simplifies and automates the difficult and time-consuming tasks of data discovery, conversion, and job scheduling. As you get your AWS Glue Data Catalog data partitioned and compressed for optimal performance, you can use Amazon Athena for the direct query to S3 data. You can then visualize the data using Amazon QuickSight.

The following diagram depicts the data lake that is created in this demonstration:

Amazon S3 now has the raw data stored from the Kinesis process. The first task is to prepare the Data Catalog and identify what data attributes are available to query and analyze. To do this task, you need to create a database in AWS Glue that will hold the table created by the AWS Glue crawler.

An AWS Glue crawler scans through the raw data available in an S3 bucket and creates a data table with a Data Catalog. You can add a scheduler to the crawler to run periodically and scan new data as required. For specific steps to create a database and crawler in AWS Glue, see the blog post Build a Data Lake Foundation with AWS Glue and Amazon S3.

The following figure shows the summary screen for a crawler configuration in AWS Glue:

After configuring the crawler, choose Finish, and then choose Crawler in the navigation bar. Select the crawler that you created, and choose Run crawler.

The crawler process can take 20–60 seconds to initiate. It depends on the Data Catalog, and it creates a table in your database as defined during the crawler configuration.

You can choose the table name and explore the Data Catalog and table:

In the demonstration table details, our data has three attribute time stamps as value_time, the person’s ID as id, and the heart rate as colvalue. These attributes are identified and listed by the AWS Glue crawler. You can see other information such as the data format (text) and the record count (approx. 15,000 with each record size of 61 bytes).

You can use Athena to query the raw data. To access Athena directly from the AWS Glue console, choose the table, and then choose View data on the Actions menu, as shown following:

As noted, the data is currently in a JSON format and we haven’t partitioned it. This means that Athena continues to scan more data, which increases the query cost. The best practice is to always partition data and to convert the data into a columnar format like Apache Parquet or Apache ORC. This reduces the amount of data scans while running a query. Having fewer data scans means better query performance at a lower cost.

To accomplish this, AWS Glue generates an ETL script for you. You can schedule it to run periodically for your data processing, which removes the necessity for complex code writing. AWS Glue is a managed service that runs on top of a warm Apache Spark cluster that is managed by AWS. You can run your own script in AWS Glue or modify a script provided by AWS Glue that meets your requirements. For examples of how to build a custom script for your solution, see Providing Your Own Custom Scripts in the AWS Glue Developer Guide.

For detailed steps to create a job, see the blog post Build a Data Lake Foundation with AWS Glue and Amazon S3. The following figure shows the final AWS Glue job configuration summary for this demonstration:

In this example configuration, we enabled the job bookmark, which helps AWS Glue maintain state information and prevents the reprocessing of old data. You only want to process new data when rerunning on a scheduled interval.

When you choose Finish, AWS Glue generates a Python script. This script processes your data and stores it in a columnar format in the destination S3 bucket specified in the job configuration.

If you choose Run Job, it takes time to complete depending on the amount of data and data processing units (DPUs) configured. By default, a job is configured with 10 DPUs, which can be increased. A single DPU provides processing capacity that consists of 4 vCPUs of compute and 16 GB of memory.

After the job is complete, inspect your destination S3 bucket, and you will find that your data is now in columnar Parquet format.

Partitioning has emerged as an important technique for organizing datasets so that they can be queried efficiently by a variety of big data systems. Data is organized in a hierarchical directory structure based on the distinct values of one or more columns. For information about efficiently processing partitioned datasets using AWS Glue, see the blog post Work with partitioned data in AWS Glue.

You can create triggers for your job that run the job periodically to process new data as it is transmitted to your S3 bucket. For detailed steps on how to configure a job trigger, see Triggering Jobs in AWS Glue.

The next step is to create a crawler for the Parquet data so that a table can be created. The following image shows the configuration for our Parquet crawler:

Choose Finish, and execute the crawler.

Explore your database, and you will notice that one more table was created in the Parquet format.

You can use this new table for direct queries to reduce costs and to increase the query performance of this demonstration.

Because AWS Glue is integrated with Athena, you will find in the Athena console an AWS Glue catalog already available with the table catalog. Fetch 10 rows from Athena in a new Parquet table like you did for the JSON data table in the previous steps.

As the following image shows, we fetched the first 10 rows of heartbeat data from a Parquet format table. This same Athena query scanned only 4.99 KB of data compared to 205 KB of data that was scanned in a raw format. Also, there was a significant improvement in query performance in terms of run time.

Visualize data in Amazon QuickSight

Amazon QuickSight is a data visualization service that you can use to analyze data that has been combined. For more detailed instructions, see the Amazon QuickSight User Guide.

The first step in Amazon QuickSight is to create a new Amazon Athena data source. Choose the heartbeat database created in AWS Glue, and then choose the table that was created by the AWS Glue crawler.

Choose Import to SPICE for quicker analytics. This option creates a data cache and improves graph loading. All non-database datasets must use SPICE. To learn more about SPICE, see Managing SPICE Capacity.

Choose Visualize, and wait for SPICE to import the data to the cache. You can also schedule a periodic refresh so that new data is loaded to SPICE as the data is pipelined to the S3 bucket.

When the SPICE import is complete, you can create a visual dashboard easily. The following figure shows graphs displaying the occurrence of heart rate records per device.  The first graph is a horizontally stacked bar chart, which shows the percentage of heart rate occurrence per device. In the second graph, you can visualize the heart rate count group to the heart rate device.

Conclusion

Processing streaming data at scale is relevant in every industry. Whether you process data from wearables to tackle human health issues or address predictive maintenance in manufacturing centers, AWS can help you simplify your data ingestion and analysis while keeping your overall IT expenditure manageable.

In this two-part series, you learned how to ingest streaming data from a heart rate sensor and visualize it in such a way to create actionable insights. The current state of the art available in the big data and machine learning space makes it possible to ingest terabytes and petabytes of data and extract useful and actionable information from that process.


Additional Reading

If you found this post useful, be sure to check out Work with partitioned data in AWS Glue, and 10 visualizations to try in Amazon QuickSight with sample data.

 


About the Authors

Saurabh Shrivastava is a partner solutions architect and big data specialist working with global systems integrators. He works with AWS partners and customers to provide them architectural guidance for building scalable architecture in hybrid and AWS environments.

 

 

 

Abhinav Krishna Vadlapatla is a Solutions Architect with Amazon Web Services. He supports startups and small businesses with their cloud adoption to build scalable and secure solutions using AWS. During his free time, he likes to cook and travel.

 

 

 

John Cupit is a partner solutions architect for AWS’ Global Telecom Alliance Team. His passion is leveraging the cloud to transform the carrier industry. He has a son and daughter who have both graduated from college. His daughter is gainfully employed, while his son is in his first year of law school at Tulane University. As such, he has no spare money and no spare time to work a second job.

 

 

David Cowden is partner solutions architect and IoT specialist working with AWS emerging partners. He works with customers to provide them architectural guidance for building scalable architecture in IoT space.

 

 

 

Josh Ragsdale is an enterprise solutions architect at AWS. His focus is on adapting to a cloud operating model at very large scale. He enjoys cycling and spending time with his family outdoors.

 

 

 

Pierre-Yves Aquilanti, Ph.D., is a senior specialized HPC solutions architect at AWS. He spent several years in the oil & gas industry to optimize R&D applications for large scale HPC systems and enable the potential of machine learning for the upstream. He and his family crave to live in Singapore again for the human, cultural experience and eat fresh durians.

 

 

Manuel Puron is an enterprise solutions architect at AWS. He has been working in cloud security and IT service management for over 10 years. He is focused on the telecommunications industry. He enjoys video games and traveling to new destinations to discover new cultures.

 

How to build a front-line concussion monitoring system using AWS IoT and serverless data lakes – Part 1

Post Syndicated from Saurabh Shrivastava original https://aws.amazon.com/blogs/big-data/how-to-build-a-front-line-concussion-monitoring-system-using-aws-iot-and-serverless-data-lakes-part-1/

Sports-related minor traumatic brain injuries (mTBI) continue to incite concern among different groups in the medical, sports, and parenting community. At the recreational level, approximately 1.6–3.8 million related mTBI incidents occur in the United States every year, and in most cases, are not treated at the hospital. (See “The epidemiology and impact of traumatic brain injury: a brief overview” in Additional resources.) The estimated medical and indirect costs of minor traumatic brain injury are reaching $60 billion annually.

Although emergency facilities in North America collect data on admitted traumatic brain injuries (TBI) cases, there isn’t meaningful data on the number of unreported mTBIs among athletes. Recent studies indicate a significant rate of under-reporting of sports-related mTBI due to many factors. These factors include the simple inability of team staff to either recognize the signs and symptoms or to actually witness the impact. (See “A prospective study of physician-observed concussions during junior ice hockey: implications for incidence rates” in Additional resources.)

The majority of players involved in hockey and football are not college or professional athletes. There are over 3 million youth hockey players and approximately 5 million registered participants in football. (See “Head Impact Exposure in Youth Football” in Additional resources.) These recreational athletes don’t have basic access to medical staff trained in concussion recognition and sideline injury assessment. A user-friendly measurement and a smartphone-based assessment tool would facilitate the process between identifying potential head injuries, assessment, and return to play (RTP) criteria.

Recently, the use of instrumented sports helmets, including the Head Impact Telemetry System (HITS), has allowed for detailed recording of impacts to the head in many research trials. This practice has led to recommendations to alter contact in practices and certain helmet design parameters. (See “Head impact severity measures for evaluating mild traumatic brain injury risk exposure” in Additional resources.) However, due to the higher costs of the HITS system and complexity of the equipment, it is not a practical impact alert device for the general recreational population.

A simple, practical, and affordable system for measuring head trauma within the sports environment, subject to the absence of trained medical personnel, is required.

Given the proliferation of smartphones, we felt that this was a practical device to investigate to provide this type of monitoring.  All smartphone devices have an embedded Bluetooth communication system to receive and transmit data at various ranges.  For the purposes of this demonstration, we chose a class 1 Bluetooth device as the hardware communication method. We chose it because of its simplicity, widely accepted standard, and compatibility to interface with existing smartphones and IoT devices.

Remote monitoring typically involves collecting information from devices (for example, wearables) at the edge, integrating that information into a data lake, and generating inferences that can then be served back to the relevant stakeholders. Additionally, in some cases, compute and inference must also be done at the edge to shorten the feedback loop between data collection and response.

This use case can be extended to many other use cases in myriad verticals. In this two-part series, we show you how to build a data pipeline in support of a data lake. We use key AWS services such as Amazon Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, and AWS Lambda. In part 2, we focus on generating simple inferences from that data that can support RTP parameters.

Architectural overview

Here is the AWS architecture that we cover in this two-part series:

Note: For the purposes of our demonstration, we chose to use heart rate monitoring sensors rather than helmet sensors because they are significantly easier to acquire. Both types of sensors are very similar in how they transmit data. They are also very similar in terms of how they are integrated into a data lake solution.

The resulting demonstration transfers the heartbeat data using the following components:

  • AWS Greengrass set up with a Raspberry Pi 3 to stream heart rate data into the cloud.
  • Data is ingested via Amazon Kinesis Data Streams, and raw data is stored in an Amazon S3 bucket using Kinesis Data Firehose. Find more details about writing to Kinesis Data Firehose using Kinesis Data Streams.
  • Kinesis Data Analytics averages out the heartbeat-per-minute data during stream data ingestion and passes the average to an AWS Lambda
  • AWS Lambda enriches the heartbeat data by comparing the real-time data with baseline information stored in Amazon DynamoDB.
  • AWS Lambda sends SMS/email alerts via an Amazon SNS topic if the heartbeat rate is greater than 120 BPM, for example.
  • AWS Glue runs an extract, transform, and load (ETL) job. This job transforms the data store in a JSON format to a compressed Apache Parquet columnar format and applies that transformed partition for faster query processing. AWS Glue is a fully managed ETL service for crawling data stored in an Amazon S3 bucket and building a metadata catalog.
  • Amazon Athena is used for ad hoc query analysis on the data that is processed by AWS Glue. This data is also available for machine learning processing using predictive analysis to reduce heart disease risk.
  • Amazon QuickSight is a fully managed visualization tool. It uses Amazon Athena as a data source and depicts visual line and pie charts to show the heart rate data in a visual dashboard.

All data pipelines are serverless and are refreshed periodically to provide up-to-date data.

You can use Kinesis Data Firehose to transform the data in the pipeline to a compressed Parquet format without needing to use AWS Glue. For the purposes of this post, we are using AWS Glue to highlight its capabilities, including a centralized AWS Glue Data Catalog. This Data Catalog can be used by Athena for ad hoc queries and by Apache Spark EMR to run complex machine learning processes. AWS Glue also lets you edit generated ETL scripts and supports “bring your own ETL” to process data for more complex use cases.

Configuring key processes to support the pipeline

The following sections describe how to set up and configure the devices and services used in the demonstration to build a data pipeline in support of a data lake.

Remote sensors and IoT devices

You can use commercially available heart rate monitors to collect electrocardiography (ECG) information such as heart rate. The monitor is strapped around the chest area with the sensor placed over the sternum for better accuracy. The monitor measures the heart rate and sends the data over Bluetooth Low Energy (BLE) to a Raspberry Pi 3. The following figure depicts the device-side architecture for our demonstration.

The Raspberry Pi 3 is host to both the IoT device and the AWS Greengrass core. The IoT device is responsible for connecting to the heart rate monitor over BLE and collecting the heart rate data. The collected data is then sent locally to the AWS Greengrass core, where it can be processed and routed to the cloud through a secure connection. The AWS Greengrass core serves as the “edge” gateway for the heart rate monitor.

Set up AWS Greengrass core software on Raspberry Pi 3

To prepare your Raspberry Pi for running AWS Greengrass software, follow the instructions in Environment Setup for Greengrass in the AWS Greengrass Developer Guide.

After setting up your Raspberry Pi, you are ready to install AWS Greengrass and create your first Greengrass group. Create a Greengrass group by following the steps in Configure AWS Greengrass on AWS IoT. Then install the appropriate certificates to the Raspberry Pi by following the steps to start AWS Greengrass on a core device.

The preceding steps deploy a Greengrass group that consists of three discrete configurable items: a device, a subscription list, and the connectivity information.

The core device is a set of code that is responsible for collecting the heart rate information from the sensor and sending it to the AWS Greengrass core. This device is using the AWS IoT Device SDK for Python including the Greengrass Discovery API.

Use the following AWS CLI command to create a Greengrass group:

aws greengrass create-group --name heartRateGroup

To complete the setup, follow the steps in Create AWS IoT Devices in an AWS Greengrass Group.

After you complete the setup, the heart rate data is routed from the device to the AWS IoT Core service using AWS Greengrass. As such, you need to add a single subscription in the Greengrass group to facilitate this message route:

Here, your device is named Heartrate_Sensor, and the target is the IoT Cloud on the topic iot/heartrate. That means that when your device publishes to the iot/heartrate topic, AWS Greengrass also sends this message to the AWS IoT Core service on the same topic. Then you can use the breadth of AWS services to process the data.

The connectivity information is configured to use the local host because the IoT device resides on the Raspberry Pi 3 along with the AWS Greengrass core software. The IoT device uses the Discovery API, which is responsible for retrieving the connectivity information of the AWS Greengrass core that the IoT device is associated with.

The IoT device then uses the endpoint and port information to open a secure TLS connection to AWS Greengrass core, where the heart rate data is sent. The AWS Greengrass core connectivity information should be depicted as follows:

The power of AWS Greengrass core is that you can deploy AWS Lambda functions and new subscriptions to process the heart rate information locally on the Raspberry Pi 3. For example, you can deploy an AWS Lambda function that can trigger a reaction if the detected heart rate is reaching a set threshold. In this scenario, different individuals might require different thresholds and responses, so you could theoretically deploy unique Lambda functions on a per-individual basis if needed.

Configure AWS Greengrass and AWS IoT Core

To enable further processing and storage of the heart rate data messages published from AWS Greengrass core to AWS IoT Core, create an AWS IoT rule. The AWS IoT rule retrieves messages published to the IoT/heartrate topic and sends them to the Kinesis data stream through an AWS IoT rule action for Kinesis action.  

Simulate heart rate data

You might not have access to an IoT device, but you still want to run a proof of concept (PoC) around heart rate use cases. You can simulate data by creating a shell script and deploying that data simulation script on an Amazon EC2 instance. Refer to the EC2 user guide to get started with Amazon EC2 Linux instances.

On the Amazon EC2 instance, create a shell script kinesis_client_HeartRate.sh, and copy the provided code to start writing some records into the Kinesis data stream. Be sure to create your Kinesis data stream and replace the variable <your_stream_name> in the following script.

#!/bin/sh
while true
do
  deviceID=$(( ( RANDOM % 10 )  + 1 ))
  heartRate=$(jot -r 1 60 140)
  echo "$deviceID,$heartRate"
  aws kinesis put-record --stream-name <your_stream_name> --data "$deviceID,$heartRate"$'\n' --partition-key $deviceID --region us-east-1
done

You can also use the Kinesis Data Generator to create data and then stream it to your solution or demonstration. For details on its use, see the blog post Test Your Streaming Data Solution with the New Amazon Kinesis Data Generator.

Ingest data using Kinesis and manage alerts with Lambda, DynamoDB, and Amazon SNS

Now you need to ingest data from the IoT device, which can be processed for real-time notifications when abnormal heart rates are detected.

Streaming data from the heart rate monitoring device is ingested to Kinesis Data Streams. Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data. For this project, the data stream was configured with one open shard and a data retention period of 24 hours. This lets you send 1 MB of data or 1,000 events per second and read 2 MB of data per second. If you need to support more devices, you can scale up and add more shards using the UpdateShardCount API or the Amazon Kinesis scaling utility.

You can configure your data stream by using the following AWS CLI command (and then using the appropriate flag to turn on encryption).

aws kinesis create-stream --stream-name hearrate_stream --shard-count 1

You can use an AWS CloudFormation template to create the entire stack depicted in the following architecture diagram.

When launching an AWS CloudFormation template, be sure to enter your email address or mobile phone number with the appropriate endpoint protocol (“Email” or “SMS”) as parameters:

Alternatively, you can follow the manual steps in the documentation links that are provided in this post.

Streaming data in Kinesis can be processed and analyzed in real time by Kinesis clients. Refer to the Kinesis Data Streams Developer Guide to learn how to create a Kinesis data stream.

To identify abnormal heart rate information, you must use real-time analytics to detect abnormal behavior. You can use Kinesis Data Analytics to perform analytics on streaming data in real time. Kinesis Data Analytics consists of three configurable components: source, real-time analytics, and destination. Refer to the AWS documentation to learn the detailed steps to configure Kinesis Data Analytics.

Kinesis Data Analytics uses Kinesis Data Streams as the source stream for the data. In the source configuration process, if there are scenarios where in-filtering or masking records is required, you can preprocess records using AWS Lambda. The data in this particular case is relatively simple, so you don’t need preprocessing of records on the data.

The Kinesis Data Analytics schema editor lets you edit and transform the schema if required. In the following example, we transformed the second column to Value instead of COL_Value.

The SQL code to perform the real-time analysis of the data has to be copied to the SQL Editor for real-time analytics. The following is the sample code that was used for this demonstration.

“CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
                                   VALUEROWTIME TIMESTAMP,
                                   ID INTEGER, 
                                   COLVALUE INTEGER);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM" 
SELECT STREAM ROWTIME,
              ID,
              AVG("Value") AS HEARTRATE
FROM     "SOURCE_SQL_STREAM_001"
GROUP BY ID, 
         STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND) HAVING AVG("Value") > 120 OR AVG("Value") < 40;”

This code generates DESTINATION_SQL_STREAM. It inserts values into the stream only when the average value of the heart beat that is received from SOURCE_SQL_STREAM_001 is greater than 120 or less than 40 in the 60-second time window.

For more information about the tumbling window concept, see Tumbling Windows (Aggregations Using GROUP BY).

Next, add an AWS Lambda function as one of your destinations, and configure it as follows:

In the destination editor, make sure that the stream name selected is the DESTINATION_SQL_STREAM. You only want to trigger the Lambda function when anomalies in the heart rate are detected. The output format can be JSON or CSV. In this example, our Lambda function expects the data in JSON format, so we chose JSON.

Athlete and athletic trainer registration information is stored in the heartrate Registrations DynamoDB table. Amazon DynamoDB offers fully managed encryption at rest using an AWS Key Management Service (AWS KMS) managed encryption key for DynamoDB. You need to create a table with encryption at rest enabled. Follow the detailed steps in Amazon DynamoDB Encryption at Rest.

Each record in the table should include deviceid, customerid, firstname, lastname, and mobile. The following is an example table record for reference.

{
  "customerid": {
    "S": "3"
  },
  "deviceid": {
    "S": "7"
  },
  "email": {
    "S": "[email protected]"
  },
  "firstname": {
    "S": "John"
  },
  "lastname": {
    "S": "Smith"
  },
  "mobile": {
    "S": "19999999999"
  }
}

Refer to the DynamoDB Developer Guide for complete instructions for creating and populating a DynamoDB table.

The Lambda function is created to process the record passed from the Kinesis Data Analytics application.  The node.js Lambda function retrieves the athlete and athletic trainer information from the DynamoDB registrations table. It then alerts the athletic trainer to the event by sending a cellular text message via the Amazon Simple Notification Service (Amazon SNS).

Note: The default AWS account limit for Amazon SNS for mobile messages is $1.00 per month. You can increase this limit through an SNS Limit Increase case as described in AWS Service Limits.

You now create a new Lambda function with a runtime of Node.js 6.10 and choose the Create a custom role option for IAM permissions.  If you are new to deploying Lambda functions, see Create a Simple Lambda Function.

You must configure the new Lambda function with a specific IAM role, providing privileges to Amazon CloudWatch Logs, Amazon DynamoDB, and Amazon SNS as provided in the supplied AWS CloudFormation template.

The provided AWS Lambda function retrieves the HR Monitor Device ID and HR Average from the base64-encoded JSON message that is passed from Kinesis Data Analytics.  After retrieving the HR Monitor Device ID, the function then queries the DynamoDB Athlete registration table to retrieve the athlete and athletic trainer information.

Finally, the AWS Lambda function sends a mobile text notification (which does not contain any sensitive information) to the athletic trainer’s mobile number retrieved from the athlete data by using the Amazon SNS service.

To store the streaming data to an S3 bucket for further analysis and visualization using other tools, you can use Kinesis Data Firehose to connect the pipeline to Amazon S3 storage.  To learn more, see Create a Kinesis Data Firehose Delivery Stream.

Kinesis Data Firehose delivers the streaming data in intervals to the destination S3 bucket. The intervals can be defined using either an S3 buffer size or an S3 buffer interval (or both, whichever exceeds the first metric). The data in the Data Firehose delivery stream can be transformed. It also lets you back up the source record before applying any transformation. The data can be encrypted and compressed to GZip, Zip, or Snappy format to store the data in a columnar format like Apache Parquet and Apache ORC. This improves the query performance and reduces the storage footprint. You should enable error logging for operational and production troubleshooting.

Conclusion

In part 1 of this blog series, we demonstrated how to build a data pipeline in support of a data lake. We used key AWS services such as Kinesis Data Streams, Kinesis Data Analytics, Kinesis Data Firehose, and Lambda. In part 2, we’ll discuss how to deploy a serverless data lake and use key analytics to create actionable insights from the data lake.

Additional resources

Langlois, J.A., Rutland-Brown, W. & Wald, M., “The epidemiology and impact of traumatic brain injury: a brief overview,” Journal of Head Trauma Rehabilitation, Vol. 21, No. 5, 2006, pp. 375-378.

Echlin, S. E., Tator, C. H., Cusimano, M. D., Cantu, R. C., Taunton, J. E., Upshur E. G., Hall, C. R., Johnson, A. M., Forwell, L. A., Skopelja, E. N., “A prospective study of physician-observed concussions during junior ice hockey: implications for incidence rates,” Neurosurg Focus, 29 (5):E4, 2010

Daniel, R. W., Rowson, S., Duma, S. M., “Head Impact Exposure in Youth Football,” Annals of Biomedical Engineering., Vol. 10, 2012, 1007.

Greenwald, R. M., Gwin, J. T., Chu, J. J., Crisco, J. J., “Head impact severity measures for evaluating mild traumatic brain injury risk exposure,” Neurosurgery Vol. 62, 2008, pp. 789–79


Additional Reading

If you found this post useful, be sure to check out Setting Up Just-in-Time Provisioning with AWS IoT Core, and Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics.

 


About the Authors

Saurabh Shrivastava is a partner solutions architect and big data specialist working with global systems integrators. He works with AWS partners and customers to provide them architectural guidance for building scalable architecture in hybrid and AWS environments.

 

 

 

Abhinav Krishna Vadlapatla is a Solutions Architect with Amazon Web Services. He supports startups and small businesses with their cloud adoption to build scalable and secure solutions using AWS. During his free time, he likes to cook and travel.

 

 

 

John Cupit is a partner solutions architect for AWS’ Global Telecom Alliance Team.  His passion is leveraging the cloud to transform the carrier industry.  He has a son and daughter who have both graduated from college. His daughter is gainfully employed, while his son is in his first year of law school at Tulane University.  As such, he has no spare money and no spare time to work a second job.

 

 

David Cowden is partner solutions architect and IoT specialist working with AWS emerging partners. He works with customers to provide them architectural guidance for building scalable architecture in IoT space.

 

 

 

Josh Ragsdale is an enterprise solutions architect at AWS.  His focus is on adapting to a cloud operating model at very large scale. He enjoys cycling and spending time with his family outdoors.

 

 

 

Pierre-Yves Aquilanti, Ph.D., is a senior specialized HPC solutions architect at AWS. He spent several years in the oil & gas industry to optimize R&D applications for large scale HPC systems and enable the potential of machine learning for the upstream. He and his family crave to live in Singapore again for the human, cultural experience and eat fresh durians.

 

 

Manuel Puron is an enterprise solutions architect at AWS. He has been working in cloud security and IT service management for over 10 years. He is focused on the telecommunications industry. He enjoys video games and traveling to new destinations to discover new cultures.

 

Real-time bushfire alerting with Complex Event Processing in Apache Flink on Amazon EMR and IoT sensor network

Post Syndicated from Santoshkumar Kulkarni original https://aws.amazon.com/blogs/big-data/real-time-bushfire-alerting-with-complex-event-processing-in-apache-flink-on-amazon-emr-and-iot-sensor-network/

Bushfires are frequent events in the warmer months of the year when the climate is hot and dry. Countries like Australia and the United States are severely impacted due to the bushfires causing devastating effects to human lives and property. Over the years the prediction of bushfires has been a subject of study in various research projects. Many of these projects use complex machine learning algorithms. These algorithms learn to predict the bushfires from the real-time spread for the fire over a particular geographical region­.

In this blog post, we use event processing paradigm provided by Apache Flink’s Complex Event Processing (CEP) to detect potential bushfire patterns from incoming temperature events from IoT sensors in real time, and then send alerts via email. A real-time heat-map visualization of the area under surveillance is also integrated for monitoring purposes.

This post uses the following AWS services:

Overview of the real-time bushfire prediction alert system

The development and deployment of a large-scale wireless sensor network for bushfire detection and alerting is a complex task. The scenario for this post assumes that the sensors are long-lived battery powered and deployed over a multi-hop wireless mesh network using technologies like LoRaWAN. It also assumes that the placement of the IoT sensors is placed strategically within an area under surveillance, and not under the direct exposure of sunlight. This placement avoids excessive heating, and constantly recording and emitting the temperature readings where they are installed. The sensors can also communicate with each other to send and receive individual temperature readings to perceive the status of their surroundings. Key parameters recorded by the devices include temperature in degree Celsius, time stamp, node id, and infectedBy, as illustrated in Figure 1.

Figure 1. List of sensor events containing measured temperature by the IoT sensor devices over different time points

Once a potential bushfire starts to spread, the IoT sensors placed within its path can detect the subsequent temperature increase. It can then share the news with their neighboring sensors. This phenomenon is similar to spreading an epidemic over a network following a directional path. It is usually referred to as a Susceptible-Infected (SI) epidemic model in network science.

As shown in Figure 2, the parameter ‘infectedBy’ sent by a given node indicates that it has been infected by a neighboring IoT device (that is, the bushfire has been spread through this path) with the ‘node id’ listed as the parameter value. Here, we assume that once a node is infected within the network by one of its neighbors, it isn’t infected again by another node. Therefore, the value of the ‘infectedBy’ parameter remains the same.

Figure 2. High-level overview of an IoT sensor network monitoring temperature of the surrounding geographical area

For the purposes of this scenario, an 11-node IoT sensor network is shown in Figure 2 that exhibits how the bushfire spreads over time. The placement of IoT sensors can be visualized as an undirected graph network where each node is an IoT sensor device. A link between two neighboring nodes denotes the wireless connectivity within a multi-hop wireless ad hoc network. Figure 2 shows the following details:

  • At time t1, the nodes are all emitting the temperature data. However, none of them have reported any temperature greater than the bushfire alert threshold, which is set as 50°
  • At time t2, node-1 reports a temperature of 50° Celsius, which was over the pre-set threshold. In reality, it could have been a small-scale bushfire that recently triggered in the area under node-1’s surveillance.
  • As time moves forward to t3, we see that the fire rapidly spreads to the area monitored by node-2 in the network. Hence, we can now say that node-2 is infected by node-1. This is reflected in the ‘infectedBy’ parameter emitted by the node-2 at which has the value 1 denoting how the fire is spreading over time.
  • Next, at time t4, the fire spreads further to node-3, and followed by node-4 and node-5 by time .

This analogy helps us visualize the overall spread of the bushfire over a network of IoT devices. In this blog post, we use the CEP feature from Apache Flink to detect an event pattern where the measured temperature is higher than 50° Celsius and has an infection degree of 4 within 5 wirelessly connected IoT devices. Once this pattern is detected by real-time event stream processing in Amazon EMR, an SNS alert email is sent to the subscribed email address. The email highlights the path through which the fire has spread so far.

Architecture overview

In this section, we build a real-time event processing pipeline from start to finish. This streamlines the temperature measurements emitted by the IoT devices over the infrastructure layer to build predictive alert and visualization systems for potential bushfire monitoring. The high-level architectural diagram in Figure 3 shows the components required to build this system.

Figure 3. High-level block diagram of the real-time bushfire alert and visualization systems

The diagram shows that the IoT sensor events (that is, measured temperature) feed into an IoT Gateway. The gateway has internet connectivity to forward the records to a stream processing system. The gateway is the landing zone for all the IoT events, which ingests the records into a durable streaming storage. The IoT gateway should scale automatically to support over a billion devices without requiring us to manage any infrastructure.

Next, we store the events in the durable storage, as the temperature events are coming from non-reliable sources, like the IoT sensor devices. As a result, the events cannot be replayed in case any of the records are lost. The streaming storage should also support ingress and egress of large number of events. The events are then consumed by a stream processing engine that matches the incoming events to a pattern and later sends out alerts to the subscribers, if necessary. The raw events are also ingested into a visualization system. This system displays the real-time heat map of the temperatures in the area under surveillance by the IoT nodes.

Building a real-time bushfire alert and visualization system in AWS

Figure 4. Architecture of the real-time IoT stream processing pipeline using AWS services

In this section, we depict a component-level architecture for an event processing system using several of the AWS services, as shown in Figure 4. The IoT sensor data is sent to the AWS IoT services, which receives the incoming temperature records. It then acts as a gateway to the streaming pipeline. The AWS IoT Services is equipped with a rules engine. This can be used to configure an action for the incoming events so that they can be forwarded to another AWS service as a destination.

In this case, Amazon Kinesis Data Streams service is chosen as the destination to act as reliable underlying stream storage system with 1 day as retention period. Figure 5 below shows one such rule and action configuration used in this blog article. The events are then consumed by the Apache Flink processing engine running on an Amazon EMR cluster. Apache Flink consumes the records from the Amazon Kinesis Data Streams shards and matches the records against a pre-defined pattern to detect the possibility of a potential bushfire.

Figure 5. AWS IoT rule and action for the incoming temperature events

Use Apache Flink as the stream processing engine

In this blog, we have chosen Apache Flink as the stream processing engine as it provides high throughput with low latency processing of real-time events. More importantly, it supports stream processing and windowing with Event Time semantics for Amazon Kinesis Data Streams. This is an important feature where events arrive out of order and may also be delayed due to unreliable wireless network communication. Another great feature available in Apache Flink is the Complex Event Processing (CEP) library. It allows you to detect patterns within the stream of incoming events over a period of time. Let’s explore these features in more detail and how they can be used in this particular use case.

Characteristics of IoT events, Event-time processing and Watermarks

Most IoT use cases deal with a large number of sensor devices continuously generating high volume of events over time. The events generally have a time stamp in the record, which indicates when it was generated. However, for consumers, the events can arrive out of order or with delays during processing. In a stream processing application, the consumer should be able to deal with out of order and delayed events.

Apache Flink does this by using event time windowing. The window slides, not according to the processing time window, but by the event time. This helps to make alerting decisions based on the event time where it is more relevant. When the processing window is based on event time, we must know when to advance the event time. That tells us when we can close the window and trigger the next processing task. This is done through ‘Watermarks’ in Flink. There are various mechanisms to generate watermarks. We have used the TimeLagWatermarkGenerator, which generates watermarks that are lagging behind the processing time by a certain unit of time. This assumes that some of the records arrive in the Flink consumer after a certain delay. It is also important to mention that we chose Flink as the processing engine. It provides the Flink CEP feature to match the event based on the pattern we provide. This feature is currently not available in Structured Spark Streaming and Amazon Kinesis Data Analytics.

Complex Event Processing Apache Flink

Once the records are fetched from the Amazon Kinesis Data stream into the Apache Flink application running on Amazon EMR cluster, they must be matched against a pattern. The pattern filters records that first reached the threshold temperature of 50° Celsius and are next followed by another event (from another IoT sensor), which has also reached the same threshold temperature and has been infected by the first IoT sensor node corresponding to the first event on the pattern.

For example, among the incoming events, we get an event from the IoT sensor from node-1 where the temperature is greater than 50° Celsius, which is the first event in the CEP pattern. Next, we look for an event, which follows this first event. For example, an event from node-2 that has a temperature that reached the 50° threshold mark, and has its ‘infectedBy’ field set to 1 indicating node-1. If this condition repeats iteratively for the other three nodes like node-3, node-4, and node-5, then a complete pattern of four network degree path (N1 -> N2 -> N3 -> N -> N5) of potential bushfire initiated from node-1 is said to be detected.

In our implementation, we have chosen to send out an alert when the fire spreads to five connected nodes. But in reality, the number of nodes the fire spreads to before we send out an alert must be carefully chosen. A logical diagram of this particular pattern is shown in Figure 6. Finally, in response to this potential bushfire, an alert email is published to SNS and it delivers the email to the service’s subscribers. A sample SNS alert email is shown below in Figure 7.

Figure 6. The logical pattern diagram for predicting the bushfire

Figure 7. A sample Amazon SNS email alert to notify a potential bushfire and its traversing path

Real-time visualisation of the potential bushfire spread over time

All the incoming IoT event records (unfiltered and raw events got from the Amazon Kinesis Data Stream) are pushed into an Amazon Elasticsearch Service cluster for durable storage and visualization on the Kibana web UI. (For more information about Kibana, see What Is Kibana.) A real-time heat-map visualization and dashboard is created to continuously monitor the progress of the bushfire as shown in Figure 8. As you can see from Figure 8, the bushfire is spreading from node-1 to node-2 to node-3, and then to node-4 and node-5. This visualization can be further enhanced by recording the geographical location of the IoT sensor nodes. Then, plotting the heat-map over the geographical area under surveillance.

Figure 8. A sample bushfire heat-map visualisation from Amazon Elasticsearch Services

Setup and source code

The URL below explains in detail, the steps to set up all the necessary AWS components and run the IoT simulator and Apache Flink CEP. Here, we provide an AWS CloudFormation template. It creates the architecture from start to finish, shown in Figure 4, by setting up the IoT simulator in an EC2 instance with all other respective components, and then automatically runs the stack. Once the stack creation is completed, users can visit the Kibana web UI to observe the real-time bushfire dashboard. They can also receive an SNS alert email for potential bushfires when they confirm their email subscription for it.

To launch the AWS CloudFormation stack, click on the following Launch Stack button to open the AWS CloudFormation console.

Specify the Amazon S3 URL for the template, and then proceed to the next step where you specify the details explained in the next section. After you provide all the parameters, you can proceed and then create the stack. The parameters that you use are as follows:

1. SNS subscription email: This should be a valid email address for the fire alert notification to be sent. Once the CloudFormation stack creation initiates, you get an email on your provided email account for confirming the subscription. Choose the Confirm subscription button to receive the SNS notification.

Note: You may safely delete the SNS subscription from the Amazon SNS console upon completion of the end-to-end observation.

2. Public subnet ID for EMR cluster: Choose a public subnet from the drop-down menu. This is the subnet where the EMR cluster is created. The EMR cluster requires access to the internet to access the Kinesis stream and the Elasticsearch domain. Therefore, a public subnet with the option auto-assign public IPv4 address enabled within a VPC where the enableDnsHostnames and enableDnsSupport options are set to true.

3. S3 path location for EMR cluster logs: The S3 bucket where EMR puts the cluster logs.

Figure 9. Amazon CloudFormation console to create the AWS resource stack.

4. Public subnet ID for EC2 instance: The subnet where the EC2 instance is created to run the IoT simulator. Once the CloudFormation stack comes up, the IoT simulator automatically runs to ingest the IoT events to the AWS IoT gateway. The choice of this subnet follows the same guidelines set out for the subnet chosen for the EMR cluster shown in Figure 9.

5. Security group ID for EC2 instance: This is the security group attached to the EC2 instance running the IoT simulator. You can optionally add a new rule in this security group for SSH port 22. This allows access to the IoT simulator running in this EC2 instance from your workstation, and use the same public IP address for accessing Kibana web UI described later in this post.

6. Key pair ID for EC2 instance: The key pair to be associated with the EC2 instance and the EMR cluster. It allows you to log in to the instances for further exploration.

Figure 10. Amazon CloudFormation console to create the AWS resource stack.

7. Domain name for Amazon Elasticsearch domain: The Elasticsearch domain name to be created by the CloudFormation template.

8. Public IP address to access Kibana from local machine: The IP address of the local machine from where you want to access Kibana dashboard for visualization. For simplicity, you can provide the public IP address of the workstation from where you are running the CloudFormation stack, this IP address opens up for access on the Amazon Elasticsearch domain for displaying the real-time Kibana dashboard. You also must access the Kibana URL from this IP address only. If your IP address changes, then modify the policy of AWS Elasticsearch cluster with new IP address. For more information, see IP-based Policies in the Amazon Elasticsearch Service Developer Guide.

Once the stack creation is completed, the output section of the CloudFormation stack lists the web URLs to access the necessary resources in this architecture. Use the Kibana web URL and create an index pattern under the Management section with the name “weather-sensor-data,” and then choose the dashboard to see the visualization of the real-time spread of the bushfire covered by the IoT network.

Figure 11. Amazon Elasticsearch domain Kibana Web UI to create the index pattern.

The source codes and an elaborate README are provided in GitHub for the interested users who want to explore the implementation of this architecture in further detail.

https://github.com/aws-samples/realtime-bushfire-alert-with-apache-flink-cep

Troubleshooting Common Issues

1. Kibana Web UI is not accessible

If you see the Message “User: anonymous is not authorized to perform: es:ESHttpGet” while trying to access the Kibana web UI, then this means that the public IP address that was specified during the CloudFormation stack creation time either is not correct or might have been changed. You can confirm the public IP address again from http://checkip.amazonaws.com. Then go to the AWS Management Console for Elasticsearch and modify the security access policy, as shown in the following example, to change the IP address only.

Figure 12. Modifying public IP address in the Amazon Elasticsearch domain access policy.

2.  No records ingested into Elasticsearch

This issue can occur if records fail to be ingested from the EMR cluster. To troubleshoot this, go to the AWS Management Console for IAM and search for the IAM role named “EMR_EC2_Default_Role” and make sure it has the default AWS managed policy “AmazonElasticMapReduceforEC2Role” attached to it.

Figure 13. Verifying the default AWS Managed policy attached to “EMR_EC2_Default_Role” in IAM.

3. No SNS alert E-mail notification

If you do not receive an SNS email alert about the potential bushfire after several minutes of observing the complete visualization, then check whether you had confirmed the SNS subscription at the beginning while the CloudFormation stack was creating by checking your inbox. Additionally, make sure that you have provided a correct email address and re-create the stack again from the scratch.

Summary

In this blog post, we discussed how to build a real-time IoT stream processing, visualization, and alerting pipeline using various AWS services. We took advantage of the Complex Event Processing feature provided by Apache Flink to detect patterns within a network from the incoming events. The GitHub repository contains the resources that are required to run through the example provided in this post. Includes further information that helps you to get started quickly. We encourage you to explore the IoT simulator code and test with different network configuration to ingest more records with different patterns and visualize the bushfire spread path pattern on the Kibana dashboard.


Additional Reading

If you found this post useful, be sure to check out Build a Real-time Stream Processing Pipeline with Apache Flink on AWS, Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight, and Integrating IoT Events into Your Analytic Platform.

 


About the Authors

Santoshkumar Kulkarni is a Cloud Support Engineer with AWS Big Data and Analytics Services. He works closely with AWS customers to provide them architectural and engineering assistance and guidance. He is passionate about distributed technologies and streaming systems. In his free time, he enjoys spending time with his family on the beautiful beaches of Sydney.

 

 

 

Joarder Kamal, PhD is a Cloud Support Engineer with AWS Big Data and Analytics Services. He likes building and automating systems that combines distributed communications, real-time data, and collective intelligence. During his spare time, he loves reading travel diaries, doing pencil sketches, and touring across Australia with his wife.

AWS Online Tech Talks – July 2018

Post Syndicated from Sara Rodas original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-july-2018/

Join us this month to learn about AWS services and solutions featuring topics on Amazon EMR, Amazon SageMaker, AWS Lambda, Amazon S3, Amazon WorkSpaces, Amazon EC2 Fleet and more! We also have our third episode of the “How to re:Invent” where we’ll dive deep with the AWS Training and Certification team on Bootcamps, Hands-on Labs, and how to get AWS Certified at re:Invent. Register now! We look forward to seeing you. Please note – all sessions are free and in Pacific Time.

 

Tech talks featured this month:

 

Analytics & Big Data

July 23, 2018 | 11:00 AM – 12:00 PM PT – Large Scale Machine Learning with Spark on EMR – Learn how to do large scale machine learning on Amazon EMR.

July 25, 2018 | 01:00 PM – 02:00 PM PT – Introduction to Amazon QuickSight: Business Analytics for Everyone – Get an introduction to Amazon Quicksight, Amazon’s BI service.

July 26, 2018 | 11:00 AM – 12:00 PM PT – Multi-Tenant Analytics on Amazon EMR – Discover how to make an Amazon EMR cluster multi-tenant to have different processing activities on the same data lake.

 

Compute

July 31, 2018 | 11:00 AM – 12:00 PM PT – Accelerate Machine Learning Workloads Using Amazon EC2 P3 Instances – Learn how to use Amazon EC2 P3 instances, the most powerful, cost-effective and versatile GPU compute instances available in the cloud.

August 1, 2018 | 09:00 AM – 10:00 AM PT – Technical Deep Dive on Amazon EC2 Fleet – Learn how to launch workloads across instance types, purchase models, and AZs with EC2 Fleet to achieve the desired scale, performance and cost.

 

Containers

July 25, 2018 | 11:00 AM – 11:45 AM PT – How Harry’s Shaved Off Their Operational Overhead by Moving to AWS Fargate – Learn how Harry’s migrated their messaging workload to Fargate and reduced message processing time by more than 75%.

 

Databases

July 23, 2018 | 01:00 PM – 01:45 PM PT – Purpose-Built Databases: Choose the Right Tool for Each Job – Learn about purpose-built databases and when to use which database for your application.

July 24, 2018 | 11:00 AM – 11:45 AM PT – Migrating IBM Db2 Databases to AWS – Learn how to migrate your IBM Db2 database to the cloud database of your choice.

 

DevOps

July 25, 2018 | 09:00 AM – 09:45 AM PT – Optimize Your Jenkins Build Farm – Learn how to optimize your Jenkins build farm using the plug-in for AWS CodeBuild.

 

Enterprise & Hybrid

July 31, 2018 | 09:00 AM – 09:45 AM PT – Enable Developer Productivity with Amazon WorkSpaces – Learn how your development teams can be more productive with Amazon WorkSpaces.

August 1, 2018 | 11:00 AM – 11:45 AM PT – Enterprise DevOps: Applying ITIL to Rapid Innovation – Innovation doesn’t have to equate to more risk for your organization. Learn how Enterprise DevOps delivers agility while maintaining governance, security and compliance.

 

IoT

July 30, 2018 | 01:00 PM – 01:45 PM PT – Using AWS IoT & Alexa Skills Kit to Voice-Control Connected Home Devices – Hands-on workshop that covers how to build a simple backend service using AWS IoT to support an Alexa Smart Home skill.

 

Machine Learning

July 23, 2018 | 09:00 AM – 09:45 AM PT – Leveraging ML Services to Enhance Content Discovery and Recommendations – See how customers are using computer vision and language AI services to enhance content discovery & recommendations.

July 24, 2018 | 09:00 AM – 09:45 AM PT – Hyperparameter Tuning with Amazon SageMaker’s Automatic Model Tuning – Learn how to use Automatic Model Tuning with Amazon SageMaker to get the best machine learning model for your datasets, to tune hyperparameters.

July 26, 2018 | 09:00 AM – 10:00 AM PT – Build Intelligent Applications with Machine Learning on AWS – Learn how to accelerate development of AI applications using machine learning on AWS.

 

re:Invent

July 18, 2018 | 08:00 AM – 08:30 AM PT – Episode 3: Training & Certification Round-Up – Join us as we dive deep with the AWS Training and Certification team on Bootcamps, Hands-on Labs, and how to get AWS Certified at re:Invent.

 

Security, Identity, & Compliance

July 30, 2018 | 11:00 AM – 11:45 AM PT – Get Started with Well-Architected Security Best Practices – Discover and walk through essential best practices for securing your workloads using a number of AWS services.

 

Serverless

July 24, 2018 | 01:00 PM – 02:00 PM PT – Getting Started with Serverless Computing Using AWS Lambda – Get an introduction to serverless and how to start building applications with no server management.

 

Storage

July 30, 2018 | 09:00 AM – 09:45 AM PT – Best Practices for Security in Amazon S3 – Learn about Amazon S3 security fundamentals and lots of new features that help make security simple.

HackSpace magazine 7: Internet of Everything

Post Syndicated from Andrew Gregory original https://www.raspberrypi.org/blog/hackspace-magazine-7-internet-of-everything/

We’re usually averse to buzzwords at HackSpace magazine, but not this month: in issue 7, we’re taking a deep dive into the Internet of Things.HackSpace magazine issue 7 cover

Internet of Things (IoT)

To many people, IoT is a shady term used by companies to sell you something you already own, but this time with WiFi; to us, it’s a way to make our builds smarter, more useful, and more connected. In HackSpace magazine #7, you can join us on a tour of the boards that power IoT projects, marvel at the ways in which other makers are using IoT, and get started with your first IoT project!

Awesome projects

DIY retro computing: this issue, we’re taking our collective hat off to Spencer Owen. He stuck his home-brew computer on Tindie thinking he might make a bit of beer money — now he’s paying the mortgage with his making skills and inviting others to build modules for his machine. And if that tickles your fancy, why not take a crack at our Z80 tutorial? Get out your breadboard, assemble your jumper wires, and prepare to build a real-life computer!

Inside HackSpace magazine issue 7

Shameless patriotism: combine Lego, Arduino, and the car of choice for 1960 gold bullion thieves, and you’ve got yourself a groovy weekend project. We proudly present to you one man’s epic quest to add LED lights (controllable via a smartphone!) to his daughter’s LEGO Mini Cooper.

Makerspaces

Patriotism intensifies: for the last 200-odd years, the Black Country has been a hotbed of making. Urban Hax, based in Walsall, is the latest makerspace to show off its riches in the coveted Space of the Month pages. Every space has its own way of doing things, but not every space has a portrait of Rob Halford on the wall. All hail!

Inside HackSpace magazine issue 7

Diversity: advice on diversity often boils down to ‘Be nice to people’, which might feel more vague than actionable. This is where we come in to help: it is truly worth making the effort to give people of all backgrounds access to your makerspace, so we take a look at why it’s nice to be nice, and at the ways in which one makerspace has put niceness into practice — with great results.

And there’s more!

We also show you how to easily calculate the size and radius of laser-cut gears, use a bank of LEDs to etch PCBs in your own mini factory, and use chemistry to mess with your lunch menu.

Inside HackSpace magazine issue 7
Helen Steer inside HackSpace magazine issue 7
Inside HackSpace magazine issue 7

All this plus much, much more waits for you in HackSpace magazine issue 7!

Get your copy of HackSpace magazine

If you like the sound of that, you can find HackSpace magazine in WHSmith, Tesco, Sainsbury’s, and independent newsagents in the UK. If you live in the US, check out your local Barnes & Noble, Fry’s, or Micro Center next week. We’re also shipping to stores in Australia, Hong Kong, Canada, Singapore, Belgium, and Brazil, so be sure to ask your local newsagent whether they’ll be getting HackSpace magazine.

And if you can’t get to the shops, fear not: you can subscribe from £4 an issue from our online shop. And if you’d rather try before you buy, you can always download the free PDF. Happy reading, and happy making!

The post HackSpace magazine 7: Internet of Everything appeared first on Raspberry Pi.

Serverless Architectures with AWS Lambda: Overview and Best Practices

Post Syndicated from Andrew Baird original https://aws.amazon.com/blogs/architecture/serverless-architectures-with-aws-lambda-overview-and-best-practices/

For some organizations, the idea of “going serverless” can be daunting. But with an understanding of best practices – and the right tools — many serverless applications can be fully functional with only a few lines of code and little else.

Examples of fully-serverless-application use cases include:

  • Web or mobile backends – Create fully-serverless, mobile applications or websites by creating user-facing content in a native mobile application or static web content in an S3 bucket. Then have your front-end content integrate with Amazon API Gateway as a backend service API. Lambda functions will then execute the business logic you’ve written for each of the API Gateway methods in your backend API.
  • Chatbots and virtual assistants – Build new serverless ways to interact with your customers, like customer support assistants and bots ready to engage customers on your company-run social media pages. The Amazon Alexa Skills Kit (ASK) and Amazon Lex have the ability to apply natural-language understanding to user-voice and freeform-text input so that a Lambda function you write can intelligently respond and engage with them.
  • Internet of Things (IoT) backends – AWS IoT has direct-integration for device messages to be routed to and processed by Lambda functions. That means you can implement serverless backends for highly secure, scalable IoT applications for uses like connected consumer appliances and intelligent manufacturing facilities.

Using AWS Lambda as the logic layer of a serverless application can enable faster development speed and greater experimentation – and innovation — than in a traditional, server-based environment.

We recently published the “Serverless Architectures with AWS Lambda: Overview and Best Practices” whitepaper to provide the guidance and best practices you need to write better Lambda functions and build better serverless architectures.

Once you’ve finished reading the whitepaper, below are a couple additional resources I recommend as your next step:

  1. If you would like to better understand some of the architecture pattern possibilities for serverless applications: Thirty Serverless Architectures in 30 Minutes (re:Invent 2017 video)
  2. If you’re ready to get hands-on and build a sample serverless application: AWS Serverless Workshops (GitHub Repository)
  3. If you’ve already built a serverless application and you’d like to ensure your application has been Well Architected: The Serverless Application Lens: AWS Well Architected Framework (Whitepaper)

About the Author

 

Andrew Baird is a Sr. Solutions Architect for AWS. Prior to becoming a Solutions Architect, Andrew was a developer, including time as an SDE with Amazon.com. He has worked on large-scale distributed systems, public-facing APIs, and operations automation.

Introducing Microsoft Azure Sphere

Post Syndicated from corbet original https://lwn.net/Articles/751994/rss

Microsoft has issued a
press release
describing the security dangers involved with the
Internet of things (“a weaponized stove, baby monitors that spy, the
contents of your refrigerator being held for ransom
“) and introducing
“Microsoft Azure Sphere” as a combination of hardware and software to
address the problem. “Unlike the RTOSes common to MCUs today, our
defense-in-depth IoT OS offers multiple layers of security. It combines
security innovations pioneered in Windows, a security monitor, and a custom
Linux kernel to create a highly-secured software environment and a
trustworthy platform for new IoT experiences.

The answers to your questions for Eben Upton

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/eben-q-a-1/

Before Easter, we asked you to tell us your questions for a live Q & A with Raspberry Pi Trading CEO and Raspberry Pi creator Eben Upton. The variety of questions and comments you sent was wonderful, and while we couldn’t get to them all, we picked a handful of the most common to grill him on.

You can watch the video below — though due to this being the first pancake of our live Q&A videos, the sound is a bit iffy — or read Eben’s answers to the first five questions today. We’ll follow up with the rest in the next few weeks!

Live Q&A with Eben Upton, creator of the Raspberry Pi

Get your questions to us now using #AskRaspberryPi on Twitter

Any plans for 64-bit Raspbian?

Raspbian is effectively 32-bit Debian built for the ARMv6 instruction-set architecture supported by the ARM11 processor in the first-generation Raspberry Pi. So maybe the question should be: “Would we release a version of our operating environment that was built on top of 64-bit ARM Debian?”

And the answer is: “Not yet.”

When we released the Raspberry Pi 3 Model B+, we released an operating system image on the same day; the wonderful thing about that image is that it runs on every Raspberry Pi ever made. It even runs on the alpha boards from way back in 2011.

That deep backwards compatibility is really important for us, in large part because we don’t want to orphan our customers. If someone spent $35 on an older-model Raspberry Pi five or six years ago, they still spent $35, so it would be wrong for us to throw them under the bus.

So, if we were going to do a 64-bit version, we’d want to keep doing the 32-bit version, and then that would mean our efforts would be split across the two versions; and remember, we’re still a very small engineering team. Never say never, but it would be a big step for us.

For people wanting a 64-bit operating system, there are plenty of good third-party images out there, including SUSE Linux Enterprise Server.

Given that the 3B+ includes 5GHz wireless and Power over Ethernet (PoE) support, why would manufacturers continue to use the Compute Module?

It’s a form-factor thing.

Very large numbers of people are using the bigger product in an industrial context, and it’s well engineered for that: it has module certification, wireless on board, and now PoE support. But there are use cases that can’t accommodate this form factor. For example, NEC displays: we’ve had this great relationship with NEC for a couple of years now where a lot of their displays have a socket in the back that you can put a Compute Module into. That wouldn’t work with the 3B+ form factor.

Back of an NEC display with a Raspberry Pi Compute Module slotted in.

An NEC display with a Raspberry Pi Compute Module

What are some industrial uses/products Raspberry is used with?

The NEC displays are a good example of the broader trend of using Raspberry Pi in digital signage.

A Raspberry Pi running the wait time signage at The Wizarding World of Harry Potter, Universal Studios.
Image c/o thelonelyredditor1

If you see a monitor at a station, or an airport, or a recording studio, and you look behind it, it’s amazing how often you’ll find a Raspberry Pi sitting there. The original Raspberry Pi was particularly strong for multimedia use cases, so we saw uptake in signage very early on.

An array of many Raspberry Pis

Los Alamos Raspberry Pi supercomputer

Another great example is the Los Alamos National Laboratory building supercomputers out of Raspberry Pis. Many high-end supercomputers now are built using white-box hardware — just regular PCs connected together using some networking fabric — and a collection of Raspberry Pi units can serve as a scale model of that. The Raspberry Pi has less processing power, less memory, and less networking bandwidth than the PC, but it has a balanced amount of each. So if you don’t want to let your apprentice supercomputer engineers loose on your expensive supercomputer, a cluster of Raspberry Pis is a good alternative.

Why is there no power button on the Raspberry Pi?

“Once you start, where do you stop?” is a question we ask ourselves a lot.

There are a whole bunch of useful things that we haven’t included in the Raspberry Pi by default. We don’t have a power button, we don’t have a real-time clock, and we don’t have an analogue-to-digital converter — those are probably the three most common requests. And the issue with them is that they each cost a bit of money, they’re each only useful to a minority of users, and even that minority often can’t agree on exactly what they want. Some people would like a power button that is literally a physical analogue switch between the 5V input and the rest of the board, while others would like something a bit more like a PC power button, which is partway between a physical switch and a ‘shutdown’ button. There’s no consensus about what sort of power button we should add.

So the answer is: accessories. By leaving a feature off the board, we’re not taxing the majority of people who don’t want the feature. And of course, we create an opportunity for other companies in the ecosystem to create and sell accessories to those people who do want them.

Adafruit Push-button Power Switch Breakout Raspberry Pi

The Adafruit Push-button Power Switch Breakout is one of many accessories that fill in the gaps for makers.

We have this neat way of figuring out what features to include by default: we divide through the fraction of people who want it. If you have a 20 cent component that’s going to be used by a fifth of people, we treat that as if it’s a $1 component. And it has to fight its way against the $1 components that will be used by almost everybody.

Do you think that Raspberry Pi is the future of the Internet of Things?

Absolutely, Raspberry Pi is the future of the Internet of Things!

In practice, most of the viable early IoT use cases are in the commercial and industrial spaces rather than the consumer space. Maybe in ten years’ time, IoT will be about putting 10-cent chips into light switches, but right now there’s so much money to be saved by putting automation into factories that you don’t need 10-cent components to address the market. Last year, roughly 2 million $35 Raspberry Pi units went into commercial and industrial applications, and many of those are what you’d call IoT applications.

So I think we’re the future of a particular slice of IoT. And we have ten years to get our price point down to 10 cents 🙂

The post The answers to your questions for Eben Upton appeared first on Raspberry Pi.

New – Machine Learning Inference at the Edge Using AWS Greengrass

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-machine-learning-inference-at-the-edge-using-aws-greengrass/

What happens when you combine the Internet of Things, Machine Learning, and Edge Computing? Before I tell you, let’s review each one and discuss what AWS has to offer.

Internet of Things (IoT) – Devices that connect the physical world and the digital one. The devices, often equipped with one or more types of sensors, can be found in factories, vehicles, mines, fields, homes, and so forth. Important AWS services include AWS IoT Core, AWS IoT Analytics, AWS IoT Device Management, and Amazon FreeRTOS, along with others that you can find on the AWS IoT page.

Machine Learning (ML) – Systems that can be trained using an at-scale dataset and statistical algorithms, and used to make inferences from fresh data. At Amazon we use machine learning to drive the recommendations that you see when you shop, to optimize the paths in our fulfillment centers, fly drones, and much more. We support leading open source machine learning frameworks such as TensorFlow and MXNet, and make ML accessible and easy to use through Amazon SageMaker. We also provide Amazon Rekognition for images and for video, Amazon Lex for chatbots, and a wide array of language services for text analysis, translation, speech recognition, and text to speech.

Edge Computing – The power to have compute resources and decision-making capabilities in disparate locations, often with intermittent or no connectivity to the cloud. AWS Greengrass builds on AWS IoT, giving you the ability to run Lambda functions and keep device state in sync even when not connected to the Internet.

ML Inference at the Edge
Today I would like to toss all three of these important new technologies into a blender! You can now perform Machine Learning inference at the edge using AWS Greengrass. This allows you to use the power of the AWS cloud (including fast, powerful instances equipped with GPUs) to build, train, and test your ML models before deploying them to small, low-powered, intermittently-connected IoT devices running in those factories, vehicles, mines, fields, and homes that I mentioned.

Here are a few of the many ways that you can put Greengrass ML Inference to use:

Precision Farming – With an ever-growing world population and unpredictable weather that can affect crop yields, the opportunity to use technology to increase yields is immense. Intelligent devices that are literally in the field can process images of soil, plants, pests, and crops, taking local corrective action and sending status reports to the cloud.

Physical Security – Smart devices (including the AWS DeepLens) can process images and scenes locally, looking for objects, watching for changes, and even detecting faces. When something of interest or concern arises, the device can pass the image or the video to the cloud and use Amazon Rekognition to take a closer look.

Industrial Maintenance – Smart, local monitoring can increase operational efficiency and reduce unplanned downtime. The monitors can run inference operations on power consumption, noise levels, and vibration to flag anomalies, predict failures, detect faulty equipment.

Greengrass ML Inference Overview
There are several different aspects to this new AWS feature. Let’s take a look at each one:

Machine Learning ModelsPrecompiled TensorFlow and MXNet libraries, optimized for production use on the NVIDIA Jetson TX2 and Intel Atom devices, and development use on 32-bit Raspberry Pi devices. The optimized libraries can take advantage of GPU and FPGA hardware accelerators at the edge in order to provide fast, local inferences.

Model Building and Training – The ability to use Amazon SageMaker and other cloud-based ML tools to build, train, and test your models before deploying them to your IoT devices. To learn more about SageMaker, read Amazon SageMaker – Accelerated Machine Learning.

Model Deployment – SageMaker models can (if you give them the proper IAM permissions) be referenced directly from your Greengrass groups. You can also make use of models stored in S3 buckets. You can add a new machine learning resource to a group with a couple of clicks:

These new features are available now and you can start using them today! To learn more read Perform Machine Learning Inference.

Jeff;

 

Robinson: Fedora IoT Edition is go!

Post Syndicated from jake original https://lwn.net/Articles/748953/rss

On his blog, Peter Robinson announced the acceptance of a new edition of Fedora for the Internet of Things (IoT). He had proposed it as a Fedora “spin”, but the Fedora Council decided to make it a full-fledged edition with its own working group. “So what will be happening over the coming weeks (and months)? We’ll be getting the working group in place, getting an initial monthly release process in place so that people can start to have something to kick the tires with and provide feedback and drive discussion. With those two big pieces in place we can start to grow the Fedora IoT community and work out the bits that work and bits that don’t work.

Attending Mobile World Congress? Check Out Our Connected Car Demo!

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/attending-mobile-world-congress-check-out-our-connected-car-demo/

Are you planning to attend Mobile World Congress 2018 in Barcelona (one of my favorite cities)? If so, please be sure to check out the connected car demo in Hall 5 Booth 5E41.

The AWS Greengrass team has been working on a proof of concept with our friends at Vodafone and Saguna to show you how connected cars can change the automotive industry. The demo is built around the emerging concept of multi-access edge computing, or MEC.

Car manufacturers want to provide advanced digital technology in their vehicles but don’t want to make significant upgrades to the on-board computing resources due to cost, power, and time-to-market considerations, not to mention the issues that arise when attempting to retrofit cars that are already on the road. MEC offloads processing resources to the edge of the mobile network, for instance a hub site in the access network. This model helps car manufacturers to take advantage of low-latency compute resources while building features that can evolve and improve over the lifetime of the vehicle, often 20 years or more. It also reduces the complexity and the cost of the on-board components.

The MWC demo streams a live video feed over Vodafone’s 4G LTE network, with Saguna’s AI-powered MEC solution that leverages AWS Greengrass. The demo focuses on driver safety, with the goal of helping to detect drivers that are distracted by talking to someone or something in the car. With an on-board camera aimed at the driver, backed up by AI-powered movement tracking and pattern detection running at the edge of the mobile network, distractions can be identified and the driver can be alerted. This architecture also allows manufacturers to enhance existing cars since most of the computing is handled at the edge of the mobile network.

If you couldn’t make it to Mobile World Congress, you can also check out the video for this solution, here.

Jeff;

[$] Open-source trusted computing for IoT

Post Syndicated from jake original https://lwn.net/Articles/747564/rss

At this year’s FOSDEM in Brussels,
Jan Tobias Mühlberg gave a talk on the
latest work on Sancus, a
project that was originally presented
at the USENIX Security Symposium in 2013. The project is a fully
open-source hardware platform to support “trusted
computing
” and other security functionality. It is designed to be used for
internet of things (IoT)
devices, automotive applications, critical infrastructure, and other
embedded devices where trusted code is expected to be run.

After Section 702 Reauthorization

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/01/after_section_7.html

For over a decade, civil libertarians have been fighting government mass surveillance of innocent Americans over the Internet. We’ve just lost an important battle. On January 18, President Trump signed the renewal of Section 702, domestic mass surveillance became effectively a permanent part of US law.

Section 702 was initially passed in 2008, as an amendment to the Foreign Intelligence Surveillance Act of 1978. As the title of that law says, it was billed as a way for the NSA to spy on non-Americans located outside the United States. It was supposed to be an efficiency and cost-saving measure: the NSA was already permitted to tap communications cables located outside the country, and it was already permitted to tap communications cables from one foreign country to another that passed through the United States. Section 702 allowed it to tap those cables from inside the United States, where it was easier. It also allowed the NSA to request surveillance data directly from Internet companies under a program called PRISM.

The problem is that this authority also gave the NSA the ability to collect foreign communications and data in a way that inherently and intentionally also swept up Americans’ communications as well, without a warrant. Other law enforcement agencies are allowed to ask the NSA to search those communications, give their contents to the FBI and other agencies and then lie about their origins in court.

In 1978, after Watergate had revealed the Nixon administration’s abuses of power, we erected a wall between intelligence and law enforcement that prevented precisely this kind of sharing of surveillance data under any authority less restrictive than the Fourth Amendment. Weakening that wall is incredibly dangerous, and the NSA should never have been given this authority in the first place.

Arguably, it never was. The NSA had been doing this type of surveillance illegally for years, something that was first made public in 2006. Section 702 was secretly used as a way to paper over that illegal collection, but nothing in the text of the later amendment gives the NSA this authority. We didn’t know that the NSA was using this law as the statutory basis for this surveillance until Edward Snowden showed us in 2013.

Civil libertarians have been battling this law in both Congress and the courts ever since it was proposed, and the NSA’s domestic surveillance activities even longer. What this most recent vote tells me is that we’ve lost that fight.

Section 702 was passed under George W. Bush in 2008, reauthorized under Barack Obama in 2012, and now reauthorized again under Trump. In all three cases, congressional support was bipartisan. It has survived multiple lawsuits by the Electronic Frontier Foundation, the ACLU, and others. It has survived the revelations by Snowden that it was being used far more extensively than Congress or the public believed, and numerous public reports of violations of the law. It has even survived Trump’s belief that he was being personally spied on by the intelligence community, as well as any congressional fears that Trump could abuse the authority in the coming years. And though this extension lasts only six years, it’s inconceivable to me that it will ever be repealed at this point.

So what do we do? If we can’t fight this particular statutory authority, where’s the new front on surveillance? There are, it turns out, reasonable modifications that target surveillance more generally, and not in terms of any particular statutory authority. We need to look at US surveillance law more generally.

First, we need to strengthen the minimization procedures to limit incidental collection. Since the Internet was developed, all the world’s communications travel around in a single global network. It’s impossible to collect only foreign communications, because they’re invariably mixed in with domestic communications. This is called “incidental” collection, but that’s a misleading name. It’s collected knowingly, and searched regularly. The intelligence community needs much stronger restrictions on which American communications channels it can access without a court order, and rules that require they delete the data if they inadvertently collect it. More importantly, “collection” is defined as the point the NSA takes a copy of the communications, and not later when they search their databases.

Second, we need to limit how other law enforcement agencies can use incidentally collected information. Today, those agencies can query a database of incidental collection on Americans. The NSA can legally pass information to those other agencies. This has to stop. Data collected by the NSA under its foreign surveillance authority should not be used as a vehicle for domestic surveillance.

The most recent reauthorization modified this lightly, forcing the FBI to obtain a court order when querying the 702 data for a criminal investigation. There are still exceptions and loopholes, though.

Third, we need to end what’s called “parallel construction.” Today, when a law enforcement agency uses evidence found in this NSA database to arrest someone, it doesn’t have to disclose that fact in court. It can reconstruct the evidence in some other manner once it knows about it, and then pretend it learned of it that way. This right to lie to the judge and the defense is corrosive to liberty, and it must end.

Pressure to reform the NSA will probably first come from Europe. Already, European Union courts have pointed to warrantless NSA surveillance as a reason to keep Europeans’ data out of US hands. Right now, there is a fragile agreement between the EU and the United States ­– called “Privacy Shield” — ­that requires Americans to maintain certain safeguards for international data flows. NSA surveillance goes against that, and it’s only a matter of time before EU courts start ruling this way. That’ll have significant effects on both government and corporate surveillance of Europeans and, by extension, the entire world.

Further pressure will come from the increased surveillance coming from the Internet of Things. When your home, car, and body are awash in sensors, privacy from both governments and corporations will become increasingly important. Sooner or later, society will reach a tipping point where it’s all too much. When that happens, we’re going to see significant pushback against surveillance of all kinds. That’s when we’ll get new laws that revise all government authorities in this area: a clean sweep for a new world, one with new norms and new fears.

It’s possible that a federal court will rule on Section 702. Although there have been many lawsuits challenging the legality of what the NSA is doing and the constitutionality of the 702 program, no court has ever ruled on those questions. The Bush and Obama administrations successfully argued that defendants don’t have legal standing to sue. That is, they have no right to sue because they don’t know they’re being targeted. If any of the lawsuits can get past that, things might change dramatically.

Meanwhile, much of this is the responsibility of the tech sector. This problem exists primarily because Internet companies collect and retain so much personal data and allow it to be sent across the network with minimal security. Since the government has abdicated its responsibility to protect our privacy and security, these companies need to step up: Minimize data collection. Don’t save data longer than absolutely necessary. Encrypt what has to be saved. Well-designed Internet services will safeguard users, regardless of government surveillance authority.

For the rest of us concerned about this, it’s important not to give up hope. Everything we do to keep the issue in the public eye ­– and not just when the authority comes up for reauthorization again in 2024 — hastens the day when we will reaffirm our rights to privacy in the digital age.

This essay previously appeared in the Washington Post.