Practical Doomsday

Post Syndicated from Unknown original https://lcamtuf.blogspot.com/2021/08/practical-doomsday.html

 Practical Doomsday

Practical Doomsday is an enjoyable, data-packed romp through the world of rational emergency preparedness. It cuts through the noise of 24-hour news to help you zero in on what actually matters: building a diversified rainy-day fund, staying safe online, and dealing with common mishaps ranging from prolonged power outages to wildfires or floods.

The goal of the book is not to convince you that the end is nigh. To the contrary: I want to reclaim the concept of prepping from the bunker-dwelling prophets of doom. Disasters are not rare, but emergency preparedness is not about expecting the worst; it’s about being able to enjoy our lives to the fullest without worrying about the apocalyptic headline of the day.

☛ Click here to read a sample chapter
☛ Click to order from the publisher

Use code PREORDERDOOMSDAY to get 30% off. Preorders from the publisher get instant early access to a PDF version of the book. Orders can also be placed on Amazon, Barnes & Noble, and in most other places where books are sold.

CICD on Serverless Applications using AWS CodeArtifact

Post Syndicated from Anand Krishna original https://aws.amazon.com/blogs/devops/cicd-on-serverless-applications-using-aws-codeartifact/

Developing and deploying applications rapidly to users requires a working pipeline that accepts the user code (usually via a Git repository). AWS CodeArtifact was announced in 2020. It’s a secure and scalable artifact management product that easily integrates with other AWS products and services. CodeArtifact allows you to publish, store, and view packages, list package dependencies, and share your application’s packages.

In this post, I will show how we can build a simple DevOps pipeline for a sample JAVA application (JAR file) to be built with Maven.

Solution Overview

We utilize the following AWS services/Tools/Frameworks to set up our continuous integration, continuous deployment (CI/CD) pipeline:

The following diagram illustrates the pipeline architecture and flow:

 

aws-codeartifact-pipeline

 

Our pipeline is built on CodePipeline with CodeCommit as the source (CodePipeline Source Stage). This triggers the pipeline via a CloudWatch Events rule. Then the code is fetched from the CodeCommit repository branch (main) and sent to the next pipeline phase. This CodeBuild phase is specifically for compiling, packaging, and publishing the code to CodeArtifact by utilizing a package manager—in this case Maven.

After Maven publishes the code to CodeArtifact, the pipeline asks for a manual approval to be directly approved in the pipeline. It can also optionally trigger an email alert via Amazon Simple Notification Service (Amazon SNS). After approval, the pipeline moves to another CodeBuild phase. This downloads the latest packaged JAR file from a CodeArtifact repository and deploys to the AWS Lambda function.

Clone the Repository

Clone the GitHub repository as follows:

git clone https://github.com/aws-samples/aws-cdk-codeartifact-pipeline-sample.git

Code Deep Dive

After the Git repository is cloned, the directory structure is shown as in the following screenshot :

aws-codeartifact-pipeline-code

Let’s study the files and code to understand how the pipeline is built.

The directory java-events is a sample Java Maven project. Find numerous sample applications on GitHub. For this post, we use the sample application java-events.

To add your own application code, place the pom.xml and settings.xml files in the root directory for the AWS CDK project.

Let’s study the code in the file lib/cdk-pipeline-codeartifact-new-stack.ts of the stack CdkPipelineCodeartifactStack. This is the heart of the AWS CDK code that builds the whole pipeline. The stack does the following:

  • Creates a CodeCommit repository called ca-pipeline-repository.
  • References a CloudFormation template (lib/ca-template.yaml) in the AWS CDK code via the module @aws-cdk/cloudformation-include.
  • Creates a CodeArtifact domain called cdkpipelines-codeartifact.
  • Creates a CodeArtifact repository called cdkpipelines-codeartifact-repository.
  • Creates a CodeBuild project called JarBuild_CodeArtifact. This CodeBuild phase does all of the code compiling, packaging, and publishing to CodeArtifact into a repository called cdkpipelines-codeartifact-repository.
  • Creates a CodeBuild project called JarDeploy_Lambda_Function. This phase fetches the latest artifact from CodeArtifact created in the previous step (cdkpipelines-codeartifact-repository) and deploys to the Lambda function.
  • Finally, creates a pipeline with four phases:
    • Source as CodeCommit (ca-pipeline-repository).
    • CodeBuild project JarBuild_CodeArtifact.
    • A Manual approval Stage.
    • CodeBuild project JarDeploy_Lambda_Function.

 

CodeArtifact shows the domain-specific and repository-specific connection settings to mention/add in the application’s pom.xml and settings.xml files as below:

aws-codeartifact-repository-connections

Deploy the Pipeline

The AWS CDK code requires the following packages in order to build the CI/CD pipeline:

  • @aws-cdk/core
  • @aws-cdk/aws-codepipeline
  • @aws-cdk/aws-codepipeline-actions
  • @aws-cdk/aws-codecommit
  • @aws-cdk/aws-codebuild
  • @aws-cdk/aws-iam
  • @aws-cdk/cloudformation-include

 

Install the required AWS CDK packages as below:

npm i @aws-cdk/core @aws-cdk/aws-codepipeline @aws-cdk/aws-codepipeline-actions @aws-cdk/aws-codecommit @aws-cdk/aws-codebuild @aws-cdk/pipelines @aws-cdk/aws-iam @ @aws-cdk/cloudformation-include

Compile the AWS CDK code:

npm run build

Deploy the AWS CDK code:

cdk synth
cdk deploy

After the AWS CDK code is deployed, view the final output on the stack’s detail page on the AWS CloudFormation :

aws-codeartifact-pipeline-cloudformation-stack

 

How the pipeline works with artifact versions (using SNAPSHOTS)

In this demo, I publish SNAPSHOT to the repository. As per the documentation here and here, a SNAPSHOT refers to the most recent code along a branch. It’s a development version preceding the final release version. Identify a snapshot version of a Maven package by the suffix SNAPSHOT appended to the package version.

The application settings are defined in the pom.xml file. For this post, we define the following:

  • The version to be used, called 1.0-SNAPSHOT.
  • The specific packaging, called jar.
  • The specific project display name, called JavaEvents.
  • The specific group ID, called JavaEvents.

The screenshot below shows the pom.xml settings we utilised in the application:

aws-codeartifact-pipeline-pom-xml

 

You can’t republish a package asset that already exists with different content, as per the documentation here.

When a Maven snapshot is published, its previous version is preserved in a new version called a build. Each time a Maven snapshot is published, a new build version is created.

When a Maven snapshot is published, its status is set to Published, and the status of the build containing the previous version is set to Unlisted. If you request a snapshot, the version with status Published is returned. This is always the most recent Maven snapshot version.

For example, the image below shows the state when the pipeline is run for the FIRST RUN. The latest version has the status Published and previous builds are marked Unlisted.

aws-codeartifact-repository-package-versions

 

For all subsequent pipeline runs, multiple Unlisted versions will occur every time the pipeline is run, as all previous versions of a snapshot are maintained in its build versions.

aws-codeartifact-repository-package-versions

 

Fetching the Latest Code

Retrieve the snapshot from the repository in order to deploy the code to an AWS Lambda Function. I have used AWS CLI to list and fetch the latest asset of package version 1.0-SNAPSHOT.

 

Listing the latest snapshot

export ListLatestArtifact = `aws codeartifact list-package-version-assets —domain cdkpipelines-codeartifact --domain-owner $Account_Id --repository cdkpipelines-codeartifact-repository --namespace JavaEvents --format maven --package JavaEvents --package-version "1.0-SNAPSHOT"| jq ".assets[].name"|grep jar|sed ’s/“//g’`

NOTE : Please note the dynamic CDK variable $Account_Id which represents AWS Account ID.

 

Fetching the latest code using Package Version

aws codeartifact get-package-version-asset --domain cdkpipelines-codeartifact --repository cdkpipelines-codeartifact-repository --format maven --package JavaEvents --package-version 1.0-SNAPSHOT --namespace JavaEvents --asset $ListLatestArtifact demooutput

Notice that I’m referring the last code by using variable $ListLatestArtifact. This always fetches the latest code, and demooutput is the outfile of the AWS CLI command where the content (code) is saved.

 

Testing the Pipeline

Now clone the CodeCommit repository that we created with the following code:

git clone https://git-codecommit.<region>.amazonaws.com/v1/repos/codeartifact-pipeline-repository

 

Enter the following code to push the code to the CodeCommit repository:

cp -rp cdk-pipeline-codeartifact-new /* ca-pipeline-repository
cd ca-pipeline-repository
git checkout -b main
git add .
git commit -m “testing the pipeline”
git push origin main

Once the code is pushed to Git repository, the pipeline is automatically triggered by Amazon CloudWatch events.

The following screenshots shows the second phase (AWS CodeBuild Phase – JarBuild_CodeArtifact) of the pipeline, wherein the asset is successfully compiled and published to the CodeArtifact repository by Maven:

aws-codeartifact-pipeline-codebuild-jarbuild

aws-codeartifact-pipeline-codebuild-screenshot

aws-codeartifact-pipeline-codebuild-screenshot2

 

The following screenshots show the last phase (AWS CodeBuild Phase – Deploy-to-Lambda) of the pipeline, wherein the latest asset is successfully pulled and deployed to AWS Lambda Function.

Asset JavaEvents-1.0-20210618.131629-5.jar is the latest snapshot code for the package version 1.0-SNAPSHOT. This is the same asset version code that will be deployed to AWS Lambda Function, as seen in the screenshots below:

aws-codeartifact-pipeline-codebuild-jardeploy

aws-codeartifact-pipeline-codebuild-screenshot-jarbuild

The following screenshot of the pipeline shows a successful run. The code was fetched and deployed to the existing Lambda function (codeartifact-test-function).

aws-codeartifact-pipeline-codepipeline

Cleanup

To clean up, You can either delete the entire stack through the AWS CloudFormation console or use AWS CDK command like below –

cdk destroy

For more information on the AWS CDK commands, please check the here or sample here.

Summary

In this post, I demonstrated how to build a CI/CD pipeline for your serverless application with AWS CodePipeline by utilizing AWS CDK with AWS CodeArtifact. Please check the documentation here for an in-depth explanation regarding other package managers and the getting started guide.

Metasploit Wrap-Up

Post Syndicated from Matthew Kienow original https://blog.rapid7.com/2021/08/06/metasploit-wrap-up-124/

Desert heat (not the 1999 film)

Metasploit Wrap-Up

This week was more quiet than normal with Black Hat USA and DEF CON, but that didn’t stop the team from delivering some small enhancements and bug fixes! We are also excited to see two new modules #15519 and #15520 from researcher Jacob Baines’ DEF CON talk ​​Bring Your Own Print Driver Vulnerability already appear in the PR queue. Keep an eye out for those modules in the near future!

Our very own Simon Janusz enhanced the CommandDispatcher and SessionManager to support using a negative ID with both the jobs and sessions commands. Quickly access the last job or session by passing -1 to the command. The change allows users to upgrade the most recently opened session to meterpreter using the command sessions -u -1, thus removing the need to run the post/multi/manage/shell_to_meterpreter module.

In addition, our very own Alan David Foster updated the PostgreSQL scanner/postgres/postgres_schemadump module so that it does not ignore the default postgres database. That default database might contain valuable information after all! The enhancements also introduce a new datastore option, IGNORED_DATABASES, to configure a list of databases ignored during the schema dump.

Enhancements and features

  • #15492 from sjanusz-r7 – Adds support for negative session and job ids.
  • #15498 from adfoster-r7 – Updates the PostgreSQL schema_dump module to no longer ignore the default postgres database which may contain useful information, and adds a new datastore option to configure ignored databases.

Bugs fixed

  • #15500 from agalway-r7 – Fixes a regression issue for gitlab_file_read_rce and cacti_filter_sqli_rce where the modules failed to run
  • #15503 from jheysel-r7 – A bug has been fixed in the Cisco Hyperflex file upload RCE module that prevented it from properly deleting the uploaded payload files. Uploaded payload files should now be properly deleted.

Get it

As always, you can update to the latest Metasploit Framework with msfupdate
and you can get more details on the changes since the last blog post from
GitHub:

If you are a git user, you can clone the Metasploit Framework repo (master branch) for the latest.
To install fresh without using git, you can use the open-source-only Nightly Installers or the
binary installers (which also include the commercial edition).

Black Hat 2021: Rapid7 Experts Share Key Day 2 Takeaways

Post Syndicated from Aaron Wells original https://blog.rapid7.com/2021/08/06/black-hat-recap-2/

Black Hat 2021: Rapid7 Experts Share Key Day 2 Takeaways

Here we are again, back for another day of Rapid7 expert debriefings and analysis for some of the most talked-about Black Hat sessions of this year. So without further delay, let’s take it away!

Get more DEF CON 2021 insights from our Research team on Tuesday, August 10

Sign up for our What Happened in Vegas webinar

Detection and Response



Black Hat 2021: Rapid7 Experts Share Key Day 2 Takeaways

Key takeaways

  • How do human behaviors — learned or learning — factor into incident response? Depending on the volume of stakeholders, your team may be under varying extremes of action bias. As in, are speedy actions being prioritized on vulnerabilities that don’t present a high risk profile? Is speed even possible if mitigating actions must suddenly be learned? Vendors have caught on, practicing “Security Theater”— peddling solutions to problems that might not present real risks.
  • Tangential to the previous topic, a question arises when exploring the weaponization of C2 channels: Due to the unlikelihood of an attack via, say, LDAP attributes when establishing C2, does it make sense to roll out an entirely new detection-and-response plan? Many different conditions must be met for an attacker to gain access in the wild, but teams might already have similar responses in place, on the off chance it happens.
  • Zooming out to a topic with broader public appeal, let’s consider how companies use — and abuse — our personal data. An 18-month test run by a professor and a group of students at Virginia Tech revealed how unlikely it is we’ll be able to predict which companies will abuse personal information after someone, say, creates login credentials for a TikTok account and the company launches cookie tracking for that person.

Vulnerability Risk Management



Black Hat 2021: Rapid7 Experts Share Key Day 2 Takeaways

Key takeaways

  • Are Microsoft Exchange Servers creating an entirely new attack surface via Client Access Services (CAS)? Exchange architecture is incredibly complex, so it contains multitudes when it comes to vulnerabilities. CAS ties front-end and back-end services together, receiving the front-end request through a variety of protocols, including some extremely geriatric ones like POP3 and IMAP4. These legacy protocols are contributing to expanded attack surfaces.
  • Vulnerability Exploitability eXchange (VEX) helps teams rethink security advisories and what it means to be vulnerable. Essentially, it enables software providers to communicate they’re not affected by a vulnerability. Two advantages of VEX are 1) that creation and management of vulnerabilities are automated, and 2) that its results are machine-readable.  
  • Open-source software (OSS) is incredible… and incredibly vulnerable. There are so many risks with OSS that a vendor might even put off patching a vulnerability — for whatever business reason — if alerted to it. There’s currently no mechanism to secure so many classes of vulnerabilities in OSS, but maybe there should be. Researchers should work together to create those class-eliminating mechanisms, ultimately reducing the lift when it comes to risk management.

Research and Policy



Black Hat 2021: Rapid7 Experts Share Key Day 2 Takeaways

Key takeaways

  • What is Electromagnetic Fault Injection (EMFI)? It’s when hardware attackers use electromagnetism to hack hardware chips. When it comes to something like a car’s modern combustion engine, EMFI can be leveraged to change a vehicle’s performance, slithering past manufacturer-imposed security protocols. Some owners are beginning to “tune” chips with EMFI in order to push the limits of their vehicles.
  • There’s cause for concern that AI security products are simply repeating back to us the tables on which they were trained. If this is the case, can someone create more nefarious tables to sway AI security entities away from actual security? Attackers can now train explainable AI models on private data, turning them into the latest tool in their arsenals. Consider your attack surface expanded.
  • When companies export their technology beyond their own borders, it isn’t as easy as it sounds in a press release. Whereas policy constantly lagged behind technology, it’s starting to catch up as companies realize the cost of doing business with both digital authoritarians and digital democracies. Is proprietary tech compromised when entering a new country where it must adhere to each and every law imposed on it by local regulators?

Thanks for joining the Rapid7 team at another round of Black Hat debriefings. We hope to see you live and in person in Vegas next year. Until then, stay secure and stay safe!

And if you’re not ready to walk away from the table just yet, revisit our Day 1 takeaways, or sign up now to hear our Research team’s behind-the-scenes insights on DEF CON 2021 at the What Happened in Vegas webinar on Tuesday, August 10.

Field Notes: Building an automated scene detection pipeline for Autonomous Driving – ADAS Workflow

Post Syndicated from Kevin Soucy original https://aws.amazon.com/blogs/architecture/field-notes-building-an-automated-scene-detection-pipeline-for-autonomous-driving/

This Field Notes blog post in 2020 explains how to build an Autonomous Driving Data Lake using this Reference Architecture. Many organizations face the challenge of ingesting, transforming, labeling, and cataloging massive amounts of data to develop automated driving systems. In this re:Invent session, we explored an architecture to solve this problem using Amazon EMR, Amazon S3, Amazon SageMaker Ground Truth, and more. You learn how BMW Group collects 1 billion+ km of anonymized perception data from its worldwide connected fleet of customer vehicles to develop safe and performant automated driving systems.

Architecture Overview

The objective of this post is to describe how to design and build an end-to-end Scene Detection pipeline which:

This architecture integrates an event-driven ROS bag ingestion pipeline running Docker containers on Elastic Container Service (ECS). This includes a scalable batch processing pipeline based on Amazon EMR and Spark. The solution also leverages AWS Fargate, Spot Instances, Elastic File System, AWS Glue, S3, and Amazon Athena.

reference architecture - build automated scene detection pipeline - Autonomous Driving

Figure 1 – Architecture Showing how to build an automated scene detection pipeline for Autonomous Driving

The data included in this demo was produced by one vehicle across four different drives in the United States. As the ROS bag files produced by the vehicle’s on-board software contains very complex data, such as Lidar Point Clouds, the files are usually very large (1+TB files are not uncommon).

These files usually need to be split into smaller chunks before being processed, as is the case in this demo. These files also may need to have post-processing algorithms applied to them, like lane detection or object detection.

In our case, the ROS bag files are split into approximately 10GB chunks and include topics for post-processed lane detections before they land in our S3 bucket. Our scene detection algorithm assumes the post processing has already been completed. The bag files include object detections with bounding boxes, and lane points representing the detected outline of the lanes.

Prerequisites

This post uses an AWS Cloud Development Kit (CDK) stack written in Python. You should follow the instructions in the AWS CDK Getting Started guide to set up your environment so you are ready to begin.

You can also use the config.json to customize the names of your infrastructure items, to set the sizing of your EMR cluster, and to customize the ROS bag topics to be extracted.

You will also need to be authenticated into an AWS account with permissions to deploy resources before executing the deploy script.

Deployment

The full pipeline can be deployed with one command: * `bash deploy.sh deploy true` . The progress of the deployment can be followed on the command line, but also in the CloudFormation section of the AWS console. Once deployed, the user must upload 2 or more bag files to the rosbag-ingest bucket to initiate the pipeline.

The default configuration requires two bag files to be processed before an EMR Pipeline is initiated. You would also have to manually initiate the AWS  Glue Crawler to be able to explore the parquet data with tools like Athena or Quicksight.

ROS bag ingestion with ECS Tasks, Fargate, and EFS

This solution provides an end-to-end scene detection pipeline for ROS bag files, ingesting the ROS bag files from S3, and transforming the topic data to perform scene detection in PySpark on EMR. This then exposes scene descriptions via DynamoDB to downstream consumers.

The pipeline starts with an S3 bucket (Figure 1 – #1) where incoming ROS bag files can be uploaded from local copy stations as needed. We recommend, using Amazon Direct Connect for a private, high-throughout connection to the cloud.

This ingestion bucket is configured to initiate S3 notifications each time an object ending in the prefix “.bag” is created. An AWS Lambda function then initiates a Step Function for orchestrating the ECS Task. This passes the bucket and bag file prefix to the ECS task as environment variables in the container.

The ECS Task (Figure 1 – #2) runs serverless leveraging Fargate as the capacity provider, This avoids the need to provision and autoscale EC2 instances in the ECS cluster. Each ECS Task processes exactly one bag file. We use Elastic FileStore to provide virtually unlimited file storage to the container, in order to easily work with larger bag files. The container uses the open-source bagpy python library to extract structured topic data (for example, GPS, detections, inertial measurement data,). The topic data is uploaded as parquet files to S3, partitioned by topic and source bag file. The application writes metadata about each file, such as the topic names found in the file and the number of messages per topic, to a DynamoDB table (Figure 1 – #4).

This module deploys an AWS  Glue Crawler configured to crawl this bucket of topic parquet files. These files populate the AWS Glue Catalog with the schemas of each topic table and make this data accessible in Athena, Glue jobs, Quicksight, and Spark on EMR.  We use the AWS Glue Catalog (Figure 1 – #5) as a permanent Hive Metastore.

Glue Data Catalog of parquet datasets on S3

Figure 2 – Glue Data Catalog of parquet datasets on S3

 

Run ad-hoc queries against the Glue tables using Amazon Athena

Figure 3 – Run ad-hoc queries against the Glue tables using Amazon Athena

The topic parquet bucket also has an S3 Notification configured for all newly created objects, which is consumed by an EMR-Trigger Lambda (Figure 1 – #5). This Lambda function is responsible for keeping track of bag files and their respective parquet files in DynamoDB (Figure 1 – #6). Once in DynamoDB, bag files are assigned to batches, initiating the EMR batch processing step function. Metadata is stored about each batch including the step function execution ARN in DynamoDB.

EMR pipeline orchestration with AWS Step Functions

Figure 4 – EMR pipeline orchestration with AWS Step Functions

The EMR batch processing step function (Figure 1 – #7) orchestrates the entire EMR pipeline, from provisioning an EMR cluster using the open-source EMR-Launch CDK library to submitting Pyspark steps to the cluster, to terminating the cluster and handling failures.

Batch Scene Analytics with Spark on EMR

There are two PySpark applications running on our cluster. The first performs synchronization of ROS bag topics for each bagfile. As the various sensors in the vehicle have different frequencies, we synchronize the various frequencies to a uniform frequency of 1 signal per 100 ms per sensor. This makes it easier to work with the data.

We compute the minimum and maximum timestamp in each bag file, and construct a unified timeline. For each 100 ms we take the most recent signal per sensor and assign it to the 100 ms timestamp. After this is performed, the data looks more like a normal relational table and is easier to query and analyze.

Batch Scene Analytics with Spark on EMR

Figure 5 – Batch Scene Analytics with Spark on EMR

Scene Detection and Labeling in PySpark

The second spark application enriches the synchronized topic dataset (Figure 1 – #8), analyzing the detected lane points and the object detections. The goal is to perform a simple lane assignment algorithm for objects detected by the on-board ML models and to save this enriched dataset (Figure 1 – #9) back to S3 for easy-access by analysts and data scientists.

Object Lane Assignment Example

Figure 9 – Object Lane Assignment example

 

Synchronized topics enriched with object lane assignments

Figure 9 – Synchronized topics enriched with object lane assignments

Finally, the last step takes this enriched dataset (Figure 1 – #9) to summarize specific scenes or sequences where a person was identified as being in a lane. The output of this pipeline includes two new tables as parquet files on S3 – the synchronized topic dataset (Figure 1 – #8) and the synchronized topic dataset enriched with object lane assignments (Figure 1 – #9), as well as a DynamoDB table with scene metadata for all person-in-lane scenarios (Figure 1 – #10).

Scene Metadata

The Scene Metadata DynamoDB table (Figure 1 – #10) can be queried directly to find sequences of events, as will be covered in a follow up post for visually debugging scene detection algorithms using WebViz/RViz. Using WebViz, we were able to detect that the on-board object detection model labels Crosswalks and Walking Signs as “person” even when a person is not crossing the street, for example:

Example DynamoDB item from the Scene Metadata table

Example DynamoDB item from the Scene Metadata table

Figure 10 – Example DynamoDB item from the Scene Metadata table

These scene descriptions can also be converted to Open Scenario format and pushed to an ElasticSearch cluster to support more complex scenario-based searches. For example, downstream simulation use cases or for visualization in QuickSight. An example of syncing DynamoDB tables to ElasticSearch using DynamoDB streams and Lambda can be found here (https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/). As DynamoDB is a NoSQL data store, we can enrich the Scene Metadata table with scene parameters. For example, we can identify the maximum or minimum speed of the car during the identified event sequence, without worrying about breaking schema changes. It is also straightforward to save a dataframe from PySpark to DynamoDB using open-source libraries.

As a final note, the modules are built to be exactly that, modular. The three modules that are easily isolated are:

  1. the ECS Task pipeline for extracting ROS bag topic data to parquet files
  2. the EMR Trigger Lambda for tracking incoming files, creating batches, and initiating a batch processing step function
  3. the EMR Pipeline for running PySpark applications leveraging Step Functions and EMR Launch

Clean Up

To clean up the deployment, you can run bash deploy.sh destroy false. Some resources like S3 buckets and DynamoDB tables may have to be manually emptied and deleted via the console to be fully removed.

Limitations

The bagpy library used in this pipeline does not yet support complex or non-structured data types like images or LIDAR data. Therefore its usage is limited to data that can be stored in a tabular csv format before being converted to parquet.

Conclusion

In this post, we showed how to build an end-to-end Scene Detection pipeline at scale on AWS to perform scene analytics and scenario detection with Spark on EMR from raw vehicle sensor data. In a subsequent blog post, we will cover how how to extract and catalog images from ROS bag files, create a labelling job with SageMaker GroundTruth and then train a Machine Learning Model to detect cars.

Recommended Reading: Field Notes: Building an Autonomous Driving and ADAS Data Lake on AWS

The collective thoughts of the interwebz

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close