Tag Archives: Automotive

Breaking down data silos: Volkswagen’s approach with Amazon DataZone

2025-10-07 Bandana Das

Post Syndicated from Bandana Das original https://aws.amazon.com/blogs/big-data/breaking-down-data-silos-volkswagens-approach-with-amazon-datazone/

Over the years, organizations have invested in building purpose-built cloud-based data warehouses that are siloed from one another. One of the major challenges these organizations encounter today is enabling cross-organization discovery and access to data across these siloed data warehouses built using different technology stacks. The data mesh pattern addresses these issues, founded in four principles: domain-oriented decentralized data ownership and architecture, treating data as a product, providing self-serve data infrastructure as a platform, and implementing federated governance. The data mesh pattern helps organizations mimic their organizational structure into data domains and makes it possible to share the data across the organization and beyond to improve their business models.

In 2019, Volkswagen AG and Amazon Web Services (AWS) started their collaboration to co-develop the Digital Production Platform (DPP), with the goal of enhancing production and logistics efficiency by 30% while reducing production costs by the same margin. The DPP was developed to streamline access to data from shop floor devices and manufacturing systems by handling integrations and providing a range of standardized interfaces. However, as applications and use cases evolved on the platform, a significant challenge emerged: the ability to share data across applications stored in isolated data warehouses (within Amazon Redshift in isolated AWS accounts designated for specific use cases), without the need to consolidate data into a central data warehouse. Another challenge was discovering all the available data stored across multiple data warehouses and facilitating a workflow to request access to data across business domains within each plant. The common method used was largely manual, relying on emails and general communication (through tickets and emails). The manual approach not only increased the overhead but also varied from one use case to another in terms of data governance.

In this post, we introduce Amazon DataZone and explore how Volkswagen used Amazon DataZone to build their data mesh, tackle the challenges encountered, and break the data silos. A key aspect of the solution was enabling data providers to automatically publish their data products to Amazon DataZone, serving as a central data mesh for enhanced data discoverability. Additionally, we provide code to guide you through the deployment and implementation process.

Introduction to Amazon DataZone

Amazon DataZone is a data management service that makes it faster and straightforward to catalog, discover, share, and govern data stored across AWS, on-premises, and third-party sources. Key features of Amazon DataZone include the business data catalog, with which users can search for published data, request access, and start working on data in days instead of weeks. In addition, the service facilitates collaboration across teams and helps them manage and monitor data assets across different organizational units. The service also includes the Amazon DataZone portal, which offers a personalized analytics experience for data assets through a web-based application or API. Lastly, Amazon DataZone offers governed data sharing, which makes sure the right data is accessed by the right user for the right purpose with a governed workflow.

Solution overview

The following architecture diagram represents a high-level design that is built on top of the data mesh pattern. It separates source systems, data domain producers (data publishers), data domain subscribers (data consumers), and central governance to highlight the key aspects. This data mesh architecture is specially tailored for cross-AWS account usage. The objective of this approach is to create a foundation for building data governance on a scale, supporting the objectives of data producers and consumers with strong and consistent governance.

This architecture allows for the integration of multiple data warehouses into a centralized governance account that stores all the metadata from each environment.

A data domain producer uses Amazon Redshift as their analytical data warehouse to store, process, and manage structured and semi-structured data. The data domain producers load data into their respective Amazon Redshift clusters through extract, transform, and load (ETL) pipelines they manage, own, and operate. The producers maintain control over their data through Amazon Redshift security features, including column-level access controls and dynamic data masking, supporting data governance at the source. A data domain producer uses Amazon Redshift ETL and Amazon Redshift Spectrum to process and transform raw data into consumable data products. The data products could be Amazon Redshift tables, views, or materialized views.

Data domain producers expose datasets to the rest of the organization by registering them to Amazon DataZone service, which acts as a central data catalog. They can choose what data assets to share, for how long, and how consumers can interact with these. They’re also responsible for maintaining the data and making sure it’s accurate and current.

The data assets from the producers are then published using the data source run to Amazon DataZone in the central governance account. This process populates the technical metadata into the business data catalog for each data asset. The business metadata can be added by business users (data analysts) to provide business context, tags, and data classification for the datasets. This approach provides the required features to allow producers to create catalog entries with Amazon Redshift from all their data warehouses built in with Redshift clusters. In addition, the central data governance account is used to share datasets securely between producers and consumers. It’s important to note that sharing is done through metadata linking alone. No data (except logs) exists in the governance account. The data isn’t copied to the central account; just a reference to the data is used, so that the data ownership remains with the producer.

Amazon DataZone provides a streamlined way to search for data. The Amazon DataZone data portal provides a personalized view for users to discover and search data assets. An Amazon DataZone user (consumer) with permissions to access the data portal can search for assets and submit requests for subscription of data assets using a web-based application. An approver can then approve or reject the subscription request.

When a data domain consumer has access to an asset in the catalog, they can consume it (query and analyze) using the Amazon Redshift query editor. Each consumer runs their own workload based on their use case. In this way, the team can choose the tools for the job to perform analytics and machine learning activities in its AWS consumer environment.

Publishing and registering data assets to Amazon DataZone

To publish a data asset from the producer account, each asset must be registered in Amazon DataZone for consumer subscription. For more information, refer to Create and run an Amazon DataZone data source for Amazon Redshift. In the absence of an automated registration process, required tasks must be completed manually for each data asset.

Using the automated registration workflow, the manual steps can be automated for the Amazon Redshift data asset (Redshift table or view) that needs to be published in an Amazon DataZone domain or when there’s a schema change in an already published data asset.

The following architecture diagram represents how data assets from Amazon Redshift data warehouses have been automatically published to the data mesh created with Amazon DataZone.

The process consists of the following steps:

In the producer account (Account B), the data to be shared resides in a Redshift cluster.
The producer account (Account B) uses a mechanism to trigger the dataset registration AWS Lambda function with a specific payload containing the information and name of the database, schema, table, or view that has a change in metadata.
The Lambda function performs the steps to automatically register and publish the dataset in Amazon DataZone:
1. Get the Amazon Redshift clusterName, dbName, schemas, and tables from the JSON payload, which is used as the event to trigger the Lambda function.
2. Get the Amazon DataZone data warehouse blueprint ID.
3. Enable the blueprint in the data producer account.
4. Identify the Amazon DataZone Domain ID and project ID for the producer via assuming role in Amazon DataZone account (Account A).
5. Check if an environment already exists in the project. If not, create an environment.
6. Create a new Redshift data source by providing the correct Redshift database information in the newly created environment.
7. Initiate a data source run request in the data source to make the Redshift tables or views available in Amazon DataZone.
8. Publish the tables or views in the Amazon DataZone catalog.

Prerequisites

The following prerequisites are required before starting:

Two AWS accounts to implement the solution have been described in this post. However, you can also use Amazon DataZone to publish data within a single account or across multiple accounts.
- Amazon DataZone account (Account A) – This is the central data governance account, which will have the Amazon DataZone domain and project.
- Data domain producer account (Account B) – This account acts as the data domain producer. It has been added as an associated account to Account A.

Prerequisites in data domain producer account (Account B)

As part of this post, we want to publish assets and subscribe to assets from a Redshift cluster that already exists. Complete the following prerequisite steps to set up Account B:

Set up the Redshift cluster, including database, schema, tables, and views (optional). The node type must be from the RA3 family. For more information, see Amazon Redshift provisioned clusters.
Create a superuser in Amazon Redshift for Amazon DataZone. For the Redshift cluster, the database user you provide in AWS Secrets Manager must have superuser permissions. For reference please see the note section in this QuickStart guide with sample Amazon Redshift data
Store the user’s credentials in Secrets Manager. Select the credential type, enter the credential values, and choose the AWS Key Management Service (AWS KMS) key with which to encrypt the secret.
Add the tags to the Secret Manager secret to allow Amazon DataZone to find this secret and limit the access to a particular Amazon DataZone domain and Amazon DataZone project. The Redshift cluster Amazon Resource Name (ARN) must be added as a tag so it can be used by Amazon Redshift as a valid credential. For reference please see the note section in this QuickStart guide with sample Amazon Redshift data

Add an Amazon DataZone provisioning IAM role and Amazon Redshift manage access IAM role in the secret’s resource policy. The AWS Identity and Access Management (IAM) roles are created as part of the AWS Cloud Development Kit (AWS CDK) deployment (discussed later in this post). The following code shows an example of the Secrets Manager secret’s resource policy. Store the secret ARN in an AWS Systems Manager parameter.

{
  "Version" : "2012-10-17",
  "Statement" : [ {
    "Effect" : "Allow",
    "Principal" : "*",
    "Action" : "secretsmanager:GetSecretValue",
    "Resource" : "*",
    "Condition" : {
      "ArnEquals" : {
        "aws:PrincipalArn" : [ 
          "arn:aws:iam::<<Data_Producer_Acct_Id(Account B)>>:role/DzRedshiftAccess-<<AWS_Region>>-<< Amazon_DataZone _Domain_Name>>",
          "arn:aws:iam::<<Data_Producer_Acct_Id(Account B)>>:role/DataZoneProvisioning-<< Amazon_DataZone_Account_id(Account A)>>"
        ]
      }
    }
  } ]
}

If your secret is encrypted with a custom KMS key, append the key policy with the following statement and add a tag to the key: AmazonDatazoneEnvironment = All. You can skip this step if you’re using an AWS managed KMS key.

{
    "Effect": "Allow",
    "Principal": {
        "Service": "logs.<<AWS_Region>>.amazonaws.com",
        "AWS": "arn:aws:iam::<<Data_Producer_Acct_Id(Account B)>>:root"
    },
    "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey*",
        "kms:ReEncrypt*"
    ],
    "Resource": "*"
 },
 {
    "Sid": "AllowDatazoneRoles-DEV",
    "Effect": "Allow",
    "Principal": {
        "AWS": "*"
    },
    "Action": [
        "kms:Decrypt",
        "kms:Describe*",
        "kms:Get*",
        "kms:Encrypt",
        "kms:GenerateDataKey",
        "kms:ReEncrypt*",
        "kms:CreateGrant"
    ],
    "Resource": "*",
    "Condition": {
        "StringLike": {
            "aws:PrincipalArn": [
                "arn:aws:iam::<<Data_Producer_Acct_Id(Account B)>>:role/aws-service-role/redshift.amazonaws.com/AWSServiceRoleForRedshift",
                "arn:aws:iam::<<Data_Producer_Acct_Id(Account B)>>:role/datazone_*",
                "arn:aws:iam::<<Data_Producer_Acct_Id(Account B)>>:role/<<Redshift_Cluster_IAM_Role>>",
                "arn:aws:iam::<<Data_Producer_Acct_Id(Account B)>>:role/service-role/AmazonDataZoneRedshiftAccess-<<AWS_Region>>-*"
            ]
         }
     }
 }

Place a mechanism to generate the following payload to trigger the dataset registration Lambda function. The payload must contain the relevant Redshift database, schema, and table or view that you want to publish in the Amazon DataZone domain. The following example code assumes you have three databases in your Redshift cluster and within those databases you have different schemas, tables, and views. You should adjust the payload based on your use case.

{
    "source": "redshift-user-initiated",
    "detail-type": "Amazon Redshift dataset registration in Amazon DataZone",
    "datasets": [
        {
            "clusterName": "<<YOUR_REDSHIFT_CLUSTER_NAME>>",
            "dbName":"<<YOUR_REDSHIFT_DATABASE_NAME_1>>",
            "schemas": [
                {
                    "schemaName":"<<YOUR_REDSHIFT_SCHEMA_NAME>>",
                    "addAllTables":false,
                    "addAllViews":false,
                    "tables":[
                        "<<YOUR_REDSHIFT_TABLE_NAME>>",
                        "<<YOUR_REDSHIFT_TABLE_NAME>>"
                    ],
                    "views":[
                        "<<YOUR_REDSHIFT_VIEW_NAME>>"
                    ]
                }
            ]
        },
        {
            "clusterName": "<<YOUR_REDSHIFT_CLUSTER_NAME>>",
            "dbName":"<<YOUR_REDSHIFT_DATABASE_NAME_2>>",
            "schemas": [
                {
                    "schemaName":"<<YOUR_REDSHIFT_SCHEMA_NAME>>",
                    "addAllTables":true,
                    "addAllViews":true,
                    "tables":[],
                    "views":[]
                }
            ]
        },
        {
            "clusterName": "<<YOUR_REDSHIFT_CLUSTER_NAME>>",
            "dbName":"<<YOUR_REDSHIFT_DATABASE_NAME_3>>",
            "schemas": [
                {
                    "schemaName":"<<YOUR_REDSHIFT_SCHEMA_NAME>>",
                    "addAllTables":true,
                    "addAllViews":false,
                    "tables":[],
                    "views":[
                        "<<YOUR_REDSHIFT_VIEW_NAME>>"
                    ]
                }
            ]
        }
    ]
}

Prerequisites in Amazon DataZone account (Account A)

Complete the following steps to set up your Amazon DataZone account (Account A):

Sign in to Account A and make sure you have already deployed an Amazon DataZone domain and a project within that domain. Refer to Create Amazon DataZone domains for instructions to create a domain.
If your Amazon DataZone domain is encrypted with a KMS key, add the data domain account (Account B) to the KMS key policy with the following actions:
```
"Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
]
```

Create an IAM role that is assumable by Account B and make sure the role has a following policy attached and is a member (as contributor) of your Amazon DataZone project. For this post, we call the role dz-assumable-env-dataset-registration-role. By adding this role, you can successfully run the registration Lambda function.

In the following policy, provide the AWS Region and account ID corresponding to where your Amazon DataZone domain is created, and the KMS key ARN used to encrypt the domain:

  {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "datazone:CreateDataSource",
                "datazone:CreateEnvironment",
                "datazone:CreateEnvironmentProfile",
                "datazone:GetDataSource",
                "datazone:GetDataSourceRun",
                "datazone:GetEnvironment",
                "datazone:GetEnvironmentProfile",
                "datazone:GetIamPortalLoginUrl",
                "datazone:ListDataSources",
                "datazone:ListDomains",
                "datazone:ListEnvironmentProfiles",
                "datazone:ListEnvironments",
                "datazone:ListProjectMemberships",
                "datazone:ListProjects",
                "datazone:StartDataSourceRun",
                "datazone:UpdateDataSource",
                "datazone:SearchUserProfiles"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "kms:Decrypt",
                "kms:DescribeKey",
                "kms:GenerateDataKey"
            ],
            "Resource": "arn:aws:kms:<<account_region>>:<<Datazone_Account_id(Account A)>>


}:key/${DataZonekmsKey}",
            "Effect": "Allow"
        }
    ]
}

Add Account B in the trust relationship of this role with the following trust relationship:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::<<Data_Producer_Acct_Id(Account B)>>:root",
                    "arn:aws:iam::<<Datazone_Account_id(Account A)>>:root",
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Add the role as a member of the Amazon DataZone project in which you want to register your data sources. For more information, see Add members to a project.

Additional tools

The following tools are needed to deploy the solution using the AWS CDK:

Either Bash/ or ZSH terminal
Node and NPM using Node Version Manager:
- Install Node Version Manager (NVM)
- Install Node version 18.12.0 using the following command (node and npm binaries should now be available):
```
$ nvm install 18.12.0
```
Python
AWS Command Line Interface (AWS CLI); see Installing or updating to the latest version of the AWS CLI
AWS CDK for Python (Boto3); see Installation
AWS CDK

Deploy the solution

After you complete the prerequisites, use the AWS CDK stack provided on the GitHub repo to deploy the solution for automatic registration of data assets into the Amazon DataZone domain. Complete the following steps:

Clone the repository from GitHub to your preferred integrated development environment (IDE) using the following commands:

git clone https://github.com/aws-samples/sample-how-to-automate-amazon-redshift-cluster-data-asset-publish-to-amazon-datazone

$ cd sample-how-to-automate-amazon-redshift-cluster-data-asset-publish-to-amazon-datazone

At the base of the repository folder, run the following commands to build and deploy resources to AWS:
```
$ npm install
```
```
$ npm run lint
```
Sign in to Account B (the data domain producer account) using the AWS CLI with your profile name.
Make sure you have configured the Region in your credential’s configuration file.
Bootstrap the AWS CDK environment with the following commands at the base of the repository folder. Provide the profile name of your deployment account (Account B). Bootstrapping is a one-time activity and is not needed if your AWS account is already bootstrapped.
```
$ export AWS_PROFILE=<<PROFILE_NAME>>
```
```
$ npm run cdk bootstrap
```
Replace the placeholder parameters (marked with the suffix _PLACEHOLDER) in the file config/DataZoneConfig.ts:
1. Amazon DataZone domain and project name of your Amazon DataZone instance. Make sure all names are in lowercase.
2. The AWS account ID of the Amazon DataZone account (Account A).
3. The assumable IAM role from the prerequisites.
4. The AWS Systems Manager parameter name containing the Secrets Manager secret ARN of the Amazon Redshift credentials.
Use the following command in the base folder to deploy the AWS CDK solution. During deployment, enter y if you want to deploy the changes for some stacks when you see the prompt Do you wish to deploy these changes (y/n)?
```
npm run cdk deploy --all
```
After the deployment is complete, sign in to Account B and open the AWS CloudFormation console to verify that the infrastructure was deployed.

Test automatic data registration to Amazon DataZone

Complete the following steps to test the solution:

Sign in to Account B (producer account).
On the Lambda console, open the datazone-redshift-dataset-registration function.
Under TEST EVENTS, choose Create new test event.

For Event name, enter Redshift, and for Event JSON, enter the following JSON structure (change the cluster, schema, database, and table names according to your environment):

{
  "source": "redshift-user-initiated",
  "detail-type": "Amazon Redshift dataset registration in Amazon DataZone",
  "datasets": [
    {
      "clusterName": "YOUR_REDSHIFT_CLUSTER_NAME",
      "dbName": "DATABASE_NAME",
      "schemas": [
        {
          "schemaName": "SCHEMA_NAME_1",
          "addAllTables": false,
          "addAllViews": false,
          "tables": [
            "TABLE_NAME"
          ],
          "views": []
        },
        {
          "schemaName": "SCHEMA_NAME_2",
          "addAllTables": false,
          "addAllViews": false,
          "tables": [],
          "views": [
            "VIEW_NAME"
          ]
        }
      ]
    }
  ]
}

Choose Save.
Choose Invoke.
Open the Amazon DataZone console in Account A where you deployed the resources.
Choose Domains in the navigation pane, then open your domain.
On the domain details page, locate the Amazon DataZone data portal URL in the Summary section. Choose the link to the data portal.
For more details about accessing Amazon DataZone, refer to How can I access Amazon DataZone?
In the data portal, open your project and choose the Data tab.
In the navigation pane, choose Data sources and find the newly created data source for Amazon Redshift.
Verify that the data source has been successfully published.

After the data sources are published, users can discover the published data and submit a subscription request. The data producer can approve or reject requests. Upon approval, users can consume the data by querying the data in the Amazon Redshift query editor. The following screenshot illustrates data discovery in the Amazon DataZone data portal.

Clean up

Complete the following steps to clean up the resources deployed through the AWS CDK:

Sign in to Account B, go to the Amazon DataZone domain portal, and check there is no subscription for your published data asset. If there is a subscription, either ask the subscriber to unsubscribe or revoke the subscription request.
Delete the published data assets that were created in the Amazon DataZone project by the dataset registration Lambda function.
Delete the remaining resources created using the following command in the base folder:
```
npm run cdk destroy –all
```

Conclusion

Amazon DataZone offers a seamless integration with AWS services, providing a powerful solution for organizations like Volkswagen to break down their data silos and implement effective data mesh architectures through a straightforward implementation highlighted in this post. By using Amazon DataZone, Volkswagen addressed its immediate data sharing hurdles and laid the groundwork for a more agile, data-driven future in automotive manufacturing. The automated data publishing from various warehouses, coupled with standardized governance workflows, has significantly reduced the manual overhead that once slowed down Volkswagen’s data engineering teams. Now, instead of navigating a labyrinth of emails, tickets, and communication, Volkswagen’s data engineers and data scientists can quickly discover and access the data they need, all while maintaining their security and compliance standards.

By using Amazon DataZone, organizations can bring their isolated data together in ways that make it simpler for teams to collaborate while maintaining security and compliance at scale. This approach not only addresses current data governance challenges but also creates a highly scalable foundation for future data-driven innovations. For guidance on establishing your organization’s data mesh with Amazon DataZone, contact your AWS team today.

About the Authors

How Stellantis streamlines floating license management with serverless orchestration on AWS

2025-06-12 Göksel SARIKAYA

Post Syndicated from Göksel SARIKAYA original https://aws.amazon.com/blogs/architecture/how-stellantis-streamlines-floating-license-management-with-serverless-orchestration-on-aws/

This post is written by Goeksel Sarikaya, Senior Delivery Consultant at AWS, and Milosz Stawarski, Senior Software Architect at Stellantis.

Software licensing is a critical aspect of many organizations’ operations, with various models available to suit different needs. Two common types are named user licenses, which are assigned to specific individuals, and floating licenses, which can be shared among a pool of users. Some independent software vendors (ISVs) offer both options, whereas others might have limitations, particularly in cloud environments.

In this post, we explore a unique scenario where an ISV, unable to provide a floating license option for cloud usage, worked with Stellantis to develop an alternative solution. This approach, implemented with the ISV’s permission, treats named user licenses as if they were floating, automatically assigning and removing them based on the state of user workbench instances.

This solution is not intended to circumvent licensing terms or reduce costs at the expense of ISVs. Rather, it’s a collaborative approach to address specific customer needs when traditional floating licenses aren’t available. We will demonstrate how the solution uses serverless AWS services like Amazon EventBridge, AWS Lambda, Amazon DynamoDB, and AWS Systems Manager, keeping in mind that any similar implementation should only be pursued with explicit permission from the software vendor.

Overview of Stellantis

Stellantis N.V., born from the merger of FCA and PSA Group, leads the change towards software defined vehicles (SDV). As part of this transformation, AWS and Stellantis created the Virtual Engineering Workbench (VEW), a modular framework to develop, integrate, and test vehicle software in the cloud, ultimately connecting their vehicles to the cloud.

The VEW provides predefined environments tailored to specific use cases. These environments come fully equipped with the tools, integrated development environments (IDEs), and licensing necessary for developers to jumpstart their projects.

For more details on VEW, refer to Stellantis’ SDV transformation with the Virtual Engineering Workbench on AWS.

Overview of solution

As the number of developers and projects grew, Stellantis faced a challenge in managing the limited number of named user licenses for their software tools. The manual process of assigning and revoking licenses became increasingly time-consuming and inefficient, potentially hindering the agility and productivity of their development teams.

Stellantis and AWS tackled this challenge head-on by collaborating on an innovative, dynamic license management solution using AWS serverless services. This solution transforms the traditional named user license model into a more flexible floating license system, automatically assigning and revoking licenses based on the state of user workbench instances. The licenses and solution discussed in this post pertain solely to the use of standalone software tools such as those used in automotive domains. These do not involve sharing of user data or content when licenses are reused.

Before we dive into the detailed workflow of the solution, let’s examine the high-level architecture. The following diagram illustrates how various AWS services work together to create this efficient license management system.

Architecture

This architecture uses key AWS services such as EventBridge, Lambda, DynamoDB, and Systems Manager to create a scalable, serverless solution that significantly reduces administrative overhead and optimizes license utilization.

In the following sections, we explore each component of this architecture in detail, explaining how they interact to provide a seamless license management experience for Stellantis’ VEW.

In workbench accounts (user accounts)

The design is serverless and based on an event-driven approach. The workflow in the user accounts is as follows:

Workbench instances are Amazon Elastic Compute Cloud (Amazon EC2). Their start and stop automatically sends AWS events.
An EventBridge rule invokes a Lambda function when such an event occurs. This function checks the tags on the EC2 instance to distinguish workbenches from other EC2 instances. Two tags are important for identifying workbench instances: vew:workbench:ownerId and vew:workbench:type.
The Lambda function creates a custom event with the following data: user-id, workbench-type, workbench-state, and instance-id, and sends this event to the default event bus.
An EventBridge rule forwards the custom event to a custom event bus in the license server account.

In license server account

The following steps take place in the license server account:

An EventBridge rule invokes a Lambda
This function interacts with a DynamoDB table that stores a mapping of licensed products to users. The function does the following:
1. Deduces the licensed products present in the workbench from the workbench type.
2. For each licensed product, it verifies if the combination of product and user is already present in the DynamoDB
3. If the workbench is starting:
  1. If the combination is already present, it increases the count of workbenches in the table for this item by 1.
  2. If the combination is not present, it creates a new item in the table (product, user-id, workbench-count, timestamp).
4. If the workbench is stopping, it decreases the count of workbenches in the table for this item by 1. If the count becomes 0, the item is deleted.
Any update to the DynamoDB table triggers another Lambda
If the change in the table is a creation of a new entry or deletion of an entry, this function writes the current timestamp to a Systems Manager parameter in both cases. This is so that if no changes are detected in the database, we don’t unnecessarily run the xLC (License Client for related product) caller function.
Another Lambda function is invoked every minute. It compares the timestamp written in the Systems Manager parameter indicating a DynamoDB item creation or deletion with the last time the function called the xLC CLI to assign users to a license.
If the DynamoDB timestamp is earlier, the function stops. If the DynamoDB timestamp is later, the function queries the table for obtaining the user-id for each product.
To maintain a comprehensive record of license assignment operations, you can enable data plane events for DynamoDB in AWS CloudTrail.
For each licensed product, the function uses Run Command, a capability of Systems Manager, to invoke the xLC CLI API on the license server to assign named users to a license for a product. The function provides the list of users assigned to the product to the API. This updates the named user list on the license server—the list is completely overwritten, which includes adding new user IDs and removing ones that are no longer needed.

Benefits and key features

The solution offers the following benefits:

Automated license assignment and removal – Users are automatically assigned licenses when their workbench instances start, and licenses are returned to the pool when instances stop, providing efficient license utilization.
Scalable and serverless architecture – The solution is built on serverless AWS services, allowing it to scale seamlessly as the number of users and workbench instances grows, without the need for provisioning or managing servers.
Centralized license management – The license server account acts as a central hub for managing licenses across multiple workbench accounts, simplifying administration and providing a unified view of license usage.
Reduced administrative overhead – By automating the license assignment and removal process, the solution can significantly reduce the administrative burden associated with manual license management.
Optimized license utilization – Licenses are assigned only when needed and returned to the pool when no longer required, maximizing license availability and minimizing idle licenses.
Monitoring and metrics – The solution provides monitoring capabilities and license usage metrics, enabling better visibility and informed decision-making regarding license procurement and allocation.

Conclusion

By implementing this serverless solution, it is possible to transform a manual named user license management systems to an automated floating license system for software tools. The event-driven architecture and serverless components provide efficient and scalable license assignment and removal based on the workbench instance state.

This solution has streamlined the license management process, reducing administrative overhead and optimizing license utilization. It is now possible to provision software tools more efficiently, improving productivity and resource allocation across the organization. Additionally, the centralized license management and monitoring capabilities provide better visibility and control over license usage, enabling informed decision-making and cost optimization.

Overall, this AWS based floating license solution has empowered organizations to use software tools more effectively, while minimizing the operational burden associated with license management. For more serverless learning resources, visit Serverless Land.

About the authors

How Volkswagen Autoeuropa built a data solution with a robust governance framework, simplifying access to quality data using Amazon DataZone

2024-11-13 Dhrubajyoti Mukherjee

Post Syndicated from Dhrubajyoti Mukherjee original https://aws.amazon.com/blogs/big-data/how-volkswagen-autoeuropa-built-a-data-solution-with-a-robust-governance-framework-simplifying-access-to-quality-data-using-amazon-datazone/

This is a joint post co-authored with Martin Mikoleizig from Volkswagen Autoeuropa.

This second post of a two-part series that details how Volkswagen Autoeuropa, a Volkswagen Group plant, together with AWS, built a data solution with a robust governance framework using Amazon DataZone to become a data-driven factory. Part 1 of this series focused on the customer challenges, overall solution architecture and solution features, and how they helped Volkswagen Autoeuropa overcome their challenges. This post dives into the technical details, highlighting the robust data governance framework that enables ease of access to quality data using Amazon DataZone.

At Amazon, we work backward, a systematic way to vet ideas and create new products. The key tenet of this approach is to start by defining the customer experience, then iteratively work backward from that point until the team achieves clarity of thought around what to build. The first section of this post discusses how we aligned the technical design of the data solution with the data strategy of Volkswagen Autoeuropa. Next, we detail the governance guardrails of the Volkswagen Autoeuropa data solution. Finally, we highlight the key business outcomes.

Aligning the solution with the data strategy

At an early stage of the project, the Volkswagen Autoeuropa and AWS team identified that a data mesh architecture for the data solution aligns with the Volkswagen Autoeuropa’s vision of becoming a data-driven factory. With this in mind, the team implemented the following steps:

Define data domains – In a workshop, the team identified the data landscape and its distribution in Volkswagen Autoeuropa. Next, the team grouped the data assets of the organization along the lines of business and defined the data domains. Because Volkswagen Autoeuropa is at an early stage of their data mesh journey, defining data domains along the lines of business is the recommended approach. As the data solution evolves, Volkswagen Autoeuropa might consider other criteria such as business subdomains to define data domains. The team defined more than five data domains, such as production, quality, logistics, planning, and finance.
Identify pioneer cases – The team identified the pioneer use cases that onboard the data solution first, to validate its business value. The team identified two use cases. The first use case helps predict test results during the car assembly process. The second use case enables the creation of reports containing shop floor key metrics for different management levels. The following criteria were considered to identify these use cases:
- Use cases that deliver measurable business value for Volkswagen Autoeuropa.
- Use cases with high AWS maturity.
- Use cases whose requirements can be met with the first release version of the data solution.
Onboard key data products – The team identified the key data products that enabled these two use cases and aligned to onboard them into the data solution. These data products belonged to data domains such as production, finance, and logistics. In addition, the team aligned on business metadata attributes that would help with data discovery. The data products are classified as either source-based data or consumer-based data. Source-based data is the unaltered, raw data that is generated from source systems (for example, quality data, safety data) and is useful for other business use cases. Consumer-based data is the aggregated and transformed data from source systems. Reuse of consumer-based data saves cost in extract, transform, and load (ETL) implementation and system maintenance.

In addition to the preceding steps, the team established a data quality framework to improve the quality of the data product registered in the data solution. The following table shows the mapping of the data mesh-based solution components to Amazon DataZone and AWS Glue features. The table also provides generic examples of the components in the automotive industry.

Data Solution Components	AWS Service Features	Generic Examples
Data domains	Amazon DataZone projects and Amazon DataZone domain units	Production, logistics
Use cases	Amazon DataZone projects	Smart manufacturing, predictive maintenance
Data products	Amazon DataZone assets	Sales data, sensor data
Business metadata	Amazon DataZone glossaries and metadata forms	Data product owner information, data refresh frequency
Data quality framework	AWS Glue Data Quality	A quality score of 92%

Empowering teams with a governance framework

This section discusses the governance framework that was put in place to empower the teams at Volkswagen Autoeuropa by enhancing their analytics journey. It highlights the guardrails that enable ease of access to quality data.

Business metadata

Business metadata helps users understand the context of the data, which can lead to increased trust in the data. Moreover, establishing a common set of attributes of the data products promotes a consistent experience for the users. In addition to the business context, at Volkswagen Autoeuropa, the metadata includes information related to data classification and if the data contains personally identifiable information (PII). The data solution uses Amazon DataZone glossaries and metadata forms to provide business context to their data. Apart from the previous benefits, using the appropriate keywords in Amazon DataZone glossary terms and metadata forms can help with the search and filtering capability of data products in the Amazon DataZone data portal.

Data quality framework

The data quality framework is a comprehensive solution designed to streamline the process of data quality checks and publishing a quality score. It uses AWS Glue Data Quality to generate recommendation rulesets, run orchestrated jobs, store results, and send notifications. This framework can be seamlessly integrated into an AWS Glue job, providing a quality score for data pipeline jobs. The quality score of a data product is published in the Amazon DataZone data portal for consumers to evaluate. The key components of the solution are as follows:

Recommendation ruleset generation – The framework generates tailored rulesets based on metadata from the AWS Glue Data Catalog table, providing relevant and comprehensive quality checks.
Orchestrated job execution – Jobs are run in AWS Step Functions to perform data quality checks using the generated rulesets against data sources, evaluating data quality based on defined rules and criteria.
Result storage and notification – Results, including quality scores, quality status, and rulesets checked, are stored in an Amazon Simple Storage Service (Amazon S3) bucket, maintaining a historical record. End-users receive notifications with relevant details.
Data quality score publishing – The quality scores are published in the Amazon DataZone data portal, enabling consumers to access and evaluate data quality.
Subscription and quality score requirements – Consumers can subscribe to data sources or targets based on their desired quality score thresholds, making sure they receive data that meets their specific needs and standards.
Integration and extensibility – The framework is designed for seamless integration into existing AWS Glue jobs or data pipelines and provides a flexible and extensible architecture for customization and enhancement.

Federated governance

Federated governance empowers producer and consumer teams to operate independently while adhering to a central governance model. For the data solution at Volkswagen Autoeuropa, this meant a centralized team defined the governance guardrails and decentralized data teams employed those guardrails. The following are a few examples of how the team established federated governance in Volkswagen Autoeuropa:

Management of Amazon DataZone glossaries and metadata forms – In this mechanism, the Volkswagen Autoeuropa IT team defined the Amazon DataZone glossaries and metadata forms in a central manner. The data teams used them to publish the data assets in the Amazon DataZone. This provides consistency of business metadata across the organization. The following figure explains the process.
The workflow in the Amazon DataZone data portal consists of the following steps:

1. The data solution administrator belonging to the Volkswagen Autoeuropa IT team aligns with stakeholders such as data producers, data consumers, and source system owners, and maintains the business metadata using the Amazon DataZone glossaries and metadata forms.
2. The producer project teams use the Amazon DataZone glossary terms and fill the Amazon DataZone metadata forms to enrich the inventory assets.
3. After the business metadata is populated, the team publishes the assets in the Amazon DataZone data portal.

Management of Amazon DataZone project membership – In this scenario, the management of Amazon DataZone project membership is delegated to a designated administrator of the project. The following figure explains the process.
The workflow consists of the following steps:

1. The data solution administrator belonging to the Volkswagen Autoeuropa IT team provisions the Amazon DataZone project and environment using automation. The data solution administrator is the owner of the project.
2. The data solution administrator delegates the management of the Amazon DataZone project membership to a designated administrator by assigning the owner role.
3. The Amazon DataZone project administrator assigns the contributor role to eligible users.
4. The users access the Amazon DataZone project and its assets from the Amazon DataZone data portal.

Authentication and authorization

The Amazon DataZone portal supports two types of authorizations: AWS Identity and Access Management (IAM) roles and AWS IAM Identity Center users. The data solution supports both of these authorization methods. The choice of authentication mechanism is a function of the type of authorization used for Amazon DataZone.

For IAM role authorization, an IAM role is created for each user, incorporating a prefix. Each data solution user role has a permission to list the Amazon DataZone domains (datazone:ListDomains) and to get the data portal login URL (datazone:GetIamPortalLoginUrl) in the Amazon DataZone AWS account. For reasons that are out of scope for this post, there could only be three SAML federated roles in an AWS account in the customer environment. As such, the team didn’t have a dedicated SAML federated role for each Amazon DataZone user. The data solution user role implemented a trust policy allowing the user’s AWS Security Token Service (AWS STS) federated user session principal Amazon Resource Name (ARN). If you don’t have limitations on the number of SAML federated roles per AWS account, you can make all data solution user roles SAML federated roles and update the trust policy accordingly.

For IAM Identity Center authorization, the configuration is done either at the AWS Organizations level or AWS account level in IAM Identity Center. Because there are currently no APIs available for identity source configuration in IAM Identity Center, the team followed the appropriate instructions to configure the identity source on the AWS Management Console.

After the chosen authorization option is activated, Amazon DataZone administrators grant the IAM principals (IAM role or IAM Identity Center user) access to the Amazon DataZone portal. For more details, refer to Manage users in the Amazon DataZone console.

Business outcomes

Volkswagen Autoeuropa and AWS established an iterative mechanism to enable the continuous growth of the data solution. This iterative improvement is expressed as a flywheel as shown in the following figure.

The outcome of each component of the flywheel powers the next component, creating a virtuous cycle. The data solution flywheel consists of five components:

Data solution growth – The primary focus of the flywheel is to accelerate the growth of the data solution. This growth is measured by metrics such as number of data products, number of use cases onboarded into the solution, and number of users.
Enhancing user experience – This component focuses on enhancing the user experience of the data solution. One way to measure the user experience is through user feedback surveys.
Data solution use cases – Improved, positive user experience with the data solution contributes to the increased number of use cases that want to onboard the data solution.
Data producers and consumers – As the number of use cases increases, so does the number of data producers and consumers. Data producers make data available to power the use cases. Data consumers use the data to drive the use cases.
Selection of data products – After data producers onboard the data solution, they publish the assets in the Amazon DataZone data portal. This leads to a larger selection of data products. This, in turn, creates a positive experience for the data solution users.

In addition to the previous components, the positive user experience is reinforced by improving governance guardrails, increasing number of reusable assets, and maximizing operational excellence.

As of writing this post, Volkswagen Autoeuropa reduced the time to discover data from days to minutes using the data solution. This led to approximately 384 times improvement in data discovery time. Data access took several weeks before the Volkswagen Autoeuropa and AWS collaboration. With the help of the data solution powered by Amazon DataZone, the data access time was reduced to minutes. Overall, the data solution resulted in regaining between 48 hours and weeks of customer productivity over the course of a month.

The data solution powered by Amazon DataZone is driving measurable business impact for Volkswagen Autoeuropa. It enables Volkswagen Autoeuropa to deliver digital use cases faster, with less effort, and a higher overall quality. Volkswagen Autoeuropa believes that Amazon DataZone will be key in their journey to become a data-driven factory and to leverage AI.

Conclusion

This post explored how Volkswagen Autoeuropa built a robust and scalable data solution using Amazon DataZone. The first step was to align the solution with Volkswagen Autoeuropa’s overarching data strategy to drive business value.

The establishment of a comprehensive governance framework was central to this effort. This framework encompasses key components, such as business metadata, data quality, federated governance, access controls, and security, which maintain the trustworthiness and reliability of Volkswagen Autoeuropa’s data assets. The post highlighted the Volkswagen Autoeuropa data solution flywheel, showcasing how the solution enabled improved decision-making, increased operational efficiency, and accelerated digital transformation initiatives across the organization.

The data solution built at Volkswagen Autoeuropa is one of the first implementations within the Volkswagen Group and is a blueprint for other Volkswagen production plants.

“This project is a blueprint for other Volkswagen production plants. By involving the AWS team and using Amazon DataZone, we are able to govern our data centrally and make it accessible in an automated and secure way.”

– Daniel Madrid, Head of IT, Volkswagen Autoeuropa.

If you’re looking to harness the power of data mesh to drive innovation and business value within your organization, we’ve got you covered. In Strategies for building a data mesh-based enterprise solution on AWS, we dive deep into the key considerations and current recommendations to establish a robust, scalable, and well-governed data mesh on AWS. This documentation covers everything from aligning your data mesh with overall business strategy to implementing the data mesh strategy framework.

To get hands-on experience with real-world code examples, see our GitHub repository. This open source project provides a step-by-step blueprint for constructing a data mesh architecture using the powerful capabilities of Amazon DataZone, AWS Cloud Development Kit (AWS CDK), and AWS CloudFormation.

About the Authors

BDB-4558-Dhruba Dhrubajyoti Mukherjee is a Cloud Infrastructure Architect with a strong focus on data strategy, data analytics, and data governance at AWS. He uses his deep expertise to provide guidance to global enterprise customers across industries, helping them build scalable and secure AWS solutions that drive meaningful business outcomes. Dhrubajyoti is passionate about creating innovative, customer-centric solutions that enable digital transformation, business agility, and performance improvement. An active contributor to the AWS community, Dhrubajyoti authors AWS Prescriptive Guidance publications, blog posts, and open source artifacts, sharing his insights and best practices with the broader community. Outside of work, Dhrubajyoti enjoys spending quality time with his family and exploring nature through his love of hiking mountains.

BDB-4558-Ravi Ravi Kumar is a Data Architect and Analytics expert at AWS, where he finds immense fulfilment in working with data. His days are dedicated to designing and analyzing complex data systems, uncovering valuable insights that drive business decisions. Outside of work, he unwinds by listening to music and watching movies, activities that allow him to recharge after a long day of data wrangling.

Martin Mikoleizig studied mechanical engineering and production technology at the RWTH Aachen University before starting to work in Dr. h.c. Ing. F. Porsche AG 2015 as a production planner for the engine assembly. Over several years as a Project Manager on Testing Technology for new engine models, he also introduced several innovations like human-machine collaborations and intelligent assistance systems. Starting in 2017, he was responsible for the shop floor IT team of the module lines in Zuffenhausen before he became responsible for the planning of the E-Drive assembly at Porsche. Additionally, he was responsible for the Digitalisation Strategy of the Production Ressort at Porsche. In October 2022, he was assigned to Volkswagen Autoeuropa in Portugal in the role of a Digital Transformation Manager for the plant, driving the digital transformation towards a data-driven factory.

BDB-4558-Wei Weizhou Sun is a Lead Architect at AWS, specializing in digital manufacturing solutions and IoT. With extensive experience in Europe, she has enhanced operational efficiencies, reducing latency and increasing throughput. Weizhou’s expertise includes industrial computer vision, predictive maintenance, and predictive quality, consistently delivering top performance and client satisfaction. A recognized thought leader in IoT and remote driving, she has contributed to business growth through innovations and open source work. Committed to knowledge sharing, Weizhou mentors colleagues and contributes to practice development. Known for her problem-solving skills and customer focus, she delivers solutions that exceed expectations. In her free time, Weizhou explores new technologies and fosters a collaborative culture.

BDB-4558-Ajinkya Ajinkya Patil is a Senior Security Architect with AWS Professional Services, specializing in security consulting for customers in the automotive industry. Since joining AWS in 2019, he has played a key role in helping automotive companies design and implement robust security solutions on AWS. Ajinkya is an active contributor to the AWS community, having presented at AWS re:Inforce and authored articles for the AWS Security Blog and AWS Prescriptive Guidance. Outside of his professional pursuits, Ajinkya is passionate about travel and photography, often capturing the diverse landscapes he encounters on his journeys.

BDB-4558-Adjoa Adjoa Taylor has over 20 years of experience in industrial manufacturing, providing industry and technology consulting services, digital transformation, and solution delivery. Currently, Adjoa leads Product Centric Digital Transformation, enabling customers in solving complex manufacturing problems using smart factory and industry-leading transformation mechanisms. Most recently, she drives value with AI/ML and generative AI use cases for the plant floor. Adjoa is an experienced leader, having spent over 20 years of her career delivering projects in countries throughout North America, Latin America, Europe, and Asia. Adjoa brings deep experience across multiple business segments with a focus on business outcome-driven solutions. Adjoa is passionate about helping customers solve problems while realizing the art of the possible through implementing value-based solutions.

How Volkswagen Autoeuropa built a data mesh to accelerate digital transformation using Amazon DataZone

2024-10-31 Dhrubajyoti Mukherjee

Post Syndicated from Dhrubajyoti Mukherjee original https://aws.amazon.com/blogs/big-data/how-volkswagen-autoeuropa-built-a-data-mesh-to-accelerate-digital-transformation-using-amazon-datazone/

This is a joint blog post co-authored with Martin Mikoleizig from Volkswagen Autoeuropa.

Volkswagen Autoeuropa is a Volkswagen Group plant that produces the T-Roc. The plant is located near Lisbon, Portugal and produces about 934 cars per day. In 2023, Volkswagen Autoeuropa represented 1.3% of the national GDP of Portugal and 4% in national export of goods impact with a sales volume of 3.3511 billion Euros. Volkswagen Autoeuropa aims to become a data-driven factory and has been using cutting-edge technologies to enhance digitalization efforts.

In this post, we discuss how Volkswagen Autoeuropa used Amazon DataZone to build a data marketplace based on data mesh architecture to accelerate their digital transformation. The data mesh, built on Amazon DataZone, simplified data access, improved data quality, and established governance at scale to power analytics, reporting, AI, and machine learning (ML) use cases. As a result, the data solution offers benefits such as faster access to data, expeditious decision making, accelerated time to value for use cases, and enhanced data governance.

Understanding Volkswagen Autoeuropa’s challenges

At the time of writing this post, Volkswagen Autoeuropa has already implemented more than 15 successful digital use cases in the context of real-time visualization, business intelligence, industrial computer vision, and AI.

Before the AWS partnership, Volkswagen Autoeuropa faced the following challenges.

Long lead time to access data – The digital use cases launched by Volkswagen Autoeuropa spent most of their project time getting access to the data that was relevant to their use cases. After the right data for the use case was found, the IT team provided access to the data through manual configuration. The lead time to access data was often from several days to weeks.
Insufficient data governance and auditing – Data was shared directly to use cases by copying it. Therefore, the IT team connected the data manually from their sources to the desired destinations multiple times. This process wasn’t centrally tracked to discover any information on the data sharing process. For example, if the data was copied in the past, how many use cases have access to the data, when access was granted, and who granted the access.
Redundant effort to process the same information – Because the IT team copied the data sources based on the exact use case requirements, they shared specific columns of the tables from the data. As additional use cases requested access to the same data with different column requirements, even more copies of the data were created.
Repeated process to establish security and governance guardrails – Each time the IT and the security team provided a connection to a new data source, they had to set up the security and governance guardrails. This required repeated manual effort.
Data quality issues – Because the data was processed redundantly and shared multiple times, there was no guarantee of or control over the quality of the data. This led to reduced trust in the data.
Absence of data catalog and metadata management – Data didn’t have any metadata associated with it, and so use cases couldn’t consume the data without further explanation from the data source owners and specialists. Furthermore, no process to discover new data existed. Similar to the consumption process, use cases would consult specialists to understand the context of the data and if it could provide value.

Envisioning a data solution for Volkswagen Autoeuropa

To address these challenges, Volkswagen Autoeuropa embarked on a bold vision. They envisioned a seamless data consumption process, similar to an online shopping experience. They envisioned a data marketplace where data users could browse and access high-quality, secure data with clear specifications, business context, and relevant attributes. This vision materialized into a project aimed at transforming data accessibility and governance as the foundation for the digital ecosystem. The vision to be realized: Data as seamless as online shopping.

In collaboration with Amazon Web Services (AWS), Volkswagen Autoeuropa joined the Enhanced Plant Onboarding Program of the Global Volkswagen Group’s Digital Production Platform (DPP EPO) strategy. Through this partnership, AWS and Volkswagen Autoeuropa created a data marketplace that significantly improved data availability.

In the discovery phase of the project, Volkswagen Autoeuropa and AWS evaluated several options to build the data solution. In the end, Volkswagen Autoeuropa chose a solution based on data mesh architecture using Amazon DataZone. Being a managed service, Amazon DataZone provided the necessary speed and agility to build the solution. At the same time, it led to higher operational efficiencies and lower operational overhead. The team adopted a data mesh architecture because the principles of the data mesh aligned with Volkswagen Autoeuropa’s vision of being a data driven factory.

Solution overview

This section describes the key features and architecture of the Volkswagen Autoeuropa data solution. The solution is based on a data mesh architecture.

Data solution features

The following figure shows the key capabilities of the Volkswagen Autoeuropa data solution.

The key capabilities of the solution are:

Data quality – In the solution, we’ve built a data quality framework to streamline the process of data quality checks and publishing quality scores. It uses AWS Glue Data Quality to generate recommendation rulesets, run orchestrated jobs, store results, and send notifications to users. This framework can be seamlessly integrated into AWS Glue jobs, providing a quality score for data pipeline jobs. In addition, the quality score is published in the Amazon DataZone data portal, allowing consumers to subscribe to the data based on its quality score.Assigning a quality score to the data helps build trust in the data, and shifts the responsibility of maintaining data quality to the data owner. As a result, the quality of the results delivered by these use cases improves.
Data registration – The producers sign in to the Amazon DataZone data portal using their AWS Identity and Access Management (IAM) credentials or single sign-on with integration through AWS IAM Identity Center. They register their data assets, which are stored in Amazon Simple Storage Service (Amazon S3), in the Amazon DataZone data catalog. The metadata of the data assets is stored in an AWS Glue catalog and made available in the business data catalog of Amazon DataZone and in the Amazon DataZone data source. The producers add business context such as business unit name, data owner contact information, and data refresh frequency using Amazon DataZone glossaries and metadata forms. In addition, they use generative AI capabilities to generate business metadata. After the business metadata is generated, they review the changes and modify the metadata if needed.Because all data products in Volkswagen Autoeuropa are now registered in the same location, the likelihood of data duplication is significantly reduced. Moreover, the data producers are improving the quality of the data by adding business context to it.
Data discovery – The consumers sign in to the Amazon DataZone data portal using their IAM credentials or single sign-on with integration through IAM Identity Center and search the data using keywords in the search bar. After the results are returned, they can further filter the results using glossary terms and project names. Finally, they review the business metadata of the data assets to evaluate if the data is relevant to their business use cases. They can check the quality score of the data assets and the refresh schedule for their use cases.With a data discovery capability in place, consumers can gain information about the data without the need to consult the source system owners or specialists.
Data access management – When the consumers find a data asset that’s relevant to their use case, they request access to it using the subscription feature of Amazon DataZone. Data is classified as public, internal, and confidential. For public and internal data assets, the access request is automatically approved. For confidential data assets, the data producer team reviews the access request and either accepts or rejects the subscription request.With a central place to manage data access, data owners can view which use cases have access to their data and when the access request was granted. The fine-grained access control feature of Amazon DataZone gives data owners granular control of their data at the row and column levels.
Data consumption – Upon approval of the subscription request, Amazon DataZone provisions the backend infrastructure to make the data accessible to the corresponding consumers. After this process is complete, the consumers can access the data through Amazon Athena using the deep link feature of Amazon DataZone. The data consumption pattern in Volkswagen Autoeuropa supports two use cases:
- Cloud-to-cloud consumption – Both data assets and consumer teams or applications are hosted in the cloud.
- Cloud-to-on-premises consumption – Data assets are hosted in the cloud and consumer use cases or applications are hosted on-premises.

Requirements specific to a use case requires access to the relevant data assets; sharing data to use cases using Amazon DataZone doesn’t require creating multiple copies. As a result, duplication and processing of data. Furthermore, by reducing the number of copies of the data, the overall quality of the data products improves. In addition, the backend automation of Amazon DataZone to make data available to use cases reduces the manual effort and improves the lead time to access data.

Single collaborative environment – The Amazon DataZone data portal provides a single collaborative environment to the users in Volkswagen Autoeuropa. Data consumers such as use case owners, data engineers, data scientists, and ML engineers can browse and request access to data assets. At the same time, data producers, such as use case owners and source system owners, can publish and curate their data in the Amazon DataZone data portal. This collaborative experience promotes teamwork and accelerates the realization of business value. Furthermore, the security and governance guardrails scales across the organization as the number of use cases increases.

Data solution architecture

The following figure displays the reference architecture of the data solution at Volkswagen Autoeuropa. In the next part of the post, we discuss how we arrived at the solution.

The architecture includes:

The data from SAP applications, manufacturing execution systems (MES), and supervisory control and data acquisition (SCADA) systems is ingested into the producer accounts of Volkswagen Autoeuropa.
In the producer account, raw data is transformed using AWS Glue. The technical metadata of the data is stored in AWS Glue catalog. The data quality is measured using the data quality framework. The data stored in Amazon Simple Storage Service (Amazon S3) is registered as an asset in the Amazon DataZone data catalog hosted in the central governance account.
The central governance account hosts the Amazon DataZone domain and the related Amazon DataZone data portal. The AWS accounts of the data producers and consumers are associated with the Amazon DataZone domain. Amazon DataZone projects belonging to the data producers and consumers are created under the related Amazon DataZone domain units.
Consumers of the data products sign in to the Amazon DataZone data portal hosted in the central governance account using their IAM credentials or single sign-on with integration through IAM Identity Center. They search, filter, and view asset information (for example, data quality, business, and technical metadata).
After the consumer finds the asset they need, they request access to the asset using the subscription feature of Amazon DataZone. Based on the validity of the request, the asset owner approves or rejects the request.
After the subscription request is granted and fulfilled, the asset is accessed in the consumer account for a one-time query using Athena and Microsoft Power BI applications hosted on premises. This consumption pattern can be extended for AI and machine learning (AI/ML) model development using Amazon SageMaker and reporting purposes using Amazon QuickSight.

User journey

After discussing the desired system with the use case teams and stakeholders and analyzing the current workflow, Volkswagen Autoeuropa grouped the user personas of the data solution into three main categories: data producer, data consumer, and data solution administrator. This sets the foundation for the desired user experience and what’s needed to achieve the solution goals.

Data producer

Data producers create the data products in the data solution. There are two types of data producers.

Data source owners – Data source owners publish the raw data in the Amazon DataZone data portal. These data products are attributed as source-based data.
Use case owners – Use case owners publish data that’s fit for consumption by other use cases. These data products are called consumer-based data.

The following figure shows the user journey of a data producer:

A data producer’s journey includes:

Identify data of interest –
1. Identify data (Volkswagen Autoeuropa network).
2. Perform data quality checks (Volkswagen Autoeuropa network).
Connect data to the data solution –
1. Ingest data into the data solution (Amazon DataZone portal).
2. Start process to connect data using AWS Glue.
Locate the data source in the data solution –
1. Register data (Amazon DataZone portal).
2. Add data to the inventory in Amazon DataZone.
Add or edit metadata –
1. Add or edit metadata (Amazon DataZone portal).
2. Publish data assets (Amazon DataZone portal).
Approve or reject subscription request –
1. Review subscription requests.
Maintain data assets –
1. Manage data assets (Amazon DataZone portal).

Data consumer

Data consumers use data for business analytics, machine learning, AI, and business reporting. Data consumers are data engineers, data scientists, ML engineers, and business users. The following diagram shows the journey of a data consumer.

A data consumer’s journey includes:

Access Amazon DataZone portal –
1. Amazon DataZone portal – Access is granted based on the user’s assigned domain and projects.
Search for data assets –
1. Data assets in Amazon DataZone portal – Search for data and brows the results by glossary terms or the project name. Use additional filters to refine the results.
View business metadata –
1. Select a data asset to see additional information – Review the description, data quality score and metadata.
Request access to data (subscribe) –
1. Subscribe to request access.
2. After the subscription request is approved, review the data products that you have access to.
3. Query the data to view and consume the data.
Retrieve additional data –
1. Repeat the steps as needed to access and retrieve additional data.

Data solution administrator

Data solution administrators are responsible for performing administrative tasks on the data solution. The following figure shows the common tasks performed by the data solution administrator.

A data administrator’s journey includes:

Manage projects –
1. Manage Amazon DataZone domain.
2. Manage Amazon DataZone projects within the domain.
Manage environment –
1. Set up the environment to manage the infrastructure.
Manage business metadata glossary –
1. Manage and enable Amazon DataZone glossaries and metadata forms.
Manage data assets –
1. Manage assets.
2. Query the data to view and consume the data.
Manage access to data solution –
1. Monitor and revoke access when appropriate.

Conclusion

In this post, you learned how Volkswagen Autoeuropa embarked on a bold vision to become a data driven factory. It shows how this vision was put into action by building a data solution based on data mesh architecture using Amazon DataZone. It highlights the key features and architecture of the data solutions and presents the user journey. As of writing this post, Volkswagen Autoeuropa reduced the data discovery time from days to minutes using the data solution. The time to access data took several weeks before the Volkswagen Autoeuropa and AWS collaboration. Now, with the help of the data solution, the data access time has been reduced to several minutes.

In May 2024, the team achieved a major milestone by successfully offering data on the data solution and transporting it instantly to Power BI, a process that previously took several weeks.

“After one year of work, we did the full roundtrip from offering data on our new data marketplace built using Amazon DataZone to transporting it instantly to third-party tools, a process that previously took several weeks. This was a big achievement for our team.”

– Jorge Paulino, Product owner of the data solution. Volkswagen Autoeuropa.

The next post of the two-part series details discusses how we built the solution, its technical details, and the business value created.

If you want to harness the agility and scalability of a data mesh architecture and Amazon DataZone to accelerate innovation and drive business value for your organization, we have the resources to get you started. Be sure to check out the AWS Prescriptive Guidance: Strategies for building a data mesh-based enterprise solution on AWS. This comprehensive guide covers the key considerations and best practices for establishing a robust, well-governed data mesh on AWS. From aligning your data mesh with overall business strategy to scaling the data mesh across your organization, this Prescriptive Guidance provides a clear roadmap to help you succeed.

If you’re curious to get hands-on, see the GitHub repository: Building an enterprise Data Mesh with Amazon DataZone, Amazon DataZone, AWS CDK, and AWS CloudFormation. This open source project delivers a step-by-step guide to build a data mesh architecture using Amazon DataZone, AWS Cloud Development Kit (AWS CDK), and AWS CloudFormation.

About the Authors

Dhrubajyoti Mukherjee is a Cloud Infrastructure Architect with a strong focus on data strategy, data analytics, and data governance at Amazon Web Services (AWS). He uses his deep expertise to provide guidance to global enterprise customers across industries, helping them build scalable and secure AWS solutions that drive meaningful business outcomes. Dhrubajyoti is passionate about creating innovative, customer-centric solutions that enable digital transformation, business agility, and performance improvement. An active contributor to the AWS community, Dhrubajyoti authors AWS Prescriptive Guidance publications, blog posts, and open-source artifacts, sharing his insights and best practices with the broader community. Outside of work, Dhrubajyoti enjoys spending quality time with his family and exploring nature through his love of hiking mountains.

Ravi Kumar is a Data Architect and Analytics expert at Amazon Web Services; he finds immense fulfillment in working with data. His days are dedicated to designing and analyzing complex data systems, uncovering valuable insights that drive business decisions. Outside of work, he unwinds by listening to music and watching movies, activities that allow him to recharge after a long day of data wrangling.

Martin Mikoleizig studied mechanical engineering and production technology at the RWTH Aachen University before starting to work in Dr. h.c. Ing. F. Porsche AG 2015 as a production planner for the engine assembly. In several years as a Project Manager on Testing Technology for new engine models he also introduced several innovations like human-machine-collaborations and intelligent assistance systems. From 2017, he was responsible for the Shopfloor IT team of the module lines in Zuffenhausen before he became responsible for the Planning of the E-Drive assembly at Porsche. Beside this he was responsible for the Digitalisation Strategy of the Production Ressort at Porsche. Since October 2022, he has been assigned to Volkswagen Autoeuropa in Portugal in the role of a Digital Transformation Manager for the plant driving the Digital Transformation towards a Data Driven Factory.

Weizhou Sun is a Lead Architect at Amazon Web Services, specializing in digital manufacturing solutions and IoT. With extensive experience in Europe, she has enhanced operational efficiencies, reducing latency and increasing throughput. Weizhou’s expertise includes Industrial Computer Vision, predictive maintenance, and predictive quality, consistently delivering top performance and client satisfaction. A recognized thought leader in IoT and remote driving, she has contributed to business growth through innovations and open-source work. Committed to knowledge sharing, Weizhou mentors colleagues and contributes to practice development. Known for her problem-solving skills and customer focus, she delivers solutions that exceed expectations. In her free time, Weizhou explores new technologies and fosters a collaborative culture.

Shameka Almond is an Advisory Consultant at Amazon Web Services. She works closely with enterprise customers to help them better understand the business impact and value of implementing data solutions, including data governance best practices. Shameka has over a decade of wide-ranging IT experience in the manufacturing and aerospace industries, and the nonprofit sector. She has supported several data governance initiatives, helping both public and private organizations identify opportunities for improvement and increased efficiency. Outside of the office she enjoys hosting large family gatherings, and supporting community outreach events dedicated to introducing students in K-12 to STEM.

Adjoa Taylor has over 20 years of experience in industrial manufacturing, providing industry and technology consulting services, digital transformation, and solution delivery. Currently Adjoa leads Product Centric Digital Transformation, enabling customers to solve complex manufacturing problems by leveraging Smart Factory and Industry leading transformation mechanisms. Most recently driving value with AI/ML and generative AI use-cases for the plant floor. Adjoa is an experienced leader spending over 20 years of her career delivering projects in countries throughout North America, Latin America, Europe, and Asia. Through prior roles, Adjoa brings deep experience across multiple business segments with a focus on business outcome driven solutions. Adjoa is passionate about helping customers solve problems while realizing the art of the possible via the right impacting value-based solution.

How Volkswagen streamlined access to data across multiple data lakes using Amazon DataZone – Part 1

2024-07-18 Bandana Das

Post Syndicated from Bandana Das original https://aws.amazon.com/blogs/big-data/how-volkswagen-streamlined-access-to-data-across-multiple-data-lakes-using-amazon-datazone-part-1/

Over the years, organizations have invested in creating purpose-built, cloud-based data lakes that are siloed from one another. A major challenge is enabling cross-organization discovery and access to data across these multiple data lakes, each built on different technology stacks. A data mesh addresses these issues with four principles: domain-oriented decentralized data ownership and architecture, treating data as a product, providing self-serve data infrastructure as a platform, and implementing federated governance. Data mesh enables organizations to organize around data domains with a focus on delivering data as a product.

In 2019, Volkswagen AG (VW) and Amazon Web Services (AWS) formed a strategic partnership to co-develop the Digital Production Platform (DPP), aiming to enhance production and logistics efficiency by 30 percent while reducing production costs by the same margin. The DPP was developed to streamline access to data from shop-floor devices and manufacturing systems by handling integrations and providing standardized interfaces. However, as applications evolved on the platform, a significant challenge emerged: sharing data across applications stored in multiple isolated data lakes in Amazon Simple Storage Service (Amazon S3) buckets in individual AWS accounts without having to consolidate data into a central data lake. Another challenge is discovering available data stored across multiple data lakes and facilitating a workflow to request data access across business domains within each plant. The current method is largely manual, relying on emails and general communication, which not only increases overhead but also varies from one use case to another in terms of data governance. This blog post introduces Amazon DataZone and explores how VW used it to build their data mesh to enable streamlined data access across multiple data lakes. It focuses on the key aspect of the solution, which was enabling data providers to automatically publish data assets to Amazon DataZone, which served as the central data mesh for enhanced data discoverability. Additionally, the post provides code to guide you through the implementation.

Introduction to Amazon DataZone

Amazon DataZone is a data management service that makes it faster and easier for customers to catalog, discover, share, and govern data stored across AWS, on premises, and third-party sources. Key features of Amazon DataZone include a business data catalog that allows users to search for published data, request access, and start working on data in days instead of weeks. Amazon DataZone projects enable collaboration with teams through data assets and the ability to manage and monitor data assets across projects. It also includes the Amazon DataZone portal, which offers a personalized analytics experience for data assets through a web-based application or API. Lastly, Amazon DataZone governed data sharing ensures that the right data is accessed by the right user for the right purpose with a governed workflow.

Architecture for Data Management with Amazon DataZone

Figure 1: Data mesh pattern implementation on AWS using Amazon DataZone

The architecture diagram (Figure 1) represents a high-level design based on the data mesh pattern. It separates source systems, data domain producers (data publishers), data domain consumers (data subscribers), and central governance to highlight key aspects. This cross-account data mesh architecture aims to create a scalable foundation for data platforms, supporting producers and consumers with consistent governance.

A data domain producer resides in an AWS account and uses Amazon S3 buckets to store raw and transformed data. Producers ingest data into their S3 buckets through pipelines they manage, own, and operate. They are responsible for the full lifecycle of the data, from raw capture to a form suitable for external consumption.
A data domain producer maintains its own ETL stack using AWS Glue, AWS Lambda to process, AWS Glue Databrew to profile the data and prepare the data asset (data product) before cataloguing it into AWS Glue Data Catalog in their account.
A second pattern could be that a data domain producer prepares and stores the data asset as table within Amazon Redshift using AWS S3 Copy.
Data domain producers publish data assets using datasource run to Amazon DataZone in the Central Governance account. This populates the technical metadata in the business data catalog for each data asset. The business metadata, can be added by business users to provide business context, tags, and data classification for the datasets. Producers control what to share, for how long, and how consumers interact with it.
Producers can register and create catalog entries with AWS Glue from all their S3 buckets. The central governance account securely shares datasets between producers and consumers via metadata linking, with no data (except logs) existing in this account. Data ownership remains with the producer.
With Amazon DataZone, once data is cataloged and published into the DataZone domain, it can be shared with multiple consumer accounts.
The Amazon DataZone Data portal provides a personalized view for users to discover/search and submit requests for subscription of data assets using a web-based application. The data domain producer receives the notification of subscription requests in the Data portal and can approve/reject the requests.
Once approved, the consumer account can read and further process data assets to implement various use cases with AWS Lambda, AWS Glue, Amazon Athena, Amazon Redshift query editor v2, Amazon QuickSight (Analytics use cases) and with Amazon Sagemaker (Machine learning use cases).

Manual process to publish data assets to Amazon DataZone

To publish a data asset from the producer account, each asset must be registered in Amazon DataZone as a data source for consumer subscription. The Amazon DataZone User Guide provides detailed steps to achieve this. In the absence of an automated registration process, all required tasks must be completed manually for each data asset.

How to automate publishing data assets from AWS Glue Data Catalog from the producer account to Amazon DataZone

Using the automated registration workflow, the manual steps can be automated for any new data asset that needs to be published in an Amazon DataZone domain or when there’s a schema change in an already published data asset.

The automated solution reduces the repetitive manual steps to publish the data sources (AWS Glue tables) into an Amazon DataZone domain.

Architecture for automated data asset publish

Figure 2 Architecture for automated data publish to Amazon DataZone

To automate publishing data assets:

In the producer account (Account B), the data to be shared resides in an Amazon S3 bucket (Figure 2). An AWS Glue crawler is configured for the dataset to automatically create the schema using AWS Cloud Development Kit (AWS CDK).
Once configured, the AWS Glue crawler crawls the Amazon S3 bucket and updates the metadata in the AWS Glue Data Catalog. The successful completion of the AWS Glue crawler generates an event in the default event bus of Amazon EventBridge.
An EventBridge rule is configured to detect this event and invoke a dataset-registration AWS Lambda function.
The AWS Lambda function performs all the steps to automatically register and publish the dataset in Amazon Datazone.

Steps performed in the dataset-registration AWS Lambda function

- The AWS Lambda function retrieves the AWS Glue database and Amazon S3 information for the dataset from the Amazon Eventbridge event triggered by the successful run of the AWS Glue crawler.
- It obtains the Amazon DataZone Datalake blueprint ID from the producer account and the Amazon DataZone domain ID and project ID by assuming an IAM role in the central governance account where the Amazon Datazone domain exists.
- It enables the Amazon DataZone Datalake blueprint in the producer account.
- It checks if the Amazon Datazone environment already exists within the Amazon DataZone project. If it does not, then it initiates the environment creation process. If the environment exists, it proceeds to the next step.
- It registers the Amazon S3 location of the dataset in Lake Formation in the producer account.
- The function creates a data source within the Amazon DataZone project and monitors the completion of the data source creation.
- Finally, it checks whether the data source sync job in Amazon DataZone needs to be started. If new AWS Glue tables or metadata is created or updated, then it starts the data source sync job.

Prerequisites

As part of this solution, you will publish data assets from an existing AWS Glue database in a producer account into an Amazon DataZone domain for which the following prerequisites need to be performed.

You need two AWS accounts to deploy the solution.
- One AWS account will act as the data domain producer account (Account B) which will contain the AWS Glue dataset to be shared.
- The second AWS account is the central governance account (Account A), which will have the Amazon DataZone domain and project deployed. This is the Amazon DataZone account.
- Ensure that both the AWS accounts belong to the same AWS Organization
Remove the IAMAllowedPrincipals permissions from the AWS Lake Formation tables for which Amazon DataZone handles permissions.
Make sure in both AWS accounts that you have cleared the checkbox for Default permissions for newly created databases and tables under the Data Catalog settings in Lake Formation (Figure 3).

Figure 3: Clear default permissions in AWS Lake Formation
Sign in to Account A (central governance account) and make sure you have created an Amazon DataZone domain and a project within the domain.

If your Amazon DataZone domain is encrypted with an AWS Key Management Service (AWS KMS) key, add Account B (producer account) to the key policy with the following actions:

{
  "Sid": "Allow use of the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::<Account B>:root"
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*"
}

Ensure you have created an AWS Identity and Access Management (IAM) role that Account B (producer account) can assume and this IAM role is added as a member (as contributor) of your Amazon DataZone project. The role should have the following permissions:

This IAM role is called dz-assumable-env-dataset-registration-role in this example. Adding this role will enable you to successfully run the dataset-registration Lambda function. Replace the account-region, account id, and DataZonekmsKey in the following policy with your information. These values correspond to where your Amazon DataZone domain is created and the AWS KMS key Amazon Resource Name (ARN) used to encrypt the Amazon DataZone domain.

{
    "Version": "2012-10-17",
    "Statement": [
         {
            "Action": [
                "DataZone:CreateDataSource",
               "DataZone:CreateEnvironment",
               "DataZone:CreateEnvironmentProfile",
               "DataZone:GetDataSource",
               "DataZone:GetEnvironment",
               "DataZone:GetEnvironmentProfile",
               "DataZone:GetIamPortalLoginUrl",
               "DataZone:ListDataSources",
                "DataZone:ListDomains",
                "DataZone:ListEnvironmentProfiles",
                "DataZone:ListEnvironments",
                "DataZone:ListProjectMemberships",
               "DataZone:ListProjects",
                "DataZone:StartDataSourceRun"
            ],
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                 "kms:Decrypt",
                "kms:DescribeKey",
                "kms:GenerateDataKey"
            ],
           "Resource": "arn:aws:kms:${account_region}:${account_id}:key/${DataZonekmsKey}",
            "Effect": "Allow"
        }
    ]
}

Add the AWS account in the trust relationship of this role with the following trust relationship. Replace ProducerAccountId with the AWS account ID of Account B (data domain producer account).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::${ProducerAccountId}:root",
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
} }

The following tools are needed to deploy the solution using AWS CDK:
- Either Bash or ZSH terminal
- Node and NPM using Node Version Manager
  - Install Node Version Manager (NVM)
  - Install Node version 18.12.0 using following command
```
$ nvm install 18.12.0
```

- - The node and npm binaries should now be available
- Python
- AWS Command Line Interface (AWS CLI)
- AWS SDK for Python
- AWS CDK

Deployment Steps

After completing the pre-requisites, use the AWS CDK stack provided on GitHub to deploy the solution for automatic registration of data assets into DataZone domain

Clone the repository from GitHub to your preferred IDE using the following commands.

git clone https://github.com/aws-samples/automate-and-simplify-aws-glue-data-asset-publish-to-amazon-datazone.git

cd automate-and-simplify-aws-glue-data-asset-publish-to-amazon-datazone

At the base of the repository folder, run the following commands to build and deploy resources to AWS.
```
npm install 
npm run lint
```
Sign in to the AWS account B (the data domain producer account) using AWS Command Line Interface (AWS CLI) with your profile name.
Ensure you have configured the AWS Region in your credential’s configuration file.
Bootstrap the CDK environment with the following commands at the base of the repository folder. Replace <PROFILE_NAME> with the profile name of your deployment account (Account B). Bootstrapping is a one-time activity and is not needed if your AWS account is already bootstrapped.
```
export AWS_PROFILE=<PROFILE_NAME>
npm run cdk bootstrap
```
Replace the placeholder parameters (marked with the suffix _PLACEHOLDER) in the file config/DataZoneConfig.ts (Figure 4).

- Amazon DataZone domain and project name of your Amazon DataZone instance. Make sure all names are in lowercase.
- The AWS account ID and Region.
- The assumable IAM role from the prerequisites.
- The deployment role starting with cfn-xxxxxx-cdk-exec-role-.

Figure 4: Edit the DataZoneConfig file

In the AWS Management Console for Lake Formation, select Administrative roles and tasks from the navigation pane (Figure 5) and make sure the IAM role for AWS CDK deployment that starts with cfn-xxxxxx-cdk-exec-role- is selected as an administrator in Data lake administrators. This IAM role needs permissions in Lake Formation to create resources, such as an AWS Glue database. Without these permissions, the AWS CDK stack deployment will fail.

Figure 5: Add cfn-xxxxxx-cdk-exec-role- as a Data Lake administrator

Use the following command in the base folder to deploy the AWS CDK solution
```
npm run cdk deploy --all
```

During deployment, enter y if you want to deploy the changes for some stacks when you see the prompt Do you wish to deploy these changes (y/n)?

After the deployment is complete, sign in to your AWS account B (producer account) and navigate to the AWS CloudFormation console to verify that the infrastructure deployed. You should see a list of the deployed CloudFormation stacks as shown in Figure 6.

Figure 6: Deployed CloudFormation stacks

Test automatic data registration to Amazon DataZone

To test, we use the Online Retail Transactions dataset from Kaggle as a sample dataset to demonstrate the automatic data registration.

Download the Online Retail.csv file from Kaggle dataset.
Login to AWS Account B (producer account) and navigate to the Amazon S3 console, find the DataZone-test-datasource S3 bucket, and upload the csv file there (Figure 7).

Figure 7: Upload the dataset CSV file

The AWS Glue crawler is scheduled to run at a specific time each day. However for testing, you can manually run the crawler by going to the AWS Glue console and selecting Crawlers from the navigation pane. Run the on-demand crawler starting with DataZone-. After the crawler has run, verify that a new table has been created.
Go to the Amazon DataZone console in AWS account A (central governance account) where you deployed the resources. Select Domains in the navigation pane (Figure 8), then Select and open your domain.

Figure 8: Amazon DataZone domains
After you open the Datazone Domain, you can find the Amazon Datazone data portal URL in the Summary section (Figure 9). Select and open data portal.

Figure 9: Amazon DataZone data portal URL
In the data portal find your project (Figure 10). Then select the Data tab at the top of the window.

Figure 10: Amazon DataZone Project overview
Select the section Data Sources (Figure 11) and find the newly created data source DataZone-testdata-db.

Figure 11: Select Data sources in the Amazon Datazone Domain Data portal
Verify that the data source has been successfully published (Figure 12).

Figure 12: The data sources are visible in the Published data section
After the data sources are published, users can discover the published data and can submit a subscription request. The data producer can approve or reject requests. Upon approval, users can consume the data by querying data in Amazon Athena. Figure 13 illustrates data discovery in the Amazon DataZone data portal.

Figure 13: Example data discovery in the Amazon DataZone portal

Clean up

Use the following steps to clean up the resources deployed through the CDK.

Empty the two S3 buckets that were created as part of this deployment.
Go to the Amazon DataZone domain portal and delete the published data assets that were created in the Amazon DataZone project by the dataset-registration Lambda function.
Delete the remaining resources created using the following command in the base folder:
```
npm run cdk destroy --all
```

Conclusion

By using AWS Glue and Amazon DataZone, organizations can make their data management easier and allow teams to share and collaborate on data smoothly. Automatically sending AWS Glue data to Amazon DataZone not only makes the process simple but also keeps the data consistent, secure, and well-governed. Simplify and standardize publishing data assets to Amazon DataZone and streamline data management with Amazon DataZone. For guidance on establishing your organization’s data mesh with Amazon DataZone, contact your AWS team today.

About the Authors

Bandana Das is a Senior Data Architect at Amazon Web Services and specializes in data and analytics. She builds event-driven data architectures to support customers in data management and data-driven decision-making. She is also passionate about enabling customers on their data management journey to the cloud.

Anirban Saha is a DevOps Architect at AWS, specializing in architecting and implementation of solutions for customer challenges in the automotive domain. He is passionate about well-architected infrastructures, automation, data-driven solutions and helping make the customer’s cloud journey as seamless as possible. Personally, he likes to keep himself engaged with reading, painting, language learning and traveling.

Chandana Keswarkar is a Senior Solutions Architect at AWS, who specializes in guiding automotive customers through their digital transformation journeys by using cloud technology. She helps organizations develop and refine their platform and product architectures and make well-informed design decisions. In her free time, she enjoys traveling, reading, and practicing yoga.

Sindi Cali is a ProServe Associate Consultant with AWS Professional Services. She supports customers in building data driven applications in AWS.

Software-defined edge architecture for connected vehicles

2023-04-03 James Simon

Post Syndicated from James Simon original https://aws.amazon.com/blogs/architecture/software-defined-edge-architecture-for-connected-vehicles/

To remain competitive in a marketplace that increasingly views transportation as a service emphasizing customer experience, vehicle capabilities and mobility applications need to improve and increase value over time, much like the internet of things and smart phones have done.

Vehicle manufacturers and fleet operators are responding to this change by using data to inform and operate their businesses, and by adopting software-defined vehicle features and capabilities. In the broader transportation community, insurance providers are evolving to usage-based insurance, which offers rates that are based on a customer’s risk profile rather than a general population. Telematics service providers are expanding their offerings to fleet operators to include machine learning (ML)-driven capabilities like driver coaching, compliance, and predictive maintenance. Rental car providers are developing apps that provide a personalized in-vehicle experience.

All of these features rely on in-vehicle data collection and processing, evolving vehicles from simple data gathering sensors to fully smart devices. To meet these demands, you can adopt cloud native architectures that use microservices, containers, and declarative application programming interfaces (APIs). This blog post explores a system architecture AWS and Luxsoft have developed together in order to help our customers reduce friction and accelerate time to market for the development, deployment and operation of edge applications required to make vehicles into smart devices.

Software-defined edge

Software is becoming more critical to vehicle function. A modern car has approximately 70-100 Electronic Control Units (ECUs), which control most core functions in the engine, transmission, Heating, ventilation, and air conditioning (HVAC) Automatic Brake System (ABS), body, and airbag hardware components. With new features such as infotainment systems, autonomous driving (AD), and advanced driver assistance systems (ADAS), modern cars use approximately 100 million lines of code, and this is increasing rapidly. ECU and software complexity produce challenges with portability of applications, this could be due to variations of the hardware, or CAN and other network communication differences, without an abstraction layer, application software must be conformed to the operating environment.

One difficulty presented by this increasing complexity and the multitude of integration points and communication interfaces at the edge is that applications typically must be re-developed, customized or at least cross-compiled to fit each hardware platform, often requiring a lengthy integration development and testing cycle for each new device. As a result, edge applications executing ML and other data-driven workloads can produce a pace that is slower than consumer expectations.

The development pace for software-defined vehicles (SDVs) relies on reusing and redeploying software applications. Software reuse is a challenge when target hardware and processing environments aren’t designed in a common way. Therefore, an early step in developing SDVs is to address these challenges at the vehicle edge, that is to say, the externally connected vehicle devices.

System architecture

To help address the challenges of creating reusable software and applications for the vehicle edge, we worked with AWS Partner DXC Luxsoft to create an end-to-end system architecture. The architecture in Figure 1 uses software-defined mobility devices at the vehicle edge to connect to multiple hardware components.

Figure 1. AWS Partner DXC Luxsoft to create an end-to-end system architecture

Let’s explore this architecture step by step.

Edge application Developers check in source code to a CI/CD system for build and test. Applications are built as containers and tested on Amazon EC2 Elastic Compute Cloud (Amazon EC2) instances with target same Amazon Machine Images (AMI).
Edge application containers are stored in Amazon Elastic Container Registry (Amazon ECR).
Containers are registered as AWS IoT Greengrass components for target devices.
Deploy edge applications to multiple target types using AWS IoT Greengrass.
Data is published back to AWS through AWS IoT Core basic telemetry or AWS IoT Fleetwise.
AWS IoT Core routes data to connected vehicle databases and services for further processing. Alternatively, AWS IoT Fleetwise-collected data is routed through AWS IoT Fleetwise.
The Fleet Management portal allows fleet managers to view results produced by applications and services and data collected by AWS IoT Fleetwise. This can include geolocation, vehicle health, Usage Based Insurance (UBI), or other driver scores.
Mobile clients can be created to allow end users and consumers to view application and service results and scores for applications like UBI.

How it works

This technology stack abstracts the specific hardware used, presenting a common run environment that you can deploy to devices that are based on ARM processors with ARM Sytem Ready firmware, including Amazon EC2 Graviton instances.

This architecture uses the following:

Device software stack with an ARM System Ready board support package
Yocto custom Linux build with the SOAFEE EWAOL layer(s) and AWS IoT Greengrass
Instance of the AWS IoT Fleetwise edge agent
Application containers for pre and post processing of data collected by AWS IoT Fleetwise or other applications

We paired this stack with AWS IoT Greengrass V2 and a container-based version of AWS IoT Fleetwise edge agent that we been modified to publish to a local broker first to allow pre and post-processing. This also makes AWS IoT Fleetwise-collected data available to other application containers running on the device. The pre- and post-processing containers prepare messages and data for use by other edge applications or exchange with the cloud services and will be available with the architecture source code.

Benefits

Because this architecture is container-based and abstracts the hardware, deploying and updating applications becomes more efficient. Because Amazon EC2 Graviton instances are ARM-based, you can deploy specific AMIs and configurations that contain this architecture’s technology stack as part of a continuous integration and continuous deployment (CI/CD) pipeline. This means that you can develop new applications and services entirely in the cloud, test them in the cloud with bit equivalent binaries and containers, and then deploy them to hardware components. This can save weeks to months of development and verification through the use of automation. The use of ARM based systems cloud native development and testing strategies can be applied, reducing the need for hardware test equipment and bringing new revenue streams and customer experiences to market at a pace that matches the current demand.

Conclusion

With this architecture, you can develop and deliver new edge processing applications to one or more vehicle device platforms. You can also develop, test, and deploy purpose-built edge compute applications as containerized AWS IoT Greengrass V2 components, including applications like Usage Based Insurance, over the air update agents, and driver distraction.

With this architecture alone in place, value can quickly be added by developing AWS IoT Fleetwise data campaigns targeting specific data required to fulfill business value for consumers, operators or fleet managers, such as vehicle battery state of charge or health indicators.

You can add value to this architecture by developing AWS IoT Fleetwise data campaigns for specific data required to fulfill business value for consumers, operators, or fleet managers. Examples of this data include vehicle battery state of charge and health indicators.

If you’re interested in creating or contributing to a common architecture that can accelerate developing and deploying edge applications to multiple hardware components, contact Connected Vehicle Tech Strategy Lead James Simon or AWS Partner DXC Luxsoft for a working demonstration or to start a proof of concept.

How to secure an enterprise scale ACM Private CA hierarchy for automotive and manufacturing

2022-06-13 Anthony Pasquariello

Post Syndicated from Anthony Pasquariello original https://aws.amazon.com/blogs/security/how-to-secure-an-enterprise-scale-acm-private-ca-hierarchy-for-automotive-and-manufacturing/

In this post, we show how you can use the AWS Certificate Manager Private Certificate Authority (ACM Private CA) to help follow security best practices when you build a CA hierarchy. This blog post walks through certificate authority (CA) lifecycle management topics, including an architecture overview, centralized security, separation of duties, certificate issuance auditing, and certificate sharing by means of templates. These topics provide best practices surrounding your ACM Private CA hierarchy so that you can build the right CA hierarchy for your organization.

With ACM Private CA, you can create private certificate authority hierarchies, including root and subordinate CAs, without the upfront investment and ongoing maintenance costs of operating your own private CA. You can issue certificates for authenticating internal users, computers, applications, services, servers or other devices, and code signing.

This post includes the following Amazon Web Services (AWS) services:

Solution overview

In this blog post, you’ll see an example automotive manufacturing company and their supplier companies. Each will have associated AWS accounts, which we will call Manufacturer Account(s) and Supplier Account(s), respectively.

Automotive manufacturing companies usually have modules that come from different suppliers. Modules, in the automotive context, are embedded systems that control electrical systems in the vehicle. These modules might be interconnected throughout the in-vehicle network or provide connectivity external to the vehicle, for example, for navigation or sending telemetry to off-board systems.

The architecture needs to allow the Manufacturer to retain control of their CA hierarchy, while giving their external Suppliers limited access to sign the certificates on these modules with the Manufacturer’s CA hierarchy. The architecture we provide here gives you the basic information you need to cover the following objectives:

Creation of accounts that logically separate CAs in a hierarchy
IAM role creation for specific personas to manage the CA lifecycle
Auditing the CA hierarchy by using audit reports
Cross-account sharing by using AWS RAM with certificate template scoping

Architecture overview

Figure 1 shows the solution architecture.

Figure 1: Multi-account certificate authority hierarchy using ACM Private CA

The Manufacturer has two categories of AWS accounts:

A dedicated account to hold the Manufacturer’s root CA
An account to hold their subordinate CA

Note: The diagram shows two subordinate CAs in the Manufacturer account. However, depending on your security needs, you can have a subordinate CA per account per supplier.

Additionally, each Supplier has one AWS account. These accounts will have the Manufacturer’s subordinate CA shared by using AWS RAM. The Manufacturer will have a subordinate CA for each Supplier.

Logically separate accounts

In order to minimize the scope of impact and scope users to actions within their duties, it’s critical that you logically separate AWS accounts based on workload within the CA hierarchy. The following section shows a recommendation for how to do that.

AWS account that holds the root CA

You, the Manufacturer, should place the ACM Private root CA within its own dedicated AWS account to segment and tightly control access to the root CA. This limits access at the account level and only uses the dedicated account for a single purpose: holding the root CA for your organization. This account will only have access from IAM principals that maintain the CA hierarchy through a federation service like AWS Single Sign-On (AWS SSO) or direct federation to IAM through an existing identity provider. This account also has AWS CloudTrail enabled and configured for business-specific alerting, including actions like creation, updating, or deletion of the root CA.

AWS account that holds the subordinate CAs

You, the Manufacturer, will have a dedicated account where the entire CA hierarchy below the root will be located. You should have a separate subordinate CA for each Supplier, and in some cases a separate subordinate CA for each hardware module the Supplier is building. The subordinate CAs can issue certificates for specific hardware modules within the Supplier account.

This Manufacturer account shares each subordinate CA to the respective Supplier’s AWS account by using AWS RAM. This provides joint control to the shared subordinate CA, creating isolation between individual Suppliers. AWS RAM allows Suppliers to control certificate issuance and revocation if this is allowed by the Manufacturer. Each Supplier is only shared certificate provisioning access through AWS RAM configuration, which means that you can tightly monitor and revoke access through AWS RAM. Given this sharing through AWS RAM, the Suppliers don’t have access to modify or delete the CA hierarchy itself and can only provision certificates from it.

Supplier AWS account(s)

These AWS accounts are owned by each respective Supplier. For example, you might partner with radio, navigation system, and telemetry suppliers. Each Supplier would have their own AWS account, which they control. The Supplier accepts an invitation from the manufacturer through AWS RAM, sharing the subordinate CA. The subordinate is allowed to take only certain actions, based on how the Manufacturer configured the share (more on this later in the post).

Separation of duties by means of IAM role creation

In order to follow least privilege best practices when you create a CA hierarchy with ACM Private CA, you must create IAM roles that are specific to each job function. The recommended method is to separate administrator and certificate issuer roles.

For this automotive manufacturing use case, we recommend the following roles:

Manufacturer IAM roles:
- A CA admin role with CA disable permission
- A CA admin role with CA delete permission
Supplier certificate issuer IAM role:
- Certificate issuing roles managed by attribute-based access control (ABAC) for developers

Manufacturer IAM role overview

In this flow, one IAM role is able to disable the CA, and a second principal can delete the CA. This enables two-person control for this highly privileged action—meaning that you need a two-person quorum to rotate the CA certificate.

Day-to-day CA admin policy (with CA disable)

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "acm-pca:ImportCertificateAuthorityCertificate",
                "acm-pca:DeletePolicy",
                "acm-pca:PutPolicy",
                "acm-pca:TagCertificateAuthority",
                "acm-pca:ListTags",
                "acm-pca:GetCertificate",
                "acm-pca:CreateCertificateAuthority",
                "acm-pca:ListCertificateAuthorities",
                "acm-pca:UntagCertificateAuthority",
                "acm-pca:GetCertificateAuthorityCertificate",
                "acm-pca:RevokeCertificate",
                "acm-pca:UpdateCertificateAuthority",
                "acm-pca:GetPolicy",
                "acm-pca:IssueCertificate",
                "acm-pca:DescribeCertificateAuthorityAuditReport",
                "acm-pca:CreateCertificateAuthorityAuditReport",
                "acm-pca:RestoreCertificateAuthority",
                "acm-pca:GetCertificateAuthorityCsr",
                "acm-pca:DeletePermission",
                "acm-pca:DescribeCertificateAuthority",
                "acm-pca:CreatePermission",
                "acm-pca:ListPermissions"
            ],
            "Resource": “*”
        },
        {
            "Effect": "Deny",
            "Action": [
                "acm-pca:DeleteCertificateAuthority"
            ],
            "Resource": <Enter Root CA ARN Here>
        }
    ]
}

Privileged CA admin policy (with CA delete)

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "acm-pca:ImportCertificateAuthorityCertificate",
                "acm-pca:DeletePolicy",
                "acm-pca:PutPolicy",
                "acm-pca:TagCertificateAuthority",
                "acm-pca:ListTags",
                "acm-pca:GetCertificate",
                "acm-pca:UntagCertificateAuthority",
                "acm-pca:GetCertificateAuthorityCertificate",
                "acm-pca:RevokeCertificate",
                "acm-pca:GetPolicy",
    "acm-pca:CreateCertificateAuthority",
                "acm-pca:ListCertificateAuthorities",
                "acm-pca:DescribeCertificateAuthorityAuditReport",
                "acm-pca:CreateCertificateAuthorityAuditReport",
                "acm-pca:RestoreCertificateAuthority",
                "acm-pca:GetCertificateAuthorityCsr",
                "acm-pca:DeletePermission",
    "acm-pca:IssueCertificate",
                "acm-pca:DescribeCertificateAuthority",
                "acm-pca:CreatePermission",
                "acm-pca:ListPermissions",
                "acm-pca:DeleteCertificateAuthority"
            ],
            "Resource": “*”
        },
        {
            "Effect": "Deny",
            "Action": [
                "acm-pca:UpdateCertificateAuthority"
            ],
            "Resource": <Enter Root CA ARN Here>
        }
    ]
}

We recommend that you, the Manufacturer, create a two-person process for highly privileged events like CA certificate rotation ceremonies. The preceding policies serve two purposes. First, they allow you to designate separation of management duties between day-to-day CA admin tasks and infrequent root CA rotation ceremonies. The day-to-day CA admin policy allows all ACM Private CA actions except the ability to delete the root CA. This is because the day-to-day CA admin should not be deleting the root CA. Meanwhile, the privileged CA admin policy has the ability to call DeleteCertificateAuthority. However, in order to call DeleteCertificateAuthority, you first need to have the day-to-day CA admin role disable the root CA.

This means that both roles listed here are necessary to perform a root CA deletion for a rotation or replacement ceremony. This arrangement creates a way to control the deletion of the CA resource by requiring two separate actors to disable and delete. It’s crucial that the two roles are assumed by two different people at the identity provider. Having one person assume both of these roles negates the increased security created by each role.

You might also consider enforcing tagging of CAs at the organization level so that each new CA has relevant tags. The blog post Securing resource tags used for authorization using a service control policy in AWS Organizations illustrates in detail how to secure tags using service control policies (SCPs), so that only authorized users can modify tags.

Supplier IAM role overview

Your Suppliers should also follow least privilege when creating IAM roles within their own accounts. However, as we’ll see in the Cross-account sharing by using AWS RAM section, even if the Suppliers don’t follow best practices, the Manufacturer’s ACM Private CA hierarchy is still isolated and secure.

That being said, here are common IAM roles that your Suppliers should create within their own accounts:

Developers who provision certificates for development and QA workloads
Developers who provision certificates for production

These certificate issuing roles give the Supplier the ability to issue end-entity certificates from the CA hierarchy. In this use case, the Supplier needs two different levels of permissions: non-production certificates and production certificates. To simplify the roles within IAM, the Supplier decided to use ABAC. These ABAC policies allow operations when the principal’s tag matches the resource tag. Because the Supplier has many similar policies, each with a different set of users, they use ABAC to create a single IAM policy that uses principal tags rather than creating multiple slightly different IAM policies.

Certificate issuing policy that uses ABAC

{
	"Version": "2012-10-17",
	"Statement": [
	{
		"Effect": "Allow",
		"Action": [
			"acm-pca:IssueCertificate",
			"acm-pca:ListTags",
			"acm-pca:GetCertificate",
			"acm-pca:ListCertificateAuthorities"
		],
		"Resource": "*",
		"Condition": {
			"StringEquals": {
				"aws:ResourceTag/access-project": "${aws:PrincipalTag/access-project}",
				"aws:ResourceTag/access-team": "${aws:PrincipalTag/access-team}"
			}
		}
	}
}

This single policy enables all personas to be scoped to least privilege access. If you look at the Condition portion of the IAM policy, you can see the power of ABAC. This condition verifies that the PrincipalTag matches the ResourceTag. The Supplier is federating into IAM roles through AWS SSO and tagging the Supplier’s principals within its selected identity providers.

Because you as the Manufacturer have tagged the subordinate CAs that are shared with the Supplier, the Supplier can use identity provider (IdP) attributes as tags to simplify the Supplier’s IAM strategy. In this example, the Supplier configures each relevant user in the IdP with the attribute (tag) key: access-team. This tag matches the tagging strategy used by the Manufacturer. Here’s the mapping for each persona within the use case:

Dev environment:
- access-team: DevTeam
Production environment:
- access-team: ProdTeam

You can choose to add or remove tags depending on your use case, and the preceding scenario serves as a simple example. This offloads the need to create new IAM policies as the number of subordinate CAs grow. If you decide to use ABAC, make sure that you require both principal tagging and resource tagging upon creation of each, because these tags become your authorization mechanism.

CA lifecycle: Audit report published by the Manufacturer

In terms of auditing and monitoring, we recommend that the Manufacturer have a mechanism to track how many certificates were issued for a specific Supplier or module. Within the Manufacturer accounts, you can generate audit reports through the console or CLI. This allows you, the manufacturer, to gather metrics on certificate issuance and revocation. Following is an example of a certificate issuance.

Figure 2: Audit report output for certificate issuance

For more information on generating an audit report, see Using audit reports with your private CA.

Cross-account sharing by using AWS RAM

With AWS RAM, you can share CAs with another account. We recommend that you, as a Manufacturer, use AWS RAM to share CAs with Suppliers so that they can issue certificates without administrator access to the CA. This arrangement allows you as the Manufacturer to more easily limit and revoke access if you change Suppliers. The Suppliers can create certificates through the ACM console or through the CLI, API, or AWS CloudFormation. Manufacturers are only sharing the ability to create, manage, bind, and export certificates from the CA hierarchy. The CA hierarchy itself is contained within the Manufacturers’ accounts, and not within the Suppliers’ accounts. By using AWS RAM, the Suppliers don’t have any administrator access to the CA hierarchy. From a cost perspective, you can centrally control and monitor the costs of your private CA hierarchy without having to deal with cost-sharing across Suppliers.

Refer to How to use AWS RAM to share your ACM Private CA cross-account for a full walkthrough on how to use RAM with ACM Private CA.

Certificate templates with AWS RAM managed permissions

AWS RAM has the ability to create managed permissions in order to define the actions that can be performed on shared resources. For each shareable resource type, you can use AWS RAM managed permissions to define which permissions to grant to whom for shared resource types that support additional managed permissions. This means that when you use AWS RAM to share a resource (in this case ACM Private CA), you can now specify which IAM actions can take place on that resource. AWS RAM managed permissions integrate with the following ACM Private CA certificate templates:

Permission 1: BlankEndEntityCertificate_APICSRPassthrough
Permission 2: EndEntityClientAuthCertificate
Permission 3: EndEntityServerAuthCertificate
Permission 4: subordinatesubordinateCACertificate_PathLen0
Permission 5: RevokeCertificate

These five certificate templates allow a Manufacturer to scope its Suppliers to the certificate template provisioning level. This means that you can limit which certificate templates can be issued by the Suppliers.

Let’s assume you have a Supplier that is supplying a module that has infotainment media capability, and you, the manufacturer, want the Supplier to provision the end-entity client certificate but you don’t want them to be able to revoke that certificate. You can use AWS RAM managed permissions to scope that Supplier’s shared private CA to allow the EndEntityClientAuthCertificate issuance template, which implicitly denies RevokeCertificate template actions. This further scopes down what the Supplier is authorized to issue on the shared CA, gives the responsibility for revoking infotainment device certificates to the Manufacturer, but still allows the Supplier to load devices with a certificate upon creation.

Example of creating a resource share in AWS RAM by using the AWS CLI

This walkthrough shows you the general process of sharing a private CA by using AWS RAM and then accepting that shared resource in the partner account.

Create your shared resource in AWS RAM from the Manufacturer subordinate CA account. Notice that in the example that follows, we selected one of the certificate templates within the managed permissions option. This limits the shared CA so that it can only issue certain types of certificate templates.

Note: Replace the <variable> placeholders with your own values.
```
aws ram create-resource-share
		--name Shared_Private_CA
		--resource-arns arn:aws:acm-pca:<region:111122223333>:certificate-authority/<xxxx-xxxx-xxxx-xxxx-example>
		--permission-arns "arn:aws:ram::aws:permission/<AWSRAMBlankEndEntityCertificateAPICSRPassthroughIssuanceCertificateAuthority>"
		--principals <444455556666>
```
From the Supplier account, the Supplier administrator will accept the resource. Follow How to use AWS RAM to share your ACM Private CA cross-account to complete the shared resource acceptance and issue an end entity certificate.

Conclusion

In this blog post, you learned about the various considerations for building a secure public key infrastructure (PKI) hierarchy by using ACM Private CA through an example customer’s prescriptive setup. You learned how you can use AWS RAM to share CAs across accounts easily and securely. You also learned about sharing specific CAs through the ability to define permissions to specific principals across accounts, allowing for granular control of permissions on principals that might act on those resources.

The main takeaways of this post are how to create least privileged roles within IAM in order to scope down the activities of each persona and limit the potential scope of impact for your organization’s private CA hierarchy. Although these best practices are specific to manufacturer business requirements, you can alter them based on your business needs. With the managed permissions in AWS RAM, you can further scope down the actions that principals can perform with your CA by limiting the certificate templates allowed on that CA when you share it. Using all of these tools, you can help your PKI hierarchy to have a high level of security. To learn more, see the other ACM Private CA posts on the AWS Security Blog.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

How to Run Massively Scalable ADAS Simulation Workloads on CAEdge

2022-02-10 Hendrik Schoeneberg

Post Syndicated from Hendrik Schoeneberg original https://aws.amazon.com/blogs/architecture/how-to-run-massively-scalable-adas-simulation-workloads-on-caedge/

This post was co-written by Hendrik Schoeneberg, Sr. Global Big Data Architect, The An Binh Nguyen, Product Owner for Cloud Simulation at Continental, Autonomous Mobility – Engineering Platform, Rumeshkrishnan Mohan, Global Big Data Architect, and Junjie Tang, Principal Consultant at AWS Professional Services.

AV/ADAS simulations processing large-scale field sensor data such as radar, lidar, and high-resolution video come with many challenges. Typically, the simulation workloads are spiky with occasional, but high compute demands, so the platform must scale up or down elastically to match the compute requirements. The platform must be flexible enough to integrate specialized ADAS simulation software, use distributed computing or HPC frameworks, and leverage GPU accelerated compute resources where needed.

Continental created the Continental Automotive Edge (CAEdge) Framework to address these challenges. It is a modular multi-tenant hardware and software framework that connects the vehicle to the cloud. You can learn more about this in Developing a Platform for Software-defined Vehicles with Continental Automotive Edge (CAEdge) and Developing a platform for software-defined vehicles (re:Invent session-AUT304).

In this blog post, we’ll illustrate how the CAEdge Framework orchestrates ADAS simulation workloads with Amazon Managed Workflows for Apache Airflow (MWAA). We’ll show how it delegates the high-performance workloads to AWS Batch for elastic, highly scalable, and customizable compute needs. We’ll showcase the “bring your own software-in-the-loop” (BYO-SIL) pattern, detailing how to leverage specialized and proprietary simulation software in your workflow. We’ll also demonstrate how to integrate the simulation platform with tenant data in the CAEdge framework. This orchestrate and delegate pattern has previously been introduced in Field Notes: Deploying Autonomous Driving and ADAS Workloads at Scale with Amazon Managed Workflows for Apache Airflow and the Reference Architecture for Autonomous Driving Data Lake.

Solution overview for autonomous driving simulation

The following diagram shows a high-level overview of this solution.

Figure 1. Architecture diagram for autonomous driving simulation

It can be broken down into five major parts, illustrated in Figure 1.

Simulation API: Amazon API Gateway (label 1) provides a REST API for authenticated users to schedule or monitor simulation. It runs on Amazon Managed Workflows for Apache Airflow (MWAA) using AWS Lambda (label 2).
Simulation control plane: The simulation control plane (label 3) built on MWAA enables users to design and initiate workflows and integrate with AWS services.
Scalable compute backend: We leverage the parallelization and elastic scalability capabilities of AWS Batch to distribute the workload (label 4). Additionally, we want to be able to run proprietary software components as part of the simulation workflows and use highly customizable Amazon EC2 compute environments. For example, we can use GPU acceleration or Graviton-based instances for workloads that must run on ARM, instead of x86 architectures.
Autonomous Driving Data Lake (ADDL) integration: The simulations’ input and output data will be stored in a data lake (label 5) on Amazon S3. To provide efficient read and write access, data gets copied before each simulation run using RAID0 bundled ephemeral instance storage drives. After the simulation, results are written back to the data lake and are ready for reporting and analytics. We use AWS Lake Formation for metadata storage, data cataloging, and permission handling.
Automated deployment: The solution architecture’s deployment is fully automated using a CI/CD pipeline in AWS CodePipeline and AWS CodeBuild (label 6). We can then perform automated testing and deployment to multiple target environments.

In this blog we will focus on the simulation control plane, scalable compute backend, and ADDL integration.

Simulation control plane

In a typical simulation, there are tasks like data movement (copying input data and persisting the results). These steps can require high levels of parallelization or GPU support. In addition, they can involve third-party or proprietary software, specialized runtime environments, or architectures like ARM. To facilitate all task requirements, we will delegate task initiation to the AWS services that best match the requirements. Correct sequence of tasks is achieved by an orchestration layer. Decoupling the simulation orchestration from task initiation follows the key design pattern to separate concerns. This enables the architecture to be adapted to specific requirements.

For the high-level task orchestration, we’ll introduce a simulation control plane built on MWAA. Modeling all simulation tasks in a directed acyclic graph (DAG), MWAA performs scheduling and task initiation in the correct sequence. It also handles the integration with AWS’ services. You can intuitively monitor and debug the progress of a simulation run across different services and networks. For the workflow within CAEdge, we’ll use AWS Batch to run simulations that are packaged as Docker containers.

Parameterizable workflows: Airflow supports parameterization of DAGs by providing a JSON object when triggering a DAG run that can be accessed at runtime. This enables us to create a simulation execution framework in which we model the simulation steps. It lets you specify the simulations’ parameters such as which Docker container to execute, runtime configuration, or input and output data locations, see Figures 2 and 3.

Figure 2. Specify simulations’ parameters in JSON

Figure 3. Read simulations’ parameters from JSON

Scalable compute backend

Publishing the Docker image: To run a simulation, you must provide a Docker container in Amazon Elastic Container Registry (ECR). Manually push Docker images to ECR using the command line interface (CLI) or using CI/CD pipeline automation.

Choosing the compute option: Containers at AWS describes the services options for developers. Various factors can contribute to your decision-making. For the CAEdge platform, we want to run thousands of containers with fine-grained control over the underlying compute instance’s configuration. We also want to run containers in privileged mode, so AWS Batch on EC2 is a great match. Figure 4 outlines the compute backend’s architecture.

Figure 4. Compute backend architecture diagram

To run a Docker container on AWS Batch, we need three components: A job definition, a compute environment, and a job queue. The AWS Batch job definition specifies which job to run and how to run it. For example, we can define the Docker image to use, and specify the vCPU and memory configuration. The job definition will be submitted to a job queue, which in turn is linked to one or more compute environments. The compute environment specifies the compute configuration used to run a containerized workload, in addition to instance types and storage configurations. Creating a compute environment describes this more in detail. In the section ‘ADDL integration,’ we’ll describe how to select and configure the EC2 container instances to maximize network and disk I/O.

The AWS Batch array job feature can spin up thousands of independent, but similarly configured jobs. Instead of submitting a single job for each input file from the simulation control plane, we can instead submit a single array job. This provides the collection of input files as input. AWS Batch can spin up child jobs to process each entry of the collection in parallel, reducing operational overhead drastically.

With these components, we can now submit a job definition to a job queue from the simulation control plane, which will initiate on the corresponding compute environment.

ADDL integration

In the CAEdge platform, all simulation recordings and their metadata are stored and cataloged in the data lake. On average, single recordings are around 100 GB in size with simulation requests containing 100–300 recordings. A simulation request can therefore require 10–30 TB of data movement from S3 to the containers before the simulation starts. The containers’ performance will directly depend on I/O performance, as the input data is read and processed during execution. To provide the highest performance of data transfer and simulation workload, we need data storage options that maximize network and disk I/O throughput.

Choosing the storage option: For CAEdge’s simulation workloads, the storage solution acts as a temporary scratch space for the simulation containers’ input data. It should maximize I/O throughput. These requirements are met in the most cost-efficient way by choosing EC2 instances of the M5d family and their attached instance storage.

The M5d are general-purpose instances with NVMe-based SSDs, which are physically connected to the host server and provide block-level storage coupled to the lifetime of the instance. As described in the previous section, we configured AWS Batch to create compute environments using EC2’s M5d instances. While other storage options like Amazon Elastic File System (EFS) and Amazon Elastic Block Storage (EBS) can scale to match even demanding throughput scenarios, the simulation benefits from a high-performance, temporary storage location that is directly attached to the host.

Bundling the volumes: M5d instances can have multiple NVMe drives attached, depending on their size and storage configuration. To provide a single stable storage location for the simulation containers, bundle all attached NVMe SSDs into a single RAID0 volume during the instance’s launch, using a modified user data script. With this, we can provide a fixed storage location that can be accessed from the simulation containers. Additionally, we are distributing I/O operations evenly across all disks in the RAID0 configuration, which improves the read and write performance.

KPIs used for measurement

The SIL factor tells you how long the simulation takes compared to the duration of the recorded data.

SIL factor example:

For a recording with 60 minutes of data and a simulation duration of 120 minutes, the resulting SIL factor is 120 / 60 = 2. The simulation runs twice as long as real time.
For a recording with 60 minutes of data and a simulation duration of 60 minutes, the resulting SIL factor is 60 / 60 = 1. The simulation runs in real time.

Similarly, the aggregated SIL factor tells you how long the simulation for multiple recordings takes compared to the duration of the recorded data. It also factors in the horizontal scaling capabilities.

Aggregated SIL factor example:

For 3 recordings with 60 minutes of data and an overall simulation duration of 120 minutes, the resulting SIL factor is 120 / 3 * 60 = 0.67. The overall simulation is a third faster than real time.

Performance results

The storage optimizations described in the ADDL integration KPI section preceding, led to an improvement of the overall simulation duration of 50%. To benchmark the overall simulation platform’s performance, we created a simulation run with 15 recordings. The simulation run completed successfully with an aggregated SIL factor of 0.363, or, alternatively put, the simulation was roughly three times faster than real time. During production use, the platform will handle simulation runs with an average of 100-300 recordings. For these runs, the aggregated SIL factor is expected to be even smaller, as more simulations can be processed in parallel.

Conclusion

In this post, we showed how to design a platform for ADAS simulation workloads using the Bring Your Own Software-In-the-Loop (BYO-SIL) concept. We covered the key components including simulation control plane, scalable compute backend, and ADDL integration based on Continental Automotive Edge (CAEdge). We discussed the key performance benefits due to horizontal scalability and how to choose an optimized storage integration pattern. This led to an improvement of the overall simulation duration of 50%.

ZABBIX – Open-Source Monitoring Software for Automotive Monitoring

2022-01-17 Dmitry Lambert

Post Syndicated from Dmitry Lambert original https://blog.zabbix.com/zabbix-open-source-monitoring-software-for-automotive-monitoring/18776/

In this article, I will try to cover the theoretical models on monitoring your vehicle fleet with minimal to no cost at all by using the ELM327 microcontroller, a python library to process the collected data and a Zabbix proxy running on a small Raspberry Pi device to store and sent the collected metrics to the central Zabbix server.

Expanding the scope of our Zabbix instance

The first thing that comes to mind when someone mentions a monitoring system is pretty simple. People think about server monitoring, and with servers, we usually mean Linux and Windows systems and also network monitoring for all kinds of flavors of switches, routers, firewalls, etc. But by putting so much focus on these standard things, we are someway limiting the possibilities of monitoring systems. Zabbix has proven itself as an extremely powerful monitoring tool that can combine and monitor all client infrastructure – no matter if we are talking about the aforementioned servers, network devices, services, applications, or anything else. And most important – Zabbix is truly a 100% open-source product, which allows anyone to use all listed functionalities for free.

Please, keep in mind that no doubt there are systems available that are created exactly for the same purpose I will cover here. Maybe they are more reliable. Perhaps they require less effort to achieve the desired result. But that is the exact reason why the presented model is mostly theoretical, with the primary goal being to show that it is hard to put Zabbix in some functionality boundaries. Usually, the only limitation is our imagination. And it is up to you to treat this information for pure entertainment or try to implement it in a place where you find it suitable.

Monitoring a car fleet

Let’s get straight to the point. You don’t need to own a huge logistics company with a thousand vehicle fleet to understand it. In simple words – if you or any of your relatives own a car, you should be aware that cars tend to break. Just like it usually happens, there could be many types of issues, starting with a flat tire and ending with some ongoing damage in the gearbox or engine. It is important to understand that vehicles themselves are becoming smarter. If in the past it was a purely mechanical device, then nowadays it is a highly complicated electronic system on top of that mechanical device that can diagnose the slightest deviations from accepted norms that are set by the manufacturer and inform about this malfunction either with an indicator light on your dash or simply with a log message that will be accessible only when read with specialized software or tools.

Keep in mind that malfunctions in vehicles are not as simple as boolean ( works or not ). In most cases an issue is noticed before the car is not able to move forward, and the purpose of that is to be informed and fix the issue before it has turned into a defect that actually prevents the vehicle from functioning.

And now think about this from an automotive business perspective. We may be talking about hundreds of vehicles that are always on the move to deliver something or someone in time. It should be straightforward that in such a niche business, each of these vehicles should be able to traverse close to a thousand kilometers per day.

Thankfully, as mentioned previously, the smart diagnostic system will let you know about all the potential problems. On the other hand, the driver of the vehicle usually has nothing to do with its repairs or technical condition. So in a perfect world, we should have a few technical employees that would simply ask returning driver whether everything is fine with the car after his shift – are there any errors, and connect with diagnostic software to read its logs to make sure that everything actually is ok. If it’s not ok, information should be passed to the technical department to move this vehicle to the maintenance.

Why such pressure? Well, remember that most of these notifications serve as a warning that something is not working as it should, but currently, it is not causing harm. However, if it is ignored, there is a high chance that at some point, the vehicle will not be able to continue its way to the destination.

So this is where Zabbix joins the conversation. Imagine if all this data transfer from the vehicle diagnostic system to the responsible employees would happen automatically, with potential error prioritization and escalation to further levels if any vehicle has an ongoing issue that remains active for multiple days. And remember that Zabbix is a completely free and open-source system, which means that we could achieve this result for free. And we are absolutely not limited to DTC ( Diagnostic Trouble Codes ) readings. Combining this ecosystem with the recent Zabbix 6.0 LTS release, we can create a geomap with the current location of any vehicle from our fleet. With a little effort, we can also get speed measurements, long stops, starts, and much more.

This is the part when the tested but still theoretical model comes into action. By now, we are aware that a car is way smarter than it may look, and it gathers and stores a lot of useful information. However, the Zabbix monitoring system as per the most common standard sits somewhere in our headquarters and monitors generic metrics of our IT infrastructure. So how could we potentially get this information from our vehicle to Zabbix?

Since all information is stored in ECU (Electronic Control Unit), there is also a way to read it. And it is achieved through OBD (On-Board Diagnostic) socket through the standardized protocol. Just like anything else, OBD has multiple versions or protocols of communications. Still, if we are talking about seamlessly modern cars, most likely we are talking about OBD-II, which included Electronic signaling and messaging format.

Using ELM327 to gather data

precisely OBD-II will help us to gather all information from the vehicle, to further transfer it to our Zabbix monitoring system. Initially, this may yet sound very unclear because we have some kind of socket to access our ECU, but how can we actually gather some meaningful data? For that, we will need ELM327

ELM327 is a programmed microcontroller produced by ELM Electronics for translating the OBD interface. Even today ELM327 command protocol is one of the most popular PC to OBD interface standards. Typically ELM abstracts the low-level protocol and presents a simple interface that can be called via UART, typically by a hand-held diagnostic tool or a computer program connected by USB, RS-232, Bluetooth, or WiFi. In our case, we don’t need and don’t have any dedicated diagnostic tool, so we will have to use something else to work with OBD-II and translate all incoming data. With the ELM-327, it is straightforward. You can purchase an ELM327 OBD2-Bluetooth adapter on Amazon for a couple of dollars, and it will be enough to provide the required functionality.

Data processing with Python-OBD and Raspberry Pi

As it usually happens, for all things that we need, we can find a Python library published under GPLv2. And as you already noticed from the screenshot, we are not limited to stored DTC values. In addition to that, we are able to read live data from our vehicle, such as speed, fuel pressure, coolant temperature, intake temperature, and much more.

The closer we get to the result, the simpler the task starts to look. At this point, we basically have everything that we need. We have the data, and we have the interface from which to read it. ELM327 allows us to transport this data to our device, and the python library will enable us to translate and process this information, therefore allowing sending clean data to our Zabbix. The only open question is what device should we use in our vehicle, on which we could run our Python script, and which would have GSM access to transfer gathered data to the Zabbix server. In my example choice was as simple as cheap – Raspberry Pi.

And then it’s a matter of choice when you have Raspberry set up on a vehicle, connected via Bluetooth or any other way to your ELM327, that is plugged into an OBD-II connector. With Python script running on Pi device to receive and process data from our ECM, we need to decide what piece of software from Zabbix we want on this device.

Zabbix proxy for data storage

Considering that the car could be driving through different areas where internet coverage could not be the best, but we also don’t want to lose any data simply because there was no connection, I think it is best to install Zabbix proxy on Raspberry Pi.

Zabbix proxy perfectly suits such a small setup and helps us with its main purpose. Proxy has a local database that stores all information that has to be sent to our Zabbix server. If because of some networking trouble this data can’t be passed to our server, it will be kept in the local database for a moment when a network connection is restored and data is sent. Luckily for us, Zabbix has Official Packages for Raspberry Pi OS, so we don’t need to tailor any magic around it.

The functionality of the Zabbix proxy allows us to choose between two modes ( Active and Passive ), which basically allows us to choose the direction of communication. It might not be the cheapest approach to purchase a static IP address for each unit. Therefore we will be using Zabbix Proxy (Active), which simply will connect to our Zabbix server and send all gathered information. Of course, there are security measures for validation to make sure that only designated devices will be able to send data to a server. If an even more secure approach is required, users may choose to use TLS encryption with PSK or Certificates.

Collecting the current latitude and longitude

Previously I mentioned, that with the new Geomap widget, it is possible to achieve a live view of the current location from all your fleet on a single dashboard. To do that, we obviously need live latitude and longitude readings, which ECU and stock Raspberry Pi are not able to provide. But this is the beauty of Raspberry Pi. With minimal investments, we can purchase a GPS unit and combine it with our Pi.

With a very simplified Python script, we can gather all required data, and move it to our Zabbix proxy that is installed on localhost, which then will parse this information to our Zabbix Server that will allow us to see it in the dashboard. As this is not a very native and straightforward approach to monitoring, we won’t be able to use native item types to collect this data. This means that all the collection must be done within the script, and then we need to pass this data using the Zabbix-sender utility. The purpose of this utility is very simple, without any complications, take data that is provided and send it to a specified Hostname.

Since Zabbix has a very powerful preprocessing engine, we don’t have to make our script over-complicated with data transformation to meet guidelines for data visualization within Zabbix. We can send raw data, just like it is, and then use any suitable preprocessing step in the Zabbix frontend to extract the value we need to visualize.

The many uses of the collected data

When the data arrives in the Latest data in our Zabbix frontend, consider the most complicated part of this task is done. And just like before the idea of automotive monitoring with Zabbix, the only limitation is your imagination. You can simply collect this data without any actions. Monitor it on your own, from time to time, just to see if you can do anything meaningful with it.

You are also able to utilize a wide list of trigger functions within Zabbix to define that it is a problem when some particular value is received. For example, when some DTC appeared on a device, or let’s say, the average speed of the vehicle exceeds a threshold. Maybe you want to set some borders for coordinates, and if a particular vehicle gets outside of a specified radius, it could raise a problem in your monitoring system.

It is up to you how to react to these triggers. It could be just a flashing light on your Problems view within the Zabbix frontend. It could also automatically create incidents for your maintenance team with a message that a particular vehicle has worn out brake pads that has to be replaced. But maybe if these brake pads are not replaced for a full week since the first time it was noticed, you want to receive a personalized message on your mobile phone so that you can escalate this issue further.

No secret that there are flaws and downsides. As I mentioned right in beginning, there are software and devices that are developed and adopted exactly for this purpose, however, my approach may not be 100% reliable. Data transfer from ECU is not as live as reading CPU utilization from your computer. All of this is just a reminder that monitoring is not limited to network devices and servers. And Zabbix, which is growing every year, provides more and more features to its users while remaining absolutely free and open-source, is here to support all your ideas and help them come to life.

The post ZABBIX – Open-Source Monitoring Software for Automotive Monitoring appeared first on Zabbix Blog.

Developing a Platform for Software-defined Vehicles with Continental Automotive Edge (CAEdge)

2022-01-06 Martin Stamm

Post Syndicated from Martin Stamm original https://aws.amazon.com/blogs/architecture/developing-a-platform-for-software-defined-vehicles-with-continental-automotive-edge-caedge/

This post was co-written by Martin Stamm, Principal Expert SW Architecture at Continental Automotive, Andreas Falkenberg, Senior Consultant at AWS Professional Services, Daniel Krumpholz, Engagement Manager at AWS Professional Services, David Crescence, Sr. Engagement Manager at AWS, and Junjie Tang, Principal Consultant at AWS Professional Services.

Automakers are embarking on a digital transformation journey to become more agile, efficient, and innovative. As part of this transformation, Continental created Continental Automotive Edge (CAEdge) – a modular multi-tenant hardware and software framework that connects the vehicle to the cloud. Continental collaborated with Amazon Web Services (AWS) to develop and scale this framework.

At this AWS re:Invent session, Continental and AWS demonstrated the new and transformative vehicle architectures and software built with CAEdge. These will provide future vehicle manufacturers, Original equipment manufacturers (OEMs) and partners with a multi-tenant development environment for software-intensive vehicle architectures. These can be used to implement software, sensor and big data solutions in a fraction of the development time needed before. As a result, vehicle software can be developed and tested more efficiently, then securely and rolled out directly to vehicles. The framework is already being tested in an automotive manufacturer’s series development.

Addressing core automotive industry pain points

Continental, OEMs and other major Tier 1 companies are required to quickly adapt to new technology requirements without knowing capacity or scaling needs, while at the same time staying ahead of the market. Developers are facing several challenges, in particular the processing of huge amounts of data. For example, a single test vehicle for AV/ADAS generates 20 – 100 TB of data per day. The handling of these data sets and the time to availability in distributed sites can cause major delays in development cycles. Delays are also experienced by developers due to the high numbers of test cases in simulation and validation. In an on-premises environment, this poses significant costs and scaling challenges to provide the required capacity.

The pace of the required transformation to becoming a software-centric organization is creating new challenges and opportunities like:

Current electronic architectures are decentralized, expensive, and complex to develop therefore difficult to maintain and extend.
Vehicle and cloud converge require new software (SW)-defined architectures, integration and operations competencies.
Digital Lifecycle Management enables new business models, go- to-market strategies and partnerships.

In addition to the distribution of huge datasets and distributed work setups is a need for cutting edge security technologies. Encryption at transfer/rest, data residency laws, and secure developer access are common security challenges and are addressed using CAEdge technology.

In this blog post, we describe how to build a secure multi-tenant AWS environment that forms the foundation for CAEdge. We discuss how AWS is helping Continental build the base infrastructure that allows for fast onboarding of OEMs, partners and suppliers through a highly automated process. Development activities can start within hours, instead of days or weeks; with a bootstrapped development environment for software-intensive vehicle architectures. This is all while meeting the strictest security and compliance requirements.

Overview of the CAEdge Framework

The following diagram gives an overview of the CAEdge Framework:

Architecture Diagram showing the CAEdge Platform

Figure 1 – Architecture Diagram showing the CAEdge Framework

The framework is based on the following modular building blocks:

Scalable Compute Platform – High Performance, embedded computer with automotive software stack and connection to the AWS cloud.
Cloud – Cloud services for developers and end-users.
DevOps Workbench – Toolchain for software development and maintenance covering the entire software lifecycle.

The building blocks of the framework are defined by clear API operations and can be integrated easily for various use cases, such as different middleware or CI / CD pipelines.

Overview of the CAEdge Multi-Tenant Framework

Continentals’ core architecture and terminology for a vehicle software development framework include:

CAEdge Framework as an Isolated AWS Organization – Continental’s CAEdge framework runs in a dedicated AWS Organization. Only CAEdge-related workloads are hosted in this AWS Organization. This ensures separation from any other workloads outside of the CAEdge context. The CAEdge framework provides multiple central security, access management, and orchestration services to its users.
Isolated Tenants – The framework is fully tenant-aware. A tenant is an isolated entity that represents an OEM, OEM sub-division, partner, or supplier. A key feature of this system is to ensure complete isolation from one tenant to another. We use a defense-in-depth security approach to ensure tenant separation.
Tenant-Owned Resources and Services – Each tenant has a dedicated set of resources and services that can be consumed and used by all tenant users and services. Tenant-owned resources and services include, but are not limited to:
- Dedicated, tenant-specific data lake,
- Tenant specific logging, monitoring, and operations,
- Tenant-specific UI.
Projects – Each tenant can host an arbitrary number of projects with 1-N users assigned to them. A project is a high-level construct with the goal to create a unique product or service, such as a new “Door Lock” system software. The term project is used in a broad scope. Anything can be classified as a project.
Workbenches – A project consists of 1-N well-defined workbenches. A workbench represents a fully configured development environment of a specific “Workbench Type”. For example, a workbench of type “Simulation” allows for configuration and execution of Simulation Jobs based on drive data. Each workbench is implemented via a well-defined number of AWS Accounts, which is called an AWS Account Set.
- An AWS Account Set always includes at least a Toolchain Account, Dev Account, QA Account and Prod Account. All AWS Accounts are baselined with IAM resources, security services and potentially workbench specific blueprints so development can start quickly for the end-user.

The following diagram illustrates the high-level architecture:

Figure 2 – High-level architecture diagram

The CAEdge framework uses a data mesh architecture using AWS Lake Formation and Glue to create the tenant-level data lake. The Reference Architecture for Autonomous Driving Data Lake is used to design the Simulation workbench.

Implementation Details

With the core architecture and terminology defined, let’s look at the implementation details of the architecture that was described in the preceding image.

Isolated Tenants – Achieving a High Degree of Separation

To achieve a multi-tenant environment, we followed a multi-layered security hardening approach:

Tenant Separation on AWS Account Level: Starting at the AWS Account level, we used individual AWS Accounts where possible. An AWS account can never be assigned to more than one tenant. The functional scope of an AWS Account is kept as small as possible. This increases the number of total AWS Accounts, but significantly reduces the blast radius in case of any breach or incident. Just to give an example:
- Each Dev, QA, and Prod Stage of a Workbench is its own AWS Account. No AWS Account ever combines multiple stages at once.
- Each CAEdge tenant-owned data lake consists of multiple AWS Accounts. A data lake also requires updates as time passes. To allow for side-effect free and well tested updates of the data lake-infrastructure, each tenant comes with a Dev, QA, and Prod data lake.
Tenant Separation via a well-defined Organizational Unit (OU) structure and Service Control Policies (SCP): Each Tenant gets assigned a dedicated OU structure with multiple sub-OUs. This allows for tenant-specific security hardening on SCP-level and potential custom security hardening, in case dedicated tenants have specific security requirements. The SCPs are designed in such a way to allow for a maximum degree of freedom for the individual AWS Account user; while at the same time protecting the integrity of CAEdge and while staying compliant and secure according to specific requirements.
Tenant Separation through an AWS Account Metadata-Layer and automated IAM assignments: The CAedge framework uses a central Amazon DynamoDB database that maps AWS Accounts to Tenants and stores any other Metadata in the Context of an AWS Account. This includes including the Account Owner, Account Type, and Cost-related information. With this database, we can query AWS Accounts based on specific Tenants, Projects, and Workbenches. Furthermore, this forms the foundation of a fully automated permission and AWS Account access-management capability that enforces any Tenant, Project and Workbench boundary.
Tenant Separation Security Controls via AWS Security Services: On top of the standard AWS security services, such as AWS GuardDuty, AWS Config, AWS Inspector and AWS SecurityHub, we use IAM Access Analyzer in combination with our DynamoDB Account Metadata Store to detect the creation of any cross-account permissions that span outside of the AWS Organization, or that may have Cross-Tenant implications.

Automated creation and management of Tenant-Owned Resources and Services, Projects and Workbenches

CAEdge follows the “Everything-as-an-API Approach” and is designed as a fully open platform on the internet. All key features are exposed via a secured, public API. This includes the creation of Projects, Workbenches, and AWS Accounts including the management of access rights in a self-service manner for authorized users, as well as any updates affecting subsequent long-term management. This can only be achieved through a very high degree of automation.

We architect the following services to achieve this high degree of automation:

AWS Control Tower – An AWS managed service for account creation and OU assignment.
AWS Deployment Framework (AWS ADF) – an extensive and flexible framework to manage and deploy resources across multiple AWS Accounts and Regions within an AWS Organization. We use ADF to baseline all accounts with the resources required. This includes all security services, default IAM Roles, network related resources, such as VPCs and DNS and any other resources specific to the AWS Account type.
AWS Single Sign-On (AWS SSO) – A central IAM solution to control access to AWS Accounts. AWS SSO assignments are fully automated based on our defined access patterns using our custom Dispatch application and an extended version of the AWS SSO Enterprise solution.
AWS DynamoDB – A fully managed NoSQL database service for storing tenant, project and AWS Account data. Including information related to ownership, cost management, access management.
Dispatch CAEdge Web Application – A fully serverless web application that exposes functionality to end-users via API calls. It handles authentication, authorization, and provides business logic in the form of AWS Lambda functions to orchestrate all of the aforementioned services.

The following diagram provides a high-level overview of the automation mechanism at the platform level:

Figure 3 – High-level overview of the automation mechanism

With this solution in place, Continental enables OEMs, suppliers, and other partners to spin up developer workbenches in a tenant context within minutes; thereby reducing the setup time from weeks to minutes using a self-service approach.

Conclusion

In this post, we showed how Continental built a secure multi-tenant platform that serves as the scalable foundation for software-intensive, vehicle-related workloads. For other organizations experiencing challenges when transforming into a software-centric organization, this solution eases the onboarding process so developers can start building within hours instead of months.

Ingesting Automotive Sensor Data using DXC RoboticDrive Ingestor on AWS

2021-12-17 Bryan Berezdivin

Post Syndicated from Bryan Berezdivin original https://aws.amazon.com/blogs/architecture/ingesting-automotive-sensor-data-using-dxc-roboticdrive-ingestor-on-aws/

This post was co-written by Pawel Kowalski, a Technical Product Manager for DXC RoboticDrive and Dr. Max Böhm, a software and systems architect and DXC Distinguished Engineer.

To build the first fully autonomous vehicle, L5 standard per SAE, auto-manufacturers collected sensor data from test vehicle fleets across the globe in their testing facilities and driving circuits. It wasn’t easy collecting, ingesting and managing massive amounts of data, and that’s before it’s used for any meaningful processing, and training the AI self-driving algorithms. In this blog post, we outline a solution using DCX RoboticDrive Ingestor to address the following challenges faced by automotive original equipment manufacturers (OEMs) when ingesting sensor data from the car to the data lake:

Each test drive alone can gather hundreds of terabytes of data per fleet. A few vehicles doing test fleets can produce petabyte(s) worth of data in a day.
Vehicles collect data from all the sensors into local storage with fast disks. Disks must offload data to a buffer station and then to a data lake as soon as possible to be reused in the test drives.
Data is stored in binary file formats like ROSbag, ADTF or MDF4, that are difficult to read for any pre-ingest checks and security scans.
On-premises (ingest location) to Cloud data ingestion requires a hybrid solution with reliable connectivity to the cloud.
Ingesting process depositing files into a data lake without organizing them first can cause discovery and read performance issues later in the analytics phase.
Orchestrating and scaling an ingest solution can involve several steps, such as copy, security scans, metadata extraction, anonymization, annotation, and more.

Overview of the RoboticDrive Ingestor solution

The DXC RoboticDrive Ingestor (RDI) solution addresses the core requirements as follows:

Cloud ready: Implements a recommended architectural pattern where data from globally located logger copy stations are ingested into the cloud data lake in one or more AWS Regions over AWS Direct Connect or Site-to-Site VPN connections.
Scalable: Runs multiple steps as defined in an Ingest Pipeline, containerized on Amazon EKS based Amazon EC2 compute nodes that can auto-scale horizontally and vertically on AWS. Steps can be for example, Virus Scan, Copy to Amazon S3, Quality Check, Metadata Extraction, and others.
Performant: Uses the Apache Spark-based DXC RoboticDrive Analyzer component to process large amounts of sensor data files efficiently in parallel, which will be described in a future blog post. Analyzer can read and processes native automotive file formats, extract and prepare metadata, and populate metadata into the DXC RoboticDrive Meta Data Store. The maximum throughput of network and disk is leveraged and data files are placed on the data lake following a hierarchical prefix domain model, that aligns with the best practices around S3 performance.
Operations ready: Containerized deployment, custom monitoring UI, visual workflow management with MWAA, Amazon CloudWatch based monitoring for virtual infrastructure and cloud native services.
Modular: Programmatically add or customize steps into the managed Ingest Pipeline DAG, defined in Amazon Managed Workflows for Apache Airflow (MWAA).
Security: IAM based access policy and technical roles, Amazon Cognito based UI authentication, virus scan, and an optional data anonymization step.

The ingest pipeline copies new files to S3 followed by user configurable steps, such as Virus Scan or Integrity Checks. When new files have been uploaded, cloud native S3 event notifications generate messages in an Amazon SQS queue. These are then consumed by serverless AWS Lambda functions or by containerized workloads in the EKS cluster, to asynchronously perform metadata extraction steps.

Walkthrough

There are two main areas of the ingest solution:

On-Premises devices – these are dedicated copy stations in the automotive OEM’s data center where data cartridges are physically connected. Copy stations are easily configurable devices that automatically mount inserted data cartridges and share its content via NFS. At the end of the mounting process, they call an HTTP post API to trigger the ingest process.
Managed Ingest Pipeline – this is a Managed service in the cloud to orchestrate the ingest process. As soon as a cartridge is inserted into the Copy Station, mounted and shared via NFS, the ingest process can be started.

Figure 1 – Architecture showing the DXC RoboticDrive Ingestor (RDI) solution

Depending on the customer’s requirements, the ingest process may differ. It can be easily modified using predefined Docker images. A configuration file defines what features from the Ingest stack should be used. It can be a simple copy from the source to the cloud but we can easily extend it with additional steps, such as a virus scan, special data partitioning, data quality checks on even scripts provided by the automotive OEM.

The ingest process is orchestrated by a dedicated MWAA pipeline built dynamically from the configuration file. We can easily track the progress, see the logs and, if required, stop or restart the process. Progress is tracked at a detailed level, including the number and volume of data uploaded, remaining files, estimated finish time, and ingest summary, all visualized in a dedicated ingest UI.

Prerequisites

AWS Site-To-Site VPN (in production an AWS Direct Connect connection is used instead)
Amazon Managed Workflows for Apache Airflow (MWAA)
Managed Kubernetes Service (Amazon EKS)
Data provided on source side, shared via a Network File System (NFS)

Ingest pipeline

To build an MWAA pipeline, we need to choose from prebuilt components of the ingestor and add them into the JSON configuration, which is stored in an Airflow variable:

{
    "tasks" : [
        {
            "name" : "Virus_Scan",
            "type" : "copy",
            "srcDir" : "file:///mnt/ingestor/CopyStation1/",
            "quarantineDir" : "s3a://k8s-ingestor/CopyStation1/quarantine/”
        },
        {
            "name" : "Copy_to_S3",
            "type" : "copy",
            "srcDir" : "file:///mnt/ingestor/car01/test.txt",
            "dstDir" : "s3a://k8s-ingestor/copy_dest/",
            "integrity_check" : false
        },
        {
            "name" : "Quality_Check",
            "type" : "copy",
            "srcDir" : "file:///mnt/ingestor/car01/test.txt",
            "InvalidFilesDir" : "s3a://k8s-ingestor/CopyStation1/invalid/",
        }
    ]
}

The final ingest pipeline, a Directed Acyclic Graph (DAG), is built automatically from the configuration and changes are visible within seconds:

Figure 2 – Screenshot showing the final ingest pipeline, a Directed Acyclic Graph (DAG)

If required, external tools or processes can also call ingest components using the Ingest API, which is shown in the following screenshot:

Figure 3 – Screenshot showing a tool being used to call ingest components using the Ingest API

The usual ingest process consists of the following steps:

Ingest data source (cartridge) into the Copy Station

2. Wait for Copy Station to report proper cartridge mounting. In case a dedicated device (Copy Station) is not used, and you want to use an external disc as your data source, it can be mounted manually:

mount -o ro /dev/sda /media/disc

Here, /dev/sda is the disc containing your data and /media/disc is the mountpoint, which must also be added to /etc/exports.

3. Every time the MWAA ingest pipeline (DAG) is triggered, a Kubernetes job is created through a Helm template and Helm values which add the above data source as a persistent volume (PV) mount inside the job.

Helm template is as follows:

apiVersion: v1

kind: PersistentVolume

metadata:

name: {{ .name }}

namespace: {{ .Values.namespace }}

spec:

capacity:

storage: {{ .storage }}

accessModes:

- {{ .accessModes }}

nfs:

server: {{ .server }}

path: {{ .path }}

Helm values:

name: "YourResourceName"

server: 10.1.1.44

path: "/cardata"

mountPath: "/media/disc"

storage: 100Mi

accessModes: ReadOnlyMany

4. Trigger ingest manually from MWAA UI – this can also be automatically triggered from a Copy Station by calling MWAA API:

curl -X POST "https://roboticdrive/api/v1/dags/ingest_dag/dagRuns" -H "accept: application/json" -H "Content-Type: application/json" -d "{\"dag_run_id\":\"IngestDAG\"}"

5. Watch ingest progress – this can be omitted as the ingest results, such as success, failure, can be notified by email or reported to a ticketing system.

6. Once successfully uploaded, turn off the Copy Station manually or it can be the last task of a flow in the following ingest DAG:

(…)

def ssh_operator(task_identifier: str, cmd: str):

return SSHOperator(task_id=task_identifier,

command=cmd + ' ',

remote_host=CopyStation,

ssh_hook=sshHook,

trigger_rule=TriggerRule.NONE_FAILED,

do_xcom_push=True,

dag= IngestDAG)

host_shutdown = ssh_operator(task_identifier='host_shutdown', \

cmd='sudo shutdown –h +5 &')

When launched, you can track the progress and basic KPIs on the Monitoring UI:

Figure 5 - Screenshot basic KPIs on the Monitoring UI

Figure 4 – Screenshot basic KPIs on the Monitoring UI

Conclusion

In this blog post, we showed how to set up the DXC RoboticDrive Ingestor. This solution was designed to overcome several data ingestion challenges and resulted in the following positive results:

Time-to-Analyze KPI decreased by 30% in our tests, local offload steps become obsolete, and data offloaded from cars were accessible on a globally accessible data-lake (S3) for further processing and analysis.
MWAA integrated with EKS made the solution flexible, fault tolerant, as well as easy to manage and maintain.
Autoscaling the compute capacity as needed for any pre-processing steps within ingestion, helped with boosting performance and productivity.
On-premises to AWS cloud connectivity options provided flexibility in configuring and scaling (up and down), and provides optimal performance and bandwidth.

We hope you found this solution insightful and look forward to hearing your feedback in the comments!

Field Notes: Building a Data Service for Autonomous Driving Systems Development using Amazon EKS

2021-11-30 Ajay Vohra

Post Syndicated from Ajay Vohra original https://aws.amazon.com/blogs/architecture/field-notes-building-a-data-service-for-autonomous-driving-systems-development-using-amazon-eks/

Many aspects of autonomous driving (AD) system development are based on data that capture real-life driving scenarios. Therefore, research and development professionals working on AD systems need to handle an ever-changing array of interesting datasets composed from the real-life driving data. In this blog post, we address a key problem in AD system development, which is how to dynamically compose interesting datasets from real-life driving data and serve them at scale in near real-time.

The first challenge in composing large interesting datasets is high latency. If you have to wait for the entire dataset to be composed before you can start consuming the dataset, you may have to wait for several minutes, or even hours. This latency slows down AD system research and development. The second challenge is creating a data service that can cost-efficiently serve the dynamically composed datasets at scale. In this blog post, we propose solutions to both these challenges.

For the challenge of high latency, we propose dynamically composing the data sets as chunked data streams, and serving them using a Amazon FSx for Lustre high-performance file-system. Chunked data streams immediately solve the latency issue, because you do not need to compose the entire stream before it can be consumed. For the challenge of cost-efficiently serving the datasets at scale, we propose using Amazon EKS with auto-scaling features.

Overview of the Data Service Architecture

The data service described in this post dynamically composes and serves data streams of selected sensor modalities for a specified drive scene selected from the A2D2 driving dataset. The data stream is dynamically composed from the extracted A2D2 drive scene data stored in Amazon S3 object data store, and the accompanying meta-data stored in an Amazon Redshift data warehouse. While the data service described in this post uses the Robot Operating System (ROS), the data service can be easily adapted for use with other robotic systems.

The data service runs in Kubernetes Pods in an Amazon EKS cluster configured to use a Horizontal Pod Autoscaler and EKS Cluster Autoscaler. An Amazon Managed Service For Apache Kafka (MSK) cluster provides the communication channel between the data service, and the data clients. The data service implements a request-response paradigm over Apache Kafka topics. However, the response data is not sent back over the Kafka topics. Instead, the data service stages the response data in Amazon S3, Amazon FSx for Lustre, or Amazon EFS, as specified in the data client request, and only the location of the staged response data is sent back to the data client over the Kafka topics. The data client directly reads the response data from its staged location.

The data client running in a ROS enabled Amazon EC2 instance plays back the received data stream into ROS topics, whereby it can be nominally consumed by any ROS node subscribing to the ROS topics. The solution architecture diagram for the data service is shown in Figure 1.

Figure 1 – Data service solution architecture with default configuration

Data Client Request

Imagine the data client wants to request drive scene data in ROS bag file format from A2D2 autonomous driving dataset for vehicle id a2d2, drive scene id 20190401145936, starting at timestamp 1554121593909500 (microseconds) , and stopping at timestamp 1554122334971448 (microseconds). The data client wants the response to include data only from the camera/front_left sensor encoded in sensor_msgs/Image ROS data type, and the lidar/front_left sensor encoded in sensor_msgs/PointCloud2 ROS data type. The data client wants the response data to be streamed back chunked in series of rosbag files, each file spanning 1000000 microseconds of the drive scene. The data client wants the chunked response rosbag files to be staged on a shared Amazon FSx for Lustre file system.

Finally, the data client wants the camera/front_left sensor data to be played back on /a2d2/camera/front_left ROS topic, and the lidar/front_left sensor data to be played back on /a2d2/lidar/front_left ROS topic.

The data client can encode such a data request using the following JSON object, and send it to the Kafka bootstrap servers b-1.msk-cluster-1:9092,b-2.msk-cluster-1:9092 on the Apache Kafka topic named a2d2.

{
 "servers": "b-1.msk-cluster-1:9092,b-2.msk-cluster-1:9092",
 "requests": [{
    "kafka_topic": "a2d2", 
    "vehicle_id": "a2d2",
    "scene_id": "20190401145936",
    "sensor_id": ["lidar/front_left", "camera/front_left"],
    "start_ts": 1554121593909500, 
    "stop_ts": 1554122334971448,
    "ros_topic": {"lidar/front_left": "/a2d2/lidar/front_left", 
    "camera/front_left": "/a2d2/camera/front_left"},
    "data_type": {"lidar/front_left": "sensor_msgs/PointCloud2",
    "camera/front_left": "sensor_msgs/Image"},
    "step": 1000000,
    "accept": "fsx/multipart/rosbag",
    "preview": false
 }]
}

At any given time, one or more EKS pods in the data service are listening for messages on the Kafka topic a2d2. The EKS pod that picks the request message responds to the request by composing the requested data as a series of rosbag files, and staging them on FSx for Lustre, as requested in the "accept": "fsx/multipart/rosbag" field.

Each rosbag in the response is dynamically composed from the drive scene data stored in Amazon S3, using the meta-data stored in Amazon Redshift. Each rosbag contains drive scene data for a single time step. In the preceding example, the time step is specified as "step": 1000000 (microseconds).

Visualizing the Data Service Response

If a human is interested in visualizing the data response, one can use any ROS visualization tool. One such tool is rviz. This tool can be run on the ROS desktop. In the following screenshot, we show the visualization of the response using rviz tool for the example data request shown previously.

Figure 2 – Visualization of response using rviz tool

Dynamically Transforming the Coordinate Frames

The data service supports dynamically transforming the composed data from one coordinate frame to another frame. A typical use case is to transform the data from a sensor specific coordinate frame to AV (ego) coordinate frame. Such transformation request can be included in the data client request.

For example, imagine the data client wants to compose a data stream from all the LiDAR sensors, and transform the point cloud data into the vehicle’s coordinate frame. The example configuration c-config-lidar.json allows you to do that. Following is a visualization of the LiDAR point cloud data transformed to the vehicle coordinate frame and visualized in the rviz tool from a top-down perspective.

Figure 3 – Top-down rviz visualization of point-cloud data transformed to ego vehicle view

Walkthrough

In this walkthrough, we use the A2D2 autonomous driving dataset. The complete code for this walk-through and reference documentation is available in the associated Github repository. So before we get into the walk-through, clone the Github repository on your laptop using the Git clone command. Next, ensure these prerequisites are satisfied.

The approximate cost of the walk-through of this tutorial with default configuration is US $2,000. The actual cost may vary considerably based on actual configuration, and the duration used for the walk-through.

Configure the data service

To configure the data service, we need to create a new AWS CloudFormation stack in the AWS console using the cfn/mozart.yml template from the cloned repository on your laptop.

This template creates AWS Identity and Access Management (IAM) resources, so when you create the CloudFormation Stack using the console, in the review step, you must check I acknowledge that AWS CloudFormation might create IAM resources. The stack input parameters you must specify are the following:

Parameter Name table

For all other stack input parameters, default values are recommended during the first walkthrough. Review the complete list of all the template input parameters in the Github repository reference.

Once the stack status in CloudFormation console is CREATE_COMPLETE, find the ROS desktop instance launched in your stack in the Amazon EC2 console, and connect to the instance using SSH as user ubuntu, using your SSH key pair. The ROS desktop instance is named as <name-of-stack>-desktop.
When you connect to the ROS desktop using SSH, and you see the message "Cloud init in progress. Machine will REBOOT after cloud init is complete!!", disconnect and try later after about 20 minutes. The desktop installs the NICE DCV server on first-time startup, and reboots after the install is complete.
If the message NICE DCV server is enabled!appears, run the command sudo passwd ubuntu to set a new strong password for user ubuntu. Now you are ready to connect to the desktop using the NICE DCV client.
Download and install the NICE DCV client on your laptop.
Use the NICE DCV Client to login to the desktop as user ubuntu
When you first login to the desktop using the NICE DCV client, you may be asked if you would like to upgrade the OS version. Do not upgrade the OS version.

Now you are ready to proceed with the following steps. For all the commands in this blog, we assume the working directory to be ~/amazon-eks-autonomous-driving-data-service on the ROS desktop.

If you used an IAM role to create the stack above, you must manually configure the credentials associated with the IAM role in the ~/.aws/credentials file with the following fields:

aws_access_key_id=

aws_secret_access_key=

aws_session_token=

If you used an IAM user to create the stack, you do not have to manually configure the credentials. In the working directory, run the command:

./scripts/configure-eks-auth.sh

When successfully running this command, the following confirmation appears AWS Credentials Removed.

Configure the EKS cluster environment

In this step, we configure the EKS cluster environment by running the command:

./scripts/setup-dev.sh

This step also builds and pushes the data service container image into Amazon ECR.

Prepare the A2D2 data

Before we can run the A2D2 data service, we need to extract the raw A2D2 data into your S3 bucket, extract the metadata from the raw data, and upload the metadata into the Redshift cluster. We execute these three steps using an AWS Step Functions state machine. To create and run the AWS Step Functions state machine, run the following command in the working directory:

./scripts/a2d2-etl-steps.sh

Note the executionArn of the state machine execution in the output of the previous command. To check the status of the execution, use following command, replacing executionArn below with your value:

The state machine execution time depends on many variable factors, and may take anywhere from 4 – 24 hours, or possibly longer. All the AWS Batch jobs started as part of the state machine automatically reattempt in case of failure.

Run the data service

The data service is deployed using a Helm Chart, and runs as a kubernetes deployment in EKS. To start the data service, execute the following command in the working directory:

kubectl get pods -n a2d2

Run the data service client

To visualize the response data requested by the A2D2 data client, we will use the rviz tool on the ROS desktop. Open a terminal on the desktop, and run rviz.

In the rviz tool, use File>Open Config to select /home/ubuntu/amazon-eks-autonomous-driving-data-service/a2d2/config/a2d2.rviz as the rviz config. You should notice that the rviz tool is now configured with two areas, one for visualizing image data, and the other for visualizing point cloud data.

To run the data client, open a new terminal on the desktop, and execute the following command in the root directory of the cloned Github repository on the ROS desktop:

python ./a2d2/src/data_client.py --config ./a2d2/config/c-config-ex1.json 1>/tmp/a.out 2>&1 &

After a brief delay, you should be able to preview the response data in the rviz tool. You can set "preview": false in the data client config file, ./a2d2/config/c-config-ex1.json, and rerun the preceding command to view the complete response. For maximum performance, pre-load S3 data to FSx for Lustre.

Hard reset of the data service

This step is for reference purposes. If at any time you need to do a hard reset of the data service, you can do so by executing:

helm delete a2d2-data-service

This will delete all data service EKS pods immediately. All in-flight service responses will be aborted. Because the connection between the data client and data service is asynchronous, the data clients may wait indefinitely, and you may need to cleanup the data client processes manually on the ROS desktop using operating system tools. Note, each data client instance spawns multiple Python processes. You may also want to cleanup /fsx/rosbag directory.

Clean Up

When you no longer need the data service, delete the AWS CloudFormation stack from the AWS CloudFormation console. Deleting the stack will shut down the desktop instance, and delete the EFS and FSx for Lustre file-systems created in the stack. The Amazon S3 bucket is not deleted.

Conclusion

In this post, we demonstrated how to build a data service that can dynamically compose near real-time chunked data streams at scale using EKS, Redshift, MSK, and FSx for Lustre. By using a data service, you increase agility, flexibility and cost-efficiency in AD system research and development.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Deploy and Visualize ROS Bag Data on AWS using rviz and Webviz for Autonomous Driving

2021-08-26 Aubrey Oosthuizen

Post Syndicated from Aubrey Oosthuizen original https://aws.amazon.com/blogs/architecture/field-notes-deploy-and-visualize-ros-bag-data-on-aws-using-rviz-and-webviz-for-autonomous-driving/

In the automotive industry, ROS bag files are frequently used to capture drive data from test vehicles configured with cameras, LIDAR, GPS, and other input devices. The data for each device is stored as a topic in the ROS bag file. Developers and engineers need to visualize and inspect the contents of ROS bag files to identify issues or replay the drive data.

There are a couple of challenges to be addressed by migrating the visualization workflow into Amazon Web Services (AWS):

Search, identify, and stream scenarios for ADAS engineers. Visualization tool should be ready instantly, load the data for a certain scenario over a search API, and show the first result through data streaming, to provide a good user experience.
Native integration with the tool chain. Many customers implement the Data Catalog, data storage, and search API in AWS native services. This visualization tool should be integrated into such a tool chain directly.

Overview of solutions

This blog post describes three solutions on how to deploy and visualize ROS bag data on AWS by using two popular visualization tools:

rviz is the standard open-source visualization tool used by the ROS community and has a large set of tools and plugin support.
Webviz is an open-source tool created by Cruise that provides modular and configurable browser-based visualization.

In the autonomous driving data lake reference architecture, both visualization tools are covered in the step 10: Provide an advanced analytics and visualization toolchain including search function for particular scenarios using AWS AppSync, Amazon QuickSight (KPI reporting and monitoring), and Webviz, rviz, or other tooling for visualization.

Prerequisites

Confirm you have followed the guide for working with the AWS CDK in TypeScript
Install the AWS Command Line Interface (AWS CLI) v2
Configure the AWS CLI
For solution 2: Any Virtual Network Computing (VNC) compatible viewer can be used. You can find the binaries for your OS for TigerVNC from GitHub.

Solution 1 – Visualize ROS bag files using rviz on AWS RoboMaker virtual desktops

AWS RoboMaker provides simulation and testing infrastructure for robotics as a managed service. This includes out of the box support for virtual desktops with ROS tooling preconfigured and installed. When you launch a virtual desktop, AWS RoboMaker launches the NICE DCV web browser client. This client provides access to your AWS Cloud9 desktop and streaming applications.

Launch rviz on AWS RoboMaker virtual desktop

Follow the guide on creating a new development environment to provision a new integrated development environment (IDE) and open it. After your AWS Cloud9 IDE is open, you can launch a new virtual desktop by pressing the Virtual Desktop button at the top center of the IDE, and selecting Launch Virtual Desktop. This might take a couple of seconds to open in a new browser tab.

After your virtual desktop is loaded, you can run rviz by opening the terminal and running the following commands:

$ source /opt/ros/melodic/setup.bash
$ roscore &
$ rosrun rviz rviz

Solution 2 – Visualize ROS bag files using rviz on EC2 and TigerVNC

Note: We strongly recommend using the AWS RoboMaker managed solution for provisioning virtual desktops for your visualization needs. In cases where this is not possible due to different Linux distributions or versions, this solution allows an alternative method for setting up a virtual desktop on EC2.

In this solution (source code) we use AWS Cloud Development Kit (AWS CDK) to deploy a new Ubuntu 18.04 AMI EC2 instance to your AWS account and preconfigure it with rviz, TigerVNC, and Ubuntu Desktop.

Figure 1. Architecture for solution 2 (visualize ROS bag files using rviz on EC2 and TigerVNC)

Open a shell terminal that has your AWS CLI configured. Run the following commands to clone the code and install the corresponding nodejs dependencies.

$ git clone https://github.com/aws-samples/aws-autonomous-driving-data-lake-ros-bag-visualization-using-rviz.git rviz-infra
$ cd rviz-infra
$ npm install

Note: Review the README to understand the project structure and commands.

Next, configure your project-specific settings, like your AWS Account, Region, VNC password, VPC to deploy the Amazon EC2 machine into, and EC2 instance type by running the bootstrap script:

$ npm run project-bootstrap

You will be prompted for various inputs to bootstrap the project, including the VNC password to use. Most of these input values will be stored into your cdk.json file, and the VNC password will be stored in the AWS Systems Manager Parameter Store, a capability of AWS Systems Manager.

Run the following command to deploy your stack into your AWS account.

$ cdk deploy

After the stack has been deployed, and the EC2 instance provisioned, its user-data script will initiate and install TigerVNC and the required ROS tooling.

To see the progress, let’s connect to the instance using SSH and then tail the bootstrapping log.

$ ./ssm.sh ssh
$ sudo su ubuntu
$ tail -f /var/log/cloud-init-output.log

It takes approximately 15 minutes for the user-data bootstrapping script to finish. When it finishes, you will see the message “rviz-setup bootstrapping completed. You can now log in via VNC”.

Open a new shell terminal in the project root and start a port forwarding session using SSM through the ssm helper script:

$ ./ssm.sh vnc

After you see the waiting for connections output, you can open your VNC viewer and connect to localhost:5901.

When prompted for a password, enter the one you used when previously running the bootstrapping script.

You now have access to your Ubuntu Desktop environment.

Running rviz

If you opted to install sample data, the Ford AV Sample Data has been downloaded and installed on the EC2 instance already. To visualize it in rviz, a few helper scripts have been created and can be run by opening a new terminal and initiating the following commands:

$ cd /home/ubuntu/catkin_ws
$ ./0-run-all.sh

This helper script will open and run rviz on the Ford sample data.

Figure 2. Ford AV Dataset visualized with rviz on Ubuntu Desktop

You should now be able to use this server to run and visualize your ROS bag files using rviz.

Solution 3 – Visualize ROS bag files using Webviz on Amazon Elastic Container Service (Amazon ECS)

The third solution (source code) uses AWS CDK to deploy Webviz as a container running on AWS Fargate, fronted by an Application Load Balancer (ALB). In addition, the Infrastructure as Code (IaC) can either create a new Amazon S3 bucket or import an existing one. The S3 bucket would have its cross-origin resource sharing (CORS) rules updated to allow streaming bag files from your ALB domain.

The bucket and ROS bag files won’t need to be made public since we will use presigned Amazon S3 URLs for authorizing the streaming of the files.

Finally, the AWS CDK code deploys an AWS Lambda function that can be invoked to generate a properly formatted Webviz streaming URL that contains the HTTP encoded and presigned URL for streaming your bag files directly in the browser.

Figure 3. Architecture for solution 3 (visualize ROS bag files using Webviz on Amazon ECS)

To clone the repository and install its dependencies.

$ git clone https://github.com/aws-samples/aws-autonomous-driving-data-lake-ros-bag-visualization-using-webviz.git webviz-infra
$ cd webviz-infra
$ npm install

Note: Review the project README to understand the different files, project structures, and commands.

Next, modify the cdk.json to specify your specific project configurations (for example, region, bucket name, and whether you wish to import an existing bucket or create a new one).

{
  "context": {
    ....
    "bucketName": "<bucket_name>", // [required] Name of bucket to use or create
    "bucketExists": true, // [required] Should create or update existing bucket
    "generateUrlFunctionName": "generate_ros_streaming_url", // [required] Name of lambda function
    "scenarioDB": { // [optional] Configuration of SceneDescription table
      "partitionKey": "bag_file",
      "sortKey": "scene_id",
      "region": "eu-west-1",
      "tableName": "SceneDescriptions"
    }
  }
}

To deploy the CDK stack into your AWS account run the following command.

$ cdk deploy

You can now access your Webviz instance by opening the URL value for the webvizserviceServiceURL output from the previous deploy command.

Figure 4. Example showing Webviz being served by our ALB

Custom layouts for Webviz can be imported through json configs. The solution contains an example config in the project root. This custom layout contains the topic configurations and window layouts specific to our ROS bag format and should be modified according to your ROS bag topics.

Select Config → Import/Export Layout
Copy and paste the contents of layout.json

Figure 5. Configuring custom layout for Webviz

Next, you need some ROS bag files to stream into your S3 bucket. If you configured AWS CDK to use an existing bucket, and the bucket already contains some ROS bag files, then you can skip the next step.

Upload ROS bag files

You can use the AWS CLI to copy a local bag file to the specified S3 bucket using the aws s3 cp command. You can also copy files between S3 buckets with the aws s3 cp command.

Generate streaming URL with helper script

The code repository contains a Python helper script in the project root to invoke your deployed Lambda function generate_ros_streaming_url locally with the correct payload.

To run the helper script, run the following command in your terminal:

$ python get_url.py \
--bucket-name <bucket_name> \
--key <ros_bag_key>

Response format: http://webviz-lb-<account>.<region>.elb.amazonaws.com?remote-bag-url=<presigned-url>

The response URL can be opened directly in your browser to visualize your targeted ROS bag file.

Generate a streaming URL through Lambda function in the AWS Console

You can generate a streaming URL by invoking your Lambda function generate_ros_streaming_url with the following example payload in the AWS console.

{
"key": "<ros_bag_key>",
"bucket": "<bucket_name>",
"seek_to": "<ros_timestamp>"
}

The seek_to value informs the Lambda function to add a parameter to jump to the specified ROS timestamp when generating the streaming URL.

Example output:

{
"statusCode": 200,
"body":
"{\"url\": \"http://webviz-lb-<domain>.<region>.elb.amazonaws.com?remote-bag-url=<PRESIGNED_ENCODED_URL>&seek-to=<ros_timestamp>\"}"
}

This body output URL can be opened in your browser to start visualizing your bag files directly.

By using a Lambda function to generate the streaming URL, you have the flexibility to integrate it with other user interfaces or dashboards. For example, if you use Amazon QuickSight to visualize different detected scenarios, you can define a customer action to invoke the Lambda function through API Gateway to get a streaming URL for the target scenario.

Similarly, custom web applications can be used to visualize the scenes and their corresponding ROS bag files stored in a metadata store. Invoke the Lambda function from your web server to generate and return a visualization URL that can be use by the web application.

Using the streaming URL

Open the streaming URL in your browser. If you added a seek_to value while generating the URL, it should jump to that point in the bag file.

Figure 6. Example visualization of ROS bag file streamed from Amazon S3

That’s it. You should now start to see your ROS bag file being streamed directly from Amazon S3 in your browser.

Deploying Webviz as part of the Autonomous Driving Data Lake Reference Solution

This solution forms part of the autonomous driving data lake solution which consists of a reference architecture and corresponding field notes and open-source code modules:

If this Webviz solution is deployed in conjunction with Building an automated scene detection pipeline for Autonomous Driving – ADAS Workflow (ASD) it supports some out-of-the-box integration with solution 3.

You can configure the solution’s cdk.json to specify the relevant values for the SceneDescription table created by the ASD CDK code. Redeploy the stack after changing this using $ cdk deploy.

With these values, the Lambda function generate_ros_streaming_url now supports an additional payload format:

{
"record_id": “<scene_description_table_partition_key>”,
"scene_id": “<scene_description_table_sort_key>”
}

The get_url.py script also supports the additional scene lookup parameters. To look up a scene stored in your SceneDescription table run the following commands in your terminal:

$ python get_url.py –-record <scene_description_table_partition_key> --scene <scene_description_table_sort_key>

Invoking the generate_ros_streaming_url with the record and scene parameters will result in a lookup of the ROS bag file for the scene from DynamoDB, presigning the ROS bag file and returning an URL to stream the file directly in your browser.

Cleaning Up

To clean up the AWS RoboMaker development environment, review Deleting an Environment.

For the CDK application, you can destroy your CDK stack by running $ cdk destroy from your terminal. Some buckets will need to be manually emptied and deleted from the AWS console.

Conclusion

This blog post illustrated how to deploy two common tools used to visualize ROS bag files, using three different solutions. First, we showed you how to set up an AWS RoboMaker development environment and run rviz. Second, we showed you how to deploy an Amazon EC2 machine that automatically configures Ubuntu-Desktop with TigerVNC and rviz preinstalled. Third, we showed you how to deploy Webviz on Fargate and configure a bucket to allow streaming bag files. Finally, you learned how streaming URLs can be generated and integrated into your custom scenario detection and visualization tools.

We hope you found this post interesting and helpful in extending your autonomous vehicle solutions, and invite your comments and feedback.

Field Notes: Building an automated scene detection pipeline for Autonomous Driving – ADAS Workflow

2021-08-06 Kevin Soucy

Post Syndicated from Kevin Soucy original https://aws.amazon.com/blogs/architecture/field-notes-building-an-automated-scene-detection-pipeline-for-autonomous-driving/

This Field Notes blog post in 2020 explains how to build an Autonomous Driving Data Lake using this Reference Architecture. Many organizations face the challenge of ingesting, transforming, labeling, and cataloging massive amounts of data to develop automated driving systems. In this re:Invent session, we explored an architecture to solve this problem using Amazon EMR, Amazon S3, Amazon SageMaker Ground Truth, and more. You learn how BMW Group collects 1 billion+ km of anonymized perception data from its worldwide connected fleet of customer vehicles to develop safe and performant automated driving systems.

Architecture Overview

The objective of this post is to describe how to design and build an end-to-end Scene Detection pipeline which:

ingests ROS bag files using Amazon Elastic Container Service (ECS), AWS Fargate, Amazon S3 and Amazon Elastic File System (EFS)
performs scene analytics like synchronization of ROS bag topics and object lane assignment using Spark on Amazon EMR
exposes Scene Metadata to downstream consumers via Amazon S3 and Amazon DynamoDB

This architecture integrates an event-driven ROS bag ingestion pipeline running Docker containers on Elastic Container Service (ECS). This includes a scalable batch processing pipeline based on Amazon EMR and Spark. The solution also leverages AWS Fargate, Spot Instances, Elastic File System, AWS Glue, S3, and Amazon Athena.

reference architecture - build automated scene detection pipeline - Autonomous Driving

Figure 1 – Architecture Showing how to build an automated scene detection pipeline for Autonomous Driving

The data included in this demo was produced by one vehicle across four different drives in the United States. As the ROS bag files produced by the vehicle’s on-board software contains very complex data, such as Lidar Point Clouds, the files are usually very large (1+TB files are not uncommon).

These files usually need to be split into smaller chunks before being processed, as is the case in this demo. These files also may need to have post-processing algorithms applied to them, like lane detection or object detection.

In our case, the ROS bag files are split into approximately 10GB chunks and include topics for post-processed lane detections before they land in our S3 bucket. Our scene detection algorithm assumes the post processing has already been completed. The bag files include object detections with bounding boxes, and lane points representing the detected outline of the lanes.

Prerequisites

This post uses an AWS Cloud Development Kit (CDK) stack written in Python. You should follow the instructions in the AWS CDK Getting Started guide to set up your environment so you are ready to begin.

You can also use the config.json to customize the names of your infrastructure items, to set the sizing of your EMR cluster, and to customize the ROS bag topics to be extracted.

You will also need to be authenticated into an AWS account with permissions to deploy resources before executing the deploy script.

Deployment

The full pipeline can be deployed with one command: * `bash deploy.sh deploy true` . The progress of the deployment can be followed on the command line, but also in the CloudFormation section of the AWS console. Once deployed, the user must upload 2 or more bag files to the rosbag-ingest bucket to initiate the pipeline.

The default configuration requires two bag files to be processed before an EMR Pipeline is initiated. You would also have to manually initiate the AWS Glue Crawler to be able to explore the parquet data with tools like Athena or Quicksight.

ROS bag ingestion with ECS Tasks, Fargate, and EFS

This solution provides an end-to-end scene detection pipeline for ROS bag files, ingesting the ROS bag files from S3, and transforming the topic data to perform scene detection in PySpark on EMR. This then exposes scene descriptions via DynamoDB to downstream consumers.

The pipeline starts with an S3 bucket (Figure 1 – #1) where incoming ROS bag files can be uploaded from local copy stations as needed. We recommend, using Amazon Direct Connect for a private, high-throughout connection to the cloud.

This ingestion bucket is configured to initiate S3 notifications each time an object ending in the prefix “.bag” is created. An AWS Lambda function then initiates a Step Function for orchestrating the ECS Task. This passes the bucket and bag file prefix to the ECS task as environment variables in the container.

The ECS Task (Figure 1 – #2) runs serverless leveraging Fargate as the capacity provider, This avoids the need to provision and autoscale EC2 instances in the ECS cluster. Each ECS Task processes exactly one bag file. We use Elastic FileStore to provide virtually unlimited file storage to the container, in order to easily work with larger bag files. The container uses the open-source bagpy python library to extract structured topic data (for example, GPS, detections, inertial measurement data,). The topic data is uploaded as parquet files to S3, partitioned by topic and source bag file. The application writes metadata about each file, such as the topic names found in the file and the number of messages per topic, to a DynamoDB table (Figure 1 – #4).

This module deploys an AWS Glue Crawler configured to crawl this bucket of topic parquet files. These files populate the AWS Glue Catalog with the schemas of each topic table and make this data accessible in Athena, Glue jobs, Quicksight, and Spark on EMR. We use the AWS Glue Catalog (Figure 1 – #5) as a permanent Hive Metastore.

Figure 2 – Glue Data Catalog of parquet datasets on S3

Figure 3 – Run ad-hoc queries against the Glue tables using Amazon Athena

The topic parquet bucket also has an S3 Notification configured for all newly created objects, which is consumed by an EMR-Trigger Lambda (Figure 1 – #5). This Lambda function is responsible for keeping track of bag files and their respective parquet files in DynamoDB (Figure 1 – #6). Once in DynamoDB, bag files are assigned to batches, initiating the EMR batch processing step function. Metadata is stored about each batch including the step function execution ARN in DynamoDB.

Figure 4 – EMR pipeline orchestration with AWS Step Functions

The EMR batch processing step function (Figure 1 – #7) orchestrates the entire EMR pipeline, from provisioning an EMR cluster using the open-source EMR-Launch CDK library to submitting Pyspark steps to the cluster, to terminating the cluster and handling failures.

Batch Scene Analytics with Spark on EMR

There are two PySpark applications running on our cluster. The first performs synchronization of ROS bag topics for each bagfile. As the various sensors in the vehicle have different frequencies, we synchronize the various frequencies to a uniform frequency of 1 signal per 100 ms per sensor. This makes it easier to work with the data.

We compute the minimum and maximum timestamp in each bag file, and construct a unified timeline. For each 100 ms we take the most recent signal per sensor and assign it to the 100 ms timestamp. After this is performed, the data looks more like a normal relational table and is easier to query and analyze.

Figure 5 – Batch Scene Analytics with Spark on EMR

Scene Detection and Labeling in PySpark

The second spark application enriches the synchronized topic dataset (Figure 1 – #8), analyzing the detected lane points and the object detections. The goal is to perform a simple lane assignment algorithm for objects detected by the on-board ML models and to save this enriched dataset (Figure 1 – #9) back to S3 for easy-access by analysts and data scientists.

Figure 9 – Object Lane Assignment example

Figure 9 – Synchronized topics enriched with object lane assignments

Finally, the last step takes this enriched dataset (Figure 1 – #9) to summarize specific scenes or sequences where a person was identified as being in a lane. The output of this pipeline includes two new tables as parquet files on S3 – the synchronized topic dataset (Figure 1 – #8) and the synchronized topic dataset enriched with object lane assignments (Figure 1 – #9), as well as a DynamoDB table with scene metadata for all person-in-lane scenarios (Figure 1 – #10).

Scene Metadata

The Scene Metadata DynamoDB table (Figure 1 – #10) can be queried directly to find sequences of events, as will be covered in a follow up post for visually debugging scene detection algorithms using WebViz/RViz. Using WebViz, we were able to detect that the on-board object detection model labels Crosswalks and Walking Signs as “person” even when a person is not crossing the street, for example:

Example DynamoDB item from the Scene Metadata table

Figure 10 – Example DynamoDB item from the Scene Metadata table

These scene descriptions can also be converted to Open Scenario format and pushed to an ElasticSearch cluster to support more complex scenario-based searches. For example, downstream simulation use cases or for visualization in QuickSight. An example of syncing DynamoDB tables to ElasticSearch using DynamoDB streams and Lambda can be found here (https://aws.amazon.com/blogs/compute/indexing-amazon-dynamodb-content-with-amazon-elasticsearch-service-using-aws-lambda/). As DynamoDB is a NoSQL data store, we can enrich the Scene Metadata table with scene parameters. For example, we can identify the maximum or minimum speed of the car during the identified event sequence, without worrying about breaking schema changes. It is also straightforward to save a dataframe from PySpark to DynamoDB using open-source libraries.

As a final note, the modules are built to be exactly that, modular. The three modules that are easily isolated are:

the ECS Task pipeline for extracting ROS bag topic data to parquet files
the EMR Trigger Lambda for tracking incoming files, creating batches, and initiating a batch processing step function
the EMR Pipeline for running PySpark applications leveraging Step Functions and EMR Launch

Clean Up

To clean up the deployment, you can run bash deploy.sh destroy false. Some resources like S3 buckets and DynamoDB tables may have to be manually emptied and deleted via the console to be fully removed.

Limitations

The bagpy library used in this pipeline does not yet support complex or non-structured data types like images or LIDAR data. Therefore its usage is limited to data that can be stored in a tabular csv format before being converted to parquet.

Conclusion

In this post, we showed how to build an end-to-end Scene Detection pipeline at scale on AWS to perform scene analytics and scenario detection with Spark on EMR from raw vehicle sensor data. In a subsequent blog post, we will cover how how to extract and catalog images from ROS bag files, create a labelling job with SageMaker GroundTruth and then train a Machine Learning Model to detect cars.

Field Notes: Deploying Autonomous Driving and ADAS Workloads at Scale with Amazon Managed Workflows for Apache Airflow

2021-08-05 Hendrik Schoeneberg

Post Syndicated from Hendrik Schoeneberg original https://aws.amazon.com/blogs/architecture/field-notes-deploying-autonomous-driving-and-adas-workloads-at-scale-with-amazon-managed-workflows-for-apache-airflow/

Cloud Architects developing autonomous driving and ADAS workflows are challenged by loosely distributed process steps along the tool chain in hybrid environments. This is accelerated by the need to create a holistic view of all running pipelines and jobs. Common challenges include:

finding and getting access to the data sources specific to your use case, understanding the data, moving and storing it,
cleaning and transforming it and,
preparing it for downstream consumption.

Verifying that your data pipeline works correctly and ensuring you provide a certain data quality while your application is being developed adds to the complexity. Furthermore, the fact that the data itself constantly changes poses more challenges. In this blog post, we show you the steps to follow from a workflow run in an Amazon Managed Workflows for Apache Airflow (MWAA) environment. All required infrastructure is created leveraging the AWS Cloud Development Kit (CDK).

We illustrate how image data collected during field operational tests (FOTs) of vehicles can be processed by:

detecting objects within the video stream
adding the bounding boxes of all detected objects to the image data
providing visual and metadata information for data cataloguing as well as potential applications on the bounding boxes.

We define a workflow that processes ROS bag files on Amazon S3, and extracts individual PNGs from a video stream using AWS Fargate on Amazon Elastic Container Service (Amazon ECS). Then we show how to use Amazon Rekognition to detect objects within the exported images and draw bounding boxes for the detected objects within the images. Finally, we demonstrate how users’ interface with MWAA to develop and publish their workflows. We also show methods to keep track of the performance and costs of your workflows.

Overview of the solution

Figure 1 – Architecture for deploying Autonomous Driving & ADAS workloads at scale

The preceding diagram shows our target solution architecture. It includes five components:

Amazon MWAA environment: we use Amazon MWAA to orchestrate our image processing workflow’s tasks. A workflow, or DAG (directed acyclic graph), is a set of task definitions and dependencies written in Python. We create a MWAA environment using CDK and define our image processing pipeline as a DAG which MWAA can orchestrate.
ROS bag file processing workflow: this is a workflow run on Managed Workflows for Apache Airflow. This detects and highlights objects within the camera stream of a ROS bag file. The workflow is as follows:
- Monitors the location in Amazon S3 for the new ROS bag files.
- If a new file gets uploaded, we extract video data from the ROS bag file’s topic that holds the video stream, extract individual PNGs from the video stream and store them on S3.
- We use AWS Fargate on Amazon ECS to run a containerized image extraction workload.
- Once the extracted images are stored on S3, they are passed to Amazon Rekognition for object detection. We then provide an image or video to the Amazon Rekognition API, and the service can identify objects, people, text, scenes, and activities. Amazon Rekognition Image can return the bounding box for many object labels.
- We use the information to highlight the detected objects within the image by drawing the bounding box.
- The bounding boxes of the objects Amazon Rekognition detects within the extracted PNGs are rendered directly into the image using Pillow, a fork of the Python Imaging Library (PIL).
- The resulting images are then stored in Amazon S3 and our workflow is complete.
Predefined DAGs: Our solution comes with a pre-defined ROS bag image processing DAG that will be copied into the Amazon MWAA environment with CDK.
AWS CodePipeline for automated CI/CD pipeline: CodePipeline automates the build, test, and deploy phases of your release process every time there is a code change, based on the release model you define
Amazon CloudWatch for monitoring and operational dashboard: With Amazon CloudWatch we can monitor important metrics of the ROS bag image processing workflow, for example. to verify the compute load, number of running workflows or tasks or to monitor and alert on processing errors.

Prerequisites

Prepare an AWS account to be used for the installation
AWS CLI installed and configured to point to your AWS account
Use a named profile to access your AWS account
Install the AWS Cloud Development Kit
Our example uses Python version 3.7 or newer
Docker installed and running
Download the sample ROS bag file

Deployment

To deploy and run the solution, we follow these steps:

Configure and create an MWAA environment in AWS using CDK
Copy the ROS bag file to S3
Manage the ROS bag processing DAG from the Airflow UI
Inspect the results
Monitor the operations dashboard

Each component illustrated in the solution architecture diagram, as well as all required IAM roles and configuration parameters are created from an AWS CDK application. This is included in these example sources.

1. Configure and create an MWAA environment in AWS using CDK

Download the deployment templates
Extract the zip file to a directory
Open a shell and go to the extracted directory
Start the deployment by typing the following code:

./deploy.sh <named-profile> deploy true <region>

Replace <named-profile> with the named profile which you configured for your AWS CLI to point to your target AWS account.
Replace <region> with the Region that is associated with your named profile, for example, us-east-1.
As an example: if your named profile for the AWS CLI is called rosbag-processing and your associated Region is us-east-1 the resulting command would be:

./deploy.sh rosbag-processing deploy true us-east-1

After confirming the deployment, the CDK will take several minutes to synthesize the CloudFormation templates, build the Docker image and deploy the solution to the target AWS account.

Figure 2 – Screenshot of Rosbag processing stack

Inspect the CloudFormation stacks being created in the AWS console. Sign into your AWS console and select the service CloudFormation in the AWS Region that you chose for your named profile (for exmaple, us-east-After the installation script has run, the CloudFormation stack overview in the console should show the following stacks:

Figure 3 – Screenshot of CloudFormation stack overview

2. Copy the ROS bag file to S3

You can download the ROS bag file here. After downloading you need to copy the file to the input bucket that was created during the CDK deployment in the previous step.
Open the AWS management console and navigate to the service S3. You should find a list of your buckets including the buckets for input data, output data and DAG definitions that were created during the CDK deployment.

Figure 4 – Screenshot showing S3 buckets created

Select the bucket with prefix rosbag-processing-stack-srcbucket and copy the ROS bag file into the bucket.
Now that the ROS bag file is in S3, we need to enable the image processing workflow in MWAA.

3. Manage the ROS bag processing DAG from the Airflow UI

In order to access the Airflow web server, open the AWS management console and navigate to the MWAA service.
From the overview, select the MWAA environment that was created and select Open Airflow UI as shown in the following screenshot:

Figure 5 – Screenshot showing Airflow Environments

The Airflow UI and the ROS bag image processing DAG should appear as in the following screenshot:

We can now enable the DAG by flipping the switch (1) from “Off” to “On” position. Amazon MWAA will now schedule the launch for the workflow.
By selecting the DAG’s name (rosbag_processing) and selecting the Graph View, you can review the workflow’s individual tasks:

Figure 6 – The Airflow UI and the ROS bag image processing DAG

Figure 7 – Dag Workflow

After enabling the image processing DAG by flipping the switch to “on” in the previous step, Airflow will detect the newly uploaded ROS bag file in S3 and start processing it. You can monitor the progress from the DAG’s tree view (1), as shown in the following screenshot:

Figure 8 – Monitor progress in the DAG’s tree view

Re-running the image processing pipeline

Our solution uses S3 metadata to keep track of the processing status of each ROS bag file in the input bucket in S3. If you want to re-process a ROS bag file:

open the AWS management console,
navigate to S3, click on the ROS bag file you want to re-process,
remove the file’s metadata tag processing.status from the properties tab:

Figure 9 – Metadata tag

With the metadata tag processing.status removed, Amazon MWAA picks up the file the next time the image processing workflow is launched. You can manually start the workflow by choosing Trigger DAG (1) from the Airflow web server UI.

Figure 10 – Manually start the workflow by choosing Trigger DAG

4. Inspect the results

In the AWS management console navigate to S3, and select the output bucket with prefix rosbag-processing-stack-destbucket. The bucket holds both the extracted images from the ROS bag’s topic containing the video data (1) and the images with highlighted bounding boxes for the objects detected by Amazon Rekognition (2).

Figure 11 – Output S3 buckets

Navigate to the folder 20201005 and select an image,
Navigate to the folder bounding_boxes
Select the corresponding image. You should receive an output similar to the following two examples:

Figure 12 – Left: Image extracted from ROS bag camera feed topic. Right: Same image with highlighted objects that Amazon Rekognition detected.

Amazon CloudWatch for monitoring and creating operational dashboard

Amazon MWAA comes with CloudWatch metrics to monitor your MWAA environment. You can create dashboards for quick and intuitive operational insights. To do so:

Open the AWS management console and navigate to the service CloudWatch.
Select the menu “Dashboards” from the navigation pane and click on “Create dashboard”.
After providing a name for your dashboard and selecting a widget type, you can choose from a variety of metrics that MWAA provides. This is shown in the following picture.

MWAA comes with a range of pre-defined metrics that allow you to monitor failed tasks, the health of your managed Airflow environment and helps you right-size your environment.

Figure 13 – CloudWatch metrics in a dashboard help to monitor your MWAA environment.

Clean Up

In order to delete the deployed solution from your AWS account, you must:

1. Delete all data in your S3 buckets

In the management console navigate to the service S3. You should see three buckets containing rosbag-processing-stack that were created during the deployment.
Select each bucket and delete its contents. Note: Do not delete the bucket itself.

2. Delete the solution’s CloudFormation stack

You can delete the deployed solution from your AWS account either from the management console or the command line interface.

Using the management console:

Open the management console and navigate to the CloudFormation service. Make sure to select the same region you deployed your solution to (for exmaple,. us-east-1).
Select stacks from the side menu. The solution’s CloudFormation stack is shown in the following screenshot:

Figure 14 – Cleaning up the CloudFormation Stack

Select the rosbag-processing-stack and select Delete. CloudFormation will now destroy all resources related to the ROS bag image processing pipeline.

Using the Command Line Interface (CLI):

Open a shell and navigate to the directory from which you started the deployment, then type the following:

cdk destroy --profile <named-profile>

Replace <named-profile> with the named profile which you configured for your AWS CLI to point to your target AWS account.

As an example: if your named profile for the AWS CLI is called rosbag-processing and your associated Region is us-east-1 the resulting command would be:

cdk destroy --profile rosbag-processing

After confirming that you want to destroy the CloudFormation stack, the solution’s resources will be deleted from your AWS account.

Conclusion

This blog post illustrated how to deploy and leverage workflows for running Autonomous Driving & ADAS workloads at scale. This solution was designed using fully managed AWS services. The solution was deployed on AWS using AWS CDK templates. Using MWAA we were able to set up a centralized, transparent, intuitive workflow orchestration across multiple systems. Even though the pipeline involved several steps in multiple AWS services, MWAA helped us understand and visualize the workflow, its individual tasks and dependencies.

Usually, distributed systems are hard to debug in case of errors or unexpected outcomes. However, building the ROS bag processing pipelines on MWAA allowed us to inspect the log files involved centrally from the Airflow UI, for debugging or auditing. Furthermore, with MWAA the ROS bag processing workflow is part of the code base and can be tested, refactored, versioned just like any other software artifact. By providing the CDK templates, we were able to architect a fully automated solution which you can adapt to your use case.

We hope you found this interesting and helpful and invite your comments on the solution!

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Automating Data Ingestion and Labeling for Autonomous Vehicle Development

2021-06-28 Amr Ragab

Post Syndicated from Amr Ragab original https://aws.amazon.com/blogs/architecture/field-notes-automating-data-ingestion-and-labeling-for-autonomous-vehicle-development/

This post was co-written by Amr Ragab, AWS Sr. Solutions Architect, EC2 Engineering and Anant Nawalgaria, former AWS Professional Services EMEA.

One of the most common needs we have heard from customers in Autonomous Vehicle (AV) development, is to launch a hybrid deployment environment at scale. As vehicle fleets are deployed across the globe, they are capturing real-time telemetry and sensory data. A data lake is created to process the data, and then iterating that dataset improves machine learning models for L3+ development. These datasets can include 4K60Hz camera video captures, LIDAR, RADAR and car telemetry data. The first step is to create the data gravity on which the compute and labeling portions will operate.

Architecture Overview

In this blog, we explain how to build the components in the following architecture, which takes an input dataset of 4K camera data, and performs event-driven video processing. This also includes anonymizing the dataset with face and license plate blurring using AWS Batch.

Figure 1 – Architecture for Automating Data Ingestion and Labeling for Autonomous Vehicle Development

The output is a cleaned dataset which is processed in an Amazon SageMaker Ground Truth workflow. You can visualize the results in a SageMaker Jupyter notebook.

You can transfer data from on-premises to AWS with both online transfer using AWS Storage Gateway, AWS Transfer Family, or AWS DataSync via AWS DirectConnect, as well as offline using AWS Snowball. Whether you use an online or offline approach, a fully automated processing workflow can still be initiated.

Description of the Dataset

The dataset we acquired was a driving sample in the N. Virginia/Washington DC metro area, as shown in the following map.

The dataset we acquired was a driving sample in the N. Virginia/Washington DC metro area, as shown in the following image

Figure 2 – Map of the area in which the driving sample was taken.

This path was chosen because of several different driving patterns including city, suburban, highway and unique driving characteristics.

We captured a 4K-60Hz video as well telemetry data from the CAN bus and finally GPS coordinates. The telemetry data from the CAN bus and GPS coordinates were streamed in real time over 4G/5G mobile network through the Amazon Kinesis service. Reach out to your AWS account team if you are interested in exploring connected car applications.

Video Processing Workflow and Manifest Creation

We walk you through the process for the video processing workflow and manifest creation.

The first step in the workflow is to take the 4K video file and split it into frames. Some additional processing is done to perform color and lens correction but that is specific to the camera used in this blog.
The next step is to use the ML-based anonymizer application which processes incoming video frames and applies face and license plate blurring on the dataset.
It uses the excellent work from Understand.ai and is available on Github via the Apache 2.0 License.
We then take the processed data and create a manifest.json file which is uploaded to a S3 bucket.
The S3 bucket then becomes the source for the SageMaker Ground Truth workflow.
Ancillary steps also include applying a lifecycle policy to Amazon S3 to transfer the raw video file to Amazon S3 Glacier. The video is then reconstructed from the processed frames. The docker image contains the following enablement stack:

Ubuntu 18.04 – nvcr.io/nvidia/cuda
AWS command line utility
FFMPEG
Understand.ai anonymizer GitHub

SageMaker Ground Truth Labeling and Analysis

To preparing image metadata, we use Amazon Rekognition. Any extra information, such as other objects were added to the image using Amazon Rekognition, and following is the Lambda code for it.

Preparing image metadata using Amazon Rekognition

Figure 3 -Lambda code to add additional objects to the image using Amazon Rekognition.

Before starting the analysis, let’s create some helper functions using the following code:

Code sample to show Helper functions

Code sample to show additional help functions

Compute Summary Statistics

First, let’s compute some summary statistics for a set of labeling jobs. For example, we might want to make a high-level comparison between two labeling jobs performed with different settings. Here, we’ll calculate the number of annotations and the mean confidence score for two jobs. We know that the first job was run with three workers per task, and the second was run with five workers per task.

Code to calculate the number of annotations and the mean confidence score for two jobs.

We can determine that the mean confidence of the whole job was 40%. Now, let us do the querying on the dataN.

Example 1:

Objective: All images with at least 5 cars annotations with confidence score of at least 80%.

Query: select * from s3object s where s.”demo-full-dataset-2″ is not null and ‘Car’ in s.”demo-full-dataset-2-metadata”.”class-map”.* and size(s.”demo-full-dataset-2″.”annotations”) >= 5 and min(s.”demo-full-dataset-2-metadata”.objects[*].”confidence”) >= 0.8 limit 20

Code:

Query output

Results:

Photos showing video output

Example 2:

Objective: Identify all images with at least one Person in them.

Query: select * from s3object s where ‘Person’ in s.”Objects”[*] limit 10.

Code:

Objective: Identify all images with at least one Person in them

Results:

Identify all images with at least one Person in them

Results - Identify all images with at least one Person in them

Example 3:

Objective: Identify all images with at least some text in them (for example, on signboards)

Query: select * from s3object s where CHAR_LENGTH(s.Text)>0 and s.Text limit 10

Code:

Objective: Identify all images with at least some text in them

Results:

Results - images with at least some text in them

images with at least some text in them

Conclusion

In this blog, we showed an architecture to automate data ingestion and Ground Truth labeling for autonomous vehicle development. We initiated a workflow to process a data lake to anonymize the individual video frames and then prepare the dataset for Ground Truth labeling. The ground truth labeling UI was offered to a globally distributed workforce which labeled our images at scale. If you are developing an autonomous vehicle/robotics platform, contact your AWS account team for more information.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

AWS extends its TISAX scope to cover the London and Paris Regions

2020-12-17 Clara Lim

Post Syndicated from Clara Lim original https://aws.amazon.com/blogs/security/aws-extends-tisax-scope-to-cover-london-paris-regions/

We’re excited to announce the completion of Trusted Information Security Assessment Exchange (TISAX) certification on December 08, 2020 for the London and Paris regions. These regions were assessed at the HIGH protection level (AL 2) for the control domains Information Handling and Data Protection, according to article 28 (“Processor”) of the European General Data Protection Regulation (GDPR).

The TISAX certification helps provide automotive industry organizations with the assurance they need to build secure applications and services in the cloud. The certification was established by the German Association of the Automotive Industry (VDA) and is governed by the European Network Exchange (ENX).

With this scope expansion, a total of 10 regions globally (Seattle, Frankfurt, Ireland, Oregon, Ohio, Northern Virginia, Canada, Seoul, London, and Paris) are currently certified for TISAX and demonstrate a consistent and standardized approach to information security systems for the automotive industry.

An independent third-party auditor conducted and accredited the assessment. Automotive customers can rely on the AWS TISAX assessment results and labels published on the ENX Portal for timely exchange of compliance status with their supply chains. The scope ID and assessment ID are STRN58 and AYZ39G, respectively.

For more information, see Trusted Information Security Assessment Exchange.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Amazon Lookout for Vision – New ML Service Simplifies Defect Detection for Manufacturing

2020-12-01 Marcia Villalba

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/amazon-lookout-for-vision-new-machine-learning-service-that-simplifies-defect-detection-for-manufacturing/

Today, I’m excited to announce Amazon Lookout for Vision, a new machine learning (ML) service that helps customers in industrial environments to detect visual defects on production units and equipment in an easy and cost-effective way.

Can you spot the circuit board with the defect in these images?

Maybe you can if you are familiar with circuit boards, but I have to say that it took me a while to discover the error. Humans, when properly trained and are well rested, are great at finding anomalies in a set of objects. However, when they are tired or not properly trained – like me in this example – they can be slow, prone to errors and inconsistent.

That’s why many companies use machine vision technologies to detect anomalies. However, these technologies need to be calibrated with controlled lighting and camera viewpoints. In addition, you need to specify hard-coded rules that define what is a defect and what is not, making the technologies very specialized and complex to build.

Lookout for Vision is a new machine learning service that helps increase industrial product quality and reduce operational costs by automating visual inspection of product defects across production processes. Lookout for Vision uses deep learning models to replace hard-coded rules and handles the differences in camera angle, lighting and other challenges that arise from the operational environment. With Lookout for Vision, you can reduce the need for carefully controlled environments.

Using Lookout for Vision, you can detect damages to manufactured parts, identify missing components or parts, and uncover underlying process-related issues in your manufacturing lines.

How to Get Started With Lookout for Vision
The first thing I want to mention is that to use Lookout for Vision, you don’t need to be a machine learning expert. Lookout for Vision is a fully managed service and comes with anomaly detection models that can be optimized for your use case and your data.

There are several steps for using Lookout for Vision. The first is preparing the dataset, which includes creating a dataset of images and adding labels to the images. Then, Lookout for Vision uses this dataset to automatically train the ML model that learns to detect anomalies in your product. The final part is using the model in production. You can keep evaluating the performance of your trained model and improve it at any time using tools that Lookout for Vision provides.

Preparing the Data
To get started with the model, you first need a set of images of your product. For better results, include images with normal (no defects) and anomalous content (includes defects). To get started with training, you will need at least 20 normal images and 10 anomalous images.

There are many ways of importing images into Lookout for Vision from the AWS Management Console: You can provide manifests for annotated images using the Amazon SageMaker Ground Truth service, provide images from an S3 bucket or upload directly from your computer.

After you upload the images, you need to add labels to classify the images in your dataset as normal or anomalous. Labeling is a very important step, as this is the key information that Lookout for Vision uses to train the model for your use case.

For this demo, I import the images from an S3 bucket. If you’ve organized the images in your S3 bucket by folder name (/anomaly/01.jpeg), Lookout for Vision will automatically import the folder structure into corresponding labels.

Training the Model
When our dataset is ready, we need to train our model with it. The training button is enabled once you have the minimum number of labeled images: 20 normal and 10 anomalous.

Depending on the size of the dataset, training may take a while to complete: for me, it took around an hour to train the model with 100 images. Note that you will begin incurring costs when Lookout for Vision starts to actually train the model. After training is complete, your model is ready to detect anomalies in new images.

Evaluating the Model
There are a couple of ways to evaluate whether your model is ready to be deployed to production. The first is to review the performance metrics of the model and the second is to run some production–like tests that will help you to verify if the model is ready to be deployed.

There are three main performance metrics: precision, recall and the F1 score. Precision measures the percentage of times the model prediction is correct and recall measures the percentage of true defects the model identified. F1 score is used to determine the model performance metric.

If you want to run some production-like tests to verify if your model is ready, use the run trial detection feature. This will enable you to run your Lookout for Vision model and predict anomalies on new images. You can further improve the model by manually verifying the results and adding new training images.

I used the three images that appear at the beginning of this post for my trial detection. The trial detection job ran for 15-20 minutes, and after that Lookout for Vision used the trained model to classify the images into “Normal” and “Anomaly.” When Lookout for Vision finalizes the trial detection job, you can verify the results as correct or incorrect, and add this images to the dataset.

Using the Model in Production
To use Lookout for Vision, you need to integrate the AWS SDKs or CLI in the systems that are processing the images of the products in the manufacturing line, and internet connectivity is required for this to work. The first thing you need to do is to start the model. When using Lookout for Vision, you are billed for the time your model is running and making inferences. For example, if you start your model at 8 a.m. and stop it at 5 p.m., you will be billed for 9 hours.

# Example CLI
aws lookoutvision start-model 
--project-name circuitBoard 
--model-version 1
--additional-output-config "Bucket=<OUTPUT_BUCKET>,Prefix=<PREFIX_KEY>" 
--min-anomaly-detection-units 10 

# Example response
{ "status" : "STARTING_HOSTING" }

When your model is ready, you can call the detect-anomalies API from Lookout for Vision.

# Example CLI
aws lookoutvision detect-anomalies 
--project-name circuitBoard 
--model-version 1

And this API will return a JSON response that shows if the image is an anomaly or not, along with the confidence level of that prediction.

{
    "DetectAnomalyResult": {
        "Source": {
            "Type": "direct"
        },
        "IsAnomalous": true,
        "Confidence": 0.97
    }
}

When you are done with detecting anomalies for the day, use the stop-model API. In the Lookout for Vision service console you can find code snippets on how to use these APIs.

When you are using Lookout for Vision in production, you’ll find a dashboard that helps you to sort and track the production lines by most defective line, line with the most recent defects, and the line with the highest anomaly ratio.

Available Today
Lookout for Vision is available in all AWS Regions.

To get started with Amazon Lookout for Vision, visit the service page today.

— Marcia

Field Notes: Building an Autonomous Driving and ADAS Data Lake on AWS

2020-10-14 Junjie Tang

Post Syndicated from Junjie Tang original https://aws.amazon.com/blogs/architecture/field-notes-building-an-autonomous-driving-and-adas-data-lake-on-aws/

Customers developing self-driving car technology are continuously challenged by the amount of data captured and created during the development lifecycle. This is accelerated by the need to design and launch incremental feature improvements on advanced driver-assistance systems (ADAS). Efforts to advance ADAS functionality have led to new approaches for storing, cataloging, and analyzing driving data captured from vehicles on the road today. Combining data from connected vehicle fleets transmitted over cellular networks in combination with manually ingesting data from vehicle data loggers requires a complex architecture and elastic data lake capability that only AWS provides.

This blog explains how to build an Autonomous Driving Data Lake using this Reference Architecture. We cover the workflow from how to ingest the data, prepare it for machine learning, catalog the output from ADAS systems and vehicle sensors, label it, automatically detect scenarios, and manage the various workflows required for moving it into an organized data lake construct. The AWS Autonomous Driving and ADAS Data Lake Reference Architecture was developed after working with numerous customers on the challenges they faced in achieving this. Using multiple AWS services and solution best practices, we outlined an approach we found helpful to others.

In the blog post, Autonomous Vehicle and ADAS development on AWS Part 1: Achieving Scale, we outlined the benefits having a data lake in the cloud, as opposed to building your own on-premises data lake solution.

Before we dive into the details of the reference architecture, let’s review the typical workflow of autonomous and ADAS development. The diagram below shows the following steps, much of which are common in any machine learning project:

data acquisition and ingest,
data processing and analytics,
labeling, map development,
model and algorithm development,
simulation and validation, and
orchestration and deployment.

This blog focuses on data ingest, data processing and analytics, labeling, and the data lake itself, as shown in the following workflow diagram:

driving dev workflow

Why build a data lake for ADAS and Autonomous Driving System development?

At AWS re:Invent 2019, BMW spoke at this session; (AUT306): Creating a data-driven, cloud-native ecosystem at BMW Group where they explained the motivation for developing an enterprise-wide cloud data lake, or the Cloud Data Hub as they call it. The Cloud Data Hub ingests information from multiple lines of business including sources like manufacturing systems, logistics, customer service, after-sales, and connected vehicle sensor telemetry data.

BMW created a global organization to drive DevOps culture with a strong emphasis on tooling, data quality, and end-to-end data lineage. Their journey began in the Hadoop eco-system (Hive, HBase) with an on-premises, heterogeneous, and difficult-to-scale environment, then moved to cloud-native building blocks with a focus on data quality. Examples of benefits derived from the AWS Cloud implementation are multi-region deployments, high security standards, and compliance with local regulations. The goal is to democratize the most valuable information assets so they can be used across a broad community globally.

The BMW Cloud Data Hub is just one example of how customers manage information assets for sharing across the enterprise. Using a similar approach, the AWS Autonomous Driving and ADAS Data Lake Reference Architecture extends the data lake pattern to address specific challenges around:

metadata and the data catalog including automated scenario detection;
data lineage from source to semantic layer;
data sharing with external consumers and third party service providers via Amazon SageMaker Ground Truth.

Now let’s review the details of the Autonomous Driving Data Lake Reference Architecture:

1. Ingest data from autonomous fleet with AWS Outposts for local data processing.

Sensor data is captured and written to data loggers containing multiple SSD hard drives. Once at the garage or customer facility, the hard drives are removed and inserted into copy stations. From there the data is copied to Amazon S3 or to a local storage system and AWS Outposts for pre-processing. AWS Outposts is a fully managed service that extends AWS infrastructure with AWS Services like Amazon Elastic Kubernetes Service, Amazon Relational Database Service, Amazon EMR on Amazon S3 (EMR supports EMRFS and HDFS which both natively support S3). These services are used to run data integrity checks, compress the data to remove redundant information and prepare the data for downstream AD workloads.

Using AWS DataSync, the data is synchronized at high data rates and securely between on-premises network attached storage (NAS) sources and Amazon S3. This is done over a high bandwidth connection provided by AWS Direct Connect.

2. Ingest vehicle telemetry data in real time using AWS IoT Core and Amazon Kinesis Data Firehose.

Vehicle telemetry is captured and published to the cloud using a number of different technologies, typically over HTTPS or MQTT. In this architecture, AWS IoT Greengrass provides an intelligent edge runtime in the vehicle with application logic running in Lambda functions deployed locally to filter vehicle network signals like CAN data, GPS location, ADAS system output, road condition metadata derived from cameras, and other vehicle sensor information.

AWS IoT Greengrass allows customers to deploy containers, machine learning inference models, create multiple data streams and prioritize them based on business logic you define. Eventually, the data ends up in S3 to be combined with the sensor data captured from the data logger process defined in step one above.

3. Remove and transform low quality data.

Autonomous vehicles produce terabytes of data per hour. In this trove of information, there may be redundant as well as corrupted data coming from the vehicle telemetry stream and raw sensor stack. This data needs to be normalized for optimal downstream processing. Customers use a number of technologies to do this. For example, Amazon EMR provides a runtime for high volume, complex data processing using open source Apache big data processing engines like Spark. There are a few common steps in the data transformation including;

checking if the driving is complete by combining batch files and streaming data;
parsing the log files based on recording formats (rosbag, mdf4, etc.);
decoding the signals from binary formats to readable text;
filtering inconsistent data files; and
synchronize the timestamp of signals

EMR Launch is an open source framework developed by AWS Labs available for customers to accelerate and simplify defining, deploying, managing, and using Amazon EMR clusters with the following features:

separating the definition of cluster security configurations (EMR Profile) and cluster resource configurations (Cluster Configuration) into reusable and shareable constructs; and
provide a suite of tools to simplify the construction of orchestration pipelines using Amazon Step Functions (EMR Launch Function).

4. Schedule the extract, transform, load (ETL) jobs using Apache Airflow.

To create trustworthy insights and empower your next machine learning use case with a reliable data foundation you need to bring the data creation process under design control. Fundamental to this approach is a centralized, governed workflow system powered by Apache Airflow. With Airflow you can establish trust in your data processing pipelines by making the workflow part of your code base to enable transparent, repeatable, pipeline executions.

The following solution diagram shows how radar and video data processing in MDF4 format achieves the highest scalability by leveraging AWS Fargate for Amazon ECS.

To ensure data pipeline integrity, the solution is deployed securely in an Amazon Virtual Private Cloud with end-to-end TLS and is only accessed from a private bastion host.
The containers for Airflow Webserver, Scheduler and Worker are deployed in multiple Availability Zones for high availability.
The communication between the components are decoupled via Amazon ElastiCache for Redis.
The status of running jobs is stored in Amazon Aurora.
To learn more about how to leverage complex workflows and model training jobs on AWS, a future blog post will describe the detailed architecture of Apache Airflow.

radar and video data processing in MDF4 format

5. Enrich data with map information and weather conditions based on GPS location and timestamp.

Data-sets are enriched using Amazon EMR with map or weather information from external geospatial and weather service providers and stored in Amazon S3, or database services like Amazon DynamoDB. Sensors like cameras and LiDAR could malfunction or even fail in adverse weather conditions. Advanced sensor fusing could perform the weather perception in real-time. The result of the weather perception could be verified with the real weather conditions.

6. Extract metadata into Amazon DynamoDB and Amazon Elasticsearch Service.

Using drive logs that contain the telemetry, telematics, perception, and sensor data, a catalog is built to create searchable quantitative metadata that includes speed, turning angles, location as well as simple and complex semantic descriptions of scene snippets such as “high velocity,” “left turn,” or “pedestrian.” The data lake catalog is updated with scenario data and indexed in Amazon Elastic Search for discovery by analysts and ADS engineers. The extraction process is ideally fully automated, but many higher order behavioral descriptions may require human annotations from later processing steps.

The majority of systems leverage quantitative and simple semantic descriptions for the drive data, with a clear trend towards needing higher order behaviors extracted in the data lake catalog. These more complex behaviors are ideal for enhanced searching capabilities and better validating coverage mapping. ASAM OpenSCENARIO defines a scenario description language that provides a common ontology and hierarchy for detailing the behavior of the vehicle and surroundings.

This standard provides an open approach for describing complex, synchronized maneuvers that involve multiple entities including other vehicles, vulnerable road users (VRUs) like pedestrians, bikers, construction workers, and other traffic participants. The description of a maneuver may be based on driver actions like performing a lane change by the ego, or based on the actions of others in the scenario such as a cut in from another driver. OpenSCENARIO also accounts for the appearance/description of the participant in the scene.

7. Store data lineage in Amazon Neptune and catalog data using AWS Glue Data Catalog.

Amazon Neptune is a fully managed graph database service. It’s useful to catalog data lineage in a graph model to visualize file and object dependencies. AWS Glue is a fully managed service that provides a data catalog making assets in the data lake discoverable. Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.

The following diagram shows how we parse data from the Ford Autonomous Vehicle Dataset in Rosbag format using Amazon EMR. We store it in Amazon S3 in parquet format, use AWS Glue Crawler to read the file schema and create the tables in the AWS Glue Data Catalog, and finally use Amazon Athena to query the velocity data.

Ford Autonomous Vehicle Dataset

8. Process drive data and perform deep signal validation.

Deploy your drive data signal validation code in Amazon Elastic Kubernetes Service (Amazon EKS). EKS is a managed service that makes it easy for you to run Kubernetes without needing to stand up or maintain your own Kubernetes control plane. Only a subset of the signals from the Rosbag or MDF4 files will be extracted for KPI calculation and aggregation which potentially reduces the stored data volume from gigabytes to megabytes.

9. Perform automated labeling using Amazon SageMaker Ground Truth.

Amazon SageMaker Ground Truth is a fully managed data labeling service that makes it easy to build highly accurate training datasets for machine learning. Ground Truth offers automatic data labeling and/or annotation which uses a machine learning model to label your data.

In addition, the service helps you create custom workflows for data labeling that leverage human workers from Amazon Mechanical Turk, the AWS Partner Network, or your own private workforce to improve the automated labeling accuracy. Ground Truth now supports 3D Point Cloud Labeling for task types like Object Detection, Object Tracking and Semantic Segmentation. This blog shows how to use the service with open data sets from Audi A2D2 and KITTI. Alternatively, customers can run a custom container on Amazon EKS for Ground Truth generation and labeling.

10. Provide a search function for particular scenarios using AWS AppSync.

Developers and data scientists can search for a particular scenario and all of the associated metadata related to it. AWS AppSync is a managed service that uses GraphQL to make it easy for applications to get data from a range of data sources such as Amazon DynamoDB, Amazon ES and AWS Lambda.

Additional Aspects to consider

China: The collection of raw data that includes video, lidar, radar, and GPS data is defined as a controlled activity by the government (geographic information surveying and mapping). This is a regulated activity and must be done under the governance of local certified map providers with navigation surveying licenses.
Data Encryption and Anonymization: Some ADS/ADAS use cases include sensitive or personal information. AWS Key Management Service (KMS) supports Customer Master Keys (CMKs). Vehicle Identification Numbers (VIN) can be anonymized by Amazon EMR jobs. This blog shows how to anonymize personal data like faces from the video using Amazon Rekognition.
Exchange Data with partners: AWS Data Exchange is a service that makes it easy for AWS customers to securely exchange file-based data sets in the AWS Cloud. Providers in AWS Data Exchange have a secure, transparent, and reliable channel to reach AWS customers and grant existing customers their subscriptions more efficiently.
Data Lake as code: AWS provides a full stack of DevOps tooling including AWS CodeCommit, AWS CodeBuild and AWS CodePipeline to simplify the provisioning and management infrastructure, to deploy application code, automate software release processes, and monitor application and infrastructure performance. Third party CI/CD tools like Jenkins and Zuul can be integrated as well.

Conclusion

In this post, we discussed the steps outlined in this Reference architecture to build an Autonomous Driving and ADAS Data Lake on AWS. We hope you found this interesting and helpful and invite your comments on the architecture.

Also, check out the Automotive issue of the AWS Architecture Monthly Magazine.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Field Notes: Implementing Hardware-in-the-Loop for Autonomous Driving Development on AWS

2020-10-13 Bryan Berezdivin

Post Syndicated from Bryan Berezdivin original https://aws.amazon.com/blogs/architecture/field-notes-implementing-hardware-in-the-loop-for-autonomous-driving-development-on-aws/

Automotive customers use AWS as their platform for advanced driving assistance systems (ADAS) and autonomous driving (AD) development to accelerate their development cycles and experience faster time-to-market. In the blog post, Autonomous Vehicle and ADAS development on AWS Part 1: Achieving Scale, we illustrated how software in the loop (SiL) and hardware in the loop (HiL) simulations are part of the workflow used to develop and validate safe AD and ADAS functionality. In this post, I run through some of the more common questions and patterns for implementing HiL on AWS, while looking at some of the differences from running your development all on-premises.

HiL simulations leverage test drive data and derived synthetic data to develop and validate various functions in the AD software stack. Test drive log data is ingested and stored in Amazon S3 for use for HiL simulations in parallel with other AD development workloads including visualization, processing, labeling, analysis, and model and algorithm development.

As such, we see customers exist in a hybrid context with their HiL workloads running on-premises to support customized equipment. For ADAS and ADS customers, this poses a few questions and considerations:

What are the recommendations to deploy HiL for AD on AWS?
How is this different from what customers were used to on-premises?

For hybrid customers, there are assumptions and misconceptions:

Do I need to replicate the test drive data locally, and if so, what are the considerations and consequences?

For the purposes of brevity, the remainder of this blog post will use the term AD development to encompass ADAS and AD unless specifically called out.

HiL Building Blocks

Simulations and validations make up an important aspect of AD development. According to Rand’s analysis, there is a need to demonstrate safe driving on billions of miles for an autonomous vehicle to have a lower failure rate than a human driver. While this analysis is statistically derived for fully autonomous driving (SAE Level 5), it demonstrates a need for further validation on millions of miles. This pattern is reflected in most ADAS and AD development projects, where software in the loop (SiL) and HiL are used for verification and validation.

The following diagram is an illustration of the ISO 26262 V-Model, a product development approach for matching requirements with corresponding tests, where HiL simulation is required for much of system level testing and validation phases on the right side of the overlaid V-Model.

Figure 1: V-Model as defined by ISO-26262

HiL simulations require a few key elements. The main component is the device under test (DUT), such as one or more electronic control units (ECUs) running the AD software stack. HiL simulations allow customers to put the device under test (DUT) under the rigor of real-world signals found in a vehicle. By providing more accurate environments and scenarios for the DUT, it can be fine-tuned for key performance indicators (KPIs) such as power utilization, response time, and accuracy.

The DUT is connected to a “HIL Rig,” a high performance server with multiple expansion boards to connect to various components of the AD system. The various interfaces are identified in Figure 2: High Level Hardware-in-the-loop (HiL) System and Interfaces and include Controller Area Network (CAN), Automotive Ethernet, Low Voltage Differential Signal (LVDS), and PCIExpress. These interfaces emulate the vehicular topology for testing purposes and allow system level validation of the DUT.

The HiL Rigs and the corresponding software tooling are offered by companies like Elektrobit, dSpace, National Instruments, and Opal-RT. The HiL systems facilitate time synchronized inputs and outputs to the DUT and measures system performance. These solutions have optimizations for large-scale operations aligned with faster validation cycles for customers. This latter point is relevant when a project requires validation across a large number of miles. Elektrobit provides the ability to orchestrate and deploy large HiL server farms that can be deployed to work in parallel. The larger server farms allow parallel HiL simulations to reduce the time to validate thousands of miles of drive time and assess the key performance indicators (KPIs) for the feature sets. Results can be acted on more quickly and reduce the overall development time.

Figure 2: High Level Hardware-in-the-loop (HiL) System and Interfaces

The HIL system loads sensor data derived from test drive logs. These logs vary in format, but often are captured and stored as MDF4, ADTF, rosbag, or other data logger proprietary formats. These are then processed for HIL simulations to implement open-loop and closed-loop simulations.

Open loop simulations refer to replay of log data from test drives.
Closed-loop simulations rely on the behavior of the system as inputs vary based on new outputs of the simulation.

Both open loop and closed loop simulations are part of autonomous driving development, but open-loop simulations require the largest datasets (multiple petabytes on average) due to reliance on the log data from test drives making them a primary concern for deploying HiL in a hybrid manner.

Overview of Solution

Architectures for supporting HIL simulations with AWS for AD development vary based primarily on the networking available at HiL locations. A common pattern for AWS customers is to have the HiL systems directly interfacing with Amazon S3 over high-bandwidth network links leveraging AWS Direct Connect. This is the simplest approach to deploying HiL and avoids hybrid data management of the petabytes of data in Amazon S3 to a local storage system.

AWS Direct Connect provides customers options to deploy their HIL rigs at their data center or in AWS colocation facilities with low latency connections. AWS has the largest number of Direct Connection locations and points-of-presence (POPs) to enable low latency connectivity to any of the >24 AWS Regions. The following diagram illustrates a reference architecture leveraging a direct interface from the HiL systems and Amazon S3.

Figure 3 : Reference Architecture for Hardware-in-the-Loop (HiL) Direct to Amazon S3

As shown in Figure 3, we illustrate the common interfaces, topology, and AWS services used for autonomous driving customers.

Amazon S3 is used to store and analyze the test drive logs used by the HiL simulations and also the results from the simulation runs for further analysis.
Metadata of test drive data is populated in various database and analytics services, referred to as the data catalogues, with metadata crawlers and processing pipelines that extract from the drive log and test result data on Amazon S3.
The data catalogues provide flexible search interfaces for developers and validation engineers or advanced analytics tools. These systems provide keyword search in Amazon Elasticsearch or SQL queries in Amazon Redshift or Amazon RDS and noSQL interfaces using Amazon DynamoDB. Amazon Partner Network solutions for these database and analysis tools are common as well, such as those in AWS Marketplace.
Validation engineer, data scientists, or developers use these data catalogues to find scenarios for testing. These personas also use the HiL management interfaces to configure and orchestrate the HiL simulation runs on the scenarios identified and ensure traceability.
HiL management systems control the HiL Rigs that interface to the DUT and implement the HiL simulations using the test drive logs. The HiL management system then writes results back to S3 for further analysis via various tool chains.

A common question AWS customers have is how to determine an optimal hybrid architecture using this approach. The primary factors are properly sized network links to accommodate data sets used by the HiL simulations as well as low latency network links between Amazon S3 and the HiL rigs. As a result, a key factor is ensuring use of an AWS region for your AWS storage that is in close proximity to your HiL testing site(s).

Based on current HiL implementations, open-loop simulations can sustain latencies of 30-50 ms RTT. AWS has numerous AWS Direct Connect locations in co-location facilities with latencies <5ms RTT. Sizing for these network links can be calculated based on the expected dataset sizes and the interval of time targeted for simulation run. We show a basic formula used for network sizing.

Average_Throughput (Gbps) = Average_Dataset_Size(GB)*8 / Time_Interval (seconds)

As an example, for a scenario where an average of 20PB is needed by the HIL rig every 2 weeks, we require ~200Gbps for the AWS Direct Connect bandwidth.

Figure 4 shows an example of a high-level architecture supported by Elektrobit with multiple EB 9101 test racks grouped together. This architecture supports multiple ECUs to be tested at once, leveraging drive log data in Amazon S3. This system is controlled with a central management software that allows optimal orchestration to keep the Elektrobit HiL system running optimally.

Use cases include:

The automated replay of all relevant sensor data with high time precision to ECU
Capture of ECU responses including debug data
Integration of customer components inside the HiL rack for visualization or post-processing.

Another common question from AWS customers is whether this architecture is supported for their HiL implementation. Many HiL providers are adding AWS functionality to their software and hardware stacks in response to customers transitioning to cloud for the development platforms. Some vendors still require Amazon S3 as a supported interface in their HiL Rigs. The work needed to accommodate Amazon S3 is usually a small level of effort for any developer by using Amazon SDKs on the HiL rig software stack. If there is a project where this is needed, contact the AWS account teams and your HiL vendors to ensure a successful and cost efficient project implementation.

Figure 4: Elektrobit HiL Architecture with AWS

An alternative HiL solution shown in Figure 5 includes Amazon S3 as the primary storage for drive log data and the scale out NAS storage system is located on-premises operating as a cache for the HIL rigs. This is common when the networking options at the HIL site are limited in bandwidth or latency to handle the target datasets and time windows.

AWS customers calculate the size of the cache to transfer the entire dataset over the intended time interval. Following is a simple calculation to demonstrate this.

Cache_Size(GB) = Average_Dataset_Size(GB) - Average_Throughput (Gbps) /8 * Time_Interval (seconds)

In this example, a customer has 40 Gbps AWS Direct Connect available and a 10PB dataset needed for HIL simulations every 2 weeks. Using the preceding formula there is a need for local cache of four PB capable of high read-rates.

Figure 5: Reference Architecture for Hardware-in-the Loop (HiL) with Local Cache to Amazon S3

In this hybrid architecture there is a need to orchestrate the data movement in line with the needs of the HIL simulation data set. This requires third party software generally or built in functionality into workflow orchestration tools like Apache Airflow. At CES 2019, Dell EMC and AWS illustrated a solution for this hybrid architecture documented in this short solution brief using Isilon as the scale out NAS storage system and DataIQ as the data movement and orchestration mechanism.

Any of these architectures can be cost-optimized, and AWS has programs and pricing options for Amazon Direct Connect as well as the other AWS services involved. There are Enterprise Agreements and Migration Acceleration Programs (MAP) in line with the holistic AD development platform needs, that reduce the costs for hybrid architecture functionality needed in the HiL solutions. One common need is support for AWS Direct Connect “flat rate” pricing option to accommodate the data transfer out (DTO) needs for the HiL workload. If you need details on these programs for your AD development project, contact your AWS account team.

Conclusion

In this blog post, we discussed two common architectural patterns for supporting HiL simulations for ADAS and Autonomous Driving development. These help customers decide on the right networking, storage, and hybrid topologies for these systems.

HiL systems directly interfacing with Amazon S3 is the most common pattern as you see with Elektrobit HiL solutions, but for customers with limited network links the use of a local cache is an option. Autonomous driving customers looking to increase velocity in their SAE Level 2-5 development programs with HiL simulations have achieved success with AWS as the development platform using these patterns. AWS has a team dedicated to autonomous driving, so contact your AWS account team to get a more prescriptive solution for your HiL or related ADAS and AD development needs.

Also, check out the Automotive issue of the AWS Architecture Monthly Magazine.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.