Tag Archives: Customer Solutions

Reimagine Software Development With CodeWhisperer as Your AI Coding Companion

2023-07-19 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/reimagine-software-development-with-codewhisperer-as-your-ai-coding-companion/

In the few months since Amazon CodeWhisperer became generally available, many customers have used it to simplify and streamline the way they develop software. CodeWhisperer uses generative AI powered by a foundational model to understand the semantics and context of your code and provide relevant and useful suggestions. It can help build applications faster and more securely, and it can help at different levels, from small suggestions to writing full functions and unit tests that help decompose a complex problem into simpler tasks.

Imagine you want to improve your code test coverage or implement a fine-grained authorization model for your application. As you begin writing your code, CodeWhisperer is there, working alongside you. It understands your comments and existing code, providing real-time suggestions that can range from snippets to entire functions or classes. This immediate assistance adapts to your flow, reducing the need for context-switching to search for solutions or syntax tips. Using a code companion can enhance focus and productivity during the development process.

When you encounter an unfamiliar API, CodeWhisperer accelerates your work by offering relevant code suggestions. In addition, CodeWhisperer offers a comprehensive code scanning feature that can detect elusive vulnerabilities and provide suggestions to rectify them. This aligns with best practices such as those outlined by the Open Worldwide Application Security Project (OWASP). This makes coding not just more efficient, but also more secure and with an increased assurance in the quality of your work.

CodeWhisperer can also flag code suggestions that resemble open-source training data, and flag and remove problematic code that might be considered biased or unfair. It provides you with the associated open-source project’s repository URL and license, making it easier for you to review them and add attribution where necessary.

Here are a few examples of CodeWhisperer in action that span different areas of software development, from prototyping and onboarding to data analytics and permissions management.

CodeWhisperer Speeds Up Prototyping and Onboarding
One customer using CodeWhisperer in an interesting way is BUILDSTR, a consultancy that provides cloud engineering services focused on platform development and modernization. They use Node.js and Python in the backend and mainly React in the frontend.

I talked with Kyle Hines, co-founder of BUILDSTR, who said, “leveraging CodeWhisperer across different types of development projects for different customers, we’ve seen a huge impact in prototyping. For example, we are impressed by how quickly we are able to create templates for AWS Lambda functions interacting with other AWS services such as Amazon DynamoDB.” Kyle said their prototyping now takes 40% less time, and they noticed a reduction of more than 50% in the number of vulnerabilities present in customer environments.

Kyle added, “Because hiring and developing new talent is a perpetual process for consultancies, we leveraged CodeWhisperer for onboarding new developers and it helps BUILDSTR Academy reduce the time and complexity for onboarding by more than 20%.”

CodeWhisperer for Exploratory Data Analysis
Wendy Wong is a business performance analyst building data pipelines at Service NSW and agile projects in AI. For her contributions to the community, she’s also an AWS Data Hero. She says Amazon CodeWhisperer has significantly accelerated her exploratory data analysis process, when she is analyzing a dataset to get a summary of its main characteristics using statistics and visualization tools.

She finds CodeWhisperer to be a swift, user-friendly, and dependable coding companion that accurately infers her intent with each line of code she crafts, and ultimately aids in the enhancement of her code quality through its best practice suggestions.

“Using CodeWhisperer, building code feels so much easier when I don’t have to remember every detail as it will accurately autocomplete my code and comments,” she shared. “Earlier, it would take me 15 minutes to set up data preparation pre-processing tasks, but now I’m ready to go in 5 minutes.”

Wendy says she has gained efficiency by delegating these repetitive tasks to CodeWhisperer, and she wrote a series of articles to explain how to use it to simplify exploratory data analysis.

Another tool used to explore data sets is SQL. Wendy is looking into how CodeWhisperer can help data engineers who are not SQL experts. For instance, she noticed they can just ask to “write multiple joins” or “write a subquery” to quickly get the correct syntax to use.

CodeWhisperer Accelerates Testing and Other Daily Tasks
I had the opportunity to spend some time with software engineers in the AWS Developer Relations Platform team. That’s the team that, among other things, builds and operates the community.aws website.

Nikitha Tejpal’s work primarily revolves around TypeScript, and CodeWhisperer aids her coding process by offering effective autocomplete suggestions that come up as she types. She said she specifically likes the way CodeWhisperer helps with unit tests.

“I can now focus on writing the positive tests, and then use a comment to have CodeWhisperer suggest negative tests for the same code,” she says. “In this way, I can write unit tests in 40% less time.”

Her colleague, Carlos Aller Estévez, relies on CodeWhisperer’s autocomplete feature to provide him with suggestions for a line or two to supplement his existing code, which he accepts or ignores based on his own discretion. Other times, he proactively leverages the predictive abilities of CodeWhisperer to write code for him. “If I want explicitly to get CodeWhisperer to code for me, I write a method signature with a comment describing what I want, and I wait for the autocomplete,” he explained.

For instance, when Carlos’s objective was to check if a user had permissions on a given path or any of its parent paths, CodeWhisperer provided a neat solution for part of the problem based on Carlos’s method signature and comment. The generated code checks the parent directories of a given resource, then creates a list of all possible parent paths. Carlos then implemented a simple permission check over each path to complete the implementation.

“CodeWhisperer helps with algorithms and implementation details so that I have more time to think about the big picture, such as business requirements, and create better solutions,” he added.

CodeWhisperer is a Multilingual Team Player
CodeWhisperer is polyglot, supporting code generation for 15 programming languages: Python, Java, JavaScript, TypeScript, C#, Go, Rust, PHP, Ruby, Kotlin, C, C++, Shell scripting, SQL, and Scala.

CodeWhisperer is also a team player. In addition to Visual Studio (VS) Code and the JetBrains family of IDEs (including IntelliJ, PyCharm, GoLand, CLion, PhpStorm, RubyMine, Rider, WebStorm, and DataGrip), CodeWhisperer is also available for JupyterLab, in AWS Cloud9, in the AWS Lambda console, and in Amazon SageMaker Studio.

At AWS, we are committed to helping our customers transform responsible AI from theory into practice by investing to build new services to meet the needs of our customers and make it easier for them to identify and mitigate bias, improve explainability, and help keep data private and secure.

You can use Amazon CodeWhisperer for free in the Individual Tier. See CodeWhisperer pricing for more information. To get started, follow these steps.

— Danilo

How quirion created nested email templates using Amazon Simple Email Service (SES)

2023-07-18 Dominik Richter

Post Syndicated from Dominik Richter original https://aws.amazon.com/blogs/messaging-and-targeting/how-quirion-created-nested-email-templates-using-amazon-simple-email-service-ses/

This is part two of the two-part guest series on extending Simple Email Services with advanced functionality. Find part one here.

quirion, founded in 2013, is an award-winning German robo-advisor with more than 1 billion Euro under management. At quirion, we send out five thousand emails a day to more than 60,000 customers.

Managing many email templates can be challenging

We chose Amazon Simple Email Service (SES) because it is an easy-to-use and cost-effective email platform. In particular, we benefit from email templates in SES, which ensure a consistent look and feel of our communication. These templates come with a styled and personalized HTML email body, perfect for transactional emails. However, managing many email templates can be challenging. Several templates share common elements, such as the company’s logo, name or imprint. Over time, some of these elements may change. If they are not updated across all templates, the result is an inconsistent set of templates. To overcome this problem, we created an application to extend the SES template functionality with an interface for creating and managing nested templates.

This post shows how you can implement this solution using Amazon Simple Storage Service (Amazon S3), Amazon API Gateway, AWS Lambda and Amazon DynamoDB.

Solution: compose email from nested templates using AWS Lambda

The solution we built is fully serverless, which means we do not have to manage the underlying infrastructure. We use AWS Cloud Development Kit (AWS CDK) to deploy the architecture.

The figure below describes the architecture diagram for the proposed solution.

The entry point to the application is an API Gateway that routes requests to a Lambda function. A request consists of an HTML file that represents a part of an email template and metadata that describes the structure of the template.
The Lambda function is the key component of the application. It takes the HTML file and the metadata and stores them in a S3 Bucket and a DynamoDB table.
Depending on the metadata, it takes an existing template from storage, inserts the HTML from the request into it and creates a SES email template.

Architecture diagram of the solution: new templates in Amazon SES are created by a Lambda function accessed through API Gateway. THe Lambda function reads and writes HTML from S3 and reads and writes metadata from DynamoDB.

The solution is simplified for this blog post and is used to show the possibilities of SES. We will not discuss the code of the Lambda function as there are several ways to implement it depending on your preferred programming language.

Prerequisites

An AWS Account that provides access to AWS services.
An Amazon Simple Email Service (Amazon SES) verified identity. You can create and verify a sending identity by following the steps described in the documentation.
AWS CDK installed.
AWS CLI installed.
git installed.
Depending on your system, Docker may need to be running.
GO installed.
curl installed.

Walkthrough

Step 1: Use the AWS CDK to deploy the application
To download and deploy the application run the following commands:

$ git clone https://github.com/quirionit/aws-ses-examples.git
$ cd aws-ses-examples/projects/go-src
$ go mod tidy
$ cd ../../projects/template-api
$ npm install
$ cdk deploy

Step 2: Create nested email templates

To create a nested email template, complete the following steps:

On the AWS Console, choose the API Gateway.
You should see an API with a name that includes SesTemplateApi.
Click on the name and note the Invoke URL from the details page.
In your terminal, navigate to aws-ses-examples/projects/template-api/files and run the following command. Note that you must use your gateway’s Invoke URL.
```
curl -F [email protected] -F "isWrapper=true" -F "templateName=m-full" -F "child=content" -F "variables=FIRSTNAME" -F "variables=LASTNAME" -F "plain=Hello {{.FIRSTNAME}} {{.LASTNAME}},{{template \"content\" .}}" YOUR INVOKE URL/emails
```
The request triggers the Lambda function, which creates a template in DynamoDB and S3. In addition, the Lambda function uses the properties of the request to decide when and how to create a template in SES. With “isWrapper=true” the template is marked as a template that wraps another template and therefore no template is created in SES. “child=content” specifies the entry point for the child template that is used within m-full.html. It also uses FIRSTNAME and LASTNAME as replacement tags for personalization.
In your terminal, run the following command to create a SES email template that uses the template created in step 4 as a wrapper.

Step 3: Analyze the result

On the AWS Console, choose DynamoDB.
From the sidebar, choose Tables.
Select the table with the name that includes SesTemplateTable.
Choose Explore table items. It should now return two new items.

The table stores the metadata that describes how to create a SES email template. Creating an email template in SES is initiated when an element’s Child attribute is empty or null. This is the case for the item with the name order-confirmation. It uses the BucketKey attribute to identify the required HTML stored in S3 and the Parent attribute to determine the metadata from the parent template. The Variables attribute is used to describe the placeholders that are used in the template.
On the AWS Console, choose S3.
Select the bucket with the name that starts with ses-email-templates.
Select the template/ folder. It should return two objects.

The m-full.html contains the structure and the design of an email template and is used with the order-confirmation.html which contains the content.
On the AWS Console, choose the Amazon Simple Email Service.
From the sidebar, choose Email templates. It should return the following template.

Step 4: Send an email with the created template

Open the send-order-confirmation.json file from aws-ses-examples/projects/template-api/files in a text editor.
Set a verified email address as Source and ToAddresses and save the file.
Navigate your terminal to aws-ses-examples/projects/template-api/files and run the following command:
aws ses send-templated-email --cli-input-json file://send-order-confirmation.json
As a result, you should get an email.

Step 5: Cleaning up

Navigate your terminal to aws-ses-examples/projects/template-api.
Delete all resources with cdk destroy.
Delete the created SES email template with:
aws ses delete-template --template-name order-confirmation

Next Steps

There are several ways to extend this solution’s functionality, including the ones below:

If you send an email that contains invalid personalization content, Amazon SES might accept the message, but won’t be able to deliver it. For this reason, if you plan to send personalized email, you should configure Amazon SES to send Rendering Failure event notifications.
The Amazon SES template feature does not support sending attachments, but you can add the functionality yourself. See part one of this blog series for instructions.
When you create a new Amazon SES account, by default your emails are sent from IP addresses that are shared with other SES users. You can also use dedicated IP addresses that are reserved for your exclusive use. This gives you complete control over your sender reputation and enables you to isolate your reputation for different segments within email programs.

Conclusion

In this blog post, we explored how to use Amazon SES with email templates to easily create complex transactional emails. The AWS CLI was used to trigger SES to send an email, but that could easily be replaced by other AWS services like Step Functions. This solution as a whole is a fully serverless architecture where we don’t have to manage the underlying infrastructure. We used the AWS CDK to deploy a predefined architecture and analyzed the deployed resources.

About the authors

	Mark Kirchner is a backend engineer at quirion AG. He uses AWS CDK and several AWS services to provide a cloud backend for a web application used for financial services. He follows a full serverless approach and enjoys resolving problems with AWS.
	Dominik Richter is a Solutions Architect at Amazon Web Services. He primarily works with financial services customers in Germany and particularly enjoys Serverless technology, which he also uses for his own mobile apps.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

How quirion sends attachments using email templates with Amazon Simple Email Service (SES)

2023-07-18 Dominik Richter

Post Syndicated from Dominik Richter original https://aws.amazon.com/blogs/messaging-and-targeting/how-quirion-sends-attachments-using-email-templates-with-amazon-simple-email-service-ses/

This is part one of the two-part guest series on extending Simple Email Services with advanced functionality. Find part two here.

quirion is an award-winning German robo-advisor, founded in 2013, and with more than 1 billion euros under management. At quirion, we send out five thousand emails a day to more than 60,000 customers.

We chose Amazon Simple Email Service (SES) because it is an easy-to-use and cost-effective email platform. In particular, we benefit from email templates in SES, which ensure a consistent look and feel of our communication. These templates come with a styled and personalized HTML email body, perfect for transactional emails. Sometimes it is necessary to add attachments to an email, which is currently not supported by the SES template feature. To overcome this problem, we created a solution to use the SES template functionality and add file attachments.

This post shows how you can implement this solution using Amazon Simple Storage Service (Amazon S3), Amazon EventBridge, AWS Lambda and AWS Step Functions.

Solution: orchestrate different email sending options using AWS Step Functions

The solution we built is fully serverless, which means we do not have to manage the underlying infrastructure. We use AWS Cloud Development Kit (AWS CDK) to deploy the architecture and analyze the resources.

The solution extends SES to send attachments using email templates. SES offers three possibilities for sending emails:

Simple — A standard email message. When you create this type of message, you specify the sender, the recipient, and the message body, and Amazon SES assembles the message for you.
Raw — A raw, MIME-formatted email message. When you send this type of email, you have to specify all of the message headers, as well as the message body. You can use this message type to send messages that contain attachments. The message that you specify has to be a valid MIME message.
Templated — A message that contains personalization tags. When you send this type of email, Amazon SES API v2 automatically replaces the tags with values that you specify.

In this post, we will combine the Raw and the Templated options.

The figure below describes the architecture diagram for the proposed solution.

The entry point to the application is an EventBridge event bus that routes incoming events to a Step Function workflow.
An event consists of the personalization parameters, the sender and recipient addresses, the template name and optionally the document-related properties such as a reference to the S3 bucket in which the document is stored. Depending on whether the event contains document-related properties, the Step Function workflow decides how the email is prepared and sent.
In case the event does not contain document-related properties, it uses the SendEmail action to send a templated email. The action requires the template name and the data to replace the personalization tags.
If the event contains document-related properties, the raw sending option of the SendEmail action must be used. If we also want to use an email template, we need to use that as a raw MIME message. So, we use the TestRenderEmailTemplate action to get the raw MIME message from the template and use a Lambda function to get and add the document. The Lambda function then triggers SES to send the email.

The solution is simplified for this blog post and is used to show the possibilities of SES. We will not discuss the code of the lambda function as there are several ways to implement it depending on your preferred programming language.

Architecture diagram of the solution: an AWS Step Functions workflow is triggered by EventBridge. If the event contains no document, the workflow triggers Amazon SES SendEmail. Otherwise, it uses SES TestRenderEmailTemplate as input for a Lambda function, which gets the document from S3 and then sends the email.

Prerequisites

An AWS Account that provides access to AWS services.
An Amazon Simple Email Service (Amazon SES) verified identity. You can create and verify a sending identity by following the steps described in the documentation.
AWS CDK installed.
AWS CLI installed.
Depending on your system, Docker may need to be running.
git installed.
GO installed.

Walkthrough

Step 1: Use the AWS CDK to deploy the application

To download and deploy the application run the following commands:

$ git clone [email protected]:quirionit/aws-ses-examples.git
$ cd aws-ses-examples/projects/go-src
$ go mod tidy
$ cd ../../projects/email-sender
$ npm install
$ cdk deploy

Step 2: Create a SES email template

In your terminal, navigate to aws-ses-examples/projects/email-sender and run:

aws ses create-template --cli-input-json file://files/hello_doc.json

Step 3: Upload a sample document to S3

To upload a document to S3, complete the following steps:

On the AWS Console, choose the S3.
Select the bucket with the name that starts with ses-documents.
Copy and save the bucket name for later.
Create a new folder called test.
Upload the hello.txt from aws-ses-examples/projects/email-sender/files into the folder.

Screenshot of Amazon S3 console, showing the ses-documents bucket containing the file tes/hello.txt

Step 4: Trigger sending an email using Amazon EventBridge

To trigger sending an email, complete the following steps:

On the AWS Console, choose the Amazon EventBridge.
Select Event busses from the sidebar.
Select Send events.
Create an event as the following image shows. You can copy the Event detail from aws-ses-examples/projects/email-sender/files/event.json. Don’t forget to replace the sender, recipient and bucket with your values.
As a result of sending the event, you should receive an email with the document attached.
To send an email without attachment, edit the event as follows:

Step 5: Analyze the result

On the AWS Console, choose Step Functions.
Select the state machine with the name that includes EmailSender.
You should see two Succeeded executions. If you select them the dataflows should look like this:
You can select each step of the dataflows and analyze the inputs and outputs.

Step 6: Cleaning up

Navigate your terminal to aws-ses-examples/projects/email-sender.
Delete all resources with cdk destroy.
Delete the created SES email template with:

aws ses delete-template --template-name HelloDocument

Next Steps

There are several ways to extend this solution’s functionality, see some of them below:

If you send an email that contains invalid personalization content, Amazon SES might accept the message, but won’t be able to deliver it. For this reason, if you plan to send personalized email, you should configure Amazon SES to send Rendering Failure event notifications.
You can create nested templates to share common elements, such as the company’s logo, name or imprint. See part two of this blog series for instructions.
When you create a new Amazon SES account, by default your emails are sent from IP addresses that are shared with other SES users. You can also use dedicated IP addresses that are reserved for your exclusive use. This gives you complete control over your sender reputation and enables you to isolate your reputation for different segments within email programs.

Conclusion

In this blog post, we explored how to use Amazon SES to send attachments using email templates. We used an Amazon EventBridge to trigger a Step Function that chooses between sending a raw or templated SES email. This solution uses a full serverless architecture without having to manage the underlying infrastructure. We used the AWS CDK to deploy a predefined architecture and analyzed the deployed resources.

About the authors

	Mark Kirchner is a backend engineer at quirion AG. He uses AWS CDK and several AWS services to provide a cloud backend for a web application used for financial services. He follows a full serverless approach and enjoys resolving problems with AWS.
	Dominik Richter is a Solutions Architect at Amazon Web Services. He primarily works with financial services customers in Germany and particularly enjoys Serverless technology, which he also uses for his own mobile apps.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

Optimize AWS Config for AWS Security Hub to effectively manage your cloud security posture

2023-07-17 Nicholas Jaeger

Post Syndicated from Nicholas Jaeger original https://aws.amazon.com/blogs/security/optimize-aws-config-for-aws-security-hub-to-effectively-manage-your-cloud-security-posture/

AWS Security Hub is a cloud security posture management service that performs security best practice checks, aggregates security findings from Amazon Web Services (AWS) and third-party security services, and enables automated remediation. Most of the checks Security Hub performs on AWS resources happen as soon as there is a configuration change, giving you nearly immediate visibility of non-compliant resources in your environment, compared to checks that run on a periodic basis. This near real-time finding and reporting of non-compliant resources helps you to quickly respond to infrastructure misconfigurations and reduce risk. Security Hub offers these continuous security checks through its integration with the AWS Config configuration recorder.

By default, AWS Config enables recording for more than 300 resource types in your account. Today, Security Hub has controls that cover approximately 60 of those resource types. If you’re using AWS Config only for Security Hub, you can optimize the configuration of the configuration recorder to track only the resources you need, helping to reduce the costs related to monitoring those resources in AWS Config and the amount of data produced, stored, and analyzed by AWS Config. This blog post walks you through how to set up and optimize the AWS Config recorder when it is used for controls in Security Hub.

Using AWS Config and Security Hub for continuous security checks

When you enable Security Hub, you’re alerted to first enable resource recording in AWS Config, as shown in Figure 1. AWS Config continually assesses, audits, and evaluates the configurations and relationships of your resources on AWS, on premises, and in other cloud environments. Security Hub uses this capability to perform change-initiated security checks. Security Hub checks that use periodic rules don’t depend on the AWS Config recorder. You must enable AWS Config resource recording for all the accounts and in all AWS Regions where you plan to enable Security Hub standards and controls. AWS Config charges for the configuration items that are recorded, separately from Security Hub.

Figure 1: Security Hub alerts you to first enable resource recording in AWS Config

When you get started with AWS Config, you’re prompted to set up the configuration recorder, as shown in Figure 2. AWS Config uses the configuration recorder to detect changes in your resource configurations and capture these changes as configuration items. Using the AWS Config configuration recorder not only allows for continuous security checks, it also minimizes the need to query for the configurations of the individual services, saving your service API quotas for other use cases. By default, the configuration recorder records the supported resources in the Region where the recorder is running.

Note: While AWS Config supports the configuration recording of more than 300 resource types, some Regions support only a subset of those resource types. To learn more, see Supported Resource Types and Resource Coverage by Region Availability.

Figure 2: Default AWS Config settings

Optimizing AWS Config for Security Hub

Recording global resources as well as current and future resources in AWS Config is more than what is necessary to enable Security Hub controls. If you’re using the configuration recorder only for Security Hub controls, and you want to cost optimize your use of AWS Config or reduce the amount of data produced, stored, and analyzed by AWS Config, you only need to record the configurations of approximately 60 resource types, as described in AWS Config resources required to generate control findings.

Set up AWS Config, optimized for Security Hub

We’ve created an AWS CloudFormation template that you can use to set up AWS Config to record only what’s needed for Security Hub. You can download the template from GitHub.

This template can be used in any Region that supports AWS Config (see AWS Services by Region). Although resource coverage varies by Region (Resource Coverage by Region Availability), you can still use this template in every Region. If a resource type is supported by AWS Config in at least one Region, you can enable the recording of that resource type in all Regions supported by AWS Config. For the Regions that don’t support the specified resource type, the recorder will be enabled but will not record any configuration items until AWS Config supports the resource type in the Region.

Security Hub regularly releases new controls that might rely on recording additional resource types in AWS Config. When you use this template, you can subscribe to Security Hub announcements with Amazon Simple Notification Service (SNS) to get information about newly released controls that might require you to update the resource types recorded by AWS Config (and listed in the CloudFormation template). The CloudFormation template receives periodic updates in GitHub, but you should validate that it’s up to date before using it. You can also use AWS CloudFormation StackSets to deploy, update, or delete the template across multiple accounts and Regions with a single operation. If you don’t enable the recording of all resources in AWS Config, the Security Hub control, Config.1 AWS Config should be enabled, will fail. If you take this approach, you have the option to disable the Config.1 Security Hub control or suppress its findings using the automation rules feature in Security Hub.

Customizing for your use cases

You can modify the CloudFormation template depending on your use cases for AWS Config and Security Hub. If your use case for AWS Config extends beyond your use of Security Hub controls, consider what additional resource types you will need to record the configurations of for your use case. For example, AWS Firewall Manager, AWS Backup, AWS Control Tower, AWS Marketplace, and AWS Trusted Advisor require AWS Config recording. Additionally, if you use other features of AWS Config, such as custom rules that depend on recording specific resource types, you can add these resource types in the CloudFormation script. You can see the results of AWS Config rule evaluations as findings in Security Hub.

Another customization example is related to the AWS Config configuration timeline. By default, resources evaluated by Security Hub controls include links to the associated AWS Config rule and configuration timeline in AWS Config for that resource, as shown in Figure 3.

Figure 3: Link from Security Hub control to the configuration timeline for the resource in AWS Config

The AWS Config configuration timeline, as illustrated in Figure 4, shows you the history of compliance changes for the resource, but it requires the AWS::Config::ResourceCompliance resource type to be recorded. If you need to track changes in compliance for resources and use the configuration timeline in AWS Config, you must add the AWS::Config::ResourceCompliance resource type to the CloudFormation template provided in the preceding section. In this case, Security Hub may change the compliance of the Security Hub managed AWS Config rules, which are recorded as configuration items for the AWS::Config::ResourceCompliance resource type, incurring additional AWS Config recorder charges.

Figure 4: Config resource timeline

Summary

You can use the CloudFormation template provided in this post to optimize the AWS Config configuration recorder for Security Hub to reduce your AWS Config costs and to reduce the amount of data produced, stored, and analyzed by AWS Config. Alternatively, you can run AWS Config with the default settings or use the AWS Config console or scripts to further customize your configuration to fit your use case. Visit Getting started with AWS Security Hub to learn more about managing your security alerts.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

2023-07-14 Nitin Arora

Post Syndicated from Nitin Arora original https://aws.amazon.com/blogs/big-data/how-amazon-finance-automation-built-a-data-mesh-to-support-distributed-data-ownership-and-centralize-governance/

Amazon Finance Automation (FinAuto) is the tech organization of Amazon Finance Operations (FinOps). Its mission is to enable FinOps to support the growth and expansion of Amazon businesses. It works as a force multiplier through automation and self-service, while providing accurate and on-time payments and collections. FinAuto has a unique position to look across FinOps and provide solutions that help satisfy multiple use cases with accurate, consistent, and governed delivery of data and related services.

In this post, we discuss how the Amazon Finance Automation team used AWS Lake Formation and the AWS Glue Data Catalog to build a data mesh architecture that simplified data governance at scale and provided seamless data access for analytics, AI, and machine learning (ML) use cases.

Challenges

Amazon businesses have grown over the years. In the early days, financial transactions could be stored and processed on a single relational database. In today’s business world, however, even a subset of the financial space dedicated to entities such as Accounts Payable (AP) and Accounts Receivable (AR) requires separate systems handling terabytes of data per day. Within FinOps, we can curate more than 300 datasets and consume many more raw datasets from dozens of systems. These datasets can then be used to power front end systems, ML pipelines, and data engineering teams.

This exponential growth necessitated a data landscape that was geared towards keeping FinOps operating. However, as we added more transactional systems, data started to grow in operational data stores. Data copies were common, with duplicate pipelines creating redundant and often out-of-sync domain datasets. Multiple curated data assets were available with similar attributes. To resolve these challenges, FinAuto decided to build a data services layer based on a data mesh architecture. FinAuto wanted to verify that the data domain owners would retain ownership of their datasets while users got access to the data by using a data mesh architecture.

Solution overview

Being customer focused, we started by understanding our data producers’ and consumers’ needs and priorities. Consumers prioritized data discoverability, fast data access, low latency, and high accuracy of data. Producers prioritized ownership, governance, access management, and reuse of their datasets. These inputs reinforced the need of a unified data strategy across the FinOps teams. We decided to build a scalable data management product that is based on the best practices of modern data architecture. Our source system and domain teams were mapped as data producers, and they would have ownership of the datasets. FinAuto provided the data services’ tools and controls necessary to enable data owners to apply data classification, access permissions, and usage policies. It was necessary for domain owners to continue this responsibility because they had visibility to the business rules or classifications and applied that to the dataset. This enabled producers to publish data products that were curated and authoritative assets for their domain. For example, the AR team created and governed their cash application dataset in their AWS account AWS Glue Data Catalog.

With many such partners building their data products, we needed a way to centralize data discovery, access management, and vending of these data products. So we built a global data catalog in a central governance account based on the AWS Glue Data Catalog. The FinAuto team built AWS Cloud Development Kit (AWS CDK), AWS CloudFormation, and API tools to maintain a metadata store that ingests from domain owner catalogs into the global catalog. This global catalog captures new or updated partitions from the data producer AWS Glue Data Catalogs. The global catalog is also periodically fully refreshed to resolve issues during metadata sync processes to maintain resiliency. With this structure in place, we then needed to add governance and access management. We selected AWS Lake Formation in our central governance account to help secure the data catalog, and added secure vending mechanisms around it. We also built a front-end discovery and access control application where consumers can browse datasets and request access. When a consumer requests access, the application validates the request and routes them to a respective producer via internal tickets for approval. Only after the data producer approves the request are permissions provisioned in the central governance account through Lake Formation.

Solution tenets

A data mesh architecture has its own advantages and challenges. By democratizing the data product creation, we removed dependencies on a central team. We made reuse of data possible with data discoverability and minimized data duplicates. This also helped remove data movement pipelines, thereby reducing data transfer and maintenance costs.

We realized, however, that our implementation could potentially impact day-to-day tasks and inhibit adoption. For example, data producers need to onboard their dataset to the global catalog, and complete their permissions management before they can share that with consumers. To overcome this obstacle, we prioritized self-service tools and automation with a reliable and simple-to-use interface. We made interaction, including producer-consumer onboarding, data access request, approvals, and governance, quicker through the self-service tools in our application.

Solution architecture

Within Amazon, we isolate different teams and business processes with separate AWS accounts. From a security perspective, the account boundary is one of the strongest security boundaries in AWS. Because of this, the global catalog resides in its own locked-down AWS account.

The following diagram shows AWS account boundaries for producers, consumers, and the central catalog. It also describes the steps involved for data producers to register their datasets as well as how data consumers get access. Most of these steps are automated through convenience scripts with both AWS CDK and CloudFormation templates for our producers and consumer to use.

The workflow contains the following steps:

Data is saved by the producer in their own Amazon Simple Storage Service (Amazon S3) buckets.
Data source locations hosted by the producer are created within the producer’s AWS Glue Data Catalog.
Data source locations are registered with Lake Formation.
An onboarding AWS CDK script creates a role for the central catalog to use to read metadata and generate the tables in the global catalog.
The metadata sync is set up to continuously sync data schema and partition updates to the central data catalog.
When a consumer requests table access from the central data catalog, the producer grants Lake Formation permissions to the consumer account AWS Identity and Access Management (IAM) role and tables are visible in the consumer account.
The consumer account accepts the AWS Resource Access Manager (AWS RAM) share and creates resource links in Lake Formation.
The consumer data lake admin provides grants to IAM users and roles mapping to data consumers within the account.

The global catalog

The basic building block of our business-focused solutions are data products. A data product is a single domain attribute that a business understands as accurate, current, and available. This could be a dataset (a table) representing a business attribute like a global AR invoice, invoice aging, aggregated invoices by a line of business, or a current ledger balance. These attributes are calculated by the domain team and are available for consumers who need that attribute, without duplicating pipelines to recreate it. Data products, along with raw datasets, reside within their data owner’s AWS account. Data producers register their data catalog’s metadata to the central catalog. We have services to review source catalogs to identify and recommend classification of sensitive data columns such as name, email address, customer ID, and bank account numbers. Producers can review and accept those recommendations, which results in corresponding tags applied to the columns.

Producer experience

Producers onboard their accounts when they want to publish a data product. Our job is to sync the metadata between the AWS Glue Data Catalog in the producer account with the central catalog account, and register the Amazon S3 data location with Lake Formation. Producers and data owners can use Lake Formation for fine-grained access controls on the table. It is also now searchable and discoverable via the central catalog application.

Consumer experience

When a data consumer discovers the data product that they’re interested in, they submit a data access request from the application UI. Internally, we route the request to the data owner for the disposition of the request (approval or rejection). We then create an internal ticket to track the request for auditing and traceability. If the data owner approves the request, we run automation to create an AWS RAM resource share to share with the consumer account covering the AWS Glue database and tables approved for access. These consumers can now query the datasets using the AWS analytics services of their choice like Amazon Redshift Spectrum, Amazon Athena, and Amazon EMR.

Operational excellence

Along with building the data mesh, it’s also important to verify that we can operate with efficiency and reliability. We recognize that the metadata sync process is at the heart of this global data catalog. As such, we are hypervigilant of this process and have built alarms, notifications, and dashboards to verify that this process doesn’t fail silently and create a single point of failure for the global data catalog. We also have a backup repair service that syncs the metadata from producer catalogs into the central governance account catalog periodically. This is a self-healing mechanism to maintain reliability and resiliency.

Empowering customers with the data mesh

The FinAuto data mesh hosts around 850 discoverable and shareable datasets from multiple partner accounts. There are more than 300 curated data products to which producers can provide access and apply governance with fine-grained access controls. Our consumers use AWS analytics services such as Redshift Spectrum, Athena, Amazon EMR, and Amazon QuickSight to access their data. This capability with standardized data vending from the data mesh, along with self-serve capabilities, allows you to innovate faster without dependency on technical teams. You can now get access to data faster with automation that continuously improves the process.

By serving the FinOps team’s data needs with high availability and security, we enabled them to effectively support operation and reporting. Data science teams can now use the data mesh for their finance-related AI/ML use cases such as fraud detection, credit risk modeling, and account grouping. Our finance operations analysts are now enabled to dive deep into their customer issues, which is most important to them.

Conclusion

FinOps implemented a data mesh architecture with Lake Formation to improve data governance with fine-grained access controls. With these improvements, the FinOps team is now able to innovate faster with access to the right data at the right time in a self-serve manner to drive business outcomes. The FinOps team will continue to innovate in this space with AWS services to further provide for customer needs.

To learn more about how to use Lake Formation to build a data mesh architecture, see Design a data mesh architecture using AWS Lake Formation and AWS Glue.

About the Authors

Nitin Arora is a Sr. Software Development Manager for Finance Automation in Amazon. He has over 18 years of experience building business critical, scalable, high-performance software. Nitin leads several data and analytics initiatives within Finance, which includes building Data Mesh. In his spare time, he enjoys listening to music and read.

Pradeep Misra is a Specialist Solutions Architect at AWS. He works across Amazon to architect and design modern distributed analytics and AI/ML platform solutions. He is passionate about solving customer challenges using data, analytics, and AI/ML. Outside of work, Pradeep likes exploring new places, trying new cuisines, and playing board games with his family. He also likes doing science experiments with his daughters.

Rajesh Rao is a Sr. Technical Program Manager in Amazon Finance. He works with Data Services teams within Amazon to build and deliver data processing and data analytics solutions for Financial Operations teams. He is passionate about delivering innovative and optimal solutions using AWS to enable data-driven business outcomes for his customers.

Andrew Long, the lead developer for data mesh, has designed and built many of the big data processing systems that have fueled Amazon’s financial data processing infrastructure. His work encompasses a range of areas, including S3-based table formats for Spark, diverse Spark performance optimizations, distributed orchestration engines and the development of data cataloging systems. Additionally, Andrew finds pleasure in sharing his knowledge of partner acrobatics.

Kumar Satyen Gaurav, is an experienced Software Development Manager at Amazon, with over 16 years of expertise in big data analytics and software development. He leads a team of engineers to build products and services using AWS big data technologies, for providing key business insights for Amazon Finance Operations across diverse business verticals. Beyond work, he finds joy in reading, traveling and learning strategic challenges of chess.

How AWS helped Altron Group accelerate their vision for optimized customer engagement

2023-07-13 Jason Yung

Post Syndicated from Jason Yung original https://aws.amazon.com/blogs/big-data/how-aws-helped-altron-group-accelerate-their-vision-for-optimized-customer-engagement/

This is a guest post co-authored by Jacques Steyn, Senior Manager Professional Services at Altron Group.

Altron is a pioneer of providing data-driven solutions for their customers by combining technical expertise with in-depth customer understanding to provide highly differentiated technology solutions. Alongside their partner AWS, they participated in AWS Data-Driven Everything (D2E) workshops and a bespoke AWS Immersion Day workshop that catered to their needs to improve their engagement with their customers.

This post discusses the journey that took Altron from their initial goals, to technical implementation, to the business value created from understanding their customers and their unique opportunities better. They were able to think big but start small with a working solution involving rich business intelligence (BI) and insights provided to their key business areas.

Data-Driven Everything engagement

Altron has provided information technology services since 1965 across South Africa, the Middle East, and Australia. Although the group saw strong results at 2022-year end, the region continues to experience challenging operating conditions with global supply chains disrupted, electronic component shortages, and scarcity of IT talent.

To reflect the needs of their customers spread across different geographies and industries, Altron has organized their operating model across individual Operating Companies (OpCos). These are run autonomously with different sales teams, creating siloed operations and engagement with customers and making it difficult to have a holistic and unified sales motion.

To succeed further, their vision of data requires it to be accessible and actionable to all, with key roles and responsibilities defined by those who produce and consume data, as shown in the following figure. This allows for transparency, speed to action, and collaboration across the group while enabling the platform team to evangelize the use of data:

Altron engaged with AWS to seek advice on their data strategy and cloud modernization to bring their vision to fruition. The D2E program was selected to help identify the best way to think big but start small by working collaboratively to ideate on the opportunities to build data as a product, particularly focused on federating customer profile data in an agile and scalable approach.

Amazon mechanisms such as Working Backwards were employed to devise the most delightful and meaningful solution and put customers at the heart of the experience. The workshop helped devise the think big solution that starting with the Systems Integration (SI) OpCo as the first flywheel turn would be the best way to start small and prototype the initial data foundation collaboratively with AWS Solutions Architects.

Preparing for an AWS Immersion Day workshop

The typical preparation of an AWS Immersion Day involves identifying examples of common use case patterns and utilizing demonstration data. To maximize its success, the Immersion Day was stretched across multiple days as a hands-on workshop to enable Altron to bring their own data, build a robust data pipeline, and scale their long-term architecture. In addition, AWS and Altron identified and resolved any external dependencies, such as network connectivity to data sources and targets, where Altron was able to put the connectivity to the sources in place.

Identifying key use cases

After a number of preparation meetings to discuss business and technical aspects of the use case, AWS and Altron identified two uses cases to resolve their two business challenges:

Business intelligence for business-to-business accounts – Altron wanted to focus on their business-to-business (B2B) accounts and customer data. In particular, they wanted to enable their account managers, sales executives, and analysts to use actual data and facts to get a 360 view of their accounts.
- Goals – Grow revenue, increase the conversion ratio of opportunities, reduce the average sales cycle, improve the customer renewal rate.
- Target – Dashboards to be refreshed on a daily basis that would provide insights on sales, gross profit, sales pipelines, and customers.
Data quality for account and customer data – Altron wanted to enable data quality and data governance best practices.
- Goals – Lay the foundation for a data platform that can be used in the future by internal and external stakeholders.

Conducting the Immersion Day workshop

Altron set aside 4 days for their Immersion Day, during which time AWS had assigned a dedicated Solutions Architect to work alongside them to build the following prototype architecture:

This solution includes the following components:

AWS Glue is a serverless data integration service that makes it simple to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning, and application development. The Altron team created an AWS Glue crawler and configured it to run against Azure SQL to discover its tables. The AWS Glue crawler populates the table definition with its schema in AWS Glue Data Catalog.
This step consists of two components:
1. A set of AWS Glue PySpark jobs reads the source tables from Azure SQL and outputs the resulting data in the Amazon Simple Storage Service “Raw Zone”. Basic formatting and readability of the data is standardized here. The jobs run on a scheduled basis, according to the upstream data availability (which currently is daily).
2. Users are able to manually upload reference files (CSV and Excel format) via the Amazon Web Services console directly to the Amazon S3 buckets. Depending on the frequency of upload, the Altron team will consider automated mechanisms and remove manual upload.
The reporting zone is based on a set of Amazon Athena views, which are consumed for BI purposes. The Altron team used Athena to explore the source tables and create the views in SQL language. Depending on the needs, the Altron team will materialize these views or create corresponding AWS Glue jobs.
Athena exposes the content of the reporting zone for consumption.
The content of the reporting zone is ingested via SPICE in Amazon QuickSight. BI users create dashboards and reports in QuickSight. Business users can access QuickSight dashboards from their mobile, thanks to the QuickSight native application, configured to use single sign-on (SSO).
An AWS Step Functions state machine orchestrates the run of the AWS Glue jobs. The Altron team will expand the state machine to include automated refresh of QuickSight SPICE datasets.
To verify the data quality of the sources through statistically-relevant metrics, AWS Glue Data Quality runs data quality tasks on relevant AWS Glue tables. This can be run manually or scheduled via Amazon Eventbridge (Optional).

Generating business outcomes

In 4 days, the Altron SI team left the Immersion Day workshop with the following:

A data pipeline ingesting data from 21 sources (SQL tables and files) and combining them into three mastered and harmonized views that are cataloged for Altron’s B2B accounts.
A set of QuickSight dashboards to be consumed via browser and mobile.
Foundations for a data lake with data governance controls and data quality checks. The datasets used for the workshop originate from different systems; by integrating the datasets during the workshop implementation, the Altron team can have a comprehensive overview of their customers.

Altron’s sales teams are now able to quickly refresh dashboards encompassing previously disparate datasets that are now centralized to get insights about sales pipelines and forecasts on their desktop or mobile. The technical teams are equally adept at adjusting to business needs by autonomously onboarding new data sources and further enriching the user experience and trust in the data.

Conclusion

In this post, we walked you through the journey the Altron team took with AWS. The outcomes to identify the opportunities that were most pressing to Altron, applying a working backward approach and coming up with a best-fit architecture, led to the subsequent AWS Immersion Day to implement a working prototype that helped them become more data-driven.

With their new focus on AWS skills and mechanisms, increasing trust in their internal data, and understanding the importance of driving change in data literacy and mindset, Altron is better set up for success to best serve their customers in the region.

To find out more about how Altron and AWS can help work backward on your data strategy and employ the agile methodologies discussed in this post, check out Data Management. To learn more about how can help you turn your ideas into solutions, visit the D2E website and the series of AWS Immersion Days that you can choose from. For more hands-on bespoke options, contact your AWS Account Manager, who can provide more details.

Special thanks to everyone at Altron Group who helped contribute to the success of the D2E and Build Lab workshops:

The Analysts (Liesl Kok, Carmen Kotze)
Data Engineers (Banele Ngemntu, James Owen, Andrew Corry, Thembelani Mdlankomo)
QuickSight BI Developers (Ricardo De Gavino Dias, Simei Antoniades)
Cloud Administrator (Shamiel Galant)

About the authors

Jacques Steyn runs the Altron Data Analytics Professional Services. He has been leading the building of data warehouses and analytic solutions for the past 20 years. In his free time, he spends time with his family, whether it be on the golf , walking in the mountains, or camping in South Africa, Botswana, and Namibia.

Jason Yung is a Principal Analytics Specialist with Amazon Web Services. Working with Senior Executives across the Europe and Asia-Pacific Regions, he helps customers become data-driven by understanding their use cases and articulating business value through Amazon mechanisms. In his free time, he spends time looking after a very active 1-year-old daughter, alongside juggling geeky activities with respectable hobbies such as cooking sub-par food.

Michele Lamarca is a Senior Solutions Architect with Amazon Web Services. He helps architect and run Solutions Accelerators in Europe to enable customers to become hands-on with AWS services and build prototypes quickly to release the value of data in the organization. In his free time, he reads books and tries (hopelessly) to improve his jazz piano skills.

Hamza is a Specialist Solutions Architect with Amazon Web Services. He runs Solutions Accelerators in EMEA regions to help customers accelerate their journey to move from an idea into a solution in production. In his free time, he spends time with his family, meets with friends, swims in the municipal swimming pool, and learns new skills.

Vega Cloud brings FinOps solutions to their customers faster by embedding Amazon QuickSight

2023-07-13 Kris Bliesner

Post Syndicated from Kris Bliesner original https://aws.amazon.com/blogs/big-data/vega-cloud-brings-finops-solutions-to-their-customers-faster-by-embedding-amazon-quicksight/

This is a guest post authored by Kris Bliesner and Mike Brown from Vega Cloud.

Vega Cloud is a premier member of the FinOps Foundation, a program by Linux Foundation supporting FinOps practitioners on cloud financial management best practices. Vega Cloud provides a place where finance, engineers, and innovators come together to accelerate the business value of the cloud with concrete curated data, context-relevant recommendations, and automation to achieve cost savings. Vega Cloud’s platform is based on the FinOps Foundation’s best practices and removes the confusion behind the business value of cloud services and accelerates strategic decisions while maintaining cost optimization. Vega’s curated reports provide actionable insights to accelerate time-to-value from years into days. On average, Vega’s customers immediately identify 15–25% of underutilized cloud spend with a clear direction on how to reallocate the funds to maximize business impact.

Vega Cloud has been growing rapidly and saw an opportunity to accelerate cloud intelligence at hyperscale. Engineering leadership chose Amazon QuickSight, which allowed Vega to add insightful analytics into its platform with customized interactive visuals and dashboards, while scaling at a lower cost without the need to manage infrastructure.

In this post, we discuss how Vega uses QuickSight to bring cloud intelligence solutions to our customers.

Bringing solutions to market at a fast pace

The Vega Cloud Platform was designed from the start by cloud pioneers to enable businesses to get the most value out of their cloud spend. This is done by a multi-step process that starts with data analytics and ends with automation to remediate inefficiencies. The Vega Cloud Platform consumes customer billing and usage information from cloud providers and third parties, and uses that data to show customers what they are spending and which services they are consuming, with business context to help the customer with chargebacks and cost inquiries. The Vega Cloud Platform then analyzes the data collected and produces context relevant recommendations across five major categories: financial, waste elimination, utilization, process, and architecture. Finally, the Vega Cloud Platform enables customers to choose which recommendations to implement through automated processes and immediately receive cost benefits without massive amounts of work by end developers or app teams.

Vega is constantly updating and improving the platform by adding more recommendation types, deeper analytics, and easier automation to save time. When looking for an embedded analytics solution to bring these insights to customers, Vega looked for a tool that would allow us to keep up with our rapid growth and iterate quickly. With QuickSight, Vega has been able to scale from the proof of concept stage to enterprise-level analytics and visualizations as the company grows. QuickSight enables our product team to ship product quickly and rapidly test customer feedback and assumptions. Vega has tremendously reduced the time from idea to implementation for the analytics solutions by using QuickSight.

Using embedded QuickSight saved Vega 6–12 months of development time, allowing us to go to market sooner. Vega Cloud’s team of certified FinOps practitioners—a unique combination of finance professionals, architects, FinOps practitioners, educators, engineers, financial analysts, and data analysts with deep expertise in multi-cloud environments—can focus on driving business growth and meeting customer needs. QuickSight gives the Vega team one place to build reports and dashboards, allowing the Vega Cloud Platform to deliver data analytics to customers quickly and consistently. The Vega Cloud Platform uses QuickSight APIs to seamlessly onboard new users. In addition to the cost savings Vega Cloud has achieved by saving development time, QuickSight doesn’t require licensing or maintenance costs. The AWS pay-as-you-go pricing model allowed Vega Cloud to hit the ground running and scale with real-time demand.

Creating a powerful array of solutions for customers

By embedding QuickSight, Vega Cloud has been able to bring a wealth of information to cloud consumers, helping them gain value and efficiencies. For example, an enterprise customer in the energy industry engaged Vega for cloud cost optimization and saw monthly savings exceeding 25% with cumulative savings of over $1.36 million over 11 months. They moved from 24% Reserved Instance/Savings Plans (RI/SP) coverage to 53% coverage. Their optimization efforts led them to increase their cloud commit by 10 times over 5 years.

The Vega Cloud Platform has two SKUs that use embedded QuickSight: Vega Inform and Vega Optimize. Vega Inform is about cost allocation, chargebacks and showbacks, anomaly detection, spend analysis, and deep usage analytics. Vega Optimize is an easy-to-use set of dashboards to help customers better understand the optimization opportunities they have across their entire enterprise. In the Vega Inform SKU, the Vega Cloud Platform provides true multi-cloud cost reporting with cash and fiscal views and the ability to switch between them seamlessly. The Vega Cloud Platform is a curated data platform to ensure customers avoid garbage in/garbage out scenarios. Vega curates customer usage and billing data to verify billing rates, usage, and credit allocation, and then enables retroactive cleanups to historical spend.

Vega Optimize is a core piece of the Vega Cloud Platform and allows end-users to see cost-optimization recommendations with business context added using the embedded QuickSight dashboards. The Vega Cloud Platform enables end-users to self-manage and approve optimization recommendations for implementation—ensuring that businesses are taking the actions they need to better manage their cloud investments.

Vega customers can identify and act upon near-term optimization opportunities prioritized by business impact and level of effort, as well as identify, purchase, and track committed use resources. QuickSight enables end-users to easily filter down data to exactly what the user wants to see. Doing so enables questions to be answered more quickly, which ensures the right optimization takes place in a timely manner. The depth and breadth of data the Vega Cloud Platform consumes and surfaces to end-users via QuickSight provides customers with a platform approach to enabling FinOps within their organizations.

Powerful and dynamic QuickSight features

The Vega Cloud Platform ingests billions of lines of data on behalf of customers, which must be converted into actionable insight and personalized to a decision-maker’s role inside the organization. Processing terabytes of data can lead to delayed and infrequent reports, slowing down an organization’s ability to respond and compete in their respective markets. It is paramount that customers have consistent, dependable access to timely reports with the correct business context. QuickSight is powered by SPICE (Super-fast, Parallel, In-memory Calculation Engine), a robust in-memory engine that now supports up to 1 billion rows of data. Thanks to the Vega Cloud Platform’s implementation of QuickSight, Vega has offloaded the responsibility including engineering from customers to ingest and curate billions of rows of data every day. The Vega Cloud Platform uses role-based permissions, with row-level security from QuickSight, to centralize and tailor data to prioritize actionable insights with the ability to quickly investigate details to provide evidence-based decisions to customers.

QuickSight allows Vega to highlight data that is considered high-cost or high-value to its customers so that they can take quick action. This is accomplished by purpose-built dashboards based on over a decade of experience to ensure customers see the items that will be most impactful in their optimization efforts. The advanced visualizations, sorting, and filtering capabilities of QuickSight allow the Vega Cloud Platform to scale usage by multiple groups within a business, including finance, DevOps, IT and many others. Along with QuickSight, the Vega Cloud Platform uses many other AWS services, including but not limited to Amazon Relational Database Service (Amazon RDS), Amazon Redshift, Amazon Athena, AWS Glue, and more.

Scaling into the future with QuickSight

Vega is focused on continuing to provide customers with cloud intelligence at hyperscale using QuickSight. The Vega Cloud Platform roadmap includes a proof of concept for Amazon QuickSight Q, which would give customers the ability to ask questions in natural language and receive accurate answers with relevant visualizations that help users gain insights from the data. This also includes paginated reports, which makes it easier for customers to create, schedule, and share reports.

QuickSight has enabled Vega Cloud to grow rapidly, while saving time and money and delivering FinOps solutions to businesses in any and every vertical industry consuming cloud at scale.

To learn more about how you can embed customized data visuals and interactive dashboards into any application, visit Amazon QuickSight Embedded.

About the Authors

Kris Bliesner, CEO, Vega Cloud is a seasoned technology leader with over 25 years of experience in IT management, cloud computing, and consumer-based technology. As the co-founder and CEO of Vega Cloud, Kris continues to be at the forefront of revolutionizing cloud infrastructure optimization.

Mike Brown, CTO, Vega Cloud is a highly-skilled technology leader and co-founder of Vega Cloud, where he currently serves as the Chief Technology Officer (CTO). With a proven track record in driving technological innovation, Mike has been instrumental in shaping the application architecture and solutions for the company.

Amazon Mexico FP&A dives deep into financial data with Amazon QuickSight

2023-07-11 Gonzalo Lezma

Post Syndicated from Gonzalo Lezma original https://aws.amazon.com/blogs/big-data/amazon-mexico-fpa-dives-deep-into-financial-data-with-amazon-quicksight/

This is a guest post by Gonzalo Lezma from Amazon Mexico FP&A.

The Financial Planning and Analysis (FP&A) team in Mexico provides strategic support to Amazon’s CFO and executive team on planning, analysis, and reporting related to Amazon Mexico. We produce and manage key finance deliverables, such as internal profit and loss (P&L) reports for all business groups. We are also involved in planning processes, such as monthly forecast estimates, annual operating plans, and 3-year forecasts.

Our team needed to address five key challenges: manual recurrent reporting, managing data from different sources, moving away from large and slow spreadsheets, enabling ad hoc data insights extraction by business users, and variance analysis. To tackle these issues, we chose Amazon QuickSight for our business intelligence (BI) needs.

In this post, I discuss how QuickSight has enabled us to focus on financial and business analysis that helps drive business strategy.

Fully automating recurrent reporting

Creating and maintaining reports manually is time-consuming due to dense data granularity and multiple business groups and sub-products. This involves a lot of data to process and numerous stakeholders to please. Therefore, recurrent reporting requires allocating human hours to elaborate those reports, check the spreadsheet formulas, and rely on attention to detail to validate the numbers to report.

QuickSight dashboards show the Profit and Loss (P&L), which update as soon as databases refresh, simplifying recurrent reporting dramatically. There is no need for human intervention, which eliminates any risk of error during the elaboration of recurrent reporting. The maintenance and elaboration time has decreased from a week to zero for processes that are currently in QuickSight. We employ the QuickSight alerts feature (as shown in the following screenshot) to remain informed when specific metrics exceed a predefined threshold. This enables us to stay cognizant of significant changes in our P&L at a granular level.

Data from different sources

With new marketplaces and channels constantly emerging in Mexico, not all of them are integrated into the financial planning system; therefore, shadow P&Ls and reports are common and unavoidable. Therefore, the team has to find ways to track them without compromising accuracy or consistency, which poses significant additional challenges. Moreover, with multiple channels and teams reporting those numbers, it’s time-consuming to manually update the data source from every team we work with.

QuickSight can onboard data on channels and products that are relatively new and haven’t been onboarded to official planning systems. The team has numerous options to load data, including Amazon Redshift, CSV files, and Excel spreadsheets. There is virtually no limit on the granularity and scope of our reports.

Large and slow spreadsheets

Although spreadsheets are a popular tool for financial analysis, they have limitations for large and complex datasets. This affects performance, reliability, and validation. Spreadsheets become slow, bulky, and prone to errors, making it challenging to manage large datasets efficiently.

The SPICE (Super-fast, Parallel, In-memory Calculation Engine) in-memory engine that QuickSight uses is unparalleled, compared with other solutions the team tried in the past such as Tableau and Excel, eliminating the need for large spreadsheets dramatically. In addition to the time spent in elaborating the reports, the team was having a hard time reading and visualizing them.

The MX Financial Planning and Analysis Dashboard shown below shows the main contributors to the Gross Merchandise Sales for our business. If sales growth is at 20.92% as it says in the graph, we know that 9.09% is due to our NAFN channel. The graph at the bottom shows which products drove the sales increase.

Ad hoc data insights extraction

The finance space frequently requires ad hoc financial data on recent historic trends for a particular product and timespan. Given the number of channels, products, and scenarios the team works on, this creates a big problem to tackle. Extracting these data insights requires significant bandwidth, which can take away from other essential tasks the team needs to focus on.

Amazon QuickSight Q can answer any simple question about the data in a straightforward and nimble manner, allowing the team to handle ad hoc data insights using natural language requests. The following screenshot shows a graph we frequently get using Q to report shipping costs.

Variance analysis

Providing accurate and insightful variance analysis is a significant challenge for anyone working in financial analysis (for example, explaining price or profit per unit by separating mix effect and rate effect). Huge and difficult-to-understand spreadsheets might sound familiar to anyone who has tried to tackle this problem in the finance space.

With QuickSight URL actions (as shown in the following gif and screenshot), the team can right-click on the variance they want to dissect and link to another sheet with granular detail that has a decomposition of the main drivers explaining that particular variance, replacing the huge and cumbersome Excel variance analysis tool the team used to have.

Summary

All in all, the dynamic and interactive nature of the dashboards allows our internal users to go deeper into the data with just a click of the mouse. Now, building visualizations is intuitive, insightful, and fast. In fact, the whole solution and tools were built without the need of a dedicated BI team. In addition to this, we developed internal QuickSight dashboards to view our own customers’ QuickSight usage, so we have perfect visibility on which areas and users are more active and which features are more used by our partners.

With our QuickSight solution, we have automation, self-service, speedy reaction to requests, and flexibility.

To learn more about how QuickSight can help your business with dashboards, reports, and more, visit Amazon QuickSight.

About the Author

Gonzalo Lezma is the Mexico Finance Manager for the Amazon LATAM Finance Team. He is a lifelong learner, tech and data lover.

Automated Code Review on Pull Requests using AWS CodeCommit and AWS CodeBuild

2023-07-10 Verinder Singh

Post Syndicated from Verinder Singh original https://aws.amazon.com/blogs/devops/automated-code-review-on-pull-requests-using-aws-codecommit-and-aws-codebuild/

Pull Requests play a critical part in the software development process. They ensure that a developer’s proposed code changes are reviewed by relevant parties before code is merged into the main codebase. This is a standard procedure that is followed across the globe in different organisations today. However, pull requests often require code reviewers to read through a great deal of code and manually check it against quality and security standards. These manual reviews can lead to problematic code being merged into the main codebase if the reviewer overlooks any problems.

To help solve this problem, we recommend using Amazon CodeGuru Reviewer to assist in the review process. CodeGuru Reviewer identifies critical defects and deviation from best practices in your code. It provides recommendations to remediate its findings as comments in your pull requests, helping reviewers miss fewer problems that may have otherwise made into production. You can easily integrate your repositories in AWS CodeCommit with Amazon CodeGuru Reviewer following these steps.

The purpose of this post isn’t, however, to show you CodeGuru Reviewer. Instead, our aim is to help you achieve automated code reviews with your pull requests if you already have a code scanning tool and need to continue using it. In this post, we will show you step-by-step how to add automation to the pull request review process using your code scanning tool with AWS CodeCommit (as source code repository) and AWS CodeBuild (to automatically review code using your code reviewer). After following this guide, you should be able to give developers automatic feedback on their code changes and augment manual code reviews so fewer problems make it into your main codebase.

Solution Overview

The solution comprises of the following components:

AWS CodeCommit: AWS service to host private Git repositories.
Amazon EventBridge: AWS service to receive pullRequestCreated and pullRequestSourceBranchUpdated events and trigger Amazon EventBridge rule.
AWS CodeBuild: AWS service to perform code review and send the result to AWS CodeCommit repository as pull request comment.

The following diagram illustrates the architecture:

Figure 1: This architecture diagram illustrates the workflow where developer raises a Pull Request and receives automated feedback on the code changes using AWS CodeCommit, AWS CodeBuild and Amazon EventBridge rule

Figure 1. Architecture Diagram of the proposed solution in the blog

Developer raises a pull request against the main branch of the source code repository in AWS CodeCommit.
The pullRequestCreated event is received by the default event bus.
The default event bus triggers the Amazon EventBridge rule which is configured to be triggered on pullRequestCreated and pullRequestSourceBranchUpdated events.
The EventBridge rule triggers AWS CodeBuild project.
The AWS CodeBuild project runs the code quality check using customer’s choice of tool and sends the results back to the pull request as comments. Based on the result, the AWS CodeBuild project approves or rejects the pull request automatically.

Walkthrough

The following steps provide a high-level overview of the walkthrough:

Create a source code repository in AWS CodeCommit.
Create and associate an approval rule template.
Create AWS CodeBuild project to run the code quality check and post the result as pull request comment.
Create an Amazon EventBridge rule that reacts to AWS CodeCommit pullRequestCreated and pullRequestSourceBranchUpdated events for the repository created in step 1 and set its target to AWS CodeBuild project created in step 3.
Create a feature branch, add a new file and raise a pull request.
Verify the pull request with the code review feedback in comment section.

1. Create a source code repository in AWS CodeCommit

Create an empty test repository in AWS CodeCommit by following these steps. Once the repository is created you can add files to your repository following these steps. If you create or upload the first file for your repository in the console, a branch is created for you named main. This branch is the default branch for your repository. If you are using a Git client instead, consider configuring your Git client to use main as the name for the initial branch. This blog post assumes the default branch is named as main.

2. Create and associate an approval rule template

Create an AWS CodeCommit approval rule template and associate it with the code repository created in step 1 following these steps.

3. Create AWS CodeBuild project to run the code quality check and post the result as pull request comment

This blog post is based on the assumption that the source code repository has JavaScript code in it, so it uses jshint as a code analysis tool to review the code quality of those files. However, users can choose a different tool as per their use case and choice of programming language.

Create an AWS CodeBuild project from AWS Management Console following these steps and using below configuration:

Source: Choose the AWS CodeCommit repository created in step 1 as the source provider.
Environment: Select the latest version of AWS managed image with operating system of your choice. Choose New service role option to create the service IAM role with default permissions.
Buildspec: Use below build specification. Replace <NODEJS_VERSION> with the latest supported nodejs runtime version for the image selected in previous step. Replace <REPOSITORY_NAME> with the repository name created in step 1. The below spec installs the jshint package, creates a jshint config file with a few sample rules, runs it against the source code in the pull request commit, posts the result as comment to the pull request page and based on the results, approves or rejects the pull request automatically.

version: 0.2
phases:
  install:
    runtime-versions:
      nodejs: <NODEJS_VERSION>
    commands:
      - npm install jshint --global
  build:
    commands:
      - echo \{\"esversion\":6,\"eqeqeq\":true,\"quotmark\":\"single\"\} > .jshintrc
      - CODE_QUALITY_RESULT="$(echo \`\`\`) $(jshint .)"; EXITCODE=$?
      - aws codecommit post-comment-for-pull-request --pull-request-id $PULL_REQUEST_ID --repository-name <REPOSITORY_NAME> --content "$CODE_QUALITY_RESULT" --before-commit-id $DESTINATION_COMMIT_ID --after-commit-id $SOURCE_COMMIT_ID --region $AWS_REGION	
      - |
        if [ $EXITCODE -ne 0 ]
        then
          PR_STATUS='REVOKE'
        else
          PR_STATUS='APPROVE'
        fi
      - REVISION_ID=$(aws codecommit get-pull-request --pull-request-id $PULL_REQUEST_ID | jq -r '.pullRequest.revisionId')
      - aws codecommit update-pull-request-approval-state --pull-request-id $PULL_REQUEST_ID --revision-id $REVISION_ID --approval-state $PR_STATUS --region $AWS_REGION

Once the AWS CodeBuild project has been created successfully, modify its IAM service role by following the below steps:

Choose the CodeBuild project’s Build details tab.
Choose the Service role link under the Environment section which should navigate you to the CodeBuild’s IAM service role in IAM console.
Expand the default customer managed policy and choose Edit.
Add the following actions to the existing codecommit actions:

"codecommit:CreatePullRequestApprovalRule",
"codecommit:GetPullRequest",
"codecommit:PostCommentForPullRequest",
"codecommit:UpdatePullRequestApprovalState"

Choose Next.
On the Review screen, choose Save changes.

4. Create an Amazon EventBridge rule that reacts to AWS CodeCommit pullRequestCreated and pullRequestSourceBranchUpdated events for the repository created in step 1 and set its target to AWS CodeBuild project created in step 3

Follow these steps to create an Amazon EventBridge rule that gets triggered whenever a pull request is created or updated using the following event pattern. Replace the <REGION>, <ACCOUNT_ID> and <REPOSITORY_NAME> placeholders with the actual values. Select target of the event rule as AWS CodeBuild project created in step 3.

Event Pattern

{
    "detail-type": ["CodeCommit Pull Request State Change"],
    "resources": ["arn:aws:codecommit:<REGION>:<ACCOUNT_ID>:<REPOSITORY_NAME>"],
    "source": ["aws.codecommit"],
    "detail": {
      "isMerged": ["False"],
      "pullRequestStatus": ["Open"],
      "repositoryNames": ["<REPOSITORY_NAME>"],
      "destinationReference": ["refs/heads/main"],
      "event": ["pullRequestCreated", "pullRequestSourceBranchUpdated"]
    },
    "account": ["<ACCOUNT_ID>"]
  }

Follow these steps to configure the target input using the below input path and input template.

Input transformer – Input path

{
    "detail-destinationCommit": "$.detail.destinationCommit",
    "detail-pullRequestId": "$.detail.pullRequestId",
    "detail-sourceCommit": "$.detail.sourceCommit"
}

Input transformer – Input template

{
    "sourceVersion": <detail-sourceCommit>,
    "environmentVariablesOverride": [
        {
            "name": "DESTINATION_COMMIT_ID",
            "type": "PLAINTEXT",
            "value": <detail-destinationCommit>
        },
        {
            "name": "SOURCE_COMMIT_ID",
            "type": "PLAINTEXT",
            "value": <detail-sourceCommit>
        },
        {
            "name": "PULL_REQUEST_ID",
            "type": "PLAINTEXT",
            "value": <detail-pullRequestId>
        }
    ]
}

5. Create a feature branch, add a new file and raise a pull request

Create a feature branch following these steps. Push a new file called “index.js” to the root of the repository with the below content.

function greet(dayofweek) {
  if (dayofweek == "Saturday" || dayofweek == "Sunday") {
    console.log("Have a great weekend");
  } else {
    console.log("Have a great day at work");
  }
}

Now raise a pull request using the feature branch as source and main branch as destination following these steps.

6. Verify the pull request with the code review feedback in comment section

As soon as the pull request is created, the AWS CodeBuild project created in step 3 above will be triggered which will run the code quality check and post the results as a pull request comment. Navigate to the AWS CodeCommit repository pull request page in AWS Management Console and check under the Activity tab to confirm the automated code review result being displayed as the latest comment.

The pull request comment submitted by AWS CodeBuild highlights 6 errors in the JavaScript code. The errors on lines first and third are based on the jshint rule “eqeqeq”. It recommends to use strict equality operator (“===”) instead of the loose equality operator (“==”) to avoid type coercion. The errors on lines second, fourth and fifth are based on jshint rule “quotmark” which recommends to use single quotes with strings instead of double quotes for better readability. These jshint rules are defined in AWS CodeBuild project’s buildspec in step 3 above.

Figure 2: The image shows the AWS CodeCommit pull request's Activity tab with code review results automatically posted by the automated code reviewer

Figure 2. Pull Request comments updated with automated code review results.

Conclusion

In this blog post we’ve shown how using AWS CodeCommit and AWS CodeBuild services customers can automate their pull request review process by utilising Amazon EventBridge events and using their own choice of code quality tool. This simple solution also makes it easier for the human reviewers by providing them with automated code quality results as input and enabling them to focus their code review more on business logic code changes rather than static code quality issues.

About the authors

Validating attestation documents produced by AWS Nitro Enclaves

2023-07-06 maceneff

Post Syndicated from maceneff original https://aws.amazon.com/blogs/compute/validating-attestation-documents-produced-by-aws-nitro-enclaves/

This blog post is written by Paco Gonzalez Senior EMEA IoT Specialist SA.

AWS Nitro Enclaves offers an isolated, hardened, and highly constrained environment to host security-critical applications. Think of AWS Nitro Enclaves as regular Amazon Elastic Compute Cloud (Amazon EC2) virtual machines (VMs) but with the added benefit of the environment being highly constrained.

A great benefit of using AWS Nitro Enclaves is that you can run your software as if it was a regular EC2 instance, but with no persistent storage and limited access to external systems. The only way to communicate with AWS Nitro Enclaves is using a VSOCK socket. This special type of communication mechanism acts as an isolated communication channel between the parent EC2 instance and AWS Nitro Enclaves.

Fig 1 – AWS Nitro Enclaves uses the proven isolation of the Nitro Hypervisor to further isolate the CPU and memory of the Nitro Enclaves from users, applications, and libraries on the parent instance.

AWS Nitro Enclaves comes with a custom Linux device called the Nitro Security Module (NSM), which is accessible via /dev/nsm. This device provides attestation capability to the Nitro Enclaves. The attestation comes in the form of an attestation document. The attestation document makes it easy and safe to build trust between systems that interact with the Nitro Enclaves. The external system must have a mechanism to process the attestation document to determine the validity of the attestation document.

In this post, I go through the anatomy of an attestation document produced by the NSM API. I then show you an example of how to perform different validations that help determine the accuracy of an attestation document produced by the AWS Nitro Enclaves Security Module. I use syntactic and semantic validations to check for the attestation document’s correctness before proceeding with a cryptographic validation of the contents of the document’s payload. The examples used in this post use the C language. Look at the companion repository available in GitHub for access to the all source code used in this post.

Anatomy of an attestation document produced by AWS Nitro Enclaves

The attestation document uses the Concise Binary Object Representation (CBOR) format to encode the data. The CBOR object is wrapped using the CBOR Object Signing and Encryption (COSE) protocol. The COSE format used is a single-signer data structure called “COSE_Sign1”. The object is comprised of headers, the payload, and a signature.

For more information about COSE, see RFC 8152: CBOR Object Signing and Encryption (COSE). For more information about CBOR, see RFC 8949 Concise Binary Object Representation (CBOR).

We published a library to make it easy to interact with the NSM. The library contains helpers which your application, running on the Nitro Enclaves, can use to communicate with the NSM device.

Here is the minimum code needed to generate an attestation document:

#include <stdlib.h>
#include <stdio.h>
#include <nsm.h>

#define NSM_MAX_ATTESTATION_DOC_SIZE (16 * 1024)

int main(void) {

    /// NSM library initialization function.  
    /// *Returns*: A descriptor for the opened device file.

    int nsm_fd = nsm_lib_init();
    if (nsm_fd < 0) {
        exit(1);
    }

    /// NSM `GetAttestationDoc` operation for non-Rust callers.  
    /// *Argument 1 (input)*: The descriptor to the NSM device file.  
    /// *Argument 2 (input)*: User data.  
    /// *Argument 3 (input)*: The size of the user data buffer.  
    /// *Argument 4 (input)*: Nonce data.  
    /// *Argument 5 (input)*: The size of the nonce data buffer.  
    /// *Argument 6 (input)*: Public key data.  
    /// *Argument 7 (input)*: The size of the public key data buffer.  
    /// *Argument 8 (output)*: The obtained attestation document.  
    /// *Argument 9 (input / output)*: The document buffer capacity (as input)
    /// and the size of the received document (as output).  
    /// *Returns*: The status of the operation.

    int status;
    uint8_t att_doc_buff[NSM_MAX_ATTESTATION_DOC_SIZE];
    uint32_t att_doc_cap_and_size = NSM_MAX_ATTESTATION_DOC_SIZE;

    status = nsm_get_attestation_doc(nsm_fd, NULL, 0, NULL, 0, NULL, 0, att_doc_buff, 
                                    &att_doc_cap_and_size);
    if (status != ERROR_CODE_SUCCESS) {
        printf("[Error] Request::Attestation got invalid response: %s\n",status);
        exit(1);
    }

    printf("########## attestation_document_buff ##########\r\n");
    for(int i=0; i<att_doc_cap_and_size; i++)
        fprintf(stdout, "%02X", att_doc_buff[i]);

    exit(0);
}

To produce a sample attestation document, initialize the device, call the function ‘nsm_get_attestation_doc’ inside the AWS Nitro Enclaves, and dump the contents. The library is written using Rust, but it contains
bindings for C. You can read more about the library and some of the other relevant capabilities
here.

The COSE headers contain a protected and an un-protected data section. The cryptographic algorithm used for the signature is specified inside the protected area. AWS Nitro Enclaves use a 384-bit elliptic curve algorithm (P-384) to sign attestation documents. AWS Nitro Enclaves do not use the unprotected data field so it is always left blank.

The payload contains fixed parameters that include the following: information about the issuing NSM, a timestamp of the issuing event, a map of all the locked Platform Configuration Registers (PCRs) at the moment the attestation document was generated, the hashing algorithm used to produce the digest that was used to calculate the PCR values – AWS Nitro Enclaves use a 384 bit secure hashing algorithm (SHA384), a x509 certificate signed by AWS Nitro Enclaves’ Private Public Key Infrastructure (PKI). An AWS Nitro Enclaves certificate expires three hours after it has been issued. The common name (CN) contains information about the issuing NSM – and finally the issuing Certificate Authority (CA) bundle. The payload also contains optional parameters that a third-party application can use to create custom authentication and authorization workflows. The optional parameters are: a public key, a cryptographic nonce, and additional arbitrary data.

Finally, the signature is the result of a signing operation using the private key related to the public key contained inside the certificate that is part of the payload.

Fig 2. An attestation document is generated and signed by the Nitro Hypervisor. It contains information about the Nitro Enclaves and it can be used by an external service to verify the identity of Nitro Enclaves and to establish trust. You can use the attestation document to build your own cryptographic attestation mechanisms.

Syntactical validation

Early validation of the attestation document format makes sure that only documents that conform to the expected structure are processed in subsequent steps.

I start by attempting to decode the CBOR object and testing to see if it corresponds to a COSE object signed with one signer or ‘COSE_Sign1’ structure. This can be easily done by looking at the most significant first three bits (MSB) of the first byte – I am expecting a stream of CBOR bytes (decimal 6). Then, I take the least significant (LSB) remaining five bits of the first byte – I am expecting a tag that tells me it is a COSE_Sign1 object (decimal 18).

assert(att_doc_buff[0*] == 6 <<5 | 18); // 0xD2

* Note that the time of writing, the NSM does not include the COSE tag and thus this validation cannot be made and is mentioned in this post for informational purposes only. However, it is important to keep this in mind, as the tag is part of the standard, and the NSM device or library could include it in the future.*

The next step is to parse the actual CBOR object. A COSE_Sign1 object is an array of size 4 (protected headers, un protected headers, payload, and signature). Therefore, I must check that the next three MSB correspond to Type 4 (array) and that the size is exactly 4.

assert(att_doc_buff[0] == 4 <<5 | 4); // 0x84

The next byte determines what the first CBOR item of the array looks like. I am expecting the protected COSE header as the first item of the array. The CBOR field should indicate that the contents of the item are of a Type 2 (raw bytes) and the size should be exactly 4.

assert(att_doc_buff[1] == 2 <<5 | 4); // 0x44

The next four bytes represent the protected header. The contents of this item is a regular CBOR object. The object should contain a Type 5 (map) with a single item (1). The item first key is expected to be the number 1. The first three MSB of the first byte should be a Type 1 (negative integer). The remaining five LSB should indicate that the value is an 8-bit number (decimal 24). The last byte should be negative 35 as it maps to the P-384 curve that Nitro Enclaves use. Note that CBOR negative numbers are stored minus 1.

assert(att_doc_buff[2] == 5 <<5 | 1); // 0xA1
assert(att_doc_buff[3] == 0x01); // 0x01
assert(att_doc_buff[4] == 1 <<5 | 24); // 0x38
assert(att_doc_buff[4] == 35-1); // 0x22

The next byte corresponds to the unprotected header. AWS Nitro Enclaves do not use unprotected headers. Therefore, the expected is a Type 5 (map) with zero items.

assert(att_doc_buff[6] == 5 <<5 | 0); // 0xA0

Now that I am done inspecting the headers, I can move onto the payload. The CBOR object used for the payload is Type 2 (raw bytes). This time we are expecting a large steam of bytes. The remaining five LSB are used to indicate the data type used to indicate the size of the byte stream (i.e. 8-bit, 16-bit). AWS Nitro Enclaves attestation documents are about 5 KiB without using any of the three optional parameters. The optional parameters have a size limit of 1 KiB each. This means that it would be highly unlikely for the buffer to be larger than a 16-bit number (CBOR short count: 25).

assert(att_doc_buff[8] == 2 <<5 | 25); // 0x59

The next two bytes represent the size of the payload which I am going to skip those for now, as the contents of the payload are validated in subsequent steps. I’ll move onto the final portion of the attestation document: the signature. The signature has to be a Type 2 (raw bytes) of exactly 96 bytes.

    uint16_t payload_size = att_doc_buff[8] << 8 | att_doc_buff[9];
    assert(att_doc_buff[9+payload_size+1] == (2<<5 | 24));   // 0x58
    assert(att_doc_buff[9+payload_size+1+1] == 96);         // 0x60

At this point, I have validated that the data produced by the NSM looks the way it should. My application is ready to start looking into the contents of the attestation document.

I want to make sure that the document contains all mandatory fields and I can check that the fields have the right structure and their sizes are within the expected boundaries. I have evidence that the data looks the way it should, so I am ready to use an off-the-shelf CBOR library to make the validation process easier instead of doing it by hand.

Here is an example of how to load a CBOR object using libcbor and standard C libraries to check the contents. I am showing just one example to illustrate the process. Refer to the section ‘Verifying the root of trust’ in the AWS Nitro Enclaves User Guide for a detailed description of each parameter and the validations that your application should perform to make sure that the document is valid.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

#include <cbor.h>
#include <openssl/ssl.h>

#define APP_X509_BUFF_LEN                   (1024*2)
#define APP_ATTDOC_BUFF_LEN                 (1024*10)

void output_handler(char * msg){
    fprintf(stdout, "\r\n%s\r\n", msg);
}

void output_handler_bytes(uint8_t * buffer, int buffer_size){    
    for(int i=0; i<buffer_size; i++)
        fprintf(stdout, "%02X", buffer[i]);
    fprintf(stdout, "\r\n");
}

int read_file( unsigned char * file, char * file_name, size_t elements) {
    FILE * fp; size_t file_len = 0;
    fp = fopen(file_name, "r");
    file_len = fread(file, sizeof(char), elements, fp);
    if (ferror(fp) != 0 ) {
        fputs("Error reading file", stderr);
    } 
    fclose(fp);
    return file_len; 
}  

int main(int argc, char* argv[]) {

    // STEP 0 - LOAD ATTESTATION DOCUMENT

    // Check inputs, expect two
    if (argc != 3) { 
        fprintf(stderr, "%s\r\n", "ERROR: usage: ./main {att_doc_sample.bin} {AWS_NitroEnclaves_Root-G1.pem}"); exit(1);
    }

    // Load file into buffer, use 1st argument
    unsigned char * att_doc_buff = malloc(APP_ATTDOC_BUFF_LEN);
    int att_doc_len = read_file(att_doc_buff, argv[1], APP_ATTDOC_BUFF_LEN );

    // STEP 1 - SYTANCTIC VALIDATON

    // Check COSE TAG (skipping - not currently implemented by AWS Nitro Enclaves)
    // assert(att_doc_buff[0] == 6 <<5 | 18); // 0xD2
    // Check if this is an array of exactly 4 items
    assert(att_doc_buff[0] == (4<<5 | 4));      // 0x84
    // Check if next item is a byte stream of 4 bytes
    assert(att_doc_buff[1] == (2<<5 | 4));      // 0x44
    // Check is fist item if byte stream is a map with 1 item
    assert(att_doc_buff[2] == (5<<5 | 1));      // 0xA1
    // Check that the first key of the map is 0x01
    assert(att_doc_buff[3] == 0x01);            // 0x01
    // Check that value of the the first key of the map is -35 (P-384 curve)
    assert(att_doc_buff[4] == (1 <<5 | 24));    // 0x38
    assert(att_doc_buff[5] == 35-1);            // 0x22
    // Check that next item is a map of 0 items
    assert(att_doc_buff[6] == (5<<5 | 0));      // 0xA0
    // Check that the next item is a byte stream and the size is a 16-bit number (dec. 25)
    assert(att_doc_buff[7] == (2<<5 | 25));     // 0x59
    // Cast the 16-bit number
    uint16_t payload_size = att_doc_buff[8] << 8 | att_doc_buff[9];
    // Check that the item after the payload is a byte stream and the size is 8-bit number (dec. 24)
    assert(att_doc_buff[9+payload_size+1] == (2<<5 | 24));   // 0x58
    // Check that the size of the signature is exactly 96 bytes
    assert(att_doc_buff[9+payload_size+1+1] == 96);         // 0x60

    // Parse buffer using library
    struct cbor_load_result ad_result;
    cbor_item_t * ad_item = cbor_load(att_doc_buff, att_doc_len, &ad_result);
    free(att_doc_buff); // not needed anymore

    // Parse protected header -> item 0 
    cbor_item_t * ad_pheader = cbor_array_get(ad_item, 0); 
    size_t ad_pheader_len = cbor_bytestring_length(ad_pheader);

    // Parse signed bytes -> item 2 (skip un-protected headers as they are always empty)
    cbor_item_t * ad_signed = cbor_array_get(ad_item, 2);
    size_t ad_signed_len = cbor_bytestring_length(ad_signed);

    // Load signed bytes as a new CBOR object
    unsigned char * ad_signed_d = cbor_bytestring_handle(ad_signed);
    struct cbor_load_result ad_signed_result;
    cbor_item_t * ad_signed_item = cbor_load(ad_signed_d, ad_signed_len, &ad_signed_result);

    // Create the pair structure
    struct cbor_pair * ad_signed_item_pairs = cbor_map_handle(ad_signed_item);

    // Parse signature -> item 3
    cbor_item_t * ad_sig = cbor_array_get(ad_item, 3); 
    size_t ad_sig_len = cbor_bytestring_length(ad_sig);
    unsigned char * ad_sig_d = cbor_bytestring_handle(ad_sig);

    // Example 01: Check that the first item's key is the string "module_id" and that is not empty
    size_t module_k_len = cbor_string_length(ad_signed_item_pairs[0].key);
    unsigned char * module_k_str = realloc(cbor_string_handle(ad_signed_item_pairs[0].key), module_k_len+1); //null char
    module_k_str[module_k_len] = '\0';
    size_t module_v_len = cbor_string_length(ad_signed_item_pairs[0].value);
    unsigned char * module_v_str = realloc(cbor_string_handle(ad_signed_item_pairs[0].value), module_v_len+1); //null char
    module_v_str[module_v_len] = '\0';
    assert(module_k_len != 0);
    assert(module_v_len != 0);

    // Example 02: Check that the module id key is actually the string "module_id"
    assert(!strcmp("module_id",(const char *)module_k_str));

    // Example 03: Check that the signature is exactly 96 bytes long
    assert(ad_sig_len == 96);

    // Example 04: Check that the protected header is exactly 4 bytes long
    assert(ad_pheader_len == 4);

Semantic validation

The next step is to look at the data contained in the attestation document and check if it conforms to pre-defined business rules. The attestation document contains a certificate that was signed by the AWS Nitro Enclaves’ PKI. This validation it is important, as it proves that the document was signed by the AWS Nitro Enclaves’ PKI.

The signature of an x509 certificate is based on the certificate’s payload digest. Validating this signature means that I trust the information contained within the certificate, including the public key which I can later use to validate the attestation document itself. Furthermore, the information in the document contains details about the NSM module and a timestamp. Passing this check provides the assurances I need to trust that the document originated from my software running on AWS Nitro Enclaves at a specific time.

Fig 3. The attestation document contains a x.509 certificate that was signed by the AWS Nitro Enclaves’ PKI.

Here is an example of how I use the AWS Nitro Enclaves’ Private PKI root certificate from an external file. Then, use the CA bundle contained in the attestation document to validate the authenticity of the certificate contained in the document. In this example, I am using the OpenSSL library.

// STEP 2 -  SEMANTIC VALIDATION

    // Load AWS Nitro Enclave's Private PKI root certificate
    unsigned char * x509_root_ca = malloc(APP_X509_BUFF_LEN);
    int x509_root_ca_len = read_file(x509_root_ca, argv[2], APP_X509_BUFF_LEN );
    BIO * bio = BIO_new_mem_buf((void*)x509_root_ca, x509_root_ca_len);
    X509 * caX509 = PEM_read_bio_X509(bio, NULL, NULL, NULL);
    if (caX509 == NULL) {
        fprintf(stderr, "%s\r\n", "ERROR: PEM_read_bio_X509 failed"); exit(1);
    }
    free(x509_root_ca); free(bio);
    // Create CA_STORE
    X509_STORE * ca_store = NULL;
    ca_store = X509_STORE_new();
    /* ADD X509_V_FLAG_NO_CHECK_TIME FOR TESTING! TODO REMOVE */
    X509_STORE_set_flags (ca_store, X509_V_FLAG_NO_CHECK_TIME);
    if (X509_STORE_add_cert(ca_store, caX509) != 1) {
        fprintf(stderr, "%s\r\n", "ERROR: X509_STORE_add_cert failed"); exit(1);
    }
    // Add certificates to CA_STORE from cabundle
    // Skip the first one [0] as that is the Root CA and we want to read it from an external source
    for (int i = 1; i < cbor_array_size(ad_signed_item_pairs[5].value); ++i){ 
        cbor_item_t * ad_cabundle = cbor_array_get(ad_signed_item_pairs[5].value, i); 
        size_t ad_cabundle_len = cbor_bytestring_length(ad_cabundle);
        unsigned char * ad_cabundle_d = cbor_bytestring_handle(ad_cabundle);
        X509 * cabnX509 = X509_new();
        cabnX509 = d2i_X509(&cabnX509, (const unsigned char **)&ad_cabundle_d, ad_cabundle_len);
        if (cabnX509 == NULL) {
            fprintf(stderr, "%s\r\n", "ERROR: d2i_X509 failed"); exit(1);
        }
        if (X509_STORE_add_cert(ca_store, cabnX509) != 1) {
            fprintf(stderr, "%s\r\n", "ERROR: X509_STORE_add_cert failed"); exit(1);
        }
    }

    // Load certificate from attestation dcoument - this a certificate that we don't trust (yet)
    size_t ad_signed_cert_len = cbor_bytestring_length(ad_signed_item_pairs[4].value);
    unsigned char * ad_signed_cert_d = realloc(cbor_bytestring_handle(ad_signed_item_pairs[4].value), ad_signed_cert_len);
    X509 * pX509 = X509_new();
    pX509 = d2i_X509(&pX509, (const unsigned char **)&ad_signed_cert_d, ad_signed_cert_len);
    if (pX509 == NULL) {
        fprintf(stderr, "%s\r\n", "ERROR: d2i_X509 failed"); exit(1);
    }
    // Initialize X509 store context and veryfy untrusted certificate
    STACK_OF(X509) * ca_stack = NULL;
    X509_STORE_CTX * store_ctx = X509_STORE_CTX_new();
    if (X509_STORE_CTX_init(store_ctx, ca_store, pX509, ca_stack) != 1) {
        fprintf(stderr, "%s\r\n", "ERROR: X509_STORE_CTX_init failed"); exit(1);
    }
    if (X509_verify_cert(store_ctx) != 1) {
        fprintf(stderr, "%s\r\n", "ERROR: X509_verify_cert failed"); exit(1);
    }
    fprintf(stdout, "%s\r\n", "OK: ########## Root of Trust Verified! ##########");

Having proof that the certificate was signed by the expected CA is just the beginning. I also want to make sure that the contents of the certificate are correct. This involves checking that the certificate has not expired, as well as making sure that the critical extensions contain correct information to name a few.

Cryptographic validation

The syntactic validation helped me determine that the attestation document has the right shape, and the sematic validation helped me determine if the document meets my business rules. However, I still don’t know for sure if the document is valid.

The attestation document contains critical information, such as PCRs and the AWS Identity Access and Management (IAM) role among other details. I can safely use these two values in my authentication or authorization workflows if I can prove that they are trustworthy.

The attestation document was signed using a private key that is never exposed. However, the corresponding public key is contained within the certificate that was issued and stored within the attestation document. I know I can trust the contents of this certificate because I have proof that the certificate was signed by an entity that I trust.

Here is an example where I cryptographically prove that all the protected contents of the attestation document are related to the public key contained in the certificate. To validate the COSE signature, I must first recreate the original message that was used during the signature operation – COSE uses a specific format. Then, I use OpenSSL to check if there is a match between the message, signature, and public key. If the signature checks, then I can trust the contents of the already semantically-verified payload.

 // STEP 3 - CRYPTOGRAPHIC VALIDATION

    #define SIG_STRUCTURE_BUFFER_S (1024*10)
    // Create new empty key
    EVP_PKEY * pkey = EVP_PKEY_new();
    // Create a new eliptic curve object using P-384 curve
    EC_KEY * ec_key = EC_KEY_new_by_curve_name(NID_secp384r1);
    // Reference the public key stucture and eliptic curve object with each other
    EVP_PKEY_assign_EC_KEY(pkey, ec_key);
    // Load the public key from the attestation document (we trust it now)
    pkey = X509_get_pubkey(pX509);
    if (pkey == NULL) {
        fprintf(stderr, "%s\r\n", "ERROR: X509_get_pubkey failed"); exit(1);
    }
    // Allocate, initialize and return a digest context
    EVP_MD_CTX * ctx = EVP_MD_CTX_create();
    // Set up verification context
    if (EVP_DigestVerifyInit(ctx, NULL, EVP_sha384(), NULL, pkey) <= 0) {
        fprintf(stderr, "%s\r\n", "ERROR: EVP_DigestVerifyInit failed"); exit(1);
    }
    // Recreate COSE_Sign1 structure, and serilise it into a buffer
    cbor_item_t * cose_sig_arr = cbor_new_definite_array(4);
    cbor_item_t * cose_sig_arr_0_sig1 = cbor_build_string("Signature1"); 
    cbor_item_t * cose_sig_arr_2_empty = cbor_build_bytestring(NULL, 0);

    assert(cbor_array_push(cose_sig_arr, cose_sig_arr_0_sig1));
    assert(cbor_array_push(cose_sig_arr, ad_pheader));
    assert(cbor_array_push(cose_sig_arr, cose_sig_arr_2_empty));
    assert(cbor_array_push(cose_sig_arr, ad_signed));

    unsigned char sig_struct_buffer[SIG_STRUCTURE_BUFFER_S];
    size_t sig_struct_buffer_len = cbor_serialize(cose_sig_arr, sig_struct_buffer, SIG_STRUCTURE_BUFFER_S);
    // Hash message and load it into the verificaiton context
    if (EVP_DigestVerifyUpdate(ctx, sig_struct_buffer, sig_struct_buffer_len) <= 0) {
        fprintf(stderr, "%s\r\n", "ERROR: nEVP_DigestVerifyUpdate failed"); exit(1);
    }
    // Create R and V BIGNUM structures
    BIGNUM * sig_r = BN_new(); BIGNUM * sig_v = BN_new();
    BN_bin2bn(ad_sig_d, 48, sig_r); BN_bin2bn(ad_sig_d + 48, 48, sig_v);
    // Allocate an empty ECDSA_SIG structure
    ECDSA_SIG * ec_sig = ECDSA_SIG_new();
    // Set R and V values
    ECDSA_SIG_set0(ec_sig, sig_r, sig_v);
    // Convert R and V values into DER format
    int sig_size = i2d_ECDSA_SIG(ec_sig, NULL);
    unsigned char * sig_bytes = malloc(sig_size); unsigned char * p;
    memset_s(sig_bytes,sig_size,0xFF, sig_size);
    p = sig_bytes;
    sig_size = i2d_ECDSA_SIG(ec_sig, &p);
    // Verify the data in the context against the signature and get final result
    if (EVP_DigestVerifyFinal(ctx, sig_bytes, sig_size) != 1) {
        fprintf(stderr, "%s\r\n", "ERROR: EVP_DigestVerifyFinal failed"); exit(1);
    } else {
        fprintf(stdout, "%s\r\n", "OK: ########## Message Verified! ##########"); 
        free(sig_bytes);
        exit(0);
    }
    //#endif

    exit(1);

}

Conclusion

In this post, I went through a detailed examination of attestation documents produced by the AWS Nitro Enclaves. Then, I went over different types of validations (syntactic, semantic, and cryptographic) that safely help determine if an attestation document should be trusted. I’ve also included access to a public repository that contains the source code used in this post. New AWS Nitro Enclaves users can use it as a starting point when looking to integrate their applications with AWS Nitro Enclaves and build highly secure and confidential solutions.

How To Build an Email Service on SES

2023-06-30 tweirjon

Post Syndicated from tweirjon original https://aws.amazon.com/blogs/messaging-and-targeting/how-to-build-an-email-service-on-ses/

Foundations

Amazon Simple Email Service (SES) handles hundreds of billions of email messages every month. While many are outbound, one of the fastest-growing parts of the business is for inbound traffic. Customers send and receive email via SES using a combination of public SMTP interfaces and the SES SDK. Traditionally, most customers used SES alongside their existing corporate mail systems, but did you know it’s possible to build a complete email service with SES at its core? In fact, it’s already been done – it’s known as Amazon WorkMail, and it provides mailbox and calendar services to tens of thousands of customers (and millions of mailboxes) around the world.

Ingredients for Success

Email transport depends on a few core components. First of all, you have to be a reputable sender, or the receiving email systems are going to reject anything you try to send. You also have to be insulated against spurious reports of abuse, so that one bad apple can’t take down the entire service for everyone. The solution for both of those issues is the same: have an enormous number of public Mail Transfer Agents (MTAs), and manage their IP reputations actively. If someone reports spam coming from one of those IPs, and it gets added to a block list somewhere on the internet, you have to have a rapid response mechanism to engage with the block list operator and take their prescribed steps to clean up the entry.

The Highest Standards of Security

Similarly, you have to consult those same block lists when mail is sent to your own systems from anywhere on the internet. Inbound email is subjected to a variety of authentication steps before it’s released for delivery to a destination. Quality providers will leverage checks called SPF (Sender Policy Framework) and DKIM (Domain Keys Identified Mail). SPF is designed to prevent malicious senders from masquerading as other domains, and DKIM enables a receiving system to validate the authenticity of the sender and to confirm it hasn’t been manipulated while in transit. If either of these checks fail, a receiving system may take action ranging from dropping the message entirely, to flagging it as suspicious but still delivering it to the user’s inbox. A third security control, DMARC (Domain-based Message Authentication, Reporting, and Conformance) takes SPF and DKIM outputs and generates a series of instructions for receiving mailbox providers about what to do with questionable mail. Any serious provider will support these mechanisms and provide visibility into their actual performance on your email.

Amazon WorkMail’s Interface with SES

Once you’ve got clean email and reputable senders or recipients, you have to be able to figure out where to deliver the message itself. SES Inbound has a specific internal action when used with WorkMail, where the message is routed to WorkMail’s own infrastructure for matching against a known user’s inbox and performing the indexing and storage operations necessary to make it show up in your desktop, web, or mobile mail client. There are a number of options which may take place while that message is in transit, however, and the SES framework supports those with its flexible routing options. For example, a very popular choice is for customers to trigger a transport rule powered by AWS Lambda for inbound and/or outbound messages. Some of these are simple – they append a standard banner to the message if it is inbound from an external source, for example – but there is really no limit to what programmatic steps can be taken. You could submit message content to a large language model (LLM) for training or inspection. You could examine its use of language with AWS Bedrock to train a foundational model in generative AI about how to write emails itself. WorkMail and SES support and encourage these kind of big ideas for working with your message content.

Managing Spikes and Growth

Another critical advantage SES provides is the ability to absorb huge spikes in inbound traffic, and to sustain very large permitted volumes of outbound traffic as well. Email’s underlying standards and protocols offer administrators some degree of control over delays in transit, by implementing retry intervals to buffer messages if they can’t be delivered immediately. The classic on-premise enterprise use case, however, still runs the risk of overwhelming the capacity of the (single) mail server, either due to a malicious action by a sender or a huge increase in usage over a very short period of time. SES absorbs those spikes automatically and has orders of magnitude more capacity than any typical on-premise deployment, meaning that your mail enjoys multiple tiers of buffering only when required, and with no introduced latency if buffering is unnecessary.

Putting it All Together

So how does it all work together? The inbound use case is our main focus. When a message arrives via SMTP, SES first interrogates a back-end directory to confirm that the message is destined for an SES customer. If so, it looks up how the customer’s domain is configured, or if it is a WorkMail customer domain. From there the message passes through the SES message scanner, where its content is evaluated for spam or malware, and a scoring indicator is added to the message headers. That score may result in the message being dropped altogether, or it may result in the message ultimately being delivered to a Junk Mail folder in a WorkMail mailbox. Once scored, the message is either stored in the customer’s S3 storage, or delivered to WorkMail for further processing, such as being put in a specific folder, or redirected to another recipient. Once it’s stored somewhere, the customer can interact with it either using SES APIs, or via standard mail clients interacting with a WorkMail mailbox. In practice a mailbox is a structured object format also within S3, but without raw S3 access because the storage is managed as a system resource within WorkMail instead of being owned by an end customer.

The Customer Experience

When a WorkMail customer wants to send a message, they compose it in a mail client and then click ‘Send’ to send it via SMTP. In the outbound case WorkMail relays the message to SES internet-facing mail relays, which in turn look up the recipient domain information for details on how to route it. SES mail relays also perform the necessary security and authentication checks to ensure that the message is sent by a valid user (either SES native or WorkMail) and that the content is cryptographically signed so a receiving system can verify it hasn’t been manipulated in transit, using the DKIM mechanism described previously. When those steps are complete, the message is handed off to the next mail relay on the internet, and SES has no further role in its future unless a receiving system flags it as abusive. In that case the feedback is delivered to SES automatically and a series of containment actions are considered based on the nature and history of abuse reports. Thus the feedback loop to IP reputation is maintained even in the case of a rogue actor sending bad mail.

Robust Tooling Makes Email Look Easy

The bottom line is that SES enables these flows, and a customer wanting to build a comprehensive mail system could do so themselves if they didn’t want to use WorkMail or another existing email service provider. We’ve seen a tremendous range of creative solution-building from customers when they combine SES inbound and outbound mail, a subset of WorkMail mailboxes and their own rules and organization policies, the use of AWS Lambdas, and inline email security gateways. The flexibility to build whatever you need, without being tied to a single product vendor, is what makes SES so popular with its customers, and ensures that WorkMail – as a turnkey mail service – works so reliably for those customers who just need their mail and calendar to work.

Position2’s Arena Calibrate helps customers drive marketing efficiency with Amazon QuickSight Embedded

2023-06-30 Vinod Nambiar

Post Syndicated from Vinod Nambiar original https://aws.amazon.com/blogs/big-data/position2s-arena-calibrate-helps-customers-drive-marketing-efficiency-with-amazon-quicksight-embedded/

This is a guest post by Vinod Nambiar from Position².

Position² is a leading US-based growth marketing services provider focused on data-driven strategy and technology to deliver growth with improved return on investment (ROI).

Position² was established in 2006 in Silicon Valley and has a clientele spanning American Express, Lenovo, Fujitsu, and Thales. We work with clients ranging from VC-funded startups to Fortune 500 firms. Our 200-member team is based in the US and Bangalore, comprising marketing and software experts, engineers, data scientists, client management, creative writers, and designers. The team brings deep domain expertise in digital, B2B, B2C, analytics, technology, mobile, marketing automation, and UX/UI domain. Our integrated campaigns are powered by cutting-edge content creation, digital advertising, web design/development, marketing automation, and analytics.

We have built two software products to help our customers drive digital marketing success and growth:

Arena is an easy-to-use, intuitive platform that provides clients with a single interface to engage with Position². It includes project management, process workflows, collaboration, and performance tracking, helping deliver superior customer experience, transparency, and real-time data.
Arena Calibrate is a customizable digital marketing dashboard that helps marketers track their cross-platform performance at a glance, saving them hours of manual work. Its machine learning (ML) engine provides automated insights to improve campaign performance and ROI.

In this post, we share how Arena Calibrate helps our customers drive marketing efficiency using Amazon QuickSight Embedded.

Beyond services to a data automation and reporting platform

Very early on, we realized that in order to move the needle on marketing efficiency, we needed to go beyond marketing services and help clients master their data analytics and reporting.

Marketing data challenges are real, and we previously faced them every day across our agency. We’ve used a wide variety of tools in the market, but most of them ultimately expect the user to spend significant energy performing time-consuming analytics activities, like configuring data/platform connectors and building datasets. After all that, we were still missing out on proactive analysis that identifies trends and uncovers optimization opportunities.

We know there are many businesses out there facing similar challenges. This led us to build Arena Calibrate by using the best ML algorithms, data connectors, and business intelligence (BI) platforms.

Arena Calibrate helps marketers leap ahead with the best tech stack available without breaking their budget.

In-depth insights in time: Enter Amazon QuickSight

The challenge was to get data from multiple platforms into one place accurately, automatically, and easily. Each platform has data in different formats. We wanted to help customers answer questions such as, “How do you track your customers’ journey through the funnel? Can you leverage advanced BI techniques to get meaningful insights when you have the data?”

We evaluated a range of products before zeroing in on Amazon QuickSight, a unified, serverless BI service enabling organizations to deliver data-driven insights to all users.

The three primary reasons we selected QuickSight were:

Better together with AWS – We were already using AWS for our Arena platform, including services such as Amazon Simple Storage Service (Amazon S3), Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Block Store (Amazon EBS), and Application Load Balancer. As a result, we wanted to continue using AWS services to ensure seamless integration within our technology stack.
Customization and automation flexibility – The QuickSight API allowed us to have customization using AWS Identity Access Management (IAM). APIs helped us with authorization and authentication. With IAM APIs, we were able to create users and map them to the required permissions and roles, thereby giving the right permissions to the right dashboards.
Pay as you go model – Consumption-based pricing ensured flexibility for our customers and us by allowing us to innovate with no monetary risks. We could start small and grow with time without being locked into expensive user-based and capacity-based contracts.

Enabling customers to visualize marketing performance and success

The focus of marketing professionals today is twofold: brand building and revenue growth marketing, which ensures that all your marketing operations can be tracked back to revenue and sales. While the former is a longer-term focus, the latter needs operations to be tracked minutely and any changes need to be incorporated immediately to ensure Return on Ad Spend (ROAS).

Arena Calibrate’s advanced software and BI service package allows you to quickly understand the overall health of your marketing funnel and drive impact on lead generation, marketing-qualified leads (MQLs), sales-qualified leads (SQLs), Customer Acquisition Cost (CAC), and ROAS. We provide insights across the funnel from lead to revenue. Our clients use our dashboards to make marketing decisions like campaign performance, website traffic changes, and audience segmentation. The built-in ML engine helps optimize marketing campaigns by adjusting the targeting, messaging, and timing of campaigns based on their performance.

We use QuickSight as the base for our visualization platform, and clients can visualize predictive forecasting models on ad spend and revenue; make decisions based on optimizing spend and arresting churn; and identify high-value customer segments, marketing channel effectiveness, conversion rates, customer acquisition costs, or ROI. We also use QuickSight ML insights to augment trend analysis and automate anomaly detection for our users.

Arena Calibrate enables users to connect to their data sources in minutes and launch their dashboards quickly with ready-made templates. Our BI team is then available to customize the dashboard for deeper insights and gather accurate ROI.

QuickSight is embedded into Arena Calibrate. The QuickSight API allowed us to generate the signed dashboard URL, which was then embedded into Arena Calibrate.

The following screenshot of Arena Calibrate shows the Campaign Performance dashboard.

The Pay-per-click (PPC) & Search Engine Marketing (SEM) dashboard lets you deep dive into key metrics by campaigns, assets, ad formats, landing pages, and device types to better gauge performance.

The E-Commerce dashboard lets you monitor the performance of your online store through key metrics such as abandoned cart rate, average order value, cart conversion rate, and more.

The Cross Channel Campaign Performance dashboard lets you consolidate performance of all relevant metrics across your ad campaigns, enabling you to evaluate campaign ROAS and revenue from one place.

Today, there are dozens of customer dashboards powered by QuickSight across diverse sectors ranging from software as a service (SaaS) to FinTech, IT to healthcare.

Position²’s customer focus

We broadly serve two segments of users. Our Position² services clients automatically have access to the Arena Calibrate dashboard for their business. The pricing for the dashboard is included in the service fee, in most cases.

Arena Calibrate is also available as a standalone BI platform. Here we combine product-led growth (PLG) and sales to attract prospects largely in the SMB and enterprise markets. Prospects can sign up online, connect their platforms, and test-drive Arena Calibrate by integrating up to five standard data sources for free.

We follow a tiered pricing approach based on the number of platforms and the types of platforms (for example, advertising, marketing automation, or CRM) the user needs to integrate, as well as customization needs.

Cost reduction by 50% and accelerated time-to-value with QuickSight

QuickSight allows us to centralize our organization’s BI reporting—both for internal and external purposes. We pull data from the various sources, then use Snowflake as a database and QuickSight as our visualization tool. The QuickSight in-memory engine SPICE (Super-fast, Parallel, In-memory Calculation Engine) on top of its native integration with Snowflake has improved the dashboard load times by up to 20%. Also, the QuickSight console makes it simple to set up datasets for SPICE. There were direct cost savings when we moved to QuickSight because billing is usage-based. We dropped our costs by approximately 50%.

The comprehensive access and security capabilities of QuickSight allow us to support our enterprise customers by providing the right access to relevant dashboards to the right users with ease. QuickSight allows us to make customizations tailored to each customer’s unique requirements. For example, for newer users, we create dashboards by reusing templates and changing the datasets relevant to them—a process we are currently in the process of automating using QuickSight APIs.

QuickSight, along with Snowflake’s transformation and automation capabilities, has allowed us to reduce our dashboard publishing time from 2 days to 2 hours. Templates have helped us reuse the dashboard layout. Customers appreciate seeing various dashboards in one place—across product lines, business lines, and more. With no or very less engineering effort, the BI team is able to build dashboards. We could also enable author embedding with ease, because it doesn’t require any extra coding or licensing with QuickSight.

The BI team also prefers the features such as iFrame-based embedding and reliability of QuickSight over our previous tool. Our prior tool was slow to render, had stability issues, and had lot of downtime, unlike QuickSight, a serverless, auto-scaling BI service.

We have been able to drive effectiveness through a better understanding of spend and the ability to channel the spend to the best possible outcome drivers. In addition, we have seen efficiency gains by faster time to insights and reduced the need to move across multiple tools to access marketing operations data. Overall, we have seen an increase in ROAS across our client base by 3–5% and productivity gains across clients and services teams by 20–25%.

Calibrating the data-driven future

The long-term vision for Arena Calibrate is to empower the data-driven marketer. With advancements in AI and data analytics, Position² is well placed to analyze customers’ data and provide actionable insights and recommendations to improve the performance of campaigns and drive growth. We look forward to the continued collaboration with QuickSight to enable this journey.

To learn more about how QuickSight can help your business with dashboards, reports, and more, visit Amazon QuickSight.

About the Author

Vinod Nambiar is the co-founder at Arena and Managing Director of Position². An engineer with a passion for advertising, Vinod has been instrumental in designing all processes for delivery operations. His passion is to explore how the latest developments in technology can transform digital marketing. He is associated with various global forums in digital marketing and has been part of the faculty at leading marketing institutes in India like Northpoint and Mudra Institute of Communications. When not thinking digital, he can be found doing yoga and reading books ranging from spiritual to fiction. He lives with his wife and two children in Bangalore.

iostudio delivers key metrics to public sector recruiters with Amazon QuickSight

2023-06-27 Jon Walker

Post Syndicated from Jon Walker original https://aws.amazon.com/blogs/big-data/iostudio-delivers-key-metrics-to-public-sector-recruiters-with-amazon-quicksight/

This is a guest post by Jon Walker and Ari Orlinsky from iostudio written in collaboration with Sumitha AP from AWS.

iostudio is an award-winning marketing agency based in Nashville, TN. We build solutions that bring brands to life, making content and platforms work together. We serve our customers, who range from small technology startups to government agencies, as a social media strategy partner with in-house video production capabilities, a creative resource that provides data-driven insights about a campaign’s performance, a content marketing machine with connections across the United States, and a sophisticated customer engagement partner.

We wanted to include interactive, real-time visualizations to support recruiters from one of our government clients. Our previous solution offered visualization of key metrics, but point-in-time snapshots produced only in PDF format. We chose Amazon QuickSight because it gave us dynamic and interactive dashboards embedded in our application, while saving us money and development time.

In this post, we discuss how we built a solution using QuickSight that delivers real-time visibility of key metrics to public sector recruiters.

Modernized analytics and reporting

At iostudio, we faced the challenge of modernizing our government client’s static recruitment marketing analytics solution. Given the limitations of the static PDF charts used for its recruitment marketing data, we recognized the opportunity to introduce real-time interactive dashboards to improve insights needed to drive recruitment marketing initiatives. With the solution we built using QuickSight, recruiters are given access to rich visualizations on interactive dashboards in real time, eliminating uncertainty about whether the information they are looking at is accurate.

We created a QuickSight dashboard as a proof of concept, and it surpassed our expectations because of its advanced visualizations. We embedded QuickSight dashboards into our web application, making it seamless for recruiters to log in and get the insights they need. Because we used QuickSight anonymous embedding APIs, we were able to do this without registering and managing all our users in QuickSight. After this initial proof of concept, we gained confidence in our solution quickly, and we were able to build and launch our solution to production within 4–6 weeks. As a result, we reduced our development time by 60%, allowing us to bring this embedded analytics solution to our government client faster. We also saved 75% on our annual external software costs. Switching to QuickSight has enabled us to better serve our customers.

Taking care of sensitive data

iostudio operates in the AWS GovCloud environment because many of our customers are government agencies. This makes protecting customer data even more important. When building our solution for recruiters, we needed to ensure that recruiters can only see data related to the marketing campaigns that are assigned to them. We used row-level security with tag-based rules in QuickSight to restrict data on a per-user basis.

Integrating with AWS

With the AWS technology stack, we were able to create a custom-fit solution using a regionally diverse, service-oriented model to improve our time to deliver interactive reporting to our customers while also trimming costs. With AWS, we aren’t forced to pay for a bundle with services that we don’t use. We can pick what we need, and use what we need with pay-as-you-go pricing.

Our client had previously been using a data integration tool called Pentaho to get data from different sources into one place, which wasn’t an optimal solution. The following diagram illustrates our updated solution architecture using AWS services.

The custom-fit solution that we built uses AWS Lambda to extract data from three key data sources: a referral marketing tool, Google analytics data, and call center operations data. The data lands in an Amazon Simple Storage Service (Amazon S3) bucket, and from there AWS Glue jobs are used to transform the data and load it into another S3 bucket. QuickSight is connected to this data using Amazon Athena, helping us create real-time and interactive dashboards. This end-to-end extract, transform, and load (ETL) process is run with the help of AWS Step Functions, giving us the ability to orchestrate and monitor all the steps of the ETL process seamlessly.

Conclusion

By switching to QuickSight, we were able to provide our client’s recruiters with key metrics in real time, while reducing our development time and cutting costs significantly. Because the components in the architecture are reusable and interoperable, we were able to extend this solution to even more of our customers.

To learn more about how you can embed customized data visuals and interactive dashboards into any application, visit Amazon QuickSight Embedded.

About the Authors

Jon Walker, Senior Director of Engineering, is a native Nashvillian who has been in the technology field for over 19 years. He oversees enterprise-wide system engineering, development, and technology programs for large federal and DoD clients, as well as iostudio’s commercial clients.

Ari Orlinsky, Director of Information Services, leads iostudio’s Information Systems Department, responsible for AWS Cloud, SaaS applications, on-premises technology, risk assessment, compliance, budgeting, and human resource management. With nearly 20 years’ experience in strategic IS and technology operations, Ari has developed a keen enthusiasm for emerging technologies, DOD security and compliance, large format interactive experiences, and customer service communication technologies. As iostudio’s Technical Product Owner across internal and client-facing applications including a cloud-based omni-channel contact center platform, he advocates for secure deployment of applicable technologies to the cloud while ensuring resilient on-premises data center solutions.

Sumitha AP is a Sr. Solutions Architect at AWS. Sumitha works with SMB customers to help them design secure, scalable, reliable, and cost-effective solutions in the AWS Cloud. She has a focus on data and analytics and provides guidance on building analytics solutions on AWS.

Enable business users to analyze large datasets in your data lake with Amazon QuickSight

2023-06-23 Eliad Maimon

Post Syndicated from Eliad Maimon original https://aws.amazon.com/blogs/big-data/enable-business-users-to-analyze-large-datasets-in-your-data-lake-with-amazon-quicksight/

This blog post is co-written with Ori Nakar from Imperva.

Imperva Cloud WAF protects hundreds of thousands of websites and blocks billions of security events every day. Events and many other security data types are stored in Imperva’s Threat Research Multi-Region data lake.

Imperva harnesses data to improve their business outcomes. To enable this transformation to a data-driven organization, Imperva brings together data from structured, semi-structured, and unstructured sources into a data lake. As part of their solution, they are using Amazon QuickSight to unlock insights from their data.

Imperva’s data lake is based on Amazon Simple Storage Service (Amazon S3), where data is continually loaded. Imperva’s data lake has a few dozen different datasets, in the scale of petabytes. Each day, TBs of new data is added to the data lake, which is then transformed, aggregated, partitioned, and compressed.

In this post, we explain how Imperva’s solution enables users across the organization to explore, visualize, and analyze data using Amazon Redshift Serverless, Amazon Athena, and QuickSight.

Challenges and needs

A modern data strategy gives you a comprehensive plan to manage, access, analyze, and act on data. AWS provides the most complete set of services for the entire end-to-end data journey for all workloads, all types of data, and all desired business outcomes. In turn, this makes AWS the best place to unlock value from your data and turn it into insight.

Redshift Serverless is a serverless option of Amazon Redshift that allows you to run and scale analytics without having to provision and manage data warehouse clusters. Redshift Serverless automatically provisions and intelligently scales data warehouse capacity to deliver high performance for all your analytics. You just need to load and query your data, and you only pay for the compute used for the duration of the workloads on a per-second basis. Redshift Serverless is ideal when it’s difficult to predict compute needs such as variable workloads, periodic workloads with idle time, and steady-state workloads with spikes.

Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, straightforward to use, and makes it simple for anyone with SQL skills to quickly analyze large-scale datasets in multiple Regions.

QuickSight is a cloud-native business intelligence (BI) service that you can use to visually analyze data and share interactive dashboards with all users in the organization. QuickSight is fully managed and serverless, requires no client downloads for dashboard creation, and has a pay-per-session pricing model that allows you to pay for dashboard consumption. Imperva uses QuickSight to enable users with no technical expertise, from different teams such as marketing, product, sales, and others, to extract insight from the data without the help of data or research teams.

QuickSight offers SPICE, an in-memory, cloud-native data store that allows end-users to interactively explore data. SPICE provides consistently fast query performance and automatically scales for high concurrency. With SPICE, you save time and cost because you don’t need to retrieve data from the data source (whether a database or data warehouse) every time you change an analysis or update a visual, and you remove the load of concurrent access or analytical complexity off the underlying data source with the data.

In order for QuickSight to consume data from the data lake, some of the data undergoes additional transformations, filters, joins, and aggregations. Imperva cleans their data by filtering incomplete records, reducing the number of records by aggregations, and applying internal logic to curate millions of security incidents out of hundreds of millions of records.

Imperva had the following requirements for their solution:

High performance with low query latency to enable interactive dashboards
Continuously update and append data to queryable sources from the data lake
Data freshness of up to 1 day
Low cost
Engineering efficiency

The challenge faced by Imperva and many other companies is how to create a big data extract, transform, and load (ETL) pipeline solution that fits these requirements.

In this post, we review two approaches Imperva implemented to address their challenges and meet their requirements. The solutions can be easily implemented while maintaining engineering efficiency, especially with the introduction of Redshift Serverless.

Imperva’s solutions

Imperva needed to have the data lake’s data available through QuickSight continuously. The following solutions were chosen to connect the data lake to QuickSight:

QuickSight caching layer, SPICE – Use Athena to query the data into a QuickSight SPICE dataset
Redshift Serverless – Copy the data to Redshift Serverless and use it as a data source

Our recommendation is to use a solution based on the use case. Each solution has its own advantages and challenges, which we discuss as part of this post.

The high-level flow is the following:

Data is continuously updated from the data lake into either Redshift Serverless or the QuickSight caching layer, SPICE
An internal user can create an analysis and publish it as a dashboard for other internal or external users

The following architecture diagram shows the high-level flow.

High-level flow

In the following sections, we discuss the details about the flow and the different solutions, including a comparison between them, which can help you choose the right solution for you.

Solution 1: Query with Athena and import to SPICE

QuickSight provides inherent capabilities to upload data using Athena into SPICE, which is a straightforward approach that meets Imperva’s requirements regarding simple data management. For example, it suits stable data flows without frequent exceptions, which may result in SPICE full refresh.

You can use Athena to load data into a QuickSight SPICE dataset, and then use the SPICE incremental upload option to load new data to the dataset. A QuickSight dataset will be connected to a table or a view accessible by Athena. A time column (like day or hour) is used for incremental updates. The following table summarizes the options and details.

Option	Description	Pros/Cons
Existing table	Use the built-in option by QuickSight.	Not flexible—the table is imported as is in the data lake.
Dedicated view	A view will let you better control the data in your dataset. It allows joining data, aggregation, or choosing a filter like the date you want to start importing data from. Note that QuickSight allows building a dataset based on custom SQL, but this option doesn’t allow incremental updates.	Large Athena resource consumption on a full refresh.
Dedicated ETL	Create a dedicated ETL process, which is similar to a view, but unlike the view, it allows reuse of the results in case of a full refresh. In case your ETL or view contains grouping or other complex operations, you know that these operations will be done only by the ETL process, according to the schedule you define.	Most flexible, but requires ETL development and implementation and additional Amazon S3 storage.

The following architecture diagram details the options for loading data by Athena into SPICE.

Architecture diagram details the options for loading data by Athena into SPICE

The following code provides a SQL example for a view creation. We assume the existence of two tables, customers and events, with one join column called customer_id. The view is used to do the following:

Aggregate the data from daily to weekly, and reduce the number of rows
Control the start date of the dataset (in this case, 30 weeks back)
Join the data to add more columns (customer_type) and filter it

CREATE VIEW my_dataset AS
SELECT DATE_ADD('day', -DAY_OF_WEEK(day) + 1, day) AS first_day_of_week,
       customer_type, event_type, COUNT(events) AS total_events
FROM my_events INNER JOIN my_customers USING (customer_id)
WHERE customer_type NOT IN ('Reseller')
      AND day BETWEEN DATE_ADD('DAY',-7 * 30 -DAY_OF_WEEK(CURRENT_DATE) + 1, CURRENT_DATE)
      AND DATE_ADD('DAY', -DAY_OF_WEEK(CURRENT_DATE), CURRENT_DATE)
GROUP BY 1, 2, 3

Solution 2: Load data into Redshift Serverless

Redshift Serverless provides full visibility to the data, which can be viewed or edited at any time. For example, if there is a delay in adding data to the data lake or the data isn’t properly added, with Redshift Serverless, you can edit data using SQL statements or retry data loading. Redshift Serverless is a scalable solution that doesn’t have a dataset size limitation.

Redshift Serverless is used as a serving layer for the datasets that are to be used in QuickSight. The pricing model for Redshift Serverless is based on storage utilization and the run of queries; idle compute resources have no associated cost. Setting up a cluster is simple and doesn’t require you to choose node types or amount of storage. You simply load the data to tables you create and start working.

To create a new dataset, you need to create an Amazon Redshift table and run the following process every time data is added:

Transform the data using an ETL process (optional):
- Read data from the tables.
- Transform to the QuickSight dataset schema.
- Write the data to an S3 bucket and load it to Amazon Redshift.
Delete old data if it exists to avoid duplicate data.
Load the data using the COPY command.

The following architecture diagram details the options to load data into Redshift Serverless with or without an ETL process.

Architecture diagram details the options to load data into Redshift Serverless with or without an ETL process

The Amazon Redshift COPY command is simple and fast. For example, to copy daily partition Parquet data, use the following code:

COPY my_table
FROM 's3://my_bucket/my_table/day=2022-01-01'
IAM_ROLE 'my_role' 
FORMAT AS PARQUET

Use the following COPY command to load the output file of the ETL process. Values will be truncated according to Amazon Redshift column size. The column truncation is important because, unlike in the data lake, in Amazon Redshift, the column size must be set. This option prevents COPY failures:

COPY my_table
FROM 's3://my_bucket/my_table/day=2022-01-01'
IAM_ROLE 'my_role' 
FORMAT AS JSON GZIP TRUNCATECOLUMNS

The Amazon Redshift COPY operation provides many benefits and options. It supports multiple formats as well as column mapping, escaping, and more. It also allows more control over data format, object size, and options to tune the COPY operation for improved performance. Unlike data in the data lake, Amazon Redshift has column length specifications. We use TRUNCATECOLUMNS to truncates the data in columns to the appropriate number of characters so that it fits the column specification.

Using this method provides full control over the data. In case of a problem, we can repair parts of the table by deleting old data and loading the data again. It’s also possible to use the QuickSight dataset JOIN option, which is not available in SPICE when using incremental update.

Additional benefit of this approach is that the data is available for other clients and services looking to use the same data, such as SQL clients or notebooks servers such as Apache Zeppelin.

Conclusion

QuickSight allows Imperva to expose business data to various departments within an organization. In the post, we explored approaches for importing data from a data lake to QuickSight, whether continuously or incrementally.

However, it’s important to note that there is no one-size-fits-all solution; the optimal approach will depend on the specific use case. Both options—continuous and incremental updates—are scalable and flexible, with no significant cost differences observed for our dataset and access patterns.

Imperva found incremental refresh to be very useful and uses it for simple data management. For more complex datasets, Imperva has benefitted from the greater scalability and flexibility provided by Redshift Serverless.

In cases where a higher degree of control over the datasets was required, Imperva chose Redshift Serverless so that data issues could be addressed promptly by deleting, updating, or inserting new records as necessary.

With the integration of dashboards, individuals can now access data that was previously inaccessible to them. Moreover, QuickSight has played a crucial role in streamlining our data distribution processes, enabling data accessibility across all departments within the organization.

To learn more, visit Amazon QuickSight.

About the Authors

Eliad Maimon is a Senior Startups Solutions Architect at AWS in Tel-Aviv with over 20 years of experience in architecting, building, and maintaining software products. He creates architectural best practices and collaborates with customers to leverage cloud and innovation, transforming businesses and disrupting markets. Eliad is specializing in machine learning on AWS, with a focus in areas such as generative AI, MLOps, and Amazon SageMaker.

Ori Nakar is a principal cyber-security researcher, a data engineer, and a data scientist at Imperva Threat Research group. Ori has many years of experience as a software engineer and engineering manager, focused on cloud technologies and big data infrastructure.

Amazon OpenSearch Service’s vector database capabilities explained

2023-06-22 Jon Handler

Post Syndicated from Jon Handler original https://aws.amazon.com/blogs/big-data/amazon-opensearch-services-vector-database-capabilities-explained/

OpenSearch is a scalable, flexible, and extensible open-source software suite for search, analytics, security monitoring, and observability applications, licensed under the Apache 2.0 license. It comprises a search engine, OpenSearch, which delivers low-latency search and aggregations, OpenSearch Dashboards, a visualization and dashboarding tool, and a suite of plugins that provide advanced capabilities like alerting, fine-grained access control, observability, security monitoring, and vector storage and processing. Amazon OpenSearch Service is a fully managed service that makes it simple to deploy, scale, and operate OpenSearch in the AWS Cloud.

As an end-user, when you use OpenSearch’s search capabilities, you generally have a goal in mind—something you want to accomplish. Along the way, you use OpenSearch to gather information in support of achieving that goal (or maybe the information is the original goal). We’ve all become used to the “search box” interface, where you type some words, and the search engine brings back results based on word-to-word matching. Let’s say you want to buy a couch in order to spend cozy evenings with your family around the fire. You go to Amazon.com, and you type “a cozy place to sit by the fire.” Unfortunately, if you run that search on Amazon.com, you get items like fire pits, heating fans, and home decorations—not what you intended. The problem is that couch manufacturers probably didn’t use the words “cozy,” “place,” “sit,” and “fire” in their product titles or descriptions.

In recent years, machine learning (ML) techniques have become increasingly popular to enhance search. Among them are the use of embedding models, a type of model that can encode a large body of data into an n-dimensional space where each entity is encoded into a vector, a data point in that space, and organized such that similar entities are closer together. An embedding model, for instance, could encode the semantics of a corpus. By searching for the vectors nearest to an encoded document — k-nearest neighbor (k-NN) search — you can find the most semantically similar documents. Sophisticated embedding models can support multiple modalities, for instance, encoding the image and text of a product catalog and enabling similarity matching on both modalities.

A vector database provides efficient vector similarity search by providing specialized indexes like k-NN indexes. It also provides other database functionality like managing vector data alongside other data types, workload management, access control and more. OpenSearch’s k-NN plugin provides core vector database functionality for OpenSearch, so when your customer searches for “a cozy place to sit by the fire” in your catalog, you can encode that prompt and use OpenSearch to perform a nearest neighbor query to surface that 8-foot, blue couch with designer arranged photographs in front of fireplaces.

Using OpenSearch Service as a vector database

With OpenSearch Service’s vector database capabilities, you can implement semantic search, Retrieval Augmented Generation (RAG) with LLMs, recommendation engines, and search rich media.

Semantic search

With semantic search, you improve the relevance of retrieved results using language-based embeddings on search documents. You enable your search customers to use natural language queries, like “a cozy place to sit by the fire” to find their 8-foot-long blue couch. For more information, refer to Building a semantic search engine in OpenSearch to learn how semantic search can deliver a 15% relevance improvement, as measured by normalized discounted cumulative gain (nDCG) metrics compared with keyword search. For a concrete example, our Improve search relevance with ML in Amazon OpenSearch Service workshop explores the difference between keyword and semantic search, based on a Bidirectional Encoder Representations from Transformers (BERT) model, hosted by Amazon SageMaker to generate vectors and store them in OpenSearch. The workshop uses product question answers as an example to show how keyword search using the keywords/phrases of the query leads to some irrelevant results. Semantic search is able to retrieve more relevant documents by matching the context and semantics of the query. The following diagram shows an example architecture for a semantic search application with OpenSearch Service as the vector database.

Architecture diagram showing how to use Amazon OpenSearch Service to perform semantic search to improve relevance

Retrieval Augmented Generation with LLMs

RAG is a method for building trustworthy generative AI chatbots using generative LLMs like OpenAI, ChatGPT, or Amazon Titan Text. With the rise of generative LLMs, application developers are looking for ways to take advantage of this innovative technology. One popular use case involves delivering conversational experiences through intelligent agents. Perhaps you’re a software provider with knowledge bases for product information, customer self-service, or industry domain knowledge like tax reporting rules or medical information about diseases and treatments. A conversational search experience provides an intuitive interface for users to sift through information through dialog and Q&A. Generative LLMs on their own are prone to hallucinations—a situation where the model generates a believable but factually incorrect response. RAG solves this problem by complementing generative LLMs with an external knowledge base that is typically built using a vector database hydrated with vector-encoded knowledge articles.

As illustrated in the following diagram, the query workflow starts with a question that is encoded and used to retrieve relevant knowledge articles from the vector database. Those results are sent to the generative LLM whose job is to augment those results, typically by summarizing the results as a conversational response. By complementing the generative model with a knowledge base, RAG grounds the model on facts to minimize hallucinations. You can learn more about building a RAG solution in the Retrieval Augmented Generation module of our semantic search workshop.

Architecture diagram showing how to use Amazon OpenSearch Service to perform retrieval-augmented generation

Recommendation engine

Recommendations are a common component in the search experience, especially for ecommerce applications. Adding a user experience feature like “more like this” or “customers who bought this also bought that” can drive additional revenue through getting customers what they want. Search architects employ many techniques and technologies to build recommendations, including Deep Neural Network (DNN) based recommendation algorithms such as the two-tower neural net model, YoutubeDNN. A trained embedding model encodes products, for example, into an embedding space where products that are frequently bought together are considered more similar, and therefore are represented as data points that are closer together in the embedding space. Another possibility
is that product embeddings are based on co-rating similarity instead of purchase activity. You can employ this affinity data through calculating the vector similarity between a particular user’s embedding and vectors in the database to return recommended items. The following diagram shows an example architecture of building a recommendation engine with OpenSearch as a vector store.

Architecture diagram showing how to use Amazon OpenSearch Service as a recommendation engine

Media search

Media search enables users to query the search engine with rich media like images, audio, and video. Its implementation is similar to semantic search—you create vector embeddings for your search documents and then query OpenSearch Service with a vector. The difference is you use a computer vision deep neural network (e.g. Convolutional Neural Network (CNN)) such as ResNet to convert images into vectors. The following diagram shows an example architecture of building an image search with OpenSearch as the vector store.

Architecture diagram showing how to use Amazon OpenSearch Service to search rich media like images, videos, and audio files

Understanding the technology

OpenSearch uses approximate nearest neighbor (ANN) algorithms from the NMSLIB, FAISS, and Lucene libraries to power k-NN search. These search methods employ ANN to improve search latency for large datasets. Of the three search methods the k-NN plugin provides, this method offers the best search scalability for large datasets. The engine details are as follows:

Non-Metric Space Library (NMSLIB) – NMSLIB implements the HNSW ANN algorithm
Facebook AI Similarity Search (FAISS) – FAISS implements both HNSW and IVF ANN algorithms
Lucene – Lucene implements the HNSW algorithm

Each of the three engines used for approximate k-NN search has its own attributes that make one more sensible to use than the others in a given situation. You can follow the general information in this section to help determine which engine will best meet your requirements.

In general, NMSLIB and FAISS should be selected for large-scale use cases. Lucene is a good option for smaller deployments, but offers benefits like smart filtering where the optimal filtering strategy—pre-filtering, post-filtering, or exact k-NN—is automatically applied depending on the situation. The following table summarizes the differences between each option.

.	NMSLIB-HNSW	FAISS-HNSW	FAISS-IVF	Lucene-HNSW
Max Dimension	16,000	16,000	16,000	1024
Filter	Post filter	Post filter	Post filter	Filter while search
Training Required	No	No	Yes	No
Similarity Metrics	l2, innerproduct, cosinesimil, l1, linf	l2, innerproduct	l2, innerproduct	l2, cosinesimil
Vector Volume	Tens of billions	Tens of billions	Tens of billions	< Ten million
Indexing latency	Low	Low	Lowest	Low
Query Latency & Quality	Low latency & high quality	Low latency & high quality	Low latency & low quality	High latency & high quality
Vector Compression	Flat	Flat Product Quantization	Flat Product Quantization	Flat
Memory Consumption	High	High Low with PQ	Medium Low with PQ	High

Approximate and exact nearest-neighbor search

The OpenSearch Service k-NN plugin supports three different methods for obtaining the k-nearest neighbors from an index of vectors: approximate k-NN, score script (exact k-NN), and painless extensions (exact k-NN).

Approximate k-NN

The first method takes an approximate nearest neighbor approach—it uses one of several algorithms to return the approximate k-nearest neighbors to a query vector. Usually, these algorithms sacrifice indexing speed and search accuracy in return for performance benefits such as lower latency, smaller memory footprints, and more scalable search. Approximate k-NN is the best choice for searches over large indexes (that is, hundreds of thousands of vectors or more) that require low latency. You should not use approximate k-NN if you want to apply a filter on the index before the k-NN search, which greatly reduces the number of vectors to be searched. In this case, you should use either the score script method or painless extensions.

Score script

The second method extends the OpenSearch Service score script functionality to run a brute force, exact k-NN search over knn_vector fields or fields that can represent binary objects. With this approach, you can run k-NN search on a subset of vectors in your index (sometimes referred to as a pre-filter search). This approach is preferred for searches over smaller bodies of documents or when a pre-filter is needed. Using this approach on large indexes may lead to high latencies.

Painless extensions

The third method adds the distance functions as painless extensions that you can use in more complex combinations. Similar to the k-NN score script, you can use this method to perform a brute force, exact k-NN search across an index, which also supports pre-filtering. This approach has slightly slower query performance compared to the k-NN score script. If your use case requires more customization over the final score, you should use this approach over score script k-NN.

Vector search algorithms

The simple way to find similar vectors is to use k-nearest neighbors (k-NN) algorithms, which compute the distance between a query vector and the other vectors in the vector database. As we mentioned earlier, the score script k-NN and painless extensions search methods use the exact k-NN algorithms under the hood. However, in the case of extremely large datasets with high dimensionality, this creates a scaling problem that reduces the efficiency of the search. Approximate nearest neighbor (ANN) search methods can overcome this by employing tools that restructure indexes more efficiently and reduce the dimensionality of searchable vectors. There are different ANN search algorithms; for example, locality sensitive hashing, tree-based, cluster-based, and graph-based. OpenSearch implements two ANN algorithms: Hierarchical Navigable Small Worlds (HNSW) and Inverted File System (IVF). For a more detailed explanation of how the HNSW and IVF algorithms work in OpenSearch, see blog post “Choose the k-NN algorithm for your billion-scale use case with OpenSearch”.

Hierarchical Navigable Small Worlds

The HNSW algorithm is one of the most popular algorithms out there for ANN search. The core idea of the algorithm is to build a graph with edges connecting index vectors that are close to each other. Then, on search, this graph is partially traversed to find the approximate nearest neighbors to the query vector. To steer the traversal towards the query’s nearest neighbors, the algorithm always visits the closest candidate to the query vector next.

Inverted File

The IVF algorithm separates your index vectors into a set of buckets, then, to reduce your search time, only searches through a subset of these buckets. However, if the algorithm just randomly split up your vectors into different buckets, and only searched a subset of them, it would yield a poor approximation. The IVF algorithm uses a more elegant approach. First, before indexing begins, it assigns each bucket a representative vector. When a vector is indexed, it gets added to the bucket that has the closest representative vector. This way, vectors that are closer to each other are placed roughly in the same or nearby buckets.

Vector similarity metrics

All search engines use a similarity metric to rank and sort results and bring the most relevant results to the top. When you use a plain text query, the similarity metric is called TF-IDF, which measures the importance of the terms in the query and generates a score based on the number of textual matches. When your query includes a vector, the similarity metrics are spatial in nature, taking advantage of proximity in the vector space. OpenSearch supports several similarity or distance measures:

Euclidean distance – The straight-line distance between points.
L1 (Manhattan) distance – The sum of the differences of all of the vector components. L1 distance measures how many orthogonal city blocks you need to traverse from point A to point B.
L-infinity (chessboard) distance – The number of moves a King would make on an n-dimensional chessboard. It’s different than Euclidean distance on the diagonals—a diagonal step on a 2-dimensional chessboard is 1.41 Euclidean units away, but 2 L-infinity units away.
Inner product – The product of the magnitudes of two vectors and the cosine of the angle between them. Usually used for natural language processing (NLP) vector similarity.
Cosine similarity – The cosine of the angle between two vectors in a vector space.
Hamming distance – For binary-coded vectors, the number of bits that differ between the two vectors.

Advantage of OpenSearch as a vector database

When you use OpenSearch Service as a vector database, you can take advantage of the service’s features like usability, scalability, availability, interoperability, and security. More importantly, you can use OpenSearch’s search features to enhance the search experience. For example, you can use Learning to Rank in OpenSearch to integrate user clickthrough behavior data into your search application and improve search relevance. You can also combine OpenSearch text search and vector search capabilities to search documents with keyword and semantic similarity. You can also use other fields in the index to filter documents to improve relevance. For advanced users, you can use a hybrid scoring model to combine OpenSearch’s text-based relevance score, computed with the Okapi BM25 function and its vector search score to improve the ranking of your search results.

Scale and limits

OpenSearch as vector database support billions of vector records. Keep in mind the following calculator regarding number of vectors and dimensions to size your cluster.

Number of vectors

OpenSearch VectorDB takes advantage of the sharding capabilities of OpenSearch and can scale to billions of vectors at single-digit millisecond latencies by sharding vectors and scale horizontally by adding more nodes. The number of vectors that can fit in a single machine is a function of the off-heap memory availability on the machine. The number of nodes required will depend on the amount of memory that can be used for the algorithm per node and the total amount of memory required by the algorithm. The more nodes, the more memory and better performance. The amount of memory available per node is computed as memory_available = (node_memory – jvm_size) * circuit_breaker_limit, with the following parameters:

node_memory – The total memory of the instance.
jvm_size – The OpenSearch JVM heap size. This is set to half of the instance’s RAM, capped at approximately 32 GB.
circuit_breaker_limit – The native memory usage threshold for the circuit breaker. This is set to 0.5.

Total cluster memory estimation depends on total number of vector records and algorithms. HNSW and IVF have different memory requirements. You can refer to Memory Estimation for more details.

Number of dimensions

OpenSearch’s current dimension limit for the vector field knn_vector is 16,000 dimensions. Each dimension is represented as a 32-bit float. The more dimensions, the more memory you’ll need to index and search. The number of dimensions is usually determined by the embedding models that translate the entity to a vector. There are a lot of options to choose from when building your knn_vector field. To determine the correct methods and parameters to choose, refer to Choosing the right method.

Customer stories:

Amazon Music

Amazon Music is always innovating to provide customers with unique and personalized experiences. One of Amazon Music’s approaches to music recommendations is a remix of a classic Amazon innovation, item-to-item collaborative filtering, and vector databases. Using data aggregated based on user listening behavior, Amazon Music has created an embedding model that encodes music tracks and customer representations into a vector space where neighboring vectors represent tracks that are similar. 100 million songs are encoded into vectors, indexed into OpenSearch, and served across multiple geographies to power real-time recommendations. OpenSearch currently manages 1.05 billion vectors and supports a peak load of 7,100 vector queries per second to power Amazon Music recommendations.

The item-to-item collaborative filter continues to be among the most popular methods for online product recommendations because of its effectiveness at scaling to large customer bases and product catalogs. OpenSearch makes it easier to operationalize and further the scalability of the recommender by providing scale-out infrastructure and k-NN indexes that grow linearly with respect to the number of tracks and similarity search in logarithmic time.

The following figure visualizes the high-dimensional space created by the vector embedding.

A visualization of the vector encoding of Amazon Music entries in the large vector space

Brand protection at Amazon

Amazon strives to deliver the world’s most trustworthy shopping experience, offering customers the widest possible selection of authentic products. To earn and maintain our customers’ trust, we strictly prohibit the sale of counterfeit products, and we continue to invest in innovations that ensure only authentic products reach our customers. Amazon’s brand protection programs build trust with brands by accurately representing and completely protecting their brand. We strive to ensure that public perception mirrors the trustworthy experience we deliver. Our brand protection strategy focuses on four pillars: (1) Proactive Controls (2) Powerful Tools to Protect Brands (3) Holding Bad Actors Accountable (4) Protecting and Educating Customers. Amazon OpenSearch Service is a key part of Amazon’s Proactive Controls.

In 2022, Amazon’s automated technology scanned more than 8 billion attempted changes daily to product detail pages for signs of potential abuse. Our proactive controls found more than 99% of blocked or removed listings before a brand ever had to find and report it. These listings were suspected of being fraudulent, infringing, counterfeit, or at risk of other forms of abuse. To perform these scans, Amazon created tooling that uses advanced and innovative techniques, including the use of advanced machine learning models to automate the detection of intellectual property infringements in listings across Amazon’s stores globally. A key technical challenge in implementing such automated system is the ability to search for protected intellectual property within a vast billion-vector corpus in a fast, scalable and cost effective manner. Leveraging Amazon OpenSearch Service’s scalable vector database capabilities and distributed architecture, we successfully developed an ingestion pipeline that has indexed a total of 68 billion, 128- and 1024-dimension vectors into OpenSearch Service to enable brands and automated systems to conduct infringement detection, in real-time, through a highly available and fast (sub-second) search API.

Conclusion

Whether you’re building a generative AI solution, searching rich media and audio, or bringing more semantic search to your existing search-based application, OpenSearch is a capable vector database. OpenSearch supports a variety of engines, algorithms, and distance measures that you can employ to build the right solution. OpenSearch provides a scalable engine that can support vector search at low latency and up to billions of vectors. With OpenSearch and its vector DB capabilities, your users can find that 8-foot-blue couch easily, and relax by a cozy fire.

About the Authors

Jon Handler is a Senior Principal Solutions Architect at Amazon Web Services based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have search and log analytics workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon’s career as a software developer included four years of coding a large-scale, eCommerce search engine. Jon holds a Bachelor of the Arts from the University of Pennsylvania, and a Master of Science and a Ph. D. in Computer Science and Artificial Intelligence from Northwestern University.

Jianwei Li is a Principal Analytics Specialist TAM at Amazon Web Services. Jianwei provides consultant service for customers to help customer design and build modern data platform. Jianwei has been working in big data domain as software developer, consultant and tech leader.

Dylan Tong is a Senior Product Manager at AWS. He works with customers to help drive their success on the AWS platform through thought leadership and guidance on designing well architected solutions. He has spent most of his career building on his expertise in data management and analytics by working for leaders and innovators in the space.

Vamshi Vijay Nakkirtha is a Software Engineering Manager working on the OpenSearch Project and Amazon OpenSearch Service. His primary interests include distributed systems. He is an active contributor to various plugins, like k-NN, GeoSpatial, and dashboard-maps.

How GoDaddy Implemented a Multi-Region Event-Driven Platform at Scale

2023-06-16 Marcia Villalba

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/how-godaddy-implemented-a-multi-region-event-driven-platform-at-scale/

GoDaddy, a leading global provider of domain registration and web hosting services, has served over 84 million domains and 22 million customers since its establishment in 1997. Among its various internal systems, the Customer Signal Platform provides tooling to capture, analyze, and act on customer and product data to drive better business outcomes. With this platform, GoDaddy can track user visits and interactions on its website and use meaningful event data to improve its customer experience and overall business performance.

Nowadays, the Customer Signal Platform processes 400 million events every day. As GoDaddy expands its integrations, it aims to increase this number to 2 billion events per day in the near future.

When building the Customer Signal Platform, GoDaddy had three main requirements for the system architecture:

Minimize their operational load.
Scale automatically as traffic changes.
Provide high availability and ensure that all the customer signals are captured.

Amazon EventBridge Event Bus
After evaluating many options against their requirements, GoDaddy decided to implement the customer signal platform using Amazon EventBridge Event Bus. EventBridge Event Bus is a serverless event bus that helps you receive, filter, transform, route, and deliver events. Because EventBridge is serverless, it requires minimal configuration to get started and scales automatically—GoDaddy’s first two requirements were checked.

To comply with the third requirement, the solution needed to provide business continuity and ensure that no event is lost from the moment the client produces it until it gets to the platform to be analyzed. EventBridge Event Bus comes with many features that helped GoDaddy build their application with this requirement in mind.

The main feature that GoDaddy took advantage of was global endpoints. EventBridge global endpoints provide a reliable and simple way to improve the business continuity of event-driven applications. This new feature, added in 2022, allows customers to build a multi-Region event-driven application.

EventBridge Global Endpoints
Global endpoints allow you to configure a managed DNS endpoint in EventBridge, to which your applications will send events. Then you need to configure two custom event buses in two distinct AWS Regions. One is the primary Region, and the other is the failover, or secondary Region. The failover of events is decided based on the health indicated by an Amazon Route 53 health check. When the health check is healthy, the events are routed from the global endpoint to the custom event bus in the primary Region. And if the health check is unhealthy, then the global endpoint will send the events to the event bus in the secondary Region.

The simplest configuration for global endpoints is the active/archive configuration. This configuration provides business continuity and simplicity at the same time. The active/archive configuration defines two different Regions. The primary Region is where the application is deployed and all the business processes are happening. The archive Region is where only a custom bus is deployed and all the events are archived.

In addition, there is a bidirectional replication rule between the buses in separate Regions. In the normal case, when there are no errors, whenever an event arrives at the custom bus in the primary Region, the event is automatically replicated to the archive custom bus in the secondary Region.

In the case of failover, the global endpoint redirects the events to the secondary Region, where they get archived for processing at another time.

GoDaddy Implementation of Global Endpoints
GoDaddy was looking for a solution that minimized their operations load while still providing business continuity, and that is why they adopted global endpoints and the active/archive configuration. In this way, they could have the event processing logic in their primary Region and have a secondary Region in case of any issues.

In their configuration, events are archived in the secondary Region for 30 days, after which the events expire. In the case of a failover, because they don’t need to process the events in real time, they collect them in the archive. If the issue is resolved within 24 hours, the retention period for the replication rule, the events are sent automatically to the primary Region. If the issue is solved in more than 24 hours the events need to be replayed to the primary Region.

The following image shows what their current solution looks like. They are working with two Regions. US West (Oregon) is their primary Region and is the location of the data lake, which is the primary consumer of the events. US East (N. Virginia) is the secondary Region. Events are being produced in different clients; from the clients, they are sent to Amazon API Gateway. GoDaddy deployed two API Gateways in their two Regions. The events are sent to the API Gateway with the smallest latency from the client. To do that, they use latency-based routing provided by Amazon Route 53. Then events are sent to an AWS Lambda function that validates the events and forwards them to the EventBridge global endpoint at the DNS level.

The global endpoint is configured with the active/archive setup, and the failover is configured to be triggered via a Route 53 health check that monitors an Amazon CloudWatch alarm. That alarm observes the IngestionToInvocationStartLatency metric in the primary Region.

IngestionToInvocationStartLatency is a service-level metric that exposes the time to process events from the point at which they are ingested by EventBridge to the point the first invocation of a target in the configured rules is made. This metric is measured across all the rules in your bus and provides an indication of the health of the EventBridge service. Any extended periods of high latency over 30 seconds indicate a service disruption.

When the system is in the normal state, the events are forwarded from the global endpoint to the custom ingress event bus in the primary Region. That custom event bus has replication enabled; this means that all the events that arrive at the bus get replicated automatically in the secondary Region custom ingress event bus.

All the events received by the ingress event bus are sent to the enrichment function. This function performs basic validation and authentication, and it enriches the event data to make sure that all the events from different clients are standard.

From there, the events are forwarded to the data platform event bus to be sent to the different consumer targets. The main target is their data lake solution, which analyzes all the events.

What Was the Impact?
For GoDaddy, business continuity is important, and their customer signals are not getting lost due to any issue with their platform. This makes them confident that they can expand their customer signal platforms from 400 million events per day to 2 billion events per day without introducing any additional operations overhead.

Now, they can confidently process hundreds of millions of events per day to their system, and they can keep on growing. The following image shows the number of events ingested by global endpoints in a normal day.

While GoDaddy’s use of the active/archive pattern enables them to ensure they never lose any events, they’re already starting to see certain use cases where they want to minimize any delays in processing their events, even when service disruptions occur. Because they’re already replicating their events to a secondary Region, they can deploy their most critical consumers to both Regions and enable an active/active configuration for their mission-critical systems. Active/active configuration allows you to process parallel events in both the primary and secondary Regions, simplifying the processing of events even during disruptions and enabling business continuity.

The vision when building the Customer Signal Platform was to align with GoDaddy’s high bar for reliability, scalability, and maintainability and, at the same time, keep the platform self-service so that developers can focus on business needs. This led GoDaddy to choose Amazon EventBridge global endpoints and serverless technologies to build this solution.

GoDaddy Customer Signal Platform is an excellent example of what serverless technologies enable. By leveraging the cloud to handle as much of the undifferentiated heavy lifting as possible, GoDaddy has reduced the operational complexity of setting up an event bus for a multi-Region strategy, implemented failover mechanisms in the case of Regional distruptions, and ensured that events are not lost by enabling replication. Global endpoints active/archive configuration improves the availability of customer applications with the least amount of configuration changes.

If you want to get started with EventBridge global endpoints, you can check out this talk on event-driven applications. For a working demo on how to use EventBridge global endpoints for failover events, check out this Serverless Land repository.

— Marcia

Deploying an automated Amazon CloudWatch dashboard for AWS Outposts using AWS CDK

2023-06-15 Sheila Busser

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/deploying-an-automated-amazon-cloudwatch-dashboard-for-aws-outposts-using-aws-cdk/

This post is written by Enrico Liguori, Networking Solutions Architect, Hybrid Cloud and Sumeeth Siriyur, Sr. Hybrid Cloud Solutions Architect.

AWS Outposts is a fully managed service that brings the same AWS infrastructure, services, APIs, and tools to virtually any data center, colocation space, manufacturing floor, or on-premises facility where it might be needed. With Outposts, you can run some AWS services on-premises and connect to a broad range of services available in the local AWS Region. Outposts supports workloads requiring low latency, local data processing, data residency, and application migration.

Outposts capacity is driven as per your compute and storage requirements to run workloads. You can monitor Outposts resources using metrics gathered by Amazon CloudWatch. Using these metrics, you can effectively monitor and manage the Outposts resources as they would in the Region, levereging cloud native tools such as CloudWatch dashboards. Check the Monitoring best practices for AWS Outposts blog post to dive deep into the available monitoring options for Outposts.

CloudWatch dashboards are customizable home pages in the CloudWatch console that can be used to monitor resources running on Outposts in a single view. For example, you can monitor in a single pane the number Amazon EC2 instances used per EC2 instance type, the available capacity of Amazon EBS volumes and Amazon S3 buckets, and the operational status of the service link of Outposts.

As a you start deploying additional Outposts resources as a part of their capacity expansion, they must all be integrated and visualized within CloudWatch in an automated way. Traditionally CloudWatch dashboards are built manually and may be time consuming to tune. This post provides also an overview of building CloudWatch dashboards in an automated way using AWS Cloud Development Kit (AWS CDK).

Overview

CloudWatch metrics available to monitor Outposts resources and capacity

CloudWatch metrics for Outposts are available to customers in all public AWS Regions and AWS GovCloud (US) at no additional cost. We can classify the available metrics in two main categories:

Metrics that can be used to gain visibility into Outposts capacity and operativity, published mainly under AWS/Outposts namespace.For Amazon Simple Storage Service (Amazon S3) on Outpost, the capacity metrics are published under AWS/S3_Outposts namespace.
Service specific metrics that can be used to monitor each resource hosted on Outposts, such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Load Balancer (Amazon ELB), Amazon Relational Database Service (Amazon RDS). Just like for any resource hosted in the Region, these metrics are published under service specific namespaces. For example, when you launch a new EC2 instance in Outposts, it starts publishing Amazon EC2 metrics under AWS/EC2

To identify the metrics published under the service specific namespaces, we can leverage metadata in the form of tags. A tag is a label that you assign to an AWS resource and consists of a key and an optional value. For the purpose of the monitoring strategy described in this post, we use a tag that contains the OutpostID of the Outpost where the resource is deployed. In this way, we can easily filter the CloudWatch metrics that we would like to show in our dashboard.

To enforce the assignment of tags to our resources we can implement a tagging strategy using AWS tag Policies and Service Control Policies (SCPs).

The following sections describe two different methods to build a CloudWatch dashboard that includes the different types of metrics described so far. In both cases, we see how particularly useful the presence of tags is to identify the service-specific metrics.

Manual approach to building a CloudWatch dashboard for Outposts

This section describes a manual (i.e., non-automated) approach to building a dashboard that could summarize both the capacity utilization metrics and the service specific metrics for your resources running on Outposts.

The benefit of this approach is that we can implement a fully operational dashboard directly from the CloudWatch console. However, it will simultaneously require more effort to properly tune the dashboard to satisfy your monitoring requirements.

You can start creating the dashboard opening the CloudWatch console and following the steps listed in the public documentation.

To display a metric under AWS/Outposts namespace we can choose any of the widgets available. Based on the nature of the data, we can choose different types of Widgets such as Number, Line, Gauge, Explorer, or you can even build your own custom widget.

Together with the Widget type, we must select Outposts namespace in the metric graph dialog box and then navigate to the specific metric of interest.

In case we are creating the dashboard in a different account than the Outposts owner, we must select the right account in the View data drop-down menu to see the Outposts metric in which we are interested.

After selecting one or more metrics we can select Create widget button.

For the service specific metrics, we recommend using the explorer widget. In this way, we can utilize the tagging strategy described earlier to automatically identify the metrics belonging to the resources running on Outposts. Check the documentation page for a step-by-step guide for creating an explorer widget based on tags.

Automated outpost dashboard

After we’ve seen how to build a dashboard manually from the console, in this secton we describe an automated approach to deploy a dashboard for Outposts through AWS CDK.

AWS CDK is an open source software development framework to model and provision your cloud application resources using familiar programming languages, including TypeScript, JavaScript, Python, C#, and Java. For the solution in this post, we use Python.

Architecture overview

The AWS CDK stack described in this post, assumes that the resources running on Outposts (EC2 instances, S3 buckets, Application Load Balancers (ALBs), and RDS instances) are tagged using the tagging strategy described earlier.

Specifying a tag name and a tag value in a configuration file automatically discovers the resources with that tag and adds the related metrics to the CloudWatch dashboard.

Together with the service specific metrics, it creates a series of widgets that we can use to monitor the capacity available and utilized in each Outpost that belongs to the account where the script is running.

The workflow is made of the following phases:

The AWS CDK stack creates an AWS CodeCommit repository and uploads its own code into it. The code contains a series of modules, one for each section of the CloudWatch dashboard. A section of the dashboard contains one or more widgets showing the metrics of a specific service.
To maintain the CloudWatch dashboard always up-to-date with the resources matching the tag, it creates a pipeline in AWS CodePipeline that can dynamically create and or update the dashboard. The pipeline runs the code in the CodeCommit repository and is made of two stages. In the first one, the build stage, it builds the dependencies needed by the AWS CDK stack. In the second stage, the Deploy stage, it loads and runs the modules used to build the dashboard.
Each module contains the code to automatically discover the tagged resources of a specific service. This discovery phase uses standard AWS APIs called through the Python SDK Boto3.
Based on the results of the discovery phase, AWS CDK produces an AWS CloudFormation template containing the definition of the CloudWatch dashboard sections. The template is submitted to CloudFormation.
CloudFormation creates or, if already defined, updates the CloudWatch dashboard.
Together with the dashboard, the AWS CDK script also contains the definition of a CloudWatch Event that, once deployed, triggers the pipeline each time a resource tagged with the specified tag is created or destroyed.

Prerequisites

To implement the solution presented in this post, you must configure:

git as distributed version control system.
In case it is the first time that you’re using AWS CDK in this account and region, you must:

a. Install the AWS CDK, and its prerequisites, following these instructions.

b. Go through the AWS CDK bootstrapping process. This is required only for the first time that we use AWS CDK in a specific AWS environment (an AWS environment is a combination of an AWS account and Region).

How to install

Step 1: Clone the AWS CDK code hosted on GitHub with:

$ git clone https://github.com/aws-samples/automated-cloudwatch-dashboard.git

Step 2: enter the directory using the following:

$ cd  automated-cloudwatch-dashboard/

Step 3: Install the needed Python dependencies with:

$ pip install -r requirements.txt

Step 4: Modify the configuration file

Before deploying the stack, we must modify the configuration file to specify the tag we use for identifying our resources running on Outposts. Open the file with the name config.yaml with your preferred text editor and specify:

- - A name for the dashboard. The default name used is Automated-CloudWatch-Dashboard.
  - Replace <tag_name> placeholder following the tag_name variable with the tag name used to tag the resources that you want to include in the dashboard.
  - Replace <tag_value> placeholder under tag_values variable with the tag value that you used.

Here is an example config.yaml configuration file:

dashboard_name: Automated-CloudWatch-Dahsboard
tag_name: OutpostID
tag_values:
  - op-1234567890abcdefg

Stack deployment

We can deploy the stack with the following:

$ cdk deploy

At the end of the deployment process, the pipeline that creates the dashboard is provisioned. You can now go to your CloudWatch console to view it.

Automated Outposts dashboard overview

Now that we have built our dashboard, let’s review each section:

Outpost capacity

The AWS CDK stacks define a capacity section for each Outpost available to the AWS account where the script runs.

In this section, we find four widgets showing metrics published under the AWS/Outpost namespace. The first widget shows for each EC2 instance type available on the Outposts the number of instances utilized and available for that instance type. In the second row, we can visualize the available capacity for the Amazon EBS volumes and for the S3 buckets. The last widget shows the operational status of the service link of Outposts.

2. EC2 instances

In this section of the dashboard, we find the metrics showing the CPU, Network, and Disk Utilization for an EC2 instance. It has defined a section of this type for each EC2 instance with a tag assigned matching the name and the value specified in the configuration file of the script.

3. Application Load Balancer

The ALB section aggregates metrics showing the operational status of a load balancer hosted on Outposts. A section of this type is defined for each ALB with an assigned tag matching the one specified in the configuration file.

4. S3 buckets

The S3 buckets section is defined only once and aggregates the utilization metrics for all S3 buckets with an assigned tag.

5. AutoScaling group

The AutoScaling group section can be used to monitor the number of instances in service in a specific AS group with a tag assigned. This section is defined once and can aggregate the metrics for multiple AutoScaling groups.

Clean up

To terminate the resources that we created in this post, run the following:

$ cdk destroy

Then, go to the Cloudformation console and delete the stack with the name “Deploy-AutomatedCloudWatchDashboard”.

Conclusion

In conclusion, this post demonstrates a manual way of creating CloudWatch Metrics dashboard using the CloudWatch console and an automated way using AWS CDK. The automated approach is also scalable by automatically discovering any new resources added to the existing Outposts in the your environment without any changes to the code.

Secure Connectivity from Public to Private: Introducing EC2 Instance Connect Endpoint

2023-06-14 Sheila Busser

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/secure-connectivity-from-public-to-private-introducing-ec2-instance-connect-endpoint-june-13-2023/

This blog post is written by Ariana Rahgozar, Solutions Architect, and Kenneth Kitts, Sr. Technical Account Manager, AWS.

Imagine trying to connect to an Amazon Elastic Compute Cloud (Amazon EC2) instance within your Amazon Virtual Private Cloud (Amazon VPC) over the Internet. Typically, you’d first have to connect to a bastion host with a public IP address that your administrator set up over an Internet Gateway (IGW) in your VPC, and then use port forwarding to reach your destination.

Today we launched Amazon EC2 Instance Connect (EIC) Endpoint, a new feature that allows you to connect securely to your instances and other VPC resources from the Internet. With EIC Endpoint, you no longer need an IGW in your VPC, a public IP address on your resource, a bastion host, or any agent to connect to your resources. EIC Endpoint combines identity-based and network-based access controls, providing the isolation, control, and logging needed to meet your organization’s security requirements. As a bonus, your organization administrator is also relieved of the operational overhead of maintaining and patching bastion hosts for connectivity. EIC Endpoint works with the AWS Management Console and AWS Command Line Interface (AWS CLI). Furthermore, it gives you the flexibility to continue using your favorite tools, such as PuTTY and OpenSSH.

In this post, we provide an overview of how the EIC Endpoint works and its security controls, guide you through your first EIC Endpoint creation, and demonstrate how to SSH to an instance from the Internet over the EIC Endpoint.

EIC Endpoint product overview

EIC Endpoint is an identity-aware TCP proxy. It has two modes: first, AWS CLI client is used to create a secure, WebSocket tunnel from your workstation to the endpoint with your AWS Identity and Access Management (IAM) credentials. Once you’ve established a tunnel, you point your preferred client at your loopback address (127.0.0.1 or localhost) and connect as usual. Second, when not using the AWS CLI, the Console gives you secure and seamless access to resources inside your VPC. Authentication and authorization is evaluated before traffic reaches the VPC. The following figure shows an illustration of a user connecting via an EIC Endpoint:

Figure 1. User connecting to private EC2 instances through an EIC Endpoint

EIC Endpoints provide a high degree of flexibility. First, they don’t require your VPC to have direct Internet connectivity using an IGW or NAT Gateway. Second, no agent is needed on the resource you wish to connect to, allowing for easy remote administration of resources which may not support agents, like third-party appliances. Third, they preserve existing workflows, enabling you to continue using your preferred client software on your local workstation to connect and manage your resources. And finally, IAM and Security Groups can be used to control access, which we discuss in more detail in the next section.

Prior to the launch of EIC Endpoints, AWS offered two key services to help manage access from public address space into a VPC more carefully. First is EC2 Instance Connect, which provides a mechanism that uses IAM credentials to push ephemeral SSH keys to an instance, making long-lived keys unnecessary. However, until now EC2 Instance Connect required a public IP address on your instance when connecting over the Internet. With this launch, you can use EC2 Instance Connect with EIC Endpoints, combining the two capabilities to give you ephemeral-key-based SSH to your instances without exposure to the public Internet. As an alternative to EC2 Instance Connect and EIC Endpoint based connectivity, AWS also offers Systems Manager Session Manager (SSM), which provides agent-based connectivity to instances. SSM uses IAM for authentication and authorization, and is ideal for environments where an agent can be configured to run.

Given that EIC Endpoint enables access to private resources from public IP space, let’s review the security controls and capabilities in more detail before discussing creating your first EIC Endpoint.

Security capabilities and controls

Many AWS customers remotely managing resources inside their VPCs from the Internet still use either public IP addresses on the relevant resources, or at best a bastion host approach combined with long-lived SSH keys. Using public IPs can be locked down somewhat using IGW routes and/or security groups. However, in a dynamic environment those controls can be hard to manage. As a result, careful management of long-lived SSH keys remains the only layer of defense, which isn’t great since we all know that these controls sometimes fail, and so defense-in-depth is important. Although bastion hosts can help, they increase the operational overhead of managing, patching, and maintaining infrastructure significantly.

IAM authorization is required to create the EIC Endpoint and also to establish a connection via the endpoint’s secure tunneling technology. Along with identity-based access controls governing who, how, when, and how long users can connect, more traditional network access controls like security groups can also be used. Security groups associated with your VPC resources can be used to grant/deny access. Whether it’s IAM policies or security groups, the default behavior is to deny traffic unless it is explicitly allowed.

EIC Endpoint meets important security requirements in terms of separation of privileges for the control plane and data plane. An administrator with full EC2 IAM privileges can create and control EIC Endpoints (the control plane). However, they cannot use those endpoints without also having EC2 Instance Connect IAM privileges (the data plane). Conversely, DevOps engineers who may need to use EIC Endpoint to tunnel into VPC resources do not require control-plane privileges to do so. In all cases, IAM principals using an EIC Endpoint must be part of the same AWS account (either directly or by cross-account role assumption). Security administrators and auditors have a centralized view of endpoint activity as all API calls for configuring and connecting via the EIC Endpoint API are recorded in AWS CloudTrail. Records of data-plane connections include the IAM principal making the request, their source IP address, the requested destination IP address, and the destination port. See the following figure for an example CloudTrail entry.

Figure 2. Partial CloudTrail entry for an SSH data-plane connection

EIC Endpoint supports the optional use of Client IP Preservation (a.k.a Source IP Preservation), which is an important security consideration for certain organizations. For example, suppose the resource you are connecting to has network access controls that are scoped to your specific public IP address, or your instance access logs must contain the client’s “true” IP address. Although you may choose to enable this feature when you create an endpoint, the default setting is off. When off, connections proxied through the endpoint use the endpoint’s private IP address in the network packets’ source IP field. This default behavior allows connections proxied through the endpoint to reach as far as your route tables permit. Remember, no matter how you configure this setting, CloudTrail records the client’s true IP address.

EIC Endpoints strengthen security by combining identity-based authentication and authorization with traditional network-perimeter controls and provides for fine-grained access control, logging, monitoring, and more defense in depth. Moreover, it does all this without requiring Internet-enabling infrastructure in your VPC, minimizing the possibility of unintended access to private VPC resources.

Getting started

Creating your EIC Endpoint

Only one endpoint is required per VPC. To create or modify an endpoint and connect to a resource, a user must have the required IAM permissions, and any security groups associated with your VPC resources must have a rule to allow connectivity. Refer to the following resources for more details on configuring security groups and sample IAM permissions.

The AWS CLI or Console can be used to create an EIC Endpoint, and we demonstrate the AWS CLI in the following. To create an EIC Endpoint using the Console, refer to the documentation.

Creating an EIC Endpoint with the AWS CLI

To create an EIC Endpoint with the AWS CLI, run the following command, replacing [SUBNET] with your subnet ID and [SG-ID] with your security group ID:

aws ec2 create-instance-connect-endpoint \
    --subnet-id [SUBNET] \
    --security-group-id [SG-ID]

After creating an EIC Endpoint using the AWS CLI or Console, and granting the user IAM permission to create a tunnel, a connection can be established. Now we discuss how to connect to Linux instances using SSH. However, note that you can also use the OpenTunnel API to connect to instances via RDP.

Connecting to your Linux Instance using SSH

With your EIC Endpoint set up in your VPC subnet, you can connect using SSH. Traditionally, access to an EC2 instance using SSH was controlled by key pairs and network access controls. With EIC Endpoint, an additional layer of control is enabled through IAM policy, leading to an enhanced security posture for remote access. We describe two methods to connect via SSH in the following.

One-click command

To further reduce the operational burden of creating and rotating SSH keys, you can use the new ec2-instance-connect ssh command from the AWS CLI. With this new command, we generate ephemeral keys for you to connect to your instance. Note that this command requires use of the OpenSSH client. To use this command and connect, you need IAM permissions as detailed here.

Once configured, you can connect using the new AWS CLI command, shown in the following figure:

Figure 3. AWS CLI view upon successful SSH connection to your instance

To test connecting to your instance from the AWS CLI, you can run the following command where [INSTANCE] is the instance ID of your EC2 instance:

aws ec2-instance-connect ssh --instance-id [INSTANCE]

Note that you can still use long-lived SSH credentials to connect if you must maintain existing workflows, which we will show in the following. However, note that dynamic, frequently rotated credentials are generally safer.

Open-tunnel command

You can also connect using SSH with standard tooling or using the proxy command. To establish a private tunnel (TCP proxy) to the instance, you must run one AWS CLI command, which you can see in the following figure:

Figure 4. AWS CLI view after running new SSH open-tunnel command, creating a private tunnel to connect to our EC2 instance

You can run the following command to test connectivity, where [INSTANCE] is the instance ID of your EC2 instance and [SSH-KEY] is the location and name of your SSH key. For guidance on the use of SSH keys, refer to our documentation on Amazon EC2 key pairs and Linux instances.

ssh ec2-user@[INSTANCE] \
    -i [SSH-KEY] \
    -o ProxyCommand='aws ec2-instance-connect open-tunnel \
    --instance-id %h'

Once we have our EIC Endpoint configured, we can SSH into our EC2 instances without a public IP or IGW using the AWS CLI.

Conclusion

EIC Endpoint provides a secure solution to connect to your instances via SSH or RDP in private subnets without IGWs, public IPs, agents, and bastion hosts. By configuring an EIC Endpoint for your VPC, you can securely connect using your existing client tools or the Console/AWS CLI. To learn more, visit the EIC Endpoint documentation.

AWS Professional Services scales by improving performance and democratizing data with Amazon QuickSight

2023-06-14 Ameya Agavekar

Post Syndicated from Ameya Agavekar original https://aws.amazon.com/blogs/big-data/aws-professional-services-scales-by-improving-performance-and-democratizing-data-with-amazon-quicksight/

The AWS Professional Services (ProServe) Insights team builds global operational data products that serve over 8,000 users within Amazon. Our team was formed in 2019 as an informal group of four analysts who supported ad hoc analysis for a division of ProServe consultants. ProServe is responsible for assisting enterprises as they shift to the cloud by incorporating Amazon Web Services (AWS) into their overall architecture. In recent years, the demand for domain expertise (ProServe) grew as various industries and organizations accelerated the move to the cloud.

We work hand in hand with customer teams and AWS partners to provide deep expertise in the architecture, design, development, and implementation of cloud computing initiatives that result in real business outcomes.

As our organization grew rapidly, we built new tools to scale analytical insights into our customers’ sales and delivery mechanisms. We were frustrated by the limitations of our previous business intelligence (BI) solution, which was holding us back from our vision to accelerate data sharing, team collaboration, and security within Amazon. To scale and continue innovating, we developed a secure Amazon QuickSight environment for our internal customers.

In this post, we discuss how QuickSight has helped us improve our performance, democratize our data, and provide insights to our internal customers at scale.

Enabling teams to build their own analyses at scale

The Insights team builds dashboards and supports thousands of internal consultants and hundreds of analysts and engineers across the globe who drive local products and insights. As a team of engineers focused on building a single source of truth within Amazon, we wanted a BI tool that was cost-effective, secure, and integrated seamlessly with the other AWS services we use. ProServe team members need to make strategic decisions on behalf of customers, and we play a key role by providing the tools they need to make the right decisions. We’ve made a big impact with QuickSight because it doesn’t require in-depth knowledge about data visualizations to build dashboards and provide insights, empowering our users to build what they need.

Our QuickSight instance is secure and tailored to use the Amazon active directory framework. We also helped 13 Amazon teams to set up their own instances. The teams we serve have benefited from our switch to QuickSight. For example, AWS Professional Services launched Financial Insights Tool (FIT) 2 years ago, a QuickSight dashboard that reports project financials, project revenue leakage, and margin erosion by evaluating actuals and forecasts at any granularity. FIT saves 30 minutes per project per Engagement Manager (EM) per week, and securely centralizes project financials into a scalable tool. This empowers EMs to avoid building disparate local reporting that creates logic inconsistencies and data security issues.

One of our ProServe teams has 19 dashboards on QuickSight, including Catalog, Trend and Analysis, KPI Monitoring, Business Management, and Quality Control. In 2022, one of the KPI Monitoring dashboards helped save at least 5,600 hours in total across 230 managers and 2,000 consultants. Last year, this team also reported over 29,600 distinct views on their 19 dashboards.

Additionally, we launched the first iteration of a hygiene dashboard in February 2022. This dashboard helps our operations team and end customers improve the data quality of key attribution and reduce manual intervention. The hygiene dashboard makes sure all the checks are enabled to safely promote customer outcomes with better planning and forecasting capabilities. The adoption of the dashboard led to a 73% reduction in hygiene issues from February 2022 to February 2023. We attribute this reduction to better inspection from weekly business reviews, and with this dashboard as an inspection tool delivered via email subscription to our stakeholders.

Improved performance and flexibility

To onboard a new reader or author on our previous BI tool, we had to provide a new license each time. This manual process was time-consuming and costly. The QuickSight usage-based pricing model makes sure that we can provide analytics and insights to all users without the need to pay ahead for user-specific licenses. QuickSight guarantees that we can provide everyone access by default, enabling data democratization in ways we couldn’t before. Since moving to QuickSight, we have scaled from 1,400 licenses to over 8,000 unique viewers, thanks to its auto scaling capability. QuickSight is fully managed, serverless, and can automatically scale to tens of thousands of users without any infrastructure to manage or capacity to plan for.

We used to have to track our dashboard loading times on our previous tool, but QuickSight has made that a thing of the past. With QuickSight SPICE (Super-fast, Parallel, In-memory Calculation Engine), our dashboards load in seconds, not minutes. Our high impact dashboards for ProServe weekly business reviews, utilization and time card deep dive, and impact points use the powerful SPICE functionality, which has support for up to 1 billion rows.

The Weekly Business Review (WBR) Backlog Summary dashboard that replaced the complex scorecards with interactive QuickSight visuals:

Implementing the FAIRS data sharing framework

QuickSight improves the ability to securely Find, Access, Interoperate, and Reuse (FAIRS) the data across ProServe, AWS, and Amazon. This framework can be described as follows:

Findable – Metadata and data should be easy to find for both humans and computers. All datasets and dashboards in the same account provide us an opportunity to see the big picture. This streamlines the reports’ and dashboards’ dictionary-building process for ProServe. Automatic discovery of datasets, reports, and dashboards improves the analytical community’s efficiency and productivity.
Accessible – Users need to be able to access the data they need easily and securely. In our case, we can control the access using appropriate roles, groups, and highly secure authentication methods not only for users but also for our data sources, datasets, and dashboards using Amazon Active Directory Connected Groups.
Interoperable – As the worldwide central team across 10 geographies and over 2,000 practices, our data needs to interoperate with applications or workflows for analysis, storage, and processing. This allows many different parts of the organization to collaborate as a cohesive unit.
Reusable – The ultimate goal of FAIRS is to optimize the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated or combined in different settings. This way, we can save time and focus our analysis on diving deeper into the insights.
Security – Security is job Zero. We believe that everyone should have secure data access to make data-driven decisions. At ProServe, we use Amazon Active Directory, column-level security (CLS), and row-level security (RLS) to maintain the high bar in data security.

QuickSight has brought us closer to achieving FAIRS than we could get with our previous BI solution. Dashboards are easily searchable, highly collaborative, and secure.

The User Activities Tracking dashboard created by ProServe’s QuickSight administrators:

Seamless AWS Integration

QuickSight integrates seamlessly with other AWS services such as Amazon SageMaker to use machine learning insights within a low code or no code environment. Our users can simply connect to any of the QuickSight supported data sources—including Amazon Simple Storage Service (Amazon S3), Amazon Athena, Amazon Relational Database Service (Amazon RDS), and Amazon Redshift—and select the SageMaker model they want to use for prediction. QuickSight has also made it possible for dashboard development from non-analyst roles.

QuickSight is constantly evolving. Eighty-four features were added in 2022. One of those features is asset management, which has made it easier for us to transfer assets from one group to another to onboard new employees. With QuickSight, we have reached over 9,200 distinct viewers and over 500,000 total views at Amazon with 485 dashboards.

The Skills dashboard representing the percentage match of top 100 skills entered by each specialization:

QuickSight has enabled us to scale with the growth of our team without sacrificing our performance. Instead, we improved our performance and democratized our data, thanks to great capabilities such as SPICE, the flexibility of the product, and its seamless integration with the other AWS services. We look forward to continuing to provide solutions to our internal customers by exploring other QuickSight capabilities such as Amazon QuickSight Q, embedded analytics, and more.

To learn more about how QuickSight can help your business with dashboards, reports, and more, visit Amazon QuickSight.

About the Authors

Ameya Agavekar is a results-driven, highly skilled data strategist. Ameya leads the data engineering and data science function for AWS Professional Services World-Wide Business Insights & Analytics team. Outside of work, Ameya is a professional pilot. He enjoys serving community by applying his unique flying skills with the US Airforce auxiliary Civil Air Patrol.

Tucker Shouse leads the AWS Professional Services World-Wide Business Insights & Analytics team. Prior to AWS, Tucker worked with financial services, retail, healthcare, and non-profit clients to develop digital and data products and strategies as a Manager at Alvarez & Marsal Corporate Performance Improvement. Outside of work, Tucker enjoys spending time with his wife and daughter enjoying the outdoors and music.

How Klarna Bank AB built real-time decision-making with Amazon Kinesis Data Analytics for Apache Flink

2023-06-13 Nir Tsruya

Post Syndicated from Nir Tsruya original https://aws.amazon.com/blogs/big-data/how-klarna-bank-ab-built-real-time-decision-making-with-amazon-kinesis-data-analytics-for-apache-flink/

This is a joint post co-authored with Nir Tsruya from Klarna Bank AB.

Klarna is a leading global payments and shopping service, providing smarter and more flexible shopping and purchase experiences to 150 million active consumers across more than 500,000 merchants in 45 countries. Klarna offers direct payments, pay after delivery options, and instalment plans in a smooth one-click purchase experience that lets consumers pay when and how they prefer to. The ability to utilize data to make near-real-time decisions is a source of competitive advantage for Klarna.

This post presents a reference architecture for real-time queries and decision-making on AWS using Amazon Kinesis Data Analytics for Apache Flink. In addition, we explain why the Klarna Decision Tooling team selected Kinesis Data Analytics for Apache Flink for their first real-time decision query service. We show how Klarna uses Kinesis Data Analytics for Apache Flink as part of an end-to-end solution including Amazon DynamoDB and Apache Kafka to process real-time decision-making.

AWS offers a rich set of services that you can use to realize real-time insights. These services include Kinesis Data Analytics for Apache Flink, the solution Klarna that uses to underpin automated decision-making in their business today. Kinesis Data Analytics for Apache Flink allows you to easily build stream processing applications for a variety of sources including Amazon Kinesis Data Streams, Amazon Managed Streaming for Apache Kafka (Amazon MSK), and Amazon MQ.

The challenge: Real-time decision-making at scale

Klarna’s customers expect a real-time, frictionless, online experience when shopping and paying online. In the background, Klarna needs to assess risks such as credit risk, fraud attempts, and money laundering for every customer credit request in every operating geography. The outcome of this risk assessment is called a decision. Decisions generate millions of risk assessment transactions a day that must be run in near-real time. The final decision is the record of whether Klarna has approved or rejected the request to extend credit to a consumer. These underwriting decisions are critical artefacts. First, they contain information that must be persisted for legal reasons. Second, they are used to build profiles and models that are fed into underwriting policies to improve the decision process. Under the hood, a decision is the sum of a number of transactions (for example, credit checks), coordinated and persisted via a decision store.

Klarna wanted to build a framework to ensure decisions persist successfully, ensuring timely risk assessment and quick decisions for customers. First, the Klarna team looked to solve the problem of producing and capturing decisions by using a combination of Apache Kafka and AWS Lambda. By publishing decision artefacts directly to a Kafka topic, the Klarna team found that high latency could cause long transaction wait times or transactions to be rejected altogether, leading to delays in getting ratified decisions to customers in a timely fashion and potential lost revenue. This approach also caused operational overhead for the Klarna team, including management of the schema evolution, replaying old events, and native integration of Lambda with their self-managed Apache Kafka clusters.

Design requirements

Klarna was able to set out their requirements for a solution to capture risk assessment artefacts (decisions), acting as a source of truth for all underwriting decisions within Klarna. The key requirements included at-least once reliability and millisecond latency, enabling real-time access to decision-making and the ability to replay past events in case of missing data in downstream systems. Additionally, the team needed a system that could scale to keep pace with Klarna’s rapid [10 times] growth.

Solution overview

The solution consists of two components: a combination of an highly available API with DynamoDB as the data store to store each decision, and Amazon DynamoDB Streams with Kinesis Data Analytics. Kinesis Data Analytics is a fully managed Apache Flink service and used to stream, process, enrich, and standardize the decision in real time and replay past events (if needed).

The following diagram illustrates the overall flow from end-user to the downstream systems.

The flow includes the following steps:

As the end-user makes a purchase, the policy components assess risk and the decision is sent to a decision store via the Decision Store API.
The Decision Store API persists the data in DynamoDB and responds to the requester. Decisions for each transaction are time-ordered and streamed by DynamoDB Streams. Decision Store also enables centralised schema management and handles evolution of event schemas.
The Kinesis Data Analytics for Apache Flink application is the consumer of DynamoDB streams. The application makes sure that the decisions captured are conforming to the expected event schema that is then published to a Kafka topic to be consumed by various downstream systems. Here, Kinesis Data Analytics for Apache Flink plays a vital part in the delivery of those events: aggregating, enriching, and mapping data to adhere to the event schema. This provides a standardized way for consumers to access decisions from their respective producers. The application enables at-least once delivery capability, and Flink’s checkpoint and retry mechanism guarantees that every event is processed and persisted.
The published Kafka events are consumed by the downstream systems and stored in an Amazon Simple Storage Service (Amazon S3) bucket. The events stored in Amazon S3 reflect every decision ever taken by the producing policy components, and can be used by the decision store to backfill and replay any past events. In addition to preserving the history of decision events, events are also stored as a set of variables in the variable store.
Policy components use the variable store to check for similar past decisions to determine if a request can be accepted or denied immediately. The request is then processed as described by the preceding workflow, and the next subsequent request will be answered by the variable store based on the result of the previous decision.

The decision store provides a standardized workflow for processing and producing events for downstream systems and customer support. With all the events captured and safely stored in DynamoDB, the decision store provides an API for support engineers (and other supporting tools like chatbots) to query and access past decisions in near-real time.

Solution impact

The solution provided benefits in three areas.

First, the managed nature of Kinesis Data Analytics allowed the Klarna team to focus on value-adding application development instead of managing infrastructure. The team is able to onboard new use cases in less than a week. They can take full advantage of the auto scaling feature in Kinesis Data Analytics and pre-built sources and destinations.

Second, the team can use Apache Flink to ensure the accuracy, completeness, consistency, and reliability of data. Flink’s native capability of stateful computation and data accuracy through the use of checkpoints and savepoints directly supports Klarna team’s vision to add more logic into the pipelines, allowing the team to expand to different use cases confidently. Additionally, the low latency of the service ensures that enriched decision artefacts are available to consumers and subsequently to the policy agents for future decision-making in near-real time.

Third, the solution enables the Klarna team to take advantage of the Apache Flink open-source community, which provides rich community support and the opportunity to contribute back to the community by bug fixing or adding new features.

This solution has proven to scale with increased adoption of a new use case, translating to a 10-times increase in events over 3 months.

Lessons learned

The Klarna team faced a few challenges with Flink serialization and upgrading Apache Flink versions. Flink serialization is an interesting concept and critical for the application’s performance. Flink uses a different set of serializers in order to serialize data between the operators. It’s up to the team to configure the best and most efficient serializer based on the use case. The Klarna team configured the objects as Flink POJO, which reduced the pipeline runtime by 85%. For more information, refer to Flink Serialization Tuning Vol. 1: Choosing your Serializer — if you can before deploying a Flink application to production.

The other challenge faced by the team was upgrading the Apache Flink version in Kinesis Data Analytics. Presently, the Kinesis Data Analytics for Apache Flink application requires the creation of a new Kinesis Data Analytics for Apache Flink application. Currently, reusing a snapshot (the binary artefact representing the state of the Flink application, used to restore the application to the last checkpoint taken) is not possible between two different applications. For that reason, upgrading the Apache Flink version requires additional steps in order to ensure the application doesn’t lose data.

What’s next for Klarna and Kinesis Data Analytics for Apache Flink?

The team is looking into expanding the usage of Kinesis Data Analytics and Flink in Klarna. Because the team is already highly experienced in the technology, their first ambition will be to own the infrastructure of a Kinesis Data Analytics for Apache Flink deployment, and connect it to different Klarna data sources. The team then will host business logic provided by other departments in Klarna such as Fraud Prevention. This will allow the specialised teams to concentrate on the business logic and fraud detection algorithms, while decision tooling will handle the infrastructure.

Klarna, AWS, and the Flink community

A key part of choosing Kinesis Data Analytics for Apache Flink was the open-source community and support.

Several teams within Klarna created different implementations of a Flink DynamoDB connector, which were used internally by multiple teams. Klarna then identified the opportunity to create a single maintained DynamoDB Flink connector and contribute it to the open-source community. This has initiated a collaboration within Klarna, led by the Klarna Flink experts and accompanied by Flink open-source contributors from AWS.

The main principle for designing the DynamoDB Flink connector was utilizing the different write capacity modes of DynamoDB. DynamoDB supports On-demand and Provisioned capacity modes and each behaves differently when it comes to how it handles incoming throughput. On-demand mode will automatically scale up DynamoDB write capacity and apply itself to the incoming load. However, Provisioned mode is more limiting, and will throttle incoming traffic according to the provisioned write capacity.

To comply with this process, the DynamoDB Flink connector was designed to allow concurrent writes to DynamoDB. The number of concurrent requests can be configured to comply with DynamoDB’s capacity mode. In addition, the DynamoDB Flink connector supports backpressure handling in case the DynamoDB write provisioning is low compared to the incoming load from the Apache Flink application.

At the time of writing, the DynamoDB Flink connector has been open sourced.

Conclusion

Klarna has successfully been running Kinesis Data Analytics for Apache Flink in production since October 2020. It provides several key benefits. The Klarna development team can focus on development, not on cluster and operational management. Their applications can be quickly modified and uploaded. The low latency properties of the service ensure a near-real-time experience for end-users, data consumers, and producers, which underpin risk assessment and the decision-making processes underpinning continuous traffic growth. At the same time, exactly-once processing in combination with Flink checkpoints and savepoints means that critical decision-making and legal data is not lost.

To learn more about Kinesis Data Analytics and getting started, refer to Using a Studio notebook with Kinesis Data Analytics for Apache Flink and More Kinesis Data Analytics Solutions on GitHub.

About the authors

Nir Tsruya is a Lead Engineer in Klarna. He leads 2 engineering teams focusing mainly on real time data processing and analytics at large scale.

Ankit Gupta is a Senior Solutions Architect at Amazon Web Serves based in Stockholm, Sweden, where we helps customers across the Nordics succeed in Cloud. He’s particularly passionate about building strong Networking foundation in Cloud.

Daniel Arenhage is a Solutions Architect at Amazon Web Services based in Gothenburg, Sweden.