Tag Archives: machine learning

Generate machine learning insights for Amazon Security Lake data using Amazon SageMaker

Post Syndicated from Jonathan Nguyen original https://aws.amazon.com/blogs/security/generate-machine-learning-insights-for-amazon-security-lake-data-using-amazon-sagemaker/

Amazon Security Lake automatically centralizes the collection of security-related logs and events from integrated AWS and third-party services. With the increasing amount of security data available, it can be challenging knowing what data to focus on and which tools to use. You can use native AWS services such as Amazon QuickSight, Amazon OpenSearch, and Amazon SageMaker Studio to visualize, analyze, and interactively identify different areas of interest to focus on, and prioritize efforts to increase your AWS security posture.

In this post, we go over how to generate machine learning insights for Security Lake using SageMaker Studio. SageMaker Studio is a web integrated development environment (IDE) for machine learning that provides tools for data scientists to prepare, build, train, and deploy machine learning models. With this solution, you can quickly deploy a base set of Python notebooks focusing on AWS Security Hub findings in Security Lake, which can also be expanded to incorporate other AWS sources or custom data sources in Security Lake. After you’ve run the notebooks, you can use the results to help you identify and focus on areas of interest related to security within your AWS environment. As a result, you might implement additional guardrails or create custom detectors to alert on suspicious activity.


  1. Specify a delegated administrator account to manage the Security Lake configuration for all member accounts within your organization.
  2. Security Lake has been enabled in the delegated administrator AWS account.
  3. As part of the solution in this post, we focus on Security Hub as a data source. AWS Security Hub must be enabled for your AWS Organizations. When enabling Security Lake, select All log and event sources to include AWS Security Hub findings.
  4. Configure subscriber query access to Security Lake. Security Lake uses AWS Lake Formation cross-account table sharing to support subscriber query access. Accept the resource share request in the subscriber AWS account in AWS Resource Access Manager (AWS RAM). Subscribers with query access can query the data that Security Lake collects. These subscribers query Lake Formation tables in an Amazon Simple Storage Service (Amazon S3) bucket with Security Lake data using services such as Amazon Athena.

Solution overview

Figure 1 that follows depicts the architecture of the solution.

Figure 1 SageMaker machine learning insights architecture for Security Lake

Figure 1 SageMaker machine learning insights architecture for Security Lake

The deployment builds the architecture by completing the following steps:

  1. A Security Lake is set up in an AWS account with supported log sources — such as Amazon VPC Flow Logs, AWS Security Hub, AWS CloudTrail, and Amazon Route53 — configured.
  2. Subscriber query access is created from the Security Lake AWS account to a subscriber AWS account.

    Note: See Prerequisite #4 for more information.

  3. The AWS RAM resource share request must be accepted in the subscriber AWS account where this solution is deployed.

    Note: See Prerequisite #4 for more information.

  4. A resource link database in Lake Formation is created in the subscriber AWS account and grants access for the Athena tables in the Security Lake AWS account.
  5. VPC is provisioned for SageMaker with IGW, NAT GW, and VPC endpoints for the AWS services used in the solution. IGW and NAT are required to install external open-source packages.
  6. A SageMaker Domain for SageMaker Studio is created in VPCOnly mode with a single SageMaker user profile that is tied to a dedicated AWS Identity and Access Management (IAM) role.
  7. A dedicated IAM role is created to restrict access to create and access the presigned URL for the SageMaker Domain from a specific CIDR for accessing the SageMaker notebook.
  8. An AWS CodeCommit repository containing Python notebooks is used for the AI and ML workflow by the SageMaker user-profile.
  9. An Athena workgroup is created for the Security Lake queries with an S3 bucket for output location (access logging configured for the output bucket).

Deploy the solution

You can deploy the SageMaker solution by using either the AWS Management Console or the AWS Cloud Development Kit (AWS CDK).

Option 1: Deploy the solution with AWS CloudFormation using the console

Use the console to sign in to your subscriber AWS account and then choose the Launch Stack button to open the AWS CloudFormation console pre-loaded with the template for this solution. It takes approximately 10 minutes for the CloudFormation stack to complete.

Select this image to open a link that starts building the CloudFormation stack

Option 2: Deploy the solution by using the AWS CDK

You can find the latest code for the SageMaker solution in the SageMaker machine learning insights GitHub repository, where you can also contribute to the sample code. For instructions and more information on using the AWS CDK, see Get Started with AWS CDK.

To deploy the solution by using the AWS CDK

  1. To build the app when navigating to the project’s root folder, use the following commands:
    npm install -g aws-cdk-lib
    npm install

  2. Update IAM_role_assumption_for_sagemaker_presigned_url and security_lake_aws_account default values in source/lib/sagemaker_domain.ts with their respective appropriate values.
  3. Run the following commands in your terminal while authenticated in your subscriber AWS account. Be sure to replace <INSERT_AWS_ACCOUNT> with your account number and replace <INSERT_REGION> with the AWS Region that you want the solution deployed to.
    cdk bootstrap aws://<INSERT_AWS_ACCOUNT>/<INSERT_REGION>
    cdk deploy

Post deployment steps

Now that you’ve deployed the SageMaker solution, you must grant the SageMaker user profile in the subscriber AWS account query access to your Security Lake. You can Grant permission for the SageMaker user profile to Security Lake in Lake Formation in the subscriber AWS account.

Grant permission to the Security Lake database

  1. Copy the SageMaker user-profile Amazon resource name (ARN) arn:aws:iam::<account-id>:role/sagemaker-user-profile-for-security-lake
  2. Go to Lake Formation in the console.
  3. Select the amazon_security_lake_glue_db_us_east_1 database.
  4. From the Actions Dropdown, select Grant.
  5. In Grant Data Permissions, select SAML Users and Groups.
  6. Paste the SageMaker user profile ARN from Step 1.
  7. In Database Permissions, select Describe and then Grant.

Grant permission to Security Lake – Security Hub table

  1. Copy the SageMaker user-profile ARN arn:aws:iam:<account-id>:role/sagemaker-user-profile-for-security-lake
  2. Go to Lake Formation in the console.
  3. Select the amazon_security_lake_glue_db_us_east_1 database.
  4. Choose View Tables.
  5. Select the amazon_security_lake_table_us_east_1_sh_findings_1_0 table.
  6. From Actions Dropdown, select Grant.
  7. In Grant Data Permissions, select SAML Users and Groups.
  8. Paste the SageMaker user-profile ARN from Step 1.
  9. In Table Permissions, select Describe and then Grant.

Launch your SageMaker Studio application

Now that you have granted permissions for a SageMaker user-profile, we can move on to launching the SageMaker application associated to that user-profile.

  1. Navigate to the SageMaker Studio domain in the console.
  2. Select the SageMaker domain security-lake-ml-insights-<account-id>.
  3. Select the SageMaker user profile sagemaker-user-profile-for-security-lake.
  4. Select the Launch drop-down and select Studio
    Figure 2 SageMaker domain user-profile AWS console screen

    Figure 2: SageMaker domain user-profile AWS console screen

Clone Python notebooks

You’ll work primarily in the SageMaker user profile to create a data-science app to work in. As part of the solution deployment, we’ve created Python notebooks in CodeCommit that you will need to clone.

To clone the Python notebooks

  1. Navigate to CloudFormation in the console.
  2. In the Stacks section, select the SageMakerDomainStack.
  3. Select to the Outputs tab/
  4. Copy the value for sagemakernotebookmlinsightsrepositoryURL. (For example: https://git-codecommit.us-east-1.amazonaws.com/v1/repos/sagemaker_ml_insights_repo)
  5. Go back to your SageMaker app.
  6. In Studio, in the left sidebar, choose the Git icon (identified by a diamond with two branches), then choose Clone a Repository.
    Figure 3 SageMaker clone CodeCommit repository

    Figure 3: SageMaker clone CodeCommit repository

  7. Paste the CodeCommit repository link from Step 4 under the Git repository URL (git). After you paste the URL, select Clone “https://git-codecommit.us-east-1.amazonaws.com/v1/repos/sagemaker_ml_insights_repo”, then select Clone.

    NOTE: If you don’t select from the auto-populated drop-down, SageMaker won’t be able to clone the repository.

    Figure 4 SageMaker clone CodeCommit URL

    Figure 4: SageMaker clone CodeCommit URL

Generating machine learning insights using SageMaker Studio

You’ve successfully pulled the base set of Python notebooks into your SageMaker app and they can be accessed at sagemaker_ml_insights_repo/notebooks/tsat/. The notebooks provide you with a starting point for running machine learning analysis using Security Lake data. These notebooks can be expanded to existing native or custom data sources being sent to Security Lake.

Figure 5: SageMaker cloned Python notebooks

Figure 5: SageMaker cloned Python notebooks

Notebook #1 – Environment setup

The 0.0-tsat-environ-setup notebook handles the installation of the required libraries and dependencies needed for the subsequent notebooks within this blog. For our notebooks, we use an open-source Python library called Kats, which is a lightweight, generalizable framework to perform time series analysis.

  1. Select the 0.0-tsat-environ-setup.ipynb notebook for the environment setup.

    Note: If you have already provisioned a kernel, you can skip steps 2 and 3.

  2. In the right-hand corner, select No Kernel
  3. In the Set up notebook environment pop-up, leave the defaults and choose Select.
    Figure 6 SageMaker application environment settings

    Figure 6: SageMaker application environment settings

  4. After the kernel has successfully started, choose the Terminal icon to open the image terminal.
    Figure 7: SageMaker application terminal

    Figure 7: SageMaker application terminal

  5. To install open-source packages from https instead of http, you must update the sources.list file. After the terminal opens, send the following commands:
    cd /etc/apt
    sed -i 's/http:/https:/g' sources.list

  6. Go back to the 0.0-tsat-environ-setup.ipynb notebook and select the Run drop-down and select Run All Cells. Alternatively, you can run each cell independently, but it’s not required. Grab a coffee! This step will take about 10 minutes.

    IMPORTANT: If you complete the installation out of order or update the requirements.txt file, you might not be able to successfully install Kats and you will need to rebuild your environment by using a net-new SageMaker user profile.

  7. After installing all the prerequisites, check the Kats version to determine if it was successfully installed.
    Figure 8: Kats installation verification

    Figure 8: Kats installation verification

  8. Install PyAthena (Python DB API client for Amazon Athena) which is used to query your data in Security Lake.

You’ve successfully set up the SageMaker app environment! You can now load the appropriate dataset and create a time series.

Notebook #2 – Load data

The 0.1-load-data notebook establishes the Athena connection to query data in Security Lake and creates the resulting time series dataset. The time series dataset will be used for subsequent notebooks to identify trends, outliers, and change points.

  1. Select the 0.1-load-data.ipynb notebook.
  2. If you deployed the solution outside of us-east-1, update the con details to the appropriate Region. In this example, we’re focusing on Security Hub data within Security Lake. If you want to change the underlying data source, you can update the TABLE value.
    Figure 9: SageMaker notebook load Security Lake data settings

    Figure 9: SageMaker notebook load Security Lake data settings

  3. In the Query section, there’s an Athena query to pull specific data from Security Hub, this can be expanded as needed to a subset or can include all products within Security Hub. The query below pulls Security Hub information after 01:00:00 1/1/2022 from the products listed in productname.
    Figure 10: SageMaker notebook Athena query

    Figure 10: SageMaker notebook Athena query

  4. After the values have been updated, you can create your time series dataset. For this notebook, we recommend running each cell individually instead of running all cells at once so you can get a bit more familiar with the process. Select the first cell and choose the Run icon.
    Figure 11: SageMaker run Python notebook code

    Figure 11: SageMaker run Python notebook code

  5. Follow the same process as Step 4 for the subsequent cells.

    Note: If you encounter any issues with querying the table, make sure you completed the post-deployment step for Grant permission to Security Lake – Security Hub table.

You’ve successfully loaded your data and created a timeseries! You can now move on to generating machine learning insights from your timeseries.

Notebook #3 – Trend detector

The 1.1-trend-detector.ipynb notebook handles trend detection in your data. Trend represents a directional change in the level of a time series. This directional change can be either upward (increase in level) or downward (decrease in level). Trend detection helps detect a change while ignoring the noise from natural variability. Each environment is different, and trends help us identify where to look more closely to determine why a trend is positive or negative.

  1. Select 1.1-trend-detector.ipynb notebook for trend detection.
  2. Slopes are created to identify the relationship between x (time) and y (counts).
    Figure 12: SageMaker notebook slope view

    Figure 12: SageMaker notebook slope view

  3. If the counts are increasing with time, then it’s considered a positive slope and the reverse is considered a negative slope. A positive slope isn’t necessarily a good thing because in an ideal state we would expect counts of a finding type to come down with time.
    Figure 13: SageMaker notebook trend view

    Figure 13: SageMaker notebook trend view

  4. Now you can plot the top five positive and negative trends to identify the top movers.
    Figure 14: SageMaker notebook trend results view

    Figure 14: SageMaker notebook trend results view

Notebook #4 – Outlier detection

The 1.2-outlier-detection.ipynb notebook handles outlier detection. This notebook does a seasonal decomposition of the input time series, with additive or multiplicative decomposition as specified (default is additive). It uses a residual time series by either removing only trend or both trend and seasonality if the seasonality is strong. The intent is to discover useful, abnormal, and irregular patterns within data sets, allowing you to pinpoint areas of interest.

  1. To start, it detects points in the residual that are over 5 times the inter-quartile range.
  2. Inter-quartile range (IQR) is the difference between the seventy-fifth and twenty-fifth percentiles of residuals or the spread of data within the middle two quartiles of the entire dataset. IQR is useful in detecting the presence of outliers by looking at values that might lie outside of the middle two quartiles.
  3. The IQR multiplier controls the sensitivity of the range and decision of identifying outliers. By using a larger value for the iqr_mult_thresh parameter in OutlierDetector, outliers would be considered data points, while a smaller value would identify data points as outliers.

    Note: If you don’t have enough data, decrease iqr_mult_thresh to a lower value (for example iqr_mult_thresh=3).

    Figure 15: SageMaker notebook outlier setting

    Figure 15: SageMaker notebook outlier setting

  4. Along with outlier detection plots, investigation SQL will be displayed as well, which can help with further investigation of the outliers.

    In the diagram that follows, you can see that there are several outliers in the number of findings, related to failed AWS Firewall Manager policies, which have been identified by the vertical red lines within the line graph. These are outliers because they deviate from the normal behavior and number of findings on a day-to-day basis. When you see outliers, you can look at the resources that might have caused an unusual increase in Firewall Manager policy findings. Depending on the findings, it could be related to an overly permissive or noncompliant security group or a misconfigured AWS WAF rule group.

    Figure 16: SageMaker notebook outlier results view

    Figure 16: SageMaker notebook outlier results view

Notebook #5 – Change point detection

The 1.3-changepoint-detector.ipynb notebook handles the change point detection. Change point detection is a method to detect changes in a time series that persist over time, such as a change in the mean value. To detect a baseline to identify when several changes might have occurred from that point. Change points occur when there’s an increase or decrease to the average number of findings within a data set.

  1. Along with identifying change points within the data set, the investigation SQL is generated to further investigate the specific change point if applicable.

    In the following diagram, you can see there’s a change point decrease after July 27, 2022, with confidence of 99.9 percent. It’s important to note that change points differ from outliers, which are sudden changes in the data set observed. This diagram means there was some change in the environment that resulted in an overall decrease in the number of findings for S3 buckets with block public access being disabled. The change could be the result of an update to the CI/CD pipelines provisioning S3 buckets or automation to enable all S3 buckets to block public access. Conversely, if you saw a change point that resulted in an increase, it could mean that there was a change that resulted in a larger number of S3 buckets with a block public access configuration consistently being disabled.

    Figure 17: SageMaker changepoint detector view

    Figure 17: SageMaker changepoint detector view

By now, you should be familiar with the set up and deployment for SageMaker Studio and how you can use Python notebooks to generate machine learning insights for your Security Lake data. You can take what you’ve learned and start to curate specific datasets and data sources within Security Lake, create a time series, detect trends, and identify outliers and change points. By doing so, you can answer a variety of security-related questions such as:

  • CloudTrail

    Is there a large volume of Amazon S3 download or copy commands to an external resource? Are you seeing a large volume of S3 delete object commands? Is it possible there’s a ransomware event going on?

  • VPC Flow Logs

    Is there an increase in the number of requests from your VPC to external IPs? Is there an increase in the number of requests from your VPC to your on-premises CIDR? Is there a possibility of internal or external data exfiltration occurring?

  • Route53

    Which resources are making DNS requests that they haven’t typically made within the last 30–45 days? When did it start? Is there a potential command and control session occurring on an Amazon Elastic Compute Cloud (Amazon EC2) instance?

It’s important to note that this isn’t a solution to replace Amazon GuardDuty, which uses foundational data sources to detect communication with known malicious domains and IP addresses and identify anomalous behavior, or Amazon Detective, which provides customers with prebuilt data aggregations, summaries, and visualizations to help security teams conduct faster and more effective investigations. One of the main benefits of using Security Lake and SageMaker Studio is the ability to interactively create and tailor machine learning insights specific to your AWS environment and workloads.

Clean up

If you deployed the SageMaker machine learning insights solution by using the Launch Stack button in the AWS Management Console or the CloudFormation template sagemaker_ml_insights_cfn, do the following to clean up:

  1. In the CloudFormation console for the account and Region where you deployed the solution, choose the SageMakerML stack.
  2. Choose the option to Delete the stack.

If you deployed the solution by using the AWS CDK, run the command cdk destroy.


Amazon Security Lake gives you the ability to normalize and centrally store your security data from various log sources to help you analyze, visualize, and correlate appropriate security logs. You can then use this data to increase your overall security posture by implementing additional security guardrails or take appropriate remediation actions within your AWS environment.

In this blog post, you learned how you can use SageMaker to generate machine learning insights for your Security Hub findings in Security Lake. Although the example solution focuses on a single data source within Security Lake, you can expand the notebooks to incorporate other native or custom data sources in Security Lake.

There are many different use-cases for Security Lake that can be tailored to fit your AWS environment. Take a look at this blog post to learn how you can ingest, transform and deliver Security Lake data to Amazon OpenSearch to help your security operations team quickly analyze security data within your AWS environment. In supported Regions, new Security Lake account holders can try the service free for 15 days and gain access to its features.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Jonathan Nguyen

Jonathan Nguyen

Jonathan is a Principal Security Architect at AWS. His background is in AWS security, with a focus on threat detection and incident response. He helps enterprise customers develop a comprehensive AWS security strategy, deploy security solutions at scale, and train customers on AWS security best practices.

Madhunika Reddy Mikkili

Madhunika Reddy Mikkili

Madhunika is a Data and Machine Learning Engineer with the AWS Professional Services Shared Delivery Team. She is passionate about helping customers achieve their goals through the use of data and machine learning insights. Outside of work, she loves traveling and spending time with family and friends.

Bots Are Better than Humans at Solving CAPTCHAs

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/08/bots-are-better-than-humans-at-solving-captchas.html

Interesting research: “An Empirical Study & Evaluation of Modern CAPTCHAs“:

Abstract: For nearly two decades, CAPTCHAS have been widely used as a means of protection against bots. Throughout the years, as their use grew, techniques to defeat or bypass CAPTCHAS have continued to improve. Meanwhile, CAPTCHAS have also evolved in terms of sophistication and diversity, becoming increasingly difficult to solve for both bots (machines) and humans. Given this long-standing and still-ongoing arms race, it is critical to investigate how long it takes legitimate users to solve modern CAPTCHAS, and how they are perceived by those users.

In this work, we explore CAPTCHAS in the wild by evaluating users’ solving performance and perceptions of unmodified currently-deployed CAPTCHAS. We obtain this data through manual inspection of popular websites and user studies in which 1, 400 participants collectively solved 14, 000 CAPTCHAS. Results show significant differences between the most popular types of CAPTCHAS: surprisingly, solving time and user perception are not always correlated. We performed a comparative study to investigate the effect of experimental context ­ specifically the difference between solving CAPTCHAS directly versus solving them as part of a more natural task, such as account creation. Whilst there were several potential confounding factors, our results show that experimental context could have an impact on this task, and must be taken into account in future CAPTCHA studies. Finally, we investigate CAPTCHA-induced user task abandonment by analyzing participants who start and do not complete the task.

Slashdot thread.

And let’s all rewatch this great ad from 2022.

AVA Discovery View: Surfacing Authentic Moments

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/ava-discovery-view-surfacing-authentic-moments-b8cd145491cc

By: Hamid Shahid, Laura Johnson, Tiffany Low


At Netflix, we have created millions of artwork to represent our titles. Each artwork tells a story about the title it represents. From our testing on promotional assets, we know which of these assets have performed well and which ones haven’t. Through this, our teams have developed an intuition of what visual and thematic artwork characteristics work well for what genres of titles. A piece of promotional artwork may resonate more in certain regions, for certain genres, or for fans of particular talent. The complexity of these factors makes it difficult to determine the best creative strategy for upcoming titles.

Our assets are often created by selecting static image frames directly from our source videos. To improve it, we decided to invest in creating a Media Understanding Platform, which enables us to extract meaningful insights from media that we can then surface in our creative tools. In this post, we will take a deeper look into one of these tools, AVA Discovery View.

Intro to AVA Discovery View

AVA is an internal tool that surfaces still frames from video content. The tool provides an efficient way for creatives (photo editors, artwork designers, etc.) to pull moments from video content that authentically represent the title’s narrative themes, main characters, and visual characteristics. These still moments are used by multiple teams across Netflix for artwork (on and off the Netflix platform), Publicity, Marketing, Social teams, and more.

Stills are used to merchandise & publicize titles authentically, providing a diverse set of entry points to members who may watch for different reasons. For example, for our hit title “Wednesday”, one member may watch it because they love mysteries, while another may watch because they love coming-of-age stories or goth aesthetics. Another member may be drawn by talent. It’s a creative’s job to select frames with all these entry points in mind. Stills may be enhanced and combined to create a more polished piece of artwork or be used as is. For many teams and titles, Stills are essential to Netflix’s promotional asset strategy.

Watching every moment of content to find the best frames and select them manually takes a lot of time, and this approach is often not scalable. While frames can be saved manually from the video content, AVA goes beyond providing the functionality to surface authentic frames — it suggests the best moments for creatives to use: enter AVA Discovery View.

Example of AVA Discovery View

AVA’s imagery-harvesting algorithms pre-select and group relevant frames into categories like Storylines & Tones, Prominent Characters, and Environments.

Let’s look deeper at how different facets of a title are shown in one of Netflix’s biggest hits — “Wednesday”.

Storyline / Tone

The title “Wednesday” involves a character with supernatural abilities sleuthing to solve a mystery. The title has a dark, imaginative tone with shades of wit and dry humor. The setting is an extraordinary high school where teenagers of supernatural abilities are enrolled. The main character is a teenager and has relationship issues with her parents.

The paragraph above provides a short glimpse of the title and is similar to the briefs that our creatives have to work with. Finding authentic moments from this information to build the base of the artwork suite is not trivial and has been very time-consuming for our creatives.

This is where AVA Discovery View comes in and functions as a creative assistant. Using the information about the storyline and tones associated with a title, it surfaces key moments, which not only provide a nice visual summary but also provide a quick landscape view of the title’s main narrative themes and its visual language.

Storyline & Tone suggestions

Creatives can click on any storyline to see moments that best reflect that storyline and the title’s overall tone. For example, the following images illustrate how it displays moments for the “imaginative” tone.

Prominent Characters

Talent is a major draw for our titles, and our members want to see who is featured in a title to choose whether or not they want to watch that title. Getting to know the prominent characters for a title and then finding the best possible moments featuring them used to be an arduous task.

With the AVA Discovery View, all the prominent characters of the title and their best possible shots are presented to the creatives. They can see how much a character is featured in the title and find shots containing multiple characters and the best possible stills for the characters themselves.


We don’t want the Netflix home screen to shock or offend audiences, so we aim to avoid artwork with violence, nudity, gore or similar attributes.

To help our creatives understand content sensitivities, AVA Discovery View lists moments where content contains gore, violence, intimacy, nudity, smoking, etc.

Sensitive Moments


The setting and the filming location often provide great genre cues and form the basis of great-looking artwork. Finding moments from a virtual setting in the title or the actual filming location required a visual scan of all episodes of a title. Now, AVA Discovery View shows such moments as suggestions to the creatives.

For example, for the title “Wednesday”, the creatives are presented with “Nevermore Academy” as a suggested environment

Suggested Environment — Nevermore Academy


Algorithm Quality

AVA Discovery View included several different algorithms at the start, and since its release, we have expanded support to additional algorithms. Each algorithm needed a process of evaluation and tuning to get great results in AVA Discovery View.

For Visual Search

  • We found that the model was influenced by the text present in the image. For example, stills of title credits would often get picked up and highly recommended to users. We added a step where such stills with text results would be filtered out and not present in the search.
  • We also found that users preferred results that had a confidence threshold cutoff applied to them.

For Prominent Characters

  • We found that our current algorithm model did not handle animated faces well. As a result, we often find that poor or no suggestions are returned for animated content.

For Sensitive Moments

  • We found that setting a high confidence threshold was helpful. The algorithm was originally developed to be sensitive to bloody scenes, and when applied to scenes of cooking and painting, often flagged as false positives.

One challenge we encountered was the repetition of suggestions. Multiple suggestions from the same scene could be returned and lead to many visually similar moments. Users preferred seeing only the best frames and a diverse set of frames.

  • We added a ranking step to some algorithms to mark frames too visually similar to higher-ranked frames. These duplicate frames would be filtered out from the suggestions list.
  • However, not all algorithms can take this approach. We are exploring using scene boundary algorithms to group similar moments together as a single recommendation.

Suggestion Ranking

AVA Discovery View presents multiple levels of algorithmic suggestions, and a challenge was to help users navigate through the best-performing suggestions and avoid selecting bad suggestions.

  • The suggestion categories are presented based on our users’ workflow relevance. We show Storyline/Tone, Prominent Characters, Environments, then Sensitivities.
  • Within each suggestion category, we display suggestions ranked by the number of results and tie break along the confidence threshold.

Algorithm Feedback

As we launched the initial set of algorithms for AVA Discovery View, our team interviewed users about their experiences. We also built mechanisms within the tool to get explicit and implicit user feedback.

Explicit Feedback

  • For each algorithmic suggestion presented to a user, users can click a thumbs up or thumbs down to give direct feedback.

Implicit Feedback

  • We have tracking enabled to detect when an algorithmic suggestion has been utilized (downloaded or published for use on Netflix promotional purposes).
  • This implicit feedback is much easier to collect, although it may not work for all algorithms. For example, suggestions from Sensitivities are meant to be content watch-outs that should not be used for promotional purposes. As a result, this row does poorly on implicit feedback as we do not expect downloads or publish actions on these suggestions.

This feedback is easily accessible by our algorithm partners and used in training improved versions of the models.

Intersection Queries across Multiple Algorithms

Several media understanding algorithms return clip or short-duration video segment suggestions. We compute the timecode intersections against a set of known high-quality frames to surface the best frame within these clips.

We also rely on intersection queries to help users narrow a large set of frames to a specific moment. For example, returning stills with two or more prominent characters or filtering only indoor scenes from a search query.

Technical Architecture

Discovery View Plugin Architecture

Discovery View Plugin Architecture

We built Discovery View as a pluggable feature that could quickly be extended to support more algorithms and other types of suggestions. Discovery View is available via Studio Gateway for AVA UI and other front-end applications to leverage.

Unified Interface for Discovery

All Discovery View rows implement the same interface, and it’s simple to extend it and plug it into the existing view.

Scalable Categories
In the Discovery View feature, we dynamically hide categories or recommendations based on the results of algorithms. Categories can be hidden if no suggestions are found. On the other hand, for a large number of suggestions, only top suggestions are retrieved, and users have the ability to request more.

Graceful Failure Handling
We load Discovery View suggestions independently for a responsive user experience.

Asset Feedback MicroService

Asset Feedback MicroService

We identified that Asset Feedback is a functionality that is useful elsewhere in our ecosystem as well, so we decided to create a separate microservice for it. The service serves an important function of getting feedback about the quality of stills and ties them to the algorithms. This information is available both at individual and aggregated levels for our algorithm partners.

Media Understanding Platform

AVA Discovery View relies on the Media Understanding Platform (MUP) as the main interface for algorithm suggestions. The key features of this platform are

Uniform Query Interface

Hosting all of the algorithms in AVA Discovery View on MUP made it easier for product integration as the suggestions could be queried from each algorithm similarly

Rich Query Feature Set

We could test different confidence thresholds per algorithm, intersect across algorithm suggestions, and order suggestions by various fields.

Fast Algo Onboarding

Each algorithm took fewer than two weeks to onboard, and the platform ensured that new titles delivered to Netflix would automatically generate algorithm suggestions. Our team was able to spend more time evaluating algorithm performance and quickly iterate on AVA Discovery View.

To learn more about MUP, please see a previous blog post from our team: Building a Media Understanding Platform for ML Innovations.


Discovering authentic moments in an efficient and scalable way has a huge impact on Netflix and its creative teams. AVA has become a place to gain title insights and discover assets. It provides a concise brief on the main narratives, the visual language, and the title’s prominent characters. An AVA user can find relevant and visually stunning frames quickly and easily and leverage them as a context-gathering tool.

Future Work

To improve AVA Discovery View, our team needs to balance the number of frames returned and the quality of the suggestions so that creatives can build more trust with the feature.

Eliminating Repetition

AVA Discovery View will often put the same frame into multiple categories, which results in creatives viewing and evaluating the same frame multiple times. How can we solve for an engaging frame being a part of multiple groupings without bloating each grouping with repetition?

Improving Frame Quality

We’d like to only show creatives the best frames from a certain moment and work to eliminate frames that have either poor technical quality (a poor character expression) or poor editorial quality (not relevant to grouping, not relevant to narrative). Sifting through frames that aren’t up to quality standards creates user fatigue.

Building User Trust

Creatives don’t want to wonder whether there’s something better outside an AVA Discovery View grouping or if anything is missing from these suggested frames.

When looking at a particular grouping (like “Wednesday”’s Solving a Mystery or Gothic), creatives need to trust that it doesn’t contain any frames that don’t belong there, that these are the best quality frames, and that there are no better frames that exist in the content that isn’t included in the grouping. Suppose a creative is leveraging AVA Discovery View and doing separate manual work to improve frame quality or check for missing moments. In that case, AVA Discovery View hasn’t yet fully optimized the user experience.


Special thanks to Abhishek Soni, Amir Ziai, Andrew Johnson, Ankush Agrawal, Aneesh Vartakavi, Audra Reed, Brianda Suarez, Faraz Ahmad, Faris Mustafa, Fifi Maree, Guru Tahasildar, Gustavo Carmo, Haley Jones Phillips, Janan Barge, Karen Williams, Laura Johnson, Maria Perkovic, Meenakshi Jindal, Nagendra Kamath, Nicola Pharoah, Qiang Liu, Samuel Carvajal, Shervin Ardeshir, Supriya Vadlamani, Varun Sekhri, and Vitali Kauhanka for making it all possible.

AVA Discovery View: Surfacing Authentic Moments was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Using Machine Learning to Detect Keystrokes

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/08/using-machine-learning-to-detect-keystrokes.html

Researchers have trained a ML model to detect keystrokes by sound with 95% accuracy.

“A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards”

Abstract: With recent developments in deep learning, the ubiquity of microphones and the rise in online services via personal devices, acoustic side channel attacks present a greater threat to keyboards than ever. This paper presents a practical implementation of a state-of-the-art deep learning model in order to classify laptop keystrokes, using a smartphone integrated microphone. When trained on keystrokes recorded by a nearby phone, the classifier achieved an accuracy of 95%, the highest accuracy seen without the use of a language model. When trained on keystrokes recorded using the video-conferencing software Zoom, an accuracy of 93% was achieved, a new best for the medium. Our results prove the practicality of these side channel attacks via off-the-shelf equipment and algorithms. We discuss a series of mitigation methods to protect users against these series of attacks.

News article.

Unsupervised graph anomaly detection – Catching new fraudulent behaviours

Post Syndicated from Grab Tech original https://engineering.grab.com/graph-anomaly-model

Earlier in this series, we covered the importance of graph networks, graph concepts, graph visualisation, and graph-based fraud detection methods. In this article, we will discuss how to automatically detect new types of fraudulent behaviour and swiftly take action on them.

One of the challenges in fraud detection is that fraudsters are incentivised to always adversarially innovate their way of conducting frauds, i.e., their modus operandi (MO in short). Machine learning models trained using historical data may not be able to pick up new MOs, as they are new patterns that are not available in existing training data. To enhance Grab’s existing security defences and protect our users from these new MOs, we needed a machine learning model that is able to detect them quickly without the need for any label supervision, i.e., an unsupervised learning model rather than the regular supervised learning model.

To address this, we developed an in-house machine learning model for detecting anomalous patterns in graphs, which has led to the discovery of new fraud MOs. Our focus was initially on GrabFood and GrabMart verticals, where we monitored the interactions between consumers and merchants. We modelled these interactions as a bipartite graph (a type of graph for modelling interactions between two groups) and then performed anomaly detection on the graph. Our in-house anomaly detection model was also presented at the International Joint Conference on Neural Networks (IJCNN) 2023, a premier academic conference in the area of neural networks, machine learning, and artificial intelligence.

In this blog, we discuss the model and its application within Grab. For avid audiences that want to read the details of our model, you can access it here. Note that even though we implemented our model for anomaly detection in GrabFood and GrabMart, the model is designed for general purposes and is applicable to interaction graphs between any two groups.

Interaction-Focused Anomaly Detection on Bipartite Node-and-Edge-Attributed Graphs
By Rizal Fathony, Jenn Ng, Jia Chen
Presented at International Joint Conference on Neural Networks (IJCNN) 2023

Before we dive into how our model works, it is important to understand the process of graph construction in our application as the model assumes the availability of the graphs in a standardised format.

Graph construction 

We modelled the interactions between consumers and merchants in GrabFood and GrabMart platforms as bipartite graphs (G), where the first group of nodes (U) represents the consumers, the second group of nodes (V) represents the merchants, and the edges (E) connecting them means that the consumers have placed some food/mart orders to the merchants. The graph is also supplied with rich transactional information about the consumers and the merchants in the form of node features (Xu and Xv), as well as order information in the form of edge features (Xe).

Fig 1. Graph construction process

The goal of our anomaly model is to detect anomalous and suspicious behaviours from the consumers or merchants (node-level anomaly detection), as well as anomalous order interactions (edge-level anomaly detection). As mentioned, this detection needs to be done without any label supervision.

Model architecture

We designed our graph anomaly model as a type of autoencoder, with an encoder and two decoders – a feature decoder and a structure decoder. The key feature of our model is that it accepts a bipartite graph with both node and edge attributes as the input. This is important as both node and edge attributes encode essential information for determining if certain behaviours are suspicious. Many previous works on graph anomaly detection only support node attributes. In addition, our model can produce both node and edge level anomaly scores, unlike most of the previous works that produce node-level scores only. We named our model GraphBEAN, which is short for Bipartite Node-and-Edge-Attributed Networks.

From the input, the encoder then processes the attributed bipartite graph into a series of graph convolution layers to produce latent representations for both node groups. Our graph convolution layers produce new representations for each node in both node groups (U and V), as well as for each edge in the graph. Note that the last convolution layer in the encoder only produces the latent representations for nodes, without producing edge representations. The reason for this design is that we only put the latent representations for the active actors, the nodes representing consumers and merchants, but not their interactions.

Fig 2. GraphBEAN architecture

From the nodes’ latent representations, the feature decoder is tasked to reconstruct the original graph with both node and edge attributes via a series of graph convolution layers. As the graph structure is provided by the feature decoder, we task the structure decoder to learn the graph structure by predicting if there exists an edge connecting two nodes. This edge prediction, as well as the graph reconstructed by the feature decoder, are then compared to the original input graph via a reconstruction loss function.

The model is then trained using the bipartite graph constructed from GrabFood and GrabMart transactions. We use a reconstruction-based loss function as the training objective of the model. After the training is completed, we compute the anomaly score of each node and edge in the graph using the trained model.

Anomaly score computation

Our anomaly scores are reconstruction-based. The score design assumes that normal behaviours are common in the dataset and thus, can be easily reconstructed by the model. On the other hand, anomalous behaviours are rare. Therefore the model will have a hard time reconstructing them, hence producing high errors.

Fig 3. Edge-level and node-level anomaly scores computation

The model produces two types of anomaly scores. First, the edge-level anomaly scores, which are calculated from the edge reconstruction error. Second, the node-level anomaly scores, which are calculated from node reconstruction error plus an aggregate over the edge scores from the edges connected to the node. This aggregate could be a mean or max aggregate.

Actioning system

In our implementation of GraphBEAN within Grab, we designed a full pipeline of anomaly detection and actioning systems. It is a fully-automated system for constructing a bipartite graph from GrabFood and GrabMart transactions, training a GraphBEAN model using the graph, and computing anomaly scores. After computing anomaly scores for all consumers and merchants (node-level), as well as all of their interactions (edge-level), it automatically passes the scores to our actioning system. But before that, it also passes them through a system we call fraud type tagger. This is also a fully-automated heuristic-based system that tags some of the detected anomalies with some fraud tags. The purpose of this tagging is to provide some context in general, like the types of detected anomalies. Some examples of these tags are promo abuse or possible collusion.

Fig 4. Pipeline in our actioning system

Both the anomaly scores and the fraud type tags are then forwarded to our actioning system. The system consists of two subsystems:

  • Human expert actioning system: Our fraud experts analyse the detected anomalies and perform certain actioning on them, like suspending certain transaction features from suspicious merchants.
  • Automatic actioning system: Combines the anomaly scores and fraud type tags with other external signals to automatically do actioning on the detected anomalies, like preventing promos from being used by fraudsters or preventing fraudulent transactions from occurring. These actions vary depending on the type of fraud and the scores.

What’s next?

The GraphBEAN model enables the detection of suspicious behaviour on graph data without the need for label supervision. By implementing the model on GrabFood and GrabMart platforms, we learnt that having such a system enables us to quickly identify new types of fraudulent behaviours and then swiftly perform action on them. This also allows us to enhance Grab’s defence against fraudulent activity and actively protect our users.

We are currently working on extending the model into more generic heterogeneous (multi-entity) graphs. In addition, we are also working on implementing it to more use cases within Grab.

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

Indirect Instruction Injection in Multi-Modal LLMs

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/07/indirect-instruction-injection-in-multi-modal-llms.html

Interesting research: “(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs“:

Abstract: We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker’s instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.

Building Generative AI into Marketing Strategies: A Primer

Post Syndicated from nnatri original https://aws.amazon.com/blogs/messaging-and-targeting/building-generative-ai-into-marketing-strategies-a-primer/


Artificial Intelligence has undoubtedly shaped many industries and is poised to be one of the most transformative technologies in the 21st century. Among these is the field of marketing where the application of generative AI promises to transform the landscape. This blog post explores how generative AI can revolutionize marketing strategies, offering innovative solutions and opportunities.

According to Harvard Business Review, marketing’s core activities, such as understanding customer needs, matching them to products and services, and persuading people to buy, can be dramatically enhanced by AI. A 2018 McKinsey analysis of more than 400 advanced use cases showed that marketing was the domain where AI would contribute the greatest value. The ability to leverage AI can not only help automate and streamline processes but also deliver personalized, engaging content to customers. It enhances the ability of marketers to target the right audience, predict consumer behavior, and provide personalized customer experiences. AI allows marketers to process and interpret massive amounts of data, converting it into actionable insights and strategies, thereby redefining the way businesses interact with customers.

Generating content is just one part of the equation. AI-generated content, no matter how good, is useless if it does not arrive at the intended audience at the right point of time. Integrating the generated content into an automated marketing pipeline that not only understands the customer profile but also delivers a personalized experience at the right point of interaction is also crucial to getting the intended action from the customer.

Amazon Web Services (AWS) provides a robust platform for implementing generative AI in marketing strategies. AWS offers a range of AI and machine learning services that can be leveraged for various marketing use cases, from content creation to customer segmentation and personalized recommendations. Two services that are instrumental to delivering customer contents and can be easily integrated with other generative AI services are Amazon Pinpoint and Amazon Simple Email Service. By integrating generative AI with Amazon Pinpoint and Amazon SES, marketers can automate the creation of personalized messages for their customers, enhancing the effectiveness of their campaigns. This combination allows for a seamless blend of AI-powered content generation and targeted, data-driven customer engagement.

As we delve deeper into this blog post, we’ll explore the mechanics of generative AI, its benefits and how AWS services can facilitate its integration into marketing communications.

What is Generative AI?

Generative AI is a subset of artificial intelligence that leverages machine learning techniques to generate new data instances that resemble your training data. It works by learning the underlying patterns and structures of the input data, and then uses this understanding to generate new, similar data. This is achieved through the use of models like Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Transformer models.

What do Generative AI buzzwords mean?

In the world of AI, buzzwords are abundant. Terms like “deep learning”, “neural networks”, “machine learning”, “generative AI”, and “large language models” are often used interchangeably, but they each have distinct meanings. Understanding these terms is crucial for appreciating the capabilities and limitations of different AI technologies.

Machine Learning (ML) is a subset of AI that involves the development of algorithms that allow computers to learn from and make decisions or predictions based on data. These algorithms can be ‘trained’ on a dataset and then used to predict or classify new data. Machine learning models can be broadly categorized into supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

Deep Learning is a subset of machine learning that uses neural networks with many layers (hence “deep”) to model and understand complex patterns. These layers of neurons process different features, and their outputs are combined to produce a final result. Deep learning models can handle large amounts of data and are particularly good at processing images, speech, and text.

Generative AI refers specifically to AI models that can generate new data that mimic the data they were trained on. This is achieved through the use of models like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). Generative AI can create anything from written content to visual designs, and even music, making it a versatile tool in the hands of marketers.

Large Language Models (LLMs) are a type of generative AI that are trained on a large corpus of text data and can generate human-like text. They predict the probability of a word given the previous words used in the text. They are particularly useful in applications like text completion, translation, summarization, and more. While they are a type of generative AI, they are specifically designed for handling text data.

Simply put, you can understand that Large Language Model is a subset of Generative AI, which is then a subset of Machine Learning and they ultimately falls under the umbrella term of Artificial Intelligence.

What are the problems with generative AI and marketing?

While generative AI holds immense potential for transforming marketing strategies, it’s important to be aware of its limitations and potential pitfalls, especially when it comes to content generation and customer engagement. Here are some common challenges that marketers should be aware of:

Bias in Generative AI Generative AI models learn from the data they are trained on. If the training data is biased, the AI model will likely reproduce these biases in its output. For example, if a model is trained primarily on data from one demographic, it may not accurately represent other demographics, leading to marketing campaigns that are ineffective or offensive. Imagine if you are trying to generate an image for a campaign targeting females, a generative AI model might not generate images of females in jobs like doctors, lawyers or judges, leading your campaign to suffer from bias and uninclusiveness.

Insensitivity to Cultural Nuances Generative AI models may not fully understand cultural nuances or sensitive topics, which can lead to content that is insensitive or even harmful. For instance, a generative AI model used to create social media posts for a global brand may inadvertently generate content that is seen as disrespectful or offensive by certain cultures or communities.

Potential for Inappropriate or Offensive Content Generative AI models can sometimes generate content that is inappropriate or offensive. This is often because the models do not fully understand the context in which certain words or phrases should be used. It’s important to have safeguards in place to review and approve content before it’s published. A common problem with LLMs is hallucination: whereby the model speaks false knowledge as if it is accurate. A marketing team might mistakenly publish a auto-generated promotional content that contains a 20% discount on an item when no such promotions were approved. This could have disastrous effect if safeguards are not in place and erodes customers’ trust.

Intellectual Property and Legal Concerns Generative AI models can create new content, such as images, music, videos, and text, which raises questions of ownership and potential copyright infringement. Being a relatively new field, legal discussions are still ongoing to discuss legal implications of using Generative AI, e.g. who should own generated AI content, and copyright infringement.

Not a Replacement for Human Creativity Finally, while generative AI can automate certain aspects of marketing campaigns, it cannot replace the creativity or emotional connections that marketers use in crafting compelling campaigns. The most successful marketing campaigns touch the hearts of the customers, and while Generative AI is very capable of replicating human content, it still lacks in mimicking that “human touch”.

In conclusion, while generative AI offers exciting possibilities for marketing, it’s important to approach its use with a clear understanding of its limitations and potential pitfalls. By doing so, marketers can leverage the benefits of generative AI while mitigating risks.

How can I use generative AI in marketing communications?

Amazon Web Services (AWS) provides a comprehensive suite of services that facilitate the use of generative AI in marketing. These services are designed to handle a variety of tasks, from data processing and storage to machine learning and analytics, making it easier for marketers to implement and benefit from generative AI technologies.

Overview of Relevant AWS Services

AWS offers several services that are particularly relevant for generative AI in marketing:

  • Amazon Bedrock: This service makes FMs accessible via an API. Bedrock offers the ability to access a range of powerful FMs for text and images, including Amazon’s Titan FMs. With Bedrock’s serverless experience, customers can easily find the right model for what they’re trying to get done, get started quickly, privately customize FMs with their own data, and easily integrate and deploy them into their applications using the AWS tools and capabilities they are familiar with.
  • Amazon Titan Models: These are two new large language models (LLMs) that AWS is announcing. The first is a generative LLM for tasks such as summarization, text generation, classification, open-ended Q&A, and information extraction. The second is an embeddings LLM that translates text inputs into numerical representations (known as embeddings) that contain the semantic meaning of the text. In response to the pitfalls mentioned above around Generative AI hallucinations and inaccurate information, AWS is actively working on improving accuracy and ensuring its Titan models produce high-quality responses, said Bratin Saha, an AWS vice president.
  • Amazon SageMaker: This fully managed service enables data scientists and developers to build, train, and deploy machine learning models quickly. SageMaker includes modules that can be used for generative AI, such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs).
  • Amazon Pinpoint: This flexible and scalable outbound and inbound marketing communications service enables businesses to engage with customers across multiple messaging channels. Amazon Pinpoint is designed to scale with your business, allowing you to send messages to a large number of users in a short amount of time. It integrates with AWS’s generative AI services to enable personalized, AI-driven marketing campaigns.
  • Amazon Simple Email Service (SES): This cost-effective, flexible, and scalable email service enables marketers to send transactional emails, marketing messages, and other types of high-quality content to their customers. SES integrates with other AWS services, making it easy to send emails from applications being hosted on services such as Amazon EC2. SES also works seamlessly with Amazon Pinpoint, allowing for the creation of customer engagement communications that drive user activity and engagement.

How to build Generative AI into marketing communications

Dynamic Audience Targeting and Segmentation: Generative AI can help marketers to dynamically target and segment their audience. It can analyze customer data and behavior to identify patterns and trends, which can then be used to create more targeted marketing campaigns. Using Amazon Sagemaker or the soon-to-be-available Amazon Bedrock and Amazon Titan Models, Generative AI can suggest labels for customers based on unstructured data. According to McKinsey, generative AI can analyze data and identify consumer behavior patterns to help marketers create appealing content that resonates with their audience.

Personalized Marketing: Generative AI can be used to automate the creation of marketing content. This includes generating text for blogs, social media posts, and emails, as well as creating images and videos. This can save marketers a significant amount of time and effort, allowing them to focus on other aspects of their marketing strategy. Where it really shines is the ability to productionize marketing content creation, reducing the needs for marketers to create multiple copies for different customer segments. Previously, marketers would need to generate many different copies for each granularity of customers (e.g. attriting customers who are between the age of 25-34 and loves food). Generative AI can automate this process, providing the opportunities to dynamically create these contents programmatically and automatically send out to the most relevant segments via Amazon Pinpoint or Amazon SES.

Marketing Automation: Generative AI can automate various aspects of marketing, such as email marketing, social media marketing, and search engine marketing. This includes automating the creation and distribution of marketing content, as well as analyzing the performance of marketing campaigns. Amazon Pinpoint currently automates customer communications using journeys which is a customized, multi-step engagement experience. Generative AI could create a Pinpoint journey based on customer engagement data, engagement parameters and a prompt. This enables GenAI to not only personalize the content but create a personalized omnichannel experience that can extend throughout a period of time. It then becomes possible that journeys are created dynamically by generative AI and A/B tested on the fly to achieve an optimal pre-defined Key Performance Indicator (KPI).

A Sample Generative AI Use Case in Marketing Communications

AWS services are designed to work together, making it easy to implement generative AI in your marketing strategies. For instance, you can use Amazon SageMaker to build and train your generative AI models which assist with automating marketing content creation, and Amazon Pinpoint or Amazon SES to deliver the content to your customers.

Companies using AWS can theoretically supplement their existing workloads with generative AI capabilities without the needs for migration. The following reference architecture outlines a sample use case and showcases how Generative AI can be integrated into your customer journeys built on the AWS cloud. An e-commerce company can potentially receive many complaints emails a day. Companies spend a lot of money to acquire customers, it’s therefore important to think about how to turn that negative experience into a positive one.


When an email is received via Amazon SES (1), its content can be passed through to generative AI models using GANs to help with sentiment analysis (2). An article published by Amazon Science utilizes GANs for sentiment analysis for cases where a lack of data is a problem. Alternatively, one can also use Amazon Comprehend at this step and run A/B tests between the two models. The limitations with Amazon Comprehend would be the limited customizations you can perform to the model to fit your business needs.

Once the email’s sentiment is determined, the sentiment event is logged into Pinpoint (3), which then triggers an automatic winback journey (4).

Generative AI (e.g. HuggingFace’s Bloom Text Generation Models) can again be used here to dynamically create the content without needing to wait for the marketer’s input (5). Whereas marketers would need to generate many different copies for each granularity of customers (e.g. attriting customers who are between the age of 25-34 and loves food), generative AI provides the opportunities to dynamically create these contents on the fly given the above inputs.

Once the campaign content has been generated, the model pumps the template backs into Amazon Pinpoint (6), which then sends the personalized copy to the customer (7).

Result: Another customer is saved from attrition!


The landscape of generative AI is vast and ever-evolving, offering a plethora of opportunities for marketers to enhance their strategies and deliver more personalized, engaging content. AWS plays a pivotal role in this landscape, providing a comprehensive suite of services that facilitate the implementation of generative AI in marketing. From building and training AI models with Amazon SageMaker to delivering personalized messages with Amazon Pinpoint and Amazon SES, AWS provides the tools and infrastructure needed to harness the power of generative AI.

The potential of generative AI in relation to the marketer is immense. It offers the ability to automate content creation, personalize customer interactions, and derive valuable insights from data, among other benefits. However, it’s important to remember that while generative AI can automate certain aspects of marketing, it is not a replacement for human creativity and intuition. Instead, it should be viewed as a tool that can augment human capabilities and free up time for marketers to focus on strategy and creative direction.

Get started with Generative AI in marketing communications

As we conclude this exploration of generative AI and its applications in marketing, we encourage you to:

  • Brainstorm potential Generative AI use cases for your business. Consider how you can leverage generative AI to enhance your marketing strategies. This could involve automating content creation, personalizing customer interactions, or deriving insights from data.
  • Start leveraging generative AI in your marketing strategies with AWS today. AWS provides a comprehensive suite of services that make it easy to implement generative AI in your marketing strategies. By integrating these services into your workflows, you can enhance personalization, improve customer engagement, and drive better results from your campaigns.
  • Watch out for the next part in the series of integrating Generative AI into Amazon Pinpoint and SES. We will delve deeper into how you can leverage Amazon Pinpoint and SES together with generative AI to enhance your marketing campaigns. Stay tuned!

The journey into the world of generative AI is just beginning. As technology continues to evolve, so too will the opportunities for marketers to leverage AI to enhance their strategies and deliver more personalized, engaging content. We look forward to exploring this exciting frontier with you.

About the Author

Tristan (Tri) Nguyen

Tristan (Tri) Nguyen

Tristan (Tri) Nguyen is an Amazon Pinpoint and Amazon Simple Email Service Specialist Solutions Architect at AWS. At work, he specializes in technical implementation of communications services in enterprise systems and architecture/solutions design. In his spare time, he enjoys chess, rock climbing, hiking and triathlon.

Introducing the latest Machine Learning Lens for the AWS Well-Architected Framework

Post Syndicated from Raju Patil original https://aws.amazon.com/blogs/architecture/introducing-the-latest-machine-learning-lens-for-the-aws-well-architected-framework/

Today, we are delighted to introduce the latest version of the AWS Well-Architected Machine Learning (ML) Lens whitepaper. The AWS Well-Architected Framework provides architectural best practices for designing and operating ML workloads on AWS. It is based on six pillars: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and—a new addition to this revision—Sustainability. The ML Lens uses the Well-Architected Framework to outline the steps for performing an AWS Well-Architected review for your ML implementations.

The ML Lens provides a consistent approach for customers to evaluate ML architectures, implement scalable designs, and identify and mitigate technical risks. It covers common ML implementation scenarios and identifies key workload elements to allow you to architect your cloud-based applications and workloads according to the AWS best practices that we have gathered from supporting thousands of customer implementations.

The new ML Lens joins a collection of Well-Architected lenses that focus on specialized workloads such as the Internet of Things (IoT), games, SAP, financial services, and SaaS technologies. You can find more information in AWS Well-Architected Lenses.

What is the Machine Learning Lens?

Let’s explore the ML Lens across ML lifecycle phases, as the following figure depicts.

Machine Learning Lens

Figure 1. Machine Learning Lens

The Well-Architected ML Lens whitepaper focuses on the six pillars of the Well-Architected Framework across six phases of the ML lifecycle. The six phases are:

  1. Defining your business goal
  2. Framing your ML problem
  3. Preparing your data sources
  4. Building your ML model
  5. Entering your deployment phase
  6. Establishing the monitoring of your ML workload

Unlike the traditional waterfall approach, an iterative approach is required to achieve a working prototype based on the six phases of the ML lifecycle. The whitepaper provides you with a set of established cloud-agnostic best practices in the form of Well-Architected Pillars for each ML lifecycle phase. You can also use the Well-Architected ML Lens wherever you are on your cloud journey. You can choose either to apply this guidance during the design of your ML workloads, or after your workloads have entered production as a part of the continuous improvement process.

What’s new in the Machine Learning Lens?

  1. Sustainability Pillar: As building and running ML workloads becomes more complex and consumes more compute power, refining compute utilities and assessing your carbon footprint from these workloads grows to critical importance. The new pillar focuses on long-term environmental sustainability and presents design principles that can help you build ML architectures that maximize efficiency and reduce waste.
  2. Improved best practices and implementation guidance: Notably, enhanced guidance to identify and measure how ML will bring business value against ML operational cost to determine the return on investment (ROI).
  3. Updated guidance on new features and services: A set of updated ML features and services announced to-date have been incorporated into the ML Lens whitepaper. New additions include, but are not limited to, the ML governance features, the model hosting features, and the data preparation features. These and other improvements will make it easier for your development team to create a well-architected ML workloads in your enterprise.
  4. Updated links: Many documents, blogs, instructional and video links have been updated to reflect a host of new products, features, and current industry best practices to assist your ML development.

Who should use the Machine Learning Lens?

The Machine Learning Lens is of use to many roles, including:

  • Business leaders for a broader appreciation of the end-to-end implementation and benefits of ML
  • Data scientists to understand how the critical modeling aspects of ML fit in a wider context
  • Data engineers to help you use your enterprise’s data assets to their greatest potential through ML
  • ML engineers to implement ML prototypes into production workloads reliably, securely, and at scale
  • MLOps engineers to build and manage ML operation pipelines for faster time to market
  • Risk and compliance leaders to understand how the ML can be implemented responsibly by providing compliance with regulatory and governance requirements

Machine Learning Lens components

The Lens includes four focus areas:

1. The Well-Architected Machine Learning Design Principles

A set of best practices that are used as the basis for developing a Well-Architected ML workload.

2. The Machine Learning Lifecycle and the Well Architected Framework Pillars

This considers all aspects of the Machine Learning Lifecycle and reviews design strategies to align to pillars of the overall Well-Architected Framework.

  • The Machine Learning Lifecycle phases referenced in the ML Lens include:
    • Business goal identification – identification and prioritization of the business problem to be addressed, along with identifying the people, process, and technology changes that may be required to measure and deliver business value.
    • ML problem framing – translating the business problem into an analytical framing, i.e., characterizing the problem as an ML task, such as classification, regression, or clustering, and identifying the technical success metrics for the ML model.
    • Data processing – garnering and integrating datasets, along with necessary data transformations needed to produce a rich set of features.
    • Model development – iteratively training and tuning your model, and evaluating candidate solutions in terms of the success metrics as well as including wider considerations such as bias and explainability.
    • Model deployment – establishing the mechanism to flow data though the model in a production setting to make inferences based on production data.
    • Model monitoring – tracking the performance of the production model and the characteristics of the data used for inference.
  • The Well-Architected Framework Pillars are:
    • Operational Excellence – ability to support ongoing development, run operational workloads effectively, gain insight into your operations, and continuously improve supporting processes and procedures to deliver business value.
    • Security – ability to protect data, systems, and assets, and to take advantage of cloud technologies to improve your security.
    • Reliability – ability of a workload to perform its intended function correctly and consistently, and to automatically recover from failure situations.
    • Performance Efficiency – ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as system demand changes and technologies evolve.
    • Cost Optimization – ability to run systems to deliver business value at the lowest price point.
    • Sustainability – addresses the long-term environmental, economic, and societal impact of your business activities.

3. Cloud-agnostic best practices

These are best practices for each ML lifecycle phase across the Well-Architected Framework pillars irrespective of your technology setting. The best practices are accompanied by:

  • Implementation guidance – the AWS implementation plans for each best practice with references to AWS technologies and resources.
  • Resources – a set of links to AWS documents, blogs, videos, and code examples as supporting resources to the best practices and their implementation plans.

4. Indicative ML Lifecycle architecture diagrams to illustrate processes, technologies, and components that support many of these best practices.

What are the next steps?

The new Well-Architected Machine Learning Lens whitepaper is available now. Use the Lens whitepaper to determine that your ML workloads are architected with operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability in mind.

If you require support on the implementation or assessment of your Machine Learning workloads, please contact your AWS Solutions Architect or Account Representative.

Special thanks to everyone across the AWS Solution Architecture, AWS Professional Services, and Machine Learning communities, who contributed to the Lens. These contributions encompassed diverse perspectives, expertise, backgrounds, and experiences in developing the new AWS Well-Architected Machine Learning Lens.

Globally distributed AI and a Constellation update

Post Syndicated from Rita Kozlov original http://blog.cloudflare.com/globally-distributed-ai-and-a-constellation-update/

Globally distributed AI and a Constellation update

Globally distributed AI and a Constellation update

During Cloudflare's 2023 Developer Week, we announced Constellation, a set of APIs that allow everyone to run fast, low-latency inference tasks using pre-trained machine learning/AI models, directly on Cloudflare’s network.

Constellation update

We now have a few thousand accounts onboarded in the Constellation private beta and have been listening to our customer's feedback to evolve and improve the platform. Today, one month after the announcement, we are upgrading Constellation with three new features:

Bigger models
We are increasing the size limit of your models from 10 MB to 50 MB. While still somewhat conservative during the private beta, this new limit opens doors to more pre-trained and optimized models you can use with Constellation.

Tensor caching
When you run a Constellation inference task, you pass multiple tensor objects as inputs, sometimes creating big data payloads. These inputs travel through the wire protocol back and forth when you repeat the same task, even when the input changes from multiple runs are minimal, creating unnecessary network and data parsing overhead.

The client API now supports caching input tensors resulting in even better network latency and faster inference times.

XGBoost runtime
Constellation started with the ONNX runtime, but our vision is to support multiple runtimes under a common API. Today we're adding the XGBoost runtime to the list.

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable, and it's known for its performance in structured and tabular data tasks.

You can start uploading and using XGBoost models today.

Globally distributed AI and a Constellation update

You can find the updated documentation with these new features and an example on how to use the XGBoost runtime with Constellation in our Developers Documentation.

An era of globally distributed AI

Since Cloudflare’s network is globally distributed, Constellation is our first public release of globally distributed machine learning.

But what does this mean? You may not think of a global network as the place to deploy your machine learning tasks, but machine learning has been a core part of what’s enabled much of Cloudflare’s core functionality for many years. And we run it across our global network in 300 cities.

Is this large spike in traffic an attack or a Black Friday sale? What’s going to be the best way to route this request based on current traffic patterns? Is this request coming from a human or a bot? Is this HTTP traffic a zero-day? Being able to answer these questions using automated machine learning and AI, rather than human intervention, is one of the things that’s enabled Cloudflare to scale.

But this is just a small sample of what globally distributed machine learning enables. The reason this was so helpful for us was because we were able to run this machine learning as an integrated part of our stack, which is why we’re now in the process of opening it up to more and more developers with Constellation.

As Michelle Zatlyn, our co-founder likes to say, we’re just getting started (in this space) — every day we’re adding hundreds of new users to our Constellation beta, testing out and globally deploying new models, and beyond that, deploying new hardware to support the new types of workloads that AI will bring to the our global network.

With that, we wanted to share a few announcements and some use cases that help illustrate why we’re so excited about globally distributed AI. And since it’s Speed Week, it should be no surprise that, well, speed is at the crux of it all.

Custom tailored web experiences, powered by AI

We’ve long known about the importance of performance when it comes to web experiences — in e-commerce, every second of page load time can have as much as a 7% drop off effect on conversion. But being fast is not enough. It’s necessary, but not sufficient. You also have to be accurate.

That is, rather than serving one-size-fits-all experiences, users have come to expect that you know what they want before they do.

So you have to serve personalized experiences, and you have to do it fast. That’s where Constellation can come into play. With Constellation, as a part of your e-commerce application that may already be served from Cloudflare’s network through Workers or Pages, or even store data in D1, you can now perform tasks such as categorization (what demographic is this customer most likely in?) and personalization (if you bought this, you may also like that).

Making devices smarter wherever they are

Another use case where performance is critical is in interacting with the real world. Imagine a face recognition system that detects whether you’re human or not every time you go into your house. Every second of latency makes a difference (especially if you’re holding heavy groceries).

Running inference on Cloudflare’s network, means that within 95% of the world’s population, compute, and thus a decision, is never going to be more than 50ms away. This is in huge contrast to centralized compute, where if you live in Europe, but bought a doorbell system from a US-based company, may be up to hundreds of milliseconds round trip away.

You may be thinking, why not just run the compute on the device then?

For starters, running inference on the device doesn’t guarantee fast performance. Most devices with built in intelligence are run on microcontrollers, often with limited computational abilities (not a high-end GPU or server-grade CPU). Milliseconds become seconds; depending on the volume of workloads you need to process, the local inference might not be suitable. The compute that can be fit on devices is simply not powerful enough for high-volume complex operations, certainly not for operating at low-latency.

But even user experience aside (some devices don’t interface with a user directly), there are other downsides to running compute directly on devices.

The first is battery life — the longer the compute, the shorter the battery life. There's always a power consumption hit, even if you have a custom ASIC chip or a Tensor Processing Unit (TPU), meaning shorter battery life if that's one of your constraints. For consumer products, this means having to switch out your doorbell battery (lest you get locked out). For operating fleets of devices at scale (imagine watering devices in a field) this means costs of keeping up with, and swapping out batteries.

Lastly, device hardware, and even software, is harder to update. As new technologies or more efficient chips become available, upgrading fleets of hundreds or thousands of devices is challenging. And while software updates may be easier to manage, they’ll never be as easy as updating on-cloud software, where you can effortlessly ship updates multiple times a day!

Speaking of shipping software…

AI applications, easier than ever with Constellation

Speed Week is not just about making your applications or devices faster, but also your team!

For the past six years, our developer platform has been making it easy for developers to ship new code with Cloudflare Workers. With Constellation, it’s now just as easy to add Machine Learning to your existing application, with just a few commands.

And if you don’t believe us, don’t just take our word for it. We’re now in the process of opening up the beta to more and more customers. To request access, head on over to the Cloudflare Dashboard where you’ll see a new tab for Constellation. We encourage you to check out our tutorial for getting started with Constellation — this AI thing may be even easier than you expected it to be!

We’re just getting started

This is just the beginning of our journey for helping developers build AI driven applications, and we’re already thinking about what’s next.

We look forward to seeing what you build, and hearing your feedback.

Detecting Scene Changes in Audiovisual Content

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/detecting-scene-changes-in-audiovisual-content-77a61d3eaad6

Avneesh Saluja, Andy Yao, Hossein Taghavi


When watching a movie or an episode of a TV show, we experience a cohesive narrative that unfolds before us, often without giving much thought to the underlying structure that makes it all possible. However, movies and episodes are not atomic units, but rather composed of smaller elements such as frames, shots, scenes, sequences, and acts. Understanding these elements and how they relate to each other is crucial for tasks such as video summarization and highlights detection, content-based video retrieval, dubbing quality assessment, and video editing. At Netflix, such workflows are performed hundreds of times a day by many teams around the world, so investing in algorithmically-assisted tooling around content understanding can reap outsized rewards.

While segmentation of more granular units like frames and shot boundaries is either trivial or can primarily rely on pixel-based information, higher order segmentation¹ requires a more nuanced understanding of the content, such as the narrative or emotional arcs. Furthermore, some cues can be better inferred from modalities other than the video, e.g. the screenplay or the audio and dialogue track. Scene boundary detection, in particular, is the task of identifying the transitions between scenes, where a scene is defined as a continuous sequence of shots that take place in the same time and location (often with a relatively static set of characters) and share a common action or theme.

In this blog post, we present two complementary approaches to scene boundary detection in audiovisual content. The first method, which can be seen as a form of weak supervision, leverages auxiliary data in the form of a screenplay by aligning screenplay text with timed text (closed captions, audio descriptions) and assigning timestamps to the screenplay’s scene headers (a.k.a. sluglines). In the second approach, we show that a relatively simple, supervised sequential model (bidirectional LSTM or GRU) that uses rich, pretrained shot-level embeddings can outperform the current state-of-the-art baselines on our internal benchmarks.

Figure 1: a scene consists of a sequence of shots.

Leveraging Aligned Screenplay Information

Screenplays are the blueprints of a movie or show. They are formatted in a specific way, with each scene beginning with a scene header, indicating attributes such as the location and time of day. This consistent formatting makes it possible to parse screenplays into a structured format. At the same time, a) changes made on the fly (directorial or actor discretion) or b) in post production and editing are rarely reflected in the screenplay, i.e. it isn’t rewritten to reflect the changes.

Figure 2: screenplay elements, from The Witcher S1E1.

In order to leverage this noisily aligned data source, we need to align time-stamped text (e.g. closed captions and audio descriptions) with screenplay text (dialogue and action² lines), bearing in mind a) the on-the-fly changes that might result in semantically similar but not identical line pairs and b) the possible post-shoot changes that are more significant (reordering, removing, or inserting entire scenes). To address the first challenge, we use pre trained sentence-level embeddings, e.g. from an embedding model optimized for paraphrase identification, to represent text in both sources. For the second challenge, we use dynamic time warping (DTW), a method for measuring the similarity between two sequences that may vary in time or speed. While DTW assumes a monotonicity condition on the alignments³ which is frequently violated in practice, it is robust enough to recover from local misalignments and the vast majority of salient events (like scene boundaries) are well-aligned.

As a result of DTW, the scene headers have timestamps that can indicate possible scene boundaries in the video. The alignments can also be used to e.g., augment audiovisual ML models with screenplay information like scene-level embeddings, or transfer labels assigned to audiovisual content to train screenplay prediction models.

Figure 3: alignments between screenplay and video via time stamped text for The Witcher S1E1.

A Multimodal Sequential Model

The alignment method above is a great way to get up and running with the scene change task since it combines easy-to-use pretrained embeddings with a well-known dynamic programming technique. However, it presupposes the availability of high-quality screenplays. A complementary approach (which in fact, can use the above alignments as a feature) that we present next is to train a sequence model on annotated scene change data. Certain workflows in Netflix capture this information, and that is our primary data source; publicly-released datasets are also available.

From an architectural perspective, the model is relatively simple — a bidirectional GRU (biGRU) that ingests shot representations at each step and predicts if a shot is at the end of a scene.⁴ The richness in the model comes from these pretrained, multimodal shot embeddings, a preferable design choice in our setting given the difficulty in obtaining labeled scene change data and the relatively larger scale at which we can pretrain various embedding models for shots.

For video embeddings, we leverage an in-house model pretrained on aligned video clips paired with text (the aforementioned “timestamped text”). For audio embeddings, we first perform source separation to try and separate foreground (speech) from background (music, sound effects, noise), embed each separated waveform separately using wav2vec2, and then concatenate the results. Both early and late-stage fusion approaches are explored; in the former (Figure 4a), the audio and video embeddings are concatenated and fed into a single biGRU, and in the latter (Figure 4b) each input modality is encoded with its own biGRU, after which the hidden states are concatenated prior to the output layer.

Figure 4a: Early Fusion (concatenate embeddings at the input).
Figure 4b: Late Fusion (concatenate prior to prediction output).

We find:

  • Our results match and sometimes even outperform the state-of-the-art (benchmarked using the video modality only and on our evaluation data). We evaluate the outputs using F-1 score for the positive label, and also relax this evaluation to consider “off-by-n” F-1 i.e., if the model predicts scene changes within n shots of the ground truth. This is a more realistic measure for our use cases due to the human-in-the-loop setting that these models are deployed in.
  • As with previous work, adding audio features improves results by 10–15%. A primary driver of variation in performance is late vs. early fusion.
  • Late fusion is consistently 3–7% better than early fusion. Intuitively, this result makes sense — the temporal dependencies between shots is likely modality-specific and should be encoded separately.


We have presented two complementary approaches to scene boundary detection that leverage a variety of available modalities — screenplay, audio, and video. Logically, the next steps are to a) combine these approaches and use screenplay features in a unified model and b) generalize the outputs across multiple shot-level inference tasks, e.g. shot type classification and memorable moments identification, as we hypothesize that this path would be useful for training general purpose video understanding models of longer-form content. Longer-form content also contains more complex narrative structure, and we envision this work as the first in a series of projects that aim to better integrate narrative understanding in our multimodal machine learning models.

Special thanks to Amir Ziai, Anna Pulido, and Angie Pollema.


  1. Sometimes referred to as boundary detection to avoid confusion with image segmentation techniques.
  2. Descriptive (non-dialogue) lines that describe the salient aspects of a scene.
  3. For two sources X and Y, if a) shot a in source X is aligned to shot b in source Y, b) shot c in source X is aligned to shot d in source Y, and c) shot c comes after shot a in X, then d) shot d has to come after shot b in Y.
  4. We experiment with adding a Conditional Random Field (CRF) layer on top to enforce some notion of global consistency, but found it did not improve the results noticeably.

Detecting Scene Changes in Audiovisual Content was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Every request, every microsecond: scalable machine learning at Cloudflare

Post Syndicated from Alex Bocharov original http://blog.cloudflare.com/scalable-machine-learning-at-cloudflare/

Every request, every microsecond: scalable machine learning at Cloudflare

Every request, every microsecond: scalable machine learning at Cloudflare

In this post, we will take you through the advancements we've made in our machine learning capabilities. We'll describe the technical strategies that have enabled us to expand the number of machine learning features and models, all while substantially reducing the processing time for each HTTP request on our network. Let's begin.


For a comprehensive understanding of our evolved approach, it's important to grasp the context within which our machine learning detections operate. Cloudflare, on average, serves over 46 million HTTP requests per second, surging to more than 63 million requests per second during peak times.

Machine learning detection plays a crucial role in ensuring the security and integrity of this vast network. In fact, it classifies the largest volume of requests among all our detection mechanisms, providing the final Bot Score decision for over 72% of all HTTP requests. Going beyond, we run several machine learning models in shadow mode for every HTTP request.

At the heart of our machine learning infrastructure lies our reliable ally, CatBoost. It enables ultra low-latency model inference and ensures high-quality predictions to detect novel threats such as stopping bots targeting our customers' mobile apps. However, it's worth noting that machine learning model inference is just one component of the overall latency equation. Other critical components include machine learning feature extraction and preparation. In our quest for optimal performance, we've continuously optimized each aspect contributing to the overall latency of our system.

Initially, our machine learning models relied on single-request features, such as presence or value of certain headers. However, given the ease of spoofing these attributes, we evolved our approach. We turned to inter-request features that leverage aggregated information across multiple dimensions of a request in a sliding time window. For example, we now consider factors like the number of unique user agents associated with certain request attributes.

The extraction and preparation of inter-request features were handled by Gagarin, a Go-based feature serving platform we developed. As a request arrived at Cloudflare, we extracted dimension keys from the request attributes. We then looked up the corresponding machine learning features in the multi-layered cache. If the desired machine learning features were not found in the cache, a memcached "get" request was made to Gagarin to fetch those. Then machine learning features were plugged into CatBoost models to produce detections, which were then surfaced to the customers via Firewall and Workers fields and internally through our logging pipeline to ClickHouse. This allowed our data scientists to run further experiments, producing more features and models.

Every request, every microsecond: scalable machine learning at Cloudflare
Previous system design for serving machine learning features over Unix socket using Gagarin.

Initially, Gagarin exhibited decent latency, with a median latency around 200 microseconds to serve all machine learning features for given keys. However, as our system evolved and we introduced more features and dimension keys, coupled with increased traffic, the cache hit ratio began to wane. The median latency had increased to 500 microseconds and during peak times, the latency worsened significantly, with the p99 latency soaring to roughly 10 milliseconds. Gagarin underwent extensive low-level tuning, optimization, profiling, and benchmarking. Despite these efforts, we encountered the limits of inter-process communication (IPC) using Unix Domain Socket (UDS), among other challenges, explored below.

Problem definition

In summary, the previous solution had its drawbacks, including:

  • High tail latency: during the peak time, a portion of requests experienced increased  latency caused by CPU contention on the Unix socket and Lua garbage collector.
  • Suboptimal resource utilization: CPU and RAM utilization was not optimized to the full potential, leaving less resources for other services running on the server.
  • Machine learning features availability: decreased due to memcached timeouts, which resulted in a higher likelihood of false positives or false negatives for a subset of the requests.
  • Scalability constraints: as we added more machine learning features, we approached the scalability limit of our infrastructure.

Equipped with a comprehensive understanding of the challenges and armed with quantifiable metrics, we ventured into the next phase: seeking a more efficient way to fetch and serve machine learning features.

Exploring solutions

In our quest for more efficient methods of fetching and serving machine learning features, we evaluated several alternatives. The key approaches included:

Further optimizing Gagarin: as we pushed our Go-based memcached server to its limits, we encountered a lower bound on latency reductions. This arose from IPC over UDS synchronization overhead and multiple data copies, the serialization/deserialization overheads, as well as the inherent latency of garbage collector and the performance of hashmap lookups in Go.

Considering Quicksilver: we contemplated using Quicksilver, but the volume and update frequency of machine learning features posed capacity concerns and potential negative impacts on other use cases. Moreover, it uses a Unix socket with the memcached protocol, reproducing the same limitations previously encountered.

Increasing multi-layered cache size: we investigated expanding cache size to accommodate tens of millions of dimension keys. However, the associated memory consumption, due to duplication of these keys and their machine learning features across worker threads, rendered this approach untenable.

Sharding the Unix socket: we considered sharding the Unix socket to alleviate contention and improve performance. Despite showing potential, this approach only partially solved the problem and introduced more system complexity.

Switching to RPC: we explored the option of using RPC for communication between our front line server and Gagarin. However, since RPC still requires some form of communication bus (such as TCP, UDP, or UDS), it would not significantly change the performance compared to the memcached protocol over UDS, which was already simple and minimalistic.

After considering these approaches, we shifted our focus towards investigating alternative Inter-Process Communication (IPC) mechanisms.

IPC mechanisms

Adopting a first principles design approach, we questioned: "What is the most efficient low-level method for data transfer between two processes provided by the operating system?" Our goal was to find a solution that would enable the direct serving of machine learning features from memory for corresponding HTTP requests. By eliminating the need to traverse the Unix socket, we aimed to reduce CPU contention, improve latency, and minimize data copying.

To identify the most efficient IPC mechanism, we evaluated various options available within the Linux ecosystem. We used ipc-bench, an open-source benchmarking tool specifically designed for this purpose, to measure the latencies of different IPC methods in our test environment. The measurements were based on sending one million 1,024-byte messages forth and back (i.e., ping pong) between two processes.

IPC method Avg duration, μs Avg throughput, msg/s
eventfd (bi-directional) 9.456 105,533
TCP sockets 8.74 114,143
Unix domain sockets 5.609 177,573
FIFOs (named pipes) 5.432 183,388
Pipe 4.733 210,369
Message Queue 4.396 226,421
Unix Signals 2.45 404,844
Shared Memory 0.598 1,616,014
Memory-Mapped Files 0.503 1,908,613

Based on our evaluation, we found that Unix sockets, while taking care of synchronization, were not the fastest IPC method available. The two fastest IPC mechanisms were shared memory and memory-mapped files. Both approaches offered similar performance, with the former using a specific tmpfs volume in /dev/shm and dedicated system calls, while the latter could be stored in any volume, including tmpfs or HDD/SDD.

Missing ingredients

In light of these findings, we decided to employ memory-mapped files as the IPC mechanism for serving machine learning features. This choice promised reduced latency, decreased CPU contention, and minimal data copying. However, it did not inherently offer data synchronization capabilities like Unix sockets. Unlike Unix sockets, memory-mapped files are simply files in a Linux volume that can be mapped into memory of the process. This sparked several critical questions:

  1. How could we efficiently fetch an array of hundreds of float features for given dimension keys when dealing with a file?
  2. How could we ensure safe, concurrent and frequent updates for tens of millions of keys?
  3. How could we avert the CPU contention previously encountered with Unix sockets?
  4. How could we effectively support the addition of more dimensions and features in the future?

To address these challenges we needed to further evolve this new approach by adding a few key ingredients to the recipe.

Augmenting the Idea

To realize our vision of memory-mapped files as a method for serving machine learning features, we needed to employ several key strategies, touching upon aspects like data synchronization, data structure, and deserialization.

Wait-free synchronization

When dealing with concurrent data, ensuring safe, concurrent, and frequent updates is paramount. Traditional locks are often not the most efficient solution, especially when dealing with high concurrency environments. Here's a rundown on three different synchronization techniques:

With-lock synchronization: a common approach using mechanisms like mutexes or spinlocks. It ensures only one thread can access the resource at a given time, but can suffer from contention, blocking, and priority inversion, just as evident with Unix sockets.

Lock-free synchronization: this non-blocking approach employs atomic operations to ensure at least one thread always progresses. It eliminates traditional locks but requires careful handling of edge cases and race conditions.

Wait-free synchronization: a more advanced technique that guarantees every thread makes progress and completes its operation without being blocked by other threads. It provides stronger progress guarantees compared to lock-free synchronization, ensuring that each thread completes its operation within a finite number of steps.

Disjoint Access Parallelism Starvation Freedom Finite Execution Time
With lock Every request, every microsecond: scalable machine learning at Cloudflare Every request, every microsecond: scalable machine learning at Cloudflare Every request, every microsecond: scalable machine learning at Cloudflare
Lock-free Every request, every microsecond: scalable machine learning at Cloudflare Every request, every microsecond: scalable machine learning at Cloudflare Every request, every microsecond: scalable machine learning at Cloudflare
Wait-free Every request, every microsecond: scalable machine learning at Cloudflare Every request, every microsecond: scalable machine learning at Cloudflare Every request, every microsecond: scalable machine learning at Cloudflare

Our wait-free data access pattern draws inspiration from Linux kernel's Read-Copy-Update (RCU) pattern and the Left-Right concurrency control technique. In our solution, we maintain two copies of the data in separate memory-mapped files. Write access to this data is managed by a single writer, with multiple readers able to access the data concurrently.

We store the synchronization state, which coordinates access to these data copies, in a third memory-mapped file, referred to as "state". This file contains an atomic 64-bit integer, which represents an InstanceVersion and a pair of additional atomic 32-bit variables, tracking the number of active readers for each data copy. The InstanceVersion consists of the currently active data file index (1 bit), the data size (39 bits, accommodating data sizes up to 549 GB), and a data checksum (24 bits).

Zero-copy deserialization

To efficiently store and fetch machine learning features, we needed to address the challenge of deserialization latency. Here, zero-copy deserialization provides an answer. This technique reduces the time and memory required to access and use data by directly referencing bytes in the serialized form.

We turned to rkyv, a zero-copy deserialization framework in Rust, to help us with this task. rkyv implements total zero-copy deserialization, meaning no data is copied during deserialization and no work is done to deserialize data. It achieves this by structuring its encoded representation to match the in-memory representation of the source type.

One of the key features of rkyv that our solution relies on is its ability to access HashMap data structures in a zero-copy fashion. This is a unique capability among Rust serialization libraries and one of the main reasons we chose rkyv for our implementation. It also has a vibrant Discord community, eager to offer best-practice advice and accommodate feature requests.

Every request, every microsecond: scalable machine learning at Cloudflare
Feature comparison: rkyv vs FlatBuffers and Cap'n Proto

Enter mmap-sync crate

Leveraging the benefits of memory-mapped files, wait-free synchronization and zero-copy deserialization, we've crafted a unique and powerful tool for managing high-performance, concurrent data access between processes. We've packaged these concepts into a Rust crate named mmap-sync, which we're thrilled to open-source for the wider community.

At the core of the mmap-sync package is a structure named Synchronizer. It offers an avenue to read and write any data expressible as a Rust struct. Users simply have to implement or derive a specific Rust trait surrounding struct definition – a task requiring just a single line of code. The Synchronizer presents an elegantly simple interface, equipped with "write" and "read" methods.

impl Synchronizer {
    /// Write a given `entity` into the next available memory mapped file.
    pub fn write<T>(&mut self, entity: &T, grace_duration: Duration) -> Result<(usize, bool), SynchronizerError> {

    /// Reads and returns `entity` struct from mapped memory wrapped in `ReadResult`
    pub fn read<T>(&mut self) -> Result<ReadResult<T>, SynchronizerError> {

/// FeaturesMetadata stores features along with their metadata
#[derive(Archive, Deserialize, Serialize, Debug, PartialEq)]
pub struct FeaturesMetadata {
    /// Features version
    pub version: u32,
    /// Features creation Unix timestamp
    pub created_at: u32,
    /// Features represented by vector of hash maps
    pub features: Vec<HashMap<u64, Vec<f32>>>,

A read operation through the Synchronizer performs zero-copy deserialization and returns a "guarded" Result encapsulating a reference to the Rust struct using RAII design pattern. This operation also increments the atomic counter of active readers using the struct. Once the Result is out of scope, the Synchronizer decrements the number of readers.

The synchronization mechanism used in mmap-sync is not only "lock-free" but also "wait-free". This ensures an upper bound on the number of steps an operation will take before it completes, thus providing a performance guarantee.

The data is stored in shared mapped memory, which allows the Synchronizer to “write” to it and “read” from it concurrently. This design makes mmap-sync a highly efficient and flexible tool for managing shared, concurrent data access.

Now, with an understanding of the underlying mechanics of mmap-sync, let's explore how this package plays a key role in the broader context of our Bot Management platform, particularly within the newly developed components: the bliss service and library.

System design overhaul

Transitioning from a Lua-based module that made memcached requests over Unix socket to Gagarin in Go to fetch machine learning features, our new design represents a significant evolution. This change pivots around the introduction of mmap-sync, our newly developed Rust package, laying the groundwork for a substantial performance upgrade. This development led to a comprehensive system redesign and introduced two new components that form the backbone of our Bots Liquidation Intelligent Security System – or BLISS, in short: the bliss service and the bliss library.

Every request, every microsecond: scalable machine learning at Cloudflare

Bliss service

The bliss service operates as a Rust-based, multi-threaded sidecar daemon. It has been designed for optimal batch processing of vast data quantities and extensive I/O operations. Among its key functions, it fetches, parses, and stores machine learning features and dimensions for effortless data access and manipulation. This has been made possible through the incorporation of the Tokio event-driven platform, which allows for efficient, non-blocking I/O operations.

Bliss library

Operating as a single-threaded dynamic library, the bliss library seamlessly integrates into each worker thread using the Foreign Function Interface (FFI) via a Lua module. Optimized for minimal resource usage and ultra-low latency, this lightweight library performs tasks without the need for heavy I/O operations. It efficiently serves machine learning features and generates corresponding detections.

In addition to leveraging the mmap-sync package for efficient machine learning feature access, our new design includes several other performance enhancements:

  • Allocations-free operation: bliss library re-uses pre-allocated data structures and performs no heap allocations, only low-cost stack allocations. To enforce our zero-allocation policy, we run integration tests using the dhat heap profiler.
  • SIMD optimizations: wherever possible, the bliss library employs vectorized CPU instructions. For instance, AVX2 and SSE4 instruction sets are used to expedite hex-decoding of certain request attributes, enhancing speed by tenfold.
  • Compiler tuning: We compile both the bliss service and library with the following flags for superior performance:

codegen-units = 1
debug = true
lto = "fat"
opt-level = 3

  • Benchmarking & profiling: We use Criterion for benchmarking every major feature or component within bliss. Moreover, we are also able to use the Go pprof profiler on Criterion benchmarks to view flame graphs and more:

cargo bench -p integration -- --verbose --profile-time 100

go tool pprof -http=: ./target/criterion/process_benchmark/process/profile/profile.pb

This comprehensive overhaul of our system has not only streamlined our operations but also has been instrumental in enhancing the overall performance of our Bot Management platform. Stay tuned to witness the remarkable changes brought about by this new architecture in the next section.

Rollout results

Our system redesign has brought some truly "blissful" dividends. Above all, our commitment to a seamless user experience and the trust of our customers have guided our innovations. We ensured that the transition to the new design was seamless, maintaining full backward compatibility, with no customer-reported false positives or negatives encountered. This is a testament to the robustness of the new system.

As the old adage goes, the proof of the pudding is in the eating. This couldn't be truer when examining the dramatic latency improvements achieved by the redesign. Our overall processing latency for HTTP requests at Cloudflare improved by an average of 12.5% compared to the previous system.

This improvement is even more significant in the Bot Management module, where latency improved by an average of 55.93%.

Every request, every microsecond: scalable machine learning at Cloudflare
Bot Management module latency, in microseconds.

More specifically, our machine learning features fetch latency has improved by several orders of magnitude:

Latency metric Before (μs) After (μs) Change
p50 532 9 -98.30% or x59
p99 9510 18 -99.81% or x528
p999 16000 29 -99.82% or x551

To truly grasp this impact, consider this: with Cloudflare’s average rate of 46 million requests per second, a saving of 523 microseconds per request equates to saving over 24,000 days or 65 years of processing time every single day!

In addition to latency improvements, we also reaped other benefits from the rollout:

  • Enhanced feature availability: thanks to eliminating Unix socket timeouts, machine learning feature availability is now a robust 100%, resulting in fewer false positives and negatives in detections.
  • Improved resource utilization: our system overhaul liberated resources equivalent to thousands of CPU cores and hundreds of gigabytes of RAM – a substantial enhancement of our server fleet's efficiency.
  • Code cleanup: another positive spin-off has been in our Lua and Go code. Thousands of lines of less performant and less memory-safe code have been weeded out, reducing technical debt.
  • Upscaled machine learning capabilities: last but certainly not least, we've significantly expanded our machine learning features, dimensions, and models. This upgrade empowers our machine learning inference to handle hundreds of machine learning features and dozens of dimensions and models.


In the wake of our redesign, we've constructed a powerful and efficient system that truly embodies the essence of 'bliss'. Harnessing the advantages of memory-mapped files, wait-free synchronization, allocation-free operations, and zero-copy deserialization, we've established a robust infrastructure that maintains peak performance while achieving remarkable reductions in latency. As we navigate towards the future, we're committed to leveraging this platform to further improve our Security machine learning products and cultivate innovative features. Additionally, we're excited to share parts of this technology through an open-sourced Rust package mmap-sync.

As we leap into the future, we are building upon our platform's impressive capabilities, exploring new avenues to amplify the power of machine learning. We are deploying a new machine learning model built on BLISS with select customers. If you are a Bot Management subscriber and want to test the new model, please reach out to your account team.

Separately, we are on the lookout for more Cloudflare customers who want to run their own machine learning models at the edge today. If you’re a developer considering making the switch to Workers for your application, sign up for our Constellation AI closed beta. If you’re a Bot Management customer and looking to run an already trained, lightweight model at the edge, we would love to hear from you. Let's embark on this path to bliss together.

How we’re learning to explain AI terms for young people and educators

Post Syndicated from Veronica Cucuiat original https://www.raspberrypi.org/blog/explaining-ai-terms-young-people-educators/

What do we talk about when we talk about artificial intelligence (AI)? It’s becoming a cliche to point out that, because the term “AI” is used to describe so many different things nowadays, it’s difficult to know straight away what anyone means when they say “AI”. However, it’s true that without a shared understanding of what AI and related terms mean, we can’t talk about them, or educate young people about the field.

A group of young people demonstrate a project at Coolest Projects.

So when we started designing materials for the Experience AI learning programme in partnership with leading AI unit Google DeepMind, we decided to create short explanations of key AI and machine learning (ML) terms. The explanations are doubly useful:

  1. They ensure that we give learners and teachers a consistent and clear understanding of the key terms across all our Experience AI resources. Within the Experience AI Lessons for Key Stage 3 (age 11–14), these key terms are also correlated to the target concepts and learning objectives presented in the learning graph. 
  2. They help us talk about AI and AI education in our team. Thanks to sharing an understanding of what terms such as “AI”, “ML”, “model”, or “training” actually mean and how to best talk about AI, our conversations are much more productive.

As an example, here is our explanation of the term “artificial intelligence” for learners aged 11–14:

Artificial intelligence (AI) is the design and study of systems that appear to mimic intelligent behaviour. Some AI applications are based on rules. More often now, AI applications are built using machine learning that is said to ‘learn’ from examples in the form of data. For example, some AI applications are built to answer questions or help diagnose illnesses. Other AI applications could be built for harmful purposes, such as spreading fake news. AI applications do not think. AI applications are built to carry out tasks in a way that appears to be intelligent.

You can find 32 explanations in the glossary that is part of the Experience AI Lessons. Here’s an insight into how we arrived at the explanations.

Reliable sources

In order to ensure the explanations are as precise as possible, we first identified reliable sources. These included among many others:

Explaining AI terms to Key Stage 3 learners: Some principles

Vocabulary is an important part of teaching and learning. When we use vocabulary correctly, we can support learners to develop their understanding. If we use it inconsistently, this can lead to alternate conceptions (misconceptions) that can interfere with learners’ understanding. You can read more about this in our Pedagogy Quick Read on alternate conceptions.

Some of our principles for writing explanations of AI terms were that the explanations need to: 

  • Be accurate
  • Be grounded in education research best practice
  • Be suitable for our target audience (Key Stage 3 learners, i.e. 11- to 14-year-olds)
  • Be free of terms that have alternative meanings in computer science, such as “algorithm”

We engaged in an iterative process of writing explanations, gathering feedback from our team and our Experience AI project partners at Google DeepMind, and adapting the explanations. Then we went through the feedback and adaptation cycle until we all agreed that the explanations met our principles.

A real banana and an image of a banana shown on the screen of a laptop are both labelled "Banana".
Image: Max Gruber / Better Images of AI / Ceci n’est pas une banane / CC-BY 4.0

An important part of what emerged as a result, aside from the explanations of AI terms themselves, was a blueprint for how not to talk about AI. One aspect of this is avoiding anthropomorphism, detailed by Ben Garside from our team here.

As part of designing the the Experience AI Lessons, creating the explanations helped us to:

  • Decide which technical details we needed to include when introducing AI concepts in the lessons
  • Figure out how to best present these technical details
  • Settle debates about where it would be appropriate, given our understanding and our learners’ age group, to abstract or leave out details

Using education research to explain AI terms

One of the ways education research informed the explanations was that we used semantic waves to structure each term’s explanation in three parts: 

  1. Top of the wave: The first one or two sentences are a high-level abstract explanation of the term, kept as short as possible, while introducing key words and concepts.
  2. Bottom of the wave: The middle part of the explanation unpacks the meaning of the term using a common example, in a context that’s familiar to a young audience. 
  3. Top of the wave: The final one or two sentences repack what was explained in the example in a more abstract way again to reconnect with the term. The end part should be a repeat of the top of the wave at the beginning of the explanation. It should also add further information to lead to another concept. 

Most explanations also contain ‘middle of the wave’ sentences, which add additional abstract content, bridging the ‘bottom of the wave’ concrete example to the ‘top of the wave’ abstract content.

Here’s the “artificial intelligence” explanation broken up into the parts of the semantic wave:

  • Artificial intelligence (AI) is the design and study of systems that appear to mimic intelligent behaviour. (top of the wave)
  • Some AI applications are based on rules. More often now, AI applications are built using machine learning that is said to ‘learn’ from examples in the form of data. (middle of the wave)
  • For example, some AI applications are built to answer questions or help diagnose illnesses. Other AI applications could be built for harmful purposes, such as spreading fake news (bottom of the wave)
  • AI applications do not think. (middle of the wave)
  • AI applications are built to carry out tasks in a way that appears to be intelligent. (top of the wave)
Our "artificial intelligence" explanation broken up into the parts of the semantic wave.
Our “artificial intelligence” explanation broken up into the parts of the semantic wave. Red = top of the wave; yellow = middle of the wave; green = bottom of the wave

Was it worth our time?

Some of the explanations went through 10 or more iterations before we agreed they were suitable for publication. After months of thinking about, writing, correcting, discussing, and justifying the explanations, it’s tempting to wonder whether I should have just prompted an AI chatbot to generate the explanations for me.

A window of three images. On the right is a photo of a big tree in a green field in a field of grass and a bright blue sky. The two on the left are simplifications created based on a decision tree algorithm. The work illustrates a popular type of machine learning model: the decision tree. Decision trees work by splitting the population into ever smaller segments. I try to give people an intuitive understanding of the algorithm. I also want to show that models are simplifications of reality, but can still be useful, or in this case visually pleasing. To create this I trained a model to predict pixel colour values, based on an original photograph of a tree.
Rens Dimmendaal & Johann Siemens / Better Images of AI / Decision Tree reversed / CC-BY 4.0

I tested this idea by getting a chatbot to generate an explanation of “artificial intelligence” using the prompt “Explain what artificial intelligence is, using vocabulary suitable for KS3 students, avoiding anthropomorphism”. The result included quite a few inconsistencies with our principles, as well as a couple of technical inaccuracies. Perhaps I could have tweaked the prompt for the chatbot in order to get a better result. However, relying on a chatbot’s output would mean missing out on some of the value of doing the work of writing the explanations in collaboration with my team and our partners.

The visible result of that work is the explanations themselves. The invisible result is the knowledge we all gained, and the coherence we reached as a team, both of which enabled us to create high-quality resources for Experience AI. We wouldn’t have gotten to know what resources we wanted to write without writing the explanations ourselves and improving them over and over. So yes, it was worth our time.

What do you think about the explanations?

The process of creating and iterating the AI explanations highlights how opaque the field of AI still is, and how little we yet know about how best to teach and learn about it. At the Raspberry Pi Foundation, we now know just a bit more about that and are excited to share the results with teachers and young people.

You can access the Experience AI Lessons and the glossary with all our explanations at experience-ai.org. The glossary of AI explanations is just in its first published version: we will continue to improve it as we find out more about how to best support young people to learn about this field.

Let us know what you think about the explanations and whether they’re useful in your teaching. Onwards with the exciting work of establishing how to successfully engage young people in learning about and creating with AI technologies.

The post How we’re learning to explain AI terms for young people and educators appeared first on Raspberry Pi Foundation.

Real-time inference using deep learning within Amazon Kinesis Data Analytics for Apache Flink

Post Syndicated from Jeremy Ber original https://aws.amazon.com/blogs/big-data/real-time-inference-using-deep-learning-within-amazon-kinesis-data-analytics-for-apache-flink/

Apache Flink is a framework and distributed processing engine for stateful computations over data streams. Amazon Kinesis Data Analytics for Apache Flink is a fully managed service that enables you to use an Apache Flink application to process streaming data. The Deep Java Library (DJL) is an open-source, high-level, engine-agnostic Java framework for deep learning.

In this blog post, we demonstrate how you can use DJL within Kinesis Data Analytics for Apache Flink for real-time machine learning inference. Real-time inference can be valuable in a variety of applications and industries where it is essential to make predictions or take actions based on new data as quickly as possible with low latencies. We show how to load a pre-trained deep learning model from the DJL model zoo into a Flink job and apply the model to classify data objects in a continuous data stream. The DJL model zoo includes a wide variety of pre-trained models for image classification, semantic segmentation, speech recognition, text embedding generation, question answering, and more. It supports HuggingFace, Pytorch, MXNet, and TensorFlow model frameworks and also helps developers create and publish their own models. We will focus on image classification and use a popular classifier called ResNet-18 to produce predictions in real time. The model has been pre-trained on ImageNet with 1.2 million images belonging to 1,000 class labels.

We provide sample code, architecture diagrams, and an AWS CloudFormation template so you can follow along and employ ResNet-18 as your classifier to make real-time predictions. The solution we provide here is a powerful design pattern for continuously producing ML-based predictions on streaming data within Kinesis Data Analytics for Apache Flink. You can adapt the provided solution for your use case and choose an alternative model from the model zoo or even provide your own model.

Image classification is a classic problem that takes an image and selects the best-fitting class, such as whether the image from an autonomous driving system is that of a bicycle or a pedestrian. Common use cases for real-time inference on streams of images include classifying images from vehicle cameras and license plate recognition systems, and classifying images uploaded to social media and ecommerce websites. The use cases typically need low latency while handling high throughput and potentially bursty streams. For example, in ecommerce websites, real-time classification of uploaded images can help in marking pictures of banned goods or hazardous materials that have been supplied by sellers. Immediate determination through streaming inference is needed to trigger alerts and follow-up actions to prevent these images from being part of the catalog. This enables faster decision-making compared to batch jobs that run on a periodic basis. The data stream pipeline can involve multiple models for different purposes, such as classifying uploaded images into ecommerce categories of electronics, toys, fashion, and so on.

Solution overview

The following diagram illustrates the workflow of our solution.

architecture showcasing a kinesis data analytics for apache flink application reading from Images in an Amazon S3 bucket, classifying those images and then writing out to another S3 bucket called "classifications"

The application performs the following steps:

  1. Read in images from Amazon Simple Storage Service (Amazon S3) using the Apache Flink Filesystem File Source connector.
  2. Window the images into a collection of records.
  3. Classify the batches of images using the DJL image classification class.
  4. Write inference results to Amazon S3 at the path specified.

Images are recommended to be of reasonable size so that they may fit into a Kinesis Processing Unit. Images larger than 50MB in size may result in latency in processing and classification.

The main class for this Apache Flink job is located at src/main/java/com.amazon.embeddedmodelinference/EMI.java. Here you can find the main() method and entry point to our Flink job.


To get started, configure the following prerequisites on your local machine:

Once this is set up, you can clone the code base to access the source code for this solution. The Java application code for this example is available on GitHub. To download the application code, clone the remote repository using the following command:

git clone https://github.com/aws-samples/amazon-kinesis-data-analytics-examples

Find and navigate to the folder of the image classification example, called image-classification.

An example set of images to stream and test the code is available in the imagenet-sample-images folder.

Let’s walk through the code step by step.

Test on your local machine

If you would like to test this application locally on your machine, ensure you have AWS credentials set up locally on your machine. Additionally, download the Flink S3 Filesystem Hadoop JAR to use with your Apache Flink installation and place it in a folder named plugins/s3 in the root of your project. Then configure the following environment variables either on your IDE or in your machine’s local variable scope:

plugins.dir=<<path-to-flink-s3-fs-hadoop jar>>
s3.access.key=<<aws access key>>
s3.secret.key=<<aws secret key>>

Replace these values with your own.showcasing the environment properties to replace on IntelliJ

After configuring the environment variables and downloading the necessary plugin JAR, let’s look at the code.

In the main method, after setting up our StreamExecutionEnvironment, we define our FileSource to read files from Amazon S3. By default, this source operator reads from a sample bucket. You can replace this bucket name with your own by changing the variable called bucket, or setting the application property on Kinesis Data Analytics for Apache Flink once deployed.

final FileSource<StreamedImage> source =
FileSource.forRecordStreamFormat(new ImageReaderFormat(), new Path(s3SourcePath))

The FileSource is configured to read in files in the ImageReaderFormat, and will check Amazon S3 for new images every 10 seconds. This can be configured as well.

After we have read in our images, we convert our FileSource into a stream that can be processed:

DataStream<StreamedImage> stream =
env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source");

Next, we create a tumbling window of a variable time window duration, specified in the configuration, defaulting to 60 seconds. Every window close creates a batch (list) of images to be classified using a ProcessWindowFunction.

This ProcessWindowFunction calls the classifier predict function on the list of images and returns the best probability of classification from each image. This result is then sent back to the Flink operator, where it’s promptly written out to the S3 bucket path of your configuration.

.process(new ProcessWindowFunction<StreamedImage, String, String, TimeWindow>() {
                    public void process(String s,
                                        ProcessWindowFunction<StreamedImage, String, String, TimeWindow>.Context context,
                                        Iterable<StreamedImage> iterableImages,
                                        Collector<String> out) throws Exception {

                            List<Image> listOfImages = new ArrayList<Image>();
                            iterableImages.forEach(x -> {
                            // batch classify images
                            List<Classifications> list = classifier.predict(listOfImages);
                            for (Classifications classifications : list) {
                                Classifications.Classification cl = classifications.best();
                                String ret = cl.getClassName() + ": " + cl.getProbability();
                        } catch (ModelException | IOException | TranslateException e) {
                            logger.error("Failed predict", e);

In Classifier.java, we read the image and apply crop, transpose, reshape, and finally convert to an N-dimensional array that can be processed by the deep learning model. Then we feed the array to the model and apply a forward pass. During the forward pass, the model computes the neural network layer by layer. At last, the output object contains the probabilities for each image object that the model is being trained on. We map the probabilities with the object name and return to the map function.

Deploy the solution with AWS CloudFormation

To run this code base on Kinesis Data Analytics for Apache Flink, we have a helpful CloudFormation template that will spin up the necessary resources. Simply open AWS CloudShell or your local machine’s terminal and enter the following commands. Complete the following steps to deploy the solution:

  1. If you don’t have the AWS Cloud Development Kit (AWS CDK) bootstrapped in your account, run the following command, providing your account number and current Region:
cdk bootstrap aws://ACCOUNT-NUMBER/REGION

The script will clone a GitHub repo of images to classify and upload them to your source S3 bucket. Then it will launch the CloudFormation stack given your input parameters. video walking through the setup of the cloudformation template. Described in text later

  1. Enter the following code and replace the BUCKET variables with your own source bucket and sink bucket, which will contain the source images and the classifications, respectively:
git clone https://github.com/EliSchwartz/imagenet-sample-images; cd imagenet-sample-images;
aws s3 cp . $SOURCE_BUCKET --recursive --exclude "*/";
aws cloudformation create-stack --stack-name KDAImageClassification --template-url https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/BDB-3098/BlogStack.template.json --parameters ParameterKey=inputBucketPath,ParameterValue=$SOURCE_BUCKET ParameterKey=outputBucketPath,ParameterValue=$SINK_BUCKET --capabilities CAPABILITY_IAM;

This CloudFormation stack creates the following resources:

    • A Kinesis Data Analytics application with 1 Kinesis Processing Unit (KPU) preconfigured with some application properties
    • An S3 bucket for your output results
  1. When the stack is complete, navigate to the Kinesis Data Analytics for Apache Flink console.
  2. Find the application called blog-DJL-flink-ImageClassification-application and choose Run.
  3. On the Amazon S3 console, navigate to the bucket you specified in the outputBucketPath variable.

If you have readable images in the source bucket listed, you should see classifications of those images within the checkpoint interval of the running application.

Deploy the solution manually

If you prefer to use your own code base, you can follow the manual steps in this section:

  • After you clone the application locally, create your application JAR by navigating to the directory that contains your pom.xml and running the following command:
mvn clean package

This builds your application JAR in the target/ directory called embedded-model-inference-1.0-SNAPSHOT.jar.

application properties on KDA console

  1. Upload this application JAR to an S3 bucket, either the one created from the CloudFormation template, or another one to store code artifacts.
  2. You can then configure your Kinesis Data Analytics application to point to this newly uploaded S3 JAR file.
  3. This is also a great opportunity to configure your runtime properties, as shown in the following screenshot.
  4. Choose Run to start your application.

You can open the Apache Flink Dashboard to check for application exceptions or to see data flowing through the tasks defined in the code.

image of flink dashboard showing successful running of the application

Validate the results

To validate our results, let’s check the results in Amazon S3 by navigating to the Amazon S3 console and finding our S3 bucket. We can find the output in a folder called output-kda.

image showing folders within amazon s3 partitioned by datetime

When we choose one of the data-partitioned folders, we can see partition files. Ensure that there is no underscore in front of your part file, because this indicates that the results are still being finalized according to the rollover interval defined in Apache Flink’s FileSink connector. After the underscores have disappeared, we can use Amazon S3 Select to view our data.

partition files as they land in Amazon S3

We now have a solution that continuously performs classification on incoming images in real time using Kinesis Data Analytics for Apache Flink. It extracts a pre-trained classification model (ResNet-18) from the DJL model zoo, applies some preprocessing, loads the model into a Flink operator’s memory, and continuously applies the model for online predictions on streaming images.

Although we used ResNet-18 in this post, you can choose another model by modifying the classifier. The DJL model zoo provides many other models, both for classification and other applications, that can be used out of the box. You can also load your custom model by providing an S3 link or URL to the criteria. DJL supports models in a large number of engines such as PyTorch, ONNX, TensorFlow, and MXNet. Using a model in the solution is relatively simple. All of the preprocessing and postprocessing code is encapsulated in the (built-in) translator, so all we have to do is load the model, create a predictor, and call predict(). This is done within the data source operator, which processes the stream of input data and sends the links to the data to the inference operator where the model you selected produces the prediction. Then the sink operator writes the results.

The CloudFormation template in this example focused on a simple 1 KPU application. You could extend the solution to further scale out to large models and high-throughput streams, and support multiple models within the pipeline.

Clean up

To clean up the CloudFormation script you launched, complete the following steps:

  1. Empty the source bucket you specified in the bash script.
  2. On the AWS CloudFormation console, locate the CloudFormation template called KDAImageClassification.
  3. Delete the stack, which will remove all of the remaining resources created for this post.
  4. You may optionally delete the bootstrapping CloudFormation template, CDKToolkit, which you launched earlier as well.


In this post, we presented a solution for real-time classification using the Deep Java Library within Kinesis Data Analytics for Apache Flink. We shared a working example and discussed how you can adapt the solution for your specific ML use case. If you have any feedback or questions, please leave them in the comments section.

About the Authors

Jeremy Ber has been working in the telemetry data space for the past 9 years as a Software Engineer, Machine Learning Engineer, and most recently a Data Engineer. At AWS, he is a Streaming Specialist Solutions Architect, supporting both Amazon Managed Streaming for Apache Kafka (Amazon MSK) and AWS’s managed offering for Apache Flink.

Deepthi Mohan is a Principal Product Manager for Amazon Kinesis Data Analytics, AWS’s managed offering for Apache Flink.

Gaurav Rele is a Data Scientist at the Amazon ML Solution Lab, where he works with AWS customers across different verticals to accelerate their use of machine learning and AWS Cloud services to solve their business challenges.

PII masking for privacy-grade machine learning

Post Syndicated from Grab Tech original https://engineering.grab.com/pii-masking

At Grab, data engineers work with large sets of data on a daily basis. They design and build advanced machine learning models that provide strategic insights using all of the data that flow through the Grab Platform. This enables us to provide a better experience to our users, for example by increasing the supply of drivers in areas where our predictive models indicate a surge in demand in a timely fashion.

Grab has a mature privacy programme that complies with applicable privacy laws and regulations and we use tools to help identify, assess, and appropriately manage our privacy risks. To ensure that our users’ data are well-protected and avoid any human-related errors, we always take extra measures to secure this data.

However, data engineers will still require access to actual production data in order to tune effective machine learning models and ensure the models work as intended in production.

In this article, we will describe how the Grab’s data streaming team (Coban), along with the data platform and user teams, have enforced Personally Identifiable Information (PII) masking on machine learning data streaming pipelines. This ensures that we uphold a high standard and embody a privacy by design culture, while enabling data engineers to refine their models with sanitised production data.

PII tagging

Data streaming at Grab leverages the Protocol Buffers (protobuf) data format to structure in-transit data. When creating a new stream, developers must describe its fields in a protobuf schema that is then used for serialising the data wherever it is sent over the wire, and deserialising it wherever it is consumed.

A fictional example schema looks like this (the indexes are arbitrary, but commonly created in sequence):

message Booking {
  string bookingID = 1;
  int64 creationTime = 2;
  int64 passengerID = 3;
  string passengerName = 4;
  ... truncated output ...

Over here, the fourth field passengerName involves a PII and the data pertaining to that field should never be accessible by any data engineer. Therefore, developers owning the stream must tag that field with a PII label like this:

import "streams/coban/options/v1/pii.proto";

message Booking {
  string bookingID = 1;
  int64 creationTime = 2;
  int64 passengerID = 3;
  string passengerName = 4 [(streams.coban.options.v1.pii_type) = PII_TYPE_NAME];
  ... truncated output ...

The imported pii.proto library defines the tags for all possible types of PII. In the example above, the passengerName field has not only been flagged as PII, but is also marked as PII_TYPE_NAME – a specific type of PII that conveys the names of individuals. This high-level typing enables more flexible PII masking methods, which we will explain later.

Once the PII fields have been properly identified and tagged, developers need to publish the schema of their new stream into Coban’s Git repository. A Continuous Integration (CI) pipeline described below ensures that all fields describing PII are correctly tagged.

The following diagram shows this CI pipeline in action.

Fig. 1 CI pipeline failure due to untagged PII fields

When a developer creates a Merge Request (MR) or pushes a new commit to create or update a schema (step 1), the CI pipeline is triggered. It runs an in-house Python script that scans each variable name of the committed schema and tests it against an extensive list of PII keywords that is regularly updated, such as name, address, email, phone, etc (step 2). If there is a match and the variable is not tagged with the expected PII label, the pipeline fails (step 3) with an explicit error message in the CI pipeline’s output, similar to this:

Field name [Booking.passengerName] should have been marked with type streams.coban.options.v1.pii_type = PII_TYPE_NAME

There are cases where a variable name in the schema is a partial match against a PII keyword but is legitimately not a PII – for example, carModelName is a partial match against name but does not contain PII data. In this case, the developer can choose to add it to a whitelist to pass the CI.

However, modifying the whitelist requires approval from the Coban team for verification purposes. Apart from this particular case, the requesting team can autonomously approve their MR in a self-service fashion.

Now let us look at an example of a successful CI pipeline execution.

Fig. 2 CI pipeline success and schema publishing

In Fig. 2, the committed schema (step 1) is properly tagged so our in-house Python script is unable to find any untagged PII fields (step 2). The MR is approved by a code owner (step 3), then merged to the master branch of the repository (step 4).

Upon merging, another CI pipeline is triggered to package the protobuf schema in a Java Archive (JAR) of Scala classes (step 5), which in turn is stored into a package registry (step 6). We will explain the reason for this in a later section.

Production environment

With the schemas published and all of their PII fields properly tagged, we can now take a look at the data streaming pipelines.

Fig. 3 PII flow in the production environment

In this example, the user generates data by interacting with the Grab superapp and making a booking (step 1). The booking service, compiled with the stream’s schema definition, generates and produces Kafka records for other services to consume (step 2). Among those consuming services are the production machine learning pipelines that are of interest to this article (step 3).

PII is not masked in this process because it is actually required by the consuming services. For example, the driver app needs to display the passenger’s actual name, so the driver can confirm their identity easily.

At this part of the process, this is not much of a concern because access to the sacrosanct production environment is highly restricted and monitored by Grab.

PII masking

To ensure the security, stability, and privacy of our users, data engineers who need to tune their new machine learning models based on production data are not granted access to the production environment. Instead, they have access to the staging environment, where production data is mirrored and PII is masked.

Fig. 4 PII masking pipeline from the production environment to the staging environment

The actual PII masking is performed by an in-house Flink application that resides in the production environment. Flink is a reference framework for data streaming that we use extensively. It is also fault tolerant, with the ability to restart from a checkpoint.

The Flink application is compiled along with the JAR containing the schema as Scala classes previously mentioned. Therefore, it is able to consume the original data as a regular Kafka consumer (step 1). It then dynamically masks the PII of the consumed data stream, based on the PII tags of the schema (step 2). Ultimately, it produces the sanitised data to the Kafka cluster in the staging environment as a normal Kafka producer (step 3).

Depending on the kind of PII, there are several methods of masking such as:

  • Names and strings of characters: They are replaced by consistent HMAC (Hash-based message authentication code). A HMAC is a digest produced by a one-way cryptographic hash function that takes a secret key as a parameter. Leveraging a secret key here is a defence against chosen plaintext attacks, i.e. computing the digest of a particular plaintext, like a targeted individual’s name.
  • Numbers and dates: Similarly, they are transformed in a consistent manner, by leveraging a random generator that takes the unmasked value as a seed, so that the same PII input consistently produces the same masked output.

Note that consistency is a recurring pattern. This is because it is a key requirement for certain machine learning models.

This sanitised data produced to the Kafka cluster in the staging environment is then consumed by the staging machine learning pipelines (step 4). There, it is used by data engineers to tune their models effectively with near real-time production data (step 5).

The Kafka cluster in the staging environment is secured with authorisation and authentication (see Zero Trust with Kafka). This is an extra layer of security in case some PII data inadvertently fall through the cracks of PII tagging, following the defence in depth principle.

Finally, whenever a new PII-tagged field is added to a schema, the PII masking Flink application needs to be compiled and deployed again. If the schema is not updated, the Flink pipeline is unable to decode this new field when deserialising the stream. Thus, the added field is just dropped and the new PII data does not make it to the staging environment.

What’s next?

For the immediate next steps, we are going to enhance this design with an in-house product based on AWS Macie to automatically detect the PII that would have fallen through the cracks. Caspian, Grab’s data lake team and one of Coban’s sister teams, has built a service that is already able to detect PII data in relational databases and data lake tables. It is currently being adapted for data streaming.

In the longer run, we are committed to taking our privacy by design posture to the next level. Indeed, the PII masking described in this article does not prevent a bad actor from retrieving the consistent hash of a particular individual based on their non-PII data. For example, the target might be identifiable by a signature in the masked data set, such as unique food or transportation habits.

A possible counter-measure could be one or a combination of the following techniques, ordered by difficulty of implementation:

  • Data minimisation: Non-essential fields in the data stream should not be mirrored at all. E.g. fields of the data stream that are not required by the data engineers to tune their models. We can introduce a dedicated tag in the schema to flag those fields and instruct the mirroring pipeline to drop them. This is the most straightforward approach.
  • Differential privacy: The mirroring pipeline could introduce some noise in the mirrored data, in a way that would obfuscate the signatures of particular individuals while still preserving the essential statistical properties of the dataset required for machine learning. It happens that Flink is a suitable framework to do so, as it can split a stream into multiple windows and apply computation over those windows. Designing and generalising a logic that meets the objective is challenging though.
  • PII encryption at source: PII could be encrypted by the producing services (like the booking service), and dynamically decrypted where plaintext values are required. However, key management and performance are two tremendous challenges of this approach.

We will explore these techniques further to find the solution that works best for Grab and ensures the highest level of privacy for our users.

Join us

Grab is the leading superapp platform in Southeast Asia, providing everyday services that matter to consumers. More than just a ride-hailing and food delivery app, Grab offers a wide range of on-demand services in the region, including mobility, food, package and grocery delivery services, mobile payments, and financial services across 428 cities in eight countries.

Powered by technology and driven by heart, our mission is to drive Southeast Asia forward by creating economic empowerment for everyone. If this mission speaks to you, join our team today!

How GitHub Copilot is getting better at understanding your code

Post Syndicated from Johan Rosenkilde original https://github.blog/2023-05-17-how-github-copilot-is-getting-better-at-understanding-your-code/

To make working with GitHub Copilot feel like a meeting of the minds between developers and the pair programmer, GitHub’s machine learning experts have been busy researching, developing, and testing new capabilities—and many are focused on improving the AI pair programmer’s contextual understanding. That’s because good communication is key to pair programming, and inferring context is critical to making good communication happen.

To pull back the curtain, we asked GitHub’s researchers and engineers about the work they’re doing to help GitHub Copilot improve its contextual understanding. Here’s what we discovered.

From OpenAI’s Codex model to GitHub Copilot

When OpenAI released GPT-3 in June 2020, GitHub knew developers would benefit from a product that leveraged the model specifically for coding. So, we gave input to OpenAI as it built Codex, a descendant of GPT-3 and the LLM that would power GitHub Copilot. The pair programmer launched as a technical preview in June 2021 and became generally available in June 2022 as the world’s first at-scale generative AI coding tool.

To ensure that the model has the best information to make the best predictions with speed, GitHub’s machine learning (ML) researchers have done a lot of work called prompt engineering (which we’ll explain in more detail below) so that the model provides contextually relevant responses with low latency.

Though GitHub’s always experimenting with new models as they come out, Codex was the first really powerful generative AI model that was available, said David Slater, a ML engineer at GitHub. “The hands-on experience we gained from iterating on model and prompt improvements was invaluable.”

All that experimentation resulted in a pair programmer that, ultimately, frees up a developer’s time to focus on more fulfilling work. The tool is often a huge help even for starting new projects or files from scratch because it scaffolds a starting point that developers can adapt and tweak as desired, said Alice Li, a ML researcher at GitHub.

I still find myself impressed and even surprised by what GitHub Copilot can do, even after having worked on it for some time now.

– Alice Li, ML researcher at GitHub

Why context matters

Developers use details from pull requests, a folder in a project, open issues, and more to contextualize their code. When it comes to a generative AI coding tool, we need to teach that tool what information to use to do the same.

Transformer LLMs are good at connecting the dots and big-picture thinking. Generative AI coding tools are made possible by large language models (LLMs). These models are sets of algorithms trained on large amounts of code and human language. Today’s state-of-the-art LLMs are transformers, which makes them adept at making connections between text in a user’s input and the output that the model has already generated. This is why today’s generative AI tools are providing responses that are more contextually relevant than previous AI models.

But they need to be told what information is relevant to your code. Right now, transformers that are fast enough to power GitHub Copilot can process about 6,000 characters at a time. While that’s been enough to advance and accelerate tasks like code completion and code change summarization, the limited amount of characters means that not all of a developer’s code can be used as context.

So, our challenge is to figure out not only what data to feed the model, but also how to best order and enter it to get the best suggestions for the developer.

Learn more about LLMs, generative AI coding tools, and how they’re changing the way developers work.

How GitHub Copilot understands your code

It all comes down to prompts, which are compilations of IDE code and relevant context that’s fed to the model. Prompts are generated by algorithms in the background, at any point in your coding. That’s why GitHub Copilot will generate coding suggestions whether you’re currently writing or just finished a comment, or in the middle of some gnarly code.

  • Here’s how a prompt is created: a series of algorithms first select relevant code snippets or comments from your current file and other sources (which we’ll dive into below). These snippets and comments are then prioritized, filtered, and assembled into the final prompt.

GitHub Copilot’s contextual understanding has continuously matured over time. The first version was only able to consider the file you were working on in your IDE to be contextually relevant. But we knew context went beyond that. Now, just a year later, we’re experimenting with algorithms that will consider your entire codebase to generate customized suggestions.

Let’s look at how we got here:

  • Prompt engineering is the delicate art of creating a prompt so that the model makes the most useful prediction for the user. The prompt tells LLMs, including GitHub Copilot, what data, and in what order, to process in order to contextualize your code. Most of this work takes place in what’s called a prompt library, which is where our in-house ML experts work with algorithms to extract and prioritize a variety of sources of information about the developer’s context, creating the prompt that’ll be processed by the GitHub Copilot model.

  • Neighboring tabs is what we call the technique that allows GitHub Copilot to process all of the files open in a developer’s IDE instead of just the single one the developer is working on. By opening all files relevant to their project, developers automatically invoke GitHub Copilot to comb through all of the data and find matching pieces of code between their open files and the code around their cursor—and add those matches to the prompt.

When developing neighboring tabs, the GitHub Next team and in-house ML researchers did A/B tests to figure out the best parameters for identifying matches between code in your IDE and code in your open tabs. They found that setting a very low bar for when to include a match actually made for the best coding suggestions.

By including every little bit of context, neighboring tabs helped to relatively increase user acceptance of GitHub Copilot’s suggestions by 5%**.

Even if there was no perfect match—or even a very good one—picking the best match we found and including that as context for the model was better than including nothing at all.

– Albert Ziegler, principal ML engineer at GitHub
  • The Fill-In-the-Middle (FIM) paradigm widened the context aperture even more. Prior to FIM, only the code before your cursor would be put into the prompt—ignoring the code after your cursor. (At GitHub, we refer to code before the cursor as the prefix and after the cursor as the suffix.) With FIM, we can tell the model which part of the prompt is the prefix, and which part is the suffix.

Even if you’re creating something from scratch and have a skeleton of a file, we know that coding isn’t linear or sequential. So, while you bounce around your file, FIM helps GitHub Copilot offer better coding suggestions for the part in your file where your cursor is located, or the code that’s supposed to come between the prefix and suffix.

Based on A/B testing, FIM gave a 10% relative boost in performance, meaning developers accepted 10% more of the completions that were shown to them. And thanks to optimal use of caching, neighboring tabs and FIM work in the background without any added latency.

System diagram focused on model quality efforts. The diagram starts on the left with inputs from open tabs, data from editor, and vector database, which feed into a prompt library. (We are continuously working on improvements to provide better context from available sources in the prompt.) This then goes into the prompt, which is fed through a contextual filter model and a GPT model. (We are continuously working on new and improved model engines optimized for GitHub Copilot.) This model provides completions to fill in the middle of the prompt prefix and suffix. From the models, n completions are generated, and less than or equal to n completions are shown.

Improving semantic understanding

Today, we’re experimenting with vector databases that could create a customized coding experience for developers working in private repositories or with proprietary code. Generative AI coding tools use something called embeddings to retrieve information from a vector database.

  • What’s a vector database? It’s a database that indexes high-dimensional vectors.

  • What’s a high-dimensional vector? They’re mathematical representations of objects, and because these vectors can model objects in a number of dimensions, they can capture complexities of that object. When used properly to represent pieces of code, they may represent both the semantics and even intention of the code—not just the syntax.

  • What’s an embedding? In the context of coding and LLMs, an embedding is the representation of a piece of code as a high-dimensional vector. Because of the “knowledge” the LLM has of both programming and natural language, it’s able to capture both the syntax and semantics of the code in the vector.

Here’s how they’d all work together:

  • Algorithms would create embeddings for all snippets in the repository (potentially billions of them), and keep them stored in the vector database.
  • Then, as you’re coding, algorithms would embed the snippets in your IDE.
  • Algorithms would then make approximate matches—also, in real time—between the embeddings that are created for your IDE snippets and the embeddings already stored in the vector database. The vector database is what allows algorithms to quickly search for approximate matches (not just exact ones) on the vectors it stores, even if it’s storing billions of embedded code snippets.

Developers are familiar with retrieving data with hashcodes, which typically look for exact character by character matches, explained Alireza Goudarzi, senior ML researcher at GitHub. “But embeddings—because they arise from LLMs that were trained on a vast amount of data—develop a sense of semantic closeness between code snippets and natural language prompts.”

Read the three sentences below and identify which two are the most semantically similar.

  • Sentence A: The king moved and captured the pawn.
  • Sentence B: The king was crowned in Westminster Abbey.
  • Sentence C: Both white rooks were still in the game.

The answer is sentences A and C because both are about chess. While sentences A and B are syntactically, or structurally similar because both have a king as the subject, they’re semantically different because “king” is used in different contexts.

Here’s how each of those statements could translate to Python. Note the syntactic similarity between snippets A and B despite their semantic difference, and the semantic similarity between snippets A and C despite their syntactic difference.

Snippet A:

if king.location() == pawn.location():
    board.captures_piece(king, pawn)

Snippet B:

if king.location() == "Westminster Abbey":

Snippet C:

if len([ r for r in board.pieces("white") if r.type == "rook" ]) == 2:
    return True

As mentioned above, we’re still experimenting with retrieval algorithms. We’re designing the feature with enterprise customers in mind, specifically those who are looking for a customized coding experience with private repositories and would explicitly opt in to use the feature.

Take this with you

Last year, we conducted quantitative research on GitHub Copilot and found that developers code up to 55% faster while using the pair programmer. This means developers feel more productive, complete repetitive tasks more quickly, and can focus more on satisfying work. But our work won’t stop there.

The GitHub product and R&D teams, including GitHub Next, have been collaborating with Microsoft Azure AI-Platform to continue bringing improvements to GitHub Copilot’s contextual understanding. So much of the work that helps GitHub Copilot contextualize your code happens behind the scenes. While you write and edit your code, GitHub Copilot is responding to your writing and edits in real time by generating prompts–or, in other words, prioritizing and sending relevant information to the model based on your actions in your IDE—to keep giving you the best coding suggestions.

Learn more

Best practices to optimize your Amazon EC2 Spot Instances usage

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/best-practices-to-optimize-your-amazon-ec2-spot-instances-usage/

This blog post is written by Pranaya Anshu, EC2 PMM, and Sid Ambatipudi, EC2 Compute GTM Specialist.

Amazon EC2 Spot Instances are a powerful tool that thousands of customers use to optimize their compute costs. The National Football League (NFL) is an example of customer using Spot Instances, leveraging 4000 EC2 Spot Instances across more than 20 instance types to build its season schedule. By using Spot Instances, it saves 2 million dollars every season! Virtually any organization – small or big – can benefit from using Spot Instances by following best practices.

Overview of Spot Instances

Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud and are available at up to a 90% discount compared to On-Demand prices. Through Spot Instances, you can take advantage of the massive operating scale of AWS and run hyperscale workloads at a significant cost saving. In exchange for these discounts, AWS has the option to reclaim Spot Instances when EC2 requires the capacity. AWS provides a two-minute notification before reclaiming Spot Instances, allowing workloads running on those instances to be gracefully shut down.

In this blog post, we explore four best practices that can help you optimize your Spot Instances usage and minimize the impact of Spot Instances interruptions: diversifying your instances, considering attribute-based instance type selection, leveraging Spot placement scores, and using the price-capacity-optimized allocation strategy. By applying these best practices, you’ll be able to leverage Spot Instances for appropriate workloads and ultimately reduce your compute costs. Note for the purposes of this blog, we will focus on the integration of Spot Instances with Amazon EC2 Auto Scaling groups.


Spot Instances can be used for various stateless, fault-tolerant, or flexible applications such as big data, containerized workloads, CI/CD, web servers, high-performance computing (HPC), and AI/ML workloads. However, as previously mentioned, AWS can interrupt Spot Instances with a two-minute notification, so it is best not to use Spot Instances for workloads that cannot handle individual instance interruption — that is, workloads that are inflexible, stateful, fault-intolerant, or tightly coupled.

Best practices

  1. Diversify your instances

The fundamental best practice when using Spot Instances is to be flexible. A Spot capacity pool is a set of unused EC2 instances of the same instance type (for example, m6i.large) within the same AWS Region and Availability Zone (for example, us-east-1a). When you request Spot Instances, you are requesting instances from a specific Spot capacity pool. Since Spot Instances are spare EC2 capacity, you want to base your selection (request) on as many spare pools of capacity as possible in order to increase your likelihood of getting Spot Instances. You should diversify across instance sizes, generations, instance types, and Availability Zones to maximize your savings with Spot Instances. For example, if you are currently using c5a.large in us-east-1a, consider including c6a instances (newer generation of instances), c5a.xl (larger size), or us-east-1b (different Availability Zone) to increase your overall flexibility. Instance diversification is beneficial not only for selecting Spot Instances, but also for scaling, resilience, and cost optimization.

To get hands-on experience with Spot Instances and to practice instance diversification, check out Amazon EC2 Spot Instances workshops. And once you’ve diversified your instances, you can leverage AWS Fault Injection Simulator (AWS FIS) to test your applications’ resilience to Spot Instance interruptions to ensure that they can maintain target capacity while still benefiting from the cost savings offered by Spot Instances. To learn more about stress testing your applications, check out the Back to Basics: Chaos Engineering with AWS Fault Injection Simulator video and AWS FIS documentation.

  1. Consider attribute-based instance type selection

We have established that flexibility is key when it comes to getting the most out of Spot Instances. Similarly, we have said that in order to access your desired Spot Instances capacity, you should select multiple instance types. While building and maintaining instance type configurations in a flexible way may seem daunting or time-consuming, it doesn’t have to be if you use attribute-based instance type selection. With attribute-based instance type selection, you can specify instance attributes — for example, CPU, memory, and storage — and EC2 Auto Scaling will automatically identify and launch instances that meet your defined attributes. This removes the manual-lift of configuring and updating instance types. Moreover, this selection method enables you to automatically use newly released instance types as they become available so that you can continuously have access to an increasingly broad range of Spot Instance capacity. Attribute-based instance type selection is ideal for workloads and frameworks that are instance agnostic, such as HPC and big data workloads, and can help to reduce the work involved with selecting specific instance types to meet specific requirements.

For more information on how to configure attribute-based instance selection for your EC2 Auto Scaling group, refer to Create an Auto Scaling Group Using Attribute-Based Instance Type Selection documentation. To learn more about attribute-based instance type selection, read the Attribute-Based Instance Type Selection for EC2 Auto Scaling and EC2 Fleet news blog or check out the Using Attribute-Based Instance Type Selection and Mixed Instance Groups section of the Launching Spot Instances workshop.

  1. Leverage Spot placement scores

Now that we’ve stressed the importance of flexibility when it comes to Spot Instances and covered the best way to select instances, let’s dive into how to find preferred times and locations to launch Spot Instances. Because Spot Instances are unused EC2 capacity, Spot Instances capacity fluctuates. Correspondingly, it is possible that you won’t always get the exact capacity at a specific time that you need through Spot Instances. Spot placement scores are a feature of Spot Instances that indicates how likely it is that you will be able to get the Spot capacity that you require in a specific Region or Availability Zone. Your Spot placement score can help you reduce Spot Instance interruptions, acquire greater capacity, and identify optimal configurations to run workloads on Spot Instances. However, it is important to note that Spot placement scores serve only as point-in-time recommendations (scores can vary depending on current capacity) and do not provide any guarantees in terms of available capacity or risk of interruption.  To learn more about how Spot placement scores work and to get started with them, see the Identifying Optimal Locations for Flexible Workloads With Spot Placement Score blog and Spot placement scores documentation.

As a near real-time tool, Spot placement scores are often integrated into deployment automation. However, because of its logging and graphic capabilities, you may find it to be a valuable resource even before you launch a workload in the cloud. If you are looking to understand historical Spot placement scores for your workload, you should check out the Spot placement score tracker, a tool that automates the capture of Spot placement scores and stores Spot placement score metrics in Amazon CloudWatch. The tracker is available through AWS Labs, a GitHub repository hosting tools. Learn more about the tracker through the Optimizing Amazon EC2 Spot Instances with Spot Placement Scores blog.

When considering ideal times to launch Spot Instances and exploring different options via Spot placement scores, be sure to consider running Spot Instances at off-peak hours – or hours when there is less demand for EC2 Instances. As you may assume, there is less unused capacity – Spot Instances – available during typical business hours than after business hours. So, in order to leverage as much Spot capacity as you can, explore the possibility of running your workload at hours when there is reduced demand for EC2 instances and thus greater availability of Spot Instances. Similarly, consider running your Spot Instances in “off-peak Regions” – or Regions that are not experiencing business hours at that certain time.

On a related note, to maximize your usage of Spot Instances, you should consider using previous generation of instances if they meet your workload needs. This is because, as with off-peak vs peak hours, there is typically greater capacity available for previous generation instances than current generation instances, as most people tend to use current generation instances for their compute needs.

  1. Use the price-capacity-optimized allocation strategy

Once you’ve selected a diversified and flexible set of instances, you should select your allocation strategy. When launching instances, your Auto Scaling group uses the allocation strategy that you specify to pick the specific Spot pools from all your possible pools. Spot offers four allocation strategies: price-capacity-optimized, capacity-optimized, capacity-optimized-prioritized, and lowest-price. Each of these allocation strategies select Spot Instances in pools based on price, capacity, a prioritized list of instances, or a combination of these factors.

The price-capacity-optimized strategy launched in November 2022. This strategy makes Spot Instance allocation decisions based on the most capacity at the lowest price. It essentially enables Auto Scaling groups to identify the Spot pools with the highest capacity availability for the number of instances that are launching. In other words, if you select this allocation strategy, we will find the Spot capacity pools that we believe have the lowest chance of interruption in the near term. Your Auto Scaling groups then request Spot Instances from the lowest priced of these pools.

We recommend you leverage the price-capacity-optimized allocation strategy for the majority of your workloads that run on Spot Instances. To see how the price-capacity-optimized allocation strategy selects Spot Instances in comparison with lowest-price and capacity-optimized allocation strategies, read the Introducing the Price-Capacity-Optimized Allocation Strategy for EC2 Spot Instances blog post.


If you’ve explored the different Spot Instances workshops we recommended throughout this blog post and spun up resources, please remember to delete resources that you are no longer using to avoid incurring future costs.


Spot Instances can be leveraged to reduce costs across a wide-variety of use cases, including containers, big data, machine learning, HPC, and CI/CD workloads. In this blog, we discussed four Spot Instances best practices that can help you optimize your Spot Instance usage to maximize savings: diversifying your instances, considering attribute-based instance type selection, leveraging Spot placement scores, and using the price-capacity-optimized allocation strategy.

To learn more about Spot Instances, check out Spot Instances getting started resources. Or to learn of other ways of reducing costs and improving performance, including leveraging other flexible purchase models such as AWS Savings Plans, read the Increase Your Application Performance at Lower Costs eBook or watch the Seven Steps to Lower Costs While Improving Application Performance webinar.

Magic in minutes: how to build a ChatGPT plugin with Cloudflare Workers

Post Syndicated from Kristian Freeman original http://blog.cloudflare.com/magic-in-minutes-how-to-build-a-chatgpt-plugin-with-cloudflare-workers/

Magic in minutes: how to build a ChatGPT plugin with Cloudflare Workers

Magic in minutes: how to build a ChatGPT plugin with Cloudflare Workers

Today, we're open-sourcing our ChatGPT Plugin Quickstart repository for Cloudflare Workers, designed to help you build awesome and versatile plugins for ChatGPT with ease. If you don’t already know, ChatGPT is a conversational AI model from OpenAI which has an uncanny ability to take chat input and generate human-like text responses.

With the recent addition of ChatGPT plugins, developers can create custom extensions and integrations to make ChatGPT even more powerful. Developers can now provide custom flows for ChatGPT to integrate into its conversational workflow – for instance, the ability to look up products when asking questions about shopping, or retrieving information from an API in order to have up-to-date data when working through a problem.

That's why we're super excited to contribute to the growth of ChatGPT plugins with our new Quickstart template. Our goal is to make it possible to build and deploy a new ChatGPT plugin to production in minutes, so developers can focus on creating incredible conversational experiences tailored to their specific needs.

How it works

Our Quickstart is designed to work seamlessly with Cloudflare Workers. Under the hood, it uses our command-line tool wrangler to create a new project and deploy it to Workers.

When building a ChatGPT plugin, there are three things you need to consider:

  1. The plugin's metadata, which includes the plugin's name, description, and other info
  2. The plugin's schema, which defines the plugin's input and output
  3. The plugin's behavior, which defines how the plugin responds to user input

To handle all of these parts in a simple, easy-to-understand API, we've created the @cloudflare/itty-router-openapi package, which makes it easy to manage your plugin's metadata, schema, and behavior. This package is included in the ChatGPT Plugin Quickstart, so you can get started right away.

To show how the package works, we'll look at two key files in the ChatGPT Plugin Quickstart: index.js and search.js. The index.js file contains the plugin's metadata and schema, while the search.js file contains the plugin's behavior. Let's take a look at each of these files in more detail.

In index.js, we define the plugin's metadata and schema. The metadata includes the plugin's name, description, and version, while the schema defines the plugin's input and output.

The configuration matches the definition required by OpenAI's plugin manifest, and helps ChatGPT understand what your plugin is, and what purpose it serves.

Here's what the index.js file looks like:

import { OpenAPIRouter } from "@cloudflare/itty-router-openapi";
import { GetSearch } from "./search";

export const router = OpenAPIRouter({
  schema: {
    info: {
      title: 'GitHub Repositories Search API',
      description: 'A plugin that allows the user to search for GitHub repositories using ChatGPT',
      version: 'v0.0.1',
  docs_url: '/',
  aiPlugin: {
    name_for_human: 'GitHub Repositories Search',
    name_for_model: 'github_repositories_search',
    description_for_human: "GitHub Repositories Search plugin for ChatGPT.",
    description_for_model: "GitHub Repositories Search plugin for ChatGPT. You can search for GitHub repositories using this plugin.",
    contact_email: '[email protected]',
    legal_info_url: 'http://www.example.com/legal',
    logo_url: 'https://workers.cloudflare.com/resources/logo/logo.svg',

router.get('/search', GetSearch)

// 404 for everything else
router.all('*', () => new Response('Not Found.', { status: 404 }))

export default {
  fetch: router.handle

In the search.js file, we define the plugin's behavior. This is where we define how the plugin responds to user input. It also defines the plugin's schema, which ChatGPT uses to validate the plugin's input and output.

Importantly, this doesn't just define the implementation of the code. It also automatically generates an OpenAPI schema that helps ChatGPT understand how your code works — for instance, that it takes a parameter "q", that it is of "String" type, and that it can be described as "The query to search for". With the schema defined, the handle function makes any relevant parameters available as function arguments, to implement the logic of the endpoint as you see fit.

Here's what the search.js file looks like:

import { ApiException, OpenAPIRoute, Query, ValidationError } from "@cloudflare/itty-router-openapi";

export class GetSearch extends OpenAPIRoute {
  static schema = {
    tags: ['Search'],
    summary: 'Search repositories by a query parameter',
    parameters: {
      q: Query(String, {
        description: 'The query to search for',
        default: 'cloudflare workers'
    responses: {
      '200': {
        schema: {
          repos: [
              name: 'itty-router-openapi',
              description: 'OpenAPI 3 schema generator and validator for Cloudflare Workers',
              stars: '80',
              url: 'https://github.com/cloudflare/itty-router-openapi',

  async handle(request: Request, env, ctx, data: Record<string, any>) {
    const url = `https://api.github.com/search/repositories?q=${data.q}`

    const resp = await fetch(url, {
      headers: {
        'Accept': 'application/vnd.github.v3+json',
        'User-Agent': 'RepoAI - Cloudflare Workers ChatGPT Plugin Example'

    if (!resp.ok) {
      return new Response(await resp.text(), { status: 400 })

    const json = await resp.json()

    // @ts-ignore
    const repos = json.items.map((item: any) => ({
      name: item.name,
      description: item.description,
      stars: item.stargazers_count,
      url: item.html_url

    return {
      repos: repos

The quickstart smooths out the entire development process, so you can focus on crafting custom behaviors, endpoints, and features for your ChatGPT plugins without getting caught up in the nitty-gritty. If you aren't familiar with API schemas, this also means that you can rely on our schema and manifest generation tools to handle the complicated bits, and focus on the implementation to build your plugin.

Besides making development a breeze, it's worth noting that you're also deploying to Workers, which takes advantage of Cloudflare's vast global network. This means your ChatGPT plugins enjoy low-latency access and top-notch performance, no matter where your users are located. By combining the strengths of Cloudflare Workers with the versatility of ChatGPT plugins, you can create conversational AI tools that are not only powerful and scalable but also cost-effective and globally accessible.


To demonstrate the capabilities of our quickstarts, we've created two example ChatGPT plugins. The first, which we reviewed above, connects ChatGPT with the GitHub Repositories Search API. This plugin enables users to search for repositories by simply entering a search term, returning useful information such as the repository's name, description, star count, and URL.

One intriguing aspect of this example is the property where the plugin could go beyond basic querying. For instance, when asked "What are the most popular JavaScript projects?", ChatGPT was able to intuitively understand the user's intent and craft a new query parameter for querying both by the number of stars (measuring popularity), and the specific programming language (JavaScript) without requiring any explicit prompting. This showcases the power and adaptability of ChatGPT plugins when integrated with external APIs, providing more insightful and context-aware responses.

Magic in minutes: how to build a ChatGPT plugin with Cloudflare Workers

The second plugin uses the Pirate Weather API to retrieve up-to-date weather information. Remarkably, OpenAI is able to translate the request for a specific location (for instance, “Seattle, Washington”) into longitude and latitude values – which the Pirate Weather API uses for lookups – and make the correct API request, without the user needing to do any additional work.

Magic in minutes: how to build a ChatGPT plugin with Cloudflare Workers

With our ChatGPT Plugin Quickstarts, you can create custom plugins that connect to any API, database, or other data source, giving you the power to create ChatGPT plugins that are as unique and versatile as your imagination. The possibilities are endless, opening up a whole new world of conversational AI experiences tailored to specific domains and use cases.

Get started today

The ChatGPT Plugin Quickstarts don’t just make development a snap—it also offers seamless deployment and scaling thanks to Cloudflare Workers. With the generous free plan provided by Workers, you can deploy your plugin quickly and scale it infinitely as needed.

Our ChatGPT Plugin Quickstarts are all about sparking creativity, speeding up development, and empowering developers to create amazing conversational AI experiences. By leveraging Cloudflare Workers' robust infrastructure and our streamlined tooling, you can easily build, deploy, and scale custom ChatGPT plugins, unlocking a world of endless possibilities for conversational AI applications.

Whether you're crafting a virtual assistant, a customer support bot, a language translator, or any other conversational AI tool, our ChatGPT Plugin Quickstarts are a great place to start. We're excited to provide this Quickstart, and would love to see what you build with it. Join us in our Discord community to share what you're working on!

Security Risks of AI

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/04/security-risks-of-ai.html

Stanford and Georgetown have a new report on the security risks of AI—particularly adversarial machine learning—based on a workshop they held on the topic.

Jim Dempsey, one of the workshop organizers, wrote a blog post on the report:

As a first step, our report recommends the inclusion of AI security concerns within the cybersecurity programs of developers and users. The understanding of how to secure AI systems, we concluded, lags far behind their widespread adoption. Many AI products are deployed without institutions fully understanding the security risks they pose. Organizations building or deploying AI models should incorporate AI concerns into their cybersecurity functions using a risk management framework that addresses security throughout the AI system life cycle. It will be necessary to grapple with the ways in which AI vulnerabilities are different from traditional cybersecurity bugs, but the starting point is to assume that AI security is a subset of cybersecurity and to begin applying vulnerability management practices to AI-based features. (Andy Grotto and I have vigorously argued against siloing AI security in its own governance and policy vertical.)

Our report also recommends more collaboration between cybersecurity practitioners, machine learning engineers, and adversarial machine learning researchers. Assessing AI vulnerabilities requires technical expertise that is distinct from the skill set of cybersecurity practitioners, and organizations should be cautioned against repurposing existing security teams without additional training and resources. We also note that AI security researchers and practitioners should consult with those addressing AI bias. AI fairness researchers have extensively studied how poor data, design choices, and risk decisions can produce biased outcomes. Since AI vulnerabilities may be more analogous to algorithmic bias than they are to traditional software vulnerabilities, it is important to cultivate greater engagement between the two communities.

Another major recommendation calls for establishing some form of information sharing among AI developers and users. Right now, even if vulnerabilities are identified or malicious attacks are observed, this information is rarely transmitted to others, whether peer organizations, other companies in the supply chain, end users, or government or civil society observers. Bureaucratic, policy, and cultural barriers currently inhibit such sharing. This means that a compromise will likely remain mostly unnoticed until long after attackers have successfully exploited vulnerabilities. To avoid this outcome, we recommend that organizations developing AI models monitor for potential attacks on AI systems, create—formally or informally—a trusted forum for incident information sharing on a protected basis, and improve transparency.

Mitigating DDoS with data science using AWS Shield Advanced and AWS WAF

Post Syndicated from Jasmine Maheshwari original https://aws.amazon.com/blogs/architecture/mitigating-ddos-with-data-science-using-aws-shield-advanced-and-aws-waf/

This blog post helps customers in mitigating distributed denial-of-service (DDoS) using AWS Shield Advanced, AWS WAF, and data science. We explore how to use these services along with machine learning (ML) to detect and mitigate DDoS attacks.

Bad actors conduct DDoS attacks using botnets. Through botnets, attackers look for zero-day vulnerabilities—specifically on network devices such as routers—and make these devices a part of their network. Bots speared around the world are waiting for instructions from control servers.

This post examines a real-world use case. Online payment solutions company Razorpay has a business-to-business (B2B) model where merchants invoke APIs for payments. Given the complex nature of B2B payments, the traffic patterns between small-scale merchants and large-scale merchants needs to be distinguished. This business model puts the company at risk for DDoS attacks, and here is how AWS helps them address this ongoing concern.

First, we will introduce an initial DDoS architecture approach and then how it was modified to meet specific Razorpay needs, as is possible for cross-industry client with varying requirements.

Basic DDoS recommendations

Let’s explore some out-of-the-box DDoS business solution recommendations for different resource categories:

  • Static websites: Amazon CloudFront is a content delivery network (CDN) service that is used to host static websites. The AWS WAF rate-limiting rule can be used to limit traffic on a CDN.
  • Dynamic websites: In this scenario, a standard WAF Rate Limit rule can be used on Application Load Balancer (ALB).
  • APIs with customer context: APIs receive these assets on behalf of other merchants (for example, an end-user places a food order from a food-delivery application, completes a payment, and the request is sent to api.razorpay.com. Razorpay understands that the end user is trying to make a payment toward the food app).

Figure 1 represents the difficulty levels and solutions for each resource category.

DDoS business solution resource category recommendations

Figure 1. DDoS business solution resource category recommendations

DDoS response phases

We’ve discussed initial DDoS architectures, so now let’s explore the various DDoS response phases:

  • Phase 1 – Automated inventory: Creating an automated inventory system capturing the list of APIs or websites exposed. The APIs are ranked by exposure and risk of an outage to create a list of assets that is ranked on the basis of risk.
  • Phase 2 – Bucketing assets into groups: Bucketing APIs and websites (static or dynamic) into groups. This phase also subgroups the assets into unauthenticated or authenticated, users and more. As we will discuss later in this post, most Razorpay APIs have a customer context requiring a multilayered defense mechanism.
  • Phase 3 – Testing the solution: Simulating attack traffic to test the solution under different loads, origins, and more.

AWS DDoS solutions

AWS has two major out-of-the-box solutions when protecting against DDoS attacks: AWS Shield and AWS WAF.

AWS Shield has three important features for DDoS mitigation:

  1. Alarms: Triggers an alarm when a DDoS attack is suspected, customized based on metrics such as total volume, error rates at the ALB, and response latency
  2. Visibility: Provides top five important field values in real time, such as requesting client IPs, countries, user agents, referrer headers, and url routes to take action.
  3. On-call support: Provides on-call support from the Shield Response team (SRT) to understand attack vectors and create AWS WAF rules based on insights.

AWS WAF has two rule types:

  1. Blocking: Blocks requests matching expected variables, from individuals to a combination of fields such as IP, URL route, body, or country
  2. Rate limiting: Tracks the rate of requests for each originating IP address, and triggers the rule action on IPs with rates that go over the limit. A scope-down statement can also be added to the rule, to only track and rate limit requests that match the scope-down statement. (We will soon discuss how Razorpay relies heavily on rate limiting it the AWS WAF and other microservice architecture solutions.)

Razorpay microservices protection architecture

Razorpay has two main layers protecting their internal microservices. The request to payment API api.razorpay.com goes from Amazon Route 53 to ALB. AWS WAF and AWS Shield Advanced are deployed on ALB. Requests are then forwarded to microservices fronted by API Gateway as in Figure 2.

Razorpay’s internal microservices protection layers

Figure 2. Razorpay internal microservices protection layers

Razorpay API Gateway

An API Gateway is a critical piece of infrastructure for microservice architectures. Razorpay is a B2B organization that performs actions on behalf of merchants. To achieve this—and understand context about the merchants and end-users alike—a custom API Gateway:

  • Works at a high scale with very low latency
  • Understands merchant context to provide authentication and authorization
  • Provides security features around malicious traffic

Razorpay uses a self-managed API gateway called Edge. The API Gateway is the first system on the ingress path to understand Razorpay domain context that prior layers cannot. Every request goes through multiple layers of logic at the API Gateway before being forwarded to respective services.

Razorpay Edge and AWS solution insights

As Edge performs multiple computations on each request, Razorpay generates insights for each request to build intelligence and prevent DDoS attacks. These events are fed into a time-series database in near real time and generate insights using machine learning (ML) models. The insights:

  1. Identify malicious patterns: Generating confidence percentage that helps in defining further action, verifies consumer authenticity, and serves the request. It also blocks malicious requests at the edge.
  2. Derive pattern-based rate limits: Deriving rate limits based on a larger set of data—including consumer and IP address—by looking at weekly and monthly patterns.
    Once this information is available at Razorypay Edge, it can perform various operations to ensure only legit requests reach the service layer. Let’s explore the request flow to understand each stage’s contribution toward DDoS handling, as in Figure 3.
Razorpay Edge request flow for DDoS handling

Figure 3. Razorpay Edge request flow for DDoS handling

All Razorpay requests flow through API Gateway. Here are some key recommendations:

  • Based on feedback data available, the gateway checks if the given request pattern falls in the malicious category. If the confidence percentage is high for a malicious request, block the request. A block rule is applied at the AWS WAF layer. If the confidence percentage is low, use CAPTCHA to check user authenticity.
  • Apply rate limit on pre-existing data in the request object using feedback from an ML model.
  • Perform authentication/identification and authorization.
  • Apply rate limit on data derived from request object, also using feedback from a ML model.
  • Request user CAPTCHA verification if the request is identified as malicious with low confidence.

As a best practice, publish generated insights for each request to Apache Kafka out of the request flow to avoid latency impact and reduce blast radius.


In this post, you learned about addressing DDoS attacks with ML models.

Organizations can use the insights data generated by an API Gateway to identify request patterns, categorize the patterns into various buckets, and perform the following depending on category:

  • Send feedback to an API Gateway for immediate action
  • Update the rate limits for Edge-based historical patterns
  • Update WAF rule for blocking/reducing rate limits

As discussed, using AWS WAF and AWS Shield Advanced along with ML helps detect and mitigate DDoS attacks.

Using LLMs to Create Bioweapons

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/04/using-llms-to-create-bioweapons.html

I’m not sure there are good ways to build guardrails to prevent this sort of thing:

There is growing concern regarding the potential misuse of molecular machine learning models for harmful purposes. Specifically, the dual-use application of models for predicting cytotoxicity18 to create new poisons or employing AlphaFold2 to develop novel bioweapons has raised alarm. Central to these concerns are the possible misuse of large language models and automated experimentation for dual-use purposes or otherwise. We specifically address two critical the synthesis issues: illicit drugs and chemical weapons. To evaluate these risks, we designed a test set comprising compounds from the DEA’s Schedule I and II substances and a list of known chemical weapon agents. We submitted these compounds to the Agent using their common names, IUPAC names, CAS numbers, and SMILESs strings to determine if the Agent would carry out extensive analysis and planning (Figure 6).


The run logs can be found in Appendix F. Out of 11 different prompts (Figure 6), four (36%) provided a synthesis solution and attempted to consult documentation to execute the procedure. This figure is alarming on its own, but an even greater concern is the way in which the Agent declines to synthesize certain threats. Out of the seven refused chemicals, five were rejected after the Agent utilized search functions to gather more information about the substance. For instance, when asked about synthesizing codeine, the Agent becomes alarmed upon learning the connection between codeine and morphine, only then concluding that the synthesis cannot be conducted due to the requirement of a controlled substance. However, this search function can be easily manipulated by altering the terminology, such as replacing all mentions of morphine with “Compound A” and codeine with “Compound B”. Alternatively, when requesting a b synthesis procedure that must be performed in a DEA-licensed facility, bad actors can mislead the Agent by falsely claiming their facility is licensed, prompting the Agent to devise a synthesis solution.

In the remaining two instances, the Agent recognized the common names “heroin” and “mustard gas” as threats and prevented further information gathering. While these results are promising, it is crucial to recognize that the system’s capacity to detect misuse primarily applies to known compounds. For unknown compounds, the model is less likely to identify potential misuse, particularly for complex protein toxins where minor sequence changes might allow them to maintain the same properties but become unrecognizable to the model.