Integrate Amazon Bedrock with Amazon Redshift ML for generative AI applications

2024-11-01 Satesh Sonti

Post Syndicated from Satesh Sonti original https://aws.amazon.com/blogs/big-data/integrate-amazon-bedrock-with-amazon-redshift-ml-for-generative-ai-applications/

Amazon Redshift has enhanced its Redshift ML feature to support integration of large language models (LLMs). As part of these enhancements, Redshift now enables native integration with Amazon Bedrock. This integration enables you to use LLMs from simple SQL commands alongside your data in Amazon Redshift, helping you to build generative AI applications quickly. This powerful combination enables customers to harness the transformative capabilities of LLMs and seamlessly incorporate them into their analytical workflows.

With this new integration, you can now perform generative AI tasks such as language translation, text summarization, text generation, customer classification, and sentiment analysis on your Redshift data using popular foundation models (FMs) such as Anthropic’s Claude, Amazon Titan, Meta’s Llama 2, and Mistral AI. You can use the CREATE EXTERNAL MODEL command to point to a text-based model in Amazon Bedrock, requiring no model training or provisioning. You can invoke these models using familiar SQL commands, making it more straightforward than ever to integrate generative AI capabilities into your data analytics workflows.

Solution overview

To illustrate this new Redshift machine learning (ML) feature, we will build a solution to generate personalized diet plans for patients based on their conditions and medications. The following figure shows the steps to build the solution and the steps to run it.

The steps to build and run the solution are the following:

Load sample patients’ data
Prepare the prompt
Enable LLM access
Create a model that references the LLM model on Amazon Bedrock
Send the prompt and generate a personalized patient diet plan

Pre-requisites

An AWS account.
An Amazon Redshift Serverless workgroup or provisioned data warehouse. For setup instructions, see Creating a workgroup with a namespace or Create a sample Amazon Redshift data warehouse, respectively. The Amazon Bedrock integration feature is supported in both Amazon Redshift provisioned and serverless.
Create or update an AWS Identity and Access Management (IAM role) for Amazon Redshift ML integration with Amazon Bedrock.
Associate the IAM role to a Redshift instance.
Users should have the required permissions to create models.

Implementation

The following are the solution implementation steps. The sample data used in the implementation is for illustration only. The same implementation approach can be adapted to your specific data sets and use cases.

You can download a SQL notebook to run the implementation steps in Redshift Query Editor V2. If you’re using another SQL editor, you can copy and paste the SQL queries either from the content of this post or from the notebook.

Load sample patients’ data:

Open Amazon Redshift Query Editor V2 or another SQL editor of your choice and connect to the Redshift data warehouse.
Run the following SQL to create the patientsinfo table and load sample data.

-- Create table

CREATE TABLE patientsinfo (
pid integer ENCODE az64,
pname varchar(100),
condition character varying(100) ENCODE lzo,
medication character varying(100) ENCODE lzo
);

Download the sample file, upload it into your S3 bucket, and load the data into the patientsinfo table using the following COPY command.

-- Load sample data
COPY patientsinfo
FROM 's3://<<your_s3_bucket>>/sample_patientsinfo.csv'
IAM_ROLE DEFAULT
csv
DELIMITER ','
IGNOREHEADER 1;

Prepare the prompt:

Run the following SQL to aggregate patient conditions and medications.

SELECT
pname,
listagg(distinct condition,',') within group (order by pid) over (partition by pid) as conditions,
listagg(distinct medication,',') within group (order by pid) over (partition by pid) as medications
FROM patientsinfo

The following is the sample output showing aggregated conditions and medications. The output includes multiple rows, which will be grouped in the next step.

Build the prompt to combine patient, conditions, and medications data.

SELECT
pname || ' has ' || conditions || ' taking ' || medications as patient_prompt
FROM (
    SELECT pname, 
    listagg(distinct condition,',') within group (order by pid) over (partition by pid) as conditions,
    listagg(distinct medication,',') within group (order by pid) over (partition by pid) as medications
    FROM patientsinfo) 
GROUP BY 1

The following is the sample output showing the results of the fully built prompt concatenating the patients, conditions, and medications into single column value.

Create a materialized view with the preceding SQL query as the definition. This step isn’t mandatory; you’re creating the table for readability. Note that you might see a message indicating that materialized views with column aliases won’t be incrementally refreshed. You can safely ignore this message for the purpose of this illustration.

CREATE MATERIALIZED VIEW mv_prompts AUTO REFRESH YES
AS
(
SELECT pid,
pname || ' has ' || conditions || ' taking ' || medications as patient_prompt
FROM (
SELECT pname, pid,
listagg(distinct condition,',') within group (order by pid) over (partition by pid) as conditions,
listagg(distinct medication,',') within group (order by pid) over (partition by pid) as medications
FROM patientsinfo)
GROUP BY 1,2
)

Run the following SQL to review the sample output.

SELECT * FROM mv_prompts limit 5;

The following is a sample output with a materialized view.

Enable LLM model access:

Perform the following steps to enable model access in Amazon Bedrock.

Navigate to the Amazon Bedrock console.
In the navigation pane, choose Model Access.

Choose Enable specific models.
You must have the required IAM permissions to enable access to available Amazon Bedrock FMs.

For this illustration, use Anthropic’s Claude model. Enter Claude in the search box and select Claude from the list. Choose Next to proceed.

Review the selection and choose Submit.

Create a model referencing the LLM model on Amazon Bedrock:

Navigate back to Amazon Redshift Query Editor V2 or, if you didn’t use Query Editor V2, to the SQL editor you used to connect with Redshift data warehouse.
Run the following SQL to create an external model referencing the anthropic.claude-v2 model on Amazon Bedrock. See Amazon Bedrock model IDs for how to find the model ID.

CREATE EXTERNAL MODEL patient_recommendations
FUNCTION patient_recommendations_func
IAM_ROLE '<<provide the arn of IAM role created in pre-requisites>>'
MODEL_TYPE BEDROCK
SETTINGS (
    MODEL_ID 'anthropic.claude-v2',
    PROMPT 'Generate personalized diet plan for following patient:');

Send the prompt and generate a personalized patient diet plan:

Run the following SQL to pass the prompt to the function created in the previous step.

SELECT patient_recommendations_func(patient_prompt) 
FROM mv_prompts limit 2;

You will get the output with the generated diet plan. You can copy the cells and paste in a text editor or export the output to view the results in a spreadsheet if you’re using Redshift Query Editor V2.

You will need to expand the row size to see the complete text.

Additional customization options

The previous example demonstrates a straightforward integration of Amazon Redshift with Amazon Bedrock. However, you can further customize this integration to suit your specific needs and requirements.

Inference functions as leader-only functions: Amazon Bedrock model inference functions can run as leader node-only when the query doesn’t reference tables. This can be helpful if you want to quickly ask an LLM a question.

You can run following SQL with no FROM clause. This will run as leader-node only function because it doesn’t need data to fetch and pass to the model.

SELECT patient_recommendations_func('Generate diet plan for pre-diabetes');

This will return a generic 7-day diet plan for pre-diabetes. The following figure is an output sample generated by the preceding function call.

Inference with UNIFIED request type models: In this mode, you can pass additional optional parameters along with input text to customize the response. Amazon Redshift passes these parameters to the corresponding parameters for the Converse API.

In the following example, we’re setting the temperature parameter to a custom value. The parameter temperature affects the randomness and creativity of the model’s outputs. The default value is 1 (the range is 0–1.0).

SELECT patient_recommendations_func(patient_prompt,object('temperature', 0.2)) 
FROM mv_prompts
WHERE pid=101;

The following is a sample output with a temperature of 0.2. The output includes recommendations to drink fluids and avoid certain foods.

Regenerate the predictions, this time setting the temperature to 0.8 for the same patient.

SELECT patient_recommendations_func(patient_prompt,object('temperature', 0.8)) 
FROM mv_prompts
WHERE pid=101;

The following is a sample output with a temperature of 0.8. The output still includes recommendations on fluid intake and foods to avoid, but is more specific in those recommendations.

Note that the output won’t be the same every time you run a particular query. However, we want to illustrate that the model behavior is influenced by changing parameters.

Inference with RAW request type models: CREATE EXTERNAL MODEL supports Amazon Bedrock-hosted models, even those that aren’t supported by the Amazon Bedrock Converse API. In those cases, the request_type needs to be raw and the request needs to be constructed during inference. The request is a combination of a prompt and optional parameters.

Make sure that you enable access to the Titan Text G1 – Express model in Amazon Bedrock before running the following example. You should follow the same steps as described previously in Enable LLM model access to enable access to this model.

-- Create model with REQUEST_TYPE as RAW

CREATE EXTERNAL MODEL titan_raw
FUNCTION func_titan_raw
IAM_ROLE '<<provide the arn of IAM role created in pre-requisites>>'
MODEL_TYPE BEDROCK
SETTINGS (
MODEL_ID 'amazon.titan-text-express-v1',
REQUEST_TYPE RAW,
RESPONSE_TYPE SUPER);

-- Need to construct the request during inference.
SELECT func_titan_raw(object('inputText', 'Generate personalized diet plan for following: ' || patient_prompt, 'textGenerationConfig', object('temperature', 0.5, 'maxTokenCount', 500)))
FROM mv_prompts limit 1;

The following figure shows the sample output.

Fetch run metrics with RESPONSE_TYPE as SUPER: If you need more information about an input request such as total tokens, you can request the RESPONSE_TYPE to be super when you create the model.

-- Create Model specifying RESPONSE_TYPE as SUPER.

CREATE EXTERNAL MODEL patient_recommendations_v2
FUNCTION patient_recommendations_func_v2
IAM_ROLE '<<provide the arn of IAM role created in pre-requisites>>'
MODEL_TYPE BEDROCK
SETTINGS (
MODEL_ID 'anthropic.claude-v2',
PROMPT 'Generate personalized diet plan for following patient:',
RESPONSE_TYPE SUPER);

-- Run the inference function
SELECT patient_recommendations_func_v2(patient_prompt)
FROM mv_prompts limit 1;

The following figure shows the output, which includes the input tokens, output tokens, and latency metrics.

Considerations and best practices

There are a few things to keep in mind when using the methods described in this post:

Inference queries might generate throttling exceptions because of the limited runtime quotas for Amazon Bedrock. Amazon Redshift retries requests multiple times, but queries can still be throttled because throughput for non-provisioned models might be variable.
The throughput of inference queries is limited by the runtime quotas of the different models offered by Amazon Bedrock in different AWS Regions. If you find that the throughput isn’t enough for your application, you can request a quota increase for your account. For more information, see Quotas for Amazon Bedrock.
If you need stable and consistent throughput, consider getting provisioned throughput for the model that you need from Amazon Bedrock. For more information, see Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock.
Using Amazon Redshift ML with Amazon Bedrock incurs additional costs. The cost is model- and Region-specific and depends on the number of input and output tokens that the model will process. For more information, see Amazon Bedrock Pricing.

Cleanup

To avoid incurring future charges, delete the Redshift Serverless instance or Redshift provisioned data warehouse created as part of the prerequisite steps.

Conclusion

In this post, you learned how to use the Amazon Redshift ML feature to invoke LLMs on Amazon Bedrock from Amazon Redshift. You were provided with step-by-step instructions on how to implement this integration, using illustrative datasets. Additionally, read about various options to further customize the integration to help meet your specific needs. We encourage you to try Redshift ML integration with Amazon Bedrock and share your feedback with us.

About the Authors

Satesh Sonti is a Sr. Analytics Specialist Solutions Architect based out of Atlanta, specialized in building enterprise data services, data warehousing, and analytics solutions. He has over 19 years of experience in building data assets and leading complex data services for banking and insurance clients across the globe.

Nikos Koulouris is a Software Development Engineer at AWS. He received his PhD from University of California, San Diego and he has been working in the areas of databases and analytics.

Unlock the potential of your supply chain data and gain actionable insights with AWS Supply Chain Analytics

2024-10-31 Donnie Prakoso

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/unlock-the-potential-of-your-supply-chain-data-and-gain-actionable-insights-with-aws-supply-chain-analytics/

Today, we’re announcing the general availability of AWS Supply Chain Analytics powered by Amazon QuickSight. This new feature helps you to build custom report dashboards using your data in AWS Supply Chain. With this feature, your business analysts or supply chain managers can perform custom analyses, visualize data, and gain actionable insights for your supply chain management operations.

Here’s how it looks:

AWS Supply Chain Analytics leverages the AWS Supply Chain data lake and provides Amazon QuickSight embedded authoring tools directly into the AWS Supply Chain user interface. This integration provides you with a unified and configurable experience for creating custom insights, metrics, and key performance indicators (KPIs) for your operational analytics.

In addition, AWS Supply Chain Analytics provides prebuilt dashboards that you can use as-is or modify based on your needs. At launch, you will have the following prebuilt dashboards:

Plan-Over-Plan Variance: Presents a comparison between two demand plans, showcasing variances in both units and values across key dimensions such as product, site, and time periods.
Seasonality Analytics: Presents a year-over-year view of demand, illustrating trends in average demand quantities and highlighting seasonality patterns through heatmaps at both monthly and weekly levels.

Let’s get started
Let me walk you through the features of AWS Supply Chain Analytics.

The first step is to enable AWS Supply Chain Analytics. To do this, navigate to Settings, then select Organizations and choose Analytics. Here, I can Enable data access for Analytics.

Now I can edit existing roles or create a new role with analytics access. To learn more, visit User permission roles.

Once this feature is enabled, when I log in to AWS Supply Chain I can access the AWS Supply Chain Analytics feature by selecting either the Connecting to Analytics card or Analytics on the left navigation menu.

Here, I have an embedded Amazon QuickSight interface ready for me to use. To get started, I navigate to Prebuilt Dashboards.

Then, I can select the prebuilt dashboards I need in the Supply Chain Function dropdown list:

What I like the most about this prebuilt dashboards is I can easily get started. AWS Supply Chain Analytics will prepare all the datasets, analysis, and even a dashboard for me. I select Add to begin.

Then, I navigate to the dashboard page, and I can see the results. I can also share this dashboard with my team, which improves the collaboration aspect.

If I need to include other datasets for me to build a custom dashboard, I can navigate to Datasets and select New dataset.

Here, I have AWS Supply Chain data lake as an existing dataset for me to use.

Next, I need to select Create dataset.

Then, I can select a table that I need to include into my analysis. On the Data section, I can see all available fields. All data sets that start with asc_ are generated by AWS Supply Chain, such as data from Demand Planning, Insights, Supply Planning, and others.

I can also find all the datasets I have ingested into AWS Supply Chain. To learn more on data entities, visit the AWS Supply Chain documentation page. One thing to note here is if I have not ingested data into AWS Supply Chain Data Lake, I need to ingest data before using AWS Supply Chain Analytics. To learn how to ingest data into the data lake, visit the data lake page.

At this stage, I can start my analysis.

Now available
AWS Supply Chain Analytics is now generally available in all regions where AWS Supply Chain is offered. Give it a try to experience and transform your operations with the AWS Supply Chain Analytics.

Happy building,
— Donnie

October project goals update (Rust Blog)

2024-10-31 jzb

Post Syndicated from jzb original https://lwn.net/Articles/996585/

The Rust blog has an update
on its progress on some of its project
goals. One of the project’s flagship
goals is to resolve
the biggest blockers to Linux building on stable Rust:

Finally, we have been finding an increasing number of stabilization
requests at the compiler level, and so @wesleywiser and @davidtwco
from the compiler team have started attending meetings to create a
faster response. One of the results of that collaboration is RFC #3716,
authored by Alice Ryhl, which proposes a method to manage compiler
flags that modify the target ABI. Our previous approach has been to
create distinct targets for each combination of flags, but the number
of flags needed by the kernel make that impractical. Authoring the RFC
revealed more such flags than previously recognized, including those
that modify LLVM behavior.

Introducing an enhanced local IDE experience for AWS Lambda developers

2024-10-31 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-an-enhanced-local-ide-experience-for-aws-lambda-developers/

AWS Lambda is introducing an enhanced local IDE experience to simplify Lambda-based application development. The new features help developers to author, build, debug, test, and deploy Lambda applications more efficiently in their local IDE when using Visual Studio Code (VS Code).

Overview

The IDE experience is part of the AWS Toolkit for Visual Studio Code. A new guided walkthrough helps developers set up their local environment and install required tools. The toolkit includes a set of sample applications which show you how to iterate on your code locally and in the cloud. You can configure and save build settings to speed up application builds. Generate a configuration file to set up the debugging environment for VS Code to attach and launch the step-through debugger. Iterate faster by choosing to sync local application changes quickly to the cloud or perform a full application deploy. Test functions locally and in the cloud and create and share test events to speed up local and cloud testing. There are quick action buttons for build, deploy to cloud, and local or remote invoke. The toolkit integrates with AWS Infrastructure Composer, providing a visual application building experience directly from the IDE.

Using the new features

Installing the extension

To use the updated IDE experience, ensure you have the AWS Toolkit minimum version 3.31.0 installed as a VS Code Extension.

The AWS Toolkit now includes an additional section called Application Builder within the AWS extension side-bar. This allows you to view template resources and create, build, debug, and test serverless applications.

Using Application Builder for existing applications

You can open an existing local application template using Open Folder.

Lambda’s enhanced in-console editing experience allows you to download existing function code and an AWS Serverless Application Model (AWS SAM) template. This allows you to start in the console and more easily move to using infrastructure as code, which is a serverless best practice.

Using the guided walkthrough

The guided walkthrough helps you install dependencies, select an application template, and explains how to use Application Builder to iterate locally and deploy to the cloud.

Choose Open Walkthrough which opens the walkthrough.
Complete installation takes you through a wizard to install required dependencies and select application templates.

The wizard provides download links to install the three dependencies:

If you have installed the dependencies, selecting the links recognizes the installations.

Select Choose your application template, which allows you to create example applications in VS Code.

The Iterate locally tile provides guidance on how to use Application Builder to build and invoke the function, and how to view the results.

Deploy to the cloud provides a link to Configure credentials and explains how to deploy your function to the cloud, remote invoke from your IDE, and view the results.

Creating an application from the samples

The following steps show how to create a function locally from an included template. You build the code artifact, locally test and debug, deploy, and remotely invoke and view results and logs, all without leaving your IDE.

Navigate back to Choose your application template.
New template with visual builder allows you to use Infrastructure Composer to create a new application using a visual canvas.
See more application examples provides additional sample applications across a number of managed runtimes.

There are also two specific example applications to explore Lambda functionality.

Rest API: Learn how to build a synchronous Lambda function behind an API.
File processing.: Learn how to build an asynchronous Amazon S3 file processing application.

Building a synchronous Rest API application

Select Rest API and chose Initialize your project.
Select a language runtime. Select Python for this example.
Open the file explorer, create a folder to download the example application and choose Create Project.

Application Builder downloads the application. This includes the function code hello_world\app.py, with dependencies in requirements.txt, an AWS SAM template, template.yaml file, and an example event trigger, event.json. A README.md file explains the application structure and provides build and deploy instructions.

The Application Builder section populates with the template resources.

The icons provide shortcuts to view, build, and deploy the application.

You can also use the Command Pallete to initiate the AWS SAM commands.

Selecting the Open Template File icon opens the AWS SAM template in Infrastructure Composer.

View the application resources and select Details to edit the template using the visual canvas.

Navigate to the function resource and select Open Function Handler to show the function code.

Building the application

The build step helps you build artifacts from the files in your application project directory.

Select the Build SAM template icon.
Specify build flags allows you to configure AWS SAM builds settings.

Select build settings particular to your configuration. Cached and parallel are useful to speed up future builds. Use container builds your function in a Lambda-like container. This allows you to build applications without having the language runtime and build tools installed locally.

Save parameters adds the default build options in samconfig.toml.

version = 0.1
[default.build.parameters]
template_file = "c:\\Code\\lambda-dx\\Rest-API\\template.yaml"
cached = true
parallel = true
use_container = true

AWS SAM builds the application. It downloads the build container image, installs the dependencies, and copies the function code.

Press any key to close the additional terminal.

Iterate locally: invoke and debug

You can locally invoke and debug your serverless application before uploading it to the cloud. This helps you to test the logic of your function faster. Step-through debugging allows you to identify and fix issues in your application one instruction at a time in your local environment.

Local invoke

In the Application Builder section, navigate to the function and select Local Invoke and Debug Configuration.

Initiating Local Invoke and Debug Configuration

This brings up another window which allows you to configure how to invoke the function locally and set up a debug configuration.

Viewing Local Invoke and Debug Configuration Options

You can create sample event payloads to test your function. Select an event provides a list of common trigger event payloads you can use and customize.

Selecting an example event template

This example application has an included sample event.

Select Local file and choose the events\event.json file.
Select the Invoke button.

This builds the application and locally invokes the function within a Lambda emulation environment, using the event input file.

View the function output within the IDE Output pane.

Viewing function output

Local debugging

You can also debug the function locally using VS Code’s built-in debugger.

Add a breakpoint to the function code.

Adding a breakpoint to the function code

Select the Invoke button again.

This locally invokes the function and attaches a debugger to the Lambda emulation environment.

The debugger stops at the breakpoint and you can view the function variables and call stack.

Viewing step through debugging

Use the VS Code debugger icons to step through the code.

Using VS Code debugger icons to step through the code.

In the Local Invoke and Debug Configuration panel. Chose Save Debug Config.
Choose Add Local Invoke and Debug Configuration.

Saving debug configuration

Enter a debug configuration name which creates a launch.json file and adds the debug configuration.

Naming debug configuration

You can create and save multiple debug configurations for different scenarios. See the AWS SAM documentation for more launch.json configuration options.

Once you save the debug configuration, you can use VS Code’s Run and Debug panel and select which debug configuration to run.

Using the Run and Debug panel

Deploying the application

Navigate to the Application Builder section and chose the Deploy SAM Application icon.

Selecting Deploy SAM Application icon

AWS SAM provides two deployment options:

Sync uses AWS SAM sync to perform an initial CloudFormation deploy and then allows for quick syncing of your application code, which allows for rapid prototyping. Use this for development environments only, as it doesn’t do a full CloudFormation deploy on code changes.
Deploy does a full CloudFormation deploy, which is preferred for non-quick development environments.

Viewing AWS SAM deployment options

Select Sync.
Select Specify required parameters and save as defaults.

Specifying SAM sync parameters

Select a Region to deploy the stack and enter a stack name. It is good practice to specify that this is a dev stack to avoid confusion when using the Deploy option.

Entering dev stack name

Select an existing S3 bucket to store the artifacts, or create a new one.

Selecting S3 bucket

Specify the Sync parameters. Ensure you select Watch as this automatically watches for code changes and quickly syncs code changes to the Lambda service

Setting sync parameters

AWS SAM sync does an initial CloudFormation deploy to build the resources and then waits for code changes.
Make a change to the handler file code and save the file,

Amending code

This performs a quick sync which reduces the time to test in the cloud.

Quickly syncing code

You can use the Deploy option to deploy a non-quick sync test version, amending the stack name to differentiate it from the dev stack.

Naming test version stack

Remote invoke

You can invoke the function in the cloud from your IDE. This allows you to test functionality without having to mock security, external services, or other environment variables.

Once the application is deployed, Application Builder detects changes to samconfig.toml and template.yaml, it updates the resources list with the cloud resources.

Viewing cloud resources

You can browse directly to the CloudFormation stack to view resources.

Browsing to CloudFormation stack

Selecting the function provides quick link functionality, which includes function details and a link directly to the Lambda console for the function.

Viewing function quick link options

Select Invoke in cloud.
Select the same local event file for the local invoke.

Selecting local file for remote invoke

Choose Remote Invoke.

The function invokes in the cloud using the local test event and displays the remote invoke results in the local IDE Output pane.

Viewing remote invoke results

Name and save the local event file as a remote event which becomes available in the Lambda console.

Saving remote test event

Viewing logs

You can fetch the Amazon CloudWatch Log streams generated by your Lambda function in the IDE.

Select the Search Logs icon.

Selecting Search Logs icon

You can optionally filter the results.

Optionally filtering log results

Conclusion

Lambda is introducing an enhanced local IDE experience to simplify the development of Lambda-based applications using the VS Code IDE and AWS Toolkit. This streamlines the code-test-deploy-debug cycle. A guided walkthrough helps set up your local development environment and provides sample applications to explore Lambda functionality. You can then build, debug, test, and deploy Lambda applications using icon shortcuts and the Command Pallette. This allows you to more easily iterate on your Lambda-based applications without switching between multiple interfaces.

For more serverless learning resources, visit Serverless Land.

Accused of Body Snatching

2024-10-31 The History Guy: History Deserves to Be Remembered

Post Syndicated from The History Guy: History Deserves to Be Remembered original https://www.youtube.com/watch?v=jbO57LLdTOw

A new AWS CDK L2 construct for Amazon CloudFront Origin Access Control (OAC)

2024-10-31 Josh DeMuth

Post Syndicated from Josh DeMuth original https://aws.amazon.com/blogs/devops/a-new-aws-cdk-l2-construct-for-amazon-cloudfront-origin-access-control-oac/

Recently, we launched a new AWS Cloud Development Kit (CDK) L2 construct for Amazon CloudFront Origin Access Control (OAC). This construct simplifies the configuration and maintenance of securing Amazon Simple Storage Service (Amazon S3) CloudFront origins with CDK. Launched in 2022, OAC is the recommended way to secure your CloudFront distributions due to additional security features compared to the legacy Origin Access Identity (OAI). This new construct makes it easier for you to use the latest origin access best practices to build and manage your CloudFront distributions.

CDK is an open-source software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation. A primary part of CDK is the AWS CDK Construct Library which is a collection of pre-written constructs. Constructs are the basic building blocks of CDK applications. They help reduce the complexity required to define and integrate AWS services together.

There are different levels of constructs, starting with Level 1 (L1) which map directly to a single CloudFormation resource and offer no abstraction. L1 constructs are auto-generated, which means you can build any CloudFormation resource using CDK. The power of the CDK starts with Level 2 (L2) and higher constructs. L2 constructs, also known as curated constructs, are developed by the CDK team and provide a higher-level abstraction through an intuitive intent-based API. You can read more about constructs and their benefits in the CDK user guide.

In this post we’ll explore:

The reasoning behind the creation of a new L2 construct for OAC
How to use the new OAC construct
How to migrate from the legacy OAI construct to the new OAC construct

Background

Amazon CloudFront is a global content delivery network that reduces latency by delivering data to viewers anywhere in the world. CloudFront can connect to different types of locations or origins, such as S3, AWS Lambda function URLs, and custom origins. A full list of supported origins can be found in the CloudFront user guide.

At launch, the new L2 construct supports OAC with S3 origins. Using OAC with S3 allows you to keep your S3 bucket private, yet accessible, through CloudFront. This forces users to access content only through CloudFront where other security features can be applied, like AWS WAF.

There are two ways to restrict buckets to only CloudFront, using OAI (legacy) or OAC (recommended). Both OAI and OAC allow you to secure your buckets, but OAC offers additional benefits, including support for:

Any new AWS Regions launched after December 2022
Amazon S3 server-side encryption with AWS Key Management Service (SSE-KMS)
Dynamic requests (PUT and DELETE) to Amazon S3
Enhanced security practices like short term credentials, frequent credential rotations, and resource-based policies

Prior to this release, customers had to piece together L1 constructs as well as use escape hatches in order to implement OAC.

The introduction of the new L2 construct simplifies the process by enhancing abstraction and reducing complexity. It makes it easy to use OAC while still offering the flexibility to customize all existing properties available by building with L1’s.

Let’s see the construct in action!

Using the L2

We modified the existing CloudFront Origins L2 to add OAC support. With this change, the S3Origin class has been deprecated in favor of S3BucketOrigin for standard S3 origins, and S3StaticWebsiteOrigin for static website S3 origins.

Using OAC with a standard S3 origin is as simple as passing your bucket as a parameter to the new S3BucketOrigin class using the withOriginAccessControl method.

In the example below, we define a private bucket and let the new L2 construct handle all the OAC configuration for us.

const s3bucket = new s3.Bucket(this, "myBucket", {
    bucketName: `${cdk.Stack.of(this).stackName.toLowerCase()}-oacbucket`,
    blockPublicAccess: s3.BlockPublicAccess.BLOCK_ALL,
    accessControl: s3.BucketAccessControl.PRIVATE,
    enforceSSL: true,
});

const distribution = new cloudfront.Distribution(this, "myDist", {
    defaultBehavior: {
        origin: origins.S3BucketOrigin.withOriginAccessControl(s3bucket),
    }
});

This code defines a S3 bucket configured to block all public access, enforces SSL, and uses private access control. It also defines a CloudFront distribution with the S3 bucket as its origin using the default OAC settings from the OAC construct. By default, the signing behavior is set to “always” and the signing protocol to “sigv4” as is shown below:

Showing the default signing behavior settings for Amazon CloudFront when using the default properties for the L2 construct

Figure 1 – Default OAC Settings for the S3 Origin

In typical CDK fashion, the L2 provides sane defaults out-of-the-box but allows you to customize. Some examples of customizing with the L2, include:

Changing the signing behavior
Granting permissions for dynamic requests (write/delete access)
Migrating to the new construct but continuing to use OAI

S3 Origin with Customer Managed KMS

It is a recommended security best practice to encrypt S3 objects at rest. As detailed in the S3 user guide, using SSE-KMS gives you additional flexibility to meet encryption-related compliance requirements.

If using SSE-KMS, CloudFront must have permission to at least decrypt objects using your AWS KMS key. With the new changes, simply configure your bucket to use an AWS KMS key and the construct will take care of the permissions updates.

The following example shows how to create an SSE-KMS encrypted S3 bucket and use it as a CloudFront origin with OAC:

Create an AWS KMS Key.
Create the S3 Bucket with the encryptionKey property set to the AWS KMS key and encryption as KMS.
Create the CloudFront distribution and use the new S3BucketOrigin class with OAC.

const kmsKey = new kms.Key(this, "myKey");

const myBucket = new s3.Bucket(this, 'myEncryptedBucket', {
  encryption: s3.BucketEncryption.KMS,
  encryptionKey: kmsKey,
  objectOwnership: s3.ObjectOwnership.BUCKET_OWNER_ENFORCED,
}); 

new cloudfront.Distribution(this, 'myDist', {
  defaultBehavior: { 
    origin: origins.S3BucketOrigin.withOriginAccessControl(myBucket) 
  },
});

Due to circular dependencies between the bucket, the KMS key, and the CloudFront distribution, when we synthesize the above code, a warning message similar to the following may appear:

To avoid a circular dependency between the KMS key, Bucket, and Distribution during the initial deployment, a wildcard is used in the Key policy condition to match all Distribution IDs.

After the initial deployment, it is recommended to further restrict the policy to adhere to security best practices. Here is an example of how to use an escape hatch to update the policy:

Note: update the existing KMS Key policy to include the statements in scopedDownKeyPolicy

const scopedDownKeyPolicy = {
    Version: "2012-10-17",
    Statement: [
        {
            Effect: "Allow",
            Principal: {
                AWS: `arn:aws:iam::${this.account}:root`,
            },
            Action: "kms:*",
            Resource: "*",
        },
        {
            Effect: "Allow",
            Principal: {
                Service: "cloudfront.amazonaws.com",
            },
            Action: ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*"],
            Resource: "*",
            Condition: {
                StringEquals: {
                    "AWS:SourceArn": `arn:aws:cloudfront::${this.account}:distribution/<Distribution ID>`//replace <Distribution ID> with the ID of the deployed CloudFront Distribution 
                },
            },
        },
    ],
};

const cfnKey = kmsKey.node.defaultChild as kms.CfnKey;
cfnKey.addOverride("Properties.KeyPolicy", scopedDownKeyPolicy);

For detailed instructions on how to update the AWS KMS key policy, please refer to the “Scoping down the key policy” section in the CDK API docs.

Considerations when migrating to the new construct

If you have an existing OAI implementation using the now-deprecated S3Origin class, switching to OAC has potential for application downtime. CloudFront could temporarily lose access to the S3 bucket while the CloudFormation is deploying.

To avoid downtime, it is recommended to perform the upgrade across multiple deployments. This will explicitly give CloudFront permissions to both OAC and OAI in the S3 bucket policy before the migration is performed.

At a high level, the three steps are:

Deployment 1: update the bucket policy to explicitly allow CloudFront access
Deployment 2: switch to the new construct
Deployment 3: optionally remove the code that updated the bucket policy in step 1

For detailed instructions, see the Migrating from OAI to OAC section of the CDK API docs.

Conclusion

In this post, we introduced the new AWS CDK L2 construct for Amazon CloudFront Origin Access Control (OAC), highlighting the advantages of using OAC instead of OAI to secure your Amazon S3 CloudFront origins. We showcased practical implementations of the new construct, focusing on using defaults of the L2 construct along with how to customize for your use case.

To summarize, the new L2 construct and OAC offers these benefits:

Simplified configuration: the L2 construct simplifies the process of configuring OAC in your CDK application to set up secure access controls between CloudFront and S3 buckets
Using SSE-KMS encryption: the L2 construct will automatically add the IAM policy statement to the AWS KMS key to allow access to OAC
Ability to customize: the L2 construct offers properties to override defaults, like changing the signing behavior
Easily migrate from OAI to OAC: the L2 construct offers options to migrate from OAI to OAC based on your application’s downtime tolerance

At launch, the construct supports an Amazon S3 origin. If there is a particular origin that you are looking to see added to the construct library for OAC, please submit a feature request issue in the aws-cdk GitHub repo. For example, if you are interested in Lambda support for OAC, add your feedback to this feature request.

If you’re new to CDK and want to get started, we highly recommend checking out the CDK documentation and the CDK workshop.

Amazon Aurora PostgreSQL Limitless Database is now generally available

2024-10-31 Channy Yun (윤석찬)

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/amazon-aurora-postgresql-limitless-database-is-now-generally-available/

Today, we are announcing the general availability of Amazon Aurora PostgreSQL Limitless Database, a new serverless horizontal scaling (sharding) capability of Amazon Aurora. With Aurora PostgreSQL Limitless Database, you can scale beyond the existing Aurora limits for write throughput and storage by distributing a database workload over multiple Aurora writer instances while maintaining the ability to use it as a single database.

When we previewed Aurora PostgreSQL Limitless Database at AWS re:Invent 2023, I explained that it uses a two-layer architecture consisting of multiple database nodes in a DB shard group – either routers or shards to scale based on the workload.

Routers – Nodes that accept SQL connections from clients, send SQL commands to shards, maintain system-wide consistency, and return results to clients.
Shards – Nodes that store a subset of tables and full copies of data, which accept queries from routers.

There will be three types of tables that contain your data: sharded, reference, and standard.

Sharded tables – These tables are distributed across multiple shards. Data is split among the shards based on the values of designated columns in the table, called shard keys. They are useful for scaling the largest, most I/O-intensive tables in your application.
Reference tables – These tables copy data in full on every shard so that join queries can work faster by eliminating unnecessary data movement. They are commonly used for infrequently modified reference data, such as product catalogs and zip codes.
Standard tables – These tables are like regular Aurora PostgreSQL tables. Standard tables are all placed together on a single shard so join queries can work faster by eliminating unnecessary data movement. You can create sharded and reference tables from standard tables.

Once you have created the DB shard group and your sharded and reference tables, you can load massive amounts of data into Aurora PostgreSQL Limitless Database and query data in those tables using standard PostgreSQL queries. To learn more, visit Limitless Database architecture in the Amazon Aurora User Guide.

Getting started with Aurora PostgreSQL Limitless Database
You can get started in the AWS Management Console and AWS Command Line Interface (AWS CLI) to create a new DB cluster that uses Aurora PostgreSQL Limitless Database, add a DB shard group to the cluster, and query your data.

1. Create an Aurora PostgreSQL Limitless Database Cluster
Open the Amazon Relational Database Service (Amazon RDS) console and choose Create database. For Engine options, choose Aurora (PostgreSQL Compatible) and Aurora PostgreSQL with Limitless Database (Compatible with PostgreSQL 16.4).

For Aurora Limitless Database, enter a name for your DB shard group and values for minimum and maximum capacity measured by Aurora capacity units (ACUs) across all routers and shards. The initial number of routers and shards in a DB shard group is determined by this maximum capacity. Aurora PostgreSQL Limitless Database scales a node up to a higher capacity when its current utilization is too low to handle the load. It scales the node down to a lower capacity when its current capacity is higher than needed.

For DB shard group deployment, choose whether to create standbys for the DB shard group: no compute redundancy, one compute standby in a different Availability Zone, or two compute standbys in two different Availability Zones.

You can set the remaining DB settings to what you prefer and choose Create database. After the DB shard group are created, they’re displayed on the Databases page.

You can connect, reboot, or delete a DB shard group, or you can change the capacity, split a shard, or add a router in the DB shard group. To learn more, visit Working with DB shard groups in the Amazon Aurora User Guide.

2. Create Aurora PostgreSQL Limitless Database tables
As shared previously, Aurora PostgreSQL Limitless Database has three table types: sharded, reference, and standard. You can convert standard tables to sharded or reference tables to distribute or replicate existing standard tables or create new sharded and reference tables.

You can use variables to create sharded and reference tables by setting the table creation mode. The tables that you create will use this mode until you set a different mode. The following examples show how to use these variables to create sharded and reference tables.

For example, create a sharded table named items with a shard key composed of the item_id and item_cat columns.

SET rds_aurora.limitless_create_table_mode='sharded';
SET rds_aurora.limitless_create_table_shard_key='{"item_id", "item_cat"}';
CREATE TABLE items(item_id int, item_cat varchar, val int, item text);

Now, create a sharded table named item_description with a shard key composed of the item_id and item_cat columns and collocate it with the items table.

SET rds_aurora.limitless_create_table_collocate_with='items';
CREATE TABLE item_description(item_id int, item_cat varchar, color_id int, ...);

You can also create a reference table named colors.

SET rds_aurora.limitless_create_table_mode='reference';
CREATE TABLE colors(color_id int primary key, color varchar);

You can find information about Limitless Database tables by using the rds_aurora.limitless_tables view, which contains information about tables and their types.

postgres_limitless=> SELECT * FROM rds_aurora.limitless_tables;

 table_gid | local_oid | schema_name | table_name  | table_status | table_type  | distribution_key
-----------+-----------+-------------+-------------+--------------+-------------+------------------
         1 |     18797 | public      | items       | active       | sharded     | HASH (item_id, item_cat)
         2 |     18641 | public      | colors      | active       | reference   | 

(2 rows)

You can convert standard tables into sharded or reference tables. During the conversion, data is moved from the standard table to the distributed table, then the source standard table is deleted. To learn more, visit Converting standard tables to limitless tables in the Amazon Aurora User Guide.

3. Query Aurora PostgreSQL Limitless Database tables
Aurora PostgreSQL Limitless Database is compatible with PostgreSQL syntax for queries. You can query your Limitless Database using psql or any other connection utility that works with PostgreSQL. Before querying tables, you can load data into Aurora Limitless Database tables by using the COPY command or by using the data loading utility.

To run queries, connect to the cluster endpoint, as shown in Connecting to your Aurora Limitless Database DB cluster. All PostgreSQL SELECT queries are performed on the router to which the client sends the query and shards where the data is located.

To achieve a high degree of parallel processing, Aurora PostgreSQL Limitless Database utilizes two querying methods: single-shard queries and distributed queries, which determines whether your query is single-shard or distributed and processes the query accordingly.

Single-shard query – A query where all the data needed for the query is on one shard. The entire operation can be performed on one shard, including any result set generated. When the query planner on the router encounters a query like this, the planner sends the entire SQL query to the corresponding shard.
Distributed query – A query run on a router and more than one shard. The query is received by one of the routers. The router creates and manages the distributed transaction, which is sent to the participating shards. The shards create a local transaction with the context provided by the router, and the query is run.

For examples of single-shard queries, you use the following parameters to configure the output from the EXPLAIN command.

postgres_limitless=> SET rds_aurora.limitless_explain_options = shard_plans, single_shard_optimization;
SET

postgres_limitless=> EXPLAIN SELECT * FROM items WHERE item_id = 25;

                     QUERY PLAN
--------------------------------------------------------------
 Foreign Scan  (cost=100.00..101.00 rows=100 width=0)
   Remote Plans from Shard postgres_s4:
         Index Scan using items_ts00287_id_idx on items_ts00287 items_fs00003  (cost=0.14..8.16 rows=1 width=15)
           Index Cond: (id = 25)
 Single Shard Optimized
(5 rows)

To learn more about the EXPLAIN command, see EXPLAIN in the PostgreSQL documentation.

For examples of distributed queries, you can insert new items named Book and Pen into the items table.

postgres_limitless=> INSERT INTO items(item_name)VALUES ('Book'),('Pen')

This makes a distributed transaction on two shards. When the query runs, the router sets a snapshot time and passes the statement to the shards that own Book and Pen. The router coordinates an atomic commit across both shards, and returns the result to the client.

You can use distributed query tracing, a tool to trace and correlate queries in PostgreSQL logs across Aurora PostgreSQL Limitless Database. To learn more, visit Querying Limitless Database in the Amazon Aurora User Guide.

Some SQL commands aren’t supported. For more information, see Aurora Limitless Database reference in the Amazon Aurora User Guide.

Things to know
Here are a couple of things that you should know about this feature:

Compute – You can only have one DB shard group per DB cluster and set the maximum capacity of a DB shard group to 16–6144 ACUs. Contact us if you need more than 6144 ACUs. The initial number of routers and shards is determined by the maximum capacity that you set when you create a DB shard group. The number of routers and shards doesn’t change when you modify the maximum capacity of a DB shard group. To learn more, see the table of the number of routers and shards in the Amazon Aurora User Guide.
Storage – Aurora PostgreSQL Limitless Database only supports the Amazon Aurora I/O-Optimized DB cluster storage configuration. Each shard has a maximum capacity of 128 TiB. Reference tables have a size limit of 32 TiB for the entire DB shard group. To reclaim storage space by cleaning up your data, you can use the vacuuming utility in PostgreSQL.
Monitoring – You can use Amazon CloudWatch, Amazon CloudWatch Logs, or Performance Insights to monitor Aurora PostgreSQL Limitless Database. There are also new statistics functions and views and wait events for Aurora PostgreSQL Limitless Database that you can use for monitoring and diagnostics.

Now available
Amazon Aurora PostgreSQL Limitless Database is available today with PostgreSQL 16.4 compatibility in the AWS US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Hong Kong), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm) Regions.

Give Aurora PostgreSQL Limitless Database a try in the Amazon Aurora console. For more information, visit the Amazon Aurora User Guide and send feedback to AWS re:Post for Amazon Aurora or through your usual AWS support contacts.

— Channy

The Great Depression: A Decade of Suffering and Surviving

2024-10-31 Geographics

Post Syndicated from Geographics original https://www.youtube.com/watch?v=xKflKAQIZJ4

[$] The Overture open-mapping project

2024-10-31 corbet

Post Syndicated from corbet original https://lwn.net/Articles/995992/

OpenStreetMap tends to dominate
the space for open mapping data, but it is not the only project working in
this area. At the 2024 Open
Source Summit Japan, Marc Prioleau presented the Overture Maps Foundation, which is
building and distributing a set of worldwide maps under open licenses.
Overture may have a similar goal to OpenStreetMap, but its approach and
intended uses are significantly different.

Приобщаващото образование – мисията (почти) възможна

2024-10-31 Надежда Цекулова

Post Syndicated from Надежда Цекулова original https://www.toest.bg/priobshtavashtoto-obrazovanie-misiyata-pochti-vuzmozhna/

Приобщаващото образование – мисията (почти) възможна

Сара Краус е директор „Училищни партньорства“ във Фондация „Шърман“ към Университета на Мериленд в Балтимор. Започва кариерата си като учителка в системата на държавното образование в Балтимор, където преподава на първи и втори клас в продължение на десет години. След това е помощник-директор в училище за пет години, а от три години работи в университета, като развива програми за обучение на учители и подобряване на приобщаващите политики в държавните училища.

Наскоро Сара Краус и нейни колеги посетиха България по покана на Фондация „Тръст за социална алтернатива“, за да участват в Mеждународната конференция за ранно детско развитие „За благоденствието на всички деца е нужна цяла екосистема за ранно детско развитие“.

Надежда Цекулова разговаря с нея за сходствата и различията в приобщаващото образование у нас и в САЩ и за еднаквите предизвикателства, пред които всички се изправяме като човеци.

Как включвате ученици с различни образователни нужди или увреждания в системата, с която имате опит?

Да започнем оттам, че образователните политики са разнообразни в различните щати и това, което ви разказвам, е валидно за региона, в който работя. В Балтимор имаме философия на приобщаващо образование, при която учениците с различни нужди или увреждания са част от традиционната класна стая. Вместо да имаме отделни класове за специално образование, прилагаме по-приобщаващ модел – специализирани учители често идват да работят с учениците с увреждания в рамките на общия клас. Понякога учениците с различни нужди се извеждат за кратък период през деня, за да получат необходимите услуги. Идеята е по-голямата част от времето си те все пак да прекарват в общообразователната среда.

Колко време отне преходът от отделни класове за специално образование към приобщаващи класни стаи?

Това е постоянно развиваща се система. В този смисъл не можем да кажем, че е протекла някаква промяна и тя е приключила. Реализирането на приобщаващо образование изисква специализирано обучение за учителите, които работят с ученици с увреждания, а понякога учителите в общообразователната система нямат същото ниво на подготовка. Намирането на правилния баланс за осигуряване на необходимата подкрепа е непрекъснато развиваща се задача.

Можете ли да опишете как изглежда приобщаващата класна стая? Например колко ученици и учители има в нея и дали има специален учител за децата със специални нужди?

Конфигурацията може да е различна в различните класове и дори в различните часове в един и същи клас, но в идеалния случай приобщаващата класна стая трябва да изглежда така, че да не можете да различите децата, които получават специални услуги, от останалите. Понякога се използва модел на ко-обучение, особено ако значителен брой ученици изискват специална подкрепа. В такъв случай специален учител може да се присъедини към учителя в общообразователната система.

Какво имате предвид под „значителен брой“?

Класовете може да имат от 20 до над 30 ученици, като около 25% може да се нуждаят от допълнителна подкрепа. Както споменах, понякога учениците със специални нужди имат и специални занимания или рехабилитация, или друг вид услуга според потребностите си, а после се връщат в клас.

Това означава ли, че тези допълнителни услуги се извършват в сградата на училището и са организирани, така да се каже, институционално? Не се налага родителят да дойде, да вземе детето от училище, да го заведе до специалната услуга някъде другаде и после евентуално да го върне?

Не, не се налага. Децата са в едно и също училище, в една и съща сграда, срещат се непрекъснато, обядват заедно. Дори понякога часовете по изкуства се адаптират, за да са достъпни за всички и да могат заедно да се занимават в тях.

Как се справяте с ученици със сензорни увреждания, например с увредено зрение или слух?

Някои училища са специално предназначени за деца със сензорни увреждания. В държавните училища тези ученици могат да имат асистент, който да помага в комуникацията в приобщаващата класна стая. Аз лично не съм работила с такива ученици. Теоретично в общите класове това е възможно при наличието на подходяща подкрепа.

В България законът изисква приобщаващо образование, но на практика много учители и училищни ръководители не се чувстват подготвени, което засяга както децата с увреждания, така и техните съученици. Имате ли подобни предизвикателства?

Да, имаме сходни пропуски и срещаме подобни проблеми. Училищата са задължени по закон да приемат всички ученици, но ако нямат ресурсите, може да възникнат предизвикателства. Училищата не могат да откажат прием на ученици на база увреждания, но намирането на адекватна подкрепа може да бъде трудно и обикновено децата страдат в резултат на това. Трябва да призная, че по тази причина децата със сензорни нарушения например рядко попадат в общия клас. В сферата на държавното образование обаче специалното училище за такива деца се намира на територията на общообразователното, така че все пак не са изолирани. Има, разбира се, и частни училища, които предлагат напълно различна организация, и семействата обичайно избират според предпочитанията и възможностите си.

Какви обучения и подкрепа са необходими на учителите, за да работят ефективно с разнообразни класове?

Учителите се нуждаят от обучение как да адаптират учебния материал спрямо различните нужди на учениците. От една страна, учителят има програма, която трябва да изпълни. От друга, трябва да съумее да я приспособи към разнообразните нужди, пред които го изправя класната стая, когато влезе в нея. Например някои ученици може да се нуждаят от гласови инструкции или от достъпни материали, с които да могат да работят.

За учителите е предизвикателство да балансират тези нужди, особено в големи класове, и със сигурност имат необходимост както от специално обучение, така и от продължаваща подкрепа в хода на работата си. Всъщност смятам, че това е основният въпрос, защото не си представям, че съществува учител, който някога е влязъл в класна стая, и си е казал: „Не искам да се занимавам с това дете.“ Но той трябва да се чувства подготвен да намери индивидуален подход към конкретното дете с дислексия например и след това да повтори същото още 25 пъти. И не на последно място, когато учителят се сблъска с проблем, с който не знае как да се справи, трябва да е наясно откъде може да получи подкрепа.

Как във Вашия университет първоначално подготвяте учителите да отговорят на тези разнообразни нужди?

В САЩ учителите обичайно завършват четиригодишно обучение и придобиват опит в класната стая. В Мериленд прекарват един семестър в клас като наблюдатели и в този период могат да асистират на основния учител. След това още един цял семестър са на пълен работен ден като учители под ръководството на ментор. Трябва да държат и изпити по конкретни дисциплини, за да получат правоспособност. Преди няколко години обаче, в отговор на драстичния недостиг на учители в страната, беше разработена програма за съкратено обучение. Тя даде възможност хора с различни специалности след кратко обучение да влязат в класната стая и да получат правоспособност в хода на активната си преподавателска дейност. Това беше критично нужно, но породи редица трудности. Аз самата влязох в образователната система по този начин. Образованието ми беше в сферата на социалните услуги. Като учител стажувах едва един месец, преди да ми поверят собствен клас.

Звучи плашещо.

За мен беше плашещо. Учех се в движение, паралелно вечер посещавах теоретични занимания. Не съжалявам, но беше трудно, много трудно.

Има ли специално обучение за приобщаващо образование в програмите за подготовка на учители? Визирам четиригодишните програми.

Да, има курсове за приобщаващо образование, които се фокусират върху разбирането на многообразието и самоосъзнаването. Учителите изследват собствените си предразсъдъци и гледни точки, за да разбират по-добре нуждите на всички ученици.

Звучи като психоанализа за учители.

(Смее се.) Да, донякъде е това. Изключително необходимо и полезно е, за да можеш, когато влезеш в класната стая и видиш деца с различен цвят на кожата, различен език и култура, някои с дефицит на вниманието или аутизъм, или дислексия… Като видиш цялото това многообразие, да познаваш собствените си предразсъдъци и евентуални ограничения, за да можеш да бъдеш оптимално полезен на всички деца в тяхната различност и многообразие. Извън университета също има курсове за учители, които се занимават с това.

Друг експерт, с когото разговарях наскоро, каза, че никаква приобщаваща политика няма значение, ако не научим хората да приемат различията. Срещате ли такова предизвикателство и как помагате на учителите и учениците да приемат различията в клас?

Това е въпрос на осъзнаване и насърчаване на „смели разговори“. И също е процес, който не можем да кажем, че сме завършили, но държим на създаването на приобщаваща култура в клас. При нас например се говори много за различния цвят на кожата, за различния език и култура, тъй като голям дял от учениците ни са испаноезични и тъмнокожи. Но трудно се водят разговорите за приемане на хората с хомосексуална ориентация например.

В България понякога се случва учителите да са готови да търсят решения за някое дете, но родителите на другите деца да не са. Ставали сме свидетели на петиции за извеждане на деца от клас заради смущения в дисциплината или културни различия. Как училищата могат да се справят с такива ситуации?

И ние се сблъскваме с подобни ситуации. Училищата често се фокусират върху това да премахнат неприемливото поведение, вместо да потърсят причината за него, за да не стават обект на натиск. Нашата философия е, че не можем да изключваме дете заради единичен случай или дори заради поредица от случаи. Важно е винаги да се търси първопричината за проблема, а не да се отхвърля детето. Защото то има право на образование като всички останали и често има нужда от нашата подкрепа и закрила.

Интервюто е част от поредица разговори за достъпа до образование на децата от уязвими групи. Проектът се осъществява благодарение на най-голямата социално отговорна инициатива на „Лидл България“ – „Ти и Lidl за нашето утре“, в партньорство с Фондация „Работилница за граждански инициативи“, Българския дарителски форум и Асоциацията на европейските журналисти.

Quoth the Drive Stats, Nevermore: An Elegy for Our Seagate 4TB Drives

2024-10-31 Andy Klein

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/quoth-the-drive-stats-nevermore-an-elegy-for-our-seagate-4tb-drives/

A decorative image showing a gravestone with ravens around it.

Once upon a midnight dreary, as I typed another query

Seeking many a quaint and curious fact of hidden Drive Stats lore—

While I waited, time advancing, suddenly the stats came dancing

Lines of empty datasets; the database had nothing more

“Is that right?” I muttered, “The database had nothing more—

So those drives, I must explore.”

Ah, distinctly I remember, it was just past this September

I requested failure rates of Seagate drives with terabytes of four

Eagerly I typed the query, even though my eyes were bleary

The count of Seagate fours was eerie, eerie; there was nothing more.

The sad and certain count screamed like it never had before;

No Seagate drives with terabytes of four.

There are missing rows, I’m certain, and files waiting to explore.

The reality I kept dismissing, the Seagate data must be missing

With hours gone to data fishing, the facts shook me to the core;

The spinning life is over for our Seagate drives with terabytes of four—

Those Seagate drives are nevermore.

(My apologies to Edgar Allen Poe.)

Shortly, we will publish the Q3 2024 Backblaze Drive Stats report, and an old faithful will be missing from the tables, the 4TB Seagate drive model ST4000DM000. This drive model has graced our Drive Stats charts and tables since the very first Drive Stats report, and it would be a ghastly mistake if we let the drive slip into the afterlife unnoticed. So on this All Hallows’ Eve, it’s only fitting we say nevermore to these Seagate drives.

The first 45 of these Seagate 4TB drives were installed in a 45-drive Backblaze Storage Pod in May 2013. That was before 60-drive Storage Pods, Backblaze Vaults, and even Backblaze B2. Over the next two years, thousands of new Seagate 4TB drives were added each quarter, and by Q3 2016, there were 34,744 spinning souls in service. That represented more than 50% of all the drives in service at the time—a howling success that has not been duplicated by any other drive model.

Alas, that didn’t last as the first wave of 8TB drives arrived in mid-2016 and with that, no additional 4TB Seagate drives were procured. Over time, as 4TB Seagate drives met their maker, the count decreased, and when Storage Pods containing these drives started being phased out in 2018, the count dropped faster. The final nail in the coffin came when, in 2023, our CVT drive migration system became fixated on the replacement of all the remaining 4TB Seagate drives, and here we are.

As for those intrepid 45 original drives installed in May 2013, they were not around at the end. They were unceremoniously replaced in a Storage Pod upgrade back in 2017. A few were resurrected as drive replacements, but today they only exist in the spirit world, having died or been replaced by 2020. Still many other 4TB Seagate drives have lived long happy lives, with nearly 100 exceeding 100 months of service (8.4 years) before being sent to their final resting place by the CVT reaper.

And so it is time; we shall gather in a circle, cross our arms and hold hands and chant “our Seagate drives…with terabytes of four…are nevermore!”

The post Quoth the Drive Stats, Nevermore: An Elegy for Our Seagate 4TB Drives appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

How Volkswagen Autoeuropa built a data mesh to accelerate digital transformation using Amazon DataZone

2024-10-31 Dhrubajyoti Mukherjee

Post Syndicated from Dhrubajyoti Mukherjee original https://aws.amazon.com/blogs/big-data/how-volkswagen-autoeuropa-built-a-data-mesh-to-accelerate-digital-transformation-using-amazon-datazone/

This is a joint blog post co-authored with Martin Mikoleizig from Volkswagen Autoeuropa.

Volkswagen Autoeuropa is a Volkswagen Group plant that produces the T-Roc. The plant is located near Lisbon, Portugal and produces about 934 cars per day. In 2023, Volkswagen Autoeuropa represented 1.3% of the national GDP of Portugal and 4% in national export of goods impact with a sales volume of 3.3511 billion Euros. Volkswagen Autoeuropa aims to become a data-driven factory and has been using cutting-edge technologies to enhance digitalization efforts.

In this post, we discuss how Volkswagen Autoeuropa used Amazon DataZone to build a data marketplace based on data mesh architecture to accelerate their digital transformation. The data mesh, built on Amazon DataZone, simplified data access, improved data quality, and established governance at scale to power analytics, reporting, AI, and machine learning (ML) use cases. As a result, the data solution offers benefits such as faster access to data, expeditious decision making, accelerated time to value for use cases, and enhanced data governance.

Understanding Volkswagen Autoeuropa’s challenges

At the time of writing this post, Volkswagen Autoeuropa has already implemented more than 15 successful digital use cases in the context of real-time visualization, business intelligence, industrial computer vision, and AI.

Before the AWS partnership, Volkswagen Autoeuropa faced the following challenges.

Long lead time to access data – The digital use cases launched by Volkswagen Autoeuropa spent most of their project time getting access to the data that was relevant to their use cases. After the right data for the use case was found, the IT team provided access to the data through manual configuration. The lead time to access data was often from several days to weeks.
Insufficient data governance and auditing – Data was shared directly to use cases by copying it. Therefore, the IT team connected the data manually from their sources to the desired destinations multiple times. This process wasn’t centrally tracked to discover any information on the data sharing process. For example, if the data was copied in the past, how many use cases have access to the data, when access was granted, and who granted the access.
Redundant effort to process the same information – Because the IT team copied the data sources based on the exact use case requirements, they shared specific columns of the tables from the data. As additional use cases requested access to the same data with different column requirements, even more copies of the data were created.
Repeated process to establish security and governance guardrails – Each time the IT and the security team provided a connection to a new data source, they had to set up the security and governance guardrails. This required repeated manual effort.
Data quality issues – Because the data was processed redundantly and shared multiple times, there was no guarantee of or control over the quality of the data. This led to reduced trust in the data.
Absence of data catalog and metadata management – Data didn’t have any metadata associated with it, and so use cases couldn’t consume the data without further explanation from the data source owners and specialists. Furthermore, no process to discover new data existed. Similar to the consumption process, use cases would consult specialists to understand the context of the data and if it could provide value.

Envisioning a data solution for Volkswagen Autoeuropa

To address these challenges, Volkswagen Autoeuropa embarked on a bold vision. They envisioned a seamless data consumption process, similar to an online shopping experience. They envisioned a data marketplace where data users could browse and access high-quality, secure data with clear specifications, business context, and relevant attributes. This vision materialized into a project aimed at transforming data accessibility and governance as the foundation for the digital ecosystem. The vision to be realized: Data as seamless as online shopping.

In collaboration with Amazon Web Services (AWS), Volkswagen Autoeuropa joined the Enhanced Plant Onboarding Program of the Global Volkswagen Group’s Digital Production Platform (DPP EPO) strategy. Through this partnership, AWS and Volkswagen Autoeuropa created a data marketplace that significantly improved data availability.

In the discovery phase of the project, Volkswagen Autoeuropa and AWS evaluated several options to build the data solution. In the end, Volkswagen Autoeuropa chose a solution based on data mesh architecture using Amazon DataZone. Being a managed service, Amazon DataZone provided the necessary speed and agility to build the solution. At the same time, it led to higher operational efficiencies and lower operational overhead. The team adopted a data mesh architecture because the principles of the data mesh aligned with Volkswagen Autoeuropa’s vision of being a data driven factory.

Solution overview

This section describes the key features and architecture of the Volkswagen Autoeuropa data solution. The solution is based on a data mesh architecture.

Data solution features

The following figure shows the key capabilities of the Volkswagen Autoeuropa data solution.

The key capabilities of the solution are:

Data quality – In the solution, we’ve built a data quality framework to streamline the process of data quality checks and publishing quality scores. It uses AWS Glue Data Quality to generate recommendation rulesets, run orchestrated jobs, store results, and send notifications to users. This framework can be seamlessly integrated into AWS Glue jobs, providing a quality score for data pipeline jobs. In addition, the quality score is published in the Amazon DataZone data portal, allowing consumers to subscribe to the data based on its quality score.Assigning a quality score to the data helps build trust in the data, and shifts the responsibility of maintaining data quality to the data owner. As a result, the quality of the results delivered by these use cases improves.
Data registration – The producers sign in to the Amazon DataZone data portal using their AWS Identity and Access Management (IAM) credentials or single sign-on with integration through AWS IAM Identity Center. They register their data assets, which are stored in Amazon Simple Storage Service (Amazon S3), in the Amazon DataZone data catalog. The metadata of the data assets is stored in an AWS Glue catalog and made available in the business data catalog of Amazon DataZone and in the Amazon DataZone data source. The producers add business context such as business unit name, data owner contact information, and data refresh frequency using Amazon DataZone glossaries and metadata forms. In addition, they use generative AI capabilities to generate business metadata. After the business metadata is generated, they review the changes and modify the metadata if needed.Because all data products in Volkswagen Autoeuropa are now registered in the same location, the likelihood of data duplication is significantly reduced. Moreover, the data producers are improving the quality of the data by adding business context to it.
Data discovery – The consumers sign in to the Amazon DataZone data portal using their IAM credentials or single sign-on with integration through IAM Identity Center and search the data using keywords in the search bar. After the results are returned, they can further filter the results using glossary terms and project names. Finally, they review the business metadata of the data assets to evaluate if the data is relevant to their business use cases. They can check the quality score of the data assets and the refresh schedule for their use cases.With a data discovery capability in place, consumers can gain information about the data without the need to consult the source system owners or specialists.
Data access management – When the consumers find a data asset that’s relevant to their use case, they request access to it using the subscription feature of Amazon DataZone. Data is classified as public, internal, and confidential. For public and internal data assets, the access request is automatically approved. For confidential data assets, the data producer team reviews the access request and either accepts or rejects the subscription request.With a central place to manage data access, data owners can view which use cases have access to their data and when the access request was granted. The fine-grained access control feature of Amazon DataZone gives data owners granular control of their data at the row and column levels.
Data consumption – Upon approval of the subscription request, Amazon DataZone provisions the backend infrastructure to make the data accessible to the corresponding consumers. After this process is complete, the consumers can access the data through Amazon Athena using the deep link feature of Amazon DataZone. The data consumption pattern in Volkswagen Autoeuropa supports two use cases:
- Cloud-to-cloud consumption – Both data assets and consumer teams or applications are hosted in the cloud.
- Cloud-to-on-premises consumption – Data assets are hosted in the cloud and consumer use cases or applications are hosted on-premises.

Requirements specific to a use case requires access to the relevant data assets; sharing data to use cases using Amazon DataZone doesn’t require creating multiple copies. As a result, duplication and processing of data. Furthermore, by reducing the number of copies of the data, the overall quality of the data products improves. In addition, the backend automation of Amazon DataZone to make data available to use cases reduces the manual effort and improves the lead time to access data.

Single collaborative environment – The Amazon DataZone data portal provides a single collaborative environment to the users in Volkswagen Autoeuropa. Data consumers such as use case owners, data engineers, data scientists, and ML engineers can browse and request access to data assets. At the same time, data producers, such as use case owners and source system owners, can publish and curate their data in the Amazon DataZone data portal. This collaborative experience promotes teamwork and accelerates the realization of business value. Furthermore, the security and governance guardrails scales across the organization as the number of use cases increases.

Data solution architecture

The following figure displays the reference architecture of the data solution at Volkswagen Autoeuropa. In the next part of the post, we discuss how we arrived at the solution.

The architecture includes:

The data from SAP applications, manufacturing execution systems (MES), and supervisory control and data acquisition (SCADA) systems is ingested into the producer accounts of Volkswagen Autoeuropa.
In the producer account, raw data is transformed using AWS Glue. The technical metadata of the data is stored in AWS Glue catalog. The data quality is measured using the data quality framework. The data stored in Amazon Simple Storage Service (Amazon S3) is registered as an asset in the Amazon DataZone data catalog hosted in the central governance account.
The central governance account hosts the Amazon DataZone domain and the related Amazon DataZone data portal. The AWS accounts of the data producers and consumers are associated with the Amazon DataZone domain. Amazon DataZone projects belonging to the data producers and consumers are created under the related Amazon DataZone domain units.
Consumers of the data products sign in to the Amazon DataZone data portal hosted in the central governance account using their IAM credentials or single sign-on with integration through IAM Identity Center. They search, filter, and view asset information (for example, data quality, business, and technical metadata).
After the consumer finds the asset they need, they request access to the asset using the subscription feature of Amazon DataZone. Based on the validity of the request, the asset owner approves or rejects the request.
After the subscription request is granted and fulfilled, the asset is accessed in the consumer account for a one-time query using Athena and Microsoft Power BI applications hosted on premises. This consumption pattern can be extended for AI and machine learning (AI/ML) model development using Amazon SageMaker and reporting purposes using Amazon QuickSight.

User journey

After discussing the desired system with the use case teams and stakeholders and analyzing the current workflow, Volkswagen Autoeuropa grouped the user personas of the data solution into three main categories: data producer, data consumer, and data solution administrator. This sets the foundation for the desired user experience and what’s needed to achieve the solution goals.

Data producer

Data producers create the data products in the data solution. There are two types of data producers.

Data source owners – Data source owners publish the raw data in the Amazon DataZone data portal. These data products are attributed as source-based data.
Use case owners – Use case owners publish data that’s fit for consumption by other use cases. These data products are called consumer-based data.

The following figure shows the user journey of a data producer:

A data producer’s journey includes:

Identify data of interest –
1. Identify data (Volkswagen Autoeuropa network).
2. Perform data quality checks (Volkswagen Autoeuropa network).
Connect data to the data solution –
1. Ingest data into the data solution (Amazon DataZone portal).
2. Start process to connect data using AWS Glue.
Locate the data source in the data solution –
1. Register data (Amazon DataZone portal).
2. Add data to the inventory in Amazon DataZone.
Add or edit metadata –
1. Add or edit metadata (Amazon DataZone portal).
2. Publish data assets (Amazon DataZone portal).
Approve or reject subscription request –
1. Review subscription requests.
Maintain data assets –
1. Manage data assets (Amazon DataZone portal).

Data consumer

Data consumers use data for business analytics, machine learning, AI, and business reporting. Data consumers are data engineers, data scientists, ML engineers, and business users. The following diagram shows the journey of a data consumer.

A data consumer’s journey includes:

Access Amazon DataZone portal –
1. Amazon DataZone portal – Access is granted based on the user’s assigned domain and projects.
Search for data assets –
1. Data assets in Amazon DataZone portal – Search for data and brows the results by glossary terms or the project name. Use additional filters to refine the results.
View business metadata –
1. Select a data asset to see additional information – Review the description, data quality score and metadata.
Request access to data (subscribe) –
1. Subscribe to request access.
2. After the subscription request is approved, review the data products that you have access to.
3. Query the data to view and consume the data.
Retrieve additional data –
1. Repeat the steps as needed to access and retrieve additional data.

Data solution administrator

Data solution administrators are responsible for performing administrative tasks on the data solution. The following figure shows the common tasks performed by the data solution administrator.

A data administrator’s journey includes:

Manage projects –
1. Manage Amazon DataZone domain.
2. Manage Amazon DataZone projects within the domain.
Manage environment –
1. Set up the environment to manage the infrastructure.
Manage business metadata glossary –
1. Manage and enable Amazon DataZone glossaries and metadata forms.
Manage data assets –
1. Manage assets.
2. Query the data to view and consume the data.
Manage access to data solution –
1. Monitor and revoke access when appropriate.

Conclusion

In this post, you learned how Volkswagen Autoeuropa embarked on a bold vision to become a data driven factory. It shows how this vision was put into action by building a data solution based on data mesh architecture using Amazon DataZone. It highlights the key features and architecture of the data solutions and presents the user journey. As of writing this post, Volkswagen Autoeuropa reduced the data discovery time from days to minutes using the data solution. The time to access data took several weeks before the Volkswagen Autoeuropa and AWS collaboration. Now, with the help of the data solution, the data access time has been reduced to several minutes.

In May 2024, the team achieved a major milestone by successfully offering data on the data solution and transporting it instantly to Power BI, a process that previously took several weeks.

“After one year of work, we did the full roundtrip from offering data on our new data marketplace built using Amazon DataZone to transporting it instantly to third-party tools, a process that previously took several weeks. This was a big achievement for our team.”

– Jorge Paulino, Product owner of the data solution. Volkswagen Autoeuropa.

The next post of the two-part series details discusses how we built the solution, its technical details, and the business value created.

If you want to harness the agility and scalability of a data mesh architecture and Amazon DataZone to accelerate innovation and drive business value for your organization, we have the resources to get you started. Be sure to check out the AWS Prescriptive Guidance: Strategies for building a data mesh-based enterprise solution on AWS. This comprehensive guide covers the key considerations and best practices for establishing a robust, well-governed data mesh on AWS. From aligning your data mesh with overall business strategy to scaling the data mesh across your organization, this Prescriptive Guidance provides a clear roadmap to help you succeed.

If you’re curious to get hands-on, see the GitHub repository: Building an enterprise Data Mesh with Amazon DataZone, Amazon DataZone, AWS CDK, and AWS CloudFormation. This open source project delivers a step-by-step guide to build a data mesh architecture using Amazon DataZone, AWS Cloud Development Kit (AWS CDK), and AWS CloudFormation.

About the Authors

Dhrubajyoti Mukherjee is a Cloud Infrastructure Architect with a strong focus on data strategy, data analytics, and data governance at Amazon Web Services (AWS). He uses his deep expertise to provide guidance to global enterprise customers across industries, helping them build scalable and secure AWS solutions that drive meaningful business outcomes. Dhrubajyoti is passionate about creating innovative, customer-centric solutions that enable digital transformation, business agility, and performance improvement. An active contributor to the AWS community, Dhrubajyoti authors AWS Prescriptive Guidance publications, blog posts, and open-source artifacts, sharing his insights and best practices with the broader community. Outside of work, Dhrubajyoti enjoys spending quality time with his family and exploring nature through his love of hiking mountains.

Ravi Kumar is a Data Architect and Analytics expert at Amazon Web Services; he finds immense fulfillment in working with data. His days are dedicated to designing and analyzing complex data systems, uncovering valuable insights that drive business decisions. Outside of work, he unwinds by listening to music and watching movies, activities that allow him to recharge after a long day of data wrangling.

Martin Mikoleizig studied mechanical engineering and production technology at the RWTH Aachen University before starting to work in Dr. h.c. Ing. F. Porsche AG 2015 as a production planner for the engine assembly. In several years as a Project Manager on Testing Technology for new engine models he also introduced several innovations like human-machine-collaborations and intelligent assistance systems. From 2017, he was responsible for the Shopfloor IT team of the module lines in Zuffenhausen before he became responsible for the Planning of the E-Drive assembly at Porsche. Beside this he was responsible for the Digitalisation Strategy of the Production Ressort at Porsche. Since October 2022, he has been assigned to Volkswagen Autoeuropa in Portugal in the role of a Digital Transformation Manager for the plant driving the Digital Transformation towards a Data Driven Factory.

Weizhou Sun is a Lead Architect at Amazon Web Services, specializing in digital manufacturing solutions and IoT. With extensive experience in Europe, she has enhanced operational efficiencies, reducing latency and increasing throughput. Weizhou’s expertise includes Industrial Computer Vision, predictive maintenance, and predictive quality, consistently delivering top performance and client satisfaction. A recognized thought leader in IoT and remote driving, she has contributed to business growth through innovations and open-source work. Committed to knowledge sharing, Weizhou mentors colleagues and contributes to practice development. Known for her problem-solving skills and customer focus, she delivers solutions that exceed expectations. In her free time, Weizhou explores new technologies and fosters a collaborative culture.

Shameka Almond is an Advisory Consultant at Amazon Web Services. She works closely with enterprise customers to help them better understand the business impact and value of implementing data solutions, including data governance best practices. Shameka has over a decade of wide-ranging IT experience in the manufacturing and aerospace industries, and the nonprofit sector. She has supported several data governance initiatives, helping both public and private organizations identify opportunities for improvement and increased efficiency. Outside of the office she enjoys hosting large family gatherings, and supporting community outreach events dedicated to introducing students in K-12 to STEM.

Adjoa Taylor has over 20 years of experience in industrial manufacturing, providing industry and technology consulting services, digital transformation, and solution delivery. Currently Adjoa leads Product Centric Digital Transformation, enabling customers to solve complex manufacturing problems by leveraging Smart Factory and Industry leading transformation mechanisms. Most recently driving value with AI/ML and generative AI use-cases for the plant floor. Adjoa is an experienced leader spending over 20 years of her career delivering projects in countries throughout North America, Latin America, Europe, and Asia. Through prior roles, Adjoa brings deep experience across multiple business segments with a focus on business outcome driven solutions. Adjoa is passionate about helping customers solve problems while realizing the art of the possible via the right impacting value-based solution.

Roger Grimes on Prioritizing Cybersecurity Advice

2024-10-31 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/10/roger-grimes-on-prioritizing-cybersecurity-advice.html

This is a good point:

Part of the problem is that we are constantly handed lists…list of required controls…list of things we are being asked to fix or improve…lists of new projects…lists of threats, and so on, that are not ranked for risks. For example, we are often given a cybersecurity guideline (e.g., PCI-DSS, HIPAA, SOX, NIST, etc.) with hundreds of recommendations. They are all great recommendations, which if followed, will reduce risk in your environment.

What they do not tell you is which of the recommended things will have the most impact on best reducing risk in your environment. They do not tell you that one, two or three of these things…among the hundreds that have been given to you, will reduce more risk than all the others.

[…]

The solution?

Here is one big one: Do not use or rely on un-risk-ranked lists. Require any list of controls, threats, defenses, solutions to be risk-ranked according to how much actual risk they will reduce in the current environment if implemented.

[…]

This specific CISA document has at least 21 main recommendations, many of which lead to two or more other more specific recommendations. Overall, it has several dozen recommendations, each of which individually will likely take weeks to months to fulfill in any environment if not already accomplished. Any person following this document is…rightly…going to be expected to evaluate and implement all those recommendations. And doing so will absolutely reduce risk.

The catch is: There are two recommendations that WILL DO MORE THAN ALL THE REST ADDED TOGETHER TO REDUCE CYBERSECURITY RISK most efficiently: patching and using multifactor authentication (MFA). Patching is listed third. MFA is listed eighth. And there is nothing to indicate their ability to significantly reduce cybersecurity risk as compared to the other recommendations. Two of these things are not like the other, but how is anyone reading the document supposed to know that patching and using MFA really matter more than all the rest?

Tracking World Leaders Using Strava

2024-10-31 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/10/tracking-world-leaders-using-strava.html

Way back in 2018, people noticed that you could find secret military bases using data published by the Strava fitness app. Soldiers and other military personal were using them to track their runs, and you could look at the public data and find places where there should be no people running.

Six years later, the problem remains. Le Monde has reported that the same Strava data can be used to track the movements of world leaders. They don’t wear the tracking device, but many of their bodyguards do.

Security updates for Thursday

2024-10-31 jake

Post Syndicated from jake original https://lwn.net/Articles/996526/

Security updates have been issued by Debian (firefox-esr and openssl), Fedora (firefox, libarchive, micropython, NetworkManager-libreswan, and xorg-x11-server-Xwayland), Red Hat (nano), Slackware (mozilla-firefox, mozilla-thunderbird, tigervnc, and xorg), SUSE (389-ds, Botan, go1.21-openssl, govulncheck-vulndb, java-11-openjdk, lxc, python-Werkzeug, and uwsgi), and Ubuntu (firefox, libarchive, linux-azure-fde, linux-azure-fde-5.15, python-pip, and xorg-server, xorg-server-hwe-16.04, xorg-server-hwe-18.04).

Comic for 2024.10.31 – Happy Halloween

2024-10-31 Explosm.net

Post Syndicated from Explosm.net original https://explosm.net/comics/happy-halloween-2

New Cyanide and Happiness Comic

The Return of Dr. Frankenstein’s Dream Machine!

2024-10-31 Crosstalk Solutions

Post Syndicated from Crosstalk Solutions original https://www.youtube.com/watch?v=k2og67nFWdI

Celebrating 15 years of MariaDB

2024-10-31 Michael "Monty" Widenius

Post Syndicated from Michael "Monty" Widenius original http://monty-says.blogspot.com/2024/10/celebrating-15-years-of-mariadb.html

It is 15 years since the first MariaDB server release of MariaDB 5.1.38 on 29’th of October 2009.

MariaDB got its name from my youngest daughter Maria, following the tradition of MySQL, who got its name from my oldest daughter My.

The MariaDB project started on April 20 2009, the same day when Oracle announced that Oracle will buy Sun Microsystems, who owned MySQL. The initial MariaDB engineering team consisted of some 20 engineers from the MySQL server team at Sun, and me. It has now grown to 45-50 engineers in MariaDB Corporation & MariaDB Foundation + a lot of external contributors.

The reason for creating MariaDB is that we all believed that Oracle would not be a good steward of MySQL and we wanted to ensure that MySQL source and spirit would continue living, outside of Oracle. My belief is also that without MariaDB, MySQL would not exist today.

MariaDB is actively developed. There have been 27 major releases of MariaDB, of which 18 have been long term (LTS) releases.

All MariaDB changes are tested on 4 different compilers, 6 different architectures, 4 different operating systems and 7 OS distributions. In addition we compile with many different compiler options and code checkers to find issuers earlier.

MariaDB is available on all major OS distributions and on all public clouds.

MariaDB Corporation/plc has in addition done 6 major release of the MariaDB Enterprise server. The Enterprise server is a targeted database for users who want longer release and maintenance cycles, higher stability, more performance and enterprise level support with direct contact to the engineers that wrote the code. This includes less reasons to upgrade, thanks to backported features from newer MariaDB releases, and easier upgrades, thanks to tools like MaxScale.

Until MariaDB 10.0, MariaDB was a true fork of MySQL with a lot of enhancements and performance improvements. Starting from MariaDB 10.0 we stopped doing merges of code from MySQL as this enabled us to add more features and bigger improvements to the code, without having to be constrained by the MySQL code base. We still kept up with most MySQL features and syntax to ensure that it should be trivial to migrate from MySQL to MariaDB. One can still move from MySQL 5.7 and earlier MySQL versions to any newer MariaDB version with almost no changes. MySQL 8.0 changed how things are stored on disk, which means that to move from MySQL 8.0 and above to MariaDB one has mysqldump/mariadb-dump and restore. Apart from that, moving from MySQL to MariaDB is still in many cases easier than moving between MySQL versions.

Here comes a list of some of the most notable features created in MariaDB. Note that many of these features were later copied by MySQL, usually with a different syntax. These are marked by (*) in the list below. I am very happy to see that MariaDB has forced MySQL to innovate! There are of course a few cases where MySQL adds a feature before MariaDB. These are also noted in the following list.

New storage engines

Aria storage engine (MyISAM replacement, initialled called Maria. Used for temporary results)
ColumnStore (Columnar storage, for analytical queries)
Connect (Allows one to connect to external databases through JDBC/ODBC and also read a lot of legacy database formats)
Mroonga (fulltext search)
MyRocks (Compressed storage, used by Facebook)
Sequence (Allows the creation of ascending or descending sequences. Great to quickly generate test data)
Spider (Sharding over multiple MariaDB servers)
S3 Storage engine

Performance

Pool of threads (MySQL had a similar capability in 5.4 community but later removed it from the community version and added it to MySQL Enterprise)

Optimizer

Table elimination
Better optimizer (First stage in MariaDB 5.3-5.5 and second in MariaDB 11.0)

Starting from 11.0 almost all aspects of the optimizer is cost based and costs are tunable.
Optimised for modern hardware

Subquery optimizations in 5.3 (*)
Index Condition pushdown (*)
Semi-join (*)
Batched key access (*)
Materalization (*)
Index_merge / Sort_intersection
Cost-based choice of range vs. index_merge
Use extended (hidden) primary keys for InnoDB
Subquery cache
Block hash join
Null-rejecting conditions tested early for NULLs
Optimizer trace (MySQL had this in 5.6. MariaDB did later a different implementation).
ANALYZE … SELECT|UPDATE|DELETE (*)
See https://mariadb.com/kb/en/optimizer-feature-comparison-matrix/ for a more complete list for the older optimizer features.
Histogram based statistics (*)
Split Grouping Optimization (?)
Descending indexes (MySQL had this first)
Sargable date and year
Vector search (*) (MySQL only offers a vector datatype, but no indexing possibilities)

Security

Plugable authentication (*)
Unix socket authentication (*)
Roles ((*)
Table level encryption (*) ; Patch from Google
Password validate plugin (MySQL had this first, but we could not use it as it had too many limitations and gotchas so we had to implement one from scratch)
Password expiration and account locking (MySQL had this first)
ED25519, PARSEC authentication plugins
Password reuse plugin
Hashicorp Key Management Plugin
SSL enabled by default. No configuration necessary. ; MySQL does not have zero-config SSL

Replication

Group commit with binary log (*)
Multi-source replication (*)
Parallel replication (*)
Enhanced semisync replication (*)
Multi-master with Galera (*) MySQL later implemented group replication with provides a similar feature
Global transaction id (MySQL had this one first)
Annotated row based events
Checksums for binlog events (MySQL backport)
Binary log checksums calculated during event creation and not during commit. This gives a great performance boost to replication when using checksums.
Delay slave (MySQL backport)
Semi-sync plugin moved inside server which gives notable better performance.
Lag free ALTER TABLE in replication

Logging

EXPLAIN in slow query log
Engine statistics in slow query log

DDL enhancements

Progress reports for ALTER TABLE, CHECK TABLE etc.
RETURNING for INSERT, UPDATE and DELETE
OR REPLACE for CREATE table and other DDL
ALTER ONLINE TABLE (MySQL backport ; Released at the same time)
INSTANT ADD COLUMN (MySQL had this one first, code from Tencent Games)
INSTANT DROP COLUMN, MODIFY COLUMN (*)
CHECK CONSTRAINT (*)
DECIMAL decimals increased from 30 to 38 (banking requirement)
CREATE SEQUENCE
Multiple triggers for same state (MySQL was first)
Invisible columns (*)
Atomic DDL (MySQL was first, but Oracle changed the storage format which makes it impossible to downgrade back. MariaDB did the same feature without changing storage format).
Once can update the table even if ALTER TABLE is running.

DML & DQL enhancements

SELECT … OFFSET … FETCH
SELECT … SKIP LOCKED ; MySQL had this first
Natural sorting

Other Features

Microsecond support for time data types (*)
Virtual columns (*)
Non-blocking client API Library
Shutdown statement (*)
Improved spatial functions (MariaDB has more Spatial functions than MySQL)
Improved GET_LOCK() with timeout in microseconds.
Window functions (*)
PERCENTILE_CONT, PERCENTILE_DISC, and MEDIAN window functions
Common table expressions (* ; Released about the same time in MySQL and MariaDB)
Oracle compatibility (LOTS of functions, PL-SQL, packages, null handling etc). This allows one to move many type of Oracle applications unchanged to MariaDB.
FLASHBACK ; Use binary log to roll back data to a previous state. (Contribution by Alibaba)
JSON functions (MySQL had initially better JSON support but MariaDB has caught up)
JSON Table (MySQL had this first)
System versioned tables (known as AS OF or Temporal Tables)
Table value constructors (*)
ROW data type
INet4 and INet6 data type
UUID data type (MySQL had this first)
INTERSECT & EXCEPT (*)
Storage engine independent column compression (Percona server had this first. Not in MySQL)
Support for Persistent Memory
mariadb-backup and backup locks (Only in MySQL Enterprise. However MariaDB can also do backup while ALTER TABLE is running)
sys schema (MySQL had this first)
SFORMAT for arbitrary text formatting
Connection redirection (*)

MariaDB Corporation also provides LGPL connectors that works with MariaDB and MySQL for the following languages:

C
C++
Java 8+
ODBC
Python
Node.js
R2DBC

MariaDB Corporation are also ensuring that the PHP and Perl connectors works with MariaDB.

Last, I want to thank all the MariaDB developers, testers, MariaDB employees, MariaDB contributors, investors, sponsors, customers and user, all who has contributed to make MariaDB a successful project.

These has been an amazing first 15 years and there is many more to come!

May your database always keep running!

Michael “Monty” Widenius

Moving Baselime from AWS to Cloudflare: simpler architecture, improved performance, over 80% lower cloud costs

2024-10-31 Boris Tane

Post Syndicated from Boris Tane original https://blog.cloudflare.com/80-percent-lower-cloud-cost-how-baselime-moved-from-aws-to-cloudflare

Introduction

When Baselime joined Cloudflare in April 2024, our architecture had evolved to hundreds of AWS Lambda functions, dozens of databases, and just as many queues. We were drowning in complexity and our cloud costs were growing fast. We are now building Baselime and Workers Observability on Cloudflare and will save over 80% on our cloud compute bill. The estimated potential Cloudflare costs are for Baselime, which remains a stand-alone offering, and the estimate is based on the Workers Paid plan. Not only did we achieve huge cost savings, we also simplified our architecture and improved overall latency, scalability, and reliability.

Daily Cost	Before (AWS)	After (Cloudflare)
Compute	$650 – AWS Lambda	$25 – Cloudflare Workers
CDN	$140 – Cloudfront	$0 – Free
Data Stream + Analytics database	$1,150 – Kinesis Data Stream + EC2	$300 – Workers Analytics Engine
Total	$1,940	$325 (83% cost reduction)

^{Table 1: Daily Costs Comparison ($USD)}

When we joined Cloudflare, we immediately saw a surge in usage, and within the first week following the announcement, we were processing over a billion events daily and our weekly active users tripled.

As the platform grew, so did the challenges of managing real-time observability with new scalability, reliability, and cost considerations. This drove us to rebuild Baselime on the Cloudflare Developer Platform, where we could innovate quickly while reducing operational overhead.

Initial architecture — all on AWS

Our initial architecture was all on Amazon Web Services (AWS). We’ll focus here on the data pipeline, which covers ingestion, processing, and storage of tens of billions of events daily.

This pipeline was built on top of AWS Lambda, Cloudfront, Kinesis, EC2, DynamoDB, ECS, and ElastiCache.

^{Figure1: Initial data pipeline architecture}

The key elements are:

Data receptors: Responsible for receiving telemetry data from multiple sources, including OpenTelemetry, Cloudflare Logpush, CloudWatch, Vercel, etc. They cover validation, authentication, and transforming data from each source into a common internal format. The data receptors were deployed either on AWS Lambda (using function URLs and Cloudfront) or ECS Fargate depending on the data source.
Kinesis Data Stream: Responsible for transporting the data from the receptors to the next step: data processing.
Processor: A single AWS Lambda function responsible for enriching and transforming the data for storage. It also performed real-time error tracking and detecting patterns in logs.
ClickHouse cluster: All the telemetry data was ultimately indexed and stored in a self-hosted ClickHouse cluster on EC2.

In addition to these key elements, the existing stack also included orchestration with Firehose, S3 buckets, SQS, DynamoDB and RDS for error handling, retries, and storing metadata.

While this architecture served us well in the early days, it started to show major cracks as we scaled our solution to more and larger customers.

Handling retries at the interface between the data receptors and the Kinesis Data Stream was complex, requiring introducing and orchestrating Firehose, S3 buckets, SQS, and another Lambda function.

Self-hosting ClickHouse also introduced major challenges at scale, as we continuously had to plan our capacity and update our setup to keep pace with our growing user base whilst attempting to maintain control over costs.

Costs began scaling unpredictably with our growing workloads, especially in AWS Lambda, Kinesis, and EC2, but also in less obvious ways, such as in Cloudfront (required for a custom domain in front of Lambda function URLs) and DynamoDB. Specifically, the time spent on I/O operations in AWS Lambda was a particularly costly piece. At every step, from the data receptors to the ClickHouse cluster, moving data to the next stage required waiting for a network request to complete, accounting for over 70% of wall time in the Lambda function.

In a nutshell, we were continuously paged by our alerts, innovating at a slower pace, and our costs were out of control.

Additionally, the entire solution was deployed in a single AWS region: eu-west-1. As a result, all developers located outside continental Europe were experiencing high latency when emitting logs and traces to Baselime.

Modern architecture — transitioning to Cloudflare

The shift to the Cloudflare Developer Platform enabled us to rethink our architecture to be exceptionally fast, globally distributed, and highly scalable, without compromising on cost, complexity, or agility. This new architecture is built on top of Cloudflare primitives.

^{Figure 2: Modern data pipeline architecture}

Cloudflare Workers: the core of Baselime

Cloudflare Workers are now at the core of everything we do. All the data receptors and the processor run in Workers. Workers minimize cold-start times and are deployed globally by default. As such, developers always experience lower latency when emitting events to Baselime.

Additionally, we heavily use JavaScript-native RPC for data transfer between steps of the pipeline. It’s low-latency, lightweight, and simplifies communication between components. This further simplifies our architecture, as separate components behave more as functions within the same process, rather than completely separate applications.

export default {
  async fetch(request: Request, env: Bindings, ctx: ExecutionContext): Promise<Response> {
      try {
        const { err, apiKey } = auth(request);
        if (err) return err;

        const data = {
          workspaceId: apiKey.workspaceId,
          environmentId: apiKey.environmentId,
          events: request.body
        };
        await env.PROCESSOR.ingest(data);

        return success({ message: "Request Accepted" }, 202);
      } catch (error) {
        return failure({ message: "Internal Error" });
      }
  },
};

^{Code Block 1: Simplified data receptor using JavaScript-native RPC to execute the processor.}

Workers also expose a Rate Limiting binding that enables us to automatically add rate limiting to our services, which we previously had to build ourselves using a combination of DynamoDB and ElastiCache.

Moreover, we heavily use ctx.waitUntil within our Worker invocations, to offload data transformation outside the request / response path. This further reduces the latency of calls developers make to our data receptors.

Durable Objects: stateful data processing

Durable Objects is a unique service within the Cloudflare Developer Platform, as it enables building stateful applications in a serverless environment. We use Durable Objects in the data pipelines for both real-time error tracking and detecting log patterns.

For instance, to track errors in real-time, we create a durable object for each new type of error, and this durable object is responsible for keeping track of the frequency of the error, when to notify customers, and the notification channels for the error. This implementation with a single building block removes the need for ElastiCache, Kinesis, and multiple Lambda functions to coordinate protecting the RDS database from being overwhelmed by a high frequency error.

^{Figure 3: Real-time error detection architecture comparison}

Durable Objects gives us precise control over consistency and concurrency of managing state in the data pipeline.

In addition to the data pipeline, we use Durable Objects for alerting. Our previous architecture required orchestrating EventBridge Scheduler, SQS, DynamoDB and multiple AWS Lambda functions, whereas with Durable Objects, everything is handled within the alarm handler.

Workers Analytics Engine: high-cardinality analytics at scale

Though managing our own ClickHouse cluster was technically interesting and challenging, it took us away from building the best observability developer experience. With this migration, more of our time is spent enhancing our product and none is spent managing server instances.

Workers Analytics Engine lets us synchronously write events to a scalable high-cardinality analytics database. We built on top of the same technology that powers Workers Analytics Engine. We also made internal changes to Workers Analytics Engine to natively enable high dimensionality in addition to high cardinality.

Moreover, Workers Analytics Engine and our solution leverages Cloudflare’s ABR analytics. ABR stands for Adaptive Bit Rate, and enables us to store telemetry data in multiple tables with varying resolutions, from 100% to 0.0001% of the data. Querying the table with 0.0001% of the data will be several orders of magnitudes faster than the table with all the data, with a corresponding trade-off in accuracy. As such, when a query is sent to our systems, Workers Analytics Engine dynamically selects the most appropriate table to run the query, optimizing both query time and accuracy. Users always get the most accurate result with optimal query time, regardless of the size of their dataset or the timeframe of the query. Compared to our previous system, which was always running queries on the full dataset, the new system now delivers faster queries across our entire user base and use cases.

In addition to these core services (Workers, Durable Objects, Workers Analytics Engine), the new architecture leverages other building blocks from the Cloudflare Developer Platform. Queues for asynchronous messaging, decoupling services and enabling an event-driven architecture; D1 as our main database for transactional data (queries, alerts, dashboards, configurations, etc.); Workers KV for fast distributed storage; Hono for all our APIs, etc.

How did we migrate?

Baselime is built on an event-driven architecture, where every user action triggers an event. It operates on the principle that every user action is recorded as an event and emitted to the rest of the system — whether it’s creating a user, editing a dashboard, or performing any other action. Migrating to Cloudflare involved transitioning our event-driven architecture without compromising uptime and data consistency. Previously, this was powered by AWS EventBridge and SQS, and we moved entirely to Cloudflare Queues.

We followed the strangler fig pattern to incrementally migrate the solution from AWS to Cloudflare. It consists of gradually replacing specific parts of the system with newer services, with minimal disruption to the system. Early in the process, we created a central Cloudflare Queue which acted as the backbone for all transactional event processing during the migration. Every event, whether a new user signup or a dashboard edit, was funneled into this Queue. From there, events were dynamically routed, each event to the relevant part of the application. User actions were synced into D1 and KV, ensuring that all user actions were mirrored across both AWS and Cloudflare during the transition.

This syncing mechanism enabled us to maintain consistency and ensure that no data was lost as users continued to interact with Baselime.

Here’s an example of how events are processed:

export default {
  async queue(batch, env) {
    for (const message of batch.messages) {
      try {
        const event = message.body;
        switch (event.type) {
          case "WORKSPACE_CREATED":
            await workspaceHandler.create(env, event.data);
            break;
          case "QUERY_CREATED":
            await queryHandler.create(env, event.data);
            break;
          case "QUERY_DELETED":
            await queryHandler.remove(env, event.data);
            break;
          case "DASHBOARD_CREATED":
            await dashboardHandler.create(env, event.data);
            break;
          //
          // Many more events...
          //
          default:
            logger.info("Matched no events", { type: event.type });
        }
        message.ack();
      } catch (e) {
        if (message.attempts < 3) {
          message.retry({ delaySeconds: Math.ceil(30 ** message.attempts / 10), });
        } else {
          logger.error("Failed handling event - No more retrys", { event: message.body, attempts: message.attempts }, e);
        }
      }
    }
  },
} satisfies ExportedHandler<Env, InternalEvent>;

^{Code Block 2: Simplified internal events processing during migration.}

We migrated the data pipeline from AWS to Cloudflare with an outside-in method: we started with the data receptors and incrementally moved the data processor and the ClickHouse cluster to the new architecture. We began writing telemetry data (logs, metrics, traces, wide-events, etc.) to both ClickHouse (in AWS) and to Workers Analytics Engine simultaneously for the duration of the retention period (30 days).

The final step was rewriting all of our endpoints, previously hosted on AWS Lambda and ECS containers, into Cloudflare Workers. Once those Workers were ready, we simply switched the DNS records to point to the Workers instead of the existing Lambda functions.

Despite the complexity, the entire migration process, from the data pipeline to all re-writing API endpoints, took our then team of 3 engineers less than three months.

We ended up saving over 80% on our cloud bill

Savings on the data receptors

After switching the data receptors from AWS to Cloudflare in early June 2024, our AWS Lambda cost was reduced by over 85%. These costs were primarily driven by I/O time the receptors spent sending data to a Kinesis Data Stream in the same region.

^{Figure 4: Baselime daily AWS Lambda cost [note: the gap in data is the result of AWS Cost Explorer losing data when the parent organization of the cloud accounts was changed.]}

Moreover, we used Cloudfront to enable custom domains pointing to the data receptors. When we migrated the data receptors to Cloudflare, there was no need for Cloudfront anymore. As such, our Cloudfront cost was reduced to $0.

^{Figure 5: Baselime daily Cloudfront cost [note: the gap in data is the result of AWS Cost Explorer losing data when the parent organization of the cloud accounts was changed.]}

If we were a regular Cloudflare customer, we estimate that our daily Cloudflare Workers bill would be around \$25 after the switch, against \$790 on AWS: over 95% cost reduction. These savings are primarily driven by the Workers pricing model, since Workers charge for CPU time, and the receptors are primarily just moving data, and as such, are mostly I/O bound.

Savings on the ClickHouse cluster

To evaluate the cost impact of switching from self-hosting ClickHouse to using Workers Analytics Engine, we need to take into account not only the EC2 instances, but also the disk space, networking, and the Kinesis Data Stream cost.

We completed this switch in late August, achieving over 95% cost reduction in both the Kinesis Data Stream and all EC2 related costs.

^{Figure 6: Baselime daily Kinesis Data Stream cost [note: the gap in data is the result of AWS Cost Explorer losing data when the parent organization of the cloud accounts was changed.]}

^{Figure 7: Baselime daily EC2 cost [note: the gap in data is the result of AWS Cost Explorer losing data when the parent organization of the cloud accounts was changed.]}

If we were a regular Cloudflare customer, we estimate that our daily Workers Analytics Engine cost would be around \$300 after the switch, compared to \$1150 on AWS, a cost reduction of over 70%.

Not only did we significantly reduce costs by migrating to Cloudflare, but we also improved performance across the board. Responses to users are now faster, with real-time event ingestion happening across Cloudflare’s network, closer to our users. Responses to users querying their data are also much faster, thanks to Cloudflare’s deep expertise in operating ClickHouse at scale.

Most importantly, we’re no longer bound by limitations in throughput or scale. We launched Workers Logs on September 26, 2024, and our system now handles a much higher volume of events than before, with no sacrifices in speed or reliability.

These cost savings are outstanding as is, and do not include the total cost of ownership of those systems. We significantly simplified our systems and our codebase, as the platform is taking care of more for us. We’re paged less, we spend less time monitoring infrastructure, and we can focus on delivering product improvements.

Conclusion

Migrating Baselime to Cloudflare has transformed how we build and scale our platform. With Workers, Durable Objects, Workers Analytics Engine, and other services, we now run a fully serverless, globally distributed system that’s more cost-efficient and agile. This shift has significantly reduced our operational overhead and enabled us to iterate faster, delivering better observability tooling to our users.

You can start observing your Cloudflare Workers today with Workers Logs. Looking ahead, we’re excited about the features we will deliver directly in the Cloudflare Dashboard, including real-time error tracking, alerting, and a query builder for high-cardinality and dimensionality events. All coming by early 2025.

Workers Builds: integrated CI/CD built on the Workers platform

2024-10-31 Serena Shah-Simpson

Post Syndicated from Serena Shah-Simpson original https://blog.cloudflare.com/workers-builds-integrated-ci-cd-built-on-the-workers-platform

During 2024’s Birthday Week, we launched Workers Builds in open beta — an integrated Continuous Integration and Delivery (CI/CD) workflow you can use to build and deploy everything from full-stack applications built with the most popular frameworks to simple static websites onto the Workers platform. With Workers Builds, you can connect a GitHub or GitLab repository to a Worker, and Cloudflare will automatically build and deploy your changes each time you push a commit.

Workers Builds is intended to bridge the gap between the developer experiences for Workers and Pages, the latter of which launched with an integrated CI/CD system in 2020. As we continue to merge the experiences of Pages and Workers, we wanted to bring one of the best features of Pages to Workers: the ability to tie deployments to existing development workflows in GitHub and GitLab with minimal developer overhead.

In this post, we’re going to share how we built the Workers Builds system on Cloudflare’s Developer Platform, using Workers, Durable Objects, Hyperdrive, Workers Logs, and Smart Placement.

The design problem

The core problem for Workers Builds is how to pick up a commit from GitHub or GitLab and start a containerized job that can clone the repo, build the project, and deploy a Worker.

Pages solves a similar problem, and we were initially inclined to expand our existing architecture and tech stack, which includes a centralized configuration plane built on Go in Kubernetes. We also considered the ways in which the Workers ecosystem has evolved in the four years since Pages launched — we have since launched so many more tools built for use cases just like this!

The distributed nature of Workers offers some advantages over a centralized stack — we can spend less time configuring Kubernetes because Workers automatically handles failover and scaling. Ultimately, we decided to keep using what required no additional work to re-use from Pages (namely, the system for connecting GitHub/GitLab accounts to Cloudflare, and ingesting push events from them), and for the rest build out a new architecture on the Workers platform, with reliability and minimal latency in mind.

The Workers Builds system

We didn’t need to make any changes to the system that handles connections from GitHub/GitLab to Cloudflare and ingesting push events from them. That left us with two systems to build: the configuration plane for users to connect a Worker to a repo, and a build management system to run and monitor builds.

Client Worker

We can begin with our configuration plane, which consists of a simple Client Worker that implements a RESTful API (using Hono) and connects to a PostgreSQL database. It’s in this database that we store build configurations for our users, and through this Worker that users can view and manage their builds.

We use a Hyperdrive binding to connect to our database securely over Cloudflare Access (which also manages connection pooling and query caching).

We considered a more distributed data model (like D1, sharded by account), but ultimately decided that keeping our database in a datacenter more easily fit our use-case. The Workers Builds data model is relational — Workers belong to Cloudflare Accounts, and Builds belong to Workers — and build metadata must be consistent in order to properly manage build queues. We chose to keep our failover-ready database in a centralized datacenter and take advantage of two other Workers products, Smart Placement and Hyperdrive, in order to keep the benefits of a distributed control plane.

Everything that you see in the Cloudflare Dashboard related to Workers Builds is served by this Worker.

Build Management Worker

The more challenging problem we faced was how to run and manage user builds effectively. We wanted to support the same experience that we had achieved with Pages, which led to these key requirements:

Builds should be initiated with minimal latency.
The status of a build should be tracked and displayed through its entire lifecycle, starting when a user pushes a commit.
Customer build logs should be stored in a secure, private, and long-lived way.

To solve these problems, we leaned heavily into the technology of Durable Objects (DO).

We created a Build Management Worker with two DO classes: A Scheduler class to manage the scheduling of builds, and a class called BuildBuddy to manage individual builds. We chose to design our system this way for an efficient and scalable system. Since each build is assigned its own build manager DO, its operation won’t ever block other builds or the scheduler, meaning we can start up builds with minimal latency. Below, we dive into each of these Durable Objects classes.

Scheduler DO

The Scheduler DO class is relatively simple. Using Durable Objects Alarms, it is triggered every second to pull up a list of user build configurations that are ready to be started. For each of those builds, the Scheduler creates an instance of our other DO Class, the Build Buddy.

import { DurableObject } from 'cloudflare:workers'


export class BuildScheduler extends DurableObject {
   state: DurableObjectState
   env: Bindings


   constructor(ctx: DurableObjectState, env: Bindings) {
       super(ctx, env)
   }
   
   // The DO alarm handler will be called every second to fetch builds
   async alarm(): Promise<void> {
// set alarm to run again in 1 second
       await this.updateAlarm()


       const builds = await this.getBuildsToSchedule()
       await this.scheduleBuilds(builds)
   }


   async scheduleBuilds(builds: Builds[]): Promise<void> {
       // Don't schedule builds, if no builds to schedule
       if (builds.length === 0) return


       const queue = new PQueue({ concurrency: 6 })
       // Begin running builds
       builds.forEach((build) =>
           queue.add(async () => {
       	  // The BuildBuddy is another DO described more in the next section! 
               const bb = getBuildBuddy(this.env, build.build_id)
               await bb.startBuild(build)
           })
       )


       await queue.onIdle()
   }


   async getBuildsToSchedule(): Promise<Builds[]> {
       // returns list of builds to schedule
   }


   async updateAlarm(): Promise<void> {
// We want to ensure we aren't running multiple alarms at once, so we only set the next alarm if there isn’t already one set. 
       const existingAlarm = await this.ctx.storage.getAlarm()
       if (existingAlarm === null) {
           this.ctx.storage.setAlarm(Date.now() + 1000)
       }
   }
}

Build Buddy DO

The Build Buddy DO class is what we use to manage each individual build from the time it begins initializing to when it is stopped. Every build has a buddy for life!

Upon creation of a Build Buddy DO instance, the Scheduler immediately calls startBuild() on the instance. The startBuild() method is responsible for fetching all metadata and secrets needed to run a build, and then kicking off a build on Cloudflare’s container platform (not public yet, but coming soon!).

As the containerized build runs, it reports back to the Build Buddy, sending status updates and logs for the Build Buddy to deal with.

Build status

As a build progresses, it reports its own status back to Build Buddy, sending updates when it has finished initializing, has completed successfully, or been terminated by the user. The Build Buddy is responsible for handling this incoming information from the containerized build, writing status updates to the database (via a Hyperdrive binding) so that users can see the status of their build in the Cloudflare dashboard.

Build logs

A running build generates output logs that are important to store and surface to the user. The containerized build flushes these logs to the Build Buddy every second, which, in turn, stores those logs in DO storage.

The decision to use Durable Object storage here makes it easy to multicast logs to multiple clients efficiently, and allows us to use the same API for both streaming logs and viewing historical logs.

// build-management-app.ts

// We created a Hono app to for use by our Client Worker API
const app = new Hono<HonoContext>()
   .post(
       '/api/builds/:build_uuid/status',
       async (c) => {
           const buildStatus = await c.req.json()


           // fetch build metadata
           const build = ...


           const bb = getBuildBuddy(c.env, build.build_id)
           return await bb.handleStatusUpdate(build, statusUpdate)
       }
   )
   .post(
       '/api/builds/:build_uuid/logs',
       async (c) => {
           const logs = await c.req.json()
     // fetch build metadata
           const build = ...


           const bb = getBuildBuddy(c.env, build.build_id)
           return await bb.addLogLines(logs.lines)
       }
   )


export default {
   fetch: app.fetch
}

// build-buddy.ts

import { DurableObject } from 'cloudflare:workers'


export class BuildBuddy extends DurableObject {
   compute: WorkersBuildsCompute


   constructor(ctx: DurableObjectState, env: Bindings) {
       super(ctx, env)
       this.compute = new ComputeClient({
           // ...
       })
   }


   // The Scheduler DO calls startBuild upon creating a BuildBuddy instance
   startBuild(build: Build): void {
       this.startBuildAsync(build)         
   }


   async startBuildAsync(build: Build): Promise<void> {
       // fetch all necessary metadata build, including
	// environment variables, secrets, build tokens, repo credentials, 
// build image URI, etc
	// ...


	// start a containerized build
       const computeBuild = await this.compute.createBuild({
           // ...
       })
   }


   // The Build Management worker calls handleStatusUpdate when it receives an update
   // from the containerized build
   async handleStatusUpdate(
       build: Build,
       buildStatusUpdatePayload: Payload
   ): Promise<void> {
// Write status updates to the database
   }


   // The Build Management worker calls addLogLines when it receives flushed logs
   // from the containerized build
   async addLogLines(logs: LogLines): Promise<void> {
       // Generate nextLogsKey to store logs under      
       this.ctx.storage.put(nextLogsKey, logs)
   }


   // The Client Worker can call methods on a Build Buddy via RPC, using a service binding to the Build Management Worker.
   // The getLogs method retrieves logs for the user, and the cancelBuild method forwards a request from the user to terminate a build. 
   async getLogs(cursor: string){
       const decodedCursor = cursor !== undefined ? decodeLogsCursor(cursor) : undefined
       return await this.getLogs(decodedCursor)
   }


   async cancelBuild(compute_id: string, build_id: string): void{
      await this.terminateBuild(build_id, compute_id)
   }


   async terminateBuild(build_id: number, compute_id: string): Promise<void> {
       await this.compute.stopBuild(compute_id)
   }
}


   export function getBuildBuddy(
   env: Pick<Bindings, 'BUILD_BUDDY'>,
   build_id: number
): DurableObjectStub<BuildBuddy> {
   const id = env.BUILD_BUDDY.idFromName(build_id.toString())
   return env.BUILD_BUDDY.get(id)
}

Alarms

We utilize alarms in the Build Buddy to check that a build has a healthy startup and to terminate any builds that run longer than 20 minutes.

How else have we leveraged the Developer Platform?

Now that we’ve gone over the core behavior of the Workers Builds control plane, we’d like to detail a few other features of the Workers platform that we use to improve performance, monitor system health, and troubleshoot customer issues.

Smart Placement and location hints

While our control plane is distributed in the sense that it can be run across multiple datacenters, to reduce latency costs, we want most requests to be served from locations close to our primary database in the western US.

While a build is running, Build Buddy, a Durable Object, is continuously writing status updates to our database. For the Client and the Build Management API Workers, we enabled Smart Placement with location hints to ensure requests run close to the database.

This graph shows the reduction in round trip time (RTT) observed for our Worker with Smart Placement turned on.

Workers Logs

We needed a logging tool that allows us to aggregate and search across persistent operational logs from our Workers to assist with identifying and troubleshooting issues. We worked with the Workers Observability team to become early adopters of Workers Logs.

Workers Logs worked out of the box, giving us fast and easy to use logs directly within the Cloudflare dashboard. To improve our ability to search logs, we created a tagging library that allows us to easily add metadata like the git tag of the deployed worker that the log comes from, allowing us to filter logs by release.

See a shortened example below for how we handle and log errors on the Client Worker.

// client-worker-app.ts

// The Client Worker is a RESTful API built with Hono
const app = new Hono<HonoContext>()
   // This is from the workers-tagged-logger library - first we register the logger
   .use(useWorkersLogger('client-worker-app'))
   // If any error happens during execution, this middleware will ensure we log the error
   .onError(useOnError)
   // routes
   .get(
       '/apiv4/builds',
       async (c) => {
           const { ids } = c.req.query()
           return await getBuildsByIds(c, ids)
       }
   )


function useOnError(e: Error, c: Context<HonoContext>): Response {
   // Set the project identifier n the error
   logger.setTags({ release: c.env.GIT_TAG })
 
   // Write a log at level 'error'. Can also log 'info', 'log', 'warn', and 'debug'
   logger.error(e)
   return c.json(internal_error.toJSON(), internal_error.statusCode)
}

This setup can lead to the following sample log message from our Workers Log dashboard. You can see the release tag is set on the log.

We can get a better sense of the impact of the error by adding filters to the Workers Logs view, as shown below. We are able to filter on any of the fields since we’re logging with structured JSON.

R2

Coming soon to Workers Builds is build caching, used to store artifacts of a build for subsequent builds to reuse, such as package dependencies and build outputs. Build caching can speed up customer builds by avoiding the need to redownload dependencies from NPM or to rebuild projects from scratch. The cache itself will be backed by R2 storage.

Testing

We were able to build up a great testing story using Vitest and workerd — unit tests, cross-worker integration tests, the works. In the example below, we make use of the runInDurableObject stub from cloudflare:test to test instance methods on the Scheduler DO directly.

// scheduler.spec.ts

import { env, runInDurableObject } from 'cloudflare:test'
import { expect, test } from 'vitest'
import { BuildScheduler } from './scheduler'


test('getBuildsToSchedule() runs a queued build', async () => {
   // Our test harness creates a single build for our scheduler to pick up
   const { build } = await harness.createBuild()


   // We create a scheduler DO instance
   const id = env.BUILD_SCHEDULER.idFromName(crypto.randomUUID())
   const stub = env.BUILD_SCHEDULER.get(id)
   await runInDurableObject(stub, async (instance: BuildScheduler) => {
       expect(instance).toBeInstanceOf(BuildScheduler)


// We check that the scheduler picks up 1 build
       const builds = await instance.getBuildsToSchedule()
       expect(builds.length).toBe(1)
	
// We start the build, which should mark it as running
       await instance.scheduleBuilds(builds)
   })


   // Check that there are no more builds to schedule
   const queuedBuilds = ...
   expect(queuedBuilds.length).toBe(0)
})

We use SELF.fetch() from cloudflare:test to run integration tests on our Client Worker, as shown below. This integration test covers our Hono endpoint and database queries made by the Client Worker in retrieving the metadata of a build.

// builds_api.test.ts

import { env, SELF } from 'cloudflare:test'
   
it('correctly selects a single build', async () => {
   // Our test harness creates a randomized build to test with
   const { build } = await harness.createBuild()


   // We send a request to the Client Worker itself to fetch the build metadata
   const getBuild = await SELF.fetch(
       `https://example.com/builds/${build1.build_uuid}`,
       {
           method: 'GET',
           headers: new Headers({
               Authorization: `Bearer JWT`,
               'content-type': 'application/json',
           }),
       }
   )


   // We expect to receive a 200 response from our request and for the 
   // build metadata returned to match that of the random build that we created
   expect(getBuild.status).toBe(200)
   const getBuildV4Resp = await getBuild.json()
   const buildResp = getBuildV4Resp.result
   expect(buildResp).toBeTruthy()
   expect(buildResp).toEqual(build)
})

These tests run on the same runtime that Workers run on in production, meaning we have greater confidence that any code changes will behave as expected when they go live.

Analytics

We use the technology underlying the Workers Analytics Engine to collect all of the metrics for our system. We set up Grafana dashboards to display these metrics.

JavaScript-native RPC

JavaScript-native RPC was added to Workers in April of 2024, and it’s pretty magical. In the scheduler code example above, we call startBuild() on the BuildBuddy DO from the Scheduler DO. Without RPC, we would need to stand up routes on the BuildBuddy fetch() handler for the Scheduler to trigger with a fetch request. With RPC, there is almost no boilerplate — all we need to do is call a method on a class.

const bb = getBuildBuddy(this.env, build.build_id)


// Starting a build without RPC 😢
await bb.fetch('http://do/api/start_build', {
    method: 'POST',
    body: JSON.stringify(build),
})


// Starting a build with RPC 😸
await bb.startBuild(build)

Conclusion

By using Workers and Durable Objects, we were able to build a complex and distributed system that is easy to understand and is easily scalable.

It’s been a blast for our team to build on top of the very platform that we work on, something that would have been much harder to achieve on Workers just a few years ago. We believe in being Customer Zero for our own products — to identify pain points firsthand and to continuously improve the developer experience by applying them to our own use cases. It was fulfilling to have our needs as developers met by other teams and then see those tools quickly become available to the rest of the world — we were collaborators and internal testers for Workers Logs and private network support for Hyperdrive (both released on Birthday Week), and the soon to be released container platform.

Opportunities to build complex applications on the Developer Platform have increased in recent years as the platform has matured and expanded product offerings for more use cases. We hope that Workers Builds will be yet another tool in the Workers toolbox that enables developers to spend less time thinking about configuration and more time writing code.

Want to try it out? Check out the docs to learn more about how to deploy your first project with Workers Builds.

Solution overview

Pre-requisites

Implementation

Additional customization options

Considerations and best practices

Cleanup

Conclusion

About the Authors

Overview

Using the new features

Installing the extension

Using Application Builder for existing applications

Using the guided walkthrough

Creating an application from the samples

Building a synchronous Rest API application

Building the application

Iterate locally: invoke and debug

Local invoke

Local debugging

Deploying the application

Remote invoke

Viewing logs

Conclusion

Background

Using the L2

S3 Origin with Customer Managed KMS

Considerations when migrating to the new construct

Conclusion

Understanding Volkswagen Autoeuropa’s challenges

Envisioning a data solution for Volkswagen Autoeuropa

Solution overview

Data solution features

Data solution architecture

User journey

Data producer

Data consumer

Data solution administrator

Conclusion

About the Authors

New storage engines

Performance

Optimizer

Security

Replication

Logging

DDL enhancements

Other Features

Introduction

Initial architecture — all on AWS

Modern architecture — transitioning to Cloudflare

Cloudflare Workers: the core of Baselime

Durable Objects: stateful data processing

Workers Analytics Engine: high-cardinality analytics at scale

How did we migrate?

We ended up saving over 80% on our cloud bill

Savings on the data receptors

Savings on the ClickHouse cluster

Conclusion

The design problem

The Workers Builds system

Client Worker

Build Management Worker

Scheduler DO

Build Buddy DO

Build status

Build logs

Alarms

How else have we leveraged the Developer Platform?

Smart Placement and location hints

Workers Logs

R2

Testing

Analytics

JavaScript-native RPC

Conclusion

The collective thoughts of the interwebz