Cockpit project releases Cockpit Files plugin

2024-06-12 jzb

Post Syndicated from jzb original https://lwn.net/Articles/978156/

The Cockpit project has
announced
the first release of Cockpit
Files, a plugin for Cockpit that allows file management on your server
via a web browser:

Cockpit Files was initially started by Google Summer of Code (GSoC)
student Mahmoud Hamdy
and is now under active development by the Cockpit team. The goal is
to replace the functionality of the cockpit-navigator
plugin from 45Drives and include automated testing per commit, a
standard PatternFly-based interface, and consistency with the rest of
Cockpit.

Development builds for Fedora are available via a
Copr repository, and packages are expected for Arch, Debian, and
Fedora. LWN covered the
Cockpit project in March.

Ingest and analyze your data using Amazon OpenSearch Service with Amazon OpenSearch Ingestion

2024-06-12 Sharmila Shanmugam

Post Syndicated from Sharmila Shanmugam original https://aws.amazon.com/blogs/big-data/ingest-and-analyze-your-data-using-amazon-opensearch-service-with-amazon-opensearch-ingestion/

In today’s data-driven world, organizations are continually confronted with the task of managing extensive volumes of data securely and efficiently. Whether it’s customer information, sales records, or sensor data from Internet of Things (IoT) devices, the importance of handling and storing data at scale with ease of use is paramount.

A common use case that we see amongst customers is to search and visualize data. In this post, we show how to ingest CSV files from Amazon Simple Storage Service (Amazon S3) into Amazon OpenSearch Service using the Amazon OpenSearch Ingestion feature and visualize the ingested data using OpenSearch Dashboards.

OpenSearch Service is a fully managed, open source search and analytics engine that helps you with ingesting, searching, and analyzing large datasets quickly and efficiently. OpenSearch Service enables you to quickly deploy, operate, and scale OpenSearch clusters. It continues to be a tool of choice for a wide variety of use cases such as log analytics, real-time application monitoring, clickstream analysis, website search, and more.

OpenSearch Dashboards is a visualization and exploration tool that allows you to create, manage, and interact with visuals, dashboards, and reports based on the data indexed in your OpenSearch cluster.

Visualize data in OpenSearch Dashboards

Visualizing the data in OpenSearch Dashboards involves the following steps:

Ingest data – Before you can visualize data, you need to ingest the data into an OpenSearch Service index in an OpenSearch Service domain or Amazon OpenSearch Serverless collection and define the mapping for the index. You can specify the data types of fields and how they should be analyzed; if nothing is specified, OpenSearch Service automatically detects the data type of each field and creates a dynamic mapping for your index by default.
Create an index pattern – After you index the data into your OpenSearch Service domain, you need to create an index pattern that enables OpenSearch Dashboards to read the data stored in the domain. This pattern can be based on index names, aliases, or wildcard expressions. You can configure the index pattern by specifying the timestamp field (if applicable) and other settings that are relevant to your data.
Create visualizations – You can create visuals that represent your data in meaningful ways. Common types of visuals include line charts, bar charts, pie charts, maps, and tables. You can also create more complex visualizations like heatmaps and geospatial representations.

Ingest data with OpenSearch Ingestion

Ingesting data into OpenSearch Service can be challenging because it involves a number of steps, including collecting, converting, mapping, and loading data from different data sources into your OpenSearch Service index. Traditionally, this data was ingested using integrations with Amazon Data Firehose, Logstash, Data Prepper, Amazon CloudWatch, or AWS IoT.

The OpenSearch Ingestion feature of OpenSearch Service introduced in April 2023 makes ingesting and processing petabyte-scale data into OpenSearch Service straightforward. OpenSearch Ingestion is a fully managed, serverless data collector that allows you to ingest, filter, enrich, and route data to an OpenSearch Service domain or OpenSearch Serverless collection. You configure your data producers to send data to OpenSearch Ingestion, which automatically delivers the data to the domain or collection that you specify. You can configure OpenSearch Ingestion to transform your data before delivering it.

OpenSearch Ingestion scales automatically to meet the requirements of your most demanding workloads, helping you focus on your business logic while abstracting away the complexity of managing complex data pipelines. It’s powered by Data Prepper, an open source streaming Extract, Transform, Load (ETL) tool that can filter, enrich, transform, normalize, and aggregate data for downstream analysis and visualization.

OpenSearch Ingestion uses pipelines as a mechanism that consists of three major components:

Source – The input component of a pipeline. It defines the mechanism through which a pipeline consumes records.
Processors – The intermediate processing units that can filter, transform, and enrich records into a desired format before publishing them to the sink. The processor is an optional component of a pipeline.
Sink – The output component of a pipeline. It defines one or more destinations to which a pipeline publishes records. A sink can also be another pipeline, which allows you to chain multiple pipelines together.

You can process data files written in S3 buckets in two ways: by processing the files written to Amazon S3 in near real time using Amazon Simple Queue Service (Amazon SQS), or with the scheduled scans approach, in which you process the data files in batches using one-time or recurring scheduled scan configurations.

In the following section, we provide an overview of the solution and guide you through the steps to ingest CSV files from Amazon S3 into OpenSearch Service using the S3-SQS approach in OpenSearch Ingestion. Additionally, we demonstrate how to visualize the ingested data using OpenSearch Dashboards.

Solution overview

The following diagram outlines the workflow of ingesting CSV files from Amazon S3 into OpenSearch Service.

solution_overview

The workflow comprises the following steps:

The user uploads CSV files into Amazon S3 using techniques such as direct upload on the AWS Management Console or AWS Command Line Interface (AWS CLI), or through the Amazon S3 SDK.
Amazon SQS receives an Amazon S3 event notification as a JSON file with metadata such as the S3 bucket name, object key, and timestamp.
The OpenSearch Ingestion pipeline receives the message from Amazon SQS, loads the files from Amazon S3, and parses the CSV data from the message into columns. It then creates an index in the OpenSearch Service domain and adds the data to the index.
Lastly, you create an index pattern and visualize the ingested data using OpenSearch Dashboards.

OpenSearch Ingestion provides a serverless ingestion framework to effortlessly ingest data into OpenSearch Service with just a few clicks.

Prerequisites

Make sure you meet the following prerequisites:

You must have access to the AWS account in which you wish to set up this solution.
You must have created an OpenSearch Service domain. For instructions, refer to Creating and managing Amazon OpenSearch Service domains.

Create an SQS queue

Amazon SQS offers a secure, durable, and available hosted queue that lets you integrate and decouple distributed software systems and components. Create a standard SQS queue and provide a descriptive name for the queue, then update the access policy by navigating to the Amazon SQS console, opening the details of your queue, and editing the policy on the Advanced tab.

The following is a sample access policy you could use for reference to update the access policy:

{
  "Version": "2008-10-17",
  "Id": "example-ID",
  "Statement": [
    {
      "Sid": "example-statement-ID",
      "Effect": "Allow",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Action": "SQS:SendMessage",
      "Resource": "<SQS_QUEUE_ARN>"
    }
  ]
}

SQS FIFO (First-In-First-Out) queues aren’t supported as an Amazon S3 event notification destination. To send a notification for an Amazon S3 event to an SQS FIFO queue, you can use Amazon EventBridge.

create_sqs_queue

Create an S3 bucket and enable Amazon S3 event notification

Create an S3 bucket that will be the source for CSV files and enable Amazon S3 notifications. The Amazon S3 notification invokes an action in response to a specific event in the bucket. In this workflow, whenever there in an event of type S3:ObjectCreated:*, the event sends an Amazon S3 notification to the SQS queue created in the previous step. Refer to Walkthrough: Configuring a bucket for notifications (SNS topic or SQS queue) to configure the Amazon S3 notification in your S3 bucket.

create_s3_bucket

Create an IAM policy for the OpenSearch Ingest pipeline

Create an AWS Identity and Access Management (IAM) policy for the OpenSearch pipeline with the following permissions:

Read and delete rights on Amazon SQS
GetObject rights on Amazon S3
Describe domain and ESHttp rights on your OpenSearch Service domain

The following is an example policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "es:DescribeDomain",
      "Resource": "<OPENSEARCH_SERVICE_DOMAIN_ENDPOINT>:domain/*"
    },
    {
      "Effect": "Allow",
      "Action": "es:ESHttp*",
      "Resource": "<OPENSEARCH_SERVICE_DOMAIN_ENDPOINT>/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "<S3_BUCKET_ARN>/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sqs:DeleteMessage",
        "sqs:ReceiveMessage"
      ],
      "Resource": "<SQS_QUEUE_ARN>"
    }
  ]
}

create_policy

Create an IAM role and attach the IAM policy

A trust relationship defines which entities (such as AWS accounts, IAM users, roles, or services) are allowed to assume a particular IAM role. Create an IAM role for the OpenSearch Ingestion pipeline (osis-pipelines.amazonaws.com), attach the IAM policy created in the previous step, and add the trust relationship to allow OpenSearch Ingestion pipelines to write to domains.

create_iam_role

Configure an OpenSearch Ingestion pipeline

A pipeline is the mechanism that OpenSearch Ingestion uses to move data from its source (where the data comes from) to its sink (where the data goes). OpenSearch Ingestion provides out-of-the-box configuration blueprints to help you quickly set up pipelines without having to author a configuration from scratch. Set up the S3 bucket as the source and OpenSearch Service domain as the sink in the OpenSearch Ingestion pipeline with the following blueprint:

version: '2'
s3-pipeline:
  source:
    s3:
      acknowledgments: true
      notification_type: sqs
      compression: automatic
      codec:
        newline: 
          #header_destination: <column_names>
      sqs:
        queue_url: <SQS_QUEUE_URL>
      aws:
        region: <AWS_REGION>
        sts_role_arn: <STS_ROLE_ARN>
  processor:
    - csv:
        column_names_source_key: column_names
        column_names:
          - row_id
          - order_id
          - order_date
          - date_key
          - contact_name
          - country
          - city
          - region
          - sub_region
          - customer
          - customer_id
          - industry
          - segment
          - product
          - license
          - sales
          - quantity
          - discount
          - profit
    - convert_entry_type:
        key: sales
        type: double
    - convert_entry_type:
        key: profit
        type: double
    - convert_entry_type:
        key: discount
        type: double
    - convert_entry_type:
        key: quantity
        type: integer
    - date:
        match:
          - key: order_date
            patterns:
              - MM/dd/yyyy
        destination: order_date_new
  sink:
    - opensearch:
        hosts:
          - <OPEN_SEARCH_SERVICE_DOMAIN_ENDPOINT>
        index: csv-ingest-index
        aws:
          sts_role_arn: <STS_ROLE_ARN>
          region: <AWS_REGION>

On the OpenSearch Service console, create a pipeline with the name my-pipeline. Keep the default capacity settings and enter the preceding pipeline configuration in the Pipeline configuration section.

Update the configuration setting with the previously created IAM roles to read from Amazon S3 and write into OpenSearch Service, the SQS queue URL, and the OpenSearch Service domain endpoint.

create_pipeline

Validate the solution

To validate this solution, you can use the dataset SaaS-Sales.csv. This dataset contains transaction data from a software as a service (SaaS) company selling sales and marketing software to other companies (B2B). You can initiate this workflow by uploading the SaaS-Sales.csv file to the S3 bucket. This invokes the pipeline and creates an index in the OpenSearch Service domain you created earlier.

Follow these steps to validate the data using OpenSearch Dashboards.

First, you create an index pattern. An index pattern is a way to define a logical grouping of indexes that share a common naming convention. This allows you to search and analyze data across all matching indexes using a single query or visualization. For example, if you named your indexes csv-ingest-index-2024-01-01 and csv-ingest-index-2024-01-02 while ingesting the monthly sales data, you can define an index pattern as csv-* to encompass all these indexes.

create_index_pattern

Next, you create a visualization. Visualizations are powerful tools to explore and analyze data stored in OpenSearch indexes. You can gather these visualizations into a real time OpenSearch dashboard. An OpenSearch dashboard provides a user-friendly interface for creating various types of visualizations such as charts, graphs, maps, and dashboards to gain insights from data.

You can visualize the sales data by industry with a pie chart with the index pattern created in the previous step. To create a pie chart, update the metrics details as follows on the Data tab:

Set Metrics to Slice
Set Aggregation to Sum
Set Field to sales

create_dashboard

To view the industry-wise sales details in the pie chart, add a new bucket on the Data tab as follows:

Set Buckets to Split Slices
Set Aggregation to Terms
Set Field to industry.keyword

create_pie_chart

You can visualize the data by creating more visuals in the OpenSearch dashboard.

add_visuals

Clean up

When you’re done exploring OpenSearch Ingestion and OpenSearch Dashboards, you can delete the resources you created to avoid incurring further costs.

Conclusion

In this post, you learned how to ingest CSV files efficiently from S3 buckets into OpenSearch Service with the OpenSearch Ingestion feature in a serverless way without requiring a third-party agent. You also learned how to analyze the ingested data using OpenSearch dashboard visualizations. You can now explore extending this solution to build OpenSearch Ingestion pipelines to load your data and derive insights with OpenSearch Dashboards.

About the Authors

Sharmila Shanmugam is a Solutions Architect at Amazon Web Services. She is passionate about solving the customers’ business challenges with technology and automation and reduce the operational overhead. In her current role, she helps customers across industries in their digital transformation journey and build secure, scalable, performant and optimized workloads on AWS.

Harsh Bansal is an Analytics Solutions Architect with Amazon Web Services. In his role, he collaborates closely with clients, assisting in their migration to cloud platforms and optimizing cluster setups to enhance performance and reduce costs. Before joining AWS, he supported clients in leveraging OpenSearch and Elasticsearch for diverse search and log analytics requirements.

Rohit Kumar works as a Cloud Support Engineer in the Support Engineering team at Amazon Web Services. He focuses on Amazon OpenSearch Service, offering guidance and technical help to customers, helping them create scalable, highly available, and secure solutions on AWS Cloud. Outside of work, Rohit enjoys watching or playing cricket. He also loves traveling and discovering new places. Essentially, his routine revolves around eating, traveling, cricket, and repeating the cycle.

How Swisscom automated Amazon Redshift as part of their One Data Platform solution using AWS CDK – Part 2

2024-06-12 Asad Bin Imtiaz

Post Syndicated from Asad Bin Imtiaz original https://aws.amazon.com/blogs/big-data/how-swisscom-automated-amazon-redshift-as-part-of-their-one-data-platform-solution-using-aws-cdk-part-2/

In this series, we talk about Swisscom’s journey of automating Amazon Redshift provisioning as part of the Swisscom One Data Platform (ODP) solution using the AWS Cloud Development Kit (AWS CDK), and we provide code snippets and the other useful references.

In Part 1, we did a deep dive on provisioning a secure and compliant Redshift cluster using the AWS CDK and the best practices of secret rotation. We also explained how Swisscom used AWS CDK custom resources to automate the creation of dynamic user groups that are relevant for the AWS Identity and Access Management (IAM) roles matching different job functions.

In this post, we explore using the AWS CDK and some of the key topics for self-service usage of the provisioned Redshift cluster by end-users as well as other managed services and applications. These topics include federation with the Swisscom identity provider (IdP), JDBC connections, detective controls using AWS Config rules and remediation actions, cost optimization using the Redshift scheduler, and audit logging.

Scheduled actions

To optimize cost-efficiency for provisioned Redshift cluster deployments, Swisscom implemented a scheduling mechanism. This functionality is driven by the user configuration of the cluster, as described in Part 1 of this series, wherein the user may enable dynamic pausing and resuming of clusters based on specified cron expressions:

redshift_options:
...
  use_scheduler: true                                         # Whether to use Redshift scheduler
  scheduler_pause_cron: "cron(00 18 ? * MON-FRI *)"           # Cron expression for scheduler pause
  scheduler_resume_cron: "cron(00 08 ? * MON-FRI *)"          # Cron expression for scheduler resume
...

This feature allows Swisscom to reduce operational costs by suspending cluster activity during off-peak hours. This leads to significant cost savings by pausing and resuming clusters at appropriate times. The scheduling is achieved using the AWS CloudFormation action CfnScheduledAction. The following code illustrates how Swisscom implemented this scheduling:

if config.use_scheduler:
    cfn_scheduled_action_pause = aws_redshift.CfnScheduledAction(
        scope, "schedule-pause-action",
        # ...
        schedule=config.scheduler_pause_cron,
        # ...
        target_action=aws_redshift.CfnScheduledAction.ScheduledActionTypeProperty(
                         pause_cluster=aws_redshift.CfnScheduledAction.ResumeClusterMessageProperty(
                            cluster_identifier='cluster-identifier'
                         )
                      )
    )

    cfn_scheduled_action_resume = aws_redshift.CfnScheduledAction(
        scope, "schedule-resume-action",
        # ...
        schedule=config.scheduler_resume_cron,
        # ...
        target_action=aws_redshift.CfnScheduledAction.ScheduledActionTypeProperty(
                         resume_cluster=aws_redshift.CfnScheduledAction.ResumeClusterMessageProperty(
                            cluster_identifier='cluster-identifier'
                         )
                      )
    )

JDBC connections

The JDBC connectivity for Amazon Redshift clusters was also very flexible, adapting to user-defined subnet types and security groups in the configuration:

redshift_options:
...
  subnet_type: "routable-private"         # 'routable-private' OR 'non-routable-private'
  security_group_id: "sg-test_redshift"   # Security Group ID for Amazon Redshift (referenced group must exists in Account)
...

As illustrated in the ODP architecture diagram in Part 1 of this series, a considerable part of extract, transform, and load (ETL) processes is anticipated to operate outside of Amazon Redshift, within the serverless AWS Glue environment. Given this, Swisscom needed a mechanism for AWS Glue to connect to Amazon Redshift. This connectivity to Redshift clusters is provided through JDBC by creating an AWS Glue connection within the AWS CDK code. This connection allows ETL processes to interact with the Redshift cluster by establishing a JDBC connection. The subnet and security group defined in the user configuration guide the creation of JDBC connectivity. If no security groups are defined in the configuration, a default one is created. The connection is configured with details of the data product from which the Redshift cluster is being provisioned, like ETL user and default database, along with network elements like cluster endpoint, security group, and subnet to use, providing secure and efficient data transfer. The following code snippet demonstrates how this was achieved:

jdbc_connection = glue.Connection(
    scope, "redshift-glue-connection",
    type=ConnectionType("JDBC"),
    connection_name="redshift-glue-connection",
    subnet=connection_subnet,
    security_groups=connection_security_groups,
    properties={
        "JDBC_CONNECTION_URL": f"jdbc:redshift://{cluster_endpoint}/{database_name}",
        "USERNAME": etl_user.username,
        "PASSWORD": etl_user.password.to_string(),
        "redshiftTmpDir": f"s3://{data_product_name}-redshift-work"
    }
)

By doing this, Swisscom made sure that serverless ETL workflows in AWS Glue can securely communicate with newly provisioned Redshift cluster running within a secured virtual private cloud (VPC).

Identity federation

Identity federation allows a centralized system (the IdP) to be used for authenticating users in order to access a service provider like Amazon Redshift. A more general overview of the topic can be found in Identity Federation in AWS.

Identity federation not only enhances security due to its centralized user lifecycle management and centralized authentication mechanism (for example, supporting multi-factor authentication), but also improves the user experience and reduces the overall complexity of identity and access management and thereby also its governance.

In Swisscom’s setup, Microsoft Active Directory Services are used for identity and access management. At the initial build stages of ODP, Amazon Redshift offered two different options for identity federation:

IAM-based SAML 2.0 IdP federation, as outlined in Federate Amazon Redshift access with Microsoft Azure AD single sign-on. See also Using IAM authentication to generate database user credentials.
Native IdP federation, as outlined in Integrate Amazon Redshift native IdP federation with Microsoft Azure AD using a SQL client. See also Native identity provider (IdP) federation for Amazon Redshift.

In Swisscom’s context, during the initial implementation, Swisscom opted for IAM-based SAML 2.0 IdP federation because this is a more general approach, which can also be used for other AWS services, such as Amazon QuickSight (see Setting up IdP federation using IAM and QuickSight).

At 2023 AWS re:Invent, AWS announced a new connection option to Amazon Redshift based on AWS IAM Identity Center. IAM Identity Center provides a single place for workforce identities in AWS, allowing the creation of users and groups directly within itself or by federation with standard IdPs like Okta, PingOne, Microsoft Entra ID (Azure AD), or any IdP that supports SAML 2.0 and SCIM. It also provides a single sign-on (SSO) experience for Redshift features and other analytics services such as Amazon Redshift Query Editor V2 (see Integrate Identity Provider (IdP) with Amazon Redshift Query Editor V2 using AWS IAM Identity Center for seamless Single Sign-On), QuickSight, and AWS Lake Formation. Moreover, a single IAM Identity Center instance can be shared with multiple Redshift clusters and workgroups with a simple auto-discovery and connect capability. It makes sure all Redshift clusters and workgroups have a consistent view of users, their attributes, and groups. This whole setup fits well with ODP’s vision of providing self-service analytics across the Swisscom workforce with necessary security controls in place. At the time of writing, Swisscom is actively working towards using IAM Identity Center as the standard federation solution for ODP. The following diagram illustrates the high-level architecture for the work in progress.

Audit logging

Amazon Redshift audit logging is useful for auditing for security purposes, monitoring, and troubleshooting. The logging provides information, such as the IP address of the user’s computer, the type of authentication used by the user, or the timestamp of the request. Amazon Redshift logs the SQL operations, including connection attempts, queries, and changes, and makes it straightforward to track the changes. These logs can be accessed through SQL queries against system tables, saved to a secure Amazon Simple Storage Service (Amazon S3) location, or exported to Amazon CloudWatch.

Amazon Redshift logs information in the following log files:

Connection log – Provides information to monitor users connecting to the database and related connection information like their IP address.
User log – Logs information about changes to database user definitions.
User activity log – Tracks information about the types of queries that both the users and the system perform in the database. It’s useful primarily for troubleshooting purposes.

With the ODP solution, Swisscom wanted to write all the Amazon Redshift logs to CloudWatch. This is currently not directly supported by the AWS CDK, so Swisscom implemented a workaround solution using the AWS CDK custom resources option, which invokes the SDK on the Redshift action enableLogging. See the following code:

    custom_resources.AwsCustomResource(self, f"{self.cluster_identifier}-custom-sdk-logging",
           on_update=custom_resources.AwsSdkCall(
               service="Redshift",
               action="enableLogging",
               parameters={
                   "ClusterIdentifier": self.cluster_identifier,
                   "LogDestinationType": "cloudwatch",
                   "LogExports": ["connectionlog","userlog","useractivitylog"],
               },
               physical_resource_id=custom_resources.PhysicalResourceId.of(
                   f"{self.account}-{self.region}-{self.cluster_identifier}-logging")
           ),
           policy=custom_resources.AwsCustomResourcePolicy.from_sdk_calls(
               resources=[f"arn:aws:redshift:{self.region}:{self.account}:cluster:{self.cluster_identifier}"]
           )
        )

AWS Config rules and remediation

After a Redshift cluster has been deployed, Swisscom needed to make sure that the cluster meets the governance rules defined in every point in time after creation. For that, Swisscom decided to use AWS Config.

AWS Config provides a detailed view of the configuration of AWS resources in your AWS account. This includes how the resources are related to one another and how they were configured in the past so you can see how the configurations and relationships change over time.

An AWS resource is an entity you can work with in AWS, such as an Amazon Elastic Compute Cloud (Amazon EC2) instance, Amazon Elastic Block Store (Amazon EBS) volume, security group, or Amazon VPC.

The following diagram illustrates the process Swisscom implemented.

If an AWS Config rule isn’t compliant, a remediation can be applied. Swisscom defined the pause cluster action as default in case of a non-compliant cluster (based on your requirements, other remediation actions are possible). This is covered using an AWS Systems Manager automation document (SSM document).

Automation, a capability of Systems Manager, simplifies common maintenance, deployment, and remediation tasks for AWS services like Amazon EC2, Amazon Relational Database Service (Amazon RDS), Amazon Redshift, Amazon S3, and many more.

The SSM document is based on the AWS document AWSConfigRemediation-DeleteRedshiftCluster. It looks like the following code:

description: | 
  ### Document name - PauseRedshiftCluster-WithCheck 

  ## What does this document do? 
  This document pauses the given Amazon Redshift cluster using the [PauseCluster](https://docs.aws.amazon.com/redshift/latest/APIReference/API_PauseCluster.html) API. 

  ## Input Parameters 
  * AutomationAssumeRole: (Required) The ARN of the role that allows Automation to perform the actions on your behalf. 
  * ClusterIdentifier: (Required) The identifier of the Amazon Redshift Cluster. 

  ## Output Parameters 
  * PauseRedshiftClusterWithoutSnapShot.Response: The standard HTTP response from the PauseCluster API. 
  * PauseRedshiftClusterWithSnapShot.Response: The standard HTTP response from the PauseCluster API. 
schemaVersion: '0.3' 
assumeRole: '{{ AutomationAssumeRole }}' 
parameters: 
  AutomationAssumeRole: 
    type: String 
    description: (Required) The ARN of the role that allows Automation to perform the actions on your behalf. 
    allowedPattern: '^arn:aws[a-z0-9-]*:iam::\d{12}:role\/[\w-\/.@+=,]{1,1017}$' 
  ClusterIdentifier: 
    type: String 
    description: (Required) The identifier of the Amazon Redshift Cluster. 
    allowedPattern: '[a-z]{1}[a-z0-9_.-]{0,62}' 
mainSteps: 
  - name: GetRedshiftClusterStatus 
    action: 'aws:executeAwsApi' 
    inputs: 
      ClusterIdentifier: '{{ ClusterIdentifier }}' 
      Service: redshift 
      Api: DescribeClusters 
    description: |- 
      ## GetRedshiftClusterStatus 
      Gets the status for the given Amazon Redshift Cluster. 
    outputs: 
      - Name: ClusterStatus 
        Selector: '$.Clusters[0].ClusterStatus' 
        Type: String 
    timeoutSeconds: 600 
  - name: Condition 
    action: 'aws:branch' 
    inputs: 
      Choices: 
        - NextStep: PauseRedshiftCluster 
          Variable: '{{ GetRedshiftClusterStatus.ClusterStatus }}' 
          StringEquals: available 
      Default: Finish 
  - name: PauseRedshiftCluster 
    action: 'aws:executeAwsApi' 
    description: | 
      ## PauseRedshiftCluster 
      Makes PauseCluster API call using Amazon Redshift Cluster identifier and pauses the cluster without taking any final snapshot. 
      ## Outputs 
      * Response: The standard HTTP response from the PauseCluster API. 
    timeoutSeconds: 600 
    isEnd: false 
    nextStep: VerifyRedshiftClusterPause 
    inputs: 
      Service: redshift 
      Api: PauseCluster 
      ClusterIdentifier: '{{ ClusterIdentifier }}' 
    outputs: 
      - Name: Response 
        Selector: $ 
        Type: StringMap 
  - name: VerifyRedshiftClusterPause 
    action: 'aws:assertAwsResourceProperty' 
    timeoutSeconds: 600 
    isEnd: true 
    description: | 
      ## VerifyRedshiftClusterPause 
      Verifies the given Amazon Redshift Cluster is paused. 
    inputs: 
      Service: redshift 
      Api: DescribeClusters 
      ClusterIdentifier: '{{ ClusterIdentifier }}' 
      PropertySelector: '$.Clusters[0].ClusterStatus' 
      DesiredValues: 
        - pausing 
  - name: Finish 
    action: 'aws:sleep' 
    inputs: 
      Duration: PT1S 
    isEnd: true

The SSM automations document is deployed with the AWS CDK:

from aws_cdk import aws_ssm as ssm  

ssm_document_content = #read yaml document as dict  

document_id = 'automation_id'   
document_name = 'automation_name' 

document = ssm.CfnDocument(scope, id=document_id, content=ssm_document_content,  
                           document_format="YAML", document_type='Automation', name=document_name) 

To run the automation document, AWS Config needs the right permissions. You can create an IAM role for this purpose:

from aws_cdk import iam 

#Create role for the automation 
role_name = 'role-to-pause-redshift'
automation_role = iam.Role(scope, 'role-to-pause-redshift-cluster', 
                           assumed_by=iam.ServicePrincipal('ssm.amazonaws.com'), 
                           role_name=role_name) 

automation_policy = iam.Policy(scope, "policy-to-pause-cluster", 
                               policy_name='policy-to-pause-cluster', 
                               statements=[ 
                                   iam.PolicyStatement( 
                                       effect=iam.Effect.ALLOW, 
                                       actions=['redshift:PauseCluster', 
                                                'redshift:DescribeClusters'], 
                                       resources=['*'] 
                                   ) 
                               ]) 

automation_role.attach_inline_policy(automation_policy)

Swisscom defined the rules to be applied following AWS best practices (see Security Best Practices for Amazon Redshift). These are deployed as AWS Config conformance packs. A conformance pack is a collection of AWS Config rules and remediation actions that can be quickly deployed as a single entity in an AWS account and AWS Region or across an organization in AWS Organizations.

Conformance packs are created by authoring YAML templates that contain the list of AWS Config managed or custom rules and remediation actions. You can also use SSM documents to store your conformance pack templates on AWS and directly deploy conformance packs using SSM document names.

This AWS conformance pack can be deployed using the AWS CDK:

from aws_cdk import aws_config  
  
conformance_pack_template = # read yaml file as str 
conformance_pack_content = # substitute `role_arn_for_substitution` and `document_for_substitution` in conformance_pack_template

conformance_pack_id = 'conformance-pack-id' 
conformance_pack_name = 'conformance-pack-name' 


conformance_pack = aws_config.CfnConformancePack(scope, id=conformance_pack_id, 
                                                 conformance_pack_name=conformance_pack_name, 
                                                 template_body=conformance_pack_content)

Conclusion

Swisscom is building its next-generation data-as-a-service platform through a combination of automated provisioning processes, advanced security features, and user-configurable options to cater for diverse data handling and data products’ needs. The integration of the Amazon Redshift construct in the ODP framework is a significant stride in Swisscom’s journey towards a more connected and data-driven enterprise landscape.

In Part 1 of this series, we demonstrated how to provision a secure and compliant Redshift cluster using the AWS CDK as well as how to deal with the best practices of secret rotation. We also showed how to use AWS CDK custom resources in automating the creation of dynamic user groups that are relevant for the IAM roles matching different job functions.

In this post, we showed, through the usage of the AWS CDK, how to address key Redshift cluster usage topics such as federation with the Swisscom IdP, JDBC connections, detective controls using AWS Config rules and remediation actions, cost optimization using the Redshift scheduler, and audit logging.

The code snippets in this post are provided as is and will need to be adapted to your specific use cases. Before you get started, we highly recommend speaking to an Amazon Redshift specialist.

About the Authors

Asad bin Imtiaz is an Expert Data Engineer at Swisscom, with over 17 years of experience in architecting and implementing enterprise-level data solutions.

Jesús Montelongo Hernández is an Expert Cloud Data Engineer at Swisscom. He has over 20 years of experience in IT systems, data warehousing, and data engineering.

Samuel Bucheli is a Lead Cloud Architect at Zühlke Engineering AG. He has over 20 years of experience in software engineering, software architecture, and cloud architecture.

Srikanth Potu is a Senior Consultant in EMEA, part of the Professional Services organization at Amazon Web Services. He has over 25 years of experience in Enterprise data architecture, databases and data warehousing.

How Swisscom automated Amazon Redshift as part of their One Data Platform solution using AWS CDK – Part 1

2024-06-12 Asad Bin Imtiaz

Post Syndicated from Asad Bin Imtiaz original https://aws.amazon.com/blogs/big-data/how-swisscom-automated-amazon-redshift-as-part-of-their-one-data-platform-solution-using-aws-cdk-part-1/

Swisscom is a leading telecommunications provider in Switzerland. Swisscom’s Data, Analytics, and AI division is building a One Data Platform (ODP) solution that will enable every Swisscom employee, process, and product to benefit from the massive value of Swisscom’s data.

In a two-part series, we talk about Swisscom’s journey of automating Amazon Redshift provisioning as part of the Swisscom ODP solution using the AWS Cloud Development Kit (AWS CDK), and we provide code snippets and the other useful references.

In this post, we deep dive into provisioning a secure and compliant Redshift cluster using the AWS CDK and discuss the best practices of secret rotation. We also explain how Swisscom used AWS CDK custom resources in automating the creation of dynamic user groups that are relevant for the AWS Identity and Access management (IAM) roles matching different job functions.

In Part 2 of this series, we explore using the AWS CDK and some of the key topics for self-service usage of the provisioned Redshift cluster by end-users as well as other managed services and applications. These topics include federation with the Swisscom identity provider (IdP), JDBC connections, detective controls using AWS Config rules and remediation actions, cost optimization using the Redshift scheduler, and audit logging.

Amazon Redshift is a fast, scalable, secure, fully managed, and petabyte scale data warehousing service empowering organizations and users to analyze massive volumes of data using standard SQL tools. Amazon Redshift benefits from seamless integration with many AWS services, such as Amazon Simple Storage Service (Amazon S3), AWS Key Management Service (AWS KMS), IAM, and AWS Lake Formation, to name a few.

The AWS CDK helps you build reliable, scalable, and cost-effective applications in the cloud with the considerable expressive power of a programming language. The AWS CDK supports TypeScript, JavaScript, Python, Java, C#/.Net, and Go. Developers can use one of these supported programming languages to define reusable cloud components known as constructs. A data product owner in Swisscom can use the ODP AWS CDK libraries with a simple config file to provision ready-to-use infrastructure, such as S3 buckets; AWS Glue ETL (extract, transform, and load) jobs, Data Catalog databases, and crawlers; Redshift clusters; JDBC connections; and more, with all the needed permissions in just a few minutes.

One Data Platform

The ODP architecture is based on the AWS Well Architected Framework Analytics Lens and follows the pattern of having raw, standardized, conformed, and enriched layers as described in Modern data architecture. By using infrastructure as code (IaC) tools, ODP enables self-service data access with unified data management, metadata management (data catalog), and standard interfaces for analytics tools with a high degree of automation by providing the infrastructure, integrations, and compliance measures out of the box. At the same time, the ODP will also be continuously evolving and adapting to the constant stream of new additional features being added to the AWS analytics services. The following high-level architecture diagram shows ODP with different layers of the modern data architecture. In this series, we specifically discuss the components specific to Amazon Redshift (highlighted in red).

Harnessing Amazon Redshift for ODP

A pivotal decision in the data warehousing migration process involves evaluating the extent of a lift-and-shift approach vs. re-architecture. Balancing system performance, scalability, and cost while taking into account the rigid system pieces requires a strategic solution. In this context, Amazon Redshift has stood out as a cloud-centered data warehousing solution, especially with its straightforward and seamless integration into the modern data architecture. Its straightforward integration and fluid compatibility with AWS services like Amazon QuickSight, Amazon SageMaker, and Lake Formation further solidifies its choice for forward-thinking data warehousing strategies. As a columnar database, it’s particularly well suited for consumer-oriented data products. Consequently, Swisscom chose to provide a solution wherein use case-specific Redshift clusters are provisioned using IaC, specifically using the AWS CDK.

A crucial aspect of Swisscom’s strategy is the integration of these data domain and use case-oriented individual clusters into a virtually single and unified data environment, making sure that data ingestion, transformation, and eventual data product sharing remains convenient and seamless. This is achieved by custom provisioning of the Redshift clusters based on user or use case needs, in a shared virtual private cloud (VPC), with data and system governance policies and remediation, IdP federation, and Lake Formation integration already in place.

Although many controls for governance and security were put in place in the AWS CDK construct, Swisscom users also have the flexibility to customize their clusters based on what they need. The cluster configurator allows users to define the cluster characteristics based on individual use case requirements while remaining within the bounds of defined best practices. The key configurable parameters include node types, sizing, subnet types for routing based on different security policies per user case, enabling scheduler, integration with IdP setup, and any additional post-provisioning setup, like the creation of specific schemas and group-level access on it. This flexibility in configuration is achieved for the Amazon Redshift AWS CDK construct through a Python data class, which serves as a template for users to specify aspects like subnet types, scheduler cron expressions, and specific security groups for the cluster, among other configurations. Users are also able to select the type of subnets (routable-private or non-routable-private) to adhere to network security policies and architectural standards. See the following data class options:

class RedShiftOptions:
    node_type: NodeType
    number_of_nodes: int
    vpc_id: str
    security_group_id: Optional[str]
    subnet_type: SubnetType
    use_redshift_scheduler: bool
    scheduler_pause_cron: str
    scheduler_resume_cron: str
    maintenance_window: str
    # Additional configuration options ...

The separation of configuration in the RedShiftOptions data class from the cluster provisioning logic in the RedShiftCluster AWS CDK construct is in line with AWS CDK best practices, wherein both constructs and stacks should accept a property object to allow for full configurability completely in code. This separates the concerns of configuration and resource creation, enhancing the readability and maintainability. The data class structure reflects the user configuration from a configuration file, making it straightforward for users to specify their requirements. The following code shows what the configuration file for the Redshift construct looks like:

# ===============================
# Amazon Redshift Options
# ===============================
# The enriched layer is based on Amazon Redshift.
# This section has properties for Amazon Redshift.
#
redshift_options:
  provision_cluster: true                                     # Skip provisioning Amazon Redshift in enriched layer (required)
  number_of_nodes: 2                                          # Number of nodes for redshift cluster to provision (optional) (default = 2)
  node_type: "ra3.xlplus"                                     # Type of the cluster nodes (optional) (default = "ra3.xlplus")
  use_scheduler: true                                        # Whether to use the Amazon Redshift scheduler (optional)
  scheduler_pause_cron: "cron(00 18 ? * MON-FRI *)"           # Cron expression for scheduler pause (optional)
  scheduler_resume_cron: "cron(00 08 ? * MON-FRI *)"          # Cron expression for scheduler resume (optional)
  maintenance_window: "sun:23:45-mon:00:15"                   # Maintenance window for Amazon Redshift (optional)
  subnet_type: "routable-private"                             # 'routable-private' OR 'non-routable-private' (optional)
  security_group_id: "sg-test-redshift"                       # Security group ID for Amazon Redshift (optional) (reference must exist)
  user_groups:                                                # User groups and their privileges on default DB
    - group_name: dba
      access: [ 'ALL' ]
    - group_name: data_engineer
      access: [ 'SELECT' , 'INSERT' , 'UPDATE' , 'DELETE' , 'TRUNCATE' ]
    - group_name: qa_engineer
      access: [ 'SELECT' ]
  integrate_all_groups_with_idp: false

Admin user secret rotation

As part of the cluster deployment, an admin user is created with its credentials stored in AWS Secrets Manager for database management. This admin user is used for automating several setup operations, such as the setup of database schemas and integration with Lake Formation. For the admin user, as well as other users created for Amazon Redshift, Swisscom used AWS KMS for encryption of the secrets associated with cluster users. The use of Secrets Manager made it simple to adhere to IAM security best practices by supporting the automatic rotation of credentials. Such a setup can be quickly implemented on the AWS Management Console or may be integrated in AWS CDK code with friendly methods in the aws_redshift_alpha module. This module provides higher-level constructs (specifically, Layer 2 constructs), including convenience and helper methods, as well as sensible default values. This module is experimental and under active development and may have changes that aren’t backward compatible. See the following admin user code:

admin_secret_kms_key_options = KmsKeyOptions(
    ...
    key_name='redshift-admin-secret',
    service="secretsmanager"
)
admin_secret_kms_key = aws_kms.Key(
    scope, 'AdminSecretKmsKey,
    # ...
)

# ...

cluster = aws_redshift_alpha.Cluster(
            scope, cluster_identifier,
            # ...
            master_user=aws_redshift_alpha.Login(
                master_username='admin',
                encryption_key=admin_secret_kms_key
                ),
            default_database_name=database_name,
            # ...
        )

See the following code for secret rotation:

self.cluster.add_rotation_single_user(aws_cdk.Duration.days(60))

Methods such as add_rotation_single_user internally rely on a serverless application hosted in the AWS Serverless Application Model repository, which may be in a different AWS Region outside of the organization’s permission boundary. To effectively use such functions, make sure access to this serverless repository within the organization’s service control policies. If the access is not feasible, consider implementing solutions such as custom AWS Lambda functions replicating these functionalities (within your organization’s permission boundary).

AWS CDK custom resource

A key challenge Swisscom faced was automating the creation of dynamic user groups tied to specific IAM roles at deployment time. As an initial and simple solution, Swisscom’s approach was creating an AWS CDK custom resource using the admin user to submit and run SQL statements. This allowed Swisscom to embed the logic for the database schema, user group assignments, and Lake Formation-specific configurations directly within AWS CDK code, making sure that these crucial steps are automatically handled during cluster deployment. See the following code:

sql = get_rendered_stacked_sqls()

custom_resources.AwsCustomResource(scope, 'RedshiftSQLCustomResource',
                                           on_update=custom_resources.AwsSdkCall(
                                               service='RedshiftData',
                                               action='executeStatement',
                                               parameters={
                                                   'ClusterIdentifier': cluster_identifier,
                                                   'SecretArn': secret_arn,
                                                   'Database': database_name,
                                                   'Sql': f'{sqls}',
                                               },
                                               physical_resource_id=custom_resources.PhysicalResourceId.of(
                                                   f'{account}-{region}-{cluster_identifier}-groups')
                                           ),
                                           policy=custom_resources.AwsCustomResourcePolicy.from_sdk_calls(
                                               resources=[f'arn:aws:redshift:{region}:{account}:cluster:{cluster_identifier}']
                                           )
                                        )


cluster.secret.grant_read(groups_cr)

This method of dynamic SQL, embedded within the AWS CDK code, provides a unified deployment and post-setup of the Redshift cluster in a convenient manner. Although this approach unifies the deployment and post-provisioning configuration with SQL-based operations, it remains an initial strategy. It is tailored for convenience and efficiency in the current context. As ODP further evolves, Swisscom will iterate this solution to streamline SQL operations during cluster provisioning. Swisscom remains open to integrating external schema management tools or similar approaches where they add value.

Another aspect of Swisscom’s architecture is the dynamic creation of IAM roles tailored for the user groups for different job functions within the Amazon Redshift environment. This IAM role generation is also driven by the user configuration, acting as a blueprint for dynamically defining user role to policy mappings. This allowed them to quickly adapt to evolving requirements. The following code illustrates the role assignment:

policy_mappings = {
    "role1": ["Policy1", "Policy2"],
    "role2": ["Policy3", "Policy4"],
    ...
    # Example:
    # "dba-role": ["AmazonRedshiftFullAccess", "CloudWatchFullAccess"],
    # ...
}

def create_redshift_role(role_name, policy_list):
   # Implementation to create Redshift role with provided policies
   ...

redshift_role_1 = create_redshift_role(
    data_product_name, "role1", policy_names=policy_mappings["role1"])
redshift_role_1 = create_redshift_role(
    data_product_name, "role1", policy_names=policy_mappings["role1"])
# Example:
# redshift_dba_role = create_redshift_role(
#   data_product_name, "dba-role", policy_names=policy_mappings["dba-role"])
...

Conclusion

Swisscom is building its data-as-a-service platform, and Amazon Redshift has a crucial role as part of the solution. In this post, we discussed the aspects that need to be covered in your IaC best practices to deploy secure and maintainable Redshift clusters using the AWS CDK. Although Amazon Redshift supports industry-leading security, there are aspects organizations need to adjust to their specific requirements. It is therefore important to define the configurations and best practices that are right for your organization and bring it to your IaC to make it available for your end consumers.

We also discussed how to provision a secure and compliant Redshift cluster using the AWS CDK and deal with the best practices of secret rotation. We also showed how to use AWS CDK custom resources in automating the creation of dynamic user groups that are relevant for the IAM roles matching different job functions.

In Part 2 of this series, we will delve into enhancing self-service capabilities for end-users. We will cover topics like integration with the Swisscom IdP, setting up JDBC connections, and implementing detective controls and remediation actions, among others.

The code snippets in this post are provided as is and will need to be adapted to your specific use cases. Before you get started, we highly recommend speaking to an Amazon Redshift specialist.

About the Authors

Asad bin Imtiaz is an Expert Data Engineer at Swisscom, with over 17 years of experience in architecting and implementing enterprise-level data solutions.

Jesús Montelongo Hernández is an Expert Cloud Data Engineer at Swisscom. He has over 20 years of experience in IT systems, data warehousing, and data engineering.

Samuel Bucheli is a Lead Cloud Architect at Zühlke Engineering AG. He has over 20 years of experience in software engineering, software architecture, and cloud architecture.

NVIDIA MLPerf Training V4.0 is Out and Very Boring

2024-06-12 Patrick Kennedy

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/nvidia-mlperf-training-v4-0-is-out-very-boring-amd-intel-nvidia-google/

NVIDIA’s MLPerf Training V4.0 is out. It is mostly NVIDIA H100 and H200 so if you are looking to compare others it is slim pickings

The post NVIDIA MLPerf Training V4.0 is Out and Very Boring appeared first on ServeTheHome.

[$] Elevating CentOS 7 to a new life

2024-06-12 jzb

Post Syndicated from jzb original https://lwn.net/Articles/977762/

CentOS Linux 7 was first
released in July 2014, and is due to go end-of-life (EOL) on June 30.
By now, anyone who pays attention to such things is aware that Red Hat pulled the plug on
CentOS Linux in late 2020 to be replaced by CentOS Stream
instead. CentOS Linux 8
support was wound
down at the end of 2021 rather than in 2029 as originally stated.
CentOS Linux 7 was allowed to serve out its
full lifespan—but that EOL is approaching rapidly and
there’s no direct upgrade path. Users and organizations looking for a lifeline might want to consider
AlmaLinux’s ELevate
utility, which allows CentOS users to migrate to alternate enterprise
Linux (EL) operating systems.

How to Win at Real Life

2024-06-12 The Atlantic

Post Syndicated from The Atlantic original https://www.youtube.com/watch?v=E-7On8POkGA

Nominations are open for the PSF Board election

2024-06-12 jzb

Post Syndicated from jzb original https://lwn.net/Articles/978149/

The Python Software
Foundation (PSF) has announced
that nominations are open for the PSF Board election through June
25:

Who runs for the board? People who care about the Python community,
who want to see it flourish and grow, and also have a few hours a
month to attend regular meetings, serve on committees, participate in
conversations, and promote the Python community.

The PSF has a video about
serving on the board for those who might be interested. PSF members
can nominate themselves or another member. Candidates
will be announced on June 27. Voting begins on July 2 and will end on
July 16.

Ultimate Monopod Review & Buying Guide

2024-06-12 Matt Granger

Post Syndicated from Matt Granger original https://www.youtube.com/watch?v=Mjah59ZiZyM

[$] Memory sealing for the GNU C Library

2024-06-12 corbet

Post Syndicated from corbet original https://lwn.net/Articles/978010/

The mseal() system call allows a
process to prevent any future changes to portions of its address space
(thus “sealing” them); it was patterned after the mimmutable() system call in OpenBSD.
mseal() generated a lot of discussion, but it was finally merged
for the upcoming 6.10 kernel release. While mseal() was initially
aimed at securing the Chrome browser, the hope was that it would be useful
elsewhere; as a step toward realizing that hope, Adhemerval Zanella has
posted a
patch series adding support for — and use of — mseal() to the
GNU C library (glibc).

systemd 256 released

2024-06-12 corbet

Post Syndicated from corbet original https://lwn.net/Articles/978151/

Systemd 256 has been released. As usual, the list of changes is long; see
this article for an overview, or the
announcement for all the details.

Three mid-week stable kernel updates

2024-06-12 jzb

Post Syndicated from jzb original https://lwn.net/Articles/978139/

Greg Kroah-Hartman has announced another round of stable kernel
updates: 6.9.4, 6.6.33, and 6.1.93 have been released. Each contains
another set of important fixes, users of these kernels are advised to
upgrade right away.

Celebrating 10 years of Project Galileo

2024-06-12 Matthew Prince

Post Syndicated from Matthew Prince original https://blog.cloudflare.com/celebrating-10-years-of-project-galileo

One of the great benefits of the Internet has been its ability to empower activists and journalists in repressive societies to organize, communicate, and simply find each other. Ten years ago today, Cloudflare launched Project Galileo, a program which today provides security services, at no cost, to more than 2,600 independent journalists and nonprofit organizations around the world supporting human rights, democracy, and local communities. You can read last week’s blog and Radar dashboard that provide a snapshot of what public interest organizations experience on a daily basis when it comes to keeping their websites online.

Origins of Project Galileo

We’ve admitted before that Project Galileo was born out of a mistake, but it’s worth reminding ourselves. In 2014, when Cloudflare was a much smaller company with a smaller network, our free service did not include DDoS mitigation. If a free customer came under a withering attack, we would stop proxying traffic to protect our own network. It just made sense.

One evening, a site that was using us came under a significant DDoS attack, exhausting Cloudflare resources. After pulling up the site and seeing Cyrillic writing and pictures of men with guns, the young engineer on call followed the playbook. He pushed a button and sent all the attack traffic to the site’s origin, effectively kicking it off the Internet.

This was in 2014, during Russia’s first invasion into Ukraine, when Russia invaded Crimea. What the engineer did not know was that he had just kicked off an independent Ukrainian newspaper that was covering the attack and the invasions. The newspaper had tried to pay for services with a credit card but failed because Russia had targeted Ukraine’s financial infrastructure, taking banking institutions offline. It wasn’t the engineer’s fault. He had no reason to know that the site was important, and no alternative playbook to follow.

After that incident, we vowed to never let an organization that was serving such an important purpose go offline simply because they couldn’t pay for services. And so the idea for Project Galileo was born.

Although the idea of providing free security services was straightforward, figuring out which organizations are important enough to deserve such services was not. We know we can’t build a better Internet alone – it’s why Cloudflare’s mission is to help build a better Internet. So with Project Galileo, we sought the assistance of a group of civil society organizations to partner with us and help identify the organizations that need our protection.

Repression of ideas that were threatening to authority hardly started with DDoS attacks or the invention of the Internet. We named the effort Project Galileo after the story of Galileo Galilei. Galileo was persecuted in the 1600s for publishing a book concluding that the Earth was not at the center of the universe, but that the Earth orbits the sun. After Galileo was labeled a heretic, his book was banned and his ideas were suppressed for more than 100 years.

Four hundred years after Galileo, we see attempts to suppress the online voices of journalists and human rights workers who might challenge the status quo. We’re proud of the fact that through Project Galileo, we keep so many of those voices online.

Growth of Project Galileo

Ten years after the launch of Project Galileo, Cloudflare has changed a lot. Our network has grown from data centers in fewer than 30 cities in 2014 to a network that runs in 320 cities and more than 120 countries. We’ve massively expanded our product suite to include whole new lines of products, including a full set of Zero Trust services and a developer suite that enables developers to build a wide range of applications, including AI applications, on our network.

As Cloudflare has grown, so has Project Galileo. We have more than quadrupled the number of entities we protect in the last five years, from 600 at Project Galileo’s five-year anniversary to more than 2,600 today, located in 111 different countries. We’ve expanded from our original 14 civil society partners to 54 today. Our partners span countries, continents, and subject matter areas, sharing their expertise on organizations that would benefit from cybersecurity assistance.

When we expand our product offerings, we routinely ask whether new services would be valuable to the journalists, humanitarian groups, and nonprofits that benefit from Project Galileo. After Cloudflare launched our Zero Trust offering, we announced that we would offer those services for free to participants in Project Galileo to protect themselves against threats like data loss and malware. After Cloudflare acquired Area 1, we announced that we would offer Cloudflare’s email security products for free to the same participants.

We’ve tried to make our products easy for a small organization to use, building a Social Impact Portal and a Zero Trust roadmap for civil society and at-risk communities. Cloudflare’s teams also help participants onboard and troubleshoot when they face challenges.

What Project Galileo means for civil society groups now

On June 6, we celebrated Project Galileo’s 10-year anniversary with partners from government, civil society, and industry at an event in Washington, DC. We used the opportunity to talk about the future of the Internet, and how we can all work together to protect and advance the free and open Internet.

For humanitarian organizations with few resources, the types of services offered under Project Galileo can be life changing. At our Project Galileo event, we heard the story of a small French nonprofit that lost 17 years of data after being targeted by ransomware. Our resources help organizations defend themselves not only against nation states determined to take them offline, but also against common ransomware and phishing attacks.

During our event, the President of the National Endowment for Democracy (NED) told the story of traveling in the Western Balkans where the struggle for an independent media is palpable. NED is a strong supporter of media outlets across the region. But those media outlets come under frequent cyber attacks that have incapacitated their websites. As described by Damon Wilson:

Those attacks prevent news from reaching the public, where information is very much something that is used and weaponized against communities across Bosnia. And this was precisely the case with one of our partners, Buka. It’s a news outlet that’s based in Banja Luka and Republika Srpska. And while I was there, I met with some of our partners from Banja Luka who had been physically beaten up and intimidated. There’s a crackdown on civil society, new restrictions and laws against them. But for Buka, it was a little bit of a different scenario because earlier this year they suffered a DDoS attack, during which their server servers were overwhelmed by up to 700 million page requests. And the sheer volume suggests the attackers had significant resources, making it a particularly severe threat.

But by onboarding Buka into Project Galileo, we were able to help them restore their site’s functionality, and now Buka’s website is equipped to withstand even the most sophisticated attacks, ensuring that their critical reporting continues uninterrupted, exactly at the time when the Republic gets Covid, Republika Srpska government is looking to close and restrict independent civic voices in that part of Bosnia.

And this is just one example. Last week, traveling in Bosnia, of the numerous NED partners who’ve benefited from Cloudflare’s Project Galileo since NED became a partner in 2019, it’s profound to the efficacy of our partners’ work. It effectively ensures that bad actors can’t silence the voices and the work of democracy advocates and independent media around the world.

The importance of collaboration

Our work with Project Galileo highlights the power of the partnerships that we’ve built, not only with civil society, but with government and industry partners as well. By working together, we can expand protections for the many at-risk organizations that need cybersecurity assistance. Cybersecurity is a team sport.

In 2023, one of our Project Galileo partners, the CyberPeace Institute, approached us about doing even more to help protect nonprofit organizations against phishing attacks. The CyberPeace Institute collaborates with its partners to reduce the harms from cyberattacks on people’s lives worldwide and provide them assistance. CyberPeace also analyzes cyberattacks to expose their societal impact, to demonstrate how international laws and norms are being violated, and to advance responsible behavior in cyberspace.

CyberPeace realized that there was an opportunity to document attacks against civil society groups and improve the ecosystem for everyone. Many development and humanitarian organizations are small, with limited staff and little cybersecurity experience. They can easily fall prey to common cyber attacks – like phishing – designed to access their systems or steal their data. If they manage to use tools effectively to defend themselves, they do not typically report on the information about the attacks they see.

CyberPeace proposed to help onboard development and humanitarian organizations to Cloudflare services through their CyberPeace Builders program and analyze the phishing campaigns targeting those organizations. The substantive insights and information gained from that work could then be fed to other civil society organizations as real time security alerts. Cloudflare worked with CyberPeace to develop the new approach, enabling their volunteers to onboard organizations in their network to Area 1 tools and their analysts to access threat indicators from the collective organizations onboarded.

Government can play an important role in helping protect civil society from cyberattacks as well. Since the Summit for Democracy last year, Cloudflare has been working closely with the Joint Cyber Defense Collaborative (JCDC), which is run by the U.S. Cybersecurity and Infrastructure Security Agency (CISA), on their High-Risk Communities initiative. Earlier this year, JCDC launched a web page outlining cybersecurity resources for civil society communities facing digital security threats because of their work. The effort includes tools and services that nonprofits can use to secure themselves online, including those offered under Project Galileo.

Expanding Cloudflare’s Impact

In many ways, the creation of Project Galileo altered the trajectory of the company. Project Galileo cemented the idea that protecting and keeping important organizations online, regardless of whether they could pay us, was part of Cloudflare’s DNA. It pushed us to innovate to improve security not only for the large enterprises that pay us, but for the small organizations doing good for the world that cannot afford to pay for the latest technological innovation. It gave us our mission – to help build a better Internet – and a standard to live up to and measure ourselves against.

To meet that standard, we routinely reach out to offer our services to important organizations in need. In 2022, after Russia’s invasion of Ukraine, Cloudflare jumped in to offer services to Ukrainian critical infrastructure facing a barrage of cyberattacks and have continued providing them services ever since. At our Project Galileo event, the State Department’s Special Envoy and Coordinator for Digital Freedom Envoy read an email she’d received from Ukraine’s Deputy Foreign Minister and Chief Digital Transformation officer of Ukraine the night before:

It is absolutely definite that Cloudflare services provide a vital layer of cybersecurity within the Ukrainian segment of cyberspace. Numerous DDoS attacks are directed at state electronic services, fintech, official information sources. So if there was no Cloudflare as a proven protection against DDoS attacks, it would have serious consequences causing chaos, especially when these attacks are synchronized by the enemy in parallel with kinetic attacks.

We’ve launched sections of Cloudflare Radar designed to use Cloudflare’s network to help civil society monitor Internet outages and disruptions, as well as route hijacks and other traffic anomalies. We’ve participated in the Freedom Online Coalition’s Task Force on Internet Shutdowns.

Project Galileo also helped pave the way for a variety of Cloudflare projects to provide other at-risk populations free services. These programs include:

Athenian Project: Launched in 2017, the Athenian Project is Cloudflare’s program to protect election-related domains for state and local governments so that citizens have reliable access to information on voter registration, polling places, and the reporting of election results.
Cloudflare for Campaigns: Launched in 2020, Cloudflare for Campaigns helps secure US political candidates’ election websites and internal data while also ensuring site reliability during peak traffic periods. The program is run in partnership with Defending Digital Campaigns.
Project Pangea: Launched in 2021, Project Pangea is a program to provide secure, performant and reliable access to the Internet for community networks that support underserved communities.
Project Safekeeping: Launched in 2022, Project Safekeeping supports at-risk critical infrastructure entities in Australia, Japan, Germany, Portugal, and the UK by providing Zero Trust and application security solutions.
Project Cybersafe Schools: Launched in 2023, Project Cybersafe Schools equips small public school districts in the US with Zero Trust services, including email protection and DNS filtering.
Project Secure Health: Launched on June 10, 2024, Project Secure Health provides security tools to Australia’s general practitioner clinics to safeguard patient data and counter challenges such as data breaches, ransomware attacks, phishing scams, and insider threats.

Looking forward

The world has only gotten more complicated since we first launched Project Galileo in 2014. We face real challenges ranging from malicious cyber actors targeting critical infrastructure, to election interference, to data theft. Governments have responded with increasingly aggressive attempts to control aspects of the Internet. At our recent celebration of Project Galileo, we lamented the thirteenth consecutive year of decline of global Internet freedom, as documented by our Project Galileo partner Freedom House.

But one thing has not changed. We continue to believe the single, global Internet is a miracle that we should all be fighting for. We sometimes forget that the Internet is an incredibly radical concept. The world somehow came together over the last 40 years, agreed on a set of standards, and then made it so that a collection of networks could all exchange data. And that miracle that is the Internet has brought incredible opportunities for the voices of civil society to be heard, to help extend their impact, to spread their message, and to keep them connected.

Connecting everyone online in a permissionless way comes with real harms and real risks. But we need to be surgical as we address those challenges. We need to partner to find solutions that preserve the open Internet, much as we do with projects like Project Galileo. Even if we are at a moment of democratic decline, continuing to defend the open, interoperable Internet preserves space and capacity for a future in which the Internet can also fuel greater freedom.

OpenSUSE Leap 15.6 released

2024-06-12 corbet

Post Syndicated from corbet original https://lwn.net/Articles/978137/

The openSUSE
Leap 15.6 release is available; this is intended to be the last
Leap 15.x release before Leap 16 comes out.
“Leap 15.6 is projected to receive maintenance and security updates
until the end of 2025 to ensure sufficient overlap with the next
release“. Changes include the addition of the Cockpit server-management tool, a
6.4 kernel, GNOME 45, and many other upgrades. This release also
removes a long list of unmaintained Python packages. See the
release notes for details.

Security updates for Wednesday

2024-06-12 jzb

Post Syndicated from jzb original https://lwn.net/Articles/978136/

Security updates have been issued by AlmaLinux (booth), Debian (cyrus-imapd and vlc), Fedora (firefox, libarchive, php, and singularity-ce), Oracle (ipa and ruby:3.3), Red Hat (389-ds-base, buildah, c-ares, cockpit, containernetworking-plugins, fence-agents, gdk-pixbuf2, gvisor-tap-vsock, kernel, kernel-rt, kpatch-patch, libreoffice, podman, protobuf-c, python-idna, rpm-ostree, ruby, and tomcat), Slackware (cups and mozilla), SUSE (bind, cups, iperf, kernel, nano, and poppler), and Ubuntu (libapache-mod-jk, linux-aws, linux-aws-5.15, linux-aws, linux-oracle, linux-intel-iotg-5.15, linux-nvidia, and mysql-8.0).

The Gulag Uprising of 1954

2024-06-12 The History Guy: History Deserves to Be Remembered

Post Syndicated from The History Guy: History Deserves to Be Remembered original https://www.youtube.com/watch?v=dxSGwsN28_w

Using AI for Political Polling

2024-06-12 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/06/using-ai-for-political-polling.html

Public polling is a critical function of modern political campaigns and movements, but it isn’t what it once was. Recent US election cycles have produced copious postmortems explaining both the successes and the flaws of public polling. There are two main reasons polling fails.

First, nonresponse has skyrocketed. It’s radically harder to reach people than it used to be. Few people fill out surveys that come in the mail anymore. Few people answer their phone when a stranger calls. Pew Research reported that 36% of the people they called in 1997 would talk to them, but only 6% by 2018. Pollsters worldwide have faced similar challenges.

Second, people don’t always tell pollsters what they really think. Some hide their true thoughts because they are embarrassed about them. Others behave as a partisan, telling the pollster what they think their party wants them to say—or what they know the other party doesn’t want to hear.

Despite these frailties, obsessive interest in polling nonetheless consumes our politics. Headlines more likely tout the latest changes in polling numbers than the policy issues at stake in the campaign. This is a tragedy for a democracy. We should treat elections like choices that have consequences for our lives and well-being, not contests to decide who gets which cushy job.

Polling Machines?

AI could change polling. AI can offer the ability to instantaneously survey and summarize the expressed opinions of individuals and groups across the web, understand trends by demographic, and offer extrapolations to new circumstances and policy issues on par with human experts. The politicians of the (near) future won’t anxiously pester their pollsters for information about the results of a survey fielded last week: they’ll just ask a chatbot what people think. This will supercharge our access to realtime, granular information about public opinion, but at the same time it might also exacerbate concerns about the quality of this information.

I know it sounds impossible, but stick with us.

Large language models, the AI foundations behind tools like ChatGPT, are built on top of huge corpuses of data culled from the Internet. These are models trained to recapitulate what millions of real people have written in response to endless topics, contexts, and scenarios. For a decade or more, campaigns have trawled social media, looking for hints and glimmers of how people are reacting to the latest political news. This makes asking questions of an AI chatbot similar in spirit to doing analytics on social media, except that they are generative: you can ask them new questions that no one has ever posted about before, you can generate more data from populations too small to measure robustly, and you can immediately ask clarifying questions of your simulated constituents to better understand their reasoning

Researchers and firms are already using LLMs to simulate polling results. Current techniques are based on the ideas of AI agents. An AI agent is an instance of an AI model that has been conditioned to behave in a certain way. For example, it may be primed to respond as if it is a person with certain demographic characteristics and can access news articles from certain outlets. Researchers have set up populations of thousands of AI agents that respond as if they are individual members of a survey population, like humans on a panel that get called periodically to answer questions.

The big difference between humans and AI agents is that the AI agents always pick up the phone, so to speak, no matter how many times you contact them. A political candidate or strategist can ask an AI agent whether voters will support them if they take position A versus B, or tweaks of those options, like policy A-1 versus A-2. They can ask that question of male voters versus female voters. They can further limit the query to married male voters of retirement age in rural districts of Illinois without college degrees who lost a job during the last recession; the AI will integrate as much context as you ask.

What’s so powerful about this system is that it can generalize to new scenarios and survey topics, and spit out a plausible answer, even if its accuracy is not guaranteed. In many cases, it will anticipate those responses at least as well as a human political expert. And if the results don’t make sense, the human can immediately prompt the AI with a dozen follow-up questions.

Making AI agents better polling subjects

When we ran our own experiments in this kind of AI use case with the earliest versions of the model behind ChatGPT (GPT-3.5), we found that it did a fairly good job at replicating human survey responses. The ChatGPT agents tended to match the responses of their human counterparts fairly well across a variety of survey questions, such as support for abortion and approval of the US Supreme Court. The AI polling results had average responses, and distributions across demographic properties such as age and gender, similar to real human survey panels.

Our major systemic failure happened on a question about US intervention in the Ukraine war. In our experiments, the AI agents conditioned to be liberal were predominantly opposed to US intervention in Ukraine and likened it to the Iraq war. Conservative AI agents gave hawkish responses supportive of US intervention. This is pretty much what most political experts would have expected of the political equilibrium in US foreign policy at the start of the decade but was exactly wrong in the politics of today.

This mistake has everything to do with timing. The humans were asked the question after Russia’s full-scale invasion in 2022, whereas the AI model was trained using data that only covered events through September 2021. The AI got it wrong because it didn’t know how the politics had changed. The model lacked sufficient context on crucially relevant recent events.

We believe AI agents can overcome these shortcomings. While AI models are dependent on the data they are trained with, and all the limitations inherent in that, what makes AI agents special is that they can automatically source and incorporate new data at the time they are asked a question. AI models can update the context in which they generate opinions by learning from the same sources that humans do. Each AI agent in a simulated panel can be exposed to the same social and media news sources as humans from that same demographic before they respond to a survey question. This works because AI agents can follow multi-step processes, such as reading a question, querying a defined database of information (such as Google, or the New York Times, or Fox News, or Reddit), and then answering a question.

In this way, AI polling tools can simulate exposing their synthetic survey panel to whatever news is most relevant to a topic and likely to emerge in each AI agent’s own echo chamber. And they can query for other relevant contextual information, such as demographic trends and historical data. Like human pollsters, they can try to refine their expectations on the basis of factors like how expensive homes are in a respondent’s neighborhood, or how many people in that district turned out to vote last cycle.

Likely use cases for AI polling

AI polling will be irresistible to campaigns, and to the media. But research is already revealing when and where this tool will fail. While AI polling will always have limitations in accuracy, that makes them similar to, not different from, traditional polling. Today’s pollsters are challenged to reach sample sizes large enough to measure statistically significant differences between similar populations, and the issues of nonresponse and inauthentic response can make them systematically wrong. Yet for all those shortcomings, both traditional and AI-based polls will still be useful. For all the hand-wringing and consternation over the accuracy of US political polling, national issue surveys still tend to be accurate to within a few percentage points. If you’re running for a town council seat or in a neck-and-neck national election, or just trying to make the right policy decision within a local government, you might care a lot about those small and localized differences. But if you’re looking to track directional changes over time, or differences between demographic groups, or to uncover insights about who responds best to what message, then these imperfect signals are sufficient to help campaigns and policymakers.

Where AI will work best is as an augmentation of more traditional human polls. Over time, AI tools will get better at anticipating human responses, and also at knowing when they will be most wrong or uncertain. They will recognize which issues and human communities are in the most flux, where the model’s training data is liable to steer it in the wrong direction. In those cases, AI models can send up a white flag and indicate that they need to engage human respondents to calibrate to real people’s perspectives. The AI agents can even be programmed to automate this. They can use existing survey tools—with all their limitations and latency—to query for authentic human responses when they need them.

This kind of human-AI polling chimera lands us, funnily enough, not too distant from where survey research is today. Decades of social science research has led to substantial innovations in statistical methodologies for analyzing survey data. Current polling methods already do substantial modeling and projecting to predictively model properties of a general population based on sparse survey samples. Today, humans fill out the surveys and computers fill in the gaps. In the future, it will be the opposite: AI will fill out the survey and, when the AI isn’t sure what box to check, humans will fill the gaps. So if you’re not comfortable with the idea that political leaders will turn to a machine to get intelligence about which candidates and policies you want, then you should have about as many misgivings about the present as you will the future.

And while the AI results could improve quickly, they probably won’t be seen as credible for some time. Directly asking people what they think feels more reliable than asking a computer what people think. We expect these AI-assisted polls will be initially used internally by campaigns, with news organizations relying on more traditional techniques. It will take a major election where AI is right and humans are wrong to change that.

This essay was written with Aaron Berger, Eric Gong, and Nathan Sanders, and previously appeared on the Harvard Kennedy School Ash Center’s website.

In the Works – AWS Region in Taiwan

2024-06-12 Channy Yun

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/in-the-works-aws-region-in-taiwan/

Today, we’re announcing that a new AWS Region will be coming to Taiwan by early 2025. The new AWS Asia Pacific (Taipei) Region will consist of three Availability Zones at launch, and will give AWS customers in Taiwan the ability to run workloads and store data that must remain in Taiwan.

Each of the Availability Zones will be physically independent of the others in the Region – close enough to support applications that need low latency, yet sufficiently distant to significantly reduce the risk that an event at an Availability Zone level might impact business continuity.

The Availability Zones in this Region will be connected together through high-bandwidth, low-latency network connections over dedicated, fully redundant fiber. This connectivity supports applications that need synchronous replication between Availability Zones for availability or redundancy. You can take a peek at the AWS Global Infrastructure page to learn more about how we design and build Regions and Availability Zones.

We are currently working on Regions in Malaysia, Mexico, New Zealand, the Kingdom of Saudi Arabia, Thailand, and the AWS European Sovereign Cloud. The AWS Cloud operates 105 Availability Zones within 33 AWS Regions around the world, with announced plans for 21 more Availability Zones and seven more Regions, including Taiwan.

AWS in Taiwan
AWS has been investing and supporting customers and partners in Taiwan for more than 10 years. To support our customers in Taiwan, we have business development teams, solutions architects, partner managers, professional services consultants, support staff, and various other roles working in our Taipei office.

Other AWS infrastructure includes two Amazon CloudFront edge locations along with access to the AWS global backbone through multiple redundant submarine cables. You can access any other AWS Region (except Beijing and Ningxia) from AWS Direct Connect locations in Taipei, operated by Chief Telecom and Chunghwa Telecom. With AWS Direct Connect, your data that would have previously been transported over the internet is delivered through a private network connection between your facilities and AWS.

You can also use AWS Outposts in Taiwan, a family of fully managed solutions delivering AWS infrastructure and services to virtually any on-premises or edge location for a truly consistent hybrid experience. With AWS Local Zones in Taipei, you can deliver applications that require single-digit millisecond latency to end users.

AWS continues to invest in upskilling students, local developers and technical professionals, nontechnical professionals, and the next generation of IT leaders in Taiwan through offerings like AWS Academy, AWS Educate, and AWS Skill Builder. Since 2017, AWS has trained more than eight million people across the Asia Pacific-Japan region on cloud skills, including more than 100,000 people in Taiwan.

To learn more, join AWS Summit 2024 Taiwan in July; in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS.

AWS customers in Taiwan
AWS customers in Taiwan have been increasingly moving their applications to AWS and running their technology infrastructure in other AWS Regions around the world. With the addition of this new AWS Region, customers will be able to provide even lower latency to end users and use advanced technologies such as generative artificial intelligence (generative AI), Internet of Things (IoT), mobile services, and more, to drive innovation. This Region will give AWS customers the ability to run their workloads and store their content in Taiwan.

Here are some examples of customers using AWS to drive innovation:

Chunghwa Telecom is the largest integrated telecom provider in Taiwan. To improve AI data security and governance, they use Amazon Bedrock for a variety of generative AI applications, including automatically generating specifications documents for the software development lifecycle and crafting custom marketing campaigns. With Amazon Bedrock, Chunghwa Telecom is saving developer hours and has also developed an immersive, interactive virtual English teacher for the first time.

Gamania Group is a leader in the development and publication of online games in Taiwan. To maximize the value of running on AWS, they worked with AWS Training and Certification to enhance AWS skills across all of its departments, such as AWS Classroom training, AWS Well-Architected Framework, and AWS GameDay events. As a result, they reduced the time needed to make critical operational decisions by 50 percent, lowered its time-to-market by up to 70 percent, and accelerated the launch of new games.

KKCompany Technologies is a multimedia technology group building a music streaming platform, AI-powered streaming solution, and cloud intelligence service in Taiwan. The company specializes in generative AI, multimedia technology, and digital transformation consulting services for enterprises in Taiwan and Japan. You can find BlendVision, a cloud-based streaming solution in AWS Marketplace.

To learn more about Taiwan customer cases, visit AWS Customer Success Stories in English or our Traditional Chinese page.

Stay Tuned
We’ll announce the opening of this and the other Regions in future blog posts, so be sure to stay tuned! To learn more, visit the AWS Region in Taiwan page in Traditional Chinese.

— Channy

SSH agent extensions as an arbitrary RPC mechanism

2024-06-12 Matthew Garrett

Post Syndicated from Matthew Garrett original https://mjg59.dreamwidth.org/69646.html

A while back, I wrote about using the SSH agent protocol to satisfy WebAuthn requests. The main problem with this approach is that it required starting the SSH agent with a special argument and also involved being a little too friendly with the implementation – things worked because I could provide an arbitrary public key and the implementation never validated that, but it would be legitimate for it to start doing so and then break everything. And it also only worked for keys stored on tokens that ssh supports – there was no way to extend this to other keystores on the client (such as the Secure Enclave on Macs, or TPM-backed keys on PCs). I wanted a better solution.

It turns out that it was far easier than I expected. The ssh agent protocol is documented here, and the interesting part is the extension support extension mechanism. Basically, you can declare an extension and then just tunnel whatever you want over it. As before, my goto was the go ssh agent package which conveniently implements both the client and server side of this. Implementing the local agent is trivial – look up SSH_AUTH_SOCK, connect to it, create a new agent client that can communicate with that by calling NewClient, and then implement the ExtendedAgent interface, create a new socket, and call ServeAgent against that. Most of the ExtendedAgent functions should simply call through to the original agent, with the exception of Extension(). Just add a case statement against extensionType, define some reasonably namespaced extension, and you’re done.

Now you need to use this agent. You probably don’t want to use this for arbitrary hosts (agent forwarding should only be enabled for remote systems you trust, not arbitrary machines you connect to – if you enabled agent forwarding for github and github got compromised, github would be able to use any private keys loaded into your agent, and you probably don’t want that). So the right approach is to add a Host entry to the ssh config with a ForwardAgent stanza pointing at the socket you created in your new agent. This way the configured subset of remote hosts will automatically talk to this new custom agent, while forwarding for anything else will still be at the user’s discretion.

For the remote end things are even easier. Look up SSH_AUTH_SOCK and call NewClient as before, and then simply call client.Extension(). Whatever you stick in the contents argument will simply end up being received at the client end. You now have a communication channel between a the remote system and the local client, and what you do with that is up to you. I’m using it to allow a remote system to obtain auth tokens from Okta and forward WebAuthn challenges that can either be satisfied via a local WebAuthn token or by passing the query off to Mac TouchID, but there’s fundamentally no constraints whatsoever on what can be done here.

(If you want to do this on Windows and still have everything work with existing clients you’ll need to take this into account – Windows didn’t really do Unix sockets until recently so everything there is awful)

comments

Wiwynn Shows Intel Gaudi 3 and AMD Instinct MI300X AI Systems at Comptuex 2024

2024-06-12 Cliff Robinson

Post Syndicated from Cliff Robinson original https://www.servethehome.com/wiwynn-shows-intel-gaudi-3-and-amd-instinct-mi300x-ai-systems-at-comptuex-2024/

At Computex 2024, the team saw Wiwynn’s Intel Gaudi 3 and AMD Instinct MI300X servers as well as ZutaCore 2-phase direct liquid cooling

The post Wiwynn Shows Intel Gaudi 3 and AMD Instinct MI300X AI Systems at Comptuex 2024 appeared first on ServeTheHome.

Visualize data in OpenSearch Dashboards

Ingest data with OpenSearch Ingestion

Solution overview

Prerequisites

Create an SQS queue

Create an S3 bucket and enable Amazon S3 event notification

Create an IAM policy for the OpenSearch Ingest pipeline

Create an IAM role and attach the IAM policy

Configure an OpenSearch Ingestion pipeline

Validate the solution

Clean up

Conclusion

About the Authors

Scheduled actions

JDBC connections

Identity federation

Audit logging

AWS Config rules and remediation

Conclusion

About the Authors

One Data Platform

Harnessing Amazon Redshift for ODP

Admin user secret rotation

AWS CDK custom resource

Conclusion

About the Authors

Origins of Project Galileo

Growth of Project Galileo

What Project Galileo means for civil society groups now

The importance of collaboration

Expanding Cloudflare’s Impact

Looking forward

Polling Machines?

Making AI agents better polling subjects

Likely use cases for AI polling

The collective thoughts of the interwebz