TVS Supply Chain Solutions built a file transfer platform using AWS Transfer Family for AS2 for B2B collaboration

Post Syndicated from Suresh Kanniappan original https://aws.amazon.com/blogs/architecture/tvs-supply-chain-solutions-built-a-file-transfer-platform-using-aws-transfer-family-for-as2-for-b2b-collaboration/

TVS Supply Chain Solutions (TVS SCS), promoted by the erstwhile TVS Group and now part of the $3 billion TVS Mobility Group, is an India-based multinational company that pioneered the development of the supply chain solutions market in India.

For the last 2 decades, it has provided supply chain management services to customers in the automotive, consumer goods, defense, and utility sectors in India, the United Kingdom, Europe, and the US. It has a presence in 26 countries with over 17,000 employees and provides services to 78 global Fortune 500 companies. The company went public in 2023.

To meet its customers’ compliance requirements, TVS SCS sought a reliable file transfer solution supporting Applicability Statement 2 (AS2), a business-to-business (B2B) messaging protocol. This post describes how TVS SCS built a secure file transfer platform using AWS Transfer Family for AS2 to exchange Electronic Data Interchange (EDI) documents with their B2B customers in the logistics industry.

Business use case

Several end customers in the manufacturing sector mandated the exchange of EDI documents through the AS2 protocol over the internet. To address this requirement while maintaining manageability, security, and scalability, TVS SCS implemented a file transfer platform on AWS.

TVS SCS serves end customers in the manufacturing sector who require supply chain solutions between various locations:

  • Source – Plants, warehouses, technology
  • Destination – OEM vendors, plants, dealers

The process involves the following steps:

  1. The end customer sends a booking request document (booking fact) to TVS SCS.
  2. TVS SCS and the end customer exchange a series of EDI documents.
  3. TVS SCS must acknowledge, process, and update the end customer upon receipt of each EDI document.

TVS SCS built a file transfer platform using Transfer Family with AS2 configuration to achieve the following:

  • Securely exchange EDI documents with end customers
  • Provide continuous notification using Message Disposition Notifications (MDNs)

The following diagram illustrates the end-to-end business process (requisition, sourcing, purchase orders, receiving, and invoicing) between TVS SCS and an end customer using the AS2 protocol.

End to end business process

Why the cloud?

TVS SCS chose AWS to build their AS2-compliant file transfer platform for three key reasons:

  • Data location – All relevant data (such as order creation and customer details) already resides in AWS
  • Infrastructure management – AWS addresses challenges in the following areas:
    • Maintaining highly available and scalable infrastructure
    • Maintaining correct AS2 system interoperability with trading partners
    • Meeting compliance requirements
  • Versatility for non-AS2 customers – TVS SCS uses multiple scalable and fully managed AWS services to build customized APIs and webhooks for customers not using AS2

This cloud-based approach allows TVS SCS to focus on their core business while AWS handles the complexities of secure, compliant, and scalable file transfer infrastructure.

Why Transfer Family and AS2?

AS2 is a B2B messaging protocol commonly used to exchange EDI documents (for example, documents that follow the EDIFACT standard) securely, with integrity control, reliably, and cost-effectively over the internet using the HTTP and HTTPS protocols. B2B integration over the AS2 protocol can still be challenging: trading partner onboarding, AS2 EDI integration, firewall configuration, certificate maintenance, and high licensing costs for commercial AS2 solutions all add effort.

By choosing Transfer Family with AS2 configuration, TVS SCS addresses these challenges and gains several advantages:

  • Simplified partner onboarding
  • Managed infrastructure, reducing maintenance overhead
  • Built-in security features
  • Flexible scaling to meet changing business needs
  • Pay-as-you-go pricing model

Solution overview

The following diagram shows the relationship between the AS2 objects involved in the inbound and outbound processes.

Relationship between the AS2 objects

The following diagram illustrates the solution architecture with AWS services.

Solution Architecture

For step-by-step instructions about creating an AS2 server using Transfer Family, refer to Create an AS2 server using the Transfer Family console.

Only the allowlisted IP address of the end-customer AS2 server can communicate with Transfer Family for AS2 on AWS. The customer sends EDI documents through Transfer Family, which stores them in Amazon Simple Storage Service (Amazon S3). The business logic is implemented in AWS Lambda functions that read the EDI documents, process them, and update customers. AWS B2B Data Interchange, a fully managed service for automating EDI document transformation, can be considered as a complementary or alternative solution for EDI processing. Two Lambda functions are used: one handles truck booking using Node.js, and the other handles outbound file transfer (from Amazon S3 to the AS2 server) using Python 3.2.

This architecture enables TVS SCS to securely and efficiently manage the EDI document flow, from receipt through processing and outbound transfer, using scalable and serverless AWS services. The solution provides a compliant and cost-effective approach to B2B data exchange with customers and partners.

Prerequisites

For the prerequisites to configure Transfer Family with AS2, see Configuring AS2. To learn more about the security features in Transfer Family, see Security in AWS Transfer Family.

End customer to TVS SCS communication workflow

The following diagram illustrates the step-by-step process of a truck booking request from an end customer to TVS SCS using AWS services.

End customer to TVS SCS communication workflow

This streamlined workflow demonstrates how TVS SCS uses AWS services to efficiently handle truck booking requests from customers:

  1. The customer initiates a truck booking by sending a booking fact EDI to TVS SCS. The EDI contains details like customer name, date, source location, destination location, and more.
  2. The signed and encrypted booking fact EDI is sent as an inbound HTTP AS2 payload to Transfer Family through the internet.
  3. Transfer Family writes the booking fact EDI to the S3 bucket.
  4. TVS SCS confirms receipt of the booking fact EDI either through the inline HTTP response or an asynchronous HTTP POST request to the originating server.
  5. The EDI exchange audit trail is logged in Amazon CloudWatch Logs.
  6. The EDI document is available for TVS SCS consumption, and a Lambda function processes the document using business logic.
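
The last step of this workflow can be sketched as follows. This is a minimal illustration, not the TVS SCS implementation: it assumes the Lambda function is invoked by an S3 event notification when Transfer Family writes a file, and the names extract_s3_objects and handler are hypothetical.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if not s3_info:
            continue  # ignore records that are not S3 notifications
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become '+').
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Hypothetical Lambda entry point: list the EDI files to process."""
    received = extract_s3_objects(event)
    for bucket, key in received:
        # Placeholder for the business logic (parse the EDI, update the booking).
        print(f"Processing s3://{bucket}/{key}")
    return {"processed": len(received)}
```

Keeping the event parsing in its own function makes it possible to test the handler locally with a hand-built event before wiring it to a bucket.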

TVS SCS to end customer communication workflow

The following diagram depicts the workflow from TVS SCS to the end customer.

TVS SCS to end customer communication workflow

This workflow demonstrates how TVS SCS uses AWS services to provide timely and accurate updates to customers throughout the delivery process:

  1. The customer confirms the price quote. TVS SCS uploads EDI documents to the S3 bucket.
  2. TVS SCS sends a series of updates using the AS2 outbound connector, such as truck allocation, truck departure, truck in-transit status, truck delay notifications, delivery confirmation, and billing invoice. A Lambda function reads the EDI documents from Amazon S3 and runs business logic to generate responses for the end customer.
  3. The EDI documents are sent as an outbound HTTP payload.
  4. The customer AS2 server sends an acknowledgment using MDN.
  5. The EDI exchange audit trail is logged in CloudWatch Logs.
  6. The EDI document is available for the customer’s consumption and further processing.
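
The outbound send in steps 2 and 3 could look roughly like the following sketch. It is an assumption-laden illustration rather than the TVS SCS code: it relies on the Transfer Family StartFileTransfer API (start_file_transfer in boto3), the helper names are hypothetical, and the batch size of 10 reflects our understanding of the per-request limit on file paths, which you should confirm in the API reference.

```python
def to_send_path(bucket, key):
    """Build a source path in the '/bucket/key' form used for connector transfers."""
    return f"/{bucket}/{key.lstrip('/')}"


def send_edi_files(transfer_client, connector_id, bucket, keys, batch_size=10):
    """Start outbound AS2 transfers and return the resulting transfer IDs.

    transfer_client is expected to behave like boto3.client("transfer").
    """
    transfer_ids = []
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        response = transfer_client.start_file_transfer(
            ConnectorId=connector_id,
            SendFilePaths=[to_send_path(bucket, key) for key in batch],
        )
        transfer_ids.append(response["TransferId"])
    return transfer_ids
```

Inside a Lambda function, the client would be created with boto3.client("transfer") and the connector ID would come from configuration; passing the client in as a parameter keeps the function easy to exercise with a stub.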

Results

The following customer challenges were addressed with this solution:

  • It meets end customer requirements for EDI file exchange through AS2 protocol
  • It eliminates the need for in-house AS2 infrastructure management
  • It provides flexibility to add new customers to the file transfer platform

By addressing these challenges and using AWS services, TVS SCS has created a future-proof file transfer platform.

Summary

This post demonstrated how cloud-based services can transform traditional B2B communication processes, offering supply chain companies a path to improved efficiency, compliance, and customer satisfaction. For supply chain providers facing similar challenges, this solution offers a blueprint for modernizing file transfer systems while maintaining compliance with industry standards.

To learn more about this AWS solution for supply chain companies, contact AWS for further assistance. AWS can provide detailed information about implementation, pricing, and how to tailor the solution to your specific business needs. They have teams of experts who can guide companies through the process of modernizing their B2B communication systems using cloud-based services.



Batch data ingestion into Amazon OpenSearch Service using AWS Glue

Post Syndicated from Ravikiran Rao original https://aws.amazon.com/blogs/big-data/batch-data-ingestion-into-amazon-opensearch-service-using-aws-glue/

Organizations constantly work to process and analyze vast volumes of data to derive actionable insights. Effective data ingestion and search capabilities have become essential for use cases like log analytics, application search, and enterprise search. These use cases demand a robust pipeline that can handle high data volumes and enable efficient data exploration.

Apache Spark, an open source powerhouse for large-scale data processing, is widely recognized for its speed, scalability, and ease of use. Its ability to process and transform massive datasets has made it an indispensable tool in modern data engineering. Amazon OpenSearch Service—a community-driven search and analytics solution—empowers organizations to search, aggregate, visualize, and analyze data seamlessly. Together, Spark and OpenSearch Service offer a compelling solution for building powerful data pipelines. However, ingesting data from Spark into OpenSearch Service can present challenges, especially with diverse data sources.

This post showcases how to use Spark on AWS Glue to seamlessly ingest data into OpenSearch Service. We cover batch ingestion methods, share practical examples, and discuss best practices to help you build optimized and scalable data pipelines on AWS.

Overview of solution

AWS Glue is a serverless data integration service that simplifies data preparation and integration tasks for analytics, machine learning, and application development. In this post, we focus on batch data ingestion into OpenSearch Service using Spark on AWS Glue.

AWS Glue offers multiple integration options with OpenSearch Service using various open source and AWS managed libraries, including:

  • OpenSearch Spark library
  • Elasticsearch Hadoop library
  • AWS Glue OpenSearch Service connection

In the following sections, we explore each integration method in detail, guiding you through the setup and implementation. As we progress, we incrementally build the architecture diagram shown in the following figure, providing a clear path for creating robust data pipelines on AWS. Each implementation is independent of the others. We chose to showcase them separately, because in a real-world scenario, only one of the three integration methods is likely to be used.

Image showing the high level architecture diagram

You can find the code base in the accompanying GitHub repo. In the following sections, we walk through the steps to implement the solution.

Prerequisites

Before you deploy this solution, make sure the following prerequisites are in place:

Clone the repository to your local machine

Clone the repository to your local machine and set the BLOG_DIR environment variable. All the relative paths assume BLOG_DIR is set to the repository location in your machine. If BLOG_DIR is not being used, adjust the path accordingly.

git clone git@github.com:aws-samples/opensearch-glue-integration-patterns.git
cd opensearch-glue-integration-patterns
export BLOG_DIR=$(pwd)

Deploy the AWS CloudFormation template to create the necessary infrastructure

The main focus of this post is to demonstrate how to use the mentioned libraries in Spark on AWS Glue to ingest data into OpenSearch Service. Though we center on this core topic, several key AWS components need to be pre-provisioned for the integration examples, such as an Amazon Virtual Private Cloud (Amazon VPC), multiple subnets, an AWS Key Management Service (AWS KMS) key, an Amazon Simple Storage Service (Amazon S3) bucket, an AWS Glue role, and an OpenSearch Service cluster with domains for OpenSearch Service and Elasticsearch. To simplify the setup, we’ve automated the provisioning of this core infrastructure using the cloudformation/opensearch-glue-infrastructure.yaml AWS CloudFormation template.

  1. Run the following commands

The CloudFormation template deploys the necessary networking components (such as VPC and subnets), Amazon CloudWatch logging, AWS Glue role, and OpenSearch Service and Elasticsearch domains required to implement the proposed architecture. For ESMasterUserPassword and OSMasterUserPassword in the following command, use a strong password that adheres to your organization’s security standards: 8–128 characters, including at least three of the following character types (lowercase letters, uppercase letters, numbers, special characters), and containing no /, ", or spaces.

cd ${BLOG_DIR}/cloudformation/
aws cloudformation deploy \
--template-file ${BLOG_DIR}/cloudformation/opensearch-glue-infrastructure.yaml \
--stack-name GlueOpenSearchStack \
--capabilities CAPABILITY_NAMED_IAM \
--region <AWS_REGION> \
--parameter-overrides \
ESMasterUserPassword=<ES_MASTER_USER_PASSWORD> \
OSMasterUserPassword=<OS_MASTER_USER_PASSWORD>

You should see a success message such as "Successfully created/updated stack – GlueOpenSearchStack" after the resources have been provisioned. Provisioning this CloudFormation stack takes approximately 30 minutes.

  2. On the AWS CloudFormation console, locate the GlueOpenSearchStack stack, and confirm that its status is CREATE_COMPLETE.

Image showing the "CREATE_COMPLETE" status of cloudformation template

You can review the deployed resources on the Resources tab, as shown in the following screenshot. The screenshot does not display all the created resources.

Image showing the "Resources" tab of cloudformation template

Additional setup steps

In this section, we collect essential information, including the S3 bucket name and the OpenSearch Service and Elasticsearch domain endpoints. These details are required for executing the code in subsequent sections.

Capture the details of the provisioned resources

Use the following AWS CLI command to extract and save the output values from the CloudFormation stack to a file named GlueOpenSearchStack_outputs.txt. We refer to the values in this file in upcoming steps.

aws cloudformation describe-stacks \
--stack-name GlueOpenSearchStack \
--query 'sort_by(Stacks[0].Outputs[], &OutputKey)[].{Key:OutputKey,Value:OutputValue}' \
--output table \
--no-cli-pager \
--region <AWS_REGION> > ${BLOG_DIR}/GlueOpenSearchStack_outputs.txt
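
If you prefer to read the stack outputs from Python instead of a text file, the following sketch shows one way to do it. It assumes a client that behaves like boto3.client("cloudformation"); the helper names are hypothetical, and find_output matches on a key fragment because the exact output key names are defined by the template.

```python
def stack_outputs(cfn_client, stack_name):
    """Return a CloudFormation stack's outputs as a {key: value} dict."""
    response = cfn_client.describe_stacks(StackName=stack_name)
    outputs = response["Stacks"][0].get("Outputs", [])
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}


def find_output(outputs, fragment):
    """Return the value of the first output (by sorted key) containing fragment."""
    for key in sorted(outputs):
        if fragment in key:
            return outputs[key]
    raise KeyError(fragment)
```

For example, find_output(stack_outputs(client, "GlueOpenSearchStack"), "S3Bucket") would return the bucket name, provided the template exposes an output whose key contains S3Bucket.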

Download NY Green Taxi December 2022 dataset and copy to S3 bucket

The purpose of this post is to demonstrate the technical implementation of ingesting data into OpenSearch Service using AWS Glue. Understanding the dataset itself is not essential, aside from its data format, which we discuss in AWS Glue notebooks in later sections. To learn more about the dataset, you can find additional information on the NYC Taxi and Limousine Commission website.

We specifically request that you download the December 2022 dataset, because we have tested the solution using this particular dataset:

S3_BUCKET_NAME=$(awk -F '|' '$2 ~ /S3Bucket/ {gsub(/^[ \t]+|[ \t]+$/, "", $3); print $3}' ${BLOG_DIR}/GlueOpenSearchStack_outputs.txt)
mkdir -p ${BLOG_DIR}/datasets && cd ${BLOG_DIR}/datasets
curl -O https://d37ci6vzurychx.cloudfront.net/trip-data/green_tripdata_2022-12.parquet
aws s3 cp green_tripdata_2022-12.parquet s3://${S3_BUCKET_NAME}/datasets/green_tripdata_2022-12.parquet

Download the required JARs from the Maven repository and copy to S3 bucket

We’ve specified a particular JAR file version to ensure stable deployment experience. However, we recommend adhering to your organization’s security best practices and reviewing any known vulnerabilities in the version of the JAR files before deployment. AWS does not guarantee the security of any open-source code used here. Additionally, please verify the downloaded JAR file’s checksum against the published value to confirm its integrity and authenticity.

mkdir -p ${BLOG_DIR}/jars && cd ${BLOG_DIR}/jars
# OpenSearch Service jar
curl -O https://repo1.maven.org/maven2/org/opensearch/client/opensearch-spark-30_2.12/1.0.1/opensearch-spark-30_2.12-1.0.1.jar
aws s3 cp opensearch-spark-30_2.12-1.0.1.jar s3://${S3_BUCKET_NAME}/jars/opensearch-spark-30_2.12-1.0.1.jar
# Elasticsearch jar
curl -O https://repo1.maven.org/maven2/org/elasticsearch/elasticsearch-spark-30_2.12/7.17.23/elasticsearch-spark-30_2.12-7.17.23.jar
aws s3 cp elasticsearch-spark-30_2.12-7.17.23.jar s3://${S3_BUCKET_NAME}/jars/elasticsearch-spark-30_2.12-7.17.23.jar
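
The checksum verification mentioned above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: Maven Central publishes a .sha1 file next to each artifact, you have saved its content as the published string, and the function names are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Compute the hex digest of a file without loading it fully into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published, algorithm="sha1"):
    """Compare a file's digest with a published checksum string."""
    # Checksum files sometimes carry a trailing file name or newline.
    parts = published.split()
    if not parts:
        return False
    return file_digest(path, algorithm) == parts[0].lower()
```

A mismatch means the download is incomplete or has been altered, and the JAR should not be copied to the S3 bucket.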

In the following sections, we implement the individual data ingestion methods as outlined in the architecture diagram.

Ingest data into OpenSearch Service using the OpenSearch Spark library

In this section, we load an OpenSearch Service index using Spark and the OpenSearch Spark library. We demonstrate this implementation by using AWS Glue notebooks, employing basic authentication using user name and password.

To demonstrate the ingestion mechanisms, we have provided the Spark-and-OpenSearch-Code-Steps.ipynb notebook with detailed instructions. Follow the steps in this section in conjunction with the instructions in the notebook.

Set up the AWS Glue Studio notebook

Complete the following steps:

  1. On the AWS Glue console, choose ETL jobs in the navigation pane.
  2. Under Create job, choose Notebook.

Image showing AWS console page for AWS Glue to open notebook

  3. Upload the notebook file located at ${BLOG_DIR}/glue_jobs/Spark-and-OpenSearch-Code-Steps.ipynb.
  4. For IAM role, choose the AWS Glue job IAM role that begins with GlueOpenSearchStack-GlueRole-*.

Image showing AWS console page for AWS Glue to open notebook

  5. Enter a name for the notebook (for example, Spark-and-OpenSearch-Code-Steps) and choose Save.

Image showing AWS Glue OpenSearch Notebook

Replace the placeholder values in the notebook

Complete the following steps to update the placeholders in the notebook:

  1. In Step 1 in the notebook, replace the placeholder <GLUE-INTERACTIVE-SESSION-CONNECTION-NAME> with the AWS Glue interactive session connection name. You can get the connection name by running the following command:
cd ${BLOG_DIR}
awk -F '|' '$2 ~ /GlueInteractiveSessionConnectionName/ {gsub(/^[ \t]+|[ \t]+$/, "", $3); print $3}' ${BLOG_DIR}/GlueOpenSearchStack_outputs.txt
  2. In Step 1 in the notebook, replace the placeholder <S3-BUCKET-NAME> so that the variable s3_bucket holds the bucket name. You can get the bucket name by running the following command:
awk -F '|' '$2 ~ /S3Bucket/ {gsub(/^[ \t]+|[ \t]+$/, "", $3); print $3}' ${BLOG_DIR}/GlueOpenSearchStack_outputs.txt
  3. In Step 4 in the notebook, replace <OPEN-SEARCH-DOMAIN-WITHOUT-HTTPS> with the OpenSearch Service domain endpoint, without the https:// prefix. You can get the endpoint by running the following command:
awk -F '|' '$2 ~ /OpenSearchDomainEndpoint/ {gsub(/^[ \t]+|[ \t]+$/, "", $3); print $3}' ${BLOG_DIR}/GlueOpenSearchStack_outputs.txt

Run the notebook

Run each cell of the notebook to load data into the OpenSearch Service domain and read it back to verify the successful load. Refer to the detailed instructions within the notebook for execution-specific guidance.
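
For orientation, a write with the OpenSearch Spark library generally has the following shape. Treat this as a sketch rather than the notebook's code: the option keys follow the opensearch.* naming we understand the library to use, the "opensearch" format name is likewise an assumption to confirm against the notebook, and the function names are hypothetical.

```python
def opensearch_write_options(endpoint, user, password, port=443):
    """Build connector options for writing to an OpenSearch Service domain."""
    return {
        "opensearch.nodes": endpoint,  # domain endpoint without https://
        "opensearch.port": str(port),
        "opensearch.nodes.wan.only": "true",  # talk only to the declared endpoint
        "opensearch.net.ssl": "true",
        "opensearch.net.http.auth.user": user,
        "opensearch.net.http.auth.pass": password,
    }


def write_dataframe(df, index, options, mode="append"):
    """Write a Spark DataFrame to an index using the OpenSearch Spark library."""
    df.write.format("opensearch").options(**options).mode(mode).save(index)
```

In practice, the user name and password should come from a secret store rather than being typed into the notebook.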

Spark write modes (append vs. overwrite)

We recommend writing data incrementally into OpenSearch Service indexes using append mode, as demonstrated in Step 8 in the notebook. In some cases, however, you may need to refresh the entire dataset in the OpenSearch Service index. You can use overwrite mode for this, though it is not advised for large indexes: in overwrite mode, the Spark library deletes the existing documents from the OpenSearch Service index one by one and then rewrites the data, which is inefficient for large datasets. To avoid this, you can implement a preprocessing step in Spark that identifies insertions and updates, and then write only those records into OpenSearch Service using append mode.
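
The preprocessing idea can be illustrated with plain Python records. This is a simplified sketch, not the notebook's code: the function name and the id key are hypothetical, and in Spark the same comparison would typically be expressed as a join between the incoming DataFrame and the data already indexed.

```python
def split_changes(existing, incoming, key="id"):
    """Classify incoming records as inserts or updates against existing ones.

    Records are plain dicts here; identical records are skipped because
    there is nothing to write for them.
    """
    current = {row[key]: row for row in existing}
    inserts, updates = [], []
    for row in incoming:
        previous = current.get(row[key])
        if previous is None:
            inserts.append(row)
        elif previous != row:
            updates.append(row)
    return inserts, updates
```

For updates to replace documents instead of duplicating them, the write needs a stable document ID derived from the key field; check the library's mapping settings for how to configure that.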

Ingest data into Elasticsearch using the Elasticsearch Hadoop library

In this section, we load an Elasticsearch index using Spark and the Elasticsearch Hadoop Library. We demonstrate this implementation by using AWS Glue as the engine for Spark.

Set up the AWS Glue Studio notebook

Complete the following steps to set up the notebook:

  1. On the AWS Glue console, choose ETL jobs in the navigation pane.
  2. Under Create job, choose Notebook.

Image showing AWS console page for AWS Glue to open notebook

  3. Upload the notebook file located at ${BLOG_DIR}/glue_jobs/Spark-and-Elasticsearch-Code-Steps.ipynb.
  4. For IAM role, choose the AWS Glue job IAM role that begins with GlueOpenSearchStack-GlueRole-*.

Image showing AWS console page for AWS Glue to open notebook

  5. Enter a name for the notebook (for example, Spark-and-ElasticSearch-Code-Steps) and choose Save.

Image showing AWS Glue Elasticsearch Notebook

Replace the placeholder values in the notebook

Complete the following steps:

  1. In Step 1 in the notebook, replace the placeholder <GLUE-INTERACTIVE-SESSION-CONNECTION-NAME> with the AWS Glue interactive session connection name. You can get the connection name by running the following command:
awk -F '|' '$2 ~ /GlueInteractiveSessionConnectionName/ {gsub(/^[ \t]+|[ \t]+$/, "", $3); print $3}' ${BLOG_DIR}/GlueOpenSearchStack_outputs.txt
  2. In Step 1 in the notebook, replace the placeholder <S3-BUCKET-NAME> so that the variable s3_bucket holds the bucket name. You can get the bucket name by running the following command:
awk -F '|' '$2 ~ /S3Bucket/ {gsub(/^[ \t]+|[ \t]+$/, "", $3); print $3}' ${BLOG_DIR}/GlueOpenSearchStack_outputs.txt
  3. In Step 4 in the notebook, replace <ELASTIC-SEARCH-DOMAIN-WITHOUT-HTTPS> with the Elasticsearch domain endpoint, without the https:// prefix. You can get the endpoint by running the following command:
awk -F '|' '$2 ~ /ElasticsearchDomainEndpoint/ {gsub(/^[ \t]+|[ \t]+$/, "", $3); print $3}' ${BLOG_DIR}/GlueOpenSearchStack_outputs.txt

Run the notebook

Run each cell in the notebook to load data to the Elasticsearch domain and read it back to verify the successful load. Refer to the detailed instructions within the notebook for execution-specific guidance.
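
As a rough guide to what the Elasticsearch write looks like, the following sketch builds the connection settings using the es.* configuration keys of the Elasticsearch Hadoop library. The function name is hypothetical, and the notebook may set additional or different options.

```python
def elasticsearch_write_options(endpoint, user, password, port=443):
    """Build Elasticsearch Hadoop options for writing to a managed domain."""
    return {
        "es.nodes": endpoint,  # domain endpoint without https://
        "es.port": str(port),
        "es.nodes.wan.only": "true",  # skip node discovery behind the endpoint
        "es.net.ssl": "true",
        "es.net.http.auth.user": user,
        "es.net.http.auth.pass": password,
    }
```

With a Spark DataFrame df, the options would be applied along the lines of df.write.format("es").options(**opts).mode("append").save("green_taxi"), where green_taxi stands in for your index name.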

Ingest data into OpenSearch Service using the AWS Glue OpenSearch Service connection

In this section, we load an OpenSearch Service index using Spark and the AWS Glue OpenSearch Service connection.

Create the AWS Glue job

Complete the following steps to create an AWS Glue Visual ETL job:

  1. On the AWS Glue console, choose ETL jobs in the navigation pane.
  2. Under Create job, choose Visual ETL.

This will open the AWS Glue job visual editor.

Image showing AWS console page for AWS Glue to open Visual ETL

  3. Choose the plus sign, and under Sources, choose Amazon S3.

Image showing AWS console page for AWS Glue Visual Editor

  4. In the visual editor, choose the Data Source – S3 bucket node.
  5. In the Data source properties – S3 pane, configure the data source as follows:
    • For S3 source type, select S3 location.
    • For S3 URL, choose Browse S3, and choose the green_tripdata_2022-12.parquet file from the designated S3 bucket.
    • For Data format, choose Parquet.
  6. Choose Infer schema to let AWS Glue detect the schema of the data.

This will set up your data source from the specified S3 bucket.

Image showing AWS console page for AWS Glue Visual Editor

  7. Choose the plus sign again to add a new node.
  8. For Transforms, choose Drop Fields to include this transformation step.

This will allow you to remove any unnecessary fields from your dataset before loading it into OpenSearch Service.

Image showing AWS console page for AWS Glue Visual Editor

  9. Choose the Drop Fields transform node, then select the following fields to drop from the dataset:
    • payment_type
    • trip_type
    • congestion_surcharge

This will remove these fields from the data before it is loaded into OpenSearch Service.

Image showing AWS console page for AWS Glue Visual Editor

  10. Choose the plus sign again to add a new node.
  11. For Targets, choose Amazon OpenSearch Service.

This will configure OpenSearch Service as the destination for the data being processed.

Image showing AWS console page for AWS Glue Visual Editor

  12. Choose the Data target – Amazon OpenSearch Service node and configure it as follows:
    • For Amazon OpenSearch Service connection, choose the connection GlueOpenSearchServiceConnec-* from the dropdown list.
    • For Index, enter green_taxi. The green_taxi index was created earlier in the “Ingest data into OpenSearch Service using the OpenSearch Spark library” section.

This configures the job to write the processed data to the specified index in OpenSearch Service.

Image showing AWS console page for AWS Glue Visual Editor

  13. On the Job details tab, update the job details as follows:
    • For Name, enter a name (for example, Spark-and-Glue-OpenSearch-Connector).
    • For Description, enter an optional description (for example, AWS Glue job using Glue OpenSearch Connection to load data into Amazon OpenSearch Service).
    • For IAM role, choose the role starting with GlueOpenSearchStack-GlueRole-*.
    • For Glue version, choose Glue 4.0 – Supports spark 3.3, Scala 2, Python 3.
    • Leave the rest of the fields as default.
    • Choose Save to save the changes.

Image showing AWS console page for AWS Glue Visual Editor

  14. To run the AWS Glue job Spark-and-Glue-OpenSearch-Connector, choose Run.

This will initiate the job execution.

Image showing AWS console page for AWS Glue Visual Editor

  15. Choose the Runs tab and wait for the AWS Glue job to complete successfully.

You will see the status change to Succeeded when the job is complete.
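
If you would rather monitor the run from code than from the console, a polling loop along these lines is one option. It is a sketch that assumes a client behaving like boto3.client("glue") and a run ID you already have (for example, from starting the job through the API); the function name, polling interval, and retry count are arbitrary choices.

```python
import time

# Job run states after which the state no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def wait_for_job_run(glue_client, job_name, run_id, poll_seconds=30, max_polls=120):
    """Poll a Glue job run until it reaches a terminal state and return that state."""
    for _ in range(max_polls):
        response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
        state = response["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"{job_name} run {run_id} did not finish in time")
```

A return value of SUCCEEDED corresponds to the Succeeded status shown on the Runs tab.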

Image showing AWS console page for AWS Glue job run status

Clean up

To clean up your resources, complete the following steps:

  1. Delete the CloudFormation stack:
aws cloudformation delete-stack \
--stack-name GlueOpenSearchStack \
--region <AWS_REGION>
  2. Delete the AWS Glue jobs:
    • On the AWS Glue console, under ETL jobs in the navigation pane, choose Visual ETL.
    • Select the jobs you created (Spark-and-Glue-OpenSearch-Connector, Spark-and-ElasticSearch-Code-Steps, and Spark-and-OpenSearch-Code-Steps) and on the Actions menu, choose Delete.

Conclusion

In this post, we explored several ways to ingest data into OpenSearch Service using Spark on AWS Glue. We demonstrated the use of three key libraries: the AWS Glue OpenSearch Service connection, the OpenSearch Spark Library, and the Elasticsearch Hadoop Library. The methods outlined in this post can help you streamline your data ingestion into OpenSearch Service.

If you’re interested in learning more and getting hands-on experience, we’ve created a workshop that walks you through the entire process in detail. You can explore the full setup for ingesting data into OpenSearch Service, handling both batch and real-time streams, and building dashboards. Check out the workshop Unified Real-Time Data Processing and Analytics Using Amazon OpenSearch and Apache Spark to deepen your understanding and apply these techniques step by step.


About the Authors

Ravikiran Rao is a Data Architect at Amazon Web Services and is passionate about solving complex data challenges for various customers. Outside of work, he is a theater enthusiast and amateur tennis player.

Vishwa Gupta is a Senior Data Architect with the AWS Professional Services Analytics Practice. He helps customers implement big data and analytics solutions. Outside of work, he enjoys spending time with family, traveling, and trying new food.

Suvojit Dasgupta is a Principal Data Architect at Amazon Web Services. He leads a team of skilled engineers in designing and building scalable data solutions for AWS customers. He specializes in developing and implementing innovative data architectures to address complex business challenges.

[$] Chimera Linux works toward a simplified desktop

Post Syndicated from daroc original https://lwn.net/Articles/1004324/

Chimera Linux is a new distribution designed to be “simple, transparent, and easy to pick up”. The distribution is built from scratch, and recently announced its first beta release. While the documentation and installation process are both a bit rough, the project already provides a usable desktop with plenty of useful software, one built primarily on tools adopted from BSD.

AWS re:Invent 2024: Security, identity, and compliance recap

Post Syndicated from Marshall Jones original https://aws.amazon.com/blogs/security/aws-reinvent-2024-security-identity-and-compliance-recap/

AWS re:Invent 2024 was held in Las Vegas December 2–6, with over 54,000 attendees participating in more than 2,300 sessions and hands-on labs. The conference was a hub of innovation and learning hosted by AWS for the global cloud computing community.

In this blog post, we cover on-demand sessions and the major security, identity, and compliance announcements made leading up to and during the conference. Whether you missed the event or want to revisit the key takeaways, we’ve compiled the essential information into an overview of the latest developments in AWS security, identity, and compliance. This year’s event put best practices for zero trust, generative AI–driven security, identity and access management, DevSecOps, network and infrastructure security, data protection, and threat detection and incident response at the forefront.

Key announcements

For identity and access management, we launched multiple new features that can help you scale permissions management across your organization in AWS Organizations.

  • Resource control policies (RCPs) – RCPs are a new type of organization policy that can be used to centrally create and enforce preventative controls on AWS resources in your organization. Using RCPs, you can centrally set the maximum available permissions to your AWS resources as you scale your workloads on AWS.
  • Centrally manage root access – With central management for root access, you now have a capability to centrally manage your root credentials, simplify auditing of credentials, and perform tightly scoped privileged tasks across your AWS member accounts managed using AWS Organizations.
  • Declarative policies – Declarative policies simplify the way you enforce durable intent, such as baseline configurations for AWS services within your organization.

Amazon Cognito announced four new features:

  • Feature tiers – Amazon Cognito launched two user pool feature tiers: Essentials and Plus. The Essentials tier offers comprehensive and flexible user authentication and access control features, helping you to implement secure, scalable, and customized sign-up and sign-in experiences. The Plus tier offers threat protection capabilities against suspicious sign-ins for customers who have elevated security needs for their applications.
  • Developer-focused console – Amazon Cognito now offers a streamlined getting-started experience featuring a quick wizard and use case–specific recommendations. This approach helps you set up configurations and reach your end users faster and more efficiently than ever before.
  • Managed Login – This feature is a fully managed, hosted sign-in and sign-up experience that you can personalize to align with your company or application branding. Managed Login helps you offload the undifferentiated heavy lifting of designing and maintaining custom implementations such as passwordless authentication and localization.
  • Passwordless authentication – With passwordless authentication, you can secure user access to your application with passkeys, email, and text messages. If your users choose to use passkeys to sign in, they can do so using a built-in authenticator, such as Touch ID on Apple MacBooks and Windows Hello facial recognition on PCs.

To discover security issues in your environment, Amazon GuardDuty launched Extended Threat Detection, a capability that you can use to identify sophisticated, multi-stage threats targeting your AWS accounts, workloads, and data. You can now use new threat sequence findings that cover multiple resources and data sources over an extensive time period, allowing you to spend less time on first-level analysis and more time responding to critical severity threats to minimize business impact.

Amazon OpenSearch Service now offers a zero-ETL integration with Amazon Security Lake, enabling you to query and analyze security data in-place directly through OpenSearch Service. This integration allows you to efficiently explore voluminous data sources that were previously cost-prohibitive to analyze, helping you streamline security investigations and obtain comprehensive visibility of your security landscape. With the flexibility to selectively ingest data and without the need to manage complex data pipelines, you can now focus on effective security operations while potentially lowering your analytics costs.

AWS Security Incident Response is a new service that helps you respond to security issues in your environment. This new service combines the power of automated monitoring and investigation, accelerated communication and coordination, and direct 24/7 access to the AWS Customer Incident Response Team to quickly prepare for, respond to, and recover from security events.

In the zero trust space, AWS Verified Access and Amazon VPC Lattice both launched support for accessing non-HTTPS resources. Verified Access enables you to provide secure, VPN-less access to your corporate applications over protocols such as TCP, SSH, and RDP. With the launch of VPC Resources for Amazon VPC Lattice, you can now access your application dependencies through a VPC Lattice service network. You’re able to connect to your application dependencies that are hosted in different VPCs, accounts, and on-premises using additional protocols, including TLS, HTTP, HTTPS, and now TCP. Watch the on-demand session to learn how you can enable zero trust access over non-HTTP(S) protocols by using AWS Verified Access.

Amazon Route 53 Resolver DNS Firewall launched an advanced firewall rule that has a new set of capabilities that you can use to monitor and block suspicious DNS traffic associated with advanced DNS threats.

Amazon Virtual Private Cloud launched block public access, which is a one-click declarative control that admins can implement centrally to authoritatively block internet traffic for each of their VPCs.

As more and more customers deploy generative AI workloads into production, it’s important to have proper security controls. Amazon Bedrock launched two new features to help with this:

  • Automated Reasoning checks – Automated Reasoning checks help detect hallucinations and provide a verifiable proof that a large language model response is accurate. With Automated Reasoning checks, domain experts can more straightforwardly build specifications called Automated Reasoning Policies that encapsulate their knowledge in fields such as operational workflows and HR policies. Users of Amazon Bedrock Guardrails can validate generated content against an Automated Reasoning Policy to identify inaccuracies and unstated assumptions, and explain why statements are accurate in a verifiable way.
  • Multimodal toxicity detection (Preview) – Amazon Bedrock Guardrails now supports multimodal toxicity detection for image content, enabling organizations to apply content filters to images. This capability, now in public preview, removes the heavy lifting required to build your own safeguards for image data or spend cycles with manual evaluation that can be error-prone and tedious.

AWS has continued to work closely with partners to help drive customer success. There were three new partner programs launched at AWS re:Invent:

  • AI Security category – The AI Security category in the AWS Security competency helps you identify AWS Partners with deep experience securing AI environments and defending AI workloads against advanced threats. Partners in this category are validated for their capabilities in areas such as prevention of sensitive data disclosure, prevention of injection threats, security posture management, and implementing responsible AI filtering.
  • AWS Security Incident Response Specialization – Today, AWS customers rely on various third-party tools and services to support their internal security incident response capabilities. To better help both customers and partners, AWS introduced AWS Security Incident Response, a new service that helps you prepare for, respond to, and recover from security events. Alongside approved AWS Partners, AWS Security Incident Response monitors, investigates, and escalates triaged security findings from Amazon GuardDuty and other threat detection tools through AWS Security Hub. Security Incident Response is designed to identify and escalate only high-priority incidents.
  • Amazon Security Lake Ready Specialization – This specialization recognizes AWS Partners who have technically validated their software solutions to integrate with Amazon Security Lake and demonstrated successful customer deployments. These solutions have been technically validated by AWS Partner Solutions Architects for their sound architecture and proven customer success.

Experience content on demand

If you were unable to join us in person or you want to watch a session again, you can view the sessions that are available on demand. Catch the CEO Keynote with Matt Garman to learn how AWS is reinventing foundational building blocks in addition to developing brand-new experiences, all to empower AWS customers and partners with what they need to build a better future. You can also replay additional re:Invent 2024 keynotes.

Watch the Security Innovation Talk, with AWS CISO Chris Betz, to hear how the latest AWS innovations are helping customers move fast and stay secure. Learn how AWS empowers organizations to confidently integrate and automate security into their products, services, and processes so security teams can focus their time on work that brings the highest value to the business. Chris also shares how AWS is helping to make the internet more secure by scaling security innovation and investing in the security community.

Stream any of the AWS security, identity, and compliance breakout sessions and new launch talks on demand.

Consider joining us for more in-person security learning opportunities by saving the date for AWS re:Inforce 2025, which will take place June 16–18 in Philadelphia, Pennsylvania. We look forward to seeing you there!

If you want to discuss how these new announcements can help your organization improve its security posture, AWS is here to help. Contact your AWS account team today.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Security, Identity, & Compliance re:Post or contact AWS Support.

Author

Marshall Jones

Marshall is a Worldwide Security Specialist Solutions Architect at AWS. His background is in AWS consulting and security architecture, focused on a variety of security domains including edge, threat detection, and compliance. Today, he’s focused on helping enterprise AWS customers adopt and operationalize AWS security services to increase security effectiveness and reduce risk.

Apurva More

Apurva is a part of the AWS Security, Identity, and Compliance team, with 13 years of experience in global product marketing across both startups and large enterprises. Known for her expertise in market positioning, competitive analysis, and customer insights, she has launched products that resonate with target audiences and drive revenue growth, while collaborating cross-functionally to align product vision with market needs and business goals.

AWS Weekly Roundup: New Asia Pacific Region, DynamoDB updates, Amazon Q developer, and more (January 13, 2025)

Post Syndicated from Betty Zheng (郑予彬) original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-new-asia-pacific-region-dynamodb-updates-amazon-q-developer-and-more-january-13-2025/

As we move into the second week of 2025, China is celebrating Laba Festival (腊八节), a traditional holiday, which marks the beginning of Chinese New Year preparations. On this day, Chinese people prepare Laba congee, a special porridge combining various grains, dried fruits, and nuts. This nutritious mixture symbolizes harmony, prosperity, and good fortune, with each ingredient representing the diversity and abundance of life. This traditional practice dates back to when Buddha achieved enlightenment after consuming rice porridge, making it a symbol of both material and spiritual nourishment. The festival, occurring on the eighth day of the twelfth lunar month, marks the countdown to Spring Festival, China’s most significant traditional holiday celebrating family reunion and renewal.

As our global tech community grows, such cultural celebrations remind us of the importance of inclusive innovation and shared progress.

Last week’s launches

Let’s take a look at what Amazon Web Services (AWS) launched last week.

New AWS Asia Pacific (Thailand) Region – AWS has expanded its global infrastructure with the launch of the new Asia Pacific (Thailand) AWS Region, featuring three Availability Zones. With this addition, customers in Thailand and throughout Southeast Asia can serve customers with reduced latency while maintaining data residency within Thailand. The newly launched Region supports the complete range of AWS services and strengthens our presence in the rapidly growing ASEAN market.

New AWS Direct Connect location in Bangkok – Following the launch of our Thailand Region, we’ve established a new AWS Direct Connect location in Bangkok and expanded our existing infrastructure. This addition provides customers in Thailand with improved connectivity options and reduced network latency when accessing AWS services.

Database and analytics

Configurable point-in-time recovery periods for Amazon DynamoDB – Amazon DynamoDB now enables customizable point-in-time recovery (PITR) periods, which means customers can specify recovery durations ranging from 1 to 35 days on a per-table basis. This enhancement enables organizations to meet precise compliance requirements while maximizing cost-efficiency. The feature is now available across all AWS Regions, including AWS GovCloud (US West) and China Regions. This flexibility in data recovery periods empowers customers to align their backup policies precisely with their business requirements and regulatory obligations.

Amazon MSK Connect APIs with AWS PrivateLink – Amazon Managed Streaming for Apache Kafka Connect (Amazon MSK Connect) APIs now support AWS PrivateLink, giving customers access to MSK Connect APIs through private endpoints within their virtual private cloud (VPC). This enhancement provides increased security and reduced data exposure by keeping traffic within the AWS network.

Generative AI and machine learning

Amazon Q Developer in SageMaker Code Editor – Amazon Q Developer is now integrated into the Amazon SageMaker Code Editor integrated development environment (IDE), enhancing the developer’s experience with AI-powered code assistance. Intelligent code suggestions, documentation assistance, and contextual recommendations are now directly available within the SageMaker development environment.

Management and governance

AWS Systems Manager Automation in AWS Chatbot – AWS Chatbot now offers 20 additional AWS Systems Manager Automation runbook recommendations, expanding its capabilities for automated operations management. These new recommendations help customers streamline their operational tasks and implement best practices more efficiently through chat-based interactions.

AWS Transit Gateway cost analysis enhancement – We’ve introduced new capabilities for analyzing Transit Gateway data processing charges using cost allocation tags. This feature provides improved visibility and control over networking costs, enabling organizations to track and optimize AWS Transit Gateway usage efficiently. The enhanced cost analysis tools deliver detailed insights into network traffic patterns and associated costs.

Other AWS news and highlights

2024’s most popular DevOps blog posts – The retrospective blog post “The most visited DevOps and Developer Productivity blog posts in 2024” has reached the top position on this week’s AWS most popular articles chart. This compilation presents the most influential DevOps content from 2024, offering insights into trending topics and best practices. The collection examines key developments in continuous integration and continuous delivery (CI/CD), infrastructure as code (IaC), and automation practices.

New security course for generative AI – AWS Skill Builder has released a new course focusing on securing generative AI applications on AWS. This comprehensive training teaches professionals to implement security best practices for artificial intelligence and machine learning (AI/ML) workloads, addressing data protection, model security, and compliance requirements. The course meets the growing demand for specialized security knowledge in the rapidly evolving field of generative AI.

Amazon Connect Contact Lens free trials – We’re introducing free trials for first-time users of Amazon Connect Contact Lens conversational analytics and performance evaluations. New customers can process up to 100,000 voice minutes monthly at no cost for 2 months, and first-time performance evaluation users receive a 30-day free trial starting with their first evaluation. With this initiative, customers can experience Contact Lens capabilities in their environment without additional costs. The free trials are available across all AWS Regions where Contact Lens is supported.

For a full list of AWS announcements, be sure to keep an eye on the What’s New with AWS page.

Whether you’re a developer, architect, business leader, or you’re starting your cloud journey – and regardless of what 2024 brought your way – 2025 presents new opportunities for everyone.

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Betty

How to monitor, optimize, and secure Amazon Cognito machine-to-machine authorization

Post Syndicated from Abrom Douglas original https://aws.amazon.com/blogs/security/how-to-monitor-optimize-and-secure-amazon-cognito-machine-to-machine-authorization/

Amazon Cognito is a developer-centric and security-focused customer identity and access management (CIAM) service that simplifies the process of adding user sign-up, sign-in, and access control to your mobile and web applications. Cognito is a highly available service that supports a range of use cases, from managing user authentication and authorization to enabling secure access to your APIs and workloads. It’s a managed service that can act as an identity provider (IdP) for your applications, can scale to millions of users, provides advanced security features, and can support identity federation with third-party IdPs.

A feature of Amazon Cognito is support for OAuth 2.0 client credentials grants, used for machine-to-machine (M2M) authorization. As your M2M use cases scale, it becomes important to have proper monitoring, optimization of token issuance, and awareness of security best practices and considerations. It’s a best practice for app clients to locally cache and reuse access tokens while still valid and not expired. You can customize how long issued tokens are valid, so it’s important to make sure that the timeframe is aligned with your security requirements. If caching and reusing access tokens isn’t possible at the client level or cannot be enforced, then combining your M2M use cases with a REST API proxy integration using Amazon API Gateway enables you to cache token responses. By using API Gateway caching, you can optimize the request and response of access tokens for M2M authorization. This reduces redundant calls to Cognito for access tokens, thus improving the overall performance, availability, and security of your M2M use cases.

In this post, we explore strategies to help monitor, optimize, and secure Amazon Cognito M2M authorization. You’ll first learn some effective monitoring techniques to keep track of your usage, then delve into optimization strategies using API Gateway and token caching. Lastly, we will cover security best practices and considerations to bolster the security of your M2M use cases. Let’s dive in and discover how to make the most out of your Amazon Cognito M2M implementation.

Machine-to-machine authorization

Amazon Cognito uses an OAuth 2.0 client credentials grant to handle M2M authorization. A Cognito user pool can issue a client ID and client secret to allow your service to request a JSON web token (JWT)-compliant access token to access protected resources. Figure 1 illustrates how an app client requests an access token using the client credentials grant flow with Amazon Cognito.

Figure 1: Client credentials grant flow

The client credential grant flow (Figure 1) includes the following steps:

  1. The app client makes an HTTP POST request to the Amazon Cognito user pool /token endpoint (see The token issuer endpoint for more information). The request includes an authorization header consisting of the client ID and client secret, and request parameters consisting of grant type, client ID, and scopes.
  2. After validating the request, Cognito will return a JWT-compliant access token.
  3. The client can make subsequent requests to a downstream resource server using the Cognito issued access token.
  4. The resource server gets a JSON Web Key Set (JWKS) from the Cognito user pool. The JWKS contains the user pool’s public keys, which should be used to verify the token signature.
  5. The resource server uses the public key to verify the signature of the access token is valid (proving the token has not been tampered with). The resource server also needs to verify that the token is not expired and required claims and values are present, including scopes. The resource server should use the aws-jwt-verify library to verify that the access token is valid.
  6. After the access token is verified and the app client is authorized, the requested resource is returned to the app client.
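
Step 1 of the flow above can be sketched in Python as follows. This is a minimal illustration, not the post's own code: the function name `build_token_request`, the example client ID, secret, and scopes are all hypothetical, and the sketch only builds the header and the encoded parameters without sending anything over the network.

```python
import base64
from urllib.parse import urlencode


def build_token_request(client_id: str, client_secret: str, scopes: list[str]) -> tuple[dict, str]:
    """Return (headers, encoded_params) for a client credentials token request."""
    # HTTP Basic credentials are base64("client_id:client_secret").
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # Multiple scopes travel as a single space-separated value.
    params = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "scope": " ".join(scopes),
    })
    return headers, params


# Hypothetical values, for illustration only.
headers, params = build_token_request(
    "example-client-id", "example-secret", ["orders/read", "orders/write"]
)
```

The resulting header and parameters would be sent in an HTTP POST to the /token endpoint of your user pool domain. In a real workload the secret would come from a secrets store at runtime instead of a literal.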

You can learn more about OAuth 2.0 support for client credentials grants and other authentication flows that Amazon Cognito supports in How to use OAuth 2.0 in Amazon Cognito: Learn about the different OAuth 2.0 grants.

Now, let’s dive deep into the monitoring, optimization, and security considerations around M2M authorization with Amazon Cognito.

Monitoring usage and costs

In May 2024, Amazon Cognito introduced pricing for M2M authorization to support continued growth and expand M2M features. Customer accounts using M2M with Cognito prior to May 9, 2024, are exempt from M2M pricing until May 9, 2025 (for more information, see Amazon Cognito introduces tiered pricing for machine-to-machine (M2M) usage). To get better visibility into your existing Amazon Cognito usage types, you can use the Security tab of the Cost and Usage Dashboards Operations Solution (CUDOS) dashboard. This dashboard is part of the Cloud Intelligence Dashboards, an open source framework that provides AWS customers with actionable insights and optimization opportunities at organization scale. As shown in Figure 2, the Security tab in the CUDOS dashboard provides visuals that show, with daily granularity, the Amazon Cognito spend per usage type and the projected cost for M2M app clients and token requests after the exemption period. This daily breakdown allows you to track how your cost optimization efforts are trending.

Figure 2: Example Amazon Cognito spend and projected cost with daily granularity

You can also see the monthly spend per account for each usage type, as shown in Figure 3.

Figure 3: Example Amazon Cognito spend and projected cost per AWS account

You can see the usage and spend per resource ID of user pools contributing to the cost, as shown in Figure 4. This resource-level granularity enables you to identify the top spending user pool and prioritize usage and cost management efforts accordingly. An interactive demo of this dashboard is available. For more information, see Cloud Intelligence Dashboards.

Figure 4: Example Amazon Cognito resource usage and cost by resource ID, account, and AWS Region

In addition to using the CUDOS dashboard to help understand Cognito M2M usage and costs, you can also request fine-grained usage details down to the app client level. This can include the number of access tokens successfully requested per app client and the last time the app client was used to issue tokens. To understand fine-grained app client usage, you need to make sure that token requests include the client_id request query parameter. This will result in an AWS CloudTrail log event that includes the client ID within the additionalEventData JSON object that is associated with the client credentials token request, as shown in Figure 5.

Figure 5: Sample CloudTrail event log including client_id

You can also use an Amazon CloudWatch log group to capture and store your CloudTrail logs for longer retention and analysis. Then using CloudWatch Logs Insights, you can use the following sample query to gather app client usage.

fields additionalEventData.userPoolId as user_pool_id, additionalEventData.requestParameters.client_id.0 as client_id, eventName, additionalEventData.responseParameters.status 
| filter additionalEventData.requestParameters.grant_type.0="client_credentials" and eventName="Token_POST" and additionalEventData.responseParameters.status="200"
| stats count(*) as count, latest(eventTime) as last_used by user_pool_id, client_id
| sort count desc

Figure 6 is an example result from the preceding CloudWatch Logs Insights query. The result includes the user_pool_id, client_id, count, and last_used columns. The total number of successful token requests grouped per user pool and client ID will be displayed in the count column and the last time the app client successfully issued an access token will be displayed in the last_used column.
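
If you export the same CloudTrail events for offline analysis, the aggregation can be approximated in Python. The sketch below assumes each event is a dictionary whose field layout mirrors the paths used in the query (for example, `additionalEventData.requestParameters.client_id` as a list); the function name and the sample events are hypothetical.

```python
from collections import defaultdict


def summarize_token_requests(events: list[dict]) -> list[dict]:
    """Count successful client credentials token requests per user pool and app client."""
    usage = defaultdict(lambda: {"count": 0, "last_used": ""})
    for event in events:
        extra = event.get("additionalEventData", {})
        params = extra.get("requestParameters", {})
        status = extra.get("responseParameters", {}).get("status")
        # Same filter as the Logs Insights query.
        if event.get("eventName") != "Token_POST":
            continue
        if (params.get("grant_type") or [None])[0] != "client_credentials":
            continue
        if str(status) != "200":
            continue
        client_id = (params.get("client_id") or [None])[0]
        row = usage[(extra.get("userPoolId"), client_id)]
        row["count"] += 1
        # ISO 8601 UTC timestamps compare correctly as plain strings.
        row["last_used"] = max(row["last_used"], event.get("eventTime", ""))
    rows = [
        {"user_pool_id": pool, "client_id": client, **stats}
        for (pool, client), stats in usage.items()
    ]
    return sorted(rows, key=lambda r: r["count"], reverse=True)


def _event(client_id: str, when: str, status: int) -> dict:
    """Build a hypothetical sample event for the demonstration below."""
    return {
        "eventName": "Token_POST",
        "eventTime": when,
        "additionalEventData": {
            "userPoolId": "us-east-1_EXAMPLE",
            "requestParameters": {"grant_type": ["client_credentials"], "client_id": [client_id]},
            "responseParameters": {"status": status},
        },
    }


sample_events = [
    _event("client-a", "2025-01-10T10:00:00Z", 200),
    _event("client-b", "2025-01-10T11:00:00Z", 200),
    _event("client-a", "2025-01-11T09:30:00Z", 200),
    _event("client-a", "2025-01-12T08:00:00Z", 400),  # failed request, ignored
]
summary = summarize_token_requests(sample_events)
```

App clients that never appear in such a summary over a long window are candidates for review, because unused M2M app clients still count toward the app client pricing dimension.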

Figure 6: Example screenshot result set from CloudWatch Logs Insights query

Optimizing token requests

Now that you know how to better monitor your Amazon Cognito usage and costs, let’s dive deeper into how to optimize your token request usage. For M2M, it’s recommended that clients use mechanisms to locally cache access tokens to use for authorization. The client then doesn’t need to request a new access token until the previously issued token is no longer valid. However, the environment where the client runs could be hosted by an external third party or owned by a different team, and as the resource owner, you won’t have control over whether the third party implements token caching at the client side. If this is your scenario, you can use an HTTP proxy integration to cache the access token using API Gateway. Because the M2M use case follows the client credentials grant flow of the OAuth 2.0 specification, the /token endpoint of your user pool is what will be configured with the API Gateway proxy integration. This proxy integration is where caching in API Gateway can be used. With caching, you can reduce the number of token requests made to your user pool /token endpoint and improve the latency of the client receiving a cached token in the response. Caching also provides additional benefits, such as cost optimization, improved performance efficiency, higher levels of availability, and custom domain flexibility.
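
Where you do control the client, local caching can be as small as the following sketch. It assumes a hypothetical `fetch_token` callable that performs the actual token request and returns the token together with its lifetime in seconds; the class name and the 60-second default margin are illustrative choices, not values from this post.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, int]],  # returns (access_token, expires_in_seconds)
        refresh_margin_seconds: int = 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._margin = refresh_margin_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._refresh_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Request a new token only when none is cached or the cached one is close to expiry.
        if self._token is None or now >= self._refresh_at:
            token, expires_in = self._fetch_token()
            self._token = token
            self._refresh_at = now + max(expires_in - self._margin, 0)
        return self._token
```

The clock is injectable so the expiry behavior can be exercised in tests without waiting. A multi-threaded client would also want a lock around the refresh.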

Solution overview

Figure 7: Token caching solution

The solution (shown in Figure 7) includes the following steps.

  1. The client makes an HTTP POST request to an API Gateway REST API.
  2. The API Gateway method request caches the scope URL query string parameter and the Authorization HTTP request header as caching keys. The integration request is configured as a proxy to the /oauth2/token endpoint of your Amazon Cognito user pool.
  3. Cognito validates the request, making sure that the client ID and client secret are correct from the authorization header, a valid client ID has been provided as a query string parameter, and the client is authorized for the requested scopes.
  4. If the request is valid, Cognito returns an access token to the gateway through the integration response. With caching enabled, the response from the HTTP integration (Cognito token endpoint) is cached for the specified time-to-live (TTL) period.
  5. The method response of the gateway returns the access token to the client.
  6. Subsequent token requests with a remaining cached TTL will be returned, using the authorization header and scope as the caching keys.

To set up token caching, follow the steps in Managing user pool token expiration and caching. After a valid token request is returned through the API Gateway proxy integration and cached, subsequent token requests to the proxy that match the caching keys (authorization header and scope parameter) will return that same access token. This token will be returned to the client until the TTL of the cached token has expired. It’s recommended to set the TTL of the cache to be a few minutes less than the TTL of the access token issued from Amazon Cognito. For example, if your security posture requires access tokens to be valid for 1 hour, then set your caching TTL to be a few minutes less than the 1-hour token validity. It’s also important to understand the ideal caching capacity for your use case. The caching capacity affects the CPU, memory, and network bandwidth of the cache instance within the gateway. As a result, the cache capacity can affect the performance of your cache. See Enable Amazon API Gateway caching for more information. For information about how to determine the ideal cache capacity for your use case, see How do I select the best Amazon API Gateway Cache capacity to avoid hitting a rate limit?. Let’s now explore some security best practices and considerations to raise the security bar of your M2M use cases.
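
The TTL guidance above reduces to a small calculation. The sketch below is illustrative: the function names and the 5-minute default margin are assumptions, and the 3,600-second ceiling reflects the maximum cache TTL that API Gateway allows.

```python
API_GATEWAY_MAX_CACHE_TTL_SECONDS = 3600  # upper bound for API Gateway stage caching


def cache_ttl_seconds(token_validity_seconds: int, safety_margin_seconds: int = 300) -> int:
    """Pick a cache TTL a few minutes shorter than the access token validity."""
    ttl = token_validity_seconds - safety_margin_seconds
    # Clamp to the range API Gateway accepts; a TTL of 0 means the response is not cached.
    return max(0, min(ttl, API_GATEWAY_MAX_CACHE_TTL_SECONDS))


def worst_case_remaining_validity(token_validity_seconds: int, ttl_seconds: int) -> int:
    """Validity left on a token that is served just before its cache entry expires."""
    return token_validity_seconds - ttl_seconds


ttl = cache_ttl_seconds(3600)  # 1-hour access tokens
remaining = worst_case_remaining_validity(3600, ttl)
```

With 1-hour tokens and a 5-minute margin, the cache entry lives for 55 minutes, so a client that receives the token at the very end of the cache window still has about 5 minutes to use it.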

Security best practices

Now that you know how to monitor Amazon Cognito M2M usage and costs and how to optimize access token requests, let’s review some security best practices and considerations. Using OAuth 2.0 client credentials grant for M2M authorization helps protect your APIs. One of the key factors for this is that the access token used by the client to connect to the resource server is a temporary and time-bound token. The client must obtain a new access token after its previous token has expired so you won’t have to issue long-lived credentials that are used directly between the client and the resource server. The client ID and client secret remain confidential on the client and are only used between the client and the Amazon Cognito user pool to request an access token.

Use AWS Secrets Manager

If the workload is running on AWS, use AWS Secrets Manager so you don’t have to worry about hard-coding credentials into workloads and applications. If the workload is running on premises or through another provider, then use a similar secrets vault or privileged access management solution to house the workload credentials. The workload should retrieve credentials for authentication only at runtime.
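
A minimal sketch of runtime retrieval follows. It assumes the secret is stored as a JSON string with the keys `client_id` and `client_secret` (hypothetical names chosen for this example) and that `secrets_client` is a Secrets Manager client, such as the one returned by `boto3.client("secretsmanager")`.

```python
import json
from typing import Any, Tuple


def load_client_credentials(secrets_client: Any, secret_id: str) -> Tuple[str, str]:
    """Fetch the app client ID and secret from AWS Secrets Manager at runtime."""
    # get_secret_value returns the secret payload under "SecretString" for text secrets.
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    # The JSON key names below are assumptions about how the secret was stored.
    return secret["client_id"], secret["client_secret"]
```

Passing the client in as a parameter keeps the function easy to test with a stub and leaves Region and credential configuration to the caller.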

Use AWS WAF

It’s a security best practice to use AWS WAF to protect your Amazon Cognito user pool endpoints. This can help protect your user pools from unwanted HTTP web requests by forwarding selected non-confidential headers, request body, query parameters, and other request components to an AWS WAF web access control list (ACL) associated with your user pool. By using AWS WAF, you can also add managed rule groups to your user pool, such as the AWS managed rule group for Bot Control, to add protection against automated bots that can consume excess resources, cause downtime, or perform malicious activities. Learn more about how to associate an AWS WAF Web ACL with your Cognito user pool.

Always verify tokens

After a client has obtained an access token, it’s important to make sure the client is authorized to access the requested resources. If the resource is using API Gateway and the built-in Amazon Cognito authorizer, then the integrity of the token, the signature, and token expiration are checked and validated for you. However, if you require a more custom authorization decision with API Gateway, you can use an AWS Lambda authorizer along with the aws-jwt-verify library. By doing so, you can verify that the signature of the JWT is valid, make sure that the token isn’t expired, and confirm that the necessary and expected claims are present (including necessary scopes). For more fine-grained authorization decisions, look into using Amazon Verified Permissions with the resource server or even within a Lambda authorizer. If the resource server is an external system (that is, outside of AWS) or a custom resource server, make sure that the access token is validated and verified before the requested resources are returned to the client.
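
The claim checks described above can be sketched as follows for a resource server that has already verified the token signature against the user pool JWKS. This example does not perform signature verification and is not a replacement for a library such as aws-jwt-verify; the function name and sample values are hypothetical, while `iss`, `token_use`, `exp`, and `scope` are claims carried in Cognito access tokens.

```python
import time
from typing import Iterable, List, Optional


def check_access_token_claims(
    claims: dict,
    *,
    issuer: str,
    required_scopes: Iterable[str],
    now: Optional[float] = None,
) -> List[str]:
    """Return a list of problems found in already signature-verified claims (empty if none)."""
    problems = []
    current = time.time() if now is None else now
    # The issuer is https://cognito-idp.<region>.amazonaws.com/<userPoolId>.
    if claims.get("iss") != issuer:
        problems.append("unexpected issuer")
    if claims.get("token_use") != "access":
        problems.append("not an access token")
    if claims.get("exp", 0) <= current:
        problems.append("token expired")
    # Scopes arrive as one space-separated string.
    granted = set(claims.get("scope", "").split())
    missing = sorted(set(required_scopes) - granted)
    if missing:
        problems.append("missing scopes: " + ", ".join(missing))
    return problems


example_issuer = "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_EXAMPLE"
example_claims = {
    "iss": example_issuer,
    "token_use": "access",
    "exp": 2_000_000_000,
    "scope": "orders/read",
}
result = check_access_token_claims(
    example_claims,
    issuer=example_issuer,
    required_scopes=["orders/read", "orders/write"],
    now=1_900_000_000,
)
```

Returning the full list of problems, instead of failing on the first one, makes denied requests easier to diagnose in logs.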

Define scopes at the app client level

It’s important to carefully define and constrain the scope of access for each app client to align with the principle of least privilege. By restricting each client ID to only the necessary scopes, organizations can minimize the risk of issuing access tokens with more access and permissions than is required. If your use case aligns with M2M multi-tenancy, consider creating a dedicated app client per tenant and using defined custom scopes for that tenant. Remember that the number of M2M app clients is a pricing dimension and will incur a cost. See Custom scope multi-tenancy best practices for more information.

Security considerations

If you’re using API Gateway to proxy token requests and caching access tokens, the following are some security considerations to raise the security bar of your M2M workload.

Allow token requests only through an API Gateway proxy

After your API Gateway proxy integration is configured and set up for optimization and you have AWS WAF configured for your user pool, you can add an additional layer of security by using an allow list so that only requests from your API Gateway proxy to your Amazon Cognito user pool are accepted. For this, inject a custom HTTP header within the integration request of the POST method execution and create an allow rule within your web ACL that looks for that specific header. You will also create an additional web ACL rule to block all traffic. The single allow rule will have a priority order of 0 and the block-all-traffic rule will have a priority order of 1. Ultimately, this will block all requests that go directly to your Cognito user pool /token endpoint and only allow requests that have been made through the API Gateway proxy. Figure 8 shows this setup in more detail.

Figure 8: Token caching solution with AWS WAF

The process shown in Figure 8 has the following steps:

  1. The client makes a direct HTTP POST call to the /oauth2/token endpoint of the Amazon Cognito user pool. This request is denied by the block-all-traffic rule in the AWS WAF web ACL.
  2. The client initiates an OAuth2 client credentials grant (HTTP POST) against an API Gateway stage (/token).
  3. The REST API in API Gateway is configured as a proxy integration to the /oauth2/token endpoint of the Cognito user pool.
    1. Within the integration request settings, configure a custom header (for example, x-wafAuthAllowRule). Treat the value of this header as a secret that remains only within the API Gateway integration request and is not exposed outside of the gateway.
    2. Consider using Lambda, Amazon EventBridge, and AWS Secrets Manager to automatically rotate this header value in both the API Gateway integration request and in the AWS WAF web ACL rule.
  4. The request is proxied to the Cognito /oauth2/token endpoint. Because AWS WAF is configured to protect the Cognito user pool endpoints, the web ACL rules are evaluated.
    1. The custom header from the integration request (the preceding step) is evaluated against the web ACL rules to allow this request.
  5. Cognito will verify the authorization header (containing the client ID and client secret) and requested scopes.
  6. After successful credential validation, an access token is returned to the gateway within the integration response.
  7. The access token is cached using the following caching keys:
    1. Authorization header.
    2. Scope query string parameter.
  8. The access token is returned to the client through API Gateway.
  9. Subsequent token requests that arrive before the cached TTL expires are returned to the client immediately, using the authorization header and scope as the caching keys.
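
The two mechanisms in these steps, rule evaluation by priority and caching keyed on the authorization header and scope, can be sketched in a few lines of Python. This is a simplified local simulation, not AWS WAF or API Gateway code: the header value, the `waf_allows` function, and the `TokenCache` class are hypothetical names introduced for illustration, and header names are assumed to be lowercased already.

```python
import time
from typing import Callable, Dict, Mapping, Optional, Tuple

# Header name from the example above (lowercased); the value is a placeholder.
WAF_HEADER = "x-wafauthallowrule"
WAF_SECRET = "example-secret-value"


def waf_allows(headers: Mapping[str, str]) -> bool:
    """Evaluate rules in priority order; the first matching rule decides."""
    rules = [
        (1, lambda h: True, False),                            # block all traffic
        (0, lambda h: h.get(WAF_HEADER) == WAF_SECRET, True),  # allow on header
    ]
    for _priority, matches, allow in sorted(rules, key=lambda r: r[0]):
        if matches(headers):
            return allow
    return False


class TokenCache:
    """Cache tokens by (authorization header, scope) for a fixed TTL."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def get_token(
        self,
        authorization: str,
        scope: str,
        fetch: Callable[[], str],
        now: Optional[float] = None,
    ) -> str:
        current = time.time() if now is None else now
        key = (authorization, scope)
        cached = self._entries.get(key)
        # Serve from the cache while the entry has not yet expired.
        if cached is not None and cached[0] > current:
            return cached[1]
        # Otherwise call the token endpoint (fetch) and store the result.
        token = fetch()
        self._entries[key] = (current + self._ttl, token)
        return token
```

Because the cache key includes both the authorization header and the scope, two clients, or one client requesting different scopes, never receive each other's cached tokens.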

Additional authorizer with API Gateway

The client credentials grant is designed to obtain an access token so that an app client can access downstream resources. If you’re using API Gateway as a proxy integration to your token endpoint, as described previously, you can also attach a separate authorizer to the API Gateway proxy. In that case, a separate authorization step takes place before the OAuth 2.0 client credentials grant flow begins. For example, if you’re in a highly regulated industry, you might require mTLS authentication to obtain an access token. This might seem like a double-authentication scenario; however, it helps prevent unauthenticated attempts to get an access token from Amazon Cognito through your API Gateway proxy integration.

Encrypting the API cache

While configuring your API Gateway proxy integration and provisioning your API cache, you can enable encryption of the cached response data. Because the cache holds access tokens for the TTL that you choose, consider encrypting this data at rest if necessary to help meet your security requirements. You can use the default method caching settings or override the stage-level caching settings for the method, and enable encryption at rest.

Conclusion

In this post, we shared how you can monitor, optimize, and enhance the security posture of your machine-to-machine (M2M) authorization use cases with Amazon Cognito. This involved using the Cost and Usage Dashboards Operations Solution (CUDOS) to understand your Cognito M2M token requests and costs. We also discussed using caching from Amazon API Gateway as an HTTP proxy integration to the Cognito user pool /oauth2/token endpoint. By following the guidance in this post, you can better understand your M2M usage and costs and achieve added benefits such as cost optimization, performance efficiency, and higher levels of availability. Lastly, we provided several security best practices and considerations that can be used as additional layers to elevate your security posture.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on Amazon Cognito re:Post or contact AWS Support.

Abrom Douglas III

Abrom is a Senior Solutions Architect within AWS Identity and has over 19 years of software engineering and security experience, specializing in the identity and access management (IAM) space. He loves speaking with customers about how IAM can provide secure outcomes that enable both business and technology goals. In his free time, he enjoys cheering for Arsenal FC, photography, travel, and competing in duathlons.

Nisha Notani

Nisha is a Senior Technical Account Manager for AWS in London, working closely with enterprise customers to accelerate their cloud journey through strategic guidance and technical expertise. She helps organizations build cloud maturity across the AWS Well-Architected pillars, with a focus on operational excellence, observability, and reliability. As an active member of the cloud financial management community, she supports customers in implementing FinOps best practices and cost optimization strategies across their organizations. A passionate mentor, she guides colleagues in their professional development, serves on the AWS Support Give Back program core team to promote volunteering, and actively mentors students in local schools and colleges, providing guidance on their career journeys.

Security updates for Monday

Post Syndicated from jake original https://lwn.net/Articles/1004962/

Security updates have been issued by AlmaLinux (dpdk, firefox, iperf3, thunderbird, and webkit2gtk3), Debian (firefox-esr, gnuchess, node-mocha, openafs, python-django, and thunderbird), Fedora (libxmp, python-jinja2, suricata, thunderbird, and xen), Mageia (avahi, libjxl, opencontainers-runc, radare2, rizin, and tinyproxy), Oracle (cups, dpdk, firefox, iperf3, kernel, thunderbird, and webkit2gtk3), SUSE (apptainer, chromedriver, dnsmasq, govulncheck-vulndb, gstreamer, gstreamer-plugins-base, gstreamer-plugins-good, logback, and python311-slixmpp), and Ubuntu (libxmltok, linux-realtime, roundcube, and snapd).

Microsoft Takes Legal Action Against AI “Hacking as a Service” Scheme

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/01/microsoft-takes-legal-action-against-ai-hacking-as-a-service-scheme.html

Not sure this will matter in the end, but it’s a positive move:

Microsoft is accusing three individuals of running a “hacking-as-a-service” scheme that was designed to allow the creation of harmful and illicit content using the company’s platform for AI-generated content.

The foreign-based defendants developed tools specifically designed to bypass safety guardrails Microsoft has erected to prevent the creation of harmful content through its generative AI services, said Steven Masada, the assistant general counsel for Microsoft’s Digital Crimes Unit. They then compromised the legitimate accounts of paying customers. They combined those two things to create a fee-based platform people could use.

It was a sophisticated scheme:

The service contained a proxy server that relayed traffic between its customers and the servers providing Microsoft’s AI services, the suit alleged. Among other things, the proxy service used undocumented Microsoft network application programming interfaces (APIs) to communicate with the company’s Azure computers. The resulting requests were designed to mimic legitimate Azure OpenAPI Service API requests and used compromised API keys to authenticate them.

Slashdot thread.

Media literacy as a mandatory element of education

Post Syndicated from Яна Хашъмова original https://www.toest.bg/mediynata-gramotnost-kato-zadulzhitelen-element-na-obrazovanieto/


It is no secret that a society's level of education is directly linked to the health, economic growth, and standard of living of the people in that society. To some extent, education is also linked to a country's political reality.

In the US, for example, school education lags behind society's needs. Proficiency in mathematics and reading has held steady over the past twenty years or so, but it is rising in Asian countries, so the US is falling behind in comparative terms. Higher education also needs change. More worrying, however, is the trend of negative public attitudes toward higher education that can be observed in the US. Although the country remains the world's leading economy, many politicians and economists fear that it will soon lose that position. One indicator of this is the fact that quality of life in the US has declined over the past decade.

In Bulgaria, fortunately, higher education is still viewed positively, as a condition for a better standard of living. The National Statistical Institute published data showing a 2.5% drop in the number of students at Bulgarian higher education institutions in 2023 compared with 2022, but without accounting for young people studying abroad or for negative demographic trends. Undoubtedly, however, the school system needs urgent changes. The education level of the Bulgarian population is below the European Union average. Out of 89 countries in a U.S. News survey, Bulgaria ranks 43rd for quality of life: after India, Brazil, and the Philippines, ahead of Romania and Russia, and among the lowest in the European Union.

Although a direct comparison of the education systems of the US and Bulgaria is difficult, even impossible, for many reasons, I will allow myself some thoughts on the changes needed in education policy and practice in both countries.

Welders or philosophers?

I have been teaching at a state university in the US for more than 30 years, and education has changed over that time. The internet seems to govern our knowledge (not to mention artificial intelligence), and with each new cohort of high school graduates enrolling in my courses I notice a change not only in their level of knowledge but also in how they perceive and relate to education.

They lack basic concepts about society and the world, in geography and history, and knowledge of the other beyond their immediate surroundings. They may have heard of communism, but they have no idea what it is. And learning a foreign language is considered entirely unnecessary, even though many studies demonstrate its benefits for young people's cognitive development, communication skills, multicultural literacy, and more. I also have firsthand observations of school education through my son, who is graduating from high school in the US this year.

Ronald Reagan's administration began a systematic, destructive approach to the education system, one that seems to have become ever more harmful over the years.

Every year Reagan pushed for a 20% cut in the education budget, cut construction funds for state universities, and argued that the state should stop subsidizing intellectual curiosity. This policy not only continues but is turning into a public attitude. In November 2015, during a pre-election debate, Marco Rubio, the Republican senator from Florida, said:

Welders make more money than philosophers. We need welders, not philosophers.

The hall erupted in applause.

For those inclined to believe this claim, Forbes reports that people with a philosophy degree earn 78% more than welders. But beyond the direct material benefit, which so preoccupies young people in the US and in Bulgaria too, philosophy as a subject, along with many other fields of knowledge, also builds critical thinking and an understanding of the individual and society.

More worrying is the applause, which reveals the negative attitude many hold toward higher education. The same attitude is evident toward schools. Funding cuts, poor facilities, low pay, and teacher shortages produce high school graduates who are poorly prepared to enter the workforce or to continue their education at university. The US National Literacy Institute has published data on the literacy of the American population for the 2024–2025 school year. Overall, the adult population is literate: 79% are literate and 21% are illiterate. But the literacy of 54% of those nearly 80% literate people is below a sixth-grade level. Part of the result is due to the fact that literacy is measured by English proficiency, and many immigrants speak it poorly or not at all.

Different knowledge and approaches

As I mentioned above, the internet is also upending attitudes toward education, which calls for urgent changes in education systems. The web offers every kind of information, accessible from the screen of our mobile phone, and yet a person must possess basic knowledge about the world and society in order to have an adequate attitude toward themselves and their country. (I stress adequacy because, unfortunately, immersion in the digital world and the internet often does not lead to such an attitude.)

School must connect this basic knowledge with reality outside the classroom: with industry, business, the nongovernmental sector, and so on. But it must also cultivate qualities, skills, and attitudes such as critical thinking, successful communication, teamwork, effective collaboration, media literacy, and civic engagement.

One method for achieving such results is so-called active learning, or active mastery of the material. The student must take an active part and be at the center of the learning process. This happens through more projects, both individual and group, and through more opportunities for students to learn from their mistakes after tests and exams, for example by being allowed to improve their grades once they have analyzed their gaps and given correct examples.

It is also important for the learning process to leave the classroom as often as possible and for opportunities for practical experience to be provided.

And perhaps most valuable of all: cultivating volunteer work and community engagement in order to build civic awareness. In Bulgaria there are plenty of examples of young people with no morals or civic conscience whatsoever. And this, though alarming, is not surprising given how many examples there are of mature, established people with no morals or civic conscience whatsoever, who are even part of political life and government.

School has the opportunity to cultivate a different kind of behavior and attitude toward life. At my son's high school, students graduate with 120 hours of practical experience in 12th grade and 80 hours of volunteer work across 9th, 10th, and 11th grade. I am not claiming this is the ideal school. There is no such thing. But it uses an effective methodology for a better education for the 21st century. Most public schools in the US and in Bulgaria are far from achieving such a balance: an education that both imparts knowledge and builds character.

Media literacy as a subject

Beyond a different structure and different approaches to teaching, there is an urgent need to introduce media literacy as a subject in both the US and Bulgaria. In both countries, and in many others, citizens are targets of massive, systematic disinformation campaigns and fearmongering. In media literacy, Bulgaria ranked 33rd out of 41 European countries in 2022, according to a study by Open Society. In 2023 it was already in 35th place, with Serbia, Moldova, Montenegro, and Romania immediately ahead of us.

The same study has an expanded version in which 6 more countries are added to the European ones: Canada, Australia, the US, Israel, South Korea, and Japan. The US is in 18th place among the 47 countries studied, and Bulgaria is 41st. Although educators and researchers believe young people need to be literate consumers of digital information, in the US only two states, New Jersey and Delaware, have legally mandated the teaching of media literacy from 1st through 12th grade.

The top literacy group in the study cited above comprises Finland, Denmark, Norway, Estonia, Sweden, Ireland, and Switzerland. Estonia is a very good example. Subjected to massive Russian disinformation for years, in 2010 the country introduced media literacy as a school subject from 1st grade through the end of secondary education.

How is media literacy cultivated?

  • conditions are created for students to ask questions, not just answer them;
  • different viewpoints are presented, prompting complex debates;
  • students are required to verify information and to find different sources for checking and comparison;
  • students are expected to be able to justify their understanding and interpretations with documented facts and evidence.

Cultivating skills through such approaches raises not consumers of information but investigators, and it helps build civic awareness and responsibility. Such an education is definitely an answer to the widespread and quite effective populism in the politics of the US, Bulgaria, and quite a few other European countries.

It is not merely high time; it is now imperative that the Bulgarian education system include media literacy in the curriculum. Projects such as "Media Literacy in the Classroom" offer a good example of raising literacy among students, but this is far from enough and will remain so until media literacy becomes state policy.
